Getting Started With Fides — Step 3: Evaluating Code For Privacy Compliance
Fides Evaluations: Where Annotations and Policy Meet
This blog post is the third in a three-part series on getting started with Fides Control, the open-source tool for Privacy-as-Code that runs privacy checks directly in the CI pipeline. To recap, we have covered the following topics:
- Annotating Datasets and Systems in the Fides language to describe the privacy behaviors in the tech stack
- Creating Policies that codify regulations or in-house privacy requirements, also using the Fides language
Throughout this blog series, we have used a simple example of a Flask web application for an e-commerce company. We’ll return to this example once again. We will also draw on the Policy resources created in the previous blog post. For a hands-on walkthrough, clone the demo repo and check out our tutorial.
The first two posts in this series laid the groundwork for policy evaluation: powerful in-CI privacy checks. It's useful in itself to understand what PII this app processes, and to have a firm grasp on the policies the app must abide by. Combining the two in an automated privacy check, however, is more than useful knowledge; it's vital to embedding comprehensive privacy into our CI/CD workflow. Let's explore how our annotations and policies pay off at the policy evaluation stage.
Implementing Analytics for User Behavior
Suppose that we wish to add Google Analytics to our app, to better understand how users interact with the app. Check out the hands-on tutorial to see how to add the Google Analytics script in the Fides demo. Here, we’ll focus on the impact of adding Google Analytics on our codebase’s privacy compliance.
We will create a YAML file for this new System resource, giving it the fides_key google_analytics_system. Using what we learned in the first blog post of this series, we populate the file with the System's details and its privacy declarations, as sketched below.
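The full manifest lives in the demo repo; what follows is a minimal sketch of its shape. The specific data category, data use, data subject, and system_type values are illustrative choices from the Fides taxonomy and may not match the demo exactly.

```yaml
system:
  - fides_key: google_analytics_system
    name: Google Analytics
    description: Third-party analytics added to the app to analyze user behavior
    system_type: Service
    privacy_declarations:
      # Pseudonymized behavioral data, processed to improve the app
      - name: Analyze user behavior
        data_categories:
          - user.derived.identifiable.browsing_history
          - user.derived.identifiable.device.cookie_id
          - user.derived.identifiable.telemetry
          - user.derived.identifiable.location
          - user.derived.nonidentifiable
        data_use: improve.system
        data_subjects:
          - customer
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized
      # Location derived from the user's IP address; not pseudonymized by default
      - name: Derive user location from IP address
        data_categories:
          - user.derived.identifiable.device.ip_address
          - user.derived.identifiable.location
        data_use: improve.system
        data_subjects:
          - customer
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
```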
We have added several comments in the file, and we’ve encountered all of the other components in the earlier blog posts. In plain terms, this System annotation tells us the following: First, Google Analytics processes users’ browsing history, cookie IDs, telemetry data, and location data—all of which are identifiable on their own but pseudonymized here—alongside non-identifiable data for the purpose of improving the app. Second, Google Analytics processes the user’s derived IP address for improving the app. Note that this data is not pseudonymized by default.
Checking Resources Against Policy
Suppose that our directory of Fides resources includes the second policy from the previous blog post. Recall that all Fides policies are composed of rules; this policy's rules were the following (a YAML sketch follows the list):
- No usage of identifiable information for any purpose other than providing the app’s basic function.
- No collection of any sensitive information, for any purpose.
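For reference, a Policy expressing these two rules might look roughly like the sketch below. The fides_key, rule names, matches keywords, and the specific taxonomy values are illustrative assumptions about the fidesctl policy schema; consult the previous post and the Fides docs for the exact resource.

```yaml
policy:
  - fides_key: demo_privacy_policy
    name: Demo Privacy Policy
    description: Privacy requirements for the demo e-commerce app
    rules:
      # Rule 1: no identifiable data for purposes beyond the app's basic function
      - name: Restrict identifiable data to basic app function
        data_categories:
          matches: ANY
          values:
            - user.provided.identifiable
            - user.derived.identifiable
        data_uses:
          matches: ANY
          # Purposes other than simply providing the service
          values:
            - improve
            - personalize
            - advertising
            - third_party_sharing
            - collect
        data_subjects:
          matches: ANY
          values:
            - customer
        # Declarations whose data remains identified trigger this rule
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
      # Rule 2: no sensitive data, for any purpose
      - name: Reject sensitive data
        data_categories:
          matches: ANY
          values:
            - user.provided.identifiable.health_and_medical
            - user.provided.identifiable.political_opinion
            - user.provided.identifiable.race
            - user.provided.identifiable.religious_belief
            - user.provided.identifiable.sexual_orientation
        data_uses:
          matches: ANY
          values:
            - provide
            - improve
            - personalize
            - advertising
            - third_party_sharing
            - collect
        data_subjects:
          matches: ANY
          values:
            - customer
        # Any level of deidentification is rejected for these categories
        data_qualifier: aggregated
```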
We will run our automated privacy check of the Google Analytics System against this policy. At the command line, we execute make fidesctl-evaluate to evaluate this System.
This Google Analytics implementation fails the privacy check! It’s time to dive back into the implementation to correct the noncompliant code.
Modifying Code to Achieve Privacy Compliance
The output of the evaluate command tells us which of the privacy declarations in the System YAML file failed the privacy check, and which rule the noncompliant declaration violated.
In our example, the derivation of users’ geographic location violates the first rule in our policy, which prohibits the usage of identifiable user information for purposes besides basic app function. Let’s see where things went awry.
In our annotation of the System resource for Google Analytics, we see that Google Analytics would be processing users’ identifiable information—namely, their devices’ IP addresses and their location. Crucially, as we noted earlier, such data is classified as identifiable for the data_qualifier attribute. We need to update the data qualifier so that it is not identifiable but rather pseudonymized.
Of course, it's not enough to change the annotations alone. We need the technical systems to actually behave that way! Visit the tutorial for a brief walkthrough of the process to pseudonymize IP addresses in Google Analytics. Once IP addresses are pseudonymized, we return to the System annotation and update the data_qualifier attribute for the relevant privacy declaration. We replace the old value
aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
with one that reflects the now-pseudonymized nature of that location data:
aggregated.anonymized.unlinked_pseudonymized.pseudonymized
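In terms of the System manifest sketched earlier, the fix amounts to a one-line change to the offending privacy declaration (again using the illustrative values from that sketch):

```yaml
      # Location derived from the user's IP address, now pseudonymized
      - name: Derive user location from IP address
        data_categories:
          - user.derived.identifiable.device.ip_address
          - user.derived.identifiable.location
        data_use: improve.system
        data_subjects:
          - customer
        # Previously: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized
```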
Now, when we execute the evaluate command, it passes! With this simple example, we have described our systems and our policies to ultimately run privacy compliance checks directly within the CI pipeline.
Why Proactive Privacy Matters
Privacy belongs in the software development life cycle. At Ethyca, we consistently advocate for this approach, and it’s more than just a catchphrase. Looking back at our example e-commerce application, suppose that we did not check our code prior to deployment for privacy compliance. We would have shipped the code, and it would only be once actual people’s PII was flowing through the app that we would be scrambling to correct the privacy violation. The costs are manifold, burdening engineering teams in time and labor as well as the entire company in terms of reputation and risk of costly privacy fines.
The example that we have illustrated in this blog series is a vast simplification of most tech stacks today, where there are many more Datasets and Systems to annotate, and a complex web of policies to codify in Fides. However, the straightforward processes demonstrated in this series scale to cover real-world tech stacks. Fides equips devs with not only a clear, standardized taxonomy of privacy terms to annotate resources, but also a powerful evaluation function to pinpoint noncompliant code. The result is low-friction privacy checks built into teams’ routine CI/CD workflow, meaning companies and end-users can trust the tech that handles personal information.
Learn More and Get Involved
To dive deeper into the Fides ecosystem and connect with the Fides open-source community, check out these resources:
- Explore our support documentation.
- Join our Slack community.
- Clone the Fides repo.
- Read our CEO Cillian’s trilogy of articles explaining the underlying structure of the Fides language.