Getting Started With Fides — Step 2: Creating Privacy Policies As Code
Overview
Introduction
Following our recent blog post on annotating Datasets and Systems in Fides, we take the next step in building Privacy-as-Code. Here, we walk through the process of codifying privacy policies for use in automated compliance checks. With these checks in place, your team can identify and root out noncompliant code before it’s ever shipped.
Anatomy of a Policy
In Fides, a policy is a collection of rules. Each rule can be thought of intuitively as: “For Specific Condition X, Perform Specific Action Y.” Suppose that we want to build a proactive check into the CI pipeline to confirm that all shipped code complies with this policy: Users’ contact information cannot be collected for the purpose of marketing.
In this blog post, we’ll build that example policy and, along the way, get familiar with the necessary components of a Fides policy. As we discussed in the previous post, embedding these policy checks in CI offers substantial savings in time, money, labor, and risk when contrasted with a reactive approach.
Naming and Describing a Policy
In Fides, rules are codified within a YAML file using a handful of straightforward components. First, a fides_key uniquely identifies the rule. In this case, we use reject_direct_marketing as the value for fides_key.
Next, we add a human-friendly name and description for the rule. We choose “Reject Direct Marketing” as the rule’s name. As for the description, we give a human-readable summary of the policy: “Disallow collecting any user contact info for marketing.”
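As a rough sketch, the naming components described above might look like this in YAML. The nesting under a top-level policy key with a rules list, and the parent policy's own key and name, are assumptions for illustration; only the rule's fides_key, name, and description come from the text.

```yaml
# Sketch only: the outer layout (policy -> rules) and the parent policy's
# key and name are assumed for illustration, not taken from the text.
policy:
  - fides_key: example_privacy_policy   # hypothetical parent policy key
    name: Example Privacy Policy        # hypothetical
    rules:
      - fides_key: reject_direct_marketing
        name: Reject Direct Marketing
        description: Disallow collecting any user contact info for marketing.
```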
Privacy Primitives
From here, we describe the four privacy primitives, which you might recall from the annotation process:
data_categories
data_uses
data_subjects
data_qualifier
We use terms from the Fides privacy taxonomy to add values for each primitive.
For data_categories, we wish to describe the specific types of sensitive data. When we look back to the policy we aim to enforce in CI, the scope of the policy encompasses any contact information gathered from the user, so we add the following value: user.provided.identifiable.contact.
For data_uses, we give a formal label to the categories of data processing in the organization. The use case under consideration for this policy is advertising, so we add it accordingly: advertising.
For data_subjects, we define the individual persons whose data the rule pertains to. In this policy, it is customers’ data that we are concerned with: customer is the appropriate value.
And for data_qualifier, we indicate the acceptable or non-acceptable level of de-identification for this data. The data in question, user-provided contact information, directly identifies an individual, so we add the following value: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
Inclusion Criteria
Using Fides, we have the power to further refine the semantics for policy enforcement. Inclusion criteria are basic logic gates on what kinds of data categories, use cases, subjects, and qualifiers should be considered when running automated privacy checks in CI. In particular, the inclusion criteria are:
ANY
ALL
NONE
When specifying values for each of the four privacy primitives, an inclusion criterion is specified to indicate whether the given rule should be applied to code with ANY, ALL, or NONE of the values entered.
Our example policy provides only one value for each privacy primitive, so the distinction between ANY and ALL might look trivial. However, suppose for a moment that we wanted to create another rule that prevented the processing of contact information or gender for marketing purposes. Then the choice between ANY and ALL has real consequences for what code the automated CI check permits. Choosing ANY would catch instances of processing contact information, gender, or both, whereas ALL would catch only instances in which both contact information and gender are processed.
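To make the contrast concrete, here is a hedged sketch of how the data_categories portion of such a rule might be written. The inclusion and values field names and the gender taxonomy key are assumptions; the text does not show them.

```yaml
# Sketch only: the field names (inclusion, values) and the gender key are assumed.
data_categories:
  inclusion: ANY   # flags contact information, gender, or both
  values:
    - user.provided.identifiable.contact
    - user.provided.identifiable.gender   # assumed taxonomy key
```

Switching inclusion to ALL would, per the description above, flag only code that processes both categories together.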
Actions
We have now formalized, in detail, the kind of data processing that falls under the scope of this policy: collecting user-provided contact information for the purposes of marketing. Next comes the action we want from the automated privacy review.
To begin, note that we have framed our policy negatively. That is, we have defined what we don’t want in our shipped code: collection of customers’ contact information for advertising purposes. If our codebase exhibits that undesired behavior, we should reject it, so we set the action to REJECT.
A Full-Fledged YAML Policy
Using our basic example, we have all of the pieces needed for our policy manifest.
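Assembled from the pieces above, the manifest might look roughly like the following. This is a sketch, not the canonical file: the top-level layout, the parent policy's key and name, the inclusion, values, and action field names, and writing data_qualifier as a single value are all assumptions. The rule's key, name, description, taxonomy values, and the REJECT action come from the walkthrough.

```yaml
# Sketch only: layout and field names are assumed; values come from the walkthrough.
policy:
  - fides_key: example_privacy_policy   # hypothetical parent policy key
    name: Example Privacy Policy        # hypothetical
    rules:
      - fides_key: reject_direct_marketing
        name: Reject Direct Marketing
        description: Disallow collecting any user contact info for marketing.
        data_categories:
          inclusion: ANY
          values:
            - user.provided.identifiable.contact
        data_uses:
          inclusion: ANY
          values:
            - advertising
        data_subjects:
          inclusion: ANY
          values:
            - customer
        # Written as a single value here; this shape is an assumption.
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
        action: REJECT
```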
Let’s look at one more policy. This one demonstrates multiple values for privacy primitives, so the choice of inclusion criteria (ANY versus ALL) is not a trivial one. Codifying this policy might look daunting, but it can be summarized in just two plain-language statements. First, the policy prohibits the use of identifiable data for any purpose other than providing the app’s basic functions. Second, the policy prohibits any collection of sensitive data, for any purpose.
We’ll revisit this policy in the next blog post, where we will execute a policy evaluation in CI.
Conclusion
As with resource annotations in Fides, policies must be kept up-to-date with in-house privacy policies as well as relevant regulations that affect your company. By embedding Fides policy reviews into your team’s development processes, you maintain an accurate and powerful method of enforcing privacy compliance in the CI pipeline, before code ever handles PII out in the wild.
For the next and final installment in this three-part series, we dive into policy evaluation.
Learn More and Get Involved
Explore the rest of this three-part blog series to get acquainted with Fides.
To dive deeper into the Fides ecosystem and connect with the Fides open-source community, check out these resources:
- Explore our support documentation.
- Join our Slack community.
- Clone the Fides repo.
- Read our CEO Cillian’s trilogy of articles explaining the underlying structure of the Fides language.