Privacy-as-Code: Preventing TikTok’s $92M BIPA Violation with Open-Source Fides
Introduction
We spend a lot of time at Ethyca talking about the future of privacy. It makes sense; the Fides open-source privacy engineering platform promises a future where true Privacy by Design is achievable for any business, with any type of technical infrastructure. But in seeking to illustrate just how that future could differ from today’s status quo, it’s useful to look at recent high-profile privacy cases, and show how applying Fides could have led to a different, better outcome for users and businesses.
For example, if you were TikTok in 2019, Fides could have rooted out a certain type of unlawful data use pre-deployment, before offending code ever handled user data. And by doing so, it could have prevented the privacy violations that led to a $92M settlement under Illinois law.
Could a few lines of code in a vast product ecosystem really save a company ninety-two million dollars? With all the necessary caveats about culture, business model, and correct configuration, the answer in this case is: “absolutely.”
In this post I’ll describe the specifics of TikTok’s unlawful data uses as detailed in the company’s $92M settlement under the Biometric Information Privacy Act (BIPA) of Illinois. I’ll point out what went wrong on a technical level, and codify guardrails against this behavior by writing a policy in the Fides language.
When I previously considered the case of Fides’ utility to Facebook in an FTC investigation, I noted that a codebase, or a platform like Fides, doesn’t operate in a vacuum. It’s vital to consider the cultural and organizational aspects that contribute to good privacy too. Nevertheless, while flaws in internal processes are often key contributors to privacy infractions, my focus here is on what can be technically achieved to make privacy and respect low-friction and meaningful.
TikTok’s violation under BIPA
When it comes to privacy laws in the United States, BIPA is a heavyweight, in large part because it’s one of the few US privacy laws that gives a private right of action; individuals have the right to sue a company for violations. Beyond its enforcement features, BIPA places tight technical demands on how companies must respect biometric identifiers and biometric information of Illinois residents. The law defines a biometric identifier as:
“a retina or iris scan, fingerprint, voiceprint, or scan of hand or face geometry.”
And BIPA defines biometric information as:
“any information, regardless of how it is captured, converted, stored, or shared, based on an individual’s biometric identifier used to identify an individual.”
For these categories of personal information, BIPA sets strict requirements on how companies must collect users’ opt-in consent before processing that data. Companies must also respect a suite of other restrictions on biometric data processing, retention, disclosure, and more.
As the 2019 TikTok lawsuit points out, biometric privacy is particularly high-stakes since the information involved is often immutable. While I can change my password or my home address, I’m not going to be able to change my fingerprint.
Looking at Section 15(c) of BIPA:
“No private entity in possession of a biometric identifier or biometric information may sell, lease, trade, or otherwise profit from a person’s or a customer’s biometric identifier or biometric information.”
Per Blank Rome LLP, courts have found that “for a claim to exist under Section 15(c), actual biometric data or the sharing of access to the underlying biometric data must be transferred or exchanged in return for some benefit.”
The TikTok settlement finds the following:
“Defendants [being TikTok] are, and at all relevant times were, ‘in possession of’ the Illinois Plaintiffs’ and the Illinois Subclass’s ‘biometric identifiers,’ including but not limited to their face geometry scans, and ‘biometric information.’ Defendants profited from such ‘biometric identifiers’ and ‘biometric information’ by using them for targeted advertising, improvements to Defendants’ artificial intelligence technologies, Defendants’ patent applications, and the generation of increased demand for and use of Defendants’ other products…”
Now, you might contend that this data use can be viewed as integral to the particular defendant’s business model, rather than an unfortunate misalignment between product and legal stakeholders… and you may well be right! But it’s also very easy to imagine the misalignment scenario. Indeed we know that at some of the world’s largest companies, privacy engineers are lamenting that:
“We can’t confidently make controlled policy changes or external commitments such as ‘we will not use X data for Y purpose.’ And yet, this is exactly what regulators expect us to do”
So the example of TikTok and BIPA makes a very suitable candidate for demonstrating Fides’ privacy engineering power. With this context, I’m going to use Fides to proactively flag any code that could violate Section 15(c). In other words, the CI pipeline will run an automated check ensuring that no biometric identifier or biometric information (I’ll group these together as “biometric data” from here on) is ever used for any of the purposes prohibited above.
Examining our policy
As with the Facebook/FTC example I discussed in my previous post, let’s translate the legal requirement into a technical guardrail on the codebase. The Fides policy would be:
tiktok_fides_policy.yml
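Here is a sketch of what tiktok_fides_policy.yml could contain, built from the taxonomy values walked through below. The structure follows the fidesctl policy manifest format; the fides_key, rule name, and descriptions are my own illustrative choices, and the exact schema may differ between Fides versions.

```yaml
# tiktok_fides_policy.yml -- illustrative sketch; exact keys may vary by Fides version
policy:
  - fides_key: bipa_section_15c_policy        # illustrative key
    name: BIPA Section 15(c) Biometric Policy
    description: Flag any commercialization of customers' biometric data.
    rules:
      - name: Reject biometric data commercialization
        description: >
          Identifiable customer biometric data must never be processed for
          advertising, AI training, product improvement, or third-party sharing.
        data_categories:
          matches: ANY
          values:
            - user.derived.identifiable.biometric_health
            - user.provided.identifiable.credentials.biometric_credentials
            - user.provided.identifiable.biometric
        data_uses:
          matches: ANY
          values:
            - advertising
            - train_ai_system
            - improve
            - third_party_sharing
        data_subjects:
          matches: ANY
          values:
            - customer
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
```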
For a walkthrough of each of these fields, I encourage you to see the first post in this series. Here, I’ll focus on the new material.
First, I’ve used the Fides taxonomy to pinpoint which kinds of data this policy applies to. In this case, I’m interested in every label in the taxonomy that applies to biometric data, whether that data is provided directly by the user or derived from them. That’s where I get the values for data_categories:
user.derived.identifiable.biometric_health,
user.provided.identifiable.credentials.biometric_credentials, and
user.provided.identifiable.biometric.
Under data_uses, I want my policy to identify any instance of code processing the data categories I’ve previously specified. Again referencing the Fides taxonomy, I select the data uses that involve commercialization, which might take the form of advertising, training an AI system, improving the product, or sharing data with third parties.
Next, I specify that the biometric data in question is that of a customer, so data_subjects is set accordingly. And finally, I describe the degree of identifiability this policy covers: data that directly identifies an individual, expressed through the data_qualifier:
aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified.
With these pieces together, this policy could be summarized as:
If any form of customers’ biometric data is processed for purposes of advertising, training an AI system, improving a product, or sharing with third parties; then trigger a violation in the automated privacy check.
So for example, if the TikTok product team wants to ship a release that shares biometric data with third parties, at the time of commit, their automated Fides privacy check will fail, and the release will not proceed.
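To make that concrete, here is a minimal sketch of how the check might be wired into a CI pipeline, assuming a GitHub Actions setup and the fidesctl CLI. The workflow name, manifest directory, and installation steps are illustrative; depending on your deployment, fidesctl may also need a running server and a configuration file.

```yaml
# .github/workflows/fides.yml -- illustrative CI sketch
name: fides-privacy-check
on: [pull_request]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - run: pip install fidesctl
      # Evaluate the repo's privacy manifests (systems, datasets, policies)
      # against the policy above. A violation exits non-zero and fails the check.
      - run: fidesctl evaluate .fides/
```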
This policy, in tandem with up-to-date annotation of the codebase’s privacy behaviors (a sketch of such an annotation follows below), becomes an indispensable tool for aligning the tech stack with modern laws like BIPA. There are myriad organizational and governance benefits to integrating privacy checks into the CI pipeline, and proactively flagging code for non-compliance cuts out the technical debt that makes privacy improvements elusive for so many companies today.
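Annotating a system’s privacy behaviors can be as lightweight as a YAML manifest checked in alongside the code. The sketch below shows a hypothetical system declaration (the service name and its declared data use are invented for illustration) following the fidesctl system manifest format.

```yaml
# recommendation_system.yml -- hypothetical annotation of a service's privacy behaviors
system:
  - fides_key: recommendation_service          # invented for illustration
    name: Recommendation Service
    description: Serves personalized content and ads to users.
    system_type: Service
    privacy_declarations:
      - name: Ad targeting with derived biometric signals
        data_categories:
          - user.derived.identifiable.biometric_health
        data_use: advertising
        data_subjects:
          - customer
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
```

Because this declaration pairs a biometric data category with an advertising data use on identified customer data, the evaluation step would flag it against the policy above and fail the build.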
Conclusion
I’ve been writing for a while now on why we at Ethyca believe low-friction devtools are the key to solving technical privacy challenges. Software developers are like civil engineers, building vital infrastructure that billions of people rely on—for employment, payment, education, entertainment. With this significant power comes a need for transparent, rigorous standards in how personal data is respected.
Ultimately, users deserve systems that are trustworthy: systems that behave as users expect them to. The common thread in the biggest privacy stories is that companies break their promises around personal data processing. Even when engineers deeply care about users and seek to respect their data, it can be an uphill battle to keep track of loose ends across complex data infrastructure. An incomplete picture of data context and data control can cause even the best-intentioned team to expose users to significant privacy risks. In this post, I’ve aimed to share a specific instance of the Fides devtools equipping teams with the context and control they need to deliver the privacy their users deserve.
Thanks for reading, and stay tuned for my next post, where I’ll cover another major privacy story and demonstrate how Fides can solve the challenge upstream.