Anatomy of a Breach: Clearview AI — 3 Billion Scraped Photos and the Surveillance Company That Scraped the Internet

> series: anatomy_of_a_breach | part: 134 | target: clearview_ai | photos_scraped: 3,000,000,000 | then: client_list_stolen

Hedgehog Security 28 February 2020 13 min read

3 billion photos scraped from the internet. Then the company that scraped them was breached.

In January 2020, a New York Times investigation revealed that Clearview AI — a secretive startup founded in 2017 — had scraped approximately 3 billion photos from Facebook, Instagram, YouTube, Twitter, and millions of other websites to build a facial recognition database. The company sold access to this database to over 600 law enforcement agencies in the US, enabling officers to upload a photo of an unknown person and receive matches from the scraped dataset — effectively creating a surveillance capability that dwarfed anything previously available.

In February 2020, Clearview AI notified its customers that an intruder had gained unauthorised access to its entire client list, exposing which law enforcement agencies, government departments, and private companies had purchased access to the facial recognition service. The double revelation of mass photo scraping followed by the theft of the client list created a unique privacy crisis: not only were billions of people's photos held in a surveillance database they had never consented to, but the list of organisations using that database was now also compromised. Regulators in multiple countries, including the UK's ICO, subsequently investigated Clearview AI and imposed fines for violating data protection law.


When 'public' data is collected at scale, it becomes surveillance.

Public Photos ≠ Consent for Surveillance

Photos posted publicly on social media were scraped without consent and used for facial recognition by law enforcement. Under GDPR and the UK's Data Protection Act 2018, mass scraping of personal data (including biometric data derived from photos) requires a lawful basis, which Clearview AI did not have for UK residents. The <a href="https://ico.org.uk/action-weve-taken/enforcement/clearview-ai-inc/">ICO fined Clearview AI £7.5 million</a> and ordered deletion of UK residents' data.

Client List Stolen

The theft of Clearview AI's client list revealed which agencies were quietly using the tool, creating political and legal exposure for those agencies. When a surveillance vendor is breached, its clients' use of surveillance tools can become public. <a href="https://www.socinabox.co.uk">SOC in a Box</a> monitors for data exposure from third-party vendor breaches.

Biometric Data at Scale

Facial recognition data is biometric data: unlike a password, a face cannot be changed once it is compromised. The creation of a 3-billion-photo facial recognition database represents an irreversible privacy violation at global scale. For organisations processing biometric data, our <a href="/penetration-testing/infrastructure">security assessments</a> evaluate biometric data protection controls.

Global Regulatory Response

The ICO (UK), CNIL (France), the Garante (Italy), and the Australian Privacy Commissioner all investigated or fined Clearview AI, demonstrating that GDPR-era regulators will act against companies that scrape personal data without consent, regardless of where the company is based. GDPR compliance requires lawful data processing, and <a href="/cyber-essentials">Cyber Essentials</a> provides the baseline technical controls that protect that data.

Public data is not free data. And surveillance companies get breached too.

Clearview AI taught two lessons: first, mass collection of publicly available data for purposes the individuals did not consent to is a data protection violation under GDPR. Second, surveillance companies — like Hacking Team (2015) and NSO Group (2019) — are themselves breach targets, and their compromise exposes their clients' activities.

For UK organisations, GDPR compliance requires that data collection is lawful, fair, and proportionate, with a valid lawful basis such as consent, and Cyber Essentials provides the baseline technical controls that protect that data. Our web application testing assesses whether your platforms are exposed to scraping. SOC in a Box monitors for scraping activity against your web assets. And UK Cyber Defence provides incident response when data scraping or misuse is detected.


Clearview scraped 3 billion photos without consent. The ICO fined them £7.5 million. Is your data processing lawful?

<a href="/cyber-essentials">Cyber Essentials</a> addresses data protection. <a href="/penetration-testing/web-application">Application testing</a> detects scraping vulnerabilities. <a href="https://www.socinabox.co.uk">SOC in a Box</a> monitors for scraping activity.
