What the Great Personal Data Leak of 2021 tells us about Facebook, the GDPR, and privacy

Updated on Jan 24, 2024 by Glyn Moody

By now, many people will have heard about the appearance of 533,000,000 Facebook records online, first revealed in a tweet by Alon Gal at the weekend. You can find out whether you are one of the unlucky ones using the excellent free site Have I Been Pwned, which has now added the latest Facebook account details to its depressingly large database of leaks. That database currently covers 521 pwned Web sites and 11 billion pwned accounts.

For the latest Facebook leak, details include phone number, Facebook ID, full name, location, past location, birth date, email address (for some), account creation date, relationship status, and biography. As Gal points out, this is core information, and it will be used for identity theft, social engineering, scams and hacking. Much of this data can’t be changed, which makes its loss extremely serious. And yet Facebook’s official response is this:

“We believe the data in question was scraped from people’s Facebook profiles by malicious actors using our contact importer prior to September 2019. This feature was designed to help people easily find their friends to connect with on our services using their contact lists.”

Facebook is trying to draw a distinction between somebody breaking into its systems and exfiltrating the personal data of half a billion accounts, and somebody using Facebook’s own tools to exfiltrate data from half a billion accounts. This quibbling seems to be an attempt to claim that it was not the company’s fault, but purely that of the “malicious actor”. Whatever you want to call it, the fact remains that the data was accessed because of a vulnerability in the design of Facebook’s system. Moreover, words like “sorry” are notable for their absence from Facebook’s post. Not only is Facebook trying to evade responsibility, it doesn’t even recognize that it owes an apology to half a billion people whose personal details are now floating around the Internet.

Facebook’s attempt to avoid serious consequences from the latest leak has another important aspect: the possibility of fines under the EU’s GDPR. Facebook could potentially be fined up to 4% of its global turnover as a punishment for failing to protect the personal data of EU citizens. Because Facebook’s EU headquarters is in Ireland, the way the GDPR is enforced means that it is up to the Irish Data Protection Commission (DPC) to investigate and decide whether a fine is in order. As the DPC’s Web site explains:

“Previous datasets were published in 2019 and 2018 relating to a large-scale scraping of the Facebook website which at the time Facebook advised occurred between June 2017 and April 2018 when Facebook closed off a vulnerability in its phone lookup functionality. Because the scraping took place prior to GDPR, Facebook chose not to notify this as a personal data breach under GDPR.”

Notification matters, because under the GDPR, Facebook has an obligation to announce personal data breaches. However, the DPC notes that the latest leak may contain additional records from a later period when the GDPR was in force. If that’s the case, Facebook should have notified the breach, and could be fined for failing to do so. Facebook says it has no plans to make such a notification. The DPC added: “The DPC attempted over the weekend to establish the full facts and is continuing to do so. It received no proactive communication from Facebook.” One reason for this unhelpfulness might be that Facebook is banking on the DPC failing to take any serious action against the company. As a previous post on this blog has explained, that’s precisely what has happened with many other DPC investigations, arguably weakening the GDPR’s effectiveness. Another failure to act would further undermine the Irish DPC’s position as the premier enforcement agency in the EU. The fact that the personal details of dozens of EU officials are included in the latest leak may help to concentrate minds at the DPC.

An article in Wired explains that however large it may be, the Great Personal Data Leak of 2021 is actually nothing special. It is just the latest in a line of serious lapses on the part of Facebook:

“One source of the confusion was that Facebook has had any number of breaches and exposures from which this data could have originated. Was it the 540 million records – including Facebook IDs, comments, likes, and reaction data – exposed by a third party and disclosed by the security firm UpGuard in April 2019? Or was it the 419 million Facebook user records, including hundreds of millions of phone numbers, names, and Facebook IDs, scraped from the social network by bad actors before a 2018 Facebook policy change, that were exposed publicly and reported by TechCrunch in September 2019?”

The Wired article points out that Facebook reached a settlement with the FTC in July 2019 over its previous privacy failures. In exchange for paying a $5 billion fine and agreeing to other terms, the company was indemnified for all its failures before 12 June 2019. An important issue is therefore whether any of the data in the latest leak dates from after that cut-off point.

Whenever the data was extracted, the size and seriousness of the data holdings that are now out in the wild emphasize a number of important points. First, creating huge stores of personal data is inherently dangerous, since there will always be leaks. Second, Facebook seems largely indifferent to the problems its users suffer when their account information is leaked, and is more interested in deflecting blame than apologizing. Ideally, people with Facebook accounts would close them, but moving to other social networks is hard. At the very least, they would be well advised to give fake details where possible so that key data can’t be used against them, as the leaked details certainly will be now.

Featured image by US Department of Defense.