Data Protection Authorities Around the World Investigate ChatGPT Over Privacy Concerns
At the beginning of February, when the excitement over generative AI systems like ChatGPT was starting to build, I warned on the PIA blog that “it seems certain that much of the data ChatGPT has ingested is subject to the EU’s General Data Protection Regulation (GDPR)”. Now Italy’s data protection authority, “Il Garante per la protezione dei dati personali”, has shut down the use of ChatGPT in the country precisely because of concerns that it violates the GDPR. As BBC News reports, the Italian authorities see multiple reasons why ChatGPT should be blocked:
The watchdog said on 20 March that the app had experienced a data breach involving user conversations and payment information.
It said there was no legal basis to justify “the mass collection and storage of personal data for the purpose of ‘training’ the algorithms underlying the operation of the platform”.
It also said that since there was no way to verify the age of users, the app “exposes minors to absolutely unsuitable answers compared to their degree of development and awareness”.
It’s not just Italy. According to a report by Reuters, privacy regulators in France have contacted the relevant authorities in Italy for more details about the ban, and Ireland’s Data Protection Commissioner has said it will do the same.
ChatGPT’s Problems with EU Data Privacy Are Just Starting
Meanwhile, the German commissioner for data protection has said that it may also block ChatGPT because of GDPR issues. A report in Der Spiegel explains (translation by DeepL):
the German data protection authority supports the approach of its Italian colleagues. “Training data from an AI is subject to the GDPR, just like other data, if it is personal data,” a spokeswoman explains. Since OpenAI is based in the U.S., any European supervisory authority could take action.
At the same time, they are very interested in the results of their Italian colleagues: “We have already asked them for further information on the blocking of ChatGPT and will then pass this on to the competent state data protection supervisory authorities and state media authorities,” explains the authority spokeswoman.
The UK’s data protection authority, the Information Commissioner’s Office, has published a blog post listing eight questions that companies working in this area need to ask themselves, noting that protecting privacy “isn’t optional – if you’re processing personal data, it’s the law.”
The Italian Garante has given ChatGPT until April 30 to comply with its privacy requirements. Failure to resolve the issues mentioned above could lead to a fine of 20 million euros, or 4% of the total worldwide annual turnover of OpenAI, the company behind ChatGPT, whichever is higher.
It is unlikely that such a large fine would be levied so soon in a field that didn’t exist a few months ago, but the threat is nonetheless real, and could be invoked at any time if the Italian authorities decide that ChatGPT or any other AI system has indeed infringed the GDPR.
The demand by the Garante raises a new and important issue: how exactly can these new “large language models” (LLMs) – which are not so much large as enormous – be updated? Because training data is encoded diffusely across a model’s parameters rather than stored as discrete records, it seems unlikely that specific items can be removed from a trained LLM in a way that resolves the privacy problems outlined by the Italian data protection authority.
ChatGPT Issues Are Bigger Than Data Privacy
ChatGPT has some other very specific problems under the GDPR, for example the right to correct erroneous data held by a company. When ChatGPT and other LLMs start “hallucinating”, that is, generating false information, about individuals, what recourse will those individuals have to “correct” the hallucinations?
It seems that false statements in LLM output can arise even if the underlying data is true. In these circumstances, the model itself is the problem. And how can that be reconciled with the rights granted to EU citizens under the GDPR?
This is not just about the GDPR and the EU, even though the problems there are particularly acute. Countries outside that region are also starting to look at the privacy implications of LLMs. For example, the Office of the Privacy Commissioner of Canada has just launched an investigation into ChatGPT:
“AI technology and its effects on privacy is a priority for my Office,” Privacy Commissioner Philippe Dufresne says. “We need to keep up with – and stay ahead of – fast-moving technological advances, and that is one of my key focus areas as Commissioner.”
The investigation into OpenAI, the operator of ChatGPT, was launched in response to a complaint alleging the collection, use and disclosure of personal information without consent.
What the US Is Doing About ChatGPT
In the US, which lacks a federal privacy law comparable to the EU’s GDPR, there are still concerns about the new generative AI systems. The Center for AI and Digital Policy, an AI think-tank, has made a formal complaint to the Federal Trade Commission (FTC). It asks for an investigation into OpenAI and a halt on further commercial releases of the company’s new GPT-4, because of fears that it is “biased, deceptive, and a risk to privacy and public safety.”
Things are moving so fast in this sector that it is hard to keep up with them. But it seems clear that after an initial phase of wide-eyed excitement about generative AI systems like ChatGPT, people are now waking up to the extremely serious privacy issues they raise. The question now is to what extent those issues can be addressed, and whether they will lead to the field of LLMs being stifled in some parts of the world if data protection authorities slam on the regulatory brakes.
Featured image by OpenAI.