ChatGPT Is a Privacy Disaster Waiting To Happen
To say that the AI-based chatbot ChatGPT is making headlines around the world would be something of an understatement. Every day, people are finding new ways to apply its amazing capability to engage in conversation and debate, and to generate well-written explanations, articles, stories, poems, songs, jokes and much more.
The initial flood of ChatGPT posts gave the impression that it was an almost limitless tool that would magnify human intellect – and perhaps even replace it someday. With so many people using it, it didn’t take long for problems to start emerging.
One of the biggest problems with ChatGPT concerns your privacy, and almost nobody is talking about it. The service can collect and process highly sensitive details from the prompts you provide, associate that information with your email and phone number, and store it all indefinitely.
What Are ChatGPT’s Major Problems?
People began to realize that ChatGPT has no understanding of the texts it generates: they are simply statistically plausible sequences of words. In particular, it turned out that ChatGPT's answers were often wrong – sometimes trivially, sometimes seriously.
A new paper on the preprint site arXiv has a great summary of the many ways in which ChatGPT can produce incorrect responses. These include problems with:
- Spatial reasoning
- Physical reasoning
- Temporal reasoning
- Psychological reasoning
- Basic mathematics and arithmetic
- Factual accuracy
- Bias and discrimination
- Wit and humor
- Syntax, spelling and grammar
Artificial, sure. Intelligent? Not so much. Right at the end of the paper (which is well worth reading to get an idea of the scale of the problems with ChatGPT) there is a short paragraph noting that another risk of using ChatGPT is that it can compromise privacy in a variety of ways.
One major risk flows from signing up and using ChatGPT. To sign up requires both an email address and a mobile phone number. According to a post on the Sue Donimus site, you cannot get around this by using disposable or masked email addresses.
In addition, OpenAI, the company behind ChatGPT, has taken measures to prevent people from re-using phone numbers and from using disposable ones. The net effect is that a ChatGPT account is very firmly tied to two key markers of online identity: your email and your mobile phone number.
How ChatGPT Collects Personal Information
ChatGPT keeps a record of every message you send it. From those, it can learn a lot about your interests, beliefs, obsessions and concerns – this is a highly capable machine learning system, after all. That’s also true of today’s search engines, but ChatGPT can engage with a user in a completely new way.
Engaging in a dialog is one of the primary ways in which ChatGPT can collect highly personal information. The power of the software makes it easy to forget that it is an AI system, and to begin chatting as you might with a human. In doing so, you may reveal things that you would never type into a search engine… and all of these personal facts are now tied to your email and your phone number.
Other risks flow from the poor privacy practices of many of the ChatGPT clones on the loose. But a more general concern about ChatGPT stems from the fact that at its heart lies “vast amounts of data from the internet written by humans, including conversations,” as the ChatGPT FAQ explains. According to an article on the BBC’s Science Focus site, that amounts to 570 gigabytes of data, some 300 billion words in total.
Because of the indiscriminate way that ChatGPT gathers data, much of it will refer to people, and it will include things that they have written or said over the last few years or even decades, in the most varied contexts, including on social media, personal websites, and in chat or even email threads if they are publicly available.
Much of ChatGPT’s power lies in its ability to bring all these disparate inputs together and analyze them on a hitherto impractical scale. Inevitably, this will result in it finding and making explicit connections and associations that may not be otherwise apparent.
When users interrogate ChatGPT, it could expose information or rumors that they or other people would not want made public. Since ChatGPT has no understanding of what it produces, it will not hold back information that might be embarrassing or lead to careers or relationships being wrecked. It will be very hard to prevent this: even when basic safeguards are built into the system, it turns out it is quite easy to “jailbreak” them by crafting suitable questions.
In addition, ChatGPT’s proven tendency to get things wrong and even make things up will doubtless lead to damaging and untrue allegations. Some people will nonetheless believe them and spread false statements further in the belief that the chatbot has uncovered hitherto withheld and secret information.
What Does OpenAI Say About ChatGPT’s Impact On Privacy?
OpenAI seems untroubled by these issues, perhaps too excited by the undoubted capabilities of its new technology. In its FAQ, it writes: “Please don’t share any sensitive information in your conversations,” which is both naive and unhelpful. It’s like telling someone not to think of a bear.
People are inevitably going to be entering all kinds of personal data in their “prompts” to ChatGPT in order to elicit fascinating answers to questions that matter to them – prompts that, according to OpenAI’s FAQ, cannot be deleted. And if you’re not careful enough to protect intimate details of your life in the prompts you submit to ChatGPT… that’s that. Once submitted, nothing can be done to eliminate the personal information that the system has already acquired, any more than it can be removed from the 570 gigabytes of training data.
Will Privacy Laws Catch Up to ChatGPT?
Even if OpenAI is unconcerned by the enormous privacy risks of its system, it is likely that it will soon be forced to take action because of privacy laws in the EU and elsewhere. For example, it seems certain that much of the data ChatGPT has ingested is subject to the EU’s General Data Protection Regulation (GDPR), which restricts how personal data can be used for purposes other than those for which it was initially collected.
Moreover, the new EU AI Act, currently being finalized, could have a major impact on how ChatGPT is used and policed.
If the problem were only about ChatGPT leaking personal details in its responses to prompts, the issue might be tractable. But now that Google and Microsoft have entered the sector with their own recently announced AI-based chatbots, the scale of the problem is much larger.
Add in the fact that the Chinese online giant Baidu is also about to launch its own system, together with Microsoft’s plans to release software to allow any company to create its own chatbot, and it is clear that the privacy problems outlined above are likely to be far greater than anyone currently imagines. AI-based chatbots are indubitably amazing technology, but they are also a privacy disaster waiting to happen.
Featured image by Microsoft.