Mozilla study reaffirms that internet history can be used for “reidentification”

A recent research paper has reaffirmed that our internet history can be reliably used to identify us. The research was conducted by Sarah Bird, Ilana Segall, and Martin Lopatka of Mozilla and is titled “Replication: Why We Still Can’t Browse in Peace: On the Uniqueness and Reidentifiability of Web Browsing Histories.” The paper was presented at the Symposium on Usable Privacy and Security and follows up on a 2012 paper that highlighted the same reidentifiability problem.
Just your internet history can be used to reidentify you on the internet
Using data from 52,000 consenting Firefox users, the researchers identified 48,919 distinct browsing profiles, 99% of which were unique.
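To make the idea of a “unique” browsing profile concrete, here is a minimal Python sketch. It assumes a profile is simply the set of domains a user visited, and it counts how many distinct profiles belong to exactly one user. The function name, the toy data, and the set-of-domains definition are illustrative assumptions, not the researchers' actual pipeline.

```python
from collections import Counter


def profile_uniqueness(histories):
    """Return (distinct_profiles, fraction_unique) for a mapping of
    user id -> iterable of visited domains.

    Assumption: a "profile" is the unordered set of domains visited.
    """
    # Collapse each user's history into an order-independent profile.
    profiles = {user: frozenset(domains) for user, domains in histories.items()}

    # Count how many users share each distinct profile.
    counts = Counter(profiles.values())
    distinct = len(counts)
    if distinct == 0:
        return 0, 0.0

    # A profile is unique when exactly one user has it.
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique / distinct


# Hypothetical toy data: u1 and u2 visited the same set of sites.
toy_histories = {
    "u1": ["news.example", "mail.example"],
    "u2": ["mail.example", "news.example"],
    "u3": ["news.example", "shop.example"],
    "u4": ["video.example"],
}

distinct, fraction_unique = profile_uniqueness(toy_histories)
print(distinct, round(fraction_unique, 2))
```

In this toy data, three distinct profiles exist and two of them belong to a single user each. With real histories that span dozens of sites per person, collisions like the one between u1 and u2 become rare, which is why such a high share of profiles can end up unique.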
This is especially concerning because internet history is routinely sold by internet service providers (ISPs) and mobile data providers to third-party advertising and marketing firms, which are demonstrably able to tie a list of sites back to an individual they already have a profile on, even if the ISP claims to be “anonymizing” the data being sold. This activity has been legally sanctioned in the United States since 2017, when Congress voted to repeal broadband privacy rules and allow the monetization of this type of data collection.
This type of “history based profiling” is undoubtedly being used to build ad profiles on internet users around the world. Previous studies have shown that an IP address usually stays static for about a month – which the researchers noted “is more than enough time to build reidentifiable browsing profiles.”
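As a rough illustration of how a list of sites could be tied back to an existing profile, the sketch below matches an “anonymized” history against known profiles using Jaccard similarity (shared sites divided by all sites seen in either list). The choice of similarity measure, the function names, and the data are assumptions made for illustration; this is not presented as the method any particular firm or the researchers use.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of domains."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def reidentify(anonymous_history, known_profiles):
    """Return (best_user, best_score) for the known profile that most
    closely resembles the anonymous history."""
    best_user, best_score = None, -1.0
    for user, domains in known_profiles.items():
        score = jaccard(anonymous_history, domains)
        if score > best_score:
            best_user, best_score = user, score
    return best_user, best_score


# Hypothetical profiles an advertiser might already hold.
known_profiles = {
    "alice": ["news.example", "mail.example", "forum.example"],
    "bob": ["video.example", "shop.example", "mail.example"],
}

# A history with the name removed but the sites intact.
anonymous_history = ["news.example", "forum.example", "weather.example"]

print(reidentify(anonymous_history, known_profiles))
```

Here the nameless history shares two of four total sites with the “alice” profile and none with “bob,” so it is matched to alice. Removing a name from the data does little when the list of sites itself acts as the identifier.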
It isn’t just our ISPs and mobile data providers that are siphoning up browsing history and using it for fingerprinting purposes, though. The authors noted in the abstract:
“[…] we observe numerous third parties pervasive enough to gather web histories sufficient to leverage browsing history as an identifier.”
These third parties include obvious players with a lot of insight into internet traffic such as Facebook and Google. All hope is not lost, though. In their user-facing recommendations section, the researchers commented:
“Until the state of the web has improved, the onus of ensuring privacy often falls on the user.”
Reidentification is a provable, real problem on the internet that internet users need to prepare for. It’s unfortunate that the internet infrastructure isn’t set up to respect privacy, and it’s unclear if it ever will be.