How pervasive real-time bidding for online ads silently undermines your privacy

Posted on Sep 8, 2018 by Glyn Moody

Most people have heard of Moore’s Law, which roughly means that computers have doubled in power every few years. One of the benefits of Moore’s Law is that it has put a supercomputer in everyone’s pocket, in the shape of a low-cost mobile phone. Less well known is the profound impact it has had on online advertising – and, as a result, on our privacy online.

Privacy News Online reported over a year ago that the speed of computers is such that it is possible to conduct a real-time auction for the advertising space on a Web page in the few milliseconds that it takes for the page to load. Information about the person who will view the page and its ads is sent to multiple advertisers so that they can calculate how much to bid in the automated real-time auction.

Auction bids are further informed by any personal information that advertisers have previously gathered about the visitor. This is typically collected using browser cookies, which allow people to be recognised as they move around the Internet. The use of cookies to track us explains why the same ad for the same product mysteriously appears on multiple sites – a practice known as “remarketing”.

Since advertisers often use different cookie identifiers, they may not be able to recognise a visitor to a Web site that is conducting a real-time ad auction from the information supplied. In order to enable the greatest number of advertisers to participate in the bidding, “cookie syncing” has become common. This is a practice whereby the different cookie identifiers for an individual are linked to form an aggregated identity across several advertising platforms. That way, when a real-time ad auction takes place, most advertisers know whose attention they are bidding for.
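
To make the linking step concrete, here is a minimal sketch of how synced identifiers could be merged into one aggregated identity. It uses a union-find structure as an illustration only; the class name `SyncTable`, the platform names and the cookie values are all hypothetical, and real platforms may store these mappings quite differently.

```python
class SyncTable:
    """Toy model of cookie syncing: links (platform, cookie_id) pairs."""

    def __init__(self):
        self.parent = {}  # maps each identifier to its representative

    def find(self, ident):
        # Register unseen identifiers as their own representative.
        self.parent.setdefault(ident, ident)
        while self.parent[ident] != ident:
            # Path halving keeps later lookups short.
            self.parent[ident] = self.parent[self.parent[ident]]
            ident = self.parent[ident]
        return ident

    def sync(self, a, b):
        # Record that two platforms agreed these cookies are the same browser.
        self.parent[self.find(a)] = self.find(b)

    def same_user(self, a, b):
        return self.find(a) == self.find(b)


table = SyncTable()
table.sync(("exchange_x", "abc"), ("dsp_y", "789"))
table.sync(("dsp_y", "789"), ("dsp_z", "q42"))

# exchange_x and dsp_z never synced directly, but are linked through dsp_y.
linked = table.same_user(("exchange_x", "abc"), ("dsp_z", "q42"))
unrelated = table.same_user(("exchange_x", "abc"), ("dsp_z", "other"))
```

The point of the sketch is that links are transitive: two platforms that never exchanged identifiers directly can still end up referring to the same aggregated identity.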

Ad auctions combined with cookie syncing raise important privacy issues. As part of the real-time bidding process, advertisers are given information about an individual’s visit to the Web site conducting the auction, whether or not they ultimately serve an ad on the Web page in question. New research from two computer scientists at Northeastern University in Boston explores this aspect, with some disturbing results.
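
The leak described above can be illustrated with a toy auction. This is a simplified sketch, not a model of any real exchange: the bidder names, fixed bid values and the highest-bid-wins rule are assumptions made for illustration, and real auctions use more elaborate pricing rules.

```python
from dataclasses import dataclass, field


@dataclass
class Bidder:
    name: str
    max_bid: float                             # hypothetical fixed valuation
    seen: list = field(default_factory=list)   # every request ever received

    def bid(self, request):
        # The request is recorded before the outcome of the auction is known.
        self.seen.append(request)
        return self.max_bid


def run_auction(request, bidders):
    """Send the request to every bidder and return the highest bidder's name."""
    bids = {b.name: b.bid(request) for b in bidders}
    return max(bids, key=bids.get)


bidders = [Bidder("ad_a", 0.8), Bidder("ad_b", 1.2), Bidder("ad_c", 0.5)]
request = {"user_id": "u123", "page": "https://example.com/health/article"}

winner = run_auction(request, bidders)
observers = [b.name for b in bidders if b.seen]
```

Only `ad_b` wins the right to show an ad, yet all three bidders end up with a record of which page the user visited.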

Since it is not possible to observe real-time bidding across the Web directly – even if it were technically feasible, and ethically acceptable, advertisers would probably be unwilling to allow such a close inspection of their operations – the researchers built a model of the process. To do this, they needed to establish which advertisers have synchronized their cookies in the manner described above. Key to gathering this information was noting that if remarketing takes place – that is, if a previously viewed ad on one site appears on a completely different site – this shows that information has been passed from the first site to the second. Where the advertising for each site is handled by a different company, this indicates that they have shared information, presumably via cookie synchronization.
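
The inference rule in the paragraph above can be sketched in a few lines. This is only an illustration of the idea, assuming each observed ad has already been labelled with the company tracking the shop and the company serving the ad on the publisher; the field names and company names are hypothetical, and the researchers’ actual method is more involved.

```python
def infer_sharing(observations):
    """Return (source, destination) pairs of companies that appear to share data."""
    edges = set()
    for obs in observations:
        different_companies = obs["shop_ad_company"] != obs["publisher_ad_company"]
        # A remarketed ad served by a different company implies a data flow.
        if obs["remarketed"] and different_companies:
            edges.add((obs["shop_ad_company"], obs["publisher_ad_company"]))
    return edges


observations = [
    {"shop_ad_company": "tracker_a", "publisher_ad_company": "exchange_b", "remarketed": True},
    {"shop_ad_company": "tracker_a", "publisher_ad_company": "tracker_a", "remarketed": True},
    {"shop_ad_company": "tracker_c", "publisher_ad_company": "exchange_b", "remarketed": False},
]

edges = infer_sharing(observations)
```

Only the first observation yields an edge: the second involves a single company, and the third is not a remarketed ad, so neither tells us anything about sharing between companies.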

The researchers used a modified version of the open source Chromium browser, and created “shopper personas”, described in more detail in an earlier paper: artificial online presences that accessed shopping sites and then publishers’ sites. The targets consisted of 738 major e-commerce Web sites and 150 popular publishers. The shopper personas searched for 10 manually-selected products per e-commerce site to “signal strong intent to trackers and advertisers”, followed by 15 randomly-chosen pages per publisher to elicit display ads. Any remarketed ads that appeared on the publishers’ sites that came from the e-commerce sites were then used to establish which advertisers were exchanging data about individuals.

Using this knowledge about how personal data was passed between advertisers, the researchers then further simulated the browsing behavior of typical users. They generated “browsing traces” – histories of Web visits – for 200 virtual users. On average, each user generated 5,343 ad impressions on 190 unique publishers. From this they were able to model how much of the “browsing traces” the leading advertisers and advertising exchanges were able to observe as a result of real-time auctions sharing user information. The results are striking: under a variety of different conditions, the major advertisers and their networks received notifications regarding approximately 90% of the Web pages that were visited. It’s important to note that they received this information irrespective of whether they eventually won bids to serve up ads, since details of what pages were visited by an individual were sent before the auction even began, with no obligation to delete them afterwards.
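
A rough sketch of this kind of calculation: given a browsing trace and a map of which companies receive bid requests from each exchange, count the share of impressions each company gets to see. The trace, the exchange names and the partner map below are invented toy data, not the researchers’ figures, and their model is considerably more detailed.

```python
from collections import Counter


def observed_fraction(impressions, partners):
    """Fraction of impressions each company sees, directly or via bid requests."""
    seen = Counter()
    for exchange in impressions:
        # The exchange itself and every partner it notifies observe the visit.
        observers = {exchange} | partners.get(exchange, set())
        for company in observers:
            seen[company] += 1
    total = len(impressions)
    return {company: count / total for company, count in seen.items()}


# Toy trace: each entry names the exchange that handled one ad impression.
impressions = ["ex_a"] * 6 + ["ex_b"] * 3 + ["ex_c"] * 1
partners = {
    "ex_a": {"dsp_1", "dsp_2"},
    "ex_b": {"dsp_1"},
    "ex_c": {"dsp_2"},
}

fractions = observed_fraction(impressions, partners)
```

In this toy trace `dsp_1` never runs an auction itself, yet it observes nine of the ten impressions simply because two exchanges send it bid requests.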

This result suggests that real-time bidding is causing almost all information about our movements around the Web to be shared with major advertisers and advertising exchanges. Many people use ad blockers in an attempt to protect their privacy from this kind of information leakage. The researchers examined to what extent these browser add-ons reduce the sharing of personal information. Here’s what they found with the most popular of these, AdBlock Plus:

it is troubling to observe that AdBlock Plus barely improves users’ privacy, due to the Acceptable Ads whitelist containing high-degree ad exchanges.

The problem here is that the whitelist of acceptable ads leads to information leaking out to many other advertisers that may not be on the whitelist. Eyeo, the company behind AdBlock Plus, told the German site netzpolitik.org that it is adding an option to use the acceptable ads whitelist without tracking, which it says should address this problem. As for other ad blockers, the researchers write:

we find that users can improve their privacy by blocking A&A [advertising and analytics] domains, but that the choice of blocking strategy is critically important. We find that the Disconnect blacklist offers the greatest reduction in observable impressions, while Ghostery offers significantly less protection. However, even when strong blocking is used, top A&A domains still observe anywhere from 40–80% of simulated users’ impressions.
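
Why a whitelist can undo much of a blocker’s benefit can be shown with a small sketch. It assumes a simplified model in which a blocked exchange never receives the request, so none of its partners do either; the function name, exchange names and lists are hypothetical and do not reflect the contents of any real blacklist or whitelist.

```python
def observed_share(impressions, partners, company, blocked=frozenset(), allowed=frozenset()):
    """Share of impressions `company` still observes under a blocking setup."""
    # Whitelisted exchanges are exempt from blocking.
    effective_block = set(blocked) - set(allowed)
    seen = 0
    for exchange in impressions:
        if exchange in effective_block:
            continue  # the request to this exchange is never sent
        if company == exchange or company in partners.get(exchange, set()):
            seen += 1
    return seen / len(impressions)


impressions = ["ex_a"] * 6 + ["ex_b"] * 3 + ["ex_c"] * 1
partners = {"ex_a": {"dsp_1", "dsp_2"}, "ex_b": {"dsp_1"}, "ex_c": {"dsp_2"}}

no_blocking = observed_share(impressions, partners, "dsp_1")
strict = observed_share(impressions, partners, "dsp_1", blocked={"ex_a", "ex_b"})
with_whitelist = observed_share(
    impressions, partners, "dsp_1", blocked={"ex_a", "ex_b"}, allowed={"ex_a"}
)
```

With these toy numbers, whitelisting the single busy exchange `ex_a` restores most of what `dsp_1` could see, even though `dsp_1` itself appears on no whitelist.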

According to the model used by the researchers, it seems that the sharing of information by the real-time bidding process is extremely hard to counter, because industry cooperation is so pervasive. Although limited, the data that is shared can be highly revealing, since it allows a wide range of advertisers to track which Web pages an individual is visiting. Some of these pages – for example, those about health problems – are extremely personal by their very nature. Coupled with all the other tracking data that advertisers routinely store, the threat to privacy from this kind of large-scale automated sharing is plain.

The evident difficulty of controlling the leakage of personal data as a result of tracking and real-time auctions lends heightened importance to attempts to pass legislation to address the issue. Privacy News Online wrote recently about the EU’s ePrivacy regulation, one of whose key proposals is to make it easy for people to opt out of online tracking. That would obviously have a major adverse impact on the online advertising industry, which is probably one reason why the lobbying against the ePrivacy regulation is so fierce.

Featured image from Wikimedia.