The latest threat to your online privacy: exfiltration of personal data by website session-replay scripts
Last week, Privacy News Online reported on a worrying trend of increased surveillance in the workplace. This kind of spying includes capturing every keystroke workers make. The practice is regarded as acceptable in many jurisdictions because people are working on equipment provided by their employer, and using it to carry out tasks for the company that pays their wages. The logic is that an employer is entitled to check that the equipment is being used properly, and that employees are working diligently. But a post on the Freedom to Tinker blog reveals that keystroke capture, and more, is taking place on public websites too:
“You may know that most websites have third-party analytics scripts that record which pages you visit and the searches you make. But lately, more and more sites use “session replay” scripts. These scripts record your keystrokes, mouse movements, and scrolling behavior, along with the entire contents of the pages you visit, and send them to third-party servers. Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder.”
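The difference between aggregate analytics and per-session replay can be illustrated with a minimal Python sketch. It is not taken from any of the services discussed; the `Event` record and the `aggregate_pageviews` and `replay` functions are hypothetical names invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    session_id: str  # which visitor session the event belongs to
    t_ms: int        # milliseconds since the session started
    kind: str        # e.g. "pageview", "keystroke", "mousemove", "scroll"
    detail: str      # page path, key pressed, pointer coordinates, ...


def aggregate_pageviews(events: list[Event]) -> Counter:
    """Typical analytics: count page views across all sessions."""
    return Counter(e.detail for e in events if e.kind == "pageview")


def replay(events: list[Event], session_id: str) -> list[Event]:
    """Session replay: one visitor's events, in the order they happened."""
    own = [e for e in events if e.session_id == session_id]
    return sorted(own, key=lambda e: e.t_ms)


# Hypothetical sample data: two visitors, one of them typing on a checkout page.
events = [
    Event("s1", 0, "pageview", "/checkout"),
    Event("s2", 0, "pageview", "/checkout"),
    Event("s1", 900, "keystroke", "4"),
    Event("s1", 400, "mousemove", "120,340"),
]
```

The aggregate view only says that "/checkout" was viewed twice, while `replay(events, "s1")` returns everything one individual did, step by step.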
The researchers looked at services from Yandex, FullStory, Hotjar, UserReplay, Smartlook, Clicktale, and SessionCam. They found the named services in use on 482 of the Alexa top 50,000 sites. The number of websites using this technology globally is probably far higher: Yandex alone says that its Yandex.Metrica product is on 8 million sites. Adding constant surveillance is simple: FullStory claims “One small snippet records every user action. No maintenance and no manual tagging.” The key feature offered by all these companies is session replay, described here by Yandex:
“Find the “why” behind every lost conversion by seeing how people interact with your site, such as with video footage. Clicks, scrolls, keystrokes, and mouse movements are all recorded in a single informative movie. Get an all-round view by looking at desktop, mobile, and logged-in sessions. Never miss something interesting with up to 150,000 recordings per day.”
Some services also enable mobile device gestures to be captured, including pinch, zoom, tap, double tap, swipe and tilt. As a result, huge quantities of personal data from computing use can be gathered and stored. Yandex says:
“Send any amount of data to Yandex.Metrica and handle it the way you want: adjust the sampling rate to get reports faster, or use unsampled data for maximum accuracy. Storage time is unlimited, too – no matter how much data you have.”
Once gathered, the data is often intensively analyzed in order to understand “digital consumer psychology”. Perhaps inevitably, machine learning algorithms are increasingly applied in order to “automate the discovery of signatures left by struggling customers and determine whether the detected anomalies represent significant revenue opportunities.” The idea of identifying “struggling customers”, together with “key journeys” and “customer funnels”, as many services promise, is natural enough for websites that wish to maximize online sales. But the Freedom to Tinker post reveals that this kind of commercial surveillance brings with it major privacy problems:
“Collection of page content by third-party replay scripts may cause sensitive information such as medical conditions, credit card details and other personal information displayed on a page to leak to the third-party as part of the recording. This may expose users to identity theft, online scams, and other unwanted behavior. The same is true for the collection of user inputs during checkout and registration processes.”
While it is true that services offer manual and automatic redaction tools to stop sensitive information from being collected, the Freedom to Tinker blog post points out that applying them in real-world situations is a mammoth task that realistically few website owners will undertake. As a result, the Freedom to Tinker researchers found a number of serious issues in their tests.
For example, passwords could end up in session recordings, even though the services attempted to prevent this. Similarly, even when redaction tools were used, sensitive user inputs were masked only partially and imperfectly. Moreover, the redaction tools applied only to user input: they did nothing to hide sensitive information that may be present on the web page that is captured and stored. On one site selling medicines, for instance, the researchers found that medical conditions and prescriptions were leaked along with the names of users.
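To see why input-only redaction falls short, consider the following minimal Python sketch. It is an illustration under stated assumptions, not the code of any service named above: the hint list `SENSITIVE_FIELD_HINTS`, the functions `redact_inputs` and `record_page`, and the sample data are all hypothetical.

```python
# Hypothetical list of field-name hints a recorder might treat as sensitive.
SENSITIVE_FIELD_HINTS = ("password", "card", "cvv", "ssn")


def redact_inputs(inputs: dict[str, str]) -> dict[str, str]:
    """Mask the value of any field whose name contains a sensitive hint."""
    redacted = {}
    for name, value in inputs.items():
        if any(hint in name.lower() for hint in SENSITIVE_FIELD_HINTS):
            redacted[name] = "*" * len(value)
        else:
            redacted[name] = value  # unrecognized field names pass through
    return redacted


def record_page(page_text: str, inputs: dict[str, str]) -> dict:
    """Build the payload a hypothetical recorder would send to a third party.

    Only user inputs are redacted; the rendered page text is sent as-is.
    """
    return {"page_text": page_text, "inputs": redact_inputs(inputs)}


payload = record_page(
    page_text="Prescription for Jane Doe: insulin",
    inputs={
        "password": "hunter2",
        "pwd_confirm": "hunter2",  # unusual name, so the hint list misses it
        "email": "jane@example.com",
    },
)
```

In this sketch the `password` field is masked, but the same secret leaks through `pwd_confirm`, and the name and prescription shown on the page are recorded untouched.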
A more general problem with all these session-replay services is that they gather and store large quantities of highly personal data. There are bound to be concerns about the security of personal data held by the companies providing the services. Leaks on a massive scale are now commonplace, so assurances that such information is safe can hardly be relied upon. On top of that, the researchers found that some playbacks of recorded sessions took place within an HTTP page, even for recordings of user actions on a page originally sent via HTTPS. This means that data that was previously protected by HTTPS is now vulnerable to passive network surveillance, making privacy breaches more likely.
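The downgrade described above is easy to express as a check. The Python sketch below is only illustrative; the function name `is_downgraded` and the example URLs are hypothetical and do not refer to any real service endpoint.

```python
from urllib.parse import urlsplit


def is_downgraded(recorded_url: str, playback_url: str) -> bool:
    """Return True if a session recorded over HTTPS is replayed over plain HTTP."""
    recorded_scheme = urlsplit(recorded_url).scheme.lower()
    playback_scheme = urlsplit(playback_url).scheme.lower()
    return recorded_scheme == "https" and playback_scheme == "http"


# Hypothetical URLs: a checkout page served securely, replayed insecurely.
exposed = is_downgraded(
    "https://shop.example/checkout",
    "http://replay.example/session/123",
)
```

When the check returns True, content that travelled encrypted between the visitor and the website is later sent in the clear to whoever watches the recording.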
Although the desire to monitor how visitors are using websites is natural enough, and can be beneficial for users, the approaches discussed above seem disproportionate. It is therefore likely that they will fall foul of the European Union’s new General Data Protection Regulation, which will be enforced from May 2018. As a useful information sheet from Big Brother Watch explains, key elements of the new law include a requirement that personal data can only be collected if an organization has explicitly asked for and received consent, telling people in detail how their data will be used. Personal data must be gathered for a specific, explicit and legitimate purpose, and must be adequate, relevant and limited to the purpose of the processing; general collection is not allowed. An important new principle of accountability says that companies must not only comply with the EU law, but must also show that they are complying properly. It is hard to see how services offering complete session replays will be able to operate in the EU without drastically limiting their data collection habits.
Featured image by George Shuklin.