Even encrypted data streams from the Internet of Things are leaking sensitive information; here’s what we can do

Posted on Aug 31, 2017 by Glyn Moody

As the Internet of Things (IoT) begins to enter the mainstream, concerns about the impact such “smart” devices will have on users’ privacy are growing. Many of the problems are obvious, but so far largely anecdotal. That makes a new paper from four researchers at Princeton University particularly valuable, because they analyze in detail how IoT devices leak private information to anyone with access to Internet traffic flows, and what might be done about it. Now that basic privacy protections for Internet users have been removed in the US, allowing ISPs to monitor traffic and sell data about their customers’ online habits to third parties, it’s an issue with heightened importance.

The Princeton team looked at seven popular IoT devices: Sense Sleep Monitor, Nest Cam Indoor security camera, Amcrest WiFi Security IP Camera, Belkin WeMo switch, TP-Link WiFi Smart Plug, Orvibo Smart WiFi Socket, and the Amazon Echo. The data streams were assumed to be encrypted, and therefore not susceptible to direct inspection. However, merely looking at the traffic rates of the encrypted data flows turned out to be highly revealing:

Traffic rates from a Sense sleep monitor revealed consumer sleep patterns, traffic rates from a Belkin WeMo switch revealed when a physical appliance in a smart home is turned on or off, and traffic rates from a Nest Cam Indoor security camera revealed when a user is actively monitoring the camera feed or when the camera detects motion in a user’s home.
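To see just how little an eavesdropper needs, here is a minimal sketch of that rate-only inference, written in Python with invented packet sizes and an invented activity threshold. It never touches packet contents, only the timestamps and sizes that any ISP can observe:

```python
# Hypothetical sketch: inferring device activity from nothing but the rate
# of an encrypted flow. Payloads are never read; only timestamps and sizes
# (always visible on the wire) are used. All numbers are made up.
from collections import defaultdict

def bytes_per_interval(packets, interval=30.0):
    """Bin (timestamp, size) pairs into send-rate samples (bytes/interval)."""
    bins = defaultdict(int)
    for ts, size in packets:
        bins[int(ts // interval)] += size
    return bins

def active_periods(packets, interval=30.0, threshold=5000):
    """Flag intervals whose traffic rate exceeds an assumed idle baseline.

    For a sleep monitor, 'active' intervals could map to the user being in
    bed; for a camera, to someone viewing the stream or motion detection.
    """
    bins = bytes_per_interval(packets, interval)
    return sorted(i * interval for i, b in bins.items() if b >= threshold)

# Toy traffic trace: (seconds since capture start, packet size in bytes).
trace = [(1.0, 120), (2.0, 150), (31.0, 9000), (33.0, 8800), (65.0, 140)]
print(active_periods(trace))  # -> [30.0]: one burst of activity stands out
```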

Similarly, the researchers found that looking at the traffic flow and some of its unencrypted metadata was generally enough to allow the devices to be identified. They point out that somebody carrying out surveillance of the external data stream to the Internet could use the first three bytes of each device’s MAC address (the organizationally unique identifier, or OUI) to assign a manufacturer label to each flow. In addition, DNS queries within the data flow allowed four of the seven devices studied to be uniquely identified. The researchers suggested that the remaining three could be identified from their DNS queries to multiple domains, which together form a device-specific domain fingerprint.
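Both identification steps are simple enough to sketch. In the fragment below, the OUI table and the domain sets are illustrative stand-ins, not data taken from the paper:

```python
# A minimal sketch of the two passive identification steps described above.

# The first three bytes (six hex digits) of a MAC address name the vendor.
OUI_VENDORS = {             # assumption: a tiny hand-made excerpt of the IEEE registry
    "18:b4:30": "Nest Labs",
    "ec:1a:59": "Belkin",
}

def vendor_from_mac(mac):
    """Map a MAC address to a manufacturer via its OUI prefix."""
    return OUI_VENDORS.get(mac.lower()[:8], "unknown")

# A device that always resolves the same set of domains is fingerprintable
# even when individual domains are shared across products.
DOMAIN_FINGERPRINTS = {     # hypothetical domain fingerprints
    frozenset({"hello.is", "time.hello.is"}): "Sense sleep monitor",
    frozenset({"nexus.dropcam.com", "pool.ntp.org"}): "Nest Cam Indoor",
}

def device_from_dns(queried_domains):
    """Match the set of domains a flow resolves against known fingerprints."""
    return DOMAIN_FINGERPRINTS.get(frozenset(queried_domains), "unknown")

print(vendor_from_mac("18:B4:30:aa:bb:cc"))                    # Nest Labs
print(device_from_dns(["nexus.dropcam.com", "pool.ntp.org"]))  # Nest Cam Indoor
```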

The paper discusses another fruitful technique that can be applied to identify IoT devices purely from the traffic flow issuing from a location using them. Analysis revealed that changes in traffic rate correlated with device state changes caused by user activities for all seven systems tested. That means an adversary could use the changes in traffic patterns as another kind of fingerprint to identify the devices. Simply knowing what devices are present is in itself highly revealing. For example, if there are signals from a pacemaker, then it is clear that someone at that location has heart problems.
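A hedged illustration of that idea: give each device type a crude rate signature and match an observed flow to the nearest one. The numbers are invented, and the paper’s analysis is considerably more careful, but the principle is the same:

```python
# Toy device classification from traffic-rate behaviour. Each signature is
# an invented (idle bytes/s, active bytes/s) pair; an observed flow is
# assigned to the device type whose signature is closest.
SIGNATURES = {  # hypothetical signatures, not from the paper
    "smart plug":    (50,    400),
    "sleep monitor": (100,  3000),
    "camera":        (500, 60000),
}

def classify_flow(idle_rate, burst_rate):
    """Return the device type whose signature best matches the observed rates."""
    def distance(sig):
        idle, burst = sig
        return abs(idle - idle_rate) + abs(burst - burst_rate)
    return min(SIGNATURES, key=lambda name: distance(SIGNATURES[name]))

print(classify_flow(idle_rate=80, burst_rate=2500))  # -> sleep monitor
```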

After they had established that highly personal information was leaking from the traffic streams, even when those streams were encrypted, the researchers looked at possible ways to mitigate the privacy threat. The simplest one – disconnecting IoT devices from the outside Internet – failed because the manufacturers built their products assuming always-on connectivity. Three of the devices – the two video cameras and the sleep monitor – became unusable without a Net connection, while the others retained only limited functionality.

Another obvious approach is to use a VPN. In theory this prevents anyone looking at the data stream from discerning individual flows. Aggregating traffic in this way did indeed reduce the leakage of potentially sensitive information, but not entirely. The Princeton team found three circumstances in which even the use of VPNs provided no protection. These were when only one device was transmitting; when multiple devices were all transmitting, but only sparsely, and could therefore be studied separately; and when there was one dominating device whose traffic swamped all the others.
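The first of those failure cases is easy to see in a toy simulation: when only one device is transmitting, the tunnel’s aggregate rate trace is simply that device’s trace, VPN or no VPN. The traffic values below are invented:

```python
# Why aggregation alone can fail: with one active device, the encrypted
# tunnel's rate over time is identical to that device's rate over time.
# Traces are bytes per second; all values are made up for illustration.
camera       = [500, 500, 60000, 60000, 500]  # burst = someone viewing the feed
sleep_sensor = [0,   0,   0,     0,     0]    # idle overnight

tunnel = [sum(rates) for rates in zip(camera, sleep_sensor)]
print(tunnel)  # [500, 500, 60000, 60000, 500] - the camera's trace, unchanged
```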

However, the researchers developed a technique that overcame those issues. It involved using VPNs, but then added traffic shaping in the form of “cover” traffic that pads out the data flow to create a uniform, information-free encrypted stream:

Though a VPN alone does not provide any guarantee of privacy against the traffic rate metadata attack, it is a necessary component of our implementation. Traffic shaping requires that we send cover traffic when no real packets are available. The VPN renders the real traffic packet headers indistinguishable from cover traffic packet headers.
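As a rough sketch of that idea, the loop below emits one fixed-size packet per time slot: a padded real packet if one is queued, a cover packet otherwise. The slot size, tick interval, and function names are assumptions made for illustration; inside a VPN tunnel, both kinds of packet would look identical to an outside observer:

```python
# A minimal sketch of constant-rate traffic shaping with cover traffic.
# Every tick, exactly one fixed-size ciphertext leaves the tunnel, so the
# observable rate carries no information about real device activity.
import queue
import threading
import time

PACKET_SIZE = 512   # bytes per slot (assumption)
TICK = 0.05         # seconds between slots (assumption)

outbound = queue.Queue()  # real device traffic waiting to be sent

def shaper(send):
    while True:
        try:
            payload = outbound.get_nowait()        # real traffic, padded to size
            packet = payload.ljust(PACKET_SIZE, b"\0")
        except queue.Empty:
            packet = b"\0" * PACKET_SIZE           # cover traffic
        send(packet)                               # e.g. write into the VPN tunnel
        time.sleep(TICK)

# Demo: collect packets in a list; in real use `send` would write to the VPN.
sent = []
thread = threading.Thread(target=shaper, args=(sent.append,), daemon=True)
thread.start()
outbound.put(b"sensor reading")
time.sleep(0.5)
print(len(sent), "uniform packets sent: 1 real, the rest cover")
```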

Traffic shaping solves the problem of masking the data flow details, but brings other issues with it. Specifically, it increases the latency of the connection, and also uses up extra Internet bandwidth. However, the researchers were able to show that the IoT devices they studied easily tolerated the higher latencies the traffic-shaping approach produced. For broadband speeds typically available in the US, the extra bandwidth required was also unlikely to increase connectivity costs.
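A back-of-the-envelope calculation shows why. Assuming, purely for illustration, a constant 40 KB/s of upload cover traffic running around the clock:

```python
# Rough cost of shaping under assumed numbers (not figures from the paper):
# a sustained 40 KB/s of cover traffic, 24 hours a day.
rate_kbps = 40 * 8                               # 40 KB/s = 320 kbit/s of uplink
monthly_gb = 40e3 * 3600 * 24 * 30 / 1e9         # bytes/month -> gigabytes
print(f"{rate_kbps} kbit/s sustained, ~{monthly_gb:.0f} GB per month")
# -> 320 kbit/s sustained, ~104 GB per month: a small slice of a typical US
#    broadband connection, and modest against common data caps.
```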

The new paper notes that traffic shaping of this kind requires a trustworthy VPN supplier, since the provider will be able to inspect the data flows within the VPN connection. The researchers also offer an alternative approach whereby a VPN company acts as an endpoint for traffic from multiple smart homes and offices, perhaps located in a single building. In that case, it would be difficult for the VPN supplier to associate devices with a particular smart home or office because the traffic would be mixed together.

As well as new markets for VPN suppliers, the researchers point out that there is scope for new kinds of “smart” routers designed specifically to protect “smart” homes. These devices would allow people to fine-tune their traffic shaping, maximising cover packets when there is lots of sensitive data being sent, and scaling it back when revealing data flows are low. Particularly intelligent smart routers could learn from a user’s activities and preferences to make an optimal tradeoff between privacy protection and cost automatically. As ever, where there are new problems, there are new opportunities too.
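As a speculative sketch of what such a router might do, the function below scales the cover-traffic rate with how sensitive the current activity looks; every threshold and rate in it is invented:

```python
# Hypothetical adaptive shaping policy for a privacy-aware "smart" router:
# pad generously around sensitive activity, trickle otherwise. All numbers
# are invented for illustration.
def cover_rate(real_rate_bps, sensitive=False):
    """Pick a shaping rate: heavy padding when activity is sensitive,
    a cheap baseline when it is not."""
    if sensitive:
        return max(real_rate_bps * 2, 64_000)  # drown out revealing bursts
    return max(real_rate_bps, 8_000)           # minimal always-on baseline

print(cover_rate(40_000, sensitive=True))  # 80000 bps while a camera streams
print(cover_rate(500))                     # 8000 bps overnight trickle
```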

Featured image by Chris Price.


2 Comments

  1. davecb

    Can I niggle a bit about “anecdotal” ?

    It usually means “not a good chunk of evidence”, but it can often mean “only one or a few reports” from people who mean it statistically as opposed to logically.

    On the logical side, even one credible report of a failure is a proof that the device fails (;-))

    That there’s now been a formal study of it is a good thing, but it’s mostly because we can now refer to one or more pieces of evidence reviewed and sworn to by a group of scientists. Which is the sense I read you as using.

    –dave
[I think I’ll call the bad sense “argumentum ad statisticum” (:-)]

    1. Glyn Moody

      I simply meant it in the dictionary sense: “not necessarily true or reliable, because based on personal accounts rather than facts or research.”

      In fact, the anecdotes are true, as the new research shows. But they weren’t rigorous, whereas the new paper seems pretty good in that respect, and therefore is welcome.
