Google’s Infinite Reach – How Google Builds a Profile on Everyone

Posted on Apr 17, 2019 by Derek Zimmer

There’s a lot of talk about Facebook’s invasive practices right now, but Google has somehow skated by, largely unscathed by the public outcry over privacy violations and by the snowballing impact of so much personal information circulating with only ambiguous consent from the people being tracked.

From a privacy-activism perspective, this is interesting, because Facebook is in many ways less bad than Google. For many (but not all) Facebook services, you have to create an account or install an app before you give up your data, although people who refuse to participate are still tracked through Facebook cookies and URL tracking.

The difference is that Google’s approach is aggregation. Google builds many tools that are genuinely useful to users and administrators, but those tools are engineered to gather data about users and build as complete a profile as possible, which is then sold to advertisers, governments, other data brokers, or anyone else who is willing to pay.

This article breaks down some of the services Google uses to build profiles of users and discusses how these services threaten users’ privacy without their consent. Because the focus is on tracking that happens without consent, we will leave Android and Chrome aside: you can simply choose not to use them, or use versions with much of the analytics stripped out, such as the Chromium browser instead of Chrome or LineageOS instead of Android.

Google AMP:

Google AMP is a service that caches data, usually media, on Google servers around the world. When you load a website with AMP enabled, the images and media come from Google’s servers, so Google knows every resource you’ve loaded on the page. Interestingly, this gives Google access to substantially more information than your ISP could get: HTTPS encryption prevents the ISP from seeing which specific pages you visit, so it can only see the domain. For example, your ISP could see that you visited Reddit, but not which subreddits or posts. AMP-linked content on Reddit (there is a ton of it) gives Google a direct link between your IP address and the content you view, which it can record and use to profile your behavior and activity.
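To make the ISP-versus-Google contrast concrete, here is a minimal sketch. It assumes the commonly documented AMP Cache URL shape, roughly `https://<encoded-domain>.cdn.ampproject.org/c/s/<domain>/<path>`, and handles ASCII hostnames only; the real cache also deals with internationalized and over-long names, which this sketch skips. The function names and example URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Simplified model of AMP Cache URL construction (ASCII hostnames only).
CACHE_SUFFIX = "cdn.ampproject.org"


def amp_cache_url(page_url: str) -> str:
    """Rewrite an HTTPS page URL into the form served from the AMP cache."""
    parts = urlsplit(page_url)
    host = parts.hostname or ""
    # Escape existing hyphens first, then turn dots into hyphens,
    # so "my-site.example.com" becomes "my--site-example-com".
    subdomain = host.replace("-", "--").replace(".", "-")
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    # "/c/" marks a document, "/s/" marks an HTTPS origin.
    return f"https://{subdomain}.{CACHE_SUFFIX}/c/s/{host}{path}{query}"


def what_each_party_sees(page_url: str) -> dict:
    """Contrast what a network observer and the cache operator learn."""
    cache_url = amp_cache_url(page_url)
    return {
        # With HTTPS, an on-path ISP sees the hostname (via DNS/SNI), not the path.
        "isp": urlsplit(cache_url).hostname,
        # The server answering the request receives the full URL, path and query.
        "cache_operator": cache_url,
    }
```

For `https://www.example.com/r/privacy`, the ISP observes only `www-example-com.cdn.ampproject.org`, while the cache operator receives the full path identifying the exact page.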

This problem is widespread: WordPress, the most popular content management system in the world, has AMP on by default for its sites.

Even worse, Google recently announced that mobile Chrome users won’t even be able to tell when they are viewing AMP-served content: Chrome will display the original publisher’s URL even when the content comes from Google’s AMP cache.

Gmail:

Gmail is another service that gives Google far more information than your ISP could ever get. Most email in 2019 is encrypted in transit, so your ISP cannot read the contents of a message and only sees metadata. Transport encryption only protects a message between servers, though; Gmail’s servers receive it in readable form. And even if you do not use Gmail yourself, you interact with Gmail users every day. Google scans the content of all email sent to and from Gmail users, meaning it reads the entire message and scans the attachments as well.

This is extremely problematic given Google’s share of the email market. To keep Google from reading your email, you more or less have to avoid email altogether or get everyone you correspond with to commit to PGP encryption. And because G Suite lets Gmail run on custom domains, you can’t tell from an email address alone whether a message is going to Google’s servers.
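An address string alone does not reveal the provider, but a domain’s DNS MX records, which name the servers that accept its mail, usually do. A minimal sketch, assuming the MX hostnames have already been looked up (for example with `dig MX example.com`); the helper name and the suffix list are assumptions based on Google’s commonly published mail exchangers and may be incomplete.

```python
# Hypothetical helper: does a domain's mail land on Google's servers?
# Assumption: Google mail exchangers live under these suffixes, e.g.
# aspmx.l.google.com, alt1.aspmx.l.google.com, aspmx2.googlemail.com.
GOOGLE_MX_SUFFIXES = ("google.com", "googlemail.com")


def routes_to_google(mx_hosts: list[str]) -> bool:
    """Return True if any MX host belongs to a Google mail domain."""
    for host in mx_hosts:
        name = host.lower().rstrip(".")  # DNS answers often end with a dot
        for suffix in GOOGLE_MX_SUFFIXES:
            # Match the suffix itself or a subdomain, never "notgoogle.com".
            if name == suffix or name.endswith("." + suffix):
                return True
    return False
```

A custom domain whose MX list includes `ASPMX.L.GOOGLE.COM.` returns `True`, even though nothing in the address itself mentions Google.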

Google Analytics:

Google Analytics uses cookies and cross-site tracking to identify and follow users as they move around the web in their daily lives. It works by assigning each user a cookie containing a unique ID; every time that user visits a site with Google Analytics enabled, Google records the activity and links it to the profile associated with that cookie. This often happens without the user’s knowledge.
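The ID-cookie mechanism described above can be sketched as a toy model. This is not Google Analytics’ actual code: the `Tracker` class and the `uid` cookie name are hypothetical, and a plain dict stands in for the browser’s cookie jar. In this model the tracker’s cookie is presented on every site that embeds the tracker, as a third-party cookie would be, which is what links the visits together.

```python
import secrets
from collections import defaultdict


class Tracker:
    """Toy model of an ID-cookie tracker embedded on many sites (hypothetical)."""

    COOKIE_NAME = "uid"  # hypothetical cookie name

    def __init__(self) -> None:
        # Maps each cookie ID to the ordered list of pages it was seen on.
        self.history: dict[str, list[str]] = defaultdict(list)

    def hit(self, cookies: dict[str, str], page_url: str) -> dict[str, str]:
        """Record a page view and return the cookie jar the browser keeps."""
        uid = cookies.get(self.COOKIE_NAME)
        if uid is None:
            # First sighting: mint a random, hard-to-guess identifier.
            uid = secrets.token_hex(16)
        self.history[uid].append(page_url)
        return {**cookies, self.COOKIE_NAME: uid}
```

Usage: `jar = tracker.hit({}, "https://news.example/story")` followed by `tracker.hit(jar, "https://shop.example/cart")` files both visits under the same ID, even though they happened on unrelated sites.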

Recently, the European Union’s GDPR has produced a flood of warnings about this type of analytics software, but it has done little to curb its use. Google Analytics is so ubiquitous that users are confronted with cookie warnings on nearly every site they visit, and many sites do not follow the GDPR by letting users browse or use their services without being tracked.

Google Cloud:

Google Cloud is another big source of data for Google. As of 2018, Google hosted about 9.5% of all “cloud” content on Google Cloud, measured by revenue; because much of Google’s cloud offering is “free,” it may host substantially more by volume. If you are using an app or website built on Google Cloud infrastructure, that is another direct line to your information. The people using those apps never consent to Google retaining data about them; they are simply using a service that happens to run on Google Cloud.

Google Maps API:

Whenever you visit a business’s website that embeds a live Google Map (rather than a screenshot of one), the site is using the Google Maps API, and your browser loads the map directly from Google. This data is combined with Google Analytics to attach location information to your Google profile.
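When a page embeds a live map, the browser fetches scripts and tiles from Google hosts such as `maps.googleapis.com`, and those requests can carry the embedding page’s address in the HTTP `Referer` header. The sketch below is a simplified model of the W3C Referrer Policy rules for two policies: `no-referrer-when-downgrade`, the usual browser default when this article was written, and `strict-origin-when-cross-origin`, which major browsers adopted as their default afterwards. It ignores the other policies, default-port normalization, and the spec’s full "potentially trustworthy" test; the function names and example URLs are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def _strip(url: str, origin_only: bool) -> str:
    """Prepare a page URL for use as a Referer value."""
    p = urlsplit(url)
    netloc = p.netloc.rpartition("@")[2]  # drop any user:password@ prefix
    if origin_only:
        # Origin-only form keeps scheme and host, e.g. "https://shop.example/".
        return urlunsplit((p.scheme, netloc, "/", "", ""))
    # Full form keeps path and query; the #fragment is never sent.
    return urlunsplit((p.scheme, netloc, p.path or "/", p.query, ""))


def _origin(url: str) -> tuple:
    p = urlsplit(url)
    return (p.scheme, p.netloc.rpartition("@")[2].lower())


def referer_header(page_url: str, request_url: str, policy: str) -> Optional[str]:
    """Return the Referer a browser would attach, or None if it sends none."""
    # Simplification: a "downgrade" means leaving https for a non-https URL.
    downgrade = urlsplit(page_url).scheme == "https" and urlsplit(request_url).scheme != "https"
    if policy == "no-referrer-when-downgrade":
        # Older default: send the full page URL unless downgrading.
        return None if downgrade else _strip(page_url, origin_only=False)
    if policy == "strict-origin-when-cross-origin":
        # Newer default: full URL same-origin, origin only cross-origin.
        if _origin(page_url) == _origin(request_url):
            return _strip(page_url, origin_only=False)
        return None if downgrade else _strip(page_url, origin_only=True)
    raise ValueError(f"policy not modeled in this sketch: {policy}")
```

Under the older default, a map on `https://shop.example/stores/nearby?zip=12345` tells the map host the exact page and query string; under the newer default it learns only `https://shop.example/`, though it still sees your IP address.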

Google Firebase:

Firebase is a platform that lets developers easily sync data between websites, apps, and services. The catch is that this data is synced through Google’s servers, which record all of it and profile it without the user’s knowledge.

Aggregation Amplifies the Problem:

It is important to understand that all of this data is pooled into a dossier on every web user who regularly encounters these services (just about everyone). This enormous cache of data gives Google tremendous power (and profit!) to serve you targeted advertising or to broker your data to other groups that want to influence you.
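The pooling step can be illustrated with a toy model. It assumes each service already logs events under a shared user identifier; the `build_dossiers` function, the service names, and the event strings are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def build_dossiers(
    sources: dict[str, Iterable[tuple[str, str]]],
) -> dict[str, list[tuple[str, str]]]:
    """Pool (user_id, event) records from several services into one profile per user."""
    dossiers: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for service, records in sources.items():
        for user_id, event in records:
            # Tag each event with the service that observed it.
            dossiers[user_id].append((service, event))
    return dict(dossiers)
```

Feeding it `{"analytics": [("u1", "visited clinic.example")], "maps": [("u1", "loaded map near 5th Ave")]}` yields a single entry for `u1` combining both observations; no single service saw the whole picture, but the merged profile does.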

You can’t quit Google just by avoiding its products. Surveillance as a business model is deeply built into the web, and it is hard to avoid without breaking A LOT of sites that rely on these malicious tracking services.