For almost twenty years it has existed and been widely used. Blemished since birth by its seemingly dyslexic parents, it has served webmasters and, more recently, online marketers alike. But alas, our good ol’ pal, the HTTP header field ‘Referer’, is under attack!
The ‘Referer’ was introduced in the early days of the web to allow webmasters to determine broken links to their sites:
This allows a server to generate lists of back-links to documents, for interest, logging, etc. It allows bad links to be traced for maintenance.
Since then, basically all web browsers have submitted the URL of the referring web page to the target web server as part of the request header, and pretty much every web analytics tool uses the referrer information to report on traffic sources. As a webmaster I have always liked the information I get from the referrer field, but of course it is also clear that not everyone is happy to tell webmasters which web page they have come from.
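To make the analytics use concrete, here is a minimal sketch of how a traffic-source report could be built from referrer values. The function name `traffic_sources`, the "(direct)" label and the sample URLs are my own assumptions for illustration, not taken from any particular analytics product.

```python
from collections import Counter
from urllib.parse import urlsplit


def traffic_sources(referers):
    """Count visits per referring host; empty values count as direct traffic."""
    counts = Counter()
    for value in referers:
        # A missing or empty Referer means the browser sent no referrer at all.
        host = urlsplit(value).hostname if value else None
        counts[host or "(direct)"] += 1
    return counts


# Hypothetical Referer values as they might appear in a server log.
sample = [
    "http://www.google.com/search?q=x",
    "",
    "http://example.org/page",
    "http://www.google.com/",
]
for host, visits in traffic_sources(sample).most_common():
    print(host, visits)
```

Real tools do far more (campaign tagging, grouping search engines, and so on), but the raw material is the same header.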
The recent discussion about the potential sensitivity of referrer data is not exactly new. The authors of RFC 1945, the document that describes version 1.0 of the Hypertext Transfer Protocol, wrote in 1996:
A server is in the position to save personal data about a user’s requests which may identify their reading patterns or subjects of interest. This information is clearly confidential in nature and its handling may be constrained by law in certain countries. People using the HTTP protocol to provide data are responsible for ensuring that such material is not distributed without the permission of any individuals that are identifiable by the published results.
So no one can say that the transfer of referrer data is a new problem; it has been around as long as the web.
Knowing this, it struck me as a bit odd that Christopher Soghoian, clearly an expert in the field of online privacy, filed an FTC complaint last month to stop “the intentional leakage of search query information to third parties by Google”.
Yes, it’s true: if you search for something on Google without using their encrypted search, the web page you visit as a result of your search will know that a) you came to them from a Google search and b) what you were searching for. But the same is true for any other search engine I know of. In fact, Google is the only search engine I am aware of that provides an encrypted search page, and since mid-October they have provided encrypted search by default to all logged-in users.
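How little effort this takes on the receiving side can be shown with a short sketch. It assumes the search terms travel in a query parameter named `q`, as in a typical Google results URL; the function name `describe_referrer` and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def describe_referrer(referer, query_param="q"):
    """Return (referring host, search terms or None) for a Referer value."""
    parts = urlsplit(referer)
    params = parse_qs(parts.query)
    # parse_qs decodes '+' and percent-escapes and returns a list per key.
    terms = params.get(query_param, [None])[0]
    return parts.hostname, terms


# Hypothetical Referer header as received by the target web site.
host, terms = describe_referrer("http://www.google.com/search?q=flu+symptoms&hl=en")
print(host)   # www.google.com
print(terms)  # flu symptoms
```

Other search engines may use a different parameter name, which is why it is left configurable in this sketch.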
With the encrypted search Google is now jumping through hoops to still provide webmasters with the information that visitors came from their search product while simultaneously stripping the search keyword from the HTTP referrer.
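The idea of keeping the source but dropping the keyword can be sketched as follows. This is not how Google implements it; it is just one way to express "same referrer, minus the search terms", assuming the terms sit in a `q` parameter. The name `strip_search_terms` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_search_terms(referer, sensitive=("q",)):
    """Return the referrer URL with the sensitive query parameters removed."""
    parts = urlsplit(referer)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in sensitive
    ]
    # Rebuild the URL without a fragment and without the stripped parameters.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(strip_search_terms("https://www.google.com/search?q=flu+symptoms&hl=en"))
# https://www.google.com/search?hl=en
```

The target site still sees that the visit came from a search on that host, but no longer what was searched for.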
In his complaint Soghoian argues that Google knows about the importance of protecting the search keywords by referring to the inbox search of Google Mail (p.18):
Google takes proactive steps to shield inbox search query information from leaking via referrer headers to third parties, such as when a user clicks on a link in an email message.
He also cites Google’s early steps to provide encrypted search in May 2010 and their statement that SSL search adds another layer of privacy by not sending the Referer header field to the target web site.
This is wrong! I mean, seriously, the only search company that addresses privacy concerns over search query data at all is attacked for “intentionally leaking search information”.
But why does Google not simply switch all search to be encrypted? In my opinion the main reasons for this are:
- Technical constraints
While Google maintains thousands of servers and seemingly endless computing resources and data transfer capabilities, they are also processing more than a billion search queries each day. Encrypting and decrypting web pages are resource-intensive tasks. Additionally, encrypted pages and other elements are usually not stored in caches, which exist to avoid re-transferring elements that have already been transferred. Serving all web searches in an encrypted manner would require Google to seriously beef up their computing centers.
- Cost
See above: apart from the required effort in labour, additional hardware will be needed to overcome all these technical constraints. Additionally, Google’s already high energy consumption will become even higher.
- User Experience
Establishing a secure connection between your browser and Google takes longer than establishing a non-encrypted one. While one might think that such a small delay is not important, it is common knowledge that Google is obsessed with speed.
To those who argue that Google could simply strip the keyword from the referrer field without encrypting anything, I’d like to reply that this would be ineffective. The privacy of search queries would still be at risk, e.g. if you use Google over insecure Wi-Fi. Furthermore, ISPs, who know exactly who their users are, can still monitor the keywords being searched. The same is true if you search Google from your employer’s network while at work.
Christopher Soghoian’s post finishes with the lines:
I have petitioned the FTC to compel the company to begin scrubbing this data, and to take appropriate steps to inform its existing customers about the fact that it has intentionally shared their historical search data with third parties. This, I think, is the right thing to do.
Maybe we should set one thing straight: as someone searching on Google you are not their customer, you are just using a service they provide for free, sponsored by those who pay for all the ads which are placed on the results page. Knowing what you searched for enables Google to display highly targeted ads, and this is what they sell to their customers: the ability to put ads in front of consumers at just the right time and in the perfect context, while they are researching information.
To say Google has intentionally “leaked” potentially sensitive information is, in my opinion, not correct either. They have simply not taken counter-measures to prevent a standard functionality of all major web browsers from transferring information about the search being conducted to the target web site. They refrain from tampering with the referrer so that their web site works the same way as any other.
Nevertheless, Soghoian makes a good point about this data being private and therefore I second his quest for more privacy around the referrer. I just think he’s addressing the wrong company, or to be more precise, his complaint is missing some other addressees, notably Microsoft, Apple, Opera, the Mozilla Foundation and other browser manufacturers.
The solution to his privacy problem was already described in 1996 in the above-mentioned RFC 1945, in section 10.13 (emphasis mine):
Note: Because the source of a link may be private information or may reveal an otherwise private information source, it is strongly recommended that the user be able to select whether or not the Referer field is sent. For example, a browser client could have a toggle switch for browsing openly / anonymously, which would respectively enable / disable the sending of Referer and From information.
Most browsers can be configured not to submit the referrer header, but it is clearly not as simple as a “toggle switch”. For example, Firefox has a configuration setting for this, and Google Chrome offers a command-line parameter to stop the referrer from being transferred.
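On the client side, the toggle the RFC recommends amounts to very little logic. The following sketch shows a hypothetical client that decides per request whether to include the Referer field; the function `build_headers`, the `anonymous` flag and the user agent string are invented for illustration and do not correspond to any real browser's internals.

```python
def build_headers(previous_url, anonymous):
    """Build request headers, sending Referer only when browsing openly."""
    headers = {"User-Agent": "example-client/0.1"}  # hypothetical client name
    if not anonymous and previous_url:
        # Browsing openly: tell the target site where the visitor came from.
        headers["Referer"] = previous_url
    return headers


search_page = "http://www.google.com/search?q=flu+symptoms"
print(build_headers(search_page, anonymous=False))
print(build_headers(search_page, anonymous=True))
```

With `anonymous=True` the target site receives no referrer at all, which is exactly what a browser-level switch would achieve for every site, not just for Google.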
Somehow this quest reminds me of German data protection officer Thilo Weichert’s attack on Facebook over the privacy issues around the Like button. A fight for a good cause, but not necessarily addressing the right target. (N.B. Weichert is threatening organisations within his area of responsibility with legal action if they don’t take down the Social Plugins and close down their FB fan pages.)