Abusing the Cache: Tracking Users without Cookies
I’ve been doing a little bit of research into ways to misuse browser history and cache and came across a very simple technique for tracking users without the need for cookies. Firstly, a demo. If you watch the HTTP requests you’ll see that there are no cookies being used.
To track a user I make use of three URLs: the container, which can be any website; a shim file, which contains a unique code; and a tracking page, which stores (and in this case displays) requests. The trick lies in making the browser cache the shim file indefinitely. When the file is requested for the first – and only – time a unique identifier is embedded in the page. The shim embeds the tracking page, passing it the unique ID every time it is loaded. See the source code (thanks to Nathan for pointing out the date error).
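The shim step can be sketched roughly as follows. This is a minimal Python illustration, not the PHP source linked above: the tracker URL, the function name, and the ten-year `max-age` are assumptions made for the example, and the JavaScript that passes the message and referrer is left out.

```python
import uuid
from html import escape
from urllib.parse import quote

# Hypothetical tracker address, used only for illustration.
TRACKER_URL = "https://tracker.example/log"
# Roughly ten years in seconds, so the browser keeps the shim "indefinitely".
TEN_YEARS = 10 * 365 * 24 * 60 * 60


def build_shim_response(visitor_id=None):
    """Return (headers, body) for the shim page.

    The server only sees this request once per browser cache. After that
    the cached copy, with the same embedded ID, is reused on every visit.
    """
    if visitor_id is None:
        visitor_id = uuid.uuid4().hex  # fresh ID for a first-time visitor
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # "private" asks shared proxies not to store the page.
        "Cache-Control": f"private, max-age={TEN_YEARS}",
    }
    # The shim embeds the tracking page and hands it the ID.
    src = f"{TRACKER_URL}?id={quote(visitor_id)}"
    body = (
        "<!DOCTYPE html>\n"
        f'<html><body><iframe src="{escape(src)}"></iframe></body></html>\n'
    )
    return headers, body
```

Because the ID lives in the cached body rather than in a request header, nothing resembling a cookie ever appears in the HTTP traffic.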
One neat thing about this method is that JavaScript is not strictly required. It is only used to pass the message and referrer to the tracker. It would probably be possible to replace the iframes with CSS and images to gain JS-free HTTP referrer logging, but that would lose the ability to store messages so easily.
As for how useful this actually is, the only use cases I can really think of are not exactly legitimate. The most obvious is to track users who won’t accept cookies. It also has advantages over cookies, namely that this kind of tracking is completely silent. Virus scanners which search for and delete tracking cookies won’t affect sites using this method. Likewise, manually clearing cookies won’t work.
The most practical implementation would be to use this in concert with cookies to make tracking IDs more sticky, so they could outlast a user clearing their cookies. I’ve also been looking into adapting the link colour hack to store custom values in the browser history (this is easily doable). Combining these three techniques would mean a user would have to simultaneously clear their cache, their history and their cookies to circumvent tracking.
January 29th, 2010 at 8:37 am
In Firefox 3.5.6 on Linux, when I go to Tools -> Clear Recent History…, all 3 items necessary to thwart this method are checked by default.
January 29th, 2010 at 10:01 am
Another interesting technique would be to use the ETag.
You can issue custom ETags to people and use that. Or abuse Last-Modified (although that would potentially have precision issues).
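A rough sketch of the ETag idea, in Python for illustration. The function name and the plain-dict request headers are assumptions, and it ignores weak validators and multi-value `If-None-Match` headers for brevity.

```python
import uuid


def handle_request(request_headers):
    """Return (status, response_headers, visitor_id) for one request.

    On the first visit the server invents an ID and sends it as the ETag.
    On later visits the browser revalidates its cached copy and echoes the
    ETag back in If-None-Match, which identifies the visitor.
    """
    etag = request_headers.get("If-None-Match")
    if etag:
        visitor_id = etag.strip('"')  # ETag values are quoted strings
        # 304 tells the browser to keep its cached copy (and the ETag).
        headers = {"ETag": etag, "Cache-Control": "private, no-cache"}
        return 304, headers, visitor_id
    visitor_id = uuid.uuid4().hex
    headers = {
        "ETag": f'"{visitor_id}"',
        # "no-cache" still allows storing, but forces revalidation each time,
        # so the server sees the ETag on every visit.
        "Cache-Control": "private, no-cache",
    }
    return 200, headers, visitor_id
```

Unlike the cached-shim approach, this one contacts the server on every page view, which makes it easier to log visits but also easier to spot in the request headers.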
January 29th, 2010 at 4:08 pm
Quite a neat idea. I just have the impression that the average user clears the browser cache far more often than their cookies, and so this method seems not to be as permanent as we would like it to be.
January 29th, 2010 at 4:51 pm
How long do pages really last in the cache though? In Firefox, there’s only a 50MB cache by default. Hardly enough for a good hour of surfing.
January 29th, 2010 at 7:26 pm
Nice idea, tokens can do a lot of neat things but I never thought of using them like this! I wonder if any big names are using this; it would be a benefit for sites like Amazon as a backup to cookies. Although which browsers actually still obey cache rules? Is it just IE?
January 29th, 2010 at 7:40 pm
Very clever! Thinking about this a bit more, you could have this replicated in JavaScript simply enough by making a simple Ajax request to an indefinite-cache page, which would just contain your embedded cookieless ID.
In my experience, these get cached the same way as ordinary pages, although your mileage may vary.
January 29th, 2010 at 8:02 pm
This is going to quickly become one of those things you wish you could uninvent.
January 29th, 2010 at 8:55 pm
[...] Josh on the Web » Blog Archive » Abusing the Cache: Tracking Users without Cookies (joshduck.com) [...]
January 29th, 2010 at 11:39 pm
And what happens if the user is behind a caching proxy? Do all users of this proxy appear as a single user?
January 30th, 2010 at 1:27 am
Social comments and analytics for this post…
This post was mentioned on Reddit by k4st: That is surprisingly cool. I think that if the author can make this method work without Javascript and show that it is a cross-browser solution then it could be used (almost) anywhere where a cookie could be u…
January 30th, 2010 at 7:28 am
Ray, the initial response is marked with Cache-Control: private, so intermediate proxies should know not to cache the page. Only the client will keep the cache.
January 30th, 2010 at 7:35 am
@Mogden, that’s a good point. I haven’t done much testing as to how long the data is retained, as this was more a proof of concept. Like I said in the post, I believe it would be better suited to being used with cookies to make sessions more sticky.
February 10th, 2010 at 6:17 pm
Three Ways Sites Can Track Visitors Without Cookies, Part 2…
In part 1, I wrote about the EFF’s Panopticlick project and the implications for anonymity. I’ve got two more methods up my sleeve. 2. Use the cache. Cookies aren’t the only thing your browser downloads and keeps around, and for good reaso….
April 6th, 2010 at 6:40 am
I think you have a mistake in your code.
Line 56 reads:
header("Last-Modified: $expires GMT");
You probably meant:
header("Last-Modified: $emodified GMT");
As it is now, you’re claiming the last-modified date is 10 years in the future.
April 6th, 2010 at 6:40 am
heh, I topoed my corection, lol. it should say:
header("Last-Modified: $modified GMT");
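For comparison, here is a small Python sketch of the corrected pair of date headers: Last-Modified set to the time the page is served, Expires pushed about ten years ahead. The function name and the 3650-day figure are assumptions for the example; the real script is PHP and may compute its dates differently.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_date_headers(now=None):
    """Build Last-Modified and Expires values in HTTP date format."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Expires is the far-future date; Last-Modified must not be.
    expires = now + timedelta(days=3650)
    return {
        # usegmt=True renders the zone as "GMT", as HTTP dates require.
        "Last-Modified": format_datetime(now, usegmt=True),
        "Expires": format_datetime(expires, usegmt=True),
    }
```

Mixing the two up, as in the original line 56, claims that the page was last modified in the future.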
April 6th, 2010 at 6:46 am
Thanks Nathan. Good spot.
April 28th, 2010 at 7:20 pm
The technique doesn’t work if the web address contains a port number, like http://www.mysite.com:8082