Abusing the Cache: Tracking Users without Cookies

I’ve been doing a little bit of research into ways to misuse browser history and cache and came across a very simple technique for tracking users without the need for cookies. Firstly, a demo. If you watch the HTTP requests you’ll see that there are no cookies being used.

To track a user I make use of three URLs: the container, which can be any website; a shim file, which contains a unique code; and a tracking page, which stores (and in this case displays) requests. The trick lies in making the browser cache the shim file indefinitely. When the file is requested for the first – and only – time a unique identifier is embedded in the page. The shim embeds the tracking page, passing it the unique ID every time it is loaded. See the source code (thanks to Nathan for pointing out the date error).
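The shim's response can be sketched roughly as follows. This is a minimal Python illustration, not the PHP source linked above; the `/tracker` URL, the ten-year lifetime, and the function name `build_shim_response` are assumptions made for the example. It shows the two pieces the technique depends on: headers that ask the browser to keep the page for a very long time, and a body with the unique ID baked in.

```python
import uuid
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumed values for illustration only.
CACHE_LIFETIME = timedelta(days=3650)   # roughly ten years
TRACKER_URL = "/tracker"                # hypothetical tracking page


def build_shim_response(now=None, user_id=None):
    """Return (headers, body) for the first and only request of the shim."""
    now = now or datetime.now(timezone.utc)
    user_id = user_id or uuid.uuid4().hex  # fresh ID for a new visitor

    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # "private" asks shared proxies not to store the page.
        "Cache-Control": f"private, max-age={int(CACHE_LIFETIME.total_seconds())}",
        # Far-future expiry so the browser never needs to re-request.
        "Expires": format_datetime(now + CACHE_LIFETIME, usegmt=True),
        # Last-Modified is the current time, not the expiry date.
        "Last-Modified": format_datetime(now, usegmt=True),
    }

    # The ID is part of the cached markup, so every later load of the
    # shim from cache calls the tracker with the same value.
    body = (
        "<!DOCTYPE html>\n"
        f'<iframe src="{TRACKER_URL}?id={user_id}"></iframe>\n'
    )
    return headers, body
```

Because the ID lives in the cached body rather than in a cookie, nothing identifying is sent with the request for the shim itself; the only request that carries the ID is the one to the tracker.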

One neat thing about this method is that JavaScript is not strictly required. It is only used to pass the message and referrer to the tracker. It would probably be possible to replace the iframes with CSS and images to gain JS-free HTTP referrer logging, but doing so would lose the ability to store messages so easily.

As for how useful this actually is, the only use cases I can really think of are not exactly legitimate. The most obvious is to track users who won’t accept cookies. It also has an advantage over cookies: this kind of tracking is completely silent. Virus scanners which search for and delete tracking cookies won’t affect sites using this method. Likewise, manually clearing cookies won’t help.

The most practical implementation would be to use this in concert with cookies to make tracking IDs more sticky, so they could outlast a user clearing their cookies. I’ve also been looking into adapting the link colour hack to store custom values in the browser history (this is easily doable). Combining these three techniques would mean a user would have to simultaneously clear their cache, their history and their cookies to circumvent tracking.
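One way the cookie and the cached ID might be reconciled on the tracker is sketched below. The function name `resolve_tracking_id` and the order of preference are assumptions for illustration; the history-based store is left out.

```python
import uuid


def resolve_tracking_id(cookie_id=None, cached_id=None):
    """Return (tracking_id, restore_cookie).

    restore_cookie is True when the cookie is missing and should be
    set again from whichever identifier survived.
    """
    if cookie_id:
        # Cookie still present: nothing to repair.
        return cookie_id, False
    if cached_id:
        # Cookie was cleared but the cached shim still carries the ID.
        return cached_id, True
    # Neither survived: treat the visitor as new.
    return uuid.uuid4().hex, True
```

With this arrangement, clearing cookies alone changes nothing from the tracker's point of view; the ID is only lost when both stores are emptied together.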

19 Responses

  1. In Firefox 3.5.6 on Linux, when I go to Tools -> Clear Recent History…, all 3 items necessary to thwart this method are checked by default.

  2. Another interesting technique would be to use the ETag.

    You can issue custom ETags to people and use that. Or abuse Last-Modified (although that would potentially have precision issues).

  3. Quite a neat idea. I just have the impression that the average user clears the browser cache far more often than their cookies, and so this method seems not to be as permanent as we would like it to be. :(

  4. How long do pages really last in the cache though? In Firefox, there’s only a 50MB cache by default. Hardly enough for a good hour of surfing.

  5. Nice idea, tokens can do a lot of neat things but I never thought of using them like this! I wonder if any big names are using this; it would be a benefit for sites like Amazon as a backup to cookies. Which browsers actually still obey cache rules, though? Is it just IE?

  6. Very clever! Thinking about this a bit more, you could have this replicated in JavaScript simply enough by making a simple Ajax request to an indefinite-cache page, which would just contain your embedded cookieless ID.

    In my experience, these get cached the same way as ordinary pages, although your mileage may vary

  7. This is going to quickly become one of those things you wish you could uninvent.

  9. And what happens if the user is behind a caching proxy? Do all users of this proxy occur as a single user?

  10. Social comments and analytics for this post…

    This post was mentioned on Reddit by k4st: That is surprisingly cool. I think that if the author can make this method work without Javascript and show that it is a cross-browser solution then it could be used (almost) anywhere where a cookie could be u…

  11. Ray, the initial response is marked with Cache-Control: private so intermediate proxies should know not to cache the page. Only the client will keep the cache.

  12. @Mogden, that’s a good point. I haven’t done much testing as to how long the data is retained, as this was more of a proof of concept. Like I said in the post, I believe it would be better suited to being used alongside cookies to make sessions more sticky.

  13. Three Ways Sites Can Track Visitors Without Cookies, Part 2…

    In part 1, I wrote about the EFF’s Panopticlick project and the implications for anonymity. I’ve got two more methods up my sleeve. 2. Use the cache. Cookies aren’t the only thing your browser downloads and keeps around, and for good reaso….

  14. I think you have a mistake in your code.

    Line 56 reads:
    header("Last-Modified: $expires GMT");

    You probably meant:
    header("Last-Modified: $emodified GMT");

    As it is now, you’re claiming the last-modified date is 10 years in the future.

  15. heh, I topoed my corection, lol. it should say:

    header("Last-Modified: $modified GMT");

  16. Thanks Nathan. Good spot.

  17. The technique doesn’t work if the web address contains a port number, like http://www.mysite.com:8082

  18. That’s an excellent idea.

    Though, taking the AJAX idea from Richard a bit further, what about using the ETag header and embedding the unique ID in there? You could have one file request without query strings.

    Change the Cache-Control to max-age=0, must-revalidate and Expires: -1, then add ETag:. Subsequent requests will use the If-None-Match: request header.

    I wrote a blog entry about HTTP headers last week that explains these headers and how they function more in depth here: http://symkat.com/45/understanding-http-caching/

    Of course, the best resource if you’re the type who likes huge documents would be RFC 2616. Specific attention to section 14.

  19. [...] aware that at least two researchers are researching into tracking users using only the cache (using iframes or ETags). The truly paranoid Firefox user will therefore want Firefox to regularly clear its [...]
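The ETag variant suggested in the responses above can be sketched as follows. This is an illustrative Python handler, not code from the post: the function name `handle_shim_request`, the plain-dict header representation, and the placeholder body are all assumptions. The server hands out the ID as the entity tag on the first request, and the browser sends it back in `If-None-Match` on every revalidation.

```python
import uuid


def handle_shim_request(request_headers):
    """Return (status, response_headers, body, user_id).

    request_headers is assumed to be a dict with canonical header names.
    """
    tag = request_headers.get("If-None-Match")

    if tag:
        # Returning visitor: the browser is revalidating its cached copy.
        tag = tag.strip()
        if tag.startswith("W/"):       # tolerate a weak-validator prefix
            tag = tag[2:]
        user_id = tag.strip('"')
        status, body = 304, b""        # Not Modified: keep the cached copy
    else:
        # First visit: mint an ID and issue it as the entity tag.
        user_id = uuid.uuid4().hex
        status, body = 200, b"<!-- shim -->"

    response_headers = {
        "ETag": f'"{user_id}"',
        # Force revalidation on every load so the tag is always sent back.
        "Cache-Control": "private, max-age=0, must-revalidate",
        "Expires": "-1",
    }
    return status, response_headers, body, user_id
```

Unlike the cached-iframe approach, this needs a round trip to the server on every page load, but the ID never appears in a URL. A real handler would also need to validate the incoming tag, since the client controls it.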