Abusing the Cache: Tracking Users without Cookies

I’ve been doing a little bit of research into ways to misuse browser history and cache and came across a very simple technique for tracking users without the need for cookies. Firstly, a demo. If you watch the HTTP requests you’ll see that there are no cookies being used.

To track a user I make use of three URLs: the container, which can be any website; a shim file, which contains a unique code; and a tracking page, which stores (and in this case displays) requests. The trick lies in making the browser cache the shim file indefinitely. When the file is requested for the first – and only – time a unique identifier is embedded in the page. The shim embeds the tracking page, passing it the unique ID every time it is loaded. See the source code (thanks to Nathan for pointing out the date error).
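As a rough illustration of the shim step, here is a minimal sketch in Python rather than the language of the linked source. The function name `build_shim_response`, the `tracker_url` parameter and the `id` query parameter are all hypothetical. The ten-year lifetime and the `private` directive follow details mentioned in the comments below. The sketch leaves out the JavaScript that passes the message and referrer along.

```python
import time
import uuid
from email.utils import formatdate

# Ten years in seconds: a stand-in for "indefinitely" (an assumption).
TEN_YEARS = 10 * 365 * 24 * 60 * 60


def build_shim_response(tracker_url, now=None):
    """Return (headers, body) for a first-time request to the shim.

    Hypothetical helper: it only builds the response, it does not serve it.
    """
    now = time.time() if now is None else now
    # The unique ID is generated once and baked into the cached page.
    user_id = uuid.uuid4().hex
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # "private" asks shared proxies not to cache one user's ID for everyone.
        "Cache-Control": f"private, max-age={TEN_YEARS}",
        # Expires far in the future, Last-Modified set to the current time.
        "Expires": formatdate(now + TEN_YEARS, usegmt=True),
        "Last-Modified": formatdate(now, usegmt=True),
    }
    # Every later load of the cached shim re-embeds the tracker with the same ID.
    body = (
        "<!DOCTYPE html><html><body>"
        f'<iframe src="{tracker_url}?id={user_id}"></iframe>'
        "</body></html>"
    )
    return headers, body
```

As long as the browser keeps serving the shim from its cache, the same ID reaches the tracking page on every visit without the shim URL being requested again.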

One neat thing about this method is that JavaScript is not strictly required. It is only used to pass the message and referrer to the tracker. It would probably be possible to replace the iframes with CSS and images to get JS-free HTTP referrer logging, but doing so would lose the ability to store messages so easily.

As to how useful this actually is, the only use cases I can really think of are not exactly legitimate. The most obvious is to track users who won’t accept cookies. This does have advantages over cookies too, namely that this kind of tracking is completely silent. Virus scanners which search for and delete tracking cookies won’t affect sites using this method. Likewise, manually clearing cookies won’t stop the tracking.

The most practical implementation would be to use this in concert with cookies to make tracking IDs more sticky, so they could outlast a user clearing their cookies. I’ve also been looking into adapting the link colour hack to store custom values in the browser history (this is easily doable). Combining these three techniques would mean a user would have to simultaneously clear their cache, their history and their cookies to circumvent tracking.
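The "sticky" combination could be sketched roughly as follows. This is only an illustration under stated assumptions: the function `resolve_tracking_id`, the store names and their priority order are hypothetical, and the code does not show how an ID would actually be read from or written back to each store.

```python
import uuid

# Hypothetical store names; the priority order is an assumption.
STORES = ("cookie", "cache", "history")


def resolve_tracking_id(found):
    """Pick the tracking ID and list the stores that need re-seeding.

    `found` maps a store name to the ID read from it, or None if the
    user cleared that store.
    """
    surviving = [found.get(store) for store in STORES if found.get(store)]
    # Reuse the first surviving ID; mint a new one only if every store is empty.
    tracking_id = surviving[0] if surviving else uuid.uuid4().hex
    # Any store that is empty (or disagrees) should be written again.
    to_restore = [store for store in STORES if found.get(store) != tracking_id]
    return tracking_id, to_restore
```

For example, a user who clears only their cookies would be recognised again from the cached ID, and `to_restore` would contain just `"cookie"`. A fresh ID is issued only when all three stores come back empty.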

17 Responses to “Abusing the Cache: Tracking Users without Cookies”

  1. codepanda Says:

    In firefox 3.5.6 on linux, when I go to Tools -> Clear Recent History…, all 3 items necessary to thwart this method are checked by default.

  2. Andrew Shuttlewood Says:

    Another interesting technique would be to use the ETag.

    You can issue custom ETags to people and use that. Or abuse Last-Modified (although that could have precision issues).

  3. Lukas Says:

    Quite a neat idea. I just have the impression that the average user clears the browser cache far more often than their cookies, and so this method seems not to be as permanent as we would like it to be. :(

  4. Mogden Says:

    How long do pages really last in the cache though? In Firefox, there’s only a 50MB cache by default. Hardly enough for a good hour of surfing.

  5. Woody Says:

    Nice idea, tokens can do a lot of neat things but I never thought of using them like this! I wonder if any big names are using this. It would be a benefit for sites like Amazon as a backup to cookies, although which browsers actually still obey cache rules? Is it just IE?

  6. Richard Says:

    Very clever! Thinking about this a bit more, you could have this replicated in JavaScript simply enough by making a simple Ajax request to an indefinite-cache page, which would just contain your embedded cookieless ID.

    In my experience, these get cached the same way as ordinary pages, although your mileage may vary

  7. Fluck Says:

    This is going to quickly become one of those things you wish you could uninvent.

  8. Josh on the Web » Blog Archive » Abusing the Cache: Tracking Users without Cookies « Netcrema – creme de la social news via digg + delicious + stumpleupon + reddit Says:

    [...] Josh on the Web » Blog Archive » Abusing the Cache: Tracking Users without Cookiesjoshduck.com [...]

  9. Ray Says:

    And what happens if the user is behind a caching proxy? Do all users of this proxy appear as a single user?

  10. uberVU - social comments Says:

    Social comments and analytics for this post…

    This post was mentioned on Reddit by k4st: That is surprisingly cool. I think that if the author can make this method work without Javascript and show that it is a cross-browser solution then it could be used (almost) anywhere where a cookie could be u…

  11. admin Says:

    Ray, the initial request is marked with Cache-Control: private, so intermediate proxies should know not to cache the page. Only the client will keep the cache.

  12. admin Says:

    @Mogden, that’s a good point. I haven’t done much testing as to how long the data is retained as this was more a proof of concept. Like I said in the post I believe it would be more suited towards being used with cookies to make sessions more sticky.

  13. JasonMorrison.net Says:

    Three Ways Sites Can Track Visitors Without Cookies, Part 2…

    In part 1, I wrote about the EFF’s Panopticlick project and the implications for anonymity. I’ve got two more methods up my sleeve. 2. Use the cache. Cookies aren’t the only thing your browser downloads and keeps around, and for good reaso….

  14. Nathan Friedly Says:

    I think you have a mistake in your code.

    Line 56 reads:
    header("Last-Modified: $expires GMT");

    You probably meant:
    header("Last-Modified: $emodified GMT");

    As it is now, you’re claiming the last-modified date is 10 years in the future.

  15. Nathan Friedly Says:

    heh, I topoed my corection, lol. it should say:

    header("Last-Modified: $modified GMT");

  16. Josh Says:

    Thanks Nathan. Good spot.

  17. Dmitry Vl.Bondar Says:

    The technique doesn’t work if the web address contains a port number, like http://www.mysite.com:8082
