Saturday, May 12, 2007

Don't kill your HTTP cache-ability!

In the 7 years I've been actively involved in web development, I have never seen any of my peers bother implementing proper controls to allow web proxies and browser caches to correctly cache dynamic content.

I took the time to do this for a recent project, and I partly credit the HTTP caching code for allowing the site to survive huge traffic surges driven by TechCrunch and BoingBoing articles. I believe correctly implemented HTTP caching for a dynamic site is one of the smartest things a developer can do to mitigate the effect of these surges, and to make the best use of the CPU cycles and bandwidth of a web server.
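The post doesn't show the caching code itself. As a rough illustration only, here is a minimal WSGI sketch (Python standard library) of two such controls: a `Cache-Control` header that lets browsers and proxies reuse a response for a while, and an `ETag` that lets them revalidate it with a conditional GET. `render_page`, the five-minute lifetime and hashing the body to build the ETag are all assumptions made for the example, not the project's actual code.

```python
import hashlib


def render_page():
    # Hypothetical stand-in for the real page renderer (templates, database, ...).
    return b"<html><body>Hello, world</body></html>"


def etag_matches(if_none_match, etag):
    """Weak comparison of an If-None-Match header against our ETag.

    Simplified: assumes no commas appear inside individual tags.
    """
    if not if_none_match:
        return False
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    if "*" in candidates:
        return True
    # Ignore the weak-validator prefix on both sides before comparing.
    bare = etag.removeprefix("W/")
    return any(tag.removeprefix("W/") == bare for tag in candidates)


def app(environ, start_response):
    body = render_page()
    # Validator derived from the body: identical content -> identical ETag.
    etag = '"%s"' % hashlib.sha1(body).hexdigest()
    headers = [
        ("ETag", etag),
        # Let browsers and shared proxies reuse the response for 5 minutes
        # without contacting the server at all.
        ("Cache-Control", "public, max-age=300"),
    ]
    if etag_matches(environ.get("HTTP_IF_NONE_MATCH"), etag):
        # The client's copy is still current: send headers only, no body.
        start_response("304 Not Modified", headers)
        return []
    headers += [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The app can be served with `wsgiref.simple_server.make_server`. Note that hashing the rendered body saves bandwidth on a 304 but still costs a render; the CPU savings come mainly from the `max-age` window, during which caches answer without asking the server at all.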

For a long time, though, I struggled with strange caching bugs, and I ended up having to turn off the caching mechanisms for authenticated users. Not an optimal solution... then last week I came across a comment on mnot.net which spelt out my error.


Changing the content based on IP address or cookies really damages cacheability and the idea that a GET is idempotent, as I understand it.


This is exactly what I had been doing... dynamic customisation of page content based on whether a user is authenticated... or not. Argh!
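To see why this produces "strange caching bugs", here is a deliberately simplified model. It assumes a shared cache that keys entries on the URL alone and ignores cookies; `render` and `ToyCache` are hypothetical names, and real proxies have more rules than this.

```python
def render(path, cookie=None):
    """Hypothetical origin handler: output depends on the Cookie header, not just the URL."""
    if cookie and cookie.startswith("userid="):
        return "Welcome back, %s" % cookie[len("userid="):]
    return "Welcome, guest. Please log in."


class ToyCache:
    """Toy shared cache that stores one response per URL, ignoring cookies."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, path, cookie=None):
        if path not in self.store:
            # Cache miss: forward the request (cookie included) to the origin.
            self.store[path] = self.origin(path, cookie)
        # Cache hit: the stored copy is returned no matter who is asking.
        return self.store[path]


cache = ToyCache(render)
alice_view = cache.get("/content/foo/", cookie="userid=alice")
# An anonymous visitor requests the same URL and receives Alice's page.
guest_view = cache.get("/content/foo/")
```

Here `guest_view` is `"Welcome back, alice"`: the cache had no way to tell the two requests apart, because the only thing that changed was the cookie.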

Unfortunately, most of the Python web frameworks I've worked with encourage and even demonstrate this technique in documentation and examples.

The solution to this problem is shown in the same comment.


Rather than separating a user identifier...

Cookie: userid={USER_ID}
/content/foo/

Why not try something like this for personalization...

/content/{USER_ID}/foo/
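The same toy model, with the identifier moved into the path as suggested, might look like this. The URL shape `/content/{USER_ID}/foo/` comes from the quoted comment; the regular expression, `render` and `ToyCache` are hypothetical. Because the response now depends only on the URL, a cache keyed on the URL cannot hand one visitor's page to another in this model. Whether the identifier itself should appear in URLs is a separate question.

```python
import re

# Matches the personalised form /content/{USER_ID}/{page}/.
PERSONAL_PATH = re.compile(r"^/content/(?P<user_id>[^/]+)/(?P<page>[^/]+)/$")


def render(path):
    """Hypothetical origin handler whose output depends only on the URL."""
    match = PERSONAL_PATH.match(path)
    if match:
        return "Page %s for user %s" % (match.group("page"), match.group("user_id"))
    return "Public page %s" % path


class ToyCache:
    """Toy shared cache that stores one response per URL."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, path):
        if path not in self.store:
            self.store[path] = self.origin(path)
        return self.store[path]


cache = ToyCache(render)
public = cache.get("/content/foo/")       # same page for everyone
alice = cache.get("/content/alice/foo/")  # cached separately per user
bob = cache.get("/content/bob/foo/")
```

Each personalised page gets its own cache entry, and the public page stays shareable across all visitors.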


Thanks for the tip, l.m.orchard!

2 comments:

PhilipBober said...

Brilliant! Let's leak usernames in referrer logs.

Simon Wittber said...

Good point, using a login name as the USER_ID is probably not a great idea. It's probably a better idea to expose customised member pages via a unique customisable URL rather than a user name.
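One way to build such URLs (an assumption; the post doesn't specify a scheme) is to give each member a random, unguessable slug and use it in place of the login name. The sketch uses Python's `secrets` module; `SLUG_TO_ACCOUNT`, `assign_member_slug` and `member_url` are hypothetical names, and a real site would keep the mapping in its database rather than a dict.

```python
import secrets

# Hypothetical in-memory mapping from opaque URL slug to internal account id.
SLUG_TO_ACCOUNT = {}


def assign_member_slug(account_id):
    """Create a random URL-safe slug that reveals nothing about the login name."""
    slug = secrets.token_urlsafe(16)  # 16 random bytes -> 22 URL-safe characters
    while slug in SLUG_TO_ACCOUNT:    # collisions are extremely unlikely, but keep slugs unique
        slug = secrets.token_urlsafe(16)
    SLUG_TO_ACCOUNT[slug] = account_id
    return slug


def member_url(slug, page):
    """Build the personalised URL, e.g. /content/{slug}/foo/."""
    return "/content/%s/%s/" % (slug, page)
```

A user-chosen slug would work the same way, provided it is checked for uniqueness before being stored.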
