Webpage Component Caching (Part 2 of 2)

Website Performance: Taxonomy of Tips introduced a classification scheme to help us organize the many performance tips found on the Internet.  Empty Src and Href Attributes then started to examine the “journey from the server to the client” category by discussing the one tip that Yahoo lists as the most important for web application performance.  Part One of this article described caching and showed how to implement it.  You are now reading part two, which discusses some of the thinking and planning a developer goes through to implement caching.

How Long to Cache?

The more we cache, the more is stored on the client machine, and that means fewer hits on the server.

Static pages and components should always be cached.  If they’re not going to change, store them on the client machine so they only have to make the trip once.  However, caching them forever creates its own problem:  The browser will never expire them.  If we change them sometime down the road, the browser will not pick up the changed component because the cached component has not yet expired.  [The workaround is simple enough, though.  When changing a component, just change its name (some people append the version number or release date to the name of the component) and update the pages that refer to it.  The old component, with the old name, will cease being used and will be purged by the client according to an LRU (least recently used) algorithm.]
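The renaming workaround can be sketched in a few lines of Python.  This is only an illustration: the function name versioned_name and the choice to put the version tag just before the file extension are assumptions, not a convention the article prescribes.

```python
from pathlib import PurePosixPath


def versioned_name(path: str, version: str) -> str:
    """Return the path with a version tag inserted before the extension.

    Hypothetical helper: the naming scheme (name.VERSION.ext) is an
    assumption chosen for illustration.
    """
    p = PurePosixPath(path)
    # p.stem is the file name without its last extension, p.suffix is that extension.
    return str(p.with_name(f"{p.stem}.{version}{p.suffix}"))


if __name__ == "__main__":
    print(versioned_name("css/style.css", "v2"))
    print(versioned_name("js/app.js", "2011-06-01"))
```

Because the new name has never been seen by the browser, it is fetched fresh even though the old copy has not yet expired.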

Dynamic pages and components should not be cached unless we can predict when they will be changed next.  If we can, we can set them to expire at that time and allow caching until then.

In general, an expiry date should be as far into the future as possible, but not later than the date of the next change to the component.  Every expiry forces the browser back to the server, and the end-user waits while it does.  If we set a component to expire daily, we will inconvenience our end-users once a day.  If we set a component to expire monthly, we will inconvenience our end-users once a month.  How often are we willing to inconvenience our end-users?
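The rule of thumb above ("as far into the future as possible, but not later than the next change") might be expressed as follows.  The one-year cap and the function name max_age_until are assumptions made for this sketch.

```python
from datetime import datetime, timedelta, timezone

ONE_YEAR = 365 * 24 * 3600  # assumed upper bound on cache lifetime, in seconds


def max_age_until(next_change: datetime, now: datetime, cap: int = ONE_YEAR) -> int:
    """Seconds a component may be cached: up to the next known change,
    never negative, and never longer than the cap."""
    remaining = int((next_change - now).total_seconds())
    return max(0, min(remaining, cap))


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A component we expect to change one week from now.
    print(max_age_until(now + timedelta(days=7), now))
```

A component whose next change is already in the past gets a lifetime of zero, which amounts to not caching it at all.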

Creating Peak Load Unintentionally

If we set an expiry date with the expires header line, every browser in the world will expire the component at the same time.  If the component is used on many browsers world-wide, this can cause a sudden, perhaps crippling, peak on our server.  To avoid this sudden peak, use cache-control: max-age to set a duration instead of using expires to set a date.  The duration is counted from the moment each browser received the component, so the expiries are spread out over time.  [However, if we know our component will change on a certain date, we can use expires instead to make sure all browsers pick up the change at the right time.]
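To make the difference concrete, here is a small sketch that builds each kind of header.  The helper names are hypothetical; the date formatting uses Python's standard email.utils module, and the datetime passed in is assumed to be timezone-aware.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def expires_header(when: datetime) -> dict:
    """Absolute expiry: every client expires the component at the same instant.
    `when` is assumed to be a timezone-aware datetime."""
    return {"Expires": format_datetime(when.astimezone(timezone.utc), usegmt=True)}


def max_age_header(seconds: int) -> dict:
    """Relative expiry: each client counts from the time it received the response."""
    return {"Cache-Control": f"max-age={seconds}"}


if __name__ == "__main__":
    print(expires_header(datetime(2030, 1, 1, tzinfo=timezone.utc)))
    print(max_age_header(7 * 24 * 3600))
```

With max_age_header, a browser that fetched the component on Monday and one that fetched it on Thursday will come back to the server on different days.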

Viewing Component Load Times

Paid Monitor’s Free Page Load Testing shows us which components were loaded, how long they took to load, and what they waited for prior to loading.  It provides the same information that is available through Paid Monitor’s Transaction Monitoring, but we can see the results immediately.  Paid Monitor’s Transaction Monitoring is for continual monitoring; the page load testing tool is for one-time use.

Page load testing and transaction monitoring provide information about page loads from Monitis’ computers worldwide, but they do not show the results from our end-user’s computer.  If the end-user gets different results, the causes are likely local to that user’s machine or network.


Separate Static Content from Dynamic Content

If we separate static content from dynamic content, we can cache the static content and expire the dynamic content rather than expiring everything immediately.  JavaScript and CSS are both good candidates for this technique.
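One way to picture the separation is a function that picks caching headers by file type.  This is a minimal sketch: the list of static extensions, the 30-day lifetime, and the function name are all assumptions for illustration.

```python
from pathlib import PurePosixPath

# Assumed set of extensions treated as static content.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".ico"}
STATIC_MAX_AGE = 30 * 24 * 3600  # assumed lifetime: 30 days, in seconds


def cache_headers_for(path: str) -> dict:
    """Long-lived caching for static files; everything else must be revalidated."""
    if PurePosixPath(path).suffix.lower() in STATIC_EXTENSIONS:
        return {"Cache-Control": f"max-age={STATIC_MAX_AGE}"}
    # no-cache: the browser must check with the server before reusing its copy.
    return {"Cache-Control": "no-cache"}


if __name__ == "__main__":
    for path in ("css/site.css", "js/app.js", "account/index.php"):
        print(path, cache_headers_for(path))
```

Once the JavaScript and CSS live in their own files, the dynamic page that refers to them can expire immediately without forcing them to be downloaded again.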

Some Technical Gotchas

Here are some special situations to watch for:

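  • HTTPS responses cannot be cached by intermediate proxies, and some older browsers will not cache them to disk unless the response includes cache-control: public.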
  • If we are using .htaccess, Apache’s ExpiresActive directive must be on.
  • Some browsers will consider the page uncacheable if the last-modified header line is missing.
  • Cookies can interfere with caching.
  • PHP is non-cacheable by default, so make sure to use the header function to set an appropriate expiry.
  • If appropriate, use GET rather than POST, because responses to POST requests are generally not cached.
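Several of these gotchas can be checked mechanically.  The sketch below inspects a request method and a dictionary of response headers and reports anything from the list above that might prevent caching.  The function name and the exact wording of the warnings are assumptions, and the checks are deliberately simplistic.

```python
def caching_warnings(method: str, headers: dict) -> list:
    """Return a list of reasons why a response might not be cached.

    `headers` maps response header names to values; names are compared
    case-insensitively.  An empty list means none of these checks fired.
    """
    names = {name.lower() for name in headers}
    warnings = []
    if method.upper() != "GET":
        warnings.append("request method is not GET")
    if "last-modified" not in names:
        warnings.append("last-modified header is missing")
    if "set-cookie" in names:
        warnings.append("response sets a cookie")
    if "expires" not in names and "cache-control" not in names:
        warnings.append("no expires or cache-control header")
    return warnings


if __name__ == "__main__":
    print(caching_warnings("POST", {}))
    print(caching_warnings("GET", {
        "Last-Modified": "Tue, 01 Jan 2030 00:00:00 GMT",
        "Cache-Control": "max-age=3600",
    }))
```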

Proxy Caching

Caching on client machines is a good idea, but it does nothing to help first-time visitors to a web page.  Client caching is only useful on subsequent visits.

As a response journeys from the server to the client, it hops from one machine to another until it reaches its destination.  Some of those in-between machines are willing to cache our content if we ask them.  First-time visitors may not have to retrieve the content all the way from the server.  It may be available on a closer machine.

This is simple enough to implement.  Just make sure the server includes the cache-control: public and vary: accept-encoding headers in the response.  [The latter is required to get around a problem with some software that serves compressed content when it should be serving uncompressed content.]
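As a rough sketch, the two headers could be produced like this.  The function name is hypothetical, and combining public with a max-age value is an assumption added here so that the proxy also knows how long to keep the content.

```python
def proxy_cacheable_headers(max_age: int) -> dict:
    """Headers that invite shared (proxy) caches to store the response."""
    return {
        # public: intermediate caches, not just the browser, may store the response.
        "Cache-Control": f"public, max-age={max_age}",
        # Vary: keep compressed and uncompressed copies apart in the cache.
        "Vary": "Accept-Encoding",
    }


if __name__ == "__main__":
    for name, value in proxy_cacheable_headers(24 * 3600).items():
        print(f"{name}: {value}")
```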


