Everything about Web and Network Monitoring

Webpage Component Caching (Part 2 of 2)

Website Performance: Taxonomy of Tips introduced a classification scheme to help us organize the many performance tips found on the Internet.  Empty Src and Href Attributes then started to examine the “journey from the server to the client” category by discussing the one tip that Yahoo lists as the most important for web application performance.  Part One of this article described caching and showed how to implement it.  You are now reading Part Two, which discusses some of the thinking and planning a developer goes through to implement caching.

How Long to Cache?

The more we cache, the more is stored on the client machine, and the fewer requests reach the server.

Static pages and components should always be cached.  If they’re not going to change, store them on the client machine so they only have to make the trip once.  However, caching them forever creates its own problem:  the browser will never expire them.  If we change them sometime down the road, the browser will not pick up the changed component because its cached copy has not yet expired.  [The workaround is simple enough, though.  When changing a component, just change its name (some people append the version number or release date to the name of the component) and update the pages that reference it.  The old component, with the old name, will cease being used and will eventually be purged by the client according to an LRU (least recently used) algorithm.]
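The renaming workaround can be automated at build time.  The sketch below is one hypothetical way to do it in Python: instead of a version number or release date, it appends a short hash of the file’s content (a common variant of the same idea), so the name changes exactly when the content does.  The function name versioned_name and the 8-character hash length are illustrative assumptions, and the pages that reference the component still have to be updated to use the new name.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename: str, content: bytes, length: int = 8) -> str:
    """Return a name for a static component that changes whenever its content changes.

    Hypothetical helper: inserts a short SHA-256 hash of the content before
    the file extension, e.g. "css/site.css" -> "css/site.<hash>.css".
    """
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    # Keep the directory and extension; only the base name gains the hash.
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


if __name__ == "__main__":
    old = versioned_name("css/site.css", b"body { color: black; }")
    new = versioned_name("css/site.css", b"body { color: navy; }")
    print(old, new)  # two different names, so browsers fetch the new file
```

Because the name is derived from the content, an unchanged file keeps its name (and its cached copy), while any edit produces a new name that no browser has cached yet.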

Dynamic pages and components should not be cached unless we can predict when they will next change.  If we can, we can set them to expire at that time and allow caching until then.

In general, an expiry date should be as far in the future as possible, but not later than the date of the next change to the component.  If we set a component to expire daily, we will inconvenience our end-users with a fresh download once a day.  If we set it to expire monthly, we will inconvenience them once a month.  How often are we willing to inconvenience our end-users?
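The rule “as far in the future as possible, but not later than the next change” reduces to taking a minimum.  The sketch below assumes we know when the component will next change (or that no change is planned), and caps the lifetime at one year, the upper bound the HTTP/1.1 specification (RFC 2616) recommends for Expires dates.  The function name and the cap are illustrative choices, not part of any particular server’s API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed upper bound on cache lifetime (RFC 2616 advises against
# Expires dates more than one year in the future).
ONE_YEAR = timedelta(days=365)


def max_age_seconds(now: datetime, next_change: Optional[datetime]) -> int:
    """Seconds a component may be cached: as long as possible, but not past its next change.

    next_change is None when no change is planned (the one-year cap applies).
    A next_change that is already in the past yields 0 (do not cache).
    """
    lifetime = ONE_YEAR
    if next_change is not None:
        lifetime = min(lifetime, next_change - now)
    return max(0, int(lifetime.total_seconds()))


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A page rebuilt six hours from now may be cached for six hours.
    print(max_age_seconds(now, now + timedelta(hours=6)))
```

For a page that is regenerated every night, next_change would simply be the time of the next scheduled rebuild.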

Creating Peak Load Unintentionally

If we set an expiry date with the Expires header, every browser in the world will expire the component at the same time.  If the component is used by many browsers worldwide, this can cause a sudden, perhaps crippling, peak on our server.  To avoid this sudden peak, use Cache-Control: max-age to set a duration instead of using Expires to set a date.  [However, if we know our component will change on a certain date, we can use Expires instead to make sure all browsers pick up the change at the right time.]
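To see why max-age spreads the load while Expires concentrates it, the hypothetical helpers below compute when one particular client’s cached copy expires.  They use a simplified freshness rule (max-age, when present, takes precedence over Expires, as in HTTP/1.1) and only the Python standard library; this is a sketch, not a complete cache implementation.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_header(when: datetime) -> dict:
    # Absolute date: every cached copy expires at this same instant.
    return {"Expires": format_datetime(when.astimezone(timezone.utc), usegmt=True)}


def max_age_header(seconds: int) -> dict:
    # Relative duration: each copy expires `seconds` after that client fetched it.
    return {"Cache-Control": f"max-age={seconds}"}


def expiry_for_client(headers: dict, fetched_at: datetime) -> datetime:
    """When a copy fetched at `fetched_at` stops being fresh (simplified rule)."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            return fetched_at + timedelta(seconds=int(value))
    return parsedate_to_datetime(headers["Expires"])


if __name__ == "__main__":
    first = datetime(2012, 1, 1, tzinfo=timezone.utc)
    later = first + timedelta(hours=5)
    midnight = datetime(2012, 1, 2, tzinfo=timezone.utc)
    for headers in (expires_header(midnight), max_age_header(86400)):
        print(headers, expiry_for_client(headers, first), expiry_for_client(headers, later))
```

With Expires, two clients that fetched the component five hours apart both expire it at midnight and come back together.  With max-age=86400, their copies expire five hours apart, so their refetches reach the server at different times.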

Viewing Component Load Times

Paid Monitor’s Free Page Load Testing shows us which components were loaded, how long they took to load, and what they waited for before loading.  It provides the same information that is available through Paid Monitor’s Transaction Monitoring, but we can see the results immediately.  Paid Monitor’s Transaction Monitoring is for continual monitoring; the page load testing tool is for one-time use.

Page load testing and transaction monitoring provide information about page loads from Monitis’ computers worldwide, but they do not show the results from our end-users’ computers.  If an end-user gets different results, the causes are more likely local to that user’s machine or network.

Separate Static Content from Dynamic Content

If we separate static content from dynamic content, we can cache the static content and expire only the dynamic content, rather than expiring everything immediately.  JavaScript and CSS are both good candidates for this technique: moved out of the page into external files, they can be cached even when the page itself is not.
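On the server side, separating the two kinds of content usually comes down to sending different caching headers for each.  The sketch below is a hypothetical Python helper that picks headers by file extension; the extension list, the one-year lifetime for static files, and the choice of no-cache (store, but revalidate with the server before each reuse) for everything else are assumptions to adapt to a real site.

```python
from pathlib import PurePosixPath

# Assumed: files with these extensions are static and are renamed when changed.
STATIC_SUFFIXES = {".js", ".css", ".png", ".jpg", ".gif", ".ico"}
ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def cache_headers_for(path: str) -> dict:
    """Choose caching headers for a URL path (query string already removed)."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in STATIC_SUFFIXES:
        # Static component: let the browser keep it for a long time.
        return {"Cache-Control": f"max-age={ONE_YEAR}"}
    # Dynamic page: caches must revalidate it before each reuse.
    return {"Cache-Control": "no-cache"}


if __name__ == "__main__":
    for p in ("/static/site.css", "/scripts/app.js", "/cart.php"):
        print(p, cache_headers_for(p))
```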

Some Technical Gotchas

Here are some special situations to watch for:

  • HTTPS content cannot be cached by shared proxies, and some browsers will not cache it either.
  • If we are using .htaccess, Apache’s ExpiresActive directive must be on.
  • Some browsers will consider the page uncacheable if the Last-Modified header is missing.
  • Cookies can interfere with caching; many caches will not store a response that sets a cookie.
  • PHP pages are non-cacheable by default, so make sure to use PHP’s header() function to set an appropriate expiry.
  • If appropriate, use GET rather than POST, since responses to POST requests are generally not cached.
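The gotchas above can be turned into a rough checklist.  The hypothetical Python function below inspects one request method, URL, and set of response headers and returns a warning for each listed situation it can detect (the Apache and PHP items are configuration issues and show up here only as a missing expiry).  It is a lint-style sketch, not a complete implementation of the HTTP caching rules.

```python
def cacheability_warnings(method: str, url: str, headers: dict) -> list:
    """Flag common caching gotchas for one request/response pair.

    `headers` are the response headers; names are matched case-insensitively.
    """
    h = {name.lower(): value for name, value in headers.items()}
    warnings = []
    if method.upper() == "POST":
        warnings.append("POST responses are generally not cached; use GET where appropriate")
    if url.lower().startswith("https://"):
        warnings.append("HTTPS: shared proxies cannot cache it, and some browsers may not")
    if "last-modified" not in h:
        warnings.append("no Last-Modified header; some browsers treat this as uncacheable")
    if "set-cookie" in h:
        warnings.append("Set-Cookie present; many caches will not store this response")
    if "expires" not in h and "max-age" not in h.get("cache-control", "").lower():
        warnings.append("no explicit expiry (Expires or Cache-Control: max-age)")
    return warnings


if __name__ == "__main__":
    for w in cacheability_warnings("POST", "https://example.com/form", {"Set-Cookie": "id=1"}):
        print("-", w)
```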

Proxy Caching

Caching on client machines is a good idea, but it does nothing to help first-time visitors to a web page.  Client caching is only useful on subsequent visits.

As a response journeys from the server to the client, it hops from one machine to another until it reaches its destination.  Some of those in-between machines (proxies) are willing to cache our content if we ask them to.  First-time visitors may then not have to retrieve the content all the way from the server; it may be available on a closer machine.

This is simple enough to implement.  Just make sure the server includes the Cache-Control: public and Vary: Accept-Encoding headers in the response.  [The latter tells proxies that the response depends on the request’s Accept-Encoding header.  Without it, some proxies serve a cached compressed copy to a client that cannot handle compressed content.]
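As an illustration of those two headers, the hypothetical Python function below builds a response that proxies may cache and that may or may not be gzip-compressed.  The function name, the one-day default lifetime, and the simplified Accept-Encoding parsing (quality values such as gzip;q=0 are ignored) are all assumptions of this sketch.

```python
import gzip


def build_response(body: bytes, accept_encoding: str, max_age: int = 86400) -> tuple:
    """Return (headers, payload) for a proxy-cacheable response.

    Cache-Control: public lets intermediate caches store the response.
    Vary: Accept-Encoding tells them the stored copy depends on that request
    header, so a gzipped copy is not reused for a client that did not ask for gzip.
    """
    headers = {
        "Cache-Control": f"public, max-age={max_age}",
        "Vary": "Accept-Encoding",
    }
    # Simplified negotiation: take the coding names, ignore any ";q=" values.
    codings = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    if "gzip" in codings:
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(body)
    return headers, body


if __name__ == "__main__":
    headers, payload = build_response(b"<html>...</html>" * 100, "gzip, deflate")
    print(headers, len(payload))
```

Because every response carries Vary: Accept-Encoding, a proxy that has stored the compressed copy will reuse it only for later requests that send a matching Accept-Encoding header.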

References

Caching Tutorial for Web Authors and Webmasters by Mark Nottingham.  Published 2010.06.29 by Mark Nottingham at www.mnot.net/cache_docs.  Accessed 2011.12.14.

Paid Monitor Free Page Load Testing Tool.  Published by Paid Monitor at pageload.monitor.us.  Accessed 2011.12.20.

Paid Monitor Transaction Monitoring.  Published by Paid Monitor at portal.monitor.us/index.php/products/transactions-monitoring.  Accessed 2011.12.05.

Optimize Caching.  Published by Google at code.google.com/speed/page-speed/docs/caching.html.  Accessed 2011.12.14.  This is a detailed explanation of HTTP caching and a how-to tutorial.

Website Performance: Taxonomy of Tips by Warren Gaebel.  Published 2011.12.29 by Paid Monitor at blog.monitor.us/2011/12/website-performance-taxonomy-of-tips.  Accessed 2012.01.02.

YSlow Ruleset Matrix.  Published by Yahoo at developer.yahoo.com/yslow/matrix.  Accessed 2011.11.30.  Yahoo’s implicit rankings can be inferred from the second column of the table (weight).  The bigger the number, the more important Yahoo considers the tip to be.

About Warren Gaebel

Warren wrote his first computer program in 1970 (yes, it was Fortran).  He earned his Bachelor of Arts degree from the University of Waterloo and his Bachelor of Computer Science degree at the University of Windsor.  After a few years at IBM, he worked on a Master of Mathematics (Computer Science) degree at the University of Waterloo.  He decided to stay home to take care of his newborn son rather than complete that degree.  That decision cost him his career, but he would gladly make the same decision again. Warren is now retired, but he finds it hard to do nothing, so he writes web performance articles for the Monitor.Us blog.  Life is good!