- Part II – the client side (sending the request)
- Part III (this article) – the journey from client to server
- Part IV – the server side (up and down the protocol stack)
- Part V – the server side script
- Part VI – the database management system
- Part VII – the journey from the server to the client
- Part VIII – up the protocol stack
- Part IX – the client-side script
- Part X – the Document Object Model
- Part XI – after the document is complete
- Part XII – parallelism
- Part XIII – wrap-up; best practices
A Hop, Skip, and Jump
Performance Consideration: All other things being equal, some machines are faster than others. For example, a layer 2 switch is faster than a layer 3 switch, which is faster than a router, which is faster than a general-purpose computer. However, this can be misleading because these devices serve different purposes, so other things are usually not equal. A switch works within a local area network and knows nothing about the Internet. A router connects one network to another, so it is very Internet-aware. A computer can function as a switch or router, but it should be used that way only if it also has non-switching, non-routing functions to perform. Whether switch, router, or computer, each machine’s configuration can dramatically affect performance.
Up The Stack
How does the network layer know where to send the IP datagrams next? That’s easy. It just looks it up in its internal routing tables. But that really doesn’t answer the question, does it? We really want to know how the routing information gets into the routing tables.
Adaptive (dynamic) routing constructs routing tables on the fly. Routers can use several algorithms for this purpose, so let’s simplify: neighbouring machines tell each other what’s in their routing tables and how long it will take to deliver data to various destinations. Based on the destination IP address and what its neighbours say, the network layer decides which neighbouring machine should receive the current packet.
If routers go offline, their neighbours find out about it rather quickly. They then recompute where to send data and tell their neighbours, who tell their neighbours, who tell their neighbours, and so on.
Examples of adaptive routing protocols: Routing Information Protocol (RIP), which is a distance-vector protocol, and Open Shortest Path First (OSPF), which is a link-state protocol.
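To make the "neighbours tell each other" idea concrete, here is a minimal sketch of a distance-vector update, the general approach behind RIP. It is not an implementation of RIP itself; the table layout and the names merge_advertisement and next_hop are assumptions made for illustration.

```python
from typing import Dict, Tuple

# destination -> (cost, next hop). This layout is hypothetical.
RoutingTable = Dict[str, Tuple[float, str]]


def merge_advertisement(table: RoutingTable, neighbour: str,
                        link_cost: float, advertised: Dict[str, float]) -> bool:
    """Fold one neighbour's advertised costs into our own table.

    Returns True if any route changed, meaning we should tell our neighbours.
    """
    changed = False
    for destination, neighbour_cost in advertised.items():
        candidate = link_cost + neighbour_cost
        current = table.get(destination)
        # Adopt the route if it is new, cheaper, or if our existing route
        # already goes through this neighbour and its cost has changed.
        if (current is None or candidate < current[0]
                or (current[1] == neighbour and candidate != current[0])):
            table[destination] = (candidate, neighbour)
            changed = True
    return changed


def next_hop(table: RoutingTable, destination: str) -> str:
    """Return the neighbour that should receive a packet for this destination."""
    return table[destination][1]


# We can reach neighbours A and B directly, at costs 1 and 4.
table: RoutingTable = {"A": (1.0, "A"), "B": (4.0, "B")}
# Neighbour A reports that it can reach B at cost 1 and C at cost 2.
changed = merge_advertisement(table, "A", 1.0, {"B": 1.0, "C": 2.0})
```

In this example the route to B switches to go through A because 1 + 1 is cheaper than the direct cost of 4, and C becomes reachable for the first time. Real protocols add timers, hop limits, and loop protection that this sketch leaves out.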
Performance Consideration: Routing can impact performance by sending data along a circuitous or indirect route, so routing should be monitored on an ongoing basis. This is called “route analytics.” Many tools and techniques exist to help with this task. For example, reliable, free monitoring is available from Monitor.Us.
Performance Consideration: A smaller number of hops will most likely improve performance. If your end users are geographically localized, place a direct-connect router near them. This router can pass data to your server with just one hop, and you have direct control over the performance characteristics of that router. Of course, it would be better to place the server geographically close to the end users, but that’s not always possible.
Performance Consideration: Configure the routers you control. Most of the routers your webapp uses are not under your control, so there’s not much you can do about them. However, don’t ignore the ones that are under your control. Make sure they are part of the solution rather than part of the problem. Example: Make sure your routers determine route costs according to live ping data. The Internet changes quickly; so should your routers’ cost calculations.
Performance Consideration: Turn caching on by specifying Last-Modified, Expires, and Cache-Control: max-age in the HTTP headers of every resource. [Expires and max-age do the same thing. If both are present, most browsers and caches ignore Expires and use max-age. Expires is used only by those few clients that don't understand max-age. Include both, but make sure they expire the component at the same point in time.]
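One way to keep the two expiry headers in agreement is to derive both from a single lifetime value. The sketch below assumes a hypothetical helper named caching_headers that returns the header lines as a dictionary; how they are attached to a response depends on your server.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict


def caching_headers(last_modified: datetime, now: datetime,
                    max_age_seconds: int) -> Dict[str, str]:
    """Build caching headers whose Expires and max-age agree.

    Both datetimes must be timezone-aware UTC values.
    """
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"max-age={max_age_seconds}",
    }


headers = caching_headers(
    last_modified=datetime(2023, 12, 25, 8, 30, tzinfo=timezone.utc),
    now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    max_age_seconds=3600,
)
```

Because Expires is computed from the same max_age_seconds value, the two headers cannot drift apart when the lifetime is changed later.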
Performance Consideration: Turn proxy caching on by specifying cache-control:public in the HTTP headers of every resource. Don’t let security concerns dissuade you. Deal with them by encrypting the data.
Performance Consideration: Expire components as far into the future as possible, even if they will be changed in the near future. To change a component before its expiry date, change its name. Example: If the company’s logo (logo.png) changes, but the expiry date is still far in the future, put the new logo in a file named yyyymmdd.logo.png (where yyyymmdd is the date of the change). Don’t forget to change all the <img> src attributes to the new name. [Some people use a PHP function to get the logo's URL - when they change the logo, src is changed inside the PHP function, but there is no need to change it anywhere else.]
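The bracketed note describes a single function that hands out the current URL. The sketch below shows that same idea in Python rather than PHP; the LAST_CHANGED table and the versioned_name function are hypothetical names chosen for illustration.

```python
from datetime import date

# Hypothetical lookup: component name -> date of its most recent change.
LAST_CHANGED = {"logo.png": date(2024, 3, 15)}


def versioned_name(component: str) -> str:
    """Return the yyyymmdd-prefixed file name for a component.

    Components with no recorded change keep their plain name.
    """
    changed = LAST_CHANGED.get(component)
    if changed is None:
        return component
    return f"{changed:%Y%m%d}.{component}"


src = versioned_name("logo.png")
img_tag = f'<img src="{src}">'
```

When the logo changes, only the date in LAST_CHANGED needs updating; every page that builds its src through versioned_name picks up the new name.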
Performance Consideration: Wouldn’t it be nice if proxy servers would update their cached components at a low priority but on a regular basis before anyone requested them? Too much of this would clog up the Internet, but judicious use would almost guarantee that current components were waiting in the proxy’s cache at all times.
Performance Consideration: If caching interferes with your statistics, piggyback your stats program onto a dynamic component that cannot be cached. If no such component exists, put a reference to logAHit.php at the top of your HTML, then create logAHit.php to log whatever statistical information you need. This program must be optimized, must produce no output, and should have these four lines at the beginning:
header("HTTP/1.1 204 No Content");
header("Cache-Control: private, no-cache");
header("Expires: Fri, 30 Oct 1998 14:19:41 GMT");
flush();
If the proxy server finds an expired copy of the requested component in its cache, it asks the web server whether the expired copy is still good (down the protocol stack, through the Internet, and up the web server’s protocol stack). The web server validates the cached copy by comparing its dates and perhaps other metadata to the most recent version.
Performance Consideration: The only way to avoid communication between the proxy server and the web server is to specify Expires and Cache-Control: max-age in the HTTP headers, so that the cached copy is still unexpired when it is requested.
The web server may determine that the cached copy, although expired, is still identical to the most recent version. If so, the web server tells the proxy server (down the stack, through the Internet, and up the proxy server’s stack). In this case, the web server does not resend the component; it sends only a few header lines (an HTTP 304 Not Modified response), which likely fit into a single packet. Although there is still a performance hit for this service, validating a cached web page is faster than downloading it again because fewer packets are transmitted in the response.
The web server may determine that the cached copy is different from the most recent version. If so, it notifies the proxy server and sends it the most recent version. To minimize back-and-forth traffic, the web server sends the updated copy without requiring another, separate request for it. The response to the “is it current” question contains the “no, it’s not” answer and the fresh copy of the component. The two are packaged together into one HTTP response. Then it’s down the protocol stack, through the Internet, and up the proxy server’s protocol stack. Upon receipt, the proxy server discards the old copy and puts the new one into its internal cache.
The proxy server sends its cached copy, which may have just been updated, to the browser (down the protocol stack, through the Internet, and up the client machine’s protocol stack).
It’s easy to get lost in all those details. Here are the main points to remember:
- if proxy server has an unexpired copy, zero packets sent to/from the web server
- if proxy server has an expired copy, ask the server if it’s fresh (one packet sent)
- if web server sees that it is fresh, tell the proxy server that it is fresh (one packet sent)
- if web server sees that it is not fresh, send the component to the proxy server
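The decision flow summarized above can be sketched as a small simulation. This is a simplified model, not how any particular proxy is implemented: the CachedCopy and WebServer classes and the serve function are hypothetical, times are plain numbers, and validation compares only the last-modified date.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CachedCopy:
    body: bytes
    last_modified: float  # when the web server last changed the component
    expires_at: float     # when this copy stops being fresh


class WebServer:
    """Hypothetical stand-in for the origin web server."""

    def __init__(self, body: bytes, last_modified: float, max_age: float):
        self.body = body
        self.last_modified = last_modified
        self.max_age = max_age

    def respond(self, if_modified_since: Optional[float]) -> Tuple[int, Optional[bytes]]:
        # Unchanged since the proxy's copy was made: send headers only.
        if if_modified_since is not None and self.last_modified <= if_modified_since:
            return 304, None
        # Otherwise the answer and the fresh copy travel in one response.
        return 200, self.body


def serve(cache: Dict[str, CachedCopy], url: str, server: WebServer,
          now: float) -> Tuple[bytes, str]:
    copy = cache.get(url)
    if copy is not None and now < copy.expires_at:
        return copy.body, "fresh hit"  # nothing is sent to the web server
    since = copy.last_modified if copy is not None else None
    status, body = server.respond(since)
    if status == 304 and copy is not None:
        copy.expires_at = now + server.max_age  # the expired copy is still good
        return copy.body, "validated"
    assert body is not None
    cache[url] = CachedCopy(body, server.last_modified, now + server.max_age)
    return body, "downloaded"


server = WebServer(b"v1", last_modified=100.0, max_age=60.0)
cache: Dict[str, CachedCopy] = {}
outcomes = []
for now in (1000.0, 1030.0, 1100.0):
    outcomes.append(serve(cache, "/logo.png", server, now)[1])
# The component changes on the web server, then is requested again.
server.body, server.last_modified = b"v2", 1150.0
outcomes.append(serve(cache, "/logo.png", server, 1200.0)[1])
```

The four requests exercise each case in turn: an empty cache, an unexpired copy, an expired but unchanged copy, and an expired copy that the web server has since replaced.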
Performance Consideration: The following interfere with caching, so avoid them whenever possible:
- HTTP authentication that doesn’t use Cache-Control: public, no-cache
- SSL encryption (the HTTPS protocol)
- forms that use the POST method
- server-side scripts that don’t set cache control headers
Performance Consideration: Cookies interfere with caching. Store cacheable components in a directory that has never used cookies (the ancestor directories must be cookieless, too). [Putting cacheable content on a separate cookieless domain works, too. Serving from two or three domains also increases parallel downloading, but balance the load between the domains.]
Performance Consideration: Avoid SSL. It interferes with caching. Example: If an SSL-secured page contains the company logo, which is available in unsecured form on other pages, do not serve the logo through an SSL-secured connection.
Performance Consideration: Always refer to components the same way in all web pages. Although it may be possible to refer to a component through several different URLs, use one and only one of those URLs throughout the entire website. If you use more than one URL, the caches will have to store the component once for each URL rather than just once.
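A rough way to audit a site for this problem is to group the URLs your pages reference by the content they return. The sketch below assumes the responses have already been fetched into a dictionary; the function name urls_sharing_content and the example.com URLs are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def urls_sharing_content(fetched: Dict[str, bytes]) -> List[List[str]]:
    """Group URLs whose response bodies are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for url, body in fetched.items():
        by_digest[hashlib.sha256(body).hexdigest()].append(url)
    # Only groups with more than one URL indicate a duplicated component.
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]


fetched = {
    "http://example.com/logo.png": b"PNG-bytes",
    "http://www.example.com/images/../logo.png": b"PNG-bytes",
    "http://example.com/style.css": b"body {}",
}
duplicates = urls_sharing_content(fetched)
```

Each group in the result is a component that caches would store more than once; picking one URL per group and using it everywhere removes the duplication.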
Performance Consideration: Working with dynamic content: Ask yourself how stale the dynamic content can be without causing a problem. Perhaps you will find that some dynamic content can be a few minutes, hours, or even days old. If so, create a low-priority background process to store the latest dynamic content as static content. Run this process every few minutes, hours, or days, depending on your needs. Serve the static copy of the dynamic content and configure the HTTP headers to cache it for the longest time that is appropriate. [It may be more practical to regenerate the page every time the dynamic content changes. You decide.]
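The background process described here could look something like the following sketch. It assumes a hypothetical render callable that produces the dynamic content and a function named refresh_snapshot that a scheduler (cron, for example) calls periodically.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Callable, Optional


def refresh_snapshot(path: Path, render: Callable[[], str],
                     max_staleness_seconds: float,
                     now: Optional[float] = None) -> bool:
    """Rewrite the static copy if it is missing or too old.

    Returns True when the file was regenerated.
    """
    now = time.time() if now is None else now
    if path.exists() and now - path.stat().st_mtime < max_staleness_seconds:
        return False  # the existing copy is still fresh enough
    temporary = path.parent / (path.name + ".tmp")
    temporary.write_text(render(), encoding="utf-8")
    # Swap the file into place so readers never see a half-written page.
    os.replace(temporary, path)
    return True


with tempfile.TemporaryDirectory() as folder:
    page = Path(folder) / "report.html"
    first = refresh_snapshot(page, lambda: "<p>run 1</p>", 300)
    second = refresh_snapshot(page, lambda: "<p>run 2</p>", 300)
    content_after_second = page.read_text(encoding="utf-8")
    # Pretend ten minutes have passed so the copy counts as stale.
    third = refresh_snapshot(page, lambda: "<p>run 3</p>", 300,
                             now=time.time() + 600)
    content_after_third = page.read_text(encoding="utf-8")
```

The second call is skipped because the snapshot is less than 300 seconds old, which is the staleness budget you decided the content could tolerate. The web server then serves report.html as an ordinary static, cacheable file.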
Performance Consideration: In some cases, redesigning a dynamic component can make it (or much of it) cacheable. There’s more than one way to skin a cat.
Performance Consideration: During the transfer to production, copy only the changed files to the production hierarchy. Copying unchanged files alters the last-modified date, which will needlessly refresh caches during the validation process. There’s no need to download the entire site if only one or two components have changed.
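A deployment step that follows this advice might compare file contents before copying. The sketch below is one possible approach, with a hypothetical function name (deploy_changed) and directory names; it handles new and changed files only and does not delete files that were removed from staging.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path
from typing import List


def deploy_changed(staging: Path, production: Path) -> List[str]:
    """Copy only new or changed files; return their relative paths."""
    copied = []
    for source in sorted(staging.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(staging)
        target = production / relative
        # shallow=False compares the bytes, not just size and timestamp.
        if target.exists() and filecmp.cmp(source, target, shallow=False):
            continue  # unchanged: leave the production file and its date alone
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        copied.append(relative.as_posix())
    return copied


with tempfile.TemporaryDirectory() as folder:
    staging, production = Path(folder, "staging"), Path(folder, "production")
    for root in (staging, production):
        root.mkdir()
        (root / "same.css").write_text("body {}", encoding="utf-8")
    (staging / "logo.txt").write_text("new", encoding="utf-8")
    (production / "logo.txt").write_text("old", encoding="utf-8")
    untouched_before = (production / "same.css").stat().st_mtime_ns
    copied = deploy_changed(staging, production)
    untouched_after = (production / "same.css").stat().st_mtime_ns
    logo_now = (production / "logo.txt").read_text(encoding="utf-8")
```

Only logo.txt is copied; same.css keeps its original last-modified date, so caches that validate it will be told their copy is still good.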
Performance Consideration: If you use HTTP authentication, set the Cache-Control header to public, no-cache. This allows caching, but requires the cache to revalidate with the server, which re-authenticates the request, before the page can be released to the end user.