As with pretty well every other stage in this process, the web server performs better if its environment is configured to make optimized use of its resources, and if those resources are plentiful. Example: Swapping and paging are never appropriate on a machine that hosts a web server.
All other software installed on the web server’s machine will compete with it for available resources, even if the software is not running. Install the web server on its own dedicated machine.
If you buy the cheapest possible equipment, you’ll get what you paid for.
Buy an Internet service with a high outgoing:incoming bandwidth ratio. Web servers send much more data than they receive, so we need much more outgoing bandwidth than incoming. Note that typical home services are not appropriate because they have a very low outgoing:incoming bandwidth ratio (because typical home users download much more than they upload).
Apache is used in the examples below to illustrate concepts that apply to most web servers. The syntax on other servers will be different, but the concepts in most cases will be similar.
Specify a complete list of filenames with DirectoryIndex instead of using a wildcard specification. List the filenames in order, with the most commonly used first.
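As a rough illustration, an explicit list might look like the sketch below. The filenames are assumptions chosen for the example; substitute whichever index files the site actually uses.

```apache
# Explicit list: Apache tries each name in order and stops at the first match.
# (Filenames are illustrative.)
DirectoryIndex index.html index.php index.htm

# Wildcard-style form to avoid, because it relies on content negotiation
# to find a matching file:
# DirectoryIndex index
```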
RAID (Redundant Array of Inexpensive Disks) allows us to stripe data across multiple drives, which gives a performance boost. RAID levels that add mirroring or parity can also be used to improve data integrity.
Use separate servers for static content and dynamic content. This lets us have separate configurations for the two different types of content.
Modularization & Dynamic Configuration
Apache also allows dynamic configuration through .htaccess files. Each .htaccess file applies to the directory in which it is found and to all of its subdirectories. Example: If the web server is serving a resource from a directory named x, which is a subdirectory of y, which is a subdirectory of z, which is a subdirectory of docroot, it must look for and parse .htaccess files in all four of those directories. If it finds a BrowserMatch directive in both y and docroot, it uses the directive in y and ignores the directive in docroot.
Note the number of disk accesses required to check all those ancestor directories. Avoid .htaccess whenever possible; use the Directory directive in httpd.conf instead. [This may not be possible if the web server is hosted by a third party.]
If you must use .htaccess, use it as close to docroot as possible. Every additional level is another disk access.
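A minimal sketch of moving .htaccess settings into httpd.conf might look like this. The filesystem paths and the Options setting are hypothetical stand-ins for whatever the .htaccess file actually contained.

```apache
# Tell Apache not to look for .htaccess files anywhere under docroot.
# (/var/www/docroot is an assumed path.)
<Directory "/var/www/docroot">
    AllowOverride None
</Directory>

# Settings that would otherwise sit in /var/www/docroot/z/y/.htaccess
<Directory "/var/www/docroot/z/y">
    Options -Indexes
</Directory>
```

Because httpd.conf is read once at startup, these settings cost nothing per request, whereas .htaccess files are checked on every request.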
Check out the performance tips on Apache’s performance tuning page.
Virtual and Per-User Hosting
Virtual hosting creates multiple websites for multiple domains. A domain may share its IP address and/or docroot with another domain or have exclusive access to its own IP address and/or docroot. Example: Web hosting companies use virtual hosting for most of their clients. Their clients usually share IP addresses, but do not share docroots.
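The shared-IP, separate-docroot arrangement could be sketched as follows. The domain names and paths are placeholders, not taken from any real configuration.

```apache
# Two name-based virtual hosts sharing one IP address and port.
# Apache 2.2 also needs "NameVirtualHost *:80"; Apache 2.4 does not.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/var/www/example-com"
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example.org
    DocumentRoot "/var/www/example-org"
</VirtualHost>
```

Apache picks the virtual host by comparing the request's Host header against each ServerName.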
If websites share an IP address, those websites will not be able to serve resources by IP address; they can serve resources by domain name only. So what? Resources served by IP address avoid DNS lookups and cookie processing. IP addresses should be used instead of domain names whenever possible.
In addition to virtual hosting, per-user hosting allows each user account to host his own website. For example, if user ted on a machine named yellow.example.com wants to host his own website, he can put his files in his public_html directory and access them at http://yellow.example.com/~ted/. This technique is fading away as we trend toward dedicated servers that have only root and webadmin users.
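For reference, per-user hosting of this kind is typically enabled with mod_userdir. The sketch below assumes that module is already loaded.

```apache
# Map http://yellow.example.com/~ted/ to ~ted/public_html/
UserDir public_html

# Never serve a personal site for the root account.
UserDir disabled root
```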
Concurrency, Threads, & Processes
Apache manages concurrency with pluggable concurrency modules (also called multiprocessing modules or MPMs).
MPMs are specific to an environment, so we can’t choose one that’s not built for our web server’s platform. Of the remaining choices, though, we must realize that our choice can affect the speed and scalability of the web server. If scalability or performance is important, we can choose something like worker or event, which have multiple threads per process. If compatibility with older software is important, prefork may be more suitable, but it has only one thread per process. [Other MPMs are available from Apache and from third parties.]
Limit the resources given to each connection so no connection can starve the other connections.
Use MaxClients (renamed MaxRequestWorkers in Apache 2.4) to set the maximum number of simultaneous connections.
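As an illustration only, a threaded MPM might be capped as shown below. The numbers are assumptions for the sake of the example and would need tuning against the machine's memory and real traffic.

```apache
# Applies only when the event MPM is the one in use.
<IfModule mpm_event_module>
    ServerLimit          16     # maximum number of child processes
    ThreadsPerChild      25     # threads in each child process
    # Must not exceed ServerLimit x ThreadsPerChild (16 x 25 = 400).
    # On Apache 2.2 the same setting is called MaxClients.
    MaxRequestWorkers   400
</IfModule>
```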
Limiting Resource Usage
A server administrator might want to use these techniques to guard against denial of service attacks, spam, or users who hog the server with multiple downloads.
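A few of the core directives that cap what a single request can consume are sketched here. The values are illustrative assumptions, not recommendations.

```apache
# Give up on a stalled client after 30 seconds.
Timeout 30

# Reject request bodies larger than 1 MB (value is in bytes).
LimitRequestBody 1048576

# Cap the number and size of request header fields.
LimitRequestFields 50
LimitRequestFieldSize 8190
```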
Reverse IP Lookups
DNS lookups journey through the Internet, so they are slow (and can be very slow). Avoid this traffic by always turning HostnameLookups off. However, note that a double reverse lookup is still performed when mod_authz_host controls access by hostname, even if we turn HostnameLookups off.
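In httpd.conf this is a single line:

```apache
# Log client IP addresses as-is instead of resolving them to hostnames.
# If hostnames are needed in reports, the log files can be resolved
# offline later (for example with Apache's logresolve utility).
HostnameLookups Off
```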
Authentication, Authorization, & Access Control
Apache provides several techniques for protecting directories and files:
- Place the directories and files outside the docroot hierarchy if no one should ever access them from a web browser.
- Use Order, Allow, and Deny directives within the scope of a Files or Directory directive to allow/deny access by domain, IP address, or any HTTP header (see SetEnvIf).
- Restrict access by userid/password. [Instructions]
Avoid DNS lookups by using IP addresses instead of domain names whenever possible in Allow and Deny directives.
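The techniques above might be sketched as follows. The directory paths, the address range, and the password file location are all hypothetical.

```apache
# Allow one address range only, using IP addresses so no DNS lookup is needed.
# Order/Allow/Deny is the Apache 2.2 syntax; in Apache 2.4 the equivalent
# is "Require ip 192.0.2.0/24".
<Directory "/var/www/docroot/private">
    Order Deny,Allow
    Deny from all
    Allow from 192.0.2.0/24
</Directory>

# Restrict a directory by userid/password (HTTP Basic authentication).
<Directory "/var/www/docroot/members">
    AuthType Basic
    AuthName "Members"
    AuthUserFile "/etc/apache2/.htpasswd"
    Require valid-user
</Directory>
```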
Redirection & Rewriting
Avoid redirection and missing resources. They result from a bad request coming from one of our web pages, so they can be avoided by putting the correct link into the web page.
If the web server gets past the above process, it then follows a set of rewrite rules supplied by the website developers. These rules map the URL to a file on the local machine. When the web server is finished with the rewrite rules, it has a specific file name within the docroot hierarchy. The path and filename may look nothing like the original URL.
Avoid rewriting. If the URL hierarchy is the same as the file hierarchy on the hard disk, rewriting can be bypassed.
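To make the cost concrete, here is a sketch of the kind of rule being discussed. The URL pattern and script name are invented for illustration.

```apache
RewriteEngine On

# /products/42 is served by /catalog/item.php?id=42, a path that looks
# nothing like the original URL. The pattern is evaluated on every request.
RewriteRule "^/products/([0-9]+)$" "/catalog/item.php?id=$1" [L]
```

If the files were simply laid out on disk the way the URLs read, no rule (and no per-request pattern matching) would be needed.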
Server-Side Includes (SSI)
SSI can be slow if configured poorly, which is easily done. We might be better off with a server-side scripting language (e.g., PHP) or XMLHttpRequest from the client side.
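One common way to limit the cost, sketched here under the assumption that SSI pages can be given their own extension and directory, is to parse only the files that actually need it:

```apache
# Parse only .shtml files for SSI directives, and only in this directory.
# (The path is a placeholder.)
<Directory "/var/www/docroot/ssi">
    Options +Includes
    AddType text/html .shtml
    AddOutputFilter INCLUDES .shtml
</Directory>
```

Every other file is then served without being scanned for include directives.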
The browser can prioritize its list, which makes the server’s choice a little easier, but content negotiation is not as simple as comparing the browser’s list to the server’s list and selecting the best choice from the intersection set.
- What if there is no intersection set between the browser’s list and the server’s list?
- What if the browser does not provide a list?
Type maps and multiviews make it possible to specify different files for different media types, languages, and encodings. The URL need only name the resource in a generic way; the server will choose the right file based on the HTTP accept-… headers.
Example: If the URL is http://example.com/navbar, the server may have files named navbar.en.php, navbar.es.php, and navbar.fr.php. It knows which one to use by inspecting the accept-language header from the request.
Type maps (.var files) list resources; the files that contain the resources; and the media types, languages, and encodings of those files. They were not originally intended to be a handy, easy-to-use documentation tool; they just turned out that way. Type maps can help with both debugging and ongoing maintenance.
Content can be in its own file or included within its type map file.
Placing the content into the type map file instead of in its own, separate file will save the web server one file-open operation. [It also groups all the files for a single resource together in one place.]
Content negotiation negatively impacts performance. Turn it off. However, if you decide that the benefits of content negotiation outweigh the performance hit, use type maps instead of multiviews. This will eliminate extra disk accesses.
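For the navbar example, a type map might look roughly like the sketch below. The file name navbar.var is an assumption, and the map takes effect only if httpd.conf associates the extension with the handler (AddHandler type-map .var).

```apache
# navbar.var (hypothetical type map for the navbar example)
URI: navbar

URI: navbar.en.php
Content-type: text/html
Content-language: en

URI: navbar.es.php
Content-type: text/html
Content-language: es

URI: navbar.fr.php
Content-type: text/html
Content-language: fr
```

Because every variant is listed explicitly, the server does not have to scan the directory for candidate files the way it does with multiviews.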
Retrieve the Content
Fix programs that spawn a large number of processes, don’t release connections, or have memory leaks. Add more memory if you must, but it’s better to locate and fix the code that causes the problem.
Beware potential performance problems with dynamic output filtering. “Dynamic” means it happens while the user is waiting for a response – and that’s not a good time to be doing anything.
Create the HTTP Response
Make sure the web server always includes the Last-Modified, Expires, Cache-Control, Content-Length, and Content-Type headers. Caching can be affected if these headers are missing.
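For static files Apache normally supplies Last-Modified and Content-Length on its own. Expires and Cache-Control can be added with mod_expires, as in this sketch, where the lifetimes are illustrative assumptions.

```apache
<IfModule mod_expires.c>
    ExpiresActive On
    # mod_expires sets both Expires and Cache-Control: max-age.
    ExpiresDefault "access plus 1 hour"
    ExpiresByType image/png "access plus 1 month"
</IfModule>

# Make sure an extension maps to the right Content-Type.
AddType image/svg+xml .svg
```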
Only encrypt when you need to. Avoid the extra processing time and SSL handshaking.
Web Server Caching
To keep or not to keep is decided on the basis of configuration options and available memory. How long to keep it for is decided the same way. Which cached content to dispose of first when memory fills up is decided by a simple LRU (least recently used) algorithm.
In-memory caching uses algorithms and protocols similar to those of the proxy caching that we discussed in part 3. One difference to keep in mind, though, is that UseCanonicalName should be turned on if you are using virtual hosting.
Although the web server’s caching abilities are good, the operating system’s file/disk caching abilities are probably better. Make sure the operating system is configured to make the best possible use of file/disk caching.
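As a version-specific sketch, in-memory caching on Apache 2.2 was configured through mod_mem_cache, which was removed in Apache 2.4. The sizes below are assumptions for illustration.

```apache
# Apache 2.2 only: mod_mem_cache does not exist in Apache 2.4.
<IfModule mod_mem_cache.c>
    CacheEnable mem /
    MCacheSize 65536               # total cache size, in KBytes (64 MB)
    MCacheMaxObjectCount 1000      # maximum number of cached objects
    MCacheRemovalAlgorithm LRU     # evict least recently used first
</IfModule>

# Use the configured ServerName when constructing self-referential URLs.
UseCanonicalName On
```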
Stick to CLF (Common Log Format). There are a good number of log-analysis tools out there, but most of them assume we are using CLF.
Input to and output from CGI scripts can also be logged.
Logging input to and output from CGI scripts can be handy in a test environment, but never use it in production.
All logging impairs performance. If performance were our only concern, we would turn it off completely. However, because we must also consider the need for information during problem determination, we leave minimal logging turned on in production environments. The tradeoff is very real, so we need to give it some thought rather than blindly accept the default settings.
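A minimal production logging setup along these lines might look like the following. The log file paths are placeholders.

```apache
# Common Log Format, as expected by most log-analysis tools.
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog "logs/access_log" common

# Keep the error log to warnings and above.
LogLevel warn

# CGI input/output logging: test environments only, never production.
# ScriptLog "logs/cgi_log"
```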
Close the Connection
After serving some content, Apache will wait for another request without closing the connection. However, it will not wait forever. The KeepAliveTimeout configuration directive specifies how long to wait.
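The relevant directives are shown below with illustrative values. The right timeout depends on how quickly a typical page's follow-up requests arrive.

```apache
# Reuse one connection for several requests.
KeepAlive On

# Serve at most this many requests per connection.
MaxKeepAliveRequests 100

# Seconds to wait for the next request before closing the connection.
KeepAliveTimeout 5
```

A long timeout ties up a worker that could be serving someone else, while a very short one forces clients to reconnect more often.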
A modern web server is far different from a simple file server. The selection and configuration of its features can dramatically impact performance. If a third party hosts your web server, you may not have enough control over its configuration to ensure acceptable performance.
The whole series consists of 15 parts:
- Part I – an overview of the entire process from beginning to end
- Part II – down the protocol stack (client side)
- Part III – the journey from client to server
- Part IV – up the protocol stack (server side)
- Part V (this one) – the web server (software)
- Part VI – the server side script
- Part VII – the database management system
- Part VIII – down the protocol stack (server side)
- Part IX – the journey from server to client
- Part X – up the protocol stack (client side)
- Part XI – the client-side script
- Part XII – the Document Object Model
- Part XIII – after the document is complete
- Part XIV – parallelism
- Part XV – wrap-up; best practices