
Website Performance: Accessing the Database

This is the third of four articles about Database Management System (DBMS) performance. The first article presented an overview and some installation tips. The second talked about building the database’s tables and indexes. Parts three and four get into the meat of accessing the database, with non-SQL tips in the article you are reading now and SQL tips in the fourth article.

Website Performance: Taxonomy of Tips introduced a classification scheme to help us organize the many performance tips found on the Internet. Database Management Systems fall into category #3.2 (the-server-side-script-accesses-a-service).

Connections

Opening and closing database connections is time-consuming, so reusing connections is an easy way to access data faster. Whether the DBMS documentation refers to “persistent connections,” “connection pooling,” or some other term, we’re really just talking about reusing existing connections.

In PHP, for example, we can use mysql_pconnect() rather than mysql_connect() to connect to a MySQL database. That’s really all there is to it. We are now reusing connections when they are available.
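A minimal sketch, assuming PHP’s classic mysql extension (current when this was written); the host and credentials are placeholders:

    <?php
    // Persistent connection: PHP reuses an existing connection to this
    // host/user pair if one is available; otherwise it opens a new one.
    // Host and credentials are placeholders.
    $link = mysql_pconnect('db.example.com', 'webuser', 'secret');
    if ($link === false) {
        die('Could not connect: ' . mysql_error());
    }

    // mysql_connect() would open a fresh connection every time instead:
    // $link = mysql_connect('db.example.com', 'webuser', 'secret');
    ?>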

A word to the wise: Always close connections to guard against leakage. Never trust a garbage collector to do it for you.

Another word to the wise: Always close connections as soon as possible to maximize their reuse. Open the connection, get the data, close the connection, then process the data. Do not hold a connection open while processing data.

Compression

Compress data that is sent from the DBMS to the server-side script. The DBMS may implement compression by default, but if not, change the configuration. Both the DBMS server and its client (the server-side script in this case) need to agree on compression, so the script needs to request it. In PHP, for example, the request is made by setting the MYSQL_CLIENT_COMPRESS flag in the mysql_pconnect() function call.
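In sketch form, with the same placeholder credentials as above:

    <?php
    // Ask for compression on traffic between this script and the DBMS.
    // Passing MYSQL_CLIENT_COMPRESS as the client-flags argument is the
    // script's side of the agreement; the server must allow it too.
    $link = mysql_pconnect('db.example.com', 'webuser', 'secret',
                           MYSQL_CLIENT_COMPRESS);
    if ($link === false) {
        die('Could not connect: ' . mysql_error());
    }
    ?>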

Locking

The DBMS must implement some kind of locking mechanism if it allows simultaneous access by multiple queries. However, locking can lead to deadlocks and starvation. A deadlock occurs when two requests are waiting for each other to release a lock and neither does. Starvation occurs when a request waits too long for other requests to finish, which can result from excessive locking.

Although necessary, locking negatively impacts DBMS performance.

Copy transactionally busy tables to read-only tables, then write to the originals and read from the copies. The frequent locking and unlocking on the original tables will not affect SELECTs because they read from the copies. [It is important to note, though, that a copy quickly becomes out of date. This may be acceptable for a while, but we will also need a process that copies changed/new rows frequently enough that the out-of-date data is not a problem.]
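One way to refresh such a copy is a periodic rebuild-and-swap, sketched below with illustrative table names (orders and orders_copy) and placeholder credentials. A cron job could run it on whatever schedule keeps the staleness acceptable:

    <?php
    // Rebuild the read-only copy, then swap it into place with an atomic
    // RENAME so readers never see a half-built table. All table names
    // are illustrative assumptions.
    mysql_pconnect('db.example.com', 'webuser', 'secret');
    mysql_select_db('appdb');

    mysql_query("DROP TABLE IF EXISTS orders_copy_new");
    mysql_query("CREATE TABLE orders_copy_new AS SELECT * FROM orders");
    mysql_query("RENAME TABLE orders_copy TO orders_copy_old,
                              orders_copy_new TO orders_copy");
    mysql_query("DROP TABLE orders_copy_old");
    ?>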

Deleting a large number of rows with a single DELETE statement can delay other queries if the DELETE locks the table until it is finished. However, if our DBMS provides a LIMIT clause, we can delete a few rows at a time rather than all at once, which allows other queries to execute in between the DELETEs. We can put the DELETE statement inside an iteration structure and repeat until there are no more rows left to delete. Including a short pause between DELETEs gives other queries an opportunity to execute without starvation.
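A sketch of the iteration, assuming MySQL, an illustrative log_entries table, and a batch size of 1,000:

    <?php
    // Delete in small batches so each DELETE holds its locks only
    // briefly; other queries get to run between batches.
    mysql_pconnect('db.example.com', 'webuser', 'secret');
    mysql_select_db('appdb');

    do {
        mysql_query("DELETE FROM log_entries
                     WHERE created < '2011-01-01'
                     LIMIT 1000");
        $deleted = mysql_affected_rows();
        usleep(50000);   // 50ms pause so waiting queries are not starved
    } while ($deleted > 0);
    ?>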

Always release locks as soon as possible to allow other processes access to the data. Prepare the data, lock, access the database, unlock, then process. Do not hold a lock while manipulating the data.
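The ordering matters more than the syntax, but here it is as a sketch (the table and query are illustrative):

    <?php
    $sql = "SELECT id, total FROM orders WHERE status = 'open'";  // prepare first

    mysql_query("LOCK TABLES orders READ");        // lock
    $result = mysql_query($sql);                   // access the database
    $rows = array();
    while ($row = mysql_fetch_assoc($result)) {    // copy the data out...
        $rows[] = $row;
    }
    mysql_query("UNLOCK TABLES");                  // ...then unlock

    foreach ($rows as $row) {                      // process after the lock is gone
        // ... per-row work goes here ...
    }
    ?>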

Buffering the data retrieved by a SELECT query (e.g., SQL_BUFFER_RESULT in MySQL) releases locks earlier: the DBMS reads the data into a temporary table, releases the locks, and only then serves the result to the client process.
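In MySQL the hint goes directly in the SELECT (the table and columns here are illustrative):

    <?php
    // The result is copied to a temporary table, the locks are released,
    // and only then are the rows sent to this script.
    $result = mysql_query("SELECT SQL_BUFFER_RESULT id, name FROM customers");
    ?>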

If our DBMS gives us control over locking granularity, we should lock only what we need by using fine-grain locks. This leaves as much data as possible accessible to other processes.

In MySQL, SHOW STATUS LIKE 'Table%' shows how often locks make other requests wait. Its output is worth monitoring on a regular basis; a cron job that populates a monitor would work well in this case. [Monitoring Directory Size talked about monitoring output from the command line. The same concept can be used to monitor the output of SHOW STATUS LIKE 'Table%'.]
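A sketch of a script cron could run, again with placeholder credentials:

    <?php
    // Print the lock-contention counters so a monitor can collect them.
    // A rising Table_locks_waited means queries are waiting on table locks.
    mysql_pconnect('db.example.com', 'webuser', 'secret');

    $result = mysql_query("SHOW STATUS LIKE 'Table%'");
    while ($row = mysql_fetch_assoc($result)) {
        echo $row['Variable_name'], ' = ', $row['Value'], "\n";
    }
    ?>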

In MySQL, we should request a READ LOCAL lock rather than a READ lock. This allows other processes to perform concurrent inserts while we have the table locked.
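In sketch form (concurrent inserts apply to MyISAM tables; the table name is illustrative):

    <?php
    // READ LOCAL: other sessions may still append rows to the table
    // while we hold the read lock.
    mysql_query("LOCK TABLES log_entries READ LOCAL");
    // ... SELECT from log_entries here ...
    mysql_query("UNLOCK TABLES");
    ?>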

Background Processing

Most maintenance tasks and some operational tasks can be spun off into the background at a low priority. Cron, timers that allow callbacks, and PHP’s pcntl extension can be used to create background tasks.
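A minimal pcntl sketch, assuming PHP’s CLI SAPI (the extension is not meant for web-server processes) and a placeholder task:

    <?php
    // Fork a low-priority child to do the maintenance work while the
    // parent carries on. The task itself is a placeholder.
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("fork failed\n");
    } elseif ($pid == 0) {
        proc_nice(10);              // positive value = lower priority
        run_maintenance_task();     // placeholder for the real work
        exit(0);
    } else {
        pcntl_wait($status, WNOHANG);   // reap the child if it is done
    }

    function run_maintenance_task() {
        // e.g., purge old rows, rebuild summary tables, refresh copies
    }
    ?>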

Passing data to and from the background process may require a little thought. Shared memory and temporary tables may be used for this purpose.

References

MySQL Internals Client/Server Protocol: Compression. Published at forge.mysql.com/wiki/MySQL_Internals_ClientServer_Protocol#Compression. Accessed 2012.02.10.

MySQL Internals Manual: 8.7. Client Authentication Packet, rev 4036. Published 2006.11.26 by MySQL AB at scribd.com/doc/10035754/76/Client-Authentication-Packet. Accessed 2012.02.10.

PHP Manual: mysql_pconnect. Published at ca2.php.net/manual/en/function.mysql-pconnect.php. Accessed 2012.02.11.

Ten Tips for Writing High-Performance Web Applications by Rob Howard. Published January 2005 by Microsoft at msdn.microsoft.com/en-us/magazine/cc163854.aspx. Accessed 2012.01.13.

Website Performance: Taxonomy of Tips by Warren Gaebel, B.A., B.C.S. Published 2011.12.29 by Paid Monitor at blog.monitor.us/2011/12/website-performance-taxonomy-of-tips. Accessed 2012.01.31.

