Website Performance: SQL

This is the fourth of four articles about Database Management Systems’ (DBMS) performance. The first presented an overview and some installation tips. The second talked about building the database’s tables and indexes. Parts three and four get into the meat of accessing the database, with non-SQL tips in the third article and SQL tips in the article you are reading now. Website Performance: Taxonomy of Tips introduced a classification scheme to help us organize the many performance tips found on the Internet. Database Management Systems fall into category #3.2 (the-server-side-script-accesses-a-service).

The information below is particular to MySQL. However, most relational database management systems have similar features, so the concepts are relevant to all.

Intelligence Gathering

Some SQL performance tips conflict with each other, and for good reason: one tip yields maximum performance in some situations, and another tip in others. When we see conflicting tips like this, we should recognize that both can be right, each in its own situation.

This demonstrates why intelligence gathering is so important. Before attempting to fix a performance problem, it is important to know its cause. And if we don’t yet have a performance problem, intelligence gathering can show us where one is lurking just below the covers.

Finding Slow Queries:

Perhaps the most relevant source of performance data is the slow query log, which identifies queries that take a long time to execute. We control the definition of “slow” by setting two system variables (long_query_time and min_examined_row_limit). We can use mysqldumpslow to see the results.

Slow query logging must be turned on and configured. Instructions are included in the documentation.
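
As a rough sketch, the log can also be enabled at runtime with SET GLOBAL (the file path and thresholds below are assumptions; adjust them to suit):

    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
    SET GLOBAL long_query_time = 2;            -- log queries that run longer than 2 seconds
    SET GLOBAL min_examined_row_limit = 1000;  -- ignore queries that examine fewer rows

Running mysqldumpslow /var/log/mysql/slow.log afterwards summarizes the log, grouping similar queries together.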

Logging impacts performance, so the slow query log should be turned on only when needed. Turning it on at the beginning of a peak period and then turning it off at the end may be enough. Because data and usage patterns change over time, remember to repeat the exercise on a regular basis. The slow query log should not be left on as a general-purpose monitoring tool.

Status Variables:

SHOW GLOBAL STATUS shows the current values of the server’s status variables. Some of these variables provide performance information.

Some status variables provide useful information only when their value is compared to an earlier value. We record the first value, wait for some time to pass, then compare the new value to the old one. The difference tells us how much activity occurred during the interval. For example, Queries can give us an indication of the server’s load, Select_full_join can tell us how many joins were performed without usable indexes, and Slow_queries can show us how many queries took longer than long_query_time.
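
As a sketch (the variable names are real; the sampling interval is arbitrary):

    SHOW GLOBAL STATUS LIKE 'Queries';           -- record this value
    -- wait, say, 60 seconds, then run it again and subtract the old value
    SHOW GLOBAL STATUS LIKE 'Select_full_join';  -- joins performed without usable indexes
    SHOW GLOBAL STATUS LIKE 'Slow_queries';      -- queries that exceeded long_query_time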

Status variables also provide information about compression, connections, SSL, locking, threads, temporary tables, handlers, files, and more. As we gain an understanding of our usage patterns and our DBMS’s response to those patterns, we learn which status variables should be monitored.

Analyzing Individual Queries:

Once a slow query has been identified, we need to find out why it is a problem. The EXPLAIN statement helps by showing us the database engine’s plan for accessing the data. We specify the query that has caught our attention, and the EXPLAIN statement tells us how the data will be accessed for that query. We can then make a change that we think will help and re-run EXPLAIN to see whether we were right.

Our goal is to minimize the number of row comparisons, which is roughly the product of the values in the “rows” column of EXPLAIN’s output. That output is described in the MySQL documentation, and the same web page provides a short tutorial that shows how to use the output to improve performance. The tutorial doesn’t cover all possibilities; it is intended to be only a quick and easy introduction.
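
As a hypothetical illustration (the customers and orders tables are invented for this example):

    EXPLAIN
    SELECT c.name, o.total
    FROM   customers AS c
    JOIN   orders    AS o  ON o.customer_id = c.id
    WHERE  o.status = 'open';

If the rows column reports 1,000 for customers and 500 for orders, the plan implies on the order of 500,000 row comparisons; a suitable index on orders.customer_id or orders.status should shrink that product.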

Adding indexes is the first thing most people think of because it often helps. However, keep in mind that columns that are compared (e.g., foreign key to primary key) should be the same data type and length, and the length should be small. Also, if the value in EXPLAIN’s filtered column is high, indexing may not help.

If we think we know better than the optimizer, we can use FORCE INDEX or IGNORE INDEX to impose our will. This may or may not help. The only way to know is to measure performance with and without.
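
Continuing the hypothetical orders example:

    ALTER TABLE orders ADD INDEX idx_status (status);

    SELECT id, total FROM orders FORCE INDEX (idx_status)  WHERE status = 'open';
    SELECT id, total FROM orders IGNORE INDEX (idx_status) WHERE status = 'open';

Time both forms under realistic load, and keep the hint only if it measurably wins.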

ANALYZE TABLE gathers information about the columns’ data distributions. This information is used by the optimizer to create the access plans. Run it regularly (but not at peak load time). Data distributions are usually stable in mature databases, but if we find they’re not, we should run ANALYZE TABLE more frequently. Always run it immediately before using EXPLAIN.
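
Again using the hypothetical orders table:

    ANALYZE TABLE orders;  -- refresh the key-distribution statistics
    EXPLAIN SELECT id, total FROM orders WHERE status = 'open';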

Minimize the Size of the Result Set

Accessing less data improves performance and is sometimes easy to implement. It breaks down into two parts: accessing fewer columns and accessing fewer rows.

Accessing Fewer Columns:

How many times have we seen SELECT * in our code? Downloading every column in the table when we’re only using some of them wastes resources and impedes performance. Our code should name the specific columns it intends to use and forgo downloading all those extra columns.

There is another, more subtle reason for avoiding SELECT * even if our code uses every column in the table. If, at some future time, we add new columns to our table, every SELECT * will automatically start downloading those extra columns, even though they’re not being used.
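
For instance (the column names are invented for illustration):

    -- Instead of this:
    SELECT * FROM customers;

    -- name only what the code actually uses:
    SELECT id, name, email FROM customers;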

Accessing Fewer Rows:

Reducing the number of rows is not as easy. It requires a little thought. We need to look at every query and ask the simple question, “Am I really using all the rows that I’m downloading?” Sadly, that question often remains unasked.

One way to reduce the number of rows being downloaded is to download the data from summary tables (perhaps permanent, perhaps temporary). This won’t work in all cases, but it’s worth thinking about.

Providing paged access to data has become popular. Instead of fetching all the rows, we get the first n rows, then ask the user to click on a “next” link or button to see the next n rows. A “prev” link or button lets them go backwards as well as forwards.
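
In MySQL, paged access is usually written with LIMIT and OFFSET (the table and page size here are invented for illustration):

    -- page 3, at 20 rows per page (rows 41 through 60):
    SELECT id, name FROM customers ORDER BY name LIMIT 20 OFFSET 40;

Note that a large OFFSET still forces the server to read and discard all the skipped rows; Robert Eisele’s article in the References describes faster alternatives.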

The logic behind paged access is that the user may find what he’s looking for in the first few pages, which means we don’t need to download the rest of the data. This may be true, but we also need to consider the user’s perspective. Even if our database is lightning fast, we have a performance issue on our hands if the user has to click, wait, click, wait, click, wait, click, wait, …

Paged access may not be the right answer, especially if n is small. Providing the user with an index into the data or using a more specific search may better serve our purposes.

Some Specific Code-Writing Tips

There are far too many SQL-writing tips to list here, even in point form. Below are a few widely applicable tips that will most likely improve performance. Again, please remember that any tip may or may not help in a given situation.

  • Move SQL outside loops. Execute the SQL statement once to get all the data rather than many times to get it a piece at a time (see the first sketch after this list).
  • Use INSERT DELAYED where possible. The writes aren’t committed until later, but the query completes sooner.
  • Use GROUP BY instead of DISTINCT.
  • Batch INSERT, DELETE, and REPLACE statements into a background process that runs at a lower priority (if appropriate). Use LOAD DATA instead of INSERT.
  • Don’t use SELECT RAND() AS R … ORDER BY R on large result sets.
  • Avoid wildcards at the beginning of LIKEs.
  • Use JOINs instead of correlated subqueries.
  • Make similar queries consistent so the query cache can be used.
  • IN vs JOIN vs EXISTS – Much is said, but this may or may not make a difference because the optimizer takes care of it in some cases.
  • Delete a small number of rows at a time so the process doesn’t hold a lock for so long that it starves other queries (see the second sketch after this list).
  • If you intend to search or sort e-mail addresses by domain, store the userid and domain in separate columns. [Some tips say to reverse the e-mail addresses, but this is a little kludgy.]
  • Use triggers sparingly and wisely.
  • Use SQL_NO_CACHE if any of the underlying tables are frequently updated.
  • Splitting complex queries may or may not help. JOINing smaller queries may or may not help.
  • A bigger query cache does not necessarily mean better performance.
  • A much-talked-about tip says to use INSERT … ON DUPLICATE KEY UPDATE (or INSERT IGNORE) rather than a SELECT + INSERT combination. However, Eddie on Everything points out that merely letting the query fail may perform better.
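
Two of these tips lend themselves to quick sketches (tables and columns invented for illustration). Moving SQL outside a loop usually means replacing many single-row lookups with one set-based query:

    -- Instead of running this once per id inside a loop:
    SELECT name FROM customers WHERE id = 7;

    -- fetch the whole set in one statement:
    SELECT id, name FROM customers WHERE id IN (3, 7, 42);

Deleting in small batches keeps each statement’s locks short:

    -- repeat until no rows are affected:
    DELETE FROM audit_log WHERE created < '2011-01-01' LIMIT 1000;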

References

Custom Monitors in Paid Monitor with Python by Hovhannes Avoyan.  Published 2011.07.07 by Paid Monitor at blog.monitor.us/index.php/2011/07/07/custom-monitors-in-paid-monitor-with-python.  Accessed 2012.01.13.  This article shows us how to populate a monitor from the command line.  We can extend the concept to access the DBMS’s status variables in a cron job.

Let’s make the web faster: PHP performance tips by Eric Higgins. Published by Google at code.google.com/speed/articles/optimizing-php.html.  Accessed 2012.02.16.  This article’s example demonstrates one of many ways we can remove SQL from a loop and use a single statement instead.

M3 – Paid Monitor Monitor Manager by Dan Fruehauf.  Published 2011.07.21 by Paid Monitor at blog.monitor.us/index.php/2011/07/21/m3-paid-monitor-monitor-manager.  Accessed 2012.01.13. Dan shows us how the Paid Monitor Monitor Manager can be used to capture output from an iostat command. We can use the same approach to capture server variables from our DBMS and send them to a monitor.

Monitoring Directory Size by Warren Gaebel, B.A., B.C.S.  Published 2011.12.22 by Paid Monitor at blog.monitor.us/2011/12/monitoring-directory-size.  Accessed 2012.01.31.  This article talks about monitoring directories, but the same approach can be used to monitor DBMS status variables.

MySQL 5.6 Reference Manual: 7.2.1.4. How to Avoid Full Table Scans. Published at dev.mysql.com/doc/refman/5.6/en/how-to-avoid-table-scan.html. Accessed 2012.02.08.

Optimized Pagination using MySQL by Robert Eisele. Published 2011.10.24 by Robert Eisele at xarg.org/2011/10/optimized-pagination-using-mysql/.  Accessed 2012.02.16.

Ten Tips for Writing High-Performance Web Applications by Rob Howard.  Published January 2005 by Microsoft at msdn.microsoft.com/en-us/magazine/cc163854.aspx.  Accessed 2012.01.13.  This article presents some of the above concepts from an ASP.NET perspective.

Top 15 Ways to Kill MySQL Performance by Jay Pipes. Published 2011.05.08 by YouTube at youtube.com/watch?v=kxer0A23sQ4.  Accessed 2012.02.16.

Top10SQLPerformanceTips by Jay Pipes, et al. Published by MySQL at forge.mysql.com/wiki/Top10SQLPerformanceTips.  Accessed 2012.02.16. Contrary to what the title says, this article actually presents 99 tips.

Website Performance: Taxonomy of Tips by Warren Gaebel, B.A., B.C.S.  Published 2011.12.29 by Paid Monitor at blog.monitor.us/2011/12/website-performance-taxonomy-of-tips.  Accessed 2012.01.31.

The Paid Monitor Exchange at GitHub.  This is the official repository for scripts, plugins, and SDKs that make it a breeze to use the Paid Monitor system to its full potential.  Accessed 2012.01.13.
