Although this article is based on MySql experience, the concepts apply to other DBMS’s as well.
by Warren Gaebel | Feb 29, 2012
Although this article is based on MySql experience, the concepts apply to other DBMS’s as well.
Compressing textual data is almost always a good idea. Other data types may have good compression ratios, too. The CPU cost of compressing and uncompressing is usually less than the cost of moving large amounts of data between the disk and RAM.
When we design tables for performance, we must keep in mind that data integrity (including relational integrity) is more important than performance in almost all cases, and that there are sometimes tradeoffs between the two. For example, turning off MySql’s checksum feature may provide better performance, but is the slightly elevated risk of corrupted data worth it?
Horizontal Splitting: In some designs, access to live data is not a requirement. A delay of a few minutes, hours, or days before data is available is perfectly acceptable. In these cases, a table can be split horizontally, with new rows in one table (the transactional table) and older rows in a second, identically-structured table (the non-transactional table). New rows are added to the transactional table and queries are performed on the non-transactional table. Rows are moved from the transactional table to the non-transactional table when system activity troughs or according to a predetermined schedule. If the DBMS implements prioritization of queries, the rows can also be moved over on an ongoing basis, but at a lower priority than other queries. Indexing is not required on the transactional table because of its write-only nature.
Horizontal splitting can improve performance even if our design requires access to live data. Each row is stored in one of several identically-structured tables. Given some immutable value in the row (e.g., the primary key), a mapping algorithm tells us which table a row is in (or should be put into). Placing each table on a separate drive is not strictly required, but it does further improve performance.
Vertical Splitting: If a table has many columns, split it by putting the less-frequently-used columns into a second table. If a query requires data from both tables, a simple join on the primary key is used. In many cases, though, only the first table is used, which means the DBMS does not have to work its way through all the less-frequently-used data.
This concept can be extended in many ways. For example, if our analysis of usage patterns identifies a few frequently-used, but independent, queries, we could place the columns for each of those queries into separate tables. If two queries have a column in common (i.e., they are not completely independent), that column can be placed in both tables to eliminate the need for a join. This is an example of acceptable redundancy in physical database design, but processing logic must make sure the redundant data is always in sync.
Columns that are used in
are usually indexed. Other columns are usually not indexed. This is probably a good point from which we can start our testing and measuring.
Keep indexed columns short and indexes small. Autoindexed INTs make the best primary keys.
Do not use multiple columns in a single index. Use multiple indexes, one for each column, instead.
Like tables, indexes are stored on disk, so physical storage of indexes also impacts performance. Please refer to the discussion on physical storage above – it applies to indexes, too. Also consider pointer size for indexes.
If a query affects most or all of the rows in a table, sequential access is faster than using an index because it minimizes disk seeks (assuming low fragmentation on the HDD). In most cases, the DBMS knows this and automatically bypasses the index. If all the queries on a table are like this, indexing won’t be used, so building indexes can actually degrade overall performance.
Indexes are less important for queries on small tables.
If the DBMS gives us a choice of indexing algorithms, read the documentation to see which one is best for a given situation.
Paid Monitor Open API. Published by Paid Monitor at portal.monitor.us/api. Accessed 2012.01.13.
MySQL 5.6 Reference Manual: 7. Optimization. Published at dev.mysql.com/doc/refman/5.6/en/optimization.html. Accessed 2012.02.08.
MySQL 5.6 Reference Manual: 7.2.1.4. How to Avoid Full Table Scans. Published at dev.mysql.com/doc/refman/5.6/en/how-to-avoid-table-scan.html. Accessed 2012.02.08.
Using M3 to Take System Monitors to the Next Level by Josh Mattson. Published 2011.09.06 by Paid Monitor at blog.monitor.us/index.php/2011/09/06/using-m3-to-take-custom-system-monitors-to-the-next-level. Accessed 2012.01.13.
Website Performance: Taxonomy of Tips by Warren Gaebel, B.A., B.C.S. Published 2011.12.29 by Paid Monitor at blog.monitor.us/2011/12/website-performance-taxonomy-of-tips. Accessed 2012.01.31.
Try Paid Monitor For Free. A 15-day free trial. Your opportunity to see how easy it is to use the Paid Monitor cloud-based monitoring system. Credit card not required. Accessed 2012.01.13.
The Paid Monitor Exchange at GitHub. This is the official repository for scripts, plugins, and SDKs that make it a breeze to use the Paid Monitor system to its full potential. Accessed 2012.01.13.
Category: Articles | Tagged No Comments.
Web & Cloud
Monitoring