The previous article presented an introduction to SQL and NoSQL. This article builds on those topics by reviewing six of the top performance metrics to capture when assessing the health of the database in your enterprise application.
Specifically, this article reviews the following:
- Business Transactions
- Query Performance
- User and Query Conflicts
- Capacity
- Configuration
- NoSQL Databases
Business Transactions
Business Transactions provide insight into real user behavior: they capture real-time performance that real users are experiencing as they interact with your application. As mentioned in the previous article, measuring the performance of a business transaction involves capturing the response time of a business transaction holistically as well as measuring the response times of its constituent tiers. These response times can then be compared with the baseline that best meets your business needs to determine normalcy.
If you were to measure only a single aspect of your application, I would encourage you to measure the behavior of your business transactions. While container metrics can provide a wealth of information and can help you determine when to auto-scale your environment, your business transactions determine the performance of your application. Instead of asking for the CPU usage of your application server, you should be asking whether your users are able to complete their business transactions and whether those transactions are behaving optimally.
As a little background, business transactions are identified by their entry-point, which is the interaction with your application that starts the business transaction.
Once a business transaction is defined, its performance is measured across your entire application ecosystem. The performance of each business transaction is evaluated against its baseline to assess normalcy. For example, we might determine that a business transaction is behaving abnormally if its response time is more than two standard deviations slower than the average response time in its baseline, as shown in figure 1.
Figure 1 Evaluating BT Response Time Against its Baseline
The baseline used to assess a business transaction is held constant for the hour in which the business transaction is running, but the baseline itself is refined by every business transaction execution. For example, if you have chosen a baseline that compares business transactions against the average response time for the hour of the day and the day of the week, then after the current hour is over, all business transactions executed in that hour will be incorporated into the baseline for the same hour next week. Through this mechanism an application can evolve over time without the original baseline having to be thrown away and rebuilt; you can think of it as a window moving over time.
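As a rough sketch, the query below illustrates this kind of check in plain SQL against a hypothetical response_times table; APM tools perform this evaluation internally, so the table and column names here are assumptions for illustration only:

```sql
-- Hypothetical table: response_times(transaction_name, day_of_week, hour_of_day, response_ms)
-- Flag executions that are more than two standard deviations slower than the
-- average for the same transaction, hour of day, and day of week.
SELECT t.transaction_name, t.response_ms
FROM   response_times t
JOIN  (SELECT transaction_name, day_of_week, hour_of_day,
              AVG(response_ms)    AS avg_ms,
              STDDEV(response_ms) AS stddev_ms
       FROM   response_times
       GROUP  BY transaction_name, day_of_week, hour_of_day) b
  ON   b.transaction_name = t.transaction_name
 AND   b.day_of_week      = t.day_of_week
 AND   b.hour_of_day      = t.hour_of_day
WHERE  t.response_ms > b.avg_ms + 2 * b.stddev_ms;
```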
In summary, business transactions are the most reflective measurement of the user experience, so they are the most important metric to capture.
Query Performance
The most obvious place to look for poor query performance is in the query itself. Problems can result from queries that take too long to identify the required data or bring the data back. Look for these issues in queries:
Selecting more data than needed
It is not enough to write queries that return the appropriate rows; queries that return too many columns can cause slowness both in selecting the rows and in retrieving the data. It is better to list the required columns than to write SELECT *. When the query selects only specific fields, the plan may identify a covering index, which can speed up the results. A covering index includes all the fields used in the query, which means the database can generate the results from the index alone; it does not need to go to the underlying table to build the result. Additionally, listing only the required columns reduces the data that is transmitted, which also benefits performance.
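As a sketch, assuming a hypothetical orders table, the query below lists only the columns it needs, and the index shown covers all of them, so the database can answer the query from the index alone:

```sql
-- Hypothetical table: orders(order_id, customer_id, order_date, status, total, notes, ...)
-- List only the columns that are needed instead of SELECT *.
SELECT customer_id, order_date, total
FROM   orders
WHERE  customer_id = 42;

-- A covering index containing every column the query touches lets the
-- database build the result from the index without reading the table.
CREATE INDEX idx_orders_customer_covering
    ON orders (customer_id, order_date, total);
```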
Inefficient joins between tables
Joins cause the database to bring multiple sets of data into memory and compare values, which can generate many database reads and significant CPU. Depending on how the tables are indexed, the join may require scanning all the rows of both tables. A poorly written join on two large tables that requires a complete scan of each one is very computationally expensive. Other factors that slow down joins include joining on columns that are different data types, requiring conversions, or a join condition that includes LIKE, which prevents the use of indexes. Avoid defaulting to using a full outer join; use inner joins when appropriate to bring back only the desired data.
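A hedged sketch using hypothetical customers and orders tables: an inner join on indexed key columns of the same data type returns only the matching rows and lets the optimizer use the indexes.

```sql
-- Hypothetical tables: customers(customer_id, name), orders(order_id, customer_id, total)
-- Prefer an inner join on indexed, same-typed key columns over a full outer join
-- or a join condition that uses LIKE, which defeats index usage.
SELECT c.name, o.order_id, o.total
FROM   customers c
INNER JOIN orders o
        ON o.customer_id = c.customer_id   -- both columns indexed, same data type
WHERE  c.name = 'Acme Corp';
```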
Too few or too many indexes
When there aren’t any indexes that the query optimizer can use, the database needs to resort to table scans to produce query results, which generates a large amount of disk input/output (I/O). Proper indexes also reduce the need for sorting results. Indexes on non-unique values do not provide as much help as unique indexes in generating results. If the keys are large, the indexes become large as well, and using them creates more disk I/O. Most indexes are intended to help the performance of data retrieval, but it is important to realize that indexes also impact the performance of data inserts and updates, as all associated indexes must be updated.
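To see whether a query can use an index or must fall back to a table scan, most databases provide an EXPLAIN (execution plan) facility; a minimal sketch against the hypothetical orders table above, with output that varies by database:

```sql
-- Show the execution plan; look for an index lookup rather than a full table scan.
EXPLAIN
SELECT customer_id, order_date, total
FROM   orders
WHERE  customer_id = 42;

-- Remember that every index must also be maintained on INSERT and UPDATE,
-- so index the columns queries filter and join on, not every column.
```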
Too much literal SQL causing parse contention
Before any SQL query can be executed, it must be parsed, which checks syntax and permissions before generating the execution plan. Because parsing is expensive, databases save the SQL they’ve parsed to reuse it and eliminate the parsing time. Queries that use literal values cannot be shared, as the WHERE clauses differ. This results in each query being parsed and added to the shared pool. Because the pool has limited space, some saved queries are discarded to make room. If those queries recur, they need to be parsed again.
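Bind variables (parameterized queries) avoid this by letting one parsed statement be reused with different values; the sketch below uses PostgreSQL-style PREPARE/EXECUTE syntax, but most databases and drivers offer an equivalent:

```sql
-- Literal values: each of these statements is parsed and cached separately.
SELECT order_id, status, total FROM orders WHERE order_id = 1001;
SELECT order_id, status, total FROM orders WHERE order_id = 1002;

-- Bind variable: one statement is parsed once and reused with different values.
PREPARE get_order AS
    SELECT order_id, status, total FROM orders WHERE order_id = $1;
EXECUTE get_order (1001);
EXECUTE get_order (1002);
```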
User and Query Conflicts
Databases are designed to be multi-user, but the activities of multiple users can cause conflicts.
Page/row locking due to slow queries
To ensure that queries produce accurate results, databases must lock tables to prevent inserts and updates from occurring while a read query is running. If a report or query is slow, users who need to modify database values may experience slowness and delays in completing their updates. Lock hints, where supported, help the database use the least disruptive locks. Separating reporting from the transactional database is also an effective solution.
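As one example, SQL Server offers table hints such as NOLOCK, which lets a report read without taking shared locks, at the risk of reading uncommitted data; a sketch against the hypothetical orders table:

```sql
-- SQL Server table hint: read without acquiring shared locks.
-- Trade-off: the query may see uncommitted ("dirty") data, so use it only
-- for reports that can tolerate approximate results.
SELECT order_id, status, total
FROM   orders WITH (NOLOCK)
WHERE  order_date >= '2024-01-01';
```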
Transactional locks and deadlocks
Deadlocks occur when two transactions are blocked because each one needs a resource held by the other. With an ordinary lock, a transaction simply waits until the resource is released; a deadlock, however, will never resolve on its own. Databases monitor for deadlocks and terminate one of the blocked transactions, freeing the resource and allowing the other transaction to proceed. The terminated transaction is rolled back.
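A classic illustration, assuming a hypothetical accounts table: two sessions update the same rows in opposite order, so each ends up waiting on the other. Acquiring locks in a consistent order prevents the cycle.

```sql
-- Session A:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;  -- locks row 1

-- Session B (running concurrently):
BEGIN;
UPDATE accounts SET balance = balance - 50  WHERE account_id = 2;  -- locks row 2

-- Session A now waits for row 2:
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

-- Session B now waits for row 1: deadlock. The database picks a victim,
-- rolls it back, and lets the surviving transaction proceed.
UPDATE accounts SET balance = balance + 50  WHERE account_id = 1;

-- Prevention: have every transaction lock rows in a consistent order
-- (for example, always in ascending account_id order).
```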
Batch activities causing resource contention for online users
Batch processes typically perform bulk operations such as loading large amounts of data or generating complex analytical reports. These operations are resource-intensive and can impact performance for online users. The best solutions for this issue are to run batch operations when online usage is low, such as at night, or to use separate databases for transactional processing and analytical reporting.
Capacity
Not all database performance issues are database issues. Some problems result from running the database on inadequate hardware.
Not enough CPUs or CPU speed too slow
More CPUs can share the server workload, resulting in improved performance. The load the database experiences is not caused by the database alone; it is also affected by other processes running on the server, so it is important to review the overall load as well as database usage. Because CPU utilization varies throughout the day, metrics should be examined for periods of low, average, and peak usage to best assess whether additional CPU resources would be beneficial.
Slow disk without enough IOPS
Disk performance can be stated in terms of input/output operations per second (IOPS). Combined with the I/O size, this yields the disk's throughput in megabytes per second. Throughput is also affected by the disk's latency, which is how long it takes an individual request to complete. These metrics depend on the technology of the disk storage. Traditional hard disk drives (HDDs) use a rotating platter and are typically slower than solid state drives (SSDs) or flash memory, which have no moving parts. Until recently, SSDs were more expensive than HDDs, but costs have come down, making them a competitive option.
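For example, a disk that sustains 10,000 IOPS with an 8 KB I/O size delivers roughly 80 MB per second of throughput.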
Full or misconfigured disks
Databases obviously require significant disk access, so incorrectly configured disks have a considerable performance impact. Disks should be suitably partitioned, with system data such as catalogs and logs separated from user data. Highly active tables should be separated from one another to avoid contention. Increase parallelism by placing databases and indexes on different disks. Don't place the operating system and swap space on the same disk as the database.
Not enough memory
Limited or poorly allocated physical memory impacts database performance. The more memory that is available, typically the better the performance will be. Monitor paging and swapping. Set up several page spaces on multiple, non-busy disks. Make sure the paging space allocated is sufficient for database requirements; each database vendor can provide guidance on this matter.
Slow network
Network speeds can affect how quickly retrieved data is returned to the end user or calling process. Use broadband for connecting to remote databases. In some cases, choosing TCP/IP instead of named pipes for the connection protocol can significantly increase performance.
Configuration
Every database has a large number of configuration settings. Default values may not be enough to give your database the performance it needs. Check all parameter settings, which includes looking for the following issues:
Buffer cache too small
A buffer cache improves performance by keeping frequently accessed data blocks in memory, eliminating disk I/O. When the cache is too small, data is flushed from the cache more frequently, and if it is needed again it must be reread from disk. Besides the slowness of the disk read, this puts additional work on the I/O devices and can become a bottleneck. In addition to allocating enough space to the buffer cache, tuning SQL queries helps them use the buffer cache more efficiently.
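As one concrete illustration, MySQL's InnoDB buffer pool plays this role; the sketch below checks and raises its size, with the caveat that the right value depends on available RAM and the size of the hot data set:

```sql
-- MySQL/InnoDB example: inspect and resize the buffer pool.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Raise it to 8 GB (MySQL 5.7 and later can resize online); the appropriate
-- size depends on the server's RAM and how much of the data set is hot.
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;
```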
No query caching
Query caching stores both database queries and their result sets. When an identical query is executed, the data is quickly retrieved from memory rather than requiring the query to be executed again. Updates to data invalidate the results, so query caching is only effective on static data. In some cases, a query cache can become a bottleneck rather than a benefit to performance. Huge caches can cause contention when they are locked for updates.
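For illustration, MySQL 5.7 and earlier shipped a query cache whose configuration and hit rate can be inspected as shown below; the feature was removed in MySQL 8.0, in large part because of the contention issues described above.

```sql
-- MySQL 5.7 and earlier: inspect the query cache configuration and hit rate.
SHOW VARIABLES LIKE 'query_cache%';   -- query_cache_type, query_cache_size, ...
SHOW STATUS    LIKE 'Qcache%';        -- Qcache_hits, Qcache_inserts, Qcache_lowmem_prunes, ...
```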
I/O contention due to temporary table creation on disk
Databases need to create temporary tables when performing certain query operations, such as executing a GROUP BY clause. When possible, temporary tables are created in memory. However, in some cases creating the temporary table in memory is not feasible, such as when the data contains BLOB or TEXT objects; in those cases the temporary tables are created on disk. A large amount of disk I/O is required to create the temporary table, populate it with records, select the needed data from it, and drop the table when the query is complete. To limit the performance impact, the temporary database should be separated from the main database space. Rewriting queries to use derived tables can also reduce the need for temporary tables. A derived table, which selects directly from the result of another SELECT statement, allows data to be joined in memory rather than on disk.
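A brief sketch of the derived-table pattern, reusing the hypothetical orders and customers tables from earlier; whether the optimizer keeps the intermediate result in memory depends on the engine and the data involved.

```sql
-- Derived table: the inner SELECT's result is joined directly,
-- rather than being staged in a separate temporary table by hand.
SELECT c.name, recent.order_total
FROM   customers c
JOIN  (SELECT customer_id, SUM(total) AS order_total
       FROM   orders
       WHERE  order_date >= '2024-01-01'
       GROUP  BY customer_id) AS recent
  ON   recent.customer_id = c.customer_id;
```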
NoSQL Databases
NoSQL has much appeal because of its ability to handle large amounts of data very rapidly. However, some disadvantages should be assessed when weighing whether NoSQL is right for your use case. This is why it is wise to remember that NoSQL stands for "Not Only SQL." This clearer definition accepts the premise that NoSQL is not always the right solution, nor does it necessarily replace SQL across the board. Here are five reasons why:
Finicky Transactions
It is hard to keep entries consistent with NoSQL. When accessing structured data, NoSQL does not always ensure that changes to related records are made at the same time; if a process crashes, the data can be left in an inconsistent state. Double-entry accounting is a classic example of a transaction that must stay consistent: every debit must be balanced by a corresponding credit and vice versa, and if the two sides cannot both be written, the entry cannot be made. NoSQL may not "balance the books" properly.
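In a relational database, an atomic transaction expresses this guarantee directly; a minimal sketch against a hypothetical ledger table:

```sql
-- Both legs of the double entry commit together or not at all.
BEGIN;

INSERT INTO ledger (account_id, amount, entry_type)
VALUES (101, -250.00, 'debit');

INSERT INTO ledger (account_id, amount, entry_type)
VALUES (202,  250.00, 'credit');

COMMIT;   -- if anything fails before this point, ROLLBACK leaves the books balanced
```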
Complex Databases
Supporters of NoSQL tend to point to its efficient code, simplicity, and speed. All of these factors line up when database tasks are simple. However, when databases become more complicated, NoSQL begins to break down. SQL has the advantage over NoSQL when database demands are complicated because SQL has mature, industry-standard interfaces, whereas each NoSQL setup has a unique interface.
Consistent JOINS
When executing a JOIN in SQL, there is a tremendous amount of overhead because the system must pull data from different tables and align it on keys. NoSQL seems like a dream because it has no JOINs: everything is kept together in one place, and when data is retrieved, all of the key-value pairs are pulled at the same time. The problem is that this can create several copies of the same data. Those copies all have to be updated, and NoSQL does not have the functionality to help in this situation.
Flexibility in Schema Design
NoSQL was unique when it emerged on the scene because it did not require a schema. In previous database models, programmers had to think about the columns needed to accommodate all of the potential data entries in each row. With NoSQL, entries can have a variety of fields or none at all. This flexibility allows programmers to ramp up applications quickly. However, it can be problematic when several groups are working on the same program, or when new teams of developers take over a project. After some developers have modified the database using the freedom of NoSQL, there may be a wide variety of key-value pair implementations.
Resource Intensive
NoSQL databases are commonly much more resource-intensive than relational databases, requiring considerably more CPU and RAM. For that reason, most shared hosting companies do not offer NoSQL; you must sign up for a VPS or run your own dedicated server. SQL, on the other hand, is designed to run on a single server. This works out fine in the beginning, but as database demands increase, the hardware must expand as well, and a single server with huge capacity is much more expensive than a collection of smaller servers, so the price rises steeply. This is one reason NoSQL has found a home in large-scale enterprise computing scenarios, such as those used by Google and Facebook.
Conclusion
This article presented a top-6 list of metrics that you might want to measure when assessing the health of your database. In summary, those top-6 items were:
- Business Transactions
- Query Performance
- User and Query Conflicts
- Capacity
- Configuration
- NoSQL Databases
In the next article, we are going to pull all of the topics in this series together to present the approach that AppDynamics took to implementing its APM strategy. This is not a marketing article, but rather an explanation of why certain decisions and optimizations were made and how they can provide you with a robust view of the health of your applications and database.