Ask any developer to name a few common performance bottlenecks and 99.9% of them will say “the database” as their first answer. Ask a DBA the same question and 99.9% of them will respond “crappy application code.” If we put aside the mutual love between each party, it’s actually worth looking at why they arrive at different conclusions.
First of all, it’s very easy to make any application or SQL query perform fast in a dev environment, especially on a developer’s desktop. Why? The two things that most impact query performance are data volume and user/transaction concurrency, and neither typically exists in a dev environment when unit tests pass. Computing power these days is also silly fast and cheap, with desktop CPU and memory helping any dev environment mask potential performance bottlenecks.
Enter the test environment with load tests and larger data sets, and it’s normally at this stage where most application scalability and performance limits are found. It’s where development DBAs earn their money working with developers to show them just how fast or slow their SQL queries actually are. So what exactly causes SQL statements to run slow and how do you make them run faster?
#1 – Bad SQL
Yup, that’s right. Despite many databases having query optimizers, human inefficiency can’t be ignored. As an example, anyone who writes SELECT * should really be given a public flogging these days. It’s easy for developers to focus on getting the right result from a query rather than thinking about how long the query might take in production with real data. Joining several tables on a dev database with small data sets will always run quickly; executing the same query in production against tables with millions of rows could grind an application to a halt. Cursors are another abused feature, and they can be expensive: they help developers iterate through result sets, but they waste system resources if they’re not properly implemented. If more developers wrote SQL against a production-scale replica schema in their dev environment, they’d be the first to see how efficient their queries actually are before those queries are let loose in live environments.
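To make the SELECT * point concrete, here’s a minimal sketch using Python’s built-in sqlite3 module and a hypothetical orders table (all names are made up for illustration): naming only the columns you need keeps result sets far smaller than dragging every column back.

```python
import sqlite3

# Hypothetical orders table used for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        status TEXT,
        notes TEXT        -- large free-text column we rarely need
    )
""")
conn.executemany(
    "INSERT INTO orders (customer_id, status, notes) VALUES (?, ?, ?)",
    [(i % 100, "shipped", "x" * 1000) for i in range(1000)],
)

# SELECT * drags every column, including the ~1 KB notes blob, back
# for all 1,000 matching rows...
wide_rows = conn.execute(
    "SELECT * FROM orders WHERE status = 'shipped'"
).fetchall()

# ...while naming only the columns you need keeps the result set lean.
narrow_rows = conn.execute(
    "SELECT id, customer_id FROM orders WHERE status = 'shipped'"
).fetchall()

print(len(wide_rows[0]), len(narrow_rows[0]))  # 4 2
```

The difference looks trivial at this scale; multiply it by millions of rows and hundreds of concurrent sessions and it stops being trivial.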
#2 – Bad Database Design
Sounds obvious, but a badly designed database schema will often contribute to bad database performance. Too little or too much normalization means the database and your queries have to work harder to get the data you need. A bad database design also increases the complexity of SQL queries and the cost of change down the road. Joins can be expensive operations on big data sets, so it’s best to fully understand the needs of the application(s) and their data before jumping into building schemas. You can’t predict every future requirement, but you can plan for how your data model may evolve over time so that altering schemas later is less painful than ad-hoc hacking.
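Here’s a quick illustration of one cost of under-normalization, sketched with Python’s sqlite3 and hypothetical tables: in the denormalized layout a single logical change has to rewrite every duplicated row, while the normalized layout touches exactly one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized: the customer's email is repeated on every order row
# (table and column names are hypothetical).
conn.execute("""CREATE TABLE orders_flat (
    id INTEGER PRIMARY KEY, customer_name TEXT,
    customer_email TEXT, amount REAL)""")
conn.executemany(
    "INSERT INTO orders_flat (customer_name, customer_email, amount) "
    "VALUES (?, ?, ?)",
    [("Ada", "ada@example.com", 10.0)] * 3,
)

# Changing one customer's email means rewriting every one of their orders.
flat_changed = conn.execute(
    "UPDATE orders_flat SET customer_email = 'ada@new.example' "
    "WHERE customer_name = 'Ada'"
).rowcount

# Normalized: the email lives in exactly one place.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, "
             "name TEXT, email TEXT)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL)""")
conn.execute("INSERT INTO customers (name, email) "
             "VALUES ('Ada', 'ada@example.com')")
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (1, ?)",
                 [(10.0,), (10.0,), (10.0,)])
norm_changed = conn.execute(
    "UPDATE customers SET email = 'ada@new.example' WHERE name = 'Ada'"
).rowcount

print(flat_changed, norm_changed)  # 3 1
```

The flip side, as the paragraph above notes, is that the normalized form now needs a join to show orders with emails, and joins aren’t free either; the right amount of normalization is a trade-off, not a rule.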
#3 – Not using Database Features
If you’re going to pay the likes of Oracle a squillion dollars, you might as well use the features they provide. Stored procedures require a bit more effort, but they make applications less complex and less costly to maintain. Databases are good at processing, managing, and querying data: way more efficient than any developer embedding SQL queries in application code, which can cause result sets to be dumped into application server memory just so they can be iterated through and processed. Less is more when it comes to accessing the database: keep your hits low, let the database do the work for you, and only bring back the data you need. Shuttling lots of data back and forth between the application server and database is a prime cause of OutOfMemory exceptions and frequent garbage collection. I’ve seen a single business transaction create hundreds of database connections and queries because application logic was processing data instead of the database server itself. Take a look at this AppDynamics screenshot as an example: the business transaction below created 822 queries and 822 connections, meaning 2.4 seconds were spent going backwards and forwards to the database.
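The “let the database do the work” point can be sketched like this (Python’s sqlite3 again, with a hypothetical sales table): aggregating in SQL ships two rows back to the application instead of ten thousand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, "
             "region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east" if i % 2 else "west", float(i)) for i in range(10_000)],
)

# Anti-pattern: pull every row into application memory and aggregate there.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
totals = {}
for region, amount in rows:  # 10,000 rows shipped to the app server
    totals[region] = totals.get(region, 0.0) + amount

# Better: one query, two result rows; the database does the grouping.
db_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

print(totals == db_totals)  # True
```

Same answer either way, but the first version is exactly the pattern that fills the application server heap and keeps the garbage collector busy.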
#4 – Poor Execution Plans
Most databases have query optimizers that figure out the most efficient way of retrieving data. The output from a query optimizer is known as an execution plan, which describes the optimum method and steps to retrieve the data. You might have heard of the Cost Based Optimizer (CBO) in Oracle, which the vendor itself recommends. Database dictionaries and statistics store important metadata (e.g. table sizes, row counts, and index info) that helps the optimizer decide on the optimal execution plan. For example, it’s a lot faster to scan a table with 100 rows than 1,000,000 rows; but if the optimizer doesn’t know a table has 1,000,000 rows, it produces an execution plan blind to this fact. This is common in test environments where database schemas are copied from production and imported without statistics: everything looks great, but the application SQL queries run slow because the database statistics are out of date.
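Here’s a small illustration of statistics gathering using SQLite’s ANALYZE command; other databases have their own equivalents (e.g. Oracle’s DBMS_STATS package). The table and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.execute("CREATE INDEX idx_events_kind ON events (kind)")
conn.executemany(
    "INSERT INTO events (kind) VALUES (?)",
    [("click" if i % 10 else "purchase",) for i in range(1000)],
)

# Without ANALYZE, the optimizer has no idea how big the table is or how
# selective the index is. ANALYZE gathers the statistics the planner
# uses to cost its options.
conn.execute("ANALYZE")

# SQLite stores the gathered stats in the sqlite_stat1 table; the stat
# string starts with the row count, followed by average rows per
# distinct index value.
stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE tbl = 'events'"
).fetchone()
print(stats)
```

Import a schema without running the equivalent of ANALYZE and this table is empty, which is precisely the “plan built blind” situation described above.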
Assuming database statistics are accurate, the most common cause of query latency is full table scans, where the data being queried hasn’t been indexed. The database has to fetch and scan through every record in the table you’re accessing, and you’ll see a lot of I/O wait as a result. It’s several times worse than flicking through your iPod song collection with no album or artist indexes! By the time you flick through the first hundred songs, you’re almost asleep with boredom. Indexes can often be a silver bullet, and it’s true that they can dramatically improve business transaction response time and throughput. However, they can’t be placed everywhere, because indexes consume disk space and need to be maintained over time as data changes and grows. Indexes can therefore reduce read latency but may introduce latency when data is written or updated.
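The scan-versus-index difference is easy to see with SQLite’s EXPLAIN QUERY PLAN (the table and index names here are hypothetical): the same query goes from a full table scan to an index search the moment the index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, "
             "email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"user{i}") for i in range(1000)],
)

def plan(sql):
    # EXPLAIN QUERY PLAN reports how SQLite intends to execute the query;
    # the human-readable detail is the last column of each row.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id FROM users WHERE email = 'user500@example.com'"

before = plan(query)   # contains "SCAN": every row is inspected
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)    # contains "SEARCH ... idx_users_email": index lookup

print(before)
print(after)
```

On a thousand rows both run instantly; on a hundred million rows the first plan is an outage waiting to happen.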
#5 – Locking Contention
Just as you get contention in the JVM/CLR when you synchronize threads around application logic and data, you can take the same performance hit in the database when queries read and write data from the same tables. Relational databases provide great atomicity and read/write consistency, but at a cost. Good database design and careful index choices help ensure operations like insert, update, and delete don’t introduce high latency and locking in the database.
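Here’s a deliberately tiny sketch of write-write contention using SQLite, whose coarse database-level write lock makes the effect easy to trigger; server databases lock at row or page granularity, but the principle is the same: one writer holds a lock, the other waits or fails.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None puts the connections in autocommit mode so we can
# manage transactions explicitly with BEGIN/COMMIT.
writer1 = sqlite3.connect(path, isolation_level=None)
writer2 = sqlite3.connect(path, isolation_level=None, timeout=0.1)

writer1.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
                "balance REAL)")
writer1.execute("INSERT INTO accounts (balance) VALUES (100.0)")

# Writer 1 opens a transaction and takes the write lock...
writer1.execute("BEGIN IMMEDIATE")
writer1.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")

# ...so writer 2 cannot start its own write transaction: it waits up to
# its 0.1s timeout, then raises "database is locked".
try:
    writer2.execute("BEGIN IMMEDIATE")
    locked = False
except sqlite3.OperationalError:
    locked = True

writer1.execute("COMMIT")  # releasing the lock lets other writers proceed
print(locked)  # True
```

In a production database the second writer usually just queues rather than erroring, which is why lock contention shows up as mysterious latency rather than as a clean exception.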
#6 – I/O bottlenecks
At the end of the day, databases store data on disk. Traditional spinning disk is getting bigger and cheaper, but it’s not getting any faster, and that can impact the performance of database queries, especially when disk is shared. For example, when a DBA provisions a database they need one or more logical volumes to mount the database data files, index files, and redo logs. A storage administrator can provision logical volumes from anywhere: they might be allocated from a SAN, or cobbled together from chunks of disk across multiple I/O devices. So whilst databases can be shared by many applications, so can their underlying storage and I/O capacity. If the same physical I/O device is serving two databases or applications, you can end up with hot spots on a disk where the same physical spindles are being over-worked, creating I/O contention and read/write latency for queries. As applications move to the cloud, storage will become more critical to both relational databases and NoSQL technologies as they rely on shared storage provisioned by cloud providers. Below is an example of storage latency that one AppDynamics customer saw in the cloud using AWS S3 storage. You can see the application’s write to disk took over 1.2 seconds.
So if the above root causes for slow SQL are well known, why do slow queries still happen today in production environments?
It’s not uncommon for a typical organization or enterprise to have multiple applications accessing the same database or schemas. This shared complexity creates a huge headache for both development and production DBAs, who see things slightly differently than an application development team for the following reasons:
All Applications are different
All applications behave differently: some are read intensive while others are write intensive. Whilst “adding an index” can often be the gift that keeps on giving for application teams, the DBA has to do a lot of analysis to ensure any schema change is justified. DBAs have to find the right balance for query performance so that all connected applications receive the appropriate level of performance and availability.
Applications are tested in isolation
A dev team often tests and optimizes their application against a replica production scale database. However, what dev teams can’t test is what impact other applications will have on the shared database when everything is running together in production. If one application is grossly inefficient in how it accesses a database, it may create contention and latency for other applications accessing the same instance.
Applications are Agile
Agile methodology is helping dev teams deliver faster, but it’s placing more burden on the DBA and operations teams who deploy those releases. New features and functionality in the application code often mean data models in the database need to change at the same time. Five years ago it was normal to do 2 or 3 deployments a year; these days, some organizations are doing 10+ deployments a day! A few weeks back an AppDynamics customer deployed their application into production, and within 5 minutes AppDynamics spotted one business transaction experiencing high response times because of a slow running SQL statement. The root cause was a missing index that existed in the test environment but never made it into production as part of the release. Sadly, this isn’t the first time we’ve seen that happen.
Applications are hidden in production
Production DBAs and developers lack simple application context and visibility in production. This isn’t a criticism; applications are hugely distributed and complex entities. A production DBA will typically see an application as a “connection” accessing their database, originating from the application server that connects through a connection pool or driver. For each connection there can be hundreds of sessions with hundreds of queries. So when an application developer states “The database is slow,” you kind of see why the average DBA might not look especially pleased.
Whilst slow running SQL statements are indeed common performance bottlenecks, it’s worth appreciating the reasons why they still surface in production today. As applications move to the cloud and meet big data, you can be sure thousands more slow queries will slip through the net and impact production environments. Agile methodology is pushing dev, DBA, and ops teams to the limit as they deliver a competitive edge to the business. With so much change happening, application outages and slow performance are inevitable at some point. Having visibility of change and application performance in production is something AppDynamics can help you with. You can get started in minutes today by downloading AppDynamics Lite or requesting a 30-day trial of AppDynamics Pro. Finding a slow running SQL statement in production is this easy: