This blog post is a developer’s perspective on how using our own AppDynamics software has helped us find and fix performance-related issues – and how other developers can do the same.
One of the most challenging aspects of developing cloud-based platforms is scalability. As we innovate and build new features, it is essential for developers to ensure these new platform features are scalable and do not impact the performance of our applications, especially as any impact on performance is likely to impact all tenants on the cluster.
This is a challenge that we experienced firsthand. My team here at AppDynamics is responsible for building custom dashboards, one of the most powerful features of AppDynamics. Custom dashboards let you group relevant metrics into one central dashboard as well as build sophisticated dashboards with drill-down capabilities.
As we added more capabilities and support for the different types of metrics, we realized that scaling our dashboards was becoming a challenge. Many of our customers were exploring different dashboard capabilities and features, which was truly stressing our endpoints. This is when our practice of using AppDynamics internally – “AppD on AppD” – helped give us insight and visibility into the issues we faced while developing new features on the platform.
Our AppD on AppD Approach
Recently, we made some changes on the backend of a REST endpoint that returns metrics for different types of widgets on custom dashboards. Our team’s internal process is to ensure every feature we implement is stress-tested under a realistic load profile, so we made sure to test this as well. Below is what we discovered.
On local dev environments, the feature worked fine and the API returned data within a few milliseconds. Next, we deployed our changes to our performance environment, which is similar to an actual production instance with large amounts of data. We immediately noticed that under stress, the UI that uses the endpoint was very slow and the average response time of the endpoint was high. Luckily, since AppDynamics was monitoring the performance environment, it was easy for us to dig into the issue.
We configured a business transaction to monitor the REST endpoint with the slow response time, and within a few minutes we collected transaction snapshots that gave us valuable behind-the-scenes information about the REST API calls. Two things caught our attention:
Resolved Issue #1
A particular method that was being called repeatedly was taking more time to execute than before. The method itself took around 100-150 milliseconds, but if it was called 100 times in a single transaction, the calls would add up to 10,000-15,000 milliseconds, which is roughly 10 to 15 seconds.
The image below shows the total time it took for all the calls of this method in this single transaction.
Here is a code snippet of that method:
After a few minutes of looking at the full implementation of this method, we found that String.replace, which happens to be slower than StringUtils.replace, could be a potential problem here. As a result, we made a minor change and modified the code to use the StringUtils implementation. Here is some information on StringUtils.replace vs String.replace.
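To illustrate why a literal replacement can be cheaper, here is a minimal Java sketch of the general technique: scanning with indexOf and appending to a StringBuilder, with no regular expressions involved. This is not our actual method and not the Commons Lang source. The class name, method name, and sample metric path are all made up for illustration. As far as we know, older JDKs implemented String.replace on top of a compiled regex pattern, which is the kind of overhead a loop like this avoids, so treat the comparison as an assumption to verify on your own JDK.

```java
// Illustrative sketch only: hypothetical names, not the AppDynamics code
// and not the Apache Commons Lang implementation.
final class LiteralReplaceSketch {

    /** Replaces every occurrence of target in text without using regular expressions. */
    static String replaceLiteral(String text, String target, String replacement) {
        if (text == null || text.isEmpty()
                || target == null || target.isEmpty()
                || replacement == null) {
            return text; // nothing to replace
        }
        int start = 0;
        int end = text.indexOf(target, start);
        if (end < 0) {
            return text; // no match: return the original string without allocating
        }
        StringBuilder out = new StringBuilder(text.length());
        while (end >= 0) {
            // Copy the text before the match, then the replacement.
            out.append(text, start, end).append(replacement);
            start = end + target.length();
            end = text.indexOf(target, start);
        }
        out.append(text, start, text.length()); // copy the tail after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical metric path, used only as sample input.
        String metricPath = "Overall Application Performance|Tier A|Average Response Time (ms)";
        System.out.println(replaceLiteral(metricPath, "|", "/"));
        System.out.println(metricPath.replace("|", "/")); // same result via the JDK method
    }
}
```

Before swapping implementations in a hot path, it is worth measuring both on your own JDK version and input sizes, since the relative cost can differ between releases.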
Resolved Issue #2
Another issue we noticed was that there were too many database calls for a single request.
We made a few optimizations here, including caching the data and combining the queries into bulk calls. After applying these fixes, we measured the performance again and saw the average response time for the endpoint improve drastically.
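As a rough illustration of the caching and bulk-query idea, the Java sketch below keeps already-loaded values in a map and fetches all missing keys in a single bulk call instead of one call per key. Everything in it is hypothetical: the class name, the Long ids, the String values, and the bulkLoader function standing in for a database query are assumptions made for the example, not our actual data layer.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Illustrative sketch only: hypothetical names and types, not the AppDynamics code.
// Not thread-safe; a real service would also need eviction and synchronization.
final class BatchingMetricCache {
    private final Map<Long, String> cache = new HashMap<>();
    // Stands in for one bulk database query that loads many ids at once.
    private final Function<Set<Long>, Map<Long, String>> bulkLoader;
    private int bulkCalls = 0;

    BatchingMetricCache(Function<Set<Long>, Map<Long, String>> bulkLoader) {
        this.bulkLoader = bulkLoader;
    }

    /** Returns values for the requested ids, loading all cache misses in one bulk call. */
    Map<Long, String> getAll(Collection<Long> ids) {
        Set<Long> missing = new LinkedHashSet<>();
        for (Long id : ids) {
            if (!cache.containsKey(id)) {
                missing.add(id);
            }
        }
        if (!missing.isEmpty()) {
            bulkCalls++; // one round trip for every miss, instead of one per id
            cache.putAll(bulkLoader.apply(missing));
        }
        Map<Long, String> result = new LinkedHashMap<>();
        for (Long id : ids) {
            String value = cache.get(id);
            if (value != null) {
                result.put(id, value); // ids the loader could not resolve are skipped
            }
        }
        return result;
    }

    /** Number of bulk loads issued so far, useful for checking the cache is effective. */
    int bulkCalls() {
        return bulkCalls;
    }
}
```

With a structure like this, a request for 100 widgets' worth of ids costs at most one bulk load rather than 100 separate queries, and repeated requests for the same ids cost none.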
Lastly, we created our own custom dashboard to measure performance of our specific endpoint, showing us different metrics like average response time, errors, and calls per minute along with thresholds and baselines. We also created a scheduled report that sends our team a dashboard snapshot every day. We also could have created alerts, which notify you when certain conditions are met or exceeded based on your configuration. Scheduled reports and alerts make it easy to spot outliers and proactively address issues.
Without AppDynamics, it would have been difficult for a developer to quickly pinpoint these issues within a large codebase. With AppDynamics, it not only becomes easier to find issues in production, but developers can also proactively ensure that features are robust and scalable before deploying them to production. This reduces the time spent on performance-related issues and gives developers more time to innovate and write code.