Capturing reliable telemetry on your apps is less about which toolchains you use and more about using them effectively to automate the processes and techniques we discussed in my prior blogs (part 1, part 2, part 3) in this series. Don’t get me wrong, having the right tools is paramount, but the value comes when you maximize integration and automation and use everything those tools have to offer. When incorporating and integrating the APM tools on the market, keep the following best practices in mind.
1. Leverage hierarchy such as applications and tiers to logically group and represent the business by Process/Functional Group. For example, use application definitions to separate lines of business (LOBs), or functional groups within an LOB. Also, consider aggregating all your SOA and backend landscapes together as common shared services. Come up with a naming standard and stick to it. A standard for naming and/or tagging will go a long way toward reducing complexity in the long term.
2. Leverage tiers to group together common processes. Tiers should define the boundaries of process groups, where all processes are doing the same thing. Even if they are running separate versions (such as in canary deployments), or in different geographic locations, a common identity should be used to group processes doing the same thing. If you are running multiple applications or app components in a single Java process, consider breaking them out into their own processes. This will make the topology and its dependencies easier to understand.
3. Automate the management of your monitoring configuration. Your monitoring tool configuration should follow the same mantra of config as code. Keep a tight grip on the definition of the configuration that produces your metrics. Change these configurations in lower environments first, and promote changes to your monitoring along with your deploys.
4. Use the built-in capabilities of your APM tooling to create robust gates during integration and performance tests. Use APIs to integrate with orchestration tools such as Jenkins or TeamCity to fail builds when baselines and trends are violated. Embed links to APM violations in the console logs within these orchestration tools so they surface to developers quickly.
5. Leverage header injection within test harnesses to decorate transactions with test metadata that the APM tools can capture. This will allow you to tie failed transactions to specific points within the scripts driving load, and it gives the developer or test engineer valuable context when diagnosing an issue.
6. Surface throughput and response time metrics through real-time dashboards. These should be considered primary gating metrics for every endpoint on an API and for every component in your app. Everyone should be focused on lowering latency and increasing throughput. If an endpoint’s response time is not predictable, it should be rewritten.
7. Integrate your pipeline orchestration tools to signal to your APM software when something is happening. Signal the start and completion of jobs that deploy software, execute tests, and so on. Doing this lets you overlay events that change the environment on your performance data, so you can see the impact of a change in real time.
8. Performance metrics used as gates, such as Response Time and Transactions Per Second, should be combined with other quality gating metrics, such as those tracked in tools like Sonar.
9. Automate performance tests so components are routinely taken to their breaking point. Create an environment with the flexibility and automation to run performance test cases in parallel with ease.
10. Leverage mocks in component tests and real dependencies in system-wide tests. Mocking will allow you to test components in isolation and should allow you to play out any number of scenarios under duress.
11. Test continuously, and when something breaks, it must be fixed before anything else can happen.
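To make practices 4 and 5 a little more concrete, here is a minimal Python sketch of a performance gate and of the test-metadata headers a load script might inject. Everything named here is an assumption for illustration: the `Baseline` and `Measurement` types, the `gate_violations` and `build_test_headers` helpers, the `X-Test-*` header names, and the idea that your APM tool's API has already given you per-endpoint numbers. It does not call any real APM or CI product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Agreed limits for one endpoint (hypothetical type, illustrative values)."""
    max_response_ms: float
    min_tps: float


@dataclass(frozen=True)
class Measurement:
    """What a test run observed for one endpoint (hypothetical type)."""
    response_ms: float
    tps: float


def build_test_headers(script: str, step: str, run_id: str) -> dict[str, str]:
    """Build headers that tag a request with its place in the load script.

    The header names are assumptions; use whatever your APM tool is
    configured to capture.
    """
    return {
        "X-Test-Script": script,
        "X-Test-Step": step,
        "X-Test-Run-Id": run_id,
    }


def gate_violations(
    baselines: dict[str, Baseline],
    measurements: dict[str, Measurement],
) -> list[str]:
    """Return one message per violated gate; an empty list means the gate passes."""
    violations = []
    for endpoint, baseline in sorted(baselines.items()):
        measured = measurements.get(endpoint)
        if measured is None:
            # A missing measurement is treated as a failure, not a pass.
            violations.append(f"{endpoint}: no measurement recorded")
            continue
        if measured.response_ms > baseline.max_response_ms:
            violations.append(
                f"{endpoint}: response time {measured.response_ms:.0f} ms "
                f"exceeds baseline {baseline.max_response_ms:.0f} ms"
            )
        if measured.tps < baseline.min_tps:
            violations.append(
                f"{endpoint}: throughput {measured.tps:.0f} TPS "
                f"is below baseline {baseline.min_tps:.0f} TPS"
            )
    return violations
```

A build step in a tool such as Jenkins or TeamCity could print each message returned by `gate_violations` to the console log and exit with a non-zero status when the list is not empty, which is one simple way to fail the build.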
In this blog series, we have examined the principles of performance engineering, component testing, and how to achieve robust lean-agile states. We talked about app decomposition and the importance of defining a strategy for measuring applications and components, as well as the four types of performance test cases I personally use as the basis for understanding application performance. I hope you found these posts as useful as I found them enjoyable to write!