The AppD Approach: Principles of Cloud Metrics

Summary
As more enterprises host their applications in the cloud, it becomes increasingly important for application performance monitoring (APM) solutions to ingest performance data from cloud providers.

At AppDynamics, we are committed to making cloud performance data a first-class part of our product. Below are some of the principles we use to guide our roadmap for cloud performance data.

Connecting to Cloud Platforms Should Be Dead Simple

Connecting to a cloud platform to ingest its performance data, whether from Amazon CloudWatch, Azure Monitor, or Google Stackdriver, should be as simple as using OAuth to log in to a website. Ideally it takes seconds, or minutes at most.

An OAuth-type standard for sharing performance data has yet to be created, but in the interim, here are two examples of how we plan to ingest CloudWatch data, both of which satisfy our requirement for a fast setup:

  • Role Sharing: You create an “appdynamics_monitoring” IAM role with a read-only policy and a trust relationship that references the AppDynamics AWS account. You then go into AppDynamics and enter your AWS account ID and the role’s name.
  • Role Keys: You create an “appdynamics_monitoring” IAM user with a read-only policy, and enter that user’s access key ID and secret access key into AppDynamics. (In AWS IAM, long-lived access keys belong to a user rather than a role.)
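
To make the Role Sharing option concrete, here is a minimal Python sketch of the two JSON policy documents such a role might carry. It is an illustration under stated assumptions, not the setup AppDynamics prescribes: the account ID `111122223333` is a placeholder (the real AppDynamics AWS account ID is not given here), the choice of CloudWatch actions is our guess at what "read-only" could mean, and the helper names are hypothetical.

```python
import json

# Placeholder only: the real AppDynamics AWS account ID is not stated in this post.
APPD_ACCOUNT_ID = "111122223333"
ROLE_NAME = "appdynamics_monitoring"


def build_trust_policy(trusted_account_id: str) -> dict:
    """Trust policy that lets principals in the trusted account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_read_only_policy() -> dict:
    """Permissions policy limited to read-only CloudWatch actions (assumed set)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "cloudwatch:GetMetricData",
                    "cloudwatch:GetMetricStatistics",
                    "cloudwatch:ListMetrics",
                ],
                "Resource": "*",
            }
        ],
    }


def role_arn(customer_account_id: str, role_name: str = ROLE_NAME) -> str:
    """ARN of the monitoring role inside the customer's own account."""
    return f"arn:aws:iam::{customer_account_id}:role/{role_name}"


if __name__ == "__main__":
    print(json.dumps(build_trust_policy(APPD_ACCOUNT_ID), indent=2))
    print(json.dumps(build_read_only_policy(), indent=2))
    print(role_arn("123456789012"))
```

The account ID and role name entered into the product are enough to derive the role ARN, which is why this option needs so little input from the user.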

 

Further, we’ve found that many of our customers have several AWS accounts, sometimes dozens. In these cases, we believe it’s important to have a UX that lets you link these accounts in a single batch.
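
As a rough sketch of what batch linking could involve behind such a UX, the Python snippet below takes a pasted list of account IDs, normalizes and de-duplicates them, and derives one role ARN per account. The function name and the decision to accept hyphenated IDs are assumptions made for illustration; only the 12-digit account ID format and the ARN layout come from AWS itself.

```python
import re

ROLE_NAME = "appdynamics_monitoring"
ACCOUNT_ID_PATTERN = re.compile(r"[0-9]{12}")  # AWS account IDs are 12 digits


def plan_batch_link(raw_ids, role_name=ROLE_NAME):
    """Return (role_arns, rejected_inputs) for a batch of pasted account IDs."""
    role_arns, rejected, seen = [], [], set()
    for raw in raw_ids:
        # Tolerate surrounding whitespace and the hyphenated display form.
        account_id = raw.strip().replace("-", "")
        if not ACCOUNT_ID_PATTERN.fullmatch(account_id):
            rejected.append(raw)
            continue
        if account_id in seen:
            continue  # skip duplicates silently
        seen.add(account_id)
        role_arns.append(f"arn:aws:iam::{account_id}:role/{role_name}")
    return role_arns, rejected


if __name__ == "__main__":
    arns, bad = plan_batch_link(["123456789012", "1234-5678-9012", "not-an-id"])
    print(arns)
    print(bad)
```

Reporting the rejected inputs separately lets the interface flag typos without blocking the accounts that did validate.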

Slicing the Stack

An enterprise’s tech stack is like a massive 3D graph, with services at the top talking to each other, running on top of shared and unshared, real and virtual infrastructure that goes several layers deep. Consider the following example of the possible complexity of such a stack:

[Figure: example of a complex tech stack]
With many of our customers, we find there are folks who own a vertical slice of the stack and others who own a horizontal slice. We believe it’s important to view these users as distinct, and to have views of the tech stack that are relevant to their roles.

The Vertical Consumer

The vertical consumer looks at the stack from an application-first perspective, from the top down. They are the “traditional” APM user. For them, we think it’s important to visualize cloud performance data in the following ways:

Cloud Entities on the Flowmap

We want to put any cloud entity that receives a request from an application or web server onto the flowmap. We plan to start with CloudWatch, specifically Lambda, ELB, RDS, S3, and DynamoDB. When you see one of these entities on the flowmap, you’ll be able to mouse over it to see a hover card containing key metrics. For example, for an ELB instance, you’ll see latency, requests/min, and backend errors.

If you want to see more, you’ll be able to click into the entity to see a dashboard where you can visualize all of its CloudWatch metrics. This dashboard will have default time series and charts based on what we believe are the most important metrics for the entity, but will also be fully customizable. And if you edit the chart for one entity, you’ll have the option to instantly apply your changes to charts for all instances of that entity.
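
To illustrate the ELB hover card, the Python sketch below builds the query structures that CloudWatch's GetMetricData API accepts for the three metrics mentioned. The mapping is an assumption on our part: we use the Classic Load Balancer metrics `Latency`, `RequestCount`, and `HTTPCode_Backend_5XX` in the `AWS/ELB` namespace, and the product may well choose different metrics or statistics. The function and constant names are hypothetical.

```python
# Assumed mapping from hover-card labels to Classic ELB metrics and statistics.
HOVER_CARD_METRICS = [
    ("latency", "Latency", "Average"),
    ("requests_per_min", "RequestCount", "Sum"),
    ("backend_errors", "HTTPCode_Backend_5XX", "Sum"),
]


def build_hover_card_queries(load_balancer_name, period_seconds=60):
    """Build one GetMetricData query per hover-card metric for a single ELB."""
    queries = []
    for query_id, metric_name, stat in HOVER_CARD_METRICS:
        queries.append(
            {
                "Id": query_id,  # must start with a lowercase letter
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/ELB",
                        "MetricName": metric_name,
                        "Dimensions": [
                            {"Name": "LoadBalancerName", "Value": load_balancer_name}
                        ],
                    },
                    "Period": period_seconds,  # 60 s, so each Sum is a per-minute count
                    "Stat": stat,
                },
                "ReturnData": True,
            }
        )
    return queries


if __name__ == "__main__":
    for query in build_hover_card_queries("my-load-balancer"):
        print(query["Id"], query["MetricStat"]["Metric"]["MetricName"])
```

With an AWS SDK such as boto3, a list like this could be passed as the `MetricDataQueries` argument of the CloudWatch client's `get_metric_data` call, together with a start and end time.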

Cloud Infrastructure Metrics Right Under the Flowmap

Putting cloud entities on the flowmap works well for entities that applications are talking to, but what about those that applications sit on top of? We plan to make these infrastructure metrics available to you whenever you double-click on an application. So if you have Tomcat running on an EC2 instance, you’ll be able to click into the Tomcat tier on your flowmap and see key metrics from the EC2 instance it’s running on.

The Horizontal Consumer

The horizontal consumer concerns themselves with, you guessed it, a horizontal slice of the tech stack. They might have ownership of EKS, ECS, EC2, SNS, RDS, or S3, with a focus on ensuring compute and memory availability. For them, we believe the most valuable interface is a highly customizable set of dashboards where they can view:

  • Aggregate metrics for the entities they are responsible for. In one view, for example, you should be able to see aggregate EC2, Lambda, and RDS latency.
  • Aggregate metrics for a single entity type. In one view, you should be able to see the most important aggregate EC2 metrics.
  • Metrics for a single entity. You should be able to navigate from the above view to one for a specific EC2 instance.
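
The three views above amount to three ways of grouping the same samples. The Python sketch below shows one possible shape for that grouping. The `Sample` record, the function names, and the use of a simple mean are all illustrative assumptions, not a description of how the dashboards are implemented.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    entity_type: str  # e.g. "EC2", "Lambda", "RDS"
    entity_id: str    # e.g. an instance or function identifier
    metric: str       # metric name as reported by the cloud provider
    value: float


def mean_by_entity_type(samples, metric):
    """View 1: one metric, averaged per entity type."""
    grouped = defaultdict(list)
    for s in samples:
        if s.metric == metric:
            grouped[s.entity_type].append(s.value)
    return {entity_type: mean(values) for entity_type, values in grouped.items()}


def means_for_entity_type(samples, entity_type):
    """View 2: every metric for one entity type, averaged across its entities."""
    grouped = defaultdict(list)
    for s in samples:
        if s.entity_type == entity_type:
            grouped[s.metric].append(s.value)
    return {metric: mean(values) for metric, values in grouped.items()}


def metrics_for_entity(samples, entity_id):
    """View 3: the raw metrics for a single entity."""
    return {s.metric: s.value for s in samples if s.entity_id == entity_id}
```

Keeping all three views as functions over one list of samples is what makes it cheap to navigate from an aggregate down to a specific instance.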

 

A constant throughout these views will be drop-downs and autocomplete fields that allow you to easily see what metrics are available, and what dimensions you can use to slice the data. Further, we think it’s vital to import all the tags you’ve used to describe your AWS resources, so you can use them to filter and compare.
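
As a small sketch of how imported tags could drive those drop-downs and filters, the Python snippet below lists the values available for a tag key and narrows a set of entities to those matching the selected tags. The entity dictionaries and function names are hypothetical; they only stand in for whatever representation the product uses.

```python
def available_tag_values(entities, key):
    """Sorted distinct values for a tag key, e.g. to populate a drop-down."""
    return sorted({e["tags"][key] for e in entities if key in e.get("tags", {})})


def filter_by_tags(entities, required_tags):
    """Keep entities whose tags match every key/value pair in required_tags."""
    return [
        e
        for e in entities
        if all(e.get("tags", {}).get(k) == v for k, v in required_tags.items())
    ]


if __name__ == "__main__":
    entities = [
        {"id": "i-a", "type": "EC2", "tags": {"env": "prod", "team": "checkout"}},
        {"id": "i-b", "type": "EC2", "tags": {"env": "staging", "team": "checkout"}},
        {"id": "db-1", "type": "RDS", "tags": {"env": "prod"}},
    ]
    print(available_tag_values(entities, "env"))
    print([e["id"] for e in filter_by_tags(entities, {"env": "prod"})])
```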

What’s Next

We’re excited to start executing on a roadmap that aligns with the above principles. If you have any questions or comments, please don’t hesitate to get in touch!

This blog may contain product roadmap information of AppDynamics. AppDynamics reserves the right to change any product roadmap information at any time, for any reason and without notice. This information is intended to outline AppDynamics’ general product direction, it is not a guarantee of future product features, and it should not be relied on in making a purchasing decision. The development, release, and timing of any features or functionality described for AppDynamics’ products remains at AppDynamics’ sole discretion. AppDynamics reserves the right to change any planned features at any time before making them generally available as well as never making them generally available.