If you’ve used a GPS product before, you’re probably familiar with Garmin. They are the leading worldwide provider of navigation devices and technologies. In addition to their popular consumer GPS products, Garmin also offers devices for planes, boats, outdoor activities, fitness, and cars. Because end-user experience is so important to Garmin and its customers, ensuring the speed of its web and mobile applications is one of its highest priorities.
This blog post summarizes the content of a case study that can be found by clicking here.
The reality in today’s IT-driven world is that there are many different problems to solve, and therefore many different use cases for the products designed to help solve them. Garmin has shown its industry leadership by using AppDynamics to address many use cases with a single product. This shows a level of maturity that most organizations strive to achieve.
“We never had anything that would give us a good deep dive into what was going on in the transactions inside our application – we had no historical data, and we had no insight into database calls, threads, etc.,” – Doug Strick
Real-Time and Historic Visibility in Production
You just can’t test everything before your application goes to production. Real users will find ways to break your application and/or make it slow to a crawl. When this happens you need historical data to compare against as well as real-time information to quickly remediate problems.
Garmin found that AppDynamics, an application performance management solution designed for high-volume production environments, best suited its needs. Garmin’s operations team feels comfortable leaving AppDynamics on in the production environment, allowing them to respond more quickly and collect historical data.
The following use cases are examples of how Garmin is attacking IT and business problems to better serve their company and their customers.
Use Case 1: Memory Monitoring
Before they started using AppDynamics, Garmin was unable to track application memory usage over time. Using AppDynamics they were able to identify a major memory issue impacting customers and ensure a fast and stable user experience. Click here to read the details in the case study.
Use Case 2: Automated Remediation Using Application Run Book Automation
AppDynamics’ Application Run Book Automation feature allows organizations to trigger certain tasks, such as restarting a JVM or running a script, with thresholds in AppDynamics. Garmin was able to use this feature very effectively one weekend while waiting for a fix from a third party. Click here to read the details in the case study.
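To make the idea of threshold-triggered remediation concrete, here is a minimal Python sketch. It is a generic illustration only, not AppDynamics’ API or configuration format: the `RunbookRule` class, the `heap_used_percent` metric name, the 90% threshold, and the `restart_jvm` placeholder are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunbookRule:
    """Hypothetical rule: run `action` after enough consecutive threshold breaches."""
    metric: str
    threshold: float
    consecutive_breaches: int      # samples in a row required before acting
    action: Callable[[], str]      # e.g. a wrapper around a restart script


def restart_jvm() -> str:
    # Placeholder only: a real action would invoke a script or service manager.
    return "restart-jvm"


def evaluate(rule: RunbookRule, samples: List[float]) -> List[str]:
    """Walk the metric samples in order and return the actions that fired."""
    triggered = []
    streak = 0
    for value in samples:
        # Count consecutive samples above the threshold; any dip resets the count.
        streak = streak + 1 if value > rule.threshold else 0
        if streak == rule.consecutive_breaches:
            triggered.append(rule.action())
            streak = 0  # start counting again after remediation
    return triggered


rule = RunbookRule("heap_used_percent", 90.0, 3, restart_jvm)
actions = evaluate(rule, [70, 92, 95, 97, 60, 91, 93])
print(actions)
```

Requiring several consecutive breaches before acting is one way to avoid restarting on a single noisy sample. The right count depends on how often the metric is sampled.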
Use Case 3: Code-level Diagnostics
“We knew we needed a tool that could constantly monitor our production environment, allowing us to collect historical data and trend performance over time,” said Strick. “Also, we needed something that would give us a better view of what was going on inside the application at the code level.”
With AppDynamics, Garmin’s application teams have been able to rapidly resolve several issues originating in the application code. Click here to read the details in the case study.
Use Case 4: Bad Configuration in Production
Garmin also uses AppDynamics to verify that new code releases perform well. Because Garmin has an agile release schedule, new code releases are very frequent (every 2 weeks) and it’s important to ensure that each release goes smoothly. One common problem that occurs when deploying new code is that configurations for test are not updated for the production environment, resulting in performance issues. Click here to read the details in the case study.
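A simple pre-release check can catch some of these leftover test settings before they reach production. The sketch below is an assumption-laden example, not something taken from the case study: the marker strings, the config keys, and the `example.com` hostnames are all hypothetical, and the substring match is deliberately naive (a value containing "latest" would be flagged because it contains "test").

```python
from typing import Dict, List

# Hypothetical markers suggesting a value still points at a non-production system.
TEST_MARKERS = ("test", "qa", "staging", "localhost")


def find_test_leftovers(config: Dict[str, str]) -> List[str]:
    """Return the keys (sorted) whose values contain a test-environment marker."""
    suspicious = []
    for key, value in sorted(config.items()):
        lowered = value.lower()
        if any(marker in lowered for marker in TEST_MARKERS):
            suspicious.append(key)
    return suspicious


# Made-up production config with two values that were never updated.
production_config = {
    "db.url": "jdbc:mysql://db-test.example.com:3306/shop",
    "cache.host": "cache.prod.example.com",
    "payment.endpoint": "https://localhost:8443/pay",
}
leftovers = find_test_leftovers(production_config)
print(leftovers)
```

A check like this could run as a gate in the deployment pipeline, failing the release when the list is not empty.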
Use Case 5: Real-time Business Metrics (RTBM)
IT metrics alone don’t tell the whole story. There can be business problems due to pricing issues or issues with external service providers that impact the business and will never show up as an IT metric. That’s why AppDynamics created RTBM and why Garmin adopted it so early on in their deployment of AppDynamics. With Real-Time Business Metrics, organizations like Garmin can collect, monitor, and visualize transaction payload data from their applications. Strick used this capability to measure the revenue flow in Garmin’s eCommerce application. Click here to read the details in the case study.
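As a rough illustration of turning transaction payload data into a business metric, the Python sketch below sums order totals into per-minute revenue buckets and flags minutes that fall below a floor. This is not how RTBM is implemented. The `(timestamp, cents)` payload shape, the function names, and the sample values are assumptions invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical checkout payload: (epoch_seconds, order_total_in_cents).
Transaction = Tuple[int, int]


def revenue_per_minute(transactions: Iterable[Transaction]) -> Dict[int, int]:
    """Sum order totals into buckets keyed by the start of each minute."""
    totals: Dict[int, int] = defaultdict(int)
    for timestamp, cents in transactions:
        minute_start = timestamp - (timestamp % 60)
        totals[minute_start] += cents
    return dict(totals)


def minutes_below(totals: Dict[int, int], floor_cents: int) -> List[int]:
    """Return the minute buckets whose revenue is under the floor, in time order."""
    return sorted(minute for minute, cents in totals.items() if cents < floor_cents)


sample = [(0, 1999), (30, 2500), (65, 500), (130, 4000)]
totals = revenue_per_minute(sample)
low_minutes = minutes_below(totals, 1000)
print(totals, low_minutes)
```

Amounts are kept in integer cents to avoid floating-point rounding. Note that a minute with no transactions at all never appears in the buckets, so a real check would also need to look for missing minutes.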
“AppDynamics has given us visibility into areas we never had visibility into before… Some issues that used to take several hours now take under 30 minutes to resolve, or a matter of seconds if we’re using Run Book Automation.” - Doug Strick
Garmin has shown how to get real value out of its APM tool of choice and you can do the same. Click here to try AppDynamics for free in your environment today.
Today we are blogging something a little different from our usual. I’m Jim Hirschauer, the Operations Guy, and this is my esteemed colleague Dustin Whittle, the Developer. In this blog post we’re going to discuss how we would take an application from inception through development, testing, QA, and into production. We’ll each comment on the different stages and provide our perspective on the tools that we need to use at each stage and how they help with automation, testing, and monitoring. Along the way we’ll call out the potential collaboration points to identify the areas where the DevOps approach provides the most value.
The software development loop looks like this:
Inception and working with a product team
From an operational perspective, my first instinct is to understand the application architecture so that I can start thinking about the proper deployment model for the infrastructure components. Here are some of my operational questions and considerations for this stage:
- Are we using a public or private cloud?
- What is the lead time for spinning up each component and ensuring they comply with my company’s regulations?
- When do I need to provide a development environment to my dev team or will they handle it themselves?
- Does this application perform functions that other applications or services already handle? Operations should have high-level visibility into the application and service portfolio.
From a development perspective, my first milestone is to make sure the ops team fully understands the application and what it takes to deploy it to a pre-production environment. This is where we, the developers, sync with the product and ops teams and make sure we are aligned.
Planning for the product team:
- Is the project scope well defined? Is there a product requirements document?
- Do we have a well defined product backlog?
- Are there mocks of the user experience?
Planning for the ops team:
- What tools will we use for deployment and configuration management?
- How will we automate the deployment process and does the ops team understand the manual steps?
- How will we integrate our builds with our continuous integration server?
- How will we automate the provisioning of new environments?
- Capacity Planning – Do we know the expected production load?
There’s not a ton of activity at this stage for the operations team. This is really where the devops synergy comes into play. DevOps is simply operations working together with engineers to get things done faster in an automated and repeatable way. When it comes to scaling, the more automation in place the easier things will be in the long run.
Development and scoping production
This should start with a conversation between the dev and ops teams to establish domain ownership. Depending on your organization and your peers’ strengths, this is a good time to decide who will be responsible for automating the provisioning and deployment of the application. The ops questions for deploying complex web applications:
- How do you provision virtual machines?
- How do you configure network devices and servers?
- How do you deploy applications?
- How do you collect and aggregate logs?
- How do you monitor services?
- How do you monitor network performance?
- How do you monitor application performance?
- How do you alert and remediate when there are problems?
During the development phase, the operations-focused staff normally make sure the development environment is managed and actively work to set up the test, QA, and prod environments. This can take a lot of time if automation tools aren’t used.
Here are some tools you can use to automate server build and configuration:
Meanwhile, the operations staff should also make sure that the developers have access to tools which will help them with release management and application monitoring and troubleshooting. Here are some of those tools:
Application + Network Performance Management:
Testing and Quality Assurance
Once developers have built unit and functional tests, we need to ensure the tests run after every commit and that we don’t allow regressions in our promoted environments. In theory, developers should do this before they commit any code, but oftentimes problems don’t show up until you have production traffic running on production infrastructure. The goal of this step is to simulate, as much as possible, everything that can go wrong, and to find out what happens and how to remediate it.
The next step is to do capacity planning and load testing to be confident the application doesn’t fall over when it is needed most. There are a variety of tools for load testing:
- Apica Load Test - Cloud-based load testing for web and mobile applications
- Soasta – Build, execute, and analyze performance tests on a single, powerful, intuitive platform.
- Bees with Machine Guns – A utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications).
- MultiMechanize – Multi-Mechanize is an open source framework for performance and load testing. It runs concurrent Python scripts to generate load (synthetic transactions) against a remote site or service. Multi-Mechanize is most commonly used for web performance and scalability testing, but can be used to generate workload against any remote API accessible from Python.
- Google PageSpeed Insights - PageSpeed Insights analyzes the content of a web page, then generates suggestions to make that page faster. Reducing page load times can reduce bounce rates and increase conversion rates.
The last step of testing is discovering all of the possible failure scenarios and coming up with a disaster recovery plan. For example, what happens if we lose a database or a data center, or have a 100x surge in traffic?
During the test and QA stages operations needs to play a prominent role. This is often overlooked by ops teams but their participation in test and QA can make a meaningful difference in the quality of the release into production. Here’s how.
If the application is already in production (and monitored properly), operations has access to production usage and load patterns. These patterns are essential to the QA team for creating a load test that properly exercises the application. I once watched a functional test where 20+ business transactions were tested manually by the application support team. Directly after the functional test I watched the load test that ran the same 2 business transactions over and over again. Do you think the load test was an accurate representation of production load? No way! When I asked the QA team why there were only 2 transactions they said “Because that is what the application team told us to model.”
The development and application support teams usually don’t have time to sit with the QA team and give them an accurate assessment of what needs to be modeled for load testing. Operations teams should work as the middle man and provide business transaction information from production or from development if this is an application that has never seen production load.
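One way operations can hand that information over is as a transaction mix derived from production counts, which the QA team can then use to weight the load test. The Python sketch below shows the idea under stated assumptions: the transaction names and counts are made up, and `plan_load_test` is a hypothetical helper that only produces the sequence of transactions to run. It does not drive any particular load testing tool.

```python
import random
from typing import Dict, List


def transaction_mix(production_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw per-transaction counts into each transaction's share of traffic."""
    total = sum(production_counts.values())
    return {name: count / total for name, count in production_counts.items()}


def plan_load_test(production_counts: Dict[str, int],
                   total_requests: int,
                   seed: int = 0) -> List[str]:
    """Pick which business transaction each simulated request should run,
    weighted by how often that transaction occurs in production."""
    rng = random.Random(seed)  # seeded so the plan is repeatable between runs
    names = list(production_counts)
    weights = [production_counts[name] for name in names]
    return rng.choices(names, weights=weights, k=total_requests)


# Made-up hourly counts for four business transactions.
counts = {"login": 200, "search": 500, "checkout": 100, "view_item": 200}
mix = transaction_mix(counts)
plan = plan_load_test(counts, total_requests=1000)
print(mix)
```

Compared with replaying the same two transactions over and over, a plan built this way exercises every transaction that production sees, in roughly the proportions production sees them.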
Here are some of the operational tasks during testing and QA:
- Ensure monitoring tools are in place.
- Ensure environments are properly configured.
- Participate in functional, load, stress, leak, and other tests, and provide analysis and support.
- Provide guidance to the QA team.
Production is traditionally the domain of the operations team. For as long as I can remember, the development teams have thrown applications over the production wall for the operations staff to deal with when there are problems. Sure, some problems like hardware issues, network issues, and cooling issues are purely on the shoulders of operations–but what about all of those application specific problems? For example, there are problems where the application is consuming way too many resources, or when the application has connection issues with the database due to a misconfiguration, or when the application just locks up and has to be restarted.
I recall getting paged in the middle of the night for application-related issues and thinking how much better each release would be if the developers had to support their applications once they made it to production. It was really difficult back in those days to say with any certainty that the problem was application related and that a developer needed to be involved. Today’s monitoring tools have changed that and allow for problem isolation in just minutes. Since developers in financial services organizations are not allowed access to production servers, it makes having the proper tools all the more important.
Production devops is all about:
- deploying code in a fast, repeatable, scalable manner
- rapidly identifying performance and stability problems
- alerting the proper team when a problem is detected
- rapidly isolating the root cause of problems
- automatic remediation of known problems and rapid manual remediation of new problems (runbooks and runbook automation)
Your application must always be available and operating correctly during business hours (this may be 24×7 for your specific application).
In case of failures, alerting tools are crucial to notify the ops team of serious issues. The operations team will usually have a runbook to turn to when things go wrong. A best practice is for dev and ops to collaborate on incident response plans.
Finally we’ve made it to the last major category of the SDLC, maintenance. As an operations guy my mind focuses on the following tasks:
- Capacity planning – Do we have enough resources available to the application? If we use dynamic scaling, this is less an issue of resources and more a task of ensuring the scaling is working properly.
- Patching – are we up to date with patches on the infrastructure and application components? This is supposed to help with performance and/or security and/or stability but it doesn’t always work out that way.
- Support – are we current with our software support levels (aka, have we paid and are we on supported versions)?
- New releases (application updates) – New releases always made me cringe since I assumed the release would have issues the first week. I learned this reaction from some very late nights immediately following those new releases.
As a developer, the biggest issue during the maintenance phase is working with the operations team to deploy new versions and make critical bug fixes. The other primary concern is troubleshooting production problems. Even when no new code has been deployed, failures sometimes happen. If you have a great process, application performance monitoring, and a devops mentality, collaborating with ops to resolve the root cause of failures becomes easy.
As you can see, the dev and ops perspectives are pretty different, but that’s exactly why those two sides of the house need to tear down the walls and work together. DevOps isn’t just a set of tools, but a philosophical shift that requires buy-in from everyone involved to really succeed. It’s only through a high level of collaboration that things will change for the better. AppDynamics can’t change the mindset of your organization, but it is a great way to foster collaboration across all of your organizational silos. Sign up for your free trial today and make a difference for your organization.