How Garmin attacks IT and business problems with Application Performance Management (APM)

If you’ve used a GPS product before, you’re probably familiar with Garmin, the leading worldwide provider of navigation devices and technologies. In addition to its popular consumer GPS products, Garmin also offers devices for planes, boats, outdoor activities, fitness, and cars. Because end user experience is so important to Garmin and its customers, ensuring the speed of its web and mobile applications is one of its highest priorities.

This blog post summarizes the content of a case study that can be found by clicking here.

The reality in today’s IT-driven world is that there are many different problems to solve, and therefore many different use cases for the products designed to solve them. Garmin has shown its industry leadership by taking advantage of AppDynamics to address many use cases with a single product, a level of maturity that most organizations strive to achieve.

“We never had anything that would give us a good deep dive into what was going on in the transactions inside our application – we had no historical data, and we had no insight into database calls, threads, etc.,” – Doug Strick

Garmin-FlowMap

AppDynamics flow map showing part of Garmin’s production environment.

Real-Time and Historic Visibility in Production

You just can’t test everything before your application goes to production. Real users will find ways to break your application or slow it to a crawl. When this happens you need historical data to compare against, as well as real-time information to quickly remediate problems.

Garmin found that AppDynamics, an application performance management solution designed for high-volume production environments, best suited its needs. Garmin’s operations team feels comfortable leaving AppDynamics on in the production environment, allowing them to respond more quickly and collect historical data.

The following use cases are examples of how Garmin is attacking IT and business problems to better serve its business and its customers.

Use Case 1: Memory Monitoring

Before it started using AppDynamics, Garmin was unable to track application memory usage over time. Using AppDynamics, Garmin identified a major memory issue impacting customers and ensured a fast and stable user experience. Click here to read the details in the case study.

Garmin-Heap

Chart of heap usage over time.

Use Case 2: Automated Remediation Using Application Run Book Automation

AppDynamics’ Application Run Book Automation feature allows organizations to trigger tasks, such as restarting a JVM or running a script, when performance thresholds are crossed in AppDynamics. Garmin used this feature very effectively one weekend while waiting for a fix from a third party. Click here to read the details in the case study.

Use Case 3: Code-level Diagnostics

“We knew we needed a tool that could constantly monitor our production environment, allowing us to collect historical data and trend performance over time,” said Strick. “Also, we needed something that would give us a better view of what was going on inside the application at the code level.”

With AppDynamics, Garmin’s application teams have been able to rapidly resolve several issues originating in the application code.  Click here to read the details in the case study.

Use Case 4: Bad Configuration in Production

Garmin also uses AppDynamics to verify that new code releases perform well. Because Garmin follows an agile release schedule, new code releases are very frequent (every two weeks), and it’s important to ensure that each release goes smoothly. One common problem when deploying new code is that test-environment configurations are not updated for production, resulting in performance issues. Click here to read the details in the case study.

Garmin-Verification

AppDynamics flow map showing production nodes making service calls to test environments.

Use Case 5: Real-time Business Metrics (RTBM)

IT metrics alone don’t tell the whole story. There can be business problems, such as pricing issues or issues with external service providers, that impact the business but never show up as an IT metric. That’s why AppDynamics created RTBM and why Garmin adopted it so early in its deployment of AppDynamics. With Real-Time Business Metrics, organizations like Garmin can collect, monitor, and visualize transaction payload data from their applications. Strick used this capability to measure the revenue flow in Garmin’s eCommerce application. Click here to read the details in the case study.

Garmin-RTBM

Real-time order totals for web orders.

Conclusion

“AppDynamics has given us visibility into areas we never had visibility into before… Some issues that used to take several hours now take under 30 minutes to resolve, or a matter of seconds if we’re using Run Book Automation.” – Doug Strick

Garmin has shown how to get real value out of its APM tool of choice, and you can do the same. Click here to try AppDynamics for free in your environment today.

Going live with a mobile app: Part 2 – Developing a mobile application

In the first part of this series I discussed planning your mobile application and how to decide between HTML5 vs Native, iOS vs Android, and how to get started going live with a mobile app. In this post I will dive into some considerations when developing a mobile application.

With a large and mature developer ecosystem, there are ample development resources available, and with mature open source frameworks for both iOS and Android it is easy to build a solid foundation by standing on the shoulders of giants. GitHub is your friend when it comes to discovering the open source communities that power so many iOS and Android applications. While iOS itself is closed source, there are many libraries available like AFNetworking, SSToolkit, PonyDebugger, and KIF. Android is a primarily open source operating system with many libraries available like GreenDAO ORM, Picasso, and OkHttp.

Development consideration: PaaS vs IaaS vs SaaS

When undertaking a new mobile application you must decide whether to integrate your existing backend or build a new one. If you build a new backend for your mobile application, consider whether to use an existing mobile cloud like Parse, Stackmob, Kinvey, Helios, or Cloudmine. These mobile cloud platforms let you develop new applications more quickly, with APIs for writing custom application code, storing user and application data, managing push notifications, and instrumenting with analytics. Mobile cloud platforms as a service are the fastest way to get to market if you don’t have existing server-side APIs. More often than not, though, you will need to integrate your existing application data into your mobile experience. When you choose between infrastructure as a service and platform as a service, you are really deciding which problem set to focus on: either you focus 100% on your application, or you get more flexibility (and ongoing maintenance) by using a lower-level provider that gives you infrastructure as a service.

iaas_vs_paas

Development consideration: Degrade gracefully for a smooth online and offline experience

When designing a modern mobile application you must plan for a variety of conditions, from high-speed, low-latency Wi-Fi connections to low-speed, high-latency 2G/3G/4G connections to being offline altogether. Adapting to the sheer variety in devices, form factors, CPU speeds, and memory is a daunting task. It’s important to support online interactions and gracefully fall back to offline mode while maintaining state and a consistent user experience.

Mobile insights with AppDynamics

With AppDynamics for Mobile, you get complete visibility into the end user experience across mobile and web with the end user experience dashboard. Get a better understanding of your audience and where to focus development efforts with analytics on devices, carriers, operating systems, and application versions:

AppDynamics For Mobile

Want to start monitoring your iOS or Android application today? Sign up for our beta program to get the full power of AppDynamics for your mobile apps.

Mobile APM

In the next post in this series I will dive into launching a mobile app in the app stores. As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.

Going live with a mobile app: Part 1 – Planning a mobile app

There are many concerns when going live with a modern mobile application, from project planning, to development, to quality assurance and managing a successful launch. In this series we will dive into what it takes to go live with a mobile application on iOS and Android.

Whether you are a brick-and-mortar retailer or an eCommerce web site, the need to go mobile is greater than ever. With mobile market share growing year over year, mobile applications are the new web site. If you don’t have one available you are missing a major market. According to a report from Kleiner Perkins, mobile applications now account for 15% of all Internet traffic, with 1.5 billion users worldwide. Mobile adoption has accelerated like never before:

The path to 50 million users

As of May 2013, 91% of American adults own a cell phone and 56% of American adults own a smartphone according to a recent survey by Pew Internet. So you want to launch a mobile application and skyrocket to your first 50 million users. Here is what you need to know to get started.

Planning a mobile application

Planning a major mobile application is one of the most difficult parts of the project to execute. Which platforms should you build for, and which devices should you support? Do you build your own backend or use a Platform-as-a-Service (PaaS)? Do you build a native mobile application or a web-based HTML5 application? All of these questions and more will come up, and you need to understand the market to make decisions you won’t soon regret.

Design consideration: Native or HTML5?

The promise of HTML5 has always been to write once and run everywhere. That promise comes at the expense of features and user experience. While HTML5 mobile applications are a good start, they will never match the fluid user experience of native applications. The best application experience comes from native integration.

HTML5 does have some strategic benefits:

• Lower cost to deploy across platforms
• Immediate updates and control of distribution
• Faster time to market, since it is easier to hire from a large talent pool
• Broad device compatibility

Native applications have some benefits as well:

• Richer user experience
• Better performance
• Native integrations that enable better monetization

There are several intermediary solutions available to take your HTML5 app and convert it to a native application for iOS and Android. Appcelerator Titanium, AppGyver, PhoneGap, and Sencha Touch all claim to provide a native experience without having to write a native application. In my experience, all of these solutions fall short of delivering a truly compelling mobile experience. If your only metric is time to market, these solutions produce a good result, but if you care about the user experience, native is the only mobile strategy to pursue.

Several high-profile companies have regretted pursuing an HTML5 mobile strategy. Most famously, Mark Zuckerberg, CEO of Facebook, has publicly stated, “On iOS and Android, you can do so much better by doing native work…” and that focusing on HTML5 was a mistake in the long run.

Design consideration: Apple App Store vs Google Play communities

In 2013, if you want the most downloads you build for Android first, but if you want the most revenue you build for iOS first. The reality is that you are building a business with your mobile application, and the money mostly lives in the iOS App Store community. Apple apps make more money even though the Android market is larger: Apple owns only 18 percent of the app universe, but it banks almost 500 percent more than Google, pulling in a sweet $5.1 million in revenue every day, while Google owns 75 percent of all app downloads but takes in only $1.1 million per day.

flurry

The Apple App Store and Google Play app ecosystems have distinct communities, with Apple enforcing stricter regulations and a manual review process for releases, compared to the more agile and developer-friendly Android community. Google Play has a top app price of $200 compared to Apple’s high end of $999, and the App Store shows much more variance in pricing than the Google Play market.

The biggest difference between the app store markets is the regulations enforced on app developers. Apple, while the biggest earner, is also the strictest, with functionality requirements (and restrictions) and quality standards that make it the highest-quality market, but also the most censored. The Google Play Store, meanwhile, is more developer-friendly, with a straightforward app submission and distribution process and better developer tooling and business reporting. Both markets have more than 100 million devices and over a billion app downloads.

Given the constraints and expectations of the venture capital markets that so many app developers operate under, it rarely makes sense to choose Android as a launch platform. The expectation of traction and revenue means Apple iOS is the better business bet when you have limited time and mobile resources. It’s the fastest platform for proving an app concept and the viability of a business. The harsh reality of the mobile app ecosystem is that for every top-earning app there are 10,000 failures.

A look at the current state of the mobile device market share:

device_marketshare

Once you decide which platform to focus your mobile engineering resources on, you need to figure out which devices and versions you will support. As of October 2013, iOS 7.x accounts for 64.9% of all iOS devices:

ios_marketshare

As of October 2013, Android 4.x accounts for 48.6% of all Android devices:

android_marketshare

Based on market share it makes business sense to offer compatibility with Android 2.3.3+ and iOS 5.1+ devices.

Design consideration: Tablet or phone?

With PC shipments declining 8.4% while tablets are expected to grow 53.4%, according to an October 2013 Gartner forecast, there is a clear shift from traditional desktops to tablets and smartphones. One of the design considerations you must factor in is the variety of form factors in mobile devices. Crafting an application experience that adapts to both landscape and portrait modes and works across standard and high-definition screens from three to eleven inches is an impressive feat, especially when balancing high-speed, low-latency Wi-Fi connections against low-speed, high-latency 2G/3G/4G connections.

gartner_marketshare

Mobile insights with AppDynamics

With AppDynamics for Mobile, you get complete visibility into the end user experience across mobile and web with the end user experience dashboard. Get a better understanding of your audience and where to focus development efforts with analytics on devices, carriers, operating systems, and application versions:

AppDynamics For Mobile

Want to start monitoring your iOS or Android application today? Sign up for our beta program to get the full power of AppDynamics for your mobile apps.

Mobile APM

In the next post in this series I will dive into developing a mobile app and the various approaches that are available. As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.

AppDynamics goes to QCon San Francisco

AppDynamics is at QCon San Francisco this week for another stellar event from the folks at InfoQ. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. If you are in the area this week stop by our booth and say hello!

I presented the Performance Testing Crash Course, highlighting how to capacity plan and load test your applications to guarantee a smooth launch.

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

PHP Performance Crash Course, Part 2: The Deep Dive

In the first post in this series I covered some basic tips for optimizing performance in PHP applications. In this post we will dive a bit deeper into the principles and practical tips for scaling PHP.

Top engineering organizations think of performance not as a nice-to-have, but as a crucial feature of their product. Those organizations understand that performance has a direct impact on the success of their business.

Ultimately, scalability is about the entire architecture, not some minor code optimizations. Oftentimes people get this wrong and naively think they should focus on the edge cases. Solid architectural decisions, like doing blocking work in the background via tasks, proactively caching expensive calls, and using a reverse proxy cache, will get you much further than arguing about single quotes versus double quotes.

Just to recap, the first post covered some core principles for performant PHP applications. The first few of those tips don’t really require elaboration, so here I will focus on what matters.

Optimize your sessions

In PHP it is very easy to move your session store to Memcached:

1) Install the Memcached extension with PECL

pecl install memcached

2) Customize your php.ini configuration to change the session handler


session.save_handler = memcached
session.save_path = "localhost:11211"

If you want to support a pool of Memcached instances, separate them with commas:

session.save_handler = memcached
session.save_path = "10.0.0.10:11211,10.0.0.11:11211,10.0.0.12:11211"

The Memcached extension has a variety of configuration options available; see the full list on GitHub. Here is the ideal configuration I have found when using a pool of servers:

session.save_handler = memcached
session.save_path = "10.0.0.10:11211,10.0.0.11:11211,10.0.0.12:11211"

memcached.sess_prefix = "session."
memcached.sess_consistent_hash = On
memcached.sess_remove_failed = 1
memcached.sess_number_of_replicas = 2
memcached.sess_binary = On
memcached.sess_randomize_replica_read = On
memcached.sess_locking = On
memcached.sess_connect_timeout = 200
memcached.serializer = "igbinary"

That’s it! Consult the documentation for a complete explanation of these configuration directives.
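If you can’t modify php.ini (on shared hosting, for example), the same change can be made at runtime. Here is a minimal sketch, assuming a single illustrative Memcached host; it must run before session_start():

// Configure the Memcached session handler at runtime instead of in php.ini
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', '10.0.0.10:11211');

session_start();
$_SESSION['user_id'] = 42; // session data is now stored in Memcached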

Leverage caching

Any data that is expensive to generate or query and is long-lived should be cached in memory if possible. Common examples of highly cacheable data include web service responses, database result sets, and configuration data.
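Here is a minimal sketch of that read-through pattern using the Memcached extension directly; the key name, TTL, and buildExpensiveReport() helper are illustrative, not from any framework:

$cache = new Memcached();
$cache->addServer('localhost', 11211);

$report = $cache->get('homepage.report');
if ($cache->getResultCode() === Memcached::RES_NOTFOUND) {
    // Cache miss: do the expensive work once, then cache it for 10 minutes
    $report = buildExpensiveReport(); // hypothetical expensive call
    $cache->set('homepage.report', $report, 600);
}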

Using the Symfony2 HttpFoundation component for built-in http caching support

I won’t attempt to explain http caching here. Just go read the awesome post from Ryan Tomayko, Things Caches Do, or the more in-depth guide to http caching from Mark Nottingham. Both are stellar posts that every professional developer should read.

With the Symfony2 HttpFoundation component it is easy to add caching support to your http responses. The component is completely standalone and can be dropped into any existing PHP application to provide an object-oriented abstraction around the http specification. The goal is to help you manage requests, responses, and sessions. Add “symfony/http-foundation” to your Composer file and you are ready to get started.

Expires based http caching flow

use Symfony\Component\HttpFoundation\Response;

$response = new Response('Hello World!', 200, array('content-type' => 'text/html'));

$response->setCache(array(
    'etag' => 'a_unique_id_for_this_resource',
    'last_modified' => new DateTime(),
    'max_age' => 600,
    's_maxage' => 600,
    'private' => false,
    'public' => true,
));

If you use both the request and response from HttpFoundation, you can easily check your conditional validators from the request:


use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

$request = Request::createFromGlobals();

$response = new Response('Hello World!', 200, array('content-type' => 'text/html'));

if ($response->isNotModified($request)) {
    $response->send();
}

Find more examples and complete documentation from the very detailed Symfony documentation.

Caching result sets with Doctrine ORM

If you aren’t using an ORM or some form of database abstraction, you should consider it. Doctrine is the most fully featured database abstraction layer and object-relational mapper available for PHP. Of course, adding abstractions comes at the cost of performance, but I find Doctrine to be extremely fast and efficient if used properly. If you leverage the Doctrine ORM you can easily cache result sets in Memcached:


$memcache = new Memcache();
$memcache->connect('localhost', 11211);

$memcacheDriver = new \Doctrine\Common\Cache\MemcacheCache();
$memcacheDriver->setMemcache($memcache);

$config = new \Doctrine\ORM\Configuration();
$config->setQueryCacheImpl($memcacheDriver);
$config->setMetadataCacheImpl($memcacheDriver);
$config->setResultCacheImpl($memcacheDriver);

$entityManager = \Doctrine\ORM\EntityManager::create(array('driver' => 'pdo_sqlite', 'path' => __DIR__ . '/db.sqlite'), $config);

$query = $entityManager->createQuery('select u from Entities\User u');
$query->useResultCache(true, 60);

$users = $query->getResult();

Find more examples and complete documentation from the very detailed Doctrine documentation.

Caching web service responses with Guzzle HTTP client

Interacting with web services is very common in modern web applications. Guzzle is the most fully featured http client available for PHP. Guzzle takes the pain out of sending HTTP requests and the redundancy out of creating web service clients. It’s a framework that includes the tools needed to create a robust web service client. Add “guzzle/guzzle” to your Composer file and you are ready to get started.

Not only does Guzzle support a variety of authentication methods (OAuth 1+2, HTTP Basic, etc.), it also supports best practices like retries with exponential backoff (see the sketch after the caching example below) as well as http caching.


$memcache = new Memcache();
$memcache->connect('localhost', 11211);

$memcacheDriver = new \Doctrine\Common\Cache\MemcacheCache();
$memcacheDriver->setMemcache($memcache);

$client = new \Guzzle\Http\Client('http://www.test.com/');

$cachePlugin = new \Guzzle\Plugin\Cache\CachePlugin(array(
    'storage' => new \Guzzle\Plugin\Cache\DefaultCacheStorage(
        new \Guzzle\Cache\DoctrineCacheAdapter($memcacheDriver)
    )
));
$client->addSubscriber($cachePlugin);

$response = $client->get('http://www.wikipedia.org/')->send();

// response will come from cache if server sends 304 not-modified
$response = $client->get('http://www.wikipedia.org/')->send();
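The retry support mentioned above deserves a quick sketch as well. Guzzle 3 ships with a backoff plugin; this minimal example assumes the plugin’s default retry conditions (connection failures and 500/503 responses):

use Guzzle\Http\Client;
use Guzzle\Plugin\Backoff\BackoffPlugin;

$client = new Client('http://www.test.com/');

// Retry failed requests up to 3 times with exponential backoff
$client->addSubscriber(BackoffPlugin::getExponentialBackoff(3));

$response = $client->get('/')->send();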

Following these tips will allow you to easily cache all your database queries, web service requests, and http responses.

Moving work to the background with Resque and Redis

Any process that is slow and not required for the immediate http response should be queued and processed via non-blocking background tasks. Common examples are sending social notifications (to Facebook, Twitter, LinkedIn), sending emails, and processing analytics. There are a lot of systems available for managing messaging layers or task queues, but I find Resque for PHP dead simple. I won’t provide an in-depth guide, as Wan Qi Chen has already published an excellent blog post series about getting started with Resque. Add “chrisboulton/php-resque” to your Composer file and you are ready to get started. Here is a very simple introduction to adding Resque to your application:

1) Define a Redis backend

Resque::setBackend('localhost:6379');

2) Define a background task

class MyTask
{
    public function perform()
    {
        // Work work work
        echo $this->args['name'];
    }
}

3) Add a task to the queue

Resque::enqueue('default', 'MyTask', array('name' => 'AppD'));

4) Run a command-line task to process the queued tasks in the background with five workers

$ QUEUE=* COUNT=5 bin/resque

For more information, read the official documentation or see the very complete tutorial from Wan Qi Chen.

Monitor production performance

AppDynamics is application performance management software designed to help dev and ops troubleshoot performance problems in complex production applications. The application flow map allows you to easily monitor calls to databases, caches, queues, and web services with code-level detail on performance problems:

Symfony2 Application Flow Map

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.

If you prefer slides, these posts were inspired by a recent tech talk I presented.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Why web analytics aren’t enough!

Every production web application should use web analytics. There are many great free tools for web analytics, the most popular of which is Google Analytics. Google Analytics helps you analyze visitor traffic and paint a complete picture of your audience and their needs. Web analytics solutions provide insight into how people discover your site, what content is most popular, and who your users are. Modern web analytics also provide insight into user behavior, social engagement, client-side page speed, and the effectiveness of ad campaigns. Any responsible business owner is data-driven and should leverage web analytics solutions to learn more about their end users.

Web Analytics Landscape

Google Analytics

While Google Analytics is the most popular and the de facto standard in the industry, there are quite a few quality web analytics solutions available in the marketplace.

The Forrester Wave Report provides a good guide to choosing an analytics solution.

Forrester Wave

There are also many solutions focused on specialized web analytics that are worth mentioning, either geared toward mobile applications or toward getting better analytics on your customers’ interactions.

Once you understand your user demographics, it’s great to be able to get additional information about how performance affects your users. Web analytics only tells you one side of the story: the client side. If you are integrating web analytics, check out Segment.io, which provides analytics.js for easy integration of multiple analytics providers.

It’s all good – until it isn’t

Using Google Analytics on its own is fine and dandy, until you’re having performance problems in production and need visibility into what’s going on. This is where application performance management solutions come in. APM tools like AppDynamics provide the added benefit of understanding both the server side and the client side. Not only can you understand application performance and user demographics in real time, but when you have problems you can use code-level visibility to understand the root cause of your performance problems. Application performance management is the perfect complement to web analytics: not only do you understand your user demographics, but you also understand how performance affects your customers and business. It’s important to be able to see from a business perspective how well your application is performing in production.

 


Since AppDynamics is built on an extensible platform, it’s easy to track custom metrics directly from Google Analytics via the machine agent.

The end user experience dashboard in AppDynamics Pro gives you real-time visibility into where your users are suffering the most:

Profile-PageView

Capturing web analytics is a good start, but it’s not enough to get an end-to-end perspective on the performance of your web and mobile applications. The reality is that understanding user demographics and application experience are two completely separate problems that require two complementary solutions. O’Reilly has a stellar article on why real user monitoring is essential for production applications.

Get started with AppDynamics Pro today for in-depth application performance management.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Performance testing tools explained: The client side

In my last post I showed different approaches for load testing the server side. In this post I will highlight some tools I use to monitor the performance of the client side.

In modern JavaScript-intensive web applications, users spend more time waiting on client-side rendering than on server-side processing. The reality is that for all the effort that goes into optimizing the server side, at least as much effort should go into optimizing the client side. End user monitoring has never been so important.

Why does performance matter?

Just to recap a few statistics about the business impact of performance at major internet companies:

Performance Impact

As you iterate through the software development cycle it is important to measure application performance on both the server side and the client side and understand the impact of every release. Here are some tools you can use to test the performance of the entire end user experience:

Google PageSpeed Insights

Google PageSpeed Insights provides actionable advice for improving the performance of client-side web applications. PageSpeed Insights analyzes the content of a web page, then generates suggestions to make that page faster. Reducing page load times can reduce bounce rates and increase conversion rates. The service is available as part of the Google Chrome Developer Tools, as a browser extension, as a web service, and as extensions for Apache and Nginx.

Use the Google PageSpeed Insights API to integrate client-side optimizations into your continuous integration setup.

curl "https://www.googleapis.com/pagespeedonline/v1/runPagespeed?url=http://dustinwhittle.com/&key=xxx"

WBench

WBench is a tool that uses the HTML5 navigation timing API to benchmark end user load times for websites.

1) Install WBench:

gem install wbench

2) Run WBench:

wbench http://dustinwhittle.com/

WebPageTest.org

WebPageTest.org enables anyone to test client-side performance on a range of browsers from anywhere in the world, for free. This service is great and worth every penny I didn’t pay for it. Not only does it provide a range of mobile and desktop browsers and locations, but it also shows a waterfall timeline and a video of the rendering.


AppDynamics

With AppDynamics Pro you get in-depth performance metrics to evaluate the scalability and performance of your application. Use the AppDynamics Pro Metrics Browser to track end user experience times and errors over the duration of the load tests.

With the AppDynamics Pro End User Experience Dashboard you get visibility into both the server side and the client side.

Use AppDynamics Pro to compare multiple application releases to see the change in performance and stability:

release

In my next post in this series I will cover load testing tools for native mobile applications. Get started with AppDynamics Pro today for in-depth application performance management.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Making Healthcare.gov scale isn’t a quick fix

Health Insurance Marketplace - Please wait (healthcare.gov)

Like many of you, I’ve been following the news about the Obamacare situation. I’m not going to blame the President or his administration for the performance issues Healthcare.gov is experiencing. I’m not going to blame the contractors that did the integration work. Instead, I’m going to blame the colleges and universities around the world that teach computer science and programming.

I earned a Bachelor of Computer Science many moons ago. When I wasn’t in a bar drinking Guinness, I studied the application development lifecycle from a bible-like book called “Software Engineering,” which detailed all the steps I needed to write modern applications. It covered everything from use cases to requirements to development, testing and maintenance. I graduated from college and started as a Java developer for a well-known U.S.-based systems integrator. I soon realized I had learned very little about developing applications that would work in the real world. Making an application function or work per spec is easy; making it perform and scale is an entirely different skillset.

But surely my “Software Engineering” book had a section called “Non-Functional Requirements” which included references to performance and scalability? Yes, it did, but no lecturer deemed those sections critical to my learning. Not once did someone tell me, “By the way, here are the top 25 things that will make your application run like a dog with 3 legs.” Building efficient code, designing scalable architectures and dealing with over 100,000 transactions per minute was a massive gap in my education. That’s why most developers today ignore the performance and scalability non-functional requirements, and leave them to QA teams to deal with once the application has been designed and developed. Simply put, performance is an afterthought.

I’ve spoken at a lot of conferences in the past two years for AppDynamics. In nearly all of them I’ve asked the audience, “How many of you performance test your code?” Usually one or two people raise their hands. This isn’t good enough, and it’s evidence of why websites and applications frequently slow down and break. Performance and scalability are not something you can just “bolt on” to an application; these non-functional requirements must be designed into any application architecture from the start. Any experienced developer or architect knows this. The reality is that requirements will always change throughout the development lifecycle, especially now that agile is the de facto development process. The secret is to design an architecture that can scale and perform through change without needing to rewrite large numbers of components or services. Service-Oriented Architecture (SOA) allows developers to do this because it provides the abstraction needed so new application services can be plugged in and scaled horizontally when required. Add virtualization and cloud into the mix and you’ve got on-demand computing power for those SOA services.

A key problem with SOA is that it creates service fragmentation, distribution and complexity. Developers and architects no longer manage just a few servers; they manage hundreds or thousands of servers that are shared and have lots of dependencies. If you truly want to test these huge SOA environments for performance and scalability you have to reproduce them in test, and that’s not easy. Imagine the Obama administration’s HealthCare.gov team trying to build a test environment and simulate millions of users across hundreds of different internal and remote third-party services. It turns out it was an impossible task: the team only tested the website’s health and performance a week before go-live. Now, imagine if your application looked like this:

productionCraziness

The screenshot above is from an AppDynamics customer that leveraged SOA design principles. Healthcare.gov could be equally complex to manage and test.

Building fast and scalable applications is not easy. The answer isn’t simply to do more testing before you deploy your application in production. The answer is to design performance and scalability into your architecture right from the start. So yes, performance testing is better than no performance testing, but when HealthCare.gov has millions of concurrent users you really need an architecture that was built to scale from its inception.

Application performance monitoring like AppDynamics can definitely help identify the key architecture bottlenecks of HealthCare.gov in production, but the reality is the government is going to need a team of smart, dedicated architects and developers to make the required architecture changes and fixes. Unfortunately this isn’t something that can be achieved in hours or days.

If anyone needs to learn a lesson, it should be the colleges and universities around the world that educate students on how to build applications, so that when students like me graduate, they are aware of what it takes to build scalable applications in the real world. Non-functional requirements like performance and scalability are just as important as the functional requirements we’ve all come to understand and practice in application development.

Healthcare.gov is just another example of what happens when non-functional requirements are ignored.

Steve.

Performance testing tools explained: the server side

The performance of your application affects your business more than you might think. Top engineering organizations think of performance not as a nice-to-have, but as a crucial feature of their product. Those organizations understand that performance has a direct impact on user experience and, ultimately, their bottom line. Unfortunately, most engineering teams do not regularly test the performance and scalability of their infrastructure. In my last post on performance testing I highlighted a long list of tools that can be used for load testing. In this post we will walk through three performance testing tools: Siege, Multi-Mechanize, and Bees with Machine Guns. I will show simple examples to get started performance testing your web applications, regardless of language.

Why does performance matter?

A few statistics about the business impact of performance at major internet companies:

Performance Impact

As you iterate through the software development cycle it is important to measure application performance and understand the impact of every release. As your production infrastructure evolves you should also track the impact of package and operating system upgrades. Here are some tools you can use to load test your production applications:

Apache Bench

Apache Bench is a simple load testing tool provided by default with the Apache httpd server. Here is a simple example: load test example.com with 10 concurrent users for 10 seconds.

Install Apache Bench:

apt-get install apache2-utils

Benchmark a web server with Apache Bench using 10 concurrent connections for 10 seconds:

ab -c 10 -t 10 -k http://example.com/

Benchmarking example.com (be patient)
Finished 286 requests

Server Software:        nginx
Server Hostname:        example.com
Server Port:            80

Document Path:          /
Document Length:        6642 bytes

Concurrency Level:      10
Time taken for tests:   10.042 seconds
Complete requests:      286
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      2080364 bytes
HTML transferred:       1899612 bytes
Requests per second:    28.48 [#/sec] (mean)
Time per request:       351.133 [ms] (mean)
Time per request:       35.113 [ms] (mean, across all concurrent requests)
Transfer rate:          202.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        9   20  14.2     17     148
Processing:   117  325  47.8    323     574
Waiting:      112  317  46.3    314     561
Total:        140  346  46.0    341     589

Percentage of the requests served within a certain time (ms)
  50%    341
  66%    356
  75%    366
  80%    372
  90%    388
  95%    408
  98%    463
  99%    507
 100%    589 (longest request)

 

Siege + Sproxy

Personally, I prefer Siege to Apache Bench for simple load testing as it is a bit more flexible.

Install Siege:

apt-get install siege

Siege a web server with 10 concurrent connections for 10 seconds:

siege -c 10 -b -t 10S http://example.com/

** SIEGE 2.72
** Preparing 10 concurrent users for battle.
The server is now under siege...
Lifting the server siege...      done.

Transactions:              263 hits
Availability:              100.00 %
Elapsed time:              9.36 secs
Data transferred:          0.35 MB
Response time:             0.35 secs
Transaction rate:          28.10 trans/sec
Throughput:                0.04 MB/sec
Concurrency:               9.82
Successful transactions:   263
Failed transactions:       0
Longest transaction:       0.54
Shortest transaction:      0.19

 

More often than not you want to load test an entire site, not just a single endpoint. A common approach is to crawl the entire application to discover all urls and then load test a sample of those urls. The makers of Siege also make sproxy, which, in conjunction with wget, enables you to crawl an entire site through a proxy and record all of the urls accessed. It makes for an easy way to compile a list of every possible url in your application.

 

1) Start sproxy, directing it to output all the urls to a file, urls.txt:

sproxy -o ./urls.txt

2) Use wget with sproxy to crawl all the urls of example.com:

wget -r -o verbose.txt -l 0 -t 1 --spider -w 1 -e robots=on -e "http_proxy = http://127.0.0.1:9001" "http://example.com/"

3) Sort and de-duplicate the list of urls from our application:

sort -u -o urls.txt urls.txt

4) Siege the list of urls with 100 concurrent users for 3 minutes:

siege -v -c 100 -i -t 3M -f urls.txt

** SIEGE 2.72
** Preparing 100 concurrent users for battle.
The server is now under siege...
Lifting the server siege...      done.

Transactions:              2630 hits
Availability:              100.00 %
Elapsed time:              90.36 secs
Data transferred:          3.51 MB
Response time:             0.35 secs
Transaction rate:          88.10 trans/sec
Throughput:                0.28 MB/sec
Concurrency:               9.82
Successful transactions:   2630
Failed transactions:       0
Longest transaction:       0.54
Shortest transaction:      0.19

 

Multi-Mechanize

When testing web applications sometimes you need to write test scripts that simulate virtual user activity against a site/service/api. Multi-Mechanize is an open source framework for performance and load testing. It runs concurrent Python scripts to generate load (synthetic transactions) against a remote site or service. Multi-Mechanize is most commonly used for web performance and scalability testing, but can be used to generate workload against any remote API accessible from Python. Test output reports are saved as HTML or JMeter-compatible XML.

 

1) Install Multi-Mechanize:

pip install multi-mechanize

2) Bootstrapping a new Multi-Mechanize project is easy:

multimech-newproject demo

import mechanize
import time

class Transaction(object):
    def run(self):
        br = mechanize.Browser()
        br.set_handle_robots(False)

        start_timer = time.time()
        resp = br.open('http://www.example.com/')
        resp.read()
        latency = time.time() - start_timer

        self.custom_timers['homepage'] = latency

        assert (resp.code == 200)
        assert ('Example' in resp.get_data())

3) Run the Multi-Mechanize project and review the generated reports

multimech-run demo

multimech

 

Bees with Machine Guns

In the real world you need to test your production infrastructure with realistic traffic. In order to generate the amount of load that realistically represents production, you need to use more than one machine. The Chicago Tribune has invested in helping the world solve this problem by creating Bees with Machine Guns. Not only does it have an epic name, but it is also incredibly useful for load testing using many cloud instances via Amazon Web Services. Bees with Machine Guns is a utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications).

1) Install Bees with Machine Guns:

pip install beeswithmachineguns

 

2) Configure Amazon Web Services credentials in ~/.boto:

[Credentials]

aws_access_key_id=xxx
aws_secret_access_key=xxx

[Boto]

ec2_region_name = us-west-2
ec2_region_endpoint = ec2.us-west-2.amazonaws.com

 

3) Create 2 EC2 instances in the us-west-2b availability zone from the ami-bc05898c image, using the default security group, and log in with the ec2-user user name.

bees up -s 2 -g default -z us-west-2b -i ami-bc05898c -k aws-us-west-2 -l ec2-user

Connecting to the hive.
Attempting to call up 2 bees.
Waiting for bees to load their machine guns...
.
.
.
.
Bee i-3828400c is ready for the attack.
Bee i-3928400d is ready for the attack.
The swarm has assembled 2 bees.

 

4) Check whether the EC2 instances are ready for battle

bees report

Read 2 bees from the roster.
Bee i-3828400c: running @ 54.212.22.176
Bee i-3928400d: running @ 50.112.6.191

 

5) Attack a url once the EC2 instances are ready for battle

bees attack -n 100000 -c 1000 -u http://example.com/

Read 2 bees from the roster.
Connecting to the hive.
Assembling bees.
Each of 2 bees will fire 50000 rounds, 125 at a time.
Stinging URL so it will be cached for the attack.
Organizing the swarm.
Bee 0 is joining the swarm.
Bee 1 is joining the swarm.
Bee 0 is firing his machine gun. Bang bang!
Bee 1 is firing his machine gun. Bang bang!
Bee 1 is out of ammo.
Bee 0 is out of ammo.
Offensive complete.
     Complete requests:   100000
     Requests per second: 1067.110000 [#/sec] (mean)
     Time per request:    278.348000 [ms] (mean)
     50% response time:   47.500000 [ms] (mean)
     90% response time:   114.000000 [ms] (mean)
Mission Assessment: Target crushed bee offensive.
The swarm is awaiting new orders.

6) Spin down all the EC2 instances

bees down

 

AppDynamics

With AppDynamics Pro you get in-depth performance metrics to evaluate the scalability and performance of your application. Use the AppDynamics Pro Metrics Browser to track key response times and errors over the duration of the load tests:

metrics browser

Use the AppDynamics Scalability Analysis Report to evaluate the performance of your application against load tests.


Use AppDynamics Pro to compare multiple application releases to see the change in performance and stability:

release

Get started with AppDynamics Pro today for in-depth application performance management.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Top 10 Reasons Why eCommerce Apps Will Fail This Black Friday

My wife is a shopaholic and serial checkout killer. Every week she spends several hours browsing and carefully adding items to shopping carts. Every now and then she’ll interrupt my sports program with an important announcement: “My checkout just failed.” Take this example Mrs. Appman sent me during the month of September:

Checkout Fail

The fact that I work in the APM industry means it is my responsibility to fix these problems immediately (for my wife). With Black Friday coming up, I thought I’d share what typically goes wrong under the covers when our customers’ e-commerce applications go bang at a critical time.

It is worth mentioning that nearly all our customers perform some form of performance or load testing prior to the Black Friday period. Many actually simulate the previous year’s load on test environments designed to reproduce Black Friday activity. While 75% of bottlenecks can be avoided in testing, unfortunately a few surface in production because applications have become too big and complex to test under real-world scenarios. For example, most e-commerce applications these days span more than 500 servers distributed across many networks and service providers.

Here are the top 10 reasons why eCommerce applications will fail this Black Friday:

1. Database Connection Pool

Nearly every checkout transaction will interact with one or more databases. Connections to this resource are therefore precious and can become deadly when transaction concurrency is high. Most application servers ship with default connection pool sizes of between 10 and 20. When you consider that transaction throughput for e-commerce applications can easily exceed 100,000 transactions per minute, you soon realize that default pool configurations aren’t going to cut it. When a database connection pool becomes exhausted, incoming checkout requests simply wait or time out until a connection becomes available. Take this screenshot, for example:

Connection Pool Issue

2. Missing Database Indexes

This root cause is somewhat related to exhausted connection pools. Simply put, slow-running SQL statements hold onto a database connection for longer, so connections aren’t recycled as often as they should be. The number 1 root cause of slow SQL statements is missing indexes on database tables, often the result of miscommunication between the developers who write SQL and the DBAs who configure and maintain the database schemas that hold the data. The classic symptom is the “full table scan” query execution, where a transaction’s database operation must scan through all the data in a table before a result is returned. Here is an example of what that looks like in AppDynamics:

Missing Index
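One way to catch this before it ships, regardless of your stack, is to ask the database for its query plan. Here is a minimal sketch using PHP and PDO against MySQL; the DSN, table, and column are hypothetical:

$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// A "type" of ALL in the plan output means a full table scan, i.e. a missing index
$plan = $pdo->query('EXPLAIN SELECT * FROM orders WHERE customer_id = 42')->fetchAll();
print_r($plan);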

3. Code Deadlock

High transaction concurrency means application server threads contend more for application resources and objects. Most e-commerce applications have some form of atomicity built into their transactions, so that order and stock volumes are kept in check as thousands of users fight over special offers and low prices. If access to application resources is not properly managed, some threads can end up in deadlock, which can cause an application server and all its user transactions to hang and time out. One example from last year: an e-commerce customer was using a non-thread-safe cache. Three threads tried to perform a get, set, and remove on the same cache at the same time, causing code deadlock and impacting ~2,500 checkout transactions, as the screenshot below shows.

deadlock

4. Socket Timeout Exceptions

Server connectivity is an obvious root cause; if you check your server logs using a tool like Sumo Logic or Splunk, you’ll probably see hundreds of these events. They represent network problems or routing failures where a checkout transaction is attempting to contact one or more servers in the application infrastructure. Much of the time the services you are connecting to aren’t your own, for example a shipping provider, credit card processor, or fraud detector. On high-traffic days like Black Friday it isn’t just your site experiencing a surge in traffic; oftentimes entire networks are saturated by the intense demand. After a period of time (often 30-45 secs) the checkout transaction will just give up, time out, and return an error to the user. No Connectivity = No Revenue. Here is an example of what it looks like:

socket timeout exception
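Whatever your stack, the defense is the same: set aggressive timeouts on outbound calls so a dead dependency fails fast instead of holding a checkout thread for 30-45 seconds. A minimal PHP/cURL sketch; the payment endpoint is hypothetical:

$ch = curl_init('https://payments.example.com/authorize');
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2); // give up connecting after 2 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 5);        // give up waiting for a response after 5 seconds
$result = curl_exec($ch);
if ($result === false) {
    // Degrade gracefully: queue the payment for retry rather than hanging the checkout
}
curl_close($ch);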

5. Garbage Collection

Caches are an easy way to speed up applications. The closer data is to application logic (in memory), the faster it executes. It is therefore no surprise that as memory has gotten bigger and cheaper, most companies have adopted some form of in-memory caching to eliminate database access for frequently used results. The days of 64GB and 128GB heaps are now upon us, which means the impact of things like garbage collection is more deadly to end users. Maintaining cache data and efficiently creating and persisting user objects in memory becomes paramount for eliminating frequent garbage collection cycles. Just because you have GBs of memory to play with doesn’t mean you can be lazy about how you create, maintain, and destroy objects. Here is a screenshot that shows how garbage collection can kill your e-commerce application:

Garbage Collection


6. Transactions with High CPU Burn

It’s no secret that inefficient application logic requires more CPU cycles than efficient logic. Unfortunately, the number 1 solution to slow performance in the past was for eCommerce vendors to buy more servers: more servers = more capacity = more transaction throughput. While this calculation sounds good, the reality is that not all e-commerce transactions are CPU bound. Adding more capacity just masks inefficient code in the short term and can waste significant amounts of money in the long term. If specific transactions in your eCommerce application hog or burn CPU, you might want to tune those before you whip out your checkbook for Oracle or Dell. For example:

High CPU Burn

7. 3rd Party Web Services

If your e-commerce application is built around a distributed SOA architecture, then you have multiple points of failure, especially if several of those services are provided by a 3rd party where you have no visibility. For example, most payment and credit card authorization services are provided by 3rd-party vendors like PayPal, Stripe, or Braintree. If these services slow down or fail, it’s impossible for checkout transactions to complete. You therefore need to monitor these services religiously so that when problems occur you can rapidly identify whether it is your code, your connectivity, or someone else’s outage. Here is an example of how AppDynamics can help you monitor your 3rd-party web services:

Transaction Flow


8. Crap Recursive Code

This is similar to #6, but burns time instead of resources. For example, many e-commerce transactions will request data from multiple sources (caches, databases, web services) at the same time. Every one of these round trips can be expensive and may add network time along the way. I’ve seen a single eCommerce search transaction call the same database multiple times instead of performing a single operation using a stored procedure on the database. Recursive remote calls may only take 10-50 milliseconds each, but invoked many times per transaction they can add seconds to your end user experience. For example, one search transaction took x seconds and made 13,000 database calls.

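The fix is usually to batch those round trips into one. Here is a minimal PDO sketch; the table and the $productIds variable are hypothetical:

// Before: one query per product id inside a loop (13,000 round trips)
// After: a single query with an IN clause
$placeholders = implode(',', array_fill(0, count($productIds), '?'));
$stmt = $pdo->prepare("SELECT * FROM products WHERE id IN ($placeholders)");
$stmt->execute($productIds);
$products = $stmt->fetchAll();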

9. Configuration Change

As much as we’d like to think that production environments are “locked down” by change control processes, they are not. Accidents happen, humans make mistakes, and hotfixes occasionally get applied in a hurry at 2am 😉 Application server configuration is sensitive, just like networks or any other piece of the infrastructure. Being able to audit, report, and compare configuration changes across your application gives you instant intelligence that a change may have caused your eCommerce application to break. For example, AppDynamics can record any application server change and show you the time and values that were updated, helping you correlate change with slowdowns and outages.


10. Out of Stock Exception

“I’m sorry, the product you requested is no longer in stock”. This basically means you were too slow and you’ll need to wait until 2014 for the same offer. Remember to set an alarm next year for Black Friday 😉

out_stock_en

AppDynamics can also monitor the revenue and performance of your checkout transactions over time, which helps Dev and Ops teams monitor and alert on the health of the business:

 Correlating revenue and performance

The good news is that AppDynamics Pro can identify all of the above defects in minutes. You can take a free trial here and be deployed in production in under 30 minutes! If you send us a few screenshots of your findings in production like the above we’ll send you a $250 Amazon gift certificate for your hard work!

Steve.