Cisco on Cisco: An enterprise observability story

Cisco reduces MTTR, increases productivity, and innovates faster with Cisco Full-Stack Observability

Key benefits


38% reduction in mean time to restore (MTTR) across all IT services.


Reduced business-impacting incident hours by 80%.


Achieved verifiable 99.999% availability for IT services and applications.



Cisco relies on more than 80,000 global employees to support innovation and growth. Cisco AIOps capabilities, powered by Cisco Observability Platform, Cisco AppDynamics, and Cisco ThousandEyes, enable operations and application teams to gain the visibility and insight they need to accelerate cross-functional incident response workflows that deliver 99.999% availability for IT services.

 

“The ability to tie data to business impact is one aspect that stood out about the Cisco Observability Platform.”

Chuck Churchill
Senior Director, IT Service Management,
Cisco

Challenge

An increase in applications, data, and tools deployed or migrated across the Cisco hybrid cloud IT landscape created layers of complexity that obscured monitoring, visibility, and observability across the business. Additionally, the rate of change in observability outpaces the speed at which Cisco IT teams can adapt.

Solution

The Cisco Observability Platform, including AIOps capabilities, along with Cisco AppDynamics and Cisco ThousandEyes, allows service operations and application teams to easily leverage extensibility and receive automatic platform updates. With additional features from Application Resource Optimizer and Cost Insights modules on the Cisco Observability Platform, Cisco IT will be able to automate key workflow tasks to reduce MTTR for critical issues.

Results

Cisco IT works seamlessly, collaboratively, and productively across all ops functions to add new features quickly, deliver five-9s of IT service availability, and gain contextual insights required to report on business-critical KPIs in real time with confidence.  

“We rely on AppDynamics for overall business performance and individual component views in shared dashboards that our team needs to respond quickly.”


Venkat Bongoni
Principal Architect, Cisco

Challenge

Simplifying a complex IT environment

Cisco empowers a workforce of more than 80,000 global employees to help customers and partners deliver high-performing digital services. Cisco promises, “If you can imagine it, we will build the bridge to get you there.” And it isn’t just a slogan. It’s a directive that relies on AI Ops solutions to ensure high-performing availability for all the technologies that Cisco’s workforce needs to fulfill that promise.

Finding and resolving issues in a highly complex, hybrid, and global enterprise IT environment such as Cisco’s can be daunting. “We use 50+ different tools to monitor Cisco and third-party products for capacity, utilization, provisioning, and discovery,” says Venkat Bongoni, principal architect at Cisco. “To operate our applications and services across different environments, it’s important to connect all systems and all monitoring tools available.” But doing so is not simple when you’re responsible for monitoring more than two million daily events and two billion metrics streaming from 300,000+ endpoints, 70,000+ containers/virtual machines (VMs) — and a vast number of routers, switches, and global access points hosted in the cloud and on-premises.

Building a custom solution for IT Ops

In 2018, Cisco IT leadership appointed a team of data science engineers to build a custom IT Ops platform that would simplify service and application operation processes and reporting. The platform was named CHAIN (Correlate, Harmonize Application Infrastructure Notifications). The vision was to reduce alerts and other noise from more than 50 monitoring tools, correlate the data in a meaningful way, and allow cross-functional teams to predict and prevent issues before they became a problem. CHAIN delivered end-to-end visibility and insights on a single screen for applications, network, infrastructure, collaboration, and command center teams. With it, they could work in unison to ensure service availability, triage more effectively, and resolve incidents quickly. “Every team needed dependency, topography, and application information,” says Bongoni. “Now that we have it, we can supply real-time operational metrics for any level within the IT service management organization.”
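To make the idea of reducing alert noise concrete, here is a minimal Python sketch of one common correlation approach: grouping alerts from different tools by affected service and by arrival time. It is an illustration only and not how CHAIN is implemented. The `Alert` fields, the tool and service names, and the 300-second window are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    tool: str       # monitoring tool that raised the alert (hypothetical names)
    service: str    # affected service, used here as the correlation key
    timestamp: int  # seconds since an arbitrary start point
    message: str


def correlate(alerts, window_seconds=300):
    """Group alerts per service when each arrives within
    window_seconds of the previous alert for that service."""
    by_service = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        by_service[alert.service].append(alert)

    incidents = []
    for service, items in by_service.items():
        current = [items[0]]
        for alert in items[1:]:
            # A gap larger than the window starts a new candidate incident.
            if alert.timestamp - current[-1].timestamp > window_seconds:
                incidents.append((service, current))
                current = [alert]
            else:
                current.append(alert)
        incidents.append((service, current))
    return incidents


# Hypothetical sample data: four raw alerts from three tools.
alerts = [
    Alert("tool-a", "checkout", 0, "latency high"),
    Alert("tool-b", "checkout", 60, "error rate high"),
    Alert("tool-c", "checkout", 1000, "pod restarted"),
    Alert("tool-a", "search", 30, "cpu high"),
]
incidents = correlate(alerts)
for service, group in incidents:
    print(service, [a.message for a in group])
```

In this sample, the two checkout alerts raised a minute apart collapse into one candidate incident, so four raw alerts become three items to triage. A production system would likely also correlate on dependency and topology data, which this sketch leaves out.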

When goals are met, new opportunities arise

After four years spent building CHAIN, Cisco IT achieved its vision and goals. Cross-functional teams had shared dashboards for transparency, which eased collaboration on incident resolution and gave them access to data that proved service availability through metrics. As Cisco IT leadership turned toward adding more capabilities, new challenges arose. “Building this environment was a tremendous effort, and the team delivered on its goals,” says Jeffrey Drury, leader of full-stack observability systems engineering at Cisco. “But there were operational challenges and a need to add more infrastructure and applications into the view. However, most of the build team’s time was spent patching, upgrading, securing, and maintaining the new system.”

Cisco IT recognized that the rate of change in the observability space outpaces the speed at which their internal IT teams can adapt. “Observability is now a set of multidisciplinary capabilities bringing together observability, security for applications and business risk, sustainability, optimization, and AI-driven insights across IT and business boundaries,” says Chuck Churchill, senior director, IT service management at Cisco. “We are in the process of migrating to the Cisco Observability Platform to take advantage of capabilities being rapidly developed by Cisco and its partners to accelerate capabilities that teams and leaders need to make business decisions.”

Solution

Gaining observability with business context

The Cisco Observability Platform allows Cisco IT to add new features quickly and automate maintenance tasks. The platform’s extensible architecture and contextual insights made the decision to migrate easy. “We enhanced our service management experience through having everything in one place,” says Bongoni. “The integrations between AppDynamics, ThousandEyes, and ServiceNow allowed full visibility from the network and devices through on-prem and cloud native applications.” Ingesting all types of metrics, events, logs, and traces (MELT) from anywhere enables business context to sit on top of the platform and deliver insights teams want. “From individuals in an ops role correlating metrics, issues, and alerts for troubleshooting, to the CIO who wants real-time availability metrics across all services, the ability to tie data to business impact is one aspect that stood out about the Cisco Observability Platform,” says Churchill.

"With AppDynamics, teams have the necessary telemetry data for AI Ops correlation that gives a holistic view across its entire IT landscape and the capability to bring MELT data into AI Ops.”

Venkat Bongoni
Principal Architect,
Cisco

Extensibility helps IT Ops resolve issues faster and boost productivity 

Many companies stitch together different vendor implementations for IT Ops, but if one vendor changes its API or how it presents data, the change can break underlying relationships. The Cisco Observability Platform uses OpenTelemetry™ as a standard, which was a key benefit for this global IT group. “OpenTelemetry allows the Cisco Observability Platform to ingest telemetry data from any source and build relationships between the data. That value is difficult to build at speed in a custom solution,” says Churchill.

Among other benefits is the platform’s ability to use AI/ML to predict remediation steps based on data collected for past incidents. It’s a feature that has great value when aiming for five-9s of application and service availability. Adopting the platform as a new foundation for CHAIN was smooth and the migration happened impressively fast. “In about six weeks, the Cisco IT team was able to build a custom proof-of-concept using production data streams to deliver the extended visibility across the infrastructure that we were seeking,” says Drury.  

Impressive time to value

Cisco’s IT leadership looked for ease in adding new capabilities. In CHAIN, features like monitoring, optimization, and security would require a custom build, which takes time and money — but these capabilities already exist within the Cisco Full-Stack Observability portfolio. Extensibility supports the build team’s ability to migrate quickly and begin creating and customizing existing use cases that align with critical KPIs. Within a few weeks, the team provided a proof-of-concept, demonstrating how Application Resource Optimizer enables ops teams to drill down into microservices running on public-cloud Kubernetes® clusters, see a cost reduction over time, and receive recommendations that improve underlying performance for associated data stores, VM storage, and other resources. Similarly, with the Cost Insights module, teams can analyze workloads running in cloud and cloud native infrastructures and leverage cost modeling in public clouds to better understand usage and optimize for cost savings. The addition of customer digital experience monitoring (CDEM) supports correlation between application user experience data from AppDynamics and network insights from ThousandEyes in real time — to remove guesswork that hinders root cause analysis of poor application performance.

Benefits

Cisco Full-Stack Observability starts with AppDynamics

To drive the business forward for customers and partners, Cisco IT leadership needs cross-functional teams to have robust monitoring capabilities for application performance and the corresponding metrics to address incidents in real time. AppDynamics application performance monitoring and its AI/ML capabilities are significant for Cisco IT to gain visibility for application health and quickly identify issues. “With AppDynamics, teams have the necessary telemetry data for AI Ops correlation that gives a holistic view across its entire IT landscape and the capability to bring MELT data into AI Ops,” says Bongoni. “We rely on AppDynamics for overall business performance and individual component views in shared dashboards that our team needs to respond quickly.”

Business context for critical metrics

The Cisco Observability Platform enables aggregated performance metrics across the IT environment, including entities associated with on-premises infrastructure and third-party vendors. “For any business application, we can now see the number of incidents created, any scheduled changes, and drill down into the microservices running on Kubernetes clusters to see violations that have occurred there,” says Bongoni. Pulling data into a single dashboard, accessible by all IT service management roles, gives everyone transparency for critical KPIs such as availability, performance, response time loss, latency, and data coming from various monitoring systems.

With the Cisco Observability Platform, Cisco IT will be able to provide a one-click feature for its command center, replicating an important capability in CHAIN today. When an incident is discovered, ticket creation, opening a situation room in Webex Teams, and sending an e-page to team leads all happen simultaneously. By automating cross-functional processes and creating shared dashboards to promote accountability, Cisco IT raised system availability for services and applications from just under four 9s to five 9s and cut business-impacting incident hours by 80%.
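For a sense of scale, the move from four 9s to five 9s can be expressed as a yearly downtime budget. The short Python sketch below does that arithmetic. It assumes a 365-day year and treats availability as a simple fraction of total time, which may differ from how Cisco IT measures it. The function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent):
    """Return the yearly downtime allowed at a given availability level."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


four_nines = downtime_minutes_per_year(99.99)
five_nines = downtime_minutes_per_year(99.999)
print(f"99.99%  -> {four_nines:.2f} minutes/year")
print(f"99.999% -> {five_nines:.2f} minutes/year")
```

Under these assumptions, four 9s allows roughly 52.6 minutes of downtime per year, while five 9s allows roughly 5.3 minutes, a tenfold tighter budget.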

Accelerating innovation and collaboration

Decades ago, Cisco built a router that brought in data with industry-standard protocols and determined where the data needed to go. In doing so, the company made a significant contribution to networking that accelerated adoption of the internet and the way people access technology. Similarly, the Cisco Observability Platform brings in telemetry data from any source connected to a network, such as a server in a data center, an object in a public cloud, or even a light switch. “With the Cisco Observability Platform, Cisco IT can define how telemetry comes in and then build meaningful relationships. That's a key tenet of the platform,” says Drury.

With a SaaS platform, updates and maintenance are automated, and its extensibility eases adoption of new capabilities and integrations. “The platform provides us with an extended set of capabilities such as hybrid cost optimization, sustainability insights, and application security correlation — these are things we would have had to build in, so we appreciate that it’s a very extensible architecture,” says Churchill. Additionally, the platform’s ability to ingest and correlate MELT data from any source allows IT teams to baseline and adjust MELT and other workflows, bring IO data into the platform, and correlate it with business context on top of the architecture.

Creating a culture of accountability

By promoting a culture of accountability across shared views, the IT Ops organization is empowered to discover new ways that automation can optimize and inspire new features and workflows. Cisco IT is dedicated to adopting and using Cisco Full-Stack Observability to enhance service management experiences by adding capabilities and innovating cross-functionally as one team.

“The platform provides us with an extended set of capabilities such as hybrid cost optimization, sustainability insights, and application security correlation — these are things we would have had to build in, so we appreciate that it’s a very extensible architecture.”

Chuck Churchill
Senior Director, IT Service Management,
Cisco

About Cisco Systems

Cisco Systems, Inc., is a worldwide technology leader headquartered in San Jose, California, that securely connects everything to make anything possible. Our purpose is to power an inclusive future for all by helping our customers reimagine their applications, power hybrid work, secure their enterprise, transform their infrastructure, and meet their sustainability goals.

Resources

Achieving end-to-end visibility with Cisco AppDynamics and ThousandEyes

How to bring application observability and network intelligence together with ease.

A blueprint for the journey towards Cisco Full-Stack Observability

Gain practical tips, strategies and techniques to build a winning observability strategy.

Transform digital business today with Cisco Cloud Observability

Application performance management with business context and advanced troubleshooting across cloud and hybrid environments.

The Age of Application Observability

Get insights from 1,000+ IT pros for optimizing application performance and delivering sustainable innovation in hybrid environments.