Kubernetes Monitoring: How to Monitor Using Best Practices
Kubernetes automates the deployment, scaling and operation of applications running in containers across clusters of hosts. Monitoring Kubernetes requires full-stack visibility and a clear understanding of which metrics to monitor.
Kubernetes is a container-orchestration platform that automates the deployment, scaling and operation of containerized applications across clusters of hosts. Open-sourced by Google in 2014, Kubernetes was built on the search giant's own experience running containers in production. It's now under the aegis of the Cloud Native Computing Foundation (CNCF), which reports that Kubernetes is the most popular container management tool among large enterprises, used by 83% of respondents in a recent CNCF survey. And in case you're wondering, the name "Kubernetes" originates from Greek and means helmsman or pilot.
How Kubernetes Came to Be
To understand the value of Kubernetes, we must first take a look back at how enterprises have deployed applications over the years. In traditional deployments, applications ran on physical servers, an approach that led to resource allocation issues. If multiple applications were running on a single server, for instance, one application could consume most of the resources, hampering performance of other applications. One solution was to run each app on a separate physical server, but this approach was too expensive and led to underutilization of resources.
Virtualization was the next step. It addressed the limitations of physical servers by running multiple virtual machines (VMs) on a single physical server, with each VM running its own components, including the operating system (OS) and applications. VMs offered numerous benefits, including improved utilization of server resources, lower hardware costs, easier application upgrades, and better scalability.
VMs have their shortcomings, though. For example, each VM having its own OS image means additional memory and storage requirements. This adds complexity to the software development lifecycle and limits the portability of apps between public and private clouds and data centers.
The Benefits of Containers
A container is similar to a virtual machine in that it has its own software, libraries, memory, configuration files and so on. Containers offer many advantages, though, most notably the ability to share the host operating system (each VM, by contrast, has its own OS image), which makes them relatively lightweight, fast and efficient. Other container benefits include:
Improved efficiency: With containers, applications can be deployed, patched or scaled more quickly.
Better portability: It's easy to deploy applications running in containers to multiple operating systems and hardware platforms.
Consistent operation: Container-based apps run the same no matter where they are deployed.
Sharper observability: In addition to surfacing OS-level metrics, they show application health and other insights.
Faster app development: Containers support agile and DevOps initiatives, enabling faster development, test and production cycles.
What Kubernetes Does for Containers
Since containers are far more efficient, fast, and lightweight than traditional virtualization, it's unsurprising that enterprises with large application deployments may deploy multiple containers as one or more container clusters. But this environment comes with its own set of challenges, as large, distributed containerized applications often become difficult to coordinate.
Kubernetes is an open source container orchestrator that automates the deployment, scaling and operations of application containers across clusters of hosts. The most popular container orchestrator, Kubernetes, is used most often with Docker, the leading containerization platform. But Kubernetes also supports other container systems that meet container image format and runtime standards set by the Open Container Initiative (OCI), an open source technical community supervised by The Linux Foundation. Alternatives to Kubernetes include Docker Swarm, a container orchestrator bundled with Docker, and Apache Mesos.
How Kubernetes Works
Kubernetes provides a framework to run distributed systems with resilience. Upon deployment, you get a Kubernetes cluster: a set of machines, or nodes, that run containerized applications managed by Kubernetes. A cluster has at least one:
Worker node to host the pods. (Each pod is a group of one or more containers.)
Master node to manage the worker nodes and pods in the cluster.
To have a working Kubernetes cluster, you need multiple master components, node components and add-ons. We'll discuss many of these components in the section below on Kubernetes monitoring.
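The cluster, node, pod and container hierarchy described above can be pictured as a small data model. This is only an illustrative sketch: the class names, the `role` field and the example node, pod and image names are assumptions made for this example, not Kubernetes API objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    name: str
    image: str


@dataclass
class Pod:
    name: str
    containers: List[Container]  # a pod groups one or more containers


@dataclass
class Node:
    name: str
    role: str  # "master" or "worker", the terms used in this article
    pods: List[Pod] = field(default_factory=list)


@dataclass
class Cluster:
    nodes: List[Node]

    def count_containers(self) -> int:
        """Total number of containers hosted across all nodes."""
        return sum(len(pod.containers) for node in self.nodes for pod in node.pods)


# Hypothetical cluster: one master node and one worker node hosting a single pod.
cluster = Cluster(nodes=[
    Node(name="master-1", role="master"),
    Node(name="worker-1", role="worker", pods=[
        Pod(name="web", containers=[
            Container(name="app", image="example/app:1.0"),
            Container(name="log-shipper", image="example/shipper:1.0"),
        ]),
    ]),
])

print(cluster.count_containers())
```

Walking this hierarchy from cluster down to container is the same path a monitoring tool follows when it rolls container metrics up to pods, nodes and the cluster as a whole.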
Benefits of Kubernetes
While Kubernetes offers many benefits, its four key advantages include velocity, scaling, infrastructure abstraction and efficiency, according to the book Kubernetes: Up and Running by Joe Beda, Brendan Burns and Kelsey Hightower.
Velocity: Kubernetes provides the tools you need to quickly ship features hourly or daily while maintaining a highly available service.
Scaling: Kubernetes' configuration management tools make it easier to scale the containers that make up a distributed application and the clusters that power those containers.
Infrastructure Abstraction: It's challenging to run a distributed application among multiple public cloud providers or in a mixed public-private cloud setting. Kubernetes has a number of plug-ins that simplify these tasks.
Efficiency: By automating distribution of applications across a cluster of machines and ensuring higher levels of utilization, Kubernetes helps with efficiency and cost management.
What is Kubernetes monitoring? What does it involve? Let's start with what you need to monitor in Kubernetes and why.
Kubernetes can greatly simplify application deployment in containers and across clouds, but it brings its own set of complexities. As Google notes in its Site Reliability Engineering guide, monitoring a very large, complex system has two major challenges: the vast number of components being analyzed, and the need to maintain a "reasonably low maintenance burden" on the engineers in charge. These requirements demand a monitoring system that not only enables alerts for high-level service objectives, but also can inspect individual components.
To scale an application and provide reliable service, you need insights into how the application behaves when deployed. To monitor application performance in a Kubernetes cluster, it's critical to examine the performance of containers, pods and services, as well as the characteristics of the cluster as a whole. By providing information about an application's resource usage, Kubernetes allows you to gauge application performance to detect and remove bottlenecks.
Components of Kubernetes: What to Monitor
A Kubernetes cluster architecture includes a master node and separate worker nodes. Master components include:
etcd
A distributed key-value store that holds configuration information and cluster state for use across the cluster.
API server (kube-apiserver)
Validates and configures data for API objects such as pods, services, replication controllers and more.
Scheduler (kube-scheduler)
Manages workload utilization and the allocation of pods to available nodes.
kube-controller-manager
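A daemon that runs the core control loops. Each controller watches the cluster's state through the API server and makes changes to move the current state toward the desired state.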
cloud-controller-manager
Runs controllers that interact with underlying cloud provider(s).
Kubernetes node components include:
Container runtime (e.g., Docker)
kubelet: a primary node agent that watches for pod specs via the API server; it also registers a node with the Kubernetes cluster and reports events, pod status and resource utilization.
Kubernetes proxy (kube-proxy): a proxy service that runs on each node and helps make services available to the external host.
Kubernetes add-ons
You have many Kubernetes add-on components to choose from, but here are some popular selections. The Kubernetes documentation has a more comprehensive list of add-ons.
Kubernetes Dashboard: a web-based UI for Kubernetes clusters that allows you to monitor the health status of workloads (deployments, pods, replica sets and more), and view CPU and memory usage metrics aggregated across all nodes. Its features span configuration, discovery, load balancing, storage, monitoring, and creating and managing workloads.
Cluster DNS: a DNS server that serves DNS records for Kubernetes services.
ACI: provides integrated container networking and network security with Cisco Application Centric Infrastructure (ACI).
Cluster-Level Logging: saves container logs to a central log store with a search/browsing UI.
Kubernetes Monitoring Challenges
Migrating traditional, monolithic applications to Kubernetes can be time-consuming and error-prone. Enterprises are willing to take this risk, though, to achieve greater agility, innovation, cost benefits, scalability and business growth in the cloud. But companies that migrate monolithic applications to microservices often lack visibility into the Kubernetes environment. This makes it difficult to see, in real time, how every microservice interacts with the others.
Kubernetes Is Complex
Another reason Kubernetes is difficult to monitor is that a Kubernetes cluster is considerably more complex than a monolithic environment, with multiple servers and private and public cloud services, notes integration engineer Dave Snyder. When a problem arises, there are many logs, data sources and components to investigate. A legacy monolithic environment might require a couple of log searches, but a Kubernetes environment may contain one or more logs for each of the microservices involved in the issue you're investigating.
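One common way to tame logs scattered across microservices is to merge them into a single timeline. The sketch below is a simplified illustration using only the Python standard library. The service names, timestamps and messages are made up, and it assumes each per-service stream is already sorted and uses the same ISO 8601 UTC timestamp format, so that string order matches time order.

```python
import heapq
from typing import Iterable, List, Tuple

LogEntry = Tuple[str, str, str]  # (timestamp, service, message)

# Hypothetical per-service log streams, each already sorted by timestamp.
frontend: List[LogEntry] = [
    ("2024-01-01T10:00:00Z", "frontend", "GET /cart"),
    ("2024-01-01T10:00:03Z", "frontend", "502 returned to client"),
]
cart: List[LogEntry] = [
    ("2024-01-01T10:00:01Z", "cart", "lookup cart for user 42"),
    ("2024-01-01T10:00:02Z", "cart", "timeout calling inventory"),
]


def merge_logs(*streams: Iterable[LogEntry]) -> List[LogEntry]:
    """Merge sorted log streams into one timeline ordered by timestamp."""
    return list(heapq.merge(*streams, key=lambda entry: entry[0]))


for timestamp, service, message in merge_logs(frontend, cart):
    print(timestamp, service, message)
```

In practice a central log store does this merging for you, but the idea is the same: a shared, consistently formatted timestamp lets you read one incident across many services.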
Kubernetes Monitoring with APM
Kubernetes monitoring with an application performance monitoring solution gives organizations visibility into application and business performance, including deeper insights into containerized applications, Kubernetes clusters, Docker containers, and underlying infrastructure metrics.
This visibility allows enterprises to enrich container-level metrics with insight into CPU, memory, packet and network utilization. Organizations can then baseline these metrics and attach health rules to them, alongside resource usage statistics for their APM-monitored container applications. By comparing APM metrics with the underlying container and server metrics, companies quickly gain insight into the performance of their containerized applications, and learn of potential impediments in the infrastructure stack. Specific metrics, for instance, can help identify both bandwidth-hogging applications and container-level network errors.
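To make the idea of baselining a metric and attaching a health rule more concrete, here is a minimal sketch. It is not how any particular APM product computes baselines. It assumes a simple rule in which a value is unhealthy when it falls more than `k` standard deviations from the mean of recent samples, and the sample values are invented.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def baseline(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of historical samples."""
    return mean(samples), stdev(samples)


def violates(value: float, base_mean: float, base_std: float, k: float = 3.0) -> bool:
    """Health rule: flag values more than k standard deviations from the baseline."""
    return abs(value - base_mean) > k * base_std


# Hypothetical CPU utilization samples (percent) for one container.
history = [40, 42, 38, 41, 39]
base_mean, base_std = baseline(history)

print(violates(43, base_mean, base_std))  # within the normal range
print(violates(70, base_mean, base_std))  # far outside the baseline
```

A rule like this adapts to each container's normal behavior, which tends to be more useful than a single fixed threshold applied to every workload.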
Full-Stack Visibility in Kubernetes Environments
Visibility allows organizations to monitor containerized applications running inside Kubernetes pods, and identify container issues that hamper application performance.
A comprehensive Kubernetes monitoring solution provides end-to-end visibility into every component of the organization's applications — infrastructure, Kubernetes platform, containers, and every microservice and end user device.
Kubernetes introduces new operational workflows and complexities, many of which involve application performance monitoring. As you expand the use of Kubernetes into your production environments, these challenges become even more significant.
By creating levels of abstractions such as pods and services, Kubernetes frees you from worrying about where your applications are running, or if they have sufficient resources to run efficiently. But to ensure optimal performance, you still must monitor your applications, the containers that run them, and even Kubernetes itself.
Here are some important Kubernetes monitoring best practices:
Use Kubernetes DaemonSets
When running Kubernetes, you might want to run a single pod on all your nodes, such as when running a monitoring process like the AppDynamics agent or Fluentd, an open-source data collector, to collect logs. (The AppDynamics Standalone Machine Agent, which monitors containerized applications running inside Kubernetes pods, is deployed as a DaemonSet on every node in a Kubernetes cluster.)
A DaemonSet is a Kubernetes workload object that ensures a particular pod runs on every node in the cluster, or on some subset of nodes. By using a DaemonSet, you're telling Kubernetes to make sure there is one instance of the pod on each selected node.
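As a rough illustration, the sketch below builds a minimal DaemonSet manifest as a Python dictionary and prints it as JSON, which Kubernetes accepts alongside YAML. The `build_daemonset` helper, the `log-collector` name, the `monitoring` namespace and the image reference are all hypothetical placeholders. A real monitoring agent usually needs more settings, such as volume mounts and resource limits.

```python
import json


def build_daemonset(name: str, image: str, namespace: str = "monitoring") -> dict:
    """Build a minimal apps/v1 DaemonSet manifest for a single-container pod."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


# Hypothetical agent name and image, used only for illustration.
manifest = build_daemonset("log-collector", "example.com/log-collector:1.0")
print(json.dumps(manifest, indent=2))
```

Saved to a file such as `daemonset.json`, the output could be applied with `kubectl apply -f daemonset.json`, assuming the target namespace already exists.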
Tags and Labels Matter—A Lot
With Kubernetes managing container orchestration, labels become critically important for monitoring because they are the primary way to identify, group and select pods and containers. To make your metrics as useful as possible, it's essential to define your labels with a logical, consistent and coherent schema.
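To show why a consistent label schema matters, here is a small sketch of equality-based label selection, the simplest form of selector Kubernetes supports. The `matches` and `select` helpers and the pod data are hypothetical. Real selectors are evaluated by the Kubernetes API server and also support set-based expressions that are not modeled here.

```python
from typing import Dict, List


def matches(labels: Dict[str, str], selector: Dict[str, str]) -> bool:
    """True when every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(pods: List[dict], selector: Dict[str, str]) -> List[str]:
    """Return the names of pods whose labels satisfy the selector."""
    return [pod["name"] for pod in pods if matches(pod["labels"], selector)]


# Hypothetical pods following an app/env/tier label schema.
pods = [
    {"name": "cart-7d9f", "labels": {"app": "cart", "env": "prod", "tier": "backend"}},
    {"name": "cart-b21c", "labels": {"app": "cart", "env": "staging", "tier": "backend"}},
    {"name": "web-55aa", "labels": {"app": "web", "env": "prod", "tier": "frontend"}},
]

print(select(pods, {"env": "prod"}))
print(select(pods, {"app": "cart", "env": "prod"}))
```

If one team labels its pods `environment=production` and another uses `env=prod`, a single selector can no longer slice metrics across both, which is the practical cost of an inconsistent schema.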
Know Which Metrics to Monitor
According to Kubernetes.io, several key types of Kubernetes metrics should be tracked closely:
Running pods and their deployments
Resource metrics, including CPU, memory usage and disk I/O
Container-native metrics
Application metrics
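Resource metrics are easier to interpret when usage is compared against what a container has been allotted. Kubernetes expresses CPU in cores or millicores (for example `500m`) and memory in bytes with suffixes such as `Mi` and `Gi`. The sketch below parses a common subset of those quantity strings and computes a utilization ratio. The helper names are made up for this example, and less common forms (fractional or exponent notation) are deliberately not handled.

```python
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '500m' or '2' to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' or '1G' to bytes."""
    # Check two-letter binary suffixes first, then one-letter decimal ones.
    for suffixes in (_BINARY, _DECIMAL):
        for suffix, factor in suffixes.items():
            if quantity.endswith(suffix):
                return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


def utilization(used: float, limit: float) -> float:
    """Fraction of the limit currently in use."""
    return used / limit


# Hypothetical usage and limit values for one container.
print(parse_cpu("250m"))
print(parse_memory("128Mi"))
print(utilization(parse_memory("64Mi"), parse_memory("128Mi")))
```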
Use Service Discovery
Since Kubernetes schedules applications dynamically based on scheduling policy, you may not know where your apps are running—but you'll have to monitor them anyway. You'll want to use a monitoring system with service discovery, which automatically adapts metric collection to moving containers. This approach allows you to continuously monitor your applications without interruption.
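The core of service discovery is a reconciliation loop: compare the targets you are currently monitoring with the targets that exist now, then add and remove as needed. The sketch below shows that step with plain Python sets. The pod snapshots, field names and port are hypothetical. A real monitoring system would obtain this information from the Kubernetes API rather than from hard-coded lists.

```python
from typing import Dict, List, Set, Tuple


def scrape_targets(pods: List[Dict]) -> Set[str]:
    """Return 'ip:port' targets for pods that are currently running."""
    return {f"{pod['ip']}:{pod['port']}" for pod in pods if pod["phase"] == "Running"}


def reconcile(current: Set[str], discovered: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (targets to start monitoring, targets to stop monitoring)."""
    return discovered - current, current - discovered


# Hypothetical snapshots taken before and after Kubernetes reschedules a pod.
before = [
    {"name": "cart-1", "ip": "10.0.0.5", "port": 8080, "phase": "Running"},
    {"name": "cart-2", "ip": "10.0.0.6", "port": 8080, "phase": "Running"},
]
after = [
    {"name": "cart-1", "ip": "10.0.0.5", "port": 8080, "phase": "Running"},
    {"name": "cart-3", "ip": "10.0.1.9", "port": 8080, "phase": "Running"},
    {"name": "cart-4", "ip": "10.0.1.10", "port": 8080, "phase": "Pending"},
]

to_add, to_remove = reconcile(scrape_targets(before), scrape_targets(after))
print("start monitoring:", sorted(to_add))
print("stop monitoring:", sorted(to_remove))
```

Running this step on every change, or on a short interval, keeps metric collection aligned with wherever the scheduler has placed your containers.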
Kubernetes offers many benefits but adds complexity as well. For example, its ability to distribute containerized applications across multiple data centers—and even different cloud providers—requires a comprehensive monitoring solution to collect and aggregate metrics across many different sources.
Continuous monitoring of system and application health is essential, and many free and commercial solutions provide real-time monitoring of Kubernetes clusters and the applications they host. Here are several open source tools for Kubernetes monitoring:
Prometheus
This popular monitoring and alerting tool for Kubernetes and Docker provides detailed, actionable metrics and analysis. Developed by SoundCloud and donated to the CNCF community, Prometheus is designed specifically to monitor applications and microservices running in containers at scale. Prometheus is not a dashboard, however, and often is used in conjunction with Grafana (see below) to visualize data.
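Prometheus exposes its data through an HTTP API, with instant queries served at `/api/v1/query`. As a small illustration, the sketch below builds the URL for a PromQL query that sums per-pod CPU usage. The server address is a hypothetical placeholder. The query assumes the cAdvisor metric `container_cpu_usage_seconds_total` is being scraped and carries a `pod` label, which depends on your Kubernetes version and scrape configuration.

```python
from urllib.parse import urlencode

# Hypothetical Prometheus address; replace with your own server.
PROMETHEUS_URL = "http://prometheus.example:9090"

# PromQL: per-pod CPU usage in cores, averaged over the last five minutes.
CPU_BY_POD = "sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


url = instant_query_url(PROMETHEUS_URL, CPU_BY_POD)
print(url)
```

Fetching this URL with any HTTP client returns a JSON document. Grafana issues the same kind of query behind the scenes when it renders a Prometheus-backed panel.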
Grafana
Grafana is an open source platform for analytics and metric visualization. Its Kubernetes app plugin includes four dashboards: Cluster, Node, Pod/Container and Deployment. Kubernetes admins often install Grafana and leverage the Prometheus data source to create information-rich dashboards.
Jaeger
Jaeger is a tracing system used to troubleshoot and monitor transactions in complex distributed systems. It addresses software issues that arise in distributed context propagation, distributed transactions monitoring, latency optimization and more.
Dashboard
Kubernetes Dashboard, a web UI add-on for Kubernetes clusters, allows you to monitor the health status of workloads. (For more details on Dashboard, see the "Kubernetes add-ons" section above.)
Kubewatch
This add-on watches for changes to Kubernetes resources such as pods, and sends notifications to a Slack channel. Written in Golang, Kubewatch uses a Kubernetes client library to interact with the Kubernetes API server, and a Slack client library to interact with Slack.
Weave Scope
A visualization and monitoring tool for Kubernetes and Docker, Weave Scope offers a top-down view of your app and entire infrastructure. Developed by Weaveworks, Weave Scope generates a map of processes, containers and hosts in a Kubernetes cluster. Its graphical UI also allows you to manage and run diagnostic commands on containers.
EFK Stack
The EFK Stack is really a melange of three tools that work well together: Elasticsearch, Fluentd and Kibana. Fluentd is a data collector that culls logs from pods running on Kubernetes cluster nodes. It routes these logs to the Elasticsearch search engine, which ingests the data and stores it in a central repository. Kibana, a data visualization plugin for Elasticsearch, is the UI for the EFK Stack, allowing the user to visualize collected logs and metrics and create custom dashboards.
InfluxDB
InfluxData’s InfluxDB is a performant store for time series data. Built for very high volume storage of monitoring records, it provides horizontal scalability and high availability with clustering. InfluxDB is a good solution for long-term storage of Kubernetes monitoring data for historic records or modeling.
If your Kubernetes monitoring needs go beyond the limitations of open source, and you're looking for an enterprise APM solution, get in touch with AppDynamics.