Cassandra - Monitoring Extension

An AppDynamics extension, used with a standalone Java machine agent, that provides metrics for Cassandra servers.

 

Use Case

Apache Cassandra is an open source distributed database management system. The Cassandra monitoring extension captures statistics from the Cassandra server and displays them in the AppDynamics Metric Browser.

 

Prerequisites

By default, Cassandra starts with remote JMX enabled. If you use a custom script to start Cassandra, make sure the JMX parameters are enabled. For more information about JMX parameters, see http://docs.oracle.com/javase/6/docs/technotes/guides/management/agent.html
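As a quick pre-flight check, the Python sketch below tests whether anything is listening on the JMX port before you configure the extension. It assumes port 7199, the value used in the sample config.yml; the helper name `jmx_port_reachable` is hypothetical and not part of the extension. It only proves TCP reachability: it does not perform a JMX handshake or check the username and password.

```python
import socket


def jmx_port_reachable(host: str, port: int = 7199, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: this shows only that something accepts connections on
    the port; it does not speak JMX or validate credentials.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts (socket.timeout is an OSError subclass)
        # and DNS failures (socket.gaierror) all end up here.
        return False
```

For example, `jmx_port_reachable("localhost")` returns False when Cassandra is not running or JMX listens on a different port.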

 

 

Metrics provided

 

  • Cache size, capacity, hit count, hit rate, request count
  • Total latency, statistics, timeout requests, unavailable requests
  • Bloom filter disk space used, false positives, false ratio
  • SSTables compression ratio, live tables, disk space, compacted row size
  • Row size histogram
  • Column count histogram
  • Memtable columns, data size, switch count
  • Pending tasks
  • Read latency
  • Write latency
  • Pending and completed tasks
  • Compaction tasks pending and completed
  • Timeouts
  • Dropped messages
  • Streams
  • Total disk space used
  • Thread pool tasks: active, completed, blocked, pending

In addition to the above metrics, the extension also reports a metric called "Metrics Collection Successful", with a value of -1 when an error occurs and 1 when metrics collection succeeds.
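A minimal Python sketch of how such a status metric can be produced. It assumes a hypothetical `collect` callable that stands in for the extension's JMX collection step and returns a dict of metric path to value. Whether the real extension still reports partially collected metrics after an error is not stated here; this sketch reports only the status metric on failure.

```python
from typing import Callable, Dict

SUCCESS_METRIC = "Metrics Collection Successful"


def collect_with_status(collect: Callable[[], Dict[str, float]]) -> Dict[str, float]:
    """Run one (hypothetical) collection pass and add the status metric."""
    try:
        metrics = dict(collect())
        metrics[SUCCESS_METRIC] = 1  # collection completed without an exception
    except Exception:
        # Assumption: on any failure, report only the status metric as -1.
        metrics = {SUCCESS_METRIC: -1}
    return metrics
```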

 

Note: By default, a Machine Agent or an AppServer agent can send only a fixed number of metrics to the controller. To change this limit, follow the instructions at http://docs.appdynamics.com/display/PRO14S/Metrics+Limits. For example:

    java -Dappdynamics.agent.maxMetrics=2500 -jar machineagent.jar

Installation

 

1. Run "mvn clean install" and find the CassandraMonitor.zip file in the "target" folder. You can also download the CassandraMonitor.zip from [AppDynamics Exchange][].
2. Unzip as "CassandraMonitor" and copy the "CassandraMonitor" directory to `<MACHINE_AGENT_HOME>/monitors`

 

Configuration

 

Note: Do not use tab characters (\t) when editing YAML files; YAML indentation must use spaces. You may want to check the file with a YAML validator.
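To catch stray tabs before the machine agent does, this Python sketch (the helper name `find_tabs` is hypothetical) lists the line and column of every tab character in a file's text. It is not a YAML validator; it only covers the tab rule.

```python
from typing import List, Tuple


def find_tabs(yaml_text: str) -> List[Tuple[int, int]]:
    """Return 1-based (line, column) positions of every tab character."""
    hits = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                hits.append((lineno, col))
    return hits
```

Usage: read `monitors/CassandraMonitor/config.yml` into a string and pass it to `find_tabs`; an empty list means no tabs were found.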

 

1. Configure the Cassandra instances by editing the config.yml file in `<MACHINE_AGENT_HOME>/monitors/CassandraMonitor/`.

2. Configure the MBeans in config.yml. By default, "org.apache.cassandra.metrics" is usually all you need, but you can add more MBeans as required.
You can also add excludePatterns (regexes) to keep any metric tree from showing up in the AppDynamics controller. A sample config.yml:

 

    # List of cassandra servers
    servers:
      - host: "localhost"
        port: 7199
        username: ""
        password: ""
        displayName: "localhost"

    # number of concurrent tasks
    numberOfThreads: 10

    #timeout for the thread
    threadTimeout: 30

    #prefix used to show up metrics in AppDynamics
    metricPathPrefix: "Custom Metrics|Cassandra|"

    #Metric Overrides. Change this if you want to transform the metric key or value or its properties.
    metricOverrides:
      - metricKey: ".*Ratio.*"
        postfix: "Percent"
        multiplier: 100
        disabled: false
        timeRollup: "AVERAGE"
        clusterRollup: "COLLECTIVE"
        aggregator: "SUM"

      - metricKey: ".*Cache.*Rate.*"
        postfix: "Percent"
        multiplier: 100
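Step 2 mentions excludePatterns, which the sample above does not show. The Python sketch below illustrates regex-based exclusion under stated assumptions: a metric is dropped when any pattern is found anywhere in its path (`re.search`), and the helper name and metric paths are hypothetical. Check the extension's own config.yml for the actual key placement and matching rules.

```python
import re
from typing import Dict, Iterable


def apply_exclude_patterns(metrics: Dict[str, float], patterns: Iterable[str]) -> Dict[str, float]:
    """Return only the metrics whose path matches none of the regex patterns."""
    compiled = [re.compile(p) for p in patterns]
    return {
        path: value
        for path, value in metrics.items()
        # Assumption: re.search semantics, so a pattern may match anywhere in the path.
        if not any(rx.search(path) for rx in compiled)
    }
```

Note that "|" is the regex alternation operator, so a pattern meant to match a literal path separator must escape it, as in `r"^ColumnFamily\|system\|"`.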
 

3. Configure the path to the config.yml file by editing the <task-arguments> in the monitor.xml file in the `<MACHINE_AGENT_HOME>/monitors/CassandraMonitor/` directory. Below is a sample:

    <task-arguments>
        <!-- config file-->
        <argument name="config-file" is-required="true" default-value="monitors/CassandraMonitor/config.yml" />
        ....
    </task-arguments>

4. MetricOverrides can be specified per server or at the global level. Overrides given at the global level take precedence over server-level ones.

The following transformations can be done using MetricOverrides:

a. metricKey: A regex that identifies a metric or group of metrics.

b. metricPrefix: Text prepended to the raw metric path. It is inserted after the displayName.

c. metricPostfix: Text appended to the raw metric path.

d. multiplier: An integer or decimal by which the metric value is multiplied.

e. timeRollup, clusterRollup, aggregator: AppDynamics-specific fields. More information about them is available at https://docs.appdynamics.com/display/PRO41/Build+a+Monitoring+Extension+Using+Java

f. disabled: A boolean that, when true, turns off reporting of the matching metrics.

If more than one metricKey regex matches a given metric, the override specified later wins.
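The resolution rules above can be sketched in Python. This is an illustration under stated assumptions, not the extension's code: metricKey is treated as a full-string regex match (as Java's `String.matches` would do), server-level overrides are checked before global ones so that, with "later wins", a matching global override takes precedence, and the postfix is appended to the path as-is. The final rounding step mirrors the release notes (1.5.1, 1.5.6), which say the controller does not accept decimal values. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Optional, Tuple


@dataclass
class MetricOverride:
    metric_key: str           # regex, like metricKey
    postfix: str = ""         # appended to the metric path
    multiplier: float = 1.0   # applied to the raw value
    disabled: bool = False    # True suppresses the metric


def resolve_override(path: str, server: List[MetricOverride],
                     global_: List[MetricOverride]) -> Optional[MetricOverride]:
    """Return the override for `path`, or None; a later match replaces an earlier one."""
    chosen = None
    # Assumption: global overrides come last, so they win over server-level ones.
    for ov in server + global_:
        if re.fullmatch(ov.metric_key, path):
            chosen = ov
    return chosen


def transform(path: str, value: float,
              ov: Optional[MetricOverride]) -> Optional[Tuple[str, str]]:
    """Apply an override, then round to a whole-number string (None if disabled)."""
    if ov is None:
        ov = MetricOverride(metric_key=".*")  # identity transformation
    if ov.disabled:
        return None
    scaled = Decimal(str(value)) * Decimal(str(ov.multiplier))
    rounded = scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return path + ov.postfix, str(rounded)
```

Under these rules, a Compression Ratio of 0.9537 is reported as "95" by the `.*Ratio.*` override; without the multiplier it would round to "1", which is why the sample config multiplies ratios and rates by 100.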


Cluster-level metrics

From version 1.5.2 of this extension, cluster-level metrics are supported only if each node in the cluster has its own machine agent installed. This setup requires two configuration changes:

1. Make sure that all nodes belonging to the same cluster have the same <tier-name> in <MACHINE_AGENT_HOME>/conf/controller-info.xml; this is what allows cluster-level metrics to be gathered. The tier name should be your cluster name.

2. Make sure that, on every node in the cluster, <MACHINE_AGENT_HOME>/monitors/CassandraMonitor/config.yml emits the same metric path. To achieve this, set displayName to an empty string and remove the trailing "|" from metricPathPrefix. The config.yml should look something like this:


    # List of cassandra servers
    servers:
      - host: "localhost"
        port: 7199
        username: ""
        password: ""
        displayName: ""

    # number of concurrent tasks
    numberOfThreads: 10

    #timeout for the thread
    threadTimeout: 30

    #prefix used to show up metrics in AppDynamics
    metricPathPrefix: "Custom Metrics|Cassandra"

    #Metric Overrides. Change this if you want to transform the metric key or value or its properties.
    metricOverrides:
      - metricKey: ".*Ratio.*"
        postfix: "Percent"
        multiplier: 100
        disabled: false
        timeRollup: "AVERAGE"
        clusterRollup: "COLLECTIVE"
        aggregator: "SUM"

      - metricKey: ".*Cache.*Rate.*"
        postfix: "Percent"
        multiplier: 100
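Why do an empty displayName and a prefix without the trailing "|" give every node the same path? The Python sketch below assumes, hypothetically, that the extension joins the parts as prefix + displayName + "|" + raw metric path. Under that assumption the per-node displayName disappears, and removing the trailing "|" avoids an empty path segment.

```python
def metric_path(prefix: str, display_name: str, raw_path: str) -> str:
    """Hypothetical reconstruction of how a full metric path is assembled."""
    return prefix + display_name + "|" + raw_path


# Per-node naming: each node gets its own subtree.
print(metric_path("Custom Metrics|Cassandra|", "localhost", "ReadLatency"))
# Cluster setup: every node emits the identical path.
print(metric_path("Custom Metrics|Cassandra", "", "ReadLatency"))
```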

To make this clearer, assume that Cassandra "Node A" and Cassandra "Node B" belong to the same cluster, "ClusterAB". To get both cluster-level and node-level metrics, do the following:

1. Install a separate machine agent on each of Node A and Node B. Each machine agent must have its own copy of the Cassandra extension.

2. In the controller-info.xml of the machine agents on Node A and Node B, set the tier name to your cluster name, "ClusterAB" here. Set the node name in controller-info.xml to Node A and Node B, respectively.

3. The config.yml for both Node A and Node B should be:

    # List of cassandra servers
    servers:
      - host: "localhost"
        port: 7199
        username: ""
        password: ""
        displayName: ""

    # number of concurrent tasks
    numberOfThreads: 10

    #timeout for the thread
    threadTimeout: 300000

    #prefix used to show up metrics in AppDynamics
    metricPathPrefix: "Custom Metrics|Cassandra"

    #Metric Overrides. Change this if you want to transform the metric key or value or its properties.
    metricOverrides:
      - metricKey: ".*Ratio.*"
        postfix: "Percent"
        multiplier: 100
        disabled: false
        timeRollup: "AVERAGE"
        clusterRollup: "COLLECTIVE"
        aggregator: "SUM"

      - metricKey: ".*Cache.*Rate.*"
        postfix: "Percent"
        multiplier: 100

(Note: The Cassandra extension reports a large number of metrics. If you don't want some of them in your dashboard, use excludePatterns in config.yml to filter them out. Also, by default, a machine agent can send only a fixed number of metrics to the controller. To change this limit, follow the instructions at http://docs.appdynamics.com/display/PRO14S/Metrics+Limits.)

Now, if Node A and Node B both report a metric called ReadLatency to the controller, with the above configuration they report it under the same metric path:

Node A reports Custom Metrics | ClusterAB | ReadLatency = 50
Node B reports Custom Metrics | ClusterAB | ReadLatency = 500

The controller automatically averages the metrics at the cluster (tier) level, so you should see the cluster-level metric under

Application Performance Management | Custom Metrics | ClusterAB | ReadLatency = 275

Individual node metrics remain available under

Application Performance Management | Custom Metrics | ClusterAB | Individual Nodes | Node A | ReadLatency = 50
Application Performance Management | Custom Metrics | ClusterAB | Individual Nodes | Node B | ReadLatency = 500

Note that, for now, cluster-level metrics are obtained by averaging all the individual node-level metrics in the cluster.
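The cluster-level figure is a plain arithmetic mean of the node values. A tiny Python sketch of that calculation, using the ReadLatency values from the example above; the helper name `cluster_average` is hypothetical, and the controller performs this rollup itself.

```python
from statistics import mean
from typing import Dict


def cluster_average(node_values: Dict[str, float]) -> float:
    """Average one metric across all nodes of a tier."""
    return mean(node_values.values())


read_latency = {"Node A": 50, "Node B": 500}
print(cluster_average(read_latency))  # 275, i.e. (50 + 500) / 2
```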

 

Metrics

 

Cache

 

| Metric Name | Description |
|---|---|
| Capacity In Bytes | Cache capacity in bytes |
| Hits | Cache hit count |
| Hit Rate | Cache hit rate |
| Requests | Cache request count |
| Size | Cache size in bytes |

 

Client Request

 

| Metric Name | Description |
|---|---|
| Latency | Latency statistics |
| Total Latency | Total latency, in microseconds |
| Timeouts | Total number of timed-out requests (more precisely, the number of TimeoutExceptions thrown) |
| Unavailables | Total number of unavailable requests (more precisely, the number of UnavailableExceptions thrown) |

 

Column Family

 

| Metric Name | Description |
|---|---|
| Bloom Filter Disk Space Used | Disk space used by the bloom filter |
| Bloom Filter False Positives | Number of bloom filter false positives |
| Bloom Filter False Ratio | Bloom filter false-positive ratio |
| Compression Ratio | Current compression ratio for all SSTables |
| Estimated Row Size Histogram | Histogram of estimated row size (in bytes) |
| Estimated Column Count Histogram | Histogram of estimated number of columns |
| Live Disk Space Used | Disk space used by "live" SSTables |
| Live SS Table Count | Number of "live" SSTables |
| Max Row Size | Size of the largest compacted row |
| Mean Row Size | Mean size of compacted rows |
| Memtable Columns Count | Total number of columns present in the memtable |
| Memtable Data Size | Total amount of data stored in the memtable, including column-related overhead |
| Memtable Switch Count | Number of times flushing has caused the memtable to be switched out |
| Min Row Size | Size of the smallest compacted row |
| Pending Tasks | Estimated number of pending tasks |
| Read Latency | Read latency statistics |
| Read Total Latency | Total read latency, in microseconds |
| Recent Bloom Filter False Positives | Number of false positives since the last check |
| Recent Bloom Filter False Ratio | False-positive ratio since the last check |
| SSTables Per Read Histogram | Histogram of the number of SSTables accessed per read |
| Total Disk Space Used | Total disk space used by SSTables, including obsolete ones waiting to be GC'd |
| Write Latency | Write latency statistics |
| Write Total Latency | Total write latency, in microseconds |

 

Commit Log

 

| Metric Name | Description |
|---|---|
| Completed Tasks | Approximate number of completed tasks |
| Pending Tasks | Approximate number of pending tasks |
| Total Commit Log Size | Current data size of all commit log segments |

 

Compaction

 

| Metric Name | Description |
|---|---|
| Completed Tasks | Estimated number of completed compaction tasks |
| Pending Tasks | Estimated number of pending compaction tasks |
| Bytes Compacted | Number of bytes compacted since the node started |
| Total Compactions Completed | Estimated number of completed compaction tasks |

 

Connection

 

| Metric Name | Description |
|---|---|
| Total Timeouts | Total number of timeouts that occurred on this node |

 

Dropped Message

 

| Metric Name | Description |
|---|---|
| Dropped | Total number of dropped messages for this verb |

 

Streaming

 

| Metric Name | Description |
|---|---|
| ActiveOutboundStreams | Currently active outbound streams |
| TotalIncomingBytes | Total incoming bytes received since the node started |
| TotalOutgoingBytes | Total outgoing bytes sent since the node started |

 

Storage

 

| Metric Name | Description |
|---|---|
| Load | Total disk space used (in bytes) by this node |

 

Thread Pool

 

| Metric Name | Description |
|---|---|
| Active Tasks | Approximate number of tasks the thread pool is actively executing |
| Completed Tasks | Approximate total number of tasks the thread pool has finished executing |
| Currently Blocked Tasks | Number of currently blocked tasks |
| Pending Tasks | Approximate number of pending tasks in the thread pool |
| Total Blocked Tasks | Total number of blocked tasks since node startup |

 

Contributing

 

Always feel free to fork and contribute any changes directly via GitHub.

 

Support

 

For any questions or feature requests, please contact the AppDynamics Center of Excellence.

 

Version:

1.6.0

Compatibility:

3.7+

Last Update:

06/09/2015

Cassandra Versions Tested On:

2.0.7

 

Release notes:

1.2: Fixed to work with empty username and password fields.

1.3: Added support for multiple Cassandra servers.

1.4: Fixed a JMX connection leak.

1.5.0: Revamped the entire extension.

1.5.1: Added a fix that converts decimal metric values into rounded strings, because the controller does not support decimal metrics.

1.5.5: Added support for cluster-level metrics, obtained by averaging node metrics.

1.5.6: Fixed the conversion of decimals into whole-number strings.

1.5.9: Added metric overrides.

1.6.0: Added more logging statements.

2.0.0: Revamped the extension.