Couchbase Server is an open source, distributed (shared-nothing architecture) NoSQL document-oriented database that is optimized for interactive applications. This extension allows the user to connect to a specific cluster host and retrieve metrics about the cluster, all the nodes within the cluster and any buckets associated with the nodes.
Configure the CouchBase monitoring extension by editing the config.yml file in
Configure the "tier" under which the metrics need to be reported. This can be done by changing the value of <TIER NAME OR TIER ID> in

    metricPrefix: "Server|Component:<TIER NAME OR TIER ID>|Custom Metrics|CouchBase"

For example:

    metricPrefix: "Server|Component:Extensions tier|Custom Metrics|CouchBase"
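The substitution described above can be sketched in a few lines of Python. This is only an illustration of how the placeholder maps to the final prefix, not part of the extension; the helper name `build_metric_prefix` is hypothetical, and rejecting `|` inside a tier name is an assumption based on `|` being the path separator in the prefix.

```python
PLACEHOLDER = "<TIER NAME OR TIER ID>"
TEMPLATE = "Server|Component:" + PLACEHOLDER + "|Custom Metrics|CouchBase"


def build_metric_prefix(tier: str) -> str:
    """Return the metricPrefix string for the given tier name or tier ID."""
    tier = tier.strip()
    if not tier:
        raise ValueError("tier must not be empty")
    if "|" in tier:
        # Assumption: '|' separates path segments, so it cannot be part of a tier.
        raise ValueError("tier must not contain '|'")
    return TEMPLATE.replace(PLACEHOLDER, tier)


print(build_metric_prefix("Extensions tier"))
```

The printed value is the string to place after `metricPrefix:` in config.yml.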
Configure the CouchBase cluster by specifying the name (required), host (required), port (required) and queryPort (required) of any node (server) in the CouchBase cluster, along with the username and password (only if authentication is enabled) and passwordEncrypted (only if password encryption is required).

    servers:
      - name: "Cluster1"
        host: "localhost"
        port: "8091"
        queryPort: "8093"
        username: "Administrator"
        password: "password1"
        passwordEncrypted: ""

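As a quick sanity check before starting the agent, a server entry can be validated against the required fields listed above. The sketch below is not part of the extension: `check_server` is a hypothetical helper, the entry is written as a plain Python dict rather than parsed from config.yml, and the rule that a username needs either `password` or `passwordEncrypted` is an assumption drawn from the field descriptions.

```python
REQUIRED_FIELDS = ("name", "host", "port", "queryPort")


def check_server(server: dict) -> list:
    """Return a list of human-readable problems found in one server entry."""
    problems = [
        f"missing required field: {key}"
        for key in REQUIRED_FIELDS
        if not str(server.get(key) or "").strip()
    ]
    # Assumption: when authentication is used, one of the two password fields is set.
    if server.get("username") and not (
        server.get("password") or server.get("passwordEncrypted")
    ):
        problems.append("username is set but neither password nor passwordEncrypted is")
    return problems


server = {
    "name": "Cluster1",
    "host": "localhost",
    "port": "8091",
    "queryPort": "8093",
    "username": "Administrator",
    "password": "password1",
    "passwordEncrypted": "",
}
print(check_server(server))
```

An empty list means the entry has every required field and a usable credential pair.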
Configure the encryptionKey for encrypted passwords (only if password encryption is required). See the password encryption steps below for how to encrypt a password.

    #Encryption key for Encrypted password.
    encryptionKey: "axcdde43535hdhdgfiniyy576"

Configure the metrics section.
For configuring the metrics, the following properties can be used:
| Property | Default value | Possible values | Description |
| --- | --- | --- | --- |
| alias | metric name | Any string | The substitute name to be used in the metric browser instead of the metric name. |
| aggregationType | "AVERAGE" | "AVERAGE", "SUM", "OBSERVATION" | Aggregation qualifier |
| timeRollUpType | "AVERAGE" | "AVERAGE", "SUM", "CURRENT" | Time roll-up qualifier |
| clusterRollUpType | "INDIVIDUAL" | "INDIVIDUAL", "COLLECTIVE" | Cluster roll-up qualifier |
| multiplier | 1 | Any number | Value with which the metric needs to be multiplied. |
| convert | null | Any key value map | Set of key value pairs that indicates the value to which the metrics need to be transformed, e.g. UP:0, DOWN:1 |
| delta | false | true, false | If enabled, gives the delta values of metrics instead of actual values. |

    - bandwidth_usage: #Bandwidth used during replication, measured in bytes per second.
        alias: "bandwidthUsed"
        multiplier: 1
        aggregationType: "SUM"
        timeRollUpType: "CURRENT"
        clusterRollUpType: "INDIVIDUAL"
        delta: true
    - status:
        alias: "status"
        convert:
          "healthy" : 1
          "warmup" : 2

**All these metric properties are optional. If a property is not specified, the default value shown in the table is applied to the metric.**
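To make the effect of `alias`, `convert`, `multiplier` and `delta` concrete, here is a minimal Python sketch of how one raw value could be turned into a reported value. It is not the extension's implementation: the function `transform` is hypothetical, and the order of operations (convert, then multiply, then delta against the previous reported value) is an assumption. The aggregation and roll-up qualifiers are not modelled.

```python
DEFAULTS = {"alias": None, "multiplier": 1, "convert": None, "delta": False}


def transform(name, raw, props=None, previous=None):
    """Return (reported_name, value); value is None when a delta has no baseline yet."""
    props = {**DEFAULTS, **(props or {})}  # unspecified properties fall back to defaults
    reported_name = props["alias"] or name
    convert = props["convert"] or {}
    # Map textual states such as "warmup" to numbers, then scale.
    value = float(convert.get(raw, raw)) * props["multiplier"]
    if props["delta"]:
        if previous is None:
            return reported_name, None
        value -= previous
    return reported_name, value


print(transform("bandwidth_usage", 1500,
                {"alias": "bandwidthUsed", "delta": True}, previous=1200.0))
print(transform("status", "warmup", {"convert": {"healthy": 1, "warmup": 2}}))
```

With `delta` enabled, the first collection has nothing to subtract from, which is why the sketch returns `None` until a previous value exists.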
There are six categories of metrics: cluster, node, bucket, query, xdcr and index. To disable any of these categories, set the include parameter under its section to "false" as follows:

    index:
      include: "false"
      stats:
        - memorySnapshotInterval:
            alias: "memorySnapshotInterval"
        - stableSnapshotInterval:
            alias: "stableSnapshotInterval"
        - maxRollbackPoints:
            alias: "maxRollbackPoints"

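Note that the flag is written as the quoted string "false", not a boolean. The sketch below shows one way such a flag could be interpreted; it is illustrative only. `is_enabled` is a hypothetical helper, the config is a hand-written dict rather than the parsed file, and treating a missing `include` as enabled is an assumption.

```python
def is_enabled(section: dict) -> bool:
    """Interpret the include flag, which may be the string "false" or a boolean."""
    # Compare text rather than truthiness: in Python, bool("false") is True.
    return str(section.get("include", "true")).strip().lower() != "false"


metrics = {
    "cluster": {"include": "true"},
    "query": {"include": "true"},
    "index": {"include": "false"},
}
enabled = [name for name, section in metrics.items() if is_enabled(section)]
print(enabled)
```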
To avoid setting the clear text password in the config.yml, please follow the steps below to encrypt the password and set the encrypted password and the key in the config.yml:

    java -cp "appd-exts-commons-2.0.0.jar" com.appdynamics.extensions.crypto.Encryptor myKey myPassword

where "myKey" is any random key and "myPassword" is the actual password that needs to be encrypted.
It is recommended that a single CouchBase monitoring extension be used to monitor a single CouchBase cluster.
cluster metrics

    ram:
      - total: #Total ram available to cluster (bytes)
          alias: "total"
      - quotaTotal: #Ram quota total for the cluster (bytes)
          alias: "quotaTotal"
      - quotaUsed: #Ram quota used by the cluster (bytes)
          alias: "quotaUsed"
      - used: #Ram used by the cluster (bytes)
          alias: "used"
      - usedByData: #Ram used by the data in the cluster (bytes)
          alias: "usedByData"
      - quotaUsedPerNode: #Ram quota used per node in the cluster (bytes)
          alias: "quotaUsedPerNode"
      - quotaTotalPerNode: #Ram quota total per node in the cluster (bytes)
          alias: "quotaTotalPerNode"
    hdd:
      - total: #Total harddrive space available to cluster (bytes)
          alias: "total"
      - quotaTotal: #Harddrive quota total for the cluster (bytes)
          alias: "quotaTotal"
      - used: #Harddrive space used by the cluster (bytes)
          alias: "used"
      - usedByData: #Harddrive use by the data in the cluster(bytes)
          alias: "usedByData"
      - free: #Free harddrive space in the cluster (bytes)
          alias: "free"
    counters:
      - rebalance_success:
          alias: "rebalance_success"
      - rebalance_start:
          alias: "rebalance_start"
    others:
      - rebalanceStatus: #Rebalancing status of the cluster
          alias: "rebalanceStatus"
          convert:
            "none" : 0

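The cluster metrics are reported as raw byte counts, so ratios such as "how much of the RAM quota is in use" have to be derived from them. A minimal Python sketch of that arithmetic follows; the helper `percent` is hypothetical and the sample byte values are made up for illustration, not taken from a real cluster.

```python
from typing import Optional


def percent(part: float, whole: float) -> Optional[float]:
    """Return part/whole as a percentage, or None when whole is zero."""
    if not whole:
        return None
    return 100.0 * part / whole


# Made-up sample values in bytes, keyed by the metric names listed above.
cluster = {
    "ram": {"quotaTotal": 4_000_000_000, "quotaUsed": 1_000_000_000},
    "hdd": {"total": 200_000_000_000, "free": 50_000_000_000},
}

ram_quota_used_pct = percent(cluster["ram"]["quotaUsed"], cluster["ram"]["quotaTotal"])
hdd_free_pct = percent(cluster["hdd"]["free"], cluster["hdd"]["total"])
print(ram_quota_used_pct, hdd_free_pct)
```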
nodes metrics

    systemStats:
      #
      - cpu_utilization_rate: #The CPU utilization rate (%)
          alias: "cpu_utilization_rate"
      - swap_total: #Total swap size allocated (bytes)
          alias: "swap_total"
      - swap_used: #Amount of swap space used(bytes)
          alias: "swap_used"
      - mem_total: #Total memory available to the node (bytes)
          alias: "mem_total"
      - mem_free: #Amount of memory free for the node (bytes)
          alias: "mem_free"
    interestingStats:
      - cmd_get: #Number of get commands
          alias: "cmd_get"
      - couch_docs_actual_disk_size: #Amount of disk space used by Couch docs(bytes)
          alias: "couch_docs_actual_disk_size"
      - couch_docs_data_size: #Data size of couch documents associated with a node (bytes)
          alias: "couch_docs_data_size"
      - couch_spatial_data_size: #Size of object data for Couch spatial views (bytes)
          alias: "couch_spatial_data_size"
      - couch_spatial_disk_size: #Amount of disk space occupied by Couch spatial views (bytes)
          alias: "couch_spatial_disk_size"
      - couch_views_actual_disk_size: #Amount of disk space occupied by Couch views (bytes)
          alias: "couch_views_actual_disk_size"
      - couch_views_data_size: #Size of object data for Couch views (bytes)
          alias: "couch_views_data_size"
      - curr_items: #Number of current items
          alias: "curr_items"
      - curr_items_tot: #Total number of items associated with node
          alias: "curr_items_tot"
      - ep_bg_fetched: #Number of disk fetches performed since server was started
          alias: "ep_bg_fetched"
      - get_hits: #Number of get hits
          alias: "get_hits"
      - mem_used: #Memory used by the node (bytes)
          alias: "mem_used"
      - ops: #Number of operations performed on Couchbase
          alias: "ops"
      - vb_replica_curr_items: #Number of items/documents that are replicas
          alias: "vb_replica_curr_items"
    otherStats:
      - clusterMembership: #Current node status with respect to membership in the cluster
          alias: "clusterMembership"
          convert:
            "active" : 1
            "inactiveAdded" : 2
            "inactiveFailed" : 3
      - status: #Node status
          alias: "status"
          convert:
            "healthy" : 1
            "warmup" : 2

buckets metrics

    quota:
      - ram: #Amount of RAM used by the bucket (bytes)
          alias: "ram"
      - rawRAM: #Amount of raw RAM used by the bucket (bytes)
          alias: "rawRam"
    basicStats:
      - quotaPercentUsed: #Percentage of RAM used (for active objects) against the configure bucket size(%)
          alias: "quotaPercentUsed"
      - opsPerSec: #Number of operations per second
          alias: "opsPerSec"
      - diskFetches: #Number of disk fetches
          alias: "diskFetches"
      - itemCount: #Number of items associated with the bucket
          alias: "itemCount"
      - diskUsed: #Amount of disk used (bytes)
          alias: "diskUsed"
      - dataUsed: #Size of user data within buckets of the specified state that are resident in RAM(%)
          alias: "dataUsed"
      - memUsed: #Amount of memory used by the bucket (bytes)
          alias: "memUsed"
    otherStats:
      - couch_total_disk_size:
      - couch_docs_fragmentation:
      - couch_views_fragmentation:
      - hit_ratio:
      - ep_cache_miss_rate:
      - ep_resident_items_rate:
      - vb_avg_active_queue_age:
      - vb_avg_replica_queue_age:
      - vb_avg_pending_queue_age:
      - vb_avg_total_queue_age:
      - vb_active_resident_items_ratio:
      - vb_replica_resident_items_ratio:
      - vb_pending_resident_items_ratio:
      - avg_disk_update_time: #Average time required to update data on disk
      - avg_disk_commit_time: #Average time required to commit data on disk
      - avg_bg_wait_time: #The average background fetch time in microseconds
      - avg_active_timestamp_drift:
      - avg_replica_timestamp_drift:
      - bg_wait_count:
      - bg_wait_total:
      - bytes_read:
      - bytes_written:
      - cas_badval: #No. of CAS operations per second using an incorrect CAS ID for data that this bucket contains.
      - cas_hits: #No. of CAS operations per second for data that this bucket contains
      - cas_misses: #No. of CAS operations per second for data that this bucket contains
      - cmd_get:
      - cmd_set:
      - curr_connections:
      - curr_items:
      - curr_items_tot:
      - decr_hits: #No of decrement operations per sec for data that this bucket contains
      - decr_misses: #No of decrement operations per sec for data that this bucket does not contain
      - delete_hits: #No of delete operations per sec for data that this bucket contains
      - delete_misses: #No of delete operations per sec for data that this bucket does not contain
      - evictions:
      - get_hits: #No of get operations per sec for data that this bucket contains
      - get_misses: #No of get operations per sec for data that this bucket does not contain
      - incr_hits: #No of increment operations per sec for data that this bucket contains
      - incr_misses: #No of increment operations per sec for data that this bucket does not contain
      - misses:
      - xdc_ops: #No of XDCR related operations per second for this bucket
      - cpu_idle_ms:
      - cpu_local_ms:
      - cpu_utilization_rate:
      - hibernated_requests:
      - hibernated_waked:
      - mem_actual_free:
      - mem_actual_used:
      - mem_free:
      - mem_total:
      - mem_used_sys:
      - rest_requests:
      - swap_total:
      - swap_used:

query metrics

    systemVitals:
      - request.completed.count: #Number of requests completed
          alias: "request.completed.count"
      - request.active.count: #Number of active requests
          alias: "request.active.count"
      - request.per.sec.1min: #query throughput 1 minute
          alias: "request.per.sec.1min"
      - request.per.sec.5min: #query throughput 5 minutes
          alias: "request.per.sec.5min"
      - request.per.sec.15min: #query throughput 15 minutes
          alias: "request.per.sec.15min"
      - request_time.mean: #Mean time to complete a request
          alias: "request_time.mean(ms)"
      - request_time.median: #Median time to complete a request
          alias: "request_time.median(ms)"
      - request_time.80percentile: #80th percentile query response time
          alias: "request_time.80percentile(ms)"
      - request_time.95percentile: #95th percentile query response time
          alias: "request_time.95percentile(ms)"
      - request_time.99percentile: #99th percentile query response time
          alias: "request_time.99percentile(ms)"
      - request.prepared.percent: #percentage of prepared requests
          alias: "request.prepared.percent"

xdcr metrics

    stats:
      - bandwidth_usage: #Bandwidth used during replication, measured in bytes per second.
          alias: "bandwidth_usage"
      - changes_left: #Number of updates still pending replication.
          alias: "changes_left"
      - data_replicated: #Size of data replicated in bytes.
          alias: "data_replicated"
      - docs_checked: #Number of documents checked for changes.
          alias: "docs_checked"
      - docs_failed_cr_source: #Number of documents that have failed conflict resolution on the source cluster and not replicated to target cluster.
          alias: "docs_failed_cr_source"
      - docs_filtered: #Number of documents that have been filtered out and not replicated to target cluster.
          alias: "docs_filtered"
      - docs_latency_wt: #Weighted average latency for sending replicated changes to destination cluster.
          alias: "docs_latency_wt"
      - docs_opt_repd: #Number of docs sent optimistically.
          alias: "docs_opt_repd"
      - docs_received_from_dcp: #Number of documents received from DCP.
          alias: "docs_received_from_dcp"
      - docs_rep_queue: #Number of documents in replication queue.
          alias: "docs_rep_queue"
      - docs_written: #Number of documents written to the destination cluster via xdcr.
          alias: "docs_written"
      - meta_latency_wt: #Weighted average time for requesting document metadata. xdcr uses this for conflict resolution prior to sending the document into the replication queue.
          alias: "meta_latency_wt"
      - num_checkpoints: #Number of checkpoints issued in replication queue.
          alias: "num_checkpoints"
      - num_failedckpts: #Number of checkpoints failed during replication.
          alias: "num_failedckpts"
      - rate_received_from_dcp: #Number of documents received from DCP per second.
          alias: "rate_received_from_dcp"
      - rate_replication: #Rate of documents being replicated, measured in documents per second.
          alias: "rate_replication"
      - size_rep_queue: #Size of replication queue in bytes.
          alias: "size_rep_queue"
      - time_committing: #Seconds elapsed during replication.
          alias: "time_committing"

index metrics

    stats:
      - memorySnapshotInterval: #Represents how often the indexer creates an in-memory snapshot for querying
          alias: "memorySnapshotInterval"
      - stableSnapshotInterval: #Represents how often the indexer creates a persistent snapshot of recovery
          alias: "stableSnapshotInterval"
      - maxRollbackPoints: #Maximum number of rollback points
          alias: "maxRollbackPoints"

Workbench is a feature that lets you preview the metrics before registering them with the controller. This is useful if you want to fine-tune the configuration. Workbench is embedded into the extension jar. To use the workbench, first complete all the installation and configuration steps.
Start the workbench with the following command if you are in

    java -jar /monitors/CouchBaseMonitor/couchbase-monitoring-extension.jar

This starts an HTTP server at http://localhost:9090/, which can be accessed from a browser.
If the server is not accessible from outside/browser, you can use the following endpoints to see the list of registered metrics and errors.
Get the stats

    curl http://localhost:9090/api/stats

Get the registered metrics

    curl http://localhost:9090/api/metric-paths

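If curl is not available, the same two endpoints can be queried from a short Python script using only the standard library. This is a sketch under assumptions: the helper names `endpoint_url` and `fetch` are hypothetical, and because the response format is not described here, the body is printed as raw text rather than parsed.

```python
from urllib.error import URLError
from urllib.request import urlopen

BASE_URL = "http://localhost:9090"
ENDPOINTS = {"stats": "/api/stats", "metric-paths": "/api/metric-paths"}


def endpoint_url(name: str, base_url: str = BASE_URL) -> str:
    """Build the full URL for one of the workbench endpoints."""
    return base_url.rstrip("/") + ENDPOINTS[name]


def fetch(name: str, timeout: float = 5.0):
    """Return the raw response body as text, or None if the workbench is unreachable."""
    try:
        with urlopen(endpoint_url(name), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (URLError, OSError):
        return None


if __name__ == "__main__":
    for name in ENDPOINTS:
        body = fetch(name)
        print(name, "->", body if body is not None else "workbench not reachable")
```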
CouchBase Version Tested On:
2.0.0 - Revamped the extension to support the new extensions framework (2.0.0), added three categories of metrics (query, xdcr and index), and added extra metrics in the cluster, node and bucket categories.
Please follow the steps specified in the Troubleshooting document to debug problems faced while using the extension.
Always feel free to fork and contribute any changes directly via GitHub.
For any questions or feature requests, please contact AppDynamics Support.