Cloud auto-scaling decisions are often based on infrastructure metrics such as CPU utilization. In a cloud or virtualized environment, however, infrastructure metrics may not be reliable enough for making auto-scaling decisions. Decisions based on application metrics, such as request-queue depth or requests per minute, are often more useful because the application is intimately familiar with conditions such as:
When the existing compute instances cannot keep up with the arrival rate of incoming traffic, and additional instances must be added elastically based on a high-watermark threshold on a given application metric.
When it’s time to scale back down based on a low-watermark threshold on the same application metric.
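The two conditions above can be sketched as a small decision function. This is a minimal illustration, not any particular cloud provider's API: the `WatermarkPolicy` class, the `desired_instances` function, and all threshold values are hypothetical names and numbers chosen for the example, and the metric is assumed to be something like request-queue depth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatermarkPolicy:
    """Hypothetical policy; every field name and default here is illustrative."""
    high_watermark: float   # scale up when the metric rises above this
    low_watermark: float    # scale down when the metric falls below this
    min_instances: int = 1
    max_instances: int = 10
    step: int = 1           # instances added or removed per decision


def desired_instances(metric: float, current: int, policy: WatermarkPolicy) -> int:
    """Return the instance count the policy asks for, given one metric sample."""
    if metric > policy.high_watermark:
        target = current + policy.step
    elif metric < policy.low_watermark:
        target = current - policy.step
    else:
        target = current  # between the watermarks: hold steady
    # Never go below the minimum or above the maximum fleet size.
    return max(policy.min_instances, min(policy.max_instances, target))


# Assumed example: the metric is request-queue depth sampled by the application.
policy = WatermarkPolicy(high_watermark=100, low_watermark=20)
print(desired_instances(metric=150, current=3, policy=policy))  # 4
print(desired_instances(metric=50, current=3, policy=policy))   # 3
print(desired_instances(metric=5, current=3, policy=policy))    # 2
```

Keeping a gap between the low and high watermarks means a metric hovering near one threshold does not trigger alternating scale-up and scale-down decisions. A real deployment would likely also average the metric over a window and apply a cooldown between decisions, which this sketch leaves out.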