Understanding Kubernetes Horizontal Pod Autoscaling



Auto-scaling automatically increases or decreases the computing resources allocated to your application according to what it needs at any given time. It became possible thanks to advances in cloud computing, which fundamentally changed how computing resources are allocated and made it feasible to build a fully scalable server in the cloud.


What is the HPA?

The autoscaling component for Kubernetes pods is called the Horizontal Pod Autoscaler, or HPA. Its main benefits are lower cost and, when traffic on production workloads is unpredictable, longer uptime and higher availability. Unlike a fixed number of pods, automated scaling adjusts to actual usage patterns, which avoids the risk of running too few or too many pods for the traffic load. For instance, if traffic is typically lower at night, an automated scaling solution can scale some pods down during those hours. Conversely, it can better absorb unforeseen traffic spikes.


Requirements for HPA
Metrics Server

Metrics Server is a scalable, efficient source of container resource metrics for the autoscaling pipelines built into Kubernetes. It collects resource metrics from the kubelets and exposes them through the Metrics API in the Kubernetes API server, where the Horizontal Pod Autoscaler can read them. kubectl top also uses the Metrics API, which makes autoscaling pipelines easier to troubleshoot.





Metrics Server is intended for autoscaling purposes only. For instance, do not use it as a source of metrics for monitoring solutions or to forward metrics to them.

Note: Metrics Server is installed by default in CaaSP 4.2, so the HPA functionality can be used immediately.



Validate Metrics-Server installation

Once Metrics Server is installed, the kubectl top command, which shows the most recent metrics for pods and nodes, becomes available on the cluster. If the command does not work, check the Metrics Server installation.


kubectl top node

kubectl top pod


Horizontal Pod Autoscaler

The HPA automatically scales the number of pods in a replication controller, deployment, replica set, or stateful set based on observed CPU and memory consumption, or on custom metrics. Horizontal Pod Autoscaling does not apply to objects that cannot be scaled, such as DaemonSets.



The Horizontal Pod Autoscaler is implemented as a control loop whose period is set by the controller manager's --horizontal-pod-autoscaler-sync-period flag (default: 15 seconds). During each period, the controller manager checks resource usage against the metrics listed in each HorizontalPodAutoscaler definition. It obtains those metrics from either the Resource Metrics API (for per-pod resource metrics) or the Custom Metrics API (for all other metrics).




The HPA uses the following formula to determine the appropriate replica count:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
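As a worked illustration of the formula, here is a minimal Python sketch. The function name desired_replicas and the sample numbers are assumptions made for this example; they are not part of Kubernetes.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     desired_metric: float) -> int:
    """ceil(currentReplicas * (currentMetricValue / desiredMetricValue))."""
    if desired_metric <= 0:
        raise ValueError("desired_metric must be positive")
    # Multiply before dividing: mathematically the same as the formula,
    # but less exposed to floating-point rounding for integer inputs.
    return math.ceil(current_replicas * current_metric / desired_metric)


if __name__ == "__main__":
    # Hypothetical numbers: 4 pods averaging 80% CPU against a 50% target.
    # 4 * (80 / 50) = 6.4, which rounds up to 7.
    print(desired_replicas(4, 80, 50))
    # The same 4 pods averaging 20% CPU: 4 * (20 / 50) = 1.6, which rounds up to 2.
    print(desired_replicas(4, 20, 50))
```

Because the result is always rounded up, the HPA errs on the side of slightly more capacity rather than slightly less.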



The HPA Manifest:


In this example, the HPA targets the php-apache deployment. When the average CPU utilization across all running pods of the application rises above 50%, the number of replicas is scaled up; when the average falls below 50%, it is scaled down.
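A manifest matching this description could look like the sketch below, which builds the object in Python and prints it as JSON (kubectl accepts JSON as well as YAML). It assumes the autoscaling/v2 API, which older clusters may not serve; those would need autoscaling/v2beta2 or autoscaling/v1 instead. The minimum of 1 and maximum of 10 replicas are taken from the kubectl autoscale command used later in this article, and the function name build_hpa_manifest is hypothetical.

```python
import json


def build_hpa_manifest(name: str = "php-apache",
                       min_replicas: int = 1,
                       max_replicas: int = 10,
                       cpu_percent: int = 50) -> dict:
    """Return a HorizontalPodAutoscaler object targeting a Deployment."""
    return {
        "apiVersion": "autoscaling/v2",  # assumption: cluster serves this API version
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose replica count the HPA manages.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Scale on average CPU utilization across the pods.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": cpu_percent,
                    },
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_hpa_manifest(), indent=2))
```

If the script were saved as hpa_manifest.py (a hypothetical file name), its output could be piped to kubectl apply -f - to create the object.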

Before using the HPA, the replica count (spec.replicas) should be removed from the deployment or replica set manifest, because the HPA controller determines the number of replicas.



To create the HPA with kubectl:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10


To verify the HPA:

kubectl get hpa php-apache


To describe the HPA:

kubectl describe hpa php-apache


Understanding the complete flow


1. On request, Metrics Server provides the aggregated metrics of the running pods to the Kubernetes API.

2. By default, the HPA controller checks these metrics every 15 seconds, and if the values meet the conditions of the HPA's rule, it changes the number of pods.

3. In the case of a scale-up, the Kubernetes scheduler assigns the new pods to nodes with available resources.

4. When the metrics return below the target, the HPA reduces the number of replicas.
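The flow above can be approximated with a small simulation. This is a simplified sketch, not the controller's actual code: it applies the replica formula once per sync period, skips scaling when the metric is within the default 10% tolerance of the target, and clamps the result to the minimum and maximum replica counts. It ignores scale-down stabilization and pod readiness. The function name reconcile and the CPU samples are hypothetical.

```python
import math

TOLERANCE = 0.1  # default HPA tolerance around the target ratio of 1.0


def reconcile(current_replicas: int,
              cpu_percent: float,
              target_percent: float = 50,
              min_replicas: int = 1,
              max_replicas: int = 10) -> int:
    """Return the replica count after one simulated sync period."""
    ratio = cpu_percent / target_percent
    # Close enough to the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas
    desired = math.ceil(current_replicas * cpu_percent / target_percent)
    # Never go below the minimum or above the maximum.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    replicas = 1
    # Hypothetical average CPU utilization observed at successive checks.
    for cpu in [250, 120, 52, 20, 10]:
        replicas = reconcile(replicas, cpu)
        print(f"avg CPU {cpu}% -> {replicas} replicas")
```

With these samples the replica count climbs to the maximum during the spike, holds steady while utilization is near the target, and shrinks again as the load falls away.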




If the kubectl get hpa command reports the target as "unknown", check Metrics Server: the HPA controller cannot retrieve the metrics.

If the pods do not scale up and kubectl describe pods reports FailedScheduling because no nodes are available, check the Cluster Autoscaler.



For businesses trying to maximize the performance of their containerized applications, Kubernetes Horizontal Pod Autoscaling is a crucial feature. By knowing its fundamental components and advantages, you can successfully manage your applications’ scaling, ensuring that they stay efficient, performant, and cost-effective.


