How We Took Our Kubernetes Autoscaling from Basic to Advanced Mode with Istio Metrics

Elevating Kubernetes Autoscaling with Istio Metrics

Let’s face it: in the world of microservices, managing traffic and scaling workloads can feel like trying to catch a runaway train. You’re flying down the tracks at full speed, but if you’re not careful, things can get out of hand real quick. We’ve all been there — constantly battling the scale-up and scale-down conundrum, trying to keep the system efficient without wasting resources.

But guess what? It doesn’t have to be that way. We figured out a better way to approach scaling. Here’s how we went from standard Kubernetes Horizontal Pod Autoscaling (HPA) to a traffic-aware, Istio-powered scaling machine that adapts like a pro.

The Problem: Scaling Challenges at Massive Scale

To give you some context, we weren’t dealing with your average-sized Kubernetes clusters. We were running multiple clusters, each with 100+ nodes and packed with hundreds of deployments. That’s a lot of moving parts! Managing resources and scaling operations at that scale wasn’t exactly smooth sailing.

With hundreds of microservices across so many nodes, our scaling decisions needed to be super-smart to keep the system efficient. The usual CPU and memory-based scaling just wasn’t cutting it. CPU usage could be low, but our services were still under heavy traffic. That meant under-provisioned pods in high-traffic situations and over-provisioned ones when the traffic dropped — neither of which was ideal.

We needed something better, something that could help us scale based on actual traffic patterns and keep resources optimized for the scale of operations we were running. That’s where Istio came in.

The Superpower of Istio: More Than Just a Service Mesh

Now, we know Istio’s reputation precedes it. But it’s not just about traffic routing. Its telemetry features are what make it an absolute game-changer for scaling workloads. Istio collects all sorts of useful data — from HTTP request rates to response times — without you having to do a thing. It’s like having a personal assistant keeping an eye on your app’s health 24/7.

And you know what? Instead of just sitting pretty in the background, we decided to harness this data for autoscaling. That’s right — instead of blindly scaling based on CPU, we used Istio to power scaling decisions based on real-time traffic.

The Solution: From A to Z — How We Built It

[Diagram: Kubernetes HPA driven by Istio metrics via Prometheus]

This diagram illustrates the architecture behind autoscaling Kubernetes workloads using Istio metrics. By collecting telemetry data—such as HTTP request rates, status codes, and durations—from Envoy sidecars, Istio enables a more traffic-aware scaling approach. Prometheus stores and queries this data, while Kubernetes leverages it through the HPA to dynamically adjust workloads.

1. Istio + Prometheus: Better Together for Smarter Scaling

First things first: we had to make sure Istio was set up properly to collect metrics. You’ve got to have Prometheus running because that’s where Istio’s telemetry data gets stored. And no, it’s not as complicated as it sounds. A few Helm charts and some Istio magic later, and we were set up to start monitoring the traffic.
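If you’re starting from scratch, the setup can be as simple as a couple of Helm installs plus Istio’s bundled Prometheus addon. Here’s a minimal sketch (the chart repo, release names, and addon manifest path below are assumptions based on the official Istio docs, so adapt them to your environment and Istio version):

bash

# Install the Istio control plane from the official Helm charts (repo URL assumed)
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm install istio-base istio/base -n istio-system --create-namespace
helm install istiod istio/istiod -n istio-system

# Deploy Istio's sample Prometheus addon, which scrapes Envoy/Istio telemetry out of the box
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/prometheus.yaml

# Enable sidecar injection for the workload namespace so traffic metrics get emitted
kubectl label namespace test istio-injection=enabled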

2. The Key Ingredient: The Metrics Adapter

Here’s where the real fun begins. If you want Istio’s traffic data to influence Kubernetes scaling, you need an adapter. To unlock this potential, we plugged in Zalando’s kube-metrics-adapter. 

Why Zalando’s Adapter:

| Feature | Zalando kube-metrics-adapter | Prometheus Adapter |
| --- | --- | --- |
| Dynamic PromQL from annotations | Yes (in HPA annotations) | No (must be pre-configured in adapter config) |
| Simple setup per HPA | Very easy (just add annotations) | Requires editing central config map |
| External Metrics support | Yes | Not supported by default |
| Helm chart available | Yes | Yes |

bash

helm install kube-metrics-adapter zalando/kube-metrics-adapter \
  --namespace kube-system \
  --set prometheus.url=http://prometheus.istio-system:9090

It’s like setting up a bridge between Istio’s monitoring and Kubernetes’ autoscaling brain. Once the adapter did its thing, we could use Istio’s metrics to scale workloads intelligently. And trust me, when it works, it feels like you’ve unlocked a new level in your Kubernetes game.
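A quick sanity check we found useful: before touching any HPAs, confirm the adapter has actually registered its metrics APIs with the API server. Something along these lines should do it (standard kubectl calls; jq is optional, and which API groups appear depends on how the chart is configured):

bash

# The adapter registers itself as an aggregated API service
kubectl get apiservice v1beta1.custom.metrics.k8s.io v1beta1.external.metrics.k8s.io

# List the custom metrics currently exposed to the HPA controller
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .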

3. Crafting the Prometheus Query: Scaling by Real Traffic

Next, we had to create a Prometheus query that measures the real traffic each pod is handling. Think of this query as our scaling algorithm. It measures the request rate per pod, ignoring 404 errors — because who cares about errors when you’re trying to handle real traffic?

promql

sum(
  rate(
    istio_requests_total{
      destination_workload="dummy-application",
      destination_workload_namespace="test",
      response_code!="404"
    }[1m]
  )
) /
count(
  count(
    container_memory_usage_bytes{
      namespace="test",
      pod_name=~"dummy-application.*"
    }
  ) by (pod_name)
)
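Before handing this query to the HPA, it’s worth running it directly against Prometheus to make sure it returns a sensible per-pod request rate. A rough way to do that from a workstation (assuming the Prometheus service name and port used in the adapter setup above):

bash

# Port-forward the in-cluster Prometheus (service name assumed from the adapter config)
kubectl -n istio-system port-forward svc/prometheus 9090:9090 &

# Evaluate the per-pod request-rate expression via the Prometheus HTTP API
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=sum(rate(istio_requests_total{destination_workload="dummy-application",destination_workload_namespace="test",response_code!="404"}[1m])) / count(count(container_memory_usage_bytes{namespace="test",pod_name=~"dummy-application.*"}) by (pod_name))' \
  | jq '.data.result'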

4. Custom HPA Configuration

With the metrics exposed, here’s a sample HorizontalPodAutoscaler YAML that uses the above query:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: dummy-application
  namespace: test
  annotations:
    metric-config.object.istio-requests-total.prometheus/per-replica: "true"
    metric-config.object.istio-requests-total.prometheus/query: |
      sum(
        rate(
          istio_requests_total{
            destination_workload="dummy-application",
            destination_workload_namespace="test",
            response_code!="404"
          }[1m]
        )
      ) /
      count(
        count(
          container_memory_usage_bytes{
            namespace="test",
            pod_name=~"dummy-application.*"
          }
        ) by (pod_name)
      )
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: dummy-application
  metrics:
    - type: Object
      object:
        metricName: istio-requests-total
        target:
          apiVersion: v1
          kind: Pod
          name: dummy-application
        targetValue: 10

Now, you might think: “That’s just a bunch of numbers.” But this is where the magic happens. This query ensures that we’re scaling based on traffic intensity, not just idle CPU cycles.
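Once the HPA is applied, Kubernetes takes it from there. Watching the autoscaler while you run a load test is the easiest way to confirm everything is wired up; a quick sketch (the manifest filename below is just a placeholder):

bash

# Apply the HPA manifest (filename is a placeholder) and watch replicas follow traffic
kubectl apply -f dummy-application-hpa.yaml
kubectl -n test get hpa dummy-application --watch

# If the metric never shows up, the HPA's events usually point at the problem
kubectl -n test describe hpa dummy-application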

The Result: What Happened When We Hit the “Go” Button?

  1. Cost Reduction: Before, we used to over-provision, running extra pods just to be safe. But with traffic-based autoscaling, we only scale when needed. This optimized resource usage and cut down our cloud bill significantly. Efficiency FTW.

  2. Smarter Scaling: Rather than relying on CPU load (which can be a poor proxy for load), we scaled pods based on real demand. If traffic spiked, the service scaled up. If traffic slowed, the pods scaled down. Like a well-tuned engine.

  3. Improved User Experience: Faster response times and more resilient services. Users are getting quicker responses because our autoscaling ensures there are always enough resources to handle requests, even during traffic spikes.

  4. Operational Efficiency: By automating scaling decisions, we freed up time for our developers. They no longer had to worry about tuning resources manually for every deployment. Kubernetes, with Istio, took over.

Lessons Learned (and Why You Should Care)

  1. Don’t underestimate Istio’s telemetry: It’s not just for routing traffic — use it for scaling decisions, and you’ll unlock a whole new dimension of control.

  2. Metrics matter: Just looking at CPU usage won’t cut it. You need a more granular approach, and Istio provides that.

  3. Customization is key: With Kubernetes and Istio, you can build exactly what you need, rather than settling for default scaling behavior.

  4. Set it and forget it: Once it’s set up, the system runs autonomously. It’s like having a personal assistant who scales your infrastructure while you sip coffee.

Conclusion: The Road Ahead

If you’re still relying on basic HPA scaling (CPU + memory), it’s time to level up. Istio brings rich traffic metrics that allow you to scale based on what truly matters: real user traffic. And the best part? It’s all automated. Kubernetes and Istio do the heavy lifting so you can focus on growing your application.

Are you ready to make traffic-driven autoscaling your new secret weapon? Let’s face it — the future of scaling isn’t just about resources; it’s about real-time user demand. And with Istio and Kubernetes, we’ve built a system that’s ready for anything.