Making Efficiency a Platform Capability

Published by VMBlog

In most companies, there is a constant battle happening between three distinct forces.

On one side, you have the FinOps team. They are looking at the cloud bill and demanding we cut costs. On the other side, you have the Product teams. They want features delivered fast and the application to be lightning-quick for users. And then there is the SRE team. Their main goal is to make sure the site never goes down, which often means they are afraid of changing things too quickly.

The problem is that the person caught in the middle of this three-way fight is the Developer.

We expect developers to write quality business code, but also to satisfy the cost team (don't waste CPU!), the performance team (make it fast!), and the reliability team (don't break production!). We try to solve this with "Best Practices" blueprints, long checklists, meetings, and spreadsheets. But this approach fails. It creates overhead and cognitive load: too much information for one person to handle. The teams end up working in silos, fighting over priorities, and often nobody is happy.

To fix this organizational mess, we need to stop drowning humans and start empowering them. We need to stop relying on manual checklists and start building a platform where balancing cost, performance, and reliability is not a manual chore but an invisible capability of the platform itself, working for the humans.

We can do this by automating the feedback loop at two specific points: before the code ships, and after it goes live.

From Static Validation to Automated Tuning

The industry has done a great job of "shifting left" on security and unit testing. We wouldn't dream of merging code that fails a syntax check. Yet, we routinely merge code that completely changes the performance profile of our application without adjusting the infrastructure that runs it.

When a developer changes the code, it creates a ripple effect. A new library might change how the application behaves, perhaps increasing the pressure on the Heap or changing the garbage collection pattern. If we don't adjust the container limits to match, we risk crashes. If we don't update the Horizontal Pod Autoscaler (HPA) triggers, the application might scale too slowly under load. And if the underlying Node Instance type is compute-optimized while the new code is memory-hungry, we are simply burning money.

Imagine a platform capability that acts as a smart optimization engine for the entire stack. When a developer opens a Pull Request, the platform uses Argo Workflows to orchestrate a series of rapid experiments. Driving this process is a Reinforcement Learning engine that analyzes how the code, the runtime, and the infrastructure interacted in each iteration. If the first test showed low latency but high memory waste, the engine learns from this and intelligently selects the next set of parameters to test, perhaps adjusting the Heap or the HPA trigger.

It simulates reality, converging on an optimal configuration before the code ever reaches production. By validating this interplay automatically, the pipeline stops being a simple "Pass/Fail" gate. It becomes a tuning engine that ensures the code, the runtime, and the infrastructure are working in harmony.
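To make the idea concrete, here is a deliberately simplified sketch of that convergence loop. It is not the RL engine itself: the benchmark run that Argo Workflows would orchestrate is replaced by a made-up cost model, and the search is a plain exhaustive sweep over candidate Heap sizes and HPA CPU triggers, keeping whichever configuration scores lowest.

```python
# Illustrative only: this cost model stands in for a real benchmark run;
# every number in it is an assumption, not a measurement.
def run_experiment(heap_mb: int, hpa_cpu_target: int) -> float:
    latency_penalty = max(0, 512 - heap_mb) * 2   # undersized heap hurts latency
    waste_penalty = max(0, heap_mb - 512)         # oversized heap wastes memory
    scaling_penalty = abs(hpa_cpu_target - 70)    # pretend 70% is the sweet spot
    return latency_penalty + waste_penalty + scaling_penalty

def tune() -> dict:
    """Sweep candidate configurations and keep the cheapest one."""
    best, best_cost = None, float("inf")
    for heap_mb in range(256, 4097, 256):   # candidate heap sizes
        for target in range(50, 91, 5):     # candidate HPA CPU triggers
            cost = run_experiment(heap_mb, target)
            if cost < best_cost:
                best = {"heap_mb": heap_mb, "hpa_cpu_target": target}
                best_cost = cost
    return best

print(tune())  # -> {'heap_mb': 512, 'hpa_cpu_target': 70}
```

A real engine would replace the exhaustive sweep with learned exploration, but the contract is the same: run, score, pick the next candidate, converge.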

This way, the developer has the confidence that they are shipping a production-ready artifact, not just lines of code, eliminating the anxiety of deployment day.


Closing the Loop with Intelligent Analysis

While tuning ahead of the release is crucial for establishing a solid baseline, performance simulations are never a perfect mirror of reality. The most valuable data we have is the actual behavior of our applications under real user traffic.

Most organizations treat production metrics as a fire alarm: something to look at only when things break. But a mature platform uses these metrics as a constant feedback loop to optimize the system. This goes far beyond standard autoscaling. We aren't just trying to replicate a Vertical Pod Autoscaler (VPA) that reacts blindly to CPU spikes.

We are looking to build an intelligent analysis engine that understands the context of a release. When we deploy a new version, perhaps using a Canary rollout strategy via Argo Rollouts, we should capture performance data for that specific "fresh" state over a defined period, say, one week after release.

By analyzing this specific time window, we get a clear picture of how the new code behaves under real user traffic compared to the previous version, and the platform can then generate a recommendation. If the new version consistently uses only 10% of its requested CPU during its first week, the system shouldn't just log it: it should generate a tangible artifact, like a Pull Request (think Dependabot, but for efficiency), suggesting a "right-sized" configuration for the next deployment.
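A minimal sketch of such a recommender, assuming one week of CPU samples in millicores, a hypothetical 50% utilization threshold, and 30% headroom on top of the observed p95 (all three numbers are illustrative choices, not fixed rules):

```python
def recommend_cpu_request(samples_mcores: list, requested_mcores: int,
                          threshold: float = 0.5, headroom: float = 1.3):
    """Return a right-sized CPU request, or None if the current one is fine.
    Sizes against the p95 of observed usage, plus headroom for bursts."""
    ordered = sorted(samples_mcores)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    if p95 >= threshold * requested_mcores:
        return None  # usage is close enough to the request; leave it alone
    return int(p95 * headroom)

# A pod requesting 1000m but steadily using ~100m gets flagged:
print(recommend_cpu_request([100] * 1000, 1000))  # -> 130
```

The output of this function is exactly what would land in the generated Pull Request: a concrete new `requests.cpu` value, backed by a week of evidence.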

This is where many platforms fall short: they focus entirely on Kubernetes resources (CPU/Memory requests and limits) and ignore what is happening inside the container.

You cannot effectively tune a container's limit if the application runtime is misconfigured. For example, if you lower the memory limit of a Java container but leave the JVM Heap size too high, the application will crash (OOMKill). If you squeeze the CPU limit of a Node.js application without understanding its event loop behavior, you introduce latency.

Our feedback loop must correlate container metrics with runtime metrics. The recommendation shouldn't just be "lower your memory limit"; it should be: "your JVM heap usage is stable at 512MB, but you requested 4GB. We recommend lowering the container limit to 1GB and adjusting the -Xmx flag to 512MB".
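Sketched in code, with the assumption of roughly 512MB of non-heap overhead (metaspace, threads, native buffers; the real figure varies per workload), the coupled recommendation looks like this:

```python
def recommend_jvm_sizing(stable_heap_mb: int, current_limit_mb: int,
                         non_heap_overhead_mb: int = 512):
    """Couple the container limit to the JVM heap instead of tuning it blindly.
    Shrinking the limit below heap + non-heap overhead invites an OOMKill."""
    safe_limit_mb = stable_heap_mb + non_heap_overhead_mb
    if current_limit_mb <= safe_limit_mb:
        return None  # already tight; lowering further risks an OOMKill
    return {"xmx_mb": stable_heap_mb, "container_limit_mb": safe_limit_mb}

# The example from the text: heap stable at 512MB, limit set to 4GB.
print(recommend_jvm_sizing(512, 4096))
# -> {'xmx_mb': 512, 'container_limit_mb': 1024}
```

The point is the coupling: the container limit and the -Xmx flag move together, so neither recommendation can break the other.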

When we optimize at the workload level, we trigger a positive ripple effect throughout the entire cluster.

If we accurately tune requests and limits based on this analysis, we give the HPA cleaner signals: when the requests are wrong, the HPA scales too early (waste) or too late (downtime).
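The HPA's own scaling rule makes this concrete: utilization is measured relative to the pod's request, so an inflated request makes a busy workload look idle. The replica counts below are for an assumed 70% target:

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_utilization_pct: float,
                         target_utilization_pct: float) -> int:
    # The HPA's core rule:
    # desired = ceil(current * currentUtilization / targetUtilization)
    return math.ceil(current_replicas *
                     current_utilization_pct / target_utilization_pct)

# Same real CPU usage, 70% target. With an accurate request the measured
# utilization is 80%; with a 2x-inflated request it reads as 40%:
print(hpa_desired_replicas(4, 80, 70))  # accurate request     -> 5
print(hpa_desired_replicas(4, 40, 70))  # inflated request     -> 3
```

Identical traffic, opposite scaling decisions: one configuration adds a replica while the other sheds one, purely because the denominator (the request) was wrong.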

Furthermore, accurate workload sizing directly impacts the density of our clusters. When pods are bloated with unnecessary resource requests, the scheduler cannot fit them efficiently, forcing us to spin up more nodes than we actually need. Optimized pods mean higher density, fewer nodes, and significantly lower cloud bills. 
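A back-of-the-envelope packing calculation shows the effect. This sketch assumes identical pods on identical nodes and packs by memory alone (real scheduling also weighs CPU, affinity, and spread constraints); the node and pod sizes are made up:

```python
import math

def nodes_needed(pod_request_mb: int, replicas: int,
                 node_allocatable_mb: int) -> int:
    """How many nodes a set of identical pods needs, packing by memory only."""
    pods_per_node = node_allocatable_mb // pod_request_mb
    return math.ceil(replicas / pods_per_node)

# 30 replicas on nodes with 16GB allocatable memory:
print(nodes_needed(4096, 30, 16384))  # bloated 4GB requests -> 8 nodes
print(nodes_needed(1024, 30, 16384))  # right-sized 1GB      -> 2 nodes
```

In this toy fleet, right-sizing a single workload cuts its footprint from eight nodes to two, which is the "lower cloud bill" part of the ripple effect.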

Forecasting and Capacity Planning

Once we have this deep, historical data on workload behavior, we can zoom out from the single microservice to the entire infrastructure.

By analyzing usage trends over longer periods, months or a full year, we can move from reactive tuning to proactive forecasting: identifying seasonal patterns and predicting when specific clusters will face pressure, allowing us to make smarter decisions about the underlying hardware.

For example, by looking at the aggregate "allocatable vs. allocated" capacity ratio over time, we might realize that our current node instance types are inefficient. We might be paying for high-memory instances when our aggregate workload is actually CPU-bound.
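A trivial sketch of that aggregate check, with a made-up 20-percentage-point skew threshold deciding when the fleet counts as lopsided:

```python
def fleet_shape(cpu_allocated: float, cpu_allocatable: float,
                mem_allocated: float, mem_allocatable: float,
                skew: float = 0.2) -> str:
    """Compare aggregate allocated-vs-allocatable ratios to see which
    resource the fleet is actually starved for."""
    cpu_ratio = cpu_allocated / cpu_allocatable
    mem_ratio = mem_allocated / mem_allocatable
    if cpu_ratio > mem_ratio + skew:
        return "cpu-bound: consider compute-optimized instances"
    if mem_ratio > cpu_ratio + skew:
        return "memory-bound: consider memory-optimized instances"
    return "balanced"

# Paying for high-memory nodes while the aggregate workload is CPU-bound:
print(fleet_shape(cpu_allocated=900, cpu_allocatable=1000,
                  mem_allocated=400, mem_allocatable=1000))
```

Run over months of data instead of a single snapshot, the same comparison becomes the input to instance-type decisions rather than a one-off observation.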

We stop guessing what hardware we need and start letting the data dictate the infrastructure supply.

Why Efficiency As A Platform Capability?

ThoughtWorks has introduced the concept of "Fitness Functions" as automated checks that ensure the system satisfies architectural goals (like performance, security, or stability) as it evolves.

I believe we can all agree on that. The big question is how to get there.

To build a platform that ensures performance, security, and stability, we need to build a platform that improves itself. Performance and reliability need to become capabilities provided by the platform: autonomous, continuous, and invisible to the human, yet always at their service.

By implementing these two entry points, automated tuning gates in the pipeline and intelligent, runtime-aware feedback loops in production, we remove the cognitive load from our developers and resolve the delicate balance between cost, performance, and reliability. We don't ask them to be experts in Kubernetes resource management or capacity planning. We simply provide a "paved path that handles the heavy lifting", as Dianne Marsh (Director of Engineering @ Netflix) defines platforms.

If we accomplish this, the tension between velocity and excellence disappears. Speed becomes safe, and efficiency becomes automatic. That is the true promise of a modern platform.