Kubernetes 1.35: In-Place Pod Resizing is Finally GA. Is it the End of Pod Restarts?
Published by Medium on AWS In Plain English

The ability to change the size of a running Pod feels like a feature that should have existed from day one. However, in the world of Kubernetes, resources like CPU and memory were considered immutable for a very long time. If you got your capacity planning wrong, your only option was to destroy the workload and start over. With the release of Kubernetes 1.35, the platform finally embraces the idea that applications are dynamic and that our infrastructure should be able to adapt to them without forcing a reboot.
A Feature Years in the Making
This capability did not appear overnight. The community began working on In-Place Pod Resizing way back in version 1.27. It started as an Alpha feature, which means it was experimental and hidden behind special feature gates because it was risky. Changing the resource limits of a running process on Linux is technically complex. If the Kubelet tries to shrink memory too aggressively, the application might crash immediately. Over the last few years, the Kubernetes team has slowly refined how the Kubelet talks to the container runtime to ensure safety. Now, in version 1.35, the feature has finally reached General Availability (GA). This means it is considered stable, safe, and ready for critical production environments.
How It Works Under the Hood
To understand the magic, we need to look at what happens when you actually request a change. In the past, resource fields were effectively immutable: editing them on a Deployment meant the controller tore the old Pod down and rolled out a new one. In Kubernetes 1.35, the process is much more deliberate. You do not touch the main Pod spec directly; instead, you send a request to a special “subresource” called resize. This tells the Kubernetes API server that your intention is strictly to adjust resources, not to redeploy the application.
Once the API server accepts your request, the Kubelet (the agent running on the specific node where your Pod lives) takes over. The Kubelet acts like a strict traffic controller. First, it looks at the available capacity on the machine and calculates whether there is enough free CPU or memory to satisfy your new request. If the node is full, the Kubelet will not kill other Pods to make room. Instead, it places your request in a “Pending” state. Your Pod continues running with its old limits, patiently waiting in line until enough resources on that specific node become free.
If the node does have space, the Kubelet proceeds to the next step. It talks to the Container Runtime (like containerd or CRI-O) over the Container Runtime Interface and instructs it to update the Linux Control Groups (cgroups) for that specific process. Think of cgroups as invisible walls that define how much of the computer a program can use. The runtime simply moves these walls further out, instantly giving the application access to more CPU cycles or memory pages. If you have configured a resizePolicy that requires a restart, the Kubelet will restart the container; otherwise, the change happens silently while the code keeps running.
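Here is what such a policy looks like in a Pod spec. This is a sketch; the image and names are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resizable-app
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    resources:
      requests: { cpu: "1", memory: 512Mi }
      limits:   { cpu: "1", memory: 512Mi }
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired       # resize CPU in place, no restart
    - resourceName: memory
      restartPolicy: RestartContainer  # memory changes restart this container
```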

The Java Paradox: CPU vs. Memory
Considering the Pod size alone is not enough; we have to look at how applications actually behave, and there is no better example than the Java Virtual Machine (JVM). Java applications have a unique personality. When they first start up, they are incredibly hungry for CPU power because of the JIT compiler. Once they are up and running, they often settle down and use far less.

In the past, this created a difficult choice. You either gave the Java app a huge amount of CPU just for the first five minutes (wasting money for the rest of the day), or you gave it a small amount of CPU and watched it take twenty minutes to start up. With In-Place Resizing, you get the best of both worlds. You can launch the Pod with high CPU limits to ensure a fast startup. Once the application is ready, an autoscaler (like the VPA) can quietly reduce the CPU limit down to a maintenance level. The application stays online, the customers stay happy, and your cloud bill goes down.
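The step down is the same resize call in reverse. A sketch, again with placeholder names (java-app, jvm):

```shell
# Startup finished: step the container down from its launch
# allocation to a maintenance level, without a restart.
kubectl patch pod java-app --subresource resize --patch \
  '{"spec":{"containers":[{"name":"jvm","resources":{"requests":{"cpu":"500m"},"limits":{"cpu":"500m"}}}]}}'
```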
However, memory is a completely different story, and this is where you must be careful. While Kubernetes can easily give a container more memory, the application inside might not notice. For example, when a Java app starts, it looks at the available memory and decides how big its Heap should be.
If you launch the JVM using default settings (without setting *-Xmx* or *-XX:MaxRAMPercentage*), the default Heap size, based on JVM heuristics, will be 25% of the available memory (the Pod's memory limit).
It usually makes this decision only once. If you double the container’s memory an hour later, the JVM will still stick to its original, smaller Heap size. It does not automatically expand just because the room got bigger. In these cases, you still need to restart the container so the application can recalculate its settings.
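One way to make that restart productive is to tie the heap to the container's limit instead of a fixed value, so the recalculation actually picks up the new size. A sketch, assuming a standard Temurin image:

```yaml
# Size the heap as a percentage of the container limit rather than
# a fixed -Xmx, so a restart after a memory resize uses the new limit.
spec:
  containers:
  - name: jvm
    image: eclipse-temurin:21
    command: ["java", "-XX:MaxRAMPercentage=75.0", "-jar", "/opt/app.jar"]
    resizePolicy:
    - resourceName: memory
      restartPolicy: RestartContainer  # force the heap recalculation
```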
Java and other runtimes do not currently support on-the-fly memory resizing. There is an ongoing conversation with the Java community to make this possible.
While expanding memory requires care, shrinking it is even more complex. You cannot simply take away RAM that an application is currently using to store data. When you ask Kubernetes to lower a memory limit, the Kubelet performs a best-effort safety check: it looks at how much memory the container is actually using at that exact moment. If your app is using 4GB and you try to lower the limit to 2GB, the system will not kill the container. Instead, the resize operation will hang. The status will remain stuck in “InProgress” indefinitely, waiting for the application to voluntarily free up that space. Unlike CPU, which can be throttled instantly, memory cannot be reclaimed until the application decides to let it go.
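You can watch this play out on the Pod's conditions. A quick way, assuming the placeholder Pod name from earlier (condition names per recent releases):

```shell
# List condition types and reasons; a stalled downsize keeps
# the in-progress condition until the app releases memory.
kubectl get pod web-app \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.reason}{"\n"}{end}'
```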
This is why Kubernetes 1.35 allows you to set different policies for different resources, letting you resize CPU instantly while requiring a restart for memory changes.
Real-World Limitations
While this feature is a game-changer, it cannot break the laws of physics. The most obvious limitation is the physical capacity of the node. If your server is full and you ask to double the size of a Pod, the resize operation will simply stop and wait. Kubernetes cannot create hardware out of thin air, and it will not automatically move the Pod to a different node to make room for the new size. You have to wait for other workloads on that specific machine to finish and free up space.
There is also a strict rule regarding the Quality of Service (QoS) class that you must respect. Kubernetes classifies every Pod as Guaranteed, Burstable, or BestEffort based on its configuration, and this classification is permanent for the life of the Pod. For example, if you deploy a critical workload as ‘Guaranteed’ (where the resource request exactly equals the limit), you cannot change just one of those values later. If you try to increase the limit without also increasing the request, you would technically be changing the Pod’s class to ‘Burstable,’ which is forbidden. The API server will reject your command immediately to prevent you from accidentally breaking your own production guarantees.
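For instance, a patch like the following against a Guaranteed Pod should be refused, because the new limit would no longer equal the request (names are placeholders):

```shell
# Guaranteed means requests == limits. Raising only the limit would
# flip the Pod to Burstable, so the API server rejects the resize.
kubectl patch pod guaranteed-app --subresource resize --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"limits":{"cpu":"2"}}}]}}'
```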
There are also operating system limitations to consider. Currently, this smooth resizing capability relies heavily on how Linux manages processes, specifically through cgroups v2. This means that if you are running Windows containers in your cluster, you are left out of the party for now. Windows nodes still require the traditional recreation of Pods to apply new resource specs.
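If you are unsure which cgroup version your Linux nodes run, a quick check on the node itself tells you:

```shell
# Prints "cgroup2fs" on cgroups v2; "tmpfs" indicates legacy v1.
stat -fc %T /sys/fs/cgroup
```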
The Future of Autoscaling
Ultimately, this update paves the way for a much smarter generation of autoscalers. We are moving away from the era where the Vertical Pod Autoscaler was difficult to use because it caused too many disruptions. We are entering a time where our clusters can breathe. They can expand during traffic spikes and shrink during quiet hours, all while the application logic continues to run uninterrupted. It has been a long road from Alpha to GA, but Kubernetes 1.35 has finally made the dream of elastic, restart-free scaling (almost) a reality.
