The State of Java on Kubernetes 2026: Why Defaults are Killing Your Performance

Published by Akamas

Java is the backbone of enterprise software, and Kubernetes is the standard for deployment. You would assume that by 2026, these two technologies would work perfectly together out of the box.

The data confirms Java’s dominance: according to the Azul State of Java Survey 2025, 68% of enterprises report that the majority of their applications run on the JVM. Furthermore, New Relic’s 2024 State of the Java Ecosystem found that adoption of container-friendly Java versions like Java 17 and 21 has surged by nearly 300%, proving that Java is being aggressively modernized for the cloud.

Unfortunately, while adoption is high, performance and efficiency are low.

In our recent webinar, Stefano Doni (CTO at Akamas) and Bruno Borges (Principal Program Manager at Microsoft and Java Champion) analyzed the state of the industry. After reviewing data from thousands of JVMs running in production environments globally, the verdict was clear: the majority of Java workloads on Kubernetes are running with “default” settings that are actively hurting performance and wasting money.

Here is a look at what the data says, why “safe” defaults are dangerous in the cloud, and how you can fix it.

How Are We Configuring JVMs?

The premise of modern DevOps is automation and efficiency. However, the reality is that most organizations are “lifting and shifting” their Java applications into containers without adjusting the Java Virtual Machine (JVM) configuration.

This isn’t a new problem, but it is getting worse. During the session, Bruno Borges referenced historical data from New Relic, highlighting that for years, over 65% of workloads lacked explicit Garbage Collection (GC) tuning.

But that was about 5 years ago. How is the situation now? Stefano Doni then presented Akamas’s 2025 research, analyzing thousands of Java workloads being optimized in our platform. The data confirms that despite the maturity of Kubernetes and Java, these bad habits persist today.

Figure: Bar chart showing distribution of explicitly configured garbage collectors in Java workloads on Kubernetes

According to our data:

  • 60% of JVMs have an unset Garbage Collector. This means the JVM picks a default algorithm that might not be optimized for your specific container size or latency requirements.
  • The majority of Heap configurations are unset. Most developers rely on the JVM’s automatic memory management.
  • Most containers are too small. A significant number of JVMs run with less than 1 CPU or less than 1 GiB of RAM, which is a major bottleneck for Java’s multi-threaded architecture.

Figure: Bar chart showing distribution of CPU core allocations in Java Kubernetes pods

Figure: Bar chart showing JVM max heap size distribution in Kubernetes environments

Why is this happening? As Bruno Borges highlighted, many developers assume “the JVM will tune itself”. While the JVM ergonomics are smart, they were originally designed for large, shared physical servers, not for the constrained resources of isolated Kubernetes pods.

As the Oracle Java documentation puts it: “Ergonomics is the process by which the Java Virtual Machine (JVM) and garbage collection tuning, such as behavior-based tuning, improve application performance. The JVM provides platform-dependent default selections for the garbage collector, heap size, and runtime compiler. These selections match the needs of different types of applications while requiring less command-line tuning. In addition, behavior-based tuning dynamically tunes the sizes of the heap to meet a specified behavior of the application.”

The reality is that default JVM heuristics are far from optimal and can have an invisible but significant impact on your workloads.

Wasting Cloud Resources

The most common mistake is leaving the Heap Size unset (no -Xmx or -XX:MaxRAMPercentage).

Modern JVMs (Java 10+) are “container-aware”, meaning they attempt to self-configure based on the resources assigned to the container. However, this default behavior is incredibly conservative. For typical production containers (specifically those allocated above 512MB of memory), the JVM defaults to utilizing only 25% of the container memory limit for the Heap.

Note: For very small containers (<256MB), the JVM acts differently, taking up to 50%, but most enterprise workloads fall into the larger category where the 25% rule applies.

This default creates a paradoxical situation regarding resource efficiency. Imagine you configure a pod with 2GB of memory requests and limits. You are reserving 2GB of cluster capacity for that container. However, because you relied on defaults, the JVM sees that limit but allocates only 512MB to your application’s heap. The remaining 1.5GB is technically available for non-heap memory, but a typical microservice rarely requires 1.5GB of overhead. You are effectively leaving almost 75% of your available memory “on the table”, paying for capacity your application is forbidden from touching.

Beyond the waste, this configuration directly undermines performance and stability. Because the heap is artificially restricted to a small fraction of the available memory, your application fills it up much faster than necessary. This triggers frequent and aggressive Garbage Collection cycles to reclaim space, leading to CPU spikes and potential latency issues, even though the container itself has plenty of free memory available.
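
To see what these defaults resolve to in your own environment, you can run a quick check inside a pod with the same image and resource limits as production. Below is a minimal sketch (the class name is ours, not a standard tool); with a 2GB limit and no flags, you should see a max heap of roughly 512 MiB. Overriding the default with a flag such as -XX:MaxRAMPercentage=75.0 is a common starting point, though the right value depends on your non-heap footprint.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: print what JVM ergonomics actually chose inside this container.
// Run it with the same base image, flags, and resource limits as your production pod.
public class ErgonomicsCheck {
    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();        // effective max heap (-Xmx)
        int cpus = Runtime.getRuntime().availableProcessors();  // CPUs visible to the JVM
        System.out.printf("Effective max heap: %d MiB%n", maxHeap / (1024 * 1024));
        System.out.printf("Available processors: %d%n", cpus);
        // The collectors in use reveal whether ergonomics picked G1, Serial, etc.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Garbage collector: " + gc.getName());
        }
    }
}
```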

What Happens When You Leave Your GC Unset?

The selection of a Garbage Collector (GC) is the single most impactful factor on application latency and throughput. The research data indicates that nearly 60% of Java workloads leave the Garbage Collector type unset. This forces the JVM to rely on its ergonomics to choose an algorithm for you.

By default, modern Java versions prefer the G1GC collector, designed for high throughput and low latency. However, the JVM only enables G1GC if it detects enough resources (at least 2 CPUs and 1791MB of RAM).

Figure: JVM ergonomics in Kubernetes: automatic garbage collector selection based on memory and CPU thresholds

If you deploy a microservice with a standard “cost-saving” configuration of 1 CPU or less, the JVM silently downgrades the Garbage Collector to SerialGC. Unlike G1GC, which runs concurrently, SerialGC performs memory cleanup using a single thread, freezing your application entirely for the duration of the collection.
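
To verify whether this downgrade happened in one of your pods, one option is to query HotSpot’s GC selection flags through its diagnostic MXBean, as in the sketch below (the class name is hypothetical, and the API is HotSpot-specific). Explicitly setting a collector, for example with -XX:+UseG1GC, opts you out of the silent choice altogether.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Sketch: ask HotSpot which GC selection flags ergonomics turned on.
// Requires a HotSpot-based JVM; the MXBean lives in the jdk.management module.
public class GcSelectionCheck {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // UseZGC exists on JDK 11+; adjust the list for older JDKs.
        for (String flag : new String[] {"UseSerialGC", "UseParallelGC", "UseG1GC", "UseZGC"}) {
            System.out.printf("%s = %s%n", flag, hotspot.getVMOption(flag).getValue());
        }
    }
}
```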

When a container is just large enough to select G1GC (perhaps with exactly two CPUs), you hit a frustrating technical paradox. Because G1GC is designed for parallelism, it spins up a suite of worker threads to clean memory; in such a constrained environment, however, these threads end up crowding the very few cores available. Instead of streamlining the process, the CPU wastes valuable cycles on constant context switching between your application’s logic and the swarm of background GC threads.

This convenience of “automatic” selection is actually a double-edged sword: it often chooses a GC that isn’t optimal for your specific performance or footprint goals, and it does so silently. Because the JVM shifts its entire memory management strategy based on an apparently unrelated setting like the CPU limit, it can introduce performance penalties that are incredibly difficult to diagnose. Relying on these defaults means ignoring the delicate interplay between threads and cores, leaving you at risk of either stalling a larger pod behind a single-threaded collector or choking a small pod with multi-threaded GC overhead.

The High Cost of “Micro-Containers”

One of the most pervasive trends is the deployment of JVMs with less than 1 CPU. While this might look efficient on a spreadsheet, and is often chosen to reduce the “blast radius” by spreading instances, it creates a fundamental architectural conflict for Java. This strategy forces a difficult trade-off between resource efficiency (cost) and the actual stability of the application.

It is crucial to understand that a Kubernetes CPU limit is not a “speed limit” that slows down the processor; it is a “time quota” enforced by the Linux kernel’s Completely Fair Scheduler (CFS). Typically, the kernel divides time into 100ms windows. If you assign a container a limit of 500m (0.5 CPU), you are granting it exactly 50ms of runtime every 100ms. Meanwhile, your JVM’s threads can still run in parallel on all of the CPUs of your Kubernetes node, so that quota can be exhausted in a fraction of the window. Once the quota is exhausted, the kernel drastically throttles the container: it simply freezes all threads until the next window begins.
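
You can inspect the exact quota the kernel enforces from inside the container. The sketch below assumes cgroup v2 (the default on recent Kubernetes nodes); on cgroup v1 the equivalent values live in /sys/fs/cgroup/cpu/cpu.cfs_quota_us and cpu.cfs_period_us.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch (cgroup v2): read the CFS quota and period enforced on this container.
public class CfsQuotaCheck {
    public static void main(String[] args) throws Exception {
        // cgroup v2 format: "<quota_us> <period_us>", or "max <period_us>" when unlimited
        String[] parts = Files.readString(Path.of("/sys/fs/cgroup/cpu.max")).trim().split("\\s+");
        if ("max".equals(parts[0])) {
            System.out.println("No CPU limit: the container can use the whole node.");
            return;
        }
        long quotaUs = Long.parseLong(parts[0]);
        long periodUs = Long.parseLong(parts[1]);
        // e.g. a 500m limit typically shows up as 50000us of runtime per 100000us window
        System.out.printf("Quota: %dus every %dus (~%.2f CPUs)%n",
                quotaUs, periodUs, (double) quotaUs / periodUs);
    }
}
```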

This mechanism is disastrous for Java because of its multi-threaded nature: the JVM is not just running your code, it is also constantly running background threads for the Just-In-Time (JIT) compiler and the Garbage Collector.

When your application starts or encounters a new “hot” code path, the JIT compiler kicks in to transform bytecode into optimized machine code. This process is inherently CPU-intensive. In a micro-container, a JIT burst can easily consume the entire 50ms quota in the blink of an eye. The result is that the Linux kernel forcibly pauses your application for the remaining 50ms of the window. To your users, this looks like application latency, but it is actually an OS-enforced sleep where no code is executing.

Compounding this issue is the sheer density of threads within the JVM. Even a simple application spawns dozens of threads: Garbage Collection threads, compiler threads, and your application threads. When you constrain a Java pod to less than 1 CPU, all of these threads are forced to fight for those microscopic time slices. The background maintenance tasks, which are supposed to be invisible helpers, become active obstructors of your business logic.
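
To get a feel for this thread density, the short sketch below lists the Java-visible threads of an otherwise idle JVM. GC worker threads are native and do not appear in this list, so the real number of threads competing for the quota is even higher.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch: dump the names of all Java-visible threads in an otherwise idle JVM.
public class ThreadDensity {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // GC worker threads are native and are not included here,
        // so the real number contending for the CFS quota is higher still.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            System.out.println(info.getThreadName());
        }
        System.out.println("Java-visible live threads: " + threads.getThreadCount());
    }
}
```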

This contention damages performance regardless of which Garbage Collector you use. Modern concurrent collectors (like G1GC, ZGC and Shenandoah) attempt to clean memory in the background, spawning multiple worker threads (controlled by the ParallelGCThreads and ConcGCThreads settings) to mark and evacuate objects; under tight CPU limits, the kernel throttles those threads, slowing down user requests so the GC can proceed. Furthermore, when the inevitable “Stop-The-World” pause occurs, a phase required by all collectors, the lack of available CPU cycles means the cleanup work takes significantly longer to complete.

This explains the findings from the benchmarks discussed by Bruno Borges. When comparing six small replicas (1 CPU each) against two larger replicas (3 CPUs each), the results revealed a clear performance gap. Even with the exact same total CPU capacity, the two larger replicas delivered significantly better throughput and lower tail latency. Giving the JVM “breathing room” with larger CPU limits reduces the statistical probability of hitting these CPU throttling walls and ensures that background threads can execute on separate cores without freezing the request-processing threads.

Figure: Latency and throughput benchmark comparing different Kubernetes container configurations for Java workloads

The Ecosystem Is Reacting

The industry is aware of these challenges, and during the webinar, Bruno Borges shared several exciting initiatives aimed at mitigating them.

Tools like the Azure Command Launcher for Java (“jaz”) are being developed to automate the configuration of JVM flags based on the environment, effectively “tuning” the startup command for you. Furthermore, projects within the OpenJDK ecosystem, such as Project Leyden and CRaC (Coordinated Restore at Checkpoint), are making strides to drastically reduce startup times and memory footprint through static images and snapshotting.

These advancements will help developers start on the right foot. However, they do not eliminate the fundamental need to manage the complex performance and efficiency trade-offs when running Java applications in Kubernetes. Even with a perfectly tuned startup command or a faster boot time, once your application is running under load, the tension between memory, CPU, and latency remains.

Stop Guessing, Start Measuring

The days of manual tuning are over. Even well-intentioned attempts to “fix” these issues manually often backfire.

Real efficiency requires finding a delicate balance between Throughput, Latency, and Footprint. Improving one often comes at the expense of the others.

  • Maximize Throughput? You might need a larger heap, increasing your Footprint.
  • Minimize Footprint? You risk squeezing the JVM, destroying your Latency.

To solve this puzzle, you cannot look at the JVM in isolation. You must align the JVM configuration with the Kubernetes container resources and HPA scaling policies simultaneously. It is a trade-off between wasting money on over-provisioning and risking reliability issues with under-provisioning. And it has to be done for all your Java workloads, in a continuous fashion, as traffic changes and new releases are deployed to production.

You don’t need to solve this multi-variable equation manually. You can use Akamas Insights to run a comprehensive health check of your Java applications on your Kubernetes clusters. By connecting to your existing observability tool (like Prometheus, Datadog or Dynatrace), Akamas Insights analyzes the full picture (JVM metrics, container limits, and actual usage) to tell you exactly where you are sacrificing reliability for cost, or vice versa.

Stop flying blind: Get your free Application Health Score with Akamas Insights now