
#1 - From WASM to GenAI: Platforms, DevOps, and Observability at scale


Hey there, welcome to the first issue of #CloudNativeFriday newsletter! ✌🏻

In this edition, we’re diving into the evolving world of Platform Engineering and the innovative technologies reshaping how we build and manage applications. From the game-changing potential of WebAssembly (Wasm) in enhancing portability and performance, to how Kubernetes v1.32 is stepping up to improve resource allocation, there’s plenty to explore. We’ll also take a closer look at the latest trends in AI, Observability, and software architecture, offering valuable insights on how experimentation and probabilistic strategies are helping teams navigate complexity at scale. Plus, don’t miss the chance to contribute to the CNCF TAG App Delivery’s Platform as a Product survey!

Read on for these stories and more! 👇🏻


Platform Engineering

Published on Cloud Native Computing Foundation (CNCF) blog by Liam Randall

Platform engineering often faces challenges related to the complexity and cost of managing applications. Containers, though effective in microservice management, struggle with portability and resource overhead. WebAssembly offers a lightweight, portable alternative that improves efficiency, startup times, and security. With platforms like wasmCloud, Wasm components can be deployed more securely and flexibly, supporting a variety of programming languages. This makes Wasm a strong candidate for hybrid and edge computing environments, helping streamline application maintenance across diverse, distributed systems.

For a deeper dive into the challenges and benefits of WebAssembly in platform engineering, read the full article here.

Platform as a Product Research - Now with a Survey!

Published on CNCF TAG App Delivery blog by Dominik Kress

The CNCF Platforms Working Group is investigating how organizations build and manage internal platforms, focusing on the Platform as a Product approach. This means treating platform users like customers and ensuring their needs are met. The research team has expanded the initiative with a new survey to gather more quantitative insights alongside the qualitative data from interviews. By participating in the survey, platform teams can help shed light on industry practices and decision-making processes. Everyone involved in building platforms is encouraged to contribute. Join the CNCF Slack community and attend bi-weekly meetings to stay updated and share your experiences.

To participate in the survey and for more details, check out the full article here.


DevOps

The Node Resource Interface says “hi” to WebAssembly

Published on Cloud Native Computing Foundation (CNCF) blog by Sascha Grunert

The integration of WebAssembly into the Node Resource Interface (NRI) introduces new flexibility in managing containers. By enabling plugins written in Wasm, the NRI offers a memory-efficient, portable, and scalable way to modify container behavior. This is particularly advantageous for edge environments, where efficient resource use is essential. With Wasm, containers can be dynamically adjusted with minimal overhead while maintaining the portability that Wasm is known for.
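To make the plugin model concrete, here is a toy sketch of the idea behind NRI adjustments: a plugin inspects a container spec and returns a change for the runtime to merge in. Plain dicts stand in for the real NRI types, and the policy shown is invented for illustration; this is not the actual NRI API.

```python
# Toy sketch of the NRI plugin idea: a plugin receives a container spec and
# returns an adjustment; the runtime merges it in. Dicts stand in for the
# real NRI protobuf types; the memory-cap policy is purely illustrative.

def memory_limit_plugin(container):
    """Cap any container that requests no memory limit (illustrative policy)."""
    if container.get("memory_limit") is None:
        return {"memory_limit": 256 * 1024 * 1024}  # adjustment to merge
    return {}  # no change requested

def apply_adjustment(container, adjustment):
    merged = dict(container)
    merged.update(adjustment)
    return merged

spec = {"name": "edge-worker", "memory_limit": None}
adjusted = apply_adjustment(spec, memory_limit_plugin(spec))
print(adjusted["memory_limit"])  # 268435456
```

The appeal of Wasm here is that a plugin like this could be shipped as a small, sandboxed component rather than a long-running host process.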

For a deep dive on how WebAssembly is transforming the NRI and what it means for containerized environments, check out the full article.

Kubernetes v1.32: Penelope

Published on Kubernetes blog by Matteo Bianchi, Edith Puclla, William Rizzo and other folks from the release team

Kubernetes v1.32, Penelope, marks the project’s 10th anniversary, introducing significant improvements such as enhanced CPU management policies, Dynamic Resource Allocation (DRA) updates, and the promotion of key features to stable. Notably, the release introduces a structured authorization configuration and Volume Group Snapshots in beta. The update focuses on improving system resilience, especially for resource-intensive workloads, providing users with a more stable, efficient, and user-friendly experience.

Dive into the article for a complete breakdown of the latest features and improvements.


AI

What is Inference Parallelism and how it works

Published on Cloud Native Computing Foundation (CNCF) blog by Aman Juneja

Inference parallelism addresses the challenge of fitting large AI models into GPU memory by distributing computation across multiple GPUs. Key strategies include data parallelism (where copies of the model run on different GPUs), tensor parallelism (splitting model layers across GPUs), pipeline parallelism (sequential layer processing across devices), and expert parallelism (activating specific sub-networks based on input). Choosing the right method depends on the model architecture and hardware setup, optimizing for either latency or throughput. Tools like TensorRT-LLM and vLLM help implement these techniques for improved scalability.
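As a toy illustration of the tensor-parallel strategy above (a sketch of the idea only, not how TensorRT-LLM or vLLM implement it): a linear layer's weight matrix is split column-wise across devices, each device computes a partial output, and the partials are gathered back together. Plain Python lists stand in for GPU tensors.

```python
# Toy sketch of tensor parallelism: shard a linear layer's weight matrix
# column-wise across "devices" (plain lists stand in for GPUs), compute
# partial outputs independently, then concatenate (the "all-gather").

def matmul(x, w):
    """Multiply a 1-D input vector by a 2-D weight matrix (rows x cols)."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]

def split_columns(w, num_devices):
    """Shard a weight matrix column-wise, one shard per device."""
    per = len(w[0]) // num_devices
    return [[row[d * per:(d + 1) * per] for row in w] for d in range(num_devices)]

def tensor_parallel_forward(x, w, num_devices=2):
    shards = split_columns(w, num_devices)
    partials = [matmul(x, shard) for shard in shards]  # one per "device"
    return [v for p in partials for v in p]            # gather: concatenate

x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
print(tensor_parallel_forward(x, w) == matmul(x, w))  # sharded == unsharded
```

The point of the sketch: no single device ever holds the full weight matrix, which is exactly why this strategy lets models larger than one GPU's memory run at all.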

For a deeper dive into inference parallelism and how to choose the best method, read the full article.


Observability

OpenTelemetry for Generative AI

Published on OpenTelemetry blog by Drew Robbins and Liudmila Molkova

As generative AI technologies, such as Large Language Models (LLMs), become more widely adopted, ensuring performance, efficiency, and safety is crucial. OpenTelemetry is evolving to support these needs, specifically for AI workloads. Key initiatives include Semantic Conventions and Python Instrumentation Libraries, starting with OpenAI’s API. These efforts standardize telemetry data for better monitoring of model performance, including tracking metrics like latency, usage, and token consumption. As the adoption of generative AI grows, OpenTelemetry plans to expand support for additional environments, enhancing observability across AI applications.
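To give a feel for what "standardized telemetry data" means here, the sketch below shows the kind of `gen_ai.*` span attributes the (experimental) GenAI semantic conventions define for an LLM call. A plain dict stands in for a real span so no SDK is needed, and the exact attribute names may still evolve upstream.

```python
# Sketch of the span attributes the experimental OpenTelemetry GenAI
# semantic conventions standardize for an LLM call. A plain dict stands in
# for a real span; exact attribute names may evolve upstream.

def genai_span_attributes(model, input_tokens, output_tokens):
    return {
        "gen_ai.system": "openai",        # which provider served the call
        "gen_ai.operation.name": "chat",  # kind of GenAI operation
        "gen_ai.request.model": model,    # model the caller asked for
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }

attrs = genai_span_attributes("gpt-4o-mini", 42, 128)
total = attrs["gen_ai.usage.input_tokens"] + attrs["gen_ai.usage.output_tokens"]
print(total)  # 170
```

Because every instrumented client emits the same attribute names, dashboards for token consumption and per-model latency work across providers instead of being bespoke per integration.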

For a deeper understanding of OpenTelemetry’s role in AI observability, check out the full article here.

Announcing Prometheus 3.0

Published on the Prometheus blog by the Prometheus team

Prometheus 3.0 is the project’s first major release in seven years. It introduces a revamped user interface, improving interaction with features like a tree view. Remote Write 2.0 enhances metadata and exemplars, optimizing payloads. UTF-8 support allows more flexibility in metric names and labels, boosting OpenTelemetry compatibility. Prometheus also supports OTLP ingestion, improving integration. Additionally, native histograms, though experimental, offer a more efficient approach to metric collection, aiming to reduce costs and enhance performance in large-scale monitoring setups.
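A small sketch of why the UTF-8 change matters for OpenTelemetry compatibility: OTel metric names use dots, which the legacy Prometheus name charset rejected. The regex below is the historical legacy-name pattern; names that fail it are only valid under the new UTF-8 rules (and need quoting in PromQL, e.g. `{"http.server.request.duration"}`).

```python
import re

# Before Prometheus 3.0, metric names had to match this legacy pattern;
# 3.0's UTF-8 support lifts the restriction (such names must then be
# quoted in PromQL selectors).
LEGACY_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")

def needs_quoting(name):
    """True if a metric name is only valid under the new UTF-8 rules."""
    return LEGACY_NAME.match(name) is None

print(needs_quoting("http_requests_total"))           # False: legacy-safe
print(needs_quoting("http.server.request.duration"))  # True: dots need UTF-8
```

This is why pre-3.0 OTel ingestion had to translate dots into underscores, and why native UTF-8 names remove a whole class of renaming friction.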

For a comprehensive overview of Prometheus 3.0’s new features and breaking changes, read the full article here.


Software Architecture

Software Architecture and the Art of Experimentation

Published on InfoQ blog by Pierre Pureur and Kurt Bittner

In software architecture, being wrong is inevitable, but experimentation helps mitigate its cost. By running small experiments, we can test assumptions and identify potential flaws early on. The Minimum Viable Architecture (MVA) approach allows for architectural decisions that support an MVP, validating their feasibility through experimentation. Effective experiments are focused, time-bound, and address one hypothesis at a time, with measurable outcomes. These experiments help reveal risks early, minimize costly mistakes, and prepare systems for future changes, ensuring long-term supportability.
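The article's criteria for an effective experiment (one hypothesis, a time box, a measurable outcome) can be captured as a simple record; the field names and the checkout-latency scenario below are invented for illustration, not taken from the article.

```python
from dataclasses import dataclass

# Hedged sketch of the criteria for an effective architecture experiment:
# exactly one hypothesis, bounded effort, and a measurable pass/fail line.
# Field names and the example scenario are invented for illustration.

@dataclass
class ArchitectureExperiment:
    hypothesis: str      # exactly one assumption under test
    time_box_days: int   # bounded effort before deciding either way
    metric: str          # what gets measured
    threshold: float     # the measurable pass/fail line

    def evaluate(self, observed):
        """Did the observed measurement support the hypothesis?"""
        return observed <= self.threshold

exp = ArchitectureExperiment(
    hypothesis="the event bus sustains peak checkout load",
    time_box_days=5,
    metric="p99 latency (ms)",
    threshold=250.0,
)
print(exp.evaluate(180.0))  # True: hypothesis supported at this threshold
```

Writing the threshold down before running the experiment is what keeps it an experiment rather than a demo.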

To dive deeper into the role of experiments in software architecture and how to design them effectively, read the full article.

Beat the Plan: Probabilistic Strategies for Successful Software Delivery at Scale

Published on InfoQ blog by John Rauser

Large-scale software delivery is inherently uncertain, much like Edward Thorp’s probabilistic approach to beating blackjack. The article highlights that instead of relying on rigid, deterministic planning, successful software delivery depends on embracing uncertainty. Thorp’s ability to thrive by understanding probability mirrors the need for software teams to focus on adaptive systems that respond to change, rather than strictly following predictive models. By adopting a probabilistic mindset, teams can navigate volatility, make informed decisions, and manage risk more effectively.
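One common concrete form of this mindset is a Monte Carlo forecast: instead of one deterministic delivery date, sample many plausible outcomes and commit at a chosen confidence level. The sketch below uses invented task counts and duration ranges purely for illustration; it is not a method from the article itself.

```python
import random

# Toy Monte Carlo sketch of probabilistic delivery planning: sample many
# plausible project outcomes and report percentiles instead of a single
# deterministic estimate. Task counts and duration ranges are invented.

def simulate_delivery(num_tasks=20, trials=10_000, seed=7):
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # Each task has an optimistic..pessimistic range, skewed right
        # (min=1, max=10, mode=3 days) like most real work.
        totals.append(sum(rng.triangular(1, 10, 3) for _ in range(num_tasks)))
    totals.sort()
    return {
        "p50": totals[trials // 2],          # "even odds" date
        "p85": totals[int(trials * 0.85)],   # a commitment you can likely keep
    }

forecast = simulate_delivery()
print(f"50% chance within {forecast['p50']:.0f} days, 85% within {forecast['p85']:.0f}")
```

The gap between the p50 and p85 numbers is the volatility a deterministic plan hides, and it is exactly what a probabilistic plan makes visible.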

For a deeper exploration of these ideas, you can read the full article here.

