The New Stack Podcast

320 episodes

Fivetran's CPO: closed data stacks won't survive the agent era

May 13, 2026 · 22 min

The new FinOps problem isn't cloud bills

May 12, 2026 · 28 min

How Microsoft is governing thousands of Kubernetes clusters without manual intervention

May 7, 2026 · 25 min

Why Broadcom gave Velero to the CNCF Sandbox — and what it means for Kubernetes data protection

Apr 25, 2026 · 22 min

Why AI engineering needs old-school discipline

Apr 24, 2026 · 24 min

Jim Bugwadia on why finding a Kubernetes problem is only half the battle for Kyverno users

Apr 23, 2026 · 23 min

How AWS Bedrock is shaping Model Context Protocol

Apr 22, 2026 · 31 min

Why Microsoft is betting on temporary identities to stop autonomous agents from going rogue

Apr 21, 2026 · 24 min

As agentic AI explodes, Amazon doubles down on MCP

Apr 16, 2026 · 24 min

A year in, Google wants its Axion processors to feel like a scheduling decision

Apr 15, 2026 · 22 min

Can you make Kubernetes invisible? Here's why AWS is on a mission to do it.

Apr 14, 2026 · 23 min

The next stages of AI conformance in the cloud-native, open-source world

Apr 9, 2026 · 25 min

Microsoft wants to make service mesh invisible

Apr 8, 2026 · 21 min

Amazon EKS Auto Mode wants to end Kubernetes toil — one node at a time

Apr 7, 2026 · 22 min

Ep 1601 · Edge-forward: Akamai eyes sweet spot between centralized & decentralized AI inference

At KubeCon + CloudNativeCon Europe 2026, Lena Hall and Thorsten Hans of Akamai outlined how the company is evolving from a CDN provider into a developer-focused cloud platform for AI. Akamai’s strategy centers on low-latency, distributed computing, combining managed Kubernetes, serverless functions, and a distributed AI inference platform to support modern workloads. With a global footprint of core and “distributed reach” datacenters, Akamai aims to bring compute closer to users while still leveraging centralized infrastructure for heavier processing. This hybrid model enables faster feedback loops critical for applications like fraud detection, robotics, and conversational AI. To address concerns about complexity, Akamai emphasizes managed infrastructure and self-service tools that abstract away integration challenges. Its platform supports open source through managed Kubernetes and pre-packaged tools, simplifying deployment. Akamai also invests in serverless technologies like WebAssembly-based functions, enabling developers to build and deploy globally distributed applications quickly. Overall, the company prioritizes developer experience, allowing teams to focus on application logic rather than infrastructure management. Learn more from The New Stack about how Akamai is transforming into a developer-focused cloud platform for AI: Akamai Picks Up Hosting for Kernel.org; Should You Care About Fermyon Wasm Functions on Akamai? Join our community of newsletter subscribers to stay on top of the news and at the top of your game.

Apr 1, 2026 · 22 min

Ep 1598 · Kubernetes co-founder Brendan Burns: AI-generated code will become as invisible as assembly

In this episode of The New Stack Makers, Microsoft Corporate Vice President and Technical Fellow Brendan Burns discusses how AI is reshaping Kubernetes and modern infrastructure. Originally designed for stateless applications, Kubernetes is evolving to support AI workloads that require complex GPU scheduling, co-location, and failure sensitivity. Features like Dynamic Resource Allocation and projects such as KAITO introduce AI-specific capabilities, while maintaining Kubernetes’ core strength: vendor-neutral extensibility. Burns highlights that AI also changes how systems are monitored. Success is no longer binary; it depends on answer quality, user feedback, and large-scale testing using thousands of prompts and even AI evaluators. On software development, Burns argues that the industry’s focus on reviewing AI-generated code is temporary. Just as developers stopped inspecting compiler output, AI-generated code will become a disposable artifact validated by tests and specifications. This shift will redefine engineering roles and may lead to programming languages designed for machines rather than humans, signaling a fundamental transformation in how software is built and maintained. Learn more from The New Stack about how AI is reshaping Kubernetes and modern infrastructure: How To Use AI To Design Intelligent, Adaptable Infrastructure; The AI Infrastructure crisis: When ambition meets ancient systems.

Mar 24, 2026 · 43 min

Ep 1596 · AI can write your infrastructure code. There's a reason most teams won't let it.

In this episode of The New Stack Agents, Marcin Wyszynski, co-founder of Spacelift and OpenTofu, explains how AI is transforming infrastructure as code (IaC). Originally built for individual operators, tools like Terraform struggled to scale across teams, prompting Wyszynski to help launch OpenTofu after HashiCorp’s 2023 license change. Now, the bigger shift is AI: engineers no longer write configuration languages like HCL by hand, because AI tools generate the code for them, dramatically lowering the barrier to entry. However, this creates a dangerous gap between generating infrastructure and truly understanding it, much like using a phrasebook to ask questions in a foreign language but not understanding the response. In infrastructure, that lack of comprehension can lead to serious risks. To address this, Spacelift introduced Intent, which allows AI to directly interact with cloud systems in real time while enforcing deterministic guardrails through policy controls. The broader challenge remains balancing speed with control: enabling faster experimentation without sacrificing safety. Wyszynski argues that, like humans, AI can be trusted when constrained by strong guardrails. Learn more from The New Stack about how AI is transforming infrastructure as code (IaC): The Maturing State of Infrastructure as Code in 2025; Generative AI Tools for Infrastructure as Code.

Mar 20, 2026 · 29 min

Ep 1595 · OutSystems CEO on how enterprises can successfully adopt vibe coding

Woodson Martin, CEO of OutSystems, argues that successful enterprise AI deployments rarely rely on standalone agents. Instead, production systems combine AI agents with data, workflows, APIs, applications, and human oversight. While claims that “95% of agent pilots fail” are common, Martin suggests many of those pilots were simply low-commitment experiments made possible by the low cost of testing AI. Enterprises that succeed typically keep humans in the loop, at least initially, to review recommendations and maintain control over decisions. Current enterprise use cases for agents include document processing, decision support, and personalized outputs. When integrated into broader systems, these applications can deliver measurable productivity gains. For example, Travel Essence built an agentic system that reduced a two-hour customer planning process to three minutes, allowing staff to focus more on sales and helping drive 20% top-line growth. Martin also believes AI will pressure traditional SaaS seat-based pricing and accelerate custom software development. In this environment, governed platforms like OutSystems can help enterprises adopt “vibe coding” while maintaining compliance, security, and lifecycle management. Learn more from The New Stack about enterprise adoption of vibe coding: How To Use Vibe Coding Safely in the Enterprise; 5 Challenges With Vibe Coding for Enterprises; Vibe Coding: The Shadow IT Problem No One Saw Coming.

Mar 6, 2026 · 43 min

Ep 1594 · Inception Labs says its diffusion LLM is 10x faster than Claude, ChatGPT, Gemini

On a recent episode of The New Stack Agents, Inception Labs CEO Stefano Ermon introduced Mercury 2, a large language model built on diffusion rather than the standard autoregressive approach. Traditional LLMs generate text token by token from left to right, which Ermon describes as “fancy autocomplete.” In contrast, diffusion models begin with a rough draft and refine it in parallel, similar to image systems like Stable Diffusion. This parallel process allows Mercury 2 to produce over 1,000 tokens per second, five to ten times faster than optimized models from labs such as OpenAI, Anthropic, and Google, according to company tests. Ermon argues diffusion models better leverage GPUs, with support from investor Nvidia to optimize performance. While Mercury 2 matches mid-tier models like Claude Haiku and Google Flash rather than top systems such as Claude Opus or GPT-4, Ermon believes diffusion’s speed and economic advantages will become increasingly compelling as AI applications scale. Learn more from The New Stack about large language models built on diffusion: How Diffusion-Based LLM AI Speeds Up Reasoning; Get Ready for Faster Text Generation With Diffusion LLMs.

Mar 2, 2026 · 43 min

Ep 1593 · NanoClaw's answer to OpenClaw is minimal code, maximum isolation

On The New Stack Agents, Gavriel Cohen discusses why he built NanoClaw, a minimalist alternative to OpenClaw, after discovering security and architectural flaws in the rapidly growing agentic framework. Cohen, co-founder of AI marketing agency Qwibit, had been running agents across operations, sales, and research using Claude Code. When Clawdbot (later OpenClaw) launched, it initially seemed ideal. But Cohen grew concerned after noticing questionable dependencies (including his own outdated GitHub package), excessive WhatsApp data storage, a massive AI-generated codebase nearing 400,000 lines, and a lack of OS-level isolation between agents. In response, he created NanoClaw with radical minimalism: only a few hundred core lines, minimal dependencies, and containerized agents. Built around Claude Code “skills,” NanoClaw enables modular, build-time integrations while keeping the runtime small enough to audit easily. Cohen argues AI changes coding norms: favoring duplication over DRY, relaxing strict file limits, and treating code as disposable. His goal is simple, secure infrastructure that enterprises can fully understand and trust. Learn more from The New Stack about personal AI agents: Anthropic: You can still use your Claude accounts to run OpenClaw, NanoClaw and Co.; It took a researcher fewer than 2 hours to hijack OpenClaw; OpenClaw is being called a security “Dumpster fire,” but there is a way to stay safe.

Feb 20, 2026 · 51 min

Ep 1592 · The developer as conductor: Leading an orchestra of AI agents with the feature flag baton

A few weeks after Dynatrace acquired DevCycle, Michael Beemer and Andrew Norris discussed on The New Stack Makers podcast how feature flagging is becoming a critical safeguard in the AI era. By integrating DevCycle’s feature flagging into the Dynatrace observability platform, the combined solution delivers a “360-degree view” of software performance at the feature level. This closes a key visibility gap, enabling teams to see exactly how individual features affect systems in production. As “agentic development” accelerates, with AI agents rapidly generating code, feature flags act as a safety net. They allow teams to test, control, and roll back AI-generated changes in live environments, keeping a human in the loop before full releases. This reduces risk while speeding enterprise adoption of AI tools. The discussion also highlighted support for the Cloud Native Computing Foundation’s OpenFeature standard to avoid vendor lock-in. Ultimately, developers are evolving into “conductors,” orchestrating AI agents with feature flags as their baton. Learn more from The New Stack about AI enterprise development: Why You Can't Build AI Without Progressive Delivery; Beyond automation: Dynatrace unveils agentic AI that fixes problems on its own.

Feb 19, 2026 · 19 min

Ep 1591 · The reason AI agents shouldn’t touch your source code — and what they should do instead

Dynatrace is at a pivotal point, expanding beyond traditional observability into a platform designed for autonomous operations and security powered by agentic AI. In an interview on The New Stack Makers, recorded at the Dynatrace Perform conference, Chief Technology Strategist Alois Reitbauer discussed his vision for AI-managed production environments. The conversation followed Dynatrace’s acquisition of DevCycle, a feature-management platform. Reitbauer highlighted feature flags, long used in software development, as a critical safety mechanism in the age of agentic AI. Rather than allowing AI agents to rewrite and deploy code, Dynatrace envisions them operating within guardrails by adjusting configuration settings through feature flags. This approach limits risk while enabling faster, automated decision-making. Customers, Reitbauer noted, are increasingly comfortable with AI handling defined tasks under constraints, but not with agents making sweeping, unsupervised changes. By combining AI with controlled configuration tools, Dynatrace aims to create a safer path toward truly autonomous operations. Learn more from The New Stack about progressive delivery: Why You Can’t Build AI Without Progressive Delivery; Continuous Delivery: Gold Standard for Software Development.

Feb 13, 2026 · 22 min

Ep 1589 · You can’t fire a bot: The blunt truth about AI slop and your job

Matan-Paul Shetrit, Director of Product Management at Writer, argues that people must take responsibility for how they use AI. If someone produces poor-quality output, he says, the blame lies with the user, not the tool. He believes many misunderstand AI’s role, confusing its ability to accelerate work with an abdication of accountability. Speaking on The New Stack Agents podcast, Shetrit emphasized that “we’re all becoming editors,” meaning professionals increasingly review and refine AI-generated content rather than create everything from scratch. However, ultimate responsibility remains human. If an AI-generated presentation contains errors, the presenter, not the AI, is accountable. Shetrit also discussed the evolving AI landscape, contrasting massive general-purpose models from companies like OpenAI and Google with smaller, specialized models. At Writer, the focus is on enabling enterprise-scale AI adoption by reducing costs, improving accuracy, and increasing speed. He argues that bespoke, narrowly focused models tailored to specific use cases are essential for delivering reliable, cost-effective AI solutions at scale. Learn more from The New Stack about enterprise development: Why Pure AI Coding Won’t Work for Enterprise Software; How To Use Vibe Coding Safely in the Enterprise.

Feb 11, 2026 · 57 min

Ep 1590 · GitLab CEO on why AI isn't helping enterprise ship code faster

AI coding assistants are boosting developer productivity, but most enterprises aren’t shipping software any faster. GitLab CEO Bill Staples says the reason is simple: coding was never the main bottleneck. After speaking with more than 60 customers, Staples found that developers spend only 10–20% of their time writing code. The remaining 80–90% is consumed by reviews, CI/CD pipelines, security scans, compliance checks, and deployment, areas that remain largely unautomated. Faster code generation only worsens downstream queues. GitLab’s response is its newly generally available Duo Agent Platform, designed to automate the full software development lifecycle. The platform introduces “agent flows,” multi-step orchestrations that can take work from issue creation through merge requests, testing, and validation. Staples argues that context is the key differentiator. Unlike standalone coding tools that only see local code, GitLab’s all-in-one platform gives agents access to issues, epics, pipeline history, security data, and more through a unified knowledge graph. Staples believes this platform approach, rather than fragmented point solutions, is what will finally unlock enterprise software delivery at scale. Learn more from The New Stack about GitLab and AI: GitLab Launches Its AI Agent Platform in Public Beta; GitLab’s Field CTO Predicts: When DevSecOps Meets AI.

Feb 10, 2026 · 57 min

Ep 1588 · The enterprise is not ready for "the rise of the developer"

Sean O’Dell of Dynatrace argues that enterprises are unprepared for a major shift brought on by AI: the rise of the developer. Speaking at Dynatrace Perform in Las Vegas, O’Dell explains that AI-assisted and “vibe” coding are collapsing traditional boundaries in software development. Developers, once insulated from production by layers of operations and governance, are now regaining end-to-end ownership of the entire software lifecycle, from development and testing to deployment and security. This shift challenges long-standing enterprise structures built around separation of duties and risk mitigation. At the same time, the definition of “developer” is expanding. With AI lowering technical barriers, software creation is becoming more about creative intent than mastery of specialized tools, opening the door to nontraditional developers. Experimentation is also moving into production environments, a change that would have seemed reckless just 18 months ago. According to O’Dell, enterprises now understand AI well enough to experiment confidently, but many are not ready for the cultural, operational, and security implications of developers, broadly defined, taking full control again. Learn more from The New Stack about enterprise developers and AI: Retool’s New AI-Powered App Builder Lets Non-Developers Build Enterprise Apps; Solving 3 Enterprise AI Problems Developers Face; Enterprise Platform Teams Are Stuck in Day 2 Hell.

Feb 5, 2026 · 25 min

Ep 1587 · Meet Gravitino, a geo-distributed, federated metadata lake

In the era of agentic AI, attention has largely focused on data itself, while metadata has remained a neglected concern. Junping (JP) Du, founder and CEO of Datastrato, argues that this must change as AI fundamentally alters how data and metadata are consumed, governed, and understood. To address this gap, Datastrato created Apache Gravitino, an open source, high-performance, geo-distributed, federated metadata lake designed to act as a neutral control plane for metadata and governance across multi-modal, multi-engine AI workloads. Gravitino achieved major milestones in 2025, including graduation as an Apache Top Level Project, a stable 1.1.0 release, and membership in the new Agentic AI Foundation. Du describes Gravitino as a “catalog of catalogs” that unifies metadata across engines like Spark, Trino, Ray, and PyTorch, eliminating silos and inconsistencies. Built to support both structured and unstructured data, Gravitino enables secure, consistent, and AI-friendly data access across clouds and regions, helping enterprises manage governance, access control, and scalability in increasingly complex AI environments. Learn more from The New Stack about how data and metadata are consumed, governed, and understood: Is Agentic Metadata the Next Infrastructure Layer?; Why AI Loves Object Storage; The Real Bottleneck in Enterprise AI Isn’t the Model, It’s Context.

Jan 29, 2026 · 29 min

Ep 1586 · CTO Chris Aniszczyk on the CNCF push for AI interoperability

Chris Aniszczyk, co-founder and CTO of the Cloud Native Computing Foundation (CNCF), argues that AI agents resemble microservices at a surface level, though they differ in how they are scaled and managed. In an interview ahead of KubeCon + CloudNativeCon Europe, he emphasized that being “AI native” requires being cloud native by default. Cloud-native technologies such as containers, microservices, Kubernetes, gRPC, Prometheus, and OpenTelemetry provide the scalability, resilience, and observability needed to support AI systems at scale. Aniszczyk noted that major AI platforms like ChatGPT and Claude already rely on Kubernetes and other CNCF projects. To address growing complexity in running generative and agentic AI workloads, the CNCF has launched efforts to extend its conformance programs to AI. New requirements, such as dynamic resource allocation for GPUs and TPUs and specialized networking for inference workloads, are being handled inconsistently across the industry. CNCF aims to establish a baseline of compatibility to ensure vendor neutrality. Aniszczyk also highlighted CNCF incubation projects like Metal³ for bare-metal Kubernetes and OpenYurt for managing edge-based Kubernetes deployments. Learn more from The New Stack about CNCF and what to expect in 2026: Why the CNCF’s New Executive Director Is Obsessed With Inference; CNCF Dragonfly Speeds Container, Model Sharing with P2P.

Jan 22, 2026 · 23 min

Ep 1585 · Solving the Problems that Accompany API Sprawl with AI

API sprawl creates hidden security risks and missed revenue opportunities when organizations lose visibility into the APIs they build. According to IBM’s Neeraj Nargund, APIs power the core business processes enterprises want to scale, making automated discovery, observability, and governance essential, especially when thousands of APIs exist across teams and environments. Strong governance helps identify endpoints, remediate shadow APIs, and manage risk at scale. At the same time, enterprises increasingly want to monetize the data APIs generate, packaging insights into products and pricing and segmenting usage, a need amplified by the rise of AI. To address these challenges, Nargund highlights “smart APIs,” which are infused with AI to provide context awareness, event-driven behavior, and AI-assisted governance throughout the API lifecycle. These APIs help interpret and act on data, integrate with AI agents, and support real-time, streaming use cases. IBM’s latest API Connect release embeds AI across API management and is designed for hybrid and multi-cloud environments, offering centralized governance, observability, and control through a single hybrid control plane. Learn more from The New Stack about smart APIs: Redefining API Management for the AI-Driven Enterprise; How To Accelerate Growth With AI-Powered Smart APIs; Wrangle Account Sprawl With an AI Gateway.

Jan 15, 2026 · 19 min

Ep 1584 · CloudBees CEO: Why Migration Is a Mirage Costing You Millions

A CloudBees survey reveals that enterprise migration projects often fail to deliver promised modernization benefits. In 2024, 57% of enterprises spent over $1 million on migrations, with average overruns costing $315,000 per project. On The New Stack Makers podcast, CloudBees CEO Anuj Kapur describes this pattern as “the migration mirage,” where organizations chase modernization through costly migrations that push value further into the future. Findings from the CloudBees 2025 DevOps Migration Index show leaders routinely underestimate the longevity and resilience of existing systems. Kapur notes that applications often outlast CIOs, yet new leadership repeatedly mandates wholesale replacement. The report argues modernization has been mistakenly equated with migration, which diverts resources from customer value to replatforming efforts. Beyond financial strain, migration erodes developer morale by forcing engineers to rework functioning systems instead of building new solutions. CloudBees advocates meeting developers where they are, setting flexible guardrails rather than enforcing rigid platforms. Kapur believes this approach, combined with emerging code assistance tools, could spark a new renaissance in software development by 2026. Learn more from The New Stack about enterprise modernization: Why AI Alone Fails at Large-Scale Code Modernization; How AI Can Speed up Modernization of Your Legacy IT Systems.

Jan 13, 2026 · 34 min

Ep 1583 · Human Cognition Can’t Keep Up with Modern Networks. What’s Next?

IBM’s acquisitions of Red Hat and HashiCorp, along with its planned purchase of Confluent, reflect a deliberate strategy to build the infrastructure required for enterprise AI. According to IBM’s Sanil Nambiar, AI depends on consistent hybrid cloud runtimes (Red Hat), programmable and automated infrastructure (HashiCorp), and real-time, trustworthy data (Confluent). Without these foundations, AI cannot function effectively. Nambiar argues that modern, software-defined networks have become too complex for humans to manage alone, overwhelmed by fragmented data, escalating tool sophistication, and a widening skills gap that makes veteran “tribal knowledge” hard to transfer. Trust, he says, is the biggest barrier to AI adoption in networking, since errors can cause costly outages. To address this, IBM launched IBM Network Intelligence, a “network-native” AI solution that combines time-series foundation models with reasoning large language models. This architecture enables AI agents to detect subtle warning patterns, collapse incident response times, and deliver accurate, trustworthy insights for real-world network operations. Learn more from The New Stack about AI infrastructure and IBM’s approach: AI in Network Observability: The Dawn of Network Intelligence; How Agentic AI Is Redefining Campus and Branch Network Needs.

Jan 7, 2026 · 23 min

Ep 1582 · From Group Science Project to Enterprise Service: Rethinking OpenTelemetry

Ari Zilka, founder of MyDecisive.ai and former Hortonworks CPO, argues that most observability vendors now offer essentially identical, reactive dashboards that highlight problems only after systems are already broken. After speaking with all 23 observability vendors at KubeCon + CloudNativeCon North America 2025, Zilka said these tools fail to meaningfully reduce mean time to resolution (MTTR), a long-standing demand he heard repeatedly from thousands of CIOs during his time at New Relic. Zilka believes observability must shift from reactive monitoring to proactive operations, where systems automatically respond to telemetry in real time. MyDecisive.ai is his attempt to solve this, acting as a “bump in the wire” that intercepts telemetry and uses AI-driven logic to trigger actions like rolling back faulty releases. He also criticized the rising cost and complexity of OpenTelemetry adoption, noting that many companies now require large, specialized teams just to maintain OTel stacks. MyDecisive aims to turn OpenTelemetry into an enterprise-ready service that reduces human intervention and operational overhead. Learn more from The New Stack about OpenTelemetry: Observability Is Stuck in the Past. Your Users Aren't.; Setting Up OpenTelemetry on the Frontend Because I Hate Myself; How to Make OpenTelemetry Better in the Browser.

Dec 30, 2025 · 17 min

Ep 1581 · Why You Can't Build AI Without Progressive Delivery

Former GitHub CEO Thomas Dohmke’s claim that AI-based development requires progressive delivery frames a conversation between analyst James Governor and The New Stack’s Alex Williams about why modern release practices matter more than ever. Governor argues that AI systems behave unpredictably in production: models can hallucinate, outputs vary between versions, and changes are often non-deterministic. Because of this uncertainty, teams must rely on progressive delivery techniques such as feature flags, canary releases, observability, measurement and rollback. These practices, originally developed to improve traditional software releases, now form the foundation for deploying AI safely. Concepts like evaluations, model versioning and controlled rollouts are direct extensions of established delivery disciplines. Beyond AI, Governor’s book “Progressive Delivery” challenges DevOps thinking itself. He notes that DevOps focuses on development and operations but often neglects the user feedback loop. Using a framework of four A’s (abundance, autonomy, alignment and automation), he argues that progressive delivery reconnects teams with real user outcomes. Ultimately, success is measured not just by reliability metrics but by whether users are actually satisfied. Learn more from The New Stack about progressive delivery: Mastering Progressive Hydration for Enhanced Web Performance; Continuous Delivery: Gold Standard for Software Development.

Dec 23, 2025 · 27 min

Ep 1580 · How Nutanix Is Taming Operational Complexity

Most enterprises today run workloads across multiple IT infrastructures rather than a single platform, creating significant operational challenges. According to Nutanix CTO Deepak Goel, organizations face three major hurdles: managing operational complexity amid a shortage of cloud-native skills, migrating legacy virtual machine (VM) workloads to microservices-based cloud-native platforms, and running VM-based workloads alongside containerized applications. Many engineers have deep infrastructure experience but lack Kubernetes expertise, making the transition especially difficult and increasing the learning curve for IT administrators. To address these issues, organizations are turning to platform engineering and internal developer platforms that abstract infrastructure complexity and provide standardized “golden paths” for deployment. Integrated development environments (IDEs) further reduce friction by embedding capabilities like observability and security. Nutanix contributes through its hyperconverged platform, which unifies compute and storage while supporting both VMs and containers. At KubeCon North America, Nutanix announced version 2.0 of Nutanix Data Services for Kubernetes (NDK), adding advanced data protection, fault-tolerant replication, and enhanced security through a partnership with Canonical to deliver a hardened operating system for Kubernetes environments. Learn more from The New Stack about operational complexity in cloud native environments: Q&A: Nutanix CEO Rajiv Ramaswami on the Cloud Native Enterprise; Kubernetes Complexity Realigns Platform Engineering Strategy; Platform Engineering on the Brink: Breakthrough or Bust?

Dec 18, 2025 · 15 min

Ep 1579Do All Your AI Workloads Actually Require Expensive GPUs?

GPUs dominate today’s AI landscape, but Google argues they are not necessary for every workload. As AI adoption has grown, customers have increasingly demanded compute options that deliver high performance with lower cost and power consumption. Drawing on its long history of custom silicon, Google introduced Axion CPUs in 2024 to meet needs for massive scale, flexibility, and general-purpose computing alongside AI workloads. The Axion-based C4A instance is generally available, while the newer N4A virtual machines promise up to 2x price performance.In this episode, Andrei Gueletii, a technical solutions consultant for Google Cloud joined Gari Singh, a product manager for Google Kubernetes Engine (GKE), and Pranay Bakre, a principal solutions engineer at Arm for this episode, recorded at KubeCon + CloudNativeCon North America, in Atlanta. Built on Arm Neoverse V2 cores, Axion processors emphasize energy efficiency and customization, including flexible machine shapes that let users tailor memory and CPU resources. These features are particularly valuable for platform engineering teams, which must optimize centralized infrastructure for cost, FinOps goals, and price performance as they scale.Importantly, many AI tasks—such as inference for smaller models or batch-oriented jobs—do not require GPUs. CPUs can be more efficient when GPU memory is underutilized or latency demands are low. By decoupling workloads and choosing the right compute for each task, organizations can significantly reduce AI compute costs.Learn more from The New Stack about the Axion-based C4A: Beyond Speed: Why Your Next App Must Be Multi-ArchitectureArm: See a Demo About Migrating a x86-Based App to ARM64Join our community of newsletter subscribers to stay on top of the news and at the top of your game.

Dec 18, 2025 · 29 min

Ep 1578: Breaking Data Team Silos Is the Key to Getting AI to Production

Enterprises are racing to deploy AI services, but the teams responsible for running them in production are seeing familiar problems reemerge, most notably silos between data scientists and operations teams, reminiscent of the old DevOps divide. In a discussion recorded at AWS re:Invent 2025, IBM’s Thanos Matzanas and Martin Fuentes argue that the challenge isn’t new technology but repeating organizational patterns. As data teams move from internal projects to revenue-critical, customer-facing applications, they face new pressures around reliability, observability, and accountability.

The speakers stress that many existing observability and governance practices still apply. Standard metrics, KPIs, SLOs, access controls, and audit logs remain essential foundations, even as AI introduces non-determinism and a heavier reliance on human feedback to assess quality. Tools like OpenTelemetry provide common ground, but culture matters more than tooling.

Both emphasize starting with business value and breaking down silos early by involving data teams in production discussions. Rather than replacing observability professionals, AI should augment human expertise, especially in critical systems where trust, safety, and compliance are paramount.

Learn more from The New Stack about enabling AI with silos:
Are Your AI Co-Pilots Trapping Data in Isolated Silos?
Break the AI Gridlock at the Intersection of Velocity and Trust
Taming AI Observability: Control Is the Key to Success

Dec 17, 2025 · 30 min

Ep 1577: Why AI Parallelization Will Be One of the Biggest Challenges of 2026

Rob Whiteley, CEO of Coder, argues that the biggest winners in today’s AI boom resemble the “picks and shovels” sellers of the California Gold Rush: companies that provide tools enabling others to build with AI. Speaking on The New Stack Makers at AWS re:Invent, Whiteley described the current AI moment as the fastest-moving shift he’s seen in 25 years of tech. Developers are rapidly adopting AI tools, while platform teams face pressure to approve them, as saying “no” is no longer viable. Whiteley warns of a widening gap between organizations that extract real value from AI and those that don’t, driven by skills shortages and insufficient investment in training. He sees parallels with the cloud-native transition and predicts the rise of “AI-native” companies. As agentic AI grows, developers increasingly act as managers overseeing many parallel AI agents, creating new challenges around governance, security, and state management. To address this, Coder introduced Mux, an open source coding agent multiplexer designed to help developers manage and evaluate large volumes of AI-generated code efficiently.

Learn more from The New Stack about AI parallelization:
The Production Generative AI Stack: Architecture and Components
Enable Parallel Frontend/Backend Development to Unlock Velocity

Dec 16, 2025 · 24 min

Ep 1575: Kubernetes GPU Management Just Got a Major Upgrade

Nvidia Distinguished Engineer Kevin Klues noted that low-level systems work is invisible when done well and highly visible when it fails, a dynamic that frames current Kubernetes innovations for AI. At KubeCon + CloudNativeCon North America 2025, Klues and AWS product manager Jesse Butler discussed two emerging capabilities: dynamic resource allocation (DRA) and a new workload abstraction designed for sophisticated AI scheduling.

DRA, now generally available in Kubernetes 1.34, fixes long-standing limitations in GPU requests. Instead of simply asking for a number of GPUs, users can specify types and configurations. Modeled after persistent volumes, DRA allows any specialized hardware to be exposed through standardized interfaces, enabling vendors to deliver custom device drivers cleanly. Butler called it one of the most elegant designs in Kubernetes.

Yet complex AI workloads require more coordination. A forthcoming workload abstraction, debuting in Kubernetes 1.35, will let users define pod groups with strict scheduling and topology rules, ensuring multi-node jobs start fully or not at all. Klues emphasized that this abstraction will shape Kubernetes’ AI trajectory for the next decade and encouraged community involvement.

Learn more from The New Stack about dynamic resource allocation:
Kubernetes Primer: Dynamic Resource Allocation (DRA) for GPU Workloads
Kubernetes v1.34 Introduces Benefits but Also New Blind Spots

Dec 11, 2025 · 35 min

Ep 1574: The Rise of the Cognitive Architect

At KubeCon North America 2025, GitLab’s Emilio Salvador outlined how developers are shifting from individual coders to leaders of hybrid human–AI teams. He envisions developers evolving into “cognitive architects,” responsible for breaking down large, complex problems and distributing work across both AI agents and humans. Complementing this is the emerging role of the “AI guardian,” reflecting growing skepticism around AI-generated code. Even as AI produces more code, humans remain accountable for reviewing quality, security, and compliance.

Salvador also described GitLab’s “AI paradox”: developers may code faster with AI, but overall productivity stalls because testing, security, and compliance processes haven’t kept pace. To fix this, he argues organizations must apply AI across the entire development lifecycle, not just in coding. GitLab’s Duo Agent Platform aims to support that end-to-end transformation.

Looking ahead, Salvador predicts the rise of a proactive “meta agent” that functions like a full team member. Still, he warns that enterprise adoption remains slow and advises organizations to start small, build skills, and scale gradually.

Learn more from The New Stack about the evolving role of "cognitive architects":
The Engineer in the AI Age: The Orchestrator and Architect
The New Role of Enterprise Architecture in the AI Era
The Architect’s Guide to Understanding Agentic AI

Dec 10, 2025 · 22 min

Ep 1573: Why the CNCF's New Executive Director is Obsessed With Inference

Jonathan Bryce, the new CNCF executive director, argues that inference, not model training, will define the next decade of computing. Speaking at KubeCon North America 2025, he emphasized that while the industry obsesses over massive LLM training runs, the real opportunity lies in efficiently serving these models at scale. Cloud-native infrastructure, he says, is uniquely suited to this shift because inference requires real-time deployment, security, scaling, and observability, all strengths of the CNCF ecosystem. Bryce believes Kubernetes is already central to modern inference stacks, with projects like Ray, KServe, and emerging GPU-oriented tooling enabling teams to deploy and operationalize models. To bring consistency to this fast-moving space, the CNCF launched a Kubernetes AI Conformance Program, ensuring environments support GPU workloads and Dynamic Resource Allocation. With AI agents poised to multiply inference demand by executing parallel, multi-step tasks, efficiency becomes essential. Bryce predicts that smaller, task-specific models and cloud-native routing optimizations will drive major performance gains. Ultimately, he sees CNCF technologies forming the foundation for what he calls “the biggest workload mankind will ever have.”

Learn more from The New Stack about inference:
Confronting AI’s Next Big Challenge: Inference Compute
Deep Infra Is Building an AI Inference Cloud for Developers

Dec 9, 2025 · 25 min

Ep 1572: Kubernetes Gets an AI Conformance Program — and VMware Is Already On Board

The Cloud Native Computing Foundation has introduced the Certified Kubernetes AI Conformance Program to bring consistency to an increasingly fragmented AI ecosystem. Announced at KubeCon + CloudNativeCon North America 2025, the program establishes open, community-driven standards to ensure AI applications run reliably and portably across different Kubernetes platforms. VMware by Broadcom’s vSphere Kubernetes Service (VKS) is among the first platforms to achieve certification.

In an interview with The New Stack, Broadcom leaders Dilpreet Bindra and Himanshu Singh explained that the program applies lessons from Kubernetes’ early evolution, aiming to reduce the “muddiness” in AI tooling and improve cross-platform interoperability. They emphasized portability as a core value: organizations should be able to move AI workloads between public and private clouds with minimal friction.

VKS integrates tightly with vSphere, using Kubernetes APIs directly to manage infrastructure components declaratively. This approach, along with new add-on management capabilities, reflects Kubernetes’ growing maturity. According to Bindra and Singh, this stability now enables enterprises to trust Kubernetes as a foundation for production-grade AI.

Learn more from The New Stack about Broadcom’s latest updates with Kubernetes:
Has VMware Finally Caught Up with Kubernetes?
VMware VCF 9.0 Finally Unifies Container and VM Management

Dec 8, 2025 · 30 min

Ep 1571: How etcd Solved Its Knowledge Drain with Deterministic Testing

The etcd project, a distributed key-value store older than Kubernetes, recently faced significant challenges due to maintainer turnover and the resulting loss of unwritten institutional knowledge. Lead maintainer Marek Siarkowicz explained that as longtime contributors left, crucial expertise about testing procedures and correctness guarantees disappeared. This gap led to a problematic release that introduced critical reliability issues, including potential data inconsistencies after crashes.

To rebuild confidence in etcd’s correctness, the new maintainer team introduced “robustness testing,” creating a framework inspired by Jepsen to validate both basic and distributed-system behavior. Their goal was to ensure linearizability, the “Holy Grail” of distributed systems, which required developing custom failure-injection tools and teaching the community how to debug complex scenarios.

The team later partnered with Antithesis to apply deterministic simulation testing, enabling fully reproducible execution paths and easier detection of subtle race conditions. This approach helped codify implicit knowledge into explicit properties and assertions. Siarkowicz emphasized that such rigorous testing is essential for safeguarding the sensitive “core” of large open source projects, ensuring correctness even as maintainers change.

Learn more from The New Stack about the etcd project:
Tutorial: Install a Highly Available K3s Cluster at the Edge

Dec 5, 2025 · 21 min

Ep 1570: Helm 4: What’s New in the Open Source Kubernetes Package Manager?

Helm, originally a hackathon project called Kate’s Place, turned 10 in 2025, marking the milestone with the release of Helm 4, its first major update in six years. Created by Matt Butcher and colleagues as a playful take on “K8s,” the early project won a small prize but quickly grew into a serious effort when Deis leadership recognized the need for a Kubernetes package manager. Renamed Helm, it rapidly expanded with community contributors and became one of the first CNCF graduating projects.

Helm 4 reflects years of accumulated design debt and evolving use cases. After the rapid iterations of Helm 1, 2, and 3, the latest version modernizes logging, improves dependency management, and introduces WebAssembly-based plugins for cross-platform portability, addressing the growing diversity of operating systems and architectures. Beyond headline features, maintainers emphasize that mature projects increasingly deliver “boring” but essential improvements, such as better logging, which simplify workflows and integrate more cleanly with other tools. Helm’s re-architected internals also lay the foundation for new chart and package capabilities in upcoming 4.x releases.

Learn more from The New Stack about Helm:
The Super Helm Chart: To Deploy or Not To Deploy?
Kubernetes Gets a New Resource Orchestrator in the Form of Kro

Dec 3, 2025 · 24 min

Ep 1569: All About Cedar, an Open Source Solution for Fine-Tuning Kubernetes Authorization

Kubernetes has relied on role-based access control (RBAC) since 2017, but its simplicity limits what developers can express, said Micah Hausler, principal engineer at AWS, on The New Stack Makers. RBAC only allows actions; it can’t enforce conditions, denials, or attribute-based rules. Seeking a more expressive authorization model for Kubernetes, Hausler explored Cedar, an authorization engine and policy language created at AWS in 2022 and later open-sourced. Although not designed specifically for Kubernetes, Cedar proved capable of modeling its authorization needs in a concise, readable way. Hausler highlighted Cedar’s clarity (nontechnical users can often understand policies at a glance) as well as its schema validation, autocomplete support, and formal verification, which ensures policies are correct and produce only allow or deny outcomes.

Now onboarding to the CNCF sandbox, Cedar is used by companies like Cloudflare and MongoDB and offers language-agnostic tooling, including a Go implementation donated by StrongDM. The project is actively seeking contributors, especially to expand bindings for languages like TypeScript, JavaScript, and Python.

Learn more from The New Stack about Cedar:
Ceph: 20 Years of Cutting-Edge Storage at the Edge
The Cedar Programming Language: Authorization Simplified

Dec 2, 2025 · 16 min

Ep 1568: Teaching a Billion People to Code: How JupyterLite Is Scaling the Impossible

JupyterLite, a fully browser-based distribution of JupyterLab, is enabling new levels of global scalability in technical education. Developed by Sylvain Corlay’s QuantStack team, it allows math and programming lessons to run entirely in students’ browsers, kernel included, without relying on Docker or cloud-scale infrastructure. Its most prominent success is Capytale, a French national deployment that supports half a million high school students and over 200,000 weekly sessions from essentially a single server, which hosts only teaching content while computation happens locally in each browser.

QuantStack, founded in 2016 as what Corlay calls an “accidental startup,” has since grown into a 30-person team contributing across Jupyter, Conda-Forge, and Apache Arrow. But JupyterLite embodies its most ambitious goal: making programming education accessible to countries with rapidly growing youth populations, such as Nigeria, where traditional cloud-hosted notebooks are impractical. Achieving a billion-user future will require advances in accessibility, collaboration, and expanding browser-based package support, efforts that depend on grants and foundation backing.

Learn more from The New Stack about Project Jupyter:
From Physics to the Future: Brian Granger on Project Jupyter in the Age of AI
Jupyter AI v3: Could It Generate an ‘Ecosystem of AI Personas?’

Dec 1, 2025 · 19 min
