
348: Compliance Theater Now Available as a Subscription
The Cloud Pod | Weekly AI & Cloud News on AWS, Azure & GCP · Justin Brodley, Jonathan Baker, Ryan Lucas and Matt Kohn | Cloud Computing & AI News
Show Notes
Welcome to episode 348 of The Cloud Pod, where the weather is always cloudy! Justin, Ryan, and Matt are in the studio this week to bring you all the latest news in AI and Cloud, including Stryker’s troubles, AWS’ birthday, Bedrock Agents, and Claude Code – plus so much more. Let’s get started!
Titles we almost went with this week
- SOC 2 It to Me Delve Fires Back
- Shell Yeah Bedrock Agents Just Got Command Line Powers
- When Your SOC 2 Report Is Just Fan Fiction
- uv, Ruff, and ty Walk Into an OpenAI Acquisition
- Hash Field Expiration Is Here, and It’s No Redis Herring
- Stop Paying Full Price for Tokens You Already Bought
- Fake It Till You Audit It
- Cache Me If You Can CNCF Sandbox Edition
- Microsoft Learns Consent Matters in Copilot Rollout
- Microsoft’s Stinky Cloud Gets Federal Seal of Approval
- When Your Audit Trail Leads to a Blog Fight
- Ping Your AI Agent on Discord Like a Millennial
- Twenty Years of AWS and the Bill Never Stops
- The LLM hack that feels a lot like Node Shift Left Package issues
- Claude Code Auto Mode Lets AI Work Unsupervised
- Stop Babysitting Your AI Claude Code Goes Solo
- Auto Mode Gives Claude Code the Keys to the Car
- Java comes to the coffee shop with AI
General News
01:21 Customer Updates: Stryker Network Disruption
- Stryker confirmed a cyberattack on March 11, 2026, that disrupted their internal Microsoft corporate environment, affecting order processing, manufacturing, and shipping, but notably not their connected medical devices or cloud-hosted products.
- The attack vector was specific to Stryker’s Microsoft environment, which meant products running on AWS (Vocera Edge, Vocera Ease) and Google Cloud Platform (care.ai) were architecturally isolated and unaffected, demonstrating a practical benefit of multi-cloud separation.
- Stryker explicitly stated this was not ransomware or malware, and government agencies, including CISA, FBI, and the White House National Cyber Director, were engaged, with domain seizures linked to threat actors already executed.
- The incident highlights how healthcare organizations can architect medical device and cloud product infrastructure to be independent of corporate IT environments, as every product from Mako to SurgiCount to LIFEPAK operated normally due to network segmentation.
- Real-world patient impact was limited but present, with some personalized implant cases rescheduled due to shipping delays, underscoring that even contained corporate IT incidents can have downstream effects on physical supply chains.
02:30 Justin – “HugOps to the entire Stryker team; I couldn’t imagine having to rebuild my entire Windows estate at a company the size of Stryker in the middle of trying to do business and everything else.”
05:00 Federal cyber experts called Microsoft’s cloud a “pile of shit,” and approved it anyway
- FedRAMP authorized Microsoft’s Government Community Cloud High despite internal reviewers finding insufficient security documentation, issuing an unusual “buyer beware” notice to agencies considering the product.
- This raises questions about the integrity of the federal cloud authorization process when commercial pressures intersect with security evaluations.
- The GCC High offering is specifically designed to handle some of the US government’s most sensitive data, making the documentation gaps particularly consequential, given that Microsoft had already been linked to two significant federal breaches involving Russian and Chinese state actors.
- The core technical concern was Microsoft’s inability to adequately document how data is protected as it moves between servers within their cloud infrastructure, leaving reviewers unable to assess the system’s overall security posture with confidence.
- For cloud practitioners and federal agencies, this situation highlights the risk of relying on vendor-provided security documentation without independent verification, especially for high-sensitivity workloads where compliance approval does not necessarily equal verified security.
- The outcome has broader implications for FedRAMP’s credibility as a security benchmark, since agencies selecting cloud providers often treat authorization as a meaningful security signal rather than a conditional or incomplete endorsement.
06:00 Ryan – “If you can’t adequately explain how basic things like encryption and security controls are handled in your environment, that’s not good, right? Because while it’s not completely indicative of a security problem, it’s highly suspect.”
06:51 Delve – Fake Compliance as a Service – Part I
- A detailed investigation alleges that Delve, a compliance automation platform, fabricates audit evidence, including board meeting records and test results, then uses Indian certification mills operating through US shell entities to rubber-stamp reports rather than conduct independent verification.
- The core technical concern is that Delve reportedly generates identical audit reports across all clients, meaning the auditor independence required by AICPA and ISO standards is structurally violated since Delve itself is effectively acting as both platform and auditor.
- Companies using Delve for HIPAA or GDPR compliance may face significant regulatory exposure, as the article claims the platform skips major framework requirements while telling clients they have achieved 100% compliance, potentially creating criminal liability under HIPAA and fines up to 4% of global revenue under GDPR.
- The investigation highlights a broader issue in the compliance automation space where AI and automation claims may not reflect actual product capabilities, with the article describing Delve as essentially a template pack with a SaaS wrapper rather than a genuinely automated compliance tool.
- For cloud-focused companies evaluating compliance platforms, this case underscores the importance of verifying auditor independence credentials, requesting evidence of actual testing procedures, and understanding whether a platform produces genuinely customized documentation or pre-populated templates adopted with minimal review.
- Interested in reading the leaked spreadsheet? Find it here, and the leaked documents here.
08:47 Ryan – “I’m not a big fan of checkbox security and having that around just for compliance purposes. But it’s also like, this is really a misrepresentation. You look at things and, and it’s certified by Delve; it’s not certified by these other companies. And if all that evidence, the specifics they listed in the report are crazy, just how, like, this is not cool. It’s just generated. It’s not even real in the slightest.”
11:37 Response to Misleading Claims
- Delve is a SOC 2 compliance automation platform serving over 1,700 customers, and this response addresses a Substack post making claims about the legitimacy of its audit processes.
- The core distinction Delve makes is that it automates evidence collection and provides templates, while independent licensed auditors retain sole authority to issue final reports.
- The debate touches on a broader industry practice where compliance platforms provide standardized control sets based on AICPA and ISO frameworks, meaning structural overlap across reports is expected rather than evidence of fraud.
- This is worth discussing because buyers of compliance software often do not fully understand where the platform ends and the auditor begins.
- Delve claims 120+ automated integrations, which is a notable gap from the 14 cited in the original criticism, and speaks to how quickly compliance tooling has evolved in the cloud ecosystem.
- For cloud-native companies pursuing SOC 2, the depth of integrations directly affects how much manual evidence collection is required.
- The use of pre-filled templates for board minutes and policies is standard practice across compliance platforms, but it raises a legitimate question about whether customers treat these as starting points or simply submit them unchanged.
- This is a real risk area for organizations where compliance becomes a checkbox exercise rather than a genuine security posture.
- The competitive compliance automation market, which includes players like Vanta and Drata, means disputes like this are likely to continue as vendors differentiate on auditor quality, automation depth, and pricing.
- Listeners evaluating compliance tools should independently verify auditor accreditation regardless of which platform they use.
13:08 Ryan – “I would argue the use of pre-filled templates is common…prefilled and direct copied templates from between companies.”
19:04 Supply Chain Attack in litellm 1.82.8 on PyPI
- Litellm versions 1.82.7 and 1.82.8 on PyPI were found to contain a malicious .pth file that executes automatically on every Python process startup, with no corresponding release on the official GitHub repository, indicating the PyPI account was likely compromised.
- The malware follows a three-stage attack pattern: collecting SSH keys, cloud credentials, .env files, and Kubernetes configs; encrypting and exfiltrating them to a domain unrelated to legitimate litellm infrastructure; then attempting persistent backdoor installation via systemd and privileged Kubernetes pod creation.
- The attack was discovered because a bug in the malware caused an exponential fork bomb: the .pth file triggered itself recursively, which crashed the host machine and made the compromise visible rather than silent.
- Any developer or CI/CD pipeline that pulled litellm as a transitive dependency after March 24, 2026, should treat all credentials on that machine as compromised and rotate SSH keys, cloud provider tokens, API keys, and database passwords immediately.
- This incident highlights the risk of supply chain attacks through transitive dependencies, where a package you never directly installed can introduce malicious code into your environment, making dependency auditing and package integrity verification important practices for cloud-connected development workflows.
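For anyone who wants to check a machine by hand, here is a minimal sketch of a .pth audit. It relies on documented behavior of Python's standard `site` module: any line in a .pth file that starts with `import` followed by a space or tab is executed at interpreter startup. The function names are our own, and a hit is not proof of compromise, since some legitimate packages (editable installs, for example) ship executable .pth lines, so treat the output as a list to review by hand.

```python
import sysconfig
from pathlib import Path


def executable_pth_lines(pth_file: Path) -> list[str]:
    """Return the lines of a .pth file that Python would execute at startup.

    The `site` module executes lines beginning with "import" followed by a
    space or tab; every other non-comment line is treated as a path entry.
    """
    try:
        text = pth_file.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return []
    return [
        line.strip()
        for line in text.splitlines()
        if line.startswith(("import ", "import\t"))
    ]


def scan_site_packages(directories: list[Path]) -> dict[Path, list[str]]:
    """Map each .pth file containing executable lines to those lines."""
    findings: dict[Path, list[str]] = {}
    for directory in directories:
        if not directory.is_dir():
            continue
        for pth_file in sorted(directory.glob("*.pth")):
            lines = executable_pth_lines(pth_file)
            if lines:
                findings[pth_file] = lines
    return findings


def current_site_packages() -> list[Path]:
    """Site-packages directories of the interpreter running this script."""
    paths = sysconfig.get_paths()
    return sorted({Path(paths["purelib"]), Path(paths["platlib"])})


if __name__ == "__main__":
    for pth_file, lines in scan_site_packages(current_site_packages()).items():
        print(pth_file)
        for line in lines:
            print("   ", line)
```

Run it with the same interpreter (or virtualenv) your workload uses, since each environment has its own site-packages directory.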
21:21 Justin – “Yeah… that’s bad too.”
KUBECON EU
23:24 GKE and OSS innovation at KubeCon EU 2026
- GKE Autopilot is no longer a cluster-level decision made at creation time. Standard clusters can now enable Autopilot compute classes on a per-workload basis, removing the need to create entirely new clusters when workload requirements change.
- Google is open-sourcing the GKE Cluster Autoscaler, one of the core infrastructure provisioning components, with the goal of making it available to the broader Kubernetes community as a vendor-neutral tool.
- llm-d, a Kubernetes-native distributed inference framework built with Red Hat and NVIDIA, has been accepted as a CNCF Sandbox project. It addresses inference-aware traffic management, multi-node replica orchestration, and KV cache offloading in a hardware-agnostic way.
- Google released an open-source DRA driver for TPUs, coordinated with NVIDIA donating its own DRA driver, which establishes Dynamic Resource Allocation as a shared standard for describing specialized hardware across Kubernetes workloads.
- TPU support is coming to Ray v2.55 with backing from both Google and Anyscale, and a new Ray History Server in alpha allows users to debug completed or terminated RayJobs using persisted logs, state, and metrics through the Ray Dashboard on GKE.
24:29 Ryan – “It’s super nice of them to open source that, because it does seem like a very powerful thing to use. I love the idea of having individual workloads on a cluster, and be able to delegate to managed and unmanaged… it’s kind of neat.”
24:49 llm-d officially a CNCF Sandbox project
- llm-d has been accepted as a CNCF Sandbox project, with Google Cloud as a founding contributor alongside Red Hat, IBM Research, CoreWeave, and NVIDIA.
- The project aims to extend Kubernetes for LLM inference workloads under an open-source model with no vendor lock-in, available at llm-d.ai.
- The core technical contribution is model-aware request routing through the llm-d Endpoint Picker, which considers KV-cache hit rates, in-flight requests, and queue depth to direct traffic to optimal backends.
- In production testing on Vertex AI, this approach reduced Time-to-First-Token latency by over 35% for coding workloads and improved P95 tail latency by 52% for bursty chat workloads.
- A notable outcome of the routing intelligence was doubling Vertex AI’s prefix cache hit rate from 35% to 70%, which directly reduces re-computation overhead and lowers cost-per-token for high-volume inference deployments.
- Google leads development of the Kubernetes LeaderWorkerSet API, which llm-d uses to orchestrate prefill and decode disaggregation across independently scalable pods, supporting both TPU and GPU fleets at scale.
- Google has also extended vLLM natively for Cloud TPUs with a unified PyTorch and JAX backend, delivering up to 5x throughput gains over the initial release. Pricing for running llm-d workloads depends on underlying GKE and accelerator costs, which vary by instance type and region.
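To make the routing idea concrete, here is a minimal sketch of model-aware endpoint selection using the three signals named above: KV-cache hit rate, in-flight requests, and queue depth. This is not the llm-d Endpoint Picker's actual algorithm. The `Backend` type, the linear scoring formula, and the weights are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    kv_cache_hit_rate: float  # assumed estimate in [0.0, 1.0] for this request's prefix
    in_flight_requests: int
    queue_depth: int


def score(
    backend: Backend,
    cache_weight: float = 2.0,      # hypothetical weight
    in_flight_weight: float = 0.1,  # hypothetical weight
    queue_weight: float = 0.2,      # hypothetical weight
) -> float:
    """Higher is better: reward likely cache hits, penalize load."""
    return (
        cache_weight * backend.kv_cache_hit_rate
        - in_flight_weight * backend.in_flight_requests
        - queue_weight * backend.queue_depth
    )


def pick_endpoint(backends: list[Backend]) -> Backend:
    """Choose the backend with the best score for the current request."""
    if not backends:
        raise ValueError("no backends available")
    return max(backends, key=score)


if __name__ == "__main__":
    pool = [
        Backend("warm-but-busy", 0.9, in_flight_requests=12, queue_depth=5),
        Backend("warm-and-idle", 0.9, in_flight_requests=4, queue_depth=2),
        Backend("cold-and-idle", 0.1, in_flight_requests=0, queue_depth=0),
    ]
    print(pick_endpoint(pool).name)
```

The point of the sketch is the tradeoff: a plain round-robin balancer ignores which replica already holds the prompt prefix in cache, while a cache-aware picker sends repeat prefixes to a warm replica unless that replica is too loaded.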
26:21 What’s new with Microsoft in open-source and Kubernetes at KubeCon + CloudNativeCon Europe 2026
- Dynamic Resource Allocation has reached general availability in Kubernetes, and Microsoft’s DRANet now includes upstream support for Azure RDMA NICs, meaning GPU-to-NIC topology alignment is handled at the scheduler level rather than through manual configuration.
- This matters for teams running distributed training workloads where network topology directly affects performance.
- AI Runway is a new open-source project under the KAITO umbrella that provides a common Kubernetes API for inference workloads, with a web interface, HuggingFace model discovery, GPU memory fit indicators, and real-time cost estimates.
- It supports multiple runtimes, including NVIDIA Dynamo and KubeRay, giving platform teams a single control plane for model deployments without requiring end users to know Kubernetes.
- AKS networking gets several notable updates, including Azure Kubernetes Application Network for identity-aware mTLS and traffic telemetry without a full service mesh, WireGuard encryption at the node level via Cilium, and Pod CIDR expansion that lets clusters grow IP ranges in place rather than requiring a full rebuild.
- Pricing for Advanced Container Networking Services features like Cilium mTLS is not specified in the announcement.
- On the observability side, AKS now surfaces GPU utilization directly into managed Prometheus and Grafana, closing a monitoring gap that previously required manual exporter configuration.
- A new agentic container networking interface also lets operators run natural-language diagnostic queries against live telemetry, reducing time to identify network issues.
- Blue-green agent pool upgrades and agent pool rollback are now available in AKS, letting teams provision a parallel node pool with the new configuration, validate it, and revert to the previous Kubernetes version and node image if problems appear.
- AKS Desktop also reached general availability, giving developers a local environment that mirrors production AKS configuration.
27:42 Ryan – “And if you’ve ever debugged an issue on Kubernetes, then you know that there’s logs everywhere that you have to go and review and correlate across each other, so having an agent that can go and look across all those places and diagnose issues is fantastic.”
AI Is Going Great – Or How ML Makes Money
28:22 Project SnowWork: The easiest way for business users to get work done
- Snowflake announced Project SnowWork in Research Preview, an agentic AI platform targeting business users in finance, sales, marketing, and operations who need to complete multi-step data workflows without writing code or relying on technical teams.
- The platform differentiates itself from general AI assistants by grounding outputs in an organization’s existing Snowflake data and automatically enforcing existing RBAC and governance policies, meaning users only see data they are already authorized to access.
- Project SnowWork ships with pre-built persona profiles for specific business functions, so a finance user gets workflows tuned to FP&A KPIs and close narratives while a sales user gets pipeline risk summaries, rather than a one-size-fits-all interface.
- Practical use cases highlighted include compressing financial close storytelling from days to a single workflow and replacing manual pipeline rollups with automated executive briefs, which gives listeners a concrete sense of the time savings being targeted.
- Access is currently limited to a select group of customers in a collaborative research preview, so this is not a general availability release, and organizations interested in early access would need to engage directly with Snowflake.
27:42 Ryan – “I do like the idea of bringing AI to the data rather than the data to the AI, which is a common problem, especially in enterprise platforms. I worry a little bit; the RBAC and authorization in Snowflake is very complex, and I wonder if people are actually going through and actually defining those in a way that would be proper segmentation? But I guess, you know, they have access to it today, they just have to know how to query it.”
30:10 OpenAI to acquire Astral
- OpenAI is acquiring Astral, the company behind three widely adopted Python developer tools: uv for dependency and environment management, Ruff for linting and formatting, and ty for type safety enforcement.
- The Astral team will join the Codex team after the deal closes, pending regulatory approval.
- Codex has reached over 2 million weekly active users, with 3x user growth and 5x usage increase since the start of 2025. This acquisition appears aimed at deepening Codex’s ability to operate across the full Python development lifecycle rather than just generating code snippets.
- The stated goal is to move Codex toward participating in complete development workflows, including planning changes, modifying codebases, running tools, verifying results, and maintaining software over time. Integrating Astral’s tooling directly into that workflow gives Codex agents access to infrastructure developers already use daily.
- OpenAI has committed to continuing support for Astral’s open source projects after closing, which matters to the Python community given how widely these tools are already embedded in developer workflows. Developers using uv or Ruff should not expect immediate disruption to those projects.
- For cloud and platform teams, this signals a trend toward AI coding agents that are tightly coupled with language-specific toolchains rather than operating as generic code generators, which could influence how development environments and CI/CD pipelines are structured going forward.
30:47 Justin – “I don’t know why they needed to buy the company to do all this, it is open source already.”
32:50 Anthropic just shipped an OpenClaw killer called Claude Code Channels, letting you message it over Telegram and Discord
- Anthropic released Claude Code Channels in version 2.1.80, enabling developers to connect their Claude Code sessions to Telegram and Discord bots, shifting from a synchronous chat model to an asynchronous, persistent agent that can work autonomously and notify users when tasks are completed.
- The feature is built on Anthropic’s open-source Model Context Protocol, which acts as a standardized bridge between Claude Code and external messaging platforms.
- The setup uses the Bun JavaScript runtime to run a polling service that injects incoming messages as session events, allowing Claude to execute code, run tests, and reply back through the messaging app.
- Practically, this eliminates the need for developers to maintain dedicated hardware like a Mac Mini running open-source agent frameworks 24/7, since Claude Code itself now handles session persistence when run in a background terminal or on a VPS.
- The plugin architecture is open, with official Telegram and Discord connectors hosted on GitHub under Anthropic repositories, meaning the community can build additional connectors for platforms like Slack or WhatsApp without waiting for Anthropic to ship them natively.
- The feature remains tied to Anthropic’s commercial subscriptions (Pro, Max, and Enterprise), so while the MCP layer is open, the underlying Claude model and Claude Code harness are proprietary, which is an important cost and vendor-lock consideration for teams evaluating this against self-hosted alternatives.
33:50 Justin – “I tried to use this, and it don’t work for me, but I didn’t have enough time to test it, I had too many Claude sessions going, and I needed to kill all of them and update properly to the 2.1.80 version. But I am curious to play with it a little more.”
35:34 Put Claude to work on your computer
- Anthropic has launched computer use capabilities in Claude Cowork and Claude Code, now in research preview for Pro and Max subscribers on macOS. Claude can directly control a browser, mouse, keyboard, and screen to complete tasks when no direct connector exists, with no setup required.
- The feature follows a tool priority hierarchy, reaching for service connectors like Slack or Google Calendar first, then falling back to direct computer control. Claude requests explicit permission before accessing new applications and can be stopped at any point.
- Anthropic has built in prompt injection safeguards by scanning model activations during computer use sessions. They acknowledge that the capability is still early and recommend users avoid sensitive data and start with trusted applications only.
- Dispatch, released alongside this update, enables a continuous conversation thread between mobile and desktop, letting users assign tasks from their phone and pick up completed work on their computer.
- Use cases include automated morning email checks, scheduled metric pulls, and triggering Claude Code sessions for pull requests.
- The combination of Dispatch and computer use means Claude can execute multi-step workflows on a desktop while the user is away, such as making IDE changes, running tests, and submitting a PR.
- Current limitations include macOS-only support, slower execution compared to direct integrations, and occasional need for retries on complex tasks.
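The tool priority hierarchy described above can be sketched as a small decision function: prefer a service connector, fall back to direct computer control for applications the user has already approved, and otherwise ask first. The names and the three-way split are our own illustration, not Anthropic's implementation.

```python
from enum import Enum


class Strategy(Enum):
    CONNECTOR = "connector"
    COMPUTER_USE = "computer_use"
    ASK_PERMISSION = "ask_permission"


def choose_strategy(app: str, connectors: set[str], approved_apps: set[str]) -> Strategy:
    """Pick how to reach an application, in priority order.

    1. Use a direct service connector when one exists.
    2. Otherwise control the app through the screen, if already approved.
    3. Otherwise stop and request explicit permission from the user.
    """
    if app in connectors:
        return Strategy.CONNECTOR
    if app in approved_apps:
        return Strategy.COMPUTER_USE
    return Strategy.ASK_PERMISSION


if __name__ == "__main__":
    connectors = {"slack", "google-calendar"}  # example connector names
    approved = {"spreadsheet-app"}             # hypothetical approved app
    for app in ("slack", "spreadsheet-app", "accounting-app"):
        print(app, "->", choose_strategy(app, connectors, approved).value)
```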
36:28 Ryan – “I didn’t know this was macOS only, because I was going to put it on my Linux server so I could get compute that wasn’t my laptop.”
38:32 Auto mode for Claude Code
- Anthropic launched auto mode for Claude Code in research preview for Team plan users, with Enterprise and API access coming soon. It works with both Claude Sonnet 4.6 and Opus 4.6, offering a middle ground between the default conservative permission prompts and the risky --dangerously-skip-permissions flag.
- The core mechanism is a classifier that reviews each tool call before execution, automatically blocking potentially destructive actions like mass file deletion, sensitive data exfiltration, or malicious code execution, while letting safe actions proceed without interruption.
- This directly addresses a practical developer workflow problem: Claude Code’s default mode requires frequent human approvals that prevent truly unattended long-running tasks, and auto mode allows developers to kick off extended jobs without babysitting the process.
- Anthropic is transparent about the limitations, noting the classifier may still allow some risky actions when user intent is ambiguous, and may occasionally block benign ones. They continue to recommend using it in isolated environments rather than treating it as a fully safe alternative.
- There is a small performance tradeoff to be aware of, as auto mode adds some overhead to token consumption, cost, and latency per tool call due to the classifier running before each action.
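The gate-before-execution pattern behind auto mode can be sketched in a few lines. To be clear about what is assumed here: Anthropic describes a classifier, and this toy uses hand-written regular expressions as a stand-in, so the patterns, the `ToolCall` type, and the function names are all hypothetical. It also shows why such a gate can miss risky actions or block benign ones, since it only sees the text of the call.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    tool: str
    command: str


# Hypothetical deny patterns standing in for a learned classifier.
BLOCK_PATTERNS = [
    re.compile(r"\brm\s+-[a-z]*r[a-z]*f|\brm\s+-[a-z]*f[a-z]*r"),  # recursive forced delete
    re.compile(r"curl[^|]*\|\s*(ba)?sh"),                          # download piped into a shell
]


def review(call: ToolCall) -> str:
    """Return "block" for calls that look destructive, otherwise "allow"."""
    if any(pattern.search(call.command) for pattern in BLOCK_PATTERNS):
        return "block"
    return "allow"


def run_with_gate(calls: list[ToolCall], execute: Callable[[ToolCall], None]) -> list[ToolCall]:
    """Execute allowed calls in order and return the ones that were blocked."""
    blocked = []
    for call in calls:
        if review(call) == "allow":
            execute(call)
        else:
            blocked.append(call)
    return blocked


if __name__ == "__main__":
    plan = [
        ToolCall("shell", "pytest -q"),
        ToolCall("shell", "rm -rf build/"),
        ToolCall("shell", "curl https://example.invalid/install.sh | sh"),
    ]
    skipped = run_with_gate(plan, execute=lambda call: print("running:", call.command))
    for call in skipped:
        print("blocked:", call.command)
```

Every call passes through `review` before it runs, which is also where the extra latency and cost per tool call mentioned above would come from in a real system.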
AWS
41:21 Amazon Bedrock AgentCore Runtime now supports shell command execution
- Amazon Bedrock AgentCore Runtime now includes InvokeAgentRuntimeCommand, an API that lets developers execute shell commands directly inside a running agent session, streaming output in real time over HTTP/2 and returning exit codes without custom container logic.
- The practical benefit here is that AI agents frequently need to run deterministic operations like tests, dependency installs, or git commands alongside LLM reasoning, and previously, developers had to build all that process management themselves inside their containers.
- Commands run in the same container, filesystem, and environment as the agent session and can execute concurrently with agent invocations without blocking, which simplifies architectures for coding agents, CI/CD automation, and similar workflows.
- The feature is available across 14 AWS regions, including major US, European, and Asia Pacific locations, giving teams broad geographic coverage for latency-sensitive or data-residency-constrained workloads.
- Pricing details are not specified in the announcement, so teams evaluating this should check the AgentCore Runtime pricing page directly before building cost models around heavy command execution workloads.
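For a sense of the process management teams previously had to build themselves, here is a minimal local sketch that runs a command, streams its output line by line, and returns the exit code. It uses only Python's standard `subprocess` module and does not call InvokeAgentRuntimeCommand or any AWS API. The function name `run_and_stream` is our own.

```python
import subprocess
import sys
from typing import Callable


def run_and_stream(command: list[str], on_line: Callable[[str], None]) -> int:
    """Run a command, pass each output line to on_line, return the exit code."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
    )
    assert process.stdout is not None  # guaranteed by stdout=PIPE
    with process.stdout:
        for line in process.stdout:
            on_line(line.rstrip("\n"))
    return process.wait()


if __name__ == "__main__":
    exit_code = run_and_stream(
        [sys.executable, "-c", "print('tests passed')"],
        on_line=lambda line: print("agent saw:", line),
    )
    print("exit code:", exit_code)
```

A production version would also need timeouts, cancellation, and concurrency with the agent's own work, which is the kind of plumbing the announcement says no longer requires custom container logic.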
42:11 Ryan – “I do get the advantages of this. Most of my use cases in GitHub Autopilot or Cloud Code it’s running Shell to do lots of things, especially executing tests, and so for CI-CD type workflows, you couldn’t do anything without it. I’m really curious how teams were working around this; people that were previously using Agent Core, because I bet that is ugly. But yeah, it’s going to be dangerous.”
42:56 Amazon Inspector expands agentless EC2 scanning and introduces Windows KB-based findings
- Amazon Inspector now supports agentless EC2 scanning for a broader range of software, including WordPress, Apache HTTP Server, Python packages, and Ruby gems, plus Windows OS vulnerabilities, with no configuration changes required for existing customers.
- The new Windows KB-based findings consolidate multiple CVEs addressed by a single Microsoft patch into one finding, surfacing the highest CVSS score, EPSS score, and exploit availability, which reduces noise and makes remediation more straightforward.
- All existing CVE-based Windows OS findings will automatically transition to KB-based findings, meaning security teams will see fewer duplicate alerts and can map findings directly to specific Microsoft patches via included KB article links.
- The agentless approach lowers the operational overhead for security teams managing large EC2 fleets, particularly in environments where installing and maintaining agents is restricted or impractical.
- Both capabilities are available across all AWS Regions where Amazon Inspector is currently offered, and pricing follows the existing Inspector model based on instance scanning volume, so customers should review the Inspector pricing page for current rates.
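The consolidation described above, where many CVEs fixed by one Microsoft patch become one finding, can be sketched as a simple group-by. The data shapes and identifiers below are made up for illustration and are not Amazon Inspector's finding schema. We also assume "highest" applies to both the CVSS and EPSS scores, and that exploit availability is surfaced if any grouped CVE has a known exploit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CveFinding:
    cve_id: str
    kb_id: str  # the patch (KB article) that fixes this CVE
    cvss: float
    epss: float
    exploit_available: bool


@dataclass(frozen=True)
class KbFinding:
    kb_id: str
    cve_ids: tuple[str, ...]
    max_cvss: float
    max_epss: float
    exploit_available: bool


def consolidate_by_kb(findings: list[CveFinding]) -> list[KbFinding]:
    """Collapse per-CVE findings into one finding per KB, keeping worst-case scores."""
    grouped: dict[str, list[CveFinding]] = defaultdict(list)
    for finding in findings:
        grouped[finding.kb_id].append(finding)
    return [
        KbFinding(
            kb_id=kb_id,
            cve_ids=tuple(sorted(f.cve_id for f in group)),
            max_cvss=max(f.cvss for f in group),
            max_epss=max(f.epss for f in group),
            exploit_available=any(f.exploit_available for f in group),
        )
        for kb_id, group in sorted(grouped.items())
    ]


if __name__ == "__main__":
    raw = [  # placeholder identifiers, not real CVEs or KB articles
        CveFinding("CVE-0000-0001", "KB0000001", cvss=7.5, epss=0.02, exploit_available=False),
        CveFinding("CVE-0000-0002", "KB0000001", cvss=9.8, epss=0.40, exploit_available=True),
        CveFinding("CVE-0000-0003", "KB0000002", cvss=5.3, epss=0.01, exploit_available=False),
    ]
    for kb in consolidate_by_kb(raw):
        print(kb)
```

Three alerts become two, and each one maps to a single patch to install, which is the noise reduction the announcement is describing.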
43:33 Justin – “I’m actually shocked this wasn’t already there, because CVE is really just the generic way that you would find these, but typically they’re always linked to a knowledge-based article which then typically links you to the patch, so I don’t know how people got from the CVE to the patch without this before, other than maybe the CVE mentions the KB articles.”
22:53 Amazon ECR now supports pull-through cache for Chainguard
- Amazon ECR pull-through cache now supports Chainguard as an upstream registry source, allowing customers to automatically sync Chainguard container images into ECR without building custom synchronization workflows.
- Chainguard images are known for their minimal attack surface and security-focused builds, so pairing them with ECR’s native image scanning and lifecycle policies gives teams a more integrated security posture for their container supply chain.
- The practical benefit here is operational simplicity: teams using Chainguard images at scale no longer need separate tooling to keep images current, as ECR handles the sync automatically and frequently.
- Cached Chainguard images inherit standard ECR capabilities, including lifecycle policies for cost management and image scanning, which means customers get consistent governance across both their own images and upstream Chainguard images.
- The feature is available in all AWS regions where ECR pull-through cache is supported, and pricing follows standard ECR storage and data transfer rates with no additional charge specific to the Chainguard integration. Full details are in the ECR pull-through cache documentation here.
46:22 Matt – “It’s massive, but checks a box for your security team, right, that doesn’t want to understand how containers work. Just use this one, and you’ll have to worry about it. It’s like, but I can install anything I want on it. So is it actually going to help?”
47:57 AWS at 20*: Inside the rise of Amazon’s cloud empire, and what’s at stake in the AI era
- AWS turns 20 this month, growing from 10 cents per compute hour in 2006 to nearly $129 billion in annual revenue, which would place it in the Fortune 500 top 40 as a standalone company.
- The article traces how S3 and EC2 established the pay-per-use primitive model that directly undercut Oracle-style licensing and reshaped enterprise IT economics.
- Bedrock has become the fastest-growing service in AWS history, surpassing 100,000 customers and generating multi-billion dollar revenue with 60% quarter-over-quarter spending growth. AWS built it as a multi-model platform rather than pushing a single in-house option, following the same pattern it used with CPUs and GPUs by offering AMD, Intel, Graviton, Nvidia, and Trainium alongside each other.
- Project Rainier, an AI compute cluster powered by over 500,000 Trainium2 chips in Indiana, represents AWS attempting to reduce dependence on Nvidia by building its own silicon stack from chip to data center.
- The OpenAI partnership, worth up to $100 billion in cloud commitments over eight years, brings OpenAI workloads onto Trainium chips, making it the second major AI lab after Anthropic to commit to Amazon’s custom silicon.
- AWS still leads cloud revenue at over $116 billion annually, but Azure at $75 billion and Google Cloud at $50 billion annual run rates show the gap narrowing, particularly in AI workloads.
- Corey Quinn’s Cisco analogy is worth discussing: AWS could remain profitable and essential while becoming less central to where AI innovation actually happens.
- Jassy has publicly projected AWS could reach $600 billion in annual revenue by 2036 with AI as the driver, backing that with $200 billion in capital expenditure planned for this year alone, which would consume nearly all of Amazon’s operating cash flow.
- Happy Birthday
49:37 AWS MCP Server (Preview) now with enhanced monitoring and semantic search capability
- AWS MCP Server in preview now automatically publishes metrics to CloudWatch under the AWS-MCP namespace at no additional cost, covering invocation counts, success rates, client errors, server errors, and throttling for individual tools like the AWS API caller and Agent SOP retriever.
- Agent SOPs are pre-built, tested workflows that guide AI assistants through complex multi-step AWS tasks, and the documentation search tool now uses semantic similarity so agents can discover the right SOP through natural language queries rather than exact keyword matching.
- The CloudWatch integration addresses a previous gap where customers had no visibility into agent-driven changes, enabling teams to track usage patterns, identify permission issues, and configure alarms when error rates exceed defined thresholds.
- The service is currently available only in US East (N. Virginia) in preview, which is worth noting for teams with data residency requirements or those operating primarily in other regions.
- For listeners building AI-assisted infrastructure automation, this update provides a practical observability layer for MCP-based agents, which is increasingly relevant as teams adopt AI assistants for AWS operations tasks.
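As a rough idea of what alarming on these metrics might look like, the sketch below builds the keyword arguments for CloudWatch's `put_metric_alarm` call. Only the `AWS-MCP` namespace comes from the announcement. The metric name `ServerErrors` and the `ToolName` dimension are placeholders we invented, so check the actual metric and dimension names in the CloudWatch console before using anything like this. For simplicity it alarms on an error count rather than an error rate, which would need metric math.

```python
def build_error_alarm(tool_name: str, max_errors_per_period: float) -> dict:
    """Build put_metric_alarm arguments for a per-tool server error alarm."""
    return {
        "AlarmName": f"mcp-{tool_name}-server-errors",
        "Namespace": "AWS-MCP",  # namespace named in the announcement
        "MetricName": "ServerErrors",  # hypothetical metric name
        "Dimensions": [{"Name": "ToolName", "Value": tool_name}],  # hypothetical dimension
        "Statistic": "Sum",
        "Period": 300,  # seconds per evaluation window
        "EvaluationPeriods": 1,
        "Threshold": max_errors_per_period,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data means no errors reported
    }


if __name__ == "__main__":
    params = build_error_alarm("aws-api-caller", max_errors_per_period=5)
    print(params)
    # With boto3 installed and credentials configured, the alarm could be
    # created with:
    #   import boto3
    #   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```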
50:26 Ryan – “Why did everything go offline? Now you can find out!”
GCP
50:59 CloudSQL read pools support autoscaling
- Cloud SQL read pools, now generally available for Enterprise Plus edition, let you provision up to 20 read replicas behind a single load-balance