Vanishing Gradients

78 episodes

Privacy Theater Is Not Privacy Engineering: What It Actually Takes to Ship Safe AI

Apr 15, 2026 · 1h 6m

LLM Architecture in 2026: What You Need to Know with Sebastian Raschka

Apr 13, 2026 · 1h 18m

Episode 72: Why Agents Solve the Wrong Problem (and What Data Scientists Do Instead)

I often see what I would consider to be b******t evals, especially in data, like write this dumb SQL. Almost every one of these dumb SQL questions that I’ve seen for benchmarks are just so either obviously easy or overwhelmingly adversarial. They just, they don’t feel valuable as a data scientist, it’s something that you probably would never ask a real data scientist to do. So I went out of my way to create real ones. Let me read one to you.

Bryan Bischof, Head of AI at Theory Ventures, joins Hugo to talk about what happened when 150 people spent six hours using AI agents to answer real data science questions across SQL tables, log files, and 750,000 PDFs.

They Discuss:

* Failure Funnels, pinpoint where agent reasoning breaks down using causal-chain binary evaluations instead of vague 1-5 scales;
* Median Score: 23 out of 65, what happened when world-class engineers turned agents loose on real data work, and why general-purpose coding agents with human prodding beat fancy frameworks;
* Zero-Cost Submissions Kill Trust, without a penalty for wrong answers, agents hill-climb to correct submissions through brute force instead of building confidence;
* Data Science is “Zooming”, moving beyond binary decisions to iterative problem framing, refining “does our inventory suck?” into a tractable hypothesis;
* MCP as Semantic Layer, model your organization’s proprietary knowledge once and distribute it to whatever LLM interface your team prefers;
* The Subagent vs. Tool Debate, a distinction that adds cognitive load without hiding complexity;
* Self-Orchestration Gap, agents don’t yet realize they should trigger specialized extraction frameworks like DocETL instead of reading 750K PDFs one by one;
* The Future of Evals, from vibe checks to objective functions and continuous user feedback that lets systems converge on reliability.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!

LINKS

* Bryan Bischof on Twitter/X
* Bryan Bischof on LinkedIn
* Theory Ventures
* The Hunt for a Trustworthy Data Agent (blog post)
* America’s Next Top Modeler GitHub repo
* Hamel’s evals FAQ: How do I evaluate agentic workflows?
* DocETL
* LLM Judges and AI Agents at Scale (Hugo’s podcast with Shreya Shankar)
* When Your Metrics Are Lying (Cimo Labs)
* Lessons from a Year of Building with LLMs (livestream on YouTube)
* Bryan Bischof: The Map is Not the Territory (YouTube)
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort has started. Registration is still open. All sessions are recorded so don’t worry about having missed any. Here is a 25% discount code for readers. 👈

Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
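For readers who want to try the failure-funnel idea, here is a minimal sketch in Python. It is not Bryan's implementation: the stage names and the graded runs are hypothetical, and the only assumption is that each agent run is scored with an ordered chain of binary checks, where a later check only counts if every earlier one passed.

```python
from collections import Counter

# Hypothetical causal chain of binary checks, ordered from earliest to latest.
STAGES = ["found_right_table", "wrote_valid_query", "interpreted_result", "answer_correct"]


def first_failure(run: dict) -> str:
    """Return the first stage a run failed, or 'passed_all' if none failed."""
    for stage in STAGES:
        if not run.get(stage, False):
            return stage
    return "passed_all"


def failure_funnel(runs: list[dict]) -> list[tuple[str, int]]:
    """Count how many runs are still on track after each stage, in order."""
    drops = Counter(first_failure(run) for run in runs)
    remaining = len(runs)
    funnel = []
    for stage in STAGES:
        remaining -= drops[stage]
        funnel.append((stage, remaining))
    return funnel


# Made-up grading records for three agent runs (missing keys count as failures).
runs = [
    {"found_right_table": True, "wrote_valid_query": True, "interpreted_result": True, "answer_correct": True},
    {"found_right_table": True, "wrote_valid_query": False},
    {"found_right_table": True, "wrote_valid_query": True, "interpreted_result": False},
]

for stage, survivors in failure_funnel(runs):
    print(f"{stage}: {survivors} of {len(runs)} runs still on track")
```

The stage with the biggest drop is where to look first, which is the kind of signal a single 1-5 score cannot give.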

Mar 20, 2026 · 1h 33m

Episode 71: Durable Agents - How to Build AI Systems That Survive a Crash with Samuel Colvin

Our thesis is that AI is still just engineering… those people who tell us for fun and profit, that somehow AI is so, so profound, so new, so different from anything that’s gone before that it somehow eclipses the need for good engineering practice are wrong. We need that good engineering practice still, and for the most part, most things are not new. But there are some things that have become more important with AI. One of those is durability.

Samuel Colvin, Creator of Pydantic AI, joins Hugo to talk about applying battle-tested software engineering principles to build durable and reliable AI agents.

They Discuss:

* Production agents require engineering-grade reliability: Unlike messy coding agents, production agents need high constraint, reliability, and the ability to perform hundreds of tasks without drifting into unusual behavior;
* Agents are the new “quantum” of AI software: Modern architecture uses discrete “agentlets”: small, specialized building blocks stitched together for sub-tasks within larger, durable systems;
* Stop building “chocolate teapot” execution frameworks: Ditch rudimentary snapshotting; use battle-tested durable execution engines like Temporal for robust retry logic and state management;
* AI observability will be a native feature: In five years, AI observability will be integrated, with token counts and prompt traces becoming standard features of all observability platforms;
* Split agents into deterministic workflows and stochastic activities: Ensure true durability by isolating deterministic workflow logic from stochastic activities (IO, LLM calls) to cache results and prevent redundant model calls;
* Type safety is essential for enterprise agents: Sacrificing type safety for flexible graphs leads to unmaintainable software; professional AI engineering demands strict type definitions for parallel node execution and state recovery;
* Standardize on OpenTelemetry for portability: Use OpenTelemetry (OTel) to ensure agent traces and logs are portable, preventing vendor lock-in and integrating seamlessly into existing enterprise monitoring.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!

LINKS

* Samuel Colvin on LinkedIn
* Pydantic
* Pydantic Stack Demo repo
* Deep research example code
* Temporal
* DBOS (Postgres alternative to Temporal)
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort starts March 10, 2026. Here is a 25% discount code for listeners. 👈
https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs
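To make the workflow/activity split concrete, here is a minimal sketch in plain Python. It is not Temporal's or Pydantic AI's API: `Journal`, `run_activity`, and `fake_llm_call` are hypothetical names, and the sketch assumes the workflow is deterministic, so it requests the same activities in the same order when it is replayed after a crash.

```python
import json
import tempfile
from pathlib import Path


class Journal:
    """Hypothetical on-disk record of completed activity results."""

    def __init__(self, path: Path):
        self.path = path
        self.results = json.loads(path.read_text()) if path.exists() else {}

    def run_activity(self, key: str, fn, *args):
        # Replay: if this step already finished before a crash, reuse its result.
        if key in self.results:
            return self.results[key]
        result = fn(*args)  # stochastic work: IO, LLM call, and so on
        self.results[key] = result
        self.path.write_text(json.dumps(self.results))  # persist before moving on
        return result


calls = []  # tracks how often the "model" is really invoked


def fake_llm_call(prompt: str) -> str:
    """Stand-in for a real model call; records each invocation."""
    calls.append(prompt)
    return f"summary of: {prompt}"


def workflow(journal: Journal) -> str:
    """Deterministic orchestration: same steps, same order, every run."""
    outline = journal.run_activity("outline", fake_llm_call, "draft an outline")
    return journal.run_activity("report", fake_llm_call, outline)


journal_path = Path(tempfile.mkdtemp()) / "journal.json"
first = workflow(Journal(journal_path))   # fresh run: two model calls
second = workflow(Journal(journal_path))  # simulated restart: replayed from disk
print(first == second, len(calls))
```

The second run returns the same report without calling the model again, which is the caching behavior the bullet above describes. A real durable execution engine adds retries, timeouts, and versioning on top of this idea.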

Feb 18, 2026 · 51 min

Episode 70: 1,400 Production AI Deployments

There’s a company who spent almost $50,000 because an agent went into an infinite loop and they forgot about it for a month. It had no failures and I guess no one was monitoring these costs. It’s nice that people do write about that in the database as well. After it happened, they said: watch out for infinite loops. Watch out for cascading tool failures. Watch out for silent failures where the agent reports it has succeeded when it didn’t!

We Discuss:

* Why the most successful teams are ripping out and rebuilding their agent systems every few weeks as models improve, and why over-engineering now creates technical debt you can’t afford later;
* The $50,000 infinite loop disaster and why “silent failures” are the biggest risk in production: agents confidently report success while spiraling into expensive mistakes;
* How ELIOS built emergency voice agents with sub-400ms response times by aggressively throwing away context every few seconds, and why these extreme patterns are becoming standard practice;
* Why DoorDash uses a three-tier agent architecture (manager, progress tracker, and specialists) with a persistent workspace that lets agents collaborate across hours or days;
* Why simple text files and markdown are emerging as the best “continual learning” layer: human-readable memory that persists across sessions without fine-tuning models;
* The 100-to-1 problem: for every useful output, tool-calling agents generate 100 tokens of noise, and the three tactics (reduce, offload, isolate) teams use to manage it;
* Why companies are choosing Gemini Flash for document processing and Opus for long reasoning chains, and how to match models to your actual usage patterns;
* The debate over vector databases versus simple grep and cat, and why giving agents standard command-line tools often beats complex APIs;
* What “re-architect” as a job title reveals about the shift from 70% scaffolding / 30% model to 90% model / 10% scaffolding, and why knowing when to rip things out may be the most important skill today.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!

Show Notes Links

* Alex Strick van Linschoten on LinkedIn
* Alex Strick van Linschoten on Twitter/X
* LLMOps Database
* LLMOps Database Dataset on Hugging Face
* Hugo’s MCP Server for LLMOps Database
* Alex’s Blog: What 1,200+ Production Deployments Reveal About LLMOps in 2025
* Previous Episode: Practical Lessons from 750 Real-World LLM Deployments
* Previous Episode: Tales from 400 LLM Deployments
* Context Rot Research by Chroma
* Hugo’s Post: AI Agent Harness - 3 Principles for Context Engineering
* Hugo’s Post: The Rise of Agentic Search
* Episode with Nick Moy: The Post-Coding Era
* Hugo’s Personal Podcast Prep Skill Gist
* Claude Tool Search Documentation
* Gastown on GitHub (Steve Yegge)
* Welcome to Gastown by Steve Yegge
* ZenML - Open Source MLOps & LLMOps Framework
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast livestream on YouTube
* Join the final cohort of our Building AI Applications course in March, 2026 (25% off for listeners)

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort starts March 10, 2026. Here is a 25% discount code for readers. 👈
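One cheap defense against the runaway loop described above is a hard budget on every agent run. The following is a minimal sketch, not something taken from the LLMOps Database: `RunBudget`, `BudgetExceeded`, and `stuck_agent_step` are hypothetical names, and the per-step cost is a made-up number.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an agent run goes past its step or spend limit."""


class RunBudget:
    """Hypothetical guard that caps both the number of steps and the total spend."""

    def __init__(self, max_steps: int, max_cost_usd: float):
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.steps = 0
        self.cost_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Record one step, then stop the run loudly instead of looping forever.
        self.steps += 1
        self.cost_usd += cost_usd
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost limit ${self.max_cost_usd:.2f} exceeded")


def stuck_agent_step() -> float:
    """Stand-in for one model or tool call that never finishes the task; returns its cost."""
    return 0.25  # made-up cost in USD


budget = RunBudget(max_steps=100, max_cost_usd=2.00)
try:
    while True:  # an agent that never decides it is done
        budget.charge(stuck_agent_step())
except BudgetExceeded as err:
    print(f"stopped after {budget.steps} steps: {err}")
```

Raising an exception turns a silent, month-long loop into a visible failure that monitoring can alert on.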

Feb 12, 2026 · 1h 9m

Episode 69: Python is Dead. Long Live Python! With the Creators of pandas & Parquet

> It’s the agent writing the code. And it’s the development loop of writing the code, building testing, write the code, build test and iterating. And so I do think we’ll see for many types of software, a shift away from Python towards other programming languages. I think Go is probably the best language for those like other types of software projects. And like I said, I haven’t written a line of Go code in my life.

– Wes McKinney (creator of pandas, Principal Architect at Posit)

Wes McKinney, Marcel Kornacker, and Alison Hill join Hugo to talk about the architectural shift for multimodal AI, the rise of “agent ergonomics,” and the evolving role of developers in an AI-generated future.

We Discuss:

* Agent Ergonomics: Optimize for agent iteration speed, shifting from human coding to fast test environments, potentially favoring languages like Go;
* Adversarial Code Review: Deploy diverse AI models to peer-review agent-generated code, catching subtle bugs humans miss;
* Multimodal Data Verbs: Make operations like resizing and rotating native to your database to eliminate data-plumbing bottlenecks;
* Taste as Differentiator: Value “taste” (the ability to curate and refine the best output from countless AI-generated options) over sheer execution speed;
* 100x Software Volume: Embrace ephemeral, just-in-time software; prioritize aggressive generation and adversarial testing over careful planning for quality.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript of the workshop & fireside chat here in NotebookLM: If you do so, let us know anything you find in the comments!

This was a fireside chat at the end of a livestreamed workshop we did on building multimodal AI systems with Pixeltable. Check out the full workshop below (all code here on GitHub):

Links and Resources

* Wes McKinney on LinkedIn
* Marcel Kornacker on LinkedIn
* Alison Hill on LinkedIn
* Spicy Takes
* Palmer Penguins
* Pixeltable
* Posit
* Positron
* Building Multimodal AI Systems Workshop Repository
* Pixeltable Docs: LLM Tool Calling with MCP Servers
* Pixeltable Docs: Working with Pydantic
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube
* Join the final cohort of our Building AI Applications course in March, 2026 (25% off for listeners): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs

What people said during the workshop

* “I think the interface looks amazing/simple. Strong work! 🦾” - @goldentribe
* “This is quite amazing. Watching this I felt the same way when I first learnt pandas, NumPy and scikit and how well I was able to manipulate and wrangle data. PixelTable feels seamless and looks as good as those legendary frameworks but for Multimodal Data.” - @vinod7
* “This is all extremely cool to see, I love the API and the approach.” - @steveb4191
* “Thanks so much, Hugo! That was very insightful! Great work Alison and Marcel!” - @vinod7
* “Just wrapped up watching a replay of the Pixeltable workshop. So cool!! Love the notebooks and working examples. The important parts were covered and worked beautifully 🕺” - @therobbrennan

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Here is a discount code for readers. 👈

Feb 3, 2026 · 55 min

Episode 68: A Builder’s Guide to Agentic Search & Retrieval with Doug Turnbull & John Berryman

The best way to build a horrible search product? Don’t ever measure anything against what a user wants.

Search veterans Doug Turnbull (Led Search at Reddit + Shopify; Wrote Relevant Search + AI Powered Search) and John Berryman (Early Engineer on GitHub Copilot; Author of Relevant Search + Prompt Engineering for LLMs) join Hugo to talk about how to build Agentic Search Applications.

We Discuss:

* The evolution of information retrieval as it moves from traditional keyword search toward “agentic search” and what this means for builders;
* John’s five-level maturity model (you can prototype today!) for AI adoption, moving from traditional search to conversational AI to asynchronous research assistants that reason about result quality;
* The Agentic Search Builders Playbook, including why and how you should “hand-roll” your own agentic loops to maintain control;
* The importance of “revealed preferences” that LLM judges often miss: evaluations must use real clickstream data to capture what semantic relevance alone cannot infer;
* Patterns and Anti-Patterns for Agentic Search Applications;
* Learning and teaching Search in the Age of Agents.

You can find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!

Doug and Hugo are also doing a free lightning lesson on Feb 20 about How To Build Your First Agentic Search Application! You’ll walk away with a framework & code to build your first agentic search app. Register here to join live or get the recording after.

Links and Resources

Guests

* Arcturus Labs (John’s website)
* Software Doug (Doug’s website)
* John Berryman on LinkedIn
* Doug Turnbull on LinkedIn

Books

* Relevant Search by Doug Turnbull & John Berryman (Manning)
* AI-Powered Search by Doug Turnbull (Manning)
* Prompt Engineering for LLMs by John Berryman (O’Reilly)

Blog Posts

* Incremental AI Adoption for E-commerce by John Berryman
* Roaming RAG – RAG without the Vector Database by John Berryman
* Agents Turn Simple Keyword Search into Compelling Search Experiences by Doug Turnbull
* A Simple Agentic Loop with Just Python Functions by Doug Turnbull
* Agentic Code Generation to Optimize a Search Reranker by Doug Turnbull
* LLM Judges Aren’t the Shortcut You Think by Doug Turnbull (Hugo’s 5 minute video below)
* Malleable Software by Ink & Switch (inc. Geoffrey Litt)
* Patterns and Anti-Patterns for Building with AI by Hugo Bowne-Anderson

Other Resources

* The Rise of Agentic Search, a recent VG Podcast with Jeff Huber
* Karpathy on Cognitive Core LLMs
* Cheat at Search with Agents course by Doug Turnbull (use code: vanishinggradients for $200 off)
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube
* Join the final cohort of our Building AI Applications course in Q1, 2026 (25% off for listeners)

Timestamps (for YouTube livestream)

* 00:00 How to Build Agentic Search & Retrieval Systems
* 02:48 Defining Search and AI
* 03:26 Evolution of Search Technologies
* 08:46 Search in E-commerce and Other Domains
* 12:15 Combining Search and AI: RAG and LLMs
* 23:50 User Intent and Search Optimization
* 29:47 Levels of AI Integration in Search
* 32:25 Exploring the Complexity of Search in Various Domains
* 33:49 The Evolution and Impact of Agentic Search
* 34:07 Defining Terms: RAG and Agentic Search
* 34:52 The Research Loop and Tool Interaction
* 35:55 Formal Protocols and Structured Outputs
* 38:39 Building Agentic Search Experiences: Tips and Advice
* 41:50 The Importance of Empathy in AI and Search Development
* 54:30 The Role of UX in Search Applications
* 01:01:15 Future of Search: Malleable User Interfaces
* 01:02:38 Exploring Malleable Software
* 01:04:20 The Coordination Challenge in Software Development
* 01:05:23 The Impact of Claude Code & Claude Cowork
* 01:06:22 The Future of Knowledge Work with AI
* 01:12:39 Evaluating Search Algorithms with AI
* 01:15:15 The Role of Agents in Search Optimization
* 01:29:55 Teaching AI and Search Techniques
* 01:34:25 Final Thoughts and Farewell

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Here is a discount code for readers. 👈
https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgpod

Jan 23, 2026 · 1h 28m

Episode 67: Saving Hundreds of Hours of Dev Time with AI Agents That Learn

This is continual learning, right? Everyone has been talking about continual learning as the next challenge in AI. Actually, it’s solved. Just tell it to keep some notes somewhere. Sure, it’s not, it’s not machine learning, but in some ways it is because when it will load this text file again, it will influence what it does … And it works so well: it’s easy to understand. It’s easy to inspect, it’s easy to evolve and modify!

Eleanor Berger and Isaac Flaath, the minds behind Elite AI Assisted Coding, join Hugo to talk about how to redefine software development through effective AI-assisted coding, leveraging “specification-first” approaches and advanced agentic workflows.

We Discuss:

* Markdown learning loops: Use simple agents.md files for agents to self-update rules and persist context, creating inspectable, low-cost learning;
* Intent-first development: As AI commoditizes syntax, defining clear specs and what makes a result “good” becomes the core, durable developer skill;
* Effortless documentation: Leverage LLMs to distill messy “brain dumps” or walks-and-talks into structured project specifications, offloading context faster;
* Modular agent skills: Transition from MCP servers to simple markdown-based “skills” with YAML and scripts, allowing progressive disclosure of tool details;
* Scheduled async agents: Break the chat-based productivity ceiling by using GitHub Actions or cron jobs for agents to work on issues, shifting humans to reviewers;
* Automated tech debt audits: Deploy background agents to identify duplicate code, architectural drift, or missing test coverage, leveraging AI to police AI-induced messiness;
* Explicit knowledge culture: AI agents eliminate “cafeteria chat” by forcing explicit, machine-readable documentation, solving the perennial problem of lost institutional knowledge;
* Tiered model strategy: Optimize token spend by using high-tier “reasoning” models (e.g., Opus) for planning and low-cost, high-speed models (e.g., Flash) for execution;
* Ephemeral software specs: With near-zero generation costs, software shifts from static products to dynamic, regenerated code based on a permanent, underlying specification.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!

👉 Eleanor & Isaac are teaching the next cohort of their Elite AI Assisted Coding course starting this week. They’re kindly giving readers of Vanishing Gradients 25% off. Use this link. 👈

Show Notes

* Elite AI Assisted Coding Substack
* Eleanor Berger on LinkedIn
* Isaac Flaath on LinkedIn
* Elite AI Assisted Coding Course (Use the code HUGO for 25% off)
* How to Build an AI Agent with AI-Assisted Coding
* Eleanor/Isaac’s blog post “The SpecFlow Process for AI Coding”
* Eleanor’s growing list of (free) tutorials on Agent Skills
* Eleanor’s YouTube playlist on agent skills
* Eleanor’s blog post “Are (Agent) Skills the New Apps”
* Simon Willison’s blog post on skills/general computer automation/data journalism agents
* Eleanor/Isaac’s blog post about asynchronous client agents in GitHub actions
* Eleanor/Isaac’s blog post on agentic coding workflows with Hang Yu, Product Lead for Qoder @ Alibaba
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube
* Join the final cohort of our Building AI Applications course in Q1, 2026 (25% off for listeners)

Timestamps (for YouTube livestream)

* 00:00 Introduction to Elite AI Assisted Coding
* 02:24 Starting a New AI Project: Best Practices
* 03:19 The Importance of Context in AI Projects
* 07:19 Specification-First Planning
* 12:01 Sharing Intent and Documentation
* 18:27 Living Documentation and Continual Learning
* 24:36 Choosing the Right Tools and Models
* 29:18 Managing Costs and Token Usage
* 40:16 Using Different Models for Different Tasks
* 43:41 Mastering One Model for Better Results
* 44:54 The Rise of Agent Skills in 2026
* 45:34 Understanding the Importance of Skills
* 47:18 Practical Applications of Agent Skills
* 01:11:43 Security Concerns with AI Agents
* 01:15:02 Collaborative AI-Assisted Coding
* 01:18:59 Future of AI-Assisted Coding
* 01:22:27 Key Takeaways for Effective AI-Assisted Coding

Live workshop with Eleanor, Isaac, & Hugo

We also recently did a 90-minute workshop on How to Build an AI Agent with AI-Assisted Coding.

We wrote a blog post on it for those who don’t have 90 minutes right now. Check it out here.

I then made a 4 min video about it all for those who don’t have time to read the blog post.

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Here is a discount code for readers. 👈
https://maven.com/hugo-stefan/building-ai-apps-ds-a
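The “keep some notes somewhere” loop from this episode fits in a few lines of Python. This is a minimal sketch, not Eleanor and Isaac's setup: `remember` and `build_prompt` are hypothetical helpers, the lessons are made up, and in a real project the agent itself would decide what to write to its agents.md file.

```python
import tempfile
from pathlib import Path


def remember(notes_path: Path, lesson: str) -> None:
    """Append one lesson to the notes file, skipping exact duplicates."""
    existing = notes_path.read_text() if notes_path.exists() else "# Agent notes\n"
    bullet = f"- {lesson}"
    if bullet not in existing.splitlines():
        notes_path.write_text(existing.rstrip("\n") + "\n" + bullet + "\n")


def build_prompt(notes_path: Path, task: str) -> str:
    """Prepend the persisted notes to the next task so they shape the next run."""
    notes = notes_path.read_text() if notes_path.exists() else ""
    return f"{notes}\n## Task\n{task}\n"


# A throwaway directory stands in for the project root.
notes_file = Path(tempfile.mkdtemp()) / "agents.md"
remember(notes_file, "Run the test suite before opening a pull request.")
remember(notes_file, "Run the test suite before opening a pull request.")  # ignored duplicate
remember(notes_file, "Use the project's existing logging helper.")
print(build_prompt(notes_file, "Fix the flaky login test."))
```

Because the memory is a plain markdown file, a human can read, edit, or delete any lesson, which is the inspectability the quote at the top of the episode is praising.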

Jan 14, 2026 · 1h 18m

Episode 66: The Agent Paradox - Why Moderna's Most Productive AI Systems Aren't Agents

Surprise. We don’t have agents. I actually went in and did an audit of all the LLM applications that we’ve developed internally. And if you were to take Anthropic’s definition of workflow versus agent, we don’t have agents. I would not classify any of our applications as agents.

Eric Ma, who leads Research Data Science in the Data Science and AI group at Moderna, joins Hugo to talk about moving past the hype of autonomous agents to build reliable, high-value workflows.

We discuss:

* Reliable Workflows: Prioritize rigid workflows over dynamic AI agents to ensure reliability and minimize stochasticity in production environments;
* Permission Mapping: The true challenge in regulated environments is security, specifically mapping permissions across source documents, vector stores, and model weights;
* Trace Log Risk: LLM execution traces pose a regulatory risk, inadvertently leaking restricted data like trade secrets or personal information;
* High-Value Data Work: LLMs excel at transforming archived documents and freeform forms into required formats, offloading significant “janitorial” work from scientists;
* “Non-LLM” First: Solve problems with simpler tools like Python or ML models before LLMs to ensure robustness and eliminate generative AI stochasticity;
* Contextual Evaluation: Tailor evaluation rigor to consequences; low-stakes tools can be “vibe-checked,” while patient safety outputs demand exhaustive error characterization;
* Serverless Biotech Backbone: Serverless infrastructure like Modal and reactive notebooks such as Marimo empower biotech data scientists to deploy rapidly without heavy infrastructure overhead.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!

👉 Eric & Hugo have a free upcoming livestream workshop: Building Tools for Thinking with AI (register to join live or get the recording afterwards) 👈

Show notes

* Eric’s website
* Eric Ma on LinkedIn
* Eric’s blog
* Eric’s data science newsletter
* Building Effective AI Agents by the Anthropic team
* Wow, Marimo from Eric’s blog
* Wow, Modal from Eric’s blog
* Upcoming Events on Luma
* Watch the podcast video on YouTube
* Join the final cohort of our Building AI Applications course in Q1, 2026 (35% off for listeners)

Timestamps

* 00:00 Defining Agents and Workflows
* 02:04 Challenges in Regulated Environments
* 04:24 Eric Ma's Role at Moderna, Leading Research Data Science in the Data Science and AI Group
* 12:37 Document Reformatting and Automation
* 15:42 Data Security and Permission Mapping
* 20:05 Choosing the Right Model for Production
* 20:41 Evaluating Model Changes with Benchmarks
* 23:10 Vibe-Based Evaluation vs. Formal Testing
* 27:22 Security and Fine-Tuning in LLMs
* 28:45 Challenges and Future of Fine-Tuning
* 34:00 Security Layers and Information Leakage
* 37:48 Wrap-Up and Final Remarks

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort is in Q1, 2026. Here is a 35% discount code for readers. 👈
https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgch

Jan 8, 2026 · 42 min

Episode 65: The Rise of Agentic Search

We’re really moving from a world where humans are authoring search queries and humans are executing those queries and humans are digesting the results to a world where AI is doing that for us.

Jeff Huber, CEO and co-founder of Chroma, joins Hugo to talk about how agentic search and retrieval are changing the very nature of search and software for builders and users alike.

We Discuss:

* “Context engineering”, the strategic design and engineering of what context gets fed to the LLM (data, tools, memory, and more), which is now essential for building reliable, agentic AI systems;
* Why simply stuffing large context windows is no longer feasible due to “context rot” as AI applications become more goal-oriented and capable of multi-step tasks;
* A framework for precisely curating and providing only the most relevant, high-precision information to ensure accurate and dependable AI systems;
* The “agent harness”, the collection of tools and capabilities an agent can access, and how to construct these advanced systems;
* Emerging best practices for builders, including hybrid search as a robust default, creating “golden datasets” for evaluation, and leveraging sub-agents to break down complex tasks;
* The major unsolved challenge of agent evaluation, emphasizing a shift towards iterative, data-centric approaches.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!

Oh! One more thing: we’ve just announced a Vanishing Gradients livestream for January 21 that you may dig:

* A Builder’s Guide to Agentic Search & Retrieval with Doug Turnbull and John Berryman (register to join live or get the recording afterwards).

Show notes

* Jeff Huber on Twitter
* Jeff Huber on LinkedIn
* Try Chroma!
* Context Rot: How Increasing Input Tokens Impacts LLM Performance by The Chroma Team
* AI Agent Harness, 3 Principles for Context Engineering, and the Bitter Lesson Revisited
* From Context Engineering to AI Agent Harnesses: The New Software Discipline
* Generative Benchmarking by The Chroma Team
* Effective context engineering for AI agents by The Anthropic Team
* Making Sense of Millions of Conversations for AI Agents by Ivan Leo (Manus) and Hugo
* How we built our multi-agent research system by The Anthropic Team
* Upcoming Events on Luma
* Watch the podcast video on YouTube

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort is in Q1, 2026. Here is a 35% discount code for readers. 👈
https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgch

Dec 19, 2025 · 51 min

Episode 64: Data Science Meets Agentic AI with Michael Kennedy (Talk Python)

We have been sold a story of complexity. Michael Kennedy (Talk Python) argues we can escape this by relentlessly focusing on the problem at hand, reducing costs by orders of magnitude in software, data, and AI.

In this episode, Michael joins Hugo to dig into the practical side of running Python systems at scale. They connect these ideas to the data science workflow, exploring which software engineering practices allow AI teams to ship faster and with more confidence. They also detail how to deploy systems without unnecessary complexity and how Agentic AI is fundamentally reshaping development workflows.

We talk through:

- Escaping complexity hell to reduce costs and gain autonomy
- The specific software practices, like the "Docker Barrier", that matter most for data scientists
- How to replace complex cloud services with a simple, robust $30/month stack
- The shift from writing code to "systems thinking" in the age of Agentic AI
- How to manage the people-pleasing psychology of AI agents to prevent broken code
- Why struggle is still essential for learning, even when AI can do the work for you

LINKS

- Talk Python In Production, the Book! (https://talkpython.fm/books/python-in-production)
- Just Enough Python for Data Scientists Course (https://training.talkpython.fm/courses/just-enough-python-for-data-scientists)
- Agentic AI Programming for Python Course (https://training.talkpython.fm/courses/agentic-ai-programming-for-python)
- Talk Python To Me (https://talkpython.fm/) and a recent episode with Hugo as guest: Building Data Science with Foundation LLM Models (https://talkpython.fm/episodes/show/526/building-data-science-with-foundation-llm-models)
- Python Bytes podcast (https://pythonbytes.fm/)
- Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
- Watch the podcast video on YouTube (https://youtube.com/live/jfSRxxO3aRo?feature=share)
- Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (35% off for listeners) (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav)

Dec 3, 20251h 2m

Episode 63: Why Gemini 3 Will Change How You Build AI Agents with Ravin Kumar (Google DeepMind)

Gemini 3 is a few days old and the massive leap in performance and model reasoning has big implications for builders: as models begin to self-heal, builders are literally tearing out the functionality they built just months ago... ripping out the defensive coding and reshipping their agent harnesses entirely.Ravin Kumar (Google DeepMind) joins Hugo to break down exactly why the rapid evolution of models like Gemini 3 is changing how we build software. They detail the shift from simple tool calling to building reliable "Agent Harnesses", explore the architectural tradeoffs between deterministic workflows and high-agency systems, the nuance of preventing context rot in massive windows, and why proper evaluation infrastructure is the only way to manage the chaos of autonomous loops.They talk through:- The implications of models that can "self-heal" and fix their own code- The two cultures of agents: LLM workflows with a few tools versus when you should unleash high-agency, autonomous systems.- Inside NotebookLM: moving from prototypes to viral production features like Audio Overviews- Why Needle in a Haystack benchmarks often fail to predict real-world performance- How to build agent harnesses that turn model capabilities into product velocity- The shift from measuring latency to managing time-to-compute for reasoning tasksLINKSFrom Context Engineering to AI Agent Harnesses: The New Software Discipline, a podcast Hugo did with Lance Martin, LangChain (https://high-signal.delphina.ai/episode/context-engineering-to-ai-agent-harnesses-the-new-software-discipline)Context Rot: How Increasing Input Tokens Impacts LLM Performance (https://research.trychroma.com/context-rot)Effective context engineering for AI agents by Anthropic (https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Watch the podcast video on YouTube (https://youtu.be/CloimQsQuJM)Join the final cohort of our 
Building AI Applications course starting Jan 12, 2026 (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Nov 22, 20251h 0m

Episode 62: Practical AI at Work: How Execs and Developers Can Actually Use LLMs

Many leaders are trapped between chasing ambitious, ill-defined AI projects and the paralysis of not knowing where to start. Dr. Randall Olson argues that the real opportunity isn't in moonshots, but in the "trillions of dollars of business value" available right now. As co-founder of Wyrd Studios, he bridges the gap between data science, AI engineering, and executive strategy to deliver a practical framework for execution.In this episode, Randy and Hugo lay out how to find and solve what might be considered "boring but valuable" problems, like an EdTech company automating 20% of its support tickets with a simple retrieval bot instead of a complex AI tutor. They discuss how to move incrementally along the "agentic spectrum" and why treating AI evaluation with the same rigor as software engineering is non-negotiable for building a disciplined, high-impact AI strategy.They talk through:How a non-technical leader can prototype a complex insurance claim classifier using just photos and a ChatGPT subscription.The agentic spectrum: Why you should start by automating meeting summaries before attempting to build fully autonomous agents.The practical first step for any executive: Building a personal knowledge base with meeting transcripts and strategy docs to get tailored AI advice.Why treating AI evaluation with the same rigor as unit testing is essential for shipping reliable products.The organizational shift required to unlock long-term AI gains, even if it means a short-term productivity dip.LINKSRandy on LinkedIn (https://www.zenml.io/llmops-database)Wyrd Studios (https://thewyrdstudios.com/)Stop Building AI Agents (https://www.decodingai.com/p/stop-building-ai-agents)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)🎓 Learn more:In Hugo's course: Building AI Applications for Data Scientists and Software Engineers 
(https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20) — https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20 Next cohort starts November 3: come build with us! Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Oct 31, 202559 min

Episode 61: The AI Agent Reliability Cliff: What Happens When Tools Fail in Production

Most AI teams find their multi-agent systems devolving into chaos, but ML Engineer Alex Strick van Linschoten argues they are ignoring the production reality. In this episode, he draws on insights from the LLM Ops Database (750+ real-world deployments then; now nearly 1,000!) to systematically measure and engineer constraint, turning unreliable prototypes into robust, enterprise-ready AI.Drawing from his work at ZenML, Alex details why success requires scaling down and enforcing MLOps discipline to navigate the unpredictable "Agent Reliability Cliff". He provides the essential architectural shifts, evaluation hygiene techniques, and practical steps needed to move beyond guesswork and build scalable, trustworthy AI products.We talk through:- Why "shoving a thousand agents" into an app is the fastest route to unmanageable chaos- The essential MLOps hygiene (tracing and continuous evals) that most teams skip- The optimal (and very low) limit for the number of tools an agent can reliably use- How to use human-in-the-loop strategies to manage the risk of autonomous failure in high-sensitivity domains- The principle of using simple Python/RegEx before resorting to costly LLM judgesLINKSThe LLMOps Database: 925 entries as of today....submit a use case to help it get to 1K! (https://www.zenml.io/llmops-database)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)🎓 Learn more:-This was a guest Q&A from Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20) — https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20 Next cohort starts November 3: come build with us! Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
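The last talking point above (simple Python/RegEx before costly LLM judges) can be pictured with a minimal sketch. It is an illustration only, not code from the episode: the ticket-ID rule, the 200-character cutoff, and the names `cheap_check`, `evaluate`, and `llm_judge` are all hypothetical.

```python
import re
from typing import Callable, Optional

# Hypothetical rule: a support reply must cite a ticket ID such as "TICKET-1234".
TICKET_ID = re.compile(r"\bTICKET-\d{4}\b")


def cheap_check(output: str) -> Optional[bool]:
    """Return True or False when a deterministic rule can decide, else None."""
    if not output.strip():
        return False  # empty replies always fail
    if TICKET_ID.search(output) is None:
        return False  # missing ticket ID always fails
    if len(output) < 200:
        return True  # short, well-formed replies pass without a judge
    return None  # anything else is left to the judge


def evaluate(output: str, llm_judge: Callable[[str], bool]) -> bool:
    """Run the free checks first; call the (assumed costly) judge only if needed."""
    verdict = cheap_check(output)
    if verdict is not None:
        return verdict
    return llm_judge(output)
```

Here `llm_judge` is any callable you supply; in this sketch it is only reached for outputs the deterministic rules cannot settle, which is where the cost saving would come from.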

Oct 16, 202528 min

Episode 60: 10 Things I Hate About AI Evals with Hamel Husain

Most AI teams find "evals" frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems.Drawing from his experience getting countless teams unstuck, Hamel explains why the solution requires a "revenge of the data scientists." He details the essential mindset shifts, error analysis techniques, and practical steps needed to move beyond guesswork and build AI products you can actually trust.We talk through: The 10(+1) critical mistakes that cause teams to waste time on evals Why "hallucination scores" are a waste of time (and what to measure instead) The manual review process that finds major issues in hours, not weeks A step-by-step method for building LLM judges you can actually trust How to use domain experts without getting stuck in endless review committees Guest Bryan Bischof's "Failure as a Funnel" for debugging complex AI agentsIf you're tired of ambiguous "vibe checks" and want a clear process that delivers real improvement, this episode provides the definitive roadmap.LINKSHamel's website and blog (https://hamel.dev/)Hugo speaks with Philip Carter (Honeycomb) about aligning your LLM-as-a-judge with your domain expertise (https://vanishinggradients.fireside.fm/51)Hamel Husain on Lenny's podcast, which includes a live demo of error analysis (https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill)The episode of VG in which Hamel and Hugo talk about Hamel's "data consulting in Vegas" era (https://vanishinggradients.fireside.fm/9)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Watch the podcast video on YouTube (https://youtube.com/live/QEk-XwrkqhI?feature=share)Hamel's AI evals course, which he teaches with Shreya Shankar (UC Berkeley): starts Oct 6 and this link gives 35% off! 
(https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME) https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME🎓 Learn more:Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Sep 30, 20251h 13m

Episode 59: Patterns and Anti-Patterns For Building with AI

John Berryman (Arcturus Labs; early GitHub Copilot engineer; co-author of Relevant Search and Prompt Engineering for LLMs) has spent years figuring out what makes AI applications actually work in production. In this episode, he shares the “seven deadly sins” of LLM development — and the practical fixes that keep projects from stalling. From context management to retrieval debugging, John explains the patterns he’s seen succeed, the mistakes to avoid, and why it helps to think of an LLM as an “AI intern” rather than an all-knowing oracle. We talk through: - Why chasing perfect accuracy is a dead end - How to use agents without losing control - Context engineering: fitting the right information in the window - Starting simple instead of over-orchestrating - Separating retrieval from generation in RAG - Splitting complex extractions into smaller checks - Knowing when frameworks help — and when they slow you down A practical guide to avoiding the common traps of LLM development and building systems that actually hold up in production.LINKS:Context Engineering for AI Agents, a free, upcoming lightning lesson from John and Hugo (https://maven.com/p/4485aa/context-engineering-for-ai-agents)The Hidden Simplicity of GenAI Systems, a previous lightning lesson from John and Hugo (https://maven.com/p/a8195d/the-hidden-simplicity-of-gen-ai-systems)Roaming RAG – RAG without the Vector Database, by John (https://arcturus-labs.com/blog/2024/11/21/roaming-rag--rag-without-the-vector-database/)Cut the Chit-Chat with Artifacts, by John (https://arcturus-labs.com/blog/2024/11/11/cut-the-chit-chat-with-artifacts/)Prompt Engineering for LLMs by John and Albert Ziegler (https://amzn.to/4gChsFf)Relevant Search by John and Doug Turnbull (https://amzn.to/3TXmDHk)Arcturus Labs (https://arcturus-labs.com/)Watch the podcast on YouTube (https://youtu.be/mKTQGKIUq8M)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)🎓 Learn more:Hugo's course (this episode was a guest Q&A from 
the course): Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Sep 23, 202547 min

Episode 58: Building GenAI Systems That Make Business Decisions with Thomas Wiecki (PyMC Labs)

While most conversations about generative AI focus on chatbots, Thomas Wiecki (PyMC Labs, PyMC) has been building systems that help companies make actual business decisions. In this episode, he shares how Bayesian modeling and synthetic consumers can be combined with LLMs to simulate customer reactions, guide marketing spend, and support strategy. Drawing from his work with Colgate and others, Thomas explains how to scale survey methods with AI, where agents fit into analytics workflows, and what it takes to make these systems reliable. We talk through: Using LLMs as “synthetic consumers” to simulate surveys and test product ideas How Bayesian modeling and causal graphs enable transparent, trustworthy decision-making Building closed-loop systems where AI generates and critiques ideas Guardrails for multi-agent workflows in marketing mix modeling Where generative AI breaks (and how to detect failure modes) The balance between useful models and “correct” models If you’ve ever wondered how to move from flashy prototypes to AI systems that actually inform business strategy, this episode shows what it takes. LINKS:The AI MMM Agent, An AI-Powered Shortcut to Bayesian Marketing Mix Insights (https://www.pymc-labs.com/blog-posts/the-ai-mmm-agent)AI-Powered Decision Making Under Uncertainty Workshop w/ Allen Downey & Chris Fonnesbeck (PyMC Labs) (https://youtube.com/live/2Auc57lxgeU)The Podcast livestream on YouTube (https://youtube.com/live/so4AzEbgSjw?feature=share)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)🎓 Learn more:Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Sep 9, 20251h 0m

Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)

While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply. Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines. We talk through: - Treating LLM workflows as ETL pipelines for unstructured text - Error analysis: why you need humans reviewing the first 50–100 traces - Guardrails like retries, validators, and “gleaning” - How LLM judges work — rubrics, pairwise comparisons, and cost trade-offs - Cheap vs. expensive models: when to swap for savings - Where agents fit in (and where they don’t) If you’ve ever wondered how to move beyond unreliable demos, this episode shows how to scale LLMs to millions of documents — without breaking the bank.LINKSShreya's website (https://www.sh-reya.com/)DocETL, A system for LLM-powered data processing (https://www.docetl.org/)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Watch the podcast video on YouTube (https://youtu.be/3r_Hsjy85nk)Shreya's AI evals course, which she teaches with Hamel "Evals" Husain (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME)🎓 Learn more:Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
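The guardrails mentioned above (retries and validators) can be sketched roughly as follows. This is an assumption-laden illustration, not DocETL's implementation and not a reproduction of its "gleaning" step: `call_model` stands in for whatever LLM client you use, and the required `summary` field is a made-up schema.

```python
import json
from typing import Callable


def validate(raw: str) -> dict:
    """Parse a model response and check a hypothetical schema; raise ValueError on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or "summary" not in data:
        raise ValueError("missing required field 'summary'")
    return data


def call_with_retries(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model, validate the output, and retry with the error appended."""
    feedback = ""
    for _ in range(max_attempts):
        raw = call_model(prompt + feedback)
        try:
            return validate(raw)
        except ValueError as exc:
            # Tell the model why the last answer was rejected before retrying.
            feedback = f"\n\nYour previous answer was rejected: {exc}. Please try again."
    raise RuntimeError(f"no valid output after {max_attempts} attempts")
```

Capping the attempts keeps the worst-case cost per document bounded, which matters once a pipeline runs over millions of documents.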

Aug 29, 202541 min

Episode 56: DeepMind Just Dropped Gemma 270M... And Here’s Why It Matters

While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of parameters down to this week’s release: Gemma 270M, the smallest member yet of the Gemma 3 open-weight family. At just 270 million parameters, a quarter the size of Gemma 1B, it’s designed for speed, efficiency, and fine-tuning. We explore what makes 270M special, where it fits alongside its billion-parameter siblings, and why you might reach for it in production even if you think “small” means “just for experiments.” We talk through: - Where 270M fits into the Gemma 3 lineup — and why it exists - On-device use cases where latency, privacy, and efficiency matter - How smaller models open up rapid, targeted fine-tuning - Running multiple models in parallel without heavyweight hardware - Why “small” models might drive the next big wave of AI adoption If you’ve ever wondered what you’d do with a model this size (or how to squeeze the most out of it) this episode will show you how small can punch far above its weight.LINKSIntroducing Gemma 3 270M: The compact model for hyper-efficient AI (Google Developer Blog) (https://developers.googleblog.com/en/introducing-gemma-3-270m/)Full Model Fine-Tune Guide using Hugging Face Transformers (https://ai.google.dev/gemma/docs/core/huggingface_text_full_finetune)The Gemma 270M model on HuggingFace (https://huggingface.co/google/gemma-3-270m)The Gemma 270M model on Ollama (https://ollama.com/library/gemma3:270m)Building AI Agents with Gemma 3, a workshop with Ravin and Hugo (https://www.youtube.com/live/-IWstEStqok) (Code here (https://github.com/canyon289/ai_agent_basics))From Images to Agents: Building and Evaluating Multimodal AI Workflows, a workshop with Ravin and Hugo (https://www.youtube.com/live/FNlM7lSt8Uk)(Code here (https://github.com/canyon289/ai_image_agent))Evaluating AI Agents: From Demos to Dependability, an upcoming workshop with Ravin and Hugo 
(https://lu.ma/ezgny3dl)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Watch the podcast video on YouTube (https://youtu.be/VZDw6C2A_8E)🎓 Learn more:Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 ($600 off early bird discount for November cohort available until August 16) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Aug 14, 202545 min

Episode 55: From Frittatas to Production LLMs: Breakfast at SciPy

Traditional software expects 100% passing tests. In LLM-powered systems, that’s not just unrealistic — it’s a feature, not a bug. Eric Ma leads research data science in Moderna’s data science and AI group, and over breakfast at SciPy we explored why AI products break the old rules, what skills different personas bring (and miss), and how to keep systems alive after the launch hype fades. You’ll hear the clink of coffee cups, the murmur of SciPy in the background, and the occasional bite of frittata as we talk (hopefully also a feature, not a bug!)We talk through: • The three personas — and the blind spots each has when shipping AI systems • Why “perfect” tests can be a sign you’re testing the wrong thing • Development vs. production observability loops — and why you need both • How curiosity about failing data separates good builders from great ones • Ways large organizations can create space for experimentation without losing delivery focus If you want to build AI products that thrive in the messy real world, this episode will help you embrace the chaos — and make it work for you.LINKSEric's website (https://ericmjl.github.io/)More about the workshops Eric and Hugo taught at SciPy (https://hugobowne.substack.com/p/stress-testing-llms-evaluation-frameworks)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)🎓 Learn more:Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 ($600 off early bird discount for November cohort available until August 16) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Aug 12, 202538 min

Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference

Colab is cozy. But production won’t fit on a single GPU.Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo scripts to scalable systems. In this episode, he joins me to demystify distributed training and inference — not just for research labs, but for any ML engineer trying to ship real software.We talk through: • From Colab to clusters: why scaling isn’t just about training massive models, but serving agents, handling load, and speeding up iteration • Zero-to-two GPUs: how to get started without Kubernetes, Slurm, or a PhD in networking • Scaling tradeoffs: when to care about interconnects, which infra bottlenecks actually matter, and how to avoid chasing performance ghosts • The GPU middle class: strategies for training and serving on a shoestring, with just a few cards or modest credits • Local experiments, global impact: why learning distributed systems—even just a little—can set you apart as an engineerIf you’ve ever stared at a Hugging Face training script and wondered how to run it on something more than your laptop: this one’s for you.LINKSZach on LinkedIn (https://www.linkedin.com/in/zachary-mueller-135257118/)Hugo's blog post on Stop Building AI Agents (https://www.linkedin.com/posts/hugo-bowne-anderson-045939a5_yesterday-i-posted-about-stop-building-ai-activity-7346942036752613376-b8-t/)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Hugo's recent newsletter about upcoming events and more! 
(https://hugobowne.substack.com/p/stop-building-agents)🎓 Learn more:Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338Zach's course (45% off for VG listeners!): Scratch to Scale: Large-Scale Training in the Modern World (https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39) -- https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39📺 Watch the video version on YouTube: YouTube link (https://youtube.com/live/76NAtzWZ25s?feature=share) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Jul 18, 202541 min

Episode 53: Human-Seeded Evals & Self-Tuning Agents: Samuel Colvin on Shipping Reliable LLMs

Demos are easy; durability is hard. Samuel Colvin has spent a decade building guardrails in Python (first with Pydantic, now with Logfire), and he’s convinced most LLM failures have nothing to do with the model itself. They appear where the data is fuzzy, the prompts drift, or no one bothered to measure real-world behavior. Samuel joins me to show how a sprinkle of engineering discipline keeps those failures from ever reaching users.We talk through: • Tiny labels, big leverage: how five thumbs-ups/thumbs-downs are enough for Logfire to build a rubric that scores every call in real time • Drift alarms, not dashboards: catching the moment your prompt or data shifts instead of reading charts after the fact • Prompt self-repair: a prototype agent that rewrites its own system prompt—and tells you when it still doesn’t have what it needs • The hidden cost curve: why the last 15 percent of reliability costs far more than the flashy 85 percent demo • Business-first metrics: shipping features that meet real goals instead of chasing another decimal point of “accuracy”If you’re past the proof-of-concept stage and staring down the “now it has to work” cliff, this episode is your climbing guide.LINKSPydantic (https://pydantic.dev/)Logfire (https://pydantic.dev/logfire)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/stop-building-agents)🎓 Learn more:Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — next cohort starts July 8: https://maven.com/s/course/d56067f338📺 Watch the video version on YouTube: YouTube link (https://youtube.com/live/wk6rPZ6qJSY?feature=share) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Jul 8, 202544 min

Episode 52: Why Most LLM Products Break at Retrieval (And How to Fix Them)

Most LLM-powered features do not break at the model. They break at the context. So how do you retrieve the right information to get useful results, even under vague or messy user queries?In this episode, we hear from Eric Ma, who leads data science research in the Data Science and AI group at Moderna. He shares what it takes to move beyond toy demos and ship LLM features that actually help people do their jobs.We cover:• How to align retrieval with user intent and why cosine similarity is not the answer• How a dumb YAML-based system outperformed so-called smart retrieval pipelines• Why vague queries like “what is this all about” expose real weaknesses in most systems• When vibe checks are enough and when formal evaluation is worth the effort• How retrieval workflows can evolve alongside your product and user needsIf you are building LLM-powered systems and care about how they work, not just whether they work, this one is for you.LINKSEric's website (https://ericmjl.github.io/)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/stop-building-agents)🎓 Learn more:Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — next cohort starts July 8: https://maven.com/s/course/d56067f338📺 Watch the video version on YouTube: YouTube link (https://youtu.be/d-FaR5Ywd5k) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Jul 2, 202528 min

Episode 51: Why We Built an MCP Server and What Broke First

What does it take to actually ship LLM-powered features, and what breaks when you connect them to real production data?In this episode, we hear from Philip Carter — then a Principal PM at Honeycomb and now a Product Management Director at Salesforce. In early 2023, he helped build one of the first LLM-powered SaaS features to ship to real users. More recently, he and his team built a production-ready MCP server.We cover: • How to evaluate LLM systems using human-aligned judges • The spreadsheet-driven process behind shipping Honeycomb’s first LLM feature • The challenges of tool usage, prompt templates, and flaky model behavior • Where MCP shows promise, and where it breaks in the real worldIf you’re working on LLMs in production, this one’s for you!LINKSSo We Shipped an AI Product: Did it Work? by Philip Carter (https://www.honeycomb.io/blog/we-shipped-ai-product)Vanishing Gradients YouTube Channel (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/ai-as-a-civilizational-technology)🎓 Learn more:Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — next cohort starts July 8: https://maven.com/s/course/d56067f338📺 Watch the video version on YouTube: YouTube link (https://youtu.be/JDMzdaZh9Ig) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Jun 26, 202547 min

Episode 50: A Field Guide to Rapidly Improving AI Products -- With Hamel Husain

If we want AI systems that actually work, we need to get much better at evaluating them, not just building more pipelines, agents, and frameworks.In this episode, Hugo talks with Hamel Husain (ex-Airbnb, GitHub, DataRobot) about how teams can improve AI products by focusing on error analysis, data inspection, and systematic iteration. The conversation is based on Hamel’s blog post A Field Guide to Rapidly Improving AI Products, which he joined Hugo’s class to discuss.They cover:🔍 Why most teams struggle to measure whether their systems are actually improving 📊 How error analysis helps you prioritize what to fix (and when to write evals) 🧮 Why evaluation isn’t just a metric — but a full development process ⚠️ Common mistakes when debugging LLM and agent systems 🛠️ How to think about the tradeoffs in adding more evals vs. fixing obvious issues 👥 Why enabling domain experts — not just engineers — can accelerate iterationIf you’ve ever built an AI system and found yourself unsure how to make it better, this conversation is for you.LINKS* A Field Guide to Rapidly Improving AI Products by Hamel Husain (https://hamel.dev/blog/posts/field-guide/)* Vanishing Gradients YouTube Channel (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA) * Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)* Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/ai-as-a-civilizational-technology)🎓 Learn more:Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — next cohort starts July 8: https://maven.com/s/course/d56067f338Hamel & Shreya's course: AI Evals For Engineers & PMs (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME) — use code GOHUGORGOHOME for $800 off📺 Watch the video version on YouTube: YouTube link (https://youtu.be/rWToRi2_SeY) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Jun 17, 202527 min

Episode 49: Why Data and AI Still Break at Scale (and What to Do About It)

If we want AI systems that actually work in production, we need better infrastructure—not just better models.In this episode, Hugo talks with Akshay Agrawal (Marimo, ex-Google Brain, Netflix, Stanford) about why data and AI pipelines still break down at scale, and how we can fix the fundamentals: reproducibility, composability, and reliable execution.They discuss:🔁 Why reactive execution matters—and how current tools fall short🛠️ The design goals behind Marimo, a new kind of Python notebook⚙️ The hidden costs of traditional workflows (and what breaks at scale)📦 What it takes to build modular, maintainable AI apps🧪 Why debugging LLM systems is so hard—and what better tooling looks like🌍 What we can learn from decades of tools built for and by data practitionersToward the end of the episode, Hugo and Akshay walk through two live demos: Hugo shares how he’s been using Marimo to prototype an app that extracts structured data from world leader bios, and Akshay shows how Marimo handles agentic workflows with memory and tool use—built entirely in a notebook.This episode is about tools, but it’s also about culture. If you’ve ever hit a wall with your current stack—or felt like your tools were working against you—this one’s for you.LINKS* marimo | a next-generation Python notebook (https://marimo.io/)* SciPy conference, 2025 (https://www.scipy2025.scipy.org/)* Hugo's Marimo World Leader Face Embedding demo (https://www.youtube.com/watch?v=DO21QEcLOxM)* Vanishing Gradients YouTube Channel (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA) * Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)* Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/ai-as-a-civilizational-technology)* Watch the podcast here on YouTube! 
(https://youtube.com/live/WVxAz19tgZY?feature=share)🎓 Want to go deeper?Check out Hugo's course: Building LLM Applications for Data Scientists and Software Engineers.Learn how to design, test, and deploy production-grade LLM systems — with observability, feedback loops, and structure built in.This isn’t about vibes or fragile agents. It’s about making LLMs reliable, testable, and actually useful.Includes over $800 in compute credits and guest lectures from experts at DeepMind, Moderna, and more.Cohort starts July 8 — Use this link for a 10% discount (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=LLM10) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Jun 5, 2025 · 1h 21m

Episode 48: How to Benchmark AGI with Greg Kamradt (ARC-AGI)

If we want to make progress toward AGI, we need a clear definition of intelligence, and a way to measure it.

In this episode, Hugo talks with Greg Kamradt, President of the ARC Prize Foundation, about ARC-AGI: a benchmark built on François Chollet’s definition of intelligence as “the efficiency at which you learn new things.” Unlike most evals that focus on memorization or task completion, ARC is designed to measure generalization and expose where today’s top models fall short.

They discuss:
🧠 Why we still lack a shared definition of intelligence
🧪 How ARC tasks force models to learn novel skills at test time
📉 Why GPT-4-class models still underperform on ARC
🔎 The limits of traditional benchmarks like MMLU and Big-Bench
⚙️ What the OpenAI o3 results reveal, and what they don’t
💡 Why generalization and efficiency, not raw capability, are key to AGI

Greg also shares what he’s seeing in the wild: how startups and independent researchers are using ARC as a North Star, how benchmarks shape the frontier, and why the ARC team believes we’ll know we’ve reached AGI when humans can no longer write tasks that models can’t solve.

This conversation is about evaluation, not hype. If you care about where AI is really headed, this one’s worth your time.

LINKS
* ARC Prize -- What is ARC-AGI? (https://arcprize.org/arc-agi)
* On the Measure of Intelligence by François Chollet (https://arxiv.org/abs/1911.01547)
* Greg Kamradt on Twitter (https://x.com/GregKamradt)
* Hugo's High Signal Podcast with Fei-Fei Li (https://high-signal.delphina.ai/episode/fei-fei-on-how-human-centered-ai-actually-gets-built)
* Vanishing Gradients YouTube Channel (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA)
* Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
* Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/ai-as-a-civilizational-technology)
* Watch the podcast here on YouTube! (https://youtu.be/wU82fz4iRfo)

🎓 Want to go deeper?
Check out Hugo's course: Building LLM Applications for Data Scientists and Software Engineers. Learn how to design, test, and deploy production-grade LLM systems, with observability, feedback loops, and structure built in. This isn’t about vibes or fragile agents. It’s about making LLMs reliable, testable, and actually useful. Includes over $800 in compute credits and guest lectures from experts at DeepMind, Moderna, and more. Cohort starts July 8. Use this link for a 10% discount (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=LLM10)

May 23, 2025 · 1h 4m

Episode 47: The Great Pacific Garbage Patch of Code Slop with Joe Reis

What if the cost of writing code dropped to zero, but the cost of understanding it skyrocketed?

In this episode, Hugo sits down with Joe Reis to unpack how AI tooling is reshaping the software development lifecycle, from experimentation and prototyping to deployment, maintainability, and everything in between.

Joe is the co-author of Fundamentals of Data Engineering and a longtime voice on the systems side of modern software. He’s also one of the sharpest critics of “vibe coding”, the emerging pattern of writing software by feel, with heavy reliance on LLMs and little regard for structure or quality.

We dive into:
• Why “vibe coding” is more than a meme, and what it says about how we build today
• How AI tools expand the surface area of software creation, for better and worse
• What happens to technical debt, testing, and security when generation outpaces understanding
• The changing definition of “production” in a world of ephemeral, internal, or just-good-enough tools
• How AI is flattening the learning curve, and threatening the talent pipeline
• Joe’s view on what real craftsmanship means in an age of disposable code

This conversation isn’t about doom, and it’s not about hype. It’s about mapping the real, messy terrain of what it means to build software today, and how to do it with care.

LINKS
* Joe's Practical Data Modeling Newsletter on Substack (https://practicaldatamodeling.substack.com/)
* Joe's Practical Data Modeling Server on Discord (https://discord.gg/HhSZVvWDBb)
* Vanishing Gradients YouTube Channel (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA)
* Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)

🎓 Want to go deeper?
Check out my course: Building LLM Applications for Data Scientists and Software Engineers. Learn how to design, test, and deploy production-grade LLM systems, with observability, feedback loops, and structure built in. This isn’t about vibes or fragile agents. It’s about making LLMs reliable, testable, and actually useful. Includes over $800 in compute credits and guest lectures from experts at DeepMind, Moderna, and more. Cohort starts July 8. Use this link for a 10% discount (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=LLM10)

Apr 7, 2025 · 1h 19m

Episode 46: Software Composition Is the New Vibe Coding

What if building software felt more like composing than coding?

In this episode, Hugo and Greg explore how LLMs are reshaping the way we think about software development, from deterministic programming to a more flexible, prompt-driven, and collaborative style of building. It’s not just hype or grift; it’s a real shift in how we express intent, reason about systems, and collaborate across roles.

Hugo speaks with Greg Ceccarelli (co-founder of SpecStory, former CPO at Pluralsight, and Director of Data Science at GitHub) about the rise of software composition and how it changes the way individuals and teams create with LLMs.

We dive into:
- Why software composition is emerging as a serious alternative to traditional coding
- The real difference between vibe coding and production-minded prototyping
- How LLMs are expanding who gets to build software, and how
- What changes when you focus on intent, not just code
- What Greg is building with SpecStory to support collaborative, traceable AI-native workflows
- The challenges (and joys) of debugging and exploring with agentic tools like Cursor and Claude

We’ve removed the visual demos from the audio, but you can catch our live-coded Chrome extension and JFK document explorer on YouTube. Links below.

* JFK Docs Vibe Coding Demo (YouTube) (https://youtu.be/JpXCkuV58QE)
* Chrome Extension Vibe Coding Demo (YouTube) (https://youtu.be/ESVKp37jDwc)
* Meditations on Tech (Greg’s Substack) (https://www.meditationsontech.com/)
* Simon Willison on Vibe Coding (https://simonwillison.net/2025/Mar/19/vibe-coding/)
* Johno Whitaker: On Vibe Coding (https://johnowhitaker.dev/essays/vibe_coding.html)
* Tim O’Reilly – The End of Programming (https://www.oreilly.com/radar/the-end-of-programming-as-we-know-it/)
* Vanishing Gradients YouTube Channel (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA)
* Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
* Greg Ceccarelli on LinkedIn (https://www.linkedin.com/in/gregceccarelli/)
* Greg’s Hacker News Post on GOOD (https://news.ycombinator.com/item?id=43557698)
* SpecStory: GOOD – Git Companion for AI Workflows (https://github.com/specstoryai/getspecstory/blob/main/GOOD.md)

🎓 Want to go deeper?
Check out my course: Building LLM Applications for Data Scientists and Software Engineers. Learn how to design, test, and deploy production-grade LLM systems, with observability, feedback loops, and structure built in. This isn’t about vibes or fragile agents. It’s about making LLMs reliable, testable, and actually useful. Includes over $2,500 in compute credits and guest lectures from experts at DeepMind, Moderna, and more. Cohort starts April 7. Use this link for a 10% discount (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=LLM10)

🔍 Want to help shape the future of SpecStory?
Greg and the team are looking for design partners for their new SpecStory Teams product, built for collaborative, AI-native software development. If you're working with LLMs in a team setting and want to influence the next wave of developer tools, you can apply here: 👉 specstory.com/teams (https://specstory.com/teams)

Apr 3, 2025 · 1h 8m

Episode 45: Your AI application is broken. Here’s what to do about it.

Too many teams are building AI applications without truly understanding why their models fail. Instead of jumping straight to LLM evaluations, dashboards, or vibe checks, how do you actually fix a broken AI app?

In this episode, Hugo speaks with Hamel Husain, longtime ML engineer, open-source contributor, and consultant, about why debugging generative AI systems starts with looking at your data.

We dive into:
* Why “look at your data” is the best debugging advice no one follows.
* How spreadsheet-based error analysis can uncover failure modes faster than complex dashboards.
* The role of synthetic data in bootstrapping evaluation.
* When to trust LLM judges, and when they’re misleading.
* Why most AI dashboards measuring truthfulness, helpfulness, and conciseness are often a waste of time.

If you're building AI-powered applications, this episode will change how you approach debugging, iteration, and improving model performance in production.

LINKS
* The podcast livestream on YouTube (https://youtube.com/live/Vz4--82M2_0?feature=share)
* Hamel's blog (https://hamel.dev/)
* Hamel on twitter (https://x.com/HamelHusain)
* Hugo on twitter (https://x.com/hugobowne)
* Vanishing Gradients on twitter (https://x.com/vanishingdata)
* Vanishing Gradients on YouTube (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA)
* Vanishing Gradients on Lu.ma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
* Building LLM Applications for Data Scientists and SWEs, Hugo's course on Maven (use VG25 code for 25% off) (https://maven.com/s/course/d56067f338)
* Hugo is also running a free lightning lesson next week on LLM Agents: When to Use Them (and When Not To) (https://maven.com/p/ed7a72/llm-agents-when-to-use-them-and-when-not-to?utm_medium=ll_share_link&utm_source=instructor)

Feb 20, 2025 · 1h 17m

Episode 44: The Future of AI Coding Assistants: Who’s Really in Control?

AI coding assistants are reshaping how developers write, debug, and maintain code, but who’s really in control? In this episode, Hugo speaks with Tyler Dunn, CEO and co-founder of Continue, an open-source AI-powered code assistant that gives developers more customization and flexibility in their workflows.

In this episode, we dive into:
- The trade-offs between proprietary vs. open-source AI coding assistants, and why open-source might be the future.
- How structured workflows, modular AI, and customization help developers maintain control over their tools.
- The evolution of AI-powered coding, from autocomplete to intelligent code suggestions and beyond.
- Why the best developer experiences come from sensible defaults with room for deeper configuration.
- The future of LLM-based software engineering, where fine-tuning models on personal and team-level data could make AI coding assistants even more effective.

With companies increasingly integrating AI into development workflows, this conversation explores the real impact of these tools and the importance of keeping developers in the driver's seat.

LINKS
* The podcast livestream on YouTube (https://youtube.com/live/8QEgVCzm46U?feature=share)
* Continue's website (https://www.continue.dev/)
* Continue is hiring! (https://www.continue.dev/about-us)
* amplified.dev: We believe in a future where developers are amplified, not automated (https://amplified.dev/)
* Beyond Prompt and Pray, Building Reliable LLM-Powered Software in an Agentic World (https://www.oreilly.com/radar/beyond-prompt-and-pray/)
* LLMOps Lessons Learned: Navigating the Wild West of Production LLMs 🚀 (https://www.zenml.io/blog/llmops-lessons-learned-navigating-the-wild-west-of-production-llms)
* Building effective agents by Erik Schluntz and Barry Zhang, Anthropic (https://www.anthropic.com/research/building-effective-agents)
* Ty on LinkedIn (https://www.linkedin.com/in/tylerjdunn/)
* Hugo on twitter (https://x.com/hugobowne)
* Vanishing Gradients on twitter (https://x.com/vanishingdata)
* Vanishing Gradients on YouTube (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA)
* Vanishing Gradients on Lu.ma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)

Feb 4, 2025 · 1h 34m

Episode 43: Tales from 400+ LLM Deployments: Building Reliable AI Agents in Production

Hugo speaks with Alex Strick van Linschoten, Machine Learning Engineer at ZenML and creator of a comprehensive LLMOps database documenting over 400 deployments. Alex's extensive research into real-world LLM implementations gives him unique insight into what actually works, and what doesn't, when deploying AI agents in production.

In this episode, we dive into:
- The current state of AI agents in production, from successes to common failure modes
- Practical lessons learned from analyzing hundreds of real-world LLM deployments
- How companies like Anthropic, Klarna, and Dropbox are using patterns like ReAct, RAG, and microservices to build reliable systems
- The evolution of LLM capabilities, from expanding context windows to multimodal applications
- Why most companies still prefer structured workflows over fully autonomous agents

We also explore real-world case studies of production hurdles, including cascading failures, API misfires, and hallucination challenges. Alex shares concrete strategies for integrating LLMs into your pipelines while maintaining reliability and control.

Whether you're scaling agents or building LLM-powered systems, this episode offers practical insights for navigating the complex landscape of LLMOps in 2025.

LINKS
* The podcast livestream on YouTube (https://youtube.com/live/-8Gr9fVVX9g?feature=share)
* The LLMOps database (https://www.zenml.io/llmops-database)
* All blog posts about the database (https://www.zenml.io/category/llmops)
* Anthropic's Building effective agents essay (https://www.anthropic.com/research/building-effective-agents)
* Alex on LinkedIn (https://www.linkedin.com/in/strickvl/)
* Hugo on twitter (https://x.com/hugobowne)
* Vanishing Gradients on twitter (https://x.com/vanishingdata)
* Vanishing Gradients on YouTube (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA)
* Vanishing Gradients on Lu.ma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)

Jan 16, 2025 · 1h 1m

Episode 42: Learning, Teaching, and Building in the Age of AI

In this episode of Vanishing Gradients, the tables turn as Hugo sits down with Alex Andorra, host of Learning Bayesian Statistics. Hugo shares his journey from mathematics to AI, reflecting on how Bayesian inference shapes his approach to data science, teaching, and building AI-powered applications.

They dive into the realities of deploying LLM applications, overcoming “proof-of-concept purgatory,” and why first principles and iteration are critical for success in AI. Whether you’re an educator, software engineer, or data scientist, this episode offers valuable insights into the intersection of AI, product development, and real-world deployment.

LINKS
* The podcast on YouTube (https://www.youtube.com/watch?v=BRIYytbqtP0)
* The original podcast episode (https://learnbayesstats.com/episode/122-learning-and-teaching-in-the-age-of-ai-hugo-bowne-anderson)
* Alex Andorra on LinkedIn (https://www.linkedin.com/in/alex-andorra/)
* Hugo on LinkedIn (https://www.linkedin.com/in/hugo-bowne-anderson-045939a5/)
* Hugo on twitter (https://x.com/hugobowne)
* Vanishing Gradients on twitter (https://x.com/vanishingdata)
* Hugo's "Building LLM Applications for Data Scientists and Software Engineers" course (https://maven.com/s/course/d56067f338)

Jan 4, 2025 · 1h 20m

Episode 41: Beyond Prompt Engineering: Can AI Learn to Set Its Own Goals?

Hugo Bowne-Anderson hosts a panel discussion from the MLOps World and Generative AI Summit in Austin, exploring the long-term growth of AI by distinguishing real problem-solving from trend-based solutions. If you're navigating the evolving landscape of generative AI, productionizing models, or questioning the hype, this episode dives into the tough questions shaping the field.

The panel features:
- Ben Taylor (Jepson) (https://www.linkedin.com/in/jepsontaylor/) – CEO and Founder at VEOX Inc., with experience in AI exploration, genetic programming, and deep learning.
- Joe Reis (https://www.linkedin.com/in/josephreis/) – Co-founder of Ternary Data and author of Fundamentals of Data Engineering.
- Juan Sequeda (https://www.linkedin.com/in/juansequeda/) – Principal Scientist and Head of AI Lab at Data.World, known for his expertise in knowledge graphs and the semantic web.

The discussion unpacks essential topics such as:
- The shift from prompt engineering to goal engineering: letting AI iterate toward well-defined objectives.
- Whether generative AI is having an electricity moment or more of a blockchain trajectory.
- The combinatorial power of AI to explore new solutions, drawing parallels to AlphaZero redefining strategy games.
- The POC-to-production gap and why AI projects stall.
- Failure modes, hallucinations, and governance risks, and how to mitigate them.
- The disconnect between executive optimism and employee workload.

Hugo also mentions his upcoming workshop on escaping Proof-of-Concept Purgatory, which has evolved into a Maven course "Building LLM Applications for Data Scientists and Software Engineers" launching in January (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?utm_campaign=8123d0&utm_medium=partner&utm_source=instructor). Vanishing Gradients listeners can get 25% off the course (use the code VG25), with $1,000 in Modal compute credits included.

A huge thanks to Dave Scharbach and the Toronto Machine Learning Society for organizing the conference and to the audience for their thoughtful questions.

As we head into the new year, this conversation offers a reality check amidst the growing AI agent hype.

LINKS
* Hugo on twitter (https://x.com/hugobowne)
* Hugo on LinkedIn (https://www.linkedin.com/in/hugo-bowne-anderson-045939a5/)
* Vanishing Gradients on twitter (https://x.com/vanishingdata)
* "Building LLM Applications for Data Scientists and Software Engineers" course (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?utm_campaign=8123d0&utm_medium=partner&utm_source=instructor)

Dec 30, 2024 · 43 min

Episode 40: What Every LLM Developer Needs to Know About GPUs

Hugo speaks with Charles Frye, Developer Advocate at Modal and someone who really knows GPUs inside and out. If you’re a data scientist, machine learning engineer, AI researcher, or just someone trying to make sense of hardware for LLMs and AI workflows, this episode is for you.

Charles and Hugo dive into the practical side of GPUs, from running inference on large models, to fine-tuning and even training from scratch. They unpack the real pain points developers face, like figuring out:
- How much VRAM you actually need.
- Why memory, not compute, ends up being the bottleneck.
- How to make quick, back-of-the-envelope calculations to size up hardware for your tasks.
- And where things like fine-tuning, quantization, and retrieval-augmented generation (RAG) fit into the mix.

One thing Hugo really appreciates is that Charles and the Modal team recently put together the GPU Glossary, a resource that breaks down GPU internals in a way that’s actually useful for developers. We reference it a few times throughout the episode, so check it out in the show notes below.

🔧 Charles also does a demo during the episode; some of it is visual, but we talk through the key points so you’ll still get value from the audio. If you’d like to see the demo in action, check out the livestream linked below.

Hugo is teaching the "Building LLM Applications for Data Scientists and Software Engineers" course with Stefan Krawczyk (ex-StitchFix) in January (https://maven.com/s/course/d56067f338). Charles is giving a guest lecture on hardware for LLMs, and Modal is giving all students $1K worth of compute credits (use the code VG25 for $200 off).

LINKS
* The livestream on YouTube (https://www.youtube.com/live/INryb8Hjk3c?si=0cbb0-Nxem1P987d)
* The GPU Glossary (https://modal.com/gpu-glossary) by the Modal team
* What We’ve Learned From A Year of Building with LLMs (https://applied-llms.org/) by Charles and friends
* Charles on twitter (https://x.com/charles_irl)
* Hugo on twitter (https://x.com/hugobowne)
* Vanishing Gradients on twitter (https://x.com/vanishingdata)

Dec 24, 2024 · 1h 43m

Episode 39: From Models to Products: Bridging Research and Practice in Generative AI at Google Labs

Hugo speaks with Ravin Kumar, Senior Research Data Scientist at Google Labs. Ravin’s career has taken him from building rockets at SpaceX to driving data science and technology at Sweetgreen, and now to advancing generative AI research and applications at Google Labs and DeepMind. His multidisciplinary experience gives him a rare perspective on building AI systems that combine technical rigor with practical utility.

In this episode, we dive into:
• Ravin’s fascinating career path, including the skills and mindsets needed to work effectively with AI and machine learning models at different stages of the pipeline.
• How to build generative AI systems that are scalable, reliable, and aligned with user needs.
• Real-world applications of generative AI, such as using open weight models such as Gemma to help a bakery streamline operations, an example of delivering tangible business value through AI.
• The critical role of UX in AI adoption, and how Ravin approaches designing tools like NotebookLM with the user journey in mind.

We also include a live demo where Ravin uses NotebookLM to analyze my website, extract insights, and even generate a podcast-style conversation about me. While some of the demo is visual, much can be appreciated through audio, and we’ve added a link to the video in the show notes for those who want to see it in action. We’ve also included the generated segment at the end of the episode for you to enjoy.

LINKS
* The livestream on YouTube (https://www.youtube.com/live/ffS6NWqoo_k)
* Google Labs (https://labs.google/)
* Ravin's GenAI Handbook (https://ravinkumar.com/GenAiGuidebook/book_intro.html)
* Breadboard: A library for prototyping generative AI applications (https://breadboard-ai.github.io/breadboard/)

As mentioned in the episode, Hugo is teaching a four-week course, Building LLM Applications for Data Scientists and SWEs, co-led with Stefan Krawczyk (Dagworks, ex-StitchFix). The course focuses on building scalable, production-grade generative AI systems, with hands-on sessions, $1,000+ in cloud credits, live Q&As, and guest lectures from industry experts.

Listeners of Vanishing Gradients can get 25% off the course using this special link (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=VG25) or by applying the code VG25 at checkout.

Nov 25, 2024 · 1h 43m

Episode 38: The Art of Freelance AI Consulting and Products: Data, Dollars, and Deliverables

Hugo speaks with Jason Liu, an independent AI consultant with experience at Meta and Stitch Fix. At Stitch Fix, Jason developed impactful AI systems, like a $50 million product similarity search and the widely adopted Flight recommendation framework. Now, he helps startups and enterprises design and deploy production-level AI applications, with a focus on retrieval-augmented generation (RAG) and scalable solutions.

This episode is a bit of an experiment. Instead of our usual technical deep dives, we’re focusing on the world of AI consulting and freelancing. We explore Jason’s consulting playbook, covering how he structures contracts to maximize value, strategies for moving from hourly billing to securing larger deals, and the mindset shift needed to align incentives with clients. We’ll also discuss the challenges of moving from deterministic software to probabilistic AI systems and even do a live role-playing session where Jason coaches me on client engagement and pricing pitfalls.

LINKS
* The livestream on YouTube (https://youtube.com/live/9CFs06UDbGI?feature=share)
* Jason's Upcoming course: AI Consultant Accelerator: From Expert to High-Demand Business (https://maven.com/indie-consulting/ai-consultant-accelerator?utm_campaign=9532cc&utm_medium=partner&utm_source=instructor)
* Hugo's upcoming course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338)
* Jason's website (https://jxnl.co/)
* Jason's indie consulting newsletter (https://indieconsulting.podia.com/)
* Your AI Product Needs Evals by Hamel Husain (https://hamel.dev/blog/posts/evals/)
* What We’ve Learned From A Year of Building with LLMs (https://applied-llms.org/)
* Dear Future AI Consultant by Jason (https://jxnl.co/writing/#dear-future-ai-consultant)
* Alex Hormozi's books (https://www.acquisition.com/books)
* The Burnout Society by Byung-Chul Han (https://www.sup.org/books/theory-and-philosophy/burnout-society)
* Jason on Twitter (https://x.com/jxnlco)
* Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
* Hugo on Twitter (https://twitter.com/hugobowne)
* Vanishing Gradients' lu.ma calendar (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
* Vanishing Gradients on YouTube (https://www.youtube.com/@vanishinggradients)

Nov 4, 2024 · 1h 23m

Episode 37: Prompt Engineering, Security in Generative AI, and the Future of AI Research Part 2

Hugo speaks with three leading figures from the world of AI research: Sander Schulhoff, a recent University of Maryland graduate and lead contributor to the Learn Prompting initiative; Philip Resnik, professor at the University of Maryland, known for his pioneering work in computational linguistics; and Denis Peskoff, a researcher from Princeton specializing in prompt engineering and its applications in the social sciences.

This is Part 2 of a special two-part episode, prompted (no pun intended) by these guys being part of a team, led by Sander, that wrote a 76-page survey analyzing prompting techniques, agents, and generative AI. The survey included contributors from OpenAI, Microsoft, the University of Maryland, Princeton, and more.

In this episode, we cover:
* The Prompt Report: A comprehensive survey on prompting techniques, agents, and generative AI, including advanced evaluation methods for assessing these techniques.
* Security Risks and Prompt Hacking: A detailed exploration of the security concerns surrounding prompt engineering, including Sander’s thoughts on its potential applications in cybersecurity and military contexts.
* AI’s Impact Across Fields: A discussion on how generative AI is reshaping various domains, including the social sciences and security.
* Multimodal AI: Updates on how large language models (LLMs) are expanding to interact with images, code, and music.
* Case Study - Detecting Suicide Risk: A careful examination of how prompting techniques are being used in important areas like detecting suicide risk, showcasing the critical potential of AI in addressing sensitive, real-world challenges.

The episode concludes with a reflection on the evolving landscape of LLMs and multimodal AI, and what might be on the horizon.

If you haven’t yet, make sure to check out Part 1, where we discuss the history of NLP, prompt engineering techniques, and Sander’s development of the Learn Prompting initiative.

LINKS
* The livestream on YouTube (https://youtube.com/live/FreXovgG-9A?feature=share)
* The Prompt Report: A Systematic Survey of Prompting Techniques (https://arxiv.org/abs/2406.06608)
* Learn Prompting: Your Guide to Communicating with AI (https://learnprompting.org/)
* Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
* Hugo on Twitter (https://twitter.com/hugobowne)
* Vanishing Gradients' lu.ma calendar (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
* Vanishing Gradients on YouTube (https://www.youtube.com/@vanishinggradients)

Oct 8, 2024 · 50 min

Episode 36: Prompt Engineering, Security in Generative AI, and the Future of AI Research Part 1

Hugo speaks with three leading figures from the world of AI research: Sander Schulhoff, a recent University of Maryland graduate and lead contributor to the Learn Prompting initiative; Philip Resnik, professor at the University of Maryland, known for his pioneering work in computational linguistics; and Denis Peskoff, a researcher from Princeton specializing in prompt engineering and its applications in the social sciences.

This is Part 1 of a special two-part episode, prompted (no pun intended) by these guys being part of a team, led by Sander, that wrote a 76-page survey analyzing prompting techniques, agents, and generative AI. The survey included contributors from OpenAI, Microsoft, the University of Maryland, Princeton, and more.

In this first part, we’ll:
* explore the critical role of prompt engineering,
* dive into adversarial techniques like prompt hacking and the challenges of evaluating these techniques,
* examine the impact of few-shot learning and the groundbreaking taxonomy of prompting techniques from the Prompt Report.

Along the way, we’ll:
* uncover the rich history of natural language processing (NLP) and AI, showing how modern prompting techniques evolved from early rule-based systems and statistical methods,
* hear how Sander’s experimentation with GPT-3 for diplomatic tasks led him to develop Learn Prompting, and
* hear how Denis highlights the accessibility of AI through prompting, which allows non-technical users to interact with AI without needing to code.

Finally, we’ll explore the future of multimodal AI, where LLMs interact with images, code, and even music creation. Make sure to tune in to Part 2, where we dive deeper into security risks, prompt hacking, and more.

LINKS
* The livestream on YouTube (https://youtube.com/live/FreXovgG-9A?feature=share)
* The Prompt Report: A Systematic Survey of Prompting Techniques (https://arxiv.org/abs/2406.06608)
* Learn Prompting: Your Guide to Communicating with AI (https://learnprompting.org/)
* Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
* Hugo on Twitter (https://twitter.com/hugobowne)
* Vanishing Gradients' lu.ma calendar (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
* Vanishing Gradients on YouTube (https://www.youtube.com/@vanishinggradients)

Sep 30, 2024 · 1h 3m

Episode 35: Open Science at NASA -- Measuring Impact and the Future of AI

Hugo speaks with Dr. Chelle Gentemann, Open Science Program Scientist for NASA’s Office of the Chief Science Data Officer, about NASA’s ambitious efforts to integrate AI across the research lifecycle. In this episode, we dive deeper into how AI is transforming NASA’s approach to science, making data more accessible and advancing open science practices.

We explore:
* Measuring the Impact of Open Science: How NASA is developing new metrics to evaluate the effectiveness of open science, moving beyond traditional publication-based assessments.
* The Process of Scientific Discovery: Insights into the collaborative nature of research and how breakthroughs are achieved at NASA.
* AI Applications in NASA’s Science: From rats in space to exploring the origins of the universe, we cover how AI is being applied across NASA’s divisions to improve data accessibility and analysis.
* Addressing Challenges in Open Science: The complexities of implementing open science within government agencies and research environments.
* Reforming Incentive Systems: How NASA is reconsidering traditional metrics like publications and citations, and starting to recognize contributions such as software development and data sharing.
* The Future of Open Science: How open science is shaping the future of research, fostering interdisciplinary collaboration, and increasing accessibility.

This conversation offers valuable insights for researchers, data scientists, and those interested in the practical applications of AI and open science. Join us as we discuss how NASA is working to make science more collaborative, reproducible, and impactful.

LINKS
* The livestream on YouTube (https://youtube.com/live/VJDg3ZbkNOE?feature=share)
* NASA's Open Science 101 course

Sep 19, 2024 · 58 min

Episode 34: The AI Revolution Will Not Be Monopolized

Hugo speaks with Ines Montani and Matthew Honnibal, the creators of spaCy and founders of Explosion AI. Collectively, they've had a huge impact on the fields of industrial natural language processing (NLP), ML, and AI through their widely-used open-source library spaCy and their innovative annotation tool Prodigy. These tools have become essential for many data scientists and NLP practitioners in industry and academia alike.

In this wide-ranging discussion, we dive into:
• The evolution of applied NLP and its role in industry
• The balance between large language models and smaller, specialized models
• Human-in-the-loop distillation for creating faster, more data-private AI systems
• The challenges and opportunities in NLP, including modularity, transparency, and privacy
• The future of AI and software development
• The potential impact of AI regulation on innovation and competition

We also touch on their recent transition back to a smaller, more independent-minded company structure and the lessons learned from their journey in the AI startup world.

Ines and Matt offer invaluable insights for data scientists, machine learning practitioners, and anyone interested in the practical applications of AI. They share their thoughts on how to approach NLP projects, the importance of data quality, and the role of open-source in advancing the field.

Whether you're a seasoned NLP practitioner or just getting started with AI, this episode offers a wealth of knowledge from two of the field's most respected figures. Join us for a discussion that explores the current landscape of AI development, with insights that bridge the gap between cutting-edge research and real-world applications.

LINKS
• The livestream on YouTube (https://youtube.com/live/-6o5-3cP0ik?feature=share)
• How S&P Global is making markets more transparent with NLP, spaCy and Prodigy (https://explosion.ai/blog/sp-global-commodities)
• A practical guide to human-in-the-loop distillation (https://explosion.ai/blog/human-in-the-loop-distillation)
• Laws of Tech: Commoditize Your Complement (https://gwern.net/complement)
• spaCy: Industrial-Strength Natural Language Processing (https://spacy.io/)
• LLMs with spaCy (https://spacy.io/usage/large-language-models)
• Explosion, building developer tools for AI, Machine Learning and Natural Language Processing (https://explosion.ai/)
• Back to our roots: Company update and future plans, by Matt and Ines (https://explosion.ai/blog/back-to-our-roots-company-update)
• Matt's detailed blog post: back to our roots (https://honnibal.dev/blog/back-to-our-roots)
• Ines on Twitter (https://x.com/_inesmontani)
• Matt on Twitter (https://x.com/honnibal)
• Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
• Hugo on Twitter (https://twitter.com/hugobowne)

Check out and subscribe to our lu.ma calendar (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) for upcoming livestreams!

Aug 22, 2024 · 1h 42m

Episode 33: What We Learned Teaching LLMs to 1,000s of Data Scientists

Hugo speaks with Dan Becker and Hamel Husain, two veterans in the world of data science, machine learning, and AI education. Collectively, they’ve worked at Google, DataRobot, Airbnb, and GitHub (where Hamel built out the precursor to Copilot and more), and they both currently work as independent LLM and Generative AI consultants.

Dan and Hamel recently taught a course on fine-tuning large language models that evolved into a full-fledged conference, attracting over 2,000 participants. This experience gave them unique insights into the current state and future of AI education and application.

In this episode, we dive into:
* The evolution of their course from fine-tuning to a comprehensive AI conference
* The unexpected challenges and insights gained from teaching LLMs to data scientists
* The current state of AI tooling and accessibility compared to a decade ago
* The role of playful experimentation in driving innovation in the field
* Thoughts on the economic impact and ROI of generative AI in various industries
* The importance of proper evaluation in machine learning projects
* Future predictions for AI education and application in the next five years

We also touch on the challenges of using AI tools effectively, the potential for AI in physical world applications, and the need for a more nuanced understanding of AI capabilities in the workplace.

During our conversation, Dan mentions an exciting project he's been working on, which we couldn't showcase live due to technical difficulties. However, I've included a link to a video demonstration in the show notes that you won't want to miss. In this demo, Dan showcases his innovative AI-powered 3D modeling tool that allows users to create 3D printable objects simply by describing them in natural language.

LINKS
* The livestream on YouTube (https://youtube.com/live/hDmnwtjktsc?feature=share)
* Educational resources from Dan and Hamel's LLM course (https://parlance-labs.com/education/)
* Upwork Study Finds Employee Workloads Rising Despite Increased C-Suite Investment in Artificial Intelligence (https://investors.upwork.com/news-releases/news-release-details/upwork-study-finds-employee-workloads-rising-despite-increased-c)
* Episode 29: Lessons from a Year of Building with LLMs (Part 1) (https://vanishinggradients.fireside.fm/29)
* Episode 30: Lessons from a Year of Building with LLMs (Part 2) (https://vanishinggradients.fireside.fm/30)
* Dan's demo: Creating Physical Products with Generative AI (https://youtu.be/U5J5RUOuMkI?si=_7cYLYOU1iwweQeO)
* Build Great AI, Dan's boutique consulting firm helping clients be successful with large language models (https://buildgreat.ai/)
* Parlance Labs, Hamel's practical consulting that improves your AI (https://parlance-labs.com/)
* Hamel on Twitter (https://x.com/HamelHusain)
* Dan on Twitter (https://x.com/dan_s_becker)
* Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
* Hugo on Twitter (https://twitter.com/hugobowne)

Aug 12, 2024 · 1h 25m

Episode 32: Building Reliable and Robust ML/AI Pipelines

Hugo speaks with Shreya Shankar, a researcher at UC Berkeley focusing on data management systems with a human-centered approach. Shreya's work is at the cutting edge of human-computer interaction (HCI) and AI, particularly in the realm of large language models (LLMs). Her impressive background includes being the first ML engineer at Viaduct, doing research engineering at Google Brain, and software engineering at Facebook.

In this episode, we dive deep into the world of LLMs and the critical challenges of building reliable AI pipelines. We'll explore:
* The fascinating journey from classic machine learning to the current LLM revolution
* Why Shreya believes most ML problems are actually data management issues
* The concept of "data flywheels" for LLM applications and how to implement them
* The intriguing world of evaluating AI systems: who validates the validators?
* Shreya's work on SPADE and EvalGen, innovative tools for synthesizing data quality assertions and aligning LLM evaluations with human preferences
* The importance of human-in-the-loop processes in AI development
* The future of low-code and no-code tools in the AI landscape

We'll also touch on the potential pitfalls of over-relying on LLMs, the concept of "Habsburg AI," and how to avoid disappearing up our own proverbial arseholes in the world of recursive AI processes.

Whether you're a seasoned AI practitioner, a curious data scientist, or someone interested in the human side of AI development, this conversation offers valuable insights into building more robust, reliable, and human-centered AI systems.

LINKS
* The livestream on YouTube (https://youtube.com/live/hKV6xSJZkB0?feature=share)
* Shreya's website (https://www.sh-reya.com/)
* Shreya on Twitter (https://x.com/sh_reya)
* Data Flywheels for LLM Applications (https://www.sh-reya.com/blog/ai-engineering-flywheel/)
* SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines (https://arxiv.org/abs/2401.03038)
* What We’ve Learned From A Year of Building with LLMs (https://applied-llms.org/)
* Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences (https://arxiv.org/abs/2404.12272)
* Operationalizing Machine Learning: An Interview Study (https://arxiv.org/abs/2209.09125)
* Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
* Hugo on Twitter (https://twitter.com/hugobowne)

In the podcast, Hugo also mentioned that this was the fifth time he and Shreya had chatted publicly, which is wild! If you want to dive deep into Shreya's work and related topics through their chats, you can check them all out here:
* Outerbounds' Fireside Chat: Operationalizing ML -- Patterns and Pain Points from MLOps Practitioners (https://www.youtube.com/watch?v=7zB6ESFto_U)
* The Past, Present, and Future of Generative AI (https://youtu.be/q0A9CdGWXqc?si=XmaUnQmZiXL2eagS)
* LLMs, OpenAI Dev Day, and the Existential Crisis for Machine Learning Engineering (https://www.youtube.com/live/MTJHvgJtynU?si=Ncjqn5YuFBemvOJ0)
* Lessons from a Year of Building with LLMs (https://youtube.com/live/c0gcsprsFig?feature=share)

Check out and subscribe to our lu.ma calendar (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) for upcoming livestreams!

Jul 27, 2024 · 1h 15m

Episode 31: Rethinking Data Science, Machine Learning, and AI

Hugo speaks with Vincent Warmerdam, a senior data professional and machine learning engineer at :probabl, the exclusive brand operator of scikit-learn. Vincent is known for challenging common assumptions and exploring innovative approaches in data science and machine learning.

In this episode, we dive deep into rethinking established methods in data science, machine learning, and AI. We explore Vincent's principled approach to the field, including:
* The critical importance of exposing yourself to real-world problems before applying ML solutions
* Framing problems correctly and understanding the data generating process
* The power of visualization and human intuition in data analysis
* Questioning whether algorithms truly meet the actual problem at hand
* The value of simple, interpretable models and when to consider more complex approaches
* The importance of UI and user experience in data science tools
* Strategies for preventing algorithmic failures by rethinking evaluation metrics and data quality
* The potential and limitations of LLMs in the current data science landscape
* The benefits of open-source collaboration and knowledge sharing in the community

Throughout the conversation, Vincent illustrates these principles with vivid, real-world examples from his extensive experience in the field. We also discuss Vincent's thoughts on the future of data science and his call to action for more knowledge sharing in the community through blogging and open dialogue.

LINKS
* The livestream on YouTube (https://youtube.com/live/-CD66CI1pEo?feature=share)
* Vincent's blog (https://koaning.io/)
* CalmCode (https://calmcode.io/)
* scikit-lego (https://koaning.github.io/scikit-lego/)
* Vincent's book Data Science Fiction (WIP) (https://calmcode.io/book)
* The Deon Checklist, an ethics checklist for data scientists (https://deon.drivendata.org/)
* Of oaths and checklists, by DJ Patil, Hilary Mason and Mike Loukides (https://www.oreilly.com/radar/of-oaths-and-checklists/)
* Vincent's Getting Started with NLP and spaCy course on Talk Python (https://training.talkpython.fm/courses/getting-started-with-spacy)
* Vincent on Twitter (https://x.com/fishnets88)
* :probabl. on Twitter (https://x.com/probabl_ai)
* Vincent's PyData Amsterdam Keynote "Natural Intelligence is All You Need [tm]" (https://www.youtube.com/watch?v=C9p7suS-NGk)
* Vincent's PyData Amsterdam 2019 talk: The profession of solving (the wrong problem) (https://www.youtube.com/watch?v=kYMfE9u-lMo)
* Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
* Hugo on Twitter (https://twitter.com/hugobowne)

Check out and subscribe to our lu.ma calendar (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) for upcoming livestreams!

Jul 9, 2024 · 1h 36m

Episode 30: Lessons from a Year of Building with LLMs (Part 2)

Hugo speaks about Lessons Learned from a Year of Building with LLMs with Eugene Yan from Amazon, Bryan Bischof from Hex, Charles Frye from Modal, Hamel Husain from Parlance Labs, and Shreya Shankar from UC Berkeley.

These five guests, along with Jason Liu who couldn't join us, have spent the past year building real-world applications with Large Language Models (LLMs). They've distilled their experiences into a report of 42 lessons across operational, strategic, and tactical dimensions (https://applied-llms.org/), and they're here to share their insights.

We’ve split this roundtable into 2 episodes and, in this second episode, we'll explore:
* An inside look at building end-to-end systems with LLMs;
* The experimentation mindset: Why it's the key to successful AI products;
* Building trust in AI: Strategies for getting stakeholders on board;
* The art of data examination: Why looking at your data is more crucial than ever;
* Evaluation strategies that separate the pros from the amateurs.

Although we're focusing on LLMs, many of these insights apply broadly to data science, machine learning, and product development more generally.

LINKS
* The livestream on YouTube (https://www.youtube.com/live/c0gcsprsFig)
* The Report: What We’ve Learned From A Year of Building with LLMs (https://applied-llms.org/)
* About the Guests/Authors (https://applied-llms.org/about.html)

Jun 26, 2024 · 1h 15m

Episode 29: Lessons from a Year of Building with LLMs (Part 1)

Hugo speaks about Lessons Learned from a Year of Building with LLMs with Eugene Yan from Amazon, Bryan Bischof from Hex, Charles Frye from Modal, Hamel Husain from Parlance Labs, and Shreya Shankar from UC Berkeley.

These five guests, along with Jason Liu who couldn't join us, have spent the past year building real-world applications with Large Language Models (LLMs). They've distilled their experiences into a report of 42 lessons across operational, strategic, and tactical dimensions (https://applied-llms.org/), and they're here to share their insights.

We’ve split this roundtable into 2 episodes and, in this first episode, we'll explore:
* The critical role of evaluation and monitoring in LLM applications and why they're non-negotiable, including "evals" (short for evaluations), which are automated tests for assessing LLM performance and output quality;
* Why data literacy is your secret weapon in the AI landscape;
* The fine-tuning dilemma: when to do it and when to skip it;
* Real-world lessons from building LLM applications that textbooks won't teach you;
* The evolving role of data scientists and AI engineers in the age of AI.

Although we're focusing on LLMs, many of these insights apply broadly to data science, machine learning, and product development more generally.

LINKS
* The livestream on YouTube (https://www.youtube.com/live/c0gcsprsFig)
* The Report: What We’ve Learned From A Year of Building with LLMs (https://applied-llms.org/)
* About the Guests/Authors (https://applied-llms.org/about.html)

Jun 26, 2024 · 1h 30m
