Unsupervised Learning with Jacob Effron

by Redpoint Ventures · Redpoint Ventures

97 episodes · EN

Show overview

Unsupervised Learning with Jacob Effron has been publishing since 2023, and across the 3 years since has built a catalogue of 97 episodes, alongside 7 trailers or bonus episodes. That works out to roughly 90 hours of audio in total. Releases follow a fortnightly cadence.

Episodes typically run thirty-five to sixty minutes, with most landing between 47 min and 1h 4m, and the run-time is fairly consistent across the catalogue. None of the episodes are flagged explicit by the publisher. It is catalogued as an English-language Technology show.

The show is actively publishing — the most recent episode landed yesterday, with 10 episodes already out so far this year. Published by Redpoint Ventures.

Episodes: 97
Running: 2023–2026 · 3y
Median length: 55 min
Cadence: Fortnightly

From the publisher

We probe the sharpest minds in AI in search of the truth about what’s real today, what will be real in the future and what it all means for businesses and the world. If you’re a builder, researcher or investor navigating the AI world, this podcast will help you deconstruct and understand the most important breakthroughs and see a clearer picture of reality. Follow this show and consider enabling notifications to stay up to date on our latest episodes. Unsupervised Learning is a podcast by Redpoint Ventures, an early-stage venture capital fund that has invested in companies like Snowflake, Stripe, and Mistral. Hosted by Redpoint investor Jacob Effron alongside Patrick Chase, Jordan Segall and Erica Brescia.

Latest Episodes

AI Vibe Check: Lab Wars, Why APIs Might Vanish & Future Predictions

Jun 12, 2026 · 1h 6m

Ep 89: AI Research Legend’s Honest Assessment of Where We Are

Jun 3, 2026 · 1h 13m

Ep 88: Unpacking DeepMind's Quest for SuperIntelligence with Demis Hassabis' Biographer

Jun 1, 2026 · 56 min

Ep 87: Gemini Co-Lead on World Models, RL's Next Domains & Continual Learning

May 22, 2026 · 59 min

Ep 86: Yann LeCun on Leaving Meta, Breaking The LLM Paradigm, & Why Hinton is Wrong

May 15, 2026 · 1h 21m

Ep 85: Has AI Infra Stabilized, FM Vibe Shift, & What's Next for Coding Agents

Apr 23, 2026 · 54 min

Ep 84: OpenAI’s Chief Scientist on Continual Learning Hype, RL Beyond Code, & Future Alignment Directions

Apr 9, 2026 · 58 min

Ep 83: Owning the System of Record, AI-Native Org Charts, & Why ITSM is The Most Vulnerable Legacy Category

Serval is one of the fastest-growing AI-native enterprise software companies right now, and this episode is a rare inside look at the deliberate architectural, go-to-market, and talent decisions behind that growth. Jake Stauch breaks down why he made the contrarian bet to build a full system of record rather than layer on top of existing tools, why ITSM is more vulnerable to AI disruption than CRM, ERP, or HRIS, and how Serval is winning Fortune 500 deals against a $14B incumbent with a fraction of the resources. Beyond the product, Jake gets into the organizational decisions that underpin Serval's velocity: why recruiting is the #1 job of every employee, how to prevent talent bar decay as you scale from 8 to 200 people, and how the role of the manager is shifting as ICs own more scope than ever. Threading it all together is a founder's honest account of what it means to build a horizontal software company when the models are improving, the infrastructure is shifting, and the window to displace a legacy incumbent is open but won't stay open forever.

(0:00) Intro
(1:25) What is Serval?
(4:51) Early Doubts and Strategy
(6:34) AI Tailwinds in ITSM
(8:04) Competing with ServiceNow
(9:41) Why ITSM Is Vulnerable
(11:52) Automation via Codegen
(16:27) Critical Guardrails
(28:32) Internal Support Complexity
(30:24) Hiring as the Moat
(31:44) Dream Team Recruiting
(33:49) Managers vs Super ICs
(36:44) Junior Engineers and AI Native Workflows
(43:13) Quickfire

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq’d by VMWare)
@jordan_segall - Partner at Redpoint

Apr 2, 2026 · 54 min

Ep 82: Behind Legora's $550M Raise, Model Competition, Doubling Revenue Every Quarter, & US Expansion

Max Jungestål, CEO of Legora, joins Jacob Effron and Logan Bartlett to discuss the company's $550M Series D and share a candid account of what building an AI-native company at speed actually looks like from the inside. Max argues that the AI application layer requires a fundamentally different operating model than traditional SaaS, one built on low ego, constant reinvention, and a willingness to watch nine months of work get washed away by a model update. He walks through how step-function improvements in the underlying models, particularly Opus 4.5 and 4.6, have repeatedly forced Legora to rebuild core product features from scratch, and why he sees that as a feature, not a bug. On the legal industry, Max offers a ground-level view of how AI is actually diffusing through law firms, less through top-down mandates and more through competitive pressure between firms and, increasingly, from enterprise clients demanding efficiency from their outside counsel. He pushes back on the viability of AI-native law firms, dismisses outcome-based pricing as harder than it looks, and makes the case for why foundation model competition creates tailwinds rather than threats for a company with Legora's depth. The episode closes with a detailed look at the US expansion strategy, including the deliberate cultural decisions, like flying all New York hires to Stockholm for onboarding, that Max believes are the real source of Legora's compounding advantage. 

[0:00] Intro
[1:16] Legora's Series D Story
[3:24] Why You Need Low Ego to Build in AI
[5:58] From 60% to 100% Accuracy in One Summer
[7:04] Law Firm Economics Shift
[14:09] Pricing Seats Vs Outcomes
[18:31] Why Foundation Models Entering Legal Helps Legora
[30:10] Convincing a 75-Year-Old Partner to Go All In
[33:02] Hiring Legal Engineers
[34:32] Running an AI-Native Company
[35:57] The Opus 4.5 Christmas Breakthrough
[40:02] Building With Customers
[44:01] All In On US Expansion
[51:22] Stockholm Startup DNA

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq’d by VMWare)
@jordan_segall - Partner at Redpoint

Mar 11, 2026 · 54 min

Ep 81: Ex-OpenAI Researcher On Why He Left, His Honest AGI Timeline, & The Limits of Scaling RL

This episode features Jerry Tworek, a key architect behind OpenAI's breakthrough reasoning models (o1, o3) and Codex, discussing the current state and future of AI. Jerry explores the real limits and promise of scaling pre-training and reinforcement learning, arguing that while these paradigms deliver predictable improvements, they're fundamentally constrained by data availability and struggle with generalization beyond their training objectives. He reveals his updated belief that continual learning (the ability for models to update themselves based on failure and work through problems autonomously) is necessary for AGI, as current models hit walls and become "hopeless" when stuck. Jerry discusses the convergence of major labs toward similar approaches driven by economic forces, the tension between exploration and exploitation in research, and why he left OpenAI to pursue new research directions. He offers candid insights on the competitive dynamics between labs, the focus required to win in specific domains like coding, what makes great AI researchers, and his surprisingly near-term predictions for robotics (2-3 years) while warning about the societal implications of widespread work automation that we're not adequately preparing for.

(0:00) Intro
(1:26) Scaling Paradigms in AI
(3:36) Challenges in Reinforcement Learning
(11:48) AGI Timelines
(18:36) Converging Labs
(25:05) Jerry’s Departure from OpenAI
(31:18) Pivotal Decisions in OpenAI’s Journey
(35:06) Balancing Research and Product Development
(38:42) The Future of AI Coding
(41:33) Specialization vs. Generalization in AI
(48:47) Hiring and Building Research Teams
(55:21) Quickfire

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq’d by VMWare)
@jordan_segall - Partner at Redpoint

Jan 29, 2026 · 1h 2m

AI Vibe Check: The Actual Bottleneck In Research, SSI’s Mystique, & Spicy 2026 Predictions

bonus

Ari Morcos and Rob Toews return for their spiciest conversation yet. Fresh from NeurIPS, they debate whether models are truly plateauing or if we're just myopically focused on LLMs while breakthroughs happen in other modalities. They reveal why infinite capital at labs may actually constrain innovation, explain the narrow "Goldilocks zone" where RL actually works, and argue why U.S. chip restrictions may have backfired catastrophically, accelerating China's path to self-sufficiency by a decade. The conversation covers OpenAI's code red moment and structural vulnerabilities, the mystique surrounding SSI and Ilya's "two words," and why the real bottleneck in AI research is compute, not ideas. The episode closes with bold 2026 predictions: Rob forecasts Sam Altman won't be OpenAI's CEO by year-end, while Ari gives 50%+ odds a Chinese open-source model will be the world's best at least once next year.

(0:00) Intro
(1:51) Reflections on NeurIPS Conference
(5:14) Are AI Models Plateauing?
(11:12) Reinforcement Learning and Enterprise Adoption
(16:16) Future Research Vectors in AI
(28:40) The Role of Neo Labs
(39:35) The Myth of the Great Man Theory in Science
(41:47) OpenAI's Code Red and Market Position
(47:19) Disney and OpenAI's Strategic Partnership
(51:28) Meta's Super Intelligence Team Challenges
(54:33) US-China AI Chip Dynamics
(1:00:54) Amazon's Nova Forge and Enterprise AI
(1:03:38) End of Year Reflections and Predictions

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq’d by VMWare)
@jordan_segall - Partner at Redpoint

Dec 18, 2025 · 1h 18m

Ep 80: CEO of Surge AI Edwin Chen on Why Frontier Labs Are Diverging, RL Environments & Developing Model Taste

Edwin Chen is the founder and CEO of Surge AI, the data infrastructure company behind nearly every major frontier model. Surge works with OpenAI, Anthropic, Meta, and Google, providing the high-quality data and evaluation infrastructure that powers their models. Edwin reveals why optimizing for popular benchmarks like LMArena is "basically optimizing for clickbait," how one frontier lab's models regressed for 6-12 months without anyone knowing, and why the industry's approach to measurement is fundamentally broken. Jacob and Edwin discuss what actually makes elite AI evaluators, why "there's never going to be a one size fits all solution" for AI models, and how frontier labs are taking surprisingly divergent paths to AGI.

(0:00) Intro
(0:56) The Pitfalls of Optimizing for LMArena
(4:34) Issues with Data Quality and Measurement
(9:44) The Importance of Human Evaluations
(13:40) The Rise of RL Environments
(17:21) Challenges and Lessons in Model Training
(19:59) Silicon Valley's Pivot Culture
(23:06) Technology-Driven Approach
(24:18) Quality Beyond Credentials
(27:51) Impact of Scale Acquisition
(28:35) Hiring for Research Culture
(30:48) Divergence in AI Training Paradigms
(34:16) Future of AI Models
(39:32) Multimodal AI and Quality
(43:44) Quickfire

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq’d by VMWare)
@jordan_segall - Partner at Redpoint

Dec 15, 2025 · 48 min

Ep 79: OpenAI's Head of Product on How the Best Teams Build, Ship and Scale AI Products

This episode features Olivier Godement, Head of Product for Business Products at OpenAI, discussing the current state and future of AI adoption in enterprises, with a particular focus on the recent releases of GPT 5.1 and Codex. The conversation explores how these models are achieving meaningful automation in specific domains like coding, customer support, and life sciences, where companies like Amgen are using AI to accelerate drug development timelines from months to weeks through automated regulatory documentation. Olivier reveals that while complete job automation remains challenging and requires substantial scaffolding, harnesses, and evaluation frameworks, certain use cases like coding are reaching a tipping point where engineers would "riot" if AI tools were taken away. The discussion covers the importance of cost reduction in unlocking new use cases, the emerging significance of reinforcement fine-tuning (RFT) for frontier customers, and OpenAI's philosophy of providing not just models but reference architectures and harnesses to maximize developer success.

(0:00) Intro
(1:46) Discussing GPT-5.1
(2:57) Adoption and Impact of Codex
(4:09) Scientific Community's Use of GPT-5.1
(6:37) Challenges in AI Automation
(8:19) AI in Life Sciences and Pharma
(11:48) Enterprise AI Adoption and Ecosystem
(16:04) Future of AI Models and Continuous Learning
(24:20) Cost and Efficiency in AI Deployment
(27:10) Reinforcement Learning and Enterprise Use Cases
(31:17) Key Factors Influencing Model Choice
(34:21) Challenges in Model Deployment and Adaptation
(38:29) Voice Technology: The Next Frontier
(41:08) The Rise of AI in Software Engineering
(52:09) Quickfire

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq’d by VMWare)
@jordan_segall - Partner at Redpoint

Dec 10, 2025 · 56 min

Ep 78: Jordan Schneider, Host of China Talk, on AI Race, Key Policy Decisions & Unpacking Geopolitical Chip Tension

This week on Unsupervised Learning, Jacob Effron is joined by Jordan Schneider, host of China Talk, who challenges widespread assumptions about US-China AI competition. China's AI development is driven by private capital and market competition—not central government planning—with companies like DeepSeek, Alibaba, and ByteDance operating more like Silicon Valley startups than state projects. The critical bottleneck is compute: the West maintains a 10-15x advantage in advanced chips, and US export controls implemented one month before ChatGPT created a structural edge favoring America for years. Chinese companies aggressively open-source models from strategic necessity—they couldn't establish a quality gap justifying paid access like OpenAI. Jordan explains why the "Goldilocks strategy" of controlled chip dependency fails, why expert consensus opposes selling advanced semiconductors to China despite Nvidia's lobbying, and how Taiwan's invasion risk is driven more by domestic politics than AGI scenarios. China's real advantage may emerge in robotics manufacturing at scale, where they're already deploying while the US debates strategy. 

Inside the Politburo's AI Study Session: https://www.chinatalk.media/p/xi-takes-an-ai-masterclass
Submit your questions to Jacob here: https://docs.google.com/forms/d/1vHBYv0bTT_EgFWTjbKnLr_sn3pZnFmcFGWYVTltKEco/edit

(0:00) Intro
(1:45) The Chinese AI Ecosystem: Pre and Post ChatGPT
(3:45) Government Influence and Private Sector Dynamics
(6:40) Venture Funding and Major Players
(8:36) Talent and International Collaboration
(11:25) Open Source Models and Market Dynamics
(15:24) What Role Does The Chinese Government Play?
(31:17) US-China AI Policy and Strategic Competition
(36:18) The Argument for Selling AI Accelerators
(37:02) Risks of Not Selling to China
(43:34) Technological Constraints and Huawei's Challenges
(51:18) US-China Relations and Taiwan
(1:02:46) Quickfire

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq’d by VMWare)
@jordan_segall - Partner at Redpoint

Dec 5, 2025 · 1h 13m

Ep 77: Anthropic’s Dianne Na Penn on Opus 4.5, Rethinking Model Scaffolding & Safety as a Competitive Advantage

This episode features Dianne Na Penn, a senior product leader at Anthropic, discussing the launch of Claude Opus 4.5 and the evolution of frontier AI models. The conversation explores how Anthropic approaches model development: balancing ambitious capability roadmaps with user feedback, making strategic bets on areas like agentic coding and computer use while deliberately avoiding others like image generation. Dianne shares insights on the shifting nature of AI evaluation (moving beyond saturated benchmarks like SWE-bench toward more open-ended measures), the evolution of scaffolding from "training wheels" to intelligence amplifiers, and why she believes we're closer to transformative long-running AI than most people think. She also discusses Anthropic's distinctive culture of authenticity, the underappreciated benefits of model alignment for producing independent-thinking AI, and why the real bottleneck to AI agents isn't model capability anymore but product innovation.

(0:00) Intro
(0:57) Starting the Work on Opus 4.5
(2:04) Model Capabilities and Surprises
(5:59) Computer Use and Practical Applications
(7:21) Pricing and Positioning
(10:02) Customer Feedback and Early Access
(16:44) The Reality of Enterprise Agents
(18:47) Future of AI and Long-Running Intelligence
(28:06) Anthropic's Culture and Decision Making
(30:31) Key Decisions and Fun Moments
(33:45) Quickfire

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq’d by VMWare)
@jordan_segall - Partner at Redpoint

Dec 2, 2025 · 42 min

Ep 76: Sora Creators Bill Peebles, Rohan Sahai & Thomas Dimson on Their Unexpected Viral Success

This episode features the core team behind Sora, OpenAI's groundbreaking video generation platform that became the #1 app in the App Store. Bill Peebles (research lead), Rohan Sahai (product lead), and Thomas Dimson (engineering/product lead with Instagram background) discuss the unexpected viral success of Sora's launch, the product journey that led to the breakthrough "cameo" feature (putting yourself in AI-generated videos), and their philosophy of building a creator-first social network that prioritizes human creativity over passive consumption. They reveal the technical milestones in video generation, their small team size (under 50 people total at launch), navigation of content moderation challenges, early monetization strategy, and their ambitious vision for video models as world simulators that could eventually contribute to scientific breakthroughs by 2028. The conversation captures both the tactical product decisions and strategic philosophy that made Sora a cultural phenomenon.

(0:00) Intro
(1:35) Unexpected Success of ChatGPT and Sora
(3:55) Sora as an Independent App
(5:38) Sora Prototypes and Evolution
(8:07) User Creativity and Surprising Use Cases
(14:46) Celebrity Engagement and Rights Management
(17:58) Competition and Future of AI Video Models
(25:42) Empowering Creators
(31:21) The Evolution of Image Generation
(33:36) How Do Models Need to Improve?
(42:10) Monetization of Sora
(45:54) Global Reach and Cultural Impact
(48:38) Moderation and Safety Challenges
(50:09) Integration with Other OpenAI Products
(52:07) How do Models Learn Physics?
(55:16) Quickfire

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq’d by VMWare)
@jordan_segall - Partner at Redpoint

Nov 3, 2025 · 1h 3m

AI Round Up: Ari Morcos from Datalogy AI and Rob Toews from Radical VC on Karpathy Reactions, OpenAI’s Dealmaking, & Bubble Reality Check

bonus

This episode features Rob Toews from Radical Ventures and Ari Morcos, Head of Research at Datology AI, reacting to Andrej Karpathy's recent statement that AGI is at least a decade away and that current AI capabilities are "slop." The discussion explores whether we're in an AI bubble, with both guests pushing back on overly bearish narratives while acknowledging legitimate concerns about hype and excessive CapEx spending. They debate the sustainability of AI scaling, examining whether continued progress will come from massive compute increases or from efficiency gains through better data quality, architectural innovations, and post-training techniques like reinforcement learning. The conversation also tackles which companies truly need frontier models versus those that can succeed with slightly-behind-the-curve alternatives, the surprisingly static landscape of AI application categories (coding, healthcare, and legal remain dominant), and emerging opportunities from brain-computer interfaces to more efficient scaling methods. 

(0:00) Intro
(1:04) Debating the AI Bubble
(1:50) Over-Hyping AI: Realities and Misconceptions
(3:21) Enterprise AI and Data Center Investments
(7:46) Consumer Adoption and Monetization Challenges
(8:55) AI in Browsers and the Future of Internet Use
(14:37) Deepfakes and Ethical Concerns
(26:29) AI's Impact on Job Markets and Training
(31:38) Google and Anthropic: Strategic Partnerships
(34:51) OpenAI's Strategic Deals and Future Prospects
(37:12) The Evolution of Vibe Coding
(44:35) AI Outside of San Francisco
(48:09) Data Moats in AI Startups
(50:38) Comparing AI to the Human Brain
(56:07) The Role of Physical Infrastructure in AI
(56:55) The Potential of Chinese AI Models
(1:03:15) Apple's AI Strategy
(1:12:35) The Future of AI Applications

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq’d by VMWare)
@jordan_segall - Partner at Redpoint

Oct 24, 2025 · 1h 16m

AI Round Up: Ari Morcos from Datalogy AI and Rob Toews from Radical VC on AI Talent Wars, xAI’s $200B Valuation, & Google’s Comeback

bonus

This episode features a deep dive into the current state of AI model progress with Ari Morcos (CEO of Datalogy AI and former DeepMind/Meta researcher) and Rob Toews (partner at Radical Ventures). The conversation tackles whether model progress is genuinely slowing down or simply shifting into new paradigms, exploring the role of reinforcement learning in scaling capabilities beyond traditional pre-training. They examine the talent wars reshaping AI labs, Google's resurgence with Gemini, the sustainability of massive valuations for companies like OpenAI and Anthropic, and the infrastructure ecosystem supporting this rapid evolution. The discussion weaves together technical insights on data quality, synthetic data generation, and RL environments with strategic perspectives on acquisitions, regulatory challenges, and the future intersection of AI with physical robotics and brain-computer interfaces.

(0:00) Intro
(2:59) Debate on Model Progress
(8:03) Challenges in AI Generalization and RL Environments
(15:44) Enterprise AI and Custom Models
(20:27) Google's AI Ascent and Market Impact
(24:30) Valuations and Future of AI Companies
(27:55) Evaluating xAI's Position in the AI Landscape
(30:31) The Talent War in AI Research
(35:45) The Impact of Acquihires on Startups
(42:35) The Future of AI Infrastructure
(48:28) The Potential of Brain-Computer Interfaces
(54:45) The Evolution of AI and Robotics
(1:00:50) The Importance of Data in AI Research

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq’d by VMWare)
@jordan_segall - Partner at Redpoint

Sep 24, 2025 · 1h 2m

Ep 75: Nano Banana’s Oliver Wang and Nicole Brichtova - Behind the Breakthrough as Gemini Tops the Charts

Fill out this short listener survey to help us improve the show: https://forms.gle/bbcRiPTRwKoG2tJx8

This week on Unsupervised Learning, Jacob sits down with Nicole Brichtova and Oliver Wang, the Google researchers behind "Nano Banana" - the breakthrough AI image model that achieved unprecedented character consistency and took over social media. The conversation covers how their model fits into creative workflows, why we're still in the early innings of image AI development despite impressive current capabilities, and how image and video generation are converging toward unified models. They also share honest perspectives on current limitations, safety approaches, and why the expectation of going from prompt to production-ready content is fundamentally overhyped.

(0:00) Intro
(1:42) Early Nano Banana Use Cases and Character Consistency
(3:05) Popular Features and User Requests
(3:54) Future Frontiers in Image Models
(5:26) Personalization and Aesthetic Models
(7:39) Model Success and User Engagement
(10:59) Product Design for Different Users
(19:30) Advanced Use Cases and Future Workflows
(23:14) Editing Workflows and Chatbots
(25:14) Google's Image Model Applications
(27:12) Milestones in Image Generation
(29:30) MidJourney's Success
(30:54) Future of Image Models
(33:55) Image Models vs. Video Models
(36:35) Quickfire

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq’d by VMWare)
@jordan_segall - Partner at Redpoint

Sep 17, 2025 · 41 min

Ep 74: Chief Scientist of Together.AI Tri Dao On The End of Nvidia's Dominance, Why Inference Costs Fell & The Next 10X in Speed

Fill out this short listener survey to help us improve the show: https://forms.gle/bbcRiPTRwKoG2tJx8

Tri Dao, Chief Scientist at Together AI and Princeton professor who created Flash Attention and Mamba, discusses how inference optimization has driven costs down 100x since ChatGPT's launch through memory optimization, sparsity advances, and hardware-software co-design. He predicts the AI hardware landscape will shift from Nvidia's current 90% dominance to a more diversified ecosystem within 2-3 years, as specialized chips emerge for distinct workload categories: low-latency agentic systems, high-throughput batch processing, and interactive chatbots. Dao shares his surprise at AI models becoming genuinely useful for expert-level work, making him 1.5x more productive at GPU kernel optimization through tools like Claude Code and O1. The conversation explores whether current transformer architectures can reach expert-level AI performance or if approaches like mixture of experts and state space models are necessary to achieve AGI at reasonable costs. Looking ahead, Dao sees another 10x cost reduction coming from continued hardware specialization, improved kernels, and architectural advances like ultra-sparse models, while emphasizing that the biggest challenge remains generating expert-level training data for domains lacking extensive internet coverage.

(0:00) Intro
(1:58) Nvidia's Dominance and Competitors
(4:01) Challenges in Chip Design
(6:26) Innovations in AI Hardware
(9:21) The Role of AI in Chip Optimization
(11:38) Future of AI and Hardware Abstractions
(16:46) Inference Optimization Techniques
(33:10) Specialization in AI Inference
(35:18) Deep Work Preferences and Low Latency Workloads
(38:19) Fleet Level Optimization and Batch Inference
(39:34) Evolving AI Workloads and Open Source Tooling
(41:15) Future of AI: Agentic Workloads and Real-Time Video Generation
(44:35) Architectural Innovations and AI Expert Level
(50:10) Robotics and Multi-Resolution Processing
(52:26) Balancing Academia and Industry in AI Research
(57:37) Quickfire

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq’d by VMWare)
@jordan_segall - Partner at Redpoint

Sep 10, 2025 · 58 min