[Linkpost] “The Extreme Inefficiency of RL for Frontier Models” by Toby_Ord

EA Forum Podcast (Curated & popular) · EA Forum Team

February 2, 2026 · 14m 34s

Show Notes

This is a link post. The new scaling paradigm for AI reduces the amount of information a model can learn from per hour of training by a factor of 1,000 to 1,000,000. I explore what this means and its implications for scaling.

The last year has seen a massive shift in how leading AI models are trained. 2018–2023 was the era of pre-training scaling. LLMs were primarily trained by next-token prediction (also known as pre-training). Much of OpenAI's progress from GPT-1 to GPT-4 came from scaling up the amount of pre-training by a factor of 1,000,000. New capabilities were unlocked not through scientific breakthroughs, but through doing more or less the same thing at ever-larger scales. Everyone was talking about the success of scaling, from AI labs to venture capitalists to policymakers.

However, there has been markedly little progress in scaling up this kind of training since (GPT-4.5 added one more factor of 10, but was then quietly retired). Instead, there has been a shift to taking one of these pre-trained models and further training it with large amounts of Reinforcement Learning (RL). This has produced models like OpenAI's o1, o3, and GPT-5, with dramatic improvements in reasoning (such as solving [...]
---

First published: February 2nd, 2026

Source: https://forum.effectivealtruism.org/posts/64iwgmMvGSTBHPdHg/the-extreme-inefficiency-of-rl-for-frontier-models

Linkpost URL: https://www.tobyord.com/writing/inefficiency-of-reinforcement-learning

---

Narrated by TYPE III AUDIO (https://type3.audio/).

---

Image from the article: graph showing "Time-horizon of software engineering tasks different LLMs can complete 50% of the time", with task duration versus LLM release date (https://res.cloudinary.com/cea/image/upload/f_auto,q_auto/v1/mirroredImages/64iwgmMvGSTBHPdHg/vmu4iiayawjzycsddcey).
