[Linkpost] “The Extreme Inefficiency of RL for Frontier Models” by Toby_Ord
EA Forum Podcast (Curated & popular) · EA Forum Team
February 2, 2026 · 14m 34s
Show Notes
This is a link post.<p> The new scaling paradigm for AI reduces the amount of information a model can learn from per hour of training by a factor of 1,000 to 1,000,000. I explore what this means and its implications for scaling.</p><p> The last year has seen a massive shift in how leading AI models are trained. 2018–2023 was the era of pre-training scaling. LLMs were primarily trained by next-token prediction (also known as pre-training). Much of OpenAI's progress from GPT-1 to GPT-4 came from scaling up the amount of pre-training by a factor of 1,000,000. New capabilities were unlocked not through scientific breakthroughs, but through doing more-or-less the same thing at ever-larger scales. Everyone was talking about the success of scaling, from AI labs to venture capitalists to policymakers. </p><p> However, there's been markedly little progress in scaling up this kind of training since (GPT-4.5 added one more factor of 10, but was then quietly retired). Instead, there has been a shift to taking one of these pre-trained models and further training it with large amounts of Reinforcement Learning (RL). This has produced models like OpenAI's o1, o3, and GPT-5, with dramatic improvements in reasoning (such as solving [...]</p> <p>---</p>
<p><b>First published:</b><br/>
February 2nd, 2026 </p>
<p><b>Source:</b><br/>
<a href="https://forum.effectivealtruism.org/posts/64iwgmMvGSTBHPdHg/the-extreme-inefficiency-of-rl-for-frontier-models?utm_source=TYPE_III_AUDIO&utm_medium=Podcast&utm_content=Source+URL+in+episode+description&utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank">https://forum.effectivealtruism.org/posts/64iwgmMvGSTBHPdHg/the-extreme-inefficiency-of-rl-for-frontier-models</a> </p>
<p><strong>Linkpost URL:</strong><br><a href="https://forum.effectivealtruism.org/out?url=https%3A%2F%2Fwww.tobyord.com%2Fwriting%2Finefficiency-of-reinforcement-learning" rel="noopener noreferrer" target="_blank">https://www.tobyord.com/writing/inefficiency-of-reinforcement-learning</a></p>
<p>---</p>
<p>Narrated by <a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&utm_medium=Podcast&utm_content=Narrated+by+TYPE+III+AUDIO&utm_term=ea_forum&utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank">TYPE III AUDIO</a>.</p>
<p>---</p><div style="max-width: 100%;"><p><strong>Images from the article:</strong></p><a href="https://res.cloudinary.com/cea/image/upload/f_auto,q_auto/v1/mirroredImages/64iwgmMvGSTBHPdHg/vmu4iiayawjzycsddcey" target="_blank"><img src="https://res.cloudinary.com/cea/image/upload/f_auto,q_auto/v1/mirroredImages/64iwgmMvGSTBHPdHg/vmu4iiayawjzycsddcey" alt="Graph showing &quot;Time-horizon of software engineering tasks different LLMs can complete 50% of the time&quot; with task duration versus LLM release date." style="max-width: 100%;" /></a></div>