[Linkpost] “Is there a Half-Life for the Success Rates of AI Agents?” by Toby_Ord

EA Forum Podcast (Curated & popular) · EA Forum Team

February 2, 2026 · 19m 45s


Show Notes

This is a link post.<p> Building on the recent empirical work of Kwa et al. (2025), I show that within their suite of research-engineering tasks the performance of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model — a constant rate of failing during each minute a human would take to do the task. This implies an exponentially declining success rate with the length of the task and that each agent could be characterised by its own half-life. This empirical regularity allows us to estimate the success rate for an agent at different task lengths. And the fact that this model is a good fit for the data is suggestive of the underlying causes of failure on longer tasks — that they involve increasingly large sets of subtasks where failing any one fails the task. Whether this model applies more generally on other suites of tasks is unknown and an important subject for further work.</p><p> METR's results on the length of tasks agents can reliably complete</p><p> A recent paper by Kwa et al. 
(2025) from the research organisation METR has found an exponential trend in the duration of the tasks that frontier AI agents can [...]</p> <p>---</p><p><strong>Outline:</strong></p><p>(05:33) Explaining these results via a constant hazard rate</p><p>(14:54) Upshots of the constant hazard rate model</p><p>(18:47) Further work</p><p>(19:25) References</p> <p>---</p> <p><b>First published:</b><br/> February 2nd, 2026 </p> <p><b>Source:</b><br/> <a href="https://forum.effectivealtruism.org/posts/qz3xyqCeriFHeTAJs/is-there-a-half-life-for-the-success-rates-of-ai-agents-3?utm_source=TYPE_III_AUDIO&utm_medium=Podcast&utm_content=Source+URL+in+episode+description&utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank">https://forum.effectivealtruism.org/posts/qz3xyqCeriFHeTAJs/is-there-a-half-life-for-the-success-rates-of-ai-agents-3</a> </p> <p><strong>Linkpost URL:</strong><br/><a href="https://forum.effectivealtruism.org/out?url=https%3A%2F%2Fwww.tobyord.com%2Fwriting%2Fhalf-life" rel="noopener noreferrer" target="_blank">https://www.tobyord.com/writing/half-life</a></p> <p>---</p> <p>Narrated by <a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&utm_medium=Podcast&utm_content=Narrated+by+TYPE+III+AUDIO&utm_term=ea_forum&utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank">TYPE III AUDIO</a>.</p> <p>---</p><div style="max-width: 100%;"><p><strong>Images from the article:</strong></p><a href="https://res.cloudinary.com/cea/image/upload/f_auto,q_auto/v1/mirroredImages/qz3xyqCeriFHeTAJs/i68dte0l4hua413wvubt" target="_blank"><img src="https://res.cloudinary.com/cea/image/upload/f_auto,q_auto/v1/mirroredImages/qz3xyqCeriFHeTAJs/i68dte0l4hua413wvubt" alt="Graph showing &quot;Length of tasks AI agents have been able to complete autonomously&quot; over time with exponential trend line." 
style="max-width: 100%;" /></a><hr style="margin-top: 24px; margin-bottom: 24px;" /><a href="https://res.cloudinary.com/cea/image/upload/f_auto,q_auto/v1/mirroredImages/qz3xyqCeriFHeTAJs/edk33wdhrnmajvucjheu" target="_blank"><img src="https://res.cloudinary.com/cea/image/upload/f_auto,q_auto/v1/mirroredImages/qz3xyqCeriFHeTAJs/edk33wdhrnmajvucjheu" alt="Diagram showing three sections: Diverse Task Suite, Task Performance, and Time Horizon Analysis." style="max-width: 100%;" /></a><hr style="margin-top: 24px; margin-bottom: 24px;" /><a href="https://res.cloudinary.com/cea/image/upload/f_auto,q_auto/v1/mirroredImages/qz3xyqCeriFHeTAJs/vkwzt5iz3h9popevrxf1" target="_blank"><img src="https://res.cloudinary.com/cea/image/upload/f_auto,q_auto/v1/mirroredImages/qz3xyqCeriFHeTAJs/vkwzt5iz3h9popevrxf1" alt="Graph showing survival percentage S(t) declining over task length with markers at T₈₀, T₅₀, and T₂₅." style="max-width: 100%;" /></a><hr style="margin-top: 24px; margin-bottom: 24px;" /><a href="https://res.cloudinary.com/cea/image/upload/f_auto,q_auto/v1/mirroredImages/qz3xyqCeriFHeTAJs/vitusefkjjbgqklcbmgp" target="_blank"><img src="https://res.cloudinary.com/cea/image/upload/f_auto,q_auto/v1/mirroredImages/qz3xyqCeriFHeTAJs/vitusefkjjbgqklcbmgp" alt="Six graphs comparing success probability versus task length across different Claude AI models and their time horizons." style="max-width: 100%;" /></a><hr style="margin-top: 24px; margin-bottom: 24px;" /><a href="https://res.cloudinary.com/cea/image/upload/f_auto,q_auto/v1/mirroredImages/qz3xyqCeriFHeTAJs/eaileroszvjhkibd9qjc" target="_blank"><img src="https://res.cloudinary.com/cea/image/upload/f_auto,q_auto/v1/mirroredImages/qz3xyqCeriFHeTAJs/eaileroszvjhkibd9qjc" alt="Graph showing human baseliner performance declining as task length increases from seconds to days." style="max-width: 100%;" /></a><p><em>Apple Podcasts and Spotify do not show images in the episode description. 
Try <a href="https://pocketcasts.com/" target="_blank" rel="noreferrer">Pocket Casts</a>, or another podcast app.</em></p></div>
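
The constant hazard rate model summarised in the show notes can be illustrated with a minimal sketch. It assumes only what the summary states: a constant chance of failing during each minute a human would take on the task, which gives an exponentially declining success rate and a half-life for each agent. The function names and the 60-minute half-life are hypothetical, chosen for illustration, and are not taken from the article or from Kwa et al. (2025).

```python
import math


def hazard_rate(half_life_minutes: float) -> float:
    """Constant per-minute failure rate implied by a given half-life."""
    return math.log(2) / half_life_minutes


def success_probability(task_minutes: float, half_life_minutes: float) -> float:
    """S(t) = exp(-rate * t), which equals 0.5 ** (t / half_life)."""
    return math.exp(-hazard_rate(half_life_minutes) * task_minutes)


def horizon(target_success: float, half_life_minutes: float) -> float:
    """Task length at which success falls to target_success (0.8 gives T80)."""
    return half_life_minutes * math.log(target_success) / math.log(0.5)


if __name__ == "__main__":
    half_life = 60.0  # hypothetical agent with a 60-minute half-life (T50)
    for minutes in (15, 60, 240):
        p = success_probability(minutes, half_life)
        print(f"{minutes:>4} min task: success probability {p:.3f}")
    for target in (0.8, 0.5, 0.25):
        t = horizon(target, half_life)
        print(f"T{int(target * 100)}: {t:.1f} min")
```

Under this model the horizons are fixed multiples of the half-life: T25 is exactly twice T50, and T80 is roughly a third of it, so a single fitted half-life is enough to estimate the success rate at any task length.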