(Voiceover) OpenAI's Reinforcement Finetuning and RL for the masses

Interconnects

December 11, 2024 · 12m 40s

Show Notes

Original post:

https://www.interconnects.ai/p/openais-reinforcement-finetuning

Chapters

00:00 Introduction

04:19 The impact of reinforcement finetuning’s existence

07:29 Hypotheses on reinforcement finetuning’s implementation

Figures

Fig. 1, Yann’s Cake

Fig. 2, Grader config

Fig. 3, RLVR learning curves



This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe