Episode 907

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

June 13, 202520m 39s

Audio is streamed directly from the publisher (media.transistor.fm) as published in their RSS feed. Play Podcasts does not host this file. Rights-holders can request removal through the copyright & takedown page.

Original episode page

Show Notes

🤗 Upvotes: 76 | cs.CL, cs.LG

Authors:
Pengyi Li, Matvey Skripkin, Alexander Zubrey, Andrey Kuznetsov, Ivan Oseledets

Title:
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Arxiv:
http://arxiv.org/abs/2506.06395v3

Abstract:
Large language models (LLMs) excel at reasoning, yet post-training remains critical for aligning their behavior with task goals. Existing reinforcement learning (RL) methods often depend on costly human annotations or external reward models. We propose Reinforcement Learning via Self-Confidence (RLSC), which uses the model's own confidence as reward signals-eliminating the need for labels, preference models, or reward engineering. Applied to Qwen2.5-Math-7B with only 16 samples per question and 10 or 20 training steps, RLSC improves accuracy by +13.4% on AIME2024, +21.2% on MATH500, +21.7% on Minerva Math, +20.8% on Olympiadbench, and +9.7% on AMC23. RLSC provides a simple, scalable post-training method for inference models, requiring only a small number of samples and unlabelled supervision.

← All episodes of Daily Paper Cast