
#124: GAIA: a benchmark for General AI Assistants
Misreading Chat · Jun Mukai
December 22, 2023 · 41m 33s
Show Notes
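Mukai takes a hard look at a collection of difficult problems posed to LLMs and at the scoring results. Please send your comments and feedback via Reddit or our listener mailbox. Reviews and stars on iTunes are appreciated too.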
- [2311.12983] GAIA: a benchmark for General AI Assistants
- gaia-benchmark/GAIA · Datasets at Hugging Face