LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks

December 22, 2024 (17m 35s)

Show Notes

LongBench v2 is a new benchmark that tests how well AI can understand and answer questions about really long texts, like books, articles, and code. It has over 500 questions, and even human experts have trouble answering them quickly. The questions cover many kinds of tasks, like figuring out who committed the crime in a story, translating an unfamiliar language, and understanding how a computer program works. The benchmark is hard because it requires AI to reason deeply about the information rather than just look up simple answers. The researchers who made LongBench v2 hope it will help make AI better at understanding and reasoning about long, complicated material.

https://arxiv.org/pdf/2412.15204
