Multimodal LM roundup: Unified IO 2, inputs and outputs, Gemini, LLaVA-RLHF, and RLHF questions

January 10, 2024 · 15m 58s

Show Notes

A sampling of recent happenings in the multimodal space. Be sure to expect more this year.
This is AI-generated audio made with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/multimodal-rlhf

00:00 Multimodal LM roundup: Unified IO 2, inputs and outputs, Gemini, LLaVA-RLHF, and RLHF questions
02:46 Unified IO 2: Scaling multi-input, multi-output model pretraining
07:47 Collecting preference data for images
09:31 LLaVA-RLHF: The first experiments in multimodal RLHF fine-tuning
13:20 Multimodal RLHF questions, ideas, and resources

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe
