Interconnects

156 episodes — Page 4 of 4

Model merging lessons in The Waifu Research Department

Note: some of the audio in the second half is a little wonky, but the general voice was upgraded so hopefully it's a little less "poppy" until then!I'm trying to fix little pronunciation problems on a weekly basis. Thanks to my early fans! It'll keep improving. E.g. some of the months were wonky.When what seems like pure LLM black magic is actually supported by the literature.This is AI generated audio with Python and 11LabsSource code: https://github.com/natolambert/interconnects-toolsOriginal post: https://www.interconnects.ai/p/model-merging00:00 Model merging lessons in The Waifu Research Department02:21 How and why does model merging work?07:13 Aside: merging vs. ensembles vs. mixture of experts08:21 Why are people doing this?11:22 Tools & Links11:51 Brief (visual) literature review12:07 Full model merging and recent methods15:55 Weight averaging during pretraining17:18 LoRA merging17:53 More backgroundFigure 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-merging/img_005.pngFigure 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-merging/img_016.pngFigure 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-merging/img_042.pngFigure 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-merging/img_051.pngFigure 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-merging/img_055.pngFigure 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-merging/img_058.pngFigure 7: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-merging/img_060.pngFigure 8: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-merging/img_062.pngFigure 9: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-merging/img_065.pngFigure 10: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-merging/img_075.pngFigure 11: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-merging/img_077.pngFigure 12: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-merging/img_084.png This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe

Jan 29, 202419 min

Local LLMs, some facts some fiction

Local LLMs: the latency solution, Meta's open AGI, personalization myth, and moats X factorThe deployment path that'll break through in 2024. Plus, checking in on strategies across Big Tech and AI leaders.This is AI generated audio with Python and 11LabsSource code: https://github.com/natolambert/interconnects-toolsOriginal post: https://www.interconnects.ai/p/local-llms0:00 Local LLMs: the latency solution, Meta's open AGI, personalization myth, and moats X factor4:15 The personalization myth7:13 Meta's local AGI and moats X factorsFigure 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/local-llms/img_026.png This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe

Jan 24, 20249 min

Multimodal blogging: My AI tools to expand your audience

A fun demo on how generative AI can transform content creation, and tools for my fellow writers on Substack!This is AI generated audio with Python and 11LabsSource code: https://github.com/natolambert/interconnects-toolsOriginal post: https://www.interconnects.ai/p/multimodal-blogging-tools0:00 Multimodal blogging tools2:57 Stratechery, passport, and wonderful customer experiences5:51 Wrap-up, features, and next stepsFigure 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/multimodal-blogging/img_006.pngFigure 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/multimodal-blogging/img_008.pngFigure 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/multimodal-blogging/img_012.pngFigure 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/multimodal-blogging/img_020.png This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe

Jan 17, 20248 min

Multimodal LM roundup: Unified IO 2, inputs and outputs, Gemini, LLaVA-RLHF, and RLHF questions

A sampling of recent happenings in the multimodal space. Be sure to expect more this year.This is AI generated audio with Python and 11LabsSource code: https://github.com/natolambert/interconnects-toolsOriginal post: https://www.interconnects.ai/p/multimodal-rlhf00:00 Multimodal LM roundup: Unified IO 2, inputs and outputs, Gemini, LLaVA-RLHF, and RLHF questions02:46 Unified IO 2: Scaling multi-input, multi-output model pretraining07:47 Collecting preference data for images09:31 LLaVA-RLHF: The first experiments in multimodal RLHF fine-tuning13:20 Multimodal RLHF questions, ideas, and resources This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe

Jan 10, 202415 min

Where 2024’s “open GPT4” can’t match OpenAI’s

And why the comparisons don't really matter. Repeated patterns in the race for reproducing ChatGPT, another year of evaluation crises, and people who will take awesome news too far.This is AI generated audio with Python and 11LabsSource code: https://github.com/natolambert/interconnects-toolsOriginal post: https://www.interconnects.ai/p/open-gpt4-limitations00:00 Where 2024's "open GPT4" can't match OpenAI's03:19 Models vs. products04:51 RLHF progress: Revisiting Llama 2's release and potential in 202408:30 Smaller scale open RLHF10:33 Opportunities12:24 Commentary This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe

Jan 5, 202413 min

Interviewing Tri Dao and Michael Poli of Together AI on the future of LLM architectures

This interview is on YouTube and podcast players.Giving a voice to researchers is the best way to cut through the noise and understand what is happening with core developments of LLM technologies. I’m excited to get to talk with Michael Poli (Stanford PhD student + research at Together AI) and Tri Dao (incoming professor at Princeton + Chief Scientist at Together AI). This builds on the mega-post from yesterday on the same topics, though the interview is obviously less math heavy:Interconnects is a reader-supported publication. Consider becoming a subscriber.Topics: Introductions | Why Attention works and may not scale | Quadratic scaling in attention | What is Striped Hyena | What is Mamba | Mamba hardware optimization | Predictions for 2024 architectures | More predictions for AIIntroductions[00:00:00] Nathan Lambert: Okay. Hey, everyone. Welcome to the first interview that we're going to post on interconnects. I'm really trying to bring more scientific voices into the AI discourse as media is covering a lot these days. I'm happy to be here with Michael Poli and Tri Dao, experts in some of these non attention architectures that have been really blowing up in the last few weeks of December.So, Michael, do you want to introduce yourself first?[00:00:25] Michael Poli: Sure. Thank you. Thank you, Nathan. For inviting me, I, do research at Together AI. And I was also a PhD student at Stanford, working with Stefano Ermon and, and, Chris Re, that's, that's how I met Tri as well. if I had to choose maybe, I moved to a few different areas of research.if I had to choose one, I like to, do research at the intersection of signal processing, dynamical systems, and deep learning, and most recently, luckily, there's been more interest in, in kind of efficient architectures that use some of these principles. to improve scaling, along different axes and to, to get sort of new, new trade offs at inference time.[00:01:13] Nathan Lambert: Great. And Tri?[00:01:16] Tri Dao: Everyone, thanks Nathan for, for, hosting us. really excited to be here. I'm Tri. I, just finished my PhD at Stanford. and I'm being assistant professor at Princeton, and right now I'm chief scientist at Together AI. it's, it's a startup working on AI infrastructure. And, yeah, I've been working at the intersection of machine learning and systems, so designing algorithms that take advantage of the hardware that, that they run on.I'm interested in, long range, dependencies, how to encode that into a model, designing architectures that can, can, learn long range dependencies. yeah, really excited to be here.Why Attention works and may not scale[00:02:01] Nathan Lambert: Okay. I think I'm going to, I have some questions dive right into this. I think you two will kind of both answer them or someone can answer longer, whatever you want.I think to start with, we should talk about maybe why attention works and what the limitations of attention are. I think. Almost every person in tech broadly now knows that a transformer is a model built with attention and chat GPT does that but like, why is this so good, even like how much of a transformer is built with attention are there other things going on, and what might be challenges there.[00:02:35] Tri Dao: Right. so, transformer which is this. Contexture that powers most of the exciting applications that we're seeing nowadays, as you mentioned, and so on. attention is kind of the core layer there, and attention actually became, earlier, around 2014, 2015, and then transformer came out, incorporating that, focusing a lot on, on, attention, with these, MLPs, interleaving, MLP and, and attention.And I think the success largely has been, They are, they seem to be able to scale really well so that you can scale up the models, with more, more parameters, with more data. And that has been the recipe for, for success. It sounds obvious now, but I think, five years ago that wasn't, that wasn't clear.so it seems like, you know, Transformer Architecture is, is a hugely successful one. and, you know, a couple of reasons why it's successful. I think it's like General enough that it's able to learn a lot from data. And two is, is pretty friendly to hardware. You can, there's no kind of sequential dependency like previous RNNs.so as a result, you can run it well on GPUs, TPUs. you can scale it up. It's very hardware efficient. I've personally have worked on making it more hardware efficient as well. So it's just kind of the recipe for, for success, which is, general architecture that scales well. if you're an NLP person, maybe you, you, you said, you know, incorporate some kind of inductive bias for, for, to protect, personally, I think it's a very general architecture that, that scales well and it's hardware friendly.[00:04:16] Nathan Lambert: Yeah. Yeah. It's, it's remarkable how it seems so obvious now and it's like. I think one of the things that studying this work is the context length becomes a really interesting access to stud

Dec 21, 202335 min

« Prev 1 2 34