📅 ThursdAI - Feb 29 - Leap Year Special ✨

ThursdAI - The top AI news from the past week · Alex Volkov, Prateek Jain, Aditya Kusupati, and Nisten

March 1, 2024 · 1h 53m

Show Notes

Happy leap year day everyone, very excited to bring you a special once-in-four-years edition of ThursdAI 👏

(Today is also Dune 2 day (am going to see the movie right after I write these here words) and well.. to some folks, this is the bull market ₿ days as well. So congrats to all who weathered the bear market!)

This week we had another great show with many updates. Once again I was able to cover most of the news AND bring you a deep dive into a very interesting concept called Matryoshka Representation Learning (aka 🪆 embeddings), with two of the paper's authors joining to chat with me on the pod!

TL;DR of all topics covered:

* AI Art & Diffusion & 3D

* Playground releases a new diffusion foundational model Playground V2.5 (DEMO)

* Alibaba teasing EMO - incredible animating faces (example)

* Ideogram 1.0 announced - SOTA text generation (Announcement)

* Open Source LLMs

* Gemma update - hard to finetune, not better than Mistral 7B

* Llama 3 will release in June 2024, not anytime soon

* Starcoder 2 + stack V2 (Announcement)

* Berkeley Function-Calling Leaderboard (Announcement)

* Argilla released OpenHermesPreferences the largest open dataset for RLHF & DPO (Announcement)

* STORM from Stanford to write long documents (Thread)

* Big CO LLMs + APIs

* Mistral releases Mistral Large & Le Chat (Announcement, Le Chat)

* Microsoft + Mistral strike a deal (Blog)

* Google teases GENIE - model makes images into interactive games (announcement)

* OpenAI allowing fine-tune on GPT 3.5

* WordPress & Tumblr preparing to sell user data to OpenAI & Midjourney

* Other

* Modular (the Mojo folks) releases their MAX inference engine, compatible with PyTorch, TensorFlow & ONNX models (Announcement)

* Interview with MRL (Matryoshka Representation Learning) authors (in audio only)

AI Art & Diffusion

Ideogram 1.0 launches - superb text generation!

Ideogram, founded by ex-Google Imagen folks, which we reported on before, finally announced 1.0, with a focus on superb text generation inside images. It's really great, and I generated a few owls already (don't ask, hooot) and I don't think I will stop. This is superb for meme creation and answering in multimedia, and it's fast as well. I'm very pleased! They also announced an investment round from A16Z to go with their 1.0 release, so definitely give them a try.

Playground V2.5

Suhail Doshi and Playground release a new foundational image model called Playground v2.5 and it looks awesome, very realistic and honestly looks like it beats MJ and DALL-E on many simple prompts.

They also announced that this model received higher user preference scores based on 1K prompts (which we didn't get to see), and they have released the model into the wild: you can download it and play with a free demo provided by the Modal folks.

Another SORA moment? Alibaba teases EMO 🤯 (website)

Ok, this one has to be talked about. Alibaba released quite a few preview videos and a paper about something called EMO, a way to animate talking/singing avatars from just 1 image. It broke my brain, and I couldn't stop staring at it. Honestly, it's quite something. This model animates not only the mouth: eyes are blinking, there are emotions, hair moves, even earrings, and most impressive of all, the whole larynx muscle structure seems to be animated as well!

Just look at this video, and then look at it again.

The GitHub repo was created but no code has been released. I really hope we get this code at some point, because animating videos with this fidelity + something like SORA can mean so many possible creations!

I wrote this tweet only two weeks ago, and I'm already feeling that it's outdated and we're farther along on the curve to there with EMO, what a great release!

And just because it's so mind-blowing, here are a few more EMO videos for you to enjoy:

Open Source LLMs

Starcoder 2 + The Stack V2

Folks at Hugging Face and BigCode have released a beast on us, StarCoder 2 ⭐️ The most complete open Code-LLM 🤖 StarCoder 2 is the next iteration of StarCoder and comes in 3 sizes, trained on 600+ programming languages and over 4 trillion tokens from The Stack v2. It outperforms StarCoder 1 by a margin and has the best overall performance across 5 benchmarks 🚀🤯.

TL;DR:

* 🧮 3B, 7B & 15B parameter versions

* 🪟 16384 token context window

* 🔠 Trained on 3-4T tokens (depending on size)

* 💭 600+ programming languages

* 🥇 15B model achieves 46% on HumanEval

* 🧠 Grouped Query Attention and Sliding Window Attention

* 💪🏻 Trained on 1024 x H100 NVIDIA GPUs

* ✅ Commercial-friendly license

* 🧑🏻‍💻 Can be used for local Copilots

The Stack v2 is a massive (10x) upgrade on the previous stack dataset, containing 900B+ tokens 😮

Big CO LLMs + APIs

🔥 Mistral announces Mistral-Large + Le Chat + Microsoft partnership

Today, we are releasing Mistral Large, our latest model. Mistral Large is vastly superior to Mistral Medium, handles 32k tokens of context, and is natively fluent in English, French, Spanish, German, and Italian.

We have also updated Mistral Small on our API to a model that is significantly better (and faster) than Mixtral 8x7B.

Lastly, we are introducing Le Chat, a chat interface (currently in beta) on top of our models.

Two important notes here. One, they now support function calling on all Mistral models in their API, which is a huge deal. Two, updating Mistral Small to a model that's "significantly better and faster" than Mixtral 8x7B is quite the hint!

I also want to highlight Arthur's tweet clarifying their commitment to open source, because it's very important. They released a new website that again had mentions of "don't train on our models", which they removed. The new website had also dropped the section that committed them to open weights, and they quickly put a much bigger section back up!

This week's Buzz (What I learned with WandB this week)

I mentioned this before, but this may shock new subscribers: ThursdAI isn't the only (nor the first!) podcast from Weights & Biases. Our CEO Lukas has a long-standing podcast that's about to hit 100 episodes, and this week he interviewed John Halamka, president of Mayo Clinic Platform.

It's a fascinating interview, specifically because Mayo Clinic recently announced a multi-year collaboration with Cerebras about bringing AI to everyone who googles their symptoms and ends up on Mayo Clinic websites anyway. Apparently John has been in AI for longer than I've been alive, so he's incredibly well positioned to do this and bring us the AI medicine future!

Modular announces MAX (Modular Accelerated Xecution) Developer Edition Preview (blog)

Modular, the company that created Mojo Lang from Chris Lattner, has now announced the second part of their stack, coming to all of us, and it's called MAX. It's an inference engine that has Mojo built in, supports PyTorch, TensorFlow and ONNX, and is supposedly going to run the same AI models we run now, significantly faster. MAX is a unified set of tools and libraries that unlock performance, programmability and portability for your AI inference pipelines.

Right now they support only CPU inference, where they significantly boost performance. However, they are planning GPU support soon as well, and promise up to 5x faster AI inference for most models like Mistral, Llama etc.

I personally think this is a huge development, and while it's still early, it's definitely worth taking a look at the incredible speedups we've been seeing lately. From Groq (as we chatted with them last week) to Modular, we're very well on our way to running huge models faster, and small models instantly!

🪆 MRL (Matryoshka Embeddings) interview with Aditya & Prateek

OpenAI recently released 2 new embedding models that replaced their ada-002 embeddings, and when they released them, they mentioned a new way of shortening dimensions. Soon after, on X, the authors of the 2022 paper on MRL (Matryoshka Representation Learning) spoke out and said that this new "method" is actually MRL, the concept they came up with and presented at NeurIPS.

Since then I saw many folks explore Matryoshka embeddings, from Bo Wang to Connor Shorten, and I wanted to get in on the action! It was quite exciting to hear from Aditya and Prateek about MRL: how they are able to significantly reduce embedding size by packing the most important information into the first dimensions, the implications of this for speed of retrieval, the significant boost in use-cases after the ChatGPT LLM boom, and more! Definitely give this one a listen if you're interested, the interview starts at 01:19:00 on the pod.
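To make the "first dimensions" idea concrete, here is a minimal sketch of how a Matryoshka-style embedding might be shortened at query time: keep the leading values and re-normalize. The vectors are made-up toy values, and `truncate_embedding` and `cosine` are hypothetical helper names, not code from the MRL paper or from OpenAI. The trick only preserves quality when the model was actually trained with MRL.

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` values of an embedding and L2-normalize them."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to normalize
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 8-dimensional "embeddings" (illustrative values, not from a real model).
query = [0.9, 0.3, 0.1, 0.05, 0.02, 0.01, 0.01, 0.0]
doc = [0.8, 0.4, 0.2, 0.0, 0.03, 0.0, 0.02, 0.01]

# Compare at full size and at half size; with an MRL-trained model the
# shortened vectors are meant to keep most of the retrieval quality.
full_score = cosine(query, doc)
short_query = truncate_embedding(query, 4)
short_doc = truncate_embedding(doc, 4)
short_score = cosine(short_query, short_doc)

print(f"8 dims: {full_score:.3f}, 4 dims: {short_score:.3f}")
```

Halving the vector halves storage and speeds up similarity search, which is the retrieval speed implication discussed in the interview.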

Thank you for reading, I really appreciate you coming back here week to week, and if you enjoy this content, please share with 1 friend and give us a ⭐ rating on Apple Pod? Here's a nice Ideogram image as a preemptive thank you!

As always, here’s the full transcript

[00:00:00] Intro and welcome

[00:00:00] Alex Volkov: Hey, you're on ThursdAI. This is Alex. Happy Leap Year Special Edition. Today's February 29th. We had a great show today. So great that got carried away during the recap, and it's almost twice as long as it usually is. The recap, not the show. But no worries. As always, if you're short on time, the first 25 minutes or so of this almost two hour podcast will catch you up on everything that happened in AI this week.

[00:00:29] Alex Volkov: If you're using Apple Podcasts, or any other modern podcatcher, you can also skip to the chapters, that I'm outlining every week and listen to the part that interests you, and only to that part.

[00:00:39] Alex Volkov: This week, after the newsy updates, we also had a deep dive into something called Matryoshka Embeddings, with the authors of the MRL paper, Aditya and Prateek.

[00:00:49] Alex Volkov: Thank you guys, I really enjoyed chatting with them both. We geeked out on why OpenAI decided to release something they came up with two years ago, and how it affects the AI industry in the post-LLM-explosion world. So definitely give them a listen

[00:01:05] Alex Volkov: at the end of this episode. A brief TL;DR, then the full news conversation you're used to, broken down into chapters, and then a deep dive, after this brief message from Weights & Biases.

[00:01:15] AI teams are all asking the same question. How can we better manage our model development workflow? The path to production is increasingly complex, and it can get chaotic keeping track of thousands of experiments and models. Messy spreadsheets and ad hoc notebooks aren't going to cut it. The best AI teams need a better solution.

[00:01:38] And better tools. They need Weights & Biases, the AI developer platform, to unlock their productivity and achieve production ML at scale. Replace messy spreadsheets with an automated system of record for experiments.

[00:01:57] Communicate about model evaluation and collaboratively review results across the team. Clean up disorganized buckets of models with a unified registry. Automatically capture full model lineage: all the data and code used for training and testing. Seamlessly connect to compute to scale up training, and run large-scale sweeps efficiently to optimize models.

[00:02:24] Analyze the performance of large language models, and monitor LLM usage and costs with live, customizable dashboards. Get your team on the same page to bridge the gaps from ideation to production. Use Weights & Biases to build, manage, and deploy better models, faster.

[00:02:51] Alex Volkov: folks, here we go.

[00:03:10] Alex Volkov: Welcome, everyone. Welcome. This is ThursdAI, leap year of 2024. Today is February 29th. Don't get to say this often, February 29th. And this is ThursdAI, your weekly AI news update show and deep dive. We'll see a lot of it. My name is Alex Volkov. I'm an AI evangelist with Weights & Biases, and I get to do this and bring you all the AI updates that we've collected for the past week.

[00:03:43] Alex Volkov: And I'm joined here from week to week on stage by guests and experts and co-hosts. I have Yam Peleg with me and Nisten Tahiraj, and we're going to have a few more guests later in the show today. On this very happy leap year, very special day, we're going to talk about a bunch of updates from the AI world, including big company updates, open source stuff.

[00:04:07] TL;DR for ThursdAI - February 29th

[00:04:07] Alex Volkov: Alright, so here's everything that we've talked about on ThursdAI for February 29th. This was a great once-in-four-years show. I just want to shout out before I recap everything that, as always, I'm very happy when folks who build the stuff that we talk about join and talk about that stuff. This also happened today, so we had a deep dive, which I'm going to cover at the end.

[00:04:33] Alex Volkov: And also I will shout out that we're coming up on one year of ThursdAI, which is March 14th. So in two weeks, we're going to have a one year celebration. I'm not quite sure what we're going to do with this. Maybe we'll do a giveaway of GPU credits. Maybe I'll do some other musical stuff, but yeah, that's coming.

[00:04:50] Alex Volkov: I'm very excited. It's been a year, and it's been a crazy year of AI. Maybe we'll do a full recap. So with that, here's everything that we've talked about on ThursdAI for February 29th. We started with open source LLMs, and we talked about Google's Gemma update. Last week we covered that Gemma had just been released, and how the whole community started using Gemma and thinking about fine-tuning and support in LM Studio and Ollama and all these things. It's been a week or so since Gemma was out there, and we've tried to identify, from the vibes perspective and from the finetuners' perspective, whether or not Gemma is a replacement for the top-running Mistral 7B models that we had. Even though on evaluations Gemma looks a little better and performs a little better than Mistral, we covered that it's not really 7B, it's like 8.5 billion parameters; they just counted this differently.

[00:05:40] Alex Volkov: We also saw that, across multiple attempts from friends of the pod (Eric Hartford, Teknium, and Yam, who was here), it's really hard to fine-tune. The loss curve goes crazy, and we haven't seen great fine-tunes yet. There's something from Hugging Face, from Philipp Schmid, but definitely

[00:05:57] Alex Volkov: the finetuners community hasn't yet taken this model and made it significantly better, as we expected they would. They're still working on this, so expect to hear more about this soon. And we also highlighted how much Mistral 7B set a very high bar in open source LLMs, and it's really hard to beat, even if you're Google, even if you have a huge amount of TPUs.

[00:06:19] Alex Volkov: We then briefly covered an unfortunate report from The Information about Meta: Llama 3 will not be breaking news on ThursdAI this week or next week. The Llama 3 release is probably scheduled for June 2024, so not anytime soon. And it doesn't look like there's any information as to why that is, only speculation.

[00:06:39] Alex Volkov: So we definitely covered that this news happened. We then moved on and talked about StarCoder 2, plus The Stack version 2 as well. StarCoder 2 is from, I think, Hugging Face and the StarCoder team. They released a new model that pretty much beats DeepSeek Coder, which before this was the best coding model in this area, at the 15B and 7B parameter sizes, and StarCoder 2 now beats those quite significantly. Together with this they also released The Stack v2. The Stack is just a huge dataset of code from GitHub and other places, and this dataset is 10x the previous one.

[00:07:16] Alex Volkov: It also includes opt-outs, so if you don't want your code to be trained on and put into The Stack, The Stack v2 includes opt-out requests as well. Definitely a great contribution to open source. It's 900-plus billion tokens in The Stack, which is crazy.

[00:07:33] Alex Volkov: And I think there's deduplication, so it reduces a huge dataset, and it supports 600 programming languages. Quite impressive. We then also mentioned that the folks from Berkeley behind Gorilla, who previously released work on making AIs retrieve and call functions, have now released what's called a function calling leaderboard. The function calling leaderboard is very cool because it comes in addition to the MTEB embeddings leaderboard that we've mentioned

[00:08:02] Alex Volkov: today, and obviously the open source LLM leaderboard on Hugging Face that we all look to to see what the best performing models are. Now we also have something that measures the ability of models to do function calling. Function calling started with OpenAI, then Anthropic added support, and now Mistral added support.

[00:08:18] Alex Volkov: So we covered this effort as well, and links will be in the show notes. We then moved on and covered Argilla (I'm never sure how to pronounce this). They used the OpenHermes dataset. OpenHermes is the dataset from Nous Research that is fully open, and you can use it in production without being afraid of being sued.

[00:08:37] Alex Volkov: And OpenHermesPreferences is the new largest open dataset for RLHF and DPO, so Direct Preference Optimization. Argilla used their distilabel tool to take every instruction in that dataset and turn it into a preference instruction, where the model basically learns which one of two responses is preferable.

[00:08:59] Alex Volkov: So both could be correct, but one could be more preferable. That's basically a very short version of DPO. And Argilla released the largest open source DPO dataset, according to them. Interestingly, they used another Nous model based on Yi-34B to actually create those pairs and those preferences, which is super cool.

[00:09:18] Alex Volkov: I love how open source now uses other open source in order to rank and improve itself, which is really cool. So this is everything we covered in open source. Then we moved into big companies' LLMs and APIs, and here's the biggest news from this week. If you guys remember, we talk about Mistral's open weights models in the open source and open weights LLMs section, but Mistral is also now an API provider, and they have a platform called La Plateforme (pardon my very bad French). They released a huge model for us called Mistral Large, which we had only speculated was coming at some point, plus they also released something called Le Chat.

[00:09:59] Alex Volkov: And Mistral Large, based on MMLU, is actually the second best performing model in the world, getting 81.2 percent on, I think, MMLU, second only to GPT-4. So it beats Claude 2 and Gemini Pro. They didn't add Ultra here, so I'm actually not sure how it compares to Ultra, but it's definitely now available over API from the Mistral folks.

[00:10:20] Alex Volkov: One highlight that we've talked about: it handles 32,000 tokens of context. And because Mistral is trying to position themselves as the leader, at least in Europe, this model is native in French, German, Spanish and Italian, and it definitely performs well in those languages as well.

[00:10:39] Alex Volkov: In addition to this, all of the models on La Plateforme now support function calling as well. It's really cool that we now have multiple providers that support function calling. Plus, we have a leaderboard for function calling, so definitely a lot of highlights from what's happening in this area.

[00:10:56] Alex Volkov: They also introduced Le Chat, which is a chat interface, currently in beta, on top of their models. So if you don't pay for, let's say, GPT-4, and you only get access to 3.5, you can go to Le Chat and try their models out. Shout out to Mistral. They also announced a partnership with Microsoft, and for the open source community,

[00:11:15] Alex Volkov: this sounded like: hey, they're releasing models, but they're not dropping torrent links anymore. Are they still proponents of open source? And they came out and said, yes, we're still proponents of open source, it's very important for us, give us some time and we'll give you some more models. That was basically the response from Arthur Mensch from Mistral.

[00:11:31] Alex Volkov: We also talked about Google teasing Genie, which is a model that makes images into interactive games. That was really cool to see, and I'll add the link to the show notes. It's quite remarkable to see this video: from one image of a character in the world, it creates a full world. Imagine a full Mario game created from just one image of Mario.

[00:11:52] Alex Volkov: It's quite remarkable. Google has been in the news for the past week or so. We've talked about this, but basically, following up on what we covered: the Gemini release was celebrated in some areas, because Gemini Ultra beats GPT-4 on different things. It also drew a lot of responses online over how it reacts to certain prompts, and it potentially also affected their stock price.

[00:12:15] Alex Volkov: I'm not sure if that was the one thing, but Sundar Pichai, the CEO of Google, definitely sent an email to the whole company talking about how this release was not received as well as they hoped (and I'm using choice words here). He actually talked about structural changes and a potential review of the whole process of releasing this. They took down the ability to generate people from the image version of the Gemini model, but they also talked specifically about the Gemini model itself refusing different things.

[00:12:45] Alex Volkov: This is in addition to them delivering very well and giving us Gemini 1.5 Pro, which has 1 million tokens in the context window, which I played with this week, and I definitely think it's a great thing from Google. So Google released the open weights Gemma models, and Gemini 1.5

[00:13:01] Alex Volkov: is doing crazy new things, but the Gemini release at large probably did not go as expected. That's potentially the reason why Google took their time to release something for us. We then covered that OpenAI is allowing fine-tuning on GPT-3.5, and also OpenAI's response to the New York Times, which said: hey, we actually did not do the things that you're accusing us of, and also the New York Times did some trickery in prompts to get the model to respond this way. So the saga between OpenAI and the New York Times continues, and that's going to be interesting to follow along. OpenAI was also featured in another piece of news, actually two pieces of news.

[00:13:37] Alex Volkov: One of them is that there's now a conversation that WordPress and Tumblr, both daughter companies of Automattic, are preparing to sell their user data. So basically everybody who had a blog on wordpress.com and everybody who had a Tumblr account. Most of this information was probably already scraped and already featured in OpenAI's datasets, but now they're preparing to sell this information to OpenAI and Midjourney.

[00:14:00] Alex Volkov: So, similar to the recently announced Reddit and Google deal for 200 million dollars, WordPress and Tumblr are now preparing to sell to OpenAI and Midjourney as well. Also, OpenAI and Figure, the robotics company, announced a collaboration: Brett Adcock's company will integrate with OpenAI's models.

[00:14:23] Alex Volkov: Then we moved on to AI Art and Diffusion, which had an incredible week this week with two foundational models, or I guess big new models, that are not Stable Diffusion or DALL-E or Midjourney. The first one was Playground. Playground was at first an interface for DALL-E and Stable Diffusion.

[00:14:41] Alex Volkov: They built a very nice, very simple interface that's super fast, where you can inject styles. And they used all this data to release a new foundational model called Playground v2.5. In user preference, Playground v2.5 beats Midjourney, Stable Diffusion XL, the previous Playground model, and DALL-E.

[00:14:56] Alex Volkov: It looks really cool. Specifically, they talk about their ability to generate photorealistic images very well, and also different aspect ratios. So if you think about the standard 1024 by 1024 image for Stable Diffusion XL, for example, their ability to generate images in other, nonstandard ratios looks very cool.

[00:15:21] Alex Volkov: In their internal user preference study, where they show two images for the same prompt, their v2.5 beats Midjourney 5.2 and DALL-E by a 9 percent difference, and the previous model and SDXL by a significant margin as well. It looks really cool and is definitely worth checking out.

[00:15:40] Alex Volkov: I'll put a link in the show notes. And the other news that's not Stable Diffusion, Midjourney or DALL-E related (it's quite a mouthful to say): Ideogram, which we've covered before, announced version 1.0. Ideogram is from ex-Google folks who worked on Google's image models, and it's a website called Ideogram.

[00:15:56] Alex Volkov: Their approach is very participatory. I think Instagram is the source of their name, like Instagram for ideas. They announced version 1.0 and an investment from A16z. And specifically, it's state of the art on text generation. Text generation is something that we know other models struggle with, and their model is able to put

[00:16:19] Alex Volkov: text very well inside images. So if you want reactions or memes, or if you're doing presentations, for example, I had multiple creators and characters hold, like, ThursdAI spaces. I think we had some folks even react, as I was talking, with Ideogram-generated text images in the comments as well.

[00:16:36] Alex Volkov: So this is all we covered in AI art and diffusion, until we got to this jaw-dropping thing called EMO from Alibaba, which is a tease. It's not a model they've released yet, but there is a bunch of videos that were, to me, as jaw-dropping as Sora from a couple of weeks ago. EMO is a way to animate faces: to take an image and create a singing or talking face. And it's not only the face; the shoulders move and everything, so it animates an avatar based on one image. I will not be able to do it justice, because I'm still collecting my jaw from the floor, but I will definitely add some links and some videos. The coherence with which these models generate talking faces is just incredible.

[00:17:17] Alex Volkov: It's not only about animating the mouth: they animate eye and eyebrow movement, and even other things like hair and earrings. And one last thing that I noticed, that really took me a second, was that they even animate the vocal cords and the muscles in the throat when somebody sings, for example.

[00:17:35] Alex Volkov: And when I saw this, I was like, this is another Sora moment for being able to create with these tools. It's really incredible, and I really hope they release this in open source so we'd be able to animate whatever we created with Sora.

[00:17:47] Alex Volkov: And we covered all of this. Then we had a deep dive with Aditya Kusupati and Prateek Jain, the authors of the MRL paper, Matryoshka Representation Learning. They talked to us about how OpenAI recently released a new version of their embedding model where you're able to specify the number of dimensions you want, and many folks didn't understand what this is and how it works.

[00:18:08] Alex Volkov: And apparently, even though OpenAI built all of this from scratch, it was based on the paper that Aditya, Prateek and their co-authors released almost two years ago, called MRL, Matryoshka Representation Learning. We had a very nice chat and deep dive into how this actually works and how they pack the embedded information from the later dimensions into some of the first dimensions.

[00:18:30] Alex Volkov: If you're interested in this area, and this area is very hot, I definitely recommend you check out this conversation. It was really great. And thank you, Aditya and Prateek and the rest of the Matryoshka team, for joining and talking to us about this new and exciting field.

[00:18:42] Alex Volkov: And I think we started already chatting a little bit, and I see some folks from Hugging Face in the audience sending sad emojis.

[00:18:48] Alex Volkov: I want to send hugs to the Hugging Face MLOps team for yesterday, because for many of us who now work with

[00:18:57] Hugging Face was down, we were sad and thankful

[00:18:57] Alex Volkov: Hugging Face (and by work, I mean our code actually includes a bunch of imports from Hugging Face, there's transformers as well), yesterday was a realization of how big a part of many of our lives Hugging Face now is.

[00:19:11] Alex Volkov: I think for many of us this was the first time we had such a big realization, because the imports stopped working and the downloads didn't actually work. So we had a long space yesterday, pretty much throughout the whole downtime, as we were holding each other's hands. It reminded me (I don't know, Yam, if you want to chime in) of when GitHub was down previously. Basically, you know, you could work, but if you can't commit your code,

[00:19:34] Alex Volkov: What does it help? And I wanted to hear from you, because I think you had some models queued up for some stuff, and then you were waiting for them?

[00:19:42] Yam Peleg: Yeah, look, Hugging Face is really the hub today. It's not only for using; for most people, I think it's that they cannot fork or clone models from Hugging Face, so they cannot do many of the things they usually do, because their code relies on getting the model from Hugging Face. This is why, by the way, they tweeted, just for anyone that doesn't know: you can work offline.

[00:20:05] Yam Peleg: If you ever cloned a model from Hugging Face, you probably have it already on your computer, so you can just use the offline version. There is a command for that. For many people it's cloning the models, but for many other people it's also the feedback that you get from Hugging Face. I can tell you, some of us, and some other people here on the stage, submit models to the leaderboard and try to fine-tune better and better models, and for us it's also the feedback on what is going on: where our models shine, and where we need to make them even better.

[00:20:41] Yam Peleg: For me at least, I had four models that I was waiting for results for, and many other people did as well. And just shout out to Hugging Face for actually doing it. I'm running evals locally, and I know how heavy it is to actually run them, and how much compute it takes, for how long.

[00:21:01] Yam Peleg: It's amazing to see that they have such a leaderboard with so many models. It's thousands, like hundreds of thousands of dollars of compute to actually create such a leaderboard. And they provide it literally for free, while the community is growing every day.

[00:21:18] Yam Peleg: So it does cost. So huge shout out to them,

[00:21:22] Alex Volkov: I was trying to prepare

[00:21:23] Yam Peleg: are all addicted much.

[00:21:25] Alex Volkov: Absolutely, addicted. I was trying to prepare yesterday for this space, and part of my preparation is reading X and Twitter, but definitely part of my preparation is going to Hugging Face, reading the model cards, reading the leaderboards, for example. I was trying to count in my head how much stuff we're getting for free from Hugging Face, and one such example is just their blog, which was also down, which I read specifically to prepare for the Matryoshka conversation today.

[00:21:50] Alex Volkov: And that's just a huge resource on its own. There's the hub, but there's also the whole conversation piece. AK posts papers, for example, on Hugging Face, and then there are whole discussion threads about them as well. That wasn't accessible.

[00:22:04] Alex Volkov: Leaderboards themselves weren't accessible. And just the amount of compute, like you're saying, that they throw at us for free to be able to support this open source is definitely worth a shout out, and definitely shout out to engineers there that brought the hub back. Nisten, what are your thoughts on this?

[00:22:22] Nisten Tahiraj: Yeah, without Hugging Face, this place turned into a flea market for models. People were asking, does anyone have Qwen 72B? And I was like, no, I have the fine-tune. And then the dev lead of Qwen 72B pointed us to some Chinese site where they can download it. It was pretty

[00:22:39] Alex Volkov: Wait. ModelScope is not just some Chinese site. ModelScope is where I think most of the Chinese folks are posting their models. I think modelscope.cn is the alternative in the Chinese area. So there is at least a backup for some Chinese models. Although I think you have to translate that website, right?

[00:22:59] Alex Volkov: But yeah, I don't know. We had a conversation yesterday, and Far El was also talking about datasets, where many folks just upload the dataset and don't keep a local copy of it, so being able to run evaluations, or do different things like this, was also prevented yesterday.

[00:23:14] Alex Volkov: Definitely yesterday we discovered how big a part of many of our lives Hugging Face has become, and it was a sobering realization. But, I don't know, I saw people complain online, and I get it, folks. I get it. Sometimes you complain. But honestly, as far as I understood, the downtime wasn't even their fault.

[00:23:32] Alex Volkov: There was like a Mongo thing in AWS. I'm not sure, I didn't dive in deep. When this happens, in my head, having dealt with downtimes before in my professional career, I have nothing but appreciation for the team working hard. And I think, Yam, Clem, the CEO, even responded to you when you said Hugging Face is down, right?

[00:23:55] Yam Peleg: To many people, not just to me, but yeah they are responsive.

[00:23:59] Alex Volkov: Responsiveness and like being in the community and saying, Hey folks, we understand, we're sorry about this. I think that's basically, besides having folks work on this actively, which we know they had, this is all we can basically ask for. So I'm just sending positive vibes and appreciation. I saw some people getting salty.

[00:24:17] Alex Volkov: I saw some people saying, oh, this sucks, and we need a backup. And I was like, yes, but also, this doesn't mean that you can ignore everything that we've gotten for free so far from this incredible organization. So shout-out. And I don't work there, but I do have many friends who do.

[00:24:33] Alex Volkov: I think, yeah, Nisten, go ahead. And then we'll move on to actual recap of everything we're going to talk about.

[00:24:39] Nisten Tahiraj: Yeah, and same for the leaderboard. We give Hugging Face so much crap when things don't work, and I really appreciated that it's actually the CEO that responds directly to your complaints and tickets, and it's not just some support person. No, it's Clem. He's the actual CEO. They're the first ones to respond.

[00:25:01] Nisten Tahiraj: So that's pretty amazing. You don't really see it in other companies. Like, we don't expect the president of Microsoft, Brad Smith, to ever respond to a GitHub issue. Could you imagine that? So

[00:25:12] Alex Volkov: He is not your favorite. I would love Satya, though, to chime in on the discourse, but not Brad. Yeah, I absolutely cannot imagine this, and kudos to them for the participation in the community.

[00:25:23] Open Source AI corner

[00:25:23] Alex Volkov: And I guess we should start with our usual thing, open source. Alright folks, this is our regular weekly update for the Open Source Corner. Interestingly, Mistral is not featured in the open source corner today, but we'll mention them anyway. From last week, if you guys remember, Gemma was released. It wasn't open source, it was open weights, but Google definitely stepped in and gave us two models to run. Since then, I just wanted to mention that many folks started using these models, and there's quite a bit of stuff that, Yam, I actually want to hear from you about, because we talked about this: the Gemma models are not necessarily seven billion parameters, right?

[00:26:24] Gemma from Google is hard to finetune and is not as amazing as we'd hoped

[00:26:24] Alex Volkov: This was a little bit of a thing. And also about fine-tuning. Could you give us a brief overview of how the last week went in terms of Gemma's acceptance in the community?

[00:26:32] Yam Peleg: Oh, wow. Gemma is giving me a hard time, that's for sure. I've been fine-tuning Gemma, or at least struggling with fine-tuning Gemma, for a week at this point. Okay, so starting from the beginning, Gemma is not exactly 7B. The way it is referred to in the paper is that the parameters in the model itself, apart from the embeddings, are exactly 7 billion parameters.

[00:27:01] Yam Peleg: But then you add the embeddings and you're a little bit over 8.5, if I remember correctly. Which is fine. I don't think anyone has any problem with a bigger model. I just think that it would be more genuine to say it's an 8B-parameter model. It's fine. That's first.
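The point about embeddings pushing the total past the headline number can be sketched as simple arithmetic. The vocabulary size and hidden width below are assumptions for illustration (roughly 256k tokens and a width of 3,072), not figures quoted in the conversation, and the function names are hypothetical.

```python
def embedding_parameters(vocab_size: int, hidden_size: int) -> int:
    """Parameters in a token embedding table: one hidden-size vector per vocabulary entry."""
    return vocab_size * hidden_size


def total_parameters(non_embedding: int, vocab_size: int, hidden_size: int) -> int:
    """Headline (non-embedding) parameter count plus the embedding table."""
    return non_embedding + embedding_parameters(vocab_size, hidden_size)


# Assumed, approximate values for a large-vocabulary model; check the model report.
ASSUMED_VOCAB_SIZE = 256_000
ASSUMED_HIDDEN_SIZE = 3_072

extra = embedding_parameters(ASSUMED_VOCAB_SIZE, ASSUMED_HIDDEN_SIZE)
print(f"Embedding table adds about {extra / 1e9:.2f}B parameters")
```

With a vocabulary this large, the embedding table alone is on the order of most of a billion parameters, which is why a model described by its non-embedding count can end up noticeably bigger in total.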

[00:27:23] Yam Peleg: Second, it behaves differently than what we're used to with Mistral and Llama. I'm not sure why. Maybe someone can tell me. And many people are currently working and struggling to fine-tune it better. This is where it is at the moment. I've already seen Orca.

[00:27:54] Yam Peleg: Someone fine-tuned on Orca and didn't get great results. I also heard that someone fine-tuned on Hermes, I think from Nous. I'm not sure, but I think so. Also, the results are not great. I'm continuing pre-training and the loss is doing whatever it wants. It goes down and then out of the blue it starts to jump.

[00:28:16] Yam Peleg: I'm not sure exactly why. It might be because the architecture is slightly different. There are slight modifications. So maybe that, or maybe something else, but yeah, I think we're still exploring the model. We don't have an answer yet.
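One way to make "the loss goes down and then starts to jump" concrete is to flag training steps where the loss is much higher than its recent average. This is only a sketch with made-up numbers, not how anyone on the show tracks their runs. The function name, window size, and threshold are all hypothetical choices.

```python
from collections import deque


def find_loss_spikes(losses, window=5, ratio=1.5):
    """Return the step indices where the loss exceeds `ratio` times the
    mean of the previous `window` losses."""
    recent = deque(maxlen=window)
    spikes = []
    for step, loss in enumerate(losses):
        # Only compare once a full window of earlier losses is available.
        if len(recent) == window and loss > ratio * (sum(recent) / window):
            spikes.append(step)
        recent.append(loss)
    return spikes


# Made-up loss curve: steadily decreasing, with one sudden jump at step 6.
example_losses = [2.0, 1.9, 1.8, 1.7, 1.6, 1.5, 4.0, 1.5, 1.4]
spike_steps = find_loss_spikes(example_losses)
```

A check like this does not explain why the loss jumps, but it makes it easier to line the jumps up with specific batches, learning-rate changes, or checkpoints.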

[00:28:35] Alex Volkov: Yeah, that's what I got as well. I pinned a few examples: Eric Hartford of Dolphin fame (I think he now works at Abacus), and Technium as well, tried to do some stuff, and all these losses look crazy. All these losses are jumping around, up and down. I saw a tweet from Philipp Schmid from Hugging Face, where they were able to fine-tune some stuff, and the conversation between Eric and Wing Lian from Axolotl.

[00:29:00] Alex Volkov: And there looks to be an effort to try and hone this thing and see if fine-tuning it on some stuff actually works. The Hermes fine-tune was not really an official Nous Research thing. It looked like somebody just took the dataset, and folks weren't able to actually get it to run or perform well, as far as I saw. I haven't seen an update on this, but I'll definitely follow up with Nous.

[00:29:22] Alex Volkov: So I would just remind folks, last week we talked about how Gemma was well received.

[00:29:26] Alex Volkov: Everybody hopped on board super quick and added support. LM Studio and Ollama added support super quick. Wing started adding support to Axolotl for fine-tuning. Hugging Face added support in, I think, Transformers. Tri Dao added support for Flash Attention. There's a whole community effort to receive Gemma as well as possible.

[00:29:47] Alex Volkov: And Google also released some quantized versions. So a very good effort from Google, and then very big acceptance from the community. But since then, what I'm trying to highlight is that a lot of the way we judge whether models are good is whether they're fine-tunable, for example, but also whether they follow instructions and whether it's easy to converse with them. I haven't seen any of this come across my timeline at all. I will be frank, I only interacted with the 2 billion parameter model, and I wasn't impressed. It's great that they released it.

[00:30:20] Alex Volkov: I would not be using this for any of my workloads. Nisten, do you have any other feedback as well? Specifically around how Mistral 7B still seems to be a good alternative, even though it's performing worse on evaluations.

[00:30:34] Nisten Tahiraj: Yeah, I feel like we have been spoiled by just how high of a bar Mistral 7B has set for everyone, that it even made Mistral Large feel somewhat unimpressive, although it was answering everything perfectly well. But yeah, not only has it set a very high bar, it was also very easy to work with. So the amount of innovation that came from the community just building off of the initial weights has made this class of models extremely competitive, so much so that even Google has a hard time cracking through.

[00:31:15] Nisten Tahiraj: Yeah, our expectations now for a 7B model are extremely high. It has to run on my phone. It has to do what I want. It has to respond. It has to summarize stuff, has to carry forward the conversation. Oh, and it has to score high on the benchmarks too. And this pace of innovation that the community has set is just very hard to match, and it's also incredibly interesting to see that Google is having a very hard time matching it or getting close.

[00:31:46] Alex Volkov: Specifically because, in the land of GPU poor and GPU rich, in the original article that defined the two categories, Google is the GPU slash TPU rich, right? They could and have thrown a bunch of compute at these models and still the folks from Mistral, a team that's less than 30 people that started eight months ago released a model.

[00:32:06] Alex Volkov: 6 months ago? I think Mistral 7B was around 6 months ago, right? September? A model that Google, 6 months later, with all its GPU richness, is barely able to match, not to mention beat significantly. Which is unlike any pace that we're used to. We're used to a 7B model beating a 70B model week after week.

[00:32:25] Alex Volkov: And here's a huge company coming out and saying, hey, here's our best attempt at a 7B model, which Yam doesn't even consider a 7B model, and at least in our attempts to play around with it, it's not beating Mistral significantly, which is strange. It's also not able to be fine-tuned very easily.

[00:32:43] Alex Volkov: Very interesting, and very much a highlight of how high-quality the Mistral model was. I will also say that Arthur Mensch (we'll cover this in the Mistral section afterwards) came out and basically said, we can only do so much with 1,500 H100s. 1,500 H100s. Just by contrast, Meta announced a few months ago, famously, Zuckerberg came out and said that by the end of this year they're going to have the equivalent of 600,000 H100s of compute to train and host and probably do inference on Meta and Llama.

[00:33:19] Alex Volkov: And this is like 1,500 H100s that Mistral was able to use to train and fine-tune a model that Google cannot wipe off the board completely.

[00:33:29] LLama 3 won't be released until June 2024

[00:33:29] Alex Volkov: It's very crazy. Moving on to another news update that's basically not a news update. We've been waiting for Llama 3 every week. I've been saying, hey, it could get released here, et cetera.

[00:33:41] Alex Volkov: There was a leak from The Information. I actually don't know if it was a leak or not, but The Information came out with it, and then a bunch of other outlets followed with this news that Llama 3 will be released, I think, in June. This was the update. Llama 3 will not get released for us anytime soon.

[00:34:00] Alex Volkov: We were hoping for a one-year anniversary. Llama 1 was released in February 2023. And now we're not gonna see Llama 3, even though it's like a finished training
