Latent Space: The AI Engineer Podcast

211 episodes — Page 5 of 5

No Moat: Closed AI gets its Open Source wakeup call — ft. Simon Willison

It’s now almost 6 months since Google declared Code Red, and the results — Jeff Dean’s recap of 2022 achievements and a mass exodus of the top research talent that contributed to it in January, Bard’s rushed launch in Feb, a slick video showing Google Workspace AI features and confusing doubly linked blogposts about PaLM API in March, and merging Google Brain and DeepMind in April — have not been inspiring. Google’s internal panic is in full display now with the surfacing of a well written memo, written by software engineer Luke Sernau written in early April, revealing internal distress not seen since Steve Yegge’s infamous Google Platforms Rant. Similar to 2011, the company’s response to an external challenge has been to mobilize the entire company to go all-in on a (from the outside) vague vision.Google’s misfortunes are well understood by now, but the last paragraph of the memo: “We have no moat, and neither does OpenAI”, was a banger of a mic drop.Combine this with news this morning that OpenAI lost $540m last year and will need as much as $100b more funding (after the complex $10b Microsoft deal in Jan), and the memo’s assertion that both Google and OpenAI have “no moat” against the mighty open source horde have gained some credibility in the past 24 hours.Many are criticising this memo privately:* A CEO commented to me yesterday that Luke Sernau does not seem to work in AI related parts of Google and “software engineers don’t understand moats”. * Emad Mostaque, himself a perma-champion of open source and open models, has repeatedly stated that “Closed models will always outperform open models” because closed models can just wrap open ones.* Emad has also commented on the moats he does see: “Unique usage data, Unique content, Unique talent, Unique product, Unique business model”, most of which Google does have, and OpenAI less so (though it is winning on the talent front)* Sam Altman famously said that “very few to no one is Silicon Valley has a moat - not even Facebook” (implying that moats don’t actually matter, and you should spend your time thinking about more important things)* It is not actually clear what race the memo thinks Google and OpenAI are in vs Open Source. Neither are particularly concerned about running models locally on phones, and they are perfectly happy to let “a crazy European alpha male” run the last mile for them while they build actually monetizable cloud infrastructure.However moats are of intense interest by everybody keen on productized AI, cropping up in every Harvey, Jasper, and general AI startup vs incumbent debate. It is also interesting to take the memo at face value and discuss the searing hot pace of AI progress in open source. We hosted this discussion yesterday with Simon Willison, who apart from being an incredible communicator also wrote a great recap of the No Moat memo. 2,800 have now tuned in on Twitter Spaces, but we have taken the audio and cleaned it up here. Enjoy!Timestamps* [00:00:00] Introducing the Google Memo* [00:02:48] Open Source > Closed?* [00:05:51] Running Models On Device* [00:07:52] LoRA part 1* [00:08:42] On Moats - Size, Data* [00:11:34] Open Source Models are Comparable on Data* [00:13:04] Stackable LoRA* [00:19:44] The Need for Special Purpose Optimized Models* [00:21:12] Modular - Mojo from Chris Lattner* [00:23:33] The Promise of Language Supersets* [00:28:44] Google AI Strategy* [00:29:58] Zuck Releasing LLaMA* [00:30:42] Google Origin Confirmed* [00:30:57] Google's existential threat* [00:32:24] Non-Fiction AI Safety ("y-risk")* [00:35:17] Prompt Injection* [00:36:00] Google vs OpenAI* [00:41:04] Personal plugs: Simon and TravisTranscripts[00:00:00] Introducing the Google Memo[00:00:00] Simon Willison: So, yeah, this is a document, which Kate, which I first saw at three o'clock this morning, I think. It claims to be leaked from Google. There's good reasons to believe it is leaked from Google, and to be honest, if it's not, it doesn't actually matter because the quality of the analysis, I think stands alone.[00:00:15] If this was just a document by some anonymous person, I'd still think it was interesting and worth discussing. And the title of the document is We Have No Moat and neither does Open ai. And the argument it makes is that while Google and OpenAI have been competing on training bigger and bigger language models, the open source community is already starting to outrun them, given only a couple of months of really like really, really serious activity.[00:00:41] You know, Facebook lama was the thing that really kicked us off. There were open source language models like Bloom before that some G P T J, and they weren't very impressive. Like nobody was really thinking that they were. Chat. G P T equivalent Facebook Lama came out in March, I think March 15th. And was the first one that really sort of showed signs of being as capable maybe as chat G P T.[00:01:04] My, I don't, I think all of these models, they've been, the analysis

May 5, 202343 min

Training a SOTA Code LLM in 1 week and Quantifying the Vibes — with Reza Shabani of Replit

Latent Space is popping off! Welcome to the over 8500 latent space explorers who have joined us. Join us this month at various events in SF and NYC, or start your own!This post spent 22 hours at the top of Hacker News.As announced during their Developer Day celebrating their $100m fundraise following their Google partnership, Replit is now open sourcing its own state of the art code LLM: replit-code-v1-3b (model card, HF Space), which beats OpenAI’s Codex model on the industry standard HumanEval benchmark when finetuned on Replit data (despite being 77% smaller) and more importantly passes AmjadEval (we’ll explain!)We got an exclusive interview with Reza Shabani, Replit’s Head of AI, to tell the story of Replit’s journey into building a data platform, building GhostWriter, and now training their own LLM, for 22 million developers!8 minutes of this discussion go into a live demo discussing generated code samples - which is always awkward on audio. So we’ve again gone multimodal and put up a screen recording here where you can follow along on the code samples!Recorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold. We would really appreciate if you shared our pod with friends on Twitter, LinkedIn, Mastodon, Bluesky, or your social media poison of choice!Timestamps* [00:00:21] Introducing Reza* [00:01:49] Quantitative Finance and Data Engineering* [00:11:23] From Data to AI at Replit* [00:17:26] Replit GhostWriter* [00:20:31] Benchmarking Code LLMs* [00:23:06] AmjadEval live demo* [00:31:21] Aligning Models on Vibes* [00:33:04] Beyond Chat & Code Completion* [00:35:50] Ghostwriter Autonomous Agent* [00:38:47] Releasing Replit-code-v1-3b* [00:43:38] The YOLO training run* [00:49:49] Scaling Laws: from Kaplan to Chinchilla to LLaMA* [00:52:43] MosaicML* [00:55:36] Replit's Plans for the Future (and Hiring!)* [00:59:05] Lightning RoundShow Notes* Reza Shabani on Twitter and LinkedIn* also Michele Catasta and Madhav Singhal* Michele Catasta’s thread on the release of replit-code-v1-3b* Intro to Replit Ghostwriter* Replit Ghostwriter Chat and Building Ghostwriter Chat* Reza on how to train your own LLMs (their top blog of all time)* Our Benchmarks 101 episode where we discussed HumanEval* AmjadEval live demo* Nat.dev* MosaicML CEO Naveen Rao on Replit’s LLM* MosaicML Composer + FSDP code* Replit’s AI team is hiring in North America timezone - Fullstack engineer, Applied AI/ML, and other roles!Transcript[00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my co-host, swyx, writer and editor of Latent Space.[00:00:21] Introducing Reza[00:00:21] swyx: Hey and today we have Reza Shabani, Head of AI at Replit. Welcome to the studio. Thank you. Thank you for having me. So we try to introduce people's bios so you don't have to repeat yourself, but then also get a personal side of you.[00:00:34] You got your PhD in econ from Berkeley, and then you were a startup founder for a bit, and, and then you went into systematic equity trading at BlackRock in Wellington. And then something happened and you were now head of AI at Relet. What should people know about you that might not be apparent on LinkedIn?[00:00:50] One thing[00:00:51] Reza Shabani: that comes up pretty often is whether I know how to code. Yeah, you'd be shocked. A lot of people are kind of like, do you know how to code? When I was talking to Amjad about this role, I'd originally talked to him, I think about a product role and, and didn't get it. Then he was like, well, I know you've done a bunch of data and analytics stuff.[00:01:07] We need someone to work on that. And I was like, sure, I'll, I'll do it. And he was like, okay, but you might have to know how to code. And I was like, yeah, yeah, I, I know how to code. So I think that just kind of surprises people coming from like Ancon background. Yeah. Of people are always kind of like, wait, even when people join Relet, they're like, wait, does this guy actually know how to code?[00:01:28] Is he actually technical? Yeah.[00:01:30] swyx: You did a bunch of number crunching at top financial companies and it still wasn't[00:01:34] Reza Shabani: obvious. Yeah. Yeah. I mean, I, I think someone like in a software engineering background, cuz you think of finance and you think of like calling people to get the deal done and that type of thing.[00:01:43] No, it's, it's not that as, as you know, it's very very quantitative. Especially what I did in, in finance, very quantitative.[00:01:49] Quantitative Finance and Data Engineering[00:01:49] swyx: Yeah, so we can cover a little bit of that and then go into the rapid journey. So as, as you, as you know, I was also a quantitative trader on the sell side and the buy side. And yeah, I actually learned Python there.[00:02:01] I learned my, I wrote my own data pipelines there before airflow was a thing, and it was just me wri

May 3, 20231h 9m

Mapping the future of truly Open Models and Training Dolly for $30 — with Mike Conover of Databricks

The race is on for the first fully GPT3/4-equivalent, truly open source Foundation Model! LLaMA’s release proved that a great model could be released and run on consumer-grade hardware (see llama.cpp), but its research license prohibits businesses from running it and all it’s variants (Alpaca, Vicuna, Koala, etc) for their own use at work. So there is great interest and desire for *truly* open source LLMs that are feasible for commercial use (with far better customization, finetuning, and privacy than the closed source LLM APIs).The previous leading contenders were Eleuther’s GPT-J and Neo on the small end (FLAN-T5 (137B), PaLM (540B), and BigScience’s BLOOM (176B) on the high end. But Databricks is to my knowledge the first to release not just a cleanly licensed, high quality LLM that can run on affordable devices, but also a simple Databricks notebook that can be customized to be finetuned for your data/desired style - for $30 in 30 minutes on one machine!Mike Conover tells the story of how a small team of Applied AI engineers got convinced Ali Ghodsi and 5,000 of their coworkers to join in the adventure of building the first open source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use. He also indulges our questions on other recent open source LLM projects, CerebasGPT and RedPajama, though we recorded this a week before Stability’s StableLM release. Stick around to the end for some easter eggs featuring AI Drake!Recorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold.Show Notes* Mike Conover LinkedIn and Twitter* Dolly 1.0* Dolly 2.0* CICERO and Diplomacy* Dolly and Deepspeed* LLMops: * https://nat.dev/* PromptLayer* HumanLoop* Spreadsheets??* Quadratic* Alessio’s Email GPT Drafter* Open Models* Open Assistant* Cerebras GPT* RedPajama* Reflexion, Recursive Criticism and Improvement* Lightning Round* AI Product: Google Maps* AI People: EleutherAI, Huggingface’s Stas Bekman* AI Prediction: Open LLaMA reproduction, AI Twins of People (AI Drake), Valuing Perplexity * Request for Startups: LLMOps/Benchmarks, Trail MappingTimestamps* [00:00:21] Introducing Mike Conover* [00:03:10] Dolly 1.0* [00:04:18] Making Dolly* [00:06:12] Dolly 2.0* [00:09:28] Gamifying Instruction Tuning* [00:11:36] Summarization - Thumbnails for Language* [00:15:11] CICERO and Geopolitical AI Agents* [00:17:09] Datasets vs Intentional Design* [00:21:44] Biological Basis of AI* [00:23:27] Training Your Own LLMs* [00:28:21] You May Not Need a Large Model* [00:29:59] Good LLM Use cases* [00:31:33] Dolly Cost $30 on Databricks* [00:36:06] Databricks Open Source* [00:37:31] LLMOps and Prompt Tooling* [00:42:26] "I'm a Sheets Maxi"* [00:44:19] AI and Workplace Productivity* [00:47:02] OpenAssistant* [00:47:41] CerebrasGPT* [00:51:35] RedPajama* [00:54:07] Why Dolly > OpenAI GPT* [00:56:19] Open Source Licensing for AI Models* [00:57:09] Why Open Source Models?* [00:58:05] Moving Models* [01:00:34] Learning in a Simulation* [01:01:28] Why Model Reflexion and Self Criticism Works* [01:03:51] Lightning RoundTranscripts[00:00:00] Hey everyone. Welcome to the Latent Space Podcast. This is Alessio Partner and CT and Residence and Decibel Partners. I'm Joan Bama, cohost swyx Brighter and Editor of Space. Welcome, Mike.[00:00:21] Introducing Mike Conover[00:00:21] Hey, pleasure to be here. Yeah, so[00:00:23] we tend to try to introduce you so that you don't have to introduce yourself. Yep.[00:00:27] But then we also ask you to fill in the blanks. So you are currently a, uh, staff software engineer at Databricks. Uh, but you got your PhD at Indiana on the University of Bloomington in Complex Systems analysis where you did some, uh, analysis of clusters on, on Twitter, which I found pretty interesting.[00:00:43] Yeah. Uh, I highly recommend people checking that out if you're interested in getting information from indirect sources or I, I don't know how you describe it. Yes. Yeah. And then you went to LinkedIn working on. Homepage News, relevance, and then SkipFlag, which is a smart enterprise knowledge graph, which was then acquired, uh, by Workday, where you became director of machine learning engineering and now your Databricks.[00:01:06] So that's the quick bio and we can kind of go over Yeah. Step by step. But, uh, what's not on your LinkedIn that people[00:01:12] should know about you? So, because I worked at LinkedIn, that's actually how new hires introduce themselves at LinkedIn is this question. So I, okay. I have a pat answer to it. Uhhuh. Um, I love getting off trail in the backcountry.[00:01:25] Okay. And I, you know, I think that the sort of like radical responsibility associated to that is clarifies the mind. And I think that the, the things that I really like about machine learning engineering and sort of the topology of high-dimensional spaces kind of manifest when you think about a topographic mat as a contour plot.[0

Apr 29, 20231h 15m

AI-powered Search for the Enterprise — with Deedy Das of Glean

The most recent YCombinator W23 batch graduated 59 companies building with Generative AI for everything from sales, support, engineering, data, and more:Many of these B2B startups will be seeking to establish an AI foothold in the enterprise. As they look to recent success, they will find Glean, started in 2019 by a group of ex-Googlers to finally solve AI-enabled enterprise search. In 2022 Sequoia led their Series C at a $1b valuation and Glean have just refreshed their website touting new logos across Databricks, Canva, Confluent, Duolingo, Samsara, and more in the Fortune 50 and announcing Enterprise-ready AI features including AI answers, Expert detection, and In-context recommendations.We talked to Deedy Das, Founding Engineer at Glean and a former Tech Lead on Google Search, on why he thinks many of these startups are solutions looking for problems, and how Glean’s holistic approach to enterprise probllem solving has brought so much success. Deedy is also just a fascinating commentator on AI current events, being both extremely qualified and great at distilling insights, so we also went over his many viral tweets diving into Google’s competitive threats, AI Startup investing, and his exposure of Indian University Exam Fraud!Show Notes* Deedy on LinkedIn and Twitter and Personal Site* Glean* Glean and Google Moma* Golinks.io* Deedy on Google vs ChatGPT* Deedy on Google Ad Revenue* Deedy on How much does it cost to train a state-of-the-art foundational LLM?* Deedy on Google LaMDA cost* Deedy’s Indian Exam Fraud Story* Lightning Round* Favorite Products: (covered in segment)* Favorite AI People: AI Pub* Predictions: Models will get faster for the same quality* Request for Products: Hybrid Email Autoresponder* Parting Takeaway: Read the research!Timestamps* [00:00:21] Introducing Deedy* [00:02:27] Introducing Glean* [00:05:41] From Syntactic to Semantic Search* [00:09:39] Why Employee Portals* [00:12:01] The Requirements of Good Enterprise Search* [00:15:26] Glean Chat?* [00:15:53] Google vs ChatGPT* [00:19:47] Search Issues: Freshness* [00:20:49] Search Issues: Ad Revenue* [00:23:17] Search Issues: Latency* [00:24:42] Search Issues: Accuracy* [00:26:24] Search Issues: Tool Use* [00:28:52] Other AI Search takes: Perplexity and Neeva* [00:30:05] Why Document QA will Struggle* [00:33:18] Investing in AI Startups* [00:35:21] Actually Interesting Ideas in AI* [00:38:13] Harry Potter IRL* [00:39:23] AI Infra Cost Math* [00:43:04] Open Source LLMs* [00:46:45] Other Modalities* [00:48:09] Exam Fraud and Generated Text Detection* [00:58:01] Lightning RoundTranscript[00:00:00] Hey everyone. Welcome to the Latent Space Podcast. This is Alessio, partner and CTO and residence at Decibel Partners. I'm joined by my, cohost swyx, writer and editor of[00:00:19] Latent Space. Yeah. Awesome.[00:00:21] Introducing Deedy[00:00:21] And today we have a special guest. It's Deedy Das from Glean. Uh, do you go by Deedy or Debarghya? I go by Deedy. Okay.[00:00:30] Uh, it's, it's a little bit easier for the rest of us to, uh, to, to spell out. And so what we typically do is I'll introduce you based on your LinkedIn profile, and then you can fill in what's not on your LinkedIn. So, uh, you graduated your bachelor's and masters in CS from Cornell. Then you worked at Facebook and then Google on search, specifically search, uh, and also leading a sports team focusing on cricket.[00:00:50] That's something that we, we can dive into. Um, and then you moved over to Glean, which is now a search unicorn in building intelligent search for the workplace. What's not on your LinkedIn that people should know about you? Firstly,[00:01:01] guys, it's a pleasure. Pleasure to be here. Thank you so much for having me.[00:01:04] What's not on my LinkedIn is probably everything that's non-professional. I think the biggest ones are I'm a huge movie buff and I love reading, so I think I get through, usually I like to get through 10 books ish a year, but I hate people who count books, so I should say the number. And increasingly, I don't like reading non-fiction books.[00:01:26] I actually do prefer reading fiction books purely for pleasure and entertainment. I think that's the biggest omission from my LinkedIn.[00:01:34] What, what's, what's something that, uh, caught your eye for fiction stuff that you would recommend people?[00:01:38] Oh, I recently, we started reading the Three Body Problem and I finished it and it's a three part series.[00:01:45] And, uh, well, my controversial take is I did not really enjoy the second part, and so I just stopped. But the first book was phenomenal. Great concept. I didn't know you could write alien fiction with physics so Well, and Chinese literature in particular has a very different cadence to it than Western literature.[00:02:03] It's very less about the, um, let's describe people and what they're all about and their likes and dislikes. And it's like, here's a person, he's a professor of physics. That's all you ne

Apr 22, 20231h 4m

Segment Anything Model and the Hard Problems of Computer Vision — with Joseph Nelson of Roboflow

2023 is the year of Multimodal AI, and Latent Space is going multimodal too! * This podcast comes with a video demo at the 1hr mark and it’s a good excuse to launch our YouTube - please subscribe! * We are also holding two events in San Francisco — the first AI | UX meetup next week (already full; we’ll send a recap here on the newsletter) and Latent Space Liftoff Day on May 4th (signup here; but get in touch if you have a high profile launch you’d like to make). * We also joined the Chroma/OpenAI ChatGPT Plugins Hackathon last week where we won the Turing and Replit awards and met some of you in person!This post featured on Hacker News.Out of the five senses of the human body, I’d put sight at the very top. But weirdly when it comes to AI, Computer Vision has felt left out of the recent wave compared to image generation, text reasoning, and even audio transcription. We got our first taste of it with the OCR capabilities demo in the GPT-4 Developer Livestream, but to date GPT-4’s vision capability has not yet been released. Meta AI leapfrogged OpenAI and everyone else by fully open sourcing their Segment Anything Model (SAM) last week, complete with paper, model, weights, data (6x more images and 400x more masks than OpenImages), and a very slick demo website. This is a marked change to their previous LLaMA release, which was not commercially licensed. The response has been ecstatic:SAM was the talk of the town at the ChatGPT Plugins Hackathon and I was fortunate enough to book Joseph Nelson who was frantically integrating SAM into Roboflow this past weekend. As a passionate instructor, hacker, and founder, Joseph is possibly the single best person in the world to bring the rest of us up to speed on the state of Computer Vision and the implications of SAM. I was already a fan of him from his previous pod with (hopefully future guest) Beyang Liu of Sourcegraph, so this served as a personal catchup as well. Enjoy! and let us know what other news/models/guests you’d like to have us discuss! - swyxRecorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold.Show Notes* Joseph’s links: Twitter, Linkedin, Personal* Sourcegraph Podcast and Game Theory Story* Represently* Roboflow at Pioneer and YCombinator* Udacity Self Driving Car dataset story* Computer Vision Annotation Formats* SAM recap - top things to know for those living in a cave* https://segment-anything.com/* https://segment-anything.com/demo* https://arxiv.org/pdf/2304.02643.pdf * https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/* https://blog.roboflow.com/segment-anything-breakdown/* https://ai.facebook.com/datasets/segment-anything/* Ask Roboflow https://ask.roboflow.ai/* GPT-4 Multimodal https://blog.roboflow.com/gpt-4-impact-speculation/Cut for time:* WSJ mention* Des Moines Register story* All In Pod: timestamped mention* In Forbes: underrepresented investors in Series A* Roboflow greatest hits* https://blog.roboflow.com/mountain-dew-contest-computer-vision/* https://blog.roboflow.com/self-driving-car-dataset-missing-pedestrians/* https://blog.roboflow.com/nerualhash-collision/ and Apple CSAM issue * https://www.rf100.org/Timestamps* [00:00:19] Introducing Joseph* [00:02:28] Why Iowa* [00:05:52] Origin of Roboflow* [00:16:12] Why Computer Vision* [00:17:50] Computer Vision Use Cases* [00:26:15] The Economics of Annotation/Segmentation* [00:32:17] Computer Vision Annotation Formats* [00:36:41] Intro to Computer Vision & Segmentation* [00:39:08] YOLO* [00:44:44] World Knowledge of Foundation Models* [00:46:21] Segment Anything Model* [00:51:29] SAM: Zero Shot Transfer* [00:51:53] SAM: Promptability* [00:53:24] SAM: Model Assisted Labeling* [00:56:03] SAM doesn't have labels* [00:59:23] Labeling on the Browser* [01:00:28] Roboflow + SAM Video Demo * [01:07:27] Future Predictions* [01:08:04] GPT4 Multimodality* [01:09:27] Remaining Hard Problems* [01:13:57] Ask Roboflow (2019)* [01:15:26] How to keep up in AITranscripts[00:00:00] Hello everyone. It is me swyx and I'm here with Joseph Nelson. Hey, welcome to the studio. It's nice. Thanks so much having me. We, uh, have a professional setup in here.[00:00:19] Introducing Joseph[00:00:19] Joseph, you and I have known each other online for a little bit. I first heard about you on the Source Graph podcast with bian and I highly, highly recommend that there's a really good game theory story that is the best YC application story I've ever heard and I won't tease further cuz they should go listen to that.[00:00:36] What do you think? It's a good story. It's a good story. It's a good story. So you got your Bachelor of Economics from George Washington, by the way. Fun fact. I'm also an econ major as well. You are very politically active, I guess you, you did a lot of, um, interning in political offices and you were responding to, um, the, the, the sheer amount of load that the Congress people have in terms of the, the support.[00:01:00

Apr 13, 20231h 19m

AI Fundamentals: Benchmarks 101

We’re trying a new format, inspired by Acquired.fm! No guests, no news, just highly prepared, in-depth conversation on one topic that will level up your understanding. We aren’t experts, we are learning in public. Please let us know what we got wrong and what you think of this new format!When you ask someone to break down the basic ingredients of a Large Language Model, you’ll often hear a few things: You need lots of data. You need lots of compute. You need models with billions of parameters. Trust the Bitter Lesson, more more more, scale is all you need. Right?Nobody ever mentions the subtle influence of great benchmarking.LLM Benchmarks mark our progress in building artificial intelligences, progressing from * knowing what words go with others (1985 WordNet)* recognizing names and entities (2004 Enron Emails) * and image of numbers, letters, and clothes (1998-2017 MNIST)* language translation (2002 BLEU → 2020 XTREME)* more and more images (2009 ImageNet, CIFAR)* reasoning in sentences (2016 LAMBADA) and paragraphs (2019 AI2RC, DROP)* stringing together whole sentences (2018 GLUE and SuperGLUE)* question answering (2019 CoQA)* having common sense (2018 Swag and HellaSwag, 2019 WinoGrande)* knowledge of all human tasks and professional exams (2021 MMLU)* knowing everything (2022 BIG-Bench)People who make benchmarks are the unsung heroes of LLM research, because they dream up ever harder tests that last ever shorter periods of time.In our first AI Fundamentals episode, we take a trek through history to try to explain what we have learned about LLM Benchmarking, and what issues we have discovered with them. There are way, way too many links and references to include in this email. You can follow along the work we did for our show prep in this podcast’s accompanying repo, with all papers and selected tests pulled out.Enjoy and please let us know what other fundamentals topics you’d like us to cover!Timestamps* [00:00:21] Benchmarking Questions* [00:03:08] Why AI Benchmarks matter* [00:06:02] Introducing Benchmark Metrics* [00:08:14] Benchmarking Methodology* [00:09:45] 1985-1989: WordNet and Entailment* [00:12:44] 1998-2004 Enron Emails and MNIST* [00:14:35] 2009-14: ImageNet, CIFAR and the AlexNet Moment for Deep Learning* [00:17:42] 2018-19: GLUE and SuperGLUE - Single Sentence, Similarity and Paraphrase, Inference* [00:23:21] 2018-19: Swag and HellaSwag - Common Sense Inference* [00:26:07] Aside: How to Design Benchmarks* [00:26:51] 2021: MMLU - Human level Professional Knowledge* [00:29:39] 2021: HumanEval - Code Generation* [00:31:51] 2020: XTREME - Multilingual Benchmarks* [00:35:14] 2022: BIG-Bench - The Biggest of the Benches* [00:37:40] EDIT: Why BIG-Bench is missing from GPT4 Results* [00:38:25] Issue: GPT4 vs the mystery of the AMC10/12* [00:40:28] Issue: Data Contamination* [00:42:13] Other Issues: Benchmark Data Quality and the Iris data set* [00:45:44] Tradeoffs of Latency, Inference Cost, Throughput* [00:49:45] ConclusionTranscript[00:00:00] Hey everyone. Welcome to the Latent Space Podcast. This is Alessio, partner and CTO and residence at Decibel Partners, and I'm joined by my co-host, swyx writer and editor of Latent Space.[00:00:21] Benchmarking Questions[00:00:21] Up until today, we never verified that we're actually humans to you guys. So we'd have one good thing to do today would be run ourselves through some AI benchmarks and see if we are humans.[00:00:31] Indeed. So, since I got you here, Sean, I'll start with one of the classic benchmark questions, which is what movie does this emoji describe? The emoji set is little Kid Bluefish yellow, bluefish orange Puffer fish. One movie does that. I think if you added an octopus, it would be slightly easier. But I prepped this question so I know it's finding Nemo.[00:00:57] You are so far a human. Second one of these emoji questions instead, depicts a superhero man, a superwoman, three little kids, one of them, which is a toddler. So you got this one too? Yeah. It's one of my favorite movies ever. It's the Incredibles. Uh, second one was kind of a letdown, but the first is a.[00:01:17] Awesome. Okay, I'm gonna ramp it up a little bit. So let's ask something that involves a little bit of world knowledge. So when you drop a ball from rest, it accelerates downward at 9.8 meters per second if you throw it downward instead, assuming no air resistance, so you're throwing it down instead of dropping it, it's acceleration immediately after leaving your hand is a 9.8 meters per second.[00:01:38] B, more than 9.8 meters per second. C less than 9.8 meters per second. D cannot say unless the speed of the throw is. I would say B, you know, I started as a physics major and then I changed, but I think I, I got enough from my first year. That is B Yeah. Even proven that you're human cuz you got it wrong.[00:01:56] Whereas the AI got it right is 9.8 meters per second. The gravitational constant, uh, because you are no longer accelerating after you leave the ha

Apr 7, 202350 min

Grounded Research: From Google Brain to MLOps to LLMOps — with Shreya Shankar of UC Berkeley

We are excited to feature our first academic on the pod! I first came across Shreya when her tweetstorm of MLOps principles went viral:Shreya’s holistic approach to production grade machine learning has taken her from Stanford to Facebook and Google Brain, being the first ML Engineer at Viaduct, and now a PhD in Databases (trust us, its relevant) at UC Berkeley with the new EPIC Data Lab. If you know Berkeley’s history in turning cutting edge research into gamechanging startups, you should be as excited as we are!Recorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold.Edit from the future: Shreya obliged us with another round of LLMOps hot takes after the pod!Other Links* Shreya’s About: https://www.shreya-shankar.com/about/* Berkeley Sky Computing Lab - Utility Computing for the Cloud* Berkeley Epic Data Lab - low-code and no-code interfaces for data work, powered by next-generation predictive programming techniques* Shreya’s ML Principles * Grounded Theory* Lightning Round:* Favorite AI Product: Stability Dreamstudio* 1 Year Prediction: Data management platforms* Request for startup: Design system generator* Takeaway: It’s not a fad!Timestamps* [00:00:27] Introducing Shreya (poorly)* [00:03:38] The 3 V's of ML development* [00:05:45] Bridging Development and Production* [00:08:40] Preventing Data Leakage* [00:10:31] Berkeley's Unique Research Lab Culture* [00:11:53] From Static to Dynamically Updated Data* [00:12:55] Models as views on Data* [00:15:03] Principle: Version everything you do* [00:16:30] Principle: Always validate your data* [00:18:33] Heuristics for Model Architecture Selection* [00:20:36] The LLMOps Stack* [00:22:50] Shadow Models* [00:23:53] Keeping Up With Research* [00:26:10] Grounded Theory Research* [00:27:59] Google Brain vs Academia* [00:31:41] Advice for New Grads* [00:32:59] Helping Minorities in CS* [00:35:06] Lightning RoundTranscript[00:00:00] Hey everyone. Welcome to the Latent Space podcast. This is Alessio partner and CTM residence at Decibel Partners. I'm joined by my co-host, swyx writer and editor of Latent Space. Yeah,[00:00:21] it's awesome to have another awesome guest Shankar. Welcome .[00:00:25] Thanks for having me. I'm super excited.[00:00:27] Introducing Shreya (poorly)[00:00:27] So I'll intro your formal background and then you can fill in the blanks.[00:00:31] You are a bsms and then PhD at, in, in Computer Science at Stanford. So[00:00:36] I'm, I'm a PhD at Berkeley. Ah, Berkeley. I'm sorry. Oops. . No, it's okay. Everything's the bay shouldn't say that. Everybody, somebody is gonna get mad, but . Lived here for eight years now. So[00:00:50] and then intern at, Google Machine learning learning engineer at Viaduct, an OEM manufacturer, uh, or via OEM analytics platform.[00:00:59] Yes. And now you're an e I R entrepreneur in residence at Amplify.[00:01:02] I think that's on hold a little bit as I'm doing my PhD. It's a very unofficial title, but it sounds fancy on paper when you say[00:01:09] it out loud. Yeah, it is fancy. Well, so that is what people see on your LinkedIn. What's, what should, what should people know about you that's not on your LinkedIn?[00:01:16] Yeah, I don't think I updated my LinkedIn since I started the PhD, so, I'm doing my PhD in databases. It is not AI machine learning, but I work on data management for building AI and ML powered software. I guess like all of my personal interests, I'm super into going for walks, hiking, love, trying coffee in the Bay area.[00:01:42] I recently, I've been getting into cooking a lot. Mm-hmm. , so what kind of cooking? Ooh. I feel like I really like pastas. But that's because I love carbs. So , I don't know if it's the pasta as much as it's the carb. Do you ever cook for[00:01:56] like large[00:01:57] dinners? Large groups? Yeah. We just hosted about like 25 people a couple weeks ago, and I was super ambitious.[00:02:04] I was like, I'm gonna cook for everyone, like a full dinner. But then kids were coming. and I was like, I know they're not gonna eat tofu. The other thing with hosting in the Bay Area is there's gonna be someone vegan. There's gonna be someone gluten-free. Mm-hmm. . There's gonna be someone who's keto. Yeah.[00:02:20] Good luck, .[00:02:21] Oh, you forgot the seeds. That's the sea disrespects.[00:02:25] I know. . So I was like, oh my God, I don't know how I'm gonna do this. Yeah. The dessert too. I was like, I don't know how I'm gonna make everything like a vegan, keto nut free dessert, just water. It was a fun challenge. We ordered pizza for the children and a lot of people ate the pizza.[00:02:43] So I think , that's what happens when you try to cook, cook for everyone.[00:02:48] Yeah. The reason I dug a bit on the cooking is I always find like if you do cook for large groups, it's a little bit like of an ops situation. Yeah. Like a lot of engineering. A lot of like trying to figure out like what you need to deliver and then like what the

Mar 29, 202341 min

Emergency Pod: ChatGPT's App Store Moment (w/ OpenAI's Logan Kilpatrick, LindyAI's Florent Crivello and Nader Dabit)

This blogpost has been updated since original release to add more links and references.The ChatGPT Plugins announcement today could be viewed as the launch of ChatGPT’s “App Store”, a moment as significant as when Apple opened its App Store for the iPhone in 2008 or when Facebook let developers loose on its Open Graph in 2010. With a dozen lines of simple JSON and a mostly-english prompt to help ChatGPT understand what the plugin does, developers will be able to add extensions to ChatGPT to get information and trigger actions in the real world. OpenAI itself launched with some killer first party plugins for: * Browsing the web, * writing AND executing Python code (in an effortlessly multimodal way), * retrieving embedded documents from external datastores,* as well as 11 launch partner plugins from Expedia to Milo to Zapier.My recap thread was well received:But the thing that broke my brain was that ChatGPT’s Python Interpreter plugin can run nontrivial code - users can upload video files and ask ChatGPT to edit it, meaning it now has gone beyond mere chat to offer a substantial compute platform with storage, memory and file upload/download. I immediately started my first AI Twitter Space to process this historical moment with Alessio and friends of the pod live. OpenAI’s Logan (see Episode 1 from *last month*…) suggested that you might be able to link ChatGPT up with Zapier triggers to do arbitrary tasks! and then Flo Crivello, who just launched his AI Assistant startup Lindy, joined us to discuss the builder perspective.Tune in on this EMERGENCY EPISODE of Latent Space to hear developers ask and debate all the issues spilling out from the ChatGPT Plugins launch - and let us know in the comments if you want more/have further questions!SPECIAL NOTE: I was caught up in the hype and was far more negative on Replit than I initially intended as I tried to figure out this new ChatGPT programming paradigm. I regret this. Replit is extremely innovative and well positioned to help you develop and host ChatGPT plugins, and of course Amjad is already on top of it:Mea culpa.Timestamps* [00:00:38] First Reactions to ChatGPT Plugins* [00:07:53] Q&A: Keeping up with AI* [00:10:39] Q&A: ChatGPT Intepreter changes Programming* [00:12:27] Q&A: ChatGPT for Education* [00:15:21] Q&A: GPT4 Sketch to Website Demo* [00:16:32] Q&A: AI Competition and Human Jobs* [00:18:44] ChatGPT Plugins as App Store* [00:34:40] Google vs ChatGPT* [00:36:04] Nader Dabit on Selling His GPT App* [00:43:16] Q&A: ChatGPT Waitlist and Voice* [00:45:26] LangChain with Human in the Loop* [00:46:58] Google vs Microsoft vs Apple* [00:51:43] ChatGPT Plugin Ideas* [00:53:49] Not an app store?* [00:55:24] LangChain and the Future of AI* [01:00:48] Q&A: ChatGPT Bots and Cronjobs* [01:04:43] Logan Joins Us!* [01:07:14] Q&A: Plugins Rollout* [01:08:26] Q&A: Plugins Discovery* [01:10:00] Q&A: OpenAI vs BingChat* [01:11:03] Q&A: App Store Monetization* [01:14:45] Q&A: ChatGPT Plugins API* [01:17:17] Q&A: Python Interpreter* [01:19:58] The History of App Stores and Marketplaces* [01:22:40] LindyAI's Flo Crivello Joins Us* [01:29:42] AI Safety* [01:31:07] Multimodal GPT4* [01:32:10] Designing AI-safe APIs* [01:34:39] Flo's Closing CommentsTranscript[00:00:00] Hello and welcome to the Latent Space Emergency episode. This is our first ever where chatty PT just dropped a plugin ecosystem today, or at least they demoed their plugins. It's still on the wait list, but it is the app store moment for ai. And we did an emergency two hour space with Logan from OpenAI and Flo Coveo from Lin AI and a bunch of our friends.[00:00:28] And if you ever wanted to listen to what it's like to hear developers process in real time when a new launch happens, this is it. Enjoy,[00:00:38] First Reactions to ChatGPT Plugins[00:00:38] I assume everyone has read the blog post. For me the, the big s**t was do you see Greg Brockman's tweet about FFMPEG? I did not. I should check it out. It is amazing. Okay, so. So ChatGPT can generate Python code. We knew this, this is not new, and they can now run the code that it generates.[00:00:58] This is not new. I mean this is like, this is good. It's not like surprising. It's, it's fine. It can run FFMPEG code. You can upload a file, ask it to edit the video file, and it can process the video file and then it can give you the link to download the video file. So it's a general purpose compute platform.[00:01:22] Wow. Did they show how to do this? Agents? I just, I just, I just pinned it. I just, it did I, did I turn into this space? I dunno how to use it. Yeah, it's, it's showing up there. Okay. It can run like is. Is, is, is my And by, by the way hi to people. I, I don't know how to run spaces. I, I not something I normally do.[00:01:42] But You wanna say something? Please request. But yeah, reactions have a look at this video because it run, it generates and runs video editing code. You can upload any arbitrary file. It seems to have good enough compu

Mar 24, 20231h 36m

From Astrophysics to AI: Building the future AI Data Stack — with Sarah Nagy of Seek.ai

If Text is the Universal Interface, then Text to SQL is perhaps the killer B2B business usecase for Generative AI. You may have seen incredible demos from Perplexity AI, OSS Insights, and CensusGPT where the barrier of learning SQL and schemas goes away and you can intuitively converse with your data in natural language.But in the multi-billion dollar data engineering industry, Seek.ai has emerged as the forerunner in building a conversational engine and knowledge base that truly democratizes data insights. We’re proud to present our first remote interview with Sarah Nagy to learn how AI can help you “seek what matters”!Timestamps* 00:00: Intro to Sarah* 03:40: Seek.ai origin* 05:45: Data driven vs Data backfit* 09:15: How Enterprises adopt AI* 12:55: Patents and IP Law* 14:05: The Semantic Layer* 16:35: Interfaces - Dashboards vs Chat?* 21:05: LLM performance and selection* 26:05: LLMOps and LangChain* 30:55: Lightning roundShow notes* Sarah Nagy Linkedin* Seek.ai* Sarah on the dbt podcastLightning Rounds* Favorite AI Product: Stable Diffusion* Favorite AI Community: Eleuther* One year prediction: Things will move fast!* Request for Startup: Scheduling/Emails (shoutout Ipso.ai from our hackathon!)* Takeaway: Automate everything! This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe

Mar 10, 202337 min

97% Cheaper, Faster, Better, Correct AI — with Varun Mohan of Codeium

OpenAI just rollicked the AI world yet again yesterday — while releasing the long awaited ChatGPT API, they also priced it at $2 per million tokens generated, which is 90% cheaper than the text-davinci-003 pricing of the “GPT3.5” family. Their blogpost on how they did it is vague: Through a series of system-wide optimizations, we’ve achieved 90% cost reduction for ChatGPT since December; we’re now passing through those savings to API users.We were fortunate enough to record Episode 2 of our podcast with someone who routinely creates 90%+ improvements for their customers, and in fact have started productizing their own infra skills with Codeium, the rapidly growing free-forever Copilot alternative (see What Building “Copilot for X” Really Takes). Varun Mohan is CEO of Exafunction/Codeium, and he indulged us in diving deep into AI infrastructure, compute-optimal training vs inference tradeoffs, and why he loves suffering.Recorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold. Timestamps* 00:00: Intro to Varun and Exafunction* 03:06: GPU Efficiency, Model Flop Utilization, Dynamic Multiplexing* 05:30: Should companies own their ML infrastructure?* 07:00: The two kinds of LLM Applications* 08:30: Codeium* 14:50: “Our growth is 4-5% day over day”* 16:30: Latency, Quality, and Correctability* 20:30: Acceleration mode vs Exploration mode* 22:00: Copilot for X - Harvey AI’s deal with Allen & Overy* 25:00: Scaling Laws (Chinchilla)* 28:45: “The compute-optimal model might not be easy to serve”* 30:00: Smaller models* 32:30: Deepmind Retro can retrieve external infromation* 34:30: Implications for embedding databases* 37:10: LLMOps - Eval, Data Cleaning* 39:45: Testing/User feedback* 41:00: “Users Is All You Need”* 42:45: General Intelligence + Domain Specific Dataset* 43:15: The God Nvidia computer* 46:00: Lightning roundShow notes* Varun Mohan Linkedin* Exafunction* Blogpost: Are GPUs Worth it for ML* Codeium* Copilot statistics* Eleuther’s The Pile and The Stack* What Building “Copilot for X” Really Takes* Copilot for X* Harvey, Copilot for Law - deal with Allen & Overy* Scaling Laws* Training Compute-Optimal Large Language Models - arXiv (Chinchilla paper)* chinchilla's wild implications (LessWrong)* UL2 20B: An Open Source Unified Language Learner (20B)* Paper - Deepmind Retro* “Does it make your beer taste better”* HumanEval benchmark/dataset* Reverse Engineering Copilot internals* Quora Poe* Prasanna Sankar notes on FLOPs and Bandwidth* NVIDIA H100 specs - 3TB/s GPU memory, 900GB/s NVLink Interconnect* Optimizer state is 14x size of model - 175B params => 2.5TB to store state → needs at least 30 H100 machines with 80GB each* Connor Leahy on The Gradient PodcastLightning Rounds* Favorite AI Product: Midjourney* Favorite AI Community: Eleuther and GPT-J* One year prediction: Better models, more creative usecases* Request for Startup: Superathlete Fitness Assistant* Takeaway: Continue to tinker!Transcript[00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my cohost, swyx, writer, editor of L Space Diaries.[00:00:20] swyx: Hey, and today we have Varun Mohan from Codeium / Exafunction on. I should introduce you a little bit because I like to get the LinkedIn background out of the way.[00:00:30] So you did CS at MIT and then you spent a few years at Nuro where you were ultimately tech lead manager for autonomy. And that's an interesting dive. Self-driving cars in AI and then you went straight into Exafunction with a few of your coworkers and that's where I met some of them and started knowing about Exafunction.[00:00:51] And then from out of nowhere you cloned GitHub Copilot. That's a lot of progress in a very short amount of time. So anyway, welcome .[00:00:59] Varun Mohan: That's high praise.[00:01:00] swyx: What's one thing about you that doesn't appear on LinkedIn that is a big part of what people should know?[00:01:05] Varun Mohan: I actually really like endurance sports actually.[00:01:09] Like I, I've done multiple triathlons. I've actually biked from San Francisco to LA. I like things that are like suffering. I like to suffer while I, while I do sports. Yeah.[00:01:19] swyx: Do you think a lot about like code and tech while you're doing those endurance sports or are you just,[00:01:24] Varun Mohan: your mind is just focused?[00:01:26] I think it's maybe a little bit of both. One of the nice things about, I guess, endurance athletics, It's one of the few things you can do where you're not thinking about, you can't really think about much beyond suffering. Like you're climbing up a hill on a bike and you see like, uh, you see how many more feet you need to climb, and at that point you're just struggling.[00:01:45] That's your only job. Mm-hmm. . Yeah. The only thing you can think of is, uh, pedaling one more pedal. So it's actually like a nice, a nice way

Mar 2, 202350 min

ChatGPT, GPT4 hype, and Building LLM-native products — with Logan Kilpatrick of OpenAI

We’re so glad to launch our first podcast episode with Logan Kilpatrick! This also happens to be his first public interview since joining OpenAI as their first Developer Advocate. Thanks Logan!Recorded in-person at the beautiful StudioPod studios in San Francisco. Full transcript is below the fold.Timestamps* 00:29: Logan’s path to OpenAI* 07:06: On ChatGPT and GPT3 API* 16:16: On Prompt Engineering* 20:30: Usecases and LLM-Native Products* 25:38: Risks and benefits of building on OpenAI* 35:22: OpenAI Codex* 42:40: Apple's Neural Engine* 44:21: Lightning RoundShow notes* Sam Altman’s interview with Connie Loizos* OpenAI Cookbook* OpenAI’s new Embedding Model* Cohere on Word and Sentence Embeddings* (referenced) What is AGI-hard?Lightning Rounds* Favorite AI Product: https://www.synthesia.io/* Favorite AI Community: MLOps * One year prediction: Personalized AI, https://civitai.com/* Takeaway: AI Revolution is here!Transcript[00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my cohost, swyx writer editor of L Space Diaries. Hey.[00:00:20] swyx: Hey . Our guest today is Logan Kilpatrick. What I'm gonna try to do is I'm gonna try to introduce you based on what people know about you, and then you can fill in the blanks.[00:00:28] Introducing Logan[00:00:28] swyx: So you are the first. Developer advocate at OpenAI, which is a humongous achievement. Congrats. You're also the lead developer community advocate of the Julia language. I'm interested in a little bit of that and apparently as I've did a bit of research on you, you got into Julia through NASA where you interned and worked on stuff that's gonna land on the moon apparently.[00:00:50] And you are also working on computer vision at Apple. And had to sit at path, the eye as you fell down the machine learning rabbit hole. What should people know about you that's kind of not on your LinkedIn that like sort of ties together your interest[00:01:02] Logan Kilpatrick: in story? It's a good question. I think so one of the things that is on my LinkedIn that wasn't mentioned that's super near and dear to my heart and what I spend a lot of time in sort of wraps a lot of my open source machine learning developer advocacy experience together is supporting NumFOCUS.[00:01:17] And NumFOCUS is the nonprofit that helps enable a bunch of the open source scientific projects like Julia, Jupyter, Pandas, NumPy, all of those open source projects are. Facilitated legal and fiscally through NumFOCUS. So it's a very critical, important part of the ecosystem and something that I, I spend a bunch of my now more limited free time helping support.[00:01:37] So yeah, something that's, It's on my LinkedIn, but it's, it's something that's important to me. Well,[00:01:42] swyx: it's not as well known of a name, so maybe people kind of skip over it cuz they were like, I don't know what[00:01:45] Logan Kilpatrick: to do with this. Yeah. It's super interesting to see that too. Just one point of context for that is we tried at one point to get a Wikipedia page for non focus and it's, it's providing, again, the infrastructure for, it's like a hundred plus open source scientific projects and they're like, it's not notable enough.[00:01:59] I'm like, well, you know, there's something like 30 plus million developers around the world who use all these open source tools. It's like the foundation. All open source like science that happens. Every breakthrough in science is they discovered the black hole, the first picture of the black hole, all that stuff using numb focus tools, the Mars Rovers, NumFOCUS tools, and it's interesting to see like the disconnect between the nonprofit that supports those projects and the actual success of the projects themselves.[00:02:26] swyx: Well, we'll, we'll get a bunch of people focused on NumFOCUS and we'll get it on Wikipedia. That that is our goal. . That is the goal. , that is our shot. Is this something that you do often, which is you? You seem to always do a lot of community stuff. When you get into something, you're also, I don't know where this, where you find time for this.[00:02:42] You're also a conference chair for DjangoCon, which was last year as well. Do you fall down the rabbit hole of a language and then you look for community opportunities? Is that how you get into.[00:02:51] Logan Kilpatrick: Yeah, so the context for Django stuff was I'd actually been teaching and still am through Harvard's division of continuing education as a teaching fellow for a Django class, and had spent like two and a half years actually teaching students every semester, had a program in Django and realized that like it was kind of the one ecosystem or technical tool that I was using regularly that I wasn't actually contributing to that community.[00:03:13] So, I think sometime in 2021 like applied to be on the board of directors of the Django Events Foundation, nort

Feb 23, 202351 min

« Prev 2 3 45