📆 ThursdAI Sep 21 - OpenAI 🖼️ DALL-E 3, 3.5 Instruct & Gobi, Windows Copilot, Bard Extensions, WebGPU, ChainOfDensity, RememberAll

ThursdAI - The top AI news from the past week · Alex Volkov

September 22, 2023 · 1h 9m

Show Notes

Hey dear ThursdAI friends, as always I’m very excited to bring you this edition of ThursdAI, September 21st, which is packed full of updates, great conversations with experts, breaking AI news, and not 1 but 2 interviews.

ThursdAI - hey, psst, if you got here from X, don’t worry, I don’t spam, but def. subscribe, you’ll be the coolest most up to date AI person you know!

TL;DR of all topics covered

* AI Art & Diffusion

* 🖼️ DALL-E 3 - High quality art, with a built in brain (Announcement, Comparison to MJ)

* Microsoft - Bing will have DALL-E 3 for free (Link)

* Big Co LLMs + API updates

* Microsoft - Windows Copilot 🔥 (Announcement, Demo)

* OpenAI - GPT3.5 instruct (Link)

* OpenAI - Finetuning UI (and finetuning your finetunes) (Announcement, Link)

* Google - Bard has extensions (twitter thread, video)

* Open Source LLM

* Glaive-coder-7B (Announcement, Model, Arena)

* Yann Lecun testimony in front of US senate (Opening Statement, Thread)

* Vision

* Leak : OpenAI GPT4 Vision is coming soon + Gobi multimodal? (source)

* Tools & Prompts

* Chain of Density - a great summarizer prompt technique (Link, Paper, Playground)

* Cardinal - AI infused product backlog (ProductHunt)

* Glaive Arena - (link)

AI Art + Diffusion

DALL-E 3 - High quality art, with a built in brain

DALL-E 2 was the reason I went hard into everything AI. I have a condition called aphantasia, and when I learned that AI tools can help me regain a part of my brain that’s missing, I was in complete AWE. My first “AI” project was a Chrome extension that injects prompts into the DALL-E UI to help with prompt engineering.

Well, now not only is my extension no longer needed, prompt engineering for AI art itself may die a slow death with DALL-E 3, which is going to be integrated into chatGPT interface, and chatGPT will be able to help you… chat with your creation, ask for modifications, alternative styles, and suggest different art directions!

In addition to this incredible new interface, which I think is going to change the whole AI art field, the images are of mind-blowing quality, coherence of objects and scene elements is top notch, and the ability to tweak tiny detail really shines!

Another thing they really fixed is hands and text! Get ready for SO many memes coming at you!

Btw, I created a conversational generation bot in my telegram chatGPT bot (before there was an API, with Stable Diffusion, and I can only remember how addicting this was!) and so did my friends from Krea :) so y’know… where’s our free DALL-E credits, OpenAI? 🤔

Just kidding. An additional awesome thing is that DALL-E 3 will now be integrated into the chatGPT Plus subscription (and Enterprise), will refuse to generate art in the style of any living artist, and has a very very strong bias towards “clean” imagery.

I wonder how fast it will come to an API, but this is incredible news!

P.S - if you don’t want to pay for chatGPT, apparently DALL-E 3 conversational is already being rolled out as a free offering for Bing Chat 👀 Only for a certain percentage of users, but will be free for everyone going forward!

Big Co LLM + API updates

Copilot, no longer just for code?

Microsoft has announced some breaking news on #thursdai, where they confirmed that Copilot is now a piece of the new windows, and will live just a shortcut away from many many people. I think this is absolutely revolutionary, as just last week we chatted with Killian from Open Interpreter and having an LLM run things on my machine was one of the main reasons I was really excited about it!

And now we have a full on, baked AI agent, inside the world’s most popular operating system, running for free, for all mom and pop windows computers out there, just a shortcut away!

Copilot will be a native part of many apps, not only windows, here’s an example of a powerpoint copilot!

As we chatted on the pod, this will put AI into the hands of so so many people for whom opening the chatGPT interface is beyond them, and I find it an incredibly exciting development! (I will not be switching to windows for it tho, will you?)

Btw, shoutout to Mikhail Parakhin who led the BingChat integration and is now in charge of the whole windows division! It shows how much dedication to AI Microsoft is showing and it really seems that they don’t want to “miss” this revolution like they did with mobile!

OpenAI releases GPT 3.5 instruct turbo!

For many of us who used GPT3 APIs before it was cool (who has the 43 character API key 🙋‍♂️), we remember when the “instruct” models were all the rage, and then OpenAI basically told everyone to switch to the much faster and more RLHFd chat interfaces.

Well now, they brought GPT3.5 back, with instruct and turbo mode, it’s no longer a chat, it’s a completion model, that is apparently much better at chess?

An additional interesting thing is, it includes logprobs in the response, so you can actually build much more interesting software (by asking for several responses and then looking at the log probabilities), for example, if you’re asking the model for a multiple choice answer to a question, you can rank the answers based on logprobs!

Listen to the pod, Raunak explains this really well!
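To make the logprobs idea a bit more concrete, here is a minimal sketch of the ranking step only. It assumes you have already pulled one log probability per answer choice out of the model response; the `example` numbers are made up for illustration and are not real model output.

```python
import math

def rank_choices(choice_logprobs):
    """Normalize per-choice log probabilities and return (choice, prob) pairs, best first."""
    # Log-sum-exp over the candidate choices only, shifted by the max for stability.
    m = max(choice_logprobs.values())
    log_total = m + math.log(sum(math.exp(lp - m) for lp in choice_logprobs.values()))
    ranked = [(choice, math.exp(lp - log_total)) for choice, lp in choice_logprobs.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical logprobs for the first answer token of a multiple choice question.
example = {"A": -2.3, "B": -0.4, "C": -1.9, "D": -4.0}
ranking = rank_choices(example)
for choice, prob in ranking:
    print(f"{choice}: {prob:.2f}")
```

Normalizing over just the candidate answers gives you a relative confidence per option, which is handy for thresholds like "only trust the answer if the top choice is well ahead of the runner up".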

FineTune your finetunes

OpenAI also released a UI for finetuning GPT3.5 and upped the number of concurrent finetunes to 3, and now, you can finetune your finetunes!

So you can continue finetuning already finetuned models!

Bard extensions are like chatGPT plugins but more native.

While we wait for Gemini (cmon google, just drop it!) the multi modal upcoming incredible LLM that will beat GPT-4 allegedly, Google is shoving new unbaked features into Bard (remember Bard? It’s like the 5th most used AI assistant!)

You can now opt in, and @ mention stuff like Gmail, Youtube, Drive and many more Google services and Bard will connect to them, do a search (not a vector search apparently, just a keyword search) and will show you results (or summarize your documents) inside the Bard interface.

The @ ui is really cool, and reminded me of Cursor (where you can @ different files or documentation) but in practice, from my 2 checks, it really didn’t work at all and was worse than just a keyword search.

Open Source LLM

Glaive-coder-7B reaches an incredible 63% on human eval

Friends of the pod Anton Bacaj and Sahil Chaudhary have open sourced a beast of a coder model, Glaive-coder-7B. With just 7B parameters, this model achieves an enormous 63% on HumanEval@1, which is higher than LLaMa 2, Code LLaMa and even GPT 3.5 (based on technical reports) 🔥 (table from code-llama released for reference, the table is now meaningless 😂)
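For readers wondering what a number like 63% means here: assuming "HumanEval@1" refers to the usual pass@1 metric, the sketch below shows the standard unbiased pass@k estimator from the HumanEval paper, averaged over problems. The `results` list is made-up data for illustration, not Glaive's actual evaluation.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples generated, c of them correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_score(results, k=1):
    """Average pass@k over a list of (n_samples, n_correct) pairs, one per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical results for three problems, 10 samples each.
results = [(10, 10), (10, 0), (10, 5)]
score = benchmark_score(results, k=1)
print(f"pass@1 = {score:.2f}")
```

For k=1 this reduces to the fraction of correct samples per problem, so a 63% score means that, on average, a single sampled solution passes the unit tests for roughly 63% of the problems.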

Yann Lecun testimony in front of US senate

Look, we get it, the meeting of the CEOs (and Clem from HuggingFace) made more waves, especially at this huge table. Who wasn’t there? Elon, Bill Gates, Sundar, Satya, Zuck, IBM, Sam Altman.

But IMO the real deal government AI thing was done by Yann Lecun, chief scientist at Meta AI, who came in hot, with very pro open source opening statements, and was very patient with the very surprised senators on the committee. Opening statement is worth watching in full (I transcribed it with Targum cause… duh) and Yann actually retweeted! 🫶

Here’s a little taste, where Yann is saying, literally “make progress as fast as we can” 🙇‍♂️

He was also asked what happens if the US over-restricts open source AI and our adversaries … don’t. Will we be at a disadvantage? Good questions senators, I like this thinking, more of this please.

Vision

Gobi and GPT4-Vision are incoming to beat Gemini to the punch?

According to The Information, OpenAI is gearing up to give us the vision model of GPT-4 due to the hinted upcoming release of Gemini, a multi modal model from Google (that’s also rumored to be released very soon, I’m sure they will release this on next ThursdAI, or the one after that!)

It seems to be the case for both DALL-E 3 and the leak about GPT-4 Vision, because apparently Gemini is multi modal on the input (can take images and text) AND the output (can generate text and images) and OpenAI maybe wants to get ahead of that.

We’ve seen images of GPT-4 Vision in the chatGPT UI that were leaked, so it’s only a matter of time.

The most interesting thing from this leak was the model codenamed GOBI, which is going to be a “true” multimodal model, unlike GPT-4 vision.

Here’s an explanation of the difference from Yam Peleg , ThursdAI expert on everything language models!

Voice

Honestly, nothing major happened with voice since last week 👀

Tools

Chain of Density

The Salesforce AI team has developed a new technique for improving text summarization with large language models. Called Chain of Density (CoD), this prompting method allows users to incrementally increase the informational density of a summary.

The key insight is balancing the right amount of details and main ideas when summarizing text. With CoD, you can prompt the model to add more detail until an optimal summary is reached. This gives more control over the summary output.

The Salesforce researchers tested CoD against vanilla GPT summaries in a human preference study. The results showed people preferred the CoD versions, demonstrating the effectiveness of this approach.

Overall, the Salesforce AI team has introduced an innovative way to enhance text summarization with large language models. By tuning the density of the output, CoD prompts can produce higher quality summaries. It will be exciting to see where they take this promising technique in the future.
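If you want to try the idea yourself, here is a rough sketch of a Chain of Density style prompt builder. The wording is my own paraphrase of the technique (add missing details round by round while keeping the length fixed), not the exact prompt from the paper, and `build_cod_prompt`, `rounds` and `max_words` are names invented for this example.

```python
def build_cod_prompt(article, rounds=5, max_words=80):
    """Build a single prompt asking for `rounds` progressively denser summaries."""
    steps = (
        "Step 1. Identify 1-3 informative details from the article "
        "that are missing from the previous summary.\n"
        "Step 2. Rewrite the summary at the same length so that it keeps "
        "everything from the previous summary and adds the missing details."
    )
    return (
        f"Article:\n{article}\n\n"
        f"Write a summary of the article in about {max_words} words. "
        f"Then repeat the following two steps {rounds - 1} more times, "
        f"for {rounds} summaries in total:\n{steps}\n\n"
        "Never drop details from an earlier summary; make room by "
        "compressing and fusing phrases instead.\n"
        "Return the summaries as a numbered list, least dense first."
    )

prompt = build_cod_prompt("OpenAI announced DALL-E 3 this week...", rounds=5)
print(prompt)
```

Since every round is in one prompt, you get all the densities back in a single response and can pick the one that reads best, which is where the "control over the summary output" comes from.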

RememberAll - extend your LLM context with a proxy

We had Raunak from rememberAll on the pod this week, and that interview is probably coming on Sunday, but wanted to include this in tools as it’s super cool.

Basically with 2 lines of code change, you can send your API calls through RememberAll proxy, and they will extract the key information, and embed and store it in a vectorDB for you, and then inject it back on responses.

Super clever way to extend memory, here’s a preview from Raunak (demo) and a more full interview is coming soon!
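To illustrate the general pattern (store, retrieve, inject), here is a toy, in-memory sketch. It is not RememberAll's actual API or implementation: `MemoryProxy`, `embed` and the echo "LLM" are all hypothetical, the embedding is a simple bag of words rather than a real embedding model, and I'm assuming the recalled facts get injected into the prompt before it reaches the model.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class MemoryProxy:
    """Sits between the app and the LLM call, adding long-term memory."""

    def __init__(self, llm_call, top_k=2):
        self.llm_call = llm_call  # any function: prompt string -> reply string
        self.top_k = top_k
        self.store = []           # stand-in for a vector DB: list of (text, vector)

    def recall(self, query):
        """Return up to top_k stored messages that overlap with the query."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self.store]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[: self.top_k] if score > 0]

    def chat(self, user_message):
        memories = self.recall(user_message)
        prompt = f"User: {user_message}"
        if memories:
            facts = "\n".join(f"- {m}" for m in memories)
            prompt = f"Known facts:\n{facts}\n\n{prompt}"
        reply = self.llm_call(prompt)
        self.store.append((user_message, embed(user_message)))  # remember for later
        return reply

# Echo function standing in for a real model, so the injected prompt is visible.
proxy = MemoryProxy(llm_call=lambda prompt: prompt)
proxy.chat("My dog is named Biscuit")
second_prompt = proxy.chat("What is my dog called?")
print(second_prompt)
```

The app keeps calling one `chat` function as before, which is roughly why a proxy like this only needs a tiny code change on the caller's side.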

Cardinal has launched on ProductHunt, from my friends Wiz and Mor (link)

Quick friendly plug, Wix and Mor are friends of mine and they have just launched Cardinal, an AI infused product backlog, that extracts features, discussion about feature requests, and more, from customer feedback, from tons of sources.

Go give them a try, if you’re looking to make your product backlog work better, it’s really really slick!

Hey, if you arrived here, do me a quick favor? Send me a DM with this emoji 🥔, and then share this newsletter with 1 friend who, like you, loves AI?

Thanks, I expect many potatoes in my inbox! See you next ThursdAI 🫡

Here’s the full transcript (no video this time, I’m finishing this up at 10:30 and video will take me at least 3 more hours, apologies 🙇‍♂️)

[00:10:21] Alex Introduces Yam Peleg

[00:10:57] Alex Introduces Nisten Tahiraj

[00:11:10] Alex Introduces Far El

[00:11:24] Alex Introduces Xenova

[00:11:44] Alex Introduces Roie S. Cohen

[00:11:53] Alex Introduces Tzafrir Rehan

[00:12:16] DALL-E 3 - An AI art model with a brain, coming to chatGPT plus

[00:20:33] Microsoft launches Windows Copilot

[00:30:46] OpenAI leaks, GPT-4 Vision, Gobi

[00:38:36] 3.5 instruct model from OpenAI

[00:43:03] Raunak intro

[00:43:25] Bard Extensions allow access to GMail, Youtube, Drive

FULL transcript:

[00:00:00] Alex Volkov: So, ThursdAI is this wonderful thing that happened and happened organically as well.

[00:00:26] And basically what happens is we have this live recording every Thursday, every ThursdAI on Twitter spaces. I'm very grateful to share the stage with experts in their fields, and we all talk about different things, because AI updates are so multidisciplinary right now. It's really hard for even experts in their one field to follow everything.

[00:00:51] I find this mixture of experts type model on stage very conducive because we all go and find the most up to date things from the last week. And then we have folks who, it's their specialization, for example, to comment on them. And you guys in the audience get the benefit of this. And it just happened organically through many conversations we had on Spaces since GPT-4 was launched.

[00:01:16] Literally the day, March 14th, 2023 aka Pi Day. It was the first day we started these spaces, and since then the community has grown to just... An incredible amount of people who join quality experts, top of their field people. I'm, I'm just so humbled by all of this. And since then, many folks told me, like Roy here in the audience, that, Hey, Alex, you're doing this in this weirdest hour.

[00:01:42] Thursday a. m. in San Francisco, nobody's gonna come. It's really hard to participate in the actual live recording. And so, I started a newsletter and a podcast for this. And so, if you aren't able to make it, I more than welcome you to register to the newsletter. You know what? Even if you are here every week, register to the newsletter, because why not?

[00:02:03] Because, share it with your friends. We're talking about everything AI related. Hopefully, hopefully no hype. And I have friends here to reduce the hype when I'm getting too hypey. Definitely none of the, Hey, here's a new AI tool that will help you fix the thing you don't need fixing.

[00:02:18] And I think that's, that's been resonating with the community. And so, as you now are here, you're also a participant in this community. I welcome everybody to tag ThursdAI on their news about AI, or #thursdai, or just, like, the ThursdAI pod, which probably should join this so people get some more visibility. But you are part of the community. Now, those of you who come back, those of you who listen in, those of you who share, all of them: all of these things are very helpful for the community to grow and for us to just know about more stuff.

[00:02:49] It's actually an incredible signal when two or three or more of you react under a piece of news and say, hey, we probably should cover this in ThursdAI. It really helps, truly. I think with that, yeah, I think this intro is enough intro. Welcome. What's up, Tzafrir? How are you?

[00:03:06] Tzafrir Rehan: All's well. Thank you very much. I wanted to, to strengthen your point about the time factor. So we expand. So anyone here who wants to be a little bit interested in generative technologies and breaking news and have some things to do in the meanwhile, and also looking to actually build something cool from all of this.

[00:03:31] Time is the limiting factor here. That's like the, the hardest resource here. Having this group and having everyone explore everything together. It's a lifesaver. It's like a order of magnitude improvement on our ability to move forward each one individually. And that's a group together just to give examples.

[00:03:53] So I'm interested in generative images, videos, and audio. And for each of these, there are hundreds of models right now available. With the availability to make fine tunes on specific datasets for some of these generating a single asset like a video can take hours. Training takes hours. If you want to explore a little bit like the effect of different prompts, just generating hundreds of samples takes hours.

[00:04:26] So without this group, it would be impossible to even know. Where to go and where to invest my time And the name of the game right now is to just choose where you invest your time on To actually get things done and keep up. So thank you. Thank you. Thank you for you and for this group And let's have fun.

[00:04:46] Alex Volkov: Thank you. Thank you everyone. I definitely feel super powered by the people in this group who can like back me up on, I read one tweet and then I saw some people react to this tweet, but I didn't have the time or the capability or the experience to dive in.

[00:05:00] And then there's folks here who did, and then we're going to complete each other. And I think our model, I haven't shared since we started, but our motto is we stay up to date. So you don't have to and have to, I think is the operating word. You want to stay up to date and you're welcome to stay up to date and you're welcome to tag us and talk with us and leave comments here in the chat as well, but you don't have to anymore because, there's a, there's a newsletter that will update you and there's folks on stage who will talk about this.

[00:05:26] I want to briefly cover one tiny thing that I did on the podcast that I think I will start doing as well. So, so far editing this hour and a half, two hours that we have here live was a pain, but I just decided to lean into this because. The conversation we're having here is so much more informative and interesting that any type of summary that I want to do or wanted to do is not going to do it justice.

[00:05:50] And so I had some different feedback from different folks about the length of the podcast. Some people said, yeah, 25 minutes, just the updates is like the right spot. And yeah, the podcast is moving towards. This is going to be the live recording. I'm going to edit this don't worry.

[00:06:04] But besides that, the podcast will be this conversation. Going forward as much as I'm able to edit this, and ship both the newsletter and the podcast in time on Thursday But with that Tzafrir thank you for the kind words, man. I appreciate you being here and sharing with us your expertise

[00:06:20] I want to say hi to Xenova and Arthur.

[00:06:22] We'll start with Xenova. Welcome Josh. How are you?

[00:06:27] Xenova: Yeah. Hey. Yeah, pretty good. Been busy, busy, busy.

[00:06:33] For those who don't know, I'll just quickly introduce myself. I am the creator of Transformers.js, which is a JavaScript library for running HuggingFace Transformers directly in the browser, or Node, or Deno, or maybe Bun soon.

[00:06:49] Who knows when that gets sorted out properly, but any JavaScript environment that you're, that you're looking for. And, yeah, I recently joined HuggingFace, which is exciting. Now I'm able to sort of work on it basically full time. And yeah, lots of, lots of exciting things are, are in the pipeline.

[00:07:06] Alex Volkov: It's been incredible to have you here and then see your progress with Transformers.js and then you joining HuggingFace, man. I appreciate the time here.

[00:07:13] Arthur, thank you for joining. Please feel free to introduce yourself.

[00:07:18] Arthur Islamov: Okay. So, my name is Arthur and I'm fixing and making WebAssembly to work with big models.

[00:07:25] So, soon you will be able to run anything huge in the browser, and I'm particularly interested in diffusion models, so right now I'm making Stable Diffusion 2.1 work in the browser, and then I have some plans to make SDXL work, and maybe Llama and other models too, with all that work done.

[00:07:50] Alex Volkov: That's awesome. Thank you for joining.

[00:07:52] Far El: Yo, what's up? Yeah, my name is Farouk. I'm, like, founder of Nod.ai where we build autonomous agents and also working on skunkworks.ai, which is an open source group where we are pushing the boundaries of what we can do with LLMs and AI as a whole, really.

[00:08:10] Our first, like, major project is this open source MOE architecture that we've been tinkering around with for the last couple months. We're also exploring even more, exotic AI arcs to try to get, to GPT 4 level capability for open source.

[00:08:28] Alex Volkov: Awesome. Awesome. Awesome. And Nisten, welcome brother.

[00:08:33] Nisten Tahiraj: Yeah. Hey everyone, I'm Nisten Tahiraj and I'm terminally online. That's the introduction. Thank you. Yeah, I, I'm also, I'm a dev in Toronto. I worked on the first doctor wrapper which is still doing pretty well. Like no complaints so far, six months later, knock on wood. And yeah, recently started doing a lot more open source stuff.

[00:09:03] Put out a bunch of open source doctor models on, on HuggingFace, which I still need to write a benchmark for because there are no safety benchmarks that are public. And yeah, lately been working with Farouk to make the whole Skunkworks AI mixture of experts model more usable because it's still, it's not even bleeding edge.

[00:09:26] And this one is more like hemorrhaging edge technology. It takes like three people to get it to work. And yeah, I've been extremely interested in the WebGPU side ever since Xenova, on a random tweet, just gave me the command to start Chrome Canary properly. And then I was able to load a whole 7B model.

[00:09:48] And yeah, I'm thinking next for the future, if, if things go okay. I mean, my goal that I've set myself is to have some kind of distributed mixture of experts running via WebGPU and then having Gantt.js encrypt the connections between the, the different nodes and experts. And we'll see how that plays out because everything is changing so quickly.

[00:10:14] But yeah, it's, it's good to be here. And I'm glad I found this Twitter space randomly way back in

[00:10:21] Alex Introduces Yam Peleg

[00:10:21] Alex Volkov: Yeah, for a long time. I just want to welcome Yam to the stage. And Yam doesn't love introducing himself, but I can do it for you Yam this time if you'd like.

[00:10:31] All right. So, I will just run through the speakers on stage just real quick. Yam, thank you for joining us. Folks, Yam is our, I could say, resident... machine learning engineer extraordinaire: everything from datasets and training large language models to understanding the internals of how they work and baking a few of his own. Definitely the guy who, if we found an interesting paper, will be able to explain it to us.

[00:10:57] Alex Introduces Nisten Tahiraj

[00:10:57] Alex Volkov: Nisten. I call you like The AI engineer hacker type, like the stuff that you sometimes do, we're all in awe of being able to run stuff on CPU and doing different, like, approaches that, like, nobody thought of them before.

[00:11:10] Alex Introduces Far El

[00:11:10] Alex Volkov: Far El you're doing, like, great community organizing and we're waiting to see from the MOE and Skunkworks.

[00:11:15] And folks should definitely follow Far El for that and join Skunkworks OS. It's really hard for me to say. Skunks. Works OS efforts in the discord.

[00:11:24] Alex Introduces Xenova

[00:11:24] Alex Volkov: Xenova is our run-models-on-the-client guy, so Transformers.js, everything related to ONNX and everything related to quantization and making the models smaller.

[00:11:35] All of that. All models, all modalities, but I think the focus is on the browser, and, as you obviously said when you introduced yourself, WebGPU stuff.

[00:11:44] Alex Introduces Roie S. Cohen

[00:11:44] Alex Volkov: We have Roie, who's a DevRel at Pinecone, which he didn't say, but Pinecone and vector DBs, context windows, and discussion about RAG, like, all of these things Roie is our go-to.

[00:11:53] Alex Introduces Tzafrir Rehan

[00:11:53] Alex Volkov: And Tzafrir also introduced himself, everything vision, audio, and excitement. So a very well rounded group here. And I definitely recommend everybody to follow. And now with that, now that we are complete, let's please start with the updates because we have an incredible, incredible Thursday, literally every week, right folks?

[00:12:12] Literally every week we have an incredible Thursday

[00:12:16] DALL-E 3 - An AI art model with a brain, coming to chatGPT plus

[00:12:16] Alex Volkov: so we'll start with, with two big ones. I want to say the first big update was obviously DALL-E 3. So I will just share briefly about my story with DALL-E and then I would love folks on stage also to chime in. Please raise your hand so we don't talk over each other. DALL-E when it came out, When the announcement came out for DALL-E 2, I want to say it was a year ago in, a year and a half ago, maybe, in January, February or something, this blew me away.

[00:12:47] I have something called aphantasia, where, I don't know if you saw this, but like, I don't have like the visual mind's eye, so I can't like visually see things, and it's been a thing with me all my life, and then here comes the AI tool that can draw. Very quickly, then I noticed Stable Diffusion, for example, and it just, like, took off from there.

[00:13:04] Everything that I have, all my interest in AI started from DALL-E basically. And DALL-E 3 seems like the next step in all of this. And the reason I'm saying this is because DALL-E 3 is visually incredible, but this is not actually like the biggest part about this, right? We have Midjourney.

[00:13:22] I pinned somebody's comparison between DALL-E and Midjourney. And Midjourney is beautiful and gorgeous, and it's a way smaller team. DALL-E 3 has this beautiful thing where it's connected to ChatGPT. So not only is it like going to be not separate anymore, you're going to have the chat interface into DALL-E 3.

[00:13:41] ChatGPT will be able to help you as a prompt engineer, and you'd be able to chat with the creation process itself. So you will ask for an image, and if you don't know how to actually define what you want in this image, which types, you'd be able to just chat with it. You will say, you know what, actually make it darker, make it more cartoony, whatever.

[00:14:01] And then ChatGPT itself with its brain is going to be your prompt engineer buddy in the creation. And I think, quality aside, and the quality is really, really good, the thing they're highlighting for DALL-E 3 is the ability to have multiple objects and subjects from your prompt in one image, because it understands them.

[00:14:23] But also definitely the piece where you can keep talking to an image is changing the image creation UI significantly, where Midjourney, with all the love we have for Midjourney, is still stuck in Discord. They're still working on the web. It's, it's taking a long time and we've talked about Ideogram to lead them from the side.

[00:14:44] We know that Google has multiple image models like Imagen and different ones. They have like three, I think at this point, that they haven't yet released. And DALL-E, I think, is the first multimodal-on-the-output model that we'll get, right? So multimodal on the output means that what you get back towards you is not only text generation and we saw some other stuff, right?

[00:15:06] We saw some graphs, we saw some code interpreter can run code, etc. But this is a multimodal on the output. And very exciting. The DALL-E 3 news took Twitter by storm. Everybody started sharing this, including us. We can't wait to play with DALL-E 3. I welcome folks on stage. I want to start with Tzafrir's reaction, but definitely to share what we think about this.

[00:15:26] And the last thing I'll say is that now that the community is growing, suddenly people DM me. So first of all, you're all welcome to DM me about different stuff. I see somebody in the audience who DMed me, I think she's still here, so shout out, about joining the beta test for DALL-E 3, which now they're able to share about. Funny tidbit: right now it's baked into the UI.

[00:15:48] So DALL-E 3 is going to be baked into ChatGPT and ChatGPT Enterprise UIs. However, when they tested this, they tested it via a plugin. So OpenAI actually built a plugin and had like a restricted access to this plugin. And folks who like talked with this plugin, the plugin ran the DALL-E ChatGPT version behind the scenes.

[00:16:06] And we don't have access to it yet. I don't know if anybody on stage has access. Please tell me if you do. The access is coming soon, which is interesting from OpenAI. And I think that's most of the DALL-E stuff that I had. And I want to, please, please, buddy, I want to hear from Tzafrir, please.

[00:16:23] And please raise your hand. I really need us to not talk over each other.

[00:16:30] Thank you.

[00:16:31] Tzafrir Rehan: So yeah, DALL-E 3 is looking amazing. I did see some examples that people with early access were generating, and it's far more detailed and coherent than the things we are used to seeing from Stable Diffusion. And much less randomness, I would say. And what's exciting here is a few changes in the paradigm of how it works.

[00:16:56] For example, like you said, it doesn't expect you to know all the intricacies. You can describe in your natural language what you want to see, and it will use GPT, or whatever they are powering it with, for generating a prompt to make the whole image. That's the one thing. The other thing is that it's not text to image.

[00:17:21] It's more a conversation. Similar to how chat GPT is a conversation between you and the assistant. DALL-E 3 is a chat. So you can see in the video that they released. You generate one image and then you discuss if you want to make changes to it, if you want to make more variations, and that would be very interesting to see the flow.

[00:17:44] From the AI artist perspective, I think it will be met with a little bit of hesitation, at least not knowing how much fine control they are providing, if they are giving a way... to influence all these various parameters that the model uses. That is a lot of the workflow for generating AI art.

[00:18:06] And when you want to make a piece for release as an artist, you spend a lot of time fine tuning it.

[00:18:13] And today with Stable Diffusion, and with Midjourney, we have a lot of fine grained control over changing the parameters by a little bit, adding one more word. That's one thing, and another thing is that artists usually actually want to have that control over the prompt. For example, this week I saw an interesting example, I'll try to find it for you, where the artist adds the words "event horizon" to an image.

[00:18:44] Now the image is not of space, but the model does take that idea of the event horizon shape, and makes the image more shaped like an event horizon. So those are the kinds of tricks that right now prompt engineers use to make very specific changes in the image. So I'm interested in knowing if DALL-E 3 will allow that kind of control.

[00:19:08] And most of all, finally, we had DALL-E 2 very early in the game, before Stable Diffusion even gave the first clunky models, before everything, and there was so much work from Midjourney, and so many interesting things coming out in image generation, and OpenAI was always, like, hanging back.

[00:19:30] We have this very basic DALL-E 2, which sometimes works and usually doesn't, and gives you very weird results. So yeah, good to see that they are still working on actually innovating and thinking of the next step and how we can combine all of these technologies to make something that's much more fun for the user experience.

[00:19:53] Alex Volkov: Absolutely. And I will remind some folks the internals behind kind of diffusion models, like stable diffusion, et cetera. OpenAI actually made the whole field happen, I think, with some was it VIT? Vision Transformer that they released and,

[00:20:05] Yam Peleg: they released the first diffusion. The first diffusion model.

[00:20:08] Alex Volkov: Yes. And so like the whole field is all to open the eye and it's great. I, it's a fair, I joined you in the, it's super great to see them innovate and give us some new UIs for this because. I heard from multiple people who have access to this, that this, you can get lost in just chatting to a picture, to the creation process.

[00:20:26] It's like a whole new creation process, basically, like prompting, but chatting. I'm very excited about this, very excited.

[00:20:31] So we'll definitely talk more about this.

[00:20:33] Microsoft launches Windows Copilot

[00:20:33] Alex Volkov: I want to move on to the next thing, which is exciting. Until today, basically, the word Copilot meant GitHub Copilot, at least for those of us with VS Code, those of us who write code. GitHub Copilot, obviously, is the autocomplete engine that gives you code abilities.

[00:20:50] And many of us use it, many of us don't use it. But today, I think, Microsoft, who owns GitHub and who is very close with OpenAI, has announced Copilot for Windows. And it's coming soon with the Windows update. And we've seen some previews of this in some discussions. And I find it very interesting that Microsoft is innovating in AI, whereas we're waiting for Google to come up with Gemini.

[00:21:18] We're waiting for Google to... we're going to talk about Bard updates as well. But Copilot for Windows will be just a shortcut away. I think Windows+C is the new shortcut, and you'd be able to ask it for different things. And for those of us in the audience who didn't join us in the previous ThursdAIs, we

[00:21:40] talked with Killian from this open source project called Open Interpreter. And one of the things that we all like in Open Interpreter is that it runs on my machine and it generates code, and some of that code could be AppleScript. And so it's very easy to run stuff on the Mac using AppleScript. You can open Calendar, you can send emails, you can do a bunch of stuff.

[00:21:58] And so it was beautiful to see that even an open source agent like Open Interpreter is able to run code and then activate stuff on your computer. And I think Killian mentioned, like, Microsoft's Copilot is coming. And exactly a week after that discussion, we now have Windows Copilot.

[00:22:16] Which is going to be able to run Windows for you. It's going to be able to open apps and shut down apps. It's going to be able to just, like, be a ChatGPT, but living inside Windows. And I think it's going to be based on GPT-4. It only makes sense with the Microsoft OpenAI collaboration. And, like, I can't overstate this for a second.

[00:22:38] GPT-4 was released in March, right? ChatGPT was released less than a year ago, on November something. And now the next version of probably the world's most common operating system, Windows, is going to have AI built in as a companion. How insane is this, folks? I have a Windows machine, because I have an NVIDIA GPU, blah, blah, blah, I'm not only on the Mac, and I'm really excited to, like, play with this.

[00:23:09] An additional thing that they've announced together with this update connects to the previous thing that we said, which is that Bing Chat and Windows Copilot will both have DALL-E 3 built in for free. So DALL-E 3 is going to be available to ChatGPT Plus subscribers, the ones of us who pay the 20 bucks.

[00:23:32] However, through Bing, you'll be able to get it for free, and it's going to be part of Windows. Right, so, my mom, who probably doesn't use Windows, okay, her husband, my mom's husband, uses Windows, and he'd be able to use GPT-4 to run his Windows and also generate images. I think that's incredible, and only Microsoft can give it out for free.

[00:23:52] I think that's mostly it in the Microsoft update. However, it's breaking news. Literally, they released the tweet once we started the space, so I'm sure more stuff will come out of there. But I invite folks on stage to chime in with Windows Copilot news. What do you think about this, whether or not this is going to change multiple people's usage of Windows?

[00:24:16] Nisten Tahiraj: I mean, the whole using-software thing is all up in the air now, right? Everyone's in creative mode. Yeah, it's pretty hard to predict what's going to be the better interface. Voice is getting really good. Open Interpreter showed that it can do a whole bunch of stuff. You can also delete all the JSON files on your computer accidentally, but I think those issues will be worked out.

[00:24:43] Yeah, it's hard to call because, again, Bing is still a free beta service. They haven't quite figured out how to fully monetize that, because that's not cheap to run, especially considering that it is the multimodal image one. So, yeah, I don't have that much of an opinion.

[00:25:05] I think it's still too early to call as to how interfaces will change.

[00:25:09] Alex Volkov: I agree. I'm just excited that AI, which we've come to know for less than a year, is now baked into an operating system for everyone, right? Even going to a website like ChatGPT and registering is not for everyone, and they will definitely lower the bar for usage here. What's up, Yam?

[00:25:28] Yam Peleg: Hi, I just want to say that, because everything is so early, we've seen really great infrastructure for RAG, but we haven't seen a wide-scale product using RAG on this scale. So it makes sense in the end.

[00:25:47] I mean, you have a lot of information scattered around all different software and different devices. It's, I think it's the perfect idea to just merge everything with the RAG and just allow you to chat with whatever information you have everywhere. And Microsoft is perfectly positioned to do that. And I'm looking forward.

[00:26:13] I think it's a great idea. I don't know if the implementation will be great. We need to see. I think it will, but we need to see. But as a concept, it's a great concept.

[00:26:26] Alex Volkov: Something that I saw from a person who's very close with the Microsoft team. For some reason, the guy behind Bing, his name is Mikhail Parakhin, has this, like, very non-branded Twitter account that barely has an avatar image.

[00:26:43] And he's open, yeah. He's been doing, like, customer support, basically, on Twitter. People will say, oh, Bing has this, has that. And he's been very, very responsive to some people. And so, two things that he did say. First of all, DALL-E 3 is already part of Bing for some percentage of the population.

[00:27:00] So if you use Bing, and we've talked about Bing before, about image and vision, go try and generate images with it. It used to be DALL-E 2, but if you get good ones, you may be getting DALL-E 3, which is incredible. You may already have this. And the second thing is, I saw somebody comment that he is now head of Windows, right?

[00:27:17] So the guy behind Bing, the guy who pushed AI into Bing, is now moving to be head of Windows. And I think this, together with this release, shows us just how serious Microsoft is about AI everywhere, and how determined it is to not miss this new wave like they missed the mobile wave. Everybody says that Apple overtook Microsoft and Microsoft was late to mobile.

[00:27:37] And it just goes to show how much they invest in this whole thing. And I find it very, very good, because for many people, even going to a website is a barrier to entry. And then when it's just one click in their operating system of choice, I think it's going to shove AI into way more people's faces.

[00:27:54] I also want to say that Microsoft, out of the big ones, is fairly based in terms of safety and regulation, which we usually don't talk about. We can talk about it in maybe the next space, but, like, we could have worse than Microsoft, which is surprising for me, because I used to hate on Internet Explorer most of my life.

[00:28:12] And so now Microsoft is very based. I think last comments on Windows Copilot here, folks, and then we can move on to the next stuff, from OpenAI, actually.

[00:28:22] Nisten Tahiraj: So my last one is, I've started using Edge Canary as my daily browser just because of the sidebar and the splitting. So if you have a widescreen monitor, it's actually very handy, because you can have Code Interpreter on one side, and I'll show an image of it very quickly.

[00:28:39] And I have Bing, which has an excellent voice back-and-forth. And it has really good voice generation, which normally would be very expensive if you were paying for it, but it's in beta. And then I have the actual work, and on the sidebar you can have... Anyway, this interface is a bit convoluted, and the Edge browser is still a little bit clunky, but overall, it's been working pretty well for me.

[00:29:06] So, I don't know. I sort of see the browser as being more and more important. That's your operating system. Some people disagree. Like, Shawn is trying to do more OS-native stuff with his tool that lets you run multiple ones. But yeah, you can see the screenshot of how I started using it with voice.

[00:29:28] In general, I see it as you'll just talk to it back and forth. I think that's,

[00:29:32] Alex Volkov: At least that's what I want. Were you referring to Swyx's Godmode app, where you can run all the LLMs in, like, a window?

[00:29:39] Nisten Tahiraj: Yes, but that one, for example, on the Mac, there's an icon right beside the clock. And you just click that and it pops up, so it's unobtrusively there.

[00:29:49] And it adds to your experience instead of getting in the way. And I do like that part, because it is using real estate on the screen efficiently. But again, if you use a wider monitor, so can Edge with all of its right sidebar shortcuts, because then you can add your Discord, your Outlook and stuff there too, right where I use the Code Interpreter window and even have some completion and document writing stuff too now.

[00:30:19] So that's how I see it. Again, it's up in the air what people will find most helpful.

[00:30:25] Alex Volkov: Absolutely. And I've been using Bing somewhat as well. And yes, the sidebar can also read from the page, right? So the Bing chat in the sidebar has access to the page if you give it.

[00:30:37] And for, like, summarization and different things, that's really, really excellent as well. It completes your browsing experience. So I'm assuming that they're doing some stuff with the Copilot.

[00:30:46] OpenAI leaks, GPT-4 Vision, Gobi

[00:30:46] Alex Volkov: All right, folks, we're moving forward because we have much to cover. And, there's more news from OpenAI.

[00:30:52] They actually came before DALL-E, and we were supposed to talk about them first, and then DALL-E, but sorry, then DALL-E came out. And now let's cover some news from OpenAI. So, it feels like the theme behind all of this news is that OpenAI is trying to rush stuff out the door, or to announce some stuff, because they know, or they hear, or they saw the information from Google breaking out about Gemini, the multimodal,

[00:31:19] huge model from Google that is potentially GPT-4-like and can do images in the input, and is multimodal on the output as well. And so we don't know much information about Gemini so far, but we do know that The Information, the publication called The Information, reported that Gemini is coming very soon.

[00:31:40] And we see the response from OpenAI in multiple places, right? So DALL-E 3 is one of them. The Information also leaked that OpenAI is gearing up to give us vision. For those of you who remember, pretty much every space since March we've been talking about GPT-4, which is also multimodal on the input. And yeah, we can probably go into the details of whether or not it's fully multimodal versus Gobi, and I would love for you to participate in this. But basically, when they announced GPT-4, they showed a demo of it: they gave it some screenshot, they gave it, like, a sketch of a website, and it was able to code that. And then we didn't get that feature, the multimodality from GPT-4. We didn't get it.

[00:32:20] The only people who got it, and Nisten and I interviewed the CEO of this, is Be My Eyes, which is this app for blind folks, and they just, like, shoved GPT-4 vision in there to help those with eyesight issues. And it seems that now that Google is finally stepping into the arena, sorry for the pun, we may get GPT-4 vision very soon.

[00:32:42] I actually saw some screenshots of how it looks inside the ChatGPT interface. And the additional exciting thing is, they have a different model, with the code name Gobi, that is apparently in the works at OpenAI. And that one is going to be multimodal, like, fully. So, Yam, I would love it if you could repeat what we talked about last night, about the differences and how GPT-4 is multimodal, but not fully.

[00:33:06] I would love for you to expand on this.

[00:33:09] Yam Peleg: Yeah. First it's important to understand that there is a huge difference in infrastructure between the two companies. And the infrastructure dictates what is possible or not possible, what is hard or not
