ThursdAI July 13 - Show recap + Notes

ThursdAI - The top AI news from the past week · Alex Volkov, swyx (Shawn), and Junaid Dawud

July 14, 2023 · 1h 42m



Show Notes

Welcome, friends, to the first episode of the ThursdAI recap.

If you can’t come to the spaces, subscribing is the next best thing: the most important updates, distilled every week, including testimony, tips and tricks from a panel of experts. Join our community 👇

Every week since the day GPT-4 was released, we’ve been meeting in Twitter Spaces to talk about AI developments, and slowly but surely this has created a community that’s thirsty to learn, connect and discuss.

Overwhelmed by daily newsletters about tools, folks wanted someone else to do the legwork: to prioritize and condense the most important information about what is shaping the future of AI, today!

Hosted by AI consultant Alex Volkov (available for hire), CEO of Targum.video, this information-packed edition covered groundbreaking new releases like "GPT 4.5" (our nickname for Code Interpreter), Claude 2, and SDXL 1.0. We learned how Code Interpreter is pushing boundaries in computer vision, creative writing, and software development. Expert guests dove into the implications of Elon Musk's new xAI startup, the debate around Twitter's data, and pioneering techniques in prompt engineering. If you want to stay on top of the innovations shaping our AI-powered tomorrow, join Alex and the ThursdAI community.

Since the audio was recorded from a Twitter Space, it has quite a lot of overlap. I think it’s due to the export: sometimes it sounds like folks talk on top of each other, most of all me (Alex), but this was not the case live. I’ll have to figure out a fix.

Topics we covered on the July 13 ThursdAI

GPT 4.5/Code Interpreter:

00:02:37 - 05:55 - General availability of ChatGPT with Code Interpreter announced. 8K context window, faster than GPT-4.

05:56 - 08:36 - Code interpreter use cases, uploading files, executing code, skills and techniques.

08:36 - 10:11 - Uploading large files, executing code, downloading files.

Claude V2:

20:11 - 21:25 - Anthropic releases Claude V2, considered #2 after OpenAI.

21:25 - 23:31 - Claude V2 UI allows uploading files, refreshed UI.

23:31 - 24:30 - Claude V2 product experience beats GPT-3.5.

24:31 - 27:25 - Claude V2 fine-tuned on code, 100k context window, trained on longer outputs.

27:26 - 30:16 - Claude V2 good at comparing essays, creative writing.

30:17 - 32:57 - Claude V2 allows multiple file uploads to context window.

32:57 - 39:10 - Claude V2 better at languages than GPT-4.

39:10 - 40:30 - Claude V2 allows multiple file uploads to context window.

X.AI:

46:22 - 49:29 - Elon Musk announces X.AI to compete with OpenAI. Has access to Twitter data.

49:30 - 51:26 - Discussion on whether Twitter data is useful for training.

51:27 - 52:45 - Twitter data can be transformed into other forms.

52:45 - 58:32 - Twitter spaces could provide useful training data.

58:33 - 59:26 - Speculation on whether XAI will open source their models.

59:26 - 61:54 - Twitter data has some advantages over other social media data.

Stable Diffusion:

89:41 - 91:17 - Stability AI releases SDXL 1.0 on Discord, plans to open-source it.

91:17 - 92:08 - Stability AI releases Stable Doodle.

GPT Prompt Engineering:

61:54 - 64:18 - Intro to OthersideAI and prompt engineering.

64:18 - 71:50 - GPT Prompt Engineer project explained.

71:50 - 72:54 - GPT Prompt Engineer results, potential to improve prompts.

72:54 - 73:41 - Prompts may work better on same model they were generated for.

73:41 - 77:07 - GPT Prompt Engineer is open source, looking for contributions.

Related tweets shared:

https://twitter.com/altryne/status/1677951313156636672

https://twitter.com/altryne/status/1677951330462371840

@Surya - Running GPT2 inside code interpreter

tomviner - scraped all the internal knowledge about the env

Peter got all the installed PyPI packages and their descriptions

swyx added Claude to the smol menubar (which we also discussed)

SkalskiP's awesome Code Interpreter experiments repo

See the rest of the tweets shared and listen to the original space here:

https://spacesdashboard.com/space/1YpKkggrRgPKj/thursdai-space-code-interpreter-claude-v2-xai-sdxl-more

Full Transcript:

00:02 (Speaker A) First of all, welcome to ThursdAI. We stay up to date so you don't have to. There's a panel of experts on stage here that discusses everything.

00:11 (Speaker A) If we've tried something, we'll talk about it. If we haven't, and somebody in the audience tried that specific new AI thing, feel free to raise your hand and give us your comment. This is not the space for long debates.

00:25 (Speaker A) We actually had a great space for that yesterday, with NISten, Roy from Pine and some other folks; we'll probably do a different one. This should be information-dense for folks, and it will be recorded and likely posted at some point.

00:38 (Speaker A) So no debate: let's just drop an opinion, discuss the new stuff and keep going. The goal is to stay up to date so you in the audience don't have to. And with that, I will say hi to Al and Janae and we will get started.

00:58 (Speaker B) Hi everyone, I'm NISten Tahira. I worked on, well, released one of the first doctor chatbots on the market, Dr. Gupta, and scaled it, and now we're working on getting the therapist bot out once we can pass more testing and get voice to work in a profitable manner, because we don't really have VC. At the scale of a few hundred thousand users, the API bills matter quite a bit.

01:31 (Speaker B) So, yeah, these spaces have been pretty helpful. I had some trouble running a voice transformer in the browser on WebGPU, and then the person who wrote Transformers.js comes in here and just says, oh yeah, that backend is messed up, just try blas and synth and stuff. So these have been very interesting and technical spaces.

01:54 (Speaker A) Yeah, we need to get Xenova in here. Xenova is the guy NISten was referring to. Al, Janae, do you want to give a few words of intro and say hi, and then we'll start? Just briefly, please, because I think we need to get going.

02:09 (Speaker C) Sure. Hi, I'm Janae.

02:11 (Speaker D) I'm the resident noob. I started messing around with AI at the beginning of the year, and I also host the Denver AI Tinkerers, coming up next week.

02:20 (Speaker A) And if you're in the Colorado area, greater Denver, please join us. It's going to be a blast.

02:27 (Speaker F) Hi, I'm Al Chang. I'm kind of an old-school technologist, just getting started with AI again and just here to help.

02:36 (Speaker A) Yeah. All right, folks, I think we've had a whole space on this: Simon Willison, me and many, many other folks chimed in the second this was released.

02:50 (Speaker A) Was that Sunday? It's hard to keep track of actual days. Saturday, Saturday last week. Exactly during those spaces, by the way, as we were talking, Logan and everybody else from OpenAI announced general availability of ChatGPT with Code Interpreter. So GPT-4 with Code Interpreter.

03:12 (Speaker A) And I think we just heard from Matt that even some folks who got access slept on it a little bit, maybe because of its very horrible name, which is really hard to type: you get lost in the R's of "interpreter". But it's an extremely powerful new superpower that we've got, and we had a whole space talking about the use cases people already had.

03:37 (Speaker A) It was like three days into it, and since then I bet many more people have tried it. I think swyx said 20,000 listens to that space, plus the pod. At least people definitely want to hear new use cases, right?

03:53 (Speaker G) Yeah, not much else to add about it. I think it's the feature for

Switch.

03:59 (Speaker A) swyx posted a whole deep-dive essay and coined it GPT 4.5, between us friends. And one of the interesting things about it, at least where we are currently after playing around with it, is that we think it's a fine-tuned model. So they kept training it on actually running and executing code.

04:21 (Speaker A) That's what we believe. We don't know; nobody has confirmed this. We also think it's fine-tuned from an earlier checkpoint of GPT-4, and we actually had some folks on spaces saying that it's less restricted and better, like earlier versions were.

04:36 (Speaker A) So it's interesting. I think NISten, right? We have some folks who tell us they're using Code Interpreter without the code part. They've just stopped using plain GPT-4 because it's that model.

04:48 (Speaker A) And I think they also took down the 25-messages-per-hour restriction on Code Interpreter. I've had like four-hour sessions and it didn't stop me, and I didn't see complaints.

05:03 (Speaker G) So it's just better.

05:06 (Speaker A) It's also fast. Maybe it's fast because not many people use it by default, and that could be the reason for the speed, but it's definitely faster. I think also the context window: was it Yam? Somebody summarized it and told us the context window for Code Interpreter is 8K versus the regular GPT-4. Actually, that could also be a kicker.

05:29 (Speaker G) You mean Yam copied and pasted.

05:34 (Speaker A) I would encourage you and Yam to kiss and make up, because Yam is doing a lot of legwork to take down the stuff that he posted, and Yam is working on that, and it's very visible, and you guys need to... there you go, Yam, you need to clear the air. Meanwhile, Pharrell and Gabriel, let me bring you up as well. And we're going to keep talking about Code Interpreter, because that's what we're here to do. NISten, a few other folks and I started cooking with Code Interpreter.

05:59 (Speaker A) And by cooking I mean we started stretching the boundaries of what's possible there. I think Simon Willison kick-started this on the Latent Space pod, so for folks who are not following the Latent Space pod, feel free to follow swyx, his main account, not this hidden one.

05:59 (Speaker A) And swyx reposted the spaces we had. Simon Willison was able to run Node.js and Deno within Code Interpreter, even though OpenAI didn't allow for that, by uploading a binary and asking Code Interpreter to run it. Simon then promptly said they fine-tuned the model away from that, and we found ways anyway to ask it to do some stuff. I have a thread on how I was able to run a vector DB, Chroma, inside Code Interpreter.

06:10 (Speaker A) I ran whisper.cpp. We saw some folks running GPT-2 inside Code Interpreter, right? So imagine an LLM, GPT-4, running another one and talking to it. It's like a little brother inside.

06:10 (Speaker A) I personally love that inception. I don't know if the person who ran GPT-2 is in the audience. Was the nickname Dan, I think, NISten? I don't know.

07:22 (Speaker A) Surya.

07:23 (Speaker B) Surya. He also wrote the search-to-PDF plugin for GPT-4 plugins. He wrote it in like two days, and it's more used than any other enterprise thing, which is pretty hilarious.

07:36 (Speaker A) We need to get Surya.

07:38 (Speaker B) Yeah, he just did it, like, I'm just going to do a search plugin for PDFs, and it's the most used.

07:45 (Speaker A) So dope, pretty amazing. Again, in that space we talked about having a living manual, so to speak, for Code Interpreter use cases, because it's coding, so it covers pretty much everything we can think of as coders, maybe just in Python, maybe restricted to an environment. I've been trying to build that with the #codeinterpretercan hashtag. Let me pin this to the top of the space, to the jumbotron, and I encourage all of you who have an interesting Code Interpreter thing to use it. And I'll bring SkalskiP up to the stage as well.

08:03 (Speaker A) And Lantos, so many good friends. If you have a very interesting Code Interpreter technique, skill or new thing that people can do without coding skills, please tag it with this hashtag so folks can find it. Otherwise, I will cover the main three things Code Interpreter gave us besides the new model.

08:42 (Speaker A) One of them is uploading files. Since we last talked, we've noticed that you can upload files of up to 250 megabytes, and those can be zips of other files. So we've uploaded full model weights.

08:55 (Speaker A) We've uploaded bin files. It's incredible that you can now drag and drop a whole directory and have GPT just know about it and read it. We've uploaded weights and embeddings.

09:08 (Speaker A) You can then obviously execute code in a secure environment, which is again incredible, and you can download files: you can ask it to actually generate a download for you, which is also super, super cool. Maybe one last thing before I give it to the audience for a few more cool use cases. And folks on the stage, please feel free to raise your hand.

09:21 (Speaker A) I'll get to you in the order you raised your hands if you have a use case. Some folks built a memory, a built-in brain, within Code Interpreter just by saving to a file. That's what I tried to do with my vector DB. They download that memory at the end of every session, upload it to the next one, and use a prompt that reminds ChatGPT to start from that point.

09:50 (Speaker A) So in addition to the context window, they also have a separate, offloaded, file-persisted memory. So Code Interpreter: incredible, again.
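The file-backed memory trick described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method those folks used: the file name `memory.json` and the helpers `load_memory`, `remember` and `save_memory` are hypothetical. You would upload the file at the start of a session, ask Code Interpreter to run `load_memory`, and ask it to call `save_memory` and offer the file for download before the session ends.

```python
import json
from pathlib import Path

# Hypothetical file name; upload it at the start of a session, download it at the end.
MEMORY_FILE = Path("memory.json")


def load_memory(path: Path = MEMORY_FILE) -> dict:
    """Restore notes saved by a previous session, or start with an empty store."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"notes": []}


def remember(memory: dict, note: str) -> None:
    """Append one note to the in-memory store."""
    memory["notes"].append(note)


def save_memory(memory: dict, path: Path = MEMORY_FILE) -> Path:
    """Write the store to disk so it can be offered as a download."""
    path.write_text(json.dumps(memory, indent=2), encoding="utf-8")
    return path
```

A session-start prompt along the lines of "load memory.json and continue from these notes" then plays the role of the reminder prompt.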

10:00 (Speaker A) Potentially GPT 4.5. If you haven't played with it, feel free to; if you don't know what to play with, follow the #codeinterpretercan hashtag. And let's get to Skalski.

10:11 (Speaker A) What's up, man?

10:14 (Speaker H) Hi, hello. Do you hear me?

10:15 (Speaker A) Yeah, we can hear you fine.

10:19 (Speaker H) Yeah, I've been playing a lot with Code Interpreter over the past five days, mostly with computer vision use cases, because that's what I do. I haven't introduced myself: I've been doing computer vision full time for the past five years, so when I saw that you can input images and video, my immediate thought was that we need to make it do computer vision. So I went through some low-effort tasks.

10:46 (Speaker H) I managed to run old-school computer vision algorithms: face detection, object tracking, stuff like that. But I also managed to exploit it a little bit, so you can add YOLO object detection models to the list of models that have been run in Code Interpreter.

11:15 (Speaker H) There are some problems with memory management, so I'm not yet fully happy with the result. But yeah, I managed to run it on images and on videos, and something super cool and kind of underrated right now is handling false positives. When the model detects something that shouldn't be detected, you can use plain text to ask Code Interpreter to filter out the false detections.

11:48 (Speaker H) You can just give it your intuition about why that stuff is happening, or when, or where, and it's very good at cleaning up the detections, which was kind of mind-blowing for me. One thing I noticed it sucks at: I tried to create an application that counts objects moving in the video when they cross a line.

11:55 (Speaker H) I didn't use any off-the-shelf libraries; I just had a detector and said, okay, now draw a line and count objects when they cross it. It's terrible at that, at writing the math logic to figure out that something crossed something. We had ten or twelve prompt exchanges and I basically bailed out: forget it. So there are some things that blow my mind, and some things that probably don't.
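For reference, the line-crossing logic Skalski describes can be handled with a standard segment-intersection test built on cross products. The sketch assumes a tracker already supplies each object's centroid per frame as `{track_id: [(x, y), ...]}`; the function names and the "in"/"out" labels are hypothetical, and which side counts as "in" depends on the direction the line is drawn (image coordinates have y pointing down).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def orientation(a: Point, b: Point, c: Point) -> float:
    """Z component of (b - a) x (c - a); the sign tells which side of a->b point c is on."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 strictly crosses segment q1-q2 (touching or collinear is ignored)."""
    d1 = orientation(q1, q2, p1)
    d2 = orientation(q1, q2, p2)
    d3 = orientation(p1, p2, q1)
    d4 = orientation(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def count_crossings(tracks: Dict[int, List[Point]],
                    line: Tuple[Point, Point]) -> Dict[str, int]:
    """Count centroid paths that cross the counting line, split by direction."""
    a, b = line
    counts = {"in": 0, "out": 0}
    for path in tracks.values():
        # Test the movement between consecutive frames, not single positions.
        for prev, curr in zip(path, path[1:]):
            if segments_cross(prev, curr, a, b):
                # The side the object ends up on gives the crossing direction.
                counts["in" if orientation(a, b, curr) > 0 else "out"] += 1
    return counts
```

Testing each frame-to-frame movement against the drawn segment, rather than an infinite line, means objects passing beside the end of the line are not counted. One known gap: because touching counts as no crossing, a centroid that lands exactly on the line in some frame is missed; a fuller version would remember the last non-zero side per track.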

12:49 (Speaker A) So folks, feel free to follow Skalski. I also just pinned to the top of the space his brand-new awesome Code Interpreter use cases git repo, which has a bunch of use cases. It could also serve as a de facto manual, so feel free to go there, add PRs, and follow it for updates.

12:52 (Speaker A) And I want to get to Lantos, because he seems to be unmuting. What's up, Lantos?

13:12 (Speaker H) I was just going to say I can't follow him because he's blocked me.

13:15 (Speaker C) Sad face.

13:16 (Speaker H) Oh, no, I noticed that, but I'm not sure why. I will undo that.

13:20 (Speaker A) All right, I'm the peacemaker in these spaces. Please kiss and make up. You two as well. Everybody should get along.

13:26 (Speaker A) Yay. I want to get to some other folks who came up on stage recently. Gabriel, welcome; tell us about Code Interpreter and your use cases.

13:35 (Speaker A) Janae, if you've played with this too... I would like to hear two more opinions before we move on to the next incredible thing. Yeah. Oh, you're both starting to talk. Sorry, I should have been explicit about the order.

13:54 (Speaker E) No worries. So I just posted a comment on this space about the message cap. Even though the UI still says 25 messages per 3 hours, if you look at the network request you can see, and I posted this, that it's actually 100 messages per 3 hours now.

14:12 (Speaker E) I don't know if they're scaling that up and down as demand increases and decreases, or they're just trying to trick people into conserving their messages, but it's definitely been at 100 for a little while now. Can you confirm you see the same thing in the network tab?

14:32 (Speaker A) Can you confirm the same for the regular mode, or do you think the regular mode is still restricted?

14:41 (Speaker E) Well, based on the fact that there's only one message cap, they don't have a message cap per model. So I think it's consistent across all the GPT-4 models. That's also been my experience for a while now; it's probably been higher for at least a couple of weeks.

14:51 (Speaker E) And it's the same thing we discussed, I think, on Saturday about the context window: you can also see in the API that the context window is 8K for plugins and Code Interpreter, and 4K for the base GPT-4 model.

15:16 (Speaker A) That's awesome. So it's better in every single way.

15:22 (Speaker D) Yeah.

15:23 (Speaker A) Awesome. Thanks.

15:24 (Speaker E) Yeah. In terms of use cases I can share: I've been digging around a lot in Code Interpreter, trying to figure out why the Python packages installed in the environment are there. Some of them seem really random, and some of them make a lot of sense. They released it saying it's basically for data analysis, and a lot of them make sense for that, but some of them are just really wild, like the ML packages.

15:54 (Speaker A) Gabriel, and folks in the audience: if you look up at the jumbotron where we pin tweets, two tweets back there's a tweet by Peter Zero Zero G, who printed all the packages and asked GPT-4 to summarize what they do. So if you have no idea of its potential capabilities, feel free to bookmark that tweet for yourself; it has a bunch of descriptions of what's possible.
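One way to produce a package list like Peter's from inside the sandbox is Python's standard-library `importlib.metadata` (Python 3.8+). This is a sketch of that approach, not necessarily how Peter did it; `installed_packages` is a hypothetical helper name.

```python
from importlib.metadata import distributions


def installed_packages() -> list[tuple[str, str, str]]:
    """Return (name, version, summary) for every installed distribution, sorted by name."""
    rows = []
    for dist in distributions():
        meta = dist.metadata
        # .get avoids errors for distributions with incomplete metadata.
        name = meta.get("Name") or "?"
        summary = meta.get("Summary") or ""
        rows.append((name, dist.version or "", summary))
    return sorted(rows, key=lambda row: row[0].lower())


if __name__ == "__main__":
    for name, version, summary in installed_packages():
        print(f"{name}=={version}: {summary}")
```

The printed list can then be pasted back into the chat with a request to summarize what each package is for.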

16:11 (Speaker A) So go ahead, Gabriel. Yeah, cool.

16:28 (Speaker E) Yeah, I've done the same kind of thing, just shorter: I got it to write a four-word description for each one. So if you're looking for a really short description of each package, I'll post that tweet, and if you're looking for a long one, I think Peter's is great. What you can see there is that there are packages for web development, right? There's FastAPI, there's Flask, there's a bunch of other web development packages.

16:40 (Speaker E) Besides the fact that there's no network access, which obviously other people using it might be turning on, it was just interesting to me. My perspective is that OpenAI has been using this internally across all their teams for development, not just testing it but probably using it pretty consistently. And they probably have access to the Internet.

17:14 (Speaker A) Yeah, I'm sure they have access to.

17:15 (Speaker E) The Internet, and they can install new packages. But I think they also have the ability, instead of uploading and downloading files, to just mount persistent storage. Actually, I think they just mount their local working directory, wherever they're working on their computer. So they have the active directory with their project, and they just mount that and give Code Interpreter access to the whole directory, with the whole repo of their project.

17:48 (Speaker C) Yeah.

17:48 (Speaker E) And then ChatGPT is just writing code to the working directory and reading from there, and it can explore their whole project. We can do that now by uploading: you can zip your whole project, upload the zip, and have it unzipped, and then it can explore your whole project. But once it makes some changes and you want to commit them, you have to ask it to zip the whole thing back up, download it and upload it.
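The zip round trip Gabriel describes can be scripted with the standard library. A minimal sketch, assuming the roughly 250 MB upload limit reported at 08:42 (an observation from this episode, not a documented figure) and assuming uploads land under `/mnt/data` in the sandbox; `pack_project` and `unpack_project` are hypothetical helper names.

```python
import shutil
import zipfile
from pathlib import Path

# ~250 MB upload limit as reported in this episode (an observation, not an official figure).
UPLOAD_LIMIT_BYTES = 250 * 1024 * 1024


def pack_project(project_dir: str, archive_base: str) -> Path:
    """Zip project_dir into <archive_base>.zip and check it against the upload limit."""
    archive = Path(shutil.make_archive(archive_base, "zip", root_dir=project_dir))
    size = archive.stat().st_size
    if size > UPLOAD_LIMIT_BYTES:
        raise ValueError(f"{archive} is {size} bytes, over the ~250 MB upload limit")
    return archive


def unpack_project(archive: str, dest_dir: str) -> list[str]:
    """Extract an uploaded zip into dest_dir and return the archived member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

Locally you would call `pack_project("my_repo", "my_repo")` and upload `my_repo.zip` (the name is illustrative); inside Code Interpreter, `unpack_project("/mnt/data/my_repo.zip", "/mnt/data/my_repo")`, then `pack_project` again on the edited tree to get a zip to download.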

17:48 (Speaker E) And then I think what they're able to do is more of a pair programming thing, where the developer makes some changes, then ChatGPT makes some changes, and they're kind of working together. That takes it one step further. I don't know if they have this or not, but it would be super cool.

18:29 (Speaker A) We're in the realm of updates, not speculation, but I would love to explore this more with you in the next space, because this applies to open source: somebody already tagged us after the last space and said, hey, I'll build this open source. I would love to pin that to the top of the space. However, I want to move on to new stuff and other updates.

18:51 (Speaker A) Sorry to interrupt, and thanks. I think that collaborative, persistent code superpower will probably come to us as well at some point, and internet access on top of it would be another 10x. I want to get to Skalski and Lantos, and then I think we'll move on to Claude.

19:08 (Speaker A) Thanks Gabriel.

19:11 (Speaker H) Yeah, I have a question; I'm not sure if you guys noticed. I was obviously experimenting with PyTorch because I needed it for computer vision, and I noticed that the PyTorch version installed in the environment is actually precompiled to work with CUDA. So it's a GPU version of PyTorch.

19:31 (Speaker H) Even though in the environment you don't have access to a GPU; you only have a CPU. So I'm curious, guys, what do you think about that? Why is that? Any ideas?

19:42 (Speaker A) My ideas just come from what Gabriel said: likely we're getting the same Kubernetes container, but the OpenAI folks have unlimited resources. They probably also have CUDA; that would make sense if theirs is connected to a GPU as well. But that's just an idea. Lantos, I want to get to you, and then we'll move on to Claude.

20:02 (Speaker A) Folks in the audience, feel free to hit the little button on the bottom left that looks like a message bubble and leave comments. Moving on to Claude V2. Folks in the audience and on stage, feel free to hit the emojis: plus one

20:19 (Speaker A) or minus one, if you have tried Claude V2 and liked it or didn't like it. I'm going to cover this anyway, because somebody, I think Roy from Python, called me a Claude V2 fanboy yesterday. At first I got offended and told him I'd only been a fanboy for 24 hours; before that I was a Code Interpreter fanboy. Then I figured out with myself whether or not I am a fanboy of Claude V2.

20:43 (Speaker A) And yeah, I am. swyx told me to relax, and in fact I invited him here to be the red blanket on the other side of the list. Anthropic, the company we can definitely consider number two after OpenAI (I think that's fair in terms of quality),

21:02 (Speaker A) has long since released Claude, and they made some waves when they released Claude 100K with a 100K context window. Now they have released Claude V2. Let me pin some Claude thingies in the jumbotron. Claude V2 was released with multiple things, and I want to focus on two: we'll cover the UI first and then talk about the model itself. UI-wise and product-wise, here's my hot take, and I'll pin this to the top.

21:38 (Speaker A) Unfortunately we won't debate this, but I love you, all of you: as a product, Claude V2 right now beats ChatGPT. My mom could go to the two websites and she'd prefer one over the other.

21:51 (Speaker A) Same for my friends who aren't as plugged into AI as we are, and theirs is free. I think Claude V2 beats GPT-3.5, which is also free, and the 100K context window, with the model trained on 200K, unleashes a bunch of use cases that were not possible before.

22:12 (Speaker A) It just frees you up. You heard Skalski describe the limitations of Code Interpreter; a bunch of those limitations stem from the 8K context window.

22:13 (Speaker A) If you print a bunch within the code you're running, Code Interpreter sometimes forgets what you talked about 20 minutes ago. The 100K context window also means a long, long conversation history with the model, and I think it's really great.

22:37 (Speaker A) Not to mention that you can drag and drop full books in there. Those books need to be in one or two files, and it still doesn't accept zip files. I'm planning to release an extension soon that does this for us and merges them into single files.

22:51 (Speaker A) So hopefully by next week we'll have some updates. Once you upload that much, or upload a transcript of a podcast, you can do a bunch of stuff, because Claude V2 is also better trained on code and we saw a significant jump in... wait, I'm switching to the model, so let me get back to the UI. The UI allows you to upload files.

23:09 (Speaker A) The UI has a Command-K interface, which I personally love. I hit Command-K on every website to see if they support it. You can just start a new chat real quick.

23:21 (Speaker A) It doesn't have sharing, but it's definitely a refreshed, free UI. It's called Claude.ai, and that's the URL; if you haven't tried it, definitely try it. Comments about just the product and UI side before we move to the model? Anybody play with this? Anybody like it? Anybody love the file upload feature? I would love to hear hands and comments.

23:42 (Speaker A) Go ahead, Matt.

23:44 (Speaker D) A bit of a weird thing, but what I've noticed is that it's actually quite frustrating if you want to paste text in: if it's over a certain length, it pastes in as a file. Little small thing, hopefully they'll change it, but it is really annoying because then you can't edit it. ChatGPT does that much better, but I generally agree with you that overall the product experience on Claude is significantly better.

24:03 (Speaker A) The new one, the fresh coat of paint they released for us. I will say that Claude so far was kind of a hidden gem: only folks who got access to the API actually got access to their UI, and that UI was very restricted, as folks who have access to the Claude API know. I think that UI is still around.

24:22 (Speaker A) It still shows your history. It's very restrictive. It's not as cool as this; it's not as sleek as this.

24:27 (Speaker A) So we like Claude.ai, definitely a plus. Check it out. Now let's talk about the model behind this UI, because that model also changed, and several incredible things changed with it.

24:38 (Speaker A) First of all, they released a new model at the same price as the previous one. We love to see this. Please, everybody, including OpenAI, keep the same price, and cheaper and cheaper down the line.

24:41 (Speaker A) We love to see this. Second of all, they claim it's been fine-tuned on several things, one of them being code.

24:54 (Speaker A) And we actually saw a bump in the evaluation called HumanEval, which is a set of questions OpenAI released. I think the bump was from like 55% to 78%, which I think beats 3.5 but isn't there yet compared to GPT-4. Correct?

25:14 (Speaker C) Yeah, and GPT-4 on pass@1, on the first attempt: not GPT-4 that is allowed to refine and fix its answer, but on the first trial. Yeah, by a little bit.

25:33 (Speaker A) So, news to me, and thank you for joining in: the pass numbers are how many times it's able to reflect upon the answers and improve them?

25:43 (Speaker C) Pass@1 is the first attempt. What I meant by reflection is that GPT-4 is even stronger: if GPT-4 sees the exception, it can come up with a solution. That is not in the HumanEval test, but if you use GPT-4 this way, you get to 90-something percent, which I think is more realistic if you think about it. No programmer writes the whole code in one go.
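The pass@1 versus refine-after-errors distinction is easier to see with the metric written out. HumanEval scores are commonly reported with the unbiased pass@k estimator introduced alongside HumanEval in OpenAI's Codex paper: generate n samples per problem, count the c that pass the unit tests, and estimate pass@k = 1 - C(n-c, k) / C(n, k). With k = 1 this reduces to c / n, a first-attempt success rate; a model that reads the exception and retries is measuring something else. A minimal Python version:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k samples drawn without
    replacement from n generations (c of them correct) passes the tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n = 10 samples of which c = 3 pass, pass@1 is 0.3 and pass@5 is 1 - 21/252, about 0.917.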

26:10 (Speaker C) You write it intuitively, fix bugs and so on. You see that in Code Interpreter too. But it is remarkable to see state of the art on the first attempt.

26:19 (Speaker A) And it's significantly better at code, so I suggest folks who previously tried Claude and weren't impressed try it as well. Another crazy thing is the 100K context window: they claim they've actually trained on a 200K context window, twice as much as the previous round. And we follow this one guy, Ofir Press, the guy behind Self-Ask with Search and behind ALiBi, which lets you extend context windows.

26:55 (Speaker A) He just defended his PhD, he talked about context windows, and he was impressed with the way they presented it and showed their loss curve. And we saw the paper, maybe this week, showing that performance dips in the middle of the window: there's less attention in the middle than at the beginning and the end.

27:03 (Speaker A) And it looks like that's not the case for Claude, so I suggest you try the huge context window. Al, you have your hand raised, and then we'll talk about some other model changes.

27:26 (Speaker F) Yeah, I'd talk a little bit about that. I used Claude about a month and a half ago to win Best Solo Hacker at the Craft Ventures hackathon, David Sacks's one. It had like 200 entries. It's exceptionally good at creative writing and also at comparing and contrasting. I don't think people have really taken advantage of what the context window is capable of; it's more than just loading single files in.

27:53 (Speaker F) What I did for the project was load these large legislative bills, these 50-page unreadable bills, and turn them into relatable narratives. One of the things Claude can do is adopt a persona. A lot of the time, summaries just compress the text you see, but you can tell it: write 1,000 words from a social conservative point of view, or a bus driver's point of view, or a social liberal point of view.

28:21 (Speaker F) What that does is take all of its knowledge about the outside world and give you not a summary but essentially an essay about the practical effects of something like a bill. I've actually been working on the idea of reading a book and having it tell me what I would have learned from it, because that's probably what you're more interested in. What it can do in terms of comparing and contrasting large essays is exceptional.

28:51 (Speaker F) So you could have it write 2,000 words from a social conservative point of view and 2,000 words from a social liberal point of view, and then have it contrast the essays, which would be very difficult for a human to do. You can give it multiple files and have it give you a more balanced take, so you get rid of some of the bias that creeps in.
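
The persona-then-contrast workflow described above can be sketched as two prompt builders. This is a minimal, hypothetical sketch: the function names, the XML-style tags, and the exact prompt wording are assumptions for illustration, not the speaker's actual prompts, and no model API is called here.

```python
# Hypothetical prompt builders for the persona/contrast workflow.
# Prompt wording and tag names are illustrative assumptions.

def persona_prompt(document: str, persona: str, words: int = 2000) -> str:
    """Ask for an essay on the document's practical effects from one viewpoint."""
    return (
        f"<document>\n{document}\n</document>\n\n"
        f"Write about {words} words on the practical effects of the document above, "
        f"from the point of view of {persona}. Do not just summarize it."
    )


def contrast_prompt(essays: dict[str, str]) -> str:
    """Ask the model to compare essays written from different viewpoints."""
    parts = [f"<essay persona=\"{p}\">\n{text}\n</essay>" for p, text in essays.items()]
    return (
        "\n\n".join(parts)
        + "\n\nCompare and contrast these essays and give a balanced account "
        "of where they agree and disagree."
    )
```

For a bill, you would send `persona_prompt(bill, "a social conservative")` and `persona_prompt(bill, "a social liberal")` separately, then feed both responses into `contrast_prompt`. Keeping the steps separate means each essay is written without seeing the other.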

29:18 (Speaker A) My dream project, the one I never get to, is to build this for Twitter as a Chrome extension: I select a bunch of tweets and say, remove the bias from this and just give me the debiased version of all of it. Yeah, completely. The cross-referencing ability Claude gets from this context window is incredible for many, many use cases.

29:41 (Speaker F) Yeah, I would say it's not as good as GPT-4 for certain things, but that context window is fantastic. A lot of people who are using embeddings and retrieval can actually just put the whole thing in the context window and ask questions of it, and then you have a baseline to compare your results against. Most people, if they're chatting with a website or something like that, can just put the whole thing in there instead of chunking it up and asking questions, and you'll see that the results are much better that way.

29:51 (Speaker F) And for most people, that would be good enough.
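
The "just put the whole thing in the context window" advice comes down to a budget check: if the document fits, send it whole; only otherwise fall back to chunking for retrieval. Here is a minimal sketch under stated assumptions. The 4-characters-per-token ratio is a rough heuristic, not a real tokenizer. The 100K window and 4,000-token output reserve come from the numbers mentioned in this episode. The function names are hypothetical.

```python
# Rough heuristic: ~4 characters per token for English text (an assumption;
# a provider's real tokenizer would give exact counts).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN + 1


def build_context(document: str, context_tokens: int = 100_000,
                  reserve_for_answer: int = 4_000,
                  chunk_tokens: int = 1_000) -> list[str]:
    """Return [document] if it fits whole in the window; otherwise split it
    into fixed-size chunks that a retrieval step would have to choose from."""
    budget = context_tokens - reserve_for_answer
    if estimate_tokens(document) <= budget:
        return [document]
    size = chunk_tokens * CHARS_PER_TOKEN
    return [document[i:i + size] for i in range(0, len(document), size)]
```

Even when you do need retrieval in production, the whole-document answer from the first branch gives you the baseline the speaker mentions for judging your chunked results.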

30:17 (Speaker A) One more thing Claude was trained on: they talked about output tokens, meaning how many tokens Claude is able to generate. They said previous Claude models were trained with a focus on shorter outputs. I don't know if the same is true of GPT; I haven't seen numbers for GPT-4. This latest model was trained to produce up to 4,000 tokens of output.

30:47 (Speaker A) On top of that, they also fine-tuned it to output JSON, complete JSON files, as responses. We as engineers have waited for this, and OpenAI gave us function calling: here you go, there's the function interface. We love the function interface, but it kind of locks us into the OpenAI ecosystem.

31:04 (Speaker A) And it's great to see another model that's very close to state of the art on HumanEval and is now also fine-tuned to respond with full, intact JSON. That JSON can be up to 4,000 tokens long. Any thoughts on this?

31:28 (Speaker F) Yeah, I can confirm it's able to write large amounts of output. I was having it write essays and outputs of 2,000 or 3,000 words, and it was fine with that.

31:40 (Speaker A) Yes. And I think it's...

31:45 (Speaker B) I'm going to stick with GPT-4 myself. But this might be pretty useful for dumping in an entire code base, given the 100K context window, getting some reviews, and then maybe moving some of the stuff.

32:02 (Speaker A) Once I stop posting statuses and build that Chrome extension where you upload a zip and it flattens it into one file, then we'd be able to do a proper comparison, because Code Interpreter can take zip files and extract them. Oh, one difference I want to point out for folks in the audience: GPT-4 with Code Interpreter allows you to upload zip files, et cetera. We talked about this. It does not load them into the context window, right? There's an 8K context window.
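
The flattening step of the Chrome extension idea above can be sketched in Python with the standard `zipfile` module. This is a hypothetical sketch, not an existing tool. The extension filter and the `=====` header format are arbitrary assumptions, and a browser extension would need the same logic in JavaScript.

```python
import io
import zipfile


def flatten_zip(data: bytes,
                extensions: tuple[str, ...] = (".py", ".js", ".ts", ".md")) -> str:
    """Concatenate matching text files from a zip archive into one string,
    each preceded by a header line naming its path inside the archive."""
    parts = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            # Skip directories and files that are unlikely to be source text.
            if info.is_dir() or not info.filename.endswith(extensions):
                continue
            text = zf.read(info).decode("utf-8", errors="replace")
            parts.append(f"===== {info.filename} =====\n{text}")
    return "\n\n".join(parts)
```

The path headers let the model cite which file a snippet came from once the whole code base sits in a single context window.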

32:30 (Speaker A) The files you upload are not automatically in the context window. The model has to write Python code that actually prints the files, and it usually prints just the first few lines. Hint, hint.

32:30 (Speaker A) Folks in the audience will get my drift. It doesn't usually read everything unless you specifically ask it to, whereas Claude does. Everything you upload to Claude goes directly into the immediate working memory of the context window.

32:38 (Speaker A) That's a major difference to watch out for and take care of. Go ahead.

33:00 (Speaker C) Before I give my opinion, I'd like to ask everyone: what do you think of its performance compared to GPT-4?

33:10 (Speaker A) I'd like comments from folks who actually use both and did the comparison, so please raise your hand to answer. Before I get to folks, I want to call out swyx's small menu bar app, which allows you to actually... swyx, can you give us a brief two minutes on the menu bar thing?

33:28 (Speaker G) Yeah, well, you don't have to choose. Just run them all, all the time, on every single chat. It's a little Electron app that runs in the menu bar. I've been maintaining it, and I just added Claude 2 this week.

33:42 (Speaker G) Claude 2 is not super stable yet. Sometimes it fails to press the submit button, so you just have to retry submitting manually.

33:50 (Speaker G) But yeah, it's a great way to A/B test models, and also to amplify every question across four or five different chat models and compare their answers. So I've been trying it. It's up to you if you want.
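
The "ask every model at once" idea can also be done at the API level rather than by driving chat web UIs as the menu bar app does. This minimal sketch fans one prompt out to several models in parallel. Each "model" here is a hypothetical callable from prompt to text; real clients would need to be wrapped to fit that shape, and none are defined here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical: any function that takes a prompt and returns an answer.
Model = Callable[[str], str]


def fan_out(prompt: str, models: dict[str, Model]) -> dict[str, str]:
    """Send the same prompt to every model in parallel and collect answers by name.
    A failure in one model is recorded as an error string instead of aborting the rest."""
    def run(fn: Model) -> str:
        try:
            return fn(prompt)
        except Exception as exc:  # e.g. a request that fails to submit
            return f"<error: {exc}>"

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(run, fn) for name, fn in models.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

Isolating failures per model matters for A/B comparisons. One flaky backend should not hide the answers from the others.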

34:07 (Speaker A) To.

34:10 (Speaker C) Find it.

34:14 (Speaker A) With the announcements, if you can. Yeah, awesome. So basically, you don't have to stop using one; you don't have to choose. I think the last thing we need to acknowledge about Claude is its multilinguality.

34:28 (Speaker A) They actually focused on showing how much better the new model is than previous ones, and they posted BLEU scores: Claude 2 is significantly better at languages than previous versions. To answer your question, I think it's close to GPT-4, if not better at some things. Its Hebrew is fluent, and usually Hebrew is not that great.

34:57 (Speaker A) Russian and Ukrainian, which I also use, are fluent too. That works really well with a lot of context, because you sometimes need to do a lot of translation, or at least I do.
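
For readers unfamiliar with the BLEU scores mentioned above, here is a simplified, sentence-level version of the metric. It takes the geometric mean of clipped n-gram precisions for n = 1..4 and multiplies it by a brevity penalty for candidates shorter than the reference. Published translation scores are normally corpus-level BLEU with standardized tokenization. This sketch is an illustration of the formula only, not how the scores in the announcement were computed.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: list[str], reference: list[str], max_n: int = 4) -> float:
    """Sentence-level BLEU against one reference: uniform weights, no smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0  # candidate too short to have any n-grams of this order
        ref = ngrams(reference, n)
        # Modified precision: clip each n-gram count by its count in the reference.
        clipped = sum(min(count, ref[g]) for g, count in cand.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

An exact match scores 1.0. A correct but truncated translation keeps perfect precision yet is pulled down by the brevity penalty.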

35:11 (Speaker C) Yeah, multilinguality works great. I was surprised, absolutely. But if you just compare the two on the same prompt, the same question, I have a feeling GPT-4 is slightly better, though I don't have an example to show you.

35:31 (Speaker C) Okay, I don't know, it's a strange situation, but I really wanted to ask you: what did you try, and which one worked better where?
