
ThursdAI July 13 - Show recap + Notes
ThursdAI - The top AI news from the past week · Alex Volkov, swyx (Shawn), and Junaid Dawud
Show Notes
Welcome Friends, to the first episode of ThursdAI recap.
If you can’t come to the spaces, subscribing is the next best thing. Distilled, most important updates, every week, including testimony and tips and tricks from a panel of experts. Join our community 👇
Every week since the day GPT-4 released, we’ve been meeting in Twitter Spaces to talk about AI developments, and it slowly but surely created a community that’s thirsty to learn, connect and discuss information.
Getting overwhelmed with daily newsletters about tools, folks wanted someone else to do the legwork, prioritize and condense the most important information about what is shaping the future of AI, today!
Hosted by AI consultant Alex Volkov (available for hire), CEO of Targum.video, this information-packed edition covered groundbreaking new releases like GPT 4.5, Claude 2, and SDXL 1.0. We learned how Code Interpreter is pushing boundaries in computer vision, creative writing, and software development. Expert guests dove into the implications of Elon Musk's new X.AI startup, the debate around Twitter's data, and pioneering techniques in prompt engineering. If you want to stay on top of the innovations shaping our AI-powered tomorrow, join Alex and the ThursdAI community.
Since the audio was recorded from a Twitter Space, it has quite a lot of overlaps. I think it’s due to the export, so sometimes it sounds like folks talk on top of each other, most of all me (Alex). This was not the case, and I will have to figure out a fix.
Topics we covered in the July 13 ThursdAI
GPT 4.5/Code Interpreter:
02:37 - 05:55 - General availability of ChatGPT with code interpreter announced. 8k context window, faster than GPT-4.
05:56 - 08:36 - Code interpreter use cases, uploading files, executing code, skills and techniques.
08:36 - 10:11 - Uploading large files, executing code, downloading files.
Claude V2:
20:11 - 21:25 - Anthropic releases Claude V2, considered #2 after OpenAI.
21:25 - 23:31 - Claude V2 UI allows uploading files, refreshed UI.
23:31 - 24:30 - Claude V2 product experience beats GPT-3.5.
24:31 - 27:25 - Claude V2 fine-tuned on code, 100k context window, trained on longer outputs.
27:26 - 30:16 - Claude V2 good at comparing essays, creative writing.
30:17 - 32:57 - Claude V2 allows multiple file uploads to context window.
32:57 - 39:10 - Claude V2 better at languages than GPT-4.
39:10 - 40:30 - Claude V2 allows multiple file uploads to context window.
X.AI:
46:22 - 49:29 - Elon Musk announces X.AI to compete with OpenAI. Has access to Twitter data.
49:30 - 51:26 - Discussion on whether Twitter data is useful for training.
51:27 - 52:45 - Twitter data can be transformed into other forms.
52:45 - 58:32 - Twitter spaces could provide useful training data.
58:33 - 59:26 - Speculation on whether XAI will open source their models.
59:26 - 61:54 - Twitter data has some advantages over other social media data.
Stable Diffusion:
89:41 - 91:17 - Stable Diffusion releases SDXL 1.0 in discord, plans to open source it.
91:17 - 92:08 - Stable Diffusion releases Stable Doodle.
GPT Prompt Engineering:
61:54 - 64:18 - Intro to Other Side AI and prompt engineering.
64:18 - 71:50 - GPT Prompt Engineer project explained.
71:50 - 72:54 - GPT Prompt Engineer results, potential to improve prompts.
72:54 - 73:41 - Prompts may work better on same model they were generated for.
73:41 - 77:07 - GPT Prompt Engineer is open source, looking for contributions.
Related tweets shared:
https://twitter.com/altryne/status/1677951313156636672
https://twitter.com/altryne/status/1677951330462371840
@Surya - Running GPT2 inside code interpreter
tomviner - scraped all the internal knowledge about the env
Peter got all pypi packages and their description
swyx added Claude to smol menubar (which we also discussed)
SkalskiP awesome code interpreter experiments repo
See the rest of the tweets shared and listen to the original space here:
https://spacesdashboard.com/space/1YpKkggrRgPKj/thursdai-space-code-interpreter-claude-v2-xai-sdxl-more
Full Transcript:
00:02 (Speaker A) First of all, welcome to ThursdAI. We stay up to date so you
don't have to. There's a panel of experts on top here that discuss
everything.
00:11 (Speaker A) If we've tried something, we'll talk about this. If we haven't, and
somebody in the audience tried that specific new AI stuff, feel free
to raise your hand, give us your comment. This is not the space for
long debates.
00:25 (Speaker A) We actually had a great place for that yesterday. NISten and Roy from
Pine, some other folks, we'll probably do a different one. This
should be information dense for folks and this will be recorded and
we'll likely post it at some point.
00:38 (Speaker A) So no debate, just let's drop an opinion and discuss the new stuff
and kind of continue. And the goal is to stay up to date so you don't
have to in the audience. And I think with that, I will say hi to Al and
Janae and we will get started.
00:58 (Speaker B) Hi everyone, I'm NISten Tahira. I worked on, well, released one of
the first doctor chatbots on the market for Dr. Gupta and scaled it,
and now we're working on getting the therapist bot out once we
can also pass more testing and get voice to work in a profitable
manner, because we don't really have VC. So at the scale of a few
hundred thousand users, the API bills matter quite a bit.
01:31 (Speaker B) So, yeah, these spaces have been pretty helpful because I have some
trouble with running a voice transformer, trying to run it in the
browser on WebGPU, and then the person that wrote Transformers.js
comes in here and just says, oh yeah, that back end is messed up.
Just try blas and synth and stuff. So these have been very
interesting and technical spaces.
01:54 (Speaker A) Yeah, we need to get Xenova in here. Xenova is the guy who NISten was
referring to. Al, Janae, do you want to give a few words of intro and
say hi and then we'll start? Just briefly, please, because I think we
need to get going.
02:09 (Speaker C) Sure. Hi, I'm Janae. I'm the resident noob. I started messing around
with AI at the beginning of the year, and I also host the Denver AI
Tinkerers, coming up next week.
02:20 (Speaker A) And if you're in the Colorado area, greater Denver, please join us. It's
going to be a blast.
02:27 (Speaker F) Hi, I'm Al Chang. I'm kind of an old school technologist. Just
getting started with the AI again and just here to help.
02:36 (Speaker A) Yeah. All right, folks, so I think we've had a whole space on this.
Simon Willison and me and many, many other folks chimed in the second
this was released.
02:50 (Speaker A) Was that six? Was that Sunday? It's hard to keep track of actual
days. Saturday, Saturday, last week, exactly during those spaces, by
the way, as we were talking, Logan and everybody else from
OpenAI announced general availability of ChatGPT with code
interpreter. So GPT-4 with code interpreter.
03:12 (Speaker A) And I think we just heard from Matt that even some folks who got
access to it slept on it a little bit, maybe
because of its very horrible name that's really hard to type:
"interpreter", and you get lost in the R's. But it's an extremely powerful
new superpower that we've got. And we've had the whole space talking
about use cases that people already had.
03:37 (Speaker A) It was like three days into it and since then I bet that many more
people tried it. I think, swyx, 20,000 listens to that space, plus the
pod, at least. People definitely want to hear new use cases, right?
03:53 (Speaker G) Yeah, not much else to add about it. I think it's the feature for...
03:59 (Speaker A) Swyx posted a whole deep dive essay and coined it GPT 4.5 between us
friends. And one of the interesting things about it is that we think
at least that's where we are currently after playing around with
this, is that it's a fine tuned model. So they kept training this on
actually running code and executing code.
04:21 (Speaker A) That's what we believe. We don't know, nobody confirmed this and then
that it's fine tuned from an earlier checkpoint of GPT-4. And so
we actually had some folks on spaces talking about that it's less
restricted and better, like in previous times.
04:36 (Speaker A) So it's interesting. I think, NISten, right? We have some folks who
tell us they're using code interpreter without the code part. They
just stopped using the regular GPT-4, just because it's that model.
04:48 (Speaker A) And I think also they took down the 25 messages per hour restriction
on code interpreter. I've had like four hour sessions and it stopped
like I didn't see complaints.
05:03 (Speaker G) So it's just better.
05:06 (Speaker A) It's also fast. I think it's fast because not many people maybe use
this by default and this could be the reason for the speed, but it's
definitely faster for sure. I think also context window, was it Yam?
Somebody summarized the context window and they told us the context
window for code interpreter is 8K versus the regular GPT-4;
actually, that could also be a kick.
05:29 (Speaker G) You mean Yam copied and pasted.
05:34 (Speaker A) I would encourage you and Yam to kiss and make up, because Yam is
doing a lot of legwork to take down the stuff that he posted, and Yam
is working on that and it's very visible, and you guys need to do...
there you go. Yam, you need to clear the air. However, Pharrell and
Gabriel bring you up as well. And we're going to keep talking about
code interpreter because that's what we're here to do. NISten and a
few other folks and we started cooking with code interpreter.
05:59 (Speaker A) And by cooking I mean we started stretching the complete boundaries
of what's possible there. And I think Simon Willison kick started
this with the Latent Space pod. So for folks who are not following
Latent Space pod, feel free to follow swyx, his main account, not
this hidden one.
05:59 (Speaker A) And swyx reposted the spaces we had. Simon Willison was able to run
Node.js and Deno within code interpreter, even though OpenAI didn't allow
for that by uploading like a binary and asking code interpreter to
generate. Simon then promptly said they fine tuned the model away
from that and we found ways anyway to ask it to do some stuff. I have
a thread on how I was able to run a vector DB, Chroma, inside code
interpreter.
06:10 (Speaker A) I ran whisper.cpp. We saw some folks running GPT-2 inside code
interpreter, right? So imagine an LLM, GPT-4, running another and
talking to it. It's like a little brother inside.
06:10 (Speaker A) I personally love that inception. I don't know if the person who ran
GPT-2 is in the audience. Dan, I think, was the nickname, NISten? I
don't know.
07:22 (Speaker A) Surya.
07:23 (Speaker B) Surya. He also wrote the search to PDF plugin for GPT-4 plugins and
he wrote that in like two days and it's more used than any other
enterprise thing, which is pretty hilarious.
07:36 (Speaker A) We need to get Surya.
07:38 (Speaker B) Yeah, he just did that as, I'm just going to do a search plugin for
PDF, and it's like the most used.
07:45 (Speaker A) So dope pretty amazing. Again, in that space we've talked about
having like a living manual, so to speak, for code interpreter use
cases because it's coding. So it covers pretty much everything that
we can think of as coders, maybe just in Python, maybe restricted to
an environment. And I've been trying to do that with the code
interpreter can hashtag, and I encourage all of you, let me pin this
to the top of the space, to the jumbotron, if you have an interesting
code interpreter thing. And I'll bring up SkalskiP to the stage as
well.
08:03 (Speaker A) And Lantos, so many good friends. If you have a very interesting code
interpreter technique or skill or new thing that people can do
without coding skills, please tag with this hashtag so folks can find
this. Otherwise I will cover the main three things the code
interpreter gave us besides the new model.
08:42 (Speaker A) One of them is uploading files. And since we've talked, we've noticed
that you can upload up to 250 megabyte files and those can be zips of
other files. So we've uploaded like full models weights.
08:55 (Speaker A) We've uploaded bin files. It's incredible that you can now drag and
drop a whole directory and have GPT just know about this and read about
this. We've uploaded weights and embeddings.
09:08 (Speaker A) You can then obviously execute code in a secure environment, which is
again incredible, and you can download files, you can ask it to
actually generate a download for you, which is also super, super
cool. Maybe one last thing I'll say before I'll give it to the
audience for a few more cool use cases. And folks in the stage,
please feel free to raise your hand.
09:21 (Speaker A) I'll get to you in the order that you raise your hand if you have a
use case. Some folks built like a built-in memory, a built-in brain,
within code interpreter, just by saving to a file. That's what I tried to
do with my vector DB. They download that memory at the end of
every session and then upload it to the next one and have something like
a prompt that reminds GPT to start from that point.
09:50 (Speaker A) So in addition to the context window, they're also having a separate
offloaded file persisted memory. So code interpreter incredible.
Again.
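The file-based memory described above can be sketched in a few lines of Python. This is only an illustration, not the code anyone on the panel actually used: the file name memory.json and the helper names load_memory and remember are hypothetical. The idea is that notes are appended to a JSON file inside the sandbox, the file is downloaded at the end of a session, and it is uploaded again at the start of the next one.

```python
import json
from pathlib import Path

# Hypothetical file name; any path inside the sandbox would work.
MEMORY_FILE = Path("memory.json")


def load_memory(path=MEMORY_FILE):
    """Return the saved notes, or an empty list if no memory file exists yet."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return []


def remember(note, path=MEMORY_FILE):
    """Append one note to the memory file and return all notes so far."""
    path = Path(path)
    notes = load_memory(path)
    notes.append(note)
    path.write_text(json.dumps(notes, indent=2), encoding="utf-8")
    return notes
```

A new session would start by uploading the file and asking the model to call load_memory() before doing anything else.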
10:00 (Speaker A) Potentially GPT 4.5. And if you haven't played with this, feel free
to if you don't know what to play with, follow the code interpreter
can hashtag and let's get to Skalski.
10:11 (Speaker A) What's up, man?
10:14 (Speaker H) Hi, hello. Do you hear me?
10:15 (Speaker A) Yeah, we can hear you fine.
10:19 (Speaker H) Yeah, I've been playing a lot with code interpreter over the past
five days, mostly with computer vision use cases because that's what
I do. I haven't introduced myself. I'm pretty much doing computer
vision full time for the past five years and was focusing on like
when I saw that you can input image and video, that was immediately
what I was thinking, we need to make it to computer vision. So I went
through some low effort tasks.
10:46 (Speaker H) So I managed to run old school computer vision algorithms, face
detection, tracking of objects, stuff like that. But I also managed
to exploit it a little bit. So you can add YOLO object detection
models to the list of models that were run in code interpreter.
11:15 (Speaker H) There are some problems with memory management, so I'm not yet fully
happy with the result. But yeah, I managed to run it on images and on
videos. And the thing that is super cool and kind of
underrated right now is false positives. So when the model detects
something that shouldn't be detected, you can really use text to ask
code interpreter to filter out false detections.
11:48 (Speaker H) You can just give it your feeling like why that stuff is happening or
when or where. And it's very good at cleaning the detections, which
was kind of like mind blowing for me. And one thing that I noticed
that it sucks at is I managed to create an application that counts
objects moving on the video when they cross the line.
11:55 (Speaker H) And I didn't use any off the shelf libraries, I just had detector and
say, okay, now draw a line and count objects when they cross the
line. It's terrible at that, writing math logic to figure out that
something crossed something, we had like ten prompts or twelve
prompts exchange and I basically bailed out on that, forget it. So
there are some things that blow my mind, but there are some that
probably not.
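For reference, the line-crossing logic that code interpreter struggled with can be written compactly. The sketch below is not the speaker's code; it assumes a tracker already provides, per object ID, a list of centroid positions over time (the tracks dictionary is a hypothetical input format), and it treats the counting line as infinite. An object is counted when two consecutive centroids fall on opposite sides of the line, which is detected from the sign of a 2D cross product.

```python
def side_of_line(point, line_start, line_end):
    """Signed area test: positive on one side of the line, negative on the
    other, zero when the point lies exactly on the line."""
    (px, py), (ax, ay), (bx, by) = point, line_start, line_end
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def count_crossings(tracks, line_start, line_end):
    """Count steps where consecutive centroids of a track lie on opposite
    sides of the (infinite) line through line_start and line_end."""
    crossings = 0
    for centroids in tracks.values():
        for previous, current in zip(centroids, centroids[1:]):
            before = side_of_line(previous, line_start, line_end)
            after = side_of_line(current, line_start, line_end)
            if before * after < 0:  # sign flip means the object crossed
                crossings += 1
    return crossings
```

A centroid that lands exactly on the line gives a product of zero and is not counted in this sketch; a production counter would also need to limit the test to the drawn segment rather than the infinite line.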
12:49 (Speaker A) So folks, feel free to follow Skalski. And also I just pinned to the top
the tweet with his brand new awesome code interpreter use cases git
repo, and there's a list, there's a bunch of use cases there. This
could also serve as a de facto manual. So feel free to go there, add
PRs and follow that for updates.
12:52 (Speaker A) And I want to get to Lentos because he seems to be unmuting. What's
up, Lentos?
13:12 (Speaker H) I was just going to say I can't follow him because he's blocked me.
13:15 (Speaker C) Sad face.
13:16 (Speaker H) Oh, no, I noticed that, but I'm not sure why. I will undo that.
13:20 (Speaker A) All right, I'm the peacemaker in the status. Please kiss and make up.
You two as well. Everybody should get along.
13:26 (Speaker A) Yay. I want to get to some other folks who came up on stage recently.
And Gabriel, welcome to talk about code interpreter and your use
cases.
13:35 (Speaker A) Jeanette, if you play with this, I would like to hear two more
opinions before we move on to the next incredible thing. Yeah. Oh,
you guys are talking about let's get together and then June sorry, I
should have been explicit about the order.
13:54 (Speaker E) No worries. So I just posted a comment on this space about the
message cap on a conversation. So even though in the UI, it still
says 25 messages per 3 hours, if you look at the network request, you
can see that. And I posted this, it's actually 100 messages per 3
hours now.
14:12 (Speaker E) And I don't know if they're scaling that up and down as demand
increases and decreases, or they're just trying to trick people into
conserving their messages, but it's definitely been on 100 for a
little while now. Can you confirm same thing you can see in the
network?
14:32 (Speaker A) Can you confirm the same for the regular mode, or do you think the
regular mode is still restricted? Well.
14:41 (Speaker E) Based on just the fact that there's only one message cap, they don't
have message cap per model. So I think it's just consistent across
all the GPT-4 models. And that's also my experience in the last
it's been a little while now. It's probably at least a couple of
weeks that it's been higher.
14:51 (Speaker E) And same thing we discussed, I think, on Saturday about the context
window. And you can also see it in the API that the context window is
8K for plugins and code interpreter, and it's 4K for the base
GPT-4 model.
15:16 (Speaker A) That's awesome. So it's better in every single way.
15:22 (Speaker D) Yeah.
15:23 (Speaker A) Awesome. Thanks.
15:24 (Speaker E) Yeah. In terms of use cases I can share, I've been digging around a
lot in the code interpreter, and I was really trying to hone in on
why are the packages that are installed there, the Python packages in
the environment? Why are they there? Some of them seem really random,
and some of them make a lot of sense. And they released it, saying
it's for, basically data analysis. And a lot of them make sense for
that, but some of them are just really wild, like the ML packages.
15:54 (Speaker A) And Gabriel, folks in the audience, if you look up at the
jumbotron where we pin tweets, two tweets before there's a tweet by Peter
Zero Zero G, who actually printed all the packages and asked GPT-4
to kind of summarize what they do. So if you have no idea about the
potential capabilities of what it can do, feel free to pin that tweet
for yourself. And then it has a bunch of descriptions of what's
possible.
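Printing the installed packages, as described above, needs only the Python standard library. The sketch below is one way to do it and is not necessarily how the tweet's author did it; installed_packages is a hypothetical helper name. It uses importlib.metadata to list every distribution visible to the interpreter, and the resulting list can be pasted back to the model with a request to describe each package.

```python
from importlib import metadata


def installed_packages():
    """Return sorted (name, version) pairs for every installed distribution."""
    pairs = {
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    }
    return sorted(pairs, key=lambda pair: pair[0].lower())


if __name__ == "__main__":
    for name, version in installed_packages():
        print(f"{name}=={version}")
```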
16:11 (Speaker A) So go ahead. Gabriel. Yeah, cool.
16:28 (Speaker E) Yeah, I've done the same kind of thing with just a short yeah, I got
it to do a four word description for each one. So if you're looking
for a really short description of each package, I'll post that tweet.
And if you're looking for a long one, I think Peter's is great. And
what you can see there is that there are packages for web
development, right? There's FastAPI, there's Flask, there's a bunch
of other packages for Web development.
16:40 (Speaker E) And besides the fact that there's no network access, which obviously
other people using it might be turning it on, but it was just
interesting to me. My perspective is that OpenAI has been using this
internally throughout all their teams for development and testing it
internally, but probably also using it pretty consistently. They
probably have access to the Internet.
17:14 (Speaker A) Yeah, I'm sure they have access to.
17:15 (Speaker E) The Internet and they can install new packages. But I think they also
have the ability, instead of uploading files and downloading files,
they have the ability to just mount persist memory, I don't think, to
persist. I think they just mount their local working directory on
their computer right wherever they're working. So they have their
active directory where they have their project, and they just mount
that and give the code interpreter access to the whole directory with
their whole repo of their project.
17:48 (Speaker C) Yeah.
17:48 (Speaker E) And then ChatGPT is just writing code to the working directory and
reading from there and it can explore their whole project. We can do
that now by uploading, you can zip your whole project and upload the
whole thing zipped and have it unzipped. And then it can kind of
explore your whole project. But then once it makes some changes, you
want to commit them, you have to ask it to zip the whole thing back,
download it and upload it.
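The zip round trip described above can be sketched with the standard library. The helper names pack_project and unpack_project are hypothetical, and the sketch assumes the archive is written outside the project directory so that it does not end up inside itself. The same two calls work on the local machine before uploading and inside the sandbox after the model has made its changes.

```python
import shutil
from pathlib import Path


def pack_project(project_dir, archive_stem):
    """Zip project_dir into '<archive_stem>.zip' and return the archive path."""
    archive = shutil.make_archive(str(archive_stem), "zip", root_dir=str(project_dir))
    return Path(archive)


def unpack_project(archive_path, target_dir):
    """Extract a zip archive into target_dir and return that directory."""
    shutil.unpack_archive(str(archive_path), str(target_dir), "zip")
    return Path(target_dir)
```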
17:48 (Speaker E) And then I think what they're able to do is more of like a kind of
pair programming thing where the developer makes some changes and
then ChatGPT makes some changes and they're kind of working
together. This is taking it one step further. I don't know if they
have this or not, but it would be super.
18:29 (Speaker A) Cool in the realm of updates unless there is no speculation. But I
would love to explore this more with you in the next stage because
this applies to open source and how people already saw somebody tag
us after the last space and said, hey, I'll build this open source. I
would love to pin this to the top of the space. However, I want to
move on to new space and then move on to other updates.
18:51 (Speaker A) Sorry to interrupt, but thanks. I think that the collaborative,
persistent code superpower that probably maybe at some point will
come to us as well. Plus the internet access is like another 10x. I
want to get to Skalski and Lantos, and I think we'll move on to
Claude.
19:08 (Speaker A) Thanks Gabriel.
19:11 (Speaker H) Yeah, I have a question. I'm not really sure guys, if you notice that
I was obviously experimenting with PyTorch because I needed it for
computer vision. I noticed that the PyTorch version that is installed
in the environment is actually precompiled to work with CUDA. So it's a
GPU version of PyTorch.
19:31 (Speaker H) Even though in the environment you don't have access to GPU, you
only have CPU. So I'm curious guys, what you think about that. Why is
that? Any ideas?
19:42 (Speaker A) Ideas that just come from what Gabriel just said? Likely we're
getting the same Kubernetes container. However, the OpenAI folks
have like unlimited stuff. They probably also have CUDA that would
make sense right there is probably connected to a GPU as well, but
that's just an idea. Lantos, I want to get to you and then we'll move
on to Claude.
20:02 (Speaker A) Folks and folks in the audience, feel free to hit the little right
button on the bottom left looks like a little message and leave
comments through commenting as well. Moving on to Claude V Two. Folks
in the audience and folks on stage, feel free to hit up the emojis
plus one.
20:19 (Speaker A) Minus one if you have tried Claude V two if you like it and you
haven't liked it. I'm going to cover this anyway because I think
somebody called me, I think Roy from Python called me a Claude V2
fanboy yesterday and I first got offended and I told him that I'm
just a fanboy for 24 hours. Before that I was a code interpreter
fanboy and then I figured with myself whether or not I am a fanboy of
Claude V Two.
20:43 (Speaker A) And yeah, I am, and swyx told me to relax and in fact I invited him
here to be the wet blanket on the other side of the list. Anthropic,
the company that we can definitely consider number two after OpenAI,
I think that's fair in terms of quality.
21:02 (Speaker A) Have long released Claude versions, and they made some waves when they
released Claude AKS clong with 100K context window. They have
released Claude V2, and let me paste some Claude, sorry, pin some
Claude thingies in the jumbotron, sorry. However, Claude V2
released with multiple stuff and I want to focus on two stuff and I
think we'll cover the UI first and then we're going to talk about the
model itself, UI wise and product wise. My hot take and I'll pin this
to the top.
21:38 (Speaker A) Unfortunately not debate this, but I love you, all of you. Is that as
products, Claude V2 right now beats GPT as a product. My mom can go
into two websites and she'll prefer one versus the other one.
21:51 (Speaker A) Or my friends that aren't as plugged into AI as we are. Theirs is
free. And I think Claude V2 beats GPT-3.5, which is also free, and
the 100K context window, with the model being trained on 200K, unleashes a
bunch of use cases that were not possible before.
22:12 (Speaker A) It just frees you up. If you heard Skalski just say the limitations
of code interpreter. A bunch of these limitations stem from the eight
K context window.
22:13 (Speaker A) If you print a bunch within the code that you're doing, code
interpreter sometimes forgets what you guys talked about 20 minutes
ago. And the 100K context window also means a long, long conversation
history with the model. And I think it's really great.
22:37 (Speaker A) Not to mention that you can drag and drop full books in there. Those
books need to be in like one or two files and they still don't accept
zip files. And I'm planning to release an extension soon that does
this for us and unifies them into single files.
22:51 (Speaker A) So hopefully by next week we'll have some updates. However, once you
upload that much or you can upload like a transcript or a podcast,
you can do a bunch of stuff because Claude V2 is also better
trained on code and we saw a significant jump in wait, I'm switching
to the model, so let me get back to the UI. The UI allows you to
upload files.
23:09 (Speaker A) The UI has a command k interface, which I personally love. I hit
Command K in every website and see if they support it. You can just
start a new chat real quick.
23:21 (Speaker A) It doesn't have Share, but it's definitely a refreshed and free UI.
It's called claude.ai and that's the URL, and if you haven't tried it,
definitely try it. Comments about just the product side and the UI
side before we move to the model? Anybody play with this? Anybody
like it? Anybody loves the upload files feature? I would love to hear
hands and comments.
23:42 (Speaker A) Go ahead, Matt.
23:44 (Speaker D) A bit of a weird thing, but what I've noticed is it's actually quite
frustrating if you want to paste text in it actually, if it's over a
certain length, will paste in as a file. Little small thing.
Hopefully they'll change it, but it is really annoying because then
you can't edit it. ChatGPT does do that much better, but I generally
agree with you that overall the product experience on Claude is.
24:03 (Speaker A) Significantly the new one. The fresh coat of paint they released for
us. I will say that Claude so far was kind of a hidden gem, that only
folks who got access to the API actually got access to their UI, and
that UI was very restricted and folks who have access to Claude API
know what I'm talking about. I think that UI is still around.
24:22 (Speaker A) It still shows your history. It's like very restrictive. It's not as
cool as this, it's not as sleek as this.
24:27 (Speaker A) So we like claude.ai, definitely a plus. Check it out. Now, let's talk
about the model behind this UI, because that model also changed and
several incredible things that changed with it.
24:38 (Speaker A) First of all, they released a new model, same price as the previous
one. We love to see this. Please everybody, including OpenAI,
continue giving the same price and cheaper and cheaper down the line.
24:41 (Speaker A) We love to see this. Second of all, they claim it's been fine tuned
on several things. One of them is code.
24:54 (Speaker A) And we actually saw a bump in the evaluation called HumanEval, which
is a set of questions that OpenAI released and I think the bump was
from like 55% to 78%, which I think beats 3.5 and is not there
compared to GPT-4. Correct?
25:14 (Speaker C) Yeah, and GPT-4 on pass at first, on the first, not the GPT-4
that is allowed to refine and fix it there, but on the first trial.
Yeah, by a little bit.
25:33 (Speaker A) So, news to me, and thank you for joining in. The pass number is how
many times it's able to reflect upon the answers and improve them?
25:43 (Speaker C) The pass number is kind of... what I meant by reflection is even stronger
with GPT-4. If GPT-4 sees the exception, it can come up with a
solution. So this is not in the HumanEval test, but if you use GPT-4
this way, you get to 90 something percent, which I
think it's more realistic if you think about it. No programmer writes
the whole code in a one go.
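The "first trial" score discussed here is usually reported as pass@1, a special case of pass@k. As a rough illustration, and assuming the unbiased estimator described in the HumanEval paper (generate n samples per problem, count the c that pass the unit tests, and estimate the chance that at least one of k randomly drawn samples passes), the metric can be computed as below. Note that pass@k measures independent samples; it does not model the refine-after-seeing-the-exception loop the speaker describes.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k for one problem.

    n: total samples generated, c: samples that passed the tests,
    k: number of samples the metric is allowed to draw.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Probability that all k drawn samples are failures, subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With k=1 the estimate reduces to c / n, the fraction of single attempts that pass.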
26:10 (Speaker C) You write it intuitively, fix bugs and so on. And also in code
interpreter, you see it. But it is remarkable to see state.
26:19 (Speaker A) Of the art on first and it's significantly better in code. And I
suggest folks who previously tried Claude and weren't impressed to try
as well. An additional crazy thing that they've trained on is the 100K
context window, and they've actually trained, they claim, on a 200K
context window, so twice as much as the previous round. And we follow
this one guy, Ofir Press, the guy behind Self-Ask with Search and
the guy behind ALiBi, the ability to extend context windows.
26:55 (Speaker A) He just defended his PhD and he talked about context windows and he
was impressed with the way they presented and the way they showed
their loss curve. And so this could be... we saw the paper maybe this
week, folks saw the paper where the window dips in the middle.
There's like less attention in the middle than at the beginning and at the
end.
27:03 (Speaker A) And it looks like that's not the case for Claude as well. So I
suggest you try the huge context window. And Al, you have your hand
raised, and then we'll talk about some other model changes.
27:26 (Speaker F) Yeah, I would talk a little bit about I used Claude about a month and
a half ago to win Best Solo Hacker at the Craft Ventures hackathon,
the David Sacks one. Yeah, it had like 200 entries, but it's
exceptionally good at creative writing and also like comparing and
contrasting. I don't think people have really taken advantage of what
the context window is capable of doing. It's more than just loading
single files in.
27:53 (Speaker F) So what I did for the project was I loaded these large legislative
bills, these like 50 page unreadable bills, and turned them into
relatable narratives. So one of the things that Claude can do is you
can adopt a persona. So a lot of times with summaries, summaries just
compress the text that you see, but you can tell it to say, write
1000 words from a social conservative point of view, or a bus
driver's point of view, or a social liberal point of view.
28:21 (Speaker F) And what that does is it takes all of its knowledge about the outside
world and gives you not a summary, but it gives you essentially an
essay about the practical effects of something like a bill. I've
actually been working with the idea of reading a book and having it
tell you what I would have learned from this, because that's actually
probably what you're more interested in. What it can do in terms of
comparing and contrasting large essays is exceptional.
28:51 (Speaker F) So you could have it say, write 2000 words from a social conservative
point of view, 2000 words from a social liberal point of view, and
then have it contrast the essays, which is something that would be
very difficult for a human to do. So you get to give it multiple
files and have it just give you a more balanced approach so you get
rid of some of the bias that comes in.
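
A minimal sketch of the persona-and-contrast workflow Speaker F describes, limited to building the prompt text. The function names, the `<document>` and `<essay>` tags, and the exact wording are assumptions made for illustration; they are not prompts from the talk or from any Claude documentation.

```python
def persona_prompt(document, persona, words=2000):
    """Ask for an essay on a document's practical effects from one point of view."""
    return (
        f"<document>\n{document}\n</document>\n\n"
        f"Write about {words} words from the point of view of {persona}. "
        "Describe the practical effects of this document rather than "
        "summarizing its text."
    )


def contrast_prompt(essays):
    """Ask for a comparison of essays; `essays` maps persona -> essay text."""
    blocks = [
        f'<essay persona="{persona}">\n{text}\n</essay>'
        for persona, text in essays.items()
    ]
    return (
        "\n\n".join(blocks)
        + "\n\nCompare and contrast these essays. List where they agree, "
        "where they disagree, and what each one leaves out."
    )
```

Each persona prompt would be sent separately to the model, and the returned essays then passed together to `contrast_prompt` in a single follow-up request.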
29:18 (Speaker A) My dream, my go-to dream project that I never get to, is to create
this for Twitter as a Chrome extension where I can select a bunch
of tweets and then say, remove the bias from this and just give me
the debiased version of all of this. Yeah, completely. The
cross-referencing ability of Claude, because of this context window,
is incredible for many, many use cases.
29:41 (Speaker F) Yeah, I would say it's not as good as GPT Four for certain things.
But that context window is fantastic. And I would say to a lot of
people that are using embeddings and retrieval: you can actually just
put the whole thing in the context window and ask questions of that,
and then you have a baseline to compare your results against. Most
people, if they're chatting to a website or something like that, can
actually just put the whole thing in there as opposed to trying to
chunk it up and do questions, and you'll see that your results are
much better that way.
29:51 (Speaker F) And for most people, that would be good enough.
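
A rough sketch of the full-context baseline Speaker F suggests: check whether the whole document plausibly fits in the window, and if so build one prompt instead of chunking. The four-characters-per-token estimate is a crude assumption, not a real tokenizer, and the function names and prompt wording are hypothetical. The 100,000-token limit and 4,000-token answer reserve echo the figures mentioned in this conversation.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Crude token estimate (assumption: ~4 characters per token)."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(document, question,
                    context_limit=100_000, reserved_for_answer=4_000):
    """True if document + question + room for the answer fit in the window."""
    used = estimate_tokens(document) + estimate_tokens(question)
    return used + reserved_for_answer <= context_limit


def full_context_prompt(document, question):
    """Build a single prompt that carries the entire document."""
    return (
        f"<document>\n{document}\n</document>\n\n"
        f"Using only the document above, answer this question:\n{question}"
    )
```

Answers produced this way can serve as the baseline against which a chunked embeddings-and-retrieval pipeline is compared.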
30:17 (Speaker A) So an additional thing that Claude was trained on: they've talked
about output tokens, just the number of output tokens, how much
Claude is able to generate. And they've said that previous models, I
don't know if it's the same for GPT, I haven't seen numbers on GPT
Four, but they've said that previous Claude models were focused on
shorter outputs, just as they were trained. And this latest model was
trained to output up to 4000 tokens.
30:47 (Speaker A) This is added to the fact that they also fine-tuned and trained it to
output JSON files, complete JSON files, as responses, which we as
engineers waited for. OpenAI gave us functions, kind of, here you go,
there's the function interface. And we love the function interface.
But the function interface kind of locks us into the OpenAI
ecosystem.
31:04 (Speaker A) And it's great to see another model that's very close to state of
the art on HumanEval and that is also now fine-tuned to respond in
full, intact JSONs. And those JSONs can be 4000 tokens in length. Any
thoughts on these?
31:28 (Speaker F) Yeah, I can confirm on it being able to write large amounts of
output. I mean, I was having it write like 2000, 3000 word like sort
of essays and outputs and it was fine with that.
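
For the JSON-output point above, a small sketch of how a client might read a JSON reply from a chat model without relying on a function-calling interface. This is an illustration using only Python's standard `json` module; the helper name `parse_json_reply` is hypothetical, and falling back to the outermost braces is an assumption about how a reply might be wrapped in extra text, not documented Claude behavior.

```python
import json


def parse_json_reply(reply):
    """Parse a model reply as JSON, tolerating extra text around one object."""
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        pass
    # Fallback: take the span from the first "{" to the last "}".
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])
```

If the reply is cut off by the output token limit, the final `json.loads` call raises, which is the signal to retry or to request a shorter structure.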
31:40 (Speaker A) Yes. And I think it's...
31:45 (Speaker B) I'm going to stick with GPT Four myself. But this might be pretty
useful for just dumping in an entire code base, given the 100k
context window, and then getting some reviews and stuff, and then
maybe moving some of the stuff.
32:02 (Speaker A) Once I stop posting statuses and build that Chrome extension where
you upload the zip and it flattens it to one file and then uploads
it, then we'd be able to do, like, a proper comparison, because code
interpreter can take zip files and then extract them. Oh, one
difference that I want to flag for folks in the audience: GPT Four
with code interpreter allows you to upload zip files, et cetera. We
talked about this. It does not load them into the context window,
right? So there's like an 8k context window.
32:30 (Speaker A) The files that you upload are not automatically in the context
window. The model has to write Python code that actually prints the
files. And it usually does, like, the first few lines, hint, hint,
32:30 (Speaker A) for the folks in the audience who get my drift. But it doesn't
usually read all of the file unless you specifically ask it to, and
Claude does. So everything you upload to Claude goes directly into
the immediate working memory of the context window.
32:38 (Speaker A) And that's a major difference to watch out for and also take care of.
Go ahead.
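
The "flatten a zip to one file" idea Speaker A mentions could look roughly like the sketch below, using Python's standard `zipfile` module. The function name `flatten_zip` and the `===== path =====` separator are hypothetical choices, and skipping anything that is not valid UTF-8 is a simplifying assumption; the Chrome extension itself is only described as a plan in the conversation.

```python
import zipfile


def flatten_zip(source):
    """Concatenate every UTF-8 text file in a zip archive into one string.

    `source` is a path or a binary file-like object, as accepted by
    zipfile.ZipFile.
    """
    sections = []
    with zipfile.ZipFile(source) as archive:
        # Sort by path so the flattened output is deterministic.
        for info in sorted(archive.infolist(), key=lambda i: i.filename):
            if info.is_dir():
                continue
            try:
                text = archive.read(info).decode("utf-8")
            except UnicodeDecodeError:
                continue  # skip binary files such as images
            sections.append(f"===== {info.filename} =====\n{text}")
    return "\n\n".join(sections)
```

The resulting string can be pasted or uploaded as a single document, so the whole code base lands in the context window rather than on a sandbox disk.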
33:00 (Speaker C) I would like to ask everyone, before I give my opinion: what do you
think about it in comparison to GPT Four, in terms of performance?
What do you think?
33:10 (Speaker A) I would like comments from folks who actually use both and did the
comparison. And before I get to folks, please raise your hand to
answer. I want to call out Swyx's small menu bar, which allows you to
actually... Swyx, can you give us like a brief two minutes on the
menu bar thing?
33:28 (Speaker G) Yeah, well, you don't have to choose. Just run it all the time on
every single chat. So it's a little Electron app that runs in the
menu bar. And I've been maintaining it and I just added Claude Two
this week.
33:42 (Speaker G) Claude Two is not super stable yet. Sometimes it will fail to press
the submit button, so you just have to retry submitting manually.
33:50 (Speaker G) But yeah, it's a great way to A/B test models, but then also just
amplify every question with answers from four to five different chat
models. So I've been trying it. It's up to you if you want.
34:07 (Speaker A) To.
34:10 (Speaker C) Find it.
34:14 (Speaker A) With the announcements, if you can. Yeah, awesome. Yeah, just
basically, you don't have to stop using, you don't have to choose. So
I think the last thing that we need to acknowledge about Claude is
the multilinguality.
34:28 (Speaker A) So they actually focused on showing us how much better the new one
is than previous ones, and they posted BLEU scores. Claude Two is
significantly better at languages than the previous versions. I
think, to answer your question, I think it's close to GPT Four, if
not better at some things. Hebrew goes fluently, and usually Hebrew
is not that great.
34:57 (Speaker A) Russian and Ukrainian that I use also go fluently. And that part is
really good with a lot of context because you sometimes need to do a
lot of translation, or at least I need to do a lot of translation.
35:11 (Speaker C) Yeah, multilinguality works great. I was surprised. Absolutely. What
I think is, if you just compare the two on the same prompt, the same
question, I have a feeling that GPT Four is slightly better, but I
just don't have an example to give you.
35:31 (Speaker C) Okay, I don't know, it's a strange situation, but I really wanted to
ask you, like, what did you try that worked better here and there?