
When the Camera Won: How Five Flagship Podcasts Made Peace With Video

Nearly half of US podcast consumption now happens on video-capable platforms. We compare how five flagship shows have rebuilt their craft around the camera, and what gets quietly lost when the microphone grows a lens.

The shift was so quiet that nobody quite remembers the moment podcasts stopped being audio. By the start of 2026, more people watch The Joe Rogan Experience on YouTube than listen to it on Spotify. Diary of a CEO publishes its own studio b-roll. The Daily — yes, The Daily — now cuts video clips for TikTok of Michael Barbaro nodding seriously at a script. Audio-only purists call it a betrayal. The shows themselves call it Tuesday.

This is the strangest thing to happen to the form since the Serial-era boom of 2014, and most of the writing about it has focused on platform politics: Spotify's video push, YouTube's mid-roll ads, Apple's belated video support. Less attention has gone to what video does to the craft — to the cold opens, the silences, the small editorial choices that make a podcast distinctive. Hosts who'd spent a decade learning how to fill a microphone are suddenly being asked to fill a frame as well, and the two skills are not the same.

Below: how five shows we listen to weekly have made their peace with the format, with concrete numbers on who's watching where, and a closer look at what each one's video strategy says about its editorial identity.

The numbers, briefly

Edison's Q1 2026 Infinite Dial reports that 45% of US podcast consumption now happens on video-capable platforms (YouTube, Spotify, TikTok), up from 28% two years ago. The shift is sharper among younger listeners: 67% of under-25s say they 'watch' podcasts more often than they 'listen' to them.

The supply side is more interesting still:

  1. YouTube is now the largest podcast platform in the US by monthly users, with roughly 98 million monthly podcast viewers by its own 2026 figures.
  2. Spotify carries video on more than 300,000 podcasts and pays a premium royalty rate to shows that publish video natively.
  3. Apple Podcasts added video support in 2024 — quietly, with almost no adoption among flagship shows.
  4. Across the top fifty US shows, video accounts for roughly 40–55% of episode plays, depending on subject matter.
  5. A 2025 Triton Digital study found that the median audio-only listener completes 78% of an episode; the median video viewer completes only 41%.

That last figure is the one that should worry anyone who cares about podcasting as a long-form medium. Video viewers drop out earlier — and yet, increasingly, video is what the show is being made for.

Five shows, five strategies

Show | Video share of audience | Production setup | Episode length | Clip strategy | Editorial posture
-----|-------------------------|------------------|----------------|---------------|------------------
The Joe Rogan Experience | ~70% | Multi-cam, 4K, fixed studio | 2.5–3.5 hrs | Full + viral clips | Video-canonical
Diary of a CEO | ~60% | Cinematic multi-cam, set design, A/B cuts | 1.5–2.5 hrs | Full + heavy shorts | Video-led
Lex Fridman Podcast | ~55% | Two static cameras, unobtrusive | 2.5–4 hrs | Full + occasional clips | Audio-first, video as record
Call Her Daddy | ~50% | Bright key-light, magazine staging | 60–90 min | Short-form-driven, TikTok-first | Video-as-marketing
Acquired | ~25% | Single static camera, occasional | 3–4.5 hrs | Audio-only by default | Audio-purist

The spread tells you everything. JRE and Acquired sit on the same Spotify shelf, yet the share of their audiences watching rather than listening differs by forty-five percentage points (roughly 70% versus 25%). There is no single 'podcast format' any more, only a portfolio of formats, chosen according to what each audience brings to the room.

The Joe Rogan Experience — video as the canonical version

It's worth remembering that JRE was filmed long before YouTube was the point. Rogan started recording video in 2009 out of comedy-club instinct: he wanted to see his guest. By the Spotify deal in 2020, the audience was already split roughly evenly between audio and video. Today it's about seventy-thirty in YouTube's favour and, more importantly, the video version is the canonical version. It's the cut Rogan's team edits for, the cut guests prepare for. Audio listeners are getting a derivative product, stripped of facial cues and the occasional whiteboard.

The craft consequence is visible in the show's pacing. JRE has more silences than it used to — quiet thinking moments that on audio land as dead air, but on video read as deliberation. Fans on YouTube praise it as authenticity. On the audio feed, the same silences are why some long-time listeners say the show has got harder to follow at 1.5x.

Diary of a CEO — the most aggressive video pivot

Steven Bartlett's show is the cleanest case study going. Look at any episode from 2020: single camera, unflattering lighting, mid-shot of two men talking. By 2024 the production had four cameras, dimmable studio lighting, set-dressed bookshelves, and a card-game segment built specifically for video shareability. The '53 dating questions' gimmick that drives many episodes' YouTube clicks has no real audio analogue; on the audio feed you hear cards shuffle and a man read.

This is where video most visibly changes craft. Diary of a CEO no longer looks like a podcast trying to be television. It is television, with a feed attached.

Lex Fridman — the ascetic compromise

Fridman represents the contested middle. The cameras are there, but they are deliberately ordinary: two locked-off shots, no zooms, no cuts to b-roll, no graphics. Fridman has said in interviews that he treats audio as the primary product and video as a permanent record. The result is a show that performs well on YouTube while remaining almost unchanged in audio form from his earliest 2018 episodes — the same long pauses, the same patient questioning, the same five-second beat after a guest's answer.

If video has a 'least-bad' model for legacy audio podcasters, Fridman is it. The cameras are present but invisible; the editorial instincts are still audio's.

Call Her Daddy — video as marketing front-end

Alex Cooper's show takes a different route again. The full episodes are mid-length by current standards (60–90 minutes). The video matters less for the canonical episode than for the clips that fan out to TikTok, Instagram Reels, and YouTube Shorts. Cooper's team has reportedly told guests the short-form clip is the real product; the full episode is the long tail that follows.

That changes the structure of the conversation. Episodes are visibly built around two or three set-piece moments (a confession, a reaction, a memorable exchange) that will cut to thirty seconds without context. Audio-only listeners will sometimes notice the show pause on a seemingly minor moment, lean into it, and then move on. They are listening to the show structurally arrange itself around a clip they cannot see.

Acquired — the holdout

Ben Gilbert and David Rosenthal's three-to-five-hour business histories make Acquired the audio purist of this group. Acquired publishes video, but almost dutifully: a single camera, no graphics, and the occasional pause to let a slide land for the minority of viewers actually watching. Their stated reason is straightforward: listener research consistently shows the show is consumed on long drives, on walks, and at the gym. Video would be a distraction from a product the audience explicitly wants in audio form.

It's a useful counter to a dominant industry assumption — that all podcasts will inevitably move to video. Some won't. Some can't, because their listening occasion is incompatible with a screen.

What gets lost when the mic gets a lens

The honest answer is: silence and abstraction.

Audio podcasting's most distinctive craft tools are pauses, ambient sound, and the listener's imagination filling in the visual. A documentary podcast can spend ninety seconds on the sound of rain on a roof and trust the listener to build the room around it. The moment a camera is involved, that camera has to show something during the rain. Production teams either cut to a face — which is now editing for video, not audio — or hold a static shot that any video viewer experiences as awkward.

The interview pause, which we covered in our earlier piece on the master interviewers who wield silence, is similarly altered. On audio, a five-second silence is tension. On video, a five-second silence is a slightly uncomfortable shot of two people not saying anything. The same craft choice reads completely differently depending on whether you're seeing or only hearing.

The flip side is that video is genuinely additive for some kinds of show. Demonstrations, reactions, comedians playing off each other's faces — these were always audio-secondary. For shows in those genres, video isn't a compromise; it's the proper form of the work, and the audio version is the courtesy.

A note on the listeners who never watch

It's easy to read the numbers above and assume audio is in retreat. It isn't, particularly. The same Edison report shows the absolute number of audio-only listeners has continued to grow — just less quickly than the video segment. The platform mix is changing more than the audience is shrinking.

What is changing is which shows the audio-only listener can comfortably follow. Shows whose canonical version is video — JRE, Diary of a CEO, much of the new political-podcast wave — increasingly include moments that are hard to parse without the picture. Shows that have stayed audio-first remain audio-first. The risk is that the middle thins out: that podcasting splits cleanly into a video-canonical tier and an audio-purist tier, with very few shows comfortably serving both.

Where this leaves the craft

If you produce a podcast, the question isn't really whether to add video. It's which version of the show is canonical, and what that means for everything from pause length to the points at which you let a guest interrupt you. A show that is video-canonical has different editorial instincts to one that isn't, even when the audio feed sounds superficially similar.

If you listen, the practical advice is shorter: when a show starts to feel harder to follow than it used to, check whether you're hearing the audio version of something now built for video. It's not that the show has slipped; it's that you've slipped into the smaller share of its audience.

The microphone hasn't gone away. It's just got a lens beside it now, and we're all still working out what that does to the work.