The Quiet Stack: How British Podcast Teams Are Actually Using AI in the Edit Suite

Most British podcast producers won't tell you what's on their second monitor at 1am. We pull apart the AI tools quietly sitting between the raw recording and the polished file — and ask which decisions hosts still insist on making by hand.

By Sarah Voss · Jun 12, 2026

There is a particular kind of email a podcast producer sends at two in the morning. It says the edit is done, the file is rendered, the show notes are in the CMS, and everything will go live at six. What it doesn't say is which parts of that night were done by a person and which were done by something running quietly in another browser tab.

A year ago the question of AI in podcast production was mostly abstract. In 2026 it is operational. Every flagship British show we spoke to has at least one machine-learning tool in its weekly pipeline, and most have three or four. None of them lead with it. Few of them name the tools in their credits. Almost all of them argue, sometimes a little defensively, that nothing the listener actually hears is generated — only cleaned, indexed, or summarised.

That distinction is doing a lot of work. So is the silence around it. This is a tour of the quiet stack: the tools sitting between the raw multitrack and the file you press play on, what each one is genuinely useful for, and where the editorial line keeps being redrawn.

Four jobs, four different invasions

If you asked a podcast producer in 2022 where their week went, four answers would have come up first: transcribing the recording, removing the room from the recording, writing the show notes from the recording, and finding the moments worth cutting from the recording. AI hit each of those jobs at a different speed and to a different depth, and the way British production teams have absorbed each one tells you something about where they think the craft actually lives.

The first two — transcription and noise reduction — are now near-totally machine-assisted across the industry. The third — show notes — sits in an awkward, half-disclosed middle. The fourth — the edit decisions themselves — is still where producers plant their flag and say, no, this is the job, and a model can't have it.

The transcription layer: from luxury to default

Four years ago a fully time-coded transcript of a 90-minute interview was a £180 outsourced job that came back two days later. Today it is a 90-second background task. Riverside and Descript both produce one the moment recording ends. WhisperX, the open-source pipeline that wraps OpenAI's Whisper model with forced alignment and speaker diarisation, runs on a producer's own laptop and costs nothing per minute.

The consequence is structural, not cosmetic. Editors at The News Agents and Goalhanger network shows describe working off the transcript first and the waveform second — scanning for the line they want to lift, jumping to its timecode, then opening the audio. The Rest is History producers told us last autumn that they keep two windows open during a polish pass: Descript on the left for text-driven cuts, Pro Tools on the right for the sound work the text view can't see.
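For readers who have not watched an editor work this way, the transcript-first habit can be sketched in a few lines. This is an illustration only, not any team's actual tooling: the segment structure (a start time in seconds plus the spoken text) and the names find_line and to_timecode are assumptions made for the example, loosely modelled on the time-coded output transcription tools tend to produce.

```python
from typing import Optional

# Hypothetical time-coded transcript: each segment has a start time in
# seconds and the text spoken. Real tools export richer structures.
SEGMENTS = [
    {"start": 0.0, "text": "Welcome back to the show."},
    {"start": 754.2, "text": "That was the moment the treaty collapsed."},
    {"start": 3671.5, "text": "And that is where we will leave it."},
]


def to_timecode(seconds: float) -> str:
    """Format seconds as HH:MM:SS, dropping fractions of a second."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def find_line(segments, phrase: str) -> Optional[str]:
    """Return the timecode of the first segment containing the phrase."""
    needle = phrase.lower()
    for segment in segments:
        if needle in segment["text"].lower():
            return to_timecode(segment["start"])
    return None


print(find_line(SEGMENTS, "treaty collapsed"))  # 00:12:34
```

The search runs on text and only the final step touches time, which is the order of operations the editors describe: find the words first, then open the audio at that point.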

The interesting tell is which shows still pay for human transcripts on top. Tortoise's longer narrative pieces do, because the model still gets a confident-sounding place name wrong and nobody fact-checks an autotranscript line by line. BBC Sounds documentaries do, because the published transcript is an editorial product, not just an internal scratchpad. Most chat shows don't, because the listener never sees the words on a page.

The dialogue clean-up tier

This is the layer where AI changed the room a podcast can be recorded in. Adobe Podcast Enhance, released as a free web tool in 2023 and folded into a paid tier last year, will take a phone-recorded voice memo and return something that sounds like a £400 microphone in a treated booth. iZotope RX 11's Dialogue Isolate, Krisp's noise-removal SDK, and ElevenLabs' Voice Isolator do versions of the same trick at different price points.

The craft question is how much of it to use. Pushed all the way, these tools produce what producers call the plastic voice — a hyper-clean signal that has been so aggressively rebuilt it no longer sounds like a person was in a room. Audio engineers on the British narrative scene have started talking about "leaving the floor in" — keeping a few decibels of room tone so the ear still believes the recording happened somewhere.

The Goalhanger shows, recorded mostly in proper studios, use these tools sparingly and mainly for guest call-ins. The News Agents, recorded from three different locations in a normal week, leans on them harder. The audio drama producers we spoke to hardly use them at all, because the room is the story and they would rather record again.

Show notes: the half-disclosed middle

This is where the editorial line is wobbliest. Tools like Castmagic, Capsho, Swell AI and Descript's own Underlord assistant will, given a transcript, produce a title, an episode summary, five chapter markers, a list of guests and references mentioned, and a draft of social posts in about ninety seconds. The output is rarely publishable as-is and almost always usable as a draft.

We asked seven producers at flagship British podcasts whether they use one of these tools. Six said yes. Four said the host doesn't know. None of them credit the tool in the show notes the listener reads.

The argument for not disclosing is that a producer used to write the same notes from scratch and nobody credited them either; the work is editorial scaffolding, not the show. The argument against is that the words in the description box are the bit listeners use to decide whether to press play, and they are now, increasingly, words a person reviewed rather than wrote.

The four-job comparison

The stack varies, but the same four jobs keep coming up. Here is roughly what the major British production teams are paying for in mid-2026, based on the tools their producers admit to using on the record or off it.

Job	Most common tool	Typical monthly cost (team seat)	What it does well	What it still can't do
Transcription + speaker ID	Descript Pro / Riverside built-in	£24 – £40	Time-coded, edit-linked transcript in under two minutes	Reliably name a guest whose accent it hasn't heard before
Dialogue clean-up	Adobe Podcast Enhance / iZotope RX 11	£18 – £30	Rescue a phone-quality recording into broadcast-usable audio	Preserve the room without re-introducing the noise
Show notes + chapters	Castmagic / Capsho / Descript Underlord	£25 – £49	Draft a title, summary and five chapter marks from a transcript	Catch the editorial joke a host actually wants in the title
Edit decisions	(still human)	—	—	—

The pattern is consistent across the network shows and the independents: roughly £70 to £120 a month of AI tooling per seat, mostly aimed at jobs that used to involve an outsourced freelancer or an unpaid producer-evening. That is, depending on how you count, either a £200-a-month saving or a £200-a-month freelancer no longer being booked.
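The monthly total can be checked directly against the table. The sketch below simply adds the low and high ends of the three paid rows. It assumes one seat of each tool and nothing else in the stack, which is a simplification; the figures themselves are copied from the table.

```python
# Monthly seat-cost ranges in pounds, copied from the comparison table.
# Assumption: a team pays for exactly one seat in each category.
COSTS = {
    "transcription": (24, 40),
    "dialogue clean-up": (18, 30),
    "show notes": (25, 49),
}

low = sum(cheapest for cheapest, _ in COSTS.values())
high = sum(dearest for _, dearest in COSTS.values())

print(f"£{low} to £{high} a month")  # £67 to £119 a month
```

Rounded, that is the range quoted above. A team buying several seats, or a higher tier, would land above it.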

What producers still won't let it do

The line that came up in every conversation, with almost the same words, was the edit itself. The decisions about where a story ends, where a laugh lives, when to cut to silence, when to leave a sentence hanging — those are still being made by a person scrubbing a waveform.

Descript's Underlord and similar features will offer to remove filler words automatically. Every producer we spoke to has tried it; most have turned it off. The reasoning is the same as the noise-floor argument: a clean transcript reads better, but a clean edit sounds wrong. The Rest is History's editors leave in roughly six "ums" per fifteen minutes on purpose; the Tortoise narrative team described removing every one as making the host sound like "a press release reading itself out".
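A figure like six "ums" per fifteen minutes is easy to measure from a transcript, even if the decision about which ones stay is made by ear. The sketch below is a rough illustration under stated assumptions: the filler list and the function name fillers_per_15_min are invented for the example, and it counts only single-word fillers in a plain word list.

```python
# Assumed filler vocabulary; a real edit would tune this per host.
FILLERS = {"um", "uh", "er", "erm"}


def fillers_per_15_min(words, duration_seconds: float) -> float:
    """Scale the filler count in a transcript to a 15-minute (900 s) rate."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    count = sum(1 for word in words if word.lower().strip(".,!?") in FILLERS)
    return count * 900 / duration_seconds


# Ten minutes of (very abbreviated) tape containing four fillers.
sample = "So um I think er the thing is um well erm yes".split()
print(fillers_per_15_min(sample, 600))  # 6.0
```

A count like this tells an editor how dense the fillers are. It says nothing about which ones carry the rhythm of a sentence, which is the part the producers keep for themselves.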

The other firm refusal is on voice cloning. Several British production houses have demoed ElevenLabs' professional voice models and Resemble's tools internally, and most have written internal policies that read more or less the same: no cloned host voice in published audio, no "pickup" lines generated rather than rerecorded, no synthetic guest reads, full stop. The fear is not technical — the technology now produces a near-perfect clone from twenty minutes of training audio — it is reputational. The first British flagship podcast to be caught publishing a synthetic host line will spend a month explaining itself, and nobody wants to be first.

The credit problem

There is one editorial question this leaves open, and it is the one this site keeps coming back to: the credit. The show notes on most British flagship podcasts thank the producer, the editor, the studio, and occasionally the music. They do not, yet, thank or even name the toolset that wrote a first draft of those notes, removed the room from the guest's home recording, and produced the transcript a fact-checker worked off.

That is the choice the industry is quietly making in 2026. Tools are workflow, not personnel. They sit in the same bucket as Pro Tools and the spreadsheet of guest emails: nobody credits Excel either. We think that line is more uncomfortable than the industry is letting on, and the first show to break it — to name its stack the way film credits name a colourist — will set a more honest precedent than the rest of us are currently comfortable with.

For now, the tab on the second monitor keeps working, the producer keeps sending the two-in-the-morning email, and the credit at the end of the episode still reads the same three names it did in 2022. The stack underneath is doing more than it used to. It just isn't saying so.