404: The Transcription Challenge: Building Infrastructure That Scales With The World

The Bootstrapped Founder

July 18, 2025 · 27m 47s

Show Notes

Today we’ll talk about keeping up with an avalanche of audio data and how I built Podscan’s transcription infrastructure.

This episode of The Bootstrapped Founder is sponsored by Paddle.com

The blog post: https://thebootstrappedfounder.com/the-transcription-challenge-building-infrastructure-that-scales-with-the-world/
The podcast episode: https://tbf.fm/episodes/404-the-transcription-challenge-building-infrastructure-that-scales-with-the-world


Check out Podscan, the podcast database that transcribes every podcast episode out there minutes after it gets released: https://podscan.fm
Send me a voicemail on Podline: https://podline.fm/arvid

You'll find my weekly article on my blog: https://thebootstrappedfounder.com

Podcast: https://thebootstrappedfounder.com/podcast

Newsletter: https://thebootstrappedfounder.com/newsletter


My book Zero to Sold: https://zerotosold.com/

My book The Embedded Entrepreneur: https://embeddedentrepreneur.com/

My course Find Your Following: https://findyourfollowing.com

Here are a few tools I use. Using my affiliate links will support my work at no additional cost to you.
- Notion (which I use to organize, write, coordinate, and archive my podcast + newsletter): https://affiliate.notion.so/465mv1536drx
- Riverside.fm (that's what I recorded this episode with): https://riverside.fm/?via=arvid
- TweetHunter (for speedy scheduling and writing Tweets): http://tweethunter.io/?via=arvid
- HypeFury (for massive Twitter analytics and scheduling): https://hypefury.com/?via=arvid60
- AudioPen (for taking voice notes and getting amazing summaries): https://audiopen.ai/?aff=PXErZ
- Descript (for word-based video editing, subtitles, and clips): https://www.descript.com/?lmref=3cf39Q
- ConvertKit (for email lists, newsletters, even finding sponsors): https://convertkit.com?lmref=bN9CZw

Topics

- Transcription
- Efficiency
- Cost-effectiveness
- Podscan
- Podcasts
- Queuing Process
- Podcast Index Project API
- Open-source Tools
- Whisper
- Whispercpp
- Global Podcast Boom
- Customer Demand
- Prioritizing Episodes
- Resource Management
- GPUs
- Diarization
- Word-level Timestamps
- S3 Storage
- OpenSearch Clusters
- Transcript Data Management
- Audio Quality
- Quality Checking System