345: Scrape or Be Scraped

The Bootstrapped Founder

September 6, 2024 · 20m 3s

Show Notes

Welcome to the weird world of web scraping in the AI age, where founders have to protect their data from hungry AI companies but also need to collect information from all kinds of (not so) public APIs.

Today, I dive into a particularly confusing situation I am in with Podscan when it comes to scraping and keeping the web free and open.

This episode is sponsored by Podscan.fm

The blog post: https://thebootstrappedfounder.com/crawl-or-be-crawled/

The podcast episode: https://tbf.fm/episodes/345-scrape-or-be-scraped


Check out Podscan to get alerts when you're mentioned on podcasts: https://podscan.fm
Send me a voicemail on Podline: https://podline.fm/arvid

You'll find my weekly article on my blog: https://thebootstrappedfounder.com

Podcast: https://thebootstrappedfounder.com/podcast

Newsletter: https://thebootstrappedfounder.com/newsletter


My book Zero to Sold: https://zerotosold.com/

My book The Embedded Entrepreneur: https://embeddedentrepreneur.com/

My course Find Your Following: https://findyourfollowing.com

Here are a few tools I use. Using my affiliate links will support my work at no additional cost to you.
- Notion (which I use to organize, write, coordinate, and archive my podcast + newsletter): https://affiliate.notion.so/465mv1536drx
- Riverside.fm (that's what I recorded this episode with): https://riverside.fm/?via=arvid
- TweetHunter (for speedy scheduling and writing Tweets): http://tweethunter.io/?via=arvid
- HypeFury (for massive Twitter analytics and scheduling): https://hypefury.com/?via=arvid60
- AudioPen (for taking voice notes and getting amazing summaries): https://audiopen.ai/?aff=PXErZ
- Descript (for word-based video editing, subtitles, and clips): https://www.descript.com/?lmref=3cf39Q
- ConvertKit (for email lists, newsletters, even finding sponsors): https://convertkit.com?lmref=bN9CZw


Topics

Web Scraping, AI Era, Data Protection, Web Scraping Strategies, AI Platforms, Legal Battles, Corporate Intellectual Property, Open Data Access, Data Collection Practices, Server Overload Signals, RSS Feeds, Twitter Presence, Podcast Data Scanning, PodScan, Podcast, Twitter, Rate Limits, Encoded IDs, HTTP Features, Caching, Data Transfer, Historical Context, Responsible Web Citizen, Competitive Necessity, Public Data Access, Data Enumeration, Overwhelming System, Last Modified Dates, E-tags, Non-check Periods, Industry Practices