Internet Archive Book Scanning with Davide Semenzin

Software Engineering Daily · softwareengineeringdaily.com

September 15, 2020 · 47m 47s

Show Notes

The Internet Archive collects historical records of the Internet. You may be familiar with one of its tools, the Wayback Machine. A project you may be less familiar with is book scanning: the Internet Archive scans books in high volume in order to digitize them.

In today’s episode, Davide Semenzin joins the show to talk through the history of the Internet Archive and the engineering behind book digitization. We talk through OCR, storage, architecture, and scalability.

Sponsorship inquiries: [email protected]