
Google’s Site Reliability Engineering with Todd Underwood
Software Engineering Daily · softwareengineeringdaily.com
Audio is streamed directly from the publisher (traffic.megaphone.fm) as published in their RSS feed. Play Podcasts does not host this file. Rights-holders can request removal through the copyright & takedown page.
Show Notes
Google’s site reliability engineers are responsible for maintaining the highly available services that power the Google software that we all use on a regular basis. O’Reilly recently published the book “Site Reliability Engineering: How Google Runs Production Systems”, and the book provides a comprehensive window into how the site reliability engineering role works.
Todd Underwood is a director of site reliability engineering. On today’s episode, Todd explains how the role of a SRE relates to devops. We discuss the relationship between the engineers who are developing Google services, and the SREs who are maintaining it. Google’s internal data center operating system “Borg” is also discussed.