Netflix Observability with Kevin Lew
Episode 896

Software Engineering Daily · softwareengineeringdaily.com

September 12, 2018 · 50m 38s

Show Notes

Netflix users stream terabytes of data from the cloud to their devices every day. During a high-bandwidth, long-lived connection, a lot can go wrong. Networks can drop packets, machines can run out of memory, and the Netflix app on a user’s device can have a bug. Any of these events can result in a bad user experience.

Other errors can occur that do not disrupt the user experience. Netflix runs thousands of machine learning jobs, logging servers, and other pieces of internal infrastructure. Customer service dashboards, CI/CD pipelines, and A/B testing frameworks are all software built by Netflix, and when an error occurs in any of these places, engineers need to be able to diagnose and debug it.

Observability is the practice of using logs, monitoring, metrics, and distributed tracing to understand how a system is working. Kevin Lew is a senior software engineer at Netflix with the Edge Insights team. He joins the show to talk about adding observability across the microservices deployed at Netflix. We also talk about how to manage high volumes of logging data effectively using stream processing.