Root Cause Analysis vs. Resilience Engineering w/special guest Lorin Hochstein
Season 2 · Episode 12

This is Fine! A podcast about resilience engineering and software · Colette Alexander and Clint Byrum

October 16, 2025 · 59m 43s

Show Notes

A history of the 5 whys and root cause analysis, drawn from papers.

Some critiques of the 5 whys:

From John Allspaw: https://www.oreilly.com/radar/the-infinite-hows/

From Alan J Card: https://qualitysafety.bmj.com/content/26/8/671



James Reason and the Swiss Cheese Model: 

https://pmc.ncbi.nlm.nih.gov/articles/PMC8514562/

James Reason’s book Human Error: https://bookshop.org/p/books/human-error/9e06d8a100a07537?ean=9780521314190&next=t



And a classic from Sidney Dekker et al. on the implications of complexity for safety investigations:

https://www.sciencedirect.com/science/article/abs/pii/S0925753511000105?via%3Dihub



We always recommend the Howie Guide: https://howie-guide.pagerduty.com/

STAMP is starting to get popular: https://functionalsafetyengineer.com/introduction-to-stamp/

Google’s STAMP paper: https://www.usenix.org/publications/loginonline/evolution-sre-google

Google’s STAMP discussion on ProdCast: https://sre.google/prodcast/#season4-episode7

And presentation at SRECon: https://www.usenix.org/conference/srecon25americas/presentation/klein

Nancy Leveson’s Google Scholar profile is always worth browsing: https://scholar.google.com/citations?user=78y4sEcAAAAJ&hl=en

Allspaw’s LinkedIn post that we quoted: https://www.linkedin.com/posts/jallspaw_important-reminders-about-learning-effectively-activity-7378775591447183360-c_eD


Lorin’s Law: https://surfingcomplexity.blog/2017/06/24/a-conjecture-on-why-reliable-systems-fail/

Want to talk more about this subject? We’re hosting a live event co-sponsored by RISF, and you can sign up for it here: https://resilienceinsoftware.org/networks/events/146485