The Failover That Failed Successfully - Lessons from a Successfully Failed Disaster Recovery and Failover Test
Season 1 · Episode 12

IT Horror Stories with Jack Smith

April 6, 2026 · 33m 0s · Explicit

Show Notes

Conducted during a busy release weekend, the failover test exposed gaps not in the technology itself but in coordination and communication. While production ultimately stayed unaffected, the situation quickly escalated: subcontractors weren't aligned, assumptions didn't match reality, and information didn't flow when it mattered most.

We unpack how a well-intentioned test turned into a coordination challenge, where timing, dependencies, and unclear responsibilities created confusion across teams. It's a story about how resilience isn't just about systems and infrastructure but also about people, processes, and making sure everyone is on the same page, especially when something is supposed to "just be a test."

00:00 Welcome & Setup
01:34 Corporate Environments
03:30 Failover Planning
07:19 Double Disaster
09:08 Critical Failure
13:20 Realization Moment
15:28 Split Brain
17:34 The Recovery
21:13 Lessons Learned
31:32 Conclusion