Oliver Leaver-Smith - On how "just a monitoring change" took down the entire site and resilience engineering - #5

Author: Ronak Nathani, Guang Yang
Date: February 19, 2021
Duration: 1:01:22
Oliver Leaver-Smith, better known as Ols, is a Senior DevOps Engineer at Sky Betting and Gaming. In this episode, we discuss how a seemingly simple monitoring change ended up taking down the entire site. We also talk about chaos and resilience engineering. We discuss how the team at Sky Betting and Gaming conducts fire drills (chaos engineering exercises) that test the resiliency not only of their software systems but also of their people systems. We walk through a recent example of a fire drill, how the drills have evolved over the past few years, and the lessons learned in the process.

Behind every line of code, there's a person with a story, and that's where Software Misadventures finds its pulse. Hosts Ronak Nathani and Guang Yang pull up a chair with engineers, founders, and investors, but the conversation rarely stays in the technical manual. Instead, it wanders into the human territory of career detours, hard-won insights, and those unpredictable stumbles that often teach the most. This podcast is built on the idea that the journey is just as important as the destination, especially in the fast-moving tech world. You'll hear guests recount the projects that went sideways, the decisions they'd rethink, and the moments of clarity that emerged from the chaos. It’s a refreshingly honest look at the industry, emphasizing that expertise isn't just about what you build, but what you learn when things don't go as planned. Tune in for conversations that are less about perfect solutions and more about the real, sometimes messy, process of creating with technology. Each episode offers a blend of professional wisdom and personal narrative, making it a compelling listen for anyone curious about the lives woven into our digital landscape.
Language: English Episodes: 55

Software Misadventures
Podcast Episodes
Podcast update and news!

Duration: 13:41
Some reflections on running the podcast and Ronak has some eggciting news to share :) Music: Vlad Gluschenko — Forest License: Creative Commons Attribution 3.0 Unported: https://creativecommons.org/licenses/by/3.0/deed.e…
Uncrating the Oxide Rack | Bryan Cantrill, Steve Tuck (Oxide)

Duration: 1:26:35
Oxide co-founders Bryan and Steve are back on the show to give an impromptu peek at the Oxide server rack and to chat about writing their own manufacturing software, overcoming false summits before shipping the first rac…
Early Twitter's fail-whale wars | Dmitriy Ryaboy

Duration: 1:08:46
A veteran of early Twitter's fail whale wars, Dmitriy joins the show to chat about the time when 70% of the Hadoop cluster got accidentally deleted, the financial reality of writing a book, and how to navigate acquisitio…
Behind designing Kubernetes' APIs | Brian Grant (Google)

Duration: 2:10:56
As the original architect and API design lead of Kubernetes, Brian joins the show to chat about why "APIs are forever", the keys to evangelizing impactful projects, being an Uber Tech at Google, and more. Segments: (…
Growing and selling an indie business | Michael Lynch (TinyPilot)

Duration: 1:40:18
Having quit Google in 2018 to bootstrap indie software businesses, Michael is known for writing very transparently about the ups and downs of his journey. After recently selling his hardware business TinyPilot for $600K,…