
The Internet Report
135 episodes — Page 2 of 3

Ep 85: AT&T Outage and Disruptions at Google Cloud, Front, and More | Pulse Update
Load is a fundamental but, at times, challenging variable for networks and operations teams to handle. In the past few weeks, ThousandEyes saw various load-related problems impact organizations including Google Cloud, Front, several Australian banks, and Minnesota State University Moorhead.
Tune in to learn more about what happened during these incidents, as well as hear our commentary on the recent outage impacting AT&T. Use the timestamps below to jump to the sections that most interest you.
CHAPTERS:
00:00 Intro
00:59 AT&T outage impacts cellular services nationwide
04:40 Australian banks appear to lose online and app-based services for 24 hours
07:46 Google Cloud metadata store faces sudden demand spike
09:19 Front’s “large unexpected increase in web traffic”
10:23 Box experiences outage as third-party network component fails
11:23 Minnesota State University Moorhead’s case study on good visibility
12:44 Outage trends: By the numbers
15:35 Get in touch
For more insights, check out these links:
- Explore the Front disruption in the ThousandEyes platform (no login required): https://aczocbxpamiipkdqnaqdedkytwhlfjtw.share.thousandeyes.com?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q3_internetreportpulse30_podcast
- Internet Outages Timeline: https://www.thousandeyes.com/resources/internet-outages-timeline?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q3_internetreportpulse30_podcast
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on X: @thousandeyes

Ep 84: Square Outage, Data Center Issues & Planning for Resiliency | Pulse Update
When outages happen, it’s what you do next that matters. It’s important to have a backup plan in place that you can quickly activate to minimize the impact of an incident.Over the past two weeks, companies initiated a range of resiliency actions, including asking customers to use alternate authentication methods (or to avoid logging out of a service), setting up a new contact center to re-establish lines of communication, and reverting to manual processes.Tune in to learn more about what happened during these and other recent incidents.CHAPTERS:00:00 Intro01:00 Square Outage Impacts “Multiple” Services04:18 Applied Digital’s Multi-week Data Center Issue07:14 UC Berkeley Data Center Outage08:44 Russia .ru Domain Outage10:05 Cyber Attack at Lurie Children's Hospital11:36 Outage Trends: By the Numbers15:55 Get in Touch———For more insights on outage trends and analysis of some of the most notable outages of 2023, check out our on-demand webinar: https://www.thousandeyes.com/resources/amer-top-outages-2023-analyses-takeaways-webinar?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q3_internetreportpulse29_podcastIn EMEA and want to tune in live? We’re hosting one more live webinar session on Feb. 22 at 10 AM (GMT). 
Register now: https://www.thousandeyes.com/webinars/emea-top-outages-2023-analyses-takeaways?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q3_internetreportpulse29_podcastAnd also check out these links:- The Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-square-outage-and-more-news?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q3_internetreportpulse29_podcast- Explore the .ru domain outage in the ThousandEyes platform (no login required): https://awuyqonlmzxmvizsdcgevvdhcbzervks.share.thousandeyes.com?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q3_internetreportpulse29_podcast- Internet Outages Timeline: https://www.thousandeyes.com/resources/internet-outages-timeline?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q3_internetreportpulse29_podcast———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on X: @thousandeyes

Ep 83: Security, Great Digital Experiences & Why Visibility Matters
The ThousandEyes Internet Intelligence team joins us from Cisco Live in Amsterdam, talking about a major theme from the event: security.
Tune in to hear their thoughts on how visibility can help companies in their security efforts, the sovereignty of data in flight, and why you don’t have to choose between security and performance.
CHAPTERS
00:00 Intro
01:09 Evolving Security Landscape
04:53 Security Excellence & Optimal Digital Experience
10:13 Sovereignty of Data in Flight
14:57 Key Takeaways
15:55 Get in Touch
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on X: @thousandeyes

Ep 82: Understanding the Microsoft Teams & Azure Disruptions | Pulse Update
What happened during the recent Microsoft Teams and Azure disruptions? Go under the hood of these incidents and also explore other recent disruptions in this week’s Pulse Update.
CHAPTERS
- 01:03 Network issue leads to Microsoft Teams service disruption
- 04:09 Azure Resource Manager exhausts capacity, causing service issues
- 06:20 Oracle Cloud experiences network outage
- 09:56 Jira users encounter 503s and other errors
- 10:30 Sage outage impacts South Africa
- 11:08 Red Hat experiences four search-related incidents
- 11:45 Recent outage trends and numbers
For more insights on outage trends and analysis of some of the most notable outages of 2023, check out our on-demand webinar: https://www.thousandeyes.com/resources/amer-top-outages-2023-analyses-takeaways-webinar?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q3_internetreportpulse28_podcast
In EMEA and want to tune in live? We’re hosting one more live webinar session on Feb. 22 at 10 AM (GMT). Register now: https://www.thousandeyes.com/webinars/emea-top-outages-2023-analyses-takeaways?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q3_internetreportpulse28_podcast
And also check out these links:
- The Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-microsoft-teams-azure-outage?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q3_internetreportpulse28_podcast
- Internet Outages Timeline: https://www.thousandeyes.com/resources/internet-outages-timeline?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q3_internetreportpulse28_podcast
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on X: @thousandeyes

Ep 81: Unpacking Recent ChatGPT Issues & Other Outage News | Pulse Update
What caused recent dips in performance for OpenAI’s ChatGPT? Tune in to hear The Internet Report team unpack this and other recent disruptions, including a hack that led to an outage at the Spanish branch of the Orange mobile network, and a blip for customers of the cloud services provider DigitalOcean.They’ll also cover the outage trends they’re seeing in 2024 so far and how extreme cold weather can cause problems for data centers.For more insights on outage trends and analysis of some of the most notable outages of 2023, register for the upcoming Top Outages of 2023 webinar: https://www.thousandeyes.com/webinars/amer-top-outages-2023-analyses-takeaways?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q2_internetreportpulse27_podcastAlso check out these links:- Blog: 2023 Internet Outage Trends & the New Outage Landscape: https://www.thousandeyes.com/blog/internet-report-pulse-update-2023-internet-outage-trends?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q2_internetreportpulse27_podcast- Internet Outages Timeline: https://www.thousandeyes.com/resources/internet-outages-timeline?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q2_internetreportpulse27_podcast———CHAPTERS00:00 Intro01:12 Two Consecutive Service Degradations at ChatGPT04:43 Hack Leads to Orange Spain Outage13:05 DigitalOcean Disruption15:55 By the Numbers23:26 Get in Touch———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on X: @thousandeyes

Ep 80: 2023 Internet Outage Trends & the New Outage Landscape | Pulse Update
As they launch into 2024, organizations are facing a different outage landscape than they had at the start of 2023. The past year saw increases in cloud service provider (CSP) outages, application outages, and the percentage of U.S.-centric outages—all of which point to an evolution in the way outages happen and the need for different strategies to minimize the impact of disruptions.In this episode, Mike Hicks (Principal Solutions Analyst at ThousandEyes) unpacks these trends and shares practical tips for mitigating disruptions and optimizing performance. Listen on YouTube or tune in on your favorite podcast platform.And for more insights, check out these resources:- Top Outages of 2023 Webinar: https://www.thousandeyes.com/webinars/amer-top-outages-2023-analyses-takeaways?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q2_internetreportpulse26_podcast (*In APJC? Check out the APJC session: https://cisco.webex.com/webappng/sites/cisco/meeting/register/f22aa6a322284e07abf0350b255c88c8?ticket=4832534b0000000658abcedc3030906188c1af83e52ab18645645592a5962e7bdd3a5f2afdc393ae&timestamp=1705102128048&RGID=r57f263db115ebc6efa4c0a05429caa6f)- Internet Outages Timeline: https://www.thousandeyes.com/resources/internet-outages-timeline?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q2_internetreportpulse26_podcast———CHAPTERS00:00 Intro00:38 Cloud Service Provider Outages Trending Up02:30 Percent of U.S.-centric Outages Rises06:55 Application Outages Up in 202309:55 Get in Touch———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on X: @thousandeyes

Ep 79: Insights From the Ghosts of NetOps Past, Present, and Future
As 2023 comes to a close, in the spirit of Dickens’ holiday classic “A Christmas Carol,” let’s reflect on the valuable insights left by the ghosts of network operations teams past, present, and yet to come.
Tune in to hear host Mike Hicks (Principal Solutions Analyst at ThousandEyes) discuss lessons from the NetOps teams of the past, the current state of NetOps, and what the future might hold, all with the goal of helping teams take steps to optimize performance and deliver delightful digital experiences in 2024.
And also check out Mike’s related article in TechRadar: https://www.techradar.com/pro/the-ghosts-of-network-operations-past-present-and-yet-to-come
CHAPTERS
00:00 Intro
01:06 NetOps Past
06:24 NetOps Present
12:14 NetOps Yet to Come
20:35 Get in Touch
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on X: @thousandeyes

Ep 78: Peering Issues, Internet Resilience, and Cloud Outage News | Pulse Update
Recent changes appeared to trigger a series of events for two peering points internationally, with very different impacts. Tune in to learn more about these incidents, why they differed, and the lessons they leave.
Mike Hicks, Principal Solutions Analyst at ThousandEyes, will also cover the latest outage numbers and explore other recent incidents, including an Oracle Cloud outage and a duo of disruptions at Alibaba Cloud.
Interested in more outage analysis? Check out our Internet Outages Timeline, which covers several notable Internet outages and application issues from the past year, along with the lessons they leave: https://www.thousandeyes.com/resources/internet-outages-timeline?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q2_internetreportpulse24_podcast
CHAPTERS
00:00 Intro
00:45 Optus Outage
02:07 AMS-IX Outage
06:50 Oracle Cloud Outage
08:39 Duo of Alibaba Cloud Incidents
09:44 By the Numbers
13:13 Get in Touch
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on X: @thousandeyes

Ep 77: Scaling To Meet the Black Friday Demand: Tips for IT Teams
As companies gear up for Black Friday, The Internet Report team shares some best practices for delivering great customer experiences and minimizing downtime during one of the retail industry’s biggest days of the year. Mike Hicks, Principal Solutions Analyst at ThousandEyes, will cover some helpful case studies of Black Fridays that experienced some hiccups and what you can do to guard against similar disruptions.
To learn more, check out the link below:
- https://www.thousandeyes.com/blog/internet-report-episode-54-black-friday-2023
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on Twitter: @thousandeyes

Ep 76: Understanding the Recent Workday and Cloudflare Outages | Pulse Update
Backend-related incidents have been a recurring theme in outages across 2023, caused by everything from data center issues and hardware mishaps to failures at common (shared) services.
Recently, we saw two examples of these backend issues when data center power problems led to outages at both Cloudflare and Workday.
Tune in to hear more about what happened at Cloudflare and Workday, as well as our analysis of disruptions at OneLogin and GitLab.
CHAPTERS
00:00 Intro
01:00 OneLogin Disruption
05:22 GitLab.com Availability Issues
09:14 Workday and Cloudflare Outages
31:16 Get in Touch
For more insights, check out these links:
- The Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-workday-cloudflare-outages?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q2_internetreportpulse23_podcast
- Interested in more outage analysis? Check out our Internet Outages Timeline, which covers several notable Internet outages and application issues from the past year, along with the lessons they leave: https://www.thousandeyes.com/resources/internet-outages-timeline?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q2_internetreportpulse23_podcast
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on X: @thousandeyes

Ep 75: Halloween Special: Ghosts of Outages Past
This Halloween, The Internet Report team is sharing some of their most thrilling (and chilling) networking tales.
Pull up a chair (and a big bowl of your favorite Halloween candy) to hear what happened, and important lessons learned.
CHAPTERS
00:00 Intro
01:40 Haunting obstacles with a dynamic routing protocol that thwarted crew changes on an oil platform
10:00 A spooky code base rollout that unleashed memory leak mischief
18:58 A chilling application rollout that failed to deliver on user expectations around the globe
29:45 Mysterious application issues that sent shivers down spines, before they were discovered to be caused by a wicked broadcast storm
42:43 Get in Touch
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on X: @thousandeyes

Ep 74: Insights From Outages at Citibank, DBS, and Other News | Pulse Update
In recent weeks, back-end infrastructure work and other backend-related issues impacted various online and consumer banking services, including DBS and Citibank in Singapore.Simple front-facing customer experiences that we’ve become accustomed to today can often mask considerable complexity on the backend. The service delivery chain of technologies powering the front end often comprises a mix of on-premises assets, cloud services, containers, and APIs.A degradation or outage to just one of those components can have massive impact. Depending on the architecture of the app and resilience of the backend, an incident in one part can be routed around in the best case scenario, or take down critical systems for hours in the worst case.Tune in to this episode to learn more about how backend changes led to outages at DBS, Citibank, and a number of Japanese banks—and how other backend issues appeared to contribute to a Google Cloud VMware Engine disruption and potentially also a Microsoft Exchange incident.For more insights, check out these links:- The Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-dbs-citibank-outages?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q1_internetreportpulse22_podcast- Explore the Equinix issues that impacted DBS and Citibank: https://ajhrlohbopohbnmekzbcvrbeslqaijfr.share.thousandeyes.com/- Interested in more outage analysis? 
Check out our Internet Outages Timeline, which covers several notable Internet outages and application issues from the past year, along with the lessons they leave: https://www.thousandeyes.com/resources/internet-outages-timeline?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q1_internetreportpulse22_podcast———CHAPTERS00:00 Intro00:47 The Download04:10 By the Numbers06:40 Equinix Chiller Upgrade Leads to DBS, Citibank Outages in Singapore23:19 Get in Touch———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on X: @thousandeyes

Ep 73: Talking Data Freshness + Slack, Cloudflare, and Google Outages | Pulse Update
Outages and degradations can happen when underlying data isn’t fresh enough. In recent weeks, stale data may have contributed to incidents at both Slack and Cloudflare. Slack began experiencing issues when, by our best guess, its app stopped trusting the freshness of the data in the cache; and, separately, Cloudflare’s 1.1.1.1 DNS resolver ran into some issues related to stale root zone data.Watch this Pulse Update episode to hear more about the Cloudflare and Slack outages, and also explore recent disruptions at Google.For more insights, check out these links:- Explore the Slack outage in the ThousandEyes platform (NO LOGIN REQUIRED): https://apiyhhcphzaowmpqpyxrtdgggadiiujg.share.thousandeyes.com- Interested in more outage analysis? Check out our Internet Outages Timeline, which covers several notable Internet outages and application issues from the past year, along with the lessons they leave: https://www.thousandeyes.com/resources/internet-outages-timeline?utm_source=transistor&utm_medium=referral&utm_campaign=na_fy24q1_internetreportpulse21_podcast———CHAPTERS00:00 Intro01:11 The Download04:46 By the Numbers09:30 Slack Outage: Cached Data Freshness Issues23:08 Cloudflare Outage: Resolvers Use Stale Root Zone29:55 Get in Touch———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on X (formerly Twitter): @thousandeyes

Ep 72: Internet Outages: Why One Small Link Can Break the Whole Chain | Pulse Update
Providing great digital experiences relies on a complex service delivery chain. The past few weeks brought multiple reminders that the root cause of cloud and app disruptions often comes down to one single link in this chain. While the component at issue may appear small, if it’s not functioning normally, the consequences can be significant. Additionally, the impact of a malfunctioning “link” is often intensified by a lack of understanding or visibility into the entire end-to-end service delivery chain, especially in situations where a change is made outside standard operating procedures or pipelines.In this episode, explore how this phenomenon appeared to play out recently when .au domains failed to resolve, as well as during disruptions at Salesforce and Microsoft Azure.For more insights, check out these links:- Explore the .au incident in the ThousandEyes platform (NO LOGIN REQUIRED): https://ajdaombojgmvnbvsclhtvgmrskaicjvo.share.thousandeyes.com/- Internet Outages Timeline: https://www.thousandeyes.com/resources/internet-outages-timeline?utm_source=transistor&utm_medium=referral&utm_campaign=internetreportpulseep20———CHAPTERS00:00 Intro00:54 The Download06:01 By the Numbers08:23 .au Domains Fail to Resolve15:42 PlayStation Network Disruption21:10 Get in Touch———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on X (formerly Twitter): @thousandeyes

Ep 71: Data Center Disruptions, Square Down, and More News | Pulse Update
In a world that operates at “hyperscale,” the potential for hyperscale-sized problems is also very real. The measure of a good provider—and a well-engineered system—is how well they handle these anomalous conditions and minimize disruption.During recent weeks, some of these hyperscale-sized outages hit, including data center-focused disruptions that impacted companies like Square, Oracle OCI, NetSuite, and Microsoft Azure. Tune into this Pulse Update episode to go under the hood of these outages and discover how the companies responded—and important lessons learned.For more insights, check out these links:- The Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-square-down?utm_source=transistor&utm_medium=referral&utm_campaign=internetreportpulseep19- Explore the Square outage in the ThousandEyes platform (NO LOGIN REQUIRED): https://akinmwcoyjwhlwmhnykikzqkxcltwasv.share.thousandeyes.com———CHAPTERS00:00 Intro00:59 The Download04:33 By the Numbers09:02 Square Outage23:08 Oracle OCI, NetSuite, and Microsoft Azure Outages32:19 Get in Touch———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on X (formerly Twitter): @thousandeyes

Ep 70: Disruptions at Slack and X + Thoughts on “Take Twos” | Pulse Update
An outage occurs, a change is rolled back, and everything stabilizes. But what happens when the change is attempted a second time?These second tries often go much more smoothly. While another outage might still occur during this “take two,” the impact is usually far less severe. The engineering team has learned from what went wrong the first time and is ready to stop at the first hint of trouble. Slack recently experienced a pair of disruptions that appear to illustrate this “take two” scenario: a longer disruption resulting from a routine database cluster migration, followed by a much shorter outage a few weeks later that also involved database work, potentially indicative of related work that went more smoothly.And for more insights, check out these links:- The Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-slack-x-outage?utm_source=transistor&utm_medium=referral&utm_campaign=internetreportpulseep18- Explore the Slack and X disruptions in the ThousandEyes platform (NO LOGIN REQUIRED): Slack: https://afkmcwbeszwdtqqpvouwgjolywiugryx.share.thousandeyes.com/X: https://adcsnhfupsardmzyocrxqdcvriengkew.share.thousandeyes.com———CHAPTERS00:00 Intro00:47 The Download04:06 By the Numbers05:41 Slack Disruptions09:25 X Disruptions20:20 Get in Touch———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on X (formerly Twitter): @thousandeyes

Ep 69: An August Slack Outage and Why Context Matters | Pulse Update
Context matters when working on a distributed web-based application or service where everything is linked and dependent on each part functioning correctly. It’s all too easy for one team to make a change that unexpectedly affects something another team is working on. Or the combined impact of both changes may also accidentally break something.To avoid such mishaps, teams should cut back on silos as much as possible.However, it’s hard to completely eliminate siloed operations or decision-making. But the potential negative effects of silos can be reduced if each team has a view of the end-to-end service that’s tailored to their specific area or domain—that is, presented to them in a context that they understand.Tune in to this week’s episode to learn more about mitigating silos and also explore lessons from recent disruptions at Slack, Spotify, and Wells Fargo.And for more insights, check out these links:- The Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-slack-outage?utm_source=transistor&utm_medium=referral&utm_campaign=internetreportpulseep17- Explore the Slack disruption in the ThousandEyes platform (NO LOGIN REQUIRED): https://apiijiaoljxvlpnzjkxwwlcqmmcovppx.share.thousandeyes.com/———CHAPTERS00:00 Intro00:49 The Download04:35 By the Numbers06:55 Slack Disruption27:18 Spotify Disruption33:15 Get in Touch———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on X (formerly Twitter): @thousandeyes

Ep 68: SharePoint Outage and Security Certificate Considerations | Pulse Update
In an end-to-end service delivery chain, isolated changes can have broad consequences. This played out recently when an erroneous SSL certificate change at Microsoft appeared to cause a SharePoint Online and OneDrive for Business outage.While this incident definitely underscores the importance of valid security certificates, it’s also a reminder of what can happen when even one component in an end-to-end service delivery chain experiences issues. Every component needs to work in sync to maintain the service’s availability. As a result, all changes, especially manual ones, should be made with care and teams should have a deep understanding of every dependency and interconnection within their service delivery chain.Watch this week’s episode to learn more and explore other recent outages that impacted Slack, Starbucks, and NASA.And for more insights, check out these links:- The Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-sharepoint-outage?utm_source=transistor&utm_medium=referral&utm_campaign=internetreportpulseep16- Explore the OneDrive outage in the ThousandEyes platform (NO LOGIN REQUIRED): https://arcbfeptuostafskynuemdpgwcyerodc.share.thousandeyes.com———CHAPTERS00:00 Intro00:43 The Download04:03 By the Numbers06:09 SharePoint Outage21:47 Slack Outage25:31 Get in Touch———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on Twitter: @thousandeyes

Ep 67: Azure Disruption, Meta App Issues, and Navigating Edge Cases | Pulse Update
Let’s face it. Not every contingency can be planned for. Sometimes an outlier scenario pops up and causes an unexpected outage or disruption.Over the past few weeks, multiple companies appeared to be impacted by such edge cases: Azure; GitLab; and Meta’s WhatsApp, Facebook, Instagram, and Threads—its newest addition.Tune into the latest Pulse Update episode to learn more about what happened during these disruptions and why robust visibility is so important for navigating unexpected outlier scenarios.And for more insights, check out these links:- Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-azure-disruption?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp15- Explore the Azure disruption in the ThousandEyes platform (NO LOGIN REQUIRED): https://augfulkplwamllucivbbxxahisxddgay.share.thousandeyes.com- Cloud Performance Report: https://www.thousandeyes.com/resources/cloud-performance-report-2022?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp15———CHAPTERS00:00 Intro00:41 The Download03:34 By the Numbers05:26 Azure Disruption12:01 GitLab Outage18:20 Get in Touch———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on Twitter: @thousandeyes

Ep 66: A Front Door, But No House: Explaining Application Outages | Pulse Update
The application opens, but users encounter errors when they try to do anything—what gives? It’s the curious case of the disappearing backend. Discover why application issues often show up like this, with the service reachable but unresponsive beyond rendering a basic landing page, and sometimes an accompanying error message.In this episode, hosts Mike Hicks and Brian Tobia discuss this common problem and explore related incidents at CBA, GitHub, and Microsoft Teams. They also unpack other recent outage trends and disruptions, including the UK emergency services outage.To learn more, check out these links:- Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-explaining-application-outages?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp14- Explore the Microsoft Teams outage in the ThousandEyes platform (NO LOGIN REQUIRED): https://asinlehhlglxpowzmpqnuiyrzmniyozf.share.thousandeyes.com———CHAPTERS00:00 Intro00:26 The Download04:55 By the Numbers07:08 Microsoft Teams Outage13:55 GitHub Outage16:42 Get in Touch———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on Twitter: @thousandeyes

Ep 65: Application Outages Up in 2023—What to Know | Pulse Update
Though network outages are still far more common, application outages seem to be increasing in 2023—and having bigger impacts. Tune in to learn more about this trend and dive into incidents at Okta and Instagram. Host Mike Hicks will also explore other outage trends from the first half of the year in this special episode reflecting on the state of the Internet in 2023 thus far.To learn more, check out these links:- Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-application-outages-increasing?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp13- Explore the Instagram outage and Okta disruption in the ThousandEyes platform (NO LOGIN REQUIRED): Instagram: https://azavwfwqcgxyeqjyhwkicqxgsqwtcmzq.share.thousandeyes.com/Okta: https://awoleuudwuvnwklukifbrpghghynjjwy.share.thousandeyes.com- Internet Outages Timeline: https://www.thousandeyes.com/resources/internet-outages-timeline?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp13- Microsoft Outage Analysis: https://www.thousandeyes.com/blog/microsoft-outage-analysis-january-25-2023?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp13- AWS Outage Analysis: https://www.thousandeyes.com/blog/aws-outage-analysis-june-13-2023?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp13———CHAPTERS00:00 Intro00:40 The Download04:02 2023 Outage Trends: By the Numbers11:37 Instagram Outage17:20 Okta Disruption21:07 Get in Touch———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on Twitter: @thousandeyes

Ep 64: Is Spring Cleaning Causing an Outage Spike? | Pulse Update
For three consecutive years, there appears to have been a spike in outages and degradations in May. A potential “spring cleaning effect” may explain why. Tune in to learn more about this possible trend and explore what happened during recent incidents at Twitter; Microsoft 365; Slack; Instagram; Apple’s iMessage; and subscription-based streaming service, Max (formerly known as HBO Max).After watching, check out these links to dive deeper: Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-spring-cleaning-outage-spike?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp12Explore the Instagram outage in the ThousandEyes platform (NO LOGIN REQUIRED): https://aljqalczdkfpdqjynukcshuifepbwasi.share.thousandeyes.com———CHAPTERS00:00 Intro01:22 The Download07:38 Outage Trends: By the Numbers11:40 Twitter Outage15:01 Microsoft 365 Outages16:23 Get in Touch———Want to get in touch?If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on Twitter: @thousandeyes

Ep 63 How Outages Can Impact Distributed Dev Teams | Pulse Update
Tune in to explore ways that outages can impact distributed software development teams and what companies can learn from recent incidents at GitHub, Google Cloud, and Apple.
To learn more, check out these links:
- Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-outages-and-distributed-dev-teams?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp11
- Explore the GitHub service degradation in the ThousandEyes platform (NO LOGIN REQUIRED): https://agiebiuwxkwqowctctfvdaazvvfpxzew.share.thousandeyes.com/
CHAPTERS
00:00 Intro
00:39 The Download
04:40 Outage Trends: By the Numbers
09:22 GitHub Service Degradation
19:13 Update: Google Cloud Outage
22:45 Apple Authentication Issues
25:56 Get in Touch
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on Twitter: @thousandeyes
And if you want to connect with Mike in person, join us for the Cisco Live conference from June 4 - June 8 in Las Vegas. Register now and be sure to stop by the ThousandEyes booth: https://www.thousandeyes.com/events/2023/cisco-live?utm_source=youtube.com&utm_medium=referral&utm_campaign=InternetReportPulseEp11
And don’t miss Mike’s talk on optimizing IT operations with ThousandEyes and OpenTelemetry: https://www.ciscolive.com/global/learn/technical-education/session-catalog.html?search=BRKAPP-2731#/

Ep 62 Redundancy in the Cloud Era: Two Case Studies | Pulse Update
When it comes to your technology strategy, it's a good idea to have more than one way to access every resource—just in case. As IT environments have changed, so has the thinking around the right approaches to achieve this desired redundancy. Two recent incidents at Google Cloud and Microsoft 365 reinforce the importance of redundancy—and the need for evolving strategies to meet this goal.
To learn more, check out these links:
- Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-redundancy-in-cloud-era?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp10
- Cloud Performance Report: https://www.thousandeyes.com/resources/cloud-performance-report-2022?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp10
CHAPTERS
00:00 Intro
00:35 The Download
03:13 Outage Trends: By the Numbers
06:58 Google Cloud Outage
18:12 Microsoft 365 Outage
25:24 Get in Touch
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. Or follow us on Twitter: @thousandeyes
And if you want to connect with Mike and Kemal in person, join us for the Cisco Live conference from June 4 - June 8 in Las Vegas. Register now and be sure to stop by the ThousandEyes booth: https://www.thousandeyes.com/events/2023/cisco-live?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp10
And don’t miss Kemal’s breakout session on rethinking network monitoring: https://www.ciscolive.com/global/learn/technical-education/session-catalog.html?search=BRKAPP-2013#/

Ep 61 The Anatomy of an Outage | Pulse Update
Understanding the unique characteristics of different kinds of Internet outages can help you quickly recognize the type of incident you’re dealing with and take the right steps to mitigate its impact. This week’s episode discusses the anatomy of common outage categories and explores recent case studies:
- Security-related incidents: Western Digital and SD Worx outages
- A single-point-of-aggregation issue: SpaceX’s Starlink outage
- Last-mile challenges: Vodafone UK outage
To learn more, check out the links below:
- Internet Report: Pulse Update blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-anatomy-of-internet-outage?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp9
- Internet Outages Survival Cheat Sheet: https://www.thousandeyes.com/resources/internet-outages-survival-infographic?utm_source=transistor&utm_medium=referral&utm_campaign=InternetReportPulseEp9
CHAPTERS
00:00 Intro
00:29 The Download
03:45 Outage Trends: By the Numbers
06:15 Western Digital Outage
06:38 SD Worx Outage
09:44 SpaceX’s Starlink Outage
13:16 Vodafone UK Outage
16:48 Get in Touch
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on Twitter: @thousandeyes

Ep 60 Chatting About the ChatGPT Outage and Other Outage News | Pulse Update
This week’s Pulse Update unpacks OpenAI’s ChatGPT outage and discusses why the outage actually represented a pragmatic move on the part of OpenAI. We’ll also discuss global outage trends; explore other recent incidents at Dish Network, Microsoft, and Virgin Media UK; and look at why responses to performance problems vary, based on application characteristics and usage patterns.
To learn more, check out the links below:
- Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-chatgpt-outage?utm_source=youtube.com&utm_medium=referral&utm_campaign=InternetReportPulseEp8
- Explore the Virgin Media outages in the ThousandEyes platform (NO LOGIN REQUIRED): https://aoqqallcjrsdrxwjpdiizyxvnimmjgde.share.thousandeyes.com/
- Learn more about the Virgin Media outages in this deep dive podcast and blog:
  Podcast: https://www.youtube.com/watch?v=6FK2MIiwKkQ
  Blog: https://www.thousandeyes.com/blog/virgin-media-uk-outage-analysis-april-4-2023?utm_source=youtube.com&utm_medium=referral&utm_campaign=InternetReportPulseEp8
CHAPTERS
00:00 Intro
00:38 The Download
4:10 Outage Trends: By the Numbers
8:37 OpenAI’s ChatGPT Outage
12:56 Dish Network Outage
16:14 Microsoft 365 Outage
20:39 Virgin Media UK Outage
27:37 Get in Touch
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on Twitter: @thousandeyes

Ep 59 Understanding the UK Virgin Media Outages on April 4 | Outage Deep Dive
On April 4, 2023, Virgin Media UK (AS 5089) experienced two outages that impacted the reachability of its network and services to the global Internet. The two outages shared similar characteristics, including the withdrawal of routes to its network, traffic loss, and intermittent periods of service recovery. In this episode, we discuss how the outages unfolded and what IT teams can learn from this to help navigate similar incidents in the future.
To learn more, check out the links below:
- Blog: Virgin Media UK Outage Analysis: https://www.thousandeyes.com/blog/virgin-media-uk-outage-analysis-april-4-2023
- Explore the Virgin Media UK outage in the ThousandEyes platform (NO LOGIN REQUIRED): https://aoqqallcjrsdrxwjpdiizyxvnimmjgde.share.thousandeyes.com/
CHAPTERS
00:00 Intro
1:08 Overview: Virgin Media UK Outage
3:29 Under the Hood: Virgin Media UK Outage
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on Twitter: @thousandeyes

Ep 58 Exploring Application Errors at Okta, Twitch, Reddit & GitHub | Pulse Update
HTTP 403, 503, and 504 status codes dominated the last few weeks as multiple companies experienced application degradations and outages. These incidents at companies like Okta, Twitch, Reddit, and GitHub offer important lessons on navigating similar issues and minimizing downtime for your own users.
To learn more, check out the links below:
- Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-application-errors
- Explore the Okta and Reddit outages in the ThousandEyes platform (NO LOGIN REQUIRED):
  Okta: https://awoleuudwuvnwklukifbrpghghynjjwy.share.thousandeyes.com
  Reddit: https://arblcshhhdpvslhwkuxtvukvmvnlobur.share.thousandeyes.com/view/tests/?roundId=1678820700&metric=availability&scenarioId=httpServer&testId=3561216
CHAPTERS
00:00 Intro
00:33 The Download
2:47 Outage Trends: By the Numbers
6:14 Okta Disruption
16:47 Twitch Outage
18:46 Reddit Outage
21:24 GitHub Outage
24:03 Get in Touch
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on Twitter: @thousandeyes

Ep 57 Twitter Performance in the Elon Era, a Ransomware Attack & More Outage News | Pulse Update
It was an eventful fortnight on the Internet as Twitter, Dish Network, Akamai, and Ticketek Australia all experienced outages. Tune in to our latest episode for insights from our analysis of these events and practical tips for IT teams.
To learn more, check out the links below:
- Internet Report: Pulse Update Blog: https://www.thousandeyes.com/blog/internet-report-pulse-update-twitter-outages-and-more
- Explore the Twitter and Dish Network outages in the ThousandEyes platform (NO LOGIN REQUIRED):
  Twitter: https://aonfgcjryeodugjpksvxpdhuxodyjaxf.share.thousandeyes.com
  Dish Network: https://amwajhhgwjnienexcktrsjhsmoisvktt.share.thousandeyes.com
- Also explore the Akamai Edge Delivery DNS resolution errors in the ThousandEyes platform (NO LOGIN REQUIRED): https://axkcqcnurpqvzfkrhxbnwonzzfbivhdl.share.thousandeyes.com
CHAPTERS
00:00 Intro
00:38 The Download
2:27 Outage Trends: By the Numbers
5:39 Twitter Outages
14:45 Dish Network Outage After Ransomware Attack
18:23 Akamai Edge Delivery DNS Resolution Errors
22:27 Ticketek Australia Outage
24:54 Get in Touch
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on Twitter: @thousandeyes

Ep 56 A Tale of Two Data Center Outages | Pulse Update
In the space of a week, we saw two data center-related incidents lead to long Microsoft and Oracle outages. Join us as we analyze these outages and ways IT teams can minimize downtime in similar situations. We’ll also discuss a series of application issues that impacted companies including Twitter and Tesla.
To learn more, check out the links below:
- Internet Report: Pulse Update Blog
- Explore the Atlassian outage in the ThousandEyes platform (NO LOGIN REQUIRED)
Chapters
00:00 Intro
00:34 The Download
2:42 Outage Trends: By the Numbers
4:54 Data Center Incidents
5:58 Microsoft Outage
8:40 Oracle’s NetSuite Outage
10:41 Twitter Outage
13:33 Atlassian Outage
17:53 Tesla App Outage
19:14 Fitbit Outage
21:30 Get in Touch
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on Twitter: @thousandeyes

Ep 55 A Trio of Similar Incidents: Microsoft, Cloudflare, & Slack Outages | Pulse Update
We discuss insights from a recent trio of similar incidents at Microsoft, Cloudflare, and Slack, along with other outage news, including a Comcast outage that impacted some Philadelphia neighborhoods on Super Bowl Sunday.
00:00 Intro
00:58 Outage Trends: By the Numbers
4:33 Microsoft Outage (Jan. 25)
4:58 Cloudflare Outage (Jan. 24)
9:27 Slack Outage (Jan. 25)
13:16 Microsoft Outlook Outage (Feb. 7)
18:06 Square Outage (Feb. 7)
20:39 Comcast Outage (Feb. 12)
23:23 Get in Touch
To learn more, check out the links below:
- Internet Report: Pulse Update Blog
- Explore the January 25 and February 7 Microsoft outages in the ThousandEyes platform (NO LOGIN REQUIRED)
- Microsoft Outlook Outage Analysis Podcast (Feb. 7)
- Microsoft Outage Analysis Podcast (Jan. 25)
- Cloudflare Post-Incident Report
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on Twitter: @thousandeyes

Ep 54 The Microsoft Outlook Outage, Explained | Outage Deep Dive
Live from #CiscoLiveEMEA, we discuss the Feb. 7 Microsoft Outlook outage to understand how the event unfolded, why it may have played out the way it did, and what you can learn from this outage event.
To dive deeper, check out the links below:
- Explore the outage in the ThousandEyes platform (NO LOGIN REQUIRED)
- Microsoft Outlook Outage Analysis Blog (Feb. 7)
- Microsoft Outage Analysis Blog (Jan. 25)
Want to get in touch?
If you have questions, feedback, or guests you'd like to see featured on the show, send us a note at [email protected]. And follow us on Twitter: @thousandeyes.
Get your free T-shirt:
Every new subscriber gets a ThousandEyes T-shirt. Just send your address and T-shirt size to us at [email protected].

Ep 53 Lessons From the FAA, Fastly, & Microsoft Outages | Pulse Update
In this episode, we cover the latest internet trends and unpack important takeaways from the recent FAA, Fastly, and Microsoft outages. We also discuss how several early 2023 outages and disruptions reinforced the need for application monitoring and testing to counter, or at least anticipate the effect of, anomalous conditions on certain routes.
00:00 Intro
1:32 Outage Trends: Week of Jan. 30
7:07 FAA Outage (Jan. 11)
11:04 Fastly Outage (Jan. 19)
15:31 Microsoft 365 Outage (Jan. 17)
19:52 Microsoft Outage (Jan. 25)
28:40 Get in Touch
To learn more, check out the links below:
- Follow the Fastly and Microsoft outages in the ThousandEyes platform (NO LOGIN REQUIRED)
- Internet Report: Pulse Update Blog (Week of Jan. 30, 2023)
- Microsoft Outage Analysis (Jan. 25)
- Podcast: Understanding the Microsoft Outage: Why Were Azure, Microsoft Teams, & Outlook Down?
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on Twitter: @thousandeyes.
Get your free T-shirt:
Every new subscriber gets a ThousandEyes T-shirt. Just send your address and T-shirt size to us at [email protected].

Ep 52 Understanding the Microsoft Outage: Why Were Azure, Microsoft Teams, & Outlook Down? | Outage Deep Dive
At around 7:05 a.m. UTC on January 25, 2023, Microsoft started experiencing service-related issues. At the same time, ThousandEyes observed BGP withdrawals and a significant number of route changes that resulted in a high amount of packet loss, ultimately affecting various services like Outlook, Teams, SharePoint, and others. 00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. Join our co-hosts Angelique Medina, Head of Internet Intelligence, and Kemal Sanjta, Principal Internet Architect, both from ThousandEyes, as they discuss the January 25, 2023 Microsoft outage. 00:43 Incident overview as seen in the ThousandEyes platform. Follow along in the ThousandEyes platform *no login required*: https://acimfsmgobnxgdkltxicdesijrowimst.share.thousandeyes.com 10:18 Why would rapid announcement changes cause packet loss at the scale that was observed during this event? 12:35 What should someone make of the length / duration of the event? 14:28 Why would the change of an IP address on a router cause such a major connectivity disruption? 23:37 What are some of the lessons you can learn from this event? Questions? Feedback? Send us an email at [email protected]

Ep 51 Notes on the Spotify Outage | Pulse Update
This episode covers the latest global network outage numbers and interesting end-of-year trends; how resilient application architectures, clouds, and networks are challenging old ways of thinking; and a deep dive into an outage that disrupted Spotify’s music streaming on December 14, 2022.
To learn more, check out the links below:
- Internet Report Pulse Update Blog
- Explore the Spotify outage in the ThousandEyes platform (NO LOGIN REQUIRED): Part 1, Part 2, Part 3
Chapters
00:00 Intro
1:12 Outage Trends: By the Numbers
10:26 Spotify Outage
19:11 Get in Touch
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on Twitter: @thousandeyes

Ep 50 Twitter in the Elon Era + Microsoft & AWS Outages | Pulse Update
This is the Internet Report: Pulse Update, where we review and analyze significant outages and trends across the Internet from the previous two weeks. Every other week, we'll publish a new episode covering the latest tally of outage events and highlighting a few interesting outages. This week, in addition to our usual look at global and U.S. outage trends, we’ll take a brief look at how Twitter is holding up since its sale to Elon Musk, plus a couple of interesting outages at Microsoft and AWS.
To learn more, read the blog.
Chapters
00:00 Intro
1:45 Outage Trends: By the Numbers
5:13 Twitter
11:27 Microsoft Outage
15:22 AWS Outage
21:44 Get in Touch
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at [email protected]. And follow us on Twitter: @thousandeyes

Ep 49 Unpacking the Dec. 12 Quad9 BGP Route Leak | Outage Deep Dive
Starting at ~12:12 UTC on Dec. 12, 2022, an ISP in the Democratic Republic of Congo leaked a route belonging to the Quad9 DNS service, causing some traffic, including Verizon US customer traffic, to get routed to Africa for ~90 minutes. High traffic loss was observed throughout the incident, which was resolved at ~13:40 UTC. 00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. Join our co-hosts Mike Hicks, Principal Solutions Analyst, and Kemal Sanjta, Principal Internet Architect, both from ThousandEyes, as they discuss the December 12th Quad9 BGP route leak. 00:56 Under the Hood: Reviewing the Quad9 BGP route leak as seen in the ThousandEyes platform. Explore the incident yourself in the ThousandEyes platform at: https://aioiqfxeunngihtwnkphnuzazgloaiju.share.thousandeyes.com/ Questions? Feedback? Send us an email at [email protected]

Ep 48 An Eventful End to October for WhatsApp, Zscaler, Salesforce, and Facebook | Outage Deep Dive
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. In this episode, we unpack four notable outages that impacted WhatsApp, Zscaler, Salesforce, and Facebook, which all appear to have a common theme. Join our co-hosts Mike Hicks, Principal Solutions Analyst at ThousandEyes, and Chris Villemez, Technical Marketing Engineer at ThousandEyes, as they walk through each incident to understand what happened and discuss how network professionals can attempt to mitigate these types of scenarios in the future. FURTHER READING Facebook Outage Analysis → https://www.thousandeyes.com/blog/facebook-outage-analysis

Ep 47 Unpacking the March 28th Twitter Outage | Outage Deep Dive
We're back! 00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. On this episode, our newest host, Chris Villemez, is joined by Kemal Sanjta to discuss a BGP-related incident that took down Twitter for many users around the globe on March 28th. 00:36 Under the Hood: Chris Villemez and Kemal Sanjta leverage their extensive operations experience managing the networks of large-scale SaaS, IoT, and cloud providers to analyze this incident using the ThousandEyes platform. They examine the scope of the outage, review the specific BGP changes that resulted in the outage, and discuss what enterprises can do when they’re experiencing a similar BGP hijack or route leak. Sharelinks: Single agent (Manchester) test: https://anislusvvn.share.thousandeyes.com/ Multi-agent global test showing BGP changes: https://axntbxntyk.share.thousandeyes.com/ 31:00 Outro: We've been on a bit of a break for the past several months as things were relatively quiet on the Internet front. For the foreseeable future we'll be a bit reactive in our episodes; when something major happens, trust we'll be here. Questions? Feedback? Have an idea for a guest? Send us an email at [email protected]

Ep 46 Unpacking the December AWS Outages (December 7, 10, & 15, 2021) | Outage Deep Dive
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. On today’s episode, our newest host and Technical Marketing Engineer, Chris Villemez, is joined by Kemal Sanjta, Principal Engineer, to dive into the details of the recent AWS outages from December 7th, 10th and 15th. They’ll walk through what ThousandEyes saw from its fleet of vantage points, as well as share some insight into what enterprises can learn from these incidents to build resilient cloud architectures.

Ep 45 The Facebook Outage, Explained (10/4/21) | Outage Deep Dive
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. 00:15 Headlines: Today we’re going to do a thorough analysis of the major Facebook outage that took place yesterday, Monday, October 4. I’m joined by Gustavo Ramos, ThousandEyes’ in-house expert on Network Engineering. ThousandEyes Blog: https://www.thousandeyes.com/blog/facebook-outage-analysis Analysis from Facebook: https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/ 1:17 Under the Hood: We'll walk through the sequence of events that led to this outage, understand what went wrong (and what actions may have made the situation worse), and discuss what lessons we can all learn from this outage. 25:40 Outro: We've been on a bit of a break for the past several months as things were relatively quiet on the Internet front. For the foreseeable future we'll be a bit reactive in our episodes; when something major happens, trust we'll be here. Questions? Feedback? Have an idea for a guest? Send us an email at [email protected]

Ep 44 When BGP Routes Accidentally Get Hijacked: A Lesson In Internet Vulnerability | Outage Deep Dive
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. 00:08 Headlines: Today, Mike Hicks (Principal Solutions Analyst, ThousandEyes) and I discuss a recent BGP routing incident that had intermittent impacts on Amazon’s services, including Amazon.com and AWS compute resources, during a five-hour period on July 12. 01:04 Under the Hood: When we look into BGP routing at the time, we can see multiple BGP path changes due to a service provider erroneously inserting themselves into the path for a large number of Amazon routes. Watch this episode to see how the BGP incident led to significant packet loss, resulting in service disruption for some Amazon and AWS users. We also discuss why enterprises need to have continuous oversight of the paths their traffic takes over the Internet. 17:58 Outro: Questions? Feedback? Have an idea for a guest? Send us an email at [email protected]

Ep 43 The Akamai DNS Outage and the Case for CDN Redundancy (July 1-23, 2021) | Outage Deep Dive
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. I’m joined today by Mike Hicks, Principal Solutions Analyst here at ThousandEyes, to cover the outage of Akamai’s DNS service. The outage, which occurred on July 22nd around 3:38 PM UTC (8:38 AM PT), struck during business hours in Europe and North America, resulting in widespread impacts to applications and services hosted within Akamai servers. The outage itself was short-lived and was resolved roughly one hour after it began. In this episode, we examine the customer impact, the relationship between DNS and CDNs, and what enterprises should take away from the incident. We also discuss the question of when it might make sense to invest in DNS or CDN redundancy—and when it is, frankly, overkill. Watch this week’s episode to hear our take, and as always let us know on Twitter what you think.

Ep 42 BGP Routing Incident Shows Why the Shortest Path Isn’t Always the Chosen Path | Outage Deep Dive
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. 00:13 Headlines: Today, Kemal and I unpack an interesting BGP incident, in which a large-scale route leak briefly altered traffic patterns across the Internet. 00:58 Under the Hood: The incident began on Thursday, June 3rd at around 10:24 UTC, and resulted in a significant spike in packet loss that was noticeable in ThousandEyes tests. While this packet loss resolved within the hour (at around 10:48 UTC), we observed some interesting routing changes during this window—as traffic was diverted to a Russian telecom provider that had not previously been in the path. Watch this episode as we explore how this network provider managed to get itself into the routing paths of many major services, and why network visibility is so important to recognize these types of incidents in which your site may still be reachable but your traffic is being sent through an unexpected network. 20:45 Outro: Questions? Feedback? Have an idea for a guest? Send us an email at [email protected]

Ep 41 Akamai Prolexic Outage Analysis + Takeaways (Week of June 9-17, 2021) | Outage Deep Dive
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. I’m joined by ThousandEyes’ BGP expert, Kemal Sanjta, to review the June 16th outage of Prolexic Routed, a DDoS Mitigation Service operated by Akamai. According to a statement from Akamai, the outage was not due to a DDoS attack or system update, but instead a routing table limitation that was inadvertently exceeded. In this episode, Kemal and I analyzed what happened and how customers of Akamai Prolexic who had automated failover mechanisms in place were able to recover more quickly than those that had to manually switch over to other providers. Watch this episode to learn more about this outage, and how different operational processes resulted in very different service outcomes.

Ep 40 Fastly’s Outage and Why CDN Redundancy Matters (Week of June 3-8) | Outage Deep Dive
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. 00:12 Headlines: Today, I’m joined by Hans Ashlock, Director of Technology & Innovation at ThousandEyes, to unpack today’s major outage at Fastly, a popular CDN provider. 3:46 Under the Hood: The widespread outage occurred around 9:50 UTC, about 5:50 am ET, and mostly impacted users across Europe and Asia due to the timing. The outage lasted approximately one hour until 10:50 UTC, yet residual impacts were felt beyond that. Today’s outage is a good example of the importance of having outside-in visibility not just across your app, but also to your app’s edge and all its dependent services. 39:05 Outro: Questions? Feedback? Have an idea for a guest? Send us an email at [email protected]

Ep 39 Bitcoin Dive Sparks Outage at a Popular Crypto Exchange (Weeks of May 17-June 2) | Outage Deep Dive
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. I’m joined today by Mike Hicks, Principal Solutions Analyst at ThousandEyes, to cover two recent application-related outages. The first occurred on May 19th around 12:50 UTC at Coinbase—a well-known cryptocurrency exchange. Around the time that news broke saying that the Chinese government would be imposing strict regulation on cryptocurrencies, users attempting to execute transactions were unable to access the application. From the ThousandEyes platform we were able to see a drop in availability around this time as well as increased load times (which in some cases resulted in timeout errors). The second outage happened on May 20th around 17:35 UTC at Slack—an enterprise collaboration platform. While the outage was resolved within 90 minutes, it occurred during normal US business hours, making it particularly disruptive to users attempting to reach the application. These instances remind us that applications, much like the underlying networks they run on, can experience outages, and effective troubleshooting requires end-to-end visibility into both.

Ep 38 DNS and BGP and DDoS Attacks—Oh, My! (May 11-17, 2021) | Outage Deep Dive
00:00 Welcome 00:14 Headlines: DNS and BGP and DDoS Attacks—Oh, My! This week we cover a couple of recent service degradation incidents involving DNS providers. 2:19 Under the Hood: Kemal Sanjta, ThousandEyes’ resident BGP expert, joins us to discuss the May 6th disruption to Neustar’s UltraDNS service, which lasted nearly four hours. We discuss the BGP routing changes we observed during the incident and what they can tell us about the cause of the disruption. We also cover a separate incident involving Quad9, a public recursive resolver service, which the company says was caused by a DDoS attack on May 3rd. 16:19 Expert Spotlight: Michael Batchelder (a.k.a. Binky) is here to discuss the two “Ds” of the Internet: DDoS attacks and the DNS. Questions for Binky? Contact him at [email protected] 31:49 Outro: Questions? Feedback? Have an idea for a guest? Send us an email at [email protected]

Ep 37 Even Magic Can't Stop Internet Outages (April 28-May 3, 2021) | Outage Deep Dive
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. Today, we focused on an interesting outage that impacted Cloudflare Magic Transit, a relatively new offering from the CDN provider which aims to efficiently route and protect the network traffic of its customers. On May 3rd at approximately 3:00 PM PDT (10:00 PM UTC), ThousandEyes vantage points connecting to sites using Magic Transit began to detect significant packet loss at Cloudflare’s network edge—with the loss continuing at varying levels, for approximately 2 hours. While the outage impacted some Magic Transit customers more significantly than others, we also observed mitigation actions by at least one customer to avoid the outage and restore the availability of their service to their users. This outage reminds us that no provider is immune to outages, even cloud and global CDN providers. However, with proactive visibility, you can respond quickly to reduce outage impact on your users. Watch this week’s episode to hear more about the outage from the ThousandEyes perspective.

Ep 36 Microsoft Teams Outage Highlights: Need to See Beyond App Front Door (Week of April 20-27, 2021) | Outage Deep Dive
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. We’re joined this week by Hans Ashlock, Director of Technology & Innovation at ThousandEyes, to discuss Tuesday’s Microsoft Teams outage. On Tuesday, April 27th, ThousandEyes tests began to detect an outage affecting the Teams service starting around 3 AM (PT) and lasting approximately 1.5 hours. While the outage occurred in the overnight hours for much of the Americas, the global nature of the outage resulted in service disruption for users connecting from Asia and Europe. Transaction views within the ThousandEyes platform show that Microsoft’s authentication service appeared to be available, however, the Teams application was unable to initialize, resulting in error responses. Watch this week’s episode to hear more about what ThousandEyes revealed about the nature of this outage—and what we can all learn from the incident.