AWS Morning Brief

718 episodes — Page 14 of 15

Ep 68: Amazon Detective and the Case of the Giant AWS Bill

AWS Morning Brief for the week of April 6, 2020.

Apr 6, 2020 · 11 min

Ep 65: Whiteboard Confessional: My Metaphor-Spewing Poet Boss & Why I Don't Like Amazon ElastiCache for Redis

About Corey Quinn
Over the course of my career, I've worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I'm a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you're about to listen to.

Links
CHAOSSEARCH
Redis
Amazon ElastiCache
Twitter: @QuinnyPig

Transcript
Corey Quinn: Welcome to AWS Morning Brief: Whiteboard Confessional. I'm Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is "theory", because invariably whatever you've built works in theory, but not in production. Let's get to it.

On this show, I talk an awful lot about architectural patterns that are horrifying. Let's instead talk for a moment about something that isn't horrifying: CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you've come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care for and feed yourself, you now get to focus on just storing data, treating it like you normally would other S3 data: not replicating it, not storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

When you walk through an airport (assuming that people still go to airports in the state of pandemic in which we live), you'll see billboards saying, "I love my slow database, says no one ever." This is an ad for Redis, and the unspoken implication is that everyone loves Redis. I do not. In honor of the recent release of Global Datastore for Amazon ElastiCache for Redis, today I'd like to talk about that time ElastiCache for Redis helped cause an outage that led to drama. This was a few years back, when I worked at a B2B company (B2B, of course, meaning business-to-business; we were not dealing direct-to-consumer). I was a different person then, and it was a different time. Specifically, the time was late one Sunday evening, and my phone rang. This was atypical, because most people didn't have that phone number. At this stage of my life, my default answer when my phone rang was, "Sorry, you have the wrong number." If I wanted phone calls, I'd have taken out a personals ad. Even worse, when I answered the call, it was work. Because I ran the ops team, I was pretty judicious in turning off alerts for anything that wasn't actively harming folks. If it wasn't immediately actionable and causing trouble, then there was almost certainly an opportunity to fix it later during business hours. So the list of things that could wake me up was pretty small. As a result, this was the first time I had been called out of hours during my tenure at this company, despite having spent over six months there at this point. So who could possibly be on the phone but my spineless coward of a boss? A man who spoke only in metaphor; we certainly weren't social friends, because who can be friends with a person like that?

"What can I do for you?" "As the roses turn their faces to the sun, so my attention turned to a call from our CEO. There's an incident." My response was along the lines of, "I'm not sure what's wrong with you, but I'm sure it's got a long name and is incredibly expensive to fix." Then I hung up on him and dialed into the conference bridge. It seemed that a customer had attempted to log into our website recently and had gotten an error page, and this was causing some consternation. Now, if you're used to a B2C, or business-to-consumer, environment, that sounds a bit nutty, because you'll potentially have millions of customers. If one person hits an error page, that's not CEO-level engagement. One person getting that error is still not great, but it's not the end of the world. I mean, Netflix doesn't have an all-hands-on-deck disaster meeting when one person has to restart a stream. In our case, though, we didn't have millions of customers; we had about five, and they were all very large businesses. So when they said jump, we were already mid-air. I'm going to skip past the rest of that phone call and the evening, because it's much more instructive to talk about this with the clarity lent by the sober light of day the following morning, and the post mortem meeting that resulted from it. So, let's tal
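As an aside on the pattern at play here: a cache that can take the application down with it is a cache being treated as a hard dependency. A minimal sketch of the defensive alternative, using the redis-py client; the endpoint and the fetch_from_db helper are hypothetical illustrations, not details from the episode:

```python
import redis

# Hypothetical endpoint; a short socket timeout keeps a sick cache
# from stalling every request.
CACHE_HOST = "my-cluster.abc123.use1.cache.amazonaws.com"
cache = redis.Redis(host=CACHE_HOST, port=6379, socket_timeout=0.1)

def get_user(user_id, fetch_from_db):
    """Read through the cache, but survive the cache being down."""
    key = f"user:{user_id}"
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except redis.RedisError:
        pass  # a dead cache should mean slower responses, not an outage
    value = fetch_from_db(user_id)  # hypothetical database accessor
    try:
        cache.set(key, value, ex=300)  # best-effort repopulation
    except redis.RedisError:
        pass
    return value
```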

Apr 3, 2020 · 12 min

Ep 66: The "AWS For God's Sake Leave Me Alone" Service

AWS Morning Brief for the week of March 30, 2020.

Mar 30, 2020 · 11 min

Ep 63: Whiteboard Confessional: Console Recorder: The Thing AWS Should Have Built

Links
CHAOSSEARCH
Console Recorder on Chrome Web Store
Console Recorder on Firefox Add-Ons
Screaming in the Cloud
Ian Mckay's GitHub for AWSConsoleRecorder
Twitter: @QuinnyPig

Transcript
Corey Quinn: Welcome to AWS Morning Brief: Whiteboard Confessional. I'm Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is "theory", because invariably whatever you've built works in theory, but not in production. Let's get to it.

On this show, I talk an awful lot about architectural patterns that are horrifying. Let's instead talk for a moment about something that isn't horrifying: CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you've come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care for and feed yourself, you now get to focus on just storing data, treating it like you normally would other S3 data: not replicating it, not storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

You'll notice that I periodically refer to various friends of the show as code terrorists. It's never been explained, until now, exactly what that means and why I use that term. So today, I thought I'd go ahead and shine a little bit of light on that. One of the folks I refer to most frequently is Ian Mckay, a fine gentleman based out of Australia. He's built something that is both amazing and terrible at the same time, called Console Recorder. But let me back up before I dive into this monstrosity. Let's talk about how we get the things we build in AWS into production. There are basically four tiers. Tier one is using the AWS web console itself: we click around and we build things. Great. Tier two is using CloudFormation like sensible folks. Tier three is Terraform, with all of its various attendant tooling. And then there's the ultra tier four that I practice, which is using the AWS console and then lying about it. Some folks are going to chime in here and say that, oh, you should use the CDK, or something like that, or custom scripts that wind up spinning up production. And all of those are well and good, but only recently did CloudFormation release the ability to import existing resources. And even then, much like Terraform import, it's pretty gnarly and not at all terrific. So what do you generally wind up doing? Well, if you're like me, you'll stand up production resources inside of an AWS account.

You will click around in the console. I always start with the console, not because I don't know how these other tools work (that's a side point), but because that helps me get a sense for how these services are imagined by the teams building them. They tend to assume that everyone who interacts with them is going to go through the console at some point, or at least that they should. So it gives me access and exposure to what their vision of the service is. Then, once you've built something up, it often finds its way into production, if you're at all like me: I'll spin something up just to test something, and it works, and oh my stars, suddenly you just want to ship it and not worry about it, so you don't go back and rebuild it properly. So now you're left with this hand-built thing that's just flapping around out there in production. What are you supposed to do? Well, according to the AWS experts, if we're allowed to use that term to describe them, you're supposed to smack yourself on the forehead, be convinced that you're fundamentally missing the boat here, throw everything you've just built away, and go back and do it properly. Which admittedly seems a little bit on the nose for those of us who've done exactly this far more times over the course of our careers than we would care to count. So today, however, I posit that there's an alternate approach that doesn't require support from AWS, which, to be honest, long ago seems to have given up on solving this partic
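For reference, the CloudFormation resource import that gets mentioned here is driven through a change set of type IMPORT. A minimal boto3 sketch; the stack name, template file, and DynamoDB table are hypothetical, not from the episode:

```python
import boto3

cfn = boto3.client("cloudformation")

# Hypothetical example: adopt an existing, hand-built DynamoDB table into a
# stack. The template must already declare MyTable with DeletionPolicy: Retain.
cfn.create_change_set(
    StackName="adopted-resources",
    ChangeSetName="import-existing-table",
    ChangeSetType="IMPORT",
    ResourcesToImport=[{
        "ResourceType": "AWS::DynamoDB::Table",
        "LogicalResourceId": "MyTable",
        "ResourceIdentifier": {"TableName": "hand-built-table"},
    }],
    TemplateBody=open("template-with-mytable.yaml").read(),
)
# Review the change set in the console or via describe_change_set,
# then execute it to bring the resource under CloudFormation's control.
```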

Mar 27, 2020 · 12 min

Ep 64: Watch Your Bill or They'll CloudWatch It For You

AWS Morning Brief for the week of March 23, 2020.

Mar 23, 2020 · 10 min

Ep 62: Whiteboard Confessional: Configuration MisManagement

Show Notes
CHAOSSEARCH.io
Twitter: https://twitter.com/QuinnyPig

Transcript
Corey Quinn: Welcome to AWS Morning Brief: Whiteboard Confessional. I'm Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is "theory", because invariably whatever you've built works in theory, but not in production. Let's get to it.

On this show, I talk an awful lot about architectural patterns that are horrifying. Let's instead talk for a moment about something that isn't horrifying: CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you've come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care for and feed yourself, you now get to focus on just storing data, treating it like you normally would other S3 data: not replicating it, not storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

Historically, many best practices were, in fact, best practices. But over time, the way that we engage with systems changes, the problems that we're trying to solve start resembling other problems, and, at some point, entire industries shift. So what you should have been doing five years ago is not necessarily what you should be doing today. Today, I'd like to talk not about one or two edge-case problems, as I have in previous editions of the Whiteboard Confessional, but rather about an overall pattern that's shifted. And that shift has been surprisingly sudden, yet gradual enough that you may not entirely have noticed. This goes back to, let's say, 2012 or 2013, and is in some ways the story of how I learned to speak publicly. So this is indirectly one of the origin stories of me as a podcaster, continuing to engage my ongoing love affair with the sound of my own voice. I was one of the very early developers behind SaltStack. Salt, for those who are unfamiliar, is a remote execution framework slash configuration management system, and it let me participate in code development. It turns out that when you have a pattern of merging every random pull request that some jackass winds up submitting, and then immediately submitting a follow-up pull request that fixes everything you just merged in, it's, first, not the most scalable thing in the world, but on balance it provides such a wonderful, welcoming community that people become addicted to participating in it. And SaltStack nailed this in the early days.

Now, before this, I'd been focusing on configuration management in a variety of different ways. One of the very early answers in this space was CFEngine, which was written by an academic and is exactly what you would expect an academic to write: it feels more theoretical than something you would want to use in production. Bcfg2 was something else in this space, and the fact that that is its name tells you everything you need to know about how user-friendly it was. And then the world shifted. We saw Puppet and Chef both arise. You can argue which came first; I don't care enough in 2020 to have that conversation. But they promoted a model of a domain-specific language, in Puppet's case, versus Chef, which decided, "All right, great, we're going to build this stuff out in Ruby." From there, we saw a further evolution in Ansible and SaltStack, which round out the top four. Now, all of these systems fundamentally do the same thing, which is make the current state of a given system look like it should. That means: how do you make sure that certain packages are installed across your entire fleet? How do you make sure that the right users exist across your entire fleet? How do you guarantee that files are in place with the right contents, and when the contents of those files change, how do you restart services? Effectively, how do you run arbitrary commands and converge the state of a remote system
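All four of those tools boil down to the same convergence loop the excerpt describes. A toy Python sketch of that idea, not any tool's actual implementation:

```python
import hashlib
import pathlib
import subprocess

def ensure_file(path, content, restart_service=None):
    """Converge one file toward desired contents; restart a service on change.

    A toy illustration of the converge-to-desired-state pattern that
    CFEngine, Puppet, Chef, Ansible, and Salt all implement for real.
    """
    p = pathlib.Path(path)
    desired = hashlib.sha256(content.encode()).hexdigest()
    current = hashlib.sha256(p.read_bytes()).hexdigest() if p.exists() else None
    if current == desired:
        return  # already converged: do nothing, stay idempotent
    p.write_text(content)
    if restart_service:
        # Only restart when the file actually changed (the "watch" pattern).
        subprocess.run(["systemctl", "restart", restart_service], check=True)
```

Running it twice in a row performs work only the first time, which is the whole point: the tools converge state rather than blindly re-run commands.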

Mar 20, 2020 · 13 min

Ep 61: The Saddest Kubernetes Hanukkah

AWS Morning Brief for the week of March 16, 2020.

Mar 16, 2020 · 9 min

Ep 60: Whiteboard Confessional: Everything's a Database Except SQLite

Links
CHAOSSEARCH.io
SQLite

Transcript
Corey Quinn: Welcome to AWS Morning Brief: Whiteboard Confessional. I'm Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is "theory", because invariably whatever you've built works in theory, but not in production. Let's get to it.

On this show, I talk an awful lot about architectural patterns that are horrifying. Let's instead talk for a moment about something that isn't horrifying: CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you've come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care for and feed yourself, you now get to focus on just storing data, treating it like you normally would other S3 data: not replicating it, not storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

Many things make fine databases: they replicate data from one place to another, and they take various bits of data and put them where they need to go. Other things do not make fine databases. Let's talk about one of those today. For those who have never had the dubious pleasure of working with it, SQLite is a C library that implements a relational database engine. And it's pretty awesome. It's very clearly not designed to work in a client-server fashion, but rather to be embedded into existing programs for local use. In practice, that means that if you're running SQLite (that's S-Q-L-I-T-E), your database backend is going to be a flat file, or something very much like one, that lives locally. This technology is used all over the place: in mobile apps, in embedded systems, and in web apps for some very specific things. But that's not quite the point. I once worked somewhere that decided to build a replicated environment that was active, active, active across three distinct data centers. You would really hope that that statement was a non sequitur. It's not. If you were to picture Hacker News coming to life as a person, and that person decided to design a replication model for a database from first principles, you would be pretty close to what I have seen. You can get a replicated model running on top of SQLite to work, but because there's no concept of client-server, as mentioned, you have to kick all of the replication and state logic up from the database layer, where it belongs, into the application code itself, where it most assuredly does not belong.

The downside of this (well, there are many downsides, but let's start with a big one) is that this is not even slightly what SQLite was designed to do at all. However, take a startup that decides that if there's one core competency they have, it's knowing better than everyone else; this is that story. Now, I am obviously not a developer, and I'm certainly not a database administrator. I was an ops person, which means that a lot of the joy of various development decisions fell to whatever group I happened to be in at that point in time. It turns out that when you run replicated SQLite as a database, you have to get around an awful lot of architectural pain points by babying this thing something fierce. There are a number of operational problems that going down a path like this will expose. Let me explain what some of them look like, after this.

In the late 19th and early 20th centuries, democracy flourished around the world. This was good for most folks, but terrible for the log analytics industry, because there was now a severe shortage of princesses to kidnap for ransom to pay for their ridiculous implementations. It doesn't have to be that way. Consider CHAOSSEARCH. The data lives in your S3 buckets in your AWS accounts, and we know what that costs. You don't have to deal with running massive piles of infrastructure to be able to query that log data with APIs you've come to know and tolerate, and they're just good people to wor
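For contrast, this is the use case SQLite is actually designed for: one process, one local flat file, no server and no replication. A minimal sketch with Python's built-in sqlite3 module:

```python
import sqlite3

# SQLite's intended shape: an embedded engine writing to one local file.
# There is no server process; the entire "database" is local.db on disk.
conn = sqlite3.connect("local.db")
with conn:  # commits on success, rolls back on error
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("greeting", "hello"))

row = conn.execute("SELECT v FROM kv WHERE k = ?", ("greeting",)).fetchone()
print(row)  # ('hello',)
```

Everything in the episode's horror story (multi-writer replication across data centers) has to be bolted on above this layer, which is exactly the problem.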

Mar 13, 2020 · 11 min

Ep 59: Nothing's Certain but Death and Distinguished Engineers

AWS Morning Brief for the week of March 9, 2020.

Mar 9, 2020 · 11 min

Ep 58: Whiteboard Confessional: Scaling Databases in a Single Bound

Links
CHAOSSEARCH.io
Amazon Route 53

Transcript
Corey: Welcome to AWS Morning Brief: Whiteboard Confessional. I'm Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is "theory", because invariably whatever you've built works in theory, but not in production. Let's get to it.

But first… On this show, I talk an awful lot about architectural patterns that are horrifying. Let's instead talk for a moment about something that isn't horrifying: CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you've come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care for and feed yourself, you now get to focus on just storing data, treating it like you normally would other S3 data: not replicating it, not storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

So I'm going to deviate slightly from the format that I've established so far in these Friday morning Whiteboard Confessional stories, and talk instead about a pattern that has tripped me and others up more times than I care to remember. It's my naive hope that by venting about this for the next 10 minutes or so, I will eventually encounter an environment where someone hasn't made this particular mistake. And what mistake am I talking about? Well, as with so many terrifying architectural patterns, it goes back to databases. You decide that you're going to write a small toy application. You're probably not going to turn this into anything massive, and in all honesty, baby seals will probably get more hits than whatever application you're about to build. So you don't think too hard about what your database structure is going to look like. You spin up a database, you define the database endpoint inside the application, and you go about your merry way. Now, that's great. Everything's relatively happy, and everything we just described will work. But let's say you hit that edge or corner case where this app doesn't fade away into obscurity. In fact, it turns out to have some legs. The thing you're building has now attained business viability, or is at least seeing enough user traffic that it has to worry about load.

So you start taking a look at this application, because you get the worst possible bug report six to eight months later: it's slow. Where do you start looking when something is slow? Well, personally, I start looking at the bar, because that is a terribly obnoxious problem to have to troubleshoot. There are so many different ways that latency can get injected into an application. You discover the person reporting the slowness is on the other side of the world with a satellite internet connection that they're apparently trying to set up to the satellite with a tin can and a very long piece of string. There are a lot of failure states here that you get to start hunting down. The joys of latency hunting. But in many cases, the answer is going to come down to: oh, that database that you defined is no longer up to the task. You're starting to bottleneck on that database. Now, you can generally buy your way out of this problem by scaling up whatever database you're using. Terrific, great: it turns out that you can just add more hardware, which, in a time of cloud, just means more money and a bit of downtime while you scale the thing up. That gets you a little bit further down the road, until the cycle rinses and repeats, and it turns out that instances only come so large for you to power databases with. Also, they're not exactly inexpensive. Now, I would name exact sizes of what those databases might look like, but this is AWS; they'll probably have released at least five different instance families and sizes between the time I finish recording this and when it gets published at the end of the week. So instead, there is an alt
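One small hedge against the pattern described above is to keep the endpoint out of the code entirely, so it can be repointed without a deploy. A sketch assuming a PostgreSQL-backed app; the hostnames, database name, and environment variable names are all hypothetical:

```python
import os

import psycopg2  # assumption: PostgreSQL; any driver works the same way

# Hypothetical endpoint: point DB_HOST at a DNS alias you control, so the
# database can be scaled up, moved, or given a read replica by repointing
# one record or one variable instead of shipping new application code.
DB_HOST = os.environ.get("DB_HOST", "db-primary.internal.example.com")

conn = psycopg2.connect(
    host=DB_HOST,
    dbname="app",
    user="app",
    password=os.environ["DB_PASSWORD"],  # never hardcoded alongside the endpoint
)
```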

Mar 6, 2020 · 11 min

Ep 57: Amazon Transcribe Gets <REDACTED>

AWS Morning Brief for the week of March 2, 2020.

Mar 2, 2020 · 10 min

Ep 56: Whiteboard Confessional: How Cluster SSH Almost Got Me Fired

Links
CHAOSSEARCH.io
Cluster SSH GitHub repository
AWS Systems Manager Session Manager
EC2 Instance Connect

Transcript
Corey: On this show, I talk an awful lot about architectural patterns that are horrifying. Let's instead talk for a moment about something that isn't horrifying: CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you've come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care for and feed yourself, you now get to focus on just storing data, treating it like you normally would other S3 data: not replicating it, not storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

So, once upon a time, way back in the mists of antiquity, there was a year called 2006. This was before many folks listening to this podcast were involved in technology, and, I admit as well, several decades after other folks listening got involved in technology. But that's not the point of this story. It was my first real job working in anything resembling a production-style environment. I'd dabbled before this, doing Windows desktop-style support in various environments, and I'd played around with small business servers for running Windows-style environments. Then I decided there wasn't much of a future in technology and spent some time as a technical recruiter, and a little more time working in a sales role, which I was disturbingly good at, but I was selling tape drives to people. But that's not the interesting part of the story. What is, is that I somehow managed to luck my way into a job interview at a university, helping to run their Linux and Unix systems. Cool. It turns out that interviewing is a skill like any other. The technical reviewer was out sick that day, and they really liked both the confidence of my answers and my personality. That's two mistakes right there. One: my personality is exactly what you would expect it to be. Two: hiring the person who sounds the most confident is exactly what you don't want to do. It also tends to lend credence to people who look exactly like me. So, in my first few months in that role, I converted some systems over to FreeBSD, which is like Linux, except it's not Linux: it's a Unix, and far older, derived from the Berkeley Software Distribution. And managing a bunch of those systems at scale was a challenge. Now understand, in this era "scale" meant something radically different than it does today. I had somewhere between 12 and 15 nodes that I had to care about. Some were mail servers. Some were NTP servers, of all things.

There were utility boxes here and there, the omnipresent web servers that we all dealt with, the Cacti box whose primary job was to get compromised and serve as an attack vector for the rest of the environment, et cetera. This was a university. Mistakes didn't necessarily mean the same thing there as they would in revenue-generating engineering activities. I was also young and foolish, and the statute of limitations has almost certainly expired by now. So, running the same command on every box was annoying. This was in the days before configuration management was really a thing. Bcfg2 was out there and incredibly complex, and CFEngine was also out there, which required an awful lot of in-depth arcane knowledge that I frankly didn't have. Remember, I bluffed my way into this job and was learning on the fly. So I did a little digging and, lo and behold, I found a tool that solved my problems, called ClusterSSH. And oh, was it a cluster. The way this worked was that you provided it a list of hosts, and it would spin up a different xterm window on your screen for every host you gave it. Great. So now I'm logged into all of those boxes at once. If this is making you cringe already, it probably should, because this is not a great architectural pattern. But here we are, we're telling this story, so you probably know how that worked out. One of the intricacies of FreeBSD is how it decides which services to start on boot. For example, with Red Hat-derived systems, before the dark times of systemd, you could write things
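For the curious, the same run-one-command-everywhere job that ClusterSSH did with a wall of xterms can be sketched today with the paramiko library; the host list and username here are hypothetical:

```python
import paramiko

# Hypothetical host list; in 2006 this would have been those 12 to 15 nodes.
HOSTS = ["mail1.example.edu", "ntp1.example.edu", "web1.example.edu"]

def run_everywhere(command):
    """Run one command on each host in turn, instead of typing the same
    keystrokes into a dozen xterm windows simultaneously."""
    for host in HOSTS:
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host, username="admin")  # assumes key-based auth
        _, stdout, stderr = client.exec_command(command)
        print(host, stdout.read().decode(), stderr.read().decode())
        client.close()

run_everywhere("uptime")
```

Running hosts serially also means a bad command fails on one box before it fails on all of them, which is rather the moral of this episode.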

Feb 28, 2020 · 14 min

Ep 55: RSA Thinks AWS Firewall Manager is a Job Title

AWS Morning Brief for the week of February 24, 2020.

Feb 24, 2020 · 12 min

Ep 54: Whiteboard Confessional: Route 53 DB

Transcript
Corey: Welcome to AWS Morning Brief: Whiteboard Confessional. I'm Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is "theory", because invariably whatever you've built works in theory, but not in production. Let's get to it.

But first… On this show, I talk an awful lot about architectural patterns that are horrifying. Let's instead talk for a moment about something that isn't horrifying: CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you've come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care for and feed yourself, you now get to focus on just storing data, treating it like you normally would other S3 data: not replicating it, not storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

I frequently joke on Twitter about my favorite database being Route 53, which is AWS's managed database service. It's a fun joke, to the point where I've become Route 53's de facto technical evangelist. But where did this whole joke come from? It turns out that it started life as an unfortunate architecture that was taken in a terrible direction. Let's go back in time, at this point almost 15 years from the time of this recording in the year of our Lord 2020. We had a data center running a whole bunch of instances. In fact, we had a few data centers, or datas center, depending upon how you choose to pluralize; that's not the point of this ridiculous story. Instead, what we're going to talk about is what was inside these data centers. In this case: servers. I know, serverless fans, clutch your pearls, because that was a thing that people had many, many, many years ago, also known as roughly 2007. And on those servers there was a new technology running that was really changing our perspective on how we dealt with systems. I am, of course, referring to the amazing, transformative revelation known as virtualization. This solved the problem of computers being bored and not being able to process things in a parallelized fashion (because you didn't want all of your applications running on all of your systems) by building artificial boundaries between different application containers, for lack of a better term. Now, in these days, these weren't applications; these were full-on virtualized operating systems, so you had servers running inside of servers, and this was very early days. Cloud wasn't really a thing.

It was something on the horizon, if you'll pardon the pun. So this led to an interesting question: "All right, I wound up connecting to one of my virtual machines, and there's no good way for me to tell which physical server that virtual machine is running on." How could we solve for this? Now, back in those days, the hypervisor technology we used was Xen (that's X-E-N); it's incidentally the same virtualization technology that AWS started out with for many years, before releasing their KVM-derived Nitro hypervisor a couple of years ago. Again, not the point of this particular story. One of the interesting pieces of how this worked was that Xen, at least in those days, didn't really expose anything you could use to query the physical host a guest was running on. So how would we wind up doing this? Now, at very small scale, where you have two or three servers sitting somewhere, it's pretty easy: you log in and you check. At significant scale, that starts to get a little more concerning. How do you figure out which physical host a virtual instance is running on? Well, there are a bunch of schools of thought you can approach this from. But what you're trying to build is known, technically, as a configuration management database, or CMDB. This is, of course, radically different from configuration management such as Puppet, Chef, Ansible, Salt, and other similar tooling. But, again, this is t
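Given the episode title, this story is presumably heading toward stuffing that mapping into DNS. A hedged boto3 sketch of a Route 53 TXT record as a tiny (and ill-advised) CMDB; the hosted zone ID, record names, and host names are hypothetical:

```python
import boto3

route53 = boto3.client("route53")

# Hypothetical zone and names: record which physical host a VM lives on
# as a TXT record, turning Route 53 into a lookup table it was never
# meant to be. Replicated, durable, cheap... and still a terrible idea.
route53.change_resource_record_sets(
    HostedZoneId="Z0000000EXAMPLE",
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "vm-42.cmdb.example.com.",
                "Type": "TXT",
                "TTL": 60,
                # TXT record values must be wrapped in double quotes.
                "ResourceRecords": [{"Value": '"physical-host=rack3-node7"'}],
            },
        }]
    },
)
```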

Feb 21, 2020 · 13 min

Ep 53: EBS Gets Overly Multi-Attached

AWS Morning Brief for the week of February 17, 2020.

Feb 17, 2020 · 12 min

Ep 52: Polly Brand Voice Want a Platypus?

AWS Morning Brief for the week of February 10, 2020.

Feb 10, 2020 · 11 min

Ep 51: Networking in the Cloud Fundamentals: BGP Revisited with Ivan Pepelnjak

Transcript
Corey: Hello and welcome to our Networking In The Cloud mini series, sponsored by ThousandEyes. That's right: there may be just one of you, but there are a thousand eyes. On a more serious note, ThousandEyes released their cloud performance benchmarking report for 2019 at the end of last year, talking about what it looks like when you race various cloud providers. They looked at all the big cloud providers and determined: what does performance look like from an end user perspective? What does the user experience look like among and between different cloud providers? To get your copy of this report, you can visit snark.cloud/realclouds. Why real clouds? Well, because they raced AWS, Azure, GCP, IBM Cloud, and Alibaba, all of which are real clouds. They did not include Oracle Cloud because, once again, they are real clouds. Check out your copy of the report at snark.cloud/realclouds.

Welcome to week 12 of the Networking In The Cloud mini series of the AWS Morning Brief, sponsored by ThousandEyes. One of the early episodes of this mini series had me opining in relatively uninformed broad brush strokes about the nature of BGP. Today I am joined by Ivan Pepelnjak, a former CCIE who wrote a fascinating blog post, which I will link to in the show notes, saying, "This is great, but this is what happens when someone who's good at one thing steps completely out of their comfort zone into things they don't fully understand and starts opining confidently, if not authoritatively." Ivan, thank you for taking the time to speak with me.

Ivan: Thanks for having me on. And no, I was way more polite than your summary.

Corey: Absolutely. I believe that there's a way to tell a story as the hero's journey that everyone talks about when they're building a narrative arc. Instead, I go for the moron's journey, and I always like to be the moron, because, generally, I tend to be. As I walk through the world and get things sometimes right, occasionally wrong, I love being corrected when I stumble blindly into an area I don't know. First because it gives me an opportunity to learn something new, which is great, but it also gives me that opportunity to be the dumbest person in the room again, which is awesome. So...

Ivan: That's exactly why I blog: to get your opinions.

Corey: Exactly. You have data, I have opinions, and mine are louder, which seems to be the way discourse works in the modern era. So, from a high level, what did I get wrong about BGP?

Ivan: Well, you got everything right about the mess that we are in and the fragility of the generic internet infrastructure. The only thing you got wrong was that you blamed the tool, and not the people using the tool.

Corey: It always feels safer, on some level, to blame technology, because if the takeaway is, "Well, the user experience around tool X isn't great, and that adds a contributing factor to why things break," that seems to be a message that carries slightly better than, "And thus the answer is for everyone to be smarter and stop screwing up." And that may very well be the answer. It's just a bitter pill to swallow sometimes. So I find blaming a tool is easy.

Ivan: Yeah, but it's like blaming knives for people getting cut, or blaming the chainsaw for people cutting off their arm because they were not properly trained.

Corey: One of my assertions was that BGP is more or less a hot mess because it was designed for an era when people on the internet could fundamentally trust one another, and that doesn't seem to be the case today. The analogy in my mind, which I don't think I mentioned, was SMTP, the email protocol, for lack of a better term. When that was built, the internet was more or less comprised of researchers, and who in the world would ever abuse a protocol like email? It's not like there was any money involved in the internet. Fast forward to today, and your spam folder is inherently a garbage fire.

Ivan: Yeah, but BGP has a slightly different history. It was redesigned a few times; there were several attempts to get the global routing protocol right. And BGP, the last attempt, already included the tools that allow entities that don't trust each other, like commercial internet service providers, to exchange information and apply policies to inbound and outbound updates. So, for example: I don't want to hear about your customers because I hate you and I don't want to peer with you, or I don't want to tell you about my customer because that customer has a special deal and their traffic can only go through so

Feb 6, 2020 · 22 min

Ep 50: Lies, Damned Lies, and Sponsored Benchmarks

AWS Morning Brief for the week of February 3, 2020.

Feb 3, 2020 · 10 min

Ep 49: Networking in the Cloud Fundamentals: Cloud and the Last Mile

Transcript
Corey: Hello and welcome to our Networking in the Cloud mini series, sponsored by ThousandEyes. That's right: there may be just one of you, but there are a thousand eyes. On a more serious note, ThousandEyes released their cloud performance benchmarking report for 2019 at the end of last year, talking about what it looks like when you race various cloud providers. They looked at all the big cloud providers and determined: what does performance look like from an end user perspective? What does the user experience look like among and between different cloud providers? To get your copy of this report, you can visit snark.cloud/realclouds. Why real clouds? Well, because they raced AWS, Azure, GCP, IBM Cloud, and Alibaba, all of which are real clouds. They did not include Oracle Cloud because, once again, they are real clouds. Check out your copy of the report at snark.cloud/realclouds.

It's interesting that that report focuses on the end user experience, because as this mini series begins to wind down, we're talking today about the last mile and its impact on what perceived cloud performance looks like. And I will admit that even having given this entire mini series, and having a bit of a network engineering background once upon a time, I still wind up in a fun world of always defaulting to blaming my local crappy ISP.

Now, today, my local ISP is amazing. I use Sonic in San Francisco. I get symmetric gigabit. It's the exact opposite of Comcast, who was my last provider until Sonic came to my neighborhood. It was fun that day, because I looked up and down the block and saw no fewer than six Sonic trucks ripping Comcast out by the short and curlies. Which, let's not kid ourselves, is something we all wish we could do, and I was the happiest boy in town the day I got to do it. Now, the hard part is figuring out that, yes, it is in fact a local ISP problem, because it isn't always. This is also fortuitous because I spent the last month or so fixing my own local internet situation, and today I'd like to tell you a little bit more about that, as well as how and why.

Originally, when I first moved into my roughly, we'll call it 2,800 square foot house, spread across three stories, I wound up getting Eeros, that's E-E-R-O, a mesh network setup whose maker was acquired by Amazon after I'd purchased them. These were generation one. The wireless environment in San Francisco is challenging, and in certain parts of my house the reception, as a result, wound up being a steaming bowl of horse crap. The big challenge was figuring out that that's what the problem was; with weird dropouts and handoff issues, it was interesting. The one change that caused immediate improvement was not having these things talk to each other wirelessly, as most full mesh systems will do, but instead making sure they were cabled up appropriately to a switch at the central patch panel and hooked in that way.

Now, you have to be careful with switches, because a lot of consumer gear is crap and won't do anything approaching full throughput, because that can get expensive. This was a managed HP ProCurve device, back in the days that HP made networking equipment. That was great. And it's still crap, but it is crap that works at full line rate. So there's that. Next, I figured that, all right, it's time to take this seriously. So I did some research and talked to people I know who are actually good at things, instead of just sounding on the internet like they're good at things. And I figured the next step was to buy some Ubiquiti Networks-style gear. Great. We go ahead and trot some of that out. It's enterprise gear, it's full mesh. I of course now have a guest Wi-Fi hotspot that you have to pay to use, with "Toss a Coin to Your Wi-Fi" as its SSID, because of course it does. I have problems. And it's fun, and I can play these stupid games, but suddenly every weird internet problem I had in my house started getting better as a result.

And it's astonishing how that changed my perception of various third-party services, none of whom, by the way, had anything to do with my actual problem. But there were still some perceptual differences. And this impacts the cloud in a number of subtle ways, and that's what I want to talk about today. One of the biggest impacts is DNS. And I don't mean that in the sense of big cloud provider DNS; we've already talked about how DNS works in a previous episode. Rather, I mean what resolver you wind up using yourself. One of the things that I did as a part of th
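Comparing resolvers yourself is straightforward; here's a toy sketch with the dnspython library, sampling one query against two well-known public resolvers. This is illustrative, not a rigorous benchmark (a single query tells you little; caching and network jitter dominate):

```python
import time

import dns.resolver  # pip install dnspython

# Toy resolver latency comparison; 1.1.1.1 and 8.8.8.8 are public resolvers.
for nameserver in ("1.1.1.1", "8.8.8.8"):
    resolver = dns.resolver.Resolver(configure=False)  # ignore /etc/resolv.conf
    resolver.nameservers = [nameserver]
    start = time.perf_counter()
    resolver.resolve("example.com", "A")
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{nameserver}: {elapsed_ms:.1f} ms")
```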

Jan 30, 2020 · 14 min

Ep 48: Dedicated T3 Instances Burst My Understanding

AWS Morning Brief for the week of January 27th, 2020.

Jan 27, 2020 · 12 min

Ep 47: Networking in the Cloud Fundamentals: Connectivity Issues in EC2

Transcript
Corey: Welcome to the AWS Morning Brief's mini series, Networking In the Cloud, sponsored by ThousandEyes. ThousandEyes has released their cloud performance benchmark report for 2020. They effectively race the top five cloud providers: AWS, Google Cloud Platform, Microsoft Azure, IBM Cloud, and Alibaba Cloud, notably not including Oracle Cloud, because it is restricted to real clouds, not law firms. It's an unbiased, third-party, metric-based perspective on cloud performance as it relates to end user experience. So this comes down to what real users see, not arbitrary benchmarks that can be gamed. It talks about architectural and connectivity differences between those five cloud providers and how that impacts performance. It talks about AWS Global Accelerator in exhausting detail. It talks about the Great Firewall of China and what effect that has on cloud performance in that region. And it talks about why regions like Asia and Latin America experience increased network latency on certain providers. To get your copy of this fascinating and detailed report, visit snark.cloud/realclouds, because again, Oracle's not invited. That's snark.cloud/realclouds, and my thanks to ThousandEyes for their continuing sponsorship of this ridiculous podcast segment.

Now, let's say you go ahead and spin up a pair of EC2 instances, and, as would never happen until suddenly it does, you find that those two EC2 instances can't talk to one another. This episode of the AWS Morning Brief's Networking in the Cloud podcast focuses on diagnosing connectivity issues in EC2. It is something that people don't have to care about, until suddenly they really, really do. Let's start with our baseline premise: we've spun up an EC2 instance, and a second EC2 instance can't talk to it. How do we go about troubleshooting our way through that process?

The first thing to check, above all else (and this goes back to my grumpy Unix systems administrator days), is: are both EC2 instances actually up? Yes, the console says they're up. It is certainly billing you for both of those instances; I mean, this is the cloud we're talking about. And it even says that the monitoring checks (there are two by default for each instance) are passing. That doesn't necessarily mean as much as you might hope. If you go into the EC2 console, you can validate through the system logs that they booted successfully. You can pull a screenshot out of them. If everything else were working, you could use AWS Systems Manager Session Manager, and, if you'll forgive the ridiculous name, that's not a half bad way to go about getting access to an instance: it spins up a shell in a browser that you can poke around inside that instance with. But that may or may not get you where you need to go. I'm assuming you're trying to connect to one or both of those instances and failing, so validate that you can get into both of those instances independently.

Something else to check: consider protocols. Very often, you may not have permitted SSH access to these things. Or maybe you can't ping them and you're assuming they're down. Well, an awful lot of networks block certain types of ICMP traffic: echo requests (type eight), for example. Otherwise, you may very well find that whatever protocol you're attempting to use isn't permitted all the way through. Note, incidentally, just as an aside, that blocking all ICMP traffic is going to cause problems for your network. When things are fragmented and need to have a different window size set for things being sent across the internet, ICMP traffic is how the endpoints are made aware of that. You'll see increased latency if you block all ICMP traffic, and it's very difficult to diagnose, so please, for the love of God, don't do that.

Something else to consider as you tear apart what could possibly be going on with these EC2 instances not being able to speak to each other: try connecting to them via IP addresses rather than DNS names. I'm not saying the problem is always DNS, but it usually is DNS, and going by IP address removes a whole host of different problems that could be manifesting: suddenly resolution failures, timeouts, bad DNS, et cetera, all fall by the wayside. When you have one system trying to talk to another system using only IPs, there's a whole host of problems you don't have to think about.

Something else to consider in the wonderfu
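The "are they actually up" check lends itself to scripting. A hedged boto3 sketch; the instance IDs are hypothetical:

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical instance IDs. IncludeAllInstances also reports instances
# that aren't in the running state, which is exactly what you want here.
resp = ec2.describe_instance_status(
    InstanceIds=["i-0123456789abcdef0", "i-0fedcba9876543210"],
    IncludeAllInstances=True,
)
for status in resp["InstanceStatuses"]:
    print(status["InstanceId"],
          status["InstanceState"]["Name"],
          status["SystemStatus"]["Status"],    # one of the two default checks
          status["InstanceStatus"]["Status"])  # ...and the other
```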

Jan 23, 2020 · 15 min

Ep 46: AWS Back-All-The-Way-Up

AWS Morning Brief for the week of January 20th, 2020.

Jan 20, 2020 · 9 min

Ep 45: Networking in the Cloud Fundamentals: Data Transfer Pricing

Transcript
Corey: Welcome to the AWS Morning Brief, specifically our 12-part mini series, Networking In The Cloud, sponsored by ThousandEyes. ThousandEyes recently released their state of the cloud benchmark performance report. They raced five clouds together and gave a comparative view of the networking strengths, weaknesses, and approaches of those various providers. Take a look at what it means for you: there's actionable advice hidden within, as well as incredibly useful comparative data, so you can start comparing apples to oranges instead of apples to baseballs. Check them out and get your copy today at snark.cloud/realclouds. That's snark.cloud/realclouds, because Oracle Cloud was not invited to participate.

Now, one thing they did not bother to talk about in that report is how much all of that data transfer across different providers costs. Today I'd like to talk about that. Which is a bit of a lie, because I'm not here to talk about it at all; I'm here to rant like a freaking lunatic, for which I make no apologies whatsoever. This episode is about data transfer pricing in AWS, because honestly I need to rant about something, and this topic is entirely too near and dear to my heart, given that I spend most of my time fixing AWS bills for various interesting and sophisticated clients.

Let's begin with a simple question, the answer to which is guaranteed to piss you off like almost nothing else: what does it cost to move a gigabyte of data in AWS? Think about that for a second. The correct answer, of course, is that nobody freaking knows. There is no way to get a deterministic answer to that question without asking a giant boatload of other questions. Let me give you some examples. Before I do, I would like to call out that every number I'm about to mention applies only to us-east-1, because different regions in different places have varying costs; every single one of these numbers is different in other places, sometimes, but not always. Why? Because things are awful. I told you I was going to rant. I'm not apologizing for it at this point.

Let's begin simply and talk about what it takes to just shove a gigabyte of data into AWS. Now, in most cases that's free. Inbound bandwidth to AWS is usually free, until it passes through a load balancer or does something else, but we'll get there. What does it cost to move data between two AWS regions? Great: the answer to that is two cents per gigabyte in the primary regions, except there's one path that costs slightly less, and that is moving between us-east-1 and us-east-2. One is in Virginia, two is in Ohio. That is half price, at one cent per gigabyte. My working theory is that it's because even data wants to get the hell out of Ohio.

Let's take it a step further. Let's say you're in an individual region: what does it cost to move data from one AZ to another? The documentation was exquisitely unclear, and I had to do some experiments, spinning up a few instances in otherwise empty AWS accounts, using dd and netcat to hurl data across various links, and then waiting until it showed up on my bill. The answer: it also costs two cents per gigabyte, the same cost as region to region. It's one cent per gigabyte out of an AZ and one cent per gigabyte into an AZ. And that's right, it means you get charged twice: if you move 10 gigabytes, you are charged for 20 gigabytes on that particular metric. This also has the fun ancillary side effect of meaning that moving data between Virginia and Ohio via cross-region transfer is cheaper than moving that same data between AZs within an existing region.

Oh wait, it gets dumber than that. What do load balancer data transfer fees look like? The correct answer is: who the hell knows? On the old classic load balancers, it was 0.8 cents per gigabyte in or out to the internet, and there was also an instance fee, but that's not what we're talking about today. Traffic from any existing load balancer today to something inside an AZ is free, unless it crosses an availability zone, and then we're back into cross-AZ data transfer territory; anything going from an availability zone to a load balancer costs one cent per gigabyte. Now, the newer load balancer generations, the ALBs and the NLBs: what do those cost? Nobody freaking knows, because data throughput is just one of several dimensions that go into a load balancer capacity unit, which means that what your data transfer price looks like is going to vary wildly, because in this p
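To put the numbers quoted above in one place, here is a back-of-the-envelope calculator; the rates are the us-east-1 figures from this episode, circa early 2020, and have likely changed since:

```python
# Per-gigabyte rates quoted in the episode for us-east-1, circa early 2020.
RATES = {
    "inter_region": 0.02,            # most region pairs
    "us_east_1_to_us_east_2": 0.01,  # the half-price Ohio exception
    "cross_az": 0.02,                # 1 cent out of the AZ + 1 cent into it
}

def transfer_cost(gigabytes, path):
    """Back-of-the-envelope dollar cost for moving data along one path."""
    return gigabytes * RATES[path]

# Moving 10 GB across AZs bills 10 GB out plus 10 GB in: 20 GB metered.
print(transfer_cost(10, "cross_az"))                # 0.2 dollars
print(transfer_cost(10, "us_east_1_to_us_east_2"))  # 0.1 dollars
```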

Jan 16, 2020 · 16 min

Ep 44: Your Database Will Explode in Sixty Seconds

AWS Morning Brief for the week of January 13th, 2020.

Jan 13, 2020 · 12 min

Ep 43: Networking in the Cloud Fundamentals: The Cloud in China

Transcript: Corey: Welcome back to Networking In The Cloud, a special 12-week mini feature of the AWS Morning Brief sponsored by ThousandEyes. This week's topic: the cloud in China. But first, let's talk a little bit about ThousandEyes. You can think of ThousandEyes as the Google Maps of the internet. Just like you wouldn't leave San Jose to drive to San Francisco without checking which freeway to take, because local references always resonate best when telling these stories, businesses rely on ThousandEyes to see the end-to-end paths their applications and services are taking from their servers to their end users, to identify where the slowdowns are, where the pile-ups are, and what's causing those issues. They can use ThousandEyes to figure out what's breaking and, ideally, notify providers before their customers notice. To learn more, visit thousandeyes.com. And my thanks to them for sponsoring this mini-series.

Now, when we're talking about China, I want to start by saying that I'm not here to pass judgment. Here in the United States we're sort of the Oracle Cloud of foreign policy, so Lord knows my hands aren't clean either. Instead, I want to have a factual discussion about what networking in China looks like in the world of cloud in 2020. To start, China is a huge market. The market for cloud services in China this year is expected to reach just over a hundred billion dollars. So there's a lot of money on the table, and a lot riding on companies making significant inroads into an extremely lucrative market that is extremely technologically savvy.

Historically, according to multiple Chinese cloud executives interviewed for a variety of articles, China's enterprise IT market is probably somewhere between five and seven years behind most Western markets. That means there's a huge amount of opportunity for companies to make inroads and make an impact on that market before it winds up dominated, as a lot of Western markets have been, by certain large Seattle-based cloud providers, ahem, ahem.

Now, due to Chinese regulations, a cloud provider operating in China has to be run by a Chinese company. That's why Microsoft works with a company called 21Vianet, whereas AWS has two partners, Beijing Sinnet and NWCD. Those local partners in fact own and operate the physical infrastructure that the cloud providers are building in China and become known as the seller of record, although the US cloud companies do, or at least ostensibly, retain all the rights to their intellectual property: their trademarks, their copyrights, etc.

That said, if you look at any of the large cloud providers' service and region availability tables, there's very clearly a significant lag between when services get released in most regions and when they appear in the mainland China regions. Some of the concern, at least according to people speaking off the record, comes down to worry over intellectual property theft.
And in the current political climate, where we have basically picked an unprovoked trade war with China, that complicates things somewhat heavily, if for no other reason than that companies are extremely skittish about subjecting what they rightly perceive to be incredibly valuable intellectual property to the risks of operating inside mainland China. So on the one hand, they don't want to deal with that. On the other, there are over half a billion people in China with smartphones, and just shy of 900 million people on the internet in one form or another. There's an awful lot of money at stake, so companies find themselves rather willing to overlook some things they otherwise wouldn't want to bother with. Again, I'm not here to moralize; I just find the dynamic somewhat fascinating.

Most of that you can find out just from reading news articles and various press releases. So let's go a little further into how companies are servicing the Chinese market. Not for nothing, I'm picking on AWS because they're the incumbent in this space, and this is the AWS Morning Brief. Looking at the map on my wall, they have regions in Tokyo, Seoul, Hong Kong, Singapore, and Mumbai. If you squint enough, that forms a periphery around the outside of mainland China. Here in the real world, if it's at all feasible, companies tend to use those regions scattered around China rather than the regions within China, and then provide services to th…

Jan 9, 2020 · 12 min

Ep 42: Burning Amazon Lex to CD-ROM

AWS Morning Brief for the week of January 6th, 2020.

Jan 6, 2020 · 9 min

Ep 41: Listener Mailbag

AWS Morning Brief for the week of December 30th, 2019.

Dec 30, 2019 · 17 min

Ep 40: It's a Horrible Lyfebin

AWS Morning Brief for the week of December 23rd, 2019.

Dec 23, 2019 · 14 min

Ep 39: Networking in the Cloud Fundamentals: Regions and Availability Zones in AWS

Transcript: Corey: Hello, and welcome back to our Networking in the Cloud mini-series, sponsored by ThousandEyes. That's right: ThousandEyes' State of the Cloud Performance Benchmark Report is now available for your perusal. It provides a lot of the baseline we're drawing this mini-series' information from. It pointed us in a bunch of interesting directions and helps us tell stories that are, for a change, actually backed by data rather than pure sarcasm. To get your copy, visit snark.cloud/realclouds, because it only covers real cloud providers. Thanks again to ThousandEyes for their ridiculous support of this shockingly informative podcast mini-series.

It's a basic fact of cloud that things break all the time. I've been joking for a while that a big competitive advantage Microsoft brings to this space is that they have 40 years of experience apologizing for software failures, except that's not really a joke; it's true. There's something to be said for the idea that apologizing to both technical and business people about real or perceived failures is its own skill set, and they have more experience at it than anyone else in this space.

There are two schools of thought around how to avoid having to apologize for service or component failures to your customers. The first is to build super expensive but super durable things. You can kind of get away with this in typical data center environments, right up until you can't, and then it turns out your SAN just exploded. You're really not diversifying with most SANs; you're just putting all of your eggs in a really expensive basket, and of course, if you're hit with a power or networking outage, nothing can talk to the SAN, and you're back to square one.

The other approach is to come at it from the perspective of building redundancy into everything and eliminating single points of failure. That's usually the better path in the cloud. You don't ever want a single point of failure if you can reasonably avoid it, so going with multiple everythings starts to make sense, to a point. Going with a full-on multi-cloud story is a whole separate kettle of nonsense we'll get to another time. But at some point you realize you will still have single points of failure, and you're not going to be able to solve for all of them. We still only have one planet going around one sun, for example; if either of those things explodes, well, computers aren't really anyone's concern anymore. However, betting the entire farm on one EC2 instance is generally something you'll want to avoid if at all possible.

In the world of AWS, there aren't data centers in the way you or I would contextualize them. Instead, they have constructs known as availability zones, and those compose to form a different construct called regions.
Presumably, other cloud providers have similar constructs over in non-AWS land, but we're focusing on AWS's implementation in this series, again because they have a giant multi-year head start over every other cloud provider, and even that manifests in those other providers comparing what they've built and how they operate to AWS. If that upsets you and you work at one of those other cloud providers, well, you should have tried harder. Let's dive into a discussion of data centers, availability zones, and regions.

Take an empty warehouse and shove it full of server racks. Congratulations: you've built the bare minimum requirement for a data center at its most basic level. Your primary constraint, and why it's a lot harder than it sounds, is power, and to a lesser extent cooling. Computers aren't just crunching numbers; they're also throwing off waste heat, and you've got to think an awful lot about how to keep that heat out of the data center.

At some point, you can't shove more capacity into that warehouse-style building, simply because you can't cool it all if it's running at the same time. If your data center is particularly robust, meaning you didn't cheap out on it, you're going to have different power distribution substations feeding the building from different lines that enter at different corners. You'll see similar things with cooling as well: multiply redundant cooling systems.

One of the big challenges when dealing with this physical infrastructure is validating that what it says on the diagram is what's actually there in the physical environment. That can be trickier to verify than you would hope. Also, if you have a whole bunch of systems sitting in that warehouse and you…
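As a back-of-the-envelope illustration of why the multi-AZ approach wins, here's a worked sketch. The 99.9% figure is an assumption for the arithmetic, not an AWS SLA number, and it assumes zones fail independently, which real correlated failures can violate.

```python
# A hedged, back-of-the-envelope sketch: assuming each availability zone
# fails independently with the same probability, the chance that ALL of
# them are down at once shrinks geometrically with the zone count.
# The 99.9% figure is an assumption for illustration, not an AWS SLA.

def combined_availability(zone_availability: float, zones: int) -> float:
    """Probability that at least one of `zones` independent AZs is up."""
    p_all_down = (1.0 - zone_availability) ** zones
    return 1.0 - p_all_down

if __name__ == "__main__":
    for n in (1, 2, 3):
        a = combined_availability(0.999, n)
        print(f"{n} AZ(s): {a:.6%} available")
```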

Dec 19, 2019 · 16 min

Ep 38: AWS Dep-Ric-ates Treasured Offering

AWS Morning Brief for the week of December 16th, 2019.

Dec 16, 2019 · 11 min

Ep 37: re:Invent Wrap-up, Part 4

AWS Morning Brief for Friday, December 13th, 2019.

Dec 13, 2019 · 15 min

Ep 36: Networking in the Cloud Fundamentals, Part 6

Transcript: Corey: Knock knock. Who's there? A DDoS attack. A DDoS a... Knock. Knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock.

Welcome to what we're calling Networking in the Cloud, episode six: How Things Break in the Cloud, sponsored by ThousandEyes. ThousandEyes recently launched their State of the Cloud performance benchmark report, which effectively lets you compare and contrast performance and other aspects of the five large cloud providers: AWS, Azure, GCP, Alibaba, and IBM Cloud. Oracle Cloud was not invited, because we're talking about real clouds here. You can get your copy of the report at snark.cloud/realclouds; they compare and contrast an awful lot of interesting things. One thing we're not going to compare and contrast, though, because of my own personal beliefs, is the outages of different cloud providers.

Making people and companies (companies are composed of people, by the way) feel crappy about their downtime is mean, first off. Secondly, if companies are shamed for outages, it becomes far likelier that they won't disclose having suffered one. And when companies talk about their outages in constructive, blameless ways, there are incredibly valuable lessons we can all learn from them. So let's dive into this a bit.

If there's one thing computers do well, better than almost anything else, it's break. And this is, and I'm not being sarcastic when I say this, a significant edge Microsoft has when it comes to cloud: they have 40-some-odd years of experience apologizing for software failures. That's not meant as an insult to Microsoft; it's what computers do, they break. Being able to explain that intelligently to business stakeholders is incredibly important, and they're masters at it. They also have a 20-year head start on everyone else in the space. What makes this interesting and useful is that in the cloud, computers break differently than people would expect them to in a non-cloud environment.

Once upon a time, when you were running servers in data centers, if you saw everything suddenly go offline, you had some options. You could call the data center directly to see if someone cut the fiber; in case you were unaware, the fiber optic cable's sole natural predator in the food chain is the mighty backhoe. So maybe something backhoed out some fiber lines, maybe the power is dead to the data center, maybe the entire thing exploded, burst into flames, and burned to the ground, but you could call people. In the cloud, it doesn't work that way. Here in the cloud, instead you check Twitter, because it's 3:00 AM and Nagios (the original Call of Duty) or PagerDuty calls you, because you didn't need that sleep anyway, telling you there's something amiss with your site.
So when a large cloud provider takes an outage and you're hanging out on Twitter at two in the morning, you can see DevOps Twitter come to life in the middle of the night as they chatter back and forth.

Incidentally, if that's you, understand a nuance of AWS availability zone naming. When people say things like "us-east-1a is having a problem" and someone else says, "No, I just see us-east-1c having a problem," you're probably talking about the same availability zone. Those letters are mapped non-deterministically between accounts. You can pull zone IDs, and those are consistent. By and large, that mapping was originally to avoid problems like everyone picking A, as humans tend to do, or C getting a reputation as the crappy one.

So why would you check Twitter to figure out whether your cloud provider is having a massive outage? Well, honestly, because the AWS status page is completely full of lies and gaslights you. It stays as green as the healthiest Christmas tree you can imagine, even when things have been exploding for a disturbingly long period of time. If you visit the website stop.lying.cloud, you'll find a Lambda@Edge function I've put there that cuts out some of the cruft, but it's not perfect. The reason behind this, after I gave them a bit too much crap one day and got a phone call that started with, "Now you listen here," turns out to be that there are humans in the loop: they need to validate that there is in fact a systemic issue at AWS, figure out what that issue might be, come up with a way to report it that ideally doesn't get people sued, and then manually update the status page. Meanwhile, your site's on fire. So that is a traili…
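The zone-letter shuffling is easy to verify yourself. Here's a minimal sketch using boto3's describe_availability_zones call, assuming boto3 is installed and AWS credentials are configured; it prints your account's letter-to-ID mapping.

```python
# A minimal sketch: print this account's mapping of AZ names (the letters,
# which differ between accounts) to zone IDs (which are consistent).
# Assumes boto3 is installed and AWS credentials are configured.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.describe_availability_zones()

for zone in response["AvailabilityZones"]:
    # e.g. us-east-1a in your account may be a different ZoneId than in mine.
    print(f'{zone["ZoneName"]} -> {zone["ZoneId"]}')
```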

Dec 12, 2019 · 16 min

Ep 35: re:Invent Wrap-up, Part 3

AWS Morning Brief for Wednesday, December 11th, 2019.

Dec 11, 2019 · 10 min

Ep 34: re:Invent Wrap-up, Part 2

AWS Morning Brief for Tuesday, December 10th, 2019.

Dec 10, 2019 · 13 min

Ep 33: re:Invent Wrap-up, Part 1

AWS Morning Brief for the week of December 9th, 2019.

Dec 9, 2019 · 13 min

Ep 32: Wherever You May Rome

AWS Morning Brief for the week of December 2nd, 2019.

Dec 2, 2019 · 14 min

Ep 31: Networking in the Cloud Fundamentals, Part 5

Transcript: Corey: As the world spins faster, it heats up because of friction. Therefore, for the good of humanity, the AWS Global Accelerator must be turned off. Welcome once again to Networking in the Cloud, a 12-week special on the AWS Morning Brief, sponsored by ThousandEyes. Think of ThousandEyes as the Google Maps of the internet, without the creepy privacy implications. Just like you wouldn't necessarily go from one place to another without checking which route was less congested during rush hour, businesses rely on ThousandEyes to see the end-to-end paths their applications and services are taking, from their servers to their end users, or between other servers, to identify where the slowdowns are, where the pile-ups live, and what's causing various issues. They use ThousandEyes to see what's breaking where, and then of course depend upon ThousandEyes to share that data directly with the offending providers, to shame them into accountability and get them to fix the issue. Learn more at thousandeyes.com.

So today we talk about Global Accelerator, an offering AWS announced at re:Invent last year. What is it? Well, when traffic passes through the internet from your computer en route to a cloud provider, or from your data center to a cloud provider, the provider has choices about how to route that traffic in. Remember, every cloud provider we'll be talking about has a global presence, so they have a number of different options.

Some, such as GCP and Azure, will route that traffic into their networks right away, as close to the end user as possible. Others, like AWS and, interestingly, Alibaba, will have that traffic ride the public internet as long as possible, until it gets to the region the traffic is aimed at, and only then ingest it into the provider's network. And IBM has an interesting hybrid approach between the two that doesn't actually matter, because it's IBM Cloud.

Now, Global Accelerator offers a slightly different option. By default, traffic bound for AWS rides the public internet until it hits the region at the end. That means the traffic is subject to latency from public internet congestion, and to non-deterministic latency: some packets will get there faster than others as they take different routes, so jitter becomes a concern.

Global Accelerator sort of flips that behavior on its head. Instead of traveling across the entire internet until it smacks into a region, traffic now lands on AWS's network far sooner, and then rides AWS's backbone to where it needs to go. Then it smacks into one of a number of different endpoints. Today, at the time of this recording, it supports Application Load Balancers, either internal or external; Network Load Balancers; Elastic IPs and whatever you can tie those to; and, of course, EC2 instances, public or private. We'll mention
the caveat about that a little later on. On the other side, toward the internet, what happens is that Global Accelerator gives out two IP addresses that are anycast. What that means is that, using BGP, those addresses are generally routed to the supported region closest to the customer. As a result, AWS can make a lot of changes to network architecture in ways completely invisible to the end user. It supports, for example, shifting traffic to different regions or endpoints, and it can shape how that traffic manifests, on the fly.

Other ways of managing this, such as DNS, run into high TTLs on the client side, which means traffic doesn't shift as quickly as you'd like, and into IP caching once that DNS record has been resolved. You see the anycast approach all over the place with, for example, public DNS resolvers: the same IP addresses are what people use globally to talk to well-known DNS resolvers, and strangely it's always super quick and not traveling across the entire internet. Imagine that.

This is similar in some ways to AWS's CloudFront service. CloudFront is, as mentioned, a CDN with somewhat similar performance characteristics. It generally winds up being a slightly better answer when you're using a protocol like HTTP or HTTPS that the entire CDN service has been designed around. They have a whole bunch of locations scattered across the globe, and sure, it takes a year and a day to update a distribution or deploy a new one in CloudFront, but that's not really the point of…
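To see why DNS-based traffic shifting is sluggish by comparison, you can inspect the TTL a record hands out. Here's a minimal sketch assuming the third-party dnspython package (my choice of tool, not anything AWS ships): until that many seconds elapse, well-behaved clients won't re-resolve, so traffic keeps flowing to the old endpoint.

```python
# A minimal sketch, assuming the third-party dnspython package
# (pip install dnspython). It shows the TTL on an A record: roughly
# how long clients may keep sending traffic to the old address after
# a DNS-based traffic-management scheme repoints the name.
import dns.resolver

answers = dns.resolver.resolve("example.com", "A")
print(f"TTL: {answers.rrset.ttl} seconds")
for record in answers:
    print("A record:", record.address)
```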

Nov 28, 2019 · 15 min

Ep 30: Improving Customers by Stuffing Them Into Containers

AWS Morning Brief for the week of November 25th, 2019.

Nov 25, 2019 · 16 min

Ep 29: Networking in the Cloud Fundamentals, Part 4

Transcript: An IPv6 packet walks into a bar. Nobody talks to it.

Welcome back to what we're calling Networking in the Cloud, a 12-week networking extravaganza sponsored by ThousandEyes. You can think of ThousandEyes as the Google Maps of the internet. Just like you wouldn't dare leave San Jose to drive to San Francisco without checking whether the 101 or the 280 was faster, businesses rely on ThousandEyes to see the end-to-end paths their apps and services are taking, and for localized traffic stories that mean nothing to people outside the Bay Area. This lets companies figure out where the slowdowns are happening, where the pile-ups are, and what's causing issues. They use ThousandEyes to see what's breaking where, and importantly, they share that data directly with the offending service providers to hold them accountable in a blameless way and get them to fix the issue fast, ideally before it impacts their end users. Learn more at thousandeyes.com. And my thanks to them for sponsoring this ridiculous podcast mini-series.

This week we're talking about load balancers. They generally do one thing, and that's balancing load. But let's back up. Let's say that you, against all odds, have a website, and that website is generally built on a computer. You want to share that website with the world, so you put that computer on the internet. Computers are weak and frail and invariably fall over at the worst possible time. They're herd animals; they're much more comfortable together. And of course, we've heard of animals. We see some right over there.

So now you have a herd of computers working together to serve your website. The problem now, of course, is that you have a bunch of computers serving your website. No one is going to want to go to www6023.twitterforpets.com to view your site. They want a unified address that just gets to wherever it has to go. Exposing those implementation details to customers never goes well. Amusingly, if you go to Deloitte, the giant consultancy's website, the entire thing lives at www2.deloitte.com. But I digress. Nothing says "we're having trouble with digital transformation" quite so succinctly.

So you have a special computer, or series of computers, that now live in front of the computers serving your website. That's where you point twitterforpets.com, or www.twitterforpets.com, towards. Those computers are specialized, and they're called load balancers, because that's exactly what they do: they balance load; it says so right there on the tin. They pass incoming web traffic out to the servers behind the load balancer, so those servers can handle your website while the load balancer just handles being the front door that traffic shows up through.

This unlocks a world of amazing possibilities. You can now, for example, update your website or patch the servers without taking your website down with a "back in five minutes" sign on the front of it. You can test new deployments with entirely separate fleets of servers.
This is often called a blue-green deploy, or a red-black deploy, but that's not the important part of the story. The point is that you can start bleeding traffic off to the new fleet and, "Oh my god, turn it off, turn it off, turn it off. We were terribly wrong. The upgrade breaks everything." You can do that: turn traffic on and off to certain versions and see what happens.

Load balancers are simple in concept, but they're doing increasingly complicated things. For instance: you're a load balancer, sitting in front of 200 servers that all do the same thing, because they run the same website and the same application code. How do you determine which one of them receives the next incoming request?

There are a few common patterns. The first, and maybe the simplest, is called round robin; you'll also see it referred to as next-in-loop. Say you have four web servers. Your first request goes to server one, your second request goes to server two, then server three, then server four, and the fifth request goes back to server one. It just rotates through the servers in order and passes out requests as they come in.

This can work super well for some use cases, but it does have some challenges. For example, if one of those servers gets stuck or overloaded, piling more traffic onto it is very rarely going to be the right call. A modification of round robin known as weighted round robin works more or less the same way, but it's smarter. Certain…
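Here's a minimal sketch of those two strategies. The server names and weights are invented for illustration, and a real load balancer would of course also track health checks and connection counts.

```python
# A minimal sketch of the two strategies described above: plain round
# robin and one simple form of weighted round robin. Server names and
# weights are made up; real load balancers also track health.
import itertools
import random

SERVERS = ["web1", "web2", "web3", "web4"]

# Round robin: an endless cycle through the servers in order.
round_robin = itertools.cycle(SERVERS)

# Weighted round robin: beefier servers get proportionally more
# slots in the rotation.
WEIGHTS = {"web1": 3, "web2": 1, "web3": 1, "web4": 1}
weighted_pool = [name for name, w in WEIGHTS.items() for _ in range(w)]
random.shuffle(weighted_pool)  # avoid long bursts of the same server
weighted_round_robin = itertools.cycle(weighted_pool)

if __name__ == "__main__":
    print("round robin:", [next(round_robin) for _ in range(8)])
    print("weighted:   ", [next(weighted_round_robin) for _ in range(12)])
```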

Nov 21, 2019 · 17 min

Ep 28: A CloudFormation Feature of Great Import

AWS Morning Brief for the week of November 18th, 2019.

Nov 18, 2019 · 12 min

Ep 27: Networking in the Cloud Fundamentals, Part 3

Transcript: This episode of Networking in the Cloud is sponsored by ThousandEyes. Their 2019 Cloud Performance Benchmark Report is now live as of yesterday. Find out which clouds do what well: AWS, Azure, GCP, Alibaba, and IBM Cloud all have their networking capabilities raced against each other. Oracle was not invited, because we're talking about actual cloud providers here, not law firms. Get your copy of the report today at snark.cloud/realclouds. That's snark.cloud/realclouds, and it's completely free. Download it and let me know what you think; I'll be cribbing from it in future weeks. Now, for the third week of our AWS Morning Brief "Screaming in the Network," or whatever we're calling it, mini-series on how computers talk to one another: let's talk about the larger internet.

Specifically, we begin with BGP, or Border Gateway Protocol. This matters because it's how different networks talk to one another. If you have a whole bunch of different computer networks gathered into a super network, or "internet" as some people like to call it, how do those networks know where each one lives? Now, from a home user's perspective, or even in some enterprises, that seems like sort of a silly question, because it is: you have a network that lives on your end of things, you plug a single cable in, and every other network lives through that cable. But when you're talking about large, disparate networks, how do they find each other? More to the point, because of how the internet was built, it's designed so that any single failure of another network can be routed around. There are multiple paths to get to different places: some biased for cost, some for performance, some for consistency. And all of those decisions have to be made globally. BGP is the lingua franca of how those networks talk to one another. BGP is also a hot mess.

It's the routing protocol that runs the internet, and it's comprised of different networks, known in this parlance as autonomous systems, or ASes. It was originally designed for a time before jerks ruled the internet, and that's jerks in terms of people causing grief for others, as well as shady corporate interests that are publicly traded on NASDAQ. There's no authentication tied to BGP; effectively, it is trusted to contain correct data. There is no real signing or authentication proving that someone who announces something through BGP is authorized to do it, and it's sort of amazing the whole thing works in the first place. What happens is that when a large network with other networks behind it makes an announcement, it says, "Oh, I have routes to these following networks," and passes them on to its peers. They in turn pass those announcements on: "Behind me, two hops this way, is this other series of networks," and so on and so forth.

Now, this can cause hilariously bad problems that occasionally make the front page of the newspaper when a bad announcement gets out. A few years ago, there was an announcement from an ISP that said, "Oh, all of YouTube lives behind us."
That announcement should never have gone out, and their upstream ISP should have quashed it, but they didn't. So suddenly a good swath of the internet was trying to reach YouTube through a relatively small link. As you can imagine, TCP terminated on the floor. Not every link can handle exabytes of traffic. Who knew?

That gets us to another interesting point: how do these large networks communicate with each other? You have this idea of one network talking to another network. Does money change hands? Well, in some cases, no. If traffic volumes are roughly equal and desirable on both sides, the two simply have their networks talk to one another, and no money changes hands. This is commonly known as peering. At that point, everything is mostly grand, because as traffic continues to climb, you increase the links. Both parties generally wind up paying to operate the infrastructure on their own side and in between, and traffic continues to grow.

Other times it doesn't work that way: you have one network with a lot of traffic and another network that doesn't really have much of any, and people want to go from one end to the other. Very often this is known as a transit agreement, and money changes hands, usually from the smaller network to the bigger one, but occasionally in the other direction, depending on the specifics of the business model. At that point, every byte passing through is metered and generally charged for. Usually this is handled by large ISPs and carriers and businesses behind the…
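Part of why a bogus announcement like that wins is that routers prefer the most specific prefix they know about. Here's a toy sketch of longest-prefix-match selection; the routing table is my own invention (the prefixes merely echo the shape of that YouTube incident), and real BGP path selection involves far more than this.

```python
# A toy sketch of longest-prefix-match route selection, using only the
# standard library. The prefixes and "next hop" labels are invented for
# illustration; real BGP involves paths, policies, and communities.
import ipaddress

ROUTES = {
    ipaddress.ip_network("208.65.152.0/22"): "legitimate origin AS",
    # A rogue, more-specific announcement: routers prefer it because
    # longer prefixes win, regardless of who announced first.
    ipaddress.ip_network("208.65.153.0/24"): "hijacking AS",
}

def best_route(destination: str) -> str:
    """Pick the most specific matching prefix, as routers do."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return "no route"
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]

if __name__ == "__main__":
    print(best_route("208.65.153.10"))  # -> hijacking AS
    print(best_route("208.65.152.10"))  # -> legitimate origin AS
```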

Nov 14, 2019 · 15 min

Ep 26: EC2 Instances Now On Layaway

AWS Morning Brief for the week of November 11th, 2019.

Nov 11, 2019 · 10 min

Ep 25: Networking in the Cloud Fundamentals, Part 2

Transcript: An ancient haiku reads, "It's not DNS. There's no way it's DNS. It was DNS." Welcome to the Thursday episode of the AWS Morning Brief, what you can also think of as Networking in the Cloud. This episode is sponsored by ThousandEyes and their Cloud State Live event, Wednesday, November 13th, from 11:00 AM until noon Central Time. Streaming live from Austin, Texas: the reveal of their latest cloud performance benchmark, where they pit AWS, Azure, GCP, IBM, and Alibaba Cloud against each other from a variety of networking perspectives. Oracle Cloud is pointedly not invited. If you'd like to follow along, visit snark.cloud/cloudstatelive. That's snark.cloud/cloudstatelive, and thanks to ThousandEyes for their sponsorship of this ridiculous yet educational podcast episode.

DNS, the Domain Name System: it's how computers turn the names humans can remember into the numbers computers actually use, for those humans whose first language is not math. Put more succinctly: if I want to translate www.twitterforpets.com into an IP address of 1.2.3.4, I probably want a computer to do that, because humans find it easier to remember twitterforpets.com. Originally, this was done with a far more manual process: there was a file on every computer on the internet that was kept in sync with every other copy. The internet was a smaller place back then, a friendlier time, and jerks trying to monetize everything at the expense of others did not yet lurk behind every shadow. So how does this service work?

Well, let's go back to the beginning. When you look at a typical domain name, let's call it www.twitterforpets.com, there's a hierarchy built in, and it goes from right to left. In fact, if you pick any domain you'd like that ends in .com, .net, .technology, .dev, .anything else you care about, there's another dot at the end of it. That's right: you can go to www.google.com., and it works just the same way you'd expect it to. That final dot represents the root, and there are a number of root servers, run by various organizations that no one entity controls, scattered around the internet. They have an interesting job: their role is to resolve who the authoritative, responsible DNS server is for each top-level domain. That's all the root servers do.

The top-level domains, in turn, have name servers that refer out to whoever is responsible for any given domain within that top-level domain, and so on and so forth. You can have subdomains running at your own company: you could have twitterforpets.com but delegate all of the engineering.twitterforpets.com domains out to a separate name server, and so on and so forth. It can reach ludicrous lengths if you'd like. Now, once upon a time this was relatively straightforward, because there were only so many top-level domains in existence: .com, .net, .org, .edu, .mil, and so on. Then the governing body, ICANN, decided, "You know what's great?
Money," so they wound up, in turn, going for additional top-level domains that you could grab the .technology, .blog, .underpants for all I know, no one can keep them all in their head anymore and one leaps to mind of an incredibly obnoxious purchase by google.dev.Now, you can have anything you want .dev exist as a domain because Google has taken responsibility for owning that subdomain. Why is that obnoxious? Well, historically for the longest time on the internet, there were a finite number of top-level domains that people had to worry about. So internally, when people were building out their own environments, they would come up with something that was guaranteed never to resolve, .dev was a popular pick. You could put that to a local name server inside your firewall or you could even hard-code it on your laptop itself and it worked out super-well. Now, anyone who registers whatever domain you picked has the potential to set up a listener on their end. That is not just a theoretical concern. I worked at a company once that had their domain.com as their external domain and domain.net for their internal domain, which is reasonable, except for the part where they didn't own the .net version of their domain.Someone else did and kept refusing offers to buy it, so periodically, we would try and log into something internal while not being on the VPN, despite thinking that we were, and type a credential into this listener that is set up and immediately have to reset our credentials. It was awful. Try not to do that. If you use a development domain, make sure you own it, it's $12, everyone will be happier

Nov 7, 2019 · 16 min

Ep 24: The Rain in Spain Falls Mainly on the Control Plane

AWS Morning Brief for the week of November 4th, 2019.

Nov 4, 2019 · 10 min

Ep 23: Networking in the Cloud Fundamentals, Part 1

Links Referenced: ThousandEyes; The Duckbill Group; Last Week in AWS; Screaming in the Cloud; AWS Morning Brief

Transcript: UDP. I'd make a joke about it, but I'm not sure you'd get it. This episode is sponsored by ThousandEyes. Think of ThousandEyes as the Google Maps of the internet. Just like you wouldn't dare leave San Jose to drive to San Francisco without checking if 101 or 280 was faster (and yes, that's a very localized San Francisco Bay Area reference), businesses rely on ThousandEyes to see the end-to-end paths their apps and services are taking from their servers to their end users, to identify where the slowdowns are, where the pile-ups are hiding, and what's causing the issues. They use ThousandEyes to see what's breaking where and, importantly, they share that data directly with the offending service providers to hold them accountable and get them to fix the issue fast, ideally before it impacts end users. You'll be hearing a fair bit more about ThousandEyes over the next 12 weeks, because Thursdays are now devoted to networking in the cloud. It's like Screaming in the Cloud, only far angrier.

We begin today with the first of 12 episodes: episode one, the fundamentals of cloud networking. You can consider this the AWS Morning Brief, networking edition. A common perception in the world of cloud today is that networking doesn't matter, and that perception is largely accurate. You don't have to be a network engineer the way any reasonable systems or operations person did even 10 years ago, because in the cloud the network doesn't matter at all, until suddenly it does, at the worst possible time, and then everyone's left scratching their heads.

So let's begin with how networking works, because a computer in 2019 is pretty useless if it can't talk to other computers somehow, and for better or worse, Bluetooth isn't really enough to get the job done. Computers talk to one another over networks, basically by having a unique identifier. Generally we call those IP addresses, at least down the path this future has taken; in a different world we would have gone with token ring and a whole bunch of other addressing protocols, but we didn't. Instead we went with IP, the unimaginatively named Internet Protocol, and with its current version, version four. We're not talking about IPv6, because let's not kid ourselves: no one's really using that at scale, despite everyone claiming it's going to happen real soon now.

So there are roughly 4 billion IPv4 addresses and change, allocated throughout effectively the entire internet. When this stuff was built, back when it was just defense institutions and universities on the internet, 4 billion seemed like stupendous overkill. Now it turns out some people have 4 billion objects on their person talking to the internet, all chirping and distracting them at the same time when you're attempting to have a conversation with them.

Those networks are broken down into subnetworks, or subnets for lack of a better term. They can range anywhere from a single IP address, which in CIDR (C-I-D-R) parlance is a /32, to all 4 billion and change, which is a /0. Some common ones: a /24 is 256 IP addresses, of which 254 are usable, and you can expand that into 512 with a /23, and so on and so forth. The specific math isn't particularly interesting or important, and it's super hard to describe without some kind of whiteboard, so smile, nod, and move past that. So then you have all these different subnets.
How do they talk to one another? The easy way to think of it is, "Oh, I have one network, I plug it directly into another network, and they can talk to each other." Well, sure, in theory. In practice it never works that way, because those two networks are often not adjacent. They have to talk to something else, go through different hops to get from here to there to somewhere else, to somewhere else, to finally the destination they care about. And when you look at the internet as a network that spans the entire world, that turns into a super complicated problem. Remember, the internet was originally designed to withstand massive disruption, generally in terms of nuclear war, where effectively large percentages of the earth were no longer habitable; it had to be able to reroute around things, and routing is more or less how that wound up working.

The idea that you can have different paths to the same destination solves an awful lot. It's why the internet is as durable as it is, but it also explains why these things are terrible, and why everyone is super quick to blame the network. One last thing to consider is network address translation. There are private IP address ranges that are not reachable over the general internet: anything starting with a 10, for example; the entire 10/8 is considered private IP address space. Same with 192.168: anything in that range is as w…
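The CIDR arithmetic above is easy to check without a whiteboard. A minimal sketch using only Python's standard ipaddress module; the specific subnets are arbitrary examples.

```python
# A minimal sketch of the CIDR math described above, using only the
# standard library. The specific subnets are arbitrary examples.
import ipaddress

net24 = ipaddress.ip_network("192.168.1.0/24")
net23 = ipaddress.ip_network("192.168.0.0/23")

print(net24.num_addresses)       # 256 addresses in a /24...
print(len(list(net24.hosts())))  # ...254 usable (network/broadcast excluded)
print(net23.num_addresses)       # 512, per the /23 example

# Private (RFC 1918) ranges like 10/8 and 192.168/16 aren't routable
# on the public internet; is_private checks exactly that.
print(ipaddress.ip_address("10.1.2.3").is_private)      # True
print(ipaddress.ip_address("192.168.1.10").is_private)  # True
print(ipaddress.ip_address("8.8.8.8").is_private)       # False
```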

Oct 31, 2019 · 17 min

Ep 22: Last of the JEDI

AWS Morning Brief for the week of October 28th, 2019.

Oct 28, 2019 · 11 min

Ep 21: AWS CloudWatch Anomaly Wake-Up Calls

AWS Morning Brief for the week of October 21st, 2019.

Oct 21, 2019 · 10 min

Ep 20: The Right to Bare Metal ARMs

AWS Morning Brief for the week of October 14th, 2019.

Oct 14, 2019 · 11 min

Ep 19: Hope and Change Management

AWS Morning Brief for the week of October 7th, 2019.

Oct 7, 2019 · 12 min

Ep 18: API Has Two Syllables

AWS Morning Brief for the week of September 30th, 2019.

Sep 30, 2019 · 9 min