The Gravity of Kubernetes

Kubernetes has become the standard way of deploying new distributed applications. Most new internet businesses started in the foreseeable future will leverage Kubernetes (whether they realize it or not). Many old applications are migrating to Kubernete...

Greatest Hits Archives - Software Engineering Daily

January 13, 20181h 2m

Audio is streamed directly from the publisher (traffic.megaphone.fm) as published in their RSS feed. Play Podcasts does not host this file. Rights-holders can request removal through the copyright & takedown page.

Original episode page

Show Notes

Kubernetes has become the standard way of deploying new distributed applications.

Most new internet businesses started in the foreseeable future will leverage Kubernetes (whether they realize it or not). Many old applications are migrating to Kubernetes too.

Before Kubernetes, there was no standardization around a specific distributed systems platform. Just like Linux became the standard server-side operating system for a single node, Kubernetes has become the standard way to orchestrate all of the nodes in your application.

With Kubernetes, distributed systems tools can have network effects. Every time someone builds a new tool for Kubernetes, it makes all the other tools better. And it further cements Kubernetes as the standard.

Google, Microsoft, Amazon, and IBM each have a Kubernetes-as-a-service offering, making it easier to shift infrastructure between the major cloud providers. We are likely to see Digital Ocean, Heroku, and longer tail cloud providers offer a managed, hosted Kubernetes eventually.

In this editorial, I explore the following questions:

- Why is this happening?

- What are the implications for developers?

- How are cloud providers affected?

What are the new business models that are possible in a Kubernetes-standardized world?

Software Standards

Standardized software platforms are both good and bad.

Standards allow developers to have expectations around how their software will run. If a developer builds something for a standardized platform, they can make smart estimations about the total addressable market for that piece of software.

If you write a program in JavaScript, you know that it will run in everyone’s browser. If you create a game for iOS, you know that everyone with an iPhone will be able to download it. If you build a tool for profiling garbage collection in .NET, you know that there is a large community of Windows developers with memory issues who can buy your software.

Standardized proprietary platforms can lead to massive profit returns for the platform provider. In 1995, Windows was such a good platform that Microsoft could sell a CD in a cardboard box for $100. In 2018, the iPhone is so good that Apple can take 30% from all app sales on its platform.

Proprietary standards lead to fragmentation.

Your iPhone app does not run on my Kindle Fire. I can’t use your Snapchat augmented reality sticker on Facebook Messenger. My favorite digital audio workstation only runs on Windows–so I have to keep a Windows computer around just to make music.

When developers see this fragmentation, they groan. They imagine the greedy capitalists, who are making money at the expense of software quality. Developers think, “why can’t we all just get along? Why can’t we have things be open and free?”

Developers think, “we don’t need proprietary standards. We can have open standards.”

Growth of Apache, part of the LAMP (Linux, Apache, MySQL, PHP) Stack

This happened with Linux. These days, new server side applications are mostly in Linux. There was a time when that was controversial (see the far left hand side of chart above).

More recently, we saw a newer open standard with Docker. Docker gave us an open, standardized way of packaging, deploying, and distributing a single node. This was hugely valuable! But for all of the big problems that Docker solved, it highlighted a new problem–how should we be orchestrating these nodes together?

After all–your application is not just a single node. You know you want to be deploying a Docker container–but how are your containers communicating with each other? How are you scaling up instances of these containers? How are you routing traffic between container instances?

Container Orchestration

After Docker became popular, a scramble of open source projects and proprietary platforms emerged to solve the problem of container orchestration.

Mesos, Docker Swarm, and Kubernetes each offered a different set of abstractions for managing containers. Amazon ECS offered a proprietary managed service that took care of installation and scaling of Docker containers for AWS users.

Some developers did not adopt any orchestration platform, and used BASH, Puppet, Chef, and other tools to script their deployments.

Whether a developer was deploying their container by using an orchestration platform or a script, Docker sped up development, and made things more harmonious between developers and operations.

As more developers deployed containers with Docker, the importance of the orchestration platform was becoming clear. A container is as fundamental to a distributed system as an object is to an object oriented program. Everyone wanted to be using a container platform, but there was a struggle between these platforms, and it was hard to see which would come out on top–or if it would be a decades long struggle, like iOS vs. Android.

This struggle between the different container orchestration platforms was causing fragmentation–not because any of the popular orchestration frameworks were proprietary (Swarm, Kubernetes, and Mesos are all open source), but because each container orchestration community had invested so much in their own system.

So, from 2013 – 2016, there was some anxiety among Docker power users. Choosing a container orchestration framework is a huge bet–and if you chose the wrong orchestration system, it would be like opening a movie store and choosing HD DVD over Blu-ray.

These pictures of container ships falling over never get old. Image: Huffington Post

The war between container orchestrators felt like a winner-take-all affair.

And as in any war, there was a fog that was hard to see through. When I was reporting on container orchestration wars, I recorded podcast after podcast with container orchestration experts, where I would ask some form of the question, “so, which container orchestration system is going to win?”

I did this until someone I respect told me that asking about who was going to “win the orchestration wars” was a less interesting question than evaluating the technical tradeoffs between the orchestrators.

Looking back, I regret buying into the narrative of a war between container orchestrators.

As the debates about container orchestrators raged on, the smartest people in the room were mostly having calm, scientific discussions–even when reporters like me were getting worked up, and thinking that this was a story about tribalism.

The conflicts between container orchestrators were not about tribalism–they were more about differences of opinion, and developer ergonomics.

OK, maybe the container orchestration war wasn’t only about differences of opinion. There is a boatload of money to be made in this space. We are talking about contracts with billion dollar legacy organizations–banks, and telcos, and insurance companies–who are making their way onto the cloud.

If you are in the business of helping telcos move onto the winning platform, business is good. If you champion the wrong platform, you end up with a warehouse full of HD-DVDs.

The time where the conflict was the worst was near the end of 2016, when there were rumors about Docker potentially forking, so that Docker the company could change the Docker standard to fit better with their container orchestration system Docker Swarm–but even in those times, it would have made sense to be an optimist.

Creative destruction is painful, but it is a sign of progress–and in the struggle for container orchestration dominance, there was a ton of creative destruction.

And when the dust cleared around the end of 2016, Kubernetes was the clear winner.

Today, with Kubernetes becoming the safe choice, CIOs feel more comfortable adopting container orchestration at their enterprises–which makes vendors feel more comfortable investing in Kubernetes-specific tools to sell to those CIOs.

This brings us to the present:

- The open source developers are rowing in the same direction, excited about what to build.

- Major enterprises (both new and legacy) are buying into Kubernetes.

- The major cloud providers are ready with low-cost Kubernetes-as-a-service.

The numerous vendors of monitoring, logging, security, and compliance software are thrilled because the underlying software stack that they have to integrate with is becoming more predictable.

Going Multicloud

Today, the most lucrative provider of proprietary backend developer infrastructure is Amazon Web Services. Developers do not view AWS resentfully, because AWS is innovative, empowering, and cheap. If you are paying AWS a lot of money, that probably means your business is doing well.

With AWS, developers do not feel the level of lock-in that was administered by the dominant proprietary platforms of the 90s. But there is some lock-in. Once you are deeply embedded in the AWS ecosystem, using services like DynamoDB, Amazon Elastic Container Service, or Amazon Kinesis, it becomes a daunting task to move away from Amazon.

With Kubernetes creating an open, common layer for infrastructure, it becomes theoretically possible to “lift and shift” your Kubernetes cluster from one cloud provider to another.

If you decided to lift-and-shift your application, you would have to rewrite parts of your application to stop using the Amazon-specific services (like Amazon S3). For example, if you wanted an S3 replacement that would run on any cloud, you could configure a Kubernetes cluster with Rook, and start to store objects on Rook with the same APIs that you would use to store them on S3.

This is a nice option to have, but I have not yet heard of anyone actually lifting and shifting their application away from a cloud–except for Dropbox, and their migration was so epic it took two and a half years.

Certainly there is someone out there other than Dropbox who spends so much money on Amazon S3 that they will consider spinning up their own object store, but it would be a huge effort to do such a migration.

(Kubernetes can be used for lifting and shifting–but more likely will be used to have familiar operating layer in different clouds)

Kubernetes probably won’t be a tool for widespread lifting and shifting any time soon. A more likely scenario is that Kubernetes will become a ubiquitous control plane, that enterprises will use on multiple clouds.

NodeJS is a useful analogy. Why do people like NodeJS for their server side applications? It’s not necessarily because Node is the fastest web server. It’s because people like to use the same language on both the client and the server. Just like NodeJS lets you move between your client and server code without having to switch languages, Kubernetes will let you switch between clouds without having to change how you think about operations.

On each cloud, you will have some custom application code running on Kubernetes that interacts with the managed services provided by that cloud.

Companies want to be multi-cloud–partly for disaster recovery, but also because there is actual upside to having access to managed services on the different clouds.

One emerging pattern is to split infrastructure between AWS for user traffic and Google Cloud for data engineering. One company that uses this pattern is Thumbtack:

At Thumbtack, the production infrastructure on AWS serves user requests. The log of transactions that occur get pushed from AWS to Google Cloud, where the data engineering occurs. On Google Cloud, the transaction records are queued in Cloud PubSub, a message queueing service. Those transactions are pulled off the queue and stored in BigQuery, a system for storage and querying of high volumes of data.

BigQuery is used as the data lake to pull from when orchestrating machine learning jobs. These machine learning jobs are run in Cloud Dataproc, a managed service that runs Apache Spark. After training a model in Google Cloud, that model is deployed on the AWS side, where it serves user traffic. On the Google Cloud side, the orchestration of these different managed services is done by Apache Airflow, an open source tool that is one of the few pieces of infrastructure that Thumbtack does have to manage themselves on Google Cloud.

Today, Thumbtack uses AWS for serving user requests and Google Cloud for data engineering and queueing in PubSub. Thumbtack trains its machine learning models in Google, and deploys them to AWS.

This is just the way things are today. Thumbtack might eventually use Google Cloud for user-facing services as well.

(a multi-cloud data engineering pipeline from a Japanese company)

More companies will gradually move towards multiple clouds–and some of them will manage a unique Kubernetes cluster on each cloud.

You might have a GKE Kubernetes cluster on Google to orchestrate workloads between BigQuery, Cloud PubSub, and Google Cloud ML, and you might have an Amazon EKS cluster to orchestrate workloads between DynamoDB, Amazon Aurora, and your production NodeJS application.

Cloud providers are not replaceable commodities. The services provided by the different clouds will get increasingly exotic and differentiated. Enterprises will benefit from having access to the different services on the different cloud providers.

Distributed Systems Distribution

Services like Google BigQuery and AWS Redshift are popular because they give you a powerful, scalable, multi-node tool with a simple API. Developers often choose these managed services because they are so easy to use.

You don’t see these types of paid tools for single nodes. I don’t pay for NodeJS, or React, or Ruby on Rails.

Tools for a single node are much easier to operate than tools for a distributed system. It’s harder to deploy Hadoop across servers than to run a Ruby on Rails application on my laptop. With Kubernetes, this is going to change.

If you are using Kubernetes, you can use a distributed systems package manager called Helm. This is like npm for Kubernetes applications. If you are using Kubernetes, you can use Helm to easily install a complicated multi-node application, no matter what cloud provider you are on.

Here’s a description of Helm:

Helm helps you manage Kubernetes applications — Helm Charts helps you define, install, and upgrade even the most complex Kubernetes application. Charts are easy to create, version, share, and publish — so start using Helm and stop the copy-and-paste madness.

A package manager for distributed systems. Amazing! Let’s take a look at what I can install.

Not pictured: WordPress, Jenkins, Kafka

Distributed systems are hard to set up. Ask anyone who has set up their own Kafka cluster. Helm is going to make installing Kafka as easy as installing a new version of Photoshop on your MacBook. And you will be able to do this on any cloud.

Before Helm, the closest thing to a distributed systems package manager (that I am aware of) was the marketplace that you find on AWS or Azure, or the Google Cloud Launcher. Here again we see how proprietary software can lead to fragmentation. Before Helm, there was no standard, platform-agnostic way of one-click installing Kafka.

You can find a way to one-click install Kafka on AWS, Google, or Azure. But each of these installations had to be written independently, for that specific cloud provider. And to install Kafka on Digital Ocean, you need to follow a 10-step tutorial.

Helm represents a cross-platform system for distributing multi-node software on any Kubernetes instance. You could use an application installed with Helm in any cloud provider or on your own hardware. You could easily install Apache Spark or Cassandra–systems that are notoriously difficult to set up and operate.

Helm is a package manager for Kubernetes–but it also looks like the beginnings of an app store for Kubernetes. With an app store, you could sell software for Kubernetes.

What kind of software could you sell?

You could sell enterprise distributions of distributed systems platforms like Cloudera Hadoop, Databricks Spark, and Confluent Kafka. New monitoring systems like Prometheus and multi-node databases like Cassandra that are hard to install.

Maybe you could even sell higher-level, consumer-grade software like Zendesk.

The idea of a self-hosted Zendesk sounds crazy, but someone could build that, and sell it in the form of a proprietary binary, at the cost of a flat fee instead of a subscription. If I sell you a $99 Zendesk-for-Kubernetes, and you can easily run it on your Kubernetes cluster on AWS, you are going to end up saving a lot of money on support ticketing software.

Enterprises often run their own WordPress to manage a company blog. Is the software for Zendesk that much more complicated than WordPress? I don’t think so–but enterprises are much more scared of managing their own help desk software than managing their own blogging software.

I run a pretty small business but I subscribe to a LOT of different software-as-a-service tools. An expensive WordPress host, an expensive CRM for advertising sales, MailChimp for the newsletter. I pay for these services because they are super reliable and secure, and they are complex multi-node applications. I wouldn’t want to host my own. I wouldn’t want to support them. I don’t want to be responsible for technical troubleshooting when my newsletter fails to send. I want to run less software.

In contrast, I’m not afraid that my single-node software is going to break.

Software that I use from a single node tends to be much less expensive because I don’t have to buy it as a “service”. The software that I use to write music has a 1-time fixed cost. Photoshop has a 1-time fixed cost. I pay for the electricity to run my computer, but other than that I have no ongoing capital expense to run Photoshop.

When multi-node applications are as reliable as single-node applications, we will see changes in the pricing models.

Maybe someday I will be able to purchase a Zendesk-for-Kubernetes. The Zendesk-for-Kubernetes will give me everything I need from a help desk–it will spin up an email server, it will give me a web frontend to manage tickets. And if anything goes wrong, I can pay for support when I need it.

Zendesk is a fantastic service–but it would be cool if it had a fixed pricing model

Metaparticle

With Kubernetes, it becomes easier to deploy and manage distributed applications. With Helm, it becomes easier to distribute those applications to other users. But it’s still pretty hard to develop distributed systems.

This was the focus of a recent CloudNative Con/KubeCon Keynote by Brendan Burns, called “This Job is Too Hard: Building New Tools, Patterns, and Paradigms to Democratize Distributed Systems Development”.

In his presentation, Brendan presented a project called Metaparticle. Metaparticle is a standard library for cloud-native development, with the goal of democratizing distributed systems. In an introduction to Metaparticle, Brendan wrote:

← All episodes of Greatest Hits Archives - Software Engineering Daily