Software Engineering Daily

2,188 episodes

Ep 1347: Courier with Troy Goode

A gig economy application generates lots of notifications. There are SMS messages, push notifications, emails, and native application updates. If you order a ride from Uber, you might receive a text message and a push notification at the same time. If an app overloads the user with notifications, the user might end up annoyed and delete the app from their phone. But perhaps all of these notifications are necessary. You would rather get three simultaneous notifications from your food delivery app than fail to get your food on time. If you are the mobile application developer building the food delivery app, what other choice do you have? At large companies such as LinkedIn, there are entire teams devoted to figuring out how to optimize the notifications that they send you. It has a surprisingly large impact on the usability of a mobile application. Troy Goode is the founder of Courier, a company that provides notification optimization. This might sound like a small, trivial problem. But it actually has a large impact on the usage of apps. And it is not an easy engineering problem. Troy joins the show to talk about the problem that Courier solves and the backend infrastructure that powers it. Courier is built entirely on serverless APIs. This is a great case study in how to build a completely scalable infrastructure product based on serverless tools.

Feb 21, 2020 · 1h 12m

Ep 1345: Data Infrastructure Investing with Eric Anderson

In a modern data platform, distributed streaming systems are used to read data coming off an application in real time. There are a wide variety of streaming systems, including Kafka Streams, Apache Samza, Apache Flink, Spark Streaming, and more. When Eric Anderson joined the show back in 2016, he was working at Google on Google Cloud Dataflow, a managed service for handling streaming data. Today, he works as an investor at Scale Venture Partners. In his current job, he analyzes companies built around data infrastructure, developer tooling, and other enterprise engineering domains. Eric also hosts the podcast Contributor, which explores open source maintainers and the stories of their projects. His podcast has featured the creators of projects such as Envoy, Alluxio, and Chef. In today’s episode, Eric returns to the show to discuss data infrastructure, investing, and the evolving world of open source.

Feb 20, 2020 · 1h 8m

Ep 1344: Materialize: Streaming SQL on Timely Data with Arjun Narayan and Frank McSherry

Distributed stream processing frameworks are used to rapidly ingest and aggregate large volumes of incoming data. These frameworks often require the application developer to write imperative logic describing how that data should be processed. For example, a high volume of clickstream data that is getting buffered to Kafka needs to have a stream processing system evaluate that data to prepare it for a data warehouse, Spark, or some other queryable environment. In practice, many developers simply want to have the high volume of data become queryable in the fewest number of steps possible. Materialize is a streaming SQL engine that maintains materialized views over streaming data. The materialized views are incrementally updated over time and reconciled with new data that may have come in out of order. Arjun Narayan and Frank McSherry are the co-founders of Materialize, a company whose technology is based on the Naiad paper, which was written at Microsoft Research. Arjun and Frank join the show to talk about modern streaming systems and their strategy for taking an academic paper and productizing it.
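
To make the idea concrete, here is a toy Python sketch of what "incrementally updated" means, using made-up click events. This is not Materialize's implementation, which exposes this behavior through SQL along the lines of CREATE MATERIALIZED VIEW; it only illustrates the difference between recomputing an aggregate and maintaining it:

```python
from collections import defaultdict

# A "materialized view" of clicks per user, maintained incrementally:
# each new event applies an O(1) delta instead of triggering a full
# re-aggregation over all history.
view = defaultdict(int)

def apply_event(event):
    # In a real engine, this delta logic is derived from the view's SQL.
    view[event["user"]] += 1

for event in [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}]:
    apply_event(event)

print(view["alice"])  # 2 -- readable at any time, no rescan of the stream
```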

Feb 19, 2020 · 1h 9m

Ep 1342: Go Networking with Sneha Inguva

A cloud provider gives developers access to virtualized server infrastructure. When a developer rents this infrastructure via an API call, a virtual server is instantiated on physical machines. That virtual server needs to be made addressable through the allocation of an IP address to make it reachable from the open Internet. When the virtual server starts to receive too much traffic, that traffic needs to be load balanced with another virtual server. The backend networking code that runs a cloud provider needs to be fast, secure, and memory-efficient. Languages that fit that description include C++, Rust, and Go. DigitalOcean’s low-level networking code is mostly written in Go. Sneha Inguva is an engineer with DigitalOcean who has written and spoken about writing networking applications using Go. She joins the show to talk about her work at DigitalOcean, including the implementation of a DHCP server, a network server that assigns IP addresses and other parameters to devices that sit on that network.

Feb 18, 2020 · 54 min

Ep 1341: Great Expectations: Data Pipeline Testing with Abe Gong

A data pipeline is a series of steps that takes large data sets and creates usable results from them. At the beginning of a data pipeline, a data set might be pulled from a database, a distributed file system, or a Kafka topic. Throughout a data pipeline, different data sets are joined, filtered, and statistically analyzed. At the end of a data pipeline, data might be put into a data warehouse or Apache Spark for ad-hoc analysis and data science. At this point, the end-user of the data set expects that data to be clean and accurate. But how do we get any guarantees about its correctness? Abe Gong is the creator of Great Expectations, a system for data pipeline testing. In Great Expectations, the developer creates tests called “expectations”, which verify certain characteristics of the data set at different phases in a data pipeline. This helps ensure that the end result of a multi-stage data pipeline is correct. Abe joins the show to discuss the architecture of a data pipeline and the use cases of Great Expectations.
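
As a rough illustration, here is what expectations look like in Great Expectations' classic pandas-style API; method names have shifted across versions, and "orders.csv" is a made-up input:

```python
import great_expectations as ge

# Load a batch of pipeline data as an expectation-aware DataFrame.
df = ge.read_csv("orders.csv")

# Each "expectation" asserts a property the data should hold at this
# stage of the pipeline.
df.expect_column_values_to_not_be_null("order_id")
df.expect_column_values_to_be_between("amount", min_value=0, max_value=10_000)

# Validation runs every expectation and reports failures, so a bad
# batch is caught before it reaches the warehouse or a Spark job.
results = df.validate()
print(results.success)
```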

Feb 17, 2020 · 1h 4m

Ep 1340: Data Warehouse ETL with Matthew Scullion

A data warehouse provides low latency access to large volumes of data. A data warehouse is a crucial piece of infrastructure for a large company, because it can be used to answer complex questions involving a large number of data points. But a data warehouse usually cannot hold all of a company’s data at any given time. Users need to move a subset of the data into the data warehouse by reading large files from a data lake and putting that data into the data warehouse. The process of moving data from one place to another is broken down into three sequential steps, often called “ETL” (extract, transform, load) or “ELT” (extract, load, transform). In ETL, the data is extracted from a source such as a data lake, transformed into a schema that is customized for the data warehouse application, and then loaded into the data warehouse. In ELT, the last two steps are reversed, because modern systems can often leave the necessary schema transformation until after the data has been loaded into the data warehouse. Matthew Scullion is the CEO of Matillion, a company that specializes in building tools for data transformations. Matthew joins the show to talk about the problem of data transformation, and how that problem has evolved over the nine years since he started Matillion.
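
The three steps map naturally onto three functions. Here is a minimal Python sketch with hypothetical file names; in an ELT setup, the transform step would instead run as SQL inside the warehouse after loading:

```python
import csv
import json

def extract(path):
    # Extract: pull raw records out of a data lake export.
    with open(path) as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: reshape records into the schema the warehouse expects.
    return [
        {"user_id": int(r["id"]), "total_cents": int(float(r["total"]) * 100)}
        for r in rows
        if r.get("total")
    ]

def load(rows, path):
    # Load: hand the cleaned rows to the warehouse loader (stubbed here
    # as a JSON file).
    with open(path, "w") as f:
        json.dump(rows, f)

load(transform(extract("lake_export.csv")), "warehouse_stage.json")
```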

Feb 14, 2020 · 53 min

Ep 1337: Anyscale with Ion Stoica

Machine learning applications are widely deployed across the software industry. Most of these applications use supervised learning, a process in which labeled data sets are used to find correlations between the labels and the trends in that underlying data. But supervised learning is only one application of machine learning. Another broad set of machine learning methods is described by the term “reinforcement learning.” Reinforcement learning involves an agent interacting with its environment. As the model interacts with the environment, it learns to make better decisions over time based on a reward function. Newer AI applications will need to operate in increasingly dynamic environments, and react to changes in those environments, which makes reinforcement learning a useful technique. Reinforcement learning has several attributes that make it a distinctly different engineering problem than supervised learning. Reinforcement learning relies on simulation and distributed training to rapidly examine how different model parameters could affect the performance of a model in different scenarios. Ray is an open source project for distributed applications. Although Ray was designed with reinforcement learning in mind, the potential use cases go beyond machine learning, and could be as influential and broadly applicable as distributed systems projects like Apache Spark or Kubernetes. Ray is a project from the Berkeley RISE Lab, the same place that gave rise to Spark, Mesos, and Alluxio. The RISE Lab is led by Ion Stoica, a professor of computer science at Berkeley. He is also the co-founder of Anyscale, a company started to commercialize Ray by offering tools and services for enterprises looking to adopt Ray. Ion Stoica returns to the show to discuss reinforcement learning, distributed computing, and the Ray project.

Feb 13, 2020 · 51 min

Ep 1336: Flink and Beam Stream Processing with Maximilian Michels

Distributed stream processing systems are used to read large volumes of data and perform operations across those data streams. These stream processing systems often build off of the MapReduce algorithm for collecting and aggregating large volumes of data, but instead of processing a calculation over a single large batch of data, they process data on an ongoing basis. There are many different stream processing systems for this same use case: Storm, Spark, Flink, Heron, and others. Why is that? When there seems to be much more consolidation around the Hadoop MapReduce batch processing technology, why are there so many stream processing systems? One explanation is that aggregating the results of a continuous stream of data is a process that very much depends on time. At any given point in time, you can take a snapshot of the stream of data, and any calculation based on that data is going to be out of date by the time that your calculation is finished. There is a latency between when you start calculating something and when you finish calculating it. There are other design decisions for a distributed stream processing system. What data do you keep in memory? What do you keep on disk? How often do you snapshot your data to disk? What is the method for fault tolerance? What are the APIs for consuming and processing this data? Maximilian Michels has worked on the Apache Flink and Apache Beam stream processing systems, and currently works on data infrastructure at Lyft. Max joins the show to discuss the tradeoffs of different stream processing systems and his experiences in the world of data processing.
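
Many of these design decisions show up directly in the APIs. Here is a minimal sketch using the Apache Beam Python SDK, with an in-memory source standing in for a real stream such as Kafka or Kinesis:

```python
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows

# Count events per key in fixed 60-second windows. With a bounded
# in-memory source, everything falls into one window; a production job
# would read from Kafka/Kinesis with event-time timestamps and run on
# a runner such as Flink.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | beam.Create([("checkout", 1), ("search", 1), ("checkout", 1)])
        | beam.WindowInto(FixedWindows(60))
        | beam.CombinePerKey(sum)
        | beam.Map(print)  # ('checkout', 2), ('search', 1)
    )
```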

Feb 12, 2020 · 46 min

Ep 1335: Druid Analytics with Jad Naous

Large companies generate large volumes of data. This data gets dumped into a data lake for long-term storage, then pulled into memory for processing and analysis. Once it is in memory, it is often read into a dashboard, which presents a human with a visualization of the data. The end-user who is consuming this data is often a data scientist who is looking at the data to find trends and design new machine learning models. Another kind of user is the operational analyst. An operational analyst creates complex queries across this data to find latencies in the infrastructure, or slices and dices clickstream data coming from online advertisements in order to figure out how to tweak those advertising algorithms and spend money more effectively. For an operational analyst, a key use case for a data warehouse is fast, interactive querying. The operational analyst needs to be able to query the data to quickly create a dashboard, make judgments based on that dashboard, and then change the query slightly to look at a slightly different dashboard. Druid is a high-performance database built for exactly these kinds of ad-hoc queries and operational analytics. Imply Data is a company that builds visualization, monitoring, and security around Druid. Jad Naous is vice president of R&D for Imply, and he joins the show to talk about the use case for Druid, the architecture, and the business model of Imply.
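
For a sense of the interaction model, here is a sketch of an operational-analytics query issued against Druid's SQL endpoint over HTTP; the broker address and the "clickstream" datasource are made up:

```python
import requests

# Druid brokers expose a SQL endpoint; 8082 is the default broker port.
DRUID_SQL = "http://druid-broker:8082/druid/v2/sql"

# Slice the last hour of clicks by campaign, expecting an interactive
# response that can back a dashboard the analyst tweaks and re-runs.
query = """
SELECT campaign, COUNT(*) AS clicks
FROM clickstream
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY campaign
ORDER BY clicks DESC
LIMIT 10
"""

response = requests.post(DRUID_SQL, json={"query": query}, timeout=30)
for row in response.json():
    print(row)
```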

Feb 11, 2020 · 51 min

Ep 1334: The Data Exchange with Ben Lorica

Data infrastructure has been transformed over the last fifteen years. The open source Hadoop project led to the creation of multiple companies based around commercializing the MapReduce algorithm and Hadoop distributed file system. Cheap cloud storage popularized the usage of data lakes. Cheap cloud servers led to wide experimentation for data tools. Apache Spark emerged from academia, and Apache Kafka came out of the corporate challenges faced by LinkedIn. Over these 15 years, Ben Lorica has been following the world of data engineering as an engineer, a conference organizer, and a podcaster. When he was host of the O’Reilly Data Show, his material served as inspiration for some of the episodes of this podcast. Today he hosts The Data Exchange podcast and writes The Data Exchange newsletter. Ben joins the show to talk about modern data engineering, and his opinion on the past and future of data infrastructure.

Feb 10, 2020 · 1h 5m

Ep 1333: Presto with Justin Borgman

A data platform contains all of the data that a company has accumulated over the years. Across a data platform, there is a multitude of data sources: databases, a data lake, data warehouses, a distributed queue like Kafka, and external data sources like Salesforce and Zendesk. A user of the data platform often has a question that requires multiple data sources to answer. How does this user join two data sources from a data lake? How does this user join data across a transactional database and a data lake? How does the user join data from two different data warehouse technologies? Presto is an open source tool originally developed at Facebook. Presto allows a user to query a data platform with a SQL statement. That query gets parsed and executed across the data platform to read from any heterogeneous data source. For some use cases, Presto is replacing Hive, a Hadoop MapReduce-based technology. For other use cases, Presto is solving a problem in a completely novel way. Justin Borgman joins the show to discuss the motivation for Presto, the problems it solves, and the architecture of Presto. He also talks about the company he started, Starburst Data, which sells and supports technologies built around Presto.
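
Here is a sketch of what that federation looks like from a client, assuming the community presto-python-client and made-up host, catalog, and table names; the key point is that one SQL statement can join tables living in different backends:

```python
import prestodb  # community presto-python-client

conn = prestodb.dbapi.connect(
    host="presto-coordinator",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()

# Each catalog.schema.table name resolves to its own connector, so this
# join spans a Hive table in the data lake and a MySQL CRM database.
cur.execute("""
    SELECT c.region, SUM(o.total) AS revenue
    FROM hive.default.orders o
    JOIN mysql.crm.customers c ON o.customer_id = c.id
    GROUP BY c.region
""")
for region, revenue in cur.fetchall():
    print(region, revenue)
```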

Feb 7, 2020 · 1h 12m

Ep 1332: Nubank Data Engineering with Sujith Nair

Nubank is a popular bank that is based in Brazil. Nubank has more than 20 million customers, and has accumulated a high volume of data over the six years since it was started. Mobile computing and cloud computing have given rise to “challenger banks” that operate more like software companies. When a software company reaches the size that Nubank is at today, it needs a data platform. A data platform is a collection of different technologies that move data into different storage formats and applications, so that different members of an organization can access that data. New data often enters an organization through an OLTP database, which supports user transactions. That data is copied into a data lake, which provides cheap bulk storage. From the data lake, the data is moved into a data warehouse system for fast access. Along the way, tools like Kafka, Spark, and S3 are used to implement the needs of the data platform. Data platform architecture is not an exact science. Different companies build their data platforms based on their own unique requirements. Previous shows have covered the data infrastructure of companies like Lyft, Uber, and Facebook. Today’s show is another case study in data infrastructure, with a modern bank. In a previous episode, we covered the engineering of Nubank. Sujith Nair from Nubank joins today’s show to talk about the data infrastructure of the company.

Feb 6, 2020 · 1h 0m

Ep 1331: Changelog Podcasting with Adam Stacoviak and Jerod Santo

The Changelog is a podcast about the world of open source. As open source has become closely tied with the entire software development lifecycle, The Changelog has expanded its coverage to the broader software industry. Since starting the podcast ten years ago, Adam Stacoviak and Jerod Santo have become full-time podcasters, and they have started several other podcasts within the Changelog network, including Go Time, JS Party, and Practical AI. Throughout all of their shows, there is a consistent theme of technical, entertaining conversations about software. In the last decade, so much has changed within open source: GitHub became the de facto social network for open source; Kubernetes created a widely used platform for distributed systems; React has given frontend developers a component system to consolidate around. Adam and Jerod return to the show to discuss their perspective on the past and future of open source, and their learnings from interviewing influential software professionals for 10 years.

Feb 5, 2020 · 1h 12m

Ep 1330: Rive: Animation Tooling with Guido and Luigi Rosso

Animations can be used to create games, app tutorials, and user interface components. Animations can be seen in messaging apps, where animated reactions can convey rich feelings over a text interface. Loading screens can become less boring through animation, and voice assistant products can feel more alive through animation. But we still don’t see much animation in our everyday applications. This is partly because animation tooling is difficult to use. To make an animation, the typical workflow is to go into a tool like After Effects, render your animation, and then export that animation in a movie format. This format is not dynamic enough to be easily used on the wide variety of development platforms. The animation library Lottie improved this tooling by creating a system for exporting animations to JSON and allowing them to easily scale up and down as vectors. But the animations were still simple and unidirectional. The developer did not have much freedom for how to move an animation in response to user input. Rive is a system for creating dynamic, movable animated objects. Rive allows for the creation of animated elements that respond to user input. Rive has a tool that runs in the browser and allows the user to define the animation. The animations in Rive use a bone system that allows animators and designers to define the points of the animated sprite that the developer can then manipulate with code. This improves the painful handoff process that exists between animators and developers, and gives the developer some programmatic control. Guido and Luigi Rosso are the founders of Rive, and they join the show to talk about the frictions of animation tooling and what they have built to improve it.

Feb 4, 2020 · 1h 16m

Ep 1329: John Deere: Farm Software with Ryan Bergman

Robotics has changed modern agriculture. Autonomous systems are powering the tractors, cotton pickers, and corn cutters that harvest crops at industrial scale. John Deere is a company that has been making farm equipment for 183 years. Over that period, the planting and harvesting process has become increasingly mechanized, and John Deere has been at the forefront. Over the last few decades, software has played an increasingly important role at John Deere. Today, there is software inside the vehicles. These vehicles can operate autonomously, they collect large amounts of data, and they are supported by a large system of cloud services. The teams within John Deere who create the software have an elaborate testing workflow that allows them to deploy the software to the vehicles and drive them in the field. Ryan Bergman is a software engineer at John Deere and he joins the show to talk about the software engineering, management, and DevOps practices within the company.

Feb 3, 2020 · 56 min

Ep 1328: Venture Stories with Erik Torenberg

Venture capital investing requires an understanding of market dynamics, technology, and finance. There is also an element of human nature. Consumer trends can make or break the viability of a new product. And early stage venture investing is always a bet on a small team or individual founder. Early stage investments usually go to companies that have not yet found perfect traction with their product. Judging the worth of an early stage investment means judging the likelihood that the founders can make their vision a reality. Venture Stories is a podcast that explores the wide spectrum of ideas that go into venture investing. Episodes include two-person interviews on economics, social networking, food technology, cryptocurrencies, and consumer psychology. Erik Torenberg is a co-founder and partner of Village Global, an early stage venture capital firm. He is also the host of Venture Stories. Erik joins the show to discuss investing, media, and the kinds of new technology companies that are being created today.

Jan 31, 2020 · 51 min

Ep 1326: Alpaca: Stock Trading API with Yoshi Yokokawa

Stock trading takes place across a variety of software platforms. E*Trade and Schwab have allowed individual traders to buy securities for decades. Robinhood built a business around a similar model, but removed the commission. Wealthfront and Betterment provide “roboadvisor” services that abstract away the underlying securities and focus on managing a risk profile. Each of these services has a programmatic execution system for managing assets. In order for a developer to build a product like Robinhood or Wealthfront, that developer needs access to an API that can execute trades. Alpaca is an API for stock trading. Alpaca can be used to build financial products, apps, and algorithmic trading programs. Yoshi Yokokawa is the founder of Alpaca, and he joins the show to talk about why he built an API for trading and the potential applications of Alpaca. Yoshi’s background includes work in finance at Lehman Brothers, a period spent as an individual day trader, and a previous company he started that sold custom trading algorithms to enterprises.

Jan 30, 2020 · 1h 7m

Ep 1325: Cloud Log Analysis with Jack Naglieri

Large software companies have lots of users, and the activity from those users results in high volumes of traffic. These companies also have a large surface area across the enterprise. There are hundreds of services and databases that are fulfilling user requests. As these requests enter the infrastructure of the enterprise, the requests travel through the different services and result in database queries, payments, and other transactions. These transactions result in the generation of log messages. The log messages tell the story of what is happening across the entire company. Log messages can provide valuable data for security and site reliability engineering. But analyzing a high volume of log data requires a scalable system that can account for that high volume. Jack Naglieri is the CEO of Panther Security. He previously worked at Airbnb, where he helped develop a system called StreamAlert. At Airbnb, log messages are buffered into distributed queueing systems like Kafka or Kinesis, and they are written to bucket storage systems like S3. Those logs are processed by AWS Lambda functions that test the log messages for rules defined by a system operator. Jack left Airbnb and started Panther Security to generalize the tools he built within Airbnb and build a company around the same ideas. Jack joins the show to discuss modern logging infrastructure, his work at Airbnb, and his experience building Panther.
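
Here is a hypothetical sketch of the rule-on-Lambda pattern described above. The record shape mirrors CloudTrail-style JSON, but the field names and handler wiring are illustrative rather than StreamAlert's actual API:

```python
def failed_console_login(record):
    # Operator-defined predicate: match failed AWS console sign-ins.
    return (
        record.get("eventName") == "ConsoleLogin"
        and record.get("responseElements", {}).get("ConsoleLogin") == "Failure"
    )

def handler(event, context):
    # A Lambda invocation receives a batch of parsed log records pulled
    # from Kinesis or S3; matches would be forwarded to an alert output
    # such as Slack or PagerDuty.
    alerts = [r for r in event["records"] if failed_console_login(r)]
    return {"alert_count": len(alerts)}
```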

Jan 29, 2020 · 1h 3m

Ep 1323: Replicated Software Delivery with Grant Miller and Marc Campbell

Distributed systems are required to run most modern enterprise software. Application services need multiple instances for scalability and failover. Large databases are sharded onto multiple nodes. Logging services, streaming frameworks, and continuous integration tools all require the orchestration of more than one server. Deploying a distributed system has historically been difficult because the nodes of the system must be managed by the underlying infrastructure. If I have a distributed database that I want to deploy, the complexity of that deployment is going to be different depending on whether I am running on AWS, or VMware, or my own bare metal server infrastructure. Heterogeneous server infrastructure makes it hard to sell distributed applications that get deployed to that infrastructure. A vendor that is selling a distributed database would need to figure out how to make their database work on the infrastructure of any given customer. Kubernetes has simplified the process of deploying a distributed application. Kubernetes is a container orchestration system that has steadily grown in popularity, to the point where the ecosystem is mature and the software is stable. Now that the software industry has a reliable, portable means of deploying a distributed application, the enterprise software market is becoming easier to enter for companies that sell a distributed application. Replicated is a company that builds products for software delivery. Replicated allows for the distribution and updating of applications that would have been hard to deploy in the past. Grant Miller and Marc Campbell are the CEO and CTO of Replicated, and they join the show to talk about the modern enterprise software market, and the process of delivering software to companies that might otherwise have trouble consuming it. Full disclosure: Replicated is a sponsor of Software Engineering Daily.

Jan 28, 2020 · 1h 5m

Ep 1322: Mattermost with Ian Tien

Chat systems have been a part of software development for decades. Older systems like Pidgin and Yammer were surpassed by newer systems like HipChat. And when Slack was created, it quickly became a part of most software companies. But Slack does not fulfill the needs of every company. Mattermost is an open-source chat system. Mattermost can be configured to work within enterprises that have strong constraints around compliance and data governance. Whereas Slack is a SaaS product that requires users to send their data to the cloud servers managed by Slack, Mattermost allows the enterprise to decide how data moves through services, and where the databases are hosted. Ian Tien is the CEO of Mattermost, and he joins the show to talk about why many companies need their chat system to be hosted in a private cloud or on-premises. In a previous episode with CTO Corey Hulen, we discussed the engineering behind the company. In today’s episode, we explore the management and strategy of the business, as well as some additional engineering, since Ian Tien’s background is as a software engineer and computer scientist.

Jan 27, 2020 · 47 min

Ep 1321: GitLab Strategy with Sid Sijbrandij

The word “DevOps” has a different definition depending on who you ask. For some people, it is about the process of managing and releasing code. It can involve container management and server orchestration. It can involve infrastructure-as-code and safer configuration management. In addition to a set of technologies, DevOps can be seen as a management concept that describes agile practices and the breaking down of communication barriers between different teams. One thing that most software companies have decided is that whatever DevOps is, we want it. We want to release more software, we want to do it faster, and we want to do it more safely. We want streamlined communication between management and engineering. We want a full understanding of the “value chain” of software. Despite the elusiveness of a single description for what DevOps is, GitLab can credibly describe itself as a tool that satisfies the DevOps needs of most enterprises. GitLab started as an open source version control management system based on Git. It has expanded into products that include continuous integration, security, issue tracking, and monitoring. The trajectory of GitLab into such a large platform is something that nobody anticipated. The best explanation for how it happened is that it is the downstream result of an engineer within GitLab deciding that the code hosting product needed to have a continuous integration product bundled with it as an option for a tightly coupled, unified workflow. Today, there are many enterprises trying to make a big set of changes to their development practices. The world is consolidating around Git for version control and Kubernetes for container management. Almost every enterprise is figuring out a “cloud strategy”. Every team wants to have continuous integration, and they want some security products paired with that release workflow in a popular, vaguely defined set of practices known as “DevSecOps”. With so many changes coming to enterprises, it turns out that many of these enterprises just want some sane defaults. When GitLab came to market with a bundled CI and code hosting product, the company discovered that customers were very happy to have integrated tools that worked well out of the box. This was in stark contrast to the years of NxN tooling integration work that an enterprise would otherwise have to do to stitch together its broad range of carefully selected tools. Sid Sijbrandij is the CEO of GitLab, and he joins the show for a conversation about how GitLab arrived at its product development strategy. In a previous episode, Sid discussed some of the core features and history of GitLab. Today’s show expands on many of the subjects we explored previously. We also had a spirited discussion of the modern nature of work, and how GitLab’s unique culture and fully remote team have evolved as the company has scaled.

Jan 24, 2020 · 1h 1m

Ep 1320: Lyft Kubernetes with Vicki Cheung

The ridesharing infrastructure of Lyft has a high volume of traffic that is mostly handled by servers on AWS. When Vicki Cheung joined Lyft in 2018, the company was managing containers with an internally built container scheduler. One of her primary goals at the company was to move Lyft to Kubernetes. In today’s episode, Vicki gives an overview of Lyft infrastructure and the core engineering problems within the company. One subject she touched on was the network communications between the user on a mobile phone and the cloud backend. This was a topic we explored in detail on a previous episode about Envoy Mobile with Matt Klein. Vicki also discussed the broader Kubernetes ecosystem, as well as her time at OpenAI, where she managed infrastructure deployments for scheduling large machine learning jobs.

Jan 23, 2020 · 45 min

Ep 1319: DFINITY: The Internet Computer with Dominic Williams

If the Internet was reimagined with the software and hardware infrastructure we have today, what would it look like? That is the question that DFINITY is working on answering. DFINITY’s goal is to build a decentralized, secure Internet computer. DFINITY takes concepts from the cryptocurrency world, but it is focused on computation, not financial products. DFINITY can be thought of as a decentralized cloud provider, with redundancy and scalability properties that are achieved by operating on data centers across the world. DFINITY wants to host web applications such as the ones that we use today on centralized servers. A developer who wants to run their application on DFINITY compiles their code to WebAssembly and deploys it to the DFINITY decentralized runtime. Transactions across DFINITY applications are processed through a collateralized proof-of-stake system to ensure reliable, decentralized computation. DFINITY is an ambitious project, and it would seem nearly impossible to bring to market if not for the quality of the team. DFINITY has hired Andreas Rossberg, a co-designer of WebAssembly, as well as talented engineers across security, web development, and backend infrastructure. Dominic Williams is the president and chief scientist of DFINITY, and he joins the show to talk about the vision for DFINITY and the roadmap to making it a reality.

Jan 22, 2020 · 1h 9m

Ep 1318: Webflow Engineering with Bryant Chou

Webflow is a visual programming tool used by designers, developers, and other technical users. Webflow is a leader in the “low code” or “no code” category of software tools that has become prominent in the last few years. Webflow has been years in the making. In a previous show with Webflow CEO Vlad Magdalin, he told the story of being heads down on Webflow, steadily working through the engineering problems that stood between him and his vision of a visual programming environment. In the early days, it was unclear who would even want to use Webflow. A critic of Webflow might have said that Webflow was too high level for developers, and too technical for designers. But Webflow caters to a large subset of developers and designers who want the kind of low code experience that Webflow provides. The product has also helped define a new category of knowledge worker: the “visual programmer.” Bryant Chou is a co-founder of Webflow and was the CTO in the early days. He joins the show to discuss the engineering problems that the company has had to work through, and his perspective on how Webflow fits into the software market going forward.

Jan 21, 2020 · 1h 7m

Ep 1317: Software Media with Tim O’Reilly

Software has changed the way the world functions. The rapid pace of change has made it difficult to know how to navigate the new world. Knowledge workers who want to keep advancing in their careers develop a strategy of continuous learning in order to adapt to these changes. O’Reilly Media has existed for almost 40 years, providing resources for the technical consumer. As O’Reilly has expanded its product line from books to conferences to online learning, the business has grown slowly but steadily. That business trajectory stands in contrast to many of the software companies that are financially structured to either grow rapidly or die. Today, O’Reilly has a large impact on the software ecosystem. Software professionals congregate at O’Reilly conferences. Enterprises pay O’Reilly to educate their employees. And O’Reilly continues to grow into new product lines, recently acquiring the interactive learning platform Katacoda, which can be used to learn about Kubernetes and other popular technologies. In a previous episode, we discussed Tim O’Reilly’s book “What’s The Future”. In today’s show, Tim returns to the show to discuss his experience building O’Reilly, and how his business philosophy contrasts with much of the assumed wisdom of software company building.

Jan 20, 2020 · 1h 5m

Ep 1316: Apollo GraphQL with Geoff Schmidt

GraphQL has become a core piece of infrastructure for many software applications. Clients make requests structured as GraphQL queries, and a GraphQL server responds to them. The GraphQL server processes the query and fetches the response from the necessary databases, APIs, and backend services. Around 2016, when GraphQL was becoming popular, a company called Meteor was deciding what to do with its business. Meteor was built around the popular framework MeteorJS, a system for building real-time JavaScript applications. MeteorJS was loved by many developers, but Meteor needed to decide if it was the most viable opportunity that the company could be pursuing with its resources. From the vantage point within Meteor, there were some trends in the frontend ecosystem that were potentially disruptive to the viability of Meteor. There were also some large potential opportunities. The dramatic change to the frontend was largely coming as a downstream effect of Facebook’s open source technologies: React and GraphQL. Amidst these changes, Meteor shifted its efforts entirely towards GraphQL and renamed the company Apollo. Geoff Schmidt is the CEO of Apollo, and he joins the show to talk about the GraphQL ecosystem, the business opportunities around GraphQL, and the process of pivoting from Meteor to Apollo.
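
For readers who have not used GraphQL, here is the basic request shape, using a hypothetical endpoint and schema: a single POST carries a query that names exactly the fields the client wants, and the server resolves each field against its backend services:

```python
import requests

query = """
query EpisodeWithGuests($id: ID!) {
  episode(id: $id) {
    title
    guests { name }
  }
}
"""

response = requests.post(
    "https://api.example.com/graphql",  # hypothetical GraphQL server
    json={"query": query, "variables": {"id": "1316"}},
)
# One round trip returns exactly the requested shape as JSON.
print(response.json()["data"]["episode"]["title"])
```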

Jan 17, 2020 · 1h 6m

Ep 1315: JS Party with Kevin Ball

The JavaScript ecosystem stretches across frontend, backend, and middleware. There are newer tools such as GraphQL, Gatsby, and WebAssembly. There are frameworks like React, Vue, and Angular. There is complex data handling with streams, caches, and TensorFlow.js. JavaScript is unlike any other ecosystem, because a single language can be used to construct every part of an application. Because JavaScript is used for such a broad spectrum of use cases, the amount of tooling available can be intimidating to someone new to the ecosystem. Kevin Ball is a host of JS Party, a podcast on The Changelog network. Kevin joins the show to give his perspective on the JavaScript ecosystem. We discussed ES Modules, the JAMstack, and the growing number of tools, libraries, and workflows used by JavaScript developers.

Jan 16, 2020 · 1h 0m

Ep 1314: Packet: Baremetal Infrastructure with Zachary Smith and Nathan Goulding

Cloud infrastructure is usually consumed in the form of virtual machines or containers. These VMs or containers are running on a physical host machine that is also running other VMs and containers. This is called multitenancy. Servers across cloud providers such as AWS have high utilization because there are multiple virtual instances running on each physical server host. Cloud computing has led to a low cost of compute infrastructure. But in some cases, this low cost comes at the price of not being able to control the underlying hardware with as much precision as the user would want. Some users want specific types of hardware. Other users want dedicated hardware that does not risk the “noisy neighbor” problem of sharing a physical server with some other application that is using most of the resources. Packet is a company that provides remote access to bare metal infrastructure. The user experience is similar to that of a cloud provider, but with more control over how a given physical host will be used. Zachary Smith is the CEO of Packet and Nathan Goulding is the chief architect. Zach and Nathan join the show to talk about the business and the engineering behind Packet, as well as the future goals for where they want to take the company.

Jan 15, 2020 · 50 min

Ep 1313: Edge Computing Platform with Jaromir Coufal

Edge computing is the usage of servers that are geographically close to the client device. The first common use case for edge computing was CDNs: content delivery networks. A content delivery network places media files such as images and videos on multiple servers throughout the world. These are big files, and they take lots of bandwidth to transfer. By placing them on CDN servers, the files are closer to any user around the world. These early use cases for edge computing were purely about storing large files, while the vast majority of compute still took place at the central application servers. Over time, users have required faster and faster application experiences. Today, an increasing amount of compute has been moved to the edge, in addition to the existing storage applications. More user data is being cached at the edge to make for quicker transactional processing. Machine learning model training and hosting at the edge make for a faster, more responsive machine learning feedback loop. Jaromir Coufal is an engineer with Red Hat. He joins the show to talk about modern applications of edge computing, and how the demand for edge computing is creating a market opportunity for companies that have lots of servers at the edge, such as telecoms. These telecoms can repurpose their widely distributed infrastructure as edge servers and sell capacity on them.

Jan 14, 2020 · 50 min

Ep 1312: Data Infrastructure Go-To-Market with Sean Knapp

Every large company generates large amounts of data. Data engineering is the process of storing, transforming, and leveraging that data. Data infrastructure companies provide tools and platforms for performing data engineering. The last fifteen years have seen a rise in modern data management companies, built in a time of decreasing storage costs, an increased volume of data, and the prevalence of cloud computing. Modern data companies include Hadoop vendors, cloud providers, and a wide variety of individual software companies offering products such as databases, ETL tools, and open source tooling. The go-to-market strategy for a data infrastructure company requires a deep understanding of the data engineering landscape. A company must build something useful, sell it to customers, and eventually build a replicable strategy. Sean Knapp is the CEO of Ascend, a company that builds Apache Spark-based data pipelines that connect APIs, data lakes, and data warehouses together to enable data applications. Sean joins the show to talk about the process of building a data infrastructure company, and his lessons building Ascend.

Jan 13, 2020 · 51 min

Ep 1311: Slack Data Platform with Josh Wills

Slack is a messaging platform for organizations. Since its creation in 2013, Slack has quickly become a core piece of technology used by a wide variety of technology companies, groups, and small teams. The messages that are sent on Slack are generated at a very high volume, and are extremely sensitive. These messages must be stored on Slack’s servers in a way that does not risk a message from one company accidentally being accessible to another company. The messages must be highly available, and they also must be indexed for search. When Slack was scaling, the company started to encounter limitations in its data infrastructure that the company was unsure how to solve. During this time, Josh Wills was the director of data engineering at Slack, and he joins the show to retell the history of his time at Slack, and why the problem of searching messages was so hard. Josh also provides a great deal of industry context around how engineers from Facebook and Google differ from one another. When Slack was starting to become popular, the company quickly began to attract engineers from both of those companies. Facebook and Google have distinct solutions for how they have tackled the problems of data engineering.

Jan 10, 2020 · 1h 19m

Ep 1310: NoSQL Optimization with Rick Houlihan

NoSQL databases provide an interface for storing and accessing data that allows the user to work with data in an “unstructured” fashion. SQL databases require data to fit a fixed schema, meaning that every row in a table has a value (or a null) for each column. One advantage of NoSQL is that data can be “denormalized,” meaning that different objects in the same database can have different fields. There is a widely held belief that NoSQL databases do not scale, or that there is some significant penalty that a developer will pay for using a NoSQL database as soon as their app becomes popular. The truth is much more subtle than that. NoSQL databases can perform as well as or better than SQL databases if the developers know the query patterns that their applications make. SQL databases will be a better choice when the database must serve a very wide spectrum of access patterns. But in many cases, an application makes a narrow range of different requests to the database, and a NoSQL database can perform very well if the database is structured and optimized for those requests. Rick Houlihan is an executive with Amazon Web Services who works with database teams and engineers to optimize their products and database infrastructure. Rick joins the show to discuss the tenets of NoSQL and describe the fundamental contrast between NoSQL and SQL databases and their limitations.
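
Here is a small boto3 sketch of the "know your access patterns" idea in DynamoDB terms, with made-up table and key names: because the app's dominant request ("all orders for a customer") is known up front, items are keyed so a single Query answers it:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("app-table")

# Single-table style: the partition key groups a customer's items, and
# the sort key prefix selects just the orders, so this common request
# is one indexed Query rather than a scan or a join.
response = table.query(
    KeyConditionExpression=(
        Key("pk").eq("CUSTOMER#42") & Key("sk").begins_with("ORDER#")
    )
)
for item in response["Items"]:
    print(item["sk"], item.get("total"))
```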

Jan 9, 2020 · 1h 0m

Ep 1309: Amazon EC2 with Dave Brown

Amazon EC2 (Elastic Compute Cloud) is a virtualized server product that provides the user with scalable compute infrastructure. EC2 was created in 2006 as one of the first three AWS services, along with S3 and Simple Queue Service. Since then, EC2 has provided the core server infrastructure for many of the companies that have been built in the cloud. A large scale virtualization product requires its engineers to have a deep understanding of scheduling and multitenancy. In previous shows, we have touched on subjects such as hypervisors, the noisy neighbor problem, the cold start problem, and other aspects of multitenant infrastructure. To make EC2 successful, these issues must be continuously revisited and resolved at different areas of the stack. Dave Brown joined the EC2 team in 2007, and now leads the EC2 Compute, Networking, and Load Balancing teams as a Vice President. Dave joins the show to discuss the history of EC2 and the canonical problems of virtualized server infrastructure.

Jan 8, 2020 · 28 min

Ep 1307: Amazon Kubernetes with Abby Fuller

Amazon’s container offerings include ECS (Elastic Container Service), EKS (Elastic Kubernetes Service), and Fargate. Through these different offerings, Amazon provides a variety of ways that a user can manage Kubernetes clusters and standalone container instances. Which containerization system to choose depends on the needs of the user, and the tradeoffs they want to make between control and portability. Amazon’s container products have been designed in the context of a shifting competitive landscape. Kubernetes presents a potential long-term threat to Amazon’s status as the most popular cloud provider. Properly responding to this threat has required Amazon to extend itself into the world of open source, contributing to Kubernetes and having more conversations with customers who want their products to have the high quality user experience of AWS along with the open characteristics of Kubernetes. Abby Fuller is a principal technologist with Amazon who works on containers and Linux. Abby joins the show to describe Amazon’s perspective on containers and Kubernetes.

Jan 7, 2020 · 38 min

Ep 1306: Kubernetes Progress with Kelsey Hightower

When the Kubernetes project was started, Amazon Web Services was the dominant cloud provider. Most of the code that runs AWS is closed source, which prevents an open ecosystem from developing around AWS. Developers who deploy their application onto AWS are opting into a closed, controlled ecosystem, which has both costs and benefits. The software industry has a history of closed and open ecosystems existing at the same time. AWS represented a huge closed ecosystem. With the amount of money at stake in the cloud business, it was only a matter of time before a more open ecosystem emerged. Since the creation of Kubernetes, the world of cloud computing has evolved rapidly. Google and Microsoft have both invested heavily into Kubernetes, and Amazon itself has adapted to the newer competitive landscape with a Kubernetes offering of its own. Amazon has also made efforts to become more publicly involved in open source projects, including Kubernetes. Kelsey Hightower has been a part of the Kubernetes ecosystem since the project was started. He is one of the most recognizable faces in the world of Kubernetes, delivering keynotes, appearing on podcasts, and co-authoring the popular Kubernetes Up and Running. Kelsey joins the show to discuss the progress in the Kubernetes ecosystem, and the competitive dynamics between Kubernetes and AWS.

Jan 6, 2020 · 56 min

Ep 1305: freeCodeCamp with Quincy Larson

freeCodeCamp was started five years ago with the goal of providing free coding education to anyone on the Internet. freeCodeCamp has become the best place to begin learning how to write software. There are many other places that a software engineer should visit on their educational journey, but freeCodeCamp is the best place to start, because it is free, and there are no advertisements. For most people learning to code, the price of that education is important, because they are learning to code to build a new career. It’s also important that a new programmer learns from an unbiased source of information, because an ad-supported environment will steer the new programmer towards products that they might not need. freeCodeCamp has not been easy to build. Building freeCodeCamp has required expertise in software engineering, business, media, and community development. The donation-based business model of freeCodeCamp doesn’t collect very much money. Why would somebody build a non-profit when they could spend their time building a highly profitable software company? Quincy Larson is the founder of freeCodeCamp, and he joins the show for a special episode about his backstory and the journey to building the best place on the Internet for a new programmer to begin.

Dec 20, 2019 · 2h 16m

Ep 1304: No Code with Shawn Wang

The software category known as “no-code” describes a set of tools that can be used to build software without writing large amounts of code in a programming language. No-code tools use visual interfaces such as spreadsheets and web based drag-and-drop systems. In previous shows, we have covered some of the prominent no-code products such as Airtable, Webflow, and Bubble. It is clear that no-code tools can be used to build core software infrastructure in a manner that is more abstract than the typical software engineering model of writing code. No-code tools do not solve everything. You can’t use a no-code tool to build a high performance distributed database, or a real-time multiplayer video game. But they are certainly useful for building internal tools and basic CRUD applications. We know that no-code tools can create value. But how do they fit into the overall workflow of a software company? How should teams be arranged now that knowledge workers can build certain kinds of software without writing code? And how should no-code systems interface with the monoliths, microservices, and APIs that we have been building for years? Shawn Wang is an engineer with Netlify, a cloud provider that is focused on delivering a high-quality development and deployment experience. Netlify is not a no-code platform, but Shawn has explored and written about the potential of no-code systems. Since he comes from a code-heavy background, he is well-positioned to give a realistic perspective on how no-code systems might evolve to play a role in the typical software development lifecycle.

Dec 19, 2019 · 1h 11m

Ep 1303: Roblox Engineering with Claus Moberg

Roblox is a gaming platform with a large ecosystem of players, creators, game designers, and entrepreneurs. The world of Roblox is a three-dimensional environment where characters and objects interact through a physics engine. Roblox is multiplayer, and users can interact with each other over the Internet. Roblox is not one single game—it is a system where anyone can design and monetize their own games within Roblox. Over the last 14 years, Roblox has grown to be massively popular. As the product has grown, the software has evolved to meet changes in consumer demands and engineering constraints. Client devices include mobile phones, desktop computers, and virtual reality. All of these clients must have a consistent experience in graphics and functionality. The backend platform has to support a high volume of concurrent players who are accessing a high volume of content. The networking needs to support multiple players operating in an environment that demands high bandwidth. Claus Moberg is a vice president of engineering at Roblox. He joins the show to discuss the engineering of Roblox and the future of gaming.

Dec 18, 2019 · 56 min

Ep 1302: Kubernetes at Cruise with Karl Isenberg

Cruise is a company that is building a fully automated self-driving car service. The infrastructure of a self-driving car platform presents a large number of new engineering problems. Self-driving cars collect vast quantities of data as they are driving around the city. This data needs to be transferred from the cars onto cloud servers. The data needs to be used for training machine learning models. These models must be tested in a simulated environment, which provides more data to be integrated back into the self-driving system that is deployed to the cars. As the cars drive around the city, they can communicate with custom cloud services to get information about traffic, navigation, and weather. Cloud services are also used for internal tooling that can help with automotive diagnostics, configuration changes, deployments, and security policy management. The software platform used to manage infrastructure at Cruise is a combination of cloud products, open source tools, and custom built infrastructure that is mostly deployed to Kubernetes. Karl Isenberg is an engineer at Cruise, and he joins the show to talk about the engineering requirements of building a self-driving car service, and Cruise’s approach to platform engineering.

Dec 17, 2019 · 1h 7m

Ep 1301: Snyk: Open Source Security with Guy Podjarny

The software supply chain includes cloud infrastructure, on-prem proprietary solutions, APIs, programming languages, networking products, and open source software. Each of these software categories has its own security vulnerabilities, and each category has tools that can help protect a company from attackers that are trying to exploit known vulnerabilities. As open source software has grown in popularity, it has turned into an enormous potential attack surface that is difficult to protect. Snyk is a company that builds security tools for companies that are consuming open source. Guy Podjarny is the CEO of Snyk, and he joins the show to discuss the security vulnerabilities of open source projects, and how his business works. Guy was previously a CTO at Akamai, so he has significant experience in technical leadership. He also is the host of the podcast The Secure Developer, which I recommend for anyone who is interested in technical interviews about security topics.

Dec 16, 2019 · 58 min

Ep 1300: GitLab Engineering with Marin Jankovski

GitLab is a company that builds an open source platform for managing Git repositories. GitLab was started in 2012, and has grown to have a large enterprise business with additional products such as continuous integration and security tooling. GitLab is also known for being a large, entirely remote workforce. GitLab does not have any offices, and the employees mostly communicate through Slack, email, and GitLab itself. Marin Jankovski was the first full-time engineer to join GitLab after the company was started. Marin joins the show to talk about the early days of GitLab, the evolution of the remote culture, and how product development works at GitLab today. He also talks about the experience of being a fast-growing company in the public spotlight of the software industry.

Dec 13, 2019 · 55 min

Ep 1299 Basic Income with Floyd Marinescu

Automation has the potential to eliminate rote jobs such as call center work and truck driving. The downstream effects of automation also lead to new jobs, such as data labeling and robot operations. The net effect of modern automation technology is unclear, but it is likely to cause some disruption in the job market. Universal Basic Income (UBI) is an economic policy idea in which a government sends money to every person living in the country. The goal of UBI is to reduce the impact of dramatic changes to the economy that result from accelerating technological change. Floyd Marinescu is the CEO of C4 Media, the company that produces the QCon conference series and the InfoQ website for software engineers. Floyd has worked in the software industry for decades and in recent years has become an advocate for basic income. He is a friend and supporter of Andrew Yang, a 2020 presidential candidate running on a platform centered on a basic income policy. Floyd joins the show for a discussion of the future, and the potential positive and negative consequences of implementing a basic income.

Dec 12, 2019 · 48 min

Ep 1298 Continuous Intelligence with Kalyan Ramanathan

Logging provides raw data that can be abstracted into higher-level information. Logs are generated at every layer of infrastructure: physical host, virtual machine, container, pod, and Kubernetes cluster. Logs are generated by network proxies, edge servers, and API requests. There is far too much logging information to be read by humans. Log messages need to be refined into statistical metrics that can be put into charts. A high volume of log messages can be used to detect anomalies across a system. If unusual behavior is present in a system, the relevant log messages can be identified and sent to a human operator to triage and respond to. Kalyan Ramanathan works at Sumo Logic, a platform for log management and continuous intelligence. Sumo Logic recently published the Continuous Intelligence Report, which is based on a study of over 2,000 technology companies. This is a useful data set for anyone who is looking to understand adoption of cloud products and Kubernetes, and it can be found at softwareengineeringdaily.com/sumologic. Kalyan joins the show to discuss log management, continuous intelligence, and the data that Sumo Logic gathered in the Continuous Intelligence Report. Full disclosure: Sumo Logic is a sponsor of Software Engineering Daily.
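
A rough illustration of that pipeline, from raw log lines to a chartable metric to an anomaly flag, is sketched below. This is not Sumo Logic’s implementation; the log format, minute-level bucketing, and three-sigma threshold are assumptions chosen for clarity.

```python
from collections import Counter
from statistics import mean, stdev

# Each entry stands in for a parsed log line: (minute bucket, severity).
logs = [
    ("2019-12-11T10:00", "ERROR"), ("2019-12-11T10:00", "INFO"),
    ("2019-12-11T10:01", "INFO"), ("2019-12-11T10:02", "ERROR"),
]

# Refine raw lines into a statistical metric: errors per minute.
errors_per_minute = Counter(ts for ts, level in logs if level == "ERROR")

# Flag any minute whose error count sits far above the mean.
counts = list(errors_per_minute.values())
if len(counts) > 1:
    mu, sigma = mean(counts), stdev(counts)
    for minute, count in errors_per_minute.items():
        if sigma > 0 and (count - mu) / sigma > 3:
            print(f"anomaly at {minute}: {count} errors")  # hand off to an operator
```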

Dec 11, 2019 · 48 min

Ep 1297 Remote Work with Philip Thomas

Offices have historically been the place where most knowledge work takes place. An office is a central meeting point for everyone in an organization. Offices allow for high-bandwidth, in-person communication. Employees have access to shared resources, such as food, tables, and quiet working space. Offices provide a means of encouraging a common culture within a given workplace. There are also significant downsides to offices, most notably the commute. Employees often spend between one and four hours per day driving to the office. Office work is especially restrictive for parents with young children to take care of. Many employees feel more productive when they are working at home or from a coffee shop. Remote work is becoming an increasingly popular mode of work for knowledge workers. Remote work has been made possible by increased bandwidth, more powerful computers, and new communications software such as Slack and Zoom. Remote work is a powerful trend that is reshaping how knowledge workers spend their time. It is also changing how companies are structured. A “remote first” culture impacts hiring, human resources, engineering, sales, work-life balance, and every other aspect of operations. The downstream impacts of remote work will change the labor market even more thoroughly, and cause us to rethink contracting, equity structures, and the traditional five-day workweek. Philip Thomas is the co-founder and CEO of Moonlight Work, a marketplace for software engineers who work on contract projects full-time or part-time. Philip is also the co-author of the Remote Work Encyclopedia, a collection of strategies and tactics for knowledge workers and companies looking to adapt successfully to the changes that remote work is bringing to the world. Philip joins the show to discuss remote work and his experience building Moonlight. Full disclosure: I am an investor in Moonlight Work.

Dec 10, 2019 · 1h 0m

Ep 1296 Practical AI with Chris Benson

Machine learning algorithms have existed for decades. But in the last ten years, several advancements in software and hardware have caused dramatic growth in the viability of applications based on machine learning. Smartphones generate large quantities of data about how humans move through the world. Software-as-a-service companies generate data about how these humans interact with businesses. Cheap cloud infrastructure allows for the storage of these high volumes of data. Machine learning frameworks such as Apache Spark, TensorFlow, and PyTorch allow developers to easily train statistical models. These models are deployed back to the smartphones and the software-as-a-service companies, improving people’s ability to move through the world and get value from their business transactions. As humans interact more with their computers, they generate more data, which is used to create better models and deliver even more value to consumers. The combination of smartphones, cloud computing, machine learning algorithms, and distributed computing frameworks is often referred to as “artificial intelligence.” Chris Benson is the host of the podcast Practical AI, and he joins the show to talk about the modern applications of artificial intelligence and the stories he is covering on Practical AI. On his podcast, Chris talks about everything within the umbrella of AI, from high-level stories to low-level implementation details.
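
To ground the “train statistical models” step, here is a minimal PyTorch training loop. The synthetic data and single-layer model are placeholders for illustration, not anything discussed in the episode.

```python
import torch
import torch.nn as nn

# Synthetic data: recover y = 3x + 1 from noisy samples.
x = torch.randn(100, 1)
y = 3 * x + 1 + 0.1 * torch.randn(100, 1)

model = nn.Linear(1, 1)  # one input feature, one output
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for _ in range(200):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass and loss computation
    loss.backward()              # backpropagate
    optimizer.step()             # update the weights

print(model.weight.item(), model.bias.item())  # approaches 3 and 1
```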

Dec 9, 2019 · 47 min

Ep 1295 Linkerd Market Strategy with William Morgan

The container orchestration wars ended in 2016 with Kubernetes emerging as the most popular open source tool for deploying and managing infrastructure. Since that time, most large enterprises have been implementing a “platform strategy” based around Kubernetes. A platform strategy is a plan for creating a consistent experience for software engineers working throughout an enterprise. At most companies, a software engineer should be thinking about business logic, whether that logic is related to banking, insurance, oil and gas, or e-commerce. Today, engineers at many enterprises need to think about continuous delivery, application deployment, security policy management, and other deeply technical problems that have nothing to do with the business they actually work in. Kubernetes is a foundational open source building block that enterprises can base the rest of their infrastructure decisions around. Kubernetes has made it much more viable for enterprises to pursue a platform strategy. With widespread adoption of Kubernetes, there is a business opportunity for companies that can offer other platform solutions that build on top of Kubernetes. A service mesh is one such tool. A service mesh provides networking and security features for all the services in an organization. The service mesh category is a large business opportunity because it sits on the critical path of every network request that goes through an enterprise. It is a potential insertion point for lots of other products, including logging agents, distributed tracing, network packet scanning, security policy management, and A/B testing. The potential for business expansion is why so many businesses are entering the service mesh category today, from cloud providers to API gateways. Buoyant was one of the first companies to work on a service mesh tool, with the Linkerd open source project. William Morgan is the CEO of Buoyant, and he returns to the show to discuss the competitive dynamics of the service mesh market.

Dec 6, 2019 · 1h 2m

Ep 1294 Istio Market Strategy with Zack Butcher

Kubernetes has created a widespread system for deploying and managing infrastructure. As Kubernetes has been increasingly adopted, companies are thinking about how to leverage that common layer of infrastructure. With the common infrastructure abstraction of Kubernetes, it becomes easier to adopt other abstractions that are uniform across the entire company. This has created a market opportunity for products such as a service mesh. A service mesh consists of sidecar containers that get deployed alongside services in a distributed system. Each sidecar container is used as a proxy for all the communication that goes through the service it is deployed with. This consistent proxying layer provides each service with benefits such as security, routing, telemetry, and policy management. Istio is a service mesh that was created and open sourced by Google. Istio is built around the Envoy service proxy sidecar and a control plane that manages the Envoy sidecars. Since the launch of Istio, some of the Google employees who were working on Istio have started Tetrate, a company with the goal of commercializing Istio into a product that enterprises will pay for. The market demand for service mesh has been proven, but there are many competitors to Tetrate. Istio is open source and can be commercialized by other companies, as well as by cloud providers such as Google and AWS. Linkerd is a service mesh built by Buoyant, the first company to focus exclusively on this space. Other companies are expanding existing products into service mesh: Kong, NGINX, and HashiCorp. Zack Butcher is a founding engineer with Tetrate, and he joins the show to discuss the market for service mesh and the plan for Tetrate to build a business around Istio.
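
The sidecar pattern is easier to see in code. The sketch below is a conceptual stand-in for Envoy, not Istio code: every request bound for the service passes through the proxy, which can enforce policy and record telemetry before forwarding. The ports and service address are assumptions.

```python
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SERVICE_URL = "http://localhost:8080"  # the application container (assumed)

class Sidecar(BaseHTTPRequestHandler):
    def do_GET(self):
        start = time.time()
        # A real proxy would enforce policy here (mTLS identity, allow/deny rules).
        with urllib.request.urlopen(SERVICE_URL + self.path) as upstream:
            body = upstream.read()
        self.send_response(upstream.status)
        self.end_headers()
        self.wfile.write(body)
        # Telemetry for this hop, which a control plane could aggregate.
        print(f"{self.path} served in {time.time() - start:.3f}s")

# The proxy, not the service, is what other pods talk to.
HTTPServer(("", 15001), Sidecar).serve_forever()
```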

Dec 5, 2019 · 1h 18m

Ep 1293 Heroku Infrastructure with Mark Turner

A cloud provider gives a developer low-cost compute infrastructure on-demand. Cloud providers can be divided into two categories: Layer 1 cloud providers and Layer 2 cloud providers. A Layer 1 cloud provider such as Amazon Web Services owns server hardware and sells compute infrastructure as a commodity. A Layer 2 cloud provider purchases compute infrastructure from a Layer 1 provider and builds a high-quality developer experience on top of that compute infrastructure. Heroku was the first Layer 2 cloud provider. Heroku’s first business was to provide a high-quality developer experience and low-cost containerization infrastructure on top of Amazon’s EC2 virtual machine infrastructure. Heroku has since added features for continuous integration, relational databases, caches, and queueing. Building a Layer 2 cloud provider is a very different challenge than building a Layer 1 cloud provider. A Layer 1 provider must focus on low-level problems such as hardware infrastructure and virtualization, which does not leave much time for focusing on developer experience, and it must be able to serve every type of potential software customer. A Layer 2 provider can offer a streamlined experience. Mark Turner is an engineer at Heroku. He joins the show to discuss the architecture and engineering of a Layer 2 cloud provider. Heroku is built on top of Amazon Web Services, and the core compute infrastructure is built on a pool of EC2 virtual machines that are continually scheduled with the applications that users create on Heroku. Full disclosure: Heroku is a sponsor of Software Engineering Daily.
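
To illustrate the scheduling problem in that last sentence, here is a minimal first-fit placement sketch. It is not Heroku’s scheduler; the VM pool and application sizes are hypothetical.

```python
# Hypothetical pool of EC2 instances with remaining memory in megabytes.
vms = [{"id": "i-01", "free_mb": 4096}, {"id": "i-02", "free_mb": 4096}]

def schedule(app: str, size_mb: int) -> str:
    """Place an app on the first VM with enough free memory."""
    for vm in vms:
        if vm["free_mb"] >= size_mb:
            vm["free_mb"] -= size_mb
            return vm["id"]
    raise RuntimeError(f"no room for {app}: provision another instance")

print(schedule("web-1", 512))      # lands on i-01
print(schedule("worker-1", 4096))  # i-01 is too full now, so i-02
```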

Dec 4, 2019 · 1h 0m

Ep 1292 Java 13 with Georges Saab

Java has been popular since the 90s, when it started to be used as a programming language for enterprises. Today, Java is still widely deployed, but the infrastructure environment is dramatically different. Java is often deployed to containers in the cloud. If those containers can share resources, they can also share the same underlying Java infrastructure. Java 13 is the most recent public release of Java, and its new features reflect the changing demands of modern application developers. Georges Saab is an engineer with Oracle who has been working on Java for more than a decade. He joins the show to discuss how Java development patterns are changing and how the language is evolving to accommodate those changes, including discussion of garbage collection and dynamic application class data sharing.

Dec 3, 2019 · 45 min

Ep 1291 Distributed SQL with Karthik Ranganathan and Sidharth Choudhury

Relational databases provide durable transactional systems for storing data. The relational model has existed for decades, but the requirements for a relational database have changed. Modern applications generate volumes of data that do not fit onto a single machine. When a database gets too big to fit on a single machine, it needs to be sharded into smaller subsets of the data. These database shards are spread across multiple machines, and as the database grows, the data can be resharded to scale to even more machines. To ensure durability, a database needs to be replicated, so that it can survive any single machine losing power or getting destroyed. Sharding and replication allow a relational database to be scalable, durable, and highly available, and there are many ways to build them into a database. Karthik Ranganathan and Sidharth Choudhury are engineers working on YugabyteDB, a distributed SQL database. In today’s episode, we discuss the modern requirements of a distributed SQL database and compare the applications of distributed SQL to those of other systems such as Cassandra and Hadoop. We also talk through the competitive market of cloud-based distributed SQL providers such as Google Cloud Spanner and Amazon Aurora. YugabyteDB is an open source database that competes with these other relational databases. Full disclosure: YugabyteDB is a sponsor of Software Engineering Daily.
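
As a minimal sketch of the sharding-plus-replication idea (not YugabyteDB’s implementation): a stable hash maps each key to a shard, and each shard is assigned to several nodes so the data survives any single machine failing. The node names and shard counts below are made up.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]
NUM_SHARDS = 8
REPLICATION_FACTOR = 3  # every shard lives on three nodes

def shard_for(key: str) -> int:
    """Stable hash, so the same key always maps to the same shard."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % NUM_SHARDS

def replicas_for(shard: int) -> list:
    """Assign each shard to REPLICATION_FACTOR consecutive nodes."""
    return [NODES[(shard + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

shard = shard_for("user:42")
print(shard, replicas_for(shard))  # deterministic: same key, same placement
```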

Dec 2, 2019 · 1h 1m