
Software Engineering Daily
2,188 episodes — Page 20 of 44
Ep 1402: Data Lakehouse with Michael Armbrust
A data warehouse is a system for performing fast queries on large amounts of data. A data lake is a system for storing high volumes of data in a format that is slow to access. A typical workflow for a data engineer is to pull data sets from this slow data lake storage into the data warehouse for faster querying. Apache Spark is a system for fast processing of data across distributed datasets. Spark is not thought of as a data warehouse technology, but it can be used to fulfill some of those responsibilities. Delta is an open source storage layer that sits on top of a data lake. Delta integrates closely with Spark, creating a system that Databricks refers to as a “data lakehouse.” Michael Armbrust is an engineer at Databricks. He joins the show to talk about his experience building the company, his perspective on data engineering, and his work on Delta, the storage system built for the Spark ecosystem.
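The core idea of a transaction log over a data lake can be sketched in a few lines of plain Python: table state is an ordered log of commits, each adding or removing data files, and a reader reconstructs a snapshot by replaying the log. This is an illustrative, in-memory sketch only — the real Delta log is a sequence of JSON commit files in object storage, with checkpoints and concurrency control.

```python
# Simplified sketch of a Delta-style transaction log: each commit is an
# ordered list of "add"/"remove" file actions, and a snapshot of the table
# is reconstructed by replaying every commit in order.

def snapshot(log, version=None):
    """Replay commits up to `version` (inclusive) to get the live file set."""
    files = set()
    for v, actions in enumerate(log):
        if version is not None and v > version:
            break
        for op, path in actions:
            if op == "add":
                files.add(path)
            elif op == "remove":
                files.discard(path)
    return files

log = []
log.append([("add", "part-0.parquet")])              # version 0
log.append([("add", "part-1.parquet")])              # version 1
log.append([("remove", "part-0.parquet"),
            ("add", "part-2.parquet")])              # version 2: compaction

print(sorted(snapshot(log)))      # latest: ['part-1.parquet', 'part-2.parquet']
print(sorted(snapshot(log, 1)))   # time travel: ['part-0.parquet', 'part-1.parquet']
```

Because old commits are never rewritten, reading any historical version ("time travel") falls out of the same replay logic.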
Ep 1400: JAMStack Content Management with Scott Gallant, Jordan Patterson, and Nolan Phillips
A content management system (CMS) defines how the content on a website is arranged and presented. The most widely used CMS is WordPress, an open source tool written in PHP. A large percentage of the web consists of WordPress sites, and WordPress has a huge ecosystem of plugins and templates. Despite the success of WordPress, many developers see the JAMStack as the future of web development. JAM stands for JavaScript, APIs, and Markup. In contrast to monolithic WordPress deployments, a JAMStack site consists of loosely coupled components, and there are numerous options for a CMS in this environment. TinaCMS is one such option. TinaCMS is an acronym for “Tina Is Not A CMS”, and it is a toolkit for content management. Scott Gallant, Jordan Patterson, and Nolan Phillips work on TinaCMS, and they join the show to explore the topic of content management on the JAMStack.
Ep 1398: Prefect Dataflow Scheduler with Jeremiah Lowin
A data workflow scheduler is a tool used for connecting multiple systems together in order to build pipelines for processing data. A data pipeline might include a Hadoop task for ETL, a Spark task for stream processing, and a TensorFlow task to train a machine learning model. The workflow scheduler manages the tasks in that data pipeline and the logical flow between them. Airflow is a popular data workflow scheduler that was originally created at Airbnb. Since then, the project has been adopted by numerous companies that need workflow orchestration for their data pipelines. Jeremiah Lowin was a core committer to Airflow for several years before he identified several features of Airflow that he wanted to change. Prefect is a dataflow scheduler that was born out of Jeremiah’s experience working with Airflow. Prefect’s features include data sharing between tasks, task parameterization, and a different API than Airflow. Jeremiah joins the show to discuss Prefect, and how his experience with Airflow led to his current work in dataflow scheduling.
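The essential job of a workflow scheduler — run each task once its upstream dependencies have finished, and pass results downstream — can be sketched with the standard library. This is an illustrative sketch of the pattern, not Prefect's actual API, and the ETL task names are made up.

```python
# Minimal sketch of a workflow scheduler: tasks declare upstream
# dependencies, and the scheduler runs each task only after all of its
# dependencies have finished, feeding their results in as arguments.

from graphlib import TopologicalSorter

def run_pipeline(tasks, deps):
    """tasks: name -> callable; deps: name -> list of upstream task names."""
    results = {}
    # static_order() yields each task after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        results[name] = tasks[name](*[results[d] for d in deps[name]])
    return results

tasks = {
    "extract":   lambda: [1, 2, 3],                  # pull raw records
    "transform": lambda rows: [r * 10 for r in rows],  # clean/convert
    "load":      lambda rows: sum(rows),             # write an aggregate
}
deps = {"extract": [], "transform": ["extract"], "load": ["transform"]}

print(run_pipeline(tasks, deps)["load"])  # 60
```

Real schedulers add retries, parallelism, and persistence on top, but the dependency graph is the organizing abstraction in Airflow and Prefect alike.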
Ep 1397: CockroachDB with Peter Mattis
A relational database often holds critical operational data for a company, including user names and financial information. Since this data is so important, a relational database must be architected to avoid data loss. Relational databases need to be distributed systems in order to provide the fault tolerance necessary for production use cases. If a database node goes down, the database must be able to recover smoothly without data loss, and this requires having all of the data in the database replicated beyond a single node. If you write to a distributed transactional database, that write must propagate to each of the other nodes in the database. If you read from a distributed database, that read must return the same data that any other database reader would see. These constraints can be satisfied differently depending on the design of the database system. As a result, there is a vast market of distributed databases from cloud providers and software vendors. CockroachDB is an open source, globally consistent relational database. CockroachDB is heavily informed by Google Spanner, the relational database that Google uses for much of its transactional workloads. Peter Mattis is a co-founder of CockroachDB, and he joins the show to discuss the architecture of CockroachDB, the process of building a business around a database, and his memories working on distributed systems at Google. Full disclosure: CockroachDB is a sponsor of Software Engineering Daily.
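One classic way to satisfy the read-sees-latest-write constraint is majority quorums: any read quorum must overlap any write quorum, so a reader always sees at least one replica with the newest version. The toy sketch below illustrates only that overlap argument — CockroachDB actually replicates via Raft consensus, which is considerably more involved.

```python
# Toy sketch of quorum replication over 5 replicas: writes go to a
# majority with a version number; reads consult a majority and take the
# value with the highest version. Since any two majorities of 5 replicas
# share at least one member, a read always observes the latest write.

import random

class Replica:
    def __init__(self):
        self.version, self.value = 0, None

def quorum_write(replicas, version, value):
    majority = random.sample(replicas, len(replicas) // 2 + 1)
    for r in majority:
        r.version, r.value = version, value

def quorum_read(replicas):
    majority = random.sample(replicas, len(replicas) // 2 + 1)
    newest = max(majority, key=lambda r: r.version)
    return newest.value

replicas = [Replica() for _ in range(5)]
quorum_write(replicas, 1, "balance=100")
quorum_write(replicas, 2, "balance=80")
print(quorum_read(replicas))  # always "balance=80", whichever majority is sampled
```

The randomness in the sampled quorums never changes the answer, which is the point: correctness comes from the intersection property, not from which replicas happen to respond.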
Ep 1396: Dask: Scalable Python with Matthew Rocklin
Python is the most widely used language for data science, and several libraries are commonly used by Python data scientists, including NumPy, pandas, and scikit-learn. These libraries improve the user experience of a Python data scientist by giving them access to high level APIs. Data science is often performed over huge datasets, and the data structures instantiated from those datasets need to be spread across multiple machines. To manage large distributed datasets, a library such as scikit-learn can use a system called Dask. Dask allows the instantiation of data structures such as a Dask dataframe or a Dask array. Matthew Rocklin is the creator of Dask. He joins the show to talk about distributed computing with Dask, its use cases, and the Python ecosystem. He also provides a detailed comparison between Dask and Spark, which is also used for distributed data science.
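The pattern underneath a Dask array or dataframe is: partition the data into chunks, compute on each chunk in parallel, then combine the partial results. Here is a stdlib-only sketch of that pattern for a mean — Dask's real API instead builds a lazy task graph and schedules it across workers or machines.

```python
# Sketch of the chunked-computation idea behind Dask: split the data into
# chunks, compute a partial result per chunk in parallel, then combine.

from concurrent.futures import ThreadPoolExecutor

def chunked(data, n_chunks):
    size = (len(data) + n_chunks - 1) // n_chunks
    return [data[i:i + size] for i in range(0, len(data), size)]

def chunked_mean(data, n_chunks=4):
    chunks = chunked(data, n_chunks)
    with ThreadPoolExecutor() as pool:
        # each worker reduces its chunk to a (sum, count) pair
        partials = list(pool.map(lambda c: (sum(c), len(c)), chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

data = list(range(1, 1_000_001))   # 1..1,000,000
print(chunked_mean(data))          # 500000.5
```

Note that the per-chunk result is `(sum, count)`, not a per-chunk mean: a mean of means would be wrong for unequal chunks, so the combine step has to work from associative partial aggregates — the same discipline Dask and Spark reductions follow.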
Ep 1395: Rasa: Conversational AI with Tom Bocklisch
Chatbots became widely popular around 2016 with the growth of chat platforms like Slack and voice interfaces such as Amazon Alexa. As chatbots came into use, so did the infrastructure that enabled chatbots. NLP APIs and complete chatbot frameworks came out to make it easier for people to build chatbots. The first suite of chatbot frameworks were largely built around rule-based state machine systems. These systems work well for a narrow set of use cases, but fall over when it comes to chatbot models that are more complex. Rasa was started in 2015, amidst the chatbot fever. Since then, Rasa has developed a system that allows a chatbot developer to train their bot through a system called interactive learning. With interactive learning, I can deploy my bot, spend some time talking to it, and give that bot labeled feedback on its interactions with me. Rasa has open source tools for natural language understanding, dialogue management, and other components needed by a chatbot developer. Tom Bocklisch works at Rasa, and he joins the show to give some background on the field of chatbots and how Rasa has evolved over time.
Ep 1394: Cloudburst: Stateful Functions-as-a-Service with Vikram Sreekanti
Serverless computing is a way of designing applications so that application code is never deployed to or addressed at specific servers. Serverless applications are composed of stateless functions-as-a-service and stateful data storage systems such as Redis or DynamoDB. The entire architecture of a serverless application can scale up and down, because each component is naturally scalable. And this pattern can be used to create a wide variety of applications: the functions-as-a-service handle the compute logic, and the data storage systems handle the storage. But these applications do not give the developer as much flexibility as an ideal serverless system might, because the developer needs to use cloud-specific state management systems. Vikram Sreekanti is the creator of Cloudburst, a system for stateful functions-as-a-service. Cloudburst is architected as a set of VMs that execute the functions-as-a-service scheduled onto them. Each VM can utilize a local cache, as well as an autoscaling key-value store called Anna which is accessible to the Cloudburst runtime components. Vikram joins the show to talk about serverless computing and his efforts to build stateful serverless functionality.
Ep 1392: NGINX API Management with Kevin Jones
NGINX is a web server that can be used to manage the APIs across an organization. Managing these APIs involves deciding on the routing and load balancing across the servers which host them. If the traffic of a website suddenly spikes, the website needs to spin up new replica servers and update the API gateway to route traffic to those new replicas. Some servers should not be accessible to outside traffic, and policy management is used to configure the security policies of different APIs. And as a company grows, the number of APIs also grows, increasing the complexity of managing routing logic and policies. Kevin Jones is a product manager with NGINX. He joins the show to discuss how API management has changed with the growth of cloud and mobile, and how NGINX has evolved over that period of time. Full disclosure: NGINX is a sponsor of Software Engineering Daily.
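The routing, load-balancing, and policy decisions described above all live in NGINX configuration. The fragment below sketches a gateway fronting two APIs; the upstream names and addresses are hypothetical placeholders, and a production `server` block would also need certificate directives.

```nginx
# Sketch of NGINX as an API gateway: route two APIs to separate upstream
# pools, load-balance across replicas, and block internal endpoints.
upstream users_api {
    least_conn;                      # send each request to the least-busy replica
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

upstream orders_api {
    server 10.0.1.11:8080;
    server 10.0.1.12:8080 backup;    # only used if the primary is down
}

server {
    listen 443 ssl;

    location /api/users/  { proxy_pass http://users_api;  }
    location /api/orders/ { proxy_pass http://orders_api; }

    # policy: internal-only endpoints are not exposed through the gateway
    location /internal/ { deny all; }
}
```

Scaling up then amounts to adding `server` lines to an upstream block (or reloading generated config), which is exactly the step that automation takes over as the number of APIs grows.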
Ep 1391: Frontend Monitoring with Matt Arbesfeld
Historically, web development has involved more work on the server than on the client, and observability tooling has reflected this emphasis on the backend. Monitoring tools for log management and backend metrics have existed for decades, helping developers debug their server infrastructure. Today, web frontends have more work to do. Components in frameworks such as React and Angular can respond instantly without waiting for a network request, processing their mutations entirely in the browser. This results in better user experiences, but more of the work happens on the client side, away from the backend observability tools. Matt Arbesfeld is a co-founder of LogRocket, a tool that records and plays back browser sessions and allows engineers to look at those sessions to understand what kinds of issues are occurring in the user’s browser. Matt joins the show to talk about the field of frontend monitoring, and the engineering behind his company LogRocket.
Ep 1390: Zoom Vulnerabilities with Patrick Wardle
Zoom video chat has become an indispensable part of our lives. In a crowded market of video conferencing apps, Zoom managed to build a product that performs better than the competition, scaling with high quality to hundreds of meeting participants, and millions of concurrent users. Zoom’s rapid growth in user adoption came from its focus on user experience and video call quality. This focus on product quality came at some cost to security quality. As our entire digital world has moved onto Zoom, the engineering community has been scrutinizing Zoom more closely, and discovered several places where the security practices of Zoom are lacking. Patrick Wardle is an engineer with a strong understanding of Apple products. He recently wrote about several vulnerabilities he discovered on Zoom, and joins the show to talk about the security of large client-side Mac applications as well as the specific vulnerabilities of Zoom.
Ep 1389: Facebook OpenStreetMap Engineering with Saurav Mapatra and Jacob Wasserman
Facebook applications use maps for showing users where to go. These maps can display businesses, roads, and event locations. Understanding the geographical world is also important for performing search queries that take into account a user’s location. For all of these different purposes, Facebook needs up-to-date, reliable mapping data. OpenStreetMap is an open system for accessing mapping data. Anyone can use OpenStreetMap to add maps to their application. The data in OpenStreetMap is crowdsourced by users who submit updates to the OpenStreetMap database. Since anyone can submit data to OpenStreetMap, there is a potential for bad data to appear in the system. Facebook uses OpenStreetMap for its mapping data, including for important applications where bad data would impact a map user in a meaningfully negative way. In order to avoid this, Facebook builds infrastructure tools to improve the quality of its maps. Saurav Mapatra and Jacob Wasserman work at Facebook on its mapping infrastructure, and join the show to talk about the tooling Facebook has built around OpenStreetMap data.
Ep 1388: NGINX Service Mesh with Alan Murphy
NGINX is a web server that is used as a load balancer, an API gateway, a reverse proxy, and other purposes. Core application servers such as Ruby on Rails are often supported by NGINX, which handles routing the user requests between the different application server instances. This model of routing and load balancing between different application instances has matured over the last ten years due to an increase in the number of servers, and an increase in the variety of services. A pattern called “service mesh” has grown in popularity and is used to embed routing infrastructure closer to individual services by giving them a sidecar proxy. The application sidecars are connected to each other, and requests between any two services are routed through a proxy. These different proxies are managed by a central control plane which manages policies of the different proxies. Alan Murphy works at NGINX, and he joins the show to give a brief history of NGINX and how the product has evolved from a reverse proxy and edge routing tool to a service mesh. Alan has worked in the world of load balancing and routing for more than a decade, having been at F5 Networks for many years before F5 acquired NGINX. We also discussed the business motivations behind the merger of those two companies. Full disclosure: NGINX is a sponsor of Software Engineering Daily.
Ep 1387: Shopify React Native with Farhan Thawar
Shopify is a platform for selling products and building a business. It is a large e-commerce company with hundreds of engineers and several different mobile apps. Shopify’s engineering culture is willing to adopt new technologies aggressively, trying new tools that might provide significant leverage to the organization. React Native is one of those technologies. React Native can be used to make cross-platform mobile development easier by allowing code reuse between Android and iOS. React Native was developed within Facebook, and has been adopted by several other prominent technology companies, with varying degrees of success. Many companies have seen improvements to their mobile development and release process. However, in a previous episode, we talked with Airbnb about their adoption of React Native, which was less successful. Farhan Thawar is a VP of engineering at Shopify. He joins the show to talk about Shopify’s experience using React Native, the benefits of cross-platform development, and his perspective on when it is not a good idea to use React Native.
Ep 1386: Ceph Storage System with Sage Weil
Ceph is a storage system that can be used for provisioning object storage, block storage, and file storage. These storage primitives can be used as the underlying medium for databases, queueing systems, and bucket storage. Ceph is used in circumstances where the developer may not want to use public cloud resources like Amazon S3. As an example, consider telecom infrastructure. Telecom companies that run their own data centers need software layers that let their operators and developers spin up databases and other abstractions with the same easy experience that a cloud provider like AWS offers. Sage Weil has been a core developer on Ceph since 2005, and Inktank, the company he started around Ceph, sold to Red Hat for $175 million. Sage joins the show to talk about the engineering behind Ceph and his time spent developing companies.
Ep 1385: Collaborative SQL with Rahil Sondhi
Data analysts need to collaborate with each other in the same way that software engineers do. They also need a high quality development environment. These data analysts are not working with programming languages like Java and Python, so they are not using an IDE such as Eclipse. Data analysts predominantly use SQL, and the tooling for a data analyst to work with SQL is often a SQL explorer tool that lacks the kind of collaborative experience that we would expect in the age of Slack and GitHub. Rahil Sondhi is the creator of PopSQL, a collaborative SQL explorer. He created PopSQL after several years in the software industry, including 4 years at Instacart. Rahil joins the show to talk about the frictions that data analysts encounter when working with databases, and how those frictions led to the design of PopSQL.
Ep 1384: Reserved Instances with Aran Khanna
When a developer spins up a virtual machine on AWS, that virtual machine could be purchased using one of several types of cost structures. These cost structures include on-demand instances, spot instances, and reserved instances. On-demand instances are often the most expensive, because the developer gets reliable VM infrastructure without committing to long-term pricing. Spot instances are cheap, spare compute capacity with lower reliability, that is available across AWS infrastructure. Reserved instances allow a developer to purchase longer term VM contracts for a lower price. Reserved instances can provide significant savings, but it can be difficult to calculate how much infrastructure to purchase. Aran Khanna is the founder of Reserved.ai, a company that builds cost management tools for AWS. He joins the show to talk about the landscape of cost management, and what he is building with Reserved.ai.
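The purchase decision reduces to break-even arithmetic: a reservation is paid for the whole term whether the instance runs or not, so it only wins if actual usage exceeds a utilization threshold. All prices below are hypothetical placeholders, not real AWS rates.

```python
# Back-of-the-envelope reserved-instance math: find the number of usage
# hours at which a reservation becomes cheaper than paying on-demand.

def break_even_hours(on_demand_hourly, reserved_hourly, term_hours):
    """Hours of actual usage at which the reservation starts to pay off."""
    reserved_total = reserved_hourly * term_hours   # paid whether used or not
    return reserved_total / on_demand_hourly

ON_DEMAND = 0.10      # $/hour, hypothetical
RESERVED = 0.06       # $/hour effective, hypothetical
TERM = 365 * 24       # one-year term = 8760 hours

threshold = break_even_hours(ON_DEMAND, RESERVED, TERM)
print(f"break-even at {threshold:.0f} hours ({threshold / TERM:.0%} utilization)")
```

With these example rates the break-even is 5256 hours, i.e. 60% utilization over the year: below that, the "cheaper" reserved instance actually costs more than on-demand, which is why forecasting usage is the hard part of the purchase.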
Ep 1383: Snorkel: Training Dataset Management with Braden Hancock
Machine learning models require the use of training data, and that data needs to be labeled. Today, we have high quality machine learning infrastructure such as TensorFlow, but we don’t have enough large, high quality labeled datasets. For many applications, the state of the art is to manually label training examples and feed them into the training process. Snorkel is a system for scaling the creation of labeled training data. In Snorkel, human subject matter experts create labeling functions, and these functions are applied to large quantities of data in order to label it. For example, if I want to generate training data about spam emails, I don’t have to hire 1000 email experts to look at emails and determine if they are spam or not. I can hire just a few email experts, and have them define labeling functions that can indicate whether an email is spam. If that doesn’t make sense, don’t worry. We discuss it in more detail in this episode. Braden Hancock works on Snorkel, and he joins the show to talk about the labeling problems in machine learning, and how Snorkel helps alleviate those problems. We have done many shows on machine learning in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about machine learning, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.
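The spam example above can be made concrete in miniature: each expert writes a small heuristic that votes SPAM, HAM, or abstains, and the noisy votes are combined into one training label. The heuristics and emails here are made up, and real Snorkel combines votes with a learned generative model of each function's accuracy rather than the simple majority vote shown.

```python
# Miniature version of the Snorkel idea: experts write small labeling
# functions instead of hand-labeling examples; the functions' noisy,
# possibly conflicting votes are combined into a single training label.

SPAM, HAM, ABSTAIN = 1, 0, -1

def lf_mentions_prize(email):
    return SPAM if "prize" in email.lower() else ABSTAIN

def lf_all_caps_subject(email):
    subject = email.split("\n", 1)[0]
    return SPAM if subject.isupper() else ABSTAIN

def lf_known_sender(email):
    return HAM if "alice@example.com" in email.lower() else ABSTAIN

LFS = [lf_mentions_prize, lf_all_caps_subject, lf_known_sender]

def majority_label(email):
    votes = [lf(email) for lf in LFS if lf(email) != ABSTAIN]
    if not votes:
        return ABSTAIN          # no function fired: leave unlabeled
    return max(set(votes), key=votes.count)

print(majority_label("YOU WON\nClaim your prize now"))    # 1 (SPAM)
print(majority_label("lunch?\nfrom: alice@example.com"))  # 0 (HAM)
```

Three cheap heuristics can now label millions of emails, which is exactly the leverage Snorkel is after: expert time goes into writing functions once, not into labeling each example.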
Ep 1382: Cadence: Uber’s Workflow Engine with Maxim Fateev
A workflow is an application that involves more than just a simple request/response communication. For example, consider a session of a user taking a ride in an Uber. The user initiates the ride, and the ride might last for an hour. At the end of the ride, the user is charged for the ride and sent a transactional email. Throughout this entire ride, there are many different services and database tables being accessed across the Uber infrastructure. The transactions across this infrastructure need to be processed despite server failures which may occur along the way. Workflows are not just a part of Uber. Many different types of distributed operations at a company might be classified as a workflow: banking operations, spinning up a large cluster of machines, performing a distributed cron job. Maxim Fateev is the founder of Temporal.io, and the co-creator of Cadence, a workflow orchestration engine. Maxim developed Cadence when he was at Uber, seeing the engineering challenges that come from trying to solve the workflow orchestration problem. Before Uber, Maxim worked at AWS on the Simple Workflow Service, which was also a system for running workflows. Altogether, Maxim has developed workflow software for more than a decade.
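The key trick that lets a workflow like the ride above survive server failures is deterministic replay: every completed activity result is recorded in a durable event history, and after a crash the workflow function is simply re-run from the top, with recorded results replayed instead of re-executing side effects. The sketch below illustrates only that idea — Cadence's real programming model is much richer, and all names here are made up.

```python
# Sketch of durable workflow execution via replay: activities either run
# for real (and record their result) or are replayed from history, so a
# resumed workflow picks up exactly where it left off without repeating
# side effects like charging a card twice.

class WorkflowRuntime:
    def __init__(self, history=None):
        self.history = history if history is not None else []
        self.position = 0
        self.executions = 0   # counts real (non-replayed) activity runs

    def activity(self, fn, *args):
        if self.position < len(self.history):    # already ran: replay result
            result = self.history[self.position]
        else:                                    # first time: run and record
            result = fn(*args)
            self.executions += 1
            self.history.append(result)
        self.position += 1
        return result

def ride_workflow(rt):
    fare = rt.activity(lambda: 23.50)                      # compute fare
    charge = rt.activity(lambda f: f"charged ${f}", fare)  # charge card
    receipt = rt.activity(lambda: "receipt sent")          # send email
    return charge, receipt

rt1 = WorkflowRuntime()
ride_workflow(rt1)                           # all three activities run

rt2 = WorkflowRuntime(history=rt1.history)   # "crash", then resume from history
result = ride_workflow(rt2)
print(result)            # ('charged $23.5', 'receipt sent')
print(rt2.executions)    # 0 -- everything was replayed, nothing re-executed
```

This is why workflow code must be deterministic: the same function has to take the same path on every replay for the recorded history to line up.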
Ep 1380: ksqlDB: Kafka Streaming Interface with Michael Drogalis
Kafka is a distributed event streaming platform that is commonly used for storing large volumes of append-only event data. Kafka has been open source for almost a decade, and as the project has matured, it has been used for new kinds of applications. Kafka’s pubsub interface for writing and reading topics is not ideal for all of these applications, which has led to the creation of ksqlDB, a database system built for streaming applications that uses Kafka as the underlying infrastructure for storing data. Michael Drogalis is a principal product manager at Confluent, where he helped develop ksqlDB. Michael joins the show to discuss ksqlDB, including the architecture, the query semantics, and the applications which might want a database that focuses on streams. We have done many great shows on Kafka in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about Kafka, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.
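What a streaming database adds over raw pubsub is a continuously maintained materialized view: an aggregate updated per event, queryable like a table. In real ksqlDB this is declared in SQL over a Kafka topic; the stdlib sketch below only mirrors the semantics, and the pageview events are made up.

```python
# Sketch of a materialized view over an append-only event stream: the
# view folds each arriving event into a running aggregate, and "pull
# queries" read the current aggregate state like a table lookup.

from collections import defaultdict

class PageviewCounts:
    """Materialized view: pageview count per page, updated per event."""
    def __init__(self):
        self.counts = defaultdict(int)

    def apply(self, event):                # called once per stream event
        self.counts[event["page"]] += 1

    def query(self, page):                 # point-in-time "pull query"
        return self.counts[page]

view = PageviewCounts()
stream = [{"page": "/home"}, {"page": "/pricing"}, {"page": "/home"}]
for event in stream:                       # events arrive append-only
    view.apply(event)

print(view.query("/home"))     # 2
print(view.query("/pricing"))  # 1
```

Because the view is derived purely from the log, it can always be rebuilt by replaying the topic from the beginning, which is what makes Kafka a workable storage layer for a database.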
Ep 1379: Godot Game Engine with Juan Linietsky
Building a game is not easy. The development team needs to figure out a unique design and gameplay mechanics that will attract players. There is a great deal of creative work that goes into making a game successful, and these games are often built with low budgets by people who are driven by the art and passion of game creation. A game engine is a system used to build and run games. Game engines let the programmer work at a high level of abstraction, by providing interfaces for graphics, physics, and scripting. Popular game engines include Unreal Engine and Unity, both of which require a license that reduces the amount of money received by the game developer. Godot is an open source and free to use game engine. The project was started by Juan Linietsky, who joins the show to discuss his motivation for making Godot. We have done some great shows on gaming in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about game development, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.
Ep 1378: V8 Lite with Ross McIlroy
V8 is the JavaScript engine that powers Chrome. Every popular website makes heavy use of JavaScript, and V8 manages the execution environment of that code. The code running in your browser can run faster or slower depending on how “hot” the codepath is. If a certain line of code is executed frequently, that code might be optimized to run faster. V8 is running behind the scenes in your browser all the time, evaluating the code in your different tabs and determining how to manage that runtime in memory. As V8 observes and analyzes your code, it needs to allocate resources in order to determine what code to optimize. This process can be quite memory intensive, and can add significantly to the memory overhead of Chrome. Ross McIlroy is an engineer at Google, where he worked on a project called V8 Lite. The goal of V8 Lite was to significantly reduce V8’s memory overhead. Ross joins the show to talk about JavaScript memory consumption, and his work on V8 Lite. We have done some great shows on JavaScript in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about JavaScript, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.
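The "hot codepath" mechanism can be illustrated with a toy: count how often a function runs, and once it crosses a hotness threshold, swap in a faster path. Real tier-up in V8 compiles hot bytecode to optimized machine code; this sketch merely mirrors the counting trigger, using result caching as a stand-in for "the faster version", and the threshold and function are made up.

```python
# Toy illustration of hotness-triggered tier-up: a call counter decides
# when a function is "hot", after which calls take a faster (cached) path.
# The bookkeeping itself costs memory -- the tradeoff V8 Lite targets.

def tier_up(threshold=3):
    def decorator(fn):
        state = {"calls": 0, "cache": {}}
        def wrapper(*args):
            state["calls"] += 1
            if state["calls"] <= threshold:
                return fn(*args)              # cold: plain execution
            if args not in state["cache"]:    # hot: switch to cached path
                state["cache"][args] = fn(*args)
            return state["cache"][args]
        wrapper.state = state
        return wrapper
    return decorator

@tier_up(threshold=3)
def square(n):
    return n * n

for _ in range(10):
    square(7)
print(square(7))                   # 49
print(len(square.state["cache"]))  # 1 -- hot calls now hit the fast path
```

Notice that even this toy keeps per-function state (counters, a cache) alive forever; at the scale of every function on every open tab, that bookkeeping is exactly the kind of overhead V8 Lite set out to cut.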
Ep 1377: Serverless Development with Jeremy Daly
Serverless tools have come a long way since the release of AWS Lambda in 2014. Serverless apps were originally architected around Lambda, with the functions-as-a-service being used to glue together larger pieces of functionality and API services. Today, many of the common AWS services such as API Gateway and DynamoDB have functionality built in to be able to respond to events. These services can use Amazon EventBridge to connect to each other. In many cases, a developer does not need AWS Lambda to glue services together in order to build an event-driven application. Jeremy Daly is the host of the Serverless Chats podcast, a show about patterns and strategies in serverless architecture. Jeremy joins the show to talk about modern serverless development, and the new tools available in the AWS ecosystem.
Ep 1376: Audio Data Engineering with Allison King
Cortico is a non-profit that builds audio tools to improve public dialogue. Allison King is an engineer at Cortico, and she joins the show to talk about the process of building audio applications. One of these applications was a system for ingesting radio streams, transcribing the radio, and looking for duplicate information across the different radio stations. In a talk at Data Council, Allison talked through the data engineering architecture for processing these radio streams, and the patterns that she found across the radio streams, including clusters of political leanings. Another project from Cortico is called Local Voices Network. The Local Voices Network is built around a piece of hardware called a “digital hearth”, a specialized device that records discussions among people in a community. These community discussions are made available to journalists, public officials, and political candidates, creating a listening channel that connects these communities and stakeholders. Much of our conversation is focused on the engineering of the digital hearth, this device that sits in the center of community discussions.
Ep 1374: Facebook Messenger Engineering with Mohsen Agsen
Facebook Messenger is a chat application that millions of people use every day to talk to each other. Over time, Messenger has grown to include group chats, video chats, animations, facial filters, stories, and many more features. Messenger is a tool for utility as well as for entertainment. Messenger is used both on mobile and on desktop, but application size matters most on mobile. There are many users on devices that do not have much storage space. As Messenger has accumulated features, the iOS code base has grown larger and larger. Several generations of Facebook engineers have rotated through the company with the responsibility of working on Facebook Messenger, which has led to different ways of managing information within the same codebase. The iOS codebase had room for improvement. Project Lightspeed was a project within Facebook that had the goal of making Messenger on iOS much smaller. Mohsen Agsen is an engineer with Facebook, and he joins the show to talk about the process of rewriting the Messenger app.
Ep 1373: Pika Dependency Management with Fred Schott
Modern web development involves a complicated toolchain for managing dependencies. One part of this toolchain is the bundler, a tool that puts all your code and dependencies together into static asset files. The most popular bundler is webpack, which was originally released in 2012, before browsers widely supported ES Modules. Today, every major browser supports the ES Module system, which improves the efficiency of JavaScript dependency management. Snowpack is a system for managing dependencies that takes advantage of the browser support for ES Modules. Snowpack is made by Pika, a company that is developing a set of web technologies including a CDN, a package catalog, and a package code editor. Fred Schott is the founder of Pika and the creator of Snowpack. Fred joins the show to talk about his goals with Pika, and the ways in which modern web development is changing.
Ep 1372: Cloud Kitchen Platform with Ashley Colpaart
Food delivery apps have changed how the restaurant world operates. After seven years of mobile food delivery, the volume of food ordered through these apps has become so large that entire restaurants can be sustained solely through the order flow that comes in from the apps. This raises the question of whether you even need an “on-prem” restaurant. A cloud kitchen is a large, shared kitchen where food is prepared for virtual restaurants. These virtual restaurants exist only on mobile apps. There are no waiters, there are only the food delivery couriers who pick up the food from these warehouse-sized food preparation facilities. A virtual restaurant entrepreneur could open up multiple restaurants operated from the same cloud kitchen. The mobile app user might see separate restaurant listings for a pizza place, a cookie bakery, and a Thai food restaurant, when all of them are operated by the same restaurateur. Ashley Colpaart is the founder of The Food Corridor, a system for cloud kitchen management. Ashley joins the show to talk about the dynamics of virtual restaurants and the cloud kitchen industry.
Ep 1371: Remote Team Management with Ryan Chartrand
Remote engineering work makes some elements of software development harder, and some elements easier. With Slack and email, communication becomes more clear cut. Project management tools lay out the responsibilities and deliverables of each person. GitHub centralizes and defines the roles of developers. On the other hand, remote work strips away nuanced conversation. There is no water cooler or break room. Work can become systematic, rigid, and completely transactional. Your co-workers are your allies, but they feel less like friends when you don’t see them every day. For some people, this can have a devastating long-term impact on their psyche. Managers have the responsibility of ensuring the health and productivity of the people who work with them. Managing an all-remote team presents a different set of challenges than managing an in-person team. Ryan Chartrand is the CEO of X-Team, a team of developers who work across the world and collaborate with each other remotely. X-Team partners with large companies who need additional development work. Ryan joins the show to talk about the dynamics of leading a large remote workforce, as well as his own personal experiences working remotely.
Ep 1370: Sorbet: Typed Ruby with Dmitry Petrashko
Programming languages are dynamically typed or statically typed. In a dynamically typed language, the programmer does not need to declare whether a variable is an integer, string, or other type. In a statically typed language, the developer must declare the type of the variable upfront, so that the compiler can take advantage of that information. Dynamically typed languages give a programmer flexibility and fast iteration speed. But they also introduce the possibility of errors that can be avoided by performing type checking. This is one of the reasons why TypeScript has risen in popularity, giving developers the option to add types to their JavaScript variables. Sorbet is a typechecker for Ruby. Sorbet allows for gradual typing of Ruby programs, which helps engineers avoid errors that might otherwise be caused by the dynamic type system. Dmitry Petrashko is an engineer at Stripe who helped build Sorbet. He has significant experience in compilers, having worked on Scala before his time at Stripe. Dmitry joins the show to discuss his work on Sorbet, and the motivation for adding type checking to Ruby. We’re looking for new show ideas, so if you have any interesting topics, please feel free to reach out via Twitter or email us at [email protected]. We realize right now humanity is going through a hard time with the coronavirus pandemic, but we all have skills useful to fight this battle. Head over to codevid19.com to join the world’s largest pandemic hackathon!
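Gradual typing is easiest to see in a language you can run: Python's optional type hints work the same way as Sorbet's gradual types in spirit, so the analogy below uses Python rather than Sorbet's Ruby `sig` syntax. Annotated functions get their call sites checked by a tool like mypy, while untyped legacy code keeps working unchanged.

```python
# Gradual typing illustrated with Python type hints (an analogy to
# Sorbet, which does the same for Ruby): typed and untyped code coexist,
# and a static checker catches type errors in the annotated parts.

from typing import Optional

def parse_port(raw: str) -> Optional[int]:
    """Typed: a checker knows `raw` must be a str and the result may be None."""
    return int(raw) if raw.isdigit() else None

def legacy_helper(value):          # untyped: gradually-typed code can stay as-is
    return parse_port(value)

print(parse_port("8080"))   # 8080
print(parse_port("http"))   # None
# A static checker would flag a call like parse_port(8080) -- wrong
# argument type -- before the code ever runs, without changing runtime
# behavior at all.
```

The "gradual" part is the point: teams annotate the hottest, most error-prone modules first and tighten coverage over time, which is how Sorbet was rolled out across Stripe's Ruby codebase.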
Ep 1369: Datomic Architecture with Marshall Thompson
Datomic is a database system based on an append-only record keeping system. Datomic users can query the complete history of the database, and Datomic has ACID transactional support. The data within Datomic is stored in an underlying database system such as Cassandra or Postgres. The database is written in Clojure, and was co-authored by the creator of Clojure, Rich Hickey. Datomic has a unique architecture, with a component called a Peer, which gets embedded in an application backend. A Peer stores a subset of the database data in memory in this application backend, improving the latency of database queries that hit this caching layer. Marshall Thompson works at Cognitect, the company that supports and sells the Datomic database. Marshall joins the show to talk about the architecture of Datomic, its applications, and the life of a query against the database. We’re looking for new show ideas, so if you have any interesting topics, please feel free to reach out via twitter or email us at [email protected]
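The append-only model that makes historical queries possible can be sketched directly: the database is a growing log of (entity, attribute, value, transaction, added?) facts, and any past state is answered by replaying facts up to a transaction. This is only a sketch of the accumulate-only idea — Datomic's real model uses datoms, indexes, and Datalog queries — and the entities below are made up.

```python
# Sketch of an append-only fact store with "as of" queries: facts are
# never updated in place; a new value is an assertion plus a retraction
# of the old one, so every historical state remains queryable.

def value_as_of(facts, entity, attribute, tx=None):
    """Latest asserted value for (entity, attribute) at transaction `tx`."""
    current = None
    for e, a, v, t, added in facts:        # facts are ordered by transaction
        if tx is not None and t > tx:
            break
        if (e, a) == (entity, attribute):
            current = v if added else None
    return current

facts = [
    # (entity, attribute, value, tx, added)
    ("user-1", "email", "ann@old.example", 1, True),
    ("user-1", "name",  "Ann",             1, True),
    ("user-1", "email", "ann@old.example", 2, False),  # retraction
    ("user-1", "email", "ann@new.example", 2, True),
]

print(value_as_of(facts, "user-1", "email"))        # ann@new.example
print(value_as_of(facts, "user-1", "email", tx=1))  # ann@old.example
```

Because the log only grows, a Peer can cache any prefix of it in application memory and still answer queries consistently, which is what makes Datomic's embedded-Peer architecture workable.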
Ep 1368Google Cloud Networking with Lakshmi Sharma
A large cloud provider has high volumes of network traffic moving through data centers throughout the world. These providers manage the infrastructure for thousands of companies, across racks of multitenant servers and undersea cables that connect network packets with their destinations. Google Cloud Platform has grown steadily into a wide range of products, including database services, machine learning, and containerization. Scaling a cloud provider requires both technical expertise and skillful management. Lakshmi Sharma is the director of product management for networking at Google Cloud Platform. She joins the show to discuss the engineering challenges of building a large-scale cloud provider, including reliability, programmability, and how to direct a large hierarchical team.
Ep 1367ClickUp Engineering with Zeb Evans and Alex Yurkowski
Over the last fifteen years, there has been a massive increase in the number of new software tools. This is true at the infrastructure layer: there are more databases, more cloud providers, and more open-source projects. And it’s also true at a higher level: there are more APIs, project management systems, and productivity tools. ClickUp is a project management and productivity system for organizations and individuals. The goal of ClickUp is to create a system that integrates closely with other project management systems, popular SaaS tools, and the Google Suite of docs and spreadsheets. The company was started in 2016, and despite raising zero outside capital, it has grown as rapidly as many venture-backed companies. Zeb Evans and Alex Yurkowski are the founders of ClickUp. They join the show to talk about their experience building the company. We talk through their process of scaling the infrastructure, and their philosophy of moving fast. This episode has some useful strategic advice for anyone who is looking to take a product to market and iterate quickly–even if that product is bootstrapped. Full disclosure: ClickUp is a sponsor of Software Engineering Daily.
Ep 1366Pulumi: Infrastructure as Code with Joe Duffy
Infrastructure-as-code allows developers to use programming languages to define the architecture of their software deployments, including servers, load balancers, and databases. There have been several generations of infrastructure-as-code tools. Systems such as Chef, Puppet, Salt, and Ansible provided a domain-specific imperative scripting language that became popular along with the early growth of Amazon Web Services. HashiCorp’s Terraform project created an open source declarative model for infrastructure. Kubernetes YAML definitions are also a declarative system for infrastructure as code. Pulumi is a company that offers a newer system for infrastructure as code, combining declarative and imperative syntax. Pulumi programs can be written in TypeScript, Python, Go, or .NET. Joe Duffy is the CEO of Pulumi, and he joins the show to talk about his work on the Pulumi project and his vision for the company. Joe also discusses his twelve years at Microsoft, and how his work in programming language tooling shaped how he thinks about building infrastructure-as-code.
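Pulumi's engine is much more sophisticated than this, but the core declarative idea — describe desired resources in a general-purpose language, then let an engine diff them against current state — can be sketched with invented resource names:

```python
def plan(current: dict, desired: dict) -> dict:
    """Compute create/update/delete actions from two {name: config} maps."""
    return {
        "create": sorted(set(desired) - set(current)),
        "delete": sorted(set(current) - set(desired)),
        "update": sorted(n for n in desired
                         if n in current and current[n] != desired[n]),
    }

# Imperative code (loops, conditionals, functions) builds the desired state;
# the engine treats the final map declaratively.
current = {"web-server": {"size": "t2.micro"}}
desired = {
    "web-server": {"size": "t2.small"},   # changed: will be updated
    "database":   {"engine": "postgres"}, # new: will be created
}
```

This is the "combining declarative and imperative syntax" point from the episode: the program is imperative, but what the engine acts on is a declarative description of end state.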
Ep 1365Infrastructure Investing with Vivek Saraswat
Software investing requires a deep understanding of the market, and an ability to predict what changes might occur in the near future. At the level of core infrastructure, software investing is particularly difficult. Databases, virtualization, and large scale data processing tools are all complicated, highly competitive areas. As the software world has matured, it has become apparent just how big these infrastructure companies can become. Consequently, the opportunities to invest in these infrastructure companies have become highly competitive. When a venture capital fund invests into an infrastructure company, the fund will then help the infrastructure company bring their product to market. This involves figuring out the product design, the sales strategy, and the hiring roadmap. A strong investor will be able to give insight into all of these different facets of building a software company. Vivek Saraswat is a venture investor with Mayfield, a venture fund that focuses on early to growth-stage investments. Vivek joins the show to discuss his experience at AWS, Docker, and Mayfield, as well as his broad lessons around how to build infrastructure companies today.
Ep 1364Sisu Data with Peter Bailis
A high volume of data can contain a high volume of useful information. That fact is well understood by the software world. Unfortunately, it is not a simple process to surface useful information from this high volume of data. A human analyst needs to understand the business, formulate a question, and determine what metrics could reveal the answer to such a question. Sisu is a system for automatically surfacing insights from large data sets within companies. A user of Sisu can select a database column that they are interested in learning more about, and Sisu will automatically analyze the records in the database to look for trends and relationships between that column and the other columns. For example, if I have a database of user purchases, including how much money those users spent on each purchase, I can ask Sisu to analyze the purchase price column, and find what kinds of attributes correlate with a high purchase price. Perhaps there will be correlations such as age and city that I can use to understand my customers better. Sisu can automatically surface these correlations and display them to me to help me make business decisions. Peter Bailis is the CEO of Sisu Data and an assistant professor at Stanford. Peter joins the show to give his perspective on the development of Sisu, which came out of his research on data-intensive systems, including MacroBase, an analytic monitoring engine that prioritizes human attention.
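Sisu's internals are proprietary; a naive stand-in for the idea — rank an attribute's values by how far they pull the target column from its overall mean — might look like this, with made-up purchase data:

```python
from collections import defaultdict

def lift_by_attribute(rows, attribute, target):
    """For each value of `attribute`, how much its group mean of `target`
    differs from the overall mean. Large lifts suggest a relationship."""
    overall = sum(r[target] for r in rows) / len(rows)
    groups = defaultdict(list)
    for r in rows:
        groups[r[attribute]].append(r[target])
    return {v: (sum(vals) / len(vals)) - overall for v, vals in groups.items()}

purchases = [
    {"city": "SF", "age": "18-25", "price": 120.0},
    {"city": "SF", "age": "26-35", "price": 80.0},
    {"city": "NY", "age": "18-25", "price": 40.0},
    {"city": "NY", "age": "26-35", "price": 40.0},
]
```

Running `lift_by_attribute(purchases, "city", "price")` surfaces that SF purchases average $30 above the overall mean and NY purchases $30 below — the kind of correlation the blurb describes, minus the statistical rigor a real system would apply.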
Ep 1363Location Data with Ryan Fox Squire
Physical places have a large amount of latent data. Pick any location on a map, and think about all of the questions you could ask about that location. What businesses are at that location? How many cars pass through it? What is the soil composition? How much is the land on that location worth? The world of web-based information has become easy to query. We can use search engines like Google, as well as APIs like Diffbot and Clearbit. Today, the physical world is not so easy to query, but it is becoming easier. Location data as a service is a burgeoning field, with some vendors offering products for satellite data, foot traffic, and other specific location-based domains. SafeGraph is a company that provides location data-as-a-service. SafeGraph data sets include data about businesses, patterns describing human movement, and geometric representations describing the shape and size of buildings. Ryan Fox Squire develops data products for SafeGraph, and he joins the show to talk about the engineering and strategy that goes into building a data-as-a-service company.
Ep 1362Descript with Andrew Mason
Descript is a software product for editing podcasts and video. Descript is a deceptively powerful tool, and its software architecture includes novel usage of transcription APIs, text-to-speech, speech-to-text, and other domain-specific machine learning applications. Some of the most popular podcasts and YouTube channels use Descript as their editing tool because it provides a set of features that are not found in other editing tools such as Adobe Premiere or a digital audio workstation. Descript is an example of the downstream impact of machine learning tools becoming more accessible. Even though the company only has a small team of machine learning engineers, these engineers are extremely productive due to the combination of APIs, cloud computing, and frameworks like TensorFlow. Descript was founded by Andrew Mason, who also founded Groupon and Detour, and Andrew joins the show to describe the technology behind Descript and the story of how it was built. It is a remarkable story of creative entrepreneurship, with numerous takeaways for both engineers and business founders.
Ep 1361Flyte: Lyft Data Processing Platform with Allyson Gale and Ketan Umare
Lyft is a ridesharing company that generates a high volume of data every day. This data includes ride history, pricing information, mapping, routing, and financial transactions. The data is stored across a variety of different databases, data lakes, and queueing systems, and is processed at scale in order to generate machine learning models, reports, and data applications. Data workflows involve a set of interconnected systems such as Kubernetes, Spark, TensorFlow, and Flink. In order for these systems to work together harmoniously, a workflow manager is often used to orchestrate them together. A workflow platform lets a data engineer have a high-level view into how data moves through the system, and can be used to reason about retries, resource utilization, and scalability. Flyte is a data processing system built and open-sourced at Lyft. Allyson Gale and Ketan Umare work at Lyft, and they join the show to talk about how Flyte works, and why they needed to build a new workflow processing system when there are already tools available such as Airflow.
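Flyte workflows are declared with decorators and executed on Kubernetes; as a minimal, invented stand-in for what any workflow manager must do — run dependent tasks in order and retry failures — consider:

```python
def run_workflow(tasks, retries=2):
    """Run tasks in dependency order. Each task is (name, fn, deps);
    a task's fn receives the results of its dependencies."""
    results, done = {}, set()
    remaining = list(tasks)
    while remaining:
        progressed = False
        for name, fn, deps in list(remaining):
            if not all(d in done for d in deps):
                continue  # wait until every dependency has run
            for attempt in range(retries + 1):
                try:
                    results[name] = fn(*(results[d] for d in deps))
                    break
                except Exception:
                    if attempt == retries:
                        raise  # retries exhausted; fail the workflow
            done.add(name)
            remaining.remove((name, fn, deps))
            progressed = True
        if not progressed:
            raise ValueError("cycle or unsatisfiable dependency")
    return results

tasks = [
    ("extract", lambda: [3, 4], []),                          # no dependencies
    ("transform", lambda rows: [x * 2 for x in rows], ["extract"]),
    ("load", lambda rows: sum(rows), ["transform"]),
]
results = run_workflow(tasks)
```

A real platform like Flyte or Airflow adds scheduling, isolation, and observability on top, but the contract — a DAG of tasks with managed retries — is the same.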
Ep 1360Cloud Investing with Danel Dayan
Cloud computing caused a fundamental economic shift in how software is built. Before the cloud, businesses needed to buy physical servers in order to operate. There was an up-front cost that often amounted to tens of thousands of dollars required to pay for these servers. Cloud computing changed the up-front capital expense to an ongoing operational expense, with businesses increasingly shifting to Amazon Web Services, Microsoft Azure, and Google Cloud Platform. Although the initial motivation for moving onto cloud providers might have been decreased cost, over time the cloud providers have developed unique services that make software even easier to build than before. There has also been a proliferation of new software infrastructure companies that have been built on top of the cloud providers, giving rise to new databases, logging companies, and platform-as-a-service products. Danel Dayan is a venture investor with Battery Ventures and a co-author of the State of the OpenCloud 2019, a report that compiles a wide set of statistics and information on how cloud computing and open source are impacting the software industry. Danel joins the show to talk about his work as an investor, as well as his previous career at Google, where he worked on mergers and acquisitions. If you want to reach Danel you can email him at [email protected] or tweet at him via @daneldayan.
Ep 1359OneGraph: GraphQL Tooling with Sean Grove
GraphQL is a system that allows frontend engineers to make requests across multiple data sources using a simple query format. In GraphQL, a frontend developer does not have to worry about the request logic for individual backend services. The frontend developer only needs to know how to issue GraphQL requests from the client, and these requests are handled by a GraphQL server. GraphQL is mostly used to issue queries across internal databases and services. But many of the data sources that a company needs to query in modern infrastructure are not databases–they are APIs like Salesforce, Zendesk, and Stripe. These API companies might store a large percentage of the data that a given company needs to query, and executing queries, subscriptions, and joins against these APIs is not a simple task. OneGraph is a company that builds integrations with third-party services and exposes them through a GraphQL interface. Sean Grove is a founder of OneGraph, and he joins the show to explain the problem that OneGraph solves, how OneGraph is built, and some of the difficult engineering challenges required to design OneGraph.
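OneGraph's actual schema and resolvers are not shown here; as a toy illustration of the single-query idea — one request fanning out to multiple API-backed resolvers — this sketch uses entirely invented data standing in for Stripe and Zendesk responses:

```python
# Fake third-party data, standing in for Stripe and Zendesk API responses.
STRIPE_CHARGES = {"cust-1": [{"amount": 4200}]}
ZENDESK_TICKETS = {"cust-1": [{"subject": "refund?"}]}

# Each requested field maps to a resolver that knows which backend to call.
RESOLVERS = {
    "charges": lambda cid: STRIPE_CHARGES.get(cid, []),
    "tickets": lambda cid: ZENDESK_TICKETS.get(cid, []),
}

def query(customer_id, fields):
    """Resolve the requested fields from their backing 'APIs' in one call,
    the way a GraphQL server hides per-service request logic from clients."""
    return {f: RESOLVERS[f](customer_id) for f in fields}
```

The client asks for fields, not services — it never learns which backend owns `charges` versus `tickets`, which is the decoupling GraphQL provides.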
Ep 1358DBT: Data Build Tool with Tristan Handy
A data warehouse serves the purpose of providing low latency queries for high volumes of data. A data warehouse is often part of a data pipeline, which moves data through different areas of infrastructure in order to build applications such as machine learning models, dashboards, and reports. Modern data pipelines are often associated with the term “ELT” or Extract, Load, Transform. In the “ELT” workflow, data is taken out of a source such as a data lake, loaded into a data warehouse, and then transformed within the data warehouse to create materialized views on the data. Data warehouse queries are usually written in SQL, and for the last 50 years, SQL has been the primary language for executing these kinds of queries. DBT is a system for data modeling that allows the user to write queries that involve a mix of SQL and a templating language called Jinja. Jinja allows the analyst to blend imperative code along with the declarative SQL. Tristan Handy is the CEO of Fishtown Analytics, the company that created DBT, and he joins the show to discuss how DBT works, and the role it plays in modern data infrastructure.
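DBT's real templating uses Jinja; to keep this sketch dependency-free, plain Python string formatting stands in for it, showing the same pattern of imperative code generating declarative SQL (table and column names are invented):

```python
def build_query(table, statuses):
    """Generate a SQL aggregation from a template plus a loop — roughly the
    kind of repetition that Jinja templating in DBT eliminates."""
    cases = "\n".join(
        f"  sum(case when status = '{s}' then 1 else 0 end) as {s}_count,"
        for s in statuses
    )
    return (
        "select\n  customer_id,\n"
        + cases
        + "\n  count(*) as total\n"
        + f"from {table}\ngroup by customer_id"
    )

sql = build_query("orders", ["placed", "shipped", "returned"])
```

Written by hand, the three near-identical `case` expressions invite copy-paste errors; generated from a list, adding a fourth status is a one-word change.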
Ep 1357React Best Practices with Kent Dodds
ReactJS developers have lots of options for building their applications, and those options are not easy to work through. State management, concurrency, networking, and testing all have elements of complexity and a wide range of available tools. Take a look at any specific area of JavaScript application development, and you can find highly varied opinions. Kent Dodds is a JavaScript teacher who focuses on React, JavaScript, and testing. In today’s episode, Kent provides best practices for building JavaScript applications, specifically React. He provides a great deal of advice on testing, which is unsurprising considering he owns TestingJavaScript.com. Kent is an excellent speaker who has taught thousands of people about JavaScript, so it was a pleasure to have him on the show. Kent is also speaking at Reactathon, a JavaScript conference taking place March 30th and 31st in San Francisco. This week we will be interviewing speakers from Reactathon, and if you are interested in JavaScript and the React ecosystem then stay tuned, and if you hear something you like, you can check out the Reactathon conference in person.
Ep 1356React Stack with Tejas Kumar
JavaScript fatigue. This phrase has been used to describe the confusion and exhaustion around the volume of different tools required to be productive as a JavaScript developer. Frameworks, package managers, typing systems, state management, GraphQL, and deployment systems–there are so many decisions to make. In addition to the present-day tooling choices, a JavaScript developer needs to watch the emerging developments in the ecosystem. ReactJS is evolving at a rapid clip, and newer primitives such as React Hooks and React Suspense allow developers to handle concurrency and networking more robustly. Tejas Kumar works with G2i, a company that connects React developers with organizations that are looking for high-quality engineers. His role at G2i is head of vetting, which requires him to assess engineers for their competency in JavaScript-related technologies. Tejas joins the show to discuss the modern stack of technologies that a React developer uses to build an application. Full disclosure: G2i, where Tejas works, is a sponsor of Software Engineering Daily. Tejas is also speaking at Reactathon.
Ep 1355JavaScript Deployments with Brian LeRoux
Full-stack JavaScript applications have been possible since the creation of NodeJS in 2009. Since then, the best practices for building and deploying these applications have steadily evolved with the technology. ReactJS created consolidation around the view layer. The emergence of AWS Lambda created a new paradigm for backend execution. Serverless tools such as DynamoDB offer autoscaling abstractions. CDNs such as Cloudflare and Fastly can now do processing on the edge. Brian LeRoux is the founder of Begin.com, a hosting and deployment company built on serverless tools. He’s also the primary committer to Architect, a framework for defining applications to be deployed to serverless infrastructure. Brian joins the show to talk about his work in the JavaScript ecosystem and his vision for Begin.com. Brian is also speaking at Reactathon.
Ep 1354React Fundamentals with Ryan Florence
ReactJS began to standardize frontend web development around 2015. The core ideas around one-way data binding, JSX, and components caused many developers to embrace React with open arms. A large number of educators have emerged to help train developers wanting to learn React. A new developer learning React has numerous questions around frameworks, state management, rendering, and other best practices. In today’s episode, those questions are answered by Ryan Florence, a co-founder of React Training. React Training is a company devoted to helping developers learn React; it trains large companies like Google and Netflix how to use the framework. Ryan has a strong understanding of how to be productive with React, and in today’s episode, he explains some of the fundamentals that commonly confuse new students of React. Ryan is also speaking at Reactathon.
Ep 1353NextJS with Guillermo Rauch
When ReactJS became popular, frontend web development became easier. But React is just a view layer. Developers who came to React expecting a full web development framework like Ruby on Rails or Django were required to put together a set of tools to satisfy that purpose. A full-stack JavaScript framework has numerous requirements. How does it scale? How does it handle server-side rendering versus client-side rendering? Should GraphQL be included by default? How should package management work? Guillermo Rauch is the creator of NextJS, a popular framework for building React applications. He is also the CEO of ZEIT, a cloud hosting company. Guillermo joins the show to discuss NextJS, and his vision for how the React ecosystem will evolve in the near future, as features such as React Suspense and Concurrent Mode impact the developer experience. Guillermo is also speaking at Reactathon.
Ep 1351Makerpad: Low Code Tools with Ben Tossell
Low code tools can be used to build an increasing number of applications. Knowledge workers within a large corporation can use low code tools to augment their usage of spreadsheets. Entrepreneurs can use low code tools to start businesses even without knowing how to code. Modern low code tools have benefited from steady improvements in cloud infrastructure, front-end frameworks like ReactJS, and browser technology such as the V8 JavaScript engine. These building blocks led to the popular low code products such as Webflow, Bubble, Retool, and Airtable. The low code products are supported by a broad selection of domain-specific APIs such as Stripe, Twilio, and Zapier. Ben Tossell runs Makerpad, a site devoted to low-code and no-code applications. Makerpad describes how to use these tools to design sophisticated applications that don’t require you to write code. But they do require a different kind of software engineering. To create applications intelligently with low-code tools, you need to know how the tools fit together, and you need to be willing to persist through a process of iteration and debugging that is similar to traditional software engineering. Ben joins the show to talk about his experience building low-code tools, the use cases for these tools, and his predictions for how they will impact the future of software.
Ep 1352Slack Frontend Architecture with Anuj Nair
Slack is a messaging application with millions of users. The desktop application is an Electron app, which is effectively a web browser dedicated to running Slack. This frontend is built with ReactJS and other JavaScript code, and the application is incredibly smooth and reliable, despite its complexity. When a user boots up Slack, the application needs to figure out what data to fetch and where to fetch it from. Companies that use Slack heavily have thousands of messages in their history, and Slack needs to determine which of those should be pulled into the client. There are profile images, logos, and custom emojis, all of which define the user’s custom workspace experience. Anuj Nair joined Slack in late 2017. Since then, he has helped rewrite the Slack frontend client, including work on the bootup experience, the caching infrastructure, and the role of service workers. Anuj joins the show to discuss his work on the Slack frontend architecture and the canonical view layer problems that Slack faces.
Ep 1350Parabola: No-Code Data Workflows with Alex Yaseen
Every company has a large number of routine data workflows. These data workflows involve spreadsheets, CSV files, and tedious manual work to be done by a knowledge worker. For example, data might need to be taken from Salesforce, filtered for new customers, and piped into Mailchimp. Or perhaps you need to filter all your customers to find only the ones who have spent more than $50. These data workflows might require some basic knowledge of SQL, or an understanding of how to make an API request. Not everyone knows how to execute these technical commands. A software company can be slowed down due to a shortage of technical analysts who have the necessary programming skills to build these data workflows. Parabola is a low-code tool for building data workflows. Parabola lets the user drag and drop different components together to build an application without using a programming language. Parabola lowers the technical barrier for knowledge workers who want to build these kinds of data workflows. Alex Yaseen is the CEO of Parabola, and he joins the show to talk about the ideas behind Parabola and his goals with the company. Links: parabola.io, parabola.io/careers, Twitter: @alexyaseen
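Parabola users wire steps together visually rather than in code, but the underlying composition is just a chain of transforms. A sketch of the "customers who spent more than $50" workflow mentioned above, with invented data:

```python
def pipeline(rows, *steps):
    """Feed the output of each step into the next, like chained workflow nodes."""
    for step in steps:
        rows = step(rows)
    return rows

customers = [
    {"name": "Ada", "spent": 120},
    {"name": "Grace", "spent": 30},
    {"name": "Katherine", "spent": 75},
]

big_spenders = pipeline(
    customers,
    lambda rows: [r for r in rows if r["spent"] > 50],     # filter step
    lambda rows: sorted(rows, key=lambda r: -r["spent"]),  # sort step
)
```

A low-code tool replaces each lambda with a configurable visual component, but the mental model of debugging one — inspect the data between steps — carries over directly.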
Ep 1349Decentralized Finance with Tom Schmidt
Cryptocurrencies today serve two purposes: store of value and speculation. The application infrastructure that has been built around cryptocurrency is mostly to support these use cases. At some point in the future, perhaps cryptocurrencies can be used as a global medium of exchange that is accepted at the grocery store. Perhaps we will use the blockchain for supply chain management, and as a universal ledger for real estate ownership. But today, cryptocurrencies are mostly used for speculative trading. Users buy and sell different cryptocurrencies and stablecoins, looking to make short-term profits. And the markets for trading cryptocurrencies have evolved to have a sophistication that looks like the centralized markets of derivatives and leverage-based day trading. The term “decentralized finance” refers to this phenomenon of cryptocurrency lending markets. Decentralized finance increases the volume of speculative capital by providing liquidity through smart contracts. This short-term liquidity is often collateralized by a volatile cryptocurrency such as Ethereum, creating an opportunity for a type of market participant called a “liquidator.” Tom Schmidt is an investor with Dragonfly Capital, a cryptoasset investment firm. Tom joins the show to describe the dynamics of decentralized finance.
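Exact collateral rules vary by protocol; as an illustrative sketch (the 1.5 minimum ratio here is invented, not any specific protocol's parameter), a liquidator watches for positions where the collateral's value falls below the required multiple of the debt:

```python
def is_liquidatable(collateral_eth, eth_price, debt_usd, min_ratio=1.5):
    """A loan becomes eligible for liquidation once
    collateral value / debt drops below the minimum ratio."""
    return (collateral_eth * eth_price) / debt_usd < min_ratio

# 10 ETH backing a $2,000 loan: at $400/ETH the ratio is 2.0 (safe);
# if ETH falls to $250, the ratio is 1.25 and a liquidator can act.
```

Because the collateral is itself volatile, a price drop can flip many positions to liquidatable at once, which is what makes the liquidator role a distinct market participant.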
Ep 1348Infrastructure Management with Joey Parsons
At Airbnb, infrastructure management is standardized across the organization. Platform engineering teams build tools that allow the other teams throughout the organization to work more effectively. A platform engineering team handles problems such as continuous integration, observability, and service discovery. Other teams throughout a company use the tools that a platform engineering team builds. For example, there is a team at Airbnb that builds the search and discovery system that is used by customers who are looking for a place to stay. That team does not want to have to worry about how they are deploying, how their service is being logged, and how to scale up. All of that should be taken care of by the platform engineering team. At a large company like Airbnb, there is so much happening across the infrastructure. Services are being deployed, services are having outages, databases are being resharded. With all of this change occurring, it can be difficult for a team to pinpoint the cause of a service outage. Digging through logs and dashboards is often insufficient. Joey Parsons is the founder of Effx, a company that is building a platform for observing and managing the changes across the infrastructure. Effx is like a newsfeed for a service. An application instrumented with Effx gives the engineers a single endpoint that they can navigate to for understanding the history of their service. Joey joins the show to talk about his experience as an infrastructure engineer at Airbnb, and how that experience informs the work of his new company, Effx.
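Effx's actual API is not shown here; as a toy sketch of the "newsfeed for a service" idea, with all names and events invented:

```python
class ServiceFeed:
    """Toy event feed: infrastructure changes recorded per service, so a team
    can replay the history of their service when diagnosing an outage."""

    def __init__(self):
        self.events = []

    def record(self, service, kind, detail):
        self.events.append({"service": service, "kind": kind, "detail": detail})

    def history(self, service):
        return [e for e in self.events if e["service"] == service]

feed = ServiceFeed()
feed.record("search", "deploy", "v2.3.1 rolled out")
feed.record("search", "outage", "elevated 5xx errors")
feed.record("payments", "deploy", "v1.0.9 rolled out")
```

The value is the ordering: seeing the deploy immediately preceding the outage in one feed is the correlation that digging through separate logs and dashboards often obscures.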