Software Engineering Daily

Ep 718 SpeechBoard with Craig Cannon and Ramon Recuero Moreno

Creating a podcast is still too difficult. One of the main barriers to entry is the editing process. After recording a podcast, the podcast producer needs to line up soundwaves in a digital audio workstation and clip the raw audio files to remove unwanted sections. As someone who has edited a lot of podcasts, I know that this is difficult and tedious. One way of simplifying the editing process is to use speech-to-text to produce a transcription of an audio file, and then align the text output with the audio. After that alignment, you have a mapping between the text and the audio–so you can delete text and have the corresponding audio be deleted as well. SpeechBoard is a project by Craig Cannon and Ramon Recuero Moreno. SpeechBoard is an easy way to edit podcasts by deleting transcribed words that are mapped to an audio interview. In this episode, Craig, Ramon and I discuss how SpeechBoard is built and why this product hasn’t existed until recently. We also discuss the podcast world, which Craig is deeply familiar with as the host of Y Combinator’s podcast. The YC podcast is one of my favorite shows, and if you like SE Daily, you will probably like the YC podcast, so check it out. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
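
The text-to-audio mapping described above can be illustrated with a small sketch. This is not SpeechBoard's implementation; the `Word` type, `keep_ranges`, and `cut` are hypothetical names, and the per-word timestamps are assumed to come from some speech-to-text alignment step that is not shown.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds into the recording (assumed alignment output)
    end: float


def keep_ranges(words, deleted):
    """Return (start, end) audio ranges covering every word whose index is not in `deleted`.

    Adjacent kept words are merged into one range so the pause between them survives.
    """
    ranges = []
    prev_kept = None
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if prev_kept is not None and prev_kept == i - 1:
            # Extend the current range through this word, including any gap before it.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            ranges.append((word.start, word.end))
        prev_kept = i
    return ranges


def cut(samples, sample_rate, ranges):
    """Concatenate the samples that fall inside the kept ranges."""
    out = []
    for start, end in ranges:
        out.extend(samples[round(start * sample_rate):round(end * sample_rate)])
    return out


words = [
    Word("so", 0.0, 0.5),
    Word("um", 0.5, 1.0),
    Word("hello", 1.0, 1.5),
    Word("world", 1.6, 2.0),
]
ranges = keep_ranges(words, deleted={1})  # delete the word "um"
edited = cut(list(range(20)), sample_rate=10, ranges=ranges)
```

Deleting a word in the transcript becomes a set of indexes; the audio edit falls out of the timestamps, which is the idea the episode describes.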

Jan 23, 2018 · 1h 0m

Ep 717 Container Instances with Gabe Monroy

In 2011, platform-as-a-service was in its early days. It was around that time that Gabe Monroy started a container platform called Deis, with the goal of making an open source platform-as-a-service that anyone could deploy to whatever infrastructure they wanted. Over the last six years, Gabe has had a front row seat to the rise of containers, the variety of container orchestration systems, and the changing open source landscape. Every container orchestration system consists of a control plane, a data plane, and a scheduler. In the last few weeks, we have been exploring these different aspects of Kubernetes in detail. Last year, Microsoft acquired Deis, and Gabe began working on the Azure services that are related to Kubernetes–Azure Container Service, Kubernetes Service, and Container Instances. In this episode, Gabe talks about how containerized applications are changing, and what developments might come in the next few years. Kubernetes, functions-as-a-service, and container instances are different cloud application runtimes, with different SLAs, interfaces, and economics. Gabe provided some thoughts on how different application types might use those different runtimes. Full disclosure: Microsoft is a sponsor of Software Engineering Daily.

Jan 22, 2018 · 50 min

Ep 716 Service Mesh Design with Oliver Gould

Oliver Gould worked at Twitter from 2010 to 2014. Twitter’s popularity was taking off, and the engineering team was learning how to scale the product. During that time, Twitter adopted Apache Mesos, and began breaking up its monolithic architecture into different services. As more and more services were deployed, engineers at Twitter decided to standardize communications between those services with a tool called a service proxy. A service proxy provides each service with features that every service would want: load balancing, routing, service discovery, retries, and visibility. It turns out that lots of other companies wanted this service proxy technology as well, which is why Oliver left Twitter to start Buoyant, a company that was focused on developing software around the service proxy–and eventually the service mesh. If you are unfamiliar with service proxies and service mesh, check out our previous shows on Linkerd, Envoy, and Istio. Kubernetes is often deployed with a service mesh. A service mesh consists of two parts: the data plane and the control plane. The “data plane” refers to the sidecar containers that are deployed to each of your Kubernetes application pods. Each sidecar has a service proxy. The “control plane” refers to a central service that aggregates data from across the data plane and can send communications to the service proxies sitting across that data plane. The Linkerd service mesh was built in Java, and the project started before Kubernetes had become the standard for container orchestration. More recently, Buoyant built Conduit, a service mesh built using Rust and Go. In this episode, we explore how to design a service mesh and what Oliver learned in his experience building Linkerd and Conduit.

Jan 19, 2018 · 55 min

Ep 715 Kubernetes Storage with Bassam Tabbara

Modern applications store most of their data on hosted storage solutions. We use hosted block storage to back databases, hosted object storage for objects such as videos, and hosted file storage for file systems. Using a cloud provider for these storage systems can simplify scalability, durability, and availability–it can be less painful than taking care of storage yourself. One downside: the storage systems offered by the cloud providers are not open source. The APIs might vary from provider to provider. Wiring your application to a particular storage service on a particular cloud could tightly couple you to that cloud. Rook is a project for managing storage, built on Kubernetes. If you use a Rook cluster for your storage, you can port that storage model to any cloud, and have a consistent API for object, block, and file storage. In this episode, Bassam Tabbara describes the state of cloud storage, and why he started the Rook project.

Jan 18, 2018 · 56 min

Ep 714 Kubernetes State Management with Niraj Tolia

A common problem in a distributed system: how do you take a snapshot of the global state of that system? Taking a snapshot is difficult because you need to tell every node in the system to simultaneously record its state. There are several reasons to take a snapshot. You might want to take a picture of the global state for the purposes of debugging. Or you might want to take a comprehensive snapshot of your system (including the database) and port your system from one cloud to another. Or you might just need to take a snapshot for disaster recovery. When a Kubernetes application is deployed, its initial configuration is described in config files. After a deployment, the state of the application might change–some nodes die, some services get scaled up. At any given time, the current state of a Kubernetes cluster is stored in etcd, a distributed key-value store. Niraj Tolia is CEO of Kasten, a company that provides data management, backups, and disaster recovery for Kubernetes applications. Niraj joins the show to describe how Kubernetes deployments manage state, and what the modern business environment is around Kubernetes.

Jan 17, 2018 · 51 min

Ep 713 Kubernetes Operations with Brian Redbeard

In the last four years, CoreOS has been at the center of enterprise adoption of containers. During that time, Brian Harrington (or “Redbeard”) has seen a lot of deployments. In this episode, Brian discusses the patterns he has seen among successful Kubernetes deployments–and the pitfalls of the less successful. How should you manage configuration? How can you avoid IP address overlap between containers? How should you log and monitor your Kubernetes cluster–and whose responsibility is it to set all that stuff up? Brian also discusses the motivation for multi-cloud deployments, and how to implement multi-cloud Kubernetes. CoreOS offers a distributed systems management tool called Tectonic, which uses Kubernetes for container orchestration. At a time when there are lots of managed Kubernetes providers to choose from, it was great to hear Brian describe some of the architectural decisions for building Kubernetes into Tectonic.

Jan 16, 2018 · 48 min

Ep 712 FluentD with Eduardo Silva

A backend application can have hundreds of services written in different programming frameworks and languages. Across these different languages, log messages are produced in different formats. Some logging is produced in XML, some is produced in JSON, some is in other formats. These logs need to be unified into a common format, and centralized for any developer who wants to debug. The popularity of Kubernetes is making it easier for companies to build this kind of distributed application, where different services of different languages are communicating over a network, with a variety of log message types. Fluentd is a tool for solving this problem of log collection and unification. In today’s episode, Eduardo Silva joins the show to describe how Fluentd is deployed to Kubernetes, and what the role of Fluentd is within a Kubernetes logging pipeline. We also discuss the company where Eduardo works–Treasure Data. The story of Treasure Data is unusual. The team started out doing log management, but has found itself moving up the stack, into marketing analytics, sales analytics, and customer data management. This story might be useful for any open source developer thinking about how to evolve a project into a business.
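
To make the idea of unifying logs into a common format concrete, here is a minimal sketch in plain Python. It does not use Fluentd or its configuration language; the `normalize` function and the field names (`level`, `msg`, `service`) are assumptions chosen for illustration only.

```python
import json
import xml.etree.ElementTree as ET


def normalize(raw: str) -> dict:
    """Convert one raw log line (JSON, XML, or plain text) into a common record shape."""
    raw = raw.strip()
    if raw.startswith("{"):
        # Hypothetical JSON shape: {"level": ..., "msg": ..., "service": ...}
        record = json.loads(raw)
        return {
            "level": str(record.get("level", "info")).lower(),
            "message": record.get("msg", ""),
            "service": record.get("service", "unknown"),
        }
    if raw.startswith("<"):
        # Hypothetical XML shape: <log level="..." service="...">message</log>
        element = ET.fromstring(raw)
        return {
            "level": element.get("level", "info").lower(),
            "message": (element.text or "").strip(),
            "service": element.get("service", "unknown"),
        }
    # Anything else is treated as an unstructured message.
    return {"level": "info", "message": raw, "service": "unknown"}


json_record = normalize('{"level": "WARN", "msg": "disk full", "service": "db"}')
xml_record = normalize('<log level="ERROR" service="api">timeout</log>')
text_record = normalize("started worker 3")
```

Once every service's output lands in the same shape, a central store can index and query it without caring which language or framework produced it.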

Jan 15, 2018 · 49 min

Ep 710 The Gravity of Kubernetes with Jeff Meyerson

Kubernetes has become the standard way of deploying new distributed applications. Most new internet businesses started in the foreseeable future will leverage Kubernetes (whether they realize it or not). Many old applications are migrating to Kubernetes too. Before Kubernetes, there was no standardization around a specific distributed systems platform. Just like Linux became the standard server-side operating system for a single node, Kubernetes has become the standard way to orchestrate all of the nodes in your application. With Kubernetes, distributed systems tools can have network effects. Every time someone builds a new tool for Kubernetes, it makes all the other tools better. And it further cements Kubernetes as the standard. Google, Microsoft, Amazon, and IBM each have a Kubernetes-as-a-service offering, making it easier to shift infrastructure between the major cloud providers. We are likely to see Digital Ocean, Heroku, and longer tail cloud providers offer a managed, hosted Kubernetes eventually. In this editorial, I explore the following questions: Click here to read the full “Gravity of Kubernetes” editorial by Jeff Meyerson.

Jan 13, 2018 · 1h 2m

Ep 709 Kubernetes Vision with Brendan Burns

Kubernetes has become the standard system for deploying and managing clusters of containers. But the vision of the project goes beyond managing containers. The long-term goal is to democratize the ability to build distributed systems. Brendan Burns is a co-founder of the Kubernetes project. He recently announced an open source project called Metaparticle, a standard library for cloud-native development: Metaparticle builds on top of Kubernetes primitives to make distributed synchronization easier… It supplies language independent modules for locking and leader election as easy-to-use abstractions in familiar programming languages. After decades of distributed systems research and application, patterns have emerged about how we build these systems. We need a way to lock a variable, so that two nodes will not be able to write to that variable in a nondeterministic fashion. We need a way to do master election, so that if the master node dies, the other nodes can pick a new node to orchestrate the system. We know that just about every distributed application needs locking and leader election–so how can we build these features directly into our programming tools, rather than bolting them on? With Kubernetes providing a standard operating system for distributed applications, we can start to build standard libraries that assume we have access to underlying Kubernetes primitives. Instead of calling out to external tools like Zookeeper and etcd, a standard library like Metaparticle will abstract them away. An example: if I am writing a system to do distributed mapreduce, I would like to avoid thinking about node failures and race conditions. Brendan’s idea is to push those problems down into a standard library–so the next developer who comes along with a new idea for a multi-node application has an easier time. 
Brendan Burns currently works as a distinguished engineer at Microsoft, and he joins the show to discuss why it is still hard to build distributed systems and what can be done to make it easier. This is the second time we have had Brendan on the show. The first time he came on, he discussed the history of Kubernetes, and some of the design decisions of the system. This episode was more about the future. Full disclosure: Microsoft (where Brendan is employed) is a sponsor of Software Engineering Daily.
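
The leader election pattern mentioned above can be sketched with a lease: one node holds leadership until its lease expires, and any node may claim an expired lease. This is not Metaparticle's API or how Kubernetes implements it; `LeaseStore` is a hypothetical in-memory stand-in for a shared coordination service such as etcd, and time is passed in explicitly to keep the example deterministic.

```python
import threading


class LeaseStore:
    """In-memory stand-in for a shared store that grants a time-limited leadership lease."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._holder = None
        self._expires_at = 0.0

    def try_acquire(self, node, now, ttl):
        """Grant or renew the lease if it is free, expired, or already held by `node`."""
        with self._mutex:
            if self._holder is None or now >= self._expires_at or self._holder == node:
                self._holder = node
                self._expires_at = now + ttl
                return True
            return False

    def leader(self, now):
        """Return the current leader, or None if the lease has expired."""
        with self._mutex:
            return self._holder if now < self._expires_at else None


store = LeaseStore()
a_elected = store.try_acquire("node-a", now=0.0, ttl=10.0)   # node-a becomes leader
b_rejected = store.try_acquire("node-b", now=5.0, ttl=10.0)  # lease still held by node-a
b_elected = store.try_acquire("node-b", now=11.0, ttl=10.0)  # node-a stopped renewing
```

If the leader dies, it simply stops renewing, and another node takes over once the lease runs out. A standard library of the kind described in the episode would hide this bookkeeping behind a simpler interface.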

Jan 12, 2018 · 53 min

Ep 708 High Volume Distributed Tracing with Ben Sigelman

Ben Sigelman began working on distributed tracing when he was at Google and authored the “Dapper” paper. Dapper was implemented at Google to help debug some of the distributed systems problems faced by the engineers who work on Google infrastructure. Today, a decade after he started thinking about distributed tracing, Ben Sigelman is the CEO of Lightstep, a company that provides distributed tracing and other monitoring technologies. Lightstep’s distributed tracing model still resembles the techniques described in the paper–so I was eager to learn the differences between open source versions of distributed tracing (such as OpenZipkin) and enterprise providers such as Lightstep. The key feature of Lightstep that we discussed: garbage collection. If you are using a distributed tracing system, you could be collecting a lot of traces. You could collect a trace for every single user request. Not all of these traces are useful–but some of them are very useful. Maybe you only want to keep traces with exceptionally high latency. Maybe you want to keep every trace from the last 5 days, and destroy them over time. So, the question of how to manage the storage footprint of those traces was as interesting as the discussion of distributed tracing itself. Beyond the distributed tracing features of his product, Ben has a vision for how his company can provide other observability tools over time. I spoke to Ben at Kubecon–and although this conversation does not talk about Kubernetes specifically, this topic is undoubtedly interesting to people who are building Kubernetes technologies.
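
The two retention rules mentioned above (keep slow traces, keep recent traces) can be combined in a small sketch. This is not Lightstep's garbage collection algorithm; the `Trace` type, the `retain` function, and the thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Trace:
    trace_id: str
    finished_at: datetime
    duration_ms: float


def retain(traces, now, max_age=timedelta(days=5), slow_ms=1000.0):
    """Keep a trace if it is recent enough or slow enough to be interesting."""
    return [
        t for t in traces
        if now - t.finished_at <= max_age or t.duration_ms >= slow_ms
    ]


now = datetime(2018, 1, 11)
traces = [
    Trace("recent-fast", now - timedelta(days=1), 40.0),
    Trace("old-fast", now - timedelta(days=30), 35.0),
    Trace("old-slow", now - timedelta(days=30), 4200.0),
]
kept_ids = [t.trace_id for t in retain(traces, now)]
```

Here only the old, fast trace is dropped. A production system would apply a policy like this continuously and at far larger scale, which is where the storage questions discussed in the episode come in.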

Jan 11, 2018 · 57 min

Ep 707 Kubernetes on AWS with Arun Gupta

Since Kubernetes came out, engineers have been deploying clusters to Amazon. In the early years of Kubernetes, deploying to AWS meant that you had to manage the availability of the cluster yourself. You needed to configure etcd and your master nodes in a way that avoided having a single point of failure. Deploying Kubernetes on AWS became simpler with an open-source tool called kops (short for Kubernetes Operations). Kops automates the provisioning and high-availability deployment of Kubernetes. In late 2017, AWS released a managed Kubernetes service called EKS. EKS allows developers to run Kubernetes without having to manage the availability and scaling of a cluster. The announcement of EKS was exciting, because it means that all of the major cloud providers are officially supporting Kubernetes. Arun Gupta is a principal open source technologist at AWS, and he joins the show to explain what is involved in deploying and managing a Kubernetes cluster. Arun describes how to operate a Kubernetes cluster, including logging, monitoring, storage, and updates. If you are convinced that you want to use Kubernetes, but you aren’t sure yet how you want to deploy it, this will be useful information for you. We also discussed how Amazon built EKS, and some of the architectural decisions they made. AWS has had a managed container service called ECS since 2014. The development of ECS was instructive for the AWS engineers who built EKS. Amazon wanted to make EKS able to integrate with both open source tools and the Amazon managed services.

Jan 10, 2018 · 49 min

Ep 706 Istio Motivations with Louis Ryan

A single user request hits Google’s servers. A user is looking for search results. In order to deliver those search results, that request will have to hit several different internal services on the way to getting a response. These different services work together to satisfy the user request. All of these services need to communicate efficiently, they need to scale, and they need to be secure. Services need to have a consistent way of being “observable”–allowing logging and monitoring. Services need to have proper security. Since every service wants these different features (like communication, load balancing, security), it makes sense to build these features into a common system that can be deployed to every server. Louis Ryan has spent his years at Google working on service infrastructure. During that time, he has seen massive changes in the way traffic flows through Google. First, the rise of Android, and all of the user traffic from mobile phones. And second, the rise of Google Cloud Platform, which meant that Google was now responsible for nodes deployed by users outside of Google. These two changes–mobile and cloud–led to an increase in the amount of traffic and the type of traffic. All of this traffic leads to more internal services communicating with each other. How does service networking change in such an environment? Google’s adaptation to the new networking conditions is to introduce a “service mesh”. A service mesh is a network for services. It provides observability, resiliency, traffic control, and other features to every service that plugs into it. Each service needs to plug into the service mesh. In Kubernetes, services connect to the mesh through a sidecar. Let me explain the term “sidecar.” Kubernetes manages its resources in pods, and each pod contains a set of containers. You might have a pod that is dedicated to responding to any user that is requesting a picture of a cat. 
Within that pod, you not only have the container that serves the cat picture–you also have other “sidecar” containers that help out an application container. You could have a sidecar that gets deployed next to your application container that handles logging, or a sidecar that helps out with monitoring, or network communications. If you are using the Istio service mesh, that means that you are using a sidecar called Envoy. Envoy is a sidecar called a “service proxy” that provides configuration updates, load balancing, proxying, and lots of other benefits. If we get all that out of Envoy, why do we need a separate abstraction of a “service mesh”? Because it helps to have a tool that aggregates and centralizes all the different communications among these proxies. Every service gets a sidecar for a service proxy. Every service proxy communicates with the centralized service mesh. Louis Ryan joins this episode to explain the motivations for building the Istio service mesh, and the problems it solves for Kubernetes developers. For the next two weeks, we are covering exclusively the world of Kubernetes. Kubernetes is a project that is likely to have as much impact as Linux–and it is very early days. Whether you are an expert in Kubernetes or you are just starting out, we have lots of episodes to fit your learning curve. To find all of our old episodes about Kubernetes (including a previous show about Istio), download the Software Engineering Daily app for iOS or for Android. In other podcast players, only the 100 most recent episodes are available, but in our apps you can find all 650 episodes–and there is also plenty of content that is totally unrelated to Kubernetes!

Jan 9, 2018 · 55 min

Ep 705 Kubernetes Usability with Joe Beda

With the community centralizing on Kubernetes, developers are able to comfortably bet big on open source projects like Istio, Conduit, Rook, Fluentd, and Helm, each of which we will be covering in the next few weeks. The centralization on Kubernetes also makes it easier to build enterprise companies, which no longer have to think about which container orchestration system to support. There is a wide array of Kubernetes-as-a-service providers offering a highly available runtime–and a variety of companies offering observability tools to make it easier to debug distributed systems problems. Despite all of these advances, Kubernetes is less usable than it should be. It still feels like operating a distributed system. Hopefully someday, operating a Kubernetes cluster will be as easy as operating your laptop computer. To get there, we need improvements in Kubernetes usability. Today’s guest Joe Beda was one of the original creators of the Kubernetes project. He is a founder of Heptio, a company that provides Kubernetes tools and services for enterprises. I caught up with Joe at KubeCon 2017, and he told me about where Kubernetes is today, where it is going, and what he is building at Heptio. Full disclosure–Heptio is a sponsor of Software Engineering Daily. For the next two weeks, we are covering exclusively the world of Kubernetes. Kubernetes is a project that is likely to have as much impact as Linux. Whether you are an expert in Kubernetes or you are just starting out, we have lots of episodes to fit your learning curve. To find all of our old episodes about Kubernetes, download the Software Engineering Daily app for iOS or for Android. In other podcast players, only the 100 most recent episodes are available, but in our apps you can find all 650 episodes–and there is also plenty of content that is totally unrelated to Kubernetes!

Jan 8, 2018 · 1h 8m

Ep 704 Cloud R&D with Onsi Fakhouri

In the first 10 years of cloud computing, a set of technologies emerged that every software enterprise needs: continuous delivery, version control, logging, monitoring, routing, data warehousing. These tools were built into the Cloud Foundry project, a platform for application deployment and management. As we enter the second decade of cloud computing, another set of technologies is emerging as useful tools. Serverless functions allow for rapid scalability at a low cost. Kubernetes offers a control plane for containerized infrastructure. Reactive programming models and event sourcing make an application more responsive and simplify the interactions between teams who are sharing data sources. The job of a cloud provider is to see new patterns in software development and offer tools to developers to help them implement those new patterns. Of course, building these tools is a huge investment. If you’re a cloud provider, your customers are trusting you with the health of their application. The tool that you build has to work properly and you have to help the customers figure out how to leverage the tool and resolve any breakages. Onsi Fakhouri is the senior VP of R&D for cloud at Pivotal, a company that provides software and support for Spring, Cloud Foundry and several other tools. I sat down with Onsi to discuss his strategy for determining which products Pivotal chooses to build. There are a multitude of engineering and business elements that Onsi has to consider when allocating resources to a project. Cloud Foundry is used by giant corporations like banks, telcos and automotive manufacturers. Spring is used by most enterprises that run Java, including most of the startups that I have worked at in the past. Cloud Foundry has to be able to run on premise and on cloud providers like AWS, Google and Microsoft. Pivotal also has its own cloud, Pivotal Web Services, and all of these stakeholders have different technologies that they would like to see built. 
Onsi’s job is to determine which ones have the highest net impact and allocate resources towards them. I interviewed Onsi at SpringOne Platform, which is a conference that is organized by Pivotal, which, full disclosure, is a sponsor of Software Engineering Daily. This week’s episodes are all conversations from that conference, and if there’s a conference that you think I should attend and do coverage at, let me know. Whether you like this format or not, I would love to get your feedback. We have some big developments coming for Software Engineering Daily in 2018 and we want to have a closer dialogue with the listeners. Please send me an email, [email protected] or join our Slack channel.

Jan 5, 2018 · 55 min

Ep 703 Spring Data with John Blum

In the 1980s and the 1990s, most applications used only a relational database for their data management. In the early 2000s, software projects started to use an ever increasing number of data sources. MongoDB popularized the document database, which allows storage of objects that do not have a consistent schema. The Hadoop distributed file system enabled the redundant storage and efficient querying of high volumes of data that are spread out across multiple commodity disks. The Cassandra Database is a hybrid between key-value storage and column-oriented storage. The benefit of these different data systems is that you can choose a system that gives you the read and write performance that you need. The downside is that each of these databases has different querying semantics. If you’re a developer trying to access data from your application, you often need to know how to access that data from the specific data source and whether that data needs to be queried with SQL, or with a document-style query, or with a MapReduce job. Spring Data is a project to standardize the programming model for data access within Spring. The vision for the project is to give Spring developers a consistent way to access their data from any database, while retaining the performance characteristics of those databases. Spring is a Java framework for writing web applications, but this conversation is useful even for people who are not building Spring applications. Whatever application you’re building, you are probably pulling from multiple data sources. The question of how to abstract away the complexity of those multiple data sources is also being tackled by projects such as GraphQL and Falcor. John Blum is a staff engineer who works on the Spring Data Project at Pivotal. He joins the show to discuss how to design a data access layer. We discussed the API between a database and the Spring Data layer and also talked about reactive programming. 
Reactive programming allows the application layer to respond to changes in the underlying data layer. I interviewed John at SpringOne Platform, which is a conference that is organized by Pivotal, which, full disclosure, is a sponsor of Software Engineering Daily. This week’s episodes are all conversations from that conference. If there’s a conference that you think I should attend and do some coverage at, please let me know. Whether you like this format or not, I would love to get your feedback. We have some big developments coming for Software Engineering Daily in 2018, and we want to have a closer dialogue with the listeners. Please send me an e-mail, [email protected], and let me know what’s up. Or join our Slack channel.

Jan 4, 2018 · 1h 0m

Ep 702 Cloud Foundry with Rupa Nandi

Cloud Foundry is an open-source platform as a service for deploying and managing web applications. Cloud Foundry is widely used by enterprises who are running applications that are built using Spring, a popular web framework for Java applications, but developers also use Cloud Foundry to manage apps built in Ruby, Node and any other programming language. Cloud Foundry includes routing, message brokering, service discovery, authentication and other application level tooling for building and managing a distributed system. Some of the standard tooling in Cloud Foundry was adopted from Netflix open-source projects, such as Hystrix, which is the circuit breaker system; and Eureka, which is the service discovery server and client. When a developer deploys their application to Cloud Foundry, the details of what is going on are mostly abstracted away, which is by design. When you’re trying to ship code and iterate quickly for your organization, you don’t want to think about how your application image is being deployed to underlying infrastructure. You don’t want to think about whether you’re deploying a container or a VM, but if you use Cloud Foundry enough, you might have become curious about how Cloud Foundry schedules and runs application code. BOSH is a component of Cloud Foundry that sits between the infrastructure layer and the application layer. Cloud Foundry can be deployed to any cloud provider because of BOSH’s well-defined interface. BOSH has the abstraction of a stem cell, which is a versioned operating system image wrapped in packaging for whatever infrastructure as a service is running underneath. With BOSH, whenever a VM gets deployed on your underlying infrastructure, that VM gets a BOSH agent. The agent communicates with the centralized component of BOSH called the director. The director acts as the leader of the distributed system. Rupa Nandi is a director of engineering at Pivotal where she works on Cloud Foundry. 
In this episode we talked about scheduling an infrastructure, the relationship between Spring and Cloud Foundry and the impact of Kubernetes, which Cloud Foundry has integrated with so that users can run Kubernetes workloads on Cloud Foundry. I interviewed Rupa at SpringOne Platform, a conference that is organized by Pivotal who, full disclosure, is a sponsor of Software Engineering Daily, and this week’s episode are all conversations from that conference. Whether you like this format or don’t like this format, I would love to get your feedback. We have some big developments coming for Software Engineering Daily in 2018 and we want to have a closer dialogue with the listeners. Please send me an email, [email protected] or join our Slack channel. We really want to know what you’re thinking and what your feedback is, what you would like to hear more about, what you’d like to hear less about, who you are. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Jan 3, 2018 - 54 min

Ep 701: Dwarf Fortress with Tarn Adams Holiday Repeat

Originally published October 22, 2015. Dwarf Fortress is a construction and management simulation computer game set in a procedurally generated fantasy world in which the player indirectly controls a group of dwarves, and attempts to construct a successful underground fortress. Tarn Adams works on Dwarf Fortress with his brother Zach.

Dec 29, 2017 - 1h 5m

Ep 700: Language Design with Brian Kernighan Holiday Repeat

Originally published January 6, 2016. Brian Kernighan is a professor of computer science at Princeton University and the author of several books, including The Go Programming Language and The C Programming Language, a book more commonly referred to as K&R. Professor Kernighan also worked at Bell Labs alongside Unix creators Ken Thompson and Dennis Ritchie and contributed to the development of Unix.

Dec 28, 2017 - 1h 10m

Ep 699: Software and Entrepreneurship with Seth Godin Holiday Repeat

Originally published November 18, 2015. Seth Godin is a writer, speaker, and entrepreneur. He is the author of many books, including most recently, What To Do When It’s Your Turn.

Dec 27, 2017 - 36 min

Ep 698: Knowledge-Based Programming with Stephen Wolfram Holiday Repeat

Originally published November 10, 2015. Wolfram Research makes computing software powered by the Wolfram language, a knowledge-based programming language that draws from symbolic and functional programming paradigms. Stephen Wolfram is the Founder and CEO of Wolfram Research, and also the author of A New Kind of Science.

Dec 26, 2017 - 1h 21m

Ep 697: Machine Learning and Technical Debt with D. Sculley Holiday Repeat

Originally published November 17, 2015. Technical debt, referring to the compounding cost of changes to software architecture, can be especially challenging in machine learning systems. D. Sculley is a software engineer at Google, focusing on machine learning, data mining, and information retrieval. He recently co-authored the paper Machine Learning: The High Interest Credit Card of Technical Debt.

Dec 25, 2017 - 34 min

Ep 696: Modern War with Peter Warren Singer

Military force is powered by software. The drones that are used to kill suspected terrorists can identify those terrorists using the same computer vision tools that are used to identify who is in an Instagram picture. Nuclear facilities in Iran were physically disabled by the military-sponsored Stuxnet virus. National intelligence data is collected and processed using the MapReduce algorithm. The military keeps up with technology more effectively than lawmakers. It is common to read a quote from a senator or a judge that shows a basic misunderstanding of cybersecurity. Many politicians do not even use email. There is a large and growing knowledge gap between military capability and the technological savvy of policymakers. On the whole, government is not prepared for modern warfare. Just like in social media information wars, the instigators of conflict have an advantage. And the ability to instigate such a conflict is democratized. Social media, open source software, and cloud computing give a technologist superpowers. Cryptocurrencies can anonymize the financial transactions to pay for such tools, and basic encryption can anonymize the terroristic acts that occur over a remote internet connection. Peter Warren Singer is a political scientist who formerly worked in the United States advisory committee on International Communications and Information Policy. He is also an author, whose books include Wired for War, Cybersecurity and Cyberwar: What Everyone Needs to Know, and Ghost Fleet: A Novel of the Next World War. Peter writes about the circumstances that could lead to global warfare, and how military actors might behave in a third world war. In this episode, Peter shares a dark, but realistic vision that we should all hope to avoid. If you like this episode, we have done many other shows on related topics–including drones, IoT security, and automotive cybersecurity. 
To find these old episodes, you can download the Software Engineering Daily app for iOS and for Android. In other podcast players, you can only access the most recent 100 episodes. With these apps, we are building a new way to consume content about software engineering. They are open-sourced at github.com/softwareengineeringdaily. If you are looking for an open source project to get involved with, we would love to get your help. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Dec 22, 2017 - 1h 11m

Ep 695: React Components with Max Stoiber

Modern frontend development is about components. Whether we are building an application in React, Vue, or Angular, components are the abstractions that we build our user interfaces out of. Today, this seems obvious, but if you think back five years ago, frontend development was much more chaotic–partly because we had not settled around this terminology of the component. React has become the most popular frontend framework, and part of its growth is due to the ease and reusability of components across the community. It’s easy to find building blocks that you can use to piece together your frontend application. Do you need a video player component? Do you need a news feed component? A profile component? All of these things are easy to find. As you build a React application, you take some open source components off the shelf, and you build others yourself. To keep things looking nice and consistent, you need to style your components. If you are not careful with how you manage your stylesheets, you can end up with inconsistent stylings and namespace conflicts. Max Stoiber is the creator of styled-components, a project to help enforce best practices around styling components. He is also a founder of Spectrum, a system that allows people to build online communities. Spectrum has similar design and engineering challenges to Slack or Facebook, so it made for a great discussion of modern software architecture. In today’s episode, Max and I had a wide-ranging conversation about frontend frameworks, components, and the process of building a product. Max also describes the advantages of using GraphQL and the Apollo toolchain. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Dec 21, 2017 - 49 min

Ep 694: Managing Engineers with Ron Lichty

“Management is about human beings. Its task is to make people capable of joint performance. To make their strengths effective and their weaknesses irrelevant.” That quote is from Peter Drucker. It is one of the many useful quotes collected in Ron Lichty’s book “Managing the Unmanageable”—and it illustrates why we work in teams. When we collaborate with each other, we make each other’s strengths effective, and our weaknesses become irrelevant. To collaborate effectively, we need leaders. We need management. Ron Lichty spent 6 years managing engineers at Apple, and many more years in management and director roles elsewhere. In his book, Ron lays out the lessons he learned in 30 years of engineering management. Ron also describes concrete strategies for how to manage engineers productively. An engineer who becomes a manager needs to learn new skills. And the hardest skills to master have nothing to do with technology. Prioritizing the right projects, allocating engineering resources, making architectural decisions—all of those skills are important. But the art of relationships—of diplomacy and language—is harder to learn than any technical skill. How do you motivate an engineer to do something that is boring? How do you have a difficult conversation with an engineer who needs to improve? When a conflict between engineers comes up, do you confront the conflict head-on, or do you wait for those engineers to resolve it among themselves? These questions do not have easy answers. The best way to learn how to react to these situations is to live through them. The second best way to learn is to read and listen to people who have seen so much of the management dynamic that they can distill it into anecdotes and aphorisms. In today’s show, Ron shares several stories that changed how I think about management. Ron and I did not have time to discuss everything I wanted to, and I recommend checking out his podcast episode on Software Engineering Radio for more detail. 
And also check out his book—Managing the Unmanageable. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Dec 20, 2017 - 1h 3m

Ep 693: Hacker Noon with David Smooke

The New York Times makes most of its money off of subscriptions. Facebook makes its money off of native advertising. Hacker News is funded by Y-Combinator. Each of these business models creates biases in the information that gets promoted on the respective platforms. This is why I like to know the origin story and the business models behind the publications that I read. Published content is shaped by the profit motive of the publication. And yet, last month, I repeatedly found myself reading high quality content on a Medium publication that I did not know the origin of: Hacker Noon. Hacker Noon is a popular Medium publication that syndicates curated content written about software. Let me explain “syndication.” Imagine that I just spent three days on a Medium post about functional programming, and I have zero followers on social media. How can I get people to read my awesome post? The answer is syndication. I can submit my Medium post to Hacker Noon. This gives me free distribution, and it gives Hacker Noon free content—a win-win relationship. But why was it worth it for Hacker Noon to spend time curating content? That syndication process takes time. You have to read through lots of submissions, and sometimes you have to send one back to the author to have it edited. And this is all to build a following on Medium. I have not heard of Medium being a profitable platform to build a business. It’s worth pointing out the difference between Medium and WordPress. On WordPress, this model of curated syndication has worked to massive success—for example, the Huffington Post and TechCrunch. These businesses make millions of dollars from advertising networks, because they are built on WordPress, and WordPress is an open model. A publisher on WordPress can install plugins that serve ads from third party providers like Outbrain and Taboola. A WordPress site can also install any kind of data collection scripts, to gather data on visitors, and sell it to the highest bidder. 
The lack of third party plugins is the blessing and the curse of Medium. Because there is no third party ecosystem, reading content on Medium is a beautiful experience. The page loads quickly and predictably. There are no random scripts that are blocking the page as they hog your browser’s resources. When you go to close the page, there is never a popup that asks you to subscribe to a newsletter. When I read content on Medium, I am not getting slapped across the face with ads for reverse mortgages and açaí berries. I am not being tagged for retargeting. It’s a beautiful experience. But Medium seems like an ecosystem that would not allow for the content syndication business like Hacker Noon. I wanted to know who was running Hacker Noon, how the business works, and what it says about Medium as a publishing platform. Hacker Noon turns out to be part of a network of Medium publications called AMI. AMI’s network includes sites like Art + Marketing, Future Travel, and Fit Yourself Club–all of which are distinct syndication platforms. David Smooke is the CEO of AMI, and he joins this episode to explain how his business works, how he has scaled the content syndication business, and why he is betting on Medium. It was a detailed look into the state of online publishing and where it might be headed. If you don’t read Hacker Noon already, one article to start with that shows off the quality of content is Learn Blockchains By Building One. I interviewed the author of that article, Daniel Van Flymen, and it has been one of the most popular episodes of Software Engineering Daily. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Dec 19, 2017 - 1h 1m

Ep 692: Protocol Buffers with Kenton Varda

When engineers are writing code, they are manipulating objects. You might have a user object represented on your computer, and that user object has several different fields—a name, a gender, and an age. When you want to send that object across the network to a different computer, the object needs to be turned into a sequence of 1s and 0s that will travel efficiently across the network. This is known as “serialization.” As the user object sits on your computer, it is represented in 1s and 0s. You could just send that same representation over the wire. But we use efficient serialization to send it over the network in a more compact format. We also have to make sure that when we send that object to another service, the other service knows how to deserialize it, and turn it back into a format that we can operate on at the application level. Protocol buffers are a serialization protocol that originated at Google. Protocol buffers created a standardized interface for efficiently passing data between services. When Kenton Varda worked at Google, he was the tech lead for protocol buffers, and he joins the show to explain how protobufs work—and a newer serialization protocol that Kenton led: Cap’n Proto. You can expect to walk away from this episode with an understanding of how serialization protocols work, and the design tradeoffs you can make when creating a serialization protocol. We also touched on a startup that Kenton founded, called Sandstorm, and how he eventually found himself at Cloudflare, where he works on Cloudflare Workers. With these topics, we did not go as deep as I would have liked, and I look forward to having Kenton back on in the near future. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
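To make the serialization idea above concrete, here is a minimal Python sketch that packs the user object from the description (name, gender, age) into a compact byte string and reads it back. The byte layout, the `User` class, and the `serialize`/`deserialize` helpers are all assumptions made up for illustration; this is not the Protocol Buffers or Cap'n Proto wire format, only a hand-rolled stand-in that shows the round trip.

```python
import json
import struct
from dataclasses import asdict, dataclass


@dataclass
class User:
    name: str
    gender: str  # one ASCII character, e.g. "f" (illustrative encoding)
    age: int     # assumed to fit in one unsigned byte (0-255)


def serialize(user: User) -> bytes:
    """Pack as: 2-byte name length, UTF-8 name, 1-byte gender, 1-byte age."""
    name_bytes = user.name.encode("utf-8")
    gender_byte = user.gender.encode("ascii")
    # "!" selects network byte order with no padding between fields.
    return struct.pack(
        f"!H{len(name_bytes)}scB", len(name_bytes), name_bytes, gender_byte, user.age
    )


def deserialize(data: bytes) -> User:
    """Reverse of serialize: read the length prefix, then the remaining fields."""
    (name_len,) = struct.unpack_from("!H", data, 0)
    name_bytes, gender_byte, age = struct.unpack_from(f"!{name_len}scB", data, 2)
    return User(name_bytes.decode("utf-8"), gender_byte.decode("ascii"), age)


user = User(name="Alice", gender="f", age=30)
wire = serialize(user)
as_json = json.dumps(asdict(user)).encode("utf-8")

assert deserialize(wire) == user
print(f"binary: {len(wire)} bytes, JSON: {len(as_json)} bytes")
```

Both sides have to agree on the field order and sizes ahead of time, which is the role a shared schema plays in a real serialization protocol.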

Dec 18, 2017 - 55 min

Ep 691: High Volume Logging with Steve Newman

Google Docs is used by millions of people to collaborate on documents together. With today’s technology, you could spend a weekend coding and build a basic version of a collaborative text editor. But in 2004 it was not so easy. In 2004 Steve Newman built a product called Writely, which allowed users to collaborate on documents together. Initially, Writely was hosted on a single server that Steve managed himself. All of the reads and writes to the documents went through that single server. Writely rapidly grew in popularity, and Steve went through a crash course in distributed systems as he tried to keep up with the user base. In 2006, Writely was acquired by Google, and Steve spent his next four years turning Writely into Google Docs. Eventually he moved onto other projects within Google—“Cosmo” and “Megastore Replication.” When Steve left the company in 2010, he took with him the lessons of logging and monitoring that keep Google’s infrastructure observable. Large organizations have terabytes of log data to manage. This data streams off the servers that are running our applications. That log data gets processed in a “metrics pipeline” and turned into monitoring data. Monitoring data aggregates log data in a more presentable format. Most of the log messages that get created will never be seen with human eyes. These logs get aggregated into metrics, then compressed, and (in many cases) eventually thrown away. Different companies have different sensitivity around their logs, so some companies may not garbage collect any of their logs! When a problem occurs in our infrastructure, we need to be able to dig into our terabytes of log data and quickly find the root cause of a problem. If our log data is compressed and stored on disk, it will take longer to access it. But if we keep all of our logs in memory, it could get expensive. 
To review: if I want to build a logging system from scratch today, I need to build: a metrics pipeline for converting log data into monitoring data; a complicated caching system; a way to store and compress logs; a query engine that knows how to ask questions to the log storage system; a user interface so I don’t have to inspect these logs via command line… The list of requirements goes on and on—which is why there is a huge industry around log management. And logging keeps evolving! One example we covered recently is distributed tracing, which is used to diagnose requests that travel through multiple endpoints. After Steve Newman left Google, he started Scalyr, a product that allows developers to consume, store, and query log messages. I was looking forward to talking to Steve about data engineering, and the query engine that Scalyr has architected, but we actually spent most of our conversation talking about the early days of Writely, and his time at Google—particularly the operational challenges of Google’s infrastructure. Full disclosure: Scalyr is a sponsor of Software Engineering Daily. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
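As a rough illustration of the "metrics pipeline" step described above, the Python sketch below aggregates raw log lines into per-minute counts by log level. The log line format (`timestamp level message`) and the `aggregate` function are hypothetical choices for this example; they are not taken from Scalyr or from Google's infrastructure.

```python
from collections import Counter
from typing import Iterable, Tuple


def aggregate(log_lines: Iterable[str]) -> Counter:
    """Count log lines per (minute, level).

    Assumes each line looks like "2017-12-15T10:00:01 ERROR some message".
    """
    counts: Counter = Counter()
    for line in log_lines:
        timestamp, level, _message = line.split(" ", 2)
        minute = timestamp[:16]  # keep "YYYY-MM-DDTHH:MM", drop the seconds
        counts[(minute, level)] += 1
    return counts


logs = [
    "2017-12-15T10:00:01 INFO request served",
    "2017-12-15T10:00:07 ERROR upstream timeout",
    "2017-12-15T10:00:42 ERROR upstream timeout",
    "2017-12-15T10:01:03 INFO request served",
]

metrics = aggregate(logs)
for (minute, level), count in sorted(metrics.items()):
    print(minute, level, count)
```

The aggregated counts are far smaller than the raw lines, which is why the raw logs can later be compressed or discarded once the metrics exist.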

Dec 15, 2017 - 58 min

Ep 690: Scala at Duolingo with Andre Kenji Horie

Duolingo is a language learning platform with over 200 million users. On a daily basis millions of users receive customized language lessons targeted specifically to them. These lessons are generated by a system called the session generator. Andre Kenji Horie is a senior engineer at Duolingo. He wrote about the process of rewriting the session generator, moving from Python to Scala and changing architecture at the same time. In this episode Adam Bell talks with him about the reasons for the rewrite, what drove them to move to Scala and the experience of moving from one technology stack to another. Rewriting Duolingo’s Engine in Scala. Jobs at Duolingo. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Dec 14, 2017 - 46 min

Ep 689: Engineering Values with Lynne Tye

The values system of a company guides the actions of the engineers who work at that company. Some companies value open communication and a flat organization where anybody can talk to anyone else. Other companies encourage hierarchy and secrecy, so that employees are focused on their specific section of the company. Some companies take themselves seriously, and have a work environment that is as stoic as the military. Other companies pride themselves on having good beer and a friendly, laid back atmosphere. When company values are properly defined, the values can be used as reference points when making decisions. At Amazon, one of the core company values is “bias for action.” As an engineer, you are often in a situation where you can wait for more information, or you can start a project with an incomplete picture for how you will finish it. The “bias for action” lets you know that you should usually start the project despite having an incomplete picture. Another use of a company values system is for hiring. When a company publishes their values, prospective employees can use those stated values as a way to know if they would be a good cultural fit. For example “move fast and break things” was a value that allowed Facebook to ship new products faster than any other company before it. But the speed of movement is not for everyone. Some engineers like to have their code unit tested, and free of all bugs before shipping to production. Every company has values that define their company. And every engineer has values that define how they want to work. Lynne Tye started her company Key Values as a platform to index companies by their values systems. This allows engineers to find companies that are a good cultural fit for their values system. Lynne joins the show today to explain how engineers and companies define their values systems, and how that affects the outcomes of engineering organizations. 
Lynne also talks about her time at HomeJoy, one of the first companies in the “gig economy”. HomeJoy was an on-demand house cleaning service that grew extremely fast, but ultimately went under due to lawsuits. The challenges of HomeJoy were a predictor of the challenges later faced by Uber and Airbnb, and it was fascinating to hear Lynne reflect on her time spent managing operations at HomeJoy–which was about as operationally intensive a company as you can imagine! Thanks to Courtland Allen for the intro to Lynne, and if you haven’t checked out the Indie Hackers podcast, which is hosted by Courtland, you should subscribe to it. Indie Hackers breaks down the engineering and business models behind small software companies–it’s one of my favorite shows. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Dec 13, 2017 - 55 min

Ep 688: Cloud Marketplace with Zack Bloom

Ten years ago, if you wanted to build software, you probably needed to know how to write code. Today, the line between “technical” and “non-technical” people is blurring. Website designers can make a living building sites for people on WordPress or Squarespace–without knowing how to write code. Salesforce integration experts can help a sales team set up complicated software–without knowing how to write code. Shopify experts can set up an ecommerce store to your exact specifications–without knowing how to write code. WordPress, Squarespace, Salesforce, and Shopify are all fantastic services–but they are not compatible with each other. I can’t install a WordPress plugin on Salesforce. Now imagine this from the point of view of plugin creators. Plugin creators make easy ways to integrate different pieces of software together. Take PayPal as an example. PayPal wants to make it easy for software builders to integrate with their API. One plugin that PayPal has is a button that says “Pay with PayPal.” If I am a developer at PayPal, and I am building a button that people should be able to easily put on their webpage so that their users can pay with PayPal, I have to create a button that is compatible with WordPress, and Squarespace, and Wix, and Weebly, and GoDaddy, and Blogger, and all the other website builders that I might want to integrate with. In 2014, Zack Bloom started a company called Eager. Eager was a cloud app marketplace which allowed app developers to make flexible plugins that non-technical users could drag and drop into their site without technical expertise. In order for these non-technical users to add any apps from the Eager marketplace to their webpage, they had to drop in a line of JavaScript–which is, unfortunately, a significant hurdle for a nontechnical user. Eager proved to be a useful distribution mechanism for plugin developers who could write a plugin once and get distributed to multiple plugin marketplaces. 
But Eager was not as widely used as a way to directly drag and drop plugins onto sites. The question was: how do you build a marketplace for non-technical users to add plugins to any website without forcing the non-technical user to write code? How do you make editing any website as easy as a WYSIWYG editor? The CDN turns out to be the perfect distribution platform for these kinds of apps. Users already integrate with a CDN, so the CDN can do the work of inserting the code that allows the plugins to be added to a user’s webpage. Because of the opportunity for the integration between a plugin marketplace and a CDN, Eager was acquired by Cloudflare, and Eager became Cloudflare apps. Zack Bloom joins the show today to discuss the motivations for his company, the engineering behind building a cloud app marketplace, and the acquisition process of his company Eager. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Dec 12, 2017 - 59 min

Ep 687: Scalable Multiplayer Games with Yan Cui

Remember when the best game you could play on your phone was Snake? In 1998, Snake was preloaded on Nokia phones, and it was massively popular. That same year Half-Life won game of the year on PC. Metal Gear Solid came out for Playstation. The first version of Starcraft also came out in 1998. In 1998, few people would have anticipated that games with as much interactivity as Starcraft would be played on mobile phones twenty years later. Today, mobile phones have the graphics and processing power of a desktop gaming PC from two decades ago. But one thing still separates desktop gaming from mobile gaming: the network. With desktop gaming, users have a reliable wired connection that keeps their packets moving over the network with speeds that let them compete with other users. With mobile gaming, the network can be flaky. How do we architect real-time strategy games that can be played over an intermittent network connection? Yan Cui is an engineer at Space Ape Games, a company that makes interactive multiplayer games for mobile devices. In a previous episode, Yan described his work re-architecting a social networking startup where the costs had gotten out of control. Yan has a skill for describing software architecture and explaining the tradeoffs. When architecting a multiplayer mobile game, there are many tradeoffs to consider. What do you build and what do you buy? Do you centralize your geographical deployment to make it easier to reconcile conflicts, or do you spread your server deployment out globally? What is the interaction between the mobile clients and the server? The question of interaction between client and server for a mobile game has lessons that are important for anyone building a highly interactive mobile application. For example, think about Uber. When I make a request for a car, I can look at my phone and see the car on the map, slowly approaching me. The driver can look at his phone and see if I move across the street. 
This is accomplished by synchronizing the data from the driver’s phone and my phone in a centralized server, and sending the synchronized state of the world out to me and the driver. How much data does the centralized server need to get from the mobile phones? How often does it need to make those requests? The answers to these questions will vary based on bandwidth, device type, phone battery life, and other factors. There are similar problems in mobile game engineering, when users are different players on a virtual map. They are fighting each other, trying to avoid enemies, trying to steal power ups from each other. Mobile games can be even more interactive than a ridesharing app like Uber, so the questions of data synchronization can be even harder to answer. On Software Engineering Daily, we have explored the topic of real-time synchronization in our past shows about the infrastructure of Uber and Lyft. To find these old episodes, you can download the Software Engineering Daily app for iOS and for Android. In other podcast players, you can only access the most recent 100 episodes. With these apps, we are building a new way to consume content about software engineering. They are open-sourced at github.com/softwareengineeringdaily. If you are looking for an open source project to get involved with, we would love to get your help. Yan Cui’s new video course: AWS Lambda in Motion. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Dec 11, 2017 - 1h 9m

Ep 686: Decentralized Objects with Martin Kleppmann

The Internet was designed as a decentralized system. Theoretically, if Alice wants to send an email to Bob, she can set up an email client on her computer and send that email to Bob’s email server on his computer. In reality, very few people run their own email servers. We all send our emails to centralized services like Gmail, and connect to those centralized services using our own client—a browser on our laptop or a mobile application on our smart phone. Gmail is popular because nobody wants to run their own email server—it’s too much work. With Gmail, our emails are centralized, but centralization comes with convenience. Similar centralization happened with online payments. Decentralization is a desirable feature of computer systems. So how do we make more of our applications decentralized? Martin Kleppmann is a distributed systems researcher and the author of Designing Data-Intensive Applications. Martin is concerned by the centralization of our computer networks, and he works on CRDT technology in order to make it easier for people to build peer-to-peer applications. Most of the people who know how to build systems with CRDTs are distributed systems PhDs, database experts, and people working at huge internet companies. How do you make developer-friendly CRDTs? How do you allow random hackers to build peer-to-peer applications that avoid conflicts? Start by making a CRDT out of the most widely used, generalizable data structure in modern application development: the JSON object. In today’s episode, Martin and I talk about conflict resolution, CRDTs, and decentralized applications. This is Martin’s second time on the show, and his first interview is the most popular episode to date. You can find a link to that episode in the show notes for this episode, or you can find it in the Software Engineering Daily app for iOS and for Android. In other podcast players, you can only access the most recent 100 episodes. 
With these apps, we are building a new way to consume content about software engineering. They are open-sourced at github.com/softwareengineeringdaily. If you are looking for an open source project to get involved with, we would love to get your help. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
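To make the idea of a conflict-free JSON object a little more concrete, here is a minimal sketch of one of the simplest CRDT designs, a last-writer-wins map. It is not the algorithm Martin describes in the episode, and the names (set_field, merge, to_json_object) and the integer timestamps are assumptions made for illustration. Each field carries a (timestamp, replica id) stamp, and merging two replicas keeps the entry with the larger stamp, so the result is the same regardless of merge order.

```python
from typing import Any, Dict, Tuple

# Each field maps to (value, timestamp, replica_id); the stamp decides which write wins.
Entry = Tuple[Any, int, str]
State = Dict[str, Entry]


def set_field(state: State, key: str, value: Any, timestamp: int, replica_id: str) -> State:
    """Return a new state with a local write recorded for `key`."""
    updated = dict(state)
    updated[key] = (value, timestamp, replica_id)
    return updated


def merge(a: State, b: State) -> State:
    """Merge two replicas field by field, keeping the larger (timestamp, replica_id) stamp."""
    merged = dict(a)
    for key, entry in b.items():
        current = merged.get(key)
        if current is None or (entry[1], entry[2]) > (current[1], current[2]):
            merged[key] = entry
    return merged


def to_json_object(state: State) -> Dict[str, Any]:
    """Strip the metadata to get the plain JSON-style object an application would read."""
    return {key: entry[0] for key, entry in state.items()}


# Two replicas start from the same state and then edit independently.
alice = set_field({}, "title", "Groceries", 1, "alice")
bob = set_field(alice, "title", "Shopping list", 2, "bob")
alice = set_field(alice, "done", False, 2, "alice")

# Both merge orders converge on the same object.
print(to_json_object(merge(alice, bob)))
print(to_json_object(merge(bob, alice)))
```

Last-writer-wins silently discards the losing write, which is one reason richer JSON CRDTs exist. The sketch only shows the convergence property that peer-to-peer applications rely on.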

Dec 8, 20171h 18m

Ep 685Serverless Applications with Randall Hunt

Developers can build networked applications today without having to deploy their code to a server. These “serverless” applications are constructed from managed services and functions-as-a-service. Managed services are cloud offerings like database-as-a-service, queueing-as-a-service, or search-as-a-service. These managed services are easy to use. They take care of operational burdens like scalability and outages. But managed services typically solve a narrow use case. You can’t build an application entirely out of managed services. Managed services are scalable and narrow. Functions-as-a-service are scalable and flexible. With managed services, you make remote calls to a service with a well-defined API. With functions-as-a-service, you can deploy your own code. But functions-as-a-service execute against transient, unreliable compute resources. They aren’t a good fit for low latency computation, and the code you run on them should be stateless. Managed services and functions-as-a-service are perfect complements. Managed services provide you with well-defined server abstractions that every application needs—like databases, search indexes, and queues. Functions as a service offer flexible “glue code” that you can use to create custom interactions between the managed services. The term “serverless” is used to describe the applications that are built entirely with managed services and functions as a service. Serverless applications are dramatically simpler to build and easier to operate than cloud applications of the past. Managed services can get expensive, but functions as a service can cost 1/10th of what it might take to run a server that is handling your requests.
Whether the size of your bill will increase or decrease as your company becomes “serverless” is less of an issue than the fact that your employees will be more productive: serverless applications have less operational burden, so developers spend more time architecting and implementing software. It has been 5 years since the Netflix infrastructure team was talking about the aspirational goal of a “no-ops” software culture. Your software should be so well-defined that you do not need regular intervention of ops staff to reboot your servers and reconfigure your load balancers. Serverless is a newer way of moving operational expense into capital expense. Today’s guest Randall Hunt is a senior technical evangelist with Amazon Web Services. He travels around the world meeting developers and speaking at conferences about AWS Lambda, the functions as a service platform from Amazon. Randall has given some excellent talks about how to architect and build serverless applications (which I will add to the show notes), and today we explore those application patterns further. Serverless Services – Randall Hunt Randall Hunt at AWS Summit Seoul Serverless, What is it Good For? Randall Hunt Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Dec 7, 201743 min

Ep 684Data Science Mindset with Zacharias Voulgaris

A company’s approach to data can make or break the business. In the past, data was static. There was not much data, it sat in Excel, and it was interacted with on a nightly or monthly basis. Now, data is dynamic, real-time, and huge. To tap into available data, many industries have oriented themselves to becoming data intensive. With many new industry sectors becoming data driven, a new field called data science emerged. As a new field, data science has attracted a lot of attention from professionals with diverse backgrounds. Describing what data science is and who counts as a data scientist is not easy. As technologies surrounding the field continue to evolve and new verticals are added, the discourse surrounding the field has attracted different voices putting forward their definition of the field. In this episode, Zacharias Voulgaris joins guest host Sid Ramesh to discuss the developments in the field. He is the author of several data science books, and in today’s conversation Zacharias explains what he means by the data science mindset–including trends and misconceptions that people have about the field. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Dec 6, 20171h 13m

Ep 683Secure Authentication with Praneet Sharma

When I log into my bank account from my laptop, I first enter my banking password. Then the bank sends a text message to my phone with a unique code, and I enter that code into my computer to finish the login. This login process is two-factor authentication. I am proving my identity by entering my banking password (the first factor) and validating that I am in control of my phone (the second factor) by receiving that text message. But in order to log in from my laptop, I need to be in control of my laptop. The laptop itself is a factor. With the laptop and my password, I have two factors. I might not actually need the phone as a factor. Praneet Sharma is the CEO of Keyless, a product that moves 2-factor authentication into the browser. Praneet joins the show to discuss how all kinds of authentication work: multi-factor authentication, single sign on, and Yubikey. We use this discussion of authentication methods to help explain why it actually could make sense for some people to be doing 2-factor authentication without requiring people to take out their phone. We also explore recent security breaches like Target, Equifax and Yahoo–and the industry of security software sold to developers. I see giant banners for security software companies every time I go into the San Francisco airport, and Praneet explained to me some of the products that these kinds of companies are selling. Praneet has joined the show in a previous episode to talk about advertising fraud. He also works with Shailin Dhar at Method Media Intelligence. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Dec 5, 20171h 0m

Ep 682Serverless Scheduling with Rodric Rabbah

Functions as a service are deployable functions that run without an addressable server. Functions as a service scale without any work by the developer. When you deploy a function as a service to a cloud provider, the cloud provider will take care of running that function whenever it is called. You don’t have to worry about spinning up a new machine and monitoring that machine, and spinning the machine down once it becomes idle. You just tell the cloud provider that you want to run a function, and the cloud provider executes it and returns the result. Functions as a service can be more cost effective than running virtual machines or containerized infrastructure, because you are letting the cloud provider decide where to schedule your function, and you are giving the cloud provider flexibility on when to schedule the function. The developer experience for deploying a serverless function can feel mysterious. You send a blob of code into the cloud. Later on, you send a request to call that code in the cloud. The result of the execution of that code gets sent back down to you. What is happening in between? Rodric Rabbah is the principal researcher and technical lead in serverless computing at IBM. He helped design OpenWhisk, the open source functions-as-a-service platform that IBM has deployed and operationalized as IBM Cloud Functions. Rodric joins the show to explain how to build a platform for functions as a service. When a user deploys a function to IBM Cloud Functions, that function gets stored in a database as a blob of text, waiting to be called. When the user makes a call to the function, IBM Cloud Functions takes it from the database and queues the function in Kafka, and eventually schedules the function onto a container for execution. Once the function has executed, IBM Cloud Functions stores the result in a database and sends that result to the user.
When you execute a function, the time spent scheduling it and loading it onto a container is known as the “cold start problem”. The steps of executing a serverless function take time, but the resource savings are significant. Your code is just stored as a blob of text in a database, rather than sitting in memory on a server, waiting to execute. In his research for building IBM Cloud Functions, Rodric wrote about some of the tradeoffs for users who build applications with serverless functions. The tradeoffs exist along what Rodric calls “the serverless trilemma.” In today’s episode, we discuss why people are using functions-as-a-service, the architecture of IBM Cloud Functions, and the unsolved challenges of building a serverless platform. Full disclosure: IBM is a sponsor of Software Engineering Daily. OpenWhisk Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
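The deploy, queue, schedule, and execute steps described in this episode can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how IBM Cloud Functions or OpenWhisk is actually implemented: a dict stands in for the database, a deque stands in for Kafka, a second dict stands in for warm containers, and the names deploy, invoke, and run_next are hypothetical. The first call to a function pays the cold start cost of loading its source; later calls reuse the loaded code.

```python
from collections import deque
from itertools import count
from typing import Any, Callable, Deque, Dict, Tuple

function_store: Dict[str, str] = {}                      # stand-in for the source database
invocation_queue: Deque[Tuple[int, str, Any]] = deque()  # stand-in for the Kafka queue
warm_containers: Dict[str, Callable[[Any], Any]] = {}    # stand-in for containers with code loaded
results: Dict[int, Any] = {}                             # stand-in for the results database
_next_id = count(1)


def deploy(name: str, source: str) -> None:
    """Store the function as a blob of text; nothing is loaded or running yet."""
    function_store[name] = source


def invoke(name: str, argument: Any) -> int:
    """Queue an invocation and return an id the caller can use to fetch the result."""
    invocation_id = next(_next_id)
    invocation_queue.append((invocation_id, name, argument))
    return invocation_id


def run_next() -> Tuple[int, bool]:
    """Take one invocation off the queue, execute it, and report whether it was a cold start."""
    invocation_id, name, argument = invocation_queue.popleft()
    cold_start = name not in warm_containers
    if cold_start:
        # Cold start: fetch the source text and load it before it can run.
        namespace: Dict[str, Any] = {}
        exec(function_store[name], namespace)
        warm_containers[name] = namespace["main"]
    results[invocation_id] = warm_containers[name](argument)
    return invocation_id, cold_start


deploy("hello", "def main(name):\n    return 'Hello, ' + name\n")
first = invoke("hello", "Ada")
second = invoke("hello", "Grace")
first_run = run_next()   # loads the source, so this is a cold start
second_run = run_next()  # reuses the already loaded function
print(first_run, second_run, results)
```

A real platform would also evict idle containers, which is what brings the cold start back after a quiet period.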

Dec 4, 20171h 7m

Ep 681Animating VueJS with Sarah Drasner

Most user interfaces that we interact with are not animated. We click on a button, and a form blinks into view. We click a link and the page abruptly changes. On the other hand, when we interact with an application that has animations, we can feel the difference. The animations are often subtle. If you aren’t sure what I’m talking about, pay attention the next time you use Slack or Facebook Messenger or iMessage. Airbnb values animation so much that they built Lottie, a library for animation. In an animated application, the user interface feels alive. When a software team takes the time to build animations into small interactions, the user perceives the animations as polish and attention to detail. Sarah Drasner has been evangelizing the value of animations for years, and she is an expert at implementing complex and beautiful animations on the web. She works at Microsoft as a developer advocate and joins the show to talk about how to build animations. If you are building a web application and want to create a unique UI, you might find this show useful. JavaScript supports detailed animations, often through the manipulation of SVG files. SVG stands for “scalable vector graphics” a file format that represents an image in its own DOM. SVG is so flexible because of this DOM format, which defines the different parts of the SVG. This is in contrast to a bitmap, which is just a simple matrix of dots, without any rich metadata. You could manipulate SVG with raw JavaScript—but most people use a frontend JavaScript framework like React, Angular, or VueJS. Sarah has been implementing most of her recent web animations using Vue, and she is a member of the Vue core team. Vue has an entertaining story, because it gained popularity in a time when Google was supporting AngularJS and Facebook was supporting ReactJS. The first version of Vue was created from scratch by a single developer, Evan You. 
If you are a Vue developer looking for an open source project to hack on, you can check out softwaredaily.com, which is an open source platform to consume content about software. In addition to the Vue web app, we also have the Software Engineering Daily app for iOS and for Android. All of these apps are open-sourced at github.com/softwareengineeringdaily. If you are looking for an open source project to get involved with, we would love to get your help. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Dec 1, 201751 min

Ep 680React and GraphQL at New York Times

Are we a media company or a technology company? Facebook and the New York Times are both asking themselves this question. Facebook originally intended to focus only on building technology–to be a neutral arbiter of information. This has turned out to be impossible. The Facebook newsfeed is defined by algorithms that are only as neutral as the input data. Even if we could agree on a neutral data set to build a neutral newsfeed, the algorithms that generate this news feed are not public, so we have no way to vet their neutrality. Facebook is such a powerful engine for distribution, it has allowed for a rise in the number of publishers who can get their voice heard. As a result, large media companies have lost market share because Facebook has replaced their distribution. The New York Times has always been a media company–but the standards for media consumption have shot up. Millions of people produce content for free, and that content is distributed through high quality experiences like Twitter, YouTube, Medium, and Facebook. When a page takes too long to load on NewYorkTimes.com, it doesn’t matter how good the content is–the user is going to navigate away before they read anything. Today, the New York Times has built out an experienced engineering team. In a previous episode, we reported how the Times uses Kafka to make its old content more accessible. In today’s show, we talk about how the Times uses React and GraphQL to improve the performance and the developer experience of engineers who are building software at the New York Times. Scott Taylor and James Lawrie are software engineers at the New York Times. In this episode, they explain how the New York Times looks at technology. The user experience on New York Times rivals that of a platform company like Facebook, and this is assisted by technologies originally built at Facebook: React, Relay, and GraphQL. Transcript provided by We Edit Podcasts. 
Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Nov 30, 201755 min

Ep 679How IBM Runs Its Cloud with Jason McGee

Functions as a service let developers deploy stateless application logic that is cheap and scalable. Functions as a service still have some problems to overcome in the areas of state management, function composition, usability, and developer education. Kubernetes is a tool for managing containerized infrastructure. Developers put their apps into containers on Kubernetes, and Kubernetes provides a control plane for deployment, scalability, load balancing, and monitoring. So–all of the things that you would want out of a managed service become much easier when you put applications into Kubernetes. This is why Kubernetes has become so popular–and it is why Kubernetes itself is being offered as a managed service by many cloud providers–including IBM. For the last decade, IBM has been building out its cloud offerings–and for two of those years, Jason McGee has been CTO of IBM Cloud Platform. In this episode, Jason discusses what it is like to build and manage a cloud, from operations to economics to engineering. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Nov 29, 20171h 3m

Ep 678Thumbtack Infrastructure with Nate Kupp

Thumbtack is a marketplace for real-world services. On Thumbtack, people get their house painted, their dog walked, and their furniture assembled. With 40,000 daily marketplace transactions, the company handles significant traffic. On yesterday’s episode, we explored how one aspect of Thumbtack’s marketplace recently changed, going from asynchronous matching to synchronous “instant” matching. In this episode, we zoom out to the larger architecture of Thumbtack, and how the company has grown through its adoption of managed services from both AWS and Google Cloud. The word “serverless” has a few definitions. In the context of today’s episode, serverless is all about managed services like Google BigQuery, Google Cloud PubSub, and Amazon ECS. The majority of infrastructure at Thumbtack is built using services that automatically scale up and down. Application deployment, data engineering, queueing, and databases are almost entirely handled by cloud providers. For the most part, Thumbtack is a “serverless” company. And it makes sense–if you are building a high-volume marketplace, you are not in the business of keeping servers running. You are in the business of improving your matching algorithms, your user experience, and your overall architecture. Paying for lots of managed services is more expensive than running virtual machines–but Thumbtack saves money from not having to hire site reliability engineers. Nate Kupp leads the technical infrastructure team, and we met at QCon in San Francisco to talk about how to architect a modern marketplace. This was my third time attending QCon and as always I was impressed by the quality of presentations and conversations I had there. They were also kind enough to set up some dedicated space for podcasters like myself. The most widely used cloud provider is AWS, but more and more companies that come on the show are starting to use some of the managed services from Google. 
The great news for developers is that integration between these managed services is pretty easy. At Thumbtack, the production infrastructure on AWS serves user requests. The log of transactions that occur gets pushed from AWS to Google Cloud, where the data engineering occurs. On Google Cloud, the transaction records are queued in Cloud PubSub, a message queueing service. Those transactions are pulled off the queue and stored in BigQuery, a system for storage and querying of high volumes of data. BigQuery is used as the data lake to pull from when orchestrating machine learning jobs. These machine learning jobs are run in Cloud Dataproc, a managed service that runs Apache Spark. After training a model in Google Cloud, that model is deployed on the AWS side, where it serves user traffic. On the Google Cloud side, the orchestration of these different managed services is done by Apache Airflow, an open source tool that is one of the few pieces of infrastructure that Thumbtack does have to manage themselves on Google Cloud. To find out more about the Thumbtack infrastructure, check out the video of the talk Nate gave at QCon San Francisco, or check out the Thumbtack Engineering Blog. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Nov 28, 201744 min

Ep 677Marketplace Matching with Xing Chen

The labor market is moving online. Taxi drivers are joining Uber and Lyft. Digital freelancers are selling their services through Fiverr. Experienced software contractors are leaving contract agencies to join Gigster. Online labor marketplaces create market efficiency by improving the communications between buyers and sellers. Workers make their own hours, and their performance is judged by customers and algorithms, rather than the skewed perspective of a human manager. These marketplaces for human labor are in different verticals, but they share a common problem: how do you most efficiently match supply and demand? Perfect marketplace matching is an unsolved problem. Hundreds of computer science papers have been written about the problems of stable matching, which often turn out to be NP-Complete. The stock market has been attempting to automate marketplace matching for decades, and inefficiencies are discovered every year. Today’s show is about matching buyers and sellers on Thumbtack, a marketplace for local services. For the first seven years, Thumbtack was building liquidity in its 2-sided market. During those years, the model for job requests was as follows: let’s say I was on Thumbtack looking for someone to paint my house. I would post a job that would say I am looking for house painters. The workers on Thumbtack that paint houses could see my job and place a bid on it. Then I would choose from the bids and get my house painted. This was the “asynchronous” model. The actions of the buyer and seller were not synchronized. There was a significant delay between the time when the buyer posted a job and the time when a seller places a bid, and then another delay before the buyer selects from the sellers. Thumbtack recently moved to an “instant matching” model. After gathering data about the people selling services on the platform, Thumbtack is now able to avoid the asynchronous bidding process. 
In the new experience, a buyer goes on the platform, requests a house painter, and is instantly matched to someone who has a history of accepting house painting tasks that fit the parameters of the buyer. From the user’s perspective, this is a simple improvement. From Thumbtack’s perspective, there was significant architectural change required. In the asynchronous model, the user requests lined up in a queue, and were matched with pros who placed bids on the items in that queue. In the instant matching model, a user request became more like a search query–the parameters of that request hit an index of pros and return a response immediately. Xing Chen is an engineer from Thumbtack, and joins the show to describe the rearchitecture process–how Thumbtack went from an asynchronous matching system to synchronous, instant matching. We also explore some of the other architectural themes of Thumbtack, which we dive into in further detail in tomorrow’s episode about scaling Thumbtack’s infrastructure, which uses both AWS and Google Cloud. On Software Engineering Daily, we have explored the software architecture and business models of different labor marketplaces–from Uber to Fiverr. To find these old episodes, you can download the Software Engineering Daily app for iOS and for Android. In other podcast players, you can only access the most recent 100 episodes. With these apps, we are building a new way to consume content about software engineering. They are open-sourced at github.com/softwareengineeringdaily. If you are looking for an open source project to get involved with, we would love to get your help. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
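The shift from a queue of open jobs to an index lookup can be illustrated with a small sketch. Everything here is hypothetical: the Pro and Request fields, the (service, city) index key, and ranking by acceptance rate are assumptions chosen for the example, not details of Thumbtack’s system. The point is only that a request is answered by one lookup against data gathered ahead of time, instead of waiting for bids.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Pro:
    name: str
    service: str
    city: str
    minimum_price: int       # hypothetical: lowest job price this pro has accepted
    acceptance_rate: float   # hypothetical: share of similar past requests accepted


@dataclass(frozen=True)
class Request:
    service: str
    city: str
    budget: int


Index = Dict[Tuple[str, str], List[Pro]]


def build_index(pros: List[Pro]) -> Index:
    """Group pros by (service, city) ahead of time so a request needs only one lookup."""
    index: Index = defaultdict(list)
    for pro in pros:
        index[(pro.service, pro.city)].append(pro)
    return index


def instant_match(index: Index, request: Request) -> Optional[Pro]:
    """Return the best-fitting pro immediately, or None if nobody fits the request."""
    candidates = [
        pro
        for pro in index.get((request.service, request.city), [])
        if pro.minimum_price <= request.budget
    ]
    return max(candidates, key=lambda pro: pro.acceptance_rate, default=None)


pros = [
    Pro("Dana", "house painting", "Austin", 500, 0.9),
    Pro("Eli", "house painting", "Austin", 300, 0.6),
    Pro("Fay", "dog walking", "Austin", 20, 0.95),
]
index = build_index(pros)

match = instant_match(index, Request("house painting", "Austin", 400))
no_match = instant_match(index, Request("house painting", "Austin", 100))
print(match, no_match)
```

In the asynchronous model the same request would instead be appended to a queue and resolved later, once pros had seen it and placed bids.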

Nov 27, 201754 min

Ep 676Load Balancing at Scale with Vivek Panyam

Facebook serves interactive content to billions of users. Google serves query requests on the world’s biggest search engine. Uber handles a significant percentage of the transportation within the United States. These services are handling radically different types of traffic, but many of the techniques they use to balance loads are similar. Vivek Panyam is an engineer with Uber, and he previously interned at Google and Facebook. In a popular blog post about load balancing at scale, he described how a large company scales up a popular service. The methods for scaling up load balancing are simple, but effective–and they help to illustrate how load balancing works at different layers of the networking stack. Let’s say you have a simple service where a user makes a request, and your service sends them a response with a cat picture. Your service starts to get popular, and begins timing out and failing to send a response to users. When your service starts to get overwhelmed, you can scale up by creating another service instance that is a copy of your cat picture service. Now you have two service instances, and you can use a layer 7 load balancer to route traffic evenly between those two service instances. You can keep adding service instances as the load scales and have the load distributed among those new instances. Eventually, your L7 load balancer is handling so much traffic itself that you can’t put any more service instances behind it. So you have to set up another L7 load balancer, and put an L4 load balancer in front of those L7 load balancers. You can scale up that tier of L7 load balancers, each of which is balancing traffic across a set of your service instances. But eventually, even your L4 load balancer gets overwhelmed with requests for cat pictures. You have to set up another tier, this time with L3 load balancing… In this episode, Vivek gives a clear description of how load balancing works.
We also review the 7 networking layers before discussing why there are different types of load balancers associated with the different networking layers. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
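The two-tier setup from the cat picture example can be sketched as follows. This is a simplified model, not the design from Vivek’s blog post: the class names, the instance names, and the choice of round robin for L7 and address hashing for L4 are assumptions for illustration. The L4 balancer only looks at the client’s IP address and port, so it hashes them to pick an L7 balancer; the L7 balancer sees individual requests and rotates them across its service instances.

```python
import hashlib
from itertools import cycle
from typing import List


class L7Balancer:
    """Application-layer balancer: sees each request and rotates across service instances."""

    def __init__(self, name: str, instances: List[str]) -> None:
        self.name = name
        self._rotation = cycle(instances)

    def route(self, path: str) -> str:
        # Round robin: each request goes to the next instance in turn.
        return next(self._rotation)


class L4Balancer:
    """Transport-layer balancer: sees only addresses and ports, never the request itself."""

    def __init__(self, l7_balancers: List[L7Balancer]) -> None:
        self.l7_balancers = l7_balancers

    def pick(self, client_ip: str, client_port: int) -> L7Balancer:
        # Hash the connection's source address so one connection always lands on the same L7 balancer.
        digest = hashlib.sha256(f"{client_ip}:{client_port}".encode()).digest()
        position = int.from_bytes(digest[:8], "big") % len(self.l7_balancers)
        return self.l7_balancers[position]


front = L4Balancer(
    [
        L7Balancer("l7-a", ["cats-1", "cats-2"]),
        L7Balancer("l7-b", ["cats-3", "cats-4"]),
    ]
)

balancer = front.pick("203.0.113.7", 51000)
served_by = [balancer.route("/cat.jpg") for _ in range(3)]
print(balancer.name, served_by)
```

Adding capacity at either tier is a matter of appending to the relevant list, which mirrors the scaling steps in the description: more service instances behind an L7 balancer first, then more L7 balancers behind the L4 tier.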

Nov 22, 201750 min

Ep 675Incident Response with Emil Stolarsky

As a system becomes more complex, the chance of failure increases. At a large enough scale, failures are inevitable. Incident response is the practice of preparing for and effectively recovering from these failures. An engineering team can use checklists and runbooks to minimize failures. They can put a plan in place for responding to failures. And they can use the process of post mortems to reflect on a failure and take full advantage of the lessons of that failure. Emil Stolarsky is a production engineer at Shopify where his role shares many similarities with that of Google’s site reliability engineers. In this episode, Emil argues that the academic study of emergency management and industries such as aerospace and transportation have a lot to teach software engineers about responding to production problems. In this interview with guest host Adam Bell, Emil argues that we need to move beyond tribal knowledge and incorporate practices such as an incident command system and rigorous use of checklists. Emil suggests that we need to move beyond a mindset of “move fast and break things” and toward a place of more deliberate preparation. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript. Incident Response Insights Talk The Human Side Of Post Mortems

Nov 21, 201754 min

Ep 674Run Less Software with Rich Archbold

There is a quote from Jeff Bezos: “70% of the work of building a business today is undifferentiated heavy lifting. Only 30% is creative work. Things will be more exciting when those numbers are inverted.” That quote is from 2006, before Amazon Web Services had built most of their managed services. In 2006, you had no choice but to manage your own database, data warehouse, and search cluster. If your server crashed in the middle of the night, you had to wake up and fix it. And you had to deal with these engineering problems in addition to building your business. Technology today evolves much faster than in 2006. That is partly because managed cloud services make operating a software company so much smoother. You can build faster, iterate faster, and there are fewer outages. If you are an insurance company or a t-shirt manufacturing company or an online education platform, software engineering is undifferentiated heavy lifting. Your customers are not paying you for your expertise in databases or your ability to configure load balancers. As a business, you should be focused on what the customers are paying you for, and spending the minimal amount of time on rebuilding software that is available as a commodity cloud service. Rich Archbold is the director of engineering at Intercom, a rapidly growing software company that allows for communication between customers and businesses. At Intercom, the engineering teams have adopted a philosophy called Run Less Software. Running less software means reducing choices among engineering teams, and standardizing on technologies wherever possible. When Intercom was in its early days, the systems were more heterogeneous. Different teams could choose whatever relational database they wanted–MySQL or Postgres. They could choose whatever key/value store they were most comfortable with. 
The downside of all this choice was that engineers who moved from one team to another team might not know how to use the tools at the new team they were moving to. After switching teams, you would have to figure out how to onboard with those new tools, and that onboarding process was time that was not spent on effort that impacted the business. By reducing the number of different choices that engineering teams have, and opting for managed services wherever possible, Intercom ships code at an extremely fast pace with very few outages. In our conversation, Rich contrasts his experience at Intercom with his experiences working at Amazon Web Services and Facebook. Amazon and Facebook were built in a time where there was not a wealth of managed services to choose from, and this discussion was a reminder of how much software engineering has changed because of cloud computing. To learn more about Intercom, you can check out the Inside Intercom podcast. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Nov 20, 201756 min

Ep 673Training the Machines with Russell Smith

If I am building a mobile app to play podcast episodes, and I make a change to the user interface, I want to have manual quality assurance (QA) testers run through tests that I describe to them, to make sure my change did not break anything. QA tests describe high level application functionality. Can the user register and log in? Can the user press the play button and listen to a podcast episode on my app? Unit tests are not good enough, because unit tests only verify the logic and the application state from the point of view of the computer itself. Manual QA tests ensure that the quality of the user experience was not impacted. With so many different device types, operating systems, and browsers, I need my QA test to be executed in all of the different target QA environments. This requires lots of manual testers. If I want manual testing for every deployment I push, that manual testing can get expensive. RainforestQA is a platform for QA testing that turns manual testing into automated testing. The manual test procedures are recorded, processed by computer vision, and turned into automated tests. RainforestQA hires human workers from Amazon Mechanical Turk to execute the well-defined manual tests, and the recorded manual procedure is used to train the machines that can execute the same task in the future. Russell Smith is the CTO and co-founder of RainforestQA, and he joins the show to explain how RainforestQA works: the engineering infrastructure, the process of recruiting workers from Mechanical Turk, and the machine learning system for taking manual tasks and automating them. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Nov 17, 20171h 3m

Ep 672High Volume Event Processing with John-Daniel Trask

A popular software application serves billions of user requests. These requests could be for many different things. These requests need to be routed to the correct destination, load balanced across different instances of a service, and queued for processing. Processing a request might require generating a detailed response to the user, or making a write to a database, or the creation of a new file on a file system. As a software product grows in popularity, it will need to scale these different parts of infrastructure at different rates. You may not need to grow your database cluster at the same pace that you grow the number of load balancers at the front of your infrastructure. Your users might start making 70% of their requests to one specific part of your application, and you might need to scale up the services that power that portion of the infrastructure. Today’s episode is a case study of a high-volume application: a monitoring platform called Raygun. Raygun’s software runs on client applications and delivers monitoring data and crash reports back to Raygun’s servers. If I have a podcast player application on my iPhone that runs the Raygun software, and that application crashes, Raygun takes a snapshot of the system state and reports that information along with the exception, so that the developer of that podcast player application can see the full picture of what was going on in the user’s device, along with the exception that triggered the application crash. Throughout the day, applications all around the world are crashing and sending requests to Raygun’s servers. Even when crashes are not occurring, Raygun is receiving monitoring and health data from those applications. Raygun’s infrastructure routes those different types of requests to different services, queues them up, and writes the data to multiple storage layers–ElasticSearch, a relational SQL database, and a custom file server built on top of S3.
John-Daniel Trask is the CEO of Raygun and he joins the show to describe the end-to-end architecture of Raygun’s request processing and storage system. We also explore specific refactoring changes that were made to save costs at the worker layer of the architecture. This is a useful memory management strategy for anyone working in a garbage collected language. If you would like to see diagrams that explain the architecture and other technical decisions, the show notes have a video that explains what we talk about in this show. Full disclosure: Raygun is a sponsor of Software Engineering Daily. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
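
The route-then-queue-then-store flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the request types "crash" and "health", the function names, and the plain Python lists standing in for the storage layers are all hypothetical, and nothing here reflects Raygun's actual code.

```python
from collections import deque
from typing import Deque, Dict, List


def make_queues() -> Dict[str, Deque[dict]]:
    """One queue per request type, so each type can be drained and scaled separately."""
    return {"crash": deque(), "health": deque()}


def route(request: dict, queues: Dict[str, Deque[dict]]) -> None:
    """Place an incoming request on the queue that matches its (hypothetical) type field."""
    kind = request.get("type")
    if kind not in queues:
        raise ValueError(f"unknown request type: {kind!r}")
    queues[kind].append(request)


def drain(queue: Deque[dict], stores: List[List[dict]]) -> int:
    """Worker loop: pop each queued request and write it to every storage layer."""
    written = 0
    while queue:
        item = queue.popleft()
        for store in stores:
            store.append(item)  # each list stands in for one storage layer
        written += 1
    return written
```

Because each request type has its own queue, the workers behind one queue could be scaled up without touching the others, which is the property the description emphasizes.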

Nov 16, 20171h 0m

Ep 671Fiverr Engineering with Gil Scheinfeld

As the gig economy grows, that growth necessitates innovations in the online infrastructure powering these new labor markets. In our previous episodes about Uber, we explored the systems that balance server load and gather geospatial data. In our coverage of Lyft, we studied Envoy, the service proxy that standardizes communications and load balancing among services. In shows about Airbnb, we talked about the data engineering pipeline that powers economic calculations, user studies, and everything else that requires a MapReduce. In today’s episode, we explore the business and engineering behind another online labor platform: Fiverr. Fiverr is a marketplace for digital services. On Fiverr, I have purchased podcast editing, logo creation, music lyrics, videos, and sales leads. I have found people who will work for cheap, and quickly finish a job to my exact specification. I have discovered visual artists who worked with me to craft a music video for a song I wrote. Workers on Fiverr post “gigs”–jobs that they can perform. Most of the workers on Fiverr specialize in knowledge work, like proofreading or gathering sales leads. The workers are all over the world. I have worked with people from Germany, the Philippines, and Africa through Fiverr. Fiverr has become the leader in digital freelancing. The staggering growth of Fiverr’s marketplace has put the company in a position similar to an early Amazon. There is room for strategic expansion, but there is also an urgency to improve the infrastructure and secure the market lead. Gil Scheinfeld is the CTO at Fiverr, and he joins the show to explain how the teams at Fiverr are organized to fulfill the two goals of strategic, creative growth and continuous improvement to the platform. One engineering topic we discussed at length was event sourcing. Event sourcing is a pattern for modeling each change to your application as an event.
Each event is placed on a pub/sub messaging queue, and made available to the different systems within your company. Event sourcing creates a centralized place to listen to all of the changes that are occurring within your company. For example, you might be working on a service that allows a customer to make a payment to a worker. The payment becomes an event. Several different systems might want to listen for that event. Fiverr needs to call out to a credit card processing system. Fiverr also needs to send an email to the worker, to let them know they have been paid. Fiverr ALSO needs to update internal accounting records. Event sourcing is useful because the creator of the event is decoupled from all of the downstream consumers. As the platform engineering team works to build out event sourcing, communications between different service owners will become more efficient. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
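
The payment example above can be sketched as a tiny in-memory publish/subscribe bus. This is a minimal illustration, not Fiverr's system: the names Event, EventBus, and the "payment.made" topic are hypothetical, and a real deployment would use a durable messaging queue rather than a Python list.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    topic: str
    payload: Dict[str, object]


class EventBus:
    """In-memory stand-in for a pub/sub queue (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Event], None]]] = defaultdict(list)
        self.log: List[Event] = []  # append-only record of every published event

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, event: Event) -> None:
        # The publisher only records and announces the event;
        # it knows nothing about who consumes it.
        self.log.append(event)
        for handler in self._subscribers[event.topic]:
            handler(event)


def run_payment_example() -> Tuple[List[str], List[Event]]:
    bus = EventBus()
    actions: List[str] = []

    # Three independent consumers of the same event.
    bus.subscribe("payment.made", lambda e: actions.append(f"charge card for order {e.payload['order_id']}"))
    bus.subscribe("payment.made", lambda e: actions.append(f"email worker {e.payload['worker']}"))
    bus.subscribe("payment.made", lambda e: actions.append(f"record {e.payload['amount_usd']} USD in accounting"))

    bus.publish(Event("payment.made", {"order_id": 42, "worker": "w-7", "amount_usd": 5}))
    return actions, bus.log
```

Adding a fourth consumer would mean one more subscribe call and no change to the code that publishes the payment, which is the decoupling the description refers to.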

Nov 15, 201756 min

Ep 670Serverless Event-Driven Architecture with Danilo Poccia

In an event driven application, each component of application logic emits events, which other parts of the application respond to. We have examined this pattern in previous shows that focus on pub/sub messaging, event sourcing, and CQRS. In today’s show, we examine the intersection of event driven architecture and serverless architecture. Serverless applications can be built by combining functions-as-a-service (like AWS Lambda) together with backend as a service tools like DynamoDB and Auth0. Functions-as-a-service give you cheap, flexible, scalable compute. Backend as a service tools give you robust, fault-tolerant tools for managing state. By combining these sets of tools, we can build applications without thinking about specific servers that are managing large portions of our application logic. This is great–because managing servers and doing load balancing and scaling is painful. With this shift in architecture, we also have to change how data flows through our applications. Danilo Poccia is the author of AWS Lambda In Action, a book about building event-driven serverless applications. In today’s episode, Danilo and I discuss the connection between serverless architecture and event driven architecture. We start by reviewing the evolution of the runtime unit–from physical machines to virtual machines to containers to functions as a service. Then, we dive into what it means for an application to be “event driven.” We explore how to architect and scale a serverless architecture, and we finish by discussing the future of serverless–how IoT and edge computing and on-premise architectures will take advantage of this new technology. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
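
As a rough sketch of what "event driven" means for a single function, the handler below follows the (event, context) shape that AWS Lambda uses for Python functions. Everything else is an assumption made for illustration: the "records" event layout is hypothetical, and the module-level dictionary merely stands in for a backend-as-a-service store such as DynamoDB.

```python
import json
from typing import Any, Dict, Optional

# Stand-in for a managed state store; a real function would call out to a
# backend service instead of keeping state in process memory.
STATE: Dict[str, Any] = {}


def handler(event: Dict[str, Any], context: Optional[Any] = None) -> Dict[str, Any]:
    """Lambda-style entry point: react to one event and keep no server-side session."""
    processed = 0
    for record in event.get("records", []):  # hypothetical event layout
        STATE[record["id"]] = record["body"]
        processed += 1
    return {"statusCode": 200, "body": json.dumps({"processed": processed})}
```

The function holds no long-lived connection or session; all durable state lives in the external store, which is what lets the platform run as many copies as incoming events require.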

Nov 14, 201755 min

Ep 669BigQuery with Jordan Tigani

Large-scale data analysis was pioneered by Google, with the MapReduce paper. Since then, Google’s approach to analytics has evolved rapidly, marked by papers such as Dataflow and Dremel. Dremel combined a column-oriented, distributed file system with a novel way of processing queries. A single Dremel query is distributed into a tree of servers, starting with the root server, splitting into the intermediate servers, and ending with the leaf servers talking to the file system. Once the data is pulled from the file system into the leaves, the data propagates back to the root server, and is shuffled along the way so that the root server receives a sorted response. When Google started turning its internal services into customer-facing cloud products, the effort to productize Dremel began, and BigQuery was born. Jordan Tigani is an engineering lead who works on BigQuery, and he joins the show to discuss the evolution of the data warehouse. Large scale distributed queries still can take a long time–but queries get faster every year. Queries that required a nightly Hadoop job 10 years ago can be viewed in a frequently updated user-facing dashboard. Power users of BigQuery talk about the speed and the query interface as being two of its most valuable differentiating features. As the job of a large scale data analyst becomes less technically intensive, tools like BigQuery will continue to rise in popularity. We have done some great shows about Google papers like Spanner, Dremel, and Dataflow. To find these old episodes, you can download the Software Engineering Daily app for iOS and for Android. In other podcast players, you can only access the most recent 100 episodes. With these apps, we are building a new way to consume content about software engineering. They are open-sourced at github.com/softwareengineeringdaily. If you are looking for an open source project to get involved with, we would love to get your help. Shout out to today’s featured contributor Shreyans Sheth. 
Shreyans has worked on the Software Engineering Daily search API, and has also helped us understand open source best practices, which we are still learning. Thanks again Shreyans for your work. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
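
The serving tree described above (leaves read the data, results flow back up and arrive sorted at the root) can be approximated with a toy merge tree. This is a simplified sketch, not Dremel's or BigQuery's implementation: the function names, the fanout parameter, and the use of in-memory lists as "shards" are all assumptions made for illustration.

```python
import heapq
from typing import Callable, List, Sequence


def leaf_scan(shard: Sequence[int], predicate: Callable[[int], bool]) -> List[int]:
    """Leaf server: read one shard, apply the filter, return a sorted partial result."""
    return sorted(v for v in shard if predicate(v))


def merge_sorted(partials: Sequence[List[int]]) -> List[int]:
    """Intermediate or root server: merge already-sorted partial results."""
    return list(heapq.merge(*partials))


def run_query(shards: Sequence[Sequence[int]],
              predicate: Callable[[int], bool],
              fanout: int = 2) -> List[int]:
    """Fan out to the leaves, then merge level by level until one result reaches the root."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [leaf_scan(shard, predicate) for shard in shards]
    while len(level) > 1:
        level = [merge_sorted(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0] if level else []
```

Each level only merges results that are already sorted, so no single server has to sort the full data set.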

Nov 13, 201739 min

Ep 668Legal Technology with Justin Kan

Justin Kan has been building startups for a decade, and in that time he has interacted with lots of lawyers. From incorporation to fundraising to selling his company Twitch, the interactions with lawyers consistently seemed less transparent and less efficient than would be optimal. For an engineer like Justin, the natural inclination here was to build software and sell it to lawyers. But there would be so much resistance–you would have to convince the lawyers to change their pricing model to fixed-pricing, which would give them the incentive to buy software and work more efficiently. Instead, Justin teamed up with a few entrepreneurial lawyers who were willing to start a new law firm from scratch, and use software on day 1. The software company is called Atrium Legal Technology Services (or Atrium LTS for short), and the law firm that uses the software is Atrium LLP. Both of these companies are very new, and were publicly announced a few months ago. The two companies work side-by-side in an undecorated office in downtown San Francisco. When I took the elevator up to see the company, the elevator doors opened and revealed two paper signs pointing to opposite ends of the office. On the Atrium LTS side of the office, engineers were writing software to extract the meaning from documents. Today, lawyers at old law firms are paid hundreds of dollars an hour to fill in document templates by editing a text document. As the Atrium LTS software gets better, document preparation will be done through web applications, with the variable names disambiguated from the parts of the document that never change from client to client. On the other side of the office sat Atrium LLP. The legal team was dressed a little more formally than their engineer counterparts, but there was nothing close to the formality of a traditional Silicon Valley law firm.
Far from the decor of a Menlo Park law firm, the office space was actually more spartan than most well-funded startups, signaling to the employees that this is an unproven business strategy, and there is a ton of work to be done to validate it. This sentiment was echoed in my conversation with Justin. It’s possible (even plausible) that Atrium LLP could become the biggest law firm in the world, but the road to getting there will take patience and steady execution. I enjoyed hearing Justin explain the motivation for starting Atrium LTS, and look forward to covering the company in the future. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Nov 10, 201758 min