
Software Engineering Daily
2,188 episodes — Page 28 of 44
Ep 930: Mapillary: Computer Vision Crowdsourcing with Peter Neubauer
Mapillary is a platform for gathering photos taken by smartphones and using that data to build a 3D model of the world. Mapillary’s model of the world includes labeled objects such as traffic signs, trees, humans, and buildings. This 3D model can be explored much like Google Street View. The data set that underlies Mapillary is crowdsourced from volunteer users who take pictures from different vantage points. These smartphone photos are uploaded to Mapillary, queued, and processed to constantly update and refine the Mapillary model. Mapillary processes high volumes of photos from around the world. Each image needs to be correctly fit into Mapillary’s model of the world, like a puzzle piece sliding into place. The image needs to be segmented into the different entities within it, and those entities need to be put through an object recognition algorithm. When two pictures conflict, that conflict needs to be resolved. Mapillary is full of interesting engineering problems. The high volume of images and the level of processing have created the need for a unique sequence of indexing, queueing, and distributed processing using Apache Storm. In addition to processing all of this data and building a 3D model, Mapillary serves an API for querying geolocations of traffic signs, road conditions, and bus stops. Peter Neubauer is a co-founder of Mapillary, and is also a co-founder of Neo Technology, the company behind Neo4j. Peter is a world-class engineer and he joins the show to give a detailed overview of the technology behind Mapillary, from ingesting the photos to running data engineering jobs to serving the API.
Ep 928: Digital Privacy with Aran Khanna
When Aran Khanna was a college student, he accepted an internship to work at Facebook. Even before his internship started, he started playing around with Facebook’s APIs and applications. Aran built a Chrome extension called Marauder’s Map, which used Facebook Messenger’s web APIs to track where people lived, what their schedules were, and other highly sensitive information. These were not public features of Messenger, but Aran was able to reverse engineer the APIs. As a result of making Marauder’s Map, Aran’s invitation to work at Facebook was retracted. Aran remained curious about the norms of publicly available social network data, and the second-order data sets that could be built on top of it. Out of this curiosity, Aran created a tool called Money Trail, which used public Venmo data to model a graph of how users were paying each other. Aran showed for a second time that data which seems innocent to share can be repurposed to identify, classify, and incriminate users. Developers of these online applications face tradeoffs between privacy, convenience, and security. By interacting with these applications, we generate data that suggests how we think, what we like to do, and who we are affiliating with. Google and Facebook probably understand you better than you understand yourself. Aran previously was on the show to talk about machine learning at the edge. At the time he worked at Amazon Web Services. He now works as a digital privacy researcher. His background in machine learning makes him well-equipped to talk through the subtleties of modern digital privacy. In this show, Aran returns to talk through the finer points of privacy, data, and artificial intelligence.
Ep 926: Airbnb Engineering with Surabhi Gupta
Airbnb began in 2008 as a monolithic Rails application serving the simple purpose of listing homes for rental. Over time, the number of listings increased dramatically, as did the number of people who were renting. With that scale, the Rails app had to be broken into different services, and entire teams were built out to focus on challenges such as pricing, application infrastructure, and search. Surabhi Gupta joined in 2013 to work on the search team, and has worked on different teams at Airbnb over time. Today she is a director of engineering leading the Homes business for Airbnb, which includes Growth, Search, Hosts, Pricing, and Business Travel. Surabhi has helped scale Airbnb through a hypergrowth period, and joins the show to share those experiences. One distinct area that we spent time on was Airbnb’s search engine. Surabhi formerly worked at Google, and she described how the engineering problem of a search engine for homes differs from a general purpose search engine like Google.
Ep 925: Monolith Migration with Jan Schiffman and Sherman Wood
We previously released this episode with the wrong audio file and are re-releasing it on a weekend. TIBCO was started in the 1990s with a popular message bus product that was widely used by finance companies, logistics providers, and other systems with high throughput. As TIBCO grew in popularity, the company expanded into other areas through products it developed in-house as well as through acquisitions. One acquisition was Jaspersoft, a business intelligence data platform. When TIBCO acquired Jaspersoft in 2014, the architecture was a monolithic Java application. Around this time, customer use cases were shifting from centralized reporting to real-time, embedded visualizations. The use case of the Jaspersoft software was becoming less centralized and less monolithic, and the software architecture needed to change in order to reflect that. Jan Schiffman is a VP of engineering at TIBCO and Sherman Wood is a director at TIBCO. They join the show to discuss the process of migrating a large Java monolith to a composable set of services. Breaking up a monolith is not an easy process–nor is it something that every company should do just because they have a monolith. In some cases, a monolith is just fine. Jan and Sherman explain the business case for refactoring the Jaspersoft monolith, and their approach to the refactoring. We also talk through the modern use cases of embedded analytics and the interaction between business analysts and data engineers. At a higher level, we discuss the lessons they have learned from managing a large, complex refactoring. Full disclosure: TIBCO is a sponsor of Software Engineering Daily.
Ep 923: Scalyr: Column-Oriented Log Management with Steve Newman
Log messages are fast, high volume, unstructured data. Logs are often the source of metrics, alerts, and dashboards, so these critical systems are downstream from a log management system. A log management system needs to be highly available, so that a failure in one part of your system will not be correlated with a failure of the log management system. Users of a log management system are often building tools based off of the query engine of that log management system. For example, I might build a dashboard with a line graph representing the number of times a certain memory-warning log message has occurred. I write a query to return the instances of these memory warnings, and my line graph is a visual representation of that query. A log management system needs to be able to quickly serve users that are querying their logs–whether for dashboards or for ad-hoc queries. When logs are ingested by a log management system, they get parsed in a way that brings some structure to the blob of text that is a raw log message. Some log management systems will then add the log message to an index. An index can allow for very fast lookups of particular types of queries. But an index also has certain constraints–such as difficulty serving regular expression queries. Steve Newman is the CEO and founder of Scalyr, a log management system that uses a column-oriented data storage system instead of the more conventional index-based approach. Today’s episode is a great case study in distributed systems tradeoffs. Steve talks in great detail about how Scalyr maintains high uptime, and its system for ingesting logs and serving queries.
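The contrast between index lookups and column scans can be sketched in a few lines. Everything below is an invented toy, not Scalyr’s actual design; it only shows why a brute-force scan over a dense column has no trouble with regular expression queries.

```python
import re

# Toy column-oriented log store: each parsed field is stored as its own
# contiguous column, and queries are answered by brute-force scanning a
# column. All field names and records here are invented for illustration.
logs = [
    {"ts": 1, "level": "WARN", "msg": "memory usage at 91%"},
    {"ts": 2, "level": "INFO", "msg": "request served in 12ms"},
    {"ts": 3, "level": "WARN", "msg": "memory usage at 97%"},
]

# Column layout: one list per field, aligned by row position.
columns = {field: [row[field] for row in logs] for field in logs[0]}

def scan(field, pattern):
    """Return the row positions whose field matches a regular expression.

    An inverted index struggles with arbitrary regexes, but a linear scan
    over a dense column handles them naturally.
    """
    rx = re.compile(pattern)
    return [i for i, value in enumerate(columns[field]) if rx.search(str(value))]

print(scan("msg", r"memory usage at 9\d%"))  # [0, 2]
```

A real system shards these columns across many machines and scans them in parallel, which is where the distributed systems tradeoffs discussed in the episode come in.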
Ep 921: Database Performance and Optimization with Andrew Davidson
When a database gets large, it can start to perform poorly, which can manifest as slow queries. You can speed up a query by defining an index, which is a data structure that allows for faster access to the data being indexed. As a consequence, whenever you update the database, you will also need to update the index with the new piece of data. The more you index your data, the faster your reads–but every additional index imposes a write penalty, since each index needs to be updated with every new entry in order to stay consistent. This illustrates one simple tradeoff that a developer can make within a database deployment. Why are there so many different databases in the world? Why do we need SQL databases like Postgres, document databases like MongoDB, key/value systems like Cassandra, and search systems like Elasticsearch? Because each of these systems optimizes for a different set of tradeoffs. Tradeoffs can affect the speed of a read, the speed of a write, the user experience, the consistency of data, and the cost of running the database. Andrew Davidson is the lead product manager of MongoDB Atlas. Andrew joins the show to talk about how database performance can degrade when a database gets large, and how to measure and optimize performance of a critical database. Andrew explores the range of distributed systems cases–from a single-node database to a multi-geographic distribution of nodes around the world–and describes how the configuration of a database in the cloud can help or hurt the application that the database is serving. Full disclosure: MongoDB is a sponsor of Software Engineering Daily.
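That tradeoff is easy to observe in any SQL database. The sketch below uses SQLite’s query planner to show the read side of the bargain; the table, column, and index names are made up for the example.

```python
import sqlite3

# Illustrative only: a throwaway SQLite table showing how an index changes
# the query plan. Table, column, and index names are invented.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_id INTEGER, payload TEXT)")
db.executemany("INSERT INTO events VALUES (?, ?)",
               [(i % 100, "data") for i in range(10_000)])

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output describes the access path.
    return db.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]

query = "SELECT * FROM events WHERE user_id = 7"
print(plan(query))  # a full table scan, e.g. "SCAN events"

# The index turns the read into a B-tree lookup, but from now on every
# INSERT and UPDATE must also maintain idx_user: the write penalty.
db.execute("CREATE INDEX idx_user ON events (user_id)")
print(plan(query))  # e.g. "SEARCH events USING INDEX idx_user (user_id=?)"
```

The write penalty is the flip side of the second plan: after `CREATE INDEX`, every insert pays for one extra B-tree update per index on the table.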
Ep 919: Cursor: Data Collaboration with Adam Weinstein
LinkedIn is an organization with thousands of employees. An enterprise of that size starts to develop problems with data collaboration. Data collaboration is the process of sharing and analyzing data with multiple users, such as data scientists, business analysts, and engineers. How do data scientists know what questions to ask? How do business analysts know the right way to query a database? How does a data engineer even find where the right database is within the company infrastructure? And how can these different users share information with each other so that redundant work is avoided? When Adam Weinstein was at LinkedIn, he saw these problems firsthand. The process of accessing and utilizing data felt slow and broken. Engineers were searching through a company wiki to find out how to leverage data, and the wiki was often out of date. When an engineer would leave the company, there was no durable, institutional memory of how that engineer worked with data. Adam used this experience as inspiration for Cursor, a tool for data collaboration. Cursor allows different users in the data pipeline to share data sets, queries, access patterns, and comments about data within a company. Cursor is used by LinkedIn, Slack, Apple, and other companies. Adam is the CEO of Cursor, and he joins the show for an interview about the problems and opportunities of data collaboration.
Ep 917: Kotlin Design with Andrey Breslav
Kotlin is a statically typed programming language that started as a JVM language. It gained popularity because it reduces the amount of boilerplate code required for a typical Java project. Many of the early adopters of Kotlin were building Android apps or Java applications, but it has grown to a variety of use cases including at companies like Uber, Pinterest, and Atlassian. Andrey Breslav is the lead language designer of Kotlin at JetBrains. He joins the show to describe the original goals of Kotlin, the compilation path of the language, and how it has moved beyond its days of only running on the JVM.
Ep 915: Continuous Integration in Open Source with Oren Novotny
Open source software is key to our software infrastructure. Closed source enterprises rely on open source software, but the development processes for closed source and open source software often differ in their approach to continuous integration and delivery. Oren Novotny is a chief architect of DevOps and modern software at BlueMetal Architects, where he works with a variety of clients to build products and internal applications. Oren spends lots of time developing open source software for his job as well as during his spare time. He has been in the software industry for more than 15 years, and has a wide breadth of insight into how different businesses apply software. We started the conversation talking about electronic trading companies, which in some ways operate like large enterprises and in other ways operate like startups. Oren described working in the financial industry through the 2008 crisis, then switching industries to work at Microsoft, before coming to BlueMetal Architects. We then discussed the process of setting up continuous integration for an open source project–including the difficulties, and the large benefits, of adding continuous integration to an open source project.
Ep 913: Prisma: GraphQL Infrastructure with Soren Bramer Schmidt
GraphQL allows developers to communicate with all of their different data backends through a consistent query interface. A GraphQL query can be translated into queries to MySQL, MongoDB, Elasticsearch, or whatever kind of API or backend is needed to fulfill the GraphQL query. GraphQL users need to set up a GraphQL server to perform this query federation. Prisma is a tool for automatically generating a GraphQL API and serving GraphQL queries. The developer defines a data model and deploys it with Prisma. Prisma generates the necessary GraphQL infrastructure to serve queries from the developer’s database. This can allow the developer to get up and running faster than they would setting up GraphQL infrastructure and defining the middleware query layer by hand. Prisma is an open source project, but it is also a company. The opportunities to build a business around a GraphQL infrastructure layer are numerous. In recent episodes, we have explored the complexities of the “data platform.” From newer companies like Uber to older companies like Procter and Gamble, engineers are struggling to find and access their data sources. Data engineers and data scientists spend months configuring their infrastructure to connect to BI tools and run distributed queries. GraphQL could simplify data platforms by providing a unified, standardized layer. At this layer, you could also offer caching, virtual data sets, and crowdsourced queries from across the company. Soren Bramer Schmidt is the CTO and co-founder of Prisma, and he joins the show to discuss why GraphQL has become so popular, how Prisma works, and the opportunities to build developer tooling around GraphQL.
Ep 911: Android Things with Wayne Piekarski
The Internet of Things describes a world in which the devices we interact with regularly are connected to the Internet and networked together. Technologists have been dreaming of this world for many years–one where our connected refrigerator can detect that we are out of food and automatically order more, or our connected bathroom can scan us for diseases and recommend treatment. The bright future of IoT is slowly coming together. Hardware prototyping is getting cheaper. Voice interfaces and machine learning are creating new mediums for communicating with devices. Platforms like Kickstarter are allowing developers to validate the market for their products and raise the necessary capital to build them. Android Things is a developer platform for IoT applications based on the Android operating system. Android Things consists of hardware devices and software tools that reduce common IoT problems such as software updates and security patches. Wayne Piekarski is a staff developer advocate at Google, and he joins the show to talk about the state of IoT and why Google built Android Things.
Ep 909: JavaScript Engines with Mathias Bynens
JavaScript performance has improved over time due to advances in JavaScript engines such as Google’s V8. A JavaScript engine performs compiler optimization, garbage collection, hot code management, caching, and other runtime aspects that keep a JavaScript program running efficiently. JavaScript runs in browsers and servers, and the resources available to a JavaScript engine vary widely across different machines. JavaScript code is parsed into an abstract syntax tree before being handed off to the compiler toolchain, in which one or more optimizing compilers produce efficient low-level code. In recent shows about WebAssembly, we have covered compiler pipelines. In an episode about GraalVM, we explored the impact that “code shape” has on the efficiency of JavaScript execution. Mathias Bynens is a developer advocate at Google working on the V8 JavaScript engine team. In today’s show we explore how a JavaScript engine works, and how compiler toolchains can adapt hot code paths depending on what code needs to be optimized.
Ep 907: Unity and WebAssembly with Brett Bibby
Unity is a game engine for building 2D and 3D experiences, augmented reality, movies, and other applications. Unity is cross-platform, so applications can be written once and deployed to iOS, Android, web, and other surfaces. Unity has been around for 13 years, and has grown in popularity with the rise in gaming and game development. Brett Bibby is VP of engineering at Unity, and he joins the show to describe how Unity applications are built. Because Unity SDKs allow Unity code to run across all of these platforms, Unity must write and maintain native code libraries for each device. When asm.js came out, Unity developers were able to deploy 3D games to the web–these were some of the first examples of asm.js being used. Asm.js is a small, performant subset of JavaScript that other languages can compile down into. So in this case, Unity programs in C# were running in the browser after being compiled down into asm.js. Since then, WebAssembly has improved the tooling further, allowing a high-performance compilation path for non-JavaScript programs. After exploring the basics of Unity, Brett described how Unity works with WebAssembly, and the potential for creative applications of Unity both on and off the web.
Ep 905: Front Engineering with Laurent Perrin
Front is a shared inbox application that has seen rapid adoption within companies. Front allows multiple members of a company to collaborate on a conversation–whether that conversation is in email, Twitter, or Facebook Messenger. This is useful when a customer email needs to be shared between the sales and engineering teams, or when a single email address is shared between different members of the same team, such as “[email protected]”. This might sound like a niche problem, but it is actually a problem faced somewhere within every single company. Because the problem of the shared inbox is so prevalent, the company has grown its user base quickly, scaling the team as well as the infrastructure. The sensitivity of the data (emails) that Front is handling means that security is paramount. And as users of Front rely on it more and more as a central point of communication, uptime and consistency need to be maintained. Laurent Perrin is the CTO at Front, and he joins the show to describe the software architecture and product strategy for Front. It was a fascinating show, and we covered the full stack. On the backend, Front pulls emails into S3 buckets and maintains the schema of the inbox in a SQL database. The desktop Front client is written in Electron, which is a way to write desktop applications in HTML5, JavaScript, and CSS. We also talked about the system for keeping communications “real-time”–it’s important that users are aware of what the others are doing, since you don’t want to be preparing a response to an email at the same time I am.
Ep 904: Checkr: Background Check Platform with Tomas Barreto
Background checks are a routine part of the hiring process. After a potential employee has made it through job interviews, a background check is administered to look through the applicant’s work history, criminal record, and other available data. Conducting a conventional background check can require manual work–including phone calls for reference checks and going to a courthouse to look up physical records of a person’s criminal history. The on-demand economy has rapidly increased the volume of workers who are getting hired–and all of them need background checks. Lyft drivers, DoorDash food delivery people, Instacart shoppers–these on-demand workers are being trusted with our lives. We get into their cars, let them into our houses, and eat the food that they hand to us. We want some guarantees about their reputation. Checkr is a background check platform that allows companies to request background check services via API request. Checkr was started 4 years ago, and has benefitted from the growth of gig economy services like ridesharing and food delivery. Since the background check API product has found success, Checkr has raised additional capital and invested in other new products: a next-generation background check product based on machine learning, and a mobile app that allows people to instantly background check themselves and find jobs that align with the results of that background screening. Tomas Barreto is the VP of product and engineering at Checkr and he joins the show to describe how the core Checkr API product works, and the challenges of automating the background check process. We also explored the product development roadmap for Checkr, and the product opportunities that come from building within a specialized vertical such as background checks.
Ep 903: Android on Chrome with Shahid Hussain and Stefan Kuhne
Google has two consumer operating systems: Android and Chrome. The Android operating system has been widely deployed on mobile devices. Chrome is an operating system for laptops and tablets, originally based around the Chrome browser. For several years, these two ecosystems were mostly separate–you could not run Android apps on a Chrome operating system. Shahid Hussain and Stefan Kuhne are engineers at Google who worked on support for Android apps on ChromeOS. The implementation of Android on Chrome involves running the Android OS in a Linux container on the host Chrome operating system. In today’s episode, Shahid and Stefan compare the Android and Chrome operating system platforms. They explain why Google has two different consumer operating systems, and the advantages of allowing Android apps to deploy to Chrome. Shahid and Stefan also talk about the challenges of porting mobile applications to ChromeOS. Android apps are made to run on small screens and tablets. In order to make them run on ChromeOS, the applications need to support running on a desktop or laptop.
Ep 902: Kubernetes Distributions with Brian Gracely and Michael Hausenblas
Kubernetes is an open source container management system. Kubernetes is sometimes described as “the Linux of distributed systems,” and this description makes sense: the large number of users and contributors in the Kubernetes community is comparable to the volume of Linux adopters in its early days. There are many different distributions of Linux: Ubuntu, Red Hat, Chromium OS. These different operating system distributions were created to fulfill different needs. Linux is used for Raspberry Pis, Android phones, and enterprise workstations, and these different use cases require different configurations of an operating system. Similarly, there are different distributions of Kubernetes because there are different types of distributed systems. The internal infrastructure of a cloud provider might use one type of Kubernetes to serve users running application containers. A network of smart security cameras might be networked together with a different distribution of Kubernetes. Brian Gracely and Michael Hausenblas join the show today to discuss Kubernetes distributions. Brian and Michael work at Red Hat, which helps maintain the Origin Community Distribution of Kubernetes, which Red Hat OpenShift runs on. OpenShift is a platform as a service that enterprises use to deploy and manage their applications. Full disclosure: Red Hat is a sponsor of Software Engineering Daily.
Ep 901: Continuous Delivery Pipelines with Abel Wang
Continuous integration and delivery allows teams to move faster by allowing developers to ship code independently of each other. A multi-stage CD pipeline might consist of development, staging, testing, and production. At each of these stages, a new piece of code undergoes additional tests, so that when the code finally makes it to production, the developers can be certain it won’t break the rest of the project. In a company, the different engineers working on a software project are given the permissions to ship code through a continuous delivery pipeline. Employees at a company have a strong incentive not to push buggy code to production. But what about open source contributors? What does the ideal continuous delivery workflow look like for an open source project? Abel Wang works on Azure Pipelines, a continuous integration and delivery tool from Microsoft. Azure Pipelines is designed to work with open source projects as well as companies. Abel joins the show to talk about using continuous integration and delivery within open source, and the process of designing a CI/CD tool that can work in any language and environment. Full disclosure: Microsoft is a sponsor of SE Daily.
Ep 900: DEV Community with Ben Halpern
The DEV Community is a platform where developers share ideas, programming advice, and tools. Ben Halpern started it after running an extremely successful Twitter account of humorous tweets for developers. One way to describe DEV Community is as a cross between Medium, Stack Overflow, and Reddit–but it has its own personality, so I recommend checking it out. The DEV Community was open sourced, and we discussed the challenges and the opportunities of having an open source social network. We also talked about his plans for the future, and where he is taking DEV Community. Ben is an entrepreneur who tries lots of different creative projects, so his perspective has always resonated with me. Ben has been on the show a few times before, when we talked about the state of developer media, side projects, and the identity of the software engineer. DEV Community was originally called Practical Dev, a name I mistakenly use in the earlier parts of the show.
Ep 898: Druid Analytical Database with Fangjin Yang
Modern applications produce large numbers of events. These events can be users clicking, IoT sensors accumulating data, or log messages. The cost of cloud storage and compute continues to drop, so engineers can afford to build applications around these high volumes of events, and a variety of tools have been developed to process them. Apache Kafka is widely used to store and queue these streams of data, and Apache Spark and Apache Flink are stream processing systems that are used to perform general purpose computations across this event stream data. Kafka, Spark, and Flink are great general purpose tools, but there is also room for a narrower set of distributed systems tools to support high volume event data. Apache Druid is an open source database built for high-performance, read-only analytic workloads. Druid has a useful combination of features for event data workloads, including a column-oriented storage system, automatic search indexing, and a horizontally scalable architecture. Druid’s feature set allows for new types of analytics applications to be built on top of it, including search applications, dashboards, and ad-hoc analytics. Fangjin Yang is a core contributor to Druid and the CEO of Imply.io, a company that makes a storage, querying, and visualization tool built on top of Druid. He joins the show to talk about the architecture of Druid and his company Imply.
Ep 897: Orchestrating Kubernetes with Chris Gaun
A company runs a variety of distributed systems applications such as Hadoop for batch processing jobs, Spark for data science, and Kubernetes for container management. These distributed systems tools can run on-prem, in a cloud provider, or in a hybrid system that uses on-prem and cloud infrastructure. Some enterprises use VMs, some use bare metal, some use both. Mesosphere is a company that was started to abstract the complexity of resource management away from the application developer. Instead of a developer managing virtual machines, provisioning cloud infrastructure, or wiring all that infrastructure together to run distributed applications, the developer spins up distributed applications like Kubernetes, Spark, or Jenkins on top of Mesosphere, and Mesosphere provisions the machines on the underlying infrastructure. Using Kubernetes on top of Mesos allows you to separate resource provisioning from the actual container orchestration. In a previous episode, we explored how Netflix uses Mesos with a container orchestrator on top to simplify the resource management of microservice application containers as well as data science workloads. Chris Gaun is a product manager at Mesosphere who helped build Kubernetes-as-a-service. In today’s show, he describes why it is useful to have separate layers for resource provisioning and container orchestration. He also talks about the difficulties of manually installing Kubernetes, and why Mesosphere built a Kubernetes-as-a-service product. Full disclosure: Mesosphere is a sponsor of Software Engineering Daily.
Ep 896: Netflix Observability with Kevin Lew
Netflix users stream terabytes of data from the cloud to their devices every day. During a high bandwidth, long-lived connection, a lot can go wrong. Networks can drop packets, machines can run out of memory, and the Netflix app on a user’s device can have a bug. All of these events can result in a bad user experience. Other errors can occur that do not disrupt the user experience. Netflix runs thousands of machine learning jobs, logging servers, and other pieces of internal infrastructure. Customer service dashboards, CI/CD pipelines, and A/B testing frameworks are all software built by Netflix–and when an error occurs in any of these places, engineers need to be able to diagnose and debug it. Observability is the practice of using logs, monitoring, metrics, and distributed tracing to understand how a system is working. Kevin Lew is a senior software engineer on the Edge Insights team at Netflix. He joins the show to talk about adding observability across the microservices deployed at Netflix. We also talk about how to manage high volumes of logging data effectively using stream processing.
Ep 895: Real Estate Machine Learning with Or Hiltch
Stock traders have access to high volumes of information to help them decide whether to buy an asset. A trader who is considering buying a share of Google stock can find charts, reports, and statistical tools to help with their decision. There are a variety of machine learning products to help a technical investor create models of how a stock price might change in the future. Real estate investors do not have access to the same data and tooling. Most people who invest in apartment buildings are using a combination of experience, news, and basic reports. Real estate data is very different from stock data. Real estate assets are not fungible–each one is arguably unique from all others, whereas one share of Google stock is the same as another share. But there are commonalities between real estate assets. Just as collaborative filtering can find a new movie that is similar to the ones you have watched on Netflix, comparable analysis can find an apartment building that is very similar to another apartment building which recently appreciated in asset value. Skyline.ai is a company that is building tools and machine learning models for real estate investors. Or Hiltch is the CTO at Skyline.ai and he joins the show to explain how to apply machine learning to real estate investing. He also describes the mostly serverless architecture of the company. This is one of the first companies we have talked to that relies so heavily on managed services and functions-as-a-service.
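The "comparables as nearest neighbors" idea can be sketched with a few invented feature vectors. Everything below (the buildings, the features, the similarity measure) is a hypothetical illustration, not Skyline.ai’s actual model.

```python
import math

# Hypothetical "comparables" sketch: treat each building as a feature
# vector and rank candidates by cosine similarity to the target.
# All buildings, features, and numbers below are invented.
buildings = {
    "A": [120, 1985, 0.93],  # units, year built, occupancy rate
    "B": [118, 1987, 0.95],
    "C": [14, 2019, 0.80],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: math.sqrt(sum(x * x for x in w))
    return dot / (norm(u) * norm(v))

def comparables(target, pool):
    """Rank every other building by similarity to the target."""
    return sorted((name for name in pool if name != target),
                  key=lambda n: cosine(pool[target], pool[n]),
                  reverse=True)

print(comparables("A", buildings))  # ['B', 'C']: B is the closest comp
```

A production model would normalize each feature so that the year-built column does not dominate the distance, and would use far richer signals; this only shows the nearest-neighbor shape of comparable analysis.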
Ep 894Kubernetes Continuous Deployment with Sheroy Marker
Engineering organizations can operate more efficiently by working with a continuous integration and continuous deployment workflow. Continuous integration is the process of automatically building and deploying code that gets pushed to a remote repository. Continuous deployment is the process of moving that code through a pipeline of environments, from dev to test to production. At each stage, the engineers feel increasingly safe that the code will not break the user experience. When a company adopts Kubernetes, the workflow for deploying software within that company might need to be refactored. If the company starts to deploy containers in production and to manage those containers using Kubernetes, the company will also want to have a testing pipeline that emulates the production environment using containers and Kubernetes. Sheroy Marker is the head of technology at ThoughtWorks products, where he works on GoCD, a continuous delivery tool. Sheroy joins the show to talk about how Kubernetes affects continuous delivery workflows, and the process of building out Kubernetes integrations for GoCD. We also discussed the landscape of continuous delivery tools–why there are so many continuous delivery tools, and the question of how to choose a continuous delivery product if you are implementing CD. Continuous delivery tooling is in some ways like the space of monitoring, logging, and analytics–there are lots of successful products in the market. Full disclosure: ThoughtWorks and GoCD are sponsors of Software Engineering Daily.
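The dev-to-test-to-production progression described above can be sketched as a pipeline that promotes an artifact through gated stages, halting at the first failed gate. The stage names and checks are illustrative, not GoCD's actual pipeline model.

```python
def promote(artifact, stages):
    """Walk an artifact through the pipeline, stopping at the first
    failing gate. Returns (stages passed, stage where promotion halted
    or None if it reached the end)."""
    passed = []
    for name, gate in stages:
        if not gate(artifact):
            return passed, name  # promotion halted at this stage
        passed.append(name)
    return passed, None

# Hypothetical gates: each stage demands stronger evidence of safety.
stages = [
    ("dev", lambda a: a["unit_tests"]),
    ("test", lambda a: a["integration_tests"]),
    ("prod", lambda a: a["approved"]),
]
build = {"unit_tests": True, "integration_tests": True, "approved": False}
print(promote(build, stages))  # (['dev', 'test'], 'prod')
```

The "increasing safety" property falls out of ordering: an artifact only reaches a later environment after every earlier gate has passed.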
Ep 892Go To Market with Mitch Ferguson
Engineers need to have an awareness of the business model that allows their company to succeed. When a software company is going to market, the engineers need to work closely with the sales and marketing team to formulate a strategy for building and selling that product. This is especially true in highly technical products, such as database- or platform-as-a-service companies. An engineer at a Hadoop-as-a-service product needs to work with the sales and marketing team to explain why a customer might want a data platform. An engineer at a SaaS company needs to understand how the cost to provide a service might scale, so that the sales team can decide on appropriate pricing. Mitch Ferguson has been developing businesses at software companies since the 90s. He helped build out SpringSource and arrange the acquisition of SpringSource by VMWare–an acquisition that later enabled the creation of Pivotal Software. He then joined Hortonworks as an early member of the team bringing their Hadoop platform to market. Today Mitch works as a co-founder of Accel G2M, an organization that helps bring technology companies to market–building out their sales, marketing, product, and organizational strategies.
Ep 891Music Engineering with Dom Kane
For most of history, a typical musician would learn to play one specific instrument. As synthesizers became available to the public, it became commonplace for a musician to create their own instruments using hardware and software. By the early 2000s, digital audio workstation software allowed a musician with a laptop to have access to the tools of a record producer. These tools changed how music is made, and gave rise to new genres. Creating electronic music on the computer is a practice much like software engineering. Iteration, modularity, and software architecture skills are required to build a song intelligently. Music engineering also requires working at numerous levels of abstraction: the synthesizer level, the song arrangement level, the mixer level, and the design of melodies. Dom Kane is a musician and sound engineer who writes music for mau5trap, a label started by deadmau5. He has built software synthesizers, worked with numerous artists as a producer, and written music for film and TV. He joins the show to talk about working as a professional electronic musician. We also talk about the overlap between engineering and the different facets of crafting modern music on the computer.
Ep 889Faust: Streaming at Robinhood with Ask Solem
Robinhood is a platform for buying and selling stocks, cryptocurrencies, and other assets. Since its founding in 2013, Robinhood has grown to have more than 3 million user accounts, which is approximately the same number as the popular online broker E-Trade. With the surge in user growth and transaction volume, the demands on the software infrastructure have increased significantly. When a user buys a stock on Robinhood, that transaction gets written to Kafka and Postgres. Multiple services get notified of the new entry on the Kafka topic, and those services process that new event using Kafka Streams. Kafka Streams is a way of reading streams of data out of Kafka with exactly-once semantics. Developers at Robinhood use a variety of languages to build services on top of these Kafka streams–including Python. Commonly used systems for building stream processing tasks on top of a Kafka topic include Apache Flink and Apache Spark. Spark and Flink let you work with large data sets while maintaining high speed and fault-tolerance. These tools are written in Java. If you want to write a Python program that interfaces with Apache Spark, you have to pay an expensive serialization/deserialization cost as you move data between Python and Spark. Ask Solem is an engineer with Robinhood, and the author of Faust, a stream processing library that ports the ideas of Kafka Streams to Python. Faust provides stream processing and event processing in a manner that is similar to Kafka Streams, Apache Spark, and Apache Flink. He is also the author of the popular Celery asynchronous task queue. Ask joins the show to provide his perspective on large scale, distributed stream processing, and why he created Faust.
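The core idea of a stream agent–consume events from a topic, update an aggregate, emit results downstream–can be shown with a toy, in-memory sketch. Real Faust agents are async functions attached to a Kafka topic, with the broker, partitioning, and exactly-once bookkeeping handled by the framework; all of that is elided here, and the event fields are invented.

```python
from collections import defaultdict

def trade_volume_agent(events):
    """Toy stand-in for a stream agent: fold each buy event into a
    running per-symbol share count (the agent's "table"), yielding the
    updated aggregate after every event."""
    volume = defaultdict(int)
    for event in events:            # each event is one buy order from the topic
        volume[event["symbol"]] += event["shares"]
        yield dict(volume)          # emit a snapshot downstream

# A simulated stream of buy orders.
stream = [
    {"symbol": "AAPL", "shares": 5},
    {"symbol": "TSLA", "shares": 2},
    {"symbol": "AAPL", "shares": 3},
]
final_state = list(trade_volume_agent(stream))[-1]
print(final_state)  # {'AAPL': 8, 'TSLA': 2}
```

Because the whole loop runs in Python, there is no serialization boundary to cross–which is the cost Faust avoids relative to driving a JVM-based engine from Python.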
Ep 888Monolith Migration with Jan Schiffman and Sherman Wood
TIBCO was started in the 90s with a popular message bus product that was widely used by finance companies, logistics providers, and other systems with high throughput. As TIBCO grew in popularity, the company expanded into other areas through products it developed in-house as well as through acquisitions. One acquisition was Jaspersoft, a business intelligence data platform. When TIBCO acquired Jaspersoft in 2014, the architecture was a monolithic Java application. Around this time, customer use cases were shifting from centralized reporting to real-time, embedded visualizations. The use case of the Jaspersoft software was becoming less centralized and less monolithic and the software architecture needed to change in order to reflect that. Jan Schiffman is a VP of engineering at TIBCO and Sherman Wood is a director at TIBCO. They join the show to discuss the process of migrating a large Java monolith to a composable set of services. Breaking up a monolith is not an easy process–nor is it something that every company should do just because they have a monolith. In some cases, a monolith is just fine. Jan and Sherman explain the business case for why the Jaspersoft monolith needed to be refactored, and their approach to the refactoring. We also talk through the modern use cases of embedded analytics and the interaction between business analysts and data engineers. At a higher level, we discuss the lessons they have learned from managing a large, complex refactoring. Full disclosure: TIBCO is a sponsor of Software Engineering Daily.
Ep 887RideOS: Fleet Management with Rohan Paranjpe
Self-driving transportation will be widely deployed at some point in the future. How far off is that future? There are widely varying estimations: maybe you will summon a self-driving Uber in New York within 5 years, or maybe it will take 20 years to work out all of the legal and engineering challenges. Between now and the self-driving future, there will be a long span of time where cars are semi-autonomous. Maybe your car is allowed to drive itself in certain areas of the city. Maybe your car can theoretically drive itself in 99% of conditions, but the law requires you to be behind the wheel until the algorithms get just a little bit better. While we wait for self-driving to be widely deployed to consumers, a lot could change in the market. We know about Uber, Lyft, Waymo, Tesla and Cruise. But what about the classic car companies like Ford, Mercedes Benz, and Volkswagen? These companies are great at making cars, and they have hired teams of engineers working on self-driving. But self-driving functionality is not the only piece of software you need to compete as a transportation company. You also need to build a marketplace for your autonomous vehicles, because in the future, far fewer people will want to own a car. Customers will want to use transportation as a service. RideOS is a company that is building fleet management and navigation software. If you run a company that is building autonomous cars, you need to solve the problem of making an autonomous, safe robot that can drive you around. Building an autonomous car is hard, but to go to market as a next-generation transportation company, you also need fleet management software, so you can deploy your cars in an Uber-like transportation system. And you need navigation software so that your cars know how to drive around. RideOS lets a car company like Ford focus on building cars by providing a set of SDKs and cloud services for managing and navigating fleets of cars.
Rohan Paranjpe joins today’s show to talk about the world of self-driving cars. Rohan worked at Tesla and Uber before joining RideOS, so he has a well-informed perspective on a few directions the self-driving car market might go in.
Ep 885Kubernetes Continuous Deployment with Sheroy Marker
Engineering organizations can operate more efficiently by working with a continuous integration and continuous deployment workflow. Continuous integration is the process of automatically building and deploying code that gets pushed to a remote repository. Continuous deployment is the process of moving that code through a pipeline of environments, from dev to test to production. At each stage, the engineers feel increasingly safe that the code will not break the user experience. When a company adopts Kubernetes, the workflow for deploying software within that company might need to be refactored. If the company starts to deploy containers in production and to manage those containers using Kubernetes, the company will also want to have a testing pipeline that emulates the production environment using containers and Kubernetes. Sheroy Marker is the head of technology at ThoughtWorks products, where he works on GoCD, a continuous delivery tool. Sheroy joins the show to talk about how Kubernetes affects continuous delivery workflows, and the process of building out Kubernetes integrations for GoCD. We also discussed the landscape of continuous delivery tools–why there are so many continuous delivery tools, and the question of how to choose a continuous delivery product if you are implementing CD. Continuous delivery tooling is in some ways like the space of monitoring, logging, and analytics–there are lots of successful products in the market. Full disclosure: ThoughtWorks and GoCD are sponsors of Software Engineering Daily.
Ep 884DataOps with Christopher Bergh
Every company with a large set of customers has a large set of data–whether that company is 5 years old or 50 years old. That data is valuable whether you are an insurance company, a soft drink manufacturer, or a ridesharing company. All of these large companies know that their data is valuable, but some of them are not sure how to standardize the access patterns of that data, or build a culture around data. The larger the company is, the more the data is spread throughout the company, and the more heterogeneous the data sources are. Older companies often have older pieces of data infrastructure, and it might not be well documented. It is hard to make data driven decisions when an organization cannot effectively query its own data. For example, consider a simple question about marketing. An insurance company wants to know how their spending on TV advertising correlates with sales in California over the last 25 years. The VP of marketing sends an email to a business analyst, asking for a historical report of this marketing data. The business analyst knows how to present the data with a business intelligence tool, but the analyst needs to ask the data scientist for how to make that query. The data scientist needs to ask the data engineer where to find those records in a large Hadoop distributed file system cluster. And the data engineer joined the company last week and has no idea where anything is. These are the problems of DataOps. Similarly to DevOps, DataOps is the recognition that a set of problems have crept into organizations over time and slowed down productivity. The story of the DevOps movement is that old infrastructure, lack of testing, and complicated monolithic backends slowed down everyone in an old, big enterprise. The slow pace of change destroys morale and erodes trust. The DevOps movement is about revamping organizations through tooling and organizational behavior.
We have covered this in lots of episodes, such as in a great episode with Gene Kim who wrote “The Phoenix Project.” When an organization wants to reinvent itself with DevOps, it often begins with testing and continuous delivery. DataOps encourages data driven organizations to begin with a similar practice of testing their data pipelines to build trust and evolve best practices. There are other similarities between DataOps and DevOps, such as continuous delivery and the breaking down of siloes between different organizational roles. Chris Bergh joins the show to talk about the data problems encountered by large companies, the practices of DataOps, and his company Data Kitchen, which builds tools to help companies move towards more productive data practices.
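Testing a data pipeline, in the DataOps sense, often starts as simply as asserting invariants on the records flowing through it so bad data fails fast instead of silently corrupting a downstream report. The field names and rules below are hypothetical, meant only to show the shape of such a check.

```python
def validate_marketing_records(records):
    """Toy pipeline test: collect every violation of our (invented)
    data contract rather than stopping at the first one, so the report
    names everything that is wrong with a batch."""
    errors = []
    for i, row in enumerate(records):
        if row.get("spend_usd") is None or row["spend_usd"] < 0:
            errors.append(f"row {i}: bad spend_usd {row.get('spend_usd')}")
        if not row.get("region"):
            errors.append(f"row {i}: missing region")
    return errors

good = {"region": "CA", "spend_usd": 1200.0}
bad = {"region": "", "spend_usd": -5}
print(validate_marketing_records([good, bad]))
```

Running checks like this at every pipeline stage is the DataOps analogue of unit tests in DevOps: it builds trust in the data the same way CI builds trust in the code.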
Ep 883Android Slices with Jason Monk
The main user interfaces today are the smartphone, the laptop, and the desktop computer. Some people today interact with voice interfaces, augmented reality, virtual reality, and automotive computer screens like the Tesla. In the future, these other interfaces will become more common. Developers will want to be able to expose their applications to these new interfaces. For example, let’s say I am a developer who builds a podcast playing app. I have a website and a mobile app, but what if I want to expose that app to a voice interface? Or, what if I want to expose a specific piece of functionality from that app, to make shortcuts easier? Android Slices are user interface components that expose pieces of application functionality to Google Search, Google Assistant, and other applications. Jason Monk is a software engineer who works on Android Slices at Google. Jason joins the show to discuss how mobile user interfaces are changing, the motivation behind Android Slices, and the engineering behind this newer building block for Android developers.
Ep 882Helm with Michelle Noorali
Back in 2014, platform-as-a-service was becoming an increasingly popular idea. The idea of PaaS was to sit on top of infrastructure-as-a-service providers like Azure, AWS, or Google Cloud, and simplify some of the complexity of these infrastructure providers. Heroku had built a successful business from the idea of platform-as-a-service, and there was a widely held desire in the developer community to have an “open source Heroku.” One project that was working towards the idea of an open source platform-as-a-service was Deis. Deis made it easier for people to deploy and manage their applications, and it simplified some of the hard parts of container management. When Kubernetes came out, Deis got refactored to use Kubernetes under the hood for container orchestration. Deis was one of the first projects to use Kubernetes as a tool to build a platform-as-a-service, and the team that was working on Deis got very early exposure to the process of building a platform on top of Kubernetes. Michelle Noorali was one of the engineers on the Deis team. When Deis got acquired by Microsoft, Michelle was working on Helm, a package manager for distributed systems. Helm allows developers to deploy distributed applications on top of Kubernetes more easily. A few examples of distributed applications that can be deployed using Helm are Kafka, Prometheus, and IPFS. One reason Helm is so useful is that distributed systems are notoriously hard to configure and run. Since joining Microsoft, Michelle has continued to work on Helm. She is also a member of the Kubernetes Steering Committee and the board of the CNCF. Michelle joins the show to talk about her early experiences building PaaS and her perspective on the Kubernetes ecosystem. Full disclosure: Microsoft is a sponsor of Software Engineering Daily.
Ep 881StitchFix Engineering with Cathy Polinsky
Stitch Fix is a company that recommends packages of clothing based on a set of preferences that the user defines and updates over time. Stitch Fix’s software platform includes the website, data engineering infrastructure, and warehouse software. Stitch Fix has over 5000 employees, including a large team of engineers. Cathy Polinsky is the CTO of Stitch Fix. In today’s show Cathy describes how the infrastructure has changed as the company has grown–including the process of moving the platform from Heroku to AWS, and the experience of scaling and refactoring a large monolithic database. Cathy also talked about the management structure, the hiring process, and engineering compensation at Stitch Fix.
Ep 880OLIO: Food Sharing with Lloyd Watkin
Food gets thrown away from restaurants, homes, catering companies, and any other place with a kitchen. Most of this food gets thrown away when it is still edible, and could provide nutrition to someone who is hungry. Just like Airbnb makes use of excess living capacity, OLIO was started to connect excess food with people who want to eat that food. There are numerous challenges with this idea. How do you control quality and ensure the food is safe? How do you make money as a business? How do you solve the chicken and egg problem, and make sure that you get hungry users and people with food to give away at the same time? Lloyd Watkin is a software engineer at OLIO, and he joins today’s episode to describe how the platform works, how it is built, and how the company plans to scale their large base of volunteers. It’s a fascinating set of operational and engineering issues.
Ep 879Build Faster with Nader Dabit
Building software today is much faster than it was just a few years ago. The tools are higher level, and abstract away tasks that would have required months of development. Much of a developer’s time used to be spent optimizing databases, load balancers, and queueing systems in order to be able to handle the load created by thousands of users. Today, scalability is built into much of our infrastructure by default. We have had several years of infrastructure with automatic scalability, and some of the more recent advances in developer tooling are about convenience, and faster development time. Developers are spending less time dealing with the ambiguous idea of a “server” and more time interacting with well-defined APIs and data sources. A few examples are AppSync from Amazon Web Services and Firebase from Google. These tools are like databases with rich interactive functionality. Instead of having to create a server to listen to a database for changes and push notifications to users in response to those changes, AppSync and Firebase can be programmed to have this kind of functionality built in. There are many other examples of high level APIs, rich backends, and developer productivity tools that lead to shorter development time. What does this mean for developers? It means we can build much faster. We can prototype quickly for low amounts of money–without sacrificing quality. We can spend more time focusing on design, user experience, and business models and less time focusing on keeping the application up and running. Nader Dabit is a developer advocate at Amazon Web Services, and he returns to the show to discuss modern tooling, and how that tooling changes the potential for high output and fast iteration among developers. It is a strategic, philosophical discussion of how to build modern software.
Ep 878WebAssembly Engineering with Ben Smith and Thomas Nattestad
WebAssembly allows developers to run any language in a sandboxed, memory controlled module that can be called via well-defined semantics. As we have discussed in recent episodes with Lin Clark and Steve Klabnik from Mozilla, WebAssembly is changing application architectures both in and outside the browser. WebAssembly is being adopted by all of the major browser vendors, including Google. Today’s guests are Thomas Nattestad and Ben Smith from Google. Thomas is the PM for V8, WebAssembly, Storage, and Games on the web and Ben is a software engineer on the Chrome team. Ben and Thomas talk about the state of WebAssembly, what the different browser manufacturers are doing, and some cool uses for WebAssembly–from games to CDNs to cryptocurrency infrastructure. As in the previous episodes, there was discussion of WebAssembly’s security and memory benefits from being a bounded context.
Ep 877WebAssembly Future with Steve Klabnik
WebAssembly is a low-level compilation target for any programming language that can be interpreted into WebAssembly. Alternatively, WebAssembly is a way to run languages other than JavaScript in the browser. Or, yet another way of describing WebAssembly is a virtual machine for executing code in a low level, well-defined sandbox. WebAssembly is reshaping what is possible to do in the web browser. A developer can write a program in Rust or C++, compile it down into a WebAssembly module, and call out to that module via JavaScript. This is very useful for running memory-sensitive workloads in the browser—such as 3-D games. But WebAssembly is also useful for running modules outside of the browser. Why is that important? If you can already run C++ or Rust code outside of the browser by executing the program from the command line, why would you want to put the code into a WebAssembly module before executing it? One answer is security. WebAssembly modules have well-defined semantics for what memory they can access. WebAssembly could provide more reliable sandboxing for untrusted code. Steve Klabnik is a software engineer at Mozilla, and he joins the show today to play the role of a WebAssembly futurist. We revisit the basics of WebAssembly and the current state of the technology. Steve also describes the lessons of past web technologies such as Flash—and what they did right and wrong. We also explore the current and future applications of WebAssembly, which we will talk about in more detail in tomorrow’s episode.
Ep 876DoorDash Engineering with Raghav Ramesh
DoorDash is a logistics company that connects customers, restaurants, and drivers that can move food to its destination. When a customer orders from a restaurant, DoorDash needs to identify the ideal driver for picking up the order from the restaurant and dropping it off with the customer. This process of matching an order to a driver takes in many different factors. Let’s say I order spaghetti from an Italian restaurant. How long does the spaghetti take to prepare? How much traffic is there in different areas of the city? Who are the different drivers who could potentially pick the spaghetti up? Are there other orders near the Italian restaurant, that we could co-schedule the spaghetti delivery with? In order to perform this matching of drivers and orders, DoorDash builds machine learning models that take into account historical data. In today’s episode, Raghav Ramesh explains how DoorDash’s data platform works, and how that data is used to build machine learning models. We also explore the machine learning model release process—which involves backtesting, shadowing, and gradual rollout.
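The matching problem described here can be reduced to a toy cost function: the ideal driver arrives at the restaurant just as the food is ready, so neither the driver nor the spaghetti waits. Everything below–the field names, the weights, the fixed prep time–is a hypothetical illustration; the real models learn these quantities from historical data.

```python
def dispatch_score(driver, prep_minutes):
    """Lower is better: penalize drive time, and penalize mismatch
    between prep time and arrival time (food cooling, or driver idling)
    twice as heavily, since freshness matters more than mileage."""
    travel = driver["eta_to_restaurant_min"]
    wait = abs(prep_minutes - travel)
    return travel + 2 * wait

# Hypothetical candidate drivers for one spaghetti order (prep: 10 min).
drivers = [
    {"id": "d1", "eta_to_restaurant_min": 4},   # arrives early, food waits
    {"id": "d2", "eta_to_restaurant_min": 12},  # food sits for 2 minutes
    {"id": "d3", "eta_to_restaurant_min": 9},   # nearly perfect timing
]
best = min(drivers, key=lambda d: dispatch_score(d, prep_minutes=10))
print(best["id"])  # d3
```

A production dispatcher would also batch nearby orders and predict prep time and traffic with learned models, but the structure–score every feasible assignment, pick the cheapest–is the same.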
Ep 875Casa: Crypto Wallet Security with Jameson Lopp
Cryptocurrency security is a concern to anyone who has a significant amount of money in the form of Bitcoin, Ethereum, or other crypto assets. Most Bitcoin is held in either a Bitcoin wallet or a Bitcoin bank. Your Bitcoin holdings are recorded on a public ledger. You access these holdings by authenticating with your private key. A Bitcoin wallet could be described more accurately as a Bitcoin keyring. Securing your Bitcoin wallet is about securing that private key. Just as there are many different ways to secure any individual piece of text, there are many ways to secure a Bitcoin private key. A Bitcoin “bank” is a term that can be used to describe institutions such as Coinbase. Coinbase takes the technology of the Bitcoin wallet and wraps it in additional layers of security, identity, and failover that we associate with banks and large technology companies. By using a Bitcoin bank, you sacrifice the autonomy of managing your own private key. On the bright side, you don’t have to manage your own private key. If you lose your Coinbase password, there are plenty of ways to recover it. A Bitcoin bank gives you the downsides and the upsides of working with a centralized service provider. Jameson Lopp is a cypherpunk and cryptocurrency engineer at Casa. Casa is a company that is building long-term cryptocurrency storage and secure key infrastructure. In this episode, we explore how Bitcoin wallets work, how to secure them, the common threats, scams and hacking attempts of Bitcoin, and what he is working on at Casa.
Ep 874Infrastructure Monitoring with Mark Carter
At Google, the job of a site reliability engineer involves building tools to automate infrastructure operations. If a server crashes, there is automation in place to create a new server. If a service starts to receive a high load of traffic, there is automation in place to scale up the instances of that service. In order to create an automated response to an infrastructure problem, a site reliability engineer needs insights into that infrastructure. Every service needs tools around monitoring, alerting, debugging, and distributed tracing. One benefit of working at a large company like Google is that an engineer building a new product gets this kind of tooling by default. If I am hacking on a project at home, I have to set up all kinds of tools to help me diagnose and resolve problems. Setting up this tooling takes time, and requires expertise. Stackdriver is a set of tools and instrumentation that allows developers to monitor, debug, and inspect infrastructure. Stackdriver is based on the internal observability tools built for Google. Mark Carter is a group product manager at Google, and he joins the show to discuss site reliability engineering and the creation of Stackdriver.
Ep 873GitOps: Kubernetes Continuous Delivery with Alexis Richardson
Continuous delivery is a way of releasing software without requiring software engineers to synchronize during a release. Over the last decade, continuous delivery workflows have evolved as the tools have changed. Jenkins was one of the first continuous delivery tools and is still in heavy use today. Netflix’s open-sourced Spinnaker has also been widely adopted. As Kubernetes has grown in popularity, some engineers have developed a workflow around Kubernetes and Git known as GitOps. GitOps treats Git as the source of truth for deployments. Under GitOps, when a divergence occurs between your Git repository’s configuration files and the state of your production infrastructure, your infrastructure should automatically adjust its state to align with the state defined in Git. Alexis Richardson is the CEO of Weaveworks, a company that has built tooling around GitOps. He joins the show to describe how GitOps works, and explain how it compares to other methods for continuous delivery.
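The convergence behavior GitOps relies on can be sketched as a reconciliation pass: compare the desired state parsed from manifests in Git against the actual state queried from the cluster, and compute the operations needed to close the gap. This is a conceptual sketch, not a real operator–the resource names and spec fields are invented, and a real controller runs this loop continuously.

```python
def reconcile(desired, actual):
    """One pass of a GitOps-style control loop. Returns the operations
    that would converge `actual` toward `desired`."""
    ops = []
    for name, spec in desired.items():
        if name not in actual:
            ops.append(("create", name, spec))       # in Git, not in cluster
        elif actual[name] != spec:
            ops.append(("update", name, spec))       # drifted from Git
    for name in actual:
        if name not in desired:
            ops.append(("delete", name, None))       # in cluster, not in Git
    return ops

# Hypothetical states: Git says one service at v2; the cluster still runs
# v1 plus a worker that was removed from the repo.
desired = {"web": {"replicas": 3, "image": "web:v2"}}
actual = {"web": {"replicas": 3, "image": "web:v1"}, "worker": {"replicas": 1}}
print(reconcile(desired, actual))
```

The key property is that deployments become declarative: a rollback is just `git revert`, and the loop does the rest.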
Ep 872Klarna Engineering with Marcus Granström
Klarna is a payments company headquartered in Sweden. Since being established in 2005, it has grown to handle $21 billion in online sales in 2017. Roughly 40% of all e-commerce sales in Sweden go through Klarna. Klarna’s original differentiator was that it allowed users to check out of e-commerce stores without entering in credit card information. Instead, the user enters an email address and registers with Klarna. This allows Klarna to assume the risk of the transaction, in place of the credit card company. Klarna’s clever payment method became very popular, and 13 years later Klarna is a bank with a variety of financial services and payment methods. Marcus Granstrom is a director of engineering at Klarna. His work ranges from product development to systems architecture to management. His cross functional role has some similarity to Raylene Yung from Stripe, who is also an engineering director at a payments company, and was on the show yesterday. Marcus walked me through the life of a payment hitting Klarna’s servers, and this served as a nice starting point for a conversation about Klarna’s infrastructure, their product, and their engineering practices.
Ep 871Stripe Engineering with Raylene Yung
Stripe is a payments API that allows merchants to transact online. Since the creation of the payments API, Stripe has expanded into adjacent services such as fraud detection, business management, and billing. These other verticals leverage the existing customer base and infrastructure that Stripe has developed from the success of their payments business. Raylene Yung is the head of payments at Stripe. She joins the show to talk about her work, which includes elements of engineering, product development, design, and management. All of these dimensions of her job came up in our conversation, which made for a wide ranging conversation. This interview comes in the context of Stripe’s rapid growth. The organization is changing, and Raylene explored the questions that Stripe is asking itself internally about org structure. Namely: what is the tradeoff between a defined, hierarchical structure of direct reports versus a decentralized, flat org structure? Is there any advantage to making roles highly defined (such as “senior infrastructure software engineer”)? Or is it better to let people have fluid roles, and self-assemble? Raylene was willing to explore these questions–and I found her answers highly useful and thought provoking.
Ep 870Self-Driving Engineering with George Hotz
In the smartphone market there are two dominant operating systems: one closed source (iPhone) and one open source (Android). The market for self-driving cars could play out the same way, with a company like Tesla becoming the closed source iPhone of cars, and a company like Comma.ai developing the open source Android of self-driving cars. George Hotz is the CEO of Comma.ai. Comma makes hardware devices that allow “normal” cars to be augmented with advanced cruise control and lane-assist features. This means you can take your own car–for example, a Toyota Prius–and outfit your car to have something similar to the Tesla Autopilot. Comma’s hardware devices cost under $1000 to order online. George joins the show to explain how the Comma hardware and software stack works in detail–from the low level interface with a car’s CAN bus to the high level machine learning infrastructure. Users who purchase the Comma.ai hardware drive around with a camera facing out the front of their windshield. This video is used to orient the state of the car in space. The video from that camera also gets saved and uploaded to Comma’s servers. Comma can use this video together with labeled events from the user’s driving experience to crowdsource their model for self-driving. For example, if a user is driving down a long stretch of highway, and they turn on the Comma.ai driving assistance, the car will start driving itself and the video capture will begin. If the car begins to swerve into another lane, the user will take over for the car and the Comma system will disengage. This “disengagement” event gets labeled as such, and when that data makes it back to Comma’s servers, Comma can use the data to update their models. George is very good at explaining complex engineering topics, and is also quite entertaining and open to discussing the technology as well as other competitors in the autonomous car space.
I have not been able to get many other people on the show to talk about autonomous cars, so this was quite refreshing! I hope to do more episodes on the topic in the future.
Ep 869Future Architecture with Chad Fowler
Chad Fowler was the CTO of Wunderlist prior to its acquisition by Microsoft. Since the acquisition, Chad has become the general manager of developer advocacy at Microsoft. He also works as a venture capitalist at BlueYard Capital, an early stage investment firm. I’ve had a lot of fun talking to Chad, because he can move seamlessly between talking about disparate subjects like cloud computing, investing, cryptocurrencies, and music composition. And he has novel things to say about all of them! When Chad joined Wunderlist, he helped start a large refactoring of the software architecture. He then helped the company navigate to the successful Microsoft acquisition. We started off the conversation with the story of this re-architecture, and how he sees the current opportunities in front of Microsoft. Chad gives his perspective on Kubernetes, functions-as-a-service, and how developer tooling might evolve in the near future. After talking about near-term developer tooling, we talked about the distant future: bug bounty marketplaces on the blockchain; using GitHub repositories to train machine learning models about how to write code; the comparison between music collaboration and software collaboration. This was a wide array of topics, but Chad was equipped to discuss all of them–since he works at Microsoft, makes large investments in the future, and studied music when he was in school.
Ep 868Splice: Music Collaboration with Matt Aimonetti
Music collaboration has historically been accomplished by musicians gathering in bands. A band is usually an in-person, physical manifestation: a drummer, a guitarist, a piano player. Or, on a large scale, a symphony of classical instruments led by a conductor. Today, the most flexible instrument that anyone can play is arguably the computer, because a computer can simulate or replay any of the sounds made by any other instrument. Another advantage of the computer is that it removes physicality as a constraint on the musician. A computer musician does not have to train their muscles to play piano, or guitar, or drums. The computer musician can imagine a sound and bring it to life inside a digital audio workstation (a program for composing and arranging music). The rise of the computer musician has coincided with a change in the way popular music is created. Instead of bands needing to work together to create a piece of music, a single producer can simulate all of the members of the band by programming piano, drums, and everything else. The rise of the solo producer has given birth to new kinds of music–but solo music production inherently limits the range of musical ideas that can be explored. Many of the most important works of art have input from multiple people. And even the most successful solo producers love to work with other artists who have a complementary skill–such as vocals. For the last twenty years, the model of a solo producer working with a pop vocalist has largely dominated the charts. Musical collaboration has stuck to a model that mimics its pre-Internet form, with very small groups of 1-5 people making the core of a song. The main tools that people use to collaborate are email and Dropbox. Splice is a tool for music collaboration. Splice combines version control, revision history, social networking, sample discovery, synthesizer rental, and other features. Splice is changing the way that music is created, with a large percentage of top producers adopting it.
The impact Splice has on music could be on par with what GitHub has done for software engineering. Matt Aimonetti is the CTO and co-founder of Splice, and he joins the show to talk about the founding story, the product development, and the engineering of Splice.
Ep 867GraalVM with Thomas Wuerthinger
Java programs compile into Java bytecode. Java bytecode executes in the Java Virtual Machine, a runtime environment that compiles that bytecode further into machine code, and optimizes the runtime by identifying “hot” code paths and keeping those hot code paths executing quickly. The Java Virtual Machine is a popular platform to build languages on top of. Languages like Scala and Clojure compile down to Java bytecode, and can take advantage of the garbage collection system and the code path optimizations of the JVM. But when Scala and Clojure compile into Java bytecode, the code “shape”–the way that the programs are laid out in memory–is not the same as when Java programs compile into Java bytecode. Executing bytecode that comes from Scala will have certain performance penalties relative to a functionally identical program written in Java. GraalVM is a system for running many languages efficiently on the JVM: a language is implemented as an interpreter over its abstract syntax tree, and GraalVM compiles that interpreter into efficient machine code. Languages that can run on GraalVM include JavaScript, R, Ruby, and Python. Thomas Wuerthinger is a senior research director at Oracle and the project lead for GraalVM. He joins the show to explain the motivation for GraalVM, the architecture of the project, and the future of language interoperability. It was an exciting discussion and I learned a lot about the Java ecosystem.
Ep 866Token Types with Felipe Pereira
A token is a unit of virtual currency. Most tokens are built on a blockchain-based cryptocurrency platform, such as Ethereum. Building on top of a platform like Ethereum allows these tokens to form their own financial ecosystem while leveraging the scale of an existing currency. Tokens became highly popular in early 2018, with the boom in ICOs–initial coin offerings. Many of these coins offer a value proposition of a “utility token.” The idea of a utility token is that the token is necessary to transact in a particular ecosystem. If Amazon were to require you to convert US dollars to Amazon coins in order to buy items on Amazon, the Amazon coin would be a “utility token.” There are many different kinds of utility token schemes, and time will tell if this model makes sense for the cryptocurrency investment landscape. Another type of token is the “security token,” in which a token represents a share in an organization. This token type is more like a stock, or bond, or certificate of ownership of a financial instrument. These types of tokens also have their share of criticism. If I start a company, most of my assets are not represented on a blockchain–the assets are things like hiring contracts, intellectual property, real estate, etc. The legal ownership of these assets is settled by a complicated legal system which has no notion of a blockchain. It’s unclear how the claims of a security token today would be enforced–or why a security token is presently a better option for raising capital than traditional equity or debt instruments. Felipe Pereira is the author of “On the immaturity of tokenized value capture mechanisms,” a Medium article in which he documents different types of token systems, including several flavors of utility tokens and security tokens. He’s also the co-founder of a company called Paratii. He joins the show to discuss the present viability of token-based systems–and what blockchains have actually proven to be useful for today.
Ep 865Castor EDC with Derk Arts
Medical breakthroughs require medical research. Medical research requires patient testing and data collection. The most common form of capturing patient data is through surveys–and most of those surveys today are done on paper. Surveying patients to understand the side effects or benefits of trial drugs or treatments, and getting accurate results from those surveys, are critical aspects of medical research. Traditionally, these surveys are filled out and read manually, and entered into a database by a human operator. In these steps, there is too much room for human error, from unreadable handwriting to typos entered into the database. Electronic Data Capture (EDC) platforms were created out of this need for easy and accurate data collection for researchers. By enabling online survey creation and result collection, EDC platforms improved medical research immensely. However, these platforms are complex to design. Where patient medical data is concerned, privacy and security are of extremely high importance. Compliance with laws that protect the anonymity and privacy of patients is necessary. On top of this, the platform must be easy to use and reliable. Castor EDC is a company specializing in EDC for medical research, founded in the Netherlands and active in many countries around the globe. Our guest today is Derk Arts, the founder and CEO of Castor EDC. In this episode we discuss Electronic Data Capture platforms, how Castor EDC overcame these engineering and design problems, how they comply with the relevant laws, and their business model.