
Software Engineering Daily
2,200 episodes — Page 29 of 44
Ep 876 DoorDash Engineering with Raghav Ramesh
DoorDash is a logistics company that connects customers, restaurants, and drivers that can move food to its destination. When a customer orders from a restaurant, DoorDash needs to identify the ideal driver for picking up the order from the restaurant and dropping it off with the customer. This process of matching an order to a driver takes in many different factors. Let’s say I order spaghetti from an Italian restaurant. How long does the spaghetti take to prepare? How much traffic is there in different areas of the city? Who are the different drivers who could potentially pick the spaghetti up? Are there other orders near the Italian restaurant, that we could co-schedule the spaghetti delivery with? In order to perform this matching of drivers and orders, DoorDash builds machine learning models that take into account historical data. In today’s episode, Raghav Ramesh explains how DoorDash’s data platform works, and how that data is used to build machine learning models. We also explore the machine learning model release process—which involves backtesting, shadowing, and gradual rollout.
Ep 875 Casa: Crypto Wallet Security with Jameson Lopp
Cryptocurrency security is a concern to anyone who has a significant amount of money in the form of Bitcoin, Ethereum, or other crypto assets. Most Bitcoin is held in either a Bitcoin wallet or a Bitcoin bank. Your Bitcoin holdings are recorded on a public ledger. You access these holdings by authenticating with your private key. A Bitcoin wallet could be described more accurately as a Bitcoin keyring. Securing your Bitcoin wallet is about securing that private key. Just as there are many different ways to secure any individual piece of text, there are many ways to secure a Bitcoin private key. A Bitcoin “bank” is a term that can be used to describe institutions such as Coinbase. Coinbase takes the technology of the Bitcoin wallet and wraps it in additional layers of security, identity, and failover that we associate with banks and large technology companies. By using a Bitcoin bank, you sacrifice the autonomy of managing your own private key. On the bright side, you don’t have to manage your own private key. If you lose your Coinbase password, there are plenty of ways to recover it. A Bitcoin bank gives you the downsides and the upsides of working with a centralized service provider. Jameson Lopp is a cypherpunk and cryptocurrency engineer at Casa. Casa is a company that is building long-term cryptocurrency storage and secure key infrastructure. In this episode, we explore how Bitcoin wallets work, how to secure them, the common threats, scams and hacking attempts of Bitcoin, and what he is working on at Casa.
Ep 874 Infrastructure Monitoring with Mark Carter
At Google, the job of a site reliability engineer involves building tools to automate infrastructure operations. If a server crashes, there is automation in place to create a new server. If a service starts to receive a high load of traffic, there is automation in place to scale up the instances of that service. In order to create an automated response to an infrastructure problem, a site reliability engineer needs insights into that infrastructure. Every service needs tools around monitoring, alerting, debugging, and distributed tracing. One benefit of working at a large company like Google is that an engineer building a new product gets this kind of tooling by default. If I am hacking on a project at home, I have to set up all kinds of tools to help me diagnose and resolve problems. Setting up this tooling takes time, and requires expertise. Stackdriver is a set of tools and instrumentation that allows developers to monitor, debug, and inspect infrastructure. Stackdriver is based on the internal observability tools built for Google. Mark Carter is a group product manager at Google, and he joins the show to discuss site reliability engineering and the creation of Stackdriver.
Ep 873 GitOps: Kubernetes Continuous Delivery with Alexis Richardson
Continuous delivery is a way of releasing software without requiring software engineers to synchronize during a release. Over the last decade, continuous delivery workflows have evolved as the tools have changed. Jenkins was one of the first continuous delivery tools and is still in heavy use today. Netflix’s open sourced Spinnaker has also been widely adopted. As Kubernetes has grown in popularity, some engineers have developed a workflow around Kubernetes and Git known as GitOps. GitOps treats Git as the source of truth for deployments. Under GitOps, when a divergence occurs between your git repository’s configuration files and the state of your production infrastructure, your infrastructure should automatically adjust its state to align with the state defined in git. Alexis Richardson is the CEO of Weaveworks, a company that has built tooling around GitOps. He joins the show to describe how GitOps works, and explain how it compares to other methods for continuous delivery.
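The reconciliation idea described above, where the live infrastructure is adjusted whenever it diverges from what Git declares, can be sketched in a few lines. This is a toy illustration only, not how Weaveworks' tooling or Kubernetes is implemented: the `State` type, the service names, and the `diff` and `reconcile` functions are all hypothetical, and the "infrastructure" is just a dictionary of replica counts.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified state: service name -> replica count.
State = Dict[str, int]


def diff(desired: State, actual: State) -> List[Tuple[str, str, int]]:
    """Return the actions needed to make `actual` match `desired`."""
    actions = []
    for name, replicas in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name, replicas))
        elif actual[name] != replicas:
            actions.append(("scale", name, replicas))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name, 0))
    return actions


def reconcile(desired: State, actual: State) -> State:
    """Apply the diff so the live state converges on what Git declares."""
    new_state = dict(actual)
    for action, name, replicas in diff(desired, actual):
        if action == "delete":
            del new_state[name]
        else:
            new_state[name] = replicas
    return new_state


# Assumed example data: what the repository declares vs. what is running.
git_state = {"api": 3, "web": 2}
live_state = {"api": 1, "worker": 4}

print(diff(git_state, live_state))
print(reconcile(git_state, live_state) == git_state)
```

In a real setup this comparison would run continuously, so a manual change to production is reverted on the next pass and a commit to the repository is the only way to change the desired state.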
Ep 872 Klarna Engineering with Marcus Granström
Klarna is a payments company headquartered in Sweden. Since being established in 2005, it has grown to handle $21 billion in online sales in 2017. Roughly 40% of all e-commerce sales in Sweden go through Klarna. Klarna’s original differentiator was that it allowed users to check out of e-commerce stores without entering credit card information. Instead, the user enters an email address and registers with Klarna. This allows Klarna to assume the risk of the transaction, in place of the credit card company. Klarna’s clever payment method became very popular, and 13 years later Klarna is a bank with a variety of financial services and payment methods. Marcus Granström is a director of engineering at Klarna. His work ranges from product development to systems architecture to management. His cross-functional role has some similarity to that of Raylene Yung from Stripe, who is also an engineering director at a payments company, and was on the show yesterday. Marcus walked me through the life of a payment hitting Klarna’s servers, and this served as a nice starting point for a conversation about Klarna’s infrastructure, their product, and their engineering practices.
Ep 871 Stripe Engineering with Raylene Yung
Stripe is a payments API that allows merchants to transact online. Since the creation of the payments API, Stripe has expanded into adjacent services such as fraud detection, business management, and billing. These other verticals leverage the existing customer base and infrastructure that Stripe has developed from the success of their payments business. Raylene Yung is the head of payments at Stripe. She joins the show to talk about her work, which includes elements of engineering, product development, design, and management. All of these dimensions of her job came up in our conversation, which made for a wide ranging conversation. This interview comes in the context of Stripe’s rapid growth. The organization is changing, and Raylene explored the questions that Stripe is asking itself internally about org structure. Namely: what is the tradeoff between a defined, hierarchical structure of direct reports versus a decentralized, flat org structure? Is there any advantage to making roles highly defined (such as “senior infrastructure software engineer”)? Or is it better to let people have fluid roles, and self-assemble? Raylene was willing to explore these questions–and I found her answers highly useful and thought provoking.
Ep 870 Self-Driving Engineering with George Hotz
In the smartphone market there are two dominant operating systems: one closed source (iPhone) and one open source (Android). The market for self-driving cars could play out the same way, with a company like Tesla becoming the closed source iPhone of cars, and a company like Comma.ai developing the open source Android of self-driving cars. George Hotz is the CEO of Comma.ai. Comma makes hardware devices that allow users with “normal” cars to be augmented with advanced cruise control and lane assist features. This means you can take your own car–for example, a Toyota Prius–and outfit your car to have something similar to the Tesla Autopilot. Comma’s hardware devices cost under $1000 to order online. George joins the show to explain how the Comma hardware and software stack works in detail–from the low level interface with a car’s CAN bus to the high level machine learning infrastructure. Users who purchase the Comma.ai hardware drive around with a camera facing the front of their windshield. This video is used to orient the state of the car in space. The video from that camera also gets saved and uploaded to Comma’s servers. Comma can use this video together with labeled events from the user’s driving experience to crowdsource their model for self-driving. For example, if a user is driving down a long stretch of highway, and they turn on the Comma.ai driving assistance, the car will start driving itself and the video capture will begin. If the car begins to swerve into another lane, the user will take over for the car and the Comma system will disengage. This “disengagement” event gets labeled as such, and when that data makes it back to Comma’s servers, Comma can use the data to update their models. George is very good at explaining complex engineering topics, and is also quite entertaining and open to discussing the technology as well as other competitors in the autonomous car space. 
I have not been able to get many other people on the show to talk about autonomous cars, so this was quite refreshing! I hope to do more in the future.
Ep 869 Future Architecture with Chad Fowler
Chad Fowler was the CTO of Wunderlist prior to its acquisition by Microsoft. Since the acquisition, Chad has become the general manager of developer advocacy at Microsoft. He also works as a venture capitalist at BlueYard Capital, an early stage investment firm. I’ve had a lot of fun talking to Chad, because he can move seamlessly between talking about disparate subjects like cloud computing, investing, cryptocurrencies, and music composition. And he has novel things to say about all of them! When Chad joined Wunderlist, he helped start a large refactoring of the software architecture. He then helped the company navigate to the successful Microsoft acquisition. We started off the conversation with the story of this rearchitecture, and how he sees the current opportunities in front of Microsoft. Chad gives his perspective on Kubernetes, functions-as-a-service, and how developer tooling might evolve in the near future. After talking about near-term developer tooling, we talked about the distant future: bug bounty marketplaces on the blockchain; using Github repositories to train machine learning models about how to write code; the comparison between music collaboration and software collaboration. This was a wide array of topics, but Chad was equipped to discuss all of them–since he works at Microsoft, makes large investments in the future, and studied music when he was in school.
Ep 868 Splice: Music Collaboration with Matt Aimonetti
Music collaboration has historically been accomplished by musicians gathering in bands. A band is usually an in-person, physical manifestation: a drummer, a guitarist, a piano player. Or, on a large scale, a symphony of classical instruments led by a conductor. Today, the most flexible instrument that anyone can play is arguably the computer, because a computer can simulate or replay any of the sounds made by any other instrument. Another advantage of the computer is that it removes physicality as a constraint on the musician. A computer musician does not have to train their muscles to play piano, or guitar, or drums. The computer musician can imagine a sound and bring it to life inside a digital audio workstation (a program for composing and arranging music). The rise of the computer musician has coincided with a change in the way popular music is created. Instead of bands needing to work together to create a piece of music, a single producer can simulate all of the members of the band by programming piano, drums, and everything else. The rise of the solo producer has given birth to new kinds of music–but solo music production inherently limits the range of musical ideas that can be explored. The most important works of art have input from multiple people. And even the most successful solo producers love to work with other artists who have a complementary skill–such as vocals. For the last twenty years, the model of solo producer working with pop vocalist has largely dominated the charts. Musical collaboration has stuck to a model that mimics its pre-Internet form, with very small groups of 1-5 people making the core of a song. The main tools that people use to collaborate are email and Dropbox. Splice is a tool for music collaboration. Splice combines version control, revision history, social networking, sample discovery, synthesizer rental, and other features. Splice is changing the way that music is created, with a large percentage of top producers adopting it. 
The impact Splice has on music will be on par with what Github has done for software engineering. Matt Aimonetti is the CTO and co-founder of Splice, and he joins the show to talk about the founding story, the product development, and the engineering of Splice.
Ep 867 GraalVM with Thomas Wuerthinger
Java programs compile into Java bytecode. Java bytecode executes in the Java Virtual Machine, a runtime environment that compiles that bytecode further into machine code, and optimizes the runtime by identifying “hot” code paths and keeping those hot code paths executing quickly. The Java Virtual Machine is a popular platform for building languages on top of. Languages like Scala and Clojure compile down to Java bytecode, and can take advantage of the garbage collection system and the code path optimizations of the JVM. But when Scala and Clojure compile into Java bytecode, the code “shape”–the way that the programs are laid out in memory–is not the same as when Java programs compile into Java bytecode. Executing bytecode that comes from Scala will have certain performance penalties relative to a functionally identical program written in Java. GraalVM is a system for interpreting languages into Java bytecode that can run efficiently on the JVM. Any language can be interpreted into an abstract syntax tree that the GraalVM can execute using the JVM. Languages that can run on GraalVM include JavaScript, R, Ruby, and Python. Thomas Wuerthinger is a senior research director at Oracle and the project lead for GraalVM. He joins the show to explain the motivation for GraalVM, the architecture of the project, and the future of language interoperability. It was an exciting discussion and I learned a lot about the Java ecosystem.
Ep 866 Token Types with Felipe Pereira
A token is a unit of virtual currency. Most tokens are built on a blockchain-based cryptocurrency platform, such as Ethereum. Building on top of a platform like Ethereum allows these tokens to form their own financial ecosystem while leveraging the scale of an existing currency. Tokens became highly popular in early 2018, with the boom in ICOs–initial coin offerings. Many of these coins offer a value proposition of a “utility token.” The idea of a utility token is that the token is necessary to transact in a particular ecosystem. If Amazon were to require you to convert US dollars to Amazon coins in order to buy items on Amazon, the Amazon coin would be a “utility token.” There are many different kinds of utility token schemes, and time will tell if this model makes sense for the cryptocurrency investment landscape. Another type of token is the “security token,” in which a token represents a share in an organization. This token type is more like a stock, or bond, or certificate of ownership of a financial instrument. These types of tokens also have their share of criticism. If I start a company, most of my assets are not represented on a blockchain–the assets are things like hiring contracts, intellectual property, real estate, etc. The legal ownership of these assets is settled by a complicated legal system which has no notion of a blockchain. It’s unclear how the claims of a security token today would be enforced–or why a security token is presently a better option for raising capital than traditional equity or debt instruments. Felipe Pereira is the author of “On the immaturity of tokenized value capture mechanisms,” a Medium article in which he documents different types of token systems, including several flavors of utility tokens and security tokens. He’s also the co-founder at a company called Paratii. He joins the show to discuss the present viability of token-based systems–and what blockchains have actually proven to be useful for today.
Ep 865 Castor EDC with Derk Arts
Medical breakthroughs require medical research. Medical research requires patient testing and data collection. The most common form of capturing patient data is through surveys–and most of those surveys today are done on paper. Surveying patients to understand the side effects or benefits of trial drugs or treatments, and getting accurate results out of these surveys, are critical aspects of medical research. Traditionally, these surveys are filled out and read manually, and entered into a database by a human operator. In these steps, there is too much room for human error, from unreadable handwriting to typos being entered into the database. Electronic Data Capture platforms were created out of this need for easy and accurate data collection for researchers. By enabling online survey creation and result collection, EDC platforms improved medical research immensely. However, these platforms are complex to design. Where patient medical data is concerned, privacy and security are of extremely high importance. Compliance with laws that protect the anonymity and privacy of patients is necessary. On top of this, the platform must be easy to use and reliable. Castor EDC is a company specializing in EDC for medical research, founded in the Netherlands and active in many countries around the globe. Our guest today is Derk Arts, the founder and CEO of Castor EDC. In this episode we discuss Electronic Data Capture platforms, how Castor EDC overcame the engineering and design problems, how they comply with the laws, and their business model.
Ep 864 Jailbreaking Apple Watch with Max Bazaliy
Apple operating systems are closed source. This closed source nature gives Apple an extremely successful business model–and a very different software developer ecosystem than Linux-based systems. Since Linux is open source, the information on how to manipulate the system at a low level is very public. The lack of information about low-level programming in Apple operating systems has led to a large community of “jailbreaking”–where people try to reverse engineer how the closed source systems function. In today’s episode, Max Bazaliy joins the show to describe how he reverse engineered an Apple Watch. It’s a complex security challenge to jailbreak an Apple Watch, as he describes in detail. Max is a security researcher at Lookout, a mobile security company.
Ep 863 Edge Kubernetes with Venkat Yalla
“Edge computing” is a term used to define computation that takes place in an environment outside of a data center. Edge computing is a broad term. Your smartphone is an edge device. A self-driving car is an edge device. A security camera with a computer chip is an edge device. These “edge devices” have existed for a long time now, but the term “edge computing” has only started being used more recently. Why is that? It is mostly because the volume of data produced by edge devices, and the type of computation that we want from edge devices is changing. We want to develop large sensor networks to enable smart factories, and smart agriculture fields. We want our smartphones to have machine learning models that get updated as frequently as possible. We want to use self-driving cars, and drones, and smart refrigerators to develop elaborate mesh networks–and perhaps even have micropayments between machines, so that computation can be offloaded from edge devices to a nearby mesh network for a small price. Kubernetes is a tool for orchestrating distributed, containerized computation. Just as Kubernetes is being widely used for data center infrastructure, it can also be used to orchestrate computation among nodes on-premise at a factory, or in a smart agriculture environment. In today’s episode, Venkat Yalla from Microsoft joins the show to talk about Kubernetes at the edge, and how Internet of things applications can use Kubernetes for their deployments today–and what the future might hold. Full disclosure: Microsoft is a sponsor of SE Daily.
Ep 862 React Native at Airbnb with Gabriel Peal
React Native allows developers to reuse frontend code between mobile platforms. A user interface component written in React Native can be used in both iOS and Android codebases. Since React Native allows for code reuse, this can save time for developers, in contrast to a model where completely separate teams have to create frontend logic for iOS and Android. React Native was created at Facebook. Facebook itself uses React Native for mobile development, and contributes heavily to the open source React Native repository. In 2016, Airbnb started using React Native in a significant portion of their mobile codebase. Over the next two years, Airbnb saw the advantages and the disadvantages of adopting the cross platform, JavaScript based system. After those two years, the engineering management at Airbnb came to the conclusion to stop using React Native. Gabriel Peal is an engineer at Airbnb who was part of the decision to move off of React Native. Gabriel wrote a blog post giving the backstory for React Native at Airbnb, and he joins the show to give more detail on the decision.
Ep 860 Ghost: Open Source Publishing Platform with John O’Nolan
Blogging is more than 20 years old. Over that period of time, numerous publishing platforms have been created. Squarespace, Blogger, Medium, and Twitter are popular closed source platforms. WordPress has been the most popular open source blogging platform–and much of the Internet (including Software Engineering Daily) runs on WordPress. WordPress is a powerful platform. News companies, ecommerce websites, and many other kinds of businesses use WordPress as their central publishing tool. But WordPress has been around for 15 years–and there are some potential conflicts of interest between WordPress the open source project and WordPress.com (a company started to host WordPress websites). John O’Nolan was working as a WordPress developer when he decided to start a new publishing platform called Ghost. Five years later, the Ghost project is a success–with a thriving open source community, a profitable SaaS business, and companies like Digital Ocean and Mozilla using Ghost to host their blogs. John and I discussed his background with WordPress, what he wanted to do differently with Ghost, and the software architecture of Ghost. We also touched on the Ghost SaaS business and the management of the open source project.
Ep 859 Video Games and Funding Techniques with Howard Marks
Howard Marks ran two video game companies in the 90’s: Activision and Acclaim. While running these companies, he developed a love for entrepreneurship that he maintains today. Howard is the CEO of StartEngine, a company that functions as an accelerator, a crowdfunding platform, and ICO launcher. Howard joins the show to talk about his background as an entrepreneur, as well as some modern alternative funding mechanisms that he’s working on at StartEngine. Hearing Howard’s thoughts on building a video game company in the 90’s was particularly new information to me–it’s an era of software development that we have not covered much at all. As a side note–some listeners have asked recently why we cover subjects such as ICOs, when there have been so many dubious companies that have launched ICOs. There are two reasons why we cover this area. Firstly, cryptocurrencies are a breakthrough computer science construct. It’s important for us to try to understand their implications. The other reason why we cover ICOs is that some technology companies require high upfront capital costs. The amount of capital you have available affects the speed at which your engineering team can move. New funding mechanisms could mean more capital for certain types of software companies–and this could be a good thing and a bad thing, depending on the company.
Ep 858 Video Machine Learning with Ben Dodson
Video streaming platforms like Netflix offer a convenient way to watch video content. We are now able to watch our favorite TV shows, movies, or content creators on a range of devices. However, buffering while watching videos can be a painful experience on mobile phones and tablets that use 4G or LTE. As streaming becomes available to a wider range of devices with varying bandwidth restrictions, different encodings of the video need to be created for different devices, and different bandwidth situations. To get the best quality viewing possible with the bandwidth available to connections, there needs to be a balance between the resolution of the video, and the bitrate, which defines the data that the video consumes. Mux is a company that builds video hosting and analytics. Ben Dodson is a data scientist at Mux, who built a system for optimizing the bitrate of videos through machine learning. In this episode we discuss video encoding and how Mux solved the problem of serving the highest quality video with the ideal bitrate.
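To make the resolution-versus-bitrate tradeoff above concrete, here is a minimal sketch of choosing one encoding from a set of pre-built renditions based on the viewer's bandwidth. It is not Mux's machine learning system, which is not described here in enough detail to reproduce. The `Rendition` type, the `LADDER` values, the `pick_rendition` function, and the 0.8 headroom factor are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple


class Rendition(NamedTuple):
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # encoded bitrate in kilobits per second


# Hypothetical set of encodings; real values depend on content and encoder.
LADDER = [
    Rendition(360, 800),
    Rendition(720, 2500),
    Rendition(1080, 5000),
]


def pick_rendition(ladder: List[Rendition], bandwidth_kbps: float,
                   headroom: float = 0.8) -> Rendition:
    """Pick the highest-bitrate rendition that fits within the bandwidth.

    `headroom` leaves a safety margin (an assumed value) so that small
    drops in throughput do not immediately cause buffering.
    """
    budget = bandwidth_kbps * headroom
    fitting = [r for r in ladder if r.bitrate_kbps <= budget]
    if not fitting:
        # Nothing fits: fall back to the cheapest rendition.
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(fitting, key=lambda r: r.bitrate_kbps)


print(pick_rendition(LADDER, 4000).height)
print(pick_rendition(LADDER, 500).height)
```

A fixed ladder like this treats every video the same. The appeal of a learned approach is that the ladder itself could be tuned per video, since some content looks fine at a much lower bitrate than other content.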
Ep 857 Kubernetes in the Enterprise with Aparna Sinha
Enterprises want to update their technology faster. One way an enterprise can accelerate the adoption of new tools is to move more aggressively towards the cloud. By giving internal developers access to the cloud, it becomes easier to provision new servers–allowing for rapid experimentation, test environments, and scalability. In previous shows we have explored how large enterprises successfully learn to move their technology faster. Much of this process is rooted in being able to experiment quickly–which requires well-defined testing procedures, and the ability to quickly provision and destroy infrastructure. Many enterprises have large on-premise infrastructure deployments. An enterprise’s movement towards the cloud can be made complex by this existing set of servers. In today’s show, Aparna Sinha discusses how Kubernetes is useful for enterprises–and how it can improve development speed, experimentation, and observability. Aparna is the leader of the product team for Kubernetes and Container Engine at Google. Much of her job is centered around understanding what would be useful to enterprises who are choosing a cloud provider. The open source version of Kubernetes is useful on its own, but most enterprises choose a managed provider of Kubernetes–such as Google Kubernetes Engine–to help with support and onboarding. Full disclosure: Google is a sponsor of Software Engineering Daily.
Ep 856 WebAssembly with Lin Clark
JavaScript has been the exclusive language of the web browser for the last 20 years. Whether you use Chrome, Firefox, Internet Explorer, or Safari, your browser interprets and executes code in a virtual machine–and that virtual machine only runs JavaScript. Unfortunately, JavaScript is not ideal for every task we want to perform in the browser. Think about the use cases where you need to use software outside of the browser: video editing, music production, 3D art, video games. These applications require a high degree of performance that is hard to get from raw JavaScript. WebAssembly was created to get better performance on the web. WebAssembly allows code from other languages to be compiled and run in the browser. With WebAssembly, languages such as C, C++, and Rust can be used to achieve major performance gains. WebAssembly is still under development, and eventually more programming languages will be accessible as well. Lin Clark is an engineer on the Mozilla Developer Relations team, and has been working closely on the WebAssembly project. She is the author of a detailed series of illustrated blog posts that explain how WebAssembly works. In this episode, we discuss how WebAssembly came to be, its advantages over a web driven purely by JavaScript, what is possible with WebAssembly, and its engineering implementation.
Ep 855 Botchain with Rob May
“Bots” are becoming increasingly relevant to our everyday interactions with technology. A bot sometimes mediates the interactions of two people. Examples of bots include automated reply systems, intelligent chat bots, classification systems, and prediction machines. These systems are often powered by machine learning systems that are black boxes to the user. Today’s guest Rob May argues that these systems should be auditable and accountable, and that using a blockchain-based identity system for bots is a viable solution to the machine learning auditability problem. Rob is the CEO of Talla, a knowledge base provider for business teams. The Botchain project was spun out of Talla as a solution to the problem of bot identity. In this episode, we talk about Botchain and the application of blockchain to bot identity, the current state of ICOs, and the viability of utility token ecosystems. Botchain has its own cryptotoken called “Botcoin.”
Ep 853 Build a Bank: N26 with Pat Kua
Banking has been a part of the economy for 600 years. Banking has always been evolving. The most recent evolution: the financial industry has been going digital. Newer “fintech” companies have created innovative ways of doing everything related to money–from friendly payments to budgeting; from business transactions to insurance. However, the traditional banks themselves have been relatively slow at adjusting to these digital changes, creating the opportunity for native digital banks–often referred to as “challenger banks.” N26 is a digital-first bank established in Berlin and is active in 17 European countries with a million users. Pat Kua is the CTO at N26. Digital banks are hosted on the cloud, without a physical branch that the user must visit to perform an operation. The user accesses their account through a mobile application, and can complete everything online from opening an account to performing transactions. This system has numerous advantages, like simplicity for the user, and higher scalability in terms of users from the bank’s perspective. In this episode, we discuss the advantages of digital banks, the fintech industry, and N26 itself. We also explore product development and how Pat manages his time as CTO–which is a useful discussion for anyone who is learning to be a technical leader or manager.
Ep 852 Git Vulnerability with Edward Thomson
Git is a distributed file system for version control. Git is extremely reliable, fast, and secure, owing to the fact that it is one of the oldest pieces of open source software. But even battle-tested software can have vulnerabilities. In this episode, we explore a subtle git vulnerability that could have potentially led to git users executing malicious scripts when they intended to simply pull a repository. Today’s guest Edward Thomson is a program manager at Microsoft, and a maintainer of libgit2, a C implementation of git. He also writes about git and hosts the podcast All Things Git. He is passionate about git development, which gave me a deeper perspective on something that I just consider a tool. But the only reason that tool is so good–the only reason it fades into the background–is because there are people that are passionate enough to work on it on a regular basis. We also spent some time talking about the vulnerabilities that can spread through shared code environments–particularly in the realm of git, npm, and PHP. And we touched on how deployment workflows around git and Kubernetes are changing. Full disclosure: Microsoft, where Edward works, is a sponsor of Software Engineering Daily.
Ep 851Counting People with Andrew Farah
If you operate a restaurant, you want to know how many people are inside your restaurant at any given time. You also want to be able to know your occupancy if you operate a movie theater, coffee shop, or apparel store. Knowing how many people are in your building can answer several business-related questions. Do you need to unlock an additional entrance? Should you open another store? Do you really need a building this big? This might sound like a simple question, but how do you solve the problem of counting people inside of a building? A naive approach to counting people is to use video cameras and count the number of people entering and exiting the building. Machine learning algorithms are good at classifying humans. But the downside of this is that you have to put cameras anywhere you want a people-counter. There are many situations where you would want to count the number of people where a camera is not acceptable. What if you wanted to count people in a privacy preserving way? What if you wanted to obscure any identifiable traits of a person that you were counting? Density is a device for counting people. It sits above a doorway and counts the people who are entering or exiting the building. Andrew Farah is the CEO at Density, and in today’s episode he explains why the problem of counting people is harder than it sounds, and how the Density people counter functions.
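As a rough illustration of the counting idea described above, here is a minimal Python sketch that turns a stream of doorway events into a running occupancy count. The event names ("enter", "exit") and the function name are assumptions made for this example and are not taken from Density's product. The hard part the episode discusses, producing reliable events from a sensor in the first place, is not modeled here.

```python
from typing import Iterable, List


def occupancy_over_time(events: Iterable[str], start: int = 0) -> List[int]:
    """Return the occupancy after each doorway event.

    `events` is a hypothetical stream of "enter" / "exit" strings,
    as a doorway sensor might report them.
    """
    count = start
    history: List[int] = []
    for event in events:
        if event == "enter":
            count += 1
        elif event == "exit":
            # Never report a negative occupancy if an entry was missed.
            count = max(0, count - 1)
        else:
            raise ValueError(f"unknown event: {event!r}")
        history.append(count)
    return history


if __name__ == "__main__":
    print(occupancy_over_time(["enter", "enter", "exit", "enter"]))
```

Clamping at zero is one simple way to tolerate a missed entry; a real system would likely need periodic recalibration as well.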
Ep 850Machine Learning Deployments with Diego Oppenheimer
Machine learning models allow our applications to perform highly accurate inferences. A model can be used to classify a picture as a cat, or to predict what movie I might want to watch. But before a machine learning model can be used to make these inferences, the model must be trained and deployed. In the training process, a machine learning model consumes a data set and learns from it. The training process can consume significant resources. After the training process is over, you have a trained model that you need to get into production. This is known as the “deployment” step. Deployment can be a hard problem. You are taking a program from a training environment to a production environment. A lot can change between these two environments. In production, your model is running on a different machine–which can lead to compatibility issues. If your model serves a high volume of requests, it might need to scale up. In production, you also need caching, and monitoring, and logging. Large companies like Netflix, Uber, and Facebook have built their own internal systems to control the pipeline of getting a model from training into production. Companies that are newer to machine learning can struggle with this deployment process, and these companies usually don’t have the resources to build their own machine learning platform like Netflix. Diego Oppenheimer is the CEO of Algorithmia, a company that has built a system for automating machine learning deployments. This is the second cool product that Algorithmia has built, the first being the algorithm marketplace that we covered in an episode a few years ago. In today’s show, Diego describes the challenges of deploying a machine learning model into production, and how that product was a natural complement to the algorithms marketplace. Full disclosure: Algorithmia is a sponsor of Software Engineering Daily.
Ep 849Ballerina Language with Tyler Jewell
Modern programming requires lots of integration between APIs. Some of these integrations are trivial–such as using Twilio or Stripe. But there are many more complex integrations. For example, when a large company acquires a smaller company, the acquiring company might want to integrate with that smaller company to leverage the synergies between the two companies. How do you build clean communication patterns between the services of one company and another? Two teams within a single enterprise can also have integration issues. One team might have a different data model than the other team. One team might be using JSON and the other using XML. In these cases, integrations between APIs can take considerable time. Ballerina is a programming language that is designed for writing integrations. Ballerina is made for building services that allow two APIs to communicate easily–in contrast to other patterns of API integration such as those involving an enterprise service bus. Tyler Jewell is the CEO of WSO2, a company that specializes in integrations. WSO2 created the Ballerina language and is investing heavily into it (with ~80 people working on Ballerina today). In this episode we explored integrations–and why this problem required creating a new programming language. BallerinaCon is July 18th in San Francisco and our listeners can attend for free–use code BalCon-SEDaily.
Ep 847Flutter in Practice with Randal Schwartz
Flutter allows developers to build cross-platform mobile apps. In our previous show about Flutter, Eric Seidel from Google described the goals of Flutter, why he founded the project, and how Flutter is built. In today’s show, Randal Schwartz talks about Flutter in more detail–including the developer experience of building Flutter apps and why he finds Flutter so exciting. Randal is a longtime software developer who has focused mostly on web applications. He’s also the host of the popular podcast FLOSS Weekly, a show about open source software. Like many developers, Randal has stayed away from mobile development in the past. If you write a mobile application, you have historically had to build iOS and Android apps separately. That is a large up-front cost, and Flutter reduces the cost by allowing users to develop mobile apps for iOS and Android with a single codebase. Randal has been podcasting about open source software for 12 years, so he brings historical depth to this conversation. He has also been working with Dart for several years–Dart is a language developed by Google that is used in Flutter.
Ep 846Build a Bank: Nubank with Edward Wible
Nubank was started in 2013 with a credit card that was controlled through a mobile app. At the time, it was the first service in Brazil that allowed customers to do banking without going to a physical bank branch. Since then, Nubank has expanded into additional financial services and currently has 850 employees working in Brazil. Edward Wible is a co-founder and CTO of Nubank and in this episode he discusses his work growing Nubank from a small team of fewer than 10 people into a company with almost 1000 people. We have covered two other banks in the past few weeks: Monzo and N26. In terms of software engineering and product management, Nubank is similar to Monzo and N26. One characteristic that stood out was Nubank’s use of Clojure, a functional programming language built on the JVM. A question that came up during this show: what is the line between “fintech company” and “bank”? We hope to explore this more in future shows about the intersection of money and software. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
Ep 845Flutter with Eric Seidel
Flutter is a project from Google that is rebuilding user interface engineering from the ground up. Today, most engineering teams have dedicated engineering resources for web, iOS, and Android. These different platforms have their own design constraints, their own toolset, and their own programming languages. But each platform is merely building a user interface. Why should development across these three user surfaces be so different? This was the question that Eric Seidel was asking himself three years ago, when he co-founded the Flutter project. The Flutter project had a few rough starts, as the team tried to figure out exactly what layer of abstraction they were trying to provide. Around that time, ReactJS and React Native were growing in popularity. Seeing the React projects provided some data points, and some inspiration. But Flutter takes a lower level approach to cross platform app development, by presenting a rendering layer and a runtime API that are interfacing with the hardware in the same way that OpenGL does. In today’s episode, Eric joins the show to explain how the Flutter project came to life, and his lessons from starting an ambitious project that took several years to pick up steam. I enjoyed this episode because Flutter could bring massive improvements to how quickly we can build apps–and also because Eric is a serious engineer and there are so many insights in this episode about computer science, software engineering, and project management.
Ep 844Future Projection with Tim O’Reilly
Tim O’Reilly’s book What’s the Future? is an overview of business, technology, and society. As the founder of O’Reilly Media, Tim has been steeped in technology trends for the last 40 years. From his vantage point running conferences and publishing technical content, Tim has been able to make informed predictions about what is coming next. In today’s conversation, Tim gave his perspective on how artificial intelligence will impact our world in the coming decades. More importantly–Tim emphasizes the role of human agency. The future is not something that merely happens to us as we sit back and eat popcorn. Today, we make decisions, and those decisions that we make could help lead us to technology utopia or towards the fall of our great technological empire. On the subject of business, Tim gave a radically different perspective than most of the entrepreneurs that come on Software Engineering Daily. In our conversation, he raised the question of why entrepreneurs raise massive amounts of money, get on the treadmill of startup hype, and build a company around negative cash flows. For that model, the only possible outcomes are going public, being acquired, or flaming out completely. O’Reilly Media has been cash flow positive since the beginning, and the company has steadily compounded, growing successively bigger businesses from publishing to conferences to online learning. This episode gave me a lot to think about–just as the O’Reilly Conferences have throughout the years. O’Reilly Media has graciously partnered with SE Daily since we were very small, so I have great admiration for the company and Tim himself. It was a pleasure to get the chance to meet him in person.
Ep 843Machine Learning Stroke Identification with David Golan
When a patient comes into the hospital with stroke symptoms, the hospital will give that patient a CAT scan, a 3-dimensional imaging of the patient’s brain. The CAT scan needs to be examined by a radiologist, and the radiologist will decide whether to refer the patient to an interventionist–a surgeon who can perform an operation to lower the risk of long-term damage to the patient’s brain function. After getting the CAT scan, the patient might wait for hours before a radiologist has a chance to look at the scan. In that period of time, the patient’s brain function might be rapidly degrading. To speed up this workflow, a company called Viz.ai built a machine learning model that can recognize whether a patient is at high risk of stroke consequences or not. Many people have predicted that radiologists will be automated away by machine learning in the coming years. This episode presents a much more realistic perspective: first of all, we don’t have nearly enough radiologists, so if we can create automated radiologists that would be a very good thing; second of all, even in this workflow with a cutting-edge machine learning radiologist, you still need the human radiologist in the loop. David Golan is the CTO at Viz.ai, and in today’s show he explains why he is working on a system for automated stroke identification, and the engineering challenges in building that system.
Ep 842Fintech Environment with Michael Walsh
Computer systems consume memory, CPU, battery, data, and network bandwidth as inputs. These systems provide value for the end user by delivering information, virtual objects, and physical products as outputs. Another fundamental resource that is becoming easier to consume as input is money. There are also new outputs–financial constructs that are made possible by cloud computing, machine learning, and cryptocurrencies. This is why so much opportunity exists in fintech. Money has always been a flexible tool for brokerage between humans. But as recently as the early 2000s, the interfaces between money and computers have been clunky and inflexible. Engineers that wanted to build financial systems around money had to work directly with banks and credit card processors. More recently, there has been an explosion in new APIs and completely new financial primitives like cryptocurrencies. In the year 2000, a well-funded team had to struggle to put together a basic ecommerce company. Today, a blue ocean of opportunity has opened up for entrepreneurs building businesses around lending, insurance, underwriting, banking, and every other microcosm of the financial system. Michael Walsh is a general partner and co-founder of Green Visor Capital. In today’s episode, he described his perspective on the modern fintech environment, and what the future holds.
Ep 841Kademlia: P2P Distributed Hash Table with Petar Maymounkov
Napster, Kazaa, and Bittorrent are peer-to-peer file sharing systems. In these P2P systems, nodes need to find each other. Users need to be able to search for files that exist across the system. P2P systems are decentralized, so these routing problems must be solved without a centralized service in the middle. Without a centralized service that has all the information in one place, how can you solve these problems of node discovery and file lookup? This is the central question that Petar Maymounkov sought to answer with Kademlia. Kademlia is a peer-to-peer distributed hash table. Kademlia implements the “put” and “get” operations of an efficiently scalable hash table without using any centralized service. Each node in the system maintains its own routing table. When a user queries the system (a “get” operation), that query is serviced by the nodes coordinating with each other to intelligently route the user to their target location. When a file is stored (a “put” operation), that update to the file system can propagate through the network in a decentralized, uncoordinated way. Petar joins the show to give a brief history of P2P networks, why he created Kademlia, and what he is working on today.
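To make the “put” and “get” operations a little more concrete, here is a toy Python sketch. It relies on one well-known property of Kademlia that the description above does not spell out: the distance between two IDs is their bitwise XOR, and a key is stored on the nodes whose IDs are closest to it. Everything else is a simplifying assumption for illustration. The `ToyDHT` class, its `replicas` parameter, and the tiny 4-bit IDs are hypothetical, and the sketch cheats by letting one object see every node, whereas real Kademlia nodes only know a partial routing table and find the closest nodes through iterative lookups.

```python
from typing import Dict, List, Optional


def xor_distance(a: int, b: int) -> int:
    """Kademlia's notion of distance between two IDs: their bitwise XOR."""
    return a ^ b


def closest_nodes(node_ids: List[int], target: int, k: int) -> List[int]:
    """Return the k node IDs closest to `target` under XOR distance."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]


class ToyDHT:
    """Illustrative only: a single object with global knowledge of all nodes."""

    def __init__(self, node_ids: List[int], replicas: int = 2) -> None:
        self.replicas = replicas
        # One private key/value store per node.
        self.storage: Dict[int, Dict[int, str]] = {n: {} for n in node_ids}

    def put(self, key: int, value: str) -> None:
        # Store the value on the nodes whose IDs are closest to the key.
        for node in closest_nodes(list(self.storage), key, self.replicas):
            self.storage[node][key] = value

    def get(self, key: int) -> Optional[str]:
        # Ask the same closest nodes; any one of them can answer.
        for node in closest_nodes(list(self.storage), key, self.replicas):
            if key in self.storage[node]:
                return self.storage[node][key]
        return None


if __name__ == "__main__":
    dht = ToyDHT([0b0001, 0b0100, 0b0111, 0b1100])
    dht.put(0b0101, "some-file")
    print(dht.get(0b0101))
```

Because both operations derive the responsible nodes from the key alone, no central index is needed to decide where a value lives.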
Ep 839Data Engineering Automation with Mike Kim
Every company has the idea of the “nightly report.” A business analyst comes into the office, sits down in front of their inbox, and looks at yesterday’s data. Did sales go up? Did the marketing campaigns bring in the expected number of customers? Was there an increase in helpdesk tickets? The statistics that these reports deliver to human analysts can change the direction of the business. Everyone within a company could use a regular report that documents how the business is changing over time. Outlier.ai is a company that processes the data sets within a business and generates automated reports that are relevant to different people within the organization. If you are an email marketing analyst, your data from MailChimp campaigns will be analyzed. If you manage a customer success team, your Zendesk tickets will be analyzed. If you are a technical support analyst, the crash reports and error messages from your users will be analyzed. In all of these cases, the data gets processed automatically, and a story is sent to you, so that you can have the information in your inbox waiting for you, instead of having to go ask a data scientist to generate it for you. Mike Kim is the CTO of Outlier.ai, and in this show he describes the engineering challenges of integrating with all the different data sets of an organization–and why there is so much value in the idea of the automated “report” or “story” for analysts. In past shows, we have explored how data engineering has progressed over the last twenty years–from database administration to Hadoop cluster management to the emergence of “data breadlines” where analysts wait for a data scientist to process the job they asked for. Outlier represents a step towards a world where the data science reports are delivered to us before we even ask, rather than us having to query the system.
Ep 838Chrome and Chromium with David Bokan
Chromium is an open source browser that shares code with the Chrome browser from Google. A browser is a large piece of software, with engineering challenges around threading, rendering, resource management, and networking. To add to the complexity, Chrome runs on iOS, Android, MacOSX, Windows, and other platforms. Chrome OS is an operating system based on Chrome. There is also Chromium OS, the open source version of Chrome OS. The Chrome/Chromium operating systems are based off of Linux. Through this entire episode, the line between browser and operating system is blurry. There is so much resource management involved in the Chrome browser that it has its own task manager. For many people (including myself) the browser is the main application you are interfacing with throughout the day. It handles all of your business applications. Even many of your desktop apps, such as Slack, are running on Electron, which is a framework for building cross-platform apps that uses Chromium. David Bokan is an engineer on the Chromium team at Google, and he joins the show to describe the engineering of Chrome and the development and release process. David also gives his thoughts on future developments for browsers, apps, and the Internet.
Ep 837Shopify Infrastructure with Niko Kurtti
Shopify runs more than 500,000 small business websites. When Shopify was figuring out how to scale, the engineering teams did not have a standard workflow for how to deploy and manage services. Some teams used AWS, some teams used Heroku, some teams used other infrastructure providers. To manage all those stores effectively, Shopify has built its own platform-as-a-service on top of Kubernetes called Cloudbuddies. Cloudbuddies was inspired by Heroku, and it allows engineers at Shopify to deploy services in an opinionated way that is perfect for Shopify. Niko Kurtti is a production engineer at Shopify, and he joins the show to describe Shopify’s infrastructure–how they run so many stores, how they distribute those stores across their infrastructure, and the motivation for building their own internal platform on top of Kubernetes.
Ep 836Function Platforms with Chad Arimura and Matt Stephenson
“Serverless” is a word used to describe functions that get deployed and run without the developer having to manage the infrastructure explicitly. Instead of creating a server, installing the dependencies, and executing your code, the developer just provides the code to the serverless API, and the serverless system takes care of the server creation, the installation, and the execution. Serverless was first offered with the AWS Lambda service, but has since been offered by other cloud providers. There have also been numerous open source serverless systems. On SE Daily, we have done episodes about OpenWhisk, Fission, and Kubeless. All of these are built on the Kubernetes container management system. Kubernetes is an open-source tool used to build and manage infrastructure, so it is a useful building block for higher level systems. Chad Arimura is the VP of serverless at Oracle, where he runs the Fn project, an open source serverless platform built on top of Kubernetes. In the past, he ran Iron.io, a message broker platform. Matt Stephenson also joins the show–he is a senior principal software engineer at Oracle and has experience from Amazon and Google, where he worked on Google App Engine (which was arguably one of the first “serverless” platforms). We discussed why there are so many different serverless tools built on Kubernetes, and the tradeoffs that these serverless tools are exploring.
Ep 835Build a Bank: Monzo with Richard Dingwall
When you interact with your bank, it probably feels different than when you interact with a software technology company. That’s because the biggest banks in the world were started before software became such a universally important tool. Their core competency is banking–not consumer software. Today, most banks make consumer-facing software. But these banks were not founded by engineers. The software development process at a typical bank does not look like the software development process at a software company like Netflix. Monzo is a digital bank that focuses on high quality engineering. Since it was started in 2015, Monzo has always thought of itself as a software company. This gives it certain advantages over older banks. Today’s guest Richard Dingwall is an engineer at Monzo, and he joins the show to describe Monzo’s software architecture, the engineering strategy, and its migration to Kubernetes. Richard has prior experience at several different banks and financial institutions.
Ep 834Browser Building with Osine Ikhianosime
Crocodile Browser is a fast browser built by Osine and Anesi Ikhianosime, a pair of brothers from Nigeria. I interviewed them 3 years ago, and in this episode I caught up with Osine to learn what he and his brother have been working on since then. Osine and Anesi have become friends of mine since we had a conversation several years ago. I met Osine for the first time at the Facebook F8 conference last year, and it was one of the first times I had met someone from another continent on the Internet, then got to hang out with them in person. There were some issues with network connectivity, so I decided to release this show on the weekend with no ads.
Ep 833Video Search with Rasty Turek
Searching through all of the videos on the Internet is not a simple problem. In order to search through all the videos, you need to build a search index. In order to build a search index, you need to build a web crawler. Video files are large. To store all of the actual video files would cost far too much money. In order to build an index in a cost-efficient manner, you need to have a way of storing information about a video without storing the entire video itself. You might be thinking “hasn’t Google already solved video search? Why are we even talking about this?” Google has solved some aspects of video search–but a different set of challenges is being tackled by a video search company called Pex. In order to explain what Pex is building, we should first explain the problem set they are trying to tackle. Videos across the internet are consumed on a variety of platforms such as YouTube, Instagram, Facebook, and Vimeo. These videos are sliced up, bootlegged, and repurposed from one platform to another. For content creators who earn their living from their hosted video streams, this can be a nightmare. Imagine you are a musician, and you make lots of money from music videos. You upload your cool new video to YouTube, and it instantly gets bootlegged by other users and shared across the internet in hundreds of different places. When people watch the stolen versions of your video, you are not getting compensated. If you could locate all of those stolen videos, you could order them to take it down, or claim the video so that you are paid for it. And here is the engineering problem–how can you find all those re-posted videos? By crawling the web and building a search index for every video on the web. Rasty Turek is the CEO of Pex, and in this episode he describes how to build a system that crawls the Internet and indexes videos. 
It’s a large scale engineering challenge, and there are lots of tradeoffs to be made between financial cost, speed, accuracy, and engineering complexity.
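One way to picture “storing information about a video without storing the entire video itself” is a compact fingerprint per frame. The Python sketch below uses a simple average hash, which is a generic technique chosen for illustration and not a description of how Pex actually works. The frame representation (small grids of grayscale values), the function names, and the `max_bits_per_frame` threshold are all assumptions made for this example.

```python
from typing import List

Frame = List[List[int]]  # a tiny grayscale frame, pixel values 0-255


def average_hash(frame: Frame) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the frame's mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def fingerprint(video: List[Frame]) -> List[int]:
    """A video's fingerprint is the list of its per-frame hashes."""
    return [average_hash(frame) for frame in video]


def looks_like_copy(fp_a: List[int], fp_b: List[int],
                    max_bits_per_frame: int = 1) -> bool:
    """Treat two equal-length fingerprints as a match if every frame is close."""
    if len(fp_a) != len(fp_b):
        return False
    return all(hamming(x, y) <= max_bits_per_frame for x, y in zip(fp_a, fp_b))


if __name__ == "__main__":
    original = [[[10, 200], [220, 30]], [[5, 5], [250, 250]]]
    # The same frames, slightly brightened, as a re-upload might be.
    reupload = [[[p + 5 for p in row] for row in frame] for frame in original]
    print(looks_like_copy(fingerprint(original), fingerprint(reupload)))
```

The fingerprint is a handful of integers rather than megabytes of video, which hints at the cost tradeoff described above. Handling clips that are trimmed, cropped, or re-encoded would need something considerably more robust than this.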
Ep 832Babel with Henry Zhu
Different browsers consume JavaScript in different ways. When a new version of JavaScript comes out, developers are eager to use the new functionality of that language version. But if you are writing frontend JavaScript code, that code needs to be interpretable by every browser that might consume it–whether the consumer is on an iPhone running Safari or a Windows machine running Internet Explorer 11. Babel is a transpiler for JavaScript. Babel allows new versions of JavaScript to be consumed by older browsers by translating new language features of JavaScript into code that is readable by an older JavaScript interpreter. Babel does this by parsing JavaScript code, creating an abstract syntax tree, and manipulating the AST to make that code comply with the old browser. Henry Zhu is a core maintainer of Babel and a full-time open source developer. In today’s episode, Henry explains how Babel works and its various applications. He also talks about life as a full-time open source developer, where he earns a living through Patreon and OpenCollective.
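The parse, transform, and regenerate pipeline described above can be sketched in a few lines. This example is an analogy and not Babel itself: it uses Python's standard `ast` module on Python source, and it rewrites the `**` operator into a call to `pow()` as a stand-in for translating a newer language feature into an older equivalent. The class and function names are made up for this illustration, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


class PowerToCall(ast.NodeTransformer):
    """Rewrite `a ** b` into `pow(a, b)`."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        # Transform nested expressions first, so `a ** b ** c` is fully rewritten.
        self.generic_visit(node)
        if isinstance(node.op, ast.Pow):
            call = ast.Call(
                func=ast.Name(id="pow", ctx=ast.Load()),
                args=[node.left, node.right],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node


def transpile(source: str) -> str:
    tree = ast.parse(source)          # 1. parse source into an AST
    tree = PowerToCall().visit(tree)  # 2. manipulate the AST
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)          # 3. generate source code again


if __name__ == "__main__":
    print(transpile("y = x ** 2 + 1"))
```

The rewritten program behaves the same as the original, which is the property a transpiler has to preserve while changing the syntax.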
Ep 831Database Reliability Engineering with Laine Campbell
Over the last decade, cloud computing made it easier to programmatically define what infrastructure we have running, and perform operations across that infrastructure. This is called “infrastructure as code.” Whether you want to back up a database, deploy a new version of a service, or introduce a new tier of load balancers, the changes that we make across our infrastructure can be done programmatically, instead of through a series of manual steps. As infrastructure got turned into code, operations people started working more like developers, and developers began to do the work of operations–a convergence known as “devops.” At Google, this “devops” movement was manifested in a role called “site reliability engineer.” In previous shows, we have explored site reliability engineering culture. Laine Campbell is a senior VP of engineering at Fastly, and the author of the book Database Reliability Engineering. In this book, Laine describes how the ideas of site reliability engineering can be extended to databases. Laine joins the show to discuss the book, and how engineering teams can build effective workflows around databases.
Ep 829Rust Networking with Carl Lerche
Rust is a systems programming language with a distinct set of features for safety and concurrency. In previous shows about Rust, we explored how Rust can prevent crashes and eliminate data races through its approach to type safety and memory management. Rust’s focus on efficiency and safety makes it a promising language for networking code. Tokio is a set of networking libraries built on Rust. Tokio enables developers to write asynchronous IO operations by way of its multithreaded scheduler. Tokio’s goal is to make production-ready clients and servers easy to create by focusing on small, reusable components. Carl Lerche is an engineer at Buoyant, a company that makes the popular Linkerd and Conduit service mesh systems. Kubernetes developers deploy service mesh to their distributed applications as sidecar proxies. These proxies need to be low-latency and highly reliable. In that light, it makes sense that Conduit (the more recent service mesh from Buoyant) is built using Rust. Carl joins the show to describe why Rust is useful for building networked services.
Ep 828Dremio Data Engineering with Tomer Shiran
Twenty years ago, all of the data in an organization could fit inside of relational databases. Imagine a company like Procter and Gamble. P&G is a consumer packaged goods company with hundreds of business sectors–shaving products, toothpaste, shampoo, laundry detergent. Twenty years ago, if the chief financial officer of P&G wanted to answer a question about the revenue projections within the enterprise, that CFO would ask a VP to find the answer. The VP would contact the business analysts in all the different departments within Procter and Gamble, and those business analysts would all work with database administrators to answer questions for their business sector. In that world, it might have taken weeks or months for the CFO to get the answer about revenue projections. Today, data engineering has improved dramatically. Data sets within an enterprise are updated more rapidly. The tooling has advanced thanks to the Hadoop project leading to a wide range of open source projects that feed into one another. But data problems across an enterprise still exist. Business analysts, data scientists, and data engineers struggle to communicate with each other. The CFO still can’t get a question about revenue projections answered instantly. Instead of instant answers, we live in a world of friction, batch processing, and monthly reports. And this is not just true of old enterprises like P&G. It is true of newer startups like Uber, Airbnb, and Netflix. It seems that no amount of engineers and financial windfall can completely cure the frictions of the modern data platform. Tomer Shiran started Dremio to address the long-lived problems of data management, data access, and data governance within an enterprise. Dremio connects databases, storage systems, and business intelligence tools together, and uses intelligent caching to make commonly used queries within an organization more readily accessible. 
Dremio is an ambitious project that spent several years in stealth before launching. In today’s episode, Tomer gives a history of data engineering, and provides his perspective on how the data problems within an organization can be diminished. Full disclosure: Dremio is a sponsor of Software Engineering Daily.
Ep 827 Digital Evolution with Joel Lehman, Dusan Misevic, and Jeff Clune
Evolutionary algorithms can generate surprising, effective solutions to our problems. Evolutionary algorithms are often let loose within a simulated environment. The algorithm is given a function to optimize for, and the engineers expect that algorithm to evolve a solution that optimizes for the objective function given the constraints of the simulated environment. But sometimes these results are not exactly what we are looking for. For example, imagine an evolutionary algorithm that tries to evolve a creature that can do a flip within a simulated physics engine that mirrors the real world. You could imagine all sorts of evolutionary traits. Maybe the creature will evolve to have legs that are like springs and that let the creature jump high enough to do a flip. Maybe the creature will develop normal legs with strong muscles that propel the creature high enough to flip. But you wouldn’t expect the creature to evolve to be extremely tall–so tall that the creature can merely lean over fast enough so that the top of its body flips upside down. In one experiment, this is exactly what happened. In another, similar experiment, the evolving creature discovered a bug in the physics engine of the simulated environment. This creature was able to exploit the problem with this physics engine to be able to move in ways that would not be possible in our real-world physical universe. Evolutionary algorithms sometimes evolve solutions in ways that we don’t expect. Researchers usually throw those results away, because they don’t contribute to the result that the researchers are looking for. The consequence is that lots of interesting anecdotes get lost. Joel Lehman, Dusan Misevic, and Jeff Clune are the lead authors of the paper “The Surprising Creativity of Digital Evolution.” The paper is a collection of anecdotes about strange results within the world of digital evolution.
They join the show to describe what digital evolution is, and some of the strange results that they surveyed in their paper. Joel and Jeff are engineers at Uber’s artificial intelligence division–so this topic has practical importance to them. Machine learning is all about evolution within simulated environments, and developing safe algorithms for AI requires an understanding of what can go wrong in a poorly defined evolutionary system.
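To make the idea of "a function to optimize for" concrete, here is a minimal sketch of an evolutionary loop. It is not taken from the paper or the episode: the objective (`fitness`), the mutation size, the population settings, and the seed are all assumptions chosen for illustration.

```python
import random


def fitness(x: float) -> float:
    """Hypothetical objective function: highest (zero) when x equals 3."""
    return -(x - 3.0) ** 2


def evolve(generations: int = 200, offspring: int = 10, seed: int = 0) -> float:
    """Evolve a single number toward the maximum of `fitness`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    parent = rng.uniform(-10.0, 10.0)
    for _ in range(generations):
        # Mutation: each child is the parent plus a little Gaussian noise.
        children = [parent + rng.gauss(0.0, 0.5) for _ in range(offspring)]
        # Selection: keep the fittest of the parent and its children.
        parent = max([parent] + children, key=fitness)
    return parent


best = evolve()
print(f"best x = {best:.3f}, fitness = {fitness(best):.5f}")
```

Selection here only ever looks at the number returned by `fitness`. If that function rewards something the engineers did not intend, the loop will exploit it, which is the kind of surprise the anecdotes above describe.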
Ep 826 Hacking Your Short-Term Rental with Jeremy Galloway
If you have ever stayed in a short-term rental (like an Airbnb, HomeAway, or CouchSurfing), you have probably used the wifi network at that rental property. Why wouldn’t you? It’s no different than hopping on an open wifi network at an airport, or a Starbucks, or your friend’s house, right? One major difference: the hardware is easily accessible to previous guests at the short-term rental. Previous guests could tamper with the software on a router, and use that tampering to surveil or attack later guests. Jeremy Galloway is a security engineer at Atlassian. In today’s show, he explains the risk of using wifi at a short-term rental like an Airbnb–including an explanation of how easy it is to take over a wifi network as a guest at a rental property. A broader point we discuss: large attack surfaces are difficult to secure, whether we are talking about Airbnb, another sharing economy app like Uber, a large corporate network like Atlassian, or even your own personal life. Jeremy offers some best practices and philosophies for how to respond to the modern world of security.
Ep 825 Postgres Sharding and Scalability with Marco Slot
Relational databases have been popular since the 1970s, but in the last 20 years the amount of data that applications need to collect and store has skyrocketed. The raw cost to store that data has decreased. There is a common phrase in software companies: “it costs you less to save the data than to throw it away.” Saving the data is cheap, but accessing that data in a useful way can be expensive. Developers still need rapid row-wise and column-wise access to the data. Accessing an individual row of a database can be useful if a user is logging in and you want to load all of that user’s data, or if you want to update a banking system with a new financial transaction. Accessing an entire column of a database can be useful if you want to aggregate summaries of all of the entries in a system–like the sum of all financial transactions in a bank. These different kinds of transactions are nothing new, but with the growing scale of data, companies are changing their mentality from thinking in terms of individual databases to thinking about distributed “data platforms.” In a data platform, the data across a company might be put into a variety of storage systems–distributed file systems, databases, in-memory caches, search indexes–but the API for the developer is kept simple. And the simplest, most commonly understood language is SQL. Marco Slot is an engineer with Citus Data, a company that makes Postgres scalable. Postgres is one of the most common relational databases, and in this episode Marco describes how Postgres can be used to service almost all of the needs of a data platform. This isn’t easy to do, as it requires sharding your growing relational database into clusters and orchestrating distributed queries between those shards. In this show, Marco and I discuss Citus’s approach to the distributed systems problems of a sharded relational database. 
This episode is a nice complement to previous episodes we have done with Ozgun and Craig from Citus, in which they gave a history of relational databases, and explained how Postgres compares to the wide variety of relational databases out there. Full disclosure: Citus Data is a sponsor of Software Engineering Daily.
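The row-wise and column-wise access patterns in this description, and the idea of spreading rows across shards, can be sketched with a toy model. This is not how Citus or Postgres implements sharding: the shard count, the CRC32 hash, the `user_id` key, and every function name below are assumptions made up for illustration.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count for this sketch

# Each shard is a list of rows; a row is a dict of column name -> value.
shards = [[] for _ in range(NUM_SHARDS)]


def shard_for(user_id: str) -> int:
    """Map a key to a shard with a stable hash (CRC32, chosen for simplicity)."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS


def insert(row: dict) -> None:
    """Store the row on the shard that owns its user_id."""
    shards[shard_for(row["user_id"])].append(row)


def rows_for_user(user_id: str) -> list:
    """Row-wise access: only the one shard that owns the key is scanned."""
    return [r for r in shards[shard_for(user_id)] if r["user_id"] == user_id]


def total_amount() -> int:
    """Column-wise aggregate: each shard sums locally, then the sums are combined."""
    partial_sums = [sum(r["amount"] for r in shard) for shard in shards]
    return sum(partial_sums)


for user_id, amount in [("alice", 30), ("bob", 12), ("alice", 5), ("carol", 8)]:
    insert({"user_id": user_id, "amount": amount})

print(rows_for_user("alice"))
print(total_amount())
```

The lookup for one user touches a single shard, while the sum has to visit every shard and merge partial results. That difference is a small-scale version of the distributed query orchestration the episode discusses.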
Ep 824 Necto: Build an ISP with Adam Montgomery
In the tech industry, we have all grown to fear “lock-in.” Lock-in is a situation in which you have no choice but to pay a certain provider for some aspect of your computer services. Since computers are so fundamental to our lives, we sometimes have no choice but to pay the provider of that lock-in service. Think of a few service providers in your life who have no serious competition. What is your relationship to that service provider? Do you feel like you are paying too much money? Do you wish that you could switch? This is how many people feel about their Internet service provider. An Internet Service Provider is the company that provides you with the “last mile” of physical infrastructure that connects you to the rest of the Internet. Different forms of ISP include cable ISPs, satellite ISPs, fiber ISPs, and copper/DSL ISPs. The medium of delivery varies, but the functionality is the same. This company is crucial to your Internet access. In many geographic locations, there are very restricted options for which ISP you could use. Why is that? Many people assume that there is some physical or regulatory barrier to starting an ISP. In fact, there are fewer barriers than you might think. Adam Montgomery is a co-founder of Necto, a company that provides an ISP starter kit. If you want to start your own ISP in an apartment building or in your neighborhood or wherever you are, the Necto ISP starter kit can help you get off the ground. That might sound like a crazy idea, but in this episode Adam explains why it is not so crazy–why the technology around ISPs is more broadly accessible than many people believe.
Ep 823 Bitcoin Lightning Network with Jameson Lopp
Big blocks or small blocks: this is the fundamental question of Bitcoin scalability. The argument for big blocks is also known as “on-chain scalability.” Under this strategy, each block in the append-only chain of Bitcoin transaction blocks would grow in size to be able to support lower transaction fees and higher on-chain throughput. A set of Bitcoin users who supported this idea forked Bitcoin to create Bitcoin Cash, a version of Bitcoin that has a larger block size. The argument for small blocks asserts that scaling Bitcoin does not require a larger block size. Under this model, the scaling demands of the Bitcoin blockchain would be handled off-chain by payment channels. A payment channel lets two parties transact directly with each other and only reconcile with the Bitcoin blockchain to settle batches of transactions. These channels can be connected together to form the “lightning network.” Lightning network is hard to implement. Implementing a lightning network requires solving real-world distributed systems problems that are unprecedented. It’s much more complicated than deploying a blockchain with a larger block size. In addition, opponents of lightning network suggest that this will lead to a centralized banking system being constructed on top of Bitcoin. Opponents of lightning network fear that instead of a decentralized payments network, the world with lightning network will be a lower cost version of the present financial system, in which JP Morgan and Blockstream partner up to battle Coinbase in a centralized war for control of the unbanked. These big blockers argue that the new banks on the lightning network will be just like the old banks–censorious of transactions and held in the domineering palms of the global financial kleptocracy. So why bother with the lightning network approach? Why are we building this inelegant, kludgey system of off-chain, potentially centralized banking 2.0 complexity?
Why not just increase the block size indefinitely and keep things simple? And even if we increased the block size today, couldn’t we still deploy lightning network in the future while accommodating the transaction volume of today? One major reason is that growing the block size does have a cost. The bigger the block size, the more demands it places on any node that wants to maintain a record of those blocks. And if you grow the block size today, you forgo the experiment of seeing whether a small block size plus lightning network could in itself handle the transaction volume of a global financial system. The framing of “big blockers versus small blockers” is a conveniently polarized reduction of a much more granular reality. To believe that there is no subtlety between the two sides of this debate is to underestimate the number of dimensions to this argument. It’s an unfortunate side effect of rigidly programmed Twitter bots, and a political atmosphere in which your lines in the sand are demarcated by which subreddit you choose to affiliate with. That said–my impression is that the more experienced engineers are overwhelmingly on the side of small blocks plus lightning network as the most promising approach to scaling Bitcoin. Take whatever side of the debate you want. A single line of Bitcoin Core code speaks much louder than an avalanche of tweets. In today’s episode, Jameson Lopp joins the show to explain why lightning network is an appealing engineering construct. We play the devil’s advocate and contrast lightning network with a big block approach, as well as a big block plus lightning network approach. Jameson also describes his experience working within the Ethereum ecosystem, and gives a sober explanation of some of the issues that Ethereum scalers may themselves encounter.
Ep 821 Investment Games with Brian Singerman
Investing is an infinite game. In a game, a player can formulate a strategy based on the available resources, the apparent variance of the environment, and the metagame of the other actors involved. For an investor, the game board includes companies, currencies, and people. A successful game player can model their actions mathematically. They can describe a thesis for an in-game decision with clear language. Game players who reason through “gut feeling” do not perform well (unless their “gut” is aligned with correct mathematical heuristics). The same is true for investors. An investor who is going to be successful in the long term will be able to explain their investment thesis crisply. Each investment represents a bet with net positive expected value. The expected value of an investment is the sum of all potential future outcomes of a business, each weighted by its probability. Each term in that sum is the value of an anticipated outcome times the probability that the investment works out in that way. Brian Singerman is a computer scientist and partner at Founders Fund. He is on the board of Affirm, AltSchool, Emerald Therapeutics, and a variety of other companies in disparate areas. He also plays lots of board games. Brian was a lot of fun to talk to because he was willing to field questions from an expansive range of topics–and he answered them so quickly and concisely that I started to get nervous that I was going to run out of things to ask him. Many of the businesses Brian has invested in do not have a well-defined historical precedent. If a venture capital investor were trying to make bets in defined “sectors,” that investor would probably overlook a business like Forward (a vertically integrated healthcare company) or Cloud9 (a collection of esports teams). If an investment does not have a historical precedent, it’s harder to reason about it by analogy.
You have to judge it by fundamental reasoning: the current market, the capability of the founders, and the economics of the business model. In many professions, reasoning by analogy will work out perfectly fine. You can pattern match on the past, and use that to justify decisions for the future. But if your professional livelihood depends on reasoning by fundamental principles, you get trained to assess situations that do not have precedent.
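The expected-value definition in this episode description can be worked through with a small sketch. The scenarios, probabilities, and payoffs below are invented for illustration and do not come from the episode; the function sums payoff times probability over a set of outcomes whose probabilities are assumed to add up to 1.

```python
import math


def expected_value(outcomes):
    """Return the probability-weighted sum of payoffs.

    `outcomes` is a list of (probability, payoff) pairs. The probabilities
    are assumed to cover every scenario, so they must sum to 1.
    """
    total_probability = sum(p for p, _ in outcomes)
    if not math.isclose(total_probability, 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# Hypothetical bet: payoffs are multiples of the amount invested.
hypothetical_bet = [
    (0.70, 0.0),   # company fails, the investment is lost
    (0.25, 2.0),   # modest exit returns 2x
    (0.05, 40.0),  # rare breakout returns 40x
]

ev = expected_value(hypothetical_bet)
print(f"expected multiple: {ev:.2f}")
```

With these made-up numbers the expected multiple is 2.5, so the bet has positive expected value even though the single most likely outcome is a total loss.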