
Software Engineering Daily
2,188 episodes — Page 29 of 44
Ep 864Jailbreaking Apple Watch with Max Bazaliy
Apple operating systems are closed source. This closed source nature underpins Apple’s extremely successful business model–and a very different software developer ecosystem than Linux-based systems. Since Linux is open source, information on how to manipulate the system at a low level is very public. The lack of information about low-level programming in Apple operating systems has given rise to a large “jailbreaking” community, in which people reverse engineer how the closed source systems function. In today’s episode, Max Bazaliy joins the show to describe how he reverse engineered an Apple Watch. Jailbreaking an Apple Watch is a complex security challenge, as he describes in detail. Max is a security researcher at Lookout, a mobile security company.
Ep 863Edge Kubernetes with Venkat Yalla
“Edge computing” is a term used to define computation that takes place in an environment outside of a data center. Edge computing is a broad term. Your smartphone is an edge device. A self-driving car is an edge device. A security camera with a computer chip is an edge device. These “edge devices” have existed for a long time now, but the term “edge computing” has only started being used more recently. Why is that? It is mostly because the volume of data produced by edge devices, and the type of computation that we want from edge devices, is changing. We want to develop large sensor networks to enable smart factories and smart agriculture fields. We want our smartphones to have machine learning models that get updated as frequently as possible. We want to use self-driving cars, and drones, and smart refrigerators to develop elaborate mesh networks–and perhaps even have micropayments between machines, so that computation can be offloaded from edge devices to a nearby mesh network for a small price. Kubernetes is a tool for orchestrating distributed, containerized computation. Just as Kubernetes is being widely used for data center infrastructure, it can also be used to orchestrate computation among on-premises nodes at a factory, or in a smart agriculture environment. In today’s episode, Venkat Yalla from Microsoft joins the show to talk about Kubernetes at the edge, and how Internet of Things applications can use Kubernetes for their deployments today–and what the future might hold. Full disclosure: Microsoft is a sponsor of SE Daily.
Ep 862React Native at Airbnb with Gabriel Peal
React Native allows developers to reuse frontend code between mobile platforms. A user interface component written in React Native can be used in both iOS and Android codebases. Since React Native allows for code reuse, it can save time for developers, in contrast to a model where completely separate teams have to create frontend logic for iOS and Android. React Native was created at Facebook. Facebook itself uses React Native for mobile development, and contributes heavily to the open source React Native repository. In 2016, Airbnb started using React Native in a significant portion of its mobile codebase. Over the next two years, Airbnb saw the advantages and the disadvantages of adopting the cross-platform, JavaScript-based system. After those two years, the engineering management at Airbnb decided to stop using React Native. Gabriel Peal is an engineer at Airbnb who was part of the decision to move off of React Native. Gabriel wrote a blog post giving the backstory for React Native at Airbnb, and he joins the show to give more detail on the decision.
Ep 860Ghost: Open Source Publishing Platform with John O’Nolan
Blogging is more than 20 years old. Over that period of time, numerous publishing platforms have been created. Squarespace, Blogger, Medium, and Twitter are popular closed source platforms. WordPress has been the most popular open source blogging platform–and much of the Internet (including Software Engineering Daily) runs on WordPress. WordPress is a powerful platform. News companies, ecommerce websites, and many other kinds of businesses use WordPress as their central publishing tool. But WordPress has been around for 15 years–and there are some potential conflicts of interest between WordPress the open source project and WordPress.com (a company started to host WordPress websites). John O’Nolan was working as a WordPress developer when he decided to start a new publishing platform called Ghost. Five years later, the Ghost project is a success–with a thriving open source community, a profitable SaaS business, and companies like DigitalOcean and Mozilla using Ghost to host their blogs. John and I discussed his background with WordPress, what he wanted to do differently with Ghost, and the software architecture of Ghost. We also touched on the Ghost SaaS business and the management of the open source project.
Ep 859Video Games and Funding Techniques with Howard Marks
Howard Marks ran two video game companies in the ’90s: Activision and Acclaim. While running these companies, he developed a love for entrepreneurship that he maintains today. Howard is the CEO of StartEngine, a company that functions as an accelerator, a crowdfunding platform, and an ICO launcher. Howard joins the show to talk about his background as an entrepreneur, as well as some modern alternative funding mechanisms that he’s working on at StartEngine. Hearing Howard’s thoughts on building a video game company in the ’90s was particularly new to me–it’s an era of software development that we have not covered much at all. As a side note–some listeners have asked recently why we cover subjects such as ICOs, when there have been so many dubious companies that have launched ICOs. There are two reasons why we cover this area. First, cryptocurrencies are a breakthrough computer science construct. It’s important for us to try to understand their implications. The other reason we cover ICOs is that some technology companies require high upfront capital costs. The amount of capital you have available affects the speed at which your engineering team can move. New funding mechanisms could mean more capital for certain types of software companies–and this could be a good thing or a bad thing, depending on the company.
Ep 858Video Machine Learning with Ben Dodson
Video streaming platforms like Netflix offer a convenient way to watch video content. We are now able to watch our favorite TV shows, movies, or content creators on a range of devices. However, buffering while watching videos can be a painful experience on mobile phones and tablets that use 4G or LTE. As streaming becomes available to a wider range of devices with varying bandwidth restrictions, different encodings of the video need to be created for different devices and different bandwidth situations. To get the best possible viewing quality with the available bandwidth, there needs to be a balance between the resolution of the video and the bitrate, which determines how much data the video consumes. Mux is a company that builds video hosting and analytics tools. Ben Dodson is a data scientist at Mux who built a system for optimizing the bitrate of videos through machine learning. In this episode we discuss video encoding and how Mux solved the problem of serving the highest quality video with the ideal bitrate.
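As a rough illustration of the resolution/bitrate balance described above, here is a toy rendition selector in Python. The ladder of resolutions and bitrates is invented for illustration–it is not Mux’s actual encoding ladder or selection algorithm.

```python
# Toy adaptive-bitrate selector: pick the highest-quality rendition
# whose bitrate fits within the viewer's measured bandwidth.
# The ladder values below are illustrative, not a real encoder's output.

RENDITIONS = [
    # (height in pixels, bitrate in kbps), sorted best-first
    (1080, 5000),
    (720, 2500),
    (480, 1000),
    (360, 600),
]

def pick_rendition(bandwidth_kbps, headroom=0.8):
    """Choose the best rendition that uses at most `headroom`
    of the available bandwidth, falling back to the lowest."""
    budget = bandwidth_kbps * headroom
    for height, bitrate in RENDITIONS:
        if bitrate <= budget:
            return (height, bitrate)
    return RENDITIONS[-1]  # lowest quality as a last resort

print(pick_rendition(4000))  # a decent LTE connection -> (720, 2500)
print(pick_rendition(700))   # congested mobile link -> (360, 600)
```

The machine learning angle discussed in the episode goes further: instead of a fixed ladder, the encoding settings themselves can be tuned per title.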
Ep 857Kubernetes in the Enterprise with Aparna Sinha
Enterprises want to update their technology faster. One way an enterprise can accelerate the adoption of new tools is to move more aggressively towards the cloud. By giving internal developers access to the cloud, it becomes easier to provision new servers–allowing for rapid experimentation, test environments, and scalability. In previous shows we have explored how large enterprises successfully learn to move their technology faster. Much of this process is rooted in being able to experiment quickly–which requires well-defined testing procedures, and the ability to quickly provision and destroy infrastructure. Many enterprises have large on-premise infrastructure deployments. An enterprise’s movement towards the cloud can be made complex by this existing set of servers. In today’s show, Aparna Sinha discusses how Kubernetes is useful for enterprises–and how it can improve development speed, experimentation, and observability. Aparna is the leader of the product team for Kubernetes and Container Engine at Google. Much of her job is centered around understanding what would be useful to enterprises who are choosing a cloud provider. The open source version of Kubernetes is useful on its own, but most enterprises choose a managed provider of Kubernetes–such as Google Kubernetes Engine–to help with support and onboarding. Full disclosure: Google is a sponsor of Software Engineering Daily.
Ep 856WebAssembly with Lin Clark
JavaScript has been the exclusive language of the web browser for the last 20 years. Whether you use Chrome, Firefox, Internet Explorer, or Safari, your browser interprets and executes code in a virtual machine–and that virtual machine only runs JavaScript. Unfortunately, JavaScript is not ideal for every task we want to perform in the browser. Think about the use cases where you need to use software outside of the browser: video editing, music production, 3D art, video games. These applications require a high degree of performance that is hard to get from raw JavaScript. WebAssembly was created to get better performance on the web. WebAssembly allows code from other languages to be compiled and run in the browser. With WebAssembly, languages such as C, C++, and Rust can be used to achieve major performance gains. WebAssembly is still under development, and more programming languages will eventually be supported. Lin Clark is an engineer on the Mozilla Developer Relations team, and has been working closely on the WebAssembly project. She is the author of a detailed series of illustrated blog posts that explain how WebAssembly works. In this episode, we discuss how WebAssembly came to be, its advantages over a web driven purely by JavaScript, what is possible with WebAssembly, and its engineering implementation.
Ep 855Botchain with Rob May
“Bots” are becoming increasingly relevant to our everyday interactions with technology. A bot sometimes mediates the interactions of two people. Examples of bots include automated reply systems, intelligent chat bots, classification systems, and prediction machines. These systems are often powered by machine learning systems that are black boxes to the user. Today’s guest Rob May argues that these systems should be auditable and accountable, and that using a blockchain-based identity system for bots is a viable solution to the machine learning auditability problem. Rob is the CEO of Talla, a knowledge base provider for business teams. The Botchain project was spun out of Talla as a solution to the problem of bot identity. In this episode, we talk about Botchain and the application of blockchain to bot identity, the current state of ICOs, and the viability of utility token ecosystems. Botchain has its own cryptotoken called “Botcoin.”
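To make the auditability idea concrete, here is a minimal Python sketch of a tamper-evident action log for a bot, using a generic hash chain. This illustrates the general blockchain technique, not Botchain’s actual protocol; the bot name and actions are invented.

```python
import hashlib
import json

def chain_append(log, bot_id, action, prev_hash):
    """Append a bot action to a tamper-evident log: each entry
    commits to the hash of the previous entry, so rewriting
    history invalidates every later hash."""
    entry = {"bot": bot_id, "action": action, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
    log.append((entry, digest))
    return digest

def verify(log):
    """Walk the chain, recomputing every hash; any edit breaks it."""
    prev = "genesis"
    for entry, digest in log:
        if entry["prev"] != prev:
            return False
        if hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest() != digest:
            return False
        prev = digest
    return True

log = []
h = chain_append(log, "support-bot", "answered a ticket", "genesis")
h = chain_append(log, "support-bot", "escalated a ticket", h)
print(verify(log))  # True
log[0][0]["action"] = "deleted a ticket"  # tamper with history
print(verify(log))  # False
```

A real system adds signatures tying each entry to a bot’s identity; the hash chain alone only makes tampering detectable, not attributable.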
Ep 853Build a Bank: N26 with Pat Kua
Banking has been a part of the economy for 600 years, and it has always been evolving. The most recent evolution: the financial industry has been going digital. Newer “fintech” companies have created innovative ways of doing everything related to money–from friendly payments to budgeting; from business transactions to insurance. However, the traditional banks themselves have been relatively slow to adjust to these digital changes, creating the opportunity for natively digital banks–often referred to as “challenger banks.” N26 is a digital-first bank established in Berlin that is active in 17 European countries and has a million users. Pat Kua is the CTO at N26. Digital banks are hosted on the cloud, with no physical branch for users to visit to perform an operation. Users access their accounts through a mobile application and can complete everything online, from opening an account to performing transactions. This system has numerous advantages, like simplicity for the user and, from the bank’s perspective, higher scalability in terms of users. In this episode, we discuss the advantages of digital banks, the fintech industry, and N26 itself. We also explore product development and how Pat manages his time as CTO–which is a useful discussion for anyone who is learning to be a technical leader or manager.
Ep 852Git Vulnerability with Edward Thomson
Git is a distributed version control system. Git is extremely reliable, fast, and secure, hardened by more than a decade of use as one of the most widely deployed pieces of open source software. But even battle-tested software can have vulnerabilities. In this episode, we explore a subtle git vulnerability that could have potentially led to git users executing malicious scripts when they intended to simply pull a repository. Today’s guest Edward Thomson is a program manager at Microsoft, and a maintainer of libgit2, a C implementation of git. He also writes about git and hosts the podcast All Things Git. He is passionate about git development, which gave me a deeper perspective on something that I just consider a tool. But the only reason that tool is so good–the only reason it fades into the background–is because there are people who are passionate enough to work on it on a regular basis. We also spent some time talking about the vulnerabilities that can spread through shared code environments–particularly in the realm of git, npm, and PHP. And we touched on how deployment workflows around git and Kubernetes are changing. Full disclosure: Microsoft, where Edward works, is a sponsor of Software Engineering Daily.
Ep 851Counting People with Andrew Farah
If you operate a restaurant, you want to know how many people are inside your restaurant at any given time. You also want to be able to know your occupancy if you operate a movie theater, coffee shop, or apparel store. Knowing how many people are in your building can answer several business-related questions. Do you need to unlock an additional entrance? Should you open another store? Do you really need a building this big? This might sound like a simple question, but how do you solve the problem of counting people inside of a building? A naive approach to counting people is to use video cameras and count the number of people entering and exiting the building. Machine learning algorithms are good at classifying humans. But the downside of this is that you have to put cameras anywhere you want a people-counter. There are many situations where you would want to count the number of people where a camera is not acceptable. What if you wanted to count people in a privacy preserving way? What if you wanted to obscure any identifiable traits of a person that you were counting? Density is a device for counting people. It sits above a doorway and counts the people who are entering or exiting the building. Andrew Farah is the CEO at Density, and in today’s episode he explains why the problem of counting people is harder than it sounds, and how the Density people counter functions.
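At the device level, the counting problem reduces to integrating entry and exit events over a doorway. A toy occupancy tracker in Python (not Density’s actual firmware, which must also cope with simultaneous crossings, lingering, and sensor noise):

```python
class OccupancyCounter:
    """Toy doorway counter: integrates entry/exit events into a
    running occupancy figure."""

    def __init__(self):
        self.occupancy = 0

    def record(self, event):
        if event == "in":
            self.occupancy += 1
        elif event == "out":
            # Clamp at zero: a missed entry event should not
            # drive the count negative.
            self.occupancy = max(0, self.occupancy - 1)
        return self.occupancy

counter = OccupancyCounter()
for event in ["in", "in", "in", "out", "in"]:
    counter.record(event)
print(counter.occupancy)  # 3
```

The hard part Andrew describes in the episode is upstream of this loop: reliably classifying a doorway crossing as “in” or “out” from anonymous sensor data in the first place.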
Ep 850Machine Learning Deployments with Diego Oppenheimer
Machine learning models allow our applications to perform highly accurate inferences. A model can be used to classify a picture as a cat, or to predict what movie I might want to watch. But before a machine learning model can be used to make these inferences, the model must be trained and deployed. In the training process, a machine learning model consumes a data set and learns from it. The training process can consume significant resources. After the training process is over, you have a trained model that you need to get into production. This is known as the “deployment” step. Deployment can be a hard problem. You are taking a program from a training environment to a production environment. A lot can change between these two environments. In production, your model is running on a different machine–which can lead to compatibility issues. If your model serves a high volume of requests, it might need to scale up. In production, you also need caching, and monitoring, and logging. Large companies like Netflix, Uber, and Facebook have built their own internal systems to control the pipeline of getting a model from training into production. Companies who are newer to machine learning can struggle with this deployment process, and these companies usually don’t have the resources to build their own machine learning platform like Netflix. Diego Oppenheimer is the CEO of Algorithmia, a company that has built a system for automating machine learning deployments. This is the second cool product that Algorithmia has built, the first being the algorithm marketplace that we covered in an episode a few years ago. In today’s show, Diego describes the challenges of deploying a machine learning model into production, and how that product was a natural complement to the algorithms marketplace. Full disclosure: Algorithmia is a sponsor of Software Engineering Daily.
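A minimal sketch of the serving side, in Python: wrapping a trained model with caching and basic request monitoring. This is a generic illustration of the concerns named above, not Algorithmia’s system; the stand-in “model” is just a function.

```python
from functools import lru_cache

class ModelServer:
    """Minimal sketch of a model-serving wrapper: caching for
    repeated inputs and a counter for basic monitoring. A real
    deployment adds logging, batching, autoscaling, and versioning."""

    def __init__(self, model_fn):
        self.requests = 0
        # Cache inference results for repeated inputs.
        self._cached = lru_cache(maxsize=1024)(model_fn)

    def predict(self, x):
        self.requests += 1
        return self._cached(x)

# Stand-in for a trained model: "classify" pictures by filename.
server = ModelServer(lambda name: "cat" if "cat" in name else "other")
print(server.predict("cat_photo.jpg"))  # cat
print(server.predict("cat_photo.jpg"))  # cat (served from cache)
print(server.requests)                  # 2
```

The compatibility issues mentioned above live outside this sketch: the real `model_fn` drags along framework versions and hardware assumptions that differ between the training and production machines.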
Ep 849Ballerina Language with Tyler Jewell
Modern programming requires lots of integration between APIs. Some of these integrations are trivial–such as using Twilio or Stripe. But there are many more complex integrations. For example, when a large company acquires a smaller company, the acquiring company might want to integrate with that smaller company to leverage the synergies between the two companies. How do you build clean communication patterns between the services of one company and another? Two teams within a single enterprise can also have integration issues. One team might have a different data model than the other team. One team might be using JSON and the other using XML. In these cases, integrations between APIs can take considerable time. Ballerina is a programming language that is designed for writing integrations. Ballerina is made for building services that allow two APIs to communicate easily–in contrast to other patterns of API integration such as those involving an enterprise service bus. Tyler Jewell is the CEO of WSO2, a company that specializes in integrations. WSO2 created the Ballerina language and is investing heavily into it (with ~80 people working on Ballerina today). In this episode we explored integrations–and why this problem required creating a new programming language. BallerinaCon is July 18th in San Francisco and our listeners can attend for free–use code BalCon-SEDaily.
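As a small illustration of the data-model translation involved when one team speaks JSON and another speaks XML, here is a Python adapter sketch. The field names are invented for illustration; Ballerina expresses this kind of integration natively, while this sketch only shows the underlying problem a general-purpose language solves by hand.

```python
import json
import xml.etree.ElementTree as ET

def json_order_to_xml(payload):
    """Translate one team's (hypothetical) JSON order format into
    another team's (hypothetical) XML format, including a unit
    conversion from cents to a decimal amount."""
    order = json.loads(payload)
    root = ET.Element("Order", id=str(order["order_id"]))
    ET.SubElement(root, "Customer").text = order["customer"]
    ET.SubElement(root, "Total").text = f'{order["total_cents"] / 100:.2f}'
    return ET.tostring(root, encoding="unicode")

payload = json.dumps({"order_id": 42, "customer": "Acme", "total_cents": 1999})
print(json_order_to_xml(payload))
# <Order id="42"><Customer>Acme</Customer><Total>19.99</Total></Order>
```

Multiply this adapter by every pair of services with mismatched data models, and the motivation for a language with integration as a first-class concern becomes clearer.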
Ep 847Flutter in Practice with Randal Schwartz
Flutter allows developers to build cross-platform mobile apps. In our previous show about Flutter, Eric Seidel from Google described the goals of Flutter, why he founded the project, and how Flutter is built. In today’s show, Randal Schwartz talks about Flutter in more detail–including the developer experience of building Flutter apps and why he finds Flutter so exciting. Randal is a longtime software developer who has focused mostly on web applications. He’s also the host of the popular podcast FLOSS Weekly, a show about open source software. Like many developers, Randal has stayed away from mobile development in the past. If you write a mobile application, you have historically had to build iOS and Android apps separately. That is a large up-front cost, and Flutter reduces the cost by allowing users to develop mobile apps for iOS and Android with a single codebase. Randal has been podcasting about open source software for 12 years, so he brings historical depth to this conversation. He has also been working with Dart for several years–Dart is a language developed by Google that is used in Flutter.
Ep 846Build a Bank: Nubank with Edward Wible
Nubank was started in 2013 with a credit card that was controlled through a mobile app. At the time, it was the first service in Brazil that allowed customers to do banking without going to a physical bank branch. Since then, Nubank has expanded into additional financial services and currently has 850 employees working in Brazil. Edward Wible is a co-founder and CTO of Nubank, and in this episode he discusses his work growing Nubank from a small team of fewer than 10 people into a company with almost 1000 people. We have covered two other banks in the past few weeks: Monzo and N26. In terms of software engineering and product management, Nubank is similar to Monzo and N26. One characteristic that stood out was Nubank’s use of Clojure, a functional programming language built on the JVM. A question that came up during this show: what is the line between “fintech company” and “bank”? We hope to explore this more in future shows about the intersection of money and software. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
Ep 845Flutter with Eric Seidel
Flutter is a project from Google that is rebuilding user interface engineering from the ground up. Today, most engineering teams have dedicated engineering resources for web, iOS, and Android. These different platforms have their own design constraints, their own toolset, and their own programming languages. But each platform is merely building a user interface. Why should development across these three user surfaces be so different? This was the question that Eric Seidel was asking himself three years ago, when he co-founded the Flutter project. The Flutter project had a few rough starts, as the team tried to figure out exactly what layer of abstraction they were trying to provide. Around that time, ReactJS and React Native were growing in popularity. Seeing the React projects provided some data points, and some inspiration. But Flutter takes a lower level approach to cross platform app development, by presenting a rendering layer and a runtime API that are interfacing with the hardware in the same way that OpenGL does. In today’s episode, Eric joins the show to explain how the Flutter project came to life, and his lessons from starting an ambitious project that took several years to pick up steam. I enjoyed this episode because Flutter could bring massive improvements to how quickly we can build apps–and also because Eric is a serious engineer and there are so many insights in this episode about computer science, software engineering, and project management.
Ep 844Future Projection with Tim O’Reilly
Tim O’Reilly’s book What’s the Future? is an overview of business, technology, and society. As the founder of O’Reilly Media, Tim has been steeped in technology trends for the last 40 years. From his vantage point running conferences and publishing technical content, Tim has been able to make informed predictions about what is coming next. In today’s conversation, Tim gave his perspective on how artificial intelligence will impact our world in the coming decades. More importantly–Tim emphasizes the role of human agency. The future is not something that merely happens to us as we sit back and eat popcorn. Today, we make decisions, and those decisions that we make could help lead us to technology utopia or towards the fall of our great technological empire. On the subject of business, Tim gave a radically different perspective than most of the entrepreneurs that come on Software Engineering Daily. In our conversation, he raised the question of why entrepreneurs raise massive amounts of money, get on the treadmill of startup hype, and build a company around negative cash flows. For that model, the only possible outcomes are going public, being acquired, or flaming out completely. O’Reilly Media has been cash flow positive since the beginning, and the company has steadily compounded, growing successively bigger businesses from publishing to conferences to online learning. This episode gave me a lot to think about–just as the O’Reilly Conferences have throughout the years. O’Reilly Media has graciously partnered with SE Daily since we were very small, so I have great admiration for the company and Tim himself. It was a pleasure to get the chance to meet him in person.
Ep 843Machine Learning Stroke Identification with David Golan
When a patient comes into the hospital with stroke symptoms, the hospital will give that patient a CAT scan, a 3-dimensional imaging of the patient’s brain. The CAT scan needs to be examined by a radiologist, and the radiologist will decide whether to refer the patient to an interventionist–a surgeon who can perform an operation to lower the risk of long-term damage to the patient’s brain function. After getting the CAT scan, the patient might wait for hours before a radiologist has a chance to look at the scan. In that period of time, the patient’s brain function might be rapidly degrading. To speed up this workflow, a company called Viz.ai built a machine learning model that can recognize whether a patient is at high risk of stroke consequences or not. Many people have predicted that radiologists will be automated away by machine learning in the coming years. This episode presents a much more realistic perspective: first of all, we don’t have nearly enough radiologists, so if we can create automated radiologists that would be a very good thing; second of all, even in this workflow with a cutting-edge machine learning radiologist, you still need the human radiologist in the loop. David Golan is the CTO at Viz.ai, and in today’s show he explains why he is working on a system for automated stroke identification, and the engineering challenges in building that system.
Ep 842Fintech Environment with Michael Walsh
Computer systems consume memory, CPU, battery, data, and network bandwidth as inputs. These systems provide value for the end user by delivering information, virtual objects, and physical products as outputs. Another fundamental resource that is becoming easier to consume as input is money. There are also new outputs–financial constructs that are made possible by cloud computing, machine learning, and cryptocurrencies. This is why so much opportunity exists in fintech. Money has always been a flexible tool for brokerage between humans. But as recently as the early 2000s, the interfaces between money and computers have been clunky and inflexible. Engineers that wanted to build financial systems around money had to work directly with banks and credit card processors. More recently, there has been an explosion in new APIs and completely new financial primitives like cryptocurrencies. In the year 2000, a well-funded team had to struggle to put together a basic ecommerce company. Today, a blue ocean of opportunity has opened up for entrepreneurs building businesses around lending, insurance, underwriting, banking, and every other microcosm of the financial system. Michael Walsh is a general partner and co-founder of Green Visor Capital. In today’s episode, he described his perspective on the modern fintech environment, and what the future holds.
Ep 841Kademlia: P2P Distributed Hash Table with Petar Maymounkov
Napster, Kazaa, and BitTorrent are peer-to-peer file sharing systems. In these P2P systems, nodes need to find each other. Users need to be able to search for files that exist across the system. P2P systems are decentralized, so these routing problems must be solved without a centralized service in the middle. Without a centralized service that has all the information in one place, how can you solve these problems of node discovery and file lookup? This is the central question that Petar Maymounkov sought to answer with Kademlia. Kademlia is a peer-to-peer distributed hash table. Kademlia implements the “put” and “get” operations of an efficiently scalable hash table without using any centralized service. Each node in the system maintains its own routing table. When a user queries the system (a “get” operation), that query is serviced by the nodes coordinating with each other to intelligently route the user to their target location. When a file is stored (a “put” operation), that update to the file system can propagate through the network in a decentralized, uncoordinated way. Petar joins the show to give a brief history of P2P networks, why he created Kademlia, and what he is working on today.
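The routing described above rests on Kademlia’s distance metric: the distance between two node IDs is their bitwise XOR, interpreted as an integer. A toy Python sketch of routing toward a key (real Kademlia uses 160-bit IDs and organizes peers into k-buckets):

```python
def xor_distance(a, b):
    """Kademlia's distance metric: XOR of two node IDs,
    interpreted as an integer. Smaller means 'closer'."""
    return a ^ b

def closest_nodes(target, node_ids, k=3):
    """Return the k known node IDs closest to the target key,
    as a node would when deciding where to forward a get/put."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]

# 4-bit toy ID space for readability.
nodes = [0b0010, 0b0111, 0b1001, 0b1100, 0b1111]
print(closest_nodes(0b1110, nodes, k=2))  # [15, 12], i.e. 0b1111 and 0b1100
```

Because XOR distance shrinks by at least one leading bit per hop, each query step halves the remaining ID space, which is what gives Kademlia its logarithmic lookup cost.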
Ep 839Data Engineering Automation with Mike Kim
Every company has the idea of the “nightly report.” A business analyst comes into the office, sits down in front of their inbox, and looks at yesterday’s data. Did sales go up? Did the marketing campaigns bring in the expected number of customers? Was there an increase in helpdesk tickets? The statistics that these reports deliver to human analysts can change the direction of the business. Everyone within a company could use a regular report that documents how the business is changing over time. Outlier.ai is a company that processes the data sets within a business and generates automated reports that are relevant to different people within the organization. If you are an email marketing analyst, your data from MailChimp campaigns will be analyzed. If you manage a customer success team, your Zendesk tickets will be analyzed. If you are a technical support analyst, the crash reports and error messages from your users will be analyzed. In all of these cases, the data gets processed automatically, and a story is sent to you, so that you can have the information in your inbox waiting for you, instead of having to go ask a data scientist to generate it for you. Mike Kim is the CTO of Outlier.ai, and in this show he describes the engineering challenges of integrating with all the different data sets of an organization–and why there is so much value in the idea of the automated “report” or “story” for analysts. In past shows, we have explored how data engineering has progressed over the last twenty years–from database administration to Hadoop cluster management to the emergence of “data breadlines,” where analysts wait for a data scientist to process the job they asked for. Outlier represents a step towards a world where the data science reports are delivered to us before we even ask, rather than us having to query the system.
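The automated “story” idea can be sketched as anomaly detection over a daily metric. A minimal Python illustration–the two-standard-deviation threshold is a common heuristic, not Outlier’s actual method, and the metric values are invented:

```python
import statistics

def daily_story(metric_name, history, today):
    """Turn a raw metric into a one-line 'story' for an analyst's
    inbox by flagging values far from the recent baseline."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev and abs(today - mean) > 2 * stdev:
        direction = "up" if today > mean else "down"
        return f"{metric_name} is unusually {direction}: {today} vs ~{mean:.0f} typical"
    return f"{metric_name} looks normal: {today}"

signups = [100, 104, 98, 102, 101, 99, 103]
print(daily_story("Signups", signups, 150))  # flagged as unusually up
print(daily_story("Signups", signups, 101))  # looks normal
```

The engineering challenge Mike describes is everything around this core: integrating with each data source, deciding which metrics matter to which person, and writing the finding up as readable prose.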
Ep 838Chrome and Chromium with David Bokan
Chromium is an open source browser that shares code with the Chrome browser from Google. A browser is a large piece of software, with engineering challenges around threading, rendering, resource management, and networking. To add to the complexity, Chrome runs on iOS, Android, macOS, Windows, and other platforms. Chrome OS is an operating system based on Chrome. There is also Chromium OS, the open source version of Chrome OS. The Chrome/Chromium operating systems are based on Linux. Through this entire episode, the line between browser and operating system is blurry. There is so much resource management involved in the Chrome browser that it has its own task manager. For many people (including myself) the browser is the main application you are interfacing with throughout the day. It handles all of your business applications. Even many of your desktop apps, such as Slack, are running on Electron, which is a framework for building cross-platform apps that uses Chromium. David Bokan is an engineer on the Chromium team at Google, and he joins the show to describe the engineering of Chrome and the development and release process. David also gives his thoughts on future developments for browsers, apps, and the Internet.
Ep 837Shopify Infrastructure with Niko Kurtti
Shopify runs more than 500,000 small business websites. When Shopify was figuring out how to scale, the engineering teams did not have a standard workflow for how to deploy and manage services. Some teams used AWS, some teams used Heroku, some teams used other infrastructure providers. To manage all those stores effectively, Shopify has built its own platform-as-a-service on top of Kubernetes called Cloudbuddies. Cloudbuddies was inspired by Heroku, and it allows engineers at Shopify to deploy services in an opinionated way that is perfect for Shopify. Niko Kurtti is a production engineer at Shopify, and he joins the show to describe Shopify’s infrastructure–how they run so many stores, how they distribute those stores across their infrastructure, and the motivation for building their own internal platform on top of Kubernetes.
Ep 836Function Platforms with Chad Arimura and Matt Stephenson
“Serverless” is a word used to describe functions that get deployed and run without the developer having to manage the infrastructure explicitly. Instead of creating a server, installing the dependencies, and executing your code, the developer just provides the code to the serverless API, and the serverless system takes care of the server creation, the installation, and the execution. Serverless was first offered with the AWS Lambda service, but has since been offered by other cloud providers. There have also been numerous open source serverless systems. On SE Daily, we have done episodes about OpenWhisk, Fission, and Kubeless. All of these are built on the Kubernetes container management system. Kubernetes is an open-source tool used to build and manage infrastructure, so it is a useful building block for higher-level systems. Chad Arimura is the VP of serverless at Oracle, where he runs the Fn project, an open source serverless platform built on top of Kubernetes. In the past, he ran Iron.io, a message broker platform. Matt Stephenson also joins the show–he is a senior principal software engineer at Oracle and has experience from Amazon and Google, where he worked on Google App Engine (which was arguably one of the first “serverless” platforms). We discussed why there are so many different serverless tools built on Kubernetes, and the tradeoffs that these serverless tools are exploring.
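The contract a function platform offers can be sketched in a few lines: the developer supplies only a handler, and everything beneath it (provisioning, dependency installation, invocation) belongs to the platform. The handler below is hypothetical, not the actual Fn or Lambda API:

```python
import json

# A hypothetical serverless handler (not the real Fn or Lambda API):
# the platform, not the developer, provisions the server, installs
# dependencies, and invokes this function once per incoming event.
def handler(event):
    name = event.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"Hello, {name}!"})}

# Locally we can invoke it directly; in production, the platform would
# route HTTP requests or queued messages to it.
response = handler({"name": "SE Daily"})
print(response["body"])  # → {"message": "Hello, SE Daily!"}
```

The point of the abstraction is that everything outside the function body is someone else's problem, which is exactly the tradeoff the different Kubernetes-based platforms explore in different ways.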
Ep 835Build a Bank: Monzo with Richard Dingwall
When you interact with your bank, it probably feels different than when you interact with a software technology company. That’s because the biggest banks in the world were started before software became such a universally important tool. Their core competency is banking–not consumer software. Today, most banks make consumer-facing software. But these banks were not founded by engineers. The software development process at a typical bank does not look like the software development process at a software company like Netflix. Monzo is a digital bank that focuses on high quality engineering. Since it was started in 2015, Monzo has always thought of itself as a software company. This gives it certain advantages over older banks. Today’s guest Richard Dingwall is an engineer at Monzo, and he joins the show to describe Monzo’s software architecture, the engineering strategy, and its migration to Kubernetes. Richard has prior experience at several different banks and financial institutions.
Ep 834Browser Building with Osine Ikhianosime
Crocodile Browser is a fast browser built by Osine and Anesi Ikhianosime, a pair of brothers from Nigeria. I interviewed them 3 years ago, and in this episode I caught up with Osine to learn what he and his brother have been working on since then. Osine and Anesi have become friends of mine since we had a conversation several years ago. I met Osine for the first time at the Facebook F8 conference last year, and it was one of the first times I had met someone from another continent on the Internet, then got to hang out with them in person. There were some issues with network connectivity, so I decided to release this show on the weekend with no ads.
Ep 833Video Search with Rasty Turek
Searching through all of the videos on the Internet is not a simple problem. In order to search through all the videos, you need to build a search index. In order to build a search index, you need to build a web crawler. Video files are large. To store all of the actual video files would cost far too much money. In order to build an index in a cost-efficient manner, you need to have a way of storing information about a video without storing the entire video itself. You might be thinking “hasn’t Google already solved video search? Why are we even talking about this?” Google has solved some aspects of video search–but a different set of challenges is being tackled by a video search company called Pex. In order to explain what Pex is building, we should first explain the problem set they are trying to tackle. Videos across the internet are consumed on a variety of platforms such as YouTube, Instagram, Facebook, and Vimeo. These videos are sliced up, bootlegged, and repurposed from one platform to another. For content creators who earn their living from their hosted video streams, this can be a nightmare. Imagine you are a musician, and you make lots of money from music videos. You upload your cool new video to YouTube, and it instantly gets bootlegged by other users and shared across the internet in hundreds of different places. When people watch the stolen versions of your video, you are not getting compensated. If you could locate all of those stolen videos, you could demand that they be taken down, or claim them so that you are paid for them. And here is the engineering problem–how can you find all those re-posted videos? By crawling the web and building a search index for every video on the web. Rasty Turek is the CEO of Pex, and in this episode he describes how to build a system that crawls the Internet and indexes videos.
It’s a large-scale engineering challenge, and there are lots of tradeoffs to be made between financial cost, speed, accuracy, and engineering complexity.
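The core trick for indexing videos without storing them (keeping a compact fingerprint per frame rather than the pixels) can be sketched with a toy difference hash. This is only an illustration of the idea; real perceptual hashing and Pex's actual index design are far more sophisticated:

```python
# A toy perceptual fingerprint: hash each frame by comparing adjacent
# pixel brightnesses, so a re-encoded or resized copy of the same frame
# yields a similar bit string that is tiny compared to the frame itself.
# Real systems (frame sampling, robust perceptual hashes, inverted
# indexes) are far more elaborate; this is only the core idea.

def dhash(frame):
    """frame: 2D list of grayscale pixel values. Returns a bit string."""
    bits = []
    for row in frame:
        for left, right in zip(row, row[1:]):
            bits.append("1" if left < right else "0")
    return "".join(bits)

def hamming(a, b):
    """Number of differing bits between two equal-length fingerprints."""
    return sum(x != y for x, y in zip(a, b))

original  = [[10, 20, 30], [30, 20, 10]]
bootleg   = [[12, 22, 28], [31, 19, 11]]  # slightly re-encoded copy
unrelated = [[90, 10, 80], [5, 60, 2]]

print(hamming(dhash(original), dhash(bootleg)))    # → 0 (match)
print(hamming(dhash(original), dhash(unrelated)))  # → 2 (no match)
```

Because the fingerprint survives re-encoding while staying a few bits per frame, a crawler can index every video it sees at a tiny fraction of the cost of storing the video itself.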
Ep 832Babel with Henry Zhu
Different browsers consume JavaScript in different ways. When a new version of JavaScript comes out, developers are eager to use the new functionality of that language version. But if you are writing frontend JavaScript code, that code needs to be interpretable by every browser that might consume it–whether the consumer is on an iPhone running Safari or a Windows machine running Internet Explorer 11. Babel is a transpiler for JavaScript. Babel allows new versions of JavaScript to be consumed by older browsers by translating new language features of JavaScript into code that is readable by an older JavaScript interpreter. Babel does this by parsing JavaScript code, creating an abstract syntax tree, and manipulating the AST to make that code comply with the old browser. Henry Zhu is a core maintainer of Babel and a full-time open source developer. In today’s episode, Henry explains how Babel works and its various applications. He also talks about life as a full-time open source developer, where he earns a living through Patreon and OpenCollective.
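Babel's parse, transform, and generate pipeline can be illustrated with Python's standard `ast` module standing in for Babel's parser. The transform below rewrites the `**` operator into a `pow()` call, analogous to how Babel rewrites the ES2016 exponentiation operator for older JavaScript interpreters:

```python
import ast

# Babel's pipeline in miniature, using Python's ast module as a
# stand-in: parse source into an AST, transform the nodes an "old
# interpreter" wouldn't understand, and emit equivalent source.

class PowerToCall(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform nested expressions first
        if isinstance(node.op, ast.Pow):
            # replace `a ** b` with the equivalent call `pow(a, b)`
            return ast.Call(func=ast.Name(id="pow", ctx=ast.Load()),
                            args=[node.left, node.right], keywords=[])
        return node

tree = ast.parse("y = x ** 2 + 1")
tree = ast.fix_missing_locations(PowerToCall().visit(tree))
print(ast.unparse(tree))  # → y = pow(x, 2) + 1
```

Babel does exactly this shape of work over the JavaScript AST, with plugins supplying the node-by-node transforms for each language feature.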
Ep 831Database Reliability Engineering with Laine Campbell
Over the last decade, cloud computing made it easier to programmatically define what infrastructure we have running, and perform operations across that infrastructure. This is called “infrastructure as code.” Whether you want to back up a database, deploy a new version of a service, or introduce a new tier of load balancers, the changes that we make across our infrastructure can be done programmatically, instead of through a series of manual steps. As infrastructure got turned into code, operations people started working more like developers, and developers began to do the work of operations–a convergence known as “devops.” At Google, this “devops” movement was manifested in a role called “site reliability engineer.” In previous shows, we have explored site reliability engineering culture. Laine Campbell is a senior VP of engineering at Fastly, and the author of the book Database Reliability Engineering. In this book, Laine describes how the ideas of site reliability engineering can be extended to databases. Laine joins the show to discuss the book, and how engineering teams can build effective workflows around databases.
Ep 829Rust Networking with Carl Lerche
Rust is a systems programming language with a distinct set of features for safety and concurrency. In previous shows about Rust, we explored how Rust can prevent crashes and eliminate data races through its approach to type safety and memory management. Rust’s focus on efficiency and safety makes it a promising language for networking code. Tokio is a set of networking libraries built on Rust. Tokio enables developers to write asynchronous IO operations by way of its multithreaded scheduler. Tokio’s goal is to make production-ready clients and servers easy to create by focusing on small, reusable components. Carl Lerche is an engineer at Buoyant, a company that makes the popular Linkerd and Conduit service mesh systems. Kubernetes developers deploy service mesh to their distributed applications as sidecar proxies. These proxies need to be low-latency and highly reliable. In that light, it makes sense that Conduit (the more recent service mesh from Buoyant) is built using Rust. Carl joins the show to describe why Rust is useful for building networked services.
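The asynchronous IO model Tokio provides for Rust can be illustrated, loosely, with Python's asyncio: a single scheduler interleaves many in-flight operations at their await points rather than dedicating a thread per connection. This is an analogy for the concurrency model only, not Tokio's API or its performance profile:

```python
import asyncio

# One event loop drives many concurrent "requests": each coroutine
# yields at its await points, so thousands can be in flight at once
# without a thread apiece. Tokio applies the same model in Rust, with
# a multithreaded scheduler underneath.

async def handle(request_id):
    await asyncio.sleep(0.01)  # stand-in for a network round trip
    return f"response-{request_id}"

async def main():
    # 100 concurrent requests complete in roughly one round-trip time,
    # not 100 sequential round trips.
    return await asyncio.gather(*(handle(i) for i in range(100)))

results = asyncio.run(main())
print(len(results), results[0])
```

For a sidecar proxy that must multiplex every connection of the application it fronts, this is why an efficient asynchronous runtime, and a language that makes it safe, matters so much.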
Ep 828Dremio Data Engineering with Tomer Shiran
Twenty years ago, all of the data in an organization could fit inside of relational databases. Imagine a company like Procter & Gamble. P&G is a consumer packaged goods company with hundreds of business sectors–shaving products, toothpaste, shampoo, laundry detergent. Twenty years ago, if the chief financial officer of P&G wanted to answer a question about the revenue projections within the enterprise, that CFO would ask a VP to find the answer. The VP would contact the business analysts in all the different departments within Procter and Gamble, and those business analysts would all work with database administrators to answer questions for their business sector. In that world, it might have taken weeks or months for the CFO to get the answer about revenue projections. Today, data engineering has improved dramatically. Data sets within an enterprise are updated more rapidly. The tooling has advanced thanks to the Hadoop project leading to a wide range of open source projects that feed into one another. But data problems across an enterprise still exist. Business analysts, data scientists, and data engineers struggle to communicate with each other. The CFO still can’t get a question about revenue projections answered instantly. Instead of instant answers, we live in a world of friction, batch processing, and monthly reports. And this is not just true of old enterprises like P&G. It is true of newer startups like Uber, Airbnb, and Netflix. It seems that no amount of engineers and financial windfall can completely cure the frictions of the modern data platform. Tomer Shiran started Dremio to address the long-lived problems of data management, data access, and data governance within an enterprise. Dremio connects databases, storage systems, and business intelligence tools together, and uses intelligent caching to make commonly used queries within an organization more readily accessible.
Dremio is an ambitious project that spent several years in stealth before launching. In today’s episode, Tomer gives a history of data engineering, and provides his perspective on how the data problems within an organization can be diminished. Full disclosure: Dremio is a sponsor of Software Engineering Daily.
Ep 827Digital Evolution with Joel Lehman, Dusan Misevic, and Jeff Clune
Evolutionary algorithms can generate surprising, effective solutions to our problems. Evolutionary algorithms are often let loose within a simulated environment. The algorithm is given a function to optimize for, and the engineers expect that algorithm to evolve a solution that optimizes for the objective function given the constraints of the simulated environment. But sometimes these results are not exactly what we are looking for. For example, imagine an evolutionary algorithm that tries to evolve a creature that does a flip within a simulated physics engine that mirrors the real world. You could imagine all sorts of evolutionary traits. Maybe the creature will evolve to have legs that are like springs, and let the creature jump high enough to do a flip. Maybe the creature will develop normal legs with strong muscles that propel the creature high enough to flip. But you wouldn’t expect the creature to evolve to be extremely tall–so tall that the creature can merely lean over fast enough so that the top of its body flips upside down. In one experiment, this is exactly what happened. In another, similar experiment, the evolving creature discovered a bug in the physics engine of the simulated environment. This creature was able to exploit the problem with this physics engine to be able to move in ways that would not be possible in our real-world physical universe. Evolutionary algorithms sometimes evolve solutions in ways that we don’t expect. Researchers usually throw those results away, because they don’t contribute to the result that the researchers are looking for. The consequence is that lots of interesting anecdotes get lost. Joel Lehman, Dusan Misevic, and Jeff Clune are the lead authors of the paper “The Surprising Creativity of Digital Evolution.” The paper was a collection of anecdotes about strange results within the world of digital evolution.
They join the show to describe what digital evolution is, and some of the strange results that they surveyed in their paper. Joel and Jeff are engineers at Uber’s artificial intelligence division–so this topic has practical importance to them. Much of machine learning involves optimization within simulated environments, and developing safe algorithms for AI requires an understanding of what can go wrong in a poorly defined evolutionary system.
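The basic loop of an evolutionary algorithm (mutate, score against an objective function, select the fittest) fits in a few lines. This toy maximizes a simple one-dimensional objective; the paper's surprises come from richer environments, where objectives like this one get satisfied in ways the experimenters never anticipated:

```python
import random

# A minimal evolutionary loop: mutate a population, score it against
# an objective function, keep the fittest. The "creativity" catalogued
# in the paper arises when a richer objective than this one can be
# satisfied by an unanticipated shortcut.

random.seed(42)  # deterministic, for reproducibility

def fitness(x):
    return -(x - 3.0) ** 2  # objective: evolve x toward 3

population = [random.uniform(-10, 10) for _ in range(20)]
for generation in range(100):
    children = [x + random.gauss(0, 0.5) for x in population]  # mutate
    # select: keep the fittest 20 of parents plus children
    population = sorted(population + children, key=fitness, reverse=True)[:20]

best = population[0]
print(round(best, 1))  # converges close to 3.0
```

Nothing in the loop knows what the objective "means"; it only climbs the score. That indifference is exactly why an under-specified objective, like "maximize the height of the creature's head during the flip", can be satisfied by simply being very tall.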
Ep 826Hacking Your Short-Term Rental with Jeremy Galloway
If you have ever stayed in a short-term rental (like an Airbnb, HomeAway, or CouchSurfing), you have probably used the wifi network at that rental property. Why wouldn’t you? It’s no different than hopping on an open wifi network at an airport, or a Starbucks, or your friend’s house, right? One major difference: the hardware is easily accessible to previous guests at the short-term rental. Previous guests could tamper with the software on a router, and use that tampering for malicious surveillance. Jeremy Galloway is a security engineer at Atlassian. In today’s show, he explains the risk of using wifi at a short-term rental like an Airbnb–including an explanation of how easy it is to take over a wifi network as a guest at a rental property. A broader point we discuss: large attack surfaces are difficult to secure. This is true whether we are talking about Airbnb, another sharing economy app like Uber, a large corporate network like Atlassian, or even your own personal life. Jeremy offers some best practices and philosophies for how to respond to the modern world of security.
Ep 825Postgres Sharding and Scalability with Marco Slot
Relational databases have been popular since the 1970s, but in the last 20 years the amount of data that applications need to collect and store has skyrocketed. The raw cost to store that data has decreased. There is a common phrase in software companies: “it costs you less to save the data than to throw it away.” Saving the data is cheap, but accessing that data in a useful way can be expensive. Developers still need rapid row-wise and column-wise access to the data. Accessing an individual row of a database can be useful if a user is logging in and you want to load all of that user’s data, or if you want to update a banking system with a new financial transaction. Accessing an entire column of a database can be useful if you want to aggregate summaries of all of the entries in a system–like the sum of all financial transactions in a bank. These different kinds of transactions are nothing new, but with the growing scale of data, companies are changing their mentality from thinking in terms of individual databases to thinking about distributed “data platforms.” In a data platform, the data across a company might be put into a variety of storage systems–distributed file systems, databases, in-memory caches, search indexes–but the API for the developer is kept simple. And the simplest, most commonly understood language is SQL. Marco Slot is an engineer with Citus Data, a company that makes Postgres scalable. Postgres is one of the most common relational databases, and in this episode Marco describes how Postgres can be used to service almost all of the needs of a data platform. This isn’t easy to do, as it requires sharding your growing relational database into clusters and orchestrating distributed queries between those shards. In this show, Marco and I discuss Citus’s approach to the distributed systems problems of a sharded relational database. 
This episode is a nice complement to previous episodes we have done with Ozgun and Craig from Citus, in which they gave a history of relational databases, and explained how Postgres compares to the wide variety of relational databases out there. Full disclosure: Citus Data is a sponsor of Software Engineering Daily.
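The heart of hash-based sharding can be sketched briefly: route each row by hashing a distribution column, so that a single-row lookup touches one shard while an aggregate fans out across all of them. Citus's actual hashing and shard-placement logic is more involved; this is only the shape of the idea:

```python
import hashlib

# Hash-based sharding in miniature: each row is owned by the shard its
# distribution key hashes to, the same basic scheme a sharded Postgres
# deployment uses to place rows on worker nodes.

NUM_SHARDS = 4

def shard_for(key):
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# A single-row lookup hashes the key and touches exactly one shard...
print(shard_for("customer-42"))

# ...while an aggregate must fan out to every shard and combine the
# partial results, which is the distributed-query planning problem.
rows = [("customer-%d" % i, i * 10) for i in range(8)]
shards = {s: [] for s in range(NUM_SHARDS)}
for key, amount in rows:
    shards[shard_for(key)].append(amount)

total = sum(sum(amounts) for amounts in shards.values())
print(total)  # → 280
```

The row-wise versus column-wise access patterns described above map directly onto these two paths: point reads stay local to one shard, aggregates become scatter-gather queries across all of them.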
Ep 824Necto: Build an ISP with Adam Montgomery
In the tech industry, we have all grown to fear “lock-in.” Lock-in is a situation in which you have no choice but to pay a certain provider for some aspect of your computer services. Since computers are so fundamental to our lives, we sometimes have no choice but to pay the provider of that lock-in service. Think of a few service providers in your life who have no serious competition. What is your relationship to that service provider? Do you feel like you are paying too much money? Do you wish that you could switch? This is how many people feel about their Internet service provider. An Internet Service Provider is the company that provides you with the “last mile” of physical infrastructure that connects you to the rest of the Internet. Different forms of ISP include cable ISPs, satellite ISPs, fiber ISPs, and copper/DSL ISPs. The medium of delivery varies, but the functionality is the same. This company is crucial to your Internet access. In many geographic locations, there are very restricted options for which ISP you could use. Why is that? Many people assume that there is some physical or regulatory barrier to starting an ISP. In fact, there are fewer barriers than you might think. Adam Montgomery is a co-founder of Necto, a company that provides an ISP starter kit. If you want to start your own ISP in an apartment building or in your neighborhood or wherever you are, the Necto ISP starter kit can help you get off the ground. That might sound like a crazy idea, but in this episode Adam explains why it is not so crazy–why the technology around ISPs is more broadly accessible than many people believe.
Ep 823Bitcoin Lightning Network with Jameson Lopp
Big blocks or small blocks: this is the fundamental question of Bitcoin scalability. The argument for big blocks is also known as “on-chain scalability.” Under this strategy, each block in the append-only chain of Bitcoin transaction blocks would grow in size to be able to support lower transaction fees and higher on-chain throughput. A set of Bitcoin users who supported this idea forked Bitcoin to create Bitcoin Cash, a version of Bitcoin that has a larger block size. The argument for small blocks asserts that scaling Bitcoin does not require a larger block size. Under this model, the scaling demands of the Bitcoin blockchain would be handled by sidechains. A sidechain is a network of person-to-person payment channels that only reconcile with the Bitcoin blockchain to checkpoint batches of transactions. These sidechains can be connected together to form the “lightning network.” The lightning network is hard to implement: it requires solving unprecedented real-world distributed systems problems. It’s much more complicated than deploying a blockchain with a larger block size. In addition, opponents of lightning network suggest that this will lead to a centralized banking system being constructed on top of Bitcoin. Opponents of lightning network fear that instead of a decentralized payments network, the world with lightning network will be a lower cost version of the present financial system, in which JP Morgan and Blockstream partner up to battle Coinbase in a centralized war for control of the unbanked. These big blockers argue that the new banks on the lightning network will be just like the old banks–censorious of transactions and held in the domineering palms of the global financial kleptocracy. So why bother with the lightning network approach? Why are we building this inelegant, kludgey system of off-chain, potentially centralized banking 2.0 complexity?
Why not just increase the block size indefinitely and keep things simple? And even if we increased the block size today, couldn’t we still deploy lightning network in the future while accommodating the transaction volume of today? One major reason is that growing the block size does have a cost. The bigger the block size, the more demands it places on any node that wants to maintain a record of those blocks. And if you grow the block size today, you forgo the experiment of seeing whether a small block size plus lightning network could in itself handle the transaction volume of a global financial system. The framing of “big blockers versus small blockers” is a conveniently polarized reduction of a much more granular reality. To believe that there is no subtlety between the two sides of this debate is to underestimate the number of dimensions to this argument. It’s an unfortunate side effect of rigidly programmed Twitter bots, and a political atmosphere in which your lines in the sand are demarcated by which subreddit you choose to affiliate with. That said–my impression is that the more experienced engineers are overwhelmingly on the side of small blocks plus lightning network as the most promising approach to scaling Bitcoin. Take whatever side of the debate you want. A single line of Bitcoin Core code speaks much louder than an avalanche of tweets. In today’s episode, Jameson Lopp joins the show to explain why lightning network is an appealing engineering construct. We play the devil’s advocate and contrast lightning network with a big block approach, as well as a big block plus lightning network approach. Jameson also describes his experience working within the Ethereum ecosystem, and gives a sober explanation of some of the issues that Ethereum scalers may themselves encounter.
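The accounting that makes payment channels scale can be sketched in miniature: lock funds on-chain once, exchange any number of balance updates off-chain, and settle once at close. Real channels involve signatures, timelocks, and penalty transactions; this toy models only the ledger, with amounts in satoshis:

```python
# Amounts in satoshis. One on-chain transaction opens the channel and
# one closes it; everything in between is an off-chain balance update
# that never touches the blockchain.

class Channel:
    def __init__(self, alice_funds, bob_funds):
        self.balances = {"alice": alice_funds, "bob": bob_funds}
        self.on_chain_txs = 1  # the funding transaction

    def pay(self, sender, receiver, amount):
        assert self.balances[sender] >= amount, "insufficient channel balance"
        self.balances[sender] -= amount    # off-chain: just a signed
        self.balances[receiver] += amount  # balance update, no broadcast

    def close(self):
        self.on_chain_txs += 1  # the settlement transaction
        return self.balances

channel = Channel(alice_funds=100_000, bob_funds=100_000)
for _ in range(1000):
    channel.pay("alice", "bob", 1)  # a thousand micropayments

print(channel.close(), channel.on_chain_txs)
# → {'alice': 99000, 'bob': 101000} 2
```

A thousand payments, two on-chain transactions: that ratio, multiplied across a network of interconnected channels, is the scalability argument for lightning, and the routing, trust, and liquidity problems it creates are the distributed systems challenges discussed in the episode.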
Ep 821Investment Games with Brian Singerman
Investing is an infinite game. In a game, a player can formulate a strategy based on the available resources, the apparent variance of the environment, and the metagame of the other actors involved. For an investor, the game board includes companies, currencies, and people. A successful game player can model their actions mathematically. They can describe a thesis for an in-game decision with clear language. Game players who reason through “gut feeling” do not perform well (unless their “gut” is aligned with correct mathematical heuristics). The same is true for investors. An investor who is going to be successful in the long term will be able to explain their investment thesis crisply. Each investment represents a bet with net positive expected value. The expected value of an investment is the sum, over all potential future outcomes of the business, of the value of each outcome weighted by the probability that it occurs. Brian Singerman is a computer scientist and partner at Founders Fund. He is on the board of Affirm, AltSchool, Emerald Therapeutics, and a variety of other companies in disparate areas. He also plays lots of board games. Brian was a lot of fun to talk to because he was willing to field questions from an expansive range of topics–and he answered them so quickly and concisely that I started to get nervous that I was going to run out of things to ask him. Many of the businesses Brian has invested in do not have a well-defined historical precedent. If a venture capital investor was trying to make bets in defined “sectors” that investor would probably overlook a business like Forward (a vertically integrated healthcare company) or Cloud9 (a collection of esports teams). If an investment does not have a historical precedent, it’s harder to reason about it by analogy.
You have to judge it by fundamental reasoning: the current market, the capability of the founders, and the economics of the business model. In many professions, reasoning by analogy will work out perfectly fine. You can pattern match on the past, and use that to justify decisions for the future. But if your professional livelihood depends on reasoning by fundamental principles, you get trained to assess situations that do not have precedent.
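The expected-value arithmetic described above, worked through for a hypothetical stake with made-up probabilities: the bet fails 70% of the time, yet it still has positive expected value.

```python
# Expected value = sum over outcomes of (probability * value).
# All figures here are hypothetical, purely to illustrate the arithmetic.

outcomes = [
    # (probability, value of the stake in that outcome)
    (0.70,          0),   # company fails: stake worth nothing
    (0.25,  3_000_000),   # modest exit
    (0.05, 40_000_000),   # outlier success
]

expected_value = sum(p * v for p, v in outcomes)
print(round(expected_value))  # → 2750000
```

Note that the rare outlier contributes the majority of the expected value, which is why an investor reasoning this way can rationally back companies that will probably fail.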
Ep 819Future of Computing with John Hennessy
Moore’s Law states that the number of transistors in a dense integrated circuit doubles about every two years. Moore’s Law is less like a “law” and more like an observation or a prediction. Moore’s Law is ending. We can no longer fit an increasing number of transistors into the same amount of space at a highly predictable rate. Dennard scaling is also coming to an end. Dennard scaling is the observation that as transistors get smaller, the power density stays constant. These changes in hardware trends have downstream effects for software engineers. Most importantly–power consumption becomes much more important. As a software engineer, how does power consumption affect you? It means that inefficient software will either run more slowly or cost more money relative to our expectations in the past. Whereas software engineers writing code 15 years ago could comfortably project that their code would get significantly cheaper to run over time due to hardware advances, the story is more complicated today. Why is Moore’s Law ending? And what kinds of predictable advances in technology can we still expect? John Hennessy is the chairman of Alphabet. In 2017, he won the Turing Award (along with David Patterson) for his work on the RISC (Reduced Instruction Set Computer) architecture. From 2000 to 2016, he was the president of Stanford University. John joins the show to explore the future of computing. While we may not have the predictable benefits of Moore’s Law and Dennard scaling, we now have machine learning. It is hard to plot the advances of machine learning on any one chart (as we explored in a recent episode with OpenAI). But we can say empirically that machine learning is working quite well in production. If machine learning offers us such strong advances in computing, how can we change our hardware design process to make machine learning more efficient?
As machine learning training workloads eat up more resources in a data center, engineers are developing domain-specific chips which are optimized for those machine learning workloads. The Tensor Processing Unit (TPU) from Google is one such example. John mentioned that chips could become even more specialized within the domain of machine learning. You could imagine a chip that is specifically designed for an LSTM machine learning model. There are other domains where we could see specialized chips–drones, self-driving cars, wearable computers. In this episode, John describes his perspective on the future of computing, and offers a framework for how engineers can adapt to that future.
Ep 817Container Storage with Jie Yu
A database stores data to an underlying section of storage. If you are an application developer, you might think of your persistent storage system as being the database itself–but at a lower level, that database is writing to block storage, file storage, or object storage. A container orchestration system manages application containers. If you want to run WordPress (a blogging platform) on Kubernetes, that means you also need to run a database to store your blog posts in a persistent way. To run a database, you need an underlying storage medium–which could be a disk in your on-prem data center, or block storage on a disk at a cloud provider. Kubernetes is not the only container orchestrator. There’s also Cloud Foundry, Mesos, Docker Swarm, and several others. Each of these container orchestrators needs to be able to run a variety of persistent workloads (such as a MySQL database or a Kafka cluster). Each of these persistent workloads needs to be able to use different types of backing storage. With the range of container orchestrators and the range of backing storage types, a problem arises: every storage type would have to write custom code to connect to each container orchestrator. The solution to this is the Container Storage Interface (CSI). The CSI is an interface layer between the container orchestrator and the backing storage system. In today’s episode, Jie Yu from Mesosphere describes the motivation for the CSI, and gives an overview of its design principles. There are great lessons here for anyone working with containers or distributed systems in general.
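The idea of an interface layer can be sketched in miniature. This is an illustrative Python analogy, not the real CSI (which is a gRPC specification); the class and method names here are invented for the example:

```python
from abc import ABC, abstractmethod

class VolumePlugin(ABC):
    """Illustrative stand-in for a storage plugin behind a common interface.
    Any orchestrator that speaks this interface can use any plugin."""

    @abstractmethod
    def create_volume(self, name: str, size_gb: int) -> str:
        """Provision a volume and return its ID."""

    @abstractmethod
    def publish_volume(self, volume_id: str, node: str) -> str:
        """Attach the volume to a node and return a mount path."""

class BlockStoragePlugin(VolumePlugin):
    """One hypothetical backend; an NFS or cloud-disk plugin would be another."""
    def create_volume(self, name, size_gb):
        return f"block-{name}-{size_gb}gb"

    def publish_volume(self, volume_id, node):
        return f"/mnt/{node}/{volume_id}"

# The orchestrator never needs backend-specific code:
plugin = BlockStoragePlugin()
vol = plugin.create_volume("mysql-data", 100)
path = plugin.publish_volume(vol, "node-1")
print(path)  # /mnt/node-1/block-mysql-data-100gb
```

The point of the design is that N orchestrators and M storage systems need N + M integrations instead of N × M.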
Ep 816Profilers with Julia Evans
When software is performing suboptimally, the programmer can use a variety of tools to diagnose problems and improve the quality of the code. A profiler is a tool for examining where a program is spending time. Every program consists of a set of different functions. These functions call each other. The total amount of time that your program runs is the sum of the time your program spends in all of the different functions. When you run a program, you can execute a profiler on that program, and the profiler will give you a breakdown of which functions the time is being spent in. If you have functions A, B, and C, your profiler might say that your program is spending 30% of its time in function A, 20% of its time in function B, and 50% of its time in function C. Julia Evans is a software engineer at Stripe, and the creator of a Ruby profiler called rbspy. rbspy can execute on a running Ruby program and report back with a profile. As Julia explains, a profiler turns out to be a non-trivial piece of software to build. To introspect a Ruby program, you need to understand how the Ruby interpreter is translating Ruby code into C structs for execution. This episode is about profilers–but in order to talk about profilers, we also have to talk about Ruby, the Ruby interpreter, and the way that executing programs are laid out in memory.
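A sampling profiler like rbspy works by repeatedly capturing the running program’s call stack and tallying which function is on top. The tallying step can be sketched like this (a toy illustration only; rbspy itself reads the Ruby interpreter’s memory from outside the process):

```python
from collections import Counter

def profile_samples(samples):
    """Given stack samples (innermost function last), report the share of
    samples in which each function was executing on top of the stack."""
    tops = Counter(stack[-1] for stack in samples)
    total = len(samples)
    return {fn: count / total for fn, count in tops.items()}

# Ten simulated stack samples of a program whose main calls A, B, and C:
samples = (
    [["main", "A"]] * 3 +
    [["main", "B"]] * 2 +
    [["main", "C"]] * 5
)
print(profile_samples(samples))
# {'A': 0.3, 'B': 0.2, 'C': 0.5}
```

This reproduces the 30%/20%/50% breakdown described above: with enough samples, the fraction of samples observed in each function approximates the fraction of time spent there.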
Ep 815OpenAI: Compute and Safety with Dario Amodei
Applications of artificial intelligence are permeating our everyday lives. We notice it in small ways–improvements to speech recognition; better quality products being recommended to us; cheaper goods and services that have dropped in price because of more intelligent production. But what can we quantitatively say about the rate at which artificial intelligence is improving? How fast are models advancing? Do the different fields in artificial intelligence all advance together, or are they improving separately from each other? In other words, if the accuracy of a speech recognition model doubles, does that mean that the accuracy of image recognition will double also? It’s hard to know the answer to these questions. Machine learning models trained today can consume 300,000 times the amount of compute that could be consumed in 2012. This does not necessarily mean that models are 300,000 times better–today’s training algorithms could simply be less efficient than yesterday’s, and therefore consume more compute. We can observe from empirical data that models tend to get better with more data. Models also tend to get better with more compute. How much better do they get? That varies from application to application, from speech recognition to language translation. But models do seem to improve with more compute and more data. Dario Amodei works at OpenAI, where he leads the AI safety team. In a post called “AI and Compute,” Dario observed that the compute consumed by machine learning training runs is increasing exponentially–doubling every 3.5 months. In this episode, Dario discusses the implications of increased consumption of compute in the training process. Dario’s focus is AI safety. AI safety encompasses both the prevention of accidents and the prevention of deliberate malicious AI application. Today, humans are dying in autonomous car crashes–this is an accident.
The reward functions of social networks are being exploited by botnets and fake, salacious news–this is malicious. The dangers of AI are already affecting our lives on the axes of accidents and malice. There will be more accidents, and more malicious applications–the question is what to do about it. What general strategies can be devised to improve AI safety? After Dario and I talk about the increased consumption of compute by training algorithms, we explore the implications of this increase for safety researchers.
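The two figures in this description, a 300,000× increase since 2012 and a 3.5-month doubling time, are mutually consistent, as a quick sanity check shows (my arithmetic, not from the episode):

```python
import math

# How many 3.5-month doublings does a 300,000x increase require?
doublings = math.log2(300_000)   # ~18.2 doublings
months = doublings * 3.5         # ~63.7 months, roughly 5.3 years
print(round(doublings, 1), round(months, 1))
```

Roughly 5.3 years of doubling every 3.5 months yields a 300,000× increase, which matches the window from 2012 to the time of the “AI and Compute” post.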
Ep 814Scaling Ethereum with Raul Jordan and Preston Van Loon
Cryptocurrency infrastructure is a new form of software. Thousands of developers are submitting transactions to Bitcoin and Ethereum, and this transaction volume tests the scalability of current blockchain implementations. The bottlenecks in scalability lead to slow transaction times and high fees. Over the last twenty years, engineers have learned how to scale databases. We’ve learned how to scale Internet applications like e-commerce stores and online games. It’s easy to forget, but there was a time when those systems didn’t perform well either. Scaling a blockchain is different from scaling a relational database or a microservices infrastructure. Blockchains are peer-to-peer databases with an append-only ledger shared by thousands of nodes. With different scalability solutions, there are tradeoffs between decentralization, scalability, and security. As an example, in Bitcoin, the core developers are working towards deployment and adoption of the Lightning Network. Some would argue that this approach favors scalability over decentralization. Today’s show is about scaling Ethereum. Raul Jordan and Preston Van Loon are developers who are part of Prysmatic Labs, a team building a sharding implementation for the Go Ethereum client. In this episode, we discuss Ethereum’s approaches to scaling, including sharding and Plasma.
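The core idea of sharding, partitioning state so that each node only processes a fraction of it, can be sketched with a simple deterministic mapping. This illustrates the general technique, not Prysmatic Labs’ actual design; the shard count and function are invented for the example:

```python
import hashlib

NUM_SHARDS = 64  # illustrative; not Ethereum's actual parameter

def shard_for(address: str, num_shards: int = NUM_SHARDS) -> int:
    """Deterministically map an account address to a shard, so every node
    agrees on which shard holds the account without global coordination."""
    digest = hashlib.sha256(address.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

shard = shard_for("0xabc123")
assert 0 <= shard < NUM_SHARDS
```

Because the mapping is a pure function of the address, validators assigned to one shard can process its transactions in parallel with the other shards, which is where the throughput gain comes from.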
Ep 813Life Science R&D with Sherwin Yu
Ten years ago, a biology researcher was limited by the software tools available. Most of the electronic record keeping was done using Excel and other general purpose tools. Benchling is a suite of software tools that were designed to simplify the lives of life science researchers. Benchling helps with sample tracking, experiment design, and workflow management. Sherwin Yu is an engineering manager at Benchling, and he joins the show to discuss the workflows of the life scientist–how experiments are designed and managed. Life science researchers in both academia and industry use Benchling, and Sherwin spends time talking to them and understanding what they need from their tools. We also talked about the impact of CRISPR, robotic cloud laboratories, and other future developments.
Ep 812Container Native Development with Ralph Squillace
Containers have improved deployments and resource utilization. Kubernetes created a platform to manage those containers and orchestrate them into distributed applications. In today’s episode, we explore tools that improve the workflow of the application developer who is working with Kubernetes, including Helm, Draft, and Brigade. Helm is a package manager for Kubernetes, which allows users to find, share, and use software that is built for Kubernetes. The unit of installation for Helm users is a Helm Chart. Installing a Helm Chart can simplify the deployment of a database, load balancer, or continuous integration tool. Draft is a tool for simplifying the containerization process. When a developer runs Draft, a Dockerfile is created to containerize the application, and a Helm Chart is created to enable the application to be easily deployed. Brigade is a tool for creating and running Kubernetes workflows. Brigade allows for event-driven scripting on top of Kubernetes. Chatops, continuous integration systems, and complex big data pipelines can all be defined with Brigade. Brigade is exciting, because it is a higher level tool on top of Kubernetes–in some ways similar to the “serverless on Kubernetes” systems we have covered in the past. Ralph Squillace is a principal program manager with Microsoft, where he works on containers, Linux, and cloud products. Ralph joins the show to talk about how developing with containers has changed in the last few years, and how it will continue to evolve in the near future.
Ep 811Pi Hole: Ad Blocker Hardware with Jacob Salmela
Ad blockers in the browser protect us from the most annoying marketing messages that the Internet tries to serve to us. But we still pay a price for these ads. We pay the bandwidth costs of requesting these pages. Our browsers are slowed down by these extra requests. Pi Hole is a hardware-based ad blocker. Pi Hole acts as a DNS server for all of the traffic that makes its way onto your network. Pi Hole has a blacklist of all the URLs to block–including tracking systems and ad networks. Pi Hole stops these URLs from communicating with all the devices on your network–including your cell phone. Jacob Salmela is the developer of Pi Hole, which he describes as a black hole for advertiser traffic. In this episode, we explain how traditional ad blocking in the browser works, and how things are improved with a piece of dedicated hardware doing the ad blocking. It was also a useful review of the relationship between URLs, IP addresses, your home network, and the broader Internet.
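The DNS-level blocking described here can be sketched in a few lines. This is a toy illustration of the idea, not Pi Hole’s implementation; the blocklist entries are hypothetical, and a dict stands in for a real upstream resolver:

```python
BLOCKLIST = {"ads.example.com", "tracker.example.net"}  # hypothetical entries

def resolve(domain: str, upstream: dict) -> str:
    """Answer DNS-style lookups, sinkholing blocklisted domains.
    Because every device on the network uses this resolver, the
    blocking applies to phones and smart TVs, not just browsers."""
    if domain in BLOCKLIST:
        return "0.0.0.0"          # sinkhole: the ad request goes nowhere
    return upstream.get(domain, "NXDOMAIN")

upstream = {"example.com": "93.184.216.34"}
print(resolve("ads.example.com", upstream))  # 0.0.0.0
print(resolve("example.com", upstream))      # 93.184.216.34
```

Because the blocked domain never resolves to a real address, the device never even opens a connection to the ad server, which is why this approach also saves bandwidth.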
Ep 810Autonomy with Frank Chen
Self-driving, electric cars will someday outnumber traditional automobiles on the road. As transportation becomes autonomous, it is hard to imagine an industry that will not be affected by the downstream effects of this change. These cars will likely be managed by fleet operators like Lyft and Uber. We will need fewer cars, and the amount of space dedicated to those cars will shrink dramatically. Parking lots, massive roads, and gas stations will be reclaimed or repurposed. City planning departments will have to devise entirely new strategies. As the self-driving cars reach consumer availability, an intricate supply chain for these cars will develop. When smartphones became mass-produced, the costs of GPS devices, accelerometers, and other small components dropped steeply. A consequence of the smartphone supply chain was that other devices like consumer drones became affordable. The self-driving car supply chain will lead to the mass production of building blocks for other new devices. With fewer automotive fatalities, the economics of the car insurance industry might collapse completely. At a minimum, the costs of car insurance will likely shift to the fleet operators, who can purchase that car insurance at prices factoring in their large risk pool. Frank Chen is a deal and research partner with Andreessen Horowitz. In a series of presentations on the Autonomy Ecosystem, Frank explores the effects of our impending shift to self-driving electric cars. His analysis considers changes to energy infrastructure, the competitive landscape of software companies, and a range of other topics. Frank joins the show to discuss autonomous vehicles and the side effects of widespread autonomous deployments.
Ep 809Uber’s Data Platform with Zhenxiao Luo
When a user takes a ride on Uber, the app on the user’s phone is communicating with Uber’s backend infrastructure, which is writing to a database that maintains the state of that user’s activity. This database is known as a transactional database or “OLTP” (online transaction processing). Every active user and driver and UberEATS restaurant is writing data to the transactional data store. Periodically, that data is copied from the transactional data system to a different data storage system, where that data can be queried for large-scale data analysis. For example, if a data scientist at Uber wants to get the average number of miles that a given user rode in February, that data scientist would issue a query to the analytical data cluster. Uber uses the Hadoop distributed file system (HDFS) to store analytical data. On this file system, Uber has a version history of all of the company’s useful historical data. Trip history, rider activity, driver activity–every data point from the transactional database is stored there, but in a file format that is easier to query for large-scale processing. This file format is known as Parquet. Data scientists, machine learning engineers, and real-time application developers all depend on the massive quantities of data that are stored in these Parquet files on Uber’s HDFS cluster. To simplify the access of that data by many different clients, Uber uses Presto, an analytical query engine originally built at Facebook. Presto translates SQL queries into whatever query language is necessary to access the underlying storage medium–whether that storage system is an Elasticsearch cluster, a set of Parquet files, or a relational database. Presto is useful because it simplifies the relationship between data engineers and the application developers who are building on top of the data engineering infrastructure.
In today’s show, Zhenxiao Luo joins to give an end-to-end description of Uber’s data infrastructure–from the ingest point of the OLTP database to the OLAP data storage system on HDFS, to the wide range of data systems and applications that run on top of that OLAP data.
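The federated-query idea behind Presto, one entry point that dispatches to whichever backend holds the data, can be sketched in miniature. This is an invented toy, not Presto’s architecture or API; real connectors speak SQL and push predicates down to the storage engine:

```python
class ParquetBackend:
    """Hypothetical stand-in for a connector over a set of Parquet files."""
    def __init__(self, rows):
        self.rows = rows

    def scan(self, predicate):
        return [r for r in self.rows if predicate(r)]

class QueryEngine:
    """Routes each query to the catalog (backend) that holds the data,
    so clients never need backend-specific code."""
    def __init__(self):
        self.catalogs = {}

    def register(self, name, backend):
        self.catalogs[name] = backend

    def query(self, catalog, predicate):
        return self.catalogs[catalog].scan(predicate)

engine = QueryEngine()
engine.register("trips", ParquetBackend([
    {"rider": "u1", "miles": 3.2},
    {"rider": "u2", "miles": 8.0},
]))
long_trips = engine.query("trips", lambda r: r["miles"] > 5)
print(long_trips)  # [{'rider': 'u2', 'miles': 8.0}]
```

An Elasticsearch or MySQL catalog would register alongside "trips" with its own `scan` implementation, which is the decoupling between data engineers and application developers described above.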
Ep 808Software Law: GDPR, Patents, and Antitrust with Micah Kesselman
The world of software moves faster than the laws that regulate it. When software companies do get regulated, that regulation is often enforced unevenly among different companies. Software continually presents the legal system with new requirements. Consumer data privacy needs to be enforced on a granular level. Software developers need a system of protecting their intellectual property. When a company becomes dominant, our legal system needs to scrutinize that company for potential antitrust violations. Micah Kesselman is a lawyer specializing in software IP prosecution. Prior to becoming a lawyer, he studied computer science. He joins the show to discuss a range of issues at the intersection of software and the law–including GDPR, software patents, and self-driving cars. These are topics we will cover in more detail in the future, but it was great to have Micah bring the perspective of a lawyer to the show. Massachusetts Autonomous Vehicles Working Group
Ep 807Container Security with Maya Kaczorowski
Deploying software to a container presents a different security model than deploying an application to a VM. There is a smaller attack surface per container, but the container is colocated on a node with other containers. Containers are meant to have a shorter lifetime than VMs, so there are generally fewer consequences if a container needs to be destroyed and rebuilt due to a potential security vulnerability. Maya Kaczorowski works on container security at Google. In a recent talk at KubeCon, Maya discussed runtime security of containers on Kubernetes. Maya joins the show to discuss container security, and what it means to software developers and operators. Maya also gives guidelines for evaluating the security of your own cluster. We talked about the security benefits of a managed Kubernetes provider, and also explored how some container security vendor software works.