
Software Engineering Daily
2,200 episodes — Page 30 of 44
Ep 819 Future of Computing with John Hennessy
Moore’s Law states that the number of transistors in a dense integrated circuit doubles about every two years. Moore’s Law is less a “law” than an observation or a prediction, and it is ending: we can no longer fit an increasing number of transistors into the same amount of space at a highly predictable rate. Dennard scaling is also coming to an end. Dennard scaling is the observation that as transistors get smaller, their power density stays constant. These changes in hardware trends have downstream effects for software engineers. Most importantly, power consumption matters much more than it used to. As a software engineer, how does power consumption affect you? It means that inefficient software will either run more slowly or cost more money than we would have expected in the past. Software engineers writing code 15 years ago could comfortably project that their code would get significantly cheaper to run over time due to hardware advances; the story is more complicated today. Why is Moore’s Law ending? And what kinds of predictable advances in technology can we still expect? John Hennessy is the chairman of Alphabet. He and David Patterson received the 2017 Turing Award for their work on the RISC (Reduced Instruction Set Computer) architecture. From 2000 to 2016, he was the president of Stanford University. John joins the show to explore the future of computing. While we may no longer have the predictable benefits of Moore’s Law and Dennard scaling, we now have machine learning. It is hard to plot the advances of machine learning on any one chart (as we explored in a recent episode with OpenAI), but we can say empirically that machine learning is working quite well in production. If machine learning offers such strong advances in computing, how can we change our hardware design process to make machine learning more efficient?
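To make the compounding concrete, here is a small sketch of an idealized Moore’s Law projection: a count that doubles once per fixed period. The function name and the starting counts are hypothetical, and real chips never tracked this curve as cleanly as the formula suggests.

```python
def projected_transistors(initial_count: float, years: float,
                          doubling_period_years: float = 2.0) -> float:
    """Idealized Moore's Law projection: the count doubles once per doubling period."""
    return initial_count * 2 ** (years / doubling_period_years)


# A decade at a two-year doubling period is five doublings, i.e. a 32x increase.
print(projected_transistors(1_000_000, 10))  # 32000000.0

# A 15-year horizon compounds to roughly 181x.
print(round(projected_transistors(1, 15)))  # 181
```

That 15-year multiplier is the kind of free improvement engineers could once count on; without it, software efficiency has to make up the difference.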
As machine learning training workloads consume more resources in the data center, engineers are developing domain-specific chips optimized for those workloads. Google’s Tensor Processing Unit (TPU) is one such example. John mentioned that chips could become even more specialized within the domain of machine learning: you could imagine a chip designed specifically for an LSTM model. There are other domains where we could see specialized chips, such as drones, self-driving cars, and wearable computers. In this episode, John describes his perspective on the future of computing and offers a framework for how engineers can adapt to that future. Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.
Ep 817 Container Storage with Jie Yu
A database stores data on an underlying storage layer. If you are an application developer, you might think of the database itself as your persistent storage system, but at a lower level, that database is writing to block storage, file storage, or object storage. A container orchestration system manages application containers. If you want to run WordPress (a blogging platform) on Kubernetes, you also need to run a database to store your blog posts persistently. To run a database, you need an underlying storage medium, which could be a disk in your on-prem data center or block storage at a cloud provider. Kubernetes is not the only container orchestrator; there are also Cloud Foundry, Mesos, Docker Swarm, and several others. Each of these container orchestrators needs to run a variety of persistent workloads (such as a MySQL database or a Kafka cluster), and each of those workloads needs to be able to use different types of backing storage. With this range of orchestrators and storage types, a problem arises: every storage provider would have to write custom code to connect to each container orchestrator. The solution is the CSI, the Container Storage Interface, an interface layer between the container orchestrator and the backing storage system. In today’s episode, Jie Yu from Mesosphere describes the motivation for the CSI and gives an overview of its design principles. There are great lessons here for anyone working with containers or distributed systems in general.
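The core idea of an interface layer can be sketched in a few lines. This is a heavily simplified, hypothetical interface (`StoragePlugin`, `create_volume`, `attach_volume` are illustrative names); the real CSI is a gRPC specification with calls such as CreateVolume and NodePublishVolume. The sketch shows the orchestrator side depending only on the interface, and the arithmetic behind the M×N integration problem described above.

```python
from typing import Dict, Protocol


class StoragePlugin(Protocol):
    """Hypothetical, heavily simplified storage interface (not the real CSI spec)."""

    def create_volume(self, name: str, size_gb: int) -> str: ...

    def attach_volume(self, volume_id: str, node: str) -> None: ...


class InMemoryDiskPlugin:
    """Toy backend standing in for a real block-storage driver."""

    def __init__(self) -> None:
        self.volumes: Dict[str, int] = {}
        self.attachments: Dict[str, str] = {}

    def create_volume(self, name: str, size_gb: int) -> str:
        volume_id = f"disk-{name}"
        self.volumes[volume_id] = size_gb
        return volume_id

    def attach_volume(self, volume_id: str, node: str) -> None:
        if volume_id not in self.volumes:
            raise KeyError(volume_id)
        self.attachments[volume_id] = node


def provision_for_workload(plugin: StoragePlugin, workload: str, node: str) -> str:
    """Orchestrator-side code: it only knows the interface, not the backend."""
    volume_id = plugin.create_volume(workload, size_gb=10)
    plugin.attach_volume(volume_id, node)
    return volume_id


def integrations_needed(orchestrators: int, storage_systems: int,
                        shared_interface: bool) -> int:
    # Without a shared interface, every (orchestrator, storage) pair needs custom glue.
    # With one, each orchestrator and each storage system implements it once.
    if shared_interface:
        return orchestrators + storage_systems
    return orchestrators * storage_systems
```

With four orchestrators and five storage systems, `integrations_needed` gives 20 custom integrations without a shared interface and 9 implementations with one.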
Ep 816 Profilers with Julia Evans
When software performs suboptimally, the programmer can use a variety of tools to diagnose problems and improve the code. A profiler is a tool for examining where a program spends its time. Every program consists of a set of functions that call each other, and the total running time of your program is the sum of the time spent in all of those functions. When you run a profiler against a program, it gives you a breakdown of how much time is spent in each function. If you have functions A, B, and C, your profiler might say that your program spends 30% of its time in function A, 20% in function B, and 50% in function C. Julia Evans is a software engineer at Stripe and the creator of rbspy, a Ruby profiler. rbspy can attach to a running Ruby program and report back with a profile. As Julia explains, a profiler turns out to be a non-trivial piece of software to build. To introspect a Ruby program, you need to understand how the Ruby interpreter represents executing Ruby code in C structs. This episode is about profilers, but in order to talk about profilers, we also have to talk about Ruby, the Ruby interpreter, and the way executing programs are laid out in memory.
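A sampling profiler such as rbspy periodically records the call stack of the running program. The sketch below covers only the final aggregation step, assuming the stack samples have already been collected (the hard part rbspy solves by reading interpreter memory is not reproduced here). Each sample is charged to its innermost function, which yields the kind of A/B/C percentage breakdown described above; the function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def self_time_percentages(samples: List[List[str]]) -> Dict[str, float]:
    """Attribute each stack sample to its innermost (currently running) function."""
    leaf_counts = Counter(stack[-1] for stack in samples if stack)
    total = sum(leaf_counts.values())
    return {fn: 100.0 * count / total for fn, count in leaf_counts.items()}


# Ten hypothetical samples, each a call stack listed outermost-first.
samples = (
    [["main", "A"]] * 3
    + [["main", "B"]] * 2
    + [["main", "A", "C"]] * 5
)
print(self_time_percentages(samples))  # {'A': 30.0, 'B': 20.0, 'C': 50.0}
```

Note that A calls C in half of the samples, so A’s *inclusive* time (time with A anywhere on the stack) would be 80%; the percentages above are self time, which is why they sum to 100.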
Ep 815 OpenAI: Compute and Safety with Dario Amodei
Applications of artificial intelligence are permeating our everyday lives. We notice it in small ways: improvements to speech recognition, better product recommendations, and cheaper goods and services whose prices have dropped because of more intelligent production. But what can we say quantitatively about the rate at which artificial intelligence is improving? How fast are models advancing? Do the different fields in artificial intelligence advance together, or do they improve independently of each other? In other words, if the accuracy of a speech recognition model doubles, does that mean that the accuracy of image recognition will double also? It’s hard to know the answer to these questions. The largest machine learning training runs today can consume 300,000 times the compute of the largest runs in 2012. This does not necessarily mean that models are 300,000 times better; today’s training algorithms could simply be less efficient than yesterday’s, and therefore consume more compute. We can observe from empirical data that models tend to get better with more data, and they also tend to get better with more compute. How much better do they get? That varies from application to application, from speech recognition to language translation. But models do seem to improve with more compute and more data. Dario Amodei works at OpenAI, where he leads the AI safety team. In a post called “AI and Compute,” Dario observed that the amount of compute used in the largest machine learning training runs is increasing exponentially, doubling every 3.5 months. In this episode, Dario discusses the implications of increased compute consumption in the training process. Dario’s focus is AI safety, which encompasses both the prevention of accidents and the prevention of deliberately malicious applications of AI. Today, humans are dying in autonomous car crashes; this is an accident.
The reward functions of social networks are being exploited by botnets and fake, salacious news; this is malicious. The dangers of AI are already affecting our lives along the axes of accidents and malice. There will be more accidents and more malicious applications; the question is what to do about it. What general strategies can be devised to improve AI safety? After Dario and I talk about the increased consumption of compute by training algorithms, we explore the implications of this increase for safety researchers.
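The 300,000x and 3.5-month figures quoted in this episode description are consistent with each other, which a little arithmetic shows. This sketch simply takes those two numbers as given (the helper names are hypothetical) and, for contrast, asks how much a two-year Moore’s-Law-style doubling period would have grown over the same span.

```python
import math


def doublings(growth_factor: float) -> float:
    """Number of doublings needed to reach a given multiplicative growth."""
    return math.log2(growth_factor)


def months_for_growth(growth_factor: float, doubling_months: float) -> float:
    """Time needed to reach the growth factor at a fixed doubling period."""
    return doublings(growth_factor) * doubling_months


print(round(doublings(300_000), 1))               # 18.2
print(round(months_for_growth(300_000, 3.5), 1))  # 63.7 months, a little over five years

# For contrast: the same span of time at a 24-month doubling period.
span = months_for_growth(300_000, 3.5)
print(round(2 ** (span / 24), 1))                 # 6.3
```

About 18 doublings at 3.5 months each comes to roughly five years, which lines up with the 2012 starting point; over that same span, a two-year doubling period yields only about 6x.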
Ep 814 Scaling Ethereum with Raul Jordan and Preston Van Loon
Cryptocurrency infrastructure is a new form of software. Thousands of developers are submitting transactions to Bitcoin and Ethereum, and this transaction volume tests the scalability of current blockchain implementations. Scalability bottlenecks lead to slow transaction times and high fees. Over the last twenty years, engineers have learned how to scale databases. We’ve learned how to scale Internet applications like e-commerce stores and online games. It’s easy to forget, but there was a time when those systems didn’t perform well either. Scaling a blockchain is different from scaling a relational database or a microservices infrastructure. Blockchains are peer-to-peer databases with an append-only ledger shared by thousands of nodes. Different scalability solutions make different tradeoffs between decentralization, scalability, and security. For example, in Bitcoin, the core developers are working toward deployment and adoption of the Lightning Network. Some would argue that this approach favors scalability over decentralization. Today’s show is about scaling Ethereum. Raul Jordan and Preston Van Loon are developers at Prysmatic Labs, a team building a sharding implementation for the Go Ethereum client. In this episode, we discuss Ethereum’s approaches to scaling, including sharding and Plasma.
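What makes a blockchain ledger “append-only” is that each block commits to the hash of the block before it, so rewriting history breaks every later link. Here is a toy, single-process sketch of that property; the class and field names are hypothetical, and it deliberately leaves out consensus, proof-of-work, networking, and sharding.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV = "0" * 64


def block_hash(index: int, prev_hash: str, transactions: List[str]) -> str:
    """Hash a block's contents, including the previous block's hash."""
    payload = json.dumps({"index": index, "prev": prev_hash, "txs": transactions},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Toy append-only ledger: each block commits to the hash of the block before it."""

    def __init__(self) -> None:
        self.blocks: List[Dict] = []

    def append(self, transactions: List[str]) -> None:
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS_PREV
        index = len(self.blocks)
        self.blocks.append({
            "index": index,
            "prev": prev,
            "txs": list(transactions),
            "hash": block_hash(index, prev, transactions),
        })

    def is_valid(self) -> bool:
        """Recompute every hash and check that each block points at its predecessor."""
        prev = GENESIS_PREV
        for block in self.blocks:
            if block["prev"] != prev:
                return False
            if block["hash"] != block_hash(block["index"], block["prev"], block["txs"]):
                return False
            prev = block["hash"]
        return True


ledger = Ledger()
ledger.append(["alice->bob:5"])
ledger.append(["bob->carol:2"])
print(ledger.is_valid())  # True
ledger.blocks[0]["txs"][0] = "alice->bob:500"  # tamper with history
print(ledger.is_valid())  # False
```

Every node in a real network can run a check like `is_valid` independently, which is part of why scaling is hard: each of those thousands of nodes does the verification work.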
Ep 813 Life Science R&D with Sherwin Yu
Ten years ago, a biology researcher was limited by the software tools available. Most electronic record keeping was done using Excel and other general-purpose tools. Benchling is a suite of software tools designed to simplify the lives of life science researchers. Benchling helps with sample tracking, experiment design, and workflow management. Sherwin Yu is an engineering manager at Benchling, and he joins the show to discuss the workflows of the life scientist: how experiments are designed and managed. Life science researchers in both academia and industry use Benchling, and Sherwin spends time talking to them and understanding what they need from their tools. We also talked about the impact of CRISPR, robotic cloud laboratories, and other future developments.
Ep 812 Container Native Development with Ralph Squillace
Containers have improved deployments and resource utilization. Kubernetes created a platform to manage those containers and orchestrate them into distributed applications. In today’s episode, we explore tools that improve the workflow of the application developer who is working with Kubernetes, including Helm, Draft, and Brigade. Helm is a package manager for Kubernetes, which allows users to find, share, and use software that is built for Kubernetes. The unit of installation for Helm users is a Helm Chart. Installing a Helm Chart can simplify the deployment of a database, load balancer, or continuous integration tool. Draft is a tool for simplifying the containerization process. When a developer runs Draft, it creates a Dockerfile to containerize the application and a Helm Chart that lets the application be deployed easily. Brigade is a tool for creating and running Kubernetes workflows. Brigade allows for event-driven scripting on top of Kubernetes. ChatOps, continuous integration systems, and complex big data pipelines can all be defined with Brigade. Brigade is exciting because it is a higher-level tool on top of Kubernetes, in some ways similar to the “serverless on Kubernetes” systems we have covered in the past. Ralph Squillace is a principal program manager at Microsoft, where he works on containers, Linux, and cloud products. Ralph joins the show to talk about how developing with containers has changed in the last few years, and how it will continue to evolve in the near future.
Ep 811 Pi Hole: Ad Blocker Hardware with Jacob Salmela
Ad blockers in the browser protect us from the most annoying marketing messages that the Internet tries to serve to us. But we still pay a price for these ads: we pay the bandwidth costs of requesting these pages, and our browsers are slowed down by the extra requests. Pi Hole is a hardware-based ad blocker. Pi Hole acts as the DNS server for your network, so the domain lookups from every device on the network pass through it. Pi Hole has a blacklist of domains to block, including tracking systems and ad networks, and it stops these domains from communicating with all the devices on your network, including your cell phone. Jacob Salmela is the developer of Pi Hole, which he describes as a black hole for advertiser traffic. In this episode, we explain how traditional ad blocking in the browser works, and how things improve when a piece of dedicated hardware does the ad blocking. The episode was also a useful review of the relationship between URLs, IP addresses, your home network, and the broader Internet.
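The DNS “sinkhole” idea fits in a few lines: when a device asks for the address of a blocked domain, the resolver answers with an unroutable address instead of the real one, so the ad request never leaves the network. This is a hypothetical sketch, not Pi Hole’s code. A plain dictionary stands in for forwarding queries to an upstream resolver, and treating subdomains of a blocked domain as blocked is an assumed policy; Pi Hole’s own matching rules may differ.

```python
from typing import Dict, Set

BLOCKED_ANSWER = "0.0.0.0"  # unroutable answer handed back for blocked domains


def is_blocked(domain: str, blocklist: Set[str]) -> bool:
    """Check the domain and each parent domain, e.g. a.ads.example -> ads.example -> example."""
    labels = domain.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def resolve(domain: str, blocklist: Set[str], upstream: Dict[str, str]) -> str:
    """Answer a lookup: sinkhole blocked domains, otherwise defer to the 'upstream' table."""
    if is_blocked(domain, blocklist):
        return BLOCKED_ANSWER
    return upstream.get(domain.lower().rstrip("."), "NXDOMAIN")


blocklist = {"tracker.example"}
upstream = {"news.example": "203.0.113.10"}  # documentation-range IP, illustrative only
print(resolve("ads.tracker.example", blocklist, upstream))  # 0.0.0.0
print(resolve("news.example", blocklist, upstream))         # 203.0.113.10
```

Because the check happens at name resolution, it only sees domain names, not full URLs, which is why this approach blocks whole ad domains for every device rather than individual page elements.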
Ep 810 Autonomy with Frank Chen
Self-driving, electric cars will someday outnumber traditional automobiles on the road. As transportation becomes autonomous, it is hard to imagine an industry that will not be affected by the downstream effects of this change. These cars will likely be managed by fleet operators like Lyft and Uber. We will need fewer cars, and the amount of space dedicated to those cars will shrink dramatically. Parking lots, massive roads, and gas stations will be reclaimed or repurposed. City planning departments will have to devise entirely new strategies. As self-driving cars become available to consumers, an intricate supply chain for these cars will develop. When smartphones became mass-produced, the costs of GPS devices, accelerometers, and other small components dropped steeply. One consequence of the smartphone supply chain was that other devices, like consumer drones, became affordable. The self-driving car supply chain will lead to the mass production of building blocks for other new devices. With fewer automotive fatalities, the economics of the car insurance industry might collapse completely. At a minimum, the costs of car insurance will likely shift to the fleet operators, who can purchase insurance at prices that factor in their large risk pools. Frank Chen is a deal and research partner with Andreessen Horowitz. In a series of presentations on the Autonomy Ecosystem, Frank explores the effects of our impending shift to self-driving electric cars. His analysis considers changes to energy infrastructure, the competitive landscape of software companies, and a range of other topics. Frank joins the show to discuss autonomous vehicles and the side effects of widespread autonomous deployments.
Ep 809 Uber’s Data Platform with Zhenxiao Luo
When a user takes a ride on Uber, the app on the user’s phone communicates with Uber’s backend infrastructure, which writes to a database that maintains the state of that user’s activity. This database is known as a transactional database, or “OLTP” (online transaction processing) system. Every active rider, driver, and UberEATS restaurant writes data to the transactional data store. Periodically, that data is copied from the transactional system to a different data storage system, where it can be queried for large-scale data analysis. For example, if a data scientist at Uber wants to know the average number of miles that a given user rode in February, that data scientist would issue a query to the analytical data cluster. Uber uses the Hadoop Distributed File System (HDFS) to store analytical data. On this file system, Uber keeps a version history of the company’s useful historical data: trip history, rider activity, driver activity, and every other data point in the transactional database, but in a file format that is easier to query for large-scale processing. This file format is Parquet. Data scientists, machine learning engineers, and real-time application developers all depend on the massive quantities of data stored in these Parquet files on Uber’s HDFS cluster. To simplify access to that data by many different clients, Uber uses Presto, an analytical query engine originally built at Facebook. Presto accepts SQL queries and uses connectors to read from the underlying storage system, whether that system is an Elasticsearch cluster, a set of Parquet files, or a relational database. Presto is useful because it simplifies the relationship between data engineers and the application developers who build on top of the data engineering infrastructure.
In today’s show, Zhenxiao Luo joins the show to give an end-to-end description of Uber’s data infrastructure: from the ingest point of the OLTP database, to the OLAP data storage system on HDFS, to the wide range of data systems and applications that run on top of that OLAP data.
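The data scientist’s question above (average miles per trip for one rider in February) is a short SQL query once the data is in an analytical store. In this sketch, Python’s built-in `sqlite3` stands in for Presto and the Parquet files on HDFS; the `trips` table, its columns, and the rows are all hypothetical, and Presto’s SQL dialect differs in some details.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (rider_id TEXT, trip_date TEXT, miles REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("rider-1", "2018-02-03", 4.0),
        ("rider-1", "2018-02-17", 6.0),
        ("rider-1", "2018-03-01", 20.0),  # March trip, excluded by the date filter
        ("rider-2", "2018-02-10", 3.0),
    ],
)


def average_february_miles(conn: sqlite3.Connection, rider_id: str,
                           year: int = 2018) -> Optional[float]:
    """Average miles per trip for one rider in February (None if no trips)."""
    row = conn.execute(
        "SELECT AVG(miles) FROM trips "
        "WHERE rider_id = ? AND trip_date >= ? AND trip_date < ?",
        (rider_id, f"{year}-02-01", f"{year}-03-01"),  # ISO dates compare correctly as text
    ).fetchone()
    return row[0]


print(average_february_miles(conn, "rider-1"))  # 5.0
```

In the OLTP database, a scan like this over every trip would compete with live ride traffic; running it against a separate analytical copy is the point of the architecture described above.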
Ep 808 Software Law: GDPR, Patents, and Antitrust with Micah Kesselman
The world of software moves faster than the laws that regulate it. When software companies do get regulated, that regulation is often enforced unevenly among different companies. Software continually presents the legal system with new requirements. Consumer data privacy needs to be enforced at a granular level. Software developers need a system for protecting their intellectual property. When a company becomes dominant, our legal system needs to scrutinize that company for potential antitrust violations. Micah Kesselman is a lawyer specializing in software IP prosecution. Before becoming a lawyer, he studied computer science. He joins the show to discuss a range of issues at the intersection of software and the law, including GDPR, software patents, and self-driving cars. These are topics we will cover in more detail in the future, but it was great to have Micah bring the perspective of a lawyer to the show.
Massachusetts Autonomous Vehicles Working Group
Ep 807 Container Security with Maya Kaczorowski
Deploying software to a container presents a different security model than deploying an application to a VM. There is a smaller attack surface per container, but the container is colocated on a node with other containers. Containers are meant to have shorter lifetimes than VMs, so there are generally fewer consequences if a container needs to be destroyed and rebuilt due to a potential security vulnerability. Maya Kaczorowski works on container security at Google. In a recent talk at KubeCon, Maya discussed runtime security of containers on Kubernetes. Maya joins the show to discuss container security and what it means to software developers and operators. She also gives guidelines for evaluating the security of your own cluster. We talked about the security benefits of a managed Kubernetes provider, and also explored how some container security vendor software works.
Ep 806 Voice with Rita Singh
A sample of the human voice is a rich piece of unstructured data. Voice recordings can be turned into visualizations called spectrograms, and machine learning models can be trained to identify features of these spectrograms. Using this kind of analytic strategy, researchers are making breakthroughs in voice analysis at an amazing pace. Rita Singh researches voice at Carnegie Mellon University. Her work studies the large amount of latent information available in the human voice. As she explains, just a small fragment of a human voice can be used to identify who a speaker is. Your voice is as distinctive as your fingerprint. Your voice can also reveal medical conditions: features of the human voice can be strongly correlated with psychiatric symptom severity, and potentially with heart disease, cancer, and other illnesses. The human voice can even suggest a person’s physique, including height, weight, and facial features. In this episode, Rita explains the machine learning techniques that she uses to uncover the hidden richness of the human voice.
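A spectrogram is built by slicing audio into short overlapping frames, windowing each frame, and taking the magnitude of its Fourier transform. This dependency-free sketch uses a naive O(N²) DFT for clarity; real pipelines use an FFT library, and the frame size, hop, and Hann window here are illustrative choices rather than anything from Rita Singh’s work.

```python
import cmath
import math
from typing import List


def hann(n: int) -> List[float]:
    """Symmetric Hann window, which tapers frame edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal: List[float], frame_size: int = 64, hop: int = 32) -> List[List[float]]:
    """Magnitude short-time DFT: one row per frame, one column per frequency bin."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_size], window)]
        row = []
        for k in range(frame_size // 2 + 1):  # non-negative frequencies only
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(chunk))
            row.append(abs(coeff))
        frames.append(row)
    return frames


# A 1 kHz tone sampled at 8 kHz: with 64-sample frames each bin spans 125 Hz,
# so the energy should peak in bin 8 of every frame.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(512)]
spec = spectrogram(tone)
print(len(spec), len(spec[0]))                          # 15 33
print({max(range(len(r)), key=r.__getitem__) for r in spec})  # {8}
```

A voice-analysis model would consume a matrix like `spec` (usually on a log or mel scale) rather than raw samples, which is what makes features such as pitch and formants learnable.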
Ep 805 Machine Learning with Data Skeptic and Second Spectrum at Telesign
Data Skeptic is a podcast about machine learning, data science, and how software affects our lives. The first guest on today’s episode is Kyle Polich, the host of Data Skeptic. Kyle is one of the best explainers of machine learning concepts I have met, and for this episode, he presented material that is perfect for this audience: machine learning for software engineers. Second Spectrum is a company that analyzes data from professional sports, turning that data into visualizations, reports, and futuristic sports viewing experiences. We had a previous show about Second Spectrum where we went into the company in detail. It was an excellent show, so I wanted to have Kevin Squire, an engineer from Second Spectrum, come on the show to talk about how the company builds machine learning tools to analyze sports data. If you have not seen any of the visualizations from Second Spectrum, stop what you are doing and watch a video of them! This year we have had three Software Engineering Daily Meetups: in New York, Boston, and Los Angeles. At each of these Meetups, listeners from the SE Daily community got to meet each other and talk about software: what they are building and what they are excited about. I was happy to attend each of these, and I am posting the talks given by our presenters. The audio quality is not perfect on these, but there are also no ads. Thanks to Telesign for graciously providing a space and some delicious food for our Meetup. Telesign has beautiful offices in Los Angeles, and they make SMS, voice, and data solutions. If you are looking for secure and reliable communications APIs, check them out. We’d love to have you as part of our community. We will have more Meetups eventually, and you can be notified of these by signing up for our newsletter. Come to SoftwareDaily.com and get involved with the discussion of episodes and software projects. You can also check out our open source projects: the mobile apps and our website.
Ep 804 Alexa Voice Design with Paul Cutsinger
Voice interfaces are a newer way of communicating with computers. Alexa is a voice interface platform from Amazon. Alexa powers the Amazon Echo, as well as Alexa-enabled cars, refrigerators, and dishwashers. Any developer can build a device with a voice interface using a Raspberry Pi. Paul Cutsinger works on Echo and Alexa at Amazon, where he is focused on growing the market of developers who are building voice interfaces. In this episode, Paul describes how to design and implement a voice application for the Amazon Alexa platform. The market for voice-powered apps is so new that there has yet to be a “killer app.” If you like to tinker on new platforms, you will like this episode, and I was surprised by how easy it sounds to build a voice app. Personally, I use voice interfaces all the time: to set timers, to find out how to tell if a cucumber has gone bad, to ask what temperature to cook a potato at. Sometimes, when I am lying in bed trying to get to sleep, I will ask my nearest device to read me a Wikipedia article. These are great use cases, but I’m sure we will see something much more groundbreaking in the future.
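Under the hood, an Alexa skill backend receives a JSON request describing what the user said (already parsed into a request type, an intent name, and slot values) and returns JSON telling Alexa what to speak. This sketch handles that JSON by hand, without the official Alexa Skills Kit SDK, to show the shape of the exchange. The `CookingTemperatureIntent` intent and its `Food` slot are hypothetical; in a real skill you define intents and slots in the skill’s interaction model, and most developers would use the SDK instead.

```python
from typing import Any, Dict


def speak(text: str, end_session: bool = True) -> Dict[str, Any]:
    """Build a minimal response in the Alexa Skills Kit JSON response format."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle(event: Dict[str, Any]) -> Dict[str, Any]:
    """Route an incoming request by its type and, for intents, by intent name."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        # Keep the session open so the user can follow up with a question.
        return speak("Hi! Ask me what temperature to cook something at.", end_session=False)
    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {})
        if intent.get("name") == "CookingTemperatureIntent":  # hypothetical custom intent
            food = intent.get("slots", {}).get("Food", {}).get("value", "that")
            return speak(f"Looking up a cooking temperature for {food}.")
    return speak("Sorry, I didn't catch that.")


event = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "CookingTemperatureIntent",
            "slots": {"Food": {"name": "Food", "value": "potato"}},
        },
    }
}
print(handle(event)["response"]["outputSpeech"]["text"])
# Looking up a cooking temperature for potato.
```

Most of the design work in a voice app sits outside this handler, in choosing intents and sample utterances that match how people actually talk.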
Ep 803 Pulsar Messaging with Lewis Kaneshiro
Message broker systems decouple the producers and consumers of a message channel. In previous shows, we have explored ZeroMQ, PubNub, Apache Kafka, and NATS. In this episode, we talk about another message broker: Apache Pulsar. Pulsar is an open source distributed pub-sub messaging system originally created at Yahoo, where it was used to scale products with high volumes of users, such as Yahoo Mail. A Pulsar deployment has three components: the Pulsar broker (which handles message brokering), Apache BookKeeper (which handles durable storage of the messages), and Apache ZooKeeper (which manages distributed coordination). Lewis Kaneshiro joins the show to describe how Apache Pulsar works and how it compares to other messaging systems like Apache Kafka. Lewis is the CEO of Streamlio, a company that builds messaging and stream processing systems for enterprises and uses Pulsar in its core product.
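Here is a toy, in-memory sketch of what “decoupling producers and consumers” means: producers and consumers share only a topic name, messages are retained in a per-topic log, and each named subscription tracks its own read position. The class and method names are hypothetical and this is not the Pulsar client API; the list loosely stands in for durable storage (BookKeeper’s role) and there is no distribution or coordination (ZooKeeper’s role).

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Tuple


class ToyBroker:
    """In-memory stand-in for a broker: producers and consumers only share topic names."""

    def __init__(self) -> None:
        # Per-topic append-only log, loosely standing in for durable storage.
        self.logs: DefaultDict[str, List[str]] = defaultdict(list)
        # (topic, subscription) -> offset of the next unread message.
        self.cursors: Dict[Tuple[str, str], int] = {}

    def publish(self, topic: str, message: str) -> None:
        self.logs[topic].append(message)

    def subscribe(self, topic: str, subscription: str) -> None:
        # In this sketch, new subscriptions start at the current end of the log.
        self.cursors.setdefault((topic, subscription), len(self.logs[topic]))

    def receive(self, topic: str, subscription: str) -> Optional[str]:
        key = (topic, subscription)
        offset = self.cursors[key]
        log = self.logs[topic]
        if offset >= len(log):
            return None
        self.cursors[key] = offset + 1  # "acknowledge" by advancing this subscription only
        return log[offset]


broker = ToyBroker()
broker.subscribe("orders", "billing")
broker.subscribe("orders", "analytics")
broker.publish("orders", "order-1")
print(broker.receive("orders", "billing"))    # order-1
print(broker.receive("orders", "analytics"))  # order-1
print(broker.receive("orders", "billing"))    # None
```

The producer never learns who is reading, and the two subscriptions consume the same message at their own pace; separating the log from the cursors is what makes that possible.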
Ep 802 Gloo: Function Gateway with Idit Levine
Gloo is a function gateway built on top of the popular open source project Envoy. Gloo decouples client-facing APIs from upstream APIs. Gloo is similar to an API gateway, a tool that software companies can use to collect all their APIs in one place and impose security, monitoring, and other standards around those APIs. The goal of Gloo is to provide all the tools necessary to glue together traditional and cloud-native applications. Idit Levine is the CEO of Solo.io, a company that is building Gloo and several other projects.
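The decoupling a gateway provides can be sketched as a routing table: clients call stable, client-facing routes, and the gateway maps each route to whatever upstream currently serves it, whether that is a legacy service or a function. This is a hypothetical, in-process illustration; the names are invented, plain Python functions stand in for network upstreams, and it says nothing about Gloo’s actual configuration format.

```python
from typing import Callable, Dict, Set, Tuple

Handler = Callable[[dict], dict]


def legacy_user_service(request: dict) -> dict:
    # Stands in for an endpoint on an existing monolith.
    return {"status": 200, "body": f"user {request['params']['id']} from monolith"}


def resize_image_function(request: dict) -> dict:
    # Stands in for a serverless function.
    return {"status": 200, "body": f"resized {request['params']['image']}"}


class ToyGateway:
    """Maps client-facing (method, path) routes to interchangeable upstream handlers."""

    def __init__(self, api_keys: Set[str]) -> None:
        self.routes: Dict[Tuple[str, str], Handler] = {}
        self.api_keys = api_keys

    def route(self, method: str, path: str, upstream: Handler) -> None:
        self.routes[(method, path)] = upstream

    def handle(self, method: str, path: str, api_key: str, params: dict) -> dict:
        # Cross-cutting policy is enforced once, at the gateway, for every upstream.
        if api_key not in self.api_keys:
            return {"status": 401, "body": "unauthorized"}
        upstream = self.routes.get((method, path))
        if upstream is None:
            return {"status": 404, "body": "no route"}
        return upstream({"params": params})


gateway = ToyGateway(api_keys={"secret"})
gateway.route("GET", "/users", legacy_user_service)
gateway.route("POST", "/images/resize", resize_image_function)
print(gateway.handle("GET", "/users", "secret", {"id": "42"}))
# {'status': 200, 'body': 'user 42 from monolith'}
```

If `/users` is later reimplemented as a function, only the `route` call changes; clients keep calling the same path, which is the kind of glue between traditional and cloud-native applications described above.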
Ep 801 Vitess: Scaling MySQL with Sugu Sougoumarane
YouTube runs a large MySQL database to hold metadata about its videos. As YouTube scaled, the database was sharded, and applications within YouTube had to write queries that were aware of the database’s sharding layout. This is problematic because it pushes complexity onto the application developer. An application developer shouldn’t have to know how a database is laid out across different nodes; the developer should be able to issue a query and have the cluster simply return the data. Vitess is an open source system for scaling large MySQL databases. Sugu Sougoumarane co-created Vitess at YouTube. Since YouTube is owned by Google, Vitess was able to leverage Borg, the cluster manager developed at Google. Once Kubernetes came to market, it became more viable to make Vitess accessible to open source developers. Sugu joins the show to talk about the scalability problems that YouTube’s database infrastructure encountered and the motivations for building Vitess.
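The fix for sharding-aware application code is a routing layer that owns the layout. This sketch hashes a row’s key to a 64-bit keyspace ID and picks the shard whose range contains it, using range names in the “-80” / “80-” style that Vitess uses for shards. The hash function, the two-shard layout, and the `Router` class are illustrative assumptions; Vitess computes keyspace IDs with its own vindexes and routes real SQL, not dictionary lookups.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical layout: two shards splitting the 64-bit keyspace-ID space in half.
SHARDS: List[Tuple[str, int, int]] = [
    ("-80", 0x0000000000000000, 0x8000000000000000),
    ("80-", 0x8000000000000000, 0x10000000000000000),
]


def keyspace_id(video_id: str) -> int:
    # Illustrative hash only; Vitess's vindexes compute this differently.
    digest = hashlib.sha256(video_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(video_id: str) -> str:
    kid = keyspace_id(video_id)
    for name, start, end in SHARDS:
        if start <= kid < end:
            return name
    raise ValueError("keyspace ID not covered by any shard")


class Router:
    """Application code calls put()/get(); only the router knows the shard layout."""

    def __init__(self) -> None:
        self.shards: Dict[str, Dict[str, dict]] = {name: {} for name, _, _ in SHARDS}

    def put(self, video_id: str, row: dict) -> None:
        self.shards[shard_for(video_id)][video_id] = row

    def get(self, video_id: str) -> dict:
        return self.shards[shard_for(video_id)][video_id]


router = Router()
for i in range(100):
    router.put(f"video-{i}", {"title": f"Video {i}"})
print(router.get("video-7"))  # {'title': 'Video 7'}
```

Resharding then becomes a change to `SHARDS` plus data migration, with no change to the code that calls `get`.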
Ep 800 Cloud Native Computing Foundation with Chris Aniszczyk and Dan Kohn
The Kubernetes ecosystem consists of enterprises, vendors, open source projects, and individual engineers. The Cloud Native Computing Foundation was created to balance the interests of all the different groups within the cloud native community. The CNCF has similarities to the Linux Foundation and the Apache Software Foundation. The CNCF helps guide open source projects in the Kubernetes ecosystem, including Prometheus, Fluentd, and Envoy. With the help of the CNCF, these projects can find common ground where possible. KubeCon is a conference organized by the Cloud Native Computing Foundation. I attended the most recent KubeCon in Copenhagen. KubeCon was a remarkably well-run conference, and the attendees were excited and optimistic. As much traction as Kubernetes has, it is still very early days, and it was fun to talk to people and forecast what the future might bring. At KubeCon, I sat down with Chris Aniszczyk and Dan Kohn, the COO and executive director of the CNCF. I was curious about how to scale an organization like the CNCF. In some ways, it is like scaling a government. Kubernetes is growing faster than Linux grew, and the applications of Kubernetes are as numerous as those of Linux. Different constituencies want different things out of Kubernetes, and as those constituencies rapidly grow in number, how do you maintain diplomacy among competing interests? It’s not an easy task, and the CNCF has approached it by keeping in mind lessons from previous open source projects.
Ep 799 Cluster Schedulers with Ben Hindman
Mesos is a system for managing distributed systems. The goal of Mesos is to help engineers orchestrate resources among multi-node applications like Spark. Mesos can also manage lower level schedulers like Kubernetes. A common misconception is that Mesos aims to solve the same problem as Kubernetes, but Mesos is a higher level abstraction. Ben Hindman co-founded Mesosphere to bring the Mesos project to market. Large enterprises like Uber, Netflix, and Yelp use Mesosphere for resource management. Before he started the company, Ben worked in the Berkeley AMP Lab, a research program where the Spark and Tachyon projects were also born. At this point, he has spent significant time in both academia and industry. This conversation spans distributed systems theory, history, and practice. Ben and I spoke at KubeCon 2018 in Copenhagen, which was an excellent conference. We were both struck by how big the audience for Kubernetes has gotten, and by the pace at which the technology is advancing. Today, Kubernetes is mostly used for scheduling containerized applications that engineers have built themselves. But there will be higher level tools that use Kubernetes as a building block. Much like ZooKeeper was used as a building block for Hadoop, Kubernetes will be used to build serverless applications and distributed databases. Once you are using a distributed database built on Kubernetes, you don’t want to think about the container orchestration; you want to think about the raw storage and CPU requirements for that database. This is one reason why Mesos is so compelling. Since Kubernetes makes it easier to run many more distributed systems, it’s good to know that there is a framework built to manage those higher level applications.
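Mesos is often described as a two-level scheduler: the master offers each machine’s free resources to frameworks such as Spark, and each framework decides which offers to accept and which tasks to launch on them. The sketch below models that offer cycle in plain Python. The class names, the fixed offer order, and the fixed task sizes are simplifying assumptions, not the Mesos API; real Mesos uses a pluggable allocator (Dominant Resource Fairness by default) to decide which framework receives offers first.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Offer:
    """Free resources on one agent (machine), offered by the master."""
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """A hypothetical framework scheduler that launches fixed-size tasks."""
    name: str
    task_cpus: float
    task_mem_mb: int
    pending_tasks: int
    launched: List[str] = field(default_factory=list)  # agent used per task

    def resource_offer(self, offer: Offer) -> Offer:
        """Launch as many pending tasks as fit; return what was not used."""
        cpus, mem = offer.cpus, offer.mem_mb
        while (self.pending_tasks > 0
               and cpus >= self.task_cpus
               and mem >= self.task_mem_mb):
            cpus -= self.task_cpus
            mem -= self.task_mem_mb
            self.pending_tasks -= 1
            self.launched.append(offer.agent)
        return Offer(offer.agent, cpus, mem)


def allocate(offers: List[Offer], frameworks: List[Framework]) -> None:
    """Level one: the master offers each agent's resources to frameworks
    in a fixed order. Level two: each framework decides what to accept."""
    for offer in offers:
        remaining = offer
        for framework in frameworks:
            remaining = framework.resource_offer(remaining)


spark = Framework("spark", task_cpus=1.0, task_mem_mb=2048, pending_tasks=5)
db = Framework("db", task_cpus=2.0, task_mem_mb=4096, pending_tasks=2)
allocate([Offer("agent-1", 4.0, 8192), Offer("agent-2", 2.0, 4096)], [spark, db])
print(spark.launched, db.launched)
```

Here Spark absorbs all of agent-1 and most of agent-2 before the database framework can use an offer, so the database launches nothing. Preventing that kind of starvation is the job of the allocator policy, which this sketch leaves out.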
Ep 798 Deep Learning Topologies with Yinyin Liu
Algorithms for building neural networks have existed for decades. For a long time, neural networks were not widely used. Recent changes to the cost of compute and the size of our data have made neural networks extremely useful. Our smartphones collectively generate terabytes of useful data. Lower storage costs make it economical to keep that data. Cloud computing democratized the ability to do large scale machine learning across GPUs. Over the last few years, these trends have been driving widespread use of deep learning, in which neural nets with many layers produce strong results on classification and prediction tasks across many fields. Neural networks are a tool for making sense of unstructured data: text, images, sound waves, and videos. “Unstructured” data is data without a predefined schema, and it often has high volume or high dimensionality. For example, an image has a huge collection of pixels, and each pixel has a color value. One way to think about image classification is that you are finding correlations between those pixels. A certain cluster of pixels might represent an edge. After doing edge detection on pixels, you have a collection of edges. Then you can find correlations between those edges, and build up higher levels of abstraction. Yinyin Liu is a principal engineer and head of data science at the Intel AI products group. She studies techniques for building neural networks. Each different configuration of a neural network for a given problem is called a “topology.” Engineers are constantly exploring new topologies for deep learning applications such as natural language processing. In this episode, Yinyin explains what a deep learning topology is and describes topologies for natural language processing. We also talk about the opportunities and the bottlenecks in deep learning, including why the tools are so immature and what it will take to make the tooling better.
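The edge-detection step can be sketched as a small convolution. This is a minimal, dependency-free Python illustration using a hand-written Sobel-style kernel; in a convolutional neural network, the kernel weights would be learned from data rather than chosen by hand, and the image and kernel here are made-up examples.

```python
def convolve2d(image, kernel):
    """Slide kernel over image (no padding) and return the weighted sums.

    Strictly this is cross-correlation, which is what most deep learning
    libraries compute under the name "convolution".
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - k_h + 1, len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Sobel-style kernel: responds strongly where brightness increases from left
# to right, i.e. at a vertical edge.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Made-up 5x6 grayscale image: dark on the left half, bright on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]
edges = convolve2d(image, VERTICAL_EDGE)
for row in edges:
    print(row)
```

Every output row comes out as [0, 36, 36, 0]: zero over the flat regions and large values where the 3x3 window straddles the dark-to-bright boundary. Applying further layers of filters to these edge maps is one way to picture how a network builds up higher levels of abstraction.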
Ep 797 Data Engineering Podcast with Tobias Macey
Cloud computing lowered the cost of storing large volumes of data and improved access to the tools for doing so. In the mid-2000s, Hadoop caused a revolution in large scale batch processing. Since then, companies have been building ways to store and access their data faster and more efficiently. At the same time, the sheer volume of data has increased, and machine learning has given rise to methods of extracting signal from seemingly inconsequential data points. This confluence of factors gave rise to the role of the data engineer. A data engineer defines the data pipeline and supports data scientists and machine learning engineers. Tobias Macey hosts the “Data Engineering Podcast,” where he covers the fast-moving world of data engineering, including databases, cloud providers, and open source tools. Tobias and I covered a range of topics in the data engineering space and also spent significant time discussing the world of software engineering podcasting.
Ep 796 Stripe Atlas with Patrick McKenzie
Starting an Internet business is harder than it should be. You need to incorporate, create an operating agreement, set up a system to accept payments, and handle many other tasks that should be straightforward. In the 1990s, this was how it felt to set up anything on the Internet. You always had to stand up a web server on your own infrastructure before you could get to the interesting part: building an actual product. With the popularization of cloud computing, it became massively easier to stand up a server. Because of that lower activation energy, millions of applications and thousands of software businesses got started. But the activation energy required to start a business remains higher than necessary. It feels like standing up a web server in the 90s: lots of tedium, and lots of reinventing wheels that others have already built. This is the motivation behind Stripe Atlas, a project to simplify the process of starting an Internet business. Patrick McKenzie works on Atlas at Stripe. He was previously on the show to discuss his experience leaving a large corporation to work on his own small software companies. His name has become synonymous with the modern phenomenon of the small software company; he has been writing about this topic for over a decade at Kalzumeus.com. It was great to talk to Patrick once again about Internet businesses, and I’m excited to see Stripe Atlas become something huge.
Ep 795 Affirm Engineering with Libor Michalek
When I buy a mattress online, I pay for it with my credit card. Behind the scenes, a complex series of transactions occurs between a payment gateway, the credit card company, and a few banks. There are problems with this process: it is slow, complex, and involves the synchronization of several different parties. Some consumers will not purchase the mattress because they do not have the cash, and the lending rates they are offered are higher than they are willing to pay. If these consumers were offered more intelligently priced loans, the lender could still make money, the mattress company could still make money, and the consumer would get a new mattress. As things stand, it is a missed opportunity all around. Affirm is a consumer financial services company. Their first product offers loans to consumers making purchases. In today’s episode, Affirm CTO Libor Michalek explains how Affirm decided to build this product, and what they have done to scale it. The conversation took me by surprise. Affirm was started by Max Levchin, who was a co-founder of PayPal. I assumed that when Affirm was created, they already knew exactly what they were going to build, because Affirm is a payments company and Max has knowledge of the payments industry going back two decades. In reality, Affirm started out with vaguer ideas about what they were building. They spent some time running small experiments as they looked for product/market fit, just like a bootstrapped startup would. It was inspiring to see that even an experienced team is willing to go through the humble process of searching for a product within a space they are deeply familiar with. We didn’t get to all the questions I was planning to explore, but I hope to do another show about Affirm in the future.
Ep 794 Superpedestrian Robotic Wheel / Infrastructure at HubSpot Meetup Talks
Superpedestrian builds a robotic bicycle wheel that learns how you pedal and personalizes your bicycle ride. Superpedestrian’s engineering challenges sit at the intersection of robotics, software, and real-time analytics. The first half of today’s show is about Superpedestrian. Goss Nuzzo Jones and Matt Cole are engineers at Superpedestrian. The slides for their presentation are also in the show notes. The second half of today’s show is about HubSpot, a large business with many infrastructure challenges. Thomas Petr explained how HubSpot’s engineering has matured, and some of the scaling problems they have tackled. Last month, we had three Software Engineering Daily Meetups: in New York, Boston, and Los Angeles. At each of these Meetups, listeners from the SE Daily community got to meet each other and talk about software: what they are building and what they are excited about. I was happy to be in attendance at each of these, and I am posting the talks given by our presenters. The audio quality is not perfect on these, but there are also no ads. Thank you to HubSpot for hosting this Meetup. They have beautiful offices, and if you are looking for a job (or if you want to host a technology Meetup in the Boston area), I strongly recommend checking them out. We’d love to have you as part of our community. We will have more Meetups eventually, and you can be notified of these by signing up for our newsletter. Come to SoftwareDaily.com and get involved with the discussion of episodes and software projects. You can also check out our open source projects: the mobile apps and our website.
Ep 793 Spark Geospatial Analytics with Ram Sriharsha
Phones constantly track the location of their users. Devices like cars, smart watches, and drones are also picking up high volumes of location data. This location data is also called “geospatial data.” The amount of geospatial data is rapidly increasing, and there is a growing demand for software to perform operations over that data. Geospatial data sets are often massive, so it is non-trivial to perform operations over this data. Geospatial data can consist of something as simple as a set of latitude/longitude data points. A single lat/long coordinate pair can be enriched with information about what ZIP code it is in, how far that data point is from the other data points in the set, and where the nearest coffee shop is in relation to that data point. Ram Sriharsha created Magellan, a geospatial analytics library for Spark. In today’s show, Ram describes the set of problems within the domain of geospatial analytics engineering. Ram also works as a product manager for Apache Spark at Databricks.
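As an example of enriching a lat/long point, the sketch below computes great-circle distance with the haversine formula and finds the nearest point in a small set. It is plain Python under a spherical-Earth assumption (mean radius 6,371 km), and it does not use Magellan’s API; at Spark scale, a library would typically add spatial indexing rather than compare every pair of points. The example coordinates and names are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(point, candidates):
    """Return (name, distance_km) for the candidate closest to point.

    This compares against every candidate, which is fine for a handful of
    points but is what spatial indexes exist to avoid at scale.
    """
    lat, lon = point
    return min(
        ((name, haversine_km(lat, lon, c_lat, c_lon))
         for name, (c_lat, c_lon) in candidates.items()),
        key=lambda pair: pair[1],
    )


# Made-up example points.
shops = {"cafe-a": (37.7749, -122.4194), "cafe-b": (37.8044, -122.2712)}
print(nearest((37.78, -122.41), shops))
```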
Ep 792 WannaCry’s Gray Hat with Reeves Wiedeman
Last year, the WannaCry ransomware attack shut down hospitals, public transportation systems, and governments, demanding payment to unlock key computer systems. A programmer named Marcus Hutchins was able to stop WannaCry by registering a domain name that was hard-coded in the WannaCry code, which the malware used as a kill switch. Not long after he stopped the WannaCry attack, Marcus Hutchins was arrested in Las Vegas, where he had attended a security conference. His arrest was due to actions unrelated to WannaCry: he is accused of writing a piece of malware called Kronos. Marcus volunteered his time to help stop WannaCry, a piece of ransomware that threatened to cause billions of dollars in damages. Whether or not he was a black hat in the past, perhaps Marcus should be absolved of his past actions. Reeves Wiedeman is a journalist with New York Magazine, and he joins the show to tell the story of WannaCry’s Gray Hat: Marcus Hutchins.
Ep 791 Building Datadog with Alexis Le-Quoc
Alexis Le-Quoc co-founded Datadog in 2010, after living through the Internet boom and bust cycle of the late 90s and early 2000s. In 2010, cloud computing was just starting to become popular, and there was a gap in the market for infrastructure monitoring tools, which Alexis helped fill with the first version of Datadog. Since 2010, cloud infrastructure products have proliferated: new databases, queueing systems, and virtualization and containerization tools. Web 2.0 took off, and thousands of new Internet businesses got started. Many of these businesses used Datadog to monitor their increasingly wide range of infrastructure configurations, and Datadog began to scale. On today’s show, Alexis tells the story of how Datadog grew from its first product into a variety of tools: infrastructure monitoring, logging, and application performance monitoring. Monitoring is a unique challenge: there is a ton of data, the data is latency sensitive, and the data is operationally important. These engineering constraints make for a great conversation. Alexis is the CTO of Datadog, and we talked about cloud providers, building a business, infrastructure, and how to scale engineering management. Full disclosure: Datadog is a sponsor of Software Engineering Daily.
Ep 790 Technology Utopia with Michael Solana
Technology is pushing us rapidly toward a future that is impossible to forecast. We try to imagine what that future might look like, and we can’t help having our predictions shaped by the media we have consumed. 1984, Terminator, Gattaca, Ex Machina, Black Mirror: all of these stories present a dystopian future. But if you look around the world, the most successful technologists are mostly guided by a sense of optimism. Technologists themselves are mostly idealistic; they see the future through a utopian lens. Popular media largely tells a different story: that we are headed for a dystopian world. Why is there such a gulf in the level of idealism between technologists and the media? Mike Solana found himself asking that question on a regular basis during his work at Founders Fund, where he is a vice president. Founders Fund has a bias toward funding difficult, cutting-edge technology like gene editing, robotics, and nuclear energy. The technology Mike was seeing made him excited about the future, which led him to create the podcast “Anatomy of Next.” “Anatomy of Next” has explored biology, robotics, nuclear energy, superintelligence, and the nature of reality. Upcoming episodes will examine how our civilization might explore and settle the solar system, specifically Mars. I have listened through the entire first season of the show twice. I enjoyed it so much because Mike explores questions on the border of philosophy and technology, such as the nature of reality and what makes us human, and nobody can give perfect answers to these questions. But Mike interviews top experts on the show, which provides us with a framework for thinking about them. Guests on “Anatomy of Next” include Nick Bostrom (the author of Superintelligence), George Church (a pioneer in gene editing), and Palmer Luckey (the founder of VR company Oculus). Mike joins the show to talk about why he started “Anatomy of Next,” and his own perspective on the future.
Ep 789 Epicenter Cryptocurrencies with Brian Fabian Crain
Podcasting about cryptocurrencies is a strange occupation. You get emails all the time from companies doing token sales that you would never want to be affiliated with. You get angry tweets from anonymous Twitter accounts on one side or the other of the Bitcoin scaling debate. You get to interview extreme personalities, and the technical discussions can be highly educational. Brian Fabian Crain started the Epicenter podcast four years ago. Podcasting about cryptocurrencies allows a podcaster to report on a wide range of areas: economics, software, philosophy, and the stories within the blockchain world itself. Epicenter is one of my favorite podcasts about cryptocurrencies because Brian is always prepared enough to ask sophisticated questions. In this episode, we talked about ICOs: when does an ICO make sense? It seems that many projects built around a token could function just as well without one. We discussed the scaling approaches of Bitcoin and Ethereum: why are these two blockchains taking very different approaches to their scaling plans? And we talked about Chorus, the company that Brian founded to build infrastructure for proof-of-stake cryptocurrencies. I enjoyed talking to Brian about all these different subjects, and look forward to having him on again in the future.
Ep 788 Keybase Architecture / Clarifai Infrastructure Meetup Talks
Keybase is a platform for managing public key infrastructure. Keybase’s products simplify the complicated process of associating your identity with a public key. Keybase is the subject of the first half of today’s show. Michael Maxim, an engineer at Keybase, gives an overview of how the technology works and what kinds of applications Keybase unlocks. The second half of today’s show is about Clarifai. Clarifai is an AI platform that provides image recognition APIs as a service. Habib Talavati explains how Clarifai’s infrastructure processes requests, and the opportunities for improving the efficiency of that infrastructure. Last month, we had three Software Engineering Daily Meetups: in New York, Boston, and Los Angeles. At each of these Meetups, listeners from the SE Daily community got to meet each other and talk about software: what they are building and what they are excited about. I was happy to be in attendance at each of these, and I am posting the talks given by our presenters. The audio quality is not perfect on these, but there are also no ads. Thanks to Datadog for graciously providing a space for our Meetup, and for being a sponsor of SE Daily. You can sign up for Datadog and get a free t-shirt by going to softwareengineeringdaily.com/datadog.
Ep 787 Google Cluster Evolution with Brian Grant
Google’s central system for managing compute resources is called Borg. On Borg, millions of Linux containers process a wide variety of workloads. When a new application is spun up, Borg provides that application with the resources it needs. Workloads at Google usually fall into one of two distinct categories: long-running application workloads (such as Gmail) and batch workloads (such as a MapReduce job). In the early days of Google, the long-lived workloads were scheduled onto a system called “BabySitter” and the batch workloads were scheduled onto a system called “Global Work Queue.” Borg was the first cluster manager at Google designed to service both long-running and batch workloads from a single system. The second cluster manager at Google was Omega, a project that was created to improve the engineering behind Borg. Omega’s innovations improved the efficiency and architecture of Borg. More recently, Kubernetes was created as an open source implementation of the ideas pioneered in Borg and Omega. Google has also built a Kubernetes-as-a-service offering, Google Kubernetes Engine, that companies use to run their infrastructure in much the same way that Google does. Brian Grant is an engineer at Google who has seen all three of these cluster management systems evolve. He joins the show to discuss how the workloads at Google have changed over time, and how his perspective on building and architecting distributed systems has evolved. Full disclosure: Google is a sponsor of Software Engineering Daily.
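One idea that lets a single system serve both long-running and batch workloads is priority-based preemption: when a machine is full, a high-priority service task can evict lower-priority batch work. The toy sketch below shows that idea on a single machine with CPU as the only resource. The priority values and class names are illustrative assumptions; Borg’s actual scheduler also checks feasibility across many machines, scores candidate placements, and accounts for memory and other resources.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # compare tasks by identity, so remove() evicts this exact task
class Task:
    name: str
    cpus: int
    priority: int  # higher wins; the values used below are made up


class Machine:
    """One machine with a CPU budget, shared by service and batch tasks."""

    def __init__(self, cpus: int):
        self.capacity = cpus
        self.running: List[Task] = []

    def free(self) -> int:
        return self.capacity - sum(t.cpus for t in self.running)

    def schedule(self, task: Task) -> Optional[List[Task]]:
        """Place task, preempting lower-priority tasks if necessary.

        Returns the list of evicted tasks (possibly empty), or None if the
        task cannot fit even after evicting every lower-priority task.
        """
        lower = sorted(
            (t for t in self.running if t.priority < task.priority),
            key=lambda t: t.priority,
        )
        if self.free() + sum(t.cpus for t in lower) < task.cpus:
            return None  # leave the machine untouched
        evicted = []
        for victim in lower:  # evict the lowest priority first
            if self.free() >= task.cpus:
                break
            self.running.remove(victim)
            evicted.append(victim)
        self.running.append(task)
        return evicted


SERVICE, BATCH = 200, 100  # hypothetical priority levels
machine = Machine(cpus=8)
machine.schedule(Task("batch-1", 4, BATCH))
machine.schedule(Task("batch-2", 4, BATCH))
evicted = machine.schedule(Task("frontend", 3, SERVICE))
print([t.name for t in evicted], [t.name for t in machine.running])
```

In a real cluster, an evicted batch task would be rescheduled on another machine; this sketch omits that step.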
Ep 786 TensorFlow Applications with Rajat Monga
Rajat Monga is a director of engineering at Google where he works on TensorFlow. TensorFlow is a framework for numerical computation developed at Google. The majority of TensorFlow users are building machine learning applications such as image recognition, recommendation systems, and natural language processing, but TensorFlow is applicable to a broader range of scientific computation than just machine learning. TensorFlow has APIs for decision trees, support vector machines, and linear algebra. The current focus of the TensorFlow team is usability. There are thousands of engineers building data intensive applications with TensorFlow, but Rajat and the rest of the TensorFlow team would like to see millions more. In today’s show, Rajat and I discussed how TensorFlow is becoming more usable, as well as some of the developments in TensorFlow around edge computing, TensorFlow Hub, and TensorFlow.js, which allows TensorFlow to run in the browser.
Ep 785 Siftery Engineering with Ayan Barua
There are hundreds of different databases. There are tens of continuous delivery products. There is an ocean of cloud providers, CRM systems, monitoring platforms, and sales prospecting tools. The range of available software products is so diverse that it can be overwhelming to figure out which products to buy. Siftery is a company that was started to index software products and help buyers make decisions. Siftery can build a data set from your website or from your Google account, assess your software stack, and compare those software products to others on the market. In a previous show with Ayan Barua, we discussed how engineers should explore the question of build vs. buy. In today’s episode, Ayan joins the show to discuss how Siftery has evolved, and the engineering behind Siftery products. A newer Siftery product called Track can ingest banking transactions, QuickBooks records, or other transaction histories and use that information to compile the cost structure of your software company. We spent the latter part of our conversation discussing why and how they built it.
Ep 784 NATS Messaging with Derek Collison
A message broker is an architectural component that sends messages between different nodes in a distributed system. Message brokers are useful because the sender of a message does not always know who might want to receive that message. Message brokers can be used to implement the “publish/subscribe” pattern. Because the message workloads are centralized within the pub/sub system, operators can scale the performance of the messaging infrastructure by simply scaling that pub/sub system. Derek Collison has worked on messaging infrastructure for 25 years. He started at TIBCO, then spent time at Google and VMware. When he was at VMware, he architected the open source platform Cloud Foundry. While working on Cloud Foundry, Derek developed NATS, a messaging control plane. Since that time, Derek has started two companies: Apcera and Synadia Communications. In our conversation, Derek and I discussed the history of message brokers, how NATS compares to Kafka, and his ideas for how NATS could scale in the future to become something much more than a centralized message bus.
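Here is a minimal in-process sketch of the publish/subscribe pattern: publishers send to a subject and never reference subscribers, and the broker fans each message out to every matching subscription. The subject matching follows NATS conventions (“.”-separated tokens, “*” for exactly one token, “>” for one or more trailing tokens), but the Broker class and its callback signature are hypothetical and are not the NATS client API; a real NATS deployment runs the broker as a separate server that clients reach over the network.

```python
from collections import defaultdict


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style matching: '*' matches exactly one token, and '>' (only as
    the last token) matches one or more remaining tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


class Broker:
    """Hypothetical in-process broker: publishers never see subscribers."""

    def __init__(self):
        self._subs = defaultdict(list)  # pattern -> list of callbacks

    def subscribe(self, pattern, callback):
        self._subs[pattern].append(callback)

    def publish(self, subject, message):
        """Deliver message to every matching subscriber; return the count."""
        delivered = 0
        for pattern, callbacks in self._subs.items():
            if subject_matches(pattern, subject):
                for callback in callbacks:
                    callback(subject, message)
                    delivered += 1
        return delivered


received = []
broker = Broker()
broker.subscribe("orders.*", lambda s, m: received.append(("star", s)))
broker.subscribe("orders.>", lambda s, m: received.append(("gt", s)))
broker.publish("orders.created", "id=1")     # matches both subscriptions
broker.publish("orders.us.created", "id=2")  # matches only "orders.>"
broker.publish("payments.created", "id=3")   # matches nothing
print(received)
```

Because publishers only name a subject, new consumers can be added by subscribing, with no change to the code that sends messages.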
Ep 783 Stripe Observability Pipeline with Cory Watson
Stripe processes payments for thousands of businesses. A single payment could involve 10 different networked services. If a payment fails, engineers need to be able to diagnose what happened, and the root cause could lie in any of those services. Distributed tracing is used to find the causes of failures and latency within networked services. In a distributed trace, each unit of work performed for a request, such as a call to one service, is recorded as a span with a start and end time. The spans can be connected together because they share a trace ID. The spans of a distributed trace are one element of observability; others include metrics and logs. Each of these kinds of observability data makes its way into services like Lightstep and Datadog. The path traveled by different elements of observability is called the observability pipeline. In an episode last year, Cory Watson explained how observability works at Stripe. In today’s episode, Cory describes how observability data is created and aggregated. It’s a useful discussion for anyone working at a company that is figuring out how to instrument their systems for better monitoring.
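To show how a shared trace ID ties spans together, here is a minimal Python sketch that groups spans by trace and prints one request’s span tree with durations. The field names and millisecond timestamps are assumptions modeled loosely on common tracing data models, not the format used by Stripe, Lightstep, or Datadog.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one request
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def group_by_trace(spans: List[Span]) -> Dict[str, List[Span]]:
    """Spans from many services arrive interleaved; the trace ID reunites them."""
    traces: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return traces


def render_tree(trace_spans: List[Span]) -> List[str]:
    """Lines of the span tree, each child indented under its parent."""
    children: Dict[Optional[str], List[Span]] = defaultdict(list)
    for span in trace_spans:
        children[span.parent_id].append(span)
    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in sorted(children[parent_id], key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.service}: {span.duration_ms:g} ms")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Made-up spans from two requests, arriving out of order.
spans = [
    Span("t1", "a", None, "api", 0, 120),
    Span("t2", "x", None, "api", 5, 30),
    Span("t1", "b", "a", "charges", 10, 90),
    Span("t1", "c", "b", "card-network", 20, 80),
]
for line in render_tree(group_by_trace(spans)["t1"]):
    print(line)
```

Reading the tree top-down shows where the 120 ms of the request went, which is the kind of question tracing answers when a payment is slow or fails.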
Ep 782 Bitcoin Debates with Roger Ver
Bitcoin and Bitcoin Cash are two cryptocurrencies with similar properties. But the supporters of each have strongly divergent opinions on the direction of the Bitcoin project. At the center of this debate is the subject of block size. Bitcoin’s block size determines how many transactions fit into each block that is mined. A larger block size allows more transactions per block, which means lower fees and faster confirmation when the network is busy, but it places higher bandwidth and storage demands on the hardware that miners and full nodes run. A smaller block size limits on-chain throughput and leads to higher fees, but allows the full nodes on the network to be run on low performance hardware like a Raspberry Pi. Bitcoin Cash has a large block size. Bitcoin, whose reference implementation is Bitcoin Core, has a smaller block size. Proponents of the smaller block size argue that Bitcoin’s scaling can be achieved by the off-chain Lightning Network. Roger Ver is a Bitcoin entrepreneur and investor. Since he discovered the currency, he has been buying it and evangelizing it. More recently, Roger has become an ardent supporter of Bitcoin Cash, emphasizing his view that Bitcoin Cash is Bitcoin. In this episode, Roger describes his economic ideology and explains why Bitcoin is so important to him. We explore how vested interests can shape the narrative and the direction of Bitcoin, and talk about how corporations, governments, and individuals might use cryptocurrencies in the future.
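The block-size tradeoff comes down to simple arithmetic. The sketch below estimates an upper bound on on-chain throughput and the daily growth of the chain that every full node must store. The 250-byte average transaction is an assumption (real transactions vary widely), the 10-minute block interval is Bitcoin’s target, and the sketch ignores SegWit, which changed how Bitcoin measures block capacity.

```python
BLOCK_INTERVAL_S = 600  # Bitcoin targets roughly one block every 10 minutes
BLOCKS_PER_DAY = 24 * 60 * 60 // BLOCK_INTERVAL_S  # 144


def max_tps(block_size_bytes: int, avg_tx_bytes: int = 250) -> float:
    """Upper bound on transactions per second if every block is full.

    avg_tx_bytes is an assumed average transaction size.
    """
    txs_per_block = block_size_bytes // avg_tx_bytes
    return txs_per_block / BLOCK_INTERVAL_S


def daily_growth_bytes(block_size_bytes: int) -> int:
    """Chain growth per day if every block is full; full nodes store all of it."""
    return block_size_bytes * BLOCKS_PER_DAY


# 1 MB was Bitcoin's original limit; Bitcoin Cash launched with 8 MB.
for size in (1_000_000, 8_000_000):
    print(f"{size // 1_000_000} MB blocks: ~{max_tps(size):.1f} tx/s, "
          f"~{daily_growth_bytes(size) / 1e9:.2f} GB/day")
```

Under these assumptions, 1 MB blocks allow roughly 6.7 transactions per second and about 0.14 GB of chain growth per day, while 8 MB blocks allow roughly 53 transactions per second and about 1.15 GB per day. That difference in storage and bandwidth is what the Raspberry Pi argument is about.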
Ep 781 React Stack with G2i Team
Many new frontend web apps today use ReactJS. An increasing number of mobile apps are created using the cross-platform components of React Native. GraphQL, Facebook’s open source query language for fetching data from APIs, is being used by more and more companies, which are finding that it simplifies their development. Facebook’s open source suite of technologies created a new developer ecosystem, and there is an increased demand for engineers who know how to build software with ReactJS, React Native, and GraphQL. This was the reasoning behind Gabe Greenberg starting G2i, a developer marketplace of engineers who write ReactJS, React Native, and GraphQL applications. In this episode, Gabe, Lee Johnson, and Chris Severns from G2i join the show to discuss React and the other Facebook open source technologies, as well as the ecosystem around them. We explored the architecture of a developer marketplace business, and how to scale a consulting company.
Ep 780 SafeGraph with Auren Hoffman
Machine learning tools are rapidly maturing. TensorFlow gave developers an open source version of Google's internal machine learning framework. Cloud computing provides a cost-effective, accessible way to train models, and edge computing allows low-latency model deployments. But even if you are a kid with a laptop who has learned all the machine learning algorithms, read all the deep learning textbooks, and figured out how to use AWS, all of the tooling and education in the world doesn't change the fact that you still need data to build models. This is why we need data-as-a-service. A kid with a laptop has access to infrastructure-as-a-service, platform-as-a-service, and software-as-a-service. As these layers build on each other, there has been an explosion of high-leverage software products. But the world of data sets remains crude and underdeveloped. Think about some data sets you could take advantage of: the number of emergency room patients who come into a hospital with chest pain; the size of the average coffee mug; the principal component breakdown of sidewalk concrete in San Francisco. SafeGraph is a company that offers data sets as a service. Auren Hoffman is the CEO of SafeGraph, and he joins the show to discuss why he started building SafeGraph and how he thinks about the state of publicly accessible data. Auren was previously on the podcast, and I always enjoy talking to him. This was a great episode, and I think you will like it as well. Full disclosure: LiveRamp, the company Auren created before SafeGraph, is a sponsor of Software Engineering Daily. Links: Raj Chetty's economic papers; Paul Graham, "Keep Your Identity Small"; Auren Hoffman on Quora.
Ep 779 Talking Bitcoin with Adam B. Levine
Let's Talk Bitcoin is one of the most popular podcasts about cryptocurrencies. Adam B. Levine started it after three earlier podcasts of his did not get the traction he had hoped for. Adam parlayed the success of Let's Talk Bitcoin into a network of podcasts, the Let's Talk Bitcoin Network, which also includes one of my favorite shows, Epicenter. Adam joins me today for a discussion of many topics: the culture around cryptocurrencies, the art of podcasting, blockchain scalability, and ICOs. The conversation about ICOs was particularly exciting; if you have been listening to recent episodes, you have heard interviews with companies that have done ICOs. Some ICO companies are now facing legal consequences for their token sales, and Adam and I disagree over whether these companies deserve much sympathy. It was a debate I enjoyed, and I hope to have Adam back on the show for more debates in the future.
Ep 778 Monitoring Kubernetes with Ilan Rabinovitch
Monitoring a Kubernetes cluster allows operators to track the resource utilization of the containers within that cluster. In today's episode, Ilan Rabinovitch joins the show to explore the different options for setting up monitoring, and some common design patterns for Kubernetes logging and metrics gathering. Ilan is the VP of product and community at Datadog. Earlier in his career, Ilan spent much of his time working with Linux and taking part in the Linux community, and we discussed the similarities and differences between the evolution of Linux and that of Kubernetes. In previous episodes, we have explored some common open source solutions for monitoring Kubernetes, including Prometheus and the EFK stack (Elasticsearch, Fluentd, Kibana). Since Ilan works at Datadog, we explored how hosted solutions compare to self-managed monitoring. We also talked about how to assess different hosted solutions, such as those from a large cloud provider like AWS versus vendors that focus specifically on monitoring. Full disclosure: Datadog is a sponsor of Software Engineering Daily. Related reading: "8 Surprising Facts About Real Docker Adoption" (Datadog).
Ep 777 Unchained with Laura Shin
Laura Shin is the host of Unchained, a podcast about cryptocurrencies and decentralized technology. For every episode, Laura does significant research and preparation, and the result is polished, high-quality content. Her enthusiasm for cryptocurrencies comes through in her reporting. Podcasting about cryptocurrencies requires walking a fine line. The space mixes drama with exciting technology, and both are great material for a journalist. But you can't go too deep into the drama, or the podcast will feel like a tabloid; and you can't go too deep into the technical weeds, or the listener will fall asleep. Laura joins the show to discuss how she got into reporting on cryptocurrencies, why she became so obsessed with the subject, and her experience as a solo entrepreneurial journalist.
Ep 776 Mastodon: Federated Social Network with Eugen Rochko
Social networks can make you feel connected to a global society, but those networks are controlled by corporations, and a corporation's profit motives are not directly aligned with the experience of its users. Mastodon is an open source, decentralized social network. Eugen Rochko started building Mastodon out of dissatisfaction with centralized social networks like Facebook and Twitter. In the Mastodon model, anyone can run their own server (called an instance), and users on other instances can connect to it: you can follow users whose accounts reside on other instances. Eugen joins the show to discuss how Mastodon works and how its thousands of users interact on the platform. We explore the open source community that is building Mastodon, and speculate on the future of social networks.
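A toy, in-memory model can illustrate the federation idea: each instance hosts its own accounts, an account is addressed as `user@domain`, and a post is delivered to whichever instances host the author's followers. The class and method names below are hypothetical. Real Mastodon servers federate over HTTP using protocols such as ActivityPub, which this sketch does not implement.

```python
from collections import defaultdict


class Network:
    """Stands in for the internet: a lookup from domain name to instance."""

    def __init__(self):
        self.instances = {}


class Instance:
    """One independently operated server in a toy federated network."""

    def __init__(self, domain: str, network: Network):
        self.domain = domain
        self.network = network
        self.followers = defaultdict(set)   # local username -> {"user@domain", ...}
        self.timelines = defaultdict(list)  # local username -> received (author, text)
        network.instances[domain] = self

    def follow(self, local_user: str, remote_address: str) -> None:
        """Let a local user follow an account that may live on another instance."""
        username, domain = remote_address.split("@")
        target = self.network.instances[domain]  # resolve the remote server
        target.followers[username].add(f"{local_user}@{self.domain}")

    def publish(self, local_user: str, text: str) -> None:
        """Fan a post out to each follower's home instance."""
        author = f"{local_user}@{self.domain}"
        for follower in self.followers[local_user]:
            username, domain = follower.split("@")
            self.network.instances[domain].timelines[username].append((author, text))


net = Network()
a = Instance("a.example", net)
b = Instance("b.example", net)
b.follow("bob", "alice@a.example")  # a cross-instance follow
a.publish("alice", "Hello, fediverse")
print(b.timelines["bob"])  # [('alice@a.example', 'Hello, fediverse')]
```

The key design point is that no central server holds every account: `a.example` only needs to know which remote instances to deliver to, and each instance keeps its own users' timelines.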
Ep 775 Go Systems with Erik St. Martin
Go is a language designed to improve systems programming. Go includes abstractions that simplify aspects of low-level engineering that have historically been difficult: concurrency, resource allocation, and dependency management. In that light, it makes sense that the Kubernetes container orchestration system was written in Go. Erik St. Martin is a cloud developer advocate at Microsoft, where he focuses on Go and Kubernetes. He also hosts the podcast "Go Time," and is a co-author of the book Go in Action. Recently, Erik helped build the Virtual Kubelet project, which allows Kubernetes nodes to be backed by services outside of the cluster. If you want your Kubernetes cluster to leverage abstractions such as serverless functions and standalone container instances, you can use Virtual Kubelet to treat those abstractions as nodes. Erik also discussed his experience using Kubernetes at Comcast, which was a great case study. Near the end of the show, he talked about organizing GopherCon, a popular conference for Go programmers; if you are organizing a conference or thinking about organizing one, you will find this useful. Full disclosure: Microsoft, where Erik works, is a sponsor of Software Engineering Daily.
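The idea of backing a node with an outside service can be sketched as a delegation pattern: the scheduler sees an ordinary node, but every pod placed on it is forwarded to a provider. This is a hypothetical Python model for illustration only. The real Virtual Kubelet project is written in Go and defines its own provider interface, and the names `Provider`, `FakeContainerService`, and `VirtualNode` are invented here.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Provider(ABC):
    """Hypothetical backend that can run pods somewhere outside the cluster."""

    @abstractmethod
    def create_pod(self, name: str, image: str) -> None: ...

    @abstractmethod
    def pod_status(self, name: str) -> str: ...


class FakeContainerService(Provider):
    """Stand-in for an external service, such as a hosted container-instance API."""

    def __init__(self) -> None:
        self.running: Dict[str, str] = {}  # pod name -> image

    def create_pod(self, name: str, image: str) -> None:
        self.running[name] = image

    def pod_status(self, name: str) -> str:
        return "Running" if name in self.running else "Unknown"


class VirtualNode:
    """Looks like a node to the scheduler but forwards every pod to a provider."""

    def __init__(self, name: str, provider: Provider) -> None:
        self.name = name
        self.provider = provider

    def run_pod(self, pod_name: str, image: str) -> str:
        self.provider.create_pod(pod_name, image)  # no local machine runs the pod
        return self.provider.pod_status(pod_name)


node = VirtualNode("virtual-node-1", FakeContainerService())
print(node.run_pod("web-1", "nginx"))  # Running
```

Swapping in a different `Provider` subclass changes where pods actually run without changing how the cluster sees the node, which is the property the episode describes.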
Ep 774 Database Chaos with Tammy Butow
Tammy Butow has worked at DigitalOcean and Dropbox, where she built out infrastructure and managed engineering teams. Both companies serve customers at massive scale. At Dropbox, Tammy worked on the database that holds the metadata Dropbox users rely on to access their files. Calling this metadata system simply a "database" is an understatement: it is actually a multi-tiered system of caches and databases. This metadata is extremely sensitive, since it tells you where objects across Dropbox are located, so it has to be highly available. To improve that reliability, Tammy helped institute chaos engineering: inducing random failures across the Dropbox infrastructure and making sure that Dropbox systems could respond to those failures automatically. If you are unfamiliar with the topic, we have covered chaos engineering in two previous episodes of Software Engineering Daily. Tammy now works at Gremlin, a company that offers chaos engineering as a service. In this show we talked about her experiences at Dropbox and how to institute chaos engineering across databases. We also explored how her work at Gremlin, a smaller startup, compares to her work at the larger companies Dropbox and DigitalOcean.
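A minimal sketch of the chaos-engineering loop: record a steady-state baseline, inject a failure at random, and check that the system still answers correctly. The replicated store, its failover rule, and the key names are hypothetical stand-ins, not Dropbox's metadata system or Gremlin's product.

```python
import random


class Replica:
    """One copy of a (hypothetical) metadata store."""

    def __init__(self, name: str, data: dict):
        self.name = name
        self.data = dict(data)
        self.alive = True

    def get(self, key: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class ReplicatedStore:
    """Reads fail over to the next healthy replica."""

    def __init__(self, replicas):
        self.replicas = replicas

    def get(self, key: str) -> str:
        for replica in self.replicas:
            try:
                return replica.get(key)
            except ConnectionError:
                continue  # this replica failed; try the next one
        raise RuntimeError("all replicas are down")


def chaos_experiment(store: ReplicatedStore, key: str, rounds: int, seed: int = 0) -> bool:
    """Kill one random replica per round and verify that reads still succeed."""
    rng = random.Random(seed)
    expected = store.get(key)  # steady-state baseline before injecting faults
    for _ in range(rounds):
        victim = rng.choice(store.replicas)
        victim.alive = False  # inject the failure
        try:
            survived = store.get(key) == expected
        except RuntimeError:
            survived = False
        finally:
            victim.alive = True  # always restore the replica after the round
        if not survived:
            return False
    return True


data = {"/photos/cat.jpg": "block-server-17"}
store = ReplicatedStore([Replica(f"db{i}", data) for i in range(3)])
print(chaos_experiment(store, "/photos/cat.jpg", rounds=10))  # True
single = ReplicatedStore([Replica("db0", data)])
print(chaos_experiment(single, "/photos/cat.jpg", rounds=1))  # False: nowhere to fail over
```

The experiment with a single replica fails, which is the point of running it: it surfaces a missing failover path before a real outage does.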
Ep 773 Site Reliability Management with Mike Hiraga
Software engineers have interacted with operations teams for as long as software has been written. In the 1990s, most operations teams worked with physical infrastructure: they made sure that servers were provisioned correctly and had the proper software installed. When software engineers shipped bad code that took down a company's systems, the operations team had to help recover, which often meant dealing with physical servers. During the 90s and early 2000s, these operations engineers were often called "sysadmins," "database admins" (if they worked on databases), or "infrastructure engineers." Over the last decade, virtualization has led to many more logical servers across a company, and cloud computing has made infrastructure remote and programmable. This progression changed how operations engineers work: since infrastructure can be controlled through code, operations engineers now write a lot more code. The "DevOps" movement can be seen through this lens. Operations teams were now writing software, which meant that software engineers could also work on operations. Both software engineers and operators could create deployment pipelines, monitor application health, and improve system scalability, all through code. Site reliability engineering (SRE) is a newer point along the evolutionary timeline of operations. Web applications can be unstable, and SRE focuses on making them run more reliably. This is especially important for a company that makes business applications that other companies rely on. Mike Hiraga is the head of site reliability engineering at Atlassian, which makes several products that many businesses depend on, such as JIRA, Confluence, HipChat, and Bitbucket. Because Atlassian's infrastructure operates at massive scale, Mike has a broad set of experiences from managing SRE there. One particularly interesting topic is Atlassian's migration to the cloud.
Atlassian was started in 2002, before the cloud was widely used, and has more recently made a push to move its applications into the cloud. Full disclosure: Atlassian is a sponsor of Software Engineering Daily, and they are hiring, so if you are looking for a job, check out Atlassian jobs, or send me an email directly and I'm happy to introduce you to the team.
Ep 772 IPFS Design with David Dias
The InterPlanetary File System (IPFS) is a decentralized, global, peer-to-peer file system. IPFS combines ideas from BitTorrent, Git, and Bitcoin to create a new way to store and access objects across the Internet. When you access an object on almost any website, you access it through a location address: a URL. The URL tells you where to find the object. If the object is a photo on Facebook that you are linking to, the URL will point somewhere on Facebook. Other objects that we access through URLs include web pages, videos, and JavaScript packages. URLs seem natural to us: you look up an object based on where it is stored. Why would you do anything differently? A downside of location addressing is that if the location disappears, you can no longer access the object. If a government decides to censor a website that I want to visit, it can shut down access to the server where that website sits, and my link will break. This happened in Turkey, where Wikipedia was blocked last year. Objects in IPFS are content addressed: you request an object by giving IPFS a cryptographic hash of the object, and IPFS finds someone on the network who has a copy and gives you access to it. To look up a web page in an IPFS-enabled browser, you put the content address in the address bar. When the HTML for that page arrives, it may refer to many other content-addressed files, and your browser can fetch all of those from the IPFS peer-to-peer network as well. In this episode, David Dias explains how IPFS is designed. David is an engineer at Protocol Labs, the company building IPFS. This episode is a great companion to our previous show with Juan Benet, the creator of IPFS.
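A minimal sketch of content addressing versus location addressing: objects are stored under the SHA-256 hash of their bytes, so any peer holding a copy can serve a request, and the receiver can verify that the bytes match the address. This is a simplified assumption. Real IPFS addresses are CIDs built from multihashes, files are chunked into a Merkle DAG, and peers are found through a distributed hash table; none of that is modeled here, and the `Peer` and `fetch` names are hypothetical.

```python
import hashlib


def content_address(data: bytes) -> str:
    """The address is derived from the content itself (SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


class Peer:
    def __init__(self, name: str):
        self.name = name
        self.store = {}  # content address -> bytes

    def add(self, data: bytes) -> str:
        addr = content_address(data)
        self.store[addr] = data
        return addr


def fetch(addr: str, peers) -> bytes:
    """Ask peers in turn; accept the first copy whose hash matches the address."""
    for peer in peers:
        data = peer.store.get(addr)
        if data is not None and content_address(data) == addr:
            return data  # verified: these are exactly the bytes that were requested
    raise KeyError(f"no peer has {addr}")


origin, mirror = Peer("origin"), Peer("mirror")
page = b"<html>Hello from IPFS</html>"
addr = origin.add(page)
mirror.add(page)       # another peer keeps a copy
origin.store.clear()   # the original host disappears (or is censored)
print(fetch(addr, [origin, mirror]) == page)  # True: the link still resolves
```

Because the address is a hash of the content, a tampered copy from a misbehaving peer is simply rejected, and the request moves on to the next peer.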
Ep 771 Ethereum Governance with Hudson Jameson
The Ethereum community started as a small group of dedicated engineers. It has ballooned to thousands of engineers, entrepreneurs, and investors, all of whom have a stake in the direction of Ethereum. Ethereum is an open source project, and steering a popular open source project can get complex. Ethereum is figuring out how to govern itself. It's not clear what the perfect model is, but there are two historical examples to think about: Linux and Bitcoin. Linux is similar to Ethereum in that there is a clear leader: Linux has Linus Torvalds and Ethereum has Vitalik Buterin. Linux is massively successful, and Linux development does follow a top-down, hierarchical approach. But does a hierarchy with clear leadership make sense for a project like Ethereum, which has decentralization at its core? Bitcoin is headless: Satoshi disappeared in 2010, and there is no official leader. Bitcoin has succeeded without a well-defined hierarchy, depending on your definition of success. Bitcoin development does not move as fast as Ethereum's (this is by design), but there is more widespread trust that the integrity of the system cannot be compromised by its creator. Hudson Jameson is an Ethereum developer and entrepreneur who has been part of the community since the early days. He works on Ethereum governance, which defines how changes to the Ethereum project are proposed, accepted, and implemented. Hudson joins the show today to talk about Ethereum governance, smart contracts, and the DAO hack. We did not discuss on-chain vs. off-chain governance, but I hope to cover that in a future episode.
Ep 770 PubSub Infrastructure with Stephen Blum
The pubsub (publish-subscribe) pattern allows a developer to create channels that messages can be written to and read from. Pubsub is useful for multicast messaging, when you want a producer to publish messages and have multiple consumers who are subscribed to the channel receive those messages. Almost any application that reaches a high level of complexity will need a pubsub system of some kind, and the pubsub system itself can be complex: it needs to scale up and down to handle different numbers of consumers and producers, and different volumes of messages. Back in 2010, the growth of mobile and cloud was leading to many new applications with high-throughput, multi-user interactions. Developers were standing up their own instances of open source messaging systems like RabbitMQ and ZeroMQ, and once those systems needed to scale, the developers had to handle the scaling themselves. Stephen Blum started his company PubNub around this time to create automatically scaling APIs for messaging. Stephen joins the show to discuss the infrastructure choices involved in building a large-scale pubsub service and how the company has scaled over time. He also talks about the management, product development, and business side of running the company. PubNub has built several additional technologies on top of the core infrastructure that was originally built for messaging. Full disclosure: PubNub is a sponsor of Software Engineering Daily.
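A minimal, single-process sketch of the pubsub pattern: publishers write to a named channel, and every subscriber to that channel receives its own copy of each message (multicast). The `Broker` class is hypothetical and in-memory; it is not PubNub's API, and a hosted service would add networking, persistence, and the automatic scaling discussed in the episode.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, str], None]  # called as handler(channel, message)


class Broker:
    """In-memory channel registry: channel name -> list of subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        """Fan the message out to every subscriber; return how many received it."""
        handlers = list(self._subscribers.get(channel, []))  # snapshot; no new key
        for handler in handlers:
            handler(channel, message)
        return len(handlers)


broker = Broker()
inbox_a, inbox_b = [], []
broker.subscribe("chat", lambda ch, msg: inbox_a.append(msg))
broker.subscribe("chat", lambda ch, msg: inbox_b.append(msg))
delivered = broker.publish("chat", "hello")  # one producer, two consumers
print(delivered, inbox_a, inbox_b)  # 2 ['hello'] ['hello']
```

The producer never needs to know who the consumers are; it only names a channel. That decoupling is what lets a service add or remove subscribers, and scale the fan-out, independently of the publishers.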
Ep 769 Gitcoin: Open Source Bounties with Kevin Owocki
Most technology companies rely on open source software projects, but those projects are often maintained by people who are not affiliated with any particular company. When an open source project accumulates too much technical debt, it can become a tragedy of the commons: who is responsible for maintaining it? This is the motivation for open source bounties. Companies and individuals who rely on open source create bounties, which are financial incentives for developers to solve problems within an open source project. Kevin Owocki is the creator of Gitcoin, a platform for open source bounties that is mediated by an Ethereum smart contract. Kevin joins the show to discuss his experience building Gitcoin, as well as some of the problems in the blockchain space, such as rampant ICOs. Gitcoin is NOT a cryptocurrency or token itself; it is a platform for building open source software more efficiently. Kevin was an awesome guest, and you will enjoy the conversation. Gitcoin is a nice example of a real-world Ethereum use case because it uses Ethereum for escrow: if I post a $25 bounty for someone to fix a bug in my open source project, I lock up that amount of ether in a smart contract. When the bug is fixed, the programmer who fixed it submits a pull request on GitHub, and I release the ether from the smart contract to pay them. We would love for you to fill out our listener survey at softwareengineeringdaily.com/survey; it will help us decide what content to focus on. Of course, you can also send me an email at any time. And in the meantime, if you are completely sick of cryptocurrencies, check out our back catalog of episodes at softwaredaily.com, or download our Software Engineering Daily apps, which have all of our episodes, including our Greatest Hits, a curated set of the most popular shows. The apps will soon have offline downloads and bookmarking.
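The bounty escrow flow (lock funds when the bounty is posted, then release them to whoever fixes the bug) can be sketched as a small state machine. This is a hypothetical Python model for illustration only: the real escrow is an Ethereum smart contract, and the names `open_bounty`, `BountyEscrow`, `submit`, and `accept`, as well as the unit of `amount`, are invented here.

```python
from dataclasses import dataclass
from typing import Optional


class EscrowError(Exception):
    pass


@dataclass
class BountyEscrow:
    """Toy bounty whose funds stay locked until the funder releases them."""
    funder: str
    amount: int                        # locked value (unit is an assumption)
    fixer: Optional[str] = None        # developer who submitted the fix
    pull_request: Optional[str] = None
    paid: bool = False

    def submit(self, developer: str, pull_request: str) -> None:
        """A developer claims the bounty by pointing at their pull request."""
        if self.paid:
            raise EscrowError("bounty already paid out")
        self.fixer, self.pull_request = developer, pull_request

    def accept(self, caller: str, balances: dict) -> None:
        """Only the funder can release the locked funds, and only to the fixer."""
        if caller != self.funder:
            raise EscrowError("only the funder can release funds")
        if self.fixer is None:
            raise EscrowError("no submission to pay")
        if self.paid:
            raise EscrowError("bounty already paid out")
        balances[self.fixer] = balances.get(self.fixer, 0) + self.amount
        self.paid = True


def open_bounty(funder: str, amount: int, balances: dict) -> BountyEscrow:
    """Move funds out of the funder's balance and into escrow."""
    if balances.get(funder, 0) < amount:
        raise EscrowError("insufficient funds")
    balances[funder] -= amount
    return BountyEscrow(funder=funder, amount=amount)


balances = {"maintainer": 100}
bounty = open_bounty("maintainer", 25, balances)  # funds are now locked
bounty.submit("contributor", "PR #1")             # hypothetical pull request
bounty.accept("maintainer", balances)             # funder releases payment
print(balances)  # {'maintainer': 75, 'contributor': 25}
```

Once funds leave the funder's balance they can only go to a submitter, and only once; on Ethereum those same guarantees are enforced by the contract rather than by trusting either party.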