Software Engineering Daily

2,200 episodes

Ep 1146 Monolithic Repositories with Ciera Jaspan

Google’s codebase is managed in a single monolithic repository. An engineer at Google can explore almost any area of the codebase within the entire company. In order to enable this, Google has built tooling to support the monolithic repo, including a virtual file system and a set of build tools. A monolithic repository is not to be confused with a monolithic deployment. Google’s infrastructure consists of thousands of small services interacting over a network, and scaling individually. But all of the code for each of these different independent modules is in the same version control system. Ciera Jaspan is a staff software engineer at Google working on developer infrastructure. She worked on an internal research project within Google to find out how engineers felt about the monolithic repository system and how it compared to a large number of small repositories. Ciera joins the show to discuss repository management, internal tooling, and Google’s approach to researching developer productivity within the company. RECENT UPDATES: The FindCollabs Open has started. It is our second FindCollabs hackathon, and we are giving away $2500 in prizes. The prizes will be awarded in categories such as machine learning, business plan, music, visual art, and JavaScript. If one of those areas sounds interesting to you, check out findcollabs.com/open! The FindCollabs Podcast is out! We are booking sponsorships for Q3, find more details at https://softwareengineeringdaily.com/sponsor/

May 22, 2019 · 1h 2m

Ep 1145 Scaling Intuit with Alex Balazs

Alex Balazs is the Intuit Chief Architect and has been working at the company for almost twenty years. Intuit’s products include QuickBooks, TurboTax, and Mint. These applications are used to file taxes, manage business invoices, conduct personal accounting, and other critical aspects of a user’s financial life. Because the applications are managing money for users, there is not much room for error. When Intuit was started, the company made desktop software. In his time at Intuit, Alex played a key role in rearchitecting the monolithic desktop applications to be resilient, reliable web applications. Intuit originally managed this software on their own servers. Since then, Intuit has migrated to the cloud using AWS. Alex joins the show to discuss his experience scaling Intuit, his strategy for cloud migration, and his evaluation criteria for questions of build versus buy.

May 21, 2019 · 57 min

Ep 1144 Emerging Markets: Kenya with Nelly Cheboi

Africa is rapidly adopting the same software and hardware technologies that have transformed the western world over the last few decades. But access to computers and technology education is still uneven. Where there is access to computers, smartphone adoption often comes before access to laptops or desktop computers. Nelly Cheboi is the founder of TechLit Africa, an organization that works to connect schools and families in Africa with computers and software. Nelly studied computer science, and worked as a software engineer before leaving her career to focus full-time on building a scalable model to take refurbished computers and give them to Africans who can make good use of them. TechLit Africa is also building a software stack to equip schools in Africa without an Internet connection with an internal subnet including Wikipedia and other educational resources, so that people in the school can get an Internet-like experience despite a lack of access to the full Internet.

May 20, 2019 · 59 min

Ep 1143 Facebook Strategy with Mike Vernal

Facebook’s strategy is shaped by long term goals, short term requirements, and the available resources of the company. Long term goals are necessary for thinking through big decisions such as acquisitions, hardware product investments, and open source software ecosystems. To implement long term goals, Facebook needs to communicate the vision of the company and foster an internal culture that supports that vision. Short term requirements can affect how the company is thinking on a more immediate time horizon. When Facebook realized the importance of mobile computing, the mentality in the company quickly shifted from looking at mobile as a tax on engineering resources to a long-term source of business value. When Google started to work on Google+, Facebook engineers focused their resources on the potential competitive threat. Facebook’s strategy is implemented by the engineers, product managers, and other employees of the company. Facebook is unique in its ability to allow those employees to self-assemble into work that is meaningful to the individuals as well as to the company. As the long term goals and short term requirements of Facebook change over time, company resources are shifted to focus the company on the correct set of priorities. Some of those priorities might be speculative investments in new technologies. Other priorities might include doubling down on areas of the company that are showing promise. Mike Vernal worked as a VP of product and engineering at Facebook for 8 years. He left the company in 2016 and joined Sequoia Capital, where he now works as a partner. In his time at Facebook, he helped architect and implement strategies relating to product direction and engineering. Mike joins the show for a discussion about his time at Facebook and the strategic lessons that he learned from his time at the company.

May 17, 2019 · 55 min

Ep 1142 Facebook React with Dan Abramov

React is a set of open source tools for building user interfaces. React was open sourced by Facebook, and includes libraries for creating interfaces on the web (ReactJS) and on mobile devices (React Native). React was released during a time when there was not a dominant frontend JavaScript library. Backbone, Angular, and other JavaScript frameworks were all popular, but there was not any consolidation across the frontend web development community. Before React came out, frontend developers were fractured into different communities for the different JavaScript frameworks. After Facebook open sourced React, web developers began to gravitate towards the framework for its one-way data flow and its unconventional style of putting JavaScript and HTML together in a format called JSX. As React has grown in popularity, the React ecosystem has developed network effects. In many cases, the easiest way to build a web application frontend is to compose together open source React components. After seeing the initial traction, Facebook invested heavily into React, creating entire teams within the company whose goal was to improve React. Dan Abramov works on the React team at Facebook and joins the show to talk about how the React project is managed and his vision for the project.
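The "one-way data flow" mentioned in the React description can be illustrated with a small sketch. This is not React's API and not code from the episode; the names `State`, `update`, `render`, and `Store` are hypothetical and chosen only for illustration. The idea shown is that state flows in one direction into a pure render function, and the only way to change what is displayed is to dispatch an action that produces a new state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    """Immutable application state (hypothetical example)."""
    count: int = 0


def update(state: State, action: str) -> State:
    """Return a new state for the given action; never mutate the old one."""
    if action == "increment":
        return replace(state, count=state.count + 1)
    if action == "reset":
        return replace(state, count=0)
    return state


def render(state: State) -> str:
    """Pure function from state to a view description."""
    return f"<button>Clicked {state.count} times</button>"


class Store:
    """Holds the current state and re-renders after every dispatched action."""

    def __init__(self, state: State) -> None:
        self.state = state
        self.view = render(state)

    def dispatch(self, action: str) -> None:
        # Data flows one way: action -> new state -> new view.
        self.state = update(self.state, action)
        self.view = render(self.state)


store = Store(State())
store.dispatch("increment")
store.dispatch("increment")
```

Because the view is always derived from the state, there is a single place to look when the display is wrong, which is roughly the property the description credits for React's appeal.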

May 16, 2019 · 50 min

Ep 1140 Facebook Management with Jocelyn Goldfein

Facebook leadership was able to recognize the importance of mobile computing in time to develop high quality mobile applications, but there were numerous challenges. The Facebook desktop web app had been difficult enough to build due to the unprecedented data requirements and amount of interactivity. Mobile introduced the additional hurdles of limited bandwidth and distinct native operating systems in Android and iPhone. Facebook’s early efforts to build a mobile application involved a cross-platform HTML5 solution. HTML5 had insufficient performance for Facebook’s needs, and the company needed to develop native apps in order to deliver the desired experience. Facebook’s ability to pivot to mobile is comparable to the classic story of Intel pivoting from a memory company to a microprocessor company. To succeed at mobile application development, Facebook had to shift its focus dramatically, reallocating engineering resources and acqui-hiring small mobile companies in order to build up the domain expertise for mobile. As a side effect of this transition to mobile, Facebook developed an understanding of how dramatically software engineering was changed by the introduction of smartphones and the high bandwidth requirements of social networking. The challenges of this new paradigm led to the development of open source tools such as GraphQL and React Native, which have allowed countless projects to build applications more easily. Jocelyn Goldfein was an engineering director at Facebook for four years, from 2010 to 2014. She currently works as an investor at Zetta Venture Partners. In her time at Facebook, Jocelyn saw the shift to mobile firsthand. In today’s episode, she describes how Facebook management works, and gives her perspective on the distinguishing characteristics of the engineering organization as a whole.

May 15, 2019 · 1h 14m

Ep 1139 Facebook Developers with Nick Schrock

Facebook is a case study in the ability for developers to self-organize into groups who are working on projects that are meaningful to the company and personally satisfying to the individual engineers. Many engineers in the software industry work under a less capable manager who has complete control over their creativity. This leads to employee churn, dissatisfaction, and burnout. Facebook’s ability to move fast is predicated on its ability to match engineers with problems that are interesting to those particular individuals. Whether you want to work on newsfeed or developer productivity tools or machine learning research, there is a path within Facebook to finding a problem that is both important and fun. Facebook’s unique set of engineering challenges required the company to develop a unique set of internal tools. Because Facebook had data and throughput requirements which were unprecedented, the available tools and best practices at the time did not satisfy Facebook’s requirements. Over the years, Facebook has developed its own databases, caching strategies, and JavaScript frameworks. Nick Schrock worked at Facebook for eight years. He is best known as a co-creator of GraphQL, a tool for efficiently fetching data through a federated request language. GraphQL was the result of years of evolution of internal tooling within Facebook. Nick has discussed the creation of GraphQL in other podcasts, and we will have a more dedicated episode around a retrospective of GraphQL in the near future. Today’s episode is about the process by which developers at Facebook self-organized, and Nick’s ideas around how to identify a need for an internal tool. Since leaving Facebook, Nick has parlayed his experience in developer tools into Dagster, a programming model for data applications.

May 14, 2019 · 1h 7m

Ep 1138 Facebook Engineering with Pete Hunt

Facebook engineering is commonly described by two words: move fast. Building products quickly has been a differentiating characteristic of the company since its inception. From the longtime engineers to the summer interns, Facebook instills a sense of immediacy and opportunity in all of its employees. The goal of Facebook is to make the world more open and transparent, with the intention of creating greater understanding and connection through Internet services. More than any other company in history, Facebook has enabled people to communicate with each other via simple user interfaces and real, authenticated human identity. Facebook must move fast, because the vision for Facebook is without precedent. It may feel like the Facebook mission is already finished, because you can already use Facebook to connect with anyone across the world with an Internet connection. But once you are connected to somebody on Facebook, there are only a small number of interactions you can take: sending a message, sharing a photo, broadcasting a video stream. There are so many more parts of our lives waiting to be digitized, and many of these require a real identity system to work properly. More than any other company, Facebook is positioned to expand our system of real-world human trust onto the Internet. The depth and breadth of the engineering problems required to accomplish this demands that Facebook move fast. To move slower would cause all of us to pay the opportunity cost of having to wait longer to interconnect our global society. Pete Hunt worked as an engineer at Facebook for three and a half years. At Facebook, he helped build React, a set of technologies that have significantly improved frontend application interface development. After the Instagram acquisition, Pete was the first engineer from Facebook to join the Instagram team to help bring the two companies together. 
Pete left Facebook in 2014 to start Smyte, a company that made trust and safety tools for marketplaces and social networks. Smyte was acquired by Twitter, where Pete now works on engineering problems relating to trust, safety, health, and infrastructure. Pete joins the show for the first of several episodes with Facebook engineers. In these episodes, we will explore the engineering practices of Facebook–from scaling Facebook’s PHP monolith to open sourcing React and GraphQL. Other topics will include management, onboarding, and product strategy. Our goal is to present a holistic picture of how Facebook engineering works, so that other organizations can learn to adopt practices that will allow them to move faster. We hope you enjoy this series on Facebook engineering.

May 13, 2019 · 56 min

Ep 1137 Airtable with Howie Liu

Software engineering is harder than it should be. There are many people who have an app idea that they are not sure how to build. Some of these people are highly technical professionals like real estate agents, scientists, and accountants. These professionals learn to use spreadsheets in their day-to-day work. Spreadsheets are also used widely by young people such as students. Spreadsheet users vary in terms of how familiar they are with the programmability of a spreadsheet, but there are certainly more people who have built complex spreadsheets than there are people who have built complex web apps. Airtable is a tool for making application development easier and more accessible. The Airtable interface is similar to a spreadsheet and can be used for most spreadsheet applications. It can also serve as a rich backend database system to improve the productivity of software developers who are fully capable of building web applications. There are high-level programmable components called Blocks and integrations with developer APIs like Twilio and Stripe. Airtable has a permissions and collaboration system that allows interaction between engineers who might be using Airtable as a programmatic transactional database and operations members who might need to read or edit specific parts of the data on an ad hoc basis. Howie Liu is the CEO of Airtable and he joins the show to talk about his vision for the product and the engineering problems he is working on to realize that vision. Airtable has not been trivial to build, and has required its own custom database backend and its own JavaScript rendering system. Special thanks to Gareth Pronovost who is a full-time Airtable expert that I found on YouTube and who was generous enough to take some time to have a call with me and describe his experience using Airtable. The fact that there is a full profession around creating Airtable applications speaks to how unique this platform is. 
RECENT UPDATES: FindCollabs is a company I started recently The FindCollabs Podcast is out! FindCollabs is hiring a React developer FindCollabs Hackathon #1 has ended! Congrats to ARhythm, Kitspace, and Rivaly for winning 1st, 2nd, and 3rd place ($4,000, $1000, and a set of SE Daily hoodies, respectively). The most valuable feedback award and the most helpful community member award both go to Vynce Montgomery, who will receive both the SE Daily Towel and the SE Daily Old School Bucket Hat We are booking sponsorships for Q3, find more details at https://softwareengineeringdaily.com/sponsor/ Podsheets is our open source set of tools for managing podcasts and podcast businesses New version of Software Daily, our app and ad-free subscription service

May 10, 2019 · 43 min

Ep 1136 Virtual Data with Sunil Kamath

Relational data systems have evolved from single node instances to complex distributed systems. Almost any database can be accessed through a SQL statement, but the guarantees of these databases can vary in terms of consistency, availability, latency, durability, and financial cost. Relational database systems that explore these different sets of tradeoffs are sometimes categorized as “NewSQL”. There are also a wide variety of data systems that are not categorized as databases. Kafka is a distributed queue. HDFS is a distributed file system. Spark provides a distributed in-memory working set to process data. Cloud providers offer hosted bucket storage for your data lake and fast processing in the form of a data warehouse. Sunil Kamath is a principal PM with Microsoft. Sunil has worked on database systems for two decades, and he joins the show to give his perspective on the current data world and his predictions for how data platforms will become easier to use. Sunil is optimistic about the use of virtual data for unifying the access of data for a variety of operational use cases.

May 9, 2019 · 39 min

Ep 1135 Web Assembly Runtime with Tyler McMullen

WebAssembly is a binary instruction format for applications to run in a memory-constrained, stack-based virtual machine. The WebAssembly ecosystem consists of tools and projects that allow programs in a variety of languages to compile into WebAssembly and run in a safe, fast, sandboxed runtime environment. WebAssembly is a transformative technology for the Internet. Most users will experience it as a set of gradual, incremental improvements to their online experiences. Pages will load faster and become more dynamic. Applications will become more secure. Infrastructure will become cheaper, and those cost savings will eventually reach the consumer. For developers, WebAssembly opens a world of possibility. In today’s operating systems, the user can feel a big difference between applications that need a large client-side runtime (such as video editing tools, or render-heavy games such as Half Life) and applications that are more lightweight and can run entirely on the web (such as Twitter). Tyler McMullen is the CTO at Fastly. He joins the show to talk about the compilation path, the runtime, and the opportunities of WebAssembly.
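To make "stack-based virtual machine" concrete, here is a toy interpreter sketch. It is an assumption-laden illustration, not a WebAssembly runtime: it does not parse the binary format, and the `run` function and tuple-based program encoding are hypothetical. It borrows only three instruction names from the WebAssembly text format (`i32.const`, `i32.add`, `i32.mul`) to show how operands are pushed onto and popped off a stack.

```python
def run(program):
    """Evaluate a list of (opcode, *operands) tuples on an operand stack."""
    stack = []
    for op, *args in program:
        if op == "i32.const":
            # Push an immediate value onto the stack.
            stack.append(args[0])
        elif op == "i32.add":
            # Pop two operands, push their sum wrapped to 32 bits.
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFFFFFFFF)
        elif op == "i32.mul":
            # Pop two operands, push their product wrapped to 32 bits.
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & 0xFFFFFFFF)
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return stack


# Computes (2 + 3) * 4 without any named registers or variables.
program = [
    ("i32.const", 2),
    ("i32.const", 3),
    ("i32.add",),
    ("i32.const", 4),
    ("i32.mul",),
]
result = run(program)
```

Instructions carry no register names; each one implicitly consumes and produces stack values, which is part of what keeps the encoding compact.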

May 8, 2019 · 53 min

Ep 1134 Kubernetes Virtualization with Paul Czarkowski

Modern server infrastructure usually runs in a virtualized environment. Virtual servers can exist inside of a container or inside of a virtual machine. Containers can also run on virtual machines. Kubernetes has allowed developers to manage their multiple containers, whether those containers are running in VMs or on bare metal (servers without VMs). As organizations expand their Kubernetes deployments, the overhead of those deployments is becoming a relevant concern. So-called “Kubesprawl” can occur within organizations due to a lack of best practices on when new clusters should be spun up or spun down, and when clusters should be shared by teams or shared by services. Paul Czarkowski is a principal technologist with Pivotal. He joins the show to discuss virtualization, Kubernetes, and the state of the cloud native ecosystem.

May 7, 2019 · 59 min

Ep 1133 Cloud Database Workloads with Jon Daniel

Relational databases such as Postgres are often used for critical workloads, such as user account data. To run a relational database service in the cloud requires a cloud provider to set up a highly durable, highly available system. Jon Daniel is an infrastructure engineer at Heroku. Jon joins the show to describe the engineering and operations required to build a managed relational database service. Full disclosure: Heroku is a sponsor of Software Engineering Daily.

May 6, 2019 · 59 min

Ep 1132 Satellite Data Platform with Tim Kelton

Satellite images contain vast quantities of data. By analyzing the contents of satellite images over time, we can identify trends in weather, soil, and agriculture. If we combine that data with ground-level sensors, we can gather a clearer understanding of how chemicals in the air or in the dirt map to how things look from above via satellite. Descartes Labs is a company that gathers high dimensional data about our planet and turns it into machine learning models to be used by customers. In order to do this, the company has built out a data pipeline involving queueing systems, machine learning frameworks, and internal tools that are used to aggregate, clean, model, and measure data. Tim Kelton is a co-founder of Descartes Labs and he joins the show to discuss the high volume of data and the distributed systems that make up the Descartes Labs infrastructure.

May 3, 2019 · 36 min

Ep 1130 Security Monitoring with Jeff Williams

The modern software supply chain contains many different points of distribution: JavaScript frameworks, npm modules, Docker containers, open source repositories, cloud providers, on-prem firmware, IoT, networking proxies, and so much more. With so much attack surface, securing a large enterprise is an uphill battle. Jeff Williams is the CTO at Contrast Security, a company that makes infrastructure monitoring tools. Contrast Security works by intercepting network traffic at a low level and assessing whether that traffic maps to a common threat model. Jeff joins the show to talk about different approaches to monitoring and securing large infrastructure deployments.

May 2, 2019 · 52 min

Ep 1129 Software Growth with Greg Kogan

Growing a software business requires an understanding of engineering, sales, and marketing. As we learn software engineering, we also pick up some knowledge about how a business should operate. We know that there are customers, and that our product needs to be scalable to serve more customers. We know that some features are more important than others, and so we focus on building the features that matter the most. But unless we make a deliberate focus, engineers do not learn how to sell and market a software product. Learning how to sell and market software is an important skill to develop. It allows a software engineer to be self-sufficient. If you already know how to write software, sales and marketing are actually the only other pieces you need to be an “entrepreneur”. And the basics of sales and marketing are often easier and more fun to learn than the first painful days of learning basic programming. Greg Kogan is an engineer who has shifted his focus to working as a consultant for companies that are trying to go to market with a technical product. Greg has helped grow companies such as Netlify, Scalyr, and Domino Data Lab. Much of his work is around products targeted toward developers. Greg joins the show to describe his methodical approach to selling and marketing software.

May 1, 2019 · 44 min

Ep 1128 Container Platform Security with Maya Kaczorowski

A Kubernetes instance occupies a wide footprint of multiple servers, creating an appealing target to an attacker, due to its access to a large pool of compute resources. A common attack against an exposed Kubernetes cluster is to take it over for the purposes of mining cryptocurrency. Thus it is important to keep a cluster secure. The importance of security is magnified for a cloud provider. A cloud provider runs a managed Kubernetes service, which might be running thousands of Kubernetes clusters. If the cloud provider’s chosen distribution of Kubernetes contains a vulnerability, or if the Kubernetes instances are misconfigured, all of these clusters could be exposed to the same vulnerability. Maya Kaczorowski works on the security of Google’s managed Kubernetes service GKE. In today’s show we discuss the attack surface of a managed Kubernetes service. Maya was previously on the show to talk about container security. This episode is a good companion to that one, as well as a previous show with Liz Rice about container security.

Apr 30, 201934 min

Ep 1127Lyft’s Data Platform with Li Gao

Lyft generates petabytes of data. Driver and rider behavior, pricing information, the movement of cars through space; all of this data is received by Lyft’s backend services, buffered into Kafka queues, and processed by various stream processing systems. Lyft moves the high volumes of data into a data lake for different users throughout the company to use offline. Machine learning jobs, batch jobs, streaming jobs and materialized databases can be created on top of that data lake. Druid and Superset are used for operational analytics and dashboarding. Li Gao is a data engineer at Lyft. He joins the show to explore the different aspects of Lyft’s data platform. We also talk about the tradeoffs of streaming frameworks, and how to manage machine learning infrastructure. This episode is a great companion to our show about Uber’s data platform, and illustrates some fundamental differences in how the two ridesharing companies operate.

Apr 29, 2019

Ep 1126Cloud with Eric Brewer

To the extent that I am a software engineering journalist, I feel inclined to scrutinize all of the cloud providers. But to the extent that I am an engineer and a business person, I feel only admiration and love for the cloud providers. Cloud computing has brought the cost of starting an Internet business down to zero. Cloud computing has opened up my eyes to a world of creative possibilities that knows no boundaries, and for that I will always be a fan of all of the rivaling cloud companies because they all have played a role in creating the current software landscape. Eric Brewer is a Google Fellow and VP Infrastructure. He is well-known for his work on the CAP theorem, a distributed systems concept that formalized the tradeoffs between consistency, availability, and partition tolerance in a distributed system. At Google, Eric is as much a strategist and product creator as he is a theoretician. He has worked on database systems such as Spanner, machine learning systems such as TensorFlow, and container orchestration systems such as Kubernetes and GKE. Eric joins the show to talk about Google’s philosophy as a cloud provider, and how his understanding of distributed systems has evolved since joining the company.

Apr 26, 20191h 4m

Ep 1125Intricately: Mapping the Internet with Fima Leshinsky

Intricately is a company that maps the breadth and depth of cloud infrastructure usage. Using a combination of clever algorithms, data engineering, and web crawlers, Intricately derives information about how different companies spend money on infrastructure. Fima Leshinsky is the CEO and co-founder at Intricately. In his previous job at Akamai, he began to study how a cloud provider such as Akamai could figure out how much its competitors were charging certain customers. Since CDN infrastructure is a commodity with reasonably low switching cost, a provider that can undercut its competitors significantly can have an edge in the marketplace. From his work at Akamai, Fima felt there was a market opportunity to provide this kind of service to the broader market of cloud providers. There are more cloud providers than ever before, and the kind of data that Intricately aggregates is highly useful to this competitive marketplace. Fima joins the show to talk about the modern landscape of cloud providers, and how to build a system that maps the Internet.

Apr 25, 201958 min

Ep 1124gVisor: Secure Container Sandbox with Yoshi Tamura

The Linux operating system includes user space and kernel space. In user space, the user can create and interact with a variety of applications directly. In kernel space, the Linux kernel provides a stable environment in which device drivers interact with hardware and manage low level resources. A Linux container is a virtualized environment that runs within user space. To perform an operation, a process in a container in user space makes a syscall (system call) into kernel space. This allows the container to have access to resources like memory and disk. Kernel space must be kept secure to ensure operating system integrity–but Linux includes hundreds of syscalls. Each syscall represents an interface between the user space and kernel space. Security vulnerabilities can emerge from this wide attack surface of different syscalls, and most applications only need a small number of syscalls to perform their required functionality. gVisor is a project to restrict the number of syscalls that the kernel and user space need to communicate. gVisor is a runtime layer between the user space container and the kernel space. gVisor reduces the number of syscalls that can be made into kernel space. The security properties of gVisor make it an exciting project today–but it is the portability features of gVisor that hint at a huge future opportunity. 
By inserting an interpreter interface between containers and the Linux kernel, gVisor presents the container world with the opportunity to run on operating systems other than Linux. There are many reasons why it might be appealing to run containers on an operating system other than Linux. Linux was built many years ago, before the explosion of small devices, smart phones, IoT hubs, voice assistants and smart cars. To be more speculative, Google is working on a secretive new operating system called Fuchsia. gVisor could be a layer that allows workloads to be ported from Linux servers to Fuchsia servers. Yoshi Tamura is a product manager at Google with a background in containers and virtualization. He joins the show to talk about gVisor and the different kinds of virtualization.

Apr 24, 201946 min

Ep 1123Observability Engineering with James Burns

Apr 23, 20191h 5m

Ep 1122Serverless Runtimes with Steren Giannini

Apr 22, 201951 min

Ep 1121Products with Ryan Hoover

RECENT UPDATES: Podsheets is our open source set of tools for managing podcasts and podcast businesses New version of Software Daily, our app and ad-free subscription service Software Daily is looking for help with Android engineering, QA, machine learning, and more FindCollabs Hackathon has ended–winners will probably be announced by the time this episode airs; we will be announcing our next hackathon in a few weeks, so stay tuned Products are an art form. As with any art, the world of products includes creators, patrons, fans, business people, and investors. Product Hunt is a place where those different people connect to build and discuss products. Products are different from other art forms in that they are measured not only through the lens of design and beauty–but also through utility. From software to books to couches to toiletry–we all have products that have improved our lives so much that we feel a deep sense of connection and hope for that product and the people behind it. Ryan Hoover is the founder of Product Hunt, a product I have found tremendous value and satisfaction from over the years. He is also a host of Product Hunt Radio, a weekly podcast with the people creating and exploring the future. Ryan joins the show to discuss products, the process of creating something useful, and his investing strategy. Ryan runs the Weekend Fund, an early stage investment fund.

Apr 19, 201957 min

Ep 1120Facebook OSS License Policy with Joel Marcey, Michael Cheng, and Kathy Kam

Open source policy has become a business issue as well as a political one. Businesses like Elastic, MongoDB (the company), and Redis Labs have started to view the open source licenses of the projects they work on as a means for business defensibility against cloud providers offering similar services. It remains to be seen how viable this strategy will be for the commercial open source vendors. Companies that do not directly sell commercial open source are also grappling with questions around open source licensing. Facebook has become a force in the open source world through projects like React and GraphQL. Facebook leads these projects, but Facebook is not monetizing them other than to the extent that they use the projects to build Facebook.com. Facebook’s incentives are aligned with the rest of the industry on the quality of the GraphQL and React projects. Proper licensing can help Facebook keep those incentives in alignment. Joel Marcey, Michael Cheng, and Kathy Kam from Facebook join me for a discussion of the state of open source licensing, and how that impacts Facebook.

Apr 18, 201945 min

Ep 1119Drishti: Deep Learning for Manufacturing with Krish Chaudhury

Drishti is a company focused on improving manufacturing workflows using computer vision. A manufacturing environment consists of assembly lines. A line is composed of sequential stations along that manufacturing line. At each station on the assembly line, a worker performs an operation on the item that is being manufactured. This type of workflow is used for the manufacturing of cars, laptops, stereo equipment, and many other technology products. With Drishti, the manufacturing process is augmented by adding a camera at each station. Camera footage is used to train a machine learning model for each station on the assembly line. That machine learning model is used to ensure the accuracy and performance of each task that is being conducted on the assembly line. Krish Chaudhury is the CTO at Drishti. From 2005 to 2015 he led image processing and computer vision projects at Google before joining Flipkart, where he worked on image science and deep learning for another four years. Krish had spent more than twenty years working on image and vision related problems when he co-founded Drishti. In today’s episode, we discuss the science and application of computer vision, as well as the future of manufacturing technology and the business strategy of Drishti.

Apr 17, 201954 min

Ep 1118Lyft Data Discovery with Tao Feng and Mark Grover

Lyft is a ridesharing company with petabytes of data. Within Lyft, many different employees can use those data sets to build useful applications. A business analyst creates a dashboard to see how driver satisfaction is changing over time. An economist studies the pricing data to ensure that Lyft’s prices are competitive. A data scientist creates a report of how the speed of a ride correlates with 5 star ratings. A machine learning engineer trains a model to detect fraud on the platform. All of these use cases make sense–and in each of them, the employee at Lyft needs to find the necessary data sets within the company to build their application. Amundsen is a tool for finding and discovering data sets within the company. Tao Feng and Mark Grover are engineers at Lyft and join the show to talk about the problem of data discovery and the tools they have built at Lyft.

Apr 16, 201954 min

Ep 1117Protein Structure Deep Learning with Mohammed Al Quraishi

Until Google DeepMind came into the field, protein structure prediction was dominated by academics. Protein structure prediction is the process of predicting how a protein will fold by looking at genetic code. Protein structure prediction is a perfect field to approach through the application of deep learning, because the inputs are highly dimensional and there is a plentiful array of different sets of labeled data. Protein structure deep learning is a field in which many different approaches are taken, often involving supervised learning and reinforcement learning. Mohammed Al Quraishi is a systems biologist at Harvard. His background spans computer engineering, statistics, and genetics. In his work, Mohammed explores the interplay between biology and computer systems. One area of Mohammed’s focus is protein structure prediction. In a blog post last year, Mohammed gave a brief history of protein structure prediction and described the significance of DeepMind entering the field. DeepMind’s AlphaFold technology surpassed all other competitors in the most recent CASP protein structure competition. Mohammed joins the show to discuss biology, academia, deep learning, and DeepMind.

Apr 15, 201956 min

Ep 1116Podsheets: Open Source Podcasting

Podsheets is a set of open source tools for podcast hosting, publishing, ad management, community engagement, and more. Podsheets is influenced by our experience managing Software Engineering Daily, a full-time podcast business. Software Engineering Daily is a podcast that airs 5 times per week. With 4 ads per show and 50 business weeks per year, we

Apr 14, 201956 min

Ep 1115Bubbles with Haseeb Qureshi

Haseeb Qureshi is an entrepreneur and investor. As a teenager, Haseeb played poker professionally through the online poker bubble. His path from poker to software entrepreneurship has been explored in previous episodes. In 2007, Haseeb and I met at an online poker table. As we battled each other for thousands of dollars, Haseeb and I realized we shared an affinity for obnoxious screen names, obnoxious online avatars, and the city of Austin, Texas. We were both living in the city, and met each other in the real world. In our earliest days, Haseeb and I were not friends. It was a strange time–we were disembodied minds, drifting on the Internet, attached mostly to the fluctuating balances of our Full Tilt Poker and Pokerstars accounts. This was not a time for friendship–it was a time for ruthless, modern competition. Haseeb grew tired of poker. He wrote a book about the game to memorialize his thoughts, then abandoned it. He studied philosophy and literature, searching for something new in the historical musings of humanity. He traveled Europe, working as a farmer to reconnect with the physical world. He discovered the Effective Altruism movement. Finding no solace in his poker spoils, Haseeb gave away most of his money and started from scratch. As he rebuilt himself, he found software engineering and charted a path to San Francisco, where we reconnected. In this episode, Haseeb joins me for a discussion of software, philosophy, poker, and the nature of bubbles. Indeed, Haseeb and I have now lived through four major bubbles: dot coms, poker, the 2008 financial crisis, and the crypto bubble. Throughout these bubbles, the mediums change but never does the message: human beings are deeply irrational, tribalistic, and emotional.

Apr 12, 20191h 34m

Ep 1112Consul Service Mesh with Paul Banks

RECENT UPDATES: FindCollabs $5000 Hackathon Ends Saturday April 15th, 2019 New version of Software Daily, our app and ad-free subscription service Software Daily is looking for help with Android engineering, QA, machine learning, and more. Consul is a tool from HashiCorp that allows users to store and retrieve information from a highly available key/value data store. Consul is used for storage of critical cluster information, such as service IP locations and configuration data. A service interacts with Consul via a daemon process on the node of that service. The daemon process periodically shares information with the Consul server over a UDP-based gossip protocol and can share data on a more immediate basis using TCP. Consul’s functionality has increased recently to add secure service connectivity. Consul Connect allows services to establish mutual TLS encryption with each other. The addition of mutual TLS to the Consul feature set coincided closely with Consul gaining the title of “service mesh.” Service mesh is an increasingly popular pattern that can encompass a variety of features: load balancing, security policy management, service discovery, and routing. Tools that offer self-described “service mesh” functionality include Linkerd, Kong, AWS App Mesh, Solo.io Gloo, and Google’s Istio open source project. Paul Banks is the engineering lead of Consul at HashiCorp. He joins the show to talk about the service mesh category and the past, present, and future of Consul.

Apr 11, 201958 min

Ep 1111Machine Learning Joins with Arun Kumar

Data sets can be modeled in a row-wise, relational format. When two data sets share a common field, those data sets can be combined in a procedure called a join. A join combines the data of two data sets into one data set that is often bigger than the initial two data sets independently occupied. In fact, this new data set is often so much bigger that it creates problems for the machine learning engineers. Arun Kumar is an assistant professor at UC San Diego. He joins the show to discuss the modern lifecycle of machine learning models, and the gaps in the tooling. Arun’s research into improving processing of joined data sets has been adopted by companies such as Google. Some of that research has been adapted into open source machine learning tools that improve the performance of machine learning jobs with minimal code required.
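
To make the size claim about joins concrete, here is a minimal sketch in Python. The tables, field names, and the `hash_join` and `cell_count` helpers are all hypothetical and are not taken from the episode or from Arun Kumar's research; the sketch only shows how joining on a shared field can produce a data set with more stored values than the two inputs combined.

```python
# Toy illustration of a relational join; all table names and values are hypothetical.

def hash_join(left, right, key):
    """Inner-join two lists of dict rows on a shared key field."""
    # Index the right-hand rows by key so each left row can look up its matches.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            # Merge the two rows; the shared key column appears only once.
            joined.append({**row, **match})
    return joined


def cell_count(rows):
    """Total number of stored values, used here as a rough proxy for size."""
    return sum(len(row) for row in rows)


rides = [
    {"ride_id": 1, "driver_id": 10, "fare": 12.5},
    {"ride_id": 2, "driver_id": 10, "fare": 8.0},
    {"ride_id": 3, "driver_id": 11, "fare": 20.0},
    {"ride_id": 4, "driver_id": 10, "fare": 15.0},
    {"ride_id": 5, "driver_id": 11, "fare": 9.5},
    {"ride_id": 6, "driver_id": 11, "fare": 30.0},
]
drivers = [
    {"driver_id": 10, "city": "Austin", "rating": 4.9, "years_active": 3},
    {"driver_id": 11, "city": "Seattle", "rating": 4.7, "years_active": 5},
]

joined = hash_join(rides, drivers, "driver_id")
print(cell_count(rides), cell_count(drivers), cell_count(joined))
```

In this toy case each driver row is copied into every matching ride row, so the joined table repeats the same driver attributes many times. That redundancy is the source of the growth the description refers to.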

Apr 10, 20191h 2m

Ep 1108Streaming with Holden Karau

Distributed stream processing allows developers to build applications on top of large sets of data that are being rapidly created. Stream processing is often described as an alternative to batch processing. In batch processing, a single large computation is performed over a large, static data set. In stream processing, a computation is performed repeatedly and continuously over a data set that is being appended to. A stream is often stored in a distributed queue such as Kafka, Kinesis, Pulsar, or Google PubSub. A stream is often processed with a stream processing tool such as Spark, Flink, Storm, or Google Cloud Dataflow. Holden Karau is an engineer who works on open source projects at Google. She returns to the show to describe the state of stream processing and discuss modern best practices.
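
The batch versus stream distinction can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not how Spark, Flink, or any of the tools named above are implemented; `batch_mean` and `RunningMean` are hypothetical names, and a real stream would arrive from a queue rather than a list.

```python
def batch_mean(values):
    """Batch style: one computation over a complete, static data set."""
    return sum(values) / len(values)


class RunningMean:
    """Stream style: update the result each time a new value is appended."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        # Only a small running state is kept; past events are not re-read.
        self.count += 1
        self.total += value
        return self.total / self.count


events = [4.0, 8.0, 6.0]  # hypothetical event values

# Batch: wait until all events are available, then compute once.
print(batch_mean(events))

# Stream: emit an updated answer as each event arrives.
running = RunningMean()
for event in events:
    print(running.update(event))
```

The streaming version produces an answer after every event and converges to the batch answer once all events have been seen, which is the trade the description draws between the two models.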

Apr 9, 201948 min

Ep 1106AWS Storage with Kevin Miller

A software application requires compute and storage. Both compute and storage have been abstracted into cloud tools that can be used by developers to build highly available distributed systems. In our previous episode, we explored the compute side. In today’s episode we discuss storage. Application developers store data in a variety of abstractions. In-memory caches allow for fast lookups. Relational databases allow for efficient retrieval of well-structured tables. NoSQL databases allow for retrieval of documents that may have a less defined schema. File storage systems allow the access pattern of nested file systems, like on your laptop. Distributed object storage systems allow for highly durable storage of any data type. Amazon S3 is a distributed object storage system with a wide spectrum of use cases. S3 is used for media file storage, archiving of log files, and data lake applications. S3 functionality has increased over the years, developing different tiers of data retrieval latency and cost structure. AWS S3 Glacier allows for long-term storage of data at a large cost reduction, in exchange for increased latency of data access. Kevin Miller is the general manager of Amazon Glacier at Amazon Web Services. He joins the show to talk about the history of storage, the different options for storage in the cloud, and the design of S3 Glacier.

Apr 8, 201951 min

Ep 1105AWS Compute with Deepak Singh

Upcoming event: FindCollabs Hackathon at App Academy on April 6, 2019 On Amazon Web Services, there are many ways to run an application on a single node. The first compute option on AWS was the EC2 virtual server instance. But EC2 is a large abstraction compared to what many people need for their nodes–which is a container with a smaller set of resources to work with. Containers can be run within a managed cluster like ECS or EKS, or run on their own as AWS Fargate instances, or simply as Docker containers running without a container orchestration tool. Beyond the option of explicit container instances, users can run their application as a “serverless” function-as-a-service such as AWS Lambda. Functions-as-a-service abstract away the container and let the developer operate at a higher level, while also providing some cost savings. Developers use these different compute options for different reasons. Deepak Singh is the director of compute services at Amazon Web Services, and he joins the show to discuss the use cases and tradeoffs of these options. Deepak also discusses how these tools are useful internally to AWS. ECS and Lambda are high-level APIs that are used to build even higher level services such as AWS Batch, which is a service for performing batch processing over large data sets.

Apr 5, 201954 min

Ep 1103Data with Ben Lorica

Upcoming events: A Conversation with Haseeb Qureshi at Cloudflare on April 3, 2019 FindCollabs Hackathon at App Academy on April 6, 2019 Ben Lorica is the chief data scientist at O’Reilly Media and the program director of the Strata Data Conference. In his work, Ben spends time with people across the software industry, giving him broad perspective. In the early days of the data engineering ecosystem, the Hadoop vendor wars were starting between Cloudera and Hortonworks. Strata was a neutral ground for practitioners and open source contributors to meet and share ideas about the Hadoop ecosystem. Since then, the conference has grown to encompass topics such as data science, distributed databases, streaming frameworks, and machine learning. There are many open questions in the data world right now. What is the best path that an enterprise can take to build out a data platform? How should a software team be arranged to efficiently build machine learning models? Which distributed streaming frameworks should I use for what purpose? Ben joins the show to discuss modern data engineering, data science, and infrastructure.

Apr 4, 201947 min

Ep 1102Stablecoins with Rune Christensen

A currency can fulfill numerous financial use cases. One use case is store of value: currency holders can reliably expect their currency to maintain some value, though that value may fluctuate over time. Another use case is speculation: currency holders are owning currency in the hope that the market price of the currency will increase over time. Bitcoin is a useful store of value and an instrument for speculation. However, Bitcoin still does not fulfill the financial use case that most people need from a currency: price stability. The price of Bitcoin fluctuates rapidly, making it difficult to use Bitcoin for small purchases such as coffee. Imagine you want to buy a cup of coffee with Bitcoin. The coffee shop owner needs to offer the option to sell you that cup of coffee using Bitcoin as the medium of exchange. This owner must denominate the price of that coffee as some number of Bitcoin. Since the price of Bitcoin fluctuates so rapidly, the coffee shop owner needs to adjust the price of that cup of coffee constantly in order to make sure that the coffee is cheap enough for the consumer to want to buy it, but expensive enough to make a profit. It is hard to assign prices to market goods in terms of Bitcoin because the currency is in constant flux. Even though many of us would like to use Bitcoin in our everyday lives, most marketplaces are denominated in US dollars or other currencies because a marketplace needs a stable currency in order to operate. Rune Christensen is the CEO of MakerDAO, a system that provides a price-stable cryptocurrency. MakerDAO is an elegant set of currencies, collateralized debt, smart contracts, and other incentive tools that result in the creation of several transparent, decentralized financial instruments. 
Rune joins the show to talk about the importance of stablecoins and how MakerDAO has engineered a decentralized currency that has maintained stability even through tumultuous market conditions.

Apr 3, 20191h 11m

Ep 1101Blitzscaling with Chris Yeh

Chris Yeh is an entrepreneur, investor, and author. He co-wrote Blitzscaling with LinkedIn founder Reid Hoffman. Blitzscaling is a strategy for growing a company that has found product market fit. Blitzscaling prioritizes speed over efficiency, arguing that fast growth is necessary to achieve “first scaler advantage.” When a company is the first to scale successfully within a large market, that company gains access to a wealth of market opportunities that are not available to companies which are not at scale. Examples of successful Blitzscalers include Airbnb, LinkedIn, Amazon, and Facebook. In the hypergrowth phases of these companies, there were deliberate strategic tradeoffs that caused the company to suffer in the short term in exchange for the chance at market dominance in the long term. Blitzscaling is a broad strategic concept which manifests differently in different companies. When Airbnb was in its early stages of growth in 2011, the company was faced with the existential threat of a European competitor called Wimdu. Wimdu offered to sell to Airbnb, but this would have required the merger of two companies with distinctly different cultures. Instead, Airbnb chose to raise more money and rapidly expand into Europe. In contrast, Google’s rapid path to becoming a dominant information service involved acquisitions that we now see as key Google products, including Android, Google Maps, and Google Earth. Through numerous examples in recent business history, Blitzscaling explores the fundamental tradeoff between speed and efficiency, usually biasing speed as the preferable element. But Blitzscaling does not work for every company. In the food delivery sector, many companies who tried to blitzscale ended up going out of business because they had lowered their prices too much in order to try to earn customer loyalty. 
By lowering their prices too much, food delivery startups built businesses with fundamentally bad unit economics and a fickle customer base. In other cases, aggressive blitzscaling can work for a short period of time, but can cause a company’s culture to suffer in ways that are very hard to repair. Blitzscaling can also cause problems in a core software product. Growing too quickly can cause a product to have a bloated user interface. If the backend infrastructure layer expands too quickly, sensitive data could be left exposed due to a lack of proper software security policies. Chris Yeh joins the show to talk about the strategy of Blitzscaling and his wide-ranging career. Chris studied creative writing and product design at Stanford before joining DE Shaw, the famous quantitative hedge fund. Later, he became an investor and worked in several leadership roles in software companies. His wide range of experiences make Chris an excellent author and conversationalist. We explored the ideas of both Blitzscaling and his previous book The Alliance, which lays out a modern vision for the dynamic between employers and employees. We also talked about investing, Dungeons and Dragons, and podcasting.

Apr 2, 20191h 8m

Ep 1100Uber Infrastructure with Prashant Varanasi and Akshay Shah

Uber’s infrastructure supports millions of riders and billions of dollars in transactions. Uber has high throughput and high availability requirements, because users depend on the service for their day-to-day transportation. When Uber was going through hypergrowth in 2015, the number of services was growing rapidly, as was the load across those services. Using a cloud provider was a risky option, because the costs could potentially grow out of control. Uber made a decision early on to invest in physical hardware in order to keep costs at a reasonable level. In the last 3 years, Uber’s infrastructure has stabilized. The platform engineering team has built systems for monitoring, deployment, and service proxying. Developing and maintaining microservices within Uber has become easier. Prashant Varanasi and Akshay Shah are engineers who have been with Uber for more than three years. They work on Uber’s platform engineering team, and their current focus is on the service proxy layer, a sidecar that runs alongside Uber services providing features such as load balancing, service discovery, and rate limiting. Prashant and Akshay join the show to talk about Uber infrastructure, microservices, and the architecture of a service proxy. We also talk in detail about the benefits of using Go for critical systems infrastructure, and some techniques for profiling and debugging in Go.

Apr 1, 2019 1h 5m

Ep 1099 Workload Scheduling with Brian Grant

Google has been building large-scale scheduling systems for more than fifteen years. Google Borg was started around 2003, giving engineers at Google a unified platform to issue long-lived service workloads as well as short-lived batch workloads onto a pool of servers. Since the early days of Borg, the scheduler systems built by Google have matured through several iterations. Omega was an effort to improve the internal Borg system, and Kubernetes is an open source container orchestrator built with the learnings of Borg and Omega. A scheduling system needs to be able to accept a wide variety of workload types and find compute resources within a cluster to schedule those workloads onto. There is a wide variety of potential workloads that could be scheduled: batch jobs, stateful services, stateless services, and daemon services. Different workloads can have different priority levels. A high priority workload should be able to find compute resources quickly, and a low priority workload can wait longer to find resources. Brian Grant is a principal engineer at Google. He joins the show to talk about his experience building workload schedulers and designing APIs for engineers to interface with those schedulers.

Mar 29, 2019 46 min

Ep 1098 Peloton: Uber’s Cluster Scheduler with Min Cai and Mayank Bansal

Google’s Borg system is a cluster manager that powers the applications running across Google’s massive infrastructure. Borg provided inspiration for open source tools like Apache Mesos and Kubernetes. Over the last decade, some of the largest new technology companies have built their own systems that fulfill the roles of cluster management and resource scheduling. Netflix, Twitter, and Facebook have all spoken about their internal projects to make distributed systems resource allocation more economical. These companies find themselves continually reinventing scheduling and orchestration, with inspiration from Google Borg and their own internal experiences running large numbers of containers and virtual machines. Uber’s engineering team has built a cluster scheduler called Peloton. Peloton is based on Apache Mesos, and is architected to handle a wide range of workloads: data science jobs like Hadoop MapReduce; long-running services such as a ridesharing marketplace service; monitoring daemons such as Uber’s M3 collector; and database services such as MySQL. Min Cai and Mayank Bansal are engineers at Uber who work on Peloton. When they set out to create Peloton, they looked at the existing schedulers in the ecosystem, including Kubernetes, Mesos, Hadoop’s YARN system, and Borg itself. Both Min and Mayank join the show today to give a brief history of distributed systems schedulers and discuss their work on Peloton. They have been working in the world of distributed systems schedulers for many years, including experiences building core Hadoop infrastructure and virtual machine schedulers at VMware.

Mar 28, 2019 49 min

Ep 1097 Scaling Log Management with Renaud Boutet

Log management requires the processing and indexing of high volumes of semi-structured data. A log management service takes log data and puts it in a cloud-hosted application so that application operators can access those logs to troubleshoot issues. A large tech company will produce terabytes of logs. Those logs are produced on the host where a service is running. A logging agent on that host will transfer the logs to the log management service in the cloud. Once the logs are in the cloud, they are parsed, indexed, and stored in a way that is easy to query. In 2014, Renaud Boutet co-founded Logmatic, a log management service that eventually became a leading provider. Logmatic was acquired by Datadog, and Renaud now works as a vice president at Datadog. In today’s episode, Renaud joins the show to talk about the architecture of a log management service. We talk about storage tiers, scalability requirements, failover strategies, and logging for serverless functions. Full disclosure: Datadog is a sponsor of Software Engineering Daily.

Mar 27, 2019 49 min

Ep 1096 Security Businesses with Steve Herrod

Steve Herrod was the CTO at VMware and now works as a managing director at General Catalyst, where he focuses on investments relating to security. Large enterprises are difficult to secure. An enterprise has sprawling infrastructure, with both on-prem and cloud infrastructure. Identity management systems, vulnerability scanning, secure network infrastructure, and policy management tools are just a few example areas where enterprises spend billions of dollars on security software. Threats often make their way into an enterprise by way of social engineering. This can result in phishing attacks, corporate espionage, and ransomware. Protecting against social engineering is very difficult, as there are so many channels to communicate through: Facebook Messenger, LinkedIn, email, and ad networks can all be used to perform social engineering attacks. Enterprise security software is a very different business from other types of software companies. Unlike developer tools or cloud infrastructure, security software is usually not self-serve. Security solutions usually require a longer sales and integration process with a customer. Steve Herrod joins the show to talk about the enterprise security world, the go-to-market strategy for successful security companies, and his perspective on what makes for a viable venture capital investment.

Mar 26, 2019 1h 15m

Ep 1095 CodeSandbox: Online Code Editor with Bas Buursma and Ives van Hoorne

Coding in the browser has been attempted several times in the last decade. Building a development environment in the browser has numerous technical challenges. How does the code execute safely? How do you fit all of the requirements of a development environment into a browser window? How do you get users to switch from their normal IDE (integrated development environment)? CodeSandbox is an online code editor created by Ives van Hoorne and Bas Buursma. CodeSandbox allows users to program and run applications in the browser. It is a full developer platform that lets users install npm modules, run their code, and share their applications with other users. The engineering problems within CodeSandbox are not easy: building a web-based IDE is complicated. But CodeSandbox is also an exciting project because it lowers the barrier to entry for many newer programmers. The development experience for a new programmer is still a difficult onramp. If you are an experienced developer, you have a workflow that you are comfortable with. It might involve vim, or emacs, or JetBrains IDEs, or Eclipse. But newer developers can find these environments confusing and hard to get started with. The development environments of today are integrated with build tools, GitHub repositories, and deployment platforms. This can be overwhelming for a newer developer. CodeSandbox is a very visual tool, which makes it especially useful for new developers who learn through seeing examples running live in the browser. CodeSandbox is also used by web developers who want a modern, shareable form of developing software. Ives and Bas join the show to talk about the motivation for CodeSandbox and the engineering challenges they have solved.

Mar 25, 2019 50 min

Ep 1094 Apache Superset with Maxime Beauchemin

Data engineering touches every area of an organization. Engineers need a data platform to build search indexes and microservices. Data scientists need data pipelines to build machine learning models. Business analysts need flexible dashboards to understand the trends and customer use for a product. Max Beauchemin is a data engineer who has worked at Airbnb, Lyft, and Facebook. He’s the creator of two successful open source projects: Apache Airflow and Apache Superset. In a previous show, Max discussed data engineering at Airbnb, and the usage of Airflow. In today’s show, Max discusses the engineering of Apache Superset. Superset is an open source business intelligence web application. Superset allows users to create visualizations, slice and dice their data, and query it. Superset integrates with Druid, a database that supports exploratory, OLAP-style workloads. One reason Superset is distinctive is that it is a full open source application. Many open source projects are tools like databases, command line tools, and web frameworks. Superset is an open source application that can be used by individuals who are not developers, so the audience is wider than the typical open source tool built for engineers. Max joins the show to talk about his experience as a data engineer at Airbnb and Lyft, and the open source projects he has started.

Mar 22, 2019 1h 2m

Ep 1093 FaunaDB with Evan Weaver

Twitter’s early engineers faced scalability problems that caused infrastructure failures on a regular basis. The infamous “fail whale” could happen as a result of problems in the application servers, the network, or the database layer. When Twitter was scaling in its early days, the cloud providers were still immature. Engineers did not have access to the autoscaling cloud infrastructure that is available today. The early Twitter architecture was a combination of open source tools and internally created infrastructure custom built for Twitter’s workloads. Evan Weaver was an early engineer at Twitter, and he saw the deficiencies of the data tools that the company had access to. Twitter engineers wanted access to a truly reusable data platform that would fit Twitter’s requirements: highly available, globally replicated, and transactionally consistent. By 2012, Evan had left Twitter and started consulting for other technology companies. He found that databases across the industry were lacking the same properties that Twitter wanted, and the ideas for FaunaDB began to percolate. Around this time, there were two relevant papers about distributed databases that had come out: the Spanner paper from Google and the Calvin paper, a distributed systems paper from Yale. With inspiration from the literature, his time at Twitter, and his knowledge from consulting, Evan started FaunaDB. Seven years later, FaunaDB is a fully fledged open source project as well as a database company with a cloud service offering. Fauna is an OLTP database used by companies like Nvidia, Nextdoor, and Capital One. Evan joins the show to talk about his time spent scaling Twitter and the architecture of FaunaDB.

Mar 21, 2019 52 min

Ep 1092 ElasticSearch at Scale with Volkan Yazici

Bol.com is the biggest e-commerce company in the Netherlands and Belgium. For 20 years, Bol has been developing its software architecture, which includes a variety of services and databases, and a mix of physical and cloud infrastructure. For an e-commerce company, the search engine is critical for allowing customers to find the products they are looking for. But search also has many applications for internal systems. A search engine is a database with a query engine, and internal application developers want to build on top of that database. Volkan Yazici is an engineer at Bol.com specializing in search and the author of the blog post Using ElasticSearch as the Primary Data Store. In his post, Volkan describes the process of scaling ElasticSearch to fit the use cases of both internal and external users at a large e-commerce company. Volkan joins the show to discuss how search infrastructure at scale can require a carefully architected data pipeline in order to propagate changes to a large data set to a search index.

Mar 20, 2019 53 min

Ep 1091 Serverless GraphQL with Tanmai Gopal

Modern web development tools have given frontend developers more power. On the frontend, JavaScript frameworks like React and Vue have become easier to work with. For deployment, tools like Netlify and Zeit give developers a workflow that is tightly integrated with GitHub. At the database layer, autoscaling document storage systems like Firebase and hosted Mongo solutions make it easier to work with objects. There are also a multitude of APIs that give developers rich business functionality out of the box, making it easy to build applications around SMS, payments, and computer vision. If you are building a new application today, you have the option to build it around a completely “serverless” architecture. As the backend and frontend have changed, the middleware to communicate between those layers has also evolved. GraphQL is a modern way of fetching data from disparate data sources. In previous episodes, we have talked about how GraphQL works, and some common patterns for using GraphQL in mature applications. In today’s episode, Tanmai Gopal joins the show to describe how to use GraphQL in newer applications. Tanmai is the CEO of Hasura, a company building tools around GraphQL. He discusses the advantages of using serverless functions together with GraphQL, and how to architect an event-based serverless application.

Mar 19, 2019 55 min

Ep 1090 OSS Businesses with Mike Volpi

In the world of commercial open source, there is plenty of room for both point solution providers and cloud providers. But they are competing for the same customers, and the competitive battlefield is expanding to the nuanced world of software licensing. By changing their licenses, open source projects like Kafka, MongoDB, and Redis can prohibit AWS from certain usage patterns. This might offer some protection for companies based around the point solutions, such as Confluent and Redis Labs. Beyond the fracas of the battle between cloud providers and point solutions, there are newer open source companies with models that do not fit neatly into any historical business models. HashiCorp makes a suite of differentiated open source tools that have not been seriously contested or offered as a service by cloud providers. GitLab makes an open source platform that is built with monitoring, logging, CI, and code hosting out of the box. As the world of open source business models expands, more companies will find opportunity in open sourcing the code that runs their products. In many cases, they will find that it strengthens their advantage rather than weakens it. The defensibility of many businesses relies more on data and network effects than the contents of the codebase. We may see the default question gradually shift from “why should I open source my codebase?” to “why shouldn’t I open source my codebase?” Mike Volpi is a partner at Index Ventures and has invested in many open source businesses over the last decade. He is on the board of Confluent, Cockroach Labs, Kong, and Elastic. Mike joins the show to share his perspective on open source business models of the past, present, and future.

Mar 18, 2019 1h 4m

Ep 1089 Crypto Bubble with Haseeb Qureshi

This is a post written and narrated by Haseeb Qureshi, a cryptocurrency investor and entrepreneur. Haseeb is speaking at an upcoming Software Engineering Daily Meetup. The ICO bubble had no single cause. Mono-causal explanations always fall short in explaining complex phenomena. But its effects are easier to pinpoint. There are now many world-class teams well-capitalized to build, scale, and evolve blockchain technology, and tens of millions of people in the world who now understand decentralization, proof of work, and private keys. Looking back, it’s really quite amazing! It comes at a high cost, but Perez hints: it’s likely that bubbles like these are the only way to overcome technological inertia. At the same time, most people had their first interaction with crypto during its orgiastic adolescence. It’s not a great look. But this has been true for every technological revolution of the last 250 years. In that regard, crypto is in good company. I was too young to appreciate the dot com bubble when it happened. It’s strange to say, but I’m glad to have witnessed a speculative bubble from up close. I’ve now got war stories to share with future generations. It was a wild time, when anyone in the world could launch a coin and raise tens of millions of dollars to build a network that no one could control. I don’t think we’ll see anything like that again for a long time. So what happens now? If you believe that crypto has the stuff of a technological revolution, then as Perez puts it, the collapse will pave the way for a more fruitful deployment phase. At the end of the day, I’m an optimist about technology. So it won’t surprise you that I think this deployment phase is coming. But it will be slow, unglamorous, and probably won’t make for nearly as entertaining headlines. Oh well.

Mar 17, 2019 47 min
