
The Python Podcast.__init__
389 episodes — Page 1 of 8

Ep 388: Update Your Model's View Of The World In Real Time With Streaming Machine Learning Using River
Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning. Summary The majority of machine learning projects that you read about or work on are built around batch processes. The model is trained, and then validated, and then deployed, with each step being a discrete and isolated task. Unfortunately, the real world is rarely static, leading to concept drift and model failures. River is a framework for building streaming machine learning projects that can constantly adapt to new information. In this episode Max Halford explains how the project works, why you might (or might not) want to consider streaming ML, and how to get started building with River. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started! Your host is Tobias Macey and today I’m interviewing Max Halford about River, a Python toolkit for streaming and online machine learning. Interview Introduction How did you get involved in machine learning? Can you describe what River is and the story behind it? What is "online" machine learning? What are the practical differences with batch ML? Why is batch learning so predominant? What are the cases where someone would want/need to use online or streaming ML?
The prevailing pattern for batch ML model lifecycles is to train, deploy, monitor, repeat. What does the ongoing maintenance for a streaming ML model look like? Concept drift is typically due to a discrepancy between the data used to train a model and the actual data being observed. How does the use of online learning affect the incidence of drift? Can you describe how the River framework is implemented? How have the design and goals of the project changed since you started working on it? How do the internal representations of the model differ from batch learning to allow for incremental updates to the model state? In the documentation you note the use of Python dictionaries for state management and the flexibility offered by that choice. What are the benefits and potential pitfalls of that decision? Can you describe the process of using River to design, implement, and validate a streaming ML model? What are the operational requirements for deploying and serving the model once it has been developed? What are some of the challenges that users of River might run into if they are coming from a batch learning background? What are the most interesting, innovative, or unexpected ways that you have seen River used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on River? When is River the wrong choice? What do you have planned for the future of River? Contact Info Email @halford_max on Twitter MaxHalford on GitHub Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. 
If you’ve learned something or tried out a project from the show then tell us about it! Email the hosts with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Links River scikit-multiflow Federated Machine Learning Hogwild! Google Paper Chip Huyen concept drift blog post Dan Crenshaw Berkeley Clipper MLOps Robustness Principle NY Taxi Dataset RiverTorch River Public Roadmap Beaver tool for deploying online models Prodigy ML human in the loop labeling The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0 Sponsored By: Linode: Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode's managed Kubernetes platform it's now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the

Ep 387: Declarative Machine Learning For High Performance Deep Learning Models With Predibase
Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning. Summary Deep learning is a revolutionary category of machine learning that accelerates our ability to build powerful inference models. Along with that power comes a great deal of complexity in determining what neural architectures are best suited to a given task, engineering features, scaling computation, etc. Predibase is building on the successes of the Ludwig framework for declarative deep learning and Horovod for horizontally distributing model training. In this episode CTO and co-founder of Predibase, Travis Addair, explains how they are reducing the burden of model development even further with their managed service for declarative and low-code ML and how they are integrating with the growing ecosystem of solutions for the full ML lifecycle. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Your host is Tobias Macey and today I’m interviewing Travis Addair about Predibase, a low-code platform for building ML models in a declarative format Interview Introduction How did you get involved in machine learning? Can you describe what Predibase is and the story behind it? Who is your target audience and how does that focus influence your user experience and feature development priorities? How would you describe the semantic differences between your chosen terminology of "declarative ML" and the "autoML" nomenclature that many projects and products have adopted? Another platform that launched recently with a promise of "declarative ML" is Continual. How would you characterize your relative strengths? Can you describe how the Predibase platform is implemented? How have the design and goals of the product changed as you worked through the initial implementation and started working with early customers? The operational aspects of the ML lifecycle are still fairly nascent. How have you thought about the boundaries for your product to avoid getting drawn into scope creep while providing a happy path to delivery? Ludwig is a core element of your platform. What are the other capabilities that you are layering around and on top of it to build a differentiated product? In addition to the existing interfaces for Ludwig you created a new language in the form of PQL. What was the motivation for that decision? How did you approach the semantic and syntactic design of the dialect? What is your vision for PQL in the space of "declarative ML" that you are working to define? Can you describe the available workflows for an individual or team that is using Predibase for prototyping and validating an ML model? Once a model has been deemed satisfactory, what is the path to production? How are you approaching governance and sustainability of Ludwig and Horovod while balancing your reliance on them in Predibase? 
What are some of the notable investments/improvements that you have made in Ludwig during your work of building Predibase? What are the most interesting, innovative, or unexpected ways that you have seen Predibase used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Predibase? When is Predibase the wrong choice? What do you have planned for the future of Predibase? Contact Info LinkedIn tgaddair on GitHub @travisaddair on Twitter Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email the hosts with your story. To help other people find the show please leave a review on iTunes and tell your friends and co

Ep 386: Build Better Machine Learning Models With Confidence By Adding Validation With Deepchecks
Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning. Summary Machine learning has the potential to transform industries and revolutionize business capabilities, but only if the models are reliable and robust. Because of the fundamental probabilistic nature of machine learning techniques it can be challenging to test and validate the generated models. The team at Deepchecks understands the widespread need to easily and repeatably check and verify the outputs of machine learning models and the complexity involved in making it a reality. In this episode Shir Chorev and Philip Tannor explain how they are addressing the problem with their open source deepchecks library and how you can start using it today to build trust in your machine learning applications. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Do you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you. Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture.
We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out! Data powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30 day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today! Your host is Tobias Macey and today I’m interviewing Shir Chorev and Philip Tannor about Deepchecks, a Python package for comprehensively validating your machine learning models and data with minimal effort. Interview Introduction How did you get involved in machine learning? Can you describe what Deepchecks is and the story behind it? Who is the target audience for the project? What are the biggest challenges that these users face in bringing ML models from concept to production and how does DeepChecks address those problems? In the absence of DeepChecks how are practitioners solving the problems of model validation and comparison across iterations? What are some of the other tools in this ecosystem and what are the differentiating features of DeepChecks? What are some examples of the kinds of tests that are useful for understanding the "correctness" of models?
What are the methods by which ML engineers/data scientists/domain experts can define what "correctness" means in a given model or subject area? In software engineering the categories of tests are tiered as unit -> integration -> end-to-end. What are the relevant categories of tests that need to be built for validating the behavior of machine learning models? How do model monitoring utilities overlap with the kinds of tests that you are building with deepchecks? Can you describe how the DeepChecks package is implemented? How have the design and goals of the project changed or evolved from when you started working on it? What are the assumptions that you have built up from your own experiences that have been challenged by your early users and design partners? Can you describe the workflow for an individual or team using DeepChecks as part of their model training and deployment lifecycle? Test engineering is a deep discipline in its own right. How have you approached the user experience and API design to reduce the overhead for ML practitioners to adopt good practices? What are the interfaces available for creating reusable tests and compo

Ep 385: Build A Full Stack ML Powered App In An Afternoon With Baseten
Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning. Summary Building an ML model is getting easier than ever, but it is still a challenge to get that model in front of the people that you built it for. Baseten is a platform that helps you quickly generate a full stack application powered by your model. You can easily create a web interface and APIs powered by the model you created, or a pre-trained model from their library. In this episode Tuhin Srivastava, co-founder of Baseten, explains how the platform empowers data scientists and ML engineers to get their work in production without having to negotiate for help from their application development colleagues. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host is Tobias Macey and today I’m interviewing Tuhin Srivastava about Baseten, an ML Application Builder for data science and machine learning teams. Interview Introduction How did you get involved in machine learning? Can you describe what Baseten is and the story behind it?
Who are the target users for Baseten and what problems are you solving for them? What are some of the typical technical requirements for an application that is powered by a machine learning model? In the absence of Baseten, what are some of the common utilities/patterns that teams might rely on? What kinds of challenges do teams run into when serving a model in the context of an application? There are a number of projects that aim to reduce the overhead of turning a model into a usable product (e.g. Streamlit, Hex, etc.). What is your assessment of the current ecosystem for lowering the barrier to product development for ML and data science teams? Can you describe how the Baseten platform is designed? How have the design and goals of the project changed or evolved since you started working on it? How do you handle sandboxing of arbitrary user-managed code to ensure security and stability of the platform? How did you approach the system design to allow for mapping application development paradigms into a structure that was accessible to ML professionals? Can you describe the workflow for building an ML powered application? What types of models do you support? (e.g. NLP, computer vision, timeseries, deep neural nets vs. linear regression, etc.) How do the monitoring requirements shift for these different model types? What other challenges are presented by these different model types? What are the limitations in size/complexity/operational requirements that you have to impose to ensure a stable platform? What is the process for deploying model updates? For organizations that are relying on Baseten as a prototyping platform, what are the options for taking a successful application and handing it off to a product team for further customization? What are the most interesting, innovative, or unexpected ways that you have seen Baseten used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Baseten? 
When is Baseten the wrong choice? What do you have planned for the future of Baseten? Contact Info @tuhinone on Twitter LinkedIn Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email the hosts with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Links Baseten Gumroad scikit-learn Tensorflow Keras Streamlit Podcast.__init__ Episode Retool Hex Podcast.__init__ Episode Kubernetes React Monaco

Ep 384: Skip Straight To The Fun Part Of Your Project With PyScaffold
Summary Starting a new project is always exciting and full of possibility, until you have to set up all of the repetitive boilerplate. Fortunately there are useful project templates that eliminate that drudgery. PyScaffold goes above and beyond simple template repositories, and gives you a toolkit for different application types that are packed with best practices to make your life easier. In this episode Florian Wilhelm shares the story behind PyScaffold, how the templates are designed to reduce friction when getting a new project off the ground, and how you can extend it to suit your needs. Stop wasting time with boring boilerplate and get straight to the fun part with PyScaffold! Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Florian Wilhelm about PyScaffold, a Python project template generator with batteries included. Interview Introductions How did you get introduced to Python? Can you describe what PyScaffold is and the story behind it? What is the main goal of the project?
There are a huge number of templates and starter projects available (both in Python and other languages). What are the aspects of PyScaffold that might encourage someone to adopt it? What are the different types/categories of applications that you are focused on supporting with the scaffolding? For each category, what is your selection process for which dependencies to include? How do you approach the work of keeping the various components up to date with community "best practices"? Can you describe how PyScaffold is implemented? How have the design and goals of the project changed since you first started it? What is the user experience for someone bootstrapping a project with PyScaffold? How can you adapt an existing project into the structure of a pyscaffold template? Are there any facilities for updating a project started with PyScaffold to include patches/changes in the source template? What are the most interesting, innovative, or unexpected ways that you have seen PyScaffold used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on PyScaffold? When is PyScaffold the wrong choice? What do you have planned for the future of PyScaffold? Keep In Touch Website LinkedIn FlorianWilhelm on GitHub @florianwilhelm on Twitter Picks Tobias Daredevil TV series Florian The Peripheral Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email the hosts with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links PyScaffold Innovex SAP Cookiecutter Pytest Podcast Episode Sphinx pre-commit Podcast Episode Black Flake8 Podcast Episode Poetry Setuptools mkdocs ReStructured Text Markdown Setuptools-SCM Hatch Flit Versioneer Gource git visualization MyPy Compiler Rust Cargo The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 383: Add Configuration Best Practices To Your Application In An Afternoon With Dynaconf
Summary Application configuration is a deceptively complex problem. Everyone who is building a project that gets used more than once will end up needing to add configuration to control aspects of behavior or manage connections to other systems and services. At first glance it seems simple, but can quickly become unwieldy. Bruno Rocha created Dynaconf in an effort to provide a simple interface with powerful capabilities for managing settings across environments with a set of strong opinions. In this episode he shares the story behind the project, how its design allows for adapting to various applications, and how you can start using it today for your own projects. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Bruno Rocha about Dynaconf, a powerful and flexible framework for managing your application’s configuration settings. Interview Introductions How did you get introduced to Python? Can you describe what Dynaconf is and the story behind it? What are your main goals for Dynaconf? What kinds of projects (e.g. web, devops, ML, etc.)
are you focused on supporting with Dynaconf? Settings management is a deceptively complex and detailed aspect of software engineering, with a lot of conflicting opinions about the "right way". What are the design philosophies that you lean on for Dynaconf? Many engineers end up building their own frameworks for managing settings as their use cases and environments get increasingly complicated. What are some of the ways that those efforts can go wrong or become unmaintainable? Can you describe how Dynaconf is implemented? How have the design and goals of the project evolved since you first started it? What is the workflow for getting started with Dynaconf on a new project? How does the usage scale with the complexity of the host project? What are some strategies that you recommend for integrating Dynaconf into an existing project that already has complex requirements for settings across multiple environments? Secrets management is one of the most frequently under- or over-engineered aspects of application configuration. What are some of the ways that you have worked to strike a balance of making the "right way" easy? What are some of the more advanced or under-utilized capabilities of Dynaconf? What are the most interesting, innovative, or unexpected ways that you have seen Dynaconf used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Dynaconf? When is Dynaconf the wrong choice? What do you have planned for the future of Dynaconf? Keep In Touch rochacbruno on GitHub @rochacbruno on Twitter Website LinkedIn Picks Tobias SOPS Bruno Severance tv series Learn Rust Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. 
If you’ve learned something or tried out a project from the show then tell us about it! Email the hosts with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Links Dynaconf Dynaconf GitHub Org Ansible Bash Perl 12 Factor Applications TOML Hashicorp Vault Pydantic Airflow Hydroconf The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 382: Take A Tour Of The Hidden Language Of Hardware And How It Powers Your Code
Summary Software is eating the world, but that code has to have hardware to execute the instructions. Most people, and many software engineers, don’t have a proper understanding of how that hardware functions. Charles Petzold wrote the book "Code: The Hidden Language of Computer Hardware and Software" to make this a less opaque subject. In this episode he discusses what motivated him to revise that work in the second edition and the additional details that he packed in to explore the functioning of the CPU. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Charles Petzold about his work on the second edition of Code: The Hidden Language of Computer Hardware and Software. Interview Introductions How did you get introduced to Python? Can you start by describing the focus and goal of "Code" and the story behind it? Who is the target audience for the book? The sequencing of the topics parallels the curriculum of a computer engineering course of study.
Why do you think that it is useful/important for a general audience to understand the electrical engineering principles that underlie modern computers? What was your process for determining how to segment the information that you wanted to address in the book to balance the pacing of the reader with the density of the information? Technical books are notoriously challenging to write due to the constantly changing subject matter. What are some of the ways that the first edition of "Code" was becoming outdated? What are the most notable changes in the foundational elements of computing that have happened in the time since the first edition was published? One of the concepts that I have found most helpful as a software engineer is that of "mechanical sympathy". What are some of the ways that a better understanding of computer hardware and electrical signal processing can influence and improve the way that an engineer writes code? What are some of the insights that you gained about your own use of computers and software while working on this book? What are the most interesting, unexpected, or challenging lessons that you have learned while writing "Code" and revising it for the second edition? Once the reader has finished with your book, what are some of the other references/resources that you recommend? Keep In Touch Website Picks Tobias The Imitation Game movie Charles The Annotated Turing book by Charles Petzold Confidence Man: The Making of Donald Trump and the Breaking of America by Maggie Haberman Links Code: The Hidden Language of Computer Hardware and Software Fortran PL/I BASIC C# Z80 Intel 8080 PC Magazine Assembly Language Logic Gates C Language ASCII == American Standard Code for Information Interchange SkiaSharp Algol Code first edition bibliography The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 381Take Control Of Your Electrical Systems With The Open Source FlexMeasures Energy Management System
FullSummary The generation, distribution, and consumption of energy is one of the most critical pieces of infrastructure for the modern world. With the rise of renewable energy there is an accompanying need for systems that can respond in real-time to the availability and demand for electricity. FlexMeasures is an open source energy management system that is designed to integrate a variety of inputs and intelligently allocate energy resources to reduce waste in your home or grid. In this episode Nicolas Höning explains how the project is implemented, how it is being used in his startup Seita, and how you can try it out for your own energy needs. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Nicolas Höning about FlexMeasures, an open source project designed to manage energy resources dynamically to improve efficiency Interview Introductions How did you get introduced to Python? Can you describe what FlexMeasures is and the story behind it? What are the primary goals/objectives of the project? The energy sector is huge. Where can FlexMeasures be used? 
Energy systems are typically governed by a marketplace system. What are the benefits that FlexMeasures can provide for each side of that market? How do renewable sources of energy confuse/complicate the role that the different stakeholders represent? What are the different points of interaction that producers/consumers might have with the FlexMeasures platform? What are some examples of the types of decisions/recommendations that FlexMeasures might generate and how do they manifest in the energy systems? What are the types of information that FlexMeasures relies on for driving those decisions? Can you describe how FlexMeasures is implemented? How have the design and goals of the system changed/evolved since you started working on it? What are the interfaces that you provide for integrating with and extending the functionality of a FlexMeasures installation? What are the operating scales that FlexMeasures is designed for? What are the most interesting, innovative, or unexpected ways that you have seen FlexMeasures used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on FlexMeasures? When is FlexMeasures the wrong choice? What do you have planned for the future of FlexMeasures? Keep In Touch Website @nhoening on Twitter LinkedIn Picks Tobias She-Hulk Nicolas Kleo on Netflix Altair Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected]) with your story. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links FlexMeasures: GitHub Linux Energy Foundation Mailing List Twitter EyeQuant Energy Management System OpenEMS ICT == Information and Communications Technology HomeAssistant Podcast Episode FlexMeasures HomeAssistant Plugin Universal Smart Energy Framework PostgreSQL Data Engineering Podcast Episode TimescaleDB Data Engineering Podcast Episode OpenWeatherMap Timely-Beliefs library Flask Click Pyomo scikit-learn sktime LF Energy Flake8 MyPy Podcast Episode Black ARIMA Model Random Forest The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 380How And Why To Build Effective Teams As An Engineering Leader
FullSummary Your ability to build and maintain a software project is tempered by the strength of the team that you are working with. If you are in a position of leadership, then you are responsible for the growth and maintenance of that team. In this episode Jigar Desai, currently the SVP of engineering at Sisu Data, shares his experience as an engineering leader over the past several years and the useful insights he has gained into how to build effective engineering teams. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star’s data discovery platform solves that out of the box, with a fully automated catalog that includes lineage from where the data originated, all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your dbt, Snowflake, Tableau, Looker, or whatever you’re using and Select Star will set everything up in just a few hours. 
Go to pythonpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan. Your host as usual is Tobias Macey and today I’m interviewing Jigar Desai about building effective engineering teams Interview Introductions How did you get introduced to Python? What have you found to be the central challenges involved in building an effective engineering team? What are the measures that you use to determine what "effective" means for a given team? how to establish mutual trust in an engineering team challenges introduced at different levels of team size/organizational complexity establishing and managing career ladders You have mostly worked in heavily tech-focused companies. How do industry verticals impact the ways that you think about formation and structure of engineering teams? What are some of the different roles that you might focus on hiring/team compositions in industries that aren’t purely software? (e.g. fintech, logistics, etc.) notable evolutions in engineering practices/paradigm shifts in the industry What are some of the predictions that you have about how the future of engineering will look? What impact do you think low-code/no-code solutions will have on the types of projects that code-first developers will be tasked with? What are the most interesting, innovative, or unexpected ways that you have seen organizational leaders address the work of building and scaling engineering capacity? What are the most interesting, unexpected, or challenging lessons that you have learned while working in engineering leadership? What are the most informative mistakes that you would like to share? What are some resources and reference material that you recommend for anyone responsible for the success of their engineering teams? Keep In Touch LinkedIn Picks Tobias Bullet Train movie Jigar Top Gun Maverick movie Closing Announcements Thank you for listening! Don’t forget to check out our other shows. 
The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected]) with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Sisu Data OpenStack Java The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 379Complete Your Hardware "Weekend Projects" In An Actual Weekend With Belay
FullSummary Working on hardware projects often has significant friction involved when compared to pure software. Brian Pugh enjoys tinkering with microcontrollers, but his "weekend projects" often took longer than a weekend to complete, so he created Belay. In this episode he explains how Belay simplifies the interactions involved in developing for MicroPython boards and how you can use it to speed up your own experimentation. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star’s data discovery platform solves that out of the box, with a fully automated catalog that includes lineage from where the data originated, all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your dbt, Snowflake, Tableau, Looker, or whatever you’re using and Select Star will set everything up in just a few hours. Go to pythonpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan. 
Your host as usual is Tobias Macey and today I’m interviewing Brian Pugh about Belay, a Python library that enables the rapid development of projects that interact with hardware via a MicroPython-compatible board. Interview Introductions How did you get introduced to Python? Can you describe what Belay is and the story behind it? Who are the target users for Belay? What are some of the points of friction involved in developing for hardware projects? What are some of the features of Belay that make that a smoother process? What are some of the ways that simplifying the develop/debug cycles can improve the overall experience of developing for hardware platforms? What are some of the inherent limitations of constrained hardware that Belay is unable to paper over? Can you describe how Belay is implemented? What does the workflow look like when using Belay as compared to using MicroPython directly? What are some of the ways that you are using Belay in your own projects? What are the most interesting, innovative, or unexpected ways that you have seen Belay used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Belay? When is Belay the wrong choice? What do you have planned for the future of Belay? Keep In Touch BrianPugh on GitHub LinkedIn Picks Tobias Gunnar Computer Glasses Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected]) with your story. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Belay Geomagical PIC Microcontroller AVR Microcontroller Matlab MicroPython Podcast Episode CircuitPython Podcast Episode Celery Potentiometer Raspberry Pi Raspberry Pi Pico ADC Converter Thonny Podcast Episode Adafruit Pyboard Python Inspect Module Python Tokenize Magnetometer Project Lidar The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 378Catching Up With Pyre, A Fast Type Checker For Python
FullSummary Static typing versus dynamic typing is one of the oldest debates in software development. In recent years a number of dynamic languages have worked toward a middle ground by adding support for type hints. Python’s type annotations have given rise to an ecosystem of tools that use that type information to validate the correctness of programs and help identify potential bugs. At Instagram they created the Pyre project with a focus on speed to allow for scaling to huge Python projects. In this episode Shannon Zhu discusses how it is implemented, how to use it in your development process, and how it compares to other type checkers in the Python ecosystem. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Shannon Zhu about Pyre, a type checker for Python 3 built from the ground up to support gradual typing and deliver responsive incremental checks Interview Introductions How did you get introduced to Python? Can you describe what Pyre is and the story behind it? There have been a number of tools created to support various aspects of typing for Python. 
How would you describe the various goals that they support and how Pyre fits in that ecosystem? What are the core goals and notable features of Pyre? Can you describe how Pyre is implemented? How have the design and goals of the project changed/evolved since you started working on it? What are the different ways that Pyre is used in the development workflow for a team or individual? What are some of the challenges/roadblocks that people run into when adopting type definitions in their Python projects? How has the evolution of type annotations and overall support for them affected your work on Pyre? As someone who is working closely with type systems, what are the strongest aspects of Python’s implementation and opportunities for improvement? What are the most interesting, innovative, or unexpected ways that you have seen Pyre used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pyre? When is Pyre the wrong choice? What do you have planned for the future of Pyre? Keep In Touch shannonzhu on GitHub Picks Tobias Lord Of The Rings: The Rings of Power on Amazon Video Shannon King’s Dilemma board game Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected]) with your story. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Pyre MyPy Podcast Episode PyRight PyType MonkeyType Podcast Episode Java C PEP 484 Flow Hack Continuous Integration OCaml PEP 675 – Arbitrary literal strings Gradual Typing AST == Abstract Syntax Tree Language Server Protocol Tensor Type Arithmetic PyCon: Securing Code With The Python Type System PyCon: Type Checked Python In The Real World PyCon: Łukasz Langa 2022 Keynote The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 377Standardizing On Python For All Software Projects At Ascend.io
FullSummary Every software project is subject to a series of decisions and tradeoffs. One of the first decisions to make is which programming language to use. For companies where their product is software, this is a decision that can have significant impact on their overall success. In this episode Sean Knapp discusses the languages that his team at Ascend use for building a service that powers complex and business critical data workflows. He also explains his motivation to standardize on Python for all layers of their system to improve developer productivity. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Sean Knapp about his motivations and experiences standardizing on Python for development at Ascend Interview Introductions How did you get introduced to Python? Can you describe what Ascend is and the story behind it? How many engineers work at Ascend? What are their different areas of focus? What are your policies for selecting which technologies (e.g. languages, frameworks, dev tooling, deployment, etc.) are supported at Ascend? 
What does it mean for a technology to be supported? You recently started standardizing on Python as the default language for development. How has Python been used up to now? What other languages are in common use at Ascend? What are some of the challenges/difficulties that motivated you to establish this policy? What are some of the tradeoffs that you have seen in the adoption of Python in place of your other adopted languages? How are you managing ongoing maintenance of projects/products that are not written in Python? What are some of the potential pitfalls/risks that you are guarding against in your investment in Python? What are the most interesting, innovative, or unexpected ways that you have seen Python used where it was previously a different technology? What are the most interesting, unexpected, or challenging lessons that you have learned while working on aligning all of your development on a single language? When is Python the wrong choice? What do you have planned for the future of engineering practices at Ascend? Keep In Touch LinkedIn @seanknapp on Twitter Picks Tobias Delver Lens app for scanning Magic: The Gathering cards Sean Typer DuckDB Amp It Up book (affiliate link) Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected]) with your story. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Ascend Data Engineering Podcast Episode Perl Google Sawzall Technical Debt Ruby gRPC Go Language Java PySpark Apache Arrow Thrift SQL Scala Snowflake runtime for Python Snowpark Typer CLI framework Pydantic Podcast Episode Pulumi Podcast Episode PyInfra Podcast Episode Packer Plot.ly Dash DuckDB The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 376Exploring The Process And Practice Of Building Better Software Through Code Reviews
FullSummary Writing code is only one piece of creating good software. Code reviews are an important step in the process of building applications that are maintainable and sustainable. In this episode On Freund shares his thoughts on the myriad purposes that code reviews serve, as well as exploring some of the patterns and anti-patterns that grow up around a seemingly simple process. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing On Freund about the intricacies and importance of code reviews Interview Introductions How did you get introduced to Python? Can you start by giving us your description of what a code review is? What is the purpose of the code review? At face value a code review appears to be a simple task. What are some of the subtleties that become evident with time and experience? What are some of the ways that code reviews can go wrong? What are some common anti-patterns that get applied to code reviews? What are the elements of code review that are useful to automate? 
What are some of the risks/bad habits that can result from overdoing automated checks/fixes or over-reliance on those tools in code reviews? identifying who can/should do a review for a piece of code how to use code reviews as a teaching tool for new/junior engineers how to use code reviews for avoiding siloed experience/promoting cross-training PR templates for capturing relevant context What are the most interesting, innovative, or unexpected ways that you have seen code reviews used? What are the most interesting, unexpected, or challenging lessons that you have learned while leading and supporting engineering teams? What are some resources that you recommend for anyone who wants to learn more about code review strategies and how to use them to scale their teams? Keep In Touch LinkedIn @onfreund on Twitter Picks Tobias The Girl Who Drank The Moon On Better Call Saul Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected]) with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Wilco Code Review Home Assistant Podcast Episode Trunk-based Development Git Flow Pair Programming Feature Flags Podcast Episode KPI == Key Performance Indicator MIT Open Learning Engineering Handbook PEP Repository The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 375Ship With Confidence By Automating Quality Assurance
FullSummary Quality assurance in the software industry has become a shared responsibility in most organizations. Given the rapid pace of development and delivery it can be challenging to ensure that your application is still working the way it’s supposed to with each release. In this episode Jonathon Wright discusses the role of quality assurance in modern software teams and how automation can help. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Jonathon Wright about the role of automation in your testing and QA strategies Interview Introductions How did you get introduced to Python? Can you share your relationship with software testing/QA and automation? What are the main categories of how companies and software teams address testing and validation of their applications? What are some of the notable tradeoffs/challenges among those approaches? With the increased adoption of agile practices and the "shift left" mentality of DevOps, who is responsible for software quality? What are some of the cases where a discrete QA role or team becomes necessary? 
(or is it always necessary?) With testing and validation being a shared responsibility, competing with other priorities, what role does automation play? What are some of the ways that automation manifests in software quality and testing? How is automation distinct from software tests and CI/CD? For teams who are investing in automation for their applications, what are the questions they should be asking to identify what solutions to adopt? (what are the decision points in the build vs. buy equation?) At what stage(s) of the software lifecycle does automation live? What is the process for identifying which capabilities and interactions to target during the initial application of automation for QA and validation? One of the perennial challenges with any software testing, particularly for anything in the UI, is that it is a constantly moving target. What are some of the patterns and techniques, both from a developer and tooling perspective, that increase the robustness of automated validation? What are the most interesting, innovative, or unexpected ways that you have seen automation used for QA? What are the most interesting, unexpected, or challenging lessons that you have learned while working on QA and automation? When is automation the wrong choice? What are some of the resources that you recommend for anyone who wants to learn more about this topic? 
Keep In Touch LinkedIn @Jonathon_Wright on Twitter Website Picks Tobias The Sandman Netflix series and Graphic Novels by Neil Gaiman Jonathon House of the Dragon HBO series Mythic Quest TV series It’s Always Sunny in Philadelphia Links Haskell Idris Esperanto Klingon Planguage Lisp Language TDD == Test Driven Development BDD == Behavior Driven Development Gherkin Format Integration Testing Chaos Engineering Gremlin Chaos Toolkit Podcast Episode Requirements Engineering Keysight QA Lead Podcast Cognitive Learning TED Talk OpenTelemetry Podcast Episode Quality Engineering Selenium Swagger XPath Regular Expression Test Guild The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 374Remove Roadblocks And Let Your Developers Ship Faster With Self-Serve Infrastructure
FullSummary The goal of every software team is to get their code into production without breaking anything. This requires establishing a repeatable process that doesn’t introduce unnecessary roadblocks and friction. In this episode Ronak Rahman discusses the challenges that development teams encounter when trying to build and maintain velocity in their work, the role that access to infrastructure plays in that process, and how to build automation and guardrails for everyone to take part in the delivery process. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Ronak Rahman about how automating the path to production helps to build and maintain development velocity Interview Introductions How did you get introduced to Python? Can you describe what Quali is and the story behind it? What are the problems that you are trying to solve for software teams? How does Quali help to address those challenges? What are the bad habits that engineers fall into when they experience friction with getting their code into test and production environments? 
How do those habits contribute to negative feedback loops? What are signs that developers and managers need to watch for that signal the need for investment in developer experience improvements on the path to production? Can you describe what you have built at Quali and how it is implemented? How have the design and goals shifted/evolved from when you first started working on it? What are the positive and negative impacts that you have seen from the evolving set of options for application deployments? (e.g. K8s, containers, VMs, PaaS, FaaS, etc.) Can you describe how Quali fits into the workflow of software teams? Once a team has established patterns for deploying their software, what are some of the disruptions to their flow that they should guard against? What are the most interesting, innovative, or unexpected ways that you have seen Quali used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Quali? When is Quali the wrong choice? What do you have planned for the future of Quali? Keep In Touch @OfRonak on Twitter Picks Tobias The Terminal List on Amazon Ronak Midnight Gospel on Amazon Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected]) with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Quali Torque Visual Studio Plugin Subversion IaC == Infrastructure as Code DevOps Terraform Pulumi Podcast Episode Cloudformation Flask The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 373 The Benefits Of Python And Django For Going From Zero To MVP At Speed
FullSummary Every startup begins with an idea, but that won’t get you very far without testing the feasibility of that idea. A common practice is to build a Minimum Viable Product (MVP) that addresses the problem that you are trying to solve and to work with early customers as they engage with that MVP. In this episode Tony Pavlovych shares his thoughts on Python’s strengths when building and launching that MVP and some of the potential pitfalls that businesses can run into on that path. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Tony Pavlovych about Python’s strengths for startups and the steps to building an MVP (minimum viable product) Interview Introductions How did you get introduced to Python? Can you describe what PLANEKS is and the story behind it? One of the services that you offer is building an MVP. What are the goals and outcomes associated with an MVP? What is the process for identifying the product focus and feature scope? What are some of the common misconceptions about building and launching MVPs that you have dealt with in your work with customers? 
What are the common pitfalls that companies encounter when building and validating an MVP? Can you describe the set of tools and frameworks (e.g. Django, Poetry, cookiecutter, etc.) that you have invested in to reduce the overhead of starting and maintaining velocity on multiple projects? What are the configurations that are most critical to keep constant across projects to maintain familiarity and sanity for your developers? (e.g. linting rules, build toolchains, etc.) What are the architectural patterns that you have found most useful to make MVPs flexible for adaptation and extension? Once the MVP is built and launched, what are the next steps to validate the product and determine priorities? What benefits do you get from choosing Python as your language for building an MVP/launching a startup? What are the challenges/risks involved in that choice? What are the most interesting, unexpected, or challenging lessons that you have learned while working on MVPs for your clients at PLANEKS? When is an MVP the wrong choice? What are the developments in the Python and broader software ecosystem that you are most interested in for the work you are doing for your team and clients? Keep In Touch LinkedIn Picks Tobias datamodel-code-generator Tony Screw It, Let’s Do It by Richard Branson (affiliate link) Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Links PLANEKS Minimum Viable Product Django Cookiecutter Django Boilerplate OCR == Optical Character Recognition Tesseract OCR framework The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 372 Powering The Next Generation Of Application Architectures With Web Assembly And The Fermyon Platform
FullSummary Application architectures have been in a constant state of evolution as new infrastructure capabilities are introduced. Virtualization, cloud, containers, mobile, and now web assembly have each introduced new options for how to build and deploy software. Recognizing the transformative potential of web assembly, Matt Butcher and his team at Fermyon are investing in tooling and services to improve the developer experience. In this episode he explains the opportunity that web assembly offers to all language communities, what they are building to power lightweight server-side microservices, and how Python developers can get started building and contributing to this nascent ecosystem. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share python workflows in a matter of minutes with 0 changes to your code. 
Shipyard provides powerful features like webhooks, error-handling, monitoring, automatic containerization, syncing with Github, and more. Plus, it comes with over 70 open-source, low-code templates to help you quickly build solutions with the tools you already use. Go to dataengineeringpodcast.com/shipyard to get started automating with a free developer plan today! Your host as usual is Tobias Macey and today I’m interviewing Matt Butcher about Fermyon and the impact of WebAssembly on software architecture and deployment across language boundaries Interview Introductions How did you get introduced to Python? For anyone who isn’t familiar with WebAssembly can you give your elevator pitch for why it matters? What is the current state of language support for Python in the WASM ecosystem? Can you describe what Fermyon is and the story behind it? What are your goals with Fermyon and what are the products that you are building to support those goals? There has been a steady progression of technologies aimed at better ways to build, deploy, and manage software (e.g. virtualization, cloud, containers, etc.). What are the problems with the previous options and how does WASM address them? What are some examples of the types of applications/services that work well in a WASM environment? Can you describe how you have architected the Fermyon platform? How did you approach the design of the interfaces and tooling to support developer ergonomics? How have the design and goals of the platform changed or evolved since you started working on it? Can you describe what a typical workflow is for an application team that is using Spin/Fermyon to build and deploy a service? What are some of the architectural patterns that WASM/Fermyon encourage? What are some of the limitations that WASM imposes on services using it as a runtime? (e.g. system access, threading/multiprocessing, library support, C extensions, etc.) 
What are the new and emerging topics and capabilities in the WASM ecosystem that you are keeping track of? With Spin as the core building block of your platform, how are you approaching governance and sustainability of the open source project? What are your guiding principles for when a capability belongs in the OSS vs. commercial offerings? What are the most interesting, innovative, or unexpected ways that you have seen Fermyon used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Fermyon? When is Fermyon the wrong choice? What do you have planned for the future of Fermyon? Keep In Touch LinkedIn @technosophos on Twitter technosophos on GitHub Picks Tobias Thor: Love & Thunder movie Matt Remembrance of Earth’s Past trilogy ("Three Body Problem" is the first) by Cixin Liu Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the s

Ep 371 Gain A Deeper Understanding Of What Your Code Is Doing And Where It Spends Its Time With VizTracer
FullSummary As your code scales beyond a trivial level of complexity and sophistication it becomes difficult or impossible to know everything that it is doing. The flow of logic and data through your software and which parts are taking the most time are impossible to understand without help from your tools. VizTracer is the tool that you will turn to when you need to know all of the execution paths that are being exercised and which of those paths are the most expensive. In this episode Tian Gao explains why he created VizTracer and how you can use it to gain a deeper familiarity with the code that you are responsible for maintaining. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share python workflows in a matter of minutes with 0 changes to your code. Shipyard provides powerful features like webhooks, error-handling, monitoring, automatic containerization, syncing with Github, and more. 
Plus, it comes with over 70 open-source, low-code templates to help you quickly build solutions with the tools you already use. Go to dataengineeringpodcast.com/shipyard to get started automating with a free developer plan today! Your host as usual is Tobias Macey and today I’m interviewing Tian Gao about VizTracer, a low-overhead logging/debugging/profiling tool that can trace and visualize your python code execution Interview Introductions How did you get introduced to Python? Can you describe what VizTracer is and the story behind it? What are the main goals that you are focused on with VizTracer? What are some examples of the types of bugs that profiling can help diagnose? How does profiling work together with other debugging approaches? (e.g. logging, breakpoint debugging, etc.) There are a number of profiling utilities for Python. What feature or combination of features were missing that motivated you to create VizTracer? Can you describe how VizTracer is implemented? How have the design and goals changed since you started working on it? There are a number of styles of profiling, what was your process for deciding which approach to use? What are the most complex engineering tasks involved in building a profiling utility? Can you describe the process of using VizTracer to identify and debug errors and performance issues in a project? What are the options for using VizTracer in a production environment? What are the interfaces and extension points that you have built in to allow developers to customize VizTracer? What are some of the ways that you have used VizTracer while working on VizTracer? What are the most interesting, innovative, or unexpected ways that you have seen VizTracer used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on VizTracer? When is VizTracer the wrong choice? What do you have planned for the future of VizTracer? 
Keep In Touch gaogaotiantian on GitHub LinkedIn Picks Tobias Travelers show on Netflix Tian objprint Lincoln Lawyer bilibili – Tian’s coding sessions in Chinese Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Links Viztracer Python cProfile Sampling Profiler Perfetto Coverage.py Podcast Episode Python setxprofile hook Circular Buffer Catapult Trace Viewer py-spy psutil gdb Flame graph The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 370 Stream Processing In Real Time And At Scale In Pure Python With Bytewax
FullSummary Analysis of streaming data in real time has long been the domain of big data frameworks, predominantly written in Java. Taking advantage of those capabilities from Python requires using client libraries that suffer from impedance mismatches, which make the work harder than necessary. Bytewax is a new open source platform for writing stream processing applications in pure Python that don’t have to be translated into foreign idioms. In this episode Bytewax founder Zander Matheson explains how the system works and how to get started with it today. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star’s data discovery platform solves that out of the box, with a fully automated catalog that includes lineage from where the data originated, all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your dbt, Snowflake, Tableau, Looker, or whatever you’re using and Select Star will set everything up in just a few hours. 
Go to pythonpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan. Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share python workflows in a matter of minutes with 0 changes to your code. Shipyard provides powerful features like webhooks, error-handling, monitoring, automatic containerization, syncing with Github, and more. Plus, it comes with over 70 open-source, low-code templates to help you quickly build solutions with the tools you already use. Go to dataengineeringpodcast.com/shipyard to get started automating with a free developer plan today! Your host as usual is Tobias Macey and today I’m interviewing Zander Matheson about Bytewax, an open source Python framework for building highly scalable dataflows to process ANY data stream. Interview Introductions How did you get introduced to Python? Can you describe what Bytewax is and the story behind it? Who are the target users for Bytewax? What is the problem that you are trying to solve with Bytewax? What are the alternative systems/architectures that you might replace with Bytewax? Can you describe how Bytewax is implemented? What are the benefits of Timely Dataflow as a core building block for a system like Bytewax? How have the design and goals of the project changed/evolved since you first started working on it? What are the axes available for scaling Bytewax execution? How have you approached the design of the Bytewax API to make it accessible to a broader audience? Can you describe what is involved in building a project with Bytewax? What are some of the stream processing concepts that engineers are likely to run up against as they are experimenting and designing their code? 
What is your motivation for providing the core technology of your business as an open source engine? How are you approaching the balance of project governance and sustainability with opportunities for commercialization? What are the most interesting, innovative, or unexpected ways that you have seen Bytewax used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Bytewax? When is Bytewax the wrong choice? What do you have planned for the future of Bytewax? Keep In Touch Slack Twitter LinkedIn Picks Tobias Alta Racks Zander Atherton Bikes Links Bytewax GitHub Flink Data Engineering Podcast Episode Spark Streaming Kafka Connect Faust Podcast Episode Ray Podcast Episode Dask Data Engineering Podcast Episode Timely Dataflow PyO3 Materialize Data Engineering Podcast Episode HyperLogLog Python River Library Shannon Entropy Calculation The blog post using incremental shannon entropy NATS waxctl Prometheus Grafana Streamz The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 369 Tetra: A Full Stack Web Framework That Doesn't Make You Write Everything Twice
FullSummary Building a fully functional web application has been growing in complexity along with the growing popularity of JavaScript UI frameworks such as React, Vue, Angular, etc. Users have grown to expect interactive experiences with dynamic page updates, which leads to duplicated business logic and complex API contracts between the server-side application and the JavaScript front-end. To reduce the friction involved in writing and maintaining a full application, Sam Willis created Tetra, a framework built on top of Django that embeds the JavaScript logic into the Python context where it is used. In this episode he explains his design goals for the project, how it has helped him build applications more rapidly, and how you can start using it to build your own projects today. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. 
For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan. Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share python workflows in a matter of minutes with 0 changes to your code. Shipyard provides powerful features like webhooks, error-handling, monitoring, automatic containerization, syncing with Github, and more. Plus, it comes with over 70 open-source, low-code templates to help you quickly build solutions with the tools you already use. Go to dataengineeringpodcast.com/shipyard to get started automating with a free developer plan today! Your host as usual is Tobias Macey and today I’m interviewing Sam Willis about Tetra, a full stack component framework for your Django applications Interview Introductions How did you get introduced to Python? Can you describe what Tetra is and the story behind it? What are the problems that you are aiming to solve with this project? What are some of the other ways that you have addressed those problems? What are the shortcomings that you encountered with those solutions? What was missing in the existing landscape of full-stack application development patterns that prompted you to build a new meta-framework? 
What are some of the sources of inspiration (positive and negative) that you looked to while deciding on the component selection and implementation strategy? Can you describe how Tetra is implemented? What are the core principles that you are relying on to drive your design of APIs and developer experience? What is the process for building a full component in Tetra? What are some of the application design challenges that are introduced by combining the JavaScript and Django logic and attributes? (e.g. reusing JS logic/CSS styles across components) A perennial challenge with combining the syntax across multiple languages in a single file is editor support. How are you thinking about that with Tetra’s implementation? What is your grand vision for Tetra and how are you working to make it sustainable? What are the most interesting, innovative, or unexpected ways that you have seen Tetra used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Tetra? When is Tetra the wrong choice? What do you have planned for the future of Tetra? Keep In Touch @samwillis on Twitter Website LinkedIn samwillis on GitHub Picks Tobias The Machine L

Ep 368 Design Real-World Objects In Python With CadQuery
FullSummary Virtually everything that you interact with on a daily basis and many other things that make modern life possible were designed and modeled in software called CAD or Computer-Aided Design. These programs are advanced suites with graphical editing environments tailored to domain experts in areas such as mechanical engineering, electrical engineering, architecture, etc. While the UI-driven workflow is more accessible, it isn’t scalable which opens the door to code-driven workflows. In this episode Jeremy Wright discusses the design, uses, and benefits of the CadQuery framework for building 3D CAD models entirely in Python. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. 
Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan. Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share python workflows in a matter of minutes with 0 changes to your code. Shipyard provides powerful features like webhooks, error-handling, monitoring, automatic containerization, syncing with Github, and more. Plus, it comes with over 70 open-source, low-code templates to help you quickly build solutions with the tools you already use. Go to dataengineeringpodcast.com/shipyard to get started automating with a free developer plan today! Your host as usual is Tobias Macey and today I’m interviewing Jeremy Wright about CadQuery, an easy-to-use Python module for building parametric 3D CAD models Interview Introductions How did you get introduced to Python? Can you start by explaining what CAD is and some of the real-world applications of it? Can you describe what CadQuery is and the story behind it? How did you get involved with it and what keeps you motivated? What are the different methods that are in common use for building CAD models? Are there approaches that are more common for models used in different industries? What was missing in other projects for programmatically generating CAD models that motivated you to build CadQuery? Can you describe how the CadQuery library is implemented? How have the design and goals of the project changed or evolved since you started working on it? 
How would you characterize the rate of change/evolution in the CAD ecosystem, and how has that factored into your work on CadQuery? How did you approach the process of API design? How do you balance accessibility for non-professionals with domain-related nomenclature? Can you describe some example workflows for going from idea to finished product with CadQuery? How are you using CadQuery in your own work? What are the most interesting, innovative, or unexpected ways that you have seen CadQuery used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on CadQuery? When is CadQuery the wrong choice? What do you have planned for the future of CadQuery? Keep In Touch Discord Twitter GitHub GitLab Picks Tobias Doctor Strange: In The Multiverse of Madness Jeremy Star Trek: Strange New Worlds Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine l

Ep 367 Intelligent Dependency Resolution For Optimal Compatibility And Security With Project Thoth
FullSummary Building any software project is going to require relying on dependencies that you and your team didn’t write or maintain, and many of those will have dependencies of their own. This has led to a wide variety of potential and actual issues ranging from developer ergonomics to application security. In order to provide a higher degree of confidence in the optimal combinations of direct and transitive dependencies a team at Red Hat started Project Thoth. In this episode Fridolín Pokorný explains how the Thoth resolver uses multiple signals to find the best combination of dependency versions to ensure compatibility and avoid known security issues. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share python workflows in a matter of minutes with 0 changes to your code. Shipyard provides powerful features like webhooks, error-handling, monitoring, automatic containerization, syncing with Github, and more. 
Plus, it comes with over 70 open-source, low-code templates to help you quickly build solutions with the tools you already use. Go to dataengineeringpodcast.com/shipyard to get started automating with a free developer plan today! Your host as usual is Tobias Macey and today I’m interviewing Fridolín Pokorný about Project Thoth, a resolver service that computes the optimal combination of versions for your dependencies Interview Introductions How did you get introduced to Python? Can you describe what Project Thoth is and the story behind it? What are some examples of the types of problems that can be introduced by mismanaged dependency versions? The Python ecosystem has seen a number of dependency management tools introduced recently. What are the capabilities that Thoth offers that make it stand out? How does it compare to e.g. pip, Poetry, pip-tools, etc.? How do those other tools approach resolution of dependencies? Can you describe how Thoth is implemented? How have the scope and design of the project evolved since it was started? What are the sources of information that it relies on for generating the possible solution space? What are the algorithms that it relies on for finding an optimal combination of packages? Can you describe how Thoth fits into the workflow of a developer while selecting a set of dependencies and keeping them up to date over the life of a project? What are the opportunities for expanding Thoth’s application to other language ecosystems? What are the interfaces available for extending or integrating with Thoth? What are the most interesting, innovative, or unexpected ways that you have seen Thoth used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Thoth? When is Thoth the wrong choice? What do you have planned for the future of Thoth? 
Keep In Touch LinkedIn Website Picks Tobias Brass Against Fridolín micropipenv Links Red Hat Emerging Technologies Group Project Thoth Thamos CLI PyPA Advisory Database Project2Vec Thoth Prescriptions Thoth: Egyptian God The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 366 Take A Deep Dive On How Code Completion Works And How To Customize It
FullSummary Most developers have encountered code completion systems and rely on them as part of their daily work. They allow you to stay in the flow of programming, but have you ever stopped to think about how they work? In this episode Meredydd Luff takes us behind the scenes to dig into the mechanics of code completion engines and how you can customize them to fit your particular use case. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Meredydd Luff about how code completion works and what it takes to build your own Interview Introductions How did you get introduced to Python? Most programmers are familiar with the idea of code completion, but can you just give the elevator pitch to get us all on the same page? You gave a presentation recently at PyCon about how to build a code completion system. What was your approach to identifying what fundamental concepts needed to be addressed and how to fit that lesson into the available time? In the presentation you mentioned that you had built a more full-featured completion engine into Anvil. Can you describe what possessed you to build your own code completion tool? What are the core components required to build a completion engine? 
What are the benefits that can be realized by customizing the completion engine for a given language or task? Can you describe the feature set and implementation details of the full-fledged completion engine that is available in Anvil? Beyond the toy example, there are a number of considerations to address if you want to make the completion engine "production grade". Can you talk through some of the obvious edge cases and how to solve for them? (e.g. handling parsing of incomplete code) What are the inputs that you use to build up the list of candidate tokens for completion? Once you have a functioning baseline for offering completions, what are some of the signals that you hook into for ranking suggestions? In your presentation you leaned on the machinery available in the Python standard library. What are some of the ways that you might think about generalizing across languages vs. coupling to a given language? What design/architectural advice do you have for compartmentalizing logic in a full-featured completion engine? What are some of the complexities that become a factor when you are trying to scale across an entire code base? Beyond just being able to parse and process a body of code, there is also the question of integrating with the development environment. What are some of the challenges that get introduced when trying to access the appropriate set(s) of files and code through the editor interface(s)? What are the most interesting, innovative, or unexpected ways that you have seen code completion applied to developer experience? What are the most interesting, unexpected, or challenging lessons that you have learned while working on code completion for Anvil? When is code completion more effort than it’s worth? What do you have planned for the future of the Anvil code completion functionality? 
Keep In Touch LinkedIn meredydd on GitHub @meredydd on Twitter Picks Tobias "Weird Al" Yankovic Meredydd TimescaleDB Data Engineering Podcast Episode Promscale Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links PyCon presentation about building a completion engine Anvil Podcast Episode Nano Language Server Protocol Jedi Podcast Episode Skulpt Parser Abstract Syntax Tree OpenAPI GitHub Copilot Halting Problem Parser Generator Python Language Grammar Definition Lezer Parser Generator Tree-sitter PyScript Grafana Tempo Tracing Service The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 365 Hunting Black Swans With Bees: Catching Up With The Inimitable Russell Keith-Magee
FullSummary Russell Keith-Magee is an accomplished engineer and a fixture of the Python community. His work on the Beeware suite of projects is one of the most ambitious undertakings in the ecosystem and unfailingly forward-looking. With his recent transition to working for Anaconda he is now able to dedicate his full focus to the effort. In this episode he reflects on the journey that he has taken so far, how Beeware is helping to address some of the threats to Python’s long term viability, and how he envisions its future in light of the recent release of PyScript, an in-browser runtime for Python. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Russell Keith-Magee about the latest status of the Beeware project, the state of Python’s black swans, and how the PyScript project ties into his ambitions for world domination Interview Introductions How did you get introduced to Python? For anyone who hasn’t been graced with the BeeWare vision, can you give the elevator pitch of what it is and why it matters? At PyCon US 2019 you presented a keynote about the various potential threats to the Python language community and its future viability. With the clarity of 3 years hindsight, how has the landscape shifted? 
What is PyScript and how does it fit into the Venn diagram of BeeWare’s objectives and the portents of black swan events (and what is your involvement with it)? How does it differ from the dozens of other "Python in the browser" and "Python transpiled to JavaScript" projects that have sprouted over the years? Now that you have been granted the opportunity to dedicate your full attention to BeeWare and build a team to support it, what new potential does that unlock? What are the current areas of focus/challenges that you are spending your time on for the BeeWare project? What are some of the efforts in the BeeWare suite that proved to be dead ends? What are the most interesting, innovative, or unexpected ways that you have seen the BeeWare suite/PyScript used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on BeeWare? When is BeeWare the wrong choice? What do you have planned for the future of BeeWare/PyScript/Python/world domination? Keep In Touch LinkedIn Website @freakboy3742 on Twitter Picks Tobias Joby Gorillapod Russell PyScript The Great TV Show Links Black Swans Episode BeeWare Episode BeeWare Django Cordova Black Swan Apple II Altair Briefcase Web Assembly (WASM) Gary Bernhardt PyScript Pyodide Toga Kotlin Swift Gaffer Tape Repl.it Brython Transcrypt Python Anywhere Batavia Anaconda Conda Voc Maestral Eddington GUI The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 364 Take Control Of Your Digital Photos By Running Your Own Smart Library Manager With LibrePhotos
FullSummary Digital cameras and the widespread availability of smartphones have allowed us all to generate massive libraries of personal photographs. Unfortunately, now we are all left to our own devices when it comes to managing them. While cloud services such as iPhotos and Google Photos are convenient, they aren’t always affordable and they put your pictures under the control of large companies with their own agendas. LibrePhotos is an open source and self-hosted alternative to these services that puts you in control of your digital memories. In this episode the maintainer of LibrePhotos, Niaz Faridani-Rad, explains how he got involved with the project, the capabilities that it offers for managing your image library, and how to get your own instance set up to take back control of your pictures. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! This episode is sponsored by Mergify. It’s an amazing tool to make you and your team way more productive with GitHub. Mergify is all about leveling up your pull requests with useful features that eliminate busy work. Automatic merges allow you to define the conditions for acceptance and Mergify will take care of merging the pull request as soon as it’s ready. 
Automatic updates take care of merging your pull requests serially on top of each other, so there is no way to introduce a regression. With a merge queue you can merge your urgent pull request first, organize your PRs as you wish and Mergify will merge them in that order. Mergify’s backports feature will even copy the pull request into another branch once the pull request has been merged, shipping your bug fixes on multiple branches automatically. By saving time you and your team can focus on projects that matter. Mergify is coordinated with any CI and fully integrated into GitHub. They have a Startup Program that offers a 12-month credit to leverage Mergify (up to $21,000 of value). Start saving time; visit pythonpodcast.com/mergify today to sign up for a demo and get started! Or just click the link in the show notes. Your host as usual is Tobias Macey and today I’m interviewing Niaz Faridani-Rad about LibrePhotos, an open source, self-hosted application for managing your personal photo collection Interview Introductions How did you get introduced to Python? Can you describe what LibrePhotos is and the story behind it? What are the core objectives of the project? What kind of users are you focused on? What are some of the major features of LibrePhotos? There are a number of open source and commercial options for different photo oriented use cases. What are the main capabilities that influence someone’s decision to use one over the other? Many people’s baseline expectations will be around services such as Google Photos or iPhotos. What are some of the challenges that you face in trying to provide a comparable experience? One of the features that users rely on with these services is backup/disaster recovery of their photo library. What is the recommended approach for users of LibrePhotos? Can you describe how LibrePhotos is architected? How have the design and goals evolved since you first started working on it? 
How have recent advances in machine learning algorithms and related tooling improved the availability and quality of advanced features in LibrePhotos? How much improvement of accuracy in face/object recognition do you see as users invest in cataloging and organizing their collections? Is there a minimum quantity of images/individual people that are necessary to start using the ML powered features? What kinds of storage locations are supported? What are the interfaces available for extending/enhancing/integrating with LibrePhotos? What are the most interesting, innovative, or unexpected ways that you have seen LibrePhotos used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on LibrePhotos? When is LibrePhotos the wrong choice? What do you have planned for the future of LibrePhotos? Keep In Touch derneuere on GitHub @der_neuere on Twitter Website LinkedIn Picks Tobias Uncharted movie Niaz Steam Deck Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podca

Ep 363 Making Investment Data Easy To Access And Analyze With The OpenBB Terminal
FullSummary Investing effectively is largely a game of information access and analysis. This can involve a substantial amount of research and time spent on finding, validating, and acquiring different information sources. In order to reduce the barrier to entry and provide a powerful framework for amateur and professional investors alike Didier Rodrigues Lopes created the OpenBB Terminal. In this episode he explains how a pandemic project that started as an experiment has led to him founding a new company and dedicating his time to growing and improving the project and its community. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Didier Rodrigues Lopes about the OpenBB Terminal, a modern Python-based integrated environment for investment research Interview Introductions How did you get introduced to Python? Can you describe what OpenBB is and the story behind it? What is the problem that you are trying to address by creating the OpenBB project and providing it as open source? What are some of the use cases where someone might need to use this project? The elephant in the room for financial data research is the Bloomberg Terminal. What are the other tools or services available for that purpose? 
What are the differentiating features of the OpenBB Terminal? Can you describe how the OpenBB Terminal is implemented? How have the design and goals/scope of the project changed since you started working on it? Can you describe a typical workflow for someone who is using the OpenBB Terminal? How have you approached the user experience design, and what are you optimizing for? What kinds of utilities do you offer beyond raw data access? What are some examples of data sources that you rely on? What is involved in integrating a new data source? What are the extension points and integration capabilities for expanding the functionality of the tool? What are the most interesting, innovative, or unexpected ways that you have seen OpenBB Terminal used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on OpenBB Terminal? When is OpenBB Terminal the wrong choice? What do you have planned for the future of OpenBB Terminal? Keep In Touch DidierRLopes on GitHub LinkedIn @didier_lopes on Twitter Picks Tobias Vikings: Valhalla show on Netflix Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links OpenBB Matlab Papermill Bloomberg Terminal Robinhood Coinbase The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 362 Accelerate Your Machine Learning Experimentation With Automatic Checkpoints Using FLOR
FullSummary The experimentation phase of building a machine learning model requires a lot of trial and error. One of the limiting factors of how many experiments you can try is the length of time required to train the model which can be on the order of days or weeks. To reduce the time required to test different iterations Rolando Garcia Sanchez created FLOR which is a library that automatically checkpoints training epochs and instruments your code so that you can bypass early training cycles when you want to explore a different path in your algorithm. In this episode he explains how the tool works to speed up your experimentation phase and how to get started with it. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Rolando Garcia about FLOR, a suite of machine learning tools for hindsight logging that lets you speed up model experimentation by checkpointing training data Interview Introductions How did you get introduced to Python? Can you describe what FLOR is and the story behind it? What is the core problem that you are trying to solve for with FLOR? What are the fundamental challenges in model training and experimentation that make it necessary? 
How do machine learning researchers and engineers address this problem in the absence of something like FLOR? Can you describe how FLOR is implemented? What were the core engineering problems that you had to solve for while building it? What is the workflow for integrating FLOR into your model development process? What information are you capturing in the log structures and epoch checkpoints? How does FLOR use that data to prime the model training to a given state when backtracking and trying a different approach? How does the presence of FLOR change the costs of ML experimentation and what is the long-range impact of that shift? Once a model has been trained and optimized, what is the long-term utility of FLOR? What are the opportunities for supporting e.g. Horovod for distributed training of large models or with large datasets? What does the maintenance process for research-oriented OSS projects look like? What are the most interesting, innovative, or unexpected ways that you have seen FLOR used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on FLOR? When is FLOR the wrong choice? What do you have planned for the future of FLOR? Keep In Touch rlnsanz on GitHub @rogarcia_sanz on Twitter Picks Tobias The Batman Rolando Severance GitHub Codespaces Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links FLOR UC Berkeley Joe Hellerstein MLOps Data Engineering Podcast Episode RISE Lab AMP Lab Clipper Model Serving Ground Data Context Service Context: The Missing Piece Of The Machine Learning Lifecycle Airflow Copy on write ASTor Green Tree Snakes: Python AST Documentation MLFlow Amazon Sagemaker Cloudpickle Horovod Podcast Episode Ray Anyscale PyTorch Tensorflow The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 361 Automatically Enforce Software Structures With Powerful Code Modifications Powered By LibCST
FullSummary Programmers love to automate tedious processes, including refactoring your code. In order to support the creation of code modifications for your Python projects Jimmy Lai created LibCST. It provides a richly typed and high level API for creating and manipulating concrete syntax trees of your source code. In this episode Jimmy Lai and Zsolt Dollenstein explain how it works, some of the linting and automatic code modification utilities that you can build with it and how to get started with using it to maintain your own Python projects. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Zsolt Dollenstein and Jimmy Lai about LibCST, a concrete syntax tree parser and serializer library for Python Interview Introductions How did you get introduced to Python? Can you describe what LibCST is and the story behind it? How does a concrete syntax tree differ from an abstract syntax tree? What are some of the situations where the preservation of the exact structure is necessary? There are a few other libraries in Python for creating concrete syntax trees. What was missing in the available options that made it necessary to create LibCST? 
What are the use cases that LibCST is focused on supporting? Can you describe how LibCST is implemented? How have the design and goals of the project changed or evolved since you started working on it? How might I use LibCST for something like restructuring a set of modules to move a function definition while maintaining proper imports? How do the capabilities of LibCST for codemodding compare to the Rope framework? What are some other workflows that someone might build with LibCST? What are some of the ways that LibCST is being used in your own work? What are the most interesting, innovative, or unexpected ways that you have seen LibCST used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on LibCST? When is LibCST the wrong choice? What do you have planned for the future of LibCST? Keep In Touch Zsolt zsol on GitHub LinkedIn Jimmy jimmylai on GitHub LinkedIn Picks Tobias Osprey Manta Backpack Zsolt Autotransform Glean Jimmy Paying down technical debt Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links LibCST Carta lib2to3 Abstract Syntax Tree Concrete Syntax Tree Pyre Parso Cython Podcast Episode mypyc Rope Flake8 Podcast Episode Pylint ESLint Fixit MonkeyType Podcast Episode The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 360 Cloud Native Networking For Developers With The Gloo Platform
FullSummary Communication is a fundamental requirement for any program or application. As the friction involved in deploying code has gone down, the motivation for architecting your system as microservices goes up. This shifts the communication patterns in your software from function calls to network calls. In this episode Idit Levine explains how the Gloo platform that she and her team at Solo have created makes it easier for you to configure and monitor the network topologies for your microservice environments. She also discusses what developers need to know about networking in cloud native environments and how a combination of API gateways and service mesh technologies allow you to more rapidly iterate on your systems. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Idit Levine about what developers need to know about service-oriented networking and her work at Solo on the Gloo project Interview Introductions How did you get introduced to Python? Can you describe what Solo is and the story behind it? How much should developers need to know about the ways that their applications and services are communicating? 
What is the current state of networking for applications across physical, cloud, and containerized environments? How do service mesh features influence the architectural decisions that software teams make while building their applications? What operational capabilities do they unlock? What are the aspects of application networking that are simplified or enhanced by service mesh platforms? In what ways has service mesh introduced new complexity to operating software systems? How can developers mirror the network topologies for production environments while working on new features? What are the most interesting, innovative, or unexpected ways that you have seen Gloo used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Gloo? When is Gloo the wrong choice? What do you have planned for the future of Gloo? Keep In Touch LinkedIn @Idit_Levine on Twitter Picks Tobias Shadow and Bone on Netflix Idit Elizabeth Holmes HBO Documentary Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Solo Computational Biology Microservices Kubernetes Service Mesh Istio Linkerd Envoy Proxy API Gateway CRD == Custom Resource Definition Gloo Edge Bazel Build System GraphQL mTLS GitOps Dagger WASM == Web Assembly Kubernetes Gateway API Consul Connect eBPF The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 359 Accelerate And Simplify Cloud Native Development For Kubernetes Environments With Gefyra
FullSummary Cloud native architectures have been gaining prominence for the past few years due to the rising popularity of Kubernetes. This introduces new complications to development workflows due to the need to integrate with multiple services as you build new components for your production systems. In order to reduce the friction involved in developing applications for cloud native environments Michael Schilonka created Gefyra. In this episode he explains how it connects your local machine to a running Kubernetes environment so that you can rapidly iterate on your software in the context of the whole system. He also shares how the Django Hurricane plugin lets your applications work closely with the Kubernetes process model. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. 
With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan. Your host as usual is Tobias Macey and today I’m interviewing Michael Schilonka about Gefyra and what is involved with developing applications for Kubernetes environments Interview Introductions How did you get introduced to Python? Can you describe what Gefyra is and the story behind it? What are the challenges that Kubernetes introduces to the development process? What are some of the strategies that developers might use for developing and testing applications that are deployed to Kubernetes environments? What are the use cases that Gefyra is focused on enabling? What are some of the other tools or platforms that Gefyra might replace or supplement? What are the services that need to be present in the K8s cluster to enable Gefyra’s functionality? Can you describe how Gefyra is implemented? How have the design and goals of the project changed since you first started working on it? What is the process for getting Gefyra set up between a K8s cluster and a developer’s laptop? Can you describe what the developer’s workflow looks like when using Gefyra? How do you avoid collisions/resource contention among a team of developers who are working on the same project? What are some of the ways that developing for Kubernetes influences the architectural and design decisions for a project? What are some of the additional practices or systems that you have found to be beneficial for accelerating development in cloud-native environments? What are the most interesting, innovative, or unexpected ways that you have seen Gefyra used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Gefyra? When is Gefyra the wrong choice? 
What do you have planned for the future of Gefyra? Keep In Touch LinkedIn Schille on GitHub Picks Tobias kubernetes.el – Kubernetes interface for Emacs Michael It’s fermentation friday, perfect for baking a sourdough bread or brewing beer Two of my favorite YouTube channels Kurzgesagt – In a Nutshell and LockPickingLawyer For entrepreneurial spirits: Reddit community research with GummySearch (https://gummysearch.com/) Links Kopf framework PyOxidizer Tuna Wireguard-go https://k3d.io/ kind Django Hurricane Blueshoe Django Kubernetes K3d Telepresence Unikube Sidecar Pattern Docker-compose Kubernetes Patterns book O’Reilly Platform Amazon (affiliate link) CodeZero CoreDNS Nginx Cookiecutter Tornado Podcast Episode uWSGI Podcast Episode 12 Factor App Pycloak Keycloak Kubernetes Operator Kubernetes CRD (Custom Resource Definition) The intro and outro music is from Requiem for a

Ep 358Building A Community And Technology Stack For Scalable Big Data Geoscience At Pangeo
FullSummary Science is founded on the collection and analysis of data. For disciplines that rely on data about the earth the ability to simulate and generate that data has been growing faster than the tools for analysis of that data can keep up with. In order to help scale that capacity for everyone working in geosciences the Pangeo project compiled a reference stack that combines powerful tools into an out-of-the-box solution for researchers to be productive in short order. In this episode Ryan Abernathy and Joe Hamman explain what the Pangeo project really is, how they have integrated a combination of XArray, Dask, and Jupyter to power these analytical workflows, and how it has helped to accelerate research on multidimensional geospatial datasets. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. 
Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan. Your host as usual is Tobias Macey and today I’m interviewing Ryan Abernathy and Joe Hamman about Pangeo, a community platform for Big Data geoscience Interview Introductions How did you get introduced to Python? Can you describe what Pangeo is and the story behind it? What is your role in the project/community and how did you get involved? What are the goals of the project and community? What are the areas of effort and how are they organized? What are the scientific domains that Pangeo is focused on supporting? What are the primary challenges associated with data management and analysis in these scientific communities? What are the forms that these data take and how have they been evolving? (e.g. formats/sources) What are some of the challenges introduced by the widespread adoption of cloud resources and the associated architectural patterns? Can you describe the technical components that fall under the Pangeo umbrella? How do they come together to form a functional workflow for geo sciences? How has the scope of the Pangeo project changed or evolved since it started? What are the most interesting, innovative, or unexpected ways that you have seen Pangeo used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pangeo? When is Pangeo the wrong choice? What do you have planned for the future of Pangeo? 
Keep In Touch Joe @HammanHydro on Twitter Ryan @rabernat on Twitter rabernat on GitHub Website Picks Tobias Mountain Biking Ryan Klara And The Sun by Kazuo Ishiguro Joe Range by David Epstein Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Pangeo Pangeo Forge CarbonPlan M2LInES LEAP Columbia University XArray MIT MatLab PHP Ruby Java NumPy SciPy Matplotlib C Fortran Perl Dask Data Engineering Podcast Episode Jupyter IDL HDF5 Unidata NetCDF CF Metadata Conventions Intake Podcast Episode FSSpec Parquet Data Engineering Podcast Episode Zarr Data Engineering Podcast Pangeo Forge Airbyte Data Engineering Podcast Episode Fivetran Data Engineering Podcast Episode Stitch TileDB Data Engineering Podcast Episode Pythia The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra /

Ep 357Automating Application Lifecycles For Developer Happiness At Wayfair
FullSummary A common piece of advice when starting anything new is to "begin with the end in mind". In order to help the engineers at Wayfair manage the complete lifecycle of their applications Joshua Woodward runs a team that provides tooling and assistance along every step of the journey. In this episode he shares some of the lessons and tactics that they have developed while assisting other engineering teams with starting, deploying, and sunsetting projects. This is an interesting look at the inner workings of large organizations and how they invest in the scaffolding that supports their myriad efforts. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. 
With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan. Your host as usual is Tobias Macey and today I’m interviewing Joshua Woodward about how the application lifecycle team at Wayfair uses Python to Interview Introductions Josh Woodward, for the past year have been managing the application lifecycle team at Wayfair. Prior to that, IC on python platforms team. Embed with teams looking to decouple from monolith. See pain points first hand. How did you get introduced to Python? High school physics class, TI84 Calculator, friend wrote a program to solve vector problems, I thought it was amazing. Used TI-Basic to solve specific physics problems for me. (Give fixed inputs, run through equation, get outputs) Approaching college, thinking about student loans. Heard about python and decided to give it a shot. Wrote program to simulate various payback / interest scenarios. Went to college for ME, switched to SE when I found out my dorm neighbors were using python to draw cool images with python + turtle Can you describe what the role of the application lifecycle team is and the story behind it? Story behind it: Around 2018, in a state where we had deploy congestion, challenging to iterate and ship changes. Tech org invested in containerization and decoupling to directly combat this problem. Teams incentivized to decouple. While on python platforms, the team had already been experimenting with code templating. Standard cookiecutter template for flask apps. Wayfair experimenting with Kubernetes late 2017. Spent 1 year embedding with 4 different teams to help knowledge transfer re: k8s, containers, application setup, python best practices, testing, linting, etc – through that we got a lot of great feedback on our tooling.
Took senior engineers weeks to get something setup. Know who to contact, click the right buttons, file the right ticket Approach: Counted manual steps. Something like 60 distinct / atomic activities that had to be performed to get a "hello world" response from a basic flask app in production. Focus on reduce manual steps Released product (Mamba, on theme of snakes) Initially, supporting one main user story. User story: "As an engineer, I would like to create a production ready application in 10 minutes so that I can have a reliable and standardized application setup that follows best practices." grew out of python platforms, created own team with own scope, that was about 1.5 years ago. What is your team’s scope now? Team Scope is to facilitate the creation, maintenance, and decommissioning of decoupled applications at Wayfair. What are the interfaces that your team has to the rest of the organization? People Interfaces: We value getting feedback on our work to build strong products. Make assumptions, Willing to be wrong. Validate assumptions with

Ep 356Run Your Applications Reliably On Kubernetes Without Losing Sleep With Robusta
FullSummary Kubernetes is a framework that aims to simplify the work of running applications in production, but it forces you to adopt new patterns for debugging and resolving issues in your systems. Robusta is aimed at making that a more pleasant experience for developers and operators through pre-built automations, easy debugging, and a simple means of creating your own event-based workflows to find, fix, and alert on errors in production. In this episode Natan Yellin explains how the project got started, how it is architected and tested, and how you can start using it today to keep your Python projects running reliably. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. 
With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan. Your host as usual is Tobias Macey and today I’m interviewing Natan Yellin about Robusta, Interview Introductions How did you get introduced to Python? Can you describe what Robusta is and the story behind it? What are some of the challenges that teams face when running their systems in Kubernetes? How does Robusta help address those difficulties? How does Robusta compare to e.g. Rookout? What are some of the ways that Robusta is able to provide specific insights for Python applications? Can you describe how Robusta is implemented? What are some of the most challenging engineering tasks that you have had to work through while building Robusta? How have the capabilities and components evolved from when you started working on it? What is the workflow for integrating Robusta into a Kubernetes environment and a team’s maintenance processes? What are some examples of the kinds of questions that Robusta can help answer out of the box? What are some tasks that Robusta facilitates which require manual exploration? What are the interfaces available for customizing and extending the functionality of Robusta? What is involved in adding a new automation capability to Robusta? How have you approached the design of the tool to make it ergonomic and intuitive so that it doesn’t contribute to the stresses of dealing with errors in production? Given that it is a tool to help resolve problems in production infrastructure, how have you worked to ensure its reliability and resilience? What is the governance and sustainability model for Robusta? What are the most interesting, innovative, or unexpected ways that you have seen Robusta used? 
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Robusta? When is Robusta the wrong choice? What do you have planned for the future of Robusta? Keep In Touch LinkedIn @aantn on Twitter aantn on GitHub Website Picks Tobias Kubernetes: Up And Running (affiliate link) Natan Kubernetes for SysAdmins YouTube video by Kelsey Hightower Learn to delegate Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Robusta GHOP Objective C Snyk Heroku Google AppEngine OOM Killer Bin Packing/Knapsack Problem Prometheus Kubernetes Pods PySpy tracemalloc Pyrasite VSCode Debugger Pydantic Podcast Episode Helm

Ep 355Accelerate The Development And Delivery Of Your Machine Learning Applications Using Ray And Deploy It At Anyscale
FullSummary Building a machine learning application is inherently complex. Once it becomes necessary to scale the operation or training of the model, or introduce online re-training the process becomes even more challenging. In order to reduce the operational burden of AI developers Robert Nishihara helped to create the Ray framework that handles the distributed computing aspects of machine learning operations. To support the ongoing development and simplify adoption of Ray he co-founded Anyscale. In this episode he re-joins the show to share how the project, its community, and the ecosystem around it have grown and evolved over the intervening two years. He also explains how the techniques and adoption of machine learning have influenced the direction of the project. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Robert Nishihara about his work at Anyscale and the Ray distributed execution framework Interview Introductions How did you get introduced to Python? Can you describe what Anyscale is and the story behind it? How has the Ray project and ecosystem evolved since we last spoke? (2 years ago) How has the landscape of AI/ML technologies and techniques shifted in that time? 
What are the main areas where organizations are trying to apply ML/AI? What are some of the issues that teams encounter when trying to move from prototype to production with ML/AI applications? What are the features of Ray that help to mitigate those challenges? With the introduction of more widely available streaming/real-time technologies the viability of reinforcement learning has increased. What new challenges does that approach introduce? What are some of the operational complexities associated with managing a deployment of Ray? What are some of the specialized utilities that you have had to develop to maintain a large and multi-tenant platform for your customers? What is the governance model around the Ray project and how does the work at Anyscale influence the roadmap? What are the most interesting, innovative, or unexpected ways that you have seen Anyscale/Ray used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Ray and Anyscale? When is Anyscale/Ray the wrong choice? What do you have planned for the future of Anyscale/Ray? Keep In Touch robertnishihara on GitHub @robertnishihara on Twitter Website LinkedIn Picks Tobias The Edge Chronicles: Beyond The Deepwoods Robert Production RL Summit Project Hail Mary by Andy Weir Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Ray Podcast Episode Anyscale UC Berkeley Matlab Deep Learning Pandas NumPy Horovod Podcast Episode XGBoost Modin Podcast Episode Dask Ray Datasets Reinforcement Learning Production Reinforcement Learning Summit AlphaGo Databricks Snowflake Data Engineering Podcast Episode TPU == Tensor Processing Unit Weights and Biases MLFlow RLLib Ray Serve The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 354See The Structure Of Your Software At A Glance With Call Graphs From Code2Flow
FullSummary As software projects grow and change it can become difficult to keep track of all of the logical flows. By visualizing the interconnections of function definitions, classes, and their invocations you can speed up the time to comprehension for newcomers to a project, or help yourself remember what you worked on last month. In this episode Scott Rogowski shares his work on Code2Flow as a way to generate a call graph of your programs. He explains how it got started, how it works, and how you can start using it to understand your Python, Ruby, and PHP projects. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Subsurface Live is the cloud data lake conference, a virtual conference where data engineers, data scientists, data architects, and data analysts can gather and hear about cloud data lakes and the data ecosystem. Subsurface Live Winter 2022 includes keynote talks from Bill Inmon, the father of the data warehouse, Author of Deep Work Cal Newport, and several more from companies such as Dremio, AWS, dbt, and more. Subsurface will also have many breakout sessions featuring Pandas creator Wes McKinney, Apache Superset & Airflow creator Maxime Beauchemin, and engineers from Apple, Uber, Adobe, Bloomberg, and more. 
Meet other data professionals and learn about the data technologies and practices helping companies meet their current and future data needs. Register today at pythonpodcast.com/subsurface Your host as usual is Tobias Macey and today I’m interviewing Scott Rogowski about Code2Flow, a utility for generating "pretty good" call graphs for dynamic languages Interview Introductions How did you get introduced to Python? Can you describe what Code2Flow is and the story behind it? What are some of the ways that a program’s call graph might be used? How does the visual representation generated by Code2Flow help with exploring the structure of a project? What are some of the alternative approaches/tools that might be used to gain similar insights? What do you see as the overlap in utility between Code2Flow and e.g. SourceGraph? Can you describe how the Code2Flow project is implemented? How have the design and goals of the project changed since you first began working on it? Given that Code2Flow is implemented in Python, how have you managed the parsing/processing of the other languages that you support? Visualizing a complex program can quickly become very messy. How have you approached the layout of the output to enhance comprehension? What are some of the situations where Code2Flow will be unable to provide a full picture of a program’s call graph? What are some of the pieces of information that are unavailable due to the static analysis approach that you have taken? Can you describe the process of applying Code2Flow to a project? Once the structure is on display, what are some next steps that an individual or team might take to analyze and act on the information? Given the static nature of the output, how might Code2Flow be incorporated in a CI/CD system to provide insight into the evolution of a projects structure? What are the most interesting, innovative, or unexpected ways that you have seen Code2Flow used? 
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Code2Flow? When is Code2Flow the wrong choice? What do you have planned for the future of Code2Flow? Keep In Touch Website scottrogowski on GitHub Picks Tobias Taking Vacation Universal Studios, Florida Scott Service work Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Code2Flow Colombia Mongita TI-83 Ruby PHP AST == Abstract Syntax Tree Graphviz Pylint Robert Frost The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
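For listeners curious what a statically derived call graph looks like in practice, here is a minimal sketch using Python's standard `ast` module. It is not Code2Flow's implementation; the helper `build_call_graph` and the sample source in `EXAMPLE` are hypothetical names made up for this illustration. It only records calls made by plain name (such as `load()`), so method calls and dynamically resolved calls are missed, which is the kind of gap the episode's questions about static analysis point at.

```python
import ast
from collections import defaultdict


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function name to the plain names it calls.

    Hypothetical helper for illustration; not part of Code2Flow.
    """
    tree = ast.parse(source)
    graph: dict[str, set[str]] = defaultdict(set)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            graph[node.name]  # register the function even if it calls nothing
            for child in ast.walk(node):
                # Only calls by simple name are recorded; attribute calls
                # such as obj.method() are skipped in this sketch.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    graph[node.name].add(child.func.id)
    return dict(graph)


# Made-up sample program used only to exercise the helper above.
EXAMPLE = '''
def load():
    return [1, 2, 3]

def transform(rows):
    return [r * 2 for r in rows]

def main():
    rows = load()
    print(transform(rows))
'''

if __name__ == "__main__":
    for caller, callees in sorted(build_call_graph(EXAMPLE).items()):
        print(caller, "->", sorted(callees))
```

The resulting mapping could be handed to a renderer such as Graphviz, which appears in the episode's links, to draw the edges.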

Ep 353Scaling Knowledge Management For Technical Teams With Knowledge Repo
FullSummary One of the most persistent challenges faced by organizations of all sizes is the recording and distribution of institutional knowledge. In technical teams this is exacerbated by the need to incorporate technical review feedback and manage access to data before publishing. When faced with this problem as an early data scientist at AirBnB, Chetan Sharma helped create the Knowledge Repo project as a solution. In this episode he shares the story behind its creation and growth, how and why it was released as open source, and the features that make it a compelling option for your own team’s knowledge management journey. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Chetan Sharma about Knowledge Repo, an open source framework for managing documentation for technical users Interview Introductions How did you get introduced to Python? EE + CS/AI + Stats degrees Airbnb working on ML models Knowledge Repo itself Can you describe what Knowledge Repo is and the story behind it? 
We started seeing interviewees use ipython notebooks, thought they were great Wanted to push more people to use notebooks, but they weren’t very shareable, vettable Existing notebook hosting services weren’t very good, and weren’t built for people who aren’t data stakeholders. It was especially poor with images, annoying cell blocks Made a simple post processor to remove cell blocks, push the images to s3, and host on flask Once we were pushing notebooks into a Github repo for hosting on a flask app, so many things became possible Review cycles Shareability / collaboration features Indexing / searching Concurrently, great work was happening on developing internal R packages / python libraries to provide consistent, branded aesthetics What are some of the approaches that teams typically take for recording and sharing institutional knowledge? Copy and paste to google docs, slides Facebook was using facebook photo albums untrustworthy, not discoverable, divorced from the code What are the unique requirements that are introduced when attempting to record and distribute learnings related to data such as A/B experiments, analytical methods, data sets, etc.? Reproducibility is a big one Making sure the learnings are trustworthy (good data? no bugs?) Distributing widely, across the org and across time Experimentation Experimentation is at the end of a research-design-build-measure cycle, strategic analysis is often before Capturing all of the context Can you describe how the Knowledge Repo project is architected? 
Repositories: a store of posts, most commonly a github repo Markdown as original lingua franca, eventually a KR specific “KR post” concept (which is still basically markdown) Post processors Convert whatever upstream file to markdown / KR post (Jupyter notebook, R Markdown, markdown were the original ones) Handle images and other large assets, usually pushing them to cloud storage Evolved to handle PDFs, googledocs, keynotes What were the motivating factors for making it available as an open source project? It was such a common problem. Even incredibly sophisticated data teams at Uber, Facebook, etc. were begging us to share the system. What is the workflow for creating, sharing, and discovering information in an installation of Knowledge Repo? Create a github repo for hosting strategic analysis Use the KR script to create a stub/template for whatever format you’re working in Do your work in Jupyter, etc. Instead of using github scripts (git add) use knowledge scripts (knowledge add), which is basically the github scripts with postprocessors Do typical Github workflows See the result in the hosted knowledge repo app What are some of the options available for extending or customizing an installation of Knowledge Repo? More postprocessors! google docs, presentations, UX research, anything can be done in KR with a simple postprocessor to turn it to markdown/images/PDF Tying the system to your internal data tools. For example, an experimentation system like Eppo or whatever you use for marketing campaigns If you were to start over today, what are some of the ways that you might approach the solution to knowledge management differently? Think of it

Ep 352Simplify And Scale Your Software Development Cycles By Putting On Pants (Build Tool)
FullSummary Software development is a complex undertaking due to the number of options available and choices to be made in every stage of the lifecycle. In order to make it more scaleable it is necessary to establish common practices and patterns and introduce strong opinions. One area that can have a huge impact on the productivity of the engineers engaged with a project is the tooling used for building, validating, and deploying changes introduced to the software. In this episode maintainers of the Pants build tool Eric Arellano, Stu Hood, and Andreas Stenius discuss the recent updates that add support for more languages, efforts made to simplify its adoption, and the growth of the community that uses it. They also explore how using Pants as the single entry point for all of your routine tasks allows you to spend your time on the decisions that matter. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Building data integration workflows is time consuming and tedious, requiring an unpleasant amount of boilerplate code to do it right. 
Rivery is a managed platform for building our ELT pipelines that offers the industry’s first native integration with Python, allowing you to seamlessly load and export Pandas dataframes to and from all of your databases, services, and data warehouses with a few clicks and no extra code. Rivery is hosting a live demo of their first class Python support on February 22nd, and when you use the promo code "Python" during registration you will be entered to win a brand new series 7 apple watch. Go to pythonpodcast.com/rivery today to learn more and register. Your host as usual is Tobias Macey and today I’m interviewing Eric Arellano, Stu Hood, and Andreas Stenius about the Pants build tool and all of the work that has gone into it recently Interview Introductions How did you get introduced to Python? Can you describe what Pants is and the story behind it? What is the scope of concerns that Pants is focused on addressing? What are some of the notable changes in the project and its ecosystem over the past 1 1/2 years? How do you approach the work of defining the target scope of the Pants toolchain? What are some of your guiding principles to decide when a feature request belongs in the core vs as a plugin? What are some of the ergonomic improvements that you have added to simplify the work of getting started with Pants and adopting it across teams? What are some of the challenges that teams run into as they start to scale the size of their monorepos? (e.g. project design, boilerplate reduction, etc.) How are you managing the work of growing and supporting the community as you move beyond early adopters/experts into newcomers to Pants and programming? How are you handling support for multiple language ecosystems? What are some of the challenges involved with making Pants feel idiomatic for such a range of communities? How does the use of Python as the plugin/extension syntax work for teams that don’t use it as their primary language? 
What are the architectural changes that needed to be made for you to be capable of integrating with the different execution environments? How would you characterize the level of feature coverage across the different supported languages? Now that you have laid the foundation, how much effort is required to add new language targets? What are the most interesting, innovative, or unexpected ways that you have seen Pants used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pants? When is Pants the wrong choice? What do you have planned for the future of Pants? Keep In Touch Eric LinkedIn Eric-Arellano on GitHub @earellanoaz on Twitter Stu LinkedIn @stuhood on Twitter stuhood on GitHub Andreas @andreasstenius on Twitter kaos on GitHub Picks Tobias Last Kingdom on Netflix Eric Getting Curious Stu Checks and Balance Podcast Andreas The Pragmatic Programmer Links Pants Make Earthly Podcast Episode MyPy Podcast Episode PyRight Pylint Flake8 Podcast Episode Bazel pre-commit Podcast Episode Underpants library PyOxidizer Podcast Episode Eric PyCon Talk The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 351 Achieve Repeatable Builds Of Your Software On Any Machine With Earthly
FullSummary It doesn’t matter how amazing your application is if you are unable to deliver it to your users. Frustrated with the rampant complexity involved in building and deploying software Vlad A. Ionescu created the Earthly tool to reduce the toil involved in creating repeatable software builds. In this episode he explains the complexities that are inherent to building software projects and how he designed the syntax and structure of Earthly to make it easy to adopt for developers across all language environments. By adopting Earthly you can use the same techniques for building on your laptop and in your CI/CD pipelines. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Vlad A. Ionescu about Earthly, a syntax and runtime for software builds to reduce friction between development and delivery Interview Introductions How did you get introduced to Python? Can you describe what Earthly is and the story behind it? What are the core principles that engineers should consider when designing their build and delivery process? What are some of the common problems that engineers run into when they are designing their build process? What are some of the challenges that are unique to the Python ecosystem? 
What is the role of Earthly in the overall software lifecycle? What are the other tools/systems that a team is likely to use alongside Earthly? What are the components that Earthly might replace? How is Earthly implemented? What were the core design requirements when you first began working on it? How have the design and goals of Earthly changed or evolved as you have explored the problem further? What is the workflow for a Python developer to get started with Earthly? How can Earthly help with the challenge of managing JavaScript and CSS assets for web application projects? What are some of the challenges (technical, conceptual, or organizational) that an engineer or team might encounter when adopting Earthly? What are some of the features or capabilities of Earthly that are overlooked or misunderstood that you think are worth exploring? What are the most interesting, innovative, or unexpected ways that you have seen Earthly used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Earthly? When is Earthly the wrong choice? What do you have planned for the future of Earthly? Keep In Touch LinkedIn @VladAIonescu on Twitter Website Picks Tobias Shape Up book Vlad High Output Management by Andy Grove Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Earthly Bazel Pants Podcast Episode ARM AWS Graviton Apple M1 CPU QEMU Phoenix web framework for Elixir language The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 350 Building A Detailed View Of Your Software Delivery Process With The Eiffel Protocol
FullSummary The process of getting software delivered to an environment where users can interact with it requires many steps along the way. In some cases the journey can require a large number of interdependent workflows that need to be orchestrated across technical and organizational boundaries, making it difficult to know what the current status is. Faced with such a complex delivery workflow the engineers at Ericsson created a message based protocol and accompanying tooling to let the various actors in the process provide information about the events that happened across the different stages. In this episode Daniel Ståhl and Magnus Bäck explain how the Eiffel protocol allows you to build a tooling agnostic visibility layer for your software delivery process, letting you answer all of your questions about what is happening between writing a line of code and your users executing it. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Daniel Ståhl and Magnus Bäck about Eiffel, an open protocol for platform agnostic communication for CI/CD systems Interview Introductions How did you get introduced to Python? Can you describe what Eiffel is and the story behind it? 
What are the goals of the Eiffel protocol and ecosystem? What is the role of Python in the Eiffel ecosystem? What are some of the types of questions that someone might ask about their CI/CD workflow? How does Eiffel help to answer those questions? Who are the personas that you would expect to interact with an Eiffel system? Can you describe the core architectural elements required to integrate Eiffel into the software lifecycle? How have the design and goals of the Eiffel protocol/architecture changed or evolved since you first began working on it? What are some example workflows that an engineering/product team might build with Eiffel? What are some of the challenges that teams encounter when integrating Eiffel into their delivery process? What are the most interesting, innovative, or unexpected ways that you have seen Eiffel used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Eiffel? When is Eiffel the wrong choice? What do you have planned for the future of Eiffel? Keep In Touch Daniel d-stahl-ericsson on GitHub LinkedIn Magnus LinkedIn magnusbaeck on GitHub Picks Tobias Red Notice Daniel The Witcher Magnus Lego Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Eiffel Ericsson Axis Communications Hudson CI framework Spinnaker Jenkins Tekton Gradle Artifactory JSON Schema RabbitMQ Prometheus Continuous Delivery Foundation CD Events XKCD Competing Standards Python Eiffel SDK Pydantic The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 349 Improve Your Productivity By Investing In Developer Experience Design For Your Projects
FullSummary When we are creating applications we spend a significant amount of effort on optimizing the experience of our end users to ensure that they are able to complete the tasks that the system is intended for. A similar effort that we should all consider is optimizing the developer experience for ourselves and other engineers who contribute to the projects that we work on. Adam Johnson recently wrote a book on how to improve the developer experience for Django projects and in this episode he shares some of the insights that he has gained through that project and his work with clients to help you improve the experience that you and your team have when collaborating on software development. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Adam Johnson about optimizing your developer experience Interview Introductions How did you get introduced to Python? Can you describe what you mean by the term "developer experience"? How does it compare to the concept of user experience design? What are the main goals that you aim for through improving DX? When considering DX, what are the categories of focus for improvement? (e.g. 
the experience of a given software project, the developer’s physical environment, their editing environment, etc.) What are some of the most high impact optimizations that a developer can make? What are some of the areas of focus that have the most variable impact on a developer’s experience of a project? What are some of the most helpful tools or practices that you rely on in your own projects? How does the size of a development team or the scale of an organization impact the decisions and benefits around DX improvements? One of the perennial challenges with selecting a given tool or architectural pattern is the continually changing landscape of software. How have your choices for DX strategies changed or evolved over the years? What are the most interesting, innovative, or unexpected developer experience tweaks that you have encountered? What are the most interesting, unexpected, or challenging lessons that you have learned while working on your book? What are some of the potential pitfalls that individuals and teams need to guard against in their quest to improve developer experience for their projects? What are some of the new tools or practices that you are considering incorporating into your own work? Keep In Touch @AdamChainz on Twitter Website adamchainz on GitHub Picks Tobias Eternals movie Adam Fan of Eternals, enjoyed Neil Gaiman series Also general MCU fan, watched it all in lockdown Moon Knight trailer Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Boost Your Django DX Rust Ripgrep Factory Boy Mimesis Podcast Episode Language Server Protocol EditorConfig Starship Command Prompt Pre-Commit Podcast Episode Flake8 Podcast Episode DevDocs Dash library documentation search tool pyupgrade StandardJS Cython Podcast Episode The Phoenix Project The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 348 An Exploration Of Effective Pandas Practices With Matt Harrison
FullSummary Pandas has grown to be a ubiquitous tool for working with data at every stage. It has become so well known that many people learn Python solely for the purpose of using Pandas. With all of this activity and the long history of the project it can be easy to find misleading or outdated information about how to use it. In this episode Matt Harrison shares his work on the book "Effective Pandas" and some of the best practices and potential pitfalls that you should know for applying Pandas in your own work. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Matt Harrison about best practices for using Pandas for data exploration, manipulation, and analysis Interview Introductions How did you get introduced to Python? What motivated you to write a book about Pandas? There are a number of books available that cover some aspect of the Pandas framework or its application. What was missing from the available literature? Who is your target audience for this book? What are some of the most surprising things that you have learned about Pandas while working on this book? What are the sharp edges that you see newcomers to pandas run into most frequently? It is easy to use Pandas in a naive manner and get things done. 
What are some of the bad habits that you have seen people form in their work with Pandas? How and when do those habits become harmful? What are the most interesting, innovative, or unexpected ways that you have seen Pandas used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on this book? What are some of the projects that you are planning to work on in the near/medium term? Keep In Touch Website @__mharrison__ on Twitter Blog mattharrison on GitHub Picks Tobias MSR Snowshoes Matt Telemark Skiing 22 Designs Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Effective Pandas Book (affiliate link with 20% discount code applied) Discount code INIT TCL Perl Pandas Podcast Episode Pandas Extension Arrays Podcast Episode Koalas Dask Data Engineering Podcast Episode Modin Podcast Episode The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 347 Generate Your Text Files With Python Using Cog
FullSummary Developers hate wasting effort on manual processes when we can write code to do it instead. Cog is a tool to manage the work of automating the creation of text inside another file by executing arbitrary Python code. In this episode Ned Batchelder shares the story of why he created Cog in the first place, some of the interesting ways that he uses it in his daily work, and the unique challenges of maintaining a project with a small audience and a well defined scope. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Ned Batchelder about Cog, a tool for generating files or text from embedded Python logic Interview Introductions How did you get introduced to Python? Can you describe what Cog is and the story behind it? What are the use cases that you initially created Cog to address? What were the shortcomings or extraneous overhead that you encountered in tools such as Jinja, Mako, Genshi, etc. that led you to create a new tool? What was your path from a quick and dirty script that suited your own purposes to turning it into a niche open source project that was general and stable enough for the broader community? One of your claims to fame is your role as the maintainer for coverage.py. 
How has your experience managing such a widely used project translated to the relatively small and low traffic project like Cog? Can you describe how Cog is implemented? How did you approach the design of the syntactic elements for embedding Python code into a host file? What is the workflow for someone using Cog to generate all or parts of a file? How does the introduction of third party dependencies impact the viability and utility of Cog as compared to other templating systems? What are the most interesting, innovative, or unexpected ways that you have seen Cog used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Cog? When is Cog the wrong choice? What do you have planned for the future of Cog? Keep In Touch Website nedbat on GitHub @nedbat on Twitter LinkedIn Picks Tobias Samson Q9U Microphone Ned McFly Command Line History Tool Go for a walk Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Cog Boston Python Lotus Lotus Notes Zope Cheetah Template Engine Coverage.py Podcast Episode Unix Philosophy Hungarian Notation Jupyter Notebooks GitHub Profile ReadMe Ned’s GitHub Profile Raw Markdown The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 346 A Friendly Approach To Regression Models For Programmers
FullSummary Statistical regression models are a staple of predictive forecasts in a wide range of applications. In this episode Matthew Rudd explains the various types of regression models, when to use them, and his work on the book "Regression: A Friendly Guide" to help programmers add regression techniques to their toolbox. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Matthew Rudd about the applications of statistical modeling and regression, and how to start using it for your work Interview Introductions How did you get introduced to Python? Can you start by describing some use cases for statistical regression? What was your motivation for writing a book to explain this family of algorithms to programmers? What are your goals for the book? Who is the target audience? What are some of the different categories of regression algorithms? What are some heuristics for identifying which regression to use? How have you approached the balance of using software principles for explaining the work of building the models with the mathematical underpinnings that make them work? What are some of the concepts that are most challenging for people who are first working with regression models? 
What are the most interesting, innovative, or unexpected ways that you have seen statistical regression models used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on your book? What are some of the resources that you recommend for folks who want to learn more about the inner workings and applications of regression models after they finish your book? Keep In Touch LinkedIn @MatthewBRudd on Twitter Picks Tobias The Argument podcast from the NY Times Matthew Primus Claypool Lennon Delirium South of Reality Links Regression: A Friendly Guide Sewanee University of the South Sewanee Data Lab Mark Lutz Python books Elements of Statistical Learning Linear Regression Logistic Regression Modeling Binary Data Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 345 Fast, Flexible, and Incremental Task Automation With doit
FullSummary Every software project needs a tool for managing the repetitive tasks that are involved in building, running, and deploying the code. Frustrated with the limitations of tools like Make, Scons, and others Eduardo Schettino created doit to handle task automation in his own work and released it as open source. In this episode he shares the story behind the project, how it is implemented under the hood, and how you can start using it in your own projects to save you time and effort. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Eduardo Schettino about Doit, a flexible and low overhead task automation tool Interview Introductions How did you get introduced to Python? Can you describe what doit is and the story behind it? What are the main goals and use cases of doit? Can you describe how you approached the implementation of Doit? How has the design changed or evolved since you first began working on it? The realm of task automation tools for developers is an exceedingly crowded one, with each tool prioritizing certain use cases. How would you characterize the position of doit in the current ecosystem? How does it compare to e.g. Click, Invoke, Typer, etc.? 
What is your guiding philosophy for when and how to add new features? You have been running the project for ~13 years now. How has the evolution of the Python language and ecosystem influenced your approach to the development and maintenance of doit? What is the workflow for getting started with doit and integrating it into your development process? For every project there are some tasks that are identical and some that are bespoke for that application. What are the options for maintaining a standard set of tasks across repositories and composing them with per-project activities? What are some of the useful patterns that you and the community have established for designing tasks and execution graphs? How do you use doit in your own work? What are the most interesting, innovative, or unexpected ways that you have seen doit used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on doit? When is doit the wrong choice? What do you have planned for the future of doit? Keep In Touch LinkedIn schettino72 on GitHub Picks Tobias The Matrix series Eduardo John Pilger Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links doit Zope Twisted Django Pyflakes scons Make Nikola Podcast Episode Nose Pytest Podcast Episode Click Typer Invoke Puppet Ansible Chef Sphinx Snakemake Airflow Luigi pytest-incremental import-deps dbm MetalK8s The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 344 The Technological, Business, and Sales Challenges Of Building The Ethical Ads Network
FullSummary Whether we like it or not, advertising is a common and effective way to make money on the internet. In order to support the work being done at Read The Docs they decided to include advertisements on the documentation sites they were hosting, but they didn’t want to alienate their users or collect unnecessary information. In this episode David Fischer explains how they built the Ethical Ads network to solve their problem, the technical and business challenges that are involved, and the open source application that they built to power their network. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing David Fischer about the Ethical Ads marketplace and the technology that runs it. Interview Introductions How did you get introduced to Python? Can you describe what the Ethical Ads project is and the story behind it? What are the technical and organizational requirements involved in running an ad network? How have you approached the problem of kickstarting the flywheel for the two-sided marketplace? What are some of the challenges that you face in building an accurate profile of your audience without using detailed tracking methods?
What are the benefits that you see in focusing exclusively on developers in your publisher relationships? Can you describe the design and implementation of the ad server? How has the architecture evolved since you first began working on it? If you were to start over today what might you do differently? How have you approached scaling for performance and geographic distribution? What mechanisms do you use for tracking impressions/measuring ad effectiveness? How can advertisers experiment with A/B testing of ad copy? If someone wants to run their own advertisements with the ethical ads server, what is involved in getting it deployed and integrated into their sites? What are the integration and extension points available for customizing the behavior of the platform? What are some of the most notable lessons that you have learned about online advertising since you first started working on the Ethical Ads project? What are the most interesting, innovative, or unexpected ways that you have seen Ethical Ads used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on the Ethical Ads platform? What do you have planned for the future of the Ethical Ads platform? Keep In Touch davidfischer on GitHub @djfische on Twitter LinkedIn Picks Tobias Ship It! Podcast David Local Python Meetup Click CLI framework useragents library TLD for parsing internet domains Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Ethical Ads Network Ethical Ads Server San Diego Python Read The Docs Podcast Episode CodeFund CPM == Cost Per Mille The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 343 Accidentally Building A Business With Python At Listen Notes
FullSummary Podcasts are one of the few mediums in the internet era that are still distributed through an open ecosystem. This has a number of benefits, but it also brings the challenge of making it difficult to find the content that you are looking for. Frustrated by the inability to pick and choose single episodes across various shows for his listening, Wenbin Fang started the Listen Notes project to fulfill his own needs. He ended up turning that project into his full-time business, which has grown into the most full-featured podcast search engine on the market. In this episode he explains how he built the Listen Notes application using Python and Django, his work to turn it into a sustainable business, and the various ways that you can build other applications and experiences on top of his API. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Wenbin Fang about the technology powering the Listen Notes podcast discovery platform Interview Introductions How did you get introduced to Python? Can you describe what Listen Notes is and the story behind it? What are some of the main goals that listeners have when searching for a podcast? 
What are the challenges that they commonly encounter when looking for information in a podcast? What are the different sources of information that you can use to extract useful details about a podcast? How do you identify and prioritize new features or product enhancements? Can you describe how the Listen Notes platform is architected? How has it changed or evolved since you first began working on it? How did you approach the technology selection for the initial version of Listen Notes? If you were to start over today, what might you do differently? What are the technical challenges that are posed by the ecosystem around podcasts? What are the biggest changes that have happened in the methods of production and consumption for podcasts since you first became involved in the space? How do you approach the design and contracts of the Listen Notes web API given how core that is to your platform? What are the most complex or complicated engineering projects that you have done for Listen Notes? What are the pieces of the infrastructure for podcasts that you would like to see improved, changed, or replaced? What are some of the kinds of projects that developers can build with the Listen Notes API? What, if any, impact has the introduction of podcasts to closed platforms such as Spotify, Amazon Music, etc. had on your business? What are some of the most surprising things that you have learned about podcasts and their consumption while building Listen Notes? What are the most interesting, innovative, or unexpected ways that you have seen Listen Notes used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Listen Notes? What do you have planned for the future of Listen Notes? 
Keep In Touch Website LinkedIn wenbinf on GitHub @wenbinf on Twitter Picks Tobias Wheel of Time TV Series Wenbin Superhuman email client Links Listen Notes Graphviz NextDoor PostgreSQL Elasticsearch Redis RabbitMQ Celery ReactJS Django Bootstrap CSS Digital Ocean Tailwind CSS Entity Resolution Clickhouse Data Engineering Podcast Episode The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 342 Making Orbital Mechanics More Accessible With Poliastro
FullSummary Outer space holds a deep fascination for people of all ages, and the key principle in its exploration both near and far is orbital mechanics. Poliastro is a pure Python package for exploring and simulating orbit calculations. In this episode Juan Luis Cano Rodriguez shares the story behind the project, how you can use it to learn more about space travel, and some of the interesting projects that have used it for planning planetary and interplanetary missions. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Juan Luis Cano Rodriguez about Poliastro, an open source library for interactive Astrodynamics and Orbital Mechanics, with a focus on ease of use, speed, and quick visualization. Interview Introductions How did you get introduced to Python? Can you describe what Poliastro is and the story behind it? What are some of the simulations that Poliastro is designed to be used for? How much knowledge of orbital mechanics is necessary to get started with Poliastro? Can you describe how the project is implemented? How have the goals and design of the project changed or evolved since you first started it? 
What are some of the design philosophies that you focus on to make the package accessible to the range of users that you support? Can you talk through the workflow of using Poliastro to do something like track the path of the ISS and its traversal of the debris field from the recent satellite destruction? What are some of the other libraries or frameworks that are commonly used with Poliastro? How are you using Poliastro in your own work? What are some overlooked or underused aspects of the project that you would like to highlight? What are the most interesting, innovative, or unexpected ways that you have seen Poliastro used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Poliastro? When is Poliastro the wrong choice? What do you have planned for the future of Poliastro? Keep In Touch LinkedIn GitHub Email Twitter Picks Tobias Josh Blue (comedian) Juan Luis DJ Cotts DJ Weaver Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Poliastro Fortran 90 (if only this community existed back then! 
https://ondrejcertik.com/blog/2021/03/resurrecting-fortran/) Satellogic Read the Docs Wolfram Alpha Mathematica SageMath 2-Body Problem AstroPy Podcast Episode Numba Import Linter Vallado "Fundamentals of Astrodynamics" International Space Station Starlink Satellites Planetary Ephemeris Data Satellite Data Kerbal Space Program NumFOCUS Open Collective Python SGP4 Libre Space Foundation The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 340 Build Better Analytics And Models With A Focus On The Data Experience
FullSummary A lot of time and energy goes into data analysis and machine learning projects to address various goals. Most of the effort is focused on the technical aspects and validating the results, but how much time do you spend on considering the experience of the people who are using the outputs of these projects? In this episode Benn Stancil explores the impact that our technical focus has on the perceived value of our work, and how taking the time to consider what the desired experience will be can lead us to approach our work more holistically and increase the satisfaction of everyone involved. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Benn Stancil about the perennial frustrations of working with data and thoughts on how to improve the experience Interview Introductions How did you get introduced to Python? Can you start by discussing your perspective on the most frustrating elements of working with data in an organization? How might that compound when working with machine learning? What are the sources of the disconnect between our level of technical sophistication and our ability to produce meaningful insights from our data? 
There have been a number of formulations about a "hierarchy of needs" pertaining to data. When the goal is to bring ML/AI methods to bear on an organization’s processes or products how can thinking about the intended experience act to improve the end result? What are some failure modes or suboptimal outcomes that might be expected when building from a tooling/technology/technique first mindset? What are some of the design elements that we can incorporate into our development environments/data infrastructure/data modeling that can incentivize a more experience driven process for building data products/analyses/ML models? How do the design and capabilities of the Mode platform allow teams to progress along the journey from data discovery to descriptive analytics, to ML experiments? What are the most interesting, innovative, or unexpected approaches that you have seen for encouraging the creation of positive data experiences? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Mode and data analysis? When is a data experience the wrong approach? What do you have planned for the future of Mode to support this ideal? Keep In Touch LinkedIn @bennstancil on Twitter Picks Tobias Venture Unlocked Podcast Benn Wrap Text by Bobby Pinero Counting Stuff by Randy Au Ray Data Co by Mr Ben Modern Data Democracy By JP Monteiro Bad Blood Podcast Bad Blood Book Links Mode Analytics Tidyverse Airflow Fivetran Data Engineering Podcast Episode dbt Data Engineering Podcast Episode Conway’s Law Cinchy Data Engineering Podcast Episode Reverse ETL The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 341 Declarative Deep Learning From Your Laptop To Production With Ludwig and Horovod
FullSummary Deep learning frameworks encourage you to focus on the structure of your model ahead of the data that you are working with. Ludwig is a tool that uses a data oriented approach to building and training deep learning models so that you can experiment faster based on the information that you actually have, rather than spending all of your time manipulating features to make them match your inputs. In this episode Travis Addair explains how Ludwig is designed to improve the adoption of deep learning for more companies and a wider range of users. He also explains how the Horovod framework plugs in easily to allow for scaling your training workflow from your laptop out to a massive cluster of servers and GPUs. The combination of these tools allows for a declarative workflow that starts off easy but gives you full control over the end result. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Travis Addair about building and training machine learning models with Ludwig and Horovod Interview Introductions How did you get introduced to Python? Can you describe what Horovod and Ludwig are? How do the projects work together? What was your path to being involved in those projects and what is your current role? 
There are a number of AutoML libraries available for frameworks such as scikit-learn, etc. What are the challenges that are introduced by applying that workflow to deep learning architectures? What are the use cases that Ludwig is designed to enable? Who are the target users of Ludwig? How do the workflows change/progress for the different personas? How is the underlying framework architected? What are the available extension points to provide a progressive exposure of complexity? How have the goals and design of the project changed or evolved as it has gained more widespread adoption beyond Uber? What was the motivation for migrating the core of Ludwig from Tensorflow to Pytorch? Can you describe the workflow of building a model definition with Ludwig? How much knowledge of neural network architectures and their relevant characteristics is necessary to use Ludwig effectively? What are the motivating factors for adding Horovod to the process? What is involved in moving from a single machine/single process training loop to a multi-core or multi-machine distributed training process? The combination of Ludwig and Horovod provides a shallower learning curve for building and scaling model training. What do you see as their potential impact on the availability and adoption of more sophisticated ML capabilities across organizations of varying scale? What do you see as other significant barriers to widespread use of ML functionality? What are the most interesting, innovative, or unexpected ways that you have seen Ludwig and/or Horovod used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Ludwig and Horovod? When is Ludwig and/or Horovod the wrong choice? What do you have planned for the future of both projects? Keep In Touch LinkedIn @TravisAddair on Twitter tgaddair on GitHub Picks Tobias Zeal and Ardor Travis Opeth Agaloch Closing Announcements Thank you for listening! 
Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Ludwig Horovod Predibase Uber Michelangelo Tensorflow PyTorch Podcast Episode Gradient Boosted Trees XGBoost CatBoost LightGBM PyCaret HyperBand scikit-optimize Keras Vision Transformer Architecture HuggingFace Jax DeepSpeed AllReduce Nvidia Collective Communications Library (NCCL) Training Epoch ElasticDL Raft Consensus Algorithm TorchScript Transfer Learning Gordon Bell Prize Anyscale Ray Podcast Episode The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ep 339 Building Conversational AI to Augment Sales Teams at Structurely
FullSummary The true power of artificial intelligence is its ability to work collaboratively with humans. Nate Joens co-founded Structurely to create a conversational AI platform that augments human sales teams to help guide potential customers through the initial steps of the funnel. In this episode he discusses the technical and social considerations that need to be combined for a seamless conversational experience and how he and his team are tackling the problem. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Nate Joens about his work at Structurely to build conversational AI utilities that augment human sales interactions Interview Introductions How did you get introduced to Python? Can you describe what Structurely is and the story behind it? What are the elements that comprise a "conversational AI"? How is it distinct from the wave of chatbots that were popular in recent years? What lessons from that approach can we take forward into AI enabled conversational platforms? How are you applying AI to the sales process? How much domain expertise is necessary to make an effective and engaging conversational AI? (e.g. knowledge of sales techniques vs. knowledge of real estate, etc.) 
Can you describe how you have designed the Structurely platform? What are the biggest engineering challenges that you have had to work through? What challenges or complexities have been most persistent? What are the design complexities that you have to work through to make the AI accessible for end users? What are some of the advancements in AI/NLP/transfer learning that have been most beneficial for teams building conversational AI? What are the signals that you emphasize when monitoring the performance of your models? What is your approach for feeding real-world customer interactions back into your model development and training loop? What are the most active areas of research in conversational AI applications and techniques? What are the most interesting, innovative, or unexpected ways that you have seen Structurely and/or conversational AI used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on conversational AI at Structurely? When is conversational AI the wrong choice? What do you have planned for the future of Structurely? Keep In Touch @whonatejoens on Twitter LinkedIn Picks Tobias Vantage AWS Cost Management Nate VideoForm Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Structurely GIS Generative AI GPT-3 Sankey Diagram PyTorch Podcast Episode Allen Institute for AI F Score Snorkel Podcast Episode Few-Shot Learning Zero Shot Learning Voxable The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA