The Python Podcast.init

389 episodes — Page 2 of 8

Ep 338: Build Composable And Reusable Feature Engineering Pipelines with Feature-Engine

Summary Every machine learning model has to start with feature engineering. This is the process of combining input variables into a more meaningful signal for the problem that you are trying to solve. Many times this process can lead to duplicating code from previous projects, or introducing technical debt in the form of poorly maintained feature pipelines. In order to make the practice more manageable Soledad Galli created the feature-engine library. In this episode she explains how it has helped her and others build reusable transformations that can be applied in a composable manner with your scikit-learn projects. She also discusses the importance of understanding the data that you are working with and the domain in which your model will be used to ensure that you are selecting the right features. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Soledad Galli about feature-engine, a Python library to engineer features for use in machine learning models Interview Introductions How did you get introduced to Python? Can you describe what feature-engine is and the story behind it? What are the complexities that are inherent to feature engineering? 
What are the problems that are introduced due to incidental complexity and technical debt? What was missing in the available set of libraries/frameworks/toolkits for feature engineering that you are solving for with feature-engine? What are some examples of the types of domain knowledge that are needed to effectively build features for an ML model? Given the fact that features are constructed through methods such as normalizing data distributions, imputing missing values, combining attributes, etc. what are some of the potential risks that are introduced by incorrectly applied transformations or invalid assumptions about the impact of these manipulations? Can you describe how feature-engine is implemented? How have the design and goals of the project changed or evolved since you started working on it? What (if any) difference exists in the feature engineering process for frameworks like scikit-learn as compared to deep learning approaches using PyTorch, Tensorflow, etc.? Can you describe the workflow of identifying and generating useful features during model development? What are the tools that are available for testing and debugging of the feature pipelines? What do you see as the potential benefits or drawbacks of integrating feature-engine with a feature store such as Feast or Tecton? What are the most interesting, innovative, or unexpected ways that you have seen feature-engine used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on feature-engine? When is feature-engine the wrong choice? What do you have planned for the future of feature-engine? Keep In Touch LinkedIn @Soledad_Galli on Twitter solegalli on GitHub Picks Tobias Dune Movie Dune Series Soledad The Social Dilemma Don’t Be Evil by Rana Foroohar Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. 
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links feature-engine Feature Engineering Python Feature Engineering Cookbook scikit-learn Feature Stores Podcast Episode Pandas Podcast Episode PyTorch Podcast Episode Tensorflow Feast Tecton Data Engineering Podcast Episode Kaggle Dask Data Engineering Podcast Episode The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Oct 31, 2021 · 53 min

Ep 337: Speed Up Your Python Data Applications By Parallelizing Them With Bodo

Summary The speed of Python is a subject of constant debate, but there is no denying that for compute heavy work it is not the optimal tool. Rather than rewriting your data oriented applications, or having to rearchitect them, the team at Bodo wrote a compiler that will do the optimization for you. In this episode Ehsan Totoni explains how they are able to translate pure Python into massively parallel processes that are optimized for high performance compute systems. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Ehsan Totoni about Bodo, an inferential compiler for Python that automatically parallelizes your data oriented projects Interview Introductions How did you get introduced to Python? Can you describe what Bodo is and the story behind it? What are some of the use cases that it is being applied to? What are the motivating factors for something like Dask or Ray as compared to Bodo? What are the software patterns that contribute to slowdowns in data processing code? What are some of the ways that the compiler is able to optimize those operations? Can you describe how Bodo is implemented? How does Bodo process the Python code for compiling to the optimized form? 
What are the compilation techniques for understanding the semantics of the code being processed? How do you manage packages that rely on C extensions? What do you use as an intermediate representation for translating into the optimized output? What is the workflow for applying Bodo to a Python project? What debugging utilities does it provide for identifying any errors that occur due to the added parallelism? What kind of support does Bodo have for optimizing a machine learning project? (e.g. using PyTorch/Tensorflow/MxNet/etc.) When working with a workflow orchestrator such as Dagster or Airflow, what would the integration process look like for being able to take advantage of the optimized Bodo output? What are the most interesting, innovative, or unexpected ways that you have seen Bodo used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Bodo? When is Bodo the wrong choice? What do you have planned for the future of Bodo? Keep In Touch LinkedIn @EhsanTn on Twitter ehsantn on GitHub Picks Tobias Paracord Crafts Ehsan [ Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Bodo Data Engineering Podcast Episode University of Illinois Urbana-Champaign HPC MPI Elastic Fabric Adapter All-to-All Communication Dask Data Engineering Podcast Episode Ray Podcast Episode Pandas Extension Arrays Podcast Episode GeoPandas Numba LLVM scikit-learn Horovod Dagster Podcast.__init__ Episode Data Engineering Podcast Episode Airflow Podcast Episode IPython Parallel Parquet The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Oct 25, 2021 · 58 min

Ep 336: An Exploration Of Financial Exchange Risk Management Strategies

Summary The world of finance has driven the development of many sophisticated techniques for data analysis. In this episode Paul Stafford shares his experiences working in the realm of risk management for financial exchanges. He discusses the types of risk that are involved, the statistical methods that he has found most useful for identifying strategies to mitigate that risk, and the software libraries that have helped him most in his work. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Paul Stafford about building risk models to guard against financial exchange rate volatility Interview Introductions How did you get introduced to Python? What are the principles involved in risk management, and how are statistical methods used? How did you get involved in financial markets? In what ways did your background in science and engineering prepare you for work in finance and risk management? What are the tools that you have found most useful in your career in finance? How have recent trends such as the widespread adoption of deep learning impacted the capabilities and risks present in foreign exchange strategies? 
What are the challenges that you face in obtaining and validating the input data that you are relying on for building financial and statistical models? How has the volatility of the pandemic impacted the robustness and resilience of your predictive capabilities? What are the areas where the available tools are typically insufficient? What are the most interesting, innovative, or unexpected strategies or techniques that you have seen applied to risk management? What are the most interesting, unexpected, or challenging lessons that you have learned while working in risk management? What are the economic and industry trends that you are keeping a close eye on for your work at Deaglo and your own personal projects? Keep In Touch LinkedIn Picks Tobias The Vault (movie) Paul Motorcycle Trip of the Grand Canyon Links Deaglo Partners, LLC. Value At Risk (VaR) Black-Scholes Equation Linear Algebra Principal Component Analysis Eigenvectors and Eigenvalues Markov Chain Monte Carlo Violin Plot Kurtosis PyMC3 Podcast Episode Bayesian Regression Constrained Optimization Ethereum Smart Contracts Behavioral Finance Black Swan by Nassim Nicholas Taleb (affiliate link) SciPy Convention RealPython 3Blue1Brown Sentiment Analysis The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Oct 16, 2021 · 34 min

Ep 335: Build Better Machine Learning Models By Understanding Their Decisions With SHAP

Summary Machine learning and deep learning techniques are powerful tools for a large and growing number of applications. Unfortunately, it is difficult or impossible to understand the reasons for the answers that they give to the questions they are asked. In order to help shine some light on what information is being used to provide the outputs to your machine learning models Scott Lundberg created the SHAP project. In this episode he explains how it can be used to provide insight into which features are most impactful when generating an output, and how that insight can be applied to make more useful and informed design choices. This is a fascinating and important subject and this episode is an excellent exploration of how to start addressing the challenge of explainability. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Scott Lundberg about SHAP, a library that implements a game theoretic approach to explain the output of any machine learning model Interview Introductions How did you get introduced to Python? Can you describe what SHAP is and the story behind it? What are some of the contexts that create the need to explain the reasoning behind the outputs of an ML model? 
How do different types of models (deep learning, CNN/RNN, bayesian vs. frequentist, etc.) and different categories of ML (e.g. NLP, computer vision) influence the challenge of understanding the meaningful signals in their reasoning? Taking a step back, how do you define "explainability" when discussing inferences produced by ML models? What are the degrees of specificity/accuracy when seeking to understand the decision processes involved? Can you describe how SHAP is implemented? What are the signals that you are tracking to understand what features are being used to determine a given output? What are the assumptions that you had as you started this project that have been challenged or updated as you explored the problem in greater depth? Can you describe the workflow for someone using SHAP? What are the challenges faced by practitioners in interpreting the visualizations generated from SHAP? How much domain knowledge and context is necessary to use SHAP effectively? What are the ongoing areas of research around tracking of ML decision processes? How are you using SHAP in your own work? What are the most interesting, innovative, or unexpected ways that you have seen SHAP used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on SHAP? When is SHAP the wrong choice? What do you have planned for the future of SHAP? Keep In Touch slundberg on GitHub Website LinkedIn Picks Tobias Reminiscence Scott Augustine’s Confessions Links SHAP Microsoft Research Matlab Game Theory Computational Biology LIME Shapley Values Julia Language ResNet CNN == Convolutional Neural Network RNN == Recurrent Neural Network A* Algorithm CFPB == Consumer Financial Protection Bureau NP Hard Huggingface Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations Numba Log Odds InterpretML Polyjuice The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Oct 9, 2021 · 1h 4m

Ep 334: Accelerating Drug Discovery Using Machine Learning With TorchDrug

Summary Finding new and effective treatments for disease is a complex and time consuming endeavor, requiring a high degree of domain knowledge and specialized equipment. Combining his expertise in machine learning and graph algorithms with his interest in drug discovery Jian Tang created the TorchDrug project to help reduce the amount of time needed to find new candidate molecules for testing. In this episode he explains how the project is being used by machine learning researchers and biochemists to collaborate on finding effective treatments for real-world diseases. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Jian Tang about TorchDrug Interview Introductions How did you get introduced to Python? Can you describe what TorchDrug is and the story behind it? What are the goals of the TorchDrug project? Who are the target users of the project? What are the main ways that it is being used? What are the challenges faced by biologists and chemists working on development and discovery of pharmaceuticals? What are some of the other tools/techniques that they would use (in isolation or combination with TorchDrug)? Can you describe how TorchDrug is implemented?
How have you approached the design of the project and its APIs to make it accessible to engineers that don’t possess domain expertise in drug discovery research? How do graph structures help when modeling and experimenting with chemical structures for drug discovery? What are the formats and sources of data that you are working with? What are some of the complexities/challenges that you have had to deal with to integrate with up or downstream systems to fit into the overall research process? Can you talk through the workflow of using TorchDrug to build and validate a model? What is involved in determining and codifying a goal state for the model to optimize for? What are the biggest open questions in the area of drug discovery and research? How is TorchDrug being used to assist in the exploration of those problems? What are the most interesting, unexpected, or challenging lessons that you have learned while working on TorchDrug? When is TorchDrug the wrong choice? What do you have planned for the future of TorchDrug? Keep In Touch tangjianpku on GitHub @tangjianpku on Twitter Website LinkedIn Picks Tobias Rope refactoring library Jian Attending conferences once the pandemic is over Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links TorchDrug Mila Yoshua Bengio Alphafold Few-shot learning Metalearning PyTorch Geometric DeepGraph Library NetworKit Podcast Episode graph-tool Podcast Episode The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Sep 30, 2021 · 44 min

Ep 333: An Exploration Of Automated Speech Recognition

Summary The overwhelming growth of smartphones, smart speakers, and spoken word content has corresponded with increasingly sophisticated machine learning models for recognizing speech content in audio data. Dylan Fox founded Assembly to provide access to the most advanced automated speech recognition models for developers to incorporate into their own products. In this episode he gives an overview of the current state of the art for automated speech recognition, the varying requirements for accuracy and speed of models depending on the context in which they are used, and what is required to build a special purpose model for your own ASR applications. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Dylan Fox about the challenges of training and deploying large models for automated speech recognition Interview Introductions How did you get introduced to Python? What is involved in building an ASR model? How does the complexity/difficulty compare to models for other data formats? (e.g. computer vision, NLP, NER, etc.) How have ASR models changed over the last 5, 10, 15 years? What are some other categories of ML applications that work with audio data? How does the level of complexity compare to ASR applications? 
What is the typical size of an ASR model that you are deploying at Assembly? What are the factors that contribute to the overall size of a given model? How does accuracy compare with model size? How does the size of a model contribute to the overall challenge of deploying/monitoring/scaling it in a production environment? How can startups effectively manage the time/cost that comes with training large models? What are some techniques that you use/attributes that you focus on for feature definitions in the source audio data? Can you describe the lifecycle stages of an ASR model at Assembly? What are the aspects of ASR which are still intractable or impractical to productionize? What are the most interesting, innovative, or unexpected ways that you have seen ASR technology used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on ASR? What are the trends in research or industry that you are keeping an eye on? Keep In Touch LinkedIn @YouveGotFox on Twitter Picks Tobias The Hitman’s Wife’s Bodyguard Dylan Inspiration 4 Documentary Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Learn Python The Hard Way DeepSpeech Wav2Letter BERT GPT-3 Convolutional Neural Network (CNN) Recurrent Neural Network (RNN) Mycroft Podcast Episode CMU Sphinx Pocket Sphinx Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) DeepSpeech Paper Transformer Architecture Audio Analytic Sound Recognition Podcast Episode Horovod distributed training library Knowledge Distillation Libre Speech Data Set Lambda Labs Wav2Vec The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Sep 26, 2021 · 54 min

Ep 332: Experimenting With Reinforcement Learning Using MushroomRL

Summary Reinforcement learning is a branch of machine learning and AI that has a lot of promise for applications that need to evolve with changes to their inputs. To support the research happening in the field, including applications for robotics, Carlo D’Eramo and Davide Tateo created MushroomRL. In this episode they share how they have designed the project to be easy to work with, so that students can use it in their study, as well as extensible so that it can be used by businesses and industry professionals. They also discuss the strengths of reinforcement learning, how to design problems that can leverage its capabilities, and how to get started with MushroomRL for your own work. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Davide Tateo and Carlo D’Eramo about MushroomRL, a library for building reinforcement learning experiments Interview Introductions How did you get introduced to Python? Can you start by describing what reinforcement learning is and how it differs from other approaches for machine learning? What are some example use cases where reinforcement learning might be necessary? Can you describe what MushroomRL is and the story behind it? Who are the target users of the project? 
What are its main goals? What are your suggestions to other developers for implementing a successful library? What are some of the core concepts that researchers and/or engineers need to understand to be able to effectively use reinforcement learning techniques? Can you describe how MushroomRL is architected? How have the goals and design of the project changed or evolved since you began working on it? What is the workflow for building and executing an experiment with MushroomRL? How do you track the states and outcomes of experiments? What are some of the considerations involved in designing an environment and reward functions for an agent to interact with? What are some of the open questions that are being explored in reinforcement learning? How are you using MushroomRL in your own research? What are the most interesting, innovative, or unexpected ways that you have seen MushroomRL used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on MushroomRL? When is MushroomRL the wrong choice? What do you have planned for the future of MushroomRL? How can the open-source community contribute to MushroomRL? What kind of support are you willing to provide to users? Keep In Touch Davide boris-il-forte on GitHub Website Carlo carloderamo on GitHub Website Picks Tobias Britannia TV Series Davide 1984 by George Orwell Carlo Twin Peaks TV Series Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links MushroomRL TU Darmstadt MuJoCo PyBullet iGibson Habitat OpenAI Gym PyTorch Podcast Episode RLLib Ray Podcast Episode OpenAI Baselines Stable Baselines ROS The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Sep 19, 2021 · 54 min

Ep 331: Doing Dask Powered Data Science In The Saturn Cloud

Summary A perennial problem of doing data science is that it works great on your laptop, until it doesn’t. Another problem is being able to recreate your environment to collaborate on a problem with colleagues. Saturn Cloud aims to help with both of those problems by providing an easy to use platform for creating reproducible environments that you can use to build data science workflows and scale them easily with a managed Dask service. In this episode Julia Signell, head of open source at Saturn Cloud, explains how she is working with the product team and PyData community to reduce the points of friction that data scientists encounter as they are getting their work done. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Julia Signell about building distributed processing workflows in Python through the power of Dask Interview Introductions How did you get introduced to Python? Can you describe what you are building at Saturn Cloud? Who are your target users and how does that inform the features and priorities that you build into your platform? What are the road blocks that data scientists typically encounter when working on their laptop/workstation? How does open source factor into the Saturn product?
What are some of the projects that you are collaborating with/contributing to as part of your work at Saturn? How has your experience at Anaconda informed your work at Saturn? Can you describe how the Saturn Cloud platform is architected? How has it changed or evolved since it was first launched? Can you describe the learning curve that data scientists go through when adopting Dask? What are some examples of projects or workflows that Dask enables which are not possible/practical to do locally? How would you characterize the overall awareness/adoption of Dask in the Python data science community? What are the most interesting, innovative, or unexpected ways that you have seen Saturn Cloud used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Saturn Cloud? When is Saturn Cloud the wrong choice? What do you have planned for the future of Saturn Cloud? Keep In Touch @jsignell on Twitter jsignell on GitHub Picks Tobias Peter Rabbit 2 Julia PawPaw Fruit Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Saturn Cloud Dask Podcast Episode Pangeo XArray Conda Mamba Holoviz Dash Anaconda Podcast Episode Kubernetes Tornado Podcast Episode Prefect Podcast Episode Dagster Podcast Episode Airflow Ray Podcast Episode The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Sep 10, 2021 · 38 min

Ep 330: Monitor The Health Of Your Machine Learning Products In Production With Evidently


Summary You’ve got a machine learning model trained and running in production, but that’s only half of the battle. Are you certain that it is still serving the predictions that you tested? Are the inputs within the range of tolerance that you designed? Monitoring machine learning products is an essential step of the story so that you know when it needs to be retrained against new data, or parameters need to be adjusted. In this episode Emeli Dral shares the work that she and her team at Evidently are doing to build an open source system for tracking and alerting on the health of your ML products in production. She discusses the ways that model drift can occur, the types of metrics that you need to track, and what to do when the health of your system is suffering. This is an important and complex aspect of the machine learning lifecycle, so give it a listen and then try out Evidently for your own projects. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Emeli Dral about monitoring machine learning models in production with Evidently Interview Introductions How did you get introduced to Python? Can you describe what Evidently is and the story behind it? 
What are the metrics that are useful for determining the performance and health of a machine learning model? What are the questions that you are trying to answer with those metrics? How does monitoring of machine learning models compare to monitoring of infrastructure or "traditional" software projects? What are the failure modes for a model? Can you describe the design and implementation of Evidently? How has the architecture changed or evolved since you started working on it? What categories of model is Evidently designed to work with? What are some strategies for making models conducive to monitoring? What is involved in monitoring a model on a continuous basis? What are some considerations when establishing useful thresholds for metrics to alert on? Once an alert has been triggered what is the process for resolving it? If the training process takes a long time, how can you mitigate the impact of a model failure until the new/updated version is deployed? What are the most interesting, innovative, or unexpected ways that you have seen Evidently used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Evidently? When is Evidently the wrong choice? What do you have planned for the future of Evidently? Keep In Touch LinkedIn @EmeliDral on Twitter emeli-dral on GitHub Picks Tobias The Suicide Squad Emeli Airflow Links Evidently AI Open Source Yandex Grafana The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Sep 3, 2021 · 50 min

Ep 329: Making Automated Machine Learning More Accessible With EvalML


Summary Building a machine learning model is a process that requires a lot of iteration and trial and error. For certain classes of problem a large portion of the searching and tuning can be automated. This allows data scientists to focus their time on more complex or valuable projects, as well as opening the door for non-specialists to experiment with machine learning. Frustrated with some of the awkward or difficult to use tools for AutoML, Angela Lin and Jeremy Shih helped to create the EvalML framework. In this episode they share the use cases for automated machine learning, how they have designed the EvalML project to be approachable, and how you can use it for building and training your own models. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Angela Lin, Jeremy Shih about EvalML, an AutoML library which builds, optimizes, and evaluates machine learning pipelines Interview Introductions How did you get introduced to Python? Can you describe what EvalML is and the story behind it? What do we mean by the term AutoML? What are the kinds of problems that are best suited to applications of automated ML? What does the landscape for AutoML tools look like? 
What was missing in the available offerings that motivated you and your team to create EvalML? Who is the target audience for EvalML? How is the EvalML project implemented? How has the project changed or evolved since you first began working on it? What is the workflow for building a model with EvalML? Can you describe the preprocessing steps that are necessary and the input formats that it is expecting? What are the supported algorithms/model architectures? How does EvalML explore the search space for an optimal model? What decision functions does it employ to determine an appropriate stopping point? What is involved in operationalizing an AutoML pipeline? What are some challenges or edge cases that you see users of EvalML run into? What are the most interesting, innovative, or unexpected ways that you have seen EvalML used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on EvalML? When is EvalML the wrong choice? When is auto ML the wrong approach? What do you have planned for the future of EvalML? Keep In Touch Angela angela97lin on GitHub LinkedIn Jeremy jeremyliweishih on GitHub LinkedIn Picks Tobias Gloryhammer Angela Sarma mediterranean restaurant Jeremy Crucial Conversations by Stephen Covey (affiliate link) Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links EvalML FeatureLabs Alteryx Scheme NetLogo Flask AutoML Woodwork FeatureTools Compose Random Forest XGBoost Prophet GreyKite Shap The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Aug 25, 2021 · 45 min

Ep 328: Growing And Supporting The Data Science Community At Anaconda


Summary Data scientists are tasked with answering challenging questions using data that is often messy and incomplete. Anaconda is on a mission to make the lives of data professionals more manageable through creation and maintenance of high quality libraries and frameworks, the distribution of an easy to use Python distribution and package ecosystem, and high quality training material. In this episode Kevin Goldsmith, CTO of Anaconda, discusses the technical and social challenges faced by data scientists, the ways that the Python ecosystem has evolved to help address those difficulties, and how Anaconda is engaging with the community to provide high quality tools and education for this constantly changing practice. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Kevin Goldsmith about Anaconda’s contributions to the Python ecosystem for data science Interview Introductions How did you get introduced to Python? Can you start by describing what Anaconda focuses on solving for? What was your path into the CTO position? From your perspective as the CTO of Anaconda, what are the biggest challenges facing data scientists today? What is the breakdown between technical and organizational sources for those difficulties? 
How is the Anaconda product suite architected to help address some of those problems? Where are you spending your focus to allow Anaconda to address the current and future needs of data scientists? Python has been a dominant force in the data and analytics ecosystem for several years now. What do you see as the future of the space? (e.g. monoglot vs. polyglot workflows) What are the most interesting, innovative, or unexpected ways that you have seen the Anaconda platform used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Anaconda and data science tooling? Keep In Touch LinkedIn @KevinGoldsmith on Twitter Website Picks Tobias Perdido Street Station The Scar Iron Council Kevin Lego Typewriter Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Anaconda Spotify Lisp Scheme C# Anaconda Nucleus PyData AnacondaCon Grid Computing PyTorch Podcast Episode Tensorflow Pyston Podcast Episode Dask Podcast Episode Numba Panel dashboard framework Datashader Jupyter R Julia AstroPy Podcast Episode Arrow Data Teams by Jesse Anderson Podcast Episode The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Aug 19, 2021 · 55 min

Ep 327: Network Analysis At The Speed Of C With The Power Of Python Using NetworKit


Summary Analysing networks is a growing area of research in academia and industry. In order to be able to answer questions about large or complex relationships it is necessary to have fast and efficient algorithms that can process the data quickly. In this episode Eugenio Angriman discusses his contributions to the NetworKit library to provide an accessible interface for these algorithms. He shares how he is using NetworKit for his own research, the challenges of working with large and complex networks, and the kinds of questions that can be answered with data that fits on your laptop. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Eugenio Angriman about NetworKit, an open-source toolkit for large-scale network analysis Interview Introductions How did you get introduced to Python? Can you describe what NetworKit is and the story behind it? A core focus of the project is for use with graphs containing millions to billions of nodes. What are some of the situations where you might encounter networks of that scale? There are a number of network analysis libraries in Python. How would you characterize NetworKit’s position in the ecosystem? 
What are the algorithmic challenges that graph structures pose when aiming for scalability and performance? How do you approach building efficient algorithms for complex network analysis? Can you describe how NetworKit is architected? What are the design principles that you focus on for the library? How have the design and goals of the project changed or evolved since you have been working on it? NetworKit’s code base has now reached a considerable size and several developers have contributed to it. Are there any minimum quality requirements that new code needs to fulfill before it can be merged into NetworKit? How do you ensure that such requirements are met? What are some of the active areas of research for networked data analysis? How are you using NetworKit for your own work? What kind of background knowledge in graph analysis is necessary for users of NetworKit? What are some of the underutilized or overlooked aspects of NetworKit that you think should be highlighted? What are the most interesting, innovative, or unexpected ways that you have seen NetworKit used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on NetworKit? When is NetworKit the wrong choice? What do you have planned for the future of NetworKit? Keep In Touch angriman on GitHub LinkedIn Picks Tobias Edgar Allan Poe NetworKit The Spinoza Problem Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links NetworKit Humboldt University Berlin graph-tool Podcast Episode NetworkX Adjacency List Cython Podcast Episode Node Embeddings Centrality Score NetworKit In The Cloud Gunrock Hornet The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Aug 15, 2021 · 37 min

Ep 326: Delivering Deep Learning Powered Speech Recognition As A Service For Developers At AssemblyAI


Summary Building a software-as-a-service (SaaS) business is a fairly well understood pattern at this point. When the core of the service is a set of machine learning products it introduces a whole new set of challenges. In this episode Dylan Fox shares his experience building Assembly AI as a reliable and affordable option for automatic speech recognition that caters to a developer audience. He discusses the machine learning development and deployment processes that his team relies on, the scalability and performance considerations that deep learning models introduce, and the user experience design that goes into building for a developer audience. This is a fascinating conversation about a unique cross-section of considerations and how Dylan and his team are building an impressive and useful service. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Dylan Fox about AssemblyAI, a powerful and easy to use speech recognition API designed for developers Interview Introductions How did you get introduced to Python? Can you describe what Assembly AI is and the story behind it? Speech recognition is a service that is being added to every cloud platform, video service, and podcast product. 
What do you see as the motivating factors for the current growth in this industry? How would you characterize your overall position in the market? What are the core goals that you are focused on with AssemblyAI? Can you describe the different ways that you are using Python across the company? How is the AssemblyAI platform architected? What are the complexities that you have to work around to maintain high uptime for an API powered by a deep learning model? What are the scaling challenges that crop up, whether in training or serving? What are the axes for improvement for a speech recognition model? How do you balance tradeoffs of speed and accuracy as you iterate on the model? What is your process for managing the deep learning workflow? How do you manage CI/CD for your deep learning models? What are the open areas of research in speech recognition? What are the most interesting, innovative, or unexpected ways that you have seen AssemblyAI used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on AssemblyAI? When is AssemblyAI the wrong choice? What do you have planned for the future of AssemblyAI? Keep In Touch LinkedIn @YouveGotFox on Twitter Picks Tobias H.P. Lovecraft Dylan Project Hail Mary by Andy Weir Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links AssemblyAI Two Scoops of Django Nuance Dragon Natural Speaking PyTorch Podcast Episode Tensorflow FastAPI Flask Tornado Podcast Episode Neural Magic Podcast Episode The Martian The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Aug 4, 2021 · 52 min

Ep 325: Taking Aim At The Legacy Of SQL With The Preql Relational Language


Summary SQL has gone through many cycles of popularity and disfavor. Despite its longevity it is objectively challenging to work with in a collaborative and composable manner. In order to address these shortcomings and build a new interface for your database oriented workloads Erez Shinan created Preql. It is based on the same relational algebra that inspired SQL, but brings in more robust computer science principles to make it more manageable as you scale in complexity. In this episode he shares his motivation for creating the Preql project, how he has used Python to develop a new language for interacting with database engines, and the challenges of taking on the legacy of SQL as an individual. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Erez Shinan about Preql, an interpreted, relational programming language, that specializes in database queries Interview Introductions How did you get introduced to Python? Can you describe what Preql is and the story behind it? What are the goals and target use cases for the project? There have been numerous projects that aim to make SQL more maintainable and composable. What is it about the language and syntax that makes it so challenging?
How does Preql approach this problem that is different from other efforts? (e.g. ORMs, dbt-style Jinja, PyPika) How did you approach the design of the syntax to make it familiar to people who know SQL? Can you describe how Preql is implemented? How has the design and architecture changed or evolved since you began working on it? What is a typical workflow for someone using Preql to build a library of analytical queries? Beyond strict compilation to SQL, what are some of the other features that you have incorporated into Preql? How does a Preql program get executed against a target database, particularly when using capabilities that can’t be directly translated to SQL? What are the main difficulties / challenges of compiling to SQL? What are some of the features or use cases that are not immediately obvious or prone to be overlooked that you think are worth mentioning? What are the most interesting, innovative, or unexpected ways that you have seen Preql used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Preql? When is Preql the wrong choice? What do you have planned for the future of Preql? Keep In Touch erezsh on GitHub erezsh on Twitter Picks Tobias Counterpart Erez Bansko, Bulgaria Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Preql Lark Postgres Data Engineering Podcast Episode MySQL Relational Algebra Pandas Podcast Episode ORM == Object Relational Mapper dbt Data Engineering Podcast Episode PyPika GraphQL Julia runtype Rich terminal UI library prompt-toolkit DuckDB Askgit BigQuery Snowflake The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Jul 28, 2021 · 36 min

Ep 324: Unleash The Power Of Dataframes At Any Scale With Modin


Summary When you start working on a data project there are always a variety of unknown factors that you have to explore. One of those is the volume of total data that you will eventually need to handle, and the speed and scale at which it will need to be processed. If you optimize for scale too early then it adds a high barrier to entry due to the complexities of distributed systems, but if you invest in a lot of engineering up front then it can be challenging to refactor for scale. Modin is a project that aims to remove that decision by letting you seamlessly replace your existing Pandas code and scale across CPU cores or across a cluster of machines. In this episode Devin Petersohn explains why he started working on solving this problem, how Modin is architected to allow for a smooth escalation from small to large volumes of data and compute, and how you can start using it today to accelerate your Pandas workflows. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Devin Petersohn about Modin, a Pandas compatible dataframe library for datasets from 1MB to 1TB+ Interview Introductions How did you get introduced to Python? Can you describe what Modin is and the story behind it? Why study dataframes? 
How do dataframes compare to databases? What can you do in a dataframe that you couldn’t in a database? What are your overall goals for the Modin project? Who are the target users of Modin and how does that influence your prioritization of features? What are some of the API inconsistencies that you have had to abstract and work around between Pandas, Ray, and Dask to give users a seamless experience? What are some of the considerations in terms of capabilities or user experience that will influence whether to use Ray or Dask as the execution engine? Can you describe how Modin is implemented? How has the constraint of replicating the Pandas API influenced your architectural choices? What are the most complex or challenging Pandas APIs to replicate in Modin? In addition to the core Pandas API you have also added experimental features such as SQL support and a spreadsheet interface. How have those capabilities affected the range of potential use cases and end users? What are some of the complexities that come from acting as a middleware between the Pandas API and the Ray and Dask frameworks? What are some of the initial ideas or assumptions that you had about the design or utility of Modin that have been challenged as you worked through building and releasing it? What are the most interesting, innovative, or unexpected ways that you have seen Modin used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Modin? When is Modin the wrong choice? What do you have planned for the future of Modin? Keep In Touch devin-petersohn on GitHub LinkedIn Picks Tobias xxh Devin Lux Podcast Episode Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. 
If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Modin UC Berkeley RISELAB XArray Pandas Podcast Episode Dask Podcast Episode Ray Podcast Episode Spark The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Jul 22, 2021 · 38 min

Ep 323: Exploring The SpeechBrain Toolkit For Speech Processing


Summary With the rising availability of computation in everyday devices, there has been a corresponding increase in the appetite for voice as the primary interface. To accommodate this desire it is necessary for us to have high quality libraries for being able to process and generate audio data that can make sense of human speech. To facilitate research and industry applications for speech data Mirco Ravanelli and Peter Plantinga are building SpeechBrain. In this episode they explain how it works under the hood, the projects that they are using it for, and how you can get started with it today. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Mirco Ravanelli and Peter Plantinga about SpeechBrain, an open-source and all-in-one speech toolkit powered by PyTorch Interview Introductions How did you get introduced to Python? Can you describe what SpeechBrain is and the story behind it? What are the goals and target use cases of the SpeechBrain project? What are some of the ways that processing audio with a focus on speech differs from more general audio processing?
What are some of the other libraries/frameworks/services that are available to work with speech data and what are the unique capabilities that SpeechBrain offers? How is SpeechBrain implemented? What was your decision process for determining which framework to build on top of? What are some of the original ideas and assumptions that you had for SpeechBrain which have been changed or invalidated as you worked through implementing it? Can you talk through the workflow of using SpeechBrain? What would be involved in developing a system to automate transcription with speaker recognition and diarization? In the documentation it mentions that SpeechBrain is built to be used for research purposes. What are some of the kinds of research that it is being used for? What are some of the features or capabilities of SpeechBrain which might be non-obvious that you would like to highlight? What are the most interesting, innovative, or unexpected ways that you have seen SpeechBrain used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on SpeechBrain? When is SpeechBrain the wrong choice? What do you have planned for the future of SpeechBrain? Keep In Touch Mirco mravanelli on GitHub LinkedIn @mirco_ravanelli on Twitter Peter pplantinga on GitHub @ComPeterScience on Twitter Website LinkedIn Picks Tobias x.ai Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links SpeechBrain Mila Speech Processing Speech Enhancement NumPy SciPy Theano PyTorch Podcast Episode Speech Recognition NeMo ESPNet Sequence to Sequence (Seq2Seq) HyperParameters TorchAudio PyTorch Lightning Keras HuggingFace Generative Adversarial Network Snorkel Data Engineering Podcast Episode The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Jul 14, 2021 · 37 min

Ep 322: Fast And Educational Exploration And Analysis Of Graph Data Structures With graph-tool

Summary If you are interested in a library for working with graph structures that will also help you learn more about the research and theory behind the algorithms then look no further than graph-tool. In this episode Tiago Peixoto shares his work on graph algorithms and networked data and how he has built graph-tool to help in that research. He explains how it is implemented, how it evolved from a simple command line tool to a full-fledged library, and the benefits that he has found from building a personal project in the open. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Tiago Peixoto about graph-tool, an efficient Python module for manipulation and statistical analysis of graphs Interview Introductions How did you get introduced to Python? Can you describe what graph-tool is and the story behind it? What are some scenarios where someone might encounter a graph oriented data set? In what ways are those graphs typically represented? In your experience, what is the overlap of people who are working with networked data, and the use of graph-native databases? (e.g. Neo4J, DGraph, etc.) What kinds of analysis or manipulation might someone need to perform on a graph structure? 
There are a few different tools in Python for working with networked data. How would you characterize the current ecosystem and why someone might choose graph-tool? Can you describe how graph-tool is implemented? How have the goals and design of the package changed or evolved since you first began working on it? Who are your target users and what are the guiding principles that you use to inform the API design for the package? How much knowledge of graph theory or algorithms is required to make effective use of graph-tool? Can you talk through an example workflow of using graph-tool to load, process, and analyze a graph? What are some of the overlooked or underutilized aspects of graph-tool that you think more people should know about? What are some systems/applications that you have seen which would be simplified by adopting a graph model for their data? What is your impression of the overall awareness of the benefits of graphs for simplifying aspects of data processing and analysis? What are some cases where a graph structure adds unnecessary complexity? What are the most interesting, innovative, or unexpected ways that you have seen graph-tool used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on graph-tool? When is graph-tool the wrong choice? What do you have planned for the future of graph-tool? Keep In Touch Website graph-tool Picks Tobias 97 Things Every Data Engineer Should Know Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Central European University NetworkX GML GraphML Neo4J DGraph Data Engineering Podcast Episode NetworKit igraph Matplotlib C++ Templates Boost Graph Library OpenMP Maximum Matching The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Jul 7, 2021 · 42 min

Ep 321: Lightening The Load For Deep Learning With Sparse Networks Using Neural Magic

Summary Deep learning has largely taken over the research and applications of artificial intelligence, with some truly impressive results. The challenge that it presents is that for reasonable speed and performance it requires specialized hardware, generally in the form of a dedicated GPU (Graphics Processing Unit). This raises the cost of the infrastructure, adds deployment complexity, and drastically increases the energy requirements for training and serving of models. To address these challenges Nir Shavit combined his experiences in multi-core computing and brain science to co-found Neural Magic where he is leading the efforts to build a set of tools that prune dense neural networks to allow them to execute on commodity CPU hardware. In this episode he explains how sparsification of deep learning models works, the potential that it unlocks for making machine learning and specialized AI more accessible, and how you can start using it today. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Nir Shavit about Neural Magic and the benefits of using sparsification techniques for deep learning models Interview Introductions How did you get introduced to Python? 
Can you describe what Neural Magic is and the story behind it? What are the attributes of deep learning architectures that influence the bias toward GPU hardware for training them? What are the mathematical aspects of neural networks that have biased the current generation of software tools toward that architectural style? How does sparsifying a network architecture allow for improved performance on commodity CPU architectures? What is involved in converting a dense neural network into a sparse network? Can you describe the components of the Neural Magic architecture and how they are used together to reduce the footprint of deep learning architectures and accelerate their performance on CPUs? What are some of the goals or design approaches that have changed or evolved since you first began working on the Neural Magic platform? For someone who has an existing model defined, what is the process to convert it to run with the DeepSparse engine? What are some of the options for applications of deep learning that are unlocked by enabling the models to train and run without GPU or other specialized hardware? The current set of components for Neural Magic is either open source or free to use. What is your long-term business model, and how are you approaching governance of the open source projects? What are the most interesting, innovative, or unexpected ways that you have seen Neural Magic and model sparsification used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Neural Magic? When is Neural Magic or sparse networks the wrong choice? What do you have planned for the future of Neural Magic? Keep In Touch Research Overview LinkedIn Picks Tobias The Tick TV show Nir Bauhaus documentary Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. 
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Neural Magic MIT Computational Neurobiology 6.006 MIT Course FLOPS == FLoating point OPerations per Second Perceptron Convolutional Neural Network Lisp Quantization of ML YOLO ML Model Federated Learning Podcast Episode Reinforcement Learning GPT-3 OpenAI Transfer Learning Podcast Episode about Transfer Learning for NLP Tensor Columns Neural Magic DeepSparse Engine ONNX CUDA Sparse Zoo Tab9 The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Jun 30, 2021 · 48 min

Ep 320: Finding The Core Of Python For A Bright Future With Brett Cannon

Summary Brett Cannon has been a long-time contributor to the Python language and community in many ways. In this episode he shares some of his work and thoughts on modernizing the ecosystem around the language. This includes standards for packaging, discovering the true core of the language, and how to make it possible to target mobile and web platforms. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at pythonpodcast.com/hightouch. Your host as usual is Tobias Macey and today I’m interviewing Brett Cannon about improvements in the packaging ecosystem, the promise of WebAssembly, and his recent explorations of CPython’s interpreter Interview Introductions How did you get introduced to Python? 
As a core contributor to CPython, a member of the Steering Council, and the team lead for VSCode’s Python extension, what are your current areas of focus for the language? One of the PEPs that you were involved with recently introduced the pyproject.toml file for simplifying the work of building Python packages. Can you share some of the background behind that work and the goals that you had for it? Since its introduction a lot of people have co-opted that file for other project configuration. What was your reaction to that, and if you had foreseen that usage what might you have changed or added in the PEP to account for it? What are the long term impacts on the packaging ecosystem that you anticipate with the standardization efforts that are happening? Another area where there is a lot of attention right now is being able to target additional deployment environments such as the browser, with WebAssembly, and mobile devices, with projects like Briefcase and Kivy. You had a recent post where you posed some questions about the true nature of Python and the possibility of removing pieces of it to simplify building for these other runtimes. What is your personal sense of the minimal set of features that we need for something to still be Python? How have projects such as MicroPython and Pyodide influenced your thinking on the matter? You have also recently been writing a series of articles about the implementation details of different syntactic elements of Python. What was your inspiration for that? What are some of the interesting or surprising details that you encountered while unwrapping the way that the interpreter handles those syntactic elements? How have those explorations helped you in your efforts to identify the core of Python? Recent releases of Python have brought in some substantial changes to the interpreter and new language features (e.g. PEG parser, pattern matching). What are some of the other large initiatives that you are keeping track of? 
What are your personal goals for the near to medium term future of Python? What are the most interesting, unexpected, or challenging lessons that you have learned while working on the Python language and related tooling? If you were to redesign Python today, what are some of the things that you would do differently? Keep In Touch brettcannon on GitHub @brettsky on Twitter Blog Picks Tobias Cold Brew Iced Tea Loki on Disney+ Brett Rich Textual The physics facts included in all of the Python 3.10 release announcements, e.g. you will never see a green star Links Brett’s Blog Python VSCode Extension Python Steering Council Python Package Authority UC Berkeley Vancouver, BC Squamish, Musqueam, Tsleil-Waututh First Nations Pascal Python C O’Reilly PyCon US 2021 Steering Council Keynote Python Developer-In-Residence PSF Visionary Sponsorship Setuptools Pip Python Wheels PyPI PEP 518 PEP 517 PEP 621 pyproject.toml Flit Enscons PyPA Build PyOxidizer Pex Shiv cx_Freeze cibuildwh

Jun 23, 2021 · 1h 3m

Ep 319: Traversing The Challenges And Promise Of Graph Machine Learning

Summary The foundation of every ML model is the data that it is trained on. In many cases you will be working with tabular or unstructured information, but there is a growing trend toward networked, or graph data sets. Benedek Rozemberczki has focused his research and career around graph machine learning applications. In this episode he discusses the common sources of networked data, the challenges of working with graph data in machine learning projects, and describes the libraries that he has created to help him in his work. If you are dealing with connected data then this interview will provide a wealth of context and resources to improve your projects. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. 
Get started for free at pythonpodcast.com/hightouch. Your host as usual is Tobias Macey and today I’m interviewing Benedek Rozemberczki about his work on machine learning for graph data, including a variety of libraries to support his efforts Interview Introductions How did you get introduced to Python? Can you start by giving an overview of when you might want to do machine learning on networked/graph data? How do networked data sets change the way that you approach machine learning tasks? Can you describe the current state of the ecosystem for machine learning on graphs? You have created a number of libraries to address different aspects of machine learning on graphs. Can you list them and share some of the stories behind their creation? How do the different tools relate to each other? Can you talk through some of the structural and user experience design principles that you lean on when building these libraries? When you are working with networked data sets, what is your current workflow from idea to completion? What are the most difficult aspects of working with networked data sets for machine learning applications? What are the most interesting, innovative, or unexpected ways that you have seen graph ML used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on graph ML problems? What are some examples of when you would choose not to use some or all of your own libraries? What do you have planned for the future of your libraries/what new libraries do you anticipate needing to build? Keep In Touch benedekrozemberczki on GitHub @benrozemberczki on Twitter LinkedIn Picks Tobias Wrath of Man Benedek Hunt for the Wilderpeople Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. 
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Karate Club PyTorch Geometric Temporal AstraZeneca Budapest University of Edinburgh Matlab R Bipartite Graph Node Classification Graph Classification PyTorch Podcast Episode PyTorch Geometric DGL (Deep Graph Library) Parametric Machine Learning graph-tool Jax NetworkX Little Ball of Fur GCN == Graph Convolutional Network NetworKit Gensim Podcast Episode Nvidia cuGraph Random Walk scikit-learn MalNet Graph Representation Learning by William Hamilton The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Jun 16, 2021 · 47 min

Ep 318: Keep Your Analytics Lint Free With SQLFluff

Summary The growth of analytics has accelerated the use of SQL as a first class language. It has also grown the amount of collaboration involved in writing and maintaining SQL queries. With collaboration comes the inevitable variation in how queries are written, both structurally and stylistically, which can lead to a significant amount of wasted time and energy during code review and employee onboarding. Alan Cruickshank was feeling the pain of this wasted effort first-hand which led him down the path of creating SQLFluff as a linter and formatter to enforce consistency and find bugs in the SQL code that he and his team were working with. In this episode he shares the story of how SQLFluff evolved from a simple hackathon project to an open source linter that is used across a range of companies and fosters a growing community of users and contributors. He explains how it has grown to support multiple dialects of SQL, as well as integrating with projects like dbt to handle templated queries. This is a great conversation about the long detours that are sometimes necessary to reach your original destination and the powerful impact that good tooling can have on team productivity. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! 
We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial. Your host as usual is Tobias Macey and today I’m interviewing Alan Cruickshank about SQLFluff, a dialect-flexible and configurable SQL linter Interview Introductions How did you get introduced to Python? Can you describe what SQLFluff is and the story behind it? SQL is one of the oldest programming languages that is still in regular use. Why do you think that there are so few linters for it? Who are the target users of SQLFluff and how do those personas influence the design and user experience of the project? What are some of the characteristics of SQL and how it is used that contribute to readability/comprehension challenges? What are some of the additional difficulties that are introduced by templating in the queries? How is SQLFluff implemented? How have the goals and design of the project changed since you first began working on it? How do you handle support of varying SQL dialects without undue maintenance burdens? What are some of the stylistic elements and strategies for making SQL code more maintainable? What are some strategies for making queries self-documenting? What are some signs that you should document it anyway? What are some of the kinds of bugs that you are able to identify with SQLFluff? What are some of the resources/references that you relied on for identifying useful linting rules? 
What are some methods for measuring code quality in SQL? What are the most interesting, innovative, or unexpected ways that you have seen SQLFluff used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on SQLFluff? When is SQLFluff the wrong choice? What do you have planned for the future of SQLFluff? Keep In Touch alanmcruickshank on GitHub Website LinkedIn Picks Tobias The Nevers Alan Lost Connections: Uncovering the Real Causes of Depression – and the Unexpected Solutions by Johann Hari (affiliate link) The Wim Hof Method by Wim Hof Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review

Jun 9, 2021 · 1h 13m

Ep 317: Exploring The Patterns And Practices For Deep Learning With Andrew Ferlitsch

Summary Deep learning is gaining an immense amount of popularity due to the incredible results that it is able to offer with comparatively little effort. Because of this there are a number of engineers who are trying their hand at building machine learning models with the wealth of frameworks that are available. Andrew Ferlitsch wrote a book to capture the useful patterns and best practices for building models with deep learning to make it more approachable for newcomers to the field. In this episode he shares his deep expertise and extensive experience in building and teaching machine learning across many companies and industries. This is an entertaining and educational conversation about how to build maintainable models across a variety of applications. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. 
With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial. Scaling your data infrastructure is hard. Maintaining data quality standards as you scale is harder. Databand solves this. Their Unified Data Observability platform gives data engineers visibility over their stack without changing existing pipeline code. Get end-to-end visibility on your pipelines, and identify the root cause of issues before bad data is delivered. Seamlessly integrate with over 20 tools like Apache Airflow, Spark, Snowflake, and more. Use customizable dashboards to see where pipelines are broken and how that impacts delivery downstream. Get alerts on leading indicators of pipeline failure. Open up your pipeline and see exactly which code strings are broken – so you can fix the issue immediately. Create more reliable data products. Go to pythonpodcast.com/databand today to start your free trial! Your host as usual is Tobias Macey and today I’m interviewing Andrew Ferlitsch about the patterns and practices for deep learning applications Interview Introductions How did you get introduced to Python? Can you start by describing the major elements of a model architecture? What is the relationship between the specific learning task being addressed and the architecture of the learning network? In your experience, what is the level of awareness of a typical ML engineer or data scientist with respect to the most current design patterns in deep learning? You’re currently working on a book about deep learning patterns and practices. What was your motivation for starting that project? What are your goals for the book? How have advancements in the operability of machine learning influenced the ways that the models are designed and trained? 
How do recent approaches such as transfer learning impact the needs of the supporting tools and infrastructure? Can you describe the different design patterns that you cover in your book and the selection process for when and how to apply them? What are the aspects of bringing deep learning to production that continue to be a challenge? What are some of the emerging practices that you are optimistic about? What are some of the industry trends or areas of current research that you are most excited about? What are the most interesting, innovative, or unexpected patterns that you have encountered? What are the most interesting, unexpected, or challenging lessons that you have learned while working on the book? What are some of the other resources that you recommend for listeners to learn more about how to build production ready models? Keep In Touch LinkedIn @AndrewFerlitsch on Twitter andrewferlitsch on GitHub Picks Tobias Designing Data Intensive Applications (affiliate link) Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest o

Jun 2, 2021 · 44 min

Ep 316: Automatically Generate Your Unit Tests From Scratch With Pynguin

Summary Unit tests are an important tool to ensure the proper functioning of your application, but writing them can be a chore. Stephan Lukasczyk wants to reduce the monotony of the process for Python developers. As part of his PhD research he created the Pynguin project to automate the creation of unit tests. In this episode he explains the complexity involved in generating useful tests for a dynamic language, how he has designed Pynguin to address the challenges, and how you can start using it today for your own work. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial. 
Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at pythonpodcast.com/hightouch. Your host as usual is Tobias Macey and today I’m interviewing Stephan Lukasczyk about Pynguin, the PYthoN General UnIt test geNerator Interview Introductions How did you get introduced to Python? Can you describe what Pynguin is and the story behind it? What are the problems that Pynguin is designed to solve? What other projects are you drawing inspiration from? What are some of the use cases for automatic test generation? How is Pynguin implemented? What are the challenges that the dynamic nature of Python introduces? What are some of the packages and libraries that have been most helpful while building Pynguin? Can you talk through the workflow of using Pynguin to generate tests for a project? What are some of the limitations on what kinds of projects Pynguin can be used for? What are some design or implementation strategies in the code that you are generating tests for that will help make Pynguin’s job easier? Once a test suite has been created, what are the next steps? What are some of the initial assumptions or goals of the project that have been revised or challenged once you began implementing it? What are the most interesting, innovative, or unexpected ways that you have seen Pynguin used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pynguin? When is Pynguin the wrong choice? 
What do you have planned for the future of Pynguin? Keep In Touch Related to Pynguin: best via GitHub Find me on Twitter Picks Tobias Concourse CI Stephan Cycling Take care of your health Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Pynguin University of Passau Passau, Germany Evosuite Hypothesis Podcast Episode Astor Walrus Operator MyPy Podcast Episode Pytest Podcast Episode UnitTest Bytecode library Pytype Monkeytype Podcast Episode Atheris from Google – coverage-guided fuzzing Blog series about “Python behind the scenes”:

May 25, 2021 · 57 min

Ep 315: Leveling Up Natural Language Processing with Transfer Learning

Summary Natural language processing is a powerful tool for extracting insights from large volumes of text. With the growth of the internet and social platforms, and the increasing number of people and communities conducting their professional and personal activities online, the opportunities for NLP to create amazing insights and experiences are endless. In order to work with such a large and growing corpus it has become necessary to move beyond purely statistical methods and embrace the capabilities of deep learning, and transfer learning in particular. In this episode Paul Azunre shares his journey into the application and implementation of transfer learning for natural language processing. This is a fascinating look at the possibilities of emerging machine learning techniques for transforming the ways that we interact with technology. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? 
Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial. Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at pythonpodcast.com/hightouch. Your host as usual is Tobias Macey and today I’m interviewing Paul Azunre about using transfer learning for natural language processing Interview Introductions How did you get introduced to Python? Can you start by explaining what transfer learning is? How is transfer learning being applied to natural language processing? What motivated you to write a book about the application of transfer learning to NLP? What are some of the applications of NLP that are impractical or intractable without transfer learning? At a high level, what are the steps for building a new language model via transfer learning? There have been a number of base models created recently, such as BERT, ERNIE, ELMo, GPT-3, etc. What are the factors that need to be considered when selecting which model to build from? If there are multiple models that contain the seeds for different aspects of the end goal that you are trying to obtain, what is the feasibility of extracting the relevant capabilities from each of them and combining them in the final model? 
What are some of the tools or frameworks that you have found most useful while working with NLP and transfer learning? How would you characterize the current state of the ecosystem for transfer learning and deep learning techniques applied to NLP problems? What are the most interesting, innovative, or unexpected applications of transfer learning with NLP that you have seen? What are the most interesting, unexpected, or challenging lessons that you have learned while working on the book? When is transfer learning the wrong choice for an NLP project? What are the trends or techniques that you are most excited for? Keep In Touch LinkedIn Website @pazunre on Twitter Picks Tobias Infected Mushroom Paul Tenet Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcasti

May 18, 2021 · 46 min

Ep 314: Federated Learning For All With Flower

Summary Machine learning is a tool that has typically been performed on large volumes of data in one place. As more computing happens at the edge on mobile and low power devices, the learning is being federated which brings a new set of challenges. Daniel Beutel co-created the Flower framework to make federated learning more manageable. In this episode he shares his motivations for starting the project, how you can use it for your own work, and the unique challenges and benefits that this emerging model offers. This is a great exploration of the federated learning space and a framework that makes it more approachable. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. 
Go to pythonpodcast.com/census today to get a free 14-day trial. Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at pythonpodcast.com/hightouch. Your host as usual is Tobias Macey and today I’m interviewing Daniel Beutel about Flower, a framework for building federated learning systems Interview Introductions How did you get introduced to Python? Can you start by describing what federated learning is? What is Flower and what’s the story behind it? What are the trade-offs between federated and centralized models of machine learning? What are some of the types of use cases or workloads that federated learning is used for? Federated learning appears to be a growing area of interest. How would you characterize the current state of the ecosystem? What are the most complex or challenging aspects of federating model training? How does Flower simplify the process of distributing the model training process? Can you describe how Flower is implemented? How have the goals and/or design of Flower changed or evolved since you first began working on it? One of the design principles that you list is "understandability". What are some of the ways that that manifests in the project? It also mentions extensibility. What are the interfaces that Flower exposes for integration or extending its capabilities? For someone who has an existing project that runs in a centralized manner, what are some indicators that a federated approach would be beneficial? 
What is involved in translating the existing project to run in a federated fashion using Flower? What is involved in building a production ready system with Flower? How does your work at Adap inform the design and product direction for Flower? What are some of the most interesting, innovative, or unexpected ways that you have seen Flower used? What are the most interesting, unexpected, or challenging lessons that you have learned from your work on and with Flower? When is Flower the wrong choice? What do you have planned for the future of the project? Keep In Touch LinkedIn danieljanes on GitHub @daniel_janes on Twitter Picks Tobias Rummy Card Game Daniel Stand Up Paddling Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us

May 11, 2021 · 1h 1m

Ep 313: Data Exploration and Visualization Made Effortless with Lux

Summary Data exploration is an important step in any analysis or machine learning project. Visualizing the data that you are working with makes that exploration faster and more effective, but having to remember and write all of the code to build a scatter plot or histogram is tedious and time consuming. In order to eliminate that friction Doris Lee helped create the Lux project, which wraps your Pandas data frame and automatically generates a set of visualizations without you having to lift a finger. In this episode she explains how Lux works under the hood, what inspired her to create it in the first place, and how it can help you create a better end result. The Lux project is a valuable addition to the toolbox of anyone who is doing data wrangling with Pandas. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. 
With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial. Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at pythonpodcast.com/hightouch. Your host as usual is Tobias Macey and today I’m interviewing Doris Lee about Lux, a Python library that facilitates fast and easy data exploration by automating the visualization and data analysis process Interview Introductions How did you get introduced to Python? Can you start by describing what Lux is and how the project got started? What is the role of visualization in a data science workflow? What are the challenges that data scientists face in the exploratory phase of their analysis? There are a wide variety of data visualization tools in the Python ecosystem with differing areas of focus. What is the role of Lux in that ecosystem? How does Lux compare to tools such as scikit-yb? What is the workflow for someone using Lux in their analysis and what problems does it solve for them? Can you talk through how Lux is architected? How have the goals and design of Lux changed or evolved since you first began working on it? Data visualization is a broad field. How do you determine which kinds of charts or plots are best suited to a particular data set or exploration? What are some of the capabilities of Lux that are often overlooked or underutilized? 
How has Lux impacted your own work in data analysis/data science? What are some of the other gaps that you see in the available tooling for data science? What are some of the most interesting, innovative, or unexpected ways that you have seen Lux used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on and with Lux? When is Lux the wrong choice? What do you have planned for the future of the project? Keep In Touch dorisjlee on GitHub Website LinkedIn Picks Tobias Pirates of the Caribbean movies Doris Snake Wrangling for Kids Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story. To help other people find the show please leave a

May 4, 2021 · 51 min

Ep 312: Extensible Open Source Authorization For Your Applications With Oso

Summary Any project that is used by more than one person will eventually need to handle permissions for each of those users. It is certainly possible to write that logic yourself, but you’ll almost certainly do it wrong at least once. Rather than waste your time fighting with bugs in your authorization code it makes sense to use a well-maintained library that has already made and fixed all of the mistakes so that you don’t have to. In this episode Sam Scott shares the Oso framework to give you a clean separation between your authorization policies and your application code. He explains how you can call a simple function to ask if something is allowed, and then manage the complex rules that match your particular needs as a separate concern. He describes the motivation for building a domain specific language based on logic programming for policy definitions, how it integrates with the host language (such as Python), and how you can start using it in your own applications today. This is a must listen even if you never use the project because it is a great exploration of all of the incidental complexity that is involved in permissions management. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! 
We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial. Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at pythonpodcast.com/hightouch. Your host as usual is Tobias Macey and today I’m interviewing Sam Scott about Oso, an open source library for managing authorization in your applications Interview Introductions How did you get introduced to Python? Can you start by describing what Oso is and the story behind it? What was missing from the ecosystem of authorization libraries/frameworks that motivated you to create a new one? What are some of the most common mistakes that you see developers make when implementing authorization logic? At a high level, what is the process of using Oso to add access control policies to a piece of software? What is the motivation for using a DSL for defining policies as opposed to writing those definitions in pure Python? 
How have you approached the design of the policy language, particularly deciding what constraints to impose? What other policy frameworks or dialects have you drawn inspiration from? How is the Oso framework implemented? How have the goals and design of Oso changed or evolved since you first began working on it? What are some useful design patterns for integrating Oso into an application? How does the type of application (e.g. web app vs. system daemon, etc.) affect the ways that Oso is used? Given that Oso supports multiple language runtimes, what is involved in defining and enforcing policies that span multiple processes? (e.g. Python backend and Javascript frontend, Python microservice communicating with Go microservice, etc.) What are some of the common mistakes or areas of confusion for users who are getting started with Oso and Polar? What are some of the capabilities of Oso that are often overlooked or misunderstood? I noticed that you’re backed by some venture firms. What is your current product vision and how does that relate to your current open

Apr 27, 2021 · 51 min

Ep 311: Teaching Geeks The Value And Skills Of Public Speaking

Summary Being able to present your ideas is one of the most valuable and powerful skills to have as a professional, regardless of your industry. For software engineers it is especially important to be able to communicate clearly and effectively because of the detail-oriented nature of the work. Unfortunately, many people who work in software are more comfortable in front of the keyboard than a crowd. In this episode Neil Thompson shares his story of being an accidental public speaker and how he is helping other engineers start down the road of being effective presenters. He discusses the benefits for your career, how to build the skills, and how to find opportunities to practice them. Even if you never want to speak at a conference, it’s still worth your while to listen to Neil’s advice and find ways to level up your presentation and speaking skills. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? 
Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial. Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at pythonpodcast.com/hightouch. Your host as usual is Tobias Macey and today I’m interviewing Neil Thompson about the value of public speaking skills as a developer and how to gain them Interview Introductions How did you get into engineering? Can you start by discussing the different types of public speaking that we are talking about and some of the different venues where it might take place? How did you get into public speaking? What are some of the ways that our speaking abilities can impact the value that we provide and the trajectory of our career as engineers? What were some of the methods and resources that you used to improve your own public speaking skills? What are the common mistakes that people make when speaking to a group? What are some of the non-obvious ways that speaking skills can be useful as an engineer? What was your approach to learning how to be an effective speaker? What are some of the mis-steps or dead ends that you encountered? What are the different skills or capabilities that are necessary for being an effective presenter? What are some ways that engineers can practice their presentation skills? 
How do different audiences/venues influence the approach that you take to how to prepare for a presentation? How has your experience in public speaking factored into the work you do for your podcast? What are some of the most interesting, innovative, or unexpected presentations or speaking techniques that you have seen or used/created? What are the most interesting, unexpected, or challenging lessons that you have learned from speaking and teaching others to speak in a professional context? What resources do you recommend for engineers who want to improve their speaking and presenting skills? Keep In Touch LinkedIn @neil_i_thompson on Twitter Picks Tobias Falcon and the Winter Soldier Neil Teach The Geek To Speak Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you

Apr 20, 2021 · 42 min

Ep 310: Let The Robots Do The Work Using Robotic Process Automation with Robocorp

Summary One of the great promises of computers is that they will make our work faster and easier, so why do we all spend so much time manually copying data from websites, or entering information into web forms, or any of the other tedious tasks that take up our time? As developers our first inclination is to "just write a script" to automate things, but how do you share that with your non-technical co-workers? In this episode Antti Karjalainen, CEO and co-founder of Robocorp, explains how Robotic Process Automation (RPA) can help us all cut down on time-wasting tasks and let the computers do what they’re supposed to. He shares how he got involved in the RPA industry, his work with Robot Framework and RPA framework, how to build and distribute bots, and how to decide if a task is worth automating. If you’re sick of spending your time on mind-numbing copy and paste then give this episode a listen and then let the robots do the work for you. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? 
But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial. Software is read more than it is written, so complex and poorly organized logic slows down everyone who has to work with it. Sourcery makes those problems a thing of the past, giving you automatic refactoring recommendations in your IDE or text editor while you write (I even have it working in Emacs). It isn’t just another linting tool that nags you about issues. It’s like pair programming with a senior engineer, finding and applying structural improvements to your functions so that you can write cleaner code faster. Best of all, listeners of Podcast.__init__ get 6 months of their Pro tier for free if you go to pythonpodcast.com/sourcery today and use the promo code INIT when you sign up. Your host as usual is Tobias Macey and today I’m interviewing Antti Karjalainen about the RPA Framework for automating your daily tasks and his work at Robocorp to manage your robots in production Interview Introductions How did you get introduced to Python? Can you start by giving an overview of what Robotic Process Automation is? What are some of the ways that RPA might be used? What are the advantages over writing a custom library or script in Python to automate a given task? How does the functionality of RPA compare to automation services like Zapier, IFTTT, etc.? What are you building at Robocorp and what was your motivation for starting the business? Who is your target customer and how does that inform the products that you are building? Can you give an overview of the state of the ecosystem for RPA tools and products and how Robocorp and RPA framework fit within it? 
How does the RPA Framework relate to Robot Framework? What are some of the challenges that developers and end users often run into when trying to build, use, or implement an RPA system? How is the RPA framework itself implemented? How has the design of the project evolved since you first began working on it? Can you talk through an example workflow for building a robot? Once you have built a robot, what are some of the considerations for local execution or deploying it to a production environment? How can you chain together multiple robots? What is involved in extending the set of operations available in the framework? What are the available integration points for plugging a robot written with RPA Framework into another Python project? What are the dividing lines between RPA Framework and Robocorp? How are you handling the governance of the open source project? What are some of the most interesting, innovative, or unexpected ways that you have seen RPA Framework and the Robocorp platfor

Apr 13, 2021 · 45 min

Ep 309: Keep Your Code Clean And Maintainable Using Static Analysis With Flake8

Summary When you are writing code it is all too easy to introduce subtle bugs or leave behind unused code. Unused variables, unused imports, overly complex logic, etc. If you are careful and diligent you can find these problems yourself, but isn’t that what computers are supposed to help you with? Thankfully Python has a wealth of tools that will work with you to keep your code clean and maintainable. In this episode Anthony Sottile explores Flake8, one of the most popular options for identifying those problematic lines of code. He shares how he became involved in the project and took over as maintainer and explains the different categories of code quality tooling and how Flake8 compares to other static analyzers. He also discusses the ecosystem of plugins that have grown up around it, including some detailed examples of how you can write your own (and why you might want to). Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? 
But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial. Your host as usual is Tobias Macey and today I’m interviewing Anthony Sottile about Flake8 Interview Introductions How did you get introduced to Python? Can you start by giving an overview of what Flake8 is and how you got involved with the project? There are a variety of tools available for checking or enforcing code quality. How would you characterize Flake8 in comparison to the other options? What do you see as the motivating factors for individuals or teams to integrate static analysis/linting in their toolchain and workflow? What are some of the challenges that might prevent someone from adopting something like Flake8? How can developers add Flake8 to an existing project without spending hours or days fixing all of the violations? Can you describe the overall design and implementation of Flake8? How has the design and goals of the project changed or evolved? There are a wide array of plugins for Flake8. What is involved in adding new functionality or linting rules? What capabilities does Flake8 provide that make it a viable platform for building plugins? What are some of the limitations of Flake8 as a platform? What do you see as the factors that have contributed to the widespread usage of Flake8 and the large number of available plugins? What challenges does that pose as a maintainer of Flake8? What are some of the other tools that you see developers use alongside Flake8 to help manage code quality and style enforcement? What are some of the most interesting, innovative, or unexpected ways that you have seen Flake8 and its plugin ecosystem used? 
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Flake8? When is Flake8 the wrong choice? What do you have planned for the future of Flake8? Keep In Touch @codewithanthony on Twitter asottile on GitHub LinkedIn Picks Tobias SEVENEVES by Neal Stephenson Anthony pre-commit CI Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Flake8 PyFlakes PyCodestyle McCabe pre-commit Podcast Episode PEP 484 MyPy Pylance Pyright Pylint Black yapf autopep8 pyupgrade isort reorder-python-imports Static Analysis pydocst

Apr 6, 2021 · 49 min

Ep 308 Make Your Code More Readable With The Magic Of Refactoring Using Sourcery

Summary Writing code that is easy to read and understand will have a lasting impact on you and your teammates over the life of a project. Sometimes it can be difficult to identify opportunities for simplifying a block of code, especially if you are early in your journey as a developer. If you work with senior engineers they can help by pointing out ways to refactor your code to be more readable, but they aren’t always available. Brendan Maginnis and Nick Thapen created Sourcery to act as a full time pair programmer sitting in your editor of choice, offering suggestions and automatically refactoring your Python code. In this episode they share their journey of building a tool to automatically find opportunities for refactoring in your code, including how it works under the hood, the types of refactoring that it supports currently, and how you can start using it in your own work today. It always pays to keep your tool box organized and your tools sharp and Sourcery is definitely worth adding to your repertoire. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? 
Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial. Your host as usual is Tobias Macey and today I’m interviewing Nick Thapen and Brendan Maginnis about Sourcery, an advanced refactoring engine that cleans up your code as you work Interview Introductions How did you get introduced to Python? Can you start by giving an overview of what Sourcery is? What was your motivation for building a system for performing automated refactoring? What are your goals and priorities with Sourcery? There are a number of services that aim to automate portions of the developer workflow, such as code completions, quality checks, refactoring, etc. What was lacking in the existing tooling that made Sourcery a necessary project? How does Sourcery compare with some of the other services that offer AI or ML powered assistance? (e.g. Kite, Tab9, Codata(?)) What was your reasoning for focusing solely on Python for your refactoring, rather than trying to support multiple language targets? Can you give some examples of the types of refactoring that you are able to automate? Can you describe how Sourcery is implemented? What are some of the ways that the system has changed or evolved in design and/or scope? What are some examples of the types of refactorings that Sourcery is ill-suited for and which still require manual intervention? What is involved in adding support for a new editor? How much variation is there in terms of implementation or available functionality across editors? How has the introduction of the Language Server Protocol influenced your approach to editor integration? 
What are some of the most interesting, unexpected, or challenging lessons that you have learned while working on Sourcery? When is Sourcery the wrong choice? What do you have planned for the future of Sourcery? Keep In Touch Nick LinkedIn @nthapen on Twitter Brendan LinkedIn @brendan_m6s on Twitter brendanator on GitHub Picks Tobias The Croods: New Age Nick The Magicians TV Series Brendan David Copperfield by Charles Dickens Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Li

Mar 30, 2021 · 1h 0m

Ep 307 Be Data Driven At Any Scale With Superset

Summary Becoming data driven is the stated goal of a large and growing number of organizations. In order to achieve that mission they need a reliable and scalable method of accessing and analyzing the data that they have. While business intelligence solutions have been around for ages, they don’t all work well with the systems that we rely on today and a majority of them are not open source. Superset is a Python powered platform for exploring your data and building rich interactive dashboards that gets the information that your organization needs in front of the people that need it. In this episode Maxime Beauchemin, the creator of Superset, shares how the project got started and why it has become such a widely used and popular option for exploring and sharing data at companies of all sizes. He also explains how it functions, how you can customize it to fit your specific needs, and how to get it up and running in your own environment. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? 
But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial. Your host as usual is Tobias Macey and today I’m interviewing Max Beauchemin about Superset, an open source platform for data exploration and visualization Interview Introductions How did you get introduced to Python? Can you start by giving an overview of what Superset is and what it might be used for? What problem were you trying to solve when you created it? What tools or platforms did you consider before deciding to build something new? There are a few different ways that someone might categorize Superset, such as business intelligence, data exploration, dashboarding, data visualization. How would you characterize it and how it fits in the current state of the industry and ecosystem? What are some of the lessons that you have learned from your work on Airflow that you applied to Superset? Can you give an overview of how Superset is implemented? How have the goals, design and architecture evolved since you first began working on it? Given its origin as a hackathon project the choice of Python seems natural. What are some of the challenges that choice has posed over the life of the project? If you were to start the whole project over today what might you do differently? Can you describe what’s involved in getting started with a new setup of Superset? What are the available interfaces and integration points for someone who wants to extend it or add new functionality? What are some of the most often overlooked, misunderstood, or underused capabilities of Superset? 
One of the perennial challenges with a tool that allows users to build data visualizations is the potential to build dashboards or charts that are visually appealing but ultimately meaningless or wrong. How much guidance does Superset provide in helping to select a useful representation of the data? In addition to being the original author and a project maintainer you have also started a company to offer Superset as a service. What are your goals with that business and what is the opportunity that it provides? What are some of the most interesting, innovative, or unexpected ways that you have seen Superset used? What are the most interesting, unexpected, or challenging lessons that you have learned while building and growing the Superset project and community? When is Superset the wrong choice? What do you have planned for the future of Superset and Preset? Keep In Touch LinkedIn @mistercrunch on Twitter mistercrunch on GitHub Picks Tobias SOPS Max Frank Zappa Documentary Accelerate: The Science of Lean Software and DevOps Closing Announcements Thank you for listening! Don’t forget to check out our

Mar 22, 2021 · 47 min

Ep 306 Practical Advice On Using Python To Power A Business

Summary Python is a language that is used in almost every imaginable context and by people from an amazing range of backgrounds. A lot of the people who use it wouldn’t even call themselves programmers, because that is not the primary focus of their job. In this episode Chris Moffitt shares his experience writing Python as a business user. In order to share his insights and help others who have run up against the limits of Excel he maintains the site Practical Business Python where he publishes articles that help introduce newcomers to Python and explain how to perform tasks such as building reports, automating Excel files, and doing data analysis. This is a great conversation that illustrates how useful it is to learn Python even if you never intend to write software professionally. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. 
With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial. Your host as usual is Tobias Macey and today I’m interviewing Chris Moffitt about how Python is used to help manage business needs and processes and his work to share advice on this topic at Practical Business Python Interview Introductions How did you get introduced to Python? Can you start by giving an overview of your mission at Practical Business Python? What was your inspiration for starting the site and what keeps you motivated? What are some of the kinds of problems that a business user is looking to solve for themselves? Why is Python a viable tool for a business user to become familiar with? How would you characterize the difference between the ways that a software engineer and a business user approach Python? What do you see as the tipping point of complexity or time investment past which a business user will pass a given project on to a software engineer? How much familiarity with adjacent concerns such as version control, software design, etc. do you consider useful for a business user? What are some of the ways that you use Python in your day-to-day? What are some of the onramps for integrating Python into a user’s workflow? What are some common stumbling blocks that business users run into when getting started with Python? What are some of the most interesting, innovative, or impressive ways that you have seen Python employed by business users? What are some of the most interesting, unexpected, or challenging lessons that you have learned while working on the Practical Business Python site? What are some cases where you would advocate for a tool other than Python for a business use case? What do you have planned for the future of the site? 
Keep In Touch LinkedIn chris1610 on GitHub @chris1610 on Twitter Picks Tobias The Data Science Roundup Newsletter This Week In Data Newsletter Chris Moffitt Line Of Duty BBC Series Out Of The Dark by David Weber Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Practical Business Python blog Electrical Engineering Unix Perl Data Science Django Raspberry Pi Pandas Excel VBA == Visual Basic for Applications VSCode Excel PowerFX Pathlib Conda Python Wheels PEP 582 SAP Salesforce Tableau Prophet library for timeseries forecasting Talk Python

Mar 16, 2021 · 49 min

Ep 305 Analyzing The Ecosystem of Python Data Companies With Tony Liu

Summary There are a large and growing number of businesses built by and for data science and machine learning teams that rely on Python. Tony Liu is a venture investor who is following that market closely and betting on its continued success. In this episode he shares his own journey into the role of an investor and discusses what he is most excited about in the industry. He also explains what he looks at when investing in a business and gives advice on what potential founders and early employees of startups should be thinking about when starting on that journey. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Tony Liu about his perspectives on the landscape of Python in the data ecosystem from his role as an investor Interview Introductions How did you get introduced to Python? Can you start by sharing your background in the data ecosystem? What led you to your current role as a venture investor? What is your current area of focus in your investments? What do you see as the major strengths of Python in the current landscape for data and analytics? What are the areas where the ecosystem is still lacking? Where are you seeing growth in the space and what do you see as the motivating factors? 
As an investor, what are the qualities that you look for in a startup that is trying to compete in the data ecosystem? What is your process for learning about and identifying companies that demonstrate the potential to succeed? Do you focus on a particular problem domain and research a grouping of companies that are focused on that problem, or do you start from a given company to determine where to place your bets? How has COVID changed the competitive landscape? Can you share some of the companies that you have invested in? What was notable about their respective businesses that provided you with the confidence that they were worth investing in? What are some of the most interesting, unexpected, or challenging lessons that you have learned from your experience as a venture investor? What are some of the companies that you are keeping a close eye on, whether as potential investments or as competitors to your existing portfolio? What are some of the problem spaces that you would like to see companies try to tackle? What advice do you have for engineers who might be considering building a new business? Do you have any advice for engineers who are working at a startup as to how best to compete in the current market? Keep In Touch LinkedIn Picks Tobias The Sleepover movie What do ya do with a Bernie Sanders? music video Tony Uncut Gems Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Costanoa Ventures Sports Analytics Turo Databricks Koalas DataRobot Faust Podcast Episode Oozie Azkaban Airflow Podcast Episode Prefect Data Engineering Podcast Episode Dagster Podcast Episode Data Engineering Podcast Episode Kubeflow MLFlow Metaflow Podcast Episode Pandas Podcast Episode Spark Data Engineering Podcast Episode DBT Data Engineering Podcast Episode SnowflakeDB Data Engineering Podcast Episode Coiled Podcast Episode Noteable Dask Data Engineering Podcast Episode Data Engineering Podcast Episode About Notebooks at Netflix The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Mar 9, 2021 · 39 min

Ep 304 Go From Notebook To Pipeline For Your Data Science Projects With Orchest

Summary Jupyter notebooks are a dominant tool for data scientists, but they lack a number of conveniences for building reusable and maintainable systems. For machine learning projects in particular there is a need for being able to pivot from exploring a particular dataset or problem to integrating that solution into a larger workflow. Rick Lamers and Yannick Perrenet were tired of struggling with one-off solutions when they created the Orchest platform. In this episode they explain how Orchest allows you to turn your notebooks into executable components that are integrated into a graph of execution for running end-to-end machine learning workflows. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Rick Lamers and Yannick Perrenet about Orchest, a development environment designed for building data science pipelines from notebooks and scripts. Interview Introductions How did you get introduced to Python? Can you start by giving an overview of what Orchest is and the story behind it? Who are the users that you are building Orchest for and what are their biggest challenges? What are some examples of the types of tools or workflows that they are using now? 
What are some of the other tools or strategies in the data science ecosystem that Orchest might replace? (e.g. MLFlow, Metaflow, etc.) What problems does Orchest solve? Can you describe how Orchest is implemented? How have the design and goals of the project changed since you first started working on it? What is the workflow for someone who is using Orchest? What are some of the sharp edges that they might run into? What is the deployable unit once a pipeline has been created? How do you handle verification and promotion of pipelines across staging and production environments? What are the interfaces available for integrating with or extending Orchest? How might an organization incorporate a pipeline defined in Orchest with the rest of their data orchestration workflows? How are you approaching governance and sustainability of the Orchest project? What are the most interesting, innovative, or unexpected ways that you have seen Orchest used? What are the most interesting, unexpected, or challenging lessons that you have learned while building Orchest? When is Orchest the wrong choice? What do you have planned for the future of the project and company? 
Keep In Touch Rick ricklamers on GitHub LinkedIn @RickLamers on Twitter Yannick yannickperrenet on GitHub LinkedIn Picks Tobias Fresh Bagels Rick Vaex Yannick Cookiecutter Pyenv Links Orchest Geoffrey Hinton Yann LeCun CoffeeScript Vim GAN == Generative Adversarial Network Git SQL BigQuery Software Carpentry Podcast Episode Google Colab Airflow Podcast Episode Kedro Data Engineering Podcast Episode nbdev Podcast Episode Papermill Data Engineering Podcast Episode MLFlow Metaflow Podcast Episode DVC Podcast Episode Andrew Ng Kubeflow Lua Caddy Traefik DAG == Directed Acyclic Graph Jupyter Enterprise Gateway Streamlit Kubernetes Dagster Podcast.__init__ Episode Data Engineering Podcast Episode DBT Data Engineering Podcast Episode GitLab Spark ETL The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Mar 2, 2021 · 44 min

Ep 303 Write Your Python Scripts In A Flow Based Visual Editor With Ryven

Summary When you are writing a script it can become unwieldy to understand how the logic and data are flowing through the program. To make this easier to follow you can use a flow-based approach to building your programs. Leon Thomm created the Ryven project as an environment for visually constructing a flow-based program. In this episode he shares his inspiration for creating the Ryven project, how it changes the way you think about program design, how Ryven is implemented, and how to get started with it for your own programs. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Leon Thomm about Ryven, a flow-based visual scripting environment for Python Interview Introductions How did you get introduced to Python? Can you start by giving an overview of what Ryven is and what inspired you to create it? What is flow-based visual scripting? What are other popular flow-based visual scripting systems out there and have they been inspiring to the project? What problem(s) do these try to solve? What are some of the places where you are drawing inspiration for Ryven? What are the kinds of projects that someone might build with Ryven? How are you using Ryven in your personal projects? 
How does structuring a project as a set of nodes in a flow graph influence the way that you think about how to design the solution to a problem? Can you describe how Ryven is implemented? How has the design or goals of the project changed or evolved since you first began working on it? For someone who wants to use Ryven to build a project can you describe their workflow? How do you handle things like code quality and tests for a Ryven project? How do you manage collaboration for a Ryven project? (e.g. version control) What are some of the most interesting, innovative, or unexpected ways that you have seen Ryven used? What are the most interesting, unexpected, or challenging lessons that you have learned while building Ryven? When is Ryven the wrong choice? What do you have planned for the future of the project? Keep In Touch leon-thomm on GitHub Picks Tobias PyInfra Leon A Universe from Nothing! by Lawrence M. Krauss Links Ryven Switzerland Qt C++ framework Flow-based Scripting Unreal Engine Node-RED IFTTT == IF This Then That DAG == Directed Acyclic Graph Mind Map Literate Programming nbdev Podcast Episode Org Mode OpenCV scikit-learn Unreal Python The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Feb 23, 2021 · 47 min

Ep 302 CrossHair: Your Automatic Pair Programmer

Summary One of the perennial challenges in software engineering is to reduce the opportunity for bugs to creep into the system. Some of the tools in our arsenal that help in this endeavor include rich type systems, static analysis, writing tests, well defined interfaces, and linting. Phillip Schanely created the CrossHair project in order to add another ally in the fight against broken code. It sits somewhere between type systems, automated test generation, and static analysis. In this episode he explains his motivation for creating it, how he uses it for his own projects, and how to start incorporating it into yours. He also discusses the utility of writing contracts for your functions, and the differences between property based testing and SMT solvers. This is an interesting and informative conversation about some of the more nuanced aspects of how to write well-behaved programs. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Phillip Schanely about CrossHair, an analysis tool for Python that blurs the line between testing and type systems. Interview Introductions How did you get introduced to Python? Can you start by giving an overview of what the CrossHair project is and how it got started? 
What are some examples of the types of tools that CrossHair might augment or replace? (e.g. Pydantic, Doctest, etc.) What are the categories of bugs or problems in your code that CrossHair can help to identify or discover? Can you explain the benefits of implementing contracts in your software? What are the limitations of contract implementations? What are the available interfaces for creating and validating contracts? How does the use of contracts in your software influence the overall design of the system? How does CrossHair compare to type systems in terms of use cases or capabilities? Can you describe how CrossHair is implemented? How has the design or goal of CrossHair changed or evolved since you first began working on it? What are some of the other projects that you have gained inspiration or ideas from while working on CrossHair? (inside or outside of the Python ecosystem) For someone who wants to get started with CrossHair, can you talk through the developer workflow? I noticed that you recently added support for validating the functional equivalency of different method implementations. What was the inspiration for that capability? What kinds of use cases does that enable? How much of CrossHair are you able to dogfood while developing CrossHair? What are some of the most interesting, innovative, or unexpected ways that you have seen CrossHair used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on CrossHair? When is CrossHair the wrong choice? What do you have planned for the future of the project? Keep In Touch pschanely on GitHub @pschanely on Twitter LinkedIn Picks Tobias The War With Grandpa Phillip Hammock chairs! (affiliate link) Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. 
If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links CrossHair NLTK == Natural Language ToolKit ACL2 Liquid Haskell SMT Solver Doctest Property Based Testing Hypothesis Podcast Episode Halting Problem Pydantic PEP 316 icontract Eiffel programming language Design By Contract Metamorphic Testing Higher Order Types Fuzz Testing The Fuzzing Book Python Audit Hooks GitHub Scientist Laboratory Python implementation of GitHub Scientist Podcast Episode Taint Analysis The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Feb 16, 2021 · 42 min

Ep 301 Giving Your Data Science Projects And Teams A Home At DagsHub

Full

Summary Collaborating on software projects is largely a solved problem, with a variety of hosted or self-managed platforms to choose from. For data science projects, collaboration is still an open question. There are a number of projects that aim to bring collaboration to data science, but they are all solving a different aspect of the problem. Dean Pleban and Guy Smoilovsky created DagsHub to give individuals and teams a place to store and version their code, data, and models. In this episode they explain how DagsHub is designed to make it easier to create and track machine learning experiments, and serve as a way to promote collaboration on open source data science projects. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Dean Pleban and Guy Smoilovsky about DagsHub, a platform to track experiments, and version data, models & pipelines for your data science and machine learning projects. Interview Introduction How did you first get introduced to Python? Can you start by describing what the DagsHub platform is and why you built it? There are a number of projects and platforms that aim to support collaboration among data scientists. 
What are the distinguishing features of DagsHub and how does it compare to the other options in the ecosystem? What are the biggest opportunities for improvement that you still see in the space of collaboration on data projects? What do you see as the biggest points of friction for building experiments and managing source data collaboratively? Can you describe how the DagsHub platform is implemented? How have the design and goals of the system changed or evolved since you first began working on it? How have your own understanding and practices of working on data science/ML projects changed? GitHub has a number of convenience features beyond just storing a git repository. What are the capabilities that you are focusing on to add value to the data science workflow within DagsHub? How are you approaching the bootstrapping problem of building a critical mass of users to be able to generate a beneficial network effect? Are there any conventions that make it easier or more familiar for newcomers to a given project? (e.g. code layout, data labeling/tagging formats, etc.) What are your recommendations for managing ownership/licensing of data assets in public projects? What are some of the most interesting, innovative, or unexpected ways that you have seen DagsHub used? What are the most interesting, unexpected, or challenging lessons that you have learned while building DagsHub? When is DagsHub the wrong choice? What do you have planned for the future of the platform and business? Keep In Touch Follow us on Twitter or LinkedIn, join our Discord, sign up to DAGsHub @DeanPlbn @Guy_T_Sky @TheRealDAGsHub DagsHub Discord Picks Tobias The Remarkable Journey of Prince Jen by Lloyd Alexander Dean Quantum Computing Since Democritus by Scott Aaronson The Expanse TV Series Guy Try to consume only the very best of available content, not the things that are coming out right now. 
Applies to textbooks, TV shows, movies Less Wrong blog Slate Star Codex \ Astral Codex Ten Avatar: The Last Airbender 3 Blue 1 Brown YouTube Channel Haskell Clojure Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links DagsHub DVC Podcast Episode Data Science Cookiecutter Jupyter Notebooks Papers With Code Connected Papers The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Feb 9, 2021 · 59 min

Ep 300 Exploring Literate Programming For Python Projects With nbdev

Full

Summary Creating well designed software is largely a problem of context and understanding. The majority of programming environments rely on documentation, tests, and code being logically separated despite being contextually linked. In order to weave all of these concerns together there have been many efforts to create a literate programming environment. In this episode Jeremy Howard of fast.ai fame and Hamel Husain of GitHub share the work they have done on nbdev. They explain how it allows you to weave together documentation, code, and tests in the same context so that it is more natural to explore and build understanding when working on a project. It is built on top of the Jupyter environment, allowing you to take advantage of the other great elements of that ecosystem, and it provides a number of excellent out of the box features to reduce the friction in adopting good project hygiene, including continuous integration and well designed documentation sites. Regardless of whether you have been programming for 5 days, 5 years, or 5 decades you should take a look at nbdev to experience a different way of looking at your code. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! 
Your host as usual is Tobias Macey and today I’m interviewing Jeremy Howard and Hamel Husain about nbdev, a library for turning Jupyter notebooks into Python libraries. Interview Introductions How did you get introduced to Python? Can you start by describing what nbdev is and the goals of the project? What is the story behind how and why it got started? Who is the target audience for the nbdev project? How does that focus influence the features and design of nbdev? What do you see as the primary challenges of building and collaborating on projects written in notebooks? What are some of the other projects that are working to simplify or improve the experience of using notebooks? How does nbdev compare to or complement those other tools? Can you describe how nbdev is implemented? How has the design and goals of the project evolved since it was first started? What is the workflow of someone who is using nbdev? At what point in the lifecycle of a notebook oriented project should someone start integrating nbdev? How does nbdev scale when working on a project that spans multiple notebooks/modules? How does working in a notebook environment change your approach to software development and project design? What are the most interesting, innovative, or unexpected ways that you have seen nbdev used? What are the most interesting, unexpected, or challenging lessons that you have learned from working on nbdev? When is nbdev the wrong choice? What do you have planned for the future of the project? Keep In Touch Jeremy LinkedIn @jeremyphoward on Twitter jph00 on GitHub Hamel hamelsmu on GitHub Website @HamelHusain on Twitter LinkedIn Picks Tobias Rivals! Frenemies Who Changed The World Jeremy Chess Hamel Moonwalking With Einstein by Joshua Foer (affiliate link) Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. 
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links nbdev fast.ai GitHub Perl Fastmail R Studio R Markdown Literate Programming fastcore JupyterLab nteract Jupyter Voilà GitHub Actions Sphinx Google Colab Working In Public by Nadia Eghbal (affiliate link) Jekyll Hugo Cython Podcast Episode The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Feb 2, 2021 · 51 min

Ep 299 Making The Sans I/O Ideal A Reality For The Websockets Library

Full

Summary Working with network protocols is a common need for software projects, particularly in the current age of the internet. As a result, there are a multitude of libraries that provide interfaces to the various protocols. The problem is that implementing a network protocol properly and handling all of the edge cases is hard, and most of the available libraries are bound to a particular I/O paradigm which prevents them from being widely reused. To address this shortcoming there has been a movement towards "sans I/O" implementations that provide the business logic for a given protocol while remaining agnostic to whether you are using async I/O, Twisted, threads, etc. In this episode Aymeric Augustin shares his experience of refactoring his popular websockets library to be I/O agnostic, including the challenges involved in how to design the interfaces, the benefits it provides in simplifying the tests, and the work needed to add back support for async I/O and other runtimes. This is a great conversation about what is involved in making an ideal a reality. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! 
Your host as usual is Tobias Macey and today I’m interviewing Aymeric Augustin about his work on the websockets library and the work involved in making it sans I/O Interview Introductions How did you get introduced to Python? Can you start by giving an overview of your work on the websockets library and how the project got started? What does "sans I/O" mean and what are the goals associated with it? Can you share the history of your work on the websockets project? What was your motivation for starting down the path of rearchitecting a project that is already production ready? Can you talk through how the websockets library is architected currently? How has the design of the project evolved since you first began working on it? At a high level, what were the changes required to make it functionally sans i/o? What do you see as the primary challenges associated with making network related libraries sans i/o? In your experience of porting websockets to be purely protocol oriented, what are the technical and design challenges that you faced? One of the goals of the Sans I/O approach is to support reusability and composability of network protocol implementations. What has your experience been as to the viability of those goals in practice? What is your current perspective on the cost/benefit of the sans i/o conversion? Who are the primary consumers of the websockets library? How do you foresee the target audience changing once you have completed extracting the protocol logic? What are some of the most interesting, innovative, or unexpected ways that you have seen the websockets project used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on the websockets project and sans i/o conversion? What do you have planned for the future of the project? Keep In Touch LinkedIn @aymericaugustin on Twitter Website Picks Tobias Jigsaw Puzzles Aymeric Inside Qonto interview Closing Announcements Thank you for listening! 
Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Sans I/O: When The Rubber Meets The Road Websockets library Websockets Protocol Qonto Tulip Asyncio CERN Particle Accelerator Sans I/O Cory Benfield HTTP/2 Twisted Curio Trio Inversion of Control ohneio helper library for implementing sans I/O network protocols SOCKS Proxy Sanic The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
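A note on the "sans I/O" idea from the summary above: the protocol object only turns incoming bytes into messages and outgoing messages into bytes, while the caller owns the socket or event loop. The sketch below illustrates that split with a made-up newline-delimited protocol. The names LineProtocol, receive_data, and encode are hypothetical, chosen for illustration only, and are not the websockets library's API.

```python
from typing import List


class LineProtocol:
    """Hypothetical newline-delimited protocol that performs no I/O itself."""

    def __init__(self) -> None:
        # Bytes received so far that do not yet form a complete message.
        self._buffer = bytearray()

    def receive_data(self, data: bytes) -> List[bytes]:
        """Feed raw bytes from any transport; return the completed messages."""
        self._buffer += data
        # Everything before the last newline is complete; the tail is kept.
        *complete, rest = bytes(self._buffer).split(b"\n")
        self._buffer = bytearray(rest)
        return complete

    def encode(self, message: bytes) -> bytes:
        """Return the bytes the caller should write to its own transport."""
        if b"\n" in message:
            raise ValueError("message must not contain a newline")
        return message + b"\n"


# Driving the protocol by hand, with no socket involved:
proto = LineProtocol()
first = proto.receive_data(b"hel")       # partial message, nothing complete yet
second = proto.receive_data(b"lo\nwor")  # completes "hello", buffers "wor"
third = proto.receive_data(b"ld\n")      # completes "world"
```

Because the class never touches a socket, the same object can be driven from asyncio, Twisted, threads, or a plain unit test simply by handing it bytes, which is the testing benefit the summary mentions.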

Jan 26, 2021 · 38 min

Ep 298 Driving Toward A Faster Python Interpreter With Pyston

Full

Summary One of the common complaints about Python is that it is slow. There are languages and runtimes that can execute code faster, but they are not as easy to be productive with, so many people are willing to make that tradeoff. There are some use cases, however, that truly need the benefit of faster execution. To address this problem Kevin Modzelewski helped to create the Pyston interpreter that is focused on speeding up unmodified Python code. In this episode he shares the history of the project, discusses his current efforts to optimize a fork of the CPython interpreter, and his goals for building a business to support the ongoing work to make Python faster for everyone. This is an interesting look at the opportunities that exist in the Python ecosystem and the work being done to address some of them. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Kevin Modzelewski about his work on Pyston, an interpreter for Python focused on compatibility and speed. Interview Introductions How did you get introduced to Python? Can you start by describing what Pyston is and how it got started? Can you share some of the history of the project and the recent changes? 
What is your motivation for focusing on Pyston and Python optimization? What are the use cases that you are primarily focused on with Pyston? Why do you think Python needs another performance project? Can you describe the technical implementation of Pyston? How has the project evolved since you first began working on it? What are the biggest challenges that you face in maintaining compatibility with CPython? How does the approach to Pyston compare to projects like PyPy and Pyjion? How are you approaching sustainability and governance of the project? What are some of the most interesting, innovative, or unexpected uses for Pyston that you have seen? What have you found to be the most interesting, unexpected, or challenging lessons that you have learned while working on Pyston? When is Pyston the wrong choice? What do you have planned for the future of the project? Keep In Touch kmod on GitHub Blog LinkedIn Picks Tobias Last Week In AWS Newsletter Kevin Meditation Calm App Headspace Links Pyston Discord Chat Dropbox CPython PyPy Pyjion Podcast Episode Jython hpy Podcast Episode JIT Compiler Python Software Foundation Podcast Episode The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Jan 19, 2021 · 44 min

Ep 297 Project Scaffolding That Evolves With Your Software Using Copier

Full

Summary Every software project has a certain amount of boilerplate to handle things like linting rules, test configuration, and packaging. Rather than recreate everything manually every time you start a new project you can use a utility to generate all of the necessary scaffolding from a template. This allows you to extract best practices and team standards into a reusable project that will save you time. The Copier project is one such utility that goes above and beyond the bare minimum by supporting project evolution, letting you bring in the changes to the source template after you already have a project that you have dedicated significant work on. In this episode Jairo Llopis explains how the Copier project works under the hood and the advanced capabilities that it provides, including managing the full lifecycle of a project, composing together multiple project templates, and how you can start using it for your own work today. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Jairo Llopis about Copier, a library for managing project templates Interview Introductions How did you get introduced to Python? Can you start by describing what the Copier project is? How did you get involved in the project? 
Can you share some of the history of the project? What do you see as the most common uses for a project templating tool? There are a variety of different tools for scaffolding projects across a wide range of languages. What are the distinguishing features of Copier that might lead someone to choose it over the alternatives? Can you describe how the Copier project is implemented? How has the design and feature set evolved over time? What is the workflow for someone building a template with Copier? What are some of the edge cases or complexities that they might run into? What are the options for extensibility or integration with Copier? What are some of the capabilities or use cases for Copier that are often overlooked? What are some of the most interesting, innovative, or unexpected ways that you have seen Copier used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on and with Copier? When is Copier the wrong choice? What do you have planned for the future of the project? Keep In Touch Yajo on GitHub __yajo on Twitter Website Picks Tobias Playing Cards Jairo Mozilla Hubs Links Copier Tecnativa Odoo Open Source ERP Cookiecutter Yeoman Jinja Cookiecutter, Yeoman, and Copier Blog Post doodba-copier-template Copier Templates A Story of Duplicate Code Traefik The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Jan 12, 2021 · 57 min

Ep 296 How Python's Evolution Impacts Your Fluency With Luciano Ramalho

Full

Summary On its surface Python is a simple language which is what has contributed to its rise in popularity. As you move to intermediate and advanced usage you will find a number of interesting and elegant design elements that will let you build scalable and maintainable systems and design friendly interfaces. Luciano Ramalho is best known as the author of Fluent Python which has quickly become a leading resource for Python developers to increase their facility with the language. In this episode he shares his journey with Python and his perspective on how the recent changes to the interpreter and ecosystem are influencing who is adopting it and how it is being used. Luciano has an interesting perspective on how the feedback loop between the community and the language is driving the current and future priorities of the features that are added. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Luciano Ramalho about the recent and upcoming changes in the Python language Interview Introductions How did you get introduced to Python? Can you start by giving an overview of the role that Python has played in your career? What other languages do you work with on a regular basis? 
How has that experience influenced the ways that you use Python? What do you see as the biggest changes that have been added to Python in recent years? How have the changes in Python changed the way that you approach program design? How has your work on Fluent Python influenced your perspective on the language and its utility? What do you find to be the most confusing aspects of Python, whether for newcomers or experienced developers? How would you characterize the types of features that have been added to Python in recent years? What, if any, trends have you observed in the types of features that are proposed and included in Python and what do you see as the motivating factors for them? What changes to the language are you tracking? Which are you personally invested in? What new features or capabilities would you like to see included in Python? Keep In Touch @ramalhoorg on Twitter ramalho on GitHub LinkedIn Picks Tobias Magic: The Gathering: Arena Luciano The Queen’s Gambit Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Fluent Python Library and Information Sciences Thoughtworks São Paulo, Brazil Perl PHP Object Oriented Programming Dunder Methods Python Essential Reference Python In A Nutshell Python Typing Module Pytype Pyre MyPy AsyncIO Typing Protocols Duck Typing Static Typing Where Possible, Dynamic Typing Where Needed TypeScript Ruby 3 Type Annotations C# Go Language KotlinJS Matrix Multiplication Operator Walrus Operator == Assignment Expressions CPython PEG Parser Podcast Episode PEP 3099: Things that will Not Change in Python 3000 Elixir Pattern Matching Erlang Prolog Python Pattern Matching PEP SWIG Symbolic Computation Python Descriptors Beeware The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Jan 5, 2021 · 1h 0m

Ep 295 Making Content Management A Smooth Experience With A Headless CMS

Full

Summary Building a web application requires integrating a number of separate concerns into a single experience. One of the common requirements is a content management system to allow product owners and marketers to make the changes needed for them to do their jobs. Rather than spend the time and focus of your developers to build the end to end system a growing trend is to use a headless CMS. In this episode Jake Lumetta shares why he decided to spend his time and energy on building a headless CMS as a service, when and why you might want to use one, and how to integrate it into your applications so that you can focus on the rest of your application. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. With a network of expert mentors who are available to coach you during weekly 1:1 video calls, a tuition-back guarantee that means you don’t pay until you get a job, resume preparation, and interview assistance there’s no reason to wait. 
Springboard is offering up to 20 scholarships of $500 towards the tuition cost, exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level. Your host as usual is Tobias Macey and today I’m interviewing Jake Lumetta about Butter CMS and the role of a headless CMS in the modern web ecosystem. Interview Introductions How did you get introduced to Python? Can you start by describing what a headless CMS is? How does the use case and user experience differ from working with a traditional CMS (e.g. WordPress, etc.)? How does a headless CMS compare to using a framework such as Django CMS or Wagtail? Can you describe what you have built at ButterCMS? What was your motivation for starting a business to provide a CMS as a service? How would you characterize the current state of the CMS ecosystem? How does ButterCMS compare to the available open source and commercial options? What are the trends in the web ecosystem that have made a headless CMS necessary or useful? What types of information are people managing in a CMS? How are people integrating headless CMS systems into their Python applications? Can you describe the architecture for Butter? How has the system changed or evolved since you first began working on it? What was your decision process for determining what language(s) and technology stack to use for building the platform? What are the aspects of building and maintaining a CMS that are most complex? What are some of the most interesting, innovative, or unexpected ways that you have seen ButterCMS used? What have you found to be the most interesting, unexpected, or challenging lessons that you have learned while building ButterCMS? When is ButterCMS the wrong choice? What do you have planned for the future of ButterCMS? Keep In Touch LinkedIn @jakelumetta on Twitter Picks Tobias The Arrow TV Show Jake Ghost In The Wires by Kevin Mitnick Closing Announcements Thank you for listening! 
Don’t forget to check out our other show, the Data Engineering Podcast, for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links ButterCMS Hiring: Dir of Engineering PHP Django MVC == Model, View, Controller Headless CMS WordPress Django CMS Wagtail Podcast Episode SEO == Search Engine Optimization JAM (Javascript, APIs, and Markup) Stack Netlify Vercel Cloudflare Pages Vue.js React.js Django Rest Framework Fastly CDN == Content Delivery Network AWS Cloudfront Ionic React Native The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC

Dec 28, 2020 · 48 min

Ep 294 Turning Notebooks Into Collaborative And Dynamic Data Applications With Hex


Summary Notebooks have been a useful tool for analytics, exploratory programming, and shareable data science for years, and their popularity is continuing to grow. Despite their widespread use, there are still a number of challenges that inhibit collaboration and use by non-technical stakeholders. Barry McCardel and his team at Hex have built a platform to make collaboration on Jupyter notebooks a first class experience, as well as allowing notebooks to be parameterized and exposing the logic through interactive web applications. In this episode Barry shares his perspective on the state of the notebook ecosystem, why it is such a powerful tool for computing and analytics, and how he has built a successful business around improving the end to end experience of working with notebooks. This was a great conversation about an important piece of the toolkit for every analyst and data scientist. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. 
If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took compressed into 10 fun hours of Python coding and problem solving. Go to pythonpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s pythonpodcast.com/talkpython, and don’t forget to thank them for supporting the show. Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. With a network of expert mentors who are available to coach you during weekly 1:1 video calls, a tuition-back guarantee that means you don’t pay until you get a job, resume preparation, and interview assistance there’s no reason to wait. Springboard is offering up to 20 scholarships of $500 towards the tuition cost, exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level. Your host as usual is Tobias Macey and today I’m interviewing Barry McCardel about Hex, a managed platform to turn your notebooks into collaborative, interactive data apps and stories Interview Introductions How did you get introduced to Python? Can you start by describing what you have built at Hex and your motivation for starting the business? Who are the primary users of the Hex platform? How has that focus influenced your product direction and the features that you prioritize? What are the biggest roadblocks that you see data analysts and data consumers running into? How have those roadblocks shifted in recent years? What is it about the concept of a notebook that has caused them to see such a massive rise in usage and popularity? 
What are the barriers to productivity and accessibility that still exist in the notebook ecosystem? What are the pieces for working in and with notebooks that are still missing? What does Hex add to the experience of working with notebooks? Can you describe how the Hex platform is implemented? How has the design of the platform changed or evolved since you first began working on it? Where does Hex sit in the lifecycle of notebook creation and usage? How does it compare to other services built to support users of notebooks such as Zepl, Saturn Cloud, Noteable, etc.? You focus on the Jupyter platform, but there are a number of other notebook frameworks that have sprung up in recent years. What do you see as being the relative strengths of the available options? What are the trends in the tooling, capabilities, and use cases for notebooks that you are keeping an eye on? What are the most interesting, innovative, or unexpected ways that you have seen the Hex platform used? What are the most interesting,

Dec 21, 2020 · 42 min

Ep 293 Add Anomaly Detection To Your Time Series Data With Luminaire


Summary When working with data it’s important to understand when it is correct. If there is a time dimension, then it can be difficult to know when variation is normal. Anomaly detection is a useful tool to address these challenges, but a difficult one to do well. In this episode Smit Shah and Sayan Chakraborty share the work they have done on Luminaire to make anomaly detection easier to work with. They explain the complexities inherent to working with time series data, the strategies that they have incorporated into Luminaire, and how they are using it in their data pipelines to identify errors early. If you are working with any kind of time series then it’s worth giving Luminaire a look. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. 
With a network of expert mentors who are available to coach you during weekly 1:1 video calls, a tuition-back guarantee that means you don’t pay until you get a job, resume preparation, and interview assistance there’s no reason to wait. Springboard is offering up to 20 scholarships of $500 towards the tuition cost, exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level. Your host as usual is Tobias Macey and today I’m interviewing Smit Shah and Sayan Chakraborty about Luminaire, a machine learning based package for anomaly detection on timeseries data Interview Introductions How did you get introduced to Python? Can you start by describing what Luminaire is and how the project got started? Where does the name come from? How does Luminaire compare to other frameworks for working with timeseries data such as Prophet? What are the main use cases that Luminaire is powering at Zillow? What are some of the complexities inherent to anomaly detection that are non-obvious at first glance? How are you addressing those challenges in Luminaire? Can you describe how Luminaire is implemented? How has the design of the project evolved since it was first started? What was the motivation for releasing Luminaire as open source? For someone who is using Luminaire, what is the process for training and deploying a model with it? What are some common ways that it is used within a larger system? How do sustained anomalies such as the current pandemic affect the work of identifying other sources of meaningful outliers? What are some of the most interesting, innovative, or unexpected ways that you have seen Luminaire being used? What are some of the most interesting, unexpected, or challenging lessons that you have learned while building and using Luminaire? When is Luminaire the wrong choice? What do you have planned for the future of the project? 
Keep In Touch Smit LinkedIn shahsmit14 on GitHub Sayan LinkedIn Website @tweettosayan on Twitter Picks Tobias Flakehell Smit Apache Ranger Sayan Prediction Machines: The Simple Economics Of Artificial Intelligence Closing Announcements Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast, for the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at pythonpodcast.com/chat Links Luminaire Zillow Anomaly Detection Facebook Prophet IEEE Big Data Conference Unsupervised Learning ARIMA (Autoregressive Integrated Moving Average) Model Airflow The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Dec 15, 2020 · 54 min

Ep 292 Building Big Data Pipelines For Audio With Klio


Summary Technologies for building data pipelines have been around for decades, with many mature options for a variety of workloads. However, most of those tools are focused on processing of text based data, both structured and unstructured. For projects that need to manage large numbers of binary and audio files the list of options is much shorter. In this episode Lynn Root shares the work that she and her team at Spotify have done on the Klio project to make that list a bit longer. She discusses the problems that are specific to working with binary data, how the Klio project is architected to allow for scalable and efficient processing of massive numbers of audio files, why it was released as open source, and how you can start using it today for your own projects. If you are struggling with ad-hoc infrastructure and a medley of tools that have been cobbled together for analyzing large or numerous binary assets then this is definitely a tool worth testing out. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Do you want to get better at Python? Now is an excellent time to take an online course. 
Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took compressed into 10 fun hours of Python coding and problem solving. Go to pythonpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s pythonpodcast.com/talkpython, and don’t forget to thank them for supporting the show. Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. With a network of expert mentors who are available to coach you during weekly 1:1 video calls, a tuition-back guarantee that means you don’t pay until you get a job, resume preparation, and interview assistance there’s no reason to wait. Springboard is offering up to 20 scholarships of $500 towards the tuition cost, exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level. Your host as usual is Tobias Macey and today I’m interviewing Lynn Root about Klio, an open source pipeline for processing audio and binary data Interview Introductions How did you get introduced to Python? Can you start by describing what Klio is and how it got started? What are some of the challenges that are unique to processing audio data as compared to text? What use cases does Klio enable? What are some of the alternative options available for working with binary data? 
What capabilities were lacking in other solutions that made it worthwhile to build a new system from scratch? Can you describe the design and architecture of Klio? What was the motivation for implementing Klio as a Python framework, rather than building on top of the Scio project? How much of a challenge has it been to interface to the Beam framework from Python? (Java <-> Python impedance mismatch) One of the interesting optimizations in Klio is the option for bottom up execution of a job to avoid processing a given file unless absolutely necessary. What are some of the other useful or interesting capabilities that are built into Klio? What was the motivation and process for releasing Klio as open source? For someone who is building a pipeline with Klio, can you talk through the workflow? What are the extension and integration points that are exposed? How does Klio handle third party dependencies for a given job? What are some of the challenges, misunderstandings, or edge cases that users of Klio should be aware of? What are some of the most interesting, unexpected, or challenging lessons that you have learned while building and growing the Klio project? What are some of the mos

Dec 7, 2020 · 53 min

Ep 291 Open Sourcing The Anvil Full Stack Python Web App Platform


Summary Building a complete web application requires expertise in a wide range of disciplines. As a result it is often the work of a whole team of engineers to get a new project from idea to production. Meredydd Luff and his co-founder built the Anvil platform to make it possible to build full stack applications entirely in Python. In this episode he explains why they released the application server as open source, how you can use it to run your own projects for free, and why developer tooling is the sweet spot for an open source business model. He also shares his vision for how the end-to-end experience of building for the web should look, and some of the innovative projects and companies that were made possible by the reduced friction that the Anvil platform provides. Give it a listen today to gain some perspective on what it could be like to build a web app. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. 
If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took compressed into 10 fun hours of Python coding and problem solving. Go to pythonpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s pythonpodcast.com/talkpython, and don’t forget to thank them for supporting the show. Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. With a network of expert mentors who are available to coach you during weekly 1:1 video calls, a tuition-back guarantee that means you don’t pay until you get a job, resume preparation, and interview assistance there’s no reason to wait. Springboard is offering up to 20 scholarships of $500 towards the tuition cost, exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level. Your host as usual is Tobias Macey and today I’m interviewing Meredydd Luff about the process and motivations for releasing the Anvil platform as open source Interview Introductions How did you get introduced to Python? Can you start by giving an overview of what Anvil is and some of the story behind it? What is new or different in Anvil since we last spoke in June of 2019? What are the most common or most impressive use cases for Anvil that you have seen? On your website you mention Anvil being used for deploying models and productionizing notebooks. How does Anvil help in those use cases? How much of the adoption of Anvil do you attribute to the use of Skulpt and providing a way to write Python for the browser? 
What are some of the complications that users might run into when trying to integrate with the broader Javascript ecosystem? How does the release of the Anvil App Server affect your business model? How does the workflow for users of the Anvil platform change if they decide to run their own instance? What is involved in getting it deployed to production? What other tools or companies did you look to for positive and negative examples of how to run a successful business based on open source? What was your motivation for open sourcing the core runtime of Anvil? What was involved in getting the code cleaned up and ready for a public release? What are the other ways that your business relies on or contributes to the open source ecosystem? What do you see as the primary threats to open source business models? What are some of the most interesting, unexpected, or challenging lessons that you have learned while building and growing Anvil? What do you have planned for the future of the platform and business? Keep In Touch LinkedIn @meredydd on Twitter mer

Dec 1, 2020 · 51 min

Ep 290 Pants Has Got Your Python Monorepo Covered


Summary In a software project writing code is just one step of the overall lifecycle. There are many repetitive steps such as linting, running tests, and packaging that need to be run for each project that you maintain. In order to reduce the overhead of these repeat tasks, and to simplify the process of integrating code across multiple systems the use of monorepos has been growing in popularity. The Pants build tool is purpose built for addressing all of the drudgery and for working with monorepos of all sizes. In this episode core maintainers Eric Arellano and Stu Hood explain how the Pants project works, the benefits of automatic dependency inference, and how you can start using it in your own projects today. They also share useful tips for how to organize your projects, and how the plugin oriented architecture adds flexibility for you to customize Pants to your specific needs. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. 
With a network of expert mentors who are available to coach you during weekly 1:1 video calls, a tuition-back guarantee that means you don’t pay until you get a job, resume preparation, and interview assistance there’s no reason to wait. Springboard is offering up to 20 scholarships of $500 towards the tuition cost, exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level. Feature flagging is a simple concept that enables you to ship faster, test in production, and do easy rollbacks without redeploying code. Teams using feature flags release new software with less risk, and release more often. ConfigCat is a feature flag service that lets you easily add flags to your Python code, and 9 other platforms. By adopting ConfigCat you and your manager can track and toggle your feature flags from their visual dashboard without redeploying any code or configuration, including granular targeting rules. You can roll out new features to a subset of your users for beta testing or canary deployments. With their simple API, clear documentation, and pricing that is independent of your team size you can get your first feature flags added in minutes without breaking the bank. Go to pythonpodcast.com/configcat today to get 35% off any paid plan with code PYTHONPODCAST or try out their free forever plan. Your host as usual is Tobias Macey and today I’m interviewing Eric Arellano and Stu Hood about Pants, a flexible build system that works well with monorepos. Interview Introductions How did you get introduced to Python? Can you start by describing what Pants is and how it got started? What’s the story behind the name? What is a monorepo and why might I want one? What are the challenges caused by working with a monorepo? Why are monorepos so uncommon in Python projects? What is the workflow for a developer or team who is managing a project with Pants? 
How does Pants integrate with the broader ecosystem of Python tools for dependency management and packaging (e.g. Poetry, Pip, pip-tools, Flit, Twine, Pex, Shiv, etc.)? What is involved in setting up Pants for working with a new Python project? What complications might developers encounter when trying to implement Pants in an existing project? How is Pants itself implemented? How have the design, goals, or architecture evolved since Pants was first created? What are the major changes in the v2 release? What was the motivation for the major overhaul of the project? How do you recommend developers lay out their projects to work well with Python? How can I handle code shared between different modules or packages, and reducing the third party dependencies that are built into the respective packages? What are some of the most interesting, unexpected, or innovative ways that you have seen Pants used? What have you found to be the most interesting, unexpected, or challenging aspects of working on Pants? What are the case

Nov 23, 2020 · 51 min

Ep 289 Scale Your Data Science Teams With Machine Learning Operations Principles


Summary Building a machine learning model is a process that requires well curated and cleaned data and a lot of experimentation. Doing it repeatably and at scale with a team requires a way to share your discoveries with your teammates. This has led to a new set of operational ML platforms. In this episode Michael Del Balso shares the lessons that he learned from building the platform at Uber for putting machine learning into production. He also explains how the feature store is becoming the core abstraction for data teams to collaborate on building machine learning models. If you are struggling to get your models into production, or scale your data science throughput, then this interview is worth a listen. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took compressed into 10 fun hours of Python coding and problem solving. 
Go to pythonpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s pythonpodcast.com/talkpython, and don’t forget to thank them for supporting the show. Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. With a network of expert mentors who are available to coach you during weekly 1:1 video calls, a tuition-back guarantee that means you don’t pay until you get a job, resume preparation, and interview assistance there’s no reason to wait. Springboard is offering up to 20 scholarships of $500 towards the tuition cost, exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level. Your host as usual is Tobias Macey and today I’m interviewing Mike Del Balso about what is involved in operationalizing machine learning, and his work at Tecton to provide that platform as a service Interview Introductions How did you get introduced to Python? Can you start by describing what is encompassed by the term "Operational ML"? What other approaches are there to building and managing machine learning projects? How do these approaches differ from operational ML in terms of the use cases that they enable or the scenarios where they can be employed? How would you characterize the current level of maturity for the average organization or enterprise in terms of their capacity for delivering ML projects? What are the necessary components for an operational ML platform? You helped to build the Michelangelo platform at Uber. How did you determine what capabilities were necessary to provide a unified approach for building and deploying models? 
How did your work on Michelangelo inform your work on Tecton? How does the use of a feature store influence the structure and workflow of a data team? In addition to the feature store, what are the other necessary components of a full pipeline for identifying, training, and deploying machine learning models? Once a model is in production, what signals or metrics do you track to feed into the next iteration of model development? One of the common challenges in data science and machine learning is managing collaboration. How do tools such as feature stores or the Michelangelo platform address that problem? What are the most interesting, unexpected, or challenging lessons that you have learned while building operational ML platforms? What advice or recommendations do you have for teams who are trying to work with machine learning? What do you have planned for the future of Tecton? Keep In Touch LinkedIn Picks Tobias Sandman graphic novel series by Neil

Nov 17, 2020 · 51 min


