The Python Podcast.__init__

389 episodes — Page 1 of 8

Ep 388: Update Your Model's View Of The World In Real Time With Streaming Machine Learning Using River

Preamble

This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning.

Summary

The majority of machine learning projects that you read about or work on are built around batch processes. The model is trained, and then validated, and then deployed, with each step being a discrete and isolated task. Unfortunately, the real world is rarely static, leading to concept drift and model failures. River is a framework for building streaming machine learning projects that can constantly adapt to new information. In this episode Max Halford explains how the project works, why you might (or might not) want to consider streaming ML, and how to get started building with River.

Announcements

Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!
Your host is Tobias Macey and today I’m interviewing Max Halford about River, a Python toolkit for streaming and online machine learning.

Interview

Introduction
How did you get involved in machine learning?
Can you describe what River is and the story behind it?
What is "online" machine learning?
What are the practical differences with batch ML?
Why is batch learning so predominant?
What are the cases where someone would want/need to use online or streaming ML?
The prevailing pattern for batch ML model lifecycles is to train, deploy, monitor, repeat. What does the ongoing maintenance for a streaming ML model look like?
Concept drift is typically due to a discrepancy between the data used to train a model and the actual data being observed. How does the use of online learning affect the incidence of drift?
Can you describe how the River framework is implemented?
How have the design and goals of the project changed since you started working on it?
How do the internal representations of the model differ from batch learning to allow for incremental updates to the model state?
In the documentation you note the use of Python dictionaries for state management and the flexibility offered by that choice. What are the benefits and potential pitfalls of that decision?
Can you describe the process of using River to design, implement, and validate a streaming ML model?
What are the operational requirements for deploying and serving the model once it has been developed?
What are some of the challenges that users of River might run into if they are coming from a batch learning background?
What are the most interesting, innovative, or unexpected ways that you have seen River used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on River?
When is River the wrong choice?
What do you have planned for the future of River?

Contact Info

Email
@halford_max on Twitter
MaxHalford on GitHub

Parting Question

From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

River
scikit-multiflow
Federated Machine Learning
Hogwild! Google Paper
Chip Huyen concept drift blog post
Dan Crenshaw Berkeley Clipper MLOps
Robustness Principle
NY Taxi Dataset
RiverTorch
River Public Roadmap
Beaver tool for deploying online models
Prodigy ML human in the loop labeling

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Sponsored By:

Linode: Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode's managed Kubernetes platform it's now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the


Ep 388: Update Your Model's View Of The World In Real Time With Streaming Machine Learning Using River

Ep 387: Declarative Machine Learning For High Performance Deep Learning Models With Predibase

Ep 386: Build Better Machine Learning Models With Confidence By Adding Validation With Deepchecks

Ep 385: Build A Full Stack ML Powered App In An Afternoon With Baseten

Ep 384: Skip Straight To The Fun Part Of Your Project With PyScaffold

Ep 383: Add Configuration Best Practices To Your Application In An Afternoon With Dynaconf

Ep 382: Take A Tour Of The Hidden Language Of Hardware And How It Powers Your Code

Ep 381: Take Control Of Your Electrical Systems With The Open Source FlexMeasures Energy Management System

Ep 380: How And Why To Build Effective Teams As An Engineering Leader

Ep 379: Complete Your Hardware "Weekend Projects" In An Actual Weekend With Belay

Ep 378: Catching Up With Pyre, A Fast Type Checker For Python

Ep 377: Standardizing On Python For All Software Projects At Ascend.io

Ep 376: Exploring The Process And Practice Of Building Better Software Through Code Reviews

Ep 375: Ship With Confidence By Automating Quality Assurance

Ep 374: Remove Roadblocks And Let Your Developers Ship Faster With Self-Serve Infrastructure

Ep 373: The Benefits Of Python And Django For Going From Zero To MVP At Speed

Ep 372: Powering The Next Generation Of Application Architectures With Web Assembly And The Fermyon Platform

Ep 371: Gain A Deeper Understanding Of What Your Code Is Doing And Where It Spends Its Time With VizTracer

Ep 370: Stream Processing In Real Time And At Scale In Pure Python With Bytewax

Ep 369: Tetra: A Full Stack Web Framework That Doesn't Make You Write Everything Twice

Ep 368: Design Real-World Objects In Python With CadQuery

Ep 367: Intelligent Dependency Resolution For Optimal Compatibility And Security With Project Thoth

Ep 366: Take A Deep Dive On How Code Completion Works And How To Customize It

Ep 365: Hunting Black Swans With Bees: Catching Up With The Inimitable Russell Keith-Magee

Ep 364: Take Control Of Your Digital Photos By Running Your Own Smart Library Manager With LibrePhotos

Ep 363: Making Investment Data Easy To Access And Analyze With The OpenBB Terminal

Ep 362: Accelerate Your Machine Learning Experimentation With Automatic Checkpoints Using FLOR

Ep 361: Automatically Enforce Software Structures With Powerful Code Modifications Powered By LibCST

Ep 360: Cloud Native Networking For Developers With The Gloo Platform

Ep 359: Accelerate And Simplify Cloud Native Development For Kubernetes Environments With Gefyra

Ep 358: Building A Community And Technology Stack For Scalable Big Data Geoscience At Pangeo

Ep 357: Automating Application Lifecycles For Developer Happiness At Wayfair

Ep 356: Run Your Applications Reliably On Kubernetes Without Losing Sleep With Robusta

Ep 355: Accelerate The Development And Delivery Of Your Machine Learning Applications Using Ray And Deploy It At Anyscale

Ep 354: See The Structure Of Your Software At A Glance With Call Graphs From Code2Flow

Ep 353: Scaling Knowledge Management For Technical Teams With Knowledge Repo

Ep 352: Simplify And Scale Your Software Development Cycles By Putting On Pants (Build Tool)

Ep 351: Achieve Repeatable Builds Of Your Software On Any Machine With Earthly

Ep 350: Building A Detailed View Of Your Software Delivery Process With The Eiffel Protocol

Ep 349: Improve Your Productivity By Investing In Developer Experience Design For Your Projects

Ep 348: An Exploration Of Effective Pandas Practices With Matt Harrison

Ep 347: Generate Your Text Files With Python Using Cog

Ep 346: A Friendly Approach To Regression Models For Programmers

Ep 345: Fast, Flexible, and Incremental Task Automation With doit

Ep 344: The Technological, Business, and Sales Challenges Of Building The Ethical Ads Network

Ep 343: Accidentally Building A Business With Python At Listen Notes

Ep 342: Making Orbital Mechanics More Accessible With Poliastro

Ep 340: Build Better Analytics And Models With A Focus On The Data Experience

Ep 341: Declarative Deep Learning From Your Laptop To Production With Ludwig and Horovod

Ep 339: Building Conversational AI to Augment Sales Teams at Structurely
