Speeding Up Your DataFrames With Polars
Episode 140

The Real Python Podcast · Real Python

January 13, 2023 · 57m 57s

Show Notes

<p>How can you get more performance from your existing data science infrastructure? What if a DataFrame library could take advantage of your machine&rsquo;s available cores and provide built-in methods for handling larger-than-RAM datasets? This week on the show, Liam Brannigan is here to discuss Polars.</p>
<p>Liam is an experienced data scientist working in finance, technology, and environmental analysis. He&rsquo;s recently started contributing to the documentation for Polars and developing a training course for the library.</p>
<p>We talk about the library&rsquo;s overall speed and lack of additional dependencies. Liam explains the advantages of lazy vs eager mode and which to choose when performing data exploration or attempting to load a dataset larger than your RAM.</p>
<p>We also discuss potential barriers to switching to Polars from a pandas workflow. Throughout our conversation, we explore several other libraries and technologies, including Apache Arrow, DuckDB, query optimization, and the &ldquo;rustification&rdquo; of Python tools.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Course Spotlight:</strong> <a href="https://realpython.com/courses/graph-data-with-python-and-ggplot/">Graph Your Data With Python and ggplot</a></p>
<p>In this course, you&rsquo;ll learn how to use ggplot in Python to build data visualizations with plotnine.
You&rsquo;ll discover what a grammar of graphics is and how it can help you create plots in a very concise and consistent way.</p>
</div>
<p>Show Topics:</p>
<ul>
<li>00:00:00 &ndash; Introduction</li>
<li>00:02:06 &ndash; Liam&rsquo;s background and intro to Polars</li>
<li>00:03:37 &ndash; Hurdles to switching to Polars</li>
<li>00:05:23 &ndash; Creating training resources</li>
<li>00:08:15 &ndash; No index</li>
<li>00:09:46 &ndash; Data science 2025 predictions</li>
<li>00:12:02 &ndash; Contributions to Polars</li>
<li>00:15:07 &ndash; Eager vs lazy mode &amp; query optimization</li>
<li>00:19:25 &ndash; Sponsor: Anaconda Nucleus</li>
<li>00:20:00 &ndash; Apache Arrow and Parquet</li>
<li>00:24:43 &ndash; DuckDB and column orientation</li>
<li>00:29:27 &ndash; The &ldquo;rustification&rdquo; of libraries</li>
<li>00:34:49 &ndash; Video Course Spotlight</li>
<li>00:36:16 &ndash; GPUs and memory requirements</li>
<li>00:45:49 &ndash; No additional library requirements</li>
<li>00:47:37 &ndash; Development of the ecosystem</li>
<li>00:51:33 &ndash; Chaining operations</li>
<li>00:53:39 &ndash; How can people follow your work?</li>
<li>00:54:51 &ndash; What are you excited about in the world of Python?</li>
<li>00:56:09 &ndash; What do you want to learn next?</li>
<li>00:56:58 &ndash; Thanks and goodbye</li>
</ul>
<p>Show Links:</p>
<ul>
<li><a href="http://braaannigan.github.io//">Liam Brannigan - Data Scientist</a></li>
<li><a href="https://www.pola.rs/">Polars</a></li>
<li><a href="https://pypi.org/project/polars/">polars - PyPI</a></li>
<li><a href="https://pola-rs.github.io/polars-book/user-guide/coming_from_pandas.html">Coming from Pandas - Polars - User Guide</a></li>
<li><a href="https://www.youtube.com/channel/UC-J3uR0g7CxCSnx0YFE6R_g">Rho-Signal Data Analytics - YouTube</a></li>
<li><a href="https://www.rhosignal.com/posts/polars-pandas-cheatsheet/">Cheatsheet for Pandas to Polars - Rho Signal</a></li>
<li><a href="https://www.udemy.com/course/data-analysis-with-polars/?couponCode=POLARS-DISCOUNT">Data Analysis with Polars - Udemy</a></li>
<li><a href="https://www.pola.rs/posts/i-wrote-one-of-the-fastest-dataframe-libraries/">I wrote one of the fastest DataFrame libraries - Polars</a></li>
<li><a href="https://h2oai.github.io/db-benchmark/">Database-like ops benchmark comparison</a></li>
<li><a href="https://braaannigan.github.io/software/2022/11/07/data-science-2025.html">Data science 2025 - Liam Brannigan</a></li>
<li><a href="https://duckdb.org/">DuckDB - An in-process SQL OLAP database management system</a></li>
<li><a href="https://www.orchest.io/blog/the-great-python-dataframe-showdown-part-1-demystifying-apache-arrow">The great Python DataFrame showdown, part 1: Demystifying Apache Arrow</a></li>
<li><a href="https://arrow.apache.org/">Apache Arrow</a></li>
<li><a href="https://www.rust-lang.org/learn">Learn Rust - Rust Programming Language</a></li>
<li><a href="https://kevinheavey.github.io/modern-polars/">Modern Polars</a></li>
<li><a href="https://www.anaconda.com/blog/pyscript-updates-bytecode-alliance-pyodide-and-micropython">Anaconda - PyScript Updates: Bytecode Alliance, Pyodide, and MicroPython</a></li>
<li><a href="https://jupytext.readthedocs.io/en/latest/">Jupytext - Jupyter Notebooks as Markdown Documents, Julia, Python or R Scripts</a></li>
<li><a href="https://braaannigan.github.io/software/2022/10/25/polars-up-and-running.html">Polars up and running - Liam Brannigan</a></li>
<li><a href="https://braaannigan.github.io/blog/blog_index.html">Liam Brannigan - Data Scientist - Blog</a></li>
<li><a href="https://twitter.com/braaannigan">Liam Brannigan (@braaannigan) - Twitter</a></li>
<li><a href="https://www.linkedin.com/in/liam-brannigan-9080b214a/">Liam Brannigan - LinkedIn</a></li>
</ul>
<p>Level up your Python skills with our expert-led courses:</p>
<ul>
<li><a href="https://realpython.com/courses/threading-python/">Threading in Python</a></li>
<li><a href="https://realpython.com/courses/reading-writing-files-pandas/">Reading and Writing Files With pandas</a></li>
<li><a href="https://realpython.com/courses/graph-data-with-python-and-ggplot/">Graph Your Data With Python and ggplot</a></li>
</ul>
<p><a rel="payment" href="https://realpython.com/join">Support the podcast &amp; join our community of Pythonistas</a></p>