
Episode 217
Packaging Data Analyses & Using pandas GroupBy
The Real Python Podcast · Real Python
August 16, 2024 · 55m 22s
Show Notes
<p>What are the best practices for organizing data analysis projects in Python? What are the advantages of a more package-centric approach to data science? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder’s Weekly articles and projects.</p>
<p>We discuss Joshua Cook’s recent article “How I Use Python to Organize My Data Analyses.” The article covers how his process for building data analysis projects has evolved and now incorporates modern Python packaging techniques.</p>
<p>Christopher shares his recent video course on grouping real-world data with pandas. The course offers a quick refresher before digging into how to use pandas GroupBy to manipulate, transform, and summarize data.</p>
<p>We also share several other articles and projects from the Python community, including a news roundup, working with JSON data in Python, running an Asyncio event loop in a separate thread, knowing the why behind a system’s code, a retro game engine for Python, and a project for vendorizing packages from PyPI.</p>
<p>This episode is sponsored by Mailtrap.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Course Spotlight:</strong> <a href="https://realpython.com/courses/pandas-groupby-real-world-data/">pandas GroupBy: Grouping Real World Data in Python</a></p>
<p>In this course, you’ll learn how to work adeptly with the pandas GroupBy while mastering ways to manipulate, transform, and summarize data. You’ll work with real-world datasets and chain GroupBy methods together to get data into an output that suits your needs.</p>
</div>
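<p>As a rough illustration of the summarize and transform steps the course description mentions, here is a minimal sketch. It is not taken from the course: the <code>sales</code> DataFrame and its column names are hypothetical, invented only for this example, and it assumes pandas is installed.</p>

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
sales = pd.DataFrame(
    {
        "region": ["east", "east", "west", "west", "west"],
        "units": [3, 5, 2, 4, 6],
        "price": [10.0, 12.0, 9.0, 11.0, 8.0],
    }
)

# Summarize: named aggregation yields one row per group.
summary = (
    sales.groupby("region")
    .agg(total_units=("units", "sum"), mean_price=("price", "mean"))
    .reset_index()
)

# Transform: the result keeps the shape of the input, so it can be
# assigned back as a new column on the original DataFrame.
region_totals = sales.groupby("region")["units"].transform("sum")
sales["share_of_region"] = sales["units"] / region_totals

print(summary)
print(sales)
```

<p>The aggregation collapses each region to a single row, while the transform broadcasts each region’s total back to every row, which is what makes the per-row share calculation possible.</p>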
<p>Topics:</p>
<ul>
<li>00:00:00 – Introduction</li>
<li>00:02:18 – Setuptools Breaks Things, Then Fixes Them</li>
<li>00:04:57 – PEP 751: A File Format to List Python Dependencies</li>
<li>00:07:04 – Python 3.13.0 Release Candidate 1 Released</li>
<li>00:07:15 – Python Insider: Python 3.12.5 released</li>
<li>00:07:22 – Django 5.1 released - Django Weblog</li>
<li>00:07:27 – Django security releases issued: 5.0.8 and 4.2.15</li>
<li>00:07:49 – How I Use Python to Organize My Data Analyses</li>
<li>00:13:45 – Sponsor: Mailtrap</li>
<li>00:14:21 – pandas GroupBy: Grouping Real World Data in Python</li>
<li>00:20:33 – Working With JSON Data in Python</li>
<li>00:25:01 – Asyncio Event Loop in Separate Thread</li>
<li>00:30:33 – Video Course Spotlight</li>
<li>00:31:47 – Habits of great software engineers</li>
<li>00:49:17 – pyxel: A Retro Game Engine for Python</li>
<li>00:52:36 – python-vendorize: Vendorize Packages From PyPI</li>
<li>00:54:18 – Thanks and goodbye</li>
</ul>
<p>News:</p>
<ul>
<li><a href="https://www.bitecode.dev/p/whats-up-python-setuptools-breaks">Setuptools Breaks Things, Then Fixes Them</a> – This post is Bite Code’s monthly summary, but the lead story happened just days ago. In line with a seven-year-old deprecation, setuptools finally removed the ability to call its <code>test</code> command. Many packages promptly broke. The following day, the change was undone.</li>
<li><a href="https://peps.python.org/pep-0751/">PEP 751: A File Format to List Python Dependencies for Installation Reproducibility (New)</a> – This PEP proposes a new file format for dependency specification to enable reproducible installation in a Python environment.</li>
<li><a href="https://pythoninsider.blogspot.com/2024/08/python-3130-release-candidate-1-released.html">Python 3.13.0 Release Candidate 1 Released</a></li>
<li><a href="https://pythoninsider.blogspot.com/2024/08/python-3125-released.html">Python Insider: Python 3.12.5 released</a></li>
<li><a href="https://www.djangoproject.com/weblog/2024/aug/07/django-51-released/">Django 5.1 released - Django Weblog</a></li>
<li><a href="https://www.djangoproject.com/weblog/2024/aug/06/security-releases/">Django security releases issued: 5.0.8 and 4.2.15 - Django Weblog</a></li>
</ul>
<p>Show Links:</p>
<ul>
<li><a href="https://joshuacook.netlify.app/posts/2024-07-27_python-data-analysis-org/">How I Use Python to Organize My Data Analyses</a> – This is a description of how Joshua uses Python in a package-centric way to organize his approach to data analyses. This is a system he has evolved while working on his computational biology Ph.D. and working in industry.</li>
<li><a href="https://realpython.com/courses/pandas-groupby-real-world-data/">pandas GroupBy: Grouping Real World Data in Python</a> – In this course, you’ll learn how to work adeptly with the pandas GroupBy while mastering ways to manipulate, transform, and summarize data. You’ll work with real-world datasets and chain GroupBy methods together to get data into an output that suits your needs.</li>
<li><a href="https://realpython.com/python-json/">Working With JSON Data in Python</a> – In this tutorial, you’ll learn how to read and write JSON-encoded data in Python. You’ll begin with practical examples that show how to use Python’s built-in “json” module and then move on to learn how to serialize and deserialize custom data.</li>
<li><a href="https://superfastpython.com/asyncio-event-loop-separate-thread/">Asyncio Event Loop in Separate Thread</a> – Typically, the asyncio event loop runs in the main thread, but as that is the one used by the interpreter, sometimes you want the event loop to run in a separate thread. This article talks about why and how to do just that.</li>
</ul>
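<p>To make the last link above more concrete, here is one possible way to run an asyncio event loop in a separate thread using only the standard library. This is a sketch and not the linked article’s code: the helper names <code>start_background_loop</code>, <code>stop_background_loop</code>, and <code>add_later</code> are hypothetical and exist only for this example.</p>

```python
import asyncio
import threading


async def add_later(a, b, delay=0.01):
    """Hypothetical coroutine standing in for real async work."""
    await asyncio.sleep(delay)
    return a + b


def start_background_loop():
    """Create a new event loop and run it forever in a daemon thread."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


def stop_background_loop(loop, thread):
    """Ask the loop to stop from outside its thread, then clean up."""
    loop.call_soon_threadsafe(loop.stop)
    thread.join()
    loop.close()


loop, thread = start_background_loop()

# Submit a coroutine from the main thread; this returns a
# concurrent.futures.Future that the main thread can block on.
future = asyncio.run_coroutine_threadsafe(add_later(2, 3), loop)
result = future.result(timeout=5)
print(result)

stop_background_loop(loop, thread)
```

<p>The main thread stays free for synchronous code and only blocks when it calls <code>future.result()</code>. Stopping the loop goes through <code>call_soon_threadsafe</code> because event loop methods are generally not safe to call from another thread.</p>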
<p>Discussion:</p>
<ul>
<li><a href="https://vadimkravcenko.com/shorts/habits-of-great-software-engineers/">Habits of great software engineers</a></li>
</ul>
<p>Projects:</p>
<ul>
<li><a href="https://github.com/kitao/pyxel">pyxel: A Retro Game Engine for Python</a></li>
<li><a href="https://github.com/mwilliamson/python-vendorize">python-vendorize: Vendorize Packages From PyPI</a></li>
</ul>
<p>Additional Links:</p>
<ul>
<li><a href="https://realpython.com/courses/packaging-with-pyproject-toml/">Everyday Project Packaging With pyproject.toml – Real Python</a></li>
<li><a href="https://www.youtube.com/watch?v=v6tALyc4C10">Packaging Your Python Code With pyproject.toml - Complete Code Conversation - YouTube</a></li>
<li><a href="https://realpython.com/podcasts/rpp/197/">Episode #197: Using Python in Bioinformatics and the Laboratory – The Real Python Podcast</a></li>
</ul>
<p>Level up your Python skills with our expert-led courses:</p>
<ul>
<li><a href="https://realpython.com/courses/packaging-with-pyproject-toml/">Everyday Project Packaging With pyproject.toml</a></li>
<li><a href="https://realpython.com/courses/working-json-data-python/">Working With JSON in Python</a></li>
<li><a href="https://realpython.com/courses/pandas-groupby-real-world-data/">pandas GroupBy: Grouping Real World Data in Python</a></li>
</ul>
<p><a rel="payment" href="https://realpython.com/join">Support the podcast &amp; join our community of Pythonistas</a></p>