Orchestrating Large and Small Projects With Apache Airflow
Episode 142


The Real Python Podcast · Real Python

January 27, 2023 · 54m 24s


Show Notes

<p>Have you worked on a project that needed an orchestration tool? How do you define the workflow of an entire data pipeline or a messaging system with Python? This week on the show, Calvin Hendryx-Parker is back to talk about using Apache Airflow and orchestrating Python projects.</p>
<p>Calvin is the co-founder and CTO of Six Feet Up and a Python Web Conference co-organizer. He&rsquo;s recently been working on a massive project that requires thousands of jobs involving transferring and transforming data. Through his research into orchestration systems, he found Apache Airflow.</p>
<p>Airflow is an open-source tool to define, schedule, and monitor workflows. The platform is written in pure Python and integrates with a wide variety of services. We discuss how workflows are defined by creating directed acyclic graphs (DAGs).</p>
<p>Calvin talks about how a recent project outgrew the DAG factory pattern and how his team built a clever solution using Python. We also discuss the upcoming Python Web Conference and what virtual attendees can expect.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Course Spotlight:</strong> <a href="https://realpython.com/courses/python-basics-oop/">Python Basics: Object-Oriented Programming</a></p>
<p>In this video course, you&rsquo;ll get to know OOP, or object-oriented programming.
You&rsquo;ll learn how to create a class, use classes to create new objects, and instantiate classes with attributes.</p>
</div>
<p>Topics:</p>
<ul>
<li>00:00:00 &ndash; Introduction</li>
<li>00:02:24 &ndash; Describing the large data pipeline</li>
<li>00:04:38 &ndash; What format was the data in?</li>
<li>00:06:04 &ndash; Was the format of the data changed for storage?</li>
<li>00:09:34 &ndash; Data engineering and describing sources and targets</li>
<li>00:11:29 &ndash; Apache Airflow orchestration and hitting limitations</li>
<li>00:18:12 &ndash; Sponsor: CData Software</li>
<li>00:18:54 &ndash; DAG: Directed acyclic graphs</li>
<li>00:22:29 &ndash; Streaming data and other tool choices</li>
<li>00:25:38 &ndash; Overcoming DAG Factory limitations</li>
<li>00:31:49 &ndash; Another industry example for Airflow</li>
<li>00:34:24 &ndash; Finding solutions as a consultancy</li>
<li>00:35:12 &ndash; Is there a minimum-size project for Airflow?</li>
<li>00:37:37 &ndash; Django under the hood</li>
<li>00:38:31 &ndash; Video Course Spotlight</li>
<li>00:39:58 &ndash; The Python Web Conference 2023</li>
<li>00:44:24 &ndash; Do you have any upcoming conference talks?</li>
<li>00:45:53 &ndash; How can people follow your work online?</li>
<li>00:46:52 &ndash; IndyPy talk by Mariatta Wijaya</li>
<li>00:48:01 &ndash; What are you excited about in the world of Python?</li>
<li>00:51:45 &ndash; What do you want to learn next?</li>
<li>00:53:22 &ndash; Thanks and goodbye</li>
</ul>
<p>Show Links:</p>
<ul>
<li><a href="https://airflow.apache.org/docs/">Apache Airflow - Documentation</a></li>
<li><a href="https://sixfeetup.com/blog/too-big-for-dag-factories">Too Big for DAG Factories?
— Six Feet Up</a></li>
<li><a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">Directed acyclic graph - Wikipedia</a></li>
<li><a href="https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html">DAGs — Airflow Documentation</a></li>
<li><a href="https://docs.astronomer.io/learn/dynamically-generating-dags">Dynamically generating DAGs in Airflow - Astronomer Documentation</a></li>
<li><a href="https://www.databricks.com/">Data Lakehouse Architecture and AI Company - Databricks</a></li>
<li><a href="https://realpython.com/podcasts/rpp/10/">Episode #10: Python Job Hunting in a Pandemic – The Real Python Podcast</a></li>
<li><a href="https://realpython.com/podcasts/rpp/124/">Episode #124: Exploring Recursion in Python With Al Sweigart – The Real Python Podcast</a></li>
<li><a href="https://inventwithpython.com/recursion/">The Recursive Book of Recursion</a></li>
<li><a href="https://realpython.com/podcasts/rpp/61/">Episode #61: Scaling Data Science and Machine Learning Infrastructure Like Netflix – The Real Python Podcast</a></li>
<li><a href="https://indypy.org/#">IndyPy — Indiana Python User Group</a></li>
<li><a href="https://www.youtube.com/watch?v=zEIPTg22OYE&amp;list=PLt4L3V8wVnF6JgEz7BLuRIZSS6Qsx_AFn">Contributing to Python - Mariatta Wijaya - Python Core Developer - YouTube</a></li>
<li><a href="https://www.home-assistant.io/">Home Assistant</a></li>
<li><a href="https://www.arturia.com/products/hardware-synths/microfreak/details">Arturia - MicroFreak</a></li>
<li><a href="https://www.arturia.com/products/software-instruments/pigments/overview">Arturia - Pigments</a></li>
<li><a href="https://fosstodon.org/@calvinhp">CalvinHP (@calvinhp@fosstodon.org) - Fosstodon</a></li>
<li><a href="https://twitter.com/calvinhp">calvinhp - Twitter</a></li>
<li><a href="https://sixfeetup.com/blog">Six Feet Up - Blog</a></li>
<li><a href="https://2023.pythonwebconf.com/">Python Web Conference 2023</a></li>
</ul>
<p>Level up your Python skills with our
expert-led courses:</p>
<ul>
<li><a href="https://realpython.com/courses/data-cleaning-with-pandas-and-numpy/">Data Cleaning With pandas and NumPy</a></li>
<li><a href="https://realpython.com/courses/python-basics-oop/">Python Basics: Object-Oriented Programming</a></li>
<li><a href="https://realpython.com/courses/intro-object-oriented-programming-oop-python/">A Conceptual Primer on OOP in Python</a></li>
</ul>
<p><a rel="payment" href="https://realpython.com/join">Support the podcast &amp; join our community of Pythonistas</a></p>
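The show notes describe Airflow workflows as directed acyclic graphs (DAGs) of tasks. To make the "acyclic" part and the resulting execution order concrete, here is a minimal sketch that uses only Python's standard-library `graphlib` module, not Airflow itself. The task names and the `is_acyclic()` and `run_in_waves()` helpers are hypothetical illustrations, and a real Airflow scheduler does much more (schedule intervals, retries, executors). In Airflow you would declare the same dependencies between operators inside a `DAG` object, for example with `extract >> validate`.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of upstream tasks it
# depends on. The names are illustrative, not taken from the episode.
pipeline: dict[str, set[str]] = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "load_warehouse": {"transform"},
    "load_archive": {"transform"},
    "notify": {"load_warehouse", "load_archive"},
}


def is_acyclic(graph: dict[str, set[str]]) -> bool:
    """Return True if the dependency graph has no cycles, i.e. it is a DAG."""
    try:
        TopologicalSorter(graph).prepare()  # raises CycleError on a cycle
    except CycleError:
        return False
    return True


def run_in_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves: every task in a wave has all of its upstream
    tasks finished, so tasks within one wave could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        # Pretend every ready task succeeded, unlocking its downstream tasks.
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    print(is_acyclic(pipeline))                  # True
    print(is_acyclic({"a": {"b"}, "b": {"a"}}))  # False
    for wave in run_in_waves(pipeline):
        print(wave)
    # ['extract']
    # ['validate']
    # ['transform']
    # ['load_archive', 'load_warehouse']
    # ['notify']
```

The two load tasks land in the same wave because neither depends on the other, which is the property an orchestrator exploits to run independent tasks concurrently. A cycle such as `a` waiting on `b` while `b` waits on `a` has no valid order, which is why workflow graphs must be acyclic.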