PLAY PODCASTS
The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI

Astronomer · The Data Flowcast

104 episodes · EN

Show overview

The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI has been publishing since 2018, and across the 8 years since has built a catalogue of 104 episodes. That works out to roughly 45 hours of audio in total. Releases follow a monthly cadence.

Episodes typically run twenty to thirty-five minutes, with most landing between 22 and 29 minutes, and the run-time is fairly consistent across the catalogue. None of the episodes are flagged explicit by the publisher. It is catalogued as an EN-language Technology show.

The show is actively publishing — the most recent episode landed 2 days ago, with 22 episodes already out so far this year. The busiest year was 2025, with 45 episodes published. Published by The Data Flowcast.

Episodes: 104
Running: 2018–2026 · 8y
Median length: 25 min
Cadence: Monthly

From the publisher

Welcome to The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI, the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems.

Podcast Webpage: https://www.astronomer.io/podcast/

Latest Episodes

Managing a Customer Analytics Platform with Airflow at Skimlinks

Jun 11, 2026 · 22 min

Building a custom Tableau provider for Airflow at JLR

Jun 4, 2026 · 21 min

Orchestrating 2,000 Airflow pipelines at Luiza Labs with Mateus Ferreira

May 28, 2026 · 32 min

Enhancing DAGs for Data Processing with William Orgertrice III at Cargill

May 21, 2026 · 26 min

Getting Into Data Engineering with Shrividya Hegde, Data and AI Engineer

May 14, 2026 · 27 min

Orchestrating DBT With Cosmos and Airflow with Filip Kunčar at ShipMonk Product Development

May 7, 2026 · 24 min

Building Airflow CTL with Buğra Öztürk at Mollie

Apr 30, 2026 · 19 min

Introducing Airflow’s Common AI Provider with Pavan Kumar Gopidesu and Kaxil Naik

Apr 23, 2026 · 28 min

Building AI Debugging Agents Into Airflow DAGs at Jeppesen ForeFlight with Samantha Blaney Cuevas

Apr 16, 2026 · 22 min

S1 Ep 76 · Introducing Airflow 3.2

We introduce Airflow 3.2 and its updates for teams that build and operate data pipelines.

Astronomer’s Head of Customer Education, Marc Lamberti, and Senior Manager of Developer Relations, Kenten Danas, break down what’s new, from asset partitioning to async Python tasks and DAG versioning. They explore how these updates improve scheduling, performance and observability in production workflows.

Key Takeaways:
00:00 Introduction.
02:10 Airflow 3 architecture separates workers from the metadata database.
03:05 Plugin versioning and UI-based backfills simplify operations.
06:20 Asset partitioning enables granular, partition-level scheduling.
07:15 Triggering DAGs on partitions instead of full datasets.
11:05 Deferrable operators reduce worker slot usage.
12:00 Async operators reduce database pressure and overhead.
14:10 Async improves throughput, not single task speed.
22:20 Inlets and outlets improve asset lineage visibility.
23:00 DAG version markers show changes directly in the UI.

Resources Mentioned:
Marc Lamberti: https://www.linkedin.com/in/marclamberti/
Apache Airflow: https://airflow.apache.org/
Astronomer | LinkedIn: https://www.linkedin.com/company/astronomer/
Astronomer | Website: https://www.astronomer.io/
3.2 Webinar: https://www.astronomer.io/events/webinars/introducing-airflow-3-2-video
Asset Partitioning Guide: https://www.astronomer.io/docs/learn/airflow-partitioned-runs
Asynchronous Processes Guide: https://www.astronomer.io/docs/learn/deferrable-operators
Release Notes: https://airflow.apache.org/docs/apache-airflow/stable/release_notes.html#airflow-3-2-0-2026-04-07
Provider Registry: https://airflow.apache.org/registry/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

Apr 9, 2026 · 26 min

S1 Ep 75 · Reflections on a Decade of Data Engineering at Seattle Data Guy

Lessons from the past decade of data engineering reveal how much the ecosystem has changed and what has stayed surprisingly consistent.

In this episode, Benjamin Rogojan, Owner and Data Consultant at Seattle Data Guy, joins us to reflect on how the data engineering landscape has evolved alongside Apache Airflow. We explore when Airflow makes sense as an orchestrator, why batch processing is still dominant and how AI is reshaping the workflows and responsibilities of modern data engineers.

Key Takeaways:
00:00 Introduction.
03:00 Airflow becomes valuable when workflows involve many pipelines, teams and dependencies.
05:00 Data engineers are still focused on making data accessible and aligning work with business needs.
05:30 Batch pipelines remain the most common approach even as real-time use cases grow.
07:45 Many “real-time” requests are actually event-driven batch workflows.
09:00 Airflow replaced many custom-built pipeline systems with built-in dependency management.
11:00 Modern orchestration tools often build on Airflow concepts or differentiate from them.
14:00 AI can assist with writing SQL and pipelines but still requires experienced engineers.
15:30 Organizations are collecting increasingly granular data, creating more engineering demand.
19:00 The data stack has shifted rapidly from Hadoop-era systems to modern cloud platforms.

Resources Mentioned:
Benjamin Rogojan: https://www.linkedin.com/in/benjaminrogojan/
Seattle Data Guy: https://www.linkedin.com/company/seattle-data-guy/
Apache Airflow: https://airflow.apache.org
Airflow Summit / Airflow Conference: https://airflowsummit.org
Snowflake: https://www.snowflake.com
HubSpot Data Sharing / APIs: https://developers.hubspot.com
MLflow: https://mlflow.org

#AI #Automation #Airflow

Apr 3, 2026 · 26 min

S1 Ep 74 · Managing Data Quality and Governance With Airflow at Credit Karma with Ashir Alam

Data quality is not optional when you manage credit data at scale.

In this episode, Ashir Alam, Senior Data Engineer at Credit Karma, joins us to share how his team acts as the gatekeeper for credit data ingestion, how they standardize data quality with Airflow and DAG Factory and how they scale safely across thousands of DAGs. We explore how governance, PII protection and orchestration come together inside a modern data platform.

Key Takeaways:
00:00 Introduction.
01:00 Overview of Credit Karma’s products and financial data ecosystem.
02:00 The team acts as gatekeepers for ingesting data from TransUnion and Equifax.
03:00 Why PII handling and controlled downstream access led to adopting Airflow.
04:00 BigQuery as the warehouse and Airflow as the primary orchestrator.
05:00 Why data quality and governance are critical in financial systems.
07:00 Why Airflow was selected: ease of use and unified ETL plus data quality.
09:00 Introduction to DAG Factory and YAML-based DAG generation.
10:00 GitHub executor creates PR-driven DAG workflows with CI checks.
12:00 BigQuery operators, structured checks and custom Slack and PagerDuty alerts.
13:00 Failed checks stop ETL pipelines and trigger notifications.
17:00 Scaling DAG Factory across thousands of DAGs and runtime vs compile-time concerns.
19:00 Future improvements: better defaults, retries and GenAI workflows in Airflow.

Resources Mentioned:
Ashir Alam: https://www.linkedin.com/in/ashir-alam/
Credit Karma: https://www.linkedin.com/company/intuit-credit-karma/
Apache Airflow: https://airflow.apache.org/
DAG Factory: https://github.com/astronomer/dag-factory
BigQuery (Google Cloud): https://cloud.google.com/bigquery
GitHub: https://github.com/
Slack: https://slack.com/
PagerDuty: https://www.pagerduty.com/

#AI #Automation #Airflow

Mar 26, 2026 · 22 min

S1 Ep 73 · Open Source Airflow Contributions and Performance Improvements at G-Research with Christos Bisias

Modern Airflow isn’t just orchestration. It’s a contribution. In this episode, we explore how open source investment drives real performance gains and deeper observability.

We’re joined by Christos Bisias, Open Source Software Engineer, Apache Airflow at G-Research, to discuss how his team uses Airflow for large-scale data transformations, contributes upstream and improves scheduler throughput and OpenTelemetry support. From trace-level observability to CI-enforced metrics governance and a major scheduler optimization, this conversation spans strategy, engineering and community impact.

Key Takeaways:
00:00 Introduction.
01:20 How G-Research applies machine learning and big data to predict financial market movements.
02:15 Contributing to open source is a business decision.
03:10 Maintaining a fork is costly.
04:30 OpenTelemetry collects metrics, logs and traces to provide deep system visibility.
06:10 Custom spans help identify bottlenecks inside tasks and enable performance optimization.
08:05 OpenTelemetry integration works properly in Airflow 3.0 and above.
10:00 A YAML-based metrics registry with CI enforcement ensures consistency between docs and exported metrics.
12:10 Scheduler throughput improved significantly by applying concurrency limits earlier in the database query.
15:20 Future Task SDK changes may enable language-agnostic DAG authoring beyond Python.

Resources Mentioned:
Christos Bisias: https://www.linkedin.com/in/xbis/
G-Research: https://www.linkedin.com/company/g-research/
Apache Airflow: https://airflow.apache.org/
OpenTelemetry: https://opentelemetry.io/
Prometheus: https://prometheus.io/
Grafana: https://grafana.com/
Jaeger: https://www.jaegertracing.io/

#AI #Automation #Airflow

Mar 19, 2026 · 17 min

S1 Ep 72 · Automating Threat Intelligence Using Airflow with Karan Alang

In this episode, Karan Alang, Principal Software Engineer at Versa Networks, joins the conversation to discuss how Airflow can be used to automate threat intelligence in modern cybersecurity environments. He explains the growing scale of cloud computing, the profitability of hacking and the shortage of SOC analysts. Karan also outlines a novel architecture that combines Airflow, XDR, graph databases and LLMs to orchestrate automated threat detection and response.

Key Takeaways:
00:00 Introduction.
05:00 Organizations face massive log volumes and a shortage of SOC analysts.
07:00 The solution integrates Airflow, XDR, Neo4j graph databases and LLMs into one architecture.
08:00 MITRE ATT&CK provides a global framework for mapping tactics and techniques.
11:00 Airflow acts as the orchestration backbone for ingestion, graph transformation and LLM workflows.
13:00 Graph databases provide a full relationship view of attackers’ systems and entities.
14:00 LLMs automate mapping activity to MITRE ATT&CK and assign explainable risk scores.
17:00 Traditional signature-based detection allows lateral movement and exfiltration before teams can react.
18:00 End-to-end automation is essential to mitigating modern cybersecurity threats.
20:00 Future opportunities include deeper LLM integration as first-class citizens within Airflow.

Resources Mentioned:
Karan Alang: https://www.linkedin.com/in/karan-alang-4173437
Versa Networks | LinkedIn: https://www.linkedin.com/company/versa-networks
Versa Networks | Website: https://versa-networks.com
Google Cloud Composer (Managed Airflow on GCP): https://cloud.google.com/composer
Microsoft Defender XDR: https://www.microsoft.com/es-es/security/business/siem-and-xdr/microsoft-defender-xdr
Neo4j (Graph Database): https://neo4j.com
MITRE ATT&CK Framework: https://attack.mitre.org

#AI #Automation #Airflow #MachineLearning

Mar 12, 2026 · 22 min

S1 Ep 71 · Using Plugins To Customize Airflow at Ponder Labs with Egor Tarasenko

In this episode, we explore how teams scale Apache Airflow in complex environments and what it takes to make orchestration work across many stakeholders. We look at real-world challenges around visibility, ownership and predictability as data platforms grow.

Egor Tarasenko, Data and AI Engineer at Ponder Labs, joins us to share how Ponder Labs customizes Airflow for education organizations using plugins, event-driven architectures and AI-powered tooling. He explains how his team supports large charter school networks and why structure, consistency and extensibility become critical at scale.

Key Takeaways:
00:00 Introduction.
01:21 Ponder Labs helps education organizations bring data from many systems together so it becomes useful for teachers, school leaders and administrators.
03:10 Airflow serves as the backbone for orchestrating ingestion, transformation and reverse ETL across client data platforms.
05:43 Everything is triggered from Airflow to maintain dependency, visibility and a single operational picture.
09:05 Managing hundreds of DAGs requires a focus on structure, visibility and consistency across teams.
09:51 Treating DAGs like APIs helps teams scale without needing deep knowledge of upstream logic.
12:00 Custom plugins like schedule insights help predict DAG run times across layered dependencies.
15:00 AI-powered Airflow chat enables non-technical stakeholders to understand DAG ownership, dependencies and cluster activity.
22:06 Migrating plugins to Airflow 3 improves developer experience through cleaner APIs and faster extensibility.

Resources Mentioned:
Egor Tarasenko: https://www.linkedin.com/in/egorseno/
Apache Airflow: https://airflow.apache.org
dbt: https://www.getdbt.com
Astronomer Astro Platform: https://www.astronomer.io
Egor Tarasenko on Substack: https://egortarasenko.substack.com

#AI #Automation #Airflow

Mar 5, 2026 · 27 min

S1 Ep 70 · Scaling Airflow at Wix for Analytics and AI with Ethan Shalev

Modern data orchestration at scale demands reliability, speed and thoughtful adoption of new tooling. As organizations grow, keeping pipelines efficient while supporting more teams becomes a critical challenge.

In this episode, we’re joined by Ethan Shalev, Data Engineer at Wix, to discuss how Wix operates Airflow at massive scale, migrates to Airflow 3 and uses AI to accelerate development.

Key Takeaways:
00:00 Introduction.
02:13 Wix structures data engineering across multiple product-focused organizations.
03:40 Migrating nearly 8,000 DAGs to Airflow 3 requires careful planning.
04:31 Migration creates an opportunity to remove long-standing legacy Airflow code.
05:32 Internal playbooks and Cursor rules standardize and speed up DAG migrations.
07:39 Airflow 3 introduces backfills, DAG versioning and asset-aware scheduling.
09:16 Deferrable operators reduce scheduler congestion in large Airflow environments.
12:54 AI-generated code still requires review and strong testing practices.
14:52 Moving to managed Airflow reduces operational burden on internal platform teams.
15:57 Improving multi-tenancy and UI personalization remains a key Airflow need.

Resources Mentioned:
Ethan Shalev: https://www.linkedin.com/in/eshalev/
Wix | LinkedIn: https://www.linkedin.com/company/wix-com/
Wix | Website: https://www.wix.com/
Apache Airflow: https://airflow.apache.org/
Astronomer: https://www.astronomer.io/
Trino: https://trino.io/
Apache Iceberg: https://iceberg.apache.org/
Cursor: https://cursor.sh/
Airflow Summit: https://airflowsummit.org/

#AI #Automation #Airflow

Feb 26, 2026 · 18 min

S1 Ep 69 · Using Airflow To Orchestrate Billions of Events at Addi with Carlos Daniel Puerto Niño

Strong data orchestration is as much about culture and visibility as it is about technology. As data platforms scale, teams need systems that reduce cognitive load while increasing reliability and observability.

In this episode, Carlos Daniel Puerto Niño, Senior Analytics Engineer and Data Analyst at Addi, joins us to share how Addi uses Airflow to support batch orchestration, manage organizational complexity and improve monitoring across its data platform.

Key Takeaways:
00:00 Introduction.
01:25 Changes in company strategy increase data platform complexity over time.
04:00 Centralized data teams help manage organizational and technical change.
06:08 Scalable architectures support growing data volumes and use cases.
09:10 Adopting orchestration tools introduces operational and maintenance challenges.
14:43 Abstraction layers lower technical barriers for onboarding new team members.
15:36 Modularity and visibility improve the reliability of data pipelines.
18:14 Integrated monitoring supports faster incident response and resolution.
22:19 Limited access to orchestration metadata constrains proactive analysis.

Resources Mentioned:
Carlos Daniel Puerto Niño: https://www.linkedin.com/in/carlospuertoni%C3%B1o/
Addi | LinkedIn: https://www.linkedin.com/company/addicol/
Addi | Website: https://www.addi.com
Apache Airflow: https://airflow.apache.org/
Astronomer: https://www.astronomer.io/
Databricks: https://www.databricks.com/
dbt: https://www.getdbt.com/
Grafana: https://grafana.com/
Slack: https://slack.com/

#AI #Automation #Airflow

Feb 19, 2026 · 24 min

S1 Ep 70 · Building Event-Driven Data Pipelines With Airflow 3 at Astrafy with Andrea Bombino

Real-time data expectations are reshaping how modern data teams think about orchestration and dependencies. As event-driven architectures become more common, teams need to rethink how pipelines react to data changes, rather than schedules.

In this episode, Andrea Bombino, Co-Founder and Head of Analytics Engineering at Astrafy, joins us to discuss how event-driven scheduling in Airflow is evolving and how Astrafy applies it to deliver faster, more responsive data pipelines.

Key Takeaways:
00:00 Introduction.
02:02 Astrafy’s role in guiding clients across the modern data stack.
03:15 Strong DAG dependencies create challenges for time-based scheduling.
04:48 Event-driven pipelines respond to increasing real-time data demands.
05:30 Airflow 3 introduces native support for event-driven orchestration.
06:27 Sensor-based workflows reveal scalability and efficiency limitations.
11:32 Event-driven assets improve efficiency and pipeline elegance.
14:45 Governance and cross-instance coordination emerge as ongoing challenges.

Resources Mentioned:
Andrea Bombino: https://www.linkedin.com/in/andrea-bombino/
Astrafy | LinkedIn: https://www.linkedin.com/company/astrafy/
Astrafy | Website: https://www.astrafy.io
Apache Airflow: https://airflow.apache.org/
Google Cloud: https://cloud.google.com/
Google Pub/Sub: https://cloud.google.com/pubsub
Google BigQuery: https://cloud.google.com/bigquery

#AI #Automation #Airflow

Feb 12, 2026 · 18 min

S1 Ep 69 · Uphold’s Approach to Orchestrating Modern Data Workflows with Jaime Oliveira

A strong data-driven mindset underpins how fintech teams scale analytics, infrastructure and decision-making across the business.

In this episode, Jaime Oliveira, Lead Data Engineer at Uphold, joins us to discuss how Uphold structures its data organization and orchestration strategy. Jaime shares how the team uses Airflow and dbt to support analytics, reporting and data activation while evolving their approach as the stack grows.

Key Takeaways:
00:00 Introduction.
01:23 A data-driven mindset supports product development and business decisions.
02:55 Diverse ingestion pipelines enable scalable analytics.
04:18 A single orchestration platform simplifies analytics workflows.
05:17 Early experience with orchestration tools shapes engineering practices.
08:16 Analytics orchestration works best when aligned with transformation workflows.
09:25 Infrastructure choices involve tradeoffs in testing, visibility and overhead.
16:39 More collaborative workflow tools could improve accessibility and autonomy.

Resources Mentioned:
Jaime Oliveira: https://www.linkedin.com/in/jaime-oliveira-b075855a/
Uphold | LinkedIn: https://www.linkedin.com/company/upholdinc/
Uphold | Website: https://uphold.com
Apache Airflow: https://airflow.apache.org
dbt: https://www.getdbt.com
Snowflake: https://www.snowflake.com
Kubernetes: https://kubernetes.io
Astronomer Cosmos: https://astronomer.github.io/astronomer-cosmos
Cosmos e-book: https://www.astronomer.io/ebooks/orchestrating-dbt-with-airflow-using-cosmos/

#AI #Automation #Airflow

Feb 5, 2026 · 18 min

S1 Ep 68 · Modern Airflow Best Practices for Scalable Data Pipelines with Bhavani Ravi

Building reliable data pipelines at scale requires more than writing code. It depends on thoughtful design, infrastructure trade-offs and an understanding of how orchestration platforms evolve over time.

In this episode, Airflow best practices shaped by real-world implementation are examined. Bhavani Ravi, Independent Software Consultant and Apache Airflow Champion, shares lessons on pipeline design, architectural decisions and the evolution of the Airflow ecosystem in modern data environments.

Key Takeaways:
00:00 Introduction.
01:30 Independent consulting supports effective Airflow adoption.
02:38 Early challenges shaped modern Airflow practices.
03:21 Airflow setup has become significantly simpler.
04:30 New features expanded workflow capabilities.
06:03 Frequent releases support long-term sustainability.
07:34 Community and providers strengthen the ecosystem.
10:03 Pipeline design should come before coding.
10:55 Decoupling logic requires careful trade-offs.
13:30 Plugins extend Airflow into new use cases.

Resources Mentioned:
Bhavani Ravi: https://www.linkedin.com/in/bhavanicodes/
Apache Airflow: https://airflow.apache.org/
Kubernetes: https://kubernetes.io/
Azure Fabric: https://learn.microsoft.com/en-us/fabric/

#AI #Automation #Airflow

Jan 29, 2026 · 17 min