From Notebooks to Production: Xorq’s lockfile Approach for Reproducible, Portable ML Pipelines
Episode 24

Tech on the Rocks · Kostas, Nitay

January 29, 2026 · 57m 26s

Show Notes

In this episode, Hussain shares the story behind xorq: a “lockfile for ML pipelines” that makes notebook work easier to reproduce, debug, and ship. We talk about why the research→production path is still so manual, how schemas (and Arrow) become the contract between systems, and what it takes to run the same pipeline across engines like Snowflake and Databricks. We also dig into escape hatches for imperative code, why feature stores didn’t become the default, and how xorq fits alongside other technologies like Iceberg.

Chapters

00:00 Hussain's Journey in Data Science

06:00 The Need for xorq: Bridging Research and Production

10:38 Challenges in Machine Learning Deployment

17:40 The Role of Lock Files in Data Pipelines

29:51 Understanding Schema Management in Data Systems

34:40 Navigating Declarative and Imperative Transformations

36:39 The Developer's Journey with xorq

38:34 Feature Stores vs. xorq: A Comparative Analysis

43:43 The Future of Feature Stores and Machine Learning

51:41 Reproducibility in Data Pipelines: xorq vs. Git-like Operations

55:47 The Future of xorq and the Data Ecosystem

Topics

reproducible ML, ML pipelines, lockfile, manifest, pipeline registry, declarative pipelines, IBIS, Arrow, Arrow record batches, ArrowFlight, DataFusion, DuckDB, Polars, Snowflake, Databricks, multi-engine execution, pipeline portability, lineage, schema contracts, schema evolution, fail-fast compilation, UDFs, pandas UDFs, feature stores, semantic layer, temporal joins, data consistency vs computation consistency, git-for-data, Iceberg, Nessie, time travel, MLOps, research-to-production, monitoring from declarations