70: The Difference Between Data Lakes and Data Warehouses with Vinoth Chandar of Apache Hudi

This week on The Data Stack Show, Eric and Kostas chat with Vinoth Chandar, creator of the Hudi project at the Apache Software Foundation. During the episode, Vinoth discusses his experiences building data lakes at companies like LinkedIn, Uber, and Confluent. He also gets into the differences between data lakes and data warehouses, and when going open source makes sense.

The Data Stack Show

January 12, 2022, 1h 0m

Show Notes

Highlights from this week’s conversation include:

Vinoth’s career background (3:19)
Building a data lake at Uber (6:52)
Defining what a data lake is (14:01)
How data warehouses differ from data lakes (22:46)
When you should use an open source solution in your data stack (37:36)
Evolving from a data warehouse to a data lake (45:09)
Early wins Hudi earned inside of Uber (52:30)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
