Data Engineering Podcast

513 episodes

Ep 162: Proven Patterns For Building Successful Data Teams

Summary

Building data products is complicated by the fact that there are so many different stakeholders with competing goals and priorities. It is also challenging because of the number of roles and capabilities that are necessary to go from idea to delivery. Different organizations have tried a multitude of organizational strategies to improve the success rate of these data teams, with varying levels of success. In this episode Jesse Anderson shares the lessons that he has learned while working with dozens of businesses across industries to determine the team structures and communication styles that have generated the best results. If you are struggling to deliver value from big data, or just starting down the path of building the organizational capacity to turn raw information into valuable products, then this is a conversation that you don’t want to miss.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your data engineering career? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes, and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.

Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning-based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.

Your host is Tobias Macey and today I’m interviewing Jesse Anderson about best practices for organizing and managing data teams.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of how you view the mission and responsibilities of a data team?
What are the critical elements of a successful data team?
Beyond the core pillars of data science, data engineering, and operations, what other specialized roles do you find helpful for larger or more sophisticated teams?
For organizations that have "small data", how does that change the necessary composition of roles for successful data projects?
What are the signs and symptoms that point to the need for a dedicated team that focuses on data?
With data scientists and data engineers in particular being in such high demand, what are strategies that you have found effective for attracting new talent?
In the case where you have engineers on staff, how do you identify internal talent that can be trained into these specialized roles?
Another challenge that organizations face in dealing with data is how the team is organized. What are your thoughts on effective strategies for how to structure the communication and reporting structures of data teams? (e.g. centralized, embedded, etc.)
How do you recommend evaluating potential candidates for each of the necessary roles?
What are your thoughts on when to hire an outside consultant, v

Ep 162: Proven Patterns For Building Successful Data Teams

Ep 161: Streaming Data Integration Without The Code at Equalum

Ep 160: Keeping A Bigeye On The Data Quality Market

Ep 159: Self Service Data Management From Ingest To Insights With Isima

Ep 158: Building A Cost Effective Data Catalog With Tree Schema

Ep 157: Add Version Control To Your Data Lake With LakeFS

Ep 156: Cloud Native Data Security As Code With Cyral

Ep 155: Better Data Quality Through Observability With Monte Carlo

Ep 154: Rapid Delivery Of Business Intelligence Using Power BI

Ep 153: Self Service Real Time Data Integration Without The Headaches With Meroxa

Ep 152: Speed Up And Simplify Your Streaming Data Workloads With Red Panda

Ep 151: Cutting Through The Noise And Focusing On The Fundamentals Of Data Engineering With The Data Janitor

Ep 150: Distributed In Memory Processing And Streaming With Hazelcast

Ep 149: Simplify Your Data Architecture With The Presto Distributed SQL Engine

Ep 148: Building A Better Data Warehouse For The Cloud At Firebolt

Ep 147: Metadata Management And Integration At LinkedIn With DataHub

Ep 146: Exploring The TileDB Universal Data Engine

Ep 145: Closing The Loop On Event Data Collection With Iteratively

Ep 144: A Practical Introduction To Graph Data Applications

Ep 143: Build More Reliable Distributed Systems By Breaking Them With Jepsen

Ep 142: Making Wind Energy More Efficient With Data At Turbit Systems

Ep 141: Open Source Production Grade Data Integration With Meltano

Ep 140: DataOps For Streaming Systems With Lenses.io

Ep 139: Data Collection And Management To Power Sound Recognition At Audio Analytic

Ep 138: Bringing Business Analytics To End Users With GoodData

Ep 137: Accelerate Your Machine Learning With The StreamSQL Feature Store

Ep 136: Data Management Trends From An Investor Perspective

Ep 135: Building A Data Lake For The Database Administrator At Upsolver

Ep 134: Mapping The Customer Journey For B2B Companies At Dreamdata

Ep 133: Power Up Your PostgreSQL Analytics With Swarm64

Ep 132: StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar

Ep 131: Enterprise Data Operations And Orchestration At Infoworks

Ep 130: Taming Complexity In Your Data Driven Organization With DataOps

Ep 129: Building Real Time Applications On Streaming Data With Eventador

Ep 128: Making Data Collection In Your Code Easy With Rookout

Ep 127: Building A Knowledge Graph Of Commercial Real Estate At Cherre

Ep 126: The Life Of A Non-Profit Data Professional

Ep 125: Behind The Scenes Of The Linode Object Storage Service

Ep 124: Building A New Foundation For CouchDB

Ep 123: Scaling Data Governance For Global Businesses With A Data Hub Architecture

Ep 122: Easier Stream Processing On Kafka With ksqlDB

Ep 121: Shining A Light on Shadow IT In Data And Analytics

Ep 120: Data Infrastructure Automation For Private SaaS At Snowplow

Ep 119: Data Modeling That Evolves With Your Business Using Data Vault

Ep 118: The Benefits And Challenges Of Building A Data Trust

Ep 117: Pay Down Technical Debt In Your Data Pipeline With Great Expectations

Ep 116: Replatforming Production Dataflows

Ep 115: Planet Scale SQL For The New Generation Of Applications With YugabyteDB

Ep 114: Change Data Capture For All Of Your Databases With Debezium

Ep 113: Building The DataDog Platform For Processing Timeseries Data At Massive Scale