33: Katharine Jarmul - Testing in Data Science

Test & Code

November 30, 2017, 37m 15s

Show Notes

A discussion with Katharine Jarmul, aka kjam, about some of the challenges of data science with respect to testing.

Some of the topics we discuss:

  • experimentation vs testing
  • testing pipelines and pipeline changes
  • automating data validation
  • property-based testing
  • schema validation and detecting schema changes
  • using unit test techniques to test data pipeline stages
  • testing nodes and transitions in DAGs
  • testing expected and unexpected data
  • missing data and non-signals
  • corrupting a dataset with noise
  • fuzz testing for both data pipelines and web APIs
  • datafuzz
  • hypothesis
  • testing internal interfaces
  • documenting and sharing domain expertise to build a good sense of what is reasonable
  • intermediary data and stages
  • neural networks
  • speaking at conferences
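
Several of the topics above (schema validation, testing expected and unexpected data, corrupting a dataset with noise, and fuzz testing a pipeline stage) can be illustrated with a small sketch. It is not taken from the episode: the schema, the `clean_stage` function, and the corruption rules are all hypothetical, and it uses only the Python standard library rather than assuming the APIs of datafuzz or hypothesis.

```python
import random

# Hypothetical schema for illustration: column name -> expected type.
SCHEMA = {"user_id": int, "amount": float}


def validate_row(row):
    """Return True if the row has exactly the schema's columns and types."""
    if set(row) != set(SCHEMA):
        return False
    return all(isinstance(row[col], typ) for col, typ in SCHEMA.items())


def clean_stage(rows):
    """A toy pipeline stage: keep only rows that match the schema."""
    return [row for row in rows if validate_row(row)]


def corrupt(rows, rng, rate=0.3):
    """Return a copy of rows in which roughly `rate` of them are damaged."""
    noisy = []
    for row in rows:
        row = dict(row)  # copy so the clean input is left untouched
        if rng.random() < rate:
            kind = rng.choice(
                ["drop_column", "null_value", "wrong_type", "extra_column"]
            )
            if kind == "drop_column":
                row.pop("amount")
            elif kind == "null_value":
                row["amount"] = None
            elif kind == "wrong_type":
                row["user_id"] = str(row["user_id"])
            else:
                row["unexpected"] = "?"
        noisy.append(row)
    return noisy


def fuzz_clean_stage(trials=200, seed=0):
    """Feed randomly corrupted datasets to the stage and check its properties."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    for _ in range(trials):
        rows = [
            {"user_id": rng.randint(1, 10**6), "amount": rng.uniform(0, 100)}
            for _ in range(rng.randint(0, 20))
        ]
        cleaned = clean_stage(corrupt(rows, rng))
        # Property 1: everything the stage emits satisfies the schema.
        assert all(validate_row(r) for r in cleaned)
        # Property 2: the stage never invents rows.
        assert len(cleaned) <= len(rows)
        # Property 3: expected (clean) data passes through unchanged.
        assert clean_stage(rows) == rows
        # Property 4: running the stage twice changes nothing further.
        assert clean_stage(cleaned) == cleaned
    return trials


if __name__ == "__main__":
    print(f"{fuzz_clean_stage()} fuzz trials passed")
```

The checks state properties of the stage rather than exact expected outputs, so the same assertions hold for any generated dataset. A property-based testing library could take over the data generation that `random` does here.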

Special Guest: Katharine Jarmul.

Topics

python, programming, software, testing