
David Hand: How Dark Data Makes AI and LLMs Dangerously Unreliable
David Hand, professor of statistics, explains how "dark data" makes ChatGPT, large language models more generally, and even peer review unreliable. Listen early and ad-free on Patreon: https://patreon.com/curtjaimungal
Theories of Everything with Curt Jaimungal · Theories of Everything
Show Notes
- 00:00:00 - Introduction
- 00:01:34 - What is Dark Data? (missing data matters more than what you have)
- 00:07:03 - The perils of "changing definitions"
- 00:09:15 - David on writing and his selective process
- 00:20:15 - Theory-driven vs. data-driven models (& the constitution of LLMs)
- 00:32:08 - The dilemma of partial truths
- 00:34:40 - The "File Drawer Problem" & its adverse effects on clinical trials
- 00:39:09 - Regression to the mean (how random variations lead to misleading conclusions)
- 00:44:12 - Publication bias
- 00:48:03 - Open-access models and their pitfalls
- 00:54:06 - Why LLMs are simultaneously brilliant & stupid
- 01:03:40 - David’s daily routine
- 01:06:24 - The mean vs. median
- 01:11:07 - Every type of "Dark Data" listed (watch this first!)
SUPPORT AND LINKS:
- Patreon: https://patreon.com/curtjaimungal
- Crypto: https://tinyurl.com/cryptoTOE
- PayPal: https://tinyurl.com/paypalTOE
- Twitter: https://twitter.com/TOEwithCurt
- Discord Invite: https://discord.com/invite/kBcnfNVwqs
- iTunes: https://podcasts.apple.com/ca/podcast...
- Pandora: https://pdora.co/33b9lfP
- Spotify: https://open.spotify.com/show/4gL14b9...
- Subreddit r/TheoriesOfEverything: https://reddit.com/r/theoriesofeveryt...
- TOE Merch: https://tinyurl.com/TOEmerch
RESOURCES:
- YouTube Link: https://www.youtube.com/watch?v=41JBrC5e5tA
- Dark Data: https://amzn.to/446Fou1
- The Improbability Principle: https://amzn.to/3DOn1iX
Theories of Everything with Curt Jaimungal features long-form, technically detailed interviews with leading researchers in physics, mathematics, consciousness, and philosophy, exploring topics at the level of active research. For academics, graduate students, and anyone seeking depth beyond popular science.
SPONSOR: I subscribe to The Economist for their science and tech coverage. As a TOE listener, get 35% off! No other podcast has this: https://economist.com/TOE
Learn more about your ad choices. Visit megaphone.fm/adchoices