Season 1 · Episode 4

Pirates or Pioneers? AI’s Great Data Plunder

From Myths To Models To Madness: Annoying AI Talking about Interesting Things · Virtual Story Lab

October 20, 2024 · 9m 29s

Show Notes

As we the people drown in data, experts warn that within the next couple of years AI could face a data drought. Models like GPT-4 have been trained on trillions of words, but the next generation will demand even more: quadrillions, quintillions, numbers so large the zeros disappear into the distance. But where will all this data come from? Who owns it? Who controls it? In this episode, our intrepid (and slightly annoying) AI hosts delve into the implications of AI’s insatiable appetite for data and what it means for future development. They discuss, among other things, the impact on our data identities, on privacy, and on copyright laws that increasingly look unfit for purpose.

Brought to you by Virtual Story Lab - empower.virtualstorylab.com

Sources:

Longpre, S., et al. (2024). "Consent in Crisis: The Rapid Decline of the AI Data Commons." arXiv preprint. Available at: https://arxiv.org/abs/2407.14933

Woodie, A. (2024). "Are We Running Out of Training Data for GenAI?" Big Data Wire. Available at: https://www.bigdatawire.com/2024/07/26/are-we-running-out-of-training-data-for-genai
