
The Gradient: Perspectives on AI
149 episodes — Page 2 of 3

Laurence Liew: AI Singapore
In episode 98 of The Gradient Podcast, Daniel Bashir speaks to Laurence Liew. Laurence is the Director for AI Innovation at AI Singapore. He is driving the adoption of AI by the Singapore ecosystem through the 100 Experiments, AI Apprenticeship Programmes and the Generational AI Talent Development initiative. He is the current Co-Chair of the Innovations and Commercialisation working group and Co-Chair of the "Broad Adoption of AI by SME" committee.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected]. Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS. Follow The Gradient on Twitter.
Outline:
* (00:00) Intro
* (02:25) Laurence’s background
* (07:00) AI Singapore and Singapore’s AI Strategy
* (08:27) Awareness and adoption of AI in Singapore
* (19:45) AI Apprenticeship Program stories
* (27:35) Developing generational AI talent within Singapore, literacy
* (32:25) Singapore’s place within the global AI ecosystem
* (38:30) How the generative AI boom has affected Singapore
* (43:50) Laurence’s vision for the future of Singapore’s tech ecosystem
* (49:41) Outro
Links:
* AI Singapore
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Michael Levin & Adam Goldstein: Intelligence and its Many Scales
In episode 97 of The Gradient Podcast, Daniel Bashir speaks to Professor Michael Levin and Adam Goldstein. Professor Levin is a Distinguished Professor and Vannevar Bush Chair in the Biology Department at Tufts University. He also directs the Allen Discovery Center at Tufts. His group, the Levin Lab, focuses on understanding the biophysical mechanisms that implement decision-making during complex pattern regulation, and on harnessing endogenous bioelectric dynamics toward rational control of growth and form. Adam Goldstein was a visiting scientist at the Levin Lab, where he worked on cancer research, and is the co-founder and Chairman of Astonishing Labs. Previously, Adam founded Hipmunk, wrote tech books for O'Reilly, and was a Visiting Partner at Y Combinator.
Outline:
* (00:00) Intro
* (02:37) Intros
* (03:20) Prof. Levin intro
* (04:26) Adam intro
* (06:25) A perspective on intelligence
* (08:40) Diverse intelligence: in unconventional embodiments and unfamiliar spaces, substrate independence
* (12:23) Failure of the life-machine distinction, text-based systems, grounding, and embodiment
* (16:12) What it is to be a Self, fluidity and persistence
* (22:45) The combination problem in cognitive function, levels and representation
* (27:10) Goals for AI / cognitive science, Prof. Levin’s perspective on building intelligent systems
* (31:25) Adam’s and Prof. Levin’s recent research: regenerative medicine and cancer
* (36:25) Examples of regeneration, Adam on the right approach to the regeneration problem as generation
* (45:25) Protein engineering vs. Adam and Prof. Levin’s program, implicit assumptions underlying biology
* (48:15) Regeneration example in liver disease
* (50:50) Perspectives on AI and its goals
Links:
* Levin Lab homepage
* Forms of life, forms of mind
* Adam’s homepage
* Research
  * On Having No Head: Cognition throughout Biological Systems
  * Technological Approach to Mind Everywhere
  * Living Things Are Not (20th Century) Machines: Updating Mechanism Metaphors in Light of the Modern Science of Machine Behavior
  * Life, death, and self: Fundamental questions of primitive cognition viewed through the lens of body plasticity and synthetic organisms
  * Modular cognition
  * Endless Forms
  * Future Medicine: from molecular pathways to the collective intelligence of the body
  * Technological Approach to Mind Everywhere: an experimentally-grounded framework for understanding diverse bodies and minds
  * The Computational Boundary of a “Self”: Developmental Bioelectricity Drives Multicellularity and Scale-Free Cognition
  * Machine life

Jonathan Frankle: From Lottery Tickets to LLMs
In episode 96 of The Gradient Podcast, Daniel Bashir speaks to Jonathan Frankle. Jonathan is the Chief Scientist at MosaicML and (as of release) Databricks. Jonathan completed his PhD at MIT, where, through his lottery ticket hypothesis, he investigated the properties of sparse neural networks that allow them to train effectively. He also spends a portion of his time working on technology policy, and currently works with the OECD to implement the AI principles he helped develop in 2019.
Outline:
* (00:00) Intro
* (02:35) Jonathan’s background and work
* (04:25) Origins of the Lottery Ticket Hypothesis
* (06:00) Jonathan’s empiricism and approach to science
* (08:25) More Karl Popper discourse + hot takes
* (09:45) Walkthrough of the Lottery Ticket Hypothesis
* (12:00) Issues with the Lottery Ticket Hypothesis as a statement
* (12:30) Jonathan’s advice for PhD students, on asking good questions
* (15:55) Strengths and Promise of the Lottery Ticket Hypothesis
* (18:55) More Lottery Ticket Hypothesis Papers
* (19:10) Comparing Rewinding and Fine-tuning
* (23:00) Care in making experimental choices
* (25:05) Linear Mode Connectivity and the Lottery Ticket Hypothesis
* (27:50) On what is being measured and how
* (28:50) “The outcome of optimization is determined to a linearly connected region”
* (31:15) On good metrics
* (32:54) On the Predictability of Pruning Across Scales: scaling laws for pruning
* (34:40) The paper’s takeaway
* (38:45) Pruning Neural Networks at Initialization: on a scientific disagreement
* (45:00) On making takedown papers useful
* (46:15) On what can be known early in training
* (49:15) Jonathan’s perspective on important research questions today
* (54:40) MosaicML
* (55:19) How Mosaic got started
* (56:17) Mosaic highlights
* (57:33) Customer stories
* (1:00:30) Jonathan’s work and perspectives on AI policy
* (1:05:45) The key question: what we want
* (1:07:35) Outro
Links:
* Jonathan’s homepage and Twitter
* Papers
  * The Lottery Ticket Hypothesis and follow-up work
  * Comparing Rewinding and Fine-tuning in Neural Network Pruning
  * Linear Mode Connectivity and the LTH
  * On the Predictability of Pruning Across Scales
  * Pruning Neural Networks at Initialization: Why Are We Missing The Mark?
  * Desirable Inefficiency

Nao Tokui: "Surfing" Musical Creativity with AI
In episode 95 of The Gradient Podcast, Daniel Bashir speaks to Nao Tokui. Nao Tokui is an artist/DJ and researcher based in Tokyo. While pursuing his Ph.D. at The University of Tokyo, he produced his first music album and singles using AI, including a 12-inch record with Nujabes, a legendary Japanese hip-hop producer. After completing his Ph.D. research, he founded Qosmo, AI Creativity and Music Lab, in 2009. Since then, he has been actively working at the intersection of AI technology and art. Nao and his team's works have been exhibited at renowned venues such as MoMA in New York and the Barbican Centre in London. Their performances have also been showcased at various music festivals, including MUTEK and Sonar. Additionally, he is leading the development of AI-based music instruments at his newly founded company, Neutone. In 2021, Nao received the Okawa Publishing Award for his Japanese book on art, creativity, and AI. The book is scheduled to be released in English as "Surfing human creativity with AI — A user's guide" in November 2023.
Outline:
* (00:00) Intro
* (02:15) Nao’s background and how he got into AI and music
* (05:10) Nao’s experiences as a DJ, collaboration with Nujabes
* (07:10) HCI and music
* (10:35) Leveraging the difference between AI systems and humans
* (12:40) Total control vs. total chaos
* (13:45) Qosmo and the Neutone Project, misusable AI tools
* (17:25) On music and “creating something new”
* (21:00) Declarative and top-down vs. bottom-up creation, individual taste
* (23:50) How generative AI enables humans
* (26:25) On misusing technology and art
* (32:00) Dawn Patrol EP
* (36:00) A two-discriminator GAN for creating music in new genres
* (37:45) The AI DJ Project
* (38:20) The interactive vision of the project
* (42:10) How AI chooses music, breaking from constraints
* (43:15) Interpretability and how an AI system DJs differently
* (45:15) How the project altered Nao’s perspective on DJing, the role of humans
* (51:40) Nao’s book Creating with AI
* (55:15) Human-AI interaction as joint improvisation
* (58:10) Nao’s advice and takeaways for thinking about AI creatively
* (1:01:32) Outro
Links:
* Nao’s homepage and Twitter
* Other links:
  * Neutone, AI audio plugin
  * Real-time AI-generative DJ performance
  * Qosmo
  * Dawn Patrol EP
  * Nao’s book: Surfing human creativity with AI — A user's guide
  * Paper on Creative-GAN for deviating from existing music genres

Divyansh Kaushik: The Realities of AI Policy
In episode 94 of The Gradient Podcast, Daniel Bashir speaks to Divyansh Kaushik. Divyansh is the Associate Director for Emerging Technologies and National Security at the Federation of American Scientists, where his focus areas include, amongst other things, AI policy, STEM immigration, and US-China strategic competition. He holds a PhD from Carnegie Mellon University, where he focused on designing reliable AI systems that align with human values. In addition to his advocacy work on Capitol Hill, he played a key role in establishing the Congressional Graduate Research and Development Caucus. He is a frequent contributor to leading publications, including Vox, National Defense Magazine, The Dispatch, Daily Caller, and Forbes.
Outline:
* (00:00) Intro
* (02:20) Divyansh intro/background
* (06:00) Zachary Lipton Appreciation Session (+ advice from Prof. Lipton)
* (08:00) How Divyansh got involved in policy
* (11:30) What does policy work look like? Divyansh’s early experiences
* (15:42) AI policy issues, divides, party lines
* (19:15) Bringing AI talent into the US
* (26:45) US/China saber-rattling, impact of Xi Jinping’s presidency
* (33:49) China’s AI regulations, CCP motivations, China’s disadvantages in AI and benefits of the US policy process
* (42:42) Trading off AI governance and stifling innovation
* (51:17) AI governance comments from Jeremy Howard / Connor Leahy / Andrew Maynard, regulating use vs. basic technology, limits on scaling
* (1:01:30) Articulating and communicating the issues for AI governance
* (1:03:10) Existential risk concerns in AI governance, theories of change
* (1:10:15) How can AI researchers/practitioners better communicate with policymakers?
* (1:16:57) Outro
Links:
* Divyansh’s Twitter and FAS page
* Divyansh’s policy work:
  * The impact of international scientists, engineers, and students on US research outputs and global competitiveness
  * How Congress can shape AI governance without stifling innovation
  * How Do OpenAI’s Efforts To Make GPT-4 “Safer” Stack Up Against The NIST AI Risk Management Framework?
  * Six Policy Ideas for the National AI Strategy
* Other work mentioned/discussed:
  * Jeremy Howard’s AI Safety and the Age of Dislightenment
  * Proposals from Connor Leahy
  * Andrew Maynard’s Regulating Frontier AI: To Open Source or Not?

Tal Linzen: Psycholinguistics and Language Modeling
In episode 93 of The Gradient Podcast, Daniel Bashir speaks to Professor Tal Linzen. Professor Linzen is an Associate Professor of Linguistics and Data Science at New York University and a Research Scientist at Google. He directs the Computation and Psycholinguistics Lab, where he and his collaborators use behavioral experiments and computational methods to study how people learn and understand language. They also develop methods for evaluating, understanding, and improving computational systems for language processing.
Outline:
* (00:00) Intro
* (02:25) Prof. Linzen’s background
* (05:37) Back and forth between psycholinguistics and deep learning research, LM evaluation
* (08:40) How can deep learning successes/failures help us understand human language use, methodological concerns, comparing human representations to LM representations
* (14:22) Behavioral capacities and degrees of freedom in representations
* (16:40) How LMs are becoming less and less like humans
* (19:25) Assessing LSTMs’ ability to learn syntax-sensitive dependencies
* (22:48) Similarities between structure-sensitive dependencies, sophistication of syntactic representations
* (25:30) RNNs implicitly implement tensor-product representations: vector representations of symbolic structures
* (29:45) Representations required to solve certain tasks, difficulty of natural language
* (33:25) Accelerating progress towards human-like linguistic generalization
* (34:30) The pretraining-agnostic identically distributed evaluation paradigm
* (39:50) Ways to mitigate differences in evaluation
* (44:20) Surprisal does not explain syntactic disambiguation difficulty
* (45:00) How to measure processing difficulty, predictability and processing difficulty
* (49:20) What other factors influence processing difficulty?
* (53:10) How to plant trees in language models
* (55:45) Architectural influences on generalizing knowledge of linguistic structure
* (58:20) “Cognitively relevant regimes” and speed of generalization
* (1:00:45) Acquisition of syntax and sampling simpler vs. more complex sentences
* (1:04:03) Curriculum learning for progressively more complicated syntax
* (1:05:35) Hypothesizing tree-structured representations
* (1:08:00) Reflecting on a prediction from the past
* (1:10:15) Goals and “the correct direction” in AI research
* (1:14:04) Outro
Links:
* Prof. Linzen’s Twitter and homepage
* Papers
  * Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
  * RNNs Implicitly Implement Tensor-Product Representations
  * How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
  * Surprisal does not explain syntactic disambiguation difficulty: evidence from a large-scale benchmark
  * How to Plant Trees in LMs: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases

Kevin K. Yang: Engineering Proteins with ML
In episode 92 of The Gradient Podcast, Daniel Bashir speaks to Kevin K. Yang. Kevin is a senior researcher at Microsoft Research (MSR) who works on problems at the intersection of machine learning and biology, with an emphasis on protein engineering. He completed his PhD at Caltech with Frances Arnold on applying machine learning to protein engineering. Before joining MSR, he was a machine learning scientist at Generate Biomedicines, where he used machine learning to optimize proteins.
Outline:
* (00:00) Intro
* (02:40) Kevin’s background
* (06:00) Protein engineering early in Kevin’s career
* (12:10) From research to real-world proteins: the process
* (17:40) Generative models + pretraining for proteins
* (22:47) Folding diffusion for protein structure generation
* (30:45) Protein evolutionary dynamics and generative models of protein sequences
* (40:03) Analogies and disanalogies between protein modeling and language models
* (41:45) In representation learning
* (45:50) Convolutions vs. transformers and inductive biases
* (49:25) Pretraining tasks for protein structure
* (51:45) More on representation learning for protein structure
* (54:06) Kevin’s thoughts on interpretability in deep learning for protein engineering
* (56:50) Multimodality in protein engineering and future directions
* (59:14) Outro
Links:
* Kevin’s Twitter and homepage
* Research
  * Generative models + pre-training for proteins and chemistry
  * Broad intro to techniques in the space
  * Protein structure generation via folding diffusion
  * Protein sequence design with deep generative models (review)
  * Evolutionary velocity with protein language models predicts evolutionary dynamics of diverse proteins
  * Protein generation with evolutionary diffusion: sequence is all you need
  * ML for protein engineering
  * ML-guided directed evolution for protein engineering (review)
  * Learned protein embeddings for ML
  * Adaptive machine learning for protein engineering (review)
  * Multimodal deep learning for protein engineering

Arjun Ramani & Zhengdong Wang: Why Transformative AI is Really, Really Hard to Achieve
In episode 91 of The Gradient Podcast, Daniel Bashir speaks to Arjun Ramani and Zhengdong Wang. Arjun is the global business and economics correspondent at The Economist. Zhengdong is a research engineer at Google DeepMind.
Outline:
* (00:00) Intro
* (03:53) Arjun intro
* (06:04) Zhengdong intro
* (09:50) How Arjun and Zhengdong met in the woods
* (11:52) Overarching narratives about technological progress and AI
* (14:20) Setting up the claim: Arjun on what “transformative” means
* (15:52) What enables transformative economic growth?
* (21:19) From GPT-3 to ChatGPT; is there something special about AI?
* (24:15) Zhengdong on “real AI” and divisiveness
* (27:00) Arjun on the independence of bottlenecks to progress/growth
* (29:05) Zhengdong on bottleneck independence
* (32:45) More examples on bottlenecks and surplus wealth
* (37:06) Technical arguments: what are the hardest problems in AI?
* (38:00) Robotics
* (40:41) Challenges of deployment in high-stakes settings and data sources / synthetic data, self-driving
* (45:13) When synthetic data works
* (49:06) Harder tasks, process knowledge
* (51:45) Performance art as a critical bottleneck
* (53:45) Obligatory Taylor Swift Discourse
* (54:45) AI Taylor Swift???
* (54:50) The social arguments
* (55:20) Speed of technology diffusion: “diffusion lags” and dynamics of trust with AI
* (1:00:55) ChatGPT adoption, where major productivity gains come from
* (1:03:50) Timescales of transformation
* (1:10:22) Unpredictability in human affairs
* (1:14:07) The economic arguments
* (1:14:35) Key themes: diffusion lags, different sectors
* (1:21:15) More on bottlenecks, AI trust, premiums on human workers
* (1:22:30) Automated systems and human interaction
* (1:25:45) Campaign text reachouts
* (1:30:00) Counterarguments
* (1:30:18) Solving intelligence and solving science/innovation
* (1:34:07) Strengths and weaknesses of the broad applicability of Arjun and Zhengdong’s argument
* (1:35:34) The “proves too much” worry: how could any innovation have ever happened?
* (1:37:25) Examples of bringing down barriers to innovation/transformation
* (1:43:45) What to do with all of this information?
* (1:48:45) Outro
Links:
* Zhengdong’s homepage and Twitter
* Arjun’s homepage and Twitter
* Why transformative artificial intelligence is really, really hard to achieve
* Other resources and links mentioned:
  * Allan-Feuer and Sanders: Transformative AGI by 2043 is
  * On AlphaStar Zero
  * Hardmaru on AI as applied philosophy
  * Robotics Transformer 2
  * Davis Blalock on synthetic data
  * Matt Clancy on automating invention and bottlenecks
  * Michael Webb on 80,000 Hours Podcast
  * Bob Gordon: The Rise and Fall of American Growth
  * OpenAI economic impact paper
  * David Autor: new work paper
  * Baumol effect paper
  * Pew Research Center poll, public concern on AI
  * Human premium Economist piece
  * Callum Williams: London Tube and AI/jobs
  * Culture Series book 1, Iain Banks

Miles Grimshaw: Benchmark, LangChain, and Investing in AI
In episode 90 of The Gradient Podcast, Daniel Bashir speaks to Miles Grimshaw. Miles is a General Partner at Benchmark. He was previously a General Partner at Thrive Capital, where he helped the firm raise its fourth and fifth funds and sourced deals in Lattice, Mapbox, Benchling, and Airtable, among others.
Outline:
* (00:00) Intro
* (02:48) Miles’ background (note: Miles is now the second newest GP at Benchmark)
* (06:07) Miles’ investment philosophy and previous investments
* (12:25) Investing in the “decade of deep learning” and how Miles became interested in AI
* (18:53) Miles’ / Benchmark’s investment in LangChain
* (24:29) On AI advances and adoption
* (39:25) Hardware shortages, radically changing UX for LLMs
* (48:12) Opportunities for AI applications in new domains
* (50:15) Miles’ advice for potential founders in AI
* (1:00:00) Outro
Links:
* Miles’ Twitter
* Benchmark homepage
* LangChain homepage

Shreya Shankar: Machine Learning in the Real World
In episode 89 of The Gradient Podcast, Daniel Bashir speaks to Shreya Shankar. Shreya is a computer scientist pursuing her PhD in databases at UC Berkeley. Her research interest is in building end-to-end systems for people to develop production-grade machine learning applications. She was previously the first ML engineer at Viaduct, did research at Google Brain, and did software engineering at Facebook. She graduated from Stanford with a B.S. and M.S. in computer science with concentrations in systems and artificial intelligence. At Stanford, she helped run SHE++, an organization that helps empower underrepresented minorities in technology.
Outline:
* (00:00) Intro
* (02:22) Shreya’s background and journey into ML / MLOps
* (04:51) ML advances in 2013-2016
* (05:45) Shift in Stanford undergrad class ecosystems, accessibility of deep learning research
* (09:10) Why Shreya left her job as an ML engineer
* (13:30) How Shreya became interested in databases, data quality in ML
* (14:50) Daniel complains about things
* (16:00) What makes ML engineering uniquely difficult
* (16:50) Being a “historian of the craft” of ML engineering
* (22:25) Levels of abstraction, what ML engineers do/don’t have to think about
* (24:16) Observability for Production ML Pipelines
* (28:30) Metrics for real-time ML systems
* (31:20) Proposed solutions
* (34:00) Moving Fast with Broken Data
* (34:25) Existing data validation measures and where they fall short
* (36:31) Partition summarization for data validation
* (38:30) Small data and quantitative statistics for data cleaning
* (40:25) Streaming ML Evaluation
* (40:45) What makes a metric actionable
* (42:15) Differences in streaming ML vs. batch ML
* (45:45) Delayed and incomplete labels
* (49:23) Operationalizing Machine Learning
* (49:55) The difficult life of an ML engineer
* (53:00) Best practices, tools, pain points
* (55:56) Pitfalls in current MLOps tools
* (1:00:30) LLMOps / FMOps
* (1:07:10) Thoughts on ML Engineering, MLE through the lens of data engineering
* (1:10:42) Building products, user expectations for AI products
* (1:15:50) Outro
Links:
* Papers
  * Towards Observability for Production Machine Learning Pipelines
  * Rethinking Streaming ML Evaluation
  * Operationalizing Machine Learning
  * Moving Fast With Broken Data
* Blog posts
  * The Modern ML Monitoring Mess
  * Thoughts on ML Engineering After a Year of my PhD

Stevan Harnad: AI's Symbol Grounding Problem
In episode 88 of The Gradient Podcast, Daniel Bashir speaks to Professor Stevan Harnad. Stevan Harnad is professor of psychology and cognitive science at Université du Québec à Montréal, adjunct professor of cognitive science at McGill University, and professor emeritus of cognitive science at the University of Southampton. His research is on category learning, categorical perception, symbol grounding, the evolution of language, and animal and human sentience (otherwise known as “consciousness”). He is also an advocate for open access and an activist for animal rights.
Outline:
* (00:00) Intro
* (05:20) Professor Harnad’s background: interests in cognitive psychobiology, editing Behavioral and Brain Sciences
* (07:40) John Searle submits the Chinese Room article
* (09:20) Early reactions to Searle and Prof. Harnad’s role
* (13:38) The core of Searle’s argument and the generator of the Symbol Grounding Problem, “strong AI”
* (19:00) Ways to ground symbols
* (20:26) The acquisition of categories
* (25:00) Pantomiming, non-linguistic category formation
* (27:45) Mathematics, abstraction, and grounding
* (36:20) Symbol manipulation and interpretation language
* (40:40) On the Whorf Hypothesis
* (48:39) Defining “grounding” and introducing the “T3” Turing Test
* (53:22) Turing’s concerns, AI and reverse-engineering cognition
* (59:25) Other Minds, T4 and zombies
* (1:05:48) Degrees of freedom in solutions to the Turing Test, the easy and hard problems of cognition
* (1:14:33) Over-interpretation of AI systems’ behavior, sentience concerns, T3 and evidence for sentience
* (1:24:35) Prof. Harnad’s commentary on claims in The Vector Grounding Problem
* (1:28:05) RLHF and grounding, LLMs’ (ungrounded) capabilities, syntactic structure and propositions
* (1:35:30) Multimodal AI systems (image-text and robotic) and grounding, compositionality
* (1:42:50) Chomsky’s Universal Grammar, LLMs and T2
* (1:50:55) T3 and cognitive simulation
* (1:57:34) Outro
Links:
* Professor Harnad’s webpage and skywritings
* Papers:
  * Category Induction and Representation
  * Categorical Perception
  * From Sensorimotor Categories to Grounded Symbols
  * Minds, machines and Searle 2
  * The Latent Structure of Dictionaries

Terry Winograd: AI, HCI, Language, and Cognition
In episode 87 of The Gradient Podcast, Daniel Bashir speaks to Professor Terry Winograd. Professor Winograd is Professor Emeritus of Computer Science at Stanford University. His research focuses on human-computer interaction design and the design of technologies for development. He founded the Stanford Human-Computer Interaction Group, where he directed the teaching programs and HCI research. He is also a founding faculty member of the Stanford d.school and a founding member and past president of Computer Professionals for Social Responsibility.
Outline:
* (00:00) Intro
* (03:00) Professor Winograd’s background
* (05:10) At the MIT AI Lab
* (05:45) The atmosphere in the MIT AI Lab, Minsky/Chomsky debates
* (06:20) Blue-sky research, government funding for academic research
* (10:10) Isolation and collaboration between research groups
* (11:45) Phases in the development of ideas and how cross-disciplinary work fits in
* (12:26) SHRDLU and the MIT AI Lab’s intellectual roots
* (17:20) Early responses to SHRDLU: Minsky, Dreyfus, others
* (20:55) How Prof. Winograd’s thinking about AI’s abilities and limitations evolved
* (22:25) How this relates to current AI systems and discussions of intelligence
* (23:47) Repetitive debates in AI, semantics and grounding
* (27:00) The concept of investment, care, trust in human communication vs. machine communication
* (28:53) Projecting human-ness onto AI systems and non-human things and what this means for society
* (31:30) Time after leaving MIT in 1973, time at Xerox PARC, how Winograd’s thinking evolved during this time
* (38:28) What Does It Mean to Understand Language? Speech acts, commitments, and the grounding of language
* (42:40) Reification of representations in science and ML
* (46:15) LLMs, their training processes, and their behavior
* (49:40) How do we coexist with systems that we don’t understand?
* (51:20) Progress narratives in AI and human agency
* (53:30) Transitioning to intelligence augmentation, founding the Stanford HCI group and d.school, advising Larry Page and Sergey Brin
* (1:01:25) Chatbots and how we consume information
* (1:06:52) Evolutions in journalism, progress in trust for modern AI systems
* (1:09:18) Shifts in the social contract, from institutions to personalities
* (1:12:05) AI and HCI in recent years
* (1:17:05) Philosophy of design and the d.school
* (1:21:20) Designing AI systems for people
* (1:25:10) Prof. Winograd’s perspective on watermarking for detecting GPT outputs
* (1:25:55) The politics of being a technologist
* (1:30:10) Echoes of the past in AI regulation and competition and learning from history
* (1:32:34) Outro
Links:
* Professor Winograd’s Homepage
* Papers/topics discussed:
  * SHRDLU
  * Beyond Programming Languages
  * What Does It Mean to Understand Language?
  * The PageRank Citation Ranking
  * Stanford Digital Libraries project
  * Talk: My Politics as a Technologist

Gil Strang: Linear Algebra and Deep Learning
In episode 86 of The Gradient Podcast, Daniel Bashir speaks to Professor Gil Strang. Professor Strang is one of the world’s foremost mathematics educators and a mathematician with contributions to finite element theory, the calculus of variations, wavelet analysis, and linear algebra. He has spent six decades teaching mathematics at MIT, where he was the MathWorks Professor of Mathematics. He was among the first MIT faculty members to publish a course on MIT’s OpenCourseWare and has since championed both linear algebra education and open courseware.
Outline:
* (00:00) Intro
* (02:00) Professor Strang’s background and journey into teaching linear algebra
* (04:55) Undergrad interests
* (07:10) Writing textbooks
* (10:20) Prof. Strang’s interests in deep learning
* (11:00) How Professor Strang thought about teaching early on
* (16:20) MIT OpenCourseWare and education accessibility
* (19:50) Prof. Strang’s applied/example-based approach to teaching linear algebra and closing the theory-practice gap
* (22:00) Examples!
* (27:20) Orthogonality
* (29:15) Singular values
* (34:40) Professor Strang’s favorite topics in linear algebra
* (37:55) Pedagogical approaches to deep learning, mathematical ingredients of deep learning’s complexity
* (42:04) Generalization and double descent in deep learning, powers and limitations
* (46:20) Did deep learning have to evolve as it did?
* (48:30) Teaching deep learning to younger students
* (50:50) How Prof. Strang’s approach to teaching linear algebra has evolved over time
* (53:00) The Four Fundamental Subspaces
* (56:15) Reflections on a career in teaching
* (59:49) Outro
Links:
* Professor Strang’s homepage

Anant Agarwal: AI for Education
In episode 85 of The Gradient Podcast, Andrey Kurenkov speaks to Anant Agarwal. Anant Agarwal is the chief platform officer of 2U and founder of edX. Anant taught the first edX course, on circuits and electronics from MIT, which drew 155,000 students from 162 countries. He has served as the director of CSAIL, MIT's Computer Science and Artificial Intelligence Laboratory, and is a professor of electrical engineering and computer science at MIT. He is a successful serial entrepreneur, having co-founded several companies including Tilera Corporation, which created the Tile multicore processor, and Virtual Machine Works.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (01:30) History with research
* (05:56) Founding edX
* (13:05) AI at edX
* (18:40) Reaction to AI as a teacher
* (25:00) Student interest in AI
* (32:20) AI’s impact on academia
* (35:00) Future of AI in education
* (38:25) AI writing essays
* (43:38) Experiences playing with ChatGPT
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Raphaël Millière: The Vector Grounding Problem and Self-Consciousness
In episode 84 of The Gradient Podcast, Daniel Bashir speaks to Professor Raphaël Millière. Professor Millière is a Lecturer (Assistant Professor) in the Philosophy of Artificial Intelligence at Macquarie University in Sydney, Australia. Previously, he was the 2020 Robert A. Burt Presidential Scholar in Society and Neuroscience in Columbia University’s Center for Science and Society, and he completed his DPhil in philosophy at the University of Oxford, where he focused on self-consciousness.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (02:20) Prof. Millière’s background
* (08:07) AI + philosophy questions and the human side / empiricism
* (18:38) Putting aside metaphysical issues
* (20:28) Prof. Millière’s work on self-consciousness: does consciousness constitutively involve self-consciousness?
* (32:05) Relationship to recent pronouncements about AI sentience
* (41:54) Chatbots’ self-presentation as having a “self”
* (51:05) Intro to grounding and related concepts
* (1:00:06) The different types of grounding
* (1:08:48) Lexical representations and things in the world, distributional hypothesis, concepts in LLMs
* (1:21:40) Representational content and overcoming the vector grounding problem
* (1:32:01) Causal-informational relations and teleology
* (1:43:45) Levels of grounding, extralinguistic aspects of meaning
* (1:52:12) Future problems and ongoing projects
* (2:04:05) Outro
Links:
* Professor Millière’s homepage and Twitter
* Research
* Are There Degrees of Self-Consciousness?
* The Varieties of Selflessness
* Selfless Memories
* The Vector Grounding Problem
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Peli Grietzer: A Mathematized Philosophy of Literature
In episode 83 of The Gradient Podcast, Daniel Bashir speaks to Peli Grietzer. Peli is a scholar whose work borrows mathematical ideas from machine learning theory to think through “ambient” and ineffable phenomena like moods, vibes, cultural logics, and structures of feeling. He is working on a book titled Big Mood: A Transcendental-Computational Essay in Art and contributes to the experimental literature collective Gauss PDF. Peli has a PhD in mathematically informed literary theory from Harvard Comparative Literature, in collaboration with the HUJI Einstein Institute of Mathematics.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (02:17) Peli’s background
* (10:40) Daniel takes 2 entire minutes to ask how Peli thinks about ~ Art ~
* (26:10) Idealism and art as revealing the nature of reality, extralinguistic experiences of truth through literature
* (52:05) The autoencoder as a way to understand Romantic theories of art
* (1:14:55) More on how Peli thinks about autoencoders
* (1:18:05) Connections to ambient meaning, stimmung/mood
* (1:37:18) Examples of poetry/literature as mathematical experience, aesthetic unity and totalizing worldviews
* (1:51:15) Moods clashing within a single work
* (2:10:14) Modernist writers
* (2:32:46) Outro
Links:
* Peli’s Twitter
* A Theory of Vibe
* Why poetry is a variety of mathematical experience
* Peli’s thesis
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Ryan Drapeau: Battling Fraud with ML at Stripe
In episode 82 of The Gradient Podcast, Daniel Bashir speaks to Ryan Drapeau. Ryan is a Staff Software Engineer at Stripe and technical lead for Stripe’s Payment Fraud organization, which uses machine learning to help prevent billions of dollars of credit card and payments fraud for businesses every year.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (02:15) Ryan’s background
* (05:28) Differences between adversarial problems (fraud, content moderation, etc.)
* (08:50) How fraud manifests for businesses
* (11:07) Types of fraud
* (15:49) Fraud as an industry
* (19:05) Information asymmetries between fraudsters and defenders
* (22:40) Fraud as an ML problem and Stripe Radar
* (25:45) Evolution of Stripe Radar
* (31:38) Architectural evolution
* (41:38) Why ResNets for Stripe Radar?
* (44:15) Future architectures for Stripe Radar and the explainability/performance tradeoff
* (48:58) War stories
* (52:55) Federated learning opportunities for Stripe Radar
* (55:50) Vectors for improvement in Stripe’s fraud detection systems
* (59:22) More ways of thinking about the fraud problem, multiclass models
* (1:03:30) Lessons Ryan has picked up from working on fraud
* (1:05:44) Outro
Links:
* How We Built It: Stripe Radar
* Stripe 2022 Update
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Shiv Rao: Enabling Better Patient Care with AI
In episode 81 of The Gradient Podcast, Daniel Bashir speaks to Shiv Rao. Shiv Rao, MD, is the co-founder and CEO of Abridge, a healthcare conversation company that uses cutting-edge NLP and generative AI to bring context and understanding to every medical conversation. Shiv previously served as an Executive Vice President at UPMC Enterprises, managing the provider-facing portfolio of technology investments and R&D. He is a practicing cardiologist in UPMC's Heart and Vascular Institute.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (01:34) Shiv’s medicine/technology/VC background
* (05:45) Difficulties for tech in healthcare and how this informs Shiv’s approach
* (10:52) “Productivity with a smile” and how AI can make medicine feel more human
* (12:35) Shiv’s experiences in medicine and how Abridge’s product helps doctors
* (16:10) How the role of a clinical team could evolve
* (19:30) Abridge’s partnerships and real-life use cases
* (23:00) Shiv’s perspectives on concerns about bias/trust/privacy
* (25:25) Clinical decision support vs. “automating doctors”
* (29:07) Transparency and Abridge’s user experience
* (35:20) Algorithmic solutionism vs. human-focused approaches to technology development
* (38:50) Ways AI might impact healthcare
* (41:10) Generative AI applications
* (45:00) Generative AI opportunities beyond documentation
* (49:25) Innovation and reducing friction, UX
* (50:56) Why people make wild predictions about AI
* (54:25) What it means to “automate away” a doctor, how we’re misusing the medical workforce
* (56:10) Shiv’s advice for people interested in AI + healthcare
* (1:00:04) Outro
Links:
* Abridge Homepage
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Hugo Larochelle: Deep Learning as Science
In episode 80 of The Gradient Podcast, Daniel Bashir speaks to Professor Hugo Larochelle. Professor Larochelle leads the Montreal Google DeepMind team and is an adjunct professor at Université de Montréal and a Canada CIFAR Chair. His research focuses on the study and development of deep learning algorithms.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (01:38) Prof. Larochelle’s background, working in Bengio’s lab
* (04:53) Prof. Larochelle’s work and connectionism
* (08:20) 2004-2009, work with Bengio
* (08:40) Nonlocal Estimation of Manifold Structure, manifolds and deep learning
* (13:58) Manifold learning in vision and language
* (16:00) Relationship to Denoising Autoencoders and greedy layer-wise pretraining
* (21:00) From input copying to learning about local distribution structure
* (22:30) Zero-Data Learning of New Tasks
* (22:45) The phrase “extend machine learning towards AI” and terminology
* (26:55) Prescient hints of prompt engineering
* (29:10) Daniel goes on a totally unnecessary tangent
* (30:00) Methods for training deep networks (strategies and robust interdependent codes)
* (33:45) Motivations for layer-wise pretraining
* (35:15) Robust Interdependent Codes and interactions between neurons in a single network layer
* (39:00) 2009-2011, postdoc in Geoff Hinton’s lab
* (40:00) Reflections on the AlexNet moment
* (41:45) Frustration with methods for evaluating unsupervised methods, NADE
* (44:45) How researchers thought about representation learning, toying with objectives instead of architectures
* (47:40) The Restricted Boltzmann Forest
* (50:45) Imposing structure for tractable learning of distributions
* (53:11) 2011-2016 at U Sherbrooke (and Twitter)
* (53:45) How Prof. Larochelle approached research problems
* (56:00) How Domain Adversarial Networks came about
* (57:12) Can we still learn from Restricted Boltzmann Machines?
* (1:02:20) The ~ Infinite ~ Restricted Boltzmann Machine
* (1:06:55) The need for researchers doing different sorts of work
* (1:08:58) 2017-present, at MILA (and Google)
* (1:09:30) Modulating Early Visual Processing by Language, neuroscientific inspiration
* (1:13:22) Representation learning and generalization, what is a good representation (Meta-Dataset, Universal representation transformer layer, universal template, Head2Toe)
* (1:15:10) Meta-Dataset motivation
* (1:18:00) Shifting focus to the problem: good practices for “recycling deep learning”
* (1:19:15) Head2Toe intuitions
* (1:21:40) What are “universal representations” and manifold perspective on datasets, what is the right pretraining dataset
* (1:26:02) Prof. Larochelle’s takeaways from Fortuitous Forgetting in Connectionist Networks (led by Hattie Zhou)
* (1:32:15) Obligatory commentary on The Present Moment and current directions in ML
* (1:36:18) The creation and motivations of the TMLR journal
* (1:41:48) Prof. Larochelle’s takeaways about doing good science, building research groups, and nurturing a research environment
* (1:44:05) Prof. Larochelle’s advice for aspiring researchers today
* (1:47:41) Outro
Links:
* Professor Larochelle’s homepage and Twitter
* Transactions on Machine Learning Research
* Papers
* 2004-2009
* Nonlocal Estimation of Manifold Structure
* Classification using Discriminative Restricted Boltzmann Machines
* Zero-data learning of new tasks
* Exploring Strategies for Training Deep Neural Networks
* Deep Learning using Robust Interdependent Codes
* 2009-2011
* Stacked Denoising Autoencoders
* Tractable multivariate binary density estimation and the restricted Boltzmann forest
* The Neural Autoregressive Distribution Estimator
* Learning Attentional Policies for Tracking and Recognition in Video with Deep Networks
* 2011-2016
* Practical Bayesian Optimization of Machine Learning Algorithms
* Learning Algorithms for the Classification Restricted Boltzmann Machine
* A neural autoregressive topic model
* Domain-Adversarial Training of Neural Networks
* NADE
* An Infinite Restricted Boltzmann Machine
* 2017-present
* Modulating early visual processing by language
* Meta-Dataset
* A Universal Representation Transformer Layer for Few-Shot Image Classification
* Learning a universal template for few-shot dataset generalization
* Impact of aliasing on generalization in deep convolutional networks
* Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning
* Fortuitous Forgetting in Connectionist Networks
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Jeremie Harris: Realistic Alignment and AI Policy
In episode 79 of The Gradient Podcast, Daniel Bashir speaks to Jeremie Harris. Jeremie is co-founder of Gladstone AI, author of the book Quantum Physics Made Me Do It, and co-host of the Last Week in AI Podcast. Jeremie previously hosted the Towards Data Science podcast and worked on a number of other startups after leaving a PhD in physics.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (01:37) Jeremie’s physics background and transition to ML
* (05:19) The physicist-to-AI person pipeline, how Jeremie’s background impacts his approach to AI
* (08:20) A tangent on inflationism/deflationism about natural laws (I promise this applies to AI)
* (11:45) How ML implies a particular viewpoint on the above question
* (13:20) Jeremie’s first (recommendation systems) company, how startup founders can make mistakes even when they’ve read Paul Graham essays
* (17:30) Classic startup wisdom, different sorts of startups
* (19:35) OpenAI’s approach in shipping features for DALL-E 2 and generation vs. discrimination as an approach to product
* (24:55) Capabilities and risk
* (26:43) Commentary on fundamental limitations of alignment in LLMs
* (30:45) Intrinsic difficulties in alignment problems
* (41:15) Daniel tries to steelman / defend anti-longtermist arguments (nicely :) )
* (46:23) Anthropic’s paper on asking models to be less biased
* (47:20) Why Jeremie is excited about Anthropic’s Constitutional AI scheme
* (51:05) Jeremie’s thoughts on recent Eliezer discourse
* (56:50) Cheese / task vectors and steerability/controllability in LLMs
* (59:50) Difficulty of one-shot solutions in alignment work, better strategies
* (1:02:00) Lack of theoretical understanding of deep learning systems / alignment
* (1:04:50) Jeremie’s work and perspectives on AI policy
* (1:10:00) Incrementality in convincing policymakers
* (1:14:00) How recent developments impact policy efforts
* (1:16:20) Benefits and drawbacks of open source
* (1:19:30) Arguments in favor of (limited) open source
* (1:20:35) Quantum Physics (not Mechanics) Made Me Do It
* (1:24:10) Some theories of consciousness and corresponding physics
* (1:29:49) Outro
Links:
* Jeremie’s Twitter
* Quantum Physics Made Me Do It
* Gladstone AI
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Antoine Blondeau: Alpha Intelligence Capital and Investing in AI
In episode 78 of The Gradient Podcast, Daniel Bashir speaks to Antoine Blondeau. Antoine is a serial AI entrepreneur and Co-Founder and Managing Partner of Alpha Intelligence Capital. He was chief executive at Dejima when the firm worked on CALO, one of the biggest AI projects in US history and a precursor to Apple’s Siri. Later, he co-founded Sentient Technologies, which boasted the title of the world’s highest-funded AI company in 2016. In 2018, he founded Alpha Intelligence Capital to support future AI unicorns, and has raised more than $300 million, which has been deployed into 31 companies.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (01:30) Antoine’s background
* (04:00) Dejima and the CALO cognitive assistant (the precursor to Siri)
* (07:35) More detail on CALO
* (10:10) Sentient Technologies and entrepreneurship during the AlexNet moment
* (14:35) Early predictions on scale
* (17:15) Role of evolutionary computation and neuroevolution
* (20:00) Antoine’s motivations for becoming an investor
* (22:30) Alpha Intelligence Capital’s investment focus
* (27:40) Safety and trust issues in fully automated systems
* (37:00) Models of culture, discernment in the use of AI systems
* (39:30) Antoine’s experience as an investor in today’s AI environment
* (44:43) How modern LLMs impact standard advice regarding the appropriateness of cutting-edge technologies in business
* (49:00) Data (and other) moats
* (52:07) Application/research areas Antoine is watching
* (55:00) Antoine’s advice for people watching AI’s current developments
* (58:47) Outro
Links:
* Alpha Intelligence Capital Homepage
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Joon Park: Generative Agents and Human-Computer Interaction
In episode 77 of The Gradient Podcast, Daniel Bashir speaks to Joon Park. Joon is a third-year PhD student at Stanford, advised by Professors Michael Bernstein and Percy Liang. He designs, builds, and evaluates interactive systems that support new forms of human-computer interaction by leveraging state-of-the-art advances in natural language processing such as large language models. His research introduced the concept of, and the techniques for building, generative agents: computational software agents that simulate believable human behavior. Joon’s work has been supported by the Microsoft Research PhD Fellowship, the Stanford School of Engineering Fellowship, and the Siebel Scholarship.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (01:43) Joon’s path from studio art to social computing / AI
* (05:00) Joon’s perspectives on Human-Computer Interaction (HCI) and its recent evolution
* (06:45) How foundation models enter the picture
* (10:28) On slow algorithms and technology: A Slow Algorithm Improves Users’ Assessments of the Algorithm’s Accuracy
* (12:10) Motivations
* (17:55) The jellybean-counting task, hypotheses
* (22:00) Applications and takeaways
* (28:05) Deliberate engagement in social media / computing systems, incentives
* (32:55) Daniel rants about The Social Dilemma + anti-social-media rhetoric, Joon on the role of academics, framings of addiction
* (39:05) Measuring the Prevalence of Anti-Social Behavior in Online Communities
* (48:30) Statistics on anti-social behavior and anecdotal information, limitations in the paper’s measurements
* (51:45) Participatory and value-sensitive design
* (52:50) “Interaction” in On the Opportunities and Risks of Foundation Models
* (53:45) Broader insights on foundation models and emergent behavior
* (56:50) Joon’s section on interaction
* (1:01:05) Daniel’s bad segue to Social Simulacra: Creating Populated Prototypes for Social Computing Systems
* (1:02:50) Context for Social Simulacra and Generative Agents, why Social Simulacra was tackled first
* (1:24:05) The value of norms
* (1:26:20) Collaborations between designers and developers of social simulacra
* (1:30:00) Generative Agents: Interactive Simulacra of Human Behavior
* (1:30:30) Context / intro
* (1:45:10) On (too much) coherence in generative agents and believability
* (1:52:02) Instruction tuning’s impact on generative agents, model alignment w/ believability goals, desirability of agent conflict / toxic LLMs
* (1:56:55) Release strategies and toxicity in LLMs
* (2:03:05) On designing interfaces and responsible use
* (2:09:05) Capability advances and the capability-safety research gap
* (2:14:12) Worries about LLM integration, human-centered framework for technology release / LLM incorporation
* (2:18:00) Joon’s philosophy as an HCI researcher
* (2:20:39) Outro
Links:
* Joon’s homepage and Twitter
* Research
* A Slow Algorithm Improves Users’ Assessments of the Algorithm’s Accuracy
* Measuring the Prevalence of Anti-Social Behavior in Online Communities
* On the Opportunities and Risks of Foundation Models
* Social Simulacra: Creating Populated Prototypes for Social Computing Systems
* Generative Agents: Interactive Simulacra of Human Behavior
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Christoffer Holmgård: AI for Video Games
In episode 76 of The Gradient Podcast, Andrey Kurenkov speaks to Dr. Christoffer Holmgård. Dr. Holmgård is a co-founder and the CEO of Modl.ai, which is building an AI engine for game development. Before starting the company, Christoffer was director of the indie game studio Die Gute Fabrik (German for "The Good Factory"), and he has also done extensive research as an assistant professor in AI and Machine Learning for Games at Northeastern University.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (01:30) History with video games
* (06:30) History with AI
* (09:40) Modeling stress responses in virtual environments
* (13:30) Play style personas from empirical data
* (17:15) Automating video game testing
* (21:00) Video game development
* (28:15) modl.ai
* (33:45) Automated playtesting with procedural personas through MCTS with evolved heuristics
* (35:40) Thoughts on recent AI progress
* (40:50) RL for game testing
* (44:40) AI in Minecraft
* (47:50) Impact of AI on video game development
* (01:00:00) Ethics of Gen AI
* (01:06:20) Hobbies / Interests
* (01:08:30) Outro
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Riley Goodside: The Art and Craft of Prompt Engineering
In episode 75 of The Gradient Podcast, Daniel Bashir speaks to Riley Goodside. Riley is a Staff Prompt Engineer at Scale AI. Riley began posting GPT-3 prompt examples and screenshot demonstrations in 2022. He previously worked as a data scientist at OkCupid, Grindr, and CopyAI.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (01:37) Riley’s journey to becoming the first Staff Prompt Engineer
* (02:00) Data science background in the online dating industry
* (02:15) Sabbatical + catching up on LLM progress
* (04:00) AI Dungeon and first taste of GPT-3
* (05:10) Developing on Codex, ideas about integrating Codex with Jupyter Notebooks, start of posting on Twitter
* (08:30) “LLM ethnography”
* (09:12) The history of prompt engineering: in-context learning, Reinforcement Learning from Human Feedback (RLHF)
* (10:20) Models used to be harder to talk to
* (10:45) The three eras
* (10:45) 1 - Pre-trained LM era: simple next-word predictors
* (12:54) 2 - Instruction tuning
* (16:13) 3 - RLHF and overcoming instruction tuning’s limitations
* (19:24) Prompting as subtractive sculpting, prompting and AI safety
* (21:17) Riley on RLHF and safety
* (24:55) Riley’s most interesting experiments and observations
* (25:50) Mode collapse in RLHF models
* (29:24) Prompting models with very long instructions
* (33:13) Explorations with regular expressions, chain-of-thought prompting styles
* (36:32) Theories of in-context learning and prompting, why certain prompts work well
* (42:20) Riley’s advice for writing better prompts
* (49:02) Debates over prompt engineering as a career, relevance of prompt engineers
* (58:55) Outro
Links:
* Riley’s Twitter and LinkedIn
* Talk: LLM Prompt Engineering and RLHF: History and Techniques
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Talia Ringer: Formal Verification and Deep Learning
In episode 74 of The Gradient Podcast, Daniel Bashir speaks to Professor Talia Ringer. Professor Ringer is an Assistant Professor with the Programming Languages, Formal Methods, and Software Engineering group at the University of Illinois at Urbana-Champaign. Their research leverages proof engineering to allow programmers to more easily build formally verified software systems.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Daniel’s long annoying intro
* (02:15) Origin Story
* (04:30) Why / when formal verification is important
* (06:40) Concerns about ChatGPT/AutoGPT et al. failures, systems for accountability
* (08:20) Difficulties in making formal verification accessible
* (11:45) Tactics and interactive theorem provers, interface issues
* (13:25) How Prof. Ringer’s research first crossed paths with ML
* (16:00) Concrete problems in proof automation
* (16:15) How ML can help people verifying software systems
* (20:05) Using LLMs for understanding / reasoning about code
* (23:05) Going from tests / formal properties to code
* (31:30) Is deep learning the right paradigm for dealing with relations for theorem proving?
* (36:50) Architectural innovations, neuro-symbolic systems
* (40:00) Hazy definitions in ML
* (41:50) Baldur: Proof Generation & Repair with LLMs
* (45:55) In-context learning’s effectiveness for LLM-based theorem proving
* (47:12) LLMs without fine-tuning for proofs
* (48:45) Something ~ surprising ~ about Baldur results (maybe clickbait or maybe not)
* (49:32) Asking models to construct proofs with restrictions, translating proofs to formal proofs
* (52:07) Methods of proofs and relative difficulties
* (57:45) Verifying / providing formal guarantees on ML systems
* (1:01:15) Verifying input-output behavior and basic considerations, nature of guarantees
* (1:05:20) Certified/verified systems vs. certifying/verifying systems: getting LLMs to spit out proofs along with code
* (1:07:15) Interpretability and how much model internals matter, RLHF, mechanistic interpretability
* (1:13:50) Levels of verification for deploying ML systems, HCI problems
* (1:17:30) People (Talia) actually use Bard
* (1:20:00) Dual-use and “correct behavior”
* (1:24:30) Good uses of jailbreaking
* (1:26:30) Talia’s views on evil AI / AI safety concerns
* (1:32:00) Issues with talking about “intelligence,” assumptions about what “general intelligence” means
* (1:34:20) Difficulty in having grounded conversations about capabilities, transparency
* (1:39:20) Great quotation to steal for your next thinkpiece + intelligence as socially defined
* (1:42:45) Exciting research directions
* (1:44:48) Outro
Links:
* Talia’s Twitter and homepage
* Research
* Concrete Problems in Proof Automation
* Baldur: Whole-Proof Generation and Repair with LLMs
* Research ideas
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Brigham Hyde: AI for Clinical Decision-Making
In episode 73 of The Gradient Podcast, Daniel Bashir speaks to Brigham Hyde. Brigham is Co-Founder and CEO of Atropos Health. Prior to Atropos, he served as President of Data and Analytics at Eversana, a life sciences commercialization service provider. He led the investment in Concert AI in the oncology real-world data space at Symphony AI. Brigham has also held research faculty positions at Tufts University and the MIT Media Lab.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (01:55) Brigham’s background
* (06:00) Current challenges in healthcare
* (12:33) Interpretability and delivering positive patient outcomes
* (17:10) How Atropos surfaces relevant data for patient interventions, on personalized observational research studies
* (22:10) Quality and quantity of data for patient interventions
* (27:25) Challenges and opportunities for generative AI in healthcare
* (35:17) Database augmentation for generative models
* (36:25) Future work for Atropos
* (39:15) Future directions for AI + healthcare
* (40:56) Outro
Links:
* Atropos Health homepage
* Brigham’s Twitter and LinkedIn
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Scott Aaronson: Against AI Doomerism
In episode 72 of The Gradient Podcast, Daniel Bashir speaks to Professor Scott Aaronson. Scott is the Schlumberger Centennial Chair of Computer Science at the University of Texas at Austin and director of its Quantum Information Center. His research interests focus on the capabilities and limits of quantum computers and computational complexity theory more broadly. He has recently been on leave to work at OpenAI, where he is researching theoretical foundations of AI safety.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected].
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (01:45) Scott’s background
* (02:50) Starting grad school in AI, transitioning to quantum computing and the AI / quantum computing intersection
* (05:30) Where quantum computers can give us exponential speedups, simulation overhead, Grover’s algorithm
* (10:50) Overselling of quantum computing applied to AI, Scott’s analysis of quantum machine learning
* (18:45) ML problems that involve quantum mechanics and Scott’s work
* (21:50) Scott’s recent work at OpenAI
* (22:30) Why Scott was skeptical of AI alignment work early on
* (26:30) Unexpected improvements in modern AI and Scott’s belief update
* (32:30) Preliminary Analysis of DALL-E 2 (Marcus & Davis)
* (34:15) Watermarking GPT outputs
* (41:00) Motivations for watermarking and language model detection
* (45:00) Ways around watermarking
* (46:40) Other aspects of Scott’s experience with OpenAI, theoretical problems
* (49:10) Thoughts on definitions for humanistic concepts in AI
* (58:45) Scott’s “reform AI alignment stance” and Eliezer Yudkowsky’s recent comments (+ Daniel pronounces Eliezer wrong), orthogonality thesis, cases for stopping scaling
* (1:08:45) Outro
Links:
* Scott’s blog
* AI-related work
* Quantum Machine Learning Algorithms: Read the Fine Print
* A very preliminary analysis of DALL-E 2 w/ Marcus and Davis
* New AI classifier for indicating AI-written text and Watermarking GPT Outputs
* Writing
* Should GPT exist?
* AI Safety Lecture
* Why I’m not terrified of AI
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Ted Underwood: Machine Learning and the Literary Imagination
In episode 71 of The Gradient Podcast, Daniel Bashir speaks to Ted Underwood.

Ted is a professor in the School of Information Sciences with an appointment in the Department of English at the University of Illinois at Urbana-Champaign. Trained in English literary history, he turned his research focus to applying machine learning to large digital collections. His work explores literary patterns that become visible across long timelines when we consider many works at once, and often involves correcting and enriching digital collections to make them more amenable to interesting literary research.

Outline:
* (00:00) Intro
* (01:42) Ted’s background / origin story
* (04:35) Context in interpreting statistics, “you need a model,” the need for data about human responses to literature and how that manifested in Ted’s work
* (07:25) The recognition that we can model literary prestige/genre because of ML
* (08:30) Distant reading and the import of statistics over large digital libraries
* (12:00) Literary prestige
* (12:45) How predictable is fiction? Scales of predictability in texts
* (13:55) Degrees of autocorrelation in biography and fiction and the structure of narrative, how LMs might offer more sophisticated analysis
* (15:15) Braided suspense / suspense at different scales of a story
* (17:05) The Literary Uses of High-Dimensional Space: how “big data” came to impact the humanities, skepticism from humanists and responses, what you can do with word count
* (20:50) Why we could use more time to digest statistical ML—how acceleration in AI advances might impact pedagogy
* (22:30) The value in explicit models
* (23:30) Poetic “revolutions” and literary prestige
* (25:53) Distant vs. close reading in poetry—follow-up work for “The Longue Durée”
* (28:20) Sophistication of NLP and approaching the human experience
* (29:20) What about poetry renders it prestigious?
* (32:20) Individualism/liberalism and evolution of poetic taste
* (33:20) Why there is resistance to quantitative approaches to literature
* (34:00) Fiction in other languages
* (37:33) The Life Cycles of Genres
* (38:00) The concept of “genre”
* (41:00) Inflationary/deflationary views on natural kinds and genre
* (44:20) Genre as a social and not a linguistic phenomenon
* (46:10) Will causal models impact the humanities?
* (48:30) (Ir)reducibility of cultural influences on authors
* (50:00) Machine Learning and Human Perspective
* (50:20) Fluent and perspectival categories—Miriam Posner on “the radical, unrealized potential of digital humanities.”
* (52:52) How ML’s vices can become virtues for humanists
* (56:05) Can We Map Culture? and The Historical Significance of Textual Distances
* (56:50) Are cultures and other social phenomena related to one another in a way we can “map”?
* (59:00) Is cultural distance Euclidean?
* (59:45) The KL Divergence’s use for humanists
* (1:03:32) We don’t already understand the broad outlines of literary history
* (1:06:55) Science Fiction Hasn’t Prepared us to Imagine Machine Learning
* (1:08:45) The latent space of language and what intelligence could mean
* (1:09:30) LLMs as models of culture
* (1:10:00) What it is to be a human in “the age of AI” and Ezra Klein’s framing
* (1:12:45) Mapping the Latent Spaces of Culture
* (1:13:10) Ted on Stochastic Parrots
* (1:15:55) The risk of AI enabling hermetically sealed cultures
* (1:17:55) “Postcards from an unmapped latent space,” more on AI systems’ limitations as virtues
* (1:20:40) Obligatory GPT-4 section
* (1:21:00) Using GPT-4 to estimate passage of time in fiction
* (1:23:39) Is deep learning more interpretable than statistical NLP?
* (1:25:17) The “self-reports” of language models: should we trust them?
* (1:26:50) University dependence on tech giants, open-source models
* (1:31:55) Reclaiming Ground for the Humanities
* (1:32:25) What scientists, alone, can contribute to the humanities
* (1:34:45) On the future of the humanities
* (1:35:55) How computing can enable humanists as humanists
* (1:37:05) Human self-understanding as a collaborative project
* (1:39:30) Is anything ineffable? On what AI systems can “grasp”
* (1:43:12) Outro

Links:
* Ted’s blog and Twitter
* Research
* The literary uses of high-dimensional space
* The Longue Durée of literary prestige
* The Historical Significance of Textual Distances
* Machine Learning and Human Perspective
* The life cycles of genres
* Can We Map Culture?
* Cohort Succession Explains Most Change in Literary Culture
* Other Writing
* Reclaiming Ground for the Humanities
* We don’t already understand the broad outlines of literary history
* Science fiction hasn’t prepared us to imagine machine learning
* How predictable is fiction?
* Mapping the latent spaces of culture
* Using GPT-4 to measure the passage of time in fiction

Irene Solaiman: AI Policy and Social Impact
In episode 70 of The Gradient Podcast, Daniel Bashir speaks to Irene Solaiman.

Irene is an expert in AI safety and policy and the Policy Director at Hugging Face, where she conducts social impact research and develops public policy. In her former role at OpenAI, she initiated and led bias and social impact research in addition to leading public policy. She built AI policy at Zillow Group and advised policymakers on responsible autonomous decision-making and privacy as a fellow at Harvard’s Berkman Klein Center.

Outline:
* (00:00) Intro
* (02:00) Intro to Irene and her work
* (03:45) What tech people need to learn about policy, and vice versa
* (06:35) Societal impact—words and reality, Irene’s experience
* (08:30) OpenAI work on GPT-2 and release strategies (yes, this was recorded on Pi Day)
* (11:00) Open-source proponents and release
* (14:00) What does a multidisciplinary approach to working on AI look like?
* (16:30) Thinking about end users and enabling contributors with different sets of expertise
* (18:00) “Preparing for AGI” and current approaches to release
* (21:00) Who constitutes a researcher? What constitutes safety and who gets resourced? Limitations in red-teaming potentially dangerous systems.
* (22:35) PALMS and Values-Targeted Datasets
* (25:52) PALMS and RLHF
* (27:00) Homogenization in foundation models, cultural contexts
* (29:45) Anthropic’s moral self-correction paper and Irene’s concerns about marketing “de-biasing” and oversimplification
* (31:50) Data work, human systemic problems → AI bias
* (33:55) Why do language models get more toxic as they get larger? (if you have ideas, let us know!)
* (35:45) The gradient of generative AI release, Irene’s experience with the open-source world, tradeoffs along the release gradient
* (38:40) More on Irene’s orientation towards release
* (39:40) Pragmatics of keeping models closed, dealing with open-source by force
* (42:22) Norm setting for release and use, normalization of documentation on social impacts
* (46:30) Race dynamics :(
* (49:45) Resource allocation and advances in ethics/policy, conversations on integrity and disinformation
* (53:10) Organizational goals, balancing technical research with policy work
* (58:10) Thoughts on governments’ AI policies, impact of structural assumptions
* (1:04:00) Approaches to AI-generated sexual content, need for more voices represented in conversations about AI
* (1:08:25) Irene’s suggestions for AI practitioners / technologists
* (1:11:24) Outro

Links:
* Irene’s homepage and Twitter
* Papers
* Release Strategies and the Social Impacts of Language Models
* Hugh Zhang’s open letter in The Gradient from 2019
* Process for Adapting Large Models to Society (PALMS) with Values-Targeted Datasets
* The Gradient of Generative AI Release: Methods and Considerations

Drago Anguelov: Waymo and Autonomous Vehicles
In episode 69 of The Gradient Podcast, Daniel Bashir speaks to Drago Anguelov.

Drago is currently a Distinguished Scientist and Head of Research at Waymo, which he joined in 2018. Earlier, he spent eight years at Google working on 3D vision and pose estimation for Street View, then leading a research team that developed computer vision systems for annotating Google Photos. He has been involved in developing popular neural network methods such as the Inception architecture and the SSD detector. Before joining Waymo, he also led the 3D perception team at Zoox.

Outline:
* (00:00) Intro
* (02:04) Drago’s background in AI and self-driving, work with Daphne Koller + Sebastian Thrun, computer vision / pose estimation
* (14:20) One- and two-stage object detectors
* (15:15) Early experiences and thoughts on self-driving and its prospects
* (21:00) An introduction to the “self-driving stack”: mapping & localization, perception, behavior modeling & planning, simulation
* (29:25) From Stuart Russell’s comments on early Waymo’s “old-fashioned” approach
* (37:34) Scaling 3D Detection: challenges and architectural innovations
* (43:20) Behavior modeling: making decisions and modeling interactions in multi-agent environments
* (52:42) Distributional RL (+ imitation learning) in self-driving?
* (54:10) The Waymo Open Dataset
* (1:01:48) Looking forward in self-driving
* (1:04:36) Outro

Links:
* Drago’s LinkedIn and Twitter
* Research
* SSD: Single Shot MultiBox Detector
* SCAPE: Shape completion and animation of people
* Behavior Models for Autonomous Driving
* Wayformer
* Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation
* Imitation Is Not Enough
* Scaling 3D Detection to the Long Tail

Joanna Bryson: The Problems of Cognition
In episode 68 of The Gradient Podcast, Daniel Bashir speaks to Professor Joanna Bryson.

Professor Bryson is Professor of Ethics and Technology at the Hertie School, where her research focuses on the impact of technology on human cooperation and AI/ICT governance. Professor Bryson has advised companies, governments, transnational agencies, and NGOs, particularly in AI policy. She is one of the few people doing this sort of work who has both a PhD and work experience in AI and advanced degrees in the social sciences. She began her academic career in the liberal arts, however, and publishes regularly in the natural sciences.

Outline:
* (00:00) Intro
* (01:35) Intro to Professor Bryson’s work
* (06:37) Shifts in backgrounds expected of AI PhDs/researchers
* (09:40) Master’s degree in Edinburgh, Behavior-Based AI
* (11:00) PhD, differences between MIT’s engineering focus and Edinburgh, systems engineering + AI
* (16:15) Comments on ways you can make contributions in AI
* (18:45) When definitions of “intelligence” are important
* (24:23) Non- and proto-linguistic aspects of intelligence, arguments about text as a description of human experience
* (31:45) Cognitive leaps in interacting with language models
* (37:00) Feelings of affiliation for robots, phenomenological experience in humans and (not) in AI systems
* (42:00) Language models and technological systems as cultural artifacts, expressing agency through machines
* (44:15) Capabilities development and moral patient status in AI systems
* (51:20) Prof. Bryson’s perspectives on recent AI regulation
* (1:00:55) Responsibility and recourse, Uber self-driving crash
* (1:07:30) “Preparing for AGI,” “Living with AGI,” how to respond to recent AI developments
* (1:12:18) Outro

Links:
* Professor Bryson’s homepage and Twitter
* Papers
* Systems AI
* Behavior Oriented Design, action selection, key differences in methodology/views between systems AI researchers and e.g. connectionists
* Agent architecture as object oriented design (1998)
* Intelligence by design: Principles of modularity and coordination for engineering complex adaptive agents (2001)
* Cognition
* Age-Related Inhibition and Learning Effects: Evidence from Transitive Performance (2013)
* Primate Errors in Transitive ‘Inference’: A Two-Tier Learning Model (2007)
* Skill Acquisition Through Program-Level Imitation in a Real-Time Domain
* Agent-Based Models as Scientific Methodology: A Case Study Analysing Primate Social Behaviour (2008, 2011)
* Social learning in a non-social reptile (Geochelone carbonaria) (2010)
* Understanding and Addressing Cultural Variation in Costly Antisocial Punishment (2014)
* Polarization Under Rising Inequality and Economic Decline (2020)
* Semantics derived automatically from language corpora contain human-like biases (2017)
* Evolutionary Psychology and Artificial Intelligence: The Impact of Artificial Intelligence on Human Behaviour (2020)
* Ethics/Policy
* Robots should be slaves (2010)
* Standardizing Ethical Design for Artificial Intelligence and Autonomous Systems (2017)
* Of, For, and By the People: The Legal Lacuna of Synthetic Persons (2017)
* Patiency is not a virtue: the design of intelligent systems and systems of ethics (2018)
* Other writing
* Reflections on the EU’s AI Act
* Is There an AI Cold War?
* Living with AGI
* One Day, AI Will Seem as Human as Anyone. What Then?

Daniel Situnayake: AI on the Edge
In episode 67 of The Gradient Podcast, Daniel Bashir speaks to Daniel Situnayake.

Daniel is head of Machine Learning at Edge Impulse. He is co-author of the O’Reilly books "AI at the Edge" and "TinyML". Previously, he worked on the TensorFlow Lite team at Google AI and co-founded Tiny Farms, an insect farming company. Daniel has also lectured in AIDC technologies at Birmingham City University.

Outline:
* (00:00) Intro
* (01:40) Daniel S.’s origin story: computer networking, RFID/barcoding, earlier jobs, Tiny Farms, TensorFlow Lite, writing on TinyML, and Edge Impulse
* (15:30) Edge AI and questions of embodiment/intelligence in AI
* (21:00) The role of hardware, other constraints in edge AI
* (25:00) Definitions of intelligence
* (29:45) What is edge AI?
* (37:30) The spectrum of edge devices
* (43:45) Innovations in edge AI (architecture, frameworks/toolchains, quantization)
* (53:45) Model compression tradeoffs in edge
* (1:00:30) Federated learning and challenges
* (1:09:00) Intro to Edge Impulse
* (1:20:30) Feature engineering for edge systems, fairness considerations
* (1:25:50) Edge AI and axes in AI (large/small, ethereal/embodied)
* (1:37:00) Daniel and Daniel go off the rails on panpsychism
* (1:54:20) Daniel’s advice for aspiring AI practitioners
* (1:57:20) Outro

Links:
* Daniel’s Twitter and blog
* Edge Impulse

Soumith Chintala: PyTorch
In episode 66 of The Gradient Podcast, Daniel Bashir speaks to Soumith Chintala.

Soumith is a Research Engineer at Meta AI Research in NYC. He is the co-creator and lead of PyTorch, and maintains a number of other open-source ML projects including Torch7 and EBLearn. Soumith has previously worked on robotics, object and human detection, generative modeling, AI for video games, and ML systems research.

Outline:
* (00:00) Intro
* (01:30) Soumith’s intro to AI, journey to PyTorch
* (05:00) State of computer vision early in Soumith’s career
* (09:15) Institutional inertia and sunk costs in academia, identifying fads
* (12:45) How Soumith started working on GANs, frustrations
* (17:45) State of ML frameworks early in the deep learning era, differentiators
* (23:50) Frameworks and leveling the playing field, exceptions
* (25:00) Contributing to Torch and evolution into PyTorch
* (29:15) Soumith’s product vision for ML frameworks
* (32:30) From product vision to concrete features in PyTorch
* (39:15) Progressive disclosure of complexity (Chollet) in PyTorch
* (41:35) Building an open source community
* (43:25) The different players in today’s ML framework ecosystem
* (49:35) ML frameworks pioneered by Yann LeCun and Léon Bottou, their influences on PyTorch
* (54:37) PyTorch 2.0 and looking to the future
* (58:00) Soumith’s adventures in household robotics
* (1:03:25) Advice for aspiring ML practitioners
* (1:07:10) Be cool like Soumith and subscribe :)
* (1:07:33) Outro

Links:
* Soumith’s Twitter and homepage
* Papers
* Convolutional Neural Networks Applied to House Numbers Digit Classification
* GANs: LAPGAN, DCGAN, Wasserstein GAN
* Automatic differentiation in PyTorch
* PyTorch: An Imperative Style, High-Performance Deep Learning Library

Sewon Min: The Science of Natural Language
In episode 65 of The Gradient Podcast, Daniel Bashir speaks to Sewon Min.

Sewon is a fifth-year PhD student in the NLP group at the University of Washington, advised by Hannaneh Hajishirzi and Luke Zettlemoyer. She is a part-time visiting researcher at Meta AI and a recipient of the JP Morgan PhD Fellowship. She has previously spent time at Google Research and Salesforce Research.

Outline:
* (00:00) Intro
* (03:00) Origin Story
* (04:20) Evolution of Sewon’s interests, question-answering and practical NLP
* (07:00) Methodology concerns about benchmarks
* (07:30) Multi-hop reading comprehension
* (09:30) Do multi-hop QA benchmarks actually measure multi-hop reasoning?
* (12:00) How models can “cheat” multi-hop benchmarks
* (13:15) Explicit compositionality
* (16:05) Commonsense reasoning and background information
* (17:30) On constructing good benchmarks
* (18:40) AmbigQA and ambiguity
* (22:20) Types of ambiguity
* (24:20) Practical possibilities for models that can handle ambiguity
* (25:45) FaVIQ and fact-checking benchmarks
* (28:45) External knowledge
* (29:45) Fact verification and “complete understanding of evidence”
* (31:30) Do models do what we expect/intuit in reading comprehension?
* (34:40) Applications for fact-checking systems
* (36:40) Intro to in-context learning (ICL)
* (38:55) Example of an ICL demonstration
* (40:45) Rethinking the Role of Demonstrations and what matters for successful ICL
* (43:00) Evidence for a Bayesian inference perspective on ICL
* (45:00) ICL + gradient descent and what it means to “learn”
* (47:00) MetaICL and efficient ICL
* (49:30) Distance between tasks and MetaICL task transfer
* (53:00) Compositional tasks for language models, compositional generalization
* (55:00) The number and diversity of meta-training tasks
* (58:30) MetaICL and Bayesian inference
* (1:00:30) Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations
* (1:02:00) The copying effect
* (1:03:30) Copying effect for non-identical examples
* (1:06:00) More thoughts on ICL
* (1:08:00) Understanding Chain-of-Thought Prompting
* (1:11:30) Bayes strikes again
* (1:12:30) Intro to Sewon’s text retrieval research
* (1:15:30) Dense Passage Retrieval (DPR)
* (1:18:40) Similarity in QA and retrieval
* (1:20:00) Improvements for DPR
* (1:21:50) Nonparametric Masked Language Modeling (NPM)
* (1:24:30) Difficulties in training NPM and solutions
* (1:26:45) Follow-on work
* (1:29:00) Important fundamental limitations of language models
* (1:31:30) Sewon’s experience doing a PhD
* (1:34:00) Research challenges suited for academics
* (1:35:00) Joys and difficulties of the PhD
* (1:36:30) Sewon’s advice for aspiring PhDs
* (1:38:30) Incentives in academia, production of knowledge
* (1:41:50) Outro

Links:
* Sewon’s homepage and Twitter
* Papers
* Solving and re-thinking benchmarks
* Multi-hop Reading Comprehension through Question Decomposition and Rescoring / Compositional Questions Do Not Necessitate Multi-hop Reasoning
* AmbigQA: Answering Ambiguous Open-domain Questions
* FaVIQ: FAct Verification from Information-seeking Questions
* Language Modeling
* Rethinking the Role of Demonstrations
* MetaICL: Learning to Learn In Context
* Towards Understanding CoT Prompting
* Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations
* Text representation/retrieval
* Dense Passage Retrieval
* Nonparametric Masked Language Modeling

Richard Socher: Re-Imagining Search
In episode 64 of The Gradient Podcast, Daniel Bashir speaks to Richard Socher.

Richard is founder and CEO of you.com, a new search engine that lets you personalize your search workflow and eschews tracking and invasive ads. Richard was previously Chief Scientist at Salesforce, where he led work on fundamental and applied research, product incubation, CRM search, customer service automation and a cross-product AI platform. He was an adjunct professor at Stanford’s CS department as well as founder and CEO/CTO of MetaMind, which was acquired by Salesforce in 2016. He received his PhD from Stanford’s CS Department in 2014.

Outline:
* (00:00) Intro
* (02:20) Richard Socher origin story + time at MetaMind, Salesforce (AI Economist, CTRL, ProGen)
* (22:00) Why Richard advocated for deep learning in NLP
* (27:00) Richard’s perspective on language
* (32:20) Are physical grounding and language necessary for intelligence?
* (40:10) Frankfurtian b******t and language model utterances as truth
* (47:00) Lessons from Salesforce Research
* (53:00) Balancing fundamental research with product focus
* (57:30) The AI Economist + how should policymakers account for limitations?
* (1:04:50) you.com, the chatbot wars, and taking on search giants
* (1:13:50) Re-imagining the vision for and components of a search engine
* (1:18:00) The future of generative models in search and the internet
* (1:28:30) Richard’s advice for early-career technologists
* (1:37:00) Outro

Links:
* Richard’s Twitter
* YouChat by you.com
* Careers at you.com
* Papers mentioned
* Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
* Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
* Grounded Compositional Semantics for Finding and Describing Images with Sentences
* The AI Economist
* ProGen
* CTRL

Joe Edelman: Meaning-Aligned AI
In episode 63 of The Gradient Podcast, Daniel Bashir speaks to Joe Edelman.

Joe developed the meaning-based organizational metrics at Couchsurfing.com, then co-founded the Center for Humane Technology with Tristan Harris, and coined the term “Time Well Spent” for a family of metrics adopted by teams at Facebook, Google, and Apple. Since then, he's worked on the philosophical underpinnings for new business metrics, design methods, and political movements. The central idea is to make people's sources of meaning explicit, so that how meaningful or meaningless things are can be rigorously accounted for. His previous career was in HCI and programming language design.

Outline:
* (00:00) Intro (yes Daniel is trying a new intro format)
* (01:30) Joe’s origin story
* (07:15) Revealed preferences and personal meaning, recommender systems
* (12:30) Is using revealed preferences necessary?
* (17:00) What are values and how do you detect them?
* (24:00) Figuring out what’s meaningful to us
* (28:45) The decline of spaces and togetherness
* (35:00) Individualism and economic/political theory, tensions between collectivism/individualism
* (41:00) What it looks like to build spaces, Habitat
* (47:15) Cognitive effects of social platforms
* (51:45) Atomized communication, re-imagining chat apps
* (55:50) Systems for social groups and medium independence
* (1:02:45) Spaces being built today
* (1:05:15) Joe is building research groups! Get in touch :)
* (1:05:40) Outro

Links:
* Joe's 80-minute lecture on techniques for rebuilding society on meaning (YouTube, transcript)
* The Discord for Rebuilding Meaning: join if you'd like to help build ML models or metrics using the methods discussed
* Writing/papers mentioned:
* Tech products (that don’t cause depression and war)
* Values, Preferences, Meaningful Choice
* Social Programming Considered as a Habitat for Groups
* Is Anything Worth Maximizing
* Joe’s homepage, Twitter, and YouTube page

Ed Grefenstette: Language, Semantics, Cohere
In episode 62 of The Gradient Podcast, Daniel Bashir speaks to Ed Grefenstette.

Ed is Head of Machine Learning at Cohere and an Honorary Professor at University College London. He previously held research scientist positions at Facebook AI Research and DeepMind, following a stint as co-founder and CTO of Dark Blue Labs. Before his time in industry, Ed worked at Oxford’s Department of Computer Science as a lecturer and Fulford Junior Research Fellow at Somerville College. Ed also received his MSc and DPhil from Oxford’s Computer Science Department.

Outline:
* (00:00) Intro
* (02:18) The Ed Grefenstette Origin Story
* (08:15) Distributional semantics and Ed’s PhD research
* (14:30) Extending the distributional hypothesis, later Wittgenstein
* (18:00) Recovering parse trees in LMs, can LLMs understand communication and not just bare language?
* (23:15) LMs capture something about pragmatics, proxies for grounding and pragmatics
* (25:00) Human-in-the-loop training and RLHF—what is the essential differentiator?
* (28:15) A convolutional neural network for modeling sentences, relationship to attention
* (34:20) Difficulty of constructing supervised learning datasets, benchmark-driven development
* (40:00) Learning to Transduce with Unbounded Memory, Neural Turing Machines
* (47:40) If RNNs are like finite state machines, where are transformers?
* (51:40) Cohere and why Ed joined
* (56:30) Commercial applications of LLMs and Cohere’s product
* (59:00) Ed’s reply to stochastic parrots and thoughts on consciousness
* (1:03:30) Lessons learned about doing effective science
* (1:05:00) Where does scaling end?
* (1:07:00) Why Cohere is an exciting place to do science
* (1:08:00) Ed’s advice for aspiring ML {researchers, engineers, etc} and the role of communities in science
* (1:11:45) Cohere for AI plug!
* (1:13:30) Outro

Links:
* Ed’s homepage and Twitter
* (some of) Ed’s Papers
* Experimental support for a categorical compositional distributional model of meaning
* Multi-step regression learning
* “Not not bad” is not “bad”
* Towards a formal distributional semantics
* A CNN for modeling sentences
* Teaching machines to read and comprehend
* Reasoning about entailment with neural attention
* Learning to Transduce with Unbounded Memory
* Teaching Artificial Agents to Understand Language by Modelling Reward
* Other things mentioned
* Large language models are not zero-shot communicators (Laura Ruis + others and Ed)
* Looped Transformers as Programmable Computers and our Update 43 covering this paper
* Cohere and Cohere for AI (+ earlier episode w/ Sara Hooker on C4AI)
* David Chalmers interview on AI + consciousness

Ken Liu: What Science Fiction Can Teach Us
In episode 61 of The Gradient Podcast, Daniel Bashir speaks to Ken Liu.

Ken is an author of speculative fiction. A winner of the Nebula, Hugo, and World Fantasy awards, he is the author of the silkpunk epic fantasy series The Dandelion Dynasty and the short story collections The Paper Menagerie and Other Stories and The Hidden Girl and Other Stories. Prior to writing full-time, Ken worked as a software engineer, corporate lawyer, and litigation consultant.

Outline:
* (00:00) Intro
* (02:00) How Ken Liu became Ken Liu: A Saga
* (03:10) Time in the tech industry, interest in symbolic machines
* (04:40) Determining what stories to write, (07:00) art as failed communication
* (07:55) Law as creating abstract machines, importance of successful communication, stories in law
* (13:45) Misconceptions about science fiction
* (18:30) How we’ve been misinformed about literature and stories in school, stories as expressing multivalent truths, Dickens on narration (29:00)
* (31:20) Stories as imposing structure on the world
* (35:25) Silkpunk as aesthetic and writing approach
* (39:30) If modernity is a translated experience, what is it translated from? Alternative sources for the American pageant
* (47:30) The value of silkpunk for technologists and building the future
* (52:40) The engineer as poet
* (59:00) Technology language as constructing societies, what it is to be a technologist
* (1:04:00) The technology of language
* (1:06:10) The Google Wordcraft Workshop and co-writing with LaMDA
* (1:14:10) Possibilities and limitations of LMs in creative writing
* (1:18:45) Ken’s short fiction
* (1:19:30) Short fiction as a medium
* (1:24:45) “The Perfect Match” (from The Paper Menagerie and other stories)
* (1:34:00) Possibilities for better recommender systems
* (1:39:35) “Real Artists” (from The Hidden Girl and other stories)
* (1:47:00) The scaling hypothesis and creativity
* (1:50:25) “The Gods have not died in vain” & Moore’s Proof epigraph (The Hidden Girl)
* (1:53:10) More of The Singularity Trilogy (The Hidden Girl)
* (1:58:00) The role of science fiction today and how technologists should engage with stories
* (2:01:53) Outro

Links:
* Ken’s homepage
* The Dandelion Dynasty Series: Speaking Bones is out in paperback
* Books/Stories/Projects Mentioned
* “Evaluative Soliloquies” in Google Wordcraft
* The Paper Menagerie and Other Stories
* The Hidden Girl and Other Stories

Hattie Zhou: Lottery Tickets and Algorithmic Reasoning in LLMs
In episode 60 of The Gradient Podcast, Daniel Bashir speaks to Hattie Zhou.

Hattie is a PhD student at the Université de Montréal and Mila. Her research focuses on understanding how and why neural networks work, based on the belief that the performance of modern neural networks exceeds our understanding and that building more capable and trustworthy models requires bridging this gap. Prior to Mila, she spent time as a data scientist at Uber and did research with Uber AI Labs.

Outline:
* (00:00) Intro
* (01:55) Hattie’s Origin Story, Uber AI Labs, empirical theory and other sorts of research
* (10:00) Intro to the Lottery Ticket Hypothesis & Deconstructing Lottery Tickets
* (14:30) Lottery tickets as lucky initialization
* (17:00) Types of masking and the “masking is training” claim
* (24:00) Type-0 masks and weight evolution over long training trajectories
* (27:00) Can you identify good masks or training trajectories a priori?
* (29:00) The role of signs in neural net initialization
* (35:27) The Supermask
* (41:00) Masks to probe pretrained models and model steerability
* (47:40) Fortuitous Forgetting in Connectionist Networks
* (54:00) Relationships to other work (double descent, grokking, etc.)
* (1:01:00) The iterative training process in fortuitous forgetting, scale and value of exploring alternatives
* (1:03:35) In-Context Learning and Teaching Algorithmic Reasoning
* (1:09:00) Learning + algorithmic reasoning, prompting strategy
* (1:13:50) What’s happening with in-context learning?
* (1:14:00) Induction heads
* (1:17:00) ICL and gradient descent
* (1:22:00) Algorithmic prompting vs discovery
* (1:24:45) Future directions for algorithmic prompting
* (1:26:30) Interesting work from NeurIPS 2022
* (1:28:20) Hattie’s perspective on scientific questions people pay attention to, underrated problems
* (1:34:30) Hattie’s perspective on ML publishing culture
* (1:42:12) Outro

Links:
* Hattie’s homepage and Twitter
* Papers
* Deconstructing Lottery Tickets: Zeros, signs, and the Supermask
* Fortuitous Forgetting in Connectionist Networks
* Teaching Algorithmic Reasoning via In-context Learning

Kyunghyun Cho: Neural Machine Translation, Language, and Doing Good Science
In episode 59 of The Gradient Podcast, Daniel Bashir speaks to Professor Kyunghyun Cho.
Professor Cho is an associate professor of computer science and data science at New York University and a CIFAR Fellow of Learning in Machines & Brains. He is also a senior director of frontier research at the Prescient Design team within Genentech Research & Early Development. He was a research scientist at Facebook AI Research from 2017 to 2020 and, after receiving his MSc and PhD degrees from Aalto University, a postdoctoral fellow at the University of Montreal under the supervision of Prof. Yoshua Bengio. He received the Samsung Ho-Am Prize in Engineering in 2021.
Have suggestions for future podcast guests (or other feedback)? Let us know here!
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (02:15) How Professor Cho got into AI, going to Finland for a PhD
* (06:30) Accidental and non-accidental parts of Prof Cho’s journey, the role of timing in career trajectories
* (09:30) Prof Cho’s M.Sc. thesis on Restricted Boltzmann Machines
* (17:00) The state of autodiff at the time
* (20:00) Finding non-mainstream problems and examining limitations of mainstream approaches, anti-dogmatism, Yoshua Bengio appreciation
* (24:30) Detaching identity from work, scientific training
* (26:30) The rest of Prof Cho’s PhD, the first ICLR conference, working in Yoshua Bengio’s lab
* (34:00) Prof Cho’s isolation during his PhD and its impact on his work: transcending insecurity and working on unsexy problems
* (41:30) The importance of identifying important problems and developing an independent research program, ceiling on the number of important research problems
* (46:00) Working on Neural Machine Translation, Jointly Learning to Align and Translate
* (1:01:45) What RNNs and earlier NN architectures can still teach us, why transformers were successful
* (1:08:00) Science progresses gradually
* (1:09:00) Learning distributed representations of sentences, extending the distributional hypothesis
* (1:21:00) Difficulty and limitations in evaluation: directions of dynamic benchmarks, trainable evaluation metrics
* (1:29:30) Mixout and AdapterFusion: fine-tuning and intervening on pre-trained models, pre-training as initialization, destructive interference
* (1:39:00) Analyzing neural networks as reading tea leaves
* (1:44:45) Importance of healthy skepticism for scientists
* (1:45:30) Language-guided policies and grounding, vision-language navigation
* (1:55:30) Prof Cho’s reflections on 2022
* (2:00:00) Obligatory ChatGPT content
* (2:04:50) Finding balance
* (2:07:15) Outro
Links:
* Professor Cho’s homepage and Twitter
* Papers
  * M.Sc. thesis and PhD thesis
  * NMT and attention
    * Properties of NMT
    * Learning Phrase Representations
    * Neural machine translation by jointly learning to align and translate
  * More recent work
    * Learning Distributed Representations of Sentences from Unlabelled Data
    * Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
    * Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes’ Rule
    * AdapterFusion: Non-Destructive Task Composition for Transfer Learning
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Steve Miller: Will AI Take Your Job? It's Not So Simple.
In episode 58 of The Gradient Podcast, Daniel Bashir speaks to Professor Steve Miller.
Steve is a Professor Emeritus of Information Systems at Singapore Management University. Steve served as Founding Dean for the SMU School of Information Systems, and established and developed the technology core of SIS research and project capabilities in Cybersecurity, Data Management & Analytics, Intelligent Systems & Decision Analytics, and Software & Cyber-Physical Systems, as well as the management-science-oriented capability in Information Systems & Management. Steve works closely with a number of Singapore government ministries and agencies via steering committees, advisory boards, and advisory appointments.
Have suggestions for future podcast guests (or other feedback)? Let us know here!
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (02:40) Steve’s evolution of interests in AI, time in academia and industry
* (05:15) How different is this “industrial revolution”?
* (10:00) What new technologies enable, the human role in technology’s impact on jobs
* (11:35) Automation and augmentation and the realities of integrating new technologies in the workplace
* (21:50) Difficulties of applying AI systems in real-world contexts
* (32:45) Re-calibrating human work with intelligent machines
* (39:00) Steve’s thinking on the nature of human/machine intelligence, implications for human/machine hybrid work
* (47:00) Tradeoffs in using ML systems for automation/augmentation
* (52:40) Organizational adoption of AI and speed
* (1:01:55) Technology adoption is more than just a technology problem
* (1:04:50) Progress narratives, “safe to speed”
* (1:10:27) Outro
Links:
* Steve’s SMU Faculty Profile and Google Scholar
* Working with AI by Steve Miller and Tom Davenport
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Blair Attard-Frost: Canada’s AI strategy and the ethics of AI business practices
In episode 57 of The Gradient Podcast, Andrey Kurenkov speaks to Blair Attard-Frost.
Note: this interview was recorded 8 months ago, and some aspects of Canada’s AI strategy have changed since then. It is still a good overview of AI governance and other topics, however.
Blair is a PhD Candidate at the University of Toronto’s Faculty of Information who researches the governance and management of artificial intelligence. More specifically, they are interested in the social construction of intelligence, unintelligence, and artificial intelligence, the relationship between organizational values and AI use, and the political economy, governance, and ethics of AI value chains. They integrate perspectives from service sciences, cognitive sciences, public policy, information management, and queer studies in their research.
Have suggestions for future podcast guests (or other feedback)? Let us know here!
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter or Mastodon
Outline:
* Intro
* Getting into AI research
* What is AI governance
* Canada’s AI strategy
* Other interests
Links:
* Once a promising leader, Canada’s artificial-intelligence strategy is now a fragmented laggard
* The Ethics of AI Business Practices: A Review of 47 Guidelines
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Linus Lee: At the Boundary of Machine and Mind
In episode 56 of The Gradient Podcast, Daniel Bashir speaks to Linus Lee.
Linus is an independent researcher interested in the future of knowledge representation and creative work aided by machine understanding of language. He builds interfaces and knowledge tools that expand the domain of thoughts we can think and qualia we can feel. Linus has been writing online since 2014 (his blog boasts half a million words) and has built well over 100 side projects. He has also spent time as a software engineer at Replit, Hack Club, and Spensa, and was most recently a Researcher in Residence at Betaworks in New York.
Have suggestions for future podcast guests (or other feedback)? Let us know here!
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (02:00) Linus’s background and interests, vision-language models
* (07:45) Embodiment and limits for text-image
* (11:35) Ways of experiencing the world
* (16:55) Origins of the handle “thesephist”, languages
* (25:00) Math notation, reading papers
* (29:20) Operations on ideas
* (32:45) Overview of Linus’s research and current work
* (41:30) The Oak and Ink languages, programming languages
* (49:30) Personal search engines: Monocle and Reverie, what you can learn from personal data
* (55:55) Web browsers as mediums for thought
* (1:01:30) This AI Does Not Exist
* (1:03:05) Knowledge representation and notational intelligence
  * Notation vs language
* (1:07:00) What notation can/should be
* (1:16:00) Inventing better notations and expanding human intelligence
* (1:23:30) Better interfaces between humans and LMs to provide precise control, inefficiency of prompt engineering
* (1:33:00) Inexpressible experiences
* (1:35:42) Linus’s current work using latent space models
* (1:40:00) Ideas as things you can hold
* (1:44:55) Neural nets and cognitive computing
* (1:49:30) Relation to Hardware Lottery and AI accelerators
* (1:53:00) Taylor Swift Appreciation Session, mastery and virtuosity
* (1:59:30) Mastery/virtuosity and interfaces / learning curves
* (2:03:30) Linus’s stories, the work of fiction
* (2:09:00) Linus’s thoughts on writing
* (2:14:20) A piece of writing should be focused
* (2:16:15) On proving yourself
* (2:28:00) Outro
Links:
* Linus’s Twitter and website
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Suresh Venkatasubramanian: An AI Bill of Rights
In episode 55 of The Gradient Podcast, Daniel Bashir speaks to Professor Suresh Venkatasubramanian.
Professor Venkatasubramanian is a Professor of Computer Science and Data Science at Brown University, where his research focuses on algorithmic fairness and the impact of automated decision-making systems in society. He recently served as Assistant Director for Science and Justice in the White House Office of Science and Technology Policy, where he co-authored the Blueprint for an AI Bill of Rights.
Have suggestions for future podcast guests (or other feedback)? Let us know here!
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (02:25) Suresh’s journey into AI and policymaking
* (08:00) The complex graph of designing and deploying “fair” AI systems
* (09:50) The Algorithmic Lens
* (14:55) “Getting people into a room” isn’t enough
* (16:30) Failures of incorporation
* (21:10) Trans-disciplinary vs interdisciplinary, the limiting nature of “my lane” / “your lane” thinking, going beyond existing scientific and philosophical ideas
* (24:50) The trolley problem is annoying, its usefulness and limitations
* (25:30) Breaking the frame of a discussion, self-driving doesn’t fit into the parameters of the trolley problem
* (28:00) Acknowledging frames and their limitations
* (29:30) Social science’s inclination to critique, flaws and benefits of solutionism
* (30:30) Computer security as a model for thinking about algorithmic protections, the risk of failure in policy
* (33:20) Suresh’s work on recourse
* (38:00) Kantian autonomy and the value of recourse, non-Western takes and issues with individual benefit/harm as the most morally salient question
* (41:00) Community as a valuable entity and its implications for algorithmic governance, surveillance systems
* (43:50) How Suresh got involved in policymaking / the OSTP
* (46:50) Gathering insights for the AI Bill of Rights Blueprint
* (51:00) One thing the Bill did miss… Struggles with balancing specificity and vagueness in the Bill
* (54:20) Should “automated system” be defined in legislation? Suresh’s approach and issues with the EU AI Act
* (57:45) The danger of definitions, overlap with chess world controversies
* (59:10) Constructive vagueness in law, partially theorized agreements
* (1:02:15) Digital privacy and privacy fundamentalism, focus on breach of individual autonomy as the only harm vector
* (1:07:40) GDPR traps, the “legacy problem” with large companies and post-hoc regulation
* (1:09:30) Considerations for legislating explainability
* (1:12:10) Criticisms of the Blueprint and Suresh’s responses
* (1:25:55) The global picture, AI legislation outside the US, legislation as experiment
* (1:32:00) Tensions in entering policy as an academic and technologist
* (1:35:00) Technologists need to learn additional skills to impact policy
* (1:38:15) Suresh’s advice for technologists interested in public policy
* (1:41:20) Outro
Links:
* Suresh is on Mastodon (and also Twitter)
* Suresh’s blog
* Blueprint for an AI Bill of Rights
* Papers
  * Fairness and abstraction in sociotechnical systems
  * A comparative study of fairness-enhancing interventions in machine learning
  * The Philosophical Basis of Algorithmic Recourse
  * Runaway Feedback Loops in Predictive Policing
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Pete Florence: Dense Visual Representations, NeRFs, and LLMs for Robotics
In episode 54 of The Gradient Podcast, Andrey Kurenkov speaks with Pete Florence.
Note: this was recorded 2 months ago. Andrey should be getting back to putting out some episodes next year.
Pete Florence is a Research Scientist on the Robotics at Google team within the Brain Team in Google Research. His research focuses on topics in robotics, computer vision, and natural language, including 3D learning, self-supervised learning, and policy learning in robotics. Before Google, he finished his PhD in Computer Science at MIT with Russ Tedrake.
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00:00) Intro
* (00:01:16) Start in AI
* (00:04:15) PhD Work with Quadcopters
* (00:08:40) Dense Visual Representations
* (00:22:00) NeRFs for Robotics
* (00:39:00) Language Models for Robotics
* (00:57:00) Talking to Robots in Real Time
* (01:07:00) Limitations
* (01:14:00) Outro
Papers discussed:
* Aggressive quadrotor flight through cluttered environments using mixed integer programming
* Integrated perception and control at high speed: Evaluating collision avoidance maneuvers without maps
* High-speed autonomous obstacle avoidance with pushbroom stereo
* Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation (Best Paper Award, CoRL 2018)
* Self-Supervised Correspondence in Visuomotor Policy Learning (Best Paper Award, RA-L 2020)
* iNeRF: Inverting Neural Radiance Fields for Pose Estimation
* NeRF-Supervision: Learning Dense Object Descriptors from Neural Radiance Fields
* Reinforcement Learning with Neural Radiance Fields
* Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
* Inner Monologue: Embodied Reasoning through Planning with Language Models
* Code as Policies: Language Model Programs for Embodied Control
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Melanie Mitchell: Abstraction and Analogy in AI
Have suggestions for future podcast guests (or other feedback)? Let us know here!
In episode 53 of The Gradient Podcast, Daniel Bashir speaks to Professor Melanie Mitchell.
Professor Mitchell is the Davis Professor at the Santa Fe Institute. Her research focuses on conceptual abstraction, analogy-making, and visual recognition in AI systems. She is the author or editor of six books, and her work spans the fields of AI, cognitive science, and complex systems. Her latest book is Artificial Intelligence: A Guide for Thinking Humans.
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (02:20) Melanie’s intro to AI
* (04:35) Melanie’s intellectual influences, AI debates over time
* (10:50) We don’t have the right metrics for empirical study in AI
* (15:00) Why AI is Harder Than We Think: the four fallacies
* (20:50) Difficulties in understanding what’s difficult for machines vs humans
* (23:30) Roles for humanlike and non-humanlike intelligence
* (27:25) Whether “intelligence” is a useful word
* (31:55) Melanie’s thoughts on modern deep learning advances, brittleness
* (35:35) Abstraction, Analogies, and their role in AI
* (38:40) Concepts as analogical and what that means for cognition
* (41:25) Where does analogy bottom out?
* (44:50) Cognitive science approaches to concepts
* (45:20) Understanding how to form and use concepts is one of the key problems in AI
* (46:10) Approaching abstraction and analogy, Melanie’s work / the Copycat architecture
* (49:50) Probabilistic program induction as a promising approach to intelligence
* (52:25) Melanie’s advice for aspiring AI researchers
* (54:40) Outro
Links:
* Melanie’s homepage and Twitter
* Papers
  * Difficulties in AI, hype cycles
    * Why AI is Harder Than We Think
    * The Debate Over Understanding in AI’s Large Language Models
    * What Does It Mean for AI to Understand?
  * Abstraction, analogies, and reasoning
    * Abstraction and Analogy-Making in Artificial Intelligence
    * Evaluating understanding on conceptual abstraction benchmarks
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Marc Bellemare: Distributional Reinforcement Learning
Have suggestions for future podcast guests (or other feedback)? Let us know here!
In episode 52 of The Gradient Podcast, Daniel Bashir speaks to Professor Marc Bellemare.
Professor Bellemare leads the reinforcement learning efforts at Google Brain Montréal and is a core industry member at Mila, where he also holds a Canada CIFAR AI Chair. His PhD work, completed at the University of Alberta, proposed the use of Atari 2600 video games to benchmark progress in reinforcement learning (RL). He was a research scientist at DeepMind from 2013 to 2017, and his Arcade Learning Environment was very influential in DeepMind’s early RL research and remains one of the most widely used RL benchmarks today. More recently, he collaborated with Loon to deploy deep reinforcement learning to navigate stratospheric balloons. His book on distributional reinforcement learning, published by MIT Press, will be available in Spring 2023.
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (03:10) Marc’s intro to AI and RL
* (07:00) Cross-pollination of deep learning research and RL at McGill and UDM
* (09:50) PhD work at U Alberta, continual learning, origins of the Arcade Learning Environment (ALE)
* (14:40) Challenges in the ALE, how the ALE drove RL research
* (23:10) Marc’s thoughts on the Avalon benchmark and what makes a good RL benchmark
* (28:00) Opinions on “Reward is Enough” and whether RL gets us to AGI
* (32:10) How Marc thinks about priors in learning, “reincarnating RL”
* (36:00) Distributional Reinforcement Learning and the problem of distribution estimation
* (43:00) GFlowNets and distributional RL
* (45:05) Contraction in RL and distributional RL, theory-practice gaps
* (52:45) Representation learning for RL
* (55:50) Structure of the value function space
* (1:00:00) Connections to open-endedness / evolutionary algorithms / curiosity
* (1:03:30) RL for stratospheric balloon navigation with Loon
* (1:07:30) New ideas for applying RL in the real world
* (1:10:15) Marc’s advice for young researchers
* (1:12:37) Outro
Links:
* Professor Bellemare’s Homepage
* Distributional Reinforcement Learning book
* Papers
  * The Arcade Learning Environment: An Evaluation Platform for General Agents
  * A Distributional Perspective on Reinforcement Learning
  * Distributional Reinforcement Learning with Quantile Regression
  * Distributional Reinforcement Learning with Linear Function Approximation
  * Autonomous navigation of stratospheric balloons using reinforcement learning
  * A Geometric Perspective on Optimal Representations for Reinforcement Learning
  * The Value Function Polytope in Reinforcement Learning
Get full access to The Gradient at thegradientpub.substack.com/subscribe

François Chollet: Keras and Measures of Intelligence
In episode 51 of The Gradient Podcast, Daniel Bashir speaks to François Chollet.
François is a Senior Staff Software Engineer at Google and the creator of the Keras deep learning library, which has enabled many people (including me) to get their hands dirty with the world of deep learning. François is also the author of the book “Deep Learning with Python.” His interests include understanding the nature of abstraction, developing algorithms capable of autonomous abstraction, and democratizing the development and deployment of AI technology, among other topics.
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro + Daniel has far too much fun pronouncing “François Chollet”
* (02:00) How François got into AI
* (08:00) Keras and user experience, library as product, progressive disclosure of complexity
* (18:20) François’ comments on the state of ML frameworks and what different frameworks are useful for
* (23:00) On the Measure of Intelligence: historical perspectives
* (28:00) Intelligence vs cognition, overlaps
* (32:30) How core is Core Knowledge?
* (39:15) Cognition priors, metalearning priors
* (43:10) Defining intelligence
* (49:30) François’ comments on modern deep learning systems
* (55:50) Program synthesis as a path to intelligence
* (1:02:30) Difficulties in program synthesis
* (1:09:25) François’ concerns about current AI
* (1:14:30) The need for regulation
* (1:16:40) Thoughts on longtermism
* (1:23:30) Where we can expect exponential progress in AI
* (1:26:35) François’ advice on becoming a good engineer
* (1:29:03) Outro
Links:
* François’ personal page
* On the Measure of Intelligence
* Keras
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Yoshua Bengio: The Past, Present, and Future of Deep Learning
Happy episode 50! This week’s episode is being released on Monday to avoid Thanksgiving.
Have suggestions for future podcast guests (or other feedback)? Let us know here!
In episode 50 of The Gradient Podcast, Daniel Bashir speaks to Professor Yoshua Bengio.
Professor Bengio is a Full Professor at the Université de Montréal as well as the Founder and Scientific Director of the MILA-Quebec AI Institute and the IVADO institute. Best known for his pioneering work in deep learning, Bengio was one of three recipients of the 2018 A.M. Turing Award, along with Geoffrey Hinton and Yann LeCun. He has also received the prestigious Killam Prize and is, as of this year, the computer scientist with the highest h-index in the world.
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (02:20) Journey into Deep Learning, PDP and Hinton
* (06:45) “Inspired by biology”
* (08:30) “Gradient Based Learning Applied to Document Recognition” and working with Yann LeCun
* (10:00) What Bengio learned from LeCun (and Larry Jackel) about being a research advisor
* (13:00) “Learning Long-Term Dependencies with Gradient Descent is Difficult,” why people don’t understand this paper well enough
* (18:15) Bengio’s work on word embeddings and the curse of dimensionality, “A Neural Probabilistic Language Model”
* (23:00) Adding more structure / inductive biases to LMs
* (24:00) The rise of deep learning and Bengio’s experience, “you have to be careful with inductive biases”
* (31:30) Bengio’s “Bayesian posture” in response to recent developments
* (40:00) Higher level cognition, Global Workspace Theory
* (45:00) Causality, actions as mediating distribution change
* (49:30) GFlowNets and RL
* (53:30) GFlowNets and actions that are not well-defined, combining with System II and modular, abstract ideas
* (56:50) GFlowNets and evolutionary methods
* (1:00:45) Bengio on Cartesian dualism
* (1:09:30) “When you are famous, it is hard to work on hard problems” (Richard Hamming) and Bengio’s response
* (1:11:10) Family background, art and its role in Bengio’s life
* (1:14:20) Outro
Links:
* Professor Bengio’s Homepage
* Papers
  * Gradient-based learning applied to document recognition
  * Learning Long-Term Dependencies with Gradient Descent is Difficult
  * The Consciousness Prior
  * Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation
Get full access to The Gradient at thegradientpub.substack.com/subscribe

Kanjun Qiu and Josh Albrecht: Generally Intelligent
In episode 49 of The Gradient Podcast, Daniel Bashir speaks to Kanjun Qiu and Josh Albrecht.
Kanjun and Josh are the CEO and CTO of Generally Intelligent, an AI startup aiming to develop general-purpose agents with human-like intelligence that can be safely deployed in the real world. Kanjun and Josh have played these roles together in the past as CEO and CTO of the AI recruiting startup Sourceress. Kanjun is also involved with building the SF Neighborhood, and together with Josh invests in early-stage founders at Outset Capital.
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (02:00) Kanjun’s and Josh’s intros to AI
* (06:45) How Kanjun and Josh met and started working together
* (08:40) Sourceress and AI in hiring, looking for unusual candidates
* (11:30) Generally Intelligent: origins and motivations
* (14:55) How Kanjun and Josh think about understanding the fundamentals of intelligence
* (17:20) AGI companies and long-term goals
* (19:20) How Kanjun and Josh think about intelligence + Generally Intelligent’s approach-agnosticism
* (22:30) Skill-acquisition efficiency
* (25:18) The Avalon Environment/Benchmark
* (27:40) Tasks with shared substrate
* (29:00) Blending of different approaches, baseline tuning
* (31:15) Approach to safety
* (33:33) Issues with interpretability + ML academic practices, ablations
* (36:30) Lessons about working with people, company culture
* (40:00) Human focus and diversity in companies, tech environment
* (44:10) Advice for potential (AI) founders
* (47:05) Outro
Links:
* Generally Intelligent
* Avalon: A Benchmark for RL Generalization
* Kanjun’s homepage
* Josh’s homepage
Get full access to The Gradient at thegradientpub.substack.com/subscribe