
TalkRL: The Reinforcement Learning Podcast
74 episodes

Ep 23 Thomas Krendl Gilbert
Thomas Krendl Gilbert is a PhD student at UC Berkeley’s Center for Human-Compatible AI, specializing in Machine Ethics and Epistemology. Featured References Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments Roel Dobbe, Thomas Krendl Gilbert, Yonatan Mintz Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles Thomas Krendl Gilbert AI Development for the Public Interest: From Abstraction Traps to Sociotechnical Risks McKane Andrus, Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert and Tom Zick Additional References Political Economy of Reinforcement Learning Systems (PERLS) The Law and Political Economy (LPE) Project The Societal Implications of Deep Reinforcement Learning, Jess Whittlestone, Kai Arulkumaran, Matthew Crosby Robot Brains Podcast: Yann LeCun explains why Facebook would crumble without AI

Ep 22 Marc G. Bellemare
Professor Marc G. Bellemare is a Research Scientist at Google Research (Brain team), an Adjunct Professor at McGill University, and a Canada CIFAR AI Chair. Featured References The Arcade Learning Environment: An Evaluation Platform for General Agents Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling Human-level control through deep reinforcement learning Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg & Demis Hassabis Autonomous navigation of stratospheric balloons using reinforcement learning Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang Additional References CAIDA Talk: A tour of distributional reinforcement learning November 18, 2020 - Marc G. Bellemare Amii AI Seminar Series: Autonomous nav of stratospheric balloons using RL, Marlos C. Machado UMD RLSS | Marc Bellemare | A History of Reinforcement Learning: Atari to Stratospheric Balloons TalkRL: Marlos C. Machado, Dr. Machado also spoke to us about various aspects of ALE and Project Loon in depth Hyperbolic discounting and learning over multiple horizons, Fedus et al 2019 Marc G. Bellemare on Twitter

Ep 21 Robert Osazuwa Ness
Robert Osazuwa Ness is an adjunct professor of computer science at Northeastern University, an ML Research Engineer at Gamalon, and the founder of AltDeep School of AI. He holds a PhD in statistics. He studied at Johns Hopkins SAIS and then Purdue University. References Altdeep School of AI, Altdeep on Twitch, Substack, Robert Ness Altdeep Causal Generative Machine Learning Minicourse, Free course Robert Osazuwa Ness on Google Scholar Gamalon Inc Causal Reinforcement Learning talks, Elias Bareinboim The Bitter Lesson, Rich Sutton 2019 The Need for Biases in Learning Generalizations, Tom Mitchell 1980 Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics, Kansky et al 2017

Ep 20 Marlos C. Machado
Dr. Marlos C. Machado is a research scientist at DeepMind and an adjunct professor at the University of Alberta. He holds a PhD from the University of Alberta and a MSc and BSc from UFMG, in Brazil. Featured References Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew J. Hausknecht, Michael Bowling Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning [ video ] Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare Efficient Exploration in Reinforcement Learning through Time-Based Representations Marlos C. Machado A Laplacian Framework for Option Discovery in Reinforcement Learning [ video ] Marlos C. Machado, Marc G. Bellemare, Michael H. Bowling Eigenoption Discovery through the Deep Successor Representation Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell Exploration in Reinforcement Learning with Deep Covering Options Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Dimitri Konidaris Autonomous navigation of stratospheric balloons using reinforcement learning Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang Generalization and Regularization in DQN Jesse Farebrother, Marlos C. Machado, Michael Bowling Additional References Amii AI Seminar Series: Marlos C. Machado - Autonomous navigation of stratospheric balloons using RL State of the Art Control of Atari Games Using Shallow Reinforcement Learning, Liang et al Introspective Agents: Confidence Measures for General Value Functions, Sherstan et al

Ep 19 Nathan Lambert
Nathan Lambert is a PhD Candidate at UC Berkeley. Featured References Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning Nathan O. Lambert, Albert Wilcox, Howard Zhang, Kristofer S. J. Pister, Roberto Calandra Objective Mismatch in Model-based Reinforcement Learning Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Roberto Calandra, Sergey Levine, Kristofer S.J. Pister On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra Additional References Nathan Lambert's blog Nathan Lambert on Google scholar

Ep 18 Kai Arulkumaran
Kai Arulkumaran is a researcher at Araya in Tokyo. Featured References AlphaStar: An Evolutionary Computation Perspective Kai Arulkumaran, Antoine Cully, Julian Togelius Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation Tianhong Dai, Kai Arulkumaran, Tamara Gerbert, Samyakh Tukra, Feryal Behbahani, Anil Anthony Bharath Training Agents using Upside-Down Reinforcement Learning Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber Additional References Araya NNAISENSE Kai Arulkumaran on Google Scholar https://github.com/Kaixhin/rlenvs https://github.com/Kaixhin/Atari https://github.com/Kaixhin/Rainbow Tschiatschek, S., Arulkumaran, K., Stühmer, J. & Hofmann, K. (2018). Variational Inference for Data-Efficient Model Learning in POMDPs. arXiv:1805.09281. Arulkumaran, K., Dilokthanakul, N., Shanahan, M. & Bharath, A. A. (2016). Classifying Options for Deep Reinforcement Learning. International Joint Conference on Artificial Intelligence, Deep Reinforcement Learning Workshop. Garnelo, M., Arulkumaran, K. & Shanahan, M. (2016). Towards Deep Symbolic Reinforcement Learning. Annual Conference on Neural Information Processing Systems, Deep Reinforcement Learning Workshop. Arulkumaran, K., Deisenroth, M. P., Brundage, M. & Bharath, A. A. (2017). Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine. Agostinelli, A., Arulkumaran, K., Sarrico, M., Richemond, P. & Bharath, A. A. (2019). Memory-Efficient Episodic Control Reinforcement Learning with Dynamic Online k-means. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning. Sarrico, M., Arulkumaran, K., Agostinelli, A., Richemond, P. & Bharath, A. A. (2019). Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning.

Ep 17 Michael Dennis
Michael Dennis is a PhD student at the Center for Human-Compatible AI at UC Berkeley, supervised by Professor Stuart Russell. I'm interested in robustness in RL and multi-agent RL, specifically as it applies to making the interaction between AI systems and society at large to be more beneficial. --Michael Dennis Featured References Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [PAIRED] Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine Videos Adversarial Policies: Attacking Deep Reinforcement Learning Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell Homepage and Videos Accumulating Risk Capital Through Investing in Cooperation Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell Quantifying Differences in Reward Functions [EPIC] Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike Additional References Safe Opponent Exploitation, Sam Ganzfried And Tuomas Sandholm 2015 Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning, Natasha Jaques et al 2019 Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research, Leibo et al 2019 Leveraging Procedural Generation to Benchmark Reinforcement Learning, Karl Cobbe et al 2019 Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions, Wang et al 2019 Consequences of Misaligned AI, Zhuang et al 2020 Conservative Agency via Attainable Utility Preservation, Turner et al 2019

Ep 16 Roman Ring
Roman Ring is a Research Engineer at DeepMind. Featured References Grandmaster level in StarCraft II using multi-agent reinforcement learning Vinyals et al, 2019 Replicating DeepMind StarCraft II Reinforcement Learning Benchmark with Actor-Critic Methods Roman Ring, 2018 Additional References Relational Deep Reinforcement Learning, Zambaldi et al 2018 StarCraft II: A New Challenge for Reinforcement Learning, Vinyals et al 2017 Safe and Efficient Off-Policy Reinforcement Learning [Retrace(λ)], Munos et al 2016 Sample Efficient Actor-Critic with Experience Replay [ACER], Wang et al 2016 IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures [IMPALA/V-trace], Espeholt et al 2018

Ep 15 Shimon Whiteson
Shimon Whiteson is a Professor of Computer Science at Oxford University, the head of WhiRL, the Whiteson Research Lab at Oxford, and Head of Research at Waymo UK. Featured References VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson Additional References Shimon Whiteson - Multi-agent RL, MIT Embodied Intelligence Seminar The StarCraft Multi-Agent Challenge, Samvelyan et al 2019 Direct Policy Transfer with Hidden Parameter Markov Decision Processes, Yao et al 2018 Value-Decomposition Networks For Cooperative Multi-Agent Learning, Sunehag et al 2017 Whiteson Research Lab Waymo acquires Latent Logic to accelerate progress towards safe, driverless vehicles, Oxford News Waymo

Ep 14 Aravind Srinivas
Aravind Srinivas is a 3rd year PhD student at UC Berkeley advised by Prof. Abbeel. He co-created and co-taught a grad course on Deep Unsupervised Learning at Berkeley. Featured References Data-Efficient Image Recognition with Contrastive Predictive Coding Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord Contrastive Unsupervised Representations for Reinforcement Learning Aravind Srinivas, Michael Laskin, Pieter Abbeel Reinforcement Learning with Augmented Data Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel Additional References CS294-158-SP20 Deep Unsupervised Learning, Berkeley Phasic Policy Gradient, Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman Bootstrap your own latent: A new approach to self-supervised Learning, Grill et al 2020

Ep 13 Taylor Killian
Taylor Killian is a Ph.D. student at the University of Toronto and the Vector Institute, and an Intern at Google Brain. Featured References Direct Policy Transfer with Hidden Parameter Markov Decision Processes Yao, Killian, Konidaris, Doshi-Velez Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes Killian, Daulton, Konidaris, Doshi-Velez Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes Killian, Konidaris, Doshi-Velez Counterfactually Guided Policy Transfer in Clinical Settings Killian, Ghassemi, Joshi Additional References Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations, Doshi-Velez, Konidaris Mimic III, a freely accessible critical care database. Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care, Komorowski et al

Ep 12 Nan Jiang
Nan Jiang is an Assistant Professor of Computer Science at University of Illinois. He was a Postdoc at Microsoft Research, and did his PhD at University of Michigan under Professor Satinder Singh. Featured References Reinforcement Learning: Theory and Algorithms Alekh Agarwal, Nan Jiang, Sham M. Kakade Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford Information-Theoretic Considerations in Batch Reinforcement Learning Jinglin Chen, Nan Jiang Additional References Towards a Unified Theory of State Abstraction for MDPs, Lihong Li, Thomas J. Walsh, Michael L. Littman Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, Nan Jiang, Lihong Li Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization, Nan Jiang, Jiawei Huang Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning, Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue Errata [Robin] I misspoke when I said in domain randomization we want the agent to "ignore" domain parameters. What I should have said is, we want the agent to perform well within some range of domain parameters, it should be robust with respect to domain parameters.

Ep 11 Danijar Hafner
Danijar Hafner is a PhD student at the University of Toronto, and a student researcher at Google Research, Brain Team and the Vector Institute. He holds a Masters of Research from University College London. Featured References A deep learning framework for neuroscience Blake A. Richards, Timothy P. Lillicrap, Philippe Beaudoin, Yoshua Bengio, Rafal Bogacz, Amelia Christensen, Claudia Clopath, Rui Ponte Costa, Archy de Berker, Surya Ganguli, Colleen J. Gillon, Danijar Hafner, Adam Kepecs, Nikolaus Kriegeskorte, Peter Latham, Grace W. Lindsay, Kenneth D. Miller, Richard Naud, Christopher C. Pack, Panayiota Poirazi, Pieter Roelfsema, João Sacramento, Andrew Saxe, Benjamin Scellier, Anna C. Schapiro, Walter Senn, Greg Wayne, Daniel Yamins, Friedemann Zenke, Joel Zylberberg, Denis Therien, Konrad P. Kording Learning Latent Dynamics for Planning from Pixels Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson Dream to Control: Learning Behaviors by Latent Imagination Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi Planning to Explore via Self-Supervised World Models Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak Additional References Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model Schrittwieser et al Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm Silver et al Shaping Belief States with Generative Environment Models for RL Gregor et al Model-Based Active Exploration Shyam et al Errata [Robin] Around 1:37 I say "some ... world models get confused by random noise". I meant "some curiosity formulations", not "world models"

Ep 10 Csaba Szepesvari
Csaba Szepesvari is Head of the Foundations Team at DeepMind; Professor of Computer Science at the University of Alberta; Canada CIFAR AI Chair; Fellow at the Alberta Machine Intelligence Institute; co-author of the book Bandit Algorithms along with Tor Lattimore; and author of the book Algorithms for Reinforcement Learning. References Bandit based monte-carlo planning, Levente Kocsis, Csaba Szepesvári Bandit Algorithms, Tor Lattimore, Csaba Szepesvári Algorithms for Reinforcement Learning, Csaba Szepesvári The Predictron: End-To-End Learning and Planning, David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris A Bayesian framework for reinforcement learning, Strens Solving Rubik’s Cube with a Robot Hand; Paper, OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang The Nonstochastic Multiarmed Bandit Problem, Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire Deep Learning with Bayesian Principles, Mohammad Emtiyaz Khan Tackling climate change with Machine Learning David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio

Ep 9 Ben Eysenbach
Ben Eysenbach is a PhD student in the Machine Learning Department at Carnegie Mellon University. He was a Resident at Google Brain, and studied math and computer science at MIT. He co-founded the ICML Exploration in Reinforcement Learning workshop. Featured References Diversity is All You Need: Learning Skills without a Reward Function Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine Search on the Replay Buffer: Bridging Planning and Reinforcement Learning Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine Additional References Behaviour Suite for Reinforcement Learning, Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt Learning Latent Plans from Play, Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, Pierre Sermanet Finale Doshi-Velez Emma Brunskill Closed-loop optimization of fast-charging protocols for batteries with machine learning, Peter Attia, Aditya Grover, Norman Jin, Kristen Severson, Todor Markov, Yang-Hung Liao, Michael Chen, Bryan Cheong, Nicholas Perkins, Zi Yang, Patrick Herring, Muratahan Aykol, Stephen Harris, Richard Braatz, Stefano Ermon, William Chueh CMU 10-703 Deep Reinforcement Learning, Fall 2019, Carnegie Mellon University ICML Exploration in Reinforcement Learning workshop

Ep 8 NeurIPS 2019 Deep RL Workshop
Thank you to all the presenters who participated. I covered as many as I could given the time and crowds; if you were not included and wish to be, please email [email protected]. More details on the official NeurIPS Deep RL Workshop site.
0:23 Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms; Matthia Sabatelli (University of Liège); Gilles Louppe (University of Liège); Pierre Geurts (University of Liège); Marco Wiering (University of Groningen)
4:16 Single Deep Counterfactual Regret Minimization; Eric Steinberger (University of Cambridge)
5:38 On the Convergence of Episodic Reinforcement Learning Algorithms at the Example of RUDDER; Markus Holzleitner (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria); José Arjona-Medina (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria); Marius-Constantin Dinu (LIT AI Lab / University Linz); Sepp Hochreiter (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria)
9:33 Objective Mismatch in Model-based Reinforcement Learning; Nathan Lambert (UC Berkeley); Brandon Amos (Facebook); Omry Yadan (Facebook); Roberto Calandra (Facebook)
10:51 Option Discovery using Deep Skill Chaining; Akhil Bagaria (Brown University); George Konidaris (Brown University)
13:44 Blue River Controls: A toolkit for Reinforcement Learning Control Systems on Hardware; Kirill Polzounov (University of Calgary); Ramitha Sundar (Blue River Technology); Lee Reden (Blue River Technology)
14:52 LeDeepChef: Deep Reinforcement Learning Agent for Families of Text-Based Games; Leonard Adolphs (ETHZ); Thomas Hofmann (ETH Zurich)
16:30 Accelerating Training in Pommerman with Imitation and Reinforcement Learning; Hardik Meisheri (TCS Research); Omkar Shelke (TCS Research); Richa Verma (TCS Research); Harshad Khadilkar (TCS Research)
17:27 Dream to Control: Learning Behaviors by Latent Imagination; Danijar Hafner (Google); Timothy Lillicrap (DeepMind); Jimmy Ba (University of Toronto); Mohammad Norouzi (Google Brain)
20:48 Adaptive Temperature Tuning for Mellowmax in Deep Reinforcement Learning; Seungchan Kim (Brown University); George Konidaris (Brown)
22:05 Meta-learning curiosity algorithms; Ferran Alet (MIT); Martin Schneider (MIT); Tomas Lozano-Perez (MIT); Leslie Kaelbling (MIT)
24:09 Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards; Xingyu Lu (Berkeley); Stas Tiomkin (BAIR, UC Berkeley); Pieter Abbeel (UC Berkeley)
25:44 Swarm-inspired Reinforcement Learning via Collaborative Inter-agent Knowledge Distillation; Zhang-Wei Hong (Preferred Networks); Prabhat Nagarajan (Preferred Networks); Guilherme Maeda (Preferred Networks)
26:35 Multiplayer AlphaZero; Nicholas Petosa (Georgia Institute of Technology); Tucker Balch (Ga Tech)
27:43 Prioritized Sequence Experience Replay; Marc Brittain (Iowa State University); Joshua Bertram (Iowa State University); Xuxi Yang (Iowa State University); Peng Wei (Iowa State University)
29:14 Recurrent neural-linear posterior sampling for non-stationary bandits; Paulo Rauber (IDSIA); Aditya Ramesh (USI); Jürgen Schmidhuber (IDSIA - Lugano)
29:36 Improving Evolutionary Strategies With Past Descent Directions; Asier Mujika (ETH Zurich); Florian Meier (ETH Zurich); Marcelo Matheus Gauy (ETH Zurich); Angelika Steger (ETH Zurich)
31:40 ZPD Teaching Strategies for Deep Reinforcement Learning from Demonstrations; Daniel Seita (University of California, Berkeley); David Chan (University of California, Berkeley); Roshan Rao (UC Berkeley); Chen Tang (UC Berkeley); Mandi Zhao (UC Berkeley); John Canny (UC Berkeley)
33:05 Bottom-Up Meta-Policy Search; Luckeciano Melo (Aeronautics Institute of Technology); Marcos Máximo (Aeronautics Institute of Technology); Adilson Cunha (Aeronautics Institute of Technology)
33:37 MERL: Multi-Head Reinforcement Learning; Yannis Flet-Berliac (University of Lille / Inria); Philippe Preux (INRIA)
35:30 Emergen...

Ep 7 Scott Fujimoto
Scott Fujimoto is a PhD student at McGill University and Mila. He is the author of TD3 as well as some of the recent developments in batch deep reinforcement learning. Featured References Addressing Function Approximation Error in Actor-Critic Methods Scott Fujimoto, Herke van Hoof, David Meger Off-Policy Deep Reinforcement Learning without Exploration Scott Fujimoto, David Meger, Doina Precup Benchmarking Batch Deep Reinforcement Learning Algorithms Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau Additional References Striving for Simplicity in Off-Policy Deep Reinforcement Learning Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard Continuous control with deep reinforcement learning Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra Distributed Distributional Deterministic Policy Gradients Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap

Ep 6 Jessica Hamrick
Dr. Jessica Hamrick is a Research Scientist at DeepMind. She holds a PhD in Psychology from UC Berkeley. Featured References Structured agents for physical construction Victor Bapst, Alvaro Sanchez-Gonzalez, Carl Doersch, Kimberly L. Stachenfeld, Pushmeet Kohli, Peter W. Battaglia, Jessica B. Hamrick Analogues of mental simulation and imagination in deep learning Jessica Hamrick Additional References Metacontrol for Adaptive Imagination-Based Optimization Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia Surprising Negative Results for Generative Adversarial Tree Search Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Zachary C Lipton, Animashree Anandkumar Metareasoning and Mental Simulation Jessica B. Hamrick Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis Object-oriented state editing for HRL Victor Bapst, Alvaro Sanchez-Gonzalez, Omar Shams, Kimberly Stachenfeld, Peter W. Battaglia, Satinder Singh, Jessica B. Hamrick FeUdal Networks for Hierarchical Reinforcement Learning Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu PILCO: A Model-Based and Data-Efficient Approach to Policy Search Marc Peter Deisenroth, Carl Edward Rasmussen Blueberry Earth Anders Sandberg

Ep 5 Pablo Samuel Castro
Dr Pablo Samuel Castro is a Staff Research Software Engineer at Google Brain. He is the main author of the Dopamine RL framework. Featured References A Comparative Analysis of Expected and Distributional Reinforcement Learning Clare Lyle, Pablo Samuel Castro, Marc G. Bellemare A Geometric Perspective on Optimal Representations for Reinforcement Learning Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle Dopamine: A Research Framework for Deep Reinforcement Learning Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, Marc G. Bellemare Dopamine RL framework on github Tensorflow Agents on github Additional References Using Linear Programming for Bayesian Exploration in Markov Decision Processes Pablo Samuel Castro, Doina Precup Using bisimulation for policy transfer in MDPs Pablo Samuel Castro, Doina Precup Rainbow: Combining Improvements in Deep Reinforcement Learning Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver Implicit Quantile Networks for Distributional Reinforcement Learning Will Dabney, Georg Ostrovski, David Silver, Rémi Munos A Distributional Perspective on Reinforcement Learning Marc G. Bellemare, Will Dabney, Rémi Munos

Ep 4 Kamyar Azizzadenesheli
Dr. Kamyar Azizzadenesheli is a post-doctorate scholar at Caltech. His research interest is mainly in the area of Machine Learning, from theory to practice, with the main focus in Reinforcement Learning. He will be joining Purdue University as an Assistant CS Professor in Fall 2020. Featured References Efficient Exploration through Bayesian Deep Q-Networks Kamyar Azizzadenesheli, Animashree Anandkumar Surprising Negative Results for Generative Adversarial Tree Search Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Zachary C Lipton, Animashree Anandkumar Maybe a few considerations in Reinforcement Learning Research? Kamyar Azizzadenesheli Additional References Model-Based Reinforcement Learning for Atari Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski Near-optimal Regret Bounds for Reinforcement Learning Thomas Jaksch, Ronald Ortner, Peter Auer Curious Model-Building Control Systems Jürgen Schmidhuber Rainbow: Combining Improvements in Deep Reinforcement Learning Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics Ken Kansky, Tom Silver, David A. Mély, Mohamed Eldawy, Miguel Lázaro-Gredilla, Xinghua Lou, Nimrod Dorfman, Szymon Sidor, Scott Phoenix, Dileep George Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis

Ep 3 Antonin Raffin and Ashley Hill
Antonin Raffin is a researcher at the German Aerospace Center (DLR) in Munich, working in the Institute of Robotics and Mechatronics. His research is on using machine learning for controlling real robots (because simulation is not enough), with a particular interest for reinforcement learning. Ashley Hill is doing his thesis on improving control algorithms using machine learning for real time gain tuning. He works mainly with neuroevolution, genetic algorithms, and of course reinforcement learning, applied to mobile robots. He holds a masters degree in Machine learning, and a bachelors in Computer science from the Université Paris-Saclay. Featured References stable-baselines on github Ashley Hill, Antonin Raffin primary authors. S-RL Toolbox Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat Additional References Learning to Drive Smoothly in Minutes, Antonin Raffin Multimodal SRL (best paper at ICRA): Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks, Michelle A. Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg Benchmarking Model-Based Reinforcement Learning, Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba TossingBot: Learning to Throw Arbitrary Objects with Residual Physics Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser Stable Baselines roadmap OpenAI baselines stable-baselines github pull request

Ep 2 Michael Littman
Michael L Littman is a professor of Computer Science at Brown University. He was elected ACM Fellow in 2018 "For contributions to the design and analysis of sequential decision making algorithms in artificial intelligence". Featured References Convergent Actor Critic by Humans James MacGlashan, Michael L. Littman, David L. Roberts, Robert Tyler Loftin, Bei Peng, Matthew E. Taylor People teach with rewards and punishments as communication, not reinforcements Mark Ho, Fiery Cushman, Michael L. Littman, Joseph Austerweil Theory of Minds: Understanding Behavior in Groups Through Inverse Planning Michael Shum, Max Kleiman-Weiner, Michael L. Littman, Joshua B. Tenenbaum Personalized education at scale Saarinen, Cater, Littman Additional References Michael Littman papers on Google Scholar, Semantic Scholar Reinforcement Learning on Udacity, Charles Isbell, Michael Littman, Chris Pryby Machine Learning on Udacity, Michael Littman, Charles Isbell, Pushkar Kolhe Temporal Difference Learning and TD-Gammon, Gerald Tesauro Playing Atari with Deep Reinforcement Learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller Ask Me Anything about MOOCs, D Fisher, C Isbell, ML Littman, M Wollowski, et al Reinforcement Learning and Decision Making (RLDM) Conference Algorithms for Sequential Decision Making, Michael Littman's Thesis Machine Learning A Cappella - Overfitting Thriller!, Michael Littman and Charles Isbell feat Infinite Harmony Turbotax Ad 2016: Genius Anna/Michael Littman

Ep 1Natasha Jaques
Natasha Jaques is a PhD candidate at MIT working on affective and social intelligence. She has interned with DeepMind and Google Brain, and was an OpenAI Scholars mentor. Her paper “Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning” received an honourable mention for best paper at ICML 2019. Featured References Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas Tackling climate change with Machine Learning David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio Additional References MIT Media Lab Flight Offsets, Caroline Jaffe, Juliana Cherston, Natasha Jaques Modeling Others using Oneself in Multi-Agent Reinforcement Learning, Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus Inequity aversion improves cooperation in intertemporal social dilemmas, Edward Hughes, Joel Z. Leibo, Matthew G. Phillips, Karl Tuyls, Edgar A. Duéñez-Guzmán, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin R. McKee, Raphael Koster, Heather Roff, Thore Graepel Sequential Social Dilemma Games on github, Eugene Vinitsky, Natasha Jaques AI Alignment newsletter, Rohin Shah Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions, Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley The social function of intellect, Nicholas Humphrey Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research, Joel Z. Leibo, Edward Hughes, Marc Lanctot, Thore Graepel A Recipe for Training Neural Networks, Andrej Karpathy Emotionally Adaptive Intelligent Tutoring Systems using POMDPs, Natasha Jaques Sapiens, Yuval Noah Harari
About TalkRL Podcast: All Reinforcement Learning, All the Time
Trailer, August 2, 2019. Transcript: The idea with TalkRL Podcast is to hear from brilliant folks from across the world of Reinforcement Learning, both research and applications. As much as possible, I want to hear from them in their own language. I try to get to know as much as I can about their work beforehand. And I’m not here to convert anyone; I want to reach people who are already into RL. So we won’t stop to explain what a value function is, for example. Though we also won’t assume everyone has read the very latest papers. Why am I doing this? Because it’s a great way to learn from the most inspiring people in the field! There’s so much happening in the universe of RL, and there’s tons of interesting angles and so many fascinating minds to learn from. Now I know there is no shortage of books, papers, and lectures, but so much goes unsaid. I mean I guess if you work at MILA or AMII or Vector Institute, you might be having these conversations over coffee all the time, but I live in a little village in the woods in BC, so for me, these remote interviews are like a great way to have these conversations, and I hope sharing with the community makes it more worthwhile for everyone. In terms of format, the first 2 episodes were interviews in longer form, around an hour long. Going forward, some may be a lot shorter; it depends on the guest. If you want to be a guest or suggest a guest, go to talkrl.com/about, where you will find a link to a suggestion form. Thanks for listening!