172: Transformers and Large Language Models
Episode 172

Programming Throwdown

March 11, 2024 · 1h 26m

Show Notes

Intro topic: Is WFH actually WFC?

News/Links:


Book of the Show


Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h


Tool of the Show


Topic: Transformers and Large Language Models

  • How neural networks store information
    • Latent variables (see the latent-code sketch after this list)
  • Transformers
    • Encoders & Decoders
  • Attention Layers
    • History
      • RNN
        • Vanishing Gradient Problem (gradient-norm sketch after this list)
      • LSTM
        • Short term: gradients explode; long term: gradients vanish
    • Differentiable algebra
    • Key-Query-Value
    • Self-Attention (attention sketch after this list)
  • Self-Supervised Learning & Forward Models (next-token sketch after this list)
  • Human Feedback
    • Reinforcement Learning from Human Feedback (RLHF)
    • Direct Preference Optimization (DPO; pairwise ranking; loss sketch after this list)
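
The "latent variables" point is easiest to see in code: a network funnels its input through a narrow hidden layer, and everything downstream is computed from that compressed code. A minimal untrained sketch (the layer sizes and random weights are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
w_enc = rng.standard_normal((10, 3))   # compress a 10-dim input down to 3 latent units
w_dec = rng.standard_normal((3, 10))   # later layers only ever see the latent code

x = rng.standard_normal(10)
latent = np.tanh(x @ w_enc)            # the latent variables: all the network keeps of x
output = latent @ w_dec                # everything downstream is computed from them
print("latent code:", np.round(latent, 3))
```

In a trained network, those latent units end up encoding whatever features of the input are useful for the task.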
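A quick sketch of the vanishing/exploding gradient problem behind the RNN-to-LSTM history: backpropagation through time multiplies the gradient by the same recurrent weight matrix at every step, so its norm shrinks or blows up geometrically with sequence length. The matrix size and weight scales here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_norm(steps, scale):
    """Gradient norm after pushing a vector back through `steps` recurrent steps."""
    w = scale * rng.standard_normal((32, 32)) / np.sqrt(32)  # spectral radius ~ scale
    grad = np.ones(32)
    for _ in range(steps):
        grad = w.T @ grad          # same Jacobian applied at every timestep
    return np.linalg.norm(grad)

for scale in (0.5, 1.0, 1.5):      # hypothetical recurrent weight scales
    norms = [grad_norm(t, scale) for t in (1, 10, 50)]
    print(f"scale {scale}: " + ", ".join(f"{n:.3g}" for n in norms))
```

With scale below 1 the norm collapses toward zero (vanishing); above 1 it blows up (exploding). LSTM gating was designed to keep a usable gradient path over long sequences.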
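The key-query-value mechanic is compact enough to sketch in NumPy: every token emits a query, scores it against every token's key, and takes a softmax-weighted average of the values. This is a single attention head with made-up dimensions, not any particular library's API:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) token embeddings
    wq, wk, wv: (d_model, d_head) projection matrices
    """
    q, k, v = x @ wq, x @ wk, x @ wv           # project tokens into queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])    # similarity of each query to each key
    weights = softmax(scores, axis=-1)         # each row is an attention distribution
    return weights @ v                         # weighted average of the values

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 16))               # 5 tokens, 16-dim embeddings (made up)
wq, wk, wv = (rng.standard_normal((16, 8)) for _ in range(3))
print(self_attention(x, wq, wk, wv).shape)     # (5, 8)
```

Stacking several such heads, plus feed-forward layers and residual connections, gives the encoder and decoder blocks discussed in the episode.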
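Self-supervised learning needs no human labels because the data supplies its own targets. For a GPT-style forward model, the target is simply the input shifted by one token; a toy sketch with whitespace "tokenization":

```python
# Self-supervised labels are free: the target is just the input shifted by one.
text = "the cat sat on the mat"
tokens = text.split()                  # toy whitespace "tokenizer"

inputs = tokens[:-1]                   # the model sees everything up to position t
targets = tokens[1:]                   # ...and must predict the token at t+1
for ctx, nxt in zip(inputs, targets):
    print(f"given ...{ctx!r} -> predict {nxt!r}")
```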
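Finally, a sketch of the pairwise-ranking idea behind DPO: given a human-preferred and a rejected completion, the loss pushes the policy's log-probability margin (relative to a frozen reference model) toward the preferred one. The log-probabilities below are placeholder numbers, not real model outputs:

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO pairwise loss from sequence log-probs (chosen = human-preferred)."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log(sigmoid(beta * margin))

# Placeholder log-probabilities; a real run would sum token log-probs per completion.
print(dpo_loss(policy_chosen=-12.0, policy_rejected=-15.0,
               ref_chosen=-13.0, ref_rejected=-14.0))
```

Driving the margin up makes the policy prefer the chosen completion more strongly than the reference model does, which is how DPO replaces the reward model and RL loop of RLHF with a direct classification-style objective.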



★ Support this podcast on Patreon ★
