
More powerful deep learning with transformers (Ep. 84)
October 27, 2019 · 37m 44s · Explicit
Show Notes
Some of the most powerful NLP models like BERT and GPT-2 have one thing in common: they all use the transformer architecture.
This architecture builds on another important concept already familiar to the community: self-attention.
In this episode I explain what these mechanisms are, how they work, and why they are so powerful.
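The core mechanism discussed in the episode, scaled dot-product self-attention, can be sketched in a few lines of code. Below is a minimal NumPy illustration; the function names, shapes, and projection matrices (W_q, W_k, W_v) are illustrative assumptions, not code from the episode.

```python
# A minimal sketch of scaled dot-product self-attention, the core of the
# transformer architecture from "Attention Is All You Need".
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention over a sequence of token embeddings.

    X:             (seq_len, d_model) input embeddings
    W_q, W_k, W_v: (d_model, d_k) learned projection matrices (assumed names)
    """
    Q = X @ W_q                      # queries
    K = X @ W_k                      # keys
    V = X @ W_v                      # values
    d_k = Q.shape[-1]
    # Every token attends to every other token; scaling by sqrt(d_k)
    # keeps the dot products from saturating the softmax.
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)
    return weights @ V               # (seq_len, d_k) contextualized outputs

# Toy usage: 4 tokens, 8-dimensional embeddings and projections.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```

Each output row is a weighted mix of all the value vectors, with weights determined by how strongly that token's query matches every other token's key; a full transformer runs several such heads in parallel and stacks the result with feed-forward layers.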
Don't forget to subscribe to our Newsletter or join the discussion on our Discord server.
References
- Attention Is All You Need: https://arxiv.org/abs/1706.03762
- The Illustrated Transformer: https://jalammar.github.io/illustrated-transformer
- Self-Attention for Generative Models: http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf