98 - Analyzing Information Flow In Transformers, With Elena Voita

What function do the different attention heads se…

NLP Highlights · Allen Institute for Artificial Intelligence

December 9, 201937m 5s

Audio is streamed directly from the publisher (podtrac.com) as published in their RSS feed. Play Podcasts does not host this file. Rights-holders can request removal through the copyright & takedown page.

Original episode page

Show Notes

What function do the different attention heads serve in multi-headed attention models? In this episode, Lena describes how to use attribution methods to assess the importance and contribution of different heads in several tasks, and describes a gating mechanism to prune the number of effective heads used when combined with an auxiliary loss. Then, we discuss Lena’s work on studying the evolution of representations of individual tokens in transformers model. Lena’s homepage: https://lena-voita.github.io/ Blog posts: https://lena-voita.github.io/posts/acl19_heads.html https://lena-voita.github.io/posts/emnlp19_evolution.html Papers: https://arxiv.org/abs/1905.09418 https://arxiv.org/abs/1909.01380

← All episodes of NLP Highlights