NLP Highlights

44 - Truly Low Resource NLP, with Anders Søgaard
Anders talks with us about his line of work on doing NLP in languages where you have no linguistic resources other than a Bible translation or other religious works. He and his students have developed methods for annotation projection for both part of speech tagging and dependency parsing, aggregating information from many languages to predict annotations for languages where you have no training data. We talk about low-resource NLP generally, then dive into the specifics of the annotation projection method that Anders used, also touching on a related paper on learning cross-lingual word embeddings. https://www.semanticscholar.org/paper/If-all-you-have-is-a-bit-of-the-Bible-Learning-POS-Agic-Hovy/812965ddce635174b33621aaaa551e5f6199b6c0 https://www.semanticscholar.org/paper/Multilingual-Projection-for-Parsing-Truly-Low-Reso-Agic-Johannsen/1414e3041f4cc3366b6ab49d1dbe9216632b9c78 https://www.semanticscholar.org/paper/Cross-Lingual-Dependency-Parsing-with-Late-Decodin-Schlichtkrull-S%C3%B8gaard/eda636e3abae829cf7ad8e0519fbaec3f29d1e82 https://www.semanticscholar.org/paper/A-Strong-Baseline-for-Learning-Cross-Lingual-Word-S%C3%B8gaard-Goldberg/55ca53050fcd29e43d6dcfb7dfc6a602ec5e6878
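
As a rough illustration of the annotation projection idea described above, here is a minimal Python sketch that projects part-of-speech tags from several source-language sentences onto one target sentence through word alignments, then takes a vote per target token. The function name, the data layout, and the unweighted vote are assumptions made for illustration; this is not the exact procedure from the papers.

```python
from collections import Counter

def project_tags(target_len, sources):
    """Project tags onto a target sentence by voting over aligned source tokens.

    sources is a list of (source_tags, alignment) pairs, where alignment is a
    list of (source_index, target_index) pairs. This layout is hypothetical.
    """
    votes = [Counter() for _ in range(target_len)]
    for source_tags, alignment in sources:
        for src_idx, tgt_idx in alignment:
            votes[tgt_idx][source_tags[src_idx]] += 1
    # Tokens that no source word aligns to receive no tag (None).
    return [v.most_common(1)[0][0] if v else None for v in votes]

# Toy example: three source sentences aligned to a 3-token target sentence.
sources = [
    (["DET", "NOUN", "VERB"], [(0, 0), (1, 1), (2, 2)]),
    (["NOUN", "VERB"], [(0, 1), (1, 2)]),
    (["PRON", "ADJ", "VERB"], [(1, 1), (2, 2)]),
]
projected = project_tags(3, sources)
print(projected)
```

Voting across many source languages is what lets noisy individual alignments average out; a fuller version would presumably weight each vote by alignment confidence.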

43 - Reinforced Video Captioning with Entailment Rewards, with Ramakanth and Mohit
EMNLP 2017 paper by Ramakanth Pasunuru and Mohit Bansal. Ram and Mohit join us to talk about their work, which uses reinforcement learning to improve performance on a video captioning task. They directly optimize CIDEr, a popular image/video captioning metric, using policy gradient methods, then use a modified version of CIDEr that penalizes the model when it fails to produce a caption that is _entailed_ by the correct caption. In our discussion, we hit on what video captioning is, what typical models look like for this task, and how the entailment-based reward function is similar to other attempts to be smart about handling paraphrases when evaluating or training language generation models. Unfortunately, due to some technical issues, the audio recording is a little worse than usual for this episode. Our apologies. https://www.semanticscholar.org/paper/Reinforced-Video-Captioning-with-Entailment-Reward-Pasunuru-Bansal/0d11977afa1a6ce90dc3b1f26694492c2ab04773

42 - Generating Sentences by Editing Prototypes, with Kelvin Guu
Paper is by Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, and Percy Liang. In this episode, Kelvin tells us how to build a language model that starts from a prototype sentence instead of starting from scratch, enabling much more grammatical and diverse language modeling results. In the process, Kelvin gives us a really good intuitive explanation for how variational autoencoders work, we talk about some of the details of the model they used, and some of the implications of the work - can you use this for better summarization, or machine translation, or dialogue responses? https://www.semanticscholar.org/paper/Generating-Sentences-by-Editing-Prototypes-Guu-Hashimoto/d94d2a9c615b5359ec7d63b1379f9896c48a713f

41 - Cross-Sentence N-ary Relation Extraction with Graph LSTMs, with Nanyun (Violet) Peng
TACL 2017 paper, by Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. Most relation extraction work focuses on binary relations, like (Seattle, located in, Washington), because extracting n-ary relations is difficult. Nanyun (Violet) and her colleagues came up with a model to extract n-ary relations, focusing on drug-mutation-gene interactions, using graph LSTMs (a construct pretty similar to graph CNNs, which was developed around the same time). Nanyun comes on the podcast to tell us about her work. https://www.semanticscholar.org/paper/Cross-Sentence-N-ary-Relation-Extraction-with-Grap-Peng-Poon/03a2f871cc841e8047ab3291806dc301c5144bec

40 - On the State of the Art of Evaluation in Neural Language Models, with Gábor Melis
Recent arXiv paper by Gábor Melis, Chris Dyer, and Phil Blunsom. Gábor comes on the podcast to tell us about his work. He performs a thorough comparison between vanilla LSTMs and recurrent highway networks on the language modeling task, showing that when both methods are given equal amounts of hyperparameter tuning, LSTMs perform better, in contrast to prior work claiming that recurrent highway networks perform better. We talk about parameter tuning, training variance, language model evaluation, and other related issues. https://www.semanticscholar.org/paper/On-the-State-of-the-Art-of-Evaluation-in-Neural-La-Melis-Dyer/2397ce306e5d7f3d0492276e357fb1833536b5d8

39 - Organizing the SemEval task on scientific information extraction, with Isabelle Augenstein
Isabelle Augenstein was the lead organizer of SemEval 2017 task 10, on extracting keyphrases and relations from scientific publications. In this episode we talk about her experience organizing the task, how the task was set up, and what the result of the task was. We also talk about some related work Isabelle did on multi-task learning for keyphrase boundary detection. https://www.semanticscholar.org/paper/SemEval-2017-Task-10-ScienceIE-Extracting-Keyphras-Augenstein-Das/71007219617d0f5e2419c5c1ab1a0d6d0bc40b7e https://www.semanticscholar.org/paper/Multi-Task-Learning-of-Keyphrase-Boundary-Classifi-Augenstein-S%C3%B8gaard/4a0db09d0c19dfeb78900164d46d4b06cd3fc9f3

38 - A Corpus of Natural Language for Visual Reasoning, with Alane Suhr
ACL 2017 best resource paper, by Alane Suhr, Mike Lewis, James Yeh, and Yoav Artzi. Alane joins us on the podcast to tell us about the dataset, which contains images paired with natural language descriptions of the images, where the task is to decide whether the description is true or false. Alane tells us about the motivation for creating the new dataset, how it was constructed, the way they elicited complex language from crowd workers, and why the dataset is an interesting target for future research. https://www.semanticscholar.org/paper/A-Corpus-of-Natural-Language-for-Visual-Reasoning-Suhr-Lewis/633453fb633c3c8695f3cd0e6b5350e971058bed

37 - On Statistical Significance, Training Variance, and Why Reporting Score Distributions Matters
In this episode we talk about a couple of recent papers that get at the issue of training variance, and why we should not just take the max from a training distribution when reporting results. Sadly, our current focus on performance in leaderboards only exacerbates these issues, and (in my opinion) encourages bad science. Papers: https://www.semanticscholar.org/paper/Reporting-Score-Distributions-Makes-a-Difference-P-Reimers-Gurevych/0eae432f7edacb262f3434ecdb2af707b5b06481 https://www.semanticscholar.org/paper/Deep-Reinforcement-Learning-that-Matters-Henderson-Islam/90dad036ab47d683080c6be63b00415492b48506
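
To make the point about not reporting only the max a bit more concrete, here is a small Python sketch that summarizes scores from several training runs of the same model. The scores are made-up placeholder values, not results from either paper, and the function name is just an illustration.

```python
import statistics

def summarize_runs(scores):
    """Summarize per-seed scores instead of reporting only the best one."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }

# Hypothetical F1 scores from five random seeds of one model.
seed_scores = [90.1, 90.8, 89.7, 91.2, 90.2]
summary = summarize_runs(seed_scores)
print(summary)
```

Reporting the mean and spread alongside the max makes it easier to tell whether a gap between two systems is larger than the seed-to-seed noise.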

36 - Attention Is All You Need, with Ashish Vaswani and Jakob Uszkoreit
NIPS 2017 paper. We dig into the details of the Transformer, from the "attention is all you need" paper. Ashish and Jakob give us some motivation for replacing RNNs and CNNs with a more parallelizable self-attention mechanism, they describe how this mechanism works, and then we spend the bulk of the episode trying to get their intuitions for _why_ it works. We discuss the positional encoding mechanism, multi-headed attention, trying to use these ideas to replace encoders in other models, and what the self-attention actually learns. Turns out that the lower layers learn something like n-grams (similar to CNNs), and the higher layers learn more semantic-y things, like coreference. https://www.semanticscholar.org/paper/Attention-Is-All-You-Need-Vaswani-Shazeer/0737da0767d77606169cbf4187b83e1ab62f6077 Minor correction: Talking about complexity equations without the paper in front of you can be tricky, and Ashish and Jakob may have gotten some of the details slightly wrong when we were discussing computational complexity. The high-level point is that self-attention is cheaper than RNNs when the hidden dimension is higher than the sequence length. See the paper for more details.
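
The high-level complexity point in the correction can be checked with a little arithmetic. The sketch below uses the per-layer costs given in the paper, roughly n^2 * d for self-attention and n * d^2 for a recurrent layer (n is the sequence length, d the hidden dimension), ignoring constant factors. The function names are ours, chosen for illustration.

```python
def self_attention_cost(seq_len, hidden_dim):
    """Per-layer cost of self-attention, up to constants: n^2 * d."""
    return seq_len ** 2 * hidden_dim

def recurrent_cost(seq_len, hidden_dim):
    """Per-layer cost of a recurrent layer, up to constants: n * d^2."""
    return seq_len * hidden_dim ** 2

def cheaper_layer(seq_len, hidden_dim):
    """Return which layer type has the lower per-layer cost."""
    attention = self_attention_cost(seq_len, hidden_dim)
    recurrent = recurrent_cost(seq_len, hidden_dim)
    if attention < recurrent:
        return "self-attention"
    if recurrent < attention:
        return "recurrent"
    return "tie"

# A 50-token sentence with hidden size 512 favors self-attention;
# a 2000-token document with the same hidden size does not.
print(cheaper_layer(50, 512))
print(cheaper_layer(2000, 512))
```

Since n^2 * d < n * d^2 exactly when n < d, the comparison reduces to whether the sequence is shorter than the hidden dimension.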

35 - Replicability Analysis for Natural Language Processing, with Roi Reichart
TACL 2017 paper by Rotem Dror, Gili Baumer, Marina Bogomolov, and Roi Reichart. Roi comes on to talk to us about how to make better statistical comparisons between two methods when there are multiple datasets in the comparison. This paper shows that there are more powerful methods available than the occasionally-used Bonferroni correction, and using the better methods can let you make stronger, statistically-valid conclusions. We talk a bit also about how the assumptions you make about your data can affect the statistical tests that you perform, and briefly mention other issues in replicability / reproducibility, like training variance. https://www.semanticscholar.org/paper/Replicability-Analysis-for-Natural-Language-Proces-Dror-Baumer/fa5129ab6fd85f8ff590f9cc8a39139e9dfa8aa2
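
For readers who have not seen it, the Bonferroni correction mentioned above can be sketched in a few lines of Python: with m comparisons and significance level alpha, each p-value is tested against alpha / m. The p-values below are made up for illustration, and this shows only the baseline correction, not the more powerful methods from the paper.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return, per p-value, whether its null hypothesis is rejected under Bonferroni."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# Hypothetical p-values from comparing two systems on four datasets.
p_values = [0.001, 0.02, 0.04, 0.2]
decisions = bonferroni_reject(p_values)
print(decisions)
```

Here three of the four datasets would count as significant at 0.05 if tested alone, but only one survives the correction, which illustrates why a less conservative procedure can support stronger conclusions.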

34 - Translating Neuralese, with Jacob Andreas
ACL 2017 paper by Jacob Andreas, Anca D. Dragan, and Dan Klein. Jacob comes on to tell us about the paper. The paper focuses on multi-agent dialogue tasks, where two learning systems need to figure out a way to communicate with each other to solve some problem. These agents might be figuring out communication protocols that are very different from what humans would come up with in the same situation, and Jacob introduces some clever ways to figure out what the learned communication protocol looks like - you find human messages that induce the same beliefs in the listener as the robot messages. Jacob tells us about this work, and we conclude with a brief discussion of the more general issue of interpreting neural models. https://www.semanticscholar.org/paper/Translating-Neuralese-Andreas-Dragan/49612dc348ce953027bb4aba95adad0c703d76d1

33 - Entity Linking via Joint Encoding of Types, Descriptions, and Context, with Nitish Gupta
EMNLP 2017 paper by Nitish Gupta, Sameer Singh, and Dan Roth. Nitish comes on to talk to us about his paper, which presents a new entity linking model that both unifies prior sources of information into a single neural model, and trains that model in a domain-agnostic way, so it can be transferred to new domains without much performance degradation. https://www.semanticscholar.org/paper/Entity-Linking-via-Joint-Encoding-of-Types-Descrip-Gupta-Singh/a66b6a3ac0aa9af6c178c1d1a4a97fd14a882353

32 - The Effect of Different Writing Tasks on Linguistic Style, with Roy Schwartz
CoNLL 2017 paper, by Roy Schwartz, Maarten Sap, Ioannis Konstas, Leila Zilles, Yejin Choi, and Noah A. Smith. Roy comes on to talk to us about the paper. They analyzed the ROCStories corpus, which was created with three separate tasks on Mechanical Turk. They found that there were enough stylistic differences between the text generated from each task that they could get very good performance on the ROCStories cloze task just by looking at the style, ignoring the information you're supposed to use to solve the task. Roy talks to us about this finding, and about how hard it is to generate datasets that don't have some kind of flaw (hint: they all have problems). https://www.semanticscholar.org/paper/The-Effect-of-Different-Writing-Tasks-on-Linguisti-Schwartz-Sap/1a697d7cf187e51d5ccc23eb3ee5d2950ece5522

31 - Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
ICLR 2017 paper by Hakan Inan, Khashayar Khosravi, Richard Socher, presented by Waleed. The paper presents some tricks for training better language models. It introduces a modified loss function for language modeling, where producing a word that is similar to the target word is not penalized as much as producing a word that is very different to the target (I've seen this in other places, e.g., image classification, but not in language modeling). They also give theoretical and empirical justification for tying input and output embeddings. https://www.semanticscholar.org/paper/Tying-Word-Vectors-and-Word-Classifiers-A-Loss-Fra-Inan-Khosravi/424aef7340ee618132cc3314669400e23ad910ba
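
A minimal sketch of what tying input and output embeddings means, in plain Python with toy numbers: the same matrix E is used both to look up the input word vector and to score every vocabulary word against the hidden state. The names and values are assumptions for illustration only, and the sketch leaves out the paper's modified loss.

```python
def embed(word_id, embeddings):
    """Input side: look up the row of the embedding matrix for a word."""
    return embeddings[word_id]

def output_logits(hidden, embeddings):
    """Output side: score each word by the dot product of its row with the hidden state."""
    return [sum(e * h for e, h in zip(row, hidden)) for row in embeddings]

# Toy vocabulary of 3 words with 2-dimensional embeddings, shared by both sides.
E = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
hidden_state = [2.0, 3.0]
logits = output_logits(hidden_state, E)
print(embed(2, E), logits)
```

With an untied model there would be a second, separately learned matrix on the output side; tying removes those parameters.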

30 - Probabilistic Typology: Deep Generative Models of Vowel Inventories
Paper by Ryan Cotterell and Jason Eisner, presented by Matt. This paper won the best paper award at ACL 2017. It's also quite outside the typical focus areas that you see at NLP conferences, trying to build generative models of vowel vocabularies in languages. That means we give quite a bit of set up, to try to help someone not familiar with this area understand what's going on. That makes this episode quite a bit longer than a typical non-interview episode. https://www.semanticscholar.org/paper/Probabilistic-Typology-Deep-Generative-Models-of-V-Cotterell-Eisner/6fad97c4fe0cfb92478d8a17a4e6aaa8637d8222

29 - Neural machine translation via binary code prediction, with Graham Neubig
ACL 2017 paper, by Yusuke Oda and others (including Graham Neubig) at Nara Institute of Science and Technology (Graham is now at Carnegie Mellon University). Graham comes on to talk to us about neural machine translation generally, and about this ACL paper in particular. We spend the first half of the episode talking about major milestones in neural machine translation and why it is so much more effective than previous methods (spoiler: stronger language models help a lot). We then talk about the specifics of binary code prediction, how it's related to a hierarchical or class-factored softmax, and how to make it robust to off-by-one-bit errors. Paper link: https://www.semanticscholar.org/paper/Neural-Machine-Translation-via-Binary-Code-Predict-Oda-Arthur/bbedfd0380eb2e62f1c3b61aaf484d5867e6358d An example of the Language Log posts that we discussed: http://languagelog.ldc.upenn.edu/nll/?p=33613 (there are many more).
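
To give a feel for binary code prediction, the Python sketch below maps each word id to a fixed-length bit string, so the output layer only needs to predict about log2(V) bits instead of V softmax scores. The plain binary encoding of the word id is an assumption made for illustration; the paper's actual code assignment may differ.

```python
def code_length(vocab_size):
    """Number of bits needed to give every word id in the vocabulary a distinct code."""
    return max(1, (vocab_size - 1).bit_length())

def word_to_bits(word_id, num_bits):
    """Encode a word id as a list of bits, most significant bit first."""
    return [(word_id >> i) & 1 for i in reversed(range(num_bits))]

def bits_to_word(bits):
    """Decode a list of bits (most significant first) back into a word id."""
    word_id = 0
    for bit in bits:
        word_id = (word_id << 1) | bit
    return word_id

num_bits = code_length(65536)      # a 65,536-word vocabulary needs 16 bits
bits = word_to_bits(5, 4)          # [0, 1, 0, 1]
flipped = bits[:2] + [1 - bits[2]] + bits[3:]
print(num_bits, bits, bits_to_word(bits), bits_to_word(flipped))
```

Flipping a single predicted bit decodes to a completely different word id, which is why robustness to off-by-one-bit errors matters.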

28 - Data Programming: Creating Large Training Sets, Quickly
NIPS 2016 paper by Alexander Ratner and coauthors in Chris Ré's group at Stanford, presented by Waleed. The paper presents a method for generating labels for an unlabeled dataset by combining a number of weak labelers. This changes the annotation effort from looking at individual examples to constructing a large number of noisy labeling heuristics, a task the authors call "data programming". Then you learn a model that intelligently aggregates information from the weak labelers to create a weighted "supervised" training set. We talk about this method, how it works, how it's related to ideas like co-training, and when you might want to use it. https://www.semanticscholar.org/paper/Data-Programming-Creating-Large-Training-Sets-Quic-Ratner-Sa/37acbbbcfe9d8eb89e5b01da28dac6d44c3903ee
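
Here is a rough Python sketch of what writing noisy labeling heuristics looks like. The labeling functions, the spouse-relation task, and the example sentences are hypothetical. The aggregation is a plain unweighted majority vote, a deliberate simplification: the paper instead learns a model that weights the labelers.

```python
ABSTAIN = 0
POSITIVE = 1
NEGATIVE = -1

def lf_mentions_spouse(sentence):
    """Vote positive if the sentence contains the word 'wife' or 'husband'."""
    words = sentence.lower().split()
    return POSITIVE if "wife" in words or "husband" in words else ABSTAIN

def lf_mentions_married(sentence):
    """Vote positive if the sentence contains 'married'."""
    return POSITIVE if "married" in sentence.lower() else ABSTAIN

def lf_mentions_sibling(sentence):
    """Vote negative if the sentence contains the word 'brother' or 'sister'."""
    words = sentence.lower().split()
    return NEGATIVE if "brother" in words or "sister" in words else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_spouse, lf_mentions_married, lf_mentions_sibling]

def majority_vote(sentence, labeling_functions=LABELING_FUNCTIONS):
    """Combine the labelers' votes; return ABSTAIN on a tie or when all abstain."""
    total = sum(lf(sentence) for lf in labeling_functions)
    if total > 0:
        return POSITIVE
    if total < 0:
        return NEGATIVE
    return ABSTAIN

for text in ["Ann married her husband Bob in 2010",
             "Ann and her brother Bob went hiking",
             "Ann met Bob"]:
    print(text, "->", majority_vote(text))
```

Each heuristic is cheap to write and individually unreliable; the value comes from combining many of them over a large unlabeled corpus.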

27 - What do Neural Machine Translation Models Learn about Morphology?, with Yonatan Belinkov
ACL 2017 paper by Yonatan Belinkov and others at MIT and QCRI. Yonatan comes on to tell us about their work. They trained a neural MT system, then learned models on top of the NMT representation layers to do morphology tasks, trying to probe how much morphological information is encoded by the MT system. We talk about the specifics of their model and experiments, insights they got from doing these experiments, and how this work relates to other work on representation learning in NLP. https://www.semanticscholar.org/paper/What-do-Neural-Machine-Translation-Models-Learn-ab-Belinkov-Durrani/37ac87ccea1cc9c78a0921693dd3321246e5ef07

26 - Structured Attention Networks, with Yoon Kim
ICLR 2017 paper, by Yoon Kim, Carl Denton, Luong Hoang, and Sasha Rush. Yoon comes on to talk with us about his paper. The paper shows how standard attentions can be seen as an expected feature count computation, and can be generalized to other kinds of expected feature counts, as long as we have efficient, differentiable algorithms for computing those marginals, like the forward-backward and inside-outside algorithms. We talk with Yoon about how this works, the experiments they ran to test this idea, and interesting implications of their work. https://www.semanticscholar.org/paper/Structured-Attention-Networks-Kim-Denton/0aec1745d0e054e8d86d21b20d0ee5fc0d932a49 Yoon also brought up a more recent paper by Yang Liu and Mirella Lapata that computes a very similar kind of structured attention, but does so much more efficiently. That paper is here: https://www.semanticscholar.org/paper/Learning-Structured-Text-Representations-Liu-Lapata/4435c3586364e8f8a2c8c9ee671c39d7df7e196c.
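
The view of standard attention as an expectation can be sketched in plain Python: the attention weights form a probability distribution over positions, and the context vector is the expected value of the input vectors under that distribution. The function names and toy numbers are ours; the structured variants in the paper replace this simple softmax with marginals from algorithms like forward-backward, which this sketch does not implement.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution over positions."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(scores, values):
    """Context vector as the expectation of the value vectors under softmax(scores)."""
    probs = softmax(scores)
    dim = len(values[0])
    return [sum(p * v[j] for p, v in zip(probs, values)) for j in range(dim)]

# With equal scores, the distribution is uniform and the context is the plain average.
context = attention_context([0.0, 0.0], [[1.0, 3.0], [3.0, 5.0]])
print(context)
```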

25 - Neural Semantic Parsing over Multiple Knowledge-bases
ACL 2017 short paper, by Jonathan Herzig and Jonathan Berant. This is a nice, obvious-in-hindsight paper that applies a frustratingly-easy-domain-adaptation-like approach to semantic parsing, similar to the multi-task semantic dependency parsing approach we talked to Noah Smith about recently. Because there is limited training data available for complex logical constructs (like argmax, or comparatives), but the mapping from language onto these constructions is typically constant across domains, domain adaptation can give a nice, though somewhat small, boost in performance. NB: I felt like I struggled a bit with describing this clearly. Not my best episode. Hopefully it's still useful. https://www.semanticscholar.org/paper/Neural-Semantic-Parsing-over-Multiple-Knowledge-ba-Herzig-Berant/6611cf821f589111adfc0a6fbb426fa726f4a9af

24 - Improving Hypernymy Detection with an Integrated Path-based and Distributional Method
ACL 2016 outstanding paper, by Vered Shwartz, Yoav Goldberg and Ido Dagan. Waleed presents this paper, discussing hypernymy detection and the methods used in the paper. It's pretty similar to work in relation extraction and knowledge base completion, so we also talk a bit about connections to other methods we're familiar with. Encoding paths using an RNN like they do (and like Arvind Neelakantan did for KBC) improves recall substantially, at the cost of some precision, which makes intuitive sense. https://www.semanticscholar.org/paper/Improving-Hypernymy-Detection-with-an-Integrated-P-Shwartz-Goldberg/05d28e891fd70d123c46ceeb0cdfc0a2cb0d88db

23 - Get To The Point: Summarization with Pointer-Generator Networks
ACL 2017 paper by Abigail See, Peter Liu, and Chris Manning. Matt presents the paper, describing the task (summarization on CNN/Daily Mail), the model (the standard copy + generate model that people are using these days, plus a nice coverage loss term), and the results (can't beat the extractive baseline, but coming close). It's a nice paper - very well written, interesting discussion section. https://www.semanticscholar.org/paper/Get-To-The-Point-Summarization-with-Pointer-Genera-See-Liu/13db673d09f546698e0bfb6687beeb5345f81ad9 Abigail also has a very nice blog post where she describes her work in a less formal tone than the paper: http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization.html

22 - Deep Multitask Learning for Semantic Dependency Parsing, with Noah Smith
An interview with Noah Smith. Noah tells us about his work with his students Hao Peng and Sam Thomson. We talk about what semantic dependency parsing is, the model that they used to approach the problem, how multi-task learning fits into this with a graph-based parser, and end with a little discussion about representation learning. https://www.semanticscholar.org/paper/Deep-Multitask-Learning-for-Semantic-Dependency-Pa-Peng-Thomson/406fd41b360bb02c0aaabff54055193fb5d9d7f1

21 - Contextual Explanation Networks, with Maruan Al-Shedivat
https://arxiv.org/abs/1705.10301 Maruan, Avinava Dubey and Eric Xing essentially put the post-hoc decision boundary explanations from the "Why Should I Trust You?" paper* as a core component of a predictive model. Maruan comes on to tell us about it. * https://www.semanticscholar.org/paper/Why-Should-I-Trust-You-Explaining-the-Predictions-Ribeiro-Singh/5636dca44384240ce9aff2b10b78458cd3c2f450

20 - A simple neural network module for relational reasoning
The recently-hyped paper that got "superhuman" performance on FAIR's CLEVR dataset. https://arxiv.org/abs/1706.01427

19 - End-to-end Differentiable Proving, with Tim Rocktäschel
An interview with Tim Rocktäschel. https://arxiv.org/abs/1705.11040

18 - Generalizing to Unseen Entities and Entity Pairs with Row-less Universal Schema
https://www.semanticscholar.org/paper/Generalizing-to-Unseen-Entities-and-Entity-Pairs-w-Verga-Neelakantan/7dd8b958632b07e41979337c71d847a3f39df456

17 - pix2code: Generating Code from a Graphical User Interface Screenshot
https://arxiv.org/abs/1705.07962

16 - Arc-swift: A Novel Transition System for Dependency Parsing
https://www.semanticscholar.org/paper/Arc-swift-A-Novel-Transition-System-for-Dependency-Qi-Manning/56fc1372a41a46f777ac77859219bb4b76bfd098

15 - Attention and Augmented Recurrent Neural Networks
http://distill.pub/2016/augmented-rnns/

14 - Discourse-Based Objectives for Fast Unsupervised Sentence Representation Learning
https://arxiv.org/abs/1705.00557

13 - Question Answering from Unstructured Text by Retrieval and Comprehension
https://www.semanticscholar.org/paper/Question-Answering-from-Unstructured-Text-by-Retri-Watanabe-Dhingra/89d06c2996f379c602e64b4243f026cd164400d7

12 - Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
https://arxiv.org/abs/1705.02364

11 - Relation Extraction with Matrix Factorization and Universal Schemas
https://www.semanticscholar.org/paper/Relation-Extraction-with-Matrix-Factorization-and-Riedel-Yao/52b5eab895a2d9ae24ea72ca72782974a52f90c4

10 - A Syntactic Neural Model for General-Purpose Code Generation
https://www.semanticscholar.org/paper/A-Syntactic-Neural-Model-for-General-Purpose-Code-Yin-Neubig/c8d0e13de2eaa09a928eff36b99d63f494c2f5ec

09 - Learning to Generate Reviews and Discovering Sentiment
https://www.semanticscholar.org/paper/Learning-to-Generate-Reviews-and-Discovering-Senti-Radford-Jozefowicz/664ec878de4b7170712baae4a7821fc2602bba25 https://blog.openai.com/unsupervised-sentiment-neuron/

08 - Finding News Citations for Wikipedia
https://www.semanticscholar.org/paper/Finding-News-Citations-for-Wikipedia-Fetahu-Markert/526acf565190d843758b89d37acf281639cb90e2

07 - Capturing Semantic Similarity for Entity Linking with Convolutional Neural Networks
https://www.semanticscholar.org/paper/Capturing-Semantic-Similarity-for-Entity-Linking-w-Francis-Landau-Durrett/1c9aca60f7ac5edcceb73d612806704a7d662643

06 - Design Challenges for Entity Linking
https://www.semanticscholar.org/paper/Design-Challenges-for-Entity-Linking-Ling-Singh/aa2a7ac7bfa9a0201d4faddd4e7bb26299a5e0be

05 - Transition-Based Dependency Parsing with Stack Long Short-Term Memory
https://www.semanticscholar.org/paper/Transition-Based-Dependency-Parsing-with-Stack-Lon-Dyer-Ballesteros/396b7932beac62a72288eaea047981cc9a21379a

04 - Recurrent Neural Network Grammars, with Chris Dyer
An interview with Chris Dyer. https://www.semanticscholar.org/paper/Recurrent-Neural-Network-Grammars-Dyer-Kuncoro/1594d954abc650bce2db445c52a76e49655efb0c

03 - FastQA: A Simple and Efficient Neural Architecture for Question Answering
https://www.semanticscholar.org/paper/FastQA-A-Simple-and-Efficient-Neural-Architecture-Weissenborn-Wiese/7c1576b96a1e246d77b30f7b80cec63be96fa698

02 - Bidirectional Attention Flow for Machine Comprehension
https://www.semanticscholar.org/paper/Bidirectional-Attention-Flow-for-Machine-Seo-Kembhavi/007ab5528b3bd310a80d553cccad4b78dc496b02

01 - A Comparative Study of Word Embeddings for Reading Comprehension
https://www.semanticscholar.org/paper/A-Comparative-Study-of-Word-Embeddings-for-Reading-Dhingra-Liu/3ec37205c9201fc891ab51da200e361fdc34bfb3

00 - Intro to the podcast
In this episode we briefly say what we're up to with the podcast. No technical content, just a description of what each episode will look like, and why we're doing this.