PLAY PODCASTS
NLP Highlights

NLP Highlights

145 episodes — Page 2 of 3

94 - Decompositional Semantics, with Aaron White

In this episode, Aaron White tells us about the decompositional semantics initiative (Decomp), an attempt to re-think the prototypical approach to semantic representation and annotation. The basic idea is to decompose complex semantic classes such as ‘agent’ and ‘patient’ into simpler semantic properties such as ‘causation’ and ‘volition’, while embracing the uncertainty inherent in language by allowing annotators to choose answers such as ‘probably’ or ‘probably not’. In order to scale the collection of labeled data, each property is annotated by asking crowd workers intuitive questions about phrases in a given sentence. Aaron White's homepage: http://aaronstevenwhite.io/ Decomp initiative page: http://decomp.io/

Sep 30, 201927 min

93 - NLP/ML for clinical data, with Alistair Johnson

In this episode, we invite Alistair Johnson to discuss the main challenge in applying NLP/ML to clinical domains: the lack of data. We discuss privacy concerns, de-identification, synthesizing records, legal liabilities and data heterogeneity. We also discuss how the MIMIC dataset evolved over the years, how it is being used, and some of the under-explored ways in which it can be used. Alistair’s homepage: http://alistairewj.github.io/ MIMIC dataset: https://mimic.physionet.org/

Jul 22, 201937 min

92 - Computational Humanities, with David Bamman

In this episode, we invite David Bamman to give an overview of computational humanities. We discuss examples of questions studied in computational humanities (e.g., characterizing fictionality, assessing novelty, measuring the attention given to male vs. female characters in the literature). We talk about the role NLP plays in addressing these questions and how the accuracy and biases of NLP models can influence the results. We also discuss understudied NLP tasks which can help us answer more questions in this domain such as literary scene coreference resolution and constructing a map of literature geography. David Bamman's homepage: http://people.ischool.berkeley.edu/~dbamman/ LitBank dataset: https://github.com/dbamman/litbank

Jul 5, 201933 min

91 - (Executable) Semantic Parsing, with Jonathan Berant

In this episode, we invite Jonathan Berant to talk about executable semantic parsing. We discuss what executable semantic parsing is and how it differs from related tasks such as semantic dependency parsing and abstract meaning representation (AMR) parsing. We talk about the main components of a semantic parser, how the formal language affects design choices in the parser, and end with a discussion of some exciting open problems in this space. Jonathan Berant's homepage: http://www.cs.tau.ac.il/~joberant/

Jun 26, 201942 min

90 - Research in Academia versus Industry, with Philip Resnik and Jason Baldridge

How is it like to do research in academia vs. industry? In this episode, we invite Jason Baldridge (UT Austin => Google) and Philip Resnik (Sun Microsystems => UMD) to discuss some of the aspects one may want to consider when planning their research careers, including flexibility, security and intellectual freedom. Perhaps most importantly, we discuss how the career choices we make influence and are influenced by the relationships we forge. Check out the Careers in NLP Panel at NAACL'19 on Monday, June 3, 2019 for further discussion. Careers in NLP panel @ NAACL'19: https://naacl2019.org/blog/careers-panel-survey/ Jason Baldridge's homepage: http://www.jasonbaldridge.com/ Philip Resnik's homepage: http://users.umiacs.umd.edu/~resnik/

May 31, 201954 min

89 - Dialog Systems, with Zhou Yu

In this episode, we invite Zhou Yu to give an overview of dialogue systems. We discuss different types of dialogue systems (task-oriented vs. non-task-oriented), the main building blocks and how they relate to other research areas in NLP, how to transfer models across domains, and the different ways used to evaluate these systems. Zhou also shares her thoughts on exciting future directions such as developing dialogue methods for non-cooperative environments (e.g., to negotiate prices) and multimodal dialogue systems (e.g., using video as well as audio/text). Zhou Yu's homepage: http://zhouyu.cs.ucdavis.edu/

May 31, 201937 min

88 - A Structural Probe for Finding Syntax in Word Representations, with John Hewitt

In this episode, we invite John Hewitt to discuss his take on how to probe word embeddings for syntactic information. The basic idea is to project word embeddings to a vector space where the L2 distance between a pair of words in a sentence approximates the number of hops between them in the dependency tree. The proposed method shows that ELMo and BERT representations, trained with no syntactic supervision, embed many of the unlabeled, undirected dependency attachments between words in the same sentence. Paper: https://nlp.stanford.edu/pubs/hewitt2019structural.pdf GitHub repository: https://github.com/john-hewitt/structural-probes Blog post: https://nlp.stanford.edu/~johnhew/structural-probe.html Twitter thread: https://twitter.com/johnhewtt/status/1114252302141886464 John's homepage: https://nlp.stanford.edu/~johnhew/

May 7, 201940 min

87 - Pathologies of Neural Models Make Interpretation Difficult, with Shi Feng

In this episode, Shi Feng joins us to discuss his recent work on identifying pathological behaviors of neural models for NLP tasks. Shi uses input word gradients to identify the least important word for a model's prediction, and iteratively removes that word until the model prediction changes. The reduced inputs tend to be significantly smaller than the original inputs, e.g., 2.3 words instead of 11.5 in the original in SQuAD, on average. We discuss possible interpretations of these results, and a proposed method for mitigating these pathologies. Shi Feng's homepage: http://users.umiacs.umd.edu/~shifeng/ Paper: https://www.semanticscholar.org/paper/Pathologies-of-Neural-Models-Make-Interpretation-Feng-Wallace/8e141b5cb01c88b315c9a94dc97e50738cc7370d Joint work with Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez and Jordan Boyd-Graber

Apr 25, 201933 min

86 - NLP for Evidence-based Medicine, with Byron Wallace

In this episode, Byron Wallace tells us about interdisciplinary work between evidence based medicine and natural language processing. We discuss extracting PICO frames from articles describing clinical trials and data available for direct and weak supervision. We also discuss automating the assessment of risks of bias in, e.g., random sequence generation, allocation containment and outcome assessment, which have been used to help domain experts who need to review hundreds of articles. Byron Wallace's homepage: http://www.byronwallace.com/ EBM-NLP dataset: https://ebm-nlp.herokuapp.com/ MIMIC dataset: https://mimic.physionet.org/ Cochrane database of systematic reviews: https://www.cochranelibrary.com/cdsr/about-cdsr The bioNLP workshop at ACL'19 (submission due date was extended to May 10): https://aclweb.org/aclwiki/BioNLP_Workshop The workshop on health text mining and information analysis at EMNLP'19: https://louhi2019.fbk.eu/ Machine learning for healthcare conference: https://www.mlforhc.org/

Apr 15, 201932 min

85 - Stress in Research, with Charles Sutton

In this episode, Charles Sutton walks us through common sources of stress for researchers and suggests coping strategies to maintain your sanity. We talk about how pursuing a research career is similar to participating in a life-long international tournament, conflating research worth and self-worth, and how freedom can be both a blessing and a curse, among other stressors one may encounter in a research career. Charles Sutton's homepage: https://homepages.inf.ed.ac.uk/csutton/ A series of blog posts Charles wrote on this topic: http://www.theexclusive.org/tag/stress%20in%20research/

Mar 29, 201936 min

84 - Large Teams Develop, Small Groups Disrupt, with Lingfei Wu

In a recent Nature paper, Lingfei Wu (Ling) suggests that smaller teams of scientists tend to do more disruptive work. In this episode, we invite Ling to discuss their results, how they define disruption and possible reasons why smaller teams may be better positioned to do disruptive work. We also touch on robustness of the disruption metric, differences between research disciplines, and sleeping beauties in science. Lingfei Wu’s homepage: https://www.knowledgelab.org/people/detail/lingfei_wu/ Paper: https://www.nature.com/articles/s41586-019-0941-9 Note: Lingfei is on the job market for faculty positions at the intersection of social science, computer science and communication.

Mar 26, 201938 min

83 - Knowledge Base Construction, with Sebastian Riedel

In this episode, we invite Sebastian Riedel to talk about knowledge base construction (KBC). Why is it an important research area? What are the tradeoffs between using an open vs. closed schema? What are popular methods currently used, and what challenges prevent the adoption of KBC methods? We also briefly discuss the AKBC workshop and its graduation into a conference in 2019. Sebastian Riedel's homepage: http://www.riedelcastro.org/ AKBC conference: http://www.akbc.ws/2019/

Mar 13, 201938 min

82 - Visual Reasoning, with Yoav Artzi

In this episode, Yoav Artzi joins us to talk about visual reasoning. We start by defining what visual reasoning is, then discuss the pros and cons of different tasks and datasets. We discuss some of the models used for visual reasoning and how they perform, before ending with open questions in this young, exciting research area. Yoav Artzi: https://yoavartzi.com/ NLVR: https://github.com/clic-lab/nlvr/tree/master/nlvr NLVR2: https://github.com/clic-lab/nlvr/tree/master/nlvr2 CLEVR dataset: https://cs.stanford.edu/people/jcjohns/clevr/ VQA: https://visualqa.org/ GQA: https://cs.stanford.edu/people/dorarad/gqa/index.html Neural module networks: https://arxiv.org/abs/1511.02799

Mar 6, 201942 min

81 - BlackboxNLP, with Afra Alishahi and Tal Linzen

Neural models recently resulted in large performance improvements in various NLP problems, but our understanding of what and how the models learn remains fairly limited. In this episode, Tal Linzen and Afra Alishahi talk to us about BlackboxNLP, an EMNLP’18 workshop dedicated to the analysis and interpretation of neural networks for NLP. In the workshop, computer scientists and cognitive scientists joined forces to probe and analyze neural NLP models. BlackboxNLP 2018 website: https://blackboxnlp.github.io/2018/ BlackboxNLP 2018 proceedings: https://aclanthology.info/events/ws-2018#W18-54 BlackboxNLP 2019 website: https://blackboxnlp.github.io/

Feb 6, 201931 min

80 - Leaderboards and Science, with Siva Reddy

Originally used to entice fierce competitions in arcade games, leaderboards recently made their way into NLP research circles. Leaderboards could help mitigate some of the problems in how researchers run experiments and share results (e.g., accidentally overfitting models on a test set), but they also introduce new problems (e.g., breaking author anonymity in peer reviewing). In this episode, Siva Reddy joins us to talk about the good, the bad, and the ugly of using leaderboards in science. We also discuss potential solutions to address some of the outstanding problems with existing leaderboard frameworks. Software platforms for leaderboards: http://codalab.org/ https://leaderboard.allenai.org/

Jan 29, 201929 min

79 - The glass ceiling in NLP, with Natalie Schluter

In this episode, Natalie Schluter talks to us about a data-driven analysis of career progression of male vs. female researchers in NLP through the lens of mentor-mentee networks based on ~20K papers in the ACL anthology. Directed edges in the network describe a mentorship relation from the last author on a paper to the last author, and author names were annotated for gender when possible. Interesting observations include the increase of percentage of mentors (regardless of gender), and an increasing gap between the fraction of mentors who are males and females since the early 2000s. By analyzing the number of years between a researcher’s first publication and the year at which they achieve mentorship status at threshold T, defined by publishing T or more papers as a last author, Natalie also found that female researchers tend to take much longer to be mentors. Another interesting finding is that in-gender mentorship is a strong predictor of the mentee’s success in becoming mentors themselves. Finally, Natalie describes the bias preferential attachment model of Avin et al. (2015) and applies it to the gender-annotated mentor-mentee network in NLP, formally describing a glass ceiling in NLP for female researchers. https://www.semanticscholar.org/paper/The-glass-ceiling-in-NLP-Schluter/abfb1eb2d27194269503afce8be45909c8f86f4b See also: Homophily and the glass ceiling effect in social networks, at ITCS 2015, by Chen Avin, Barbara Keller, Zvi Lotker, Claire Mathieu, David Peleg, and Yvonne-Anne Pignolet. https://www.semanticscholar.org/paper/Homophily-and-the-Glass-Ceiling-Effect-in-Social-Avin-Keller/23dcb12dd918fcf29f7abb287dd466478031b8ff Apologies for the relatively poor audio quality on this one; we did our best.

Jan 21, 201926 min

78. Where do corpora come from?, with Matt Honnibal and Ines Montani

Most NLP projects rely crucially on the quality of annotations used for training and evaluating models. In this episode, Matt and Ines of Explosion AI tell us how Prodigy can improve data annotation and model development workflows. Prodigy is an annotation tool implemented as a python library, and it comes with a web application and a command line interface. A developer can define input data streams and design simple annotation interfaces. Prodigy can help break down complex annotation decisions into a series of binary decisions, and it provides easy integration with spaCy models. Developers can specify how models should be modified as new annotations come in in an active learning framework. Prodigy: https://prodi.gy Prodigy recipe scripts: https://github.com/explosion/prodigy-recipes Twitter: https://twitter.com/_inesmontani https://twitter.com/honnibal

Jan 15, 201930 min

77. On Writing Quality Peer Reviews, with Noah A. Smith

It's not uncommon for authors to be frustrated with the quality of peer reviews they receive in (NLP) conferences. In this episode, Noah A. Smith shares his advice on how to write good peer reviews. The structure Noah recommends for writing a peer review starts with a dispassionate summary of what a paper has to offer, followed by the strongest reasons the paper may be accepted, followed by the strongest reasons it may be rejected, and concludes with a list of minor, easy-to-fix problems (e.g., typos) which can be easily addressed in the camera ready. Noah stresses on the importance of thinking about how the reviews we write could demoralize (junior) researchers, and how to be precise and detailed when discussing the weaknesses of a paper to help the authors see the path forward. Other questions we discuss in this episode include: How to read a paper for reviewing purposes? How long it takes to review a paper and how many papers to review? What types of mistakes to be on the lookout for while reviewing? How to review pre-published work?

Jan 7, 201938 min

76 - Increasing In-Class Similarity by Retrofitting Embeddings with Demographics, with Dirk Hovy

EMNLP 2018 paper by Dirk Hovy and Tommaso Fornaciari. https://www.semanticscholar.org/paper/Improving-Author-Attribute-Prediction-by-Linguistic-Hovy-Fornaciari/71aad8919c864f73108aafd8e926d44e9df51615 In this episode, Dirk Hovy talks about natural language as social phenomenon which can provide insights about those who generate it. For example, this paper uses retrofitted embeddings to improve on two tasks: predicting the gender and age group of a person based on their online reviews. In this approach, authors embeddings are first generated using Doc2Vec, then retrofitted such that authors with similar attributes are closer in the vector space. In order to estimate the retrofitted vectors for authors with unknown attributes, a linear transformation is learned which maps Doc2Vec vectors to the retrofitted vectors. Dirk also used a similar approach to encode geographic information to model regional linguistic variations, in another EMNLP 2018 paper with Christoph Purschke titled “Capturing Regional Variation with Distributed Place Representations and Geographic Retrofitting” [link: https://www.semanticscholar.org/paper/Capturing-Regional-Variation-with-Distributed-Place-Hovy-Purschke/6d9babd835d0cdaaf175f098bb4fd61fd75b1be0].

Nov 27, 201829 min

75 - Reinforcement / Imitation Learning in NLP, with Hal Daumé III

In this episode, we invite Hal Daumé to continue the discussion on reinforcement learning, focusing on how it has been used in NLP. We discuss how to reduce NLP problems into the reinforcement learning framework, and circumstances where it may or may not be useful. We discuss imitation learning, roll-in and roll-out, and how to approximate an expert with a reference policy. DAgger: https://www.semanticscholar.org/paper/A-Reduction-of-Imitation-Learning-and-Structured-to-Ross-Gordon/17eddf33b513ae1134abadab728bdbf6abab2a05?navId=citing-papers RESLOPE: http://legacydirs.umiacs.umd.edu/~hal/docs/daume18reslope.pdf

Nov 21, 201843 min

74 - Deep Reinforcement Learning Doesn't Work Yet, with Alex Irpan

Blog post by Alex Irpan titled "Deep Reinforcement Learning Doesn't Work Yet" https://www.alexirpan.com/2018/02/14/rl-hard.html In this episode, Alex Irpan talks about limitations of current deep reinforcement learning methods and why we have a long way to go before they go mainstream. We discuss sample inefficiency, instability, the difficulty to design reward functions and overfitting to the environment. Alex concludes with a list of recommendations he found useful when training models with deep reinforcement learning.

Nov 16, 201840 min

73 - Supersense Disambiguation of English Prepositions and Possessives, with Nathan Schneider

ACL 2018 paper by Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Jakob Prange, Austin Blodgett, Sarah R. Moeller, Aviram Stern, Adi Bitan, Omri Abend. In this episode, Nathan discusses how the meaning of prepositions varies, proposes a hierarchy for classifying the semantics of function words (e.g., comparison, temporal, purpose), and describes empirical results using the provided dataset for disambiguating preposition semantics. Along the way, we talk about lexicon-based semantics, multilinguality and pragmatics. https://www.semanticscholar.org/paper/Comprehensive-Supersense-Disambiguation-of-English-Schneider-Hwang/8310213af102913b9e74e7dfe6864f3aa62a5a5e

Nov 13, 201852 min

72 - The Anatomy Question Answering Task, with Jordan Boyd-Graber

Our first episode in a new format: broader surveys of areas, instead of specific discussions on individual papers. In this episode, we talk with Jordan Boyd-Graber about question answering. Matt starts the discussion by giving five different axes on which question answering tasks vary: (1)how complex is the language in the question, (2)what is the genre of the question / nature of the question semantics, (3)what is the context or knowledge source used to answer the question, (4)how much "reasoning" is required to answer the question, and (5) what's the format of the answer? We talk about each of these in detail, giving examples from Jordan's and others' work. In the end, we conclude that "question answering" is a format to study a particular phenomenon, it is not a "phenomenon" in itself. Sometimes it's useful to pose a phenomenon you want to study as a question answering task, and sometimes it's not. During the conversation, Jordan mentioned the QANTA competition; you can find that here: http://qanta.org. We also talked about an adversarial question creation task for Quiz Bowl questions; the paper on that can be found here: https://www.semanticscholar.org/paper/Trick-Me-If-You-Can%3A-Adversarial-Writing-of-Trivia-Wallace-Boyd-Graber/11caf090fef96605d6d67c7505572b1a26796971.

Oct 16, 201843 min

71 - DuoRC: Complex Language Understanding with Paraphrased Reading Comprehension, with Amrita Saha

ACL 2018 paper by Amrita Saha, Rahul Aralikatte, Mitesh M. Khapra, Karthik Sankaranarayanan Amrita and colleagues at IBM Research introduced a harder dataset for "reading comprehension", where you have to answer questions about a given passage of text. Amrita joins us on the podcast to talk about why a new dataset is necessary, what makes this one unique and interesting, and how well initial baseline systems perform on it. Along the way, we talk about the problems with using BLEU or ROUGE as evaluation metrics for question answering systems. https://www.semanticscholar.org/paper/DuoRC%3A-Towards-Complex-Language-Understanding-with-Saha-Aralikatte/1e70a4830840d48486ecfbc6c89b774cdd0b6399

Oct 12, 201833 min

70 - Measuring the Evolution of a Scientific Field through Citation Frames, with David Jurgens

TACL 2018 paper (presented at ACL 2018) by David Jurgens, Srijan Kumar, Raine Hoover, Daniel A. McFarland, and Daniel Jurafsky David comes on the podcast to talk to us about citation frames. We discuss the dataset they created by painstakingly annotating the "citation type" for all of the citations in a large collection of papers (around 2000 citations in total), then training a classifier on that data to annotate the rest of the ACL anthology. This process itself is interesting, including how exactly the citations are classified, and we talk about this for a bit. The second half of the podcast talks about the analysis that David and colleagues did using the (automatically) annotated ACL anthology, trying to gauge how the field has changed over time. https://www.semanticscholar.org/paper/Measuring-the-Evolution-of-a-Scientific-Field-Jurgens-Kumar/65118f3a7463f54bdf9b9e5cdd655953a2488c2f

Sep 18, 201840 min

69 - Second language acquisition modeling, with Burr Settles

A shared task held in conjunction with a NAACL 2018 workshop, organized by Burr Settles and collaborators at Duolingo. Burr tells us about the shared task. The goal of the task was to predict errors that a language learner would make when doing exercises on Duolingo. We talk about the details of the data, why this particular data is interesting to study for second language acquisition, what could be better about it, and what systems people used to approach this task. We also talk a bit about what you could do with a system that can predict these kinds of errors to build better language learning systems. https://www.semanticscholar.org/paper/Second-Language-Acquisition-Modeling-Settles-Brust/10685728fab1dfe9d1cf0cd4240ed687dd601ac6

Sep 10, 201834 min

68 - Neural models of factuality, with Rachel Rudinger

NAACL 2018 paper, by Rachel Rudinger, Aaron Steven White, and Benjamin Van Durme Rachel comes on to the podcast, telling us about what factuality is (did an event happen?), what datasets exist for doing this task (a few; they made a new, bigger one), and how to build models to predict factuality (turns out a vanilla biLSTM does quite well). Along the way, we have interesting discussions about how you decide what an "event" is, how you label factuality (whether something happened) on inherently uncertain text (like "I probably failed the test"), and how you might use a system that predicts factuality in some end task. https://www.semanticscholar.org/paper/Neural-models-of-factuality-Rudinger-White/4d62a1e7819f9e3f8c837832c66659db5a6d9b37

Sep 4, 201836 min

67 - GLUE: A Multi-Task Benchmark and Analysis Platform, with Sam Bowman

Paper by Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. Sam comes on to tell us about GLUE. We talk about the motivation behind setting up a benchmark framework for natural language understanding, how the authors defined "NLU" and chose the tasks for this benchmark, a very nice diagnostic dataset that was constructed for GLUE, and what insight they gained from the experiments they've run so far. We also have some musings about the utility of general-purpose sentence vectors, and about leaderboards. https://www.semanticscholar.org/paper/GLUE%3A-A-Multi-Task-Benchmark-and-Analysis-Platform-Wang-Singh/a2054eff8b4efe0f1f53d88c08446f9492ae07c1

Aug 27, 201839 min

66 - Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods, with Jieyu Zhao

NACL 2018 paper, by Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Jieyu comes on the podcast to talk about bias in coreference resolution models. This bias makes models rely disproportionately on gender when making decisions for whether "she" refers to a noun like "secretary" or "physician". Jieyu and her co-authors show that coreference systems do not actually exhibit much bias in standard evaluation settings (OntoNotes), perhaps because there is a broad document context to aid in making coreference decisions. But they then construct a really nice diagnostic dataset that isolates simple coreference decisions, and evaluates whether the model is using common sense, grammar, or gender bias to make those decisions. This dataset shows that current models are quite biased, particularly when it comes to common sense, using gender to make incorrect coreference decisions. Jieyu then tells us about some simple methods to correct the bias without much of a drop in overall accuracy. https://www.semanticscholar.org/paper/Gender-Bias-in-Coreference-Resolution%3A-Evaluation-Zhao-Wang/e4a31322ed60479a6ae05d1f2580dd0fa2d77e50 Also, there was a very similar paper also published at NAACL 2018 that used similar methodology and constructed a similar dataset: https://www.semanticscholar.org/paper/Gender-Bias-in-Coreference-Resolution-Rudinger-Naradowsky/be2c8b5ec0eee2f32da950db1b6cf8cc4a621f8f.

Aug 20, 201826 min

65 - Event Representations with Tensor-based Compositions, with Niranjan Balasubramanian

AAAI 2018 paper by Noah Weber, Niranjan Balasubramanian, and Nathanael Chambers Niranjan joins us on the podcast to tell us about his latest contribution in a line of work going back to Shank's scripts. This work tries to model sequences of events to get coherent narrative schemas, mined from large collections of text. For example, given an event like "She threw a football", you might expect future events involving catching, running, scoring, and so on. But if the event is instead "She threw a bomb", you would expect future events to involve things like explosions, damage, arrests, or other related things. We spend much of our conversation talking about why these scripts are interesting to study, and the general outline for how one might learn these scripts from text, and spend a little bit of time talking about the particular contribution of this paper, which is a better model that captures interactions among all of the arguments to an event. https://www.semanticscholar.org/paper/Event-Representations-With-Tensor-Based-Weber-Balasubramanian/418f405a60b8d9009099777f7ae37f4496542f90

Aug 13, 201838 min

64 - Neural Network Models for Sentence Pair Tasks, with Wuwei Lan and Wei Xu

Best reproduction paper at COLING 2018, by Wuwei Lan and Wei Xu. This paper takes a bunch of models for sentence pair classification (including paraphrase identification, semantic textual similarity, natural language inference / entailment, and answer sentence selection for QA) and compares all of them on all tasks. There's a very nice table in the paper showing the cross product of models and datasets, and how by looking at the original papers this table is almost empty; Wuwei and Wei fill in all of the missing values in that table with their own experiments. This is a very nice piece of work that helps us gain a broader understanding of how these models perform in diverse settings, and it's awesome that COLING explicitly asked for and rewarded this kind of paper, as it's not your typical "come look at my shiny new model!" paper. Our discussion with Wuwei and Wei covers what models and datasets the paper looked at, why the datasets can be treated similarly (and some reasons for why maybe they should be treated differently), the differences between the models that were tested, and the difficulties of reproducing someone else's model. https://www.semanticscholar.org/paper/Neural-Network-Models-for-Paraphrase-Semantic-and-Lan-Xu/6c990c162816bff2133a8e0ed9719bd0f87ae9d9

Aug 8, 201836 min

63 - Neural Lattice Language Models, with Jacob Buckman

TACL 2018 paper by Jacob Buckman and Graham Neubig. Jacob tells us about marginalizing over latent structure in a sentence by doing a clever parameterization of a lattice with a model kind of like a tree LSTM. This lets you treat collocations as multi-word units, or allow words to have multiple senses, without having to commit to a particular segmentation or word sense disambiguation up front. We talk about how this works and what comes out. One interesting result that comes out of the sense lattice: learning word senses from a language modeling objective tends to give you senses that capture the mode of the "next word" distribution, like uses of "bank" that are always followed by "of". Helpful for local perplexity, but not really what you want if you're looking for semantic senses, probably. https://www.semanticscholar.org/paper/Neural-Lattice-Language-Models-Buckman-Neubig/f36b961ea5106c19c341763bd9942c1f09038e5d

Aug 2, 201830 min

62 - Sounding Board: A User-Centric and Content-Driven Social Chatbot, with Hao Fang

NAACL 2018 demo paper, by Hao Fang, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, and Mari Ostendorf Sounding Board was the system that won the 2017 Amazon Alexa Prize, a competition to build a social chatbot that interacts with users as an Alexa skill. Hao comes on the podcast to tell us about the project. We talk for a little bit about how Sounding Board works, but spend most of the conversation talking about what these chatbots can do - the competition setup, some example interactions, the limits of current systems, and how chatbots might be more useful in the future. Even the best current systems seem pretty limited, but the potential future uses are compelling enough to warrant continued research. https://www.semanticscholar.org/paper/Sounding-Board%3A-A-User-Centric-and-Content-Driven-Fang-Cheng/b540fd427a02b19c6ea55dd7d9758ebf15ec3965

Jul 30, 201831 min

61 - Neural Text Generation in Stories, with Elizabeth Clark and Yangfeng Ji

NAACL 2018 Outstanding Paper by Elizabeth Clark, Yangfeng Ji, and Noah A. Smith Both Elizabeth and Yangfeng come on the podcast to tell us about their work. This paper is an extension of an EMNLP 2017 paper by Yangfeng and co-authors that introduced a language model that included explicit entity representations. Elizabeth and Yangfeng take that model, improve it a bit, and use it for creative narrative generation, with some interesting applications. We talk a little bit about the model, but mostly about how the model was used to generate narrative text, how it was evaluated, and what other interesting applications there are of this idea. The punchline is that this model does a better job at generating coherent stories than other generation techniques, because it can track the entities in the story better. We've been experimenting with how we record the audio, trying to figure out how to get better audio quality. Sadly, this episode was a failed experiment, and there is a background hiss that we couldn't get rid of. Bear with us as we work on this... https://www.semanticscholar.org/paper/Generation-in-Stories-Using-Entity-Representations-Clark-Ji/56df50601975f6065b8acc0a08c169aaecad97bc

Jul 23, 201830 min

60 - FEVER: a large-scale dataset for Fact Extraction and VERification, with James Thorne

NAACL 2018 paper by James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal James tells us about his paper, where they created a dataset for fact checking. We talk about how this dataset relates to other datasets, why a new one was needed, how it was built, and how well the initial baseline does on this task. There are some interesting side notes on bias in dataset construction, and on how "fact checking" relates to "fake news" ("fake news" could mean that an article is actively trying to deceive or mislead you; "fact checking" here is just determining if a single claim is true or false given a corpus of assumed-correct reference material). The baseline system does quite poorly, and the lowest-hanging fruit seems to be in improving the retrieval component that finds relevant supporting evidence for claims. There's a workshop and shared task coming up on this dataset: http://fever.ai/. The shared task test period starts on July 24th - get your systems ready! https://www.semanticscholar.org/paper/FEVER%3A-a-Large-scale-Dataset-for-Fact-Extraction-Thorne-Vlachos/7b1f840ecfafb94d2d9e6e926696dba7fad0bb88

Jun 28, 201828 min

59 - Weakly Supervised Semantic Parsing With Abstract Examples, with Omer Goldman

ACL 2018 paper by Omer Goldman, Veronica Latcinnik, Udi Naveh, Amir Globerson, and Jonathan Berant Omer comes on to tell us about a class project (done mostly by undergraduates!) that made it into ACL. Omer and colleagues built a semantic parser that gets state-of-the-art results on the Cornell Natural Language Visual Reasoning dataset. They did this by using "abstract examples" - they replaced the entities in the questions and corresponding logical forms with their types, labeled about a hundred examples in this abstracted formalism, and used those labels to do data augmentation and train their parser. They also used some interesting caching tricks, and a discriminative reranker. https://www.semanticscholar.org/paper/Weakly-supervised-Semantic-Parsing-with-Abstract-Goldman-Latcinnik/5aec2ab5bf2979da067e2aa34762b589a0680030

Jun 12, 201835 min

58 - Learning What’s Easy: Fully Differentiable Neural Easy-First Taggers, with André Martins

EMNLP 2017 paper by André F. T. Martins and Julia Kreutzer André comes on the podcast to talk to us the paper. We spend the bulk of the time talking about the two main contributions of the paper: how they applied the notion of "easy first" decoding to neural taggers, and the details of the constrained softmax that they introduced to accomplish this. We conclude that "easy first" might not be the right name for this - it's doing something that in the end is very similar to stacked self-attention, with standard independent decoding at the end. The particulars of the self-attention are inspired by "easy first", however, using a constrained softmax to enforce some novel constraints on the self-attention. https://www.semanticscholar.org/paper/Learning-What's-Easy%3A-Fully-Differentiable-Neural-Martins-Kreutzer/252571243aa4c0b533aa7fc63f88d07fd844e7bb

Jun 8, 201847 min

57 - A Survey Of Cross-lingual Word Embedding Models, with Sebastian Ruder

Upcoming JAIR paper by Sebastian Ruder, Ivan Vulić, and Anders Søgaard. Sebastian comes on to tell us about his survey. He creates a typology of cross-lingual word embedding methods, and we discuss why you might use cross-lingual embeddings (low-resource languages in particular), what information they capture (semantics? syntax? both?), how the methods work (lots of different ways), and how to evaluate the embeddings (best when you have an extrinsic task to evaluate on). https://www.semanticscholar.org/paper/A-survey-of-cross-lingual-embedding-models-Ruder/3dbd28c63a7807280c9531735c715d4598024166

Jun 5, 201831 min

56 - Deep contextualized word representations, with Matthew Peters

NAACL 2018 paper, by Matt Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Chris Clark, Kenton Lee, and Luke Zettlemoyer. In this episode, AI2's own Matt Peters comes on the show to talk about his recent work on ELMo embeddings, what some have called "the next word2vec". Matt has shown very convincingly that using a pre-trained bidirectional language model to get contextualized word representations performs substantially better than using static word vectors. He comes on the show to give us some more intuition about how and why this works, and to talk about some of the other things he tried and what's coming next. https://www.semanticscholar.org/paper/Deep-contextualized-word-representations-Peters-Neumann/4b17597b856c087f109381ce77d60d9017cb6f9a

Apr 4, 201830 min

55 - Matchbox: Dispatch-driven autobatching for imperative deep learning, with James Bradbury

In this episode, we take a more systems-oriented approach to NLP, looking at issues with writing deep learning code for NLP models. As a lot of people have discovered over the last few years, efficiently batching multiple examples together for fast training on a GPU can be very challenging with complex NLP models. James Bradbury comes on to tell us about Matchbox, his recent effort to provide a framework for automatic batching with pytorch. In the discussion, we talk about why batching is hard, why it's important, how other people have tried to solve this problem in the past, and what James' solution to the problem is. Code is available here: https://github.com/salesforce/matchbox

Mar 28, 201831 min

54 - Simulating Action Dynamics with Neural Process Networks, with Antoine Bosselut

ICLR 2018 paper, by Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, and Yejin Choi. This is not your standard NLP task. This work tries to predict which entities change state over the course of a recipe (e.g., ingredients get combined into a batter, so entities merge, and then the batter gets baked, changing location, temperature, and "cookedness"). We talk to Antoine about the work, getting into details about how the data was collected, how the model works, and what some possible future directions are. https://www.semanticscholar.org/paper/Simulating-Action-Dynamics-with-Neural-Process-Bosselut-Levy/dc01c9401d1caab7f5e6d2f1280f5815f6919977

Mar 26, 201836 min

53 - Classical Structured Prediction Losses for Sequence to Sequence Learning, with Sergey and Myle

NAACL 2018 paper, by Sergey Edunov, Myle Ott, Michael Auli, David Grangier, and Marc'Aurelio Ranzato, from Facebook AI Research In this episode we continue our theme from last episode on structured prediction, talking with Sergey and Myle about their paper. They did a comprehensive set of experiments comparing many prior structured learning losses, applied to neural seq2seq models. We talk about the motivation for their work, what turned out to work well, and some details about some of their loss functions. They introduced a notion of a "pseudo reference", replacing the target output sequence with the highest scoring output on the beam during decoding, and we talk about some of the implications there. It also turns out the minimizing expected risk was the best overall training procedure that they found for these structured models. https://www.semanticscholar.org/paper/Classical-Structured-Prediction-Losses-for-Sequence-Edunov-Ott/20ae11c08c6b0cd567c486ba20f44bc677f2ed23

Mar 21, 201826 min

52 - Sequence-to-Sequence Learning as Beam-Search Optimization, with Sam Wiseman

EMNLP 2016 paper by Sam Wiseman and Sasha Rush. In this episode we talk with Sam about a paper from a couple of years ago on bringing back some ideas from structured prediction into neural seq2seq models. We talk about the classic problems in structured prediction of exposure bias, label bias, and locally normalized models, how people used to solve these problems, and how we can apply those solutions to modern neural seq2seq architectures using a technique that Sam and Sasha call Beam Search Optimization. (Note: while we said in the episode that BSO with beam size of 2 is equivalent to a token-level hinge loss, that's not quite accurate; it's close, but there are some subtle differences.) https://www.semanticscholar.org/paper/Sequence-to-Sequence-Learning-as-Beam-Search-Optim-Wiseman-Rush/28703eef8fe505e8bd592ced3ce52a597097b031

Mar 15, 201823 min

51 - A Regularized Framework for Sparse and Structured Neural Attention, with Vlad Niculae

NIPS 2017 paper by Vlad Niculae and Mathieu Blondel. Vlad comes on to tell us about his paper. Attentions are often computed in neural networks using a softmax operator, which maps scalar outputs from a model into a probability space over latent variables. There are lots of cases where this is not optimal, however, such as when you really want to encourage a sparse attention over your inputs, or when you have additional structural biases that could inform the model. Vlad and Mathieu have developed a theoretical framework for analyzing the options in this space, and in this episode we talk about that framework, some concrete instantiations of attention mechanisms from the framework, and how well these work.

Mar 12, 201816 min

50 - Cardinal Virtues: Extracting Relation Cardinalities from Text, with Paramita Mirza

ACL 2017 paper, by Paramita Mirza, Simon Razniewski, Fariz Darari, and Gerhard Weikum. There's not a whole lot of work on numbers in NLP, and getting good information out of numbers expressed in text can be challenging. In this episode, Paramita comes on to tell us about her efforts to use distant supervision to learn models that extract relation cardinalities from text. That is, given an entity and a relation in a knowledge base, like "Barack Obama" and "has child", the goal is to extract _how many_ related entities there are (in this case, two). There are a lot of challenges in getting this to work well, and Paramita describes some of those, and how she solved them. https://www.semanticscholar.org/paper/Cardinal-Virtues-Extracting-Relation-Cardinalities-Mirza-Razniewski/01afba9f40e0df06446b9cd3d5ea8725c4ba1342

Feb 14, 201827 min

49 - A Joint Sequential and Relational Model for Frame-Semantic Parsing, with Bishan Yang

EMNLP 2017 paper by Bishan Yang and Tom Mitchell. Bishan tells us about her experiments on frame-semantic parsing / semantic role labeling, which is trying to recover the predicate-argument structure from natural language sentences, as well as categorize those structures into a pre-defined event schema (in the case of frame-semantic parsing). Bishan had two interesting ideas here: (1) use a technique similar to model distillation to combine two different model structures (her "sequential" and "relational" models), and (2) use constraints on arguments across frames in the same sentence to get a more coherent global labeling of the sentence. We talk about these contributions, and also touch on "open" versus "closed" semantics, in both predicate-argument structure and information extraction. https://www.semanticscholar.org/paper/A-Joint-Sequential-and-Relational-Model-for-Frame-Yang-Mitchell/a1deb609e3758519cbe3f1a542bdf1ea52b6f224

Feb 5, 201826 min

48 - Incidental Supervision: Moving Beyond Supervised Learning, with Dan Roth

AAAI 2017 paper, by Dan Roth. In this episode we have a conversation with Dan about what he means by "incidental supervision", and how it's related to ideas in reinforcement learning and representation learning. For many tasks, there are signals you can get from seemingly unrelated data that will help you in making predictions. Leveraging the international news cycle to learn transliteration models for named entities is one example of this, as is the current trend in NLP of using language models or other multi-task signals to do better representation learning for your end task. Dan argues that we need to be thinking about this more explicitly in our research, instead of learning everything "end-to-end", as we will never have enough data to learn complex tasks directly from annotations alone. https://www.semanticscholar.org/paper/Incidental-Supervision-Moving-beyond-Supervised-Le-Roth/2997dcfc6d5ffc262d57d0a26f74d091de096573

Jan 29, 201827 min

47 - Dynamic integration of background knowledge in neural NLU systems, with Dirk Weißenborn

How should you incorporate background knowledge into a neural net? A lot of people have been thinking about this problem, and Dirk Weissenborn comes on to tell us about his work in this area. Paper is with Tomáš Kočiský and Chris Dyer. https://arxiv.org/abs/1706.02596

Jan 24, 201835 min

46 - Parsing with Traces, with Jonathan Kummerfeld

TACL 2017 paper by Jonathan K. Kummerfeld and Dan Klein. Jonathan tells us about his work on parsing algorithms that capture traces and null elements in sentence structure. We spend the first third of the conversation talking about what these are and why they are interesting - if you want to correctly handle wh-movement, or coordinating structures, or control structures, or many other phenomena that we commonly see in language, you really want to handle traces and null elements, but most current parsers totally ignore these phenomena. The second third of the conversation is about how the parser works, and we conclude by talking about some of the implications of the work, and where to go next - should we really be pushing harder on capturing linguistic structure when everyone seems to be going towards end-to-end learning on some higher-level task? https://www.semanticscholar.org/paper/Parsing-with-Traces-An-O-n-4-Algorithm-and-a-Struc-Kummerfeld-Klein/af89e56b3d9b720d43cae9f4971928c5cb95cbe3 Jonathan also blogs about papers that he's reading; check out his paper summaries at http://jkk.name/

Jan 8, 201839 min

45 - Build It, Break It workshop, with Allyson Ettinger and Sudha Rao

How robust is your NLP system? High numbers on common datasets can be misleading, as most systems are easily fooled by small modifications that would not be hard for humans to understand. Allyson Ettinger, Sudha Rao, Hal Daumé III, and Emily Bender organized a workshop trying to characterize this issue, inviting participants to either build robust systems, or try to break them with targeted examples. Allyson and Sudha come on the podcast to talk about the workshop. We cover the motivation of the workshop, what a "minimal pair" is, what tasks the workshop focused on and why, and what the main takeaways of the workshop were. https://www.semanticscholar.org/paper/Towards-Linguistically-Generalizable-NLP-Systems-A-Ettinger-Rao/8472e999f723a9ccaffc6089b7be1865d8a1b863

Jan 2, 201837 min