CLunch is the weekly computational linguistics lunch run by the NLP group. We invite external and internal speakers to come and present their research on natural language processing, computational linguistics, and machine learning.

Interested in attending CLunch? Sign up for our mailing list here.


Matt Gardner

Allen Institute for Artificial Intelligence

March 5, 2020

NLP Evaluations That We Believe In

With all of the modeling advancements in recent years, NLP benchmarks have been falling over left and right: "human performance" has been reached on SQuAD 1 and 2, GLUE and SuperGLUE, and many commonsense datasets. Yet no serious researcher actually believes that these systems understand language, or even really solve the underlying tasks behind these datasets. To get benchmarks that we actually believe in, we need to both think more deeply about the language phenomena that our benchmarks are targeting, and make our evaluation sets more rigorous. I will first present ORB, an Open Reading Benchmark that collects many reading comprehension datasets that we (and others) have recently built, targeting various aspects of what it means to read. I will then present contrast sets, a way of creating non-iid test sets that more thoroughly evaluate a model's abilities on some task, decoupling training data artifacts from test labels.

Mohammad Sadegh Rasooli

University of Pennsylvania

February 27, 2020

Cross-Lingual Transfer of Natural Language Processing Systems

Accurate natural language processing systems rely heavily on annotated datasets. In the absence of such datasets, transfer methods can help to develop a model by transferring annotations from one or more rich-resource languages to the target language of interest. These methods are generally divided into two approaches: 1) annotation projection from translation data, aka parallel data, using supervised models in rich-resource languages, and 2) direct model transfer from annotated datasets in rich-resource languages. In this talk, we present different methods for transfer of syntactic and semantic dependency parsers. We propose an annotation projection method that performs well in scenarios for which a large amount of in-domain parallel data is available. We also propose a method that combines annotation projection and direct model transfer and can leverage a minimal amount of information from a small out-of-domain parallel dataset to develop highly accurate transfer models. Furthermore, we present an unsupervised syntactic reordering model to improve the accuracy of dependency parser transfer for non-European languages. We also propose a method for cross-lingual transfer of dependency parsing based on multi-task learning by leveraging supervised syntactic information in the target language of interest. Finally, we introduce our current efforts for learning cross-lingual representations using information from different modalities, especially images from the Massively Multilingual Image Dataset (MMID).

Zhiting Hu

Carnegie Mellon University

February 20, 2020

Connecting the Dots between Learning Paradigms

Continued research has created a diverse set of learning algorithms for ingesting distinct forms of experience (e.g. data, cost, knowledge constraints). However, it is often challenging for practitioners to choose or adapt solutions from such a bewildering marketplace of algorithms, as it could demand deep ML expertise and bespoke innovations. This talk will present an attempt to systematize several paradigms of algorithms for both a unifying understanding and new systematic methodologies of creating ML solutions. I will show that some of the popular algorithms in supervised learning, constraint-driven learning, reinforcement learning, etc., indeed share a common succinct formulation, showing that different forms of experience can be used for learning in the same way. The unifying representation of algorithms allows us to methodically exchange solutions between paradigms, and learn from combinations of experience jointly, for complex problems such as text and image generation.

Nitish Gupta

University of Pennsylvania

February 13, 2020

Neural Module Networks for Reasoning over Text

Answering compositional questions that require multiple steps of reasoning against text is challenging, especially when they involve discrete, symbolic operations. Neural module networks (NMNs) learn to parse such questions as executable programs composed of learnable modules, performing well on synthetic visual QA domains. In this talk, I will outline the challenges in learning these models for non-synthetic questions on open-domain text, where a model needs to deal with the diversity of natural language and perform a broader range of reasoning. Then, I will present how we extend NMNs by (a) introducing modules that reason over a paragraph of text, performing symbolic reasoning (such as arithmetic, sorting, counting) over numbers and dates in a probabilistic and differentiable manner; and (b) proposing an unsupervised auxiliary loss to help extract arguments associated with the events in text. Additionally, we show that a limited amount of heuristically-obtained question program and intermediate module output supervision provides sufficient inductive bias for accurate learning. In conclusion, I will present methods for achieving interpretability in such compositional neural models and challenges for future research.

Noam Slonim


February 6, 2020

Project Debater – How Persuasive can a Computer be?

Project Debater is the first AI system that can meaningfully debate a human opponent. The system, an IBM Grand Challenge, is designed to build coherent, convincing speeches on its own, as well as provide rebuttals to the opponent’s main arguments. In 2019, Project Debater competed against Harish Natarajan, who holds the world record for most debate victories, in an event held in San Francisco that was broadcast live worldwide. In this talk I will tell the story of Project Debater, from conception to a climactic final event, describe its underlying technology, and discuss how it can be leveraged for advancing decision making and critical thinking.

Jay-Yoon Lee

Carnegie Mellon University

January 30, 2020

Injecting output constraints into neural NLP models in a model agnostic way

The talk discusses a particular method of injecting constraints into neural models, primarily for natural language processing (NLP) tasks. While neural models have set a new state of the art in many tasks from vision to NLP, they often fail to learn simple rules necessary for well-formed structures unless there is an immense amount of training data. The talk claims that not all aspects of the model have to be learned from the data itself, and that injecting simple knowledge/constraints into neural models can help low-resource tasks as well as improve state-of-the-art models. The talk focuses on the structural knowledge of the output space and injects knowledge of correct or preferred structures as an objective to the model in a model-agnostic way, i.e. without modification to the model structure. The first benefit of focusing on the knowledge of output space is that it is intuitive, as we can directly enforce outputs to satisfy logical/linguistic constraints. Another advantage of structural knowledge is that it often does not require a labeled dataset. Focusing on the example of Semantic Role Labeling and its constraints related to the syntactic parse tree, the talk showcases the efficacy of the proposed inference algorithm and the proposed semi-supervised learning.

Nick Montfort

Massachusetts Institute of Technology

January 23, 2020

Lean Computer-Generated Poetry as Exploration of Language, Culture, and Computation

Computational poetics is a compelling area of NLP. Poetry has helped to constitute cultures for millennia and its composition is considered one of the most human activities. On the generation side, computational poetics involves the production of poetic language, potentially with meter, rhyme and other forms of musicality, metaphors and their cousins, narrative aspects, and intertextual references. Essentially, the main objective of computationally generated poetry is being culturally and individually resonant for at least some readers or listeners in some cultures. There are a wide variety of approaches, some of which seek to model human creativity, as in the computational creativity community. Work in the area is undertaken by academic researchers, poets and artists, and programmers seeking amusement and diversion during events such as NaNoGenMo (National Novel Generation Month), which accommodates the generation of all sorts of large-scale literature, including poetry. In my talk, I will introduce my own practice as a computational poet, which does not involve developing general models of human creativity. My practice is often considered experimental and sometimes conceptual; it is not, in any case, expressive, that is, mainly concerned with my experiences or with conveying my emotions. Rather, I consider myself a situated and embodied explorer of language, culture, and computation. My means of exploration is the development of computational poetry. My practice involves writing programs that are usually small and simple, based on specific unusual lexicons and combinatorial techniques. As part of inquiring about computation, my work connects with platform studies and deals with specifics of particular computers and programming languages. 
As I share and discuss some of my specific computational poems, I will describe how this type of NLG work touches on questions of language and thought as studied in, for instance, linguistics, cognitive science, and conventional poetics.

Adam Poliak

Johns Hopkins University

December 10, 2019

Sentence-level Semantic Inference: From Diverse Phenomena to Applications

Many NLP tasks involve understanding meaning at the sentence level. In order to analyze such models, we should decompose sentence-level semantic understanding into a diverse array of smaller, more-focused, fine-grained types of reasoning. This will help improve our understanding of the sentence-level reasoning capabilities of our NLP systems. In this talk, we will focus on Natural Language Inference (NLI), the task of determining if one sentence (hypothesis) can likely be inferred from another (context/premise). NLI has traditionally been used to evaluate how well different models understand language and the relationship between texts. We investigate whether 10 recent NLI datasets require models to reason about both texts, or if the datasets contain biases or statistical irregularities that allow a model to correctly label a context-hypothesis pair by only looking at a hypothesis. In the most popular dataset that we consider, a hypothesis-only model outperforms the majority baseline by over 2x. We will also discuss our recently released dataset, the Diverse NLI Collection (DNC), that can be used to shed light on a model’s ability to capture or understand a diverse array of semantic phenomena that are important to Natural Language Understanding. We will demonstrate how a variant of the DNC has been used to evaluate whether a Neural Machine Translation encoder captures semantic phenomena related to translation. With the remaining time, we will discuss how lessons from these studies can be applied to real-world use cases of sentence-level semantic inference. This talk is based on work that has appeared at NAACL, ACL, StarSem, and EMNLP.

Yoav Artzi

Cornell University

December 3, 2019

Robot Control and Collaboration in Situated Instruction Following

I will present two projects studying the problem of learning to follow natural language instructions. I will present new datasets, a class of interpretable models for instruction following, learning methods that combine the benefits of supervised and reinforcement learning, and new evaluation protocols. In the first part, I will discuss the task of executing natural language instructions with a robotic agent. In contrast to existing work, we do not engineer formal representations of language meaning or the robot environment. Instead, we learn to directly map raw observations and language to low-level continuous control of a quadcopter drone. In the second part, I will propose the task of learning to follow sequences of instructions in a collaborative scenario, where both the user and the system execute actions in the environment and the user controls the system using natural language. To study this problem, we build CerealBar, a multi-player 3D game where a leader instructs a follower, and both act in the environment together to accomplish complex goals. The two projects were led by Valts Blukis, Alane Suhr, and collaborators.

Hangfeng He

University of Pennsylvania

November 19, 2019

Distributed Semantic Representations from Question-Answering Signals

Human annotations, especially those from experts, are costly for many natural language processing (NLP) tasks. One emerging approach is to use natural language to annotate natural language, but it is challenging to get supervision effectively from annotations that are very different from the target task. This paper studies the case where the annotations are in the format of question answering (QA). We propose a novel approach to retrieve two types of semantic representations from QA, with which we can consistently improve on a suite of tasks. This work may point to an alternative way of supervising NLP tasks.

Shuai Tang

University of California, San Diego

November 12, 2019

Revisiting post-processing for word embeddings

Word embeddings learnt from large corpora have been adopted in various applications in natural language processing and served as the general input representations to learning systems. Recently, a series of post-processing methods have been proposed to boost the performance of word embeddings on similarity comparison and analogy retrieval tasks, and some have been adapted to compose sentence representations. The general hypothesis behind these methods is that by enforcing the embedding space to be more isotropic, the similarity between words can be better expressed. We view these methods as an approach to shrink the covariance/gram matrix, which is estimated by learning word vectors, towards a scaled identity matrix. By optimising an objective in the semi-Riemannian manifold with Centralised Kernel Alignment (CKA), we are able to search for the optimal shrinkage parameter, and provide a post-processing method to smooth the spectrum of learnt word vectors which yields improved performance on downstream tasks.
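
To make the shrinkage idea in this abstract concrete, here is a minimal sketch in plain Python. It is not the speaker's implementation: the function names, the toy vectors, and the hand-picked shrinkage parameter `alpha` are all assumptions for illustration, and the CKA-based search for the optimal parameter mentioned in the abstract is not shown. The sketch only illustrates how a covariance matrix can be pulled towards a scaled identity matrix while keeping its trace.

```python
from typing import List

Matrix = List[List[float]]


def covariance(vectors: Matrix) -> Matrix:
    """Covariance matrix of row vectors (population estimate, divides by n)."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - means[j] for j in range(d)] for v in vectors]
    return [
        [sum(row[i] * row[j] for row in centered) / n for j in range(d)]
        for i in range(d)
    ]


def shrink_towards_identity(cov: Matrix, alpha: float) -> Matrix:
    """Return (1 - alpha) * cov + alpha * (trace(cov) / d) * I.

    alpha is a hypothetical, hand-picked shrinkage parameter in [0, 1];
    alpha = 0 leaves cov unchanged, alpha = 1 gives a scaled identity.
    """
    d = len(cov)
    scale = sum(cov[i][i] for i in range(d)) / d
    return [
        [(1 - alpha) * cov[i][j] + (alpha * scale if i == j else 0.0)
         for j in range(d)]
        for i in range(d)
    ]


# Toy 2-d "word vectors" with much more variance along the first axis.
toy_vectors = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
cov = covariance(toy_vectors)
shrunk = shrink_towards_identity(cov, alpha=0.5)
```

In this toy case the two variances move closer together after shrinkage, which is one simple reading of making the space "more isotropic".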

Daniel Deutsch

University of Pennsylvania

October 29, 2019

A General-Purpose Algorithm for Constrained Sequential Inference

Inference in structured prediction involves finding the best output structure for an input, subject to certain constraints. Many current approaches use sequential inference, which constructs the output in a left-to-right manner. However, there is no general framework to specify constraints in these approaches. We present a principled approach for incorporating constraints into sequential inference algorithms. Our approach expresses constraints using an automaton, which is traversed in lock-step during inference, guiding the search to valid outputs. We show that automata can express commonly used constraints and are easily incorporated into sequential inference. When it is more natural to represent constraints as a set of automata, our algorithm uses an active set method for demonstrably fast and efficient inference. We experimentally show the benefits of our algorithm on constituency parsing and semantic role labeling. For parsing, unlike unconstrained approaches, our algorithm always generates valid output, incurring only a small drop in performance. For semantic role labeling, imposing constraints using our algorithm corrects common errors, improving F1 by 1.5 points. These benefits increase in low-resource settings. Our active set method achieves a 5.2x relative speed-up over a naive approach.
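
The lock-step traversal described in this abstract can be sketched roughly as follows. This is an illustrative toy, not the authors' algorithm: the BIO-tagging automaton, the state names, the scores, and the greedy search are all assumptions made for the example, and the active set method for multiple automata is not shown. At each step, the decoder only considers tokens for which the automaton has a transition from its current state.

```python
from typing import Dict, List, Tuple

# Hypothetical automaton for BIO tagging: an "I" tag may only continue a span,
# so there is no ("outside", "I") transition.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("outside", "O"): "outside",
    ("outside", "B"): "inside",
    ("inside", "I"): "inside",
    ("inside", "B"): "inside",
    ("inside", "O"): "outside",
}


def constrained_greedy_decode(
    step_scores: List[Dict[str, float]], start_state: str = "outside"
) -> List[str]:
    """Greedy left-to-right decoding that advances the automaton in lock-step."""
    state = start_state
    output: List[str] = []
    for scores in step_scores:
        # Keep only tokens the automaton allows from the current state.
        allowed = {
            tok: score for tok, score in scores.items()
            if (state, tok) in TRANSITIONS
        }
        if not allowed:
            raise ValueError(f"no valid token from state {state!r}")
        best = max(allowed, key=allowed.get)
        output.append(best)
        state = TRANSITIONS[(state, best)]
    return output


# Made-up per-step model scores; an unconstrained argmax would start with "I".
toy_scores = [
    {"O": 0.1, "B": 0.2, "I": 0.7},
    {"O": 0.2, "B": 0.1, "I": 0.7},
    {"O": 0.6, "B": 0.1, "I": 0.3},
]
decoded = constrained_greedy_decode(toy_scores)
```

Here every automaton state is treated as accepting. With a more restrictive automaton, a greedy search could reach a dead end, which is one reason to use beam search or similar strategies instead.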

Daniel Deutsch

University of Pennsylvania

October 29, 2019

Summary Cloze: A New Task for Content Selection in Topic-Focused Summarization

A key challenge in topic-focused summarization is determining what information should be included in the summary, a problem known as content selection. In this work, we propose a new method for studying content selection in topic-focused summarization called the summary cloze task. The goal of the summary cloze task is to generate the next sentence of a summary conditioned on the beginning of the summary, a topic, and one or more reference documents. The main challenge is deciding what information in the references is relevant to the topic and partial summary and should be included in the summary. Although the cloze task does not address all aspects of the traditional summarization problem, the narrower scope of the task allows us to collect a large-scale dataset of nearly 500k summary cloze instances from Wikipedia. We report experimental results on this new dataset using various extractive models and a two-step abstractive model that first extractively selects a small number of sentences and then abstractively summarizes them. Our results show that the topic and partial summary help the models identify relevant content, but the task remains a significant challenge.

Ben Zhou

University of Pennsylvania

October 29, 2019

"Going on a vacation" takes longer than "Going for a walk": A Study of Temporal Commonsense Understanding

Understanding time is crucial for understanding events expressed in natural language. Because people rarely say the obvious, it is often necessary to have commonsense knowledge about various temporal aspects of events, such as duration, frequency, and temporal order. However, this important problem has so far received limited attention. This paper systematically studies this temporal commonsense problem. Specifically, we define five classes of temporal commonsense, and use crowdsourcing to develop a new dataset, MCTACO, that serves as a test set for this task. We find that the best current methods used on MCTACO are still far behind human performance, by about 20%, and discuss several directions for improvement. We hope that the new dataset and our study here can foster more future research on this topic.

Katharina Kann

New York University

October 22, 2019

Neural Networks for Morphological Generation in the Minimal-Resource Setting

As languages other than English are moving more and more into the focus of natural language processing, accurate handling of morphology is increasing in importance. This talk presents neural network-based approaches to morphological generation, casting the problem as a character-based sequence-to-sequence task. First, we will generally discuss how to successfully train neural sequence-to-sequence models for this. Then, since many morphologically rich languages only have limited resources, the main part of the talk will focus on how to overcome the challenges that limited amounts of annotated training data pose to neural models. The approaches covered in this talk include multi-task learning, cross-lingual transfer learning, and meta-learning.

Jithin Pradeep

The Vanguard Group

October 15, 2019

ArSI - Artificial Speech Intelligence - An end to end automatic speech recognition using Attention plus CTC

Shi Yu

The Vanguard Group

October 15, 2019

A Financial Service Chatbot based on Deep Bidirectional Transformers

Christopher Lynn

University of Pennsylvania

October 8, 2019

Human information processing in complex networks

Humans communicate using systems of interconnected stimuli or concepts -- from language and music to literature and science -- yet it remains unclear how, if at all, the structure of these networks supports the communication of information. Although information theory provides tools to quantify the information produced by a system, traditional metrics do not account for the inefficient and biased ways that humans process this information. Here we develop an analytical framework to study the information generated by a system as perceived by a human observer. We demonstrate experimentally that this perceived information depends critically on a system's network topology. Applying our framework to several real networks, we find that they communicate a large amount of information (having high entropy) and do so efficiently (maintaining low divergence from human expectations). Moreover, we show that such efficient communication arises in networks that are simultaneously heterogeneous, with high-degree hubs, and clustered, with tightly-connected modules -- the two defining features of hierarchical organization. Together, these results suggest that many real networks are constrained by the pressures of information transmission, and that these pressures select for specific structural features.

Dan Goldwasser

Purdue University

October 1, 2019

Joint Models for Social, Behavioral and Textual Information

Understanding natural language communication often requires context, such as the speakers' backgrounds and social conventions. However, when it comes to computationally modeling these interactions, we typically ignore their broader context and analyze the text in isolation. In this talk, I will review ongoing work demonstrating the importance of holistically modeling behavioral, social and textual information. I will focus on several NLP problems, including political discourse analysis on Twitter, partisan news detection and open-domain debate stance prediction, and discuss how jointly modeling text and social behavior can help reduce the supervision effort and provide a better representation for language understanding tasks.

Robert Shaffer

University of Pennsylvania

September 24, 2019

Similarity Inference for Legal Texts

Quantifying similarity between pairs of documents is a ubiquitous task. Both researchers and members of the public frequently use document-level pairwise similarity measures to describe or explore unfamiliar corpora, or to test hypotheses regarding diffusion of ideas between authors. High-level similarity measures are particularly useful when dealing with legal or political corpora, which often contain long, thematically diverse, and specialized language that is difficult for non-experts to interpret. Unfortunately, though similarity estimation is a well-studied problem in the context of short documents and document excerpts, less attention has been paid to the problem of similarity inference for long documents.

Reno Kriz

University of Pennsylvania

September 17, 2019

Comparison of Diverse Decoding Methods from Conditional Language Models

While conditional language models have greatly improved in their ability to output high-quality natural language, many NLP applications benefit from being able to generate a diverse set of candidate sequences. Diverse decoding strategies aim to, within a given-sized candidate list, cover as much of the space of high-quality outputs as possible, leading to improvements for tasks that re-rank and combine candidate outputs. Standard decoding methods, such as beam search, optimize for generating high likelihood sequences rather than diverse ones, though recent work has focused on increasing diversity in these methods. We conduct an extensive survey of decoding-time strategies for generating diverse outputs from conditional language models. We also show how diversity can be improved without sacrificing quality by over-sampling additional candidates, then filtering to the desired number.
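
The "over-sample, then filter" step mentioned at the end of this abstract could look something like the sketch below. The abstract does not say how the filtering is done, so everything here is an assumption for illustration: the token-level Jaccard overlap, the `max_overlap` threshold, the function names, and the made-up candidates and scores.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard overlap between two whitespace-tokenized strings."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def oversample_then_filter(
    candidates: List[Tuple[str, float]], k: int, max_overlap: float = 0.5
) -> List[str]:
    """Keep up to k candidates, best score first, skipping near-duplicates.

    candidates is an over-sampled list of (text, score) pairs, e.g. from
    sampling more outputs than needed; max_overlap is a hypothetical threshold.
    """
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    kept: List[str] = []
    for text, _score in ranked:
        if all(jaccard(text, other) <= max_overlap for other in kept):
            kept.append(text)
        if len(kept) == k:
            break
    return kept


# Made-up candidates with made-up log-probability-style scores.
toy_candidates = [
    ("the cat sat on the mat", -1.0),
    ("the cat sat on a mat", -1.1),
    ("a dog slept outside", -1.5),
    ("the cat sat on the rug", -1.2),
    ("birds sing at dawn", -2.0),
]
selected = oversample_then_filter(toy_candidates, k=3)
```

The two near-copies of the top candidate are dropped, so the final list of three trades a little score for broader coverage.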

Daphne Ippolito

University of Pennsylvania

September 17, 2019

Detecting whether Text is Human- or Machine-Generated

With the advent of generative models with a billion parameters or more, it is now possible to automatically generate vast amounts of human-sounding text. But just how human-like is this machine-generated text? Intuitively, shorter amounts of machine-generated text are harder to detect, but exactly how many words can a machine generate and still fool both humans and trained discriminators? We investigate how the choices of sampling strategy and text sequence length impact discriminability from human-written text, using both automatic detection methods and human judgement.