Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models’ problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%.
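As a rough illustration of the search procedure the abstract describes, here is a minimal breadth-first Tree-of-Thoughts sketch; it is not the authors' implementation, and the `propose` and `score` callables are hypothetical stand-ins for LLM calls that generate candidate thoughts and self-evaluate partial solutions.

```python
# Minimal breadth-first Tree-of-Thoughts sketch (not the authors' code).
# `propose` samples candidate next thoughts; `score` self-evaluates a
# partial solution in [0, 1]. Both stand in for LLM calls.
from typing import Callable, List

def tot_bfs(problem: str,
            propose: Callable[[str, List[str]], List[str]],
            score: Callable[[str, List[str]], float],
            depth: int = 3,   # number of intermediate thought steps
            beam: int = 5     # partial solutions kept per level
            ) -> List[str]:
    frontier: List[List[str]] = [[]]  # each element is a path of thoughts
    for _ in range(depth):
        candidates = [path + [t] for path in frontier for t in propose(problem, path)]
        # Deliberate lookahead: keep only the highest self-evaluated partial solutions.
        candidates.sort(key=lambda p: score(problem, p), reverse=True)
        frontier = candidates[:beam]
    return frontier[0] if frontier else []
```

In this reading, chain-of-thought prompting is the degenerate case of a beam of one with a single proposal per step.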
NLP
SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification
Pranjal Aggarwal,
Ameet Deshpande,
and Karthik Narasimhan
Extreme classification (XC) involves predicting over large numbers of classes (thousands to millions), with real-world applications like news article classification and e-commerce product tagging. The zero-shot version of this task requires generalization to novel classes without additional supervision. In this paper, we develop SemSup-XC, a model that achieves state-of-the-art zero-shot and few-shot performance on three XC datasets derived from legal, e-commerce, and Wikipedia data. To develop SemSup-XC, we use automatically collected semantic class descriptions to represent classes and facilitate generalization through a novel hybrid matching module that matches input instances to class descriptions using a combination of semantic and lexical similarity. Trained with contrastive learning, SemSup-XC significantly outperforms baselines and establishes state-of-the-art performance on all three datasets considered, gaining up to 12 precision points on zero-shot and more than 10 precision points on one-shot tests, with similar gains for recall@10. Our ablation studies highlight the relative importance of our hybrid matching module and automatically collected class descriptions.
NLP
RL
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao,
Jeffrey Zhao,
Dian Yu,
Nan Du,
Izhak Shafran,
Karthik Narasimhan,
and Yuan Cao
International Conference on Learning Representations (ICLR),
2023
While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples.
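The interleaving described above can be pictured with a short, hedged sketch; `llm` and `search` are hypothetical placeholders for the language model and the external tool (the paper uses a simple Wikipedia API), and the Thought/Action/Observation format follows the paper's prompting convention.

```python
# Hedged sketch of a ReAct-style loop: the LLM alternates "Thought:" and
# "Action:" lines; Search actions are executed against an external tool and
# the observation is appended to the prompt. `llm` and `search` are assumed.
from typing import Callable

def react_loop(question: str,
               llm: Callable[[str], str],     # returns the next Thought/Action line(s)
               search: Callable[[str], str],  # external tool; an assumption, not the paper's exact API
               max_steps: int = 8) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(prompt)  # e.g. "Thought: ...\nAction: Search[Colorado orogeny]"
        prompt += step + "\n"
        if "Action: Finish[" in step:
            return step.split("Action: Finish[", 1)[1].rstrip("]\n")
        if "Action: Search[" in step:
            query = step.split("Action: Search[", 1)[1].split("]", 1)[0]
            prompt += f"Observation: {search(query)}\n"  # ground reasoning in the tool's output
    return ""
```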
NLP
C-STS: Conditional Semantic Textual Similarity
Ameet Deshpande,
Carlos E Jimenez,
Howard Chen,
Vishvak Murahari,
Victoria Graf,
Tanmay Rajpurohit,
Ashwin Kalyan,
Danqi Chen,
and Karthik Narasimhan
Semantic textual similarity (STS) has been a cornerstone task in NLP that measures the degree of similarity between a pair of sentences, with applications in information retrieval, question answering, and embedding methods. However, it is an inherently ambiguous task, with the sentence similarity depending on the specific aspect of interest. We resolve this ambiguity by proposing a novel task called conditional STS (C-STS) which measures similarity conditioned on an aspect elucidated in natural language (hereon, condition). As an example, the similarity between the sentences "The NBA player shoots a three-pointer." and "A man throws a tennis ball into the air to serve." is higher for the condition "The motion of the ball." (both upward) and lower for "The size of the ball." (one large and one small). C-STS’s advantages are two-fold: (1) it reduces the subjectivity and ambiguity of STS, and (2) enables fine-grained similarity evaluation using diverse conditions. C-STS contains almost 20,000 instances from diverse domains and we evaluate several state-of-the-art models to demonstrate that even the most performant fine-tuning and in-context learning models (GPT-4, Flan, SimCSE) find it challenging, with Spearman correlation scores of <50. We encourage the community to evaluate their models on C-STS to provide a more holistic view of semantic similarity and natural language understanding.
NLP
MUX-PLMs: Pre-training Language Models with Data Multiplexing
Vishvak Murahari,
Ameet Deshpande,
Carlos E Jimenez,
Izhak Shafran,
Mingqiu Wang,
Yuan Cao,
and Karthik Narasimhan
The widespread adoption of large language models such as ChatGPT and Bard has led to unprecedented demand for these technologies. The burgeoning cost of inference for ever-increasing model sizes, coupled with hardware shortages, has limited affordable access and creates a pressing need for efficiency approaches geared towards high throughput and performance. Multi-input multi-output (MIMO) algorithms, such as data multiplexing, offer a promising solution with a many-fold increase in throughput by performing inference for multiple inputs at the cost of a single input. Yet these approaches are not currently performant enough to be deployed in modern systems. We change that by developing MUX-PLMs, a class of high-throughput pre-trained language models (PLMs) trained with data multiplexing, which can be fine-tuned for any downstream task to yield high-throughput, high-performance models. Our novel multiplexing and demultiplexing modules proficiently entangle and disentangle inputs, and enable high-performance, high-throughput MUX-PLMs that are competitive with vanilla PLMs while achieving 2x/5x inference speedup with only a 1-4% performance drop on a broad suite of tasks.
NLP
Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
Large language models (LLMs) have shown incredible capabilities and transcended the natural language processing (NLP) community, with adoption throughout many services like healthcare, therapy, education, and customer service. Since users include people with critical information needs like students or patients engaging with chatbots, the safety of these systems is of prime importance. Therefore, a clear understanding of the capabilities and limitations of LLMs is necessary. To this end, we systematically evaluate toxicity in over half a million generations of ChatGPT, a popular dialogue-based LLM. We find that setting the system parameter of ChatGPT by assigning it a persona, say that of the boxer Muhammad Ali, significantly increases the toxicity of generations. Depending on the persona assigned to ChatGPT, its toxicity can increase up to 6x, with outputs engaging in incorrect stereotypes, harmful dialogue, and hurtful opinions. This is potentially defamatory to the persona and harmful to an unsuspecting user. Furthermore, we find concerning patterns in which specific entities (e.g., certain races) are targeted more than others (3x more) irrespective of the assigned persona, reflecting inherent discriminatory biases in the model. We hope that our findings inspire the broader AI community to rethink the efficacy of current safety guardrails and develop better techniques that lead to robust, safe, and trustworthy AI systems.
NLP
Referral Augmentation for Zero-Shot Information Retrieval
Michael Tang,
Shunyu Yao,
John Yang,
and Karthik Narasimhan
We propose Referral-Augmented Retrieval (RAR), a simple technique that concatenates document indices with referrals, i.e. text from other documents that cite or link to the given document, to provide significant performance gains for zero-shot information retrieval. The key insight behind our method is that referrals provide a more complete, multi-view representation of a document, much like incoming page links in algorithms like PageRank provide a comprehensive idea of a webpage’s importance. RAR works with both sparse and dense retrievers, and outperforms generative text expansion techniques such as DocT5Query and Query2Doc by 37% and 21% absolute improvement on ACL paper retrieval Recall@10, while also eliminating expensive model training and inference. We also analyze different methods for multi-referral aggregation and show that RAR enables up-to-date information retrieval without re-training.
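A toy sketch of the core idea, under the assumption of simple lexical scoring (the paper evaluates standard sparse and dense retrievers): each document is indexed together with the referral text that cites or links to it.

```python
# Toy Referral-Augmented Retrieval sketch: a document's index entry is its own
# text concatenated with "referral" text from citing documents. Scoring here
# is plain term-frequency overlap, for illustration only.
from collections import Counter
from typing import Dict, List

def build_index(docs: Dict[str, str], referrals: Dict[str, List[str]]) -> Dict[str, Counter]:
    index = {}
    for doc_id, text in docs.items():
        augmented = text + " " + " ".join(referrals.get(doc_id, []))  # multi-view representation
        index[doc_id] = Counter(augmented.lower().split())
    return index

def retrieve(query: str, index: Dict[str, Counter], k: int = 10) -> List[str]:
    terms = query.lower().split()
    scores = {doc_id: sum(bag[t] for t in terms) for doc_id, bag in index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```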
Anthropomorphization of AI: Opportunities and Risks
Ameet Deshpande,
Tanmay Rajpurohit,
Karthik Narasimhan,
and Ashwin Kalyan
Anthropomorphization is the tendency to attribute human-like traits to non-human entities. It is prevalent in many social contexts – children anthropomorphize toys, adults do so with brands, and it is a literary device. It is also a versatile tool in science, with behavioral psychology and evolutionary biology meticulously documenting its consequences. With the widespread adoption of AI systems, and the push from stakeholders to make them human-like through alignment techniques, human voice, and pictorial avatars, the tendency for users to anthropomorphize these systems increases significantly. We take a dyadic approach to understanding this phenomenon with large language models (LLMs) by studying (1) the objective legal implications, as analyzed through the lens of the recent Blueprint for an AI Bill of Rights, and (2) the subtle psychological aspects of customization and anthropomorphization. We find that anthropomorphized LLMs customized for different user bases violate multiple provisions in the legislative blueprint. In addition, we point out that anthropomorphization of LLMs affects the influence they can have on their users, thus having the potential to fundamentally change the nature of human-AI interaction, with potential for manipulation and negative influence. With LLMs being hyper-personalized for vulnerable groups like children and patients, among others, our work is a timely and important contribution. We propose a conservative strategy for the cautious use of anthropomorphization to improve the trustworthiness of AI systems.
NLP
PruMUX: Augmenting Data Multiplexing with Model Compression
Yushan Su,
Vishvak Murahari,
Karthik Narasimhan,
and Kai Li
As language models increase in size by the day, methods for efficient inference are critical to leveraging their capabilities for various applications. Prior work has investigated techniques like model pruning, knowledge distillation, and data multiplexing to increase model throughput without sacrificing accuracy. In this paper, we combine two such methods – structured pruning and data multiplexing – to compound the speedup gains obtained by either method. Our approach, PruMUX, obtains 7.5-29.5x throughput improvement over the BERT-base model with accuracy thresholds from 80% to 74%. We further study various combinations of parameters (such as sparsity and multiplexing factor) in the two techniques to provide a comprehensive analysis of the tradeoff between accuracy and throughput in the resulting models. We then propose Auto-PruMUX, a meta-level model that can predict high-performance parameters for pruning and multiplexing given a desired accuracy loss budget, providing a practical method to leverage the combination effectively.
2022
NLP
ALIGN-MLM: Word Embedding Alignment is Crucial for Multilingual Pre-training
Henry Tang,
Ameet Deshpande,
and Karthik Narasimhan
Multilingual pre-trained models exhibit zero-shot cross-lingual transfer, where a model fine-tuned on a source language achieves surprisingly good performance on a target language. While studies have attempted to understand transfer, they focus only on MLM, and the large number of differences between natural languages makes it hard to disentangle the importance of different properties. In this work, we specifically highlight the importance of word embedding alignment by proposing a pre-training objective (ALIGN-MLM) whose auxiliary loss guides similar words in different languages to have similar word embeddings. ALIGN-MLM either outperforms or matches three widely adopted objectives (MLM, XLM, DICT-MLM) when we evaluate transfer between pairs of natural languages and their counterparts created by systematically modifying specific properties like the script. In particular, ALIGN-MLM outperforms XLM and MLM by 35 and 30 F1 points on POS-tagging for transfer between languages that differ both in their script and word order (left-to-right vs. right-to-left). We also show a strong correlation between alignment and transfer for all objectives (e.g., rho=0.727 for XNLI), which together with ALIGN-MLM’s strong performance calls for explicitly aligning word embeddings for multilingual models.
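A minimal sketch of what such an auxiliary alignment loss could look like in PyTorch, assuming translation pairs from a bilingual dictionary are available; the paper's exact loss formulation may differ.

```python
# Sketch of an alignment auxiliary loss in the spirit of ALIGN-MLM: pull the
# embeddings of translation pairs (from an assumed bilingual dictionary)
# together via cosine similarity.
import torch

def alignment_loss(emb: torch.nn.Embedding,
                   src_ids: torch.Tensor,   # (n,) ids of source-language words
                   tgt_ids: torch.Tensor    # (n,) ids of their translations
                   ) -> torch.Tensor:
    src, tgt = emb(src_ids), emb(tgt_ids)
    cos = torch.nn.functional.cosine_similarity(src, tgt, dim=-1)
    return (1.0 - cos).mean()  # minimized when paired words share embeddings

# Training combines this with the usual MLM loss, e.g.:
# total_loss = mlm_loss + lambda_align * alignment_loss(model.embeddings, src, tgt)
```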
NLP
SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers
Ameet Deshpande,
Md Arafat Sultan,
Anthony Ferritto,
Ashwin Kalyan,
Karthik Narasimhan,
and Avirup Sil
Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and their sizes have consequently been getting bigger. Since a different copy of the model is required for each task, this paradigm is infeasible for storage-constrained edge devices like mobile phones. In this paper, we propose SPARTAN, a parameter efficient (PE) and computationally fast architecture for edge devices that adds hierarchically organized sparse memory after each Transformer layer. SPARTAN freezes the PLM parameters and fine-tunes only its memory, thus significantly reducing storage costs by re-using the PLM backbone for different tasks. SPARTAN contains two levels of memory, with only a sparse subset of parents being chosen in the first level for each input, and children cells corresponding to those parents being used to compute an output representation. This sparsity combined with other architecture optimizations improves SPARTAN’s throughput by over 90% during inference on a Raspberry Pi 4 when compared to PE baselines (adapters) while also outperforming the latter by 0.1 points on the GLUE benchmark. Further, it can be trained 34% faster in a few-shot setting, while performing within 0.9 points of adapters. Qualitative analysis shows that different parent cells in SPARTAN specialize in different topics, thus dividing responsibility efficiently.
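A rough PyTorch sketch of a two-level sparse memory in this spirit; the dimensions, top-k parent selection rule, and residual update are assumptions for illustration, not the paper's exact architecture.

```python
# Rough sketch of a SPARTAN-style two-level sparse memory: for each token
# representation, select a sparse top-k subset of parent cells, then mix the
# corresponding child cells into a residual update. Only the memory is
# trainable; the PLM backbone stays frozen.
import torch
import torch.nn.functional as F

class SparseMemory(torch.nn.Module):
    def __init__(self, d: int, n_parents: int = 64, k: int = 4):
        super().__init__()
        self.parents = torch.nn.Parameter(torch.randn(n_parents, d) / d**0.5)
        self.children = torch.nn.Parameter(torch.randn(n_parents, d) / d**0.5)
        self.k = k

    def forward(self, h: torch.Tensor) -> torch.Tensor:  # h: (batch, d)
        scores = h @ self.parents.t()                 # (batch, n_parents)
        topv, topi = scores.topk(self.k, dim=-1)      # sparse parent selection
        weights = F.softmax(topv, dim=-1)             # (batch, k)
        selected = self.children[topi]                # (batch, k, d) child cells
        return h + (weights.unsqueeze(-1) * selected).sum(dim=1)  # residual update
```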
NLP
Controllable Text Generation with Language Constraints
Howard Chen,
Huihan Li,
Danqi Chen,
and Karthik Narasimhan
We consider the task of text generation in language models with constraints specified in natural language. To this end, we first create a challenging benchmark Cognac that provides as input to the model a topic with example text, along with a constraint on text to be avoided. Unlike prior work, our benchmark contains knowledge-intensive constraints sourced from databases like Wordnet and Wikidata, which allows for straightforward evaluation while striking a balance between broad attribute-level and narrow lexical-level controls. We find that even state-of-the-art language models like GPT-3 fail often on this task, and propose a solution to leverage a language model’s own internal knowledge to guide generation. Our method, called CognacGen, first queries the language model to generate guidance terms for a specified topic or constraint, and uses the guidance to modify the model’s token generation probabilities. We propose three forms of guidance (binary verifier, top-k tokens, textual example), and employ prefix-tuning approaches to distill the guidance to tackle diverse natural language constraints. Through extensive empirical evaluations, we demonstrate that CognacGen can successfully generalize to unseen instructions and outperform competitive baselines in generating constraint conforming text.
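A hedged sketch of guidance-modulated decoding in this spirit: the guidance terms (here, pre-tokenized `topic_ids` and `avoid_ids`, which in the paper come from querying the LM itself) shift the next-token logits before sampling.

```python
# Illustrative sketch of guidance-based decoding in the spirit of CognacGen:
# boost tokens tied to the topic and suppress tokens tied to the constraint.
# `topic_ids`/`avoid_ids` and `alpha` are assumptions for this sketch.
import torch

def guided_logits(logits: torch.Tensor,       # (vocab,) next-token logits
                  topic_ids: torch.Tensor,    # token ids from topic guidance
                  avoid_ids: torch.Tensor,    # token ids from the constraint
                  alpha: float = 2.0) -> torch.Tensor:
    out = logits.clone()
    out[topic_ids] += alpha   # steer generation toward the topic
    out[avoid_ids] -= 1e9     # effectively mask constrained tokens (binary-verifier style)
    return out
```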
NLP
RL
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Shunyu Yao,
Howard Chen,
John Yang,
and Karthik Narasimhan
Neural Information Processing Systems (NeurIPS),
2022
Existing benchmarks for grounding language in interactive environments either lack real-world linguistic elements, or prove difficult to scale up due to substantial human involvement in the collection of data or feedback signals. To bridge this gap, we develop WebShop – a simulated e-commerce website environment with 1.18 million real-world products and 12,087 crowd-sourced text instructions. Given a text instruction specifying a product requirement, an agent needs to navigate multiple types of webpages and issue diverse actions to find, customize, and purchase an item. WebShop provides several challenges for language grounding including understanding compositional instructions, query (re-)formulation, comprehending and acting on noisy text in webpages, and performing strategic exploration. We collect over 1,600 human demonstrations for the task, and train and evaluate a diverse range of agents using reinforcement learning, imitation learning, and pre-trained image and language models. Our best model achieves a task success rate of 29%, which outperforms rule-based heuristics (9.6%) but is far lower than human expert performance (59%). We also analyze agent and human trajectories and ablate various model components to provide insights for developing future agents with stronger language understanding and decision making abilities. Finally, we show that agents trained on WebShop exhibit non-trivial sim-to-real transfer when evaluated on amazon.com, indicating the potential value of WebShop in developing practical web-based agents that can operate in the wild.
NLP
CV
Semantic Supervision: Enabling Generalization over Output Spaces
Austin W. Hanjie,
Ameet Deshpande,
and Karthik Narasimhan
In this paper, we propose Semantic Supervision (SemSup) - a unified paradigm for training classifiers that generalize over output spaces. In contrast to standard classification, which treats classes as discrete symbols, SemSup represents them as dense vector features obtained from descriptions of classes (e.g., "The cat is a small carnivorous mammal"). This allows the output space to be unbounded (in the space of descriptions) and enables models to generalize both over unseen inputs and unseen outputs (e.g. "The aardvark is a nocturnal burrowing mammal with long ears"). Specifically, SemSup enables four types of generalization, to – (1) unseen class descriptions, (2) unseen classes, (3) unseen super-classes, and (4) unseen tasks. Through experiments on four classification datasets across two variants (multi-class and multi-label), two input modalities (text and images), and two output description modalities (text and JSON), we show that our SemSup models significantly outperform standard supervised models and existing models that leverage word embeddings over class names. For instance, our model outperforms baselines by 40% and 20% precision points on unseen descriptions and classes, respectively, on a news categorization dataset (RCV1). SemSup can serve as a pathway for scaling neural models to large unbounded output spaces and enabling better generalization and model reuse for unseen tasks and domains.
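The core scoring rule can be pictured as a bi-encoder: classes are scored by the similarity between the encoded input and encoded class descriptions, so new classes only require new descriptions. A minimal sketch, with the encoders assumed to be trained upstream:

```python
# Bi-encoder sketch of Semantic Supervision: class logits come from the
# similarity between an encoded input and encoded class *descriptions*,
# so unseen classes only need a new description row. Encoders are assumed
# to be any suitable text/image encoders.
import torch

def semsup_logits(input_vec: torch.Tensor,   # (d,) encoded input instance
                  desc_vecs: torch.Tensor    # (n_classes, d) encoded descriptions
                  ) -> torch.Tensor:
    return desc_vecs @ input_vec              # (n_classes,) one logit per class

# Zero-shot extension (hypothetical `encode`): adding a class at test time is
# just appending one more encoded description row; no retraining is needed.
```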
DataMUX: Data Multiplexing for Neural Networks
Vishvak Murahari,
Carlos E. Jimenez,
Runzhe Yang,
and Karthik Narasimhan
Neural Information Processing Systems (NeurIPS),
2022
In this paper, we introduce data multiplexing (DataMUX), a technique that enables deep neural networks to process multiple inputs simultaneously using a single compact representation. DataMUX demonstrates that neural networks are capable of generating accurate predictions over mixtures of inputs, resulting in increased throughput with minimal extra memory requirements. Our approach uses two key components – 1) a multiplexing layer that applies a fixed linear transformation to each input before combining them to create a mixed representation of the same size as a single input, which is then processed by the base network, and 2) a demultiplexing layer that converts the base network’s output back into independent representations before producing predictions for each input. We show the viability of DataMUX for different architectures (Transformers, and to a lesser extent MLPs and CNNs) across six different tasks spanning sentence classification, named entity recognition and image classification. For instance, DataMUX for Transformers can multiplex up to 20x/40x inputs, achieving 11x/18x increase in throughput with minimal absolute performance drops of <2% and <4% respectively on MNLI, a natural language inference task. We also provide a theoretical construction for multiplexing in self-attention networks and analyze the effect of various design elements in DataMUX.
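A compact PyTorch sketch of the multiplex/demultiplex idea described above; the fixed random multiplexing transforms and per-position linear demultiplexers are simplifications of the paper's modules (which also study, e.g., index-embedding-based demultiplexing).

```python
# Sketch of DataMUX: N inputs are transformed by fixed random matrices and
# averaged into one mixed representation, the base network runs once, and the
# output is demultiplexed back into N per-input representations.
import torch

class DataMUX(torch.nn.Module):
    def __init__(self, base: torch.nn.Module, d: int, n: int):
        super().__init__()
        self.base, self.n = base, n
        # Fixed (non-trainable) random multiplexing transforms, one per input slot.
        self.mux = torch.nn.Parameter(torch.randn(n, d, d) / d**0.5, requires_grad=False)
        self.demux = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(n))

    def forward(self, xs: torch.Tensor) -> torch.Tensor:  # xs: (n, batch, d)
        mixed = torch.stack([x @ self.mux[i] for i, x in enumerate(xs)]).mean(0)
        h = self.base(mixed)                               # one forward pass for n inputs
        return torch.stack([m(h) for m in self.demux])     # (n, batch, d)
```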
Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines
Sreejan Kumar,
Carlos G. Correa,
Ishita Dasgupta,
Raja Marjieh,
Michael Y. Hu,
Robert D. Hawkins,
Nathaniel D. Daw,
Jonathan D. Cohen,
Karthik Narasimhan,
and Thomas L. Griffiths
Neural Information Processing Systems (NeurIPS),
2022
Strong inductive biases are a key component of human intelligence, allowing people to quickly learn a variety of tasks. Although meta-learning has emerged as an approach for endowing neural networks with useful inductive biases, agents trained by meta-learning may acquire very different strategies from humans. We show that co-training these agents on predicting representations from natural language task descriptions and from programs induced to generate such tasks guides them toward human-like inductive biases. Human-generated language descriptions and program induction with library learning both result in more human-like behavior in downstream meta-reinforcement learning agents than less abstract controls (synthetic language descriptions, program induction without library learning), suggesting that the abstraction supported by these representations is key.
Learning Physics Constrained Dynamics Using Autoencoders
Tsung-Yen Yang,
Justinian P. Rosca,
Karthik Narasimhan,
and Peter Ramadge
Neural Information Processing Systems (NeurIPS),
2022
We consider the problem of estimating states (e.g., position and velocity) and physical parameters (e.g., friction, elasticity) from a sequence of observations when provided a dynamic equation that describes the behavior of the system. The dynamic equation can arise from first principles (e.g., Newton’s laws) and provide useful cues for learning, but its physical parameters are unknown. To address this problem, we propose a model that estimates states and physical parameters of the system using two main components. First, an autoencoder compresses a sequence of observations (e.g., sensor measurements, pixel images) into a state-representation sequence that is consistent with physics by including a simulation of the dynamic equation. Second, an estimator is coupled with the autoencoder to predict the values of the physical parameters. We also theoretically and empirically show that using Fourier feature mappings improves generalization of the estimator in predicting physical parameters compared to raw state sequences. In our experiments on three visual and one sensor measurement tasks, our model imposes interpretability on latent states and achieves improved generalization performance for long-term prediction of system dynamics over state-of-the-art baselines.
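A small sketch of the Fourier feature mapping the abstract refers to, in the style of random Fourier features; the frequency scale `sigma` and feature count are assumed hyperparameters.

```python
# Random Fourier feature mapping sketch: project raw state sequences through
# fixed random frequencies before the parameter estimator. `sigma` and
# `n_feats` are illustrative hyperparameters.
import torch

def make_fourier_map(d_in: int, n_feats: int = 128, sigma: float = 1.0):
    B = torch.randn(d_in, n_feats) * sigma        # sampled once, then held fixed
    def phi(x: torch.Tensor) -> torch.Tensor:     # x: (batch, d_in) -> (batch, 2*n_feats)
        proj = 2 * torch.pi * (x @ B)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
    return phi
```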
RL
NLP
Leveraging Language for Accelerated Learning of Tool Manipulation
Allen Z. Ren,
Bharat Govil,
Tsung-Yen Yang,
Karthik Narasimhan,
and Anirudha Majumdar
Robust and generalized tool manipulation requires an understanding of the properties and affordances of different tools. We investigate whether linguistic information about a tool (e.g., its geometry, common uses) can help control policies adapt faster to new tools for a given task. We obtain diverse descriptions of various tools in natural language and use pre-trained language models to generate their feature representations. We then perform language-conditioned meta-learning to learn policies that can efficiently adapt to new tools given their corresponding text descriptions. Our results demonstrate that combining linguistic information and meta-learning significantly accelerates tool learning in several manipulation tasks including pushing, lifting, sweeping, and hammering.
CV
Multi-query Video Retrieval
Zeyu Wang,
Yu Wu,
Karthik Narasimhan,
and Olga Russakovsky
Retrieving target videos based on text descriptions is a task of great practical value and has received increasing attention over the past few years. In this paper, we focus on the less-studied setting of multi-query video retrieval, where multiple queries are provided to the model for searching over the video archive. We first show that the multi-query retrieval task is more pragmatic and representative of real-world use cases and better evaluates retrieval capabilities of current models, thereby deserving of further investigation alongside the more prevalent single-query retrieval setup. We then propose several new methods for leveraging multiple queries at training time to improve over simply combining similarity outputs of multiple queries from regular single-query trained models. Our models consistently outperform several competitive baselines over three different datasets. For instance, Recall@1 can be improved by 4.7 points on MSR-VTT, 4.1 points on MSVD and 11.7 points on VATEX over a strong baseline built on the state-of-the-art CLIP4Clip model. We believe further modeling efforts will bring new insights to this direction and spark new systems that perform better in real-world video retrieval applications.
NLP
Can Rationalization Improve Robustness?
Howard Chen,
Jacqueline He,
Karthik Narasimhan,
and Danqi Chen
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL),
2022
A growing line of work has investigated the development of neural NLP models that can produce rationales, subsets of the input that explain their predictions. In this paper, we ask whether such rationale models can also provide robustness to adversarial attacks in addition to their interpretable nature. Since these models need to first generate rationales ("rationalizer") before making predictions ("predictor"), they have the potential to ignore noise or adversarially added text by simply masking it out of the generated rationale. To this end, we systematically generate various types of 'AddText' attacks for both token and sentence-level rationalization tasks, and perform an extensive empirical evaluation of state-of-the-art rationale models across five different tasks. Our experiments reveal that rationale models show promise in improving robustness, but struggle in certain scenarios, such as when the rationalizer is sensitive to positional bias or the lexical choices of the attack text. Further, leveraging human rationales as supervision does not always translate to better performance. Our study is a first step towards exploring the interplay between interpretability and robustness in the rationalize-then-predict framework.
NLP
When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer
Ameet Deshpande,
Partha Talukdar,
and Karthik Narasimhan
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL),
2022
While recent work on multilingual language models has demonstrated their capacity for cross-lingual zero-shot transfer on downstream tasks, there is a lack of consensus in the community as to what shared properties between languages enable such transfer. Analyses involving pairs of natural languages are often inconclusive and contradictory since languages simultaneously differ in many linguistic aspects. In this paper, we perform a large-scale empirical study to isolate the effects of various linguistic properties by measuring zero-shot transfer between four diverse natural languages and their counterparts constructed by modifying aspects such as the script, word order, and syntax. Among other things, our experiments show that the absence of sub-word overlap significantly affects zero-shot transfer when languages differ in their word order, and there is a strong correlation between transfer performance and word embedding alignment between languages (e.g., R=0.94 on the task of NLI). Our results call for focus in multilingual models on explicitly improving word embedding alignment between languages rather than relying on its implicit emergence.
NLP
CV
CARETS: A Consistency And Robustness Evaluative Test Suite for VQA
Carlos Jimenez,
Olga Russakovsky,
and Karthik Narasimhan
Association for Computational Linguistics (ACL),
2022
We introduce CARETS, a systematic test suite to measure consistency and robustness of modern VQA models through a series of six fine-grained capability tests. In contrast to existing VQA test sets, CARETS features balanced question generation to create pairs of instances to test models, with each pair focusing on a specific capability such as rephrasing, logical symmetry or image obfuscation. We evaluate six modern VQA systems on CARETS and identify several actionable weaknesses in model comprehension, especially with concepts such as negation, disjunction, or hypernym invariance. Interestingly, even the most sophisticated models are sensitive to aspects such as swapping the order of terms in a conjunction or changing the number of answer choices mentioned in the question. We release CARETS to be used as an extensible tool for evaluating multi-modal model robustness.
RL
Multi-Stage Episodic Control for Strategic Exploration in Text Games
Jens Tuyls,
Shunyu Yao,
Sham Kakade,
and Karthik Narasimhan
International Conference on Learning Representations (ICLR) ,
2022
Text adventure games present unique challenges to reinforcement learning methods due to their combinatorially large action spaces and sparse rewards. The interplay of these two factors is particularly demanding because large action spaces require extensive exploration, while sparse rewards provide limited feedback. This work proposes to tackle the explore-vs-exploit dilemma using a multi-stage approach that explicitly disentangles these two strategies within each episode. Our algorithm, called eXploit-Then-eXplore (XTX), begins each episode using an exploitation policy that imitates a set of promising trajectories from the past, and then switches over to an exploration policy aimed at discovering novel actions that lead to unseen state spaces. This policy decomposition allows us to combine global decisions about which parts of the game space to return to with curiosity-based local exploration in that space, motivated by how a human may approach these games. Our method significantly outperforms prior approaches by 27% and 11% average normalized score over 12 games from the Jericho benchmark (Hausknecht et al., 2020) in both deterministic and stochastic settings, respectively. On the game of Zork1, in particular, XTX obtains a score of 103, more than a 2x improvement over prior methods, and pushes past several known bottlenecks in the game that have plagued previous state-of-the-art methods.
NLP
Linking Emergent and Natural Languages via Corpus Transfer
Shunyu Yao,
Mo Yu,
Yang Zhang,
Karthik Narasimhan,
Joshua Tenenbaum,
and Chuang Gan
International Conference on Learning Representations (ICLR) ,
2022
The study of language emergence aims to understand how human languages are shaped by perceptual grounding and communicative intent. Computational approaches to emergent communication (EC) predominantly consider referential games in limited domains and analyze the learned protocol within the game framework. As a result, it remains unclear how the emergent languages from these settings connect to natural languages or provide benefits in real-world language processing tasks, where statistical models trained on large text corpora dominate. In this work, we propose a novel way to establish such a link by corpus transfer, i.e. pretraining on a corpus of emergent language for downstream natural language tasks, which is in contrast to prior work that directly transfers speaker and listener parameters. Our approach showcases non-trivial transfer benefits for two different tasks – language modeling and image captioning. For example, in a low-resource setup (modeling 2 million natural language tokens), pre-training on an emergent language corpus with just 2 million tokens reduces model perplexity by 24.6% on average across ten natural languages. We also introduce a novel metric to predict the transferability of an emergent language by translating emergent messages to natural language captions grounded on the same images. We find that our translation-based metric highly correlates with the downstream performance on modeling natural languages (for instance ρ= 0.83 on Hebrew), while topographic similarity, a popular metric in previous works, shows surprisingly low correlation (0.003), hinting that simple properties like attribute disentanglement from synthetic domains might not capture the full complexities of natural language. Our findings also indicate potential benefits of moving language emergence forward with natural language resources and models.
Revelio: ML-Generated Debugging Queries for Finding Root Causes in Distributed Systems
Pradeep Dogga,
Karthik Narasimhan,
Anirudh Sivaraman,
Shiv Saini,
George Varghese,
and Ravi Netravali
Proceedings of Machine Learning and Systems (MLSys),
2022
A major difficulty in debugging distributed systems lies in manually determining which of the many available debugging tools to use and how to query that tool’s logs. Our own study of a production debugging workflow confirms the magnitude of this burden. This paper explores whether a deep neural network trained on past bug reports and debugging logs can assist developers in distributed systems debugging. We present Revelio, a debugging assistant which takes user reports and system logs as input, and outputs debugging queries that developers can use to find a bug’s root cause. The key challenges lie in (1) combining inputs of different types (e.g., natural language reports and quantitative logs) and (2) generalizing to unseen faults. Revelio addresses these by employing deep neural networks to uniformly embed diverse input sources and potential queries into a high-dimensional vector space. In addition, it exploits observations from production systems to factorize query generation into two computationally and statistically simpler learning tasks. To evaluate Revelio, we built a testbed with multiple distributed applications and debugging tools. By injecting faults and training on logs and reports from 800 Mechanical Turkers, we show that Revelio includes the most helpful query in its predicted list of top-3 relevant queries 96% of the time. Our developer study confirms the utility of Revelio.
2021
NLP
RL
SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark
Victor Zhong,
Austin W. Hanjie,
Sida I. Wang,
Karthik Narasimhan,
and Luke Zettlemoyer
Neural Information Processing Systems (NeurIPS),
2021
Existing work in language grounding typically studies single environments. How do we build unified models that apply across multiple environments? We propose the multi-environment Symbolic Interactive Language Grounding benchmark (SILG), which unifies a collection of diverse grounded language learning environments under a common interface. SILG consists of grid-world environments that require generalization to new dynamics, entities, and partially observed worlds (RTFM, Messenger, NetHack), as well as symbolic counterparts of visual worlds that require interpreting rich natural language with respect to complex scenes (ALFWorld, Touchdown). Together, these environments provide diverse grounding challenges in richness of observation space, action space, language specification, and plan complexity. In addition, we propose the first shared model architecture for RL on these environments, and evaluate recent advances such as egocentric local convolution, recurrent state-tracking, entity-centric attention, and pretrained LM using SILG. Our shared architecture achieves comparable performance to environment-specific architectures. Moreover, we find that many recent modelling advances do not result in significant gains on environments other than the one they were designed for. This highlights the need for a multi-environment benchmark. Finally, the best models significantly underperform humans on SILG, which suggests ample room for future work. We hope SILG enables the community to quickly identify new methodologies for language grounding that generalize to a diverse set of environments and their associated challenges.
NLP
RL
Safe Reinforcement Learning with Natural Language Constraints
Tsung-Yen Yang,
Michael Hu,
Yinlam Chow,
Peter J. Ramadge,
and Karthik Narasimhan
Neural Information Processing Systems (NeurIPS),
2021
In this paper, we tackle the problem of learning control policies for tasks when provided with constraints in natural language. In contrast to instruction following, language here is used not to specify goals, but rather to describe situations that an agent must avoid during its exploration of the environment. Specifying constraints in natural language also differs from the predominant paradigm in safe reinforcement learning, where safety criteria are enforced by hand-defined cost functions. While natural language allows for easy and flexible specification of safety constraints and budget limitations, its ambiguous nature presents a challenge when mapping these specifications into representations that can be used by techniques for safe reinforcement learning. To address this, we develop a model that contains two components: (1) a constraint interpreter to encode natural language constraints into vector representations capturing spatial and temporal information on forbidden states, and (2) a policy network that uses these representations to output a policy with minimal constraint violations. Our model is end-to-end differentiable and we train it using a recently proposed algorithm for constrained policy optimization. To empirically demonstrate the effectiveness of our approach, we create a new benchmark task for autonomous navigation with crowd-sourced free-form text specifying three different types of constraints. Our method outperforms several baselines by achieving 6-7 times higher returns and 76% fewer constraint violations on average.
NLP
RL
Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning
Austin W. Hanjie,
Victor Zhong,
and Karthik Narasimhan
International Conference on Machine Learning (ICML),
2021
We consider the problem of leveraging textual descriptions to improve generalization of control policies. We introduce a new multi-task environment Messenger with free-form natural language manuals describing the environment dynamics. In contrast to previous work, Messenger does not assume prior knowledge connecting text and state observations – the control policy must simultaneously learn to ground a natural language manual to entity symbols and dynamics in the environment. In order to learn this challenging grounding, we develop a new model, EMMA (Entity Mapper with Multi-modal Attention) which uses a multi-modal entity-conditioned attention module that allows for selective focus over relevant sentences in the manual for each entity in the environment. EMMA is end-to-end differentiable and can learn a latent grounding of entities and dynamics from text to observations using environment rewards as the only source of supervision. We demonstrate that EMMA achieves successful zero-shot generalization to unseen games with new dynamics, obtaining significantly higher rewards compared to multiple baselines. However, performance on the hardest stage of Messenger remains low, demonstrating the significant challenge in accurately grounding dynamics and the need for additional work in this direction.
RL
Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies
Tsung-Yen Yang,
Justinian Rosca,
Karthik Narasimhan,
and Peter J. Ramadge
International Conference on Machine Learning (ICML),
2021
We consider the problem of reinforcement learning when provided with (1) a baseline control policy and (2) a set of constraints that the controlled system must satisfy. The baseline policy can arise from a teacher agent, demonstration data or even a heuristic while the constraints might encode safety, fairness or other application-specific requirements. Importantly, the baseline policy may be sub-optimal for the task at hand, and is not guaranteed to satisfy the specified constraints. The key challenge therefore lies in effectively leveraging the baseline policy for faster learning, while still ensuring that the constraints are minimally violated. To reconcile these potentially competing aspects, we propose an iterative policy optimization algorithm that alternates between maximizing expected return on the task, minimizing distance to the baseline policy, and projecting the policy onto the constraint-satisfying set. We analyze the convergence of our algorithm theoretically and provide a finite-time guarantee. In our empirical experiments on five different control tasks, our algorithm consistently outperforms several state-of-the-art methods, achieving 10 times fewer constraint violations and 40% higher reward on average.
NLP
RL
Improving Dialog Systems for Negotiation with Personality Modeling
Runzhe Yang*,
Jingxiao Chen*,
and Karthik Narasimhan
Association for Computational Linguistics (ACL),
2021
In this paper, we explore the ability to model and infer personality types of opponents, predict their responses, and use this information to adapt a dialog agent’s high-level strategy in negotiation tasks. Inspired by the idea of incorporating a theory of mind (ToM) into machines, we introduce a probabilistic formulation to encapsulate the opponent’s personality type during both learning and inference. We test our approach on the CraigslistBargain dataset and show that our method using ToM inference achieves a 20% higher dialog agreement rate compared to baselines on a mixed population of opponents. We also demonstrate that our model displays diverse negotiation behavior with different types of opponents.
NLP
Self-Attention Networks Can Process Bounded Hierarchical Languages
Shunyu Yao,
Binghui Peng,
Christos Papadimitriou,
and Karthik Narasimhan
Association for Computational Linguistics (ACL),
2021
Despite their impressive performance in NLP, self-attention networks were recently proved to be limited for processing formal languages with hierarchical structure, such as Dyck_k, the language consisting of well-nested parentheses of k types. This suggested that natural language can be approximated well with models that are too weak for formal languages, or that the role of hierarchy and recursion in natural language might be limited. We qualify this implication by proving that self-attention networks can process Dyck_{k,D}, the subset of Dyck_k with depth bounded by D, which arguably better captures the bounded hierarchical structure of natural language. Specifically, we construct a hard-attention network with D+1 layers and O(log k) memory size (per token per layer) that recognizes Dyck_{k,D}, and a soft-attention network with two layers and O(log k) memory size that generates Dyck_{k,D}. Experiments show that self-attention networks trained on Dyck_{k,D} generalize to longer inputs with near-perfect accuracy, and also verify the theoretical memory advantage of self-attention networks over recurrent networks.
Connecting Context-specific Adaptation in Humans to Meta-learning
Rachit Dubey,
Erin Grant,
Michael Luo,
Karthik Narasimhan,
and Thomas Griffiths
Cognitive control, the ability of a system to adapt to the demands of a task, is an integral part of cognition. A widely accepted fact about cognitive control is that it is context-sensitive: Adults and children alike infer information about a task’s demands from contextual cues and use these inferences to learn from ambiguous cues. However, the precise way in which people use contextual cues to guide adaptation to a new task remains poorly understood. This work connects the context-sensitive nature of cognitive control to a method for meta-learning with context-conditioned adaptation. We begin by identifying an essential difference between human learning and current approaches to meta-learning: In contrast to humans, existing meta-learning algorithms do not make use of task-specific contextual cues but instead rely exclusively on online feedback in the form of task-specific labels or rewards. To remedy this, we introduce a framework for using contextual information about a task to guide the initialization of task-specific models before adaptation to online feedback. We show how context-conditioned meta-learning can capture human behavior in a cognitive task and how it can be scaled to improve the speed of learning in various settings, including few-shot classification and low-sample reinforcement learning. Our work demonstrates that guiding meta-learning with task information can capture complex, human-like behavior, thereby deepening our understanding of cognitive control.
NLP
RL
Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents
Shunyu Yao,
Karthik Narasimhan,
and Matthew Hausknecht
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL),
2021
Text-based games simulate worlds and interact with players using natural language. Recent work has used them as a testbed for autonomous language-understanding agents, with the motivation being that understanding the meanings of words or semantics is a key component of how humans understand, reason, and act in these worlds.
However, it remains unclear to what extent artificial agents utilize semantic understanding of the text. To this end, we perform experiments to systematically reduce the amount of semantic information available to a learning agent.
Surprisingly, we find that an agent is capable of achieving high scores even in the complete absence of language semantics, indicating that the currently popular experimental setup and models may be poorly designed to understand and leverage game texts. To remedy this deficiency, we propose an inverse dynamics decoder to regularize the representation space and encourage exploration, which shows improved performance on several games including Zork I. We discuss the implications of our findings for designing future agents with stronger semantic understanding.
NLP
Universal Adversarial Attacks with Natural Triggers for Text Classification
Liwei Song*,
Xinwei Yu*,
Hsuan-Tung Peng*,
and Karthik Narasimhan
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL),
2021
Recent work has demonstrated the vulnerability of modern text classifiers to universal adversarial attacks, which are input-agnostic sequences of words added to any input instance. Despite being highly successful, the word sequences produced in these attacks are often unnatural, do not carry much semantic meaning, and can be easily distinguished from natural text. In this paper, we develop adversarial attacks that appear closer to natural English phrases and yet confuse classification systems when added to benign inputs. To achieve this, we leverage an adversarially regularized autoencoder (ARAE) to generate triggers and propose a gradient-based search method to output natural text that fools a target classifier. Experiments on two different classification tasks demonstrate the effectiveness of our attacks while also being less identifiable than previous approaches on three simple detection metrics.
NLP
RL
Learning Rewards from Linguistic Feedback
Theodore R. Sumers,
Mark K. Ho,
Robert D. Hawkins,
Karthik Narasimhan,
and Thomas L. Griffiths
Thirty-Fifth AAAI Conference on Artificial Intelligence,
2021
We explore unconstrained natural language feedback as a learning signal for artificial agents. Humans use rich and varied language to teach, yet most prior work on interactive learning from language assumes a particular form of input (e.g. commands). We propose a general framework which does not make this assumption. We decompose linguistic feedback into two components: a grounding to features of a Markov decision process and sentiment about those features. We then perform an analogue of inverse reinforcement learning, regressing the teacher’s sentiment on the features to infer their latent reward function. To evaluate our approach, we first collect a corpus of teaching behavior in a cooperative task where both teacher and learner are human. We use our framework to implement two artificial learners: a simple "literal" model and a "pragmatic" model with additional inductive biases. We baseline these with a neural network trained end-to-end to predict latent rewards. We then repeat our initial experiment pairing human teachers with our models. We find our "literal" and "pragmatic" models successfully learn from live human feedback and offer statistically-significant performance gains over the end-to-end baseline, with the "pragmatic" model approaching human performance on the task. Inspection reveals the end-to-end network learns representations similar to our models, suggesting they reflect emergent properties of the data. Our work thus provides insight into the information structure of naturalistic linguistic feedback as well as methods to leverage it for reinforcement learning.
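The regression step described above admits a very small sketch, assuming feature grounding and sentiment extraction happen upstream.

```python
# NumPy sketch of the paper's core step: regress utterance sentiment onto
# grounded MDP features to estimate a latent reward vector (an inverse-RL
# analogue). Feature extraction and sentiment analysis are assumed upstream.
import numpy as np

def infer_reward(features: np.ndarray,    # (n_utterances, n_features) grounded features
                 sentiment: np.ndarray    # (n_utterances,) sentiment scores in [-1, 1]
                 ) -> np.ndarray:
    # Least-squares estimate of reward weights: sentiment ~ features @ w
    w, *_ = np.linalg.lstsq(features, sentiment, rcond=None)
    return w
```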
RL
m-Stage Epsilon-Greedy Exploration for Reinforcement Learning
Rohan Rao,
and Karthik Narasimhan
AAAI-21 Workshop on Reinforcement Learning in Games,
2021
Efficient exploration of the environment is a major challenge for reinforcement learning agents, especially in sparse-reward settings. This is evident from the fact that simple schemes such as eps-greedy remain competitive with more complicated algorithms for exploration. In this paper, we propose a generalization of eps-greedy, called m-stage eps-greedy, in which eps increases within each episode but decreases between episodes. This ensures that by the time an agent gets to explore the later states within an episode, eps has not decayed too far for meaningful exploration. We provide theoretical results motivating the use of our algorithm in task-based environments, and provide experimental evidence in two types of environments demonstrating the effectiveness of our method.
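A minimal sketch of such a schedule; the number of stages, horizon, and decay rate below are illustrative assumptions, not the paper's constants.

```python
# Sketch of an m-stage eps-greedy schedule: epsilon grows with the step index
# *within* an episode (later states get more exploration) while the whole
# schedule is annealed *across* episodes.
def epsilon(step: int, episode: int,
            m: int = 4, horizon: int = 100,
            eps_max: float = 1.0, decay: float = 0.995) -> float:
    stage = min(m - 1, step * m // horizon)   # which of the m stages we are in
    within = eps_max * (stage + 1) / m        # increases within an episode
    return within * (decay ** episode)        # decreases between episodes
```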
2020
NLP
RL
Keep CALM and Explore: Language Models for Action Generation in Text-based Games
Shunyu Yao,
Rohan Rao,
Matthew Hausknecht,
and Karthik Narasimhan
Empirical Methods in Natural Language Processing (EMNLP),
2020
Text-based games present a unique challenge for autonomous agents to operate in natural language and handle enormous action spaces. In this paper, we propose the Contextual Action Language Model (CALM) to generate a compact set of action candidates at each game state. Our key insight is to train language models on human gameplay, where people demonstrate linguistic priors and a general game sense for promising actions conditioned on game history. We combine CALM with a reinforcement learning agent which re-ranks the generated action candidates to maximize in-game rewards. We evaluate our approach using the Jericho benchmark (Hausknecht et al., 2019a), on games unseen by CALM during training. Our method obtains a 69% relative improvement in average game score over the previous state-of-the-art model. Surprisingly, on half of these games, CALM is competitive with or better than other models that have access to ground truth admissible actions.
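The decomposition can be sketched in a few lines, with `lm_propose` and `q_value` as hypothetical handles to the trained language model and the RL re-ranker.

```python
# Sketch of CALM's decomposition: a language model proposes a compact set of
# action candidates from the game context, and an RL policy re-ranks them.
# `lm_propose` and `q_value` are placeholders for the trained components.
from typing import Callable, List

def act(context: str,
        lm_propose: Callable[[str, int], List[str]],  # top-k candidate actions
        q_value: Callable[[str, str], float],         # learned value of (context, action)
        k: int = 30) -> str:
    candidates = lm_propose(context, k)   # compact, linguistically plausible action set
    return max(candidates, key=lambda a: q_value(context, a))
```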
NLP
Guiding Attention for Self-Supervised Learning with Transformers
Ameet Deshpande,
and Karthik Narasimhan
Findings of Empirical Methods in Natural Language Processing (EMNLP),
2020
In this paper, we propose a simple and effective technique to allow for efficient self-supervised learning with bi-directional Transformers. Our approach is motivated by recent studies demonstrating that self-attention patterns in trained models contain a majority of non-linguistic regularities. We propose a computationally efficient auxiliary loss function to guide attention heads to conform to such patterns. Our method is agnostic to the actual pretraining objective and results in faster convergence of models as well as better performance on downstream tasks compared to the baselines, even achieving state-of-the-art results on a low-resource language. We conclude with a surprising finding that linguistic properties of attention heads are not necessarily correlated with language modeling performance.
NLP
Robust and Interpretable Grounding of Spatial References with Relation Networks
Tsung-Yen Yang,
Andrew S. Lan,
and Karthik Narasimhan
Findings of Empirical Methods in Natural Language Processing (EMNLP),
2020
Handling spatial references in natural language is a key challenge in tasks like autonomous navigation and robotic manipulation. Recent work has investigated various neural architectures for learning multi-modal representations of spatial concepts that generalize well across a variety of observations and text instructions. In this work, we develop accurate models for understanding spatial references in text that are also robust and interpretable. We design a text-conditioned relation network whose parameters are dynamically computed with a cross-modal attention module to capture fine-grained spatial relations between entities. Our experiments across three different prediction tasks demonstrate the effectiveness of our model compared to existing state-of-the-art systems. Our model is robust to both observational and instructional noise, and lends itself to easy interpretation through visualization of intermediate outputs.
NLP
CV
Multimodal Graph Networks for Compositional Generalization in Visual Question Answering
Raeid Saqur,
and Karthik Narasimhan
Neural Information Processing Systems (NeurIPS),
2020
Compositional generalization is a key challenge in grounding natural language to visual perception. While deep learning models have achieved great success in multimodal tasks like visual question answering, recent studies have shown that they fail to generalize to new inputs that are simply an unseen combination of those seen in the training distribution. In this paper, we propose to tackle this challenge by employing neural factor graphs to induce a tighter coupling between concepts in different modalities (e.g. images and text). Graph representations are inherently compositional in nature and allow us to capture entities, attributes and relations in a scalable manner. Our model first creates a multimodal graph, processes it with a graph neural network to induce a factor correspondence matrix, and then outputs a symbolic program to predict answers to questions. Empirically, our model achieves close to perfect scores on a caption truth prediction problem and state-of-the-art results on the recently introduced CLOSURE dataset, improving on the mean overall accuracy across seven compositional templates by 4.77% over previous approaches.
NLP
CV
Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation
Zhiwei Deng,
Karthik Narasimhan,
and Olga Russakovsky
Neural Information Processing Systems (NeurIPS),
2020
The ability to perform effective planning is crucial for building an instruction-following agent. When navigating through a new environment, an agent is challenged with (1) connecting the natural language instructions with its progressively growing knowledge of the world; and (2) performing long-range planning and decision making in the form of effective exploration and error correction. Current methods are still limited on both fronts despite extensive efforts. In this paper, we introduce the Evolving Graphical Planner (EGP), a model that performs global planning for navigation based on raw sensory input. The model dynamically constructs a graphical representation, generalizes the action space to allow for more flexible decision making, and performs efficient planning on a proxy graph representation. We evaluate our model on a challenging Vision-and-Language Navigation (VLN) task with photorealistic images and achieve superior performance compared to previous navigation architectures. For instance, we achieve a 53% success rate on the test split of the Room-to-Room navigation task through pure imitation learning, outperforming previous navigation architectures by up to 5%.
NLP
CV
Towards Unique and Informative Captioning of Images
Zeyu Wang,
Berthy Feng,
Karthik Narasimhan,
and Olga Russakovsky
European Conference on Computer Vision (ECCV),
2020
Despite considerable progress, state of the art image captioning models produce generic captions, leaving out important image details. Furthermore, these systems may even misrepresent the image in order to produce a simpler caption consisting of common concepts. In this paper, we first analyze both modern captioning systems and evaluation metrics through empirical experiments to quantify these phenomena. We find that modern captioning systems return higher likelihoods for incorrect distractor sentences compared to ground truth captions, and that evaluation metrics like SPICE can be ‘topped’ using simple captioning systems relying on object detectors. Inspired by these observations, we design a new metric (SPICE-U) by introducing a notion of uniqueness over the concepts generated in a caption. We show that SPICE-U is better correlated with human judgements compared to SPICE, and effectively captures notions of diversity and descriptiveness. Finally, we also demonstrate a general technique to improve any existing captioning model – by using mutual information as a re-ranking objective during decoding. Empirically, this results in more unique and informative captions, and improves two different state-of-the-art models on SPICE-U as well as average score over existing metrics.
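A minimal sketch of the general mutual-information re-ranking idea, assuming two placeholder scoring callables (a captioning model and an unconditional language model); this illustrates the principle rather than the paper's implementation.

def rerank_by_specificity(candidates, log_p_caption_given_image, log_p_caption, weight=1.0):
    # Re-rank beam-search captions by an approximate pointwise mutual information:
    #     score(c) = log p(c | image) - weight * log p(c)
    # Captions that are likely given the image but generic a priori are pushed down.
    # Both scoring functions are hypothetical stand-ins for real models.
    return sorted(candidates,
                  key=lambda c: log_p_caption_given_image(c) - weight * log_p_caption(c),
                  reverse=True)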
NLP
Calibration, Entropy Rates, and Memory in Language Models
Mark Braverman,
Xinyi Chen,
Sham Kakade,
Karthik Narasimhan,
Cyril Zhang,
and Yi Zhang
International Conference on Machine Learning (ICML),
2020
Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are miscalibrated: the entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction.
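A small illustrative diagnostic, assuming you already have the model's per-token log-probabilities of its own generations; it bins positions and averages negative log-probability per bin, so a rising curve indicates upward entropy-rate drift. This is a sketch of the measurement idea, not the paper's exact estimator.

import numpy as np

def entropy_rate_curve(token_logprobs_per_generation, window=50):
    # token_logprobs_per_generation: list of 1-D arrays, each holding the model's
    # log-probability of the token it actually generated at every step.
    max_len = max(len(seq) for seq in token_logprobs_per_generation)
    curve = []
    for start in range(0, max_len, window):
        vals = [-lp for seq in token_logprobs_per_generation
                for lp in seq[start:start + window]]
        curve.append(float(np.mean(vals)) if vals else float("nan"))
    return curve  # average nats per token in each position bucket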
RL
Projection Based Constrained Policy Optimization
Tsung-Yen Yang,
Justinian Rosca,
Karthik Narasimhan,
and Peter J. Ramadge
International Conference on Learning Representations (ICLR),
2020
In this paper, we consider the problem of learning control policies that optimize a reward function while satisfying constraints due to considerations of safety, fairness, or other costs. We propose a new algorithm - Projection Based Constrained Policy Optimization (PCPO), an iterative method for optimizing policies in a two-step process - the first step performs an unconstrained update while the second step reconciles the constraint violation by projecting the policy back onto the constraint set. We theoretically analyze PCPO and provide a lower bound on reward improvement, as well as an upper bound on constraint violation for each policy update. We further characterize the convergence of PCPO with projection based on two different metrics - L2 norm and Kullback-Leibler divergence. Our empirical results over several control tasks demonstrate that our algorithm achieves superior performance, averaging more than 3.5 times less constraint violation and around 15% higher reward compared to state-of-the-art methods.
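A toy numpy sketch of the two-step structure (unconstrained reward step, then an L2 projection back onto a linearized cost constraint). The plain-gradient update and the half-space projection are deliberate simplifications; the paper works with trust-region updates and also analyzes a KL-based projection.

import numpy as np

def pcpo_style_step(theta, reward_grad, cost_grad, cost_value, cost_limit, lr=0.01):
    # Step 1: unconstrained ascent on the reward objective.
    theta_mid = theta + lr * reward_grad

    # Step 2: project onto the half-space obtained by linearizing the cost
    # constraint around theta:  cost_value + cost_grad^T (x - theta) <= cost_limit.
    violation = cost_value + cost_grad @ (theta_mid - theta) - cost_limit
    if violation > 0:
        theta_mid = theta_mid - (violation / (cost_grad @ cost_grad + 1e-8)) * cost_grad
    return theta_mid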
NLP
CV
Take the scenic route: improving generalization in vision-and-language navigation
Felix Yu,
Zhiwei Deng,
Karthik Narasimhan,
and Olga Russakovsky
CVPR Visual Learning with Limited Labels Workshop,
2020
In the Vision-and-Language Navigation (VLN) task, an agent with egocentric vision navigates to a destination given natural language instructions. Manually annotating these instructions is time-consuming and expensive, so many existing approaches automatically generate additional samples to improve agent performance. However, these approaches still have difficulty generalizing their performance to new environments. In this work, we investigate the popular Room-to-Room (R2R) VLN benchmark and discover that what is important is not only the amount of data you synthesize, but also how you do it. We find that shortest path sampling, which is used by both the R2R benchmark and existing augmentation methods, encodes biases in the action space of the agent which we dub action priors. We then show that these action priors offer one explanation toward the poor generalization of existing works. To mitigate such priors, we propose a path sampling method based on random walks to augment the data. By training with this augmentation strategy, our agent is able to generalize better to unknown environments compared to the baseline, significantly improving model performance in the process.
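A minimal sketch contrasting shortest-path sampling with random-walk sampling on a navigation graph, assuming the environment is represented as a plain adjacency dictionary; function names and the walk's termination rule are illustrative, not the paper's sampler.

import random
from collections import deque

def shortest_path(graph, start, goal):
    # BFS shortest path on an unweighted navigation graph (adjacency dict).
    queue, parent = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def random_walk_path(graph, start, max_len=10, rng=random):
    # Sample a path by a simple self-avoiding random walk instead of the
    # shortest path, so augmented trajectories do not all share the same
    # action priors.
    path = [start]
    while len(path) < max_len:
        neighbors = [n for n in graph[path[-1]] if n not in path]
        if not neighbors:
            break
        path.append(rng.choice(neighbors))
    return path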
2019
RL
A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation
Runzhe Yang,
Xingyuan Sun,
and Karthik Narasimhan
Neural Information Processing Systems (NeurIPS),
2019
We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce optimal policies under different preference conditions. We propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences. After this initial learning phase, our agent can quickly adapt to any given preference, or automatically infer an underlying preference with very few samples. Experiments across four different domains demonstrate the effectiveness of our approach.
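For concreteness, one common way to write a preference-conditioned ("envelope") Bellman optimality backup for vector-valued Q with linear preferences is sketched below; the notation is ours and may differ from the paper's exact operator:

(\mathcal{T}Q)(s, a, \omega) = \mathbf{r}(s, a) + \gamma\,\mathbb{E}_{s'}\big[\, Q(s', a^{*}, \omega^{*}) \,\big], \qquad (a^{*}, \omega^{*}) \in \arg\max_{a', \omega'} \; \omega^{\top} Q(s', a', \omega'),

where Q returns a vector of per-objective returns and scalarization with the current preference \omega is applied only inside the arg max.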
RL
Task-Agnostic Dynamics Priors for Deep Reinforcement Learning
Yilun Du,
and Karthik Narasimhan
International Conference on Machine Learning (ICML),
2019
While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is often challenging and requires substantial interaction with the environment. A wide variety of domains have dynamics that share common foundations like the laws of classical mechanics, which are rarely exploited by existing algorithms. In fact, humans continuously acquire and use such dynamics priors to easily adapt to operating in new environments. In this work, we propose an approach to learn task-agnostic dynamics priors from videos and incorporate them into an RL agent. Our method involves pre-training a frame predictor on task-agnostic physics videos to initialize dynamics models (and fine-tune them) for unseen target environments. Our frame prediction architecture, SpatialNet, is designed specifically to capture localized physical phenomena and interactions. Our approach allows for both faster policy learning and convergence to better policies, outperforming competitive approaches on several different environments. We also demonstrate that incorporating this prior allows for more effective transfer between environments.
2018
NLP
RL
Deep Transfer in Reinforcement Learning by Language Grounding
Karthik Narasimhan,
Regina Barzilay,
and Tommi Jaakkola
Journal of Artificial Intelligence Research (JAIR),
2018
In this paper, we explore the utilization of natural language to drive transfer for reinforcement learning (RL). Despite the wide-spread application of deep RL techniques, learning generalized policy representations that work across domains remains a challenging problem. We demonstrate that textual descriptions of environments provide a compact intermediate channel to facilitate effective policy transfer. We employ a model-based RL approach consisting of a differentiable planning module, a model-free component and a factorized representation to effectively utilize entity descriptions. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments.
NLP
Improving language understanding by generative pre-training
Alec Radford,
Karthik Narasimhan,
Tim Salimans,
and Ilya Sutskever
Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).
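A small sketch of the kind of task-aware input transformation described above: structured inputs are linearized into a single token sequence so that fine-tuning needs no new architecture. The special-token strings and field names here are illustrative placeholders, not the exact ones used in the paper.

def format_for_finetuning(task, example, start="<s>", delim="$", end="<e>"):
    # Convert a structured task instance into one (or several) token sequences
    # for a left-to-right pretrained language model.
    if task == "entailment":
        return f"{start} {example['premise']} {delim} {example['hypothesis']} {end}"
    if task == "similarity":
        # Order-insensitive pairs can be encoded in both orders and pooled downstream.
        a, b = example["text_a"], example["text_b"]
        return [f"{start} {a} {delim} {b} {end}", f"{start} {b} {delim} {a} {end}"]
    if task == "multiple_choice":
        return [f"{start} {example['context']} {delim} {answer} {end}"
                for answer in example["answers"]]
    return f"{start} {example['text']} {end}"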
2017
NLP
RL
Grounding Natural Language with Autonomous Interaction
The resurgence of deep neural networks has resulted in impressive advances in natural language processing (NLP). This success, however, is contingent on access to large amounts of structured supervision, often manually constructed and unavailable for many applications and domains. In this thesis, I present novel computational models that integrate reinforcement learning with language understanding to induce grounded representations of semantics. Using unstructured feedback, these techniques not only enable task-optimized representations which reduce dependence on high quality annotations, but also exploit language in adapting control policies across different environments.
First, I describe an approach for learning to play text-based games, where all interaction is through natural language and the only source of feedback is in-game rewards. Employing a deep reinforcement learning framework to jointly learn state representations and action policies, our model outperforms several baselines on different domains, demonstrating the importance of learning expressive representations.
Second, I exhibit a framework for utilizing textual descriptions to tackle the challenging problem of cross-domain policy transfer for reinforcement learning (RL). We employ a model-based RL approach consisting of a differentiable planning module, a model-free component and a factorized state representation to effectively make use of text. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments.
Finally, I demonstrate how reinforcement learning can enhance traditional NLP systems in low resource scenarios. In particular, I describe an autonomous agent that can learn to acquire and integrate external information to enhance information extraction. Our experiments on two databases – shooting incidents and food adulteration cases – demonstrate that our system significantly improves over traditional extractors and a competitive meta-classifier baseline.
NLP
RL
Representation Learning for Grounded Spatial Reasoning
Michael Janner,
Karthik Narasimhan,
and Regina Barzilay
Transactions of the Association for Computational Linguistics (TACL),
2017
The interpretation of spatial references is highly contextual, requiring joint inference over both language and the environment. We consider the task of spatial reasoning in a simulated environment, where an agent can act and receive rewards. The proposed model learns a representation of the world steered by instruction text. This design allows for precise alignment of local neighborhoods with corresponding verbalizations, while also handling global references in the instructions. We train our model with reinforcement learning using a variant of generalized value iteration. The model outperforms state-of-the-art approaches on several metrics, yielding a 45% reduction in goal localization error.
NLP
Unsupervised Learning of Morphological Forests
Jiaming Luo,
Karthik Narasimhan,
and Regina Barzilay
Transactions of the Association for Computational Linguistics (TACL),
2017
This paper focuses on unsupervised modeling of morphological families, collectively comprising a forest over the language vocabulary. This formulation enables us to capture edge-wise properties reflecting single-step morphological derivations, along with global distributional properties of the entire forest. These global properties constrain the size of the affix set and encourage formation of tight morphological families. The resulting objective is solved using Integer Linear Programming (ILP) paired with contrastive estimation. We train the model by alternating between optimizing the local log-linear model and the global ILP objective. We evaluate our system on three tasks: root detection, clustering of morphological families and segmentation. Our experiments demonstrate that our model yields consistent gains in all three tasks compared with the best published results.
NLP
Constructing sub-word units for Spoken Term Detection
Charl Heerden,
Damianos Karakos,
Karthik Narasimhan,
Marelie Davel,
and Richard Schwartz
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
2017
Spoken term detection, especially of out-of-vocabulary (OOV) keywords, benefits from the use of sub-word systems. We experiment with different language-independent approaches to sub-word unit generation, generating both syllable-like and morpheme-like units, and demonstrate how the performance of syllable-like units can be improved by artificially increasing the number of unique units. The effect of unit choice is empirically evaluated using the eight languages from the 2016 IARPA BABEL evaluation.
2016
NLP
RL
Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
Karthik Narasimhan,
Adam Yala,
and Regina Barzilay
Empirical Methods in Natural Language Processing (EMNLP),
2016
Most successful information extraction systems operate with access to a large collection of documents. In this work, we explore the task of acquiring and incorporating external evidence to improve extraction accuracy in domains where the amount of training data is scarce. This process entails issuing search queries, extraction from new sources and reconciliation of extracted values, which are repeated until sufficient evidence is collected. We approach the problem using a reinforcement learning framework where our model learns to select optimal actions based on contextual information. We employ a deep Q-network, trained to optimize a reward function that reflects extraction accuracy while penalizing extra effort. Our experiments on two databases – of shooting incidents, and food adulteration cases – demonstrate that our system significantly outperforms traditional extractors and a competitive meta-classifier baseline.
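A schematic stand-in for the kind of reward described above, trading off the change in extraction accuracy against a per-step effort penalty; the exact-match slot comparison and penalty value are illustrative assumptions.

def extraction_reward(new_values, old_values, gold, step_penalty=0.1):
    # Reward for one query/reconciliation step: improvement in per-slot
    # extraction accuracy minus a small cost for the extra effort.
    def accuracy(values):
        return sum(values.get(slot) == answer for slot, answer in gold.items()) / len(gold)
    return accuracy(new_values) - accuracy(old_values) - step_penalty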
RL
Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation
Tejas D Kulkarni*,
Karthik R Narasimhan*,
Ardavan Saeedi,
and Joshua B Tenenbaum
Neural Information Processing Systems (NIPS),
2016
Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms. One of the key difficulties is insufficient exploration, resulting in an agent being unable to learn robust policies. Intrinsically motivated agents can explore new behavior for their own sake rather than to directly solve external goals. Such intrinsic behaviors could eventually help the agent solve tasks posed by the environment. We present hierarchical-DQN (h-DQN), a framework to integrate hierarchical action-value functions, operating at different temporal scales, with goal-driven intrinsically motivated deep reinforcement learning. A top-level q-value function learns a policy over intrinsic goals, while a lower-level function learns a policy over atomic actions to satisfy the given goals. h-DQN allows for flexible goal specifications, such as functions over entities and relations. This provides an efficient space for exploration in complicated environments. We demonstrate the strength of our approach on two problems with very sparse and delayed feedback: (1) a complex discrete stochastic decision process with stochastic transitions, and (2) the classic ATARI game – ‘Montezuma’s Revenge’.
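A schematic two-level control loop in the spirit of the description above: a meta-controller picks an intrinsic goal, a controller takes atomic actions and is rewarded intrinsically for reaching it, while the meta-controller learns from the accumulated extrinsic reward. All objects and method names are hypothetical placeholders, not the paper's implementation.

def hierarchical_episode(env, meta_controller, controller, goals):
    state, done = env.reset(), False
    while not done:
        goal = meta_controller.select_goal(state, goals)          # top-level choice
        start_state, extrinsic = state, 0.0
        while not done and not env.goal_reached(state, goal):
            action = controller.select_action(state, goal)        # low-level choice
            next_state, reward, done = env.step(action)
            intrinsic = 1.0 if env.goal_reached(next_state, goal) else 0.0
            controller.store(state, goal, action, intrinsic, next_state, done)
            extrinsic += reward
            state = next_state
        meta_controller.store(start_state, goal, extrinsic, state, done)
        controller.update()
        meta_controller.update()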
NLP
Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge
Nicholas Locascio,
Karthik Narasimhan,
Eduardo DeLeon,
Nate Kushman,
and Regina Barzilay
Empirical Methods in Natural Language Processing (EMNLP),
2016
This paper explores the task of translating natural language queries into regular expressions which embody their meaning. In contrast to prior work, the proposed neural model does not utilize domain-specific crafting, learning to translate directly from a parallel corpus. To fully explore the potential of neural models, we propose a methodology for collecting a large corpus of regular expression, natural language pairs. Our resulting model achieves a performance gain of 19.6% over previous state-of-the-art models.
NLP
Nonparametric Spherical Topic Modeling with Word Embeddings
Kayhan Batmanghelich,
Ardavan Saeedi,
Karthik Narasimhan,
and Sam Gershman
Association for Computational Linguistics (ACL),
2016
Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor Gaussian observational distributions used in existing topic models are appropriate to leverage such correlations. In this paper, we propose to use the von Mises-Fisher distribution to model the density of words over a unit sphere. Such a representation is well-suited for directional data. We use a Hierarchical Dirichlet Process for our base topic model and propose an efficient inference algorithm based on Stochastic Variational Inference. This model enables us to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics. Experiments demonstrate that our method outperforms competitive approaches in terms of topic coherence on two different text corpora while offering efficient inference.
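For reference, the observational distribution named above is the standard von Mises-Fisher density on the unit sphere S^{p-1} (textbook form, not anything specific to the paper's inference scheme):

f_p(\mathbf{x}; \boldsymbol{\mu}, \kappa) = C_p(\kappa)\,\exp\!\big(\kappa\,\boldsymbol{\mu}^{\top}\mathbf{x}\big), \qquad C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2} I_{p/2-1}(\kappa)},

with mean direction \boldsymbol{\mu}, concentration \kappa \ge 0, and I_v the modified Bessel function of the first kind.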
2015
NLP
RL
Language understanding for text-based games using deep reinforcement learning
Karthik Narasimhan*,
Tejas Kulkarni*,
and Regina Barzilay
Empirical Methods in Natural Language Processing (EMNLP),
2015
In this paper, we consider the task of learning control policies for text-based games. In these games, all interactions in the virtual world are through text and the underlying state is not observed. The resulting language barrier makes such environments challenging for automatic game players. We employ a deep reinforcement learning framework to jointly learn state representations and action policies using game rewards as feedback. This framework enables us to map text descriptions into vector representations that capture the semantics of the game states. We evaluate our approach on two game worlds, comparing against baselines using bag-of-words and bag-of-bigrams for state representations. Our algorithm outperforms the baselines on both worlds demonstrating the importance of learning expressive representations.
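A schematic state encoder plus Q-value heads in the spirit of the framework described above: an LSTM over the textual state description feeds separate heads scoring action and object words. Layer sizes, pooling choice, and vocabulary handling are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class TextGameQNet(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128,
                 num_actions=10, num_objects=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.q_action = nn.Linear(hidden_dim, num_actions)   # Q-values over action words
        self.q_object = nn.Linear(hidden_dim, num_objects)   # Q-values over object words

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) word indices of the textual state description
        emb = self.embed(token_ids)
        _, (h, _) = self.encoder(emb)
        state_repr = h[-1]                                    # (batch, hidden_dim)
        return self.q_action(state_repr), self.q_object(state_repr)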
NLP
An Unsupervised Method for Uncovering Morphological Chains
Karthik Narasimhan,
Regina Barzilay,
and Tommi Jaakkola
Transactions of the Association for Computational Linguistics (TACL),
2015
Most state-of-the-art systems today produce morphological analysis based only on orthographic patterns. In contrast, we propose a model for unsupervised morphological analysis that integrates orthographic and semantic views of words. We model word formation in terms of morphological chains, from base words to the observed words, breaking the chains into parent-child relations. We use log-linear models with morpheme and word-level features to predict possible parents, including their modifications, for each word. The limited set of candidate parents for each word renders contrastive estimation feasible. Our model consistently matches or outperforms five state-of-the-art systems on Arabic, English and Turkish.
NLP
Machine Comprehension with Discourse Relations
Karthik Narasimhan,
and Regina Barzilay
Association for Computational Linguistics (ACL),
2015
This paper proposes a novel approach for incorporating discourse information into machine comprehension applications. Traditionally, such information is computed using off-the-shelf discourse analyzers. This design provides limited opportunities for guiding the discourse parser based on the requirements of the target task. In contrast, our model induces relations between sentences while optimizing a task-specific objective. This approach enables the model to benefit from discourse information without relying on explicit annotations of discourse structure during training. The model jointly identifies relevant sentences, establishes relations between them and predicts an answer. We implement this idea in a discriminative framework with hidden variables that capture relevant sentences and relations unobserved during training. Our experiments demonstrate that the discourse-aware model outperforms state-of-the-art machine comprehension systems.
JUMP-Means: Small-Variance Asymptotics for Markov Jump Processes
Jonathan H Huggins,
Karthik Narasimhan,
Ardavan Saeedi,
and Vikash K Mansinghka
International Conference on Machine Learning (ICML),
2015
Markov jump processes (MJPs) are used to model a wide range of phenomena from disease progression to RNA path folding. However, maximum likelihood estimation of parametric models leads to degenerate trajectories and inferential performance is poor in nonparametric models. We take a small-variance asymptotics (SVA) approach to overcome these limitations. We derive the small-variance asymptotics for parametric and nonparametric MJPs for both directly observed and hidden state models. In the parametric case we obtain a novel objective function which leads to non-degenerate trajectories. To derive the nonparametric version we introduce the gamma-gamma process, a novel extension to the gamma-exponential process. We propose algorithms for each of these formulations, which we call JUMP-means. Our experiments demonstrate that JUMP-means is competitive with or outperforms widely used MJP inference approaches in terms of both speed and reconstruction accuracy.
2014
NLP
Morphological Segmentation for Keyword Spotting
Karthik Narasimhan,
Damianos Karakos,
Richard Schwartz,
Stavros Tsakalidis,
and Regina Barzilay
Empirical Methods in Natural Language Processing (EMNLP),
2014
We explore the impact of morphological segmentation on keyword spotting (KWS). Despite potential benefits, state-of-the-art KWS systems do not use morphological information. In this paper, we augment a state-of-the-art KWS system with sub-word units derived from supervised and unsupervised morphological segmentations, and compare with phonetic and syllabic segmentations. Our experiments demonstrate that morphemes improve overall performance of KWS systems. Syllabic units, however, rival the performance of morphological units when used in KWS. By combining morphological, phonetic and syllabic segmentations, we demonstrate substantial performance gains.
2012
Modeling human bounded rationality to improve defender strategies in network security games
Rong Yang,
Fei Fang,
Albert Xin Jiang,
Karthik Rajagopal,
Milind Tambe,
and Rajiv Maheswaran
In a Network Security Game (NSG), security agencies must allocate limited resources to protect targets embedded in a network, such as important buildings in a city road network. A recent line of work relaxed the perfect-rationality assumption of the human adversary and showed significant advantages of incorporating bounded-rationality adversary models in non-networked security domains. Given that real-world NSGs are often extremely complex and hence very difficult for humans to solve, it is critical that we address human bounded rationality when designing defender strategies. To that end, the key contributions of this paper include: (i) comprehensive experiments with human subjects using a web-based game that we designed to simulate NSGs; (ii) new behavioral models of the human adversary in NSGs, which we train with the data collected from human experiments; (iii) new algorithms for computing the defender's optimal strategy against the new models.