Karthik Narasimhan | publications

2023

NLP
Tree of thoughts: Deliberate problem solving with large language models

Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L Griffiths, Yuan Cao, and Karthik Narasimhan

ArXiv, 2023

Abs Bib Paper

Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models’ problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%.
@inproceedings{yao2023tree, bibtex_show = {true}, title = {Tree of thoughts: Deliberate problem solving with large language models}, author = {Yao, Shunyu and Yu, Dian and Zhao, Jeffrey and Shafran, Izhak and Griffiths, Thomas L and Cao, Yuan and Narasimhan, Karthik}, booktitle = {ArXiv}, html = {https://arxiv.org/abs/2305.10601}, tag = {NLP}, year = {2023} }
NLP
SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification

Pranjal Aggarwal, Ameet Deshpande, and Karthik Narasimhan

ICML, 2023

Abs Bib Paper

Extreme classification (XC) involves predicting over large numbers of classes (thousands to millions), with real-world applications like news article classification and e-commerce product tagging. The zero-shot version of this task requires generalization to novel classes without additional supervision. In this paper, we develop SemSup-XC, a model that achieves state-of-the-art zero-shot and few-shot performance on three XC datasets derived from legal, e-commerce, and Wikipedia data. To develop SemSup-XC, we use automatically collected semantic class descriptions to represent classes and facilitate generalization through a novel hybrid matching module that matches input instances to class descriptions using a combination of semantic and lexical similarity. Trained with contrastive learning, SemSup-XC significantly outperforms baselines and establishes state-of-the-art performance on all three datasets considered, gaining up to 12 precision points on zero-shot and more than 10 precision points on one-shot tests, with similar gains for recall@10. Our ablation studies highlight the relative importance of our hybrid matching module and automatically collected class descriptions.
@inproceedings{aggarwal2023semsup, bibtex_show = {true}, title = {SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification}, author = {Aggarwal, Pranjal and Deshpande, Ameet and Narasimhan, Karthik}, booktitle = {ICML}, html = {https://arxiv.org/abs/2301.11309}, year = {2023}, tag = {NLP} }
NLP RL
ReAct: Synergizing Reasoning and Acting in Language Models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao

International Conference on Learning Representations (ICLR), 2023
Oral (Top 5%)

Abs Bib Paper Website

While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples.
@inproceedings{yao2023react, bibtex_show = {true}, title = {ReAct: Synergizing Reasoning and Acting in Language Models}, author = {Yao, Shunyu and Zhao, Jeffrey and Yu, Dian and Du, Nan and Shafran, Izhak and Narasimhan, Karthik and Cao, Yuan}, booktitle = {International Conference on Learning Representations (ICLR)}, year = {2023}, html = {https://arxiv.org/abs/2210.03629}, website = {https://react-lm.github.io/}, tag = {NLP}, tagg = {RL}, misc = {Oral (Top 5%)} }
NLP
C-STS: Conditional Semantic Textual Similarity

Ameet Deshpande, Carlos E Jimenez, Howard Chen, Vishvak Murahari, Victoria Graf, Tanmay Rajpurohit, Ashwin Kalyan, Danqi Chen, and Karthik Narasimhan

ArXiv, 2023

Abs Bib Paper

Semantic textual similarity (STS) has been a cornerstone task in NLP that measures the degree of similarity between a pair of sentences, with applications in information retrieval, question answering, and embedding methods. However, it is an inherently ambiguous task, with the sentence similarity depending on the specific aspect of interest. We resolve this ambiguity by proposing a novel task called conditional STS (C-STS) which measures similarity conditioned on an aspect elucidated in natural language (hereon, condition). As an example, the similarity between the sentences "The NBA player shoots a three-pointer." and "A man throws a tennis ball into the air to serve." is higher for the condition "The motion of the ball." (both upward) and lower for "The size of the ball." (one large and one small). C-STS’s advantages are two-fold: (1) it reduces the subjectivity and ambiguity of STS, and (2) enables fine-grained similarity evaluation using diverse conditions. C-STS contains almost 20,000 instances from diverse domains and we evaluate several state-of-the-art models to demonstrate that even the most performant fine-tuning and in-context learning models (GPT-4, Flan, SimCSE) find it challenging, with Spearman correlation scores of <50. We encourage the community to evaluate their models on C-STS to provide a more holistic view of semantic similarity and natural language understanding.
@inproceedings{deshpande2023csts, bibtex_show = {true}, title = {C-STS: Conditional Semantic Textual Similarity}, author = {Deshpande, Ameet and Jimenez, Carlos E and Chen, Howard and Murahari, Vishvak and Graf, Victoria and Rajpurohit, Tanmay and Kalyan, Ashwin and Chen, Danqi and Narasimhan, Karthik}, booktitle = {ArXiv}, html = {https://arxiv.org/abs/2305.15093}, year = {2023}, tag = {NLP} }
NLP
MUX-PLMs: Pre-training Language Models with Data Multiplexing

Vishvak Murahari, Ameet Deshpande, Carlos E Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, and Karthik Narasimhan

ArXiv, 2023

Abs Bib Paper

The widespread adoption of large language models such as ChatGPT and Bard has led to unprecedented demand for these technologies. The burgeoning cost of inference for ever-increasing model sizes coupled with hardware shortages has limited affordable access and poses a pressing need for efficiency approaches geared towards high throughput and performance. Multi-input multi-output (MIMO) algorithms such as data multiplexing, offer a promising solution with a many-fold increase in throughput by performing inference for multiple inputs at the cost of a single input. Yet these approaches are not currently performant enough to be deployed in modern systems. We change that by developing MUX-PLMs, a class of high throughput pre-trained language models (PLMs) trained with data multiplexing, that can be fine-tuned for any downstream task to yield high-throughput high-performance. Our novel multiplexing and demultiplexing modules proficiently entangle and disentangle inputs, and enable high-performance high throughput MUX-PLMs that are competitive with vanilla PLMs while achieving 2x/5x inference speedup with only a 1−4% drop on a broad suite of tasks.
@inproceedings{murahari2023mux, bibtex_show = {true}, title = {MUX-PLMs: Pre-training Language Models with Data Multiplexing}, author = {Murahari, Vishvak and Deshpande, Ameet and Jimenez, Carlos E and Shafran, Izhak and Wang, Mingqiu and Cao, Yuan and Narasimhan, Karthik}, booktitle = {ArXiv}, html = {https://arxiv.org/abs/2302.12441}, year = {2023}, tag = {NLP} }
NLP
Toxicity in chatgpt: Analyzing persona-assigned language models

Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, and Karthik Narasimhan

ArXiv, 2023

Abs Bib Paper

Large language models (LLMs) have shown incredible capabilities and transcended the natural language processing (NLP) community, with adoption throughout many services like healthcare, therapy, education, and customer service. Since users include people with critical information needs like students or patients engaging with chatbots, the safety of these systems is of prime importance. Therefore, a clear understanding of the capabilities and limitations of LLMs is necessary. To this end, we systematically evaluate toxicity in over half a million generations of ChatGPT, a popular dialogue-based LLM. We find that setting the system parameter of ChatGPT by assigning it a persona, say that of the boxer Muhammad Ali, significantly increases the toxicity of generations. Depending on the persona assigned to ChatGPT, its toxicity can increase up to 6x, with outputs engaging in incorrect stereotypes, harmful dialogue, and hurtful opinions. This may be potentially defamatory to the persona and harmful to an unsuspecting user. Furthermore, we find concerning patterns where specific entities (e.g., certain races) are targeted more than others (3x more) irrespective of the assigned persona, that reflect inherent discriminatory biases in the model. We hope that our findings inspire the broader AI community to rethink the efficacy of current safety guardrails and develop better techniques that lead to robust, safe, and trustworthy AI systems.
@inproceedings{deshpande2023toxicity, bibtex_show = {true}, title = {Toxicity in chatgpt: Analyzing persona-assigned language models}, author = {Deshpande, Ameet and Murahari, Vishvak and Rajpurohit, Tanmay and Kalyan, Ashwin and Narasimhan, Karthik}, booktitle = {ArXiv}, html = {https://arxiv.org/abs/2304.05335}, year = {2023}, tag = {NLP} }
NLP
Referral Augmentation for Zero-Shot Information Retrieval

Michael Tang, Shunyu Yao, John Yang, and Karthik Narasimhan

ArXiv, 2023

Abs Bib Paper

We propose Referral-Augmented Retrieval (RAR), a simple technique that concatenates document indices with referrals, i.e. text from other documents that cite or link to the given document, to provide significant performance gains for zero-shot information retrieval. The key insight behind our method is that referrals provide a more complete, multi-view representation of a document, much like incoming page links in algorithms like PageRank provide a comprehensive idea of a webpage’s importance. RAR works with both sparse and dense retrievers, and outperforms generative text expansion techniques such as DocT5Query and Query2Doc a 37% and 21% absolute improvement on ACL paper retrieval Recall@10 – while also eliminating expensive model training and inference. We also analyze different methods for multi-referral aggregation and show that RAR enables up-to-date information retrieval without re-training.
@inproceedings{tang2023referral, bibtex_show = {true}, title = {Referral Augmentation for Zero-Shot Information Retrieval}, author = {Tang, Michael and Yao, Shunyu and Yang, John and Narasimhan, Karthik}, booktitle = {ArXiv}, html = {https://arxiv.org/abs/2305.15098}, year = {2023}, tag = {NLP} }
Anthropomorphization of AI: Opportunities and Risks

Ameet Deshpande, Tanmay Rajpurohit, Karthik Narasimhan, and Ashwin Kalyan

ArXiv, 2023

Abs Bib Paper

Anthropomorphization is the tendency to attribute human-like traits to non-human entities. It is prevalent in many social contexts – children anthropomorphize toys, adults do so with brands, and it is a literary device. It is also a versatile tool in science, with behavioral psychology and evolutionary biology meticulously documenting its consequences. With widespread adoption of AI systems, and the push from stakeholders to make it human-like through alignment techniques, human voice, and pictorial avatars, the tendency for users to anthropomorphize it increases significantly. We take a dyadic approach to understanding this phenomenon with large language models (LLMs) by studying (1) the objective legal implications, as analyzed through the lens of the recent blueprint of AI bill of rights and the (2) subtle psychological aspects customization and anthropomorphization. We find that anthropomorphized LLMs customized for different user bases violate multiple provisions in the legislative blueprint. In addition, we point out that anthropomorphization of LLMs affects the influence they can have on their users, thus having the potential to fundamentally change the nature of human-AI interaction, with potential for manipulation and negative influence. With LLMs being hyper-personalized for vulnerable groups like children and patients among others, our work is a timely and important contribution. We propose a conservative strategy for the cautious use of anthropomorphization to improve trustworthiness of AI systems.
@inproceedings{deshpande2023anthropomorphization, bibtex_show = {true}, title = {Anthropomorphization of AI: Opportunities and Risks}, author = {Deshpande, Ameet and Rajpurohit, Tanmay and Narasimhan, Karthik and Kalyan, Ashwin}, booktitle = {ArXiv}, html = {https://arxiv.org/abs/2305.14784}, year = {2023} }
NLP
PruMUX: Augmenting Data Multiplexing with Model Compression

Yushan Su, Vishvak Murahari, Karthik Narasimhan, and Kai Li

Findings of ACL, 2023

Abs Bib Paper

As language models increase in size by the day, methods for efficient inference are critical to leveraging their capabilities for various applications. Prior work has investigated techniques like model pruning, knowledge distillation, and data multiplexing to increase model throughput without sacrificing accuracy. In this paper, we combine two such methods – structured pruning and data multiplexing – to compound the speedup gains obtained by either method. Our approach, PruMUX, obtains up to 7.5-29.5X throughput improvement over BERT-base model with accuracy threshold from 80% to 74%. We further study various combinations of parameters (such as sparsity and multiplexing factor) in the two techniques to provide a comprehensive analysis of the tradeoff between accuracy and throughput in the resulting models. We then propose Auto-PruMUX, a meta-level model that can predict the high-performance parameters for pruning and multiplexing given a desired accuracy loss budget, providing a practical method to leverage the combination effectively.
@inproceedings{su2023prumux, bibtex_show = {true}, title = {PruMUX: Augmenting Data Multiplexing with Model Compression}, author = {Su, Yushan and Murahari, Vishvak and Narasimhan, Karthik and Li, Kai}, booktitle = {Findings of ACL}, html = {https://arxiv.org/abs/2305.14706}, year = {2023}, tag = {NLP} }

2022

NLP
ALIGN-MLM: Word Embedding Alignment is Crucial for Multilingual Pre-training

Henry Tang, Ameet Deshpande, and Karthik Narasimhan

ArXiv, 2022

Abs Bib Paper

Multilingual pre-trained models exhibit zero-shot cross-lingual transfer, where a model fine-tuned on a source language achieves surprisingly good performance on a target language. While studies have attempted to understand transfer, they focus only on MLM, and the large number of differences between natural languages makes it hard to disentangle the importance of different properties. In this work, we specifically highlight the importance of word embedding alignment by proposing a pre-training objective (ALIGN-MLM) whose auxiliary loss guides similar words in different languages to have similar word embeddings. ALIGN-MLM either outperforms or matches three widely adopted objectives (MLM, XLM, DICT-MLM) when we evaluate transfer between pairs of natural languages and their counterparts created by systematically modifying specific properties like the script. In particular, ALIGN-MLM outperforms XLM and MLM by 35 and 30 F1 points on POS-tagging for transfer between languages that differ both in their script and word order (left-to-right v.s. right-to-left). We also show a strong correlation between alignment and transfer for all objectives (e.g., rho=0.727 for XNLI), which together with ALIGN-MLM’s strong performance calls for explicitly aligning word embeddings for multilingual models.
@inproceedings{tang2022align, bibtex_show = {true}, title = {ALIGN-MLM: Word Embedding Alignment is Crucial for Multilingual Pre-training}, author = {Tang, Henry and Deshpande, Ameet and Narasimhan, Karthik}, booktitle = {ArXiv}, html = {https://arxiv.org/abs/2212.10466}, year = {2022}, tag = {NLP} }
NLP
SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers

Ameet Deshpande, Md Arafat Sultan, Anthony Ferritto, Ashwin Kalyan, Karthik Narasimhan, and Avirup Sil

ArXiv, 2022

Abs Bib Paper

Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and their sizes have consequently been getting bigger. Since a different copy of the model is required for each task, this paradigm is infeasible for storage-constrained edge devices like mobile phones. In this paper, we propose SPARTAN, a parameter efficient (PE) and computationally fast architecture for edge devices that adds hierarchically organized sparse memory after each Transformer layer. SPARTAN freezes the PLM parameters and fine-tunes only its memory, thus significantly reducing storage costs by re-using the PLM backbone for different tasks. SPARTAN contains two levels of memory, with only a sparse subset of parents being chosen in the first level for each input, and children cells corresponding to those parents being used to compute an output representation. This sparsity combined with other architecture optimizations improves SPARTAN’s throughput by over 90% during inference on a Raspberry Pi 4 when compared to PE baselines (adapters) while also outperforming the latter by 0.1 points on the GLUE benchmark. Further, it can be trained 34% faster in a few-shot setting, while performing within 0.9 points of adapters. Qualitative analysis shows that different parent cells in SPARTAN specialize in different topics, thus dividing responsibility efficiently.
@inproceedings{deshpande2022spartan, bibtex_show = {true}, title = {SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers}, author = {Deshpande, Ameet and Sultan, Md Arafat and Ferritto, Anthony and Kalyan, Ashwin and Narasimhan, Karthik and Sil, Avirup}, booktitle = {ArXiv}, html = {https://arxiv.org/abs/2212.10466}, year = {2022}, tag = {NLP} }
NLP
Controllable Text Generation with Language Constraints

Howard Chen, Huihan Li, Danqi Chen, and Karthik Narasimhan

ArXiv, 2022

Abs Bib Paper

We consider the task of text generation in language models with constraints specified in natural language. To this end, we first create a challenging benchmark Cognac that provides as input to the model a topic with example text, along with a constraint on text to be avoided. Unlike prior work, our benchmark contains knowledge-intensive constraints sourced from databases like Wordnet and Wikidata, which allows for straightforward evaluation while striking a balance between broad attribute-level and narrow lexical-level controls. We find that even state-of-the-art language models like GPT-3 fail often on this task, and propose a solution to leverage a language model’s own internal knowledge to guide generation. Our method, called CognacGen, first queries the language model to generate guidance terms for a specified topic or constraint, and uses the guidance to modify the model’s token generation probabilities. We propose three forms of guidance (binary verifier, top-k tokens, textual example), and employ prefix-tuning approaches to distill the guidance to tackle diverse natural language constraints. Through extensive empirical evaluations, we demonstrate that CognacGen can successfully generalize to unseen instructions and outperform competitive baselines in generating constraint conforming text.
@inproceedings{chen2022cognac, bibtex_show = {true}, title = {Controllable Text Generation with Language Constraints}, author = {Chen, Howard and Li, Huihan and Chen, Danqi and Narasimhan, Karthik}, booktitle = {ArXiv}, year = {2022}, html = {https://arxiv.org/abs/2212.10466}, tag = {NLP} }
NLP RL
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

Shunyu Yao, Howard Chen, John Yang, and Karthik Narasimhan

Neural Information Processing Systems (NeurIPS), 2022

Abs Bib Paper Website

Existing benchmarks for grounding language in interactive environments either lack real-world linguistic elements, or prove difficult to scale up due to substantial human involvement in the collection of data or feedback signals. To bridge this gap, we develop WebShop – a simulated e-commerce website environment with 1.18 million real-world products and 12,087 crowd-sourced text instructions. Given a text instruction specifying a product requirement, an agent needs to navigate multiple types of webpages and issue diverse actions to find, customize, and purchase an item. WebShop provides several challenges for language grounding including understanding compositional instructions, query (re-)formulation, comprehending and acting on noisy text in webpages, and performing strategic exploration. We collect over 1,600 human demonstrations for the task, and train and evaluate a diverse range of agents using reinforcement learning, imitation learning, and pre-trained image and language models. Our best model achieves a task success rate of 29%, which outperforms rule-based heuristics (9.6%) but is far lower than human expert performance (59%). We also analyze agent and human trajectories and ablate various model components to provide insights for developing future agents with stronger language understanding and decision making abilities. Finally, we show that agents trained on WebShop exhibit non-trivial sim-to-real transfer when evaluated on this http URL, indicating the potential value of WebShop in developing practical web-based agents that can operate in the wild.
@inproceedings{yao2022webshop, bibtex_show = {true}, title = {WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents}, author = {Yao, Shunyu and Chen, Howard and Yang, John and Narasimhan, Karthik}, booktitle = {Neural Information Processing Systems (NeurIPS)}, year = {2022}, html = {https://arxiv.org/abs/2207.01206}, website = {https://webshop-pnlp.github.io/}, tag = {NLP}, tagg = {RL} }
NLP CV
Semantic Supervision: Enabling Generalization over Output Spaces

Austin W. Hanjie, Ameet Deshpande, and Karthik Narasimhan

ArXiv, 2022

Abs Bib Paper Website

In this paper, we propose Semantic Supervision (SemSup) - a unified paradigm for training classifiers that generalize over output spaces. In contrast to standard classification, which treats classes as discrete symbols, SemSup represents them as dense vector features obtained from descriptions of classes (e.g., "The cat is a small carnivorous mammal"). This allows the output space to be unbounded (in the space of descriptions) and enables models to generalize both over unseen inputs and unseen outputs (e.g. "The aardvark is a nocturnal burrowing mammal with long ears"). Specifically, SemSup enables four types of generalization, to – (1) unseen class descriptions, (2) unseen classes, (3) unseen super-classes, and (4) unseen tasks. Through experiments on four classification datasets across two variants (multi-class and multi-label), two input modalities (text and images), and two output description modalities (text and JSON), we show that our SemSup models significantly outperform standard supervised models and existing models that leverage word embeddings over class names. For instance, our model outperforms baselines by 40% and 20% precision points on unseen descriptions and classes, respectively, on a news categorization dataset (RCV1). SemSup can serve as a pathway for scaling neural models to large unbounded output spaces and enabling better generalization and model reuse for unseen tasks and domains.
@inproceedings{hanjie2022semsup, bibtex_show = {true}, title = {Semantic Supervision: Enabling Generalization over Output Spaces}, author = {Hanjie, Austin W. and Deshpande, Ameet and Narasimhan, Karthik}, booktitle = {ArXiv}, year = {2022}, html = {https://arxiv.org/abs/2202.13100}, website = {https://sites.google.com/princeton.edu/semantic-supervision/}, tag = {NLP}, tagg = {CV} }
DataMUX: Data Multiplexing for Neural Networks

Vishvak Murahari, Carlos E. Jimenez, Runzhe Yang, and Karthik Narasimhan

Neural Information Processing Systems (NeurIPS), 2022
Bell Labs 2nd prize, Qualcomm Innovation finalist

Abs Bib Paper Website

In this paper, we introduce data multiplexing (DataMUX), a technique that enables deep neural networks to process multiple inputs simultaneously using a single compact representation. DataMUX demonstrates that neural networks are capable of generating accurate predictions over mixtures of inputs, resulting in increased throughput with minimal extra memory requirements. Our approach uses two key components – 1) a multiplexing layer that performs a fixed linear transformation to each input before combining them to create a mixed representation of the same size as a single input, which is then processed by the base network, and 2) a demultiplexing layer that converts the base network’s output back into independent representations before producing predictions for each input. We show the viability of DataMUX for different architectures (Transformers, and to a lesser extent MLPs and CNNs) across six different tasks spanning sentence classification, named entity recognition and image classification. For instance, DataMUX for Transformers can multiplex up to 20x/40x inputs, achieving 11x/18x increase in throughput with minimal absolute performance drops of <2% and <4% respectively on MNLI, a natural language inference task. We also provide a theoretical construction for multiplexing in self-attention networks and analyze the effect of various design elements in DataMUX.
@inproceedings{murahari2022datamux, bibtex_show = {true}, title = {DataMUX: Data Multiplexing for Neural Networks}, author = {Murahari, Vishvak and Jimenez, Carlos E. and Yang, Runzhe and Narasimhan, Karthik}, booktitle = {Neural Information Processing Systems (NeurIPS)}, year = {2022}, html = {https://arxiv.org/abs/2202.09318}, misc = {Bell Labs 2nd prize, Qualcomm Innovation finalist}, website = {https://princeton-nlp.github.io/DataMUX/}, tag = {ML} }
Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines

Sreejan Kumar, Carlos G. Correa, Ishita Dasgupta, Raja Marjieh, Michael Y. Hu, Robert D. Hawkins, Nathaniel D. Daw, Jonathan D. Cohen, Karthik Narasimhan, and Thomas L. Griffiths

Neural Information Processing Systems (NeurIPS), 2022
Outstanding paper award

Abs Bib Paper

Strong inductive biases are a key component of human intelligence, allowing people to quickly learn a variety of tasks. Although meta-learning has emerged as an approach for endowing neural networks with useful inductive biases, agents trained by meta-learning may acquire very different strategies from humans. We show that co-training these agents on predicting representations from natural language task descriptions and from programs induced to generate such tasks guides them toward human-like inductive biases. Human-generated language descriptions and program induction with library learning both result in more human-like behavior in downstream meta-reinforcement learning agents than less abstract controls (synthetic language descriptions, program induction without library learning), suggesting that the abstraction supported by these representations is key.
@inproceedings{kumar2022using, bibtex_show = {true}, title = {Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines}, author = {Kumar, Sreejan and Correa, Carlos G. and Dasgupta, Ishita and Marjieh, Raja and Hu, Michael Y. and Hawkins, Robert D. and Daw, Nathaniel D. and Cohen, Jonathan D. and Narasimhan, Karthik and Griffiths, Thomas L.}, booktitle = {Neural Information Processing Systems (NeurIPS)}, year = {2022}, html = {https://arxiv.org/abs/2205.11558}, misc = {Outstanding paper award} }
Learning Physics Constrained Dynamics Using Autoencoders

Tsung-Yen Yang, Justinian P. Rosca, Karthik Narasimhan, and Peter Ramadge

Neural Information Processing Systems (NeurIPS), 2022

Abs Bib Paper

We consider the problem of estimating states (e.g., position and velocity) and physical parameters (e.g., friction, elasticity) from a sequence of observations when provided a dynamic equation that describes the behavior of the system. The dynamic equation can arise from first principles (e.g., Newton’s laws) and provide useful cues for learning, but its physical parameters are unknown. To address this problem, we propose a model that estimates states and physical parameters of the system using two main components. First, an autoencoder compresses a sequence of observations (e.g., sensor measurements, pixel images) into a sequence for the state representation that is consistent with physics by including a simulation of the dynamic equation. Second, an estimator is coupled with the autoencoder to predict the values of the physical parameters. We also theoretically and empirically show that using Fourier feature mappings improves generalization of the estimator in predicting physical parameters compared to raw state sequences. In our experiments on three visual and one sensor measurement tasks, our model imposes interpretability on latent states and achieves improved generalization performance for long-term prediction of system dynamics over state-of-the-art baselines.
@inproceedings{yang2022learning, bibtex_show = {true}, title = {Learning Physics Constrained Dynamics Using Autoencoders}, author = {Yang, Tsung-Yen and Rosca, Justinian P. and Narasimhan, Karthik and Ramadge, Peter}, booktitle = {Neural Information Processing Systems (NeurIPS)}, year = {2022}, html = {}, tag = {ML} }
RL NLP
Leveraging Language for Accelerated Learning of Tool Manipulation

Allen Z. Ren, Bharat Govil, Tsung-Yen Yang, Karthik Narasimhan, and Anirudha Majumdar

Conference on Robot Learning (CoRL), 2022

Abs Bib Paper

Robust and generalized tool manipulation requires an understanding of the properties and affordances of different tools. We investigate whether linguistic information about a tool (e.g., its geometry, common uses) can help control policies adapt faster to new tools for a given task. We obtain diverse descriptions of various tools in natural language and use pre-trained language models to generate their feature representations. We then perform language-conditioned meta-learning to learn policies that can efficiently adapt to new tools given their corresponding text descriptions. Our results demonstrate that combining linguistic information and meta-learning significantly accelerates tool learning in several manipulation tasks including pushing, lifting, sweeping, and hammering.
@inproceedings{ren2022leveraging, bibtex_show = {true}, title = {Leveraging Language for Accelerated Learning of Tool Manipulation}, author = {Ren, Allen Z. and Govil, Bharat and Yang, Tsung-Yen and Narasimhan, Karthik and Majumdar, Anirudha}, booktitle = {Conference on Robot Learning (CoRL)}, year = {2022}, html = {https://arxiv.org/abs/2206.13074}, tag = {RL}, tagg = {NLP} }
CV
Multi-query Video Retrieval

Zeyu Wang, Yu Wu, Karthik Narasimhan, and Olga Russakovsky

ECCV, 2022

Abs Bib Paper

Retrieving target videos based on text descriptions is a task of great practical value and has received increasing attention over the past few years. In this paper, we focus on the less-studied setting of multi-query video retrieval, where multiple queries are provided to the model for searching over the video archive. We first show that the multi-query retrieval task is more pragmatic and representative of real-world use cases and better evaluates retrieval capabilities of current models, thereby deserving of further investigation alongside the more prevalent single-query retrieval setup. We then propose several new methods for leveraging multiple queries at training time to improve over simply combining similarity outputs of multiple queries from regular single-query trained models. Our models consistently outperform several competitive baselines over three different datasets. For instance, Recall@1 can be improved by 4.7 points on MSR-VTT, 4.1 points on MSVD and 11.7 points on VATEX over a strong baseline built on the state-of-the-art CLIP4Clip model. We believe further modeling efforts will bring new insights to this direction and spark new systems that perform better in real-world video retrieval applications.
@inproceedings{wang2022multi, bibtex_show = {true}, title = {Multi-query Video Retrieval}, author = {Wang, Zeyu and Wu, Yu and Narasimhan, Karthik and Russakovsky, Olga}, booktitle = {ECCV}, year = {2022}, html = {https://arxiv.org/abs/2201.03639}, tag = {CV} }
NLP
Can Rationalization Improve Robustness?

Howard Chen, Jacqueline He, Karthik Narasimhan, and Danqi Chen

Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022

Abs Bib Paper

A growing line of work has investigated the development of neural NLP models that can produce rationales–subsets of input that can explain their model predictions. In this paper, we ask whether such rationale models can also provide robustness to adversarial attacks in addition to their interpretable nature. Since these models need to first generate rationales ("rationalizer") before making predictions ("predictor"), they have the potential to ignore noise or adversarially added text by simply masking it out of the generated rationale. To this end, we systematically generate various types of ’AddText’ attacks for both token and sentence-level rationalization tasks, and perform an extensive empirical evaluation of state-of-the-art rationale models across five different tasks. Our experiments reveal that the rationale models show the promise to improve robustness, while they struggle in certain scenarios–when the rationalizer is sensitive to positional bias or lexical choices of attack text. Further, leveraging human rationale as supervision does not always translate to better performance. Our study is a first step towards exploring the interplay between interpretability and robustness in the rationalize-then-predict framework.
@inproceedings{chen2022rationalization, bibtex_show = {true}, title = {Can Rationalization Improve Robustness?}, author = {Chen, Howard and He, Jacqueline and Narasimhan, Karthik and Chen, Danqi}, booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2022}, html = {https://arxiv.org/abs/2204.11790}, tag = {NLP} }
NLP
When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer

Ameet Deshpande, Partha Talukdar, and Karthik Narasimhan

Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022

Abs Bib Paper

While recent work on multilingual language models has demonstrated their capacity for cross-lingual zero-shot transfer on downstream tasks, there is a lack of consensus in the community as to what shared properties between languages enable such transfer. Analyses involving pairs of natural languages are often inconclusive and contradictory since languages simultaneously differ in many linguistic aspects. In this paper, we perform a large-scale empirical study to isolate the effects of various linguistic properties by measuring zero-shot transfer between four diverse natural languages and their counterparts constructed by modifying aspects such as the script, word order, and syntax. Among other things, our experiments show that the absence of sub-word overlap significantly affects zero-shot transfer when languages differ in their word order, and there is a strong correlation between transfer performance and word embedding alignment between languages (e.g., R=0.94 on the task of NLI). Our results call for focus in multilingual models on explicitly improving word embedding alignment between languages rather than relying on its implicit emergence.
@inproceedings{deshpande2022when, bibtex_show = {true}, title = {When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer}, author = {Deshpande, Ameet and Talukdar, Partha and Narasimhan, Karthik}, booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2022}, html = {https://arxiv.org/abs/2110.14782}, tag = {NLP} }
NLP CV
CARETS: A Consistency And Robustness Evaluative Test Suite for VQA

Carlos Jimenez, Olga Russakovsky, and Karthik Narasimhan

Association for Computational Linguistics (ACL), 2022

Abs Bib Paper

We introduce CARETS, a systematic test suite to measure consistency and robustness of modern VQA models through a series of six fine-grained capability tests. In contrast to existing VQA test sets, CARETS features balanced question generation to create pairs of instances to test models, with each pair focusing on a specific capability such as rephrasing, logical symmetry or image obfuscation. We evaluate six modern VQA systems on CARETS and identify several actionable weaknesses in model comprehension, especially with concepts such as negation, disjunction, or hypernym invariance. Interestingly, even the most sophisticated models are sensitive to aspects such as swapping the order of terms in a conjunction or changing the number of answer choices mentioned in the question. We release CARETS to be used as an extensible tool for evaluating multi-modal model robustness.
@inproceedings{jimenez2022carets, bibtex_show = {true}, title = {CARETS: A Consistency And Robustness Evaluative Test Suite for VQA}, author = {Jimenez, Carlos and Russakovsky, Olga and Narasimhan, Karthik}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2022}, html = {https://arxiv.org/abs/2203.07613}, tag = {NLP}, tagg = {CV} }
RL
Multi-Stage Episodic Control for Strategic Exploration in Text Games

Jens Tuyls, Shunyu Yao, Sham Kakade, and Karthik Narasimhan

International Conference on Learning Representations (ICLR) , 2022
Spotlight

Abs Bib Paper

Text adventure games present unique challenges to reinforcement learning methods due to their combinatorially large action spaces and sparse rewards. The interplay of these two factors is particularly demanding because large action spaces require extensive exploration, while sparse rewards provide limited feedback. This work proposes to tackle the explore-vs-exploit dilemma using a multi-stage approach that explicitly disentangles these two strategies within each episode. Our algorithm, called eXploit-Then-eXplore (XTX), begins each episode using an exploitation policy that imitates a set of promising trajectories from the past, and then switches over to an exploration policy aimed at discovering novel actions that lead to unseen state spaces. This policy decomposition allows us to combine global decisions about which parts of the game space to return to with curiosity-based local exploration in that space, motivated by how a human may approach these games. Our method significantly outperforms prior approaches by 27% and 11% average normalized score over 12 games from the Jericho benchmark (Hausknecht et al., 2020) in both deterministic and stochastic settings, respectively. On the game of Zork1, in particular, XTX obtains a score of 103, more than a 2x improvement over prior methods, and pushes past several known bottlenecks in the game that have plagued previous state-of-the-art methods.
@inproceedings{tuyls2022multi, bibtex_show = {true}, title = {Multi-Stage Episodic Control for Strategic Exploration in Text Games}, author = {Tuyls, Jens and Yao, Shunyu and Kakade, Sham and Narasimhan, Karthik}, booktitle = {International Conference on Learning Representations (ICLR) }, year = {2022}, html = {https://arxiv.org/abs/2201.01251}, tag = {RL}, misc = {Spotlight} }
NLP
Linking Emergent and Natural Languages via Corpus Transfer

Shunyu Yao, Mo Yu, Yang Zhang, Karthik Narasimhan, Joshua Tenenbaum, and Chuang Gan

International Conference on Learning Representations (ICLR) , 2022
Spotlight

Abs Bib Paper

The study of language emergence aims to understand how human languages are shaped by perceptual grounding and communicative intent. Computational approaches to emergent communication (EC) predominantly consider referential games in limited domains and analyze the learned protocol within the game framework. As a result, it remains unclear how the emergent languages from these settings connect to natural languages or provide benefits in real-world language processing tasks, where statistical models trained on large text corpora dominate. In this work, we propose a novel way to establish such a link by corpus transfer, i.e. pretraining on a corpus of emergent language for downstream natural language tasks, which is in contrast to prior work that directly transfers speaker and listener parameters. Our approach showcases non-trivial transfer benefits for two different tasks – language modeling and image captioning. For example, in a low-resource setup (modeling 2 million natural language tokens), pre-training on an emergent language corpus with just 2 million tokens reduces model perplexity by 24.6% on average across ten natural languages. We also introduce a novel metric to predict the transferability of an emergent language by translating emergent messages to natural language captions grounded on the same images. We find that our translation-based metric highly correlates with the downstream performance on modeling natural languages (for instance ρ= 0.83 on Hebrew), while topographic similarity, a popular metric in previous works, shows surprisingly low correlation (0.003), hinting that simple properties like attribute disentanglement from synthetic domains might not capture the full complexities of natural language. Our findings also indicate potential benefits of moving language emergence forward with natural language resources and models.
@inproceedings{yao2022linking, bibtex_show = {true}, title = {Linking Emergent and Natural Languages via Corpus Transfer}, author = {Yao, Shunyu and Yu, Mo and Zhang, Yang and Narasimhan, Karthik and Tenenbaum, Joshua and Gan, Chuang}, booktitle = {International Conference on Learning Representations (ICLR) }, year = {2022}, html = {https://openreview.net/pdf?id=49A1Y6tRhaq}, tag = {NLP}, misc = {Spotlight} }
Revelio: ML-Generated Debugging Queries for Finding Root Causes in Distributed Systems

Pradeep Dogga, Karthik Narasimhan, Anirudh Sivaraman, Shiv Saini, George Varghese, and Ravi Netravali

Proceedings of Machine Learning and Systems (MLSys), 2022

Abs Bib Paper

A major difficulty in debugging distributed systems lies in manually determining which of the many available debugging tools to use and how to query that tool’s logs. Our own study of a production debugging workflow confirms the magnitude of this burden. This paper explores whether a deep neural network trained on past bug reports and debugging logs can assist developers in distributed systems debugging. We present Revelio, a debugging assistant which takes user reports and system logs as input, and outputs debugging queries that developers can use to find a bug’s root cause. The key challenges lie in (1) combining inputs of different types (e.g., natural language reports and quantitative logs) and (2) generalizing to unseen faults. Revelio addresses these by employ-ing deep neural networks to uniformly embed diverse input sources and potential queries into a high-dimensional vector space. In addition, it exploits observations from production systems to factorize query generation into two computationally and statistically simpler learning tasks. To evaluate Revelio, we built a testbed with multiple distributed applications and debugging tools. By injecting faults and training on logs and reports from 800 Mechanical Turkers, we show that Revelio includes the most helpful query in its predicted list of top-3 relevant queries 96% of the time. Our developer study confirms the utility of Revelio.
@inproceedings{dogga2022system, bibtex_show = {true}, title = {Revelio: ML-Generated Debugging Queries for Finding Root Causes in Distributed Systems}, author = {Dogga, Pradeep and Narasimhan, Karthik and Sivaraman, Anirudh and Saini, Shiv and Varghese, George and Netravali, Ravi}, booktitle = {Proceedings of Machine Learning and Systems (MLSys)}, year = {2022}, html = {https://proceedings.mlsys.org/paper_files/paper/2022/hash/650e2245aa3513ed517f4cf1b3d58e06-Abstract.html} }

2021

NLP RL
SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark

Victor Zhong, Austin W. Hanjie, Sida I. Wang, Karthik Narasimhan, and Luke Zettlemoyer

Neural Information Processing Systems (NeurIPS), 2021

Abs Bib Paper

Existing work in language grounding typically study single environments. How do we build unified models that apply across multiple environments? We propose the multi-environment Symbolic Interactive Language Grounding benchmark (SILG), which unifies a collection of diverse grounded language learning environments under a common interface. SILG consists of grid-world environments that require generalization to new dynamics, entities, and partially observed worlds (RTFM, Messenger, NetHack), as well as symbolic counterparts of visual worlds that require interpreting rich natural language with respect to complex scenes (ALFWorld, Touchdown). Together, these environments provide diverse grounding challenges in richness of observation space, action space, language specification, and plan complexity. In addition, we propose the first shared model architecture for RL on these environments, and evaluate recent advances such as egocentric local convolution, recurrent state-tracking, entity-centric attention, and pretrained LM using SILG. Our shared architecture achieves comparable performance to environment-specific architectures. Moreover, we find that many recent modelling advances do not result in significant gains on environments other than the one they were designed for. This highlights the need for a multi-environment benchmark. Finally, the best models significantly underperform humans on SILG, which suggests ample room for future work. We hope SILG enables the community to quickly identify new methodologies for language grounding that generalize to a diverse set of environments and their associated challenges.
@inproceedings{zhong2021silg, bibtex_show = {true}, title = {SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark}, author = {Zhong, Victor and Hanjie, Austin W. and Wang, Sida I. and Narasimhan, Karthik and Zettlemoyer, Luke}, booktitle = {Neural Information Processing Systems (NeurIPS)}, year = {2021}, html = {https://arxiv.org/abs/2110.10661}, tag = {NLP}, tagg = {RL} }
NLP RL
Safe Reinforcement Learning with Natural Language Constraints

Tsung-Yen Yang, Michael Hu, Yinlam Chow, Peter J. Ramadge, and Karthik Narasimhan

Neural Information Processing Systems (NeurIPS), 2021
Spotlight

Abs Bib Paper

In this paper, we tackle the problem of learning control policies for tasks when provided with constraints in natural language. In contrast to instruction following, language here is used not to specify goals, but rather to describe situations that an agent must avoid during its exploration of the environment. Specifying constraints in natural language also differs from the predominant paradigm in safe reinforcement learning, where safety criteria are enforced by hand-defined cost functions. While natural language allows for easy and flexible specification of safety constraints and budget limitations, its ambiguous nature presents a challenge when mapping these specifications into representations that can be used by techniques for safe reinforcement learning. To address this, we develop a model that contains two components: (1) a constraint interpreter to encode natural language constraints into vector representations capturing spatial and temporal information on forbidden states, and (2) a policy network that uses these representations to output a policy with minimal constraint violations. Our model is end-to-end differentiable and we train it using a recently proposed algorithm for constrained policy optimization. To empirically demonstrate the effectiveness of our approach, we create a new benchmark task for autonomous navigation with crowd-sourced free-form text specifying three different types of constraints. Our method outperforms several baselines by achieving 6-7 times higher returns and 76% fewer constraint violations on average.
@inproceedings{yang2020safe, bibtex_show = {true}, title = {Safe Reinforcement Learning with Natural Language Constraints}, author = {Yang, Tsung-Yen and Hu, Michael and Chow, Yinlam and Ramadge, Peter J. and Narasimhan, Karthik}, booktitle = {Neural Information Processing Systems (NeurIPS)}, year = {2021}, html = {https://arxiv.org/abs/2010.05150}, tag = {NLP}, tagg = {RL}, misc = {Spotlight} }
NLP RL
Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning

Austin W. Hanjie, Victor Zhong, and Karthik Narasimhan

International Conference on Machine Learning (ICML), 2021

Abs Bib Paper

We consider the problem of leveraging textual descriptions to improve generalization of control policies. We introduce a new multi-task environment Messenger with free-form natural language manuals describing the environment dynamics. In contrast to previous work, Messenger does not assume prior knowledge connecting text and state observations – the control policy must simultaneously learn to ground a natural language manual to entity symbols and dynamics in the environment. In order to learn this challenging grounding, we develop a new model, EMMA (Entity Mapper with Multi-modal Attention) which uses a multi-modal entity-conditioned attention module that allows for selective focus over relevant sentences in the manual for each entity in the environment. EMMA is end-to-end differentiable and can learn a latent grounding of entities and dynamics from text to observations using environment rewards as the only source of supervision. We demonstrate that EMMA achieves successful zero-shot generalization to unseen games with new dynamics, obtaining significantly higher rewards compared to multiple baselines. However, performance on the hardest stage of Messenger remains low, demonstrating the significant challenge in accurately grounding dynamics and the need for additional work in this direction.
@inproceedings{wang2021grounding, bibtex_show = {true}, title = {Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning}, author = {Hanjie, Austin W. and Zhong, Victor and Narasimhan, Karthik}, booktitle = {International Conference on Machine Learning (ICML)}, year = {2021}, html = {https://arxiv.org/abs/2101.07393}, tag = {NLP}, tagg = {RL} }
RL
Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies

Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, and Peter J. Ramadge

International Conference on Machine Learning (ICML), 2021

Abs Bib Paper

We consider the problem of reinforcement learning when provided with (1) a baseline control policy and (2) a set of constraints that the controlled system must satisfy. The baseline policy can arise from a teacher agent, demonstration data or even a heuristic while the constraints might encode safety, fairness or other application-specific requirements. Importantly, the baseline policy may be sub-optimal for the task at hand, and is not guaranteed to satisfy the specified constraints. The key challenge therefore lies in effectively leveraging the baseline policy for faster learning, while still ensuring that the constraints are minimally violated. To reconcile these potentially competing aspects, we propose an iterative policy optimization algorithm that alternates between maximizing expected return on the task, minimizing distance to the baseline policy, and projecting the policy onto the constraint-satisfying set. We analyze the convergence of our algorithm theoretically and provide a finite-time guarantee. In our empirical experiments on five different control tasks, our algorithm consistently outperforms several state-of-the-art methods, achieving 10 times fewer constraint violations and 40% higher reward on average.
@inproceedings{yang2020accelerating, bibtex_show = {true}, title = {Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies}, author = {Yang, Tsung-Yen and Rosca, Justinian and Narasimhan, Karthik and Ramadge, Peter J.}, booktitle = {International Conference on Machine Learning (ICML)}, year = {2021}, html = {https://arxiv.org/abs/2006.11645}, tag = {RL} }
NLP RL
Improving Dialog Systems for Negotiation with Personality Modeling

Runzhe Yang*, Jingxiao Chen*, and Karthik Narasimhan

Association for Computational Linguistics (ACL), 2021

Abs Bib Paper

In this paper, we explore the ability to model and infer personality types of opponents, predict their responses, and use this information to adapt a dialog agent’s high-level strategy in negotiation tasks. Inspired by the idea of incorporating a theory of mind (ToM) into machines, we introduce a probabilistic formulation to encapsulate the opponent’s personality type during both learning and inference. We test our approach on the CraigslistBargain dataset and show that our method using ToM inference achieves a 20% higher dialog agreement rate compared to baselines on a mixed population of opponents. We also demonstrate that our model displays diverse negotiation behavior with different types of opponents.
@inproceedings{yang2021improving, bibtex_show = {true}, title = {Improving Dialog Systems for Negotiation with Personality Modeling}, author = {Yang*, Runzhe and Chen*, Jingxiao and Narasimhan, Karthik}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2021}, html = {https://arxiv.org/abs/2010.09954}, tag = {NLP}, tagg = {RL} }
NLP
Self-Attention Networks Can Process Bounded Hierarchical Languages

Shunyu Yao, Binghui Peng, Christos Papadimitriou, and Karthik Narasimhan

Association for Computational Linguistics (ACL), 2021

Abs Bib Paper

Despite their impressive performance in NLP, self-attention networks were recently proved to be limited for processing formal languages with hierarchical structure, such as Dyck_k, the language consisting of well-nested parentheses of k types. This suggested that natural language can be approximated well with models that are too weak for formal languages, or that the role of hierarchy and recursion in natural language might be limited. We qualify this implication by proving that self-attention networks can process Dyck_k, D, the subset of Dyck_k with depth bounded by D, which arguably better captures the bounded hierarchical structure of natural language. Specifically, we construct a hard-attention network with D+1 layers and O(log k) memory size (per token per layer) that recognizes Dyck_k, D, and a soft-attention network with two layers and O(log k) memory size that generates Dyck_k, D. Experiments show that self-attention networks trained on Dyck_k, D generalize to longer inputs with near-perfect accuracy, and also verify the theoretical memory advantage of self-attention networks over recurrent networks.
@inproceedings{yao2021self, bibtex_show = {true}, title = {Self-Attention Networks Can Process Bounded Hierarchical Languages}, author = {Yao, Shunyu and Peng, Binghui and Papadimitriou, Christos and Narasimhan, Karthik}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2021}, html = {https://arxiv.org/abs/2105.11115}, tag = {NLP} }
Connecting Context-specific Adaptation in Humans to Meta-learning

Rachit Dubey, Erin Grant, Michael Luo, Karthik Narasimhan, and Thomas Griffiths

ArXiv, 2021

Abs Bib Paper

Cognitive control, the ability of a system to adapt to the demands of a task, is an integral part of cognition. A widely accepted fact about cognitive control is that it is context-sensitive: Adults and children alike infer information about a task’s demands from contextual cues and use these inferences to learn from ambiguous cues. However, the precise way in which people use contextual cues to guide adaptation to a new task remains poorly understood. This work connects the context-sensitive nature of cognitive control to a method for meta-learning with context-conditioned adaptation. We begin by identifying an essential difference between human learning and current approaches to meta-learning: In contrast to humans, existing meta-learning algorithms do not make use of task-specific contextual cues but instead rely exclusively on online feedback in the form of task-specific labels or rewards. To remedy this, we introduce a framework for using contextual information about a task to guide the initialization of task-specific models before adaptation to online feedback. We show how context-conditioned meta-learning can capture human behavior in a cognitive task and how it can be scaled to improve the speed of learning in various settings, including few-shot classification and low-sample reinforcement learning. Our work demonstrates that guiding meta-learning with task information can capture complex, human-like behavior, thereby deepening our understanding of cognitive control.
@inproceedings{dubey2020connecting, bibtex_show = {true}, title = {Connecting Context-specific Adaptation in Humans to Meta-learning}, author = {Dubey, Rachit and Grant, Erin and Luo, Michael and Narasimhan, Karthik and Griffiths, Thomas}, booktitle = {ArXiv}, year = {2021}, html = {https://arxiv.org/abs/2011.13782} }
NLP RL
Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents

Shunyu Yao, Karthik Narasimhan, and Matthew Hausknecht

Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021

Abs Bib Paper

Text-based games simulate worlds and interact with players using natural language. Recent work has used them as a testbed for autonomous language-understanding agents, with the motivation being that understanding the meanings of words or semantics is a key component of how humans understand, reason, and act in these worlds. However, it remains unclear to what extent artificial agents utilize semantic understanding of the text. To this end, we perform experiments to systematically reduce the amount of semantic information available to a learning agent. Surprisingly, we find that an agent is capable of achieving high scores even in the complete absence of language semantics, indicating that the currently popular experimental setup and models may be poorly designed to understand and leverage game texts. To remedy this deficiency, we propose an inverse dynamics decoder to regularize the representation space and encourage exploration, which shows improved performance on several games including Zork I. We discuss the implications of our findings for designing future agents with stronger semantic understanding.
@inproceedings{yao2021reading, bibtex_show = {true}, title = {Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents}, author = {Yao, Shunyu and Narasimhan, Karthik and Hausknecht, Matthew}, booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2021}, html = {https://arxiv.org/abs/2103.13552}, tag = {NLP}, tagg = {RL} }
NLP
Universal Adversarial Attacks with Natural Triggers for Text Classification

Liwei Song*, Xinwei Yu*, Hsuan-Tung Peng*, and Karthik Narasimhan

Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021

Abs Bib Paper

Recent work has demonstrated the vulnerability of modern text classifiers to universal adversarial attacks, which are input-agnostic sequence of words added to any input instance. Despite being highly successful, the word sequences produced in these attacks are often unnatural, do not carry much semantic meaning, and can be easily distinguished from natural text. In this paper, we develop adversarial attacks that appear closer to natural English phrases and yet confuse classification systems when added to benign inputs. To achieve this, we leverage an adversarially regularized autoencoder (ARAE) to generate triggers and propose a gradient-based search method to output natural text that fools a target classifier. Experiments on two different classification tasks demonstrate the effectiveness of our attacks while also being less identifiable than previous approaches on three simple detection metrics.
@inproceedings{song2021universal, bibtex_show = {true}, title = {Universal Adversarial Attacks with Natural Triggers for Text Classification}, author = {Song*, Liwei and Yu*, Xinwei and Peng*, Hsuan-Tung and Narasimhan, Karthik}, booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2021}, html = {https://arxiv.org/abs/2005.00174}, tag = {NLP} }
NLP RL
Learning Rewards from Linguistic Feedback

Theodore R. Sumers, Mark K. Ho, Robert D. Hawkins, Karthik Narasimhan, and Thomas L. Griffiths

Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Abs Bib Paper

We explore unconstrained natural language feedback as a learning signal for artificial agents. Humans use rich and varied language to teach, yet most prior work on interactive learning from language assumes a particular form of input (e.g. commands). We propose a general framework which does not make this assumption. We decompose linguistic feedback into two components: a grounding to features of a Markov decision process and sentiment about those features. We then perform an analogue of inverse reinforcement learning, regressing the teacher’s sentiment on the features to infer their latent reward function. To evaluate our approach, we first collect a corpus of teaching behavior in a cooperative task where both teacher and learner are human. We use our framework to implement two artificial learners: a simple "literal" model and a "pragmatic" model with additional inductive biases. We baseline these with a neural network trained end-to-end to predict latent rewards. We then repeat our initial experiment pairing human teachers with our models. We find our "literal" and "pragmatic" models successfully learn from live human feedback and offer statistically-significant performance gains over the end-to-end baseline, with the "pragmatic" model approaching human performance on the task. Inspection reveals the end-to-end network learns representations similar to our models, suggesting they reflect emergent properties of the data. Our work thus provides insight into the information structure of naturalistic linguistic feedback as well as methods to leverage it for reinforcement learning.
@inproceedings{sumers2020learning, bibtex_show = {true}, title = {Learning Rewards from Linguistic Feedback}, author = {Sumers, Theodore R. and Ho, Mark K. and Hawkins, Robert D. and Narasimhan, Karthik and Griffiths, Thomas L.}, booktitle = {Thirty-Fifth AAAI Conference on Artificial Intelligence}, year = {2021}, html = {https://arxiv.org/abs/2009.14715}, tag = {NLP}, tagg = {RL} }
RL
m-Stage Epsilon-Greedy Exploration for Reinforcement Learning

Rohan Rao, and Karthik Narasimhan

AAAI-21 Workshop on Reinforcement Learning in Games, 2021

Abs Bib Paper

Efficient exploration of the environment is a major challenge for reinforcement learning agents, especially in sparse reward settings. This is evident from the fact that simple schemes such as eps-greedy remain competitive with more complicated algorithms for exploration. In this paper, we propose a generalization of eps-greedy, called m-stage eps-greedy in which eps increases within each episode but decreases between episodes. This ensures that by the time an agent gets to explore the later states within an episode, eps has not decayed too much to do any meaningful exploration. We provide theoretical results motivating the use of our algorithm in task based environments, and provide experimental evidence in two types of environments demonstrating the effectiveness of our method.
@inproceedings{rao2021mstage, bibtex_show = {true}, title = {m-Stage Epsilon-Greedy Exploration for Reinforcement Learning}, author = {Rao, Rohan and Narasimhan, Karthik}, booktitle = {AAAI-21 Workshop on Reinforcement Learning in Games}, year = {2021}, html = {http://aaai-rlg.mlanctot.info/papers/AAAI21-RLG_paper_48.pdf}, tag = {RL} }

2020

NLP RL
Keep CALM and Explore: Language Models for Action Generation in Text-based Games

Shunyu Yao, Rohan Rao, Matthew Hausknecht, and Karthik Narasimhan

Empirical Methods in Natural Language Processing (EMNLP), 2020

Abs Bib Paper

Text-based games present a unique challenge for autonomous agents to operate in natural language and handle enormous action spaces. In this paper, we propose the Contextual Action Language Model (CALM) to generate a compact set of action candidates at each game state. Our key insight is to train language models on human gameplay, where people demonstrate linguistic priors and a general game sense for promising actions conditioned on game history. We combine CALM with a reinforcement learning agent which re-ranks the generated action candidates to maximize ingame rewards. We evaluate our approach using the Jericho benchmark (Hausknecht et al., 2019a), on games unseen by CALM during training. Our method obtains a 69% relative improvement in average game score over the previous state-of-the-art model. Surprisingly, on half of these games, CALM is competitive with or better than other models that have access to ground truth admissible actions.
@inproceedings{yao2020calm, bibtex_show = {true}, title = {Keep CALM and Explore: Language Models for Action Generation in Text-based Games}, author = {Yao, Shunyu and Rao, Rohan and Hausknecht, Matthew and Narasimhan, Karthik}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, year = {2020}, html = {https://arxiv.org/abs/2010.02903}, tag = {NLP}, tagg = {RL} }
NLP
Guiding Attention for Self-Supervised Learning with Transformers

Ameet Deshpande, and Karthik Narasimhan

Findings of Empirical Methods in Natural Language Processing (EMNLP), 2020

Abs Bib Paper

In this paper, we propose a simple and effective technique to allow for efficient selfsupervised learning with bi-directional Transformers. Our approach is motivated by recent studies demonstrating that self-attention patterns in trained models contain a majority of non-linguistic regularities. We propose a computationally efficient auxiliary loss function to guide attention heads to conform to such patterns. Our method is agnostic to the actual pretraining objective and results in faster convergence of models as well as better performance on downstream tasks compared to the baselines, even achieving state of the art results on a low-resource language. We conclude with a surprising finding that linguistic properties of attention heads are not necessarily correlated with language modeling performance.
@inproceedings{deshpande2020guiding, bibtex_show = {true}, title = {Guiding Attention for Self-Supervised Learning with Transformers}, author = {Deshpande, Ameet and Narasimhan, Karthik}, booktitle = {Findings of Empirical Methods in Natural Language Processing (EMNLP)}, year = {2020}, html = {https://arxiv.org/abs/2010.02399}, tag = {NLP} }
NLP
Robust and Interpretable Grounding of Spatial References with Relation Networks

Tsung-Yen Yang, Andrew S. Lan, and Karthik Narasimhan

Findings of Empirical Methods in Natural Language Processing (EMNLP), 2020

Abs Bib Paper

Handling spatial references in natural language is a key challenge in tasks like autonomous navigation and robotic manipulation. Recent work has investigated various neural architectures for learning multi-modal representations of spatial concepts that generalize well across a variety of observations and text instructions. In this work, we develop accurate models for understanding spatial references in text that are also robust and interpretable. We design a text-conditioned relation network whose parameters are dynamically computed with a cross-modal attention module to capture fine-grained spatial relations between entities. Our experiments across three different prediction tasks demonstrate the effectiveness of our model compared to existing state-of-the-art systems. Our model is robust to both observational and instructional noise, and lends itself to easy interpretation through visualization of intermediate outputs.
@inproceedings{yang2020robust, bibtex_show = {true}, title = {Robust and Interpretable Grounding of Spatial References with Relation Networks}, author = {Yang, Tsung-Yen and Lan, Andrew S. and Narasimhan, Karthik}, booktitle = {Findings of Empirical Methods in Natural Language Processing (EMNLP)}, year = {2020}, html = {https://arxiv.org/abs/2005.00696}, tag = {NLP}, taggg = {RL} }
NLP CV
Multimodal Graph Networks for Compositional Generalization in Visual Question Answering

Raeid Saqur, and Karthik Narasimhan

Neural Information Processing Systems (NeurIPS), 2020

Abs Bib Paper

Compositional generalization is a key challenge in grounding natural language to visual perception. While deep learning models have achieved great success in multimodal tasks like visual question answering, recent studies have shown that they fail to generalize to new inputs that are simply an unseen combination of those seen in the training distribution. In this paper, we propose to tackle this challenge by employing neural factor graphs to induce a tighter coupling between concepts in different modalities (e.g. images and text). Graph representations are inherently compositional in nature and allow us to capture entities, attributes and relations in a scalable manner. Our model first creates a multimodal graph, processes it with a graph neural network to induce a factor correspondence matrix, and then outputs a symbolic program to predict answers to questions. Empirically, our model achieves close to perfect scores on a caption truth prediction problem and state-of-the-art results on the recently introduced CLOSURE dataset, improving on the mean overall accuracy across seven compositional templates by 4.77% over previous approaches.
@inproceedings{saqur2020mgn, bibtex_show = {true}, title = {Multimodal Graph Networks for Compositional Generalization in Visual Question Answering}, author = {Saqur, Raeid and Narasimhan, Karthik}, booktitle = {Neural Information Processing Systems (NeurIPS)}, year = {2020}, html = {https://proceedings.neurips.cc/paper/2020/file/1fd6c4e41e2c6a6b092eb13ee72bce95-Paper.pdf}, tag = {NLP}, tagg = {CV} }
NLP CV
Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation

Zhiwei Deng, Karthik Narasimhan, and Olga Russakovsky

Neural Information Processing Systems (NeurIPS), 2020

Abs Bib Paper

The ability to perform effective planning is crucial for building an instruction-following agent. When navigating through a new environment, an agent is challenged with (1) connecting the natural language instructions with its progressively growing knowledge of the world; and (2) performing long-range planning and decision making in the form of effective exploration and error correction. Current methods are still limited on both fronts despite extensive efforts. In this paper, we introduce the Evolving Graphical Planner (EGP), a model that performs global planning for navigation based on raw sensory input. The model dynamically constructs a graphical representation, generalizes the action space to allow for more flexible decision making, and performs efficient planning on a proxy graph representation. We evaluate our model on a challenging Vision-and-Language Navigation (VLN) task with photorealistic images and achieve superior performance compared to previous navigation architectures. For instance, we achieve a 53% success rate on the test split of the Room-to-Room navigation task through pure imitation learning, outperforming previous navigation architectures by up to 5%.
@inproceedings{deng2020evolving, bibtex_show = {true}, title = {Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation}, author = {Deng, Zhiwei and Narasimhan, Karthik and Russakovsky, Olga}, booktitle = {Neural Information Processing Systems (NeurIPS)}, year = {2020}, html = {https://arxiv.org/abs/2007.05655}, tag = {NLP}, tagg = {CV}, taggg = {RL} }
NLP CV
Towards Unique and Informative Captioning of Images

Zeyu Wang, Berthy Feng, Karthik Narasimhan, and Olga Russakovsky

European Conference on Computer Vision (ECCV), 2020

Abs Bib Paper

Despite considerable progress, state of the art image captioning models produce generic captions, leaving out important image details. Furthermore, these systems may even misrepresent the image in order to produce a simpler caption consisting of common concepts. In this paper, we first analyze both modern captioning systems and evaluation metrics through empirical experiments to quantify these phenomena. We find that modern captioning systems return higher likelihoods for incorrect distractor sentences compared to ground truth captions, and that evaluation metrics like SPICE can be ’topped’ using simple captioning systems relying on object detectors. Inspired by these observations, we design a new metric (SPICE-U) by introducing a notion of uniqueness over the concepts generated in a caption. We show that SPICE-U is better correlated with human judgements compared to SPICE, and effectively captures notions of diversity and descriptiveness. Finally, we also demonstrate a general technique to improve any existing captioning model – by using mutual information as a re-ranking objective during decoding. Empirically, this results in more unique and informative captions, and improves two different state-of-the-art models on SPICE-U as well as average score over existing metrics.
@inproceedings{wang2020captioning, bibtex_show = {true}, title = {Towards Unique and Informative Captioning of Images}, author = {Wang, Zeyu and Feng, Berthy and Narasimhan, Karthik and Russakovsky, Olga}, booktitle = {European Conference on Computer Vision (ECCV)}, year = {2020}, html = {http://www.ecva.net/papers/eccv_2020/papers_ECCV/html/350_ECCV_2020_paper.php}, tag = {NLP}, tagg = {CV} }
NLP
Calibration, Entropy Rates, and Memory in Language Models

Mark Braverman, Xinyi Chen, Sham Kakade, Karthik Narasimhan, Cyril Zhang, and Yi Zhang

International Conference on Machine Learning (ICML), 2020

Abs Bib Paper

Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are miscalibrated: the entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction.
@inproceedings{braverman2020calibration, bibtex_show = {true}, title = {Calibration, Entropy Rates, and Memory in Language Models}, author = {Braverman, Mark and Chen, Xinyi and Kakade, Sham and Narasimhan, Karthik and Zhang, Cyril and Zhang, Yi}, booktitle = {International Conference on Machine Learning (ICML)}, year = {2020}, html = {https://arxiv.org/abs/1906.05664}, tag = {NLP} }
RL
Projection Based Constrained Policy Optimization

Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, and Peter J. Ramadge

International Conference on Learning Representations (ICLR) , 2020

Abs Bib Paper

In this paper, we consider the problem of learning control policies that optimize a reward function while satisfying constraints due to considerations of safety, fairness, or other costs. We propose a new algorithm - Projection Based Constrained Policy Optimization (PCPO), an iterative method for optimizing policies in a two-step process - the first step performs an unconstrained update while the second step reconciles the constraint violation by projection the policy back onto the constraint set. We theoretically analyze PCPO and provide a lower bound on reward improvement, as well as an upper bound on constraint violation for each policy update. We further characterize the convergence of PCPO with projection based on two different metrics - L2 norm and Kullback-Leibler divergence. Our empirical results over several control tasks demonstrate that our algorithm achieves superior performance, averaging more than 3.5 times less constraint violation and around 15% higher reward compared to state-of-the-art methods.
@inproceedings{yang2020projection, bibtex_show = {true}, title = {Projection Based Constrained Policy Optimization}, author = {Yang, Tsung-Yen and Rosca, Justinian and Narasimhan, Karthik and Ramadge, Peter J.}, booktitle = {International Conference on Learning Representations (ICLR) }, year = {2020}, html = {https://openreview.net/forum?id=rke3TJrtPS}, tag = {RL} }
NLP CV
Take the scenic route: improving generalization in vision-and-language navigation

Felix Yu, Zhiwei Deng, Karthik Narasimhan, and Olga Russakovsky

CVPR Visual Learning with Limited Labels Workshop, 2020

Abs Bib Paper

In the Vision-and-Language Navigation (VLN) task, an agent with egocentric vision navigates to a destination given natural language instructions. The act of manually annotating these instructions is timely and expensive, such that many existing approaches automatically generate additional samples to improve agent performance. However, these approaches still have difficulty generalizing their performance to new environments. In this work, we investigate the popular Room-to-Room (R2R) VLN benchmark and discover that what is important is not only the amount of data you synthesize, but also how you do it. We find that shortest path sampling, which is used by both the R2R benchmark and existing augmentation methods, encode biases in the action space of the agent which we dub as action priors. We then show that these action priors offer one explanation toward the poor generalization of existing works. To mitigate such priors, we propose a path sampling method based on random walks to augment the data. By training with this augmentation strategy, our agent is able to generalize better to unknown environments compared to the baseline, significantly improving model performance in the process.
@inproceedings{yu2020vln, bibtex_show = {true}, author = {Yu, Felix and Deng, Zhiwei and Narasimhan, Karthik and Russakovsky, Olga}, title = {Take the scenic route: improving generalization in vision-and-language navigation}, booktitle = {CVPR Visual Learning with Limited Labels Workshop}, year = {2020}, html = {https://arxiv.org/abs/2003.14269}, tag = {NLP}, tagg = {CV} }

2019

RL
A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation

Runzhe Yang, Xingyuan Sun, and Karthik Narasimhan

Neural Information Processing Systems (NeurIPS), 2019

Abs Bib Paper

We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce optimal policies under different preference conditions. We propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences. After this initial learning phase, our agent can quickly adapt to any given preference, or automatically infer an underlying preference with very few samples. Experiments across four different domains demonstrate the effectiveness of our approach.
@inproceedings{yang2019morl, bibtex_show = {true}, title = {A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation}, author = {Yang, Runzhe and Sun, Xingyuan and Narasimhan, Karthik}, booktitle = {Neural Information Processing Systems (NeurIPS)}, year = {2019}, html = {https://arxiv.org/abs/1908.08342}, tag = {RL} }
RL
Task-Agnostic Dynamics Priors for Deep Reinforcement Learning

Yilun Du, and Karthik Narasimhan

International Conference on Machine Learning (ICML), 2019

Abs Bib Paper Code

While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is often challenging and requires substantial interaction with the environment. A wide variety of domains have dynamics that share common foundations like the laws of classical mechanics, which are rarely exploited by existing algorithms. In fact, humans continuously acquire and use such dynamics priors to easily adapt to operating in new environments. In this work, we propose an approach to learn task-agnostic dynamics priors from videos and incorporate them into an RL agent. Our method involves pre-training a frame predictor on task-agnostic physics videos to initialize dynamics models (and fine-tune them) for unseen target environments. Our frame prediction architecture, SpatialNet, is designed specifically to capture localized physical phenomena and interactions. Our approach allows for both faster policy learning and convergence to better policies, outperforming competitive approaches on several different environments. We also demonstrate that incorporating this prior allows for more effective transfer between environments.
@inproceedings{du2019task, bibtex_show = {true}, title = {Task-Agnostic Dynamics Priors for Deep Reinforcement Learning}, author = {Du, Yilun and Narasimhan, Karthik}, booktitle = {International Conference on Machine Learning (ICML)}, year = {2019}, code = {https://github.com/yilundu/task_agnostic_dynamics_prior}, html = {https://arxiv.org/pdf/1905.04819.pdf}, tag = {RL} }

2018

NLP RL
Deep Transfer in Reinforcement Learning by Language Grounding

Karthik Narasimhan, Regina Barzilay, and Tommi Jaakkola

Journal of Artificial Intelligence Research (JAIR), 2018

Abs Bib Paper Code

In this paper, we explore the utilization of natural language to drive transfer for reinforcement learning (RL). Despite the wide-spread application of deep RL techniques, learning generalized policy representations that work across domains remains a challenging problem. We demonstrate that textual descriptions of environments provide a compact intermediate channel to facilitate effective policy transfer. We employ a model-based RL approach consisting of a differentiable planning module, a model-free component and a factorized representation to effectively utilize entity descriptions. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments.
@inproceedings{narasimhan2018deep, bibtex_show = {true}, title = {Deep Transfer in Reinforcement Learning by Language Grounding}, author = {Narasimhan, Karthik and Barzilay, Regina and Jaakkola, Tommi}, booktitle = {Journal of Artificial Intelligence Research (JAIR)}, year = {2018}, code = {https://github.com/karthikncode/Grounded-RL-Transfer}, html = {https://arxiv.org/pdf/1708.00133.pdf}, tag = {NLP}, tagg = {RL} }
NLP
Improving language understanding by generative pre-training

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever

Technical Report, 2018

Abs Bib Paper

Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).
@inproceedings{radford2018improving, bibtex_show = {true}, title = {Improving language understanding by generative pre-training}, author = {Radford, Alec and Narasimhan, Karthik and Salimans, Tim and Sutskever, Ilya}, booktitle = {Technical Report}, year = {2018}, html = {https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf}, tag = {NLP} }

2017

NLP RL
Grounding Natural Language with Autonomous Interaction

Karthik Narasimhan

PhD Thesis, 2017

Abs Bib PDF

The resurgence of deep neural networks has resulted in impressive advances in natural language processing (NLP). This success, however, is contingent on access to large amounts of structured supervision, often manually constructed and unavailable for many applications and domains. In this thesis, I present novel computational models that integrate reinforcement learning with language understanding to induce grounded representations of semantics. Using unstructured feedback, these techniques not only enable task-optimized representations which reduce dependence on high quality annotations, but also exploit language in adapting control policies across different environments. First, I describe an approach for learning to play text-based games, where all interaction is through natural language and the only source of feedback is in-game rewards. Employing a deep reinforcement learning framework to jointly learn state representations and action policies, our model outperforms several baselines on different domains, demonstrating the importance of learning expressive representations. Second, I exhibit a framework for utilizing textual descriptions to tackle the challenging problem of cross-domain policy transfer for reinforcement learning (RL). We employ a model-based RL approach consisting of a differentiable planning module, a model-free component and a factorized state representation to effectively make use of text. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments. Finally, I demonstrate how reinforcement learning can enhance traditional NLP systems in low resource scenarios. In particular, I describe an autonomous agent that can learn to acquire and integrate external information to enhance information extraction. Our experiments on two databases – shooting incidents and food adulteration cases – demonstrate that our system significantly improves over traditional extractors and a competitive meta-classifier baseline.
@inproceedings{narasimhan2017grounding, bibtex_show = {true}, title = {Grounding Natural Language with Autonomous Interaction}, author = {Narasimhan, Karthik}, booktitle = {PhD Thesis}, year = {2017}, pdf = {thesis.pdf}, tag = {NLP}, tagg = {RL} }
NLP RL
Representation Learning for Grounded Spatial Reasoning

Michael Janner, Karthik Narasimhan, and Regina Barzilay

Transactions of the Association for Computational Linguistics (TACL), 2017

Abs Bib Paper Code

The interpretation of spatial references is highly contextual, requiring joint inference over both language and the environment. We consider the task of spatial reasoning in a simulated environment, where an agent can act and receive rewards. The proposed model learns a representation of the world steered by instruction text. This design allows for precise alignment of local neighborhoods with corresponding verbalizations, while also handling global references in the instructions. We train our model with reinforcement learning using a variant of generalized value iteration. The model outperforms state-of-the-art approaches on several metrics, yielding a 45% reduction in goal localization error.
@inproceedings{janner2017representation, bibtex_show = {true}, title = {Representation Learning for Grounded Spatial Reasoning}, author = {Janner, Michael and Narasimhan, Karthik and Barzilay, Regina}, booktitle = {Transactions of the Association for Computational Linguistics (TACL)}, year = {2017}, code = {https://github.com/JannerM/spatial-reasoning}, html = {https://arxiv.org/pdf/1707.03938.pdf}, tag = {NLP}, tagg = {RL} }
NLP
Unsupervised Learning of Morphological Forests

Jiaming Luo, Karthik Narasimhan, and Regina Barzilay

Transactions of the Association for Computational Linguistics (TACL), 2017

Abs Bib PDF Code

This paper focuses on unsupervised modeling of morphological families, collectively comprising a forest over the language vocabulary. This formulation enables us to capture edge-wise properties reflecting single-step morphological derivations, along with global distributional properties of the entire forest. These global properties constrain the size of the affix set and encourage formation of tight morphological families. The resulting objective is solved using Integer Linear Programming (ILP) paired with contrastive estimation. We train the model by alternating between optimizing the local log-linear model and the global ILP objective. We evaluate our system on three tasks: root detection, clustering of morphological families and segmentation. Our experiments demonstrate that our model yields consistent gains in all three tasks compared with the best published results.
@inproceedings{luo2017unsupervised, bibtex_show = {true}, title = {Unsupervised Learning of Morphological Forests}, author = {Luo, Jiaming and Narasimhan, Karthik and Barzilay, Regina}, booktitle = {Transactions of the Association for Computational Linguistics (TACL)}, year = {2017}, code = {https://github.com/j-luo93/MorphForest}, pdf = {luo2017unsupervised.pdf}, tag = {NLP} }
NLP
Constructing sub-word units for Spoken Term Detection

Charl Heerden, Damianos Karakos, Karthik Narasimhan, Marelie Davel, and Richard Schwartz

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017

Abs Bib PDF

Spoken term detection, especially of out-of-vocabulary (OOV) keywords, benefits from the use of sub-word systems. We experiment with different language-independent approaches to sub-word unit generation, generating both syllable-like and morpheme-like units, and demonstrate how the performance of syllable-like units can be improved by artificially increasing the number of unique units. The effect of unit choice is empirically evaluated using the eight languages from the 2016 IARPA BABEL evaluation.
@inproceedings{heerden2017constructing, bibtex_show = {true}, title = {Constructing sub-word units for Spoken Term Detection}, author = {van Heerden, Charl and Karakos, Damianos and Narasimhan, Karthik and Davel, Marelie and Schwartz, Richard}, booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, year = {2017}, pdf = {heerden2017constructing.pdf}, tag = {NLP}, tagg = {Speech} }

2016

NLP RL
Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning

Karthik Narasimhan, Adam Yala, and Regina Barzilay

Empirical Methods in Natural Language Processing (EMNLP), 2016
Best paper award

Abs Bib PDF Code Slides

Most successful information extraction systems operate with access to a large collection of documents. In this work, we explore the task of acquiring and incorporating external evidence to improve extraction accuracy in domains where the amount of training data is scarce. This process entails issuing search queries, extraction from new sources and reconciliation of extracted values, which are repeated until sufficient evidence is collected. We approach the problem using a reinforcement learning framework where our model learns to select optimal actions based on contextual information. We employ a deep Qnetwork, trained to optimize a reward function that reflects extraction accuracy while penalizing extra effort. Our experiments on two databases – of shooting incidents, and food adulteration cases – demonstrate that our system significantly outperforms traditional extractors and a competitive meta-classifier baseline.
@inproceedings{narasimhan2016improving, bibtex_show = {true}, title = {Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning}, author = {Narasimhan, Karthik and Yala, Adam and Barzilay, Regina}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, year = {2016}, pdf = {rlie16.pdf}, slides = {rlie16-slides.pdf}, media = {<a href="https://news.mit.edu/2016/artificial-intelligence-system-surfs-web-improve-performance-1110" target="_blank">MIT news</a>, <a href="https://www.digitaltrends.com/cool-tech/ai-surfs-the-web-learning/" target="_blank">Digital Trends</a>, <a href="https://www.techradar.com/news/mit-let-an-ai-loose-on-the-web-what-could-possibly-go-wrong" target="_blank">TechRadar</a>, <a href="https://economictimes.indiatimes.com/magazines/panache/new-ai-system-to-extract-data-from-the-internet-more-effectively/articleshow/55400944.cms?from=mdr" target="_blank">Economic Times</a>}, code = {https://github.com/karthikncode/DeepRL-InformationExtraction}, misc = {Best paper award}, tag = {NLP}, tagg = {RL} }
RL
Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation

Tejas D Kulkarni*, Karthik R Narasimhan*, Ardavan Saeedi, and Joshua B Tenenbaum

Neural Information Processing Systems (NIPS), 2016

Abs Bib PDF Code

Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms. One of the key difficulties is insufficient exploration, resulting in an agent being unable to learn robust policies. Intrinsically motivated agents can explore new behavior for their own sake rather than to directly solve external goals. Such intrinsic behaviors could eventually help the agent solve tasks posed by the environment. We present hierarchical- DQN (h-DQN), a framework to integrate hierarchical action-value functions, operating at different temporal scales, with goal-driven intrinsically motivated deep reinforcement learning. A top-level q-value function learns a policy over intrinsic goals, while a lower-level function learns a policy over atomic actions to satisfy the given goals. h-DQN allows for flexible goal specifications, such as functions over entities and relations. This provides an efficient space for exploration in complicated environments. We demonstrate the strength of our approach on two problems with very sparse and delayed feedback: (1) a complex discrete stochastic decision process with stochastic transitions, and (2) the classic ATARI game – ‘Montezuma’s Revenge’.
@inproceedings{kulkarni2016hierarchical, bibtex_show = {true}, title = {Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation}, author = {Kulkarni*, Tejas D and Narasimhan*, Karthik R and Saeedi, Ardavan and Tenenbaum, Joshua B}, booktitle = {Neural Information Processing Systems (NIPS)}, year = {2016}, pdf = {hdqn16.pdf}, code = {https://github.com/mrkulk/hierarchical-deep-RL}, tag = {RL} }
NLP
Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge

Nicholas Locascio, Karthik Narasimhan, Eduardo DeLeon, Nate Kushman, and Regina Barzilay

Empirical Methods in Natural Language Processing (EMNLP), 2016

Abs Bib Paper Code

This paper explores the task of translating natural language queries into regular expressions which embody their meaning. In contrast to prior work, the proposed neural model does not utilize domain-specific crafting, learning to translate directly from a parallel corpus. To fully explore the potential of neural models, we propose a methodology for collecting a large corpus1 of regular expression, natural language pairs. Our resulting model achieves a performance gain of 19.6% over previous state-of-the-art models.
@inproceedings{locascio2016neural, bibtex_show = {true}, title = {Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge}, author = {Locascio, Nicholas and Narasimhan, Karthik and DeLeon, Eduardo and Kushman, Nate and Barzilay, Regina}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, year = {2016}, code = {https://github.com/nicholaslocascio/deep-regex}, html = {https://arxiv.org/pdf/1608.03000v1}, media = {<a href="https://news.ycombinator.com/item?id=12269468" target="_blank">Hacker news</a>}, tag = {NLP} }
NLP
Nonparametric Spherical Topic Modeling with Word Embeddings

Kayhan Batmanghelich, Ardavan Saeedi, Karthik Narasimhan, and Sam Gershman

Association for Computational Linguistics (ACL), 2016

Abs Bib Paper Code

Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor Gaussian observational distributions used in existing topic models are appropriate to leverage such correlations. In this paper, we propose to use the von Mises-Fisher distribution to model the density of words over a unit sphere. Such a representation is well-suited for directional data. We use a Hierarchical Dirichlet Process for our base topic model and propose an efficient inference algorithm based on Stochastic Variational Inference. This model enables us to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics. Experiments demonstrate that our method outperforms competitive approaches in terms of topic coherence on two different text corpora while offering efficient inference.
@inproceedings{batmanghelich2016nonparametric, bibtex_show = {true}, title = {Nonparametric Spherical Topic Modeling with Word Embeddings}, author = {Batmanghelich, Kayhan and Saeedi, Ardavan and Narasimhan, Karthik and Gershman, Sam}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2016}, code = {https://github.com/Ardavans/sHDP}, html = {https://arxiv.org/pdf/1604.00126v1.pdf}, tag = {NLP} }

2015

NLP RL
Language understanding for text-based games using deep reinforcement learning

Karthik Narasimhan*, Tejas Kulkarni*, and Regina Barzilay

Empirical Methods in Natural Language Processing (EMNLP), 2015
Best paper honorable mention

Abs Bib PDF Supp Code Slides

In this paper, we consider the task of learning control policies for text-based games. In these games, all interactions in the virtual world are through text and the underlying state is not observed. The resulting language barrier makes such environments challenging for automatic game players. We employ a deep reinforcement learning framework to jointly learn state representations and action policies using game rewards as feedback. This framework enables us to map text descriptions into vector representations that capture the semantics of the game states. We evaluate our approach on two game worlds, comparing against baselines using bag-ofwords and bag-of-bigrams for state representations. Our algorithm outperforms the baselines on both worlds demonstrating the importance of learning expressive representations.
@inproceedings{narasimhan2015language, bibtex_show = {true}, title = {Language understanding for text-based games using deep reinforcement learning}, author = {Narasimhan*, Karthik and Kulkarni*, Tejas and Barzilay, Regina}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, year = {2015}, pdf = {mud-play15.pdf}, slides = {mud-play15-slides.pdf}, supp = {mud-supp.pdf}, misc = {Best paper honorable mention}, code = {https://github.com/karthikncode/text-world-player}, media = {<a href="https://news.mit.edu/2015/learning-language-playing-computer-games-0924" target="_blank">MIT news</a>, <a href="https://news.ycombinator.com/item?id=9811107" target="_blank">Hacker news</a>, <a href="https://evennia.blogspot.com/2015/11/mit-uses-evennia.html" target="_blank">Evennia blog</a>}, tag = {NLP}, tagg = {RL} }
NLP

An Unsupervised Method for Uncovering Morphological Chains

Karthik Narasimhan, Regina Barzilay, and Tommi Jaakkola

Transactions of the Association for Computational Linguistics (TACL) 2015

Abs PDF Code Slides

Most state-of-the-art systems today produce morphological analysis based only on orthographic patterns. In contrast, we propose a model for unsupervised morphological analysis that integrates orthographic and semantic views of words. We model word formation in terms of morphological chains, from base words to the observed words, breaking the chains into parent-child relations. We use log-linear models with morpheme and wordlevel features to predict possible parents, including their modifications, for each word. The limited set of candidate parents for each word render contrastive estimation feasible. Our model consistently matches or outperforms five state-of-the-art systems on Arabic, English and Turkish.
NLP
Machine Comprehension with Discourse Relations

Karthik Narasimhan, and Regina Barzilay

Association for Computational Linguistics (ACL), 2015

Abs Bib PDF Slides

This paper proposes a novel approach for incorporating discourse information into machine comprehension applications. Traditionally, such information is computed using off-the-shelf discourse analyzers. This design provides limited opportunities for guiding the discourse parser based on the requirements of the target task. In contrast, our model induces relations between sentences while optimizing a task-specific objective. This approach enables the model to benefit from discourse information without relying on explicit annotations of discourse structure during training. The model jointly identifies relevant sentences, establishes relations between them and predicts an answer. We implement this idea in a discriminative framework with hidden variables that capture relevant sentences and relations unobserved during training. Our experiments demonstrate that the discourse aware model outperforms state-of-the-art machine comprehension systems.
@inproceedings{narasimhan2015machine, bibtex_show = {true}, title = {Machine Comprehension with Discourse Relations}, author = {Narasimhan, Karthik and Barzilay, Regina}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2015}, pdf = {mcdr15.pdf}, slides = {mcdr15-slides.pdf}, resources = {/karthikn/mcdr}, tag = {NLP} }
JUMP-Means: Small-Variance Asymptotics for Markov Jump Processes

Jonathan H Huggins, Karthik Narasimhan, Ardavan Saeedi, and Vikash K Mansinghka

International Conference on Machine Learning (ICML), 2015

Abs Bib PDF

Markov jump processes (MJPs) are used to model a wide range of phenomena from disease progression to RNA path folding. However, maximum likelihood estimation of parametric models leads to degenerate trajectories and inferential performance is poor in nonparametric models. We take a small-variance asymptotics (SVA) approach to overcome these limitations. We derive the small-variance asymptotics for parametric and nonparametric MJPs for both directly observed and hidden state models. In the parametric case we obtain a novel objective function which leads to non-degenerate trajectories. To derive the nonparametric version we introduce the gamma-gamma process, a novel extension to the gamma-exponential process. We propose algorithms for each of these formulations, which we call JUMP-means. Our experiments demonstrate that JUMP-means is competitive with or outperforms widely used MJP inference approaches in terms of both speed and reconstruction accuracy
@inproceedings{huggins2015jump, bibtex_show = {true}, title = {JUMP-Means: Small-Variance Asymptotics for Markov Jump Processes}, author = {Huggins, Jonathan H and Narasimhan, Karthik and Saeedi, Ardavan and Mansinghka, Vikash K}, booktitle = {International Conference on Machine Learning (ICML)}, year = {2015}, pdf = {sva-mjp.pdf} }

2014

NLP
Morphological Segmentation for Keyword Spotting

Karthik Narasimhan, Damianos Karakos, Richard Schwartz, Stavros Tsakalidis, and Regina Barzilay

Empirical Methods in Natural Language Processing (EMNLP), 2014

Abs Bib PDF Poster

We explore the impact of morphological segmentation on keyword spotting (KWS). Despite potential benefits, stateof-the-art KWS systems do not use morphological information. In this paper, we augment a state-of-the-art KWS system with sub-word units derived from supervised and unsupervised morphological segmentations, and compare with phonetic and syllabic segmentations. Our experiments demonstrate that morphemes improve overall performance of KWS systems. Syllabic units, however, rival the performance of morphological units when used in KWS. By combining morphological, phonetic and syllabic segmentations, we demonstrate substantial performance gains.
@inproceedings{narasimhan2014morphological, bibtex_show = {true}, title = {Morphological Segmentation for Keyword Spotting}, author = {Narasimhan, Karthik and Karakos, Damianos and Schwartz, Richard and Tsakalidis, Stavros and Barzilay, Regina}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, year = {2014}, pdf = {kws14.pdf}, poster = {kws14-poster.pdf}, tag = {NLP} }

2012

Modeling human bounded rationality to improve defender strategies in network security games

Rong Yang, Fei Fang, Albert Xin Jiang, Karthik Rajagopal, Milind Tambe, and Rajiv Maheswaran

HAIDM workshop at AAMAS, 2012

Abs Bib PDF

In a Network Security Game (NSG), security agencies must allocate limited resources to protect targets embedded in a network, such as important buildings in a city road network. A recent line of work relaxed the perfect-rationality assumption of human adversary and showed significant advantages of incorporating the bounded rationality adversary models in non-networked security domains. Given that real-world NSG are often extremely complex and hence very difficult for humans to solve, it is critical that we address human bounded rationality when designing defender strategies. To that end, the key contributions of this paper include: (i) comprehensive experiments with human subjects using a web-based game that we designed to simulate NSGs; (ii) new behavioral models of human adversary in NSGs, which we train with the data collected from human experiments; (iii) new algorithms for computing the defender optimal strategy against the new models.
@inproceedings{yang2012modeling, bibtex_show = {true}, title = {Modeling human bounded rationality to improve defender strategies in network security games}, author = {Yang, Rong and Fang, Fei and Jiang, Albert Xin and Rajagopal, Karthik and Tambe, Milind and Maheswaran, Rajiv}, booktitle = {HAIDM workshop at AAMAS}, year = {2012}, pdf = {aamas2012_GSG.pdf} }