Sep 7 (Wed)
Introduction
Recommended reading:
1. Human Language Understanding & Reasoning
2. Attention Is All You Need (Transformers)
3. Blog Post: The Illustrated Transformer
4. HuggingFace's course on Transformers
Pre-lecture questions: -
Presenters: Danqi Chen [slides]
Assigned students: -

What are LLMs?

Sep 12 (Mon)
BERT (encoder-only models)
Required reading:
1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Recommended reading:
1. Deep contextualized word representations (ELMo)
2. Improving Language Understanding by Generative Pre-Training (OpenAI GPT)
3. RoBERTa: A Robustly Optimized BERT Pretraining Approach
4. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Pre-lecture questions: lec 2 questions
Presenters: Danqi Chen [slides]
Assigned students: -

Sep 14 (Wed)
T5 (encoder-decoder models)
Required reading:
1. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)
Recommended reading:
1. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
2. mT5: A massively multilingual pre-trained text-to-text transformer
3. AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Pre-lecture questions: lec 3 questions
Presenters: Abhishek Panigrahi, Victoria Graf [slides]
Assigned students: Edward Tian, Zihan Ding, Jiatong Yu, Anirudh Ajith

Sep 19 (Mon)
GPT-3 (decoder-only models)
Required reading:
1. Language Models are Few-Shot Learners (GPT-3)
Recommended reading:
1. Language Models are Unsupervised Multitask Learners (GPT-2)
2. PaLM: Scaling Language Modeling with Pathways
3. OPT: Open Pre-trained Transformer Language Models
Pre-lecture questions: lec 4 questions
Presenters: Sabhya Chhabria, Michael Tang [slides]
Assigned students: Anika Maskara, Tianle Cai, Richard Zhu, Andrea Wynn

How to Use and Adapt LLMs?

Sep 21 (Wed)
Prompting for few-shot learning
Required reading:
1. Making Pre-trained Language Models Better Few-shot Learners (blog post)
2. How Many Data Points is a Prompt Worth?
Recommended reading:
1. Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
2. True Few-Shot Learning with Language Models
3. Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
4. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
Pre-lecture questions: lec 5 questions
Presenters: Kaixuan Huang, Edward Tian [slides]
Assigned students: Sam Liang, Mengzhou Xia, Victoria Graf, Tianle Cai

Sep 26 (Mon)
Prompting as parameter-efficient fine-tuning
Required reading:
1. Prefix-Tuning: Optimizing Continuous Prompts for Generation
2. The Power of Scale for Parameter-Efficient Prompt Tuning
Recommended reading:
1. Factual Probing Is [MASK]: Learning vs. Learning to Recall
2. P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
3. LoRA: Low-Rank Adaptation of Large Language Models
4. Towards a Unified View of Parameter-Efficient Transfer Learning
Pre-lecture questions: lec 6 questions
Presenters: Chris Pan, Hongjie Wang [slides]
Assigned students: Sabhya Chhabria, Andrea Wynn, Sam Liang, Wenhan Xia

Sep 28 (Wed)
In-context learning
Required reading:
1. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
2. An Explanation of In-context Learning as Implicit Bayesian Inference (we don't expect you to read this paper in depth; you can check out this blog post instead)
Recommended reading:
1. What Makes Good In-Context Examples for GPT-3?
2. Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
3. Data Distributional Properties Drive Emergent In-Context Learning in Transformers
4. What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Pre-lecture questions: lec 7 questions
Presenters: Sam Liang, Kexin Jin [slides]
Assigned students: Anika Maskara, Zixu Zhang, Tong Wu, Victoria Graf

Oct 3 (Mon)
Calibration of prompting LLMs
Required reading:
1. Calibrate Before Use: Improving Few-Shot Performance of Language Models
2. Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right
Recommended reading:
1. Noisy Channel Language Model Prompting for Few-Shot Text Classification
2. How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering
3. Language Models (Mostly) Know What They Know
Pre-lecture questions: lec 8 questions
Presenters: Vishvak Murahari, Howard Yen [slides]
Assigned students: Jiatong Yu, Howard Chen, Chris Pan, Andre Niyongabo Rubungo, Devon Wood-Thomas

Oct 5 (Wed)
Reasoning
Required reading:
1. Chain of Thought Prompting Elicits Reasoning in Large Language Models
2. Large Language Models are Zero-Shot Reasoners
Recommended reading:
1. Explaining Answers with Entailment Trees
2. Self-Consistency Improves Chain of Thought Reasoning in Language Models
3. Faithful Reasoning Using Large Language Models
Pre-lecture questions: lec 9 questions
Presenters: Zihan Ding, Zixu Zhang [slides]
Assigned students: Vishvak Murahari, Beiqi Zou, Chris Pan, Xiangyu Qi

Oct 10 (Mon)
Knowledge
Required reading:
1. Language Models as Knowledge Bases?
2. How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Recommended reading:
1. Knowledge Neurons in Pretrained Transformers
2. Fast Model Editing at Scale
3. Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets
Pre-lecture questions: lec 10 questions
Presenters: Jane Pan, Mengzhou Xia [slides]
Assigned students: Andre Niyongabo Rubungo, Devon Wood-Thomas, Xiangyu Qi, Howard Chen

Dissecting LLMs: Data, Model Scaling and Risks

Oct 12 (Wed)
Data
Required reading:
1. Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Recommended reading:
1. The Pile: An 800GB Dataset of Diverse Text for Language Modeling
2. Deduplicating Training Data Makes Language Models Better
Pre-lecture questions: lec 11 questions
Presenters: Andre Niyongabo Rubungo, Tanushree Banerjee [slides]
Assigned students: Arseniy Andreyev, Wenhan Xia, Xindi Wu, Richard Zhu

Oct 14 (Fri)
Final project proposal due at 11:59pm. Submit here.

Oct 17 (Mon)
Fall recess (no class)

Oct 19 (Wed)
Fall recess (no class)

Oct 24 (Mon)
Scaling
Required reading:
1. Training Compute-Optimal Large Language Models
Recommended reading:
1. Scaling Laws for Neural Language Models
2. Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
3. Scaling Laws for Autoregressive Generative Modeling
Pre-lecture questions: lec 12 questions
Presenters: Anika Maskara, Simon Park [slides]
Assigned students: Hongjie Wang, Sabhya Chhabria, Edward Tian, Kaixuan Huang

Oct 26 (Wed)
Privacy
Required reading:
1. Extracting Training Data from Large Language Models
Recommended reading:
1. Quantifying Memorization Across Neural Language Models
2. Deduplicating Training Data Mitigates Privacy Risks in Language Models
3. Large Language Models Can Be Strong Differentially Private Learners
4. Recovering Private Text in Federated Learning of Language Models
Pre-lecture questions: lec 13 questions
Presenters: Xiangyu Qi, Tong Wu [slides]
Assigned students: Anirudh Ajith, Austin Wang, Tanushree Banerjee, Arseniy Andreyev

Oct 31 (Mon)
Bias & Toxicity I: evaluation
Required reading:
1. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
2. OPT paper, Section 4
Recommended reading:
1. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
2. Red Teaming Language Models with Language Models
3. Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
Pre-lecture questions: lec 14 questions
Presenters: Maxine Perroni-Scharf, Richard Zhu [slides]
Assigned students: Tong Wu, Hongjie Wang, Howard Yen, Mengzhou Xia

Nov 2 (Wed)
Bias & Toxicity II: mitigation
Required reading:
1. Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Recommended reading:
1. Challenges in Detoxifying Language Models
2. Detoxifying Language Models Risks Marginalizing Minority Voices
3. Plug and Play Language Models: A Simple Approach to Controlled Text Generation
4. GeDi: Generative Discriminator Guided Sequence Generation
Pre-lecture questions: lec 15 questions
Presenters: Anirudh Ajith, Arnab Bhattacharjee [slides]
Assigned students: Maxine Perroni-Scharf, Xindi Wu, Jane Pan, Howard Chen

Beyond Current LLMs: Models and Applications

Nov 7 (Mon)
Sparse models
Required reading:
1. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Recommended reading:
1. Efficient Large Scale Language Modeling with Mixtures of Experts
2. Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
3. A Review of Sparse Expert Models in Deep Learning
Pre-lecture questions: lec 16 questions
Presenters: Zhou Lu, Wenhan Xia [slides]
Assigned students: Michael Tang, Arnab Bhattacharjee, Kexin Jin, Beiqi Zou

Nov 9 (Wed)
Retrieval-based LMs
Required reading:
1. Improving language models by retrieving from trillions of tokens
Recommended reading:
1. Generalization through Memorization: Nearest Neighbor Language Models
2. Training Language Models with Memory Augmentation
3. Few-shot Learning with Retrieval Augmented Language Models
Pre-lecture questions: lec 17 questions
Presenters: Tianle Cai, Beiqi Zou [slides]
Assigned students: Simon Park, Jane Pan, Maxine Perroni-Scharf, Abhishek Panigrahi

Nov 14 (Mon)
Training LMs with human feedback
Required reading:
1. Training language models to follow instructions with human feedback
Recommended reading:
1. Learning to summarize from human feedback
2. Fine-Tuning Language Models from Human Preferences
3. MemPrompt: Memory-assisted Prompt Editing with User Feedback
4. LaMDA: Language Models for Dialog Applications
Pre-lecture questions: lec 18 questions
Presenters: Howard Chen, Austin Wang [slides]
Assigned students: Abhishek Panigrahi, Simon Park, Kaixuan Huang, Arseniy Andreyev

Nov 16 (Wed)
Code LMs
Required reading:
1. Evaluating Large Language Models Trained on Code
Recommended reading:
1. A Conversational Paradigm for Program Synthesis
2. InCoder: A Generative Model for Code Infilling and Synthesis
3. A Systematic Evaluation of Large Language Models of Code
4. Language Models of Code are Few-Shot Commonsense Learners
5. Competition-Level Code Generation with AlphaCode
Pre-lecture questions: lec 19 questions
Presenters: Arseniy Andreyev, Jiatong Yu [slides]
Assigned students: Howard Yen, Michael Tang, Tanushree Banerjee, Kexin Jin

Nov 21 (Mon)
Multimodal LMs
Required reading:
1. Flamingo: a Visual Language Model for Few-Shot Learning
Recommended reading:
1. Blog post: Generalized Visual Language Models
2. Learning Transferable Visual Models From Natural Language Supervision (CLIP)
3. Multimodal Few-Shot Learning with Frozen Language Models
4. CM3: A Causal Masked Multimodal Model of the Internet
Pre-lecture questions: lec 20 questions
Presenters: Andrea Wynn, Xindi Wu [slides]
Assigned students: Arnab Bhattacharjee, Vishvak Murahari, Austin Wang, Zihan Ding

Nov 23 (Wed)
Thanksgiving recess (no class)

Nov 28 (Mon)
Guest lecture: Alexander Rush (Cornell/Hugging Face)
Multitask Prompted Training for Zero-Shot Models
Recommended reading:
1. Multitask Prompted Training Enables Zero-Shot Task Generalization
2. PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
3. Scaling Instruction-Finetuned Language Models
4. Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Pre-lecture questions: -
Presenters: -
Assigned students: -

Nov 30 (Wed)
AI Alignment + open discussion
Recommended reading:
1. A General Language Assistant as a Laboratory for Alignment
2. Alignment of Language Agents
3. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Pre-lecture questions: -
Presenters: Devon Wood-Thomas (half of the lecture) [slides]
Assigned students: Richard Zhu, Sabhya Chhabria, Andrea Wynn, Anirudh Ajith

Dec 5 (Mon)
In-class presentations (extended class)

Dec 7 (Wed)
No class

Dec 16 (Fri)
Final project due at 11:59pm (Dean's Date)