Computer Science 598D
Systems and Machine Learning
Spring 2021
This course is open to graduate students. Seniors who are interested in taking the course need permission from the instructors. This course has no P/D/F option. All students are required to present papers, participate in discussions, and complete three programming assignments and a final project. For the final project, students have the option to work in a small team of two and produce a project report and a project presentation.
We suggest that students use Microsoft Azure computational resources to complete assignments and projects, as Microsoft provides educational credits for this course. The course projects require training deep neural networks. Finishing the projects may consume a substantial amount of GPU hours, which may not be feasible on free cloud services.
· Microsoft Azure. We will provide each student a certain amount of Azure credits. If you need more credits, you can send a request to Dr. Xiaoxiao Li (xl32@princeton.edu). Please carefully manage your instances on Azure and stop any instance that is not in use. Here is the Azure tutorial: https://azure.microsoft.com/en-us/get-started/. Additional tips on using Azure will be provided in class.
During each class meeting, we will have either a lecture by the instructors or an invited speaker, or presentations and discussions of two selected papers.
Each student will write a very brief review of each paper (one or two sentences summarizing the paper, and one or two sentences summarizing its strengths, weaknesses, and future directions). Please download the template here.
Each paper will take a 40-minute time slot. To motivate students to read and discuss papers in depth, we ask 2-3 students to be responsible for three components of the presentation and discussion, each handled by an individual student with a few slides:
· Summary: discuss the problem statement, a brief overview of the state-of-the-art, and the key idea of the paper.
· Strengths: a summary of the strengths of the paper.
· Weaknesses and future directions: a summary of the weaknesses and future directions of the paper.
We suggest that the summary presentation take about 20 minutes and that the discussions of strengths, weaknesses, and future directions take about 20 minutes. The three students should manage their time well and serve as a “panel” for the discussion of the paper.
In the tentative schedule below, we have planned the topics for each week and suggested papers. The 2-3 students who signed up for a paper time slot can discuss with the instructors to select a different paper on the same topic and should announce the paper in advance.
To get familiar with ML projects, we require each student to do a small warmup project that reproduces the LeNet-5 results in the paper: Gradient-Based Learning Applied to Document Recognition. Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Proceedings of the IEEE, 1998.
Feel free to use any deep learning framework you are familiar with. A warmup project about MNIST classification is available. Our assignment examples are all in PyTorch. The MNIST dataset will be downloaded automatically if you use the MNIST classification example.
If you use other platforms and need to download the data separately, you can download the MNIST dataset (by Yann LeCun, Corinna Cortes and Christopher Burges).
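For reference, here is a minimal, hypothetical PyTorch sketch of a LeNet-5-style model trained on MNIST. It is not the official starter code: the class and function names, the ReLU/max-pooling variant (the original LeNet-5 used sigmoid-like activations and average pooling), the learning rate, and the epoch count are all illustrative assumptions, not assignment requirements.

# Minimal LeNet-5-style MNIST training sketch in PyTorch (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5, padding=2)   # 28x28 -> 28x28
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)             # 14x14 -> 10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

def main():
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.1307,), (0.3081,))])
    # torchvision downloads MNIST automatically when download=True.
    train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
    loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = LeNet5().to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    model.train()
    for epoch in range(5):                      # epoch count is an assumption
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: last-batch loss {loss.item():.4f}")

if __name__ == "__main__":
    main()

LeNet-style models typically reach roughly 99% test accuracy on MNIST, but your exact numbers and setup should follow the warmup project's own instructions.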
You are welcome to use other online resources or to get help from other students.
You are encouraged to launch the job on Microsoft Azure to get familiar with the platform.
Programming Assignment 1 (Systems for ML)
There are two options for this programming assignment. To maximize learning and minimize programming effort for students, each option is related to a recently published paper that will be presented and discussed in class, and each has open-source code. You need to select one option and complete the assignment based on its requirements. The two options are:
· System for ML: Network pruning (please click to see details)
· System for ML: Binary ConvNet (please click to see details)
The detailed requirements for each option will be provided before the assignment starts.
Programming Assignment 2 (ML for Systems)
Similar to programming assignment 1, there are two options for this programming assignment. Each option is also related to a recently published paper that will be presented and discussed in class, and each has open-source code. You need to select one option and complete the assignment based on its requirements. The two options are:
· ML for System: Auto Neural Arch Search (please click to see details)
· ML for System: Adaptive learned Bloom filter (please click to see details)
The detailed requirements for each option will be provided before the assignment starts.
For the final project, students will improve or investigate some future directions of one of the two programming assignments above. The final project can be done by either one or two students. Each student or team should submit a brief project proposal and a final report, and give a 10-minute final presentation. The final reports are due at 11:59pm on Dean's Day, which is the university deadline. The final presentations will be scheduled soon after Dean's Day.
For a two-student team, we suggest that you propose something more significant. We expect you to state clearly who did what in the final report.
This graduate seminar will be graded roughly as follows:
Prof. Kai Li, Dr. Xiaoxiao Li |
A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution. Systems and Machine Learning Symbiosis (invited talk). Jeff Dean. SysML Conference, 2018. |
|
|
|
Prof. Jia Deng, Princeton |
|
|
|
|
Dr. Xiaoxiao Li |
Introduction to deep learning frameworks. Video (you may need to log in) |
Introduction to PyTorch and TensorFlow. Azure Tutorial/Demo |
|
2/11 |
Prof. Karthik Narasimhan, Princeton (Guest lecture) |
Human-level control through deep reinforcement learning. Mnih, V., et al. Nature, 2015. (Earlier version.) Reinforcement Learning: An Introduction (book). |
|
|
|
Prof. Kai Li |
Large Scale Distributed Deep Networks. Jeffrey Dean, et al. NIPS 2012. TensorFlow: A System for Large-Scale Machine Learning. |
|
|
|
Dr. Xiaoxiao Li |
|
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. Jonathan Frankle, Michael Carbin. ICLR 2019. SNIP: Single-shot Network Pruning based on Connection Sensitivity. Lee et al. ICLR 2019. |
Submit Warmup; Start assignment (Systems for ML) |
|
Felix, Alexander, Josh, Grace |
Network Pruning (recording unavailable) |
Picking winning tickets before training by preserving gradient flow. Wang et al. 2020. Pruning neural networks without any data by iteratively conserving synaptic flow. Tanaka et al. 2020. |
Submit Review |
|
Dr. Zhao Song, Princeton (guest lecture) |
Learned Data Structures |
Learning Space Partitions for Nearest Neighbor Search. Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner. ICLR 2020. A Model for Learned Bloom Filters and Optimizing by Sandwiching. Michael Mitzenmacher, et al. NIPS 2018. |
|
|
Kaiqi, Dongsheng, Yi, Juan |
Learning Multi-dimensional Indexes. Vikram Nathan, Jialin Ding, Mohammad Alizadeh, Tim Kraska. SIGMOD 2020. ALEX: An Updatable Adaptive Learned Index. Jialin Ding et al. SIGMOD 2020. |
Submit Review (click here; the deadline is before the class) |
|
|
Samyak, Juan, Josh, Felix |
AutoML |
Neural Architecture Search with Reinforcement Learning. Barret Zoph, Quoc V. Le. ICLR 2017. DARTS: Differentiable Architecture Search. Liu, H., Simonyan, K. and Yang, Y., 2018. |
Submit Review |
|
Dr. Safeen Huda, Google Brain (Guest lecture) |
Google TPU |
In-Datacenter Performance Analysis of a Tensor Processing Unit. N. Jouppi et al. ISCA 2017. A domain-specific supercomputer for training deep neural networks. N. Jouppi et al. CACM 2020. The Design Process for Google's Training Chips: TPUv2 and TPUv3. T. Norrie, et al. IEEE Micro 2021. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. D. Lepikhin, et al. 2021 (under review). |
Submit assignment (Systems for ML); Start assignment (ML for Systems) |
3/11 |
Samyak, Dongsheng, Yue, Yi |
Computer Architecture |
High-Performance Deep-Learning Coprocessor Integrated into x86 SoC with Server-Class CPUs. Glenn Henry, et al. 2020. The Architectural Implications of Facebook’s DNN-based Personalized Recommendation. Gupta et al. 2020. |
Submit Review |
3/18 |
Kelvin Zou, ByteDance & Princeton Alumnus |
Systems at ByteDance |
A Generic Communication Scheduler for Distributed DNN Training Acceleration. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. Mesh-TensorFlow: Deep Learning for Supercomputers. Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman. NeurIPS 2018. IPS: Unified Profile Management for Ubiquitous Online Recommendations. |
|
3/23 |
Dr. Xiaoxiao Li |
Introduction to Federated Learning |
Federated learning: Collaborative machine learning without centralized training data. Inverting Gradients - How easy is it to break privacy in federated learning? (Privacy) Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, Michael Moeller. NeurIPS 2020. |
|
3/25 |
Juan, Kaiqi, Samyak, Alexander |
Privacy Preservation |
Membership inference attacks against machine learning models. Shokri, Reza, et al. IEEE Symposium on Security and Privacy (SP), 2017. The Secret Sharer: Evaluating and testing unintended memorization in neural networks. Carlini, Nicholas, et al. USENIX Security, 2019. |
Submit Review; Submit assignment (ML for Systems); Check suggested final project |
3/30 |
Prof. Danqi Chen, Princeton (Guest lecture) |
NLP and Transformers |
- Devlin et al., 2018: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Liu et al., 2019: RoBERTa: A Robustly Optimized BERT Pretraining Approach
- Joshi & Chen et al., 2019: SpanBERT: Improving Pre-training by Representing and Predicting Spans
- Karpukhin et al., 2020: Dense Passage Retrieval for Open-Domain Question Answering
- Lee et al., 2020: Learning Dense Representations of Phrases at Scale |
|
4/1 |
Felix, Alexander, Josh, Kaiqi |
Transformers |
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. William Fedus et al. 2020. Big Bird: Transformers for Longer Sequences. Manzil Zaheer, Guru Prashanth Guruganesh, Avi Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Minh Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Mahmoud El Houssieny Ahmed. NeurIPS 2020. |
Submit Review; Submit final project proposal (finish discussing with instructors) |
4/6 |
Zhenyu Song, Princeton (Guest lecture) |
Caching |
Learning Cache Replacement with CACHEUS. Learning Relaxed Belady for Content Distribution Network Caching. Zhenyu Song, et al. NSDI 2020. |
|
4/8 |
Yue, Dongsheng, Yi, Grace |
Caching |
Flashield: a Hybrid Key-value Cache that Controls Flash Write Amplification. Assaf Eisenman, Asaf Cidon, Evgenya Pergament, Or Haimovich, Ryan Stutsman, Mohammad Alizadeh, Sachin Katti. NSDI 2019. An Imitation Learning Approach for Cache Replacement. Evan Z. Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn. ICML 2020. |
Submit Review |
4/13 |
Prof. Song Han, MIT (Guest lecture) |
AutoML |
MCUNet: Tiny Deep Learning on IoT Devices. Tiny Transfer Learning: Reduce Memory, not Parameters for Efficient On-Device Learning. Cai, H., Gan, C., Zhu, L. and Han, S. NeurIPS’20. Differentiable Augmentation for Data-Efficient GAN Training. Zhao, S., Liu, Z., Lin, J., Zhu, J.Y. and Han, S. NeurIPS’20. |
|
4/15 |
Yue, Josh, Felix, Yi |
Parallel and distributed training |
Memory-Efficient Pipeline-Parallel DNN Training. Narayanan et al. 2020. PyTorch Distributed: Experiences on Accelerating Data Parallel Training. |
Submit Review |
4/20 |
Dr. Mike Ringenburg, Cerebras, Inc. |
Architecture |
Title: Accelerating Deep Learning with a Purpose-built Solution: The Cerebras Approach. Abstract: The new era of chip specialization for deep learning is here. Traditional approaches to computing can no longer meet the computational and power requirements of this important workload. What is the right processor for deep learning? To answer this question, this talk will discuss the computational requirements of deep learning models and the limitations of existing hardware architectures and scale-out approaches. Then we will discuss Cerebras' approach to meeting the computational requirements of deep learning with the Cerebras Wafer Scale Engine (WSE) – the largest computer chip in the world – and the Cerebras Software Platform, co-designed with the WSE. The WSE provides cluster-scale resources on a single chip with full utilization for tensors of any shape – fat, square and thin, dense and sparse – enabling researchers to explore novel network architectures and optimization techniques at any batch size. |
|
4/22 |
Yue, Kaiqi, Dongsheng, Grace |
Networking |
Neural-Enhanced Live Streaming: Improving Live Video Ingest via Online Learning. Jaehong Kim, et al. SIGCOMM 2020. Server-Driven Video Streaming for Deep Learning Inference. Kuntai Du, et al. SIGCOMM 2020. |
Submit Review |
4/27 |
Prof. Dawn Song, UC Berkeley; Prof. Ruoxi Jia, Virginia Tech (guest lecture) |
Data Valuation |
Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms. Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Bo Li, Ce Zhang, Costas J. Spanos, Dawn Song, PVLDB 2019 Towards Efficient Data Valuation Based on the Shapley Value Ruoxi Jia*, David Dao*, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos, International Conference on Artificial Intelligence and Statistics (AISTATS), 2019 |
|
4/29 |
Alexander, Juan, Samyak, Grace |
Data Valuation |
Understanding Black-box Predictions via Influence Functions. Pang Wei Koh, Percy Liang. ICML 2017. Machine Unlearning. Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, Nicolas Papernot. IEEE S&P 2021. |
Submit Review |
|
|
Submit Report (deadline: by the end of the day) |
||
5/12 (9:30am –11am) |
All students |
Final project presentations |
Pre 1: (Alexander and Felix) Partition learned Bloom filter
Pre 2: (Yue) Sandwich Bloom filter
Pre 3: (Josh) Comparing DARTS vs. Progressive DARTS
Pre 4: (Juan) Value Motivated Exploration
Pre 5: (Grace) Improved implementation of Binarized Neural Network
Pre 6: (Kaiqi) GraSP pruning for Binarized Neural Network
Pre 7: (Dongsheng) Linear Regression for FastLRB
Pre 8: (Samyak) NAS in Binarized Neural Network
Pre 9: (Yi) Data Parallelism in Unpruned and Pruned Neural Nets Training |
The order of presentations will be decided on a volunteer basis; otherwise, we will generate it using a random name picker.
Please submit your comments by the end of May 16th here.
The template can be downloaded here.