COS597C: Machine Learning for Health Care (Fall 2018)

Machine learning is quickly becoming a powerful tool in health care to analyze and personalize treatments, assist in diagnoses, understand the underlying biology of disease, and support decision making and medical interventions. In this seminar, we will read papers on the topic of ML for health care and discuss these papers in class, focusing on the models and methods and on the adaptations necessary to apply these methods to health care data. There will be a final project, which is an opportunity to apply machine learning approaches to an existing health care data set.

Course Logistics

Lectures: Monday, Wednesday, 11:00-12:20
Instructor: Barbara Engelhardt (bee@princeton.edu)
- office hours Mondays 12:30-1:30 in COS 322
TA: Diana Cai (dcai@cs.princeton.edu)
- office hours Thursdays 11-12 in COS Tea Room
Piazza webpage
- Up-to-date reading for the following week
- Discussion questions for specific readings
- Course project materials
- Course announcements

Grading

Course grade will be made up of:
- 70% class participation, including leading and participating in discussions of papers
- 30% final project, which is an eight-page write-up of your application of ML approaches to a health care data set

Lectures

L1 W Sept 12: Welcome and Introduction (bee leads discussion)
- [Required] Opportunities in Machine Learning for Healthcare
- [Required] Big data in health care: Using analytics to identify and manage high-risk and high-cost patients
- [Optional] A $3 Trillion Challenge to Computational Scientists: Transforming Healthcare Delivery
L2 M Sept 17: Ethics 1 GL (Arvind Narayanan)
- [Required] Inherent trade-offs in the fair determination of risk scores
- [Required] Racial disparities and mistrust in end-of-life care
L3 W Sept 19: Ethics 2 GL (?)
- [Required] Fairness in learning: Classic and contextual bandits
- [Required] Identifying and mitigating biases in EHR laboratory tests
L4 M Sept 24: Survival analysis 1 (Uthser Chitra)
- [Required] Survival analysis lecture notes
- [Required] On ranking in survival analysis: Bounds on the concordance index (https://papers.nips.cc/paper/3375-on-ranking-in-survival-analysis-bounds-on-the-concordance-index.pdf)
L5 W Sept 26: Survival analysis 2 (Archit Verma)
- [Required] The Survival Filter: Joint survival analysis with a latent time series
- [Required] Reproducible Survival Prediction with SEER Cancer Data
- [Optional] Deep survival analysis
- [Optional] Deep Survival Analysis: Nonparametrics and Missingness
L6 M Oct 1: Computer vision and healthcare 1 (Antti Valkonen)
- [Required] Bedside computer vision — Moving artificial intelligence from driver assistance to patient safety
- [Required] Deep learning assessment of tumor proliferation in breast cancer histological images
- [Required] CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning
L7 W Oct 3: Computer vision and healthcare 2 (Felix Yu)
- [Required] Systematic analysis of breast cancer morphology uncovers stromal features associated with survival
- [Required] Dermatologist-level classification of skin cancer with deep neural networks
L8 M Oct 8: Time series modeling 1 (bee)
- [Required] Disease-Atlas: Navigating Disease Trajectories using Deep Learning
- [Required] Discriminative Switching Linear Dynamical Systems applied to Physiological Condition Monitoring
L9 W Oct 10: Time series modeling 2 (Yuan Wang)
- [Required] Scalable joint models for reliable uncertainty-aware event prediction
- [Required] Cross-corpora unsupervised learning of trajectories in autism spectrum disorders
L10 M Oct 15: Natural language processing and healthcare 1 (Sonali Mahendran)
- [Required] Natural language processing: an introduction
- [Required] Challenges in clinical natural language processing for automated disorder normalization
L11 W Oct 17: Natural language processing and healthcare 2 (Daniel Suo)
- [Required] Effective Use of Bidirectional Language Modeling for Transfer Learning in Biomedical Named Entity Recognition
- [Required] Predicting early psychiatric readmission with natural language processing of narrative discharge summaries
L12 M Oct 22: Causal inference and interventions 1 (Alexander Strzalkowski)
- [Required] A targeted real-time early warning score (TREWScore) for septic shock
- [Required] Reliable Decision Support using Counterfactual Models
L13 W Oct 24: Interpretability in health care (Yomjinda)
- [Required] Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission
- [Required] The Mythos of Model Interpretability
L14 M Nov 5: Missing data in EHR (Matthew Yeh)
- [Required] Recommender Systems: Missing Data and Statistical Model Estimation
- [Required] Semi-supervised Biomedical Translation with Cycle Wasserstein Regression GANs
L15 W Nov 7: Reinforcement learning in healthcare 1 (Qasim Nadeem)
- [Required] Informing sequential clinical decision-making through reinforcement learning: an empirical study
- [Required] A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units
L16 M Nov 12: Reinforcement learning in healthcare 2 (Sayan Hassantabar)
- [Required] Scalable and accurate deep learning with electronic health records
- [Required] Optimal Medication Dosing from Suboptimal Clinical Examples: A Deep Reinforcement Learning Approach
L17 W Nov 14: Reinforcement learning in healthcare 3 (Allison Chang)
- [Required] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
- [Required] Meaningless comparisons lead to false optimism in medical machine learning
L18 M Nov 19: Reinforcement learning in healthcare 4 (Sinong Geng)
- [Required] Evaluating Reinforcement Learning Algorithms in Observational Health Settings
- [Required] Behaviour policy estimation in off-policy policy evaluation: Calibration matters
L19 M Nov 26: ML and safety (Jay Lee)
- [Required] Importance Sampling for Fair Policy Selection
- [Required] Towards A Rigorous Science of Interpretable Machine Learning
L20 W Nov 28: Generalization and transfer learning (Jonathan Lu)
- [Required] Implications of non-stationarity on predictive modeling using EHRs
- [Required] Multi-task Prediction of Disease Onsets from Longitudinal Lab Tests
L21 M Dec 3: Adaptive learning in healthcare (Greg Gundersen)
- [Required] Contextual bandits for adapting treatment in a mouse model of de novo carcinogenesis
- [Required] The stratified micro-randomized trial design: sample size considerations for testing nested causal effects of time-varying treatments
L22 W Dec 5: Policy, privacy, and access (Mohamed El-Dirany)
- [Required] Why policymakers should care about “big data” in healthcare
- [Required] Differential Privacy: A Survey of Results
L23 M Dec 10: Final project presentations
L24 W Dec 12: Final project presentations

Presentation questions

When preparing to present a paper for the class, consider the following questions:


	- what problem in healthcare is this paper addressing?
	- what is the corresponding category of problem in ML that we can map this problem onto?
	- what machine learning approaches are the authors proposing/using?
	- do standard ML methods do the trick?
	- what is special about the health care problem that the methods need to adapt to?
	- were appropriate checks in place for model testing or avoiding model misspecification?
	- what type of confounders are present in the health care data? how does the model address those confounders?
	- how does the approach quantify uncertainty? Is uncertainty important here?
	- are there limited numbers of samples? how are those addressed?
	- how can I determine whether a specific patient's sample is similar to others, or unique? (the n of 1 problem)
	- how is patient/patient group heterogeneity addressed?
	- how can you quantify the most important features of the data? Are the methods interpretable?
	- how are doctor/caregiver mistakes accounted for?
	- are there possible biases introduced by the data or the ML method?
	- what are ways to combat those biases?
	- how would you explain to a doctor how the method worked, or why it arrived at a class label/decision?
	- what is the distance that needs to be covered before this method is deployed in a healthcare setting?