COS 598B - Outline

Date	Topic/papers	Presenter (and link to slides)
Mon, Feb 5	Intro, and look at recognition datasets Unbiased Look at Dataset Bias by Torralba and Efros CVPR'11	Olga Russakovsky (logistics slides, lecture slides)
Module 1: Image segmentation, both strongly and weakly supervised
Wed, Feb 7	Large-scale object segmentation Overview of PASCAL VOC for semantic segmentation and of ImageNet Segmentation Propagation in ImageNet by Kuettel, Guillaumin, Ferrari ECCV'12 (best paper award) -- cf. also their project page We'll use this paper to recall both some classic segmentation algorithms and segmentation datasets	Olga Russakovsky (ImageNet slides, segmentation propagation slides, graphcut slides)
Mon, Feb 12	Semantic segmentation Fully Convolutional Networks for Semantic Segmentation by Long, Shelhamer and Darrell CVPR'15 The simple weakly supervised variant Fully Convolutional Multi-Class Multiple Instance Learning by Pathak et al. ICLR workshop'15 (time permitting) Semantic image segmentation with deep convolutional nets and fully connected CRFs by Chen et al. ICLR'15	Rohan Doshi (slides, PDF)
Wed, Feb 14	Variations on segmentation supervision: BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation by Dai, He, Sun ICCV'15 What's the Point: Semantic Segmentation with Point Supervision by Bearman et al. ECCV'16	Yannis Karakozis (slides, PDF) (some useful math notes: slides, PDF)
Mon, Feb 19	Instance segmentation: Review of Mask RCNN (background reading) Review of COCO dataset Learning to segment everything by Hu et al. Nov 2017	Berthy Feng + Riley Simmons-Edler (slides, PDF)
Wed, Feb 21	Combining semantic and instance segmentation Panoptic Segmentation by Kirillov et al. Jan 2018 Overview of Cityscapes and ADE20K	Stephanie Liu + Andrew Zhou (slides, PDF)
Mon, Feb 26	Intro to RNNs and cool annotation framework Annotating Object Instances with a Polygon-RNN by Castrejon et al. CVPR'17 Background on Convolutional LSTMs Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting by Shi et al.	William Hinthorn (slides, PDF) For those wishing to brush up on LSTMs
Other cool papers we may not have a chance to cover: DeepLab ICLR'15 initially but expanded to include v2.0, a weakly supervised version, etc. ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation by Lin et al. CVPR'16 Associative Embedding: End-to-End Learning for Joint Detection and Grouping NIPS'17 SGN: Sequential Grouping Networks for Instance Segmentation ICCV'17 Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network March 2017 Qure.ai blog post: a 2017 guide to semantic segmentation with deep learning Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection ICCV'17; Learning to Segment via Cut and Paste
Module 2: Language + vision, including captioning, VQA, ...
Wed, Feb 28	Open-world annotation and recognition Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations Image Retrieval Using Scene Graphs (Optional) Scene Graph Generation by Iterative Message Passing	Bharath Srivatsan (slides, PDF)
Mon, March 5	From recognition to captioning SPICE score for evaluating captioning Exploring nearest neighbor approaches for image captioning Background: Language Models for Image Captioning: The Quirks and What Works	Vikash + Qasim Nadeem (slides, PDF)
Wed, March 7	Captioning methods Deep Visual-Semantic Alignment for Generating Image Descriptions by Karpathy and Fei-Fei DenseCap: Fully Convolutional Localization Networks for Dense Captioning	Ryan McCaffrey + Alex Yue (slides, PDF)
Mon, March 12	No class -- midterms, ECCV deadline, CS PhD visit day
Wed, March 14	No class -- midterms, ECCV deadline, CS PhD visit day
Spring Break
Mon, March 26	Visual question answering DAQUAR: original VQA task Towards a Visual Turing Challenge VQA: Visual Question Answering cf. the VQA challenge page Making the V in VQA matter: Elevating the Role of Image Understanding in Visual Question Answering	Prem Nair + Shayan Hassantabar (slides, PDF) (some paper notes: slides, PDF)
Wed, March 28	VQA method: simple baselines Simple Baseline for Visual Question Answering What's in a question: using visual questions as a form of supervision	Allen Wu (slides, PDF)
Tue, March 27th 12:30-1:30pm: Prof. Jia Deng (U of Michigan) colloquium on Visual Reasoning; Thu, March 29th 12:30-1:30pm: Justin Johnson (Stanford) colloquium on Language + Vision
Mon, April 2	Attention-based VQA methods Where to look: focus regions for Visual Question Answering by Shih, Singh, Hoiem Ask, attend and answer: exploring question-guided spatial attention for visual question answering by Xu and Saenko	Nick Jiang (slides, PDF)
Wed, April 4	Neural module networks (presenter's choice) Neural Module Networks by Andreas et al., 2016 Learning to Reason: End-to-End Neural Module Networks by Hu et al., 2017	Berthy Feng (slides, PDF)
Other cool papers we may not have a chance to cover Segmentation from natural language expressions Visual Madlibs Nice summary paper of existing VQA techniques Hierarchical Question-Image Co-Attention for Visual Question Answering An Analysis of Visual Question Answering from ICCV'17 VQA on abstract images: Bringing Semantics into Focus Using Visual Abstractions (project page with pretty pictures, code and papers) Visual 7W: Grounded Question Answering in Images
Module 3: Video understanding
Mon, April 9	Classic video datasets and algorithms Action Recognition by Dense Trajectories by Wang, Klaser, Schmid, Liu CVPR'11 Four key datasets: KTH, UCF-101, Hollywood2, HMDB-51 Background: Review optical flow, e.g., from the COS 429 lecture or from this nice tutorial HOG lecture: http://www.cs.princeton.edu/courses/archive/fall17/cos429/notes/cos429_fall2017_lecture4_interest_points.pdf Followup: Action Recognition with Improved Trajectories by Wang and Schmid ICCV'13 Local handcrafted features are convolutional neural networks by Lan et al. ICLR'16	Divya Thuremella + Qasim Nadeem (slides, PDF)
Wed, April 11	Two classic deep learning frameworks for action classification Two-stream convolutional networks for action recognition in videos by Simonyan and Zisserman NIPS'14 Large-scale Video Classification with Convolutional Neural Networks by Karpathy et al. CVPR'14	Haochen Li (slides, PDF)
April 11th in class: title, selection of options 1-3, (optional) partner name due; April 12th 12:30-1:30pm: Saurabh Gupta (Berkeley) colloquium on Vision+Robotics; April 13th: project milestone due
Mon, April 16	From classification to temporal localization with 3D convolutions Learning spatiotemporal features with 3d convolutional networks by Tran, Bourdev, Fergus, Torresani, Paluri ICCV'15 ActivityNet: A Large-scale video benchmark for human activity understanding by Heilbron et al. CVPR'15 Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks by Montes et al. ActivityNet challenge 2016	Austin Le (slides, PDF)
Wed, April 18	Two simple (relatively speaking) models for temporal action localization Every moment counts: dense detailed labeling of actions in complex videos by Yeung et al. IJCV'17 Predictive-Corrective Networks for Action Detection by Dave et al. CVPR'17	Jiaqi Su (slides, PDF)
April 20th: feedback on milestones due
Mon, April 23	Action recognition in the spirit of object detection R-C3D: Region Convolutional 3D Network for Temporal Activity Detection by Xu, Das, Saenko Contextual Multi-scale Region Convolutional 3D network for activity detection by Bai, Xu, Saenko, and Ghanem	Nicholas Turner + Sven Dorkenwald (slides, PDF)
Wed, April 25	Favorite video understanding paper. The presenters should take the lead on finalizing the topic. They can poll/discuss with others on Piazza, or just propose a topic themselves. Please do confirm with me before finalizing. Suggestions: very recent work on a new architecture for action recognition Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset by Carreira and Zisserman Non-local neural networks by Wang, Girshick, Gupta, He Or work on VQA or captioning in videos, some sample papers below	Julie LaChance + Vikash (slides, PDF)
Other cool video papers: End-to-end learning of action detection from frame glimpses in videos by Yeung et al. CVPR'16 Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding by Sigurdsson et al. ECCV'16 Temporal Segment Networks: Towards Good Practices for Deep Action Recognition by Wang et al. winner of ActivityNet challenge 2016, ECCV'16 Long-term temporal convolutions for action recognition by Varol, Laptev, Schmid Asynchronous temporal fields for action recognition by Sigurdsson et al. CVPR'17 What actions are needed for understanding human actions in videos? ICCV'17 Long-term recurrent convolutional networks for visual recognition and description by Donahue et al. CVPR'15 Dense captioning events in videos by Krishna et al. ICCV'17 MovieQA: Understanding stories in movies through question-answering by Tapaswi et al. CVPR'16
Mon, April 30	Project Spotlights
Wed, May 2	Project Spotlights
Friday, May 11th: project report due; Tuesday, May 15th: report feedback due