COS 429 - Computer Vision
Fall 2017
Updates:
Nov. 27
Other things to remember:
- If you're doing a related independent work project or a joint project between COS 429 and another class, in the milestone you must additionally (1) describe the project you're doing outside COS 429, and (2) clearly articulate the component that's exclusive to COS 429.
- Every project must contain both quantitative and qualitative evaluation. If you're unsure of how to evaluate your method, talk to the course staff.
Final Project
Milestone due Fri, Dec. 15
Poster session on Mon, Jan. 15
Written reports due Tue, Jan. 16
No late reports allowed.
The final assignment for this semester is to do an in-depth project
implementing a nontrivial vision system. You will be expected to
design a complete pipeline, read up on the relevant literature,
implement the system, and evaluate it on real-world data. You will
work individually or in small groups (2-3 people), and must deliver:
- A short (1-2 pages) milestone report due Dec. 15. This should include the names of all team members, a description of the problem, an outline of the proposed approach, pointers to related course topics, plans for acquiring the necessary data/computational resources, the target outcome (what do you expect to deliver at the end of the project?), and a fallback plan (what are the potential roadblocks? what is the minimum you will be able to deliver if the exploratory parts of the project go wrong?). Please submit one report per team in
plain-text, HTML, or PDF format to the Dropbox link
here.
- A poster to be presented at the poster session on Jan. 15 from 3-6pm in the Friend Convocation Room.
- Short summaries of other teams' projects due Jan. 16. During the poster session you will be given an opportunity to both present your work and learn about the other class projects. Each member of your team must write three short (1 paragraph) summaries of three other class projects. These summaries are to be completed individually. Team members should summarize different projects -- that is, if there are 2 people on the team, the team members should summarize 6 different projects. Please submit one report per person in plain-text, HTML or PDF format to the Dropbox link
here.
- A report on your system due Jan. 16. This should
include sections on previous work,
design and implementation, results, and a discussion of the strengths
and weaknesses of your system. The report should be in HTML or PDF format,
and we expect lots of pretty pictures! In addition, please submit
your code (or links to sites from which you downloaded pre-trained
models, etc.), and links to any datasets you used. If you captured
your own data, it is not necessary to submit a full dataset - just include
a few samples. Please submit one report per team to the Dropbox link
here.
Grading:
The project is worth 24% of your total grade, with the following breakdown: 2% for the milestone, 20% for the written report, and 2% for the project summaries. The project summaries are graded individually.
Project scope:
These projects are very flexible and adaptable to your interests/goals:
- you are free to focus on the topic(s) that excite you the most (you are even welcome to explore a computer vision topic outside the scope of the class),
- you can decide whether you want to collect your own visual data or use one of the existing benchmarks,
- you can build off of an existing toolbox or develop an algorithm entirely from scratch,
- you can focus your efforts more on analysis or more on building the system (although your project should include some of both analysis and system building).
Teams with 3 people are expected to do projects that are somewhat more ambitious in scope than teams with 2 people. Feel free to check with the course staff if you're unsure.
Project example:
Suppose you select the topic of generic object detection, decide to use the standard benchmark dataset of PASCAL VOC and want to build off of an existing Deformable Parts Model toolbox. You then could:
- Download the dataset and the software, and run the object detection system. You may or may not need to train the model (sometimes you can get access to pretrained models). Evaluate the results.
- Use visualization or analysis techniques to understand the errors in this system: this handy tool is great for the task of object detection in particular, but you can also use simpler techniques like confusion matrices or visualization of top-scoring images. Draw some conclusions about when the algorithm succeeds and when it fails.
- Identify one (or more) key parameters of the system: e.g., the number of deformable parts or the non-maximum suppression threshold. Evaluate how the results change, both quantitatively and qualitatively, as you vary these hyperparameters. Teams of 3 can challenge themselves to go deeper in this exploration: e.g., analyzing parameters that are inherent to how the model is trained, or exploring more of the parameters. How do the results change as a function of these parameters? Is that consistent with your intuition?
- Based on your exploration, formulate one concrete hypothesis for how to improve the object detection system. For example, perhaps adding global image context can improve object detection accuracy? Implement a way to verify your hypothesis. Evaluate how the results change quantitatively and qualitatively. Is your system better now? Teams of 3 can challenge themselves to go deeper, e.g., by exploring several avenues for improvement.
- In the project report:
- Present your topic. Why is it important, e.g., what societal applications would benefit from improved object detection? What are the challenges in building a perfect object detector? Include pictures to illustrate the challenges.
- Describe the dataset: number of images, number of object classes, any interesting properties of the dataset. Show some example images. Don't forget to present the evaluation metric.
- Explain the DPM algorithm to the reader, as you would if you were teaching it in a COS 429 lecture.
- Present your analysis, including any hypotheses, intuitions or surprises, backed by both quantitative and qualitative results. This is the core of your work: make sure the reader walks away with a much more in-depth understanding of the challenge of object detection as a field and of the strengths and weaknesses of the DPM system in particular.
- Describe your modification(s) to the method, and the resulting quantitative and qualitative changes. If the modification(s) did not improve the method as expected, discuss some reasons for why this might be the case.
- Acknowledge all code, publications, and ideas you got from others outside your group.
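To make the hyperparameter exploration above concrete: one of the parameters named in the example, the non-maximum suppression (NMS) threshold, controls how aggressively overlapping detections are merged before evaluation. Here is a minimal NumPy sketch of greedy NMS; the function name and the IoU convention are illustrative and not taken from any particular DPM toolbox:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns indices of the boxes that survive suppression.
    """
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of box i with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Keep only boxes that overlap box i less than the threshold
        order = order[1:][iou < iou_threshold]
    return keep
```

Sweeping `iou_threshold` and re-running the benchmark's evaluation is exactly the kind of quantitative-plus-qualitative parameter study the example calls for.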
Project ideas:
You may select any computer vision topic that is of interest to you, but some ideas to get you started:
- Image mosaicing, including automatic image alignment and multiresolution
blending.
- Foliage/tourist removal from several photos of a building. An important
question to answer is whether you want to attempt 3D reconstruction as part
of the process, or whether you want to consider it as a purely 2D problem.
- Video textures - see the SIGGRAPH paper linked from the
video
textures web page.
- Foreground/background segmentation (e.g., using the Weizmann Horses dataset)
- Any number of image recognition tasks:
- OCR or handwriting recognition (e.g., using the MNIST dataset)
- classifying images of skin rashes
- object classification (e.g., using the CIFAR or Caltech 101 datasets)
- object detection/semantic segmentation/human pose estimation/occlusion detection (e.g., check out the diverse PASCAL VOC annotations)
- object attributes (e.g., using aPascal/aYahoo annotations or ImageNet attributes)
- or even explore the interplay between different recognition tasks: object classification and attribute prediction, human pose estimation and action recognition, part segmentation and object detection, face detection and whole-person detection, etc.
- Explore and analyze the similarities and differences between different datasets and algorithms (e.g., check out the Dataset Bias paper or the ImageNet analysis (section 3)) -- your analysis should lead to at least one hypothesis that you verify experimentally
- Develop an image captioning system combining existing recognition modules
- Set up a webcam in a public space and perform tracking, counting, and/or
classification of people, cars, etc.
- Tracking and following a person with a drone or robot (if you have access to one)
- Human action recognition in video (e.g., using the KTH dataset)
- Detect pose outliers in videos of dance performances, e.g., understand where performers deviate from the choreography
- Pick your favorite computer vision algorithm, implement it from scratch based only on the relevant publications (without looking at the reference implementation, if one exists), and analyze its accuracy, efficiency, sensitivity to different parameters, etc.
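Several of the recognition ideas above (OCR, skin-rash classification, object classification) reduce to multi-class classification, where a confusion matrix is the standard quantitative starting point. A minimal NumPy sketch (the function name is illustrative; libraries such as scikit-learn provide equivalents):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```

Off-diagonal entries show which classes the system confuses; per-class accuracy is `cm.diagonal() / cm.sum(axis=1)`, and inspecting a few images behind the largest off-diagonal entry is an easy source of qualitative results.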
Project ideas for those with graphics experience:
- Inserting computer-generated objects into a video sequence taken with a
moving camera. Use a calibration or structure from motion method to
recover the camera pose.
- Some variant of Facade (human-assisted architectural modeling
from a small number of photographs). See the SIGGRAPH 96 paper
linked from the Facade
web page.
- Vision-based automatic image morphing (e.g., of faces). That is, you
use an optical flow or other correspondence method to generate
matches between images, then use a morphing algorithm to generate
intermediate frames.
- Image-based visual hull (shape from silhouettes) for moving scenes.
See the SIGGRAPH 2000 paper, linked from their
web page.
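The flow-based morphing idea above can be sketched in a few lines: given a dense correspondence field from image A to image B (from an optical flow method of your choice), align B to A and cross-dissolve. This sketch uses the crudest possible nearest-neighbor warp and only compensates for correspondence, not the intermediate geometry of a full morph; both simplifications are deliberate to keep it short:

```python
import numpy as np

def flow_compensated_dissolve(img_a, img_b, flow, t):
    """Blend two images after aligning B to A with a dense flow field.

    img_a, img_b: (H, W) float grayscale images.
    flow: (H, W, 2) field with flow[y, x] = (dx, dy), meaning pixel
          (x, y) in A corresponds to (x + dx, y + dy) in B.
    t: blend weight in [0, 1] (0 -> pure A, 1 -> warped B).

    A full morph would also move the geometry partway between the two
    images; this version only removes the ghosting that a plain
    cross-dissolve would produce.
    """
    h, w = img_a.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbor backward lookup into B, clipped to the image.
    bx = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    by = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    b_aligned = img_b[by, bx]
    return (1 - t) * img_a + t * b_aligned
```

Sweeping `t` from 0 to 1 yields the intermediate frames; replacing the nearest-neighbor lookup with bilinear sampling and adding a geometric warp are natural next steps for the project itself.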
Last update
23-Jan-2018 10:16:44