COS 429 - Computer Vision
Fall 2019
Final Project
Milestone due Fri, Dec. 13
Poster session Mon, Jan 13 3-6pm in the Friend Convocation Room
Written reports due Tue, Jan. 14
No late reports allowed.
The final assignment for this semester is to do an in-depth project
implementing a nontrivial vision system. You will be expected to
design a complete pipeline, read up on the relevant literature,
implement the system, and evaluate it on real-world data. The project is worth 24% of your grade and will be graded out of 24 points, with the breakdown below.
You will work individually or in small groups (2-3 people), and must deliver:
- A short (1-2 pages) milestone report
- Submit: The milestone is due on Fri, Dec 13th at 11:59pm. No late reports will be accepted. Please submit one report per team in plain-text, html, or pdf format on Gradescope.
- Details: This should include the names of all team members, a description of the problem, an outline of the proposed approach, pointers to related course topics, plans for acquiring the necessary data/computational resources, the target outcome (what do you expect to deliver at the end of the project?) and a fallback plan (what are the potential roadblocks? what is the minimum you will be able to deliver if the exploratory parts of the project go wrong?).
- Grading: The milestone is worth 2 points (2% of the final course grade). Milestone grading will be straightforward: either full credit (all questions above answered thoughtfully), half credit (many questions left unanswered, or answers are very short), or zero credit (milestone not turned in by the deadline, or equivalent).
- A poster to be presented at the poster session
- Submit: The poster session will take place on Mon, Jan 13th from 3-6pm in the Friend Convocation Room
- Details: There's no need to include all the details of your project on the poster, but you should clearly convey the intuition/background behind your idea, the key experiments you conducted and the key findings of your work. What would you like to share with your fellow students? What are the key takeaways from your project? You're welcome to either print a full-sized poster or print out slides on individual pieces of paper and attach them to the poster board. There's unfortunately no funding available to pay for poster printing. The poster boards are 4 feet x 4 feet (same as COS IW poster boards).
- Grading: The poster session is worth 2 points (2% of the final course grade). The course staff will be walking around and will visit every poster to ask questions about your chosen topic and the outcomes of your project. You will be graded both on the quality of your poster and on your answers during the Q&A. If you are not able to attend the poster session in person but your partner(s) are there to present the work, you will receive half of their grade (so up to 1 point). If no one from your team can attend the poster session, you may post an electronic version of the poster privately on Piazza by Mon, Jan 13 at 3pm to receive up to a quarter of the credit (so up to 0.5 points).
- Short summaries of other teams' projects
- Submit: The project summaries are due Tue, Jan 14th at 4:59pm (Dean's Date). No late reports will be accepted. Please submit one report per person in plain-text, html or pdf format on Gradescope.
- Details: During the poster session you will be given an opportunity to both present your work and learn about the other class projects. Each member of your team must write three short (1 paragraph) summaries of three other class projects. These summaries are to be completed individually. Team members should summarize different projects -- that is, if there are 2 people on the team, the team members should summarize 6 different projects.
- Grading: The project summaries are worth 2 points (2% of the final course grade). Grading will be similar to the milestone report: full credit for fulfilling the requirements, half credit if fewer than 3 projects are summarized and/or the descriptions are very short, zero credit if not turned in (or equivalent). Note that if you do not attend the poster session you will not get credit for the project summaries.
- A final report on your project
- Submit: The report is due Tue, Jan 14th at 4:59pm (Dean's Date). No late reports will be accepted. Please submit one report per team in html or pdf on Gradescope. In addition, please submit your code (or links to sites from which you downloaded pre-trained models, etc.), and links to any datasets you used.
- Details: The report should include sections on previous work,
design and implementation, results, and a discussion of the strengths
and weaknesses of your system. Include lots of pretty pictures! If you captured
your own data, it is not necessary to submit a full dataset - just include
a few samples.
- Grading: The project report is worth 18 points (18% of the final course grade). You will be evaluated on the scope and success of your implementation, the rigor and depth of your scientific analysis, and the quality of your writeup.
- Implementation and analysis will be given equal consideration and will together comprise the majority of the report grade.
- The implementation grade will include both the scope of the system you tackled and the results you were able to get. They are graded together since there is frequently a tradeoff: some deep learning systems may get better results than classical systems yet be straightforward to implement and difficult to improve upon, while some of the more complex systems may be difficult to tune, so although the implementation is complex, the results will be poor. We will consider the quality of the results in the context of the complexity of the system.
- The analysis portion of the grade will include the motivation of your work and the quantitative and qualitative analysis. Think of your project first and foremost as a scientific exploration: What are your goals? Why did you make the design choices that you made? What did you think would happen as a result? Is this in fact what happened? If the results did not align with your intuition, why was that? In the end, what did your system do well? Where did it fail? Why? What are the steps to improve it? What have you learned about computer vision along the way?
- Note that getting a high grade on the implementation and analysis portions implicitly relies on clear writing — if we can’t understand what you did, we can’t give you credit for it.
- The writeup grade will focus on the quality of the writeup beyond just explaining what you did — it will include the depth of discussion of related work, the quality of your figures and the organization of the report.
Project notes and guidelines:
- If you're doing a related independent work project or a joint project between COS 429 and another class, in both the milestone and the final report you must additionally (1) describe the project you're doing outside COS 429, and (2) clearly articulate the component that's exclusive to COS 429.
- Every project must include both quantitative and qualitative evaluation. The first question you want to ask yourself when thinking of a project is always “how will you know if the proposed system succeeds or fails?” Before diving into the implementation, consider exactly how you will go about evaluating your system. If you're unsure of how to evaluate your method, talk to the course staff.
- The best advice is to start early, define a metric of success, and build a baseline system as quickly as possible. What is the simplest pipeline you can build that goes from an input (image, video, RGBD image, …), performs the target task, produces an output, and is evaluated? Once you have that, you can begin improving the system, documenting the scientific exploration: evidence-driven hypothesis, implementation of the experiment, evaluation of the outcome, refined evidence-driven hypothesis, repeat.
- The more detailed your milestone writeup is, the more concrete and useful feedback the course staff will be able to give. Feel free to include brief questions in the milestone as well if you have specific concerns about your proposal. Every team will be assigned a “project advisor” from among the TAs who will serve as a resource in shaping the final project.
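To make the "simplest pipeline" advice concrete, here is a deliberately tiny baseline sketch: a nearest-class-mean classifier on synthetic 2-D "features" standing in for real image descriptors, evaluated with a plain accuracy metric. Everything here (the data, the classifier, the metric) is a made-up placeholder for whatever task you choose; the point is only the shape of the loop: input, prediction, evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two classes of 100 two-dimensional points each.
X0 = rng.normal(loc=[0, 0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[3, 3], scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# Split into train/test.
idx = rng.permutation(len(X))
train, test = idx[:150], idx[150:]

# "Train": compute one mean feature vector per class.
means = np.stack([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])

# "Predict": assign each test point to the nearest class mean.
dists = np.linalg.norm(X[test][:, None, :] - means[None, :, :], axis=2)
pred = dists.argmin(axis=1)

# The success metric: plain accuracy on held-out data.
accuracy = (pred == y[test]).mean()
print(f"baseline accuracy: {accuracy:.2f}")
```

Once a loop like this runs end to end, every later experiment is a one-line change to the model or features, re-evaluated with the same metric.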
Project scope:
These projects are very flexible and adaptable to your interests/goals:
- you are free to focus on the topic(s) that excite you the most (you are even welcome to explore a computer vision topic outside the scope of the class),
- you can decide whether you want to collect your own visual data or use one of the existing benchmarks,
- you can build off of an existing toolbox or develop an algorithm entirely from scratch,
- you can focus your efforts more on analysis or more on building the system (although you should have some of both analysis and system building in your project)
Teams with 3 people are expected to do projects that are somewhat more ambitious in scope than teams with 2 people. Feel free to confirm with the course staff if you're unsure.
Project example:
Suppose you select the topic of generic object detection, decide to use the standard benchmark dataset of PASCAL VOC and want to build off of an existing Deformable Parts Model toolbox. You then could:
- Download the dataset and the software, and run the object detection system. You may or may not need to train the model (sometimes you can get access to pretrained models). Evaluate the results.
- Use visualization or analysis techniques for understanding the errors in this system: this handy tool is great for the task of object detection in particular, but you can also use simpler techniques like confusion matrices or visualization of top-scoring images. Draw some conclusions of when the algorithm is succeeding and failing.
- Identify one or more key parameters of the system: e.g., the number of deformable parts or the non-maximum suppression threshold. Evaluate how the results change, both quantitatively and qualitatively, as you vary these hyperparameters. Teams of 3 can challenge themselves to go deeper in this exploration: e.g., analyzing parameters that are inherent to how the model is trained, or exploring more of the parameters. How are the results changing as a function of these parameters? Is that consistent with your intuition?
- Based on your exploration, formulate one concrete hypothesis for how to improve the object detection system. For example, perhaps adding global image context can improve object detection accuracy? Implement a way to verify your hypothesis. Evaluate how the results change quantitatively and qualitatively. Is your system better now? Teams of 3 can challenge themselves to go deeper, e.g., by exploring several avenues for improvement.
- In the project report:
- Present your topic. Why is it important, e.g., what societal applications would benefit from improved object detection? What are the challenges in building a perfect object detector? Include pictures to illustrate the challenges.
- Describe the dataset: number of images, number of object classes, any interesting properties of the dataset. Show some example images. Don't forget to present the evaluation metric.
- Explain the DPM algorithm to the reader, as you would if you were teaching it in a COS 429 lecture.
- Present your analysis, including any hypotheses, intuitions or surprises, backed by both quantitative and qualitative results. This is the core of your work: make sure the reader walks away with a much more in-depth understanding of the challenge of object detection as a field and of the strengths and weaknesses of the DPM system in particular.
- Describe your modification(s) to the method, and the resulting quantitative and qualitative changes. If the modification(s) did not improve the method as expected, discuss some reasons for why this might be the case.
- Acknowledge all code, publications, and ideas you got from others outside your group.
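For a detection project like the one sketched above, the core of the quantitative evaluation is intersection-over-union (IoU) between predicted and ground-truth boxes: under the standard PASCAL VOC protocol, a detection counts as correct when its IoU with a ground-truth box is at least 0.5. A minimal implementation for axis-aligned (x1, y1, x2, y2) boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap: 1/7
```

The full VOC metric (average precision) builds on this by matching detections to ground truth greedily by confidence and integrating precision over recall, but IoU is the piece worth getting right first.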
Project ideas:
You may select any computer vision topic that is of interest to you, but some ideas to get you started:
- Image mosaicing, including automatic image alignment and multiresolution
blending.
- Foliage/tourist removal from several photos of a building. An important
question to answer is whether you want to attempt 3D reconstruction as part
of the process, or whether you want to consider it as a purely 2D problem.
- Video textures - see the SIGGRAPH paper linked from the video textures web page.
- Foreground/background segmentation (e.g., using the Weizmann Horses dataset)
- Any number of image recognition tasks:
- OCR or handwriting recognition (e.g., using the MNIST dataset)
- classifying images of skin rashes
- object classification (e.g., using the CIFAR or Caltech 101 datasets)
- object detection/semantic segmentation/human pose estimation/occlusion detection (e.g., check out the diverse PASCAL VOC annotations)
- object attributes (e.g., using aPascal/aYahoo annotations or ImageNet attributes)
- or even explore the interplay between different recognition tasks: object classification and attribute prediction, human pose estimation and action recognition, part segmentation and object detection, face detection and whole-person detection, etc.
- Explore and analyze the similarities and differences between different datasets and algorithms (e.g., check out the Dataset Bias paper or the ImageNet analysis (section 3)) -- your analysis should lead to at least one hypothesis that you verify experimentally
- Develop an image captioning system combining existing recognition modules
- Set up a webcam in a public space and perform tracking, counting, and/or
classification of people, cars, etc.
- Tracking and following a person with a drone or robot (if you have access to one)
- Human action recognition in video (e.g., using the KTH dataset)
- Detect pose outliers in videos of dance performances, e.g., understand where performers deviate from the choreography
- Pick your favorite computer vision algorithm, implement it from scratch based only on the relevant publications (without looking at the reference implementation, if one exists), and analyze its accuracy, efficiency, sensitivity to different parameters, etc.
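For the recognition-style ideas above, one of the simplest qualitative-plus-quantitative analysis tools is a confusion matrix: rows are true classes, columns are predicted classes, and off-diagonal entries show exactly which classes the system confuses. The labels below are made up purely for illustration:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix: cm[t, p] = number of class-t examples predicted as p."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical labels for a 3-class task.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)

# Per-class recall = diagonal / row sums.
recall = cm.diagonal() / cm.sum(axis=1)
```

Inspecting the largest off-diagonal cells (and the images that land in them) is often enough to generate the evidence-driven hypotheses the report asks for.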
Project ideas for those with graphics experience:
- Inserting computer-generated objects into a video sequence taken with a
moving camera. Use a calibration or structure from motion method to
recover the camera pose.
- Some variant of Facade (human-assisted architectural modeling from a small number of photographs). See the SIGGRAPH 96 paper linked from the Facade web page.
- Vision-based automatic image morphing (e.g., of faces). That is, you
use an optical flow or other correspondence method to generate
matches between images, then use a morphing algorithm to generate
intermediate frames.
- Image-based visual hull (shape from silhouettes) for moving scenes.
See the SIGGRAPH 2000 paper, linked from their web page.
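For the morphing idea, the core step is generating an intermediate frame from a dense correspondence field. Below is a rough sketch under a simple convention: `flow[y, x] = (dy, dx)` maps pixel (y, x) of image A to (y + dy, x + dx) of image B. The intermediate frame samples A slightly "forward" and B slightly "backward" along the flow, then cross-dissolves. This uses nearest-neighbour sampling and evaluates the flow at the output pixel, which is only an approximation; a real morph would estimate flow in both directions and interpolate sub-pixel.

```python
import numpy as np

def morph_frame(img_a, img_b, flow, t):
    """Approximate morph frame at time t in [0, 1] (nearest-neighbour sketch)."""
    h, w = img_a.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = flow[..., 0], flow[..., 1]
    # Where to sample each source image for the output pixel (y, x).
    ay = np.clip(np.rint(ys - t * dy), 0, h - 1).astype(int)
    ax = np.clip(np.rint(xs - t * dx), 0, w - 1).astype(int)
    by = np.clip(np.rint(ys + (1 - t) * dy), 0, h - 1).astype(int)
    bx = np.clip(np.rint(xs + (1 - t) * dx), 0, w - 1).astype(int)
    # Cross-dissolve the two warped images.
    return (1 - t) * img_a[ay, ax] + t * img_b[by, bx]
```

With zero flow this degenerates to a plain cross-dissolve, which is a useful sanity check before plugging in an estimated flow field.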
Past projects
For additional inspiration, consider the outstanding projects from Fall 2017.
Last update
18-Nov-2019 21:37:10