Fangyin Wei will present her FPO, "Learning to Edit 3D Objects and Scenes," on Monday, April 22, 2024 at 1:30 PM in CS 302.
The members of Fangyin’s committee are as follows:
Examiners: Szymon Rusinkiewicz (Adviser), Thomas Funkhouser (Adviser), Jia Deng
Readers: Felix Heide, Olga Russakovsky
A copy of her thesis is available upon request; please email gradinfo@cs.princeton.edu if you would like one.
Everyone is invited to attend her talk.
Abstract follows:
3D editing plays a key role in many fields, from AR/VR and industrial and art design to robotics. However, existing 3D editing tools either (i) demand labor-intensive manual effort and struggle to scale to many examples, or (ii) rely on optimization and machine learning but produce unsatisfactory results (e.g., losing details or supporting only coarse edits). These shortcomings often arise from editing in geometric space rather than in a structure-aware semantic space, and the latter is the key to automatic 3D editing at scale. Although learning a structure-aware space promises significantly improved efficiency and accuracy, no labeled datasets exist for training 3D editing models. In this dissertation, we present novel approaches for learning to edit 3D objects and scenes in a structure-aware semantic space with noisy or no supervision.
We first address how to extract the underlying structure needed to edit 3D objects, focusing on two critical properties: semantic shape parts and articulations.
Our semantic editing method enables specific edits to an object’s semantic parameters (e.g., the pose of a person’s arm or the length of an airplane’s wing), leading to better preservation of input details and improved accuracy compared to previous work.
Next, we introduce a method that, without any 3D annotations, learns to model the geometry, articulation, and appearance of articulated objects from color images. The model handles an entire object category (as opposed to typical NeRF extensions that overfit to a single scene) and enables applications such as few-shot reconstruction and static object animation. It also generalizes to real-world captures.
Then, we tackle how to extract structure for scene editing. We present an automatic system that removes clutter (frequently moving objects such as clothes or chairs) from 3D scenes and inpaints the resulting holes with coherent geometry and texture. We address challenges including the lack of well-defined clutter annotations, entangled semantics and geometry, and multi-view inconsistency.
In summary, this dissertation demonstrates techniques to exploit the underlying structure of 3D data for editing. Our work opens up new research directions such as leveraging structures from other modalities (e.g., text, images) to empower 3D editing models with stronger semantic understanding.