Princeton Robotics Seminar: Language as Robot Middleware
We'd like to build robots that can help us with just about anything. Central to this is getting robots to build a general-purpose representation of the world from perception, and then use it to inform actions. Should this representation be 2D or 3D? How do we "anchor" it onto a desired latent space? Should it be an implicit representation? Object-centric? Can it be self-supervised? While many options exist out there, I'll talk about one in particular that's becoming my favorite – natural language. This choice is partly motivated by the advent of large language models, but also by recent work in multi-task learning.
In the context of robots, I'll talk about: (i) why we're starting to think that it might actually be a good idea to revisit "language" as a symbolic representation to glue our systems together to do cool things, and (ii) the various "gaps" in grounding language to control that we've discovered in the process of building these systems, which I think we could really use your help in figuring out.
Bio: Andy Zeng is a Senior Research Scientist at Google Brain working on vision and language for robotics. He received his Bachelor's in Computer Science and Mathematics at UC Berkeley '15, and his PhD in Computer Science at Princeton University '19. Andy is a recipient of several awards, including the Best Paper Award at T-RO '20, Best Systems Paper Awards at RSS '19 and Amazon Robotics '18, and 1st Place (Stow) at the Amazon Picking Challenge '17, and he has been a finalist for Best Paper Awards at CoRL '21, CoRL '20, ICRA '20, RSS '19, and IROS '18. His research has been recognized through the Princeton SEAS Award for Excellence '18, NVIDIA Fellowship '18, and Gordon Y.S. Wu Fellowship in Engineering and Wu Prize '16, and his work has been featured in many popular press outlets, including the New York Times, BBC, and Wired. To learn more about Andy's work, please visit https://andyzeng.github.io