Learning Object-centric Representations for Robot Manipulation Tasks

Karthik Desingh

11/18/21

Location: 122 Gates Hall

Time: 2:40 p.m.

Abstract: A crucial question for complex multi-step robotic tasks is how to represent relationships between entities in the world, particularly as they pertain to preconditions for the various skills the robot might employ. In goal-directed sequential manipulation tasks with long-horizon planning, it is common to use a state estimator followed by a task and motion planner or other model-based system. A variety of powerful approaches exist for explicitly estimating the state of objects in the world. However, it is challenging to generalize these approaches to an arbitrary collection of objects. In addition, objects are often in contact in manipulation scenarios, a setting in which explicit state estimation struggles, particularly when generalizing to unseen objects. In this talk, I will present our recent work, which takes an important step towards a manipulation framework that generalizes few-shot to unseen tasks with unseen objects. Specifically, we propose a neural network that extracts implicit object embeddings directly from raw RGB images. Trained on large amounts of simulated robotic manipulation data, the object-centric embeddings produced by our network can be used to predict spatial relationships between the entities in a scene, providing a task and motion planner with relevant implicit state information for goal-directed sequential manipulation tasks.
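
To make the idea of object-centric embeddings feeding relation predictions more concrete, here is a minimal, hypothetical PyTorch sketch. It is not the speaker's actual architecture or training setup; the encoder, embedding dimension, relation vocabulary, and the assumption of per-object RGB crops are all illustrative. It only shows the general pattern: encode each object into an embedding, then score pairwise spatial relations that a task and motion planner could consume as implicit state.

```python
import torch
import torch.nn as nn


class ObjectCentricRelationNet(nn.Module):
    """Illustrative sketch: a small CNN encodes each object's RGB crop into
    an embedding, and an MLP head scores pairwise spatial relations
    (e.g. "left of", "on top of", "in contact") between object pairs."""

    def __init__(self, embed_dim=64, num_relations=4):
        super().__init__()
        # Tiny CNN encoder for per-object RGB crops (assumed 3 x 64 x 64).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        # Relation head consumes a concatenated pair of object embeddings.
        self.relation_head = nn.Sequential(
            nn.Linear(2 * embed_dim, 128), nn.ReLU(),
            nn.Linear(128, num_relations),
        )

    def forward(self, object_crops):
        # object_crops: (num_objects, 3, H, W) RGB crops, one per entity.
        emb = self.encoder(object_crops)                     # (N, embed_dim)
        n = emb.shape[0]
        # Form all ordered pairs (i, j) with i != j and score their relations.
        idx_i, idx_j = torch.meshgrid(
            torch.arange(n), torch.arange(n), indexing="ij")
        mask = idx_i != idx_j
        pairs = torch.cat([emb[idx_i[mask]], emb[idx_j[mask]]], dim=-1)
        return self.relation_head(pairs)          # (N*(N-1), num_relations)


if __name__ == "__main__":
    model = ObjectCentricRelationNet()
    crops = torch.rand(3, 3, 64, 64)   # three object crops from one scene
    logits = model(crops)
    print(logits.shape)                # torch.Size([6, 4]) relation logits
```

In this toy setup the relation logits over object pairs would play the role of the implicit state mentioned in the abstract: a planner checks predicted relations against skill preconditions instead of relying on explicit pose estimates.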

Bio: Karthik Desingh is a Postdoctoral Scholar at the University of Washington (UW), working with Professor Dieter Fox. Before joining UW, he received his Ph.D. in Computer Science and Engineering from the University of Michigan, working with Professor Chad Jenkins. During his Ph.D. he was closely associated with the Robotics Institute and Michigan AI. He earned his B.E. in Electronics and Communication Engineering at Osmania University, India, and his M.S. in Computer Science at IIIT-Hyderabad and Brown University. His research lies at the intersection of robotics, computer vision, and machine learning, focusing primarily on providing robots with perceptual capabilities, using deep learning and probabilistic techniques, to perform goal-directed tasks in unstructured environments.