Leveraging Vision for 3D Perception in Robotics

Wei-Lun (Harry) Chao & Brian Wang, Cornell University


Abstract: Many robotics applications require accurate 3D perception, for example an autonomous car determining the positions of other cars on the road, or an industrial manipulator robot recognizing an object it is supposed to pick up. Recent advancements driven by deep neural networks have led to remarkable performance in 2D image processing tasks. However, the properties of 3D sensor data make it challenging to realize similar performance in 3D. In this talk, we present two recent works that leverage successes in 2D vision towards 3D perception tasks.

We first present Label Diffusion LiDAR Segmentation (LDLS), an algorithm for point-level object recognition in 3D LiDAR point clouds. LDLS uses information from aligned camera images to avoid any need for training on labeled 3D data. Our method applies a pre-trained 2D image segmentation model to a camera image, then diffuses information from the image into a LiDAR point cloud using a semi-supervised graph learning algorithm. Any object class that is recognized by the 2D image segmentation model can also be detected in LiDAR, allowing LDLS to recognize a far greater variety of objects than was possible in previous work.
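The image-to-point-cloud diffusion idea can be illustrated with a small sketch. This is not the LDLS implementation: it assumes a pinhole camera with hypothetical intrinsics `fx, fy, cx, cy`, points already expressed in the camera frame, a per-pixel integer class mask from some 2D segmentation model, and a plain k-nearest-neighbor label propagation step standing in for the semi-supervised graph learning algorithm. The function names `labels_from_mask` and `diffuse_labels` are made up for illustration.

```python
import numpy as np

def labels_from_mask(points_cam, mask, fx, fy, cx, cy):
    """Give each 3D point the class of the pixel it projects to, or -1 if none.

    Assumes points are in the camera frame (z forward) and a pinhole model.
    """
    points_cam = np.asarray(points_cam, dtype=np.float64)
    h, w = mask.shape
    labels = np.full(len(points_cam), -1, dtype=int)
    z = points_cam[:, 2]
    front = z > 0  # only points in front of the camera can be seen
    u = np.round(fx * points_cam[front, 0] / z[front] + cx).astype(int)
    v = np.round(fy * points_cam[front, 1] / z[front] + cy).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(front)[inside]
    labels[idx] = mask[v[inside], u[inside]]
    return labels

def diffuse_labels(points, init_labels, num_classes, k=5, alpha=0.9, iters=50):
    """Spread initial labels over a kNN graph of the 3D points.

    init_labels uses -1 for "unknown". Requires at least two points.
    Uses O(N^2) memory, which is acceptable only for a small sketch.
    """
    points = np.asarray(points, dtype=np.float64)
    init_labels = np.asarray(init_labels)
    n = len(points)
    k = min(k, n - 1)

    # Squared pairwise distances; a point is never its own neighbor.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]

    # Symmetric 0/1 adjacency, then row-normalize into a transition matrix.
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)
    S = W / W.sum(axis=1, keepdims=True)

    # One-hot scores for points with a known initial label.
    Y0 = np.zeros((n, num_classes))
    known = init_labels >= 0
    Y0[known, init_labels[known]] = 1.0

    # Iterative propagation: mix neighbor scores with the initial evidence.
    Y = Y0.copy()
    for _ in range(iters):
        Y = alpha * S @ Y + (1 - alpha) * Y0
    return Y.argmax(axis=1)
```

In this simplified form, a point that is not connected to any labeled point keeps an all-zero score row and falls back to class 0, so a real system would need an explicit background or "unknown" handling step.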

We then present a novel vision-based 3D object detection algorithm, which can either bypass the expensive LiDAR signal or serve as an auxiliary system to LiDAR-based detectors. The key insight is to apply stereo depth estimation to pairs of 2D images and back-project the estimated depths into a 3D point cloud, which we call pseudo-LiDAR. With pseudo-LiDAR, we can apply essentially any existing LiDAR-based algorithm for 3D object detection, leading to a 300% performance improvement over the previous state-of-the-art vision-based detector.
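The back-projection step can be sketched with the standard pinhole and rectified-stereo relations. This is an illustration under stated assumptions, not the speakers' code: the disparity map is assumed to come from some stereo matcher, `focal_px`, `baseline_m`, `fx`, `fy`, `cx`, `cy` are hypothetical calibration values, and the function names and the `max_depth` cutoff are chosen here for the example.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-6):
    """Rectified stereo: depth = focal length * baseline / disparity.

    Pixels with (near-)zero disparity get infinite depth and are dropped later.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > min_disp
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

def backproject(depth, fx, fy, cx, cy, max_depth=80.0):
    """Turn an (H, W) depth map into an (N, 3) point cloud in the camera frame.

    Each pixel (u, v) with depth z maps to
        x = (u - cx) * z / fx,  y = (v - cy) * z / fy,  z = z.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column, v: row
    valid = np.isfinite(depth) & (depth > 0) & (depth <= max_depth)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Usage with made-up calibration values:
# depth = disparity_to_depth(disp_map, focal_px=721.5, baseline_m=0.54)
# cloud = backproject(depth, fx=721.5, fy=721.5, cx=609.6, cy=172.9)
```

The resulting array has the same (x, y, z) layout as a LiDAR point cloud, which is what lets a LiDAR-based detector consume it after a change of coordinate frame.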

Bio: Wei-Lun (Harry) Chao is a Postdoctoral Associate in Computer Science at Cornell University working with Kilian Q. Weinberger and Mark Campbell. His research interests are in machine learning and its applications to computer vision, natural language processing, artificial intelligence, and healthcare. His recent work has focused on robust autonomous driving. He received a Ph.D. degree in Computer Science from the University of Southern California. He will be joining The Ohio State University as an assistant professor in Computer Science and Engineering in Fall 2019.

Brian Wang is a third-year MAE PhD student at Cornell, in Mark Campbell’s research group. His research interests include vision- and LiDAR-based perception, probabilistic tracking and estimation, autonomous driving, and human-robot interaction.