Aljosa Osep, Technical University of Munich
Abstract: Spatio-temporal interpretation of raw sensory data is vital for intelligent agents to understand how to interact with the environment and to perceive how the trajectories of moving agents evolve in the 4D continuum, i.e., 3D space and time. To this end, I will talk about our recent efforts in the semantic and temporal understanding of raw sensory data. I will first present our work on multi-object tracking and segmentation. Then, I will discuss how to generalize these ideas towards holistic temporal scene understanding, jointly tackling object instance segmentation, tracking, and semantic understanding of monocular video sequences and LiDAR streams. Finally, I will move on to the challenging problem of scaling object instance segmentation and tracking models to the open world, in which future mobile agents will need to learn continuously without explicit human supervision. In such scenarios, intelligent agents encounter, and need to react to, unknown dynamic objects that were not observed during model training.