CRV 2024 speakers (in alphabetical order) are:

Keynote Speakers
Jitendra Malik
University of California, Berkeley

Talk Title: Reconstructing and Recognizing Human Actions in Video

Abstract
Humans are social animals. Perhaps this is why we so enjoy watching movies, TV shows, and YouTube videos, all of which show people in action. A central problem for artificial intelligence, therefore, is to develop techniques for analyzing and understanding human behavior from images and video. I will present some recent results from our research group towards this grand challenge. We have developed highly accurate techniques for reconstructing 3D meshes of human bodies from single images using transformer neural networks. Given video input, we link these reconstructions over time by 3D tracking, thus producing "Humans in 4D" (3D in space + 1D in time). As a fun application, we can use this capability to transfer the 3D motion of one person to another, e.g., to generate a video of you performing Michael Jackson's moonwalk or Michelle Kwan's skating routine. The ability to do 4D reconstruction of hands provides a source of data for imitation learning in robotics, and we show examples of reconstructing human-object interactions. In addition to 4D reconstruction, we are now also able to recognize actions by attaching semantic labels such as "standing", "running", or "jumping". However, long-range video understanding, such as the ability to follow characters' activities and understand movie plots over periods of minutes and hours, remains quite a challenge, and even the largest vision-language models struggle on such tasks. There has been substantial progress, but much remains to be done.
Bio
Jitendra Malik is the Arthur J. Chick Professor in the Department of Electrical Engineering and Computer Sciences at UC Berkeley. He is also a part-time Research Scientist Director at Meta. Malik’s research group has worked on many different topics in computer vision, human visual perception, robotics, machine learning, and artificial intelligence, and he has mentored nearly 80 PhD students and postdocs. His honors include the 2013 IEEE PAMI-TC Distinguished Researcher in Computer Vision Award, the 2014 K.S. Fu Prize from the International Association for Pattern Recognition, the 2016 ACM-AAAI Allen Newell Award, the 2018 IJCAI Award for Research Excellence in AI, and the 2019 IEEE Computer Society Computer Pioneer Award. He is a member of the National Academy of Engineering and the National Academy of Sciences, and a fellow of the American Academy of Arts and Sciences.
Kirstin H. Petersen
Cornell University

Talk Title: TBD

Abstract
TBD
Bio
Kirstin Petersen is an Associate Professor and Aref and Manon Lahham Faculty Fellow in the School of Electrical and Computer Engineering at Cornell University. Her lab, the Collective Embodied Intelligence Lab, focuses on the design and coordination of robot collectives able to achieve complex behaviors beyond the reach of an individual, and on corresponding studies of how social insects do so in nature. Major research topics include swarm intelligence, embodied intelligence, soft robots, and bio-hybrid systems. Petersen did her postdoc at the Max Planck Institute for Intelligent Systems and her PhD at Harvard University and the Wyss Institute for Biologically Inspired Engineering. Her graduate work was featured in, and on the cover of, Science in 2014; she was named among Robohub's 25 women to know in robotics in 2018, and she received the Packard Fellowship in Science and Engineering in 2019 and the NSF CAREER Award in 2021.
Symposium Speakers
Mo Chen
Simon Fraser University

Talk Title: Control and Learning in Robotic Decision Making and Human Motion Prediction

Abstract
The combination of control theory and machine learning is becoming increasingly important, and getting the best of both worlds would unlock many robotic applications. In this talk, we will first discuss connections between control and reinforcement learning, and how they can enable more data-efficient, generalizable, and interpretable robot learning. We will then turn to how ideas from control can be incorporated into deep learning methods to guide long-term human motion prediction.
Bio
Mo Chen is an Assistant Professor in the School of Computing Science at Simon Fraser University, Burnaby, BC, Canada, where he directs the Multi-Agent Robotic Systems Lab. He holds a Canada CIFAR AI Chair position and is an Amii Fellow. Dr. Chen completed his PhD in the Electrical Engineering and Computer Sciences Department at the University of California, Berkeley in 2017, and received his BASc in Engineering Physics from the University of British Columbia in 2011. From 2017 to 2018, he was a postdoctoral researcher in the Aeronautics and Astronautics Department at Stanford University. Dr. Chen’s research interests include multi-agent systems, safety-critical systems, human-robot interactions, control theory, reinforcement learning, and their intersections.
Yue Hu
University of Waterloo

Talk Title: Engaging with Collaborative Robots: Insights from Human Factors

Abstract
Research in Human-Robot Interaction (HRI) has evolved into two distinct branches: physical HRI (pHRI), focusing on task efficiency and safety, and social HRI (sHRI), which examines human perceptions. To achieve collaboration and coexistence between humans and robots, a new perspective is essential. In this talk, I will explore experimental studies on active physical interactions between humans and collaborative robots. I will discuss the critical human factors involved, detailing methodologies to measure and quantify these interactions from diverse perspectives.
Bio
Dr. Yue Hu has been an Assistant Professor in the Department of Mechanical and Mechatronics Engineering at the University of Waterloo since September 2021, where she heads the Active and Interactive Robotics Lab. Yue obtained her doctorate in robotics from Heidelberg University, Germany, in 2017. She was a postdoc first at Heidelberg University and then at the Italian Institute of Technology (IIT) in Italy. Between 2018 and 2021 she was first a JSPS (Japan Society for the Promotion of Science) fellow at the National Institute of Advanced Industrial Science and Technology (AIST) in Japan, and then an Assistant Professor in the Department of Mechanical Systems Engineering at the Tokyo University of Agriculture and Technology. She is a co-chair of the IEEE-RAS Technical Committee on Model-based Optimization for Robotics. Her research interests include physical human-robot interaction, collaborative robots, humanoid robots, and optimal control. Yue is also on the Advisory Board of the not-for-profit organization Women in AI & Robotics.
David Lindell
University of Toronto

Talk Title: Flying with Photons: Rendering Novel Views of Propagating Light

Abstract
In this talk, I discuss an imaging and neural rendering technique that seeks to synthesize videos of light propagating through a scene from novel, moving camera viewpoints. Our approach relies on a new ultrafast imaging setup to capture a first-of-its-kind, multi-viewpoint video dataset with picosecond-level temporal resolution. Combined with this dataset, we introduce an efficient neural volume rendering framework based on the transient field. This field is defined as a mapping from a 3D point and 2D direction to a high-dimensional, discrete-time signal that represents time-varying radiance at ultrafast timescales. Rendering with transient fields naturally accounts for effects due to the finite speed of light, including viewpoint-dependent appearance changes caused by light propagation delays to the camera. I will demonstrate time-resolved visualization of complex, captured light transport effects, including scattering, specular reflection, refraction, and diffraction. Finally, I will discuss future directions in propagation-aware inverse rendering.
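To make the transient-field idea concrete, here is a schematic formulation; the notation is mine rather than from the abstract, and the paper's exact definitions may differ. The field maps a 3D point $\mathbf{x}$ and viewing direction $\boldsymbol{\omega}$ to a discrete-time signal over $N$ time bins,

$$\tau(\mathbf{x}, \boldsymbol{\omega}) = \big(\tau_1(\mathbf{x}, \boldsymbol{\omega}), \ldots, \tau_N(\mathbf{x}, \boldsymbol{\omega})\big) \in \mathbb{R}^N,$$

and a NeRF-style volume rendering of a camera ray $\mathbf{r}(s) = \mathbf{o} + s\mathbf{d}$ would then shift each sample's contribution by its propagation delay $s/c$ back to the camera:

$$I(t) = \int_{s_n}^{s_f} T(s)\,\sigma\big(\mathbf{r}(s)\big)\,\tau_{t - s/c}\big(\mathbf{r}(s), \mathbf{d}\big)\,\mathrm{d}s, \qquad T(s) = \exp\!\left(-\int_{s_n}^{s} \sigma\big(\mathbf{r}(u)\big)\,\mathrm{d}u\right).$$

The delay term $t - s/c$ captures the viewpoint-dependent appearance changes mentioned above: moving the camera changes the distance $s$, and hence the arrival time, of light from each scene point.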
Bio
David Lindell is an Assistant Professor in the Department of Computer Science at the University of Toronto. His research combines optics, emerging sensor platforms, machine learning, and physics-based algorithms to enable new capabilities in visual computing. Prof. Lindell’s research has a wide array of applications including autonomous navigation, virtual and augmented reality, and remote sensing. Prior to joining the University of Toronto, he received his Ph.D. from Stanford University. He is a recipient of the 2021 ACM SIGGRAPH Outstanding Dissertation Honorable Mention Award and the 2023 Marr Prize.
Jeong Joon Park
University of Michigan

Talk Title: TBD

Abstract
TBD
Bio
Jeong Joon (JJ) Park is an assistant professor in the Computer Science and Engineering Department at the University of Michigan, Ann Arbor. His research interests lie at the intersection of computer vision and graphics, where he studies realistic reconstruction and generation of 3D scenes using neural and physical representations. The generation of large-scale, dynamic, and interactive 3D scenes is his current primary target. His group explores 3D vision and graphics and their applications to robotics, medical imaging, and scientific problems. He is the lead author of DeepSDF, which introduced neural implicit representations to 3D computer vision. Before coming to Michigan, he was a postdoctoral researcher at Stanford University and a Ph.D. student at the University of Washington, supported by an Apple AI/ML Fellowship. He did his undergraduate studies in computer science at Caltech.
Vincent Sitzmann
MIT

Talk Title: TBD

Abstract
TBD
Bio
Vincent Sitzmann is an Assistant Professor at MIT EECS, where he leads the Scene Representation Group. Previously, he completed his Ph.D. at Stanford University and a postdoc at MIT CSAIL. His research interest lies in building models that perceive and model the world the way humans do. Specifically, Vincent works towards models that can learn to reconstruct a rich state description of their environment from vision, such as its 3D structure, materials, and semantics. More importantly, these models should then also be able to model the impact of their own actions on that environment, i.e., learn a "mental simulator" or "world model". Vincent is particularly interested in models that can learn these skills fully self-supervised, only from video and by self-directed interaction with the world.