CRV 2024 speakers (in alphabetical order) are:

Keynote Speakers

Jitendra Malik
University of California, Berkeley

Talk Title: Reconstructing and Recognizing Human Actions in Video

Abstract
Humans are social animals. Perhaps this is why we so enjoy watching movies, TV shows and YouTube videos, all of which show people in action. A central problem for artificial intelligence, therefore, is to develop techniques for analyzing and understanding human behavior from images and video. I will present some recent results from our research group towards this grand challenge. We have developed highly accurate techniques for reconstructing 3D meshes of human bodies from single images using transformer neural networks. Given video input, we link these reconstructions over time by 3D tracking, thus producing "Humans in 4D" (3D in space + 1D in time). As a fun application, we can use this capability to transfer the 3D motion of one person to another, e.g., to generate a video of you performing Michael Jackson's moonwalk or Michelle Kwan's skating routine. The ability to perform 4D reconstruction of hands provides a source of data for imitation learning in robotics, and we show examples of reconstructing human-object interactions. In addition to 4D reconstruction, we are now also able to recognize actions by attaching semantic labels such as "standing", "running", or "jumping". However, long-range video understanding, such as the ability to follow characters' activities and understand movie plots over periods of minutes and hours, is still quite a challenge, and even the largest vision-language models struggle on such tasks. There has been substantial progress, but much remains to be done.
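The linking step described above can be pictured with a toy sketch: per-frame 3D reconstructions are associated over time to form tracks. The greedy nearest-neighbor matching, the distance threshold, and the toy data below are illustrative assumptions, not the speaker's actual tracking method.

```python
# A minimal, hypothetical sketch of linking per-frame 3D detections into tracks.
import numpy as np

def link_detections(per_frame_positions, max_dist=0.5):
    """Greedily link per-frame 3D body positions into tracks."""
    tracks = [[p] for p in per_frame_positions[0]]        # start a track per detection
    for frame in per_frame_positions[1:]:
        unmatched = list(frame)
        for track in tracks:
            if not unmatched:
                break
            dists = [np.linalg.norm(track[-1] - p) for p in unmatched]
            j = int(np.argmin(dists))
            if dists[j] < max_dist:                        # extend the closest track
                track.append(unmatched.pop(j))
        tracks.extend([[p] for p in unmatched])            # leftovers start new tracks
    return tracks

# toy example: two people moving over three frames
frames = [
    [np.array([0.0, 0.0, 2.0]), np.array([1.0, 0.0, 3.0])],
    [np.array([0.1, 0.0, 2.0]), np.array([1.1, 0.0, 3.0])],
    [np.array([0.2, 0.0, 2.0]), np.array([1.2, 0.0, 3.0])],
]
print([len(t) for t in link_detections(frames)])           # -> [3, 3]
```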
Bio
Jitendra Malik is the Arthur J. Chick Professor in the Department of Electrical Engineering and Computer Sciences at UC Berkeley. He is also part-time Research Scientist Director at Meta. Malik’s research group has worked on many different topics in computer vision, human visual perception, robotics, machine learning and artificial intelligence, and he has mentored nearly 80 PhD students and postdocs. His honors include the 2013 IEEE PAMI-TC Distinguished Researcher in Computer Vision Award, the 2014 K.S. Fu Prize from the International Association of Pattern Recognition, the 2016 ACM-AAAI Allen Newell Award, the 2018 IJCAI Award for Research Excellence in AI, and the 2019 IEEE Computer Society Computer Pioneer Award. He is a member of the National Academy of Engineering and the National Academy of Sciences, and a fellow of the American Academy of Arts and Sciences.

Kirstin H. Petersen
Cornell University

Talk Title: Robot Superorganisms

Abstract
Natural swarms exhibit sophisticated colony-level behaviors with remarkable scalability and error tolerance. Their evolutionary success stems from more than just intelligent individuals: it hinges on their morphology, their physical interactions, and the way they shape and leverage their environment. Mound-building termites, for instance, are believed to use their own bodies as a template for construction; the resulting dirt mound serves, among other things, to regulate volatile pheromone cues, which in turn guide further construction and colony growth. Throughout this talk I will argue that we can leverage the same principles to achieve greater performance in robot collectives, by paying attention to the interplay between control and hardware, as well as to direct and environmentally-mediated coordination between robots. I will illustrate the strengths and challenges of this approach through soft robot collectives, collective robotic construction, and micro-scale robot collectives.
Bio
Kirstin Petersen is an Associate Professor and Aref and Manon Lahham Faculty Fellow in the School of Electrical and Computer Engineering at Cornell University. Her lab, the Collective Embodied Intelligence Lab, focuses on the design and coordination of robot collectives able to achieve complex behaviors beyond the reach of an individual, and on corresponding studies of how social insects do so in nature. Major research topics include swarm intelligence, embodied intelligence, soft robots, and bio-hybrid systems. Petersen did her postdoc at the Max Planck Institute for Intelligent Systems and her PhD at Harvard University and the Wyss Institute for Biologically Inspired Engineering. Her graduate work was featured in, and on the cover of, Science; she was named one of the top 25 women to know in robotics by Robohub in 2018, and she received the Packard Fellowship in Science and Engineering in 2019 and the NSF CAREER Award in 2021.
Featured Speakers

Qixing Huang
University of Texas at Austin

Talk Title: Geometric Regularizations for 3D Shape Generation

Abstract
Generative models, which map a latent parameter space to instances in an ambient space, enjoy various applications in 3D vision and related domains. The standard formulation of these models is probabilistic: it aligns the ambient distribution induced by pushing a prior distribution on the latent space through the generator with the empirical distribution of the training instances. While this paradigm has proven quite successful on images, its current applications to 3D generation face fundamental challenges from limited training data and poor generalization. A key difference between image generation and shape generation is that 3D shapes possess various priors in geometry, topology, and physical properties. Existing probabilistic 3D generative approaches do not preserve these desired properties, resulting in synthesized shapes with various types of distortions. In this talk, I will discuss recent work that seeks to establish a novel geometric framework for learning shape generators. The key idea is to model various geometric, physical, and topological priors of 3D shapes as suitable regularization losses by developing computational tools in differential geometry and computational topology. We will discuss applications in deformable shape generation, latent space design, joint shape matching, and 3D man-made shape generation.
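As a loose illustration of treating a geometric prior as a regularization loss, the sketch below adds a uniform-Laplacian smoothness term to a placeholder generator data term. The specific regularizer, weight, and toy mesh are assumptions for illustration only, not the framework described in the talk.

```python
# A minimal, hypothetical sketch: a geometric prior expressed as a differentiable
# regularization loss that can be added to a shape generator's training objective.
import torch

def uniform_laplacian_loss(verts, edges):
    """Penalize deviation of each vertex from the mean of its neighbors."""
    n = verts.shape[0]
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    residuals = [verts[nbrs].mean(dim=0) - verts[v]
                 for v, nbrs in enumerate(neighbors) if nbrs]
    return torch.stack(residuals).pow(2).sum(dim=1).mean()

verts = torch.randn(4, 3, requires_grad=True)          # toy mesh: a noisy tetrahedron
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
recon_loss = torch.tensor(0.0)                         # placeholder for the generator's data term
total_loss = recon_loss + 0.1 * uniform_laplacian_loss(verts, edges)
total_loss.backward()                                  # gradients now carry the geometric prior
print(verts.grad.shape)                                # torch.Size([4, 3])
```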
Bio
Qixing Huang is an associate professor with tenure in the Department of Computer Science at the University of Texas at Austin. His research sits at the intersection of graphics, geometry, optimization, vision, and machine learning. He has published more than 100 papers at leading venues across these areas. His recent research focuses on 3D generation, integrating domain-specific knowledge in geometry, physics, and topology, and on learning 3D foundation models. He has won an NSF CAREER Award and multiple best paper awards in graphics and vision.

Leonid Sigal
University of British Columbia

Talk Title: Opportunities and Limitations of Foundational and Vision-Language Models

Abstract
The capabilities and use of foundational models (FMs) and vision-language models (VLMs) in computer vision have exploded over the past one to two years. This has led to a broad paradigm shift in the field. In this talk I will focus on recent work from my group that navigates this quickly evolving research landscape. Specifically, I will discuss three avenues of research. First, I will discuss our recent work on building foundational image representation models by combining two successful strategies: masking (e.g., BERT) and sequential token prediction (e.g., GPT). We find that such a combination results in a better, more efficient, and more transferable pre-training strategy. Second, I will discuss a series of papers focusing on text-to-image (TTI) generative models, where we introduce a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background for multi-frame story visualization. This design is able to maintain consistency and resolve references in longer story text. Third, I will discuss biases in such models and our work on quantifying and mitigating bias in TTI models.
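To make the two pre-training strategies concrete, the sketch below computes a BERT-style masked-prediction loss and a GPT-style next-token loss on the same toy token sequence and sums them. The tiny embedding-plus-linear "model", vocabulary size, and mask rate are placeholder assumptions; the actual work combines these objectives in an image-representation model.

```python
# A minimal, hypothetical sketch of combining masked and sequential prediction losses.
import torch
import torch.nn.functional as F

vocab, seq_len, dim = 100, 16, 32
model = torch.nn.Sequential(                      # toy stand-in for a transformer backbone
    torch.nn.Embedding(vocab, dim),
    torch.nn.Linear(dim, vocab),
)
tokens = torch.randint(1, vocab, (1, seq_len))    # reserve id 0 as the [MASK] token

# BERT-style objective: hide a random subset of tokens and predict them in place.
mask = torch.rand(1, seq_len) < 0.25
mask[0, 0] = True                                 # ensure at least one position is masked
masked_input = tokens.masked_fill(mask, 0)
loss_masked = F.cross_entropy(model(masked_input)[mask], tokens[mask])

# GPT-style objective: predict token t+1 from the tokens up to t.
logits_seq = model(tokens[:, :-1])
loss_seq = F.cross_entropy(logits_seq.reshape(-1, vocab), tokens[:, 1:].reshape(-1))

loss = loss_masked + loss_seq                     # joint pre-training loss
loss.backward()
print(float(loss))
```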
Bio
Prof. Leonid Sigal is a Professor at the University of British Columbia (UBC). He is also currently a part-time Visiting Researcher at Google. He was appointed a CIFAR AI Chair at the Vector Institute in 2019 and an NSERC Tier 2 Canada Research Chair in Computer Vision and Machine Learning in 2018. Prior to this, he was a Senior Research Scientist, and a group lead, at Disney Research. He completed his Ph.D. at Brown University in 2008; he received his B.Sc. degrees in Computer Science and Mathematics from Boston University in 1999, his M.A. from Boston University in 1999, and his M.S. from Brown University in 2003. Leonid's research interests lie in the areas of computer vision, machine learning, and computer graphics, with an emphasis on approaches for visual and multi-modal representation learning, recognition, understanding, and generative modeling. He has won a number of research awards, including the Killam Accelerator Fellowship in 2021, and has published over 100 papers in venues such as CVPR, ICCV, ECCV, NeurIPS, ICLR, and SIGGRAPH.

Bryan Tripp
University of Waterloo

Talk Title: The gap between deep networks and the brain

Abstract
Deep networks have roots in early efforts to model brain function, and their internal activations are among the best predictors of brain activity. Biological brains outperform deep networks in their versatility, sample efficiency, power efficiency, and real-world autonomy, suggesting that the brain may be a source of insight into how to further improve deep networks. However, the brain has many complexities, and it is unclear which of them are important. This talk will describe some first steps in developing functional, anatomically and physiologically realistic brain models based on deep networks, to better understand how deep networks should be elaborated to close the gap. This work points away from vision transformers and suggests a new parameter space for convolutional networks.
Bio
Bryan Tripp is an Associate Professor in the Department of Systems Design Engineering and the Centre for Theoretical Neuroscience at the University of Waterloo. His lab studies intelligence from the perspectives of computational neuroscience, applied deep learning, and robotics. Before joining the University of Waterloo, he was a post-doctoral fellow at McGill University, studying visual neuroscience.
Symposium Speakers

Mo Chen
Simon Fraser University

Talk Title: Control and Learning in Robotic Decision Making and Human Motion Prediction

Abstract
The combination of control theory and machine learning is becoming increasingly important, and being able to get the best of both worlds would unlock many robotic applications. In this talk, we will first discuss connections between control and reinforcement learning, and how they can enable more data-efficient, generalizable, and interpretable robot learning. Afterwards, we will discuss how ideas from control can be incorporated into deep learning methods to guide long-term human motion prediction.
Bio
Mo Chen is an Assistant Professor in the School of Computing Science at Simon Fraser University, Burnaby, BC, Canada, where he directs the Multi-Agent Robotic Systems Lab. He holds a Canada CIFAR AI Chair and is an Amii Fellow. Dr. Chen completed his PhD in the Electrical Engineering and Computer Sciences Department at the University of California, Berkeley in 2017, and received his BASc in Engineering Physics from the University of British Columbia in 2011. From 2017 to 2018, he was a postdoctoral researcher in the Aeronautics and Astronautics Department at Stanford University. Dr. Chen’s research interests include multi-agent systems, safety-critical systems, human-robot interactions, control theory, reinforcement learning, and their intersections.

Yue Hu
University of Waterloo

Talk Title: Engaging with Collaborative Robots: Insights from Human Factors

Abstract
Research in Human-Robot Interaction (HRI) has evolved into two distinct branches: physical HRI (pHRI), focusing on task efficiency and safety, and social HRI (sHRI), which examines human perceptions. To achieve collaboration and coexistence between humans and robots, a new perspective is essential. In this talk, I will explore experimental studies on active physical interactions between humans and collaborative robots. I will discuss the critical human factors involved, detailing methodologies to measure and quantify these interactions from diverse perspectives.
Bio
Dr. Yue Hu has been an Assistant Professor in the Department of Mechanical and Mechatronics Engineering at the University of Waterloo since September 2021, where she heads the Active and Interactive Robotics Lab. Yue obtained her doctorate in robotics from Heidelberg University, Germany, in 2017. She was a postdoc first at Heidelberg University and then at the Italian Institute of Technology (IIT) in Italy. Between 2018 and 2021 she was first a JSPS (Japan Society for the Promotion of Science) fellow at the National Institute of Advanced Industrial Science and Technology (AIST) in Japan, and then an Assistant Professor in the Department of Mechanical Systems Engineering at the Tokyo University of Agriculture and Technology. She is one of the co-chairs of the IEEE-RAS Technical Committee on Model-Based Optimization for Robotics. Her research interests include physical human-robot interaction, collaborative robots, humanoid robots, and optimal control. Yue is also on the Advisory Board of the not-for-profit organization Women in AI & Robotics.

David Lindell
University of Toronto

Talk Title: Flying with Photons: Rendering Novel Views of Propagating Light

Abstract
In this talk I discuss an imaging and neural rendering technique that synthesizes videos of light propagating through a scene from novel, moving camera viewpoints. Our approach relies on a new ultrafast imaging setup to capture a first-of-its-kind, multi-viewpoint video dataset with picosecond-level temporal resolution. Combined with this dataset, we introduce an efficient neural volume rendering framework based on the transient field. This field is defined as a mapping from a 3D point and a 2D direction to a high-dimensional, discrete-time signal that represents time-varying radiance at ultrafast timescales. Rendering with transient fields naturally accounts for effects due to the finite speed of light, including viewpoint-dependent appearance changes caused by light propagation delays to the camera. I will demonstrate time-resolved visualization of complex, captured light transport effects, including scattering, specular reflection, refraction, and diffraction. Finally, I will discuss future directions in propagation-aware inverse rendering.
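A toy illustration of the transient-field abstraction described above: a function from a 3D point and a 2D direction to a discrete-time radiance signal, "rendered" by shifting it with a delay proportional to the light's travel time to the camera. The Gaussian-pulse field, the bin sizes, and the single-sample rendering are illustrative assumptions, not the authors' system.

```python
# A minimal, hypothetical sketch of querying and rendering a transient field.
import numpy as np

C = 0.3          # speed of light, meters per nanosecond
N_BINS = 64      # temporal bins in the transient
BIN_NS = 0.1     # bin width in nanoseconds

def transient_field(point, direction):
    """Toy transient: radiance over time at `point` seen along `direction`."""
    # (the toy field ignores direction; a learned field would not)
    t = np.arange(N_BINS) * BIN_NS
    t0 = np.linalg.norm(point) / C                 # pretend a pulse was emitted from the origin
    return np.exp(-((t - t0) ** 2) / (2 * 0.05 ** 2))

def render_pixel(camera_pos, point, direction):
    """Shift the transient by the propagation delay from the point to the camera."""
    delay_bins = int(np.linalg.norm(point - camera_pos) / C / BIN_NS)
    signal = transient_field(point, direction)
    shifted = np.zeros_like(signal)
    if delay_bins < N_BINS:
        shifted[delay_bins:] = signal[:N_BINS - delay_bins]
    return shifted                                  # arrival time depends on the viewpoint

camera = np.array([0.0, 0.0, 1.0])
scene_point = np.array([0.2, 0.0, 0.5])
view_dir = np.array([0.0, 0.0])                     # (azimuth, elevation) placeholder
print(render_pixel(camera, scene_point, view_dir).argmax())   # bin of the observed peak
```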
Bio
David Lindell is an Assistant Professor in the Department of Computer Science at the University of Toronto. His research combines optics, emerging sensor platforms, machine learning, and physics-based algorithms to enable new capabilities in visual computing. Prof. Lindell’s research has a wide array of applications including autonomous navigation, virtual and augmented reality, and remote sensing. Prior to joining the University of Toronto, he received his Ph.D. from Stanford University. He is a recipient of the 2021 ACM SIGGRAPH Outstanding Dissertation Honorable Mention Award and the 2023 Marr Prize.

Jeong Joon Park
University of Michigan

Talk Title: Towards compositional 3D scene generation

Abstract
Recently, numerous 3D generative model approaches have been proposed to automatically produce highly realistic objects. However, producing larger-scale scenes, rather than objects, still stands as a formidable challenge. In this talk, I'll present my past work on scene-scale generative models. I'll start by discussing a 3D-aware diffusion model technique that auto-regressively accumulates image-aligned 3D features for scene generation. Next, I will discuss other recent works that exploit the compositional structure of large scenes to produce them effectively, which is difficult for non-compositional approaches.
Bio
Jeong Joon (JJ) Park is an assistant professor in the Computer Science and Engineering Department at the University of Michigan, Ann Arbor. His research interests lie at the intersection of computer vision and graphics, where he studies realistic reconstruction and generation of 3D scenes using neural and physical representations. Generation of large-scale, dynamic, and interactive 3D scenes is his current primary target. His group explores 3D vision and graphics and their applications to robotics, medical imaging, and scientific problems. He is the lead author of DeepSDF, which introduced neural implicit representations to 3D computer vision. Before coming to Michigan, he was a postdoctoral researcher at Stanford University and a Ph.D. student at the University of Washington, supported by an Apple AI/ML Fellowship. He did his undergraduate studies in computer science at Caltech.

Audrey A. Sedal
McGill University

Talk Title: Simulation-Driven Soft Robotics

Abstract
Soft-bodied robots present a compelling solution for navigating tight spaces and interacting with unknown obstacles, with potential applications in inspection, medicine, and AR/VR. Yet, even after a decade, soft robots remain largely in the prototype phase and have not scaled to the tasks where they show the most promise. These systems are difficult to design and control because their morphology is coupled with both their actuation and the environment, creating a large joint design space that cannot be exhaustively explored through prototype iteration. Soft roboticists need new tools to repeatably develop systems that leverage deformability and contact. Dr. Sedal will first present recent work on jointly optimizing the design and control of soft robots in simulation. Through a combination of reduced-order finite element simulation and reinforcement learning, this work trained soft-legged, crawling robots, achieving performance that surpassed expert baselines and transferred to real physical robots. Second, Dr. Sedal will present work on deformable acoustic tactile sensors and design with auxetic metamaterials. The research presented here will enable engineering tools for the development of intelligent, highly compliant structures for high-contact settings, taking soft robots out of the laboratory and into the world.
Bio
Dr. Sedal is an Assistant Professor at McGill University in Montreal, Canada, where she leads the MACRObotics (Morphology, Actuation and Computation for Robotics) research group. She is also an Associate Member (Academic) of Mila, the Quebec AI Institute. Previously, she was a Research Assistant Professor (a position comparable to an endowed postdoc) at TTI-Chicago. She holds a PhD and an MSc from the University of Michigan, as well as a BSc from MIT, all in Mechanical Engineering.

Vincent Sitzmann
MIT

Talk Title: Enabling New Robotic Capabilities with Spatial AI

Abstract
Recent progress in 3D computer vision has enabled a set of previously impossible capabilities. In this talk, I will present a set of results at the cutting edge of 3D computer vision and scene representation, revolving around imbuing 3D representations with a semantic understanding of the underlying scene, and around neural networks that learn to solve the structure-from-motion problem and can reconstruct interpretable 3D scenes directly from unprocessed video. I will relate these results to the capabilities they enable in robotics, and finally give an outlook on near-term results at the interface of robotics and vision.
Bio
Vincent Sitzmann is an Assistant Professor at MIT EECS, where he is leading the Scene Representation Group. Previously, he did his Ph.D. at Stanford University as well as a Postdoc at MIT CSAIL. His research interest lies in building models that perceive and model the world the way that humans do. Specifically, Vincent works towards models that can learn to reconstruct a rich state description of their environment, such as reconstructing its 3D structure, materials, semantics, etc. from vision. More importantly, these models should then also be able to model the impact of their own actions on that environment, i.e., learn a "mental simulator" or "world model". Vincent is particularly interested in models that can learn these skills fully self-supervised only from video and by self-directed interaction with the world.
Workshop Speakers

Francisco Javier Andrade Chavez
University of Waterloo

Talk Title: Human-to-Robot Skill Transfer: Using Bio-Inspired Force Distribution in Double Support for Humanoid Locomotion

Abstract
In this talk we will discuss how to transfer some aspects of human locomotion to a humanoid robot. Humanoids are made to resemble humans, and one of the advantages of this form factor is the potential to transfer human-based skills to the humanoid robot platform. In locomotion, the likelihood of slipping or maintaining contact is determined by the forces applied on the environment, so it is crucial to find methods for keeping contact forces within friction constraints. In single support, the relationship between center-of-mass acceleration and contact forces is unique; in double support, however, the force distribution becomes under-determined. It is often assumed that forces are distributed so as to minimize a certain effort criterion. An interesting alternative is to distribute the forces in a manner similar to how a human would, which could result in a more human-like gait for humanoid robots.
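The sketch below illustrates the under-determinacy of double support: the net contact wrench at the center of mass is fixed, but its split between the two feet is not, so a criterion must be chosen. Here the split is the minimum-norm ("least effort") solution of the linear constraints; friction constraints are omitted for brevity, and the numbers and criterion are illustrative assumptions, not the speaker's human-inspired distribution.

```python
# A minimal, hypothetical sketch of minimum-effort force distribution in double support.
import numpy as np

def skew(r):
    """Cross-product matrix so that skew(r) @ f == np.cross(r, f)."""
    return np.array([[0.0, -r[2], r[1]],
                     [r[2], 0.0, -r[0]],
                     [-r[1], r[0], 0.0]])

def distribute_forces(f_net, tau_net, r_left, r_right):
    """Split a desired net force/torque between two contacts (minimum-norm solution)."""
    A = np.block([[np.eye(3), np.eye(3)],             # f_l + f_r = f_net
                  [skew(r_left), skew(r_right)]])     # r_l x f_l + r_r x f_r = tau_net
    b = np.concatenate([f_net, tau_net])
    x = np.linalg.pinv(A) @ b                         # minimum-effort (minimum-norm) solution
    return x[:3], x[3:]

# toy example: support the robot's weight with zero net torque about the CoM
f_l, f_r = distribute_forces(np.array([0.0, 0.0, 600.0]),    # ~60 kg robot
                             np.zeros(3),
                             np.array([0.0, 0.1, -0.9]),     # left foot w.r.t. CoM
                             np.array([0.0, -0.1, -0.9]))    # right foot w.r.t. CoM
print(f_l, f_r)    # each foot carries about half of the weight
```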
Bio
Francisco Javier Andrade Chavez has been Lab Manager and Postdoc for the Human-Centred Robotics and Machine Intelligence lab in the Department of Systems Design Engineering at the University of Waterloo since July 2020. He has also been Humanoid Specialist for the RoboHub at the Faculty of Engineering of the University of Waterloo since November 2023. Francisco obtained his doctorate in Bioengineering and Robotics from the Universita degli Studi di Genova, in collaboration with the Istituto Italiano di Tecnologia (IIT), in 2019. He then stayed on as a Postdoc and Scrum Master in the Dynamic Interaction and Control group, now known as Artificial and Machine Intelligence, leading the telexistence research team. His research interest lies in endowing robots with the ability to exploit robot dynamics to adapt to rapidly changing scenarios while seamlessly interacting with humans. This has led to research in the areas of telexistence, balancing, loco-manipulation control, socio-physical human-robot interaction, wearable sensors, human-robot skill transfer, and estimation applied to humanoid robots.

Igor Gilitschenski
University of Toronto

Talk Title: Do Androids Dream of Electric Sheep? A Generative Paradigm for Dataset Design

Abstract
Traditional approaches for autonomy and AI robotics typically focus either on large scale data collection or on improving simulation. Although most practitioners rely on both approaches, they are largely still applied in separate workflows and seen as conceptually unrelated. In this talk, I will argue that this is a false dichotomy. Recent advances in generative models enable the unification of these seemingly different methodologies. Using real-world data for building data generation systems has led to numerous advances with impact in robotics and autonomy: going beyond pure distillation approaches, unifying creation and curation enables sophisticated automatic labeling pipelines and data-driven simulators. I will present some of our work following this paradigm and outline several basic research challenges and limitations associated with building systems that learn with generated data.
Bio
Igor Gilitschenski is an Assistant Professor of Computer Science at the University of Toronto, where he leads the Toronto Intelligent Systems Lab. Previously, he was a (visiting) Research Scientist at the Toyota Research Institute. Before that, Dr. Gilitschenski was a Research Scientist at MIT’s Computer Science and Artificial Intelligence Lab and the Distributed Robotics Lab (DRL), where he was the technical lead of DRL’s autonomous driving research team. He joined MIT from the Autonomous Systems Lab of ETH Zurich, where he worked on robotic perception, particularly localization and mapping. He obtained his doctorate in Computer Science from the Karlsruhe Institute of Technology and a Diploma in Mathematics from the University of Stuttgart. His research interests involve developing novel robotic perception and decision-making methods for challenging dynamic environments. His work has received multiple awards, including best paper awards at the American Control Conference, at the International Conference on Information Fusion, and from Robotics and Automation Letters.