🎉 Exciting news! 🎉 We are proud to announce that our paper has been accepted to #CoRL2024! Take a look at "Toward General Object-level Mapping from Sparse Views with 3D Diffusion Priors" by Ziwei Liao, Binbin Xu, and Steven Lake Waslander, from the University of Toronto.

TL;DR: We propose a general object-level mapping system that estimates 3D object shapes (NeRFs) and poses using a 3D diffusion model as a multi-category shape prior under sparse RGB-D observations.

Object-level mapping builds a 3D map of the objects in a scene, with detailed shapes and poses, from multi-view sensor observations. Conventional methods struggle to reconstruct complete shapes and estimate accurate poses due to partial occlusions and sensor noise, and they require dense observations covering every object, which is difficult to achieve along robot trajectories. Recent work introduces generative shape priors for object-level mapping from sparse views, but is limited to single-category objects.

We propose a General Object-level Mapping system, GOM, which leverages a 3D diffusion model as a multi-category shape prior and outputs Neural Radiance Fields (NeRFs) capturing both texture and geometry for all objects in a scene. GOM includes an effective formulation to guide a pre-trained diffusion model with additional nonlinear constraints from sensor measurements, without fine-tuning. We also develop a probabilistic optimization formulation to fuse multi-view sensor observations and diffusion priors for joint 3D object pose and shape estimation. GOM demonstrates superior multi-category mapping performance from sparse views and achieves more accurate mapping results than state-of-the-art methods on real-world benchmarks.

We will be attending #CoRL2024 in person in November 2024. See you in Munich, Germany!

Paper (with supplementary): https://lnkd.in/gyTAJw6F
Code will be available soon!

#CoRL24 #robotics #3dvision #mapping #diffusion #nerf #reconstruction
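For readers curious about the mechanics, here is a minimal sketch of how a frozen diffusion prior can be fused with multi-view measurement terms for joint shape and pose optimization. This is our own illustration of the general idea rather than the GOM implementation; `denoiser`, `render_depth`, and the observation structure are hypothetical placeholders.

```python
import torch

# Minimal sketch (not the authors' code): one way to combine a frozen diffusion
# prior with multi-view measurement terms for joint shape/pose optimization.
# `denoiser`, `render_depth`, and the observation format are hypothetical stand-ins.

def guided_update(shape_code, pose, t, denoiser, render_depth, observations,
                  lr=1e-2, w_prior=1.0):
    """One gradient step fusing a diffusion prior with multi-view sensor constraints."""
    shape_code = shape_code.detach().requires_grad_(True)
    pose = pose.detach().requires_grad_(True)

    # Measurement term: sum of per-view rendering residuals (e.g., depth L2).
    meas_loss = sum(
        torch.nn.functional.mse_loss(render_depth(shape_code, pose, view), obs)
        for view, obs in observations
    )

    # Prior term: the frozen denoiser pulls the shape code toward the learned
    # shape manifold (a DDPM-style denoising surrogate at noise level t).
    with torch.no_grad():
        denoised = denoiser(shape_code, t)
    prior_loss = w_prior * torch.nn.functional.mse_loss(shape_code, denoised)

    (meas_loss + prior_loss).backward()
    with torch.no_grad():
        shape_code -= lr * shape_code.grad
        pose -= lr * pose.grad
    return shape_code.detach(), pose.detach()
```

In a full reverse-diffusion schedule, the noise level `t` and the prior weight would be annealed across steps; both are fixed here for brevity.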
Toronto Robotics and Artificial Intelligence Laboratory
Research Services
Robotic perception and planning to improve autonomous systems.
About us
Founded by Prof. Waslander in 2018, the TRAILab focuses on research in perception for robotics, including object detection and tracking, segmentation, localization and mapping.
- Website: https://www.trailab.utias.utoronto.ca/
- Industry: Research Services
- Company size: 11-50 employees
- Headquarters: Toronto
- Type: Educational
Locations
- Primary: Toronto, CA
Updates
-
Toronto Robotics and Artificial Intelligence Laboratory reposted this
Our paper "Image-to-Lidar Relational Distillation for Autonomous Driving Data" has been accepted to #ECCV2024!

Pre-trained on large-scale and diverse multi-modal datasets, vision foundation models excel at solving 2D tasks in few-shot and zero-shot settings, owing to their robust multimodal representations. The emergence of 2D-to-3D distillation frameworks has extended these capabilities to 3D models. However, distilling 3D representations for autonomous driving datasets presents challenges like self-similarity, class imbalance, and point cloud sparsity, hindering the effectiveness of state-of-the-art distillation methods.

This work builds on our previous paper, "Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss," in which we proposed estimating the similarity between negative samples in the representation space of self-supervised vision models. Using these similarity estimates, we effectively minimize the impact of false negative samples in contrastive losses and balance the contribution of well-represented and under-represented samples. However, when distilling 2D representations from vision-language models for 3D zero-shot tasks, we observe that contrastive distillation fails due to the abundance of self-similarity in autonomous driving datasets.

In this work:
1. We investigate the mismatch between the 2D and 3D representation structures resulting from different distillation frameworks. We quantify the mismatch using uniformity, tolerance, and the modality gap, revealing a significant gap between 2D and distilled 3D representations.
2. We address this mismatch by imposing structural constraints that foster the learning of a 3D representation aligned with the structure of the 2D representations, without relying on noisy negative samples. To achieve this, we employ pretraining with intra-modal and cross-modal relational losses. These losses generalize the similarity loss and provide a more effective constraint on the distillation process. Our proposed losses can be applied to pixel-based and superpixel-based distillation frameworks.
3. The resulting 3D representations significantly outperform those learned via contrastive distillation on zero-shot segmentation tasks. Furthermore, compared to the similarity loss, our relational loss yields 3D representations that consistently improve on in-distribution and out-of-distribution few-shot segmentation tasks.

This work was done in collaboration with Ali Harakeh and Steven Lake Waslander.

Paper: https://lnkd.in/gFBfzuRp
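As an aside, here is a toy sketch of what an intra-modal plus cross-modal relational loss can look like. It is our simplified reading of relational distillation in general, not the paper's implementation, and it assumes L2-normalized, paired 2D/3D features and a shared anchor set (all names are placeholders).

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: relational distillation matches each point's similarity
# *distribution* over a set of anchors instead of contrasting individual
# positive/negative pairs, which avoids noisy negatives in self-similar scenes.
# Assumes feat_2d/feat_3d are paired [N, D] features and anchors_* are [A, D].

def relational_loss(feat_3d, feat_2d, anchors_3d, anchors_2d, tau=0.07):
    # Similarity of each sample to every anchor, turned into a distribution.
    p_2d = F.softmax(feat_2d @ anchors_2d.t() / tau, dim=-1)          # 2D "teacher" relations
    q_cross = F.log_softmax(feat_3d @ anchors_2d.t() / tau, dim=-1)   # 3D vs. 2D anchors
    q_intra = F.log_softmax(feat_3d @ anchors_3d.t() / tau, dim=-1)   # 3D vs. 3D anchors

    cross_modal = F.kl_div(q_cross, p_2d, reduction="batchmean")
    intra_modal = F.kl_div(q_intra, p_2d, reduction="batchmean")
    return cross_modal + intra_modal
```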
-
🎉 Exciting news! 🎉 Congratulations to Evan Cook on graduating with a Master of Applied Science in Aerospace Engineering, specializing in Robotics, from the University of Toronto Institute for Aerospace Studies! Here at the Toronto Robotics and Artificial Intelligence Laboratory (TRAIL), Evan made significant contributions to the field of autonomous driving, focusing on out-of-distribution detection for open-world machine learning. His work has laid the groundwork for safer and more reliable self-driving vehicles. Evan, your outstanding accomplishments and enthusiasm for innovation leave a lasting mark on TRAIL. We are incredibly proud of you and wish you all the best as you embark on the next chapter of your journey at Zoox. #graduation #autonomousdriving #robotics #computervision
-
6D pose estimation of textureless shiny objects has become an essential problem in many robotic applications. Many pose estimators require high-quality depth data, often measured by structured light cameras. However, when objects have shiny surfaces (e.g., metal parts), these cameras fail to sense complete depths from a single viewpoint due to specular reflection, resulting in a significant drop in the final pose accuracy.

We are thrilled to share that our latest research has been accepted to #IROS2024! 🎉 In our paper, "Active Pose Refinement for Textureless Shiny Objects using the Structured Light Camera," co-authored by Yang J., Jian Yao, and Steven Lake Waslander, we present a complete active vision framework for 6D object pose refinement and next-best-view prediction to mitigate this issue.

🌟 Key Highlights 🌟:
- Innovative 6D Pose Refinement: Our approach is tailored for SLI cameras; it estimates pixel depth uncertainties and integrates these estimates into our SDF-based pose refinement module.
- Surface Reflection Model: We predict depth uncertainties for unseen viewpoints using a reflection model that recovers object reflection parameters with a differentiable renderer.
- Active Vision System: By integrating our reflection model and pose refinement approach, we can predict the next-best view (NBV) for pose estimation through online rendering.

Check out the full paper linked below and join us at #IROS2024!
Paper: https://lnkd.in/gMTHZVcY

A big thank you to Epson Canada for supporting this work!

#poseestimation #robotics #sli #structuredlightcamera
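To give a flavor of uncertainty-aware, SDF-based pose refinement in general (not the paper's actual pipeline, which additionally models SLI-specific reflection effects), here is a minimal sketch that down-weights points with high predicted depth uncertainty; the `sdf` callable and the data layout are hypothetical.

```python
import torch

# Generic sketch: refine an object pose by minimizing depth-uncertainty-weighted
# SDF residuals of measured points. `sdf` is a hypothetical callable mapping
# object-frame points [N, 3] to signed distance values [N].

def skew(w):
    zero = torch.zeros_like(w[0])
    return torch.stack([
        torch.stack([zero, -w[2], w[1]]),
        torch.stack([w[2], zero, -w[0]]),
        torch.stack([-w[1], w[0], zero]),
    ])

def refine_pose(points_cam, depth_sigma, sdf, iters=100, lr=1e-2):
    xi = torch.zeros(6, requires_grad=True)           # [rotation axis-angle | translation]
    opt = torch.optim.Adam([xi], lr=lr)
    weights = 1.0 / (depth_sigma ** 2 + 1e-6)         # down-weight uncertain pixels
    for _ in range(iters):
        R = torch.matrix_exp(skew(xi[:3]))            # rotation from the axis-angle part
        p_obj = (points_cam - xi[3:]) @ R             # camera frame -> object frame
        loss = (weights * sdf(p_obj) ** 2).mean()     # weighted SDF residual
        opt.zero_grad()
        loss.backward()
        opt.step()
    return xi.detach()
```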
-
🚀 Exciting Breakthrough in LiDAR-Based Joint Detection and Tracking! 🚀

We're thrilled to share our latest research from the University of Toronto that has been accepted to #ECCV2024: "JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention" by Brian Cheong, Jiachen (Jason) Zhou and Steven Lake Waslander.

In computer vision, approaches trained in an end-to-end manner have been shown to perform better than traditional pipeline-based methods. However, within LiDAR-based object tracking, tracking-by-detection continues to achieve state-of-the-art performance without learning both tasks jointly. In our work, we explore the potential reasons for this gap and propose techniques to leverage the advantages of joint detection and tracking.

🌟 Key Highlights:
- Innovative Approach: We propose JDT3D, a novel LiDAR-based joint detection and tracking model that leverages transformer-based decoders to propagate object queries over time, implicitly performing object tracking without an association step at inference.
- Enhanced Techniques: We introduce track sampling augmentation and confidence-based query propagation to bridge the performance gap between tracking-by-detection (TBD) and joint detection and tracking (JDT) methods.
- Real-World Impact: Our model is trained and evaluated on the nuScenes dataset, showcasing significant improvements in tracking accuracy and robustness.

Check out the full paper linked below and join us at #ECCV2024!
Paper: https://lnkd.in/gbT4EStA
Code: https://lnkd.in/gU385pcW

#autonomousdriving #computervision #tracking #objecttracking #robotics #3DVision #transformers #deeplearning
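For intuition, below is a toy sketch of confidence-based query propagation as described above; it is a simplified illustration rather than the JDT3D code, and the tensor shapes and threshold are illustrative assumptions.

```python
import torch

# Toy sketch of confidence-based query propagation for tracking-by-attention:
# only queries whose detection confidence clears a threshold are carried into the
# next frame as track queries; the rest are replaced by fresh detection queries.
# Not the JDT3D implementation; shapes and threshold are assumptions.

def propagate_queries(queries, confidences, new_det_queries, keep_thresh=0.4):
    """queries: [Q, D] decoder outputs; confidences: [Q]; new_det_queries: [Q, D]."""
    keep = confidences > keep_thresh
    track_queries = queries[keep]                          # persist confident tracks
    num_new = queries.shape[0] - track_queries.shape[0]
    next_queries = torch.cat([track_queries, new_det_queries[:num_new]], dim=0)
    return next_queries, keep
```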
-
🎉 Big News! 🎉 Huge congratulations to Dr. Jun Yang (Yang J.) for successfully defending his doctoral thesis at the University of Toronto Institute for Aerospace Studies! During his time at the Toronto Robotics and Artificial Intelligence Laboratory (TRAIL), Jun made some amazing contributions to the field of object pose estimation for robotics. He authored groundbreaking papers on active perception for estimating 6D object poses, which were published in top-tier international venues such as #ICRA and #IROS. Jun, your incredible achievements and passion for innovation have left a lasting impact on TRAIL. We're super proud of you and can't wait to see what you accomplish next at Epson Canada! #graduation #robotics #computervision #poseestimation
-
Want to sit back and drive home by simply talking to the car? Wondering how to leverage Large Language Models for safe and smart autonomous driving? Check out our #CVPR2024 paper "LMDrive: Closed-loop End-to-end Driving with Large Language Models" by Hao Shao, Yuxuan Hu, Letian Wang, Steven Lake Waslander, Yu Liu, Hongsheng Li. This is the first work bringing LLMs into closed-loop end-to-end autonomous driving (with code released!).

Abstract: Despite significant recent progress in the field of autonomous driving, modern methods still struggle and can incur serious accidents when encountering long-tail unforeseen events and challenging urban scenarios. On the one hand, large language models (LLMs) have shown impressive reasoning capabilities that approach "Artificial General Intelligence". On the other hand, previous autonomous driving methods tend to rely on limited-format inputs (e.g., sensor data and navigation waypoints), restricting the vehicle's ability to understand language information and interact with humans. To this end, this paper introduces LMDrive, a novel language-guided, end-to-end, closed-loop autonomous driving framework. LMDrive uniquely processes and integrates multi-modal sensor data with natural language instructions, enabling interaction with humans and navigation software in realistic instructional settings. To facilitate further research in language-based closed-loop autonomous driving, we also publicly release the corresponding dataset, which includes approximately 64K instruction-following data clips, and the LangAuto benchmark, which tests the system's ability to handle complex instructions and challenging driving scenarios. Extensive closed-loop experiments are conducted to demonstrate LMDrive's effectiveness. To the best of our knowledge, this is the very first work to leverage LLMs for closed-loop end-to-end autonomous driving.

Paper: https://lnkd.in/gDgfYcaa
Project Website: https://lnkd.in/gWq2SUiH
Code: https://lnkd.in/gmYqmJWr

#CVPR2024 #autonomousdriving #autonomousvehicles #selfdrivingcars #reinforcementlearning #deeplearning #largelanguagemodel #LLM #foundationmodel #llava
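For a sense of what "language-guided" and "closed-loop" mean in practice, here is a purely illustrative pseudo-interface for an instruction-conditioned driving loop; every name in it (`encoder`, `driving_llm`, the simulator API) is a hypothetical stand-in and not LMDrive's actual API.

```python
# Purely illustrative sketch of a language-conditioned closed-loop driving agent.
# All callables and the simulator interface are hypothetical placeholders.

def drive(simulator, encoder, driving_llm, instruction, max_steps=1000):
    obs = simulator.reset()
    for _ in range(max_steps):
        sensor_tokens = encoder(obs["cameras"], obs["lidar"])   # multi-modal features
        waypoints, done_flag = driving_llm(sensor_tokens, instruction)
        control = simulator.waypoints_to_control(waypoints)     # e.g., a waypoint follower
        obs = simulator.step(control)                           # closes the loop
        if done_flag:                                           # model signals completion
            break
    return simulator.metrics()
```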
-
Remember connect-the-dots, where the more you look, the more you score? The same principle applies to motion prediction in autonomous driving! Check out our #CVPR2024 paper "SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction" by Yang Zhou, Hao Shao, Letian Wang, Steven Lake Waslander, Hongsheng Li, Yu Liu. With this work, we outperformed all published ensemble-free works on the Argoverse 2 leaderboard (single-agent track) at the time of submission.

Our key insight is that motion prediction models confront a variety of driving scenarios, each with different difficulties, so the refinement potential is not uniform across scenarios. In this work, we introduce SmartRefine, a novel approach to refining motion predictions with minimal additional computation by leveraging scenario-specific properties and adaptive refinement iterations.

Abstract: Predicting the future motion of surrounding agents is essential for autonomous vehicles (AVs) to operate safely in dynamic, human-robot-mixed environments. Context information, such as road maps and surrounding agents' states, provides crucial geometric and semantic information for motion behavior prediction. To this end, recent works explore two-stage prediction frameworks where coarse trajectories are first proposed and then used to select critical context information for trajectory refinement. However, they either incur a large amount of computation or bring limited improvement, if not both. In this paper, we introduce a novel scenario-adaptive refinement strategy, named SmartRefine, to refine predictions with minimal additional computation. Specifically, SmartRefine can comprehensively adapt refinement configurations based on each scenario's properties, and smartly chooses the number of refinement iterations by introducing a quality score to measure the prediction quality and remaining refinement potential of each scenario. SmartRefine is designed as a generic and flexible approach that can be seamlessly integrated into most state-of-the-art motion prediction models. Experiments on Argoverse (1 & 2) show that our method consistently improves the prediction accuracy of multiple state-of-the-art prediction models. Specifically, by adding SmartRefine to QCNet, we outperform all published ensemble-free works on the Argoverse 2 leaderboard (single-agent track) at submission. Comprehensive studies are also conducted to ablate design choices and explore the mechanism behind multi-iteration refinement.

Paper: https://lnkd.in/g4SPxRDE
Code: https://lnkd.in/g3YysfSH

#CVPR2024 #autonomousdriving #autonomousvehicles #selfdrivingcars #reinforcementlearning #deeplearning #motionprediction
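To make the adaptive-iteration idea concrete, here is a minimal sketch of a refinement loop gated by a quality score; it reflects our reading of the high-level idea, not SmartRefine's released code, and `refine_step` and `quality_score` are hypothetical callables.

```python
# Sketch of scenario-adaptive refinement: keep refining a coarse trajectory only
# while a quality score indicates remaining refinement potential.
# `refine_step` and `quality_score` are hypothetical placeholders.

def adaptive_refine(trajectory, context, refine_step, quality_score,
                    max_iters=5, stop_thresh=0.9):
    scores = []
    for _ in range(max_iters):
        score = quality_score(trajectory, context)     # high score = already good
        scores.append(float(score))
        if score > stop_thresh:                        # little potential left: stop early
            break
        trajectory = refine_step(trajectory, context)  # re-select context near the trajectory
    return trajectory, scores
```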
-
We are proud to announce that our paper "Multiple View Geometry Transformers for 3D Human Pose Estimation" by Ziwei Liao*, Jialiang Zhu* (朱嘉梁), Chunyu Wang (王春雨), Han Hu, and Steven Lake Waslander, from the University of Toronto and Microsoft Research Asia, has been accepted to #CVPR2024! 🎉

In this paper, we aim to improve the 3D reasoning ability of Transformers for multi-view 3D human pose estimation. Recent works have focused on end-to-end, learning-based Transformer designs, which struggle to resolve geometric information accurately, particularly during occlusion. We propose a novel hybrid model, MVGFormer, which combines a series of geometric and appearance modules organized in an iterative manner. Our method outperforms the state of the art in both in-domain and out-of-domain settings.

We will be attending #CVPR2024 in person. See you in Seattle, USA!

More details are available here:
Paper: https://lnkd.in/gnZBXGZE
Code: https://lnkd.in/g4aStA5s (available soon)

#CVPR24 #3DHumanPose #Multiview #3DVision #Transformers #Deeplearning
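For intuition, here is a high-level sketch of how iterative geometric and appearance modules can alternate; it is our simplified illustration of a hybrid design of this kind, not the released MVGFormer code, and all module callables are hypothetical.

```python
# High-level sketch of alternating appearance and geometry modules for multi-view
# 3D human pose estimation. All callables are hypothetical placeholders.

def iterative_pose_estimation(queries, joints_3d, image_feats, cameras,
                              appearance_module, geometry_module, num_layers=4):
    for _ in range(num_layers):
        # Appearance: project current 3D joints into each view and attend to local
        # image features to refine the queries and per-view 2D joint estimates.
        queries, joints_2d = appearance_module(queries, joints_3d, image_feats, cameras)
        # Geometry: lift the refined per-view 2D joints back to 3D
        # (e.g., by differentiable triangulation across the calibrated views).
        joints_3d = geometry_module(joints_2d, cameras)
    return joints_3d, queries
```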
-
Last but not least, our third #ICRA2024 paper (3/3)!

Modern robots navigating dynamic environments demand precise real-time detection and tracking of nearby objects. For 3D multi-object tracking, recent approaches process a single measurement frame recursively with greedy association and are prone to errors in ambiguous association decisions.

In our paper, "SWTrack: Multiple Hypothesis Sliding Window 3D Multi-Object Tracking" by Sandro Papais, Robert (Junguang) Ren, and Steven Lake Waslander, we introduce the Sliding Window Tracker (SWTrack), which yields more accurate association and state estimation by batch processing many frames of sensor data while remaining capable of running online in real time.

More details are available here:
Paper: https://lnkd.in/gWNe9azv
Video: https://lnkd.in/gJxcuuVt

This concludes our series of #ICRA2024 papers. See you in Yokohama, Japan!

#ICRA24 #slidingwindows #tracking #objectdetection #autonomousdriving
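As a rough illustration of why a sliding window helps, here is a heavily simplified sketch that re-associates tracks against a buffer of recent frames instead of committing to greedy single-frame matches. It is not the SWTrack algorithm (which evaluates multiple association hypotheses over the window); the gating and distance cost are toy choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Simplified sliding-window sketch: keep the last W frames of detections in a
# buffer and, at every step, re-solve the track-to-detection assignment for each
# frame in the window, so earlier ambiguous matches can be revised once newer
# evidence arrives. Toy illustration only, not the SWTrack algorithm.

class SlidingWindowTracker:
    def __init__(self, window=5, gate=4.0):
        self.window, self.gate = window, gate
        self.buffer = []                                    # list of [N_k, 3] detection arrays

    def update(self, detections, tracks):
        """detections: [N, 3] array; tracks: dict track_id -> last position [3]."""
        self.buffer = (self.buffer + [detections])[-self.window:]
        ids = list(tracks.keys())
        for dets in self.buffer:                            # re-associate the whole window
            pos = np.stack([tracks[i] for i in ids])
            cost = np.linalg.norm(pos[:, None] - dets[None, :], axis=-1)
            rows, cols = linear_sum_assignment(cost)
            for r, c in zip(rows, cols):
                if cost[r, c] < self.gate:                  # gated match updates the track
                    tracks[ids[r]] = dets[c]
        return tracks
```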