🚀 Exciting Breakthrough in LiDAR-Based Joint Detection and Tracking! 🚀 We're thrilled to share our latest research from the University of Toronto that has been accepted to #ECCV2024: "JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention" by Brian Cheong, Jiachen (Jason) Zhou and Steven Lake Waslander. In computer vision, approaches trained in an end-to-end manner have been shown to perform better than traditional pipeline-based methods. However, within LiDAR-based object tracking, tracking-by-detection continues to achieve state-of-the-art performance without learning both tasks jointly. In our work, we explore the potential reasons for this gap and propose techniques to leverage the advantages of joint detection and tracking. 🌟 Key Highlights: - Innovative Approach: We propose JDT3D, a novel LiDAR-based joint detection and tracking model that leverages transformer-based decoders to propagate object queries over time, implicitly performing object tracking without an association step at inference. - Enhanced Techniques: We introduce track sampling augmentation and confidence-based query propagation to bridge the performance gap between tracking-by-detection (TBD) and joint detection and tracking (JDT) methods. - Real-World Impact: Our model is trained and evaluated on the nuScenes dataset, showcasing significant improvements in tracking accuracy and robustness. Check out the full paper linked below and join us at #ECCV2024! Paper: https://lnkd.in/gbT4EStA Code: https://lnkd.in/gU385pcW #autonomousdriving #computervision #tracking #objecttracking #robotics #3DVision #transformers #deeplearning
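For readers curious how query propagation looks in practice, here is a minimal, hypothetical sketch of the confidence-based propagation idea: track queries whose confidence clears a threshold are carried into the next frame and topped up with fresh detection queries for newly appearing objects. Function names, dimensions, and the threshold below are illustrative only and are not taken from the JDT3D implementation.

```python
# Hypothetical sketch of confidence-based query propagation between frames.
# Names, shapes, and the 0.4 threshold are illustrative, not taken from JDT3D.
import torch

def propagate_queries(track_queries, scores, new_queries, keep_thresh=0.4):
    """Carry confident track queries into the next frame and top up with
    fresh detection queries so newly appearing objects can still be found."""
    keep = scores > keep_thresh                     # (N,) boolean mask
    kept = track_queries[keep]                      # surviving track queries
    return torch.cat([kept, new_queries], dim=0)    # query set for frame t+1

# toy usage: 5 track queries with scores, plus 3 fresh detection queries
track_q = torch.randn(5, 256)
conf = torch.tensor([0.9, 0.2, 0.7, 0.1, 0.55])
fresh_q = torch.randn(3, 256)
next_frame_queries = propagate_queries(track_q, conf, fresh_q)
print(next_frame_queries.shape)  # torch.Size([6, 256]) -> 3 kept + 3 fresh
```

Because surviving queries keep their identity from frame to frame, the explicit association step used by tracking-by-detection pipelines is not needed at inference.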
Toronto Robotics and Artificial Intelligence Laboratory’s Post
More Relevant Posts
-
Autonomous Vehicle @ Oxa | Former Team Principal @ aUToronto Self-Driving | MASc @ UofT | Vector AI Scholarship Recipient
🎉 Excited to share that the work we have been pursuing for the past two years has been accepted to #ECCV2024, the European Conference on Computer Vision. Our work #JDT3D (joint 3D object detection and tracking) aims to address the gaps in current #Transformer-based end-to-end detection and tracking architectures for #autonomousdriving. When I started my master's, I strongly believed that object detection shouldn't be done on single frames alone, but should leverage past frames as much as possible while performing object tracking jointly and implicitly. In this work, we took inspiration from the #Transformer (the now-famous architecture from natural language processing), adapted it to 3D #LiDAR-based computer vision, and proposed several key optimization and training techniques (track sampling augmentation, confidence-based query propagation) to bridge the performance gap with traditional two-step data association approaches. This work wouldn't have been possible without the diligence and perseverance of Brian Cheong, who is still working on further improvements in this area, and the unwavering support and supervision of Professor Steven Lake Waslander. As a commitment to the research community, we are preparing to open-source our work, so stay tuned by following us at Toronto Robotics and Artificial Intelligence Laboratory. Check out the full paper, linked below, for details. Paper arXiv link: https://lnkd.in/eVEBWU7S #autonomousdriving #computervision #lidar #objectdetection #multiobjecttracking #robotics #artificialintelligence
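As a rough illustration of what track sampling augmentation could look like, here is a hypothetical sketch. It assumes the technique generalizes the familiar ground-truth copy-paste augmentation from single LiDAR frames to whole multi-frame tracks, so pasted objects stay consistent across a training clip; all names and data layouts are invented for illustration and do not reflect the actual JDT3D code.

```python
# Hypothetical sketch of track sampling augmentation, assuming it extends the
# usual ground-truth "copy-paste" augmentation from single frames to whole
# tracks so that pasted objects stay consistent across a training clip.
import random

def paste_track_into_clip(clip, track_bank, num_tracks=2, rng=random):
    """clip: list of frames, each a dict with 'points' (list) and 'boxes' (list).
    track_bank: list of tracks, each a list of per-frame dicts with the same keys.
    Pastes a few sampled tracks into the clip, frame by frame."""
    sampled = rng.sample(track_bank, k=min(num_tracks, len(track_bank)))
    for track in sampled:
        for frame, obj in zip(clip, track):        # align track frames with clip frames
            frame["points"].extend(obj["points"])  # add the object's LiDAR points
            frame["boxes"].append(obj["box"])      # and its box, keeping one ID per track
    return clip

# toy usage: a 2-frame clip and a bank holding one 2-frame track
clip = [{"points": [(0.0, 0.0, 0.0)], "boxes": []} for _ in range(2)]
bank = [[{"points": [(5.0, 1.0, 0.0)], "box": (5.0, 1.0, 0.0, 4.0, 2.0, 1.5)},
         {"points": [(5.5, 1.0, 0.0)], "box": (5.5, 1.0, 0.0, 4.0, 2.0, 1.5)}]]
clip = paste_track_into_clip(clip, bank, num_tracks=1)
print(len(clip[0]["boxes"]), len(clip[1]["boxes"]))  # 1 1
```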
-
Data scientist with 5+ YOE seeking new opportunities | Technology Management | Applied Research and Development | AI | Computer vision
In computer vision, large, complex models tend to be more accurate. Unfortunately, they also take up more space and run more slowly than smaller, less accurate models. This makes large models costly to deploy in production, while smaller models may not meet performance targets despite their lower cost. One way to boost the accuracy of smaller models is knowledge distillation, where a small "student" model learns from the outputs of a larger "teacher" model in addition to learning directly from the training data. Knowledge distillation has been applied to a wide variety of models and fields, and has recently been applied to the widely used object detection model YOLOX. A recently released paper by Aubard et al. (https://lnkd.in/gR2d6Hzc) demonstrates how a large, pre-trained YOLOX can be distilled into a small YOLOX that detects custom objects in a custom sonar image set with high accuracy. Advancements in small but accurate models are an important step toward fully unlocking the potential of computer vision; combined with quantization, edge-AI hardware, federated learning, and deep reinforcement learning, they will bring us closer to making autonomous robotic vehicles a reality. #computervision #ai #drones Paper and Repository Links: https://lnkd.in/gYc3Xb5B https://lnkd.in/gwVxkr2t
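For context, the distillation objective is usually a weighted mix of a softened teacher-matching term and the ordinary supervised loss. The sketch below shows the classic classification-style formulation in PyTorch; the detector distillation in Aubard et al. involves additional detection-specific terms, so treat this only as a minimal illustration.

```python
# Minimal knowledge-distillation loss in PyTorch (generic recipe, not the exact
# setup from Aubard et al.): the student matches softened teacher outputs while
# still being supervised by the ground-truth labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                     # rescale gradient magnitude
    hard = F.cross_entropy(student_logits, labels)  # usual supervised loss
    return alpha * soft + (1.0 - alpha) * hard

# toy usage: batch of 4 samples, 10 classes
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
print(float(loss))
```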
-
#ABiD: An Industrial Dataset for Human-Robot Interaction Scenarios with Risk Indices 🔍 Data is the backbone of AI innovation. Yet, publicly available datasets are still scarce, particularly in the domain of human-robot interaction (HRI). Proximity Robotics & Automation GmbH, in partnership with the Federal Institute for Occupational Safety and Health (Bundesanstalt für Arbeitsschutz und Arbeitsmedizin (BAuA), Silvia Vock) and the University of Stuttgart (Andrey Morozov), is making strides towards safe and seamless HRI. We're recording several safety-aware HRI scenarios and labeling them with our in-house developed AI-based methods. The combination of state-of-the-art RGB-D cameras and Seyond Falcon LiDAR allows us to capture and label not only the "standard" RGB-D stream but also image-grade point clouds. Thanks to support from BAuA, we are thrilled to offer this dataset to researchers worldwide, empowering the future of robotic workspace safety in HRI! 🌐🤖🧑🔬 #HumanRobotInteraction #AI #Safety #Dataset #Pointclouds #ROS2
-
👨💻 AI Engineer | 🤖 ML Engineer | 💻 Computer Engineer | 🖼️ Computer Vision | 🧠 NLP | 🔍 Deep Learning | 🎓 Master’s Degree
🚜 𝐍𝐚𝐯𝐢𝐠𝐚𝐭𝐢𝐧𝐠 𝐂𝐨𝐦𝐩𝐥𝐞𝐱 𝐎𝐟𝐟-𝐑𝐨𝐚𝐝 𝐓𝐞𝐫𝐫𝐚𝐢𝐧 𝐰𝐢𝐭𝐡 𝐌𝐮𝐥𝐭𝐢-𝐈𝐧𝐩𝐮𝐭 𝐒𝐞𝐦𝐚𝐧𝐭𝐢𝐜 𝐒𝐞𝐠𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 🛤️ In off-road environments, accurate semantic segmentation is crucial for understanding the terrain and making real-time decisions. A popular solution? 𝐌𝐮𝐥𝐭𝐢-𝐢𝐧𝐩𝐮𝐭 𝐬𝐲𝐬𝐭𝐞𝐦𝐬, where diverse data sources like LiDAR, cameras, and radar are integrated to enhance segmentation precision. But how should we combine this information? 🔗 𝐄𝐚𝐫𝐥𝐲 𝐅𝐮𝐬𝐢𝐨𝐧: This method merges sensor data at the input level, feeding the network a pre-fused dataset. It's 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐭 𝐚𝐧𝐝 𝐪𝐮𝐢𝐜𝐤 but can struggle to fully leverage the unique strengths of each sensor. ⚡ 𝐋𝐚𝐭𝐞 𝐅𝐮𝐬𝐢𝐨𝐧: Here, sensor data streams are processed separately, only merging in the final layers of the network. While it 𝐫𝐞𝐭𝐚𝐢𝐧𝐬 𝐭𝐡𝐞 𝐬𝐩𝐞𝐜𝐢𝐟𝐢𝐜 𝐚𝐝𝐯𝐚𝐧𝐭𝐚𝐠𝐞𝐬 𝐨𝐟 𝐞𝐚𝐜𝐡 𝐬𝐞𝐧𝐬𝐨𝐫, it requires more computational power. 🧠 𝐒𝐞𝐧𝐬𝐨𝐫 𝐒𝐞𝐥𝐞𝐜𝐭𝐢𝐨𝐧 & 𝐈𝐧𝐭𝐞𝐫𝐬𝐞𝐜𝐭𝐢𝐨𝐧: The effectiveness of either approach often depends on 𝐬𝐞𝐧𝐬𝐨𝐫 𝐬𝐞𝐥𝐞𝐜𝐭𝐢𝐨𝐧 - choosing the right combination of inputs - and the 𝐢𝐧𝐭𝐞𝐫𝐬𝐞𝐜𝐭𝐢𝐨𝐧 𝐨𝐟 𝐬𝐞𝐧𝐬𝐨𝐫 𝐜𝐨𝐯𝐞𝐫𝐚𝐠𝐞. By selecting complementary sensors and balancing their overlapping areas, segmentation accuracy can be greatly improved, regardless of the fusion method. 🚧 Both fusion strategies bring unique benefits to off-road scenarios. Early fusion offers 𝐪𝐮𝐢𝐜𝐤𝐞𝐫 𝐩𝐫𝐨𝐜𝐞𝐬𝐬𝐢𝐧𝐠, ideal for real-time decisions. Late fusion, while generally more 𝐜𝐨𝐦𝐩𝐮𝐭𝐚𝐭𝐢𝐨𝐧𝐚𝐥𝐥𝐲 𝐢𝐧𝐭𝐞𝐧𝐬𝐢𝐯𝐞, better preserves each sensor's distinct strengths. Which method would you choose to conquer rugged terrain? 🌲 Share your thoughts in the comments! #ComputerVision #SemanticSegmentation #OffRoad #AutonomousVehicles #AI #DeepLearning #MachineLearning #EarlyFusion #LateFusion #SensorFusion #AutonomousTech #Robotics
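A minimal PyTorch sketch of the two strategies for intuition (channel counts, layers, and the range-image LiDAR representation are placeholders chosen only for illustration):

```python
# Schematic early- vs late-fusion segmentation heads in PyTorch. Channel counts
# and layer choices are placeholders for illustration only.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())

class EarlyFusionSeg(nn.Module):
    """Concatenate camera and LiDAR (projected to the image plane) at the input."""
    def __init__(self, cam_ch=3, lidar_ch=1, num_classes=5):
        super().__init__()
        self.body = conv_block(cam_ch + lidar_ch, 32)
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, cam, lidar):
        return self.head(self.body(torch.cat([cam, lidar], dim=1)))

class LateFusionSeg(nn.Module):
    """Separate encoders per sensor; merge features just before the classifier."""
    def __init__(self, cam_ch=3, lidar_ch=1, num_classes=5):
        super().__init__()
        self.cam_enc = conv_block(cam_ch, 32)
        self.lidar_enc = conv_block(lidar_ch, 32)
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, cam, lidar):
        feats = torch.cat([self.cam_enc(cam), self.lidar_enc(lidar)], dim=1)
        return self.head(feats)

cam = torch.randn(1, 3, 64, 64)
lidar = torch.randn(1, 1, 64, 64)   # e.g. a range image aligned with the camera
print(EarlyFusionSeg()(cam, lidar).shape, LateFusionSeg()(cam, lidar).shape)
```

The late-fusion variant roughly doubles the encoder cost, which is the computational trade-off described above.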
-
Two research papers from Zenseact will be published at the 27th International Conference on Information Fusion! 🎉 One paper is also part of our WASP industrial post-doc project funded by WASP – Wallenberg AI, Autonomous Systems and Software Program. 1️⃣ Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map By Yuxuan Xia, Erik Stenborg, Junsheng Fu, Gustaf Hendeby We propose an HD-map generation solution that utilizes a prior standard-definition (SD) map, GNSS measurements, visual odometry, and lane marking edge detection points to simultaneously estimate the vehicle's 6D pose, its position within the SD map, and the 3D geometry of traffic lines. This is achieved using a Bayesian simultaneous localization and multi-object tracking filter, where the estimation of traffic lines is formulated as a multiple extended object tracking problem, solved using a trajectory Poisson multi-Bernoulli mixture (TPMBM) filter. Paper link: https://lnkd.in/gEvw93Z3 2️⃣ Towards Accurate Ego-lane Identification with Early Time Series Classification By Yuchuan Jin, Theodor Stenhammar, David Bejmer, Axel Beauvisage, Yuxuan Xia, Junsheng Fu This paper utilizes an Early Time Series Classification (ETSC) method to achieve precise and rapid ego-lane identification in real-world driving data. The method begins by assessing the similarity between the map and the lane markings perceived by the vehicle's camera using measurement model quality metrics. Paper link: https://lnkd.in/gXjtc-8e Check out the full papers or join our presentations at the conference: https://fusion2024.org/ #Research #AutonomousDriving #MachineLearning #InformationFusion2024
-
Assistant Professor of Computer Science at USC | Building Trustworthy, Scalable, and Generative AI Systems | Anomaly & OOD Detection (Creator of PyOD), Graph Learning, AI4Science, Open-source ML Tools
Multimodal OOD is such an important field with a promising future — please star, follow, and fork our repository https://lnkd.in/gzDRxvns to check out the datasets and code. One interesting use case: leveraging multiple sensors on a car to detect emerging events. #datascience #machinelearning #ai
Check out our recent work "MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities". Current research in OOD detection has predominantly focused on unimodal settings, often involving images as inputs. While several recent works have investigated vision-language models to enhance OOD performance, their evaluations are still limited to benchmarks containing solely images. Consequently, existing methods fall short in fully leveraging the complementary information from various modalities, such as LiDAR and camera in autonomous driving, as well as video, audio, and optical flow in action recognition. In this work, we introduce the first-of-its-kind benchmark for Multimodal OOD Detection, named MultiOOD, characterized by diverse dataset sizes and varying modality combinations. We highlight the significance of integrating more modalities for OOD detection on MultiOOD and introduce an Agree-to-Disagree (A2D) training algorithm, inspired by the observation of the Modality Prediction Discrepancy phenomenon. We also introduce a new outlier synthesis algorithm NP-Mix that explores broader feature spaces and complements A2D to strengthen the multimodal OOD detection performance. This is joint work with Yue Zhao, Eleni Chatzi, and Olga Fink. paper: https://lnkd.in/dBvhJEY7 code: https://lnkd.in/d7MHnxYQ #datascience #machinelearning #ai #OOD #multimodal
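For readers new to the area, a common baseline is to score fused multimodal logits with max-softmax or an energy score and threshold that score on held-out data. The sketch below shows only that generic baseline; it is not the A2D training algorithm or the NP-Mix outlier synthesis proposed in the paper.

```python
# A generic multimodal OOD scoring baseline for context (this is NOT the A2D or
# NP-Mix method from the paper): average per-modality logits, then use
# max-softmax probability and an energy-style score as "in-distribution" scores.
import torch
import torch.nn.functional as F

def ood_scores(logits_per_modality):
    """logits_per_modality: list of (batch, num_classes) tensors, e.g. video + audio."""
    fused = torch.stack(logits_per_modality, dim=0).mean(dim=0)
    msp = F.softmax(fused, dim=-1).max(dim=-1).values   # max softmax probability
    energy = torch.logsumexp(fused, dim=-1)             # higher = more in-distribution
    return msp, energy   # lower values of either score suggest out-of-distribution inputs

video_logits = torch.randn(4, 8)
audio_logits = torch.randn(4, 8)
msp, energy = ood_scores([video_logits, audio_logits])
print(msp.shape, energy.shape)  # torch.Size([4]) torch.Size([4])
```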
-
Lead @ Cadence Design Systems | PhD @ Robert Bosch Centre for Cyber-Physical Systems (QIF 2023) I Applied Machine Learning
We are thrilled to announce that our paper, co-authored by V Shivaraman and Sumit Dangi, has been accepted for publication at the prestigious #IEEE International Conference on Systems, Man, and Cybernetics 2024. Our work addresses the inherent challenges of Autonomous Vehicle (AV) decision-making in urban environments, which are dynamic due to interactions with surrounding vehicles. Contemporary methods often resort to large transformer architectures for encoding these interactions, primarily for trajectory prediction. However, this approach results in increased computational complexity. To tackle this issue without compromising on spatiotemporal understanding and performance, we propose a novel framework - Deep Attention Driven Reinforcement Learning (#DADRL). 1. This framework dynamically assigns and incorporates the significance of surrounding vehicles into the ego vehicle’s RL-driven decision-making process. 2. We introduce an AV-centric Spatiotemporal Attention Encoding (STAE) mechanism that learns the dynamic interactions with different surrounding vehicles 3. The spatiotemporal representations, when combined with contextual encoding, provide a comprehensive state representation. 4. The resulting model is trained using the Soft-Actor Critic (SAC) algorithm. 5. Our proposed method outperforms recent state-of-the-art methods, demonstrating its effectiveness and efficiency. A big thanks to Prof. Suresh Sundaram and Prof. PB Sujit for their guidance. We are also thankful to our colleagues at the Artificial Intelligence and Robotics Laboratory for their support! We look forward to showcasing our work and connecting with fellow researchers at this prestigious conference! #AI #AttentionMechanism #ReinforcementLearning #DADRL #IEEESMC2024 #AutonomousDriving
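To make the idea of attention-weighting surrounding vehicles concrete, here is a generic cross-attention sketch in PyTorch in which the ego-vehicle feature queries the neighbor features and the attention weights act as per-vehicle importance. It is an illustrative stand-in, not the paper's STAE module, and all dimensions are made up.

```python
# Hedged sketch of attention over surrounding vehicles for an ego-centric state
# (a generic cross-attention layer, not the exact STAE module from the paper).
import torch
import torch.nn as nn

class EgoAttention(nn.Module):
    def __init__(self, feat_dim=32, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, ego_feat, neighbor_feats):
        """ego_feat: (B, 1, D) ego-vehicle feature used as the query.
        neighbor_feats: (B, N, D) features of surrounding vehicles (keys/values)."""
        ctx, weights = self.attn(ego_feat, neighbor_feats, neighbor_feats)
        return ctx.squeeze(1), weights  # attended context + per-vehicle importance

ego = torch.randn(2, 1, 32)
neighbors = torch.randn(2, 6, 32)        # 6 surrounding vehicles
ctx, w = EgoAttention()(ego, neighbors)
print(ctx.shape, w.shape)                # (2, 32) and (2, 1, 6)
```

The attended context would then be concatenated with other state information and fed to the RL policy, e.g. one trained with Soft Actor-Critic as in the post.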
-
🔍📦 Object Detection with Computer Vision 📦🔍 Object detection is a pivotal aspect of computer vision that goes beyond simply recognizing objects in an image. It involves identifying and locating objects within an image or video frame, enabling a wide array of applications in fields such as autonomous driving, security, healthcare, and more.
Key Techniques in Object Detection:
- YOLO (You Only Look Once): A state-of-the-art, real-time object detection system that frames detection as a single regression problem, mapping image pixels directly to bounding box coordinates and class probabilities. Advantages: speed (all bounding boxes are predicted in a single forward pass, suiting real-time applications) and high accuracy with a relatively simple architecture. Applications: traffic surveillance, pedestrian detection, and live video analysis.
- SSD (Single Shot MultiBox Detector): Another real-time detector that, like YOLO, predicts bounding boxes and categories in one forward pass, using a feed-forward convolutional network to produce a fixed-size collection of bounding boxes and per-class scores. Advantages: computational efficiency (suitable for mobile and embedded systems) and flexibility (it handles multiple object sizes by detecting from feature maps at different scales). Applications: mobile apps, UAV (unmanned aerial vehicle) vision, and robotics.
How Object Detection Works:
1. Image Input: The process starts with an input image or video frame.
2. Feature Extraction: Convolutional layers extract features from the input.
3. Region Proposal: Regions that may contain objects are proposed (in two-stage detectors; single-shot models like YOLO and SSD skip this step and predict densely over the whole image).
4. Classification: Each candidate region or location is assigned an object category.
5. Bounding Box Regression: The exact location of each object is refined by regressing bounding box coordinates.
Challenges in Object Detection:
- Variability in Object Appearance: Objects can vary significantly in size, shape, and appearance.
- Occlusion: Objects may be partially hidden by other objects.
- Complex Backgrounds: Cluttered backgrounds introduce noise that complicates detection.
Despite these challenges, techniques like YOLO and SSD have significantly advanced the field, enabling numerous practical applications that were once considered impossible. #ObjectDetection #AI #Innovation
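As a quick hands-on example, the sketch below runs a COCO-pretrained SSD from torchvision (assuming torchvision ≥ 0.13 and an internet connection to download the weights on first use); YOLO-family models live in separate repositories but follow the same load → predict → threshold flow.

```python
# Quick inference sketch with a pretrained SSD from torchvision (assumes
# torchvision >= 0.13 for the weights="DEFAULT" argument). Downloads
# COCO-pretrained weights on first use.
import torch
import torchvision

model = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT")
model.eval()

image = torch.rand(3, 300, 300)           # stand-in for a real RGB image in [0, 1]
with torch.no_grad():
    output = model([image])[0]            # one dict per input image

keep = output["scores"] > 0.5             # confidence threshold
print(output["boxes"][keep])              # (x1, y1, x2, y2) boxes
print(output["labels"][keep])             # COCO class indices
```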
-
Game developer | OOPs | C# Programming concepts | Unity developer | ASO Executive | Apps growth/Rank | Cloud Computing | Play Store & App Store | Kali Linux
How to make a simple field of vision (FOV) system for a robot guard unit, with custom debug gizmos for easy visualisation and tweaking, and with related alertness levels. Creating a simple FOV system for a robot guard unit combines programming with (possibly) hardware integration. Here's a basic outline of how you could approach it: 1) Define the FOV Parameters: Determine the angle and range of the robot's field of vision; this can be represented as a cone extending from the robot's "eye" or sensor. 2) Implement the FOV Algorithm: Write code that determines what the robot can "see" within its FOV, for example by raycasting or by checking whether targets fall within a certain angle of the robot's orientation. 3) Custom Debug Gizmos: To visualize the FOV and tweak parameters easily, create custom debug gizmos - lines or shapes representing the FOV cone drawn in the scene view of your development environment. 4) Related Alertness Levels: Adjust the robot's behavior based on its alertness level; for example, when the robot detects an object or intruder within its FOV, its alertness increases, triggering actions like sounding an alarm or following a predefined patrol route more closely. 5) Integration with Sensors: Depending on the robot's capabilities, integrate sensors such as cameras or lidar to detect objects within the FOV accurately. #Robotics #FieldOfVision #RobotGuard #OpenCV #MachineLearning #AI #Automation #RobotSecurity #TechInnovation #ArtificialIntelligence #ComputerVision #SecurityTech #InnovationInTech
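Here is a small, engine-agnostic Python sketch of the core FOV test (in a Unity project the same check is usually built from angle and raycast utilities): a target counts as seen if it is within range, inside the vision cone, and not occluded, and a toy alertness value ramps up or decays accordingly. All thresholds are arbitrary placeholders.

```python
# Minimal 2D field-of-vision check with a toy alertness update. Thresholds and
# the line-of-sight callback are placeholders; a real robot or game engine would
# substitute its own raycast/occlusion query.
import math

def can_see(guard_pos, guard_facing_deg, target_pos, fov_deg=90.0, view_range=10.0,
            line_of_sight_blocked=lambda a, b: False):
    dx, dy = target_pos[0] - guard_pos[0], target_pos[1] - guard_pos[1]
    dist = math.hypot(dx, dy)
    if dist > view_range:
        return False                                  # outside detection range
    angle_to_target = math.degrees(math.atan2(dy, dx))
    delta = (angle_to_target - guard_facing_deg + 180.0) % 360.0 - 180.0
    if abs(delta) > fov_deg / 2.0:
        return False                                  # outside the vision cone
    return not line_of_sight_blocked(guard_pos, target_pos)  # raycast stand-in

def alertness(seen, current=0.0):
    """Toy alertness update: ramp up while the target is visible, decay otherwise."""
    return min(1.0, current + 0.2) if seen else max(0.0, current - 0.05)

print(can_see((0, 0), 0.0, (5, 1)))    # True: in range, roughly straight ahead
print(can_see((0, 0), 0.0, (-5, 0)))   # False: directly behind the guard
```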
-
🤖 The paper "RadarLCD: Learnable Radar-based Loop Closure Detection Pipeline," presented at the IJCNN 2024 conference, offers a new deep learning-based pipeline for radar-based Loop Closure Detection (LCD). Authored by Mirko Usuelli, Matteo Frosi, Paolo Cudrano, Simone Mentasti, and Matteo Matteucci from the Dipartimento di Elettronica, Informazione e Bioingegneria at Politecnico di Milano's AIRLab, this work introduces a novel approach to integrating radar data in robotics. 🔍 Challenge: The research addresses the difficulty of using FMCW radar data in LCD, particularly in the context of Simultaneous Localization and Mapping (SLAM). Radar sensors, though capable of operating in adverse conditions, suffer from noise and distortion, making their integration into LCD systems challenging. 💡 Solution: We developed RadarLCD, a supervised deep learning method that builds on pre-trained models for radar odometry. RadarLCD improves loop detection by extracting and selecting key points in radar scans, which are crucial for detecting loops and estimating the robot's movement. 📊 Results: RadarLCD demonstrated superior performance compared to state-of-the-art techniques like Scan Context. Extensive evaluations on multiple datasets showed RadarLCD's robustness in challenging environments, making it a scalable and reliable solution for autonomous systems. 📜 Paper: https://lnkd.in/dkxGzFQr #Robotics #LoopClosureDetection #DeepLearning #IJCNN2024 #Polimi #AIRLab
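For background on the loop-closure step itself, a common baseline (not RadarLCD's learned pipeline) retrieves candidates by comparing a global descriptor of the current scan against descriptors of past scans, skipping the most recent ones:

```python
# Generic loop-closure candidate retrieval for context (not RadarLCD's learned
# pipeline): compare a global descriptor of the current radar scan against past
# scans by cosine similarity and flag strong matches as loop candidates.
import numpy as np

def find_loop_candidates(query_desc, past_descs, sim_thresh=0.9, exclude_recent=50):
    """query_desc: (D,) descriptor of the current scan.
    past_descs: (N, D) descriptors of earlier scans, oldest first."""
    if len(past_descs) <= exclude_recent:
        return []                                   # too little history to close a loop
    db = np.asarray(past_descs[:-exclude_recent], dtype=np.float64)
    q = query_desc / (np.linalg.norm(query_desc) + 1e-12)
    db = db / (np.linalg.norm(db, axis=1, keepdims=True) + 1e-12)
    sims = db @ q
    return [(i, float(s)) for i, s in enumerate(sims) if s >= sim_thresh]

rng = np.random.default_rng(0)
history = rng.normal(size=(200, 64))
query = history[10] + 0.01 * rng.normal(size=64)    # revisiting an old place
print(find_loop_candidates(query, history)[:3])     # index 10 should score ~1.0
```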