Autonomous Vehicle @ Oxa | Former Team Principal @ aUToronto Self-Driving | MASc @ UofT | Vector AI Scholarship Recipient
🎉 Excited to share that the work we have been pursuing for the past two years has been accepted to #ECCV2024, the European Conference on Computer Vision. Our work, #JDT3D (joint 3D object detection and tracking), aims to address the gaps in current #Transformer-based end-to-end detection and tracking architectures for #autonomousdriving.

When I started my master's, I strongly believed that object detection shouldn't be done on single frames alone, but should leverage past frames as much as possible while performing object tracking jointly and implicitly. In this work, we took inspiration from the #Transformer (the now-famous architecture from natural language processing), adapted it to 3D #LiDAR-based computer vision, and proposed several key optimization and training techniques (track sampling augmentation, confidence-based query propagation) to bridge the performance gap with traditional two-step data association approaches.

This work wouldn't have been possible without the diligence and perseverance of Brian Cheong, who is still working on further improvements in this area, and the unwavering support and supervision of Professor Steven Lake Waslander. As a commitment to the research community, we are preparing to open-source our work, so stay tuned by following Toronto Robotics and Artificial Intelligence Laboratory.

Check out the full paper linked below.

Paper arXiv link: https://lnkd.in/eVEBWU7S

#autonomousdriving #computervision #lidar #objectdetection #multiobjecttracking #robotics #artificialintelligence
🚀 Exciting Breakthrough in LiDAR-Based Joint Detection and Tracking! 🚀

We're thrilled to share our latest research from the University of Toronto, accepted to #ECCV2024: "JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention" by Brian Cheong, Jiachen (Jason) Zhou, and Steven Lake Waslander.

In computer vision, approaches trained in an end-to-end manner have been shown to perform better than traditional pipeline-based methods. However, within LiDAR-based object tracking, tracking-by-detection continues to achieve state-of-the-art performance without learning both tasks jointly. In our work, we explore the potential reasons for this gap and propose techniques to leverage the advantages of joint detection and tracking.

🌟 Key Highlights:
- Innovative Approach: We propose JDT3D, a novel LiDAR-based joint detection and tracking model that leverages transformer-based decoders to propagate object queries over time, implicitly performing object tracking without an association step at inference.
- Enhanced Techniques: We introduce track sampling augmentation and confidence-based query propagation to bridge the performance gap between tracking-by-detection (TBD) and joint detection and tracking (JDT) methods.
- Real-World Impact: Our model is trained and evaluated on the nuScenes dataset, showcasing significant improvements in tracking accuracy and robustness.

Check out the full paper linked below and join us at #ECCV2024!

Paper: https://lnkd.in/gbT4EStA
Code: https://lnkd.in/gU385pcW

#autonomousdriving #computervision #tracking #objecttracking #robotics #3DVision #transformers #deeplearning
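For readers curious what "confidence-based query propagation" means in practice, here is a minimal, illustrative sketch (not the paper's actual implementation; the function name, threshold value, and embedding size are assumptions for the example). The idea is that object queries whose detection confidence clears a threshold are carried forward as track queries for the next frame, so identity is maintained implicitly instead of through a separate data-association step:

```python
import numpy as np

def propagate_queries(queries, scores, keep_thresh=0.4):
    """Confidence-based query propagation (illustrative sketch only).

    queries: (num_queries, embed_dim) array of object-query embeddings.
    scores:  (num_queries,) detection confidences from the current frame.

    Queries above the threshold survive as "track queries" for the next
    frame; the rest are discarded so fresh detection queries can replace
    them. No explicit association step is needed: a surviving query keeps
    representing the same object across frames.
    """
    keep = scores > keep_thresh
    return queries[keep], scores[keep]

# Toy example: 4 object queries, 256-dim embeddings (sizes are hypothetical)
queries = np.random.randn(4, 256)
scores = np.array([0.9, 0.2, 0.6, 0.1])
track_queries, track_scores = propagate_queries(queries, scores)
print(track_queries.shape)  # (2, 256): only the two confident queries survive
```

In a full transformer decoder, these surviving queries would be concatenated with newly initialized detection queries before decoding the next LiDAR frame.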