1. Introduction
Autonomous driving (AD) is based on the principle of controlling and perceiving vehicles artificially, without human intervention [1]. Autonomous driving systems use a variety of sensors to perceive their surroundings [2], including cameras [3], radar [4], and LiDAR [5]. These sensors provide the system with information about the location of other vehicles, pedestrians, and objects in the environment, which the system then uses to decide how to control the vehicle. Autonomous driving systems are still under development, yet they promise to revolutionize transportation: AD could make transportation safer [6], more efficient, and more accessible.
However, AD still faces challenges and open problems such as perception under severe weather conditions.
Numerous studies have investigated perception under foggy conditions. However, these studies have generally treated fog as a binary classification problem (foggy vs. non-foggy) and extrapolated conclusions to various levels of fog density. This approach overlooks the need for improved perception tailored to each specific fog density category.
Our research proposes a novel approach to object detection in foggy conditions, employing a data-driven strategy and machine learning techniques. We categorize fog density into five distinct levels: 0%, 25%, 50%, 75%, and 100% (see Appendix A). Leveraging the CARLA simulator (Car Learning to Act) [7,8], we generate a comprehensive dataset covering this range of fog densities [9]. Subsequently, we implement a bounding box-based machine learning algorithm to detect objects under varying fog conditions. The purpose of this work is to enhance object recall (alongside precision) across the different fog categories, and we achieved high recall and precision at every fog density level, from clear weather to the highest fog density (100%).
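To illustrate how such fog levels can be produced in simulation, the following minimal sketch sets one of the five fog density categories through the CARLA Python API before data collection. It assumes a locally running CARLA server; the parameter values other than fog_density (e.g., fog_distance, sun_altitude_angle) are illustrative assumptions, not the exact settings used for our dataset.

```python
import carla

# The five fog density categories used in this work (percent).
FOG_LEVELS = [0.0, 25.0, 50.0, 75.0, 100.0]

def set_fog_level(world: carla.World, fog_density: float) -> None:
    """Apply a given fog density to the CARLA world before data collection."""
    weather = world.get_weather()
    weather.fog_density = fog_density   # 0 (clear) .. 100 (densest fog)
    weather.fog_distance = 0.75         # illustrative fog start distance (m)
    weather.sun_altitude_angle = 45.0   # daytime lighting; illustrative value
    world.set_weather(weather)

if __name__ == "__main__":
    client = carla.Client("localhost", 2000)
    client.set_timeout(10.0)
    world = client.get_world()
    # Example: switch to the heaviest fog category (100%).
    set_fog_level(world, FOG_LEVELS[-1])
```

In our released project, these parameters are instead driven by the weather.yaml configuration file described in Section 2.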
Given the close relationship between this research and safety-critical applications such as autonomous driving, examining the potential impact on navigation and vehicle safety is essential. Emphasizing how this model could be integrated into existing vehicle systems or improve object recognition accuracy in foggy conditions could significantly enhance the research’s relevance and practical application in real-world autonomous systems.
This paper is organized as follows. Section 2 gives an overview of the influence of the weather on the performance of autonomous vehicles. The subsequent sections describe the methodology (Section 3), discuss the results (Section 4), and conclude the discussion as well as point out directions for future research (Section 5).
2. Related Work
Weather phenomena can have various negative influences on the performance of autonomous vehicles (AVs), especially in their perception and sensing systems. Adverse conditions like heavy rain, snow, fog, and low lighting can significantly impair the sensors that AVs rely on, such as cameras, radar, LiDAR, and ultrasonic sensors. These systems are crucial for detecting obstacles, lane markings, pedestrians, and other vehicles. The diminished performance in such conditions poses a serious challenge to AV safety and reliability [10].
Diaz-Ruiz et al. (2022) [11] developed datasets specifically tailored for severe weather conditions, including cloudy, rainy, snowy, night, and sunny scenarios. These datasets were generated using multiple sensors, and the data for each weather condition were trained separately. This approach significantly enhanced perception and increased accuracy. The authors demonstrated that models trained for specific weather conditions yield more accurate object detection when applied in those same conditions. For example, the model trained on data from sunny conditions achieved a mean average precision (mAP@0.5:0.95) of 54.3 when tested under sunny conditions but only 38.9 when tested in rainy weather. Conversely, the model trained on rainy weather data yielded an mAP@0.5:0.95 of 46.3 in rainy conditions, improving accuracy from 38.9 to 46.3. However, this approach did not include foggy conditions. In our work, we focused specifically on foggy conditions, dividing them into four distinct classes in addition to sunny conditions. We utilized the CARLA simulation environment to generate the datasets and employed our filtering techniques within the CARLA simulator to accurately label the data [12], and we achieved an mAP@0.5:0.95 of 0.739 in heavy fog.
Furthermore, Valanarasu et al. (2022) [13] proposed a transformer-based model to restore images degraded by adverse weather conditions. The authors argue that transformers can be adapted to image restoration by treating images as sequences of pixels. The proposed model, called TransWeather, consists of an encoder and a decoder [14,15]. The encoder takes an image degraded by adverse weather conditions as input and produces a latent representation of the image. The decoder then takes the latent representation as input and produces a restored image. The encoder is a multilayer convolutional transformer (MCT) model, consisting of a stack of convolutional layers and encoder–decoder attention layers. The convolutional layers extract features from the image, while the attention layers allow the model to learn long-range dependencies between pixels. The decoder is a convolutional transformer decoder (CTD) model, consisting of a stack of decoder–encoder attention layers and upsampling layers. The attention layers allow the model to attend to the latent representation of the image, while the upsampling layers reconstruct the restored image. The authors evaluated TransWeather on a dataset of images degraded by rain, snow, haze, and fog. The results showed that TransWeather outperforms several state-of-the-art image restoration methods. In previous work, fog was categorized as a single class (fog or no fog), which posed challenges, particularly when dealing with light fog. In contrast, our approach did not utilize TransWeather to transform foggy images into non-foggy ones. Instead, we focused on enhancing perception directly within foggy conditions. We developed separate models tailored to different fog densities, ranging from light to heavy fog, to improve accuracy and robustness across varying fog intensities.
Bijelic et al. (2020) [16] introduced an innovative approach by integrating four sensors—an RGB camera, LiDAR, a gated camera, and radar—into a unified perception system. The outputs of these sensors were projected into the camera’s coordinate space and then processed through a convolutional neural network with four input channels to enhance perception accuracy. The authors evaluated their method using a benchmark dataset focused on object detection in adverse weather conditions. Their approach was compared against several state-of-the-art single-sensor and fusion methods. The results demonstrated that their method outperformed existing approaches, achieving an average precision of 76.69 in heavy fog. Our approach achieved an average precision of 89.00; however, the two results cannot be compared directly due to the differing data types (simulation vs. real data), although the weather conditions are the same.
Li et al. (2023) [17] propose a domain adaptation framework that leverages both labeled data from the source domain (clear weather) and unlabeled data from the target domain (foggy weather). The key components of their approach include feature alignment, which involves mechanisms to align the feature distributions between the clear and foggy weather domains, helping the model to learn domain-invariant features that are robust to weather changes. They also employ domain adversarial training, using a domain discriminator to distinguish between the source and target domains; the object detector is trained adversarially to perform well in both domains by confusing the discriminator, leading to features that generalize across different weather conditions. Additionally, the paper proposes multi-level adaptation, where adaptation occurs at multiple levels of the detection pipeline, including both the image and feature levels, to enhance the model’s robustness to foggy conditions. They also incorporate a self-training mechanism in which the model iteratively generates pseudo-labels for the foggy images and refines its predictions, allowing it to learn from the target domain data without requiring explicit labels. The reported mean average precision (mAP) is 42.3 for heavy fog overall, 36.5 for walkers, and 50 for detecting walkers under heavy fog at distances of up to 200 m.
The paper “A Review of the Impacts of Defogging on Deep Learning-Based Object Detectors in Self-Driving Cars” (Ogunrinde & Bernadin, 2021) [18] explores the effects of image defogging techniques on the performance of deep learning-based object detection systems used in autonomous vehicles. The authors analyze the effectiveness of these techniques in improving detection accuracy, highlighting that while defogging generally enhances image quality, its impact on detection performance varies depending on the method used. Some defogging approaches may introduce artifacts or alter important features in the images, potentially leading to reduced detection accuracy or false positives. The paper emphasizes the need for the careful selection and tuning of defogging methods to balance the trade-off between improved visibility and accurate object detection. Additionally, the authors discuss the potential of integrating defogging directly into the object detection pipeline, allowing models to learn defogging and detection tasks simultaneously. Using their methodology, they improved recall under heavy fog conditions from 59.61 to 62.02 and precision from 60.98 to 62.74. In comparison, our approach resulted in a more significant increase, with recall improving from 43.4 to 63.6 and precision from 86.8 to 93.1. The differences in recall between our results and theirs can be attributed to variations in the datasets used and the algorithms implemented; we employed YOLOv8 [19], while they used YOLOv3 [20].
The overviewed papers have generally treated fog as a binary class (fog or no fog). In contrast, our research introduces a more nuanced approach by developing four distinct categories for fog density (besides clear weather) with a separate model implemented for each category. Our findings demonstrate that by categorizing fog into multiple levels, we can significantly enhance perception accuracy compared to the binary classification approach. This methodology can be adopted by the studies mentioned above to improve their perception accuracy and achieve more precise results.
In this work, our objective is to improve perception under heavy fog conditions. Our novelty lies in classifying fog levels into four distinct categories of fog density in addition to clear weather (0% or clear weather, 25%, 50%, 75%, and 100%). We then train a model on the data of each category, using deep learning techniques tailored for object detection. The method’s foundation lies in first categorizing the input based on the fog density and then operating a model specifically trained for that particular fog density range (refer to Figure 1). For the dataset, we employed the CARLA simulator, which allows us to precisely control fog density and gather data with automated labeling for object detection in foggy conditions. We have made the data collection project available on our GitHub (https://github.com/Mofeed-Chaar/Improving-bouning-box-in-Carla-simulator, accessed on 2 October 2024) [18]. Additionally, we implemented flexible weather control by modifying parameters within the YAML file [21] named weather.yaml in our GitHub project. The objects we focused on in our work comprise six distinct categories: cars, buses, trucks, vans, pedestrians, and traffic lights. Furthermore, we meticulously generated distinct datasets for each fog density level and trained individual object detection models for each class of fog density. This approach yielded consistently high results across various metrics, including precision, recall, and mAP@50. In particular, we achieved an accuracy of more than 90% under a heavy fog condition (100% fog density), as we will see later in this paper.
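The following sketch illustrates the model-selection step of Figure 1: an estimated fog density is mapped to the nearest of the five trained categories, and the detector trained for that category is used for inference. It is a minimal, hypothetical example using the Ultralytics YOLOv8 Python API; the weight file names and the upstream fog-density estimate are placeholders, not part of our released code.

```python
from ultralytics import YOLO  # YOLOv8 Python API

# Hypothetical paths to the five per-category detectors (one per fog density level).
MODEL_PATHS = {
    0: "weights/fog_000.pt",
    25: "weights/fog_025.pt",
    50: "weights/fog_050.pt",
    75: "weights/fog_075.pt",
    100: "weights/fog_100.pt",
}
MODELS = {level: YOLO(path) for level, path in MODEL_PATHS.items()}

def select_model(estimated_fog_density: float) -> YOLO:
    """Pick the detector trained on the fog category closest to the estimate."""
    nearest_level = min(MODELS, key=lambda level: abs(level - estimated_fog_density))
    return MODELS[nearest_level]

def detect(image_path: str, estimated_fog_density: float):
    """Run the density-matched detector on a single image."""
    model = select_model(estimated_fog_density)
    return model(image_path)  # list of Results objects with bounding boxes

# Example: an upstream estimator reports roughly 80% fog, so the 75% model is used.
results = detect("frames/example_frame.png", estimated_fog_density=80.0)
```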
4. Results and Discussion
The datasets we generated were divided into five categories, each corresponding to a specific weather condition with varying fog density. For each category of fog density, we labeled objects into four ranges: objects within 50 m, within 100 m, within 150 m, and within 200 m. We then trained different models using YOLOv5s and YOLOv8m for the various distance ranges using specific hyperparameters (refer to Table 2). For the latency of the YOLOv8 model, see Table 3 [42].
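As an illustration of this per-category training setup, the sketch below trains one YOLOv8m model per fog density level using the Ultralytics Python API. The dataset YAML file names and the hyperparameter values shown are placeholders standing in for the settings listed in Table 2, not the exact values we used.

```python
from ultralytics import YOLO

# One dataset configuration per fog density category (hypothetical file names).
FOG_DATASETS = {
    0: "data/fog_000.yaml",
    25: "data/fog_025.yaml",
    50: "data/fog_050.yaml",
    75: "data/fog_075.yaml",
    100: "data/fog_100.yaml",
}

for level, data_yaml in FOG_DATASETS.items():
    model = YOLO("yolov8m.pt")       # start from pretrained YOLOv8m weights
    model.train(
        data=data_yaml,              # images/labels for this fog category only
        imgsz=640,                   # 640x640 input; 1280 improves recall (see Table 7)
        epochs=100,                  # illustrative value; see Table 2
        batch=16,                    # illustrative value; see Table 2
        name=f"fog_{level:03d}",     # separate run directory per category
    )
```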
Our training results suggest that training our models on datasets with varying fog densities can preserve their performance and even enhance their accuracy in heavy fog conditions. The corresponding training results, based on YOLOv5s, are shown in Table 4. We use the YOLO loss function (refer to Equation (1)).
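For orientation, the YOLO loss referenced above is commonly written as a weighted sum of a bounding box regression term, an objectness term, and a classification term; the general form below is a sketch of this structure, which Equation (1) specifies in detail, with $\lambda_{box}$, $\lambda_{obj}$, and $\lambda_{cls}$ denoting the respective weighting hyperparameters:

$$\mathcal{L} = \lambda_{box}\,\mathcal{L}_{box} + \lambda_{obj}\,\mathcal{L}_{obj} + \lambda_{cls}\,\mathcal{L}_{cls}$$

Here, $\mathcal{L}_{box}$ penalizes bounding box regression error (e.g., a CIoU loss), $\mathcal{L}_{obj}$ penalizes objectness errors, and $\mathcal{L}_{cls}$ penalizes classification errors; the individual terms and weights differ slightly between YOLOv5 and YOLOv8.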
We split this dataset of objects labeled within 50 m into 80% for training and 20% for validation, with an image size of 640 × 640 pixels.
These results represent the performance of our models across six object classes. It is important to note that the accuracy is not uniform across all classes, with some classes performing better than others. This is due to a number of factors, including the shape, size, and texture of the objects, as well as the presence of other objects in the scene (refer to Table 5).
This procedure effectively preserved the precision (refer to Equation (2)) of object detection in heavy fog conditions, while the recall (refer to Equation (3)) was inversely proportional to the fog density. This trend was consistent even when the training data were expanded to include objects at longer distances, such as 100 m or more. We trained the YOLOv8m model using the same hyperparameters as the YOLOv5s model for all object detection distances (50 m, 100 m, 150 m, 200 m) (see Table 6). This allowed us to directly compare the performance of the two models under the same conditions. We can conclude that precision remains largely unaffected when data beyond 50 m are used, but recall exhibits a decreasing trend. This can be attributed to the consistent detection of close objects, whereas the model’s ability to identify objects at greater distances was diminished, impacting recall. We can deduce that object detection is highly accurate for close objects but becomes less accurate as distance increases. This is because the fog obscures the objects, making it harder for the model to distinguish between the objects and the background.
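For reference, precision and recall are defined here in terms of true positives (TP), false positives (FP), and false negatives (FN), the standard forms that Equations (2) and (3) in this paper follow:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

A falling recall with increasing fog density thus means that more distant objects go undetected (higher FN), while the stable precision means that the detections that are made remain largely correct (low FP).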
Table 6 shows that the model can detect objects with high precision (see Figure 3).
At greater distances, the model may miss some objects, but this is acceptable given the increased difficulty of detecting objects in fog. In the case of heavy fog, driving behavior and speed are significantly affected: aside from the speed limit imposed in heavy fog conditions, drivers adapt their driving style accordingly. The priority in heavy fog is close objects, with perception gradually extended with distance, because visibility is considerably reduced, making it challenging to identify objects farther away. Our object detection model can accurately identify objects in foggy conditions even when visibility is reduced. We achieved this by training the model on a large dataset of images taken at various fog densities; as a result, the model detects objects with high precision under heavy fog conditions.
In the previous experiments, we trained our object detection model using images with a resolution of 640 × 640 pixels. However, we noticed that using a higher resolution (1280 × 1280 pixels) resulted in improved recall. The results of this experiment are summarized in Table 7. These results are essential for our work, in which we implemented a dedicated model for each fog category.
Moreover, as seen in Table 7, larger objects (e.g., buses) exhibit higher accuracy than smaller objects (e.g., walkers), particularly in terms of recall. Note that large objects suffer less accuracy degradation with increasing distance than smaller objects, and the recall degradation for small objects at large distances is more pronounced than for larger objects. Using higher resolutions, such as 1280 × 1280 pixels, can alleviate this issue; however, there is a trade-off between resolution and latency. To address this, we can employ an appropriate model for each fog condition. Additionally, accuracy is more crucial than latency in heavy fog conditions because vehicle speeds are lower than in clear weather. On the other hand, we found that traffic lights are detected with high accuracy despite being small objects (see Table 5, Table 7, and Figure 3). This is likely due to the distinct features surrounding traffic lights, such as the traffic light poles, their positioning on the roadside, and the colored states of the traffic signals. Generally, the performance of our object detection model is highly accurate when the fog density of the input matches the fog density level used to train the model. However, when the model is validated at fog density levels that differ from those used for training, the accuracy decreases (refer to Table 8). As evident from Table 8, using a model trained for the same fog density significantly enhances precision and recall: the highest accuracy values appear on the diagonal of the table, corresponding to the validation of each model on its own fog density category.
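The cross-density comparison behind Table 8 can be reproduced with a simple nested loop: each per-category model is validated against every fog-density validation set, and matched pairs fall on the diagonal. The sketch below is a hypothetical illustration using the Ultralytics validation API; the weight and dataset file names are placeholders.

```python
from ultralytics import YOLO

FOG_LEVELS = [0, 25, 50, 75, 100]

# Hypothetical weight and dataset file names, one per fog density category.
models = {lvl: YOLO(f"weights/fog_{lvl:03d}.pt") for lvl in FOG_LEVELS}
val_sets = {lvl: f"data/fog_{lvl:03d}.yaml" for lvl in FOG_LEVELS}

# Validate every model against every fog-density validation set (Table 8 layout).
for train_lvl, model in models.items():
    for val_lvl, data_yaml in val_sets.items():
        metrics = model.val(data=data_yaml, imgsz=640, split="val")
        print(
            f"trained on {train_lvl:3d}% fog, validated on {val_lvl:3d}% fog: "
            f"precision={metrics.box.mp:.3f}, recall={metrics.box.mr:.3f}"
        )
```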
In general, it should also be noted that for autonomous driving vehicles, it is of crucial importance to correctly detect the state (red, yellow, green) of a traffic light. This will be the subject of further study.
5. Conclusions
Our primary objective in this study was to enhance the perception of traffic participants and traffic lights under dense fog conditions by developing models that are tailored to specific fog density levels. This approach allows our system to prioritize the relevant features of objects in fog, leading to improved detection accuracy. Furthermore, this approach enhances the flexibility of autonomous driving (AD) in severe weather conditions by enabling the use of specialized algorithms tailored to specific fog density categories. Additionally, it enables the detection of objects that are not visible to the human eye using only RGB images. This capability becomes even more efficient when combined with other sensors such as LiDAR and radar. As we observed, the core of the algorithm focuses on creating a separate model for each fog category (clear, low fog, moderate fog, etc.), which improves recall and precision compared to a model trained for general weather conditions (see Table 8).
For future research, we intend to extend our methodology to real-world data, aiming to improve object detection under actual environmental conditions. A primary challenge in utilizing real data will involve creating specialized datasets that categorize each level of fog density in addition to performing object detection.
This study demonstrates that classifying fog density enhances perceptual accuracy by increasing recall and precision. As illustrated in Table 7, classifying fog and training each model on its own fog density yields improved precision. These findings underscore the critical role of fog classification, particularly given the absence of existing datasets that categorize fog levels and provide labeled bounding boxes, which remains a notable challenge.