Rearview Camera-Based Blind-Spot Detection and Lane Change Assistance System for Autonomous Vehicles

1 IVIS Corp., Suwon-si 16229, Republic of Korea
2 Department of Electronic Engineering, College of Convergence Technology, Korea National University of Transportation, Chungju-si 27469, Republic of Korea
* Author to whom correspondence should be addressed.
Submission received: 1 November 2024 / Revised: 9 December 2024 / Accepted: 2 January 2025 / Published: 4 January 2025

Abstract
This paper presents a rearview camera-based blind-spot detection and lane change assistance system for autonomous vehicles, built on a convolutional neural network and lane detection. We propose a method for providing real-time warnings to autonomous vehicles and drivers regarding collision risks during lane-changing maneuvers. Lane detection is used both to delineate the area for blind-spot detection and time-to-collision measurement and to compensate for the vertical vibrations caused by vehicle movement. The lane detection method applies edge detection to the input image and extracts lane markings by pairing positive and negative edges. Lanes are then obtained by third-order polynomial fitting of the extracted lane markings, and each lane marking is tracked using the detection results from the previous frame. Using the vanishing point where the two lanes converge, the camera calibration is updated to compensate for the vertical vibrations caused by vehicle movement. For object detection, the proposed method uses YOLOv9 and leverages the lane information to define a region of interest (ROI) for detecting small objects. Object detection achieved a precision of 90.2% and a recall of 82.8%. The detected object information is then used to calculate the collision risk. Collision risk is assessed for various objects with a three-level warning scheme that adapts to the relative speed of obstacles. The proposed method runs at 11.64 fps with an execution time of 85.87 ms per frame and provides real-time warnings to both drivers and autonomous vehicles regarding potential collisions with detected objects.

1. Introduction

As the automobile industry grows and the number of vehicles on the road increases, research on autonomous vehicles is being actively pursued. Driver assistance systems (DASs) are a major focus of this research because they play a vital role in the functionality of autonomous vehicles. Accident prevention systems, which are crucial for ensuring safety, have been studied in various fields and include lane-keeping assistance, forward collision warning, autonomous emergency braking, blind-spot warning, and automatic parking [1,2]. In particular, blind-spot collision warning and lane change assistance systems detect objects to the sides and rear of a vehicle and either alert the driver to potential collision risks from the side or rear or relay this information to the autonomous vehicle. These systems are particularly important for ensuring safe lane changes [3,4].
The blind-spot collision warning and lane change assistance system consists of two main components: one for detecting rear objects and the other for assessing the risk of collision with those objects. The rear-object detection component utilizes various sensors integrated into a vehicle, broadly categorized into radar- and camera-based methods [5,6,7,8]. The radar method involves mounting radar sensors on both sides of a vehicle’s rear bumper to detect objects. In contrast, camera-based methods involve installing a camera under a side mirror and employing a rearview camera. The camera-based approach offers the advantage of distinguishing between various object types, enabling the delivery of customized warnings based on the specific object type. Furthermore, cross-entropy methods are occasionally employed to evaluate the collision risk associated with detected objects [9]. The collision risk assessment component leverages data from cameras or radar sensors to differentiate between approaching and receding objects and provides warnings regarding objects in the approaching direction when they enter a designated area or when the time to collision is within a specified time range.
In this study, a rearview camera was used to detect rear objects. Existing radar- or side-mirror-camera-based systems require two sensors, whereas a single rearview camera covers both sides, which also favors real-time implementation: when two cameras are used, two images must be processed simultaneously, but with a single camera only one image is processed, allowing faster processing speeds. In addition, because a rearview camera is typically installed by default, no extra sensors are required, and the same camera can still be used effectively in parking situations. Figure 1 shows a block diagram of the proposed method.
As shown in Figure 1, lane detection was performed using a rearview camera when an input image was received. This approach accurately determines the locations of side and rear vehicles based on lane detection results. The detected lanes establish a warning area, with side and rear obstacles identified using a convolutional neural network (CNN). The detected obstacles are then tracked, and the system calculates the collision risk for these tracked objects, with collision risk assessment categorized into three levels to warn the driver. The results of the collision risk assessment assist with lane changes by preventing autonomous vehicles from changing lanes when necessary.
The remainder of this paper is organized as follows: Section 2 reviews related studies on blind-spot detection and lane change assistance systems, Section 3 presents the proposed method, Section 4 discusses the experimental results, and Section 5 concludes the paper.

2. Related Works

Sensor technologies for blind-spot detection and lane-change assistance systems can be classified into three categories: radar, cameras, and ultrasonic sensors [3,4,5,6,10].
Radar is currently the most widely used method for blind-spot detection and lane change assistance systems. Radar-based detection is implemented by mounting radar sensors on each end of the rear bumper to monitor both the side and rear of the vehicle, thus allowing objects to be detected in these areas [6,7]. Ultrasonic sensors are mounted on the rear bumper and oriented towards the sides to detect objects on the left and right sides of the vehicle [10].
Methods that use radar or ultrasonic sensors detect objects but have the disadvantage of being unable to differentiate between object types and are often affected by surrounding elements such as guardrails. In contrast, camera-based systems offer the advantage of distinguishing between different types of objects, such as cars, buses, or trucks.
There are two primary methods of using cameras to detect rear objects. The first involves mounting a camera under a side mirror to detect objects to the sides and rear. The second uses a rearview camera for object detection [4,7,8].
Various methods for detecting side and rear objects using side cameras are currently being investigated. Tseng et al. employed motion detection for object identification, whereas Lin et al. used appearance and edge features [11,12]. Guo et al. implemented a convolutional neural network for object detection, and Zhao et al. applied a neural network for similar purposes [13,14]. Significant research is also underway on rear-object detection using rear cameras. Tsuchiya et al. utilized a Census Transform from a top-view perspective for object detection [15]. Dooley et al. applied AdaBoost and Hough circle transforms for a similar purpose [16]. Jung et al. employed a support vector machine [17], whereas Ra et al. converted images into side-rectilinear formats for enhanced detection [4]. Lee et al. applied a side-rectilinear image conversion technique in conjunction with a generative adversarial network (GAN) to improve nighttime object detection [8].
Mounting cameras under the side mirrors requires two cameras because each camera monitors only one side. In contrast, a rear camera can detect objects on both sides with a single sensor. In addition, a rear camera is already used for low-speed parking assistance; at high speeds, the same camera can be used for object detection, which removes the need for additional camera or sensor installations and allows multiple systems to be configured with one camera. Although a side-mounted camera can detect vehicles directly alongside the ego vehicle, a rear camera cannot see immediately adjacent vehicles. Nevertheless, a system that combines a rear camera with object tracking can still warn of potential collisions with such vehicles.
Additionally, the rear camera method has the advantage of faster processing speeds compared to the side-mirror camera setup, as it only requires the processing of a single image instead of two. Consequently, the proposed method utilizes a rear camera to enable real-time system configuration.

3. Proposed Method

The proposed method configures a blind-spot collision warning system and lane-change assistance system using a rear camera. A CNN was employed for object detection, and a Kalman filter was used to track the obstacles. The system provides three-level warnings based on the relative speeds and distances of the tracked obstacles. In addition, lane detection is used to determine the locations of the detected vehicles. Camera calibration was updated based on vehicle vibrations using the vanishing point estimated from the lane detection results. This calibration update enhanced the accuracy of the relative speed and distance estimations.

3.1. Lane Detection

The proposed blind-spot detection method and lane change assistance system leverage lane detection results to define the warning area. In this approach, the positions of obstacles are assessed relative to the detected lanes, and warnings are issued for obstacles located in adjacent lanes. To enhance the accuracy of the relative speed and distance estimations, lane detection results were used to correct for vertical vibrations caused by vehicle movement. By identifying the vanishing point through lane detection, the vertical vibrations of the vehicle can be adjusted to improve the system’s precision further [18,19].
As illustrated in Figure 2, the proposed method performs edge detection once an input image is received. Initially, the color image is converted to grayscale for edge detection. The Sobel edge detection method was applied in the vertical direction because the lane markings were oriented vertically. Equation (1) represents Sobel edge detection [20,21].
$$E(u,v) = \sum_{v=1}^{h-1} \sum_{u=1}^{w-1} \Big| \big( I(u-1,v-1) + 2I(u-1,v) + I(u-1,v+1) \big) - \big( I(u+1,v-1) + 2I(u+1,v) + I(u+1,v+1) \big) \Big| \tag{1}$$
where E represents the edge detection result, u and v denote the image coordinates, I is the input image, h is the height of the input image, and w is the width of the input image.
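As a concrete illustration of this step, the following Python/OpenCV sketch converts a frame to grayscale and computes the signed horizontal derivative that responds to vertically oriented lane markings. It is a minimal example under assumed conventions, not the authors' implementation, and the sign of the response depends on the kernel orientation used.

```python
import cv2

def vertical_edge_response(frame_bgr):
    """Grayscale conversion followed by a 3x3 Sobel derivative along the image
    x-axis, which responds to vertically oriented lane markings as in Equation (1).
    Positive values mark dark-to-bright transitions and negative values mark
    bright-to-dark transitions (OpenCV's sign convention)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    return edges
```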
Figure 3a illustrates the edge detection results. As shown in Figure 3a, each lane-marking edge consists of a positive and a negative edge. Exploiting this characteristic, candidate lane markings are extracted by identifying edge pairs in which a positive edge is followed by a negative edge along the image coordinates [22]. Figure 3b shows the results of using edge pairs. For the left-lane marking, the negative edge was retained to extract the inner edge relative to the ego lane, whereas for the right-lane marking, the positive edge was retained.
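A minimal sketch of the pairing rule is shown below; it scans one row of the signed edge response for a positive edge followed, within a plausible marking width, by a negative edge. The magnitude threshold and maximum width are assumed values for illustration.

```python
import numpy as np

def find_edge_pairs(edge_row, min_mag=40.0, max_width=40):
    """Return (start_col, end_col) pairs in one image row where a positive edge
    is followed by a negative edge within max_width columns, i.e. the signature
    of a bright lane marking on a darker road surface."""
    pos_cols = np.where(edge_row > min_mag)[0]
    neg_cols = np.where(edge_row < -min_mag)[0]
    pairs = []
    for p in pos_cols:
        right = neg_cols[neg_cols > p]          # nearest negative edge to the right
        if right.size and right[0] - p <= max_width:
            pairs.append((int(p), int(right[0])))
    return pairs
```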
The lane-marking search area is set using the tracking results. Lane markings were tracked using a Kalman filter [23,24,25]. The search area is centered on the tracking result from the previous lane detection and has a fixed width. Equation (2) shows how the width of the search area is set.
$$W_d = (d_c + d_{Kp}) + C_d \tag{2}$$
where $d_c$ is the offset of the lane-marking model in the current frame, $d_{Kp}$ is the offset of the Kalman prediction result, and $C_d$ is a constant that defines the detection range.
Lane fitting was performed using third-order polynomial fitting. Equation (3) gives the polynomial models used for the left and right lanes [19,26].
$$f_L(x) = a_L x^3 + b_L x^2 + c_L x + d_L, \qquad f_R(x) = a_R x^3 + b_R x^2 + c_R x + d_R \tag{3}$$
where $a_L$ and $a_R$ represent the curvature rates, $b_L$ and $b_R$ the curvatures, $c_L$ and $c_R$ the heading angles, and $d_L$ and $d_R$ the offsets of the left and right lanes, respectively.
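For illustration, the fit in Equation (3) can be carried out with a standard least-squares polynomial fit. The sketch below is a hypothetical helper, with the candidate points assumed to have already been gated to the search width $W_d$ of Equation (2).

```python
import numpy as np

def fit_lane(points_xy):
    """Least-squares fit of Equation (3): f(x) = a*x^3 + b*x^2 + c*x + d.
    points_xy is an (N, 2) array of (x, y) coordinates belonging to one lane
    boundary; the coefficients are returned highest order first."""
    pts = np.asarray(points_xy, dtype=float)
    coeffs = np.polyfit(pts[:, 0], pts[:, 1], deg=3)   # [a, b, c, d]
    return coeffs

def eval_lane(coeffs, x):
    """Evaluate the fitted lane model at positions x."""
    return np.polyval(coeffs, x)
```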
Figure 4a shows the results of the lane fitting. Figure 4b shows the results of the lane-tracking range determination.

3.2. Refinement of Camera Calibration Using Lane Detection

Rearview cameras use fisheye lenses to capture a wide field of view. Owing to the significant distortion caused by fisheye lenses, a distortion correction method was first applied to calibrate the camera. Distortion in fisheye lenses is characterized by barrel distortion. Therefore, the proposed distortion correction method was implemented using Equation (4) [27,28].
$$x_d = x_u (1 + k_1 r^2 + k_2 r^4 + k_3 r^6), \qquad y_d = y_u (1 + k_1 r^2 + k_2 r^4 + k_3 r^6), \qquad r^2 = x_u^2 + y_u^2 \tag{4}$$
where $x_d$ and $y_d$ represent the x and y coordinates in the distorted image, $x_u$ and $y_u$ represent the corresponding coordinates in the undistorted image, and $k_1$, $k_2$, and $k_3$ are the distortion coefficients.
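Because Equation (4) maps undistorted coordinates to distorted ones, an undistorted image can be produced by evaluating the model for every output pixel and sampling the input with a remap, as in the sketch below. The principal point, focal length, and coefficient values are assumptions for illustration; for a 185° fisheye lens, OpenCV's dedicated fisheye model could be used instead.

```python
import cv2
import numpy as np

def undistort_radial(img, k1, k2, k3, cx, cy, f):
    """Undistort an image with the three-coefficient radial model of Equation (4).
    For every pixel of the undistorted output, Equation (4) gives its source
    location in the distorted input, which is then sampled with cv2.remap."""
    h, w = img.shape[:2]
    u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32))
    xu = (u - cx) / f                      # normalized undistorted coordinates
    yu = (v - cy) / f
    r2 = xu * xu + yu * yu
    scale = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    map_x = (xu * scale * f + cx).astype(np.float32)   # Equation (4), back to pixels
    map_y = (yu * scale * f + cy).astype(np.float32)
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```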
Figure 5 shows the distortion correction results for the fisheye lens obtained using Equation (4). Figure 5a presents the distorted image, whereas Figure 5b displays the result after distortion correction. The lane markings, which appear as curves in Figure 5a, are represented as straight lines in Figure 5b following the correction.
A perspective matrix is used to measure the distance from a camera. However, vehicle vibrations during driving, which are influenced by road surface conditions, complicate accurate distance measurements. These vibrations alter the perspective matrix, affecting the reliability of distance calculations.
Figure 6 illustrates the variation in the height of an object in the image as a result of vehicle vibrations; the height of an obstacle fluctuates between images captured eight frames apart. As illustrated in the right section of Figure 6, the vehicle exhibits vertical movement due to road irregularities, whereas the left section of Figure 6 demonstrates that vehicles at the same distance appear to have varying heights. The red line in Figure 6 indicates a height of 182 pixels, underscoring the discrepancies in the height of the vehicle.
As shown in Figure 6, the height differences led to errors in calculating the distance to the obstacles. This resulting error affects the accuracy of measuring the direction and relative distance for object tracking, ultimately hindering the accurate determination of the collision risk. To address these issues and reflect changes in the perspective matrix, the proposed method utilizes the lane detection results.
The vanishing point can be extracted using the lane detection results. The vanishing point refers to the location where two parallel lines in three-dimensional space converge into a two-dimensional image. Figure 7 illustrates the results of extracting the vanishing point formed by the intersection of the two lanes; the vanishing point is located on the horizon. Figure 7a shows the vanishing point result with the centerline image, and Figure 7b shows the result with the center-letter image.
The perspective matrix was adjusted using the extracted vanishing points. Equation (5) employs the results from the vanishing point detection operation to compute the effects of the vehicle’s vertical movement while driving. The proposed method imposes a limitation of a maximum movement of ±30 pixels to exclude exceptional situations, such as speed bumps, where the vanishing point moves significantly.
$$S_d = O_y - V_y \tag{5}$$
where $O_y$ is the vanishing-point coordinate measured when the vehicle is stationary, $V_y$ is the vanishing-point coordinate in the current frame, and $S_d$ is the resulting vertical shift in pixels.
Using Equation (5), the camera calibration can be updated to account for the vertical movement of the vehicle. Equation (6) provides the formulation for the camera calibration, where $x_i$ represents the image coordinate system, $K$ is the intrinsic matrix, $R$ is the rotation matrix, $T$ is the translation matrix, and $X_w$ denotes the world coordinate system [27].
$$x_i = K \, [R \,|\, T] \, X_w \tag{6}$$
The rotation matrix $R$ in Equation (6) is decomposed as shown in Equation (7), where $R_x$, $R_y$, and $R_z$ denote the rotation matrices about the x-, y-, and z-axes, respectively [27].
$$R = R_x R_y R_z \tag{7}$$
The y-axis movement of the vanishing point can be converted into a rotation angle about the x-axis, as shown in Equation (8). In Equation (8), $S_d$ is the y-axis movement of the vanishing point from Equation (5), $f_y$ denotes the focal length along the y-axis, and $\theta_o$ represents the existing x-axis rotation angle.
$$\theta_n = \arctan(S_d / f_y) + \theta_o \tag{8}$$
Applying $\theta_n$ from Equation (8) to the rotation matrix $R_x$ yields Equation (9). The camera calibration parameters in Equation (6) are then updated with the modified rotation matrix.
$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_n & -\sin\theta_n \\ 0 & \sin\theta_n & \cos\theta_n \end{bmatrix} \tag{9}$$
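The per-frame update of Equations (5), (8), and (9), including the ±30 pixel clamp mentioned above, can be sketched as follows; the variable names and units (radians, pixels) are assumptions rather than the authors' implementation.

```python
import numpy as np

def update_pitch_rotation(vp_y, vp_y_static, f_y, theta_o, max_shift=30.0):
    """Re-estimate the camera pitch from the vertical motion of the vanishing point.
    vp_y: vanishing-point row in the current frame, vp_y_static: row measured with
    the vehicle at rest (O_y), f_y: focal length in pixels, theta_o: calibrated
    pitch angle in radians. Returns the updated angle and rotation matrix R_x."""
    s_d = float(np.clip(vp_y_static - vp_y, -max_shift, max_shift))  # Equation (5), clamped
    theta_n = np.arctan(s_d / f_y) + theta_o                         # Equation (8)
    c, s = np.cos(theta_n), np.sin(theta_n)
    r_x = np.array([[1.0, 0.0, 0.0],                                 # Equation (9)
                    [0.0,   c,  -s],
                    [0.0,   s,   c]])
    return theta_n, r_x
```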

3.3. Object Detection Using Convolutional Neural Network

The proposed method utilizes a CNN, a commonly used deep learning technique, for object detection. A CNN consists of two-dimensional filters that are highly effective in processing two-dimensional data. Therefore, this method is appropriate for detecting objects in images [29].
In the proposed method, object detection is performed using YOLOv9, one of the fastest detectors available, which offers strong detection performance [30]. Because YOLOv9 is a single-stage detector rather than a box-scanning approach, it extracts objects from the entire image efficiently. Furthermore, transfer learning was employed with YOLOv9, using pre-trained weights to expedite training and improve model performance [31].
The training dataset consisted of labeled objects in images captured by a rear camera. The objects were categorized into groups, such as motorcycles, cars, buses, and trucks. Information regarding the YOLOv9 annotation class used in this paper is presented in Table 1.
Object detection with YOLOv9 shows a low detection probability when the height of the object is less than 20 pixels. To address this issue, the proposed method sets the ROI based on lane information to improve the detection performance of distant vehicles. Transfer learning with YOLOv9 achieves optimal performance when the height and width are in a 1:1 ratio, as the model is trained on the COCO dataset. Therefore, in the proposed method, transfer learning was performed by applying zero padding to adjust the height-to-width ratio to 1:1.
The proposed method zero-pads each frame, resizes it to 640 × 640 pixels, and improves the detection of distant objects by additionally constructing a single image from the area set as the ROI. The ROI is defined using the lane information and is sized to include the ego lane as well as both adjacent lanes. The ROI image is upscaled by a factor of two, so objects within it appear twice as large, which improves their detection. While the baseline could not detect objects smaller than 20 pixels, the proposed method successfully detects objects larger than 10 pixels.
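A minimal sketch of the square zero-padding step is given below; it pads the frame to a 1:1 aspect ratio, resizes it to the assumed 640 × 640 network input, and returns the scale and offsets needed to map detections back to the original image.

```python
import cv2
import numpy as np

def pad_to_square_and_resize(img, size=640):
    """Zero-pad a 3-channel frame to a 1:1 aspect ratio, then resize to size x size.
    Returns the network input together with the scale factor and the padding
    offsets required to project detections back into the original frame."""
    h, w = img.shape[:2]
    side = max(h, w)
    canvas = np.zeros((side, side, 3), dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = img
    net_input = cv2.resize(canvas, (size, size))
    return net_input, size / side, left, top
```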
As shown in Figure 8, an ROI containing both lanes is set using the vanishing point determined from the lane information. The set ROI is resized twice horizontally and vertically to create a single image. This process allows for the detection of objects that are twice as large within the ROI.
When using the proposed method, the difference in computation time is minimal because the size of the input image remains the same. Table 2 presents cases where calculation times are compared. It was observed that the computation time differed by less than 1 ms between the zero padding method and the proposed method. The experimental conditions are detailed in Section 4.1.
Detection within the ROI focuses on objects smaller than 20 pixels. Objects larger than 20 pixels are detected without the need for the proposed method. Therefore, by using only objects that are 20 pixels or smaller within the detected ROI, it becomes possible to detect them at longer distances. Figure 9a shows the result of zero padding, while Figure 9b illustrates the case of applying the proposed method. In Figure 9b, the results detected within the ROI are displayed in red, and the previously detected results are shown in green. As seen in Figure 9b, it can be confirmed that objects smaller than 20 pixels are detected using the ROI.
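The combination of the full-frame pass and the 2× ROI pass described above can be sketched as follows. Here detect_fn stands for any detector wrapper (e.g., a YOLOv9 inference call) that returns boxes as (x, y, w, h, class, score) in pixel coordinates; the function name, box format, and the exact 10–20 pixel filtering rule are assumptions for illustration.

```python
import cv2

def detect_with_roi(frame, roi, detect_fn, min_h=10, max_h=20):
    """Merge full-frame detections with small-object detections from a 2x
    upscaled ROI. roi = (rx, ry, rw, rh) is derived from the lane information;
    only ROI detections whose original height is between min_h and max_h pixels
    are kept, since larger objects are already found in the full-frame pass."""
    rx, ry, rw, rh = roi
    results = list(detect_fn(frame))                      # regular full-frame pass

    crop = frame[ry:ry + rh, rx:rx + rw]
    zoom = cv2.resize(crop, (rw * 2, rh * 2))             # 2x ROI image
    for (x, y, w, h, cls, score) in detect_fn(zoom):
        if min_h <= h / 2.0 <= max_h:                     # small objects only
            results.append((rx + x / 2.0, ry + y / 2.0,
                            w / 2.0, h / 2.0, cls, score))
    return results
```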

3.4. Object Tracking Using Kalman Filter

The proposed method tracks each object based on the object detection results. Specifically, it monitors the object's movement and determines whether the object is approaching or moving away from the ego vehicle. A collision warning must be issued for an approaching object, even at a considerable distance, whereas no warning is necessary for a receding object, regardless of its proximity. Furthermore, object tracking compensates for frames in which an object is missed or temporarily disappears.
In the proposed method, a Kalman filter is employed for object tracking [32]. The Kalman filter offers the advantage of enabling rapid tracking with minimal computational overhead because it performs prediction and update steps during the tracking process. The state matrix, S k , of the Kalman filter used for object tracking is given by Equation (10).
$$S_k = [\, c_x \;\; c_y \;\; w \;\; h \;\; dc_x \;\; dc_y \;\; dw \;\; dh \,] \tag{10}$$
where $c_x$ and $c_y$ represent the coordinates of the object's center point, $w$ denotes the object's width, and $h$ denotes its height. $dc_x$ and $dc_y$ indicate the changes in the center point, $dw$ the change in width, and $dh$ the change in height.
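A constant-velocity Kalman filter over this eight-dimensional state can be set up with OpenCV as sketched below; the noise covariances are placeholder values, not the tuned parameters of the paper.

```python
import cv2
import numpy as np

def make_box_tracker(dt=1.0):
    """Kalman filter over the state of Equation (10): [cx, cy, w, h, dcx, dcy, dw, dh].
    The transition model adds each rate of change to its corresponding quantity;
    the measurement is the detected box (cx, cy, w, h)."""
    kf = cv2.KalmanFilter(8, 4)
    f = np.eye(8, dtype=np.float32)
    for i in range(4):
        f[i, i + 4] = dt                   # position/size += rate * dt
    kf.transitionMatrix = f
    kf.measurementMatrix = np.hstack([np.eye(4), np.zeros((4, 4))]).astype(np.float32)
    kf.processNoiseCov = np.eye(8, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(4, dtype=np.float32) * 1e-1
    return kf

# Per frame: predicted = kf.predict(); then, when a detection is associated,
# kf.correct(np.array([[cx], [cy], [w], [h]], dtype=np.float32))
```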
Figure 10 illustrates the results of object tracking, with the same vehicle consistently tracked and identified by a unique number. Figure 10a displays the results of the n-th frame, while Figure 10b shows the results of the (n + 10)-th frame.

3.5. Collision Warning

The blind-spot detection warning was determined based on the distance to the object and the estimated time to collision. Figure 11 illustrates the defined warning zones.
As illustrated in Figure 11, the warnings were categorized into three levels. The closer the object and the faster it approaches, the higher the warning level. Warnings are issued progressively based on the distance and time to collision. These progressive warnings are designed to provide either a driver or autonomous vehicle with information regarding the risk of collision.
Each level is associated with specific criteria. A Level 1 warning indicates the possibility of a lane change through acceleration. A Level 2 warning permits evasive steering maneuvers to avoid a potential forward collision. A Level 3 warning signifies a scenario in which a lane change is not feasible. Table 3 presents the definitions corresponding to each warning level.
The warning level is determined by the relative distance and the estimated time to collision. The relative distance is the distance between the ego vehicle and an object and is obtained from the y-coordinate of the bottom of the detected object's bounding box. Given the mounting position of the rearview camera on the vehicle, the distance to the object follows from this coordinate. To calculate the relative distance, the distorted coordinates are first converted to undistorted coordinates using Equation (4); applying the converted image coordinates to Equation (6) then yields the real-world coordinates. The relative speed is computed from consecutive relative distances using Equation (11).
$$v_{rel} = (d_n - d_{n-1}) \cdot f_i \tag{11}$$
where $d_n$ is the distance to the object in the n-th frame, $d_{n-1}$ is the distance to the object in the (n−1)-th frame, and $f_i$ is the frame rate of the camera.
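The distance and speed computation can be sketched as below. h_img_to_road is an assumed 3 × 3 homography from undistorted image pixels to road-plane coordinates in metres, which can be derived from the calibration in Equation (6) restricted to the ground plane; it is an illustrative stand-in, not a quantity defined in the paper.

```python
import numpy as np

def ground_distance(u, v, h_img_to_road):
    """Back-project the bottom-center pixel (u, v) of a detection onto the road
    plane and return its distance from the camera in metres."""
    p = h_img_to_road @ np.array([u, v, 1.0])
    x, y = p[0] / p[2], p[1] / p[2]
    return float(np.hypot(x, y))

def relative_speed(d_n, d_prev, frame_rate):
    """Equation (11): change in relative distance times the camera frame rate.
    With this sign convention, negative values indicate an approaching object."""
    return (d_n - d_prev) * frame_rate
```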
The time to collision is determined based on the relative distance and relative speed. Equation (12) defines the collision time [33].
$$t_C = d_{rel} / v_{rel} \tag{12}$$
where $d_{rel}$ represents the relative distance and $v_{rel}$ denotes the relative speed. The warning level is determined using Equation (13).
$$W_C = \begin{cases} 3, & \text{if } t_C < T_{t3} \ \text{or} \ d_{rel} < T_{d3} \\ 2, & \text{if } T_{t3} \le t_C < T_{t2} \ \text{or} \ T_{d3} \le d_{rel} < T_{d2} \\ 1, & \text{if } T_{t2} \le t_C < T_{t1} \ \text{or} \ T_{d2} \le d_{rel} < T_{d1} \\ 0, & \text{otherwise} \end{cases} \tag{13}$$
where $W_C$ is the warning level, $t_C$ is the time to collision, and $d_{rel}$ is the relative distance. $T_{t1}$, $T_{t2}$, and $T_{t3}$ denote the thresholds on the time to collision, whereas $T_{d1}$, $T_{d2}$, and $T_{d3}$ are the corresponding thresholds on the relative distance. As shown in Equation (13), the warning level is determined by both the time to collision and the relative distance.
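For illustration, Equations (12) and (13) can be combined into a single helper as sketched below. The threshold values are placeholders (the paper does not list its numeric settings), and treating only negative relative speeds as approaching follows the sign convention assumed above.

```python
def warning_level(d_rel, v_rel,
                  t_th=(4.0, 2.5, 1.0),     # (T_t1, T_t2, T_t3) in seconds -- placeholders
                  d_th=(20.0, 10.0, 5.0)):  # (T_d1, T_d2, T_d3) in metres -- placeholders
    """Three-level collision warning from relative distance and relative speed.
    Returns 0 (no warning) to 3 (lane change not feasible), per Equation (13)."""
    t1, t2, t3 = t_th
    d1, d2, d3 = d_th
    approaching = v_rel < 0
    t_c = d_rel / abs(v_rel) if approaching and abs(v_rel) > 1e-6 else float("inf")

    if (approaching and t_c < t3) or d_rel < d3:   # checked from the most severe level down,
        return 3                                   # which reproduces the bands of Equation (13)
    if (approaching and t_c < t2) or d_rel < d2:
        return 2
    if (approaching and t_c < t1) or d_rel < d1:
        return 1
    return 0
```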
Figure 12 shows the warning results. Specifically, Figure 12a shows a Level 1 warning, Figure 12b shows a Level 2 warning, and Figure 12c shows a Level 3 warning. The warning level increases as the relative distance decreases.

4. Experimental Results

4.1. Experimental Environment

An experimental setup was established for this study. A camera was mounted on the test vehicle, a Kia Carnival, to capture the image data. The camera, installed at the rear of the vehicle, features a 185° fisheye lens with an image resolution of 1280 × 720 pixels. The experiment was conducted on an Intel NUC 11 Enthusiast equipped with an Intel i7-1165G7 processor, an NVIDIA GeForce RTX 2060 graphics card, and 32GB of RAM. Figure 13 illustrates the test vehicle equipped with a rearview camera.
The datasets were collected from diverse environments, as illustrated in Figure 14, which presents an overview of the acquisition conditions across different settings. The data were collected on urban roads and highways, encompassing a variety of vehicle types. Additionally, data from under bridges were obtained to account for lighting changes. The dataset also includes scenarios with a high number of objects on urban roads and highways. Figure 14a illustrates the situation under a bridge on a highway, showing changes in lighting. Figure 14b depicts the shadow caused by side trees on an urban road. Figure 14c shows the area under a bridge on an urban road. Figure 14d presents a situation with many vehicles on an urban road. Figure 14e depicts a case where the lane is colored red. Figure 14f shows a situation with a crosswalk during a right turn at an intersection. Figure 14g illustrates a typical highway scenario, and Figure 14h shows a situation with many vehicles on a highway.

4.2. Experimental Results

Processing each image frame takes 85.87 ms on average; thus, we achieved a frame rate of approximately 11.64 frames per second. The average amount of processing time per algorithm step is summarized in Table 4.
For object detection training using YOLOv9, the dataset is organized into training, validation, and test sets at a 6:2:2 ratio, as presented in Table 5.
The object compositions of each set are listed in Table 6. The car class comprises the largest number of objects, buses and trucks are less represented, and motorcycles have the fewest instances.
Table 7 presents the results of transfer learning with the YOLOv9 model. The precision was 90.1%, and the recall was 82.8%. Precision being higher than recall indicates that the system is more likely to miss an object than to report a false detection. Specifically, the recall for motorcycles is low, at 80.2%, due to their small size, while the recall for cars is 79.5%, which can be attributed to reduced detection performance for vehicles in the opposite lane on urban roads.
A comparison with the Faster RCNN method was also conducted [34]. The Faster RCNN results are shown in Table 8, and YOLOv9 performs favorably in comparison. Because cars make up the largest portion of the dataset while motorcycles, buses, and trucks have less training data, the precision and recall of Faster RCNN for those classes are lower than for cars.
Table 9 shows the results under highway conditions. The precision in highway situations is 94.3%, and the recall is 91.9%. Motorcycles were excluded from this dataset as they are not typically found on highways. The performance difference between highways and urban roads is due to two main factors: (1) the presence or absence of motorcycles and (2) the presence or absence of vehicles in the opposite lane. First, the absence of motorcycles on the highway contributed to the performance difference. Additionally, on highways, vehicles in the opposite lane are not visible due to the median barrier, whereas on urban roads they are visible, which lowers the detection rate for vehicles in the opposite lane. Consequently, the detection rate is higher on highways.
Table 10 shows the results when comparing the method using zero padding and the ROI method discussed in Section 3.3. The method using the ROI reduced precision by 0.4 percentage points but increased recall by 4.5 percentage points. It was confirmed that objects that could not previously be detected at a distance were successfully detected, leading to an increase in recall. However, precision decreased due to an increased rate of misdetection of small objects. The proposed method offers the advantage of detecting distant objects and increasing recall, which is why it was used for detecting distant objects.
Figure 15 illustrates the obtained results, demonstrating object detection in various scenarios. Figure 15a illustrates a case with numerous vehicles on a highway, whereas Figure 15b depicts heavy traffic on an urban road. Figure 15c shows a scenario where a motorcycle is detected on a red-colored lane, and Figure 15d presents a case where both a truck and an oncoming vehicle are detected on an urban road. Figure 15e presents object detection results at a crossroad. Figure 15f shows detection results under a bridge on an urban road, where minimal shadow and clear vehicle visibility enable successful detection. Figure 15g,h present cases of missed detection and misdetection, respectively. In Figure 15g, a nearby truck under a bridge is detected, but a distant car is not, due to poor visibility caused by backlighting. Figure 15h shows a misdetection in which a toll gate is mistakenly identified as an object because its complex structure resembles typical objects of interest.
Table 11 presents the lane detection results, which indicate a lane detection rate of 91.52%. Lane detection fails in scenarios with lanes affected by backlighting or blurriness, where edges are not clearly distinguishable, and misdetection occurs at crosswalks and crossroads.
Figure 16 illustrates the lane detection cases for various scenarios. Figure 16a depicts multi-yellow lane markings, Figure 16b presents a red lane, Figure 16c shows the presence of shadows, and Figure 16d highlights the center letters. Figure 16e,f illustrate cases of undetected and misdetected lane markings. In Figure 16e, a crosswalk is mistakenly detected as a lane marking, whereas Figure 16f shows the result where lane detection is not performed due to blurred lane markings and light reflection caused by backlighting.
Figure 17 shows a comparison of the relative distance extraction results with and without the use of vanishing points. Figure 17a presents the distance calculations without a vanishing point, where the blue line indicates the extracted distance value and the red line represents the one-dimensional fit. Figure 17b shows the distance calculations using a vanishing point, where the blue line represents the extracted distance and the red line indicates the one-dimensional fit.
Table 12 presents the mean, variance, and standard deviation of the errors in the above graph. These results indicate that using the vanishing point enables stable distance extraction.
Figure 18 shows the results of the collision warning system. As shown in Figure 18, collision warnings are activated when an object approaches. In Figure 18, green represents a Level 1 collision warning, yellow represents Level 2, and red represents Level 3. Figure 18a shows the collision warning for a motorcycle in an urban setting, Figure 18b shows the warning for a car on an urban road, and Figure 18c shows the collision warning for a truck on a highway.

5. Conclusions

This paper presents a method for detecting objects and issuing collision warnings for blind-spot detection and lane change assistance using a rearview camera. Lane detection was performed to ascertain the vehicle’s location, and a technique was employed to compensate for vertical vibrations induced by vehicle motion to delineate the area for blind-spot detection and assess the time to collision.
The proposed lane detection methodology extracts lane markings by pairing edges from the edge detection results, performs third-order polynomial fitting to identify lanes, and improves lane detection performance by leveraging previous lane data through tracking. The vertical vibrations caused by vehicle motion are compensated for by using the vanishing point at which the two lanes converge. Furthermore, object detection was performed using YOLOv9, resulting in a precision of 90.2% and a recall of 82.8%. The relative speed was calculated from the relative distance using object tracking information, and the time to collision was then determined from the relative distance and relative speed. A three-level collision warning system was implemented based on the time to collision and relative distance. The proposed method demonstrated a performance of 11.64 fps with an execution time of 85.87 ms per frame. Through this three-level collision warning method, the system provides real-time side and rear collision warnings to drivers and autonomous vehicles when changing lanes, functioning as a blind-spot detection and lane change assistance system, and it effectively issues collision risk warnings for various objects by adapting to the relative speed of the obstacle.
In future work, we aim to extend this research to a system that operates not only during the daytime but also at night by performing object detection and lane detection in nighttime conditions using a rearview camera. We will also explore how to ensure system functionality in adverse weather conditions, such as rain or snow, by acquiring data in these challenging environments. Furthermore, we plan to enhance the detection of omnidirectional objects by implementing an around-view monitoring system that employs four cameras positioned to cover all vehicle directions, which is expected to improve performance in situations where obstacle detection is challenging. Since an around-view monitoring system must detect all surrounding obstacles, we plan to extend object detection beyond cars, motorcycles, trucks, and buses to include pedestrians and bicycles. To compare performance, we will install radar and lidar sensors on the experimental vehicle and evaluate their effectiveness directly against the proposed system. Additionally, we intend to optimize the deep learning algorithms for deployment on devices with on-device artificial intelligence (AI) and, in the future, to harness on-device AI for autonomous vehicle applications, particularly for facilitating lane-changing maneuvers.

Author Contributions

Y.L. developed the algorithm and performed the experiments. M.P. contributed to the validation of the experiments and research supervision. Y.L. contributed to the manuscript writing. M.P. contributed to the review and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Trade, Industry and Energy (MOTIE, Korea) (No.20018055, Development of FailOperation technology in Lv.4 autonomous driving systems).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Yun Hee Lee was employed by the company IVIS. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Trivedi, M.M.; Gandhi, T.; McCall, J. Looking-in and looking-out of a vehicle: Computer-vision-based enhanced vehicle safety. IEEE Trans. Intell. Transp. Syst. 2007, 8, 108–120.
2. Jung, H.G.; Kim, D.S.; Yoon, P.J.; Kim, J. Parking slot markings recognition for automatic parking assist system. In Proceedings of the 2006 IEEE Intelligent Vehicles Symposium, Tokyo, Japan, 13–15 June 2006; IEEE: Piscataway, NJ, USA, 2006; pp. 106–113.
3. Cicchino, J.B. Effects of blind spot monitoring systems on police-reported lane-change crashes. Traffic Inj. Prev. 2018, 19, 615–622.
4. Ra, M.; Jung, H.G.; Suhr, J.K.; Kim, W.Y. Part-based vehicle detection in side-rectilinear images for blind-spot detection. Expert Syst. Appl. 2018, 101, 116–128.
5. Liu, G.; Wang, L.; Zou, S. A radar-based blind spot detection and warning system for driver assistance. In Proceedings of the 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 25–26 March 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2204–2208.
6. Kim, W.; Yang, H.; Kim, J. Blind Spot Detection Radar System Design for Safe Driving of Smart Vehicles. Appl. Sci. 2023, 13, 6147.
7. Kwon, D.; Malaiya, R.; Yoon, G.; Ryu, J.T.; Pi, S.Y. A study on development of the camera-based blind spot detection system using the deep learning methodology. Appl. Sci. 2019, 9, 2941.
8. Lee, H.; Ra, M.; Kim, W.Y. Nighttime data augmentation using GAN for improving blind-spot detection. IEEE Access 2020, 8, 48049–48059.
9. Foumani, M.; Moeini, A.; Haythorpe, M.; Smith-Miles, K. A cross-entropy method for optimising robotic automated storage and retrieval systems. Int. J. Prod. Res. 2018, 56, 6450–6472.
10. Mahapatra, R.; Kumar, K.V.; Khurana, G.; Mahajan, R. Ultra sonic sensor based blind spot accident prevention system. In Proceedings of the 2008 International Conference on Advanced Computer Theory and Engineering, Phuket, Thailand, 20–22 December 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 992–995.
11. Tseng, D.C.; Hsu, C.T.; Chen, W.S. Blind-spot vehicle detection using motion and static features. Int. J. Mach. Learn. Comput. 2014, 4, 516.
12. Lin, B.F.; Chan, Y.M.; Fu, L.C.; Hsiao, P.Y.; Chuang, L.A.; Huang, S.S.; Lo, M.F. Integrating appearance and edge features for sedan vehicle detection in the blind-spot area. IEEE Trans. Intell. Transp. Syst. 2012, 13, 737–747.
13. Guo, Y.; Kumazawa, I.; Kaku, C. Blind spot obstacle detection from monocular camera images with depth cues extracted by CNN. Automot. Innov. 2018, 1, 362–373.
14. Zhao, Y.; Bai, L.; Lyu, Y.; Huang, X. Camera-based blind spot detection with a general purpose lightweight neural network. Electronics 2019, 8, 233.
15. Tsuchiya, C.; Tanaka, S.; Furusho, H.; Nishida, K.; Kurita, T. Real-time vehicle detection using a single rear camera for a blind spot warning system. SAE Int. J. Passeng. Cars-Electron. Electr. Syst. 2012, 5, 146–153.
16. Dooley, D.; McGinley, B.; Hughes, C.; Kilmartin, L.; Jones, E.; Glavin, M. A blind-zone detection method using a rear-mounted fisheye camera with combination of vehicle detection methods. IEEE Trans. Intell. Transp. Syst. 2015, 17, 264–278.
17. Jung, K.H.; Yi, K. Vision-based blind spot monitoring using rear-view camera and its real-time implementation in an embedded system. J. Comput. Sci. Eng. 2018, 12, 127–138.
18. Narote, S.P.; Bhujbal, P.N.; Narote, A.S.; Dhane, D.M. A review of recent advances in lane detection and departure warning system. Pattern Recognit. 2018, 73, 216–234.
19. Lee, Y.; Park, M.k.; Park, M. Improving Lane Detection Performance for Autonomous Vehicle Integrating Camera with Dual Light Sensors. Electronics 2022, 11, 1474.
20. Dong, Y.; Xiong, J.; Li, L.; Yang, J. Lane detection based on object segmentation and piecewise fitting. In Proceedings of the ICCP Proceedings, Seattle, WA, USA, 28–29 April 2012; pp. 461–464.
21. Tu, C.; Van Wyk, B.; Hamam, Y.; Djouani, K.; Du, S. Vehicle position monitoring using Hough transform. IERI Procedia 2013, 4, 316–322.
22. Jung, H.; Lee, Y.; Kang, H.; Kim, J. Sensor fusion-based lane detection for LKS+ACC system. Int. J. Automot. Technol. 2009, 10, 219–228.
23. Borkar, A.; Hayes, M.; Smith, M.T. Robust lane detection and tracking with RANSAC and Kalman filter. In Proceedings of the 2009 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 7–10 November 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 3261–3264.
24. Redmill, K.A.; Upadhya, S.; Krishnamurthy, A.; Ozguner, U. A lane tracking system for intelligent vehicle applications. In Proceedings of the ITSC 2001, 2001 IEEE Intelligent Transportation Systems (Cat. No. 01TH8585), Oakland, CA, USA, 25–29 August 2001; IEEE: Piscataway, NJ, USA, 2001; pp. 273–279.
25. Choi, H.C.; Park, J.M.; Choi, W.S.; Oh, S.Y. Vision-based fusion of robust lane tracking and forward vehicle detection in a real driving environment. Int. J. Automot. Technol. 2012, 13, 653–669.
26. Son, Y.S.; Kim, W.; Lee, S.H.; Chung, C.C. Robust multirate control scheme with predictive virtual lanes for lane-keeping system of autonomous highway driving. IEEE Trans. Veh. Technol. 2014, 64, 3378–3391.
27. Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334.
28. Jung, H.G.; Lee, Y.H.; Yoon, P.J.; Kim, J. Radial distortion refinement by inverse mapping-based extrapolation. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, China, 20–24 August 2006; IEEE: Piscataway, NJ, USA, 2006; Volume 1, pp. 675–678.
29. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
30. Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning what you want to learn using programmable gradient information. arXiv 2024, arXiv:2402.13616.
31. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 2014, 27, 1–9.
32. Zhang, X.; Gao, H.; Xue, C.; Zhao, J.; Liu, Y. Real-time vehicle detection and tracking using improved histogram of gradient features and Kalman filters. Int. J. Adv. Robot. Syst. 2018, 15, 1729881417749949.
33. Bella, F.; Russo, R. A collision warning system for rear-end collision: A driving simulator study. Procedia-Soc. Behav. Sci. 2011, 20, 676–686.
34. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
Figure 1. Proposed blind-spot detection and lane change assistance system using CNN and lane detection.
Figure 2. Block diagram for lane detection using rearview camera.
Figure 3. (a) Edge detection result and (b) edge pairing result.
Figure 4. (a) Lane-fitting result and (b) lane-tracking range determination result.
Figure 5. Fisheye lens undistortion result: (a) fisheye lens image and (b) undistorted image.
Figure 6. Comparison of object height with vehicle vibrations.
Figure 7. (a) Vanishing point detection result with centerline and (b) vanishing point detection result with center letter.
Figure 8. Method for creating ROI image using lane detection results.
Figure 9. Object detection result (Red box detected in ROI and green box detected in original image): (a) zero padding result and (b) proposed method result.
Figure 10. Object tracking result: (a) car and truck tracking result in the n-th frame, and (b) car and truck tracking result in the (n + 10)-th frame.
Figure 11. Defined warning zone.
Figure 12. Collision warning result (Green box is lv.1, yellow box is lv.2 and red box is lv.3): (a) Level 1 warning, (b) Level 2 warning, and (c) Level 3 warning.
Figure 13. Test vehicle and rearview camera installation: (a) test vehicle and (b) installed rearview camera.
Figure 14. Example test images: (a) under bridge on highway, (b) shadows, (c) under bridge on urban road, (d) many objects on urban road, (e) red-colored lane, (f) crosswalk, (g) highway, and (h) many objects on highway.
Figure 15. Object detection results: (a) detection result on highway, (b) detection result on urban road, (c) motorcycle detection result on urban road, (d) detection of a truck and opposite object detection result, (e) crossroad, (f) under bridge on urban road, (g) missed detection, and (h) misdetection.
Figure 16. Lane detection results: (a) multi-yellow lane markings, (b) red-colored lane, (c) shadows, (d) center letters, (e) crosswalk, and (f) blurry lane markings.
Figure 17. Comparison of relative distance extraction results: (a) with and (b) without the use of vanishing point.
Figure 18. Collision warning results (Green box is lv.1, yellow box is lv.2 and red box is lv.3): (a) motorcycle collision warning on urban road, (b) car collision warning on urban road, and (c) truck collision warning on highway.
Table 1. Information about annotation classes.

Class        | Car | Motorcycle | Bus | Truck
Class number | 1   | 2          | 3   | 4

Table 2. Comparison of execution time between the zero padding method and the proposed method.

               | Zero Padding | Proposed Method
Execution time | 62.48 ms     | 63.06 ms

Table 3. Definitions of warning levels.

Level | Definition
1     | Indicates the possibility of a lane change through acceleration.
2     | Permits evasive steering maneuvers to avoid a potential forward collision.
3     | Denotes a scenario in which a lane change is not feasible.

Table 4. Execution time for each step.

Step         | Lane Detection | Vehicle Detection | Collision Warning | Sum
Average time | 22.72 ms       | 63.15 ms          | <0.01 ms          | 85.87 ms

Table 5. Dataset content.

Total Images | Training Set | Validation Set | Test Set
12,537       | 7519         | 2511           | 2507

Table 6. Number of objects in each class.

Class      | Training Set | Validation Set | Test Set
Motorcycle | 294          | 86             | 85
Car        | 28,925       | 9695           | 9644
Bus        | 1549         | 524            | 521
Truck      | 1704         | 591            | 583
Total      | 32,472       | 10,896         | 10,833

Table 7. Transfer learning result using YOLOv9.

Class      | Test Set | Precision | Recall
Motorcycle | 85       | 86.5      | 80.2
Car        | 9644     | 91.5      | 79.5
Bus        | 521      | 91.2      | 86.5
Truck      | 583      | 91.1      | 85.0
Total      | 10,833   | 90.1      | 82.8

Table 8. Transfer learning result using Faster RCNN.

Class      | Test Set | Precision | Recall
Motorcycle | 85       | 57.10     | 78.43
Car        | 9644     | 71.69     | 77.22
Bus        | 521      | 54.84     | 59.07
Truck      | 583      | 57.57     | 74.43
Total      | 10,833   | 70.28     | 77.20

Table 9. Detection results on highway.

Class | Test Set | Precision | Recall
Car   | 2198     | 96.9      | 93.2
Bus   | 450      | 87.6      | 89.7
Truck | 344      | 98.5      | 92.8
Total | 2992     | 94.3      | 91.9

Table 10. Detection results of the method using the ROI.

Class      | Test Set | Precision | Recall
Motorcycle | 85       | 85.7      | 84.7
Car        | 9644     | 91.1      | 86.3
Bus        | 521      | 90.7      | 88.6
Truck      | 583      | 90.4      | 88.8
Total      | 10,833   | 89.7      | 86.6

Table 11. Lane detection results.

Test Images | Detected Images | Detection Rate
4872        | 4459            | 91.52%

Table 12. Comparison of relative distance with and without vanishing point.

                        | Mean   | Variance | Standard Deviation
Without vanishing point | 0.3354 | 0.0657   | 0.2563
Using vanishing point   | 0.1863 | 0.0119   | 0.1093
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
