1. Introduction
Continuous and reliable localization is crucial for agricultural robots in environment perception, path planning, decision-making, and motion control [1,2]. In the past decade, a multitude of robot localization solutions utilizing single-type sensors, such as Inertial Navigation Systems (INS), Global Navigation Satellite Systems (GNSS), and LiDAR/camera-based map matching, have been developed. However, these methods are inherently limited in their applicability to specific agricultural environments, such as manually cultivated orchards. Generally speaking, an Inertial Measurement Unit (IMU) may initially exhibit high accuracy, but inherent measurement bias and noise result in cumulative error over prolonged periods. GNSS is capable of achieving centimeter-level accuracy in Real-Time Kinematic (RTK) mode. However, its accuracy is significantly compromised when operating under canopy cover from trees or buildings, degrading precision to the decimeter or even meter range. This loss of signal strength renders GNSS data temporarily unreliable and unstable, especially in regions characterized by dense vegetation or urban landscapes where signal obstructions are prevalent. LiDAR/camera-based odometry methods rely on the number and distribution of structured features; the absence of well-defined point, line, and surface features in agricultural settings, such as orchards, substantially undermines their robustness. Furthermore, camera-based methods are vulnerable to fluctuations in lighting conditions, which further limits their applicability. Given the constraints of single-sensor systems, multi-source sensor fusion has become the primary approach for addressing localization challenges in orchards.
Early research predominantly concentrated on mitigating the temporary decline in GNSS performance by integrating an IMU. Researchers have developed methodologies for real-time estimation of optimal state variables (e.g., position, velocity, and attitude) by combining measurement data, noise characteristics, and dynamic models from both IMU and GNSS, accomplished through filtering algorithms such as Kalman filtering [3,4], complementary filtering [5], and Mahony filtering [6]. Other researchers have developed semi-globally exponentially stable observers for IMUs combined with global positioning sensors such as GNSS and UWB [7,8,9]. This approach not only addresses data reliability issues arising from GNSS signal loss or interference, but also effectively compensates for the cumulative errors associated with long-term IMU pre-integration, thereby achieving uninterrupted, precise, and dependable navigation. However, robots may encounter extended periods of GNSS instability when traversing rows of fruit trees, and relying exclusively on IMU integration in such situations compromises pose estimation accuracy. Hence, it is imperative to find a solution that handles these transitions seamlessly in order to maintain effective robot localization in challenging environments.
The advancement of perceptual sensor technology, including LiDAR and cameras, has propelled robot odometry to a higher level. These sensors enable perception of the external environment, providing a rich array of environmental features such as color, texture, and structure for estimating self-motion. Compared to cameras, LiDAR directly captures the three-dimensional structure of the environment and is less susceptible to variations in lighting and weather conditions; its long range and high precision have attracted significant attention from researchers. LiDAR is therefore commonly combined with an IMU and GNSS to achieve precise, drift-free global state estimation and localization. According to previous literature [1,10], odometry based on multi-sensor fusion with LiDAR can be classified as loosely coupled or tightly coupled, depending on the sensor coupling method.
Loosely Coupled LiDAR-Based Odometry: This method operates on the principle that data from LiDAR and other sources are processed independently and then fused together. LOAM [11] and LeGO-LOAM [12] serve as typical examples: the pose derived from IMU measurements is utilized to correct distortions in the point cloud scan while also providing initial motion estimates for subsequent scan registration. Extended Kalman Filtering (EKF) is a widely employed technique for loosely coupled LiDAR-based odometry. For example, adaptive Kalman filtering has been utilized to integrate GPS/INS/LiDAR data for robot pose estimation, thereby enhancing localization robustness [13]. Cartographer integrates GNSS, wheel odometry, LiDAR, and IMU through graph optimization [14]. Meanwhile, LOCUS [15] and LOCUS 2.0 [16] rely on LiDAR-based odometry and employ a straightforward rate check of sensors as a health-metric-driven dynamic switching mechanism to achieve robust positioning in challenging perception conditions. Loosely coupled sensor integration, which fuses the individual measurements produced by multiple sensors, effectively addresses the limitations of single-sensor odometry by leveraging the unique strengths of each sensor. These systems, however, do not fully exploit the potential for synergistic integration of the sensors, leaving room for further optimization.
Tightly Coupled LiDAR-Based Odometry: In contrast to loosely coupled strategies, tightly coupled LiDAR-based odometry processes data from multiple sensors within a unified framework. This approach improves the precision and robustness of robot pose estimation by exploiting the relationships between different sensors to optimize data fusion [17]. It can be further divided into filter-based and optimization-based methods. Iterated extended Kalman filters (IEKF) are commonly employed in filter-based methods, as demonstrated in FAST-LIO [18] and FAST-LIO2 [19], to mitigate estimation errors. LIO-EKF [20], which utilizes point-to-point registration and the classical extended Kalman filter scheme, presents an adaptive data association method that requires no parameter adjustment. However, the fundamental principle of the EKF entails estimating the state of a nonlinear system through linearization, which introduces linear-approximation errors and presents challenges for further development. Graph optimization, by contrast, aims to achieve global optimality by constructing and refining an error graph, offering global optimization, high precision, and strong flexibility. LIO-SAM [21] establishes a tightly coupled LiDAR-IMU framework with exceptional precision and feasibility. LIO-Vehicle [22] further enhances the capabilities of LIO by incorporating a vehicle dynamics model, expanding its applicability to vehicular scenarios. Graph-optimization-based methods, however, incur significant computational overhead and demonstrate suboptimal real-time performance.
The aforementioned methods are implemented in discrete time, potentially compromising accuracy. Consequently, there has been growing interest in continuous-time methods. Elastic LiDAR Fusion [23] optimizes continuous linear trajectories to correct scan distortions, while Wildcat [24,25] iteratively fits cubic B-splines to eliminate surface map distortions. Recently, CT-ICP [26] and ElasticLiDAR++ [27] have introduced continuous-time trajectories using pure LiDAR methods, parameterized by two poses per scan, which enables flexible scan registration during the optimization process. KISS-ICP [28], inspired by CT-ICP, relies on point-to-point ICP combined with adaptive thresholding for correspondence matching, a robust kernel, a simple but widely applicable motion compensation approach, and a point cloud subsampling strategy. These features endow KISS-ICP with considerable resilience in challenging environments. DLIO [29] introduced a novel coarse-to-fine approach for constructing continuous-time trajectories to achieve precise motion correction; its essence lies in formulating a set of analytical equations parameterized solely by time, enabling fast and parallelizable point-wise deskewing. All of these methods, however, heavily rely on the accuracy of scan-to-scan or scan-to-map matching. In unstructured scenarios, sparse features can cause misalignment of the point clouds, leading to collapse of the odometry system.
Consequently, numerous studies incorporate GNSS data as an absolute observational constraint in the odometry system to enhance robustness and precision. Gao, Y. et al. [30] incorporated high-frequency pose estimates from an integrated GNSS/INS as prior information for scan matching in the LiDAR module. To further improve system stability and robustness, Chiang, K.W. et al. [31] proposed a semi-tightly coupled integration of GNSS, IMU, and grid-based LiDAR that fuses the position and attitude information from each sensor. To fully exploit the benefits of GNSS, some studies have integrated raw GNSS data with LiDAR-based odometry [32]. For instance, Li, S. et al. [33] employed an EKF to directly fuse GNSS phase and pseudorange measurements, MEMS-IMU data, sparse visual landmarks, and extracted LiDAR edge/planar features. LIO-Fusion [34] employed factor graph optimization to fuse LiDAR, IMUs, encoders, GNSS, and prior information, achieving real-time robust odometry that withstands sensor failures.
All of the above methods are primarily applied in indoor or road-like structured scenes. In contrast, orchards are typically unstructured agricultural scenes characterized by sparse geometric features that are highly similar and repetitive. This poses a significant challenge for LiDAR-based odometry, which heavily relies on point cloud registration. Moreover, agricultural environments introduce additional obstacles such as uneven terrain and interference from surrounding vegetation, which complicate robust odometry and necessitate sophisticated algorithms and sensor fusion technology for precise localization. Furthermore, dynamic elements, such as moving livestock or machinery, introduce an additional layer of complexity that must be carefully considered to achieve reliable odometry in agricultural settings. Numerous scholars have conducted extensive research and obtained commendable outcomes; nevertheless, when applied in the context of this study, their methods may falter due to the distinctive characteristics of the aforementioned environment [35,36,37].
In view of these challenges, we propose a tightly coupled GNSS-LiDAR-Inertial odometry (GLIO) framework that integrates GNSS, IMU, and LiDAR information to provide precise positioning and sparse mapping capabilities in agricultural environments. Building on DLIO [29] and KISS-ICP [28], we utilize GNSS measurements as absolute observational constraints to mitigate point cloud registration errors and the deviations of the IMU-based nonlinear trajectory prior. Furthermore, we introduce an adaptive threshold to improve the accuracy of point cloud registration. In summary, the contributions of this paper are as follows:
- (1) We employ a health-check mechanism to filter GNSS data and integrate it with the IMU, thereby facilitating the robot's pose estimation in orchards characterized by sparse geometric features and bumpy terrain;
- (2) We propose a novel approach to enhance alignment stability in orchards by integrating IMU and GNSS information into scan-to-map registration using the point-to-point ICP algorithm, complemented with adaptive thresholding;
- (3) We use GNSS measurements as absolute observational constraints to improve the robustness of LiDAR-based odometry, thereby compensating for the influence of initial values on the nonlinear estimator;
- (4) Extensive experiments conducted on public datasets and our self-collected orchard dataset demonstrate the robustness of our approach in maintaining precise and consistent odometry, even in challenging agricultural environments.
2. System Description
The spatial coordinates in our unified system are defined by incorporating data from multiple sensors, based on the following elements:
Relative Sensor Coordinate: The measurements from each sensor are captured in its respective coordinate system; that is, the relative sensor coordinates are the frames in which each individual sensor reports its data. Relative sensor coordinates in our framework include the LiDAR coordinate $L$, the IMU coordinate $I$, and the robot body coordinate $R$. The LiDAR and IMU coordinates are each centered on the respective sensor. The robot body coordinate is positioned at the robot's center of mass, with the x-axis pointing forward, the y-axis pointing to the left, and the z-axis pointing upward. To ensure accurate data integration, the IMU, LiDAR, and robot coordinates must be aligned using an extrinsic parameter matrix. This alignment guarantees that measurements from different sensors are correctly integrated within the unified system, allowing for precise and reliable navigation and localization.
Absolute Sensor Coordinate: The GNSS serves as the sole absolute sensor in our system, rendering the GNSS coordinate the definitive reference for absolute position. The latitude and longitude obtained from GNSS are based on a global Earth coordinate. To ensure consistency with the other coordinates in our system, we employ the Gauss-Krueger projection to convert the global coordinates into the ENU (East-North-Up) coordinate $G$. In this coordinate, the x-axis is oriented toward the east, the y-axis points northward, and the z-axis points upward. This conversion facilitates seamless integration of GNSS data with the relative sensor coordinates, thereby establishing a unified framework for precise navigation and localization.
Local World Coordinate: The LiDAR-inertial system typically operates within a local world coordinate frame. In our system, the origin of the local world coordinate can be fixed arbitrarily, with the z-axis commonly aligned with gravity. In this paper, we set the robot's initial position as the origin of the local world coordinate $W$. The directional axes of this local coordinate are aligned with the robot body coordinate $R$ to ensure consistency and facilitate sensor data integration. This alignment enables the local world coordinate to serve as a stable reference frame for the multi-sensor-fusion-based odometry, thereby enhancing the accuracy and reliability of robot localization.
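As a concrete illustration of the extrinsic alignment described above, the following minimal Python sketch transforms LiDAR points into the robot body coordinate $R$; the numerical extrinsic values are hypothetical placeholders, not our platform's calibration.

```python
import numpy as np

# Hypothetical extrinsic: LiDAR frame L -> robot body frame R (from calibration).
T_RL = np.array([[1.0, 0.0, 0.0, 0.30],   # rotation block: identity (axis-aligned mount)
                 [0.0, 1.0, 0.0, 0.00],   # translation: LiDAR 0.30 m ahead of and
                 [0.0, 0.0, 1.0, 0.45],   # 0.45 m above the center of mass
                 [0.0, 0.0, 0.0, 1.0]])

def to_body_frame(points: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Transform an (N, 3) point array from a sensor frame into the body frame R."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])  # homogeneous (N, 4)
    return (T @ homo.T).T[:, :3]

# Example: a single LiDAR return 10 m ahead of the sensor.
print(to_body_frame(np.array([[10.0, 0.0, 0.0]]), T_RL))  # -> [[10.3, 0.0, 0.45]]
```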
The state of the robot at time $k$ can be defined within our framework as follows:

$$\mathbf{X}_k = \left[\, \mathbf{p}_k^{W},\; \mathbf{q}_k^{W},\; \mathbf{v}_k^{W},\; \mathbf{T}_{GW},\; \mathbf{b}_k^{a},\; \mathbf{b}_k^{g} \,\right]$$

where $\mathbf{p}_k^{W}$ is the robot's position, $\mathbf{q}_k^{W}$ is the quaternion orientation represented in Hamilton notation, $\mathbf{v}_k^{W}$ is the robot's velocity, $\mathbf{T}_{GW}$ denotes the transformation matrix from the GNSS coordinate $G$ to the local world coordinate $W$, and $\mathbf{b}_k^{a}$ and $\mathbf{b}_k^{g}$ represent the accelerometer and gyroscope biases of the IMU, respectively.
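For readers implementing the estimator, the following illustrative Python container mirrors the state vector defined above; the field names are our own shorthand, not symbols from any released code.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class RobotState:
    """One possible container for the state X_k defined above (names illustrative)."""
    p: np.ndarray = field(default_factory=lambda: np.zeros(3))                 # position in W
    q: np.ndarray = field(default_factory=lambda: np.array([1.0, 0, 0, 0]))    # Hamilton quaternion (w, x, y, z)
    v: np.ndarray = field(default_factory=lambda: np.zeros(3))                 # velocity in W
    T_GW: np.ndarray = field(default_factory=lambda: np.eye(4))                # GNSS frame G -> world W
    b_a: np.ndarray = field(default_factory=lambda: np.zeros(3))               # accelerometer bias
    b_g: np.ndarray = field(default_factory=lambda: np.zeros(3))               # gyroscope bias
```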
Based on DLIO [29] and KISS-ICP [28], we leverage GNSS as an absolute observation to constrain the estimation bias of the nonlinear observer, thereby providing an initial transformation for scan-to-map registration. The framework consists of five parts: initialization, a GNSS data health-checking mechanism, point cloud deskewing and prior optimization, scan-to-map registration, and a nonlinear observer that estimates the robot's pose and constructs a 3D sparse map. The overall pipeline is illustrated in Figure 1. The entire process is implemented within the Robot Operating System (ROS), as depicted in pseudocode in Algorithm 1. Within ROS, GLIO comprises two nodes, odometry and map, represented by the hollow and solid circles in Figure 1. The map node primarily constructs a global map utilizing keyframes and the corresponding pose information obtained from the odometry node. The odometry node is responsible for processing and integrating LiDAR, IMU, and GNSS data to estimate the robot pose. The process is as follows: (i) GLIO employs three threads, for LiDAR, IMU, and GNSS, respectively, to handle the data obtained from the sensors and initializes them according to Algorithm 1. (ii) Upon receiving a new point cloud $\mathcal{P}_k$, we utilize IMU velocity integration to estimate the relative pose transformation between points for motion distortion correction; subsequently, by combining GNSS and IMU data, we provide an optimization prior for registering the current cloud against the submap. (iii) The submap $\mathcal{S}_k$ generated by keyframe detection and the submap generation module is utilized for point cloud registration. During the scan-to-map phase, we compare the registered pose of the point cloud with the GNSS relative pose in the local world coordinate to assess the quality of the registration, and then incorporate the result into the nonlinear observer for pose estimation $\hat{\mathbf{X}}_k$.
Algorithm 1. GNSS-LiDAR-Inertial Odometry.
Input: LiDAR scan $\mathcal{P}_k$; IMU measurements $\hat{\mathbf{a}}_m$, $\hat{\boldsymbol{\omega}}_m$; GNSS position $\mathbf{p}_k^{G}$
Output: state estimate $\hat{\mathbf{X}}_k$; keyframe map

// LiDAR callback thread
1:  while a new scan $\mathcal{P}_k$ arrives do
2:      preprocess $\mathcal{P}_k$ (Section 2.1)            // motion distortion correction
3:      for each IMU measurement between $t_k$ and $t_{k+1}$ do
4:          integrate the IMU state (Equation (6));
5:          store the intra-sweep pose;
6:      end for
7:      for each point $\mathbf{p}_n \in \mathcal{P}_k$ do
8:          deskew $\mathbf{p}_n$ with the GNSS-corrected prior (Equation (7));
9:          transform $\mathbf{p}_n$ into the local world coordinate $W$;
10:     end for
        // scan-to-map registration
11:     register the deskewed scan against the submap $\mathcal{S}_k$;
12:     apply the adaptive correspondence threshold (Section 2.4)
        // state update
13:     update the state with the nonlinear observer (Section 2.5);
14:     // update keyframe map
15:     if $\mathcal{P}_k$ is a keyframe then update the submap;
16:     return $\hat{\mathbf{X}}_k$
17: end while
// IMU callback thread
18: while a new IMU measurement arrives and the system is initialized do
19:     compensate the IMU biases (Section 2.1);
        // state propagation
20:     propagate the state (Section 2.5);
21:     return the propagated state
22: end while
// GNSS callback thread
23: while a new GNSS measurement arrives do
24:     convert the measurement into the local coordinate (Section 2.1);
25:     perform the health check (Section 2.2);
26:     return the healthy GNSS position
27: end while
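For concreteness, the following minimal Python sketch shows how the three callback threads of Algorithm 1 map onto ROS subscribers. Topic names and handler bodies are illustrative placeholders, not the released implementation.

```python
import rospy
from sensor_msgs.msg import PointCloud2, Imu, NavSatFix

class GlioOdometryNode:
    """Skeleton of the odometry node's three callback threads (handlers are stubs)."""

    def __init__(self):
        rospy.init_node("glio_odometry")
        # One subscriber per sensor; ROS dispatches each callback from its own queue.
        rospy.Subscriber("/lidar/points", PointCloud2, self.lidar_cb, queue_size=1)
        rospy.Subscriber("/imu/data", Imu, self.imu_cb, queue_size=200)
        rospy.Subscriber("/gnss/fix", NavSatFix, self.gnss_cb, queue_size=10)

    def lidar_cb(self, scan):
        # Deskew with the IMU/GNSS prior, run scan-to-map registration,
        # then feed the result to the nonlinear observer (Algorithm 1, lines 1-17).
        pass

    def imu_cb(self, imu):
        # Bias compensation and high-rate state propagation (lines 18-22).
        pass

    def gnss_cb(self, fix):
        # Coordinate conversion and the distance-based health check (lines 23-27).
        pass

if __name__ == "__main__":
    GlioOdometryNode()
    rospy.spin()
```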
2.1. Initialization
The inputs of our pipeline consist of three main components: 3D point cloud, IMU, and GNSS data, as depicted in Figure 1. Note that all sensors should undergo extrinsic calibration to ensure accurate and reliable localization performance within the robot body coordinate $R$. Initialization plays a crucial role in preparing the data for input to the odometry module. The subsequent steps outline the preprocessing procedures:
- (1) Point Cloud Preprocessing. The dense 3D point cloud is collected by a mechanical LiDAR, such as an RS-LiDAR or Velodyne (10 Hz~20 Hz). To minimize the loss of information, we employ only a 1 m³ box filter at the origin of the point cloud to remove points that potentially originate from the robot itself.
- (2) IMU Initial Processing. The IMU captures the robot's linear acceleration and angular velocity at a frequency ranging from 100 Hz to 500 Hz. The mathematical model of the measurements $\hat{\mathbf{a}}_m$ and $\hat{\boldsymbol{\omega}}_m$ obtained from the IMU [29] is as follows:

$$\hat{\mathbf{a}}_m = \mathbf{R}_m^{\top}\left(\mathbf{a}_m - \mathbf{g}\right) + \mathbf{b}_m^{a} + \mathbf{n}_m^{a}, \qquad \hat{\boldsymbol{\omega}}_m = \boldsymbol{\omega}_m + \mathbf{b}_m^{g} + \mathbf{n}_m^{g}$$

where the index $m$ denotes the $M$ measurements within the time stamps $t_k$ and $t_{k+1}$, $\mathbf{b}_m$ denotes the sensor bias, $\mathbf{n}_m$ represents the sensor white noise, and $\mathbf{g}$ is the gravity vector rotated into the sensor frame. In a stationary state, the IMU still generates nonzero outputs, which significantly impact the accuracy of the sensor and of the GLIO system. Hence, online estimation and compensation of the IMU bias errors are pivotal preprocessing steps. Online estimation of the IMU bias essentially entails averaging the data collected by the sensor during a 3 s static period and subsequently subtracting this average from subsequent IMU data to rectify the measurements (see the sketch following this list).
In accordance with the principles outlined in the literature [29,38], it is imperative to align the IMU coordinate with the robot body coordinate, ensuring coinciding origins and consistent directional alignment. If the sensor is not located at the center of mass, the displacement effect on the linear acceleration measurements of the rigid body must be considered. Therefore, we account for all contributions to the linear acceleration at $R$ by considering the product of the angular velocity terms and the IMU offset.
- (3) GNSS Status Check and Online Transformation Correction. When the number of received satellites is less than four, the sensor state is unreliable, and such data must be discarded. The GNSS position $\mathbf{p}_{k-1}^{G}$ at time $k-1$ is represented as a three-element vector encompassing longitude, latitude, and altitude measurements, while the orientation $\mathbf{R}_{k-1}$ associated with the GNSS is determined by the IMU. Thus, the GNSS transform matrix can be defined as follows:

$$\mathbf{p}_{k-1}^{W} = \mathbf{T}_{GW}^{k-1}\,\mathbf{p}_{k-1}^{G}, \qquad \mathbf{T}_{GW}^{k-1} = \begin{bmatrix} \mathbf{R}_{k-1} & \mathbf{t}_{k-1} \\ \mathbf{0}^{\top} & 1 \end{bmatrix}$$

where $\mathbf{t}_{k-1}$ is the translation vector between the origins of the local world coordinate $W$ and the GNSS coordinate $G$, and $\mathbf{p}_{k-1}^{W}$ is the point aligned with the local world coordinate. The initial transformation between the GNSS and the local coordinate is denoted as $\mathbf{T}_{GW}^{0}$; the transformation matrix at time $k$ is subsequently obtained by correcting $\mathbf{T}_{GW}^{0}$ online as new, healthy GNSS measurements are aligned with the odometry.
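The following sketch illustrates the two preprocessing steps described in items (2) and (3): static-period bias estimation and conversion of a geodetic fix into a local planar position. As an assumption of this sketch, we substitute pymap3d's ENU conversion for the Gauss-Krueger projection used in the paper; both map a geodetic fix to local coordinates over small areas.

```python
import numpy as np
import pymap3d  # third-party geodetic conversion library (assumed available)

def estimate_static_bias(imu_samples: np.ndarray) -> np.ndarray:
    """Average (N, 3) gyro or accel samples collected over the 3 s static period."""
    return imu_samples.mean(axis=0)

def gnss_to_local(lat, lon, alt, origin):
    """Convert a geodetic fix to a local East-North-Up position.

    origin is the first healthy fix (lat0, lon0, alt0), taken as the local origin.
    """
    lat0, lon0, alt0 = origin
    return np.array(pymap3d.geodetic2enu(lat, lon, alt, lat0, lon0, alt0))

# Usage: subtract the static-period bias from every subsequent measurement.
gyro_bias = estimate_static_bias(np.random.randn(300, 3) * 0.01)  # stand-in samples
corrected = np.array([0.02, -0.01, 0.30]) - gyro_bias
```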
2.2. GNSS Health Check Mechanism
Traditional LiDAR-inertial odometry (LIO) relies solely on scan-to-scan registration to correct accumulated IMU error. However, the point cloud matching error is relatively large in orchards, which affects the final robot localization result. To mitigate this effect, the GLIO system incorporates GNSS measurements. Owing to canopy interference, however, GNSS measurements may be erroneous even while the receiver reports a valid signal; consequently, signal quality alone cannot serve as the health criterion. To address this issue, we propose a distance-based health check mechanism.
The robot's motion can be regarded as continuous, and its positional changes are differentiable. Therefore, the predicted position $\hat{\mathbf{p}}_k$ at timestamp $k$ is utilized to evaluate the GNSS position $\mathbf{p}_k^{G}$ as follows:

$$\mathrm{healthy}\left(\mathbf{p}_k^{G}\right) = \begin{cases} \text{true}, & \left\lVert \mathbf{p}_k^{G} - \hat{\mathbf{p}}_k \right\rVert \le d_{th} \\ \text{false}, & \text{otherwise} \end{cases}$$

where the distance confidence threshold $d_{th}$ is set to 5 m, aligning with the distance used for selecting point cloud keyframes, to optimize computational resources. By setting the distance confidence threshold, we ensure that only positions within this range are deemed reliable, providing a practical measure for assessing the accuracy of GNSS data. This approach effectively exploits the continuity of the robot's motion while maintaining a high level of confidence in its positioning information.
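A minimal sketch of this health check, assuming positions already expressed in the local ENU frame:

```python
import numpy as np

D_TH = 5.0  # distance confidence threshold [m], matching the keyframe spacing

def gnss_is_healthy(p_gnss: np.ndarray, p_pred: np.ndarray,
                    num_satellites: int) -> bool:
    """Distance-based health check for one GNSS fix.

    A fix is accepted only if at least four satellites are tracked and the fix
    lies within D_TH of the position predicted from IMU/odometry integration.
    """
    if num_satellites < 4:          # status check from Section 2.1
        return False
    return bool(np.linalg.norm(p_gnss - p_pred) <= D_TH)

# Example: a fix 1.2 m from the prediction passes; a 40 m outlier is rejected.
print(gnss_is_healthy(np.array([1.2, 0.0, 0.0]), np.zeros(3), 9))    # True
print(gnss_is_healthy(np.array([40.0, 0.0, 0.0]), np.zeros(3), 9))   # False
```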
2.3. Point Cloud De-Skewing and Optimization Prior
The orchard scene exhibits sparsely distributed geometric features, which are characterized by high repetitiveness. In such scenarios, successful registration of the point clouds requires an accurate initial value; otherwise, failure is likely to occur, corrupting the outcome of the nonlinear observer (as discussed in Section 2.5). Furthermore, continuous movement of the robot during data acquisition introduces motion distortion into the collected point clouds, which degrades the scan-to-scan registration result. To tackle these concerns, we leverage IMU data to compute intra-frame point-wise continuous pose information.
Assume that $t_k$ is the time when the point cloud $\mathcal{P}_k$ is received and that $N$ points accumulate during this period. Let $t_n$ be the timestamp of a specific point $\mathbf{p}_n$ in the cloud. To estimate the position of each point in the local world coordinate $W$, we employ numerical integration of the IMU measurements to calculate the pose between $t_k$ and $t_n$ as follows:

$$\begin{aligned} \mathbf{p}_{m+1} &= \mathbf{p}_m + \mathbf{v}_m \Delta t_m + \tfrac{1}{2}\,\mathbf{R}(\mathbf{q}_m)\hat{\mathbf{a}}_m \Delta t_m^{2} + \tfrac{1}{6}\,\mathbf{j}_m \Delta t_m^{3}, \\ \mathbf{v}_{m+1} &= \mathbf{v}_m + \mathbf{R}(\mathbf{q}_m)\hat{\mathbf{a}}_m \Delta t_m + \tfrac{1}{2}\,\mathbf{j}_m \Delta t_m^{2}, \\ \mathbf{q}_{m+1} &= \mathbf{q}_m \otimes \exp\!\left(\tfrac{1}{2}\left(\hat{\boldsymbol{\omega}}_m \Delta t_m + \tfrac{1}{2}\,\boldsymbol{\alpha}_m \Delta t_m^{2}\right)\right) \end{aligned} \tag{6}$$

where the index $m$ represents the $M$ IMU measurements between the timestamps of two consecutive scan frames, and $\mathbf{j}_m$ and $\boldsymbol{\alpha}_m$ are the time-varying rates of change of the linear acceleration (jerk) and of the angular velocity (angular acceleration) of the IMU, respectively.
Furthermore, we add GNSS absolute state constraints to correct the long-term accumulated error of the IMU. Thus, Equation (6) can be reformulated as follows:

$$\mathbf{p}_{m+1} = (1-\lambda)\,\mathbf{p}_m + \lambda\,\mathbf{p}_m^{G} + \left[(1-\lambda)\,\mathbf{v}_m + \lambda\,\mathbf{v}_m^{G}\right]\Delta t_m + \tfrac{1}{2}\,\mathbf{R}(\mathbf{q}_m)\hat{\mathbf{a}}_m \Delta t_m^{2} + \tfrac{1}{6}\,\mathbf{j}_m \Delta t_m^{3} \tag{7}$$

where $\lambda$ is a constant: if the timestamps of the GNSS and IMU measurements are synchronized and the GNSS data is deemed healthy, then $\lambda$ equals 1; otherwise, it equals 0. $\mathbf{p}_m^{G}$ and $\mathbf{v}_m^{G}$ are the position and velocity of the GNSS measurements in the local coordinate $W$, respectively. The vehicle's movement speed $\mathbf{v}_m^{G}$ can be calculated as follows:

$$\mathbf{v}_m^{G} = \frac{\mathbf{p}_m^{G} - \mathbf{p}_{m-1}^{G}}{t_m - t_{m-1}}$$
Therefore, the set of homogeneous transformations $\{\mathbf{T}_n\}$ corresponding to the point timestamps $t_n$ and the integrated poses subsequently defines the coarse discrete-time trajectory during a sweep, as shown in Figure 2. Consequently, this process yields a motion-corrected point cloud that is also approximately aligned with the map in $W$, and it inherently incorporates the optimization prior utilized for scan-to-map registration.
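The following simplified sketch illustrates the piecewise IMU integration underlying Equation (6), omitting the jerk and angular-acceleration terms, and produces one pose per point timestamp for deskewing. The function names and the scipy-based quaternion handling are implementation choices of this sketch, not the paper's code.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def integrate_imu(p, v, q, imu, t_points):
    """Piecewise-constant IMU integration: one (R, t) pose per point timestamp.

    p, v : initial position/velocity in W; q : scipy Rotation (attitude in W);
    imu  : list of (t, accel, gyro) tuples, bias- and gravity-compensated.
    """
    poses, i = [], 0
    for t_target in sorted(t_points):          # point timestamps within the sweep
        while i + 1 < len(imu) and imu[i + 1][0] <= t_target:
            t0, a, w = imu[i]
            dt = imu[i + 1][0] - t0
            a_w = q.apply(a)                    # body-frame accel rotated into W
            p = p + v * dt + 0.5 * a_w * dt**2  # position update (Equation (6))
            v = v + a_w * dt                    # velocity update
            q = q * Rotation.from_rotvec(np.asarray(w) * dt)  # attitude update
            i += 1
        poses.append((q.as_matrix(), p.copy()))
    return poses

# Each raw point p_n is then deskewed via: p_n_corrected = R_n @ p_n + t_n.
```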
2.4. Scan-to-Map Registration
Scan-to-map registration is the key step in the GLIO framework, as its outcome significantly impacts the accuracy of robot state estimation. Generally, point clouds are matched through feature association, wherein features are extracted from the point cloud using a feature descriptor. While this approach has proven effective in urban environments, challenges arise when it is applied in orchards with sparse and highly repetitive features. To avoid a complicated parameter-tuning process, we employ a direct point cloud registration algorithm to align the current scan with the submap.
By simultaneously correcting for motion distortion and incorporating the point-to-point registration optimization prior into the point cloud, GLIO can directly perform scan-to-map registration and bypass the scan-to-scan procedure required by previous methods. When receiving a new scan, we initially employ the IMU-predicted state to deskew the point cloud and transform it into the local world coordinate $W$. Then, we adopt the subsampling strategy proposed in KISS-ICP to effectively reduce the point density within the scan and generate a local map $\mathcal{S}_k$ using a voxel grid. The correspondences between the current point cloud $\mathcal{P}_k$ and the local map $\mathcal{S}_k$ are computed by employing voxel hashing for the nearest neighbor search, resulting in a set of correspondences:

$$\mathcal{C}_k = \left\{ \left(\mathbf{p}_i, \mathbf{s}_j\right) \;:\; \left\lVert \mathbf{p}_i - \mathbf{s}_j \right\rVert \le \tau_k \right\}$$

where $\mathbf{p}_i$ is the $i$-th point in the down-sampled point cloud, $\mathbf{s}_j$ is the $j$-th point in the submap, and $\tau_k$ is the maximum distance allowed between corresponding points from $\mathcal{P}_k$ and $\mathcal{S}_k$. Therefore, we obtain the transformation $\hat{\mathbf{T}}_k$ between the current scan $\mathcal{P}_k$ and the submap $\mathcal{S}_k$ through Equation (9):

$$\hat{\mathbf{T}}_k = \underset{\mathbf{T}}{\arg\min} \sum_{\left(\mathbf{p}_i, \mathbf{s}_j\right) \in \mathcal{C}_k} \left\lVert \mathbf{T}\,\mathbf{p}_i - \mathbf{s}_j \right\rVert^{2} \tag{9}$$

where the resulting global transformation is used to construct the global map and serves as the update signal for the nonlinear geometric observer. Here, the initial guess for $\mathbf{T}$ is the result of integrating the IMU and GNSS measurements up to the last point of the scan, and the submap $\mathcal{S}_k$ is the sparse local map maintained from the previous iterations.
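A compact sketch of the registration loop under these definitions: a kd-tree stands in for the voxel-hash nearest-neighbor search, and a closed-form SVD (Umeyama) alignment solves each point-to-point iteration; both are substitutions of this sketch, not the paper's exact solver.

```python
import numpy as np
from scipy.spatial import cKDTree

def register_scan(scan, submap, T_init, tau, iters=20):
    """Point-to-point ICP of a deskewed scan against the submap.

    scan, submap : (N, 3)/(M, 3) arrays in W; T_init : 4x4 IMU/GNSS prior;
    tau : adaptive correspondence threshold from Section 2.4.
    """
    tree, T = cKDTree(submap), T_init.copy()
    for _ in range(iters):
        moved = scan @ T[:3, :3].T + T[:3, 3]
        dist, idx = tree.query(moved)              # nearest neighbor per point
        mask = dist <= tau                         # adaptive correspondence gate
        src, dst = moved[mask], submap[idx[mask]]
        # Closed-form point-to-point alignment (Umeyama / SVD).
        mu_s, mu_d = src.mean(0), dst.mean(0)
        U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
        R = Vt.T @ np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))]) @ U.T
        dT = np.eye(4)
        dT[:3, :3], dT[:3, 3] = R, mu_d - R @ mu_s
        T = dT @ T                                 # accumulate the increment
    return T
```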
The effect of outlier correspondences in Equation (9) is mitigated by incorporating a maximum correspondence distance threshold $\tau_k$. In KISS-ICP, this threshold is dynamically determined based on the relative pose uncertainty and the sensor noise. In our pipeline, the robot's pose between the current scan and the submap is calculated through accumulated IMU/GNSS integration. Consequently, we can estimate the potential deviation limits by analyzing the angle by which the robot may deviate from the motion prediction over time. The point-to-point distance $\delta_k$ resulting from the robot's relative motion can be mathematically formulated as follows:

$$\delta_k = 2\, r_{\max} \sin\!\left(\frac{\theta_k}{2}\right)$$

where $\theta_k$ represents the angle corresponding to the rotation matrix derived from the IMU, and $r_{\max}$ denotes the maximum range of the LiDAR. Furthermore, we also account for sensor noise in our analysis. Given that the calculation between two scan frames introduces the absolute error of the GNSS, we focus solely on evaluating the impact of noise on the LiDAR measurement distance accuracy, denoted as $\sigma_L$. Assuming these two sources are independent, the threshold $\tau_k$ can be expressed as follows:

$$\tau_k = \sqrt{\delta_k^{2} + \sigma_L^{2}}$$
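A small sketch of this adaptive gate; the maximum range and noise values are placeholder sensor parameters, not calibrated figures from the paper.

```python
import numpy as np

R_MAX = 120.0   # maximum LiDAR range [m] (sensor-specific assumption)
SIGMA_L = 0.03  # LiDAR range noise [m] (datasheet-level assumption)

def adaptive_threshold(R_imu: np.ndarray) -> float:
    """Correspondence gate from the predicted rotation and LiDAR noise.

    R_imu: 3x3 relative rotation predicted by the IMU between scan and submap.
    The rotation angle bounds how far a far-field point can move; LiDAR noise
    is combined under an independence assumption.
    """
    theta = np.arccos(np.clip((np.trace(R_imu) - 1.0) / 2.0, -1.0, 1.0))
    delta = 2.0 * R_MAX * np.sin(theta / 2.0)   # chord length at maximum range
    return float(np.hypot(delta, SIGMA_L))

# Example: a 0.1 deg residual rotation at 120 m allows roughly 0.21 m of motion.
c, s = np.cos(np.radians(0.1)), np.sin(np.radians(0.1))
print(adaptive_threshold(np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])))
```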
2.5. Nonlinear Observer
The scan-to-map registration transformation $\hat{\mathbf{T}}_k$, as described in DLIO, is fused with the IMU measurements to generate the global state estimate $\hat{\mathbf{X}}_k$ through a hierarchical nonlinear geometric observer. It is proven by construction that the state estimate globally converges to the true robot state in the deterministic setting with minimal computation. Following the IMU integration principle, the nonlinear observer establishes a hierarchical structure to ensure convergence of the estimates toward their true values; it thus incorporates two branches, translation and orientation. It is well known from the literature [29,38] that the nonlinear observer behaves as an open-loop system whose input values directly affect its accuracy. Therefore, accurate inputs to the observer are imperative when operating in orchards. In the GLIO system, we integrate the GNSS measurements and the scan-to-map alignment results as the input of the nonlinear observer, as shown in Figure 1.
Let $\gamma_1, \ldots, \gamma_5$ be positive constants and $\Delta t_k$ be the time between scan-to-map registration results. With $\mathbf{q}_e$ and $\mathbf{p}_e$ denoting the attitude and position errors between the propagated and measured poses, the state correction takes the following form:

$$\begin{aligned} \hat{\mathbf{q}}_k &\leftarrow \hat{\mathbf{q}}_k \otimes \begin{bmatrix} 1 \\ \tfrac{1}{2}\gamma_1 \Delta t_k\, \mathbf{q}_e^{v} \end{bmatrix}, & \hat{\mathbf{b}}_k^{g} &\leftarrow \hat{\mathbf{b}}_k^{g} - \gamma_2 \Delta t_k\, q_e^{w}\, \mathbf{q}_e^{v}, \\ \hat{\mathbf{p}}_k &\leftarrow \hat{\mathbf{p}}_k + \gamma_3 \Delta t_k\, \mathbf{p}_e, & \hat{\mathbf{v}}_k &\leftarrow \hat{\mathbf{v}}_k + \gamma_4 \Delta t_k\, \mathbf{p}_e, \\ \hat{\mathbf{b}}_k^{a} &\leftarrow \hat{\mathbf{b}}_k^{a} - \gamma_5 \Delta t_k\, \mathbf{R}(\hat{\mathbf{q}}_k)^{\top} \mathbf{p}_e \end{aligned} \tag{13}$$

where $q_e^{w}$ and $\mathbf{q}_e^{v}$ are the scalar and vector parts of $\mathbf{q}_e$, respectively. Note that Equation (13) is hierarchical: the attitude update (the first two equations) is completely decoupled from the translation update (the last three). Meanwhile, we use the GNSS position difference over the period $\Delta t_k$ to evaluate the point cloud matching accuracy. If the translation distance of the scan-to-map registration result is within a threshold $d_e$ of the translation distance computed from GNSS, the registration result is used as the input to the nonlinear observer; otherwise, we input the translation vector obtained from the GNSS relative distance together with the rotation matrix of the registration result into the nonlinear estimator module. The distance $d_e$ in our pipeline is set to 0.5 m. At the same time, because Equation (13) is a fully nonlinear update, it guarantees state estimates that are accurate enough to perform scan-to-map registration directly with a GNSS/IMU prior, without the need for scan-to-scan registration.
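The GNSS-based gating of the observer input can be summarized in a few lines. The 0.5 m threshold follows the text, while the function signature and frame handling are our own sketch.

```python
import numpy as np

D_E = 0.5  # agreement threshold between LiDAR and GNSS translation [m]

def observer_input(T_reg: np.ndarray, T_prev: np.ndarray, dp_gnss: np.ndarray):
    """Choose the translation fed to the nonlinear observer.

    T_reg : 4x4 scan-to-map result; T_prev : previous pose; dp_gnss : GNSS
    position change over the same interval (local frame). The rotation always
    comes from the registration; the translation is swapped for the GNSS-derived
    one when the two disagree by more than D_E.
    """
    dp_reg = T_reg[:3, 3] - T_prev[:3, 3]          # translation from registration
    if abs(np.linalg.norm(dp_reg) - np.linalg.norm(dp_gnss)) <= D_E:
        return T_reg[:3, :3], T_reg[:3, 3]         # registration trusted as-is
    # Registration drifted: keep its rotation, take the GNSS-derived translation.
    return T_reg[:3, :3], T_prev[:3, 3] + dp_gnss
```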