2. Related Work
Underwater mapping is a valuable tool in a wide range of applications, including marine biology, geology, archaeology, and the offshore industry. Various methods and sensors are employed to obtain 3D reconstructions of the environment, with data typically collected by divers, ROVs (Remotely Operated Vehicles), or AUVs (Autonomous Underwater Vehicles). In most cases, this involves performing surveys with downward-facing sensors and reconstructing the seafloor surface [4,5]. The studies conducted by the authors of [6,7] provide comprehensive reviews of the various underwater 3D reconstruction methods. This synthesis of methods is illustrated in Figure 2.
Sensors used for 3D reconstruction can be classified into two main categories:
Active sensors: these sensors physically interact with the environment by emitting a signal or wave, such as sound waves in the case of sonars or light waves in the case of LiDAR systems.
Passive sensors: these sensors do not modify the environment; the most commonly used ones for 3D reconstruction are camera-based.
Time-of-flight methods are the most direct 3D reconstruction methods. They work on the same principle as a rangefinder: a signal or wave is emitted and its echo is retrieved. Since the propagation speed is known from the nature of the wave and of the medium, the distance between the sensor and the observed element can be deduced from the time elapsed between the emission of the wave and the return of the echo.
Among time-of-flight methods, those based on acoustic waves are by far the most used because the medium favors their propagation. They offer a coverage capacity well suited to the size of the areas to be studied in underwater mapping.
In addition, to allow the localization of the 3D data during the measurement, most underwater navigation algorithms rely on acoustic positioning systems such as the Doppler Velocity Log (DVL) and the Ultra-Short BaseLine (USBL) [8,9]. Multibeam sonars (MBS), for their part, are adapted to bathymetric mapping (measurement of the seabed relief) [10]; however, they are much less suitable for structured (non-flat) environments such as karst aquifers.
Gary et al. [11] and Kantor et al. [12] presented a 3D model of the Zacatón cenote (a water-filled cave at least 300 m deep) using sonar data collected by the DEPTHX (Deep Phreatic THermal eXplorer) vehicle. This NASA autonomous underwater vehicle, measuring just over 2 m and weighing over a ton, prefigures the robots that could explore underwater environments on other planets. Its sensors consist of a DVL, a sonar, and an IMU (Inertial Measurement Unit), and its movements are ensured by six motors allowing it to move in all directions.
Another method using acoustics is described by the authors of [13], who show the results of an AUV performing a limited penetration inside an underwater cave using a mechanically scanned imaging sonar (MSIS). The results, obtained only for localization, show that it is possible to use this type of technology in karst environments.
Though they are efficient for obtaining bathymetry and the macro-characteristics of habitats, these sensors provide little information on fine characteristics (micro-bathymetry), do not allow for the recovery of the texture of the environment, and are very expensive.
Still in the class of time-of-flight methods, those based on electromagnetic waves, such as LiDAR, offer access to a very dense reconstruction of the seabed. However, they are reserved for very local observations, because the propagation of these waves is limited by the medium, as explained in the constraints section. They are therefore not adapted to the situation envisaged here.
The other class of methods comprises triangulation methods, which are favored in this work. These methods require at least two devices (or two different views) that provide distance information about the same point, thus forming a triangle between the two devices and the point to be measured.
There are several ways to perform triangulation:
Multi-view triangulation involves taking multiple views of the same scene. By knowing the disparity between viewpoints, it is possible to project the points of the scene onto different "image" planes and thus triangulate them to obtain a depth map. This technique is based on epipolar geometry (explained in [14]) with two possible approaches:
– The use of a binocular stereo system in which the relative positions of the two cameras are known. The system can be made active by using a structured light projector to add discriminant points or features to the observed environment, thereby facilitating the matching of points between the two images for triangulation. In [15], an original stereovision system carried by divers for pipeline measurements is presented; however, its performance still needs to be evaluated.
– The method known as Structure from Motion (SFM), which involves taking a sequence of images of an object or scene, typically with a single camera. To obtain a metric reconstruction, the camera trajectory must be known in order to resolve the scale ambiguity.
Structured light triangulation relies on an emitter–receiver system. The emitter is a light source (laser or not) that projects a pattern onto a scene observed by the receiver, a camera. Using the positions of the camera and the projector, it is possible to triangulate the discriminant points created by the light in the scene captured by the camera [16]. In [17], results obtained on small objects (spheres) at a measurement distance of about 1 m lead to radii estimation errors of less than 0.5 mm. In [18], surface measurements based on the fringe projection technique are presented; the authors describe an extended camera model which takes refraction effects into account. The results have a resolution of less than 150 µm; the study was performed on small objects and still needs to be conducted on bigger scenes.
Structured light triangulation methods are very accurate but are generally only used to reconstruct small areas or objects. This is because light is quickly absorbed in water; the sensor must therefore be close to the target to best detect the projected light. Additionally, structured light sensors are not yet well suited for mobility and are therefore mostly used statically for the moment.
Digital reflex cameras, on the other hand, have the advantage of being inexpensive while providing dense data as well as texture. However, most camera-based methods operate in open areas with natural lighting (or with artificial lighting that completely illuminates the workspace) and generally try to reconstruct the ground or small objects.
For example, refs. [19,20] propose a 3D reconstruction method based on a stereo pair for archaeological objects in shallow water and in very poor conditions. Special filters are implemented to solve the noise problems caused by these poor conditions. This static 3D measurement is fused with a 3D map of the excavation area obtained using a scanning sonar. Brandou et al. [21] performed a dense reconstruction of submerged structures at very great depths (up to 6000 m) with a stereo system mounted on a controlled manipulator arm; imposing a known trajectory makes the camera positions known, allowing a precise 3D model of the scene to be computed. Beall et al. [22] reconstructed coral reefs with a wide-baseline stereo system using high-resolution video and a dense reconstruction method.
The context of this paper is different from these works since our experiments are in total darkness in a structured and extended environment whose content appears as the diver or robot moves.
Zhou et al. [23] presented a description of the problems related to metrology in the underwater environment. The importance of calibrating the stereo pair underwater is emphasized, taking into account the refraction problems related to the medium. Interest point detection is performed on sufficiently bright images, allowing a robust matching that is then used for a 3D reconstruction. The object measurements obtained are compared with a reference metrology and give good results, though only in static cases.
The work of [24] is directly related to the present theme. The objective is to reconstruct a karstic underwater tunnel of the Woodville Karst Plain in Florida using a new stereovision approach generating a 3D point cloud of the cave. In their method, after calibration and rectification of the highly distorted images, the authors extract the contours of the artificial light projected into the cave, which creates a cone of light illuminating part of the walls. These contours are then used as inputs to their stereovision algorithm. The displacement is estimated by visual odometry (ORB-SLAM (Oriented FAST and Rotated BRIEF–Simultaneous Localization and Mapping) [25]) and allows a 3D reconstruction of a portion of the cave of about one hundred meters, but without comparing the result with a known ground truth.
For an even better mapping of this underwater gallery, ref. [26] developed a SLAM method in which this stereo method is fused with a sonar and an inertial measurement unit, thus improving the accuracy of the measurements. Reviews of SLAM methods in underwater environments can be found in refs. [27,28].
As stated by Joshi et al. [29] and Ende [30], the underwater environment is very specific and induces new challenges in both exploration and mapping tasks. The interested reader can refer to the work of Han, Hao and Qi [31] or Papp et al. [32] for examples of how light refraction can impact photogrammetric approaches, or to Hou et al. [33] to see how camera calibration is affected.
Of course, 3D reconstruction has not escaped the current trend of deep learning techniques. Ju et al. provided a good overview of such methods for calibrated approaches in ref. [34], while the work of Abdullah et al. [35] is more specific to the semantics of underwater scenes.
Wu et al. [36] used an SFM method coupled with multi-view stereo photography to measure silting areas at the outlet of water transport pipelines on the seabed. The accuracy of their method allows a robust estimation of these elements, which is useful for monitoring these areas, but it also relies only on static measurements.
Fan et al. [37] provided an overview of calibration and underwater image processing methods in structured light. Wang et al. [38] proposed a 3D reconstruction method for underwater environments combining a SLAM approach and dense reconstruction using a stereo system embedded on the Aqua2 robot, whose IMU allows for localization. The results concern mainly coral reefs and, partially, the reconstruction of underwater cavities; the question of the accuracy of the measurements with respect to a ground truth remains open. Braeuer-Burchardt et al. [39] proposed an application of 3D reconstruction to underwater archeology. Their system consists of two monochrome cameras and a sinusoidal light pattern projector, and combines structured light 3D reconstruction with SLAM-based visual odometry. The device is mounted on an ROV whose IMU data complement those of the visual odometry. The first results were obtained on objects in a controlled environment. A test was then carried out over a dozen meters to follow a pseudo pipeline, but the size of the system is too large for practical use in underground galleries.
Yu et al. [40] used an Ariadne's thread to enable the movement and guidance of a BlueROV robot. This learning method gives good results for localization and assisted piloting. The next steps envisaged are autonomous navigation and then mapping of the cavities studied (which have already been explored by divers).
Hu et al. [41] proposed a real-time monocular 3D reconstruction method for underwater environments. It relies on optical flow tracking of characteristic seabed points and uses Delaunay triangulation to complete the seabed estimation. The method assumes a nearly planar seabed but remains unclear about the achieved metric accuracy.
3. Materials and Methods
The complete development of the proposed method is detailed in Figure 3 and is based on three main blocks:
Zhang's camera calibration method [42] is used to estimate its intrinsic parameters, including radial distortion;
light projector calibration consists of estimating the geometric parameters of the cone (vertex, direction, half-angle of opening);
three-dimensional local reconstruction is based on the camera/projector triangulation, leading to a 3D point cloud expressed in the camera reference frame.
This triangulation is illustrated in Figure 4, which shows the system with one camera and one light projector. To simplify the figure, the light is projected onto a plane, but the procedure applies equally to non-planar surfaces such as the walls of a gallery. Once the light contour has been detected in the image, the associated camera ray is calculated for each point on the contour.
Figure 4 shows two points belonging to the contour detected in the image and their associated rays. In the configuration shown in Figure 4, each of these rays intersects the cone at two points. To calculate these intersections, one needs to know the half-angle of aperture of the cone and the transformation between the reference frames of the cone and of the camera.
Also in Figure 4, the intersections which correspond to the 3D points of the contour are shown in green and those which should not be taken into account are shown in red (one of them is not shown, but one can imagine that it would be red and beyond the area of the plane in Figure 4). For a given ray, it is therefore necessary to identify whether it is the first intersection or the second intersection that belongs to the observed light contour.
This method of selecting intersections requires a study of the geometric relationships between the cone and the camera.
3.1. Cone Parameters
3.1.1. Cone of Revolution
The cone of revolution, referred to as a cone in the remainder of this document, is associated with its surface of revolution and not with its solid, as may be the case in certain applications. The terms used to define the relationship between a point and the cone are as follows:
a point belonging to the cone is a point which belongs to the surface of the cone;
a point inside the cone is a point which belongs to the solid bounded by the surface of the cone;
a point outside the cone is a point which does not belong to the solid bounded by the surface of the cone.
This surface is generated by the revolution of a line, called the generatrix, which passes through a vertex.
The revolution takes place around a fixed axis which also passes through the vertex and which is an axis of symmetry of the cone, the direction of which is defined by a unit vector.
If a plane not passing through the vertex is orthogonal to the axis of symmetry, then its intersection with the cone is a circle.
Thus, the angle formed by the generatrix and the axis of symmetry is constant and corresponds to the half-angle of the cone opening.
The cone is represented in Figure 5 together with an orthonormal basis, oriented so that one of its axes coincides with the axis of symmetry, which defines the reference frame associated with the cone. The generatrix has a unit direction vector which is set by a new azimuth angle (Figure 5). In the cone frame, this vector can be expressed simply (Equation (1)); it then depends only on the half-angle of opening and on this azimuth angle.
This vector can be used to express all the points on the surface parametrically.
Let us consider a point expressed in the cone reference frame. If this point belongs to the cone, it can be written parametrically (Equation (2)) as the vertex plus the generatrix direction vector scaled by a real parameter l, whose absolute value defines the distance between the point and the cone vertex. With l unrestricted, this parametric equation defines a "double cone". However, since the cone models the light projector, the study is only interested in the upper part of the "double cone", i.e., in non-negative values of the parameter l.
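Equations (1) and (2) are not reproduced above. As an illustrative sketch only, a standard parameterization consistent with this description, under the assumption that the cone axis coincides with the z-axis of the cone frame and using our own symbols (vertex S, half-angle θ, azimuth ψ, parameter l), would be:

```latex
\mathbf{g}(\psi) =
\begin{pmatrix} \sin\theta\cos\psi \\ \sin\theta\sin\psi \\ \cos\theta \end{pmatrix},
\qquad
\mathbf{P}(l,\psi) = \mathbf{S} + l\,\mathbf{g}(\psi), \qquad l \ge 0,
```

where the restriction l ≥ 0 selects the upper part of the double cone, as stated in the text.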
3.1.2. Orthogonal Distance
This parameterization of the cone will be useful to write the equation of the orthogonal distance between a point and the cone, an equation that will be needed for the cone estimation. Let us consider a point outside the cone and its orthogonal projection onto the cone (Figure 5). A particular value of the azimuth angle identifies the generatrix of the cone closest to this point; this generatrix passes through the vertex and its unit direction vector in the cone reference frame follows from Equation (1). Expressing the point in the cone reference frame as well, the corresponding expressions are obtained (Equation (3)).
If the orthogonal projection of the point falls in the lower part of the cone then, since only the upper part of the cone is considered, the nearest point is the vertex of the cone (example in Figure 6). The orthogonal distance h follows (Equation (4)).
3.1.3. Quadratic Form of the Cone
To obtain this relationship, one must first note that if the angle between the vector joining the vertex to a point and the axis direction (or its opposite) is equal to the half-angle of opening, then the point belongs to the cone. This condition can be expressed as a scalar product; squaring it yields a quadratic function of the cone in the general case, directly showing the vertex, the direction vector and the half-angle of aperture which define it (Equation (5)).
This gives the relationships between a point and the cone (Equations (6a)–(6f)).
3.2. Projector Calibration
Projector calibration is an essential step, since it involves estimating the parameters of the cone used in this method and in Equations (1) to (6f). As a reminder, the parameters of the cone to be estimated are its vertex, its direction vector and its half-angle of aperture.
Since these parameters represent the relative pose of the cone with respect to the camera, the projector must remain fixed with respect to the camera during calibration and during any experimentation. Furthermore, if the projector has a variable aperture, it is also necessary to ensure that it is fixed.
3.2.1. Three-Dimensional Point Generation for Cone Estimation
The first step in the calibration method is to generate a cloud of 3D points belonging to the cone. To achieve this, the chosen method is to capture several images of the light projected onto a flat surface.
This light projection is the result of the intersection between the projector cone and the flat surface, which is by definition a conic. In this case, it will be an ellipse because one wants a closed shape. The projection of this ellipse into the image is also an ellipse.
If one can estimate the relative pose of this surface with respect to the camera and extract the light contour in the image, which is also an ellipse, then one can recover the real ellipse from the ellipse in the image (Figure 7).
To estimate this relative pose, a chessboard pattern is attached to the surface. The pose is obtained by estimating the homography between the calibration pattern and the image plane [43]. As the camera is calibrated, the rotation and translation can be deduced from the homography. To extract the ellipse from the image, the contour detection method presented in [44,45] is used. Then, from the contour points obtained, the best ellipse is estimated using the method of [46], which is based on least squares optimisation.
It is therefore possible to obtain n elliptical sections of the cone from n images taken at different distances from the surface. For each ellipse extracted in an image, the ellipse on the surface is obtained via the estimated homography between the image plane and this surface. The estimated relative pose then yields the elliptical section, i.e., the set of 3D points, expressed in the camera reference frame, belonging to the ellipse on the surface.
Figure 8 shows an example where the camera has captured three images of an ellipse surrounding a chessboard pattern on the flat surface in three different poses. This is equivalent to obtaining three elliptical sections of the same cone.
Consequently, if n elliptical sections are obtained where m 3D points are extracted per section, one can generate a cloud of 3D points belonging to the cone.
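As an illustration of this step (not the authors' implementation), the following sketch back-projects an already-extracted light contour onto the chessboard plane with OpenCV; it estimates the plane pose with solvePnP rather than with the homography decomposition of [43], and all function and variable names are ours:

```python
import cv2
import numpy as np

def section_points(contour_px, corners_px, objp, K, dist, n_samples=200):
    """Back-project an extracted light contour onto the chessboard plane.

    contour_px : (M, 2) pixel coordinates of the detected light contour
    corners_px : (C, 1, 2) detected chessboard corners, objp : (C, 3) their board coordinates
    Returns (n_samples, 3) points of the elliptical section, expressed in the camera frame.
    """
    # pose of the chessboard plane (z = 0 in the board frame) w.r.t. the camera
    _, rvec, tvec = cv2.solvePnP(objp, corners_px, K, dist)
    R, _ = cv2.Rodrigues(rvec)
    n, d = R[:, 2], float(R[:, 2] @ tvec)              # plane equation n . X = d (camera frame)

    # least-squares ellipse on the contour, then regular samples along it
    (cx, cy), (w, h), ang = cv2.fitEllipse(contour_px.astype(np.float32))
    t, a = np.linspace(0, 2 * np.pi, n_samples, endpoint=False), np.deg2rad(ang)
    u, v = 0.5 * w * np.cos(t), 0.5 * h * np.sin(t)
    pts2d = np.c_[cx + u * np.cos(a) - v * np.sin(a), cy + u * np.sin(a) + v * np.cos(a)]

    # intersect each (undistorted) camera ray with the plane
    rays = cv2.undistortPoints(pts2d.reshape(-1, 1, 2), K, dist).reshape(-1, 2)
    rays = np.c_[rays, np.ones(len(rays))]             # ray directions in normalized coordinates
    return rays * (d / (rays @ n))[:, None]
```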
3.2.2. Cone Estimation
The aim is now to estimate the cone that best approximates the 3D point cloud. This is performed by geometric fitting, i.e., by minimizing the orthogonal distances between the 3D points and the cone.
Let us define the vector containing the six parameters of the cone, which are:
the three coordinates of its vertex;
the two angles, yaw and pitch in the ZYX Euler convention, of its direction vector, the roll angle not being necessary since a cone has an axis of symmetry;
its opening half-angle.
To estimate this parameter vector, the Levenberg–Marquardt (LM) iterative method [47] is used. For the first iteration, an initial parameter vector must be chosen. This could be the null vector, for example; in practice, caliper measurements of the various parameters are used.
At each iteration i, LM receives a solution vector and the function computing the vector of orthogonal distances to be minimized. This function first calculates the rotation and translation of the cone relative to the camera from the current vertex estimate and from the yaw and pitch angles of the current direction estimate. It then expresses the 3D points in the cone reference frame and calculates the vector of orthogonal distances between the 3D points and the cone using Equation (4). From the current solution and this function, LM returns a new solution vector.
When the difference between the sums of squared orthogonal distances obtained at two successive iterations is less than an arbitrary threshold, LM has reached convergence, and the last vector is the best solution in the least squares sense.
It is important to note that there are an infinite number of cones that share the same elliptical section. On the other hand, there is only one cone that passes through an elliptical section and through a point outside that section. Consequently, this minimization can only work if one has at least two elliptical sections.
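For illustration (not the authors' code), the estimation just described can be sketched with SciPy's Levenberg–Marquardt solver; the convention that the cone direction is the z-axis of the cone frame rotated by the yaw and pitch angles is our assumption, and all names are illustrative:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def cone_residuals(params, pts):
    """Orthogonal distances between 3D points and the cone defined by params.
    params = [Sx, Sy, Sz, yaw, pitch, theta]; pts is an (N, 3) array in the camera frame."""
    apex, theta = params[:3], params[5]
    # axis direction from yaw/pitch (ZYX Euler convention, roll fixed to 0)
    axis = Rotation.from_euler("ZYX", [params[3], params[4], 0.0]).apply([0.0, 0.0, 1.0])
    v = pts - apex
    a = v @ axis                                        # heights along the axis
    r = np.linalg.norm(v - np.outer(a, axis), axis=1)   # radial distances from the axis
    d = np.abs(r * np.cos(theta) - a * np.sin(theta))   # distance to the nearest generatrix
    behind = (a * np.cos(theta) + r * np.sin(theta)) <= 0
    d[behind] = np.linalg.norm(v[behind], axis=1)       # nearest point is the apex in that case
    return d

# pts_3d : (N, 3) cloud extracted from the elliptical sections (camera frame)
# x0     : rough initial guess, e.g. from the caliper measurements of the rig
# x0 = np.array([0.05, -0.08, 0.0, 0.0, 0.0, np.deg2rad(15)])
# fit = least_squares(cone_residuals, x0, args=(pts_3d,), method="lm")
# apex_hat, yaw_hat, pitch_hat, theta_hat = fit.x[:3], fit.x[3], fit.x[4], fit.x[5]
```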
3.3. Intersection with a Ray
Now that the cone parameters can be determined via the calibration method, the next step is to calculate the intersections between a camera ray and the cone; only the intersections with the upper part of the cone are of interest.
A ray is a half-line that can be expressed parametrically (Equation (7)) from its starting point, its unit direction vector and a non-negative real parameter t. Note that only the cases where the starting point of the ray is outside the cone are dealt with here, because in this situation the starting point of the camera rays is the optical center. This point can nevertheless be inside the lower part of the cone.
If a point of the ray is an intersection point, then it belongs to the cone. Therefore, to determine the intersections, one can simply calculate the values of t which cancel the general polynomial of the cone (Equation (5)) after substituting the parametric expression of the ray into it.
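For illustration (assumed notation, not the authors' code), this substitution leads to a quadratic in t that can be solved as follows, with apex and axis given as NumPy arrays:

```python
import numpy as np

def ray_cone_intersections(origin, direction, apex, axis, theta):
    """Intersections of the ray X(t) = origin + t*direction (t > 0) with the upper nappe
    of the cone (apex, unit axis, half-angle theta). Returns 0, 1 or 2 points, sorted by t."""
    u = direction / np.linalg.norm(direction)
    w = origin - apex
    c2 = np.cos(theta) ** 2
    a = (u @ axis) ** 2 - c2
    b = 2.0 * ((u @ axis) * (w @ axis) - c2 * (u @ w))
    c = (w @ axis) ** 2 - c2 * (w @ w)
    if abs(a) < 1e-12:                       # degenerate case: the quadratic term vanishes
        ts = [-c / b] if abs(b) > 1e-12 else []
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return []
        ts = sorted([(-b - np.sqrt(disc)) / (2.0 * a), (-b + np.sqrt(disc)) / (2.0 * a)])
    points = []
    for t in ts:
        p = origin + t * u
        # keep forward intersections lying on the upper nappe only
        if t > 0.0 and (p - apex) @ axis >= 0.0:
            points.append(p)
    return points
```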
3.4. Projection of the Generatrices of the Cone in the Image
To understand the relationship between a point on the cone and its projection into the image, one needs to look at the projections of the generatrices of the cone. Since the cone represents the projector, only the upper half of the cone is considered, which means that the generatrices can be limited to the half-lines starting at the vertex. Note that the study is limited to the case where the cone is oriented in the same direction as the camera, as required by this method. Geometrically, this condition is satisfied if the angle between the direction of the cone and the optical axis of the camera lies in the open interval shown in Figure 9. In practice, for the camera to be able to see the light projection, this angle will be close to 0.
Let us consider a plane which passes through the optical center and through at least one generatrix of the cone. The intersection of this plane with the image plane is a line. Consider a point in the image whose associated camera ray lies in this plane. If this point is the projection into the image of a 3D point belonging to a generatrix resulting from the intersection between the cone and the plane, then it belongs to this line. Conversely, if a point belongs to this line, it is not necessarily the projection of a point on the cone. To obtain the equation of the line, the simplest way is to select two points belonging to one of the generatrices (or the single generatrix) through which the plane passes, and then to project them into the image to find the parameters of the line. The same result would be obtained by projecting two points of the plane that do not belong to the same camera ray. However, knowing this line is not enough to understand the projection of a generatrix.
When the plane is not tangent to the cone, it passes through two generatrices, one of which is closer to the optical center than the other (see Figure 10). The direction vectors of these two generatrices are defined by two distinct azimuth angles (see Figure 10), since the plane is here chosen as not tangent to the cone. In this configuration, if a point lies on the corresponding line in the image, then it can be the projection both of a point belonging to the nearer generatrix and of a point belonging to the farther one. The generatrix closest to the optical center therefore carries the first intersections of the camera rays with the cone, while the other carries the second intersections.
There are only two orientations for which the plane is tangent to the cone. When this happens, the two generatrices are superimposed. The two possible cases are illustrated in Figure 11: two particular azimuth angles define the two generatrices through which the two planes tangent to the cone and passing through the optical center pass. It is important to note that these two angles depend solely on the parameters of the cone and on its pose relative to the camera, which means that they are unique and fixed for a given configuration.
To determine these two angles, a generic plane tangent to the cone is used, passing through a generatrix whose direction vector is defined by an azimuth angle. It is represented in Figure 10 purely for reasons of visibility. The orientation of this tangent plane is characterized by its unit normal vector. It is now possible to look for the two values of the azimuth angle for which the tangent plane is superimposed on one of the two planes defined above. This superposition only occurs when the tangent plane passes through the optical center, and therefore when its normal vector is orthogonal to the vector joining the vertex to the optical center. This leads us to look for the azimuth angles for which this orthogonality condition holds (Equation (9)),
where the optical center is expressed in the reference frame of the projector. To solve Equation (9), an auxiliary vector is introduced (Equation (10)), and Equation (9) is rewritten using the scalar product between this vector and the normal vector of the tangent plane, together with the angles defined in Figure 12. This yields the angle between these two vectors, then the angle defining the orientation of the auxiliary vector, and finally the two solutions of Equation (9) (Equation (11)).
Note that these two angles only depend on the half-angle of opening of the cone and on the translation between the cone and the camera. From these two angles, one can obtain the two unit direction vectors of the two special generatrices. The projections of these two generatrices belong to the two lines which are the intersections of the image plane with the two tangent planes.
Using these two special generatrices, one can divide the cone into two surfaces, as shown in Figure 13. The cyan surface (and the magenta surface, respectively) contains the set of generatrices carrying all the first (and second, respectively) possible intersections between a camera ray and the cone.
In the image plane, one can thus delimit the areas where the camera rays can intersect the cone. The cyan area, surrounded by an ellipse, contains the projections of part of the set of 3D points belonging to the cyan surface. The magenta area contains the projections of the other part of the set of 3D points belonging to the cyan surface as well as the projections of the set of 3D points belonging to the magenta surface. These areas are bounded by:
the two lines defined above;
the ellipse which is the set of vanishing points of the generatrices; these points define a first bound on all the projections of the generatrices;
the projection of the vertex into the image plane, which defines a second bound on all the projections of the generatrices.
In the configuration illustrated in Figure 13, each generatrix projection is therefore a segment whose limits are the vanishing point of the generatrix, belonging to the ellipse, and the projection of the vertex.
However, there are other configurations where the projections of the generatrices onto the image plane are not segments. Indeed, if the vertex lies in the plane passing through the optical center and orthogonal to the camera axis, then the line passing through the optical center and the vertex is parallel to the image plane, which implies that the projection of the vertex into the image plane is a point at infinity (or ideal point) and that the two lines are parallel.
The case where the vertex lies behind this plane is illustrated in Figure 14, which shows the lower part of the cone behind the camera. This part is shown in brown and its projection in the image plane is also in brown. This projection is obviously fictitious, as this part cannot be seen by the camera even if it had an infinite field of view. The projection of the vertex, also fictitious, is still the intersection of the two lines, but this time it is on the other side in comparison to that shown in Figure 13.
As for the magenta and cyan areas in the image plane, they are still delimited by the two lines and the ellipse, but they extend towards infinity on the side where the two lines diverge.
Therefore, if the vertex does not lie strictly in front of this plane, each generatrix projection is a half-line whose only boundary is its vanishing point belonging to the ellipse. However, when working with real cameras rather than with the infinite image plane, each generatrix will have a segment as its projection into the image.
This division of the cone into two surfaces leads to several results:
Any ray associated with a point in the cyan area has a unique intersection with the cone, a 3D point belonging to the cyan surface.
Any ray associated with a point in the magenta area has two intersections with the cone, the first of which belongs to the cyan surface and the second to the magenta surface. These two intersections are superimposed if the point in the magenta area belongs to one of the two tangent lines.
Any ray associated with a point outside the magenta and cyan areas has no intersection with the cone.
3.5. Intersection Selection and Triangulation
Now, everything is ready to develop a method for determining which intersection to choose when the ray associated with a pixel in the contour has two intersections with the cone. To better understand the various stages of the procedure, let us take as an example the 3D contour of light whose associated 3D curve is circular and results from the intersection between a plane and the cone.
The example of the circular contour is illustrated in Figure 13, which contains the following elements:
the closed 2D curve corresponding to the contour in the image;
the circular 3D curve corresponding to the contour of the light in the scene; each of its 3D points projects onto a point of the 2D curve and corresponds to the correct intersection between the cone and the camera ray associated with that image point;
the two special generatrices which delimit the areas containing the first and second intersections with the cone and divide the cone into two surfaces; they also divide the 3D curve into two curves, one in cyan and the other in magenta;
the two intersections of these generatrices with the 3D curve, which are the only two 3D points common to the cyan and magenta 3D curves;
the cyan and magenta 2D curves, which are the projections of the cyan and magenta 3D curves, respectively;
the projections of the two common 3D points, which are therefore the only two points common to the cyan and magenta 2D curves. They belong both to the contour curve and to the two tangent lines, so, in theory, they correspond to the two unique points of tangency of these lines with the contour curve.
Using these two points of tangency will allow one to divide the 2D contour curve in two via the line joining them. However, in practice:
the curve is discrete,
the light contours are extracted in a perfectible way,
the camera calibration and the cone estimation have uncertainties, so the two tangent lines also have uncertainties.
All this means that the two lines are not exactly tangent to the contour curve. The method for determining the two points of tangency must therefore be adapted to this constraint.
Once these two points have been obtained and the contour curve has been divided in two, it remains to determine which of the two resulting curves is the cyan one and which is the magenta one. To do this, one needs to look at the position of the vertex relative to the camera.
Figure 15 shows that the vertex and the magenta curve lie in the same half-space bounded by the plane passing through the optical center and the dividing line. This property is always true whatever the pose of the cone.
In the image plane, this property can be expressed by defining a point such that this point and the magenta curve lie in the same half-plane bounded by the dividing line.
There are an infinite number of points in the image plane which satisfy this condition. The point is therefore forced to belong to the circle circumscribing the rectangle delimiting the image, as can be seen in Figure 15. The segment between this point and the center of the image forms an angle with the horizontal axis of the image; this gives Equation (12), where w and h are the width and height of the image in pixels, respectively. For the condition to be true, one possible solution is for this angle to indicate the direction of the vertex relative to the optical center. To obtain this angle, the vertex is orthogonally projected onto the plane normal to the camera axis passing through the optical center, which gives the projected point illustrated in Figure 15. The angle formed between the image's horizontal direction and the vector from the optical center to this projected point is then computed (Equation (13)).
Calculating this angle and the corresponding point allows one to deduce which of the two curves is the magenta one.
Once the cyan and magenta image curves have been obtained, it is then possible to obtain the 3D contour. Indeed, if a point belongs to the cyan curve, then the first intersection of the camera ray associated with it must be chosen, and if it belongs to the magenta curve, the second must be chosen. This is because any point of the cyan curve is the image of a 3D point of the cyan surface, and any point of the magenta curve is the image of a 3D point of the magenta surface.
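As a sketch of this selection rule (assumed names, not the authors' code, reusing the ray_cone_intersections function from the sketch in Section 3.3):

```python
import numpy as np

def triangulate_contour(contour_px, labels, K, apex, axis, theta):
    """Triangulate classified contour pixels against the cone.

    labels[i] = 0 if the pixel belongs to the cyan curve (keep the first intersection),
    labels[i] = 1 if it belongs to the magenta curve (keep the second intersection).
    """
    K_inv = np.linalg.inv(K)
    points_3d = []
    for (u, v), label in zip(contour_px, labels):
        ray_dir = K_inv @ np.array([u, v, 1.0])          # back-projected camera ray direction
        hits = ray_cone_intersections(np.zeros(3), ray_dir, apex, axis, theta)
        if hits:                                          # skip rays that miss the cone
            points_3d.append(hits[0] if label == 0 else hits[-1])
    return np.array(points_3d)
```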
3.6. Test of the Method in Simulation
To illustrate the different stages of the method, a gallery model will be used.
Figure 16 shows the camera, represented by a pyramid, and the projector, represented by a cone, positioned inside the model. In this example, the axis of the cone is parallel to the axis of the camera. The vertex is positioned above and to the right of the camera; it belongs to the plane which passes through the optical center and which is orthogonal to the camera axis. Consequently, the projection of the vertex in the image plane is a point at infinity (an ideal point) and the projections of the generatrices all belong to lines parallel to each other.
The 3D contour curve is the intersection between the cone and the model. The set of 3D points on this curve is thus obtained from the intersections of the generatrices of the cone with the model, and these points are shown in black in Figure 16. The number of 3D points is equal to the number of generatrices chosen to represent the cone. These 3D points are then projected onto the image to obtain a set of 2D points belonging to the image contour curve. These 2D points are also shown in black in Figure 16.
The idea now is to obtain the 3D points of the 3D curve from the 2D points of the image curve, assuming that the parameters of the cone and of the camera are known.
The first step is to calculate the direction vectors of the two special generatrices. To do this, one needs to calculate the two corresponding angles using Equation (11). Once the direction vectors have been obtained, two 3D points are selected on each generatrix and projected into the image, and the parameters of the two corresponding lines are computed. Each line is defined by its unit direction vector and by a point belonging to it.
Now, one needs to obtain the two points which are theoretically the points of tangency of the two lines with the image curve. The approach is presented for only one of them, since it is the same for the other.
A first solution would be to take the point belonging to the line that is closest to the curve. However, a slightly more generic approach has been chosen, the result of which is illustrated in Figure 17. It consists of taking the point belonging to the line which minimizes the sum of the squared distances between itself and the x% of the points on the curve closest to the line. Let t be the parameter of the line; one therefore looks for the value of t minimizing this sum (Equation (14)).
Figure 17 shows the two points obtained, with the 40% of curve points closest to one line in yellow and the 40% closest to the other line in orange. Developing the sum to be minimized in Equation (14) gives a second-degree polynomial in t whose leading coefficient is positive; this polynomial therefore has a minimum where its derivative vanishes, which gives the sought value of t in closed form.
Using the same procedure, the second tangency point can be obtained. This produces the dividing line (Figure 17) which will be used to split the image curve in two.
In reality, the two tangency points are determined in a perfectible way. Therefore, a minimum distance e to the dividing line is defined which the points of the curve must respect. Points whose orthogonal distance to the line is less than e are considered indeterminate and are therefore not taken into account in the 3D reconstruction of the contour. In practice, the value of e will depend on the quality of the measurements taken.
This separation of the image curve into two curves is illustrated in Figure 18, with the first-intersection curve in cyan and the second-intersection curve in magenta.
To determine which curve is which, the point defined by Equation (12) is used, obtained via the angle of Equation (13), by applying two conditions involving the normal vector of the dividing line.
Now, the points of the image contour are labeled as belonging either to the cyan curve or to the magenta curve. One can therefore apply what was presented in Section 3.3 and Section 3.5 to reconstruct these points in 3D.
Figure 19 takes the 3D scene from Figure 16 and adds the reconstructed 3D points. The calculated differences between these reconstructed 3D points and the initial 3D points of the simulation (obtained via the intersection between the generatrices of the cone and the gallery model) are negligible; these deviations are due only to numerical errors in the calculations at the various stages. This shows that the reconstructed 3D points coincide with the initial 3D points and that the method therefore works. The method will now be tested with real data.
4. Results
The local heritage offers access, between two laboratory buildings, to the Saint-Clément aqueduct (more commonly known as the Arceaux aqueduct), which was built in the 18th century. It is no longer in use, but in dry periods it is an ideal experimental platform. The gallery is accessible via trapdoors which, once closed, plunge one into total darkness. This confined space recreates experimental conditions similar to those in a karstic environment, but in a dry environment whose dimensions are easy to measure. Indeed, the aqueduct is simply a long corridor whose left and right walls are parallel and separated by a width of 62 cm, measured with a laser rangefinder.
4.1. Experimental Setup
The camera used is the Nikon D7000 reflex with an 18 mm focal length and a resolution of 2464 × 1632 pixels (PlongeImage, Bordeaux, France).
The projector is a DIVEPRO M35 (La Palanquee, Palavas, France) dive light with a wide aperture angle, which differs in air and in water. As this aperture angle is too large to capture the full projection of the light in this experiment, a custom-made plastic cylinder (obtained with a 3D printer) was added to reduce it (Figure 20), as can be seen in Figure 21. Its internal diameter is the same as the diameter of the projector, i.e., d = 53.8 mm, and the cylinder protrudes from the projector by a length l. It is possible to calculate the half-angle of aperture as a function of the lengths l and d; this value will be used to check the order of magnitude of the half-angle obtained later by the cone estimation.
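The exact relation used is not reproduced above. As a rough sketch only, under the assumption (ours) that the effective apex of the cone lies at the projector's exit face and that the light just fills the cylinder opening, the half-angle would be:

```latex
\theta \;\approx\; \arctan\!\left(\frac{d}{2\,l}\right).
```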
The camera and projector are placed on a rigid support as shown in Figure 21. They had to be fixed to the support because the slightest rotation of one relative to the other alters the transformation between the camera and the projector, which is estimated during the projector calibration.
To get an idea of the magnitude of this transformation, a caliper was used to measure the translations between the camera and the projector along the camera's x, y and z axes. The rotation between the two is assumed to be small because an identical orientation was chosen for both. These measurements provide a summary of the various projector parameters (lengths in m).
4.2. Camera Calibration Results
To calibrate the camera, Zhang's method [42] was used with eight images of a flat chessboard test pattern of 7 × 10 squares, 36 mm wide (Figure 22). In each of the eight images, the position of the test pattern relative to the camera is different (Figure 23 shows a 3D representation of the test pattern in the eight different positions). The orders of magnitude of the various camera parameters estimated via calibration are consistent with the manufacturer's specifications. In addition, the average reprojection errors are of the order of 1/10 of a pixel. This level of error is quite satisfactory for all eight shots, each containing 56 calibration points.
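For illustration only, a minimal OpenCV sketch of this Zhang-style calibration is given below; the image file names and the inner-corner count are placeholders, not the actual data of this experiment:

```python
import cv2
import numpy as np

corners_per_board = (9, 6)      # inner corners; adjust to the pattern actually used
square_size = 0.036             # 36 mm squares
objp = np.zeros((corners_per_board[0] * corners_per_board[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:corners_per_board[0],
                       0:corners_per_board[1]].T.reshape(-1, 2) * square_size

obj_points, img_points = [], []
for name in ["calib_%02d.jpg" % i for i in range(8)]:          # placeholder file names
    gray = cv2.cvtColor(cv2.imread(name), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, corners_per_board)
    if found:
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)
        obj_points.append(objp)
        img_points.append(cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria))

# intrinsic matrix K, distortion coefficients and the RMS reprojection error
rms, K, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```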
4.3. Projector Calibration Results
To calibrate the projector, i.e., estimate the parameters of the cone, a white wall was used on which a chessboard pattern was hung. The first step is to obtain 3D points belonging to the cone by detecting the contour of the light projected onto the wall, as explained in Section 3.2. To maximise the contrast of the light projected onto the white wall, the only light source in the room is the projector.
For this calibration, five images of the light projected onto the wall were captured, with the axis of the cone almost orthogonal to the wall (Figure 24). The shape of the projected light therefore resembles a circle.
Each detected contour is transformed into an ellipse, shown in blue in Figure 24. To obtain the ellipses, the method detailed in [46] was used, applied iteratively with the RANSAC algorithm [48] to eliminate outlying contour points. The contour points eliminated by RANSAC are shown in red in Figure 24, while the points used to obtain the ellipses are shown in green.
Using the pattern detected in each image, the equations of the five planes were obtained. From these five planes and the five ellipses obtained, the five elliptical sections are calculated. They are represented in 3D in Figure 25 with the camera and the test pattern.
The image on the left shows all the elements expressed in the chessboard pattern frame, so one can see the different camera positions used to capture the five images.
The image on the right shows all the elements expressed in the camera frame. It is in this frame of reference that the elliptical sections must be expressed in order to estimate the cone.
Three-dimensional points are then extracted and expressed in the camera frame of reference for each elliptical section. In this case, 200 points per section were recovered, making a total of 1000 3D points. Note that these 3D points lie at distances between 1.50 m and 2.17 m from the camera.
Now that 1000 3D points supposed to belong to the cone have been obtained, the next step is to define an initial cone for the iterative minimization algorithm. This is shown in red in Figure 26. An initial cone far from the solution was chosen, both for greater visibility in Figure 26 and to show that the method is capable of converging even when the initial cone is far from the solution. Obviously, this is an experimental result and not a mathematical proof. This optimisation problem is probably non-convex; therefore, an initial cone close to the solution should ideally be chosen, using the aperture angle previously calculated and the measured relative pose between the projector and the camera.
The algorithm is then applied to obtain the optimised cone shown in gray, together with its estimated parameters (lengths in meters). To ensure the validity of the cone obtained, the estimated parameters are compared with the previously measured parameters. Table 1 compares the estimated opening angle and the estimated offset with the measurements. As the measurements themselves can be improved, this comparison at least proves that the order of magnitude of the estimated parameters is consistent.
Another point to check is the minimization error of the cone estimation. In this case, this error is directly linked to the orthogonal distances between the 3D points of the elliptical sections and the cone, since it is the root mean square (RMS) of these distances that the algorithm tries to minimize during the estimation. In Figure 26, the color of each 3D point is defined by its orthogonal distance d from the cone via the colormap Jet (the colormap bar is shown in the figure) and a function of d and of a parameter m chosen as the median of all these distances (here, m = 1.4 mm).
The statistics for these distances can be found in Table 2. In particular, a maximum distance of 4.2 mm and an overall mean of 1.5 mm are obtained. The overall RMS is 1.9 mm, which corresponds to approximately 0.13% of 1.50 m, the smallest distance between a 3D point of an elliptical section and the camera frame.
Table 3 presents the statistics of the distances between the 3D points of each elliptical section and the reconstructed 3D points. A maximum distance of 226.1 mm and an overall mean of 29.0 mm are obtained.
In view of the analysis of the orthogonal distances, the minimization is satisfactory, confirming once again that the estimated parameters of the cone are at least consistent with their true values.
4.4. Three-Dimensional Results
The shape of the aqueduct is a long narrow corridor, which makes it possible to fully distinguish the light projection in the images. Six images were captured, taken at different distances from a chessboard pattern placed in the aqueduct. For each image, the light contours were checked manually in order to isolate the triangulation method from the contour detection method, which is still in need of improvement. For each contour extracted, part of the contour points were manually delimited on the left wall and another part on the right wall. These delimitations can be seen in Figure 27, which shows an example of an extracted contour in two images, with contour points on the left wall in red, those on the right wall in blue, and the rest of the contour points in green.
Only the contours of the left and right walls will be used for triangulation, as this allows checking that the 3D points obtained respect the dimensions of the aqueduct corridor. To obtain these 3D points, it is necessary, as usual, to define which intersection is relevant for each ray associated with the contour points. In this situation, each contour point on the left wall implies a first intersection, while each contour point on the right wall implies a second intersection.
Figure 28 illustrates the method presented in Section 3.4 and Section 3.5 for the two images in Figure 27:
the two lines, in yellow and orange, and their intersection with the contour curve (Figure 28a,d);
the intersections with the walls, left in cyan and right in magenta (Figure 28c,f).
Figure 29a shows the result of the reconstruction of the contours of the six images. The shape of the 3D reconstructions of the contours is close to a hyperbola, which is consistent since the intersection between a plane and a cone in this configuration is supposed to be a hyperbola.
A comparison of this 3D reconstruction with reality is now possible. The aqueduct is a corridor whose left and right walls are parallel and separated by a width of 62 cm. One can therefore check that:
the 3D points resulting from the left part and from the right part of the contour, respectively, each belong to a single plane, i.e., they are supposed to be coplanar;
if a plane is estimated from the left 3D points and a plane from the right 3D points, the two planes must be parallel and separated by a distance of approximately 62 cm.
As shown in Figure 29b, plane estimates are obtained for the left wall (in red) and for the right wall (in blue) using a least squares method.
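For illustration (not the authors' code), the plane fits and the three checks described here can be sketched as follows, using a total least-squares fit via SVD since the exact formulation is not specified:

```python
import numpy as np

def fit_plane(points):
    """Total least-squares plane through an (N, 3) point cloud.
    Returns (unit normal n, offset d), the plane being n . X = d."""
    centroid = points.mean(axis=0)
    # the normal is the right singular vector associated with the smallest singular value
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    n = vt[-1]
    return n, float(n @ centroid)

# coplanarity: orthogonal deviations of each wall's points from its own plane
# n_l, d_l = fit_plane(left_pts);  dev_left = np.abs(left_pts @ n_l - d_l)
# wall separation: distance from one wall's points to the other wall's plane
# n_r, d_r = fit_plane(right_pts); gap = np.abs(left_pts @ n_r - d_r)
# parallelism: angle between the two plane normals
# angle_deg = np.degrees(np.arccos(np.clip(abs(n_l @ n_r), 0.0, 1.0)))
```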
To evaluate coplanarity, the orthogonal distance between the 3D points and their associated plane is computed. For the left points, Table 4a presents the results for the six positions of the sensor and Table 4b presents the results for all positions, with an average deviation of 13.9 mm. For the right points, Table 4c presents the results for the six positions of the sensor and Table 4d presents the results for all positions, with an average deviation of 24.1 mm.
To check the gap between the two estimated planes, the distances are computed between the left 3D points and the right plane, and between the right 3D points and the left plane. For the distance between the left points and the right plane, Table 5a presents the results for the six positions of the sensor and Table 5b presents the results for all positions, with an average distance of 584.9 mm. For the distance between the right points and the left plane, Table 5c presents the results for the six positions of the sensor and Table 5d presents the results for all positions, with an average distance of 583.6 mm. Both distances are near the measured distance of 620 mm.
The two estimated planes are almost parallel, as shown by the very small angle calculated between their two normal vectors.
5. Discussion
This original approach is based on the principles of structured light, leveraging the projection of light contours onto the karstic surfaces. It works by capturing the resulting light patterns with a camera, leading to detailed three-dimensional reconstructions of the subsurface structures. A classical method is used for the camera calibration. For the projector calibration, however, each step is precisely detailed, namely the 3D point generation, the cone estimation, and the estimation of the 3D points obtained from the intersection of the cone and its projection in the camera image plane. The main limitation of this approach is the detection of the projector contour. Indeed, since the boundary between the light and dark zones is not well defined, a detection method based on a brightness threshold will be imprecise. This could be solved using a laser cone projector; however, given the experimental conditions and the fact that the system will be carried by divers, it would be dangerous for their eyes. An active contour model such as that presented in ref. [49] could be used instead.
For the validation of the method, a Nikon D7000 reflex camera with an underwater housing was used, providing a good resolution (2464 × 1632 pixels) as well as the possibility of adjusting the many settings available on this type of camera. However, its size remains substantial. The projector (DIVEPRO M35) has a wide aperture angle and needs a custom-made cylinder to reduce this angle; a dedicated projector with a suitable aperture might be better. Also, to explore a new configuration of this system, some simulations were made with one projector and four cameras (Figure 30) to solve the error of the transition zone (see Figure 19). Another idea is to use several projectors, with sensors localizing them with respect to one or more cameras, in order to have different configurations and measurement redundancy.
An embedded computer would allow the detection and the 3D reconstruction to be performed directly on board.
For the camera, the results in dry conditions with a standard chessboard calibration pattern are satisfactory, with reprojection errors of less than 1/10 of a pixel. However, for underwater calibration, the results depend on water turbidity. A solution could be to measure the refraction index of the water and then apply a correction to the camera parameters. For the projector, the same kind of remarks can be made. A first estimation of the cone parameters was performed in dry conditions, allowing an estimation of the errors on its parameters, namely 8% for the angle and 1% for its position with respect to the camera. Performing this calibration in dry conditions and knowing the refraction index could provide the information needed to compute the new value of the aperture angle in water, avoiding complex experimentation. This calibration also depends on the camera calibration; therefore, if the camera calibration is correct, the projector calibration will be accurate.
The experimentation was conducted in dry conditions with known dimensions of the "cave" (the aqueduct). The plane estimations were obtained with mean deviations of less than 2.5 cm, the distance between the planes was estimated with a mean deviation of less than 3 cm, and the angle between the normals of the two planes is close to zero. Validation of the method in wet conditions should therefore also be achievable. However, experimentation must still be performed in wet conditions, first in a pool with good visibility and then in a real cave. It should be remembered that access to the sites depends on administrative authorizations and especially on weather conditions; currently, heavy rains make them impassable. Experiments were carried out in a pool a few months ago, but unfortunately the turbidity was such that the detection of the halo contours was problematic, which did not allow these results to be presented. A new experiment is planned in the coming weeks.