1. Introduction
Advancements in low-altitude remote sensing and image analysis techniques have revolutionized the digitizing of real-world objects, initially represented by point clouds [
1]. Over the past decade, there has been a notable increase in the number of studies examining the utilization of unmanned aerial vehicle (UAV) image technology for surveying and inspection, which has been extensively documented in the literature [
2,
3,
4,
5]. UAV photogrammetry exhibits immense potential for built environment inspections and surveys thanks to multisource data acquisition, efficient data collection, rapid observation, relatively low costs, and multidimensional data representation. However, a major challenge lies in the noise introduced during data capture and 3D reconstruction [
6]. The transformation of a noisy point cloud into its unknown noise-free state is an inherently ill-posed problem. This noise significantly affects the accuracy and usability of UAV images, hindering their effectiveness in real-world applications. Over the past decade, the use of photogrammetry for digital 3D recording has expanded significantly. Advances in computer vision and modern computing technologies have addressed photogrammetry’s long-standing limitations by accelerating processing times and enabling automation. The adoption of automatic structure from motion (SfM) technology has gradually shifted the focus from using laser scanner technology for 3D measurement in scientific applications to a growing reliance on photogrammetry. Despite numerous research efforts, point cloud denoising remains challenging [
7,
8,
9]. With the integration of computer vision techniques, point cloud processing has become faster and more efficient, addressing many limitations of traditional photogrammetry.
Through the application of computer vision techniques, point cloud technology for 3D recording has made significant advancements and has become a key tool in surveying and structural monitoring. Various applications are highlighted in the literature, such as structural monitoring of historical buildings, generating 3D models for volume calculations, and creating metric maps for use in mining estimation [
10,
11,
12,
13,
14]. Denoising point clouds is a crucial step in many applications like object recognition and autonomous navigation. While significant progress has been made in utilizing artificial intelligence for these tasks, challenges remain. Several studies have explored point cloud denoising, employing advanced computer vision techniques and deep learning architectures. For instance, Bai et al. [
15] introduced SM-HFEGCN, a graph convolutional network designed to enhance point cloud understanding by incorporating scale measurement and high-frequency enhancement. While their approach effectively captures local geometric relationships and addresses limitations in representing the overall spatial scale of local graphs, it is primarily focused on point cloud classification and segmentation tasks. The method emphasizes the integration of spatial scale features and high-frequency information to capture node variations, which improves the representation of differences and similarities between nodes. However, despite its contributions, SM-HFEGCN does not directly address the challenge of noise reduction in point cloud data, particularly in the context of enhancing 3D reconstruction. Wu et al. [
16] proposed the Plant-Denoising-Net, a deep learning-based approach designed to address the specific challenges of plant point clouds, such as uneven density, incompleteness, and diverse noise types. Plant-Denoising-Net utilizes a density gradient learning approach and incorporates three key modules: the Point Density Feature extraction module, the Umbrella Operator Feature computation module, and the density gradient estimation module. While Plant-Denoising-Net achieves state-of-the-art performance in denoising plant point clouds, with improvements of 7.6–19.3% under Gaussian noise and notable computational efficiency, its application is tailored to plant-phenotyping scenarios. Consequently, its generalizability to other domains, such as built environment point clouds or UAV-based 3D reconstructions, remains unproven and unlikely. This highlights the need for approaches capable of addressing noise in more diverse and geometrically complex datasets. Sohail et al. [
17] reviewed the application of deep transfer learning and domain adaptation in addressing these issues, particularly for tasks such as denoising, object detection, semantic labeling, and classification. While these approaches have effectively mitigated noise and enhanced point cloud data quality, they often rely on pre-trained models and fine-tuning strategies that do not generalize to complex or large-scale datasets. Moreover, their performance can degrade in scenarios with partial overlap or outliers, as seen in sensor-acquired point clouds. Although combining these approaches with traditional machine learning methods has shown promise in addressing these limitations, existing frameworks still struggle with computational inefficiency and inconsistent results in complex applications. These challenges underscore the need for more robust and scalable solutions to improve point cloud quality, particularly in geometrically complex and noisy datasets like those encountered in cultural heritage preservation. Zhang et al. [
18] conducted a comprehensive survey of point cloud completion methods, categorizing them into four primary approaches: point-based, convolution-based, GAN-based, and geometry-based methods. While these techniques have significantly improved with advancements in deep learning, challenges remain in enhancing their robustness, computational efficiency, and ability to capture intricate geometric details. This study highlighted the current methods’ limitations, such as noise sensitivity and high computational complexity, that hinder their effectiveness in practical applications. Despite these advancements, existing approaches often fall short in addressing complex scenarios, necessitating further exploration of novel architectures and techniques to better meet real-world demands. These limitations emphasize the importance of developing more accurate and efficient point cloud completion methods, particularly in domains requiring precise geometric reconstructions. Zhu et al. [
19] conducted the first comprehensive survey of point cloud data augmentation methods, categorizing them into a taxonomy framework comprising basic and specialized approaches. These methods are essential for addressing challenges such as overfitting and limited diversity in training datasets, which are common in point cloud processing tasks. Despite their wide application, the study identified several limitations, including the lack of standardization in augmentation techniques and their varying effectiveness across different tasks. The research highlights the importance of selecting appropriate augmentation methods tailored to specific applications and suggests future directions to improve their robustness and scalability. These findings underscore the necessity of advancing augmentation techniques to support the growing demands of deep learning in point cloud analysis.
It is important to note that the accuracy required for data collection and processing in photogrammetry depends significantly on the intended purpose. For instance, when generating 3D models for applications such as augmented reality or basic web visualization in non-scientific contexts, achieving high levels of accuracy may not be essential. However, for applications where precise data are critical, such as condition assessment or structural analysis, optimizing the dataset through advanced processing techniques, including 3D mesh decimation, becomes a necessary step to ensure reliability. In the field of cultural heritage (CH), photogrammetry has a wide range of applications [
20,
21,
22]. Its speed of acquisition and the portability of the equipment make it a highly versatile technology, suitable for various uses. For the condition assessment of CH, it is essential to accurately compare the current state of a structure with its previous condition. Since revisiting and surveying CH as it existed in the past is impossible, reducing noise to generate the most accurate 3D model from available periodic survey data becomes essential. In cases where damage is identified, an accurate model of the structure's past state, with minimal noise, is crucial for understanding the extent of the damage, its severity, and the rate of progression. This highlights the importance of improving the accuracy of available point cloud data for CH [
23,
24,
25].
The accuracy of the model is influenced by specific photogrammetric constraints. One of the most significant factors impacting output accuracy in several studies is the angle formed between homologous rays captured by different cameras [
26,
27,
28]. In general, a larger angle (within a certain range) results in higher achievable accuracy. Kraus’s research demonstrates a direct proportional relationship between the Base/Height ratio and accuracy [
29]. While numerous studies have investigated models to improve point cloud accuracy [
30,
31], they often overlook the specific challenges of condition assessment. These studies primarily focus on optimizing the ideal datasets for accurately and efficiently reconstructing 3D models, without accounting for the practical limitations of condition assessment. In such scenarios, having the most accurate datasets takes precedence, even if creating a precise 3D model with the available data is not feasible. This paper aims to fill this gap by introducing a novel approach based on deep learning clustering models to optimize various SfM parameters, enhancing 3D reconstruction accuracy specifically for CH reconstruction and monitoring applications. Unlike traditional methods that focus on a single accuracy-related parameter, this approach simultaneously considers several calculated parameters within the latent space of a variational autoencoder model. This makes it possible to minimize the influence of outlier data or noise while uncovering the most significant patterns and structures in the data. Noise reduction is the process of eliminating random variations or irrelevant data points that do not contribute to the accurate representation of the object or scene in the data. In this approach, several AI models, which are typically used for outlier detection, are specifically employed to identify data points that deviate significantly from the general pattern or distribution of the dataset, thereby reducing noise.
To do so, first, different accuracy-related parameters are analyzed separately to demonstrate that relying on a single parameter is insufficient. Then, the proposed methodology, which applies four different clustering models in the latent space of a variational autoencoder (VAE), is implemented to enhance the accuracy of point cloud data and to identify the most effective clustering algorithm for accuracy enhancement. A case study is used to showcase the robustness of the new method.
The methodology presented in this study, combining VAE with clustering algorithms for improving the accuracy of point cloud data, has broad applicability across various fields. Accurate point clouds improve the accuracy of existing digital models of historical structures or infrastructures such as bridges, aiding in structural integrity assessments, conservation planning, and restoration efforts. More precise models guide restoration work, ensuring that interventions align with historical accuracy and preserve the integrity of built environments [
32,
33,
34]. Enhanced point clouds are a pivotal tool in geotechnical engineering and environmental monitoring, facilitating the analysis of slope stability, landslides, and other geological phenomena. Their application extends to tracking environmental changes, such as forest canopy dynamics and shoreline erosion. The increased precision in terrain and environmental modeling enhances safety protocols, supports the development of preventive measures, and aids in the sustainable management of natural resources and climate change mitigation efforts [
35,
36,
37,
38]. In disaster management and recovery, enhanced point clouds enable high-resolution damage assessments of infrastructure, including buildings and transportation networks, post-natural disasters. These assessments allow for the efficient prioritization of recovery operations and resource allocation, significantly reducing the time required for disaster response and rehabilitation planning [
39]. Enhanced point clouds are integral to object detection, environmental mapping, and navigation systems. They provide the high-fidelity spatial data necessary to improve situational awareness, reliability, and the overall safety of autonomous systems, ensuring optimal performance under real-world conditions [
40]. For 3D printing and additive manufacturing, enhanced point clouds provide the detailed geometric data required to fabricate electronic components such as antennas, sensors, and circuit boards. Their higher accuracy ensures that printed components adhere to precise design specifications, resulting in improved performance and quality in additive manufacturing processes [
41]. In component design and reverse engineering, point cloud data support the creation of detailed 3D models of electronic components, including connectors, enclosures, and housings. The precision afforded by enhanced point clouds accelerates the prototyping process, enables optimized design workflows, and facilitates the reverse engineering of existing products. By replicating intricate geometries with high fidelity, they allow for the comprehensive analysis and reproduction of original designs [
42].
This article is organized as follows: In
Section 2, a brief explanation of the various accuracy parameters studied is provided, along with the presentation of the new methodology for optimizing point cloud data.
Section 3 introduces a case study, where different parameters are analyzed separately to demonstrate their limitations in analyzing the model, and the robustness and the accuracy of the new method are presented. In
Section 4, the robustness and accuracy of the new method are discussed across different clustering algorithms. Finally, the conclusions are drawn in
Section 5.
2. Materials and Methods
The new method for optimizing the point cloud utilizes several accuracy parameters applied during both the acquisition phase and the image processing phase. The data related to these accuracy parameters are then analyzed using deep learning models, which cluster the optimized datasets. The dataset is obtained through photogrammetric processing in Agisoft Metashape [43], an SfM software package that processes digital image sets and produces numerous outputs such as point clouds, 3D models, orthophotos, contour lines, DEMs, and more. Some of the parameters used are geometric parameters related to the acquisition phase (intersection angle and number of images), while the remaining ones are numerical parameters extracted from the SfM processing and are, therefore, potentially dependent on the software used (reprojection error and projection accuracy).
2.1. Accuracy Parameters
2.1.1. Reprojection Error
The first parameter calculated is the reprojection error, a geometric error that represents the image distance between a projected point and its corresponding measured point. This error is used to evaluate how accurately a 3D point estimate replicates the true projection of the point. To compute the 3D coordinates of the tie point, the camera’s internal and external orientation parameters, along with the image coordinates of the point, are utilized. The reprojection error estimation can be seen in
Figure 1.
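As an illustration, the reprojection error can be sketched with a simple pinhole camera model: the estimated 3D tie point is projected back into an image using the camera's interior and exterior orientation, and the pixel distance to the detected key point is measured. This is a minimal example with illustrative values, not the actual SfM implementation, and it ignores lens distortion.

```python
import numpy as np

def reproject(X, K, R, t):
    """Project a 3D point X (world frame) into pixel coordinates using a
    pinhole camera with intrinsics K, rotation R and translation t."""
    x_cam = R @ X + t              # world -> camera coordinates
    x_img = K @ x_cam              # camera -> homogeneous image coordinates
    return x_img[:2] / x_img[2]    # perspective division -> pixel coordinates

def reprojection_error(X, K, R, t, p_measured):
    """Euclidean image distance between the projected and the measured point."""
    return float(np.linalg.norm(reproject(X, K, R, t) - p_measured))

# Illustrative camera and tie point (placeholder values only)
K = np.array([[2400.0, 0.0, 960.0],
              [0.0, 2400.0, 640.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                        # camera aligned with the world axes
t = np.array([0.0, 0.0, 5.0])        # camera 5 m away from the origin
X = np.array([0.4, -0.2, 0.0])       # estimated 3D tie point
p_meas = np.array([1152.3, 543.8])   # detected key point (pixels)
print(f"reprojection error: {reprojection_error(X, K, R, t, p_meas):.2f} px")
```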
2.1.2. The Angle Between Homologous Points
In this work, the Base/Height ratio is analyzed by estimating the angle between the two lines of sight that generate a 3D point, referred to as the intersection angle or the angle between homologous points, for the k-th tie point seen from two images i and i + 1 (see
Figure 2).
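A minimal sketch of this computation is shown below: the intersection angle is the angle between the two rays joining each camera's projection center to the tie point. The camera positions and the point are illustrative placeholders; for tie points seen in more than two images, the pairwise angles can be averaged.

```python
import numpy as np

def intersection_angle(cam_center_i, cam_center_j, tie_point):
    """Angle (degrees) between the two viewing rays that generate a tie point,
    i.e. the rays from each camera's projection center to the 3D point."""
    r1 = np.asarray(tie_point) - np.asarray(cam_center_i)
    r2 = np.asarray(tie_point) - np.asarray(cam_center_j)
    cos_a = np.dot(r1, r2) / (np.linalg.norm(r1) * np.linalg.norm(r2))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

# Illustrative example: two cameras 4 m apart observing a point about 10 m away
c_i = np.array([0.0, 0.0, 10.0])
c_j = np.array([4.0, 0.0, 10.0])
p_k = np.array([2.0, 1.0, 0.0])
print(f"intersection angle: {intersection_angle(c_i, c_j, p_k):.1f} deg")
```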
2.1.3. Number of Images
Another estimated parameter is the number of images, which is the number of photogrammetric shots of the scene that have contributed to the reconstruction of the tie point in object space. This parameter is expressed as $n_{j}^{TP_i}$, the number of cameras used for the reconstruction of the $i$-th tie point.
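For illustration, this count can be obtained from a generic list of (tie point, camera) observations exported from the SfM project; the hypothetical observation list below is a placeholder, and the counting itself is independent of the software used.

```python
from collections import defaultdict

# Hypothetical observation pairs (tie_point_id, camera_id) exported from the project
observations = [(0, "IMG_001"), (0, "IMG_002"), (0, "IMG_005"),
                (1, "IMG_002"), (1, "IMG_003")]

cameras_per_tp = defaultdict(set)
for tp_id, cam_id in observations:
    cameras_per_tp[tp_id].add(cam_id)

# Number of cameras contributing to the reconstruction of each tie point
n_tp = {tp_id: len(cams) for tp_id, cams in cameras_per_tp.items()}
print(n_tp)   # {0: 3, 1: 2}
```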
2.1.4. Projection Accuracy
Another estimated parameter is projection accuracy, which allows us to recognize less reliable tie points. The projection accuracy parameter in Agisoft Metashape measures how precisely a tie point is positioned relative to its neighboring points within the point cloud. This precision is influenced by the scale at which the points were identified during processing. Metashape leverages scale information to adjust the weighting of reprojection errors for tie points, assigning higher or lower importance depending on the detail level at which the point is detected. The Sigma (σ) parameter determines the scale of key points, which represents the degree of Gaussian blur applied at a specific level of the scale pyramid. This parameter incorporates the local context of each point, affecting the treatment of reprojection errors and improving the robustness of the 3D reconstruction.
In essence, the projection accuracy parameter enhances the quality of the 3D model by balancing errors according to the resolution and scale at which the tie points are identified. This provides essential insights into the spatial consistency of the point cloud.
While the exact mathematical formula for projection accuracy in Metashape is proprietary, it aligns with the principles of photogrammetry and computer vision. The relationship can be summarized as follows:
$\mathrm{Error}_{proj} = w \cdot \lVert P_{meas} - P_{proj} \rVert$
where $\mathrm{Error}_{proj}$ represents the weighted reprojection error, $P_{meas}$ denotes the position of the detected point in the image (measured point), and $P_{proj}$ is the projected point's position calculated from the 3D model. The symbol $\lVert \cdot \rVert$ indicates the Euclidean distance between the measured and projected points. The parameter $w$ is the weight assigned to the reprojection error, determined by the scale (σ) of the SIFT level at which the tie point is detected.
In Metashape, the weight is proportional to the scale of the key point, which corresponds to the scale pyramid level where the point was identified. Points detected at higher pyramid levels (more detailed scales) contribute more significantly to the model’s computation. This approach ensures that points identified with greater local precision have a more substantial impact on projection and model optimization than those identified at coarser scales. By incorporating these principles, Metashape refines the 3D reconstruction process, emphasizing the spatial consistency and accuracy of the resulting model.
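Since the exact formula is proprietary, the following sketch only illustrates the idea of scale-dependent weighting; the inverse-scale weight w = 1/σ is an assumption for demonstration, not Metashape's actual formulation.

```python
import numpy as np

def weighted_reprojection_error(p_measured, p_projected, sigma):
    """Hypothetical weighting scheme: errors of key points detected at coarser
    scales (larger sigma) are down-weighted, mimicking the idea that finer-scale
    detections are more reliable. The 1/sigma weight is an assumption."""
    w = 1.0 / sigma
    return w * float(np.linalg.norm(np.asarray(p_measured) - np.asarray(p_projected)))

# For the same 2-pixel raw error, a key point found at a fine pyramid level
# (sigma = 1) counts more than one found at a coarse level (sigma = 4).
print(weighted_reprojection_error((100.0, 50.0), (102.0, 50.0), sigma=1.0))  # 2.0
print(weighted_reprojection_error((100.0, 50.0), (102.0, 50.0), sigma=4.0))  # 0.5
```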
2.1.5. Camera Distance—Tie Point
The last value taken into account in the analysis is the camera–tie point distance, which refers to the distance between the projection center of the i-th camera and the j-th tie point observed in the i-th image.
Except for the reprojection error and projection accuracy, the other accuracy parameters depend heavily on the image acquisition phase, causing their values to vary significantly between projects.
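For completeness, a minimal sketch of the camera–tie point distance (the coordinates are illustrative placeholders):

```python
import numpy as np

def camera_tie_point_distance(camera_center, tie_point):
    """Euclidean distance between the i-th camera's projection center
    and the j-th tie point it observes."""
    return float(np.linalg.norm(np.asarray(tie_point) - np.asarray(camera_center)))

print(camera_tie_point_distance([0.0, 0.0, 12.0], [2.5, -1.0, 0.0]))  # ~12.3 m
```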
2.2. Methodology
This article introduces a novel noise reduction method to optimize the 3D reconstruction models of CH and enhance the accuracy of damage detection models based on point cloud data. Rather than relying on a single accuracy-related parameter, the method simultaneously evaluates all calculated parameters. It develops datasets of the most accurate 3D points, considering the availability of nodes in the point cloud.
First, the new model uses a VAE to reduce the data dimensionality from five accuracy parameters to two synthetic features.
VAE is a type of neural network used for dimensionality reduction, feature extraction, and generative modeling. Like a traditional autoencoder, it consists of two parts: an encoder that maps the input data to a probabilistic latent space by learning the parameters of a probability distribution (typically a Gaussian) and a decoder that reconstructs the original input from a sampled latent representation. The VAE aims to learn an efficient representation of the data and ensure that the latent space follows a predefined probabilistic structure, enabling meaningful sampling and interpolation. The encoder and the decoder are defined as multilayer perceptrons (MLPs). A layer of the MLP encoder $E_F$ is
$E_F(X) = \sigma(WX + B)$
where σ is an element-wise activation function, W is a weight matrix, and B is a bias vector. The analyzed features for each data point (X) in the input dataset of the MLP model consist of five elements, representing the accuracy parameters detailed in the previous section. Each row corresponds to the geometry of a 3D point within the point cloud. In the latent space of the proposed model, the feature dimensions are reduced from the original five input columns to two features. Reducing the feature dimensions and leveraging the probabilistic nature of a VAE offers several advantages and enhances the applicability of the method, as outlined below:
- By compressing the data into a probabilistic latent space, the VAE not only reduces computational requirements but also facilitates sampling from the latent space, making it suitable for big data applications such as point cloud processing, which is the primary focus of this study. This improvement increases the model's scalability and versatility.
- The VAE transforms complex, diverse features from various factors into a smaller, cohesive set of probabilistic latent representations, improving interpretability and usability and enabling meaningful interpolations between data points.
- The latent space representation generated by the VAE simplifies the data, removes noise, and provides a structured probabilistic foundation, enhancing the performance of downstream tasks such as clustering and anomaly detection.
- The VAE's latent space enables the detection of meaningful patterns, including nonlinear and probabilistic relationships, that may not be apparent in the original dimensions. This feature allows for more insightful analysis and the generation of new synthetic data samples.
- Unlike traditional autoencoders, the VAE provides generative capabilities, enabling the creation of realistic new data samples from the latent space. This feature is particularly useful for augmenting datasets or exploring variations in the data. While dataset augmentation is not applied in this study, it represents a potential future direction for the authors' research.
- The VAE can be trained to handle missing data by learning the distribution of the data and reconstructing missing values. While this is not the focus of the current research, it represents a promising avenue for future work.
These factors collectively make the VAE an effective tool for enhancing the optimization of point cloud data for damage detection.
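A minimal sketch of such a VAE is shown below. It assumes PyTorch (this study reports only Python and scikit-learn and does not name the deep learning framework), and the hidden width, dropout rate, learning rate, and epoch count are placeholder values; the 5→2 compression, ReLU activations, batch normalization, dropout, and MSE + KL loss follow the architecture described here and in Section 3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal VAE: five accuracy parameters -> 2-D probabilistic latent space."""
    def __init__(self, in_dim=5, hidden=32, latent=2, dropout=0.1):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.BatchNorm1d(hidden),
            nn.ReLU(), nn.Dropout(dropout))
        self.mu = nn.Linear(hidden, latent)       # mean of q(z|x)
        self.logvar = nn.Linear(hidden, latent)   # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent, hidden), nn.BatchNorm1d(hidden),
            nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    """Reconstruction loss (MSE) plus Kullback-Leibler divergence to N(0, I)."""
    rec = F.mse_loss(x_hat, x, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# Training sketch on the standardized five accuracy parameters of the tie points
features = torch.randn(1024, 5)          # placeholder for the real feature matrix
model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(50):
    x_hat, mu, logvar = model(features)
    loss = vae_loss(features, x_hat, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()

model.eval()
with torch.no_grad():                    # 2-D latent features used for clustering
    latent = model.mu(model.enc(features)).numpy()
```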
In this research, after applying the VAE, four main clustering machine-learning algorithms are employed in its latent space to compare and observe their robustness. The first algorithm is k-means clustering, which partitions data based on similarity. It operates by assigning each data point to the nearest cluster centroid and iteratively updating the centroids until convergence. To minimize the within-cluster variance, the objective is to find
$\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^{2}$
where $S$ represents the set of clusters, $k$ is the number of clusters, $\mu_i$ is the mean point of the $i$-th cluster, and $x$ denotes the data points. The k-means algorithm is suitable for applications where the number of clusters is optimized, making it ideal for point cloud optimization when performing full 3D reconstruction of an entire structure. In the context of the VAE's latent space, k-means can be effective for global damage detection, where the data are relatively well separated and the cluster centroids represent general patterns. However, k-means clustering assumes that clusters are spherical and of similar size, which may limit its effectiveness in more complex, non-linear data distributions often present in real-world datasets.
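Continuing the sketch above, the k-means step on the 2-D latent features can be reproduced with scikit-learn roughly as follows; the configuration mirrors the settings reported in Section 3, and `latent` is a random placeholder standing in for the encoder output.

```python
import numpy as np
from sklearn.cluster import KMeans

latent = np.random.rand(10000, 2)   # placeholder for the VAE's 2-D latent features

kmeans = KMeans(n_clusters=10, n_init=10, tol=1e-4, random_state=0)
labels_km = kmeans.fit_predict(latent)   # hard cluster assignment per tie point
centroids = kmeans.cluster_centers_      # general patterns in the latent space
```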
The Gaussian mixture model (GMM) allows for the creation of an optimized point cloud not only useful for full 3D reconstruction but also for detecting specific local damages. Moreover, it enables the assessment of global damage using a smaller, highly accurate subset of tie points. GMM is a probabilistic model that assumes the data are generated from a mixture of several Gaussian distributions. This method is particularly useful for data that may have overlapping clusters or complex distributions, as it allows for soft clustering where data points can belong to multiple clusters with varying probabilities. In the latent space of the VAE, GMM can be beneficial for detecting subtle variations in the data, but it may not perform as well as k-means or agglomerative clustering when the data are imbalanced or when the model is not well tuned. Despite this, GMM can still offer valuable insights for applications where the relationships between data points are more probabilistic and less deterministic. GMM with the formulation of the posterior distribution is given by
$p(z_n = j \mid x_n) = \dfrac{\phi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \phi_l \, \mathcal{N}(x_n \mid \mu_l, \Sigma_l)}, \qquad n = 1, \ldots, N$
where ϕ and Σ are the mixture weights and covariance matrices, N is the number of observations, and k is the number of clusters.
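A corresponding GMM sketch, again with placeholder settings mirroring those reported in Section 3 (10 components, full covariance matrices); `predict_proba` exposes the soft assignments discussed above.

```python
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=10, covariance_type="full",
                      n_init=10, tol=1e-4, random_state=0)
labels_gmm = gmm.fit_predict(latent)   # most probable component per tie point
resp = gmm.predict_proba(latent)       # soft assignments (posterior probabilities)
```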
Spectral clustering is another method employed in this research to analyze the latent space. This algorithm is a graph-based clustering technique that uses the eigenvalues of a similarity matrix to reduce the dimensionality of the data before applying a clustering algorithm like k-means. This method is particularly effective for identifying non-linear relationships in the data, making it well-suited for complex datasets where clusters are not necessarily spherical. In the VAE’s latent space, Spectral clustering can capture more intricate patterns and relationships, especially when the data exhibit non-convex shapes or varying densities. It is particularly useful when the underlying structure of the data is complex, and traditional methods like k-means may fail to capture the nuances of the distribution. However, Spectral clustering can be computationally expensive, especially for large datasets, and requires careful selection of the similarity measure and the number of clusters. Its ability to leverage graph theory makes it particularly useful in point cloud processing when the relationships between points are non-linear or when identifying regions of interest within a complex structure. In addition, as it is able to perform soft clustering, it is a good choice for local damage detection.
Agglomerative hierarchical clustering is the fourth method considered in this research. This bottom-up approach starts by treating each data point as its own cluster and iteratively merges the closest clusters based on a chosen linkage criterion until a desired number of clusters is achieved or all points are merged into a single cluster. Agglomerative hierarchical clustering is particularly suited for datasets where the relationships between data points vary at different scales, such as point cloud data. In the VAE's latent space, agglomerative clustering can provide valuable insights into damage detection, especially when the clusters exhibit hierarchical or nested structures. However, its computational complexity increases with the size of the dataset, which can be a limitation for large-scale applications.
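The two remaining algorithms can be sketched in the same way. The nearest-neighbors affinity and the Ward linkage are assumptions chosen for illustration (this study does not report these choices), and spectral clustering is run on a subsample because its affinity matrix grows quadratically with the number of tie points.

```python
from sklearn.cluster import SpectralClustering, AgglomerativeClustering

# Spectral clustering: graph-based, able to capture non-convex cluster shapes
spectral = SpectralClustering(n_clusters=10, affinity="nearest_neighbors",
                              n_neighbors=30, assign_labels="kmeans",
                              random_state=0)
labels_sp = spectral.fit_predict(latent[:5000])   # subsample for tractability

# Agglomerative (bottom-up) hierarchical clustering with Ward linkage
agglo = AgglomerativeClustering(n_clusters=10, linkage="ward")
labels_ag = agglo.fit_predict(latent[:5000])
```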
The clustered data are then compared using evaluation metrics to analyze their robustness. Since there are no ground truth or labeled data available due to the nature of this study, external validation metrics cannot be applied. Therefore, three internal evaluation metrics are considered in this research.
The Silhouette Score is a measure of how similar each data point is to its cluster compared to other clusters. It combines both cohesion and separation. A higher Silhouette Score indicates better-defined clusters.
The Calinski–Harabasz Index measures the ratio of the sum of between-cluster dispersion to within-cluster dispersion. A higher value indicates better-defined clusters, with more separation between them.
The Davies–Bouldin Index evaluates the average similarity between each cluster and its most similar counterpart. A lower Davies–Bouldin score indicates better clustering, as it reflects smaller intra-cluster distances and larger inter-cluster distances.
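Continuing the sketches above, the three internal metrics are available directly in scikit-learn and can be computed for each labeling of the latent features (the variables `latent`, `labels_km`, and `labels_gmm` are the placeholders introduced earlier).

```python
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

def internal_validation(X, labels):
    """Internal clustering metrics: higher Silhouette and Calinski-Harabasz
    scores and a lower Davies-Bouldin score indicate better-defined clusters."""
    return {"silhouette": silhouette_score(X, labels),
            "calinski_harabasz": calinski_harabasz_score(X, labels),
            "davies_bouldin": davies_bouldin_score(X, labels)}

for name, labels in {"k-means": labels_km, "GMM": labels_gmm}.items():
    print(name, internal_validation(latent, labels))
```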
For implementing the algorithm, the Python programming language (version 3.11.1) and the Scikit-learn library were used. A summary of the proposed methodology can be seen in
Figure 3.
3. Results
The case study for this work is the Temple of Neptune, a Greek temple located in Paestum, Campania, Italy. Constructed in the fifth century B.C.E., the temple features six front columns and fourteen side columns. As one of the three best-preserved temples in the Greek world, it was surveyed using aerial photogrammetry by UAV in 2017 (see
Figure 4).
The complex spatial articulation of the geometries makes the Temple of Neptune an ideal subject for evaluating the robustness of the new methodology. The UAV utilized for the survey was a hexacopter equipped with a three-axis gimbal and an Alpha 6500 camera (Sony Corporation, Tokyo, Japan), capturing a total of 908 photogrammetric images. A GNSS network with 11 Ground Control Points was incorporated to estimate the internal orientation parameters in Agisoft Metashape through a self-calibrating bundle adjustment. For the analysis, a standard section was selected, highlighted in red in
Figure 5.
Gujski et al. [
44] analyzed accuracy parameters independently to demonstrate that relying on a single parameter is ineffective for noise reduction in point cloud data. In their study, they considered the 90th percentiles for reprojection errors (see
Figure 6a), angles greater than 10° for the average intersection angle (see
Figure 6b), and the use of more than 10 cameras for reconstructing each 3D point (see
Figure 6c). They identified an optimal threshold for noise reduction at a projection accuracy of 10 (see
Figure 6d). While increasing this threshold further reduces noise, it comes at the cost of losing valuable data and compromising the overall data integrity. This leads to reduced cloud density, negatively impacting the reconstructed object’s descriptive quality. A visualization of the point cloud corresponding to single-parameter analysis is shown in
Figure 6.
To implement the new model, the data are first reduced to two dimensions using a VAE model. In the latent space of the encoder, clustering algorithms are applied. The hyperparameters of the VAE model used in this study are detailed in
Table 1. The VAE architecture incorporates a probabilistic framework to map input data to a latent space, enabling both dimensionality reduction and generative capabilities. The encoder and decoder networks are designed with intermediate layers that utilize the ReLU (Rectified Linear Unit) activation function. ReLU introduces non-linearity by outputting the input directly if it is positive and zero otherwise. This choice of activation function is computationally efficient and helps mitigate the vanishing gradient problem, ensuring effective training of the deep learning model. Batch normalization and dropout regularization are applied to improve generalization and prevent overfitting. The encoder compresses the input data into a two-dimensional latent space, optimized for visualization and clustering tasks. The loss function combines reconstruction loss (mean squared error) with the Kullback–Leibler divergence, which ensures that the learned latent space approximates a standard normal distribution. This probabilistic framework allows the VAE to generate meaningful representations and handle noise effectively.
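Written out, the loss takes the standard VAE form (the relative weighting of the two terms used in this study is not reported, so a unit weight is assumed here):
$\mathcal{L}(x) = \underbrace{\lVert x - \hat{x} \rVert^{2}}_{\text{reconstruction (MSE)}} + \underbrace{D_{\mathrm{KL}}\!\left(q(z \mid x)\,\Vert\,\mathcal{N}(0, I)\right)}_{\text{KL regularization}}, \qquad D_{\mathrm{KL}} = -\tfrac{1}{2}\sum_{d=1}^{2}\left(1 + \log\sigma_d^{2} - \mu_d^{2} - \sigma_d^{2}\right)$
where $\hat{x}$ is the reconstructed input and $\mu_d$ and $\sigma_d^{2}$ are the mean and variance produced by the encoder for the $d$-th latent dimension.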
It is important to note that all clustering algorithms are configured with identical parameters. Specifically, the number of clusters is set to 10, with 10 initializations and a tolerance of 1 × 10⁻⁴, which determines the stopping criterion for the algorithm. A lower tolerance value indicates a stricter convergence requirement. The covariance type is set to "full," meaning that each cluster is modeled with its full covariance matrix, offering greater flexibility in capturing the data's shape and fitting it to the assigned clusters. Additionally, the initial weight settings are defined to provide reasonable initial estimates for cluster assignments and distribution parameters, which contribute to the overall performance and stability of the clustering process. The hyperparameters for each clustering model are selected through a combination of random search and experimental analysis. The experimental analysis is conducted by systematically testing various hyperparameter configurations and selecting the ones that yield the best results based on clustering evaluation metrics. This approach ensures that the chosen hyperparameters are optimal for each model and dataset. The analysis of clustering algorithms reveals that the average values of the parameters differ significantly between the four methods and that the distribution of clusters produced by each algorithm varies when applied to the two features generated by the VAE model. The data points are clustered using four different clustering models, and the resulting clusters are depicted in
Figure 7, with each cluster represented by one of ten distinct color tones.
The comparison of cluster information is presented in
Table 2,
Table 3,
Table 4 and
Table 5. It shows that all clustering algorithms produce consistent results. The number of tie points identified in the clusters enhances the point cloud density while maintaining its quality, enabling a more detailed description of the object.
The results of clustering using GMM, k-means, agglomerative clustering, and Spectral clustering are evaluated across three key metrics, Silhouette Score, Calinski–Harabasz Index, and Davies–Bouldin Index, and can be seen in
Table 6. The evaluation metrics provide a quantitative basis for comparing clustering algorithms.
Silhouette Score: k-means and agglomerative clustering achieved the best scores, suggesting that they are more effective at identifying well-separated clusters.
Calinski–Harabasz Index: k-means achieved the highest score, indicating excellent inter-cluster separation.
Davies–Bouldin Index: the low Davies–Bouldin scores of k-means and agglomerative clustering confirm their ability to produce compact and distinct clusters.
The results indicate that k-means clustering is the most robust and effective method for analyzing the VAE latent space, followed closely by agglomerative hierarchical clustering. Both hard clustering methods outperform GMM and Spectral clustering in terms of cluster cohesion, separation, and overall quality. Spectral clustering can serve as a secondary choice for local damage detection with additional optimization as it is able to perform soft clustering, while GMM may not be appropriate without substantial modifications to its parameters or assumptions.
5. Conclusions
Given the lack of prior structural information, creating an accurate 3D model from available data is essential for point-cloud-based monitoring and condition assessment methods. This work proposes a novel methodology for reducing noise in tie point clouds, which is particularly valuable for the condition assessment and 3D reconstruction of cultural heritage sites. The proposed methodology is crucial for generating precise digital documentation, enabling effective comparison with the current conditions of the analyzed object and facilitating the identification of any new damage.
The proposed method introduces an innovative approach by utilizing a combination of multiple accuracy parameters rather than relying on a single metric. Initially, a variational autoencoder model reduces the five accuracy parameters to only two latent features, and in this latent space, four clustering algorithms are applied. This analysis enables the simultaneous consideration of multiple accuracy parameters, improving the overall effectiveness of noise reduction in point clouds. Additionally, this study investigates the impact of these four widely used clustering algorithms through several evaluation metrics, aiming to establish the most robust methodology for noise reduction. To validate the robustness and applicability of the proposed approach, the Temple of Neptune is employed as a case study, demonstrating its potential to preserve the accuracy and integrity of 3D reconstructions for cultural heritage sites. K-means and agglomerative hierarchical clustering methods show comparable average accuracy values across features. Spectral clustering follows these methods but offers additional advantages, such as the ability to perform soft clustering by capturing complex relationships in the data and handling non-linear boundaries more effectively.
Future directions for this work include extending the model’s application across diverse disciplines, integrating additional data sources, and refining the algorithms to handle more complex damage scenarios. The model demonstrates significant potential beyond cultural heritage, with applicability in fields such as civil engineering, urban planning, environmental monitoring, and autonomous systems, where enhanced point cloud data can greatly improve accuracy and inform decision-making processes. By improving the accuracy of these digital models, we aim to contribute to the long-term preservation and protection of valuable assets across various fields, ensuring that structures, environments, and systems can be accurately assessed, maintained, and optimized for future generations. Furthermore, implementing a Siamese neural network is proposed for future research to enhance damage detection across various fields. This approach will allow for the comparison of point cloud datasets captured at different times from the same location, enabling the analysis of temporal changes to identify structural alterations, detect new damage, and monitor ongoing deterioration effectively in contexts such as cultural heritage, civil engineering, urban planning, and environmental monitoring.