Abstract
This paper introduces a novel approach for optimizing image capture using View-Planning (VP) to enhance Gaussian splatting for 3D reconstruction of archaeological sites, specifically focusing on Castellaraccio di Monteverdi. Traditional photogrammetry often produces over-smoothed models with artifacts. Our proposed approach leverages VP to select optimal viewpoints, ensuring comprehensive image coverage, and integrates this with Gaussian splatting to produce highly realistic 3D reconstructions. Initial evaluations on sample datasets demonstrate the potential of VP-enhanced Gaussian splatting to surpass traditional methods in terms of quality. The paper details our approach, discusses the challenges of 3D reconstruction without VP, and outlines our ongoing and future work, including upcoming tests at Castellaraccio di Monteverdi. Our findings aim to contribute to the field of archaeological documentation and analysis, supporting better preservation and understanding of historical sites.
1 Introduction
The process of archaeological excavation is a careful effort to uncover and preserve historical artifacts and structures. Archaeologists start with detailed surveys, often using Geographic Information Systems (GIS) [24] to create maps and analyze the site. These surveys help determine where to dig. During the excavation, archaeologists remove soil and debris layer by layer, documenting each step with photos, sketches, and notes to keep track of where everything is found. Modern techniques like 3D reconstruction are now commonly used. Methods such as photogrammetry and structure from motion (SfM) allow for the creation of detailed 3D models of the site and artifacts.
One such site is Castellaraccio di Monteverdi, a deserted medieval village located on a hilltop 130 m above sea level, with remnants of structures hidden in the vegetation. Overlooking the Ombrone River and a ruined medieval bridge, its history dates back to the mid-12th century when it was part of the Monastery of Saint Salvatore of Giugnano and the Abbey of Saint Lorenzo all’Ardenghesca. In the 13th century, it changed hands between the Commune of Siena and various noble families, ultimately becoming abandoned by the late 13th century. Since 2017, as part of the IMPERO project [29], archaeological research has been excavating the site to map its extent, urban plan, and chronological use, uncovering early medieval phases not documented in historical records.
Photo-realistic 3D reconstruction of archaeological sites [6, 9, 10, 13, 20] has gained popularity following the development of sophisticated algorithms [5, 27, 28], affordable sensors [11, 14] and easy access to high-performance compute [26]. Such methods also provide insights into the spatial relationships, architectural details, and historical context of the structures.
A new way to enhance traditional photogrammetry and Structure from Motion (SfM) techniques involves the use of Gaussian splatting [19] and Neural Radiance Fields (NeRFs) [23] for 3D reconstruction. Gaussian splatting represents sparse point clouds as 3D Gaussians which are then optimized, leading to more efficient rendering and processing of 3D scenes. NeRFs employ neural networks to represent 3D scenes with exceptional detail and accuracy, optimizing both geometry and appearance from sparse input images.
While these 3D reconstruction methods offer high-fidelity, photo-realistic results, they require a substantial number of source images that fully cover a scene [31]. These input images can be captured using hand-held cameras [30], UAVs [32], or other autonomous platforms [7]. Typically, images are taken in a pre-configured pattern over the target scene with set overlap, resulting in uniform coverage but often leading to low resolution and artifacts.
To address this, we propose an innovative approach to optimize image capture using View Planning (VP) [8] and Gaussian splatting. The proposed method is currently under investigation, with initial experiments showing a significant increase in the quality of 3D models generated using our method. We intend to deploy the proposed approach at Castellaraccio di Monteverdi in the following year. The findings of this paper, along with datasets and code, will be open-sourced to the community following publication.
2 Related Work
2.1 Photogrammetry
In recent years, the popularity of 3D reconstruction has surged due to advancements in technology and its applications in fields such as virtual reality [6], augmented reality [33], and cultural heritage preservation [12]. One prominent method is photogrammetry, which involves using structure from motion (SfM) algorithms to create a detailed 3D model.
Photogrammetry creates 3D models by capturing multiple overlapping images of an object or scene from different angles, then using algorithms to triangulate the points where the images overlap to generate a detailed 3D representation. This process (shown in Fig. 1) involves aligning the images, extracting key features, and reconstructing the surface geometry to produce an accurate and photo-realistic 3D model. The surface reconstruction is typically performed using Poisson surface reconstruction [17], which attempts to create water-tight surfaces, leading to over-smoothed outputs. This is a common source of artifacts when the underlying point cloud is sparse or the scene has high variance in geometry.
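To make this failure mode concrete, the sketch below runs Poisson surface reconstruction through the open-source Open3D library (an illustrative choice on our part; the commercial tools used in practice expose the same step through their own interfaces). Sparse input points or weak normal estimates at this step are precisely what yields the over-smoothed, water-tight outputs described above.

```python
import open3d as o3d

# Minimal Poisson surface reconstruction [17] sketch using Open3D.
# "dense_points.ply" is a hypothetical dense point cloud from an SfM pipeline.
pcd = o3d.io.read_point_cloud("dense_points.ply")

# Poisson reconstruction requires oriented normals; poor estimates here are a
# common cause of over-smoothed, "bumpy" surfaces in the final mesh.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))

# Higher octree depth preserves more detail but demands a denser input cloud.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)
o3d.io.write_triangle_mesh("reconstructed_mesh.ply", mesh)
```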
2.2 Neural Radiance Fields (NeRFs)
Neural Radiance Fields (NeRFs) [4, 22] (Fig. 2) have gained significant attention due to their ability to produce highly detailed and realistic 3D reconstructions from 2D images. They work by training a neural network to model the way light interacts with objects in a scene. This process involves feeding the network a series of images taken from different angles, along with the corresponding camera positions which can be estimated using SfM methods. The network learns how light is absorbed, reflected, and transmitted by different surfaces in the scene.
The core idea is the use of a volumetric representation, where each point in 3D space is associated with a color and an opacity value. The network attempts to predict these values based on the input images, effectively creating a continuous function that describes the scene. By sampling many points along rays that pass through the 2D images, the network can render new views of the scene with high accuracy and detail. NeRFs have become a powerful tool in applications ranging from virtual reality to digital content creation, offering an unprecedented level of realism. This comes at the cost of training time, which is often long: a scene containing 100 images at 800\(\,\times \,\)800 resolution can take 1–2 days to train [22].
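Concretely, novel views are produced with the volume rendering integral of [22, 23]: the color of a camera ray \(\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}\) is accumulated from the predicted density \(\sigma\) and view-dependent color \(\mathbf{c}\) along the ray,

\[
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt, \qquad T(t) = \exp\!\Big(-\int_{t_n}^{t}\sigma(\mathbf{r}(s))\,ds\Big),
\]

where \(T(t)\) is the transmittance accumulated between the near bound \(t_n\) and \(t\), and \([t_n, t_f]\) are the near and far bounds; in practice the integral is approximated by quadrature over the sampled points mentioned above.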
2.3 Gaussian Splatting
Gaussian splatting [19] (Fig. 3), a newer technique in 3D reconstruction, offers competitive performance to NeRFs while being extremely fast in training and inference. It uses simple Gaussian shapes (“blobs”) to represent 3D objects, which is particularly useful for real-time applications because it reduces the amount of computing power needed. Gaussian splatting creates 3D models by approximating complex surfaces using a collection of Gaussian functions (Gaussians). These Gaussians serve as building blocks, each representing a small, localized area of the surface with parameters that define its position, size, orientation, and intensity.
The process begins by capturing multiple images of the object or scene from different angles, similar to traditional photogrammetry. Key features and points are then extracted from these images and used to define the initial positions and parameters of the Gaussians. Each Gaussian function acts like a smooth, flexible “blob” that can be adjusted to fit the shape and texture of the object’s surface accurately.
The parameters of the Gaussian functions are iteratively optimized to minimize the difference between the projected 3D model and the input images. This involves fine-tuning the position, size, and orientation of each Gaussian to ensure that the model accurately represents the underlying geometry and appearance of the scene. Its efficiency makes it ideal for situations where quick rendering is crucial, such as in virtual and extended reality applications.
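The following is a minimal 2D analogue of this optimization loop (our sketch, not the renderer of [19]; the alpha compositing is simplified to a normalized weighted sum, and all names are illustrative): each Gaussian carries a mean, a covariance built from scale and rotation, a color, and an opacity, all fit to a target image by gradient descent on a photometric loss.

```python
import torch

H = W = 64   # render resolution
N = 50       # number of 2D Gaussians

# Learnable parameters: position, per-axis scale (log), rotation, color, opacity
means = torch.rand(N, 2, requires_grad=True)
log_scales = torch.full((N, 2), -3.0, requires_grad=True)
thetas = torch.zeros(N, requires_grad=True)
colors = torch.rand(N, 3, requires_grad=True)
opacities = torch.zeros(N, requires_grad=True)  # pre-sigmoid

ys, xs = torch.meshgrid(torch.linspace(0, 1, H),
                        torch.linspace(0, 1, W), indexing="ij")
pix = torch.stack([xs, ys], dim=-1).reshape(-1, 2)  # (H*W, 2) pixel centers

def render():
    # Build each 2x2 covariance from rotation R and scale S: cov = R S S^T R^T
    c, s = torch.cos(thetas), torch.sin(thetas)
    R = torch.stack([torch.stack([c, -s], -1), torch.stack([s, c], -1)], -2)
    S = torch.diag_embed(torch.exp(log_scales))
    cov = R @ S @ S.transpose(-1, -2) @ R.transpose(-1, -2)
    d = pix[:, None, :] - means[None, :, :]                    # (H*W, N, 2)
    md = torch.einsum("pni,nij,pnj->pn", d, torch.linalg.inv(cov), d)
    w = torch.sigmoid(opacities) * torch.exp(-0.5 * md)        # splat weights
    # Real splatting composites depth-ordered Gaussians; this toy normalizes.
    img = (w[..., None] * colors[None]).sum(1) / (w.sum(1, keepdim=True) + 1e-6)
    return img.reshape(H, W, 3)

target = torch.rand(H, W, 3)  # stand-in for a photograph of the scene
opt = torch.optim.Adam([means, log_scales, thetas, colors, opacities], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.l1_loss(render(), target)
    loss.backward()
    opt.step()
```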
2.4 View-Planning
For large-scale 3D reconstruction of sites, especially with UAVs, a set of waypoints is chosen that maximizes the amount of information received by the onboard sensor (monocular cameras, LiDARs). These waypoints tend to follow grid-based or circular patterns. However, the regularity of the path may cause some areas to be occluded. These occlusions create gaps or reconstruction artifacts that would require additional data acquisition steps, which is not economical. Minimizing the number of views required to cover an object is called the View Planning problem, and choosing a view online is called Next-Best View selection [25]. Traditionally, VP algorithms utilize an octree approach whereby 3D locations are voxelized and queried as a tree. Each voxel is tagged as unmapped, free, or occupied. At every iteration the planner (PRM [16], RRT [21], RRT* [15]) creates a path that maximally reduces the number of unmapped voxels by traversing only free voxels. This path considers the system state as well as the robot constraints and dynamics (Fig. 4).
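A minimal sketch of this greedy selection (ours, with a toy 2D ray cast instead of an octree and no robot dynamics): candidate views are scored by the number of unmapped voxels they would observe, and the highest-scoring one becomes the next-best view.

```python
import numpy as np

UNMAPPED, FREE, OCCUPIED = 0, 1, 2  # voxel states

def visible_unmapped(grid, view, max_range=10.0, n_rays=64):
    """Count unmapped voxels a candidate view would observe (toy 2D ray cast)."""
    seen = set()
    for ang in np.linspace(0, 2 * np.pi, n_rays, endpoint=False):
        for r in np.arange(0.5, max_range, 0.5):
            v = (int(view[0] + r * np.cos(ang)), int(view[1] + r * np.sin(ang)))
            if not (0 <= v[0] < grid.shape[0] and 0 <= v[1] < grid.shape[1]):
                break
            if grid[v] == OCCUPIED:  # ray blocked: this is how occlusion gaps form
                break
            if grid[v] == UNMAPPED:
                seen.add(v)
    return len(seen)

def next_best_view(grid, candidates):
    """Pick the candidate viewpoint that uncovers the most unmapped voxels."""
    return max(candidates, key=lambda c: visible_unmapped(grid, c))

grid = np.zeros((50, 50), dtype=int)   # everything starts unmapped
grid[20:30, 20:30] = OCCUPIED          # a structure to be reconstructed
print(next_best_view(grid, [(5, 5), (45, 5), (5, 45), (45, 45), (25, 5)]))
```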
3 Method
In order to accurately reconstruct Castellaraccio di Monteverdi, we collected images using a UAV (DJI Mavic 3 Enterprise RTK) and a hand-held camera (Sony A7R III + Sony 24–70 F4). The use of UAVs, equipped with Real-Time Kinematic (RTK) positioning, ensured high-precision geo-location data for each image. We employed a grid-surveying technique with the UAV, which involves flying the drone in a systematic grid pattern to capture overlapping top-down images of the site. This method ensures comprehensive coverage and consistency in image capture, which is essential for photogrammetry and structure from motion (SfM) processes.
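For illustration, lawnmower-style grid waypoints with fixed front and side overlap can be generated as below (a sketch with assumed parameter names; the actual missions were planned with the UAV's survey tooling).

```python
def grid_waypoints(width, height, footprint_w, footprint_h,
                   front_overlap=0.8, side_overlap=0.7):
    """Yield (x, y) waypoints covering a width x height site (meters),
    given the ground footprint of one image and the desired overlaps."""
    dx = footprint_w * (1.0 - side_overlap)   # spacing between flight lines
    dy = footprint_h * (1.0 - front_overlap)  # spacing along each line
    x, direction = 0.0, 1
    while x <= width:
        ys = [i * dy for i in range(int(height / dy) + 1)]
        for y in (ys if direction > 0 else reversed(ys)):
            yield (x, y)
        x += dx            # move to the next flight line
        direction *= -1    # alternate direction (lawnmower pattern)

waypoints = list(grid_waypoints(100, 60, footprint_w=30, footprint_h=20))
print(len(waypoints), waypoints[:3])
```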
Additionally, images were taken from varying angles with the hand-held camera to capture details that might be missed from aerial views. This combination of aerial and ground-level photography provides a rich dataset that enhances the accuracy and detail of the 3D reconstruction.
For photogrammetry, we used popular software such as [1, 2], which offer an extensive suite of tools to build, customize, and manipulate 3D models. The software outputs a dense point cloud and a textured mesh, shown in Fig. 6. The structure-from-motion pipeline of [28] was also used for reconstruction of the Castellaraccio site.
Photogrammetry- and 3D-reconstruction-based view generation pipelines are feed-backward methods that generate pixels on an image plane one pixel at a time. To generate accurate lighting and shadows, they use expensive ray-tracing operations. These methods also use interpolation to produce continuous meshes, which often leads to over-construction that is commonly observed as over-smoothing.
Feed-forward methods like Gaussian splatting, on the other hand, iterate over objects in the scene and produce color and alpha values for pixels in a process called splatting. In Gaussian splatting specifically, clusters of points on homogeneous surfaces can be represented by parametric Gaussians, which leads to a reduction in the number of operations required to generate views. Further, the illumination from light sources is precomputed before view synthesis and need only be computed once, rather than for every pixel as in ray tracing. In effect, splatting leads to photo-realistic volumetric rendering with lower point densities, provided the Gaussians are optimized well to represent both homogeneous and non-homogeneous structures.
The output of these modules is a set of SfM and splat models. Due to the fixed path, these are noisy models and may contain smoothed or missing chunks caused by non-overlapping viewpoints. An improved plan is needed to efficiently retrieve these missing viewpoints. A limitation of traditional View Planning algorithms is that the space captured by the sensor is categorized as either occupied, free, or unmapped; the algorithm then optimizes the path to minimize the unmapped region. Rather than this binary categorization (free or occupied), we propose the use of 3D Gaussians from the splat output instead. Each 3D Gaussian has an associated mean and variance corresponding to the uncertainty in the distribution of the points.
Splats with low variance are preferred on surfaces with high variance in position, surface normals, color, contrast, and opacity, whereas in large homogeneous areas, fewer Gaussians with higher variance are sufficient to capture the lower detail. These variances are then used to determine the “voxels” with maximum uncertainty. Based on these factors, the planning algorithm aims to plan a path that captures the best views to optimize the Gaussian parameters, re-planning to achieve minimal artifacting and over-smoothing. This pipeline is shown in Fig. 5.
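A simplified sketch of this scoring follows (our illustration of the method under development; the voxelization, visibility radius, and all names are assumptions): per-Gaussian variances are pooled into voxel-level uncertainty, and candidate views are ranked by the total uncertainty they cover.

```python
import numpy as np

def voxel_uncertainty(means, variances, voxel_size=1.0):
    """Aggregate per-Gaussian variance into a per-voxel uncertainty score."""
    keys = np.floor(means / voxel_size).astype(int)
    scores = {}
    for k, var in zip(map(tuple, keys), variances):
        # high-variance Gaussians on detailed surfaces contribute more
        scores[k] = scores.get(k, 0.0) + float(var.sum())
    return scores

def rank_views(candidates, scores, voxel_size=1.0, radius=3):
    """Rank candidate viewpoints by the uncertainty within their reach."""
    def gain(view):
        vk = np.floor(np.asarray(view) / voxel_size).astype(int)
        return sum(s for k, s in scores.items()
                   if np.linalg.norm(np.array(k) - vk) <= radius)
    return sorted(candidates, key=gain, reverse=True)

# Toy splat output: 3D means and per-axis variances of 500 Gaussians
rng = np.random.default_rng(0)
means = rng.uniform(0, 20, (500, 3))
variances = rng.uniform(0.01, 2.0, (500, 3))
scores = voxel_uncertainty(means, variances)
print(rank_views([(2, 2, 5), (18, 18, 5), (10, 10, 8)], scores)[0])
```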
In the next section, we demonstrate the photorealistic output of this new view synthesis method and detail how and why it fails under certain circumstances. We indicate our future research direction of using domain-specific cues to improve this mode of visualization to aid archaeo-geophysical sensing of cultural heritage sites.
4 Results and Discussion
4.1 Photogrammetry
Figure 6 (Left) shows results using photogrammetry to reconstruct Castellaraccio. While these models are of respectable perceptual quality, they contain over-smoothed regions and some artifacts. Artifacts are created by the mesh generation process, which is based on Poisson surface reconstruction and often smooths out surfaces when normal estimation is lacking or the underlying point cloud is not dense enough [18]. This can obscure intricate details and sharp edges, leading to “bumps” and distortions.
4.2 Gaussian Splatting
Figure 6 (Right) shows 3D models (splats) built using Gaussian splatting. Splats are capable of synthesizing novel views that are visually superior to renderings of models built using photogrammetry. These confirm our hypothesis that Gaussian splatting performs better than photogrammetry on the same dataset while offering comparable processing times. Although these splats are capable of representing the scene in high quality, they perform poorly when input images do not cover certain viewpoints. This is apparent in Fig. 6, where artifacts and noise elements are seen due to lack of coverage (Fig. 7).
4.3 Structure-Guided View Planning for Improved Gaussian Densification
Gaussian splatting uses fewer Gaussians to represent large homogeneous areas and collections of much denser Gaussians for other areas to reduce computational complexity; this is one of the ways through which the method performs volumetric renderings in tens of minutes rather than the days required by NeRF. An optimization process, termed densification, splits, clones, and removes Gaussians according to the perceived complexity of the scene.
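A hedged sketch of that step, following the heuristic described in [19] (thresholds, the split factor, and the array layout here are illustrative): Gaussians with large accumulated view-space positional gradients are cloned when small (under-reconstruction) or split when large (over-reconstruction), and nearly transparent ones are pruned.

```python
import numpy as np

GRAD_THRESH = 2e-4    # illustrative; tuned values may differ in practice
SCALE_THRESH = 0.01
OPACITY_MIN = 0.005

def densify_and_prune(means, scales, opacities, grads, rng):
    """One densification pass over N Gaussians: means (N,3), per-axis scales
    (N,3), opacities (N,), accumulated positional gradient magnitudes (N,)."""
    high = grads > GRAD_THRESH
    clone = high & (scales.max(axis=1) <= SCALE_THRESH)  # under-reconstruction
    split = high & (scales.max(axis=1) > SCALE_THRESH)   # over-reconstruction

    parts = [(means[~split], scales[~split], opacities[~split]),  # survivors
             (means[clone], scales[clone], opacities[clone])]     # clones
    for _ in range(2):  # each split Gaussian is replaced by two smaller ones
        jitter = rng.normal(scale=scales[split])  # sample inside the old blob
        parts.append((means[split] + jitter, scales[split] / 1.6,
                      opacities[split]))

    means, scales, opacities = (np.concatenate([p[i] for p in parts])
                                for i in range(3))
    keep = opacities > OPACITY_MIN  # prune nearly transparent Gaussians
    return means[keep], scales[keep], opacities[keep]

rng = np.random.default_rng(0)
means = rng.uniform(size=(100, 3))
scales = rng.uniform(1e-3, 5e-2, (100, 3))
opacities = rng.uniform(size=100)
grads = rng.uniform(0, 5e-4, 100)
print(densify_and_prune(means, scales, opacities, grads, rng)[0].shape)
```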
However, there are three key opportunities to make improvements to the domain-specific problem of generating photorealistic volumetric renderings for archaeo-geophysical sensing: (i) the densification in the original Gaussian splatting is generalized and does not take any semantic cues from the scene, (ii) the densification process uses view-space positional gradients from a loss that is dominated by structural variations in the image space, as seen in Eq. 1, and (iii) a majority of the feedback from the loss function is biased by the image samples used to compute the loss. Therefore, splatting is clearly a downstream process that assumes the completeness of view samples and is largely influenced by the quality of the samples.
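For reference, the loss used in the original Gaussian splatting formulation [19], which we take to be the Eq. 1 referenced above, mixes an \(\mathcal{L}_1\) photometric term with a structural D-SSIM term:

\[
\mathcal{L} = (1 - \lambda)\,\mathcal{L}_1 + \lambda\,\mathcal{L}_{\text{D-SSIM}},
\]

with \(\lambda = 0.2\) in [19]. The D-SSIM component is what makes the view-space positional gradients, and hence the densification they drive, sensitive to structural variation in image space.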
Since the structure of interest is bounded and can be known a priori, we are currently developing a method to plan the best views around the structure, which can then be autonomously executed by a drone to produce an optimal set of views to: (i) better represent non-homogeneous surfaces often found during excavation, (ii) decrease geometric artifacts over structures largely caused by suboptimal densification, and (iii) improve the volumetric rendering in a 360\(^{\circ }\) viewing space around structures of interest.
5 Conclusion
In this paper, we introduce a novel method for optimizing image capture by using View-Planning. The images captured using this approach can be used to improve Gaussian splats by using the uncertainty information and by modeling the task as an information maximization problem. We describe our approach and demonstrate it on a sample dataset. The method is set to be evaluated at Castellaraccio di Monteverdi in the following year, after which we intend to publish our findings. The code, datasets, and implementation details will be open-sourced to the community following publication.
References
Agisoft Metashape: Agisoft Metashape—agisoft.com. https://www.agisoft.com. Accessed 15 May 2024
Polycam - LiDAR & 3D Scanner for iPhone & Android—poly.cam. http://poly.cam. Accessed 15 May 2024
Frequency-importance Gaussian splatting for real-time lightweight radiance field rendering - scientific figure on ResearchGate (2024). https://www.researchgate.net/figure/Pipeline-of-Gaussian-Splatting-Pipeline-of-Gaussian-Splatting-First-initial-sample_fig4_378904778. Accessed 15 May 2024
Barron, J.T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R., Srinivasan, P.P.: Mip-NeRF: a multiscale representation for anti-aliasing neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5855–5864 (2021)
Bradski, G.: The OpenCV library. Dr. Dobb’s J. Softw. Tools Prof. Programmer 25(11), 120–123 (2000)
Bruno, F., Bruno, S., De Sensi, G., Luchi, M.L., Mancuso, S., Muzzupappa, M.: From 3D reconstruction to virtual reality: a complete methodology for digital archaeological exhibition. J. Cult. Herit. 11(1), 42–49 (2010). https://doi.org/10.1016/j.culher.2009.02.006
Colomina, I., Molina, P.: Unmanned aerial systems for photogrammetry and remote sensing: a review. ISPRS J. Photogramm. Remote. Sens. 92, 79–97 (2014). https://doi.org/10.1016/j.isprsjprs.2014.02.013
Connolly, C.: The determination of next best views. In: Proceedings of the 1985 IEEE International Conference on Robotics and Automation, vol. 2, pp. 432–435 (1985). https://doi.org/10.1109/ROBOT.1985.1087372
Dawn, S., Biswas, P.: Technologies and methods for 3D reconstruction in archaeology. In: Thampi, S.M., Marques, O., Krishnan, S., Li, K.-C., Ciuonzo, D., Kolekar, M.H. (eds.) SIRS 2018. CCIS, vol. 968, pp. 443–453. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-5758-9_38
De Reu, J., De Smedt, P., Herremans, D., Van Meirvenne, M., Laloo, P., De Clercq, W.: On introducing an image-based 3D reconstruction method in archaeological excavation practice. J. Archaeol. Sci. 41, 251–262 (2014). https://doi.org/10.1016/j.jas.2013.08.020
Dias, P., Matos, M., Santos, V.: 3D reconstruction of real world scenes using a low-cost 3D range scanner. Comput.-Aided Civil Infrastruct. Eng. 21(7), 486–497 (2006). https://doi.org/10.1111/j.1467-8667.2006.00453.x
Gomes, L., Bellon, O.R.P., Silva, L.: 3D reconstruction methods for digital preservation of cultural heritage: a survey. Pattern Recogn. Lett. 50, 3–14 (2014)
Green, S., Bevan, A., Shapland, M.: A comparative assessment of structure from motion methods for archaeological research. J. Archaeol. Sci. 46, 173–181 (2014). https://doi.org/10.1016/j.jas.2014.02.030. https://www.sciencedirect.com/science/article/pii/S030544031400079X
Gupta, T., Li, H.: Indoor mapping for smart cities, an affordable approach: using Kinect sensor and ZED stereo camera. In: 2017 International Conference on Indoor Positioning and Indoor Navigation (IPIN), pp. 1–8 (2017). https://doi.org/10.1109/IPIN.2017.8115909
Karaman, S., Frazzoli, E.: Sampling-based algorithms for optimal motion planning. Int. J. Rob. Res. 30(7), 846–894 (2011). https://doi.org/10.1177/0278364911406761
Kavraki, L.E., Svestka, P., Latombe, J.C., Overmars, M.H.: Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Trans. Robot. Autom. 12(4), 566–580 (1996)
Kazhdan, M., Bolitho, M., Hoppe, H.: Poisson surface reconstruction. In: Proceedings of the Fourth Eurographics Symposium on Geometry Processing, vol. 7 (2006)
Kazhdan, M., Chuang, M., Rusinkiewicz, S., Hoppe, H.: Poisson surface reconstruction with envelope constraints. Comput. Graph. Forum 39(5), 173–182 (2020). https://doi.org/10.1111/cgf.14077
Kerbl, B., Kopanas, G., Leimkuehler, T., Drettakis, G.: 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 1–14 (2023). https://doi.org/10.1145/3592433
Koutsoudis, A., Vidmar, B., Ioannakis, G.A., Arnaoutoglou, F., Pavlidis, G., Chamzas, C.: Multi-image 3D reconstruction data evaluation. J. Cult. Herit. 15, 73–79 (2014). https://api.semanticscholar.org/CorpusID:135535640
Kuffner, J.J., LaValle, S.M.: RRT-connect: an efficient approach to single-query path planning. In: Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065), vol. 2, pp. 995–1001. IEEE (2000)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis (2020). https://doi.org/10.48550/ARXIV.2003.08934
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. Commun. ACM 65(1), 99–106 (2021)
Neubauer, W.: GIS in archaeology: the interface between prospection and excavation. Archaeol. Prospect. 11(3), 159–166 (2004). https://doi.org/10.1002/arp.231
Peralta, D., Casimiro, J., Nilles, A.M., Aguilar, J.A., Atienza, R., Cajote, R.: Next-best view policy for 3D reconstruction. In: Bartoli, A., Fusiello, A. (eds.) ECCV 2020. LNCS, vol. 12538, pp. 558–573. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-66823-5_33
Sanders, J., Kandrot, E.: CUDA by Example: An Introduction to General-Purpose GPU Programming. Addison-Wesley Professional (2010)
Sansoni, G., Trebeschi, M., Docchio, F.: State-of-the-art and applications of 3D imaging sensors in industry, cultural heritage, medicine, and criminal investigation. Sensors 9(1), 568–601 (2009). https://doi.org/10.3390/s90100568
Schönberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Sebastiani, A.: Digital artifacts and landscapes. Experimenting with placemaking at the IMPERO project. Heritage 4(1), 281–303 (2021)
Torr, P.H.S., Zisserman, A.: Feature based methods for structure and motion estimation. In: Triggs, B., Zisserman, A., Szeliski, R. (eds.) IWVA 1999. LNCS, vol. 1883, pp. 278–294. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44480-7_19
Turkar, Y., Aluckal, C., De, S., Turkar, V., Agarwadkar, Y.: Generative-network based multimedia super-resolution for UAV remote sensing. In: IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium, pp. 527–530 (2022). https://doi.org/10.1109/IGARSS46834.2022.9884486
Verhoeven, G.J.J., Loenders, J., Vermeulen, F., Docter, R.: Helikite aerial photography: a versatile means of unmanned, radio controlled, low-altitude aerial archaeology. Archaeol. Prospect. 16(2), 125–138 (2009). https://doi.org/10.1002/arp.353
Yang, M.D., Chao, C.F., Huang, K.S., Lu, L.Y., Chen, Y.P.: Image-based 3D scene reconstruction and exploration in augmented reality. Autom. Constr. 33, 48–60 (2013)