2.1. Study Site and Instruments
A small area of 215 × 115 m
2 was selected inside a wide private plant nursery, close to Cirimido village, Como, Italy. This area was characterized by the presence of several tree species: maple (Acer spp.), red maple (Acer rubrum), hornbeam (Carpinus betulus), catalpa (Catalpa spp.), dogwood (Cornus spp.), beech (Fagus spp.), copper-beech (Fagus purpurea), ash (Fraxinus spp.), ginkgo biloba (Ginkgo biloba), honey locust (Gleditsia triacanthos), and chestnut-leaved oak (Quercus castaneifolia). The local technical staff provided information for their identification. The employed platform was the HexaKopter by the German company MikroKopter (Moormerland, Germany.), (
Figure 2).
This powered vertical take-off and landing vehicle is equipped with six brushless motors; it weighs about 1.2 kg with batteries and its maximum transportable payload is equal to 0.5 kg. The HexaKopter can fly up to 200 m far from the take-off point and the flight duration is limited to 10 min.
Two different digital compact cameras were used for surveying. The Red Green Blue (RGB) digital compact camera was a mirrorless Nikon 1 J1 (Nikon Corporation, Tokyo, Japan). It weighs 310 g and acquires images in the visible part of the electromagnetic spectrum by means of a CMOS sensor of maximum size 10 Mpx. The other digital compact camera was a Tetracam ADC Lite (Tetracam Inc., Chatsworth, CA, USA): this lightweight (200 g) Agricultural Digital Camera (ADC) is specially designed for UAS applications and features a single CMOS sensor of 3.2 Mpx. The camera is optimized for capture of Green, Red and Near-Infrared (NIR) wavelengths approximately corresponding to Landsat TM2, TM3 and TM4 bands [
25]. The resulting Color Infrared (CIR) imagery is suitable for derivation of several vegetation indices, canopy segmentation and band ratios. Further cameras details are reported in
Table 1.
2.2. UAS Survey Planning and Data Collection
In summer, the multi-rotor HexaKopter was remotely piloted by an operator during landing and take-off, whilst a pre-set flight planning was used during the imagery acquisition. Owing to the UAV technical limitations, two separate flights were carried out, first with the Nikon 1 J1 and then with the Tetracam ADC Lite. To cover the entire area, it was necessary to plan the acquisition of four strips with forward and side overlaps equal to 85% and 50%, respectively. The flight plan was tuned according to the sensor with the lower geometric performance (i.e., the ADC Lite). Taking into account the various tree heights, the flight altitude was fixed at 40 m above ground level, thus producing different Ground Sample Distances (GSDs): 1.3 cm for the RGB images and 1.6 cm for the CIR ones. Thus, the Nikon 1 J1 acquired 125 images and the ADC Lite camera 159 CIR images.
Fifteen black and white (b/w) square targets, with side of 30 cm and triangular pattern, were homogeneously distributed in the area. The center coordinates were measured by means of a GNSS receiver Trimble 5700 in NRTK (Network Real Time Kinematic) survey (
Figure 3), with horizontal and vertical accuracies of 2–3 cm and 5 cm, respectively.
The geometric calibration of the cameras was performed, by taking some images of a b/w planar calibration grid of known geometric properties; the parameters of a Brown distortion model [
26] were estimated through PhotoModeler Scanner [
27]. Moreover, the CIR dataset was radiometrically corrected as recommended by the ADC Lite technical documentation [
18]. Some images were taken of a small white Teflon calibration plate, provided with the camera, whose spectral response is known. These images were used by the accompanying software PixelWrench2 [
28] to process all other images in relation to the sunlight during the acquisition period.
Aiming to make use of a multi-temporal dataset, after the first summer survey, a second one was planned in autumn, when some tree species are clearly recognizable thanks to their specific foliage color, as the red crowns of the copper-beech, or the yellow one of gingko. Unfortunately, due to a long period of bad weather conditions, the second survey took place at the end of November when some of tree species had already lost leaves. The same flight plan and a similar ground points survey were performed. Unique difference is that only the Nikon 1 J1 was employed, because the rainy weather forced to stop the survey prior to flying with the ADC Lite.
From here on, the three final datasets are identified with the names “RGB_S”, “CIR_S” and “RGB_A”, where “S” stands for summer and “A” for autumn.
2.5. Processing of Multispectral Orthophotos
Some derived variables were computed from the 6 original layers acquired in summer (RGB_S and CIR_S). The first computed band was the Normalized Difference Vegetation Index (NDVI): this index is quite powerful in performing a qualitative classification thanks to the capability to distinguish alive vegetated areas from other surface types (water, soil, etc.). Successfully employed in hundreds of studies, it is expressed as the difference between the NIR and the Red bands (both from the CIR dataset) normalized by the sum of these [
34]:
As already explained in
Section 2.2., CIR dataset was processed using a Teflon calibration target in order to perform spectral balance. Hence, the resulting NDVI is refined in relation to that day’s lighting conditions.
Then, texture measures were generated from spatial relationship of pixels. According to Gonzalez and Woods [
35], texture can be defined as the measures of smoothness, coarseness, and regularity of an image region, and can be discriminated by using structural or statistical techniques. The latter are most suitable for natural image scenes with rare re-occurring patterns [
36]: therefore, the commonly used Grey Level Co-occurrence Matrix (GLCM) method, suggested by Haralick et al. [
37], was employed in this case study. The GLCM describes the texture in a user-defined moving window, by looking at the spatial co-occurrence of the grey levels of the included pixels. The GLCM algorithm computes a matrix that accounts for the difference in grey values between two pixels at a time, called the reference pixel and the neighbor pixel. In this case study, the neighbor pixel was located, each time, one pixel above and to the right of the reference pixel: according to Murray et al. [
38], this offset of (1,1) is the most commonly used.
The GLCM was calculated for 24 different window sizes, ranging from the minimum value of 3 pixels to a maximum value of 49 pixels (corresponding to roughly 49 cm, i.e., included in most of tree crowns). Lastly, the GLCM implementation of ENVI provides the so-called eight standard Haralick measures, which can be subdivided into three categories [
39]:
contrast based: contrast, dissimilarity, and homogeneity;
orderliness based: angular second moment and entropy;
statistically based: mean, variance, and correlation.
The GLCM calculations were performed only on one channel, to reduce data redundancy. Following the approach of Dorigo et al. [
40], the band with the highest value of entropy was chosen; i.e., the Red summer band (Red_S). To summarize, for each window size (24, from 3 to 49 size) the 8 texture features were computed, on the Red_S band, with equal offset of (1,1). Then, a selection of the optimal window size was performed, as described below.
The final result of a texture analysis relies heavily on the adopted window size which, in turn, depends on several factors [
41]. If the window is too small, then it does not contain enough information about the area; on the contrary, if the window is too large, it can overlap with other types of ground cover, thus producing erroneous results (the so-called edge effect [
20]). However, despite the issue being recognized, a general method for establishing the optimum size has not yet been proposed. In this study, hence, two methods, usually applied to Remote Sensing standard imagery, were compared and used to determine the optimal window size.
The first method performs the classification of all the computed texture features, for each window size, with the Maximum Likelihood classifier, as already done by Murray et al. [
38], evaluating the behavior of classification accuracies. For each of the 24 different window sizes, the eight texture features were classified together with the Maximum Likelihood algorithm on training samples described in 2.4. By analyzing the Overall Accuracy provided by the Error Matrix [
33], it was possible to observe how the accuracy varied with respect to the window size: as visible in
Figure 6, classification accuracy grows with increasing window size.
Although it seems sensible to choose the largest window size, the optimal one is a compromise between having a good OA and minimizing the aforementioned edge effects. According to Murray et al. [
38], the optimal window size corresponds to where the slope of accuracy curve varies. In this case, the slope value is 1.3 for window size from 0 to 19, 0.3 from 21 to 27 and 0.5 after 27. Hence, the main changes are in correspondence of values between 21 and 27, as shown in
Figure 6.
The second method, instead, is based on the empirical semivariance computation, which is a measure of data variations in the spatial domain [
42]. The empirical semivariance γ(h) of a random function Z(x) is defined as
where Z(x
i) and Z(x
i + h) are two values of the same function Z(x) at a distance h (lag); N is the total number of pairs at given lag h. For digital images; i.e., for digital functions in two dimensions—the optimal window size to describe a specific kind of coverage is the lag (range) that results in the maximum variability (sill) of a scene structure [
43]. This method was applied on a sample set of pixels, selected on the Red_S channel, for each tree species. The related semivariograms are shown in
Figure 7. The empirical semivariograms were best-fitted with exponential model. The resulting ranges, evaluated at a lag of 4 times the exponential constant, varied for the different tree species from a minimum of 18 to a maximum of 35. Four species have ranges close to 21, other four around 28, the remaining three saturate at higher lag. To avoid the use of too many layers in classification processing, only two representative values of window sizes were selected, and the two values determined with the previous method were chosen.
Therefore, both texture images produced with window sizes of 21 and 27 (called T21 and T27 from now on) were selected to be further analyzed.