1. Introduction
In imaging spectroscopy, contiguous narrow-band spectrographic information is collected for each spatial pixel in an imaging system. The technology is presently synonymous with hyperspectral imaging (HSI) and is commonly implemented within the discipline of remote sensing to characterize the physical and chemical properties of observed materials. This is performed via spectroscopic and spatial analysis methodologies [1]. Imaging spectroscopy technologies have shown their utility in numerous remote sensing applications in geology [2,3,4], defense [5,6], agriculture [7,8,9], forestry [10,11,12], oceanography [13,14,15], forensics [16,17,18] and ecology [15,19,20], among others. In theory, spectrographic imaging data are spectrally and spatially piece-wise smooth; neighboring locations and wavelengths are well-correlated due to the high spatial-spectral resolution allowed by the narrow band criterion [21,22].
With such an abundance of information, the processing and analysis of HSI data are not trivial. Relevant spectral signatures are often difficult to identify, especially given the presence of signal noise, which further impedes information extraction [23]. Spatial and spectral correlations can be exploited with a correlation metric to aid in the analysis of imaging spectroscopy data. The Pearson product-moment correlation coefficient (CC) is one of the simplest statistical tools and has been widely implemented to measure levels of correlation [24].
The CC is a measure of linear association between two variables. It is formally given [24] by Equation (1):
$$\mathrm{CC} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}} \quad (1)$$
where $x$, $y$, $\bar{x}$, $\bar{y}$ represent the two variables of interest and their means, respectively. In mathematical terms, the CC represents the sum of the centered and normalized cross-product of $x$ and $y$ [24]. Each variable is centered by removing its mean. The denominator normalizes the numerator by the square root of the product of the summed squared deviations of the studied variables. Using the Cauchy–Schwarz inequality, it can be shown [24] that the absolute value of the numerator is always less than or equal to the denominator. Therefore, the value of the CC is bounded between −1 and 1. The boundary values represent a perfect linear correlation between $x$ and $y$. A value of zero corresponds to no linear correlation between the variables. Values greater than zero indicate a positive correlation between the variables of interest; the opposite is true for values less than zero. The CC is a useful descriptive measure of correlation since its value does not depend on the scales of measurement for the studied variables [24]. It is important to note that the calculation of the CC is not limited by any statistical assumptions; however, its use as an input to other statistical metrics may need to conform to certain constraints (e.g., normally-distributed data).
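As a minimal sketch of Equation (1), the CC can be computed directly from its definition. The function name `pearson_cc` and the toy spectrum below are illustrative assumptions and do not come from the study; the example only checks the scale independence and the ±1 bounds described above.

```python
import math

def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of the centered cross-products.
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the summed squared deviations.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    den = math.sqrt(ss_x * ss_y)
    if den == 0.0:
        raise ValueError("the CC is undefined when either variable is constant")
    return num / den

# Hypothetical toy spectrum (arbitrary radiance units).
spectrum = [0.12, 0.35, 0.48, 0.41, 0.30, 0.22, 0.27, 0.19]
scaled_and_offset = [2.5 * v + 0.1 for v in spectrum]  # positive linear transform
inverted = [-v for v in spectrum]                       # sign flip

cc_same = pearson_cc(spectrum, scaled_and_offset)  # expected to sit at the upper bound
cc_inverted = pearson_cc(spectrum, inverted)       # expected to sit at the lower bound
```

A positive gain and an offset leave the CC at its upper bound, while a sign flip drives it to the lower bound, which is the behavior the Cauchy–Schwarz argument predicts.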
To date, the CC has been widely implemented to investigate spectrographic imaging data [11,25,26,27,28,29,30,31,32,33,34,35]. In these efforts, the literature has concentrated on applying the statistical tool to establish bands that linearly associate with quantifiable physical and chemical properties. Exploiting the linear relation, this method of band selection has been used to create and improve predictive models that associate hyperspectral data with useful parameters [11,25,27,29,30,31,32,34,35]. For example, Peng et al. [11] applied the CC to establish bands that strongly correlate with forest leaf area index, improving predictive models at the landscape level. To a lesser extent, the CC has been applied for the purposes of data reduction and correction [25,28,33,36]. In 2011, Richter et al. [33] outlined a corrective method for HSI data that relied, in part, on the CC. The correction accounted for the effects of the spectral smile, a spectral non-uniformity in the cross-track direction that is caused by the optical design of the spectrometer and results in per-pixel changes in wavelength registration across the field-of-view (FOV) [33]. In the study, the CC was used to measure uniformity levels across the FOV, indirectly assessing the effects of the spectral smile defect. A corrective solution was selected by maximizing this metric. From this application, the CC was shown to be a useful tool in the assessment of HSI data. Following this example, the CC can be used for the detection and quantification of other errors. This was exemplified by Tanabe and Saeki [26], who rigorously quantified the sensitivity of the CC to spectral shifts in infrared spectra. Such research was fundamental to the application of the CC for error detection in infrared spectroscopy. Unfortunately, the findings were somewhat limited in their application to hyperspectral remote sensing as the study was conducted in an ideal environment with a laboratory-grade spectrometer. Earth observation (EO) remotely-sensed measurements are most often collected with airborne spectrometers under less than ideal conditions. Before the CC can be confidently applied to hyperspectral EO data, the sensitivity of the tool needs to be characterized with respect to various potential errors and noise levels.
The purpose of this study was to use the CC to develop an easy-to-implement methodology to detect issues with HSI data. The methodology was intended explicitly for the detection of errors, not for the identification of their origin. Although other error detection methodologies exist (e.g., [37,38,39,40]), they can be expensive to implement and rely on a higher level of mathematical understanding. To develop a novel method, the CC was first characterized with respect to artificially-induced errors in ground data. Afterwards, this information was applied to identify the spatial locations and spectral bands associated with errors in real HSI data. The overall objective of this study was to substantiate the CC metric as a low-cost, robust and simple statistical tool in the quality assessment of EO HSI data through the detection of errors.
2. Materials and Methods
2.1. In-Situ Ground Hyperspectral Data
In-situ hyperspectral radiance measurements were collected on 23 June 2016, from 16:54:19 to 17:00:36 GMT with a Spectra Vista Corporation (Poughkeepsie, NY, USA) HR-1024i ground spectrometer at the Flight Research Laboratory of the National Research Council of Canada (NRC) under stable illumination conditions (Figure 1). The HR-1024i is a solid-state device that collects radiance data in a circular FOV. The device collects spectral data over 1024 spectral bands, which are non-uniformly distributed between 350 and 2500 nm using three independent detectors: a single 512-channel silicon photodiode array and two 256-channel indium gallium arsenide arrays. The three detectors are characterized by nominal spectral resolutions of ≤3.5 nm (340 nm–1014 nm), ≤9.5 nm (971 nm–1911 nm) and ≤6.5 nm (1897 nm–2523 nm), respectively. In this study, spectral measurements were acquired with a 4° FOV fore-optics from a height of 1 m at 10 different locations on an asphalt target. Each measurement covered a single 38.3 cm² segment of asphalt that was contained within the area imaged by the airborne HSI systems (ITRES Research Limited, Calgary, AB, Canada) described in Section 2.2. The in-situ datasets were used to provide ground truth measurements for the characterization of the CC.
A wavelength (λ)-dependent, interpolated and normalized mean in-situ radiance spectrum for asphalt was derived from the collected ground measurements for use in Section 2.3. In particular, ten asphalt radiance spectra were averaged, normalized by the maximum and then resampled at 0.1 nm intervals using the Akima interpolation method [41] to produce the “true” spectral signature of asphalt to be used in the characterization phase of the CC tool. The Akima interpolation method was selected due to its robust ability to provide a smooth interpolation that closely matched the original input signal [41]. The spectrum was interpolated to place the mean spectrum on a uniform wavelength array and increase the density of spectral information while preserving the overall shape and content of the original signal.
2.2. Airborne Hyperspectral Image Acquisition and Processing
Airborne HSI data were acquired on 23 June 2016 at 14:53:13 GMT and 24 June 2016 at 13:24:16 GMT over the Macdonald–Cartier International Airport (containing the Flight Research Laboratory calibration site) in Ottawa, ON, Canada (Figure 2). Asphalt is an ideal target for real airborne acquisition as it is effectively ubiquitous in urban settings and can be found on the roadway systems surrounding the studied area. Furthermore, the surface reflectance of the material has a low amplitude (nearly flat), smoothly varying spectral response and is thus useful for in-field pseudo-calibration and validation [42].
Airborne imaging spectrometry data were acquired aboard the NRC’s Twin Otter fixed-wing aircraft with two complementary HSI systems. The imagers each recorded an adjacent and partially-overlapping portion of the reflective electromagnetic spectrum between 366 nm and 2530 nm. Both imagers were manufactured by ITRES Research Limited. The first sensor system, the Compact Airborne Spectrographic Imager 1500 (CASI), acquired 288 bands (wavelength samples) within the 366–1053 nm range. The CASI is a variable frame rate, grating-based, pushbroom imager with a 39.7° FOV across 1500 spatial pixels. The device has a 0.49-mrad instantaneous FOV with a variable f-number aperture, configurable between 3.5 and 18.0. The second imaging system was the Shortwave Airborne Spectrographic Imager (SASI). The SASI is a prism-based pushbroom imager that acquires data from 160 spectral bands within the 885–2530 nm range with 640 spatial pixels across a 39.8° FOV. The device has an instantaneous FOV of 1.14 mrad and an aperture with a constant f-number of 1.8. Imagery is acquired at a fixed frame rate of 60 Hz with a programmable integration time of ≤16.6 ms. On both data acquisition days, imagery was obtained from a nominal height of 1115 m AGL with an approximate heading of 306.5° True North (Figure 2).
Prior to CC analysis, the HSI data underwent three pre-processing steps. The first step was a correction in the calibration to account for small but measurable pressure- and temperature-induced shifts in the spatial-spectral sensor alignment. The second step was a spectroradiometric calibration that, following removal of estimated signal offset contributions (electronic offset, dark current, frame shift smear (CASI only), internal scattered light (CASI only) and 2nd order (CASI only)), converted the resulting radiance-induced digital pixel signal into units of spectral radiance (µW·cm⁻²·sr⁻¹·nm⁻¹). The final step was implemented to remove the laboratory-measured spectral smile by resampling the data from each spatial pixel to a uniform wavelength array. Although most of the spectral smile effects are removed by this pre-processing, extremely small residual effects may remain. Geocorrection of the data was not performed in order to preserve the original spectral response per pixel.
The described pre-processing methodologies utilized NIST traceable calibration data provided by the sensor manufacturer. Using the initial calibration data, various artefacts were identified in the resulting calibrated imagery. Independent of this study, the processing methodology was updated to refine the steps described above, resulting in new calibration programs and calibration data files. This refined processing removed many of the identified artefacts in the data. The CC analysis was performed on the raw imagery after being processed with both the initial and refined calibration files and methodologies. Overall, the study examined 8 datasets: the four raw hyperspectral images collected by the CASI and SASI over the two acquisition dates processed with both the original and refined processing methodologies.
2.3. Characterization of the Correlation Coefficient with Averaged and Interpolated In-Situ Radiance Hyperspectral Data
Before the CC was applied to the airborne imagery, the sensitivity of the statistical tool needed to be characterized with respect to the natural variances within asphalt spectra. This was accomplished by calculating the CC between each of 10 raw in-situ hyperspectral radiance measurements and their averaged spectral response.
The sensitivity of the CC to common signal issues in HSI data was also characterized by artificially inducing errors in the mean asphalt spectral response derived in Section 2.1. Five artificial errors were introduced independently by modifying this spectrum in accordance with Table 1 to generate a variety of transformed signals. The following modifications were applied: introduction of additive white Gaussian noise (AWGN), additive transformation, multiplicative transformation, introduction of spectral shift and multiplicative transformation of a single feature. The transformation models in Table 1 were developed to implement the desired modifications. Parameters were carefully selected to mimic realistic potential errors. The AWGN modification was applied to generate a transformed spectral response with a specified signal-to-noise ratio, SNR. SNR designates the ratio between the energy of the original signal and the generated noise. For example, to obtain an SNR of 100:1, 4.31% AWGN was added to the signal. An additive factor, a multiplicative factor and a spectral shift (in nm) were used to carry out the additive, multiplicative and spectral shift modifications, respectively. Although there was no reason for the additive and multiplicative modifications to influence the CC, these tests were included to help provide a clear understanding of the approach. The multiplicative transformation of a single spectral feature was mediated through a normal distribution scaled by α and vertically shifted to have a minimum value of 1. σ and μ corresponded to the standard deviation and mean values, respectively, of the distribution. A normal distribution was used for the multiplicative factor to ensure the signal remained continuous along the edges of the spectral feature. μ was selected to capture the atmospheric absorption feature centered at 935 nm. The σ of 12 nm was chosen to ensure that the shoulders of the feature between 899 nm and 971 nm were within 3σ of μ. α was varied from 1–50 to control the degree to which the absorption feature was modified.
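The deterministic modifications described above can be sketched as follows. This is an illustration under stated assumptions, not the transformation models of Table 1: the spectrum `toy_spectrum`, the parameter values (0.05, 1.5, 2 nm, α = 5) and the exact form of the feature multiplier (equal to 1 far from the feature and to α at its center) are all hypothetical. Only the 935 nm center and 12 nm standard deviation are taken from the text.

```python
import math

def pearson_cc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def toy_spectrum(wl):
    """Hypothetical smooth spectrum with an absorption dip centered at 935 nm."""
    continuum = 0.6 + 0.4 * math.exp(-((wl - 700.0) / 300.0) ** 2)
    dip = 1.0 - 0.3 * math.exp(-0.5 * ((wl - 935.0) / 12.0) ** 2)
    return continuum * dip

wavelengths = [400.0 + 0.5 * i for i in range(1401)]  # 400 to 1100 nm
original = [toy_spectrum(w) for w in wavelengths]

# Additive and multiplicative transformations are linear in the signal.
added = [v + 0.05 for v in original]
scaled = [1.5 * v for v in original]

# Spectral shift of 2 nm: sample the toy spectrum at displaced wavelengths.
shifted = [toy_spectrum(w - 2.0) for w in wavelengths]

def feature_multiplier(wl, alpha, mu=935.0, sigma=12.0):
    # Assumed form: equals 1 far from the feature and alpha at its center.
    return 1.0 + (alpha - 1.0) * math.exp(-0.5 * ((wl - mu) / sigma) ** 2)

feature_scaled = [v * feature_multiplier(w, alpha=5.0)
                  for w, v in zip(wavelengths, original)]

results = {
    "additive": pearson_cc(original, added),
    "multiplicative": pearson_cc(original, scaled),
    "spectral shift": pearson_cc(original, shifted),
    "single feature": pearson_cc(original, feature_scaled),
}
```

In this sketch the two linear transformations leave the CC unchanged, while the shift and the single-feature scaling both reduce it, which is the qualitative pattern the characterization is designed to expose.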
The tested ranges of values for the SNR and the other transformation parameters were selected to introduce nominal to substantial errors. The CC was calculated between the original spectrum and each of the transformed datasets in accordance with Figure 3.
To test the persistence of the acquired trends in the presence of signal noise, the CC calculations for the last four modifications were repeated with AWGN. In particular, 4.31% AWGN was introduced to the original spectrum to acquire a new radiance signal with an SNR of 100:1, a reasonable value for airborne HSI data. New transformed signals were acquired by applying the transformation models from the last four rows of Table 1 and introducing AWGN to generate signals with an SNR of 100:1. The CC was then calculated for each noisy transformed signal in accordance with Figure 4.
As a final test of consistency, the standard deviation of the CC was assessed in the presence of noise. In particular, the AWGN transformation in Table 1 was applied to the original spectrum 1000 times. A CC was calculated between the original spectrum and each of its transformations. The standard deviation of the CCs from each distinct SNR was calculated.
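The repeated-noise test can be sketched as a small Monte Carlo experiment. The noise model is an assumption: the noise standard deviation is taken as the signal RMS divided by the SNR, which may differ from the convention behind the 4.31% figure quoted above. The toy signal, the 200 trials (instead of 1000) and the helper names are likewise illustrative.

```python
import math
import random
import statistics

def pearson_cc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def add_awgn(signal, snr, rng):
    # Assumption: noise standard deviation = RMS of the signal divided by SNR.
    rms = math.sqrt(sum(v * v for v in signal) / len(signal))
    return [v + rng.gauss(0.0, rms / snr) for v in signal]

def cc_statistics(signal, snr, trials=200, seed=0):
    """Mean and standard deviation of the CC over repeated noise realizations."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    ccs = [pearson_cc(signal, add_awgn(signal, snr, rng)) for _ in range(trials)]
    return statistics.mean(ccs), statistics.stdev(ccs)

# Hypothetical smooth spectrum sampled every 2 nm between 400 and 1100 nm.
wavelengths = [400.0 + 2.0 * i for i in range(351)]
signal = [0.6 + 0.4 * math.exp(-((w - 700.0) / 300.0) ** 2) for w in wavelengths]

stats_by_snr = {snr: cc_statistics(signal, snr) for snr in (10.0, 100.0)}
```

Tabulating the mean and standard deviation per SNR in this way gives the kind of lookup that later allows an observed mean CC to be mapped to an approximate SNR and an expected spread.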
2.4. Application of the Correlation Coefficient to Airborne Hyperspectral Imagery (Error Detection)
Before applying the CC, a region of interest (ROI) (blue line in Figure 5) was identified across the FOV, along the taxiway located directly south of the calibration site. The ROI comprised a single asphalt road pixel from each column within the sensor FOV. Every attempt was made to acquire spectra from asphalt pixels that were uncontaminated by non-asphalt substances such as paint, vegetation and other non-asphalt hydrocarbons. “Wobbles” in the imagery in Figure 5 are caused by the movement of the aircraft and can be readily accounted for through various geocorrective methodologies. In this work, it was fundamental to preserve the original sensor geometry in the analysis, so no geocorrection process was applied.
The spectrum from the center asphalt pixel in the ROI was designated as the reference for the application of the CC since it was the center of the instruments’ FOV. The center pixel was evaluated to ensure that it was a reasonable reference that contained no obvious errors. A CC was calculated between the spectrum from each pixel in the ROI and the designated central pixel reference in accordance with Figure 6.
Theoretically, the CCs should be exactly 1 across the FOV. Although this is not the case in real data, the CC between well-behaved target spectra will vary around a mean value that is still quite close to 1. The spatial pixels associated with substantial reductions in the CCs were recorded as potential locations for errors in the HSI data. Substantial reductions were characterized by CCs that fell below a designated threshold that was derived from the sensitivity testing.
To calculate the threshold, a stable spatial region was manually identified by consistent CCs that varied around a constant mean. Using the mean CC of this region, the SNR of a stable spectrum was approximated using the noise sensitivity data derived in Section 2.3. With the approximate SNR, the data from the final test in Section 2.3 were used to estimate the expected standard deviation of the CCs derived from stable spectra. Using the estimated standard deviation and the mean value of the CCs in the stable region, potential errors were detected by reductions of more than three standard deviations (3σ) below the mean. A 3σ threshold was selected to ensure that at least 99.7% of the stable data were not flagged as a potential error. Consequently, CCs below the threshold were likely associated with errors in the HSI data.
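The thresholding step can be sketched in a few lines. The CC values, the stable region and the expected standard deviation below are hypothetical placeholders; in the study the standard deviation is estimated from the noise sensitivity data rather than supplied by hand.

```python
import statistics

def flag_pixels(ccs, stable_region, expected_std, n_sigma=3.0):
    """Return the detection threshold and the indices of CCs that fall below it."""
    stable_mean = statistics.mean(ccs[stable_region])
    threshold = stable_mean - n_sigma * expected_std
    flagged = [i for i, cc in enumerate(ccs) if cc < threshold]
    return threshold, flagged

# Hypothetical CCs across a 12-pixel FOV with degraded values near both edges.
ccs = [0.9900, 0.9935, 0.9971, 0.9969, 0.9970, 0.9972,
       0.9968, 0.9971, 0.9970, 0.9969, 0.9966, 0.9921]

# Pixels 2 to 9 are treated as the manually selected stable region, and
# 0.0004 stands in for the standard deviation expected at the estimated SNR.
threshold, flagged = flag_pixels(ccs, slice(2, 10), expected_std=0.0004)
```

Here pixel 10 sits slightly below the stable mean but above the threshold, so it is not flagged; only the clearly degraded edge pixels are reported.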
To spectrally isolate the potential errors in the recorded spatial locations, the CCs across the ROI were recalculated after removing the data in pre-defined spectral windows. The procedure outlined in Figure 7 was carried out for various spectral windows. The spectral windows were designed to vary in size and spectral location. The window sizes were selected to ensure that windows contained anywhere from one to half of the total spectral bands. For any given size, the window was spectrally located beginning at the lower boundary of the spectral range. Each window was shifted by 5 nm until its edge surpassed the upper boundary of the dataset. For each window size and location, the average CC was calculated across the spatial regions associated with the detected potential errors. By maximizing the average CC over these regions, it was possible to identify the spectral window that was associated with the majority of the studied potential errors.
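A minimal sketch of the window search is given below. The synthetic spectra, the candidate window widths and the tie-breaking rule (prefer the narrowest window among equal mean CCs) are assumptions introduced for illustration; the text above only specifies the 5 nm step and the maximization of the average CC.

```python
import math

def pearson_cc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def cc_without_window(wl, spectrum, reference, start, width):
    """CC computed after dropping bands with start <= wavelength < start + width."""
    keep = [i for i, w in enumerate(wl) if not (start <= w < start + width)]
    return pearson_cc([spectrum[i] for i in keep], [reference[i] for i in keep])

def best_window(wl, flagged_spectra, reference, widths, step=5.0):
    """Return the (start, width) whose removal maximizes the mean CC of flagged pixels."""
    best_key, best = None, None
    for width in widths:
        start = wl[0]
        while True:
            ccs = [cc_without_window(wl, s, reference, start, width)
                   for s in flagged_spectra]
            # Tie-break (an assumption): among equal mean CCs, prefer the narrowest window.
            key = (round(sum(ccs) / len(ccs), 9), -width)
            if best_key is None or key > best_key:
                best_key, best = key, (start, width)
            if start + width > wl[-1]:
                break  # the window edge has passed the upper boundary
            start += step
    return best

# Hypothetical data: 31 bands from 400 to 700 nm, with an error confined to 450-480 nm.
wl = [400.0 + 10.0 * i for i in range(31)]
reference = [0.5 + 0.3 * math.sin(w / 60.0) for w in wl]

def corrupt(gain):
    return [gain * r + (0.2 if 450.0 <= w <= 480.0 else 0.0)
            for w, r in zip(wl, reference)]

flagged_spectra = [corrupt(1.1), corrupt(0.9)]
start, width = best_window(wl, flagged_spectra, reference,
                           widths=[10.0, 20.0, 40.0, 80.0])
```

In this synthetic case the selected window spans the corrupted bands, since removing them restores a perfect linear relation with the reference. Real data would not produce exact ties, so the tie-break matters less there.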
To verify the spectral window and specify the nature of the potential errors, the imagery was visualized for a single band within the identified spectral ranges. In this visualization, image intensities were histogram equalized to enhance contrast by redistributing the intensities so that the histogram of the resulting image was approximately uniform. To verify that the reductions in the CCs were associated with these errors, the CCs were calculated across the FOV with respect to the center pixel after the removal of the identified spectral region.
Atmospheric absorption features were used to locate finer errors in the imagery that might not be easily visible in the CCs when calculated with the entire spectrum. These features were manually identified in the spectrum of the center asphalt pixel using the theoretical locations in Table 2 for guidance.
Atmospheric absorption features are distinctive and constant under stable conditions [45]. As such, the CC was thought to be able to detect inconsistencies in these regions since error-induced changes located within these features are more easily identifiable. As depicted in Figure 8, a CC was calculated between the spectrum from each pixel in the ROI and the designated central reference pixel using only the hyperspectral data that corresponded to each of the approximate wavelength regions identified in Table 2.
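The benefit of restricting the CC to an absorption feature can be sketched with synthetic data. The five-pixel ROI, the 3 nm displacement of the feature in one pixel and the spectrum model are hypothetical; the 899–971 nm window is borrowed from the shoulders of the 935 nm feature quoted in Section 2.3 and is not taken from Table 2.

```python
import math

def pearson_cc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def toy_radiance(wl, shift=0.0, gain=1.0):
    """Hypothetical spectrum whose 935 nm absorption dip can be displaced by `shift` nm."""
    continuum = 0.6 + 0.4 * math.exp(-((wl - 700.0) / 300.0) ** 2)
    dip = 1.0 - 0.3 * math.exp(-0.5 * ((wl - 935.0 - shift) / 12.0) ** 2)
    return gain * continuum * dip

wavelengths = [600.0 + 2.0 * i for i in range(251)]  # 600 to 1100 nm

# ROI of five pixels; pixel 3 has its absorption feature displaced by 3 nm.
roi = [
    [toy_radiance(w, gain=0.95) for w in wavelengths],
    [toy_radiance(w, gain=1.02) for w in wavelengths],
    [toy_radiance(w) for w in wavelengths],             # center (reference) pixel
    [toy_radiance(w, shift=3.0) for w in wavelengths],
    [toy_radiance(w, gain=0.98) for w in wavelengths],
]
reference = roi[len(roi) // 2]

def cc_in_window(wl, spectrum, ref, lower, upper):
    """CC computed using only the bands with lower <= wavelength <= upper."""
    idx = [i for i, w in enumerate(wl) if lower <= w <= upper]
    return pearson_cc([spectrum[i] for i in idx], [ref[i] for i in idx])

full_ccs = [pearson_cc(s, reference) for s in roi]
window_ccs = [cc_in_window(wavelengths, s, reference, 899.0, 971.0) for s in roi]
```

In this sketch the displaced feature lowers the full-spectrum CC only slightly, whereas the windowed CC drops more noticeably, which illustrates why the narrower spectral regions help expose finer errors.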
For each spectral range, the imagery was visualized for a single band within the specified window to study the nature of any detected errors. Once again, image intensities were histogram equalized to enhance contrast and clearly display potential errors. The methodologies presented in this section were repeated for each of the eight processed images described in Section 2.2.
4. Discussion
By characterizing the sensitivity of the CC before its application to real airborne HSI data, it was possible to verify the detective capabilities of the metric in the localization of errors in hyperspectral data. The findings generally agreed with intuition and with the theoretical expectations of the CC. Linear transformations, in agreement with theory, had no impact on the value of the CC. By calculating the CC between two similar spectra, the value could be used to gauge their consistency independent of the effects associated with linear transformations. Because of this property, the CC was shown to be extremely insensitive to the natural variances between different asphalt spectra. This was important for the detection of errors in HSI data as it implied that the differences in the calculated CCs were not primarily due to the variations between asphalt samples. All modifications, aside from the linear transformations, resulted in a consistent reduction in the CC. Consequently, the CC could detect spectral shifts and modified spectral features. Although the CC was sensitive to signal noise, all general trends held irrespective of the AWGN in hyperspectral data with an SNR of 100:1, which is a reasonably high noise level for airborne HSI data. This trend was fundamental to the application of the CC as it meant that the metric was sufficiently resistant to noise for the purposes of error detection; so long as errors are not completely masked by noise, the CC can detect their presence. Implementing this knowledge, the CC was applied to real airborne HSI data.
Through the application of the CC, the quality of remotely-sensed hyperspectral data could be assessed through error detection in a quantitative manner. This was evident in the analysis of the eight hyperspectral images that were studied. By calculating the CCs across the FOV with the entire spectra, it was possible to immediately gauge the spectral consistency of the HSI data collected by the CASI and SASI, across the FOV. It is important to note that the method was explicitly designed for the detection of errors, not for the identification of their origin.
In the CASI imagery, the methodology was able to spatially detect errors along the edges of Figure 13a,c by systematic reductions in the CCs near the boundaries of the FOV. The spectral locations of these effects were found in the blue end of the spectrum, in accordance with Table 3. Visualization of the imagery in Figure 14 and Figure 15 revealed an error that is consistent with the effects of the spectral smile or other cross-track illumination effects [46]. With a greater decline in the CCs near the edges of the FOV, this error was more prominent in the CASI data processed with the original processing methodology. As such, it is possible to deduce that the refined processing was able to better correct for the effects observed at the edges. The CASI imagery from the 24th was characterized by slightly lower and more variable CCs than the data from the 23rd, especially near the edges of the FOV. This indicates that there is some innate variability in the data acquisition of the CASI that could be quantified from the CCs.
The CCs of the SASI imagery were virtually identical regardless of the processing methodology and acquisition date. This suggested that the SASI was very stable in its data acquisition. Furthermore, it was clear that the refined processing methodology did not have a large impact on the data. Using the developed algorithms, an error was detected in the SASI imagery at a single spectral band by a reduction in the CCs from Pixels 548–564. This showcased the developed CC-based methodology as a strong tool in the localization of errors in imaging spectrometers.
After removing the data within the spectral windows identified in Table 3, there was a greater degree of consistency amongst all of the CASI and SASI images. That being said, not all datasets perfectly aligned; there was a slight offset between the CASI images collected on the 24th. To investigate this discrepancy, finer errors were detected in the regions that surrounded the five atmospheric absorption features in Table 4. All but one of the spectral regions were characterized by non-uniform CCs across the FOV (Figure 18). The irregular structure in Figure 18a,b was caused by non-uniform pixels, which noticeably varied in brightness. This error created “striping” artefacts across the image data. These artefacts have been observed in the literature and are likely due to radiometric calibration errors [47]. Although the origin of the low frequency sinusoidal structure could not be established, it is clear that the trend is not a numerical computational effect. As such, there is likely a subtle wide spatial scale feature. The origin of the subtle feature in the CCs is still being investigated. The sporadic reduction in the CCs of Figure 18c detected errors at Pixel Columns 1454 and 1456, which were visualized as a bright and dark vertical stripe, respectively, across the imagery (Figure 20). Since this reduction was not present in Figure 18d, the refined processing methodology was able to correct for this error. Based on the structure of the CCs near the edges of the FOV in Figure 18e,f, there were potential residual smile effects or other cross-track illumination effects that could not be clearly visualized in the imagery. The sporadic reductions in the CC of Figure 18g revealed groups of non-uniform pixels that created distinct “striping” artefacts that can be seen at several points across the CASI imagery from the 24th with the original processing (Figure 21). These errors were not present in Figure 18h or its associated imagery. As such, the refined processing methodology was able to correct for these errors. The relatively constant CCs across the FOV in Figure 18i,j corresponded with stable imagery within the designated spectral window, as displayed in Figure 22. This information is fundamental as it showcases that the CC method can identify stable imagery when it is present. The offset between the CASI imagery collected on the 24th in Figure 17 was likely due to the additional errors that were not corrected in the original processing methodology.
Although significance testing yielded p-values less than 10⁻⁵ for all observed relationships, it is important to note that these values did not necessarily imply practical significance. This was due to an issue inherent to the p-value itself; with such a large sample size and small variance, significance testing flagged even the most subtle of changes as significantly different [48]. Fortunately, this was not an issue within the study as all of the flagged potential errors could be visualized and verified in the imagery itself. A similar statement can be made for the differences observed in the CCs between the distinct processing methodologies and acquisition devices.
Overall, errors were detected in the CASI and SASI imagery through the application of the CC. Although more sophisticated error detection methodologies exist (e.g., [37,39,40]), they can be monetarily expensive to implement and rely on a higher level of mathematical understanding. Without a fundamental understanding of a method, its implementation can lead to inaccurate interpretations. The presented method is intuitive; the CC is a rather simple statistical tool and its application is straightforward. The detection can be conducted on radiance spectra prior to atmospheric correction, quickly after acquisition. After removing the wavelength region associated with large errors, the described methodologies could be repeated to isolate smaller errors. Although the application was developed for hyperspectral technologies, it can be easily generalized for data collected by other imaging spectrometers. This versatility showcases the CC as a strong and simple statistical tool for the analysis of spectrographic imaging data through the detection of errors.