1. Introduction
Total basin discharge (TBD) is a fundamental water balance component of river basins [
1,
2] and it has been traditionally measured at in-situ hydrological stations near estuary mouths. TBD can be converted into surface runoff (
R) when expressed per unit surface area of the entire river basin. Continuous TBD time series are necessary for monitoring hydrological extremes (i.e., droughts and floods) in deltaic regions. Such monitoring supports better water management: it allows water to be used more efficiently and helps minimize unpredictable human, agricultural, and economic losses [
3,
4,
5,
6,
7]. Nevertheless, a comprehensive global river water discharge (RWD) observing network has not been established yet [
8]; moreover, the number of in-situ stations has been decreasing since the late 1970s due to the absence of sufficient funding for the upgrade and maintenance of facilities [
9,
10]. Consequently, a gauge-independent method that would provide a synoptic means of observing the global RWD remains elusive [
11].
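The conversion between discharge and runoff mentioned above can be illustrated with a short sketch. It assumes discharge in m³/s, a basin area in km², and a known number of days in the month; the function name and the numerical values are hypothetical and are not taken from this study.

```python
SECONDS_PER_DAY = 86_400.0


def discharge_to_runoff_mm(discharge_m3s, basin_area_km2, days_in_month):
    """Convert a monthly mean discharge (m^3/s) into a runoff depth (mm/month)."""
    # Total volume of water leaving the basin during the month.
    volume_m3 = discharge_m3s * SECONDS_PER_DAY * days_in_month
    # Spread that volume over the whole basin area (km^2 -> m^2).
    area_m2 = basin_area_km2 * 1.0e6
    return volume_m3 / area_m2 * 1000.0  # depth in m -> mm


# Illustrative round numbers only (hypothetical, not from this study).
runoff_mm = discharge_to_runoff_mm(30_000.0, 1.8e6, 30)
```

With these illustrative inputs the runoff depth is about 43.2 mm/month, which shows why basin-scale runoff is conveniently reported in mm/month.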
Remotely sensed RWD estimations, based on the most recent advances in passive and active remote sensing (RS), have been demonstrated to be viable alternatives [
12]. In the long term, remotely sensed RWD estimations are expected to compensate for the scarcity of in-situ hydrological data, particularly in remote regions. Current remotely sensed RWD methods can be classified into the following four categories:
- (1)
The correlation between passive remotely sensed variables (e.g., Normalized Difference Vegetation Index (NDVI), and Land Surface Temperature (LST) [
13,
14,
15,
16]) and the water level (WL) or RWD data [
17,
18,
19,
20,
21,
22]. Notably, the above remotely sensed variables have no direct causal relationship with RWD.
- (2)
The calculation of the RWD from passive remotely sensed hydraulic variables, via hydraulic geometric equations (e.g., [
23,
24,
25,
26,
27,
28,
29]). In this case, the accuracy of RWD estimation would be region-dependent: the resolution of RS images might be insufficient to detect small changes in river width [
30] and the roughness coefficient might be unavailable in some regions [
31,
32].
- (3)
The correlation between active remotely sensed WL (collected from satellite radar altimetry) and in-situ RWD data (collected from hydrological stations) via stage–discharge rating curves [
1,
33,
34,
35]. In this case, the altimetry radar signals are partially contaminated by land surfaces when crossing narrow rivers, which limits the accuracy of the observed altimetric WL (e.g., [
36]).
- (4)
The calculation of the RWD near the estuary mouth (or of
R), based on the land water balance [
37] or on the combined land–atmosphere water balance equation [
38]. However, previous studies based on this approach have used both observed and modeled data of varying quality (e.g., [
39,
40]).
The fourth method represents a purely hydrological modeling method that can be applied to the basin-wide estimation of
R and does not rely on in-situ ground observations. This method involves the combined use of remotely sensed precipitation (
P), terrestrial water storage (
S), modeled evapotranspiration (
ET) and/or modeled atmospheric moisture budget data (i.e., moisture flux divergence (∇·Q), and the changes in the total column water vapor (∂W/∂t)). Both in the case of land-based (i.e., R = P − ET − ∆S) [41,42] and combined land–atmosphere water balance (i.e., R = −∂W/∂t − ∇·Q − ∆S) (e.g., [43]) equations, the quality of the modeled
ET and of the atmospheric moisture budget results is limited by large uncertainties [
44,
45,
46]. This is due to the fact that different land cover types and physical assumptions are considered for their calculation. Moreover, before the launch of Gravity Recovery and Climate Experiment (GRACE), the
S values were unavailable and their change (∆S) was assumed to be zero [
41,
47]. This assumption caused further uncertainties in the estimation of the remotely sensed
R.
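The land-based water balance described above can be sketched in a few lines. The sketch assumes basin-averaged monthly inputs in mm/month; the function names and the numerical values are hypothetical and serve only to show how neglecting the storage change shifts the estimated runoff.

```python
import numpy as np


def runoff_land_balance(p, et, ds):
    """Monthly basin-mean runoff from the land water balance, R = P - ET - dS.

    All inputs are basin-averaged monthly values in mm/month.
    """
    p, et, ds = (np.asarray(x, dtype=float) for x in (p, et, ds))
    return p - et - ds


def runoff_without_storage(p, et):
    """Pre-GRACE simplification: the storage change is assumed to be zero."""
    p = np.asarray(p, dtype=float)
    return runoff_land_balance(p, et, np.zeros_like(p))


# Made-up illustrative values (mm/month), not data from this study.
p = [120.0, 180.0, 90.0]
et = [60.0, 85.0, 50.0]
ds = [15.0, 25.0, -10.0]

r_full = runoff_land_balance(p, et, ds)
r_no_storage = runoff_without_storage(p, et)
```

In this sketch the difference between the two estimates equals the neglected storage change in every month, so any seasonal cycle in the storage term passes directly into the runoff error.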
The land–atmosphere water balance equation used for
R estimation was first applied successfully to the Amazon and Mississippi river basins [
38]. The authors of that study used GRACE
S and a modeled atmospheric moisture budget from operational forecast analyses (provided by the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP-NCAR) [
48] and by the European Centre for Medium-Range Weather Forecasts (ECMWF)). The estimated
R for the Yangtze river basin (YRB) closely agreed with the in-situ observed
R: the peak-to-peak Pearson correlation coefficient (PCC) was 0.92 [
44]. Ferreira et al. (2013) [
37] estimated
R for the same river basin but based on the land-based water balance equation (using remotely sensed
P from the Tropical Rainfall Measuring Mission (TRMM) [
49],
∆S from GRACE [
50], and modeled
ET data from the Global Land Data Assimilation System (GLDAS) [
45]). The result of Ferreira et al. (2013) [
37] indicated a good agreement between the estimated
R and the in-situ observed
R time series (i.e., PCC 0.74 and root-mean-square error (RMSE) 14.30 mm/month).
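The two agreement metrics quoted above (PCC and RMSE) can be computed as in the following sketch. The function names and the sample values are hypothetical; the sketch only assumes two equally long monthly series in mm/month.

```python
import numpy as np


def pcc(estimated, observed):
    """Pearson correlation coefficient between two equally long series."""
    est = np.asarray(estimated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    return float(np.corrcoef(est, obs)[0, 1])


def rmse(estimated, observed):
    """Root-mean-square error, in the units of the input series."""
    est = np.asarray(estimated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((est - obs) ** 2)))


# Made-up illustrative runoff values (mm/month), not data from this study.
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 37.0]

pcc_value = pcc(estimated, observed)
rmse_value = rmse(estimated, observed)
```

The PCC measures how well the temporal pattern is reproduced and is insensitive to a constant offset, whereas the RMSE also penalizes bias; reporting both, as done here, therefore gives complementary information.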
The
ET obtained from Moderate Resolution Imaging Spectroradiometer (MODIS) could conveniently replace the modeled
ET, since the Penman–Monteith equation provided a better representation of croplands and grasslands [
51]. These land cover types are widespread within the YRB, particularly in its middle and lower reaches. Previous accuracy assessments did not account for possible seasonal error characteristics or for deficiencies in the modeled data products used for the estimation of
R. This study aims to address these issues: we applied a purely remotely sensed, data-driven method, based on the land water balance equation, to estimate
R in the YRB at a monthly temporal scale. This data-driven method is established using water balance equations in combination with TRMM
P, MODIS
ET, and GRACE
∆S data.
In China, food security relies on the water resources of the YRB; hence, this river basin is one of the most significant study areas [
52]. A number of water management projects have been operated to regulate
R during severe drought and flood seasons [
53,
54]. Other human activities, such as damming, groundwater withdrawal, water consumption, and land use change, can have substantial impacts on
R of the river basin [
55,
56]. Land-use change accounts for <0.2% of the change in the runoff trend [
57], while damming, groundwater withdrawal, and water consumption account for <2%, <1%, and <10% of the changes in the annual runoff [
58], respectively. However, other research studies have indicated climate change as the main factor affecting
R (accounting for 90% of the change in
R in the YRB) [
57,
59,
60,
61]. Additional studies have indicated that human activities in the YRB might significantly influence changes in the sediments, but not in
R [
60,
62].
Notably, the
R values estimated from the water balance equation were calculated by subtraction among the remotely sensed
P,
ET, and ∆
S: the systematic effect caused by the YRB environment and by human activities should have been partly mitigated by this subtraction process. The aforementioned information partially supports our data-driven
R estimation for the YRB, which is based purely on the above remotely sensed hydrological variables and on the application of the water balance equation. After calculating the resulting
R, we compared our results with those published in Syed et al. (2009) [
44] and Ferreira et al. (2013) [
37]; moreover, we performed a seasonal accuracy assessment against the observed
R time series, in order to evaluate the performance of our method during all seasons. Finally, we discussed the deficiencies of modeled data products and tried to explain the reasons for different seasonal discrepancies between the
R obtained through our method and the observed in-situ data.
This paper is organized as follows:
Section 2 presents the geographic environment of the YRB;
Section 3 presents the methodology, data used in this study, and their validation;
Section 4 demonstrates the validity of the estimated
R and compares our results with those of previously published studies; and
Section 5 summarizes our conclusions.
4. Discussion
The resulting
R time series based on our purely satellite data-driven method (i.e., MODIS
ET and several GRACE
S) were compared to remotely sensed and modeled data products (i.e., ITSG GRACE
S and several GLDAS
ET, or MODIS
ET and several GLDAS
S), which were based on the same TRMM
P data (
Table 6). Our analysis showed that MODIS and ITSG, ITSG and CLM, and MODIS and Mosaic were the best data combinations for describing the remotely sensed
ET and
S changes (
Figure 7). The combination (i.e., ITSG GRACE
S and CLM
ET) yielded a PCC of 0.73 and an RMSE of 15.69 mm/month, comparable to those reported by Ferreira et al. (2013) [
37] (i.e., 0.74 and 14.30 mm/month, respectively). Our metrics were slightly worse than those in [
37], probably because of the additional three-year time span (between 2010 and 2012) considered in this study: during this period, the occurrence of a moderately strong El Niño event might have created anomalous conditions. In the case of the MODIS
ET and GRACE
S data, our data-driven method resulted in a PCC of 0.88 and an RMSE of 11.69 mm/month, indicating a higher accuracy than that obtained in [
37]. According to the results, the MODIS16A2
ET data are more reliable than those of the GLDAS, at least for the YRB. This is probably due to the use of the Penman–Monteith equation in the production of the MODIS16A2
ET data: this equation provides a better representation of cropland and grassland covers [
51], particularly in the YRB.
Our results were compared with those of Syed et al. (2009) [
44] by considering the peak-to-peak correlation between the remotely sensed and the observed
R (
Table 7). We found that all the peak-to-peak correlations were at least 0.92: our purely data-driven method resulted in a correlation of 0.96, while the methods used in [
44] and [
37] resulted in correlation coefficients of 0.92 and 0.93, respectively. Overall, the results we derived from TRMM
P, MODIS
ET and GRACE
S data were substantially more accurate than those presented in the previous publications.
The Taylor diagram provides a direct way to determine the degree of correspondence between observed and estimated data [
90]: it shows the correlation between the two datasets, their centered root-mean-square difference, and their standard deviations.
Figure 8 shows that the four remotely sensed
R derived through the purely data-driven method (triangles) overlapped each other: this indicates that the choice of different GRACE
S in combination with the MODIS
ET data has a negligible effect on the estimated
R. However, the results derived from the MODIS
ET and GLDAS
S, and GLDAS
ET and GRACE
S datasets showed poor consistency, particularly the latter. These results suggest that the
ET is likely the most significant error source during the estimation of the remotely sensed
R. In summary, the remotely sensed
R derived from the purely data-driven method is the closest to the observed
R.
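The statistics summarized by a Taylor diagram can be computed as in the sketch below. It assumes population standard deviations (NumPy's default) and the centered root-mean-square difference; the function name and the sample values are hypothetical and are not the data plotted in Figure 8.

```python
import numpy as np


def taylor_statistics(estimated, observed):
    """Return (sigma_est, sigma_obs, correlation, centered RMS difference)."""
    est = np.asarray(estimated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    sigma_est = est.std()  # population standard deviation (ddof=0)
    sigma_obs = obs.std()
    corr = np.corrcoef(est, obs)[0, 1]
    # Remove each series' mean before differencing, so a constant bias is ignored.
    centered_diff = (est - est.mean()) - (obs - obs.mean())
    crmsd = np.sqrt(np.mean(centered_diff ** 2))
    return float(sigma_est), float(sigma_obs), float(corr), float(crmsd)


# Made-up illustrative runoff values (mm/month), not data from this study.
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 37.0]

s_est, s_obs, corr, crmsd = taylor_statistics(estimated, observed)

# Law-of-cosines relation that allows all three statistics to share one plot.
identity_rhs = s_est ** 2 + s_obs ** 2 - 2.0 * s_est * s_obs * corr
```

The squared centered RMS difference equals `identity_rhs`, which is the geometric relation that lets a single point on the diagram encode all three statistics. Because the means are removed, a constant bias between the estimated and observed series does not appear in the diagram.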
The discrepancy between the observed and estimated runoff was examined in detail by computing the mean monthly
R derived from the three dataset combinations. The results derived from the purely data-driven method best matched the observations, although the estimated
R values matched the observations with varying degrees of accuracy across seasons. Our results indicated a maximum and a minimum
R in July and January, respectively (
Figure 9); however, the estimated minimum
R was inconsistent with the observed minimum
R, which occurred in February. Apparently, our method underestimated
R in winter and overestimated it in summer (
Figure 9a), likely due to the characteristics of the vegetation cover types, which can influence the calculated MODIS
ET in different ways during dry and humid seasons.
In the case of the GLDAS
ET and ITSG GRACE
S combination (
Figure 9b), the runoff was underestimated between January and August because the GLDAS
ET data were overestimated (
Figure 5), likely due to the effect of incoming radiation and temperature, as stated in [
51]. The
S data derived from the GLDAS land surface model were even less accurate than those described above (
Figure 9c): the resulting estimated
R displayed its maximum one month earlier than the observed time series, causing an underestimation of the
R between July and December.
In general, the correlation coefficients between the estimated and the observed
R during spring and autumn were higher than during summer and winter for all three combinations of remotely sensed data (
Table 8). These results further confirm our observations based on
Figure 9: overall, the rising and falling trends were well captured by our estimated
R. The contrast between the higher correlation coefficients in spring and autumn and the lower ones in summer and winter probably derived from an overestimation of the modeled
ET and, possibly, from the low temporal resolution of the data at the monthly scale.
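The seasonal assessment discussed above combines a mean monthly climatology with season-wise correlations. A minimal sketch follows, assuming meteorological seasons (March to May, June to August, September to November, December to February); this season definition, the function names, and the synthetic series are assumptions made for illustration and are not taken from Table 8 or Figure 9.

```python
import numpy as np

# Assumed season definition (meteorological seasons); the study may use another.
SEASONS = {
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "autumn": (9, 10, 11),
    "winter": (12, 1, 2),
}


def monthly_climatology(values, months):
    """Mean value for each calendar month (January first)."""
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    return np.array([values[months == m].mean() for m in range(1, 13)])


def seasonal_pcc(estimated, observed, months):
    """Pearson correlation between the two series within each season."""
    est = np.asarray(estimated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    months = np.asarray(months)
    result = {}
    for name, season_months in SEASONS.items():
        mask = np.isin(months, season_months)
        result[name] = float(np.corrcoef(est[mask], obs[mask])[0, 1])
    return result


# Synthetic three-year series (hypothetical values, mm/month).
months = np.tile(np.arange(1, 13), 3)
year = np.repeat(np.arange(3), 12)
observed = 60.0 - 40.0 * np.cos(2.0 * np.pi * (months - 1) / 12.0) + 5.0 * year
# Impose a +5 mm/month summer bias and a -5 mm/month winter bias.
bias = (np.where(np.isin(months, SEASONS["summer"]), 5.0, 0.0)
        - np.where(np.isin(months, SEASONS["winter"]), 5.0, 0.0))
estimated = observed + bias

clim_obs = monthly_climatology(observed, months)
clim_est = monthly_climatology(estimated, months)
pcc_by_season = seasonal_pcc(estimated, observed, months)
```

In this synthetic case the climatology difference recovers the imposed seasonal bias, while the within-season correlations stay at 1 because a constant offset does not affect the PCC. This is why the mean monthly comparison and the seasonal correlations are best read together.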
5. Conclusions
Previous studies have calculated the time series of monthly TBD (in terms of R) in the YRB by applying the water balance equation to a combination of RS and modeled data products. Here, we applied a data-driven method based purely on RS data. Before the investigation, the remotely sensed data were validated against in-situ gauge measurements or inferred measurements at point locations. This validation indicated that large uncertainties exist in the modeled data products; these uncertainties were evident both when the modeled data products were compared with the observed hydrological data from in-situ stations (or with the inferred data) and when the estimated runoff was compared with the observed runoff.
Compared against the observed runoff, our best
R (obtained from purely remotely sensed data) and that of Ferreira et al. (2013) [
37] yielded PCCs of 0.89 and 0.74 and RMSEs of 11.69 mm/month and 14.30 mm/month, respectively; our method therefore performed better on both metrics. The peak-to-peak correlation values were also calculated: in this respect, our method produced slightly better results than those of Syed et al. (2009) [
44] and Ferreira et al. (2013) [
37].
Seasonal error characterization was conducted to assess the performance of our method during specific seasons. We found that the remotely sensed TBD did not accurately capture the maximum and minimum runoff values in summer and winter, respectively. This poor performance could be attributed to an overestimation and underestimation of
ET, respectively, which depend on the input variables (e.g., vegetation cover types, solar radiation, and temperature) [
51]. The
ET values tend to be overestimated or underestimated when the input variables are unstable, particularly in dry and humid seasons. Notably, the low temporal resolution of the data at a monthly scale could have also contributed to these effects. This finding has not been reported in previous studies.
Satellite data products with higher temporal resolution are gradually becoming available (e.g., daily TRMM precipitation [
91], eight-day MODIS evapotranspiration [
92], and daily GRACE terrestrial water storage data products [
93]). Future research might apply our proposed method to these new data, although caution is required regarding data validation, data postprocessing, and the choice of geographic region.