The Choice of an Appropriate Information Dissimilarity Measure for Hierarchical Clustering of River Streamflow Time Series, Based on Calculated Lyapunov Exponent and Kolmogorov Measures
Abstract
1. Introduction
2. Data and Computations
2.1. Data and Gauging Locations
2.2. Basic Descriptive Statistics
3. Method
3.1. Choice of Measures for Characterization of Streamflow for Clustering
3.2. Normalized Compression Distance
- Select a data compressor from those available (e.g., gzip, bzip2).
- Set the number of time series to be compressed by the chosen compressor to N.
- Set all elements of the clustering matrix to zero.
- For each pair of time series $x$ and $y$, estimate the Kolmogorov complexity by the lengths of the compressed time series produced by the chosen compressor, $C(x)$ and $C(y)$.
- Calculate $C(xy)$, which is the compressed size in bytes of the time series $x$ and $y$ concatenated.
- Find the lower value, $\min\{C(x), C(y)\}$.
- Find the higher value, $\max\{C(x), C(y)\}$.
- Calculate the normalized compression distance (NCD) given by Equation (3), $\mathrm{NCD}(x, y) = \dfrac{C(xy) - \min\{C(x), C(y)\}}{\max\{C(x), C(y)\}}$.
- Set the calculated value into the corresponding element of the clustering matrix.
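The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes `zlib` as the compressor and a plain-text serialization of each series with one decimal place (the source does not state how the series were encoded), and the names `to_bytes`, `c`, `ncd`, and `ncd_matrix` are hypothetical.

```python
import zlib

def to_bytes(series):
    # Assumed encoding: space-separated values with one decimal place.
    return (" ".join(f"{v:.1f}" for v in series) + " ").encode("ascii")

def c(data: bytes) -> int:
    # Compressed length in bytes, used as a stand-in for Kolmogorov complexity.
    return len(zlib.compress(data, 9))

def ncd(x, y):
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    bx, by = to_bytes(x), to_bytes(y)
    cx, cy, cxy = c(bx), c(by), c(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)

def ncd_matrix(series_list):
    # Start from an all-zero matrix and fill each pair once; the value for
    # (i, j) is mirrored to (j, i) so the matrix stays symmetric.
    n = len(series_list)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = ncd(series_list[i], series_list[j])
    return d

# Toy data (not streamflow): a periodic series and an irregular one.
a = [float(i % 7) for i in range(300)]
b = [float((i * 7919) % 1013) for i in range(300)]
print(ncd(a, a), ncd(a, b))
```

A series compared with itself should give a value near zero, and two unrelated series a value near one. Real compressors only approximate this, so small deviations (including values slightly above one) are expected.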
3.3. Permutation Distribution Dissimilarity Measure
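- Set all elements of the clustering matrix to zero.
- Use the first time series, $x = \{x_1, x_2, \ldots, x_T\}$.
- For the given time series, the $m$-dimensional embedding with time delay $t$ is $x'_i = (x_i, x_{i+t}, \ldots, x_{i+(m-1)t})$.
- Sort each $x'_i$ in ascending order to get the permutation for each $i$.
- Obtain the distribution of permutations, $P$.
- Repeat steps 2–5 for the second time series, $y$, to obtain $Q$.
- Calculate the distance $D(P, Q)$, where $P$ and $Q$ are discrete probability distributions.
- Set the calculated value of $D(P, Q)$ into the corresponding element of the clustering matrix.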
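A short sketch of the permutation-distribution steps follows. The embedding dimension `m`, the delay `t`, and the function names are illustrative. The source does not say here which distance between the two distributions is used, so the squared Hellinger distance below is an assumption made only to complete the example.

```python
from collections import Counter
from itertools import permutations
from math import sqrt

def permutation_distribution(x, m=3, t=1):
    # Relative frequency of each ordinal pattern of length m with delay t.
    n_vectors = len(x) - (m - 1) * t
    counts = Counter()
    for i in range(n_vectors):
        window = [x[i + k * t] for k in range(m)]
        # Indices that sort the window in ascending order (ties keep their order).
        pattern = tuple(sorted(range(m), key=window.__getitem__))
        counts[pattern] += 1
    # Include patterns that never occur so both distributions share one support.
    return {p: counts[p] / n_vectors for p in permutations(range(m))}

def squared_hellinger(p, q):
    # Assumed choice of D(P, Q); 0 for identical distributions, 1 for disjoint ones.
    return 0.5 * sum((sqrt(p[k]) - sqrt(q[k])) ** 2 for k in p)

def pdd(x, y, m=3, t=1):
    return squared_hellinger(permutation_distribution(x, m, t),
                             permutation_distribution(y, m, t))

# Toy data: a strictly rising and a strictly falling series.
up = list(range(50))
down = list(range(50, 0, -1))
print(pdd(up, up), pdd(up, down))
```

The rising series contains only the ascending pattern and the falling series only the descending one, so the two distributions do not overlap and the distance reaches its maximum.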
3.4. Kolmogorov Complexity Distance (KD)
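- Set all elements of the clustering matrix to zero.
- Calculate the distances $d(x, y)$ using Equation (5).
- Check all pairs: if, for a given pair of time series $x$, $y$ and some third series $z$, it turns out that $d(x, y) > d(x, z) + d(z, y)$, then the distance is set to $d(x, y) = d(x, z) + d(z, y)$.
- The true distance is computed by iterating this procedure until, for all $x$, $y$, and $z$, the triangle inequality $d(x, y) \le d(x, z) + d(z, y)$ is satisfied.
- Set the calculated value of $d(x, y)$ into the corresponding element of the clustering matrix.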
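The iteration in the last steps can be read as repeatedly shortening any distance that violates the triangle inequality until none does. The sketch below follows that reading, which is an assumption. Equation (5) is not reproduced in this text, so the input is simply any symmetric dissimilarity matrix, and `enforce_triangle_inequality` is a hypothetical name.

```python
def enforce_triangle_inequality(d, tol=1e-12):
    # Work on a copy so the raw distances stay available.
    n = len(d)
    d = [row[:] for row in d]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    via = d[i][k] + d[k][j]
                    # If the direct distance is longer than the detour through k,
                    # replace it by the detour length.
                    if d[i][j] > via + tol:
                        d[i][j] = via
                        changed = True
    return d

# Toy matrix: the 0-2 distance (5) violates the triangle inequality via series 1.
raw = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]
fixed = enforce_triangle_inequality(raw)
print(fixed)
```

Each update strictly decreases one entry, and no entry can fall below the shortest path between the two series, so the loop terminates. For a symmetric input the result is symmetric as well.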
3.5. Calculation of Largest Lyapunov Exponent and Kolmogorov Measures
4. Results and Discussion
4.1. Selection of Information Measures for K-Means Clustering of Daily Streamflow
4.1.1. General Features
4.1.2. Largest Lyapunov Exponent (LLE)
4.1.3. Kolmogorov Complexity (KC) and the Highest Value of Kolmogorov Complexity Spectrum (KCM)
4.2. Hierarchical Clustering of Daily Streamflow
4.3. K-Means Clustering of Daily Streamflow
5. Conclusions
- We considered how to select suitable information measures for K-means clustering and chose three: the LLE, KC, and KCM.
- This choice was made for two reasons. First, many factors, both natural and human-induced, cause continuous changes in the streamflow time series of the Brazos River and its drainage basin and, therefore, in their nonlinearity and complexity. Second, streamflow processes are unavoidably influenced by measurement at gauging stations (including uncertainties in the single determination of river discharge) and by dynamical noise, which increases the LLE.
- The agglomerative average-linkage hierarchical algorithm was applied to dissimilarity matrices based on NCD, PDDM, and KD, computed from daily streamflow discharge data from twelve gauging stations. Of the three, we selected the KD-based clustering as the most suitable.
- The dendrogram indicated that the gauging stations may be grouped into either three or four clusters. For the statistical analysis (a 3D scatter plot specified by the vectors KC, KCM, and LLE, and the calculation of the cluster centroids (means)), we chose four clusters.
- On the basis of the analysis of variance (ANOVA) results, we concluded that there were highly significant differences between the mean values of the four clusters, which confirmed that the number of clusters was chosen correctly.
- The predictability of the standardized daily discharge data of the Brazos River, given by the Lyapunov time (LT) corrected for randomness (in days), was as follows: (i) three to four days for Cluster 1 (1_08082500, 2_08088000, 5_08090800, 6_08091000, 8_08096500, and 9_08098290 stations); (ii) up to four days for Cluster 2 (3_08088610 station); (iii) approximately three days for Cluster 3 (4_08089000 and 7_08093100 stations); and (iv) approximately five days for Cluster 4 (10_08111500, 11_08114000, and 12_08116650 stations).
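For readers unfamiliar with the agglomerative average-linkage algorithm mentioned above, the following is a minimal pure-Python sketch that works directly on a dissimilarity matrix. It is illustrative only: the function name and the toy matrix are hypothetical, and it stops at a requested number of clusters rather than building a full dendrogram.

```python
def average_linkage(d, n_clusters):
    # Start with every series in its own cluster.
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean of all pairwise dissimilarities
                # between members of the two clusters.
                total = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        # Merge the closest pair and drop the absorbed cluster.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy dissimilarity matrix: four points on a line at 0, 1, 10, and 11.
points = [0.0, 1.0, 10.0, 11.0]
toy = [[abs(p - q) for q in points] for p in points]
print(average_linkage(toy, 2))
```

In practice a library routine that also returns the merge heights is preferable, since those heights are what a dendrogram displays.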
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
USGS Code | Station | Mean | Median | Min | Max | IQR | SDi |
---|---|---|---|---|---|---|---|
1_08082500 | Seymour | 223.5 | 51.0 | 0.0 | 30,700.0 | 130.0 | 907.9 |
2_08088000 | South Bend | 613.0 | 110.0 | 0.0 | 43,800.0 | 320.0 | 2209.8 |
3_08088610 | Graford | 623.5 | 109.0 | 4.1 | 43,800.0 | 300.0 | 2306.9 |
4_08089000 | Palo Pinto | 723.7 | 133.0 | 8.5 | 39,700.0 | 361.0 | 2557.9 |
5_08090800 | Dennis | 974.4 | 195.0 | 0.0 | 79,500.0 | 418.0 | 3600.3 |
6_08091000 | Glen Rose | 1078.8 | 86.0 | 1.5 | 82,100.0 | 530.0 | 4093.9 |
7_08093100 | Aquilla | 1561.2 | 445.0 | 1.2 | 27,100.0 | 1118.0 | 3687.3 |
8_08096500 | Waco | 2456.1 | 695.0 | 0.5 | 44,000.0 | 1775.0 | 5237.7 |
9_08098290 | Highbank | 3103.7 | 873.5 | 30.0 | 70,300.0 | 2240.0 | 6148.1 |
10_08111500 | Hempstead | 8014.3 | 2520.0 | 58.0 | 137,000.0 | 7650.0 | 12,821.1 |
11_08114000 | Richmond | 8523.8 | 2855.0 | 182.0 | 102,000.0 | 8660.0 | 13,232.0 |
12_08116650 | Rosharon | 8851.4 | 3060.0 | 27.0 | 109,000.0 | 9080.0 | 13,638.0 |
USGS Code | Station | LLE | KC | KCM |
---|---|---|---|---|
1_08082500 | Seymour | 0.158 | 0.266 | 0.489 |
2_08088000 | South Bend | 0.038 | 0.242 | 0.446 |
3_08088610 | Graford | 0.394 | 0.474 | 0.682 |
4_08089000 | Palo Pinto | 0.032 | 0.371 | 0.658 |
5_08090800 | Dennis | 0.042 | 0.311 | 0.510 |
6_08091000 | Glen Rose | 0.051 | 0.301 | 0.508 |
7_08093100 | Aquilla | 0.055 | 0.352 | 0.581 |
8_08096500 | Waco | 0.061 | 0.298 | 0.526 |
9_08098290 | Highbank | 0.061 | 0.316 | 0.422 |
10_08111500 | Hempstead | 0.027 | 0.218 | 0.285 |
11_08114000 | Richmond | 0.014 | 0.201 | 0.260 |
12_08116650 | Rosharon | 0.018 | 0.200 | 0.252 |
Information Measure | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
---|---|---|---|---|
KC | 0.289 | 0.474 | 0.362 | 0.206 |
KCM | 0.484 | 0.682 | 0.620 | 0.266 |
LLE | 0.069 | 0.394 | 0.044 | 0.020 |
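The centroids in the table above are cluster means of the per-station values, so they can be recomputed as a consistency check. The sketch below copies the station values from the table of information measures and the cluster memberships from the conclusions. The dictionary and function names are illustrative.

```python
from statistics import mean

# (LLE, KC, KCM) per station number, copied from the table of information measures.
MEASURES = {
    1: (0.158, 0.266, 0.489),
    2: (0.038, 0.242, 0.446),
    3: (0.394, 0.474, 0.682),
    4: (0.032, 0.371, 0.658),
    5: (0.042, 0.311, 0.510),
    6: (0.051, 0.301, 0.508),
    7: (0.055, 0.352, 0.581),
    8: (0.061, 0.298, 0.526),
    9: (0.061, 0.316, 0.422),
    10: (0.027, 0.218, 0.285),
    11: (0.014, 0.201, 0.260),
    12: (0.018, 0.200, 0.252),
}

# Cluster memberships as listed in the conclusions.
CLUSTERS = {1: [1, 2, 5, 6, 8, 9], 2: [3], 3: [4, 7], 4: [10, 11, 12]}

def centroid(stations):
    # Component-wise mean of (LLE, KC, KCM) over the stations in one cluster.
    return tuple(mean(MEASURES[s][k] for s in stations) for k in range(3))

centroids = {c: centroid(members) for c, members in CLUSTERS.items()}
for c, (lle, kc, kcm) in centroids.items():
    print(f"Cluster {c}: KC={kc:.4f} KCM={kcm:.4f} LLE={lle:.4f}")
```

The recomputed means agree with the tabulated centroids to within rounding of the third decimal place.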
Variable | Between SS | df | Within SS | df | F | P |
---|---|---|---|---|---|---|
KC | 0.064679 | 3 | 0.004561 | 8 | 37.81 | 0.00005 |
KCM | 0.215958 | 3 | 0.011885 | 8 | 48.46 | 0.00002 |
LLE | 0.112645 | 3 | 0.010489 | 8 | 28.64 | 0.00013 |
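The F statistics in the ANOVA table follow from the sums of squares and degrees of freedom as F = (between SS / df) / (within SS / df), with 3 = 4 − 1 degrees of freedom between clusters and 8 = 12 − 4 within. The short check below recomputes them from the tabulated values. The names are illustrative.

```python
# (between SS, df between, within SS, df within) copied from the ANOVA table.
ANOVA_ROWS = {
    "KC": (0.064679, 3, 0.004561, 8),
    "KCM": (0.215958, 3, 0.011885, 8),
    "LLE": (0.112645, 3, 0.010489, 8),
}

def f_ratio(ss_between, df_between, ss_within, df_within):
    # Ratio of the between-cluster mean square to the within-cluster mean square.
    return (ss_between / df_between) / (ss_within / df_within)

f_values = {name: f_ratio(*row) for name, row in ANOVA_ROWS.items()}
for name, f in f_values.items():
    print(f"{name}: F = {f:.2f}")
```

The recomputed ratios match the tabulated F values to within about 0.01, which is consistent with rounding in the table.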
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Mihailović, D.T.; Nikolić-Đorić, E.; Malinović-Milićević, S.; Singh, V.P.; Mihailović, A.; Stošić, T.; Stošić, B.; Drešković, N. The Choice of an Appropriate Information Dissimilarity Measure for Hierarchical Clustering of River Streamflow Time Series, Based on Calculated Lyapunov Exponent and Kolmogorov Measures. Entropy 2019, 21, 215. https://doi.org/10.3390/e21020215