Research article
DOI: 10.1145/2487575.2488220

Towards long-lead forecasting of extreme flood events: a data mining framework for precipitation cluster precursors identification

Published: 11 August 2013

Abstract

The development of forecasting techniques that can provide warnings of disastrous floods at a long lead time (5-15 days) is of great importance to society. An extreme flood is usually the consequence of a sequence of precipitation events occurring over several days to several weeks. Although precisely forecasting the magnitude and extent of an individual precipitation event is still beyond our reach, long-term forecasting of precipitation clusters can be attempted by identifying persistent atmospheric regimes that are conducive to such clusters. However, such forecasting suffers from an overwhelming number of relevant features and from highly imbalanced sample sets. In this paper, we propose an integrated data mining framework that identifies the precursors to precipitation event clusters and uses this information to predict extended periods of extreme precipitation and subsequent floods. We synthesize a representative feature set that describes atmospheric motion, and apply a streaming feature selection algorithm to identify the precipitation precursors online from the enormous feature space. A hierarchical re-sampling approach is embedded in the framework to deal with the imbalance problem. An extensive empirical study is conducted on historical precipitation and associated flood data collected in the State of Iowa. Using our framework, a few physically meaningful precipitation cluster precursor sets are identified from millions of features. More than 90% of extreme precipitation events are captured by the proposed prediction model using precipitation cluster precursors with a lead time of more than 5 days.


    Published In

    KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2013
    1534 pages
    ISBN: 9781450321747
    DOI: 10.1145/2487575
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. flood forecasting
    2. online streaming feature selection
    3. spatial-temporal data mining

    Acceptance Rates

    KDD '13 paper acceptance rate: 125 of 726 submissions (17%)
    Overall acceptance rate: 1,133 of 8,635 submissions (13%)
