DOI:10.1145/1557019.1557113
research-article

Learning, indexing, and diagnosing network faults

Published: 28 June 2009

Abstract

Modern communication networks generate massive volumes of operational event data (e.g., alarms, alerts, and metrics) that a network management system (NMS) can use to diagnose potential faults. In this work, we introduce a new class of indexable fault signatures that encode the temporal evolution of the events generated by a network fault, as well as the topological relationships among the nodes where these events occur. We present an efficient learning algorithm that extracts such fault signatures from noisy historical event data, and we show how novel space-time indexing structures enable efficient online signature matching. We report results from extensive experimental studies that explore the efficacy of our approach, and we point out potential applications of such signatures to many other types of networks, including social and information networks.
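To make the idea of a signature that combines timing and topology concrete, here is a minimal Python sketch. It is not the representation, learning algorithm, or index described in the paper: the names `Event`, `SignatureStep`, `hop_distances`, and `matches`, the per-step delay and hop bounds, and the sample topology are all assumptions chosen for illustration. The matcher is a brute-force scan, whereas the paper relies on space-time indexing to make matching efficient.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    node: str    # node that emitted the event
    kind: str    # hypothetical event type, e.g. "link_down"
    time: float  # timestamp in seconds


@dataclass(frozen=True)
class SignatureStep:
    kind: str         # event type expected by this step
    max_delay: float  # latest allowed time after the root event (seconds)
    max_hops: int     # farthest allowed hop distance from the root node


def hop_distances(adjacency, source):
    """Breadth-first hop counts from `source` over an adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def matches(signature, root, events, adjacency):
    """Return True if every step is satisfied by some event near `root`
    in both time (delay bound) and topology (hop bound)."""
    dist = hop_distances(adjacency, root.node)
    for step in signature:
        satisfied = any(
            e.kind == step.kind
            and 0 <= e.time - root.time <= step.max_delay
            and dist.get(e.node, float("inf")) <= step.max_hops
            for e in events
        )
        if not satisfied:
            return False
    return True


# Hypothetical three-router chain: r1 - r2 - r3.
adjacency = {"r1": ["r2"], "r2": ["r1", "r3"], "r3": ["r2"]}
signature = [
    SignatureStep("link_down", max_delay=0.0, max_hops=0),
    SignatureStep("route_flap", max_delay=5.0, max_hops=1),
]
events = [
    Event("r1", "link_down", 10.0),
    Event("r2", "route_flap", 12.5),
    Event("r3", "route_flap", 30.0),
]
print(matches(signature, events[0], events, adjacency))
```

In this toy setup the signature matches because a `route_flap` occurs one hop from the failed link within five seconds. Dropping the `r2` event leaves only a flap that is both too late and too far away, so the match fails.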

Supplementary Material

JPG File (p857-srivatsa.jpg)
MP4 File (p857-srivatsa.mp4)


Information

Published In

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
June 2009
1426 pages
ISBN:9781605584959
DOI:10.1145/1557019
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. fault signature
  2. network topology
  3. online diagnosis

Qualifiers

  • Research-article

Conference

KDD09

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
