DOI: 10.1145/2342356.2342435
research-article

LIFEGUARD: practical repair of persistent route failures

Published: 13 August 2012

Abstract

The Internet was designed to always find a route if there is a policy-compliant path. However, in many cases, connectivity is disrupted despite the existence of an underlying valid path. The research community has focused on short-term outages that occur during route convergence. There has been less progress on addressing avoidable long-lasting outages. Our measurements show that long-lasting events contribute significantly to overall unavailability.
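
The claim that long-lasting events dominate unavailability concerns duration-weighted downtime, not event counts. The sketch below uses made-up durations and a hypothetical 5-minute cutoff (neither is taken from the paper's measurements) to show how a few long outages can outweigh many short ones.

```python
from typing import Iterable


def downtime_share(durations_s: Iterable[float], threshold_s: float) -> float:
    """Fraction of total outage time contributed by outages lasting >= threshold_s."""
    durations = list(durations_s)
    total = sum(durations)
    if total == 0:
        return 0.0  # no outages at all: nothing to attribute
    long_total = sum(d for d in durations if d >= threshold_s)
    return long_total / total


# Made-up outage durations in seconds, for illustration only:
# eight brief, convergence-style blips and two long-lasting outages.
EXAMPLE = [30, 45, 60, 20, 90, 40, 50, 65, 3600, 7200]

share = downtime_share(EXAMPLE, threshold_s=300)  # hypothetical 5-minute cutoff
print(f"{share:.1%} of downtime")  # 96.4% of downtime
```

In this toy input, 2 of 10 events (20%) account for roughly 96% of total downtime. This is why work that only shortens convergence-time blips leaves most unavailability untouched.
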
To address these problems, we develop LIFEGUARD, a system for automatic failure localization and remediation. LIFEGUARD uses active measurements and a historical path atlas to locate faults, even in the presence of asymmetric paths and failures. Given the ability to locate faults, we argue that the Internet protocols should allow edge ISPs to steer traffic destined to them around failures, without requiring the involvement of the network causing the failure. Although the Internet does not explicitly support this functionality today, we show how to approximate it using carefully crafted BGP messages. LIFEGUARD employs a set of techniques to reroute around failures with low impact on working routes. Deploying LIFEGUARD on the Internet, we find that it can effectively route traffic around an AS without causing widespread disruption.
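
The two steps above (locate the fault, then steer traffic around it with ordinary BGP) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not LIFEGUARD's implementation. The localization function naively compares a historical path against which hops still answer probes, and it ignores reverse-path failures, which the abstract says LIFEGUARD does handle. For rerouting, it assumes AS-path poisoning: the origin inserts the failing AS's number into its own announcement, so that AS's standard BGP loop check discards the route. It also assumes the origin normally prepends itself so that poisoned and unpoisoned announcements have equal length. The topology, the AS numbers, and every function name here are hypothetical, and propagation ignores business policy.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical AS-level adjacency (undirected), not from the paper.
# AS 1 is the origin; AS 2 is the AS believed to be failing.
TOPOLOGY: Dict[int, List[int]] = {
    1: [2, 3],
    2: [1, 4],
    3: [1, 5],
    4: [2, 5],
    5: [3, 4],
}


def suspect_link(historical_path: Sequence[int],
                 responsive: Set[int]) -> Optional[Tuple[Optional[int], int]]:
    """Naive localization: walk a path recorded before the outage and return
    (last hop that still answers probes, first hop that does not).
    Ignores reverse-path failures, which a real system must rule out."""
    last_ok: Optional[int] = None
    for hop in historical_path:
        if hop not in responsive:
            return (last_ok, hop)
        last_ok = hop
    return None  # every hop responds: no failure visible on this path


def baseline_path(origin: int) -> List[int]:
    """Normal announcement, prepended so it is as long as a poisoned one (assumption)."""
    return [origin, origin, origin]


def poisoned_path(origin: int, avoid: int) -> List[int]:
    """Insert `avoid` between copies of the origin ASN (AS-path poisoning)."""
    return [origin, avoid, origin]


def accepts(asn: int, as_path: Sequence[int]) -> bool:
    """BGP loop prevention: discard a route whose AS_PATH contains our own ASN."""
    return asn not in as_path


def propagate(adj: Dict[int, List[int]], origin: int,
              announced: List[int]) -> Dict[int, List[int]]:
    """Breadth-first propagation with no business policy (a simplification).
    Returns the AS path each non-origin AS installs; ASes with no route are absent."""
    adverts: Dict[int, List[int]] = {origin: list(announced)}
    selected: Dict[int, List[int]] = {}
    queue = deque([origin])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v == origin or v in selected:
                continue
            path = adverts[u]
            if not accepts(v, path):
                continue  # v drops this route; it may still hear another one later
            selected[v] = list(path)
            adverts[v] = [v] + path  # v prepends itself when re-advertising
            queue.append(v)
    return selected


print(suspect_link([1, 2, 4], responsive={1}))       # (1, 2)
print(propagate(TOPOLOGY, 1, baseline_path(1))[4])   # [2, 1, 1, 1]
print(propagate(TOPOLOGY, 1, poisoned_path(1, 2)))
# {3: [1, 2, 1], 5: [3, 1, 2, 1], 4: [5, 3, 1, 2, 1]}
```

With the unpoisoned announcement, AS 4 reaches the origin through AS 2. With the poisoned one, AS 2 rejects every copy of the route, and AS 4 falls back to the longer path through ASes 5 and 3. Because the two announcements have equal length in this sketch, ASes that never used AS 2 (here, AS 3) see no change in path length.
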

Supplementary Material

JPG File (sigcomm-ix-02-lifeguardpracticalrepairofpersistentroutefailures.jpg)
MP4 File (sigcomm-ix-02-lifeguardpracticalrepairofpersistentroutefailures.mp4)

    Published In

    SIGCOMM '12: Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication
    August 2012
    474 pages
    ISBN:9781450314190
    DOI:10.1145/2342356
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. availability
    2. bgp
    3. internet
    4. measurement
    5. outages
    6. repair
    7. routing

    Conference

    SIGCOMM '12: ACM SIGCOMM 2012 Conference
    August 13-17, 2012
    Helsinki, Finland

    Acceptance Rates

    Overall Acceptance Rate 462 of 3,389 submissions, 14%

    Article Metrics

    • Downloads (last 12 months): 99
    • Downloads (last 6 weeks): 17
    Reflects downloads up to 24 Dec 2024
