Characterization of failures in an operational IP backbone network
Source: IEEE/ACM Transactions on Networking (TON)
Volume 16, Issue 4 (August 2008)
Pages 749-762
Year of Publication: 2008
ISSN: 1063-6692
Authors
Athina Markopoulou  Department of Electrical Engineering and Computer Science, University of California at Irvine, Irvine, CA
Gianluca Iannaccone  Intel Research, Berkeley, CA
Supratik Bhattacharyya  Snaptell Inc., Mountain View, CA
Chen-Nee Chuah  Department of Electrical and Computer Engineering, University of California at Davis, Davis, CA
Yashar Ganjali  Department of Computer Science, University of Toronto, Bahen Center for Information Technology, Toronto, ON, Canada
Christophe Diot  Thomson R&D, Paris, France
Publisher
IEEE Press  Piscataway, NJ, USA
DOI: 10.1109/TNET.2007.902727

ABSTRACT

As the Internet evolves into a ubiquitous communication infrastructure and supports increasingly important services, its dependability in the presence of various failures becomes critical. In this paper, we analyze IS-IS routing updates from the Sprint IP backbone network to characterize failures that affect IP connectivity. Failures are first classified based on patterns observed at the IP-layer; in some cases, it is possible to further infer their probable causes, such as maintenance activities, router-related and optical layer problems. Key temporal and spatial characteristics of each class are analyzed and, when appropriate, parameterized using well-known distributions. Our results indicate that 20% of all failures happen during a period of scheduled maintenance activities. Of the unplanned failures, almost 30% are shared by multiple links and are most likely due to router-related and optical equipment-related problems, respectively, while 70% affect a single link at a time. Our classification of failures reveals the nature and extent of failures in the Sprint IP backbone. Furthermore, our characterization of the different classes provides a probabilistic failure model, which can be used to generate realistic failure scenarios, as input to various network design and traffic engineering problems.
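The percentages in the abstract are conditional: the 30% and 70% figures apply only to unplanned failures, not to all failures. The sketch below is a hypothetical illustration, not the authors' model. It converts those figures into shares of all failures and draws failure classes at random in those proportions. It assumes "almost 30%" can be rounded to exactly 0.30, and the class labels and function names are invented here. The paper's actual model also parameterizes temporal and spatial characteristics, which this sketch does not attempt.

```python
import random

# Fractions quoted in the abstract. "Almost 30%" is rounded to 0.30 (assumption).
P_MAINTENANCE = 0.20             # share of all failures during scheduled maintenance
P_SHARED_GIVEN_UNPLANNED = 0.30  # unplanned failures shared by multiple links
P_SINGLE_GIVEN_UNPLANNED = 0.70  # unplanned failures affecting a single link


def class_shares():
    """Convert the conditional percentages into shares of all failures."""
    unplanned = 1.0 - P_MAINTENANCE
    return {
        "maintenance": P_MAINTENANCE,
        "shared": unplanned * P_SHARED_GIVEN_UNPLANNED,
        "single_link": unplanned * P_SINGLE_GIVEN_UNPLANNED,
    }


def sample_failure_classes(n, seed=0):
    """Draw n failure-class labels in proportion to the shares above."""
    rng = random.Random(seed)
    shares = class_shares()
    names = list(shares)
    weights = [shares[name] for name in names]
    return rng.choices(names, weights=weights, k=n)
```

Under these assumptions, roughly 24% of all failures are shared unplanned failures and roughly 56% are single-link unplanned failures, with the remaining 20% falling in maintenance windows.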


