ABSTRACT
As the Internet evolves into a ubiquitous communication infrastructure and supports increasingly important services, its dependability in the presence of various failures becomes critical. In this paper, we analyze IS-IS routing updates from the Sprint IP backbone network to characterize failures that affect IP connectivity. Failures are first classified based on patterns observed at the IP layer; in some cases, it is possible to further infer their probable causes, such as maintenance activities, router-related problems, and optical-layer problems. Key temporal and spatial characteristics of each class are analyzed and, when appropriate, parameterized using well-known distributions. Our results indicate that 20% of all failures occur during periods of scheduled maintenance. Of the unplanned failures, almost 30% are shared by multiple links and are most likely due to router-related or optical equipment-related problems, while 70% affect a single link at a time. Our classification reveals the nature and extent of failures in the Sprint IP backbone. Furthermore, our characterization of the different classes provides a probabilistic failure model that can be used to generate realistic failure scenarios as input to various network design and traffic engineering problems.
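As an illustration of how a class-based probabilistic failure model might drive scenario generation, here is a minimal Python sketch. Only the class proportions come from the abstract: 20% of failures fall in scheduled maintenance, and of the unplanned failures roughly 30% are shared by multiple links while 70% affect a single link. Everything else is a hypothetical placeholder and not the fitted distributions or parameters from the paper. This includes the link names, the exponential inter-arrival times, the uniform choice of affected links, the size of shared-failure groups, modeling maintenance as single-link events, and the names `generate_scenario` and `FailureEvent`.

```python
import random
from dataclasses import dataclass

# Class proportions taken from the abstract; all other choices are assumptions.
P_MAINTENANCE = 0.20             # share of all failures during scheduled maintenance
P_SHARED_GIVEN_UNPLANNED = 0.30  # share of unplanned failures that hit multiple links


@dataclass
class FailureEvent:
    time: float   # hours since the start of the scenario
    kind: str     # "maintenance", "shared", or "single"
    links: tuple  # links that go down together in this event


def generate_scenario(links, n_events, mean_gap_hours=1.0, max_shared=3, seed=None):
    """Draw a synthetic sequence of failure events (hypothetical model)."""
    if len(links) < 2:
        raise ValueError("need at least two links to model shared failures")
    rng = random.Random(seed)
    t = 0.0
    events = []
    for _ in range(n_events):
        # Assumption: exponential inter-arrival times, not the paper's fitted model.
        t += rng.expovariate(1.0 / mean_gap_hours)
        if rng.random() < P_MAINTENANCE:
            kind = "maintenance"
        elif rng.random() < P_SHARED_GIVEN_UNPLANNED:
            kind = "shared"  # conditional draw: 0.8 * 0.3 = 0.24 of all events
        else:
            kind = "single"
        if kind == "shared":
            # Assumption: a shared failure hits 2..max_shared distinct links, uniformly.
            k = rng.randint(2, min(max_shared, len(links)))
            affected = tuple(rng.sample(links, k))
        else:
            # Assumption: maintenance and single-link failures take down one link.
            affected = (rng.choice(links),)
        events.append(FailureEvent(t, kind, affected))
    return events


if __name__ == "__main__":
    example_links = ["A-B", "B-C", "C-D", "D-A"]  # hypothetical topology
    for ev in generate_scenario(example_links, n_events=5, seed=1):
        print(f"{ev.time:6.2f}h  {ev.kind:11s} {ev.links}")
```

Splitting the draw into "maintenance vs. unplanned" and then "shared vs. single" mirrors the two-level breakdown in the abstract. In a real study, the exponential and uniform placeholders would be replaced by the per-class temporal and spatial distributions estimated from the IS-IS data.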