Troubleshooting chronic conditions in large IP networks
Source: International Conference On Emerging Networking Experiments And Technologies
Proceedings of the 2008 ACM CoNEXT Conference
Madrid, Spain
Article No. 2  
Year of Publication: 2008
ISBN:978-1-60558-210-8
Authors
Ajay Mahimkar  The University of Texas at Austin
Jennifer Yates  AT&T Labs -- Research
Yin Zhang  The University of Texas at Austin
Aman Shaikh  AT&T Labs -- Research
Jia Wang  AT&T Labs -- Research
Zihui Ge  AT&T Labs -- Research
Cheng Tien Ee  AT&T Labs -- Research
Sponsors
ACM: Association for Computing Machinery
SIGCOMM: ACM Special Interest Group on Data Communication
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1544012.1544014

ABSTRACT

Chronic network conditions are caused by performance-impairing events that occur intermittently over an extended period of time. Such conditions can cause repeated performance degradation for customers, and can sometimes even turn into serious hard failures. It is therefore critical to troubleshoot and repair chronic network conditions in a timely fashion in order to ensure high reliability and performance in large IP networks. Today, troubleshooting chronic conditions is often performed manually, making it a tedious, time-consuming and error-prone process.

In this paper, we present NICE (Network-wide Information Correlation and Exploration), a novel infrastructure that enables the troubleshooting of chronic network conditions by detecting and analyzing statistical correlations across multiple data sources. NICE uses a novel circular permutation test to determine the statistical significance of correlation. It also allows flexible analysis at various spatial granularities (e.g., link, router, or network level). We validate NICE using real measurement data collected at a tier-1 ISP network. The results are quite positive. We then apply NICE to troubleshoot real network issues in the tier-1 ISP network. In all three case studies conducted so far, NICE successfully uncovers previously unknown chronic network conditions, resulting in improved network operations.
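
To give a feel for the idea behind a circular permutation test, the following is a minimal Python sketch of a generic version. It is not taken from the paper and may differ from what NICE actually computes. It assumes two equal-length event series sampled in the same time bins, uses Pearson correlation as the statistic, and builds the null distribution by circularly shifting one series, which keeps each series' internal time structure intact while breaking their alignment. The function names, the one-sided p-value, and the example event positions are all illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def circular_permutation_test(x, y):
    """Return (observed correlation, z-like score, one-sided p-value).

    The null distribution is the set of correlations between x and every
    non-zero circular shift of y. This is an assumed, generic formulation.
    """
    x, y = list(x), list(y)
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("series must have equal length of at least 3")
    n = len(x)
    observed = pearson(x, y)
    # Correlation of x with y rotated by k positions, for every non-zero k.
    shifted = [pearson(x, y[k:] + y[:k]) for k in range(1, n)]
    mean = sum(shifted) / len(shifted)
    std = math.sqrt(sum((r - mean) ** 2 for r in shifted) / len(shifted))
    score = (observed - mean) / std if std > 0 else 0.0
    # Fraction of shifts that correlate at least as strongly as the real alignment.
    p_value = (1 + sum(1 for r in shifted if r >= observed)) / (1 + len(shifted))
    return observed, score, p_value


def indicator(positions, length):
    """Binary series with 1 at the given time bins (hypothetical helper)."""
    marked = set(positions)
    return [1 if i in marked else 0 for i in range(length)]


# Illustrative data only: two made-up event series over 200 time bins.
symptom = indicator([3, 11, 28, 47, 53, 90, 121, 150, 166, 189], 200)
diagnostic = indicator([3, 11, 28, 47, 53, 90, 121, 150, 166, 189], 200)
print(circular_permutation_test(symptom, diagnostic))
```

A series pair would be flagged as significantly correlated when the score exceeds a chosen threshold or the p-value falls below a chosen level; both cutoffs are left to the user in this sketch.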



