ABSTRACT
Chronic network conditions are caused by performance-impairing events that occur intermittently over an extended period of time. Such conditions can repeatedly degrade performance for customers and can sometimes even turn into serious hard failures. It is therefore critical to troubleshoot and repair chronic network conditions in a timely fashion to ensure high reliability and performance in large IP networks. Today, troubleshooting chronic conditions is often performed manually, making it a tedious, time-consuming, and error-prone process. In this paper, we present NICE (Network-wide Information Correlation and Exploration), a novel infrastructure that enables the troubleshooting of chronic network conditions by detecting and analyzing statistical correlations across multiple data sources. NICE uses a novel circular permutation test to determine the statistical significance of a correlation. It also allows flexible analysis at various spatial granularities (e.g., link, router, or network level). We validate NICE using real measurement data collected at a tier-1 ISP network, and the results are positive. We then apply NICE to troubleshoot real network issues in the tier-1 ISP network. In all three case studies conducted so far, NICE successfully uncovers previously unknown chronic network conditions, resulting in improved network operations.
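To make the idea of a circular permutation test concrete, the following is a minimal, generic sketch in Python and not necessarily the exact statistic NICE computes. It assumes two event series have already been binned into equal-length 0/1 time series, correlates them, and then builds a null distribution by rotating one series around the time axis, which keeps that series' internal burstiness intact while breaking its alignment with the other. The function names, the empirical p-value, and the sample data (`link_flaps`, `packet_loss`) are illustrative assumptions, not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def circular_shift(series, k):
    """Rotate a list left by k positions, wrapping around the end."""
    k %= len(series)
    return series[k:] + series[:k]


def circular_permutation_test(x, y):
    """Return (observed correlation, empirical p-value).

    The null distribution is the correlation of x with every non-trivial
    circular shift of y. The p-value is the share of shifts whose absolute
    correlation is at least as large as the observed one, with the usual
    +1 correction so it is never exactly zero.
    """
    observed = pearson(x, y)
    null_scores = [pearson(x, circular_shift(y, k)) for k in range(1, len(y))]
    extreme = sum(1 for r in null_scores if abs(r) >= abs(observed))
    p_value = (extreme + 1) / (len(null_scores) + 1)
    return observed, p_value


def make_event_series(length, event_bins):
    """Build a 0/1 series with a 1 in every time bin listed in event_bins."""
    events = set(event_bins)
    return [1 if i in events else 0 for i in range(length)]


# Hypothetical data: two event types that always fall in the same time bins.
link_flaps = make_event_series(100, [3, 17, 18, 40, 71, 72, 73, 95])
packet_loss = list(link_flaps)

r, p = circular_permutation_test(link_flaps, packet_loss)
print(f"correlation={r:.3f}, p-value={p:.3f}")
```

Rotating a series rather than shuffling it is the key design choice: network event series are typically autocorrelated, and a plain shuffle would destroy that structure and could overstate significance.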