| Learning, indexing, and diagnosing network faults |
Source
|
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Paris, France
SESSION: Research track papers
Pages 857-866
Year of Publication: 2009
ISBN: 978-1-60558-495-9
|
|
Authors
Ting Wang, Georgia Institute of Technology, Atlanta, GA, USA
Mudhakar Srivatsa, IBM T.J. Watson Research Center, Hawthorne, NY, USA
Dakshi Agrawal, IBM T.J. Watson Research Center, Hawthorne, NY, USA
Ling Liu, Georgia Institute of Technology, Atlanta, GA, USA
ABSTRACT
Modern communication networks generate massive volumes of operational event data (e.g., alarms, alerts, and metrics), which a network management system (NMS) can use to diagnose potential faults. In this work, we introduce a new class of indexable fault signatures that encode the temporal evolution of the events generated by a network fault as well as the topological relationships among the nodes where these events occur. We present an efficient learning algorithm that extracts such fault signatures from noisy historical event data and, with the help of novel space-time indexing structures, we show how to perform efficient online signature matching. We provide results from extensive experimental studies that explore the efficacy of our approach, and we point out potential applications of such signatures to many different types of networks, including social and information networks.