Mining logs files for data-driven system management
Source: ACM SIGKDD Explorations Newsletter
Volume 7, Issue 1 (June 2005)
Natural language processing and text mining
Pages: 44-51
Year of Publication: 2005
ISSN: 1931-0145
Authors
Wei Peng, Florida International University, Miami, FL
Tao Li, Florida International University, Miami, FL
Sheng Ma, IBM T.J. Watson Research Center, Hawthorne, NY
Publisher
ACM, New York, NY, USA
DOI: 10.1145/1089815.1089822 (http://doi.acm.org/10.1145/1089815.1089822)

ABSTRACT

With advances in science and technology, computing systems are becoming increasingly complex, with a growing variety of heterogeneous software and hardware components. As a result, they are increasingly difficult to monitor, manage, and maintain. Traditional approaches to system management have relied largely on domain experts, through a knowledge acquisition process that translates domain knowledge into operating rules and policies. This process is well known to be cumbersome, labor-intensive, and error-prone, and it struggles to keep up with rapidly changing environments. There is thus a pressing need for automatic and efficient approaches to monitoring and managing complex computing systems.

A popular approach to system management is based on analyzing system log files. However, some aspects of log files have received little emphasis in existing methods from the data mining and machine learning community. The varied formats of log files, their relatively short text messages, and the temporal characteristics of their data representation pose new challenges. In this paper, we describe our research efforts on mining system log files for automatic system management. In particular, we apply text mining techniques to categorize log messages into common situations, improve categorization accuracy by considering the temporal characteristics of log messages, and use visualization tools to evaluate and validate interesting temporal patterns for system management.
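The abstract describes two steps. First, each log message is categorized into a "situation" with text mining. Second, the order of messages is used to improve that categorization.

A minimal Python sketch of one way to combine these steps follows. It is not necessarily the method used in the paper, and it rests on several assumptions:

- Each message is classified with a multinomial naive Bayes model using Laplace smoothing.
- Temporal context is added with a Viterbi decode over a simple hidden Markov model.
- The transition model is hypothetical: the system stays in the same situation with probability `stay_prob`, otherwise it moves uniformly to another situation.

The situation labels (`start`, `stop`, `connect`), the training messages, and the helper names are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(message):
    """Split a log message into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())


class NaiveBayesLogClassifier:
    """Hypothetical multinomial naive Bayes over message tokens (Laplace smoothing)."""

    def fit(self, messages, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for message, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(message))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {l: sum(self.word_counts[l].values()) for l in self.labels}
        return self

    def log_prior(self, label):
        """log P(label), estimated from training message counts."""
        return math.log(self.doc_counts[label] / sum(self.doc_counts.values()))

    def log_emission(self, message, label):
        """log P(tokens | label); tokens never seen in training are skipped."""
        v = len(self.vocab)
        score = 0.0
        for tok in tokenize(message):
            if tok in self.vocab:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (self.total_words[label] + v))
        return score

    def predict(self, message):
        """Classify one message in isolation, ignoring temporal context."""
        return max(self.labels,
                   key=lambda l: self.log_prior(l) + self.log_emission(message, l))


def viterbi(clf, messages, stay_prob=0.8):
    """Label a time-ordered message sequence with a simple HMM decode.

    Assumed (hypothetical) transitions: stay in the same situation with
    probability stay_prob, else move uniformly to another one.
    Requires at least two labels and 0 < stay_prob < 1.
    """
    labels = clf.labels
    log_stay = math.log(stay_prob)
    log_move = math.log((1 - stay_prob) / (len(labels) - 1))
    best = [{l: clf.log_prior(l) + clf.log_emission(messages[0], l) for l in labels}]
    back = [{}]
    for t in range(1, len(messages)):
        best.append({})
        back.append({})
        for cur in labels:
            # Pick the predecessor that maximizes score plus transition cost.
            prev = max(labels, key=lambda p: best[t - 1][p]
                       + (log_stay if p == cur else log_move))
            trans = log_stay if prev == cur else log_move
            best[t][cur] = best[t - 1][prev] + trans + clf.log_emission(messages[t], cur)
            back[t][cur] = prev
    # Trace back from the highest-scoring final situation.
    path = [max(labels, key=lambda l: best[-1][l])]
    for t in range(len(messages) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Illustrative (invented) training data.
train_msgs = ["service httpd started successfully", "starting database instance",
              "service db stopped", "shutting down httpd stopped",
              "connection to host established", "connected to remote server"]
train_labels = ["start", "start", "stop", "stop", "connect", "connect"]
clf = NaiveBayesLogClassifier().fit(train_msgs, train_labels)

seq = ["httpd started", "httpd down", "starting instance"]
print([clf.predict(m) for m in seq])  # ['start', 'stop', 'start']
print(viterbi(clf, seq))              # ['start', 'start', 'start']
```

In this toy example, the ambiguous middle message "httpd down" is labeled `stop` when classified alone. The temporal decode relabels it `start` because its neighbors strongly indicate that situation. This illustrates, under the stated assumptions, how message order can change a categorization.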

