ACM Home Page
Please provide us with feedback. Feedback
Constructing comprehensive summaries of large event sequences
Full text PdfPdf (222 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Las Vegas, Nevada, USA
SESSION: Research papers table of contents
Pages 417-425  
Year of Publication: 2008
ISBN:978-1-60558-193-4
Authors
Jerry Kiernan  IBM, San Jose, CA, USA
Evimaria Terzi  IBM Almaden, San Jose, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 199,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1401890.1401943
What is a DOI?

ABSTRACT

Event sequences capture system and user activity over time. Prior research on sequence mining has mostly focused on discovering local patterns. Though interesting, these patterns reveal local associations and fail to give a comprehensive summary of the entire event sequence. Moreover, the number of patterns discovered can be large. In this paper, we take an alternative approach and build short summaries that describe the entire sequence, while revealing local associations among events.

We formally define the summarization problem as an optimization problem that balances between shortness of the summary and accuracy of the data description. We show that this problem can be solved optimally in polynomial time by using a combination of two dynamic-programming algorithms. We also explore more efficient greedy alternatives and demonstrate that they work well on large datasets. Experiments on both synthetic and real datasets illustrate that our algorithms are efficient and produce high-quality results, and reveal interesting local structures in the data.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
3
4
5
 
6
 
7
 
8
M. Koivisto, M. Perola, T. Varilo, et al. An MDL method for finding haplotype blocks and for estimating the strength of haplotype block boundaries. In Pacific Symposium on Biocomputing, pages 502--513, 2003.
9
 
10
H. Mannila and H. Toivonen. Discovering generalized episodes using minimal occurrences. In KDD , pages 146--151, 1996.
 
11
M. Mehta, J. Rissanen, and R. Agrawal. Mdl-based decision tree pruning. In KDD, pages 216--221, 1995.
12
 
13
 
14
L. R. Rabiner and B. H. Juang. An introduction to Hidden Markov Models. IEEE ASSP Magazine, pages 4--15, January 1986.
 
15
J. Rissanen. Modeling by shortest data description. Automatica, 14:465--471, 1978.
 
16
17
 
18
E. Terzi and P. Tsaparas. Efficient algorithms for sequence segmentation. In SDM, 2006.
19
 
20


Collaborative Colleagues:
Jerry Kiernan: colleagues
Evimaria Terzi: colleagues