ABSTRACT
Using frequent patterns to analyze data has been one of the fundamental approaches in many data mining applications. Research in frequent pattern mining has so far mostly focused on developing efficient algorithms to discover various kinds of frequent patterns, but little attention has been paid to the important next step: interpreting the discovered frequent patterns. Although the compression and summarization of frequent patterns have been studied in some recent work, the techniques proposed there can only annotate a frequent pattern with non-semantic information (e.g., support), which provides only limited help for a user trying to understand the patterns. In this article, we study the novel problem of generating semantic annotations for frequent patterns. The goal is to discover the hidden meanings of a frequent pattern by annotating it with in-depth, concise, and structured information. We propose a general approach to generate such an annotation for a frequent pattern by constructing its context model, selecting informative context indicators, and extracting representative transactions and semantically similar patterns. This general approach can readily incorporate the user's prior knowledge and has many potential applications, such as generating a dictionary-like description for a pattern, finding synonym patterns, discovering semantic relations, and summarizing semantic classes of a set of frequent patterns. Experiments on different datasets show that our approach is effective in generating semantic pattern annotations.
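The three steps named in the abstract (building a context model, selecting context indicators, and extracting representative transactions and semantically similar patterns) can be illustrated with a small sketch. This is not the article's implementation: the abstract does not specify how context units are weighted or how similarity is measured, so the sketch assumes raw co-occurrence counts as weights and cosine similarity as the measure. The toy transactions and the names `context_vector`, `cosine`, and `annotate` are hypothetical.

```python
from collections import Counter
from math import sqrt

# Toy transactions (hypothetical data, not taken from the article).
transactions = [
    {"frequent", "pattern", "mining", "itemset"},
    {"frequent", "pattern", "association", "rule"},
    {"frequent", "itemset", "mining", "closed"},
    {"gene", "expression", "cluster"},
    {"association", "rule", "mining", "itemset"},
]


def context_vector(pattern, transactions):
    """Context model (assumed form): co-occurrence counts of other items."""
    pattern = frozenset(pattern)
    vec = Counter()
    for t in transactions:
        if pattern <= t:  # the transaction contains the whole pattern
            vec.update(t - pattern)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def annotate(pattern, candidates, transactions, k=2):
    """Assemble a small annotation for `pattern` under the assumptions above."""
    pattern = frozenset(pattern)
    ctx = context_vector(pattern, transactions)

    # Context indicators: the k items that co-occur most often with the pattern.
    indicators = [item for item, _ in ctx.most_common(k)]

    # Representative transactions: supporting transactions closest to the context.
    supporting = [t for t in transactions if pattern <= t]
    representatives = sorted(
        supporting, key=lambda t: cosine(ctx, Counter(t - pattern)), reverse=True
    )[:k]

    # Semantically similar patterns: candidates whose contexts resemble this one.
    others = [frozenset(c) for c in candidates if frozenset(c) != pattern]
    similar = sorted(
        others, key=lambda c: cosine(ctx, context_vector(c, transactions)), reverse=True
    )[:k]

    return {
        "context_indicators": indicators,
        "representative_transactions": representatives,
        "similar_patterns": similar,
    }


annotation = annotate({"frequent"}, [{"itemset"}, {"gene"}], transactions)
```

In this toy data, `{"itemset"}` shares much of its context with `{"frequent"}` while `{"gene"}` shares none, so the former ranks first among the similar patterns. A different weighting or similarity measure could be substituted without changing the overall structure.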