Frequent term-based text clustering
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Edmonton, Alberta, Canada
POSTER SESSION: Poster papers
Pages: 436 - 442  
Year of Publication: 2002
ISBN: 1-58113-567-X
Authors
Florian Beil  Ludwig-Maximilians-Universitaet, Muenchen, Munich, Germany
Martin Ester  Simon Fraser University, Burnaby, BC, Canada
Xiaowei Xu  Siemens AG, Munich, Germany
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
AAAI
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/775047.775110

ABSTRACT

Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special problems of text clustering: very high dimensionality of the data, very large size of the databases, and understandability of the cluster description. In this paper, we introduce a novel approach which uses frequent item (term) sets for text clustering. Such frequent sets can be efficiently discovered using algorithms for association rule mining. To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to the sets of supporting documents. We present two algorithms for frequent term-based text clustering: FTC, which creates flat clusterings, and HFTC for hierarchical clustering. An experimental evaluation on classical text documents as well as on web documents demonstrates that the proposed algorithms obtain clusterings of comparable quality significantly more efficiently than state-of-the-art text clustering algorithms. Furthermore, our methods provide an understandable description of the discovered clusters by their frequent term sets.
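
The idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' FTC implementation: the abstract does not specify the overlap measure or the selection strategy, so the count-based overlap and the greedy loop below are assumptions made for illustration. Brute-force enumeration stands in for an association rule mining algorithm, and the names frequent_term_sets and greedy_cluster are hypothetical.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Map each frequent term set to the ids of its supporting documents.

    Brute-force enumeration (an assumption for brevity); a real system
    would use an association rule mining algorithm instead.
    """
    n = len(docs)
    vocab = sorted(set().union(*docs))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            cover = {i for i, d in enumerate(docs) if set(combo) <= d}
            if len(cover) / n >= min_support:
                result[frozenset(combo)] = cover
    return result


def greedy_cluster(docs, min_support, max_size=2):
    """Greedily pick term sets whose supporting documents overlap least.

    Assumed overlap measure: for a candidate, the average over its
    documents of (number of candidates supporting that document - 1).
    """
    remaining = frequent_term_sets(docs, min_support, max_size)
    uncovered = set(range(len(docs)))
    clusters = []
    while uncovered and any(remaining.values()):
        # f[d] = how many remaining candidates still support document d
        f = {d: sum(d in c for c in remaining.values()) for d in uncovered}

        def overlap(ts):
            cover = remaining[ts]
            return sum(f[d] - 1 for d in cover) / len(cover)

        candidates = [ts for ts, c in remaining.items() if c]
        # Ties are broken by larger cover, then alphabetically (arbitrary choice)
        best = min(
            candidates,
            key=lambda ts: (overlap(ts), -len(remaining[ts]), sorted(ts)),
        )
        chosen = remaining.pop(best)
        clusters.append((set(best), chosen))
        uncovered -= chosen
        # Remove the chosen documents from every other candidate's cover
        for ts in remaining:
            remaining[ts] -= chosen
    return clusters, uncovered


docs = [
    {"data", "mining", "cluster"},
    {"data", "mining"},
    {"web", "page", "cluster"},
    {"web", "page"},
]
clusters, uncovered = greedy_cluster(docs, min_support=0.5)
for terms, members in clusters:
    print(sorted(terms), sorted(members))
```

Each resulting cluster is labeled by the term set that produced it, which mirrors the abstract's point that frequent term sets double as an understandable cluster description.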
