Combining statistics and semantics via ensemble model for document clustering
Source
Symposium on Applied Computing
Proceedings of the 2009 ACM symposium on Applied Computing
Honolulu, Hawaii
SESSION: Data mining track
Pages 1446-1450
Year of Publication: 2009
ISBN: 978-1-60558-166-8
ABSTRACT
Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge, provided by domain experts, that is specific to the particular data set. In this study, we propose an ensemble model that couples two sources of information: statistical information derived from the data set, and sense information retrieved from WordNet, which is used to build a semantic binary model. We evaluated the efficacy of our combined ensemble model on the Reuters-21578 and 20newsgroups data sets.
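The idea of coupling a statistical view with a binary sense view can be illustrated with a minimal sketch. This is not the paper's ensemble model, whose details the abstract does not give. As a stand-in, it blends a term-frequency cosine similarity with a Jaccard overlap of sense labels using a weighted average. The `TOY_SENSES` dictionary is a hypothetical placeholder for WordNet lookups, and the function names and the `weight` parameter are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy sense inventory standing in for WordNet lookups (assumption).
TOY_SENSES = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "bank": "finance",
    "loan": "finance",
    "money": "finance",
}


def tf_cosine(doc_a, doc_b):
    """Cosine similarity of raw term-frequency vectors (the statistical view)."""
    a, b = Counter(doc_a), Counter(doc_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def sense_set(doc):
    """Binary sense representation: the set of sense labels present in a document."""
    return {TOY_SENSES[t] for t in doc if t in TOY_SENSES}


def sense_jaccard(doc_a, doc_b):
    """Jaccard overlap of the two documents' sense sets (the semantic view)."""
    sa, sb = sense_set(doc_a), sense_set(doc_b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def combined_similarity(doc_a, doc_b, weight=0.5):
    """Weighted average of the two views; a simple stand-in, not the paper's ensemble."""
    return weight * tf_cosine(doc_a, doc_b) + (1 - weight) * sense_jaccard(doc_a, doc_b)


d1 = ["car", "road"]
d2 = ["automobile", "highway"]
d3 = ["bank", "loan"]
```

In this toy data, `d1` and `d2` share no terms, so the statistical view alone treats them as unrelated, while the sense view links "car" and "automobile" through a common label. A similarity of this kind could then be passed to any standard clustering routine.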