ABSTRACT
Many applications dealing with textual information require classification of words into semantic classes (or concepts). However, manually constructing semantic classes is a tedious task. In this paper, we present an algorithm, UNICON, for UNsupervised Induction of CONcepts. Some advantages of UNICON over previous approaches include the ability to classify words with low frequency counts, the ability to cluster a large number of elements in a high-dimensional space, and the ability to classify previously unknown words into existing clusters. Furthermore, since the algorithm is unsupervised, a set of concepts may be constructed for any corpus.