ABSTRACT
The most important prerequisite for the success of Semantic Web research is the construction of complete and reliable domain ontologies. In this paper we describe an unsupervised framework for enriching domain ontologies by mining domain text corpora. Specifically, we enrich the hierarchical backbone of an existing ontology, i.e. its taxonomy, with new domain-specific concepts. The framework is based on an extended model of hierarchical self-organizing maps. Because it is founded on an unsupervised neural network architecture, the framework can be applied to different languages and domains. Terms extracted by mining a text corpus are represented in a distributional vector space that encodes their contextual content. The enrichment behaves like a classification of the extracted terms into the existing taxonomy, attaching each term as a hyponym of a taxonomy node. The experiments reported are in the "Lonely Planet" tourism domain; the taxonomy and the corpus are those proposed in the PASCAL ontology learning and population challenge. The experimental results show that the quality of the enrichment improves considerably when semantics-based vector representations are used for the classified (newly added) terms, such as document category histograms (DCH) and the document frequency times inverse term frequency (DF-ITF) weighting scheme.
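The idea of enrichment as classification, descending the taxonomy and attaching a term as a hyponym, can be illustrated with a deliberately simplified sketch. It is not the authors' extended hierarchical self-organizing map. It assumes that every taxonomy node already carries a prototype vector in the same space as the term vectors (for instance DCH or DF-ITF vectors computed elsewhere). The names `Node`, `cosine` and `classify`, the cosine similarity measure, the stopping rule and the learning rate `alpha` are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A taxonomy concept with a prototype (weight) vector, like a SOM unit."""
    label: str
    weights: list[float]
    children: list["Node"] = field(default_factory=list)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(root: Node, term: str, vec: list[float], alpha: float = 0.1) -> Node:
    """Descend the taxonomy top-down and attach `term` as a hyponym.

    Hypothetical simplification: on each level the child most similar to
    `vec` wins (the analogue of a SOM best-matching unit). Descent stops at a
    leaf, or when no child is more similar than the current node. Each winner
    on the path is nudged towards `vec` with the SOM-style update
    w <- w + alpha * (x - w). Returns the node the term was attached to.
    """
    node = root
    while node.children:
        winner = max(node.children, key=lambda c: cosine(c.weights, vec))
        if cosine(winner.weights, vec) <= cosine(node.weights, vec):
            break  # the current concept fits better than any of its children
        winner.weights = [w + alpha * (x - w) for w, x in zip(winner.weights, vec)]
        node = winner
    node.children.append(Node(term, list(vec)))
    return node


if __name__ == "__main__":
    # Toy three-dimensional vectors, invented purely for this example.
    root = Node("destination", [1.0, 1.0, 1.0])
    city = Node("city", [1.0, 0.2, 0.0])
    beach = Node("beach", [0.0, 0.2, 1.0])
    root.children = [city, beach]
    print(classify(root, "harbour_town", [0.9, 0.3, 0.1]).label)  # city
```

The greedy top-down descent reflects the abstract's framing of enrichment as classification into an existing hierarchy. A full hierarchical SOM would additionally learn the node prototypes from data and adapt whole neighbourhoods of units, not only the single winner on each level.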