Concept vector extraction from Wikipedia category network
Source
Conference On Ubiquitous Information Management And Communication
Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication
Suwon, Korea
SESSION: Data search I
Pages 71-79
Year of Publication: 2009
ISBN: 978-1-60558-405-8
Authors
Masumi Shirakawa, Osaka Univ., Suita, Osaka, Japan
Kotaro Nakayama, Tokyo Univ., Bunkyo-ku, Tokyo, Japan
Takahiro Hara, Osaka Univ., Suita, Osaka, Japan
Shojiro Nishio, Osaka Univ., Suita, Osaka, Japan
ABSTRACT
The usefulness of machine-readable taxonomies has been demonstrated by various applications such as document classification and information retrieval. One of the main approaches in automated taxonomy extraction research is statistical NLP (Natural Language Processing) based on Web mining, and a significant number of studies have been conducted. However, existing work on automatic dictionary building suffers from accuracy problems due to the technical limitations of statistical NLP and noisy data on the WWW. To solve these problems, in this work we focus on mining Wikipedia, a large-scale Web encyclopedia. Wikipedia has a large number of high-quality articles and a category system, because many users around the world edit and refine these articles and the category system daily. Using Wikipedia, the loss of accuracy caused by NLP can be avoided. However, affiliation relations cannot be extracted simply by automatically descending the category system, since the category system in Wikipedia is not a tree structure but a network structure. We propose concept vectorization methods that are applicable to the category network of Wikipedia.
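The abstract does not spell out the proposed vectorization methods, so the following is only a minimal sketch of the underlying difficulty, not the authors' algorithm. It assumes a hypothetical toy category network (`PARENTS`), a hypothetical function `concept_vector`, and an illustrative depth-based decay weight. It shows why a plain tree descent is unsafe on a network with multiple parents and cycles, and how a breadth-first traversal with a visited set can still assign one weight per category.

```python
from collections import deque

# Hypothetical toy category network: node -> list of parent categories.
# A node may have several parents, and links may form a cycle,
# so this is a network rather than a tree.
PARENTS = {
    "Osaka University": ["Universities in Japan", "Education in Osaka"],
    "Universities in Japan": ["Education in Japan"],
    "Education in Osaka": ["Education in Japan", "Osaka"],
    "Education in Japan": ["Japan"],
    "Osaka": ["Japan"],
    "Japan": ["Osaka"],  # artificial cycle, to show why a visited set is needed
}


def concept_vector(start, parents, decay=0.5, max_depth=3):
    """Map each category reachable from `start` to the weight decay**depth.

    Breadth-first search reaches every category first at its minimum
    depth; the visited set keeps that weight and prevents infinite
    loops on cycles. The decay weighting is an illustrative assumption.
    """
    vector = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for parent in parents.get(node, []):
            if parent in seen:
                continue  # already weighted via a path at least as short
            seen.add(parent)
            vector[parent] = decay ** (depth + 1)
            queue.append((parent, depth + 1))
    return vector


vec = concept_vector("Osaka University", PARENTS)
```

In this toy network, "Education in Japan" is reachable through two different parents yet receives a single weight, and the artificial "Japan"/"Osaka" cycle does not cause the traversal to loop. A naive recursive tree walk would either double-count the first or never terminate on the second.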