Concept vector extraction from Wikipedia category network
Source: Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication, Suwon, Korea
Session: Data search I
Pages: 71-79
Year of Publication: 2009
ISBN: 978-1-60558-405-8
Authors
Masumi Shirakawa  Osaka Univ., Suita, Osaka, Japan
Kotaro Nakayama  Tokyo Univ., Bunkyo-ku, Tokyo, Japan
Takahiro Hara  Osaka Univ., Suita, Osaka, Japan
Shojiro Nishio  Osaka Univ., Suita, Osaka, Japan
Sponsor
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 33,   Downloads (12 Months): 97,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1516241.1516255

ABSTRACT

The usefulness of machine-readable taxonomies has been demonstrated by various applications such as document classification and information retrieval. One of the main approaches to automated taxonomy extraction is statistical natural language processing (NLP) applied to Web mining, and a significant amount of research has been conducted on it. However, existing work on automatic dictionary building suffers from accuracy problems due to the technical limitations of statistical NLP and the noisy data on the WWW. To solve these problems, in this work we focus on mining Wikipedia, a large-scale Web encyclopedia. Wikipedia offers a huge number of high-quality articles and a category system, because many users around the world edit and refine them daily. By using Wikipedia, the loss of accuracy caused by NLP can be avoided. However, affiliation relations cannot be extracted automatically by simply descending the category system, because Wikipedia's category system is not a tree but a network (a category may have several parent categories). We propose concept vectorization methods that are applicable to this network-structured category system of Wikipedia.
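The abstract does not describe the proposed vectorization methods, so the following is only a minimal sketch of the underlying difficulty, not the authors' algorithm. It assumes a hypothetical toy category graph (`PARENTS`), a hypothetical function `concept_vector`, and an exponential-decay weighting (`decay`, `max_depth`) chosen purely for illustration. The point is that a naive recursive walk that assumes a tree would count shared ancestors twice and never terminate on a cycle, whereas a breadth-first search with a visited set handles both.

```python
from collections import deque

# Hypothetical toy category network: child -> list of parent categories.
# Unlike a tree, a node may have several parents, and "Science" <-> "Knowledge"
# forms a cycle, as can happen in a network-structured category system.
PARENTS: dict[str, list[str]] = {
    "Python (programming language)": ["Programming languages", "Scripting languages"],
    "Programming languages": ["Computer science"],
    "Scripting languages": ["Programming languages"],
    "Computer science": ["Science"],
    "Science": ["Knowledge"],
    "Knowledge": ["Science"],
}


def concept_vector(article: str, decay: float = 0.5, max_depth: int = 4) -> dict[str, float]:
    """Weight each ancestor category by decay ** (shortest distance from the article).

    Hypothetical weighting for illustration only. Breadth-first search visits
    each category at most once, so multiple parents and cycles cause neither
    double counting nor infinite loops.
    """
    vector: dict[str, float] = {}
    seen = {article}
    queue = deque([(article, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for parent in PARENTS.get(node, []):
            if parent not in seen:
                seen.add(parent)
                vector[parent] = decay ** (depth + 1)
                queue.append((parent, depth + 1))
    return vector


if __name__ == "__main__":
    for category, weight in concept_vector("Python (programming language)").items():
        print(f"{category}: {weight}")
```

Here "Programming languages" is reachable through two parents but is weighted once, at its shortest distance, and the walk from "Knowledge" stops after reaching "Science" instead of looping.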



