Classifying tags using open content resources
Source: Web Search and Web Data Mining
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Barcelona, Spain
Session: Classification and clustering
Pages 64-73
Year of Publication: 2009
ISBN: 978-1-60558-390-7
ABSTRACT
Tagging has emerged as a popular means of annotating online objects such as bookmarks, photos and videos. Tags vary in semantic meaning and can describe different aspects of a media object: its content as well as locations, dates, people and other associated metadata. Automatically classifying tags into semantic categories helps us better understand how users annotate media objects and build tools for viewing and browsing them. In this paper we present a generic method for classifying tags using third-party open content resources, such as Wikipedia and the Open Directory. Our method uses structural patterns that can be extracted from resource metadata. We describe the implementation of our method on Wikipedia, using WordNet categories as our classification schema and ground truth. Two structural patterns found in Wikipedia are used for training and classification: categories and templates. We apply our system to classifying Flickr tags. Compared to a WordNet baseline, our method increases the coverage of the Flickr vocabulary by 115%. We can classify many important entities that are not covered by WordNet, such as London Eye, Big Island, Ronaldinho, geocaching and wii.
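The idea in the abstract can be illustrated with a minimal sketch. It assumes each Wikipedia article is reduced to the set of its category and template names, that articles whose titles already have a WordNet category serve as training data, and that a tag not covered by WordNet is classified from the structural features of its matching article. The article data, the labels `location`, `person` and `artifact`, the `cat:`/`tpl:` prefixes, and the function names `train` and `classify` are all hypothetical. The scoring rule is a simple naive Bayes-style stand-in chosen for brevity; the abstract does not state which learner the authors used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data (not from the paper): article title -> structural
# features, i.e. the names of its Wikipedia categories and templates.
ARTICLES = {
    "paris": {"cat:Capitals in Europe", "tpl:Infobox settlement"},
    "tokyo": {"cat:Capitals in Asia", "tpl:Infobox settlement"},
    "pele": {"cat:Brazilian footballers", "tpl:Infobox football biography"},
    "zidane": {"cat:French footballers", "tpl:Infobox football biography"},
    "eiffel tower": {"cat:Towers in Paris", "tpl:Infobox building"},
    "ronaldinho": {"cat:Brazilian footballers", "tpl:Infobox football biography"},
    "big island": {"cat:Islands of Hawaii", "tpl:Infobox settlement"},
    "london eye": {"cat:Ferris wheels", "tpl:Infobox building"},
}

# Hypothetical ground truth: titles assumed to have a WordNet category.
WORDNET_LABELS = {
    "paris": "location",
    "tokyo": "location",
    "pele": "person",
    "zidane": "person",
    "eiffel tower": "artifact",
}


def train(articles, labels):
    """Count how often each structural feature occurs with each label."""
    feature_counts = defaultdict(Counter)  # label -> feature -> count
    label_counts = Counter()
    vocab = set()
    for title, label in labels.items():
        label_counts[label] += 1
        for feature in articles[title]:
            feature_counts[label][feature] += 1
            vocab.add(feature)
    return feature_counts, label_counts, vocab


def classify(features, model):
    """Return the best-scoring label, or None if no feature was seen in training."""
    feature_counts, label_counts, vocab = model
    known = [f for f in features if f in vocab]
    if not known:
        return None
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        n = label_counts[label]
        score = math.log(n / total)  # label prior
        for f in known:
            # Laplace-smoothed estimate of P(feature present | label)
            score += math.log((feature_counts[label][f] + 1) / (n + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(ARTICLES, WORDNET_LABELS)

# Classify the tags that the toy "WordNet" ground truth does not cover.
for tag in sorted(set(ARTICLES) - set(WORDNET_LABELS)):
    print(tag, "->", classify(ARTICLES[tag], model))
```

In this toy setup a tag is classified only when its article shares at least one category or template with a training article. A real system would need a step that maps tags to articles and a far larger feature set.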