Real-time automatic tag recommendation
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Singapore, Singapore
SESSION: Social tagging
Pages 515-522
Year of Publication: 2008
ISBN: 978-1-60558-164-4
Authors
Yang Song, The Pennsylvania State University, University Park, PA, USA
Ziming Zhuang, Yahoo! Applied Research, Santa Clara, CA, USA
Huajing Li, The Pennsylvania State University, University Park, PA, USA
Qiankun Zhao, AOL Research Lab, Beijing, China
Jia Li, The Pennsylvania State University, University Park, PA, USA
Wang-Chien Lee, The Pennsylvania State University, University Park, PA, USA
C. Lee Giles, The Pennsylvania State University, University Park, PA, USA
ABSTRACT
Tags are user-generated labels for entities. Existing research on tag recommendation focuses either on improving accuracy or on automating the process, while ignoring efficiency. We propose a novel, highly automated framework for real-time tag recommendation. The tagged training documents are treated as triplets of (words, docs, tags) and represented in two bipartite graphs, which are partitioned into clusters by Spectral Recursive Embedding (SRE). Tags in each topical cluster are ranked by our novel ranking algorithm. A two-way Poisson Mixture Model (PMM) is proposed to model the document distribution as mixture components within each cluster and, simultaneously, to aggregate words into word clusters. A new document is classified by the mixture model based on its posterior probabilities, so that tags are recommended according to their ranks. Experiments on large-scale tagging datasets of scientific documents (CiteULike) and web pages (del.icio.us) indicate that our framework makes tag recommendations efficiently and effectively. The average tagging time for a test document is around 1 second, with over 88% of test documents correctly labeled with the top nine tags we suggested.
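The classification step described in the abstract (assigning a new document to a mixture component by posterior probability, then recommending that cluster's ranked tags) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one Poisson component per topical cluster, independent word counts, and already-fitted parameters, and it omits the word-clustering half of the two-way model, the SRE partitioning, and the tag-ranking algorithm. The names `posteriors`, `recommend`, `priors`, `rates`, and `ranked_tags`, and all numbers, are hypothetical.

```python
import math


def poisson_log_pmf(k, lam):
    # Log of the Poisson probability mass function: k*log(lam) - lam - log(k!)
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def posteriors(counts, priors, rates):
    """Posterior probability of each component given a document's word counts.

    counts: word counts of the new document, one entry per vocabulary word
    priors: mixing weight of each component (assumed already fitted)
    rates:  rates[m][j] is the Poisson rate of word j under component m
    """
    log_joint = []
    for prior, lam in zip(priors, rates):
        log_lik = sum(poisson_log_pmf(k, l) for k, l in zip(counts, lam))
        log_joint.append(math.log(prior) + log_lik)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_joint)
    weights = [math.exp(v - peak) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


def recommend(counts, priors, rates, ranked_tags, top_k=3):
    # Pick the most probable component and return its top-ranked tags.
    post = posteriors(counts, priors, rates)
    best = max(range(len(post)), key=post.__getitem__)
    return ranked_tags[best][:top_k]


# Hypothetical fitted parameters: two clusters over a three-word vocabulary.
priors = [0.5, 0.5]
rates = [[5.0, 0.5, 0.5], [0.5, 5.0, 4.0]]
ranked_tags = [["clustering", "graph"], ["retrieval", "search", "ranking"]]

new_doc = [0, 4, 3]  # word counts of an unseen document
post = posteriors(new_doc, priors, rates)
tags = recommend(new_doc, priors, rates, ranked_tags, top_k=2)
print(post, tags)
```

Because the expensive steps (graph partitioning, model fitting, tag ranking) would happen offline in a setup like this, tagging a new document reduces to one pass over its word counts per component, which is consistent with the real-time emphasis of the abstract.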