| A statistical comparison of tag and query logs |
| Full text |
Pdf
(1.42 MB)
|
Source
|
Annual ACM Conference on Research and Development in Information Retrieval
archive
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
table of contents
Boston, MA, USA
SESSION: Web 2.0
table of contents
Pages 123-130
Year of Publication: 2009
ISBN:978-1-60558-483-6
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 71, Downloads (12 Months): 238, Citation Count: 0
|
|
|
ABSTRACT
We investigate tag and query logs to see if the terms people use to annotate websites are similar to the ones they use to query for them. Over a set of URLs, we compare the distribution of tags used to annotate each URL with the distribution of query terms for clicks on the same URL. Understanding the relationship between the distributions is important to determine how useful tag data may be for improving search results and conversely, query data for improving tag prediction. In our study, we compare both term frequency distributions using vocabulary overlap and relative entropy. We also test statistically whether the term counts come from the same underlying distribution. Our results indicate that the vocabulary used for tagging and searching for content are similar but not identical. We further investigate the content of the websites to see which of the two distributions (tag or query) is most similar to the content of the annotated/searched URL. Finally, we analyze the similarity for different categories of URLs in our sample to see if the similarity between distributions is dependent on the topic of the website or the popularity of the URL.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Shenghua Bao , Guirong Xue , Xiaoyuan Wu , Yong Yu , Ben Fei , Zhong Su, Optimizing web search using social annotations, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
[doi> 10.1145/1242572.1242640]
|
| |
2
|
|
 |
3
|
|
| |
4
|
|
| |
5
|
|
 |
6
|
|
 |
7
|
|
| |
8
|
R. Krichevsky and V. Trofimov. The performance of universal encoding. IEEE Transactions on Information Theory, 27(2):199--207, 1981.
|
| |
9
|
Mu Li , Yang Zhang , Muhua Zhu , Ming Zhou, Exploring distributional similarity based models for query spelling correction, Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, p.1025-1032, July 17-18, 2006, Sydney, Australia
[doi> 10.3115/1220175.1220304]
|
| |
10
|
|
 |
11
|
|
| |
12
|
Z. Talata. Model selection via information criteria Periodica Mathematica Hungarica, 51:99--117, 2005.
|
 |
13
|
Yusuke Yanbe , Adam Jatowt , Satoshi Nakamura , Katsumi Tanaka, Can social bookmarking enhance search in the web?, Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries, June 18-23, 2007, Vancouver, BC, Canada
[doi> 10.1145/1255175.1255198]
|
|