A statistical comparison of tag and query logs
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
SESSION: Web 2.0
Pages 123-130
Year of Publication: 2009
ISBN:978-1-60558-483-6
Authors
Mark J. Carman  University of Lugano, Lugano, Switzerland
Mark Baillie  University of Strathclyde, Glasgow, United Kingdom
Robert Gwadera  University of Lugano, Lugano, Switzerland
Fabio Crestani  University of Lugano, Lugano, Switzerland
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 71,   Downloads (12 Months): 238,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1571941.1571965

ABSTRACT

We investigate tag and query logs to see whether the terms people use to annotate websites are similar to the ones they use to query for them. Over a set of URLs, we compare the distribution of tags used to annotate each URL with the distribution of query terms for clicks on the same URL. Understanding the relationship between the two distributions is important for determining how useful tag data may be for improving search results and, conversely, how useful query data may be for improving tag prediction. In our study, we compare the two term frequency distributions using vocabulary overlap and relative entropy. We also test statistically whether the term counts come from the same underlying distribution. Our results indicate that the vocabularies used for tagging and for searching for content are similar but not identical. We further investigate the content of the websites to see which of the two distributions (tag or query) is more similar to the content of the annotated/searched URL. Finally, we analyze the similarity for different categories of URLs in our sample to see whether the similarity between distributions depends on the topic of the website or the popularity of the URL.
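
As a rough illustration of the two comparison measures named in the abstract, the following Python sketch computes a vocabulary overlap and a relative entropy (Kullback-Leibler divergence) between tag counts and query-term counts for a single URL. It is not the authors' implementation: the Jaccard form of the overlap, the additive smoothing constant `alpha=0.5`, the function names, and the toy counts are all assumptions made for the example.

```python
import math
from collections import Counter


def vocabulary_overlap(tag_counts, query_counts):
    """Jaccard overlap of the two vocabularies (an assumed definition)."""
    tag_vocab, query_vocab = set(tag_counts), set(query_counts)
    union = tag_vocab | query_vocab
    if not union:
        return 0.0
    return len(tag_vocab & query_vocab) / len(union)


def smoothed_distribution(counts, vocab, alpha=0.5):
    """Turn raw counts into probabilities over vocab with additive smoothing.

    alpha is a hypothetical choice; smoothing keeps every probability
    positive so the relative entropy below stays finite.
    """
    total = sum(counts.get(term, 0) for term in vocab) + alpha * len(vocab)
    return {term: (counts.get(term, 0) + alpha) / total for term in vocab}


def relative_entropy(p_counts, q_counts, alpha=0.5):
    """KL divergence D(P || Q) in bits over the union vocabulary."""
    vocab = set(p_counts) | set(q_counts)
    p = smoothed_distribution(p_counts, vocab, alpha)
    q = smoothed_distribution(q_counts, vocab, alpha)
    return sum(p[t] * math.log2(p[t] / q[t]) for t in vocab)


# Toy counts for one URL (invented for illustration only).
tag_counts = Counter({"python": 5, "programming": 3, "tutorial": 2})
query_counts = Counter({"python": 4, "tutorial": 3, "download": 1})

overlap = vocabulary_overlap(tag_counts, query_counts)
divergence = relative_entropy(tag_counts, query_counts)
```

Relative entropy is asymmetric, so `relative_entropy(tag_counts, query_counts)` and `relative_entropy(query_counts, tag_counts)` generally differ; which direction is appropriate depends on which distribution is treated as the reference.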


