Building enriched document representations using aggregated anchor text
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Boston, MA, USA
SESSION: Web Retrieval I
Pages 219-226
Year of Publication: 2009
ISBN:978-1-60558-483-6
Authors
Donald Metzler  Yahoo! Labs, Santa Clara, CA, USA
Jasmine Novak  Yahoo! Labs, Santa Clara, CA, USA
Hang Cui  Yahoo! Labs, Santa Clara, CA, USA
Srihari Reddy  Yahoo! Labs, Santa Clara, CA, USA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1571941.1571981

ABSTRACT

It is well known that anchor text plays a critical role in a variety of search tasks performed over hypertextual domains, including enterprise search, wiki search, and web search. It is common practice to enrich a document's standard textual representation with all of the anchor text associated with its incoming hyperlinks. However, this approach does not help match relevant pages with very few inlinks. In this paper, we propose a method for overcoming anchor text sparsity by enriching document representations with anchor text that has been aggregated across the hyperlink graph. This aggregation mechanism acts to smooth, or diffuse, anchor text within a domain. We rigorously evaluate our proposed approach on a large web search test collection. Our results show the approach significantly improves retrieval effectiveness, especially for longer, more difficult queries.
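
The abstract does not spell out the aggregation mechanism, so the following is only a minimal sketch of one way anchor text might be diffused within a domain, not the authors' method. It assumes a one-hop rule: a page keeps the anchor terms of its own inlinks and also inherits, at a discount, the anchor terms of same-host pages that link to it. The link data, the `discount` parameter, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Toy link data, invented for illustration: (source_url, target_url, anchor_text).
LINKS = [
    ("http://a.example/news", "http://b.example/", "b example home"),
    ("http://c.example/blog", "http://b.example/", "example site"),
    ("http://b.example/", "http://b.example/docs", "documentation"),
]


def host(url):
    """Return the host part of a URL, used here as a stand-in for 'domain'."""
    return urlparse(url).netloc


def direct_anchor_text(links):
    """Standard enrichment: each page gets the anchor terms of its own inlinks."""
    anchors = defaultdict(Counter)
    for _src, dst, text in links:
        anchors[dst].update(text.lower().split())
    return anchors


def aggregated_anchor_text(links, discount=0.5):
    """Assumed one-hop diffusion: a page also inherits, at a discount,
    the direct anchor terms of same-host pages that link to it."""
    direct = direct_anchor_text(links)
    aggregated = defaultdict(Counter)
    for page, terms in direct.items():
        aggregated[page].update(terms)
    for src, dst, _text in links:
        if src != dst and host(src) == host(dst):
            # .get avoids creating empty entries for pages with no inlinks.
            for term, count in direct.get(src, {}).items():
                aggregated[dst][term] += discount * count
    return aggregated
```

In this toy graph, `http://b.example/docs` has a single inlink with the anchor "documentation", so the standard representation gives it one term. Under the assumed rule it also picks up discounted terms such as "example" and "home" from the site's home page, which is the kind of help the abstract targets for pages with very few inlinks.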

