Building enriched document representations using aggregated anchor text
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
SESSION: Web Retrieval I
Pages 219-226
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
Donald Metzler, Yahoo! Labs, Santa Clara, CA, USA
Jasmine Novak, Yahoo! Labs, Santa Clara, CA, USA
Hang Cui, Yahoo! Labs, Santa Clara, CA, USA
Srihari Reddy, Yahoo! Labs, Santa Clara, CA, USA
ABSTRACT
It is well known that anchor text plays a critical role in a variety of search tasks performed over hypertextual domains, including enterprise search, wiki search, and web search. It is common practice to enrich a document's standard textual representation with all of the anchor text associated with its incoming hyperlinks. However, this approach does not help match relevant pages with very few inlinks. In this paper, we propose a method for overcoming anchor text sparsity by enriching document representations with anchor text that has been aggregated across the hyperlink graph. This aggregation mechanism acts to smooth, or diffuse, anchor text within a domain. We rigorously evaluate our proposed approach on a large web search test collection. Our results show the approach significantly improves retrieval effectiveness, especially for longer, more difficult queries.
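The abstract does not spell out the aggregation mechanism, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple one-hop scheme in which a page borrows, at a reduced weight, the anchor text already attached to the pages that link to it. The function names, the `weight` parameter, and the example URLs are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def direct_anchor_text(links):
    """Collect anchor-text term counts for each target page from its inlinks."""
    anchors = defaultdict(Counter)
    for _source, target, text in links:
        anchors[target].update(text.lower().split())
    return anchors


def aggregated_anchor_text(links, weight=0.5):
    """Enrich each page with down-weighted anchor text from its linking pages.

    Assumption (not from the paper): one-hop propagation with a fixed weight.
    """
    direct = direct_anchor_text(links)

    # Map each page to the set of pages that link to it.
    inlinks = defaultdict(set)
    for source, target, _text in links:
        inlinks[target].add(source)

    pages = {page for source, target, _text in links for page in (source, target)}
    enriched = {}
    for page in pages:
        # Start from the page's own (direct) anchor text.
        terms = Counter(direct.get(page, Counter()))
        # Add the anchor text of each linking page, scaled by `weight`.
        for neighbor in inlinks.get(page, ()):
            for term, count in direct.get(neighbor, Counter()).items():
                terms[term] += weight * count
        enriched[page] = dict(terms)
    return enriched


# Hypothetical link triples: (source page, target page, anchor text).
links = [
    ("a.example/x", "site.example/", "acme widgets"),
    ("b.example/y", "site.example/", "acme"),
    ("site.example/", "site.example/deep", "specs"),
]
enriched = aggregated_anchor_text(links)
```

In this toy graph, `site.example/deep` has a single inlink with the anchor text "specs", so its direct anchor text is sparse. After aggregation it also carries the terms "acme" and "widgets", inherited at half weight from the page that links to it, which is the kind of smoothing the abstract describes.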