ABSTRACT
Hyperlink recommendation evidence, that is, evidence based on the structure of a web's link graph, is widely exploited by commercial Web search systems. However, there is little published work to support its popularity. Another form of query-independent evidence, URL-type, has been shown to be beneficial on a home page finding task. We compared the usefulness of these types of evidence on the home page finding task, combined with both content and anchor text baselines. Our experiments made use of five query sets spanning three corpora: one enterprise crawl and the WT10g and VLC2 Web test collections. We found that, in optimal conditions, all of the query-independent methods studied (in-degree, URL-type, and two variants of PageRank) offered a better-than-random improvement on a content-only baseline. However, only URL-type offered a better-than-random improvement on an anchor text baseline. In realistic settings, for either baseline, only URL-type offered consistent gains. In combination with URL-type, the anchor text baseline was more useful for finding popular home pages, but URL-type with content was more useful for finding randomly selected home pages. We conclude that a general home page finding system should combine evidence from document content, anchor text, and URL-type classification.
CITED BY 8
David Hawking, Francis Crimmins, Nick Craswell, Trystan Upstill, How valuable is external link evidence when searching enterprise Webs?, Proceedings of the fifteenth Australasian database conference, p.77-84, January 01, 2004, Dunedin, New Zealand
Huaiyu Zhu, Sriram Raghavan, Shivakumar Vaithyanathan, Alexander Löser, Navigating the intranet with high precision, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
Thomaz Philippe C. Silva, Edleno Silva de Moura, João Marcos B. Cavalcanti, Altigran S. da Silva, Moisés Gomes de Carvalho, Marcos André Gonçalves, An evolutionary approach for combining different sources of evidence in search engines, Information Systems, v.34 n.2, p.276-289, April, 2009