Query-independent evidence in home page finding
Source: ACM Transactions on Information Systems (TOIS)
Volume 21, Issue 3 (July 2003)
Pages: 286-313
Year of Publication: 2003
ISSN: 1046-8188
Authors
Trystan Upstill, Australian National University, Canberra, Australia
Nick Craswell, CSIRO Mathematical and Information Sciences, Canberra, Australia
David Hawking, CSIRO Mathematical and Information Sciences, Canberra, Australia
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/858476.858479

ABSTRACT

Hyperlink recommendation evidence, that is, evidence based on the structure of a web's link graph, is widely exploited by commercial Web search systems. However, there is little published work to support its popularity. Another form of query-independent evidence, URL-type, has been shown to be beneficial on a home page finding task. We compared the usefulness of these types of evidence on the home page finding task, combined with both content and anchor text baselines. Our experiments made use of five query sets spanning three corpora: one enterprise crawl, and the WT10g and VLC2 Web test collections.

We found that, in optimal conditions, all of the query-independent methods studied (in-degree, URL-type, and two variants of PageRank) offered a better-than-random improvement on a content-only baseline. However, only URL-type offered a better-than-random improvement on an anchor text baseline. In realistic settings, for either baseline, only URL-type offered consistent gains. In combination with URL-type, the anchor text baseline was more useful for finding popular home pages, but URL-type with content was more useful for finding randomly selected home pages. We conclude that a general home page finding system should combine evidence from document content, anchor text, and URL-type classification.
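
The abstract does not define URL-type, so the following is only a rough sketch of what such query-independent evidence might look like. It assumes a four-way classification by URL path depth (root, subroot, path, file), in the spirit of earlier TREC-2001 home page finding work, and a deliberately crude combination step that stably re-sorts a baseline ranking by URL class. The names `classify_url_type`, `rerank_by_url_type`, `URL_TYPE_RANK`, and `INDEX_NAMES` are hypothetical, and this is not presented as the authors' implementation or their combination method.

```python
from urllib.parse import urlparse

# Assumed ordering: a lower value means "more home-page-like".
URL_TYPE_RANK = {"root": 0, "subroot": 1, "path": 2, "file": 3}

# Assumed default-page names that are treated like a bare directory.
INDEX_NAMES = {"index.html", "index.htm"}


def classify_url_type(url: str) -> str:
    """Assign a URL to one of four illustrative classes by path depth.

    Assumes the URL carries a scheme (e.g. "http://"), so that urlparse
    separates the host from the path.
    """
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    ends_with_slash = path == "" or path.endswith("/")

    # Treat /a/index.html the same as /a/
    if segments and segments[-1].lower() in INDEX_NAMES:
        segments = segments[:-1]
        ends_with_slash = True

    if not segments:
        return "root"       # http://host/ or http://host/index.html
    if not ends_with_slash and "." in segments[-1]:
        return "file"       # ends in a file name other than a default page
    if len(segments) == 1:
        return "subroot"    # http://host/dir/
    return "path"           # http://host/dir/subdir/


def rerank_by_url_type(ranked_urls):
    """Re-sort a baseline ranking by URL class.

    sorted() is stable, so the baseline order is preserved within each class.
    """
    return sorted(ranked_urls, key=lambda u: URL_TYPE_RANK[classify_url_type(u)])


if __name__ == "__main__":
    baseline = [
        "http://www.example.com/staff/alice/cv.html",
        "http://www.example.com/staff/",
        "http://www.example.com/",
    ]
    for url in rerank_by_url_type(baseline):
        print(classify_url_type(url), url)
```

A strict re-sort like this lets the URL class override the baseline score entirely; a practical system would more likely blend the two signals, which is closer to the combination of content, anchor text, and URL-type evidence that the abstract recommends.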


