The Importance of Prior Probabilities for Entry Page Search
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Tampere, Finland
SESSION: Web Information Retrieval
Pages: 27 - 34  
Year of Publication: 2002
ISBN:1-58113-561-0
Authors
Wessel Kraaij, TNO TPD, The Netherlands
Thijs Westerveld, University of Twente
Djoerd Hiemstra, University of Twente
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/564376.564383

ABSTRACT

An important class of searches on the World Wide Web aims to find the entry page (homepage) of an organisation. Entry page search differs considerably from ad hoc search, and a plain ad hoc system performs disappointingly on it. We explored three non-content features of web pages: page length, number of incoming links, and URL form. URL form in particular proved to be a good predictor. Using URL form priors, we found over 70% of all entry pages at rank 1 and up to 89% in the top 10. Non-content features can easily be embedded in a language model framework as a prior probability.
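
The last point of the abstract can be illustrated with a minimal sketch, which should not be read as the authors' implementation. It ranks a document D for a query Q by log P(D) + sum over query terms of log P(t|D), where P(D) is a prior looked up from the URL form. The four URL classes, the prior values in `URL_PRIORS`, the linear interpolation smoothing, and the weight `lam` are all assumptions made for illustration. The abstract does not specify any of them.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical prior probabilities per URL form (illustrative values only,
# not estimates reported in the paper).
URL_PRIORS = {"root": 0.6, "subroot": 0.25, "path": 0.1, "file": 0.05}


def url_form(url):
    """Classify a URL by its shape, using an assumed four-way scheme."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # Treat a trailing index.* file as equivalent to its directory.
    is_index = bool(segments) and segments[-1].lower().startswith("index.")
    if is_index:
        segments = segments[:-1]
    if not segments:
        return "root"
    if path.endswith("/") or is_index:
        return "subroot" if len(segments) == 1 else "path"
    return "file"


def score(query, doc_tokens, collection_counts, collection_size, url, lam=0.8):
    """Return log P(D) + sum_t log(lam * P(t|D) + (1 - lam) * P(t|C))."""
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    # The non-content feature enters only through this prior term.
    log_score = math.log(URL_PRIORS[url_form(url)])
    for term in query:
        p_doc = tf[term] / doc_len if doc_len else 0.0
        p_col = collection_counts.get(term, 0) / collection_size
        p = lam * p_doc + (1 - lam) * p_col
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        log_score += math.log(p)
    return log_score
```

Because the prior is a separate additive term in log space, two pages with identical text differ in score only by the log ratio of their priors. A page length or incoming link prior could replace the URL lookup without touching the content model.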


