Fast generation of result snippets in web search
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Amsterdam, The Netherlands
SESSION: Summaries
Pages: 127 - 134
Year of Publication: 2007
ISBN: 978-1-59593-597-7
Authors
Andrew Turpin  RMIT University
Yohannes Tsegay  RMIT University
David Hawking  CSIRO ICT Centre
Hugh E. Williams  Microsoft Corporation
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 35,   Downloads (12 Months): 213,   Citation Count: 8
DOI: http://doi.acm.org/10.1145/1277741.1277766

ABSTRACT

Search engine users have come to expect query-biased document snippets on results pages. In this paper we explore the algorithms and data structures a search engine requires to generate query-biased snippets efficiently. We begin by proposing and analysing a document compression method that reduces snippet generation time by 58% over a baseline using the zlib compression library. These experiments reveal that finding documents on secondary storage dominates the total cost of generating snippets, and so caching documents in RAM is essential for a fast snippet generation process. Using simulation, we examine snippet generation performance for RAM caches of different sizes. Finally, we propose and analyse document reordering and compaction, revealing a scheme that increases the number of document cache hits with only a marginal effect on snippet quality. This scheme effectively doubles the number of documents that can fit in a fixed-size cache.
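
The link between document compaction and cache hits can be illustrated with a toy simulation. The sketch below is not the paper's simulator: the LRU eviction policy, the class name LRUDocCache, the helper hit_rate, and the uniform document sizes are all assumptions made for illustration. It only shows why halving the stored size of each document lets a fixed-size cache hold twice as many documents, which can raise the hit rate for the same request stream.

```python
from collections import OrderedDict


class LRUDocCache:
    """Byte-budgeted LRU cache keyed by document id.

    Hypothetical illustration only; the paper's cache design and
    eviction policy are not described in the abstract.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.docs = OrderedDict()  # doc_id -> size in bytes, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, doc_id, size_bytes):
        """Record one request; return True on a cache hit."""
        if doc_id in self.docs:
            self.docs.move_to_end(doc_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if size_bytes > self.capacity_bytes:
            return False  # document can never fit, so do not cache it
        # Evict least recently used documents until the new one fits.
        while self.used_bytes + size_bytes > self.capacity_bytes:
            _, evicted_size = self.docs.popitem(last=False)
            self.used_bytes -= evicted_size
        self.docs[doc_id] = size_bytes
        self.used_bytes += size_bytes
        return False


def hit_rate(requests, doc_size, capacity_bytes):
    """Fraction of requests served from cache, assuming uniform doc sizes."""
    cache = LRUDocCache(capacity_bytes)
    for doc_id in requests:
        cache.access(doc_id, doc_size)
    return cache.hits / len(requests)


if __name__ == "__main__":
    # Assumed workload: four documents requested in a repeating cycle.
    requests = [0, 1, 2, 3] * 5
    print("full documents (200 bytes each):", hit_rate(requests, 200, 400))
    print("compacted documents (100 bytes each):", hit_rate(requests, 100, 400))
```

With this assumed workload, the 400-byte cache holds only two full-size documents and every request in the cycle misses, while the compacted documents all fit and only the first four requests miss. Real query logs are far less regular, so the size of the effect in practice depends on the request distribution.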

