Improving Web search efficiency via a locality based static pruning method
Source: International World Wide Web Conference
Proceedings of the 14th International Conference on World Wide Web
Chiba, Japan
Session: Indexing and querying
Pages: 235 - 244  
Year of Publication: 2005
ISBN:1-59593-046-9
Authors
Edleno S. de Moura  Federal University of Amazonas, Brazil
Célia F. dos Santos  Federal University of Amazonas, Brazil
Daniel R. Fernandes  Federal University of Amazonas, Brazil
Altigran S. Silva  Federal University of Amazonas, Brazil
Pavel Calado  INESC-ID, Portugal
Mario A. Nascimento  University of Alberta, Canada
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1060745.1060783

ABSTRACT

The fast and continuous growth in the volume of indexed (and indexable) documents on the Web poses a great challenge for search engines, not only for search effectiveness but also for time and space efficiency. In this paper we present an index pruning technique for search engines that addresses efficiency without neglecting effectiveness. To this end, we adopt a new pruning strategy capable of greatly reducing the size of search engine indices. Experiments using a real search engine show that our technique can reduce the indices' storage costs by up to 60% over traditional lossless compression methods, while keeping the loss in retrieval precision to a minimum. Compared with the index size with no compression at all, the compression rate is higher than 88%, i.e., the index shrinks to less than one eighth of its original size. More importantly, our results indicate that, due to the reduction in storage overhead, query processing time can be reduced to nearly 65% of the original time, with no loss in average precision. The new method yields significant improvements over the best known static pruning method for search engine indices. In addition, since our technique is orthogonal to the underlying search algorithms, it can be adopted by virtually any search engine.
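The percentages quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it treats the quoted figures as exact values, and it assumes that the 60% and 88% reductions describe the same index configuration, which the abstract does not state. The variable names are hypothetical and do not come from the paper.

```python
from fractions import Fraction

# Figures quoted in the abstract, treated as exact values for illustration.
reduction_vs_uncompressed = Fraction(88, 100)  # "higher than 88%"
reduction_vs_lossless = Fraction(60, 100)      # "up to 60%"
query_time_ratio = Fraction(65, 100)           # "nearly 65% of the original time"

# Fraction of the uncompressed index that remains after pruning and compression.
remaining = 1 - reduction_vs_uncompressed

# Assumption: both reduction figures refer to the same configuration, so the
# size of a lossless-only index can be backed out from them.
implied_lossless_size = remaining / (1 - reduction_vs_lossless)

print(f"remaining index size: {float(remaining):.0%} of uncompressed")
print(f"below one eighth: {remaining < Fraction(1, 8)}")
print(f"implied lossless-only size: {float(implied_lossless_size):.0%} of uncompressed")
print(f"query time saved: {float(1 - query_time_ratio):.0%}")
```

Under these assumptions, 12% remaining is indeed below one eighth (12.5%), and the two reduction figures together would imply that lossless compression alone brings the index to roughly 30% of its uncompressed size.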


