| Pruning long documents for distributed information retrieval |
| Source
|
Conference on Information and Knowledge Management
Proceedings of the eleventh international conference on Information and knowledge management
McLean, Virginia, USA
SESSION: Information retrieval 1
Pages: 332 - 339
Year of Publication: 2002
ISBN: 1-58113-492-4
|
|
Authors
|
|
Jie Lu
|
Carnegie Mellon University, Pittsburgh, PA
|
|
Jamie Callan
|
Carnegie Mellon University, Pittsburgh, PA
|
|
ABSTRACT
Query-based sampling is a method of discovering the contents of a text database by submitting queries to a search engine and observing the documents returned. In prior research, sampled documents were used to build resource descriptions for automatic database selection, and to build a centralized sample database for query expansion and result merging. An unstated assumption was that the associated storage costs were acceptable. When sampled documents are long, storage costs can be large. This paper investigates methods of pruning long documents to reduce storage costs. The experimental results demonstrate that building resource descriptions and centralized sample databases from the pruned contents of sampled documents can reduce storage costs by 54-93% while causing only minor losses in the accuracy of distributed information retrieval.
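The sampling-then-pruning idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `search` function, the rule of drawing the next query term at random from documents already sampled, the `prune_first_terms` rule (keep only the first few terms), and the use of term counts as a stand-in for storage cost. The abstract does not say which pruning methods the paper evaluates, so this should not be read as the authors' method.

```python
import random


def search(database, query, top_n=4):
    """Toy search engine (hypothetical): documents containing the query term."""
    hits = [doc for doc in database if query in doc.split()]
    return hits[:top_n]


def query_based_sample(database, seed_term, max_docs, rng):
    """Sample documents by querying and observing what comes back.

    Assumption: the next query term is drawn at random from the terms
    of the documents sampled so far, never repeating a query.
    """
    sampled, vocabulary, tried = [], [seed_term], set()
    query = seed_term
    while len(sampled) < max_docs and query is not None:
        tried.add(query)
        for doc in search(database, query):
            if doc not in sampled and len(sampled) < max_docs:
                sampled.append(doc)
                vocabulary.extend(doc.split())
        candidates = sorted(set(vocabulary) - tried)
        query = rng.choice(candidates) if candidates else None
    return sampled


def prune_first_terms(doc, max_terms):
    """Assumed pruning rule: keep only the first max_terms terms."""
    return " ".join(doc.split()[:max_terms])


def storage_reduction(docs, pruned_docs):
    """Fraction of terms removed by pruning (a rough proxy for storage cost)."""
    before = sum(len(d.split()) for d in docs)
    after = sum(len(d.split()) for d in pruned_docs)
    return 1 - after / before


if __name__ == "__main__":
    database = [
        "apple banana cherry date elderberry fig grape honeydew",
        "banana kiwi lemon mango",
        "cherry lime melon nectarine orange papaya quince raspberry strawberry tangerine",
    ]
    sample = query_based_sample(database, "banana", 3, random.Random(0))
    pruned = [prune_first_terms(d, 4) for d in sample]
    print(f"storage reduction: {storage_reduction(sample, pruned):.0%}")
```

In this sketch, the pruned documents would be what gets stored in the resource description and the centralized sample database. The trade-off the abstract reports is between the size of that stored material and retrieval accuracy.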