Comparing the performance of collection selection algorithms
Source: ACM Transactions on Information Systems (TOIS)
Volume 21, Issue 4 (October 2003)
Pages: 412-456
Year of Publication: 2003
ISSN: 1046-8188
Authors
Allison L. Powell  University of Virginia, Charlottesville, VA
James C. French  University of Virginia, Charlottesville, VA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/944012.944016

ABSTRACT

The proliferation of online information resources increases the importance of effective and efficient information retrieval in a multicollection environment. Multicollection searching is cast in three parts: collection selection (also referred to as database selection), query processing, and results merging. In this work, we focus on evaluating the first step, collection selection.

In this article, we present a detailed discussion of the methodology that we used to evaluate and compare collection selection approaches, covering both test environments and evaluation measures. We compare the CORI, CVV, and gGLOSS collection selection approaches using six test environments based on three document testbeds. The approaches show similar performance trends, but CORI consistently outperforms the others, suggesting that effective collection selection can be achieved using limited information about each collection.

The contributions of this work are both the assembled evaluation methodology and the application of that methodology to compare collection selection approaches in a standardized environment.


