Evaluating the performance of distributed architectures for information retrieval using a variety of workloads
Source: ACM Transactions on Information Systems (TOIS)
Volume 18, Issue 1 (January 2000)
Pages: 1-43
Year of Publication: 2000
ISSN: 1046-8188
Authors
Brendon Cahoon  Univ. of Massachusetts, Amherst, MA
Kathryn S. McKinley  Univ. of Massachusetts, Amherst, MA
Zhihong Lu  Village Networks, Hazlet, NJ
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/333135.333136

ABSTRACT

The information explosion across the Internet and elsewhere offers access to an increasing number of document collections. In order for users to effectively access these collections, information retrieval (IR) systems must provide coordinated, concurrent, and distributed access. In this article, we explore how to achieve scalable performance in a distributed system for collection sizes ranging from 1GB to 128GB. We implement a fully functional distributed IR system based on a multithreaded version of the Inquery simulation model. We measure performance as a function of system parameters such as client command rate, number of document collections, terms per query, query term frequency, number of answers returned, and command mixture. Our results show that it is important to model both query and document commands because the heterogeneity of commands significantly impacts performance. Based on our results, we recommend simple changes to the prototype and evaluate the changes using the simulator. Because of the significant resource demands of information retrieval, it is not difficult to generate workloads that overwhelm system resources regardless of the architecture. However, under some realistic workloads, we demonstrate system organizations for which response time degrades gracefully as the workload increases and performance scales with the number of processors. This scalable architecture includes a surprisingly small number of brokers through which a large number of clients and servers communicate.
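
As a rough illustration of how response time can depend on client command rate, the following Python sketch models a single broker as a first-come, first-served queue. It is not the authors' simulator. The function name `mean_response_time`, the exponential arrival gaps, the fixed per-command service time, and all numeric values are assumptions chosen only for this example.

```python
import random


def mean_response_time(command_rate, service_time, n_commands=10_000, seed=0):
    """Mean response time (seconds) at one FIFO broker.

    Hypothetical model: commands arrive with exponentially distributed gaps
    (mean 1/command_rate) and each takes a fixed service_time to handle.
    """
    rng = random.Random(seed)   # fixed seed keeps runs repeatable
    wait = 0.0                  # queueing delay of the current command
    total = service_time        # first command finds the broker idle
    for _ in range(n_commands - 1):
        gap = rng.expovariate(command_rate)
        # Lindley recursion: the next command waits for whatever work is
        # still queued when it arrives, and never less than zero.
        wait = max(0.0, wait + service_time - gap)
        total += wait + service_time
    return total / n_commands


if __name__ == "__main__":
    # Assumed service time of 0.1 s, so the broker saturates near 10 commands/s.
    for rate in (2.0, 5.0, 8.0, 9.5):
        print(f"{rate:4.1f} commands/s -> {mean_response_time(rate, 0.1):.3f} s")
```

In this toy model the mean response time stays close to the service time at low command rates and rises as the rate approaches the broker's capacity (one command per `service_time` seconds). Beyond that point the queue grows without bound, which loosely mirrors the abstract's remark that some workloads overwhelm any architecture.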


