Shortest-substring retrieval and ranking
Source: ACM Transactions on Information Systems (TOIS)
Volume 18, Issue 1 (January 2000)
Pages: 44-78
Year of Publication: 2000
ISSN: 1046-8188
Authors
Charles L. A. Clarke, Univ. of Toronto, Toronto, Ont., Canada
Gordon V. Cormack, Univ. of Waterloo, Waterloo, Ont., Canada
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16, Downloads (12 Months): 80, Citation Count: 9

DOI: http://doi.acm.org/10.1145/333135.333137

ABSTRACT

We present a model for arbitrary passage retrieval using Boolean queries. The model is applied to the task of ranking documents, or other structural elements, in the order of their expected relevance. Features such as phrase matching, truncation, and stemming integrate naturally into the model. Properties of Boolean algebra are obeyed, and the exact-match semantics of Boolean retrieval are preserved. Simple inverted-list file structures provide an efficient implementation. Retrieval effectiveness is comparable to that of standard ranking techniques. Since global statistics are not used, the method is of particular value in distributed environments. Since ranking is based on arbitrary passages, the structural elements to be ranked may be specified at query time and do not need to be restricted to predefined elements.
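The core idea of ranking by shortest substrings can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each term's token positions are available from a positional inverted list, represents a query result as the list of shortest intervals ("covers") that satisfy the Boolean query, computes AND and OR by brute force for clarity, and ranks elements chosen at query time with a hypothetical 1/length weighting that is not the paper's scoring formula. All names (`minimal`, `term`, `both_of`, `one_of`, `rank`) and the toy data are hypothetical.

```python
from itertools import product

# A query result is a list of (start, end) token intervals, inclusive,
# none of which contains another (the "shortest substrings" or covers).

def minimal(intervals):
    """Drop every interval that contains another interval in the input."""
    kept = []
    # Sort by end ascending; for equal ends, the larger start (shorter span) first.
    for start, end in sorted(set(intervals), key=lambda iv: (iv[1], -iv[0])):
        # Kept starts are strictly increasing, so only the last kept interval
        # can be nested inside this one.
        if kept and kept[-1][0] >= start:
            continue
        kept.append((start, end))
    return kept

def term(positions):
    """Each occurrence of a term is a one-token interval."""
    return [(p, p) for p in sorted(set(positions))]

def both_of(a, b):
    """AND: shortest intervals containing an interval from a and one from b.
    Brute force over all pairs for clarity, not efficiency."""
    return minimal((min(x[0], y[0]), max(x[1], y[1])) for x, y in product(a, b))

def one_of(a, b):
    """OR: shortest intervals containing an interval from a or one from b."""
    return minimal(list(a) + list(b))

def rank(elements, covers):
    """Rank query-time elements (name -> (start, end)) by the covers they contain.
    Hypothetical weighting: each cover adds 1 / its length in tokens."""
    scores = {}
    for name, (lo, hi) in elements.items():
        score = sum(1.0 / (e - s + 1) for s, e in covers if lo <= s and e <= hi)
        if score > 0:  # exact match: elements with no complete cover are not ranked
            scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical positional index and element boundaries.
index = {"boolean": [3, 40, 52], "ranking": [5, 70], "passage": [44]}
docs = {"doc1": (0, 30), "doc2": (31, 80)}

q = both_of(term(index["boolean"]),
            one_of(term(index["ranking"]), term(index["passage"])))
print(q)               # [(3, 5), (5, 40), (40, 44), (44, 52), (52, 70)]
print(rank(docs, q))   # doc2 (three short covers) ranks above doc1 (one cover)
```

In this sketch only elements that contain at least one complete cover are ranked, which keeps the exact-match behaviour of the Boolean query; a cover that straddles an element boundary, such as (5, 40) above, counts toward neither element. Because `docs` is just a mapping supplied with the query, the same covers could be ranked against paragraphs, sections, or any other extents without reindexing.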

