ABSTRACT
The first part of this paper briefly describes a mathematical framework, called the containment model, that provides the operations and data structures for a text-dominated database with a hierarchical structure. The database is treated as a hierarchical collection of contiguous extents, each extent being a word, a word phrase, a text element, or a non-text element. The filter operations that make up a search command are expressed as containment criteria, which specify whether a contiguous extent is selected or rejected during a search. This formalism, comprising the mathematical framework and its associated language, defines a conceptual layer on which we can construct a well-defined higher-level layer: the user interface, which provides functionality closer to the needs of the user and the application domain.
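As a rough illustration of containment criteria, the sketch below models an extent as an inclusive range of word positions and defines three filters that select or reject extents. The names `Extent`, `containing`, `contained_in`, and `not_containing` are assumptions made for this example; the abstract does not name the operators of the containment model or give their exact semantics.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Extent:
    """A contiguous run of word positions (hypothetical representation)."""
    start: int  # position of the first word, inclusive
    end: int    # position of the last word, inclusive

    def contains(self, other: "Extent") -> bool:
        # True when `other` lies entirely within this extent.
        return self.start <= other.start and other.end <= self.end


def containing(candidates: List[Extent], targets: List[Extent]) -> List[Extent]:
    """Select candidates that contain at least one target extent."""
    return [c for c in candidates if any(c.contains(t) for t in targets)]


def contained_in(candidates: List[Extent], containers: List[Extent]) -> List[Extent]:
    """Select candidates that lie inside at least one container extent."""
    return [c for c in candidates if any(k.contains(c) for k in containers)]


def not_containing(candidates: List[Extent], targets: List[Extent]) -> List[Extent]:
    """Reject candidates that contain any target extent."""
    return [c for c in candidates if not any(c.contains(t) for t in targets)]


# Toy hierarchy: two sections, three paragraphs, one word occurrence.
sections = [Extent(0, 9), Extent(10, 19)]
paragraphs = [Extent(0, 4), Extent(5, 9), Extent(10, 19)]
word_hits = [Extent(6, 6)]

paragraphs_with_hit = containing(paragraphs, word_hits)
sections_with_hit = containing(sections, word_hits)
paragraphs_in_hit_sections = contained_in(paragraphs, sections_with_hit)
sections_without_hit = not_containing(sections, word_hits)
```

Because each filter takes lists of extents and returns a list of extents, filters can be chained, which is one plausible reading of how several filter operations make up a single search command.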
With the conceptual layer established, we go on to describe the design and implementation of a versatile interface that handles queries for searching and navigating a heterogeneous collection of structured documents. Interface functionality is provided by a set of “worker” modules supported by an “environment” that is the same for all interfaces. The interface environment allows a worker to communicate with the underlying text retrieval engine using a well-defined command protocol based on a small set of filter operators. The overall design emphasizes: a) interface flexibility for a variety of search and browsing capabilities, b) the modular independence of the interface from its underlying retrieval engine, and c) the advantages gained by defining retrieval commands using operators from a text algebra that provides a sound theoretical foundation for the database.
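The separation between workers, the shared environment, and the retrieval engine might be pictured as follows. This is only a sketch under stated assumptions: `ToyEngine`, `Environment`, `SearchWorker`, the `send` method, and the operator names are all hypothetical, and the actual command protocol is not specified in the abstract.

```python
from typing import Dict, List, Tuple

Extent = Tuple[int, int]  # (start, end) word positions, inclusive


class ToyEngine:
    """Stand-in for a retrieval engine: maps names to lists of extents."""

    def __init__(self, index: Dict[str, List[Extent]]) -> None:
        self.index = index

    def execute(self, op: str, left: str, right: str) -> List[Extent]:
        a = self.index.get(left, [])
        b = self.index.get(right, [])
        if op == "containing":
            # Keep extents in `a` that enclose some extent in `b`.
            return [x for x in a if any(x[0] <= y[0] and y[1] <= x[1] for y in b)]
        if op == "contained_in":
            # Keep extents in `a` that lie inside some extent in `b`.
            return [x for x in a if any(y[0] <= x[0] and x[1] <= y[1] for y in b)]
        raise ValueError(f"unknown operator: {op}")


class Environment:
    """Shared layer through which every worker reaches the engine."""

    def __init__(self, engine: ToyEngine) -> None:
        self._engine = engine

    def send(self, op: str, left: str, right: str) -> List[Extent]:
        # Workers see only this small command vocabulary, not the engine.
        return self._engine.execute(op, left, right)


class SearchWorker:
    """One interface module; it depends on the environment only."""

    def __init__(self, env: Environment) -> None:
        self.env = env

    def paragraphs_mentioning(self, word: str) -> List[Extent]:
        return self.env.send("containing", "paragraph", word)


engine = ToyEngine({"paragraph": [(0, 4), (5, 9)], "retrieval": [(6, 6)]})
worker = SearchWorker(Environment(engine))
result = worker.paragraphs_mentioning("retrieval")
```

In this arrangement, replacing `ToyEngine` with another engine that understands the same operators would leave the worker unchanged, which is the kind of modular independence listed under point b).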