|
ABSTRACT
The bit-sliced index (BSI) was originally defined in [ONQ97]. The current paper introduces the concept of BSI arithmetic. For any two BSI's X and Y on a table T, we show how to efficiently generate new BSI's Z, V, and W, such that Z = X + Y, V = X - Y, and W = MIN(X, Y); this means that if a row r in T has a value x represented in BSI X and a value y in BSI Y, the value for r in BSI Z will be x + y, the value in V will be x - y and the value in W will be MIN(x, y). Since a bitmap representing a set of rows is the simplest bit-sliced index, BSI arithmetic is the most straightforward way to determine multisets of rows (with duplicates) resulting from the SQL clauses UNION ALL (addition), EXCEPT ALL (subtraction), and INTERSECT ALL (min) (see [OO00, DB2SQL] for definitions of these clauses). Another contribution of the current paper is to generalize BSI range restrictions from [ONQ97] to a new non-Boolean form: to determine the top k BSI-valued rows, for ally meaningful value k between one and the total number of rows in T. Together with bit-sliced addition, this permits us to solve a common basic problem of text retrieval: given an object-relational table T of rows representing documents, with a collection type column K representing keyword terms, we demonstrate an efficient algorithm to find k documents that share the largest number of terms with some query list Q of terms. A great deal of published work on such problems exists in the Information Retrieval (IR) field. The algorithm we introduce, which we call Bit-Sliced Term-Matching, or BSTM, uses an approach comparable in performance to the most efficient known IR algorithm, a major improvement on current DBMS text searching algorithms, with the advantage that it uses only indexing we propose for native database operations.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
BBG+95
|
Hal Berenson , Phil Bernstein , Jim Gray , Jim Melton , Elizabeth O'Neil , Patrick O'Neil, A critique of ANSI SQL isolation levels, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.1-10, May 22-25, 1995, San Jose, California, United States
|
 |
BMWZ95
|
|
| |
BYTE97
|
Byte Magazine, Ann O'Leary. Managing Mission- Critical Text {with ORACLE ConText Cartridge}. http://www.byte.com/art/9709/sec4/art1.htm
|
 |
CHI98
|
|
 |
CHI99
|
|
| |
DB2SQL
|
Full Select syntax in DB2 SQL Reference Text, http://www.csa.ru/dblab/DB2/db2s0/fullslt.htm
|
| |
HARM92
|
D. K. Harmon, Ed. Proceedings of TREC for Text Retrieval Conference (Washington D.C., Nov. 1992) National Institute of Standards Special Publication 500-207. NIST, Washington, D.C.
|
| |
JAC95
|
K. Jacobs, with contributors: R. Bamford, G. Doherty, K. Haas, M. Holt, F. Putzolu, B. Quigley. Concurrency Control: Transaction Isolation and Serializability in SQL92 and Oracle7. Oracle White Paper, Part No. A33745, July, 1995.
|
 |
KZS99
|
|
| |
MANO95
|
|
| |
MURT82
|
Fionn Murtagh. A Very Fast, Exact Nearest Neighbour Algorithm for use in Information Retrieval. Information Technology: Research and Development 1982, Vol. 1, Pages 275-283.
|
| |
MURT99
|
|
 |
MZ96
|
|
| |
ON87
|
|
 |
ONQ97
|
|
| |
OO00
|
|
| |
PCW97
|
PCWEEK ONLINE, Timothy Dyck. ConText Gets faster and friendlier. {Also discusses DB2 Text Extender.} http://www8.zdnet.com/eweek/reviews/0414/14cont.html
|
| |
PW83
|
Shirley A. Perry and Peter Willet. A Review of the use of Inverted Files for Best Match Searching in Information Retrieval Systems. J. of Information Science 6 (1083) 59-66.
|
| |
SALT89
|
|
| |
VH96
|
E. Voorhees and D. Harmon. Overview of the Fifth Text REtrieval Conference (TREC-5). Proceedings of the 5th Text Retrieval Conference, Nov. 1996, NIST, http://trec.nist.gov/pubs.html
|
| |
WMB94
|
|
| |
WUB98
|
|
 |
WU99
|
|
 |
ZM98
|
|
|