ACM Home Page
Please provide us with feedback. Feedback
Bit-sliced index arithmetic
Full text PdfPdf (183 KB)
Source ACM SIGMOD Record archive
Volume 30 ,  Issue 2  (June 2001) table of contents
Pages: 47 - 57  
Year of Publication: 2001
ISSN:0163-5808
Also published in ...
Authors
Denis Rinfret  UMass/Boston, Dept. of CS, UMass/Boston, Boston, MA
Patrick O'Neil  UMass/Boston & Microsoft Research, Dept. of CS, UMass/Boston, Boston, MA
Elizabeth O'Neil
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 54,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/376284.375669
What is a DOI?

ABSTRACT

The bit-sliced index (BSI) was originally defined in [ONQ97]. The current paper introduces the concept of BSI arithmetic. For any two BSI's X and Y on a table T, we show how to efficiently generate new BSI's Z, V, and W, such that Z = X + Y, V = X - Y, and W = MIN(X, Y); this means that if a row r in T has a value x represented in BSI X and a value y in BSI Y, the value for r in BSI Z will be x + y, the value in V will be x - y and the value in W will be MIN(x, y). Since a bitmap representing a set of rows is the simplest bit-sliced index, BSI arithmetic is the most straightforward way to determine multisets of rows (with duplicates) resulting from the SQL clauses UNION ALL (addition), EXCEPT ALL (subtraction), and INTERSECT ALL (min) (see [OO00, DB2SQL] for definitions of these clauses). Another contribution of the current paper is to generalize BSI range restrictions from [ONQ97] to a new non-Boolean form: to determine the top k BSI-valued rows, for ally meaningful value k between one and the total number of rows in T. Together with bit-sliced addition, this permits us to solve a common basic problem of text retrieval: given an object-relational table T of rows representing documents, with a collection type column K representing keyword terms, we demonstrate an efficient algorithm to find k documents that share the largest number of terms with some query list Q of terms. A great deal of published work on such problems exists in the Information Retrieval (IR) field. The algorithm we introduce, which we call Bit-Sliced Term-Matching, or BSTM, uses an approach comparable in performance to the most efficient known IR algorithm, a major improvement on current DBMS text searching algorithms, with the advantage that it uses only indexing we propose for native database operations.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

BBG+95
BMWZ95
 
BYTE97
Byte Magazine, Ann O'Leary. Managing Mission- Critical Text {with ORACLE ConText Cartridge}. http://www.byte.com/art/9709/sec4/art1.htm
CHI98
CHI99
 
DB2SQL
Full Select syntax in DB2 SQL Reference Text, http://www.csa.ru/dblab/DB2/db2s0/fullslt.htm
 
HARM92
D. K. Harmon, Ed. Proceedings of TREC for Text Retrieval Conference (Washington D.C., Nov. 1992) National Institute of Standards Special Publication 500-207. NIST, Washington, D.C.
 
JAC95
K. Jacobs, with contributors: R. Bamford, G. Doherty, K. Haas, M. Holt, F. Putzolu, B. Quigley. Concurrency Control: Transaction Isolation and Serializability in SQL92 and Oracle7. Oracle White Paper, Part No. A33745, July, 1995.
KZS99
 
MANO95
 
MURT82
Fionn Murtagh. A Very Fast, Exact Nearest Neighbour Algorithm for use in Information Retrieval. Information Technology: Research and Development 1982, Vol. 1, Pages 275-283.
 
MURT99
MZ96
 
ON87
ONQ97
 
OO00
 
PCW97
PCWEEK ONLINE, Timothy Dyck. ConText Gets faster and friendlier. {Also discusses DB2 Text Extender.} http://www8.zdnet.com/eweek/reviews/0414/14cont.html
 
PW83
Shirley A. Perry and Peter Willet. A Review of the use of Inverted Files for Best Match Searching in Information Retrieval Systems. J. of Information Science 6 (1083) 59-66.
 
SALT89
 
VH96
E. Voorhees and D. Harmon. Overview of the Fifth Text REtrieval Conference (TREC-5). Proceedings of the 5th Text Retrieval Conference, Nov. 1996, NIST, http://trec.nist.gov/pubs.html
 
WMB94
 
WUB98
WU99
ZM98


Collaborative Colleagues:
Denis Rinfret: colleagues
Patrick O'Neil: colleagues
Elizabeth O'Neil: colleagues