ACM Home Page
Please provide us with feedback. Feedback
An efficient filter for approximate membership checking
Full text PdfPdf (387 KB)
Source
International Conference on Management of Data archive
Proceedings of the 2008 ACM SIGMOD international conference on Management of data table of contents
Vancouver, Canada
SESSION: Research Session 17: Probabilistic II table of contents
Pages 805-818  
Year of Publication: 2008
ISBN:978-1-60558-102-6
Authors
Kaushik Chakrabarti  Microsoft Research, Redmond, WA, USA
Surajit Chaudhuri  Microsoft Research, Redmond, WA, USA
Venkatesh Ganti  Microsoft Research, Redmond, WA, USA
Dong Xin  Microsoft Research, Redmond, WA, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 34,   Downloads (12 Months): 301,   Citation Count: 6
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1376616.1376697
What is a DOI?

ABSTRACT

We consider the problem of identifying sub-strings of input text strings that approximately match with some member of a potentially large dictionary. This problem arises in several important applications such as extracting named entities from text documents and identifying biological concepts from biomedical literature. In this paper, we develop a filter-verification framework, and propose a novel in-memory filter structure. That is, we first quickly filter out sub-strings that cannot match with any dictionary member, and then verify the remaining sub-strings against the dictionary. Our method does not produce false negatives. We demonstrate the efficiency and effectiveness of our filter over real datasets, and show that it significantly outperforms the previous best-known methods in terms of both filtering power and computation time.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
4
 
5
6
 
7
 
8
 
9
10
 
11
 
12
 
13
T. H. Haveliwala, A. Gionis, and P. Indyk. Scalable techniques for clustering the web. In WebDB (Informal Proceedings), pages 129--134, 2000.
 
14
 
15
M. Narayanan and R. M. Karp. Gapped local similarity search with provable guarantees. In WABI, pages 74--86, 2004.
16
 
17
A. Singhal. Modern information retrieval: A brief overview. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 24(4):35--43, 2001.
 
18
 
19
A. C.-C. Yao and F. F. Yao. Dictionary loop-up with small errors. In CPM, pages 387--394, 1995.
 
20
X. Zhou, X. Zhang, and X. Hu. Maxmatcher: Biological concept extraction using approximate dictionary lookup. In PRICAI, pages 1145--1149, 2006.


Collaborative Colleagues:
Kaushik Chakrabarti: colleagues
Surajit Chaudhuri: colleagues
Venkatesh Ganti: colleagues
Dong Xin: colleagues