Scalable ad-hoc entity extraction from text collections
Source
Proceedings of the VLDB Endowment archive
Volume 1, Issue 1 (August 2008)
Session: Text and keyword query processing
Pages 945-957
Year of Publication: 2008
ISSN: 2150-8097
Authors
Sanjay Agrawal  Microsoft Research
Kaushik Chakrabarti  Microsoft Research
Surajit Chaudhuri  Microsoft Research
Venkatesh Ganti  Microsoft Research
Bibliometrics
Downloads (6 weeks): 2, Downloads (12 months): 74, Citation count: 2
DOI: http://doi.acm.org/10.1145/1453856.1453958

ABSTRACT

Supporting entity extraction from large document collections is important for enabling a variety of data analysis tasks. In this paper, we introduce the "ad-hoc" entity extraction task, where the entities of interest are constrained to come from a list of entities specific to the task. In such scenarios, traditional entity extraction techniques, which process every document for each ad-hoc entity extraction task, can be very expensive. We propose an efficient approach that leverages the inverted index over the documents to identify the subset of documents relevant to the task, and processes only those documents. We demonstrate the efficiency of our techniques on real datasets.
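The abstract does not spell out the paper's algorithms, so the following is only a minimal sketch of the general idea it describes: use an inverted index to restrict an ad-hoc extraction task to the documents that could possibly mention an entity from the task's list, then run the extractor on those documents alone. It assumes a simple lowercase word tokenizer and uses exact contiguous token matching as a stand-in for a real extractor; all function names (`build_inverted_index`, `candidate_documents`, `match_entities`, `extract_full_scan`, `extract_with_index`) are hypothetical illustrations, not the authors' code.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical types: document ids map to raw text; the index maps tokens to doc ids.
Docs = Dict[int, str]
Index = Dict[str, Set[int]]


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase word tokens; a real system would use its own tokenizer.
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: Docs) -> Index:
    # Map each token to the set of documents that contain it.
    index: Index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def candidate_documents(entities: List[str], index: Index) -> Set[int]:
    # A document can mention an entity only if it contains every token of it,
    # so intersect that entity's posting lists, then union across all entities.
    candidates: Set[int] = set()
    for entity in entities:
        tokens = tokenize(entity)
        if tokens:
            candidates |= set.intersection(*(index.get(t, set()) for t in tokens))
    return candidates


def match_entities(text: str, entities: List[str]) -> List[str]:
    # Stand-in extractor (assumption): report entities whose tokens appear contiguously.
    tokens = tokenize(text)
    found = []
    for entity in entities:
        ent = tokenize(entity)
        n = len(ent)
        if n and any(tokens[i:i + n] == ent for i in range(len(tokens) - n + 1)):
            found.append(entity)
    return found


def extract_full_scan(docs: Docs, entities: List[str]) -> Tuple[Dict[int, List[str]], int]:
    # Baseline: run the extractor over every document in the collection.
    results: Dict[int, List[str]] = {}
    for doc_id, text in docs.items():
        matches = match_entities(text, entities)
        if matches:
            results[doc_id] = matches
    return results, len(docs)


def extract_with_index(docs: Docs, entities: List[str], index: Index) -> Tuple[Dict[int, List[str]], int]:
    # Index-based: run the extractor only on candidate documents. The filter is
    # necessary but not sufficient (tokens may be non-adjacent), so each match is verified.
    candidates = sorted(candidate_documents(entities, index))
    results: Dict[int, List[str]] = {}
    for doc_id in candidates:
        matches = match_entities(docs[doc_id], entities)
        if matches:
            results[doc_id] = matches
    return results, len(candidates)


if __name__ == "__main__":
    docs = {
        1: "Microsoft Research is in Redmond.",
        2: "The weather in Seattle was rainy.",
        3: "Surajit Chaudhuri works at Microsoft Research.",
        4: "Research on databases continues.",
        5: "Research at Microsoft is broad.",
    }
    entities = ["Microsoft Research", "Seattle", "Jim Gray"]
    index = build_inverted_index(docs)
    full, n_full = extract_full_scan(docs, entities)
    fast, n_fast = extract_with_index(docs, entities, index)
    print(full == fast, n_full, n_fast)  # True 5 4
```

In this toy run both strategies return the same matches, but the index-based one skips document 4, which contains none of the listed entities. Document 5 is still processed: it contains both "microsoft" and "research", just not adjacently, which is why the candidate set has to be verified by the extractor.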

