|
ABSTRACT
Supporting entity extraction from large document collections is important for enabling a variety of important data analysis tasks. In this paper, we introduce the "ad-hoc" entity extraction task where entities of interest are constrained to be from a list of entities that is specific to the task. In such scenarios, traditional entity extraction techniques that process all the documents for each ad-hoc entity extraction task can be significantly expensive. In this paper, we propose an efficient approach that leverages the inverted index on the documents to identify the subset of documents relevant to the task and processes only those documents. We demonstrate the efficiency of our techniques on real datasets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
E. Agichtein and L. Gravano. Querying text databases for efficient information extraction. In Proceedings of ICDE Conference, 2003.
|
| |
2
|
A. Agresti. An introduction, to categorical data analysis. Wiley, 2007.
|
 |
3
|
|
| |
4
|
D. E. Appelt and D. Israel. Introduction to Information Extraction Technology. IJCAI-99 Tutorial, 1999.
|
| |
5
|
|
 |
6
|
|
 |
7
|
|
| |
8
|
|
 |
9
|
|
 |
10
|
|
 |
11
|
Oren Etzioni , Michael Cafarella , Doug Downey , Stanley Kok , Ana-Maria Popescu , Tal Shaked , Stephen Soderland , Daniel S. Weld , Alexander Yates, Web-scale information extraction in knowitall: (preliminary results), Proceedings of the 13th international conference on World Wide Web, May 17-20, 2004, New York, NY, USA
[doi> 10.1145/988672.988687]
|
| |
12
|
|
 |
13
|
Panagiotis G. Ipeirotis , Eugene Agichtein , Pranay Jain , Luis Gravano, To search or to crawl?: towards a query optimizer for text-centric tasks, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
[doi> 10.1145/1142473.1142504]
|
| |
14
|
H. Jerry, R. Douglas, E. Appelt, J. Bear, D. Israel, M. Kameyama, M. Stickel, and M. Tyson. Fastus: A cascaded finite-state transducer for extracting information from natural-language text, 1996.
|
| |
15
|
|
 |
16
|
|
 |
17
|
|
| |
18
|
|
CITED BY 2
|
|
Wei Wang , Chuan Xiao , Xuemin Lin , Chengqi Zhang, Efficient approximate entity extraction with edit distance constraints, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|
|
|
|