Toward best-effort information extraction
Source
International Conference on Management of Data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Vancouver, Canada
SESSION: Research Session 21: Provenance, Integration and Extraction
Pages 1031-1042  
Year of Publication: 2008
ISBN: 978-1-60558-102-6
Authors
Warren Shen, University of Wisconsin, Madison, WI, USA
Pedro DeRose, University of Wisconsin, Madison, WI, USA
Robert McCann, Microsoft, Redmond, WA, USA
AnHai Doan, University of Wisconsin, Madison, WI, USA
Raghu Ramakrishnan, Yahoo! Research, Santa Clara, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 39,   Downloads (12 Months): 315,   Citation Count: 3
DOI: http://doi.acm.org/10.1145/1376616.1376718

ABSTRACT

Current approaches to developing information extraction (IE) programs have largely focused on producing precise IE results. As such, they suffer from three major limitations. First, it is often difficult to execute partially specified IE programs and obtain meaningful results, which produces a long "debug loop". Second, it often takes a long time to obtain the first meaningful result (by finishing and running a precise IE program), which makes these approaches impractical for time-sensitive IE applications. Finally, by trying to write precise IE programs we may also waste a significant amount of effort, because an approximate result -- one that can be produced quickly -- may already be satisfactory in many IE settings.

To address these limitations, we propose iFlex, an IE approach that relaxes the precise IE requirement to enable best-effort IE. In iFlex, a developer U uses a declarative language to quickly write an initial approximate IE program P with a possible-worlds semantics. Then iFlex evaluates P using an approximate query processor to quickly extract an approximate result. Next, U examines the result and, if necessary, refines P to obtain increasingly precise results. To refine P, U can enlist a next-effort assistant, which suggests refinements based on the data and the current version of P. Extensive experiments on real-world domains demonstrate the utility of the iFlex approach.
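
The write, evaluate, examine, refine loop described above can be illustrated with a small Python sketch. This is not iFlex's declarative language or its query processor; the sample documents, the `candidates`, `run`, and `suggest` functions, and both constraints are hypothetical names invented for illustration. An approximate program is modeled as a list of constraints, its result as a set of candidate values per document (a stand-in for possible worlds), and the assistant as a simple heuristic that picks the constraint that narrows the candidates the most.

```python
import re
from typing import Callable, Dict, List, Optional

# A constraint is a hypothetical predicate over (document text, regex match).
Constraint = Callable[[str, re.Match], bool]

# Hypothetical sample data, not taken from the paper.
docs: Dict[str, str] = {
    "d1": "3 bed house, built 1995, asking $250000",
    "d2": "Unit 12, 2 bath, price $189500.50, 1400 sqft",
}

def candidates(doc: str, program: List[Constraint]) -> List[str]:
    """Return every numeric token in doc that satisfies all constraints.

    The list stands in for a set of possible values of the target field.
    """
    return [m.group() for m in re.finditer(r"\d+(?:\.\d+)?", doc)
            if all(c(doc, m) for c in program)]

def run(program: List[Constraint]) -> Dict[str, List[str]]:
    """Evaluate the approximate program over all sample documents."""
    return {name: candidates(text, program) for name, text in docs.items()}

def preceded_by_dollar(doc: str, m: re.Match) -> bool:
    return m.start() > 0 and doc[m.start() - 1] == "$"

def has_decimals(doc: str, m: re.Match) -> bool:
    return "." in m.group()

def suggest(program: List[Constraint],
            options: Dict[str, Constraint]) -> Optional[str]:
    """Name the option that shrinks the candidate sets the most
    without leaving any document with no candidate at all."""
    best_name = None
    best_size = sum(len(v) for v in run(program).values())
    for name, constraint in options.items():
        result = run(program + [constraint])
        if any(not v for v in result.values()):
            continue  # this refinement would discard a document's answer
        size = sum(len(v) for v in result.values())
        if size < best_size:
            best_name, best_size = name, size
    return best_name

# Step 1: an initial, deliberately loose program: "the price is some number".
program: List[Constraint] = []
first = run(program)

# Step 2: ask the assistant which refinement to apply next, then apply it.
options = {"preceded_by_dollar": preceded_by_dollar,
           "has_decimals": has_decimals}
choice = suggest(program, options)
program.append(options[choice])
refined = run(program)
```

In this sketch the first run returns several candidate numbers per document, and the suggested refinement narrows each document to a single candidate. The point is only that a loose program is runnable from the start and each refinement tightens the result.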


