Toward best-effort information extraction

Source
International Conference on Management of Data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Vancouver, Canada
SESSION: Research Session 21: Provenance, Integration and Extraction
Pages 1031-1042
Year of Publication: 2008
ISBN: 978-1-60558-102-6

Authors
Warren Shen, University of Wisconsin, Madison, WI, USA
Pedro DeRose, University of Wisconsin, Madison, WI, USA
Robert McCann, Microsoft, Redmond, WA, USA
AnHai Doan, University of Wisconsin, Madison, WI, USA
Raghu Ramakrishnan, Yahoo! Research, Santa Clara, CA, USA

ABSTRACT
Current approaches to developing information extraction (IE) programs have largely focused on producing precise IE results. As such, they suffer from three major limitations. First, it is often difficult to execute partially specified IE programs and obtain meaningful results, thereby producing a long "debug loop". Second, it often takes a long time before we can obtain the first meaningful result (by finishing and running a precise IE program), thereby rendering these approaches impractical for time-sensitive IE applications. Finally, by trying to write precise IE programs we may also waste a significant amount of effort, because an approximate result -- one that can be produced quickly -- may already be satisfactory in many IE settings. To address these limitations, we propose iFlex, an IE approach that relaxes the precise IE requirement to enable best-effort IE. In iFlex, a developer U uses a declarative language to quickly write an initial approximate IE program P with a possible-worlds semantics. Then iFlex evaluates P using an approximate query processor to quickly extract an approximate result. Next, U examines the result, and further refines P if necessary, to obtain increasingly precise results. To refine P, U can enlist a next-effort assistant, which suggests refinements based on the data and the current version of P. Extensive experiments on real-world domains demonstrate the utility of the iFlex approach.
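
The write, evaluate, refine loop described in the abstract can be illustrated with a small sketch. This is not iFlex's declarative language or its query processor, neither of which is shown in this record. It is a hypothetical Python illustration that assumes an approximate program is a list of constraints, and that an approximate result is the set of candidate values that survive them (informally, one candidate per possible world). All names (`extract_candidates`, `is_numeric`, `has_dollar_sign`) and the sample documents are invented for illustration.

```python
import re
from typing import Callable, Dict, List, Set

# A constraint is any predicate over a candidate token (hypothetical type).
Constraint = Callable[[str], bool]


def extract_candidates(doc: str, constraints: List[Constraint]) -> Set[str]:
    """Return every whitespace-delimited token that satisfies all constraints.

    The returned set is the approximate result: the correct value is assumed
    to be one of the candidates, but we do not yet know which one.
    """
    tokens = re.findall(r"\S+", doc)
    return {t for t in tokens if all(c(t) for c in constraints)}


def is_numeric(token: str) -> bool:
    """Loose first guess: a price is a number, optionally prefixed by '$'."""
    return re.fullmatch(r"\$?\d+(\.\d+)?", token) is not None


def has_dollar_sign(token: str) -> bool:
    """Refinement: a price starts with '$'."""
    return token.startswith("$")


# Invented sample documents, not data from the paper.
docs: Dict[str, str] = {
    "ad1": "Condo 3 bed 2 bath asking $250000 call 5551234",
    "ad2": "House built 1995 price $410000",
}

# Step 1: a quickly written, approximate program P.
program: List[Constraint] = [is_numeric]
approx = {name: extract_candidates(text, program) for name, text in docs.items()}

# Step 2: after inspecting the result, the developer refines P.
program.append(has_dollar_sign)
refined = {name: extract_candidates(text, program) for name, text in docs.items()}

for name in docs:
    print(name, sorted(approx[name]), "->", sorted(refined[name]))
```

In this toy setting the first pass returns several candidates per document, and one added constraint narrows each set to a single value. Which constraint to add next is the kind of suggestion the abstract attributes to the next-effort assistant; here it is chosen by hand.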

CITED BY 3
AnHai Doan, Jeffrey F. Naughton, Raghu Ramakrishnan, Akanksha Baid, Xiaoyong Chai, Fei Chen, Ting Chen, Eric Chu, Pedro DeRose, Byron Gao, Chaitanya Gokhale, Jiansheng Huang, Warren Shen, Ba-Quy Vuong, Information extraction challenges in managing unstructured data, ACM SIGMOD Record, v.37 n.4, December 2008
Xiaoyong Chai, Ba-Quy Vuong, AnHai Doan, Jeffrey F. Naughton, Efficiently incorporating user feedback into information extraction and integration programs, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA