ABSTRACT
In information extraction, uncertainty is ubiquitous. For this reason, it is useful to provide users querying extracted data with explanations for the answers they receive. Providing the provenance for tuples in a query result partially addresses this problem, in that provenance can explain why a tuple is in the result of a query. However, in some cases explaining why a tuple is not in the result may be just as helpful. In this work we focus on providing provenance-style explanations for non-answers and develop a mechanism for providing this new type of provenance. Our experience with an information extraction prototype suggests that our approach can provide effective provenance information that can help a user resolve their doubts over non-answers to a query.