ABSTRACT
As humans, we have expectations for the results of any action; for example, we expect at least one student to be returned when we query a university database for student records. When these expectations are not met, traditional database users often explore datasets via a series of slightly altered SQL queries. Yet most database access is via limited interfaces that deprive end users of the ability to alter their query in any way to gain a better understanding of the dataset and result set. Users are unable to question why a particular data item is not in the result set of a given query. In this work, we develop a model for answers to WHY NOT? queries. We show through a user study the usefulness of our answers, and describe two algorithms for finding the manipulation that discarded the data item of interest. Moreover, we work through two different methods for tracing the discarded data item that can be used with either algorithm. Using our algorithms, it is feasible for users to find the manipulation that excluded the data item of interest, which can eliminate the need for exhausting debugging.