|
ABSTRACT
In many data sharing settings, such as within the biological and biomedical communities, global data consistency is not always attainable: different sites' data may be dirty, uncertain, or even controversial. Collaborators are willing to share their data, and in many cases they also want to selectively import data from others --- but must occasionally diverge when they disagree about uncertain or controversial facts or values. For this reason, traditional data sharing and data integration approaches are not applicable, since they require a globally consistent data instance. Additionally, many of these approaches do not allow participants to make updates; if they do, concurrency control algorithms or inconsistency repair techniques must be used to ensure a consistent view of the data for all users.In this paper, we develop and present a fully decentralized model of collaborative data sharing, in which participants publish their data on an ad hoc basis and simultaneously reconcile updates with those published by others. Individual updates are associated with provenance information, and each participant accepts only updates with a sufficient authority ranking, meaning that each participant may have a different (though conceptually overlapping) data instance. We define a consistency semantics for database instances under this model of disagreement, present algorithms that perform reconciliation for distributed clusters of participants, and demonstrate their ability to handle typical update and conflict loads in settings involving the sharing of curated data.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Marcelo Arenas , Leopoldo Bertossi , Jan Chomicki, Consistent query answers in inconsistent databases, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.68-79, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
[doi> 10.1145/303976.303983]
|
| |
2
|
A. Bairoch and R. Apweiler. The SWISS-PROT protein sequence database and its supplement TrEMBL. Nucleic Acids Research, 28:45--48, 2000.
|
| |
3
|
|
 |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
Concurrent versions system. Available from www.cvshome.org.
|
 |
8
|
Alan Demers , Dan Greene , Carl Hauser , Wes Irish , John Larson , Scott Shenker , Howard Sturgis , Dan Swinehart , Doug Terry, Epidemic algorithms for replicated database maintenance, Proceedings of the sixth annual ACM Symposium on Principles of distributed computing, p.1-12, August 10-12, 1987, Vancouver, British Columbia, Canada
[doi> 10.1145/41840.41841]
|
 |
9
|
W. Keith Edwards , Elizabeth D. Mynatt , Karin Petersen , Mike J. Spreitzer , Douglas B. Terry , Marvin M. Theimer, Designing and implementing asynchronous collaborative applications with Bayou, Proceedings of the 10th annual ACM symposium on User interface software and technology, p.119-128, October 14-17, 1997, Banff, Alberta, Canada
[doi> 10.1145/263407.263530]
|
| |
10
|
J. N. Foster, M. B. Greenwald, J. T. Moore, B. C. Pierce, and A. Schmitt. Combinators for bi-directional tree transformations: A linguistic approach to the view update problem. Technical Report MS-CIS-004-15, University of Pennsylvania, July 2004.
|
| |
11
|
Hector Garcia-Molina , Yannis Papakonstantinou , Dallan Quass , Anand Rajaraman , Yehoshua Sagiv , Jeffrey Ullman , Vasilis Vassalos , Jennifer Widom, The TSIMMIS Approach to Mediation: Data Models and Languages, Journal of Intelligent Information Systems, v.8 n.2, p.117-132, March/April 1997
[doi> 10.1023/A:1008683107812]
|
 |
12
|
|
| |
13
|
A. Y. Halevy, Z. G. Ives, D. Suciu, and I. Tatarinov. Schema mediation in peer data management systems. In ICDE, pages 505--516, March 2003.
|
| |
14
|
Z. Ives, N. Khandelwal, A. Kapur, and M. Cakir. Orchestra: Rapid, collaborative sharing of dynamic data. In CIDR, pages 107--118, January 2005.
|
 |
15
|
|
 |
16
|
Anne-Marie Kermarrec , Antony Rowstron , Marc Shapiro , Peter Druschel, The IceCube approach to the reconciliation of divergent replicas, Proceedings of the twentieth annual ACM symposium on Principles of distributed computing, p.210-218, August 2001, Newport, Rhode Island, United States
[doi> 10.1145/383962.384020]
|
 |
17
|
|
 |
18
|
|
| |
19
|
D. Lembo, M. Lenzerini, and R. Rosati. Source inconsistency and incompleteness in data integration. In KRDB '02, April 2002.
|
| |
20
|
|
 |
21
|
|
| |
22
|
P. Mork, R. Shaker, A. Halevy, and P. Tarczy-Hornoch. PQL: A declarative query language over dynamic biological schemata. In American Medical Informatics Association (AMIA) Symposium, 2002, November 2002.
|
 |
23
|
|
| |
24
|
National Center for Biotechnology Information. GenBank. Available from www.ncbi.nlm.nih.gov/GenBank/.
|
| |
25
|
D. S. Parker, Jr., G. J. Popek, G. Rudisin, A. Stoughton, B. J. Walker, E. Walton, J. M. Chow, D. A. Edwards, S. Kiser, and C. S. Kline. Detection of mutual inconsistency in distributed systems. IEEE Trans. Software Eng., 9(3):240--247, 1983.
|
| |
26
|
B. C. Pierce, T. Jim, and J. Vouillon. Unison: A portable, cross-platform file synchronizer, 1999--2001.
|
 |
27
|
Sylvia Ratnasamy , Paul Francis , Mark Handley , Richard Karp , Scott Schenker, A scalable content-addressable network, Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, p.161-172, August 2001, San Diego, California, United States
|
| |
28
|
|
| |
29
|
|
| |
30
|
Mahadev Satyanarayanan , James J. Kistler , Puneet Kumar , Maria E. Okasaki , Ellen H. Siegel , David C. Steere, Coda: A Highly Available File System for a Distributed Workstation Environment, IEEE Transactions on Computers, v.39 n.4, p.447-459, April 1990
[doi> 10.1109/12.54838]
|
 |
31
|
Ion Stoica , Robert Morris , David Karger , M. Frans Kaashoek , Hari Balakrishnan, Chord: A scalable peer-to-peer lookup service for internet applications, Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, p.149-160, August 2001, San Diego, California, United States
|
| |
32
|
J. Widom. Trio: A system for integrated management of data, accuracy, and lineage. In CIDR, 2005.
|
CITED BY 11
|
|
Alon Halevy , Michael Franklin , David Maier, Principles of dataspace systems, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.1-9, June 26-28, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Zachary G. Ives , Todd J. Green , Grigoris Karvounarakis , Nicholas E. Taylor , Val Tannen , Partha Pratim Talukdar , Marie Jacob , Fernando Pereira, The ORCHESTRA Collaborative Data Sharing System, ACM SIGMOD Record, v.37 n.3, September 2008
|
|