ACM Home Page
Please provide us with feedback. Feedback
Reconciling while tolerating disagreement in collaborative data sharing
Full text PdfPdf (401 KB)
Source International Conference on Management of Data archive
Proceedings of the 2006 ACM SIGMOD international conference on Management of data table of contents
Chicago, IL, USA
SESSION: P2P systems & data streams table of contents
Pages: 13 - 24  
Year of Publication: 2006
ISBN:1-59593-434-0
Authors
Nicholas E. Taylor  University of Pennsylvania
Zachary G. Ives  University of Pennsylvania
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 17,   Downloads (12 Months): 102,   Citation Count: 11
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1142473.1142476
What is a DOI?

ABSTRACT

In many data sharing settings, such as within the biological and biomedical communities, global data consistency is not always attainable: different sites' data may be dirty, uncertain, or even controversial. Collaborators are willing to share their data, and in many cases they also want to selectively import data from others --- but must occasionally diverge when they disagree about uncertain or controversial facts or values. For this reason, traditional data sharing and data integration approaches are not applicable, since they require a globally consistent data instance. Additionally, many of these approaches do not allow participants to make updates; if they do, concurrency control algorithms or inconsistency repair techniques must be used to ensure a consistent view of the data for all users.In this paper, we develop and present a fully decentralized model of collaborative data sharing, in which participants publish their data on an ad hoc basis and simultaneously reconcile updates with those published by others. Individual updates are associated with provenance information, and each participant accepts only updates with a sufficient authority ranking, meaning that each participant may have a different (though conceptually overlapping) data instance. We define a consistency semantics for database instances under this model of disagreement, present algorithms that perform reconciliation for distributed clusters of participants, and demonstrate their ability to handle typical update and conflict loads in settings involving the sharing of curated data.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
A. Bairoch and R. Apweiler. The SWISS-PROT protein sequence database and its supplement TrEMBL. Nucleic Acids Research, 28:45--48, 2000.
 
3
4
 
5
 
6
 
7
Concurrent versions system. Available from www.cvshome.org.
8
9
 
10
J. N. Foster, M. B. Greenwald, J. T. Moore, B. C. Pierce, and A. Schmitt. Combinators for bi-directional tree transformations: A linguistic approach to the view update problem. Technical Report MS-CIS-004-15, University of Pennsylvania, July 2004.
 
11
12
 
13
A. Y. Halevy, Z. G. Ives, D. Suciu, and I. Tatarinov. Schema mediation in peer data management systems. In ICDE, pages 505--516, March 2003.
 
14
Z. Ives, N. Khandelwal, A. Kapur, and M. Cakir. Orchestra: Rapid, collaborative sharing of dynamic data. In CIDR, pages 107--118, January 2005.
15
16
17
18
 
19
D. Lembo, M. Lenzerini, and R. Rosati. Source inconsistency and incompleteness in data integration. In KRDB '02, April 2002.
 
20
21
 
22
P. Mork, R. Shaker, A. Halevy, and P. Tarczy-Hornoch. PQL: A declarative query language over dynamic biological schemata. In American Medical Informatics Association (AMIA) Symposium, 2002, November 2002.
23
 
24
National Center for Biotechnology Information. GenBank. Available from www.ncbi.nlm.nih.gov/GenBank/.
 
25
D. S. Parker, Jr., G. J. Popek, G. Rudisin, A. Stoughton, B. J. Walker, E. Walton, J. M. Chow, D. A. Edwards, S. Kiser, and C. S. Kline. Detection of mutual inconsistency in distributed systems. IEEE Trans. Software Eng., 9(3):240--247, 1983.
 
26
B. C. Pierce, T. Jim, and J. Vouillon. Unison: A portable, cross-platform file synchronizer, 1999--2001.
27
 
28
 
29
 
30
31
 
32
J. Widom. Trio: A system for integrated management of data, accuracy, and lineage. In CIDR, 2005.

CITED BY  11

Collaborative Colleagues:
Nicholas E. Taylor: colleagues
Zachary G. Ives: colleagues