ABSTRACT
A critical reality in data integration is that knowledge from different sources often conflicts. Conflict resolution can be costly and, if done without proper context, ineffective. In this paper, we propose a novel query-driven and feedback-based approach (FICSR) to conflict resolution when integrating data sources. In particular, instead of relying on the traditional model-based definition of consistency, we introduce a ranked interpretation. This not only enables FICSR to deal with the complexity of the conflict resolution process, but also helps achieve a more direct match between the users' (subjective) interpretation of the data and the system's (objective) treatment of the available alternatives. Consequently, the ranked interpretation leads to new opportunities for a bi-directional (data ↔ user) feedback cycle for conflict resolution: given a query, (a) a preliminary ranking of candidate results on the data can inform the user about the constraints critical to the query, while (b) user feedback on the ranks can be exploited to inform the system about the user's relevant domain knowledge. To enable this feedback process, we develop data structures and algorithms for efficient off-line conflict/agreement analysis of the integrated data, as well as for on-line query processing, candidate result enumeration, and validity analysis. The results are brought together and evaluated in the FICSR system.
CITED BY 5
Yan Qi, K. Selçuk Candan, Junichi Tatemura, Songting Chen, Fenglin Liao, Supporting OLAP operations over imperfectly integrated taxonomies, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada