|
ABSTRACT
Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This problem is much more challenging than the corresponding one for relational or flat-file data. In order to describe changes better, we base our work not just on the traditional “atomic” insert, delete, update operations, but also on operations that move an entire sub-tree of nodes, and that copy an entire sub-tree. These operations allows us to describe changes in a semantically more meaningful way. Since this change detection problem is NP-hard, in this paper we present a heuristic change detection algorithm that yields close to “minimal” descriptions of the changes, and that has fewer restrictions than previous algorithms. Our algorithm is based on transforming the change detection problem to a problem of computing a minimum-cost edge cover of a bipartite graph. We study the quality of the solution produced by our algorithm, as well as the running time, both analytically and experimentally.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
CGM97
|
S. Chawathe and H. Garcia-Molina. Meaningful change detection in structured data. Available at URL http://w~ra-db, stanford, edu, 1997. Extended version.
|
| |
CGMH+94
|
S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom. The Tsimmis project: Integration of heterogeneous information sources. In Proceedings of lOOth Anniversary Meeting of the information Processing Society of Japan, pages 7-18, Tokyo, Japan, October 1994.
|
 |
CRGMW96
|
Sudarshan S. Chawathe , Anand Rajaraman , Hector Garcia-Molina , Jennifer Widom, Change detection in hierarchically structured information, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.493-504, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
HHS+
|
M. Haertel, D. Hayes, R. Stallman, L. Tower, P. Eggert., and W. Davison. The GNU diff program. Texinfo system documentation. Available by anonymous FTP from prep. ai .mit. edu.
|
| |
Law76
|
E. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart and Winston, 1976.
|
| |
LGM96
|
|
| |
Mye86
|
E. Myers. An O(UO) difference algorithm and its variations. Algorithmica, 1(2):251-266, 1986.
|
| |
PS82
|
|
| |
Rot
|
E. Rothberg. The wmatch program for finding a maximum-weight matching for undirected graphs. Live OR collection. Available at URL http ://w~. orsoc, org. uk.
|
| |
SWZS94
|
D. Shasha, J. Wang, K. Zhang, and F. Shih. Exact and approximate algorithms for unordered tree matching. IEEE Transactions on Systems, Man, and Cybernetics, 24(4):668-678, April 1994.
|
| |
SZ90
|
|
 |
Wag75
|
|
 |
WF74
|
|
| |
WMG90
|
|
| |
WU95
|
J. Widom and J. Ullman. The C3 project: Changes, consistency, and configurations in heterogeneous distributed information systems. Unpublished manuscript; available at URL http://www-db, stan~ord, edu, 1995.
|
| |
ZS89
|
|
| |
ZWS95
|
K. Zhang, J. Wang, and D. Shasha. On the editing distance between undirected acyclic graphs. International Journal of Foundations of Computer Science, 1995.
|
CITED BY 53
|
|
Ling Liu , Calton Pu , Wei Tang, WebCQ-detecting and delivering information changes on the web, Proceedings of the ninth international conference on Information and knowledge management, p.512-519, November 06-11, 2000, McLean, Virginia, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Pascal Molli , Gérald Oster , Hala Skaf-Molli , Abdessamad Imine, Using the transformational approach to build a safe and generic data synchronizer, Proceedings of the 2003 international ACM SIGGROUP conference on Supporting group work, November 09-12, 2003, Sanibel Island, Florida, USA
|
|
|
|
|
|
|
|
|
S. B. Davidson , J. Crabtree , B. P. Brunk , J. Schug , V. Tannen , G. C. Overton , C. J. Stoeckert, Jr., K2/Kleisli and GUS: experiments in integrated access to genomic data sources, IBM Systems Journal, v.40 n.2, p.512-531, February 2001
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Raihan Al-Ekram , Archana Adma , Olga Baysal, diffX: an algorithm to detect changes in multi-version XML documents, Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative research, p.1-11, October 17-20, 2005, Toranto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
Sudipto Guha , H. V. Jagadish , Nick Koudas , Divesh Srivastava , Ting Yu, Approximate XML joins, Proceedings of the 2002 ACM SIGMOD international conference on Management of data, June 03-06, 2002, Madison, Wisconsin
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Alex Verstak , Naren Ramakrishnan , Layne T. Watson , Jian He , Clifford A. Shaffer , Kyung Kyoon Bae , Jing Jiang , William H. Tranter , Theodore S. Rappaport, BSML: A binding schema markup language for data interchange in problem solving environments, Scientific Programming, v.11 n.3, p.199-224, August 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Tancred Lindholm , Jaakko Kangasharju, How to edit gigabyte XML files on a mobile phone with XAS, RefTrees, and RAXS, Proceedings of the 5th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking, and Services, July 21-25, 2008, Dublin, Ireland
|
|