| PIXSAR: incremental reclustering of augmented XML trees |
| Full text |
Pdf
(186 KB)
|
Source
|
Workshop On Web Information And Data Management
archive
Proceeding of the 10th ACM workshop on Web information and data management
table of contents
Napa Valley, California, USA
SESSION: Data mining and clustering
table of contents
Pages 17-24
Year of Publication: 2008
ISBN:978-1-60558-260-3
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 4, Downloads (12 Months): 67, Citation Count: 0
|
|
|
ABSTRACT
XML is one of the primary encoding schemes for data and knowledge. We investigate incremental physical data clustering in systems that store XML documents using a native format. We formulate the XML clustering problem as an augmented (with sibling edges) tree partitioning problem and propose the PIXSAR (Practical Incremental XML Sibling Augmented Reclustering) algorithm for incrementally clustering XML documents. We show the fundamental importance of workload-driven dynamically rearranging storage. PIXSAR incrementally executes reclustering operations on selected subgraphs of the global augmented document tree. The subgraphs are implied by significant changes in the workload. As the workload changes, PIXSAR incrementally djusts the XML data layout so as to better fit the workload. PIXSAR's main parameters are the radius, in pages, of the augmented portion to be reclustered and the way reclustering is triggered. We briefly explore some of the effects of indexes; a full treatment of indexes is the subject of another paper. We use an experimental data clustering system that includes a fast disk simulator and File System simulator for storing native XML data. We use a novel method for 'exporting' the Saxon query processor into our setting. Experimental results indicate that using PIXSAR significantly reduces the number of page faults (counting ALL page faults incurred while querying the document as well as maintenance operations) thereby resulting in improved query performance.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Kevin Beyer , Roberta J. Cochrane , Vanja Josifovski , Jim Kleewein , George Lapis , Guy Lohman , Bob Lyle , Fatma Özcan , Hamid Pirahesh , Normen Seemann , Tuong Truong , Bert Van der Linden , Brian Vickery , Chun Zhang, System RX: one part relational, one part XML, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
[doi> 10.1145/1066157.1066197]
|
| |
2
|
R. Bordawekar and O. Shmueli. Flexible workload-aware clustering of xml documents. In XSym, 2004.
|
| |
3
|
|
| |
4
|
R. Bordawekar and O. Shmueli. An Algorithm for Partitioning Trees Augmented with Sibling Edges. Technical report, IBM Research, June 2006.
|
 |
5
|
|
| |
6
|
T. Fiebig, S. Helmer, C. Kanne, J. Mildenberger, G. Moerkotte, R. Schiele, and T. Westmann. Anatomy of a Native XML Database System. Technical Report, University of Mannheim, 2002.
|
| |
7
|
D. Florescu and D. Kossmann. Storing and querying xml data using an rdbms. IEEE Data Engineering Bulletin, 1999.
|
| |
8
|
M. S. Garey and D. S. Johnson. Computers and Intractability. W. H. Freeman and Co., 1979.
|
| |
9
|
|
| |
10
|
|
| |
11
|
D. S. Johnson and K. A. Niemi. On knapsacks, partitions, and a new dynamic programming technique for trees. Mathematics of Operations Research, 8(1), 1983.
|
| |
12
|
W. T. M. Jr., P. J. Schweitzer, and T. W. White. Problem Decomposition and Data Reorganization by a Clustering Technique. Operations Research, 1997.
|
| |
13
|
C. Kanne and G. Moerkotte. Efficient Storage of XML Data. In ICDE, 2000.
|
| |
14
|
|
| |
15
|
M. Kay. SAXON: XSLT and XQuery Engine. www.saxonica.com.
|
| |
16
|
B. W. Kernighan. Optimal sequential partitions of graphs. J. Assoc. Comp. Mach., 1971.
|
| |
17
|
B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell Sys. Tech. J., 1970.
|
| |
18
|
S. Kundu and J. Misra. A linear tree partitioning algorithm. SIAM, 1977.
|
| |
19
|
J. A. Lukes. Efficient Algorithm for the Partitioning of Trees. IBM Journal of Research and Development, 1974.
|
 |
20
|
Mark L. Mcauliffe , Michael J. Carey , Marvin H. Solomon, Vclusters: a flexible, fine-grained object clustering mechanism, Proceedings of the 13th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, p.230-243, October 18-22, 1998, Vancouver, British Columbia, Canada
|
 |
21
|
|
| |
22
|
Albrecht Schmidt , Florian Waas , Martin Kersten , Michael J. Carey , Ioana Manolescu , Ralph Busse, XMark: a benchmark for XML data management, Proceedings of the 28th international conference on Very Large Data Bases, p.974-985, August 20-23, 2002, Hong Kong, China
|
| |
23
|
|
 |
24
|
|
| |
25
|
W3C. XML: Extensible Markup Language. http://www.w3.org/XML/.
|
| |
26
|
|
|