ACM Home Page
Please provide us with feedback. Feedback
PIXSAR: incremental reclustering of augmented XML trees
Full text PdfPdf (186 KB)
Source
Workshop On Web Information And Data Management archive
Proceeding of the 10th ACM workshop on Web information and data management table of contents
Napa Valley, California, USA
SESSION: Data mining and clustering table of contents
Pages 17-24  
Year of Publication: 2008
ISBN:978-1-60558-260-3
Authors
Lila Shnaiderman  Technion, Haifa, Israel
Oded Shmueli  Technion, Haifa, Israel
Rajesh Bordawekar  IBM T. J. Watson Research Center, Hawthorne, NY, USA
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 67,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1458502.1458506
What is a DOI?

ABSTRACT

XML is one of the primary encoding schemes for data and knowledge. We investigate incremental physical data clustering in systems that store XML documents using a native format. We formulate the XML clustering problem as an augmented (with sibling edges) tree partitioning problem and propose the PIXSAR (Practical Incremental XML Sibling Augmented Reclustering) algorithm for incrementally clustering XML documents. We show the fundamental importance of workload-driven dynamically rearranging storage.

PIXSAR incrementally executes reclustering operations on selected subgraphs of the global augmented document tree. The subgraphs are implied by significant changes in the workload. As the workload changes, PIXSAR incrementally djusts the XML data layout so as to better fit the workload. PIXSAR's main parameters are the radius, in pages, of the augmented portion to be reclustered and the way reclustering is triggered. We briefly explore some of the effects of indexes; a full treatment of indexes is the subject of another paper.

We use an experimental data clustering system that includes a fast disk simulator and File System simulator for storing native XML data. We use a novel method for 'exporting' the Saxon query processor into our setting. Experimental results indicate that using PIXSAR significantly reduces the number of page faults (counting ALL page faults incurred while querying the document as well as maintenance operations) thereby resulting in improved query performance.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
R. Bordawekar and O. Shmueli. Flexible workload-aware clustering of xml documents. In XSym, 2004.
 
3
 
4
R. Bordawekar and O. Shmueli. An Algorithm for Partitioning Trees Augmented with Sibling Edges. Technical report, IBM Research, June 2006.
5
 
6
T. Fiebig, S. Helmer, C. Kanne, J. Mildenberger, G. Moerkotte, R. Schiele, and T. Westmann. Anatomy of a Native XML Database System. Technical Report, University of Mannheim, 2002.
 
7
D. Florescu and D. Kossmann. Storing and querying xml data using an rdbms. IEEE Data Engineering Bulletin, 1999.
 
8
M. S. Garey and D. S. Johnson. Computers and Intractability. W. H. Freeman and Co., 1979.
 
9
 
10
 
11
D. S. Johnson and K. A. Niemi. On knapsacks, partitions, and a new dynamic programming technique for trees. Mathematics of Operations Research, 8(1), 1983.
 
12
W. T. M. Jr., P. J. Schweitzer, and T. W. White. Problem Decomposition and Data Reorganization by a Clustering Technique. Operations Research, 1997.
 
13
C. Kanne and G. Moerkotte. Efficient Storage of XML Data. In ICDE, 2000.
 
14
 
15
M. Kay. SAXON: XSLT and XQuery Engine. www.saxonica.com.
 
16
B. W. Kernighan. Optimal sequential partitions of graphs. J. Assoc. Comp. Mach., 1971.
 
17
B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell Sys. Tech. J., 1970.
 
18
S. Kundu and J. Misra. A linear tree partitioning algorithm. SIAM, 1977.
 
19
J. A. Lukes. Efficient Algorithm for the Partitioning of Trees. IBM Journal of Research and Development, 1974.
20
21
 
22
 
23
24
 
25
W3C. XML: Extensible Markup Language. http://www.w3.org/XML/.
 
26

Collaborative Colleagues:
Lila Shnaiderman: colleagues
Oded Shmueli: colleagues
Rajesh Bordawekar: colleagues