XML data partitioning strategies to improve parallelism in parallel holistic twig joins
Source: Conference on Ubiquitous Information Management and Communication
Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication
Suwon, Korea
SESSION: Data analysis and mining I
Pages 471-480  
Year of Publication: 2009
ISBN:978-1-60558-405-8
Authors
Imam Machdi  University of Tsukuba, Japan
Toshiyuki Amagasa  University of Tsukuba, Japan
Hiroyuki Kitagawa  University of Tsukuba, Japan
Sponsor
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1516241.1516322

ABSTRACT

Parallel XML query processing systems that process numerous queries over large heterogeneous XML documents often underperform because of workload imbalance and low CPU and system utilization: conventional partitioning strategies do not serve state-of-the-art query processing algorithms, such as holistic twig joins, well. Consequently, partitioning and distributing heterogeneous XML documents across a parallel cluster system has become an intricate problem for maintaining good query performance. In this paper, we propose XML data partitioning strategies that alleviate the performance degradation caused by workload imbalance, especially for parallel holistic twig join processing. The proposed strategies aim to improve workload balance under both static and dynamic data distribution. In the first strategy, for static data distribution, we refine a high-cost XML partition through a series of partition refinements at successively finer levels of granularity: document, query, subquery, and finally node streams. The granularity level chosen for refining a high-cost partition depends on the overall workload balance in the system. In the second strategy, for dynamic data distribution, we address low system utilization at run time when many nodes in the system are idle. We propose an XML data redistribution approach that partitions XML data on the fly at the node-stream granularity.
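The static refinement idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the cost values, the `split` rule (halving a partition's cost at the next finer level), the greedy assignment, and the imbalance threshold are all assumptions made for illustration. Only the four granularity levels come from the abstract.

```python
from dataclasses import dataclass
from typing import List

# Granularity levels named in the abstract, coarsest to finest.
LEVELS = ["document", "query", "subquery", "node_stream"]


@dataclass
class Partition:
    name: str
    cost: float      # estimated processing cost (hypothetical unit)
    level: int = 0   # index into LEVELS


def split(p: Partition) -> List[Partition]:
    """Hypothetical refinement: two equal-cost halves at the next finer level."""
    if p.level + 1 >= len(LEVELS):
        return [p]  # already at the finest granularity
    half = p.cost / 2
    return [Partition(f"{p.name}.{i}", half, p.level + 1) for i in (0, 1)]


def assign(parts: List[Partition], n_nodes: int) -> List[float]:
    """Greedy assignment, largest cost first; returns the load of each node."""
    loads = [0.0] * n_nodes
    for p in sorted(parts, key=lambda q: q.cost, reverse=True):
        loads[loads.index(min(loads))] += p.cost
    return loads


def imbalance(loads: List[float]) -> float:
    """Ratio of the heaviest node load to the mean load (1.0 is perfect)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


def refine(parts: List[Partition], n_nodes: int,
           threshold: float = 1.1, max_rounds: int = 32) -> List[Partition]:
    """Split the costliest refinable partition until the load is balanced."""
    parts = list(parts)
    for _ in range(max_rounds):
        if imbalance(assign(parts, n_nodes)) <= threshold:
            break
        refinable = [p for p in parts if p.level + 1 < len(LEVELS)]
        if not refinable:
            break
        worst = max(refinable, key=lambda q: q.cost)
        parts.remove(worst)
        parts.extend(split(worst))
    return parts


docs = [Partition("d1", 8.0), Partition("d2", 1.0), Partition("d3", 1.0)]
refined = refine(docs, n_nodes=2)
```

In this toy run, one dominant document-level partition is refined to the query level so that two nodes can share its cost; a real system would derive costs and split points from the queries and node streams themselves.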


