XML data partitioning strategies to improve parallelism in parallel holistic twig joins
Source
Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication
Suwon, Korea
SESSION: Data analysis and mining I
Pages 471-480
Year of Publication: 2009
ISBN: 978-1-60558-405-8
ABSTRACT
Parallel XML query processing systems that process numerous queries over large heterogeneous XML documents often underperform because of workload imbalance and low CPU and system utilization: conventional partitioning strategies do not serve state-of-the-art query processing algorithms, such as holistic twig joins, well. Consequently, partitioning and distributing heterogeneous XML documents across a parallel cluster while maintaining good query performance has become an intricate problem. In this paper, we propose XML data partitioning strategies that alleviate the performance degradation caused by workload imbalance, especially for parallel holistic twig join processing. The proposed strategies aim to improve workload balance under both static and dynamic data distribution. In the first strategy, for static data distribution, we refine a high-cost XML partition through a series of partition refinements at progressively finer levels of granularity: document, query, subquery, and finally node streams. The granularity level chosen for refining a high-cost partition depends on the overall workload balance in the system. In the second strategy, for dynamic data distribution, we address the low system utilization that arises when many nodes in the system are idle. We propose an XML data redistribution approach that partitions XML data on the fly at node-stream granularity.
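The general idea behind the first strategy, refining the costliest partition to a finer granularity until the load is acceptably balanced, can be illustrated with a minimal sketch. This is not the paper's algorithm: the cost values, the even `split` function, the greedy `assign` placement, and the imbalance `threshold` are all hypothetical choices made only for illustration. Only the four granularity levels come from the abstract.

```python
from dataclasses import dataclass

# Granularity levels named in the abstract, coarsest to finest.
LEVELS = ["document", "query", "subquery", "node_stream"]


@dataclass
class Partition:
    name: str
    cost: float      # hypothetical estimated processing cost
    level: int = 0   # index into LEVELS


def split(p, fanout=2):
    """Hypothetical refinement: split evenly at the next finer level."""
    return [Partition(f"{p.name}.{i}", p.cost / fanout, p.level + 1)
            for i in range(fanout)]


def assign(partitions, n_nodes):
    """Greedy placement (assumption): largest cost first, least-loaded node."""
    loads = [0.0] * n_nodes
    for p in sorted(partitions, key=lambda q: -q.cost):
        loads[loads.index(min(loads))] += p.cost
    return loads


def imbalance(loads):
    """Ratio of the heaviest node load to the average load (1.0 is perfect)."""
    avg = sum(loads) / len(loads)
    return max(loads) / avg if avg else 1.0


def refine_until_balanced(partitions, n_nodes, threshold=1.2, max_rounds=50):
    """Repeatedly refine the costliest refinable partition."""
    parts = list(partitions)
    for _ in range(max_rounds):
        if imbalance(assign(parts, n_nodes)) <= threshold:
            break
        refinable = [p for p in parts if p.level < len(LEVELS) - 1]
        if not refinable:
            break  # everything is already at node-stream granularity
        heaviest = max(refinable, key=lambda q: q.cost)
        parts.remove(heaviest)
        parts.extend(split(heaviest))
    return parts


docs = [Partition("docA", 100.0), Partition("docB", 10.0), Partition("docC", 10.0)]
refined = refine_until_balanced(docs, n_nodes=3)
for p in refined:
    print(p.name, LEVELS[p.level], p.cost)
```

In this sketch the choice of how far to refine is driven purely by the global imbalance ratio, which loosely mirrors the abstract's statement that the granularity level depends on the overall workload balance in the system. A realistic cost model would depend on the queries and the node streams they touch.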