ABSTRACT
Collaborative research in various scientific disciplines requires scalable data management that enables the efficient correlation of globally distributed data sources. Motivated by the expected data rates of upcoming projects and a growing number of users, communities are exploring new data management techniques for achieving high throughput. Community-driven data grids deliver such high-throughput data distribution for scientific federations by partitioning data according to application-specific data and query characteristics. Query hot spots are an important and challenging problem in this environment. Existing approaches to load balancing from Peer-to-Peer (P2P) data management and sensor networks do not directly meet the requirements of a data-intensive e-science environment. This paper contributes partitioning schemes based on multi-dimensional index structures that enable communities to trade off data load balancing against the handling of query hot spots via splitting and replication. We evaluate the partitioning schemes with two typical kinds of data sets from the astrophysics domain and with workloads extracted from Sloan Digital Sky Survey (SDSS) query traces, and we perform throughput measurements in real and simulated networks. The experiments demonstrate the improved workload distribution capabilities and give promising directions for the development of future community grids.