Data mining using high performance data clouds: experimental studies using Sector and Sphere

Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Las Vegas, Nevada, USA
Session: Industrial papers
Pages 920-927
Year of Publication: 2008
ISBN: 978-1-60558-193-4

Authors
Robert Grossman, University of Illinois at Chicago and Open Data Group, Chicago, IL, USA
Yunhong Gu, University of Illinois at Chicago, Chicago, IL, USA

ABSTRACT
We describe the design and implementation of a high-performance cloud that we have used to archive, analyze and mine large distributed data sets. By a cloud, we mean an infrastructure that provides resources and/or services over the Internet. A storage cloud provides storage services, while a compute cloud provides compute services. We describe the design of the Sector storage cloud and how it provides the storage services required by the Sphere compute cloud. We also describe the programming paradigm supported by the Sphere compute cloud. Sector and Sphere are designed for analyzing large data sets using computer clusters connected by high-performance wide-area networks (for example, 10+ Gb/s). We describe a distributed data mining application that we have developed using Sector and Sphere. Finally, we describe some experimental studies comparing Sector/Sphere with Hadoop.
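The abstract names the Sphere programming paradigm without describing it. As a rough illustration only, the sketch below assumes a simplified stream-processing model in which one user-defined function (UDF) is applied independently to every data segment of a distributed data set, in parallel. It is not Sphere's actual API: `process_stream`, `count_matches`, the segment ids, and the "list of text records" segment format are all hypothetical, and a local thread pool stands in for the compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A data segment is modeled as a list of text records (an assumption;
# the abstract does not specify Sector's storage layout).
Segment = List[str]


def process_stream(segments: Dict[str, Segment],
                   udf: Callable[[Segment], List[str]],
                   max_workers: int = 4) -> Dict[str, List[str]]:
    """Apply one user-defined function to every segment in parallel.

    Hypothetical helper: a local thread pool stands in for the compute
    nodes, and the dict keys stand in for segment locations.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per segment; segments are processed independently.
        futures = {sid: pool.submit(udf, seg) for sid, seg in segments.items()}
        # Gather each segment's output, keyed by the segment it came from.
        return {sid: fut.result() for sid, fut in futures.items()}


def count_matches(segment: Segment) -> List[str]:
    """Example UDF (hypothetical): count records containing 'ERROR'."""
    return [str(sum(1 for record in segment if "ERROR" in record))]


if __name__ == "__main__":
    data = {
        "node1/part-0": ["ok", "ERROR disk", "ok"],
        "node2/part-1": ["ERROR net", "ERROR net", "ok"],
    }
    per_segment = process_stream(data, count_matches)
    total = sum(int(out[0]) for out in per_segment.values())
    print(per_segment, total)
```

Because each segment is processed without reference to the others, a UDF in a model like this could in principle run on whichever node already stores the segment, which is the kind of pairing between a storage cloud and a compute cloud that the abstract describes.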