Data mining using high performance data clouds: experimental studies using sector and sphere
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Las Vegas, Nevada, USA
Session: Industrial papers
Pages 920-927
Year of Publication: 2008
ISBN: 978-1-60558-193-4
Authors
Robert Grossman, University of Illinois at Chicago and Open Data Group, Chicago, IL, USA
Yunhong Gu, University of Illinois at Chicago, Chicago, IL, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 91,   Downloads (12 Months): 782,   Citation Count: 3
DOI: http://doi.acm.org/10.1145/1401890.1402000

ABSTRACT

We describe the design and implementation of a high performance cloud that we have used to archive, analyze and mine large distributed data sets. By a cloud, we mean an infrastructure that provides resources and/or services over the Internet. A storage cloud provides storage services, while a compute cloud provides compute services. We describe the design of the Sector storage cloud and how it provides the storage services required by the Sphere compute cloud. We also describe the programming paradigm supported by the Sphere compute cloud. Sector and Sphere are designed for analyzing large data sets using computer clusters connected with wide area high performance networks (for example, 10+ Gb/s). We describe a distributed data mining application that we have developed using Sector and Sphere. Finally, we describe some experimental studies comparing Sector/Sphere to Hadoop.

