Parallel database processing on a 100 Node PC cluster: cases for decision support query processing and data mining
Source Conference on High Performance Networking and Computing archive
Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) table of contents
San Jose, CA
Pages: 1 - 16  
Year of Publication: 1997
ISBN: 0-89791-985-8
Authors
Takayuki Tamura  The University of Tokyo, 7-22-1 Roppongi, Minato-ku, Tokyo 106, Japan
Masato Oguchi  The University of Tokyo, 7-22-1 Roppongi, Minato-ku, Tokyo 106, Japan
Masaru Kitsuregawa  The University of Tokyo, 7-22-1 Roppongi, Minato-ku, Tokyo 106, Japan
Sponsors
IEEE-CS\DATC : IEEE Computer Society
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/509593.509642

ABSTRACT

We developed a PC cluster system consisting of 100 PCs. Each PC employs a 200MHz Pentium Pro CPU and is connected to the others through an ATM switch. We selected two kinds of data-intensive applications: decision support query processing and data mining, specifically association rule mining.

As a high speed network, ATM technology has recently become a de facto standard. While other high performance network standards are also available, ATM networks are widely used, from local area to widely distributed environments. One problem with ATM networks is their high latency, in contrast to their high bandwidth. This is usually considered a serious flaw of ATM for building high performance massively parallel processors. However, applications such as large scale database analysis are insensitive to communication latency and require only bandwidth.

Meanwhile, the performance of personal computers is increasing rapidly, while the prices of PCs continue to fall much faster than those of workstations. The 200MHz Pentium Pro CPU is competitive in integer performance with the processor chips found in workstations. Although it is still weak at floating point operations, these are not frequently used in database applications.

Thus, by combining PCs and ATM switches, we can construct a large scale parallel platform very easily and very inexpensively. In this paper, we examine how such a system can help data warehouse processing, which currently runs on expensive high-end mainframes and/or workstation servers.

In our first experiment, we used the most complex query of the standard benchmark, TPC-D, on a 100 GB database to evaluate the system against commercial parallel systems. Our PC cluster exhibited much higher performance than those in current TPC benchmark reports. Second, we parallelized association rule mining and ran large scale data mining on the PC cluster. Sufficiently high linearity was obtained.

Thus we believe that such commodity based PC clusters will play a very important role in large scale database processing.
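
The abstract's argument that bulk database workloads depend on bandwidth rather than latency can be illustrated with a simple linear cost model. The sketch below is not taken from the paper: the function names, the 1 ms per-message latency, and the 15 MB/s bandwidth are hypothetical values chosen only to show the shape of the trade-off.

```python
def transfer_time(total_bytes, latency_s, bandwidth_bytes_per_s, num_messages=1):
    """Linear cost model (an assumption, not the paper's model):
    every message pays one fixed latency, and the payload moves at link bandwidth."""
    return num_messages * latency_s + total_bytes / bandwidth_bytes_per_s


def latency_fraction(total_bytes, latency_s, bandwidth_bytes_per_s, num_messages=1):
    """Share of the total transfer time that is spent on per-message latency."""
    total = transfer_time(total_bytes, latency_s, bandwidth_bytes_per_s, num_messages)
    return (num_messages * latency_s) / total


# Hypothetical link parameters, for illustration only.
LATENCY_S = 1e-3      # assumed per-message latency: 1 ms
BANDWIDTH = 15e6      # assumed usable bandwidth: 15 MB/s

# Fine-grained traffic: 1000 messages of 100 bytes each.
fine = latency_fraction(1000 * 100, LATENCY_S, BANDWIDTH, num_messages=1000)

# Bulk traffic, as in a large scan or redistribution: 100 messages of 1 MB each.
bulk = latency_fraction(100 * 1_000_000, LATENCY_S, BANDWIDTH, num_messages=100)

print(f"latency share, fine-grained: {fine:.1%}")
print(f"latency share, bulk:         {bulk:.1%}")
```

Under these assumed numbers, latency dominates the fine-grained exchange but is a small share of the bulk transfer, which is the regime the abstract attributes to large scale database analysis.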


