Parallel database processing on a 100 Node PC cluster: cases for decision support query processing and data mining
Source
Conference on High Performance Networking and Computing
Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM)
San Jose, CA
Pages: 1 - 16
Year of Publication: 1997
ISBN:0-89791-985-8
Authors
Takayuki Tamura, The University of Tokyo, 7-22-1 Roppongi, Minato-ku, Tokyo 106, Japan
Masato Oguchi, The University of Tokyo, 7-22-1 Roppongi, Minato-ku, Tokyo 106, Japan
Masaru Kitsuregawa, The University of Tokyo, 7-22-1 Roppongi, Minato-ku, Tokyo 106, Japan
ABSTRACT
We developed a PC cluster system consisting of 100 PCs. Each PC has a 200 MHz Pentium Pro CPU and is connected to the others through an ATM switch. We selected two kinds of data-intensive applications: decision support query processing and data mining, specifically association rule mining.

ATM technology has recently become a de facto standard for high-speed networks. While other high-performance network standards are also available, ATM networks are widely used in settings ranging from local area to widely distributed environments. One problem with ATM networks is their high latency, in contrast to their high bandwidth. This is usually considered a serious flaw of ATM for building high-performance massively parallel processors. However, applications such as large-scale database analysis are insensitive to communication latency and require only bandwidth.

Meanwhile, the performance of personal computers is increasing rapidly, while PC prices continue to fall much faster than workstation prices. The 200 MHz Pentium Pro CPU is competitive in integer performance with the processor chips found in workstations. Although it is still weak at floating-point operations, these are not frequently used in database applications.

Thus, by combining PCs and ATM switches, we can construct a large-scale parallel platform very easily and very inexpensively. In this paper, we examine how such a system can help data warehouse processing, which currently runs on expensive high-end mainframes and/or workstation servers.

In our first experiment, we ran the most complex query of the standard TPC-D benchmark on a 100 GB database to compare the system with commercial parallel systems. Our PC cluster exhibited much higher performance than the systems in current TPC benchmark reports. Second, we parallelized association rule mining and ran large-scale data mining on the PC cluster. Sufficiently high linearity was obtained.
Thus we believe that such commodity based PC clusters will play a very important role in large scale database processing.
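The abstract's claim that bulk database workloads depend on bandwidth rather than latency can be illustrated with a simple cost model, sketched below. This is not taken from the paper. The model (transfer time = fixed latency + size / bandwidth), the function names, and the numeric values for latency and link rate are all assumptions chosen for illustration only.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Assumed cost model: fixed per-message latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def latency_share(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the total transfer time that is spent on latency."""
    total = transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s)
    return latency_s / total


# Hypothetical values, not measurements from the paper.
LATENCY_S = 500e-6                 # assumed per-message latency: 500 microseconds
BANDWIDTH_BYTES_PER_S = 155e6 / 8  # assumed 155 Mbit/s link, in bytes per second

if __name__ == "__main__":
    # Small messages are dominated by latency; large bulk transfers are not.
    for size in (64, 8 * 1024, 1024 * 1024):
        share = latency_share(size, LATENCY_S, BANDWIDTH_BYTES_PER_S)
        print(f"{size:>8} bytes: latency is {share:.1%} of transfer time")
```

Under these assumed numbers, latency dominates a 64-byte message but accounts for only a small fraction of a 1 MiB transfer, which is the pattern the abstract relies on when data is shipped between nodes in large blocks.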