ABSTRACT
Building commodity networked storage systems is an important architectural trend: commodity servers that host a moderate number of consumer-grade disks and are interconnected with a high-performance network are an attractive option for improving storage system scalability and cost-efficiency. However, such systems incur significant overheads and cannot deliver the available throughput to applications. We examine in detail the sources of overhead in such systems, using a working prototype to quantify the overheads associated with various parts of the I/O protocol. We optimize our base protocol to handle small requests by batching them at the network level, without any I/O-specific knowledge. We also redesign our protocol stack to allow asynchronous event processing in-line, during send-path request processing. These techniques improve performance for an 8-disk SATA RAID0 array from 200 to 290 MBytes/s (45% improvement). Using a ramdisk, peak performance improves from 320 to 474 MBytes/s (48% improvement), which is 72% of the maximum possible throughput in our experimental setup. We also analyze the remaining system bottlenecks and find that, although commodity storage systems have potential for building high-performance I/O subsystems, traditional network and I/O protocols are not fully capable of delivering this potential.