ABSTRACT
The proliferation of digital data has resulted in a mushrooming of data-intensive applications, especially in the area of unstructured data processing. Given the growing popularity of unstructured data processing applications (e.g., Flickr™, Google Maps™), it is important to rethink system architectures to run these applications efficiently, from both the performance and power viewpoints. In this paper, we revisit active storage, which proposed offloading computation to disk drive processors, as a possible system architecture for these applications. Unlike previous work, we evaluate the microarchitectural aspects of active storage and perform an in-depth examination of the design of the offload processors. Using a set of unstructured data processing benchmarks, we examine two locations along the I/O path where computation can be offloaded in existing system architectures: a disk drive processor and a disk array controller. Our evaluation demonstrates that there are interesting tradeoffs in the choice of each location and that microarchitectural enhancements to these processors can provide significant performance boosts. We show that active storage architectures can provide large power savings by using lower-power processors along the I/O path while exploiting the data-level parallelism on the storage side of the system.