ABSTRACT
This article extends our prior work to show that a straightforward use of 3D stacking technology enables the design of compact, energy-efficient servers. Our proposed architecture, called PicoServer, employs 3D technology to bond one die containing several simple, slow processing cores to multiple DRAM dies that together are sufficient to serve as primary memory. This use of 3D stacking readily facilitates wide, low-latency buses between the processors and memory. These buses remove the need for an L2 cache, allowing its area to be reallocated to additional simple cores. The additional cores allow the clock frequency to be lowered without impairing throughput. A lower clock frequency in turn means that thermal constraints, a concern with 3D stacking, are easily satisfied. We extend our original analysis of PicoServer to include: (1) a wider set of server workloads, (2) the impact of multithreading, and (3) the on-chip DRAM architecture and system memory usage. PicoServer is intentionally simple, requiring only the simplest form of 3D technology, in which dies are stacked on top of one another. Our intent is to minimize the risk of introducing a new technology (3D) to implement a class of low-cost, low-power, compact server architectures.