Timestamp snooping: an approach for extending SMPs
Source: ACM SIGPLAN Notices
Volume 35, Issue 11 (November 2000)
Pages: 25-36
Year of Publication: 2000
ISSN: 0362-1340
Authors
Milo M. K. Martin  Computer Sciences Department, University of Wisconsin-Madison
Daniel J. Sorin  Computer Sciences Department, University of Wisconsin-Madison
Anastassia Ailamaki  Computer Sciences Department, University of Wisconsin-Madison
Alaa R. Alameldeen  Computer Sciences Department, University of Wisconsin-Madison
Ross M. Dickson  Computer Sciences Department, University of Wisconsin-Madison
Carl J. Mauer  Computer Sciences Department, University of Wisconsin-Madison
Kevin E. Moore  Computer Sciences Department, University of Wisconsin-Madison
Manoj Plakal  Computer Sciences Department, University of Wisconsin-Madison
Mark D. Hill  Computer Sciences Department, University of Wisconsin-Madison
David A. Wood  Computer Sciences Department, University of Wisconsin-Madison
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/356989.356992

ABSTRACT

Symmetric multiprocessor (SMP) servers provide superior performance for the commercial workloads that dominate the Internet. Our simulation results show that over one-third of cache misses by these applications result in cache-to-cache transfers, where the data is found in another processor's cache rather than in memory. SMPs are optimized for this case by using snooping protocols that broadcast address transactions to all processors. Conversely, directory-based shared-memory systems must indirectly locate the owner and sharers through a directory, resulting in larger average miss latencies.

This paper proposes timestamp snooping, a technique that allows SMPs to (i) utilize high-speed switched interconnection networks and (ii) exploit physical locality by delivering address transactions to processors and memories without regard to order. Traditional snooping requires physical ordering of transactions. Timestamp snooping works by processing address transactions in a logical order. Logical time is maintained by adding a few bits per address transaction and having network switches perform a handshake to ensure on-time delivery. Processors and memories then reorder transactions based on their timestamps to establish a total order.

We evaluate timestamp snooping with commercial workloads on a 16-processor SPARC system using the Simics full-system simulator. We simulate both an indirect (butterfly) and a direct (torus) network design. For OLTP, DSS, web serving, web searching, and one scientific application, timestamp snooping with the butterfly network runs 6-28% faster than directories, at a cost of 13-43% more link traffic. Similarly, with the torus network, timestamp snooping runs 6-29% faster for 17-37% more link traffic. Thus, timestamp snooping is worth considering when buying more interconnect bandwidth is easier than reducing interconnect latency.
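
The reordering step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the names `AddressTransaction`, `Endpoint`, and `replay`, the `guaranteed_time` parameter, and the use of the issuing processor's ID as a tie-breaker are all assumptions made for illustration. The sketch only shows how endpoints that receive the same transactions in different physical orders can still process them in one logical order by buffering and sorting on timestamps.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True, frozen=True)
class AddressTransaction:
    """An address transaction tagged with a logical timestamp."""
    timestamp: int                       # logical time carried by the transaction
    source: int                          # assumed tie-breaker: issuing processor ID
    address: int = field(compare=False)  # payload, ignored when ordering


class Endpoint:
    """A processor or memory that buffers transactions until they are safe to process."""

    def __init__(self):
        self._pending = []  # min-heap keyed on (timestamp, source)

    def receive(self, tx):
        # Transactions may arrive in any physical order.
        heapq.heappush(self._pending, tx)

    def advance(self, guaranteed_time):
        # Assumption: the network promises that no transaction with a
        # timestamp <= guaranteed_time can still arrive, so everything
        # up to that time can be released in logical order.
        ready = []
        while self._pending and self._pending[0].timestamp <= guaranteed_time:
            ready.append(heapq.heappop(self._pending))
        return ready


def replay(arrivals, guaranteed_time):
    """Feed a sequence of arrivals to a fresh endpoint and return what it processes."""
    endpoint = Endpoint()
    for tx in arrivals:
        endpoint.receive(tx)
    return endpoint.advance(guaranteed_time)
```

Two endpoints that call `replay` with the same set of transactions in different arrival orders return the same list, which is the property a total order requires. Transactions whose timestamps exceed `guaranteed_time` stay buffered until a later call to `advance`.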



