Niagara: A 32-Way Multithreaded SPARC Processor
P. Kongetira, K. Aingaran, K. Olukotun (Sun Microsystems)
Presented by Bogdan Romanescu
Goal
Commercial server applications:
- High thread-level parallelism (TLP): large numbers of parallel client requests
- Low instruction-level parallelism (ILP): high cache miss rates, many unpredictable branches, frequent load-load dependencies
Power, cooling, and space are major concerns for data centers.
Sun’s Solution
UltraSPARC T1 processor, billed by Sun as “the highest-throughput and most eco-responsible processor ever created”
- Multicore: 8 cores
- Fine-grained multithreading within each core: 4 threads per core
- Simple pipelines
- Small L1 caches
- Shared L2 cache
- Metric: performance per watt
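Why many simple threads suit workloads with high miss rates and low ILP can be illustrated with an idealized utilization model. This is a common textbook simplification, not a model from the Niagara paper: each thread is assumed to issue for `compute` cycles and then stall for `stall` cycles on memory, and the core is assumed to switch to any ready thread every cycle at no cost. The cycle counts in the example are hypothetical.

```python
def core_utilization(threads: int, compute: float, stall: float) -> float:
    """Idealized fraction of cycles a single-issue core spends issuing instructions.

    Assumes each thread alternates `compute` busy cycles with `stall` cycles
    waiting on memory, and that the core can switch threads every cycle for free.
    """
    per_thread = compute / (compute + stall)
    # Threads fill each other's stall cycles until the pipeline saturates.
    return min(1.0, threads * per_thread)


# Hypothetical workload: 20 cycles of useful work per 60-cycle cache miss.
for n in (1, 2, 4):
    print(n, core_utilization(n, compute=20, stall=60))
```

With these assumed numbers, one thread keeps the pipe busy only a quarter of the time, while four threads (the per-core count on Niagara) can in principle saturate it. Real contention for shared caches and memory bandwidth lowers this bound.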
Architecture
SPARC pipeline
- UltraSPARC II style
- Single issue, 6 stages: Fetch (F), thread Select (S), Decode (D), Execute (E), Memory (M), Writeback (W)
- Shared units: L1 caches, TLBs, execution units, pipeline registers
- Hazards: data, structural
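The interleaving of threads in the six-stage pipe can be visualized with a small trace generator. This is a simplified sketch, not Niagara's actual control logic: it assumes one instruction enters Fetch every cycle, threads are picked strictly round-robin, and nothing stalls. The real select stage also accounts for long-latency instructions and LRU status.

```python
STAGES = ["F", "S", "D", "E", "M", "W"]  # Fetch, thread Select, Decode, Execute, Memory, Writeback


def pipeline_trace(num_threads: int = 4, cycles: int = 10):
    """Return, for each cycle, a dict mapping stage -> 'T<thread>.<seq>'.

    Simplifying assumptions: one instruction enters F per cycle, threads are
    chosen round-robin, and there are no stalls or hazards.
    """
    in_flight = []                 # (thread, seq, cycle it entered F)
    next_seq = [0] * num_threads   # per-thread instruction counter
    trace = []
    for cycle in range(cycles):
        tid = cycle % num_threads
        in_flight.append((tid, next_seq[tid], cycle))
        next_seq[tid] += 1
        # Drop instructions that have already left the W stage.
        in_flight = [ins for ins in in_flight if cycle - ins[2] < len(STAGES)]
        trace.append({STAGES[cycle - entry]: f"T{t}.{seq}" for t, seq, entry in in_flight})
    return trace


for cycle, snap in enumerate(pipeline_trace()):
    print(f"{cycle:2d}  " + "  ".join(f"{s}:{snap.get(s, '----'):>4}" for s in STAGES))
```

With four threads in rotation, consecutive instructions from the same thread enter the pipe four cycles apart, so in this idealized model a thread's next instruction reaches Decode only after its previous one has left Execute.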
Integer Register File
- One register file per thread
- SPARC register windows: in, out, and local registers
- Highly integrated cell structure supporting 4 threads:
  - 8 windows of 32 locations per thread
  - 3 read ports + 2 write ports
  - Single-cycle read/write latency
  - 1 active window cell (a copy of the current window from the architectural set)
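The window overlap that SPARC relies on can be sketched as an address mapping. Each window exposes 32 registers, but a window's outs are the same storage as the ins of the next window, so only 16 registers (locals and ins) are unique per window. The flat layout, the single global set, and the name `physical_slot` are hypothetical choices for illustration; Niagara's actual register-file cells, including the active window cell, are organized differently.

```python
from typing import Tuple

NWINDOWS = 8            # register windows per thread (from the slide)
UNIQUE_PER_WINDOW = 16  # 8 locals + 8 ins; a window's outs are its neighbour's ins


def physical_slot(cwp: int, reg: int) -> Tuple[str, int]:
    """Map architectural register r0..r31, seen from window `cwp`, to a storage slot.

    SPARC naming: r0-r7 = %g (globals), r8-r15 = %o (outs),
    r16-r23 = %l (locals), r24-r31 = %i (ins).
    Hypothetical flat layout with a single global set, for illustration only.
    """
    if not 0 <= reg < 32:
        raise ValueError("each SPARC window exposes 32 integer registers")
    cwp %= NWINDOWS
    if reg < 8:   # globals are shared by all windows
        return ("global", reg)
    if reg < 16:  # outs alias the ins of window cwp-1 (SAVE decrements CWP)
        w = (cwp - 1) % NWINDOWS
        return ("window", w * UNIQUE_PER_WINDOW + 8 + (reg - 8))
    if reg < 24:  # locals are private to this window
        return ("window", cwp * UNIQUE_PER_WINDOW + (reg - 16))
    return ("window", cwp * UNIQUE_PER_WINDOW + 8 + (reg - 24))  # ins


# Caller in window 3 writes %o0 (r8); after SAVE the callee runs in window 2
# and reads the same storage as %i0 (r24).
print(physical_slot(3, 8), physical_slot(2, 24))
```

This overlap is what lets a procedure call pass arguments without copying: SAVE only moves the window pointer.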
Thread scheduling
- Thread select and fetch are coupled
- Thread selection based on:
  - a previous long-latency instruction in the pipe
  - instruction type
  - LRU status
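The LRU part of the selection policy can be sketched as follows. This is a minimal model under stated assumptions: a thread is simply marked not ready while it waits on a long-latency event, instruction type is ignored, and the fetch coupling is not modeled. The class `HwThread`, the function `select_thread`, and the stall scenario are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HwThread:
    tid: int
    ready: bool = True       # False while waiting on a long-latency event (e.g. a load miss)
    last_selected: int = -1  # cycle in which this thread last won selection


def select_thread(threads: List[HwThread], cycle: int) -> Optional[int]:
    """Return the least recently selected ready thread, or None if all are stalled."""
    ready = [t for t in threads if t.ready]
    if not ready:
        return None  # the pipeline gets a bubble this cycle
    chosen = min(ready, key=lambda t: (t.last_selected, t.tid))  # tid breaks ties
    chosen.last_selected = cycle
    return chosen.tid


threads = [HwThread(tid) for tid in range(4)]
picks = []
for cycle in range(8):
    # Hypothetical scenario: thread 2 waits on a cache miss during cycles 2-5.
    threads[2].ready = not (2 <= cycle < 6)
    picks.append(select_thread(threads, cycle))
print(picks)  # [0, 1, 3, 0, 1, 3, 2, 0]
```

While thread 2 is stalled the other three share the pipe, and once it becomes ready again it wins immediately because it has gone longest without being selected.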
Memory
- L1 I$: 16 KB, 4-way set associative, per core
- L1 D$: 8 KB, 4-way set associative, per core
  - Write-through; allocate on load, no-allocate on store
- L2 $: 3 MB, 12-way set associative, shared
  - 4 independent banks of 768 KB
  - 2-cycle throughput, 8-cycle latency
  - Direct link to DRAM and JBus
  - Manages cache coherence for the 8 cores through a CAM-based directory
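The cache geometry on this slide can be checked with a little arithmetic: sets = capacity / (ways × line size). The line sizes (32 B for the I$, 16 B for the D$, 64 B for the L2) and the bank-interleaving rule (consecutive 64-byte lines go to consecutive banks, i.e. physical address bits 7:6) are assumptions for this sketch, not figures from the slide.

```python
KB = 1024
L2_BANKS = 4
L2_LINE = 64  # assumed L2 line size in bytes


def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets = capacity / (associativity * line size)."""
    sets, rem = divmod(size_bytes, ways * line_bytes)
    if rem:
        raise ValueError("cache geometry must divide evenly")
    return sets


bank_bytes = 3 * KB * KB // L2_BANKS                    # 3 MB split over 4 banks
l2_sets_per_bank = cache_sets(bank_bytes, 12, L2_LINE)  # 12-way per bank


def l2_bank(paddr: int) -> int:
    """Assumed interleave: bank chosen by address bits 7:6."""
    return (paddr >> 6) & (L2_BANKS - 1)


print(cache_sets(16 * KB, 4, 32))                   # I$ sets: 128
print(cache_sets(8 * KB, 4, 16))                    # D$ sets: 128
print(bank_bytes // KB, l2_sets_per_bank)           # 768 1024
print([l2_bank(a) for a in range(0, 5 * 64, 64)])   # [0, 1, 2, 3, 0]
```

Interleaving on low-order line-address bits spreads sequential accesses from the 8 cores across all four banks, which is what lets the independent banks sustain the quoted throughput.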
Performance
Test | Sun Fire T2000 | IBM p5-550 (2 dual-core Power5 chips) | Dell PowerEdge
SPECjbb2005 (Java server software), business operations/sec | 63,378 | 61,789 | 24,208 (SC1425, dual single-core Xeon)
SPECweb2005 (Web server performance) | 14,001 | 7,881 | 4,850 (2850, two dual-core Xeon processors)
NotesBench (Lotus Notes performance) | 16,061 | 14,740 | n/a
“Home run”?
- Relatively slow single-thread performance
- Poor floating-point performance (one FPU shared by all 8 cores)
- Lack of software support (the Sun Fire T2000 does not support Linux or Windows)
- Price
- Concurrency counterattack
- No place as a general-purpose computer running databases
- Limited to a small, low-end market segment?
Next: Niagara II and “Rock”, with multiprocessor support and enhanced single-thread performance
References
[1] P. Kongetira, K. Aingaran, and K. Olukotun, “Niagara: A 32-Way Multithreaded SPARC Processor,” IEEE Micro, vol. 25, no. 2, pp. 21-29, Mar. 2005.
[2] A. S. Leon et al., “A Power-Efficient High-Throughput 32-Thread SPARC Processor,” ISSCC 2006, Session 5 (Processors).
[3] S. Chaudhry, S. Yip, P. Caprioli, and M. Tremblay, “High-Performance Throughput Computing,” IEEE Micro, vol. 25, no. 3, 2005.
[4] http://opensparc.sunsource.net/nonav/opensparct1.html
[5] http://www.sun.com/processors/UltraSPARC-T1/features.xml
[6] http://www.sun.com/servers/coolthreads/t1000/benchmarks.jsp
[7] http://news.com.com/Sun+begins+Sparc+phase+of+server+overhaul/2163-1010_3-5983365.html
[8] http://h71028.www7.hp.com/ERC/cache/280124-0-0-0-121.html