Multi-core Processing: The Past and The Future
Amir Moghimi, ASIC Course, UT ECE

The Past
Instruction Level Parallelism (ILP) Enhanced Processors
Wide Dynamic Execution [1], with techniques such as:
  speculative execution (using branch prediction)
  out-of-order execution (using register renaming and reservation stations)
  superscalar execution (using a multiple-issue instruction cache and a reorder buffer)
e.g. Intel P6 micro-architecture, used in the Pentium® Pro, Pentium® II and Pentium® III processors

ILP Limitations
Window size limitation [2] due to:
  2450 comparisons for register dependency detection among 50 instructions in one clock cycle!
A branch instruction every 5 instructions on average
Imperfect branch prediction
Serial nature of the application, with true data dependencies
So, how to use the huge amount of silicon that becomes available every year and a half?
  Use multiple cores on a single die

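The comparison count on this slide can be checked with a short sketch. It assumes the usual counting argument: each instruction in the window has two source registers, and each is compared against the destination register of every earlier instruction in the window. The function name is hypothetical.

```python
def dependency_comparisons(window_size: int) -> int:
    """Comparisons needed to check register dependences in one window.

    Assumption: every instruction has 2 source operands, each compared
    against the destination register of every earlier instruction.
    """
    # 2 * (0 + 1 + ... + (n - 1)) = n * (n - 1)
    return 2 * sum(range(window_size))


for n in (4, 8, 50, 100):
    print(n, dependency_comparisons(n))
```

The count grows quadratically with the window size, which is why simply widening the window quickly becomes impractical.
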
Multi-core Basics
A multi-core chip combines two or more independent processing cores on a single die (also known as a Chip Multi-Processor)
Four main questions arise [3]:
  How is the application developed?
  How do the cores share data?
  How do the cores physically communicate?
  How scalable is the architecture?

Given Answers
For parallel application development, use the thread concept formerly proposed for discrete multi-processor systems
                                              # of Proc
Communication model   Message passing         8 to 2048
                      Shared address   NUMA   8 to 256
                                       UMA    2 to 64
Physical connection   Network                 8 to 256
                      Bus                     2 to 36
[3]

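As an illustration of the thread concept under the shared-address model, here is a minimal sketch in which several threads update one shared variable, protected by a lock. The function name, the chunking scheme and the thread count are assumptions made for this example, not something taken from the cited lecture.

```python
import threading


def parallel_sum(values, n_threads=4):
    """Sum a list with several threads that share one accumulator."""
    total = 0                      # shared data, visible to every thread
    lock = threading.Lock()        # serializes updates to the shared total

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)       # private work, no sharing needed
        with lock:                 # only the final update touches shared state
            total += partial

    # Hypothetical split: thread i takes every n_threads-th element.
    chunks = [values[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


print(parallel_sum(list(range(1000))))
```

In standard CPython the global interpreter lock limits truly parallel execution of such threads, so this shows the programming model rather than a speedup.
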
Chip Level Multi-threading
Implemented in superscalar processors before the introduction of multi-core chips
Multi-threading methods:
  Fine-grained
  Coarse-grained
  Simultaneous MT (SMT), e.g. Intel Hyper-Threading Technology

4-way Threading Processor [3]
[Figure: issue slots over time for Thread A, Thread B, Thread C and Thread D under SMT, fine-grained MT and coarse-grained MT]

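The difference between the three methods can be sketched with a toy issue-slot model. Everything here is an assumption made for illustration: a 4-wide issue stage, a fixed table `ready[t][c]` giving how many instructions thread `t` could issue in cycle `c` (with no carry-over of unissued work), round-robin switching for fine-grained MT, and a coarse-grained policy that switches only when the current thread has nothing to issue, losing that cycle.

```python
from typing import List

WIDTH = 4  # issue slots per cycle (assumed)


def fine_grained(ready: List[List[int]], width: int = WIDTH) -> int:
    """Switch threads every cycle (round-robin); one thread issues per cycle."""
    n = len(ready)
    return sum(min(ready[c % n][c], width) for c in range(len(ready[0])))


def coarse_grained(ready: List[List[int]], width: int = WIDTH) -> int:
    """Stay on one thread until it stalls; the stalled cycle issues nothing."""
    current, issued = 0, 0
    for c in range(len(ready[0])):
        r = ready[current][c]
        if r == 0:
            current = (current + 1) % len(ready)  # switch on a stall
        else:
            issued += min(r, width)
    return issued


def smt(ready: List[List[int]], width: int = WIDTH) -> int:
    """Fill each cycle's slots from all threads at once."""
    return sum(min(sum(t[c] for t in ready), width)
               for c in range(len(ready[0])))


# Hypothetical per-cycle readiness for threads A, B, C and D over 6 cycles.
ready = [
    [2, 0, 1, 3, 0, 2],
    [1, 2, 0, 1, 2, 0],
    [0, 1, 2, 0, 1, 1],
    [3, 0, 0, 2, 1, 0],
]
total_slots = WIDTH * len(ready[0])
for name, policy in (("fine", fine_grained), ("coarse", coarse_grained), ("SMT", smt)):
    print(name, policy(ready), "of", total_slots, "slots used")
```

In this model SMT can never use fewer slots than either single-thread-per-cycle policy, since it draws on every thread's ready instructions in each cycle.
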
Now Multi-core Processing
A simple look at a multi-core processor (IBM Xenon, used in the Microsoft Xbox 360) [4]
Simple but effective
[Figure: Core 0, Core 1 and Core 2, each with its own L1D and L1I caches, sharing a 1MB unified L2 (UL2)]

A More Powerful Design STI Cell (used in PS3) [8]
A Comparison
Sun UltraSPARC T1 [5]
[Figure: block diagram with 4-way MT SPARC pipes, crossbar, 4-way banked L2, memory controllers and I/O shared functions [3]]

UltraSPARC T1 vs. Pentium EE [5]
UltraSPARC T1 vs. Pentium EE
[Figure: performance comparison running SPEC JBB 2000, TPC-C, TPC-W and XML Test as server benchmarks and SPEC CPU2000 as the serial benchmark [5]]
[Figure: Pentium Extreme Edition die photo [5]]

Now the Trend
Intel will deliver a quad-core processor (4 full execution cores) in the first quarter of 2007 [1]
“We forecast that more than 85 percent of our server processors and more than 70 percent of our mobile and desktop Pentium® family processor shipments will be multi-core–based by the end of 2006” [7]
Intel plans to have 32 cores on a die by 2015 [7]
But do not forget the high power density and memory bandwidth issues!

Thanks
Any questions?

References
2. John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach 2nd Edition. Morgan Kaufmann,
3. PSU CSE 431, Mary Jane Irwin, Computer Architecture, Fall 2005, Lecture
4. http://www-128.ibm.com/developerworks/power/library/pa-fpfxbox/?ca=dgr-lnxw09XBoxDesign
5. kalender/23b3b3ee6c4e487e6f4205fa03e783bc.0.0/Niagara_CMT.pdf
6. James Laudon: Performance/Watt: the new server focus. SIGARCH Computer Architecture News 33(4): 5-13 (2005)