+ CS 325: CS Hardware and Software Organization and Architecture Multicore Computers 1
+ Outline Introduction Motivation for Multi-Core What is multi-core processor? Properties of Multi-core systems Applications benefit from multi-core Multiprocessor memory types Multi-core design Symmetric multi-core processor Asymmetric multi-core processor Advantages & disadvantages of multi-core 2
+ Hardware Performance Issues Microprocessors have seen an exponential increase in performance Improved organization Increased clock frequency Increase in Parallelism Pipelining Superscalar (multi-issue) Simultaneous multithreading (SMT) Diminishing returns More complexity requires more logic Increasing chip area for coordinating and signal transfer logic Harder to design, make and debug 3
+ Introduction Flood of Computer Tasks (1990s) Increasing number of computer users Server management ▪ PCs and servers needed better performance. → These demands accelerated the development of microprocessors. Emergence of Multi-core Processors (2000s) Improvements over single core ▪ Put multiple execution cores on one die 4
+ Increased Complexity Power requirements grow exponentially with chip density and clock frequency Can use more chip area for cache: memory transistors are smaller and have order-of-magnitude lower power requirements By 2016: >100 billion transistors on a 300 mm² die, >1 billion transistors for logic 5
+ Increased Complexity Multicore has the potential for near-linear improvement Needs some programming effort Won’t work for all problems Unlikely that one core can use all of a huge cache effectively, so add processing units (cores) to make an MPSoC (Multiprocessing System on Chip) 6
+ Power and Memory Considerations [Figure annotations: "Less action", "More action", "We passed 50%!!! Is this a RAM or a processor?"] 7
+ Chip Utilization of Transistors [Figure: transistors used for cache vs. CPU] 8
+ Effective Applications for Multicore Processors Database (e.g. Select *) Servers handling independent transactions Multi-threaded native applications Lotus Domino, Siebel CRM Multi-process applications Oracle, SAP, PeopleSoft Java applications Java VM is multi-threaded with scheduling and memory management Sun’s Java Application Server, IBM Websphere, Tomcat Multi-instance applications One application running multiple times 9
+ Motivation for Multi-Core Exploits smaller feature size and increased density Increases functional units per chip Limits energy consumption per operation Constrains growth in processor complexity 10
+ Multi-Core Computer A multi-core processor is a processing system composed of two or more independent cores (or CPUs). The cores are typically integrated onto a single integrated circuit die (known as a chip multiprocessor or CMP). A many-core processor is one in which the number of cores is large enough that traditional multi-processor techniques are no longer efficient. This threshold is somewhere in the range of several tens of cores and likely requires a network on chip. 11
+ Multi-Core Computer A dual-core processor contains two independent microprocessors. A dual-core set-up is somewhat comparable to having multiple, separate processors installed in the same computer, but because the two processors sit in the same socket, the connection between them is faster. Ideally, a dual-core processor is nearly twice as powerful as a single-core processor. In practice, performance gains are about 50%: a dual-core processor is likely to be about one-and-a-half times as powerful as a single-core processor. 12
+ Multi-Core Computer A multi-core processor implements multiprocessing in a single physical package. Cores may or may not share caches. Inter-core communication may use message passing or shared memory. All cores are identical in symmetric multi-core systems (example: Intel Core 2 Duo). They are not identical in asymmetric multi-core systems (example: IBM Cell processor). 13
+ CMP benefits With a shared on-chip cache memory, communication events can be reduced to just a handful of processor cycles. With such low latencies, communication delays have a much smaller impact on overall performance. Threads can also be much smaller and still be effective. Automatic parallelization becomes more feasible. 14
+ Core i7 and Duo Let us review these two Intel architectures… 15
+ Individual Core Architecture Intel Core Duo uses superscalar cores More than one instruction executed during a clock cycle. Intel Core i7 uses simultaneous multi-threading (SMT) Scales up number of threads supported (extended superscalar architecture) 4 SMT cores, each supporting 4 threads, appear as 16 cores (the i7 has 2 threads per core) [Figure: Core i7 and Core 2 Duo] 16
+ Intel x86 Multicore Organization - Core Duo 2006 Two x86 superscalar cores, shared L2 cache Dedicated L1 cache per core 32KB instruction and 32KB data Thermal control unit per core Manages chip heat dissipation using sensors and clock-speed throttling Maximizes performance within thermal constraints Improved ergonomics (quiet fan) Advanced Programmable Interrupt Controller (APIC) Inter-processor interrupts between cores Routes interrupts to appropriate core Includes timer so OS can self-interrupt a core 17
+ Intel x86 Multicore Organization - Core Duo Power Management Logic Monitors thermal conditions and CPU activity Adjusts voltage (and thus power consumption) Can switch on/off individual logic subsystems to save power Split-bus transactions can sleep on one end 2MB shared L2 cache Dynamic allocation MESI support for L1 caches Extended to support multiple Core Duo in SMP (not SMT) L2 data shared between local cores (fast) or external Bus interface is FSB 18
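The MESI support mentioned for the L1 caches can be illustrated with a toy model. This is a minimal sketch, assuming a simplified textbook version of MESI for a single cache line held by two private caches; the `Line` class and its methods are hypothetical names, and nothing here describes Intel's actual implementation.

```python
# Toy MESI model for ONE cache line across several private L1 caches.
# Assumption: simplified textbook protocol; all names are illustrative.

M, E, S, I = "Modified", "Exclusive", "Shared", "Invalid"


class Line:
    """Tracks the MESI state of one cache line in each core's L1 cache."""

    def __init__(self, n_cores):
        self.state = [I] * n_cores

    def read(self, core):
        if self.state[core] != I:
            return  # read hit: M, E and S all allow local reads
        others = [c for c in range(len(self.state)) if c != core]
        sharers = [c for c in others if self.state[c] != I]
        for c in sharers:
            # A snooped read downgrades M (after write-back) and E to S.
            self.state[c] = S
        self.state[core] = S if sharers else E

    def write(self, core):
        for c in range(len(self.state)):
            if c != core:
                self.state[c] = I  # invalidate every other copy
        self.state[core] = M


line = Line(n_cores=2)
line.read(0)
after_first_read = list(line.state)    # core 0 holds the only copy
line.read(1)
after_second_read = list(line.state)   # both cores now share the line
line.write(1)
after_write = list(line.state)         # core 1 owns a dirty copy
line.read(0)
after_reread = list(line.state)        # core 1 writes back; both share
print(after_first_read, after_second_read, after_write, after_reread)
```

The model omits the bus transactions and write-back traffic themselves; it only tracks which state each cache ends up in.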
+ Intel x86 Multicore Organization - Core i7 November 2008 Four x86 SMT processors Dedicated L2, shared L3 cache Speculative pre-fetch for caches On chip DDR3 memory controller Three 8 byte channels (192 bits) giving 32GB/s No front side bus (just like labs 1 & 2 with the SDRAM controller) QuickPath Interconnect Cache coherent point-to-point link High speed communications between processor chips Total bandwidth 25.6GB/s 19
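The two bandwidth figures on the Core i7 slide can be checked with simple arithmetic. The sketch below assumes transfer rates that are not stated on the slide: DDR3 at 1333 mega-transfers per second, and a QuickPath link at 6.4 giga-transfers per second carrying 2 bytes per transfer in each of two directions. The helper name `peak_bandwidth_gb_s` is hypothetical.

```python
# Back-of-envelope check of the bandwidth figures quoted for the Core i7.
# Assumptions (not on the slide): DDR3 at 1333 MT/s; QPI at 6.4 GT/s with
# 2 bytes per transfer in each of its two directions.

def peak_bandwidth_gb_s(channels, bytes_per_transfer, transfers_per_s):
    """Peak bandwidth in GB/s, taking 1 GB as 10**9 bytes."""
    return channels * bytes_per_transfer * transfers_per_s / 1e9


bus_width_bits = 3 * 8 * 8  # three 8-byte channels

memory_bw = peak_bandwidth_gb_s(channels=3, bytes_per_transfer=8,
                                transfers_per_s=1333e6)
qpi_bw = peak_bandwidth_gb_s(channels=2, bytes_per_transfer=2,
                             transfers_per_s=6.4e9)

print(f"bus width: {bus_width_bits} bits")
print(f"memory:    {memory_bw:.1f} GB/s")
print(f"QPI:       {qpi_bw:.1f} GB/s")
```

Under these assumed rates the results line up with the 192 bits, 32GB/s and 25.6GB/s quoted on the slide.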
+ What applications benefit from multi-core? Database servers Web servers Telecommunication markets Multimedia applications Scientific applications In general, applications with Thread-level parallelism (as opposed to instruction-level parallelism) 20
+ Multi-core architectures Replicate multiple processor cores on a single die. The cores fit on a single processor socket. 21
+ The cores run in parallel; within each core, several threads are time-sliced (as on a uniprocessor) [Figure: core 1 to core 4, each running several threads] 22
+ Programming for multi-core Programmers must use threads or processes. Write parallel algorithms. OS will map threads/processes to cores Spread the workload across multiple cores. 23
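The idea of spreading a workload across cores can be sketched as follows. This is a minimal example, assuming a made-up task (`count_primes`) and arbitrary chunk boundaries; it uses Python's standard `concurrent.futures` thread pool and leaves the mapping of threads to cores to the OS.

```python
# Split a workload into chunks and hand each chunk to a worker thread.
# Assumption: count_primes and the chunk sizes are invented for the demo.
import os
from concurrent.futures import ThreadPoolExecutor


def count_primes(bounds):
    """Count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


workers = os.cpu_count() or 1          # logical CPUs visible to the OS
limit = 20_000
step = -(-limit // workers)            # ceiling division
chunks = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]

with ThreadPoolExecutor(max_workers=workers) as pool:
    total = sum(pool.map(count_primes, chunks))

print(total)
```

In CPython the global interpreter lock serializes CPU-bound threads, so this shows the structure of the approach rather than a real speedup. Swapping in `ProcessPoolExecutor`, which offers the same `map` interface, is one way to occupy several cores.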
+ Examples Editing a photo while recording a TV show through a digital video recorder. Downloading software while running an anti-virus program. “Anything that can be threaded today will map efficiently to multi-core”. BUT: some applications are difficult to parallelize. Examples? Piped processes 24
+ Multiprocessor memory types Shared memory: In this model, there is one (large) common shared memory for all processors. Distributed memory: In this model, each processor has its own (small) local memory, and its content is not replicated anywhere else. 25
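The difference between the two models shows up in how workers hand back results. The sketch below is an illustration only: threads stand in for processors, and the names `shared_worker`, `message_worker` and `mailbox` are hypothetical. In the shared-memory half every worker writes into one common structure; in the distributed-memory half each worker holds a private copy of its data and sends only a message.

```python
# Two ways for workers to return results, mirroring the two memory models.
# Assumption: threads stand in for processors; names are illustrative.
import queue
import threading

data = list(range(1, 101))
halves = [data[:50], data[50:]]

# Shared memory: every worker writes into one common structure.
shared_sums = [0, 0]


def shared_worker(idx):
    shared_sums[idx] = sum(halves[idx])   # each worker owns one slot


# Distributed memory: workers keep private data and exchange messages.
mailbox = queue.Queue()


def message_worker(private_chunk):
    mailbox.put(sum(private_chunk))       # only the result is sent


threads = [threading.Thread(target=shared_worker, args=(i,))
           for i in range(2)]
threads += [threading.Thread(target=message_worker, args=(list(h),))
            for h in halves]
for t in threads:
    t.start()
for t in threads:
    t.join()

shared_total = sum(shared_sums)
message_total = mailbox.get() + mailbox.get()
print(shared_total, message_total)
```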
+ Microprocessor Design Taking the idea of superscalar operations to the next level, it is possible to put multiple microprocessor cores onto a single chip, and have the cores operate in parallel with one another. 26
+ Symmetric Multi-core Processor (SMP) A symmetric multi-core processor is one that has multiple cores on a single chip, and all of those cores are identical. Example: Intel i3, i5, i7. The i series can have either 2 cores on chip (“i3”) or 4 cores on chip (“i5/i7”). The cores in an i series CPU are identical and can function independently of one another. A mixture of scheduling software and hardware is required to farm tasks out to each core. 27
+ Symmetric Multi-core Processor Applications Personal Computers Servers/Clusters 28
+ Asymmetric Multi-core Processor An asymmetric multi-core processor is one that has multiple cores on a single chip, but those cores might be different designs. For instance, there could be 2 general purpose cores and 2 vector cores on a single chip. 29
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor Applications Supercomputing: ▪ The IBM Roadrunner supercomputer is a hybrid of general-purpose CISC Opteron processors and Cell processors. 30
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor Applications Home cinema ▪ Toshiba is considering producing HDTVs using Cell. They have already presented a system to decode 48 standard-definition MPEG-2 streams. This can enable a viewer to choose a channel based on dozens of thumbnail videos displayed on the screen at the same time. 31
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor Applications Video Processing Card ▪ Some companies, such as Leadtek, have plans to release a PCI-E card based upon the Cell to allow for "faster than real time" transcoding of H.264, MPEG-2 and MPEG-4 video. 32
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor Applications Console Video Games ▪ The first major commercial application of Cell was in Sony's PlayStation 3 game console. ▪ In this console the Cell processor is clocked at 3.2 GHz and has seven out of eight cores operational. 33
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor Future Based on its unique features, Cell can bridge the gap between conventional desktop processors and more specialized high-performance processors, such as the NVIDIA and ATI graphics processors (GPUs). 34
+ Challenges resulting from multi-core Aggravates memory wall Memory bandwidth ▪ Way to get data out of memory banks ▪ Way to get data into multi-core processor array Memory latency Fragments L3 cache Pins become strangle point ▪ Rate of pin growth projected to slow and flatten ▪ Rate of bandwidth per pin (pair) projected to grow slowly Requires mechanisms for efficient inter-processor coordination Synchronization Mutual exclusion Context switching 35
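Mutual exclusion, one of the coordination mechanisms listed above, can be sketched with a lock around a shared counter. This is a minimal illustration using Python's standard `threading` module; the `Counter` class and the thread and iteration counts are hypothetical choices for the demo.

```python
# Mutual exclusion: a lock makes the read-modify-write on shared state atomic.
# Assumption: Counter, 4 threads and 10,000 increments are demo choices.
import threading


class Counter:
    """A shared counter whose increments are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:          # only one thread at a time may enter
            self.value += 1


def worker(counter, n_times):
    for _ in range(n_times):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)
```

Without the lock, two threads could read the same old value and one update would be lost. The lock also serializes the increments, which is the coordination cost the slide refers to.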
+ Advantages of Multi-core Cache circuitry can operate at a much higher clock rate than is possible if the signals have to travel off-chip. Signals between different CPUs (cores) travel shorter distances, so they degrade less. These higher-quality signals allow more data to be sent in a given time period. A dual-core processor uses slightly less power than two coupled single-core processors. 36
+ Disadvantages of Multi-core The ability of multi-core processors to increase application performance depends on the use of multiple threads within applications. Most current video games will run faster on a 3 GHz single-core processor than on a 2 GHz dual-core processor (of the same core architecture). Because the two processing cores share the same system bus and memory bandwidth, the real-world performance advantage is limited. If a single core is close to being memory-bandwidth limited, going to dual-core might only give a 30% to 70% improvement. If memory bandwidth is not a problem, a 90% improvement can be expected. 37
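The 3 GHz single-core versus 2 GHz dual-core comparison can be made concrete with a simple model. The sketch below assumes an Amdahl-style model that is not part of the slides: a fraction `p` of the work parallelizes perfectly across the cores, the rest runs on one core, and run time scales inversely with clock frequency. Memory-bandwidth effects are ignored, and the function names are hypothetical.

```python
# Amdahl-style model (an assumption, not from the slides): a fraction p of
# the work parallelizes perfectly over the cores; the rest runs on one core.
# Run time is taken to be proportional to work / clock frequency.

def relative_time(p, cores, ghz):
    """Run time in arbitrary units for parallel fraction p."""
    return ((1 - p) + p / cores) / ghz


single_3ghz = relative_time(p=0.0, cores=1, ghz=3.0)


def dual_2ghz_speedup(p):
    """Speed of the 2 GHz dual-core relative to the 3 GHz single-core."""
    return single_3ghz / relative_time(p, cores=2, ghz=2.0)


for p in (0.0, 0.5, 2 / 3, 0.9):
    print(f"p={p:.2f}: dual-core is {dual_2ghz_speedup(p):.2f}x as fast")
```

Under this model the dual-core part only catches up once about two thirds of the work runs in parallel, which is consistent with mostly single-threaded games favoring the faster single core.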
+ Conclusion Multi-core processors represent an important new trend in computer architecture. Decreased power consumption and heat generation. Minimized wire lengths and interconnect latencies. They enable true thread-level parallelism with great energy efficiency and scalability. To utilize their full potential, applications will need to move from a single to a multi-threaded model. Parallel programming techniques likely to gain importance. The difficult problem is not building multi-core hardware, but programming it in a way that lets mainstream applications benefit from the continued growth in CPU performance. 38