Published by Brynn Buttles. Modified over 10 years ago.
1
Intel Multi-Core Technology
2
New Energy Efficiency by Parallel Processing
– Multiple cores in a single package
– Second-generation high-k + metal gate 32 nm technology
Intel Turbo Boost Technology
– Changes frequency depending on workload
Intel Hyper-Threading Technology
– Two threads on a single core
Tera-scale computing
– Intended to scale multi-core to 100 cores and beyond
3
Multi-Core and Hyper-Threading
Multi-core chips place two or more cores in a single package.
Multi-core chips do more work per clock cycle, so they can run at a lower clock frequency.
Hyper-Threading makes more efficient use of a single core by allowing multiple threads to share the core's resources.
4
Interaction with the Operating System
The OS perceives each core as a separate processor
The OS scheduler maps threads/processes to different cores
Most major operating systems support multi-core today: Windows, Linux, Mac OS X, …
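As a small illustration of the OS exposing each core (and each hyper-thread) as a separate processor, the sketch below asks the operating system how many logical processors it reports. The helper names `logical_cpu_count` and `usable_cpu_count` are made up for this example; `os.sched_getaffinity` is only available on some platforms (such as Linux), so the code falls back when it is missing.

```python
import os


def logical_cpu_count() -> int:
    """Number of logical processors the OS reports (at least 1)."""
    # os.cpu_count() may return None if the count cannot be determined.
    return os.cpu_count() or 1


def usable_cpu_count() -> int:
    """Logical processors this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):  # not available on every platform
        return len(os.sched_getaffinity(0))
    return logical_cpu_count()


if __name__ == "__main__":
    print("logical processors:", logical_cpu_count())
    print("usable by this process:", usable_cpu_count())
```

On a hyper-threaded machine the reported number counts hardware threads, not physical cores.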
5
Why multi-core?
Difficult to make single-core clock frequencies even higher
Deeply pipelined circuits:
– heat problems
– speed-of-light problems
– difficult design and verification
– large design teams necessary
– server farms need expensive air-conditioning
Many new applications are multithreaded
General trend in computer architecture (shift towards more parallelism)
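The heat argument can be made concrete with the usual approximation for dynamic (switching) power, P ≈ C · V² · f. The sketch below compares one core at full voltage and frequency with two cores at reduced voltage and frequency. All numbers are hypothetical, normalized values chosen for illustration, not measurements, and the throughput figure assumes perfectly parallel work.

```python
def dynamic_power(capacitance: float, voltage: float, frequency: float) -> float:
    """Approximate dynamic (switching) power: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


# Hypothetical, normalized operating points (not measured values).
single_core = dynamic_power(capacitance=1.0, voltage=1.0, frequency=1.0)
per_core = dynamic_power(capacitance=1.0, voltage=0.8, frequency=0.75)
dual_core = 2 * per_core

# Ideal throughput scales with cores * frequency (assumes perfectly parallel work).
single_throughput = 1 * 1.0
dual_throughput = 2 * 0.75

if __name__ == "__main__":
    print(f"single core: power {single_core:.2f}, throughput {single_throughput:.2f}")
    print(f"dual core:   power {dual_core:.2f}, throughput {dual_throughput:.2f}")
```

Under these assumed numbers the two slower cores draw slightly less power than the single fast core while offering more aggregate throughput, which is the trade the slides describe.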
6
Instruction-level parallelism
Parallelism at the machine-instruction level
The processor can re-order and pipeline instructions, split them into microinstructions, do aggressive branch prediction, etc.
Instruction-level parallelism enabled rapid increases in processor speeds over the last 15 years
7
Thread-level parallelism (TLP)
This is parallelism on a coarser scale
A server can serve each client in a separate thread (Web server, database server)
A computer game can do AI, graphics, and physics in three separate threads
Single-core superscalar processors cannot fully exploit TLP
Multi-core architectures are the next step in processor evolution: explicitly exploiting TLP
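A minimal sketch of the "one thread per client" structure, using Python's standard `concurrent.futures.ThreadPoolExecutor`. The `handle_request` and `serve` functions are hypothetical stand-ins for real server code. Note that in standard CPython builds the global interpreter lock lets only one thread execute Python bytecode at a time, so this shows the program structure rather than a guaranteed multi-core speedup; I/O-bound handlers still overlap.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(client_id: int) -> str:
    """Stand-in for per-client work (hypothetical handler)."""
    return f"client {client_id}: done"


def serve(client_ids, max_workers: int = 4):
    """Handle each client on a worker thread; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle_request, client_ids))


if __name__ == "__main__":
    for line in serve(range(3)):
        print(line)
```

The OS scheduler is free to place the worker threads on different cores, which is where thread-level parallelism pays off.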
8
What applications benefit from multi-core?
Database servers
Web servers (Web commerce)
Compilers
Multimedia applications
Scientific applications, CAD/CAM
In general, applications with thread-level parallelism (as opposed to instruction-level parallelism)
9
A technique complementary to multi-core: Simultaneous multithreading
Problem addressed: the processor pipeline can get stalled:
– Waiting for the result of a long floating-point (or integer) operation
– Waiting for data to arrive from memory
Other execution units wait unused
[Figure: core pipeline block diagram: BTB and I-TLB, Decoder, Trace Cache, uCode ROM, Rename/Alloc, Uop queues, Schedulers, Integer and Floating Point units, L1 D-Cache and D-TLB, L2 Cache and Control, Bus]
10
Simultaneous multithreading (SMT)
Permits multiple independent threads to execute SIMULTANEOUSLY on the SAME core
Weaves together multiple “threads” on the same core
Example: if one thread is waiting for a floating-point operation to complete, another thread can use the integer units
11
Without SMT, only a single thread can run at any given time
[Figure: core pipeline block diagram with Thread 1 (floating point) running]
12
Without SMT, only a single thread can run at any given time
[Figure: core pipeline block diagram with Thread 2 (integer operation) running]
13
SMT processor: both threads can run concurrently
[Figure: core pipeline block diagram with Thread 1 (floating point) and Thread 2 (integer operation) running at the same time]
14
SMT is not a “true” parallel processor
Improves thread throughput (e.g. by up to 30%)
OS and applications perceive each simultaneous thread as a separate “virtual processor”
The chip has only a single copy of each resource
Compare to multi-core: each core has its own copy of resources
15
Multi-core: threads can run on separate cores
[Figure: two copies of the core pipeline block diagram, one per core]
17
Combining Multi-core and SMT
Cores can be SMT-enabled (or not)
The different combinations:
– Single-core, non-SMT: standard uniprocessor
– Single-core, with SMT
– Multi-core, non-SMT
– Multi-core, with SMT
The number of SMT threads: 2, 4, or sometimes 8 simultaneous threads
Intel calls them “Hyper-Threads” (HT Technology)
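The four combinations can be sketched as a small calculation: the number of threads that can run concurrently is the number of cores times the SMT threads per core. The function names `logical_processors` and `describe` are invented for this example.

```python
from itertools import product


def logical_processors(cores: int, threads_per_core: int) -> int:
    """Hardware threads that can run concurrently: cores * SMT threads per core."""
    if cores < 1 or threads_per_core < 1:
        raise ValueError("cores and threads_per_core must be at least 1")
    return cores * threads_per_core


def describe(cores: int, threads_per_core: int) -> str:
    """Name the configuration using the categories from the slide."""
    core_part = "single-core" if cores == 1 else "multi-core"
    smt_part = "non-SMT" if threads_per_core == 1 else "with SMT"
    return f"{core_part}, {smt_part}"


if __name__ == "__main__":
    # Enumerate the four combinations for 1 or 2 cores, with or without 2-way SMT.
    for cores, threads in product((1, 2), (1, 2)):
        count = logical_processors(cores, threads)
        print(f"{describe(cores, threads)}: {count} concurrent thread(s)")
```

A dual-core chip with 2-way SMT gives four concurrent threads, and the OS sees four logical processors.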
18
SMT Dual-core: all four threads can run concurrently
[Figure: two copies of the core pipeline block diagram, one per core, each running two threads]
19
Comparison: multi-core vs SMT
Multi-core:
– Since there are several cores, each is smaller and not as powerful (but also easier to design and manufacture)
– However, great with thread-level parallelism
SMT:
– Can have one large and fast superscalar core
– Great performance on a single thread
– Mostly still only exploits instruction-level parallelism
20
The memory hierarchy for threading
If simultaneous multithreading only:
– all caches shared
Multi-core chips:
– L1 caches private
– L2 caches private in some architectures and shared in others
Memory is always shared
21
Intel Xeon Dual-core
Dual-core Intel Xeon processors
Each core is hyper-threaded
Private L1 caches
Shared L2 caches
[Figure: Core 0 and Core 1, each with hyper-threads and an L1 cache, above the L2 cache and memory]
22
Designs with private L2 caches
Both L1 and L2 are private
Examples: AMD Opteron, AMD Athlon, Intel Pentium D
[Figure: Core 0 and Core 1, each with an L1 cache and an L2 cache, above memory]
A design with L3 caches
Example: Intel Itanium 2
[Figure: Core 0 and Core 1 with L1, L2, and L3 caches above memory]
23
Private vs shared caches
Advantages of private:
– They are closer to the core, so access is faster
– Reduces contention
Advantages of shared:
– Threads on different cores can share the same cache data
– More cache space is available if a single (or a few) high-performance thread runs on the system
24
The cache coherence problem
Since we have private caches: how to keep the data consistent across caches?
Each core should perceive the memory as a monolithic array, shared by all the cores
MESI cache coherence protocol
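To make the MESI idea concrete, here is a simplified sketch that tracks the state of a single memory line across the private caches of several cores. It models only the state transitions (Modified, Exclusive, Shared, Invalid) on reads and writes, not data values, bus timing, or write-back traffic, and the `State` and `CacheLine` names are invented for this example rather than taken from any real implementation.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class CacheLine:
    """State of one memory line in the private cache of each core."""

    def __init__(self, num_cores: int):
        self.states = [State.INVALID] * num_cores

    def read(self, core: int) -> None:
        if self.states[core] is not State.INVALID:
            return  # read hit: no state change
        others_valid = False
        for other, state in enumerate(self.states):
            if other != core and state is not State.INVALID:
                others_valid = True
                # An M copy would be written back here; M and E drop to S.
                self.states[other] = State.SHARED
        self.states[core] = State.SHARED if others_valid else State.EXCLUSIVE

    def write(self, core: int) -> None:
        # Invalidate every other copy (a no-op if this core already holds M or E).
        for other in range(len(self.states)):
            if other != core:
                self.states[other] = State.INVALID
        self.states[core] = State.MODIFIED

    def snapshot(self) -> str:
        """One letter per core, e.g. 'SS' for two cores sharing the line."""
        return "".join(state.value for state in self.states)


if __name__ == "__main__":
    line = CacheLine(num_cores=2)
    line.read(0)
    line.read(1)
    line.write(1)
    print(line.snapshot())
```

The key invariant is that at most one cache holds the line in Modified or Exclusive state, and when one does, every other copy is Invalid.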
25
The Core i3 500 series products are dual-core; they have Hyper-Threading and support virtualization, but they do not have Turbo Boost.
The Core i5 600 series products are dual-core and have Hyper-Threading, Turbo Boost, virtualization, and the AES instruction set.
26
TDP: Thermal Design Power
27
The Turbo Boost Technology
When fewer cores are in use, transistors built into the chip disconnect the idle cores from the power bus
When a program needs only a single thread, the active core is automatically given extra voltage and overclocked for a short period of time until the job is done
Turbo Boost decides when to do what to maximize performance
28
The Turbo Boost Technology
Of course, consistently overclocking a machine can overheat the chip and render it useless fairly rapidly
Intel has ensured that its mobile Nehalem parts (codenamed Clarksfield) protect themselves through self-monitoring and shutting down if temperature limits are breached. (How about constantly shutting down the cores!)
29
Teaching a new course at UCCS, fall 2012
ECE5990/4990 Power Electronics
Graduate students welcome to take this course