Multi-Core Computing Osama Awwad Department of Computer Science

Slides:



Advertisements
Similar presentations
Multi-core processors. 2 Processor development till 2004 Out-of-order Instruction scheduling Out-of-order Instruction scheduling.
Advertisements

Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.
Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.
Structure of Computer Systems
MULTICORE PROCESSOR TECHNOLOGY.  Introduction  history  Why multi-core ?  What do you mean by multicore?  Multi core architecture  Comparison of.
1 Multi-core architectures Jernej Barbic , Spring 2007 May 3, 2007.
1 Pipelining for Multi- Core Architectures. 2 Multi-Core Technology Single Core Dual CoreMulti-Core + Cache + Cache Core 4 or more cores.
Multi-core processors. History In the early 1970’s the first Microprocessor was developed by Intel. It was a 4 bit machine that was named the 4004 The.
Carnegie Mellon /18-243: Introduction to Computer Systems Instructors: Bill Nace and Gregory Kesden (c) All Rights Reserved. All work.
Hyper-Threading, Chip multiprocessors and both Zoran Jovanovic.
Comp-TIA Standards.  AMD- (Advanced Micro Devices) An American multinational semiconductor company that develops computer processors and related technologies.
Simultaneous Multithreading: Maximizing On-Chip Parallelism Presented By: Daron Shrode Shey Liggett.
Practical PC, 7th Edition Chapter 17: Looking Under the Hood
Multi Core Processor Submitted by: Lizolen Pradhan
Company LOGO High Performance Processors Miguel J. González Blanco Miguel A. Padilla Puig Felix Rivera Rivas.
Multi-core architectures. Single-core computer Single-core CPU chip.
Multi-Core Architectures
Parallel Processing - introduction  Traditionally, the computer has been viewed as a sequential machine. This view of the computer has never been entirely.
Chapter 2 Parallel Architecture. Moore’s Law The number of transistors on a chip doubles every years. – Has been valid for over 40 years – Can’t.
High Performance Computing Processors Felix Noble Mirayma V. Rodriguez Agnes Velez Electric and Computer Engineer Department August 25, 2004.
1 Multi-core architectures Zonghua Gu Acknowledgement: Slides taken from Jernej Barbic’s lecture notes.
Hyper Threading (HT) and  OPs (Micro-Operations) Department of Computer Science Southern Illinois University Edwardsville Summer, 2015 Dr. Hiroshi Fujinoki.
Comparing Intel’s Core with AMD's K8 Microarchitecture IS 3313 December 14 th.
Outline  Over view  Design  Performance  Advantages and disadvantages  Examples  Conclusion  Bibliography.
Shashwat Shriparv InfinitySoft.
Multi-core processors. 2 Processor development till 2004 Out-of-order Instruction scheduling Out-of-order Instruction scheduling.
Thread Level Parallelism Since ILP has inherent limitations, can we exploit multithreading? –a thread is defined as a separate process with its own instructions.
Copyright © Curt Hill Parallelism in Processors Several Approaches.
MULTICORE PROCESSOR TECHNOLOGY.  Introduction  history  Why multi-core ?  What do you mean by multicore?  Multi core architecture  Comparison of.
Multi-Core Architectures 1. Single-Core Computer 2.
HyperThreading ● Improves processor performance under certain workloads by providing useful work for execution units that would otherwise be idle ● Duplicates.
E6200, Fall 07, Oct 24Ambale: CMP1 Bharath Ambale Venkatesh 10/24/2007.
Introduction Goal: connecting multiple computers to get higher performance – Multiprocessors – Scalability, availability, power efficiency Job-level (process-level)
Carnegie Mellon /18-243: Introduction to Computer Systems Instructors: Anthony Rowe and Gregory Kesden 27 th (and last) Lecture, 28 April 2011 Multi-Core.
Processor Performance & Parallelism Yashwant Malaiya Colorado State University With some PH stuff.
MAHARANA PRATAP COLLEGE OF TECHNOLOGY SEMINAR ON- COMPUTER PROCESSOR SUBJECT CODE: CS-307 Branch-CSE Sem- 3 rd SUBMITTED TO SUBMITTED BY.
Multi-Core CPUs Matt Kuehn. Roadmap ► Intel vs AMD ► Early multi-core processors ► Threads vs Physical Cores ► Multithreading and Multi-core processing.
Fall 2012 Parallel Computer Architecture Lecture 4: Multi-Core Processors Prof. Onur Mutlu Carnegie Mellon University 9/14/2012.
Pentium 4 Deeply pipelined processor supporting multiple issue with speculation and multi-threading 2004 version: 31 clock cycles from fetch to retire,
CPU Central Processing Unit
COMP 740: Computer Architecture and Implementation
Multiprocessing.
Visit for more Learning Resources
Parallel Processing - introduction
Multi-core processors
Multi-core processors
Computer Structure Multi-Threading
Hot Processors Of Today
INTEL HYPER THREADING TECHNOLOGY
Assembly Language for Intel-Based Computers, 5th Edition
Guide to Operating Systems, 5th Edition
Multi-core processors
What happens inside a CPU?
Phnom Penh International University (PPIU)
Hyperthreading Technology
Multi-Processing in High Performance Computer Architecture:
Multi-core architectures
BIC 10503: COMPUTER ARCHITECTURE
Microprocessor & Assembly Language
Multi-Core Architectures
CPE 631: Multithreading: Thread-Level Parallelism Within a Processor
Adaptive Single-Chip Multiprocessing
Chapter 1 Introduction.
1.1 The Characteristics of Contemporary Processors, Input, Output and Storage Devices Types of Processors.
Multithreaded Programming
Computer Evolution and Performance
/ Computer Architecture and Design
CSC3050 – Computer Architecture
CS 286 Computer Organization and Architecture
8 – Simultaneous Multithreading
Presentation transcript:

Multi-Core Computing Osama Awwad Department of Computer Science Western Michigan University Thursday, September 20, 2018

Multi-Core Computer A multi-core microprocessor is one that combines two or more independent processors into a single package, often a single integrated circuit (IC). A dual-core device contains two independent microprocessors. In general, multi-core microprocessors allow a computing device to exhibit some form of thread-level parallelism (TLP) without including multiple microprocessors in separate physical packages. 9/20/2018

Major Technology Providers The latest versions of many architectures use multi-core, including PA-RISC (PA-8800), IBM POWER (POWER7), SPARC (UltraSPARC IV), and various processors from Intel and AMD. There is some controversy as to whether multiple cores on a chip is the same thing as multiple processors. Major technology providers are divided on this issue. IBM considers its dual-core POWER4 and POWER5 to be two processors, just packaged together. Sun Microsystems, in contrast, considers its UltraSPARC IV to be a multi-threaded rather than multi-processor chip. Intel considers their multi-core designs to be a single processor. This is not an idle debate, because software is often more expensive when licensed for more processors. Microsoft, Red Hat Linux, Suse Linux will license their OS per chip, not per core 9/20/2018

Single-core computer 9/20/2018

Multi-core architectures Replicate multiple processor cores on a single die. Core 1 Core 2 Core 3 Core 4 9/20/2018 Multi-core CPU chip

Multi-core CPU chip The cores fit on a single processor socket Also called CMP (Chip Multi-Processor) core 1 2 3 4 9/20/2018

The cores run in parallel thread 1 thread 2 thread 3 thread 4 core 1 core 2 core 3 core 4 9/20/2018

Within each core, threads are time-sliced (just like on a uniprocessor) several threads several threads several threads several threads core 1 core 2 core 3 core 4 9/20/2018

Interaction with OS OS perceives each core as a separate processor OS scheduler maps threads/processes to different cores Most major OS support multi-core today 9/20/2018

Why multi-core ? Difficult to make single-core clock frequencies even higher Many new applications are multithreaded General trend in computer architecture (shift towards more parallelism) 9/20/2018

Instruction-level parallelism Parallelism at the machine-instruction level The processor can re-order, pipeline instructions, split them into microinstructions, do aggressive branch prediction, etc. Instruction-level parallelism enabled rapid increases in processor speeds over the last 15 years 9/20/2018

Thread-level parallelism (TLP) This is parallelism on a more coarser scale Server can serve each client in a separate thread (Web server, database server) A computer game can do AI, graphics, and physics in three separate threads Single-core superscalar processors cannot fully exploit TLP Multi-core architectures are the next step in processor evolution: explicitly exploiting TLP 9/20/2018

General context: Multiprocessors Multiprocessor is any computer with several processors SIMD Single instruction, multiple data Modern graphics cards MIMD Multiple instructions, multiple data Lemieux cluster, Pittsburgh supercomputing center 9/20/2018

Multiprocessor memory types Shared memory: In this model, there is one (large) common shared memory for all processors Distributed memory: In this model, each processor has its own (small) local memory, and its content is not replicated anywhere else 9/20/2018

Multi-core processor is a special kind of a multiprocessor: All processors are on the same chip Multi-core processors are MIMD: Different cores execute different threads (Multiple Instructions), operating on different parts of memory (Multiple Data). Multi-core is a shared memory multiprocessor: All cores share the same memory 9/20/2018

What applications benefit from multi-core? Database servers Web servers (Web commerce) Telecommuncation markets: 6WINDGate (datapath and control plane) Multimedia applications Scientific applications, CAD/CAM In general, applications with Thread-level parallelism (as opposed to instruction-level parallelism) Each can run on its own core 9/20/2018

More examples Editing a photo while recording a TV show through a digital video recorder Downloading software while running an anti-virus program “Anything that can be threaded today will map efficiently to multi-core” BUT: some applications difficult to parallelize 9/20/2018

Simultaneous multithreading (SMT) Permits multiple independent threads to execute SIMULTANEOUSLY on the SAME core Weaving together multiple “threads” on the same core Example: if one thread is waiting for a floating point operation to complete, another thread can use the integer units 9/20/2018

Without SMT, only a single thread can run at any given time L1 D-Cache D-TLB Integer Floating Point Schedulers L2 Cache and Control Uop queues Rename/Alloc BTB Trace Cache uCode ROM Decoder Bus BTB and I-TLB Thread 1: floating point 9/20/2018

Without SMT, only a single thread can run at any given time L1 D-Cache D-TLB Integer Floating Point Schedulers L2 Cache and Control Uop queues Rename/Alloc BTB Trace Cache uCode ROM Decoder Bus BTB and I-TLB Thread 2: integer operation 9/20/2018

SMT processor: both threads can run concurrently L1 D-Cache D-TLB Integer Floating Point Schedulers L2 Cache and Control Uop queues Rename/Alloc BTB Trace Cache uCode ROM Decoder Bus BTB and I-TLB Thread 2: integer operation Thread 1: floating point 9/20/2018

But: Can’t simultaneously use the same functional unit L1 D-Cache D-TLB Integer Floating Point Schedulers L2 Cache and Control Uop queues Rename/Alloc BTB Trace Cache uCode ROM Decoder This scenario is impossible with SMT on a single core (assuming a single integer unit) Bus BTB and I-TLB Thread 1 Thread 2 9/20/2018 IMPOSSIBLE

SMT not a “true” parallel processor Enables better threading (e.g. up to 30%) OS and applications perceive each simultaneous thread as a separate “virtual processor” The chip has only a single copy of each resource Compare to multi-core: each core has its own copy of resources 9/20/2018

Multi-core: threads can run on separate cores L1 D-Cache D-TLB L1 D-Cache D-TLB Integer Floating Point Integer Floating Point Schedulers Schedulers L2 Cache and Control Uop queues L2 Cache and Control Uop queues Rename/Alloc Rename/Alloc BTB Trace Cache uCode ROM BTB Trace Cache uCode ROM Decoder Decoder Bus Bus BTB and I-TLB BTB and I-TLB 9/20/2018 Thread 1 Thread 3

Multi-core: threads can run on separate cores L1 D-Cache D-TLB L1 D-Cache D-TLB Integer Floating Point Integer Floating Point Schedulers Schedulers L2 Cache and Control Uop queues L2 Cache and Control Uop queues Rename/Alloc Rename/Alloc BTB Trace Cache uCode ROM BTB Trace Cache uCode ROM Decoder Decoder Bus Bus BTB and I-TLB BTB and I-TLB 9/20/2018 Thread 2 Thread 4

Combining Multi-core and SMT Cores can be SMT-enabled (or not) The different combinations: Single-core, non-SMT: standard uniprocessor Single-core, with SMT Multi-core, non-SMT Multi-core, with SMT: The number of SMT threads: 2, 4, or sometimes 8 simultaneous threads Intel calls them “hyper-threads” 9/20/2018

SMT Dual-core: all four threads can run concurrently  L1 D-Cache D-TLB L1 D-Cache D-TLB Integer Floating Point Integer Floating Point Schedulers Schedulers L2 Cache and Control Uop queues L2 Cache and Control Uop queues Rename/Alloc Rename/Alloc BTB Trace Cache uCode ROM BTB Trace Cache uCode ROM Decoder Decoder Bus Bus BTB and I-TLB BTB and I-TLB 9/20/2018 Thread 1 Thread 2 Thread 3 Thread 4

Comparison: multi-core vs SMT Since there are several cores, each is smaller and not as powerful (but also easier to design and manufacture) However, great with thread-level parallelism SMT Can have one large and fast superscalar core Great performance on a single thread Mostly still only exploits instruction-level parallelism 9/20/2018

The memory hierarchy If simultaneous multithreading only: all caches shared Multi-core chips: L1 caches private L2 caches private in some architectures and shared in others Memory is always shared 9/20/2018

Dual-core Intel Xeon processors hyper-threads Dual-core Intel Xeon processors Each core is hyper-threaded Private L1 caches Shared L2 caches C O R E 1 C O R E 0 L1 cache L1 cache L2 cache memory 9/20/2018

Designs with private L2 caches C O R E 1 C O R E 0 C O R E 1 C O R E 0 L1 cache L1 cache L1 cache L1 cache L2 cache L2 cache L2 cache L2 cache L3 cache L3 cache memory memory Both L1 and L2 are private Examples: AMD Opteron, AMD Athlon, Intel Pentium D A design with L3 caches Example: Intel Itanium 2 9/20/2018

Windows Task Manager core 2 core 1 9/20/2018

Advantages /Disadvantages 9/20/2018

Advantages Cache coherency circuitry can operate at a much higher clock rate than is possible if the signals have to travel off-chip Signals between different CPUs travel shorter distances, those signals degrade less These higher quality signals allow more data to be sent in a given time period since individual signals can be shorter and do not need to be repeated as often A dual-core processor uses slightly less power than two coupled single-core processors 9/20/2018

Disadvantages Ability of multi-core processors to increase application performance depends on the use of multiple threads within applications. Most Current video games will run faster on a 3 GHz single-core processor than on a 2GHz dual-core processor (of the same core architecture Two processing cores sharing the same system bus and memory bandwidth limits the real-world performance advantage. If a single core is close to being memory bandwidth limited, going to dual-core might only give 30% to 70% improvement If memory bandwidth is not a problem, a 90% improvement can be expected 9/20/2018

Conclusion Multi-core chips an important new trend in computer architecture Several new multi-core chips in design phases Parallel programming techniques likely to gain importance 9/20/2018

References http://en.wikipedia.org/wiki/Multi-core_(computing) www.princeton.edu/~jdonald/research/hyperthreading/garg_report.pdf www.cs.cmu.edu/~barbic/multi-core.ppt 9/20/2018