Computer Architecture Lecture 4 17th May, 2006


Computer Architecture Lecture 4, 17th May 2006. Abhinav Agarwal and Veeramani V.

Recap
- Simple pipeline: hazards and their solutions
- Data hazards: static compiler techniques (load delay slot, etc.); hardware solutions (data forwarding, out-of-order execution, register renaming)
- Control hazards: static compiler techniques; hardware speculation through branch predictors
- Structural hazards: increase hardware resources
- Superscalar out-of-order execution
- Memory organisation

Memory Organization in Processors
- Caches sit inside the chip: faster because they are 'closer' to the pipeline
- Built from SRAM cells
- They hold recently-used data
- Data is kept in 'blocks' (cache lines)

Rationale behind Caches
- Principle of spatial locality
- Principle of temporal locality
- Replacement policy (LRU, LFU, etc.)
- Principle of inclusivity
(A short code sketch of the two locality principles follows.)
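The two locality principles are easiest to see in code. A minimal C sketch, not from the lecture (the array size and function names are illustrative), shows why traversal order matters when data is cached in blocks:

```c
#include <stddef.h>

#define N 1024

/* Row-major traversal touches consecutive addresses, so each cache
 * block (line) brought in is fully used: spatial locality.  The running
 * total 'sum' is reused every iteration: temporal locality.          */
long sum_row_major(const int a[N][N]) {
    long sum = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            sum += a[i][j];        /* neighbours of a[i][j] are already cached */
    return sum;
}

/* Column-major traversal jumps N*sizeof(int) bytes between accesses,
 * so most of each fetched block goes unused and the miss rate rises. */
long sum_col_major(const int a[N][N]) {
    long sum = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}
```

Both functions compute the same result; on a cached machine the row-major version is typically much faster.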

Outline
- Instruction-level parallelism
- Thread-level parallelism
- Fine-grained multithreading
- Simultaneous multithreading
- Sharable resources and non-sharable resources
- Chip multiprocessors
- Some design issues

Instruction-Level Parallelism
- Overlap the execution of many instructions
- ILP techniques try to reduce the impact of data and control dependencies
- Independent instructions can be issued out of order (see the sketch below)
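As an illustration (the function is my own example, not from the lecture), independent operations expose ILP while dependent ones form a chain the hardware must respect:

```c
/* The three multiplies are mutually independent, so a superscalar,
 * out-of-order core can issue them in the same cycle.  The final adds
 * form a data-dependence chain and must wait for their inputs.       */
double dot3(const double a[3], const double b[3]) {
    double p0 = a[0] * b[0];   /* independent */
    double p1 = a[1] * b[1];   /* independent */
    double p2 = a[2] * b[2];   /* independent */
    return (p0 + p1) + p2;     /* dependent chain limits further ILP  */
}
```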

Thread-Level Parallelism
- Instructions from two different threads are far more independent of each other
- Issuing from multiple threads gives better utilization of the functional units
- Multithreaded throughput improves substantially (see the sketch below)
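A minimal C sketch of two independent instruction streams, assuming POSIX threads (the loop bounds and variable names are just illustrative). An SMT or multi-core chip can run both streams at once and keep more functional units busy than either stream could alone:

```c
#include <pthread.h>
#include <stdio.h>

static void *count_up(void *arg) {
    long n = 0;
    for (long i = 0; i < 100000000L; i++) n += i;   /* stream A */
    *(long *)arg = n;
    return NULL;
}

int main(void) {
    pthread_t t;
    long a = 0, b = 0;
    pthread_create(&t, NULL, count_up, &a);          /* thread 1 */
    for (long i = 0; i < 100000000L; i++) b += i;    /* thread 0 (main) */
    pthread_join(t, NULL);
    printf("%ld %ld\n", a, b);
    return 0;
}
```

Compile with `-pthread`; the two loops have no data dependences between them, which is exactly the independence TLP exploits.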

A simple pipeline [figure] (source: EV8 DEC Alpha processor, (c) Intel)

Superscalar pipeline [figure] (source: EV8 DEC Alpha processor, (c) Intel)

Speculative execution [figure] (source: EV8 DEC Alpha processor, (c) Intel)

Fine-Grained Multithreading [figure] (source: EV8 DEC Alpha processor, (c) Intel)

Simultaneous Multithreading [figure] (source: EV8 DEC Alpha processor, (c) Intel)

Out-of-Order Execution [figure] (source: EV8 DEC Alpha processor, (c) Intel)

SMT pipeline [figure] (source: EV8 DEC Alpha processor, (c) Intel)

Resources – Replication Required (per thread)
- Program counters
- Register maps

Resources – Replication Not Required (can be shared among threads)
- Register file (rename space)
- Instruction queue
- Branch predictor
- First- and second-level caches
- etc.
(A sketch of this replicated/shared split appears below.)
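A toy C model of the split, purely illustrative (the structure names and sizes are assumptions, not taken from the lecture or any real design): per-thread front-end state must be replicated, while the large back-end structures are shared by all hardware threads.

```c
#define SMT_THREADS   2
#define ARCH_REGS    32
#define PHYS_REGS   128

struct thread_context {               /* replicated per hardware thread */
    unsigned long pc;                 /* program counter                */
    unsigned char reg_map[ARCH_REGS]; /* architectural->physical map    */
};

struct smt_core {
    struct thread_context ctx[SMT_THREADS];  /* replication required    */

    /* shared among threads: replication not required */
    unsigned long phys_regfile[PHYS_REGS];   /* rename register file    */
    unsigned long instr_queue[64];           /* shared issue queue      */
    unsigned char branch_predictor[4096];    /* shared predictor tables */
    /* ...first- and second-level caches are likewise shared...         */
};
```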

Chip Multiprocessor
- Transistor counts keep going up
- Put more than one core on the chip
- The cores still share the caches

Some Design Issues
- Trade-off in choosing the cache size: power vs. performance
- Super-pipelining trade-off: higher clock frequency vs. larger speculation (misprediction) penalty, plus higher power
- Power consumption
(A worked example of the super-pipelining trade-off follows.)
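A back-of-the-envelope C calculation of the super-pipelining trade-off; all the numbers (branch frequency, misprediction rate, stage counts, clock rates) are assumed for illustration and are not from the lecture:

```c
#include <stdio.h>

/* Deeper pipelining raises clock frequency but also lengthens the branch
 * misprediction penalty, so the net speedup is smaller than the frequency
 * ratio.  Effective CPI = 1 + (mispredictions per instr) * flush cycles. */
int main(void) {
    double branch_freq = 0.20, mispredict = 0.10;   /* assumed values */

    /* shallow: 10-stage pipeline at 2 GHz, ~10-cycle flush */
    double cpi_a  = 1.0 + branch_freq * mispredict * 10.0;
    double time_a = cpi_a / 2.0e9;

    /* deep: 20-stage pipeline at 3 GHz, ~20-cycle flush */
    double cpi_b  = 1.0 + branch_freq * mispredict * 20.0;
    double time_b = cpi_b / 3.0e9;

    printf("time per instruction: %.3g s vs %.3g s (speedup %.2fx, not 1.50x)\n",
           time_a, time_b, time_a / time_b);
    return 0;
}
```

With these assumed numbers the 1.5x clock gain delivers only about a 1.29x speedup, and the faster, deeper pipeline burns more power as well.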

Novel Techniques for Power
- Clock gating
- Run non-critical elements at a slower clock
- Reduce voltage swings (operating voltage)
- Sleep mode / standby mode
- Dynamic voltage and frequency scaling (DVFS); a sketch of a simple governor follows
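An illustrative sketch of a DVFS decision loop in C. All names, thresholds and frequency/voltage pairs here are assumptions; real governors live in the OS or firmware and program hardware registers rather than printing a choice:

```c
#include <stdio.h>

struct vf_point { unsigned mhz; unsigned millivolts; };

static const struct vf_point table[] = {
    {  600,  900 },   /* low-power operating point        */
    { 1200, 1000 },
    { 2000, 1100 },   /* high-performance operating point */
};

/* Pick an operating point from recent utilisation: run slower (and at a
 * lower voltage) when the core is mostly idle.  Since dynamic power scales
 * roughly with f * V^2, lowering both saves a disproportionate amount.   */
static struct vf_point choose_point(double utilisation) {
    if (utilisation < 0.3) return table[0];
    if (utilisation < 0.7) return table[1];
    return table[2];
}

int main(void) {
    double u = 0.25;                    /* e.g. core was 25% busy last interval */
    struct vf_point p = choose_point(u);
    printf("set %u MHz @ %u mV\n", p.mhz, p.millivolts);
    return 0;
}
```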