Advanced Computer Architecture, Unit 1
Lecture 4
By Rohit Khokher, Department of Computer Science, Sharda University, Greater Noida, India

High Performance Architectures
- Who needs high-performance systems?
- How do you achieve high performance?
- How do you analyze or evaluate performance?

Outline of my lecture
- Classification
- ILP architectures
- Data-parallel architectures
- Process-level parallel architectures
- Issues in parallel architectures
- The cache coherence problem
- Interconnection networks

Classification of Parallel Computing
- Flynn's classification
- Feng's classification
- Händler's classification
- Modern (Sima, Fountain & Kacsuk) classification

Feng's Classification
- Feng [1972] proposed a scheme that classifies computer architectures by their degree of parallelism.
- The maximum number of bits the system can process per unit of time is called the 'maximum degree of parallelism'.
- Feng's scheme distinguishes serial and parallel operation at both the bit level and the word level.
- The four classes are:
  - WSBS (Word Serial, Bit Serial)
  - WPBS (Word Parallel, Bit Serial), e.g., STARAN
  - WSBP (Word Serial, Bit Parallel), e.g., conventional computers
  - WPBP (Word Parallel, Bit Parallel), e.g., ILLIAC IV

[Figure: Feng's classification chart plotting word length against bit-slice length; plotted machines include MPP, STARAN, C.mmp, PDP-11, IBM 370, ILLIAC IV, and CRAY-1.]
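In Feng's scheme, a machine's maximum degree of parallelism is simply the product of its two chart coordinates: word length times bit-slice length. A minimal Python sketch of this computation, using commonly cited coordinates for a few of the machines in the chart (the exact values are assumptions for illustration):

# Feng's maximum degree of parallelism = word length x bit-slice length.
# The coordinates below are commonly cited textbook values (assumptions).
machines = {
    "PDP-11":    (16, 1),    # WSBP: word serial, bit parallel
    "STARAN":    (1, 256),   # WPBS: word parallel, bit serial
    "C.mmp":     (16, 16),   # WPBP: word parallel, bit parallel
    "ILLIAC IV": (64, 64),   # WPBP: word parallel, bit parallel
}
for name, (word_length, bit_slice_length) in machines.items():
    print(f"{name:10s} max degree of parallelism = {word_length * bit_slice_length}")

A word-serial, bit-parallel machine such as the PDP-11 processes one word at a time, so its degree of parallelism is just its word length.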

Modern Classification
Parallel architectures divide into two families:
- Data-parallel architectures
- Function-parallel architectures

Data-Parallel Architectures
Data-parallel architectures divide into:
- Vector architectures
- Associative and neural architectures
- SIMDs
- Systolic architectures

Function-Parallel Architectures
Function-parallel architectures divide into:
- Instruction-level parallel architectures (ILPs): pipelined processors, VLIWs, superscalar processors
- Thread-level parallel architectures
- Process-level parallel architectures (MIMDs): distributed-memory MIMD, shared-memory MIMD

Motivation
- Non-pipelined design
  - Single-cycle implementation
    - The cycle time is set by the slowest instruction
    - Every instruction takes the same amount of time
  - Multi-cycle implementation
    - Divide the execution of an instruction into multiple steps
    - Each instruction may take a variable number of steps (clock cycles)
- Pipelined design
  - Divide the execution of an instruction into multiple steps (stages)
  - Overlap the execution of different instructions in different stages
  - In each cycle, a different instruction executes in each stage
  - For example, in a 5-stage pipeline (Fetch-Decode-Read-Execute-Write), 5 instructions execute concurrently, one per stage
  - One instruction completes every cycle (instead of every 5 cycles)
  - This can increase the throughput of the machine up to 5 times

Example of Pipelining
5-stage pipeline: Fetch - Decode - Read - Execute - Write

LD  R1 <- A      FDREW
ADD R5, R3, R4    FDREW
LD  R2 <- B        FDREW
SUB R8, R6, R7      FDREW
ST  C <- R5          FDREW

Non-pipelined processor: 25 cycles = number of instructions (5) * number of stages (5)
Pipelined processor: 9 cycles = start-up latency (4) + number of instructions (5)
(The first 4 cycles fill the pipeline; the last 4 drain it.)
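The two totals follow from the ideal, hazard-free cycle-count formulas. A minimal Python sketch (illustrative, not part of the original slides):

def nonpipelined_cycles(n_instrs, n_stages):
    # each instruction runs through every stage before the next one starts
    return n_instrs * n_stages

def pipelined_cycles(n_instrs, n_stages):
    # (n_stages - 1) start-up cycles fill the pipeline; after that,
    # one instruction completes every cycle
    return (n_stages - 1) + n_instrs

print(nonpipelined_cycles(5, 5))  # 25
print(pipelined_cycles(5, 5))     # 9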

Data Dependences
- Read-After-Write (RAW) dependence
  - True dependence
  - The consumer must read the data only after the producer has produced it
- Write-After-Write (WAW) dependence
  - Output dependence
  - The result of a later instruction must not be overwritten by an earlier instruction
- Write-After-Read (WAR) dependence
  - Anti dependence
  - A value must not be overwritten before its consumer has read it
- Notes
  - WAW and WAR are called false dependences; they arise from storage conflicts
  - All three types of dependences can occur for both registers and memory locations
  - Dependences are characteristics of programs, not machines
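Because each dependence type is determined purely by the register sets an instruction reads and writes, the classification can be automated. A minimal Python sketch; the set-based instruction encoding is an assumption made for illustration:

def classify(earlier, later):
    # earlier/later are (writes, reads) pairs of register-name sets
    e_writes, e_reads = earlier
    l_writes, l_reads = later
    deps = []
    if e_writes & l_reads:
        deps.append("RAW (true)")
    if e_writes & l_writes:
        deps.append("WAW (output)")
    if e_reads & l_writes:
        deps.append("WAR (anti)")
    return deps

# Instructions 4 and 5 of the example on the next slide:
add = ({"R4"}, {"R3", "R2"})   # 4: ADD R4, R3, R2
sub = ({"R3"}, {"R3", "R4"})   # 5: SUB R3, R3, R4
print(classify(add, sub))      # ['RAW (true)', 'WAR (anti)']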

Example 1
1 LD   R1 <- A
2 LD   R2 <- B
3 MULT R3, R1, R2
4 ADD  R4, R3, R2
5 SUB  R3, R3, R4
6 ST   A <- R3

RAW dependences: 1->3, 2->3, 2->4, 3->4, 3->5, 4->5, 5->6
WAW dependence:  3->5
WAR dependences: 4->5, 1->6 (memory location A)

Pipeline diagram (RAW dependences cause stalls, i.e., pipeline bubbles / data hazards):

1 LD   R1 <- A      FDREW
2 LD   R2 <- B       FDREW
3 MULT R3, R1, R2     FDRRREW
4 ADD  R4, R3, R2      FDDDRRREW
5 SUB  R3, R3, R4       FFFDDDRRREW
6 ST   A <- R3             FFFDDDRRREW

Execution time: 18 cycles = start-up latency (4) + number of instructions (6) + pipeline bubbles (8)
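The 18-cycle schedule can be reproduced with a small in-order pipeline model: every stage holds at most one instruction per cycle, and, with no forwarding, an instruction may leave the Read stage only in the cycle after each of its producers has completed Write. The sketch below is one plausible model consistent with the slide's timing, not the lecture's own tool:

STAGES = ["F", "D", "R", "E", "W"]

def schedule(prog):
    # prog: (name, dests, srcs) triples in program order.
    # Returns, per instruction, the cycle in which it enters each stage.
    write_cycle = {}   # register -> cycle its producer occupies W
    prev = None        # stage-entry cycles of the preceding instruction
    timings = []
    for name, dests, srcs in prog:
        t = {}
        for i, s in enumerate(STAGES):
            c = 1 if i == 0 else t[STAGES[i - 1]] + 1    # at least one cycle per stage
            if prev is not None:
                if i + 1 < len(STAGES):
                    c = max(c, prev[STAGES[i + 1]])      # stage frees when predecessor moves on
                else:
                    c = max(c, prev["W"] + 1)            # one writeback per cycle
            if s == "E":                                 # stall in R until operands are written
                for r in srcs:
                    if r in write_cycle:
                        c = max(c, write_cycle[r] + 2)   # final R cycle follows producer's W
            t[s] = c
        for r in dests:
            write_cycle[r] = t["W"]
        timings.append((name, t))
        prev = t
    return timings

prog = [
    ("LD R1<-A",      ["R1"], []),
    ("LD R2<-B",      ["R2"], []),
    ("MULT R3,R1,R2", ["R3"], ["R1", "R2"]),
    ("ADD R4,R3,R2",  ["R4"], ["R3", "R2"]),
    ("SUB R3,R3,R4",  ["R3"], ["R3", "R4"]),
    ("ST A<-R3",      [],     ["R3"]),
]
timings = schedule(prog)
for name, t in timings:
    print(f"{name:14s} {t}")
print("total cycles:", timings[-1][1]["W"])   # 18 = 4 start-up + 6 instrs + 8 bubbles

The model yields Write cycles 5, 6, 9, 12, 15, and 18 for the six instructions, matching the eight bubbles shown in the diagram above.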