Published by Alban Brooks. Modified over 9 years ago.
1
C SINGH, JUNE 7-8, 2010, IWW 2010, ISTANBUL, TURKEY
Advanced Computers Architecture, UNIT 1
Advanced Computers Architecture, Lecture 4
By Rohit Khokher, Department of Computer Science, Sharda University, Greater Noida, India
2
High Performance Architectures
  Who needs high-performance systems?
  How do you achieve high performance?
  How do you analyze or evaluate performance?
3
Outline of my lecture
  Classification
  ILP Architectures
  Data Parallel Architectures
  Process level Parallel Architectures
  Issues in parallel architectures
    Cache coherence problem
    Interconnection networks
4
Classification of Parallel Computing
  Flynn’s Classification
  Feng’s Classification
  Händler’s Classification
  Modern (Sima, Fountain & Kacsuk) Classification
5
Feng’s Classification
  Feng [1972] also proposed a scheme that classifies computer architectures by their degree of parallelism.
  The maximum number of bits that the system can process per unit of time is called the ‘maximum degree of parallelism’.
  Feng’s scheme distinguishes serial and parallel processing at the bit level and at the word level.
  The four types in Feng’s classification are:
    WSBS (Word Serial Bit Serial)
    WPBS (Word Parallel Bit Serial), e.g. STARAN
    WSBP (Word Serial Bit Parallel), e.g. conventional computers
    WPBP (Word Parallel Bit Parallel), e.g. ILLIAC IV
6
[Figure: Feng’s classification chart. Axes: word length (ticks 1, 16, 32, 64) and bit slice length (ticks 1, 16, 64, 256, 16K). Machines plotted: MPP, STARAN, C.mmp, PDP-11, IBM 370, ILLIAC IV, CRAY-1.]
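The four classes can be read directly off the two axes of the chart. The sketch below is only an illustration: the function names are made up for this note, and the (word length, bit slice length) pairs are the values commonly quoted in textbooks for these machines, not values recovered from the slide, so treat them as assumptions. It takes the maximum degree of parallelism to be the product of the two lengths.

```python
def feng_class(word_length: int, bit_slice_length: int) -> str:
    """Return WSBS, WPBS, WSBP or WPBP for a (word length, bit slice length) pair."""
    if word_length < 1 or bit_slice_length < 1:
        raise ValueError("both lengths must be at least 1")
    # More than one word handled at a time -> word parallel.
    word_part = "WP" if bit_slice_length > 1 else "WS"
    # More than one bit of a word handled at a time -> bit parallel.
    bit_part = "BP" if word_length > 1 else "BS"
    return word_part + bit_part


def max_degree_of_parallelism(word_length: int, bit_slice_length: int) -> int:
    """Maximum number of bits processed per unit of time."""
    return word_length * bit_slice_length


# Assumed coordinates (word length, bit slice length); not read from the slide.
MACHINES = {
    "PDP-11": (16, 1),
    "IBM 370": (32, 1),
    "CRAY-1": (64, 1),
    "C.mmp": (16, 16),
    "ILLIAC IV": (64, 64),
    "STARAN": (1, 256),
    "MPP": (1, 16384),
}

if __name__ == "__main__":
    for name, (n, m) in MACHINES.items():
        print(name, feng_class(n, m), max_degree_of_parallelism(n, m))
```

With these assumed coordinates, STARAN comes out as WPBS and ILLIAC IV as WPBP, which agrees with the examples given for the four types.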
7
Modern Classification
  Parallel architectures
    Data-parallel architectures
    Function-parallel architectures
8
Data Parallel Architectures
  Data-parallel architectures
    Vector architectures
    Associative and neural architectures
    SIMDs
    Systolic architectures
9
Function Parallel Architectures
  Function-parallel architectures
    Instruction-level parallel architectures (ILPs)
      Pipelined processors
      VLIWs
      Superscalar processors
    Thread-level parallel architectures
    Process-level parallel architectures (MIMDs)
      Distributed Memory MIMD
      Shared Memory MIMD
10
Motivation
  Non-pipelined design
    Single-cycle implementation
      The cycle time depends on the slowest instruction
      Every instruction takes the same amount of time
    Multi-cycle implementation
      Divide the execution of an instruction into multiple steps
      Each instruction may take a variable number of steps (clock cycles)
  Pipelined design
    Divide the execution of an instruction into multiple steps (stages)
    Overlap the execution of different instructions in different stages
    In each cycle, a different instruction occupies each stage
    For example, in a 5-stage pipeline (Fetch-Decode-Read-Execute-Write), 5 instructions are executed concurrently in 5 different pipeline stages
    Once the pipeline is full, one instruction completes every cycle (instead of one every 5 cycles)
    Can increase the throughput of the machine by up to 5 times
11
Example of Pipeline
  5-stage pipeline: Fetch - Decode - Read - Execute - Write (F D R E W)
  Instructions:
    LD R1 <- A
    ADD R5, R3, R4
    LD R2 <- B
    SUB R8, R6, R7
    ST C <- R5
  [Timing diagram: every instruction passes through F D R E W; the first cycles fill the pipeline and the last cycles drain it.]
  Non-pipelined processor: 25 cycles = number of instrs (5) * number of stages (5)
  Pipelined processor: 9 cycles = start-up latency (4) + number of instrs (5)
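The two cycle counts above follow from simple formulas: n instructions on a k-stage machine take n * k cycles without pipelining and (k - 1) + n cycles with an ideal, stall-free pipeline. A minimal sketch of that arithmetic, with function names chosen here purely for illustration:

```python
def non_pipelined_cycles(num_instrs: int, num_stages: int) -> int:
    """Each instruction runs through all stages before the next one starts."""
    return num_instrs * num_stages


def pipelined_cycles(num_instrs: int, num_stages: int) -> int:
    """Ideal pipeline without stalls: start-up latency plus one cycle per instruction."""
    if num_instrs == 0:
        return 0
    return (num_stages - 1) + num_instrs


def speedup(num_instrs: int, num_stages: int) -> float:
    """Ratio of non-pipelined to pipelined cycle counts."""
    return non_pipelined_cycles(num_instrs, num_stages) / pipelined_cycles(
        num_instrs, num_stages
    )


if __name__ == "__main__":
    print(non_pipelined_cycles(5, 5))   # the 5-instruction example
    print(pipelined_cycles(5, 5))
    for n in (5, 50, 5000):
        print(n, round(speedup(n, 5), 2))
```

For the five-instruction example the speedup is only 25 / 9, because the start-up latency is large relative to the program. The ratio approaches the number of stages (5) only as the instruction count grows.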
12
Data Dependence
  Read-After-Write (RAW) dependence
    True dependence
    The consumer must read the data after the producer has produced it
  Write-After-Write (WAW) dependence
    Output dependence
    The result of a later instruction must not be overwritten by an earlier instruction
  Write-After-Read (WAR) dependence
    Anti dependence
    A value must not be overwritten before its consumer has read it
  Notes
    WAW & WAR are called false dependences; they arise from storage conflicts
    All three types of dependences can occur for both registers and memory locations
    Dependences are characteristics of programs (not machines)
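The three definitions can be turned into a small checker. The sketch below is an illustration under stated assumptions: each instruction is given by hand as a pair (locations written, locations read), registers and memory locations are treated alike, and a dependence is reported only against the most recent writer (for RAW and WAW) or against readers since the most recent write (for WAR). The function name and data layout are hypothetical. The sample program is the six-instruction sequence used in this lecture's dependence example.

```python
def find_dependences(program):
    """Return sorted lists of (earlier, later) instruction numbers for RAW, WAW and WAR."""
    raw, waw, war = set(), set(), set()
    last_writer = {}   # location -> number of the most recent instruction writing it
    readers = {}       # location -> instructions that read it since its last write
    for i, (writes, reads) in enumerate(program, start=1):
        for loc in reads:
            if loc in last_writer:               # reads a value produced earlier
                raw.add((last_writer[loc], i))
        for loc in writes:
            if loc in last_writer:               # overwrites an earlier result
                waw.add((last_writer[loc], i))
            for r in readers.get(loc, []):       # overwrites a value read earlier
                war.add((r, i))
        # Record this instruction's reads first, then let its writes start a new value.
        for loc in reads:
            readers.setdefault(loc, []).append(i)
        for loc in writes:
            last_writer[loc] = i
            readers[loc] = []
    return sorted(raw), sorted(waw), sorted(war)


# (writes, reads) for each instruction of the lecture's example.
EXAMPLE = [
    (("R1",), ("A",)),          # 1 LD   R1 <- A
    (("R2",), ("B",)),          # 2 LD   R2 <- B
    (("R3",), ("R1", "R2")),    # 3 MULT R3, R1, R2
    (("R4",), ("R3", "R2")),    # 4 ADD  R4, R3, R2
    (("R3",), ("R3", "R4")),    # 5 SUB  R3, R3, R4
    (("A",), ("R3",)),          # 6 ST   A <- R3
]

if __name__ == "__main__":
    raw, waw, war = find_dependences(EXAMPLE)
    print("RAW:", raw)
    print("WAW:", waw)
    print("WAR:", war)
```

Because the checker looks only at which locations each instruction reads and writes, its result does not depend on any pipeline organisation, which is the sense in which dependences belong to the program rather than the machine.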
13
Example
  Example 1
    1 LD R1 <- A
    2 LD R2 <- B
    3 MULT R3, R1, R2
    4 ADD R4, R3, R2
    5 SUB R3, R3, R4
    6 ST A <- R3
  RAW dependence: 1 -> 3, 2 -> 3, 2 -> 4, 3 -> 4, 3 -> 5, 4 -> 5, 5 -> 6
  WAW dependence: 3 -> 5
  WAR dependence: 4 -> 5, 1 -> 6 (memory location A)
  [Pipeline timing diagram: instructions 1 and 2 proceed as F D R E W; later instructions repeat the F, D or R stage while they wait.]
  Pipeline bubbles due to RAW dependences (Data Hazards)
  Execution Time: 18 cycles = start-up latency (4) + number of instrs (6) + number of pipeline bubbles (8)
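The 18-cycle figure can be checked with a small simulation. The model below is an assumption made for this note rather than something the slide spells out: instructions issue in order, each stage holds one instruction, there is no forwarding, and an instruction can finish its R stage only in a cycle after the W stage of every instruction that produces one of its operands. Instructions are given as (locations written, locations read); all names are hypothetical.

```python
STAGES = "FDREW"  # Fetch, Decode, Read, Execute, Write


def simulate(program):
    """Return (total_cycles, rows) with one timing-diagram row per instruction."""
    if not program:
        return 0, []
    write_cycle = {}   # location -> cycle in which its latest writer is in W
    prev_last = None   # last cycle the previous instruction spent in each stage
    spans = []
    for writes, reads in program:
        first, last = [], []
        for s, stage in enumerate(STAGES):
            if s > 0:
                start = last[s - 1] + 1        # right after leaving the previous stage
            elif prev_last is not None:
                start = prev_last[0] + 1       # fetch once the previous instr leaves F
            else:
                start = 1
            end = start
            # Stay in this stage until the previous instruction has left the next one.
            if prev_last is not None and s + 1 < len(STAGES):
                end = max(end, prev_last[s + 1])
            # Data hazard: operands are readable only after their producer's W cycle.
            if stage == "R":
                for loc in reads:
                    if loc in write_cycle:
                        end = max(end, write_cycle[loc] + 1)
            first.append(start)
            last.append(end)
        for loc in writes:
            write_cycle[loc] = last[-1]
        spans.append((first, last))
        prev_last = last
    total = spans[-1][1][-1]
    rows = []
    for first, last in spans:
        row = [" "] * total
        for s, stage in enumerate(STAGES):
            for cycle in range(first[s], last[s] + 1):
                row[cycle - 1] = stage
        rows.append("".join(row))
    return total, rows


EXAMPLE_1 = [
    (("R1",), ("A",)),          # 1 LD   R1 <- A
    (("R2",), ("B",)),          # 2 LD   R2 <- B
    (("R3",), ("R1", "R2")),    # 3 MULT R3, R1, R2
    (("R4",), ("R3", "R2")),    # 4 ADD  R4, R3, R2
    (("R3",), ("R3", "R4")),    # 5 SUB  R3, R3, R4
    (("A",), ("R3",)),          # 6 ST   A <- R3
]

if __name__ == "__main__":
    total, rows = simulate(EXAMPLE_1)
    for row in rows:
        print(row)
    bubbles = total - (len(STAGES) - 1) - len(EXAMPLE_1)
    print("cycles:", total, "bubbles:", bubbles)
```

Under these assumptions, each of instructions 3 to 6 waits two extra cycles for its producer, giving 8 bubbles and 18 cycles in total. A design with forwarding would need a different hazard rule in the R stage.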