CSE 8383 - Advanced Computer Architecture Week-4 Week of Feb 2, 2004 engr.smu.edu/~rewini/8383.

Contents: Reservation Table, Latency Analysis, State Diagrams, MAL and its bounds, Delay Insertion, Throughput, Group Work, Introduction to Multiprocessors

Reservation Table: A reservation table displays the time-space flow of data through the pipeline for one function evaluation. A static pipeline is specified by a single reservation table. A dynamic pipeline may be specified by multiple reservation tables.

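Static Pipeline (figure: reservation table with stages S1 to S4 as rows and time as columns, with four X marks)

Dynamic Pipeline (figure: two reservation tables over stages S1, S2, S3. Function X uses S1 in three cycles, S2 in two, and S3 in three; function Y uses S1 in two cycles, S2 in one, and S3 in three.)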

Reservation Table (Cont.): The number of columns in a reservation table is called the evaluation time of a given function. The checkmarks in a row correspond to the time instants (cycles) in which a particular stage is used. Multiple checkmarks in a row mean repeated usage of the same stage in different cycles.

Reservation Table (Cont.): Contiguous checkmarks in a row mean extended usage of a stage over more than one cycle. Multiple checkmarks in one column mean that multiple stages are used in parallel. A dynamic pipeline may allow different initiations to follow a mix of reservation tables.

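Reservation Table (figure: a reservation table over cycles 1 to 7 with stages A, B, C, D as rows; A is used in three cycles, B in two, C in two, and D in one)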

Latency Analysis: The number of cycles between two initiations is the latency between them. A latency of k means that two initiations are separated by k cycles. A collision is a resource conflict between two initiations. Latencies that cause collisions are called forbidden latencies.

Collision with latency 2 and 5 in evaluating X (figure: two charts over stages S1, S2, S3 showing initiations X1 and X2 colliding when separated by 2 cycles and by 5 cycles)
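The collision condition can be checked directly: two initiations separated by k cycles collide when some stage is marked both in cycle c and in cycle c + k of the reservation table. The following is a minimal Python sketch of that test. The mark positions in `TABLE_X` are an assumption (the figure itself does not survive in this transcript), chosen so that latencies 2 and 5 collide as the slide states; `collides` is a hypothetical helper name.

```python
# Hypothetical reservation table for function X: stage -> cycles (1-based)
# in which the stage is used. The mark positions are assumed for illustration.
TABLE_X = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 5, 7]}


def collides(table, k):
    """Return True if two initiations separated by k cycles need the same
    stage in the same cycle."""
    for cycles in table.values():
        used = set(cycles)
        # The later initiation occupies cycle c + k wherever the table marks c.
        if any(c + k in used for c in used):
            return True
    return False


for k in (2, 5, 3):
    print(f"latency {k}: collision = {collides(TABLE_X, k)}")
```

With the assumed table, latencies 2 and 5 are reported as collisions and latency 3 is not.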

Latency Analysis (cont.): A latency sequence is a sequence of permissible latencies between successive initiations. A latency cycle is a latency sequence that repeats the same subsequence (cycle) indefinitely. Example: the latency sequence 1, 8 repeated as the latency cycle (1, 8) gives 1, 8, 1, 8, 1, 8, …

Latency Analysis (cont.): The average latency of a latency cycle is the sum of all latencies divided by the number of latencies along the cycle. A constant cycle contains only one latency value. Objective: obtain the shortest average latency between initiations without causing collisions.

Latency Cycle (1,8) (figure: timing chart over cycles 1 to 21 showing successive initiations X1 to X6 on the pipeline stages). Average Latency = (1+8)/2 = 4.5

Latency Cycle (6) (figure: timing chart over cycles 1 to 21 showing successive initiations X1 to X4 on the pipeline stages). Average Latency = 6
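The two latency cycles above can be replayed in a few lines of Python. This is a sketch under stated assumptions: the mark positions in `TABLE_X` are assumed (they reproduce the forbidden latencies 2, 4, 5, 7 quoted for X but are not given in this transcript), and `average_latency` and `is_collision_free` are hypothetical helper names. The simulation reserves every (stage, cycle) slot used by each initiation and reports a collision if a slot is reserved twice.

```python
from itertools import cycle, islice

# Assumed reservation table for X: stage -> cycles (1-based) in which it is used.
TABLE_X = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 5, 7]}


def average_latency(latency_cycle):
    """Sum of the latencies divided by the number of latencies in the cycle."""
    return sum(latency_cycle) / len(latency_cycle)


def is_collision_free(table, latency_cycle, initiations=50):
    """Replay `initiations` task starts spaced by the repeating latency cycle
    and check that no (stage, cycle) slot is reserved twice."""
    busy = set()
    start = 0
    for latency in islice(cycle(latency_cycle), initiations):
        for stage, cycles in table.items():
            for c in cycles:
                slot = (stage, start + c)
                if slot in busy:
                    return False
                busy.add(slot)
        start += latency  # the next initiation begins `latency` cycles later
    return True


for lc in [(1, 8), (6,), (2,)]:
    print(lc, average_latency(lc), is_collision_free(TABLE_X, lc))
```

Under the assumed table, (1, 8) and (6) run without collisions, while the constant cycle (2) does not.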

Collision Vector: $C = (C_m, C_{m-1}, \ldots, C_2, C_1)$, where $C_i = 1$ if latency $i$ causes a collision (forbidden) and $C_i = 0$ if latency $i$ is permissible. $C_m = 1$ always, since $m$ is the maximum forbidden latency. The maximum forbidden latency satisfies $m \le n - 1$, where $n$ is the number of columns in the reservation table.

Collision Vector (X after X): Forbidden latencies: 2, 4, 5, 7. Collision vector = 1 0 1 1 0 1 0

Collision Vector (Y after Y): Forbidden latencies: 2, 4. Collision vector = 1 0 1 0
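The mapping from forbidden latencies to a collision vector is mechanical, and a short Python sketch makes the bit ordering explicit: the leftmost bit is C_m and the rightmost is C_1. The function name `collision_vector` is a hypothetical helper, not something from the course material.

```python
def collision_vector(forbidden):
    """Build the collision vector (C_m ... C_1) as a bit string,
    where m is the largest forbidden latency."""
    forbidden = set(forbidden)
    m = max(forbidden)
    # Walk from latency m down to latency 1 so that C_m comes first.
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


print(collision_vector([2, 4, 5, 7]))  # X after X -> 1011010
print(collision_vector([2, 4]))        # Y after Y -> 1010
```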

State Diagram: It specifies the permissible state transitions among successive initiations. The collision vector corresponds to the initial state at time t = 1 (initial collision vector). The next state comes at time t + p, where p is a permissible latency in the range 1 <= p < m.

Right Shift Register: The next state can be obtained with the help of an m-bit right shift register. Each 1-bit shift corresponds to an increase in the latency by 1. A 1 shifted out of the right end means a collision; a 0 shifted out means it is safe to allow an initiation.

The next state: The next state is obtained by bitwise ORing the initial collision vector with the shifted register. C.V. = 1 0 1 1 0 1 0 (first state)
0 1 0 1 1 0 1   C.V. 1-bit right shifted
1 0 1 1 0 1 0   initial C.V.
-------------   OR
1 1 1 1 1 1 1
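The shift-and-OR step above can be written as a one-line integer operation. The Python sketch below treats states as bit strings; `next_state` is a hypothetical helper name, and a latency larger than m simply shifts every bit out, which returns the initial collision vector.

```python
def next_state(state, initial, p):
    """Shift `state` right by p bits and OR it with the initial collision
    vector. Both arguments and the result are bit strings of equal length."""
    m = len(initial)
    shifted = int(state, 2) >> p
    return format(shifted | int(initial, 2), f"0{m}b")


CV_X = "1011010"
print(next_state(CV_X, CV_X, 1))  # 1111111, as in the worked example above
print(next_state(CV_X, CV_X, 3))  # 1011011
print(next_state(CV_X, CV_X, 8))  # 1011010 (all bits shifted out)
```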

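State Diagram for X: states 1011010 (initial), 1111111, and 1011011. From 1011010: latency 1* leads to 1111111; latencies 3 and 6 lead to 1011011; latency 8+ returns to 1011010. From 1111111: latency 8+ leads to 1011010. From 1011011: latencies 3* and 6 return to 1011011; latency 8+ leads to 1011010.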
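A state diagram like this one can be generated from the initial collision vector alone by applying the shift-and-OR rule to every permissible latency of every reachable state. The following Python sketch does that; the function names are hypothetical, and the single latency m + 1 is used to stand for the "8+" style edges (any latency above m leads back to the initial state).

```python
def next_state(state, initial, p):
    """Shift right by p bits, then OR with the initial collision vector."""
    return format((int(state, 2) >> p) | int(initial, 2), f"0{len(initial)}b")


def permissible_latencies(state):
    """Latencies whose bit C_i is 0, plus m + 1 standing for 'm + 1 or more'."""
    m = len(state)
    # C_i is the i-th character counted from the right end of the string.
    return [i for i in range(1, m + 1) if state[m - i] == "0"] + [m + 1]


def state_diagram(initial):
    """Return {state: {latency: next_state}} for all reachable states."""
    diagram = {}
    todo = [initial]
    while todo:
        state = todo.pop()
        if state in diagram:
            continue
        diagram[state] = {
            p: next_state(state, initial, p) for p in permissible_latencies(state)
        }
        todo.extend(diagram[state].values())
    return diagram


for state, edges in state_diagram("1011010").items():
    print(state, edges)
```

For the collision vector of X this yields three states, matching the diagram on the slide.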

Cycles: Simple cycles are cycles in which each state appears only once: (3), (6), (8), (1, 8), (3, 8), and (6, 8). Greedy cycles are simple cycles whose edges are all made with minimum latencies from their respective starting states: (1, 8) and (3). One of them yields the MAL.

MAL: Minimum average latency. At least one of the greedy cycles will lead to the MAL. For the state diagram of Y, the MAL is 3 (see diagram).

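State Diagram for Y: states 1010 (initial), 1111, and 1011. From 1010: latency 1* leads to 1111; latency 3 leads to 1011; latency 5+ returns to 1010. From 1111: latency 5+ leads to 1010. From 1011: latency 3* returns to 1011; latency 5+ leads to 1010.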
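Greedy cycles can be found by starting at each state and always following the minimum-latency edge until a state repeats; the MAL is then the smallest average latency among them. The Python sketch below applies this to the two state diagrams, written out by hand as dictionaries (the transitions follow from the shift-and-OR rule; 8 and 5 stand for the "8+" and "5+" edges). The names `greedy_cycles` and `mal` are hypothetical helpers.

```python
# State diagrams as {state: {latency: next_state}}.
DIAGRAM_X = {
    "1011010": {1: "1111111", 3: "1011011", 6: "1011011", 8: "1011010"},
    "1111111": {8: "1011010"},
    "1011011": {3: "1011011", 6: "1011011", 8: "1011010"},
}
DIAGRAM_Y = {
    "1010": {1: "1111", 3: "1011", 5: "1010"},
    "1111": {5: "1010"},
    "1011": {3: "1011", 5: "1010"},
}


def greedy_cycles(diagram):
    """Follow minimum-latency edges from every state and collect the cycles
    reached, each rotated to start with its smallest rotation."""
    cycles = set()
    for start in diagram:
        path, position = [], {}
        state = start
        while state not in position:
            position[state] = len(path)
            latency = min(diagram[state])
            path.append(latency)
            state = diagram[state][latency]
        cyc = path[position[state]:]  # drop any lead-in before the cycle
        rotations = [tuple(cyc[i:] + cyc[:i]) for i in range(len(cyc))]
        cycles.add(min(rotations))
    return cycles


def mal(diagram):
    """Minimum average latency over the greedy cycles."""
    return min(sum(c) / len(c) for c in greedy_cycles(diagram))


print(sorted(greedy_cycles(DIAGRAM_X)), mal(DIAGRAM_X))
print(sorted(greedy_cycles(DIAGRAM_Y)), mal(DIAGRAM_Y))
```

For X this reports the greedy cycles (1, 8) and (3); for Y it reports (1, 5) and (3), both with average latency 3.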

Bounds on the MAL: The MAL is lower-bounded by the maximum number of checkmarks in any row of the reservation table (Shar, 1972). The MAL is lower than or equal to the average latency of any greedy cycle in the state diagram (Shar, 1972). The average latency of any greedy cycle is upper-bounded by the number of 1's in the initial collision vector plus 1; this is also an upper bound on the MAL (Shar, 1972).
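As a worked check of these bounds, using only values from the earlier slides: X has at most three marks in a row, greedy cycles (3) and (1, 8) with average latencies 3 and 4.5, and four 1's in its collision vector 1011010; Y has at most three marks in a row, an MAL of 3, and two 1's in its collision vector 1010. Taking the MAL of X as the smaller of its two greedy-cycle averages:

```latex
\[
\underbrace{3}_{\text{max marks in a row}}
\;\le\; \mathrm{MAL}_X = 3
\;\le\; \underbrace{3,\ 4.5}_{\text{greedy cycles } (3),\ (1,8)}
\;\le\; \underbrace{4 + 1 = 5}_{\text{1's in } 1011010 \text{ plus } 1}
\]
\[
\underbrace{3}_{\text{max marks in a row}}
\;\le\; \mathrm{MAL}_Y = 3
\;\le\; \underbrace{2 + 1 = 3}_{\text{1's in } 1010 \text{ plus } 1}
\]
```

For Y the lower and upper bounds coincide, so the MAL of 3 cannot be improved without changing the reservation table.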

Delay Insertion: The purpose is to modify the reservation table, yielding a new collision vector. This may lead to a modified state diagram, which may produce greedy cycles meeting the lower bound on the MAL.

Example (figure: a three-stage pipeline S1, S2, S3 with its output)

Example (Cont.): reservation table over cycles 1 to 5 with two marks in each of the rows S1, S2, and S3. Forbidden latencies: 1, 2, 4. C.V. = 1 0 1 1

Example (Cont.): State diagram with the single state 1 0 1 1, a self-loop on latency 3*, and a self-loop on latency 5+. MAL = 3

Example (Cont.) (figure: the pipeline S1, S2, S3 and its output, with inserted delay elements D1 and D2)

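Example (Cont.): reservation table over cycles 1 to 7 with two marks in each of the rows S1, S2, and S3 and one mark in each of the delay rows D1 and D2. Forbidden latencies: 2, 6. C.V. = 1 0 0 0 1 0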
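The effect of the inserted delays on the forbidden latencies can be reproduced by taking, for every stage, the differences between each pair of cycles in which that stage is used. The Python sketch below does this for the example. The mark positions in `BEFORE` and `AFTER` are assumptions: the transcript keeps only the number of marks per row, so the positions were chosen to reproduce the forbidden latencies quoted on the slides. `forbidden_latencies` and `mal_lower_bound` are hypothetical helper names.

```python
from itertools import combinations

# Assumed mark positions: stage -> cycles (1-based) in which it is used.
BEFORE = {"S1": [1, 5], "S2": [2, 4], "S3": [3, 4]}
AFTER = {"S1": [1, 7], "S2": [2, 4], "S3": [3, 5], "D1": [4], "D2": [6]}


def forbidden_latencies(table):
    """Every distance between two marks in the same row is a forbidden latency."""
    forbidden = set()
    for cycles in table.values():
        for a, b in combinations(sorted(cycles), 2):
            forbidden.add(b - a)
    return sorted(forbidden)


def mal_lower_bound(table):
    """Lower bound on the MAL: maximum number of marks in any row."""
    return max(len(cycles) for cycles in table.values())


for name, table in (("before", BEFORE), ("after", AFTER)):
    print(name, forbidden_latencies(table), mal_lower_bound(table))
```

Under these assumed tables the forbidden latencies change from 1, 2, 4 to 2, 6 while the lower bound on the MAL stays at 2, which is what makes it possible for the modified state diagram to improve on the original MAL of 3.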

Group Activity 1: Find the state diagram.

Pipeline Throughput: The average number of task initiations per clock cycle. It is the inverse of the MAL.
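As a small numeric illustration, the throughput per clock cycle can be converted to initiations per second once a clock period is fixed. The Python sketch below uses the MAL of 3 found earlier for Y and assumes a 20 ns clock period purely for illustration; the function names are hypothetical.

```python
def throughput_per_cycle(mal):
    """Average number of initiations per clock cycle: the inverse of the MAL."""
    return 1.0 / mal


def throughput_per_second(mal, clock_period_s):
    """Initiations per second for a given clock period in seconds."""
    return 1.0 / (mal * clock_period_s)


mal = 3                 # MAL from the state diagram of Y
clock_period = 20e-9    # assumed clock period of 20 ns

print(f"{throughput_per_cycle(mal):.3f} initiations per cycle")
print(f"{throughput_per_second(mal, clock_period) / 1e6:.2f} million initiations per second")
```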

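Group Activity 2: For the reservation table over cycles 1 to 4 (S1 has two marks, S2 has one, and S3 has one), find the C.V., the state diagram, the simple cycles, the greedy cycles, the MAL, and the throughput (t = 20 ns).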

Multiprocessors

Introduction: Uniprocessor systems are not capable of delivering solutions to some problems in reasonable time. Multiple processors cooperate to jointly execute a single computational task in order to speed up its execution. Speed-up versus quality-up.

Architecture Background: Three major components: processors, memory modules, and the interconnection network.

Parallel and Distributed Computers: MIMD shared memory (bus based, switch based, CC-NUMA); MIMD distributed memory; SIMD computers; clusters; grid computing.

MIMD Shared Memory Systems (figure: processors P and memory modules M joined by interconnection networks)

Bus Based & switch based SM Systems (figure: processors P with caches C connected to global memory modules M)

Cache Coherent NUMA (figure: nodes, each with memory M, cache C, and processor P, joined by an interconnection network)

MIMD Distributed Memory Systems (figure: processors P and memory modules M joined by interconnection networks)

SIMD Computers (figure: a von Neumann computer and an array of processor and memory pairs, P and M, joined by some interconnection network)

Clusters (figure: nodes, each with memory M, cache C, processor P, I/O, and OS, under a middleware and programming environment layer, joined by an interconnection network)

Grids Grids are geographically distributed platforms for computation. They provide dependable, consistent, pervasive, and inexpensive access to high end computational capabilities.

Interconnection Network Taxonomy: Interconnection networks are static (1-D, 2-D, HC) or dynamic. Dynamic networks are bus-based (single, multiple) or switch-based (SS, MS, crossbar).
