Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 February 2, 2006 Session 6
Contents
Reservation Table
Latency Analysis
State Diagrams
MAL and its bounds
Delay Insertion
Throughput
Group Work
Reservation Table
A reservation table displays the time-space flow of data through the pipeline for one function evaluation.
A static pipeline is specified by a single reservation table.
A dynamic pipeline may be specified by multiple reservation tables.
Static Pipeline
[Figure: reservation table of a static pipeline, stages S1 to S4 against time, one X per stage]

Dynamic Pipeline
[Figure: two reservation tables over stages S1 to S3, one for function X and one for function Y]
Reservation Table (Cont.)
The number of columns in a reservation table is called the evaluation time of a given function.
The checkmarks in a row correspond to the time instants (cycles) in which a particular stage is used.
Multiple checkmarks in a row indicate repeated usage of the same stage in different cycles.
Contiguous checkmarks in a row indicate extended usage of a stage over more than one cycle.
Multiple checkmarks in one column indicate that multiple stages are used in parallel.
A dynamic pipeline may allow different initiations to follow a mix of reservation tables.

Reservation Table
[Figure: reservation table with rows A (three X marks), B (two), C (two), D (one)]
Latency Analysis
The number of cycles between two initiations is the latency between them.
A latency of k means that two initiations are separated by k cycles.
Collision: a resource conflict between two initiations.
Forbidden latencies: latencies that cause collisions.
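The definition of a forbidden latency can be sketched in a few lines of Python: two marks in the same row that are d cycles apart make latency d forbidden. The function name and the reservation table below are assumptions made for illustration; the table was chosen so that it reproduces the forbidden latencies 2, 4, 5, 7 quoted later for function X, and it is not taken from the slides.

```python
from itertools import combinations

def forbidden_latencies(table):
    """Return the sorted forbidden latencies of a reservation table.

    `table` maps a stage name to the 1-based cycles in which that stage is
    marked. Any two marks in the same row that are d cycles apart would
    collide if a second initiation started d cycles after the first.
    """
    forbidden = set()
    for cycles in table.values():
        for earlier, later in combinations(sorted(cycles), 2):
            forbidden.add(later - earlier)
    return sorted(forbidden)

# Hypothetical 3-stage, 8-column table (assumed, not from the slides).
table_x = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 5, 7]}
print(forbidden_latencies(table_x))  # [2, 4, 5, 7]
```

Every latency not in the returned list is permissible for this table.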
Collision with latency 2 & 5 in evaluating X
[Figure: two reservation tables over stages S1 to S3 showing initiations X1 and X2 colliding at latency 2 and at latency 5]
Latency Analysis (cont.)
Latency Sequence: a sequence of permissible latencies between successive initiations.
Latency Cycle: a latency sequence that repeats the same subsequence (cycle) indefinitely.
Example: the latency cycle (1, 8) gives the latency sequence 1, 8, 1, 8, 1, 8, …

Latency Analysis (cont.)
Average Latency (of a latency cycle): the sum of all latencies divided by the number of latencies along the cycle.
Constant Cycle: a latency cycle that contains only one latency value.
Objective: obtain the shortest average latency between initiations without causing collisions.
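As a small worked illustration of the average-latency definition, the sketch below evaluates it for the two latency cycles used in this session, (1, 8) and the constant cycle (6). The helper name is an assumption.

```python
from fractions import Fraction

def average_latency(cycle):
    """Sum of the latencies in the cycle divided by how many there are."""
    return Fraction(sum(cycle), len(cycle))

print(float(average_latency((1, 8))))  # 4.5
print(float(average_latency((6,))))    # 6.0
```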
Latency Cycle (1,8)
[Figure: reservation table showing successive initiations X1 to X6 scheduled with latency cycle (1, 8)]
Average Latency = (1+8)/2 = 4.5

Latency Cycle (6)
[Figure: reservation table showing successive initiations X1 to X4 scheduled with constant latency 6]
Average Latency = 6
Collision Vector
C = (C_m, C_{m-1}, …, C_2, C_1)
C_i = 1 if latency i causes a collision (forbidden)
C_i = 0 if latency i is permissible
C_m = 1 always, because m is the maximum forbidden latency
Maximum forbidden latency: m <= n-1, where n = number of columns in the reservation table
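The definition above translates directly into code. The following sketch (function name assumed) builds the collision vector as a bit string with C_m on the left and C_1 on the right, and applies it to the forbidden-latency sets given in this session for "X after X" and "Y after Y".

```python
def collision_vector(forbidden):
    """Bit string C_m ... C_1, where m is the largest forbidden latency."""
    m = max(forbidden)
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))

print(collision_vector({2, 4, 5, 7}))  # 1011010
print(collision_vector({2, 4}))        # 1010
```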
Collision Vector (X after X)
Forbidden Latencies: 2, 4, 5, 7
Collision Vector = 1011010

Collision Vector (Y after Y)
Forbidden Latencies: 2, 4
Collision Vector = 1010
Single Function Controller
[Figure: shift register loaded with the C.V. (X after X) through an OR gate; a gate grants X if the bit shifted out is 0]

Controller for a dual-function pipeline
[Figure: two shift registers, one fed through OR gates by C.V. (M after M) and C.V. (M after A), the other by C.V. (A after M) and C.V. (A after A); one gate grants M if its output bit is 0, the other grants A if its output bit is 0]
State Diagram
It specifies the permissible state transitions among successive initiations.
The collision vector corresponds to the initial state at time t = 1 (initial collision vector).
The next state comes at time t + p, where p is a permissible latency in the range 1 <= p < m.
Right Shift Register
The next state can be obtained with the help of an m-bit right shift register.
A 1 shifted out of the right end means a collision; a 0 means it is safe to allow an initiation.
Each 1-bit shift corresponds to an increase in the latency by 1.

The next state
The next state is obtained by bitwise ORing the initial collision vector with the shifted register.
[Figure: the initial C.V. (first state) ORed with the right-shifted C.V.]
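The shift-and-OR rule can be sketched with integers standing in for the shift register. The representation is an assumption made for illustration: bit i-1 of the integer holds C_i, so a right shift by p moves every entry p latencies closer. The example uses the collision vector 1011010 that follows from the forbidden latencies 2, 4, 5, 7 of function X.

```python
def permissible_latencies(state, m):
    """Latencies p in 1..m whose bit C_p is 0 in the current state."""
    return [p for p in range(1, m + 1) if not (state >> (p - 1)) & 1]

def next_state(state, initial_cv, p):
    """Shift the register right by p bits, then OR in the initial C.V."""
    return (state >> p) | initial_cv

M = 7                 # length of the collision vector for X
INITIAL = 0b1011010   # C_7 ... C_1

print(permissible_latencies(INITIAL, M))  # [1, 3, 6]
for p in permissible_latencies(INITIAL, M):
    print(p, format(next_state(INITIAL, INITIAL, p), "07b"))
```

Under these assumptions latency 1 leads to state 1111111, while latencies 3 and 6 both lead to state 1011011.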
State Diagram for X
[Figure: state diagram for X; edge labels are the permissible latencies, with an asterisk marking the minimum latency out of each state]
Cycles
Simple cycles: each state appears only once.
(3), (6), (8), (1, 8), (3, 8), and (6, 8)
Greedy cycles: simple cycles whose edges are all made with minimum latencies from their respective starting states.
(1, 8), (3): one of them gives the MAL.
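The cycle lists above can be reproduced mechanically. The sketch below is one possible way to do it, not the method used in the lecture: it builds the state diagram from the collision vector 1011010 of function X (derived from forbidden latencies 2, 4, 5, 7), enumerates simple cycles by depth-first search, and finds greedy cycles by always following the smallest latency. All function names are assumptions, and the label m + 1 (here 8) stands for "any latency greater than m", which always returns to the initial state.

```python
def build_state_diagram(cv):
    """Map each reachable state to {latency: next state}."""
    m, init = len(cv), int(cv, 2)
    diagram, todo = {}, [init]
    while todo:
        state = todo.pop()
        if state in diagram:
            continue
        edges = {p: (state >> p) | init
                 for p in range(1, m + 1) if not (state >> (p - 1)) & 1}
        edges[m + 1] = init  # latency beyond m empties the register
        diagram[state] = edges
        todo.extend(edges.values())
    return diagram

def simple_cycles(diagram):
    """Latency tuples of cycles in which no state repeats."""
    cycles = []

    def dfs(start, state, visited, lats):
        for p, nxt in sorted(diagram[state].items()):
            if nxt == start:
                cycles.append(tuple(lats + [p]))
            elif nxt > start and nxt not in visited:
                # Only visit states larger than start so each cycle is
                # reported once, beginning at its smallest state.
                dfs(start, nxt, visited | {nxt}, lats + [p])

    for s in sorted(diagram):
        dfs(s, s, {s}, [])
    return cycles

def greedy_cycles(diagram):
    """Cycles reached by always taking the minimum latency out of a state."""
    found = set()
    for start in diagram:
        states, lats, s = [], [], start
        while s not in states:
            states.append(s)
            p = min(diagram[s])
            lats.append(p)
            s = diagram[s][p]
        i = states.index(s)                      # where the walk closes
        cyc_states, cyc_lats = states[i:], lats[i:]
        k = cyc_states.index(min(cyc_states))    # rotate to smallest state
        found.add(tuple(cyc_lats[k:] + cyc_lats[:k]))
    return sorted(found)

diagram_x = build_state_diagram("1011010")
print(sorted(simple_cycles(diagram_x)))
print(greedy_cycles(diagram_x))  # [(1, 8), (3,)]
```

The average latencies of the two greedy cycles are 4.5 and 3, so under these assumptions the smaller one, 3, is the candidate for the MAL.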
MAL
Minimum Average Latency.
At least one of the greedy cycles will lead to the MAL.
Consider the state diagram for Y: the MAL is 3 (see diagram).
State Diagram for Y
[Figure: state diagram for Y; edge labels are the permissible latencies, with an asterisk marking the minimum latency out of each state]
Bounds on the MAL
MAL is lower-bounded by the maximum number of checkmarks in any row of the reservation table. (Shar, 1972)
MAL is less than or equal to the average latency of any greedy cycle in the state diagram. (Shar, 1972)
The average latency of any greedy cycle is upper-bounded by the number of 1's in the initial collision vector plus 1. This is also an upper bound on the MAL. (Shar, 1972)
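A quick numeric check of these bounds, using only figures from this session: the three-stage example has two marks in every row, forbidden latencies 1, 2, 4 (hence collision vector 1011) and MAL = 3; function X has collision vector 1011010 and greedy cycles (1, 8) and (3). The helper name is an assumption.

```python
def mal_bounds(checkmarks_per_row, collision_vector):
    """Return (lower, upper) bounds on the MAL.

    lower: maximum number of checkmarks in any reservation-table row.
    upper: number of 1's in the initial collision vector, plus 1.
    """
    lower = max(checkmarks_per_row)
    upper = collision_vector.count("1") + 1
    return lower, upper

# Three-stage example before delay insertion.
lower, upper = mal_bounds([2, 2, 2], "1011")
print(lower, 3, upper)  # 2 3 4

# Function X: the upper bound also covers every greedy-cycle average.
upper_x = "1011010".count("1") + 1
print(upper_x, [(1 + 8) / 2, 3.0])  # 5 [4.5, 3.0]
```

In the three-stage example the MAL of 3 sits strictly above the lower bound of 2, which is the gap that delay insertion tries to close.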
Delay Insertion
The purpose is to modify the reservation table, yielding a new collision vector.
This may lead to a modified state diagram, which may produce greedy cycles meeting the lower bound on the MAL.
Example
[Figure: three-stage pipeline S1, S2, S3 with output]

Example (Cont.)
[Figure: reservation table with two X marks in each of the rows S1, S2, S3]
Forbidden Latencies: 1, 2, 4
C.V. = 1011

Example (Cont.)
[Figure: state diagram with a 5+ edge returning to the initial state]
MAL = 3

Example (Cont.)
[Figure: the same pipeline S1, S2, S3 with inserted delay stages D1 and D2]

Example (Cont.)
[Figure: reservation table with two X marks in each of the rows S1, S2, S3 and one X mark in each of the rows D1, D2]
Forbidden: 2, 6
C.V. = 100010
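The effect of the inserted delays can be checked by computing the MAL from each collision vector. The sketch below is an assumption-laden illustration rather than the lecture's procedure: it takes the MAL to be the smallest average latency among greedy cycles (relying on the statement that at least one greedy cycle yields the MAL), and it uses the collision vectors 1011 and 100010 that follow from the forbidden latencies listed for the example. The function name is hypothetical.

```python
from fractions import Fraction

def mal_from_collision_vector(cv):
    """Smallest average latency over all greedy cycles of the state diagram."""
    m, init = len(cv), int(cv, 2)

    def moves(state):
        # Permissible latencies within 1..m, plus m + 1 for "m or more".
        out = {p: (state >> p) | init
               for p in range(1, m + 1) if not (state >> (p - 1)) & 1}
        out[m + 1] = init
        return out

    # Collect every state reachable from the initial collision vector.
    states, todo = set(), [init]
    while todo:
        s = todo.pop()
        if s not in states:
            states.add(s)
            todo.extend(moves(s).values())

    best = None
    for start in states:
        seen, lats, s = {}, [], start
        while s not in seen:
            seen[s] = len(lats)
            p = min(moves(s))        # greedy choice: minimum latency
            lats.append(p)
            s = moves(s)[p]
        cycle = lats[seen[s]:]       # drop the lead-in, keep the loop
        avg = Fraction(sum(cycle), len(cycle))
        best = avg if best is None or avg < best else best
    return best

print(mal_from_collision_vector("1011"))    # 3 (before delay insertion)
print(mal_from_collision_vector("100010"))  # 2 (after delay insertion)
```

With the delays inserted, the greedy cycle alternating latencies 1 and 3 averages 2, which equals the lower bound given by the two marks per row.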
Group Activity 1
Find the State Diagram
Pipeline Throughput
The average number of task initiations per clock cycle.
The inverse of the MAL.
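A short sketch of the throughput calculation. The per-cycle figure is the inverse of the MAL as stated above; converting it to initiations per second needs a clock period, and the 20 ns value used here is an assumption that reads the "t = 20 ns" of the group activity as the clock period. Function names are hypothetical.

```python
def throughput_per_cycle(mal):
    """Average number of initiations per clock cycle (inverse of the MAL)."""
    return 1 / mal

def throughput_per_second(mal, clock_period_ns):
    """Initiations per second for a given clock period in nanoseconds."""
    return 1e9 / (mal * clock_period_ns)

print(throughput_per_cycle(2))           # 0.5
print(throughput_per_second(2, 20) / 1e6)  # 25.0 (million initiations per second)
```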
Group Activity
[Figure: reservation table with two X marks in row S1 and one X mark in each of the rows S2 and S3]
Find: C.V., State Diagram, Simple Cycles, Greedy Cycles, MAL, Throughput (t = 20 ns)