Download presentation
Presentation is loading. Please wait.
Published byRandolf Ross Modified over 8 years ago
1
Computer Architecture: A Constructive Approach Pipelining combinational circuits Teacher: Yoav Etsion Taken (with permission) from Arvind et al.*, Massachusetts Institute of Technology Derek Chiou, The University of Texas at Austin * Joel Emer, Li-Shiuan Peh, Murali Vijayaraghavan, Asif Khan, Abhinav Agarwal, Myron King 1
2
Contents FFT: Another complex combinational circuit and its folded implementations New Bluespec concepts structure type Overloading Elastic vs Inelastic pipelines for arithmetic circuits Valid/Invalid data and the Maybe type Multi-rule systems Effect of Concurrency properties of FIFOs on pipelines Bluespec execution model 2
3
Combinational IFFT in0 … in1 in2 in63 in3 in4 Bfly4 x16 Bfly4 … … out0 … out1 out2 out63 out3 out4 Permute All numbers are complex and represented as two sixteen bit quantities. Fixed-point arithmetic is used to reduce area, power,... * * * * + - - + + - - + *j t2t2 t0t0 t3t3 t1t1 3
4
4-way Butterfly Node function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) x); t’s (twiddle coefficients) are mathematically derivable constants for each bfly4 and depend upon the position of bfly4 the in the network * * * * + - - + + - - + *i x0x1x2x3x0x1x2x3 t0t1t2t3t0t1t2t3 4
5
BSV code: 4-way Butterfly function Vector#(4,Complex#(s)) bfly4 (Vector#(4,Complex#(s)) t, Vector#(4,Complex#(s)) x); Vector#(4,Complex#(s)) m, y, z; m[0] = x[0] * t[0]; m[1] = x[1] * t[1]; m[2] = x[2] * t[2]; m[3] = x[3] * t[3]; y[0] = m[0] + m[2]; y[1] = m[0] – m[2]; y[2] = m[1] + m[3]; y[3] = i*(m[1] – m[3]); z[0] = y[0] + y[2]; z[1] = y[1] + y[3]; z[2] = y[0] – y[2]; z[3] = y[1] – y[3]; return(z); endfunction Polymorphic code: works on any type of numbers for which *, + and - have been defined * * * * + - - + + - - + *i my z Note: Vector does not mean storage; just a group of wires with names 5
6
Language notes: Sequential assignments Sometimes it is convenient to reassign a variable (x is zero everywhere except in bits 4 and 8): Order matters! This will usually result in introduction of muxes in a circuit as the following example illustrates: Bit#(32) x = 0; let y = x+1; if(p) x = 100; let z = x+1; x 0 100 p +1 z y Bit#(32) x = 0; x[4] = 1; x[8] = 1; 6
7
Complex Arithmetic Addition z R = x R + y R z I = x I + y I Multiplication z R = x R * y R - x I * y I z I = x R * y I + x I * y R 7
8
Representing complex numbers as a struct typedef struct{ Int#(t) r; Int#(t) i; } Complex#(numeric type t) deriving (Eq,Bits); Notice the Complex type is parameterized by the size of Int chosen to represent its real and imaginary parts If x is a struct then its fields can be selected by writing x.r and x.i 8
9
Bluespec code for Addition typedef struct { Int#(t) r; Int#(t) i; } Complex#(numeric type t) deriving (Eq,Bits); function Complex#(t) cAdd (Complex#(t) x, Complex#(t) y); Int#(t) real = x.r + y.r; Int#(t) imag = x.i + y.i; return(Complex{r:real, i:imag}); endfunction What is the type of this + ? 9
10
Overloading (Type classes) The same symbol can be used to represent different but related operators using Type classes A type class groups a bunch of types with similarly named operations. For example, the type class Arith requires that each type belonging to this type class has operators +,-, *, / etc. defined We can declare Complex type to be an instance of Arith type class 10
11
Overloading +, * instance Arith#(Complex#(t)); function Complex#(t) \+ (Complex#(t) x, Complex#(t) y); Int#(t) real = x.r + y.r; Int#(t) imag = x.i + y.i; return(Complex{r:real, i:imag}); endfunction function Complex#(t) \* (Complex#(t) x, Complex#(t) y); Int#(t) real = x.r * y.r – x.i * y.i; Int#(t) imag = x.r * y.i + x.i * y.r; return(Complex{r:real, i:imag}); endfunction … endinstance The type context allows the compiler to pick the appropriate definition of an operator 11
12
Combinational IFFT in0 … in1 in2 in63 in3 in4 Bfly4 x16 Bfly4 … … out0 … out1 out2 out63 out3 out4 Permute stage_f function repeat stage_f three times function Vector#(64, Complex#(n)) stage_f (Bit#(2) stage, Vector#(64, Complex#(n)) stage_in); function Vector#(64, Complex#(n)) ifft (Vector#(64, Complex#(n)) in_data); 12
13
BSV Code: Combinational IFFT function Vector#(64, Complex#(n)) ifft (Vector#(64, Complex#(n)) in_data); //Declare vectors Vector#(4,Vector#(64, Complex#(n))) stage_data; stage_data[0] = in_data; for (Bit#(2) stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]); return(stage_data[3]); endfunction The for-loop is unfolded and stage_f is inlined during static elaboration Remember: no notion of loops or procedures during execution 13
14
BSV Code: Combinational IFFT- Unfolded function Vector#(64, Complex#(n)) ifft (Vector#(64, Complex#(n)) in_data); //Declare vectors Vector#(4,Vector#(64, Complex#(n))) stage_data; stage_data[0] = in_data; for (Bit#(2) stage = 0; stage < 3; stage = stage + 1 stage_data[stage+1] = stage_f(stage,stage_data[stage] return(stage_data[3]); endfunction Stage_f can be inlined now; it could have been inlined before loop unfolding also. stage_data[1] = stage_f(0,stage_data[0]); stage_data[2] = stage_f(1,stage_data[1]); stage_data[3] = stage_f(2,stage_data[2]); 14
15
Bluespec Code for stage_f function Vector#(64, Complex#(n)) stage_f (Bit#(2) stage, Vector#(64, Complex#(n)) stage_in); Vector#(64, Complex#(n)) stage_temp, stage_out; for (Integer i = 0; i < 16; i = i + 1) begin Integer idx = i * 4; Vector#(4, Complex#(n)) x; x[0] = stage_in[idx]; x[1] = stage_in[idx+1]; x[2] = stage_in[idx+2]; x[3] = stage_in[idx+3]; let twid = getTwiddle(stage, fromInteger(i)); let y = bfly4(twid, x); stage_temp[idx] = y[0]; stage_temp[idx+1] = y[1]; stage_temp[idx+2] = y[2]; stage_temp[idx+3] = y[3]; end //Permutation for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; return(stage_out); endfunction twid’s are mathematically derivable constants 15
16
Combinational IFFT Lot of area and long combinational delay Folded or multi-cycle version can save area and reduce the combinational delay but throughput per clock cycle gets worse Pipelining: a method to increase the circuit throughput by evaluating multiple IFFTs in0 … in1 in2 in63 in3 in4 Bfly4 x16 Bfly4 … … out0 … out1 out2 out63 out3 out4 Permute IFFT i IFFT i-1 IFFT i+1 3 different datasets in the pipeline 16
17
Inelastic pipeline x sReg1inQ f0f1f2 sReg2outQ rule sync-pipeline (True); inQ.deq(); sReg1 <= f0(inQ.first()); sReg2 <= f1(sReg1); outQ.enq(f2(sReg2)); endrule This is real IFFT code; just replace f0, f1 and f2 with stage_f code This rule can fire only if Rule Atomicity: Either all or none of the state elements inQ, outQ, sReg1 and sReg2 will be updated - inQ has an element - outQ has space 17
18
Stage functions f1, f2, and f3 function f0(x); return (stage_f(0,x)); endfunction function f1(x); return (stage_f(1,x)); endfunction function f2(x); return (stage_f(2,x)); endfunction 18
19
Implicit Conditions Bluespec figures out if a rule can fire based on both explicit conditions and implicit conditions For example, if a rule must enqueue into a FIFO, the FIFO cannot be full 19
20
Inelastic pipeline Making implicit guard conditions explicit x sReg1inQ f0f1f2 sReg2outQ rule sync-pipeline (!inQ.empty() && !outQ.full); inQ.deq(); sReg1 <= f0(inQ.first()); sReg2 <= f1(sReg1); outQ.enq(f2(sReg2)); endrule Suppose sReg1 and sReg2 have data, outQ is not full but inQ is empty. What behavior do you expect? Leave green and red data in the pipeline? 20
21
Pipeline bubbles x sReg1inQ f0f1f2 sReg2outQ rule sync-pipeline (True); inQ.deq(); sReg1 <= f0(inQ.first()); sReg2 <= f1(sReg1); outQ.enq(f2(sReg2)); endrule Red and Green tokens should move even if there is nothing in inQ! Modify the rule to deal with these conditions Also if there is no token in sReg2 then nothing should be enqueued in the outQ Valid bits or the Maybe type 21
22
Explicit encoding of Valid/Invalid data rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= f0(inQ.first()); inQ.deq(); sReg1f <= Valid end else sReg1f <= Invalid; sReg2 <= f1(sReg1); sReg2f <= sReg1f; if (sReg2f == Valid) outQ.enq(f2(sReg2)); endrule sReg1 sReg2 x inQ f0f1f2 outQ Valid/Invalid 22 Must compile with --aggressive-conditions
23
When is this rule enabled? inQ sReg1f sReg2foutQ NEVVNF NEVVF NEVINF NEVIF NEIVNF NEIVF NEIINF NEIIF EVVNF EVVF EVINF EVIF EIVNF EIVF EIINF EIIF Yes No Yes No Yes No Yes No Yes1 Yes Yes1 = yes but no change inQ sReg1f sReg2foutQ rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= f0(inQ.first()); inQ.deq(); sReg1f <= Valid end else sReg1f <= Invalid; sReg2 <= f1(sReg1); sReg2f <= sReg1f; if (sReg2f == Valid) outQ.enq(f2(sReg2)); endrule 23
24
The Maybe type A useful type to capture valid/invalid data typedef union tagged { void Invalid; data_T Valid; } Maybe#(type data_T); data valid/invalid Registers contain Maybe type values Some useful functions on Maybe type: isValid(x) returns true if x is Valid validValue(x) returns the data value in x if x is Valid 24
25
Using the Maybe type typedef union tagged { void Invalid; data_T Valid; } Maybe#(type data_T); data valid/invalid Registers contain Maybe type values rule sync-pipeline if (True); if (inQ.notEmpty()) begin sReg1 <= Valid f0(inQ.first()); inQ.deq(); end else sReg1 <= Invalid; sReg2 <= isValid(sReg1)? Valid f1(validValue(sReg1)) : Invalid; if isValid(sReg2) outQ.enq(f2(validValue(sReg2))); endrule 25
26
Pattern-matching: An alternative syntax to extract data structure components The &&& is a conjunction, and allows pattern-variables to come into scope from left to right case (m) matches tagged Invalid : return 0; tagged Valid.x : return x; endcase if (m matches (Valid.x) &&& (x > 10)) typedef union tagged { void Invalid; data_T Valid; } Maybe#(type data_T); x will get bound to the appropriate part of m 26
27
The Maybe type data in the pipeline typedef union tagged { void Invalid; data_T Valid; } Maybe#(type data_T); data valid/invalid Registers contain Maybe type values rule sync-pipeline if (True); if (inQ.notEmpty()) begin sReg1 <= tagged Valid(f0(inQ.first())); inQ.deq(); end else sReg1 <= tagged Invalid; case (sReg1) matches tagged Valid.sx1: sReg2 <= tagged Valid f1(sx1); tagged Invalid: sReg2 <= tagged Invalid; endcase case (sReg2) matches tagged Valid.sx2: outQ.enq(f2(sx2)); endcase endrule sx1 will get bound to the appropriate part of sReg1 27
28
Elastic pipeline Use FIFOs instead of pipeline registers xx fifo1 inQ f1f2f3 fifo2 outQ rule stage1 if (True); fifo1.enq(f1(inQ.first()); inQ.deq();endrule rule stage2 if (True); fifo2.enq(f2(fifo1.first()); fifo1.deq();endrule rule stage3 if (True); outQ.enq(f3(fifo2.first()); fifo2.deq();endrule Can all three rules fire concurrently? What is the firing condition for each rule? Can tokens be left inside the pipeline? No need for Maybe types 28
29
Firing conditions for reach rule xx fifo1 inQ f1f2f3 fifo2 outQ inQ fifo1 fifo2outQ NENE,NFNE,NFNF NENE,NFNE,NFF NENE,NFNE,FNF NENE,NFNE,FF …. YesYesYes YesYesNo YesNoYes YesNoNo …. rule1 rule2 rule3 This is the first example we have seen where multiple rules may be ready to execute concurrently Can we execute multiple rules together? 29
30
Informal analysis xx fifo1 inQ f1f2f3 fifo2 outQ inQ fifo1 fifo2outQ NENE,NFNE,NFNF NENE,NFNE,NFF NENE,NFNE,FNF NENE,NFNE,FF …. YesYesYes YesYesNo YesNoYes YesNoNo …. rule1 rule2 rule3 FIFOs must permit concurrent enq and deq for the three rules to fire concurrently 30
31
Concurrency when the FIFOs do not permit concurrent enq and deq xx fifo1inQ f1f2f3 fifo2outQ not empty & not full not empty & not full At best alternate stages in the pipeline will be able to fire concurrently 31
32
Pipelined designs expressed using Multiple rules If rules for different pipeline stages never fire in the same cycle then the design can hardly be called a pipelined design If all the enabled rules fire in parallel every cycle then, in general, wrong results can be produced We need a clean model for concurrent firing of rules 32
33
Inelastic vs Elastic Pipelines Inelastic pipeline: typically only one rule; the designer controls precisely which activities go on in parallel downside: The rule can get too complicated -- easy to make mistakes; difficult to make changes Elastic pipeline: several smaller rules, each easy to write, easier to make changes downside: sometimes rules do not fire concurrently when they should 33
34
Executing Bluespec Rules 34
35
Bluespec Rule Execution A Bluespec program consists of state elements and rules, aka, Guarded Atomic Actions (GAA) that operate on the state elements Application of a rule modifies some state elements of the system in a deterministic manner current state next state values next state computation f x f x guard reg en’s nextState AND 35
36
Bluespec Execution Model Repeatedly: Select a rule to execute Compute the state updates Make the state updates Highly non- deterministic User annotations can help in rule selection 36
37
One-rule-at-time-semantics The legal behavior of a Bluespec program can always be explained by observing the state updates obtained by applying only one rule at a time Implementation concern: Schedule multiple rules concurrently without violating one-rule-at-a-time semantics 37
38
Concurrent scheduling The Bluespec compiler determines which rules, among the rules whose guards are ready, can be executed concurrently It then divides the rules into disjoint sets such that the rules within each set are conflict free Among conflicting sets of enabled rules it picks one set by some predetermined priority The designers need to develop intuition about which rules can be executed concurrently and which ones conflict and thus, must be executed one by one 38
39
What Did We Learn? Bluespec features Structures Overloading Pipelining Registers, with and without valid FIFOs Next: More on executing Bluespec rules 39
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.