Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

Slides:



Advertisements
Similar presentations
Constructive Computer Architecture: Multirule systems and Concurrent Execution of Rules Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.
Advertisements

March 2007http://csg.csail.mit.edu/arvindSemantics-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Folding and Pipelining complex combinational circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
1 EECS Components and Design Techniques for Digital Systems Lec 21 – RTL Design Optimization 11/16/2004 David Culler Electrical Engineering and Computer.
Stmt FSM Richard S. Uhler Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology (based on a lecture prepared by Arvind)
Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.
February 21, 2007http://csg.csail.mit.edu/6.375/L07-1 Bluespec-4: Architectural exploration using IP lookup Arvind Computer Science & Artificial Intelligence.
March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.
December 10, 2009 L29-1 The Semantics of Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute.
Registers CPE 49 RMUTI KOTAT.
Computer Architecture: A Constructive Approach Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Pipelining combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 20, 2013http://csg.csail.mit.edu/6.375L05-1.
February 14, 2007L04-1http://csg.csail.mit.edu/6.375/ Bluespec-1: Design methods to facilitate rapid growth of SoCs Arvind Computer Science & Artificial.
September 3, 2009L02-1http://csg.csail.mit.edu/korea Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Constructive Computer Architecture Combinational circuits-2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Folded Combinational Circuits as an example of Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Multiple Clock Domains (MCD) Continued … Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology November.
Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.
September 22, 2009http://csg.csail.mit.edu/koreaL07-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab.
Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology
Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.
Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Constructive Computer Architecture: Control Hazards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.
February 20, 2009http://csg.csail.mit.edu/6.375L08-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
March 1, 2006http://csg.csail.mit.edu/6.375/L09-1 Bluespec-3: Architecture exploration using static elaboration Arvind Computer Science & Artificial Intelligence.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Simple Inelastic and Folded Pipelines Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 14, 2011L04-1.
Multiple Clock Domains (MCD) Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.
Overview Logistics Last lecture Today HW5 due today
Introduction to Bluespec: A new methodology for designing Hardware
Introduction to Bluespec: A new methodology for designing Hardware
Concurrency properties of BSV methods and rules
Folded Combinational Circuits as an example of Sequential Circuits
Bluespec-6: Modeling Processors
Folded “Combinational” circuits
Scheduling Constraints on Interface methods
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Sequential Circuits Constructive Computer Architecture Arvind
Sequential Circuits: Constructive Computer Architecture
Combinational ALU Constructive Computer Architecture Arvind
Combinational circuits-2
Stmt FSM Arvind (with the help of Nirav Dave)
Performance Specifications
Pipelining combinational circuits
Multirule Systems and Concurrent Execution of Rules
Constructive Computer Architecture: Guards
BSV Types Constructive Computer Architecture Tutorial 1 Andy Wright
Sequential Circuits Constructive Computer Architecture Arvind
Pipelining combinational circuits
BSV objects and a tour of known BSV problems
Constructive Computer Architecture: Well formed BSV programs
Modules with Guarded Interfaces
Pipelining combinational circuits
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Introduction to Bluespec: A new methodology for designing Hardware
Multirule systems and Concurrent Execution of Rules
Stmt FSM Arvind (with the help of Nirav Dave)
Constructive Computer Architecture: Guards
Reading: Study Chapter (including Booth coding)
Control Hazards Constructive Computer Architecture: Arvind
Montek Singh Mon, Mar 28, 2011 Lecture 11
Design methods to facilitate rapid growth of SoCs
Multirule systems and Concurrent Execution of Rules
Introduction to Bluespec: A new methodology for designing Hardware
Constructive Computer Architecture: Well formed BSV programs
Constructive Computer Architecture Tutorial 1 Andy Wright 6.S195 TA
Presentation transcript:

Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 18, L05-1

Content So far we have seen modules with methods which are called by rules outside the body Now we will see examples where a module may also contain rules gcd A common way to implement large combinational circuits is by folding where registers hold the state from one iteration to the next Implementing imperative loops Multiplication September 18,

Programming with rules: A simple example Euclid’s algorithm for computing the Greatest Common Divisor (GCD): subtract 36 subtract 63 swap 33 subtract 03 subtract answer: September 18,

Reg#(Bit#(32)) x <- mkReg(0); Reg#(Bit#(32)) y <- mkReg(0); rule gcd; if (x >= y) begin x <= x – y; end else if (x != 0) begin x <= y; y <= x; end endrule method Action start(Bit#(32) a, Bit#(32) b); x <= a; y <= b; endmethod method Bit#(32) result; return y; endmethod method Bool resultRdy; return x == 0; endmethod method Bool busy; return x != 0; endmethod GCD module Euclidean Algorithm A rule inside a module may execute anytime If x is 0 then the rule has no effect Start method should be called only if busy is False. The result is available only when resultRdy is True. September 18,

Circuits for GCD xy -> x-y (s 2 )x>y (s 3 ) !=0 x!=0 (s 1 ) 1 0 startEn x>y(s 3 ) x-y(s 2 ) 0 1 x!=0(s 1 ) xy 0 1 x>y(s 3 ) x!=0(s 1 ) xy startEn b a Busy ResultRdy Result A September 18,

Expressing a loop using registers int s = s0; for (int i = 0; i < 32; i = i+1) { s = f(s); } return s; C-code sel < 32 0 notDone +1 i en sel = start en = start | notDone s0 f sel s en We need two registers to hold s and i values from one iteration to the next. These registers are initialized when the computation starts and updated every cycle until the computation terminates September 18,

Expressing a loop in BSV < 32 notDone +1 sel 0 i en sel = start en = start | notDone f s0 sel s en Reg#(Bit#(32)) s <- mkRegU(); Reg#(Bit#(6)) i <- mkReg(32); rule step; if (i < 32) begin s <= f(s); i <= i+1; end endrule When a rule executes: all the registers are read at the beginning of a clock cycle computations to evaluate the next value of the registers are performed Registers that need to be updated are updated at the end of the clock cycle Muxes are need to initialize the registers September 18,

Combinational 32-bit multiply function Bit#(64) mul32(Bit#(32) a, Bit#(32) b); Bit#(32) tp = 0; Bit#(32) prod = 0; for(Integer i = 0; i < 32; i = i+1) begin Bit#(32) m = (a[i]==0)? 0 : b; Bit#(33) sum = add32(m,tp,0); prod[i:i] = sum[0]; tp = sum[32:1]; end return {tp,prod}; endfunction Combinational circuit uses 31 add32 circuits We can reuse the same add32 circuit if we store the partial results in a register September 18,

Multiply using registers function Bit#(64) mul32(Bit#(32) a, Bit#(32) b); Bit#(32) prod = 0; Bit#(32) tp = 0; for(Integer i = 0; i < 32; i = i+1) begin Bit#(32) m = (a[i]==0)? 0 : b; Bit#(33) sum = add32(m,tp,0); prod[i:i] = sum[0]; tp = sum[32:1]; end return {tp,prod}; endfunction Need registers to hold a, b, tp, prod and i Update the registers every cycle until we are done Combinational version September 18,

Sequential Circuit for Multiply Reg#(Bit#(32)) a <- mkRegU(); Reg#(Bit#(32)) b <- mkRegU(); Reg#(Bit#(32)) prod <-mkRegU(); Reg#(Bit#(32)) tp <- mkReg(0); Reg#(Bit#(6)) i <- mkReg(32); rule mulStep if (i < 32); Bit#(32) m = (a[i]==0)? 0 : b; Bit#(33) sum = add32(m,tp,0); prod[i] <= sum[0]; tp <= sum[32:1]; i <= i+1; endrule state elements a rule to describe the dynamic behavior So that the rule has no effect until i is set to some other value similar to the loop body in the combinational version September 18,

Dynamic selection requires a mux a[i] a i a[0],a[1],a[2],… a >> 0 when the selection indices are regular then it is better to use a shift operator (no gates!) September 18,

Replacing repeated selections by shifts Reg#(Bit#(32)) a <- mkRegU(); Reg#(Bit#(32)) b <- mkRegU(); Reg#(Bit#(32)) prod <-mkRegU(); Reg#(Bit#(32)) tp <- mkReg(0); Reg#(Bit#(6)) i <- mkReg(32); rule mulStep if (i < 32); Bit#(32) m = (a[0]==0)? 0 : b; a > 1; Bit#(33) sum = add32(m,tp,0); prod <= {sum[0], prod[31:1]}; tp <= sum[32:1]; i <= i+1; endrule September 18,

Circuit for Sequential Multiply bIn b a i == 32 0 done +1 prod result (low) [30:0] aIn << 31:0 tp s1 s2 s1 s1 = start_en s2 = start_en | !done result (high) 31 0 add :1 0 << September 18,

Circuit analysis Number of add32 circuits has been reduced from 31 to one, though some registers and muxes have been added The longest combinational path has been reduced from 62 FAs to one add32 plus a few muxes The sequential circuit will take 31 clock cycles to compute an answer September 18,

Observations These programs are not very complex and yet it would have been tedious to express these programs in a state table or as a circuit directly BSV method calls are not available in Verilog/VHDL, and thus such programs sometimes require tedious programming Even the meaning of double-write errors is not standardized across tool implementations in Verilog September 18,

A subtle problem done ? workQ doneQ let x = workQ.first; workQ.deq; if (isDone(x)) begin doneQ.enq(x); end else begin workQ.enq(doStep(x)); end while(!isDone(x)) { x = doStep(x); } Double write problem for 1-element Fifo doStep Later we will design FIFOs to permit simultaneous enq and deq September 18,

Illegal Actions – Double Write x <= e1; x <= e2; x <= e1; if( p ) x <= e2; if( p ) x <= e1; else x <= e2; Not an error Parallel composition of two actions is illegal if it creates the possibility of a double-write error, that is, if its constituent (sub)actions invoke the same action method September 18,

Shared counters redCgreenC if (inQA.first.color == Red) begin redQA.enq(inQA.first.value); inQA.deq; redC <= redC+1; end else begin greenQA.enq(inQA.first.value); inQA.deq; greenC <= greenC+1; end; if (inQB.first.color == Red) begin redQB.enq(inQB.first.value); inQB.deq; redC <= redC+1; end else begin greenQB.enq(inQB.first.value); inQB.deq; greenC <= greenC+1; end What is wrong with this code? Double write error Ignoring full/empty conditions inQA redQA greenQA inQB redQB greenQB September 18,

Pipelining Combinational Functions Lot of area and long combinational delay Folded or multi-cycle version can save area and reduce the combinational delay but throughput per clock cycle gets worse Pipelining: a method to increase the circuit throughput by evaluating multiple inputs xixi x i-1 x i+1 3 different datasets in the pipeline f0f1f2 September 18,

Inelastic vs Elastic pipeline x fifo1inQ f0f1f2 fifo2outQ x sReg1inQ f0f1f2 sReg2outQ Inelastic: all pipeline stages move synchronously Elastic: A pipeline stage can process data if its input FIFO is not empty and output FIFO is not Full Most complex processor pipelines are a combination of the two styles September 18,