Circuit Optimization
CS 3220, Fall 2014
Hadi Esmaeilzadeh, Georgia Institute of Technology
Some slides adapted from Prof. Milos Prvulovic
What is on the FPGA chip? 3 Apr 2014Logic Optimization2
An FPGA Logic Element (LE)
FPGA Logic Elements
Inside a Logic Element
The ALU
With enough optimization, the ALU by itself may become the critical path
  – Don’t try any of this until the ALU is on the critical path!
It needs to add, subtract, AND, OR, NOR, etc.
  – Depending on which instruction is currently in the A stage
What we used before simplifies the R stage
  – No need to “decode” the ALU operation, just feed IR[3:0] to the ALU as the control signal
But decoding still happens!
  – Must decide which operation produces the ALU output!
  – All operations get done, then a MUX chooses among them
    And it’s a large and slow MUX
We can do (much) better than that
A (Much) Better ALU
Overall ALU latency is
  – Time for the longest operation, plus
  – Time to select between that and the other results
Which is the longest operation?
  – What does + do?
  – What does - do?
  – What does < do?
  – What does <= do?
  – What does OR do?
  – …
Aha!
Obviously AND, OR, XOR and their cousins are way faster than the add/sub operations
  – Let’s just do them all, select between them (don’t care what is selected if this isn’t a logic op)
  – The whole thing is done way before the adder value is ready
But +, -, <, <= need to use an add/sub unit
  – One add/sub unit for all of them, or one for each?
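A minimal Python sketch of the "do them all, then select" idea for the logic operations. The function name `logic_unit`, the operation names, and the choice of AND as the don't-care default are assumptions made for illustration; they are not taken from the course's actual design.

```python
MASK = 0xFFFFFFFF  # model 32-bit values

def logic_unit(aluin1, aluin2, logic_op):
    """Compute every logic result, then select one (hypothetical op names)."""
    a = aluin1 & MASK
    b = aluin2 & MASK
    # All results are computed unconditionally, as parallel hardware would.
    results = {
        "AND": a & b,
        "OR": a | b,
        "XOR": a ^ b,
        "NAND": ~(a & b),
        "NOR": ~(a | b),
        "XNOR": ~(a ^ b),
    }
    # If this is not a logic op, the selected value is a don't-care:
    # here we arbitrarily fall back to the AND result.
    return results.get(logic_op, results["AND"]) & MASK
```

Because each output bit depends only on the matching input bits, there is no carry chain here, which is why this unit finishes long before the adder does.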
One +,- unit?
Good:
  – Less hardware (32 one-bit adders with carry bits)
  – No need to select between ADD and SUB afterwards
Bad:
  – Must account for the time to flip the second operand (for SUB) before the adder/subtractor can begin its work!
  – Must use a full 1-bit adder for the LSB
    If ADD were separate, it could use a half-adder (no carry-in) for the LSB
    But SUB needs Cin to be 1, so it uses full adders for all 32 bits
Turns out both bad things don’t matter!
  – Why? Hint: What would be the longest path if we had an adder and a separate subtractor, then chose between their results?
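One way to reason about the hint is a crude ripple-carry delay model. The sketch below is only an illustration: every delay value is a hypothetical number in arbitrary units, and the function names are invented here. Real FPGA timing depends on the device and on how the tools map the logic.

```python
# Hypothetical gate delays in arbitrary units (assumptions, not measurements).
T_INV = 1   # unconditional inverter on operand 2 (separate subtractor)
T_XOR = 1   # conditional flip of operand 2 (shared add/sub unit)
T_HA = 1    # half adder at the LSB of a separate adder
T_FA = 2    # full adder, carry-in to carry-out
T_MUX = 2   # 2-input MUX choosing between the ADD and SUB results
WIDTH = 32

def separate_units_delay():
    """Longest path with a separate adder and subtractor plus a MUX."""
    adder = T_HA + (WIDTH - 1) * T_FA      # half adder at the LSB
    subtractor = T_INV + WIDTH * T_FA      # flip, then full adders for all bits
    return max(adder, subtractor) + T_MUX  # the slower unit sets the path

def shared_unit_delay():
    """Longest path through one shared add/sub unit (no MUX afterwards)."""
    return T_XOR + WIDTH * T_FA
```

Under these assumed numbers the separate design is limited by its subtractor, which already pays for the operand flip and the full-adder LSB, and then pays for the MUX on top. The shared unit is therefore no slower in this model.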
What about LT, LTE, GT, EQ, NE, etc.?
How do we do LT?
How do we do EQ?
How do we do LTE?
What about NE, GT, GTE?
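One possible answer, sketched in Python: once LT and EQ are available, the other comparisons are simple combinations of the two. The helper names are hypothetical, and the signed `<` on Python integers merely stands in for whatever hardware produces LT.

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x

def compare_flags(a, b):
    """Derive all comparison results from just LT and EQ."""
    lt = to_signed(a) < to_signed(b)   # stand-in for the hardware LT
    eq = ((a ^ b) & MASK) == 0         # equal when no bit differs
    return {
        "LT": lt,
        "EQ": eq,
        "LTE": lt or eq,
        "NE": not eq,
        "GT": not (lt or eq),
        "GTE": not lt,
    }
```

Note that EQ here uses XOR rather than a subtraction, so it does not need to wait for the carry chain.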
One ADD/SUB unit
So we have a single adder with
  – 32-bit data inputs (aluin1, aluin2)
  – A one-bit carry-in input
Controlled by a new addsub control signal
  – If adding, don’t flip aluin2 bits, Cin is 0
  – If subtracting (for - or for <), flip aluin2, Cin is 1
After this, we need to route the results
  – For ADD, SUB: ADD/SUB output goes to aluout
  – For logic operations: logic unit output goes to aluout
  – For LT: MSB of ADD/SUB output goes to LSB of aluout
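The shared unit can be modeled in a few lines of Python. This is a behavioral sketch, not the course's implementation: `aluin1`, `aluin2`, and `addsub` come from the slide, while the function names and the encoding `addsub = 1` for subtract are assumptions.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1

def addsub_unit(aluin1, aluin2, addsub):
    """One adder for both ADD and SUB (assumed: addsub = 1 means subtract)."""
    flip = MASK if addsub else 0
    operand2 = (aluin2 ^ flip) & MASK   # flip aluin2 bits only when subtracting
    cin = 1 if addsub else 0            # Cin = 1 completes the two's complement
    return ((aluin1 & MASK) + operand2 + cin) & MASK

def lt_bit(aluin1, aluin2):
    """LT as described on the slide: MSB of aluin1 - aluin2."""
    diff = addsub_unit(aluin1, aluin2, 1)
    return (diff >> (WIDTH - 1)) & 1
```

For example, `addsub_unit(5, 3, 1)` yields 2 and `lt_bit(3, 5)` yields 1. The MSB matches signed less-than only when the subtraction does not overflow; handling operands far enough apart to overflow would need extra logic that this sketch leaves out.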
Producing ALU output
So we have
  – Logic result
  – Add/sub result
  – Comparison result
We have three control signals, each enables one of these to get to the ALU output
  – Can use a tri-state bus for the ALU output, use enable signals to output results of different sub-units
  – Can “and” each with its enable signal, then “or” those to get the final ALU result
    Almost identical to what a tri-state bus ends up doing
  – Can use MUXes to select among these three
    One 3-input MUX controlled by a 2-bit signal
    Better: 2-input MUX for the faster two sub-units, controlled by a 1-bit signal
    2-input MUX between that and the slowest one, another 1-bit control
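Two of the selection schemes above can be sketched behaviorally in Python. All function and parameter names are hypothetical. The slide does not say which sub-unit is the slowest, so the two-level MUX takes it as a separate argument rather than assuming one.

```python
MASK = 0xFFFFFFFF

def and_or_select(logic_res, addsub_res, cmp_res, en_logic, en_addsub, en_cmp):
    """AND each result with its enable, then OR them together.
    Assumes exactly one enable is set (one-hot)."""
    def gate(value, enable):
        return value & (MASK if enable else 0)
    return gate(logic_res, en_logic) | gate(addsub_res, en_addsub) | gate(cmp_res, en_cmp)

def mux2(sel, in0, in1):
    """2-input MUX: in0 when sel is 0, in1 when sel is 1."""
    return in1 if sel else in0

def two_level_mux(fast_a, fast_b, slowest, sel_fast, sel_slowest):
    """First choose between the two faster results, then between that
    and the slowest result, so the slowest passes through only one MUX."""
    early = mux2(sel_fast, fast_a, fast_b)
    return mux2(sel_slowest, early, slowest)
```

The point of the two-level arrangement is that the first MUX has already settled by the time the slowest result arrives, so only one 2-input MUX delay is added to the critical path.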