Arithmetic.1 2/15 Computer Arithmetic ALU Performance is critical (App. C5, C6, 4th ed.)

Presentation transcript:

arithmetic.1 2/15 Computer Arithmetic ALU Performance is critical (App. C5, C6, 4th ed.)

arithmetic.2 2/15 Requirements: CPU needs a 32-bit ALU
(1) Functional Specification
inputs: 2 x 32-bit operands A, B; 4-bit mode m
outputs: 32-bit result S, 1-bit carry c, 1-bit overflow ovf
operations: add, addu, sub, subu, and, or, xor, nor, slt, sltu
(2) Block Diagram (schematic symbol / Verilog description)
[Figure: ALU block with 32-bit inputs A and B, 4-bit input m, 32-bit output S, and 1-bit outputs c and ovf]

arithmetic.3 2/15 1-bit adder Review (Appendix B.5, B.6)
[Table: truth table with columns A, B, C, Co, Sum]
Sum = a!bc! + ab!c! + a!b!c + abc = a XOR b XOR c (x! means NOT x)
Carryout = a!bc + ab!c + abc! + abc
[Figure: adder cell with inputs a, b, Cin and outputs Sum, Co]
2 units of delay from A/B to sum
1 unit of delay from Cin to sum
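The two equations on this slide can be checked exhaustively. The following is a minimal Python sketch, not part of the original deck; the function names `full_adder` and `full_adder_sop` are illustrative. It compares the compact XOR/majority form with the sum-of-products terms listed on the slide.

```python
from itertools import product


def full_adder(a, b, cin):
    """Return (sum, carry_out) for three input bits."""
    s = a ^ b ^ cin                          # Sum = a XOR b XOR c
    cout = (a & b) | (a & cin) | (b & cin)   # carry when at least two inputs are 1
    return s, cout


def full_adder_sop(a, b, c):
    """Same function written with the slide's sum-of-products terms."""
    na, nb, nc = 1 - a, 1 - b, 1 - c         # complements: a!, b!, c!
    s = (na & b & nc) | (a & nb & nc) | (na & nb & c) | (a & b & c)
    cout = (na & b & c) | (a & nb & c) | (a & b & nc) | (a & b & c)
    return s, cout


# Print the truth table and confirm both forms agree on all 8 input rows.
for a, b, c in product((0, 1), repeat=3):
    assert full_adder(a, b, c) == full_adder_sop(a, b, c)
    print(a, b, c, *full_adder(a, b, c))
```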

arithmetic.4 2/15 Carry Out circuit
[Figure: gate-level circuit with inputs Cin, a, b and output Cout]
2 units of delay from Cin to Cout

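arithmetic.5 2/15 1-bit ALU cell: ADD, AND, OR
[Figure: inputs A and B feed an AND gate, an OR gate, and a 1-bit full adder with CarryIn and CarryOut; a mux controlled by S-select picks and, or, or add as Result]
Full Adder (3->2 element): inputs A, B, C; outputs Co, O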

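arithmetic.6 2/15 Additional operations: Subtract, AND, OR
A - B = A + (-B) = A + B! + 1 (B! means every bit of B inverted)
- form the two's complement by inverting and adding one
[Figure: the 1-bit ALU cell (A, B, 1-bit full adder, CarryIn, CarryOut, mux with S-select choosing add, and, or) extended with an invert control]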
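As a rough illustration of "invert and add one", here is a small Python sketch, assuming a 32-bit datapath; `MASK32` and `sub32` are names made up for this example. The only arithmetic unit it uses is an adder: the subtrahend is inverted bit by bit and the "+ 1" plays the role of a carry-in of 1.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, as a 32-bit ALU would


def sub32(a, b):
    """Compute a - b using only inversion and addition (two's complement)."""
    inverted_b = ~b & MASK32              # bitwise invert B within 32 bits
    return (a + inverted_b + 1) & MASK32  # the +1 acts as CarryIn = 1


print(hex(sub32(7, 5)))   # 7 - 5
print(hex(sub32(5, 7)))   # 5 - 7, shown as a 32-bit two's complement pattern
```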

arithmetic.7 2/15 1-bit ALU: AND, OR, a+b, a+b!
Most significant bit ALU
Delays: Result = 1 gate delay; from a to Result = 2; from b to Result = 2 (ignore b invert)

arithmetic.8 2/15 Final 32-bit ALU, including zero detect Operation

arithmetic.9 2/15 Behavioral Representation: Verilog, RTL (FYI)
module ALU(A, B, m, S, c, ovf);
  input [0:31] A, B;
  input [0:3] m;
  output [0:31] S;
  output c, ovf;
  reg [0:31] S;
  reg c, ovf;
  // re-evaluate whenever an operand or the mode changes
  always @(A, B, m) begin
    case (m)
      0: S = A + B;
      ...
    endcase
  end
endmodule
Code is written, simulated and verified, then translated into hardware (mapped). This is how complex digital design is done.

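arithmetic.10 2/15 Overflow ?? - 4-bit example
Examples: the true sum is 10, but the 4-bit result 1010 reads as -6; -4 - 5 = -9, but the 4-bit result 0111 reads as 7
[Table: 4-bit binary patterns with their decimal and 2's complement values]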

arithmetic.11 2/15 Overflow Detection
Overflow: arithmetic result too large (or too small) to represent properly
- Example: -8 <= 4-bit binary number <= 7
When adding operands with different signs, overflow cannot occur!
Overflow occurs when adding:
- 2 positive numbers and the sum is negative
- 2 negative numbers and the sum is positive
On your own: prove you can detect overflow by:
- Carry into MSB != Carry out of MSB
[Figure: worked 4-bit addition examples]

arithmetic.12 2/15 Overflow Detection Logic
Carry into MSB != Carry out of MSB
- For an N-bit ALU: Overflow = CarryIn[N - 1] XOR CarryOut[N - 1]
[Figure: four chained 1-bit ALUs (A0/B0 to A3/B3, Result0 to Result3); CarryIn3 and CarryOut3 feed an XOR gate that produces Overflow; truth table for X XOR Y]
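The rule Overflow = CarryIn[N - 1] XOR CarryOut[N - 1] can be tried out with a bit-serial adder. This is a hedged Python sketch, not the deck's hardware; `add_nbit` and `to_signed` are hypothetical helper names, and the width defaults to 4 bits to match the 4-bit examples.

```python
def add_nbit(a, b, n=4):
    """Ripple-add two n-bit patterns; return (result, carry_out, overflow)."""
    carry = 0
    result = 0
    carry_in = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry_in = carry                      # carry entering bit i
        s = ai ^ bi ^ carry_in
        carry = (ai & bi) | (ai & carry_in) | (bi & carry_in)
        result |= s << i
    # After the loop, carry_in is the carry INTO the MSB and carry is the
    # carry OUT of the MSB; overflow is their XOR.
    return result, carry, carry_in ^ carry


def to_signed(x, n=4):
    """Interpret an n-bit pattern as a two's complement integer."""
    return x - (1 << n) if x & (1 << (n - 1)) else x


for a, b in [(7, 3), (-4, -5), (3, -4)]:
    r, c, ovf = add_nbit(a & 0xF, b & 0xF)
    print(a, "+", b, "->", format(r, "04b"), "=", to_signed(r), "overflow:", ovf)
```

Operands with different signs (the last case) never set the overflow flag, which matches the claim on the previous slide.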

arithmetic.13 2/15 MIPS ALU requirements
Add, AddU, Sub, SubU, AddI, AddIU
- => 2's complement adder/subtractor with overflow detection
And, Or, AndI, OrI, Xor, XorI, Nor
- => logical AND, OR, XOR, NOR
SLTI, SLTIU (set less than)
- => 2's complement adder with inverter, check sign bit of result
ALU must support these ops

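arithmetic.14 2/15 MIPS arithmetic instruction format - Review
Signed arithmetic generates overflow, no carry
R-type: op | Rs | Rt | Rd | funct
I-type: op | Rs | Rt | Immed (16 bits)
(op and funct values are in octal)
Type   | op | funct
ADDI   | 10 | xx
ADDIU  | 11 | xx
SLTI   | 12 | xx
SLTIU  | 13 | xx
ANDI   | 14 | xx
ORI    | 15 | xx
XORI   | 16 | xx
LUI    | 17 | xx
Type   | op | funct
ADD    | 00 | 40
ADDU   | 00 | 41
SUB    | 00 | 42
SUBU   | 00 | 43
AND    | 00 | 44
OR     | 00 | 45
XOR    | 00 | 46
NOR    | 00 | 47
Type   | op | funct
SLT    | 00 | 52
SLTU   | 00 | 53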

arithmetic.15 2/15 Ripple Adder Performance?
Critical path of an n-bit ripple-carry adder is n*CP
[Figure: four chained 1-bit ALUs (A0/B0 to A3/B3, Result0 to Result3); the CarryOut of each stage feeds the CarryIn of the next]
Very slow: must improve
Assume t = carry delay / bit
32-bit ALU needs 32 * t units of delay
64-bit ALU needs 64 * t units of delay
1-bit adder: 2 units of delay from A/B to sum, 1 unit of delay from Cin to sum

arithmetic.16 2/15 Fast Addition: Carry Lookahead
Carry inputs can be precomputed by logic (+ is OR, · is AND)
p0 = a0 + b0, g0 = a0·b0
c1 = g0 + c0·p0 = a0·b0 + c0·(a0 + b0)
p1 = a1 + b1, g1 = a1·b1
c2 = g1 + p1·c1 = g1 + p1·g0 + p1·p0·c0 = a1·b1 + c1·(a1 + b1)
c3 = g2 + p2·g1 + p2·p1·g0 + p2·p1·p0·c0
c4 = g3 + p3·g2 + p3·p2·g1 + p3·p2·p1·g0 + p3·p2·p1·p0·c0
c4 = func(a3, b3, a2, b2, a1, b1, a0, b0, c0)
Delays: 1 unit for each p, g; 3 units of delay for the carries

arithmetic.17 2/15 Fast Addition: Carry Look Ahead - 4 bits
A | B | C-out
0 | 0 | 0    "kill"
0 | 1 | C-in "propagate"
1 | 0 | C-in "propagate"
1 | 1 | 1    "generate"
g = a and b, p = a or b: 1 delay
c0 = Cin
c1 = g0 + c0·p0
c2 = g1 + g0·p1 + c0·p0·p1
c3 = g2 + g1·p2 + g0·p1·p2 + c0·p0·p1·p2
[Figure: four adder cells with inputs a0/b0 to a3/b3, each producing a sum bit S and its g, p signals]
G0 = g3 + p3·g2 + p3·p2·g1 + p3·p2·p1·g0
P0 = p3·p2·p1·p0
C4 = ...
3 units of delay for G0
3 units of delay for c1, c2, c3, (c4)
4 units of delay for S1, S2, S3
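To see that the flattened carry equations really produce the same answer as ordinary addition, here is a small Python sketch of a 4-bit lookahead adder. It is an illustration under stated assumptions: `cla4` is a made-up name, `|` and `&` stand for the slide's OR and AND, and the c4 expression is the one from the general carry-lookahead equations (G0 + P0·c0).

```python
def cla4(a, b, c0=0):
    """4-bit carry-lookahead add; returns (4-bit sum, carry out c4)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # generate: a AND b
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]    # propagate: a OR b
    # Every carry is computed directly from g, p and c0 (no rippling).
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3]) | (g[0] & p[1] & p[2] & p[3])
          | (c0 & p[0] & p[1] & p[2] & p[3]))
    carries = [c0, c1, c2, c3]
    s = 0
    for i in range(4):
        s |= (((a >> i) ^ (b >> i) ^ carries[i]) & 1) << i   # sum bit i
    return s, c4


# Exhaustive check against integer addition for all 4-bit operands.
for a in range(16):
    for b in range(16):
        for c0 in (0, 1):
            total = a + b + c0
            assert cla4(a, b, c0) == (total & 0xF, total >> 4)
print("cla4 matches integer addition for all 512 cases")
```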

arithmetic.18 2/15 Carry Lookahead - 2nd level - 16 bits
Add a 2nd level of abstraction for more practical 4-bit units
Each Pi, Gi handles 4 bits at a time (0-3, 4-7, 8-11, ...)
P0 = p3·p2·p1·p0; G0 = g3 + p3·g2 + p3·p2·g1 + p3·p2·p1·g0
P1 = p7·p6·p5·p4; G1 = g7 + p7·g6 + p7·p6·g5 + p7·p6·p5·g4
P2 = p11·p10·p9·p8; G2 = g11 + p11·g10 + p11·p10·g9 + p11·p10·p9·g8
P3 = p15·p14·p13·p12; G3 = ...
3 units of delay for G0, G1, G2, G3
2 units of delay for P0, P1, P2, P3

arithmetic.19 2/15 Fast Addition: Cascaded Carry Look-ahead (16-bit)
[Figure: 4-bit adders pass their group G and P signals to a second-level CLA block, which returns the boundary carries; C0 is the carry in]
c4 = G0 + C0·P0
c8 = G1 + G0·P1 + C0·P0·P1
c12 = G2 + G1·P2 + G0·P1·P2 + C0·P0·P1·P2
c16 = ...
5 units of delay for c8, c12, c16
c4 has 4 units of delay
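The second-level equations can be exercised the same way. The sketch below is illustrative only: `group_pg` and `boundary_carries` are hypothetical names, and the c16 expression, which the slide leaves as "...", is filled in here by assuming it continues the same pattern as c4, c8 and c12.

```python
import random


def group_pg(a4, b4):
    """Group generate G and propagate P for one 4-bit slice."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def boundary_carries(a, b, c0=0):
    """Return (c4, c8, c12, c16) of a 16-bit add using second-level lookahead."""
    pairs = [group_pg((a >> s) & 0xF, (b >> s) & 0xF) for s in (0, 4, 8, 12)]
    (G0, P0), (G1, P1), (G2, P2), (G3, P3) = pairs
    c4 = G0 | (c0 & P0)
    c8 = G1 | (G0 & P1) | (c0 & P0 & P1)
    c12 = G2 | (G1 & P2) | (G0 & P1 & P2) | (c0 & P0 & P1 & P2)
    # Assumed continuation of the pattern (not spelled out in the deck):
    c16 = (G3 | (G2 & P3) | (G1 & P2 & P3) | (G0 & P1 & P2 & P3)
           | (c0 & P0 & P1 & P2 & P3))
    return c4, c8, c12, c16


def reference(a, b, c0, k):
    """Carry out of the low k bits, computed with plain integer addition."""
    mask = (1 << k) - 1
    return ((a & mask) + (b & mask) + c0) >> k


random.seed(0)
for _ in range(10000):
    a, b, c0 = random.getrandbits(16), random.getrandbits(16), random.getrandbits(1)
    assert boundary_carries(a, b, c0) == tuple(reference(a, b, c0, k) for k in (4, 8, 12, 16))
print("second-level carries agree with integer addition")
```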

arithmetic.20 2/15 Carry Lookahead Homework
You are required to calculate the performance of a 16-bit carry lookahead adder similar to the one discussed in class. The design has 2 options:
1. Ripple carry is used inside each 4-bit cell
2. Carry lookahead is used inside each 4-bit cell
Both cases use carry lookahead to predict the 4-bit boundary carries [c4, c8, c12].
Draw a table showing the delay of each adder bit, i.e., Sum0 - Sum15, as well as the carry at each stage of the design, for the 2 designs.

arithmetic.21 2/15 8-bit carry lookahead adder (4-bit block is also CLA) c5= g4 + c4.p4 Delays 1 4 1

arithmetic.22 2/15 8-bit CLA - uses ripple carry inside 4-bit block
[Figure: two 4-bit ripple-carry blocks (a0/b0 to a3/b3 producing Result0 to Result3, and a4/b4 to a7/b7 producing Result4 to Result7) joined by a 2nd level carry lookahead unit]

arithmetic.23 2/15 Additional MIPS ALU requirements
Mult, MultU, Div, DivU => need 32-bit multiply and divide, signed and unsigned
Sll, Srl, Sra => need left shift, right shift, right shift arithmetic by 0 to 31 bits
Nor (leave as exercise!) => logical NOR, or use 2 steps: (A OR B) XOR ...

arithmetic.24 2/15 Multiply, Divide & Shift

arithmetic.25 2/15 MIPS arithmetic instructions
Instruction | Example | Meaning | Comments
add | add $1,$2,$3 | $1 = $2 + $3 | 3 operands; exception possible
subtract | sub $1,$2,$3 | $1 = $2 - $3 | 3 operands; exception possible
add immediate | addi $1,$2,100 | $1 = $2 + 100 | + constant; exception possible
add unsigned | addu $1,$2,$3 | $1 = $2 + $3 | 3 operands; no exceptions
subtract unsigned | subu $1,$2,$3 | $1 = $2 - $3 | 3 operands; no exceptions
add imm. unsign. | addiu $1,$2,100 | $1 = $2 + 100 | + constant; no exceptions
multiply | mult $2,$3 | Hi, Lo = $2 x $3 | 64-bit signed product
multiply unsigned | multu $2,$3 | Hi, Lo = $2 x $3 | 64-bit unsigned product
divide | div $2,$3 | Lo = $2 ÷ $3, Hi = $2 mod $3 | Lo = quotient, Hi = remainder
divide unsigned | divu $2,$3 | Lo = $2 ÷ $3, Hi = $2 mod $3 | Unsigned quotient & remainder
Move from Hi | mfhi $1 | $1 = Hi | Used to get copy of Hi
Move from Lo | mflo $1 | $1 = Lo | Used to get copy of Lo

arithmetic.26 2/15 MULTIPLY (unsigned)
Paper and pencil example (unsigned):
Multiplicand A = 1000
Multiplier B = 1001
Product = 01001000
m bits x n bits = m+n bit product
Binary makes it easy:
- 0 => place 0 (0 x multiplicand)
- 1 => place a copy (1 x multiplicand)
4 versions of multiply hardware & algorithm:
- successive refinement

arithmetic.27 2/15 Fast Multiply == Array Multiplier
Stage i accumulates A * 2^i if B_i == 1
Q: How much hardware for a 32-bit multiplier? Critical path?
[Figure: 4x4 array multiplier; multiplicand bits A0-A3 enter each of four rows gated by multiplier bits B0-B3, producing product bits P0-P7; each cell combines Bi and Aj]
Cell delays?

arithmetic.28 2/15 Multiplier operation
At each stage, shift the multiplicand left (x 2)
Multiplier bit B_i determines whether to add in the shifted multiplicand
Accumulate the 2n-bit partial product at each stage
[Figure: 4x4 array multiplier with multiplicand bits A0-A3, multiplier bits B0-B3, product bits P0-P7]
Multiplication, using shift & add
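The stage-by-stage description above maps onto a short loop. This is a minimal Python sketch, assuming unsigned n-bit operands; `multiply_partial_products` is an illustrative name, and the loop models the arithmetic only, not the gate array or its delays.

```python
def multiply_partial_products(a, b, n=4):
    """Unsigned multiply: stage i adds A * 2**i when multiplier bit B_i is 1."""
    product = 0
    for i in range(n):
        if (b >> i) & 1:          # multiplier bit B_i selects this stage
            product += a << i     # multiplicand shifted left i places
    return product                # fits in 2n bits


p = multiply_partial_products(0b1000, 0b1001)
print(format(p, "08b"), p)        # 8 x 9 as an 8-bit product
```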

arithmetic.29 2/15 Multiplication, using shift & add
Long-multiplication approach
[Figure: worked long multiplication with multiplicand 1000, labeling multiplicand, multiplier, and product]
Length of product is the sum of operand lengths

arithmetic.30 Multiplication Hardware using shift & Add Initially 0 2/15

arithmetic.31 2/15 Optimized Multiplier using shift & add
Perform steps in parallel: add/shift
One cycle per partial-product addition: OK if the frequency of multiplications is low
[Figure: 32-bit ALU, multiplicand register]

arithmetic.32 2/15 Multiply Algorithm
Start
1. Test Product0
   Product0 = 1: 1a. Add multiplicand to the left half of the product & place the result in the left half of the Product register
   Product0 = 0: skip the add
2. Shift the Product register right 1 bit
32nd repetition?
   No (< 32 repetitions): go back to step 1
   Yes (32 repetitions): Done
[Figure: Product and Multiplicand register contents at each step]
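The flowchart can be followed almost literally in software. The sketch below is a hedged Python model, not the deck's hardware: `multiply_shift_add` is a made-up name, and it assumes the common arrangement in which the multiplier starts in the right half of the product register and the left half starts at 0. A Python integer stands in for the register, so the carry out of the add is simply kept and shifted back in.

```python
def multiply_shift_add(multiplicand, multiplier, n=32):
    """Unsigned n-bit multiply using one 2n-bit product register."""
    mask = (1 << n) - 1
    product = multiplier & mask          # right half = multiplier, left half = 0
    for _ in range(n):                   # n repetitions
        if product & 1:                  # step 1: test Product0
            # step 1a: add multiplicand to the left half of the product
            product += (multiplicand & mask) << n
        product >>= 1                    # step 2: shift product right 1 bit
    return product


print(multiply_shift_add(0b1000, 0b1001, n=4))   # 4-bit example: 8 x 9
print(multiply_shift_add(0xFFFFFFFF, 0xFFFFFFFF) == 0xFFFFFFFF ** 2)
```

Sharing one register between the multiplier and the product is what lets the optimized design get by with a 32-bit adder and a 64-bit shift register.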

arithmetic.33 2/15 MIPS logical instructions
Instruction | Example | Meaning | Comment
and | and $1,$2,$3 | $1 = $2 & $3 | 3 reg. operands; Logical AND
or | or $1,$2,$3 | $1 = $2 | $3 | 3 reg. operands; Logical OR
xor | xor $1,$2,$3 | $1 = $2 ^ $3 | 3 reg. operands; Logical XOR
nor | nor $1,$2,$3 | $1 = ~($2 | $3) | 3 reg. operands; Logical NOR
and immediate | andi $1,$2,10 | $1 = $2 & 10 | Logical AND reg, constant
or immediate | ori $1,$2,10 | $1 = $2 | 10 | Logical OR reg, constant
xor immediate | xori $1,$2,10 | $1 = $2 ^ 10 | Logical XOR reg, constant
shift left logical | sll $1,$2,10 | $1 = $2 << 10 | Shift left by constant
shift right logical | srl $1,$2,10 | $1 = $2 >> 10 | Shift right by constant
shift right arithm. | sra $1,$2,10 | $1 = $2 >> 10 | Shift right (sign extend)
shift left logical | sllv $1,$2,$3 | $1 = $2 << $3 | Shift left by variable
shift right logical | srlv $1,$2,$3 | $1 = $2 >> $3 | Shift right by variable
shift right arithm. | srav $1,$2,$3 | $1 = $2 >> $3 | Shift right arith. by variable

arithmetic.34 2/15 How shift instructions are implemented
Two kinds:
- logical: value shifted in is always "0"
- arithmetic: on right shifts, sign extend
[Figure: msb-to-lsb registers with "0" shifted in]
An instruction can request 0 to 31 bits to be shifted!
Examples: 1011 -> 1110, shift right arithmetic by 2; 1011 -> 0010, shift right logical by 2
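The difference between the two kinds of right shift is easy to show on bit patterns. This is a small Python sketch with illustrative helper names (`sll`, `srl`, `sra`) and a `width` parameter added for the example so that 4-bit patterns can be tried; it models the result only, not the shifter hardware.

```python
def sll(x, sh, width=32):
    """Shift left logical: zeros enter on the right, result kept to width bits."""
    return (x << sh) & ((1 << width) - 1)


def srl(x, sh, width=32):
    """Shift right logical: zeros enter on the left."""
    return (x & ((1 << width) - 1)) >> sh


def sra(x, sh, width=32):
    """Shift right arithmetic: copies of the sign bit enter on the left."""
    mask = (1 << width) - 1
    x &= mask
    if x >> (width - 1):          # sign bit set: reinterpret as a negative value
        x -= 1 << width
    return (x >> sh) & mask       # Python's >> on negative ints sign-extends


print(format(sra(0b1011, 2, width=4), "04b"))   # arithmetic shift of 1011 by 2
print(format(srl(0b1011, 2, width=4), "04b"))   # logical shift of 1011 by 2
```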

arithmetic.35 2/14 ARM :: Barrel Shifter
- Shift value can be either:
  a 5-bit unsigned integer, or
  specified in the bottom byte of another register
Example: ADD r0, r1, r2, LSL #7
Semantics: r2 is shifted left by 7 & then added to r1
[Figure: Operand 1 goes directly to the ALU; Operand 2 passes through the barrel shifter first; the ALU produces Result]

arithmetic.36 2/15 Barrel Shifter, used in ICs Shift Right using one transistor per switch

arithmetic.37 Barrel Shifter, used in ICs: shift left & right
[Figure: switch matrix with data lines D3-D0 and A5-A0 and control lines SR0-SR2 and SL1-SL3]

arithmetic.38 2/15 Summary: Multiply & Shift
Multiply: successive refinement to see final design
- 32-bit adder, 64-bit shift register, 32-bit multiplicand register
Fast multiply: array multiplier
Shifter: successive refinement from a 1-bit-at-a-time shift register to a barrel shifter

arithmetic.39 2/15 Floating Point Arithmetic
How to represent:
- numbers with fractions
- very small numbers
- very large numbers, e.g., on the order of 10^9
Fixed point
Floating point: a number system with floating decimal point
Normalized numbers: no leading 0's, single digit before decimal point

arithmetic.40 2/15 Floating Point Notation - IEEE 754 FP
[Figure: a number in scientific notation (6.02 x ...) with labels for sign and magnitude, mantissa, decimal point, radix (base), and exponent]
IEEE F.P.: ± 1.M x 2^e
Issues:
- Arithmetic (+, -, *, /)
- Representation, normal form
- Range and precision, single, double
- Rounding
- Exceptions (e.g., divide by zero, overflow, underflow)

arithmetic.41 2/15 Floating-Point Arithmetic
Floating point numbers in IEEE 754 standard, single precision:
Fields: S (sign, 1 bit) | E (exponent, 8 bits) | M (mantissa, 23 bits)
- exponent E: excess 127 binary integer; actual exponent is e = E - 127
- mantissa: sign + magnitude, normalized binary significand with hidden integer bit: 1.M
N = (-1)^S x 2^(E-127) x (1.M), for 0 < E < 255
Magnitudes that can be represented are in the range 2^-126 x (1.0) to 2^127 x (2 - 2^-23)
Double precision IEEE 754 [64 bits]: Exponent = 11 bits, Bias = 1023, Mantissa = 52 bits, Sign = 1 bit
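For a normalized single-precision number, the value follows from the three fields as N = (-1)^S x 2^(E-127) x 1.M. The sketch below is an illustrative Python decoder, assuming the standard 1/8/23 field layout; `decode_single` is a made-up name, and zero, denormals, infinities and NaN (E = 0 or E = 255) are deliberately rejected rather than handled.

```python
def decode_single(bits):
    """Decode a 32-bit IEEE 754 pattern for a normalized number."""
    s = bits >> 31                 # 1-bit sign
    e = (bits >> 23) & 0xFF        # 8-bit biased exponent E
    m = bits & 0x7FFFFF            # 23-bit mantissa field M
    if not 0 < e < 255:
        raise ValueError("only normalized numbers (0 < E < 255) are handled")
    significand = 1 + m / 2**23    # hidden integer bit: 1.M
    return (-1) ** s * 2.0 ** (e - 127) * significand


print(decode_single(0x3F800000))   # S=0, E=127, M=0
print(decode_single(0xBFC00000))   # S=1, E=127, M=100...0
print(decode_single(0x7F7FFFFF))   # largest normalized magnitude
```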

arithmetic.42 2/15 Exponent Bias used to simplify comparisons
If we use 2's complement for the exponent, it is not good for sorting and comparison
[Figure: exponent bit patterns from most negative exponent to most positive exponent]
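One way to see the benefit of the biased exponent: for non-negative values, comparing the raw bit patterns as unsigned integers gives the same order as comparing the numbers themselves. The Python sketch below illustrates this with the standard `struct` module; `bits32` and the sample values are chosen for the example only, and negative numbers would need extra handling because the format is sign-magnitude.

```python
import struct


def bits32(x):
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


values = [1024.0, 0.75, 3.0e9, 0.001, 1.5, 1.0]

by_value = sorted(values)                 # ordinary floating-point comparison
by_bits = sorted(values, key=bits32)      # integer comparison of bit patterns

for v in by_bits:
    print(format(bits32(v), "032b"), v)
assert by_value == by_bits
```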

arithmetic.43 2/15 Floating Point - Example review
Representation:
- bias = 127 for 32-bit word
- S = 1: negative; S = 0: positive or zero
Example (from fraction to floating point representation): -0.75

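arithmetic.44 2/15 Floating-Point Example - review
Represent -0.75
- -0.75 = (-1)^1 x 1.1 (binary) x 2^-1
- S = 1
- Fraction = 1000...00 (binary)
- Exponent = -1 + Bias
  Single: -1 + 127 = 126 = 01111110 (binary)
  Double: -1 + 1023 = 1022 = 01111111110 (binary)
Single: 1 01111110 1000...00
Double: 1 01111111110 1000...00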
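The encoding of -0.75 can be checked by packing the value and pulling the fields apart. This is a small Python sketch using the standard `struct` module; `fields32` and `fields64` are illustrative names, and the field widths are the IEEE 754 single (1/8/23) and double (1/11/52) layouts.

```python
import struct


def fields32(x):
    """Return (sign, biased exponent, fraction) of x in single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def fields64(x):
    """Return (sign, biased exponent, fraction) of x in double precision."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


s, e, f = fields32(-0.75)
print(s, format(e, "08b"), format(f, "023b"))    # single precision fields
s, e, f = fields64(-0.75)
print(s, format(e, "011b"), format(f, "052b"))   # double precision fields
```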

arithmetic.45 2/15 Addition - Multiply Algorithm issues
For addition (or subtraction):
(1) compute Ye - Xe (getting ready to align binary point)
(2) right shift Xm that many positions to form Xm x 2^(Xe-Ye)
(3) compute Xm x 2^(Xe-Ye) + Ym
(4) for multiply, the doubly biased exponent must be corrected with an extra subtraction step of the bias amount
Example in excess 8, with Xe = 7 and Ye = -3:
Xe = 1111 = 15 = 7 + 8
Ye = 0101 = 5 = -3 + 8
Xe + Ye = 10100 = 20 = 4 + 8 + 8

arithmetic.46 2/15 Floating Point Addition Step 1: align, round Step 2: add Step 3: normalize, check overflow or underflow Step 4: round Example:

arithmetic.47 2/15 Floating Point Multiplication Step 1: add exponents, subtract bias, Mpy mantissas Step 2: normalize and check over/underflow Step 3: round Step 4: check sign Example:
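The multiplication steps can be traced on the bit fields directly. The following Python sketch is a simplified illustration, not a full IEEE 754 multiplier: `split` and `fp_mul` are hypothetical names, only normalized single-precision inputs are accepted, and the rounding step is replaced by plain truncation, so it is exact only when the product needs no rounding.

```python
import struct

BIAS = 127


def split(x):
    """Return (sign, biased exponent, 24-bit significand incl. hidden 1)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s, e, m = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if not 0 < e < 255:
        raise ValueError("only normalized inputs are handled in this sketch")
    return s, e, (1 << 23) | m


def fp_mul(x, y):
    sx, ex, mx = split(x)
    sy, ey, my = split(y)
    e = ex + ey - BIAS            # add biased exponents, subtract the bias once
    m = mx * my                   # 24 x 24 -> up to 48-bit significand product
    if m >> 47:                   # product in [2, 4): normalize
        shift, e = 24, e + 1
    else:                         # product in [1, 2): already normalized
        shift = 23
    if not 0 < e < 255:           # check overflow / underflow
        raise OverflowError("exponent out of range")
    frac = (m >> shift) & 0x7FFFFF   # truncate instead of rounding (assumption)
    sign = sx ^ sy                   # sign of the result
    bits = (sign << 31) | (e << 23) | frac
    return struct.unpack(">f", struct.pack(">I", bits))[0]


print(fp_mul(1.5, -0.75))
print(fp_mul(2.0, 3.0))
```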

arithmetic.48 FP Adder Hardware more complex than integer adder Doing it in one clock cycle - takes too long –Much longer than integer operations –Slower clock would penalize all instructions FP adder usually takes several cycles – pipelined 2/15

arithmetic.49 FP Adder Hardware Step 1 Step 2 Step 3 Step 4 2/15

arithmetic.50 2/15 Floating Point: Overflow & Underflow
Overflow: exponent too large to be represented
Underflow: negative exponent too small to fit in exponent field

arithmetic.51 2/15 Summary of Floating Point Arithmetic IEEE floating point standard 32 bit and 64 bit Converting decimal numbers to floating point and vice versa Overflow and underflow Floating point add and multiply