Arithmetic for Computers

Presentation transcript:

Arithmetic for Computers Chapter 3 Sections 3.1 – 3.5 & 3.8 Appendix C.1 – C.3, C.5 – C.6 Dr. Iyad F. Jafar

Outline: Addition and Subtraction; Overflow Detection; Faster Addition; The 1-Bit ALU; The 32-bit MIPS ALU; Shift Operations; Multiplication; Division; Floating Point Numbers; Fallacies and Pitfalls

Addition and Subtraction
Add corresponding bits, including the sign bit, and ignore the carry out of the MSB. For subtraction, add the negative (the 2s complement) of the second operand.
   4 + 3 = 7:       0100 + 0011 = 0111
  -4 + 3 = -1:      1100 + 0011 = 1111
   4 - 3 = 1:       0100 + 1101 = (1) 0001   (carry out of the MSB ignored)
  -4 - (-3) = -1:   1100 - 1101 = 1100 + 0011 = 1111
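The 4-bit examples above can be checked with a short Python sketch. The helper names (`to_bits`, `from_bits`, `add`, `sub`) are illustrative and not part of the slides; the 4-bit width is assumed to match the examples.

```python
BITS = 4
MASK = (1 << BITS) - 1

def to_bits(value):
    """Encode a signed integer as a BITS-wide 2s-complement pattern."""
    return value & MASK

def from_bits(pattern):
    """Decode a BITS-wide 2s-complement pattern back to a signed integer."""
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern

def add(a, b):
    """Add two patterns and ignore the carry out of the MSB."""
    return (a + b) & MASK

def sub(a, b):
    """Subtract by adding the 2s complement of b (invert, then add 1)."""
    return add(a, add(~b & MASK, 1))

for x, y in [(4, 3), (-4, 3)]:
    total = add(to_bits(x), to_bits(y))
    print(f"{x} + {y}: {to_bits(x):04b} + {to_bits(y):04b} = {total:04b}")
```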

Detecting Overflow
When do we get overflow? When we add two positive numbers and get a negative number, or when we add two negative numbers and get a positive number. Investigate the sign bit!
Sign bits   Cin   Cout   Sign of sum   Result
0 + 0       0     0      0 (+)         No overflow
0 + 0       1     0      1 (-)         Overflow
1 + 1       0     1      0 (+)         Overflow
1 + 1       1     1      1 (-)         No overflow
1 + 0       0     0      1 (-)         No overflow
1 + 0       1     1      0 (+)         No overflow
Overflow occurs when the carry into the sign bit (Cin) does not equal the carry out (Cout): Overflow = Cin ⊕ Cout
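As a minimal sketch of the rule that overflow occurs when the carry into the sign bit differs from the carry out, the function below computes both carries explicitly. The name `add_with_overflow` and the default 4-bit width are assumptions made for illustration.

```python
def add_with_overflow(a, b, bits=4):
    """Add two bit patterns; return (sum pattern, overflow flag)."""
    low_mask = (1 << (bits - 1)) - 1
    # Carry into the sign bit: add only the bits below the sign bit.
    carry_in = ((a & low_mask) + (b & low_mask)) >> (bits - 1)
    total = a + b
    # Carry out of the sign bit.
    carry_out = (total >> bits) & 1
    return total & ((1 << bits) - 1), carry_in != carry_out

print(add_with_overflow(0b0100, 0b0100))  # 4 + 4 does not fit in 4 signed bits
print(add_with_overflow(0b1100, 0b1100))  # -4 + -4 = -8 fits
```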

Addition and Subtraction
How do we perform addition in hardware? Design a 32-bit adder (two 32-bit inputs!) from a cell: the 1-bit full adder, with inputs A, B, and CarryIn and outputs Sum and CarryOut.
A  B  Cin | Cout  Sum
0  0  0   | 0     0
0  0  1   | 0     1
0  1  0   | 0     1
0  1  1   | 1     0
1  0  0   | 0     1
1  0  1   | 1     0
1  1  0   | 1     0
1  1  1   | 1     1
From the Karnaugh maps:
Cout = AB + BCin + ACin
Sum = A ⊕ B ⊕ Cin
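The two full-adder equations can be checked against ordinary arithmetic for all eight input combinations. This is a small sketch; the function name `full_adder` is chosen here for illustration.

```python
from itertools import product

def full_adder(a, b, cin):
    """1-bit full adder: return (cout, sum) from the two Boolean equations."""
    cout = (a & b) | (b & cin) | (a & cin)
    s = a ^ b ^ cin
    return cout, s

# Print the truth table in the column order A B Cin | Cout Sum.
for a, b, cin in product((0, 1), repeat=3):
    cout, s = full_adder(a, b, cin)
    print(a, b, cin, "|", cout, s)
```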

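Addition and Subtraction
32-bit ripple-carry adder: cascade 32 copies of the full adder (FA) and wire them up through Cin and Cout. FA0 takes A0 and B0 and produces S0; its carry out feeds FA1, and so on up to FA31, which takes A31 and B31 and produces S31 and the final carry C32. How long does it take to get the result?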

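Addition and Subtraction
32-bit ripple-carry subtractor: subtraction is addition of the negative! Compute the 2s complement of B as 1s complement + 1: each bit B0 … B31 is inverted before it enters its full adder, and the carry-in of the first full adder is set to 1. The outputs are the difference bits D0 … D31 and the final carry (labeled B32 on the slide).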

Addition and Subtraction
32-bit ripple-carry adder/subtractor: building a separate adder and subtractor is redundant hardware! Subtraction is addition of the negative, so use one adder and configure its second input. Remember X ⊕ 1 = X' and X ⊕ 0 = X: XOR each bit of B with an Add/Sub control signal, and feed the same signal to the carry-in of the first full adder.
Add/Sub = 0 → add
Add/Sub = 1 → subtract
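A bit-level sketch of the combined adder/subtractor follows. It assumes the Add/Sub control is XORed with every bit of B and also drives the carry-in of bit 0; the names `full_adder` and `add_sub` are illustrative.

```python
def full_adder(a, b, cin):
    """1-bit full adder: return (cout, sum)."""
    return (a & b) | (b & cin) | (a & cin), a ^ b ^ cin

def add_sub(a, b, subtract, bits=32):
    """Ripple-carry adder/subtractor; subtract is 0 (add) or 1 (subtract)."""
    carry = subtract              # control signal doubles as carry-in of bit 0
    result = 0
    for i in range(bits):         # carries ripple from bit 0 up to the MSB
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ subtract   # X xor 1 = X', X xor 0 = X
        carry, s = full_adder(ai, bi, carry)
        result |= s << i
    return result, carry

print(add_sub(7, 5, 0))   # add
print(add_sub(7, 5, 1))   # subtract
```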

Faster Addition
The ripple-carry adder is slow! We have to wait until the carry has propagated to the final position before we can read out the addition or subtraction result. Carry generation takes two levels of gates at each bit position:
Coi = AiBi + AiCini + BiCini
Total delay = gate delay x 2 x number of bits
Example: for a 16-bit adder, the delay is 32 gate delays.
Can we go faster? What if we generate the carries in parallel?

Faster Addition
The carries can be expressed exclusively in terms of the adder's inputs and c0! Add separate hardware to compute the carries in parallel: the carry-lookahead adder.
[Figure: carry-lookahead unit taking A31-A0, B31-B0, and c0 and producing the carries c1 to c4]

Faster Addition
In a 4-bit adder, the equations of the carries are
c1 = (b0 . c0) + (a0 . c0) + (a0 . b0)
c2 = (b1 . c1) + (a1 . c1) + (a1 . b1)
c3 = (b2 . c2) + (a2 . c2) + (a2 . b2)
c4 = (b3 . c3) + (a3 . c3) + (a3 . b3)
By substitution
c2 = (a1 . a0 . b0) + (a1 . a0 . c0) + (a1 . b0 . c0) + (b1 . a0 . b0) + (b1 . a0 . c0) + (b1 . b0 . c0) + (a1 . b1)
c3 = (b2 . a1 . a0 . b0) + (b2 . a1 . a0 . c0) + (b2 . a1 . b0 . c0) + (b2 . b1 . a0 . b0) + (b2 . b1 . a0 . c0) + (b2 . b1 . b0 . c0) + (b2 . a1 . b1) + (a2 . a1 . a0 . b0) + (a2 . a1 . a0 . c0) + (a2 . a1 . b0 . c0) + (a2 . b1 . a0 . b0) + (a2 . b1 . a0 . c0) + (a2 . b1 . b0 . c0) + (a2 . a1 . b1) + (a2 . b2)
c4 = ……
All carries require only two gate delays! However, imagine the size and cost of these equations if the adder is 32 bits.

Faster Addition
We can reduce the logic cost by factoring the carry equation:
c(i+1) = (ai . bi) + (bi . ci) + (ai . ci) = (ai . bi) + (ai + bi) . ci = gi + pi . ci
gi = ai . bi : carry generate
pi = ai + bi : carry propagate
Carry equations for a 4-bit adder
c1 = g0 + p0 . c0
c2 = g1 + p1 . c1 = g1 + (p1 . g0) + (p1 . p0 . c0)
c3 = g2 + p2 . c2 = g2 + (p2 . g1) + (p2 . p1 . g0) + (p2 . p1 . p0 . c0)
c4 = g3 + p3 . c3 = g3 + (p3 . g2) + (p3 . p2 . g1) + (p3 . p2 . p1 . g0) + (p3 . p2 . p1 . p0 . c0)
The delay to generate c4 is 3 gate delays (one for gi and pi, two for the carry). The cost is still high for large adders!
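The generate/propagate carry equations for a 4-bit adder can be written out directly and checked exhaustively. This is a behavioral sketch, not a gate-level model; `cla4` is a name introduced here.

```python
def cla4(a, b, c0):
    """4-bit carry-lookahead add: return (4-bit sum, c4)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # gi = ai . bi
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]    # pi = ai + bi
    # Every carry depends only on g, p, and c0 (no rippling).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = 0
    for i in range(4):
        s |= (((a >> i) ^ (b >> i) ^ carries[i]) & 1) << i
    return s, c4

print(cla4(0b0111, 0b0001, 0))
```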

Faster Addition
2nd level of abstraction. Example: a 16-bit adder, assuming we have four 4-bit carry-lookahead adders. These 4-bit adders are designed to produce super generate (G) and super propagate (P) signals:
P → the four bits propagate a carry to the next four bits
G → the four bits generate a carry to the next four bits
The super carry signals are fed to a separate carry generation unit.
[Figure: 4-bit CLA block with inputs A3-A0, B3-B0, and c0 and outputs S3-S0, G0, and P0]

Faster Addition
We need to generate the carry propagate and generate signals at a higher level. Think of each 4-bit adder block as a single unit that can either generate or propagate a carry.
[Figure: four 4-bit CLA blocks (bits 3-0, 7-4, 11-8, and 15-12), each producing its sum bits and its Gi and Pi signals; a carry generation unit takes G0-G3, P0-P3, and C0 and produces C1-C4]

Faster Addition
Super propagate signals
P0 = p3⋅p2⋅p1⋅p0 (how can the first 4-bit adder propagate c0?)
P1 = p7⋅p6⋅p5⋅p4
P2 = p11⋅p10⋅p9⋅p8
P3 = p15⋅p14⋅p13⋅p12
Super generate signals
G0 = g3 + (p3⋅g2) + (p3⋅p2⋅g1) + (p3⋅p2⋅p1⋅g0)
G1 = g7 + (p7⋅g6) + (p7⋅p6⋅g5) + (p7⋅p6⋅p5⋅g4)
G2 = g11 + (p11⋅g10) + (p11⋅p10⋅g9) + (p11⋅p10⋅p9⋅g8)
G3 = g15 + (p15⋅g14) + (p15⋅p14⋅g13) + (p15⋅p14⋅p13⋅g12)
Carry signals at the higher level are
C1 = G0 + (P0⋅c0)
C2 = G1 + (P1⋅G0) + (P1⋅P0⋅c0)
C3 = G2 + (P2⋅G1) + (P2⋅P1⋅G0) + (P2⋅P1⋅P0⋅c0)
C4 = G3 + (P3⋅G2) + (P3⋅P2⋅G1) + (P3⋅P2⋅P1⋅G0) + (P3⋅P2⋅P1⋅P0⋅c0)
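The second-level equations can be sanity-checked in Python by computing the block carries C1 to C4 of a 16-bit addition from the super propagate and generate signals only. The function names `group_pg` and `block_carries` are hypothetical helpers for this sketch.

```python
def group_pg(a4, b4):
    """Return (P, G) for one 4-bit group from its bit-level p and g."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G

def block_carries(a, b, c0):
    """Return [C1, C2, C3, C4]: the carries out of each 4-bit block."""
    P, G = [], []
    for k in range(4):
        pk, gk = group_pg((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF)
        P.append(pk)
        G.append(gk)
    C1 = G[0] | (P[0] & c0)
    C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    C3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    C4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    return [C1, C2, C3, C4]

print(block_carries(0x00FF, 0x0001, 0))
```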

Faster Addition
Each super carry signal is a two-level implementation in terms of Pi and Gi. Pi is one level of gates, while Gi is two levels, expressed in terms of pi and gi. pi and gi are each one level of gates. Total delay is 2 + 2 + 1 = 5 gate delays, so the 16-bit CLA is ~6 times faster than the 16-bit ripple-carry adder (32 gate delays).

Designing the ALU
We want to design an ALU that supports logic operations, arithmetic operations, the set-on-less-than instruction, and a test for equality, with special handling for sign extension, zero extension, and overflow detection.
[Figure: ALU with 32-bit inputs A and B, a 4-bit operation input m, a 32-bit result, and 1-bit zero and ovf outputs]

Designing the ALU
We start with a 1-bit ALU. Starting with the logical operations is easier since they map directly to hardware. The two operands A and B feed an AND gate and an OR gate, giving two results. We need only one result, so use a 2-to-1 MUX.
Function   Operation
A and B    0
A or B     1
The Operation input comes from logic that looks at the opcode.

Designing the ALU
How about addition? Add an adder, and connect Cin (from the previous bit) and Cout (to the next bit). Expand the MUX to 3-to-1 (Operation is now 2 bits).
Function   Operation
A and B    00
A or B     01
A + B      10

Designing the ALU
How about subtraction? Use the same adder for subtraction. Depending on the operation, choose whether or not to compute the 2s complement of B (with a MUX or an XOR gate). For the 2s complement, define the BInvert signal and set Cin of the LSB to 1.
Function   Operation   BInvert   Cin
A and B    00          0         x
A or B     01          0         x
A + B      10          0         0
A - B      10          1         1

Designing the ALU
Can we add the NOR instruction? No need to add a NOR gate! Use De Morgan's theorem (A nor B = A' and B'), an inverter, and a 2-to-1 MUX on the A input. Define the AInvert signal.
Function   Operation   BInvert   Cin   AInvert
A and B    00          0         x     0
A or B     01          0         x     0
A + B      10          0         0     0
A - B      10          1         1     0
A nor B    00          1         x     1

Designing the ALU
Building the 32-bit ALU: simply wire up 32 copies of the 1-bit ALU we designed earlier, with special care for the LSB ALU. The Cin of the LSB and the BInvert signal always take the same value, so tie them together into one signal, BNegate.
[Figure: the LSB 1-bit ALU with inputs A, B, AInvert, BNegate, and Operation and outputs Result and Cout]

Designing the ALU
Building the 32-bit ALU: ALU0 through ALU31 are chained so that the Cout of each 1-bit ALU feeds the Cin of the next, and BNegate and Operation go to every 1-bit ALU. Note that the Cin of the LSB and BNegate are the same signal in order to compute the 2s complement in case of subtraction.
[Figure: ALU0 (A0, B0, Result0) through ALU31 (A31, B31, Result31) connected through Cin and Cout]

Designing the ALU
Supporting the SLT instruction: expand the multiplexer by one more input (Less). Subtract the two registers and feed the sign bit (the result of bit 31) back to the Less input of the LSB ALU. The Less inputs of the remaining ALUs are 0.

Designing the ALU
The second version of the 32-bit ALU: for the SLT instruction, the MSB is fed back to the LSB while the other bits are set to zero! The operation is basically subtraction.
[Figure: ALU0 through ALU31, each with a Less input; ALU31 also produces the Set and OverFlow outputs, and Set is fed back to the Less input of ALU0]

Designing the ALU
Supporting branch instructions: basically, subtract the two registers! However, we need to generate a signal (Zero) that indicates whether the result is zero or not: simply OR all the result bits and take the complement. This signal is used to select between the branch address and the PC.
[Figure: example of using the Zero signal to select the address for the BEQ instruction]

Designing the ALU
The 32-bit ALU.
[Figure: the complete 32-bit ALU, ALU0 through ALU31 with BNegate, Operation, Less, Set, OverFlow, and Cout signals]

Designing the ALU
The 32-bit ALU: list of supported operations
Function   Operation   BNegate   AInvert
A and B    00          0         0
A or B     01          0         0
A + B      10          0         0
A - B      10          1         0
A nor B    00          1         1
SLT        11          1         0
BEQ        10          1         0
BNE        10          1         0
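A behavioral Python sketch of the ALU controls described in these slides follows. It assumes the usual encoding in which BNegate also drives the carry-in of the LSB and operation 11 selects the Less input; the function name `alu` and the returned `(result, zero)` pair are illustrative. Like the slides, it takes the sign bit of A - B for SLT and does not correct for overflow.

```python
def alu(a, b, ainvert, bnegate, operation, bits=32):
    """Return (result, zero) for the given control signals."""
    mask = (1 << bits) - 1
    a_in = (~a & mask) if ainvert else (a & mask)
    b_in = (~b & mask) if bnegate else (b & mask)
    # BNegate doubles as the carry-in of the LSB (completes the 2s complement).
    add_result = (a_in + b_in + bnegate) & mask
    if operation == 0b00:
        result = a_in & b_in        # AND, or NOR when both inputs are inverted
    elif operation == 0b01:
        result = a_in | b_in
    elif operation == 0b10:
        result = add_result         # add, or subtract when bnegate = 1
    else:
        # SLT: sign bit of A - B goes to bit 0, all other bits are 0.
        result = (add_result >> (bits - 1)) & 1
    zero = int(result == 0)         # NOR of all result bits
    return result, zero

print(alu(6, 3, 0, 1, 0b10))   # A - B
print(alu(5, 5, 0, 1, 0b10))   # BEQ-style comparison: zero flag set
```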

Shift Operations
Shift operations are commonly needed! The MIPS ISA specifies three shift instructions.
Two logical shift instructions:
SLL $rd, $rt, shift_amount   # R[rd] = R[rt] << shift_amount
SRL $rd, $rt, shift_amount   # R[rd] = R[rt] >> shift_amount (zeros shifted in)
One arithmetic shift instruction:
SRA $rd, $rt, shift_amount   # R[rd] = R[rt] >> shift_amount (sign bit shifted in)
What is the difference? Unlike SRL, the SRA instruction preserves the sign of the number!
Encoding (R-type):
Field   op   rs   rt   rd   shamt   funct
Bits    6    5    5    5    5       6

Shift Operations
Example 1.
1. You need to extract the 2nd byte of a 4-byte word in $t1
0010 0011 0111 0110 1010 1111 0000 1101   $t1
srl $t1, $t1, 8
0000 0000 0010 0011 0111 0110 1010 1111   $t1
0000 0000 0000 0000 0000 0000 1111 1111   mask
andi $t1, $t1, 0x00FF
0000 0000 0000 0000 0000 0000 1010 1111   $t1
2. You want to multiply $t3 by 8 (note: 8 equals 2^3)
0000 0000 0000 0000 0000 0000 0000 0101   $t3 (equals 5)
sll $t3, $t3, 3   # move 3 places to the left
0000 0000 0000 0000 0000 0000 0010 1000   $t3 (equals 40)
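The two examples, plus the difference between logical and arithmetic right shifts, can be mimicked on 32-bit values in Python. The helper names `sll`, `srl`, and `sra` mirror the MIPS mnemonics but are plain functions defined here for illustration.

```python
MASK32 = 0xFFFFFFFF

def sll(x, n):
    """Logical left shift on a 32-bit value."""
    return (x << n) & MASK32

def srl(x, n):
    """Logical right shift: zeros are shifted in from the left."""
    return (x & MASK32) >> n

def sra(x, n):
    """Arithmetic right shift: the sign bit is replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret the pattern as a negative integer
    return (x >> n) & MASK32

t1 = 0x2376AF0D               # 0010 0011 0111 0110 1010 1111 0000 1101
second_byte = srl(t1, 8) & 0x00FF
print(hex(second_byte))       # the 2nd byte of the word
print(sll(5, 3))              # 5 * 8
print(hex(srl(0x80000000, 4)), hex(sra(0x80000000, 4)))
```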

Shift Operations
How are these instructions implemented? Outside the ALU.
Shift registers → slow; shifting by one bit requires one cycle!
Barrel shifters → a digital circuit that can shift a data word by a specified number of bits in one clock cycle, provided the cycle is long enough! Simply a set of multiplexers!

Shift Operations
Example 2. 4-bit barrel shifter (rotate to the left by 0, 1, 2, or 3 bits). Inputs: 4-bit data D (D3 D2 D1 D0) and shift value S1 S0. Output: Y (Y3 Y2 Y1 Y0).
Shift value    Output
S1  S0         Y3  Y2  Y1  Y0
0   0          D3  D2  D1  D0
0   1          D2  D1  D0  D3
1   0          D1  D0  D3  D2
1   1          D0  D3  D2  D1
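One common way to build such a rotator is two stages of 2-to-1 multiplexers, one stage per select bit. The sketch below assumes that structure (the slide only says the shifter is a set of multiplexers); `mux` and `barrel_rotate_left4` are names made up here.

```python
def mux(sel, in0, in1):
    """2-to-1 multiplexer."""
    return in1 if sel else in0

def barrel_rotate_left4(d, s1, s0):
    """Rotate the 4-bit value d left by 2*s1 + s0 positions."""
    bits = [(d >> i) & 1 for i in range(4)]          # bits[i] is Di
    # Stage 1 rotates by 1 when S0 = 1: Yi takes D(i-1).
    stage1 = [mux(s0, bits[i], bits[(i - 1) % 4]) for i in range(4)]
    # Stage 2 rotates by 2 when S1 = 1: Yi takes the stage-1 bit i-2.
    stage2 = [mux(s1, stage1[i], stage1[(i - 2) % 4]) for i in range(4)]
    return sum(bit << i for i, bit in enumerate(stage2))

print(f"{barrel_rotate_left4(0b1000, 0, 1):04b}")   # D3 moves to position 0
```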

Multiplication
Multiplying two 3-digit numbers A (multiplicand) and B (multiplier) gives n partial products, where B is n digits long, and takes n - 1 additions.
        4 2 1     multiplicand
  x     1 2 3     multiplier
      1 2 6 3
      8 4 2
  + 4 2 1
    5 1 7 8 3
In binary (6 x 5), each partial product is either 110 (A*1) or 000 (A*0):
        1 1 0
  x     1 0 1
        1 1 0
      0 0 0
  + 1 1 0
    1 1 1 1 0     equals 30
Note: the product may take as many as two times the number of bits!

Multiplication
Multiplication steps (110 x 101):
Step 1: LSB of multiplier is 1 → add a copy of the multiplicand (110)
Step 2: Shift the multiplier right to reveal the new LSB; shift the multiplicand left to multiply it by 2
Step 3: LSB of multiplier is 0 → add zero (0000)
Step 4: Shift the multiplier right and the multiplicand left
Step 5: LSB of multiplier is 1 → add a copy of the multiplicand (11000)
Step 6: Add the partial products: 110 + 0000 + 11000 = 11110. Done!
Thus, we need hardware to:
1. Hold the multiplier (32 bits) and shift it right
2. Hold the multiplicand (32 bits) and shift it left (requires 64 bits)
3. Hold the product (result) (64 bits)
4. Add the multiplicand to the current result

Multiplication
Multiplication hardware:
1. Hold the multiplier (32 bits) and shift it right
2. Hold the multiplicand (32 bits) and shift it left (requires 64 bits)
3. Hold the product (result) (64 bits)
4. Add the multiplicand to the current result
5. Control the whole process
[Figure: 64-bit Multiplicand register (shift left), 32-bit Multiplier register (shift right, LSB to Control), 64-bit adder, 64-bit Product register (write), and Control]

Multiplication
Example 3. (4-bit multiplication: 1101 x 0101)
Step                                   Multiplicand   Multiplier   Product
Initial values                         xxxx1101       0101         00000000
1 → add multiplicand to product        xxxx1101       0101         00001101
Shift M'cand left, M'plier right       xxx11010       0010         00001101
0 → do nothing                         xxx11010       0010         00001101
Shift M'cand left, M'plier right       xx110100       0001         00001101
1 → add multiplicand to product        xx110100       0001         01000001
Shift M'cand left, M'plier right       x1101000       0000         01000001
0 → do nothing                         x1101000       0000         01000001
Shift M'cand left, M'plier right       11010000       0000         01000001
Result: 1101 x 0101 = 01000001 (13 x 5 = 65)
[Figure: the same datapath scaled down, with an 8-bit multiplicand register, a 4-bit multiplier register, an 8-bit adder, and an 8-bit product register]

Multiplication
A cheaper implementation: even though we are only adding 32 bits at a time, the first design needs a 64-bit adder. Instead, hold the multiplicand still and shift the product register right! Now we are only adding 32 bits each time, with an extra bit for the carry out.
[Figure: 32-bit Multiplicand register, 32-bit adder, 64-bit Product register split into left half (LH) and right half (RH) with shift right and write, Multiplier register with shift right, and Control]

Multiplication
A cheaper than the cheaper implementation: note that we are shifting bits out of the multiplier and into the product. Why not put these together into the same register? As space opens up in the multiplier, overwrite it with the product bits.
[Figure: 32-bit Multiplicand register, 32-bit adder, one 64-bit register holding the left half of the product and the multiplier (shift right, write, LSB to Control), and Control]
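The shared-register scheme can be sketched in a few lines of Python for unsigned operands. The function name `multiply` is illustrative; Python's unbounded integers stand in for the 64-bit register plus its extra carry bit.

```python
def multiply(multiplicand, multiplier, bits=32):
    """Unsigned shift-add multiply with a combined product/multiplier register."""
    mask = (1 << bits) - 1
    # Right half starts as the multiplier, left half starts as 0.
    product = multiplier & mask
    for _ in range(bits):
        if product & 1:
            # LSB is 1: add the multiplicand into the left half.
            # The sum may need one extra carry bit above the left half.
            product += (multiplicand & mask) << bits
        # Shift right: drops the used multiplier bit, makes room for the product.
        product >>= 1
    return product

print(multiply(13, 5, bits=4))   # the 4-bit example 1101 x 0101
```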

Multiplication
Fast multiplication: use 31 32-bit adders to sum the partial products. One input of each adder is the multiplicand ANDed with a multiplier bit, and the other is the partial product from the previous step.
Question: show the multiplication tree to compute 5 x 3. Assume unsigned numbers represented using 3 bits and a 4-bit ALU.

Multiplication
MIPS multiplication: two multiplication instructions
mult  $s0, $s1   # hi||lo = $s0 * $s1 (signed)
multu $s0, $s1   # hi||lo = $s0 * $s1 (unsigned)
Encoding (R-type):
Field   op   rs   rt   rd   shamt   funct
Bits    6    5    5    5    5       6
The result is 64 bits and is stored in two special registers:
Lo → holds the lower 32 bits of the result
Hi → holds the upper 32 bits of the result
The contents of these registers can be read using two special instructions:
mfhi $t5   # move Hi to register $t5
mflo $t6   # move Lo to register $t6

Multiplication
MIPS multiplication (notes): both multiplication instructions ignore overflow! It is the responsibility of the software to check whether the result fits into 32 bits.
For MULTU, there is no overflow if hi is 0.
For MULT, there is no overflow if hi is the replicated sign of lo.
Question: modify the designed multiplier to support signed multiplication.
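The two overflow checks on hi and lo can be expressed as follows. This is a software model, not MIPS code; `mult_hi_lo`, `fits_unsigned`, and `fits_signed` are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF

def mult_hi_lo(a, b):
    """Return (hi, lo): the two 32-bit halves of the 64-bit product a * b."""
    product = a * b
    # Masking a negative Python int yields its 2s-complement bit pattern.
    return (product >> 32) & MASK32, product & MASK32

def fits_unsigned(hi, lo):
    """MULTU check: the product fits in 32 bits when hi is 0."""
    return hi == 0

def fits_signed(hi, lo):
    """MULT check: hi must be 32 copies of the sign bit of lo."""
    return hi == (MASK32 if lo >> 31 else 0)

print(fits_signed(*mult_hi_lo(-3, 4)))             # -12 fits
print(fits_unsigned(*mult_hi_lo(0x10000, 0x10000)))  # 2**32 does not fit
```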

Division
Dividend = Divisor * Quotient + Remainder
Decimal example: 48323 / 15 gives quotient 3221 and remainder 8.
Binary example: 1001001 / 101 (73 / 5) gives quotient 1110 (14) and remainder 11 (3).
Idea: repeatedly subtract the divisor. Shift as appropriate.

Division
Looking at the alignment a little differently: make the dividend 8 bits and the divisor 4 bits by filling in with 0s (101 becomes 0101, 1001001 becomes 01001001). Each iteration, re-express the entire remainder as 8 bits. Try subtracting the shifted divisor from the current remainder each time; if it doesn't fit, restore the remainder.
Remainder   Shifted divisor   Fits?   Quotient bit   New remainder
01001001    01010000          no      0              01001001
01001001    00101000          yes     1              00100001
00100001    00010100          yes     1              00001101
00001101    00001010          yes     1              00000011
00000011    00000101          no      0              00000011
Quotient = 01110, remainder = 00000011.
Note: at any step, dividend = divisor * quotient + current remainder.

Division
Division hardware:
1. Hold the divisor (32 bits) and shift it right (requires 64 bits)
2. Hold the remainder (64 bits)
3. Hold the quotient (result) (32 bits) and shift it left
4. Subtract the divisor from the current remainder
5. Control the whole process
[Figure: 64-bit Divisor register (shift right), 64-bit ALU, 64-bit Remainder register (write), 32-bit Quotient register (shift left), and Control]
Algorithm:
  initialize registers (divisor in the left half of its register, dividend in the remainder);
  for (i = 0; i < 33; i++) {
    remainder -= divisor;
    if (remainder < 0) {
      remainder += divisor;    // does not fit: restore
      left shift quotient by 1, LSB = 0
    } else {
      left shift quotient by 1, LSB = 1
    }
    right shift divisor by 1
  }
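A runnable Python sketch of this restoring division for unsigned operands is shown below. It assumes the divisor is shifted right by one position at the end of every iteration; the function name `divide` is illustrative.

```python
def divide(dividend, divisor, bits=32):
    """Unsigned restoring division; return (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("software must check for a zero divisor")
    remainder = dividend
    shifted = divisor << bits        # divisor starts in the left half
    quotient = 0
    for _ in range(bits + 1):        # 33 iterations for 32-bit operands
        remainder -= shifted
        if remainder < 0:
            remainder += shifted     # does not fit: restore
            quotient = quotient << 1             # new LSB = 0
        else:
            quotient = (quotient << 1) | 1       # new LSB = 1
        shifted >>= 1                # move the divisor one place right
    return quotient, remainder

print(divide(73, 5, bits=4))   # 1001001 / 101
print(divide(48323, 15))
```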

Division Read pages 236 -242

Division
MIPS division: two division instructions
div  $s0, $s1   # lo = $s0 / $s1, hi = $s0 mod $s1 (signed)
divu $s0, $s1   # lo = $s0 / $s1, hi = $s0 mod $s1 (unsigned)
Encoding (R-type):
Field   op   rs   rt   rd   shamt   funct
Bits    6    5    5    5    5       6
As with multiply, divide ignores overflow, so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.
Signed division: remember the signs of the dividend and divisor and use them to determine the sign of the quotient. The sign of the remainder is always the same as the sign of the dividend. (Check by yourself the division of 5/2 using different combinations of the signs of the dividend and the divisor.)
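The sign rules for signed division can be tried out for the four sign combinations of 5 / 2. Note that Python's own `//` and `%` round toward negative infinity, so the sketch divides magnitudes and then applies the signs; `signed_div` is a name invented for this example.

```python
def signed_div(dividend, divisor):
    """Return (quotient, remainder) with the remainder taking the dividend's sign."""
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient             # signs differ: negative quotient
    remainder = dividend - quotient * divisor
    return quotient, remainder

for n, d in [(5, 2), (-5, 2), (5, -2), (-5, -2)]:
    q, r = signed_div(n, d)
    print(f"{n} / {d}: quotient {q}, remainder {r}")
```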

Floating Point Numbers
Numbers used so far are 32-bit integers! How about larger and smaller values? How about fractions?
4,600,000,000 or 4.6 x 10^9
0.0000000000000000000000000166 or 1.6 x 10^-27
3.5, -0.0213
The IEEE 754 FP standard: uses 32 bits (single precision) or 64 bits (double precision) to represent numbers. Any number is represented by 3 parts: sign, significand, and exponent. Used in most computers.

Floating Point Numbers
The IEEE 754 FP standard, single precision (32 bits)
Field   Sign    Exponent   Fraction
Bits    1 bit   8 bits     23 bits
Normalized representation: no leading zeros, and exactly one nonzero bit to the left of the binary point in the significand. Since the bit to the left of the binary point is always 1, it is implied and not stored in the fraction (WHY?)
Value = (-1)^Sign x (1 + Fraction) x 2^Exponent
Smallest positive normalized number is 1.175494350822288e-038
Largest number is 3.402823466385289e+038

Floating Point Numbers
The IEEE 754 FP standard, double precision (64 bits)
Field   Sign    Exponent   Fraction
Bits    1 bit   11 bits    52 bits
Normalized representation: no leading zeros, and exactly one nonzero bit to the left of the binary point in the significand. Since the bit to the left of the binary point is always 1, it is implied and not stored in the fraction (WHY?)
Value = (-1)^Sign x (1 + Fraction) x 2^Exponent
Smallest positive normalized number is 2.225073858507201e-308
Largest number is 1.797693134862316e+308

Floating Point Numbers
The IEEE 754 FP standard: the way numbers are represented simplifies sorting of floating point numbers using integer comparison.
The fraction is in sign-magnitude form.
The exponent is placed before the significand.
The exponent is signed, but it is stored in biased form rather than in 2s complement: a constant value is added so that all exponents are represented as non-negative numbers.
In single precision, the bias is 127:
Exponent -3 is represented as -3 + 127 = 124
Exponent 5 is represented as 5 + 127 = 132
In double precision, the bias is 1023.
So in biased notation
Value = (-1)^Sign x (1 + Fraction) x 2^(Exponent - Bias)

Floating Point Numbers
Example 4. Show the IEEE 754 representation of -0.75 using the single and double precision formats.
(0.75)ten = (0.11)two, so (-0.75)ten = (-0.11)two (we use sign and magnitude)
In binary scientific notation: -0.11two x 2^0
In normalized binary scientific notation: -1.1two x 2^-1
Add the bias to the exponent:
In single precision, add 127 → stored exponent = -1 + 127 = 126
In double precision, add 1023 → stored exponent = -1 + 1023 = 1022
Convert the exponent into binary: 126 = (01111110)two, 1022 = (01111111110)two
Drop the 1 on the left of the binary point and fill the corresponding fields.

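Floating Point Numbers
Example 4 (continued). The IEEE 754 representation of -0.75, following the steps above:
Single precision: Sign = 1, Exponent = 01111110, Fraction = 1000 … 0 (23 bits)
Double precision: Sign = 1, Exponent = 01111111110, Fraction = 1000 … 0 (52 bits)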
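The fields for -0.75 can be confirmed with Python's standard `struct` module, which packs floats in IEEE 754 format. The helper names `float_fields` and `double_fields` are made up for this sketch.

```python
import struct

def float_fields(x):
    """Return (sign, biased exponent, fraction bits) of x in single precision."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return raw >> 31, (raw >> 23) & 0xFF, raw & 0x7FFFFF

def double_fields(x):
    """Return (sign, biased exponent, fraction bits) of x in double precision."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    return raw >> 63, (raw >> 52) & 0x7FF, raw & ((1 << 52) - 1)

sign, exponent, fraction = float_fields(-0.75)
print(sign, f"{exponent:08b}", f"{fraction:023b}")
sign, exponent, fraction = double_fields(-0.75)
print(sign, f"{exponent:011b}", f"{fraction:052b}")
```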

Floating Point Numbers
Example 5. What is the value represented by the following IEEE 754 single precision number?
Sign = 1, Exponent = 10000001 (129), Fraction = 0100 … 0 (0.25)
N = (-1)^S x (1 + Fraction) x 2^(Exponent - Bias)
  = (-1)^1 x (1 + 0.25) x 2^(129 - 127)
  = -1 x 1.25 x 2^2
  = -1.25 x 4
  = -5
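The decoding formula can be applied mechanically to a 32-bit pattern. The sketch below handles normalized numbers only (biased exponent 1 to 254); the function name `decode_single` and the hexadecimal form of the bit pattern are introduced here for illustration.

```python
def decode_single(raw):
    """Decode a normalized IEEE 754 single-precision bit pattern."""
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF
    fraction = (raw & 0x7FFFFF) / 2**23     # fraction field as a value in [0, 1)
    # Value = (-1)^S x (1 + Fraction) x 2^(Exponent - Bias), with Bias = 127
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)

# Sign 1, exponent 10000001, fraction 0100...0
print(decode_single(0b1_10000001_01000000000000000000000))
```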

Floating Point Numbers
Special numbers in the IEEE 754 standard
Single Precision       Double Precision       Object Represented
E (8)     F (23)       E (11)    F (52)
0         0            0         0            true zero (0)
0         nonzero      0         nonzero      ± denormalized number
1-254     anything     1-2046    anything     ± floating point number
255       0            2047      0            ± infinity
255       nonzero      2047      nonzero      not a number (NaN)

Floating Point Numbers
Addition of floating point numbers: analogous to adding decimal numbers in scientific notation.
Example: 9.999 x 10^1 + 1.610 x 10^-1 (using four digits)
Steps to perform (F1 x 2^E1) + (F2 x 2^E2) = F3 x 2^E3:
Step 0: Restore the hidden bit in F1 and in F2
Step 1: Align the fractions by right shifting F2 by E1 - E2 positions (assuming E1 ≥ E2)
Step 2: Add the resulting F2 to F1 to form F3
Step 3: Normalize F3 (so it is in the form 1.XXXXX …) and check for overflow/underflow in the exponent
Step 4: Round F3 and possibly normalize F3 again
Step 5: Rehide the most significant bit of F3 before storing the result

Floating Point Numbers
Example 6. Show how to add 0.625 and -0.125 using the floating point binary representation.
In normalized scientific notation this is equivalent to: 1.010 x 2^-1 + (-1.000 x 2^-3)
Align exponents: 1.010 x 2^-1 + (-0.010 x 2^-1)
Add significands: 1.000 x 2^-1
Normalize the sum (if necessary) and check for overflow/underflow
Round the sum and normalize again (if necessary)
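The align, add, and normalize steps of this example can be followed in a simplified Python sketch. It represents each operand as (sign, integer significand with the hidden bit restored, exponent), truncates bits shifted out during alignment, and skips rounding and exponent overflow checks; the name `fp_add` and this representation are assumptions made for illustration.

```python
def fp_add(s1, m1, e1, s2, m2, e2, frac_bits=3):
    """Add two numbers given as sign, significand (1.xxx as an integer), exponent."""
    if e1 < e2:                                   # make operand 1 the larger exponent
        s1, m1, e1, s2, m2, e2 = s2, m2, e2, s1, m1, e1
    m2 >>= e1 - e2                                # align: right shift the smaller one
    total = (-m1 if s1 else m1) + (-m2 if s2 else m2)   # add signed significands
    sign, mag, exp = int(total < 0), abs(total), e1
    if mag == 0:
        return 0, 0, 0
    while mag >= (2 << frac_bits):                # normalize down to 1.xxx
        mag >>= 1
        exp += 1
    while mag < (1 << frac_bits):                 # normalize up to 1.xxx
        mag <<= 1
        exp -= 1
    return sign, mag, exp

# 1.010 x 2^-1 + (-1.000 x 2^-3)
sign, mag, exp = fp_add(0, 0b1010, -1, 1, 0b1000, -3)
print(sign, f"{mag:04b}", exp)                    # significand bits read as 1.000
```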

Floating Point Numbers Addition hardware of floating numbers

Floating Point Numbers
Accurate arithmetic: in arithmetic we are restricted by the number of bits, so we may need to truncate the operand with the smaller exponent to fit into the available bits. The IEEE 754 standard defines two extra bits to the right of the number during intermediate operations: the guard and round bits.
Decimal example: 2.56 x 10^0 + 2.34 x 10^2. Assume the significand is represented in 3 digits only.
Without guard and round digits (two digits are truncated): (2.34 + 0.02) x 10^2 = 2.36 x 10^2
With guard and round digits, we don't have to truncate the small number when it is shifted to the right to match the large number: (2.3400 + 0.0256) x 10^2 = 2.3656 x 10^2 = 2.37 x 10^2 (after rounding)
Sticky bit!
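The decimal example can be replayed with Python's standard `decimal` module, which makes the truncation and rounding explicit. The variable names are illustrative, and round-half-even is assumed as the rounding mode for the final step.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

hundredths = Decimal("0.01")          # 3 significant digits of 2.xx
large = Decimal("2.34")               # significand of 2.34 x 10^2
small = Decimal("2.56") / 100         # 2.56 x 10^0 aligned to 10^2 is 0.0256

# Without guard and round digits: the aligned operand is truncated first.
no_guard = large + small.quantize(hundredths, rounding=ROUND_DOWN)

# With two extra digits: add at full width, then round once at the end.
with_guard = (large + small).quantize(hundredths, rounding=ROUND_HALF_EVEN)

print(no_guard, with_guard)
```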

Floating Point Numbers
MIPS floating point support: the MIPS ISA defines a separate floating point register file with registers $f0 - $f31 (each is 32 bits). Registers are combined in pairs for double precision arithmetic.
Some instructions:
lwc1  $f1,54($s2)    # $f1 = Memory[$s2+54]
swc1  $f1,58($s4)    # Memory[$s4+58] = $f1
add.s $f2,$f4,$f6    # $f2 = $f4 + $f6
add.d $f2,$f4,$f6    # $f2||$f3 = $f4||$f5 + $f6||$f7

Floating Point Numbers
MIPS floating point support
Compare instructions (x is a comparison such as eq, lt, or le):
c.x.s $f2,$f4    # if ($f2 x $f4) cond=1; else cond=0
c.x.d $f2,$f4    # if ($f2||$f3 x $f4||$f5) cond=1; else cond=0
Branch instructions:
bc1t 25          # if (cond==1) go to PC+4+100
bc1f 25          # if (cond==0) go to PC+4+100

Fallacies and Pitfalls
Fallacy 1. Only theoretical mathematicians care about floating point accuracy. (Counterexample: the Pentium bug of 1994.)
Pitfall 1. Just as a left shift instruction can replace an integer multiply by a power of 2, a right shift is the same as an integer division by a power of 2. Not true for signed numbers.
Pitfall 2. The MIPS instruction addiu sign-extends its 16-bit immediate.
Pitfall 2 → all immediates are represented using 15 bits.
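Both pitfalls are easy to demonstrate in Python, whose `>>` on integers behaves as an arithmetic right shift. The helpers `truncating_div` and `sign_extend16` are hypothetical names for this sketch.

```python
def truncating_div(x, d):
    """Integer division that rounds toward zero, as a divide instruction does."""
    q = abs(x) // abs(d)
    return -q if (x < 0) != (d < 0) else q

def sign_extend16(imm):
    """Interpret a 16-bit immediate as a signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

# Pitfall 1: an arithmetic right shift rounds toward minus infinity,
# so it disagrees with division for negative odd values.
print(-5 >> 2, truncating_div(-5, 4))

# Pitfall 2: a sign-extended immediate with the top bit set is negative.
print(sign_extend16(0xFFFF), sign_extend16(0x7FFF))
```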