CS 314 Computer Organization Fall Chapter 3: Arithmetic for Computers

Slides:



Advertisements
Similar presentations
CSE431 Chapter 3.1Irwin, PSU, 2008 CSE 431 Computer Architecture Fall 2008 Chapter 3: Arithmetic for Computers Mary Jane Irwin ( )
Advertisements

Chapter 3 Arithmetic for Computers. Exam 1 CSCE
Arithmetic for Computers
CMPT 334 Computer Organization Chapter 3 Arithmetic for Computers [Adapted from Computer Organization and Design 5 th Edition, Patterson & Hennessy, ©
Chapter 3 Arithmetic for Computers. Multiplication More complicated than addition accomplished via shifting and addition More time and more area Let's.
361 div.1 Computer Architecture ECE 361 Lecture 7: ALU Design : Division.
Lecture 9 Sept 28 Chapter 3 Arithmetic for Computers.
1 Representing Numbers Using Bases Numbers in base 10 are called decimal numbers, they are composed of 10 numerals ( ספרות ) = 9* * *10.
ECE 15B Computer Organization Spring 2010 Dmitri Strukov Lecture 4: Arithmetic / Data Transfer Instructions Partially adapted from Computer Organization.
1  2004 Morgan Kaufmann Publishers Chapter Three.
Computer Organization Multiplication and Division Feb 2005 Reading: Portions of these slides are derived from: Textbook figures © 1998 Morgan Kaufmann.
CSE431 L03 MIPS Arithmetic Review.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 03: MIPS Arithmetic Review Mary Jane Irwin (
Chap 3.3~3.5 Construction an Arithmetic Logic Unit (ALU) Jen-Chang Liu, Spring 2006.
Computer Systems Organization: Lecture 3
1 Chapter 4: Arithmetic Where we've been: –Performance (seconds, cycles, instructions) –Abstractions: Instruction Set Architecture Assembly Language and.
Lecture 5 Sept 14 Goals: Chapter 2 continued MIPS assembly language instruction formats translating c into MIPS - examples.
1 Lecture 8: Binary Multiplication & Division Today’s topics:  Addition/Subtraction  Multiplication  Division Reminder: get started early on assignment.
ECE 15B Computer Organization Spring 2010 Dmitri Strukov Lecture 6: Logic/Shift Instructions Partially adapted from Computer Organization and Design, 4.
1  1998 Morgan Kaufmann Publishers Chapter Four Arithmetic for Computers.
Chapter 3 Arithmetic for Computers. Arithmetic Where we've been: Abstractions: Instruction Set Architecture Assembly Language and Machine Language What's.
Arithmetic for Computers
1 Bits are just bits (no inherent meaning) — conventions define relationship between bits and numbers Binary numbers (base 2)
1 EGRE 426 Fall 08 Chapter Three. 2 Arithmetic What's up ahead: –Implementing the Architecture 32 operation result a b ALU.
Lecture 6: Multiply, Shift, and Divide
Computer Architecture Chapter 3 Instructions: Arithmetic for Computer Yu-Lun Kuo 郭育倫 Department of Computer Science and Information Engineering Tunghai.
Chapter 3 Arithmetic for Computers (Integers). Florida A & M University - Department of Computer and Information Sciences Arithmetic for Computers Operations.
Chapter 10 The Assembly Process. What Assemblers Do Translates assembly language into machine code. Assigns addresses to all symbolic labels (variables.
Integer Multiplication and Division ICS 233 Computer Architecture and Assembly Language Dr. Aiman El-Maleh College of Computer Sciences and Engineering.
Csci 136 Computer Architecture II – Multiplication and Division
Computer Organization CS224 Fall 2012 Lessons 7 and 8.
CPE 232 MIPS Arithmetic1 CPE 232 Computer Organization MIPS Arithmetic – Part I Dr. Gheith Abandah [Adapted from the slides of Professor Mary Irwin (
1 ELEN 033 Lecture 4 Chapter 4 of Text (COD2E) Chapters 3 and 4 of Goodman and Miller book.
EI 209 Chapter 3.1CSE, 2015 EI 209 Computer Organization Fall 2015 Chapter 3: Arithmetic for Computers Haojin Zhu ( )
COM181 Computer Hardware Lecture 6: The MIPs CPU.
Computer Arthmetic Chapter Four P&H. Data Representation Why do we not encode numbers as strings of ASCII digits inside computers? What is overflow when.
Integer Multiplication and Division COE 301 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University.
9/23/2004Comp 120 Fall September Chapter 4 – Arithmetic and its implementation Assignments 5,6 and 7 posted to the class web page.
EE204 L03-ALUHina Anwar Khan EE204 Computer Architecture Lecture 03- ALU.
By Wannarat Computer System Design Lecture 3 Wannarat Suntiamorntut.
CSE 340 Computer Architecture Spring 2016 MIPS Arithmetic Review.
Based on slides from D. Patterson and www-inst.eecs.berkeley.edu/~cs152/ COM 249 – Computer Organization and Assembly Language Chapter 3 Arithmetic For.
1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,
Computer System Design Lecture 3
Computer Arthmetic Chapter Four P&H.
CS 230: Computer Organization and Assembly Language
Integer Multiplication and Division
Arithmetic for Computers
Morgan Kaufmann Publishers
Instructions: Language of the Computer
Computer Organization and Design Instruction Sets - 2
Lecture 8: Binary Multiplication & Division
CDA 3101 Summer 2007 Introduction to Computer Organization
Computer Organization and Design Instruction Sets
Lecture 8: Addition, Multiplication & Division
Lecture 8: Addition, Multiplication & Division
The University of Adelaide, School of Computer Science
ECE232: Hardware Organization and Design
Rocky K. C. Chang 6 November 2017
Instruction encoding The ISA defines Format = Encoding
The University of Adelaide, School of Computer Science
Flow of Control -- Conditional branch instructions
A 1-Bit Arithmetic Logic Unit
CSC3050 – Computer Architecture
MIPS Assembly.
Morgan Kaufmann Publishers Arithmetic for Computers
Flow of Control -- Conditional branch instructions
Chapter 3 Arithmetic for Computers
CS352H Computer Systems Architecture
Number Representation
MIPS Arithmetic and Logic Instructions
Presentation transcript:

CS 314 Computer Organization Fall 2017 Chapter 3: Arithmetic for Computers Combinations to AV system, etc. 0808 (1988 in 113 IST) Call AV hot line at 8-777-0035 Haojin Zhu (http://tdt.sjtu.edu.cn/~hjzhu/ ) [Adapted from Computer Organization and Design, 4th Edition, Patterson & Hennessy, © 2012, MK]

Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number of instruction formats opcode always the first 6 bits Smaller is faster limited instruction set limited number of registers in register file limited number of addressing modes Make the common case fast arithmetic operands from the register file (load-store machine) allow instructions to contain immediate operands Good design demands good compromises three instruction formats

Specifying Branch Destinations Use a register (like in lw and sw) added to the 16-bit offset which register? Instruction Address Register (the PC) its use is automatically implied by instruction PC gets updated (PC+4) during the fetch cycle so that it holds the address of the next instruction limits the branch distance to -215 to +215-1 (word) instructions from the (instruction after the) branch instruction, but most branches are local anyway Note that two low order 0’s are concatenated to the 16 bit field giving an eighteen bit address 0111111111111111 00 = 2**15 –1 words (2**17 – 1 bytes) PC Add 32 offset 16 00 sign-extend from the low order 16 bits of the branch instruction branch dst address ? 4

Other Control Flow Instructions MIPS also has an unconditional branch instruction or jump instruction: j label #go to label Instruction Format (J Format): 0x02 26-bit address If-then-else code compilation PC 4 32 26 00 from the low order 26 bits of the jump instruction Why shift left by two bits?

Review: MIPS Addressing Modes Illustrated 1. Register addressing op rs rt rd funct Register word operand op rs rt offset 2. Base (displacement) addressing base register Memory word or byte operand 3. Immediate addressing op rs rt operand 4. PC-relative addressing op rs rt offset Program Counter (PC) Memory branch destination instruction 5. Pseudo-direct addressing op jump address Program Counter (PC) Memory jump destination instruction ||

Number Representations 32-bit signed numbers (2’s complement): 0000 0000 0000 0000 0000 0000 0000 0000two = 0ten 0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten ... 0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten 0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten 1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten 1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten ... 1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten 1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten MSB LSB maxint minint Converting <32-bit values into 32-bit values copy the most significant bit (the sign bit) into the “empty” bits 0010 -> 0000 0010 1010 -> 1111 1010 sign extend versus zero extend (lb vs. lbu)

MIPS Arithmetic Logic Unit (ALU) 32 m (operation) result A B ALU 4 zero ovf 1 Must support the Arithmetic/Logic operations of the ISA add, addi, addiu, addu sub, subu mult, multu, div, divu sqrt and, andi, nor, or, ori, xor, xori beq, bne, slt, slti, sltiu, sltu With special handling for sign extend – addi, addiu, slti, sltiu zero extend – andi, ori, xori overflow detection – add, addi, sub

Result indicating overflow Dealing with Overflow Overflow occurs when the result of an operation cannot be represented in 32-bits, i.e., when the sign bit contains a value bit of the result and not the proper sign bit When adding operands with different signs or when subtracting operands with the same sign, overflow can never occur Operation Operand A Operand B Result indicating overflow A + B ≥ 0 < 0 A - B MIPS signals overflow with an exception (aka interrupt) – an unscheduled procedure call where the EPC contains the address of the instruction that caused the exception

Addition & Subtraction Just like in grade school (carry/borrow 1s) 0111 0111 0110 + 0110 - 0110 - 0101 Two's complement operations are easy do subtraction by negating and then adding 0111  0111 - 0110  + 1010 Overflow (result too large for finite computer word) e.g., adding two n-bit numbers does not yield an n-bit number 0111 + 0001 1101 0001 0001 for lecture 0001 1 0001 1000

Building a 1-bit Binary Adder carry_in A B carry_in carry_out S 1 A 1 bit Full Adder S B carry_out S = A xor B xor carry_in carry_out = A&B | A&carry_in | B&carry_in (majority function) How can we use it to build a 32-bit adder? How can we modify it easily to build an adder/subtractor?

Building 32-bit Adder 1-bit FA A0 B0 S0 c0=carry_in c1 Just connect the carry-out of the least significant bit FA to the carry-in of the next least significant bit and connect . . . 1-bit FA A1 B1 S1 c2 1-bit FA A2 B2 S2 c3 Ripple Carry Adder (RCA) advantage: simple logic, so small (low cost) disadvantage: slow and lots of glitching (so lots of energy consumption) c32=carry_out 1-bit FA A31 B31 S31 c31 . . .

A 32-bit Ripple Carry Adder/Subtractor add/sub 1-bit FA S0 c0=carry_in c1 S1 c2 S2 c3 c32=carry_out S31 c31 . . . A0 A1 A2 A31 Remember 2’s complement is just complement all the bits add a 1 in the least significant bit B0 control (0=add,1=sub) B0 if control = 0 !B0 if control = 1 For lecture A 0111  0111 B - 0110  + 1001 1 0001 1 0001

Overflow Detection Logic Carry into MSB ! = Carry out of MSB For a N-bit ALU: Overflow = CarryIn [N-1] XOR CarryOut [N-1] A0 B0 1-bit ALU Result0 CarryIn0 CarryOut0 A1 B1 Result1 CarryIn1 CarryOut1 A2 B2 Result2 CarryIn2 CarryOut2 A3 B3 Result3 CarryIn3 CarryOut3 X Y X XOR Y 1 1 1 1 Recall the XOR gate implements the not equal function: that is, its output is 1 only if the inputs have different values. Therefore all we need to do is connect the carry into the most significant bit and the carry out of the most significant bit to the XOR gate. Then the output of the XOR gate will give us the Overflow signal. +1 = 42 min. (Y:22) 1 1 why? Overflow

Multiply Binary multiplication is just a bunch of right shifts and adds n multiplicand multiplier partial product array can be formed in parallel and added in parallel for faster multiplication n double precision product 2n

More complicated than addition Multiplication More complicated than addition Can be accomplished via shifting and adding 0010 (multiplicand) x_1011 (multiplier) 0010 0010 (partial product 0000 array) 0010 00010110 (product) In every step multiplicand is shifted next bit of multiplier is examined (also a shifting step) if this bit is 1, shifted multiplicand is added to the product

Multiplication Algorithm 1 In every step multiplicand is shifted next bit of multiplier is examined (also a shifting step) if this bit is 1, shifted multiplicand is added to the product

Comments on Multiplicand Algorithm 1 Performance Three basic steps for each bit It requires 100 clock cycles to multiply two 32-bit numbers If each step took a clock cycle, How to improve it? Motivation (Performing the operations in parallel): Putting multiplier and the product together Shift them together

Refined Multiplicand Algorithm 2 add 32-bit ALU shift right product multiplier Control 32-bit ALU and multiplicand is untouched the sum keeps shifting right at every step, number of bits in product + multiplier = 64, hence, they share a single 64-bit register

Add and Right Shift Multiplier Hardware 0 1 1 0 = 6 multiplicand add 32-bit ALU shift right product For lecture multiplier Control 0 0 0 0 0 1 0 1 = 5 add 0 1 1 0 0 1 0 1 0 0 1 1 0 0 1 0 add 0 0 1 1 0 0 1 0 0 0 0 1 1 0 0 1 add 0 1 1 1 1 0 0 1 0 0 1 1 1 1 0 0 add 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 0 = 30

Exercise Using 4-bit numbers to save space, multiply 2ten*3ten, or 0010two * 0011two

dividend = quotient x divisor + remainder Division Division is just a bunch of quotient digit guesses and left shifts and subtracts dividend = quotient x divisor + remainder n n quotient dividend divisor partial remainder array remainder n

Division 1001ten Quotient Divisor 1000ten | 1001010ten Dividend -1000 10ten Remainder At every step, shift divisor right and compare it with current dividend if divisor is larger, shift 0 as the next bit of the quotient if divisor is smaller, subtract to get new dividend and shift 1 as the next bit of the quotient

First Version of Hardware for Division A comparison requires a subtract; the sign of the result is examined; if the result is negative, the divisor must be added back

Divide Algorithm 3. Shift the Divisor register right1 bit. Start 1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register. Remainder >=0 Remainder < 0 Test Remainder 2b. Restore the original value by adding the Divisor reg to the Remainder reg and place the sum in the Remainder reg. Also shift the Quotient register to the left, setting the new LSB to 0 2a. Shift the Quotient register to the left Q: 0000 D: 0010 0000 R: 0000 0111 –D = 1110 0000 1: R = R–D Q: 0000 D: 0010 0000 R: 1110 0111 2b: +D, sl Q, 0 Q: 0000 D: 0010 0000 R: 0000 0111 3: Shr D Q: 0000 D: 0001 0000 R: 0000 0111 –D = 1111 0000 1: R = R–D Q: 0000 D: 0001 0000 R: 1111 0111 2b: +D, sl Q, 0 Q: 0000 D: 0001 0000 R: 0000 0111 3: Shr D Q: 0000 D: 0000 1000 R: 0000 0111 –D = 1111 1000 1: R = R–D Q: 0000 D: 0000 1000 R: 1111 1111 2b: +D, sl Q, 0 Q: 0000 D: 0000 1000 R: 0000 0111 3: Shr D Q: 0000 D: 0000 0100 R: 0000 0111 –D = 1111 1100 1: R = R–D Q: 0000 D: 0000 0100 R: 0000 0011 2a: sl Q, 1 Q: 0001 D: 0000 0100 R: 0000 0011 3: Shr D Q: 0000 D: 0000 0010 R: 0000 0011 –D = 1111 1110 1: R = R–D Q: 0000 D: 0000 0010 R: 0000 0001 2a: sl Q, 1 Q: 0011 D: 0000 0010 R: 0000 0001 3: Shr D Q: 0011 D: 0000 0001 R: 0000 0001 Recommend show 2’s comp of divisor, show lines for subtract divisor and restore remainder Supplement: 1st edition needs 33 iterations while 2nd and 3rd versions don’t. setting the new rightmost bit to 1. 3. Shift the Divisor register right1 bit. No: < 33 repetitions 33rd repetition? Yes: 33 repetitions Done

Divide Example Divide 7ten (0000 0111two) by 2ten (0010two) Iter Step Quot Divisor Remainder Initial values 1 2 3 4 5

Divide Example Divide 7ten (0000 0111two) by 2ten (0010two) Iter Step Quot Divisor Remainder Initial values 0000 0010 0000 0000 0111 1 Rem = Rem – Div Rem < 0  +Div, shift 0 into Q Shift Div right 0001 0000 1110 0111 2 Same steps as 1 0000 1000 1111 0111 3 0000 0100 4 Rem >= 0  shift 1 into Q 0001 0000 0010 0000 0011 5 Same steps as 4 0011 0000 0001

Efficient Division Remainder Quotient Divisor 64-bit ALU Control Shift Right Shift Left Write Control 32 bits 64 bits divisor subtract 32-bit ALU shift left dividend remainder quotient Control

Left Shift and Subtract Division Hardware 0 0 1 0 = 2 divisor subtract 32-bit ALU shift left dividend For lecture remainder quotient Control 0 0 0 0 0 1 1 0 = 6 0 0 0 0 1 1 0 0 sub 1 1 1 0 1 1 0 0 rem neg, so ‘ient bit = 0 0 0 0 0 1 1 0 0 restore remainder 0 0 0 1 1 0 0 0 sub 1 1 1 1 1 1 0 0 rem neg, so ‘ient bit = 0 0 0 0 1 1 0 0 0 restore remainder 0 0 1 1 0 0 0 0 sub 0 0 0 1 0 0 0 1 rem pos, so ‘ient bit = 1 0 0 1 0 0 0 1 0 sub 0 0 0 0 0 0 1 1 rem pos, so ‘ient bit = 1 = 3 with 0 remainder

Restoring Unsigned Integer Division s(0) = z for j = 1 to k if 2 s(j-1) - 2k d > 0 qk-j = 1 s(j) = 2 s(j-1) - 2k d else qk-j = 0 s(j) = 2 s(j-1) K=32, put divisor in the left 32 bit register the remainder shift left by 1 bit No need to restore the remainder in the case of R-D>0, Restore the remainder In the case of R-D<0,

Non-Restoring Unsigned Integer Division s(1) = 2 z - 2k d for j = 2 to k if s(j-1)  0 qk-(j-1) = 1 s(j) = 2 s(j-1) - 2k d else qk-(j-1) = 0 s(j) = 2 s(j-1) + 2k d end for if s(k)  0 q0 = 1 q0 = 0 Correction step If in the last step, remainder –divisor >0, Perform subtraction why? If in the last step, remainder –divisor <0, Perform addition

s(1) = 2 z - 2k d s(0) = z for j = 2 to k if s(j-1)  0 for j = 1 to k considering two consequent steps j-1 and j, in particular 2s(j-2) - 2k d <0 s(1) = 2 z - 2k d for j = 2 to k if s(j-1)  0 qk-(j-1) = 1 s(j) = 2 s(j-1) - 2k d else qk-(j-1) = 0 s(j) = 2 s(j-1) + 2k d end for if s(k)  0 q0 = 1 q0 = 0 Correction step s(0) = z for j = 1 to k if 2 s(j-1) - 2k d > 0 qk-j = 1 s(j) = 2 s(j-1) - 2k d else qk-j = 0 s(j) = 2 s(j-1) In the j-1 step, Restoring Algorithm computes qk-j = 0 s(j-1) = 2 s(j-2) equal In the subsequent j step, Restoring Algorithm computes 2 s(j-1) - 2k d == 2*2 s(j-2) - 2k d Why? Non-Restoring Algorithm s(j-1) = 2 s(j-2) - 2k d Restoring Unsigned Integer Division In the subsequent j step, non-Restoring Algorithm computes 2 s(j-1) + 2k d = 2*2 s(j-2) - 2*2k d +2k d = 2*2 s(j-2) - 2k d Non-Restoring Unsigned Integer Division 2x-y= 2(x-y)+y

Non-restoring algorithm set subtract_bit true 1: If subtract bit true: Subtract the Divisor register from the Remainder and place the result in the remainder register else Add the Divisor register to the Remainder and place the result in the remainder register 2:If Remainder >= 0 Shift the Quotient register to the left, setting rightmost bit to 1 Set subtract bit to false 3: Shift the Divisor register right 1 bit if < 33rd rep goto 1 Add Divisor register to remainder and place in Remainder register exit

Example: Perform n + 1 iterations for n bits Remainder 0000 1011 Divisor 00110000 ----------------------------------- Iteration 1: (subtract) Rem 1101 1011 Quotient 0 Divisor 0001 1000 Iteration 2: (add) Rem 11110011 Q00 Divisor 0000 1100 Iteration 3: Rem 11111111 Q000 Divisor 0000 0110 ----------------------------------- Iteration 4: (add) Rem 0000 0101 Q0001 Divisor 0000 0011 Iteration 5: (subtract) Rem 0000 0010 Q 00011 Divisor 0000 0001 Since reminder is positive, done. Q = 0011 and Rem = 0010

Exercise Calculate A divided by B using restoring and non-restoring division. A=26, B=5

MIPS Divide Instruction Divide (div and divu) generates the reminder in hi and the quotient in lo div $s0, $s1 # lo = $s0 / $s1 # hi = $s0 mod $s1 Instructions mfhi rd and mflo rd are provided to move the quotient and reminder to (user accessible) registers in the register file 0 16 17 0 0 0x1A Seems odd to me that the machine doesn’t support a double precision dividend in hi || lo but it looks like it doesn’t As with multiply, divide ignores overflow so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.

Lecture 1