Lecture 7: Multiplication and Floating Point EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering.


Lecture 7: Multiplication and Floating Point EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr. Rozier (UM)

LAB 2

Lab Phases: Recursive Phase 1 – Factorial Phase 2 - Fibonacci

Lab Phases: Arrays Phase 4 – Sum Array Phase 5 – Find Item Phase 6 – Bubble Sort

Lab Phases: Trees Array representation: [1,2,3,4,5,6,7,0,0,0,0,0,0,0,0] Phase 7 – Tree Height Phase 8 – Tree Traversal [1,2,5,0,0,4,0,0,3,6,0,0,7,0,0]

INTEGER ARITHMETIC

Addition and Subtraction

Half adder
Inputs | Outputs
A B | C S
0 0 | 0 0
0 1 | 0 1
1 0 | 0 1
1 1 | 1 0

Full Adder

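Inputs | Outputs
A B Cin | Cout S
0 0 0 | 0 0
0 0 1 | 0 1
0 1 0 | 0 1
0 1 1 | 1 0
1 0 0 | 0 1
1 0 1 | 1 0
1 1 0 | 1 0
1 1 1 | 1 1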

In Groups, Implement a 1-bit Full Adder
Inputs: A, B, Cin
Outputs: Cout, S

Getting a Full Adder [Figure: half adder and full adder]

Putting Together Multiple Bits

Making it Faster Carry Look Ahead Adder

Making it Even Faster Carry-Select Adder Kogge-Stone Adder

How do we get subtraction?
X | B2T(X) | B2U(X)
1000 | -8 | 8
1001 | -7 | 9
1010 | -6 | 10
1011 | -5 | 11
1100 | -4 | 12
1101 | -3 | 13
1110 | -2 | 14
1111 | -1 | 15

How do we get subtraction? Complement and increment: x + ~x = 11…1 = -1, so -x = ~x + 1, and a - b = a + ~b + 1

Multiplication
Start with long-multiplication approach
– multiplicand 1000 × multiplier 1001 = product 1001000
– Length of product is the sum of operand lengths

Multiplication Hardware [Figure: multiplier datapath; product register initially 0]

Optimized Multiplier Perform steps in parallel: add/shift One cycle per partial-product addition That’s ok, if frequency of multiplications is low

Faster Multiplier Uses multiple adders – Cost/performance tradeoff Can be pipelined Several multiplication performed in parallel

Multiplication
Computing the exact product of w-bit numbers x, y (either signed or unsigned)
Ranges
– Unsigned: 0 ≤ x * y ≤ (2^w - 1)^2 = 2^{2w} - 2^{w+1} + 1 (up to 2w bits)
– Two's complement min: x * y ≥ (-2^{w-1}) * (2^{w-1} - 1) = -2^{2w-2} + 2^{w-1} (up to 2w-1 bits)
– Two's complement max: x * y ≤ (-2^{w-1})^2 = 2^{2w-2} (up to 2w bits, but only for (TMin_w)^2)
Maintaining exact results
– Would need to keep expanding word size with each product computed
– Done in software by "arbitrary precision" arithmetic packages

Unsigned Multiplication in C
Operands u, v: w bits each. True product u · v: 2w bits. Discard the high-order w bits to get UMult_w(u, v): w bits.
Standard Multiplication Function
– Ignores high order w bits
Implements Modular Arithmetic
– UMult_w(u, v) = (u · v) mod 2^w

Code Security Example #2
SUN XDR library
– Widely used library for transferring data between machines
– void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size);
– [Figure: the objects pointed to by ele_src are copied into one buffer obtained from malloc(ele_cnt * ele_size)]

XDR Code

    #include <stdlib.h>
    #include <string.h>

    void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) {
        /*
         * Allocate buffer for ele_cnt objects, each of ele_size bytes
         * and copy from locations designated by ele_src
         */
        void *result = malloc(ele_cnt * ele_size);
        if (result == NULL)
            /* malloc failed */
            return NULL;
        /* char * rather than void *: standard C has no arithmetic on void * */
        char *next = result;
        int i;
        for (i = 0; i < ele_cnt; i++) {
            /* Copy object i to destination */
            memcpy(next, ele_src[i], ele_size);
            /* Move pointer to next memory region */
            next += ele_size;
        }
        return result;
    }

XDR Vulnerability
malloc(ele_cnt * ele_size)
What if:
– ele_cnt = 2^20 + 1
– ele_size = 4096 = 2^12
– Allocation = ??
How can I make this function secure?

Signed Multiplication in C
Operands u, v: w bits each. True product u · v: 2w bits. Discard the high-order w bits to get TMult_w(u, v): w bits.
Standard Multiplication Function
– Ignores high order w bits
– Some of which are different for signed vs. unsigned multiplication
– Lower bits are the same

Power-of-2 Multiply with Shift
Operation
– u << k gives u * 2^k
– Both signed and unsigned
– Operands: w bits. True product u · 2^k: w+k bits. Discard the k high-order bits to get UMult_w(u, 2^k) or TMult_w(u, 2^k): w bits, with k zeros in the low-order positions.
Examples
– u << 3 == u * 8
– (u << 5) - (u << 3) == u * 24
– Most machines shift and add faster than multiply
– Compiler generates this code automatically

Multiply on ARM
– MUL{<cond>}{S} Rd, Rm, Rs computes Rd = Rm * Rs
– MLA{<cond>}{S} Rd, Rm, Rs, Rn computes Rd = Rm * Rs + Rn

Division [Figure: long-division layout labelling quotient, dividend, divisor, remainder]
– Check for 0 divisor
– Long division approach
  – If divisor ≤ dividend bits: 1 bit in quotient, subtract
  – Otherwise: 0 bit in quotient, bring down next dividend bit
– Restoring division
  – Do the subtract, and if remainder goes < 0, add divisor back
– Signed division
  – Divide using absolute values
  – Adjust sign of quotient and remainder as required
– n-bit operands yield n-bit quotient and remainder

Division Hardware [Figure: divider datapath; remainder register initially holds the dividend, divisor register initially holds the divisor in its left half]

Optimized Divider One cycle per partial-remainder subtraction Looks a lot like a multiplier! – Same hardware can be used for both

Faster Division
– Can’t use parallel hardware as in multiplier
  – Subtraction is conditional on sign of remainder
– Faster dividers (e.g. SRT division) generate multiple quotient bits per step
  – Still require multiple steps

Division in ARM ARMv6 has no DIV instruction.

Division in ARM
ARMv6 has no DIV instruction, so division is done in software.
N = D × Q + R with 0 ≤ |R| < |D|, so N/D = Q + R/D

An Algorithm for Division

FLOATING POINT

Carnegie Mellon Fractional binary numbers What is ?

Fractional Binary Numbers
Bit positions: b_i b_{i-1} … b_2 b_1 b_0 . b_{-1} b_{-2} b_{-3} … b_{-j}, with weights 2^i, 2^{i-1}, …, 4, 2, 1, 1/2, 1/4, 1/8, …, 2^{-j}
Representation
– Bits to right of “binary point” represent fractional powers of 2
– Represents rational number: \sum_{k=-j}^{i} b_k \times 2^k

Fractional Binary Numbers: Examples
Value | Representation
5 3/ / /
Observations
– Divide by 2 by shifting right
– Multiply by 2 by shifting left
– Numbers of form 0.111…_2 are just below 1.0
  – 1/2 + 1/4 + 1/8 + … + 1/2^i + … → 1.0
  – Use notation 1.0 – ε

Representable Numbers
Limitation
– Can only exactly represent numbers of the form x/2^k
– Other rational numbers have repeating bit representations
Value | Representation
1/3 | 0.0101010101[01]…_2
1/5 | 0.001100110011[0011]…_2
1/10 | 0.0001100110011[0011]…_2

Floating Point Standard
– Defined by IEEE Std 754
– Developed in response to divergence of representations
  – Portability issues for scientific code
– Now almost universally adopted
– Two representations
  – Single precision (32-bit)
  – Double precision (64-bit)

IEEE Floating-Point Format
Fields: S | Exponent (single: 8 bits, double: 11 bits) | Fraction (single: 23 bits, double: 52 bits)
– S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
– Normalize significand: 1.0 ≤ |significand| < 2.0
  – Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  – Significand is Fraction with the “1.” restored
– Exponent: excess representation: actual exponent + Bias
  – Ensures exponent is unsigned
  – Single: Bias = 127; Double: Bias = 1023

Floating-Point Addition
Consider a 4-digit decimal example
– 9.999 × 10^1 + 1.610 × 10^-1
1. Align decimal points
– Shift number with smaller exponent
– 9.999 × 10^1 + 0.016 × 10^1
2. Add significands
– 9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
3. Normalize result & check for over/underflow
– 1.0015 × 10^2
4. Round and renormalize if necessary
– 1.002 × 10^2

Floating-Point Addition
Now consider a 4-digit binary example
– 1.000_2 × 2^-1 + -1.110_2 × 2^-2 (0.5 + -0.4375)
1. Align binary points
– Shift number with smaller exponent
– 1.000_2 × 2^-1 + -0.111_2 × 2^-1
2. Add significands
– 1.000_2 × 2^-1 + -0.111_2 × 2^-1 = 0.001_2 × 2^-1
3. Normalize result & check for over/underflow
– 1.000_2 × 2^-4, with no over/underflow
4. Round and renormalize if necessary
– 1.000_2 × 2^-4 (no change) = 0.0625

FP Adder Hardware Much more complex than integer adder Doing it in one clock cycle would take too long – Much longer than integer operations – Slower clock would penalize all instructions FP adder usually takes several cycles – Can be pipelined

FP Adder Hardware [Figure: adder datapath annotated with Steps 1 to 4 of the addition algorithm]

FP Arithmetic Hardware
– FP multiplier is of similar complexity to FP adder
  – But uses a multiplier for significands instead of an adder
– FP arithmetic hardware usually does
  – Addition, subtraction, multiplication, division, reciprocal, square-root
  – FP ↔ integer conversion
– Operations usually take several cycles
  – Can be pipelined

Floating Point
Floating point is handled by an FPU (floating-point unit).

Pentium FDIV Bug
Intel’s original Pentium (P5)
– Professor Thomas Nicely noticed inconsistencies in calculations when adding Pentiums to his cluster
– Floating-point division operations didn’t quite come out right: off by as much as 61 parts per million

Pentium FDIV Bug
Intel acknowledged the flaw but claimed it wasn’t serious and wouldn’t affect most users. Byte magazine estimated that only 1 in 9 billion floating-point divides with random operands would suffer the error.

Pentium FDIV Bug
Total cost to Intel? $475 million (pre-tax charge against earnings)

WRAP UP

For next time Read Rest of Chapter Midterm 1 Approaching! – February 13th