Published by Eleanor Angelica Cook. Modified over 9 years ago.
1
1 ECE369 Sections 3.5, 3.6 and 3.9
2
2 ECE369 Number Systems
Fixed point: the binary point of a real number is held in a fixed position
– Can treat real numbers as integers and do addition or subtraction normally
– Conversion: 9.8125 to fixed point (4 binary digits for the fraction)
– Division rule for the integer part: keep dividing by 2 and collect the remainders (9 = 1001)
– Multiplication rule for the fraction: keep multiplying the fraction by 2; anytime there is a carry out insert 1, otherwise insert 0, and then left shift (0.8125 = .1101, so 9.8125 = 1001.1101)
Scientific notation:
– 3.56*10^8 (not 35.6*10^7)
– May have any number of fraction digits (floating)
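The conversion rule on this slide can be sketched in a few lines of Python. The function name `to_fixed_point` and its parameters are illustrative and not part of the slides; the sketch assumes a non-negative input.

```python
def to_fixed_point(value, frac_bits):
    """Return (integer bits, fraction bits) as strings for a non-negative value."""
    integer_part = int(value)
    fraction = value - integer_part
    int_digits = bin(integer_part)[2:]
    frac_digits = []
    for _ in range(frac_bits):
        fraction *= 2              # multiply the fraction by 2
        if fraction >= 1:          # carry out of the fraction: insert 1
            frac_digits.append("1")
            fraction -= 1
        else:                      # no carry: insert 0
            frac_digits.append("0")
    return int_digits, "".join(frac_digits)

int_digits, frac_digits = to_fixed_point(9.8125, 4)
print(f"{int_digits}.{frac_digits}")  # 1001.1101
```

With only 4 fraction bits, any value whose fraction is not a multiple of 1/16 is truncated by this sketch.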
3
3 ECE369 Floating point (a brief look)
We need a way to represent
– Numbers with fractions, e.g., 3.1416
– Very small numbers, e.g., 0.000000001
– Very large numbers, e.g., 3.15576 × 10^9
Representation:
– Sign, exponent, fraction: (–1)^sign × fraction × 2^exponent
– More bits for fraction gives more accuracy
– More bits for exponent increases range
IEEE 754 floating point standard:
– Single precision: 8 bit exponent, 23 bit fraction
– Double precision: 11 bit exponent, 52 bit fraction
4
4 ECE369 IEEE 754 floating-point standard
Normalized form: 1.f × 2^e, i.e., 1.s1 s2 s3 s4 … sn × 2^e
Leading “1” bit of significand is implicit
Exponent is “biased” to make sorting easier
– All 0s is smallest exponent, all 1s is largest
– Bias of 127 for single precision and 1023 for double precision
If exponent bits are all 0s and fraction bits are all 0s, then zero
If exponent bits are all 1s and fraction bits are all 0s, then +/- infinity
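As a rough illustration of the field layout, the sketch below splits a single-precision value into sign, biased exponent and fraction using Python's `struct` module. The helper names `decode_single` and `classify` are made up for this example, and the "NaN" and "denormalized" labels go beyond what this slide lists.

```python
import struct

def decode_single(x):
    """Split x (stored as IEEE 754 single precision) into its three fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF    # 8-bit biased exponent
    fraction = bits & 0x7FFFFF        # 23-bit fraction, leading 1 not stored
    return sign, exponent, fraction

def classify(sign, exponent, fraction):
    """Name the special cases selected by the reserved exponent patterns."""
    if exponent == 0 and fraction == 0:
        return "zero"
    if exponent == 0xFF and fraction == 0:
        return "-infinity" if sign else "+infinity"
    if exponent == 0xFF:
        return "NaN"             # not on this slide; added for completeness
    if exponent == 0:
        return "denormalized"    # not on this slide; added for completeness
    return "normal"

sign, exponent, fraction = decode_single(1.0)
print(sign, exponent - 127, fraction)  # 0 0 0
```

Subtracting the bias of 127 from the stored exponent recovers the actual exponent, which is why 1.0 prints an exponent of 0.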
5
5 ECE369 Single Precision Range
Exponents 00000000 and 11111111 reserved
Smallest value
– Exponent: 00000001, so actual exponent = 1 – 127 = –126
– Fraction: 000…00, so significand = 1.0
– ±1.0 × 2^–126 ≈ ±1.2 × 10^–38
Largest value
– Exponent: 11111110, so actual exponent = 254 – 127 = +127
– Fraction: 111…11, so significand ≈ 2.0
– ±2.0 × 2^+127 ≈ ±3.4 × 10^+38
6
6 ECE369 Double Precision Range
Exponents 0000…00 and 1111…11 reserved
Smallest value
– Exponent: 00000000001, so actual exponent = 1 – 1023 = –1022
– Fraction: 000…00, so significand = 1.0
– ±1.0 × 2^–1022 ≈ ±2.2 × 10^–308
Largest value
– Exponent: 11111111110, so actual exponent = 2046 – 1023 = +1023
– Fraction: 111…11, so significand ≈ 2.0
– ±2.0 × 2^+1023 ≈ ±1.8 × 10^+308
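The range limits for both precisions can be checked numerically. This is only a sketch: `single_from_bits` is a hypothetical helper, and it assumes Python's `float` is an IEEE 754 double, which holds on common platforms.

```python
import struct
import sys

def single_from_bits(bits):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Exponent 00000001 with fraction 000...00, and exponent 11111110 with fraction 111...11
smallest_single = single_from_bits(0x00800000)
largest_single = single_from_bits(0x7F7FFFFF)
print(f"{smallest_single:.1e} {largest_single:.1e}")          # 1.2e-38 3.4e+38

# sys.float_info describes the double-precision limits
print(f"{sys.float_info.min:.1e} {sys.float_info.max:.1e}")   # 2.2e-308 1.8e+308
```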
7
7 ECE369 Single Precision
Summary: (–1)^sign × (1 + significand) × 2^(exponent – bias)
Example: 11/100 = 11/10^2 = 0.11 = 1.1 × 10^–1
– Decimal: –.75 = –3/4 = –3/2^2
– Binary: –.11 = –1.1 × 2^–1
– IEEE single precision: 1 01111110 10000000000000000000000
– exponent – bias = –1 => exponent = 126 = 01111110
8
8 ECE369 Opposite Way
Sign: – (sign bit = 1)
Exponent: 129
Fraction: 0 × 2^–1 + 1 × 2^–2 = 0.25
Value: (–1)^1 × (1 + 0.25) × 2^(129 – 127) = –1.25 × 4 = –5.0
9
9 ECE369 Example
Represent –0.75
– –0.75 = (–1)^1 × 1.1₂ × 2^–1
– S = 1
– Fraction = 1000…00₂
– Exponent = –1 + Bias
– Single: –1 + 127 = 126 = 01111110₂
– Double: –1 + 1023 = 1022 = 01111111110₂
Single: 1011111101000…00
Double: 1011111111101000…00
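The encodings on this slide, and the reverse direction from the earlier "Opposite Way" slide, can be reproduced with `struct`. The helper names are illustrative only.

```python
import struct

def single_bits(x):
    """32-character bit string of x encoded as IEEE 754 single precision."""
    return format(struct.unpack(">I", struct.pack(">f", x))[0], "032b")

def double_bits(x):
    """64-character bit string of x encoded as IEEE 754 double precision."""
    return format(struct.unpack(">Q", struct.pack(">d", x))[0], "064b")

def single_from_fields(sign, exponent, fraction):
    """Build a single-precision value from sign, biased exponent, 23-bit fraction."""
    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]

s = single_bits(-0.75)
print(s[0], s[1:9], s[9:])      # 1 01111110 10000000000000000000000
d = double_bits(-0.75)
print(d[0], d[1:12], d[12:16])  # 1 01111111110 1000

# Opposite way: sign 1, exponent 129, fraction 0.25 (bit for 2^-2 set)
print(single_from_fields(1, 129, 1 << 21))  # -5.0
```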
10
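10 ECE369 Floating point addition
Decimal example with four significant digits: 1.610 × 10^–1 + 9.999 × 10^1
– Align exponents: 0.01610 × 10^1 + 9.999 × 10^1
– Add significands: 10.015 × 10^1
– Normalize: 1.0015 × 10^2
– Round: 1.002 × 10^2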
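One way to reproduce the final result of this decimal example is Python's `decimal` module with a four-digit context. This is an assumption made for illustration: the module adds exactly and rounds once at the end, so it does not show the intermediate alignment step from the slide, only the same rounded answer.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

# Four significant digits, as in the slide's example
ctx = Context(prec=4, rounding=ROUND_HALF_EVEN)

a = Decimal("1.610E-1")
b = Decimal("9.999E1")
total = ctx.add(a, b)   # exact sum is 100.1510, rounded to 4 digits
print(total)            # 100.2, i.e. 1.002 x 10^2
```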
11
11 ECE369 Floating point addition
[Figure: floating point addition procedure, Step 1 through Step 4]
12
12 ECE369 Add 0.5₁₀ and –0.4375₁₀
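A minimal sketch of binary floating point addition for this example follows, assuming 4-bit significands (1 integer bit plus 3 fraction bits) and the usual align, add, normalize, round sequence. The function `fp_add` and its integer encoding are hypothetical. As a simplification, alignment is done exactly by scaling the larger operand up, rather than shifting the smaller one right with guard and round bits as hardware would.

```python
def fp_add(sig_a, exp_a, sig_b, exp_b, frac_bits=3):
    """Add two values given as (signed significand * 2**frac_bits, exponent).

    Example encoding: 1.110b x 2^-2 is (14, -2); -1.110b x 2^-2 is (-14, -2).
    Returns a normalized (significand, exponent) pair in the same encoding.
    """
    one = 1 << frac_bits
    # Step 1: align exponents (exact here, see the simplification noted above)
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_a <<= exp_a - exp_b
    exp = exp_b
    # Step 2: add the significands
    total = sig_a + sig_b
    if total == 0:
        return 0, 0
    sign = -1 if total < 0 else 1
    mag = abs(total)
    # Step 3: normalize so that 1.000b <= magnitude < 10.000b
    while mag < one:
        mag <<= 1
        exp -= 1
    shift = mag.bit_length() - (frac_bits + 1)
    if shift > 0:
        remainder = mag & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        mag >>= shift
        exp += shift
        # Step 4: round to nearest, ties to even
        if remainder > half or (remainder == half and mag & 1):
            mag += 1
            if mag == 2 * one:   # rounding overflowed: normalize again
                mag >>= 1
                exp += 1
    return sign * mag, exp

# 0.5 = 1.000b x 2^-1 and -0.4375 = -1.110b x 2^-2
sig, exp = fp_add(8, -1, -14, -2)
print(sig / 8 * 2.0 ** exp)  # 0.0625
```

The result (8, –4) corresponds to 1.000₂ × 2^–4, which is 0.0625.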
13
13 ECE369 Multiplication
14
14 ECE369 Floating point multiply
To multiply two numbers
– Add the two exponents (remember excess-127 notation: subtract the bias once so it is not counted twice)
– Produce the result sign as the XOR of the two signs
– Multiply the significand portions
– Result will be 1x.xxxxx… or 01.xxxx…
– In the first case shift the result right and adjust the exponent
– Round off the result
– This may require another normalization step
15
15 ECE369 Multiplication: 0.5₁₀ and –0.4375₁₀
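The multiplication procedure can be sketched directly on single-precision bit fields. This is a simplified illustration, not a full IEEE 754 multiplier: `fp_multiply` is a hypothetical name, it handles only normalized nonzero operands and results, and it truncates the product instead of rounding.

```python
import struct

def float_to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(bits):
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_multiply(a, b):
    """Multiply two normalized single-precision values field by field."""
    abits, bbits = float_to_bits(a), float_to_bits(b)
    # Result sign is the XOR of the two signs
    sign = (abits >> 31) ^ (bbits >> 31)
    # Add the biased exponents, then subtract the bias once (excess-127)
    exp = ((abits >> 23) & 0xFF) + ((bbits >> 23) & 0xFF) - 127
    # Restore the implicit leading 1 to get 24-bit significands
    sig_a = (abits & 0x7FFFFF) | 0x800000
    sig_b = (bbits & 0x7FFFFF) | 0x800000
    product = sig_a * sig_b        # 48 bits: 1x.xxx... or 01.xxx...
    if product & (1 << 47):        # first case: shift right, adjust exponent
        product >>= 1
        exp += 1
    if not 1 <= exp <= 254:
        raise OverflowError("result exponent outside the normalized range")
    fraction = (product >> 23) & 0x7FFFFF   # truncate (assumption: no rounding)
    return bits_to_float((sign << 31) | (exp << 23) | fraction)

print(fp_multiply(0.5, -0.4375))  # -0.21875
```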
16
16 ECE369 Floating point divide
To divide two numbers
– Subtract the divisor’s exponent from the dividend’s exponent (remember excess-127 notation: the biases cancel, so add the bias back)
– Produce the result sign as the XOR of the two signs
– Divide the dividend’s significand by the divisor’s significand
– Result will be 1.xxxxx… or 0.1xxxx…
– In the second case shift the result left and adjust the exponent
– Round off the result
– This may require another normalization step
17
17 ECE369 Floating point complexities
Operations are somewhat more complicated (see text)
In addition to overflow we can have “underflow”
Accuracy can be a big problem
– IEEE 754 keeps two extra bits, guard and round
– Four rounding modes
– Positive divided by zero yields “infinity”
– Zero divided by zero yields “not a number”
– Other complexities
Implementing the standard can be tricky
Not using the standard can be even worse
– See text for description of 80x86 and Pentium bug!
18
18 ECE369 FP Arithmetic Hardware
FP multiplier is of similar complexity to FP adder
– But uses a multiplier for significands instead of an adder
FP arithmetic hardware usually does
– Addition, subtraction, multiplication, division, reciprocal, square root
– FP ↔ integer conversion
Operations usually take several cycles
– Can be pipelined
19
19 ECE369 FP Instructions in MIPS
FP hardware is coprocessor 1
– Adjunct processor that extends the ISA
Separate FP registers
– 32 single-precision: $f0, $f1, … $f31
– Paired for double-precision: $f0/$f1, $f2/$f3, …
– Release 2 of the MIPS ISA supports 32 × 64-bit FP registers
FP instructions operate only on FP registers
– Programs generally don’t do integer ops on FP data, or vice versa
– More registers with minimal code-size impact
FP load and store instructions
– lwc1, ldc1, swc1, sdc1
– e.g., ldc1 $f8, 32($sp)
20
20 ECE369 FP Instructions in MIPS
Single-precision arithmetic
– add.s, sub.s, mul.s, div.s
– e.g., add.s $f0, $f1, $f6
Double-precision arithmetic
– add.d, sub.d, mul.d, div.d
– e.g., mul.d $f4, $f4, $f6
Single- and double-precision comparison
– c.xx.s, c.xx.d (xx is eq, lt, le, …)
– Sets or clears FP condition-code bit
– e.g., c.lt.s $f3, $f4
Branch on FP condition code true or false
– bc1t, bc1f
– e.g., bc1t TargetLabel
21
21 ECE369 3.9- Fallacies and Pitfalls: Right Shift and Division
Left shift by i places multiplies an integer by 2^i
Right shift divides by 2^i?
– Only for unsigned integers
For signed integers
– Arithmetic right shift: replicate the sign bit
– e.g., –5 / 4: 11111011₂ >> 2 = 11111110₂ = –2
– Rounds toward –∞, whereas truncating division gives –1
– c.f. logical shift: 11111011₂ >>> 2 = 00111110₂ = +62
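The shift pitfall can be checked on 8-bit patterns. The helper names below are illustrative; Python integers are unbounded, so the sketch masks to 8 bits explicitly.

```python
def arithmetic_shift_right_8(pattern, n):
    """Shift an 8-bit pattern right by n, replicating the sign bit."""
    sign = pattern & 0x80
    for _ in range(n):
        pattern = (pattern >> 1) | sign
    return pattern

def logical_shift_right_8(pattern, n):
    """Shift an 8-bit pattern right by n, filling with zeros."""
    return (pattern & 0xFF) >> n

def to_signed_8(pattern):
    """Interpret an 8-bit pattern as a two's complement integer."""
    return pattern - 256 if pattern & 0x80 else pattern

minus_five = 0b11111011
print(to_signed_8(arithmetic_shift_right_8(minus_five, 2)))  # -2
print(logical_shift_right_8(minus_five, 2))                  # 62
print(int(-5 / 4))                                           # -1 (truncating division)
```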
22
22 ECE369 3.9- Fallacies and Pitfalls: Associativity Parallel programs may interleave operations in unexpected orders –Assumptions of associativity may fail Need to validate parallel programs under varying degrees of parallelism
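A small illustration of why associativity can fail for floating point addition. The values are chosen for this example and do not come from the slide; the sketch assumes Python's `float` is an IEEE 754 double.

```python
x = -1.5e38
y = 1.5e38
z = 1.0

left = (x + y) + z    # x and y cancel first, then z is added
right = x + (y + z)   # z is lost when added to the much larger y

print(left, right)    # 1.0 0.0
```

A parallel reduction that changes the grouping of the additions can therefore change the result.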
23
23 ECE369 3.9- Fallacies and Pitfalls: Who Cares About FP Accuracy? Important for scientific code –But for everyday consumer use? “My bank balance is out by 0.0002¢!” The Intel Pentium FDIV bug –The market expects accuracy –See Colwell, The Pentium Chronicles
24
24 ECE369 Concluding Remarks Bits have no inherent meaning –Interpretation depends on the instructions applied Computer representations of numbers –Finite range and precision –Need to account for this in programs
25
25 ECE369 Concluding Remarks ISAs support arithmetic –Signed and unsigned integers –Floating-point approximation to reals Bounded range and precision –Operations can overflow and underflow MIPS ISA –Core instructions: 54 most frequently used 100% of SPECINT, 97% of SPECFP –Other instructions: less frequent
26
26 ECE369 Let's Build a Processor, Introduction to Instruction Set Architecture
First Step Into Your Project!!!
How could we build a 1-bit ALU for add, and, or?
Need to support the set-on-less-than instruction (slt)
– slt is an arithmetic instruction
– produces a 1 if a < b and 0 otherwise
– use subtraction: (a-b) < 0 implies a < b
Need to support test for equality (beq $t5, $t6, Label)
– use subtraction: (a-b) = 0 implies a = b
How could we build a 32-bit ALU?
[Figure: 32-bit ALU with inputs a and b, an operation control input, and a result output]
Must Read Appendix
27
27 ECE369 One-bit adder
Takes three input bits and generates two output bits
Multiple bits can be cascaded
c_out = a.b + a.c_in + b.c_in
sum = a xor b xor c_in
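The one-bit adder equations and the cascading idea can be simulated bit by bit. The names `full_adder` and `ripple_add` are illustrative, and the sum is taken as the XOR of the three inputs.

```python
def full_adder(a, b, c_in):
    """One-bit adder: three input bits, two output bits (sum, carry out)."""
    c_out = (a & b) | (a & c_in) | (b & c_in)
    s = a ^ b ^ c_in
    return s, c_out

def ripple_add(x, y, width=32, c_in=0):
    """Cascade `width` one-bit adders; the carry ripples from bit 0 upward."""
    result = 0
    carry = c_in
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

print(ripple_add(5, 7))           # (12, 0)
print(ripple_add(0xFFFFFFFF, 1))  # (0, 1)
```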
28
28 ECE369 Building a 32 bit ALU
29
29 ECE369 What about subtraction (a – b)?
Two's complement approach: just negate b and add.
How do we negate? A very clever solution:
Control lines before: 000 = and, 001 = or, 010 = add
Control lines with subtraction: 000 = and, 001 = or, 010 = add, 110 = subtract
30
30 ECE369 Supporting Slt Can we figure out the idea? 000 = and 001 = or 010 = add 110 = subtract 111 = slt
31
31 ECE369 Test for equality Notice control lines 000 = and 001 = or 010 = add 110 = subtract 111 = slt Note: Zero is a 1 if result is zero!
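A behavioral sketch of a 32-bit ALU using the control-line encoding listed on these slides. The function `alu` is hypothetical, and two details are assumptions because the slide figures did not survive: subtraction is modeled as inverting b and adding with a carry-in of 1, and slt takes the sign bit of a – b without any overflow correction.

```python
MASK = 0xFFFFFFFF  # keep results to 32 bits

def alu(control, a, b):
    """Return (result, zero) for the given 3-bit control value."""
    a &= MASK
    b &= MASK
    if control == 0b000:            # and
        result = a & b
    elif control == 0b001:          # or
        result = a | b
    elif control == 0b010:          # add
        result = (a + b) & MASK
    elif control in (0b110, 0b111):
        # Subtract via two's complement: invert b, add, carry-in of 1
        diff = (a + (~b & MASK) + 1) & MASK
        if control == 0b110:        # subtract
            result = diff
        else:                       # slt: 1 if (a - b) < 0, ignoring overflow
            result = diff >> 31
    else:
        raise ValueError("unsupported control value")
    zero = 1 if result == 0 else 0  # Zero is 1 if the result is zero
    return result, zero

print(alu(0b110, 5, 5))  # (0, 1): a - b = 0, so beq would be taken
print(alu(0b111, 3, 7))  # (1, 0): 3 < 7
```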
32
32 ECE369 How about “a nor b” 000 = and 001 = or 010 = add 110 = subtract 111 = slt
33
33 ECE369 Big Picture
34
34 ECE369 Conclusion
We can build an ALU to support an instruction set
– key idea: use a multiplexor to select the output we want
– we can efficiently perform subtraction using two’s complement
– we can replicate a 1-bit ALU to produce a 32-bit ALU
Important points about hardware
– all of the gates are always working
– speed of a gate is affected by the number of inputs to the gate
– speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)
Our primary focus is comprehension; however,
– clever changes to organization can improve performance (similar to using better algorithms in software)
How about my instruction smt (set if more than)???
35
35 ECE369 ALU Summary We can build an ALU to support addition Our focus is on comprehension, not performance Real processors use more sophisticated techniques for arithmetic Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!
36
36 ECE369 Optional Reading
37
37 ECE369 Overflow
38
38 ECE369 Formulation
39
39 ECE369 A Simpler Formula ?
40
40 ECE369 Problem: Ripple carry adder is slow!
Is a 32-bit ALU as fast as a 1-bit ALU?
Is there more than one way to do addition?
Can you see the ripple? How could you get rid of it?
c1 = a0b0 + a0c0 + b0c0
c2 = a1b1 + a1c1 + b1c1, c2 = …
c3 = a2b2 + a2c2 + b2c2, c3 = …
c4 = a3b3 + a3c3 + b3c3, c4 = …
Not feasible! Why?
41
41 ECE369 Carry Bit
42
42 ECE369 Generate/Propagate aiai bibi c i+1 00 01 10 11 aiai bibi 00 01 10 11 0 1 0 0 1 0 1 1
43
43 ECE369 Generate/Propagate (Ctd.)
44
44 ECE369 Carry-look-ahead adder
Motivation:
– If we didn't know the value of carry-in, what could we do?
– When would we always generate a carry? g_i = a_i . b_i
– When would we propagate the carry? p_i = a_i + b_i
Did we get rid of the ripple?
c1 = g0 + p0c0
c2 = g1 + p1c1 = g1 + p1g0 + p1p0c0
c3 = g2 + p2c2 = g2 + p2g1 + p2p1g0 + p2p1p0c0
c4 = g3 + p3c3 = g3 + p3g2 + p3p2g1 + p3p2p1g0 + p3p2p1p0c0
Feasible! Why?
45
45 ECE369 A 4-bit carry look-ahead adder
Generate g and p term for each bit
Use g’s, p’s and carry in to generate all C’s
Also use them to generate block G and P
CLA principle can be used recursively
46
46 ECE369 16 Bit CLA
47
47 ECE369 Gate Delay for 16 bit Adder 1 1+2 1+2+2
48
48 ECE369 64-bit carry lookahead adder