Floating Point Numbers & Parallel Computing
Outline Fixed-point Numbers Floating Point Numbers Superscalar Processors Multithreading Homogeneous Multiprocessing Heterogeneous Multiprocessing …
Fixed-point Numbers How to represent rational numbers in binary? One way: define a binary “point” between the integer and fraction bits Analogous to the decimal point in decimal numbers: integer.fraction
Fixed-point Numbers Point’s position is static (cannot be changed) E.g., point goes between 3rd and 4th bits of a byte: 4 bits for integer component, 4 bits for fraction component
Fixed-point Numbers Integer component: binary interpreted as before LSB is 2^0 E.g., integer bits 0110 = 2^2 + 2^1 = 4 + 2 = 6
Fixed-point Numbers Fraction component: binary interpreted slightly differently MSB is 2^-1 E.g., fraction bits 1100 = 2^-1 + 2^-2 = 0.5 + 0.25 = 0.75
Fixed-point Numbers Putting it together: 0110.1100₂ Fraction: .1100 = 0.5 + 0.25 = 0.75 Integer: 0110 = 4 + 2 = 6 So 0110.1100₂ = 6.75
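As a concrete check of this arithmetic, here is a minimal C sketch (the 4.4 format and the helper name are illustrative assumptions, not from the slides): dividing the raw byte by 2^4 = 16 shifts the binary point four places.

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative helper (not from the slides): interpret an 8-bit pattern
   as unsigned fixed-point with 4 integer bits and 4 fraction bits.
   Dividing by 2^4 = 16 moves the binary point 4 places to the left. */
double fixed_to_double(uint8_t raw) {
    return raw / 16.0;
}

int main(void) {
    /* 0x6C = 0110.1100, i.e., 6 + 0.75 */
    printf("%.4f\n", fixed_to_double(0x6C));  /* prints 6.7500 */
    return 0;
}
```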
Fixed-point Numbers How to represent negative numbers? 2’s complement notation
Fixed-point Numbers To find the value of a negative 2’s complement fixed-point number: 1. Invert bits 2. Add 1 3. Convert to fixed-point decimal 4. Multiply by -1
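A worked illustration under the same assumed 4.4 format (this example value is mine, not from the slides): negating 6.75 gives invert 0110.1100 → 1001.0011, then add 1 → 1001.0100. In C, a cast to a signed type applies the 2’s complement interpretation for us:

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative helper: interpret an 8-bit pattern as 2's complement
   fixed-point with 4 fraction bits. The cast to int8_t reinterprets
   the raw bits as a signed integer; dividing by 16 restores the point. */
double sfixed_to_double(uint8_t raw) {
    return (int8_t)raw / 16.0;
}

int main(void) {
    /* 1001.0100 (0x94) is the 2's complement of 0110.1100 (6.75) */
    printf("%.4f\n", sfixed_to_double(0x94));  /* prints -6.7500 */
    return 0;
}
```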
Outline Fixed-point Numbers Floating Point Numbers Superscalar Processors Multithreading Homogeneous Multiprocessing Heterogeneous Multiprocessing …
Floating Point Numbers Analogous to scientific notation E.g., 4.1 × 10^3 = 4100 Gets around limitations of constant integer and fraction sizes Allows representation of very small and very large numbers
Floating Point Numbers Just like scientific notation, floating point numbers have: sign (±), mantissa (M), base (B), exponent (E) E.g., 4.1 × 10^3 = 4100: M = 4.1, B = 10, E = 3
Floating Point Numbers Floating point numbers in binary: 32 bits total sign: 1 bit exponent: 8 bits mantissa: 23 bits
Floating Point Numbers Example: convert 228 to floating point 228 = 11100100₂ = 1.11001₂ × 2^7 sign = positive (0) exponent = 7 mantissa = 1.11001 base = 2 (implicit)
Floating Point Numbers In binary floating point, MSB of mantissa is always 1 No need to store MSB of mantissa (1 is implied) Called the “implicit leading 1”
Floating Point Numbers Exponent must represent both positive and negative numbers Floating point uses a biased exponent: original exponent plus a constant bias 32-bit floating point uses bias 127 E.g., exponent -4 (2^-4) would be -4 + 127 = 123 = 01111011₂ E.g., exponent 7 (2^7) would be 7 + 127 = 134 = 10000110₂
Floating Point Numbers E.g., 228 in floating point binary (IEEE 754 standard): sign bit = 0 (positive) 8-bit biased exponent: 7 + 127 = 134 = 10000110₂ (recover the exponent as E = stored value - bias = 134 - 127 = 7) 23-bit mantissa without implicit leading 1: 11001000000000000000000
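To verify this encoding, here is a hedged C sketch (my own, not from the slides) that copies a float’s bytes into an integer and splits out the three IEEE 754 fields:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = 228.0f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the float's bytes */

    uint32_t sign     = bits >> 31;            /* 1 sign bit             */
    uint32_t exponent = (bits >> 23) & 0xFF;   /* 8 biased exponent bits */
    uint32_t mantissa = bits & 0x7FFFFF;       /* 23 mantissa bits       */

    /* Expected: sign=0, exponent=134 (i.e., 2^7), mantissa=0x640000
       (binary 11001000000000000000000) */
    printf("sign=%u exponent=%u mantissa=0x%06X\n", sign, exponent, mantissa);
    return 0;
}
```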
Floating Point Numbers Special cases: 0, ±∞, NaN
value  sign bit  exponent  mantissa
0      N/A       00000000  000…000
+∞     0         11111111  000…000
-∞     1         11111111  000…000
NaN    N/A       11111111  non-zero
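The same bit-splitting trick (again an illustrative sketch, not from the slides) shows these special-case patterns directly; INFINITY and NAN come from <math.h>:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

/* Print the three IEEE 754 fields of a float. */
static void show(const char *name, float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    printf("%-4s sign=%u exponent=0x%02X mantissa=0x%06X\n",
           name, bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF);
}

int main(void) {
    show("0",    0.0f);      /* exponent 0x00, mantissa 0         */
    show("+inf", INFINITY);  /* exponent 0xFF, mantissa 0         */
    show("-inf", -INFINITY); /* sign 1, exponent 0xFF, mantissa 0 */
    show("NaN",  NAN);       /* exponent 0xFF, mantissa non-zero  */
    return 0;
}
```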
Floating Point Numbers Single versus double precision Single: 32-bit float Range: ±1.2 × 10^-38 to ±3.4 × 10^38 Double: 64-bit double Range: ±2.2 × 10^-308 to ±1.8 × 10^308
        # bits (total)  # sign bits  # exponent bits  # mantissa bits
float   32              1            8                23
double  64              1            11               52
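The limits in <float.h> confirm these ranges on any IEEE 754 machine; a quick sketch:

```c
#include <stdio.h>
#include <float.h>

int main(void) {
    /* Smallest normalized positive and largest finite values per precision */
    printf("float:  min %g, max %g\n", FLT_MIN, FLT_MAX);  /* ~1.2e-38,  ~3.4e+38  */
    printf("double: min %g, max %g\n", DBL_MIN, DBL_MAX);  /* ~2.2e-308, ~1.8e+308 */
    return 0;
}
```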
Outline Fixed-point Numbers Floating Point Numbers Superscalar Processors Multithreading Homogeneous Multiprocessing Heterogeneous Multiprocessing …
Superscalar Processors Multiple hardwired copies of datapath Allows multiple instructions to execute simultaneously E.g., 2-way superscalar processor Fetches / executes 2 instructions per cycle 2 ALUs 2-port memory unit 6-port register file (4 source, 2 write back)
Superscalar Processors Datapath for 2-way superscalar processor (figure): 2 ALUs, 2-port memory unit, 6-port register file
Superscalar Processors Pipeline for 2-way superscalar processor (figure): 2 instructions per cycle
Superscalar Processors Commercial processors can be 3, 4, or even 6-way superscalar Very difficult to manage dependencies and hazards Intel Nehalem (6-way superscalar)
Outline Fixed-point Numbers Floating Point Numbers Superscalar Processors Multithreading Homogeneous Multiprocessing Heterogeneous Multiprocessing …
Multithreading (Terms) Process: program running on a computer Can have multiple processes running at same time E.g., music player, web browser, anti-virus, word processor Thread: each process has one or more threads that can run simultaneously E.g., word processor: threads to read input, print, spell check, auto-save
Multithreading (Terms) Instruction level parallelism (ILP): # of instructions that can be executed simultaneously for program / microarchitecture Practical processors rarely achieve ILP greater than 2 or 3 Thread level parallelism (TLP): degree to which a process can be split into threads
Multithreading Keeps processor with many execution units busy Even if ILP is low or program is stalled (waiting for memory) For single-core processors, threads give illusion of simultaneous execution Threads take turns executing (according to OS) OS decides when a thread’s turn begins / ends
Multithreading When one thread’s turn ends: -- OS saves architectural state -- OS loads architectural state of another thread -- New thread begins executing This is called a context switch If context switch is fast enough, user perceives threads as running simultaneously (even on single-core)
Multithreading Multithreading does NOT improve ILP, but DOES improve processor throughput Threads use resources that are otherwise idle Multithreading is relatively inexpensive Only need to save PC and register file
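A minimal POSIX threads sketch (illustrative; the thread names and loop counts are my own, not from the slides) of two threads taking turns on whatever cores the OS provides; compile with -pthread:

```c
#include <stdio.h>
#include <pthread.h>

/* Each thread runs this function; the OS decides when each thread's
   turn begins and ends (context switches). */
static void *worker(void *arg) {
    const char *name = arg;
    for (int i = 0; i < 3; i++)
        printf("%s: step %d\n", name, i);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)"thread-1");
    pthread_create(&t2, NULL, worker, (void *)"thread-2");
    pthread_join(t1, NULL);   /* wait for both threads to finish */
    pthread_join(t2, NULL);
    /* On a single core the interleaved output is an illusion created by
       context switches; on multiple cores it can be truly simultaneous. */
    return 0;
}
```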
Outline Fixed-point Numbers Floating Point Numbers Superscalar Processors Multithreading Homogeneous Multiprocessing Heterogeneous Multiprocessing …
Homogeneous Multiprocessing AKA symmetric multiprocessing (SMP) 2 or more identical processors with a single shared memory Easier to design (than heterogeneous) Multiple cores on same (or different) chip(s) In 2005, mainstream architectures made the shift to SMP
Homogeneous Multiprocessing Multiple cores can execute threads concurrently True simultaneous execution Multi-threaded programming can be tricky… (figure: threads with single-core vs. multi-core)
Outline Fixed-point Numbers Floating Point Numbers Superscalar Processors Multithreading Homogeneous Multiprocessing Heterogeneous Multiprocessing …
Heterogeneous Multiprocessing AKA asymmetric multiprocessing (AMP) 2 (or more) different processors Specialized processors used for specific tasks E.g., graphics, floating point, FPGAs Adds complexity (figure: Nvidia GPU)
Heterogeneous Multiprocessing Clustered: Each processor has its own memory E.g., PCs connected on a network Memory not shared, must pass information between nodes… Can be costly