Multiprocessing. Going Multi-core Helps Energy Efficiency William Holt, HOT Chips 2005 Adapted from UC Berkeley "The Beauty and Joy of Computing"

1 Multiprocessing

2 Going Multi-core Helps Energy Efficiency William Holt, HOT Chips 2005 Adapted from UC Berkeley "The Beauty and Joy of Computing"

3 Processor Parallelism Processor Parallelism : Ability to run multiple instruction streams simultaneously

4 Flynn's Taxonomy Categorization of architectures based on – Number of simultaneous instructions – Number of simultaneous data items

5 Flynn's Taxonomy Categorization of architectures

6 SISD SISD : Single Instruction – Single Data – One instruction sent to one processing unit to work on one piece of data – May be pipelined or superscalar

7 Flynn's Taxonomy Categorization of architectures

8 SIMD Roots ILLIAC IV – One instruction issued to 64 processing units

9 SIMD Roots Cray I – Vector processor – One instruction applied to all elements of vector register

10 Modern SIMD x86 Processors – SSE Units : Streaming SIMD Extensions – Operate on special 128-bit registers: 4 32-bit chunks, 2 64-bit chunks, 16 8-bit chunks, …
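The idea that one 128-bit register can be treated as 4, 2, or 16 lanes can be sketched in plain Python. This is only an illustration of the lane layout, not real SSE code: the helper names `lanes` and `packed_add` are made up for this example, and the register contents are arbitrary.

```python
import struct

# 16 arbitrary bytes standing in for one 128-bit register (illustrative data).
register = bytes(range(16))

def lanes(raw, fmt):
    """Reinterpret the same 16 bytes as little-endian lanes (struct format fmt)."""
    return struct.unpack("<" + fmt, raw)

as_4x32 = lanes(register, "4I")   # four 32-bit unsigned lanes
as_2x64 = lanes(register, "2Q")   # two 64-bit unsigned lanes
as_16x8 = lanes(register, "16B")  # sixteen 8-bit unsigned lanes

def packed_add(a, b, fmt, bits):
    """One conceptual 'instruction': add every lane pair, wrapping at the lane width."""
    mask = (1 << bits) - 1
    total = [(x + y) & mask for x, y in zip(lanes(a, fmt), lanes(b, fmt))]
    return struct.pack("<" + fmt, *total)

# The same bits give different results depending on the lane width chosen.
doubled_bytes = packed_add(register, register, "16B", 8)
doubled_words = packed_add(register, register, "4I", 32)
```

A real SSE unit performs the lane-wise add in hardware in a single instruction; the loop here only models which values are combined.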

11 Modern SIMD Graphics Cards http://www.nvidia.com/object/fermi-architecture.html Becoming less and less "S"

12 Co Processors Graphics Processing : floating point specialized – i7 ~ 100 gigaflops – Kepler GPU ~ 1300 gigaflops

13 CUDA Compute Unified Device Architecture – Programming model for general purpose work on GPU hardware – Streaming Multiprocessors each with 16-48 CUDA cores

14 CUDA Designed for 1000s of threads – Broken into "warps" of 32 threads – Entire warp runs on SM in lock step – Branch divergence cuts speed
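Why branch divergence cuts speed can be shown with a deliberately simplified cost model: because a warp runs in lock step, a warp whose threads disagree on a branch has to walk through both sides, one after the other. The function name `warp_steps` and the step costs are assumptions for illustration; real GPU scheduling is more nuanced.

```python
WARP_SIZE = 32  # from the slide: warps of 32 threads

def warp_steps(takes_branch, then_cost, else_cost):
    """Steps one warp spends on an if/else under a simple lock-step model.

    takes_branch: one boolean per thread in the warp.
    A side of the branch is executed if at least one thread needs it;
    threads that do not need it simply sit idle for those steps.
    """
    steps = 0
    if any(takes_branch):
        steps += then_cost
    if not all(takes_branch):
        steps += else_cost
    return steps

uniform = [True] * WARP_SIZE                     # every thread takes the same path
diverged = [i % 2 == 0 for i in range(WARP_SIZE)]  # threads alternate paths

uniform_steps = warp_steps(uniform, then_cost=10, else_cost=10)
diverged_steps = warp_steps(diverged, then_cost=10, else_cost=10)
```

Under this model the diverged warp takes twice as many steps as the uniform one, even though each individual thread does the same amount of useful work.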

15 Flynn's Taxonomy Categorization of architectures

16 MISD MISD : Multiple Instruction – Single Data – Different instruction, same data calculated – Rare – Space shuttle : Five processors handle fly by wire input, vote
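The voting idea can be sketched as several independent computations on the same input followed by a majority vote. This is a generic illustration, not the shuttle's actual logic; `majority_vote` and the sample outputs are hypothetical.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of units, else raise."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among outputs")
    return value

# Five units process the same input; one of them is faulty (made-up values).
unit_outputs = [42, 42, 42, 17, 42]
agreed = majority_vote(unit_outputs)
```

With five voters the system tolerates up to two disagreeing units before the vote fails.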

17 Flynn's Taxonomy Categorization of architectures

18 MIMD MIMD : Multiple Instruction – Multiple Data – Different instructions, working on different data in different processing units – Most common parallel architecture

19 Coprocessors Coprocessor : Assists main CPU with some part of work

20 Co Processors Graphics Processing : floating point specialized – i7 ~ 100 gigaflops – Kepler GPU ~ 1300 gigaflops

21 Other Coprocessors CPUs used to have floating point coprocessors – Intel 80386 & 80387 Audio cards PhysX Crypto – SSL encryption for servers

22 Multiprocessing Multiprocessing : Many processors, shared memory – May have local cache/special memory

23 Homogeneous Multicore i7 : Homogeneous multicore – 4 chips in one – separate L2 cache, shared L3

24 Heterogeneous Multicore Different cores for different jobs – Specialized media processing in mobile devices Examples – Tegra  – PS3 Cell

25 Multiprocessing & Memory Memory conflict demo…

26 UMA Uniform Memory Access – Every processor sees every memory using same addresses – Same access time for any CPU to any memory word

27 NUMA Non Uniform Memory Access – Single memory address space visible to all CPUs – Some memory local: fast – Some memory remote: accessed in same way, but slower

28 Connections Bus : One communication channel – Scales poorly

29 Connections Crossbar switched – Segmented memory – Any processor can directly link to any memory – N² switches
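The quadratic growth of a crossbar is easy to see by counting crosspoints: one switch for every processor/memory pair. The sketch below assumes N processors and N memory segments; `crossbar_switches` is a name made up for this example.

```python
def crossbar_switches(n_processors, n_memories=None):
    """Crosspoint switches needed so any processor can reach any memory segment."""
    if n_memories is None:
        n_memories = n_processors  # the slide's N x N case
    return n_processors * n_memories

# Doubling N quadruples the hardware, while a single bus stays at one channel.
growth = {n: crossbar_switches(n) for n in (4, 8, 16, 32)}
```

That cost is the price for avoiding the single shared channel that makes a bus scale poorly.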

30 Connections Other topologies – Balance complexity, flexibility and latency

31 BlueGene Major supercomputer player http://s.top500.org/static/lists/2012/11/TOP500_201211_Poster.png

32 BG/P Compute Cards 4 processors per card Fully coherent caches Connected in double torus to neighbors

33 BG/P Full system : 72 x 32 x 32 torus of nodes
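In a 3D torus every node links to its two neighbors along each axis, and the edges wrap around, so no node sits on a boundary. A minimal sketch using the 72 x 32 x 32 dimensions from the slide follows; the coordinate scheme and the name `torus_neighbors` are assumptions for illustration, not BlueGene's actual addressing.

```python
DIMS = (72, 32, 32)  # torus dimensions from the slide

def torus_neighbors(node, dims=DIMS):
    """Return the six direct neighbors of node (x, y, z) with wraparound."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size  # wrap at the edges
            neighbors.append(tuple(coord))
    return neighbors

total_nodes = DIMS[0] * DIMS[1] * DIMS[2]
corner_neighbors = torus_neighbors((0, 0, 0))
```

Even the "corner" node (0, 0, 0) has six neighbors, because stepping below 0 wraps to the far side of that axis.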

34 Titan The king : Descendant of Red Storm – http://www.olcf.ornl.gov/titan/

35 Flynn's Taxonomy Categorization of architectures

36 Distributed Systems No common memory space – Pass messages between processors
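The message-passing style can be sketched with workers that never touch each other's data and communicate only through queues. Threads and in-process queues stand in here for separate machines and a network, which is an assumption made to keep the example self-contained; `worker` and `distributed_sum` are hypothetical names.

```python
import queue
import threading

def worker(inbox, outbox):
    """Receive chunks as messages, reply with partial sums, stop on None."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        outbox.put(sum(msg))

def distributed_sum(chunks):
    """Send one chunk to each worker and combine the replies."""
    outbox = queue.Queue()
    threads = []
    for chunk in chunks:
        inbox = queue.Queue()
        t = threading.Thread(target=worker, args=(inbox, outbox))
        t.start()
        inbox.put(chunk)  # the work travels as a message
        inbox.put(None)   # sentinel message: no more work
        threads.append(t)
    for t in threads:
        t.join()
    return sum(outbox.get() for _ in chunks)

total = distributed_sum([[1, 2], [3, 4], [5]])
```

On a real cluster the queues would be replaced by network messages, but the structure of send, compute locally, and reply stays the same.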

37 COW Cluster of Workstations

38 Grid Computing – Multi Computing at internet scale – Resources owned by multiple parties http://folding.stanford.edu/ Seti@Home

39 Parallel Algorithms Some problems highly parallel, others not:

40 Speedup Issues : Amdahl’s Law – Applications can almost never be completely parallelized; some serial code remains – Speedup always limited by serial part of program [Figure: time vs. number of cores, split into parallel portion and serial portion]

41 Speedup Issues : Amdahl’s Law [Figure: time vs. number of cores, split into parallel portion and serial portion] Amdahl’s law: Speedup = 1 / (s + (1 - s) / P) – s is serial fraction of program, P is # of processors
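With s the serial fraction and P the number of processors, Amdahl's law is usually written as speedup = 1 / (s + (1 - s) / P). A small sketch makes the ceiling visible; the function name and the sample values of s and P are chosen only for illustration.

```python
def amdahl_speedup(s, p):
    """Speedup predicted by Amdahl's law for serial fraction s on p processors."""
    return 1.0 / (s + (1.0 - s) / p)

# Speedup for a few serial fractions as the processor count grows.
table = {
    s: [round(amdahl_speedup(s, p), 2) for p in (1, 2, 4, 16, 1024)]
    for s in (0.5, 0.1, 0.01)
}

# As p grows, the speedup approaches but never exceeds 1 / s.
limit_for_10_percent_serial = 1.0 / 0.1
```

Even with 1024 processors, a program that is 10% serial cannot run more than 10 times faster.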

42 Ouch More processors only help with high % of parallelized code

43 Amdahl's Law is Optimistic Each new processor means more – Load balancing – Scheduling – Communication – Etc…
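One way to see why the plain formula is optimistic is to add a cost that grows with the processor count. The linear overhead term below is purely an assumption for illustration (the slides give no formula for it), as are the name `speedup_with_overhead` and the value of `c`.

```python
def speedup_with_overhead(s, p, c):
    """Amdahl-style speedup with a hypothetical coordination cost.

    s: serial fraction, p: processors,
    c: assumed extra time per additional processor (load balancing,
       scheduling, communication), as a fraction of the one-processor run time.
    """
    time = s + (1.0 - s) / p + c * (p - 1)
    return 1.0 / time

ideal = speedup_with_overhead(0.1, 8, c=0.0)     # reduces to plain Amdahl
at_8 = speedup_with_overhead(0.1, 8, c=0.01)
at_64 = speedup_with_overhead(0.1, 64, c=0.01)   # more cores, lower speedup
```

Under this assumed model, adding processors eventually makes the program slower, because the coordination cost outgrows the shrinking parallel portion.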

