Computer Architecture
Lecture 24: Parallel Processing
Ralph Grishman
November 2015
NYU
Faster: Final Chapter
Strategies for faster processors:
Instruction-level parallelism
  – general applicability but limited gain
SIMD
  – Single Instruction, Multiple Data
MIMD
  – Multiple Instruction, Multiple Data
SIMD
Multimedia extensions for the x86 architecture
  – 4- or 8-way parallel arithmetic
Vector arithmetic
  – multiple fast pipelines
  – specialized processors
GPUs (Graphics Processing Units)
  – co-processor for the CPU
  – hundreds / thousands of arithmetic units
GPU
High-quality rendering is very compute-intensive
  – image realized by 1M triangles
  – computing pixels may require 4×10^9 cycles
  – led to specialized graphics ‘cards’ for rendering
      hardwired sequence of stages
Transition in the early 2000s to a more general design
  – large arrays of processors
  – fostered experiments with wider use of the GPU
GPU
Several levels of parallelism:
  – basic unit is a streaming processor (SP), also called a CUDA core
      scalar integer and floating-point arithmetic
      large register file
  – 128 streaming processors form a streaming multiprocessor (SM)
      act as SIMD through hardware multithreading (SIMT = single instruction, multiple thread)
  – 16 SMs form a GPU
      acts as a MIMD processor composed of SIMD processors
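To make this hierarchy concrete, here is a minimal CUDA kernel sketch (not from the lecture; the kernel name scale and its parameters are illustrative). Each thread of the kernel runs on one CUDA core, the threads of a block execute in SIMT fashion on one SM, and the grid of blocks is spread across the GPU's SMs.

__global__ void scale(float *x, float a, int n)
{
    // blockIdx.x selects the thread block (scheduled onto some SM);
    // threadIdx.x selects the thread within that block (one CUDA core / SIMT lane).
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // the last block may be only partially full
        x[i] = a * x[i];        // every thread executes this same instruction (SIMT)
}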
GPU structure
(figure)
CUDA
NVIDIA developed software to execute GPU programs written in C (CUDA = Compute Unified Device Architecture)
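As an illustration (not taken from the lecture slides), a minimal complete CUDA program might look like the sketch below; the kernel name vecAdd, the array size, and the block size of 256 threads are arbitrary choices.

#include <cstdio>
#include <cuda_runtime.h>

// Device code: one thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host buffers
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device buffers
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);   // expect 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}

The <<<blocks, threads>>> launch syntax tells the CUDA runtime how many thread blocks and how many threads per block to create; the hardware then schedules those blocks across the available SMs.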
GPU
Aimed at high-throughput, latency-tolerant tasks
  – multithreading hides the latency of main memory and reduces the need for a large, multilevel cache
Very high throughput for suitable tasks
  – multiple teraflops possible
MIMD
Provided through (any combination of):
  – multithreading
  – multicore chips
  – clusters
Processes communicate through:
  – message passing
  – shared memory
      UMA (uniform memory access)
      NUMA (non-uniform memory access)
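As a sketch of the shared-memory MIMD style (not from the lecture; the names and sizes are illustrative), the fragment below is ordinary host-side C++, the language CUDA host code builds on. Several independent threads, each with its own instruction stream, cooperate by reading and writing ordinary memory.

#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    const size_t n = 1 << 20;            // one million elements, all equal to 1
    const int nthreads = 4;              // each thread is an independent instruction stream
    std::vector<int> data(n, 1);
    long long partial[nthreads] = {0};   // shared memory: one result slot per thread

    std::vector<std::thread> workers;
    size_t chunk = n / nthreads;
    for (int t = 0; t < nthreads; t++) {
        size_t begin = t * chunk;
        size_t end = (t + 1 == nthreads) ? n : begin + chunk;
        workers.emplace_back([&, t, begin, end] {
            long long s = 0;
            for (size_t i = begin; i < end; i++)
                s += data[i];            // all threads read the shared array
            partial[t] = s;              // each thread writes only its own slot
        });
    }
    for (auto &w : workers)
        w.join();                        // wait for every thread to finish

    long long total = 0;
    for (int t = 0; t < nthreads; t++)
        total += partial[t];
    printf("sum = %lld (expected %zu)\n", total, n);
    return 0;
}

Because each thread writes a distinct slot and the results are combined only after join(), no locks are needed; message-passing MIMD (e.g. across a cluster) would instead exchange the partial sums explicitly.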