Computer Architecture
Lecture 24: Parallel Processing
Ralph Grishman
November 2015
NYU
Faster: Final Chapter
Strategies for faster processors:
Instruction-level parallelism
  – general applicability but limited gain
SIMD
  – Single Instruction, Multiple Data
MIMD
  – Multiple Instruction, Multiple Data
SIMD
Multimedia extensions for the x86 architecture
  – 4- or 8-way parallel arithmetic
Vector arithmetic
  – multiple fast pipelines
  – specialized processors
GPUs (Graphics Processing Units)
  – co-processor for the CPU
  – hundreds / thousands of arithmetic units
GPU
High-quality rendering is very compute-intensive
  – image realized by 1M triangles
  – computing pixels may require 4×10^9 cycles
  – led to specialized graphics ‘cards’ for rendering
      hardwired sequence of stages
Transition in the early 2000s to a more general design
  – large arrays of processors
  – fostered experiments with wider use of the GPU
GPU
Several levels of parallelism:
  – basic unit is a streaming processor (SP), also called a CUDA core
      scalar integer and floating-point arithmetic
      large register file
  – 128 streaming processors form a streaming multiprocessor (SM)
      act as SIMD through hardware multithreading (SIMT = single instruction, multiple thread)
  – 16 SMs form a GPU
      acts as a MIMD processor composed of SIMD processors
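To make this hierarchy concrete, here is a minimal CUDA kernel sketch (not from the lecture; the kernel name scale and its parameters are illustrative). Each thread of the kernel runs on one CUDA core, the threads of a block execute in SIMT fashion on one SM, and the grid of blocks is spread across the GPU's SMs.

__global__ void scale(float *x, float a, int n)
{
    // blockIdx.x selects the thread block (scheduled onto some SM);
    // threadIdx.x selects the thread within that block (one CUDA core / SIMT lane).
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // the last block may be only partially full
        x[i] = a * x[i];        // every thread executes this same instruction (SIMT)
}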
GPU structure
(figure)
CUDA
NVIDIA developed software to execute GPU programs written in C (CUDA = Compute Unified Device Architecture)
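As an illustration (not taken from the lecture slides), a minimal complete CUDA program might look like the sketch below; the kernel name vecAdd, the array size, and the block size of 256 threads are arbitrary choices.

#include <cstdio>
#include <cuda_runtime.h>

// Device code: one thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host buffers
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device buffers
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);   // expect 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}

The <<<blocks, threads>>> launch syntax tells the CUDA runtime how many thread blocks and how many threads per block to create; the hardware then schedules those blocks across the available SMs.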
GPU
Aimed at high-throughput, latency-tolerant tasks
  – multithreading hides the latency of main memory and reduces the need for a large, multilevel cache
Very high throughput for suitable tasks
  – multiple teraflops possible
MIMD
Provided through (any combination of):
  – multithreading
  – multicore chips
  – clusters
Processes communicate through:
  – message passing
  – shared memory
      UMA (uniform memory access)
      NUMA (non-uniform memory access)
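As a sketch of the shared-memory MIMD style (not from the lecture; the names and sizes are illustrative), the fragment below is ordinary host-side C++, the language CUDA host code builds on. Several independent threads, each with its own instruction stream, cooperate by reading and writing ordinary memory.

#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    const size_t n = 1 << 20;            // one million elements, all equal to 1
    const int nthreads = 4;              // each thread is an independent instruction stream
    std::vector<int> data(n, 1);
    long long partial[nthreads] = {0};   // shared memory: one result slot per thread

    std::vector<std::thread> workers;
    size_t chunk = n / nthreads;
    for (int t = 0; t < nthreads; t++) {
        size_t begin = t * chunk;
        size_t end = (t + 1 == nthreads) ? n : begin + chunk;
        workers.emplace_back([&, t, begin, end] {
            long long s = 0;
            for (size_t i = begin; i < end; i++)
                s += data[i];            // all threads read the shared array
            partial[t] = s;              // each thread writes only its own slot
        });
    }
    for (auto &w : workers)
        w.join();                        // wait for every thread to finish

    long long total = 0;
    for (int t = 0; t < nthreads; t++)
        total += partial[t];
    printf("sum = %lld (expected %zu)\n", total, n);
    return 0;
}

Because each thread writes a distinct slot and the results are combined only after join(), no locks are needed; message-passing MIMD (e.g. across a cluster) would instead exchange the partial sums explicitly.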