1
ICAL – Capabilities and Limitations of Distributed Computing Provided by the GPU Architecture
2
11/17/09 ICAL
Outline
–Parallel computing with GPU
–NVIDIA CUDA
–SVD matrix computation
–Conclusion
3
Parallel computing with GPU
–Parallel computing
–Flynn's Taxonomy
–Algorithm decomposition
–Amdahl's Law
–Correctness concepts
4
Parallel computing
Parallel computing is a form of computation in which many calculations are carried out simultaneously.
Parallel computer hardware:
–Single machine: multi-core CPU, GPU
–Multiple machines: clusters, MPPs, grids
5
Parallel computing (cont.)
There are several kinds of parallelism, such as:
–Bit-level
–Instruction-level
–Data decomposition
–Task decomposition
Parallel computing has an inherent speedup limit (Amdahl's Law).
6
Algorithm decomposition
Task decomposition: preparing the dinner is split into different tasks – purchasing, cooking, cleaning the table, washing dishes. Mary goes shopping while John cleans the table.
Data decomposition: one task is split across workers over its data – John and Mary wash the dishes together.
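The dinner example above can be sketched in code. This is an illustrative sketch only: the function names and the use of Python's `ThreadPoolExecutor` are my own choices, not from the slides.

```python
from concurrent.futures import ThreadPoolExecutor

def task_decomposition():
    # Task decomposition: different tasks run concurrently,
    # like Mary going shopping while John cleans the table.
    with ThreadPoolExecutor(max_workers=2) as pool:
        shopping = pool.submit(lambda: "groceries bought")   # Mary's task
        cleaning = pool.submit(lambda: "table cleaned")      # John's task
        return shopping.result(), cleaning.result()

def data_decomposition(dishes, workers=2):
    # Data decomposition: the SAME task (washing) is applied to
    # disjoint chunks of the data, one chunk per worker.
    wash = lambda chunk: ["washed " + d for d in chunk]
    chunks = [dishes[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(wash, chunks)
    return [dish for part in parts for dish in part]
```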
7
Flynn's Taxonomy

                 Single instruction   Multiple instructions
Single data      SISD                 MISD
Multiple data    SIMD                 MIMD
8
Amdahl's Law
Amdahl's law models the expected speedup of a program when only part of it is improved:
Speedup = 1 / ((1 − P) + P / S)
P: parallel portion of the program
S: speedup of the parallel portion
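The law translates directly into a few lines of code; a minimal sketch (the function name `amdahl_speedup` is mine):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the program
    is sped up by a factor s (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / s)

# Parallelizing 90% of a program 10x gives only ~5.26x overall:
# the remaining serial 10% dominates, and caps the limit at 10x
# even as s grows without bound.
print(amdahl_speedup(0.9, 10))
```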
9
Correctness concepts
Race condition: two threads both read a = 19, each computes a + 1 and saves the result, so a ends up as 20 instead of the expected 21 – one update is lost. ERROR!
Deadlock: threads block forever, each waiting for a resource another one holds.
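The lost update from the slide can be replayed deterministically. This sketch hard-codes the bad interleaving as straight-line code (a real race depends on thread timing and only fails sometimes):

```python
def lost_update(initial=19):
    # Both "threads" read the shared variable before either writes back.
    read_by_t1 = initial      # thread 1 reads 19
    read_by_t2 = initial      # thread 2 also reads 19
    a = read_by_t1 + 1        # thread 1 saves 20
    a = read_by_t2 + 1        # thread 2 saves 20, overwriting thread 1
    return a                  # two increments ran, but one was lost

print(lost_update())  # 20, not the expected 21
```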
10
NVIDIA CUDA
–Historical Trends
–CUDA
–Programming Languages
–Reported Speedup
11
Historical Trends
12
CUDA
Compute Unified Device Architecture (CUDA) is a parallel computing engine built into NVIDIA GPUs (graphics processing units).
13
Programming Languages
Application → C/C++, Fortran, OpenCL, … → NVIDIA GPU with the CUDA Parallel Computing Architecture
14
Reported Speedup
15
CUDA Architecture
–Physical Reality behind CUDA
–CUDA Architectures
–Introducing the "Fermi" Architecture
–SM Architecture
–CUDA Core Architecture
16
Physical Reality behind CUDA
The CPU (host) with its main memory and the GPU (device) with its own memory are separate components; data must be moved between them.
17
CUDA Architectures (basic CUDA architecture)
–G80: first CUDA-capable processor
–G8x, G9x: global memory
–GT200: double precision, shared memory, larger register file, relaxed memory coalescing rules
18
"Fermi" Architecture
–3 billion transistors
–Over 2× the cores (512 total)
–8× the peak double-precision performance
–L1 and L2 caches
–~2× memory bandwidth
–Up to 1 terabyte of GPU memory
19
SM Architecture
–32 CUDA cores per SM (streaming multiprocessor)
–8× peak double-precision floating-point performance
–Dual thread scheduler
–64 KB of RAM for shared memory and L1 cache
20
CUDA Core Architecture
–New IEEE 754-2008 floating-point standard
–Fused multiply-add (FMA) instruction for both single and double precision
–Newly designed integer ALU optimized for 64-bit and extended-precision operations
21
SVD matrix computation
–SVD
–SVD matrix computation
–Experiment Datasets
–Experiment Environment
–Experiment Results
22
SVD
The singular value decomposition (SVD) is an important factorization of a matrix, with many applications in signal processing and statistics. Suppose M is an m-by-n matrix; then there exists a factorization of the form M = UΣV*, where U is an m×m unitary matrix, Σ is an m×n diagonal matrix with non-negative entries (the singular values), and V* is the conjugate transpose of an n×n unitary matrix V.
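The factorization can be checked numerically with NumPy; a small sketch (the matrix M below is an arbitrary example of mine, not from the slides):

```python
import numpy as np

M = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])              # m = 3, n = 2

# NumPy returns the singular values as a 1-D vector, largest first,
# rather than as the diagonal matrix Sigma.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

M_rebuilt = U @ np.diag(s) @ Vt         # U Sigma V* reassembled
assert np.allclose(M, M_rebuilt)
```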
23
SVD matrix computation
Image → RGB pixel matrix → SVD factor matrices
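The experiment's exact pipeline is not given, but the per-channel computation can be sketched as a truncated (rank-k) SVD. The function name, the random stand-in data, and the small 64×64 size (down from the slides' 1024×1024 images) are my assumptions:

```python
import numpy as np

def rank_k_approx(channel, k):
    # Keep only the k largest singular values and their vectors.
    # By the Eckart-Young theorem this is the best rank-k
    # approximation of the matrix in the Frobenius norm.
    U, s, Vt = np.linalg.svd(channel, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
channel = rng.random((64, 64))      # stand-in for one RGB channel

low  = rank_k_approx(channel, 8)
high = rank_k_approx(channel, 32)

# Retaining more singular values reduces the reconstruction error.
err_low  = np.linalg.norm(channel - low)
err_high = np.linalg.norm(channel - high)
```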
24
Experiment Datasets
–3 test images
–RGB full color
–1024×1024, 1,048,576 pixels in total
25
Experiment Environment

GPU device: NVIDIA GeForce 9600 GSO
–Cores: 96
–Processor clock: 1375 MHz
–Standard memory: 384 MB
–Memory bandwidth: 38.4 GB/sec

CPU device: Intel Core 2 Quad Q9300
–Cores: 4
–Processor clock: 2.5 GHz
–FSB speed: 1333 MHz
–L2 cache: 6 MB
26
Experiment Results
27
Conclusion
–Using the GPU to speed up programs is feasible.
–NVIDIA CUDA works well for SIMD-style parallel computing.
–However, there is an additional cost for passing data between main memory and GPU memory.