1
ICAL – Capabilities and Limitations of Distributed Computing Provided by the GPU Architecture
2
11/17/09 ICAL
Outline
–Parallel computing with GPU
–NVIDIA CUDA
–SVD matrix computation
–Conclusion
3
Parallel computing with GPU
–Parallel computing
–Flynn's Taxonomy
–Algorithm decomposition
–Amdahl's Law
–Correctness concepts
4
Parallel computing
Parallel computing is a form of computation in which many calculations are carried out simultaneously.
Parallel computer hardware:
–Single machine: multi-core CPU, GPU
–Multiple machines: clusters, MPPs, grids
5
Parallel computing (cont.)
There are several kinds of parallelism, such as:
–Bit-level
–Instruction-level
–Data decomposition
–Task decomposition
Parallel computing has an inherent speedup limit (Amdahl's Law).
6
Algorithm decomposition
Task decomposition: preparing the dinner is split into different tasks – purchasing, cooking, cleaning the table, washing dishes. Mary goes shopping while John cleans the table.
Data decomposition: one task is split across workers over its data – John and Mary wash the dishes together.
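The dinner example above can be sketched in code. This is an illustrative sketch only: the function names and the use of Python's `ThreadPoolExecutor` are my own choices, not from the slides.

```python
from concurrent.futures import ThreadPoolExecutor

def task_decomposition():
    # Task decomposition: different tasks run concurrently,
    # like Mary going shopping while John cleans the table.
    with ThreadPoolExecutor(max_workers=2) as pool:
        shopping = pool.submit(lambda: "groceries bought")   # Mary's task
        cleaning = pool.submit(lambda: "table cleaned")      # John's task
        return shopping.result(), cleaning.result()

def data_decomposition(dishes, workers=2):
    # Data decomposition: the SAME task (washing) is applied to
    # disjoint chunks of the data, one chunk per worker.
    wash = lambda chunk: ["washed " + d for d in chunk]
    chunks = [dishes[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(wash, chunks)
    return [dish for part in parts for dish in part]
```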
7
Flynn's Taxonomy

                 Single instruction   Multiple instructions
Single data      SISD                 MISD
Multiple data    SIMD                 MIMD
8
Amdahl's Law
Amdahl's law models the expected speedup of a program when only part of it is improved:
Speedup = 1 / ((1 − P) + P / S)
P: parallel portion of the program
S: speedup of the parallel portion
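The law translates directly into a few lines of code; a minimal sketch (the function name `amdahl_speedup` is mine):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the program
    is sped up by a factor s (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / s)

# Parallelizing 90% of a program 10x gives only ~5.26x overall:
# the remaining serial 10% dominates, and caps the limit at 10x
# even as s grows without bound.
print(amdahl_speedup(0.9, 10))
```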
9
Correctness concepts
Race condition: two threads both read a = 19, each computes a + 1 and saves the result, so a ends up as 20 instead of the expected 21 – one update is lost. ERROR!
Deadlock: threads block forever, each waiting for a resource another one holds.
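The lost update from the slide can be replayed deterministically. This sketch hard-codes the bad interleaving as straight-line code (a real race depends on thread timing and only fails sometimes):

```python
def lost_update(initial=19):
    # Both "threads" read the shared variable before either writes back.
    read_by_t1 = initial      # thread 1 reads 19
    read_by_t2 = initial      # thread 2 also reads 19
    a = read_by_t1 + 1        # thread 1 saves 20
    a = read_by_t2 + 1        # thread 2 saves 20, overwriting thread 1
    return a                  # two increments ran, but one was lost

print(lost_update())  # 20, not the expected 21
```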
10
NVIDIA CUDA
–Historical Trends
–CUDA
–Programming Languages
–Reported Speedup
11
Historical Trends
12
CUDA
Compute Unified Device Architecture (CUDA) is a parallel computing engine built into NVIDIA GPUs (graphics processing units).
13
Programming Languages
Application → C/C++, Fortran, OpenCL, … → NVIDIA GPU with the CUDA Parallel Computing Architecture
14
Reported Speedup
15
CUDA Architecture
–Physical Reality behind CUDA
–CUDA Architectures
–Introducing the "Fermi" Architecture
–SM Architecture
–CUDA Core Architecture
16
Physical Reality behind CUDA
The CPU (host) with its main memory and the GPU (device) with its own memory are separate components; data must be moved between them.
17
CUDA Architectures (basic CUDA architecture)
–G80: first CUDA-capable processor
–G8x, G9x: global memory
–GT200: double precision, shared memory, larger register file, relaxed memory coalescing rules
18
"Fermi" Architecture
–3 billion transistors
–Over 2× the cores (512 total)
–8× the peak double-precision performance
–L1 and L2 caches
–~2× memory bandwidth
–Up to 1 terabyte of GPU memory
19
SM Architecture
–32 CUDA cores per SM (streaming multiprocessor)
–8× peak double-precision floating-point performance
–Dual thread scheduler
–64 KB of RAM for shared memory and L1 cache
20
CUDA Core Architecture
–New IEEE 754-2008 floating-point standard
–Fused multiply-add (FMA) instruction for both single and double precision
–Newly designed integer ALU optimized for 64-bit and extended-precision operations
21
SVD matrix computation
–SVD
–SVD matrix computation
–Experiment Datasets
–Experiment Environment
–Experiment Results
22
SVD
The singular value decomposition (SVD) is an important factorization of a matrix, with many applications in signal processing and statistics. Suppose M is an m-by-n matrix; then there exists a factorization of the form M = UΣV*, where U is an m×m unitary matrix, Σ is an m×n diagonal matrix with non-negative entries (the singular values), and V* is the conjugate transpose of an n×n unitary matrix V.
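The factorization can be checked numerically with NumPy; a small sketch (the matrix M below is an arbitrary example of mine, not from the slides):

```python
import numpy as np

M = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])              # m = 3, n = 2

# NumPy returns the singular values as a 1-D vector, largest first,
# rather than as the diagonal matrix Sigma.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

M_rebuilt = U @ np.diag(s) @ Vt         # U Sigma V* reassembled
assert np.allclose(M, M_rebuilt)
```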
23
SVD matrix computation
Image → RGB pixel matrix → SVD factor matrices
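The experiment's exact pipeline is not given, but the per-channel computation can be sketched as a truncated (rank-k) SVD. The function name, the random stand-in data, and the small 64×64 size (down from the slides' 1024×1024 images) are my assumptions:

```python
import numpy as np

def rank_k_approx(channel, k):
    # Keep only the k largest singular values and their vectors.
    # By the Eckart-Young theorem this is the best rank-k
    # approximation of the matrix in the Frobenius norm.
    U, s, Vt = np.linalg.svd(channel, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
channel = rng.random((64, 64))      # stand-in for one RGB channel

low  = rank_k_approx(channel, 8)
high = rank_k_approx(channel, 32)

# Retaining more singular values reduces the reconstruction error.
err_low  = np.linalg.norm(channel - low)
err_high = np.linalg.norm(channel - high)
```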
24
Experiment Datasets
–3 test images
–RGB full color
–1024×1024, 1,048,576 pixels in total
25
Experiment Environment

GPU device: NVIDIA GeForce 9600 GSO
–Cores: 96
–Processor clock: 1375 MHz
–Standard memory: 384 MB
–Memory bandwidth: 38.4 GB/sec

CPU device: Intel Core 2 Quad Q9300
–Cores: 4
–Processor clock: 2.5 GHz
–FSB speed: 1333 MHz
–L2 cache: 6 MB
26
Experiment Results
27
Conclusion
–Using the GPU to speed up programs is feasible.
–NVIDIA CUDA works well for SIMD-style parallel computing.
–However, there is an additional cost for passing data between main memory and GPU memory.