Programming Massively Parallel Processors Lecture Slides for Chapter 9: Application Case Study – Electrostatic Potential Calculation © David Kirk/NVIDIA.

Slides:



Advertisements
Similar presentations
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, University of Illinois, Urbana-Champaign 1 ECE408 / CS483 Applied Parallel Programming.
Advertisements

ECE 598HK Computational Thinking for Many-core Computing Lecture 2: Many-core GPU Performance Considerations © Wen-mei W. Hwu and David Kirk/NVIDIA,
©Wen-mei W. Hwu and David Kirk/NVIDIA Urbana, Illinois, August 2-5, 2010 VSCSE Summer School Proven Algorithmic Techniques for Many-core Processors Lecture.
© David Kirk/NVIDIA and Wen-mei W. Hwu, 1 Programming Massively Parallel Processors Lecture Slides for Chapter 1: Introduction.
© David Kirk/NVIDIA and Wen-mei W. Hwu Taiwan, June 30-July 2, 2008 Taiwan 2008 CUDA Course Programming Massively Parallel Processors: the CUDA experience.
© David Kirk/NVIDIA and Wen-mei W. Hwu Urbana, Illinois, August 18-22, 2008 VSCSE Summer School 2008 Accelerators for Science and Engineering Applications:
© David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al, University of Illinois, ECE408 Applied Parallel Programming Lecture 12 Parallel.
© David Kirk/NVIDIA and Wen-mei W. Hwu Taiwan, June 30-July 2, Taiwan 2008 CUDA Course Programming Massively Parallel Processors: the CUDA experience.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 CS 395: CUDA Lecture 5 Memory coalescing (from.
1 ECE408/CS483 Applied Parallel Programming Lecture 10: Tiled Convolution Analysis © David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al University.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, ECE 498AL, University of Illinois, Urbana-Champaign ECE408 / CS483 Applied Parallel Programming.
FIGURE 11.1 Mapping between OpenCL and CUDA data parallelism model concepts. KIRK CH:11 “Programming Massively Parallel Processors: A Hands-on Approach.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 CMPS 5433 Dr. Ranette Halverson Programming Massively.
1 Ceng 545 GPU Computing. Grading 2 Midterm Exam: 20% Homeworks: 40% Demo/knowledge: 25% Functionality: 40% Report: 35% Project: 40% Design Document:
GPU Programming Shirley Moore CPS 5401 Fall 2013
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Spring 2010 Lecture 13: Basic Parallel.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Lecture.
©Wen-mei W. Hwu and David Kirk/NVIDIA Urbana, Illinois, August 2-5, 2010 VSCSE Summer School Proven Algorithmic Techniques for Many-core Processors Lecture.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, University of Illinois, Urbana-Champaign 1 Graphic Processing Processors (GPUs) Parallel.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Lecture.
©Wen-mei W. Hwu and David Kirk/NVIDIA Urbana, Illinois, August 2-5, 2010 VSCSE Summer School Proven Algorithmic Techniques for Many-core Processors Lecture.
1 CS/EE 217 GPU Architecture and Parallel Programming Lecture 9: Tiled Convolution Analysis © David Kirk/NVIDIA and Wen-mei W. Hwu,
© David Kirk/NVIDIA and Wen-mei W
Emergence of GPU systems for general purpose high performance computing ITCS 4145/5145 July 12, 2012 © Barry Wilkinson CUDAIntro.ppt.
L6: Memory Hierarchy Optimization IV, Bandwidth Optimization
Introduction to Computer Graphics
ECE 498AL Lectures 8: Bank Conflicts and Sample PTX code
Graphics on GPU © David Kirk/NVIDIA and Wen-mei W. Hwu,
© David Kirk/NVIDIA and Wen-mei W. Hwu,
ECE408 Fall 2015 Applied Parallel Programming Lecture 11 Parallel Computation Patterns – Reduction Trees © David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al,
ECE408/CS483 Fall 2015 Applied Parallel Programming Lecture 7: DRAM Bandwidth ©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University.
ECE408 Fall 2015 Applied Parallel Programming Lecture 21: Application Case Study – Molecular Dynamics.
ECE408/CS483 Applied Parallel Programming Lecture 7: DRAM Bandwidth
ECE 498AL Spring 2010 Lectures 8: Threading & Memory Hardware in G80
ECE408 / CS483 Applied Parallel Programming Lecture 1: Introduction
L4: Memory Hierarchy Optimization II, Locality and Data Placement, cont. CS6235 L4: Memory Hierarchy, II.
L18: CUDA, cont. Memory Hierarchy and Examples
Mattan Erez The University of Texas at Austin
© David Kirk/NVIDIA and Wen-mei W. Hwu,
VSCSE Summer School Proven Algorithmic Techniques for Many-core Processors Lecture 5: Data Layout for Grid Applications ©Wen-mei W. Hwu and David Kirk/NVIDIA.
ECE408 / CS483 Applied Parallel Programming Lecture 23: Application Case Study – Electrostatic Potential Calculation.
DRAM Bandwidth Slide credit: Slides adapted from
© David Kirk/NVIDIA and Wen-mei W. Hwu,
© David Kirk/NVIDIA and Wen-mei W. Hwu,
L4: Memory Hierarchy Optimization II, Locality and Data Placement
© 2012 Elsevier, Inc. All rights reserved.
Mattan Erez The University of Texas at Austin
Programming Massively Parallel Processors Performance Considerations
L15: CUDA, cont. Memory Hierarchy and Examples
ECE408 Applied Parallel Programming Lecture 14 Parallel Computation Patterns – Parallel Prefix Sum (Scan) Part-2 © David Kirk/NVIDIA and Wen-mei W.
Introduction to Heterogeneous Parallel Computing
L6: Memory Hierarchy Optimization IV, Bandwidth Optimization
ECE 498AL Lecture 15: Reductions and Their Implementation
ECE498AL Spring 2010 Lecture 4: CUDA Threads – Part 2
ECE 8823A GPU Architectures Module 3: CUDA Execution Model -I
Mattan Erez The University of Texas at Austin
© David Kirk/NVIDIA and Wen-mei W. Hwu,
ECE 8823A GPU Architectures Module 2: Introduction to CUDA C
ECE 498AL Lecture 10: Control Flow
L6: Memory Hierarchy Optimization IV, Bandwidth Optimization
© David Kirk/NVIDIA and Wen-mei W. Hwu,
© David Kirk/NVIDIA and Wen-mei W. Hwu,
© David Kirk/NVIDIA and Wen-mei W. Hwu,
Mattan Erez The University of Texas at Austin
Mattan Erez The University of Texas at Austin
ECE 498AL Spring 2010 Lecture 10: Control Flow
© David Kirk/NVIDIA and Wen-mei W. Hwu,
© David Kirk/NVIDIA and Wen-mei W
A fixed-function NVIDIA GeForce graphics pipeline.
(non-Cartesian) trajectory with linear-solver-based reconstruction.
Presentation transcript:

Programming Massively Parallel Processors Lecture Slides for Chapter 9: Application Case Study – Electrostatic Potential Calculation © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign

Motivation Electrostatic potential map is used in building stable structures for molecular dynamics simulation. © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign

Core Computation The contribution of atom[i] to the electrostatic potential at lattice point j (potential[j]) is atom[i].charge / rij. In the Direct Coulomb Summation method, the total potential at lattice point j is the sum of contributions from all atoms in the system. © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign

Sequential CPU Code © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign

DCS Kernel Design Overview © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign

DCS Kernel Version 1 © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign

Information Reuse © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign

DCS kernel Version 2 © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign

Memory Layout for Coalescing © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign

DCS Kernel Version 3 © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign

Performance Comparison © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign

CPU vs. CPU-GPU Comparison © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign

a PC with 3 GTX8800 GPUs at 700 Watts of total system power. © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign