Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation,


Presentation transcript:

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO C

VTK-m: Building a Visualization Toolkit for Massively Threaded Architectures
Ultrascale Visualization Workshop
Kenneth Moreland, Sandia National Laboratories
November 16, 2015

Extreme Scale: Threads, Threads, Threads!
- A clear trend in supercomputing is ever-increasing parallelism.
- Clock-speed increases are long gone: "The Free Lunch Is Over" (Herb Sutter).
Jaguar (XT5): 224,256 cores; 224,256-way concurrency; 300 terabytes of memory.
Titan (XK7): 299,008 cores and 18,688 GPUs; 70–500 million-way concurrency; 700 terabytes of memory.
Exascale*: 1 billion cores; 10–100 billion-way concurrency; 128 petabytes of memory.
*Source: Scientific Discovery at the Exascale, Ahern, Shoshani, Ma, et al.

"My new computer's got the clocks, it rocks / But it was obsolete before I opened the box." – "Weird Al" Yankovic, "It’s All About the Pentiums," circa 1999
"Moore’s Law is dead." – Gordon Moore, circa 2005

Amdahl vs. Gustafson-Barsis
- Amdahl’s Law: any algorithm has data dependencies that make some fraction of the software inherently serial, and parallelism is ultimately limited by this serial fraction. See also the Span Law.
- Gustafson-Barsis Law: increasing the amount of data can potentially increase the number of independent operations and allow an algorithm to increase its parallelism indefinitely.
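The two laws can be compared numerically. This is a minimal sketch, assuming the textbook forms: Amdahl’s fixed-size speedup 1 / (s + (1 − s)/N) and the Gustafson-Barsis scaled speedup s + (1 − s)N, where s is the serial fraction and N the processor count. The function names are illustrative only.

```python
def amdahl_speedup(serial_fraction, n):
    """Fixed problem size: the serial part takes the same time on any n."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def gustafson_speedup(serial_fraction, n):
    """Scaled problem size: serial_fraction is measured on the n-processor run,
    and the parallel part of the work grows with n."""
    return serial_fraction + (1.0 - serial_fraction) * n


# Compare the two models for a 10% serial fraction at increasing processor counts
for n in (10, 1000, 1_000_000):
    print(n, round(amdahl_speedup(0.1, n), 2), round(gustafson_speedup(0.1, n), 1))
```

With s = 0.1, the Amdahl speedup stays below 10 no matter how large N gets, while the scaled speedup keeps growing with N because the parallel work grows with the data.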

AMD x86 vs. NVIDIA GPU
AMD x86: full x86 core plus associated cache; 8 cores per die; MPI-only is feasible.
NVIDIA GPU: 2,880 cores collected in 15 SMX; shared PC, cache, and memory fetches; reduced control logic; MPI-only is not feasible.
[Figure: die images at a 1 mm scale, 1 x86 core vs. 1 Kepler core]

Inter-Node Parallelism

Inter-Node and Intra-Node Parallelism
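The two levels of parallelism can be illustrated with a reduction. This is a minimal sketch, assuming a hypothetical hybrid layout: the data is split into disjoint blocks, one per node (a plain loop stands in for separate MPI ranks, and the final sum stands in for an inter-node reduction), and within each node a thread pool sums sub-chunks of that node’s block in shared memory. The names are illustrative; real inter-node code would use a message-passing library.

```python
from concurrent.futures import ThreadPoolExecutor


def split(seq, parts):
    """Split seq into `parts` contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(seq), parts)
    chunks, start = [], 0
    for p in range(parts):
        end = start + q + (1 if p < r else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks


def node_sum(local_data, threads):
    """Intra-node: threads share the node's memory and each sums one chunk."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(sum, split(local_data, threads)))


def global_sum(data, nodes, threads_per_node):
    """Inter-node: each simulated node owns a disjoint block; summing the
    partial results stands in for a message-passing reduction across nodes."""
    partials = [node_sum(block, threads_per_node) for block in split(data, nodes)]
    return sum(partials)


print(global_sum(list(range(1000)), nodes=4, threads_per_node=8))
```

The thread pool shows the structure rather than a speedup: CPython’s global interpreter lock serializes pure-Python work like this.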

Example Algorithm: Contours

[Figure not recoverable; surviving label: "Total: 11"]
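The contour algorithm on this slide is not recoverable from the transcript. Data-parallel contouring, however, is commonly structured as count, scan, generate. Each cell independently counts how many output elements it will emit. An exclusive prefix sum then turns those counts into output offsets, and its final value is the total. Finally, each cell writes its output at its own offset. Below is a minimal 1D sketch of that pattern, assuming a line of samples where a cell emits one point when the isovalue separates its two endpoint values. The function names are hypothetical, and this is not VTK-m’s contour implementation.

```python
from itertools import accumulate


def count_outputs(values, isovalue):
    """Count step: cell i spans samples i and i+1 and emits one point
    when the isovalue separates its two endpoint values."""
    return [
        1 if (values[i] < isovalue) != (values[i + 1] < isovalue) else 0
        for i in range(len(values) - 1)
    ]


def exclusive_scan(counts):
    """Scan step: each cell's output offset, plus the total output size."""
    running = [0] + list(accumulate(counts))
    return running[:-1], running[-1]


def contour_1d(xs, values, isovalue):
    counts = count_outputs(values, isovalue)
    offsets, total = exclusive_scan(counts)
    points = [0.0] * total  # allocate the output exactly once
    # Generate step: every iteration is independent and writes only to its
    # own offset, so this loop could run in parallel without locking.
    for i, count in enumerate(counts):
        if count:
            a, b = values[i], values[i + 1]
            t = (isovalue - a) / (b - a)  # b != a because the isovalue separates them
            points[offsets[i]] = xs[i] + t * (xs[i + 1] - xs[i])
    return points


print(contour_1d([0, 1, 2, 3, 4], [0, 2, 0, 2, 0], 1))
```

The scan is what lets the output be allocated once at its exact size before any thread writes, which avoids dynamic allocation or atomics during generation.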

How Many Architectures to Support?
GPU (NVIDIA)
- Sub-architectures: Fermi, Kepler, Maxwell
- Multiple memory types: global, shared, constant, texture
- Memory amount: up to 12 GB
- 1000s of threads
- Grids, blocks, and warps
CPU/MIC
- Multiple ISAs
- Vector unit widths: 2, 4, 8 / 16
- Single memory type (except when not: cache, HSM)
- Larger memory size
- Up to 60/260 threads
- No explicit organization
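The GPU’s explicit thread organization can be made concrete with index arithmetic. This is a minimal sketch, assuming a one-dimensional launch and NVIDIA’s 32-thread warp. It splits a flat thread id into its block, its position within the block, its warp, and its lane, and it computes how many blocks a grid needs to cover n elements. The function names are illustrative. In CUDA C++ the flat id would typically be computed as blockIdx.x * blockDim.x + threadIdx.x.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA Fermi, Kepler, and Maxwell GPUs


def thread_coordinates(global_id, block_size):
    """Split a flat 1D thread id into (block, thread in block, warp in block, lane)."""
    block, thread = divmod(global_id, block_size)
    warp, lane = divmod(thread, WARP_SIZE)
    return block, thread, warp, lane


def blocks_needed(n, block_size):
    """Ceiling division: the number of blocks in a grid that covers n elements."""
    return (n + block_size - 1) // block_size


print(thread_coordinates(1000, 256))   # (3, 232, 7, 8)
print(blocks_needed(1_000_000, 256))   # 3907
```

A grid of 3,907 blocks of 256 threads launches 1,000,192 threads, so the last 192 must skip work through a bounds check.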

Performance Portability
[Figure: algorithms A–F mapped onto architectures]

Performance Portability
[Figure: algorithms A–F mapped through VTK-m onto backends]
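One way to read these two diagrams is as a count of implementations. This is a minimal sketch under two assumptions. Without a portability layer, each of the six algorithms (A–F) must be hand-ported to every architecture. With VTK-m in the middle, each algorithm is written once and each backend is written once. The range of backend counts in the loop is illustrative.

```python
def ports_without_layer(num_algorithms, num_architectures):
    # Each algorithm is rewritten for each architecture: M * N implementations
    return num_algorithms * num_architectures


def ports_with_layer(num_algorithms, num_backends):
    # Algorithms are written once against the layer, and each backend is
    # implemented once underneath it: M + N implementations
    return num_algorithms + num_backends


algorithms = len("ABCDEF")  # the six algorithms labeled on the slide
for n in (1, 2, 3, 4):
    print(n, ports_without_layer(algorithms, n), ports_with_layer(algorithms, n))
```

For a single architecture the layer is pure overhead (7 pieces versus 6), but from two architectures on, the additive cost grows far more slowly than the multiplicative one.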

VTK-m Framework
Control Environment: Grid Topology, Array Handle, Invoke
Execution Environment: Worklet, Cell Operations, Field Operations, Basic Math, Make Cells
Device Adapter: Allocate, Transfer, Schedule, Sort, …
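The division of labor in the framework diagram can be sketched in miniature. This is a minimal sketch, and every name in it is hypothetical: SerialDevice, ThreadPoolDevice, invoke, and magnitude_squared are not VTK-m’s API. The worklet is a per-element function that knows nothing about the hardware. The control-side invoke allocates the output and hands a per-index task to a device adapter, and each device adapter supplies the Schedule and Sort primitives for its own backend. Allocate and Transfer are left implicit because Python lists live in a single memory space.

```python
from concurrent.futures import ThreadPoolExecutor


class SerialDevice:
    """Hypothetical backend: runs scheduled work in an ordinary loop."""

    def schedule(self, task, n):
        # Schedule: call task(i) for every index i in [0, n)
        for i in range(n):
            task(i)

    def sort(self, values):
        return sorted(values)


class ThreadPoolDevice:
    """Hypothetical backend: spreads scheduled work over a pool of threads."""

    def __init__(self, workers=4):
        self.workers = workers

    def schedule(self, task, n):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # list() waits for every task and re-raises any exception
            list(pool.map(task, range(n)))

    def sort(self, values):
        # A real backend would use a parallel sort; sorted() keeps the sketch short
        return sorted(values)


def invoke(worklet, inputs, device):
    """Control side (hypothetical): allocate the output, then let the
    device adapter schedule one worklet call per input element."""
    outputs = [None] * len(inputs)

    def task(i):
        # Each index writes only its own slot, so tasks are independent
        outputs[i] = worklet(inputs[i])

    device.schedule(task, len(inputs))
    return outputs


def magnitude_squared(vector):
    """A worklet: a pure per-element function with no knowledge of the device."""
    x, y, z = vector
    return x * x + y * y + z * z


points = [(1, 2, 2), (0, 0, 3), (2, 3, 6)]
for device in (SerialDevice(), ThreadPoolDevice()):
    print(type(device).__name__, invoke(magnitude_squared, points, device))
```

Adding a backend in this sketch means writing one new class with schedule and sort. The worklet and invoke stay unchanged.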

Lines of code: CUDA SDK, 561; PISTON, 505; VTK-m, 283


[Charts: Contour Times; Surface Simplification Times]

VTK-m is separate from VTK
[Diagram labels: Algorithm; Simulation]

VTK-m is not a replacement for VTK
[Diagram labels: Reader; Filter; Rendering; Algorithm; Simulation]
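One reading of these diagrams is that the same algorithm can sit inside a filter in a VTK-style Reader → Filter → Rendering pipeline and can also be called directly from a simulation. This is a minimal sketch of that separation, and every name in it is hypothetical: threshold_algorithm, ThresholdFilter, reader, render, and simulation_step. Thresholding is only a stand-in algorithm chosen for brevity, and none of this is the VTK or VTK-m API.

```python
def threshold_algorithm(values, lo, hi):
    """Stand-in algorithm: plain arrays in, plain arrays out, no pipeline knowledge."""
    return [v for v in values if lo <= v <= hi]


class ThresholdFilter:
    """Hypothetical VTK-side filter that wraps the algorithm for a pipeline."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def execute(self, dataset):
        kept = threshold_algorithm(dataset["values"], self.lo, self.hi)
        return {"name": dataset["name"], "values": kept}


def reader():
    """Hypothetical reader: produces a small dataset as if loaded from disk."""
    return {"name": "pressure", "values": [0.1, 0.5, 0.9, 1.3]}


def render(dataset):
    """Hypothetical renderer: summarizes what it would draw."""
    return f"{dataset['name']}: {len(dataset['values'])} values"


def simulation_step(state):
    """In situ use: the simulation calls the algorithm directly on its own
    arrays, with no reader, filter, or pipeline in between."""
    return threshold_algorithm(state, 0.4, 1.0)


# Post hoc path: Reader -> Filter -> Rendering
print(render(ThresholdFilter(0.4, 1.0).execute(reader())))
# In situ path: Simulation -> Algorithm
print(simulation_step([0.0, 0.7, 2.0]))
```

Because the algorithm knows nothing about datasets or pipelines, the filter is only a thin adapter, and the simulation path needs no VTK machinery at all.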

Acknowledgements
- This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Award Numbers , , and
- Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL
- Lots of credit goes out to all our collaborators: Chris Sewell, Jeremy Meredith, David Pugmire, Berk Geveci, Robert Maynard, Hank Childs, and many others.