Presentation transcript:

VTK-m
DOECGF, April 29, 2015
Kenneth Moreland, Sandia National Laboratories
With contributions from Jeremy Meredith (ORNL), Chris Sewell (LANL), and Robert Maynard (Kitware)

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO PE

Acknowledgements  This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Award Numbers , , and

Supercomputers!  A clear trend in supercomputing is ever increasing parallelism  Clock increases are long gone  “The Free Lunch Is Over” (Herb Sutter)

                Jaguar – XT5      Titan – XK7                Exascale*
Cores           224,256           299,008 + 18,688 GPUs      1 billion
Concurrency     224,256 way       70 – 500 million way       10 – 100 billion way
Memory          300 Terabytes     700 Terabytes              128 Petabytes

*Source: Scientific Discovery at the Exascale, Ahern, Shoshani, Ma, et al.

“Everybody who learns concurrency thinks they understand it, ends up finding mysterious races they thought weren’t possible, and discovers that they didn’t actually understand it yet after all.” Herb Sutter
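To make the point concrete, here is a minimal sketch (not from the slides) of the kind of race Sutter is describing: two threads increment a shared counter without synchronization, and lost updates usually leave the final count below the expected 2,000,000.

    #include <iostream>
    #include <thread>

    int main()
    {
      long counter = 0;  // shared and unsynchronized on purpose

      auto work = [&counter]() {
        for (int i = 0; i < 1000000; ++i)
        {
          ++counter;  // read-modify-write that races with the other thread
        }
      };

      std::thread t1(work);
      std::thread t2(work);
      t1.join();
      t2.join();

      // Expected 2,000,000; a data race typically yields something smaller.
      std::cout << "counter = " << counter << std::endl;
      return 0;
    }

Making counter a std::atomic<long>, or guarding it with a mutex, removes the race.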

AMD x86 vs. NVIDIA GPU (die photos at the same 1 mm scale: 1 x86 core vs. 1 Kepler “core”)

AMD x86
– Full x86 core + associated cache
– 6 cores per die
– MPI-only feasible

NVIDIA GPU
– 2,880 cores collected in 15 SMX
– Shared PC, cache, memory fetches
– Reduced control logic
– MPI-only not feasible

VTK-m Goals
 A single place for the visualization community to collaborate, contribute, and leverage massively threaded algorithms
 Reduce the challenges of writing highly concurrent algorithms by using data parallel algorithms
 Update DOE visualization tools (VTK/ParaView/VisIt) to modern processor technology
 Make these algorithms accessible to simulation codes

VTK-m Architecture

Piston
 Focuses on developing data-parallel algorithms that are portable across multi-core and many-core architectures for use by LCF codes of interest
 Algorithms are integrated into LCF codes in situ, either directly or through integration with ParaView Catalyst

[Images: PISTON isosurface with curvilinear coordinates; ocean temperature isosurface generated across four GPUs using distributed PISTON; PISTON integration with VTK and ParaView]
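For context (not stated on the slide): PISTON builds its operators out of data-parallel primitives in the style of NVIDIA's Thrust library. A minimal sketch of that style, assuming a Thrust-capable compiler such as nvcc and made-up field values, is a map (transform) followed by a reduce:

    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/reduce.h>
    #include <thrust/functional.h>
    #include <iostream>

    // Functor applied to every element by the data-parallel transform.
    struct Scale
    {
      float factor;
      Scale(float f) : factor(f) {}
      __host__ __device__
      float operator()(float v) const { return factor * v; }
    };

    int main()
    {
      // Illustrative scalar field of 1,000 values, all 1.0.
      thrust::device_vector<float> field(1000, 1.0f);
      thrust::device_vector<float> scaled(field.size());

      // Map: scale every value in parallel on the device.
      thrust::transform(field.begin(), field.end(), scaled.begin(), Scale(2.0f));

      // Reduce: sum the scaled values in parallel.
      float total = thrust::reduce(scaled.begin(), scaled.end(), 0.0f,
                                   thrust::plus<float>());

      std::cout << "total = " << total << std::endl;  // 2000
      return 0;
    }

Composing algorithms from such primitives (transform, scan, sort, reduce) is what makes the same source portable across CUDA, OpenMP, and TBB backends.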

VTK-m Architecture

Dax
 Create a toolkit that is well suited to the design of visualization operations with a great number of shared-memory threads.
 Develop a framework that adapts to emerging processor and compiler technologies.
 Design multi-purpose algorithms that can be applied to a variety of visualization operations.

VTK Code

    int vtkCellDerivatives::RequestData(...)
    {
      ...[allocate output arrays]...
      ...[validate inputs]...
      for (cellId = 0; cellId < numCells; cellId++)
      {
        ...[update progress]...
        input->GetCell(cellId, cell);
        subId = cell->GetParametricCenter(pcoords);
        inScalars->GetTuples(cell->PointIds, cellScalars);
        scalars = cellScalars->GetPointer(0);
        cell->Derivatives(subId, pcoords, scalars, 1, derivs);
        outGradients->SetTuple(cellId, derivs);
      }
      ...[cleanup]...
    }

Dax Code

    struct CellGradient : public WorkletMapCell
    {
      typedef void ControlSignature(Topology, Field(Point), Field(Point), Field(Out));
      typedef _4 ExecutionSignature(_1, _2, _3);

      template <...>
      DAX_EXEC_EXPORT
      Vector3 operator()(...)
      {
        Vector3 parametricCellCenter =
          ParametricCoordinates<...>::Center();
        return CellDerivative(parametricCellCenter, coords, pointField, cellTag);
      }
    };
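The contrast being drawn: the VTK filter owns an explicit serial loop over cells and makes per-cell virtual calls, while the Dax worklet expresses only the per-cell computation. Iteration, scheduling, and data movement are left to the framework, which is what allows the same worklet to be dispatched across many shared-memory threads on a CPU or GPU.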

VTK-m Architecture

Extreme-scale Analysis and Visualization Library (EAVL)
J.S. Meredith, S. Ahern, D. Pugmire, R. Sisneros, "EAVL: The Extreme-scale Analysis and Visualization Library", Eurographics Symposium on Parallel Graphics and Visualization (EGPGV).

New Mesh Layouts
– More accurately represent simulation data in analysis results
– Support novel simulation applications

Greater Memory Efficiency
– Support future low-memory systems
– Minimize data movement and transformation costs

Parallel Algorithm Framework
– Accelerator-based system support
– Pervasive parallelism for multi-core and many-core processors

In Situ Support
– Direct zero-copy mapping of data from simulation to analysis codes
– Heterogeneous processing models

EAVL enables advanced visualization and analysis for the next generation of scientific simulations, supercomputing systems, and end-user analysis tools.

Data Models Supported by EAVL
 Low/high dimensional data (9D mesh in GenASiS)
 Multiple simultaneous coordinate systems (lat/lon + Cartesian xyz)
 Multiple cell groups in one mesh (e.g., subsets, face sets, flux surfaces)
 Non-physical data (graph, sensor, performance data)
 Mixed topology meshes (atoms + bonds, sidesets)
 Novel and hybrid mesh types (quadtree grid from MADNESS)

VTK-m Framework
Control Environment (vtkm::cont): Grid Topology, Array Handle, Invoke
Execution Environment (vtkm::exec): Cell Operations, Field Operations, Basic Math, Make Cells
Device Adapter: Allocate, Transfer, Schedule, Sort, …
Worklet: invoked from the control environment, executed in the execution environment

Device Adapter Contents
 Tag (struct DeviceAdapterFoo { };)
 Execution Array Manager
 Schedule
 Scan
 Sort
 Other support algorithms
 – Stream compact, copy, parallel find, unique

[Diagram: the control environment transfers a worklet functor to the execution environment, where it is scheduled and computed]
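As a rough illustration only (hypothetical names, not the actual VTK-m device adapter API), the essence of Schedule is to invoke a functor once per index; a serial adapter does this in a loop, while a CUDA or TBB adapter would spread the same invocations across threads:

    #include <cstddef>

    // Hypothetical serial "device adapter": Schedule simply runs the
    // functor for every index in order. A CUDA or TBB adapter would
    // launch the same functor across many threads instead.
    struct DeviceAdapterTagSerialSketch { };

    template <typename Functor>
    void Schedule(DeviceAdapterTagSerialSketch, Functor functor, std::size_t numInstances)
    {
      for (std::size_t index = 0; index < numInstances; ++index)
      {
        functor(index);  // one invocation per element, no ordering assumed
      }
    }

    // Example functor: doubles each entry of a raw array.
    struct DoubleValues
    {
      float* data;
      void operator()(std::size_t i) const { data[i] = 2.0f * data[i]; }
    };

    int main()
    {
      float values[8] = {1, 2, 3, 4, 5, 6, 7, 8};
      Schedule(DeviceAdapterTagSerialSketch{}, DoubleValues{values}, 8);
      return 0;
    }

Because the algorithm is expressed only through the functor, swapping the device adapter tag swaps the execution backend without touching the algorithm.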

VTK-m Arbitrary Composition
VTK-m allows clients to access different memory layouts through the Array Handle and Dynamic Array Handle.
– Allows for efficient in-situ integration
– Allows for reduced data transfer

[Diagram: arrays are transferred from the control environment to the execution environment]
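A minimal sketch of why that matters (hypothetical types, not the real vtkm::cont::ArrayHandle interface): if value access goes through a small portal abstraction, one algorithm can read a field whether it is stored interleaved (array of structs) or as separate component arrays (struct of arrays), so simulation data can be used in place instead of being copied into one canonical layout.

    #include <cstddef>
    #include <iostream>

    // Hypothetical portals: both expose GetX(index), but over
    // different underlying memory layouts.

    // Array-of-structs layout: x,y,z interleaved.
    struct AosPortalSketch
    {
      const float* xyz;  // [x0,y0,z0, x1,y1,z1, ...]
      float GetX(std::size_t i) const { return xyz[3 * i]; }
    };

    // Struct-of-arrays layout: separate component arrays.
    struct SoaPortalSketch
    {
      const float* x;
      float GetX(std::size_t i) const { return x[i]; }
    };

    // One algorithm works on either layout with no copying.
    template <typename Portal>
    float SumX(Portal portal, std::size_t numPoints)
    {
      float sum = 0.0f;
      for (std::size_t i = 0; i < numPoints; ++i)
      {
        sum += portal.GetX(i);
      }
      return sum;
    }

    int main()
    {
      const float interleaved[] = {1, 10, 100, 2, 20, 200, 3, 30, 300};
      const float xs[] = {1, 2, 3};

      std::cout << SumX(AosPortalSketch{interleaved}, 3) << "\n";  // 6
      std::cout << SumX(SoaPortalSketch{xs}, 3) << "\n";           // 6
      return 0;
    }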

Functor Mapping Applied to Topologies
[Diagram: the same functor() invoked over each element of the mesh topology]


2 x Intel Xeon CPU E GHz + NVIDIA Tesla K40c

Data: 432^3

What We Have So Far
 Features
 – Core Types
 – Statically Typed Arrays
 – Dynamically Typed Arrays
 – Device Interface (Serial, CUDA)
 – Basic Worklet and Dispatcher
 Compiles with
 – gcc (4.8+), clang, msvc (2010+), icc, pgi, nvcc
 User Guide work in progress
 Ready for larger collaboration

Questions? m.vtk.org