Hank’s Activities
Longhorn/XD AHM, Austin, TX, December 20, 2010

[Image: Volume rendering of 4608^3 combustion data set. Image credit: Mark Howison]
[Image: Volume rendering of flame data set using VisIt + IceT on Longhorn. Image credit: Tom Fogal]

My perception of my role in Longhorn/XD
 Help users succeed via:
   Direct support
   Ensuring necessary algorithms/functionality are in place
 Research the most effective way to utilize Longhorn
   Also help test the machine through aggressive usage
 Collaborate with / facilitate for other project members
 Provide visibility for the center externally (outreach, etc.)

Outline
 Researching how to best use Longhorn
   HW-accelerated volume rendering on Longhorn
   SW ray casting on Longhorn
 Collaborations
   Manta/VisIt
   VDF/VisIt
 User support
   Analysis of 4K^3 turbulent data (connected components algorithms)
   Other user support
 Outreach

HW-accelerated volume rendering on Longhorn
 “Large Data Visualization on Distributed Memory Multi-GPU Clusters”, HPG 2010
   Authors: Fogal, Childs, Shankar, Krüger, Bergeron, and Hatcher
 Ran VisIt + IceT on Longhorn, varying data size and number of GPUs
 Stage data on the CPU, transfer to the GPU (high transfer time, but can look at bigger data sets)

[Image: Volume rendering of flame data set using VisIt + IceT on Longhorn. Image credit: Tom Fogal]

HW-accelerated volume rendering on Longhorn
 Observation about CPU volume rendering:

    Number of cores:   Large   Small
    Ray evaluation:    Fast    Slow
    Compositing:       Slow    Fast

 Paper purpose: study the performance characteristics of GPU volume rendering at high concurrency on big data.
 Idea: GPU volume rendering has the computational horsepower to do ray evaluation quickly, but will have many fewer MPI participants, and thus cheaper compositing.
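For context, here is a minimal sketch of the front-to-back “over” integration that makes up the ray-evaluation phase. This is plain NumPy with illustrative names, not the VisIt + IceT implementation the paper actually benchmarked:

```python
import numpy as np

def evaluate_ray(colors, alphas):
    """Front-to-back compositing of the samples along one ray.

    colors: (n, 3) RGB at each sample, alphas: (n,) opacity per sample.
    Each ray is independent, which is why the ray-evaluation phase maps
    so well onto GPUs and many-core CPUs.
    """
    out = np.zeros(3)
    transparency = 1.0
    for c, a in zip(colors, alphas):
        out += transparency * a * c
        transparency *= 1.0 - a
        if transparency < 1e-3:  # early ray termination
            break
    return out

# Toy usage: 64 samples of a constant low-opacity orange medium.
samples = np.tile([1.0, 0.5, 0.0], (64, 1))
print(evaluate_ray(samples, np.full(64, 0.05)))
```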

[Results chart: big data, lots of GPUs; fast-ish on small data]

Software ray casting
 Previous work (not XD-related):
   “MPI-Hybrid Parallelism for Volume Rendering on Large Multi-Core Systems”, EGPGV 2010
   Authors: Howison, Bethel, and Childs
 Strong scaling study up to 216,000 cores on the ORNL Jaguar machine, looking at 4608^3 data.
 Study outcome: hybrid parallelism benefits this algorithm, primarily during the compositing phase, since there are fewer participants in MPI communication.
 One of two EGPGV best paper winners; invited for a follow-on article in TVCG.

[Image: Volume rendering of combustion data set. Image credit: Mark Howison]
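The compositing benefit is easy to see in a sketch. Below is a deliberately naive gather-and-blend composite, assuming mpi4py and NumPy; names like `local_image` are illustrative, and production codes use optimized schemes (binary swap, radix-k) rather than a gather to one rank:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def over(front, back):
    """Porter-Duff 'over' for premultiplied RGBA images."""
    alpha = front[..., 3:4]
    return front + (1.0 - alpha) * back

# Hypothetical partial RGBA image rendered from this rank's data block.
local_image = np.zeros((512, 512, 4), dtype=np.float32)

# Gather every rank's partial image and blend in depth order on rank 0.
# Communication and blending work grow with the number of participants,
# so a hybrid run with one rank per node (threads inside) pays far less
# here than a flat run with one rank per core.
partials = comm.gather(local_image, root=0)
if comm.rank == 0:
    final = partials[0]
    for img in partials[1:]:  # assumes ranks are already depth-sorted
        final = over(final, img)
```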

Software ray casting
 TVCG article (unpublished research):
   Add a weak scaling study (up to 22K^3) on Jaguar
   GPU scaling study on Longhorn
 GPU scaling study:
   Went up to 448 GPUs
   Purpose: similar to the Fogal work, but with a different spin … show that hybrid parallelism is beneficial. Instead of pthreads or OpenMP on the CPU, we are now using CUDA on the GPU.
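As a rough illustration of what “CUDA instead of pthreads/OpenMP” means for the per-ray loop, here is a one-thread-per-ray kernel written with Numba’s CUDA support (a stand-in chosen for brevity; the study itself used native CUDA inside VisIt, and all names here are illustrative):

```python
import numpy as np
from numba import cuda

@cuda.jit
def composite_rays(colors, alphas, out):
    """One CUDA thread per ray: front-to-back compositing (scalar color)."""
    ray = cuda.grid(1)
    if ray >= out.shape[0]:
        return
    acc = 0.0
    trans = 1.0
    for i in range(colors.shape[1]):
        a = alphas[ray, i]
        acc += trans * a * colors[ray, i]
        trans *= 1.0 - a
        if trans < 1e-3:  # early ray termination
            break
    out[ray] = acc

# Toy launch: 1M rays, 256 samples each, 256 threads per block.
n_rays, n_samples = 1 << 20, 256
colors = np.random.rand(n_rays, n_samples).astype(np.float32)
alphas = np.full((n_rays, n_samples), 0.02, dtype=np.float32)
out = np.zeros(n_rays, dtype=np.float32)
composite_rays[(n_rays + 255) // 256, 256](colors, alphas, out)
```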

Scaling results on GPU
[Charts: GPU scaling results on the 2308^3 data set]

Software ray casting on Longhorn
Two caveats:
(1) We didn’t optimize for CUDA, so we could have had favorable numbers to an even higher concurrency level.
(2) But 46K processors have more memory and can look at way bigger data sets.
Takeaway: for this algorithm and this data size, Longhorn is as powerful as 46K processors of Jaguar.

Manta/VisIt
 Carson Brownlee delivers an integration of VisIt and Manta via vtkManta objects.
 Hank does some small work:
   Updates the work from VisIt 2.0 to VisIt 2.2 and makes a branch for Hank and Carson to put fixes on
   Testing
 Carson and Hank create a list of issues and are in the process of tracking them down.

[Image: Rendering of an isosurface by VisIt using Manta]

Visualizing and Analyzing Large-Scale Turbulent Flow
 Detect, track, classify, and visualize features in large-scale turbulent flow.
 Analysis effort by Kelly Gaither (TACC), Hank Childs (LBNL), & Cyrus Harrison (LLNL).
 Stresses two algorithms that are difficult in a distributed-memory parallel setting:
  1. Can we identify connected components?
  2. Can we characterize their shape?

[Image: VisIt-calculated connected components on 4K^3 turbulence data, computed in parallel on TACC’s Longhorn machine. 2 million components were initially identified; the map expression was then used to select only the components with total volume greater than 15. Data courtesy of P.K. Yeung and Diego Donzis]

Identifying connected components in parallel is difficult.
 Hard to do efficiently
 Tremendous bookkeeping problem
 4-stage algorithm that finds local connectivity and then merges globally (a toy version of the bookkeeping is sketched below)
 Participating in a 2011 EGPGV submission describing this algorithm and its performance. Authors: Harrison, Childs, Gaither
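The slide describes the algorithm only at a high level; to make the “local connectivity, then global merge” idea concrete, here is a serial toy using union-find on a 1D mask (illustrative only, not the EGPGV implementation):

```python
import numpy as np

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def label_components(mask):
    """Connected components of a 1D boolean mask (stand-in for a 3D block).

    In the distributed version, each rank labels its own block like this;
    boundary cells are then exchanged and the per-rank label sets merged
    globally with the same union operation.
    """
    parent = np.arange(mask.size)
    for i in range(mask.size - 1):
        if mask[i] and mask[i + 1]:  # neighboring cells both "in"
            union(parent, i, i + 1)
    return [find(parent, i) if mask[i] else -1 for i in range(mask.size)]

print(label_components(np.array([1, 1, 0, 1, 0, 1, 1, 1], dtype=bool)))
# -> [0, 0, -1, 3, -1, 5, 5, 5]
```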

We used shape characterization to assist our feature tracking.
 Shape characterization metric: chord length distribution
 Difficult to perform efficiently in a distributed-memory setting
 It is our hope that chord length distributions, a characteristic function, can assist in tracking component behavior over time.

[Figure: Line Scan Filter pipeline across processors P0–P3: 1) choose lines, 2) calculate intersections, 3) segment redistribution, 4) analyze lines, 5) collect results at the Line Scan Analysis sink]
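To show what the metric computes, here is a serial toy that measures chord lengths along one axis of a binary volume (illustrative NumPy; VisIt’s Line Scan filter uses randomly oriented lines and redistributes segments across processors, per the pipeline above):

```python
import numpy as np

def chord_lengths(mask, axis=0):
    """Lengths of the 'inside' runs along every line parallel to `axis`."""
    lengths = []
    # Move the scan axis to the front, flatten the other axes into lines.
    lines = np.moveaxis(mask, axis, 0).reshape(mask.shape[axis], -1).T
    for line in lines:
        run = 0
        for inside in line:
            if inside:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return np.array(lengths)

# Toy usage: chord length distribution of a solid ball in a 64^3 volume.
g = np.indices((64, 64, 64)) - 32
ball = (g ** 2).sum(axis=0) < 20 ** 2
hist, edges = np.histogram(chord_lengths(ball), bins=16)
```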

My role in this effort
 Easily summarized: “use VisIt to get results to Kelly”
 Several iterations:
   Started with just statistics of components
   Looked at how variation in isovalue affected the statistics
   Added in chord length distributions as a characteristic function
   Took still images of each component for visual inspection
   (recently) Extracted each component as its own surface for combined inspection
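Iterations like these are driven through VisIt’s Python CLI; a rough sketch of that kind of session follows. OpenDatabase, DefineScalarExpression, AddPlot, the Threshold operator, and DrawPlots are real CLI calls, but the file name, variable names, and the exact expression syntax (including the map expression mentioned earlier) are from memory and should be checked against the VisIt documentation:

```python
# Hypothetical VisIt CLI session (run inside VisIt's Python interpreter).
OpenDatabase("turbulence4k.bov")

# Label each cell with the id of the connected component it belongs to.
DefineScalarExpression("comps", "conn_components(mesh)")

AddPlot("Pseudocolor", "comps")

# Keep only the big components, e.g. total volume > 15 (per the earlier slide).
AddOperator("Threshold")
t = ThresholdAttributes()
t.listedVarNames = ("comp_volume",)  # hypothetical per-component volume variable
t.lowerBounds = (15.0,)
SetOperatorOptions(t)

DrawPlots()
```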

VDF/VisIt
 John Clyne and Dan Lagreca add a VDF reader to VisIt.
 Hank performs some testing and debugging.
 Still lots to do:
   Formal commit to the VisIt repo; also add in new VisIt multi-res hooks
   Study how well large features are preserved across refinement levels
   Use the coarsest versions in conjunction with analysis code from Janine Bennett

Other user support
 Small amount of effort helping Saju Varghese and Kentaro Nagmine of UNLV:
   Fixed a VisIt bug with ray casting + point meshes
   Helped them format their data into the BOV format
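BOV (“brick of values”) is VisIt’s simple wrapper format for raw binary arrays: a small text header pointing at the data file. A minimal header for a hypothetical 512^3 float array might look like the following (file name, sizes, and variable are made up; see the VisIt documentation for the full keyword list):

```
TIME: 0.0
DATA_FILE: pressure.values
DATA_SIZE: 512 512 512
DATA_FORMAT: FLOAT
VARIABLE: pressure
DATA_ENDIAN: LITTLE
CENTERING: zonal
BRICK_ORIGIN: 0.0 0.0 0.0
BRICK_SIZE: 1.0 1.0 1.0
```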

Outreach & Service
 VisIt tutorials:
   SC10 (beginning and advanced), Nov 2010, NOLA
   Users at US ARL, Sep 2010, Aberdeen, MD
   SciDAC 2010, July 2010, Chattanooga, TN
 Speaker at NSF Extreme Scale I/O and Data Analysis Workshop, March 2010, Austin, TX
 Participant in NSF Workshop on SW Development Environments, Sep 2010, Washington DC
 Gave ~10 additional talks at various venues this year

Proposed Future Plans
 Continue collaboration with Kelly on analyzing turbulent flow
 Formally integrate VDF
 Multi-res study with John & Kelly
 Would like to do 1T-cell runs on Longhorn
 Continued user support (esp. CIG)
 Connected components EGPGV submission
 VisIt + GPU

[Image: Two-trillion-cell data set, rendered in VisIt by David Pugmire on the ORNL Jaguar machine]

Summary
 Researching how to best use Longhorn
   HW-accelerated volume rendering on Longhorn
   SW ray casting on Longhorn
 Collaborating with other Longhorn/XD members
   Manta/VisIt
   VDF/VisIt
 Doing user support
   Helping Kelly analyze 4K^3 turbulent data; working to make sure the connected components algorithm is up to snuff
   Some user support and more to come…
 Performing outreach activities