WHY THE RULES ARE CHANGING FOR LARGE DATA VISUALIZATION AND ANALYSIS

WHY THE RULES ARE CHANGING FOR LARGE DATA VISUALIZATION AND ANALYSIS. Hank Childs, Lawrence Berkeley Lab & UC Davis, 11/2/10 (P. Navratil, TACC; D. Pugmire, ORNL).

Supercomputing 101. Why simulation? Simulations are sometimes more cost effective than experiments, and the new model for science has three legs: theory, experiment, and simulation. What is the "petascale" / "exascale"? 1 FLOP = 1 FLoating point OPeration per second. 1 GigaFLOP = 1 billion FLOPs, 1 TeraFLOP = 1000 GigaFLOPs, 1 PetaFLOP = 1,000,000 GigaFLOPs, 1 ExaFLOP = 1000 PetaFLOPs (a billion billion FLOPs). PetaFLOPs + petabytes on disk + petabytes of memory → petascale. ExaFLOPs + exabytes on disk + petabytes of memory → exascale. Why petascale / exascale? More compute cycles, more memory, etc., lead to faster and/or more accurate simulations.
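
A quick check of the unit arithmetic above, in plain Python (the constants are just powers of ten; nothing here is specific to any machine):

GIGA = 10**9               # 1 GigaFLOP = 10^9 floating point operations per second
TERA = 10**12              # 1 TeraFLOP = 1000 GigaFLOPs
PETA = 10**15              # 1 PetaFLOP = 1,000,000 GigaFLOPs
EXA  = 10**18              # 1 ExaFLOP  = a billion billion FLOPs

print(PETA // GIGA)        # 1000000 -> GigaFLOPs per PetaFLOP
print(EXA // PETA)         # 1000    -> PetaFLOPs per ExaFLOP
print(EXA == GIGA * GIGA)  # True    -> "billion billion"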

Petascale computing is here. Existing petascale machines: LANL RoadRunner, ORNL Jaguar, UTK Kraken, Jülich JUGene.

Supercomputing is not slowing down. Two ~20 PetaFLOP machines, LLNL Sequoia and NCSA BlueWaters, will be online in 2011. Q: When does it stop? A: Exascale is being actively discussed right now (http://www.exascale.org).

Exascale machine requirements: timeline 2018-2021; total cost <$200M; total power consumption <20MW; accelerators a certainty; flash drives to stage data will change I/O patterns (very important for vis!).

How does the petascale/exascale affect visualization? Large scale, a large # of variables, a large # of time steps, and large ensembles.

Why is petascale/exascale visualization going to change the rules? Michael Strayer (U.S. DoE Office of Science): "petascale is not business as usual." This is especially true for visualization and analysis! Large scale data creates two incredible challenges: scale and complexity. Scale is not "business as usual": the supercomputing landscape is changing, and the solution is that we will need "smart" techniques in production environments. More resolution leads to more and more complexity: will the "business as usual" techniques still suffice?

Production visualization tools use "pure parallelism" to process data. [Diagram: a parallelized visualization data flow network. The parallel simulation code writes pieces of data (P0-P9) to disk; each visualization processor (Processor 0, 1, 2) reads its share of the pieces and runs them through a read, process, render pipeline.]
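
A minimal sketch of that read/process/render structure, assuming MPI via mpi4py; read_piece, process, and render here are illustrative stubs, not the API of any production tool:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

pieces = ["piece_%04d" % i for i in range(10)]   # P0..P9 "on disk"

def read_piece(name):       # stub: a real tool reads this piece from disk
    return np.random.rand(64, 64, 64)

def process(block):         # stub "process" stage, e.g., a contour or slice
    return block > 0.5

def render(geometry):       # stub "render" stage producing a tiny local result
    return geometry.sum()

# Each processor handles its share of the pieces: read -> process -> render.
local = [render(process(read_piece(p))) for p in pieces[rank::size]]

# A real pipeline composites per-processor images; here we just combine numbers.
result = comm.reduce(sum(local), op=MPI.SUM, root=0)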

Pure parallelism: pros and cons. Pros: easy to implement. Cons: requires a large amount of primary memory; requires large I/O capabilities → requires big machines.

Pure parallelism performance is based on the number of bytes to process and on I/O rates. The amount of data to visualize is typically O(total memory), and vis is almost always >50% I/O, sometimes 98% I/O. Two big factors: how much data you have to read, and how fast you can read it → relative I/O (the ratio of total memory to I/O rate) is key. [Chart: FLOPs, memory, and I/O for today's machine vs. tomorrow's machine.]

Anecdotal evidence: relative I/O is getting slower. Time to write memory to disk:

Machine name   Main memory   I/O rate   Time to write memory to disk
ASC purple     49.0 TB       140 GB/s   5.8 min
BGL-init       32.0 TB       24 GB/s    22.2 min
BGL-cur        69.0 TB       30 GB/s    38.3 min
Sequoia        ??            ??         >>40 min
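
The last column follows directly from the first two; a quick check in Python (numbers copied from the table, nothing new):

machines = {                 # name: (main memory in TB, I/O rate in GB/s)
    "ASC purple": (49.0, 140.0),
    "BGL-init":   (32.0,  24.0),
    "BGL-cur":    (69.0,  30.0),
}
for name, (mem_tb, io_gbs) in machines.items():
    minutes = mem_tb * 1000.0 / io_gbs / 60.0   # TB -> GB, then seconds -> minutes
    print("%-10s %.1f min" % (name, minutes))   # prints 5.8, 22.2, 38.3 as above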

Why is relative I/O getting slower? “I/O doesn’t pay the bills” And I/O is becoming a dominant cost in the overall supercomputer procurement. Simulation codes aren’t as exposed. And will be less exposed with proposed future architectures.

Recent runs of trillion cell data sets provide further evidence that I/O dominates. Weak scaling study, ~62.5M cells/core:

#cores     Problem Size   Type        Machine
8K         0.5 TZ         AIX         Purple
16K        1 TZ           Sun Linux   Ranger
16K        1 TZ           Linux       Juno
32K        2 TZ           Cray XT5    JaguarPF
64K        4 TZ           BG/P        Dawn
16K, 32K   1 TZ, 2 TZ     Cray XT4    Franklin

Example runs of 2T cells on 32K procs (on Jaguar and on Franklin): approx. I/O time 2-5 minutes, approx. processing time 10 seconds.

Pure parallelism is not well suited for the petascale. Emerging problem: pure parallelism emphasizes I/O and memory, and pure parallelism is the dominant processing paradigm for production visualization software. Solution? … there are "smart techniques" that de-emphasize memory and I/O: data subsetting, multi-resolution, out-of-core, and in situ.

Data subsetting eliminates pieces that don't contribute to the final picture. [Diagram: the same parallelized data flow network, but each processor reads only those pieces of data on disk that can contribute to the final picture.]
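
A minimal sketch of the idea, assuming per-piece metadata (the min/max of the field in each piece) can be consulted without reading the piece itself; the names and the fake metadata are illustrative:

import numpy as np

isovalue = 0.7   # we only want the contour at this value

# Fake metadata: (min, max) of the field in each piece, known without reading it.
piece_ranges = {"piece_%04d" % i: tuple(sorted(np.random.rand(2))) for i in range(10)}

def contributes(fmin, fmax, value):
    # A piece can only contain the contour if the isovalue lies inside its range.
    return fmin <= value <= fmax

to_read = [p for p, (fmin, fmax) in piece_ranges.items()
           if contributes(fmin, fmax, isovalue)]
print("reading %d of %d pieces" % (len(to_read), len(piece_ranges)))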

Data subsetting: pros and cons. Pros: less data to process (less I/O, less memory). Cons: the extent of the optimization is data dependent, and it is only applicable to some algorithms.

Multi-resolution techniques use coarse representations, then refine. [Diagram: the same parallelized data flow network, but each processor reads only a coarse representation of the data (a single piece, e.g., P2 or P4) instead of all pieces at full resolution.]
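
A minimal sketch of coarse-then-refine processing, assuming the data was written as a precomputed hierarchy of levels (level 0 = coarsest); read_level and the sizes are illustrative:

import numpy as np

def read_level(level):
    # Stub for reading one level of a precomputed hierarchy from disk;
    # each finer level is 8x larger in 3D.
    n = 32 * 2**level
    return np.random.rand(n, n, n)

coarse = read_level(0)          # cheap: look at the coarse version first
interesting = coarse > 0.99     # decide whether detail is actually needed

if interesting.any():
    fine = read_level(1)        # refine only when the coarse pass says so
    print("refined to", fine.shape)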

Multi-resolution: pros and cons. Pros: avoids I/O & memory requirements. Cons: is it meaningful to process a simplified version of the data?

Out-of-core iterates pieces of data through the pipeline one at a time. [Diagram: the same parallelized data flow network, but each processor streams its pieces of data from disk through the read, process, render pipeline one piece at a time.]
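
A minimal sketch of out-of-core (streaming) processing: only one piece is resident in memory at a time, so the job fits on a small machine. read_piece is an illustrative stub:

import numpy as np

def read_piece(i):                 # stub for reading piece i from disk
    return np.random.rand(64, 64, 64)

partial = []
for i in range(10):                # P0..P9, streamed through the same pipeline
    block = read_piece(i)          # only this piece is in memory right now
    partial.append(int((block > 0.5).sum()))   # process; keep only a small result
    del block                      # release the piece before reading the next one

print(sum(partial))                # combine the small per-piece results at the end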

Out-of-core: pros and cons. Pros: lower requirement for primary memory; doesn't require big machines. Cons: still paying large I/O costs (slow!).

In situ processing does visualization as part of the simulation. [Diagram: the visualization data flow network runs inside the parallel simulation code; each processor gets access to its in-memory data, then processes and renders it, with no pieces of data written to disk.]
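
A minimal sketch of the coupling, with the analysis routine called from inside the simulation's time-step loop; everything here is an illustrative stand-in (real couplings use libraries such as VisIt's libsim), not any particular simulation's API:

import numpy as np

def visualize(step, field):
    # Runs on the simulation's own processors and memory, so it must be frugal.
    print("step %d: max = %.3f" % (step, field.max()))

field = np.random.rand(128, 128, 128)     # stand-in for the simulation's state
for step in range(100):
    field = np.roll(field, 1, axis=0)     # stand-in for one simulation time step
    if step % 10 == 0:                    # analyze every N steps, chosen a priori
        visualize(step, field)            # direct access to in-memory data; no I/O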

In situ: pros and cons. Pros: no I/O, and lots of compute power available. Cons: very memory constrained; many operations not possible; once the simulation has advanced, you cannot go back and analyze it; the user must know what to look for a priori; and it is an expensive resource to hold hostage!

Summary of techniques and strategies: pure parallelism can be used for anything, but it takes a lot of resources; smart techniques can only be used situationally. Strategy #1 (do nothing): stick with pure parallelism and live with high machine costs & I/O wait times. Other strategies? Assumption: we can't afford massive dedicated clusters for visualization; we can fall back on the supercomputer, but only rarely.

Now we know the tools … what problem are we trying to solve? Three primary use cases: exploration (examples: scientific discovery, debugging), confirmation (examples: data analysis, images / movies, comparison), and communication (examples: data analysis, images / movies).

Notional decision process (for exploration, confirmation, and communication). [Flowchart mapping a few questions to a technique: Do you know what you want to do a priori? Do you need all data at full resolution? Do operations require all the data? Do algorithms require all data in memory? Is interactivity required? The answers lead to in situ (data analysis & images / movies), multi-resolution (debugging & scientific discovery), data subsetting (comparison & data analysis), out-of-core (data analysis & images / movies), or pure parallelism (anything & esp. comparison).]
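
One plausible way to encode such a decision process in code (a sketch; the branch order here is an assumption, not a transcription of the original flowchart):

def choose_technique(know_goal_a_priori, need_full_resolution,
                     need_all_data, need_interactivity):
    if know_goal_a_priori:
        return "in situ"            # data analysis & images / movies
    if not need_full_resolution:
        return "multi-resolution"   # debugging & scientific discovery
    if not need_all_data:
        return "data subsetting"    # comparison & data analysis
    if need_interactivity:
        return "pure parallelism"   # anything, especially comparison
    return "out-of-core"            # data analysis & images / movies

print(choose_technique(False, True, True, False))   # -> "out-of-core"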

Alternate strategy: smart techniques. [Diagram: all visualization and analysis work flows first through the smart techniques (multi-resolution, data subsetting, in situ, out-of-core); do the remaining ~5% on the supercomputer.]

Difficult conversations in the future… Multi-resolution: Do you understand what a multi-resolution hierarchy should look like for your data? Who do you trust to generate it? Are you comfortable with your I/O routines generating these hierarchies while they write? How much overhead are you willing to tolerate on your dumps? 33+%? Willing to accept that your visualizations are not the “real” data?

Difficult conversations in the future… In situ: How much memory are you willing to give up for visualization? Will you be angry if the vis algorithms crash? Do you know what you want to generate a priori? Can you re-run simulations if necessary?

How Supercomputing Trends Will Change the Rules for Vis & Analysis. Future machines will not be well suited for pure parallelism because of its high I/O and memory costs. We won't be able to use pure parallelism alone anymore; we will need algorithms that work in multiple processing paradigms.