Visualization & The Exascale
Hank Childs, Lawrence Berkeley Lab & UC Davis
Workshop on Exascale Data Management, Analysis, and Visualization, Houston, TX, 2/22/11

Outline
- Standard operating procedure at the terascale/small petascale
- Visualization at the exascale

Disclaimer
- To provide context for non-vis people, I am presenting what I believe is a representative workflow for visualization at the Office of Science labs right now.
- I fully understand that many folks, including many folks in this room, are researching innovative techniques that differ from this workflow. Further, these techniques are often even being deployed in production environments.
- Regardless, I think this workflow represents X% of visualization work.

Basic workflow for production visualization
[Diagram: each simulation cycle checks whether the criteria for a restart dump or a vis dump are satisfied; dumps are written to disk, and the visualization program reads them back, acting as a post-processor.]
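As a minimal sketch of this workflow, consider the loop below. The intervals, file names, and toy "physics" are all illustrative assumptions, not any particular simulation code's conventions; the point is only the shape of the loop: advance, check the restart criterion, check the vis criterion.

```python
import numpy as np

# Sketch of the dump-criteria loop described above; intervals, file names,
# and the stand-in physics are illustrative assumptions.
RESTART_INTERVAL = 500   # cycles between full restart dumps
VIS_INTERVAL = 50        # cycles between (smaller) vis dumps

def advance(state):
    return state + 0.01 * np.roll(state, 1, axis=0)   # stand-in for physics

state = np.random.rand(64, 64, 64)
for cycle in range(2000):
    state = advance(state)
    if cycle % RESTART_INTERVAL == 0:
        np.save(f"restart_{cycle:06d}.npy", state)            # full state
    if cycle % VIS_INTERVAL == 0:
        np.save(f"vis_{cycle:06d}.npy", state[::4, ::4, ::4])  # reduced state
```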

Who, What, Where, When, Why, and How
- Why do we need visualization? Insight into a simulation … this is a critical activity.
- When do people run visualization programs? Whenever they want! (*) … will this be true at exascale?
- Where do visualization programs run? Either on the supercomputer or on a dedicated vis cluster.
- Who's running visualization programs? Simulation code consumers, simulation code developers (… will this change?), and visualization experts.

Important “What”: What do they want to do?
- Exploration: arrive at scientific insight; comparison; debugging!
- Confirmation: make sure the calculation is running smoothly.
- Communication: movies/images for PowerPoint; 1D curves for scientific papers.
These activities are different, and it is possible, if not likely, that different processing paradigms will be used to accomplish them at the exascale.

Important “How”: How do visualization programs work on big data?
[Diagram: a parallel simulation code writes pieces of data P0–P9 to disk; a parallel visualization program assigns the pieces to processors 0–2, each of which reads, processes, and renders its pieces.]
I refer to this technique as “pure parallelism”.
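To make the read/process/render pipeline concrete, here is a minimal sketch of pure parallelism, assuming mpi4py and numpy; the file layout is the hypothetical one from the earlier loop, and a max reduction stands in for the real process/render stages:

```python
from mpi4py import MPI
import numpy as np

# Pure parallelism sketch: every rank reads its share of the full-resolution
# pieces, processes them, and partial results are combined. File names and
# the "process" step are illustrative assumptions.
comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_pieces = 10
my_pieces = range(rank, n_pieces, size)         # round-robin assignment

local_max = -np.inf
for p in my_pieces:
    data = np.load(f"vis_piece_{p:03d}.npy")    # Read
    local_max = max(local_max, float(data.max()))  # Process (stand-in)

global_max = comm.reduce(local_max, op=MPI.MAX, root=0)  # combine results
if rank == 0:
    print("global max:", global_max)            # "Render" stand-in
```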

Pure parallelism
- Pure parallelism is data-level parallelism, but…
  - Multi-resolution can be data-level parallelism.
  - Out-of-core can be data-level parallelism.
- Pure parallelism: “brute force” … processing full-resolution data using data-level parallelism.
- Pros: easy to implement.
- Cons: requires large I/O capabilities; requires a large amount of primary memory → requires big machines.

Pure parallelism and today’s tools
- VisIt, ParaView, & EnSight primarily employ a pure parallelism + client-server strategy.
- All tools are working on advanced techniques as well.
- Of course, there’s lots more technology out there besides those three tools: parallel VTK, VAPOR, VISUS, and more…
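As an illustration of how such a tool is driven in data-parallel batch mode, here is a sketch of a ParaView script using its paraview.simple interface. Run under `mpirun ... pvbatch`, each rank processes its piece of the data; the Wavelet source stands in for a real file reader, and exact property names can vary across ParaView versions:

```python
# Sketch of a data-parallel ParaView batch script; run as e.g.
#   mpirun -np 8 pvbatch contour.py
# so each MPI rank reads/processes/renders its piece of the data.
from paraview.simple import *

src = Wavelet()                                  # stand-in for a real reader
contour = Contour(Input=src,
                  ContourBy=['POINTS', 'RTData'],
                  Isosurfaces=[157.0])
Show(contour)
Render()
SaveScreenshot('contour.png')
```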

Outline
- Standard operating procedure at the terascale/small petascale
- Visualization at the exascale
  - How will input data change?
  - Will life be different for our users?
  - Will there be vis clusters? What will they look like?
  - How will I/O trends impact visualization?
  - What are possible solutions for data movement problems?

How does increased computing power affect the data to be visualized?
- Large # of time steps
- Large ensembles
- High-res meshes
- Large # of variables / more physics
Your mileage may vary; some simulations produce a lot of data and some don’t.
Thanks!: Sean Ahern & Ken Joy

Vis clusters
- Today’s vis clusters are designed to be proportional to the “big iron.”
- The typical rule of thumb is that the little iron should have 10% of the memory of the big iron.
- (Most vis clusters have fallen short of the 10% rule for the last few years.)
- Exascale machine = $200M, ~1/3 of which is for memory; 10% of that memory budget puts the memory for a proposed vis cluster at $6.7M.
- I personally view it as unlikely that we will have 10%-sized vis clusters.
- There is an additional issue with getting data off the machine.
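The arithmetic behind that estimate, assuming the cost of memory scales linearly with its size:

```latex
\underbrace{\$200\,\mathrm{M}}_{\text{machine cost}}
\times \underbrace{\tfrac{1}{3}}_{\text{memory share}}
\times \underbrace{10\%}_{\text{vis-cluster rule of thumb}}
\approx \$6.7\,\mathrm{M}
```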

I/O and visualization
- Pure parallelism is almost always >50% I/O and sometimes 98% I/O.
- The amount of data to visualize is typically O(total memory).
[Table comparing FLOPs, memory, and I/O rates for a petascale machine vs. a projected “exascale machine”; the values did not survive extraction.]
Two big factors: ① how much data you have to read, ② how fast you can read it.
→ Relative I/O (the ratio of total memory to I/O bandwidth) is key.
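To see why that ratio is the key quantity: the time to write (or read back) a memory-sized dump is simply memory size over I/O bandwidth. The numbers below are illustrative, not measurements of any specific machine:

```latex
t_{\text{dump}} = \frac{M_{\text{total}}}{B_{\text{I/O}}},
\qquad \text{e.g.}\quad
\frac{1\,\mathrm{PB}}{1\,\mathrm{TB/s}} = 1000\,\mathrm{s} \approx 17\ \text{minutes per dump.}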

Trends in I/O
[Table: time to write full system memory, by machine and year — ASCI Red, ASCI Blue Pacific, ASCI White, ASCI Red Storm, ASCI Purple, Jaguar XT4, Roadrunner, Jaguar XT5, and “Exascale, ~2020: ????”; the years and times did not survive extraction.]
Thanks!: Dave Pugmire, ORNL

Why is relative I/O getting slower?
- I/O is quickly becoming a dominant cost in the overall supercomputer procurement, so machines are bought with proportionally less of it.
- Simulation codes aren’t as exposed, and they will be even less exposed with proposed future architectures.
→ We need to de-emphasize I/O in our visualization and analysis techniques.

Assertion: we are going to need a lot of solutions.
[Diagram: all visualization and analysis work, divided among multi-res, in situ, out-of-core, and data subsetting, with the remaining ~5% done on the supercomputer with pure parallelism.]
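Of these, data subsetting is the easiest to sketch: use cheap per-piece metadata to skip reading pieces that cannot contribute to the result, which directly de-emphasizes I/O. The metadata file layout below is an invented illustration:

```python
import numpy as np

# Data subsetting sketch for isocontouring: consult per-piece min/max
# metadata and read only pieces whose value range straddles the isovalue.
# The metadata layout and file names are invented for illustration.
ISOVALUE = 157.0

meta = np.load("piece_minmax.npy")        # shape (n_pieces, 2): [min, max]
relevant = [p for p, (lo, hi) in enumerate(meta) if lo <= ISOVALUE <= hi]

print(f"reading {len(relevant)} of {len(meta)} pieces")
for p in relevant:
    data = np.load(f"vis_piece_{p:03d}.npy")   # I/O only for useful pieces
    # ... contour `data` at ISOVALUE ...
```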

Where does visualization fit into the ecosystem?
- Data movement will be expensive on the exascale machine, so it is beneficial to process data in situ.
- Current thinking from exascale gurus: you’ll have 1000 threads on a node; some will do physics, some will provide services.
- Visualization could be a service in this system…
- … or visualization could be done on separate, nearby nodes dedicated to visualization/analysis/I/O/etc.
- … or a combination of the two.

Where does visualization fit into the ecosystem?
- Visualization could be a service in this system…
  - How do visualization routines interact with the simulation?
  - How do we write routines that can be re-used across many simulations?
  - Are we prepared to run on nodes with 1000-way parallelism, likely with no cache coherency?
  - Are our algorithms ready to run on O(1M) nodes?
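One common answer to the first two questions is a thin adaptor layer: the simulation hands its arrays to a generic interface, and registered vis routines run in situ every N cycles. This is in the spirit of VisIt’s libsim or ParaView Catalyst, but the sketch below does not reproduce either of their actual APIs; every name in it is hypothetical:

```python
import numpy as np

# Hypothetical in situ coupling sketch: the simulation exposes its fields
# through a small adaptor, and registered vis routines (reusable across
# simulations) run in the simulation's memory every N cycles.
class VisAdaptor:
    def __init__(self, every=50):
        self.every = every
        self.routines = []

    def register(self, fn):
        self.routines.append(fn)

    def maybe_visualize(self, cycle, fields):
        if cycle % self.every == 0:
            for fn in self.routines:
                fn(cycle, fields)      # no I/O: operates on live data

def report_extrema(cycle, fields):
    d = fields["density"]
    print(f"cycle {cycle}: density in [{d.min():.3f}, {d.max():.3f}]")

adaptor = VisAdaptor(every=10)
adaptor.register(report_extrema)

density = np.random.rand(32, 32, 32)
for cycle in range(100):
    density *= 0.999                   # stand-in for physics
    adaptor.maybe_visualize(cycle, {"density": density})
```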

Where does visualization fit into the ecosystem?
- … or visualization could be done on separate, nearby nodes dedicated to visualization/analysis/I/O/etc.
  - OK, where exactly?
  - There are likely still to be issues with running on the accelerator.
  - I personally think it is very unlikely we will need to run visualization algorithms at billion-way concurrency.

Possible in situ visualization scenarios
[Diagram of candidate configurations, each showing physics packages #1…#n plus services on a node:
- Vis as a service on every node, running on an accelerator similar to the hardware on the rest of the exascale machine (e.g. a GPU);
- Vis on one of many nodes dedicated to vis/analysis/I/O … or maybe that node is a high-memory quad-core running Linux!
- … or maybe the data is reduced and sent to specialized vis & analysis resources off the machine!
- … and likely many more configurations.]
We will possibly need to run on: the accelerator in a lightweight way, the accelerator in a heavyweight way, and/or a vis cluster (?). And data movement is a big issue…

Reducing cells to results (e.g. pixels or numbers) can be hard.
- Must reduce data every step of the way.
- Example: contour + normals + render. It is important that you end up with less data in pixels than you had in cells. (*)
- Easier example: synthetic diagnostics.
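A rough sketch of the contour + normals + render example, assuming scikit-image for marching cubes and reducing the render stage to a byte count, makes it easy to check whether each stage actually shrinks the data:

```python
import numpy as np
from skimage import measure    # assumes scikit-image is installed

# "Contour + normals + render" viewed as data reduction: each stage should
# carry fewer bytes than the one before it -- but that is not automatic.
vol = np.random.rand(128, 128, 128).astype(np.float32)

verts, faces, normals, _ = measure.marching_cubes(vol, level=0.9)
surface_bytes = verts.nbytes + faces.nbytes + normals.nbytes

image = np.zeros((1024, 1024, 3), dtype=np.uint8)   # stand-in for rendering

print(f"cells:   {vol.nbytes:>12,} bytes")
print(f"surface: {surface_bytes:>12,} bytes")
print(f"pixels:  {image.nbytes:>12,} bytes")
# A very wrinkly contour can make "surface" exceed "cells" -- the slide's
# point (*): you must verify the reduction at every step.
```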

In situ
- In situ will almost certainly play a large role in the portfolio … but it is not a panacea.
- Pros: no I/O, and the opportunity to minimize the data moved.
- Cons:
  - Very memory constrained.
  - Some operations are not possible: once the simulation has advanced, you cannot go back and analyze it.
  - The user must know what to look for a priori.
  - An expensive resource to hold hostage!

Will life be different for our users?

Multi-resolution techniques
- Pros:
  - Drastically reduce I/O & memory requirements.
  - Confidence in pictures; a multi-res hierarchy addresses the “many cells to one pixel” issue.
- Cons:
  - Not always meaningful to process a simplified version of the data.
  - How do we generate hierarchical representations on the exascale machine? What costs do they incur (data movement costs, storage costs)?
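A minimal sketch of building such a hierarchy, using simple 2×2×2 block averaging with numpy; production multi-res systems (e.g. wavelet or streaming layouts) are considerably more elaborate, so this only illustrates the shape of the data structure:

```python
import numpy as np

# Minimal multi-resolution hierarchy: repeatedly average 2x2x2 blocks.
def coarsen(a):
    return a.reshape(a.shape[0] // 2, 2,
                     a.shape[1] // 2, 2,
                     a.shape[2] // 2, 2).mean(axis=(1, 3, 5))

def build_hierarchy(data, levels):
    pyramid = [data]
    for _ in range(levels):
        pyramid.append(coarsen(pyramid[-1]))
    return pyramid           # pyramid[0] = full res, pyramid[-1] = coarsest

vol = np.random.rand(256, 256, 256).astype(np.float32)
for level, a in enumerate(build_hierarchy(vol, 4)):
    print(f"level {level}: shape {a.shape}, {a.nbytes / 2**20:.1f} MiB")
```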

Do we have our use cases covered?
- Three primary use cases: Exploration, Confirmation, Communication.
[Diagram maps the multi-res and in situ paradigms onto these use cases, with examples including scientific discovery, debugging, data analysis, comparison, and images/movies.]

Under-represented topics in this talk
- We will have quintillions of data points … how do we meaningfully represent them with millions of pixels?
- We are unusual: we are data consumers, not data producers, and the exascale machine is being designed for data producers.
- Data is going to be different at the exascale: ensembles, multi-physics, etc.
- The outputs of visualization software will be different.

Under-represented topics in this talk, pt 2
- We have an extreme software engineering challenge: write once, use many.
- Our community has done little work in hybrid parallelism, and it is critical to our future (see the sketch below).
- Accelerators on the exascale machine are likely not to have cache coherency.
- Do all of our algorithms work in a GPU-type setting?
- We have a huge investment in CPU software. What now?
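For concreteness, hybrid parallelism combines distributed-memory parallelism across nodes with shared-memory parallelism within a node. A minimal sketch, assuming mpi4py plus a thread pool (numpy releases the GIL in the heavy arithmetic, so the threads genuinely overlap):

```python
from concurrent.futures import ThreadPoolExecutor
from mpi4py import MPI
import numpy as np

# Hybrid parallelism sketch: MPI across nodes, threads within a node.
# Each rank owns several data blocks; a thread pool processes them
# concurrently, and an MPI reduction combines the per-rank results.
comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

blocks = [np.random.rand(128, 128, 128) for _ in range(4)]  # this rank's data

def process(block):
    return float(np.abs(np.gradient(block)[0]).sum())  # stand-in analysis

with ThreadPoolExecutor(max_workers=4) as pool:
    local = sum(pool.map(process, blocks))              # intra-node threads

total = comm.reduce(local, op=MPI.SUM, root=0)          # inter-node MPI
if rank == 0:
    print("global result:", total)
```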

Summary
- The exascale machine will almost certainly lead to a paradigm shift in the way visualization programs process data.
- Where to process data and what data to move will be a central issue.
- Our algorithms will need to run:
  - in a lightweight manner (e.g. on a few of the thousands of cores on the accelerator),
  - and/or using all the threads on an accelerator,
  - and/or on machines that may not have accelerators at all.

Backup slides

Summary: exascale impacts
- I/O will be slow.
  - Flash drives will hurt the vis community.
- Massive number of threads on a node; unclear if there will be cache coherency.
  - It may very well look like GPU programming.
- Data movement is a huge issue.
  - Likely outcome: we get a few threads of the GPU to do our job.
- We must fit into the ecosystem.
  - Trade-offs on where to do vis.