
1 On the Path to Petascale: Top Challenges to Scientific Discovery. Presented by Scott A. Klasky, NCCS Scientific Computing End-to-End Task Lead.

2 1. Code Performance
- From 2004 to 2008, the computing power available to codes like GTC will go up three orders of magnitude!
- Two paths to petascale computing for most simulations:
  - More physics; larger problems.
  - Code coupling.
- My personal definition of leadership-class computing: "A simulation that runs on >50% of the cores for >10 hours."
- One "small" simulation will cost $38,000 on a petaflop computer (see the cost sketch below).
- Science scales with processors: XGC and GTC fusion simulations will run on 80% of the cores for 80 hours, roughly $400,000 per simulation.
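The cost figures above follow from a simple machine-hour model. A minimal back-of-envelope sketch in Python; the full-machine hourly rate is an assumption inferred from the slide's own $38,000 figure, not an actual NCCS charge rate:

```python
# Back-of-envelope cost model for leadership-class runs.
# The $/machine-hour rate is inferred from the slide's "$38,000 for
# >50% of cores over >10 hours" figure; it is illustrative only.

FULL_MACHINE_RATE = 38_000 / (0.5 * 10)  # ~$7,600 per full-machine-hour

def run_cost(core_fraction: float, hours: float) -> float:
    """Estimated cost of a run using `core_fraction` of the machine."""
    return core_fraction * hours * FULL_MACHINE_RATE

print(f"'Small' run (50% x 10 h): ${run_cost(0.5, 10):>10,.0f}")
print(f"XGC/GTC run (80% x 80 h): ${run_cost(0.8, 80):>10,.0f}")
# -> $38,000 and ~$490,000; the slide quotes ~$400,000, which is
#    consistent to within the precision of such estimates.
```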

3 Data Generated
- MTTF (mean time to failure) will be ~2 days. Restart files contain the critical information needed to replay the simulation from different times.
- A typical restart is 1/10 of memory, dumped every hour. (The big three apps support this claim.)
- Analysis files are dumped every physical timestep, typically every 5 minutes of simulation time. Sizes vary; for ITER-scale simulations we estimate roughly 1 GB per 5 minutes.
- Demand that I/O take < 5% of the calculation.
- The total simulation will potentially produce ~1,280 TB of restarts plus 960 GB of analysis data (80 hours at the above rates).
- Synchronous I/O would need > (16*1024 + 12) GB / (3600 s * 0.05) ≈ 91 GB/s.
- Asynchronous I/O is needed! (The big three apps, combustion, fusion, and astro, allow buffering.) It reduces the required rate to (16*1024 + 12) GB / 3600 s ≈ 4.5 GB/s, with lower overhead. (The arithmetic is worked below.)
- Get the data off the HPC machine and over to another system! Produce HDF5 files on that other system; it is too expensive to do on the HPC system.
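The 91 GB/s and 4.5 GB/s figures fall out of the dump sizes and the 5% budget directly. A minimal sketch of the arithmetic, using the slide's own inputs (a 16 TB restart dump every hour, 1 GB of analysis output every 5 minutes):

```python
# Reproduces the slide's I/O bandwidth arithmetic for an ITER-scale run.
restart_gb_per_hour  = 16 * 1024   # 16 TB restart dump, hourly (1/10 of ~160 TB memory)
analysis_gb_per_hour = 12          # 1 GB every 5 minutes
seconds_per_hour     = 3600
io_budget            = 0.05        # demand I/O < 5% of wall time

total_gb = restart_gb_per_hour + analysis_gb_per_hour

# Synchronous I/O: all data must drain inside the 5% time budget.
sync_rate = total_gb / (seconds_per_hour * io_budget)
# Asynchronous I/O: writes overlap compute, so the full hour is available.
async_rate = total_gb / seconds_per_hour

print(f"Synchronous  : {sync_rate:5.1f} GB/s")   # ~91 GB/s
print(f"Asynchronous : {async_rate:5.1f} GB/s")  # ~4.6 GB/s
```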

4 Workflow automation is desperately needed (with high-speed data-in-transit techniques).
- Need to integrate autonomics into workflows.
- Need to make it easy for the scientists.
- Need to make it fault tolerant and robust (a sketch of one such step follows).
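One flavor of the automation the slide calls for is sketched below: watch for new restart files and ship them off the HPC machine automatically, retrying on transient failures instead of waking a scientist. The paths, host names, file pattern, and use of bbcp are illustrative assumptions, not a real NCCS workflow:

```python
# Minimal sketch of a fault-tolerant "move data off the HPC system" step.
import subprocess, time
from pathlib import Path

WATCH_DIR = Path("/lustre/scratch/run42")   # hypothetical run directory
DEST      = "analysis-cluster:/data/run42"  # hypothetical destination

def ship(path: Path, retries: int = 3) -> bool:
    """Transfer one file, retrying so transient failures need no human."""
    for attempt in range(1, retries + 1):
        rc = subprocess.call(["bbcp", str(path), DEST])
        if rc == 0:
            return True
        time.sleep(60 * attempt)  # back off before retrying
    return False  # escalate to the scientist instead of failing silently

seen: set[Path] = set()
while True:
    for f in sorted(WATCH_DIR.glob("restart_*.bin")):  # hypothetical naming
        if f not in seen and ship(f):
            seen.add(f)
    time.sleep(300)  # poll every 5 minutes
```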

5 A few days in the life of a simulation scientist. Day 1, morning:
- 8:00 AM: Get coffee; check whether the jobs are running.
  - ssh into jaguar.ccs.ornl.gov (job 1).
  - ssh into seaborg.nersc.gov (job 2). This one is running, yay!
  - Run gnuplot to see if the run on seaborg is going OK. Looks OK.
- 9:00 AM: Look at data from an old run for post-processing.
  - Legacy code (IDL, Matlab) to analyze most of the data.
  - Visualize some of the data to see if there is anything interesting.
  - Is my job running on jaguar? I submitted that 4K-processor job 2 days ago!
- 10:00 AM: scp some files from seaborg to my local cluster. Luckily I have only 10 files (at only 1 GB/file).
- 10:30 AM: The first file appears on my local machine for analysis. Visualize the data with Matlab; seems OK.
- 11:30 AM: See that the second file had trouble coming over. scp the files over again... d'oh!

6 Day 1, evening:
- 1:00 PM: Look at the output from the second file.
  - Oops, I had a mistake in my input parameters.
  - ssh into seaborg; kill the job. Emacs the input; resubmit the job.
  - ssh into jaguar to check status. Cool, it's running.
  - bbcp 2 files over to my local machine (8 GB/file).
  - gnuplot the data. This looks OK too, but I still need more information.
- 1:30 PM: The files are on my cluster.
  - Run Matlab on the HDF5 output files. Looks good.
  - Write down some information about the run in my notebook.
  - Visualize some of the data. All looks good.
  - Go to meetings.
- 4:00 PM: Return from meetings.
  - ssh into jaguar. Run gnuplot. Still looks good.
  - ssh into seaborg. My job still isn't running...
- 8:00 PM: Are my jobs running?
  - ssh into jaguar. Run gnuplot. Still looks good.
  - ssh into seaborg. Cool, my job is running. Run gnuplot. Looks good this time!

7 And later:
- 4:00 AM: Yawn... is my job on jaguar done?
  - ssh into jaguar. Cool, the job is finished. Start bbcp'ing files over to my work machine (2 TB of data).
- 8:00 AM: @#$%! bbcp is having trouble. Resubmit some of the bbcp transfers from jaguar to my local cluster.
- 8:00 AM (next day): Oops, I still need to get the remaining 200 GB of data over to my machine.
- 3:00 PM: My data is finally here!
  - Run Matlab. Run EnSight. Oops... something's wrong! Where did that instability come from?
- 6:00 PM: Finish screaming!
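Much of the pain in this story is detectable by machine. For example, the truncated transfers above are exactly what an end-to-end checksum manifest catches automatically. A minimal sketch, assuming the source machine writes an md5 manifest alongside the data; the manifest format and file names are hypothetical:

```python
# Verify transferred files against a manifest written on the source side,
# so a bad transfer is flagged immediately rather than at 3:00 PM.
import hashlib

def md5(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

with open("MANIFEST.md5") as m:  # lines like "<hash>  <filename>"
    for line in m:
        expected, name = line.split()
        status = "OK" if md5(name) == expected else "RETRANSFER"
        print(f"{name}: {status}")
```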

8 Need metadata integrated into the high-performance I/O, and integrated into simulation monitoring.
- Typical monitoring:
  - Look at volume-averaged quantities; at 4 key times this quantity looks good.
  - The code had 1 error that did not appear in the typical ASCII output used to generate this graph.
  - Typically users run gnuplot/grace to monitor output.
- More advanced monitoring:
  - Every 5 seconds, move 600 MB and process the data. Really need to use an FFT for the 3D data, then process the fields plus the particles.
  - Every 50 seconds (10 timesteps), move and process 8 GB of particle data, 1/100 of the 30 billion particles.
  - Demand low overhead, <5%! (The implied rates are worked out below.)
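A quick consistency check on those monitoring numbers: the sketch below computes the streaming rates and the per-particle payload they imply, using only the figures from the slide:

```python
# Streaming rates implied by the slide's in-situ monitoring figures.
fields_mb, fields_window_s  = 600, 5   # 600 MB of field data every 5 s
particles_gb, part_window_s = 8, 50    # 8 GB of particles every 50 s (10 steps)
total_particles = 30e9
sample_fraction = 1 / 100

print(f"Field stream   : {fields_mb / fields_window_s:.0f} MB/s")          # 120 MB/s
print(f"Particle stream: {particles_gb * 1024 / part_window_s:.0f} MB/s")  # ~164 MB/s

sampled = total_particles * sample_fraction
print(f"Bytes/particle : {particles_gb * 1e9 / sampled:.0f}")  # ~27 bytes each
# With asynchronous data movement, a few hundred MB/s of monitoring
# traffic can plausibly stay inside the <5% overhead the slide demands.
```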

9 Parallel Data Analysis
- Most applications use serial (scalar) data analysis: IDL, Matlab, NCAR Graphics.
- Need techniques such as PCA (see the sketch after the next slide).
- Need help, since data-analysis code is written quickly and changed often; there are no hardened versions... and maybe there never will be.

10 Statistical Decomposition of Time-Varying Simulation Data
- Transform to reduce non-linearity in the distribution (often density-based).
- PCA computed via the SVD (or ICA, FA, etc.).
- Construction of component movies.
- Interpretation of spatial, time, and movie components.
- Pairs of equal singular values indicate periodic motion.
- Example: ETG GTC simulation data (Z. Lin, UCI, and S. Klasky, ORNL); the decomposition shows transitions between wave components in time.
- Contact: G. Ostrouchov, ostrouchovg@ornl.gov
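For concreteness, the core of this method fits in a few lines of numpy. A minimal sketch on synthetic data, with a traveling wave standing in for the ETG GTC fields (the real pipeline's transforms and movie construction are omitted), showing the paired singular values that signal periodic motion:

```python
# PCA via the SVD of a (space x time) data matrix, as on the slide.
import numpy as np

rng = np.random.default_rng(0)
nx, nt = 512, 200                          # spatial points x time steps
t = np.linspace(0, 20 * np.pi, nt)
x = np.linspace(0, 2 * np.pi, nx)[:, None]

# A traveling wave is rank 2: sin(3x - t) = sin(3x)cos(t) - cos(3x)sin(t),
# so it produces a PAIR of (nearly) equal singular values.
data = np.sin(3 * x - t) + 0.1 * rng.standard_normal((nx, nt))

data -= data.mean(axis=1, keepdims=True)   # center each spatial point
U, s, Vt = np.linalg.svd(data, full_matrices=False)

print("Leading singular values:", np.round(s[:4], 1))
# The first two are nearly equal: the signature of periodic motion.
# U[:, k] are spatial components; Vt[k] are their time traces, which can
# be rendered as the "component movies" mentioned on the slide.
```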

11 New Visualization Challenges
- Finding the needle in the haystack: feature identification and tracking!
- Analysis of 5D+time phase space with 1x10^12 particles!
- Real-time visualization of codes during execution.
- Visualization for debugging.

12 Where is my data?
- ORNL? NERSC? HPSS (at NERSC and ORNL)? The local cluster? My laptop?
- We need to keep track of multiple copies.
- We need to query the data: query-based visualization methods.
- We don't want to distinguish between different disks and tapes. (A toy replica catalog is sketched below.)
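A toy sketch of what such a catalog could look like: logical files with multiple physical replicas, queryable by run and timestep without caring whether a copy lives on disk or tape. The schema, sites, and URIs are hypothetical:

```python
# Toy replica catalog: one logical file, many physical copies.
from dataclasses import dataclass, field

@dataclass
class LogicalFile:
    name: str
    run_id: str
    timestep: int
    replicas: list[str] = field(default_factory=list)  # site URIs

catalog = [
    LogicalFile("restart_0100.bin", "gtc_run42", 100,
                ["hpss://ornl/...", "lustre://jaguar/...", "file://cluster/..."]),
    LogicalFile("analysis_0100.h5", "gtc_run42", 100,
                ["hpss://nersc/...", "file://laptop/..."]),
]

# "Where is my data?" becomes a query instead of a guessing game:
for f in catalog:
    if f.run_id == "gtc_run42" and f.timestep == 100:
        print(f.name, "->", f.replicas[0])  # any replica works; location is hidden
```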

