China-US Software Workshop, March 6, 2012
Scott Klasky, Data Science Group Leader, Computer Science and Mathematics Research Division, ORNL
Managed by UT-Battelle for the Department of Energy

Remembering my past
Sorry, but I was a relativist a long, long time ago. NSF funded the Binary Black Hole Grand Challenge (1993). Universities: Texas, UIUC, UNC, Penn State, Cornell, NWU, Syracuse, U. Pittsburgh.

The past, but with the same issues
R. Matzner, …php?speaker=Matzner

Some of my active projects
- DOE ASCR, Runtime Staging: ORNL, Georgia Tech, NCSU, LBNL
- DOE ASCR, Combustion Co-Design (ExaCT): LBNL, LLNL, LANL, NREL, ORNL, SNL, Georgia Tech, Rutgers, Stanford, U. Texas, U. Utah
- DOE ASCR, SDAV: LBNL, ANL, LANL, ORNL, UC Davis, U. Utah, Northwestern, Kitware, SNL, Rutgers, Georgia Tech, OSU
- DOE ASCR/FES, Partnership for Edge Physics Simulation (EPSI): PPPL, ORNL, Brown, U. Colorado, MIT, UCSD, Rutgers, U. Texas, Lehigh, Caltech, LBNL, RPI, NCSU
- DOE FES, SciDAC Center for Nonlinear Simulation of Energetic Particles in Burning Plasmas: PPPL, U. Texas, U. Colorado, ORNL
- DOE FES, SciDAC GSEP: UC Irvine, ORNL, General Atomics, LLNL
- DOE OLCF: ORNL
- NSF, Remote Data and Visualization: UTK, LBNL, UW, NCSA
- NSF EAGER, An Application Driven I/O Optimization Approach for PetaScale Systems and Scientific Discoveries: UTK
- NSF G8, Exascale Software Applications: Fusion Energy: PPPL, U. Edinburgh, CEA (France), Juelich, Garching, Tsukuba, Keldysh (Russia)
- NASA ROSES, An Elastic Parallel I/O Framework for Computational Climate Modeling: Auburn, NASA, ORNL

Scientific Data Group at ORNL

Top reasons why I love collaboration
- I love spending my time working with a diverse set of scientists
- I like working on complex problems
- I like exchanging ideas in order to grow
- I want to work on large, complex problems that require many researchers working together to solve them
- Building sustainable software is tough, and I want to …

ADIOS
The goal was to create a framework for I/O processing that would enable us to deal with:
- System and application complexity
- Rapidly changing requirements
- Evolving target platforms and diverse teams
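To make the framework idea concrete, here is a minimal write-path sketch in the style of the ADIOS 1.x C API. The group name, variable names, and XML file name are invented for illustration, and exact signatures differ slightly between ADIOS releases, so treat this as a sketch rather than a verbatim recipe: the application only names a group and its variables, while the transport method is selected in the external XML file.

```c
/* Sketch of an ADIOS 1.x style write path. The transport method (POSIX, MPI,
 * staging, ...) is chosen in the XML configuration, so it can be changed
 * without touching the simulation code. Signatures follow the ADIOS 1.x C API
 * and may vary slightly by release (older releases take only the XML name in
 * adios_init). */
#include <mpi.h>
#include <stdint.h>
#include "adios.h"

void checkpoint(MPI_Comm comm, int rank, int NX, double *temperature)
{
    int64_t  fd;
    uint64_t group_size, total_size;

    adios_init("grapes_io.xml", comm);           /* hypothetical config file name */
    adios_open(&fd, "restart", "restart.bp", "w", comm);

    group_size = sizeof(int) + (uint64_t)NX * sizeof(double);
    adios_group_size(fd, group_size, &total_size);

    adios_write(fd, "NX", &NX);                  /* names match the <var> entries in the XML */
    adios_write(fd, "temperature", temperature);

    adios_close(fd);                             /* the selected method performs the transport here */
    adios_finalize(rank);
}
```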

ADIOS involves collaboration
- The idea was to allow different groups to create different I/O methods that could 'plug' into our framework (see the sketch below)
- Groups that have created ADIOS methods include ORNL, Georgia Tech, Sandia, Rutgers, NCSU, and Auburn
- Islands of performance on different machines mean there is never one 'best' solution for all codes
- New applications (such as GRAPES and GEOS-5) allow new methods to evolve, sometimes just for one code on one platform, and other times as ideas that can be shared
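The plug-in idea can be illustrated with a hypothetical method table: each contributed I/O method registers the same small set of callbacks, and the framework dispatches to whichever method the configuration selects. The struct and function names below are invented for illustration and are not ADIOS's actual internal interface.

```c
/* Hypothetical plug-in interface showing how independently developed I/O
 * methods can register with a common framework. Names are illustrative only. */
#include <stdint.h>
#include <string.h>

typedef struct io_method {
    const char *name;   /* e.g. "POSIX", "MPI_AGGREGATE", "STAGING" */
    int (*open)  (void **state, const char *file, const char *mode);
    int (*write) (void  *state, const char *var, const void *data, uint64_t nbytes);
    int (*close) (void  *state);
} io_method;

#define MAX_METHODS 16
static io_method registry[MAX_METHODS];
static int       nmethods = 0;

/* Called by each group's plug-in at start-up. */
int io_register_method(const io_method *m)
{
    if (nmethods >= MAX_METHODS) return -1;
    registry[nmethods++] = *m;
    return 0;
}

/* The framework picks a method by the name given in the configuration,
 * so an application can switch transports without code changes. */
const io_method *io_select_method(const char *name)
{
    for (int i = 0; i < nmethods; i++)
        if (strcmp(registry[i].name, name) == 0)
            return &registry[i];
    return NULL;
}
```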

ADIOS collaboration

What do I want to make collaboration easy?
- I don't care about clouds, grids, HPC, or exascale; I do care about getting science done efficiently
- We need to make it easy to:
  - Share data
  - Share codes
  - Give credit, without my having to know who did what to advance my science
  - Use other codes, tools, and technologies to develop more advanced codes
- It must be easier than RTFM
- The system needs to decide what to move, how to move it, and where the information is
- I want to build our research and development on the work of others

Need to deal with collaborations gone bad
- I have had several incidents where "collaborators" became competitors
- Worry about IP being taken and not referenced
- Worry about data being used in the wrong context
- Without a record of where an idea or dataset came from, people are afraid to collaborate
(bobheske.wordpress.com)

Why now?
- Science has gotten very complex
- Science teams are getting more complex
- Experiments have gotten complex: more diagnostics, larger teams, more complexity
- Computing hardware has gotten complex
- People often want to collaborate but find the technologies too limited, and they fear the unknown

What is GRAPES?
GRAPES (Global/Regional Assimilation and PrEdiction System) is developed by CMA.
[Workflow figure: 3D-Var data assimilation. GTS and ATOVS data go through preprocessing and QC; together with static data and the global 6h forecast field they form the background field for the analysis field, which initializes the GRAPES global and regional models. Model output (modelvar/postvar) is filtered and sent to the database and GrADS. The system runs a 6h assimilation cycle, with only about 2h allowed for a 10-day global prediction.]

Development plan of GRAPES in CMA
[Roadmap figure: GDAS/GFS system upgrades. Assimilation additions include NESDIS-ATOVS (more channels), EUmetCAST-ATOVS, GPS/COSMIC, FY3-ATOVS, FY2 track winds, QuikSCAT, and selected AIRS channels. The plan moves from the operational T639L60 3D-Var + model to GRAPES-global-3DVAR at 50 km and GRAPES GFS at 50 km (pre-operation), and then to GRAPES GFS at 25 km; after 2011 only the GRAPES model will be used. Higher resolution is a key point of future GRAPES development.]

Why I/O?
I/O dominates the run time of GRAPES above 2048 processes (25 km horizontal-resolution case on Tianhe-1A). Grapes_input and colm_init are the input functions; med_last_solve_io and med_before_solve_io are the output functions.

Typical I/O performance when using ADIOS
High writing performance: most codes achieve a >10X speedup over other I/O libraries
- S3D: 32 GB/s with 96K cores, 0.6% I/O overhead
- XGC1: 40 GB/s; SCEC: 30 GB/s
- GTC: 40 GB/s; GTS: 35 GB/s
- Chimera: 12X performance increase
- Ramgen: 50X performance increase

Details: I/O performance engineering of the Global/Regional Assimilation and PrEdiction System (GRAPES) code on supercomputers using the ADIOS framework
- GRAPES is increasing its resolution, so its I/O overhead must be reduced
- GRAPES will need to abstract I/O away from a file format and toward I/O services:
  - One I/O service will write GRIB2 files
  - Another I/O service will provide compression methods
  - Another I/O service will include analytics and visualization

Benefits to the ADIOS community
- More users = more sustainability
- More users = more developers
- Easy for us to create I/O skeletons for next-generation system designers

Skel
- Skel is a versatile tool for creating and managing skeletal I/O applications
- Skel generates source code, a makefile, and submission scripts
- The process is the same for all "ADIOSed" applications
- Measurements are consistent and output is presented in a standard way
- One tool allows us to benchmark I/O for many applications
[Workflow figure: grapes.xml → skel xml → grapes_skel.xml → skel params → grapes_params.xml → skel src / skel makefile / skel submit → source files, Makefile, and submit scripts → make → executables → make deploy → skel_grapes]
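To show what an I/O skeleton boils down to, here is a minimal, hypothetical benchmark in the spirit of what Skel generates: it reproduces only a write phase of a fixed size and reports aggregate bandwidth. A real Skel-generated skeleton is produced from the application's ADIOS XML and performs its writes through ADIOS; plain file-per-process POSIX I/O is used here only to keep the sketch self-contained, and the 64 MB payload is an assumed placeholder.

```c
/* Hypothetical skeletal I/O benchmark: write a fixed per-process payload and
 * report aggregate bandwidth, mimicking an application's output phase without
 * running the application itself. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const size_t nbytes = 64UL * 1024 * 1024;   /* assumed per-process payload size */
    char *buf = malloc(nbytes);
    memset(buf, rank & 0xff, nbytes);

    char fname[64];
    snprintf(fname, sizeof fname, "skel_out.%d", rank);

    MPI_Barrier(MPI_COMM_WORLD);                /* time the collective write phase only */
    double t0 = MPI_Wtime();

    FILE *f = fopen(fname, "wb");               /* file-per-process write */
    if (f != NULL) {
        fwrite(buf, 1, nbytes, f);
        fclose(f);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("aggregate write bandwidth: %.2f GB/s\n",
               (double)nprocs * nbytes / (t1 - t0) / 1e9);

    free(buf);
    MPI_Finalize();
    return 0;
}
```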

What are the key requirements for your collaboration (e.g., travel, student/researcher/developer exchange, workshops/tutorials)?
- Student exchange
  - Tsinghua University sends a student to UTK/ORNL (3 months/year)
  - Rutgers University sends a student to Tsinghua University (3 months/year)
- Senior researcher exchange
  - UTK/ORNL, Rutgers, and NCSU send senior researchers to Tsinghua University (1+ week, twice a year)
- Our group prepares tutorials for the Chinese community: full-day tutorials for each visit
- Each visit needs to give our researchers access to the HPC systems so we can optimize
- Computer time for the teams on all machines: we need to optimize routines together, and that is much easier when we have access to the machines
- 2 phone calls/month

Leveraging other funding sources
- NSF (EAGER proposal, RDAV proposal): work with climate codes, subsurface modeling, relativity, …
- NASA (ROSES proposal): work with the GEOS-5 climate code
- DOE/ASCR: research new techniques for I/O staging, co-design of hybrid staging, I/O support for SciDAC/INCITE codes
- DOE/FES: support I/O pipelines and multi-scale, multi-physics code coupling for fusion codes
- DOE/OLCF: support I/O and analytics on the OLCF for simulations that run at scale

What specific mechanisms need to be set up?
Need …

What are the metrics of success?
- GRAPES I/O overhead is dramatically reduced: a win for both teams
- ADIOS gains a new mechanism to output the GRIB2 format, which allows ADIOS to start talking to more teams doing weather modeling
- Research is performed that allows us to understand new RDMA networks, yielding a new understanding of how to optimize data movement on exotic architectures
- New methods in ADIOS minimize I/O in GRAPES and can help new codes
- New studies from Skel give hardware designers parameters to design file systems for next-generation machines, based on GRAPES and many other codes
- Mechanisms to share open-source software that can lead to new ways to share code among an even larger and more diverse set of researchers

I/O performance engineering of the Global/Regional Assimilation and PrEdiction System (GRAPES) code on supercomputers using the ADIOS framework

Objectives and significance of the research
- Improve I/O to meet the time-critical requirement for operation of GRAPES
- Improve ADIOS on new types of parallel simulations and platforms (such as Tianhe-1A)
- Extend ADIOS to support the GRIB2 format
- Feed the results back into ADIOS and help researchers in many communities

Need for and impact of China-US collaboration
- Connect I/O software from the US with parallel applications and platforms in China
- Service extensions, performance optimization techniques, and evaluation results will be shared
- Faculty and student members of the project will gain international collaboration experience

Approach and mechanisms; support required
- Monthly teleconference
- Student exchange
- Meetings at Tsinghua University with two of the ADIOS developers
- Meetings at mutually attended conferences (SC, IPDPS)
- Joint publications

Team & Roles
- Dr. Zhiyan Jin, CMA: design the GRAPES I/O infrastructure
- Dr. Scott Klasky, ORNL: direct ADIOS, with Drs. Podhorszki, Abbasi, Qiu, and Logan
- Dr. Xiaosong Ma, NCSU/ORNL: I/O and staging methods, exploiting in-transit processing for GRAPES
- Dr. Manish Parashar, RU: optimize the ADIOS DataSpaces method for GRAPES
- Dr. Wei Xue, TSU: develop the new I/O stack of GRAPES using ADIOS and tune the implementation for Chinese supercomputers