Introduction to the TeraGrid (Daniel S. Katz, TeraGrid GIG Director of Science; Senior Grid/HPC Researcher, Computation Institute, University of Chicago & Argonne National Laboratory)

Presentation transcript:

1 Introduction to the TeraGrid. Daniel S. Katz, TeraGrid GIG Director of Science; Senior Grid/HPC Researcher, Computation Institute, University of Chicago & Argonne National Laboratory; Affiliate Faculty, Center for Computation & Technology, Louisiana State University (LSU); Adjunct Associate Professor, Electrical and Computer Engineering, LSU

2 What is the TeraGrid?
- World's largest open scientific discovery infrastructure
- Leadership-class resources at eleven partner sites, combined to create an integrated, persistent computational resource
  – High-performance networks
  – High-performance computers (>1 Pflops (~100,000 cores), growing to 1.75 Pflops), plus a Condor pool (~13,000 CPUs)
  – Visualization systems
  – Data storage: data collections (>30 PB, >100 discipline-specific databases) and archival storage (10-15 PB currently stored)
  – Science Gateways
  – User portal
  – User services: help desk, training, advanced application support
- Allocated to US researchers and their collaborators through a national peer-review process
  – Generally, review of computing, not science
- Extremely user-driven
  – MPI jobs, ssh or grid (GRAM) access, etc. (see the sketch below)
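As a rough illustration of the "grid (GRAM) access" route mentioned above (not something shown on the slides), the sketch below submits a trivial job through pre-WS GRAM using the standard Globus client; the gatekeeper contact string and job-manager name are placeholders, and it assumes globus-job-run is installed and a valid grid proxy is active.

```python
# Hypothetical sketch: submit a small job through GRAM with globus-job-run.
# The gatekeeper contact below is a placeholder, not a real TeraGrid endpoint;
# a valid grid proxy (e.g. obtained via myproxy-logon) must already exist.
import subprocess

GATEKEEPER = "gatekeeper.example-rp.teragrid.org/jobmanager-pbs"  # placeholder

# Run a 16-process /bin/hostname through the site's PBS job manager.
result = subprocess.run(
    ["globus-job-run", GATEKEEPER, "-np", "16", "/bin/hostname"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```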

3 TeraGrid Governance
- 11 Resource Providers (RPs) funded under individual agreements with NSF
  – Mostly different start and end dates, goals, and funding models
- 1 coordinating body: the Grid Integration Group (GIG)
  – University of Chicago/Argonne
  – Subcontracts to all RPs and six other universities
  – ~10 Area Directors lead coordinated work across TG
  – ~18 working groups with members from many RPs work on day-to-day issues
  – RATs formed to handle short-term issues
- The TeraGrid Forum sets policies and is responsible for the overall TeraGrid
  – Each RP and the GIG votes in the TG Forum

4 TeraGrid

5 How One Uses TeraGrid
[Diagram: Science Gateways, User Portal, and Command Line access; POPS; TeraGrid Infrastructure (Accounting, Network, Authorization, ...); Compute Service (HPC, HTC, CPUs, GPUs, VMs), Viz Service, Data Service, and Network/Accounting at RP 1, RP 2, RP 3]
1. Get an allocation for your project
2. Allocation PI adds users
3. Use TeraGrid resources

6 TG New Large Resources
- NSF ‘Track2a’ HPC system
  – 504 TF
  – 15,744 quad-core AMD Opteron processors
  – 123 TB memory, 1.7 PB disk
- NSF ‘Track2b’ HPC system (UT/ORNL)
  – 170 TF Cray XT4 system
  – To be upgraded to Cray XT5 at 1 PF: 10,000+ compute sockets, 100 TB memory, 2.3 PB disk
- NSF ‘Track2c’ HPC system
- First NSF ‘Track2d’ system, for data-intensive computing
- More ‘Track2d’ systems to be announced
- Blue Waters: NSF Track 1, 10 PF peak, coming in 2011

7 How TeraGrid Is Used

Use Modality                                    Community Size (rough est., number of users)
Batch Computing on Individual Resources         850
Exploratory and Application Porting             650
Workflow, Ensemble, and Parameter Sweep         250
Science Gateway Access                          500
Remote Interactive Steering and Visualization    35
Tightly-Coupled Distributed Computation

8 Who Uses TeraGrid (2008)

9 User Portal: portal.teragrid.org

10 User Portal: User Information
- Knowledge Base for quick answers to technical questions
- Documentation
- Science Highlights
- News and press releases
- Education, outreach and training events and resources

11 Access to resources
- Terminal: ssh, gsissh
- Portal: TeraGrid user portal, Gateways
  – Once logged in to the portal, click on “Login”
- Also, single sign-on (SSO) from the command line (see the sketch below)
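As a rough sketch of the command-line single sign-on path (not taken from the slides), the snippet below drives the standard Globus clients from Python; the MyProxy server, port, username, and login host are illustrative placeholders, and myproxy-logon and gsissh must already be installed.

```python
# Hypothetical sketch of command-line SSO to a TeraGrid login node.
# Server, port, username, and host are placeholders, not official values.
import subprocess

MYPROXY_SERVER = "myproxy.example.teragrid.org"   # placeholder
MYPROXY_PORT = "7512"                             # placeholder
USERNAME = "tg_username"                          # placeholder

# Obtain a short-lived proxy credential (prompts for the portal password).
subprocess.run(
    ["myproxy-logon", "-s", MYPROXY_SERVER, "-p", MYPROXY_PORT,
     "-l", USERNAME, "-t", "12"],
    check=True,
)

# With the proxy in place, gsissh reaches a TG login node without a password.
subprocess.run(["gsissh", "login1.example-rp.teragrid.org"], check=True)
```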

12 Data Storage Resources
- Global file systems
  – GPFS-WAN: 700 TB disk storage at SDSC, accessible from machines at NCAR, NCSA, SDSC, ANL; licensing issues prevent further use
  – Data Capacitor (Lustre-WAN): mounted on a growing number of TG systems; 535 TB storage at IU, including databases; ongoing work to improve performance and authentication infrastructure; another Lustre-WAN implementation being built by PSC
  – pNFS is a possible path for global file systems, but is far from being viable for production
- Data collections
  – Storage at SDSC (files, databases) for collections used by communities
- Tape storage
  – Available at IU, NCAR, NCSA, SDSC
  – Access is generally through GridFTP (through the portal or command line; see the sketch below)
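Since GridFTP is the common access path to the tape storage above, here is a minimal, hedged sketch of a command-line transfer; the host name and paths are placeholders, and it assumes globus-url-copy is installed and a grid proxy is active.

```python
# Hypothetical sketch: push a local file to an archive front-end over GridFTP.
# Host and paths are placeholders; a valid grid proxy must already exist.
import subprocess

src = "file:///home/user/results/run42.tar"                               # local file
dst = "gsiftp://gridftp.example-rp.teragrid.org/archive/user/run42.tar"   # placeholder

subprocess.run(
    ["globus-url-copy",
     "-vb",        # report transfer performance
     "-p", "4",    # use 4 parallel TCP streams
     src, dst],
    check=True,
)
```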

13 TGUP Data Mover
- Drag-and-drop Java applet in the user portal
  – Uses GridFTP, 3rd-party transfers, RESTful services, etc.

14 Science Gateways
- A natural extension of the Internet & Web 2.0
- The idea resonates with scientists
  – Researchers can imagine scientific capabilities provided through a familiar interface, mostly a web portal or a web or client-server program
- Designed by communities; provide interfaces understood by those communities
  – Also provide access to greater capabilities (back end), without the user needing to understand the details of those capabilities
  – Scientists know they can undertake more complex analyses, and that's all they want to focus on
  – TeraGrid provides tools to help the developer
- Seamless access doesn't come for free
  – Hinges on a very capable developer
Slide courtesy of Nancy Wilkins-Diehr

15 Current Science Gateways
- Biology and Biomedicine Science Gateway
- Open Life Sciences Gateway
- The Telescience Project
- Grid Analysis Environment (GAE)
- Neutron Science Instrument Gateway
- TeraGrid Visualization Gateway, ANL
- BIRN
- Open Science Grid (OSG)
- Special PRiority and Urgent Computing Environment (SPRUCE)
- National Virtual Observatory (NVO)
- Linked Environments for Atmospheric Discovery (LEAD)
- Computational Chemistry Grid (GridChem)
- Computational Science and Engineering Online (CSE-Online)
- GEON (GEOsciences Network)
- Network for Earthquake Engineering Simulation (NEES)
- SCEC Earthworks Project
- Network for Computational Nanotechnology and nanoHUB
- GIScience Gateway (GISolve)
- Gridblast Bioinformatics Gateway
- Earth Systems Grid
- Astrophysical Data Repository (Cornell)
Slide courtesy of Nancy Wilkins-Diehr

16 Focused Outreach
- Campus Champions
  – Volunteer effort of a staff member at a university
- Pathways for Broadening Participation
  – Low-level trial effort
  – TeraGrid RP staff have extended interaction with MSIs
- Both
  – Have an initial small TG allocation and can create and distribute suballocations very quickly
  – Work with users as needed to help them

17 TG App: Predicting Storms
- Hurricanes and tornadoes cause massive loss of life and damage to property
- TeraGrid supported the spring 2007 NOAA and University of Oklahoma Hazardous Weather Testbed
  – Major goal: assess how well ensemble forecasting predicts thunderstorms, including the supercells that spawn tornadoes
  – Nightly reservation at PSC, spawning jobs at NCSA as needed for details
  – Input, output, and intermediate data transfers
  – Delivers “better than real time” prediction
  – Used 675,000 CPU hours for the season
  – Used 312 TB on HPSS storage at PSC
Slide courtesy of Dennis Gannon, ex-IU, and the LEAD Collaboration

18 App: GridChem
- Different licensed applications with different queues
- Will be scheduled for workflows
Slide courtesy of Joohyun Kim

19 Apps: Genius and Materials
- Genius: HemeLB on LONI, modeling blood flow before (during?) surgery
- Materials: LAMMPS on TeraGrid, fully-atomistic simulations of clay-polymer nanocomposites
- Why cross-site / distributed runs?
  1. Rapid turnaround: conglomeration of idle processors to run a single large job
  2. Run big compute & big memory jobs not possible on a single machine
Slide courtesy of Steven Manos and Peter Coveney

20 Two SCEC Projects
SAN DIEGO SUPERCOMPUTER CENTER, UCSD
- SCEC: Southern California Earthquake Center; PI: Tom Jordan, USC
- PetaShake: extend deterministic simulations of strong ground motions to 3 Hz
- CyberShake: compute physics-based probabilistic seismic hazard attenuation maps
Source: SCEC

21 SCEC CyberShake
SCEC: Southern California Earthquake Center; PI: Tom Jordan, USC
- Using the large-scale simulation data, estimate probabilistic seismic hazard (PSHA) curves for sites in southern California (the probability that ground motion will exceed some threshold over a given time period; see the formulation below)
- Used by hospitals, power plants, etc. as part of their risk assessment
- Plan to replace existing phenomenological curves with more accurate results using the new CyberShake code (better directivity, basin amplification)
- Completed 40 locations through 2008, targeting 200 in 2009 and 2000 in 2010
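For readers unfamiliar with PSHA, the exceedance probability behind a hazard curve is commonly written as below; this is a standard textbook (Poissonian) formulation, not CyberShake's exact implementation.

```latex
% lambda(a): annual rate of exceeding ground-motion level a
% nu_i:      annual occurrence rate of rupture i
\lambda(a) = \sum_i \nu_i \, P\!\left(A > a \mid \text{rupture } i\right),
\qquad
P\!\left(A > a \text{ within } t \text{ years}\right) = 1 - e^{-\lambda(a)\,t}
```

Hazard curves are then typically quoted as, for example, the ground-motion level with a 2% probability of exceedance in 50 years.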

22 SCEC CyberShake: PSHA Computing
- A. Generate rupture variations: 48 hours of sequential processing at USC
  – Output: 1 TB of data describing 600,000 potential earthquakes, each with a unique hypocenter and slip distribution
- For each location:
  – B. Generate Strain Green Tensors (SGTs): two 18-hour, 400-core jobs; output: 25 GB of data
  – C. Generate hazard curve: 840,000 sequential jobs, taking O(10) hours on 1000 cores; output: a hazard curve (small amount of data)
  – Distribute the work for the locations across USC, TACC, NCSA: the 1 TB of data is zipped, GridFTP'ed, and unzipped at each site, which takes about 3 days (see the sketch below)
- Generating all curves takes a few weeks
- Managing the sequential jobs for each hazard curve requires effective grid workflow tools for job submission, data management, and error recovery, using Pegasus (ISI) and DAGMan (U. of Wisconsin)
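As a rough sketch of the per-site staging step described above (not the actual CyberShake/Pegasus tooling), the script below tars the rupture-variation data once, pushes it to each remote site with GridFTP, and unpacks it there over gsissh; all host names and paths are placeholders, and it assumes the Globus clients and a valid grid proxy are available.

```python
# Hypothetical staging sketch for the 1 TB rupture-variation data set.
# Hosts and paths are placeholders; the same host name is reused here for the
# GridFTP server and the gsissh login node, though real sites may separate them.
import subprocess

SITES = {
    "tacc": "gridftp.tacc.example.teragrid.org",   # placeholder
    "ncsa": "gridftp.ncsa.example.teragrid.org",   # placeholder
}
SRC_DIR = "/home/scec/ruptures"       # placeholder source directory
ARCHIVE = "/tmp/ruptures.tar.gz"      # placeholder local archive
REMOTE_DIR = "/scratch/scec"          # placeholder remote scratch directory

# 1. Zip the data once at the source site.
subprocess.run(["tar", "czf", ARCHIVE, "-C", SRC_DIR, "."], check=True)

for name, host in SITES.items():
    # 2. GridFTP the archive to the remote site using 4 parallel streams.
    subprocess.run(
        ["globus-url-copy", "-vb", "-p", "4",
         f"file://{ARCHIVE}",
         f"gsiftp://{host}{REMOTE_DIR}/ruptures.tar.gz"],
        check=True,
    )
    # 3. Unzip it in place on the remote side.
    subprocess.run(
        ["gsissh", host, f"cd {REMOTE_DIR} && tar xzf ruptures.tar.gz"],
        check=True,
    )
```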

23 ENZO
- ENZO simulates cosmological structure formation
- Big current production simulation:
  – 4096x4096x4096 non-adaptive mesh, 16 fields per mesh point
  – 64 billion dark matter particles
  – About 4000 MPI processes, 1-8 OpenMP threads per process
  – Reads 5 TB of input data
  – Writes 8 TB data files; a restart reads the latest 8 TB data file
  – All I/O uses HDF5, each MPI process reading/writing its own data (see the sketch below)
  – Over the few months of the simulation, >100 data files will be written, and about >20 will be read for restarts
  – 24-hour batch runs, with 5-10 data files output per run
  – Needs ~100 TB of free disk space at the start of a run
- (The adaptive case is different, but I/O is roughly similar)
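The "one file per MPI process" HDF5 pattern mentioned above can be illustrated with a short sketch; this is not ENZO's real I/O code, it assumes mpi4py and h5py are available, and the dataset name, tile size, and file naming are made up for the example.

```python
# Hypothetical sketch of per-rank HDF5 I/O in the style described above:
# each MPI process writes and later re-reads only its own file, so no
# parallel-HDF5 coordination is required. Names and sizes are illustrative.
import numpy as np
import h5py
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()
fname = f"enzo_dump_cycle0042_rank{rank:05d}.h5"   # made-up naming scheme

# This rank's local tile of the mesh (placeholder size and field).
tile = np.zeros((256, 256, 256), dtype=np.float64)

# Dump: each process writes its own file.
with h5py.File(fname, "w") as f:
    f.create_dataset("Density", data=tile)
    f.attrs["cycle"] = 42

# Restart: the same rank re-reads only its own file.
with h5py.File(fname, "r") as f:
    tile = f["Density"][...]
```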

24 ENZO Calculation Stages
1. Generate initial conditions for density, matter velocity field, and dark matter particle position and velocity (using parallel HDF5)
   – Using NICS Kraken with 4K MPI processes, though TACC Ranger is a reasonable alternative
   – About 5 TB of initial data are created in TB files
2. Decompose the initial conditions for the number of MPI tasks to be used for the actual simulation (using sequential HDF5)
   – The decomposition of the mesh into the "tiles" needed for MPI tasks requires strided reads in a large data cube; this is very costly on NICS Kraken but can be done more efficiently on TACC Ranger
   – If done on Ranger, then 2 TB of data (4 512-GB files) have to be transmitted from NICS to TACC, and after running the MPI decomposition task (with 4K MPI tasks) there are 8K files (another 2 TB) which must be returned to NICS
   – The dark matter particle sort onto the "tiles" is most efficient on NICS Kraken because it has a superior interconnect; the sort is usually run in 8 slices using 4K MPI tasks
3. Evolve time (using sequential HDF5)
   – Dump data files during the run
   – Archive data files (8 TB every couple of hours -> ~1100 MB/sec, but NICS HPSS only reaches 300 MB/sec; see the arithmetic below)
4. Derive data products
   – Capture 5-6 fields from each data file (~256 GB each)
   – Send to ANL or SDSC for data analysis or visualization
   – Archive the output of the data analysis or visualization (back at NICS)
This overall run will produce at least 1 PB of data, with at least 100 TB needing to be archived, and requires 100 TB of free disk space.
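As a quick sanity check of the archive-rate figure quoted in stage 3 (assuming decimal terabytes and a 2-hour dump cadence):

```latex
\frac{8\ \mathrm{TB}}{2\ \mathrm{h}}
  = \frac{8 \times 10^{12}\ \mathrm{B}}{7200\ \mathrm{s}}
  \approx 1.1 \times 10^{9}\ \mathrm{B/s}
  \approx 1100\ \mathrm{MB/s}
```

so keeping up with the dump rate would need roughly 1100 MB/s of archive bandwidth, well above the ~300 MB/s the slide reports for NICS HPSS.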

25 TeraGrid: Both Operations and Research
- Operations
  – Facilities/services on which users rely
  – Infrastructure on which other providers build
- AND R&D
  – Learning how to do distributed, collaborative science on a global, federated infrastructure
  – Learning how to run multi-institution shared infrastructure

26 (Conference information)
- Keynote speakers, 3 panels, conference papers & posters, 8 tutorials, 4 workshops
- Support for student travel soon to be available (up to 75 students)
  – Advance registration fee waived
  – Conference hotel costs subsidized by $200/student ($400 for double occupancy)
- Poster submission deadline: June 26th
- Workshop submission deadlines: some are June 26th (contact poster/workshop chairs as needed)

27