

Presentation on theme: "National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)" — Presentation transcript:

1 National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

2 Introduction
What we've already learned so far:
– What grids are, why we want them, and who is using them (Intro)
– Grid Authentication and Authorization
– Harnessing CPU cycles with Condor
– Data Management and the Grid
In this lecture:
– Fabric-level infrastructure: grid building blocks
– National grid efforts in the US: Open Science Grid, TeraGrid

3 Grid Resources in the US

OSG: Research Participation
– Majority from physics: Tevatron, LHC, STAR, LIGO; used by 10 other (smaller) research groups.
– 90 members, 30 VOs.
– Contributors: 5 DOE labs (BNL, Fermilab, NERSC, ORNL, SLAC), 65 universities, 5 partner campus/regional grids.
Accessible resources:
– 43,000+ cores
– 6 petabytes of disk cache
– 10 petabytes of tape store
– 14 internetwork partnerships
Usage:
– 15,000 CPU wall-clock days/day
– 1 petabyte of data distributed/month
– 100,000 application jobs/day
– 20% of cycles delivered through resource sharing and opportunistic use

TeraGrid: Research Participation
– Support for Science Gateways
– Over 100 scientific data collections (discipline-specific databases)
– Contributors: 11 supercomputing centers (Indiana, LONI, NCAR, NCSA, NICS, ORNL, PSC, Purdue, SDSC, TACC and UC/ANL)
Computational resources:
– > 1 petaflop of computing capability
– 30 petabytes of storage (disk and tape)
– Dedicated high-performance network connections (10G)
– 750 TFLOPS (161K cores) in parallel computing systems, and growing

4 OSG vs TG
Computational resources:
– OSG: 43K cores across 80 institutions
– TG: 161K cores across 11 institutions and 22 systems
Storage support:
– OSG: a shared file system is not mandatory, so applications need to be aware of this
– TG: a shared file system (NFS, PVFS, GPFS, Lustre) on each system, and even a WAN GPFS mounted across most systems
Accessibility:
– OSG: private IP space for compute nodes; no interactive sessions; supports Condor throughout; supports GT2 (and a few GT4); the firewall is locked down
– TG: more compute nodes; public IP space; support for interactive sessions on login and compute nodes; supports GT2 and GT4 for remote access, and mostly PBS/SGE with some Condor for local access; 10K ports open in the TG firewall on login and compute nodes

5 Layout of a Typical Grid Site
Computing fabric + grid middleware (Globus, Condor, ...) + grid-level services => a grid site.
[Diagram: site-level components (Compute Element, Storage Element, User Interface, Authorization server, Monitoring Element) connecting to grid-level monitoring clients, data management services and grid operations.]

6 Example: FermiGrid (Fermilab)
[Diagram: before FermiGrid, each community (Astrophysics, Particle Physics, Theory, Common) ran its own resource with a head node and workers; a common gateway and central services now front these resources for local and guest users.]
A local grid with an adaptor to the national grid:
– Central, campus-wide grid services
– Enable efficiencies and sharing across internal farms and storage
– Maintain autonomy of individual resources
Next step: Campus Infrastructure Days - a new activity of OSG, Internet2 and TeraGrid.

7 Grid Monitoring & Information Services

8 To use a grid efficiently, you must locate and monitor its resources:
– Check the availability of different grid sites
– Discover different grid services
– Check the status of "jobs"
– Make better scheduling decisions with information maintained on the "health" of sites

9 Monitoring provides information for several purposes:
– Operation of the grid: monitoring and testing the grid
– Deployment of applications:
  What resources are available to me? (resource discovery)
  What is the state of the grid? (resource selection)
  How do I optimize resource use? (application configuration and adaptation)
– Information for other grid services to use

10 Monitoring information is, broadly, either static or dynamic.
Static information about a site:
– Number of worker nodes and processors
– Storage capacities
– Architecture and operating systems
Dynamic information about a site:
– Number of jobs currently running
– CPU utilization of each worker node
– Overall site "availability"
Time-varying information is critical for scheduling grid jobs. More accurate information costs more to gather: it's a tradeoff.
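As an illustration (not from the original slides), both kinds of information are typically published through a site's grid information service using the GLUE schema and can be queried with a standard LDAP client. A minimal sketch, assuming the site runs an LDAP-based information service (e.g. a BDII) on port 2170; the host name and LDAP base are assumptions about the site's setup:

# Minimal sketch; host name and LDAP base are assumptions, not a real endpoint.
$> ldapsearch -x -LLL -H ldap://is.example-grid.org:2170 -b "mds-vo-name=local,o=grid" \
     '(objectClass=GlueCE)' \
     GlueCEUniqueID GlueCEInfoTotalCPUs GlueCEStateFreeCPUs GlueCEStateRunningJobs
# GlueCEInfoTotalCPUs is static; GlueCEStateFreeCPUs and GlueCEStateRunningJobs
# are dynamic and change as jobs start and finish.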

11 Open Science Grid Overview

12 The Open Science Grid Consortium brings:
Grid service providers:
– middleware developers
– cluster, network and storage administrators
– local-grid communities
and grid consumers:
– global collaborations
– single researchers
– campus communities
– under-served science domains
into a cooperative infrastructure to share and sustain a common heterogeneous distributed facility in the US and beyond.

13

14 OSG Snapshot
– 96 resources across production & integration infrastructures
– 20 Virtual Organizations + 6 operations; includes 25% non-physics
– ~20,000 CPUs (site sizes from 30 to 4,000)
– ~6 PB tape, ~4 PB shared disk
Snapshot of jobs on OSG:
– Sustained through OSG submissions: 3,000-4,000 simultaneous jobs, ~10K jobs/day, ~50K CPU-hours/day
– Peak test loads of 15K jobs a day
– Using production & research networks

15 OSG - a Community Consortium
– DOE laboratories and DOE, NSF and other university facilities contributing computing farms and storage resources, infrastructure and user services, and user and research communities.
– Grid technology groups: Condor, Globus, Storage Resource Management, NSF Middleware Initiative.
– Global research collaborations: high-energy physics (including the Large Hadron Collider), gravitational-wave physics (LIGO), nuclear and astro physics, bioinformatics, nanotechnology, CS research...
– Partnerships with peers, development and research groups: Enabling Grids for E-sciencE (EGEE), TeraGrid, regional & campus grids (NYSGrid, NWICG, TIGRE, GLOW...)
– Education: I2U2/QuarkNet sharing cosmic ray data, grid schools...
[Timeline 1999-2009: PPDG (DOE), GriPhyN and iVDGL (NSF) evolve through Trillium and Grid3 into OSG (DOE+NSF).]

16 OSG sits in the middle of a grid-of-grids environment, from local to global infrastructures: inter-operating and co-operating campus, regional, community, national and international grids, with Virtual Organizations doing research & education.

17 These infrastructures are overlaid by virtual computational environments, from single researchers to large groups, local to worldwide.

18 [Diagram: the Open Science Grid connects user communities and VOs (HEP/CMS, astronomy/SDSS, astrophysics/LIGO, biology and nanotech/nanoHub) with resource providers (a Tier-2 site, BNL and FNAL clusters, the UW campus grid and its departmental clusters), supported by VO support centers, RP support centers and OSG Operations.]
Virtual Organization (VO): an organization composed of institutions, collaborations and individuals that share a common interest, applications or resources. VOs can be both consumers and providers of grid resources.

19 OSG Grid Monitoring

20 [Architecture diagram: site-level infrastructure (probes such as Ganglia, GIP, job_state and stor_stat feeding a monitoring information database/collector, exposed through GRAM jobmanager-mis, https, SOAP/WSDL web services, MDS and the MIS-Core infrastructure) serving grid-level clients such as MonALISA, VORS, the Discovery Service, ACDC and GridCat, plus a historical information database and a monitoring information consumer API.]

21 Open Science Grid

22 Virtual Organization Resource Selector (VORS)
http://vors.grid.iu.edu/
A custom web interface to a grid scanner that checks services and resources on:
– each Compute Element
– each Storage Element
Very handy for checking:
– paths of installed tools on worker nodes
– location & amount of disk space for planning a workflow
– troubleshooting when an error occurs

23 [Screenshot: VORS entry for OSG_LIGO_PSU; gatekeeper: grid3.aset.psu.edu. Source: Quick Start Guide to the OSG, OSG Consortium Meeting, March 2007.]

24 Gratia -- the OSG job accounting system
http://gratia-osg.fnal.gov:8880/gratia-reporting/

25 OSG Grid-Level Clients
Tools provide basic information about OSG resources:
– Resource catalog: official tally of OSG sites
– Resource discovery: what services are available, where are they, and how do I access them?
– Metrics information: usage of resources over time
Used to assess scheduling priorities:
– Where and when should I send my jobs?
– Where can I put my output?
Also used to monitor the health and status of the grid.

26 Managing Storage
A solution: SRM (Storage Resource Manager)
– A grid-enabled interface for putting data on a site
– Provides scheduling of data transfer requests
– Provides reservation of storage space
Technologies in the OSG pipeline:
– dCache/SRM (disk cache with SRM): provided by DESY & FNAL; SE(s) available to OSG as a service from the USCMS VO
– DRM (Disk Resource Manager): provided by LBL; can be added on top of a normal UNIX file system
Example transfer from an SRM-managed source to a GridFTP destination:
$> globus-url-copy srm://ufdcache.phys.ufl.edu/cms/foo.rfz \
     gsiftp://cit.caltech.edu/data/bar.rfz
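For comparison, a minimal sketch of driving the same kind of transfer through the SRM interface itself, assuming the dCache srmcp client is installed and a valid grid proxy exists; the SE endpoint and paths are hypothetical:

# Minimal sketch; SE host name and file paths are hypothetical.
$> srmcp srm://se.example-site.edu:8443/pnfs/example-site.edu/data/foo.rfz \
     file:////tmp/foo.rfz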

27 How do you join the OSG? A software perspective (based on Alain Roy’s presentation)

28 Joining OSG Assumption: – You have a campus grid Question: –What changes do you need to make to join OSG?

29 Your Campus Grid
We assume you have a cluster with a batch system:
– Condor
– Sun Grid Engine
– PBS/Torque
– LSF
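As an illustration (not part of the original slides), a minimal sanity check that a Condor-based campus cluster is working, using a vanilla-universe job; the file names are arbitrary:

# Minimal sketch: a vanilla-universe Condor submit file (hostname.sub).
universe   = vanilla
executable = /bin/hostname
output     = hostname.out
error      = hostname.err
log        = hostname.log
queue
# Submit it and watch the queue:
$> condor_submit hostname.sub
$> condor_q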

30 Administrative Work
– You need a security contact who will respond to security concerns.
– You need to register your site.
– You should have a web page about your site; it will be published so people can learn about your site.

31 Big Picture
Compute Element (CE):
– OSG jobs are submitted to the CE, which hands them to the batch system
– Also hosts information services and lots of support software
Shared file system:
– OSG requires a couple of directories to be mounted on all worker nodes
Storage Element (SE):
– How you manage the storage at your site

32 Installing Software
The OSG software stack:
– Based on the VDT (Virtual Data Toolkit), which is the majority of the software you'll install and is grid-independent
– OSG software stack = VDT + OSG-specific configuration
– Installed via Pacman (see the sketch below)
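A rough sketch of what a Pacman-driven CE installation looked like; the cache and package names below are illustrative rather than an exact OSG release, and the authoritative procedure is the one in the OSG release documentation:

# Minimal sketch, assuming Pacman itself is already installed and set up;
# the "OSG:ce" cache/package name is illustrative, not an exact release tag.
$> cd /opt/osg-ce
$> pacman -get OSG:ce      # fetch and install the Compute Element package
$> . ./setup.sh            # load the environment provided by the VDT install
$> vdt-control --on        # start the installed services via the VDT service manager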

33 What is installed?
– GRAM: allows job submission
– GridFTP: allows file transfers
– CEMon/GIP: publishes site information
– Some authorization mechanism:
  grid-mapfile: a file that lists authorized users, or
  GUMS (grid identity mapping service)
– And a few other things...
A quick way to exercise these pieces is sketched below.
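As an illustration (not from the slides), a minimal sketch of exercising a freshly installed CE from a client machine; the gatekeeper host name, paths and the DN in the grid-mapfile line are hypothetical:

# Minimal sketch; gatekeeper host, paths and DN are hypothetical.
$> grid-proxy-init                                  # create a short-lived proxy from your certificate
$> globus-job-run ce.example-site.edu/jobmanager-fork /bin/hostname    # GRAM: run a trivial job
$> globus-url-copy file:///tmp/input.dat \
     gsiftp://ce.example-site.edu/tmp/input.dat     # GridFTP: copy a file to the site
# A grid-mapfile entry maps a certificate DN to a local account, e.g.:
# "/DC=org/DC=doegrids/OU=People/CN=Jane Doe 123456" osgusers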

34 OSG Middleware
A layered infrastructure (top to bottom):
– Applications: user science codes and interfaces
– VO middleware: HEP data and workflow management, biology portals and databases, astrophysics data replication, etc.
– OSG Release Cache: OSG-specific configurations, utilities, etc.
– Virtual Data Toolkit (VDT): core technologies + software needed by stakeholders; many components shared with EGEE
– Core grid technology distributions: Condor, Globus, MyProxy; shared with TeraGrid and others
– Existing operating systems, batch systems and utilities

35 Picture of a basic site

36 Shared file system
– OSG_APP: for users to store applications
– OSG_DATA: a place to store data; highly recommended, not required
– OSG_GRID: software needed on worker nodes; not required; may not exist on non-Linux clusters
– Home directories for users: not required, but often very convenient
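A minimal sketch of how a worker-node job script might use these locations, assuming the site exports them as the conventional $OSG_APP and $OSG_DATA environment variables; the VO, application and file names are hypothetical:

#!/bin/sh
# Minimal sketch; VO name, application and file names are hypothetical.
APP_DIR=$OSG_APP/myvo/myapp-1.0     # shared, read-mostly application area
WORK_DIR=$OSG_DATA/myvo/run-$$      # shared data area for this run's output
mkdir -p "$WORK_DIR" && cd "$WORK_DIR"
"$APP_DIR/bin/analyze" input.dat > output.dat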

37 Storage Element
Some sites require more sophisticated storage management:
– How do worker nodes access data?
– How do you handle terabytes (petabytes?) of data?
Storage Elements are more complicated:
– More planning is needed
– Some are complex to install and configure
Two OSG-supported SRM options:
– dCache
– BeStMan

38 More information
Site planning:
– https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/SitePlanning
Installing the OSG software stack:
– https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/
– Tutorial: http://www.mcs.anl.gov/~bacon/osgedu/sa_intro.html

39 Genome Analysis and Database Update system
– Runs across TeraGrid and OSG; uses the Virtual Data System (VDS) for workflow & provenance.
– Passes through public DNA and protein databases for new and newly updated genomes of different organisms and runs BLAST, Blocks and Chisel; 1,200 users of the resulting database.
– Request: 1,000 CPUs for 1-2 weeks, once a month, every month.
– On OSG at the moment: >600 CPUs and 17,000 jobs a week.

40 Summary of OSG
– Provides core services, software and a distributed facility for an increasing set of research communities.
– Helps VOs access resources on many different infrastructures.
– Interested in collaborating and contributing our experience and efforts.

41 TeraGrid Overview

42 Computational Resources
[Map: TeraGrid computational resources (size approximate, not to scale) at SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Tennessee and LONI/LSU; 504 TF in 2007, growing to ~1 PF in 2008. Credit: Tommy Minyard, TACC.]

43 The TeraGrid Facility
Grid Infrastructure Group (GIG):
– Based at the University of Chicago
– TeraGrid integration, planning, management and coordination
– Organized into areas (not VOs, as in OSG): User Services, Operations, Gateways, Data/Visualization/Scheduling, Education Outreach & Training, Software Integration
Resource Providers (RP):
– E.g. NCSA, SDSC, PSC, Indiana, Purdue, ORNL, TACC, UC/ANL
– Systems (resources, services) support and user support
– Provide access to resources via policies, software and mechanisms coordinated by and provided through the GIG

44 11 Resource Providers, One Facility
[Map: Resource Providers (RP): SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, LONI, NICS. Software integration partners: Caltech, USC/ISI, UNC/RENCI, UW. Grid Infrastructure Group: UChicago.]

45 TeraGrid Hardware Components
High-end compute hardware:
– Intel/Linux clusters
– Alpha SMP clusters
– IBM POWER3 and POWER4 clusters
– SGI Altix SMPs
– Sun visualization systems
– Cray XT3
– IBM Blue Gene/L
Large-scale storage systems: hundreds of terabytes for secondary storage
Visualization hardware
Very high-speed network backbone (40 Gb/s): bandwidth for rich interaction and tight coupling

46 TeraGrid Objectives
DEEP Science: enabling petascale science
– Make science more productive through an integrated set of very-high-capability resources
– Address key challenges prioritized by users
WIDE Impact: empowering communities
– Bring TeraGrid capabilities to the broad science community
– Partner with science community leaders: "Science Gateways"
OPEN Infrastructure, OPEN Partnership
– Provide a coordinated, general-purpose, reliable set of services and resources
– Partner with campuses and facilities

47 TeraGrid Resources and Services
Computing:
– Nearly a petaflop of computing power today
– 500 Tflop Ranger system at TACC
Remote visualization servers and software
Data:
– Allocation of data storage facilities
– Over 100 scientific data collections
Central allocations process
Technical support:
– Central point of contact for support of all systems
– Advanced Support for TeraGrid Applications (ASTA)
– Education and training events and resources
– Over 20 Science Gateways

48 Requesting Allocations of Time
TeraGrid resources are provided for free to academic researchers and educators:
– Development Allocations Committee (DAC): start-up and course accounts of up to 30,000 hours; requests processed in two weeks
– Medium Resource Allocations Committee (MRAC): requests of up to 500,000 hours, reviewed four times a year
– Large Resource Allocations Committee (LRAC): requests of over 500,000 hours, reviewed twice a year

49 TeraGrid User Community

50 TeraGrid Web Resources
TeraGrid provides a rich array of web-based resources:
– TeraGrid User Portal for managing user allocations and job flow
– Knowledge Base for quick answers to technical questions
– User information, including documentation and information about hardware and software resources
– Science highlights
– News and press releases
– Education, outreach and training events and resources
In general, seminars and workshops will be accessible via video on the web. Extensive documentation will also be web-based.

51 Science Gateways: Broadening Participation in TeraGrid
Increasing investment by communities in their own cyberinfrastructure, but heterogeneous:
– Resources
– Users, from experts to K-12
– Software stacks, policies
Science Gateways:
– Provide "TeraGrid Inside" capabilities
– Leverage community investment
Three common forms:
– Web-based portals
– Application programs running on users' machines but accessing services in TeraGrid
– Coordinated access points enabling users to move seamlessly between TeraGrid and other grids
[Image: Workflow Composer. Source: Dennis Gannon (gannon@cs.indiana.edu)]

52 TeraGrid as a Social Network
– Science Gateway community very successful; transitioning to a consulting model
– Campus Champions: campus representatives assisting local users
– HPC University: training and education resources and events
– Education and Outreach: engaging thousands of people

53 HPC Education and Training
TeraGrid partners offer training and education events and resources to educators and researchers:
– Workshops, institutes and seminars on high-performance scientific computing
– Hands-on tutorials on porting and optimizing code for the TeraGrid systems
– On-line self-paced tutorials
– High-impact educational and visual materials suitable for K-12, undergraduate and graduate classes

54 "HPC University"
Advance researchers' HPC skills:
– Catalog of live and self-paced training
– Schedule series of training courses
– Gap analysis of materials to drive development
Work with educators to enhance the curriculum:
– Search catalog of HPC resources
– Schedule workshops for curricular development
– Leverage good work of others
Offer student research experiences:
– Enroll in HPC internship opportunities
– Offer student competitions
Publish science and education impact:
– Promote via TeraGrid Science Highlights, iSGTW
– Publish education resources to NSDL-CSERD

55 Sampling of Training Topics Offered
HPC computing:
– Introduction to Parallel Computing
– Toward Multicore Petascale Applications
– Scaling Workshop: Scaling to Petaflops
– Effective Use of Multi-core Technology
– TeraGrid-Wide BlueGene Applications
– Introduction to Using SDSC Systems
– Introduction to the Cray XT3 at PSC
– Introduction to & Optimization for SDSC Systems
– Parallel Computing on Ranger & Lonestar
Domain-specific sessions:
– Petascale Computing in the Biosciences
– Workshop on Infectious Disease Informatics at NCSA
Visualization:
– Introduction to Scientific Visualization
– Intermediate Visualization at TACC
– Remote/Collaborative TeraScale Visualization on the TeraGrid
Other topics:
– NCSA to host workshop on data center design
– Rocks Linux Cluster Workshop
– LCI International Conference on HPC Clustered Computing
Over 30 on-line asynchronous tutorials.

56 Internships and Fellowships
TeraGrid partners offer internships and fellowships that allow undergraduates, post-graduate students and faculty to be located on-site and work with TeraGrid staff and researchers in areas critical to advancing scientific discovery:
– Computer science in user support and operations
– Future technologies
– Research activities

57 TeraGrid Resources (per Resource Provider: ANL/UC, IU, NCSA, ORNL, PSC, Purdue, SDSC, TACC)
Computational resources:
– ANL/UC: Itanium2 (0.5 TF), IA-32 (0.5 TF)
– IU: Itanium2 (0.2 TF), IA-32 (2.0 TF)
– NCSA: Itanium2 (10.7 TF), SGI SMP (7.0 TF), Dell Xeon (17.2 TF), IBM p690 (2 TF), Condor flock (1.1 TF)
– ORNL: IA-32 (0.3 TF)
– PSC: XT3 (10 TF), TCS (6 TF), Marvel SMP (0.3 TF)
– Purdue: heterogeneous (1.7 TF), IA-32 (11 TF), opportunistic
– SDSC: Itanium2 (4.4 TF), Power4+ (15.6 TF), Blue Gene (5.7 TF)
– TACC: IA-32 (6.3 TF)
Online storage (same site order): 20 TB, 32 TB, 1140 TB, 1 TB, 300 TB, 26 TB, 1400 TB, 50 TB
Mass storage (not all sites): 1.2 PB, 5 PB, 2.4 PB, 1.3 PB, 6 PB, 2 PB
Network to hub (same site order): 30 Gb/s CHI, 10 CHI, 30 CHI, 10 ATL, 30 CHI, 10 CHI, 10 LA, 10 CHI
Data collections (selected sites): 5 collections (>3.7 TB; URL/DB/GridFTP); >30 collections (URL/SRB/DB/GridFTP); 4 collections (7 TB; SRB/Portal/OPeNDAP); >70 collections (>1 PB; GFS/SRB/DB/GridFTP); 4 collections (2.35 TB; SRB/Web Services/URL)
Instruments: proteomics, X-ray crystallography; SNS and HFIR facilities
Visualization resources (RI: remote interactive, RB: remote batch, RC: RI/collaborative): RI/RC/RB IA-32 with 96 GeForce 6600GT; RB SGI Prism with 32 graphics pipes + IA-32; RI/RB IA-32 + Quadro4 980 XGL; RB IA-32, 48 nodes; RB; RI/RC/RB UltraSPARC IV, 512 GB SMP, 16 graphics cards
Overall: 100+ TF across 8 distinct architectures, 3 PB of online disk, >100 data collections.

58 Science Gateways: A New Initiative for the TeraGrid
(Content as on slide 51: heterogeneous community cyberinfrastructure; gateways provide "TeraGrid Inside" capabilities and leverage community investment, in three common forms: web-based portals, application programs on users' machines accessing TeraGrid services, and coordinated access points bridging TeraGrid and other grids. Image: Workflow Composer.)

59 Applications can cross infrastructures, e.g. OSG and TeraGrid.

60 More Info
Open Science Grid:
– http://www.opensciencegrid.org
TeraGrid:
– http://www.teragrid.org

61 it’s the people…that make the grid a community!

