1
Open Science Grid: Linking Universities and Laboratories in National Cyberinfrastructure
Paul Avery, University of Florida (avery@phys.ufl.edu)
Physics Colloquium, RIT (Rochester, NY), May 23, 2007
www.opensciencegrid.org
2
Cyberinfrastructure and Grids
Grid: geographically distributed computing resources configured for coordinated use
Fabric: physical resources & networks providing raw capability
Ownership: resources controlled by owners and shared with others
Middleware: software tying it all together: tools, services, etc.
Enhancing collaboration via transparent resource sharing (example: the US-CMS "Virtual Organization")
3
Motivation: Data-Intensive Science
21st-century scientific discovery is computationally and data intensive: theory + experiment + simulation, with internationally distributed resources and collaborations
Dominant factor: data growth (1 petabyte = 1000 terabytes); a back-of-the-envelope growth estimate follows below
  2000: ~0.5 petabyte
  2007: ~10 petabytes
  2013: ~100 petabytes
  2020: ~1000 petabytes
How to collect, manage, access and interpret this quantity of data? Powerful cyberinfrastructure is needed:
  Computation: massive, distributed CPU
  Data storage & access: large-scale, distributed storage
  Data movement: international optical networks
  Data sharing: global collaborations (100s – 1000s of people)
  Software: managing all of the above
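A quick back-of-the-envelope sketch of what these milestones imply (my arithmetic, not from the slides): growing from ~0.5 PB in 2000 to ~1000 PB in 2020 corresponds to roughly 45–50% growth per year, i.e. the data volume doubles about every two years.

```python
import math

# Implied growth rate from the data-volume milestones quoted on the slide.
# Illustrative arithmetic only, not an OSG planning model.
volumes_pb = {2000: 0.5, 2007: 10, 2013: 100, 2020: 1000}

start_year, end_year = 2000, 2020
factor = volumes_pb[end_year] / volumes_pb[start_year]     # 2000x in 20 years
annual_growth = factor ** (1 / (end_year - start_year))    # ~1.46 => ~46%/yr
doubling_time = math.log(2) / math.log(annual_growth)      # ~1.8 years

print(f"Implied growth: {annual_growth:.2f}x per year "
      f"(doubling every {doubling_time:.1f} years)")
```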
4
Open Science Grid: July 20, 2005
Consortium of many organizations (multiple disciplines)
Production grid cyberinfrastructure
80+ sites, 25,000+ CPUs: US, UK, Brazil, Taiwan
5
The Open Science Grid Consortium
The consortium brings together: U.S. grid projects, LHC experiments, laboratory centers, education communities, science projects & communities, technologists (network, HPC, …), computer science, university facilities, multi-disciplinary facilities, and regional and campus grids
6
Open Science Grid Basics
Who: computer scientists, IT specialists, physicists, biologists, etc.
What: shared computing and storage resources; high-speed production and research networks; a meeting place for research groups, software experts, and IT providers
Vision: maintain and operate a premier distributed computing facility; provide education and training opportunities in its use; expand reach & capacity to meet the needs of stakeholders; dynamically integrate new resources and applications
Members: HPC facilities; campus, laboratory & regional grids
Partners: interoperation with TeraGrid, EGEE, NorduGrid, etc.
7
Crucial Ingredients in Building OSG
Science "push": ATLAS, CMS, LIGO, SDSS; by 1999 they foresaw an overwhelming need for distributed cyberinfrastructure
Early funding: the "Trillium" consortium
  PPDG: $12M (DOE), 1999 – 2006
  GriPhyN: $12M (NSF), 2000 – 2006
  iVDGL: $14M (NSF), 2001 – 2007
  plus supplements and new funded projects
Social networks: ~150 people with many overlaps across universities, labs, SDSC, and foreign partners
Coordination: pooling resources, developing broad goals
Common middleware: the Virtual Data Toolkit (VDT), used by multiple grid deployments and testbeds
A unified entity when collaborating internationally; historically, a strong driver for funding-agency collaboration
8
OSG History in Context (timeline, 1999 – 2009)
PPDG (DOE), GriPhyN (NSF), and iVDGL (NSF) combine as Trillium, leading to Grid3 and then OSG (DOE + NSF)
In parallel: European grids and the Worldwide LHC Computing Grid; campus and regional grids; LHC construction and preparation leading to LHC operations; LIGO preparation leading to LIGO operation
9
Principal Science Drivers (data growth and community growth, 2001 – 2009)
High energy and nuclear physics: several petabytes from 2005; 100s of petabytes (LHC) from 2007
LIGO (gravity-wave search): 0.5 – several petabytes from 2002
Digital astronomy: 10s of terabytes from 2001; 10s of petabytes from 2009
Other sciences coming forward: bioinformatics (10s of petabytes), nanoscience, environmental chemistry, applied mathematics, materials science(?)
10
OSG Virtual Organizations
ATLAS (HEP/LHC): HEP experiment at CERN
CDF (HEP): HEP experiment at Fermilab
CMS (HEP/LHC): HEP experiment at CERN
DES (Digital astronomy): Dark Energy Survey
DOSAR (Regional grid): regional grid in the Southwest US
DZero (HEP): HEP experiment at Fermilab
ENGAGE (Engagement effort): a place for new communities
Fermilab (Lab grid): HEP laboratory grid
fMRI: functional MRI
GADU (Bio): bioinformatics effort at Argonne
Geant4 (Software): simulation project
GLOW (Campus grid): campus grid at U of Wisconsin, Madison
GRASE (Regional grid): regional grid in Upstate NY
11
OSG Virtual Organizations (2)
GridChem (Chemistry): quantum chemistry grid
GPN (Great Plains Network): www.greatplains.net
GROW (Campus grid): campus grid at U of Iowa
I2U2 (EOT): E/O consortium
LIGO (Gravity waves): gravitational-wave experiment
Mariachi (Cosmic rays): ultra-high-energy cosmic rays
nanoHUB (Nanotech): nanotechnology grid at Purdue
NWICG (Regional grid): Northwest Indiana regional grid
NYSGRID (NY State grid): www.nysgrid.org
OSGEDU (EOT): OSG education/outreach
SBGRID (Structural biology): structural biology at Harvard
SDSS (Digital astronomy): Sloan Digital Sky Survey
STAR (Nuclear physics): nuclear physics experiment at Brookhaven
UFGrid (Campus grid): campus grid at U of Florida
12
Partners: Federating with OSG
Campus and regional: Grid Laboratory of Wisconsin (GLOW); Grid Operations Center at Indiana University (GOC); Grid Research and Education Group at Iowa (GROW); Northwest Indiana Computational Grid (NWICG); New York State Grid (NYSGrid, in progress); Texas Internet Grid for Research and Education (TIGRE); nanoHUB (Purdue); LONI (Louisiana)
National: Data Intensive Science University Network (DISUN); TeraGrid
International: Worldwide LHC Computing Grid Collaboration (WLCG); Enabling Grids for E-SciencE (EGEE); TWGrid (from Academia Sinica Grid Computing); Nordic Data Grid Facility (NorduGrid); Australian Partnerships for Advanced Computing (APAC)
13
Defining the Scale of OSG: Experiments at the Large Hadron Collider (LHC @ CERN)
Search for the origin of mass, new fundamental forces, supersymmetry, and other new particles (2007 – ?)
27 km tunnel in Switzerland & France; experiments: ATLAS, CMS, ALICE, LHCb, TOTEM
14
CMS: the "Compact" Muon Solenoid (detector photo; "inconsequential humans" shown for scale)
15
Collision Complexity: CPU + Storage
Event displays: all charged tracks with pT > 2 GeV vs. reconstructed tracks with pT > 25 GeV (with 30 superimposed minimum-bias events)
10^9 collisions/sec, selectivity: 1 in 10^13
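Putting those two numbers together (a rough illustration of my own, not from the slides): at 10^9 collisions per second with a selectivity of 1 in 10^13, only about 10^-4 selected events survive per second, i.e. a handful per day out of an enormous raw collision rate.

```python
# Rough illustration of the trigger/selection arithmetic quoted on the slide.
collision_rate = 1e9        # collisions per second
selectivity = 1e-13         # fraction of collisions that are "interesting"

selected_per_second = collision_rate * selectivity   # 1e-4 per second
seconds_per_day = 86_400
selected_per_day = selected_per_second * seconds_per_day

print(f"{selected_per_second:.0e} selected events/s "
      f"~ {selected_per_day:.0f} per day")
```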
16
LHC Data and CPU Requirements (CMS, ATLAS, LHCb)
Storage: raw recording rate 0.2 – 1.5 GB/s; large Monte Carlo data samples; 100 PB by ~2013; 1000 PB later in the decade?
Processing: PetaOps (> 300,000 3 GHz PCs)
Users: 100s of institutes, 1000s of researchers
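A rough sanity check of these figures (my assumptions, not from the slide: roughly 10^7 live seconds of data taking per year, and about one operation per clock cycle per 3 GHz PC):

```python
# Back-of-the-envelope check of the storage and CPU figures on the slide.
# Assumptions (mine): ~1e7 live seconds of data taking per year,
# and ~1 operation per clock cycle for a single 3 GHz PC.

live_seconds_per_year = 1e7

for rate_gb_s in (0.2, 1.5):                              # raw recording rate, GB/s
    pb_per_year = rate_gb_s * live_seconds_per_year / 1e6  # 1 PB = 1e6 GB
    print(f"{rate_gb_s} GB/s -> ~{pb_per_year:.0f} PB of raw data per year")

pcs = 300_000
ops_per_pc = 3e9                                           # ~3 GHz, ~1 op/cycle
print(f"{pcs:,} PCs -> ~{pcs * ops_per_pc / 1e15:.1f} PetaOps aggregate")
```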
17
OSG and the LHC Global Grid (CMS experiment)
Tiered architecture: Tier 0 (CMS online system feeding the CERN computer center), Tier 1 national centers (Fermilab, Korea, Russia, UK), Tier 2 university centers (Caltech, UCSD, U Florida, Iowa, Maryland, FIU, …), Tier 3 physics caches, Tier 4 individual PCs
Link speeds range from 200 – 1500 MB/s out of the online system to >10 Gb/s, 10 – 40 Gb/s, and 2.5 – 10 Gb/s between tiers
5000 physicists, 60 countries; 10s of petabytes/yr by 2009; CERN / outside resources = 10 – 20%
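As an illustration of why multi-Gb/s links matter in this tiered model (my own arithmetic, not from the slide): even a fully utilized 10 Gb/s link needs on the order of nine days to move a single petabyte.

```python
# How long does it take to move 1 PB over links of the speeds quoted on the slide?
# Idealized arithmetic: assumes the link is fully and continuously utilized.

dataset_bytes = 1e15                    # 1 PB
for gbps in (2.5, 10, 40):              # link speeds from the tier diagram
    seconds = dataset_bytes * 8 / (gbps * 1e9)
    print(f"{gbps:>5} Gb/s -> {seconds / 86400:.1f} days per PB")
```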
18
LHC Global Collaborations (ATLAS and CMS): 2000 – 3000 physicists per experiment; the USA is 20 – 31% of the total
19
LIGO: Search for Gravity Waves
LIGO Grid: 6 US sites and 3 EU sites (UK & Germany: Cardiff, Birmingham, AEI/Golm)
(LHO, LLO: LIGO observatory sites; LSC: LIGO Scientific Collaboration)
20
Sloan Digital Sky Survey: Mapping the Sky
21
Bioinformatics: GADU / GNARE (Genome Analysis Research Environment)
GADU uses the grid (OSG, TeraGrid, DOE Science Grid): applications are executed as workflows and results are stored in an integrated database
GADU performs:
  Acquisition: acquire genome data from publicly available databases (e.g. NCBI, PIR, KEGG, EMP, InterPro) and store it temporarily on the file system
  Analysis: run publicly available and in-house tools (BLAST, Blocks, TMHMM, …) on the grid, using the acquired data and data from the integrated database
  Storage: store the parsed data acquired from public databases and the parsed results of the tools and workflows used during analysis
Integrated database: parsed sequence and annotation data from public web sources, plus the results of the analysis tools; data flow with the public databases is bidirectional
Applications (web interfaces) built on the integrated database: PUMA2 (evolutionary analysis of metabolism), Chisel (protein function analysis tool), TARGET (targets for structural analysis of proteins), PATHOS (pathogenic DB for bio-defense research), Phyloblocks (evolutionary analysis of protein families)
Services to other groups: SEED (data acquisition), Shewanella Consortium (genome analysis), others
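A minimal sketch of the acquisition → analysis → storage pattern described above (the function names and structure are invented for illustration; this is not GADU code):

```python
# Hypothetical three-stage pipeline in the GADU style: acquire public genome
# data, fan analysis "jobs" out (here just local calls), store parsed results.

def acquire(sources):
    """Fetch raw records from public databases and stage them locally (stub)."""
    return [f"record-from-{s}" for s in sources]

def analyze(record, tools=("blast", "blocks", "tmhmm")):
    """Pretend to run each analysis tool on one record; on OSG each call
    would become a grid job rather than a local function call."""
    return {tool: f"{tool}-result({record})" for tool in tools}

def store(results, database):
    """Append parsed results to the integrated database (here just a dict)."""
    database.update(results)

integrated_db = {}
for rec in acquire(["NCBI", "PIR", "KEGG"]):
    store({rec: analyze(rec)}, integrated_db)

print(f"{len(integrated_db)} records analyzed and stored")
```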
22
Bioinformatics (continued): the Shewanella oneidensis genome
23
Nanoscience Simulations: nanoHUB.org
Collaboration offering online simulation, courses, tutorials, seminars, and learning modules
Real users and real usage: >10,100 users, 1,881 simulation users, >53,000 simulations
24
OSG Engagement Effort
Purpose: bring non-physics applications to OSG; led by RENCI (UNC + NC State + Duke)
Approach: pursue specific targeted opportunities, develop relationships, and give direct assistance with the technical details of connecting to OSG
Result: feedback and new requirements for the OSG infrastructure (to facilitate inclusion of new communities): more & better documentation, more automation
25
OSG and the Virtual Data Toolkit
VDT: a collection of software
  Grid software (Condor, Globus, VOMS, dCache, GUMS, Gratia, …)
  Virtual Data System
  Utilities
VDT: the basis for the OSG software stack
  Goal is easy installation with automatic configuration
  Now widely used in other projects
  Has a growing support infrastructure
26
Why Have the VDT?
Everyone could download the software directly from the providers, but the VDT:
  Figures out dependencies between software packages (a generic dependency-ordering sketch follows below)
  Works with providers for bug fixes
  Automatically configures & packages the software
  Tests everything on 15 platforms (and growing): Debian 3.1; Fedora Core 3; Fedora Core 4 (x86, x86-64); RedHat Enterprise Linux 3 AS (x86, x86-64, ia64); RedHat Enterprise Linux 4 AS (x86, x86-64); ROCKS Linux 3.3; Scientific Linux Fermi 3; Scientific Linux Fermi 4 (x86, x86-64, ia64); SUSE Linux 9 (ia64)
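The slide does not describe how the VDT resolves dependencies, and its real packaging machinery is certainly more involved; purely as a generic illustration of dependency ordering, a topological sort over a (hypothetical) package graph looks like this:

```python
# Generic illustration of dependency ordering (topological sort).
# The package graph below is invented; it is not the actual VDT dependency set.
from graphlib import TopologicalSorter   # Python 3.9+

deps = {
    "osg-client": {"condor", "globus", "voms"},   # package -> its dependencies
    "condor": set(),
    "globus": set(),
    "voms": {"globus"},
    "gums": {"globus"},
}

install_order = list(TopologicalSorter(deps).static_order())
print("Install order:", install_order)   # dependencies come before dependents
```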
27
VDT Growth Over 5 Years: the number of components has grown steadily (current version 1.6.1i); see vdt.cs.wisc.edu
28
Collaboration with Internet2 (www.internet2.edu)
29
Collaboration with National Lambda Rail (www.nlr.net)
Optical, multi-wavelength, community-owned or leased "dark fiber" (10 GbE) networks for research & education
Spawning state-wide and regional networks (FLR, SURA, LONI, …)
Bulletin: NLR–Internet2 merger announcement
30
Integrating Advanced Networking in Applications: UltraLight
10 Gb/s+ network linking Caltech, UF, FIU, UM, MIT, SLAC, FNAL, plus international partners and Level(3), Cisco, NLR
Funded by NSF; http://www.ultralight.org
31
REDDnet: National Networked Storage
NSF-funded project led by Vanderbilt; 8 initial sites (Brazil?)
Multiple disciplines: satellite imagery, HEP, Terascale Supernova Initiative, structural biology, bioinformatics
Storage: 500 TB disk, 200 TB tape
32
OSG Jobs Snapshot over 6 Months (Sep through Mar): ~5000 simultaneous jobs from multiple VOs
33
OSG Jobs per Site over 6 Months (Sep through Mar): ~5000 simultaneous jobs at multiple sites
34
Completed Jobs/Week on OSG (Sep through Mar): up to ~400K, driven by the CMS "Data Challenge"
35
Number of jobs per VO, as measured by the new accounting system (Gratia)
36
Massive 2007 data reprocessing by the D0 experiment at Fermilab, using SAM, OSG, and LCG: ~400M total, of which ~250M on OSG
37
CDF Discovery of Bs Oscillations
38
Communications: (International) Science Grid This Week (SGTW, now iSGTW)
Published since April 2005; diverse audience; >1000 subscribers; www.isgtw.org
39
OSG News: monthly newsletter; 18 issues by Apr. 2007; www.opensciencegrid.org/osgnews
40
Grid Summer Schools
Held in summer 2004, 2005, 2006: 1 week at South Padre Island, Texas; lectures plus hands-on exercises for ~40 students of differing backgrounds (physics + CS), including minorities
Reaching a wider audience: lectures, exercises, and video on the web; more tutorials (3 – 4 per year) for students, postdocs, and scientists; agency-specific tutorials
41
Project Challenges
Technical constraints: commercial tools fall far short and require (too much) invention; integration of advanced cyberinfrastructure, e.g. networks
Financial constraints (next slide): fragmented & short-term funding injections (most recently $30M over 5 years); fragmentation of individual efforts
Distributed coordination and management: member projects are more tightly organized than OSG itself; coordinating schedules & milestones takes many phone/video meetings and travel; knowledge is dispersed and few people have a broad overview
42
Funding & Milestones: 1999 – 2007 (timeline)
Funded projects: PPDG ($9.5M), GriPhyN ($12M), iVDGL ($14M), UltraLight ($2M), CHEPREO ($4M), DISUN ($10M), OSG ($30M, NSF + DOE)
Milestones: first US-LHC grid testbeds, VDT 1.0, Grid3 start, VDT 1.3, OSG start, LIGO Grid, LHC start, grid communications, Grid Summer Schools 2004 – 2006, Digital Divide Workshops '04 – '06
Categories shown: grid & networking projects, large experiments, education/outreach/training
43
Challenges from Diversity and Growth
Management of an increasingly diverse enterprise: sci/eng projects, organizations, and disciplines as distinct cultures; accommodating new member communities (expectations?)
Interoperation with other grids: TeraGrid, international partners (EGEE, NorduGrid, etc.), multiple campus and regional grids
Education, outreach and training: training for researchers and students, but also project PIs and program officers
Operating a rapidly growing cyberinfrastructure: 25K → 100K CPUs, 4 → 10 PB disk; management of and access to rapidly increasing data stores (next slide); monitoring, accounting, and achieving high utilization; scalability of the support model (later slide)
44
Rapid Cyberinfrastructure Growth: LHC
Projected CPU capacity at CERN, Tier-1, and Tier-2 centers reaches ~140,000 PCs in 2008
Meeting LHC service challenges & milestones; participating in worldwide simulation productions
45
OSG Operations
Distributed model (scalability!) spanning VOs, sites, and providers
Rigorous problem tracking & routing; security; provisioning; monitoring; reporting
Partners with EGEE operations
46
OSG Five-Year Project Timeline & Milestones (2006 – 2011)
Project phases: project start (2006), end of Phase I, end of Phase II (2011)
Facility operations and metrics: increase robustness and scale; operational metrics defined and validated each year; interoperate and federate with campus and regional grids; integrated network management
LHC: simulations and SC5; support 1000 users and a 20 PB data archive; contribute to the Worldwide LHC Computing Grid; LHC event data distribution and analysis
LIGO: contribute to LIGO workflow and data analysis; LIGO data run; LIGO Data Grid dependent on OSG; Advanced LIGO
STAR, CDF, D0, astrophysics: D0 reprocessing and simulations; CDF simulation and analysis; STAR data distribution and jobs (10K jobs per day)
Additional science communities: +1 community each year
Facility security: risk assessment, audits, incident response, management, operations, technical controls (security plan V1, first audit, then annual risk assessments and audits)
VDT and OSG software releases: major release every 6 months, minor/incremental updates as needed (VDT 1.4.0, 1.4.1, 1.4.2, …; OSG 0.6.0, 0.8.0, 1.0, 2.0, 3.0, …); dCache with role-based authorization; accounting and auditing; VDS with SRM; common software distribution with TeraGrid; EGEE using VDT 1.4.x; transparent data and job movement with TeraGrid; transparent data management with EGEE; federated monitoring and information services; data analysis (batch and interactive); workflow
Extended capabilities, scalability, and performance for jobs and data to meet stakeholder needs: SRM/dCache extensions, "just in time" workload management, VO services infrastructure, improved workflow and resource selection, work with SciDAC-2 CEDS and security with Open Science
47
Extra Slides
48
VDT Release Process ("subway map", from Alain Roy)
Stages over time (Day 0 through Day N): gather requirements, build software, test, validation test bed, ITB release candidate, VDT release, integration test bed, OSG release
49
VDT Challenges
How should we smoothly update a production service? In-place vs. on-the-side updates (a small illustration of the on-the-side pattern follows below); preserve the old configuration while making big changes; a full install and setup from scratch still takes hours
How do we support more platforms? A struggle to keep up with the onslaught of Linux distributions (Fedora Core 3/4/6, RHEL 3/4, BCCD, …); AIX? Mac OS X? Solaris?
How can we accommodate native packaging formats (RPM, Deb)?
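As a small illustration of the "on-the-side" update pattern mentioned above (my own sketch, not how the VDT itself was implemented): install the new version next to the old one and switch a "current" symlink once it is validated, so rollback is just switching the link back.

```python
# Sketch of an "on-the-side" update: each version lives in its own directory
# and a 'current' symlink selects the active one. Illustrative only.
import os
import tempfile

root = tempfile.mkdtemp()                        # stand-in for an install area
for version in ("vdt-1.6.0", "vdt-1.6.1"):
    os.makedirs(os.path.join(root, version))     # pretend these are full installs

def activate(version, root):
    """Atomically point 'current' at the given version directory."""
    target = os.path.join(root, version)
    link = os.path.join(root, "current")
    tmp_link = link + ".new"
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link)                   # atomic swap on POSIX
    return os.readlink(link)

print(activate("vdt-1.6.0", root))               # old version in production
print(activate("vdt-1.6.1", root))               # switch after validation
print(activate("vdt-1.6.0", root))               # rollback: switch back
```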