1
Grid3 and Open Science Grid
Paul Avery, University of Florida (avery@phys.ufl.edu)
Mardi Gras Conference, Louisiana State University, Baton Rouge, February 3, 2005
2
Data Grids & Collaborative Research
Scientific discovery increasingly dependent on collaboration: computationally and data-intensive analyses, with resources and collaborations distributed internationally
Dominant factor: data growth (1 Petabyte = 1000 TB); the implied growth rate is worked out in the sketch below
  2000: ~0.5 Petabyte
  2005: ~10 Petabytes
  2012: ~100 Petabytes
  2018: ~1000 Petabytes?
Drives the need for powerful linked resources: "Data Grids"
  Computation: massive, distributed CPU
  Data storage and access: distributed high-speed disk and tape
  Data movement: international optical networks
Collaborative research and Data Grids: data discovery, resource sharing, distributed analysis, etc.
How to collect, manage, access and interpret this quantity of data?
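A minimal sketch of the compound growth these volume estimates imply; the volumes are the slide's approximations, and the annual and per-decade factors are simply derived from them, not quoted from the talk.

```python
# Rough compound growth rate implied by the data-volume estimates above
# (0.5 PB in 2000 growing to ~1000 PB by 2018). Values are the slide's
# approximations, not measurements.
volumes_pb = {2000: 0.5, 2005: 10, 2012: 100, 2018: 1000}

years = 2018 - 2000
factor = volumes_pb[2018] / volumes_pb[2000]   # ~2000x overall
annual = factor ** (1 / years)                 # ~1.5x per year
per_decade = annual ** 10                      # ~70x per decade

print(f"Overall growth 2000-2018: {factor:.0f}x")
print(f"Implied annual growth:    {annual:.2f}x per year")
print(f"Implied decade growth:    {per_decade:.0f}x per decade")
```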
3
Data-Intensive Disciplines
High energy & nuclear physics: Belle, BaBar, Tevatron, RHIC, JLab; Large Hadron Collider (LHC)
Astronomy: digital sky surveys, "Virtual" Observatories; VLBI arrays with multiple-Gbps data streams
Gravity wave searches: LIGO, GEO, VIRGO, TAMA, ACIGA, ...
Earth and climate systems: Earth observation, climate modeling, oceanography, ...
Biology, medicine, imaging: genome databases; proteomics (protein structure & interactions, drug delivery, ...); high-resolution brain scans (1-10 μm, time dependent)
4
Background: Data Grid Projects
U.S. projects: GriPhyN (NSF), iVDGL (NSF), Particle Physics Data Grid (DOE), Open Science Grid, UltraLight, TeraGrid (NSF), DOE Science Grid (DOE), NEESgrid (NSF), NSF Middleware Initiative (NSF)
EU and Asia projects: EGEE (EU), LCG (CERN), DataGrid (EU), national projects, DataTAG (EU), CrossGrid (EU), GridLab (EU), Japanese and Korean projects
Not exclusively HEP, but many projects are driven or led by HEP
Many 10s of $M brought into the field
Large impact on other sciences and on education
5
U.S. "Trillium" Grid Consortium
Trillium = PPDG + GriPhyN + iVDGL
  Particle Physics Data Grid: $12M (DOE), 1999 - 2004+
  GriPhyN: $12M (NSF), 2000 - 2005
  iVDGL: $14M (NSF), 2001 - 2006
Basic composition (~150 people)
  PPDG: 4 universities, 6 labs
  GriPhyN: 12 universities, SDSC, 3 labs
  iVDGL: 18 universities, SDSC, 4 labs, foreign partners
  Experiments: BaBar, D0, STAR, JLab, CMS, ATLAS, LIGO, SDSS/NVO
Complementarity of projects
  GriPhyN: CS research, Virtual Data Toolkit (VDT) development
  PPDG: "end to end" Grid services, monitoring, analysis
  iVDGL: Grid laboratory deployment using the VDT
  Experiments provide frontier challenges
Unified entity when collaborating internationally
6
Goal: Peta-scale Data Grids for Global Science
[Architecture diagram: production teams, workgroups and single researchers use interactive user tools, virtual data tools, request planning & scheduling tools, and request execution & management tools; these rest on resource management, security and policy, and other Grid services, which drive transforms, raw data sources and distributed resources (code, storage, CPUs, networks) at PetaOps/Petabytes performance.]
7
Trillium Science Drivers
  Experiments at the Large Hadron Collider: 100s of Petabytes, 2007 - ?
  High energy & nuclear physics experiments: ~1 Petabyte (1000 TB), 1997 - present
  LIGO (gravity wave search): 100s of Terabytes, 2002 - present
  Sloan Digital Sky Survey: 10s of Terabytes, 2001 - present
Data growth and community growth through 2001 - 2009
Future Grid resources: massive CPU (PetaOps), large distributed datasets (>100 PB), global communities (1000s)
8
Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN
[Figure: galaxy cluster size distribution derived from Sloan data.]
9
The LIGO Scientific Collaboration (LSC) and the LIGO Grid
LIGO Grid: 6 US sites plus 3 EU sites (Birmingham and Cardiff in the UK, AEI/Golm in Germany)
LHO and LLO are the observatory sites; the LSC (LIGO Scientific Collaboration) sites are iVDGL supported
iVDGL has enabled the LSC to establish a persistent production grid
10
Large Hadron Collider (LHC) @ CERN
Search for the origin of mass & supersymmetry (2007 - ?)
27 km tunnel in Switzerland & France
Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM
11
LHC Data Rates: Detector to Storage
Physics filtering chain, from ~TBytes/sec off the detector down to storage (the rate/bandwidth figures are cross-checked in the sketch below):
  Level 1 Trigger (special hardware): 40 MHz in, 75 kHz out (75 GB/sec)
  Level 2 Trigger (commodity CPUs): 5 kHz out (5 GB/sec)
  Level 3 Trigger (commodity CPUs): 100 Hz out, 0.1 - 1.5 GB/sec of raw data to storage (+ simulated data)
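A minimal cross-check of the trigger figures above; the ~1 MB average event size is inferred from the quoted rate/bandwidth pairs (75 GB/s at 75 kHz), not an official detector parameter.

```python
# Cross-check of the LHC trigger chain figures on this slide.
# Event size is inferred from the quoted rates (75 GB/s at 75 kHz -> ~1 MB/event);
# it is an approximation, not an official detector parameter.
MB = 1e6   # bytes
GB = 1e9   # bytes

stages = [
    ("Level 1 (special hardware)", 75e3),  # output rate in Hz
    ("Level 2 (commodity CPUs)",   5e3),
    ("Level 3 (commodity CPUs)",   100),
]

event_size = 1 * MB  # inferred average event size
for name, rate_hz in stages:
    bandwidth = rate_hz * event_size / GB
    print(f"{name}: {rate_hz:>8.0f} Hz -> {bandwidth:.3f} GB/s")
# Level 1 -> 75 GB/s, Level 2 -> 5 GB/s, Level 3 -> 0.1 GB/s,
# matching the slide's 75 GB/s, 5 GB/s and 0.1 - 1.5 GB/s (larger
# reconstructed events push the last figure toward 1.5 GB/s).
```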
12
Complexity: Higgs Decay into 4 Muons
10^9 collisions/sec, selectivity: 1 in 10^13
13
LHC: Petascale Global Science
  Complexity: millions of individual detector channels
  Scale: PetaOps (CPU), 100s of Petabytes (data)
  Distribution: global distribution of people & resources
CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries
BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries
14
LHC Global Collaborations (ATLAS, CMS)
1000 - 4000 collaborators per experiment; the USA is 20 - 25% of the total
15
CMS Experiment: LHC Global Data Grid (2007+)
  Online system -> CERN Computer Center (Tier 0) at 0.1 - 1.5 GB/s
  Tier 1 national centers (USA, Korea, Russia, UK) linked at >10 Gb/s and 10-40 Gb/s
  Tier 2 centers (Caltech, UCSD, U Florida, Iowa, Maryland) at 2.5-10 Gb/s
  Tier 3 sites (e.g., FIU) and Tier 4: physics caches and PCs
5000 physicists, 60 countries
10s of Petabytes/yr by 2008 (see the rough estimate below); 1000 Petabytes in < 10 yrs?
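A rough, hedged estimate of how the 0.1 - 1.5 GB/s raw rate could add up to the "10s of Petabytes/yr" quoted above; the 10^7 seconds of live data taking per year is an assumed figure, not something stated on the slide.

```python
# Back-of-envelope yearly data volume from the raw rate to storage.
# The 1e7 seconds of live data taking per year is an assumption,
# not taken from the slide.
SECONDS_PER_YEAR_LIVE = 1e7
PB = 1e15  # bytes

for rate_gb_s in (0.1, 1.0, 1.5):
    volume_pb = rate_gb_s * 1e9 * SECONDS_PER_YEAR_LIVE / PB
    print(f"{rate_gb_s:.1f} GB/s for 1e7 s/yr -> {volume_pb:.0f} PB/yr")
# 0.1 GB/s -> 1 PB/yr, 1.0 GB/s -> 10 PB/yr, 1.5 GB/s -> 15 PB/yr,
# consistent with "10s of Petabytes/yr by 2008" once simulated and
# derived datasets are added on top of the raw data.
```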
16
CMS: Grid Enabled Analysis (GAE) Architecture
[Architecture diagram: an Analysis Client (ROOT, Python, Cojac detector viz / IGUANA CMS viz) talks over HTTP, SOAP or XML-RPC to a Grid Services Web Server (Clarens, VDT-Server) that fronts a scheduler (Sphinx), catalogs (RefDB, MOPDB, POOL), fully/partially abstract and fully concrete planners (MCRunjob), a grid-wide execution service (BOSS), data management, virtual data (Chimera), replica, metadata and discovery services, monitoring (MonALISA), ACL management and certificate-based access; applications include ORCA, FAMOS and ROOT.]
Clients talk standard protocols to the "Grid Services Web Server" (illustrated in the sketch below)
A simple Web service API allows simple or complex analysis clients
Typical clients: ROOT, Web browser, ...
The Clarens portal hides complexity
Key features: global scheduler, catalogs, monitoring, grid-wide execution service
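To make the "clients talk standard protocols" point concrete, here is a minimal XML-RPC client sketch in Python; the server URL and the catalog.list_datasets method are hypothetical placeholders and do not reproduce the actual Clarens API.

```python
# Minimal sketch of an analysis client calling a grid web service over
# XML-RPC, in the spirit of the Clarens-style architecture above. The URL
# and the "catalog.list_datasets" method are hypothetical placeholders;
# the real Clarens service names are not reproduced here.
import xmlrpc.client

SERVER_URL = "https://grid-services.example.org/clarens"  # hypothetical endpoint

def list_datasets(pattern: str):
    """Ask a (hypothetical) catalog service for datasets matching a pattern."""
    proxy = xmlrpc.client.ServerProxy(SERVER_URL)
    return proxy.catalog.list_datasets(pattern)

if __name__ == "__main__":
    try:
        print(list_datasets("cms/dc04/*"))
    except (OSError, xmlrpc.client.Error) as err:
        # Expected when no such server exists; the point is only the call shape.
        print(f"Could not reach grid service: {err}")
```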
17
Collaborative Research by Globally Distributed Teams
Non-hierarchical: chaotic analyses + productions
Superimpose significant random data flows
18
Trillium Grid Tools: Virtual Data Toolkit (VDT)
Build flow: sources from CVS (plus contributors such as VDS) are patched into GPT source bundles, then built, tested and packaged on the NMI Build & Test facility (a Condor pool of 37 computers); the resulting VDT build is published as RPMs and binaries through a Pacman cache. The Test step will use NMI processes later.
19
VDT Growth Over 3 Years (release milestones)
  VDT 1.0: Globus 2.0b, Condor 6.3.1
  VDT 1.1.3, 1.1.4 & 1.1.5: pre-SC 2002
  VDT 1.1.7: switch to Globus 2.2
  VDT 1.1.8: first real use by LCG
  VDT 1.1.11: Grid3
  VDT 1.1.14: May 10
20
Packaging of Grid Software: Pacman
Language: define software environments
Interpreter: create, install, configure, update, verify environments
Version 3.0.2 released Jan. 2005
Combines and manages software from arbitrary sources: LCG/Scram, ATLAS/CMT, CMS DPE/tar/make, LIGO/tar/make, OpenSource/tar/make, Globus/GPT, NPACI/TeraGrid/tar/make, D0/UPS-UPD, Commercial/tar/make
"1 button install" reduces the burden on administrators: remote experts define installation, configuration and updating for everyone at once
  % pacman -get iVDGL:Grid3
Caches include VDT, ATLAS, NPACI, D-Zero, iVDGL, UCHEP, CMS/DPE, LIGO
21
Collaborative Relationships: A CS Perspective
Computer science research feeds the Virtual Data Toolkit, which is deployed in production by partner science, networking and outreach projects and, through them, reaches the larger science community.
Requirements, prototyping & experiments flow back; techniques & software are transferred onward (tech transfer).
Partners include Globus, Condor, NMI, iVDGL, PPDG, EU DataGrid, the LHC experiments, QuarkNet, CHEPREO and Digital Divide projects; other linkages: work force, CS researchers, industry, U.S. and international Grids, outreach.
22
Grid3: An Operational National Grid
35 sites, 3500 CPUs: universities + 4 national labs
Part of the LHC Grid
Running since October 2003
Applications in HEP, LIGO, SDSS, genomics, CS
http://www.ivdgl.org/grid3
23
Grid3 Applications
High energy physics: US-ATLAS analysis (DIAL), US-ATLAS GEANT3 simulation (GCE), US-CMS GEANT4 simulation (MOP), BTeV simulation
Gravity waves: LIGO blind search for continuous sources
Digital astronomy: SDSS cluster finding (maxBcg)
Bioinformatics: bio-molecular analysis (SnB), genome analysis (GADU/Gnare)
CS demonstrators: Job Exerciser, GridFTP, NetLogger-grid2003
24
Grid3 Shared Use Over 6 Months
[CPU usage plot over six months (through Sep 10), with the CMS DC04 and ATLAS DC2 productions labeled.]
25
Grid3 -> Open Science Grid
Iteratively build & extend Grid3 into a national infrastructure
  Shared resources, benefiting a broad set of disciplines
  Realization of the critical need for operations
  More formal organization needed because of scale
Build OSG from laboratories, universities, campus grids, etc.
  Argonne, Fermilab, SLAC, Brookhaven, Berkeley Lab, Jefferson Lab
  UW Madison, U Florida, Purdue, Chicago, Caltech, Harvard, etc.
Further develop OSG
  Partnerships and contributions from other sciences, universities
  Incorporation of advanced networking
  Focus on general services, operations, end-to-end performance
26
http://www.opensciencegrid.org
27
Open Science Grid Basics
OSG infrastructure
  A large CPU & storage Grid infrastructure supporting science
  Grid middleware based on the Virtual Data Toolkit (VDT)
  Loosely coupled, consistent infrastructure: a "Grid of Grids"
  Emphasis on "end to end" services for applications
OSG collaboration builds on Grid3
  Computer and application scientists
  Facility, technology and resource providers
  Grid3 -> OSG-0 -> OSG-1 -> OSG-2 -> ...
Fundamental unit is the Virtual Organization (VO)
  E.g., an experimental collaboration, a research group, a class
  Simplifies organization and logistics
Distributed ownership of resources
  Local facility policies, priorities, and capabilities must be supported
28
OSG Integration: Applications, Infrastructure, Facilities
29
OSG Organization
[Organization chart: the OSG Council (all members above a certain threshold, with Chair and officers) and an Executive Board (8-15 representatives, Chair, officers) oversee a small core OSG staff (a few FTEs plus a manager), advised by an Advisory Committee; Technical Groups sponsor Activities; stakeholders include research Grid projects, VOs, researchers, sites, service providers, universities, labs and enterprise.]
30
OSG Technical Groups and Activities
Technical Groups address and coordinate a technical area
  Propose and carry out activities related to their given areas
  Liaise & collaborate with other peer projects (U.S. & international)
  Participate in relevant standards organizations
  Chairs participate in the Blueprint, Grid Integration and Deployment activities
Activities are well-defined, scoped sets of tasks contributing to the OSG
  Each Activity has deliverables and a plan
  ... is self-organized and operated
  ... is overseen & sponsored by one or more Technical Groups
31
OSG Technical Groups (7 currently)
  Governance: charter, organization, by-laws, agreements, formal processes
  Policy: VO & site policy, authorization, priorities, privilege & access rights
  Security: common security principles, security infrastructure
  Monitoring and Information Services: resource monitoring, information services, auditing, troubleshooting
  Storage: storage services at remote sites, interfaces, interoperability
  Support Centers: infrastructure and services for user support, helpdesk, trouble tickets
  Education and Outreach
  Networking?
32
OSG Activities (5 currently)
  Blueprint: defining principles and best practices for OSG
  Deployment: deployment of resources & services
  Incident Response: plans and procedures for responding to security incidents
  Integration: testing, validating & integrating new services and technologies
  Data Resource Management (DRM): deployment of specific Storage Resource Management technology
33
OSG Short Term Plans
Maintain Grid3 operations in parallel with extending Grid3 to OSG
OSG technology advances for Spring 2005 deployment
  Add full Storage Elements
  Extend authorization services
  Extend data management services
  Interface to sub-Grids
  Extend monitoring, testing, accounting
  Add new VOs + OSG-wide VO services
  Add a Discovery Service
Service challenges & collaboration with the LCG
Make the switch to "Open Science Grid" in Spring 2005
34
Open Science Grid Meetings
  Sep. 17, 2003 @ NSF: strong interest of NSF education people
  Jan. 12, 2004 @ Fermilab: initial stakeholders meeting, 1st discussion of governance
  May 20-21, 2004 @ Univ. of Chicago: joint Trillium Steering meeting to define the OSG program
  July 2004 @ Wisconsin: first attempt to define the OSG Blueprint (document)
  Sep. 9-10, 2004 @ Harvard: major OSG workshop (technical, governance, sciences)
  Dec. 15-17, 2004 @ UCSD: major meeting for Technical Groups
  Feb. 15-17, 2005 @ U Chicago: integration meeting
35
Networks
36
Networks and Grids for Global Science
Network backbones and major links are advancing rapidly
  To the 10G range in < 3 years; faster than Moore's Law (compared in the sketch below)
  New HENP and DOE roadmaps: a factor of ~1000 bandwidth growth per decade
  We are learning to use long-distance 10 Gbps networks effectively (2004 developments: 7 - 7.5 Gbps TCP flows over 16,000 - 25,000 km)
Transition to community-operated optical R&E networks (US, CA, NL, PL, CZ, SK, KR, JP, ...)
  Emergence of a new generation of "hybrid" optical networks
We must work to close the digital divide
  To allow scientists in all world regions to take part in discoveries
  Regional, last-mile and local bottlenecks and compromises in network quality are now on the critical path
Important examples on the road to closing the digital divide
  CLARA, CHEPREO, and the Brazil HEPGrid in Latin America
  Optical networking in Central and Southeast Europe
  APAN links in the Asia Pacific: GLORIAD and TEIN
  Leadership and outreach: HEP groups in Europe, US, Japan, & Korea
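A minimal sketch behind the "faster than Moore's Law" comparison; the 1000x-per-decade figure is from the roadmap above, while the 18-month doubling time used for Moore's Law is an assumed rule of thumb, not taken from the slide.

```python
# Compare the HENP/DOE bandwidth roadmap (~1000x per decade) with a
# Moore's-Law-style doubling every 18 months (an assumed rule of thumb).
bw_factor_per_decade = 1000
bw_annual = bw_factor_per_decade ** (1 / 10)        # ~2.0x per year

moore_doubling_months = 18
moore_annual = 2 ** (12 / moore_doubling_months)    # ~1.59x per year
moore_per_decade = moore_annual ** 10               # ~100x per decade

print(f"Bandwidth roadmap: {bw_annual:.2f}x/yr, {bw_factor_per_decade}x/decade")
print(f"Moore's Law:       {moore_annual:.2f}x/yr, {moore_per_decade:.0f}x/decade")
```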
37
HEP Bandwidth Roadmap (Gb/s)
38
Evolving Science Requirements for Networks (DOE High Performance Network Workshop)
End-to-end throughput today / in 5 years / in 5-10 years, with remarks:
  High Energy Physics: 0.5 Gb/s / 100 Gb/s / 1000 Gb/s (high bulk throughput)
  Climate (data & computation): 0.5 Gb/s / 160-200 Gb/s / N x 1000 Gb/s (high bulk throughput)
  SNS NanoScience: not yet started / 1 Gb/s / 1000 Gb/s + QoS for control channel (remote control and time-critical throughput)
  Fusion Energy: 0.066 Gb/s (500 MB/s burst) / 0.2 Gb/s (500 MB per 20 sec. burst) / N x 1000 Gb/s (time-critical throughput)
  Astrophysics: 0.013 Gb/s (1 TB/week) / N*N multicast / 1000 Gb/s (computational steering and collaborations)
  Genomics (data & computation): 0.091 Gb/s (1 TB/day) / 100s of users / 1000 Gb/s + QoS for control channel (high throughput and steering)
The "today" figures for Astrophysics and Genomics are re-derived in the sketch below.
See http://www.doecollaboratory.org/meetings/hpnpw/
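A small sketch that re-derives those "today" figures from the 1 TB/week and 1 TB/day transfer volumes quoted in the table, assuming decimal terabytes and 8 bits per byte.

```python
# Convert the table's bulk-transfer figures into sustained throughput,
# assuming decimal units (1 TB = 1e12 bytes, 8 bits per byte).
BITS_PER_TB = 8e12
GBPS = 1e9  # bits per second in one Gb/s

def sustained_gbps(terabytes: float, seconds: float) -> float:
    """Average throughput needed to move `terabytes` within `seconds`."""
    return terabytes * BITS_PER_TB / seconds / GBPS

print(f"Astrophysics, 1 TB/week: {sustained_gbps(1, 7 * 24 * 3600):.3f} Gb/s")  # ~0.013
print(f"Genomics,     1 TB/day:  {sustained_gbps(1, 24 * 3600):.3f} Gb/s")      # ~0.093
```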
39
UltraLight: Advanced Networking in Applications
10 Gb/s+ network
Partners: Caltech, UF, FIU, UM, MIT, SLAC, FNAL, international partners, Level(3), Cisco, NLR
Funded by ITR 2004
40
Education and Outreach
41
Grids and the Digital Divide, Rio de Janeiro, Feb. 16-20, 2004 (http://www.uerj.br/lishep2004)
Background: World Summit on the Information Society; HEP Standing Committee on Inter-regional Connectivity (SCIC)
Themes: global collaborations, Grids and addressing the Digital Divide
Next meeting: May 2005 (Korea)
42
Second Digital Divide Grid Meeting
International Workshop on HEP Networking, Grids and Digital Divide Issues for Global e-Science
May 23-27, 2005, Daegu, Korea
Prof. Dongchul Son, Center for High Energy Physics, Kyungpook National University
43
iVDGL, GriPhyN Education / Outreach
Basics: $200K/yr, led by UT Brownsville
Workshops, portals
Partnerships with CHEPREO, QuarkNet, ...
44
Grid Summer School, June 21-25
First of its kind in the U.S. (South Padre Island, Texas)
  36 students, diverse origins and types (M, F, MSIs, etc.)
Marks a new direction for U.S. Grid efforts
  First attempt to systematically train people in Grid technologies
  First attempt to gather relevant materials in one place
  Today: students in CS and physics; later: students, postdocs, junior & senior scientists
Reaching a wider audience
  Put lectures, exercises, video on the web
  More tutorials, perhaps 2+/year
  Dedicated resources for remote tutorials
  Create a "Grid book" (e.g., Georgia Tech)
New funding opportunities
  NSF: new training & education programs
45
CHEPREO: Center for High Energy Physics Research and Educational Outreach (Florida International University)
  Physics Learning Center
  CMS research
  iVDGL Grid activities
  AMPATH network (S. America)
Funded September 2003: $4M initially (3 years), 4 NSF Directorates!
46
QuarkNet/GriPhyN e-Lab Project
47
Chiron/QuarkNet Architecture
48
Muon Lifetime Analysis Workflow
49
QuarkNet Portal Architecture
Simpler interface for non-experts
Builds on the Chiron portal
50
Summary
Grids enable 21st-century collaborative science
  Linking research communities and resources for scientific discovery
  Needed by LHC global collaborations pursuing "petascale" science
Grid3 was an important first step in developing U.S. Grids
  Value of planning, coordination, testbeds, rapid feedback
  Value of building & sustaining community relationships
  Value of learning how to operate a Grid as a facility
  Value of delegation, services, documentation, packaging
Grids drive the need for advanced optical networks
Grids impact education and outreach
  Providing technologies & resources for training, education, outreach
  Addressing the Digital Divide
OSG: a scalable computing infrastructure for science?
  Strategies needed to cope with increasingly large scale
51
Grid Project References
  Grid3: www.ivdgl.org/grid3
  Open Science Grid: www.opensciencegrid.org
  GriPhyN: www.griphyn.org
  iVDGL: www.ivdgl.org
  PPDG: www.ppdg.net
  CHEPREO: www.chepreo.org
  UltraLight: ultralight.cacr.caltech.edu
  Globus: www.globus.org
  LCG: www.cern.ch/lcg
  EU DataGrid: www.eu-datagrid.org
  EGEE: www.eu-egee.org