Slide 1: Global Data Grids: The Need for Infrastructure
Paul Avery, University of Florida
http://www.phys.ufl.edu/~avery/   avery@phys.ufl.edu
Extending the Grid Reach in Europe, Brussels, Mar. 23, 2001
http://www.phys.ufl.edu/~avery/griphyn/talks/avery_brussels_23mar01.ppt
Slide 2: Global Data Grid Challenge
"Global scientific communities, served by networks with bandwidths varying by orders of magnitude, need to perform computationally demanding analyses of geographically distributed datasets that will grow by at least 3 orders of magnitude over the next decade, from the 100 Terabyte to the 100 Petabyte scale."
Slide 3: Data Intensive Science: 2000-2015
- Scientific discovery increasingly driven by IT
  - Computationally intensive analyses
  - Massive data collections
  - Rapid access to large subsets
  - Data distributed across networks of varying capability
- Dominant factor: data growth (1 Petabyte = 1000 TB)
  - 2000: ~0.5 Petabyte
  - 2005: ~10 Petabytes
  - 2010: ~100 Petabytes
  - 2015: ~1000 Petabytes?
- How to collect, manage, access, and interpret this quantity of data?
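A back-of-the-envelope calculation makes the growth projection concrete. The sketch below (Python; it uses only the slide's own estimates, and the script itself is illustrative, not part of the talk) computes the total and implied annual growth factors:

```python
# Illustrative sketch: growth implied by the data-volume projections on the slide.
volumes_pb = {2000: 0.5, 2005: 10, 2010: 100, 2015: 1000}  # Petabytes (slide estimates)

start_year, end_year = 2000, 2015
total_growth = volumes_pb[end_year] / volumes_pb[start_year]
annual_factor = total_growth ** (1 / (end_year - start_year))

print(f"Total growth {start_year}-{end_year}: x{total_growth:.0f}")
print(f"Implied annual growth: ~{annual_factor:.2f}x per year "
      f"(about {100 * (annual_factor - 1):.0f}% per year)")
```

At these rates the archives roughly double every 16 months, which is the scale of the collection and access problem the slide poses.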
Slide 4: Data Intensive Disciplines
- High energy & nuclear physics
- Gravity wave searches (e.g., LIGO, GEO, VIRGO)
- Astronomical sky surveys (e.g., Sloan Sky Survey)
- Global "Virtual" Observatory
- Earth Observing System
- Climate modeling
- Geophysics
Slide 5: Data Intensive Biology and Medicine
- Radiology data
- X-ray sources (APS crystallography data)
- Molecular genomics (e.g., Human Genome)
- Proteomics (protein structure, activities, ...)
- Simulations of biological molecules in situ
- Human Brain Project
- Global Virtual Population Laboratory (disease outbreaks)
- Telemedicine
- Etc.
Commercial applications are not far behind.
Slide 6: The Large Hadron Collider at CERN
[Figure: the "Compact" Muon Solenoid (CMS) detector at the LHC, with a standard man shown for scale]
Slide 7: LHC Computing Challenges
CMS Experiment: 1800 physicists, 150 institutes, 32 countries
- Complexity of the LHC environment and the resulting data
- Scale: Petabytes of data per year (100 PB by 2010)
- Global distribution of people and resources
Slide 8: Global LHC Data Grid Hierarchy
[Figure: hierarchy diagram with Tier 0 at CERN fanning out through Tier 1, Tier 2, and Tier 3 sites to Tier 4 workstations]
- Tier 0: CERN
- Tier 1: National laboratory
- Tier 2: Regional center at a university
- Tier 3: University workgroup
- Tier 4: Workstation
GriPhyN:
- R&D
- Tier 2 centers
- Unify all IT resources
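The tier structure lends itself to a compact model. The sketch below (Python; only the tier levels come from the slide, while the example site names and fan-out are hypothetical placeholders for illustration) encodes the hierarchy as a simple tree:

```python
# Illustrative model of the LHC Data Grid tier hierarchy from the slide.
# Tier levels are from the slide; the example sites and fan-out are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    tier: int                              # 0 = CERN, 1 = national lab, ... 4 = workstation
    children: list = field(default_factory=list)

tier0 = Site("CERN", 0, [
    Site("Example national laboratory", 1, [
        Site("Example regional university center", 2, [
            Site("Example university workgroup", 3, [
                Site("Example workstation", 4),
            ]),
        ]),
    ]),
])

def print_tree(site: Site, indent: int = 0) -> None:
    """Print the hierarchy, indenting one level per tier."""
    print("  " * indent + f"Tier {site.tier}: {site.name}")
    for child in site.children:
        print_tree(child, indent + 1)

print_tree(tier0)
```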
Slide 9: Global LHC Data Grid Hierarchy
[Figure: dataflow diagram from the experiment through the tier hierarchy down to physicist workstations]
- Experiment → Online System: ~PBytes/sec of raw detector data
- Bunch crossings every 25 nsec; ~100 triggers per second; each event is ~1 MByte
- Online System → Tier 0 (+1), the CERN Computer Center (> 20 TIPS): ~100 MBytes/sec
- Tier 1: national centers (France, USA, Italy, UK), connected at 2.5-10 Gb/sec
- Tier 2: regional centers, connected at ~622 Mbits/sec to 2.5-10 Gb/sec
- Tier 3: institutes (~0.25 TIPS) with a physics data cache, connected at 100-1000 Mbits/sec
- Tier 4: workstations and other portals
- Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
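The quoted rates follow directly from the event parameters on the slide. A short worked calculation (Python; all inputs are the slide's figures, and the arithmetic is the only addition) shows the reduction the online trigger must achieve and the resulting recorded data rate:

```python
# Worked example using the slide's figures: 25 nsec bunch crossings,
# ~100 triggered events per second, ~1 MByte per stored event.
bunch_crossing_interval_s = 25e-9
triggered_events_per_s = 100
event_size_bytes = 1e6

crossing_rate_hz = 1 / bunch_crossing_interval_s
trigger_rejection = crossing_rate_hz / triggered_events_per_s
recorded_bytes_per_s = triggered_events_per_s * event_size_bytes
seconds_per_year = 3.15e7

print(f"Bunch crossing rate: {crossing_rate_hz / 1e6:.0f} MHz")
print(f"Online trigger keeps roughly 1 in {trigger_rejection:,.0f} crossings")
print(f"Recorded data rate: {recorded_bytes_per_s / 1e6:.0f} MBytes/sec "
      f"(~{recorded_bytes_per_s * seconds_per_year / 1e15:.0f} PB per year of running)")
```

The ~100 MBytes/sec result matches the Online System → Tier 0 link on the diagram, and accumulating it over a year of running gives the Petabytes-per-year scale quoted earlier.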
Slide 10: Global Virtual Observatory
[Figure: architecture diagram linking archives and data sources to discovery tools through common standards]
- Source catalogs and image data
- Specialized data: spectroscopy, time series, polarization
- Information archives: derived & legacy data (NED, Simbad, ADS, etc.)
- Discovery tools: visualization, statistics
- Standards: multi-wavelength astronomy, multiple surveys
Slide 11: GVO: The New Astronomy
- Large, globally distributed database engines
  - Integrated catalog and image databases
  - Multi-Petabyte data size
  - GByte/s aggregate I/O speed per site
- High-speed (>10 Gbits/s) backbones
  - Cross-connecting and correlating the major archives
- Scalable computing environment
  - 100s-1000s of CPUs for statistical analysis and discovery
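The GByte/s aggregate I/O figure is easiest to appreciate as a scan time. The sketch below (Python; the 1 PB archive size is an assumed round number for illustration, while the I/O rate is the slide's) estimates how long a full pass over the data takes:

```python
# Illustrative estimate: full-scan time of a large archive at the slide's I/O rate.
archive_size_bytes = 1e15      # 1 Petabyte -- assumed example size
io_rate_bytes_per_s = 1e9      # 1 GByte/s aggregate I/O per site (from the slide)

scan_time_s = archive_size_bytes / io_rate_bytes_per_s
print(f"Scanning 1 PB at 1 GByte/s: {scan_time_s:,.0f} s (~{scan_time_s / 86400:.1f} days)")
print(f"Spread across 10 sites in parallel: ~{scan_time_s / 10 / 3600:.0f} hours")
```

Even at the quoted rate a single site needs on the order of ten days for one pass, which is why the slide pairs per-site I/O with fast backbones and hundreds to thousands of CPUs.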
Slide 12: Infrastructure for Global Grids
Slide 13: Grid Infrastructure
- Grid computing is sometimes compared to the electric grid
  - You plug in to get a resource (CPU, storage, ...)
  - You don't care where the resource is located
- This analogy may have an unfortunate downside
  - You might need different sockets!
Slide 14: Role of Grid Infrastructure
- Provide essential common Grid infrastructure
  - Cannot afford to develop separate infrastructures
- Meet needs of high-end scientific collaborations
  - Already international and even global in scope
  - Need to share heterogeneous resources among members
  - Experiments drive future requirements
- Be broadly applicable outside science
  - Government agencies: national, regional (EU), UN
  - Non-governmental organizations (NGOs)
  - Corporations, business networks (e.g., supplier networks)
  - Other "virtual organizations"
- Be scalable to the global level
  - But EU + US is a good starting point
Slide 15: A Path to Common Grid Infrastructure
- Make a concrete plan
- Have a clear focus on infrastructure and standards
- Be driven by high-performance applications
- Leverage resources & act coherently
- Build large-scale Grid testbeds
- Collaborate with industry
Slide 16: Building Infrastructure from Data Grids
- 3 Data Grid projects recently funded
- Particle Physics Data Grid (US, DOE)
  - Data Grid applications for HENP
  - Funded 2000, 2001
  - http://www.ppdg.net/
- GriPhyN (US, NSF)
  - Petascale Virtual-Data Grids
  - Funded 9/2000 - 9/2005
  - http://www.griphyn.org/
- European Data Grid (EU)
  - Data Grid technologies, EU deployment
  - Funded 1/2001 - 1/2004
  - http://www.eu-datagrid.org/
Shared traits: HEP in common; focus on infrastructure development & deployment; international scope
Slide 17: Background on Data Grid Projects
- They support several disciplines
  - GriPhyN: CS, HEP (LHC), gravity waves, digital astronomy
  - PPDG: CS, HEP (LHC + current experiments), nuclear physics, networking
  - DataGrid: CS, HEP, earth sensing, biology, networking
- They are already joint projects
  - Each serving the needs of multiple constituencies
  - Each driven by high-performance scientific applications
  - Each with international components
  - Their management structures are interconnected
- Each project is developing and deploying infrastructure
  - US$23M (additional proposals for US$35M)
- What if they join forces?
Slide 18: A Common Infrastructure Opportunity
- GriPhyN + PPDG + EU-DataGrid + national efforts
  - France, Italy, UK, Japan
- Have agreed to collaborate and develop joint infrastructure
  - Initial meeting March 4 in Amsterdam to discuss issues
  - Future meetings in June, July
- Preparing management document
  - Joint management, technical boards + steering committee
  - Coordination of people, resources
  - An expectation that this will lead to real work
- Collaborative projects
  - Grid middleware
  - Integration into applications
  - Grid testbed: iVDGL
  - Network testbed (Foster): T3 = Transatlantic Terabit Testbed
Slide 19: iVDGL
- International Virtual-Data Grid Laboratory
  - A place to conduct Data Grid tests at scale
  - A concrete manifestation of world-wide grid activity
  - A continuing activity that will drive Grid awareness
  - A basis for further funding
- Scale of effort
  - National- and international-scale Data Grid tests and operations
  - Computationally and data intensive computing
  - Fast networks
- Who
  - Initially US-UK-EU
  - Other world regions later
  - Discussions with Russia, Japan, China, Pakistan, India, South America
Slide 20: iVDGL Parameters
- Local control of resources vitally important
  - Experiments and politics demand it
  - US, UK, France, Italy, Japan, ...
- Grid exercises
  - Must serve clear purposes
  - Will require configuration changes, which are not trivial
  - "Easy" intra-experiment tests first (10-20% of resources, national, transatlantic)
  - "Harder" wide-scale tests later (50-100% of all resources)
- Strong interest from other disciplines
  - Our CS colleagues (wide-scale tests)
  - Other HEP + NP experiments
  - Virtual Observatory (VO) community in Europe/US
  - Gravity wave community in Europe/US/(Japan?)
  - Bioinformatics
Slide 21: Revisiting the Infrastructure Path
- Make a concrete plan
  - GriPhyN + PPDG + EU DataGrid + national projects
- Have a clear focus on infrastructure and standards
  - Already agreed
  - COGS (Consortium for Open Grid Software) to drive standards?
- Be driven by high-performance applications
  - Applications are manifestly high-performance: LHC, GVO, LIGO/GEO/Virgo, ...
  - Identify challenges today to create tomorrow's Grids
Slide 22: Revisiting the Infrastructure Path (cont.)
- Leverage resources & act coherently
  - Well-funded experiments depend on Data Grid infrastructure
  - Collaborate with national laboratories: FNAL, BNL, RAL, Lyon, KEK, ...
  - Collaborate with other Data Grid projects: US, UK, France, Italy, Japan
  - Leverage new resources: DTF, CAL-IT2, ...
  - Work through the Global Grid Forum
- Build and maintain large-scale Grid testbeds
  - iVDGL
  - T3
- Collaborate with industry (next slide)
- EC investment in this opportunity
  - Leverage and extend existing projects and worldwide expertise
  - Invest in testbeds
  - Work with national projects (US/NSF, UK/PPARC, ...)
  - Part of the same infrastructure
Slide 23: Collaboration with Industry
- Industry efforts are similar, but only in spirit
  - ASP, P2P, home PCs, ...
  - The IT industry has mostly not invested in Grid R&D
  - We have different motives, objectives, and timescales
- Still many areas of common interest
  - Clusters, storage, I/O
  - Low-cost cluster management
  - High-speed, distributed databases
  - Local and wide-area networks, end-to-end performance
  - Resource sharing, fault tolerance, ...
- Fruitful collaboration requires clear objectives
- The EC could play an important role in enabling collaborations
Slide 24: Status of Data Grid Projects
- GriPhyN
  - US$12M funded by the NSF ITR 2000 program (5-year R&D)
  - 2001 supplemental funds requested for initial deployments
  - Submitting a 5-year proposal (US$15M) to NSF
  - Intent to fully develop production Data Grids
- Particle Physics Data Grid
  - Funded in 1999 and 2000 by DOE (US$1.2M per year)
  - Submitting a 3-year proposal (US$12M) to the DOE Office of Science
- EU DataGrid
  - 10M Euros funded by the EU (3 years, 2001-2004)
  - Submitting a proposal in April for additional funds
- Other projects?
Slide 25: Grid References
- Grid Book: www.mkp.com/grids
- Globus: www.globus.org
- Global Grid Forum: www.gridforum.org
- PPDG: www.ppdg.net
- EU DataGrid: www.eu-datagrid.org/
- GriPhyN: www.griphyn.org
Slide 26: Summary
- Grids will qualitatively and quantitatively change the nature of collaborations and approaches to computing
- Global Data Grids provide the challenges needed to build tomorrow's Grids
- We have a major opportunity to create common infrastructure
- Many challenges lie ahead during the coming transition
  - New grid projects will provide rich experience and lessons
  - It is difficult to predict the situation even 3-5 years ahead