Slide 1: LHC Data Grid - The GriPhyN Perspective
DOE/NSF Baseline Review of US-CMS Software and Computing
Brookhaven National Lab, Nov. 15, 2000
Paul Avery, University of Florida
http://www.phys.ufl.edu/~avery/  avery@phys.ufl.edu

Slide 2: Fundamental IT Challenge
"Scientific communities of thousands, distributed globally, and served by networks with bandwidths varying by orders of magnitude, need to extract small signals from enormous backgrounds via computationally demanding (Teraflops-Petaflops) analysis of datasets that will grow by at least 3 orders of magnitude over the next decade, from the 100 Terabyte to the 100 Petabyte scale."

Slide 3: GriPhyN = App. Science + CS + Grids
- Several scientific disciplines
  - US-CMS: High Energy Physics
  - US-ATLAS: High Energy Physics
  - LIGO/LSC: Gravity wave research
  - SDSS: Sloan Digital Sky Survey
- Strong partnership with computer scientists
- Design and implement production-scale grids
  - Maximize effectiveness of large, disparate resources
  - Develop common infrastructure, tools and services
  - Build on existing foundations: PPDG project, Globus tools
  - Integrate and extend capabilities of existing facilities
- ~$70M total cost (NSF)
  - $12M: R&D
  - $39M: Tier 2 center hardware, personnel
  - $19M: Networking

Slide 4: Importance of GriPhyN
- Fundamentally alters the conduct of scientific research
  - Old: people and resources flow inward to labs
  - New: resources and data flow outward to universities
- Strengthens universities
  - Couples universities to data-intensive science
  - Couples universities to national & international labs
  - Brings front-line research to students
  - Exploits intellectual resources of formerly isolated schools
  - Opens new opportunities for minority and women researchers
- Builds partnerships to drive new IT/science advances
  - Physics + Astronomy
  - Application Science + Computer Science
  - Universities + Laboratories
  - Fundamental sciences + IT infrastructure
  - Research Community + IT industry

Slide 5: GriPhyN R&D Funded
- NSF/ITR results announced Sep. 13
  - $11.9M from the Information Technology Research Program
  - $1.4M in matching from universities
  - Largest of all ITR awards
  - Joint NSF oversight from CISE and MPS
- Scope of ITR funding
  - Major costs for people, esp. students and postdocs
  - No hardware or professional staff for operations!
  - 2/3 CS + 1/3 application science
  - Industry partnerships being developed: Microsoft, Intel, IBM, Sun, HP, SGI, Compaq, Cisco

Slide 6: GriPhyN Institutions
U Florida, U Chicago, Caltech, U Wisconsin (Madison), USC/ISI, Harvard, Indiana, Johns Hopkins, Northwestern, Stanford, Boston U, U Illinois at Chicago, U Penn, U Texas (Brownsville), U Wisconsin (Milwaukee), UC Berkeley, UC San Diego, San Diego Supercomputer Center, Lawrence Berkeley Lab, Argonne, Fermilab, Brookhaven

Slide 7: Funding for Tier 2 Centers
- GriPhyN/ITR has no funds for hardware
- NSF is proposal driven
  - How to get $40M = hardware + software (all 4 experiments)?
  - Approx. $24M of this needed for ATLAS + CMS
- Long-term funding possibilities not clear
  - Requesting funds for FY2001 for prototype centers
  - NSF ITR2001 competition ($15M max)
  - Other sources?

Slide 8: Benefits of GriPhyN to LHC
- Virtual Data Toolkit for LHC computing
- Additional positions to support tool integration
  - 1 postdoc + 1 student, Indiana (ATLAS)
  - 1 postdoc/staff, Caltech (CMS)
  - 2 scientists + 1 postdoc + 2 students, Florida (CMS)
- Leverage vendor involvement for LHC computing
  - Dense computing clusters (Compaq, SGI)
  - Load balancing (SGI, IBM)
  - High performance storage (Sun, IBM)
  - High speed file systems (SGI)
  - Enterprise-wide distributed computing (Sun)
  - Performance monitoring (SGI, IBM, Sun, Compaq)
  - Fault-tolerant clusters (SGI, IBM)
  - Network services (Cisco)

Slide 9: Why a Data Grid: Physical
- Unified system: all computing resources are part of the grid
  - Efficient resource use (manage scarcity)
  - Resource discovery / scheduling / coordination truly possible
  - "The whole is greater than the sum of its parts"
- Optimal data distribution and proximity
  - Labs are close to the (raw) data they need
  - Users are close to the (subset) data they need
  - Minimize bottlenecks
- Efficient network use: local > regional > national > oceanic (see the sketch after this slide)
  - No choke points
- Scalable growth
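The locality ordering on this slide can be illustrated with a short, self-contained sketch. Everything in it (the cost weights, site names, and the tiny replica catalog) is an assumption made up for the example; it is not a GriPhyN or PPDG component.

```python
# Illustrative sketch only: pick the "closest" replica of a dataset, preferring
# local > regional > national > oceanic links. Cost weights, site names, and
# the catalog are assumptions for the example.

NETWORK_COST = {"local": 1, "regional": 10, "national": 100, "oceanic": 1000}

# Hypothetical replica catalog: dataset name -> list of (site, link_class)
REPLICA_CATALOG = {
    "reco_sample_v3": [
        ("cern-tier0", "oceanic"),
        ("fnal-tier1", "national"),
        ("ufl-tier2", "regional"),
    ],
}

def closest_replica(dataset, catalog=REPLICA_CATALOG):
    """Return the (site, link_class) reachable over the cheapest link class."""
    replicas = catalog.get(dataset, [])
    if not replicas:
        raise LookupError(f"no replica registered for {dataset!r}")
    return min(replicas, key=lambda replica: NETWORK_COST[replica[1]])

if __name__ == "__main__":
    print(closest_replica("reco_sample_v3"))   # ('ufl-tier2', 'regional')
```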

Slide 10: Why a Data Grid: Political
- A central lab cannot manage / help 1000s of users
  - Easier to leverage resources, maintain control, and assert priorities at the regional / local level
- Cleanly separates functionality
  - Different resource types in different Tiers
  - Organization vs. flexibility
  - Funding complementarity (NSF vs. DOE), targeted initiatives
- New IT resources can be added "naturally"
  - Matching resources at Tier 2 universities
  - Larger institutes can join, bringing their own resources
  - Tap into new resources opened by the IT "revolution"
- Broaden the community of scientists and students
  - Training and education
  - Vitality of the field depends on the University / Lab partnership

Slide 11: GriPhyN Research Agenda
- Virtual Data technologies
  - Derived data, calculable via algorithm (e.g., 90% of HEP data)
  - Instantiated 0, 1, or many times
  - Fetch data vs. execute algorithm (see the sketch after this slide)
  - Very complex (versions, consistency, cost calculation, etc.)
- Planning and scheduling
  - User requirements (time vs. cost)
  - Global and local policies + resource availability
  - Complexity of scheduling in a dynamic environment (hierarchy)
  - Optimization and ordering of multiple scenarios
  - Requires simulation tools, e.g. MONARC
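A minimal sketch of the fetch-vs-compute choice for derived data, assuming a hypothetical catalog entry that records an estimated compute cost and, if a replica already exists, a transfer cost. The field names, cost model, and fetch_replica stub are illustrative; they are not the GriPhyN virtual data interfaces.

```python
# Illustrative sketch of the "fetch data vs. execute algorithm" decision for
# virtual (derived) data. Entry fields, cost model, and the fetch stub are
# assumptions for the example.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class VirtualDataEntry:
    name: str
    transform: Callable[[], bytes]           # algorithm that can (re)derive the data
    compute_cost: float                      # estimated cost to re-derive (seconds)
    transfer_cost: Optional[float] = None    # cost to fetch a replica, None if none exists

def fetch_replica(name: str) -> bytes:
    """Stand-in for a real grid transfer of an already-materialized replica."""
    return f"bytes of {name}".encode()

def materialize(entry: VirtualDataEntry) -> bytes:
    """Fetch an existing replica if that is cheaper, else re-run the transform."""
    if entry.transfer_cost is not None and entry.transfer_cost < entry.compute_cost:
        return fetch_replica(entry.name)
    return entry.transform()

# Usage: the derived dataset exists nowhere (instantiated 0 times), so it is computed.
entry = VirtualDataEntry("dimuon_skim", transform=lambda: b"derived bytes",
                         compute_cost=120.0, transfer_cost=None)
print(materialize(entry))
```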

Slide 12: Virtual Data in Action
- A data request may (see the sketch after this slide):
  - Compute locally
  - Compute remotely
  - Access local data
  - Access remote data
- Scheduling based on:
  - Local policies
  - Global policies
  - Local autonomy
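A toy planner for the four options on this slide, with local and global policies expressed as simple allow/deny tables. The costs and the policy encoding are assumptions for illustration only.

```python
# Sketch of choosing among the four options (compute vs. access, locally vs.
# remotely) subject to local and global policies. Costs and policy encoding
# are illustrative assumptions.

OPTIONS = [
    ("access",  "local",  1),     # (action, location, illustrative cost)
    ("access",  "remote", 20),
    ("compute", "local",  50),
    ("compute", "remote", 80),
]

def allowed(action, location, local_policy, global_policy):
    """Local autonomy: a site may veto an option; global rules apply on top."""
    key = (action, location)
    return local_policy.get(key, True) and global_policy.get(key, True)

def plan_request(local_policy, global_policy):
    """Pick the cheapest policy-compliant way to satisfy the request."""
    candidates = [(cost, action, loc) for action, loc, cost in OPTIONS
                  if allowed(action, loc, local_policy, global_policy)]
    if not candidates:
        raise RuntimeError("no policy-compliant way to satisfy the request")
    _, action, loc = min(candidates)
    return action, loc

# Example: the data is not held locally, and local policy forbids remote computation.
print(plan_request(local_policy={("access", "local"): False,
                                 ("compute", "remote"): False},
                   global_policy={}))          # -> ('access', 'remote')
```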

Slide 13: Research Agenda (cont.)
- Execution management
  - Co-allocation of resources (CPU, storage, network transfers); see the sketch after this slide
  - Fault tolerance, error reporting
  - Agents (co-allocation, execution)
  - Reliable event service across the Grid
  - Interaction, feedback to planning
- Performance analysis
  - Instrumentation and measurement of all grid components
  - Understand and optimize grid performance
- Virtual Data Toolkit (VDT)
  - VDT = virtual data services + virtual data tools
  - One of the primary deliverables of the R&D effort
  - Ongoing activity + feedback from experiments (5-year plan)
  - Technology transfer mechanism to other scientific domains
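The co-allocation and fault-tolerance bullets can be sketched as an all-or-nothing reservation with rollback. The Resource interface below is an assumption for the example, not a real grid API.

```python
# Sketch of all-or-nothing co-allocation (CPU, storage, network) with rollback
# on partial failure. The reserve/release interface is an assumption.

class AllocationError(Exception):
    pass

class FakeResource:
    """Minimal stand-in for a reservable grid resource."""
    def __init__(self, name, available=True):
        self.name, self.available = name, available
    def reserve(self):
        if not self.available:
            raise AllocationError(f"{self.name} unavailable")
    def release(self):
        print(f"released {self.name}")

def co_allocate(resources):
    """Reserve every resource; if any reservation fails, release those already held."""
    acquired = []
    try:
        for resource in resources:
            resource.reserve()
            acquired.append(resource)
        return acquired
    except AllocationError:
        for resource in reversed(acquired):
            resource.release()     # roll back the partial allocation
        raise

# Example: the network reservation fails, so CPU and storage are released again.
try:
    co_allocate([FakeResource("cpu"), FakeResource("storage"),
                 FakeResource("network", available=False)])
except AllocationError as err:
    print("co-allocation failed:", err)
```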

Slide 14: PetaScale Virtual Data Grids
[Architecture diagram: users (production team, workgroups, individual investigators) work through Interactive User Tools, which rely on Virtual Data Tools, Request Planning & Scheduling Tools, and Request Execution & Management Tools; these in turn use Resource Management Services, Security and Policy Services, and Other Grid Services over distributed resources (code, storage, computers, and networks), transforms, and the raw data source.]
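A minimal sketch of the layering named in the diagram, using toy stand-ins for each layer: an interactive user tool hands a request to planning, the resulting plan is executed step by step, and each step goes through the underlying grid services. None of the names correspond to actual GriPhyN components.

```python
# Toy layering sketch: user tools -> request planning & scheduling ->
# request execution & management -> grid services over distributed resources.
# All names and the single-string request are assumptions for illustration.

def grid_services_run(step):
    """Stand-in for resource management / security / other grid services."""
    return f"ran {step}"

def plan_request(request):
    """Request planning & scheduling: break a request into ordered steps."""
    return [f"{request}:stage-in", f"{request}:transform", f"{request}:stage-out"]

def execute_plan(plan):
    """Request execution & management: run each planned step via grid services."""
    return [grid_services_run(step) for step in plan]

def interactive_user_tool(request):
    """Entry point a production team or individual investigator would use."""
    return execute_plan(plan_request(request))

print(interactive_user_tool("dimuon_skim"))
```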

Slide 15: Tier 2 Hardware Costs (by year, 2000-2006)
  T2 #1: $350K, $90K, $150K, $120K, $500K*, $250K
  T2 #2: $192K, $320K, $120K, $500K*, $250K
  T2 #3: $350K, $120K, $500K*, $250K
  T2 #4: $750K*, $250K
  T2 #5: $750K*
  Total by year: $350K, $282K, $470K, $590K, $1200K, $1750K
  FY01-06 total: $6.0M

Slide 16: Tier 2 Personnel Costs (by year, 2000-2006)
  T2 #1: $50K, $200K
  T2 #2: $200K
  T2 #3: $200K
  T2 #4: $200K
  T2 #5: $200K
  Total by year: $50K, $400K, $600K, $800K, $1000K
  FY01-06 total: $3.8M

Slide 17: The Management Problem

Slide 18: PPDG
[Diagram: PPDG connects the data-management efforts of the experiments (BaBar, D0, CDF, Nuclear Physics, CMS, ATLAS) with the middleware teams and their user communities (Globus, SRB, Condor, HENP Grand Challenge).]

Slide 19: GriPhyN Org Chart
[Org chart showing: Project Directors; Project Coordinator and Project Coordination Group (with System Integration); VD Toolkit Development (Requirements Definition & Scheduling, Integration & Testing, Documentation & Support); CS Research (Execution Management, Performance Analysis, Request Planning & Scheduling, Virtual Data); Applications (ATLAS, CMS, LSC/LIGO, SDSS); Outreach & Education; External Advisory Panel; NSF Review Committee; Collaboration Board; Physics Experiments; Industrial Programs; Technical Coordination Committee (Networks, Databases, Digital Libraries, Grids, etc.); Other Grid Projects; Internet2; DOE Science; NSF PACIs.]

Slide 20: GriPhyN Management Plan
- The GriPhyN PMP incorporates the CMS and ATLAS plans
  - Schedules, milestones
  - Senior computing members in the Project Coordination Group
  - Formal liaison with the experiments
  - Formal presence in the Application area
- GriPhyN PMP: "...the schedule, milestones and deliverables of the [GriPhyN] Project must correspond to those of the experiments to which they belong."

Slide 21: Long-Term Grid Management
- The CMS Grid is a complex, distributed entity
  - International scope
  - Many sites, from Tier 0 to Tier 3
  - Thousands of boxes
  - Regional, national, and international networks
- It is necessary to manage
  - Performance monitoring
  - Failures
  - Bottleneck identification and resolution
  - Optimization of resource use
  - Policy changes based on resources and CMS priorities

Slide 22: Grid Management (cont.)
- Partition US-CMS grid management
  - Can effectively separate the USA Grid component
  - US Grid management coordinated with CMS
- Tools
  - Stochastic optimization with incomplete information (toy sketch after this slide)
  - Visualization
  - Database performance
  - Many tools provided by GriPhyN performance monitoring
  - Expect substantial help from the IT industry
- Timescale and scope
  - Proto-effort starting in 2004, no later than 2005
  - 4-6 experts needed to oversee the US Grid
  - Expertise: ODBMS, networks, scheduling, grids, etc.
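The "stochastic optimization with incomplete information" item can be illustrated with a toy random search that assigns jobs to sites whose true load is visible only through noisy monitoring estimates. The sites, loads, and objective function are invented for the example and are not part of any CMS or GriPhyN tool.

```python
# Toy sketch of stochastic optimization under incomplete information: assign
# jobs to sites when the true site load is known only via noisy estimates.
# Sites, loads, and the objective are illustrative assumptions.

import random

SITES = ["fnal", "caltech", "ufl"]
TRUE_LOAD = {"fnal": 0.7, "caltech": 0.5, "ufl": 0.3}   # hidden from the optimizer

def observed_load(site):
    """Monitoring provides only a noisy view of the real load."""
    return max(0.0, TRUE_LOAD[site] + random.gauss(0.0, 0.1))

def estimated_cost(assignment):
    """Rough makespan proxy: jobs per site weighted by the observed load."""
    per_site = {site: 0 for site in SITES}
    for site in assignment:
        per_site[site] += 1
    return max(per_site[s] * (1.0 + observed_load(s)) for s in SITES)

def stochastic_assign(n_jobs, trials=500):
    """Random search: keep the assignment with the best estimated cost seen so far."""
    best, best_cost = None, float("inf")
    for _ in range(trials):
        candidate = [random.choice(SITES) for _ in range(n_jobs)]
        cost = estimated_cost(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best

print(stochastic_assign(12))
```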

