1
Particle Physics Grids - a help or a hindrance for other areas of eScience?
NeSC Opening, Edinburgh, 25 April 2002
David Williams, IT Division, CERN; also President, TERENA
2
Email: David.O.Williams@cern.ch
Homepage: cern.ch/David.Williams/
Slides: cern.ch/David.Williams/public/NeSCTalk.ppt
3
Not my own work
- While I may generate some ideas, I am not responsible in any way for CERN's work on grids
- It is a pleasure to acknowledge the people who do the real work
- and the people who let me use their slides, including: Fabrizio Gagliardi, Bob Jones, Neil Geddes, Ian Foster, Hans Hoffmann, Tony Hey, Olivier Martin (and anyone else that I unintentionally forgot)
4
Outline
- A bit about CERN and particle physics
- For PP, there is no alternative (to Grids)
- EU DataGrid (EDG)
- The EDG Testbed
- Other European projects
- Other non-European projects
5
ABOUT CERN AND THE GLOBAL PARTICLE PHYSICS COMMUNITY
This is (y)our laboratory
6
[Aerial photo: the CERN site near Geneva airport (aéroport Genève), showing the locations of the Atlas, CMS, Alice and LHCb experiments, the PS ring, and the laboratory in 1954 and 2000]
7
CERN's (Human) Network in the World
267 institutes in Europe, 4603 users; 208 institutes elsewhere, 1632 users
[World map of collaborating institutes; some points = several institutes]
8
Introduction
The Mission of CERN (1954): "The Organization shall provide for collaboration among European States in nuclear research of a pure scientific and fundamental character, and in research essentially related thereto. The organization shall have no concern with work for military requirements and the results of its experimental and theoretical work shall be published or otherwise made generally available."
(2000) 20 Member States
9
The ATLAS detector: 26 m long by 20 m in diameter
10
The ATLAS Collaboration: a "virtual" large science laboratory
- Objectives
  - Study proton-proton collisions at c.m. energies of 14 000 GeV
- Milestones
  - R&D Experiments: 1988-1995
  - Letter of Intent: 1992
  - Technical Proposal: 1994
  - Approval: 1996
  - Start Construction: 1998
  - Installation: 2003-06
  - Exploitation: 2007 - (~) 2017
- Open, collaborative culture on a world scale
11
ATLAS (2)
- Scale
  - 2000 scientists
  - 150 institutes
  - 35 countries
- Cost
  - 475 MCHF material cost (1995 prices)
  - CERN contribution < 15%
- Methodology
  - All collaborating institutes supply deliverables: namely software packages, detectors, electronics, simulation tasks, equipment of all kinds, engineering tasks, ...
- Some numbers
  - 150 M detector channels
  - 10 M lines of code
  - 1 M control points
12
Changing technology! [Photos: computer room in 1985; PC farms in 2000]
13
TINA (There Is No Alternative)
14
Data coming out of our ears
- When they start operating in 2007, each LHC experiment will generate several Petabytes of (already highly selected and compressed) data per year.
- There are four experiments.
- This is roughly 500x more data than we had to deal with from LEP in 2000.
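As a back-of-envelope check of that ratio (taking "several Petabytes" to mean roughly 3 PB per experiment per year is my assumption for the arithmetic, not a quoted figure):

$4 \times 3\,\mathrm{PB/year} \approx 12\,\mathrm{PB/year}, \qquad 12\,\mathrm{PB} / 500 \approx 25\,\mathrm{TB}$

so the LEP-era load implied by these numbers was of order a few tens of TB per year, against tens of PB per year in the LHC era.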
15
Users from all over the world
- The users are from all over Europe, and all over the world
- They need (and like) to spend as much time as possible in their home institutes
- That is especially true for the youngest researchers
- CERN cannot possibly afford to pay for "all" LHC computing and data storage
- If users are to make extra investments in computing and data storage facilities they tend to prefer to make that investment "back home"
16
(Personal) Conclusion
- Politics (and technological flexibility/uncertainty) force you to a globally distributed solution for handling your data, and computing on it
- You cannot manage to operate such a system reliably at that scale without reliable grid middleware [i.e. you might be able to do it with 10-20 very smart graduate students per experiment as a "production team", but they would rise in revolt (or implement a Grid!)]
- There is no alternative
17
EDG
18
EU DataGrid (EDG) Project Objectives
- To build on the emerging Grid technology to develop a sustainable computing model for effective sharing of computing resources and data
- Specific project objectives:
  - Middleware for fabric & Grid management (mostly funded by the EU)
  - Large scale testbed (mostly funded by the partners)
  - Production quality demonstrations (partially funded by the EU)
- To collaborate with and complement other European and US projects
- Test and demonstrator of Géant
- Contribute to Open Standards and international bodies:
  - Co-founder of Global GRID Forum and host of GGF1 and GGF3
  - Industry and Research Forum for dissemination of project results
19
Objectives for the first year of the project
- Collect requirements for middleware
  - Take into account requirements from application groups
- Survey current technology
  - For all middleware
- Core Services testbed
  - Testbed 0: Globus (no EDG middleware)
- First Grid testbed release
  - Testbed 1: first release of EDG middleware
- WP1: workload
  - Job resource specification & scheduling (see the sketch below)
- WP2: data management
  - Data access, migration & replication
- WP3: grid monitoring services
  - Monitoring infrastructure, directories & presentation tools
- WP4: fabric management
  - Framework for fabric configuration management & automatic sw installation
- WP5: mass storage management
  - Common interface for Mass Storage Systems
- WP7: network services
  - Network services and monitoring
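To make the WP1 idea of matching a job's resource specification against available sites a little more concrete, here is a minimal conceptual sketch in Python. The class and function names (JobSpec, ComputeElement, match) and the ranking rule are illustrative assumptions of mine, not the actual EDG job description language or resource broker.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class ComputeElement:
    """A site offering CPU slots and holding certain datasets (illustrative model)."""
    name: str
    free_cpus: int
    close_storage: Set[str] = field(default_factory=set)

@dataclass
class JobSpec:
    """A user's job description: what to run, which data it needs, how many CPUs."""
    executable: str
    input_datasets: Set[str]
    min_cpus: int = 1

def match(job: JobSpec, sites: List[ComputeElement]) -> Optional[ComputeElement]:
    """Keep only sites with enough free CPUs that hold all required datasets,
    then pick the one with the most spare capacity (a toy ranking rule)."""
    candidates = [s for s in sites
                  if s.free_cpus >= job.min_cpus
                  and job.input_datasets <= s.close_storage]
    return max(candidates, key=lambda s: s.free_cpus, default=None)

if __name__ == "__main__":
    sites = [ComputeElement("site-a", 40, {"run123"}),
             ComputeElement("site-b", 120, {"run123", "run124"})]
    job = JobSpec(executable="/usr/bin/reco", input_datasets={"run124"})
    print(match(job, sites))  # only site-b holds run124 in this toy setup
```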
20
Main Partners
- CERN - International (Switzerland/France)
- CNRS - France
- ESA/ESRIN - International (Italy)
- INFN - Italy
- NIKHEF - The Netherlands
- PPARC - UK
21
Assistant Partners
Research and Academic Institutes:
- CESNET (Czech Republic)
- Commissariat à l'énergie atomique (CEA) - France
- Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI)
- Consiglio Nazionale delle Ricerche (Italy)
- Helsinki Institute of Physics - Finland
- Institut de Fisica d'Altes Energies (IFAE) - Spain
- Istituto Trentino di Cultura (IRST) - Italy
- Konrad-Zuse-Zentrum für Informationstechnik Berlin - Germany
- Royal Netherlands Meteorological Institute (KNMI)
- Ruprecht-Karls-Universität Heidelberg - Germany
- Stichting Academisch Rekencentrum Amsterdam (SARA) - Netherlands
- Swedish Research Council - Sweden
Industrial Partners:
- Datamat (Italy)
- IBM-UK (UK)
- CS-SI (France)
22
Project scope
- 9.8 M Euros of EU funding over 3 years
- 90% for middleware and applications (HEP, EO and Biomedical)
- ~50 "funded" staff (= paid from EU funds) and ~70 "unfunded" FTEs
- Three year phased developments & demos (2001-2003)
- Extensions (time and funds) on the basis of first successful results:
  - DataTAG (2002-2003)
  - CrossGrid (2002-2004)
  - GridStart (2002-2004)
23
EDG TESTBED
24
Project Schedule
- Project started on 1/1/2001
- TestBed 0 (early 2001)
  - International test bed 0 infrastructure deployed
  - Globus 1 only - no EDG middleware
- TestBed 1 (end 2001/early 2002)
  - First release of EU DataGrid software to defined users within the project: HEP experiments, Earth Observation, Biomedical applications
- Project successfully reviewed by EU on March 1st 2002
- TestBed 2 (September-October 2002)
  - Builds on TestBed 1 to extend facilities of DataGrid
- TestBed 3 (March 2003) & 4 (September 2003)
- Project completion expected by end 2003
25
TestBed 1 Sites Status
[Web interface showing the status of ~400 servers at TestBed 1 sites]
26
Summary (Jones)
- Application groups' requirements defined and analysed
- Extensive survey of relevant technologies completed and used as a basis for EDG developments
- First release of the testbed successfully deployed
- Excellent collaborative environment developed with key players in the Grid arena
- Project can be judged by:
  - level of "buy-in" by the application groups
  - wide-spread usage of EDG software
  - number and quality of EDG sw releases
  - positive influence on developments of GGF standards & Globus toolkit
27
(SOME) RELATED EUROPEAN WORK
28
GRIDs - EU IST projects (~36m Euro)
[Diagram arranging the projects by layer and audience (Applications for science and for industry/business; Middleware & Tools; Underlying Infrastructures): CROSSGRID, DATAGRID, DATATAG, GRIDLAB, EGSO, GRIA, GRIP, EUROGRID, DAMIEN, GRIDSTART]
29
DataTAG project
[Network diagram: transatlantic link from CERN and European research networks (SURFnet NL, SuperJANET4 UK, GARR-B IT, GEANT) to New York and US networks (Abilene, ESNET, MREN) via STAR-TAP / STAR-LIGHT]
30
Some international relations
31
BaBarGrid - UK Deployment
32
GridPP
- To convince yourself that this is starting to be real, try to look at some of the GridPP demos today
33
(SOME) RELATED NON-EUROPEAN WORK
34
Overview of GriPhyN Project
- GriPhyN basics
  - $11.9M (NSF) + $1.6M (matching)
  - 5 year effort, started October 2000
  - 4 frontier physics experiments: ATLAS, CMS, LIGO, SDSS
  - Over 40 active participants
- GriPhyN funded primarily as an IT research project
  - 2/3 CS + 1/3 physics
35
GriPhyN Approach
- Virtual Data (see the sketch below)
  - Tracking the derivation of experiment data with high fidelity
  - Transparency with respect to location and materialization
- Automated grid request planning
  - Advanced, policy-driven scheduling
- Achieve this at peta-scale magnitude
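A minimal sketch of the virtual-data idea: record how each logical dataset is derived, reuse an existing replica when one is known, and otherwise re-materialize it from its inputs. The catalog class and method names below are my own illustrative assumptions, not the GriPhyN Virtual Data Toolkit API.

```python
from typing import Callable, Dict, List, Tuple

class VirtualDataCatalog:
    """Toy catalog: for each logical dataset name, remember either where a
    materialized copy lives, or the transformation and inputs that derive it."""

    def __init__(self) -> None:
        self.replicas: Dict[str, str] = {}        # logical name -> location
        self.recipes: Dict[str, Tuple[Callable, List[str]]] = {}

    def register_replica(self, name: str, location: str) -> None:
        self.replicas[name] = location

    def register_derivation(self, name: str, transform: Callable, inputs: List[str]) -> None:
        self.recipes[name] = (transform, inputs)

    def materialize(self, name: str) -> str:
        """Return a location for `name`: reuse a known replica if one exists,
        otherwise recursively materialize the inputs and apply the recorded
        transformation, then cache the result as a new replica."""
        if name in self.replicas:
            return self.replicas[name]
        transform, inputs = self.recipes[name]    # KeyError if the name is unknown
        input_locations = [self.materialize(i) for i in inputs]
        location = transform(*input_locations)    # e.g. submits a reconstruction job
        self.replicas[name] = location
        return location
```

The recursion in materialize() is where the "transparency with respect to location and materialization" shows up: the user asks for a dataset by logical name and gets it whether or not it already exists anywhere.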
36
GriPhyN CMS SC2001 Demo: "Bandwidth Greedy Grid-enabled Object Collection Analysis for Particle Physics"
[Demo setup: a "Tag" database of ~140,000 small objects drives requests against Full Event Databases of ~100,000 and ~40,000 large objects, transferred with parallel tuned GSI FTP]
http://pcbunn.cacr.caltech.edu/Tier2/Tier2_Overall_JJB.htm
Work of: Koen Holtman, J.J. Bunn, H. Newman, & others
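The tag-database pattern behind the demo - keep a small summary object per event and only transfer the large full-event objects that pass a selection - can be sketched as follows. The field names and the fetch callback are illustrative assumptions, not the actual CMS object schema.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

@dataclass
class EventTag:
    """Small per-event summary stored in the tag database (illustrative fields)."""
    event_id: int
    n_jets: int
    missing_et: float

def select_events(tags: Iterable[EventTag],
                  predicate: Callable[[EventTag], bool],
                  fetch_full_event: Callable[[int], bytes]) -> Iterator[bytes]:
    """Scan the cheap tag collection first, and only pull the large full-event
    objects (e.g. over a wide-area file transfer) for events that pass the cut."""
    for tag in tags:
        if predicate(tag):
            yield fetch_full_event(tag.event_id)

# usage (toy): select_events(tags, lambda t: t.n_jets >= 2 and t.missing_et > 50.0, fetch)
```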
37
PPDG
- DoE funding intended (primarily) for the hardware needed to exploit GriPhyN
38
iVDGL: A Global Grid Laboratory
- International Virtual-Data Grid Laboratory
  - A global Grid laboratory (US, Europe, Asia, South America, ...)
  - A place to conduct Data Grid tests "at scale"
  - A mechanism to create common Grid infrastructure
  - A laboratory for other disciplines to perform Data Grid tests
  - A focus of outreach efforts to small institutions
- U.S. part funded by NSF (2001-2006)
  - $13.7M (NSF) + $2M (matching)
"We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science." From NSF proposal, 2001
39
iVDGL Components
- Computing resources
  - 2 Tier1 laboratory sites (funded elsewhere)
  - 7 Tier2 university sites (software integration)
  - 3 Tier3 university sites (outreach effort)
- Networks
  - USA (TeraGrid, Internet2, ESNET), Europe (Géant, ...)
  - Transatlantic (DataTAG), Transpacific, AMPATH?, ...
- Grid Operations Center (GOC)
  - Joint work with TeraGrid on GOC development
- Computer Science support teams
  - Support, test, upgrade GriPhyN Virtual Data Toolkit
- Education and Outreach
- Coordination, management
40
iVDGL Components (cont.)
- High level of coordination with DataTAG
  - Transatlantic research network (2.5 Gb/s) connecting EU & US
- Current partners
  - TeraGrid, EU DataGrid, EU projects, Japan, Australia
- Experiments/labs requesting participation
  - ALICE, CMS-HI, D0, BaBar, BTEV, PDC (Sweden)
41
RELATIONS WITH OTHER BRANCHES OF e-SCIENCE
EU DataGrid is not just Particle Physics
42
Biomedical applications
- Data mining on genomic databases (exponential growth)
- Indexing of medical databases (Tb/hospital/year)
- Collaborative framework for large scale experiments (e.g. epidemiological studies)
- Parallel processing for
  - Database analysis
  - Complex 3D modelling
43
Biomedical (cont)
- I was personally very impressed by the serious analysis provided by the Biomedical work package at the EDG Review (Johan Montagnat at Lyon; accessible via http://www.edg.org)
- [Also by the security requirements analysis, Kelsey et al]
44
Earth Observations
- ESA missions: about 100 Gbytes of data per day (ERS 1/2); 500 Gbytes for the next ENVISAT mission (launched March 1st)
- EO requirements for the Grid:
  - enhance the ability to access high level products
  - allow reprocessing of large historical archives
  - improve Earth science complex applications (data fusion, data mining, modelling ...)
45
Data structures
- Particle physics is actually a rather extensive user of databases (see SC2001 demo slide)
- The 3 main issues are "performance, performance, performance"
- Each experiment does agree on its own unified data model, so we don't (typically) have to constantly federate separately-curated DBs in the way that is common to bioinformatics and many other branches of research
- But the experiment's model does evolve, on a roughly annual basis
- And the "end user" often wants to get hold of highly selected data on a "personal" machine and explore it
- A pity that initial Grid developments assumed a very simplistic view of data structures
- Good that the UK is pushing ahead in this area
- Particle physics looks forward to using (and helping test) this work
46
Test beds
- Biggest pressure on the EDG testbed is to turn it into a bigger facility, aiming to offer more resources for "Data Challenges" and to operate on a 24*7 basis
- It is potentially useful for other people wanting to run tests at scale (and is already being used for this)
- I suggest that this idea (a really large configurable testbed) should become part of an FP6 Integrated Project
- PP will really push for production quality MW and SW
47
TODAY'S KEY ISSUES
- Databases and/or data management integrated with grid MW
- Security
- Reliability
- [Networking - everywhere]
48
Some "Large" Grid Issues (Geddes)
- Consistent transaction management
- Query (task completion time) estimation
- Queuing and co-scheduling strategies
- Load balancing (e.g., Self Organizing Neural Network)
- Error Recovery: Fallback and Redirection Strategies (see the sketch below)
- Strategy for use of tapes
- Extraction, transport and caching of physicists' object-collections; Grid/Database Integration
- Policy-driven strategies for resource sharing among sites and activities; policy/capability tradeoffs
- Network Performance and Problem Handling
- Monitoring and Response to Bottlenecks
- Configuration and Use of New-Technology Networks, e.g. Dynamic Wavelength Scheduling or Switching
- Fault-Tolerance, Performance of the Grid Services Architecture
(H. Newman, 13/3/02)
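As one concrete illustration of the "fallback and redirection" item in this list, a scheduler might retry a failed task at the next candidate site. Everything below (the site list, run_at callback and retry policy) is an assumed toy model, not any project's actual error-recovery code.

```python
import random
from typing import Callable, Sequence

def run_with_fallback(task: str,
                      sites: Sequence[str],
                      run_at: Callable[[str, str], bool],
                      max_attempts_per_site: int = 2) -> str:
    """Try the task at each site in preference order; after repeated failures
    at one site, redirect to the next. Raise if every site is exhausted."""
    for site in sites:
        for _attempt in range(max_attempts_per_site):
            if run_at(site, task):
                return site                  # the task completed at this site
    raise RuntimeError(f"task {task!r} failed at all sites: {list(sites)}")

if __name__ == "__main__":
    flaky_submit = lambda site, task: random.random() < 0.5   # stand-in for real job submission
    print(run_with_fallback("reco-run124", ["site-a", "site-b", "site-c"], flaky_submit))
```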
49
EGEE - Enabling Grids for eScience in Europe
[Diagram of a proposed Europe-wide infrastructure Integrated Project; labels include: GLOBUS, EuroGrid, GridLab etc.; s/w hardening; Semantic Grid, Database, Security; modulable testbeds; R&D agenda; deployment with IT industry; national eScience centres; creation and support of e-Science centres; applications in other sciences; EIROforum; industry outreach and SMEs developing Grid-enabled applications; science outreach; consulting, prototyping, deployment, training courses, dissemination forum; tools and service development]
50
and finally
- Best wishes to NeSC
- The UK should feel very pleased at the level of research-development-industry interaction that has been obtained via the eScience Programme