1
Grid Deployment Area Status Report
Ian Bird SC2 14 March 2003
2
Deployment Status Summary
GDB working groups reported and defined LCG-1
First “packaged” version of LCG (LCG-0) is released and available
LCG-0 has been deployed at CERN, RAL, CNAF, (Taiwan)
Expect middleware for LCG-1 (July) to be delivered from VDT/EDG end April
Re-aligned EDG/LCG at CERN to share resources and people
Re-organisation of groups and people within IT also
Collaborative activities developing in several areas
3
Grid Deployment Organisation
[Organisation chart; box labels in extracted order]
ALICE, ATLAS, CMS, LHCb
Policies, strategy, scheduling, standards, recommendations
Grid deployment manager
Grid deployment board
Grid Resource Coordinator
LCG security group
LCG operations team
LCG toolkit integration & certification
Grid infrastructure team
Experiment support team
Joint Trillium/EDG/LCG testing team
CERN-based teams
Regional centre operations (box repeated for each regional centre)
Operations call centre
Core infrastructure
Security tools
Grid monitoring
Anticipated teams at other institutes
4
Deployment Personnel
Columns: CERN, LCG, Other, Requested*
Management 1
Certification & Test 2 3
Test group 1.3
System analysis & support
Grid infrastructure (people moved from ADC and retain commitments to EDG WP4,6) 0.5 2*0.5
Experiment integration & support
Total 1.5 8 4.3 6
* Expected to be fulfilled by INFN-funded Fellows by July
5
Personnel Issues
Staff to support LCG-1 grid services have commitments to EDG test-bed support – part of the reason for rationalising EDG/LCG resources
Have used 2 people from certification to help support test-beds/LCG Pilot
Testing group is understaffed – need to find at least 3 full-time people to contribute here
Don’t expect to get final LCG-funded effort before July
6
Milestones
Define LCG-1 – Feb 1
  The GDB WG reports were published and discussed at the Feb 6 GDB
  Sufficient to define direction and issues to be addressed; used in planning and deploying
Initial Pilot Service available (Feb 1), Pilot 2 (March 15)
  Strategy at CERN changed – no separation between LCG and physics production systems
  Pilot cluster is available – configured as a minimum, but batch nodes can be moved between Pilot and LXBatch as needed
  This makes more efficient use of the machines
  Pilot service worker nodes managed by FIO group
  Integrating LSF, addressing NFS vs AFS issues
Deployment schedule
  LCG-0 deployed to CERN, RAL, CNAF + Legnaro (T2), Taiwan; preparing for FNAL by end of March
  This is actually ahead of the proposed schedule
7
LCG-1 Ramp-up Schedule
Pilot 1 start – Feb 1

     Date      Regional Center           Experiment
     15/2/03   CERN                      All
1    28/2/03   CNAF, RAL
2    30/3/03   FNAL                      CMS
3    15/4/03   Taiwan                    Atlas, CMS
4    30/4/03   Karlsruhe
5    7/5/03    IN2P3
6    15/5/03   BNL (want to wait)        Atlas
7    21/5/03   Russia (Moscow), Tokyo

LCG-1 Initial Public Service Start – July 1
Tier 2 centres will be brought on-line in parallel once the local Tier 1 is up to provide support
8
Milestones - 2
Certification process defined (January)
  This has been done – agreed common process with EDG
  Have agreed joint project with VDT (US):
    VDT provide basic level (Globus, Condor) testing suites
    We provide higher level testing
  Expect to get HEPCAL test-cases from GAG
  Need to pull in other expertise, e.g. EDG WP8/loose cannons
  Look at using common tools and frameworks (where it makes sense): NMI/VDT-LCG
    We need to do this soon to avoid divergence
  Need much more effort on devising & writing tests
    Real effort currently is only 2 people
9
Test and Validation process
[Process diagram; labels grouped from the extracted text]
Stages: Unit Test, Build, Integration, Certification, Production
Environments: Developers' machines, Build system, Development Testbed (~15 CPU), Certification Testbed (~40 CPU), Production
Activities: WPs add unit tested code to CVS repository; run nightly build & auto. tests; individual WP tests; overall release tests; Grid certification; Application Certification; certified public release for use by apps.
Actors: WPs, Build system, Integration Team, Test Group, Appl. Representatives, Users
Flow labels: Tagged package; Tagged release selected for certification; Certified release selected for deployment; Release candidates; Tagged Releases; Certified Releases; Fix problems
Support: Office hours, 24x7; Bugzilla anomaly reports
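The stage sequence named on this slide (Unit Test, Build, Integration, Certification, Production) can be sketched as a small state machine. This is only an illustration, not the LCG or EDG tooling: the names `Stage`, `advance` and `run_release` are hypothetical, and sending a failed release all the way back to unit testing is an assumed reading of the "Fix problems" arrow.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages named on the slide, in the order a release passes through them."""
    UNIT_TEST = 1
    BUILD = 2
    INTEGRATION = 3
    CERTIFICATION = 4
    PRODUCTION = 5


def advance(stage: Stage, tests_passed: bool) -> Stage:
    """Return the next stage for a release.

    Assumption (not stated on the slide): a failure at any stage sends the
    release back to UNIT_TEST so the problem can be fixed by the developers.
    """
    if not tests_passed:
        return Stage.UNIT_TEST
    if stage is Stage.PRODUCTION:
        # Already deployed; there is no later stage.
        return stage
    return Stage(stage + 1)


def run_release(results: list[bool]) -> Stage:
    """Feed a sequence of pass/fail test results through the pipeline."""
    stage = Stage.UNIT_TEST
    for passed in results:
        stage = advance(stage, passed)
    return stage
```

For example, `run_release([True, True, True, True])` ends at `Stage.PRODUCTION`, while a failure at any point resets the release to `Stage.UNIT_TEST`.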
10
Testing and Certification
Building certification testbed
  At CERN: 4x10 nodes (25 available now), a local grid
  Will include Wisconsin asap
Currently installing the EDG 2.0/LCG-1 precursor on the certification testbed and beginning the certification work
11
Test-beds and services
Agreement 2 weeks ago to merge EDG and LCG production services and to separate test-beds
  The only way to share scarce resources (support)
March – July:
  No EDG production test-bed at CERN now (but access to Castor)
  EDG core sites will run dev. test-beds and either EDG production or LCG pilot, unless they have resources for both
  LCG pilot on other EDG sites and at non-EDG centres
From July:
  There will be a single production system – LCG-1 at least at CERN, hopefully at other EDG/LCG sites
12
Milestones – 3
Packaging/configuration mechanism defined – March
  Group (EDG, LCG, VDT) have documented an agreed common approach
  Now will proceed with a staged implementation
  Basic for LCG-1 in July, and more developed later
Delivery of middleware – March 1
  We have a working set (“LCG-0”) that is in use now
  Expect delivery of middleware for July by end April (from EDG)
Identify operations and call centres – February 1
  2 candidates for operations centres – hopefully this will be clarified in the next 2 weeks
  No clear candidate for a support centre – but we (LCG CERN group) will have a basic support service, already in place
13
Other Progress
14
Middleware system support
European grid support centre
  Maarten Litmaath as 1/3 of technical Globus support people (SE, UK, LCG)
  Will participate in Globus 2.4 release process
LCG team
  Currently 2 (3) people
  Building relationships with EDG, Globus, VDT
  Worked on RLS stress testing and debugging
15
Security
Dave Kelsey will lead ongoing security activity
  Policies
  Security strategy and plan
  This is needed urgently – as basis for operational agreements at centres
Security operational issues:
  Led by Dane Skow (FNAL), group of site security contacts
  Gathering issues, constraints, etc.
  This group will handle daily security issues
Proposing collaboration on VO management
  FNAL, INFN, …
16
Collaborative Activities
HICB – JTB
  GLUE Schema and evolution
  Validation and Test Suites
  Distribution, Meta-Packaging, Configuration
  Monitoring tools (proposed), aspects of ops centres
  Proposed collaboration on VO tools (led by FNAL)
GGF
  Production Grid Management (operations)
  User Services (call centres)
    Tools, trouble ticket exchange standards, etc.
  Site AAA (security)
  Particle and Nuclear Physics Applications area
    As a forum in GGF to present issues and get collaboration
Other
  HEPiX – Fabric, operations, tools, procedures
  Security – site security contacts
  Storage Interfaces – SRM
17
Compiler Issues
EDG 2.0 release plan has assumed gcc 2.95.2
  This was agreed with the experiments
CMS and Atlas now request EDG 2.0/LCG-1 with gcc 3.2.2
  Acceptable for LHCb and ALICE
  Essential for integration with POOL
Problem
  This will delay delivery of EDG 2.0 to LCG by ~6 weeks (an estimate made after 5 minutes' thought)
Possibilities:
  Continue as scheduled and deliver EDG 2.0 with ..and .. foresee update in September
    This implies LCG-1 cannot use POOL before the upgrade
  Switch to gcc now – introduces 6 week delay
18
Compiler Issues
Observations:
  POOL will not have been integrated in experiment software and tested before LCG-1 is deployed in July
    It has a command-line interface
  A delay of 6 weeks will mean nothing is deployed during the summer (vacations)
  An upgrade of middleware is already foreseen for Sep/Oct
(I propose) Continue with agreed plan – deploy EDG 2.0/LCG-1 as planned, and upgrade to gcc 3.2 and include POOL in September
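The timing argument can be checked with a little date arithmetic. This is a rough sketch only: it assumes "end April" means 30 April 2003, takes the July 1 start of the LCG-1 public service and the ~6 week delay from the slides, and uses hypothetical names throughout.

```python
from datetime import date, timedelta

# Assumption: "end April" is read as 30 April 2003.
PLANNED_DELIVERY = date(2003, 4, 30)
# From the slides: LCG-1 Initial Public Service starts July 1.
LCG1_START = date(2003, 7, 1)
# From the slides: switching compiler now delays delivery by roughly 6 weeks.
COMPILER_DELAY = timedelta(weeks=6)


def days_before_start(delivery: date, start: date = LCG1_START) -> int:
    """Days left between middleware delivery and the service start."""
    return (start - delivery).days


delayed_delivery = PLANNED_DELIVERY + COMPILER_DELAY

print("Delayed delivery:", delayed_delivery)
print("Days available as planned:", days_before_start(PLANNED_DELIVERY))
print("Days available if delayed:", days_before_start(delayed_delivery))
```

Under these assumptions the delayed delivery falls on 11 June 2003, leaving about 20 days before the July 1 start instead of 62, which is consistent with the concern that a switch now would push deployment past the summer.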
19
Summary
Progress on schedule with deployment
Need to find effort on testing activities
Need to get operations and call centre activities started