1
Project Status Report – Ian Bird, Computing Resource Review Board, 20th April 2010 (CERN-RRB-2010-033)
2
Project status report
– Overall status – experience with data
– Planning and milestones
– Status of planning for new Tier 0
– Brief summary of EGEE → EGI transition
– Resource planning for 2010, 2011, 2012
Ian.Bird@cern.ch
3
Sergio Bertolucci, CERN
... And now at 7 TeV
4
Today WLCG is running increasingly high workloads:
– Jobs in excess of 650k/day; anticipate millions/day soon
– CPU equivalent of ~100k cores
Workloads are:
– Real data processing
– Simulations
– Analysis – more and more (new) users (e.g. CMS: no. of users doing analysis)
Data transfers at unprecedented rates (next slide)
5
Data transfers
– Final readiness test (STEP’09); preparation for LHC startup; LHC physics data
– Nearly 1 petabyte/week in 2009: STEP’09 + preparation for data
– Castor traffic last week: > 4 GB/s input, > 13 GB/s served
– Real data – from 30/3
6
WLCG uses EGEE & OSG: 85k CPU-days/day and 30k CPU-days/day
7
Readiness of the computing (CMS, ATLAS, LHCb)
Has meant very rapid data distribution and analysis:
– Data is processed and available at Tier 2s within hours!
8
More and more users
– CMS: >200 users, ~500 jobs on average over 3 months
– ATLAS: number of distinct users accessing various data types – many hundreds of users accessed grid data
9
And physics output...
10
Fibre cut during STEP’09: redundancy meant no interruption
11
Reliabilities
This is not the full picture:
– Experiment-specific measures give a complementary view
– These need to be used together with some understanding of the underlying issues
12
Site availability seen by experiments
Site readiness as seen by the experiments – left: the week before data taking; right: the 1st week of data
13
WLCG timeline 2010–2012
– Month-by-month chart, Jan 2010 – Feb 2012: each year shows shutdown (SU), pp running, then heavy-ion (HI) running
– 2010 capacity commissioned; 2011 capacity commissioned
– EGEE-III ends → EGI & NGIs; EGI HEP – SSC; EMI (SA3)
14
– Now a full report each month
– Glexec + SCAS services available; deployment discussion / policy ongoing
– Not all sites yet publishing; information validation in progress
15
16
Future milestones
Actually very few formal milestones now:
– Moved from set-up to regular operations
Not all problems are solved – and more will certainly arise:
– These can be subject to specific milestones
However, in general we must move from tracking milestones to tracking metrics for:
– Performance
– Reliability
– Scalability
Today we have some – but we need to propose a set of useful metrics that we track:
– Accounting, reliability/availability and throughputs are published on-line
– Operational metrics are reviewed weekly
– A lot of information is in different places (SLS, dashboards, etc.)
17
STATUS OF PLANS FOR TIER 0
18
CERN IT Department, CH-1211 Genève 23, Switzerland – www.cern.ch/it – Frédéric Hemmer
Revised Tier 0 strategy
The power situation has evolved:
– Aggressive replacement of old equipment
– Technology evolution
– Refined estimates of needs in the next few years
– 400 kW of additional power made available (2.5 → 2.9 MW)
– But the situation for backed-up (diesel) power is more critical – close to the limit and lacking redundancy
Revised strategy:
– Hosting agreement for 100 kW of backed-up power in the Geneva area
– Consolidate the existing computer centre critical power situation
– Investigate a container solution for incremental capacity additions
– Investigate (far) remote hosting possibilities
19
Tier-0 power needs estimates
NB: the real limit is closer to 2.7 MW than the 2.9 MW assumed so far
20
March 2010 situation
Additional 400 kW in building 513:
– The power capacity has been made available
Critical power consolidation in 513:
– Various solutions are being studied, requiring additional UPS & cooling capacity
– Should provide ~600 kW of backed-up power, hopefully in addition to the 2.9 MW
– Will not be available before mid-2011
External hosting of 100 kW in Geneva:
– Hosting company identified & contract being signed; target implementation: summer 2010
– Will allow for initial experience of remote operations
Containers:
– Initial technology assessment done & market survey launched
– Location: Prévessin, close to building 931; will require civil engineering to host the electrical power distribution
– Cannot be available before end 2011
(Far) remote hosting proposals:
– No concrete financial proposals yet from Norway, although the technical pre-proposal is fairly clear
– Likelihood that a similar offer will come from Finland
21
Summary
Current estimates predict that the Computer Centre will now run out of power ~2013:
– Within the current requirements of the experiments
– Within the limits of technology evolution
IT has started to prepare several stopgap solutions to cope with changing conditions, as well as alternative options:
– But the costs are significant
Decisions for the medium term should be taken in 2010, in light of experience with data taking and once alternative options can be evaluated
22
EGI: Status of project submissions
There were 3 different (sub-)calls:
1) EGI itself (project named EGI-InSPIRE); includes an activity (SA3) specifically focussed on support for existing large communities. This project was invited to a hearing; likely to receive the requested funding.
2) Middleware (project named EMI); includes support for all gLite software required by WLCG (FTS, LFC, dCache, etc.). This project was invited to a hearing; asked to make a 900 k€ cut.
3) Virtual Research Communities (ex-SSC); there were several EGEE-derived proposals, including one (ROSCOE) that contained a VRC for HEP. These will NOT be funded.
Project funding is expected to start only in June (may be back-dated to May).
23
EGEE → EGI: Risk for WLCG?
This situation does not represent a major risk for WLCG:
– The EGEE → EGI transition is well planned by EGEE, and is well advanced
– Countries representing the majority of the resources have NGIs, and the Tier 1s are well placed
– Important operational tools (GGUS, monitoring, etc.) are assured even if project funding does not appear
– WLCG operational procedures are well tested and are mostly independent of the existence of EGEE or EGI
– The SA3 activity contains dashboards, Ganga, & specific tasks for each experiment (~2 FTE each); the VRC had integration/analysis support
– EMI contains essential middleware support and “harmonisation” of gLite/ARC/UNICORE (long-term development was not included)
No funding for a HEP VRC means that work with other application communities will be significantly reduced at CERN
Should now consider the strategy for the longer term of the middleware
24
Status of non-European states
Concern was expressed at the last RRB over the status in EGI of some non-EC states
The situation has evolved:
– EGI.eu: introduced Associate member status
– EGI-InSPIRE project: full partners
25
RESOURCE PLANNING
Baseline assumptions used by all experiments for requirements analysis
26
Present understanding of schedule for both 2010 and 2011
2010 + 2011:
– Running from mid-February to end of November; Pb-Pb in November
– In principle stop after 1 fb⁻¹; plan to run for 2 years (0.2 fb⁻¹ in 2010, remainder in 2011)
2012: shutdown of accelerator (but not computing)
27
Assumptions and guidance: 2010, 2011, 2012
Assumptions: the agreed RRB year is April–March (i.e. resources for a given year are available by April)
– In 2010 this was exceptionally delayed until June 1st (based on the schedules understood at that time)
– Of course some Tier 1s have already installed some fraction of their 2010 pledges
– Also agreed that in 2011 we revert to the April installation deadline
The 2010 pledges and installation schedules cannot be changed:
– Nominal 2009 resources must satisfy the needs until end of May
– 2010 resources should cover the time from June to March 2011
– 2011 resources apply from April 2011 onwards
Live time: 30 days/month = 720 hours; folding in efficiencies: 720 × 0.7 × 0.4 ≈ 200 effective hours/month
1) Availability of the machine for physics = 0.7 – the rest is technical stops + recovery from technical stops + dedicated MD
2) Efficiency for physics = time with colliding beams / time the machine is available = 0.4 – the rest is turnaround time + faults + access
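The live-time arithmetic above can be sketched directly; this is just the slide's own numbers folded together:

```python
# Effective physics hours per month, using the availability and
# efficiency factors quoted on the slide.
HOURS_PER_MONTH = 30 * 24  # 720 hours in a nominal 30-day month
AVAILABILITY = 0.7         # machine available for physics (rest: technical stops, recovery, MD)
EFFICIENCY = 0.4           # colliding-beam time / available time (rest: turnaround, faults, access)

effective_hours = HOURS_PER_MONTH * AVAILABILITY * EFFICIENCY
print(f"Effective physics hours per month: {effective_hours:.0f}")  # ~200
```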
28
Summary of requirements

Totals      2010     2010 pledge   2011     2012
CERN CPU    233.4    –             263.3    219.7
CERN disk    14.79    14.8          19.7     22.8
CERN tape    31.7    –              48.8     49.7
T1 CPU      394.1    412           543.5    584
T1 disk      49.39    44.5          66.3     68.9
T1 tape      56.2     51.4         111.07   131.72
T2 CPU      562.6    511.1         730.2    787
T2 disk      46.62    39.6          75.42    78.42

Old 2010 request + 2010 pledges are as presented at the Autumn 2009 RRB
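As a rough cross-check of the growth implied by these requests, a small sketch computing year-over-year ratios for a few rows (figures transcribed from the table above; the slide gives no units):

```python
# Requested resources per year for selected rows of the requirements table.
requests = {
    "T1 CPU":  {"2010": 394.1, "2011": 543.5,  "2012": 584.0},
    "T1 tape": {"2010": 56.2,  "2011": 111.07, "2012": 131.72},
    "T2 CPU":  {"2010": 562.6, "2011": 730.2,  "2012": 787.0},
}

for resource, by_year in requests.items():
    growth = by_year["2011"] / by_year["2010"]
    print(f"{resource}: 2011 request is {growth:.2f}x the 2010 request")
```

The largest jump is in Tier 1 tape, consistent with the first full year of data taking in 2011.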
29
Concerns
Budget cut in France: −40%
– Notified after the last RRB
– Proposed impact for 2010 somewhat less, with planning and management
– Risk for 2011?
Concerns over some Tier 1s:
– Recent experience is good; hope this is sustainable in the long term
Level of effort available in EMI for middleware support:
– Including the release process etc.
– May be at the limit
Data access for analysis:
– Early discussions on how to address this
– 2-year timescale
30
Summary
First experience with data has been positive from the WLCG point of view:
– Thanks to the huge efforts invested in recent years in testing
– All Tier 0, Tier 1 and Tier 2 staff must take the credit for this
Resource planning for the coming years is a concern
Still to see what effect many more non-expert users will have
The transition from EGEE to EGI is happening now:
– It is (hopefully!) not a major risk for WLCG
Must start to address the long-term sustainability of the system we have