LCG-France Tier-1 and Analysis Facility Overview
Fabio Hernandez, IN2P3/CNRS Computing Centre - Lyon
CMS Tier-1 tour, Lyon, November 30th, 2007

F. Hernandez 2
Contents
- Introduction to LCG-France
- Introduction to CC-IN2P3
- Resources for LHC experiments, with focus on CMS
  - budget, plan and pledges
  - contribution
- Current and future work
- Conclusions
- Questions

F. Hernandez 3
LCG-France project
Goals
- Set up, develop and maintain an LCG Tier-1 and an Analysis Facility at CC-IN2P3
- Promote the creation of French Tier-2/Tier-3 sites and coordinate their integration into the WLCG collaboration
Funding
- National funding for the Tier-1 and the Analysis Facility
- Tier-2s and Tier-3s are funded by universities, local/regional governments, hosting laboratories, …
Organization
- Started in June 2004
- Scientific and technical leaders appointed for a 4-year term; management board (executive) and overview boards in place since then
- Leverages the EGEE operations organization: co-location of the Tier-1 and the EGEE Regional Operations Centre
More information
- Project web site:
- Project charter:

F. Hernandez 4
LCG-France: sites
- Tier-1: CC-IN2P3 (Lyon)
- Analysis Facility: CC-IN2P3 (Lyon)
- Tier-2: GRIF (Ile de France): CEA/DAPNIA, LAL, LLR, LPNHE, IPNO
- Tier-2: LPC (Clermont-Ferrand)
- Tier-2: Subatech (Nantes)
- Tier-3: LAPP (Annecy)
- Tier-3: IPHC (Strasbourg)
- Tier-3: IPNL (Lyon)
- Tier-3: CPPM (Marseille)

F. Hernandez 5
Introduction to CC-IN2P3
- Data repository and processing facility shared by several experiments
  - Operates a WLCG Tier-1, a Tier-2 and an Analysis Facility for the 4 LHC experiments
- The main compute farm is used by both grid and local users
  - Grid middleware is "just another" interface for using our services
- The data storage infrastructure (disk and mass storage) is accessible to all jobs running at the site
  - Although not all storage spaces have a gridified interface

F. Hernandez 6
Introduction to CC-IN2P3 (cont.)
Also operating grid services for non-LHC VOs

F. Hernandez 7
Budget: all LHC experiments
- Equipment and running costs for all LHC experiments at CC-IN2P3 ( )
  - Total required: 31.9 M€
  - The refurbishment of the current machine room and the construction of a second one are NOT included
- Budget requested on a pluriannual basis
  - Approval on a yearly basis
  - Impact on the hardware procurement process

F. Hernandez 8
Budget: CMS
- Equipment and running costs for CMS needs ( ): 7.7 M€

F. Hernandez 9
Planned resource deployment

F. Hernandez 10
Planned resource deployment: CMS
A fraction of those resources is not pledged

F. Hernandez 11
Planned vs. Actual Contribution
Source:

F. Hernandez 12
Resources: CPU
Pledged CPU capacity equivalent to 588 job slots

F. Hernandez 13
Resources: CPU
Daily throughput for CMS: 4000 jobs
Total throughput of the site: jobs/day

F. Hernandez 14
Resources: disk
Allocation: 292 TB (pledged: 237 TB)
Total allocation (all supported experiments): 1.2 PB

F. Hernandez 15
Resources: MSS
Utilization by CMS: 304 TB (pledged: 292 TB)
Total amount of data managed by HPSS: 2.7 PB

F. Hernandez 16
Data transfer: CERN → CC-IN2P3
CERN → CC-IN2P3 target rate for CMS: 32 MB/s
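For scale, the sustained target rate above can be converted into a daily volume. The sketch below is a back-of-the-envelope calculation only; it assumes the 32 MB/s target is held around the clock and uses decimal units.

```python
# Rough conversion of the CMS CERN -> CC-IN2P3 target rate into a daily volume.
# Assumption: the 32 MB/s target is sustained 24 hours a day (decimal units).
target_rate_mb_s = 32
seconds_per_day = 24 * 3600

daily_volume_tb = target_rate_mb_s * seconds_per_day / 1e6  # MB -> TB
print(f"~{daily_volume_tb:.1f} TB/day at a sustained {target_rate_mb_s} MB/s")
# prints: ~2.8 TB/day at a sustained 32 MB/s
```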

F. Hernandez 17
WAN bandwidth requirements
Source: Megatable
The number of CMS sites we have to exchange data (and solve problems) with is worrying!!!

F. Hernandez 18
Procurement
We have to satisfy several constraints
- The formal process is long
  - Call for tenders at the European level
- The budget is approved on a yearly basis
  - "Final word" during the last quarter of each year
- Limited machine room space available
- Extensive in-situ tests to realistically identify the real characteristics of candidate hardware (when possible)
- Optimize computing power/m² but also computing power/€ (see the sketch after this list)
  - Forecast of running costs performed at this stage
- Desired availability of the computing equipment in operation by the experiments
- Delays in the deliveries of equipment
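As a rough illustration of the two figures of merit named above (computing power per m² of floor space and per euro), the sketch below ranks candidate configurations. All candidate numbers are invented for the example; they are not real tender data.

```python
# Illustrative ranking of candidate worker-node configurations by the two
# figures of merit mentioned on the slide: kSI2k per m^2 and SI2k per euro.
# Every number below is made up for the example (not real tender data).
candidates = [
    # (name, kSI2k per node, nodes per rack, racks per m^2, price per node in EUR)
    ("candidate A", 12.0, 38, 0.45, 4200),
    ("candidate B", 13.5, 42, 0.40, 4900),
]

for name, ksi2k_per_node, nodes_per_rack, racks_per_m2, price_eur in candidates:
    ksi2k_per_m2 = ksi2k_per_node * nodes_per_rack * racks_per_m2
    si2k_per_eur = ksi2k_per_node * 1000 / price_eur
    print(f"{name}: {ksi2k_per_m2:.0f} kSI2k/m^2, {si2k_per_eur:.1f} SI2k/EUR")
```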

F. Hernandez 19
Procurement (cont.)
Starting in 2007, we modified our procurement plan for the LHC experiments
- With the budget for year N, purchase at least 40% of the equipment required for year N+1
- The procurement process for the remaining fraction is triggered as soon as the next year's budget is known
Our experience with this model for this year's procurement makes us confident that we will be in good shape to provide a significant fraction of the pledged capacity by April 1st each year

F. Hernandez 20
Capacity Increase
Compute nodes
- 479 worker nodes: DELL, Intel Xeon 2.33 GHz, 2 CPUs, 8 cores, 16 GB RAM, 160 GB disk
- 6.2 MSI2000 (see the cross-check after this list)
- New bid ongoing: ~400 equivalent machines expected
Disk servers
- 1.2 PB, Sun X4500 (Thumpers)
- Order placed for an additional 0.6 PB
Mass storage
- Cartridges for populating the SUN/STK SL8500 library
  - Both STK T and LTO4
- Increase of the cartridge storage capacity: 2 PB
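A quick cross-check of the CPU figures quoted above, assuming "2 CPUs, 8 cores" means eight cores per node (two quad-core processors):

```python
# Back-of-the-envelope check of the worker-node figures on this slide.
# Assumption: "2 CPUs, 8 cores" means 8 cores per node in total.
nodes = 479
cores_per_node = 8
total_msi2k = 6.2                 # MSI2000 quoted for the whole batch of nodes

total_cores = nodes * cores_per_node
per_core_ksi2k = total_msi2k * 1000 / total_cores
print(f"{total_cores} cores in total, ~{per_core_ksi2k:.2f} kSI2000 per core")
# prints: 3832 cores in total, ~1.62 kSI2000 per core
```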

F. Hernandez 21
Site Availability
Site availability daily score: May – October 2007
Sources:
Below the target

F. Hernandez 22
VO-specific tests
Source: Alberto Aimar, Management Board meeting, 20th Nov. 2007
We need to understand why our site is not perceived as available by CMS, in spite of the number of jobs and the data being constantly transferred to and from the site
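To make the daily score on the previous slides concrete: one common way to derive such a number is the fraction of test rounds in a day in which all critical tests pass. The sketch below illustrates that idea only; it is not the official WLCG/SAM availability algorithm, and the test names are placeholders.

```python
# Simplified illustration of a daily availability score: the fraction of test
# rounds in which all critical tests passed. A sketch of the idea only, not
# the official WLCG/SAM algorithm; "CE" and "SRM" are placeholder test names.
from typing import Dict, List

def daily_availability(rounds: List[Dict[str, bool]]) -> float:
    """rounds: one dict per test round, mapping critical test name to pass/fail."""
    if not rounds:
        return 0.0
    passed = sum(1 for r in rounds if all(r.values()))
    return passed / len(rounds)

# Example: 24 hourly rounds, two of them with a failed CE test.
rounds = [{"CE": True, "SRM": True}] * 22 + [{"CE": False, "SRM": True}] * 2
print(f"daily score: {daily_availability(rounds):.0%}")   # prints: daily score: 92%
```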

F. Hernandez 23
Current work
Top priority: stability!!!
Continue the integration of the grid services into standard operations
- Including the on-call service
- Monitoring & alerting, progressively documenting procedures, identifying roles and levels of service, etc.
- Strong interaction with the people developing the grid operations portal (CIC)
Consolidation of the services
- Deploy for availability
  - Hardware redundancy
  - Usage of (real or virtual) stand-by machines
  - Including VO Boxes
Assigning job priorities based on VOMS roles & groups
- Interim solution in place

F. Hernandez 24
Current work (cont.)
Continuous development of BQS
- For coping with the expected load in the years ahead
- For making it more grid-aware
  - Keep grid attributes in the job records: submitter identity, grid name, VO name, etc.
  - Scheduling based on grid-related attributes: VOMS roles/groups, grid identity, etc. (see the sketch after this list)
  - Allow/deny job execution based on grid identity
- Development of a gLite CE- and CREAM-compatible BQS-backed computing element
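To illustrate what "scheduling based on VOMS roles/groups" can look like, the sketch below maps a VOMS FQAN to a batch priority. The FQAN format (/vo/…/Role=…) is standard VOMS, but the priority table and the mapping are a hypothetical example, not BQS internals.

```python
# Hypothetical mapping from a VOMS FQAN to a batch-scheduling priority.
# The FQAN syntax is standard VOMS; the priority values are made up.
PRIORITIES = {
    ("cms", "production"): 100,   # e.g. central production gets the highest share
    ("cms", None): 50,            # ordinary CMS analysis users
}

def parse_fqan(fqan: str):
    """Split e.g. '/cms/Role=production/Capability=NULL' into (vo, role)."""
    parts = [p for p in fqan.strip("/").split("/") if p]
    vo = parts[0]
    role = None
    for p in parts[1:]:
        if p.startswith("Role=") and p != "Role=NULL":
            role = p.split("=", 1)[1]
    return vo, role

def priority(fqan: str) -> int:
    vo, role = parse_fqan(fqan)
    return PRIORITIES.get((vo, role), PRIORITIES.get((vo, None), 10))

print(priority("/cms/Role=production"))   # prints: 100
print(priority("/cms/Role=NULL"))         # prints: 50
```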

F. Hernandez 25
Current work (cont.)
Storage services
- Consolidating the AFS service
- Continuous work on the internal reconfiguration of HPSS according to the (known) needs of the LHC experiments
- Planned upgrade of the hardware for the core machines of dCache/SRM
Trying to understand how the data will be accessed
- What are the required rates for data transfers between MSS → disk → worker nodes, and backwards…
- …for each of the several kinds of job (reconstruction, simulation, analysis, …)
Job profiling (see the sketch after this list)
- Studying the observed usage of memory and CPU time for LHC jobs
- Memory requirements have a significant impact on the budget and on the capacity of our site to efficiently exploit the purchased hardware
Understanding how to build an Analysis Facility
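The job-profiling idea above can be sketched as a summary of observed CPU time and peak memory per job type taken from accounting records. The record format and numbers below are invented for illustration; this is not the actual BQS accounting schema.

```python
# Hypothetical sketch of job profiling: summarise observed CPU time and peak
# memory per job type from batch accounting records. The record layout and
# all values are made up for illustration only.
from collections import defaultdict
from statistics import mean

records = [
    # (job type, CPU time in hours, peak resident memory in GB)
    ("reconstruction", 6.5, 1.8),
    ("simulation",     9.0, 1.1),
    ("analysis",       1.2, 2.3),
    ("analysis",       0.8, 2.0),
]

by_type = defaultdict(list)
for job_type, cpu_h, mem_gb in records:
    by_type[job_type].append((cpu_h, mem_gb))

for job_type, vals in by_type.items():
    cpu = mean(v[0] for v in vals)
    mem = mean(v[1] for v in vals)
    print(f"{job_type:15s} mean CPU {cpu:4.1f} h, mean peak memory {mem:.1f} GB")
```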

F. Hernandez 26
Facility Upgrade
Last July, a major effort to upgrade the electrical and cooling infrastructure of the site was completed
- From 500 kW to 1000 kW of electrical power usable for computing equipment
- +600 kW of cooling capacity
Improvements include
- A new diesel generator (880 kW, 72 h autonomy)
- 2 additional UPS units (500 kW each)
- Significant improvement of the electrical distribution
- 1 additional liquid cooler, a pipe network for cold water, 7 additional chilled water units (for the machine room and for the UPS)
Budget: more than 1.5 M€
- A 2+ year-long project
- People have made (almost heroic) efforts to keep the site in (near normal) operating conditions

F. Hernandez 27
New building
Courtesy of Dominique Boutigny

F. Hernandez 28
New building (cont.)
Ongoing project to build an additional machine room
- 800 m² of floor space
- Electrical power for computing equipment: 1 MW at the beginning, with capacity to increase up to 2.5 MW
- Offices for around 30 additional people
- Meeting rooms and a 140+ seat amphitheatre
Encouraging signals recently received regarding funding, but the second machine room won't be available before 2010, as needed
- Currently evaluating temporary solutions (offsite hosting of the equipment, shortened renewal intervals by leasing the hardware, use of modular transportable machine rooms to be installed in the parking lot)

F. Hernandez 29
New building (cont.)
Courtesy of Dominique Boutigny

F. Hernandez/F. Malek 30

F. Hernandez 31
What's next today
In the coming presentations you will find the detailed status and plans of the
- Storage infrastructure and data transfers
- Grid services
- Site operations
- Network infrastructure

F. Hernandez 32 Questions

F. Hernandez 33 Fabien Wernli, 2006