GRIF Status Michel Jouvin LAL / IN2P3


1 GRIF Status http://grif.fr Michel Jouvin LAL / IN2P3 jouvin@lal.in2p3.fr

2 12/10/2005 GRIF Tier2 - HEPiX - SLAC 2005 Objectives
- Build a Tier2 facility for simulation and analysis in the Paris region
- 80% for the 4 LHC experiments, 20% for EGEE and local use
- LHC: analysis (2/3) and MC simulation (1/3)
- Be ready at LHC startup (2nd half of 2007)
- Resource goals (end of 2007)
- CPU: 1500 kSI2K (1 kSI2K ~ P4 Xeon 2.8 GHz)
- Storage: 350 TB of disk (disk only, no MSS)
- Network: 10 Gb/s backbone inside the Tier2, 1 Gb/s external link

3 Members
- Project started by DAPNIA (CEA), LAL (IN2P3, Orsay) and LPNHE (IN2P3, Paris) in Fall 2004
- DAPNIA and LAL involved in the Grid effort since the beginning of EDG
- 3 EGEE contracts (2 for operation support)
- No lab big enough to run a T2 by itself
- LLR (IN2P3, Palaiseau) and IPNO (IN2P3, Orsay) joined the project in Sept. 05
- IPNO: nuclear physics (Alice + Agatha)
- LLR: CMS

4 Organization
- 1 EGEE/LCG site, distributed over all labs
- Computing and storage resources in each lab
- Computing rooms and financing
- IPNO will concentrate on funding non-LHC resources
- 1 Gb/s link for IPNO, LAL, LPNHE, “soon” for DAPNIA
- Technical Committee: people from every lab
- 5 FTE in 2005, 6-7 in 2006, more in 2007
- Currently 15-20 people involved (several part time)
- M. Jouvin (chairman), P. Micout, P.F. Honoré…
- Scientific Committee (fund raising)
- J.P. Meyer (DAPNIA/Atlas, chairman), 1 person / lab

5 Finances
- Total budget estimated at 1.6 M€ (2005-2007)
- 30% from the Region council
- 30% from the National Research Agency (ANR)
- 40% from the labs (CEA, CNRS, Paris 6 university)
- No significant support from IN2P3 / LCG France (focused on T1)
- ½ of the budget still uncertain… First answers soon…
- Progressive investment: no HW replacement before 2009
- 2005: 150 K€, 2006: 450 K€, 2007: 1 M€
- If necessary, could use 2008 to spread the effort
- 2009+: 300 K€/year expected from IN2P3/LCG France
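As a quick consistency check on the Finances slide, the yearly investments can be summed and the total split by funding source. This is only a sketch of the arithmetic: the amounts and percentages come from the slide, while the function and variable names are illustrative and not part of the original talk.

```python
# Amounts in K€, copied from the Finances slide.
YEARLY_INVESTMENT_KEUR = {2005: 150, 2006: 450, 2007: 1000}

# Funding shares in percent, copied from the same slide.
FUNDING_SHARES_PCT = {"Region council": 30, "ANR": 30, "Labs": 40}


def total_budget(yearly_keur):
    """Sum the yearly investments (K€)."""
    return sum(yearly_keur.values())


def split_by_source(total_keur, shares_pct):
    """Split a total (K€) according to percentage shares."""
    return {source: total_keur * pct / 100 for source, pct in shares_pct.items()}


if __name__ == "__main__":
    total = total_budget(YEARLY_INVESTMENT_KEUR)
    print(f"Total 2005-2007: {total} K€")
    for source, amount in split_by_source(total, FUNDING_SHARES_PCT).items():
        print(f"  {source}: {amount:.0f} K€")
```

The yearly figures add up to 1600 K€, which agrees with the 1.6 M€ total quoted on the slide.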

6 Current Status
- EGEE/LCG GRIF site created
- IN2P3-LAL decommissioned, resources moved to GRIF
- 2 sites with resources, 2 sites ordering
- DAPNIA: 20 WN CPUs, 12 TB, installation in progress
- LAL: 26 WN CPUs, 8 TB (SRM/DPM), LCG services
- 4.5 TB on order
- LPNHE: 15 WN CPUs, 5 TB, to be ordered soon
- IPNO: 20 WN CPUs (dual core blades)
- End of 2005: 80 WN CPUs, 25 TB
- Separate CE/SE on each site
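The end-of-2005 totals on the Current Status slide can be compared with the per-site figures. The sketch below rests on two assumptions that the slide does not state explicitly: IPNO lists no storage, so it is counted as 0 TB, and the 4.5 TB on order is attributed to LAL because it follows the LAL line. All names are illustrative.

```python
# Per-site figures copied from the Current Status slide.
# Assumption: IPNO lists no storage, so it is counted as 0 TB.
SITES = {
    "DAPNIA": {"wn_cpus": 20, "disk_tb": 12.0},
    "LAL": {"wn_cpus": 26, "disk_tb": 8.0},
    "LPNHE": {"wn_cpus": 15, "disk_tb": 5.0},
    "IPNO": {"wn_cpus": 20, "disk_tb": 0.0},
}

# Assumption: the "4.5 TB on order" item belongs to LAL.
LAL_ON_ORDER_TB = 4.5


def site_totals(sites):
    """Return (total WN CPUs, total disk in TB) over all sites."""
    cpus = sum(site["wn_cpus"] for site in sites.values())
    disk = sum(site["disk_tb"] for site in sites.values())
    return cpus, disk


if __name__ == "__main__":
    cpus, disk = site_totals(SITES)
    print(f"WN CPUs: {cpus}")
    print(f"Disk without the pending order: {disk} TB")
    print(f"Disk with the pending order: {disk + LAL_ON_ORDER_TB} TB")
```

Under these assumptions the sum is 81 WN CPUs, close to the rounded 80 on the slide, and the quoted 25 TB matches the per-site disk figures when the pending 4.5 TB order is left out.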

7 2005 Main Activities…
- Setup of resources on each site
- Global configuration consistency: Quattor chosen
- Flexible site customization inside a single database
- Setup of a multi-site technical team
- Tutorials for new site administrators
- Sharing management load (e.g. middleware upgrade)
- Write documentation for sharing information and expertise (Trac)

8 … 2005 Main Activities
- Evaluate DPM as a storage solution
- Successful so far, easy to set up and manage
- Quattor component written to manage DPM configuration
- Plan to evaluate a multi-site configuration
- Disk servers on several sites
- Current lack of srmcp is a problem with CMS/Phedex
- Participation in LCG SC3
- Throughput phase: 35 MB/s sustained for 4 days
- Plan to join service phase mid-November
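To give a sense of scale for the SC3 throughput figure, the sustained rate can be converted into a total transferred volume. This is a rough sketch that assumes decimal units (1 TB = 1,000,000 MB) and a continuous transfer over the full 4 days; neither assumption is stated on the slide, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def transferred_tb(rate_mb_per_s, days):
    """Volume moved at a constant rate, in decimal TB (1 TB = 1e6 MB)."""
    total_mb = rate_mb_per_s * days * SECONDS_PER_DAY
    return total_mb / 1_000_000


if __name__ == "__main__":
    volume = transferred_tb(rate_mb_per_s=35, days=4)
    print(f"35 MB/s for 4 days is about {volume:.1f} TB")
```

Under these assumptions the throughput phase corresponds to roughly 12 TB moved.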

9 2006: Mini Tier2
- Main goal: set up 20+% of final configuration
- 300 WN CPUs, 70 TB
- Exact size will depend on fund raising success…
- Focus
- Multi-site or mono-site CE/SE resources
- Final choice for batch scheduler: evaluation of LSF and SGE
- Final choice for SE architecture (DPM only, DPM + LUSTRE)
- Setup of monitoring tools: Nagios?, Lemon?, others?
- Integration with local operations on each site
- Miscellaneous
- Continue active participation in SC
- Evaluation of 10 Gb/s link feasibility and effectiveness
- Computer room requirements (electrical power, air cooling…)
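The 2006 targets follow from scaling the end-of-2007 resource goals (1500 kSI2K, 350 TB) by 20%. The sketch below reproduces that arithmetic; it assumes that one WN CPU delivers about 1 kSI2K, in line with the "1 kSI2K ~ P4 Xeon 2.8 GHz" equivalence given with the resource goals. The names are illustrative.

```python
# End-of-2007 resource goals from the Objectives slide.
FINAL_CPU_KSI2K = 1500
FINAL_DISK_TB = 350

# Fraction of the final configuration targeted for 2006, in percent.
MINI_TIER2_PCT = 20


def scaled_target(final_value, percent):
    """Scale a final resource goal by a percentage."""
    return final_value * percent / 100


if __name__ == "__main__":
    cpu = scaled_target(FINAL_CPU_KSI2K, MINI_TIER2_PCT)
    disk = scaled_target(FINAL_DISK_TB, MINI_TIER2_PCT)
    # Assumption: 1 WN CPU ~ 1 kSI2K, so kSI2K maps directly to a CPU count.
    print(f"2006 target: about {cpu:.0f} kSI2K (~{cpu:.0f} CPUs), {disk:.0f} TB")
```

This gives 300 kSI2K and 70 TB, matching the "300 WN CPUs, 70 TB" quoted for the mini Tier2.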

10 Storage Challenge
- Efficient use and management of a large amount of storage seen as the main challenge
- Access to data from 1000+ CPUs, no staging
- Decided to start a partnership with HP on LUSTRE in the Grid (LCG) context
- Performance with a large number of clients
- Geographically distributed LUSTRE configuration
- Replication of critical data (metadata) among sites
- SRM and/or xrootd integration
- Funds requested from ANR, answer soon…
- Uncertainty with HP troubles in France…

11 Batch Scheduler
- 1 unified T2 means 1 batch scheduler
- Required for a coherent view/publishing of resources
- Main requirements
- Efficient use of distributed resources
- Handle 1000+ running jobs, 10K jobs in queues
- Torque may not be appropriate
- Scalability and robustness, lack of dynamic reconfiguration
- Looking at LSF
- LAL has experience for its internal use (and contacts…)
- Multicluster may offer the flexibility for a global unified resource while maintaining some job/resource affinity at each site
- Evaluation to start soon: 1 cluster+CE per site + cross submission
- Other candidates: SGE, Condor?

