LFC Status and Futures
INFN T1+T2 Cloud Workshop
James Casey, CERN, 22 November 2006
About this talk
What this is about:
- A summary of the current state of the LFC, with a focus on a few outstanding issues
- Mostly about WLCG VO usage, but I can answer questions about wider usage
What it isn't:
- A tutorial, though there are pointers to tutorial (and other) information at the end
- But stop me if I don't explain something enough
Current LFC Implementation
The currently supported version of the LFC is part of gLite 3.0 (Update 06)
- glite-LFC_mysql and glite-LFC_oracle YAIM profiles
The main feature set is stable:
- Hierarchical namespace, with replicas attached to LFNs
- Limited "file system" metadata attached to entries
- Transactions and sessions (see the sketch below)
- GSI authenticated and authorized access, with VOMS integration
- Oracle and MySQL as possible database backends
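A minimal sketch of what that feature set looks like from client code, using the lfc Python bindings shipped with the client. The host, paths, and replica details are hypothetical, and the binding signatures have varied slightly across releases, so treat this as illustrative rather than authoritative:

    import os
    import uuid

    import lfc  # LFC client Python bindings (assumed installed)

    os.environ["LFC_HOST"] = "lfc.example.org"   # hypothetical catalog host

    # A session keeps one authenticated connection open across many calls,
    # avoiding a fresh GSI handshake per operation.
    lfc.lfc_startsess("", "demo session")
    try:
        # Hierarchical namespace: directories behave like a file system.
        lfc.lfc_mkdir("/grid/myvo/demo", 0o775)

        # Register an LFN under a new GUID...
        guid = str(uuid.uuid1())
        lfc.lfc_creatg("/grid/myvo/demo/file1", guid, 0o664)

        # ...and attach a replica (SE host + SFN) to it; "-" marks the
        # replica available and "P" marks it permanent.
        lfc.lfc_addreplica(guid, None, "se.example.org",
                           "srm://se.example.org/myvo/file1",
                           "-", "P", "", "")
    finally:
        lfc.lfc_endsess()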
Usage
WLCG usage:
- ALICE currently use the LFC as a local catalog, but this is under review and they might change to a "trivial" local catalog implementation, similar to what CMS uses currently
- ATLAS use the LFC as both a central and a local catalog; the ATLAS DDM plan does away with the central catalog, and the "local" catalog is per-cloud, residing at the T1
- CMS use LFC technology for their central DLS catalog
- LHCb use the LFC as a central catalog, with R/O replicas at T1s; currently only one replica, at CNAF
Global usage:
- 30 central LFCs for 106 VOs, with 27 DLIs
- 60 local LFCs for 72 VOs, with 57 DLIs
Particular issues of interest
Recent work focus:
- Replication of the Oracle DB for LHCb
- Service management with Quattor and LEMON
Currently, work is going into the following areas:
- Secondary group support
- SSL session re-use
- Performance evaluation and tuning
Future work (still to be approved by the TCG):
- Native Java API
- Bulk methods
- Web service interface
LFC Service Replication
The LHCb model is for R/O replicas at the Tier-1s
- CNAF is the first one deployed; now in production for a few weeks
We still need to work out how to operate a "distributed & tightly coupled" service
- e.g. co-ordination of maintenance operations
More details in Barbara's talk
Service Management at CERN
The LFC was first deployed during SC3 for the experiments to evaluate as a pilot service
- It was decided that it was a suitable replacement for the EDG RLS
- A migration strategy for existing data was decided, and the migration carried out
At the same time, new nodes were allocated and a fully "quattorized" installation was created
- Done under the WLCG Service Coordination activity
- Followed the 'Dashboard' procedure
Requirements / Development
Hardware

Operations

Service Deployment
Quattor
At CERN, we don't use the full set of LAL components, but aim for an approach that is more integrated with YAIM
We re-use other NCM components for the LFC configuration (illustrative snippet below):
- ncm-castorconf for /etc/shift.conf
- ncm-exportconf for /etc/sysconfig/lfcdaemon
- ncm-sindes for host certificate and DB password propagation
- ncm-yaim for all other 'grid' configuration
We use LEAF for state management:
- Putting a machine in maintenance takes it out of the public load-balanced aliases
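For flavour, the kind of content ncm-exportconf would manage in /etc/sysconfig/lfcdaemon might look like the snippet below. The variable names are illustrative reconstructions, not an authoritative list for any particular LFC release:

    # /etc/sysconfig/lfcdaemon (illustrative; exact variables vary by release)
    RUN_LFCDAEMON="yes"                    # start the catalog daemon at boot
    RUN_DLI="yes"                          # also run the Data Location Interface
    LFCDAEMONLOGFILE="/var/log/lfc/log"    # log file watched by LEMON (see later)
    NSCONFIGFILE="/opt/lcg/etc/NSCONFIG"   # DB connection details (kept private)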
LEMON
We created LEMON sensors to detect LFC-level problems, and also for statistics gathering
- We monitor the LFC daemon processes too, and the GRIS BDII on the node
- Nodes are taken out of the load-balanced alias on alarms
LEMON alarms at CERN

Alarm name         Description                                        Comments
LFCDAEMON_WRONG    no lfcdaemon process running
LFC_DLI_WRONG      no lfc-dli process running
LFC_DB_ERROR       "ORA-" error string detected in /var/log/lfc/log   Oracle specific
LFC_NOREAD         can't stat given directory                         e.g. /grid/ops
LFC_NOWRITE        can't utime on file
LFC_SLOWREADDIR    excessive time taken to read given directory       time > 10 s
LFC_ACTIVE_CONN    number of active connections to the LFC            uses netstat
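A minimal sketch of how two of these probes might be implemented, in plain Python rather than as a real LEMON sensor; the probe directory, the 10 s threshold (from the table), and the use of pgrep and the lfc-ls client are illustrative assumptions:

    import subprocess
    import time

    PROBE_DIR = "/grid/ops"   # hypothetical probe directory (per the table)
    SLOW_THRESHOLD = 10.0     # seconds, per LFC_SLOWREADDIR

    def lfcdaemon_wrong():
        """True if no lfcdaemon process is running (LFCDAEMON_WRONG)."""
        # pgrep exits non-zero when no process matches.
        return subprocess.call(["pgrep", "-x", "lfcdaemon"]) != 0

    def lfc_slowreaddir():
        """True if listing the probe directory fails or takes too long."""
        start = time.time()
        rc = subprocess.call(["lfc-ls", PROBE_DIR])
        return rc != 0 or (time.time() - start) > SLOW_THRESHOLD

    if lfcdaemon_wrong() or lfc_slowreaddir():
        # A real sensor would raise a LEMON alarm here, which in turn
        # pulls the node out of the load-balanced alias.
        print("ALARM: LFC health check failed")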
Central ATLAS LFC – 1 day
[Chart: operation rate sampled at 5-minute intervals; average 8.3 op/s, peak 40 op/s]
Secondary groups
A particular authorization problem: the LFC doesn't support secondary groups yet
- Two different VOMS roles are mapped to two different gids
- The same user might then not be able to access group-owned files, depending on his/her VOMS credentials
Short-term solution: use ACLs (lfc-setacl), as in the example below
Long-term solution: secondary group support
- Code already donated by the grid.it team; some issues still to work out in the code
- The LFC developers don't want to support secondary groups via grid-mapfile, but the code will make it into the code base
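As a concrete picture of the workaround, the sketch below grants a second VOMS-mapped group access to a shared directory with lfc-setacl. The catalog host, directory, group name, and the assumption that lfc-setacl follows POSIX setfacl's "type:name:perms" entry syntax (including "d:" for default ACLs on directories) are illustrative, not taken from the talk:

    import os
    import subprocess

    os.environ["LFC_HOST"] = "lfc.example.org"   # hypothetical catalog host
    path = "/grid/myvo/shared"                   # hypothetical directory

    # Give a second VOMS-mapped group read/write access...
    subprocess.check_call(
        ["lfc-setacl", "-m", "g:myvo/Role=production:rwx", path])
    # ...and a default ACL so entries created later inherit it (assumed syntax).
    subprocess.check_call(
        ["lfc-setacl", "-m", "d:g:myvo/Role=production:rwx", path])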
SSL session re-use
SSL authentication is the major bottleneck on the LFC nodes
- It causes a load bottleneck: 20 concurrent SSL authentications on a dual 2.8 GHz WN
- It is also a round-trip bottleneck: up to 8 round trips for authentication
We are trying to see if SSL session re-use will help, since the majority of operations are usually done by a few DNs from a few nodes
- Prototyping a solution now to evaluate the performance gains (generic illustration below)
- First results show that it will help with the load bottleneck, but maybe not with the round trips: currently still 6 round trips
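To make the idea concrete, here is a generic illustration of TLS session re-use with Python's standard ssl module; the LFC itself sits behind a GSI/SSL layer with its own code paths, so this only demonstrates the underlying mechanism (a resumed session skips the expensive public-key exchange), not the actual prototype:

    import socket
    import ssl

    ctx = ssl.create_default_context()

    def connect(host, session=None):
        sock = socket.create_connection((host, 443))
        # Passing a saved session asks the server for an abbreviated handshake.
        return ctx.wrap_socket(sock, server_hostname=host, session=session)

    first = connect("example.org")   # full handshake: several round trips
    saved = first.session            # session ticket/ID handed out by the server
    first.close()

    second = connect("example.org", session=saved)  # abbreviated handshake
    print("session reused:", second.session_reused)
    second.close()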
Performance analysis
Until recently, most performance tests were "synthetic"
- e.g. in C, not Python; simple repetitive operations
We are trying to move towards tests similar to what the experiments do
- Using the ATLAS DQ2 Python code as an example
Move towards running continual regression tests so we can spot changes
- And add in test cases for code from other VOs
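As a bridge to the numbers on the next slide, here is a sketch of the kind of synthetic test being run: timing the creation of 1024 symlinks inside one session via the LFC Python bindings. The host and paths are hypothetical:

    import os
    import time

    import lfc  # LFC client Python bindings (assumed installed)

    os.environ["LFC_HOST"] = "lfc.example.org"   # hypothetical catalog host
    base = "/grid/myvo/perftest"                 # hypothetical test directory

    lfc.lfc_startsess("", "symlink timing test")
    start = time.time()
    for i in range(1024):
        # One round trip per call: WAN latency dominates the total time.
        lfc.lfc_symlink("%s/target" % base, "%s/link%04d" % (base, i))
    elapsed = time.time() - start
    lfc.lfc_endsess()

    print("created 1024 symlinks in %.1f s" % elapsed)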
First Results
[Table: time to create 1024 symlinks (s), one row per LFC host, one column per CE host; the per-cell alignment is not recoverable from the transcript. LFC hosts: prod-lfc-shared-central.cern.ch, lfc01.pic.es, lfc0448.gridpp.rl.ac.uk, mu11.matrix.sara.nl, lfc.triumf.ca, lfc-atlas.in2p3.fr, lfc-sc.cr.cnaf.infn.it, lfc.grid.sinica.edu.tw. CE hosts: cclcgceli02.in2p3.fr, lcgce01.triumf.ca, grid-ce.physik.uni-wuppertal.de, ce101.cern.ch, lcg2ce.ific.uv.es, lcgce01.gridpp.rl.ac.uk, tbn20.nikhef.nl. Times range from ~7 s for same-site pairs (e.g. lfc.triumf.ca from lcgce01.triumf.ca) up to ~365 s against lfc.grid.sinica.edu.tw; the spread tracks the WAN latencies between CE and LFC host.]
More Information
- Main LFC documentation page
- CERN LFC Admin Guide
- Troubleshooting page
Mailing lists:
- User support
- CERN LFC Administrators
- LFC middleware 3rd level support
Summary
- The LFC is stable and widely used, with many different deployment models in the LHC experiments
- No major functionality changes are foreseen
- The focus is on reliability and service aspects, with some performance analysis ongoing now that the global system is in place
Questions?