1 Model (CMS) T2 setup for end users
Artem Trunov for the EKP team – Uni Karlsruhe

2 Intro – use cases
A T2 is:
- where physicists are supposed to do analysis
- serving a multi-institution community
Users need to:
- edit code, build and debug applications
- submit, follow and debug jobs
- store logs and other non-data files
- store data files
- have easy and convenient access to storage to manipulate their data files
- work as part of an AWG (Analysis Working Group) spanning institutes

3 Presently, a Grid-only environment does not satisfy the needs of an end user.
Local access is required in order to work.

4 CMS T2 (from Computing TDR)
User-visible services required at each Tier-2 centre include:
- Medium- or long-term storage of required data samples. For analysis work, these will be mostly AOD, with some fraction of RECO. RAW data may be required for calibration and detector studies.
- Transfer, buffering and short-term caching of relevant samples from Tier-1s, and transfer of produced data to Tier-1s for storage.
- Provision and management of temporary local working space for the results of analysis.
- Support for remote batch job submission.
- Support for interactive bug finding, e.g. fault finding for crashing jobs.
- Optimised access to CMS central database servers, possibly via replicas or proxies, for obtaining conditions and calibration data.
- Mechanisms for prioritisation of resource access between competing remote and local users, in accordance with both CMS and local policies.
To support the above user-level services, Tier-2s must provide the following system-level services:
- Accessibility via the workload management services described in Section 4.8 and access to the data management services described in Section 4.4.
- Quotas, queuing and prioritisation mechanisms for CPU, storage and data transfer resources, for groups and individual users.
- Provision of the required software installation to replicate the CMS ‘offline environment’ for running jobs.
- Provision of software, servers and local databases required for the operation of the CMS workload and data management services.
Additional services may include:
- Job and task tracking, including provenance bookkeeping, for groups and individual users.
- Group and personal CVS and file catalogues.
- Support for local batch job submission.

5 Access to a T2
To facilitate users’ work, and in accordance with the CMS C-TDR, a T2 should provide to all “associated” users (all national users and officially agreed international ones):
- a means to log in to the given T2 site
- an opportunity to debug their jobs on the T2, eventually following jobs on the WNs
- access to (local or global) home and group space for log files, code, builds, etc.
- direct access to (local or global) group storage for custom skims, ntuples, etc.

6 Details: logins for users
Gsissh for logins
- This mode of access is already happily used by the experiment’s admins on the VO boxes – the technology is proven by time.
- The ideal model is gsissh access to a general interactive login cluster. Interactive machines are used for building, debugging, the grid UI, etc. A user’s DN is mapped to a unique local account (preferably not a generic one like cms001). Jobs coming via the LCG/gLite CE are mapped to the same account.
- The minimal model is access to the VO box, where gsissh is already provided for CMS admins.
Simplifying user management
- Students come and go – how to reduce the burden of administration?
- A local passwordless account is created for every CMS user who receives a national Grid certificate (certain filtering could be applied on the DN, if desired). At the same time the grid map file on the VO box or interactive cluster is updated to allow gsissh logins.
- When a user’s certificate expires or is revoked, the account (or gsissh access) is automatically disabled and later automatically removed (see the sketch below).
User’s home and workgroup dirs
- Ideally a user wants to log in everywhere but have only one home dir, to avoid extra syncing.
- The winning solution: global user home dirs and group dirs on AFS, hosted at one centre, for example at DESY. This simplifies local user management for admins, since local accounts are created without a home directory. Users will need to klog to the AFS cell with their AFS password, which also provides additional security and access control to group space, plus caching.
Options for debugging
- A special grid/local queue with one or a few nodes where users can log in and debug jobs.
- Could also give (gsissh) access to all worker nodes so that users can debug their jobs in situ.
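A minimal sketch of the account-lifecycle automation described above, assuming a simple registry file of DNs with certificate expiry dates and the standard grid-mapfile location; the file paths, CSV format and account-naming scheme are illustrative assumptions, not the site's actual tooling:

```python
# Sketch: create passwordless local accounts for registered CMS users, map their
# DNs in the grid-mapfile (for gsissh and the CE), and lock accounts whose
# certificates have expired.  Paths and the registry format are assumptions.
import csv
import subprocess
from datetime import date

GRID_MAPFILE = "/etc/grid-security/grid-mapfile"   # standard location, assumed writable
DN_REGISTRY  = "/etc/cms/registered_users.csv"     # hypothetical: "dn,login,expiry(YYYY-MM-DD)"

def account_exists(login: str) -> bool:
    return subprocess.run(["id", login], capture_output=True).returncode == 0

def main() -> None:
    mappings = []
    with open(DN_REGISTRY) as f:
        for dn, login, expiry in csv.reader(f):
            if date.fromisoformat(expiry) < date.today():
                # Certificate expired or revoked: disable the account now;
                # a later cleanup pass could remove it entirely.
                if account_exists(login):
                    subprocess.run(["usermod", "--lock", login], check=True)
                continue
            if not account_exists(login):
                # Unique, passwordless local account; no local home dir,
                # since the home directory lives on the global AFS cell.
                subprocess.run(["useradd", "--no-create-home", login], check=True)
            mappings.append(f'"{dn}" {login}')
    # Rewrite the grid-mapfile so gsissh logins and CE jobs map to the same account.
    with open(GRID_MAPFILE, "w") as f:
        f.write("\n".join(mappings) + "\n")

if __name__ == "__main__":
    main()
```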

7 Details: storage for users
User-produced data (custom skims, ntuples) should go to storage space on an SE where it is available for user management, job access and transfers. Local POSIX access via /pnfs or /dpm is highly desirable.
Quotas and disk space management
- User quotas are not enforced, only group quotas.
- Group storage has the g+w and sticky bits set so that every group dir is writable by any member (see the sketch below).
- There is a group manager who is responsible for maintaining the disk space: talking to users who take too much space, removing old data, negotiating new space quotas with the admins, etc.
Archiving to tape
- By default, user data is not archived to tape, i.e. it is not placed in tape pools (where tape is available).
- When necessary, the group manager can physically copy the data to the tape pool for archiving. The path most likely changes.
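A small sketch of the group-space conventions above, assuming a POSIX-mounted group area; the path /pnfs/site.de/cms/group/higgs is a hypothetical example:

```python
# Sketch: make a group directory writable by all group members and give the
# group manager a per-owner usage report, so oversized users can be contacted.
import os
import stat
from collections import defaultdict
from pwd import getpwuid

GROUP_DIR = "/pnfs/site.de/cms/group/higgs"   # hypothetical group area

def open_group_dir(path: str) -> None:
    """Set g+w (members can write), setgid (keep group ownership on new files)
    and the sticky bit (members cannot delete each other's files)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IWGRP | stat.S_ISGID | stat.S_ISVTX)

def usage_by_owner(path: str) -> dict:
    """Sum up bytes per file owner below the group directory."""
    usage = defaultdict(int)
    for root, _, files in os.walk(path):
        for name in files:
            st = os.stat(os.path.join(root, name))
            usage[getpwuid(st.st_uid).pw_name] += st.st_size
    return dict(usage)

if __name__ == "__main__":
    open_group_dir(GROUP_DIR)
    for owner, nbytes in sorted(usage_by_owner(GROUP_DIR).items(), key=lambda x: -x[1]):
        print(f"{owner:12s} {nbytes / 1e12:6.2f} TB")
```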

8 A federation of T2 sites
[Diagram: two T2 sites (Site1, Site2), each with login nodes, batch nodes and /pnfs storage (/pnfs/site1.de/, /pnfs/site2.de/), sharing common infrastructure – AFS (/afs/site1.de/), CVS ($CVSROOT), Web, DB – accessed by the user. Could add T3s as well!]

9 T3 site
Main difference – fully dedicated to local activity.
- In principle, no need for grid tools. However, a CE would still be a benefit, to allow sharing of spare CPU.
- With 8 cores per WN, a 1 Gb/s per-rack link (like at the CC) is not enough for analysis, which requires fast data access. Should aim at 1 Gb/s per WN to the core router; for servers, is 10 Gb/s still too expensive? (See the estimate below.)
- NAS boxes for remote data serving with xrootd; also PROOF and batch as virtual machines. A modern server like the Sun Thumper: 8 cores, 20 TB. In an xrootd test by Jean-Yves Nief, a 1 Gb/s link was saturated while the CPU was only at ~20%, so analysis can run on the spare CPU!
[Diagram: traditional batch workers and data servers connected to a core router.]
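A back-of-envelope check of the bandwidth claim above; the per-job read rate and the number of WNs per rack are illustrative assumptions, not figures from the slide:

```python
# Rough estimate of analysis I/O demand versus 1 Gb/s links.
CORES_PER_WN     = 8      # from the slide
WNS_PER_RACK     = 20     # assumed rack size
MB_PER_S_PER_JOB = 5.0    # assumed read rate of one analysis job

rack_demand_gbps = CORES_PER_WN * WNS_PER_RACK * MB_PER_S_PER_JOB * 8 / 1000
wn_demand_gbps   = CORES_PER_WN * MB_PER_S_PER_JOB * 8 / 1000

print(f"whole rack needs ~{rack_demand_gbps:.1f} Gb/s -> a shared 1 Gb/s rack uplink is far too small")
print(f"one WN needs    ~{wn_demand_gbps:.2f} Gb/s -> a 1 Gb/s link per WN is comfortable")
```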

10 Link between T3 and higher Tiers
SRM is too heavy and inconvenient.
- If xrootd is set up as the main storage, its staging (and migration) mechanism can be used to access data stored at higher tiers (see the sketch below).
- Could also use xrootd’s migration mechanism to upload the data to higher tiers automatically.
- An xrootd←dCache link solution is deployed at the CC.
[Diagram: a job’s file open on the Xrootd/PROOF data servers at Site2 triggers a dccp or srmcp transfer from the /pnfs storage at Site1.]
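A minimal sketch of the dccp/srmcp fetch step shown in the diagram: copy a file that is missing from the local xrootd pool out of the remote dCache instance. The host names, endpoints, local pool path and the way such a helper would be hooked into xrootd's staging machinery are assumptions, not the deployed CC solution:

```python
# Sketch: fetch a /pnfs file from the remote site into the local xrootd pool,
# trying the lightweight dccp first and falling back to srmcp.
import subprocess
import sys
from pathlib import Path

REMOTE_DOOR = "dcap://dcache.site1.de:22125"                  # hypothetical dCache door
REMOTE_SRM  = "srm://srm.site1.de:8443/srm/managerv2?SFN="    # hypothetical SRM endpoint
LOCAL_POOL  = Path("/data/xrootd")                            # hypothetical local pool

def stage(pnfs_path: str) -> Path:
    """Copy /pnfs/<...> from the remote site into the local pool."""
    target = LOCAL_POOL / Path(pnfs_path).name
    try:
        subprocess.run(["dccp", f"{REMOTE_DOOR}{pnfs_path}", str(target)], check=True)
    except (subprocess.CalledProcessError, FileNotFoundError):
        subprocess.run(["srmcp", f"{REMOTE_SRM}{pnfs_path}", f"file:///{target}"], check=True)
    return target

if __name__ == "__main__":
    print(stage(sys.argv[1]))
```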

11 T3 or T2 with SRM
[Diagram: Tier transfers go over srm:// into a dCache import/export pool; jobs read via root:// from an XROOTD analysis pool; “staging” from the dCache pool into the xrootd pool is implemented, while “migration” back to dCache is not implemented.]

12 T3 or T2 without SRM
[Diagram: Tier transfers arrive via gsiftp:// at a transfer server on a 10 Gb/s link; data lives on a POSIX file system (GPFS, xrootd, etc.); jobs access it via POSIX and root://.]

13 Analysis Facility at a T1
A “virtual T2 site” inside the T1:
- Shared infrastructure and batch.
- Dedicated storage space.
- Access to the entire T1 storage (!).
- Access to the local batch system; CPU time dedicated to local VO members.
- Authorization mechanism: a national group in VOMS (see the sketch below).
[Diagram: as in the T2 federation – login nodes, batch nodes, /pnfs storage and shared AFS/CVS/Web/DB infrastructure – with the virtual T2 site carved out of the T1.]
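A sketch of the VOMS-based authorization idea: admit a user to the analysis facility only if their proxy carries the national group's FQAN. The group name /cms/dcms is an example, not necessarily the one used at the site:

```python
# Sketch: check the current VOMS proxy for membership of the national group.
import subprocess

REQUIRED_GROUP = "/cms/dcms"   # hypothetical national VOMS group

def has_national_group() -> bool:
    """Return True if the proxy's FQANs include the required national group."""
    out = subprocess.run(["voms-proxy-info", "-fqan"],
                         capture_output=True, text=True, check=True).stdout
    return any(line.strip().startswith(REQUIRED_GROUP) for line in out.splitlines())

if __name__ == "__main__":
    print("access granted" if has_national_group() else "access denied")
```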

14 PROOF at CC-IN2P3
PROOF agents run on the xrootd cluster and take advantage of the following:
- free CPU cycles, thanks to the low overhead of xrootd
- a zero-cost solution – no new hardware involved
- direct access to data on disk, not using network bandwidth; a 1 Gb/s node interconnect when inter-server access is required
- transparent access to the full data set stored at our T1, for all experiments, via the xrootd–dCache link deployed on this xrootd cluster and dynamic staging
- management of the infrastructure fits conveniently into existing xrootd practices
- this setup is closer to a possible 2008 PROOF solution because of the 1 Gb/s node connection and the large “scratch” space
- this is the kind of setup that T2 sites may also consider deploying (see the usage sketch below)
[Diagram: a local user session and a remote session (GSI auth via the VO box) connect to the PROOF master, which drives workers on the XROOTD analysis pool; files are staged from HPSS with rfcp.]
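A minimal PyROOT sketch of how an end user could run over data served by the xrootd analysis pool through PROOF; the master host, redirector, tree name and selector are placeholders, not the actual CC-IN2P3 configuration:

```python
# Sketch: open a PROOF session and process an xrootd-resident chain with a selector.
import ROOT

# Connect to the PROOF master (GSI-authenticated in the setup described above).
proof = ROOT.TProof.Open("proof-master.example.org")

# Build a chain of files addressed through the xrootd redirector; missing files
# would be staged dynamically via the xrootd-dCache link.
chain = ROOT.TChain("Events")                                  # assumed tree name
chain.Add("root://xrootd.example.org//store/user/someuser/ntuple_1.root")
chain.Add("root://xrootd.example.org//store/user/someuser/ntuple_2.root")

# Hand the chain to PROOF and process it with a user-supplied TSelector.
chain.SetProof()
chain.Process("MyAnalysisSelector.C+")                         # hypothetical selector
```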

