Model (CMS) T2 setup for end users

Presentation transcript:

Model (CMS) T2 setup for end users Artem Trunov for the EKP team, EKP – Uni Karlsruhe

Intro – use cases

A T2 is:
- where physicists are supposed to do analysis
- serving a multi-institution community

Users need to:
- edit code, build and debug apps
- submit, follow and debug jobs
- store logs and other non-data files
- store data files
- have easy and convenient access to storage to manipulate their data files
- work as part of an AWG (Analysis Working Group) spanning across institutes

Presently, a Grid-only environment does not satisfy the needs of an end user. Local access is required to work effectively.

CMS T2 (from the Computing TDR)

User-visible services required at each Tier-2 centre include:
- Medium- or long-term storage of required data samples. For analysis work, these will be mostly AOD, with some fraction of RECO. RAW data may be required for calibration and detector studies.
- Transfer, buffering and short-term caching of relevant samples from Tier-1s, and transfer of produced data to Tier-1s for storage.
- Provision and management of temporary local working space for the results of analysis.
- Support for remote batch job submission.
- Support for interactive bug finding, e.g. fault finding for crashing jobs.
- Optimised access to CMS central database servers, possibly via replicas or proxies, for obtaining conditions and calibration data.
- Mechanisms for prioritisation of resource access between competing remote and local users, in accordance with both CMS and local policies.

To support the above user-level services, Tier-2s must provide the following system-level services:
- Accessibility via the workload management services described in Section 4.8 and access to the data management services described in Section 4.4.
- Quotas, queuing and prioritisation mechanisms for CPU, storage and data transfer resources, for groups and individual users.
- Provision of the required software installation to replicate the CMS ‘offline environment’ for running jobs.
- Provision of software, servers and local databases required for the operation of the CMS workload and data management services.

Additional services may include:
- Job and task tracking including provenance bookkeeping for groups and individual users.
- Group and personal CVS and file catalogues.
- Support for local batch job submission.

Access to a T2

To facilitate users’ work, and in accordance with the CMS C-TDR, a T2 should provide to all “associated” users (all national and officially agreed international users):
- a means to log in to a given T2 site
- an opportunity to debug their jobs on a T2, eventually following jobs onto the WNs
- access to (local or global) home and group space for log files, code, builds etc.
- direct access to (local or global) group storage for custom skims, ntuples etc.

Details: logins for users

Gsissh for logins
- This mode of access is happily used by experiments’ admins on the VO boxes – a technology proven by time.
- The ideal model is gsissh access to a general-purpose interactive login cluster. Interactive machines will be used for building, debugging, grid UI, etc. A user’s DN is mapped to a unique local account (preferably not a generic one like cms001). Jobs coming via the LCG/gLite CE are mapped to the same account.
- The minimal model: access to the VO BOX, where gsissh is already provided for CMS admins.

Simplifying user management
- Students come and go – how to reduce the burden of administration?
- A local passwordless account is created for every CMS user who receives a national Grid certificate (certain filtering could be applied on the DN, if desired). At the same time the grid-mapfile on the VO Box or interactive cluster is updated to allow gsissh logins.
- When a user’s certificate expires or is revoked, his account (or gsissh access) is automatically disabled and later automatically removed.

User’s home and workgroup dirs
- Ideally a user wants to log in everywhere but have only one home dir, to avoid extra syncing.
- Winning solution – global user home dirs and group dirs on AFS, hosted at one centre, for example at DESY. This also simplifies local user management for admins, since local accounts are created without a home directory. Users will need to klog to the AFS cell with their AFS password. This also provides additional security, access control to group space, and caching.

Options for debugging
- A special grid/local queue with one or a few nodes where users can log in and debug jobs.
- Could also give (gsissh) access to all worker nodes, so users can debug their jobs in vivo.
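The certificate-driven account lifecycle above can be sketched as follows. This is a minimal illustration, not an existing tool: the function name and the user-record layout are assumptions; only the grid-mapfile line format ("DN" account) follows the usual convention consulted by gsissh.

```python
from datetime import datetime, timezone

def update_grid_mapfile(users, now):
    """Build grid-mapfile entries for users whose certificates are valid.

    users: list of dicts with keys 'dn', 'account', 'expires' (a datetime).
    Returns (active_entries, disabled_accounts): active entries go into
    the grid-mapfile used for gsissh logins; the other accounts are
    disabled (and can later be removed automatically).
    """
    active, disabled = [], []
    for u in users:
        if u["expires"] > now:
            # grid-mapfile format: "<certificate DN>" <local account>
            active.append('"{0}" {1}'.format(u["dn"], u["account"]))
        else:
            disabled.append(u["account"])
    return active, disabled
```

Running such a check periodically (e.g. from cron) keeps the mapfile in sync with the national CA without per-user admin work.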

Details: storage for users

- User-produced data (custom skims, ntuples) should go to storage space on an SE where it is available for user management, job access and transfers. Local POSIX access via /pnfs or /dpm is highly desirable.

Quotas and disk space management
- User quotas are not enforced, only group quotas.
- Group storage has the g+w sticky bit set, such that every group dir is writable by any member.
- There is a group manager who is responsible for maintaining disk space: talking to users who take too much space, removing old data, negotiating new space quotas with admins, etc.

Archiving to tape
- By default, a user’s data is not archived to tape, i.e. it is not in tape pools (where tape is available).
- When necessary, the group manager can physically copy the data to the tape pool for archiving. The path is most likely changed in that case.
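The group-manager workflow above amounts to periodic per-group accounting against a group quota. A minimal sketch, assuming a flat listing of (group, size) records such as one extracted from a storage-system dump; the function name and data shapes are illustrative:

```python
def group_usage_report(files, quotas):
    """Sum per-group usage and flag groups that exceed their quota.

    files:  iterable of (group, size_in_bytes) tuples,
            e.g. parsed from an SE namespace dump.
    quotas: dict mapping group name to its quota in bytes;
            groups without a quota are never flagged.
    Returns dict: group -> (bytes_used, over_quota_flag).
    """
    used = {}
    for group, size in files:
        used[group] = used.get(group, 0) + size
    return {g: (u, u > quotas.get(g, float("inf")))
            for g, u in used.items()}
```

The group manager can then contact users in any flagged group, rather than enforcing hard per-user quotas.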

A federation of T2 sites

[Diagram: two T2 sites (Site1, Site2), each with login nodes, batch nodes and local /pnfs storage; both sites cross-mount /pnfs/site1.de/ and /pnfs/site2.de/, share /afs/site1.de/ and a common $CVSROOT, and rely on shared infrastructure: AFS, CVS, Web, DB. T3s could be added to the federation as well.]

T3 site

- Main difference – fully dedicated to local activity.
- In principle, no need for grid tools; however a CE would still be a benefit, to allow sharing of spare CPU.
- With 8 cores per WN, a 1 Gb/s per-rack uplink (like at the CC) is not enough for analysis, which requires fast data access. Should aim at 1 Gb/s per WN to the core router. For servers: is 10 Gb/s still too expensive?
- Layout: traditional batch workers behind a core router, plus NAS boxes for remote data serving with xrootd; also PROOF and batch as virtual machines. A modern server like a Sun Thumper: 8 cores, 20 TB. In an xrootd test by Jean-Yves Nief, a 1 Gb/s link was saturated while the CPU was only at ~20% – so the data servers can run analysis on their spare CPU!
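The bandwidth argument above is easy to check with a back-of-the-envelope calculation. The figure of 20 worker nodes per rack is an assumption for illustration; the slide only fixes 8 cores per WN and the 1 Gb/s link speeds:

```python
def mb_per_sec_per_core(link_gbps, worker_nodes, cores_per_node):
    """Sustained read bandwidth available per analysis core, in MB/s,
    when `worker_nodes` nodes share a single uplink of `link_gbps` Gb/s."""
    total_mb_s = link_gbps * 1000 / 8.0   # Gb/s -> MB/s (decimal units)
    return total_mb_s / (worker_nodes * cores_per_node)

# Shared 1 Gb/s rack uplink, 20 WNs x 8 cores: ~0.8 MB/s per core.
# Dedicated 1 Gb/s per WN, 8 cores:            ~15.6 MB/s per core.
```

Under a megabyte per second per core is clearly too little for I/O-bound analysis, which is the case for moving to 1 Gb/s per WN.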

Link between a T3 and higher Tiers

- SRM is too heavy and inconvenient.
- If xrootd is set up as the main storage, its staging (and migration) mechanism can be used to access data stored at higher tiers.
- Could also use xrootd’s migration mechanism to upload data to higher tiers automatically.
- An xrootd←dCache link solution is deployed at the CC.

[Diagram: on file open, the xrootd/PROOF data servers at Site2 stage the requested file from the Site1 /pnfs storage via a dccp or srmcp transfer.]

T3 or T2 with SRM

[Diagram: Tier transfers arrive over srm:// into a dCache import/export pool; jobs read from an XROOTD analysis pool via root://. The “staging” path (dCache pool → xrootd pool) is implemented; the reverse “migration” path (xrootd pool → dCache pool, then out via srm://) is not implemented.]

T3 or T2 without SRM

[Diagram: Tier transfers arrive over gsiftp:// at a transfer server with a 10 Gb/s link, which writes via POSIX into a cluster file system (GPFS, xrootd etc.); jobs read the same file system via POSIX or root://.]

Analysis Facility at a T1

- Shared infrastructure and batch with the T1 site
- Dedicated storage space
- Access to the entire T1 storage (!)
- Access to the local batch, with CPU time dedicated to local VO members
- Auth. mechanism: a national group in VOMS

[Diagram: a “virtual T2 site” inside the T1 – login nodes and batch nodes with access to /pnfs/site1.de/, /pnfs/site2.de/, /afs/site1.de/, $CVSROOT, the T1 /pnfs storage and the shared infrastructure: AFS, CVS, Web, DB.]
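Authorization by a national VOMS group reduces to checking the user's FQANs against a group prefix. A minimal sketch; the function name and the example group `/cms/dcms` are illustrative assumptions, not a fixed CMS convention:

```python
def authorize(fqans, national_group="/cms/dcms"):
    """Grant analysis-facility access if any of the user's VOMS FQANs
    is the national group itself or falls under it (subgroups, roles).

    fqans: list of FQAN strings from the user's VOMS proxy,
           e.g. '/cms/dcms/Role=NULL/Capability=NULL'.
    """
    return any(f == national_group or f.startswith(national_group + "/")
               for f in fqans)
```

Members of the national group are then mapped to the dedicated batch share and storage space, while other VO members are not.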

PROOF at CC-IN2P3

PROOF agents are run on the xrootd cluster and take advantage of the following:
- free CPU cycles, due to the low overhead of xrootd
- a zero-cost solution – no new hardware involved
- direct access to data on disk, using the ~1 Gb/s node interconnect only when inter-server access is required
- transparent access to the full data set stored at our T1, for all experiments, via the xrootd–dCache link deployed on this xrootd cluster and dynamic staging
- management of the infrastructure fits conveniently into existing xrootd practices
- this setup is closer to a possible 2008 PROOF solution because of the ~1 Gb/s node connection and large “scratch” space
- this is the kind of setup that T2 sites might also consider deploying

[Diagram: a local user session and a remote session (GSI auth via the VO BOX) connect to the PROOF master, which drives PROOF workers co-located with the XROOTD analysis pool; data is staged from HPSS (IFZ) via rfcp.]