A New CDF Model for Data Movement Based on SRM
Manoj K. Jha, INFN Bologna
27th Feb. 2009, University of Birmingham, U.K.

Overview
 Introduction
 CDF
 Tevatron Performance
 CDF Analysis Framework (CAF)
 Present MC Prod. Model
 Data Transfer Model
 Proposed Model
 Test Framework
 Performance Parameters
 Other Activities

Collider Detector at Fermilab (CDF)
 CDF: a large multipurpose particle physics experiment at the Fermi National Accelerator Laboratory (Fermilab)
 Started collecting data in 1988
 Data taking will continue until at least Oct. 2009 (with a desire to extend into 2010)
 p-pbar collisions at 1.96 TeV

Tevatron Performance
 Great Tevatron performance: close to 5 fb-1 delivered
 Monte Carlo (MC) data are needed for detector understanding and physics analysis.
 It is not feasible to produce all the MC data on site due to limited computing resources.
 The only option left is to utilize resources at Tier-1 and Tier-2 remote grid sites using the CDF Analysis Framework (CAF) and bring the MC output back to Fermilab.

CDF Analysis Framework (CAF)
 CAF was developed as a portal.
 A set of daemons (submitter, monitor, mailer) accepts requests from users via Kerberized connections.
 Requests are converted into commands to the underlying batch system.
 A user job consists of several sections; each section has a wrapper associated with it.
 The task of the wrapper is to set up the security envelope and prepare the environment before the actual user code starts.
 When the user code finishes, the wrapper also copies whatever is left to a user-specified location using rcp.
[Diagram: user desktop sends the exe tarball to the CAF head node; worker nodes (WN) copy output back to FNAL file servers using rcp]
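The wrapper's final copy step can be sketched as follows. This is a minimal illustration, not the actual CAF wrapper code; the function name and destination syntax are ours.

```python
import os

def copy_leftovers(workdir, dest):
    """Build the rcp commands the wrapper would run to copy every
    remaining file in a job section's working directory to the
    user-specified destination when the user code finishes."""
    cmds = []
    for name in sorted(os.listdir(workdir)):
        # rcp relies on the Kerberos credentials the wrapper set up
        cmds.append(["rcp", os.path.join(workdir, name), dest])
    return cmds
```

In the real wrapper these commands would be executed directly on the worker node, which is exactly the behavior the next slide identifies as a bottleneck.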

Present CAF MC Prod. Model: Drawbacks
 Presently, CDF relies on the rcp tool to transfer output from worker nodes (WN) to destination file servers. This leads to:
 WNs sitting idle just because another WN is transferring files to the file servers.
 Loss of output from the WN in many cases.
 Waste of CPU and network resources when CDF is using opportunistic resources.
 No mechanism to deal with a sudden burst of output from WNs at the file servers, which happens especially during conference periods.
 Overloading of the available file servers in many cases.
 Users have to resubmit their jobs.
 No catalogue of produced MC files on a per-user basis.
 CDF needs a robust and reliable framework for data transfer from WNs to a Storage Element (SE) at Fermilab.

Data Transfer Model
 Output from a WN is temporarily collected in the SE closest to the WN.
 The transfer (wait) time for MC output on the WN is considerably reduced thanks to the large bandwidth between the WN and its SE.
 The collected output is then transferred sequentially to the destination SE.
 How to manage the files on the SE? Use an SRM-managed SE.
 How to transfer files between SEs? Use the features of Sequential data Access via Meta-data (SAM) for transferring files between its stations.
 The model uses the SAM SRM interface for data transfer.

Storage Resource Manager (SRM)
 Uniform grid interface to heterogeneous storage
 Negotiates protocols
 Eliminates specialized code
 Dynamic space allocation and file management on shared storage components on the grid
 Interface specification v1 is in the field; v2 is the latest
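As an illustration of the client side of such a transfer, the sketch below builds an `srmcp` command line (the dCache SRM client). The endpoint, paths, and option value are hypothetical, and real deployments may use different flags.

```python
def srmcp_command(local_path, srm_url, streams=10):
    """Build an srmcp command line copying a local file to an
    SRM-managed storage element; the SRM endpoint negotiates the
    actual transfer protocol (e.g. GridFTP) with the client."""
    return [
        "srmcp",
        "-streams_num=%d" % streams,
        "file:///" + local_path.lstrip("/"),
        srm_url,
    ]
```

For example: `srmcp_command("/scratch/mc.root", "srm://se.example.org:8443/srm/managerv2?SFN=/cdf/mc.root")`.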

SAM Data Management System
 SAM is Sequential data Access via Meta-data.
 Flexible and scalable distributed model
 Reliable and fault tolerant
 Adapters for many batch systems: LSF, PBS, Condor, FBS
 Adapters for mass storage systems: Enstore, HPSS, dCache, and SRM-managed SEs
 Adapters for transfer protocols: cp, rcp, scp, encp, GridFTP
 Useful in many cluster computing environments: desktop, private network (PN), NFS shared disk, …
 Ubiquitous for CDF users
 SAM station:
 A collection of SAM servers which manage data delivery and caching for a node or cluster.
 The node or cluster hardware itself.

Proposed Model
[Diagram: user desktop submits the exe tarball to the CAF head node; WN output goes to a nearby SAM station, which forwards it to the destination SAM station]

Proposed Model
[Diagram: the same flow generalized to several sites, each with a SAM station near its WNs feeding the destination SAM station]

Test Framework
 Hardware:
 CAF: CNAF Tier-1
 SE: dCache-managed SRMs
 500 GB of space at GridKa, Karlsruhe, 10 channels per request (closer to the WNs)
 1 TB of space at the University of California, San Diego (UCSD), 50 channels per request (destination SE)
 SAM stations:
 station "cdf-cnafTest" at "cdfsam1.cr.cnaf.infn.it" (closer to the WNs)
 station "canto-test" at "cdfsam15.fnal.gov" (destination SE)
 Setup:
 A cron job submits a job with a variable number of sections every 10 minutes at the CNAF CAF. The maximum number of sections per job is 5.
 Each section creates a dummy file of random size, varying between 10 MB and 5 GB.
 A cron process running at station "cdf-cnafTest" creates datasets from the files in the SRM closer to the WNs (GridKa).
 Another cron job running at station "canto-test" transfers datasets between the two stations.
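The dummy-file step each section performs can be sketched as below. The size range is the one quoted above; the function itself is illustrative, not the actual test code.

```python
import os
import random

def make_dummy_file(path, min_bytes=10 * 1024**2, max_bytes=5 * 1024**3,
                    chunk=1024 * 1024):
    """Write a file of random size in [min_bytes, max_bytes], as each
    test-job section does on the worker node (10 MB to 5 GB in the
    actual test)."""
    size = random.randint(min_bytes, max_bytes)
    written = 0
    with open(path, "wb") as f:
        while written < size:
            n = min(chunk, size - written)
            f.write(os.urandom(n))  # incompressible payload
            written += n
    return size
```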

Test Framework
[Diagram: CNAF CAF worker nodes write to the SRM closer to the WNs, managed by SAM station "cdf-cnafTest", which transfers datasets to the destination station "canto-test"]

Distribution of the Number of Job Sections
 The number outside the circle represents the number of sections submitted in a single job.
 The number inside the circle represents the total number of sections submitted to the test framework.
 For example: a job with 1 section was submitted 56 times, with 2 sections 64 times, and so on.
File types by size:
 Type A: 10 MB < size < 100 MB
 Type B: 100 MB < size < 1 GB
 Type C: 1 GB < size < 3 GB
 Type D: size > 3 GB
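The file-type binning above can be written as a small helper; the thresholds come from the table, while the function name is ours.

```python
MB, GB = 1024**2, 1024**3

def file_type(size_bytes):
    """Map a file size onto the A-D bins used in the test framework."""
    if size_bytes < 100 * MB:
        return "A"   # 10 MB - 100 MB
    if size_bytes < 1 * GB:
        return "B"   # 100 MB - 1 GB
    if size_bytes < 3 * GB:
        return "C"   # 1 GB - 3 GB
    return "D"       # above 3 GB
```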

Transfer Rate from a WN to Its Closer SE
 The transfer rate increases with file size for file types A, B, and C.

Dataset Type
 A CAF job (identified by its JID) consists of a number of sections.
 Each section has an output file and a log file.
 The size of an output file varies from a few megabytes to several gigabytes, while log files are of the order of a few kilobytes.
 The model proposes two types of dataset corresponding to a JID:
 Output dataset: the collection of all output files corresponding to a JID.
 Log dataset: the collection of all log files corresponding to a JID.
Dataset types by size:
 Type A: 10 MB < size < 1 GB
 Type B: 1 GB < size < 3 GB
 Type C: 3 GB < size < 10 GB
 Type D: size > 10 GB
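Grouping section files into the two per-JID datasets can be sketched like this. The dataset-naming scheme and data structures are hypothetical; the slide only states that each section contributes one output and one log file.

```python
def build_datasets(jid, section_files):
    """Split a job's section files into the two datasets the model
    proposes: one with all output files, one with all log files.
    `section_files` maps section number -> (output_name, log_name)."""
    output_ds = {"name": "output_%s" % jid, "files": []}
    log_ds = {"name": "log_%s" % jid, "files": []}
    for section in sorted(section_files):
        out_name, log_name = section_files[section]
        output_ds["files"].append(out_name)
        log_ds["files"].append(log_name)
    return output_ds, log_ds
```

For example, `build_datasets("123456", {1: ("s1.root", "s1.log"), 2: ("s2.root", "s2.log")})` yields one output dataset and one log dataset, each with two files.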

Output File Dataset Creation Time

Log File Dataset Creation Time

Transfer Rate for Different Datasets

Getting Files onto the User's Desktop
 Requirements on the user's desktop:
 A SAM station environment with its service certificate installed.
 An SRM client.
 File query:
 Users will know in advance the dataset name corresponding to a JID, since dataset names follow a fixed syntax.
 Users can query the list of files matching a dataset with the SAM command "sam list files --dim="
 File location:
 The command "sam locate " tells the location of a file.
 Getting files onto the desktop:
 Command "sam get dataset --defName= --group=test --downloadPlugin=srmcp"
 A wrapper script can be written so that the above steps are transparent to users.
An abstract of this work has been accepted to CHEP 09, Prague.
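Such a wrapper could assemble the two SAM command lines quoted above as in the sketch below; the argument values are illustrative, and the commands are built but not executed here.

```python
def sam_get_commands(dataset, group="test", plugin="srmcp"):
    """Build the command lines a desktop wrapper script would run to
    list and then fetch a dataset via SAM with the SRM download plugin."""
    list_cmd = ["sam", "list", "files", "--dim=%s" % dataset]
    get_cmd = ["sam", "get", "dataset",
               "--defName=%s" % dataset,
               "--group=%s" % group,
               "--downloadPlugin=%s" % plugin]
    return list_cmd, get_cmd
```

A real wrapper would derive `dataset` from the JID using the fixed naming syntax and pass each list to `subprocess`.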

Other Activities
Computing:
 Conducted the simulation workshop for CMS in Feb. 2004.
 Participants learnt the installation of CMS software and its use in physics analysis.
 System administrator for the University of Delhi High Energy Physics Group, 2000 to 2006.
 Member of the LHC Physics Center (LPC), Fermilab MC production group.
 Generated a large number of MC samples for validation and physics analysis of new releases of CMS software.
Physics analysis:
 CDF:
 Search for quark compositeness using dijets (abstract accepted to APS, Denver 2009)
 Estimation of heavy-flavor content in minimum bias events
 CMS:
 Proposed a jet clustering algorithm for CMS: CMS AN 2008/02
 CMS sensitivity to contact interactions using dijets: J. Phys. G36:015004, 2009

Thank You!

Backup Slides

Computing Model
[Diagram: robotic tape storage and disk cache feed data to the CAF and dCAFs; user jobs run against the cached data]
 CAF: production/user analysis; FNAL hosts the data, and the farm is used to analyze them.
 Decentralized CAF (dCAF): mostly Monte Carlo production.

Condor's View of a Computing Job
 We think of a job as a task with related parallel tasks (job sections).
 To Condor, each job section is an independent job.
 DAGMan allows CDF users and Condor to work together.
[Diagram: job sections with a data handling step]

Interactive User Job Monitoring
 Condor Computing on Demand (COD) is used to allow users to monitor and interact with their jobs.
 With Condor COD, users can:
 look at their working directories and files
 check their running jobs (and debug if needed)
 CDF code and Condor code together give users the tools needed to run their jobs efficiently.

Overview of SAM
[Diagram: database server(s) with the central database, name server, global resource manager(s), and log server are shared globally; station 1…n servers are local; mass storage system(s) are shared locally; arrows indicate control and data flow]

The SAM Station
 Responsibilities:
 Cache management
 Project (job) management
 Movement of data files to/from MSS or other stations
 Consists of a set of inter-communicating servers:
 Station Master Server
 File Storage Server
 Stager(s)
 Project Manager(s)

Components of a SAM Station
[Diagram: the Station & Cache Manager coordinates the File Storage Server, File Stager(s), and Project Managers/Consumers; eworkers and File Storage Clients move data between cache disk, temp disk, and the MSS or other stations; arrows indicate control and data flow]