
Slide 1: CDF Computing Experience (IFAE, Apr). Gabriele Compostella, University of Trento and INFN, on behalf of the CDF Computing Group.

Slide 2: Fermilab Accelerator Complex, Batavia, IL, USA.

Slide 3: CDF Experiment Luminosity. L_peak: 2.3x10^32 cm^-2 s^-1; 36 bunches, 396 ns crossing time. By FY07: double the data collected up to FY06; by FY09: double the data collected up to FY07. [Plot: collected integrated luminosity (fb^-1) vs. time, 9/30/03 through 9/30/09, with curves labeled 15, 20, 25 and 30 mA/hr and a "We are here today" marker.]

Slide 4: CDF Experiment Data Volumes. [Plots: collected data and estimated CPU needs, with values of roughly ~2, ~5.5, ~9, ~20 and ~23 (units lost in extraction).]

Slide 5: CDF Data Production. [Diagram: raw data from the L3 trigger go through SAM into the CDF tape robot and disk cache; the Production Farm reads raw data via SAM and writes production data back through SAM.]

Slide 6: SAM

● The data handling model relies on SAM (Sequential data Access via Metadata), a centralized system that manages all official data.
● File catalog for data on tape and on disk: file-based, with metadata attached to each file. Files are organized in datasets, and users can create their own.
● Manages file transfers: tape-to-cache, cache-to-disk, disk-to-tape.
● For jobs accessing data, SAM copies the necessary files from the closest location to the local cache, provides the files in the optimal order, keeps track of processed and unprocessed files, and allows failed sections to be recovered automatically.
● Each site serving data needs to have a SAM station.
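As a concrete illustration of this access pattern, the sketch below shows how a job might consume a dataset one file at a time. The sam_* command names are hypothetical placeholders for the station's consumer interface, not the actual SAM CLI:

    #!/bin/sh
    # Hypothetical sketch of SAM-style file delivery; the sam_* commands
    # are placeholders, not the real SAM interface.
    DATASET="my_bs_mc_dataset"            # illustrative user-defined dataset
    sam_start_project "$DATASET"          # register this job as a consumer
    while f=$(sam_get_next_file); do      # station caches and hands out files
        run_analysis "$f"                 # user executable
        if [ $? -eq 0 ]; then
            sam_release_file "$f" ok      # mark the file as processed
        else
            sam_release_file "$f" failed  # failed files feed the recovery pass
        fi
    done
    sam_end_project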

Slide 7: User Analysis and MC Production

● CAF (Central Analysis Facility): mostly user analysis. FNAL hosts the data, so this farm is used to analyze them.
● Decentralized CAF (dCAF): mostly Monte Carlo production. Remote sites produce Monte Carlo; some (e.g. CNAF) also host data. Produced files are then uploaded to SAM.
● CAF and dCAF are CDF-dedicated farms.

Slide 8: CAF Model

Develop, debug and submit from a personal PC; output can go to any place. There is no need to stay connected: submit and forget until a mail arrives. A CDF user must have access to the CAF clients and a specific CDF certificate. Developed by: H. Shih-Chieh, E. Lipeles, M. Neubauer, I. Sfiligoi, F. Wuerthwein.
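A submit-and-forget session might look like the sketch below; the CafSubmit flags are assumptions for illustration, not the documented interface:

    # Hypothetical CAF submission session (flag names are assumptions).
    kinit user@FNAL.GOV                      # CDF Kerberos credential
    CafSubmit --tarFile=myjob.tgz \
              --outLocation=user@myhost.example:/data/out \
              --start=1 --end=100            # split the job into 100 sections
    # ...log out and disconnect; a mail arrives when the job is done.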

Slide 9: The CAF Headnode. [Diagram: the CAF portal hosts the submitter daemons, the monitoring daemons, the mailer daemon, and the web and interactive monitors; it talks over a legacy protocol to the batch system (Condor collector, negotiator and schedd), which runs jobs via startd daemons on the worker nodes (WN).]

Slide 10: Job Monitoring

Web-based monitor: information on running jobs for all users, plus job/user history.
Interactive job monitor, with the following commands available: CafMon list, CafMon kill, CafMon dir, CafMon tail, CafMon ps, CafMon top.
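A typical interactive session could look like the following few lines; the job-id argument is an assumption about the syntax:

    CafMon list                  # list my running jobs
    CafMon top 1234              # 'top' on the worker node running job 1234
    CafMon tail 1234 out.log     # follow a log file while the job runs
    CafMon kill 1234             # kill a misbehaving job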

Slide 11: CDF batch computing in numbers. Up to 4000 batch slots (last 3 months); almost 800 registered users, approximately 100 of them active at any given time.

Slide 12: Moving to the Grid

● Need to expand resources: the luminosity is expected to increase by a factor of ~4 before CDF stops taking data.
● Resources were in dedicated pools, which limited expansion and had to be maintained by CDF personnel.
● GRID resources can be fully exploited before the LHC experiments start to analyze data.
● As a first step, MC jobs can be moved to the Grid (no data access needed), while the CAF and dCAFs (e.g. CNAF) keep serving data analysis.
● CDF developed two portals to the GRID: NamCAF for OSG and LcgCAF for LCG.

Slide 13: NamCAF, a Condor glide-in based CAF. More info at:

Slide 14: CDF use of Condor glide-ins. [Diagram: the glidekeeper daemon on the CAF portal submits glideins through Globus into the batch queues of remote Grid pools; each glidein starts a batch slot that joins the portal's collector/negotiator, and the schedd runs the queued user jobs in those slots until all jobs are done.] By I. Sfiligoi, S. Sarkar.

Slide 15: CDF usage of OSG resources per site

[Plot: last year's usage of the different OSG sites accessed through NamCAF.]
● Up to 1500 batch slots
● Self-contained tarballs
● No specific support required on Grid sites

Slide 16: CDF usage of OSG resources. [Plot.]

Slide 17: LcgCAF, the CDF submission portal to LCG. More info at:

Slide 18: LcgCAF: General Architecture. [Diagram: the user desktop submits jobs to the portal (cdf-ui ~ cdf-head), which runs the Monitor, Submitter and Mailer daemons; jobs go through the WMS to the GRID sites, where CafExe wraps the user executable, and output flows to the CDF-SE and on to robotic tape storage. A CDF user must have access to the CAF clients and a specific CDF certificate.] Developed by: F. Delli Paoli, D. Jeans, D. Lucchesi, S. Sarkar, I. Sfiligoi.

Slide 19: CDF Code Distribution

In order to run, a CDF MC job needs the following available on the WN:
● CDF code distribution: Parrot is used as a virtual file system for the CDF code (served from a CNAF server exporting /home/cdfsoft). To get good performance with Parrot, a Squid web proxy cache is also needed at each LCG site (e.g. CNAF, Lyon).
● Run Condition DB: FroNTier is used for DB access (in NamCAF too).
● Job monitor (the implementation differs from the CAF one, but the functionality is the same).
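In practice the code access could look like the minimal sketch below, assuming cctools' parrot_run and its /http virtual namespace; the proxy host and code server are illustrative:

    # Run a CDF MC job with the code tree mounted over HTTP via Parrot.
    # Host names and paths are illustrative.
    export HTTP_PROXY="http://squid.site.example:3128"   # nearby Squid cache
    parrot_run sh -c \
        'ls /http/cdfcode.example.org/cdfsoft && ./run_cdf_mc.sh'
    # Inside parrot_run, /http/<server>/<path> is resolved with HTTP GETs,
    # so repeated reads go through the Squid proxy and hit its cache.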

Slide 20: LcgCAF Usage. LcgCAF has been open to all users and officially supported by CDF since November. Currently an average of 10 users are running, with peaks of more than 2500 running sections. [Plots: LcgCAF monitor over the last 4 months; sites accessed through LcgCAF.]

Slide 21: CDF usage of LCG resources. [Plot; European sites not included in this plot.]

Slide 22: Conclusions and Future Developments

The GRID can serve a running experiment too! CDF proved to have an adaptable, expandable and successful computing model. Great effort has been put into exploiting the GRID and opportunistic resources, and MC production is moving completely towards distributed systems.

Future developments:
● Plan to add more resources to NamCAF and LcgCAF.
● Plan to implement a policy-management mechanism for LcgCAF on top of the GRID policy tools.
● Explore the possibility of running on data by moving jobs towards sites that can serve the desired files.
● A SAM-SRM interface is under development to allow output files to be declared in the catalogue directly from LcgCAF, and to automate the transfer of output files from local CDF storage to FNAL tape.

Slide 23: Backup

Slide 24: Condor Binaries Transfer (pull model)

[Diagram: the glidekeeper on the GlideCAF portal submits glide-ins through the glide-in schedd; numbered steps (1)-(4) show the glide-in fetching the Condor binaries (condor.tgz) from the portal's HTTPd through a site HTTP proxy (hit once per site), then starting Condor so it can join the collector/negotiator and serve the main schedd.]

glidein_startup.sh (reconstructed from the slide; the download URL is elided in the original and shown here as $condor_url):

    validate_node                         # sanity-check the worker node
    env http_proxy=proxy1.fnal.gov wget "$condor_url"   # fetch condor.tgz via the proxy
    echo "$knownSHA1  condor.tgz" | sha1sum -c -        # verify the known checksum
    if [ $? -eq 0 ]; then
        tar -xzf condor.tgz               # unpack the Condor binaries
        local_config                      # write the node-local Condor config
        ./condor_master -r $retmins -dyn -f   # start Condor, retiring after $retmins minutes
    fi

Slide 25: Working with remote Grid sites. [Diagram: the collector and main schedd talk over the WAN to glide-in startds in the available batch slots of Globus-fronted sites. UDP is fast, but TCP traffic is reliable; sites behind a firewall/NAT allow only outgoing TCP, so GCB is used to bridge the firewall and let user jobs reach the startds.]

Slide 26: How GCB works

GCB is just a proxy, and it must be on a public network. For a startd in a Grid pool behind a firewall/NAT:
1) The startd establishes a persistent TCP connection to a GCB.
2) It advertises itself, and the GCB connection, to the collector.
3) The schedd sends its own address.
4) The schedd's address is forwarded to the startd over the persistent connection.
5) The startd establishes an outgoing TCP connection to the schedd.
6) A job can be sent to the startd.
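On the glide-in side this was driven by Condor configuration. A sketch of the relevant GCB-era knobs, with a placeholder broker address, might read:

    # condor_config fragment for a startd behind a firewall/NAT
    # (GCB-era Condor settings; the broker address is a placeholder).
    NET_REMAP_ENABLE    = TRUE               # route connections through GCB
    NET_REMAP_SERVICE   = GCB
    NET_REMAP_INROUTE   = <public-GCB-broker-IP>
    BIND_ALL_INTERFACES = TRUE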

Slide 27: Security problem at a glance. [Diagram: as in the glide-in picture, but glideins for different users land on the same grid node and run under the same local account (uidX) while carrying different user proxies (ProxyA, ProxyB) alongside the glidekeeper's ProxyX; nothing separates one user's job from another user's credentials.]

Slide 28: Parrot Design. [Diagram: a batch system (Condor, PBS, NQE, LSF, LoadLeveler) is told to "run this batch job" and starts the user's application through the process interface (main, exit, abort, kill, sleep) of the local operating system; Parrot interposes on the I/O interface (open, close, read, write, lseek) and forwards "access data" calls to storage servers via Chirp, FTP, HTTP, RFIO or DCAP.]

Slide 29: Parrot performance with HTTP in a CDF-like job. [Plot.]

Slide 30: LcgCAF Output Storage

[Diagram: job output flows from the worker nodes to a CDF-SE via GridFTP, or to a CDF-storage location via Kerberized (KRB5) rcp.]
● User job output is transferred to a CDF Storage Element (SE) via GridFTP, or to a CDF-storage location via rcp.
● From the CDF-SE or CDF-storage, the output is copied using GridFTP to FNAL fileservers.
● An automatic procedure copies the files from the fileservers to tape and declares them to the CDF catalogue.
● Planning to use an SRM-SE interface for big sites.
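The two transfer paths could be exercised roughly as follows; the endpoints are illustrative:

    # 1) To a CDF Storage Element via GridFTP:
    globus-url-copy file:///local/scratch/out.root \
        gsiftp://cdf-se.example.org/cdf/store/user/out.root
    # 2) To a plain CDF-storage location via Kerberized rcp:
    rcp out.root cdfdata@fileserver.example.org:/cdf/scratch/user/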

Slide 31: LcgCAF Performance

CDF B Monte Carlo production by Power Users in July: ~5000 jobs in ~1 month.

Sample: B_s -> D_s decays (D_s -> K*K, D_s -> K_s K and one further mode; some decay symbols were lost in extraction)
Events on tape: ~6.2 x 10^6
Efficiency: ~94%, rising to ~100% with one recovery pass

Possible failures:
● GRID: site misconfiguration, temporary authentication problems, WMS overload, LB "losing" information.
● LcgCAF: Parrot/Squid cache getting stuck (Parrot cache-coherency problems are under investigation).

Automatic retry mechanism:
● GRID failures: WMS retry / shallow retry.
● LcgCAF failures: "home-made" retry based on log-file parsing, only for certain types of jobs.
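The "home-made" retry could be as simple as scanning the section logs for known failure signatures and resubmitting, sketched below with hypothetical names (resubmit_section is a placeholder):

    # Hypothetical sketch of the log-parsing retry.
    for log in section_*.log; do
        if grep -qE 'Parrot cache|authentication|WMS overload' "$log"; then
            sec=${log#section_}; sec=${sec%.log}
            resubmit_section "$sec"       # re-queue only the failed section
        fi
    done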