CMS — Service Challenge 3 Requirements and Objectives

Presentation transcript:

CMS — Service Challenge 3 Requirements and Objectives
Lassi A. Tuura, Northeastern University
Italian T2 Meeting, Bari, May 26, 2005

CMS Service Challenge Goals
- An integration test for the next production system
- Main output for SC3: data transfer and data serving infrastructure known to work for realistic use
  - Including testing the workload management components: the resource broker and computing elements
  - Bulk data processing mode of operation
- Success is mandatory in preparation for SC4 and onward
  - Main focus on alternatives with a reasonable expectation of success
  - We cannot afford total failure on any significant component; it would be difficult to change components and still keep up with SC4 / CMS DC06

CMS Service Challenge Goals Explained
- An integration test for the next production system
  - Full experiment software stack – not a middleware test
  - "Stack" = the software required by transfers, data serving and processing jobs
- Checklist on readiness for the integration test
  - Complexity and functionality tests already carried out, no glaring bugs
  - Ready for a system test with other systems, with throughput objectives
  - (Integration test cycles of ~three months – two during SC3)
  - Becomes the next production service if/when the tests pass
- Main output: data transfer and data serving infrastructure known to work for realistic use cases
  - Using realistic storage systems, files, transfer tools, …
  - We prefer you to use standard services (SRM, …), but given a choice between a system not reasonably ready for 24/7 deployment and a reliable, more basic system, CMS prefers success with the old system to failure with the new one
  - Due to limited CMS resources, please confirm and coordinate your infrastructure with us so we can reach the objectives without excessive risk
- Complexity and functionality tests: components should be known to perform at scale before the integration test

Qualitative CMS Goals
https://uimon.cern.ch/twiki/bin/view/CMS/SWIntegration
- Overview of the service exercise
  - Structured data flow executing the CMS computing model (see the sketch below)
  - Simultaneous data import, export and analysis
  - Job throughput at Tier 2 sites
- Concretely
  - Data produced centrally and distributed to Tier 1 centres (MSS)
  - Strip jobs at Tier 1 produce analysis datasets (probably "fake" jobs)
    - Approximately 1/10th of the original data, also stored in MSS
  - Analysis datasets shipped to Tier 2 sites and published locally
    - May involve access from MSS at the Tier 1
  - Tier 2 sites produce MC data and ship it to a Tier 1 MSS (probably "fake" jobs)
    - May not be the local Tier 1
  - Transfers between Tier 1 sites
    - Analysis datasets, plus a 2nd replica of raw data for failover simulation
  - Implied: software installation, job submission, harvesting, monitoring
- Details will be confirmed by the June 13-15 SC3 meeting at CERN
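The structured data flow above is a short pipeline: central production to Tier 1 MSS, stripping at Tier 1 to roughly 1/10th of the input, shipping of analysis datasets to Tier 2, and Tier 2 MC production back to a Tier 1 MSS. As a purely illustrative aid, a minimal Python sketch of that flow follows; only the stages and the ~1/10 ratio come from this slide, and every name and number in the code is a placeholder, not part of any CMS tool.

```python
# Minimal, purely illustrative model of the SC3 structured data flow.
# Only the stages and the ~1/10 stripping ratio come from the slide above;
# all names and numbers are hypothetical placeholders.

STRIP_FRACTION = 0.1  # analysis datasets are ~1/10th of the original data


def t0_to_t1(raw_tb: float) -> dict:
    """Data produced centrally and distributed to a Tier 1 MSS."""
    return {"tier": "T1", "raw_tb": raw_tb, "in_mss": True}


def t1_strip(t1_raw: dict) -> dict:
    """Strip jobs at Tier 1 produce analysis datasets, also stored in MSS."""
    return {"tier": "T1", "analysis_tb": t1_raw["raw_tb"] * STRIP_FRACTION, "in_mss": True}


def t1_to_t2(analysis: dict) -> dict:
    """Analysis datasets shipped to a Tier 2 and published locally."""
    return {"tier": "T2", "analysis_tb": analysis["analysis_tb"], "published": True}


def t2_mc_to_t1(mc_tb: float) -> dict:
    """Tier 2 MC production shipped to a Tier 1 MSS (not necessarily the local one)."""
    return {"tier": "T1", "mc_tb": mc_tb, "in_mss": True}


if __name__ == "__main__":
    raw = t0_to_t1(50.0)              # e.g. 50 TB of raw data arriving at a Tier 1
    analysis = t1_strip(raw)          # ~5 TB of analysis data produced by stripping
    published = t1_to_t2(analysis)    # analysis data published at a Tier 2
    mc = t2_mc_to_t1(5.0)             # 5 TB of Tier 2 MC back to a Tier 1 MSS
    print(raw, analysis, published, mc, sep="\n")
```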

Quantitative CMS Goals (I)
- Throughput phase: see the rates in Jamie's presentations
- Service phase (to be confirmed in the June workshop)
  - Constraints
    - T1 strip jobs able to execute in sync with inbound transfers
    - T1 able to keep tape migration up with strip jobs and inbound transfers (from T0 and T2s)
    - T2 to T1 outbound transfers able to keep up with local production jobs; inbound transfers able to keep up with T1 strip jobs
  - Sustained rates over at least two periods of one full week
  - 10% <20 minutes from T0 availability to T1 strip job submission (see the sketch below)
  - 33% <30 minutes from T0 availability to T1 strip job submission
  - X TB/day aggregate data transfer rate per T1
  - X MB/s data served to local jobs concurrently with transfers + migration to tape (at Tier-1 sites)
  - X jobs/day at T1, T2
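One reading of the two latency lines above is that at least 10% of strip-job submissions should happen within 20 minutes of T0 availability, and at least 33% within 30 minutes. The check a site could run against its own submission timestamps is then only a few lines; the sketch below is hedged (the input format, the strictly-less-than comparison and the sample numbers are assumptions, not CMS definitions).

```python
# Hedged sketch: compare measured delays (minutes from T0 availability to
# T1 strip-job submission) against the latency targets quoted on the slide.
# The interpretation of the targets and the sample data are assumptions.


def fraction_within(delays_min, limit_min):
    """Fraction of submissions that happened in under limit_min minutes."""
    if not delays_min:
        return 0.0
    return sum(1 for d in delays_min if d < limit_min) / len(delays_min)


def check_latency_targets(delays_min):
    """Targets as stated above: 10% under 20 minutes, 33% under 30 minutes."""
    return {
        "10% < 20 min": fraction_within(delays_min, 20) >= 0.10,
        "33% < 30 min": fraction_within(delays_min, 30) >= 0.33,
    }


if __name__ == "__main__":
    sample_delays = [12, 18, 25, 27, 29, 35, 42, 55, 61, 90]  # hypothetical data
    print(check_latency_targets(sample_delays))
```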

Quantitative CMS Goals (II)
- Total data capacity for the throughput and service phases, per month (see the rate sketch below)
  - 50 TB from CERN to at least two Tier 1 sites
  - ~10 TB from CERN to other Tier 1 sites
  - ~5 TB to each Tier 2
  - 5-10 TB of T1/T1 analysis dataset transfers
  - 50 TB of T1/T1 2nd raw replica transfers (for simulating Tier 1 failover)
- Data from both the throughput and service phases can be discarded after a while
  - Data for the service phase may need to be kept for a while (weeks)
  - Data for the throughput phase can be recycled after a day or so
- Most likely no need for CPU capacity dedicated to the service phase
  - Submitting jobs to normal worker nodes, which expect access to SC storage
  - Reasonable capacity available for two or three periods of a week at a time
- When the integration tests have passed, services can go into production
  - Resources expected to remain for a testbed environment
- Details in the June workshop
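As a rough sanity check, the monthly volumes above correspond to fairly modest sustained averages (about 19 MB/s for 50 TB/month, assuming a 30-day month and decimal units). The sketch below only does that arithmetic; the 7.5 TB midpoint used for the 5-10 TB range is an assumption.

```python
# Convert the monthly volume targets above into sustained average rates.
# Assumes a 30-day month and decimal units (1 TB = 1e6 MB).

SECONDS_PER_MONTH = 30 * 24 * 3600

TARGETS_TB_PER_MONTH = {
    "CERN to each of (at least) two Tier 1s": 50,
    "CERN to other Tier 1s": 10,
    "to each Tier 2": 5,
    "T1/T1 analysis datasets (midpoint of 5-10 TB)": 7.5,
    "T1/T1 2nd raw replica": 50,
}


def tb_per_month_to_mb_per_s(tb_per_month: float) -> float:
    return tb_per_month * 1e6 / SECONDS_PER_MONTH


if __name__ == "__main__":
    for name, tb in TARGETS_TB_PER_MONTH.items():
        print(f"{name}: ~{tb_per_month_to_mb_per_s(tb):.1f} MB/s sustained")
```

These are averages over the month; actual transfer bursts would of course need to run well above them.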

CMS SC3 Schedule
- July: throughput phase
  - Optional leading site-only tuning phase, possibly using middleware only
  - T0/T1/T2 simultaneous import/export, using the CMS data placement and transfer system (PhEDEx) to coordinate the transfers
  - Overlaps the setup phase for other components on the testbed; this will not interfere with the transfers – setting up e.g. software installation, job submission etc.
- September: service phase 1 — modest throughput
  - Seed transfers to get initial data to the sites
  - Demonstrate bulk data processing and simulation at T1s and T2s
  - Requires software, job submission, output harvesting, monitoring, …
  - Not everything everywhere, but something reasonable at each site
- November: service phase 2 — modest throughput
  - Phase 1 + continuous data movement
  - Any improvements to the CMS production (as in MC production) system
  - Already in September if available by then

SC3 Services In Test: Services for all sites (I)
- Data storage
  - dCache, Castor or other (xrootd, GPFS, …)
  - SRM interface highly desirable, but not mandatory if unrealistic
- Data transfer
  - PhEDEx + normally SRM, can also be + GridFTP – see Daniele's presentation
  - CMS will test FTS from November with other experiments (ATLAS, LHCb)
- File catalogue (see the sketch below for what the lookup has to do)
  - The "safe" choice is the POOL MySQL catalogue
    - Big question: will the catalogue scale for worker-node jobs?
    - Currently using XML catalogues from worker nodes
  - LCG favours LFC, but the first step towards CMS validation is happening only now (today!)
    - LFC exists, but there is no POOL version that can use it, and thus no CMS software
    - Existing CMS software to date will not be able to use LFC
  - US-CMS will test Globus RLS instead of LFC / MySQL on some sites
    - Same caveats as with LFC
  - Not planning to test EGEE Fireman yet
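Whichever backend is chosen, the operation the worker-node jobs depend on is the same: resolve a logical file name to a physical file name. The sketch below is only a stand-in for that lookup, using SQLite from the Python standard library so it is self-contained; it is not the POOL MySQL schema or API, and the LFN/PFN pair is invented for illustration.

```python
# Minimal illustration of the catalogue operation worker-node jobs depend on:
# resolving a logical file name (LFN) to a physical file name (PFN).
# This uses SQLite for self-containment; it is NOT the POOL schema or API.

import sqlite3


def make_catalogue(conn):
    conn.execute("""CREATE TABLE IF NOT EXISTS file_map (
                        lfn TEXT PRIMARY KEY,
                        pfn TEXT NOT NULL)""")


def register(conn, lfn, pfn):
    conn.execute("INSERT OR REPLACE INTO file_map (lfn, pfn) VALUES (?, ?)", (lfn, pfn))


def lookup(conn, lfn):
    row = conn.execute("SELECT pfn FROM file_map WHERE lfn = ?", (lfn,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_catalogue(conn)
    # Hypothetical LFN/PFN pair, purely for illustration
    register(conn, "/store/sc3/raw/file001.root",
             "srm://se.example.infn.it/castor/cms/sc3/raw/file001.root")
    print(lookup(conn, "/store/sc3/raw/file001.root"))
```

The scaling question on the slide is exactly about how many concurrent worker-node jobs can hit this kind of lookup at once, which is why the XML-catalogue fallback on the worker nodes exists.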

SC3 Services In Test: Services for all sites (II)
- Software packaging, installation, publishing into the information system
  - Either central automated installation, or using a local service
  - So far, "central automated" is not really very automated…
- Computing element and worker nodes
  - In particular, how the CE obtains jobs (RB, direct submission?)
  - Interoperability between different grid variants
- Job submission
  - Including a head node / UI for submitting
- Job output harvesting
  - CMS agents, often configured together with PhEDEx
- (These services require solutions for all grid variants)

SC3 Services In Test: Services for some sites
- PubDB / DLS
  - Backend MySQL database + web server interface for PubDB
- Job monitoring and logging
  - BOSS + MySQL database + local agents
- MC production tools and infrastructure
  - McRunjob, pile-up etc.
- File merging
  - We have to get somewhere with this task item
  - Probably agents running at the site producing the data (see the sketch below)
- (These will evolve and be replaced with middleware improvements)
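File merging is still an open item; the slide only says it will probably be handled by agents running at the producing site. Purely as an illustration of that agent pattern (poll a drop area, combine small output files once a size threshold is reached), here is a hedged Python sketch. The directory layout, the 1 GB threshold and the use of plain concatenation as the merge step are all placeholder assumptions, not a CMS design; a real CMS merge would run a framework merge job.

```python
# Illustrative polling "merge agent": watch a drop area, and when enough small
# output files have accumulated, combine them into one larger file.
# Paths, the 1 GB threshold, and plain concatenation as the merge step are all
# placeholder assumptions.

import os
import shutil
import time

DROP_DIR = "/data/sc3/merge-drop"      # hypothetical drop area
OUT_DIR = "/data/sc3/merged"           # hypothetical output area
THRESHOLD_BYTES = 1 * 1024**3          # merge once ~1 GB has accumulated
POLL_SECONDS = 60


def pending_files():
    files = [os.path.join(DROP_DIR, f) for f in os.listdir(DROP_DIR)]
    return [f for f in files if os.path.isfile(f)]


def merge(files, out_path):
    """Stand-in merge step: concatenate the inputs into one output file."""
    with open(out_path, "wb") as out:
        for f in sorted(files):
            with open(f, "rb") as src:
                shutil.copyfileobj(src, out)
            os.remove(f)


def run_agent():
    os.makedirs(DROP_DIR, exist_ok=True)
    os.makedirs(OUT_DIR, exist_ok=True)
    while True:
        files = pending_files()
        if files and sum(os.path.getsize(f) for f in files) >= THRESHOLD_BYTES:
            out_path = os.path.join(OUT_DIR, f"merged-{int(time.time())}.dat")
            merge(files, out_path)
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    run_agent()
```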

Support servers (I): Server-type systems required at each site
- UI / head node for job submission (public login)
- Storage space for the CMS software installation (a single root for all)
- "Small databases" server for CMS services (see below, MySQL)
- File catalogue database server (presumably MySQL at most sites)
- Gateway-type server for PubDB, PhEDEx and job output harvesting (see the check sketch below)
  - PubDB needs a web server; PhEDEx needs local disk (~20 GB is sufficient)
  - Typically installed as a UI, but without public login (CMS admins only)
  - For SC3, one machine to run all agents is enough
  - For SC3, it requires outbound access, plus access to local resources
  - PubDB requires inbound HTTP access and can be installed under any web server
  - The agents do not require substantial CPU power or network bandwidth; a "typical" recent box with local disk and "typical" local network bandwidth should be enough (the CERN gateway is a dual 2.4 GHz Pentium IV with 2 GB memory – plenty)
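Before requesting the gateway box, a site might want to sanity-check the two requirements that are easy to verify from the list above: roughly 20 GB of local disk for PhEDEx and working outbound network access. The sketch below is one hedged way to do that; the disk path and the host used for the outbound test are placeholders, not CMS-mandated values.

```python
# Quick sanity check of two gateway requirements listed above:
# ~20 GB of local disk for PhEDEx and working outbound network access.
# The path and test host below are placeholders, not CMS-mandated values.

import os
import shutil
import socket

PHEDEX_DISK_PATH = "/opt/phedex"        # hypothetical local disk area
REQUIRED_FREE_GB = 20
OUTBOUND_TEST = ("cern.ch", 80)         # any reachable external host/port will do


def enough_disk(path: str, required_gb: int) -> bool:
    """Check free space at `path` (falls back to '/' if the area does not exist yet)."""
    target = path if os.path.exists(path) else "/"
    return shutil.disk_usage(target).free >= required_gb * 1024**3


def outbound_ok(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("disk ok:", enough_disk(PHEDEX_DISK_PATH, REQUIRED_FREE_GB))
    print("outbound ok:", outbound_ok(*OUTBOUND_TEST))
```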

Support servers (II): Optional gateway services at some sites
- BOSS job monitoring and logging
  - Local MySQL / SQLite backend per user on the UI (MySQL can be shared)
  - Optional real-time monitoring database – to be discussed
  - BOSS itself does not require a gateway server, only the databases
- File merging
- Service and operation of CMS services by CMS people at the site
  - You may be able to get help from CMS people at your Tier 1 – ask

Some Observations
- Give yourself enough time to put services into production
  - Our experience is that it takes months to bring a site up
  - Reserve enough time (read: months) to debug completely new systems before expecting great results
- You are expected to support what you put into production
  - Don't plan for a heroic one-time effort for the throughput phase; you will kill yourself in the service phase
- Choose a services suite that is ready for the integration test
  - We need at least a month after the large-scale functionality milestone for deployment into the experiment integration test
  - For the throughput test, everything fully debugged by the end of June
  - Decision to pick fallbacks at the latest by mid-June (SC3 workshop?)
- Seek to "evaluate what works, not find out what doesn't"

Summary
- An integration test for the next production service
  - Testing whether many new components are ready for the step
  - Choose new components and fallbacks wisely
- Aimed at the data transfer and data serving infrastructure
- CMS welcomes many new sites to join!
  - An opportunity to significantly increase the infrastructure available to physicists in a painless manner, and to improve readiness towards LHC startup!
- Make sure to leverage the knowledge of your local gurus!
  - Like this workshop :-)

Contact Information
- CMS Wiki for software and computing integration: https://uimon.cern.ch/twiki/bin/view/CMS/SWIntegration
- Overall service challenge coordination: Jamie Shiers <jamie.shiers@cern.ch>
- CMS computing coordination: Lothar Bauerdick <bauerdick@fnal.gov>, David Stickland <david.stickland@cern.ch>
- CMS contact for service challenges: Lassi A. Tuura <lassi.tuura@cern.ch>
- INFN "local guru" on CMS data transfers: Daniele Bonacorsi <daniele.bonacorsi@bo.infn.it>