CMS Status Report
Maria Girone, CERN, and David Lange, LLNL

Outline:
- Resource usage
- Preparing for Run 2: computing milestones and CSA14
- Software development and production schedule
- Resource requests for 2015
- Planning for Run 4

Resource usage:
- CMS has used the highest percentage of pledged time at the Tier-1s
- Many high-priority activities are ongoing:
  - processing of simulation for Run 1 analysis
  - production for the CSA14 commissioning challenge
  - preparation of upgrade samples
- Tier-2 usage has fluctuated around 100% with continued analysis
- Run 2 simulation will ramp up soon

An intense testing and commissioning program throughout 2014 to validate the changes of the CMS computing model for Run 2:
- Data Management milestone: 30 April 2014 (done!)
  - More transparent access to the data for users and production
  - Disk and tape separated, to manage Tier-1 disk resources and control tape access
  - Data federation and data access (AAA)
  - Developing Dynamic Data Placement for handling centrally managed disk space
- Analysis milestone: 30 June 2014
  - Demonstrate the full scale of the new CRAB3 distributed analysis tool
  - Main changes: reduced job failures in the handling of user data products, improved job tracking, and automatic resubmission
- Organized Production milestone: 30 November 2014
  - Exercise the full system for organized production
  - Utilize the Agile Infrastructure (IT-CC and Wigner) at scale for the Tier-0
  - Run with multi-core at Tier-0 and Tier-1 for data reconstruction
  - Demonstrate shared workflows to improve latency and speed

Summer exercise (Computing, Software and Analysis challenge, CSA14) for the physics user and developer communities: >150 users involved; started the 2nd week of July, wrapping up now
- For Computing and Offline this involved:
  - Commissioning of the new MiniAOD format (nearly a factor of 10 smaller than the Run 1 AOD while targeting 80% of analysis users)
  - Commissioning of the data federation (AAA); the target is 20% of analysis access served over the wide area
  - Commissioning of the production workflows (SIM, DIGI, RECO)
  - Commissioning of the new analysis submission tool (CRAB3)
  - Production of the samples for the challenge: production and reconstruction of ~2 billion 13 TeV events under various pile-up scenarios for 25 ns and 50 ns running conditions
- A good opportunity to identify remaining weaknesses of the production system under high load; these are currently being addressed

- During CSA14, AAA has been used to access data over the WAN via CRAB3
  - Target: 60k files/day, O(100 TB)/day
- The goal of the data federation scaling tests during the last weeks of CSA14 has been to exceed 20% of analysis access
  - Phase 1: USA only (see plot); the average incoming rate is 30 TB/week for the 7 US Tier-2 sites (40 MB/s)
  - Phase 2: add Europe, starting now; AAA deployment in Europe started later, and we expect to have to invest more in validation
[Plots: incoming AAA transfer rate over the last 2 months and the last week]
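For illustration, a minimal PyROOT sketch of what federated (AAA) access looks like from the user side: the file is opened through the global XRootD redirector rather than from local storage. The /store path below is a hypothetical placeholder, and a valid grid proxy plus a ROOT build with XRootD support are assumed.

```python
# Minimal sketch: open a CMS file over the WAN via the AAA federation.
# The redirector resolves the logical filename to a site that hosts it.
import ROOT

f = ROOT.TFile.Open(
    "root://cms-xrd-global.cern.ch//store/user/example/miniaod_sample.root"  # hypothetical path
)
if f and not f.IsZombie():
    events = f.Get("Events")  # the standard CMS event tree
    print("Opened over WAN: %d events" % events.GetEntries())
    f.Close()
```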

- CRAB3 usage remains a small fraction of total user submissions
  - CSA14 users were asked to use CRAB3
  - We further test operations at scale with HammerCloud
  - We are developing a CRAB3 adoption plan to complete before the beginning of Run 2
- User-driven MiniAOD production was done with CRAB3
  - Initial feedback from users is VERY positive!
  - User production was able to keep pace with central MiniAOD production
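As a rough illustration of what users migrate to, a minimal CRAB3 job configuration might look like the sketch below; this assumes the CRABClient UserUtilities interface, and the request name, input dataset, and storage site are all hypothetical placeholders.

```python
# crabConfig.py -- minimal CRAB3 configuration sketch (placeholders throughout).
from CRABClient.UserUtilities import config

config = config()
config.General.requestName = 'csa14_miniaod_test'    # hypothetical request name
config.JobType.pluginName  = 'Analysis'
config.JobType.psetName    = 'miniaod_cfg.py'        # the user's cmsRun configuration
config.Data.inputDataset   = '/SomeDataset/CSA14-PU20bx25/MINIAODSIM'  # hypothetical
config.Data.splitting      = 'FileBased'
config.Data.unitsPerJob    = 10
config.Site.storageSite    = 'T2_XX_Example'         # hypothetical site name
```

The job would then be submitted with `crab submit crabConfig.py`, with job tracking and automatic resubmission handled by the tool.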

- CMS has been using the HLT farm for production activity throughout 2014
- We have also begun using the CERN Agile Infrastructure (AI) for production and the Tier-0
  - We have demonstrated up to 5k simultaneous jobs
  - Large-scale tests are scheduled for later this fall, provided sufficient AI resources are available
  - Short full-scale tests need to happen early enough that problems can be understood and solved before the run starts; we are discussing with IT how to achieve this
[Plot of HLT farm usage over 2014, annotated: HI reprocessing for QM2014; HLT hardware unavailable; steady use in production now]

We are ready to begin production of Monte Carlo samples for the 2015 startup:
- All changes to the CMS detectors during the shutdown are reflected in the geometry
- An improved material model for the tracker has been implemented
- The detector simulation has been updated to Geant4 10.0
- Technical improvements over the past year have resulted in a factor-of-2 speed improvement

The CSA14 exercise was the first large-scale test of recent CMSSW developments:
- Simulation, digitization, and reconstruction applications proved robust in processing ~1.5 billion events
- Successfully integrated and deployed a new analysis data format, MiniAOD
  - ~10x size reduction relative to AOD, achieved through a number of approaches
- We plan to produce MiniAOD for 2015 in two ways:
  1. together with the AOD in the reconstruction workflow
  2. in a dedicated conversion workflow from the AOD, to also include all available physics-object recipes needed for analysis
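To give a feel for the format, here is a minimal FWLite sketch of reading physics objects from a MiniAOD file; it assumes a CMSSW Python environment, and the input file name is a hypothetical placeholder.

```python
# Minimal FWLite sketch: loop over a MiniAOD file and read the slimmed jets.
from DataFormats.FWLite import Events, Handle

events = Events("miniaod_sample.root")       # hypothetical input file
jets = Handle("std::vector<pat::Jet>")

for event in events:
    event.getByLabel("slimmedJets", jets)    # compact jet collection in MiniAOD
    for jet in jets.product():
        if jet.pt() > 30.0:
            print("jet pt = %.1f GeV, eta = %.2f" % (jet.pt(), jet.eta()))
```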

- The CMSSW release for startup reconstruction is on track to be complete in December (CMSSW_7_3_0)
- We are now finishing an intermediate release of CMSSW that includes the first set of algorithmic changes for 25 ns bunch-spacing operations (for the PHYS14 exercise)
- Excellent multi-threaded performance of the reconstruction was achieved over the summer
- We continue to support Phase-I and Phase-II upgrade development work in dedicated releases
  - Even very-high-pileup workflows run within the usual Grid resource constraints
  - Successful demonstration of recent developments in our MinBias event-overlay algorithm and of the flexibility of our tracking algorithms
  - Time per event is a challenge at high pileup, but our tools have allowed us to keep up with rapid detector and reconstruction-algorithm changes
- We have started integrating the code developed for the Technical Proposal into our mainstream CMSSW releases

The current CMS schedule calls for 2 billion events prepared in advance, for two pile-up scenarios (one for 50 ns and one for 25 ns)
- Gen-Sim production of the first 1 billion events will start soon!
- The PHYS14 exercise is being planned: prepare high-priority analyses for speedy results on the first 1-5 fb⁻¹
- Run 2 production release CMSSW_7_3 by December

- CMS plans to switch to multi-core processing for reconstruction by the beginning of Run 2
  - Better I/O behavior, less merging, a smaller memory footprint, fewer processes to track, and generally the direction for the future
- Multi-core queues now exist at all the Tier-1 sites
- CMS has already achieved >99% parallel-safe code and has excellent efficiency up to 8 cores
- Computing is expected to receive CMSSW_7_2 as a production release in October
- Details presented at ACAT
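A minimal sketch of how multi-threading is switched on in a cmsRun configuration, assuming the standard cms.Process Python interface; the thread count shown is illustrative, not a recommended production setting.

```python
# Fragment of a cmsRun configuration enabling multi-threaded processing.
import FWCore.ParameterSet.Config as cms

process = cms.Process("RECO")
process.options = cms.untracked.PSet(
    numberOfThreads = cms.untracked.uint32(8),  # worker threads (illustrative)
    numberOfStreams = cms.untracked.uint32(0),  # 0 = one stream per thread
)
```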

- CMS is currently in discussions with the Computing Resources Scrutiny Group (C-RSG)
- The request endorsed for 2015 has very little contingency
- We are continuing to optimize the computing system to fit within the resource envelope, setting priorities and making hard choices:
  - reduced the number of reprocessing passes
  - reduced the ratio of Monte Carlo events to real-data events (a ratio of 1.3, and the bulk can be reprocessed only once)
  - no contingency for mistakes
- This interaction with the C-RSG is also a first look at 2016: small changes are expected in the instantaneous luminosity, and a big increase in the live seconds of running is expected with respect to 2015

- The largest requested increase is in Tier-2 CPU; the total volume of data to analyze drives the increase
- All quantities except the Tier-0 are predicted to grow faster than technology evolution
- Some optimization is hopefully still possible with experience from the first year of Run 2

- CMS is facing a huge increase in the scale of the computing expected to be needed for Run 4
- The WLCG Computing Model Evolution document predicts 25% processing-capacity and 20% storage growth per year
  - that is a factor of 8 in processing and 6 in storage between now and Run 4
- Even assuming a factor of 2 from code improvements, the deficit is a factor of 4-13 in processing and 3-7 in storage

Phase            Pile-up   HLT output   Reco time ratio   AOD size ratio   Total weighted average
                                        to Run 2          to Run 2         increase above Run 2
Phase I (2019)   50        1 kHz        4                 1.4              3
Phase II (2024)  140       5 kHz        …                 …                …
Phase II (2024)  …         … kHz        …                 …                …
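The headline factors follow from compounding the assumed annual growth over the yearly procurement cycles between now and Run 4; a quick check, where the number of cycles (~10) is an assumption:

```python
# Compound the flat-budget growth assumptions out to Run 4.
cpu_growth  = 1.25   # +25% processing capacity per year
disk_growth = 1.20   # +20% storage per year
years = 10           # assumption: ~10 procurement cycles between now and Run 4

print("processing: x%.1f" % cpu_growth ** years)   # ~x9.3 (x7.5 for 9 years): roughly the quoted factor of 8
print("storage:    x%.1f" % disk_growth ** years)  # ~x6.2: the quoted factor of 6
```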

- It is unlikely that we will get a factor of 5 more money, nor will the experiment be willing to take a factor of 5 less data; big improvements are needed
- Roughly 40% of the CMS processing capacity is devoted to tasks identified as reconstruction (prompt reconstruction, re-reconstruction, data and simulation reco)
  - Looking at lower-cost, lower-power massively parallel systems like ARM and at high-performance processors like GPUs; both can lower the average cost per unit of processing
- ~20% of the offline computing capacity is in areas identified as selection and reduction (analysis selection, skimming, production of reduced user formats)
  - Looking at dedicated data-reduction solutions such as event catalogs and big-data tools like MapReduce (see the sketch after this list)
- The remaining 40% is a mix of different activities with no single area on which to concentrate optimization effort
  - simulation already has a strong ongoing optimization effort (in Geant4 and in CMS)
  - user analysis activities developed by many people
  - smaller-scale calibration and monitoring activities
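A toy sketch of the MapReduce-style selection-and-reduction pattern in plain Python: each event is mapped to a (category, reduced record) pair, and records are then grouped per category. The event records and the pT cut are hypothetical illustrations, not a CMS data format.

```python
# Toy MapReduce-style event skim: map events to (key, reduced record)
# pairs, then reduce (group) them by key.
from functools import reduce
from collections import defaultdict

events = [  # hypothetical minimal event records
    {"run": 1, "lumi": 10, "leading_jet_pt": 95.0},
    {"run": 1, "lumi": 11, "leading_jet_pt": 22.0},
    {"run": 2, "lumi": 3,  "leading_jet_pt": 140.0},
]

def map_event(event):
    """Emit a small reduced record for events passing the selection."""
    if event["leading_jet_pt"] > 30.0:
        yield ("high_pt_jets", {"run": event["run"], "pt": event["leading_jet_pt"]})

def reduce_records(acc, pair):
    """Group reduced records by category (the shuffle/reduce step)."""
    key, record = pair
    acc[key].append(record)
    return acc

mapped = (pair for event in events for pair in map_event(event))
skim = reduce(reduce_records, mapped, defaultdict(list))
print(dict(skim))  # {'high_pt_jets': [{'run': 1, 'pt': 95.0}, {'run': 2, 'pt': 140.0}]}
```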

- CMS has decided to merge the Offline and Computing organizations over the next year (to be complete by September 2015)
  - removes an artificial distinction between these closely related activities
  - aligns CMS with the other LHC experiments
  - makes communication lines clearer from the outside
- CMS has appointed Maria Girone and David Lange to co-lead the new joint project
- The sub-projects will evolve within the new structure over the next year
- We hope that the enlarged scope will strengthen the joint project and create new opportunities

Summary:
- CMS is using the long shutdown to improve the efficiency of our software and computing system
  - data management and data access improvements are in commissioning; production improvements will continue through the fall; analysis improvements are in user testing
- We are pushing forward the adoption of CRAB3, and testing the organized-processing environment, including the AI, at a reasonable scale
- We are streamlining software and computing workflows, as we expect at best a constant level of effort
- In addition to handling day-to-day operations and preparing for Run 2, we are assembling projects to do the innovative development needed to close the Run 4 resource gap
  - offering leading-edge techniques may result in the injection of new people; an excellent area for collaborative work