FDR - Results so far, schedule and challenges
Jim Cochran, Iowa State University
US ATLAS Transparent Distributed Facility Workshop, UNC, 3/4/2008


Outline:
- What is the FDR?
- Event mixing - a moving target
- Current schedule
- FDR-1 operation: the week of February 4 (Tier 0)
- Data distribution
- Preparation for US analysis effort
- Early user feedback
- Lessons learned (so far)
- Plans for FDR-2

Much (most?) material stolen from talks by Michael Wilson, Ian Hinchcliffe, Dave Charlton, Alexei Klimentov, Kors Bos, …

What is the FDR?
The Full Dress Rehearsal (FDR) is an attempt to test the complete chain (or as much of it as we can actually test without real collisions).
There are many steps between DAQ output and final plots. The individual steps have been tested; the full chain has not. As we'll see, the FDR is not a totally realistic test.
Basic idea: feed appropriately mixed raw events (details later) into the system at DAQ output, and treat them, as much as possible, as if they were real data (calibration, streaming, reco, …).
To happen in two parts:
- FDR-1: instantaneous luminosity ~10³¹ cm⁻²s⁻¹
- FDR-2: ~13.3 pb⁻¹ integrated (instantaneous luminosity: details later)

More specifically …
Mixed data in bytestream format is copied to and streamed from the SFOs.
What's an SFO? It receives event data from the Event Filter nodes and writes raw data files. The SFOs are the output of TDAQ and reside at Point 1, on the surface. Data is copied from the SFO disks to Tier 0.
The SFOs have a 24 hr buffer and a data rate of up to 600 MB/s; there will be 5-10 SFO PCs.
For the FDR, the mixed (bytestream) data is inserted here, upstream of the SFO-to-T0 copy.
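Not from the slides: a quick sanity check of what those figures imply for buffer disk. A minimal sketch in Python, assuming an even split across the low-end count of 5 SFO PCs:

```python
# Back-of-the-envelope sizing of the SFO disk buffer implied by the
# quoted figures; the even split across nodes is an assumption.
rate_mb_s = 600        # aggregate SFO output rate (MB/s), from the slide
buffer_hours = 24      # quoted depth of the SFO disk buffer
n_sfo = 5              # low end of the quoted 5-10 SFO PCs

total_tb = rate_mb_s * buffer_hours * 3600 / 1e6   # MB -> TB
print(f"total buffer: {total_tb:.1f} TB")          # ~51.8 TB
print(f"per SFO node: {total_tb / n_sfo:.1f} TB")  # ~10.4 TB
```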

Motivation
If the LHC turned on tomorrow, what would happen to the recorded data?
- We know we are not ready; what don't we know?
- Real-time tests of hardware, software, databases, storage media, networks, data flow, data integrity and quality checks, calibrations, etc.
- What are the flaws in our processing models?
- Are we able to identify and correct routine running problems? Where are resources needed?
- Tests of the distributed computing and analysis machinery (T0 → T1 → T2 → T3), which especially should be tested under heavy user load.
[Diagram: data flow SFO → Tier 0 → Tier 1, annotated with metadata, trigger config., lumi. info; AOD/ESD/TAG production; DQ monitoring; calibrations (ongoing); MC production; import; archiving and export; DPD production; reprocessing]
Known unknowns vs. unknown unknowns …

Main FDR steps
Sample preparation
Calibration stream preparation
SFO preparation
FDR run:
- SFO to Tier-0 part
- Tier-0 operations
- Data quality, calibration and alignment operations
- Data transfers
- Tier-1 operations
- Tier-2 operations
After the main "FDR run":
- Perform distributed analysis, use TAGs, produce DPDs, etc.
- Re-reconstruct from bytestream (BS) after a time
- Remake DPDs as needed for analysis

Schedule

FDR-1:
  Sample preparation               1/3/08 (actual 1/18/08)
  Calibration stream preparation
  SFO preparation
  FDR run                          week of Feb 4
  After the main "FDR run"         Feb 11 and beyond
  (Re)processing at T1             ~March 17?

FDR-2:
  Simulation                       already started (done at the T2s; details later)
  Digitization                     starts April 1
  90% of RDO at BNL                April 20
  Remaining 10% of RDO at BNL      April 30
  Mixing complete at BNL           May 14
  Data on SFO                      May 19

(The original estimate for the mixing was 30 days.)

Sample preparation: details
Make use of previously generated events where possible, else generate.
Mix events randomly to get the correct physics mixture as expected at HLT output.
- Fakes will be discussed later.
Event mixing also runs the trigger simulation; only events passing all levels of triggering are kept.
Trigger information is written into the event, for later use in analysis.
Events are written into physics streams (+ Express & Calibration).
The format of the written events is bytestream (BS = RAW).
Files for most streams respect luminosity block (LB) boundaries - not the express stream.
Files for the physics streams are therefore written per stream, per LB, per SFO (a sketch of the resulting fan-out follows).
O(10⁷) events for each of FDR-1 and FDR-2.
All MC-truth information is lost by this processing.
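To make the per-stream / per-LB / per-SFO bookkeeping concrete, here is a minimal sketch of the resulting file fan-out; the naming pattern, stream list and counts are illustrative assumptions, not the real ATLAS convention:

```python
from itertools import product

# Illustrative stream names (the deck mentions Express, Muon, Jet,
# Egamma, Minbias); the file-name pattern and counts are invented for
# illustration.  Note the express stream did not actually respect LB
# boundaries, so its real fan-out differs.
streams = ["Express", "Muon", "Jet", "Egamma", "Minbias"]
run_number, n_lumiblocks, n_sfos = 3061, 30, 5   # 1 h run, 2 min/LB, 5 SFOs

files = [
    f"run{run_number:07d}.{s}.LB{lb:04d}.SFO-{sfo}.RAW"
    for s, lb, sfo in product(streams, range(n_lumiblocks), range(1, n_sfos + 1))
]
print(len(files), "files per run, e.g.", files[0])
# 750 files per run, e.g. run0003061.Express.LB0000.SFO-1.RAW
```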

Event mixing: FDR-1
An evolving compromise between realistic simulation and practical limitations.
- Goal: include all significant Standard Model processes in known proportions ("mixing"), with background events passing trigger chains ("fakes")
  - 10 one-hour runs at ~10³¹ cm⁻²s⁻¹ (lumi. varying across the run), 1 one-hour run at ~10³² cm⁻²s⁻¹ (constant lumi.)
  - Produce files in bytestream format, split across streams, SFOs, and luminosity blocks (2 minutes/LB)
- Actually achieved by Thurs Feb 7:
  - 8 runs at ~10³¹ with no fakes; 2 runs at ~10³¹ with some e/γ fakes
  - 1 run at ~10³² finished late, and was sent to Tier-1 asynchronously

Event mixing: issues
At these luminosities we have an event rate from minbias of 70 kHz and a trigger output of 200 Hz. The only way to make 10 hr of unbiased data is to start with 2.5B events! But we only need realistic event rates after the Event Filter: 8M events in total. This low rate is possible only because of trigger rejection and prescales; we need to prefilter and make samples with thresholds matched to the triggers - not trivial to get right! (The arithmetic is checked below.)
Mixing logistics:
- Data was not available until Jan 18 (Jan 3 was the original plan)
- All data was prestaged to CASTOR disk; output was copied back to CASTOR
- Could not get all the data onto the compute nodes fast enough
- CPU time to process events was too long
In order to get the data in time for T0 we had to change strategy: reduce the fake samples drastically.
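The quoted event counts follow directly from the rates; a quick check (my arithmetic, not from the slides):

```python
# Check of the mixing arithmetic quoted on the slide.
minbias_rate_hz = 70e3   # minbias event rate quoted on the slide
trigger_out_hz = 200     # nominal trigger / Event Filter output rate
seconds = 10 * 3600      # 10 hours of unbiased data

print(f"events needed before trigger: {minbias_rate_hz * seconds:.2e}")   # 2.52e9, i.e. ~2.5B
print(f"events kept after Event Filter: {trigger_out_hz * seconds:.1e}")  # 7.2e6, ~8M incl. the 10^32 run
```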

Lessons from FDR-1 sample preparation
Fake samples must be matched in detail to the trigger menu.
- Fakes need to be prefiltered using the same menu as will be used for mixing.
Tier 0 is not appropriate for mixing (as noted, BNL will be used for FDR-2).
A number of lessons about computing infrastructure:
- Fast turn-around on MC production is not yet reliable
- Data distribution is labor intensive
- Software fixes have taken time
Reallocating Tier-0 resources (computers, people) affects operations:
- Tier-0 managers need time to prepare and maintain infrastructure
- Disk space and CPU capacities are large but limited

FDR-1 operation: trigger menu
The trigger menu for L = 10³¹ selects mostly fakes.
- The rate of interesting events selected at this lumi. is much lower than the nominal 200 Hz.
- Thus, because 8 of the 10 low-lumi. runs do not include enhanced-fake samples (e.g., EM-like jets), the overall rate is closer to 10 Hz.
- The achieved rate is ~50 Hz for runs with enhanced-fake samples.
Conclusions:
- This is a menu for first data and detector commissioning
- It is not optimal for the final FDR-1 mixed sample

FDR-1 data playing
Data was played from 5 SFOs during 5-8 Feb (Tues-Fri).
- The same events were replayed every day with different run numbers.
- Reconstruction ran with a different AtlasPoint1 patch every day (for monitoring updates and fixes)
  - Flexible: allowed us to run anything at all
  - Dangerous: not reproducible; software not properly validated
  - Will build AtlasPoint for Tier-1 reconstruction
Exported streams:
- Express
- Muon and b physics ("Muon")
- Jets, tau, and missing E_T ("Jet")
- Electron and photon ("Egamma")
- Minimum bias ("Minbias")
Non-exported streams:
- ID tracks for alignment

Express stream
The first use of the express stream was successful.
- Primary purposes: data quality and calibrations
- Sufficient to detect problems and details, allowing timely validation of the data (if DQ is operational)
- Monitoring histograms (detector and performance) were available within hours after the run appeared; shifters spotted problems shortly thereafter
- Histograms were moved to AFS to avoid overloading CASTOR with too many requests
- ESDs and AODs are not exported; consider temporary storage so users can access them if needed for DQ

Data quality
Data quality was checked by central shifters and system experts.
- Included: Pixel, LAr, Tile, MDT, RPC, L1RPC, ID alignment, e/gamma, jets, missing E_T, muon tracks
- Two shifts per day, using a desk in the Tier-0 control room
- Exercised the ability to spot problems with 10-minute granularity
- Lots of room for improvement; core functionality came online during the week

Some unexpected features
[Plots showing: shoulders at ~150 GeV and ~600 GeV; TRT timing not configured for one wheel (a 15 ns shift)]

Expected DQ flags
Problems were introduced for short time intervals to test the data-quality monitoring:
- Run 3062: dead LAr crate (or crates?)
- Run 3062, minutes 20-30: hot LAr cells
- What about the noisy barrel crate?

Calibrations
Tested the calibration loop: determine calibration constants within 24 hrs of the data being recorded, and apply the new constants during bulk reconstruction.
- Pixel calibration was planned to run on the express stream; the software only works in 13.X.0, so this was postponed until FDR-2.
- TRT calibration ran on the ID-alignment stream; the first of many iterations completed promptly on Tier 0, with the remaining iterations finishing on lxbatch.
- ID alignment: see the following slides (and the control-flow sketch below).
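A schematic of the 24-hour calibration-loop logic described above; every function here is a trivial placeholder for a real Tier-0 or conditions-DB action, so this is a sketch of the control flow only, not any actual ATLAS interface:

```python
import datetime as dt

# Placeholder steps; each stands in for a real Tier-0 / conditions-DB action.
def reconstruct_express_stream(run): return {"run": run}
def derive_constants(express, previous=None): return {"iter": (previous or {}).get("iter", 0) + 1}
def signed_off(constants): return constants["iter"] >= 2      # pretend 2 iterations suffice
def upload_to_conditions_db(constants): print("uploading", constants)
def launch_bulk_reconstruction(run, constants): print("bulk reco of run", run)

def calibration_loop(run, deadline_hours=24):
    """Sketch of the 24 h loop: derive constants from the express and
    calibration streams, iterate until sign-off (alignment needs
    iterations), then run bulk reconstruction with the new constants."""
    deadline = dt.datetime.now() + dt.timedelta(hours=deadline_hours)
    express = reconstruct_express_stream(run)
    constants = derive_constants(express)
    while not signed_off(constants) and dt.datetime.now() < deadline:
        constants = derive_constants(express, previous=constants)
    upload_to_conditions_db(constants)
    launch_bulk_reconstruction(run, constants)

calibration_loop(3062)
```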

ID-alignment stream
The first test of a calibration stream.
- An efficient and dedicated effort had this ready for testing at the beginning of January.
- Used by both ID alignment and TRT calibration in FDR-1.
- Uses event fragments from L2 ("L2_trck10i_calib"):
  - isolated tracks with pT > 10 GeV (5 GeV for FDR-1)
  - selects tracks at 60 Hz (L = …) with no additional TDAQ load
  - 600k tracks selected for FDR-1 (from a dijet sample; uses 26 GB)
- Attempted to include 50k cosmic events; postponed due to software incompatibilities.
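Not from the slides: the quoted numbers imply a very small per-event payload and a negligible extra bandwidth, which is the point of shipping only L2 event fragments:

```python
# Size and bandwidth implied by the quoted stream numbers (my arithmetic).
n_tracks = 600_000       # tracks selected for FDR-1
sample_bytes = 26e9      # 26 GB total sample size

frag_kb = sample_bytes / n_tracks / 1e3
print(f"~{frag_kb:.0f} kB per L2 event fragment")                # ~43 kB
print(f"~{60 * frag_kb / 1e3:.1f} MB/s at the 60 Hz selection")  # ~2.6 MB/s, tiny vs the 600 MB/s SFO rate
```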

ID alignment procedure
The ID alignment group produced constants under tight computing and time constraints.
- They only had the ID-alignment stream from the beginning of January for testing.
- The alignment procedure necessarily involves iterations; this is not suitable for Tier 0, and the full implications are still being considered.
- The calibration model is being adapted: ID alignment ran outside of production, with the constants fed back before bulk reconstruction.
- A full test of the new model is expected in FDR-2. (A toy illustration of why alignment iterates follows.)
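To see why alignment has to iterate: each pass fits tracks with the current geometry and corrects module positions from the residuals, and the corrections themselves change the track fits, so the step must be repeated until it converges. A toy one-dimensional illustration (entirely schematic, not the ATLAS algorithm):

```python
import random

random.seed(1)
true_offsets = [0.3, -0.2, 0.1]   # true module misalignments (mm), unknown to us
aligned = [0.0, 0.0, 0.0]         # current alignment constants

for iteration in range(4):
    # "Fit tracks": measure the mean residual per module with the current constants.
    residuals = [
        (true - est) + random.gauss(0, 0.02)   # hit scatter smears the estimate
        for true, est in zip(true_offsets, aligned)
    ]
    # Correct each module by a damped fraction of its mean residual; the
    # damping mimics the correlations that keep one pass from converging.
    aligned = [est + 0.6 * res for est, res in zip(aligned, residuals)]
    print(f"iter {iteration}: constants = {[round(a, 3) for a in aligned]}")
# The constants approach the true offsets over successive iterations.
```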

ID alignment updating
A new ID alignment was calculated by 16:00 Thursday Feb 7 (a two-day turnaround).
- Used the ID-alignment stream; did not use cosmics or the express stream this time.
- Only two iterations at level 2, due to time constraints.
[Plots: d0, z0, and Q/pT residual distributions, each comparing perfect, nominal, and aligned geometries]

Daily meeting
A daily operations meeting was held at 16:00 from 4-8 Feb:
- Review a first look at the fresh data and the shift report.
- Complete the look at the previous day's data (incl. overnight):
  - Calibration sign-off: upload new constants?
  - If new constants, reprocess the express stream ⇒ need to consider the resources for this carefully.
- Process the physics streams if the data quality is understood.
  - Ensure that all data-quality flags have been set; this needs reports from ALL systems and groups.

Flagging data quality
Data-quality assessment has both automatic and manual components.
- Automatic tools will be run to check DCS and histograms.
- Expert system shifters need to check the automatic assessments, and possibly override them, for every run.
- After the daily meeting, all assessments are combined into a final flag, which is written into both the DB and the AODs.
- Assessments may span time periods from 1 LB to an entire run. (A sketch of one possible combination rule follows.)
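The slides do not specify the combination rule; a common convention is to take the worst flag among all assessments covering a luminosity block. A minimal sketch under that assumption, with hypothetical sources and LB ranges:

```python
from enum import IntEnum

class Flag(IntEnum):   # ordered so that max() picks the worst state
    GREEN = 0
    YELLOW = 1
    RED = 2

# Each assessment covers an LB range: (source, first_lb, last_lb, flag).
assessments = [
    ("LAr-auto",     1, 30, Flag.GREEN),
    ("LAr-shifter", 20, 30, Flag.RED),    # manual override, e.g. hot cells in minutes 20-30
    ("Pixel-auto",   1, 30, Flag.GREEN),
]

def final_flag(lb):
    """Worst flag among all assessments covering luminosity block lb."""
    covering = [f for _, lo, hi, f in assessments if lo <= lb <= hi]
    return max(covering, default=Flag.GREEN)

print(final_flag(10).name)  # GREEN
print(final_flag(25).name)  # RED (the shifter override wins)
```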

External participation
Information flow to those outside CERN should be improved.
- Ideally, a portal where collaborators can check the real-time status of:
  - runs recorded
  - runs with the express stream processed
  - runs waiting for data-quality sign-off
  - the list of bulk streams processed, and then exported
- How can offsite colleagues participate in meaningful data checks?

Data Distribution


Dataset replication to the T2s started immediately after the T1s had complete replicas.

Data distribution: summary
Data replication from CERN to the Tier-1s, and within all clouds, is relatively stable.
Notes:
- Site problems were fixed by the operations team (central and regional) within 24 h.
- Data export from CERN was delayed by 3 days (several technical issues, all under discussion).
- Data replication monitoring and book-keeping still have room for improvement.
- There was not enough data to be a transfer challenge; other data (M5) will be used for CCRC.

Preparation for US analysis effort (and some results)
There is an ongoing effort in the US to prepare user analysis queues:
- they have been tested and are mostly ready to go (modulo scl3 vs scl4 issues, etc.)
- experts have been very responsive
User participation in AOD analysis has been frustrated/delayed by:
- the lack of a standard analysis package; one should be available this week (including trigger info)
- the non-existence of the TAGs (?)
- confusion about where to obtain lumi info (not a serious issue for now)
A large set of people have expressed interest in FDR participation; expect a "surge" in activity once analysis doesn't require expert status.
Primary DPDs are to be generated during production, perhaps during T0 reprocessing?
Individual groups are starting to produce their own secondary and tertiary DPDs.
Tutorial today.

[Plot: dimuon mass distribution - Lashkar Kashif, Harvard]

[Plot: looking for J/ψ and Upsilon → ee - Andy Nelson, ISU]


FDR-2 plan (March 4, 2008)
1. 1 h of data (2.5 pb⁻¹)
   - with fakes where possible
   - some fakes will be reused many times (JF samples); aim to redigitize so the repeated events are not quite the same
2. 3 h of data (10.8 pb⁻¹)
   - probably no fakes
   - with 75 ns pileup
   - for some of this data, the beamspot will move
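Not from the slides: the quoted integrated luminosities imply average instantaneous luminosities around 10³³ cm⁻²s⁻¹, i.e. well above the FDR-1 runs:

```python
def implied_lumi(int_lumi_pb, hours):
    """Average instantaneous luminosity (cm^-2 s^-1) implied by an
    integrated luminosity (pb^-1) delivered over `hours`."""
    return int_lumi_pb * 1e36 / (hours * 3600)   # 1 pb^-1 = 1e36 cm^-2

print(f"{implied_lumi(2.5, 1):.1e} cm^-2 s^-1")    # ~6.9e32 for the 1 h sample
print(f"{implied_lumi(10.8, 3):.1e} cm^-2 s^-1")   # ~1.0e33 for the 3 h sample
```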

FDR-2 simulation production
Simulation of physics and background for the FDR-2 data sample. Need to produce before May (1/3 during CCRC-1):
- 0.5M minimum bias and cavern events
- 10M physics events
- 100M fake events
Event sizes: Simulation → HITS (1.5 MB/ev), Digitization → RDO (2.5 MB/ev), Reconstruction → ESD (1 MB/ev), AOD (0.2 MB/ev), TAG (1 kB/ev). (The implied volumes are sketched below.)
Simulation + digitization is done at the T2s; HITS and RDOs are uploaded to the T1:
- HITS to tape at the T1
- RDO to CERN for mixing
Reconstruction is done at the T1:
- ESD, AOD, TAG archived to tape at the T1
- ESD copied to the other T1s by share (3 full copies world-wide)
- AOD and TAG copied to each other T1
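Multiplying out the per-event sizes gives a feel for the volumes (my arithmetic; it assumes all 110.5M events reach every format, which overstates things since most fakes are rejected by the trigger):

```python
# Raw volume per format if all events went through each step.
events = 0.5e6 + 10e6 + 100e6                 # minbias+cavern, physics, fakes
sizes_mb = {"HITS": 1.5, "RDO": 2.5, "ESD": 1.0, "AOD": 0.2, "TAG": 0.001}

for fmt, mb in sizes_mb.items():
    print(f"{fmt}: {events * mb / 1e6:.1f} TB")
# HITS: 165.8  RDO: 276.2  ESD: 110.5  AOD: 22.1  TAG: 0.1  (TB)
```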


FDR lessons learned (so far)
- Fake samples must match the trigger menu items; purity and rate are both important.
- Software validation must be improved.
- Calibration procedures need detailed advance planning and coordination.
- The sign-off procedure for bulk processing needs both automation and expert attention.
- Express-stream reprocessing needs consideration and resources.
- It is always best to try things early:
  - Finding problems was the goal; most were overcome.
  - For FDR-2 we expect the basics to run more smoothly, and will focus on the operational details and the calibration model: tune the existing system, commission new components, more physics content in the streams.
  - Developers, users and experts have a much better sense of how the system is intended to work; we can be more efficient in the future.
  - All these problems must be faced sometime - better now than later.