CMS livetime during last week's running. 11 Sep 2012, G. Rakness (UCLA). Fills 3025-3047: 1.02/fb delivered, 0.98/fb recorded; 96.1% live, ~2.4% not running.


CMS livetime during last week's running
Fills 3025-3047: 1.02/fb delivered, 0.98/fb recorded
- 96.1% live
- ~2.4% not running
- ~1.4% deadtime
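
As a quick consistency check on the numbers above, the live fraction can be recomputed as recorded over delivered luminosity. This is a minimal sketch: the variable names are illustrative (not from the slides), and the inputs are the rounded values quoted on the slide.

```python
# Rounded values quoted on the slide; variable names are illustrative.
delivered_fb = 1.02   # integrated luminosity delivered, in fb^-1
recorded_fb = 0.98    # integrated luminosity recorded, in fb^-1

live_fraction = recorded_fb / delivered_fb
lost_fraction = 1.0 - live_fraction

print(f"live: {live_fraction:.1%}, lost: {lost_fraction:.1%}")
```

The lost fraction of roughly 3.9% agrees, within rounding of the inputs, with the quoted ~2.4% not running plus ~1.4% deadtime.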

The downtime was spent (poorly) recovering from problems
- Preshower FMM=Error
  - Shift crew did not correctly follow directions to recover
- Tracker FMM=Busy (3 times)
  - Shift crew did not correctly follow directions to recover
- Trigger Software=ERROR at start of run (because the subsystem started with FMM ≠ Ready)
  - Shift crew focused on recovering the Trigger instead of the subsystem
- Global Trigger sends events, then goes to FMM=Busy
  - Shift crew frustrated (this happens just after software in ERROR)
→ 1h30min spent over 4 separate occasions

Downtimes are “easy” to address
- Fix the bugs
  - So easy to say, so hard to do…
- Use automated recovery tools
  - If these are SEUs, use FixSoftError
- Go to the right state…
  - For example, Preshower → if you want a Red-Recycle, go to Software=Error
  - Trigger → if the problem is a subsystem, don't go to Software=Error
- Simplify the DAQ shifter recovery instructions
  - Simpler is better than exactly precise
  - The 2 minutes saved when the crew follows the exact instructions are overwhelmed by the 20 minutes needed to recover from the cascading problems caused by not doing it exactly right…

As of TODAY, DAQ shifter Action Matrix finally in action
If you have a problem with a subsystem stopping triggers or data flow...
1. Find the row corresponding to the subsystem.
2. Analyze the symptoms. If there is backpressure (small "<" next to a FED), this may not be a problem with that subsystem. What is the trigger doing? Go to step 3, but be ready to call the DAQ DOC and/or HLT DOC.
3. If the symptom is explicitly specified (e.g., infinite loop of SoftErrorRecovery), do the Action corresponding to the column. If the symptom is not explicitly specified, perform the numbered Actions in order until the problem is resolved. If the last number doesn't work or if you have reached the condition in the last column, call the subsystem DOC!
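
The lookup logic of the three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the matrix contents, the subsystem name, and the function names are hypothetical placeholders, not the real Action Matrix. The human judgment in step 2 (backpressure, what the trigger is doing) and the "condition in the last column" are not modelled.

```python
# Hypothetical action matrix: one row per subsystem. Each row maps
# explicitly specified symptoms to an action and lists the numbered
# default actions to try in order. All entries are made-up placeholders.
ACTION_MATRIX = {
    "ExampleSubsystem": {
        "explicit": {"infinite loop of SoftErrorRecovery": "Example action A"},
        "ordered": ["Example action 1", "Example action 2"],
    },
}


def next_actions(subsystem, symptom):
    """Return the actions the shifter should try, in order."""
    row = ACTION_MATRIX[subsystem]        # step 1: find the subsystem's row
    if symptom in row["explicit"]:        # step 3: symptom explicitly specified
        return [row["explicit"][symptom]]
    return list(row["ordered"])           # otherwise: numbered actions in order


def recover(subsystem, symptom, try_action):
    """Try each action until one works; otherwise escalate to the DOC."""
    for action in next_actions(subsystem, symptom):
        if try_action(action):            # try_action returns True on success
            return action
    return "call the subsystem DOC"
```

Here `try_action` stands in for the shifter actually performing an action and reporting whether the problem is resolved.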

Deadtime numbers are approaching downtime numbers
Recall: if a subsystem has a problem, it can ask for help and get an automatic reaction
- FMM = Out-of-sync → Resync from Global Trigger (GT) firmware
  - Used by Pixel
- FMM = Out-of-sync → Resync from GT software
  - Used by CSC/ECAL/Tracker
- FMM = Error → Hard Reset from GT software
  - Used by CSC/DT
- Software = RunningWithSoftError → FixSoftError from Global DAQ software
  - Used by Pixel/ECAL/Tracker
How often does this actually happen?
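
The state-to-reaction mapping listed on this slide can be written down as a small lookup table. This is a sketch for illustration only: the table layout and the function name `reaction_for` are assumptions, and only the entries themselves are transcribed from the slide.

```python
# (signal, state, automatic reaction, issued from, subsystems using it).
# Entries are transcribed from the slide; the layout is illustrative.
AUTO_RECOVERY = [
    ("FMM", "Out-of-sync", "Resync", "GT firmware", {"Pixel"}),
    ("FMM", "Out-of-sync", "Resync", "GT software", {"CSC", "ECAL", "Tracker"}),
    ("FMM", "Error", "Hard Reset", "GT software", {"CSC", "DT"}),
    ("Software", "RunningWithSoftError", "FixSoftError",
     "Global DAQ software", {"Pixel", "ECAL", "Tracker"}),
]


def reaction_for(subsystem, signal, state):
    """Return (reaction, issuer) if the subsystem uses an automatic reaction."""
    for sig, st, reaction, issuer, users in AUTO_RECOVERY:
        if sig == signal and st == state and subsystem in users:
            return reaction, issuer
    return None  # no automatic reaction listed for this combination


print(reaction_for("CSC", "FMM", "Out-of-sync"))
```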

Numbers from last week
- FMM = Out-of-sync → Resync from GT firmware
  - 3369 times ~ 154 sec (46 msec per Resync)
  - 46 msec is programmed in GT registers as settle/recover time → tune?
- FMM = Out-of-sync → Resync from GT software
  - 149 times by CSC = 185.1 sec (from FMM logs)
  - If CSC could have resync programmed in GT firmware → harder to do…
- FMM = Error → Hard Reset from GT software
  - 1 time by CSC ~ 2 sec (1.5 sec per software read)
  - Very little to gain here
- Software = RunningWithSoftError → FixSoftError from Global DAQ software
  - 16 times ~ 160 sec (compare with 1h30min on slide 2)
  - Subsystems need to use this, if they can…
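
The per-occurrence times implied by these counts can be checked with a few lines of arithmetic. This is a minimal sketch using only the counts and totals quoted on the slide; the variable names and the comparison with the 1h30min of downtime are illustrative.

```python
# (mechanism, number of occurrences, total seconds) as quoted on the slide.
events = [
    ("Resync from GT firmware", 3369, 154.0),
    ("Resync from GT software (CSC)", 149, 185.1),
    ("Hard Reset from GT software (CSC)", 1, 2.0),
    ("FixSoftError from Global DAQ software", 16, 160.0),
]

for name, count, total_s in events:
    print(f"{name}: {total_s / count:.3f} s per occurrence")

total_auto_s = sum(total_s for _, _, total_s in events)
downtime_s = 90 * 60  # the 1h30min of downtime quoted on slide 2

print(f"automatic recoveries: {total_auto_s:.1f} s, downtime: {downtime_s} s")
```

The firmware Resync comes out at about 0.046 s per occurrence, matching the 46 msec settle/recover time, and each FixSoftError costs about 10 s.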

To do
- Write poster for TWEPP (conference is next week)
- Run Coordination
  - Follow up database check
  - Follow up DAQ shifter actions
  - Follow up ECAL/HCAL reconfiguration after clock changes
  - Rework shift leader training
  - Make shift summary Elog template for shift leader
- CSC
  - Check ALCT slow control firmware at 904… then at p5 on chamber with strange ADC values
  - Finish documentation of CSC timing
  - Put firmware into SVN
  - RAT firmware loading
- As well…
  - Make neutron skims