CMS livetime during last week’s running. 11 Sep 2012, G. Rakness (UCLA). Fills 3025-3047 → 1.02/fb delivered, 0.98/fb recorded; 96.1% live, ~2.4% not running.


1 CMS livetime during last week’s running
11 Sep 2012, G. Rakness (UCLA)
Fills 3025-3047 → 1.02/fb delivered, 0.98/fb recorded
96.1% live
~2.4% not running
~1.4% deadtime
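As a quick cross-check of the numbers on this slide, the sketch below assumes that "live" means recorded luminosity divided by delivered luminosity. The function and variable names are illustrative only and do not come from any CMS tool.

```python
# Numbers quoted on the slide (inverse femtobarns).
delivered_fb = 1.02
recorded_fb = 0.98


def live_fraction(recorded: float, delivered: float) -> float:
    """Fraction of the delivered luminosity that was recorded (assumed definition)."""
    return recorded / delivered


live = live_fraction(recorded_fb, delivered_fb)
lost = 1.0 - live

# Prints: live: 96.1%, lost: 3.9%
print(f"live: {live:.1%}, lost: {lost:.1%}")
```

The lost fraction of about 3.9% is consistent, within rounding, with the ~2.4% not running plus ~1.4% deadtime quoted above.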

2 The downtime was spent (poorly) recovering from problems
Preshower FMM=Error
– Shift crew did not correctly follow directions to recover
Tracker FMM=Busy (3 times)
– Shift crew did not correctly follow directions to recover
Trigger Software=ERROR at start of run (because the subsystem started with FMM ≠ Ready)
– Shift crew focused on recovering the Trigger instead of the subsystem
Global Trigger sends 15-16 events, then goes to FMM=Busy
– Shift crew frustrated (this happens just after software in ERROR)
→ 1h30min spent over 4 separate occasions

3 Downtimes are “easy” to address
Fix the bugs
– So easy to say, so hard to do…
Use automated recovery tools
– If these are SEUs, use FixSoftError
Go to the right state… For example,
– Preshower → if you want a Red-Recycle, go to Software=Error
– Trigger → if the problem is a subsystem, don’t go to Software=Error
Simplify the DAQ shifter recovery instructions
– Simpler is better than exactly precise
– The 2 minutes saved when the crew follows the exact instructions are overwhelmed by the 20 minutes needed to recover from cascading problems when the instructions are not followed exactly right…

4 As of TODAY, the DAQ shifter Action Matrix is finally in action
If you have a problem with a subsystem stopping triggers or data flow...
1. Find the row corresponding to the subsystem.
2. Analyze the symptoms. If there is backpressure (small "<" next to a FED), this may not be a problem with that subsystem. What is the trigger doing? Go to step 3, but be ready to call the DAQ DOC and/or HLT DOC.
3. If the symptom is explicitly specified (e.g., infinite loop of SoftErrorRecovery), do the Action corresponding to the column. If the symptom is not explicitly specified, perform the numbered Actions in order until the problem is resolved. If the last number doesn't work or if you have reached the condition in the last column, call the subsystem DOC!
https://twiki.cern.ch/twiki/bin/viewauth/CMS/ShiftNews#DAQ_Shifter_Action_Matrix
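The three steps above can be read as a small lookup procedure. The sketch below is only an illustration of that logic: the matrix contents, the subsystem name, the action names, and the `try_action` callback are all hypothetical placeholders and are not taken from the real twiki table.

```python
from typing import Callable, Dict, List

# Hypothetical matrix: one row per subsystem. The entries are placeholders,
# not the contents of the real Action Matrix.
ACTION_MATRIX: Dict[str, dict] = {
    "ExampleSubsystem": {
        # Symptoms that have an explicit column, mapped to their action.
        "by_symptom": {
            "infinite loop of SoftErrorRecovery": "example action for that column",
        },
        # Numbered actions to try in order when the symptom is not listed.
        "numbered": ["example action 1", "example action 2"],
    },
}


def recover(subsystem: str, symptom: str, backpressure: bool,
            try_action: Callable[[str], bool]) -> List[str]:
    """Return a log of what the shifter would do; try_action reports success."""
    log: List[str] = []
    row = ACTION_MATRIX[subsystem]            # step 1: find the row
    if backpressure:                          # step 2: analyze the symptoms
        log.append("backpressure: check the trigger, be ready to call DAQ DOC and/or HLT DOC")
    explicit = row["by_symptom"].get(symptom)
    if explicit is not None:                  # step 3: explicitly listed symptom
        log.append(f"do: {explicit}")
        try_action(explicit)
        return log
    for action in row["numbered"]:            # step 3: numbered actions in order
        log.append(f"do: {action}")
        if try_action(action):
            return log                        # problem resolved
    log.append(f"call the {subsystem} DOC")   # last numbered action did not work
    return log
```

For example, `recover("ExampleSubsystem", "unknown symptom", False, lambda a: False)` walks through both numbered actions and ends by calling the subsystem DOC.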

5 Deadtime numbers are approaching downtime numbers
Recall: if a subsystem has a problem, they can ask for help and get an automatic reaction
– FMM = Out-of-sync → Resync from Global Trigger (GT) firmware
  Used by Pixel
– FMM = Out-of-sync → Resync from GT software
  Used by CSC/ECAL/Tracker
– FMM = Error → Hard Reset from GT software
  Used by CSC/DT
– Software = RunningWithSoftError → FixSoftError from Global DAQ software
  Used by Pixel/ECAL/Tracker
How often does this actually happen?
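The list of automatic reactions above can be written down as a small lookup table. This is just a restatement of the slide as data; the `Reaction` type and the `reactions_for` helper are illustrative names, not part of any CMS software.

```python
from typing import List, NamedTuple, Tuple


class Reaction(NamedTuple):
    signal: str                 # what the subsystem reports
    action: str                 # automatic reaction
    issued_by: str              # who issues the reaction
    used_by: Tuple[str, ...]    # subsystems using this path, per the slide


# Transcribed from the slide.
REACTIONS: List[Reaction] = [
    Reaction("FMM=Out-of-sync", "Resync", "GT firmware", ("Pixel",)),
    Reaction("FMM=Out-of-sync", "Resync", "GT software", ("CSC", "ECAL", "Tracker")),
    Reaction("FMM=Error", "Hard Reset", "GT software", ("CSC", "DT")),
    Reaction("Software=RunningWithSoftError", "FixSoftError",
             "Global DAQ software", ("Pixel", "ECAL", "Tracker")),
]


def reactions_for(subsystem: str) -> List[Reaction]:
    """All automatic reactions that the given subsystem uses."""
    return [r for r in REACTIONS if subsystem in r.used_by]


for r in reactions_for("CSC"):
    print(f"{r.signal} -> {r.action} from {r.issued_by}")
```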

6 Numbers from last week
FMM = Out-of-sync → Resync from GT firmware
– 3369 times ~ 154 sec (estimate @ 46 msec per Resync)
– 46 msec is programmed in GT registers as settle/recover time → tune?
FMM = Out-of-sync → Resync from GT software
– 149 times by CSC = 185.1 s (from FMM logs)
– If CSC could have resync programmed in GT firmware → harder to do…
FMM = Error → Hard Reset from GT software
– 1 time by CSC ~ 2 sec (estimate @ 1.5 sec per software read)
– Very little to gain here
Software = RunningWithSoftError → FixSoftError from Global DAQ software
– 16 times ~ 160 sec (compare with 1h30min on slide 2)
– Subsystems need to use this, if they can…
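The arithmetic behind these numbers can be reproduced in a few lines. The counts and times are the ones quoted on the slide; the dictionary keys and variable names are illustrative, and the sum covers only the four automatic recoveries listed here, not every source of deadtime.

```python
# Time spent in each automatic recovery last week, in seconds.
deadtime_s = {
    "Resync (GT firmware)": 3369 * 0.046,       # 3369 times at 46 msec each
    "Resync (GT software, CSC)": 185.1,         # measured, from FMM logs
    "Hard Reset (GT software, CSC)": 1 * 1.5,   # 1 time at 1.5 sec per software read
    "FixSoftError (Global DAQ)": 160.0,         # 16 times, ~160 sec in total
}

total_s = sum(deadtime_s.values())
downtime_s = 90 * 60  # the 1h30min of downtime from slide 2

# Average cost per occurrence for the two software-driven paths.
csc_resync_avg_s = 185.1 / 149
fix_soft_error_avg_s = 160.0 / 16

for name, seconds in deadtime_s.items():
    print(f"{name}: {seconds:.1f} s")
print(f"total automatic recovery: {total_s:.1f} s vs downtime: {downtime_s} s")
```

With these inputs a firmware Resync costs 46 msec, a CSC software Resync averages about 1.2 sec, and a FixSoftError averages 10 sec, which is small compared with the roughly 20 minutes per occasion spent on manual recovery.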

7 To do
Write poster for TWEPP (conference is next week)
Run Coordination
– Follow up database check
– Follow up DAQ shifter actions
– Follow up ECAL/HCAL reconfiguration after clock changes
– Rework shift leader training
– Make shift summary Elog template for shift leader
CSC
– Check ALCT slow control firmware at 904… then at p5 on chamber with strange ADC values
– Finish documentation of CSC timing https://twiki.cern.ch/twiki/bin/view/CMS/CSCsynchronizationDocuments
– Put firmware into SVN
– RAT firmware loading
As well…
– Make neutron skims

