DAQ2 Shift Tutorial, cDAQ group. Monitoring of the DAQ2 system. Remi Mommsen, FNAL.


Slide 2: Monitoring tools
1. RCMS/LVL0 interface: has been covered by Hannes.
2. aDAQMon: overview screen to see at a glance the CMS running configuration and rates.
3. DAQView: most comprehensive monitoring tool for issues with data flow. Here you can monitor the data from FEDs to BUs.
4. Elastic Search / Filter Farm monitoring (file merging & transfers): shows the progress of file merging and transfers to T0. Important monitor of the file-based filter farm (FFF).
5. CPM controller: Central Partition Manager for the TCDS system. Good place to see rates, state of detector inputs, etc.
6. HotSpot: central display for sentinel messages for errors from all processes.

Slide 3: aDAQmon – DAQ Summary
Screenshot labels:
- History of HLT activity
- Data taking history
- DAQ flow
- DAQ sub-system configuration
- Status bar gives a quick overview of the DAQ

Slide 4: aDAQmon screenshot labels
- Main systems (LHC, DCS, ...) status
- FED-RU data stream
- FED RU configuration (box color: Sub-Sys ID)
- RU/BU box color: CPU (scale 0 to 100%)
- FED IN, FED OUT
- RU bandwidth plot
- BU bandwidth plot
- # Ev. in BU
- BU RAM disk %
- BU OUT disk %
- DAQ Sub-Sys configuration
- RU/BU box with RED frame: flash data not updated
- Event storage summary

Slide 5: DAQView

Slide 6: DAQView
Page regions:
- Status & navigation
- FED Builder: FEROL/FMM
- Event Builder: RU/EVM
- FFF Appliances: BU & FU
- Age of monitor data

Slide 7: DAQView – Navigation
- Stop refreshing page
- Switch pages between FEDbuilder, FFF, and all
- You only need cDAQ
- Start DAQView if it is not running
- Current run
- Duration and start time of run (or last restart of DAQView)
- Last update of page: must be current! If it is stale, you need to restart DAQView.
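The "last update must be current" rule amounts to a simple age check on the page timestamp. The sketch below is only an illustration: the 30-second threshold, the function name `is_stale`, and the example timestamps are assumptions, not values taken from DAQView.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: the tutorial only says the page "must be current".
MAX_AGE = timedelta(seconds=30)


def is_stale(last_update: datetime, now: datetime,
             max_age: timedelta = MAX_AGE) -> bool:
    """Return True if the monitoring page is older than max_age."""
    return now - last_update > max_age


# Arbitrary example timestamps.
now = datetime(2015, 6, 1, 12, 0, 0)
print(is_stale(datetime(2015, 6, 1, 11, 59, 50), now))  # 10 s old -> False
print(is_stale(datetime(2015, 6, 1, 11, 55, 0), now))   # 5 min old -> True
```

If the check returns True, the slide's instruction applies: restart DAQView.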

Slide 8: DAQView – FED builder
- TTC partition name & no.
- Current TTS state of partition
- %warning, %busy in TTS partition
- FEROL PC (link to hyperdaq page)
- FED information (see next page)
- Min/max # fragments received by FEROL. Highlighted in yellow if different to trigger. Min is only displayed if not equal to max.
- FED builder name
- Confused? Try the table help button!

Slide 9: DAQView – FEROL and FMM
Entries are of the form
- FRL_geoslot: FEDSourceID, or
- FRL_geoslot: FEDSourceID1, FEDSourceID2, or
- FEDSourceID, for a pseudo-FED (= TTS link only, but no data is read out by DAQ)
Additional info may be displayed next to the FEDSourceID (from left to right)
- Percentage of time during which the FED was in Warning or Busy during the last 3 seconds (if non-zero)
- Current state of TTS if other than Ready
- FEDSourceID (expected), e.g. 601
  - Grey if FRL input not enabled (FMM not enabled in case of pseudo-FED)
  - Highlighted in color of current TTS state if other than Ready
- Percentage of time with DAQ backpressure during the last update interval (5 s) if non-zero
  - Use this to judge whether a FED is creating dead-time because of a FED problem or because of DAQ backpressure
- Warnings
  - Received source ID different to expected
  - FED or SLINK CRC errors
  - Number of fragments received by FRL if no data is flowing and this FRL is lagging "behind"
uTCA FEDs (TCDS and HF lumi) do not have an FMM
- Busy/warning are not visible in DAQView! Check the CPM controller.
Example annotations shown on the slide: W:9.9% B:0.2%, <6.9%, #FCRC=
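The three entry forms can be told apart mechanically. The following sketch assumes the entries are available as plain strings such as "3: 601, 602" or "1024"; that rendering, the class `FerolEntry`, and the function `parse_entry` are hypothetical and not part of DAQView.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FerolEntry:
    frl_geoslot: Optional[int]   # None for a pseudo-FED (TTS link only)
    fed_source_ids: List[int]    # one or two FED source IDs

    @property
    def is_pseudo_fed(self) -> bool:
        return self.frl_geoslot is None


def parse_entry(text: str) -> FerolEntry:
    """Parse 'slot: id', 'slot: id1, id2' or a bare 'id' (pseudo-FED)."""
    if ":" in text:
        slot, feds = text.split(":", 1)
        return FerolEntry(int(slot), [int(f) for f in feds.split(",")])
    return FerolEntry(None, [int(text)])


# Illustrative strings only; real geoslots and source IDs will differ.
print(parse_entry("3: 601"))
print(parse_entry("3: 601, 602"))
print(parse_entry("1024").is_pseudo_fed)  # True
```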

Slide 10: DAQView – RU/EVM Information
- EVM/RU host (link to hyperdaq page)
- First row is TCDS / EVM
- Rate (kHz)
- # fragments built by RU/EVM since start of run
- # incomplete fragments: >> 1 indicates a problem on the RU
- Throughput (MB/s)
- Super-fragment size (kB)
- # events currently in RU: >> 1 indicates a problem in IB
- # requests by BU: normal is EVM >> 1 && RUs < 10
- Each row is one FEDbuilder
- Shaded values mean the FEDbuilder is not in readout

Slide 11: DAQView – FFF/BU
- BU host (link to hyperdaq page)
- Rate per BU (kHz)
- Throughput (MB/s)
- Event size (kB)
- Events built since start of run
- # events being built
- Resource information (see next page)
- # files written
- # LS for which there is a file
- Current LS number
- # LS on FUs
- Each line is one Appliance
- Confused? Try the table help button!

Slide 12: DAQView – BU Resources
BU resources are used for requesting events
- Each resource corresponds to multiple events
Fewer resources mean fewer event requests to the EVM
- Load balancing between independent appliances
- Backpressure mechanism if FFF/HLT cannot keep up
Each BU has a number of resources (#resources)
Resources can be blocked (#blocked)
- RAM disk becomes full
- Not enough FU CPU cores are available to process data
- FU processing lags behind
Resources for which no event data has been received are counted under #requests
- If #requests > 0, the BU is able to accept new events
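The three counters can be read together as a small state summary per BU. This is a minimal sketch of that reading; the class `BuResources`, the helper names, and the example numbers are assumptions for illustration, not DAQView's data model.

```python
from dataclasses import dataclass


@dataclass
class BuResources:
    resources: int  # #resources: total resources of this BU
    blocked: int    # #blocked: resources currently blocked
    requests: int   # #requests: resources with no event data received yet


def can_accept_events(bu: BuResources) -> bool:
    """Per the slide: if #requests > 0, the BU can accept new events."""
    return bu.requests > 0


def blocked_fraction(bu: BuResources) -> float:
    """Share of this BU's resources that are blocked."""
    return bu.blocked / bu.resources if bu.resources else 0.0


# Example values are made up, except the 25/32 ratio quoted in the tutorial.
bu = BuResources(resources=32, blocked=25, requests=4)
print(can_accept_events(bu))             # True
print(f"{blocked_fraction(bu):.0%}")     # 78%
```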

Slide 13: DAQView – Running, or not?
- LVL0: DAQ is running
- No, rate is 0 kHz
- None of the HF FEDs has sent any events
- No fragments in RU
- Many events requested
- No data flow as HF has not sent any data → Talk to HF expert

Slide 14: DAQView – Who Blocks the Run?
- ECAL is 100% in Warning
- Rate is 0 kHz
- FED 602 is in warning and last event is 9605
- There’s backpressure from DAQ
- RU waits for data from FED 59
- FED 59 has not sent any data
- FED 59 is the culprit → Talk to Tracker expert

Slide 15: DAQView – DAQ backpressure
- ECAL is 50% in Warning
- The rate is 10 kHz
- There’s backpressure from DAQ
- Very few events requested by BUs
- All BUs are “blocked” or “throttled”
- RAM disk is full: all resources blocked
- RAM disk is nearly full: 25/32 resources blocked
- No FU cores available: all resources blocked
- Only a few FU cores available: 26/32 resources are blocked
- FFF is blocked → Try to figure out what is wrong (and call DAQ oncall)
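The reasoning in the last three slides can be condensed into a rough decision sequence. The sketch below is a simplified illustration under stated assumptions: the function `triage`, its inputs, and the fragment count used for FED 59 are hypothetical, and a real diagnosis needs the full DAQView page.

```python
from typing import Dict


def triage(rate_khz: float, fragments_by_fed: Dict[int, int],
           all_bus_blocked: bool) -> str:
    """Very rough first guess at why data is not flowing normally."""
    if all_bus_blocked:
        # Backpressure originates in the filter farm (slide 15).
        return "FFF is blocked: call DAQ oncall"
    if rate_khz > 0:
        return "Data is flowing"
    # Rate is zero: look for FEDs that delivered fewer fragments than others.
    most = max(fragments_by_fed.values(), default=0)
    lagging = sorted(fed for fed, n in fragments_by_fed.items() if n < most)
    if lagging:
        return f"FED(s) {lagging} lag behind: talk to the subsystem expert"
    return "No obvious culprit: check LVL0, HotSpot and Handsaw"


# FED 602 at event 9605 is from the tutorial; the count for FED 59 is assumed.
print(triage(0.0, {602: 9605, 59: 9604}, all_bus_blocked=False))
print(triage(10.0, {602: 9605, 59: 9605}, all_bus_blocked=True))
```

Checking the BU state first reflects the point that a subsystem in Warning may only be a victim of DAQ backpressure rather than its cause.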

Slide 16: F3 Monitor

Slide 17: Storage & Transfer System
Aggregate files (event data, DQM histograms & metadata) as they appear
- Micro-merger on each FU aggregates the data from all processes on the FU
- Mini-merger on the BU aggregates the data from all FUs
- Mega-merger(s) aggregate the data from all BUs
Data and metadata are aggregated per luminosity section
Each luminosity section and stream is treated independently
If the previous step has completed successfully, its input data can be deleted
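The three merging levels apply the same operation at increasing scope: combine inputs keyed by luminosity section and stream. The sketch below illustrates this with event counts only; the function `merge`, the stream names "A" and "DQM", and all numbers are made up, and the real mergers handle files and metadata rather than counters.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[int, str]  # (luminosity section, stream name)


def merge(inputs: Iterable[Dict[Key, int]]) -> Dict[Key, int]:
    """Sum event counts per (LS, stream); each key is treated independently."""
    out: Dict[Key, int] = defaultdict(int)
    for counts in inputs:
        for key, n in counts.items():
            out[key] += n
    return dict(out)


# Micro-merge: all processes on one FU.
fu1 = merge([{(1, "A"): 10, (1, "DQM"): 2}, {(1, "A"): 12, (2, "A"): 5}])
fu2 = merge([{(1, "A"): 8}])
# Mini-merge: all FUs of one BU.
bu1 = merge([fu1, fu2])
# Mega-merge: all BUs.
total = merge([bu1, {(2, "A"): 3}])
print(total[(1, "A")], total[(1, "DQM")], total[(2, "A")])  # 30 2 8
```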

Slide 18: F3 Monitor
Nice demo available at [...]
- List of recent runs
- Access old runs
- Active run
- Both boxes must be green
- Time chart of HLT activity
- Confused? Try the guide!
- Stream rates vs LS
- Stream names (click to hide them)
- Completeness of data
- Alert DAQ oncall when multiple boxes are not green (this situation is okay)

Slide 19: Storage Manager Page
Gives an overview of the data transfer to Tier 0 for recent runs
- Number of files, sizes and event rates per stream
- Totals per run
Check that files are injected, transferred and checked (in future also repacked & deleted)
- Suspicious values are color coded
- Make an elog entry and send an [...] to [...] in case of [...]
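The check on injected, transferred and checked files can be pictured as a comparison of per-stream counters. The sketch below assumes a finished run, where all three counts should agree; during a run, later stages naturally lag. The function name, the dictionary layout, and the numbers are hypothetical and do not reflect the Storage Manager page's actual data format.

```python
from typing import Dict, List

STAGES = ("injected", "transferred", "checked")


def streams_needing_attention(files: Dict[str, Dict[str, int]]) -> List[str]:
    """Return streams whose file counts differ between the three stages."""
    flagged = []
    for stream, counts in sorted(files.items()):
        per_stage = {counts.get(stage, 0) for stage in STAGES}
        if len(per_stage) != 1:
            flagged.append(stream)
    return flagged


# Made-up counters for two hypothetical streams.
run = {
    "A": {"injected": 10, "transferred": 10, "checked": 10},
    "DQM": {"injected": 4, "transferred": 3, "checked": 3},
}
print(streams_needing_attention(run))  # ['DQM']
```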

Slide 20: Central Partition Manager

Slide 21: TCDS
Combines the pre-LS1:
- Trigger Control System (TCS): the conductor of all CMS triggering and data-taking
- Trigger Timing and Control (TTC): the distributor of clock, L1As, and synchronisation signals
- Trigger Throttling System (TTS): the feedback of readiness states from FEDs to TCS
Many-legged creature:
- The ‘head’ is the Central Partition Manager (controlled by central DAQ)
- Many different legs (i.e., partitions) across the different subsystems (controlled by the subsystems)

Slide 22: TCDSCentral
tcds-control-central.cms:2000/urn:xdaq-application:lid=100

Slide 23: TCDSCentral
tcds-control-central.cms:2000/urn:xdaq-application:lid=100
TTC machine interface applications: provide the connection between the LHC RF and timing signals and CMS.

Slide 24: TCDSCentral
tcds-control-central.cms:2000/urn:xdaq-application:lid=100
Central Partition Manager (CPM): drives CMS. Controls triggers, calibration sequence, timing and synchronisation, …
This application should tell you what and how many triggers are flowing, or why not.

Slide 25: CPMController, Hardware status tab
tcds-control-central.cms:2050/urn:xdaq-application:lid=100
Running state shows if triggers are flowing or why not:
- Stopped
- Running
- Blocked by TTS
- Blocked by DAQ backpressure
- etc.

Slide 26: CPMController, TTS and trigger blockers tab
tcds-control-central.cms:2050/urn:xdaq-application:lid=100
Running state: Stopped, Running, Blocked by TTS, Blocked by DAQ backpressure, etc.
The tab shows what can/will block triggers.

Slide 27: CPMController, TTS and trigger blockers tab
tcds-control-central.cms:2050/urn:xdaq-application:lid=100
Running state: Stopped, Running, Blocked by TTS, Blocked by DAQ backpressure, etc.
This shows which partition is not TTS-READY.

Slide 28: CPMController, Rates and deadtimes tab
tcds-control-central.cms:2050/urn:xdaq-application:lid=100
This tab shows:
- What rate of triggers are flowing, per type
- What rate of triggers are being suppressed, per type
- What the deadtime is, per source
- How much time each partition spends in TTS not-READY (at the bottom)

Slide 29: CPMController
tcds-control-central.cms:2050/urn:xdaq-application:lid=100
- Add random triggers
- Input sources

Slide 30: HotSpot
- Make sure that it updates (pulsates)
- Check regularly for Errors or Fatal by clicking on the corresponding button

Slide 31: HotSpot
- Click on error
- Analyze the error and take appropriate action
- You can use HTML to copy it into the elog
- Acknowledge understood errors

Slide 32: Handsaw
Running in a terminal on the shifter console
- You need an account in the online cluster to start it
Scrolling display of error messages from DAQ
- All messages (and more) are in HotSpot or LVL0
- Handsaw is often quicker to find the most relevant message

Slide 33: What to do if it does not work
Don’t panic! Keep cool.
- Not always easy, especially during stable beams
- Think before clicking!
- GUIs are sometimes slow in reacting. Be patient…
Look for error messages (LVL0, HotSpot, Handsaw)
Look at DAQView for anything suspicious
- Figure out what subsystem is causing problems
- Be aware that one subsystem might get backpressure from DAQ due to other issues
Talk to the shift leader and other shifters
- They might be aware of problems affecting DAQ
- E.g. if a subsystem lost power, DAQ will go into error (you might be the first to realize it!)
If you are unsure or stuck, don’t hesitate to call the DAQ oncall anytime (76600)