ATLAS ONLINE MONITORING
FINISHED! Now what? How do we check the quality of the data? DATA FLOWS!
For slide 2: The detector is fully installed, with all components and all wires, and you want to turn it on... How do you know your detector is doing what you want it to do? Is it registering hits? Is it sending data out?
Online Monitoring: surveillance of the data and its quality. Why monitor? To be able to discover in a relatively easy way that the detector is not behaving as it should, and to take action. Examples: detector modules that do not work, synchronization problems between SCT and TRT, efficiencies that drop...
PURPOSE OF ONLINE MONITORING
Monitors the quality of the data. Provides checks for the shifter so that data quality problems can be fixed at an early stage. Provides automatic alarms/checks. Especially important in the beginning of an experiment for efficiently and quickly solving runtime problems. For example: if a certain detector segment has a large noise occupancy or is not producing data, the run control can reconfigure the module.
[Diagram] Histogram flow: tools retrieve data from the dataflow and book, fill and publish histograms to a histogram repository (histogram display). Monitoring displays such as OHP and DQM subscribe to the repository and receive histogram updates. ("I see tracks in the SCT!")
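The publish/subscribe scheme in the diagram can be sketched in a few lines. This is a minimal illustration of the pattern only; the class and histogram names below are hypothetical, and the real OH/OHP services are C++ applications in the TDAQ release.

```python
# Minimal sketch of the publish/subscribe histogram scheme (hypothetical names).
from collections import defaultdict

class HistogramRepository:
    """Central repository: providers publish, displays subscribe."""

    def __init__(self):
        self._histos = {}                       # name -> histogram object
        self._subscribers = defaultdict(list)   # name -> update callbacks

    def publish(self, name, histo):
        """Called by the monitoring tools running alongside the dataflow."""
        self._histos[name] = histo
        for callback in self._subscribers[name]:
            callback(name, histo)               # notify displays of the update

    def subscribe(self, name, callback):
        """Called by displays such as OHP or DQMF."""
        self._subscribers[name].append(callback)
        if name in self._histos:                # deliver the current state at once
            callback(name, self._histos[name])

# Usage: a display subscribes, a tool publishes an updated histogram.
repo = HistogramRepository()
repo.subscribe("SCT/hits_per_track", lambda n, h: print("update:", n, h))
repo.publish("SCT/hits_per_track", {"entries": 1234})
```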
For slide 6: Dataflow – Yuriy. Partition – a sub-detector included in TDAQ. Configuration files – a tool produces the xml file that is fed into TDAQ; then your detector can run. As data flows from the detector, specially written tools that work in parallel with the dataflow retrieve information, fill histograms and send them to a repository. Applications such as OHP and DQMF subscribe to this information and these histograms and display them, so that shifters can examine their quality. M5 experience; M6 was the first time the Inner Detector as a composite system was included in TDAQ and monitored.
[Diagram: Monitoring Scheme] Readout system (ROB, ROS, SFI, SFO, ...), LVL1, LVL2/EF, Tier 0, calibration farm. Gatherers and monitoring tasks feed the shifter displays (OHP), an intelligent monitoring display for experts, an archiver, and data quality assessment with alarms and notifications (DQMF). Databases involved: slow control DBs, variable reference DBs, monitoring DBs, data quality DBs.
Where and what is monitored
Detector monitoring:
- DCS: detector hardware status and conditions
- Online monitoring at ROD crate and ROS: data quality and integrity
- Event Builder: correlation between sub-detectors, consistency of LVL1 information
Trigger monitoring:
- LVL1, LVL2: sample rejected events to check the trigger decision
- Event Filter: information attached to a subset of accepted and rejected events
DAQ monitoring:
- ROS, EB: operational monitoring (buffer occupancies, throughput, s/w and h/w status, errors, etc.)
For slide 8: Grab data from all stages of the dataflow. Gather info from several detectors to fill the histograms. Send to databases. Display.
For slide 11: There are several ways to monitor. The ID uses the Athena Monitoring Tools: offline Athena tools are used online via the Athena Processing Task (AthenaPT), which connects the offline tools to the online detector data. Other services then provide the connection between the histogramming algorithm and the histogram repository – the histogram service.
Slides 9, 10, 11, 12 – maybe just use the simpler graphical slide 11? The detector is divided into sub-detectors; the Inner Detector into Pixel, SCT, TRT. Parts of the detector or the whole detector can run. Data is read out in a hierarchical manner: each sub-detector has its own Read Out Drivers, Read Out Systems receive the output from all the RODs, etc. The services providing online information and histograms get input from many stages of the dataflow, from the early ROD level to the SFI, where full events are available. This is a separate system parallel to the dataflow and does not interfere with it; it has its own processors and workstations to handle online monitoring.
Online Monitoring applications
- Online Histogram Presenter (OHP)
- Atlas Data Quality Framework (DQMF)
- Online Histogram Display (OH)
- Trigger Monitoring
Monitoring Displays: Online Histogram Presenter (OHP)
OHP displays already existing histograms. They are shown when the histograms are updated, i.e. when the detector is running and histograms are published. OHP subscribes to one or several histograms (or even all) and displays them online. If the detector is not running, a message is displayed saying that no histogram is available.
Histograms - online monitoring
Each detector or detector-subsystem expert has written tools for retrieving data from the detector and filling histograms. The Global Inner Detector uses 7 tools for Athena PT (responsible: Arshak Tonoyan & Heidi Sandaker):
- monitoring of LVL1, BCIDs
- matching of ID segments at the TRT/SCT and SCT/Pixel boundary surfaces
- ID noise occupancies
- monitoring of the number of hits and residuals on combined tracks
- matching parameters of top and bottom tracks, SCT & TRT
- extrapolation of SCT segments to the TRT: TRT straw efficiencies, residuals, number of TRT hits on the extrapolated track
- TRT straw efficiencies
This produces many histograms, available in the Online Histogram Display – only 10 are allowed per sub-detector in the Online Histogram Presenter.
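To illustrate one of these quantities: a straw efficiency is simply the fraction of straws that actually fired out of those a track was extrapolated through. A toy sketch with invented input data (the real tool fills ROOT histograms inside the Athena framework):

```python
# Sketch: per-straw efficiency from extrapolated tracks (illustrative only).
from collections import Counter

crossed = Counter()   # straw id -> times a track was extrapolated through it
fired   = Counter()   # straw id -> times the straw actually recorded a hit

# Hypothetical input: (straw_id, hit_found) pairs from extrapolated SCT segments.
measurements = [(101, True), (101, True), (101, False), (102, True)]

for straw, hit in measurements:
    crossed[straw] += 1
    if hit:
        fired[straw] += 1

for straw in sorted(crossed):
    eff = fired[straw] / crossed[straw]
    print(f"straw {straw}: efficiency = {eff:.2f}")   # e.g. straw 101: 0.67
```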
OHP
What will a shifter do?
Look at the histograms and check that they are as they should be: Are the histograms actually filling? Do they look like the reference histograms? If not, check with the relevant detector desk and report what you see. Is it OK? Maybe a module should be masked off, maybe a detector module needs to be reconfigured, or maybe the run even has to be stopped (seldom!).
OHP example: SCT hits for all tracks in the event. The number of hits on all tracks should be compared with a reference histogram.
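Such a "does it look like the reference?" check can also be automated. A minimal sketch using PyROOT's Kolmogorov test (this assumes a ROOT installation; the histogram contents and the 0.05 threshold are invented for illustration):

```python
# Sketch: compare an online histogram against a reference (requires PyROOT).
import ROOT

online    = ROOT.TH1F("online", "SCT hits per track", 20, 0, 20)
reference = ROOT.TH1F("reference", "SCT hits per track (ref)", 20, 0, 20)

# Hypothetical filling; in reality both come from the histogram service.
for _ in range(10000):
    reference.Fill(ROOT.gRandom.Gaus(8, 2))
for _ in range(1000):
    online.Fill(ROOT.gRandom.Gaus(8, 2))

if online.GetEntries() == 0:
    print("ALERT: histogram is not filling")
else:
    prob = online.KolmogorovTest(reference)   # compatibility probability
    if prob < 0.05:
        print(f"WARN: shape deviates from reference (p = {prob:.3f})")
    else:
        print(f"OK (p = {prob:.3f})")
```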
SCT + TRT example: the BCIDs of the SCT and TRT, per ROD ID. The value is 1 if everything is OK – then the SCT and TRT are reading the same event! A different BCID between SCT and TRT means something is wrong.
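The underlying check is simple: for each ROD, compare the bunch-crossing ID reported by the SCT with the one reported by the TRT for the same event. A plain-Python sketch, assuming the plotted value is a per-ROD match flag (1 = BCIDs agree); the ROD IDs and BCID values are invented:

```python
# Sketch: per-ROD BCID consistency between SCT and TRT (invented data).
sct_bcid = {0x210000: 1337, 0x210001: 1337, 0x210002: 1337}
trt_bcid = {0x310000: 1337, 0x310001: 1336}   # second TRT ROD is out of sync

def bcid_flags(sct, trt):
    """Return 1 per ROD if it reads the same bunch crossing as the reference."""
    ref = next(iter(sct.values()))            # take one SCT BCID as reference
    flags = {}
    for rod, bcid in {**sct, **trt}.items():
        flags[rod] = 1 if bcid == ref else 0
    return flags

for rod, ok in sorted(bcid_flags(sct_bcid, trt_bcid).items()):
    status = "OK" if ok else "MISMATCH - something is wrong"
    print(f"ROD 0x{rod:06x}: flag = {ok} ({status})")
```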
Online display: Data Quality Monitoring (DQM)
Automatic checks on data quality using predefined algorithms on histograms, counters, etc. Input comes from the Online Histogram Service; the checks can also be run offline using ROOT files. Flags or alarms are raised if values are outside the expected limits or if, e.g., histograms are empty. The states are green, yellow and red, per subsystem but also as an overall state.
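A predefined algorithm of this kind can be as simple as a threshold check that maps a value to a colour state and aggregates the worst state. A schematic sketch; the thresholds and segment names are invented, not actual DQM configuration:

```python
# Sketch: a DQM-style threshold check returning green/yellow/red
# (thresholds and inputs are invented for illustration).

def check_occupancy(noise_occupancy, warn=5e-4, error=5e-3):
    """Flag a detector segment by its noise occupancy."""
    if noise_occupancy is None:          # segment produced no data -> alarm
        return "red"
    if noise_occupancy > error:
        return "red"
    if noise_occupancy > warn:
        return "yellow"
    return "green"

segments = {"SCT_B0": 1e-4, "SCT_B1": 2e-3, "TRT_A": None}
states = {seg: check_occupancy(v) for seg, v in segments.items()}

# The overall state is the worst of the per-subsystem states.
order = {"green": 0, "yellow": 1, "red": 2}
overall = max(states.values(), key=order.get)
print(states, "-> overall:", overall)
```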
Milestone 6 (M6) Cosmic Run
Global Inner Detector monitoring was included for the first time, with the basic functionalities fully working. We looked at the online histograms, did offline work and reported the results. Work will continue to improve the histograms: what they display, how, and which ones to choose.
CONCLUSIONS
Monitoring is crucial for good data-taking. Several monitoring applications exist and are working. M6 was the first time the Global Inner Detector was fully tested. Doing shift-work really lets you learn a lot about ATLAS and the jungle of software that is out there.
MORE INFO, not included in talk
My contribution to the inner-detector monitoring
In addition to doing this work partially from Oslo and partially at CERN, I have taken online monitoring shifts. I looked at the online monitoring histograms and did some offline "analysis", looking into the number of tracks registered in the Inner Detector as a whole (only SCT and TRT were included in M6) and in the SCT and TRT separately, and reported this in a spontaneously organized track-number meeting. I reported the Inner Detector achievements for Heidi at a weekly SCT Commissioning meeting. A great chance to see how everything is working and to learn a lot about the detector and how such a huge experiment works.
My contribution to the inner-detector monitoring
Helped include the inner detector in the TDAQ system, to enable reading data from the inner detector. This is done by several tools whose output is xml files – these are the configuration files. The config files are fed into the TDAQ application: when the run controllers start taking data, our detector shows up in the application and sends data. Another task: including histograms from the inner detector in OHP. Yet another xml file had to be written correctly in order to subscribe to the histograms, which are already produced and filled when the detector turns on. For M6 all the basic functionalities were working for the first time!
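The general idea of such configuration tooling can be sketched as a small script that writes an xml file. The tag and attribute names below are purely hypothetical and do not reflect the real OHP/TDAQ schema:

```python
# Sketch: generating a subscription-style xml config (hypothetical schema;
# the real OHP/TDAQ configuration files have their own format).
import xml.etree.ElementTree as ET

root = ET.Element("subscriptions", partition="InnerDetector")
for name in ["SCT/hits_per_track", "TRT/straw_efficiency"]:
    ET.SubElement(root, "histogram", name=name)

ET.ElementTree(root).write("ohp_subscriptions.xml",
                           xml_declaration=True, encoding="utf-8")
```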
Atlas Online Monitoring System
Highly distributed: different processes run on different machines at once. Low level: hardware states, noisy modules, dead channels, electronic mappings, data quality, e.g. synchronization between different sub-detectors. High level (Athena): runs on full events and checks physics quantities (momentum, spin, etc.). Both produce histograms presented in ROOT. The TDAQ (Trigger and Data Acquisition) group provides useful services, e.g. the Online Histogram Service (OHS), which provides input to OHP (Online Histogram Presenter).
Monitoring TDR
Input is retrieved from: for the hardware, the electronics are monitored from the sub-detector front-ends by the Detector Control System (DCS); the DataFlow elements, from RODs, ROSs, SFIs and SFOs, are monitored at the operational level. Output is sent to the end-user – that means a shifter such as me or you – via histograms, flags, values, etc. The first, easy monitoring access point is the RODs: a Digital Signal Processor is installed directly on the ROD board, and the CPUs in the ROD crate send the histograms to the end-user.
Monitoring TDR
The ROS is the next level: larger regions can be monitored than from the RODs. When information from several detectors is needed, monitor after the Event Builder level; the SFI is the first place where fully built events are available.
Monitoring TDR: Dataflow
ROD crate DAQ: all software and hardware that configures, controls and monitors one or more ROD crate systems independently of the rest of the dataflow.
Online Monitoring with Athena
AthenaPT (PT: Processing Task) is a tool to integrate TDAQ HLT and Athena software components in one environment; it takes care of the interface between offline Athena and the online needs. Each sub-detector expert writes algorithms for filling histograms with the needed information, e.g. for the inner detector, filling histograms with the noise occupancy for the SCT versus the TRT. The AthenaMonitoring package framework takes care of producing histograms in Athena; the interface between the detector and the histogram repository is already taken care of. Running an Athena algorithm in PT can mean, for instance, online mode with a DAQ & OKS partition.
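Schematically, every such monitoring algorithm follows the same book/fill/publish cycle. Below is a plain-Python caricature of that control flow; it is not the AthenaMonitoring API (whose real tools are C++ classes), and all names and the fake events are invented:

```python
# Caricature of the book/fill/publish cycle of a monitoring algorithm
# (not the real AthenaMonitoring interface, just the control flow).

class SCTNoiseMonitor:
    def book(self):
        """Create the histograms once at the start of the run."""
        self.occupancy = {}                   # module id -> hit count

    def fill(self, event):
        """Called for every event sampled from the dataflow."""
        for module in event["sct_hits"]:
            self.occupancy[module] = self.occupancy.get(module, 0) + 1

    def publish(self, repository):
        """Hand the filled histograms to the histogram service."""
        repository["SCT/noise_occupancy"] = dict(self.occupancy)

mon, repo = SCTNoiseMonitor(), {}
mon.book()
for event in [{"sct_hits": [1, 2]}, {"sct_hits": [2]}]:   # fake events
    mon.fill(event)
mon.publish(repo)
print(repo)
```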
Monitoring TDR: Online Software
Used to configure, control and monitor the TDAQ system; responsible for synchronizing the involved sub-systems during start-up and shut-down. The Online Software distinguishes various types of users:
- TDAQ Operator: runs the TDAQ system in the operating room during the data-taking period
- TDAQ Expert: can perform changes to the configuration, has system-internal knowledge
- Sub-system expert / Detector expert: responsible for the operation of a particular sub-system or detector
[Diagram: Information Service – Deployment] IS Servers run on the DAQ workstations. DAQ applications in the ROD Crate DAQ, ROS and EB insert, update and remove information; clients such as OHP and DQMF subscribe and are notified of updates, or fetch values with get_value. Stolen from: Serguei Kolos, CERN/PNPI, ATLAS Trigger/DAQ Online Software group.
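The operations named in the diagram (insert, update, remove, subscribe/notify, get_value) map onto a simple key/value server with callbacks. A toy sketch of that idea only, not the real IS API (the real IS is part of the TDAQ online software, and the variable name below is invented):

```python
# Toy sketch of the Information Service operations shown above.

class InfoService:
    def __init__(self):
        self._values, self._subs = {}, {}

    def insert(self, name, value):
        self._values[name] = value
        self._notify(name)

    def update(self, name, value):
        self._values[name] = value
        self._notify(name)

    def remove(self, name):
        self._values.pop(name, None)

    def get_value(self, name):
        return self._values[name]

    def subscribe(self, name, callback):
        self._subs.setdefault(name, []).append(callback)

    def _notify(self, name):
        for cb in self._subs.get(name, []):
            cb(name, self._values[name])      # push the update to subscribers

# A display subscribes; a DAQ application inserts and updates a value.
server = InfoService()
server.subscribe("ROS/buffer_occupancy", lambda n, v: print(n, "=", v))
server.insert("ROS/buffer_occupancy", 0.42)
server.update("ROS/buffer_occupancy", 0.57)
```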
[Diagram: HLT/DAQ/DCS system] Dataflow from the detectors (Inner Detector: Pixel, SCT, TRT; Calorimeter: TileCal, LAr; Muon Spectrometer: MDT, CSC, RPC, TGC) through the RODs, Read Out Systems (ROSs), LVL2 Trigger, Event Builder (EB) and Event Filter (EF) to data storage, alongside the Detector Control System (DCS). The Online Software configures, controls and monitors. Stolen from: Serguei Kolos, CERN/PNPI, ATLAS Trigger/DAQ Online Software group.
[Figure: Inner Detector layout – Pixel detectors, barrel SCT, forward SCT and TRT; roughly 6 m long and 2 m in diameter]
Inner Detector
- Silicon Pixel detector (Pixel): cabling is finishing these days
- SemiConductor Tracker (SCT): barrel and endcaps fully installed ~half a year ago
- Transition Radiation Tracker (TRT): fully installed ~a year ago
The SCT and TRT were tested together on the surface about a year ago; the first cosmic tests with the SCT were last fall.
ATLAS Detector Layout
Each partition may be operated independently, and some partitions may be operated in parallel. ~1000 Read-Out Drivers (RODs) in ~100 VME crates; 33 sub-detector partitions (Inner Detector: Pixel, SCT, TRT; Calorimeter: TileCal, LAr; Muon Spectrometer: MDT, CSC, RPC, TGC). Stolen from: Serguei Kolos, CERN/PNPI, ATLAS Trigger/DAQ Online Software group.
Monitoring Services – responsible for routing information and histograms
- IS (Information Service): its main task is to transport monitoring data requests from the monitoring destinations to the monitoring sources, and to transport the monitoring data back from the sources to the destinations
- Histogramming Service: a specialization of the IS used to transport histograms; allows different applications to exchange histograms; has a user interface via an application called the Histogram Display
- Error Reporting Service
- Event Monitoring Service: transports physics events or fractions of events (event sampler)
ACRONYMS
TDAQ: Trigger and Data Acquisition System
HLT: High Level Trigger
EF: Event Filter
DF: Data Flow software
EDF: Event Data Flow
PT: Processing Task
ROD: Read Out Driver
ROS: Read Out Subsystem
RODC: Read Out Driver Crate
ROC: Read Out Crate
EMON: Atlas Event Monitoring – sampler system
ACRONYMS (cont.)
Atlantis: Atlas Standalone Event Display
Athena: Atlas offline software framework
DQM: Data Quality Monitoring
DQMF: Atlas Data Quality Framework
OH: Online Histogramming service
IS: Information Service
DCS: Detector Control System
SFI: SubFarm Input
SFO: SubFarm Output
DSP: Digital Signal Processor
OKS: Object Kernel Support – a library supporting a simple in-memory object manager, suitable as a real-time object manager for e.g. data acquisition