DAQ
Andrea Petrucci, 6 May 2008 – CMS-UCSD meeting
OUTLINE: Introduction, SCX Setup, Run Control, Current Status of the Tests, Summary


Introduction
– Started commissioning the Readout Builder at its full size; many people are working together to get this done.
– For the first time we have almost two full DAQ Slices to test. For now, tests are limited to two slices of ~640 PCs (rows A, B, E and F).
– We are still gaining experience with the installation and maintenance of a cluster of O(1000) PCs.
– Also for the XDAQ software and Run Control, it is the first time we work with ~1000 PCs communicating with each other.

SCX layout
RU 320 PCs:
– Row A with 2 rails (ru-c2a[1-4]-[1-20])
– Row B with 2 rails (ru-c2b[1-4]-[1-20])
– Row E with 2 rails (ru-c2e[1-4]-[1-20])
– Row F with 4 rails (ru-c2f[1-4]-[1-20])
BU-FU 320 PCs:
– Row A with 2 rails (ru-c2a[5-8]-[1-20])
– Row B with 2 rails (ru-c2b[5-8]-[1-20])
– Row E with 2 rails (ru-c2e[5-8]-[1-20])
– Row F with 2 rails (ru-c2f[5-8]-[1-20])
Rows A and B are connected to one Force10 switch, and rows E and F to the other. (Slide diagram: rows F, E, B, A and the 2 Force10 switches.)
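The bracket notation above is a shorthand for hostname ranges: ru-c2a[1-4]-[1-20] stands for ru-c2a1-1 through ru-c2a4-20, i.e. 4 racks x 20 nodes = 80 hosts per row, which matches the 320 PCs over 4 rows. As a minimal sketch of how such a pattern expands (this helper is illustrative only, not a CMS tool):

```java
import java.util.ArrayList;
import java.util.List;

public class HostExpander {
    // Expands a pattern like "ru-c2a[1-4]-[1-20]" into the full host list.
    public static List<String> expand(String prefix, int rackLo, int rackHi,
                                      int nodeLo, int nodeHi) {
        List<String> hosts = new ArrayList<>();
        for (int rack = rackLo; rack <= rackHi; rack++)
            for (int node = nodeLo; node <= nodeHi; node++)
                hosts.add(prefix + rack + "-" + node);
        return hosts;
    }

    public static void main(String[] args) {
        List<String> ruRowA = expand("ru-c2a", 1, 4, 1, 20);
        // prints: 80 hosts, first: ru-c2a1-1
        System.out.println(ruRowA.size() + " hosts, first: " + ruRowA.get(0));
    }
}
```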

SCX Setup
Used DummyRUs and ~200 FRLs (Tracker) for testing. Different types of trapezoidal configurations:
– 1 slice with 4 rails (68 DummyRUs x 224 BUs)
– 1 slice with 4 rails (200 FRLs x 68 RUs x 24 BUs x 672 FUs), close to the final slice
– 4 slices with 2 rails (per slice: 32 RUs x 47 BUs x 147 FUs)
– 8 slices with 2 rails (per slice: 32 RUs x 47 BUs x 147 FUs)
A lot of different activities are going on in parallel:
– System and software installation/update
– System monitoring optimization
– …
Testing the first slice:
– The XDAQ installation is XDAQ build 6
– The monitoring system (slp, sentinel, …) is enabled
During the last months the system was down many times, and it takes some time to set up.

RU Builder Slices (diagram slide)

DAQ Software Installation
All the DAQ software installation is managed by a central Quattor server. Quattor is a system administration toolkit providing a powerful, portable and modular tool suite for the automated installation, configuration and management of clusters and farms running Linux. Quattor allows a PC to be re-installed in a few minutes.
There are different Quattor templates for each type of PC:
– RU and BUFU PCs
– Run Control PCs
– FRL and FMM PCs
– etc.
All the DAQ software developers have put a lot of effort into Quattorizing their software (RPMs).

DAQ Configurator
A DAQ Configuration contains:
– One XML configuration file per XDAQ executive
  – including the Myrinet FED-Builder configuration
  – including O(100000) I2O connections
  – up to several 100 MB of XML
– A control structure
  – hierarchy of function managers
  – executives and applications to be controlled
Central DAQ System: currently O(1000) hosts (~10% controlling custom hardware) and O(10000) XDAQ applications. (Slide diagram: from the electronics channels at 40 MHz down to 100 Hz.)
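To see where the O(100000) I2O connections come from: a full event builder meshes every RU with every BU, so the connection count grows as RUs x BUs per slice. A back-of-the-envelope check, using the final-slice figures quoted later in the talk (the arithmetic sketch is illustrative only):

```java
public class ConnectionCount {
    public static void main(String[] args) {
        // Final-slice figures from the talk: 72 RUs x 288 BUs per slice, 8 slices.
        int rusPerSlice = 72, busPerSlice = 288, slices = 8;
        // Each RU-BU pair needs an I2O peer connection in a full mesh.
        int perSlice = rusPerSlice * busPerSlice;          // 20736
        System.out.println("per slice: " + perSlice
                + ", all slices: " + perSlice * slices);   // 165888 ~ O(100000)
    }
}
```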

DAQ Configurator Data Flow
(Slide diagram: the HWCfg Database with EQSet/FBSet/DPSet, the Hardware Configuration API, the Software Template API and DB, the RS3/RS API, the CMS DAQ Configurator with its SWTemplate GUI, Configurator GUI and Configurator API, and the JAVA Fillers. Steps: fill the DB with the JAVA Fillers; create FEDBuilderSets & DAQPartitionSets; manage/create Software Templates; select a DAQPartition (Hardware Structure) & Software Template; load the configuration and configure the system.)

Run Control and Monitor System
RCMS is integrated in the general CMS DAQ system, providing control and monitoring of the two other components:
– the DAQ components that manage the main data flow. They include the Front End Drivers (FED), the Readout Units (RU), the Builder Units (BU), the Filter Units (FU), the trigger, and the data flow control system;
– the Detector Control System (DCS), managing the slow controls of the whole experiment.
The XML data format and the W3C standard SOAP protocol have been adopted as the main means of communication.
XDAQ is a C++ framework for a distributed Data Acquisition System; it implements:
– configuration (parameterization)
– communication over multiple network technologies concurrently
– high-level provision of system services (memory management, tasks, ...)
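Since control messages are plain SOAP over HTTP, a command to a XDAQ application can be sketched with the standard Java SAAJ API. A minimal sketch, assuming an executive listening on an HTTP port and the urn:xdaq-soap:3.0 command namespace; the host, port and command name here are illustrative, not taken from the talk:

```java
import javax.xml.namespace.QName;
import javax.xml.soap.*;
import java.net.URL;

public class XdaqSoapSketch {
    public static void main(String[] args) throws Exception {
        // Build an empty SOAP envelope and add a command element in the
        // (assumed) XDAQ command namespace.
        SOAPMessage msg = MessageFactory.newInstance().createMessage();
        SOAPBody body = msg.getSOAPPart().getEnvelope().getBody();
        body.addBodyElement(new QName("urn:xdaq-soap:3.0", "Configure", "xdaq"));
        msg.saveChanges();

        // Post it to the executive's HTTP endpoint (host/port are examples).
        SOAPConnection conn = SOAPConnectionFactory.newInstance().createConnection();
        SOAPMessage reply = conn.call(msg, new URL("http://ru-c2a1-1:1972"));
        reply.writeTo(System.out);  // print the application's SOAP response
        conn.close();
    }
}
```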

RCMS Services
– SECURITY SERVICE: login and user account management
– RESOURCE SERVICE (RS): information about DAQ resources and partitions
– INFORMATION AND MONITOR SERVICE (IMS): collects messages and monitor data and distributes them to the subscribers
– JOB CONTROL: starts, monitors and stops the software elements of RCMS, including the DAQ components

Logging System
The Log Collector collects log information from log4j-compliant applications (i.e. the on-line processes): RCMS applications and XDAQ applications send log information to it via TCP. It can:
– send log information directly to a Display System (Chainsaw) through a publish/subscribe system;
– store log information in a relational database (Oracle or MySQL, accessed via JDBC) and visualize it with the LogDBViewer.
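On the client side, shipping logs to such a collector needs nothing beyond standard log4j 1.x: a SocketAppender pointed at the collector host. A minimal sketch; the host name and port below are made-up examples, not the actual CMS values:

```java
import org.apache.log4j.Logger;
import org.apache.log4j.net.SocketAppender;

public class LogShippingSketch {
    public static void main(String[] args) {
        // Ship every log event of this JVM to the central Log Collector
        // over TCP; "logcollector.cms" and port 4445 are illustrative.
        Logger root = Logger.getRootLogger();
        root.addAppender(new SocketAppender("logcollector.cms", 4445));

        Logger log = Logger.getLogger(LogShippingSketch.class);
        log.info("RU application configured");  // serialized LoggingEvent on the wire
    }
}
```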

Control Structure
– User interaction is via a Web Browser (GUI) connected to the Level 0 FM.
– The Level 0 FM is the entry point to the Run Control System.
– Level 1 FMs interface to the Level 0 FM and have to implement a standard set of inputs and states.
– Level 2 FMs are sub-system specific custom implementations.
– Resources are on-line system components.
(Slide diagram: a tree of Function Managers: TOP; LTC, CSC, DAQ, RPC, DT, TRK, ECAL, HCAL; FB, RB, FF; with FEC and FED resources at the bottom.)
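The "standard set of inputs and states" can be pictured as a small finite state machine driven by the Level 0 commands used later in the tests (Create, Initialize, Configure, Start, ...). A minimal sketch with a simplified state set; the real RCMS framework defines its own classes, so all names below are illustrative:

```java
public class FunctionManagerSketch {
    enum State { CREATED, INITIALIZED, CONFIGURED, RUNNING, ERROR }

    private State state = State.CREATED;

    // Each input command is only legal from one current state.
    public synchronized void handleInput(String command) {
        switch (command) {
            case "Initialize": transition(State.CREATED, State.INITIALIZED); break;
            case "Configure":  transition(State.INITIALIZED, State.CONFIGURED); break;
            case "Start":      transition(State.CONFIGURED, State.RUNNING); break;
            case "Stop":       transition(State.RUNNING, State.CONFIGURED); break;
            default:           state = State.ERROR;
        }
    }

    private void transition(State required, State next) {
        if (state != required) { state = State.ERROR; return; }
        state = next;
        // A real FM would now forward the command to its child FMs /
        // XDAQ applications and report the new state to the Level 0 FM.
        System.out.println("-> " + state);
    }

    public static void main(String[] args) {
        FunctionManagerSketch fm = new FunctionManagerSketch();
        fm.handleInput("Initialize");
        fm.handleInput("Configure");
        fm.handleInput("Start");
    }
}
```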

Run Control GUIs
1) RCMS GUI
2) Function Manager Level Zero GUI
3) FED and TTS GUI

Tests & Measurements DAQ System 06-May-2007 – CMS-UCSD Meeting Andrea Petrucci - UC San Diego14 GOALS Understand problems to run big DAQ system: Reliability, scalability and monitoring system. Measurements: Comprehend if the performances of the system are acceptable. TESTED CONFIGURATIONS Different configurations have been tested: A.68 dummy RUs x 224 BUs 4 rail from the RUs and 2 rail to the Bus. B.68 dummy RUs x 224 Bus x 672 FUs 4 rail from the RUs and 2 rail to the Bus. C.8 Slices with GTPe and ~200 FRLs, per slice: 32 RUs x 47 BUs x 147 FUs (CMSSW locally). D.4 Slices with GTPe and ~100 FRLs, per slice: 32 RUs x 47 BUs x 147 FUs (CMSSW NFS). The test B should perform almost the same as the final slice configuration (72 RU x 288 Bus x 864 FUs) Create, Initialize, Connect, Configure, Get Ready, Start, Stop, Destroy For these tests I create a Java stand-alone application. It controls the Level Zero FM over the following commands:

Test A: Only EVB
Setup parameters:
– Dummy events are created in the BUs in generation mode.
– 1 slice with 1x1 FED Builders; events are dropped at the BUs.
– 68 dummy RUs x 224 BUs, 4 rails from the RUs and 2 rails to the BUs.
– Used rows E and F (~320 PCs).
– Controlled 293 XDAQ executives and 585 XDAQ applications (ATCPs, EVM, RUs and BUs).
– XDAQ Monitor Application enabled.
– 50 iterations of the measurement loop (Create, Initialize, Connect, Configure, Get Ready, Start, Stop and Destroy).
Results (times in seconds):
           Create  Initialize  Connect  Configure  Get Ready   Start    Stop  Destroy
MAX         2.546      43.771   15.465      3.368      1.321  27.506   4.024    9.317
MIN         0.614      19.949    6.852      1.777      0.797  15.982   1.463    5.779
AVERAGE     1.419      26.361    8.153      1.978      0.991  17.168   2.108    6.842
AVEDEV      0.419       2.781    0.960      0.156      0.098   0.837   0.375    0.526
N. FAILED       …           …        …          …          …       …       …        …
– RU throughput at 16-32 kByte fragment size: ~480 MB/s.

Test B: EVB & Filter Farms
Setup parameters:
– Dummy events are created in the BUs in generation mode.
– 1 slice with 1x1 FED Builders; events are dropped at the FUs.
– 68 dummy RUs x 224 BUs, 4 rails from the RUs and 2 rails to the BUs.
– 3 FUs per BU and 1 Storage Manager.
– Used rows E and F (~320 PCs).
– Controlled 965 XDAQ executives and 1539 XDAQ applications (ATCPs, EVM, RUs, BUs, FUResourceBrokers and FUEventProcessors).
– All libraries were loaded from local disk.
– XDAQ Monitor Application enabled.
– 100 iterations of the measurement loop (Create, Initialize, Connect, Configure, Get Ready, Start and Destroy).
Results (times in seconds):
           Create  Initialize  Connect  Configure  Get Ready  Start  Destroy
MAX        48.658     132.875   17.107     17.089      2.557  Error   31.220
MIN         2.404      55.062    7.330     13.157      0.881  Error   21.949
AVERAGE     4.487      62.195   11.399     14.020      1.072  Error   24.675
AVEDEV      2.011       5.947    2.436      0.758      0.072  Error    1.452
N. FAILED       …           …        …          …          …      …        …
– Could not reach the running state because the Filter Farm applications crashed.

Test C: All System with 8 Slices
Setup parameters:
– Events are generated in ~200 FRLs; the GTPe was used.
– 8 slices with 8x8 FED Builders; events are sent to the Storage Manager.
– 2 rails from the RUs and to the BUs.
– Per slice: 32 RUs x 47 BUs x 147 FUs.
– Used rows A, B, E and F (~640 PCs) for Event Builder and Filter Farm.
– Controlled 1976 XDAQ executives and 3202 XDAQ applications (ATCPs, FRLs, EVM, RUs, BUs, FUResourceBrokers, FUEventProcessors and Storage Managers).
– XDAQ Monitor Application enabled; all libraries were loaded from local disk.
– 83 iterations of the measurement loop (Create, Initialize, Connect, Configure, Get Ready, Start, Stop and Destroy).
Results (times in seconds):
           Create  Initialize  Connect  Configure  Get Ready   Start    Stop  Destroy
MAX         1.440      91.498   11.795     62.797      1.502  37.201  90.906   40.907
MIN         0.403      64.609    7.669     38.936      1.155       …       …        …
AVERAGE     0.475      71.340    8.589     42.028      1.235  31.547       …        …
AVEDEV      0.070       3.360    1.041      1.591          …       …       …        …
N. FAILED       …           …        …          …          …       …       …        …
– 240 MB/s throughput all the way to the Storage Manager disk (event size 480 kB).

Test D: All System with 4 Slices
Setup parameters:
– Events are generated in ~100 FRLs; the GTPe was used.
– 4 slices with 4x4 FED Builders; events are sent to the Storage Manager.
– 2 rails from the RUs and to the BUs.
– Per slice: 32 RUs x 47 BUs x 147 FUs.
– Used rows E and F (~320 PCs) for Event Builder and Filter Farm.
– Controlled 988 XDAQ executives and 1601 XDAQ applications (ATCPs, FRLs, EVM, RUs, BUs, FUResourceBrokers, FUEventProcessors and Storage Managers).
– XDAQ Monitor Application enabled; Filter Farm libraries were loaded from NFS.
– 100 iterations of the measurement loop (Create, Initialize, Connect, Configure, Get Ready, Start, Stop and Destroy).
Results (times in seconds):
           Create  Initialize  Connect  Configure  Get Ready   Start    Stop  Destroy
MAX         5.668     174.592   11.203     33.958      3.176  28.234  41.345   61.813
MIN         0.923      86.504    7.935     30.027      1.198  25.113  37.398   38.825
AVERAGE     1.463     105.880    9.596     31.030      1.415  25.928  38.423   43.572
AVEDEV      0.384      18.896    0.566      0.765      0.151   0.620   0.641    3.425
N. FAILED       …           …        …          …          …       …       …        …
– The system gets slower and less reliable when libraries are loaded from NFS.

Tests Summary
Average times in seconds:
                              Total  Create  Initialize  Connect  Configure  Get Ready   Start
A  Only EVB (~320 PCs)       55.079   1.419      26.361    8.153      1.978      0.991  17.168
B  EVB+FF (~320 PCs)              -   4.487      62.195   11.399     14.020      1.072       -
C  8 slices (~640 PCs)      154.739   0.475      71.340    8.589     42.028      1.235  31.547
D  4 slices NFS (~320 PCs)  175.312   1.463     105.880    9.596     31.030      1.415  25.928
Performance:
– Configuration B (close to the final slice): reasonable time to initialize, connect and configure.
– Configuration C: the system scales well.
– Configuration D: the system loses performance if it loads libraries from an NFS disk (~2 times slower).

Problems during the Tests
Problems observed during the tests:
– ~15% of the time the system failed to initialize: the XDAQ executive could not start because the HTTP address was already in use. The ATCP application had the same problem. FIXED: it was enough to set the XDAQ HTTP port outside the UNIX ephemeral port range.
– The system could not reach the running state because of a segmentation fault in the communication between the BU and the FUResourceBroker. FIXED: a bug was found and is fixed with a CMSSW update.
– The system gets stuck in the configuring state ~5% of the time. It is reproducible only with a big system (8 slices, all rows A, B, E and F). Work in progress: the problem seems to be in the Run Control framework.
– The system fails to start (~5% of the time) and to stop (~40% of the time). Work in progress: the DAQ function managers need to be improved.
– The XDAQ monitor system has a latency of 2 to 3 minutes. Work in progress: the XDAQ developers are working to improve it.
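The "address already in use" failures are a classic collision: if a daemon's fixed listening port lies inside the kernel's ephemeral range, any outgoing connection from the same host may grab that port first. On Linux the range can be read from /proc and the chosen port validated against it, as in this sketch; the file path is the standard Linux one, while the port value is just an example:

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class EphemeralPortCheck {
    public static void main(String[] args) throws Exception {
        // Linux exposes the ephemeral range here, e.g. "32768\t61000".
        String[] range = Files.readAllLines(
                Paths.get("/proc/sys/net/ipv4/ip_local_port_range"))
                .get(0).trim().split("\\s+");
        int lo = Integer.parseInt(range[0]), hi = Integer.parseInt(range[1]);

        int xdaqHttpPort = 1972;  // example value for the executive's fixed port
        if (xdaqHttpPort >= lo && xdaqHttpPort <= hi)
            System.out.println("WARNING: port inside ephemeral range " + lo + "-" + hi
                    + "; outgoing connections may steal it (bind: address already in use)");
        else
            System.out.println("OK: port " + xdaqHttpPort + " is outside " + lo + "-" + hi);
    }
}
```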

ATCP Application
– Reasonable time to connect all the sockets (max. 15 sec for 1 slice).
– Solved the "address already in use" problem when starting the listening socket.
– Created a new HyperDAQ interface:
  – added "Standard configuration" parameters
  – added a "debug" page
  – integrated into the XDAQ monitor system.

Summary
RU Builder commissioning:
– For the first time we used a RU Builder configuration almost the same as the final slice. It seems to work fine at 20 kHz per slice, with a maximum throughput on the RUs of ~480 MB/s.
– FUs and monitor system applications are included.
– Reasonable time to initialize and start the system.
– Some things are not yet understood (e.g. failures to start and stop).
Main worries are system instabilities:
– Cooling and its monitoring
– Power cuts
– Quattor installation
– System configuration
– Difficulties issuing commands on many PCs at the same time