CHEP 2012 – New York City 1

• The LHC delivers bunch crossings at 40 MHz
• LHCb reduces the rate with a two-level trigger system:
  ◦ First Level (L0) – hardware based – 40 MHz → 1 MHz
  ◦ Second Level (HLT) – software based – 1 MHz → 5 kHz
• The HLT farm: ~1500 multi-core Linux PCs
• Outside data-taking periods:
  ◦ Little or no usage of the HLT computer system → low efficiency
2

[Slide 3 – DAQ overview diagram: average event size 50 kB; average rate into the farm 1 MHz (≈50 GB/s); average rate to tape 5 kHz (≈250 MB/s); the farm is organized in subfarms.]

• DIRAC System (Distributed Infrastructure with Remote Agent Control)
  ◦ Specialized system for data production, reconstruction and analysis
  ◦ Developed by HEP experiments
  ◦ Follows a Service Oriented Architecture
• Four categories of components:
  ◦ Resources – provide access to computing and storage facilities
  ◦ Agents – independent processes that fulfill one or several system functions
  ◦ Services – help to carry out workload and data management tasks
  ◦ Interfaces – programming interfaces (APIs)
4

• DIRAC Agents:
  ◦ Light and easy-to-deploy software components
  ◦ Run in different environments
  ◦ Watch for changes in the services and react (see the sketch after this slide), e.g.:
    ▪ Job submission
    ▪ Result retrieval
  ◦ Can run as part of a job executed on a Worker Node – then called "Pilot Agents"
5
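
As an illustration of this agent pattern, a minimal polling loop could look like the sketch below (Python; the service handle and its methods such as fetch_pending_tasks are hypothetical placeholders, not the actual DIRAC API):

import time

# Minimal sketch of a DIRAC-style agent: an independent process that
# periodically polls a service and reacts to what it finds. The service
# object and its methods are hypothetical placeholders, not DIRAC calls.
class ToyAgent:
    def __init__(self, service, poll_interval=30):
        self.service = service              # handle to some remote service
        self.poll_interval = poll_interval  # seconds between cycles

    def cycle(self):
        """One agent cycle: look for changes in the service and react."""
        for task in self.service.fetch_pending_tasks():
            self.service.submit_job(task)          # e.g. job submission
        for job in self.service.fetch_finished_jobs():
            self.service.retrieve_result(job)      # e.g. result retrieval

    def run(self):
        while True:                                # agents run continuously
            self.cycle()
            time.sleep(self.poll_interval)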

• Workload Management System (WMS)
  ◦ Supports the data production
  ◦ Maintains the Task Queue
• Pilot Agents
  ◦ Deployed close to the computing resources
  ◦ Present the resources in a uniform way to the WMS
  ◦ Check the sanity of the operational environment
  ◦ Pull the workload from the Task Queue
  ◦ Process the data
  ◦ Upload the data
6

• Requirements
  ◦ Allocate HLT subfarms to process data
    ▪ Interface to the Experiment Control System
    ▪ Must not interfere with normal data acquisition
  ◦ Manage the start/stop of the offline data processing
  ◦ Balance the workload on the allocated nodes
  ◦ Easy-to-use User Interface
7

• Infrastructure
  ◦ Composed of a Linux PC running PVSS (the control PC)
• LHCb has a private network
  ◦ The HLT Worker Nodes have private addresses
  ◦ The HLT Worker Nodes are not accessible from outside LHCb
  ◦ Masquerading NAT is deployed on the nodes that have access to the LHC Computing Grid (LCG) network
  ◦ Through the masquerading NAT, the HLT nodes are able to reach DIRAC and its data
8

[Slide 9 – Network layout diagram: the Control PC and the HLT computer farm sit on the LHCb private network; NAT masquerading connects them to the LCG network, where the DIRAC servers and the CERN storage (CASTOR) are reachable.]

• PVSS
  ◦ Main SCADA system at CERN and in LHCb
  ◦ Provides the UI
  ◦ Provides the global interface to the communication layer
• Communication Layer
  ◦ Provided by DIM (Distributed Information Management)
    ▪ Based on the publish/subscribe paradigm (illustrated in the sketch below)
    ▪ Servers publish commands and services
    ▪ Clients subscribe to them
10
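
The publish/subscribe idea can be illustrated with a small self-contained toy (Python). This is not the DIM API – in DIM, servers register services and commands with a name server and clients subscribe through the DIM library – but the pattern is the same:

from collections import defaultdict

# Toy illustration of the publish/subscribe pattern used by DIM.
# A plain dictionary stands in for the DIM name server.
class ToyBroker:
    def __init__(self):
        self._subscribers = defaultdict(list)   # service name -> callbacks
        self._values = {}                        # last published value per service

    def publish(self, service, value):
        """Server side: publish a new value for a named service."""
        self._values[service] = value
        for callback in self._subscribers[service]:
            callback(value)                      # push the update to clients

    def subscribe(self, service, callback):
        """Client side: subscribe to a named service."""
        self._subscribers[service].append(callback)
        if service in self._values:              # deliver the current value
            callback(self._values[service])

# Usage: a "server" publishes the state of a farm node; a "client"
# (e.g. the PVSS layer) reacts to every update.
broker = ToyBroker()
broker.subscribe("HLTA01/Node01/state", lambda v: print("state is now", v))
broker.publish("HLTA01/Node01/state", "RUNNING")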

• Farm Monitoring and Control (FMC)
  ◦ Tools to monitor and manage several parameters of the Worker Nodes
  ◦ Task Manager Server
    ▪ Runs on each of the HLT nodes
    ▪ Publishes commands and services via DIM
    ▪ Starts/stops processes on the nodes remotely
    ▪ Assigns to each started process a unique identifier (UTGID) – see the sketch below
11
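
As a sketch, the 'start process' operation on a node amounts to launching a command and tagging it with its UTGID. The helper below is illustrative, not the FMC implementation, and the assumption that the UTGID is handed to the process through an environment variable is made only for this example:

import os
import subprocess

def start_task(utgid, command, args=()):
    """Launch a command on this node, tagged with a unique identifier.

    Illustrative stand-in for the task manager's 'start process' call;
    exposing the UTGID via the environment is an assumption of this sketch.
    """
    env = dict(os.environ, UTGID=utgid)
    return subprocess.Popen([command, *args], env=env)

# Example (hypothetical command path):
# proc = start_task("HLTA01_Node01_DiracAgent_01", "/path/to/runDiracAgent.sh")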

• DIRAC Script (sketched after this slide)
  ◦ Launches a Pilot Agent, which:
    ▪ Queries the DIRAC WMS for tasks
    ▪ Downloads data to the local disk and processes it locally
    ▪ Uploads the output to CERN storage (CASTOR)
    ▪ Sends monitoring information to DIRAC during execution
12
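
Reduced to a sketch, the cycle executed on each worker node looks roughly like the Python below; every object and method name is a hypothetical placeholder for the corresponding DIRAC or storage operation:

# Sketch of the per-node processing cycle described above. The wms,
# storage and task objects and their methods are placeholders; error
# handling is omitted for brevity.
def run_pilot_cycle(wms, storage, workdir):
    task = wms.request_task()                  # query the DIRAC WMS for work
    if task is None:
        return False                           # nothing to do right now
    local_files = storage.download(task.input_data, workdir)    # to local disk
    output = task.process(local_files)         # run the processing locally
    storage.upload(output, task.output_destination)             # e.g. CASTOR
    wms.report_status(task, "Done")            # keep DIRAC informed
    return True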

• PVSS Control Manager
  ◦ Connects/disconnects the monitoring of the worker nodes
  ◦ Manages the startup of the DIRAC Agents
  ◦ Balances the load on the allocated farms (see the sketch below)
  ◦ Monitors the connection of the required DIM services
13
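
The load balancing itself is done by the PVSS control manager; purely as a sketch of the idea, a 'least-loaded node first' rule could look like this (Python; the node names and capacities in the example are invented):

def pick_node_for_next_agent(running_agents, capacity):
    """Choose the allocated node on which to start the next DIRAC agent.

    running_agents: dict node -> number of agents currently running
    capacity:       dict node -> maximum number of agents allowed on it
                    (e.g. derived from the node's CPU type, see slide 17)

    Returns the least-loaded node that still has free capacity, or None
    if all allocated nodes are full. Illustrative sketch only.
    """
    candidates = [n for n in running_agents if running_agents[n] < capacity[n]]
    if not candidates:
        return None
    return min(candidates, key=lambda n: running_agents[n] / capacity[n])

# Example with two invented nodes of different size:
# pick_node_for_next_agent({"node01": 3, "node02": 1}, {"node01": 8, "node02": 4})
# -> "node02"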

• Finite State Machine (FSM)
  ◦ The control system is operated through an FSM
  ◦ Interfaced to PVSS
  ◦ Composed of:
    ▪ Device Units (DUs) – model the real devices in the control tree
    ▪ Control Units (CUs) – group DUs and other CUs into logically useful segments
14

• FSM Tree
[Slide 15 – FSM tree diagram: the ONLDIRAC Control Unit is at the top; below it are Control Units for the farm rows (HLTA … HLTF), each containing the subfarm Control Units (HLTA01 … HLTA11, …, HLTF01 … HLTF7); every subfarm contains a Farm Allocator and Device Units for Farm Node 01 … Farm Node NN. Commands propagate down the tree and states are reported back up.]
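
A minimal, self-contained sketch of the Control Unit / Device Unit idea (Python, not the PVSS/SMI++ FSM framework): commands are forwarded down the tree and the parent summarizes the states of its children. The transition table is deliberately simplified and does not reproduce the real state diagrams of slide 22.

class DeviceUnit:
    """Leaf of the control tree, modelling one real device (e.g. a farm node)."""
    def __init__(self, name):
        self.name = name
        self.state = "NOT_ALLOCATED"

    def command(self, cmd):
        # Deliberately simplified transition table.
        transitions = {"ALLOCATE": "IDLE", "START": "RUNNING",
                       "STOP": "IDLE", "DEALLOCATE": "NOT_ALLOCATED"}
        self.state = transitions.get(cmd, self.state)

class ControlUnit:
    """Inner node of the tree, grouping DUs and other CUs."""
    def __init__(self, name, children):
        self.name = name
        self.children = children

    def command(self, cmd):
        for child in self.children:          # commands propagate down
            child.command(cmd)

    @property
    def state(self):
        states = {child.state for child in self.children}
        return states.pop() if len(states) == 1 else "MIXED"   # states summarized upwards

# Example: one subfarm with two nodes.
subfarm = ControlUnit("HLTA01", [DeviceUnit("Node01"), DeviceUnit("Node02")])
subfarm.command("ALLOCATE")
subfarm.command("START")
print(subfarm.state)   # -> RUNNING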

• How it works
  ◦ Allocate subfarm(s)
  ◦ Set the nodes to 'RUN' (GOTO_RUN command)
  ◦ The DIRAC script is launched on the worker nodes
    ▪ With a delay between launches (to manage the DB connections)
  ◦ Scripts are launched according to pre-defined load-balancing rules
  ◦ The delay between process launches is variable (see the sketch below)
    ▪ No jobs available → longer delays
16
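
A sketch of the variable launch delay (Python; the delay values are invented for illustration and start_agent is a placeholder for whatever launches one agent and reports whether it found work):

import time

BASE_DELAY = 10    # seconds between launches while jobs are available (invented value)
IDLE_DELAY = 120   # longer delay once an agent finds no jobs (invented value)

def launch_agents(nodes, start_agent):
    """Launch one agent per node, spacing the launches out.

    Spreading the launches eases the load on the database connections;
    the delay grows when no jobs are available.
    """
    for node in nodes:
        got_job = start_agent(node)          # placeholder: launch and check
        time.sleep(BASE_DELAY if got_job else IDLE_DELAY)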

• Granularity:
  ◦ Subfarm level – a whole subfarm needs to be allocated
  ◦ Within an allocated subfarm, only some of the nodes can be set to process data
• CPU checks
  ◦ Check which type of CPU is available on the node
    ▪ Set the maximum number of processes accordingly (see the sketch below)
  ◦ The maximum number of nodes can be set independently
• Information exchange with DIRAC
  ◦ Processing state information is available only in the DIRAC system
  ◦ Job availability is inferred from the duration of the agent processes
17
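
As a sketch of the CPU check: the concrete rule used online is not given in the slides, so the 'one agent per core, optionally capped' rule below is only an assumption for illustration:

import multiprocessing

def max_agents_for_node(configured_cap=None):
    """Decide how many DIRAC agents this node may run, based on its CPU.

    Sketch only: one agent per core, optionally capped by a configured
    maximum; the rule actually used online is not given in the slides.
    """
    cores = multiprocessing.cpu_count()
    return cores if configured_cap is None else min(cores, configured_cap)

# Example: on an 8-core node with a cap of 6, run at most 6 agents.
# max_agents_for_node(configured_cap=6)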

• User Interface (UI)
  ◦ Developed in PVSS
  ◦ Look and feel coherent with the LHCb control software
  ◦ Uses synoptic widgets
  ◦ Provides simple statistics:
    ▪ Number of allocated farms/nodes
    ▪ Number of agents running
    ▪ Agent running times
18

• UI
[Slide 19 – UI screenshot: the FSM control panel, the agents running on the sub-farm nodes, and the corresponding agent monitoring in DIRAC.]

• The efficiency of the online farm usage is improved
• Interfaced with the DAQ
  ◦ Does not interfere with the data acquisition needs
• Resource balancing
  ◦ Processing is balanced according to pre-defined rules
• Easy adoption
  ◦ Maintains a look and feel coherent with the other LHCb control software
20


• State Diagrams
[Slide 22 – Two state diagrams. Farm Node Device Unit FSM: states UNAVAILABLE, IDLE, RUNNING, STOPPING and ERROR, with START, STOP and RECOVER commands. Subfarm Control Unit FSM: states NOT_ALLOCATED and ALLOCATED (plus RUNNING, STOPPING and ERROR), with ALLOCATE, DEALLOCATE, START, STOP and RECOVER commands.]