Peter Chochula. DCS architecture in ALICE. Databases in ALICE DCS: layout, interface to external systems, current status and experience, future plans.

Presentation transcript:

Peter Chochula

- DCS architecture in ALICE
- Databases in ALICE DCS
  ▪ Layout
  ▪ Interface to external systems
  ▪ Current status and experience
  ▪ Future plans

- ALICE DCS is responsible for the safe and correct operation of the experiment
- The DCS interacts with devices: it configures them, monitors their operation and executes corrective actions
- There are about 1200 network-attached devices and ~300 directly connected devices controlled by the DCS
- About … parameters are actively supervised by the system
- The DCS interacts with many external systems
- Part of the acquired data is made available to offline for analysis (conditions data)

(Architecture diagram - controls context.) The Detector Control System (SCADA) supervises the detectors and detector-like systems (SPD, SDD, SSD, TPC, TRD, TOF, HMP, PHS, FMD, T00, V00, PMD, MTR, MCH, ZDC, ACO, AD0, TRI, LHC) through their devices, interacts with external services and systems (electricity, ventilation, cooling, gas, magnets, safety, access control, LHC), monitors the infrastructure (B-field, space frame, beam pipe, radiation, environment) and connects to the other ALICE systems (DAQ, TRIGGER, HLT, ECS, OFFLINE). It reads from the configuration database (up to 6 GB), writes to the archival database (1000 inserts/s) and feeds the conditions database.

Dataflow in ALICE DCS:
- … values/s read by software
- … values/s injected into PVSS
- 1000 values/s written to ORACLE after smoothing in PVSS
- >200 values/s sent to consumers
6 GB of data is needed to fully configure ALICE DCS for operation. Several stages of filtering are applied to the acquired data.

- The PVSSII system is composed of specialized program modules (managers)
- Managers communicate via TCP/IP
- ALICE DCS is built from 100 PVSS systems composed of 1900 managers
- PVSSII is extended by the JCOP and ALICE frameworks, on top of which the user applications are built
(Diagram: layered stack - PVSSII managers (UI, CTL, API, DM, EM, DRV), JCOP framework, ALICE framework, user application.)

(Diagram: PVSS manager layout - User Interface Managers, Control Manager, API Manager, Event Manager, Data Manager, Driver.) In a simple system all managers run on the same machine; in a scattered system the managers can run on dedicated machines.

(Diagram: three PVSSII systems, each with its own managers, interconnected through Distribution Managers.) In a distributed system several PVSSII systems (simple or scattered) are interconnected.

- ALICE DCS currently uses ~100 simple or scattered PVSS systems
- Each system is unique and executes dedicated tasks
- All systems are interconnected to form one big distributed system
- Using the DIP protocol, the DCS directly interacts with additional highly specialized PVSS systems in ALICE (gas systems, Detector Safety System, ...)

(The controls-context diagram is shown again, now highlighting the three databases: configuration, archival and conditions.)

(Zoom on the architecture diagram.) CONFIGURATION DATABASE: configuration of the PVSS systems, device settings and front-end configuration. Stored mostly as code which is compiled online and sent to the devices at the start of a run.

(Zoom on the architecture diagram.) ARCHIVAL DATABASE: parameters acquired from the devices (up to 1000 inserts/s).

(Zoom on the architecture diagram.) CONDITIONS DATABASE: stores a subset of the archived data; implemented on the OFFLINE side; populated after each run with the data acquired during that run.

- Several stages of data compression and filtering are implemented in the DCS
- Data sent to the archive is filtered: only values that differ significantly from the previously archived one are sent to the archive
  ▪ The configured difference is large enough to suppress LSB noise and similar fluctuations
  ▪ The configured difference is small enough to preserve the physics information
  ▪ (Example: a 0.1 °C change in the field cage affects the track resolution in the TPC, but has no effect if detected in the cavern environment)

(Plot: acquired values as a function of time.) Monitored values are acquired randomly, on change, or at fixed time intervals.

(Plot: archival deadband around an archived value.) A deadband is defined around each archived value; only values exceeding the deadband are sent to the archive.

(Plot: a value leaving the deadband.) If a value exceeds the deadband, it is archived and a new deadband is defined around it.

(Plot: acquired vs. archived values.) The archival rate is significantly reduced (sometimes by more than a factor of 1000), but the data is archived at random time intervals.

(Plot: forced archival at SOR and EOR.) Because the data is sent to the archive at random intervals, the archived data might be missing for very short runs or for extremely stable values. A dedicated mechanism therefore forces the archival of each value required by offline at the start and at the end of each run (SOR/EOR).
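The smoothing described on the previous slides can be summarized in a short sketch. This is only an illustration under assumed names and thresholds; in the real system the equivalent logic is configured in the PVSS archive smoothing settings rather than hand-coded.

```python
# Illustrative sketch of deadband ("smoothing") archival, plus forced
# archival at start/end of run (SOR/EOR). Names and thresholds are
# invented for the example; in ALICE the equivalent logic lives in the
# PVSS archive smoothing configuration.

class DeadbandArchiver:
    def __init__(self, deadband):
        self.deadband = deadband          # half-width of the archival deadband
        self.last_archived = None         # value around which the deadband is centred
        self.archive = []                 # stands in for the ORACLE archive

    def process(self, timestamp, value, force=False):
        """Archive the value only if it leaves the current deadband,
        or if archival is forced (e.g. at SOR/EOR)."""
        if (force
                or self.last_archived is None
                or abs(value - self.last_archived) > self.deadband):
            self.archive.append((timestamp, value))
            self.last_archived = value    # new deadband centred on this value
            return True
        return False                      # value suppressed by smoothing


arch = DeadbandArchiver(deadband=0.1)     # e.g. 0.1 degC for a temperature channel
arch.process(0.0, 20.00, force=True)      # SOR: always archive the current value
arch.process(1.0, 20.03)                  # inside the deadband -> suppressed
arch.process(2.0, 20.15)                  # outside the deadband -> archived
arch.process(3.0, 20.16, force=True)      # EOR: always archive the current value
```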

- All PVSS systems have direct access to all archived data through the PVSS built-in interface
  ▪ Online analysis purposes
  ▪ Interactive analysis by the shift crew and experts
- External systems can access the data only via a dedicated server (AMANDA)
  ▪ Protection of the archive
  ▪ Load balancing

- AMANDA is an ALICE-grown client-server software used to access the ALICE DCS data
- The client sends a request for data, indicating the names of the required parameters and the requested time interval (without knowing the archive structure)
- The server retrieves the data and sends it back to the client
- Several AMANDA servers are deployed in ALICE
- Multiple requests are queued in the servers and processed sequentially
- In case of overload, it is enough to kill the AMANDA server
- AMANDA servers operate across the secured network boundary
(Diagram: the AMANDA client sends a DATA REQUEST across the firewall to the AMANDA server, which issues the SQL request and returns the DATA.)
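For illustration, a hypothetical AMANDA-style client could look like the sketch below: the client names the parameters and the time window, and the server does the SQL work. The host, port and message format are invented for the example; the real AMANDA wire protocol is not described on the slides.

```python
# Purely illustrative sketch of an AMANDA-style request/response exchange.
# Host, port and the JSON message format are assumptions for the example.
import json
import socket

def fetch_dcs_data(parameters, t_start, t_end,
                   host="amanda.example.cern.ch", port=8000):
    request = json.dumps({
        "parameters": parameters,     # e.g. DCS datapoint names
        "from": t_start,              # queried interval (UNIX timestamps)
        "to": t_end,
    }).encode()

    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        sock.shutdown(socket.SHUT_WR)          # signal that the request is complete
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return json.loads(b"".join(chunks))        # values retrieved by the server

# Example call (hypothetical datapoint name):
# values = fetch_dcs_data(["tpc_fieldcage_temp_01"], 1325376000, 1325379600)
```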

- The main database service for the DCS is based on ORACLE
- The DBA tasks are provided by the DSA section of IT-DB, based on an SLA between ALICE and IT
- The PRIMARY database servers and storage are located at the ALICE pit
- The STANDBY database and the tape backups are located in IT

(Diagram: ~100 DB clients (configuration, archive, offline clients) connect to the PRIMARY database at ALICE P2 - DB servers, SAN, storage and local backup. The data is mirrored to the STANDBY database in IT, which is backed up to tape, and to a streaming database in IT holding a limited amount of data.)

- The DB is backed up directly at the ALICE site to a dedicated array
  ▪ Fast recovery
  ▪ Full backup
- The whole DB is mirrored on the STANDBY database in IT
- The STANDBY database is backed up to tape
- In case of DB connectivity problems, the clients can accumulate data in local buffers and dump it to the DB once the connection is restored
  ▪ The lifetime of the local buffers is ~days
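The local-buffer fallback can be pictured with the following sketch. The file-based buffer, its format and the write_to_db() placeholder are assumptions made for the example; in the real system the buffering is handled by the PVSS archive managers.

```python
# Sketch of the "buffer locally, flush when the DB comes back" behaviour.
# write_to_db() stands in for the real ORACLE insert; the file buffer and
# its JSON-lines format are invented for the example.
import json
import os
import time

BUFFER_FILE = "dcs_local_buffer.jsonl"   # lifetime of such buffers is ~days

def write_to_db(record):
    """Placeholder for the real ORACLE insert; raises while the DB is unreachable."""
    raise ConnectionError("database unreachable")

def flush_buffer():
    """Replay buffered records once the connection is back."""
    if not os.path.exists(BUFFER_FILE):
        return
    with open(BUFFER_FILE) as f:
        pending = [json.loads(line) for line in f]
    for record in pending:
        write_to_db(record)              # raises again if still unreachable
    os.remove(BUFFER_FILE)               # everything replayed -> drop the buffer

def archive(record):
    try:
        flush_buffer()                   # first drain anything pending
        write_to_db(record)
    except ConnectionError:
        with open(BUFFER_FILE, "a") as f:            # keep the data locally ...
            f.write(json.dumps(record) + "\n")       # ... until the DB is back

archive({"dp": "hv_channel_042", "t": time.time(), "value": 1480.5})
```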

(Diagram: the same primary/standby layout.) Disaster scenario tested in 2010: all of ALICE DCS was redirected to the STANDBY database for several days. SUCCESS!

- Number of clients: ~100
- The ALICE DCS DB is tuned and tested for:
  ▪ a steady insertion rate of ~1000 inserts/s
  ▪ a peak rate of … inserts/s
- Current DB size: ~3 TB, 2 schemas per detector
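A sustained-rate figure such as ~1000 inserts/s is typically validated with a bulk-insert test. The sketch below uses the python-oracledb client; the credentials, table and column names are assumptions for the example, not the actual ALICE DCS schema.

```python
# Minimal bulk-insert throughput test against an ORACLE schema.
# DSN, credentials, table and column names are assumptions for the example.
import time
import oracledb

def measure_insert_rate(rows, user="dcs_test", password="secret",
                        dsn="testdb-host/testservice"):
    conn = oracledb.connect(user=user, password=password, dsn=dsn)
    cur = conn.cursor()
    start = time.perf_counter()
    cur.executemany(
        "INSERT INTO dcs_archive_test (element_id, ts, value) "
        "VALUES (:1, :2, :3)",
        rows,                              # list of (element_id, ts, value) tuples
    )
    conn.commit()
    elapsed = time.perf_counter() - start
    conn.close()
    return len(rows) / elapsed             # achieved inserts per second
```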

- The ALICE DB service has been in uninterrupted and stable operation for more than 3 years
- Initial problems caused by instabilities of the RAID arrays were solved by firmware upgrades
- Operational procedures have been fine-tuned to match IT and ALICE requirements
  ▪ Updates only during LHC technical stops, etc.
- The typical operational issues are caused by the clients:
  ▪ Misconfigured smoothing (client overload)
  ▪ Missing data (stuck clients, lost SOR/EOR signals)
  ▪ However, big improvements in stability during the last year (credit to the EN-ICE team)!

- The smooth and stable operation of the ORACLE database for the ALICE DCS is a big achievement
- Hereby we wish to express our thanks to the members of the IT-DB DS team for their highly professional help and approach!

- There are some other databases deployed in the ALICE DCS, but their use is very light:
  ▪ MySQL, for bookkeeping on the file exchange servers: zero maintenance, local backup solution
  ▪ SQL Server, as storage for the system monitoring tools (MS SCOM): used as an out-of-the-box solution, but growing quickly (it will need to move to a new server)

- Currently the service fulfils ALICE's needs
- No major architectural changes are planned in the near future (before the long LHC shutdown)
- HW and SW upgrades are still foreseen

- A replacement of the DB service in the ALICE counting room is being prepared for this year
- The hardware (blade servers, SAN infrastructure and arrays) is currently being installed and commissioned
- We wish to keep the retired DB service in operation for:
  ▪ Testing purposes
  ▪ SW validation

- No significant increase of the data volume from the detectors is planned before the long LHC shutdown
- During the shutdown new detector modules will be added to ALICE; this might double the amount of data
- A new project, the ALICE-LHC interface, currently stores its data (luminosities, trigger counters, etc.) in files
  ▪ The aim is to move to ORACLE
  ▪ The load is currently being estimated; it is comparable with the present DCS archival
- We will probably use the retired DB for validation and tests before deciding on the final setup:
  ▪ Extension of the newly installed service
  ▪ Or stay with files

- We will focus mainly on the clients
  ▪ New AMANDA servers to be developed
  ▪ Fine tuning of the PVSS clients
- According to operational experience, we need to improve the monitoring of:
  ▪ The quality of the produced data (online checks; see the sketch after this list)
  ▪ The amount of produced data (regular scans?)
- We are getting more requests for access to the data from local analysis code
  ▪ Currently we are able to satisfy the needs with AMANDA
  ▪ We plan to explore and possibly extend the streaming setup
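As an illustration of such an online check, a stuck client could be flagged by looking for channels whose last archived update is older than some threshold. The helper names and the archive lookup below are hypothetical, not part of the ALICE DCS tools.

```python
# Hypothetical "stuck client" check: flag channels whose last archived
# update is older than a chosen threshold. The 'archive' mapping stands
# in for a query against the DCS archive (e.g. via AMANDA).
import time

STALE_AFTER = 3600.0                   # seconds without an update -> suspicious

def fetch_last_update(channel, archive):
    """Placeholder lookup: 'archive' maps channel name -> last archived timestamp."""
    return archive.get(channel, 0.0)

def find_stuck_channels(channels, archive, now=None):
    now = now if now is not None else time.time()
    return [ch for ch in channels
            if now - fetch_last_update(ch, archive) > STALE_AFTER]

# Example with fake data: one fresh channel, one stale channel.
archive = {"hv_channel_001": time.time(), "hv_channel_002": time.time() - 7200}
print(find_stuck_channels(["hv_channel_001", "hv_channel_002"], archive))
```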

- There are 3 big players in the business:
  ▪ The DATABASE
  ▪ The ALICE experiment (clients, procedures)
  ▪ PVSS
- Matching the various requirements might create clashes
  ▪ Deployment of patches on the PVSS or DB side might collide with the experiment status (critical runs...)
- Compatibility of patches needs to be validated in a formal way (IT - EN/ICE - experiments)
  ▪ ALICE can, for example, offer the retired DB service and the test cluster in the ALICE lab for validation, and participate in the tests

- The main DB service in the ALICE DCS is based on ORACLE
- The operational experience is very positive (stability, reliability) on the server side
- Small issues on the client side are being constantly improved
- No major modifications are expected before the LHC long shutdown
- Several upgrades are ongoing
- Again, thanks to the IT-DB experts for the smooth operation of this critical service