Status of the Accelerator Online Operational Databases

Status of the Accelerator Online Operational Databases
Ronny Billen, Chris Roderick
LTC – 7 March 2008
Accelerators and Beams Department, Controls Group

Outline
- The Accelerator Online Operational Databases
- Current Database Server Situation
- Evolution of the Provided Services
- Performance → Hitting the Limits
- 2008: Planned Upgrade and Migration
- Implications, Policy and Constraints for Applications
- Logging Data: Expected vs. Acceptable
- The Future
- Conclusions

The Accelerator Online Operational Databases
- Data is needed instantaneously to interact with the accelerator
- The database sits between the accelerator equipment and the client (operator, equipment specialist, software developer)
- Many database services, including APIs and applications:
  LSA – Accelerator Settings database
  MDB – Measurement database
  LDB – Logging database
  CCDB – Controls Configuration database
  E-Logbook – Electronic Logbooks
  CESAR – SPS-EA Controls
  LASER – Alarms database
  TIM – Technical Infrastructure Monitoring database
- 3-tier deployment of services for resource optimization: Client → Application Server → Database Server (see the sketch below)
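
Illustration only: the slides do not show the actual client APIs, so the class and variable names below are hypothetical. The sketch shows the 3-tier idea in Java, namely that clients program against a service interface exposed by the application-server tier and never open database connections themselves.

```java
/** Hypothetical API exposed by the application-server tier; clients never talk SQL. */
interface LoggingService {
    double latestValue(String variableName);
}

/** Client tier (e.g. an operator GUI): depends only on the service API. */
public class OperatorConsole {
    private final LoggingService logging;

    OperatorConsole(LoggingService logging) {
        this.logging = logging;
    }

    void showLatest(String variable) {
        // The middle tier translates this call into SQL against the database tier,
        // pooling connections and caching where possible.
        System.out.println(variable + " = " + logging.latestValue(variable));
    }

    public static void main(String[] args) {
        // Stand-in implementation so the sketch runs; a real deployment would
        // obtain a remote proxy to the application server instead.
        LoggingService stub = name -> 42.0;
        new OperatorConsole(stub).showLatest("EXAMPLE.DEVICE:VALUE");
    }
}
```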

Current Database Server Situation: SUNLHCLOG
Often referred to as the “LHC Logging Database”
- Technical:
  2-node cluster, SUN Fire V240: 2 x {single-core 1 GHz CPU, 4 GB RAM, 2 x 36 GB disks, 2 PS}
  External storage: 9 TB, RAID 1+0 / RAID 5, mirrored & striped (~60% usable)
- History:
  Purchased original setup: March 2004
  Purchased extra disks: October 2006
- Main accounts / data:
  Logging: LHC HWC, Injectors, Technical Services
  Measurements: LHC HWC, Injectors
  Settings: LSA for LHC, SPS, LEIR, PS, PSB, AD
- Today’s specifics:
  150 simultaneous user sessions
  Oracle data files: 4.7 TB

Current Database Server Situation: SUNSLPS
Often referred to as the “Controls Configuration Database”
- Technical:
  Server SUN E420R {450 MHz CPU, 4 GB RAM, 2 x 36 GB disks}
  External storage: 218 GB
- History:
  Installed in January 2001
- Main accounts / data:
  AB-Controls, FESA, CMW, RBAC, OASIS
  CESAR, PO-Controls, INTERLOCK
  e-Logbooks, ABS-cache
  Historical SPS and TZ data
  LSA Test
- Today’s specifics:
  200-300 simultaneous user sessions
  Oracle data files: 32 GB

Evolution of the Provided Services
- LSA Settings: operationally used since 2006
  Deployed on SUNLHCLOG to get the best performance
  Used for LEIR, SPS, SPS & LHC transfer lines, LHC HWC
  Continuously evolving due to requirements from LHC and PS
- Measurement Service: operationally used since mid-2005
  Provides central short-term persistence for Java clients
  Provides data filtering and transfer to the long-term logging service (a minimal filtering sketch follows below)
  Generates accelerator statistics
  Increasingly used for the complete accelerator complex
- Logging Service: operationally used since mid-2003
  Scope extended to all accelerators and technical data of the experiments
  Equipment expert data for LHC HWC accounts for >90% of the volume
  Largest consumer of database and application server resources
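
A minimal sketch, in Java, of the kind of filtering applied when transferring data from the short-term Measurement database (MDB) to the long-term Logging database (LDB). The actual filter criteria are not given in the slides; the per-variable deadband below is an illustrative assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Forwards a value to long-term logging only if it moved by more than a deadband. */
public class DeadbandFilter {
    private final Map<String, Double> lastLogged = new HashMap<>();
    private final double deadband;

    public DeadbandFilter(double deadband) {
        this.deadband = deadband;
    }

    /** Returns true if the value should be transferred from MDB to LDB. */
    public boolean shouldLog(String variable, double value) {
        Double previous = lastLogged.get(variable);
        if (previous == null || Math.abs(value - previous) > deadband) {
            lastLogged.put(variable, value);
            return true;   // forward to the logging service
        }
        return false;      // drop: stays only in the short-term measurement store
    }

    public static void main(String[] args) {
        DeadbandFilter filter = new DeadbandFilter(0.5);
        for (double v : new double[] {10.0, 10.1, 10.4, 11.2, 11.3}) {
            System.out.println(v + " -> " + (filter.shouldLog("EXAMPLE:TEMP", v) ? "log" : "skip"));
        }
    }
}
```

Filtering of this kind is also what makes the MDB → LDB migration CPU intensive, as noted on the CPU-limits slide further down.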

Evolution of the Logging – Data Volume
(chart: growth of the logged data volume over time)

Evolution of the Logging – Data Rates
(chart: evolution of logging data rates, with markers for the arrival of CIET, CRYO and QPS data)

Performance → Hitting the Limits: I/O Limits
- The I/O subsystem is used for both reading and writing data
- Recent samples: 4 to 37 clients waiting for the I/O subsystem
(chart: number of active sessions waiting for the I/O subsystem)
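
The number of sessions waiting on the I/O subsystem can be sampled from Oracle's V$SESSION view (Oracle 10g). A minimal sketch, assuming a monitoring account with SELECT privileges on V$SESSION and the Oracle JDBC driver on the classpath; host, service name and credentials are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/** Counts the sessions currently waiting on user I/O, as plotted on the slide. */
public class IoWaitSampler {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:oracle:thin:@//dbhost.example.cern.ch:1521/EXAMPLE_SVC";
        try (Connection conn = DriverManager.getConnection(url, "monitor", "secret");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT COUNT(*) FROM v$session "
                 + "WHERE state = 'WAITING' AND wait_class = 'User I/O'")) {
            if (rs.next()) {
                System.out.println("Active sessions waiting on user I/O: " + rs.getInt(1));
            }
        }
    }
}
```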

Performance → Hitting the Limits: CPU Limits
- CPU is needed for everything:
  Data writing and extraction
  Data filtering (CPU intensive) and migration from MDB → LDB
  Exporting archive log files to tape, incremental back-ups
  Migrating historic data to dedicated read-only storage
- Hitting the I/O limits burns CPU
(chart: percentage of CPU spent on I/O wait events)

Performance → Hitting the Limits: Storage Limits
- Pre-allocated data files are difficult to manage (due to their size)
- Monthly space allocations are necessary, yet always turn out to be insufficient
- Archive log area is insufficient (when the backup service is down)
(chart: storage utilisation)
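
For completeness, a sketch of how the per-tablespace allocation behind a storage-utilisation plot can be queried. The connection details are placeholders and SELECT access on DBA_DATA_FILES is assumed; this is an illustration, not the tool actually used.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/** Reports the space allocated to data files, grouped by tablespace. */
public class StorageReport {
    public static void main(String[] args) throws Exception {
        String sql = "SELECT tablespace_name, "
                   + "ROUND(SUM(bytes) / 1024 / 1024 / 1024, 1) AS allocated_gb "
                   + "FROM dba_data_files GROUP BY tablespace_name "
                   + "ORDER BY allocated_gb DESC";
        try (Connection c = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost.example.cern.ch:1521/EXAMPLE_SVC", "monitor", "secret");
             Statement st = c.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.printf("%-25s %8.1f GB%n", rs.getString(1), rs.getDouble(2));
            }
        }
    }
}
```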

2008: Planned Upgrade and Migration
Separate into 3 high-availability database services, each deployed on a dedicated Oracle Real Application Cluster:
- Settings & Controls Configuration (including logbooks)
  Highest availability, fast response
  Low CPU usage, low disk I/O
  ~20 GB of data
- Measurement Service
  Highest availability
  CPU intensive (data filtering MDB → LDB), very high disk I/O
  ~100 GB (1 week latency), or much more for HWC / LHC operation
- Logging Service
  High availability
  CPU intensive (data extraction), high disk I/O
  ~10 TB per year

2008: Planned Upgrade and Migration (deployment diagram)
- Additional server for Data Guard testing: standby database for LSA
- Oracle RAC 1: LSA Settings, Controls Configuration, E-Logbook, CESAR
- Oracle RAC 2: Measurements, HWC Measurements
- Oracle RAC 3: Logging
- Server nodes (CTRL): 2 x quad-core 2.8 GHz CPU, 8 GB RAM
- Storage: clustered NAS shelves (14 x 146 GB FC disks, 14 x 300 GB SATA disks), 11.4 TB usable

2008: Planned Upgrade and Migration
Dell PowerEdge 1950 server specifications:
- 2x Intel Xeon quad-core 2.33 GHz CPU, 2x 4 MB L2 cache
- 8 GB RAM
- 2x power supplies, network cards (10 Gb Ethernet), 2x 72 GB system disks
NetApp Clustered NAS FAS3040 storage specifications:
- 2x disk controllers (support for 336 disks, i.e. 24 shelves)
- 2x disk shelves (14x 146 GB Fibre Channel disks, 10,000 rpm)
- 8 GB RAM (cache), RAID-DP
- Redundant hot-swappable controllers, cooling fans, power supplies, optics, and network cards
- Certified for >3000 I/O operations per second

2008: Planned Upgrade and Migration (timeline)
- Purchase order for storage (2/11): launched Sep-2007
- Purchase order for servers (7/122): launched Oct-2007
- NetApp NAS storage shelves: arrived at CERN Nov-2007
- Dell servers: arrived at CERN Jan-2008
- Additional mounting rails for servers: ordered Jan-2008
- Servers: stress-tested Jan-2008
- Rack space: liberated Feb-2008
- Server and storage: fully installed 7-Mar-2008
- Oracle system software: installed, configured 14-Mar-2008
- Database structures: deployed (AB/CO/DM)
- Database services: ready for switch-over (1-day stop)
- Switch to services of new platform: 21-Mar-2008?
- Migration of existing 5 TB logging data to new platform: (later)
- Purchase additional logging storage for beyond 2008: (Sep-2008)

Implications, Policy and Constraints for Applications
Foreseen for all services, already implemented for a few:
- Implications
  All applications should be cluster-aware
  Database load balancing / fail-over (connection modifications; see the connection sketch below)
  Application fail-over (application modifications)
- Policy
  Follow naming conventions for data objects
- Constraints
  Use APIs for data transfer (no direct table access)
  Enforce controlled data access
  Register authorized applications (purpose, responsible)
  Implement application instrumentation: provide details of all database operations (who, what, where)
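
A sketch of the kind of connection modification that makes a Java client "cluster-aware". The host names, service name and credentials are placeholders; the connect descriptor enables listener load balancing across the RAC nodes and Transparent Application Failover (TAF), one standard way to obtain the fail-over behaviour described above.

```java
import java.sql.Connection;
import java.sql.DriverManager;

/** Opens a load-balanced, failover-capable connection to a 2-node Oracle RAC service. */
public class RacConnectionExample {
    private static final String URL =
        "jdbc:oracle:thin:@(DESCRIPTION="
        + "(ADDRESS_LIST=(LOAD_BALANCE=ON)(FAILOVER=ON)"
        + "(ADDRESS=(PROTOCOL=TCP)(HOST=rac-node1.example.cern.ch)(PORT=1521))"
        + "(ADDRESS=(PROTOCOL=TCP)(HOST=rac-node2.example.cern.ch)(PORT=1521)))"
        + "(CONNECT_DATA=(SERVICE_NAME=LOGGING_SVC)"
        + "(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=20)(DELAY=3))))";

    public static void main(String[] args) throws Exception {
        // Requires the Oracle JDBC driver (ojdbc) on the classpath.
        try (Connection conn = DriverManager.getConnection(URL, "app_user", "secret")) {
            System.out.println("Connected via: " + conn.getMetaData().getURL());
        }
    }
}
```

With such a descriptor, new connections go to a surviving node if one is taken down and, with TAF, in-flight SELECTs can be resumed; the application-level fail-over mentioned above is still needed to resubmit transactions that were rolled back.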

Logging Data: Expected vs. Acceptable
Beam-related equipment is starting to produce data:
- BLM: 6,400 monitors x 12 x 2 (losses & thresholds) + crate status = ~154,000 values per second (filtered by the concentrator & MDB)
- XPOC
- More to come…
Limits:
- Maximum 1 Hz data frequency in the Logging database
- It is not a data dump: consider the final data usage before logging – only log what is needed
- Logging noise will have a negative impact on data extraction performance and analysis
(a minimal rate check is sketched below)
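
A quick check of the BLM arithmetic, plus a minimal 1 Hz throttle of the kind a client could apply before publishing to the logging chain. The throttle is an illustrative assumption, not the actual logging API.

```java
/** Verifies the quoted BLM data rate and sketches a per-variable 1 Hz limit. */
public class LoggingRateCheck {
    private Long lastSentMillis = null;

    /** Allows at most one value per second for this variable. */
    public synchronized boolean tryLog(long nowMillis) {
        if (lastSentMillis == null || nowMillis - lastSentMillis >= 1000) {
            lastSentMillis = nowMillis;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // 6,400 monitors x 12 values x 2 (losses & thresholds) = 153,600 ~ 154,000 per second
        System.out.println("Unfiltered BLM values per second: " + (6_400 * 12 * 2));

        LoggingRateCheck throttle = new LoggingRateCheck();
        long t0 = System.currentTimeMillis();
        System.out.println(throttle.tryLog(t0));        // true  (first value)
        System.out.println(throttle.tryLog(t0 + 200));  // false (within the same second)
        System.out.println(throttle.tryLog(t0 + 1200)); // true  (more than 1 s later)
    }
}
```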

The Future
- Logging Data
  Original idea → keep data available online indefinitely
  Data rates estimated at ~10 TB/year
  Closely monitor the evolution of storage usage
  Order new disks for the 2009 data (in Sept 2008)
  Migrate existing data (~4 TB) to new disks
- Service Availability
  The new infrastructure has high redundancy for high availability
  Scheduled interventions will still need to be planned
  The use of a standby database will be investigated, with the objective of reaching 100% uptime for the small databases

Conclusions
- Databases play a vital role in the commissioning and operation of the accelerators
- Database performance and availability have a direct impact on operations
- Today, the main server SUNLHCLOG is heavily overloaded
- Based on experience and the evolution of the existing services, the new database infrastructure has been carefully planned to:
  Address performance issues
  Provide maximum availability
  Provide independence between the key services
  Scale with data volumes and future requirements
- The new database infrastructure should be operational ahead of the injector chain start-up and LHC parallel-sector HWC