Status of the Accelerator Online Operational Databases


1 Status of the Accelerator Online Operational Databases
Ronny Billen, Chris Roderick
LTC – 7 March 2008
Accelerators and Beams Department, Controls Group

2 LTC - Controls session - Databases

3 Outline
- The Accelerator Online Operational Databases
- Current Database Server Situation
- Evolution of the Provided Services
- Performance → Hitting The Limits
- 2008: Planned Upgrade and Migration
- Implications, Policy and Constraints for Applications
- Logging Data: Expected Vs Acceptable
- The Future
- Conclusions

4 The Accelerator Online Operational Databases
- Data needed instantaneously to interact with the accelerator
- Database is between the accelerator equipment and the client (operator, equipment specialist, software developer)
- Many database services, including APIs and applications
  - LSA – Accelerator Settings database
  - MDB – Measurement database
  - LDB – Logging database
  - CCDB – Controls Configuration
  - E-Logbook – Electronic Logbooks
  - CESAR – SPS-EA Controls
  - LASER – Alarms database
  - TIM – Technical Infrastructure Monitoring database
- 3-tier deployment of services for resource optimization
  - Client → Application Server → Database Server

5 Current Database Server Situation
SUNLHCLOG
- Often referred to as the “LHC Logging Database”
- Technical
  - 2-node cluster SUN Fire V x {single core 1GHz CPU, 4GB RAM, 2 x 36GB disks, 2 PS}
  - External Storage 9TB RAID 1+0 / RAID 5 mirrored & striped (~60% usable)
- History
  - Purchased original setup: March 2004
  - Purchased extra disks: October 2006
- Main accounts - data
  - Logging: LHC HWC, Injectors, Technical Services
  - Measurements: LHC HWC, Injectors
  - Settings: LSA for LHC, SPS, LEIR, PS, PSB, AD
- Today’s specifics
  - 150 simultaneous user sessions
  - Oracle data-files 4.7 TB,
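The figures on this slide can be combined into a rough check of how close SUNLHCLOG is to its storage limit. This is only a sketch: it assumes the ~60% usable figure applies to the whole 9TB and that the 4.7 TB of Oracle data-files all sit on that external storage, neither of which the slide states explicitly. The function names are illustrative.

```python
# Figures quoted on the slide.
RAW_TB = 9.0            # external storage, raw capacity
USABLE_FRACTION = 0.60  # "~60% usable" after mirroring/striping (approximate)
DATAFILES_TB = 4.7      # Oracle data-files today


def usable_capacity_tb(raw_tb: float, usable_fraction: float) -> float:
    """Usable capacity in TB once RAID overhead is taken off the raw size."""
    return raw_tb * usable_fraction


def fill_ratio(used_tb: float, usable_tb: float) -> float:
    """Fraction of the usable capacity already occupied."""
    return used_tb / usable_tb


usable = usable_capacity_tb(RAW_TB, USABLE_FRACTION)  # about 5.4 TB
ratio = fill_ratio(DATAFILES_TB, usable)              # about 0.87
print(f"usable: {usable:.1f} TB, occupied: {ratio:.0%}")
```

Under these assumptions the data-files already occupy close to nine tenths of the usable space.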

6 Current Database Server Situation
SUNSLPS
- Often referred to as the “Controls Configuration Database”
- Technical
  - Server SUN E420R {450MHz CPU, 4GB RAM, 2x36GB disks}
  - External Storage 218GB
- History
  - Installed in January 2001
- Main accounts - data
  - AB-Controls, FESA, CMW, RBAC, OASIS
  - CESAR, PO-Controls, INTERLOCK
  - e-Logbooks, ABS-cache
  - Historical SPS and TZ data
  - LSA Test
- Today’s specifics
  - simultaneous user sessions
  - Oracle data-files 32GB

7 Evolution of the Provided Services
- LSA Settings: operationally used since 2006
  - Deployed on SUNLHCLOG to get best performance
  - Used for LEIR, SPS, SPS & LHC transfer lines, LHC HWC
  - Continuously evolving due to requirements from LHC and PS
- Measurement Service: operationally used since mid-2005
  - Satisfying central short-term persistence for Java clients
  - Provides data filtering and transfer to long-term logging service
  - Generates accelerator statistics
  - Increasingly used for complete accelerator complex
- Logging Service: operationally used since mid-2003
  - Scope extended to all accelerators, technical data of experiments
  - Equipment expert data for LHC HWC: accounts for >90% volume
  - Largest consumer of database and application server resources
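The slides do not say how the Measurement Service filters data before transferring it to the Logging Service. Purely as an illustration, the sketch below assumes two common rules: a per-variable minimum interval between forwarded samples (1 second, matching the 1 Hz logging limit quoted later in the talk) and an optional deadband on the value change. The names `Sample` and `filter_for_logging` are hypothetical and are not part of any CERN API.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# (variable name, timestamp in seconds, value) -- hypothetical sample layout
Sample = Tuple[str, float, float]


def filter_for_logging(
    samples: Iterable[Sample],
    min_interval_s: float = 1.0,
    deadband: Optional[float] = None,
) -> Iterator[Sample]:
    """Yield only the samples worth forwarding to long-term logging.

    Per variable, a sample is dropped if it arrives less than
    min_interval_s after the last forwarded sample, or (when a deadband
    is given) if its value differs from the last forwarded value by no
    more than the deadband.
    """
    last_forwarded: Dict[str, Tuple[float, float]] = {}
    for name, timestamp, value in samples:
        previous = last_forwarded.get(name)
        if previous is not None:
            prev_timestamp, prev_value = previous
            if timestamp - prev_timestamp < min_interval_s:
                continue  # too soon after the last forwarded sample
            if deadband is not None and abs(value - prev_value) <= deadband:
                continue  # change too small to be worth logging
        last_forwarded[name] = (timestamp, value)
        yield name, timestamp, value


raw_samples = [
    ("A", 0.0, 1.0),
    ("A", 0.5, 2.0),   # dropped: only 0.5 s after the previous "A"
    ("A", 1.0, 3.0),
    ("B", 0.2, 5.0),
    ("A", 2.0, 3.05),  # dropped: change of 0.05 is within the deadband
]
kept = list(filter_for_logging(raw_samples, min_interval_s=1.0, deadband=0.1))
```

Keeping state per variable means a busy signal cannot crowd out a quiet one; the filter only ever compares a sample with the last forwarded sample of the same variable.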

8 Evolution of the Logging – Data Volume

9 Evolution of the Logging – Data Rates
Chart annotations: CIET, CRYO, QPS

10 Performance → Hitting The Limits
I/O Limits
- I/O subsystem is used for reading and writing data
- Recent samples: 4 to 37 clients waiting for I/O subsystem
Chart: number of active sessions waiting for I/O subsystem

11 Performance → Hitting The Limits
CPU Limits
- CPU is always needed to do anything:
  - Data writing and extraction
  - Data filtering (CPU intensive) and migration from MDB → LDB
  - Exporting archive log files to tape, incremental back-ups
  - Migrating historic data to dedicated read-only storage
- Hitting the I/O limits burns CPU
Chart: percentage of CPU used on I/O wait events

12 Performance → Hitting The Limits
Storage Limits
- Pre-defined allocated data-files difficult to manage (due to size)
- Monthly allocations always insufficient (necessary)
- Archive log file size insufficient (when backup service down)
Chart: storage utilisation

13 2008: Planned Upgrade and Migration
- Separate into 3 high-availability database services
- Deploy each service on a dedicated Oracle Real Application Cluster
  - Settings & Controls Configuration (including logbooks)
    - Highest-availability, fast response
    - Low CPU usage, low disk I/O
    - ~20GB data
  - Measurement Service
    - Highest-availability
    - CPU intensive (data filtering MDB → LDB), very high disk I/O
    - ~100GB (1 week latency) or much more for HWC / LHC operation
  - Logging Service
    - High-availability
    - CPU intensive (data extraction), high disk I/O
    - ~10TB per year

14 2008: Planned Upgrade and Migration
Diagram: Oracle RAC 1, Oracle RAC 2 and Oracle RAC 3 on clustered NAS storage
- Servers: 2 x quad-core 2.8GHz CPU, 8GB RAM
- Storage: 2 controllers (CTRL), clustered NAS shelf 14x146GB FC disks, clustered NAS shelf 14x300GB SATA disks, 11.4TB usable
- Databases shown, in diagram order: LSA Settings, Controls Configuration, E-Logbook, CESAR; Measurements, HWC Measurements; Logging
- Additional server for DataGuard testing: standby database for LSA

15 2008: Planned Upgrade and Migration
Dell PowerEdge 1950 server specifications:
- 2x Intel Xeon quad-core 2.33 GHz CPU
- 2x 4 MB L2 cache
- 8GB RAM
- 2x power supplies, network cards (10Gb Ethernet), 2x 72GB system disks
NetApp Clustered NAS FAS3040 storage specifications:
- 2x disk controllers (support for 336 disks (24 shelves))
- 2x disk shelves (14x 146GB Fibre Channel 10,000rpm)
- 8GB RAM (cache)
- RAID-DP
- Redundant hot-swappable: controllers, cooling fans, power supplies, optics, and network cards
- Certified >3000 I/O per second

16 2008: Planned Upgrade and Migration
- Purchase order for storage (2/11): launched Sep-2007
- Purchase order for servers (7/122): launched Oct-2007
- NetApps NAS storage shelves: arrived at CERN Nov-2007
- Dell servers: arrived at CERN Jan-2008
- Additional mounting rails for servers: ordered Jan-2008
- Servers: stress-tested Jan-2008
- Rack space: liberated Feb-2008
- Server and storage: fully installed 7-Mar-2008
- Oracle system software: installed, configured 14-Mar-2008
- Database structures: deployed (AB/CO/DM)
- Database services: ready for switch-over (1-day stop)
- Switch to services of new platform: 21-Mar-2008?
- Migration of existing 5TB logging data to new platform: (later)
- Purchase additional logging storage for beyond 2008: (Sep-2008)

17 Implications, Policy and Constraints for Applications
Foreseen for all services, already implemented for a few:
- Implications
  - All applications should be cluster-aware
  - Database load-balancing / fail-over (connection modifications)
  - Application fail-over (application modifications)
- Policy
  - Follow naming conventions for data objects
- Constraints
  - Use APIs for data transfer (no direct table access)
  - Enforce controlled data access
  - Register authorized applications (purpose, responsible)
  - Implement application instrumentation
  - Provide details of all database operations (who, what, where)
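The "connection modifications" needed for database load-balancing and fail-over are not detailed in the slides. One plausible form, sketched here as an assumption, is an Oracle Net connect descriptor that lists every cluster node and enables the `LOAD_BALANCE` and `FAILOVER` address-list parameters. The helper name, host names and service name below are placeholders, not the real CERN configuration.

```python
from typing import Sequence


def rac_connect_descriptor(
    hosts: Sequence[str], service_name: str, port: int = 1521
) -> str:
    """Build an Oracle Net connect descriptor listing all cluster nodes.

    LOAD_BALANCE=on lets the client pick an address at random;
    FAILOVER=on lets it try the next address if one node is unreachable.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host in hosts
    )
    return (
        "(DESCRIPTION="
        "(ADDRESS_LIST=(LOAD_BALANCE=on)(FAILOVER=on)"
        f"{addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


# Placeholder node and service names, for illustration only.
descriptor = rac_connect_descriptor(
    ["node1.example.org", "node2.example.org"], "example_service"
)
print(descriptor)
```

A descriptor like this covers only connect-time behaviour. Application fail-over, the second implication on the slide, still has to be handled in the application itself, for example by retrying an interrupted operation.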

18 Logging Data: Expected Vs Acceptable
Beam related equipment starting to produce data
- BLM
  - 6,400 monitors * 12 * 2 (losses & thresholds) + crate status = ~154,000 values per second (filtered by concentrator & MDB)
- XPOC
- More to come…
Limits
- Maximum: 1 Hz data frequency in Logging database
- Not a data dump
- Consider final data usage before logging – only log what is needed
- Logging noise will have a negative impact on data extraction performance and analysis
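The BLM figure above can be checked with a few lines of arithmetic. The sketch assumes the factors mean what the slide suggests: 6,400 monitors, a factor of 12 whose meaning the slide does not spell out, and 2 kinds of value (losses and thresholds). The crate status contribution is not quantified on the slide, so it is left out; the slide's ~154,000 is this product rounded up with crate status included.

```python
MONITORS = 6_400
FACTOR_PER_MONITOR = 12  # the slide's "* 12"; its meaning is not stated there
KINDS_OF_VALUE = 2       # losses and thresholds
SECONDS_PER_DAY = 24 * 3600


def blm_values_per_second(
    monitors: int = MONITORS,
    factor: int = FACTOR_PER_MONITOR,
    kinds: int = KINDS_OF_VALUE,
) -> int:
    """BLM values produced each second, excluding crate status."""
    return monitors * factor * kinds


def values_per_day(values_per_second: int) -> int:
    """Values accumulated in one day at a constant per-second rate."""
    return values_per_second * SECONDS_PER_DAY


per_second = blm_values_per_second()      # 153600
per_day = values_per_day(per_second)      # 13271040000
print(f"{per_second:,} values/s, {per_day:,} values/day if nothing were filtered")
```

At more than 13 billion values per day before filtering, the filtering by the concentrator and MDB and the 1 Hz ceiling in the Logging database are what keep the logged volume manageable.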

19 The Future
Logging Data
- Original idea → keep data available online indefinitely
- Data rates estimated ~10TB/year
- Closely monitor evolution of storage usage
- Order new disks for 2009 data (in Sept 2008)
- Migrate existing data (~4TB) to new disks
Service Availability
- New infrastructure has high-redundancy for high-availability
- Scheduled interventions will still need to be planned
- Use of a standby database will be investigated, with the objective of reaching 100% uptime for small databases
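To put the ~10TB/year estimate in perspective, it can be turned into an average sustained rate and a simple cumulative projection. This is a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^6 MB), a constant rate over the year, and the ~4TB of existing data as the starting point. The function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years


def average_rate_mb_per_s(tb_per_year: float) -> float:
    """Average sustained rate in MB/s for a yearly volume given in TB."""
    return tb_per_year * 1e6 / SECONDS_PER_YEAR


def projected_volume_tb(existing_tb: float, tb_per_year: float, years: float) -> float:
    """Total volume kept online after the given number of years."""
    return existing_tb + tb_per_year * years


rate = average_rate_mb_per_s(10.0)  # roughly 0.32 MB/s on average
for years in (1, 2, 3):
    total = projected_volume_tb(4.0, 10.0, years)
    print(f"after {years} year(s): about {total:.0f} TB online")
```

The average rate is modest, but keeping all data online indefinitely makes the total grow without bound, which is why the slide plans yearly disk orders and close monitoring of storage usage.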

20 Conclusions
- Databases play a vital role in the commissioning and operation of the accelerators
- Database performance and availability have a direct impact on operations
- Today, the main server SUNLHCLOG is heavily overloaded
- Based on experience and the evolution of existing services, the new database infrastructure has been carefully planned to:
  - Address performance issues
  - Provide maximum availability
  - Provide independence between the key services
  - Scale with data volumes and future requirements
- The new database infrastructure should be operational ahead of injector chain start-up and LHC parallel sector HWC

