LCG 3D and Oracle Cluster Storage Workshop 20-21 Mar 2006 Barbara Martelli, INFN - CNAF Gianluca Peco, INFN Bologna
Overview: LCG 3D project; LCG 3D deployment plans; Oracle technologies deployed at CNAF; experience with Oracle Real Application Clusters; RAC test results
LCG 3D Goals LCG 3D is a joint project between the service providers (CERN and LCG sites) and the service users (experiments and grid projects) aimed at: defining distributed database services and application access, allowing LCG applications and services to find relevant database back-ends, authenticate and use the provided data in a location-independent way; helping to avoid the costly parallel development of data distribution, backup and high-availability mechanisms in each experiment or grid site, in order to limit the support costs; enabling a distributed deployment of an LCG database infrastructure with a minimal number of LCG database administration personnel.
LCG 3D Non-Goals Store all database data: experiments are free to deploy databases and replicate data under their responsibility. Set up a single monolithic distributed database system: given constraints like WAN connections, one cannot assume that a single synchronously updated database would work or give sufficient availability. Set up a single-vendor system: technology independence and a multi-vendor implementation will be required to minimize the long-term risks and to adapt to the different requirements/constraints on the different tiers.
LCG 3D Project Structure WP1 - Data Inventory and Application Requirements Working Group: members are software providers from experiments and grid services based on RDBMS data; they gather data properties (volume, ownership, access patterns) and requirements, and integrate the provided service into their software. WP2 - Service Definition and Implementation Working Group: members are site technology and deployment experts; they propose an agreeable deployment setup and common deployment procedures, and deploy the DB service according to the agreed 3D policies. CNAF is involved in WP2.
Proposed Service Architecture (diagram): T0 - autonomous, reliable service; T1 - db backbone: all data replicated, reliable service; T2 - local db cache: subset of data, only local service; T3/4 below. Replication between the Oracle (O) back-ends at T0/T1 via Oracle Streams; cross-vendor extraction to MySQL (M), files and proxy caches at the lower tiers. Slide by Dirk Duellmann, CERN-IT.
Proposed Service Structure 3 separate environments with different service levels and different HW resources. Development environment: shared HW setup, limited DBA support (via email), 8/5 monitoring and availability. Integration: dedicated node for a defined slot (usually a week) to perform performance and functionality tests, DBA support via email or phone. Production: 24/7 monitoring and availability, backups every 10 minutes, a limited number of scheduled interventions.
Present CNAF DB Service Structure The 3D proposed service policy will be met in steps. At present we have: a test environment (used for testing new Oracle-based technologies such as RAC, Grid Control and Streams); a preproduction environment composed of 2 dual-node RACs, each with 12 FC disks of shared storage, allocated to LHCb and ATLAS; 1 HP ProLiant DL380 G4 for service instances such as the Castor2 stager and FTS. By the end of April the preproduction environment will be moved to production. When the RAC tests are finished, the test machines will become our development/integration environment.
3D Milestones 31.03.06 Tier-1 services start - milestone for early-production Tier-1 sites. CNAF will start with 2 dual-node RACs (LHCb, ATLAS): bookkeeping DB replica, LFC file catalog replica, VOMS replica. 31.05.06 Service review workshop >> hardware defined for full production; experiment and site reports after the first 3 months of service deployment; define DB resource requirements for the full service - milestone for experiments and all Tier-1 sites. 30.09.06 Full LCG database service in place - milestone for all Tier-1 sites.
3D – Oracle Technologies The 3D project leverages several Oracle technologies to guarantee the scalability and reliability of the provided DB services: Oracle Streams Replication, Oracle Real Application Clusters, Oracle Enterprise Manager / Grid Control.
Oracle Streams Replication Streams works in three stages: capture, staging and consumption. Capture: Streams captures events implicitly (log-based capture of DML and DDL) or explicitly (direct enqueue of user messages); captured events are published into the staging area. Staging: the staging area is implemented as a queue; messages remain in the staging area until consumed by all subscribers; other staging areas can subscribe to events in the same database or in a remote database, and events can be routed through a series of staging areas.
Consumption: staged events are consumed by subscribers, implicitly via an apply process (default apply or user-defined apply) or explicitly via application dequeue through an API (C++, Java, ...). Transformations can be performed as events enter, leave or propagate between staging areas. The default apply engine directly applies the DML or DDL represented in the LCR, either to a local Oracle table or via a DB link to a non-Oracle table. Conflict detection is automatic, with optional resolution; unresolved conflicts are placed in an exception queue. Configuration is rule based, with rules expressed as a "WHERE" clause. A minimal configuration sketch follows.
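To make the capture/staging/apply chain above concrete, here is a minimal sketch of a one-way Streams table replication setup. The Streams administrator schema STRMADMIN, the replicated table SCOTT.TABLE1 and the database names SOURCE_DB/DEST_DB are hypothetical placeholders, not the 3D production configuration, and the table instantiation step (export/import plus instantiation SCN) is omitted.

-- On the source database, as the (hypothetical) Streams administrator STRMADMIN:
-- 1. Create the staging area: an ANYDATA queue used by capture and propagation.
BEGIN
  DBMS_STREAMS_ADM.SET_UP_QUEUE(
    queue_table => 'strmadmin.capture_qt',
    queue_name  => 'strmadmin.capture_q');
END;
/
-- 2. Create a capture process that mines the redo log for DML on the table.
BEGIN
  DBMS_STREAMS_ADM.ADD_TABLE_RULES(
    table_name   => 'scott.table1',
    streams_type => 'capture',
    streams_name => 'capture_t1',
    queue_name   => 'strmadmin.capture_q',
    include_dml  => TRUE,
    include_ddl  => FALSE);
END;
/
-- 3. Propagate staged LCRs to the destination queue over a database link.
BEGIN
  DBMS_STREAMS_ADM.ADD_TABLE_PROPAGATION_RULES(
    table_name             => 'scott.table1',
    streams_name           => 'prop_t1',
    source_queue_name      => 'strmadmin.capture_q',
    destination_queue_name => 'strmadmin.apply_q@dest_db',
    include_dml            => TRUE,
    include_ddl            => FALSE);
END;
/
-- On the destination database:
-- 4. Create the local queue and an apply process that consumes the staged LCRs.
BEGIN
  DBMS_STREAMS_ADM.SET_UP_QUEUE(
    queue_table => 'strmadmin.apply_qt',
    queue_name  => 'strmadmin.apply_q');
  DBMS_STREAMS_ADM.ADD_TABLE_RULES(
    table_name      => 'scott.table1',
    streams_type    => 'apply',
    streams_name    => 'apply_t1',
    queue_name      => 'strmadmin.apply_q',
    include_dml     => TRUE,
    include_ddl     => FALSE,
    source_database => 'source_db');
END;
/
-- 5. Start the apply process first, then the capture process.
EXEC DBMS_APPLY_ADM.START_APPLY(apply_name => 'apply_t1');
EXEC DBMS_CAPTURE_ADM.START_CAPTURE(capture_name => 'capture_t1');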
Example of Streams Replication A user executes an update statement at the source node: update table1 set field1='value3' where table1id='id1'; The capture process reads the change from the redo log of the source node, the resulting LCRs are propagated to the queue on the destination node (which acknowledges them back to the source), and the apply process applies the update to table1 on the destination, so that row id1 holds value3 there as well.
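As a hedged illustration of this flow, using the hypothetical SCOTT.TABLE1 and the setup sketched above, the change issued at the source can be checked at the destination and the apply side monitored through standard dictionary views:

-- Source node: the user DML that the capture process picks up from the redo log.
UPDATE scott.table1 SET field1 = 'value3' WHERE table1id = 'id1';
COMMIT;

-- Destination node: after propagation and apply, the same row is visible.
SELECT table1id, field1 FROM scott.table1 WHERE table1id = 'id1';

-- Apply-side monitoring: process status and any LCRs that ended up in the error queue.
SELECT apply_name, status FROM dba_apply;
SELECT apply_name, local_transaction_id, error_message FROM dba_apply_error;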
Oracle Real Application Clusters The Oracle Real Application Clusters (RAC) technology allows a single database to be shared among several database servers. All datafiles, control files, PFILEs and redo log files in RAC environments must reside on cluster-aware shared disks so that all of the cluster database instances can access them. RAC aims to provide highly available, fault-tolerant and scalable database services. (Diagram: database servers connected over the network to shared disks on a cluster filesystem.)
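As a rough illustration of the "one database, several instances" model, the sketch below lists the handful of RAC-specific initialization parameters a two-instance database might use in its server parameter file; the database name LHCBR, the instance names LHCBR1/LHCBR2 and the file paths are hypothetical, not the CNAF setup.

# Shared across all instances: the database is flagged as a cluster database and
# its control files live on the shared (cluster-aware) storage.
*.db_name          = 'LHCBR'
*.cluster_database = TRUE
*.control_files    = '/ocfs2/oradata/LHCBR/control01.ctl', '/ocfs2/oradata/LHCBR/control02.ctl'
# Per-instance settings: each instance has its own number, redo thread and undo tablespace.
LHCBR1.instance_number = 1
LHCBR2.instance_number = 2
LHCBR1.thread          = 1
LHCBR2.thread          = 2
LHCBR1.undo_tablespace = 'UNDOTBS1'
LHCBR2.undo_tablespace = 'UNDOTBS2'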
Shared Storage on RAC Various choices for shared storage management in RAC: Raw devices. Network file systems (supported only on certified devices). Oracle Cluster File System v2 (OCFS2): POSIX-compliant and general purpose (it can also be used for the Oracle Homes); automatically configured to use direct I/O; enable asynchronous I/O by setting filesystemio_options = SETALL (see the sketch below). Automatic Storage Management (ASM): a logical volume manager with striping and mirroring, and dynamic data distribution within and between storage arrays.
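A minimal sketch of the parameter change mentioned for OCFS2, assuming an SPFILE-based RAC configuration; filesystemio_options is a static parameter, so the change takes effect only after the instances are restarted.

-- Enable both direct and asynchronous I/O for datafile access on all instances.
ALTER SYSTEM SET filesystemio_options = 'SETALL' SCOPE=SPFILE SID='*';

-- After the restart, verify the value on every RAC instance.
SELECT inst_id, value FROM gv$parameter WHERE name = 'filesystemio_options';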
Preproduction Environment Setup (diagram): two dual-node RACs, rac-lhcb-01/rac-lhcb-02 and rac-atlas-01/rac-atlas-02; each node is a dual Xeon 3.2 GHz with 4 GB memory and 2 x 73 GB disks in RAID-1. The nodes are connected through two Gigabit switches, with private LHCb and ATLAS links for the cluster interconnects, and the shared storage is provided by two Dell 224F arrays with 14 x 73 GB disks each.
RAC testbed (diagram): 4 nodes (ORA-RAC-01 to ORA-RAC-04), each a dual Xeon 2.8 GHz with 4 GB RAM, Red Hat Enterprise 4 on RAID-1 disks, 2 x Intel PRO/1000 NICs and 1 QLogic 2312 FC HBA with 2 x 2 Gb/s links. Disk I/O traffic goes through a Fibre Channel switch to an IBM FAStT900 FC RAID controller serving a 1.2 TB RAID-5 disk array formatted with OCFS2. GigaSw1 carries the private network for interconnect traffic; GigaSw2 carries the public and VIP network interfaces towards the clients.
RAC Test AS3AP, 1-4 nodes: select query workload, 1 GB database cache (results chart).
RAC Test AS3AP, 1-4 nodes: select query workload, 8 GB, no db cache (results chart).
RAC Test OLTP, 4 nodes (results chart).
RAC Test OLTP, 1-2-4 nodes: with an OLTP application the scalability of the system is less evident. The shared-disk storage setup used is probably not well suited to the application's data access pattern.
RAC Test OLTP, 4 nodes: transactions per minute with an OLTP (Order Entry) workload, comparing O_DIRECT and ASYNC_IO both enabled against both disabled (results charts).
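A hedged sketch of how the two I/O configurations compared here can be switched, assuming they are driven by the filesystemio_options parameter described earlier (SETALL enables both direct and asynchronous I/O, NONE disables both); each change needs an instance restart before the workload is rerun.

-- Configuration 1: O_DIRECT and ASYNC_IO enabled.
ALTER SYSTEM SET filesystemio_options = 'SETALL' SCOPE=SPFILE SID='*';
-- Restart the instances and run the OLTP (Order Entry) workload.

-- Configuration 2: O_DIRECT and ASYNC_IO disabled.
ALTER SYSTEM SET filesystemio_options = 'NONE' SCOPE=SPFILE SID='*';
-- Restart the instances and rerun the same workload for the comparison.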