Experience in running relational databases on clustered storage
On behalf of the IT-DB storage team, IT Department
HEPIX 2014, Nebraska Union, University of Nebraska – Lincoln, USA

Agenda
  CERN databases: basic description
  Storage evolution using NetApp
  Caching technologies: Flash Cache, Flash Pool
  Data motion
  Snapshots
  Cloning
  Backup to disk
  directNFS
  Monitoring: in-house tools, NetApp tools
  iSCSI access
  Conclusions


CERN's Databases
~100 Oracle databases, most of them RAC
Mostly NAS storage plus some SAN with ASM
~500 TB of data files for production DBs in total
Examples of critical production DBs:
  LHC logging database: ~190 TB, expected growth up to ~70 TB / year
  13 production experiments' databases: ~10-20 TB each
  Read-only copies (Active Data Guard)
But also DBaaS, as single instances:
  148 MySQL Open community databases (5.6.17)
  15 PostgreSQL databases (being migrated to 9.2.9)
  12 Oracle 11g, migrating towards Oracle 12c multi-tenancy

Use case: Quench Protection System
Critical system for LHC operation
Major upgrade for LHC Run 2
High-throughput data storage requirement: constant load of 150k changes/s from 100k signals
The whole data set is transferred to the long-term storage DB: query + filter + insertion
Analysis performed on both DBs
[Diagram labels: 16 projects around the LHC, RDB archive, backup, LHC Logging (long-term storage)]

Quench Protection System: tests
After two hours of buffering
Nominal conditions:
  Stable constant load of 150k changes/s
  100 MB/s of I/O operations
  500 GB of data stored each day
Peak performance:
  Exceeded 1 million value changes per second
  MB/s of I/O operations

Oracle basic setup
Mount Options for Oracle files when used with NFS on NAS devices (Doc ID )
[Diagram labels: global namespace; Oracle RAC database, at least 10 file systems; 10GbE; 12 Gbps; GPN; private network (mtu=9000); mtu=1500]
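For illustration only (the authoritative option list is in the MOS note cited above), an NFSv3 mount of a datafile volume typically looks like the sketch below; the controller name and paths are placeholders:

  # Typical Oracle-over-NFS options: hard mounts, TCP, no attribute caching.
  mount -t nfs -o rw,bg,hard,nointr,rsize=65536,wsize=65536,proto=tcp,vers=3,timeo=600,actimeo=0 \
      dbnas01-priv:/ORA/dbs03/MYDB /ORA/dbs03/MYDB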

Oracle file systems

  Mount point                        Content
  /ORA/dbs0a/${DB_UNIQUE_NAME}       ADR (including listener) and /adump log files
  /ORA/dbs00/${DB_UNIQUE_NAME}       Control file + copy of online redo logs
  /ORA/dbs02/${DB_UNIQUE_NAME}       Control file + archive logs (FRA)
  /ORA/dbs03/${DB_UNIQUE_NAME} *     Datafiles
  /ORA/dbs04/${DB_UNIQUE_NAME}       Control file + copy of online redo logs + block change tracking file + spfile
  /ORA/dbs0X/${DB_UNIQUE_NAME} *     More datafile volumes if needed
  /CRS/dbs00/${DB_UNIQUE_NAME}       Voting disk
  /CRS/dbs02/${DB_UNIQUE_NAME}       Voting disk + OCR
  /CRS/dbs00/${DB_UNIQUE_NAME}       Voting disk + OCR

  * Mounted using their own LIF to ease volume movements within the cluster


NetApp evolution at CERN (last 8 years)
[Timeline: FAS3000 controllers with DS14 mk4 FC shelves (2 Gbps) and 100% FC disks, running Data ONTAP® 7-mode (scaling up), evolving to FAS6200 & FAS8000 controllers with DS4246 shelves (6 Gbps) and Flash Pool/Flash Cache (100% SATA disks + SSD), running Data ONTAP® Cluster-Mode (scaling out)]

A few 7-mode concepts
Private network, FlexVolume, Remote LAN Manager, Service Processor
Rapid RAID Recovery, Maintenance Center (at least 2 spares)
raid_dp or raid4; raid.scrub.schedule (once weekly), raid.media_scrub.rate (constantly); reallocate
Thin provisioning
Client access: file access (NFS, CIFS) and block access (FC, FCoE, iSCSI)

A few C-mode concepts
Networks: private network, cluster interconnect, cluster management network, client access
Shells: cluster shell, node shell, systemshell
cluster ring show; RDB units: vifmgr + bcomd + vldb + mgmt
Vserver (protected via SnapMirror), global namespace
Log files from the controller are no longer accessible by a simple NFS export
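For orientation, a minimal sketch of moving between these shells from a cluster admin SSH session; the cluster and node names (rac50, rac50-01) are hypothetical:

  # 'cluster ring show' (advanced privilege) lists the RDB rings: mgmt, vldb, vifmgr, bcomd.
  # 'system node run' executes a nodeshell command on one node.
  ssh admin@rac50 <<'EOF'
  set -privilege advanced -confirmations off
  cluster ring show
  system node run -node rac50-01 -command version
  EOF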

Consolidation
From 56 controllers (FAS3000) and 2300 disks (1400 TB of storage), organised as storage islands accessible via a private network, to FAS6220 controllers and 960 disks (1660 TB of storage)
Easier management, but difficulties finding slots for interventions

RAC50 setup
Cluster interconnect using FC GBICs for distances longer than 5 m; SFPs must be from Cisco (if Cisco switches are in use)
[Diagram: cluster interconnect, primary switch (private network), secondary switch (private network)]


Flash Cache
Helps increase random IOPS on disks
Warm-up effect (options flexscale.rewarm)
  cf operations (takeover/giveback) invalidate the cache; user-initiated ones do not since ONTAP 8.1
TR-3832: Flash Cache Best Practice Guide (NetApp technical report)
For databases, decide what volumes to cache:
  fas3240> priority on
  fas3240> priority set volume volname cache=[reuse|keep]
  options flexscale.lopri_blocks off

Flash Cache: database benchmark
Inner table (3 TB) where one row = one block (8 KB); outer table (2% of the inner table) where each row contains a rowid of the inner table
Measured via v$sysstat 'physical reads'
Starts with db file sequential read but after a little while changes to db file parallel read
[Chart: random read IOPS (first and second run) for no PAM, PAM + kernel NFS (RHEL5), and PAM + dNFS; annotations: ~472 data disks, ~240 data disks]
* fas3240, 32 SATA 2 TB disks, Data ONTAP 8.0.1, Oracle 11gR2
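A sketch of the rowid-driven query behind such a benchmark, assuming the driver table stores rowids in a ROWID column; the table and column names (driver_2pct, inner_3tb, inner_rowid) are hypothetical stand-ins, not the actual test code:

  # Each fetch from the 3 TB inner table by rowid is one random 8 KB read.
  sqlplus -s / as sysdba <<'SQL'
  select /*+ full(o) */ count(i.pad)
  from driver_2pct o, inner_3tb i
  where i.rowid = o.inner_rowid;

  select name, value from v$sysstat where name = 'physical reads';
  SQL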

Flash Cache: long-running backups…
During backups the SSD cache is flushed; IO latency increases, hit% on PAM goes down ~1%
Possible solutions:
  Data Guard
  priority set enabled_components=cache
  Large IO windows to improve sequential IO detection, possible in C-mode:
    vserver nfs modify -vserver vs1 -v3-tcp-max-read-size
[Chart © Luca Canali]


Flash Pool aggregates
Require 64-bit aggregates
Aggregate snapshots must be deleted before converting into a hybrid aggregate
SSD rules: minimum number and extension sizes depend on the model, e.g. FAS , 6 (with 100 GB SSD)
No mixed disk types in a hybrid aggregate: just SAS + SSD, FC + SSD or SATA + SSD; no mixed disk types in a raid group
Different protection levels can be combined between the SSD RAID and the HDD RAID, e.g. raid_dp or raid4
A hybrid aggregate cannot be rolled back
If the SSD raid groups are not available, the whole aggregate is down
SSD raid groups do not count towards the total aggregate space
Maximum SSD size depends on model & ONTAP release
TR-4070: Flash Pool Design and Implementation Guide
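A minimal sketch (clustered ONTAP) of converting an existing HDD aggregate into a Flash Pool; the aggregate name (aggr1_rac5071) and SSD count are illustrative and must respect the per-model minimums listed above:

  ssh admin@rac50 <<'EOF'
  storage aggregate modify -aggregate aggr1_rac5071 -hybrid-enabled true
  storage aggregate add-disks -aggregate aggr1_rac5071 -disktype SSD -diskcount 6
  EOF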

Flash Pool behaviour
Blocks going into SSD are determined by write and read policies; they apply per volume or globally on the whole aggregate
Sequential data is not cached; data cannot be pinned
A heat map decides what stays in the SSD cache and for how long
[Diagram: random overwrites of size < 16 KB are inserted into SSD, other writes go to disk; reads insert blocks into SSD; blocks move between hot, warm, neutral and cold states; the eviction scanner runs every 60 seconds when SSD consumption > 75% and evicts cold blocks]

Flash Pool: performance counters
Performance counters: wafl_hya_per_aggr (299 counters) & wafl_hya_per_vvol (16 counters)
We have automated the way to query those
Around 25% difference in an empty system: ensures enough pre-erased blocks to write new data; read-ahead caching algorithms
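One way to query these counter objects from the cluster shell at diagnostic privilege; the cluster name is a placeholder and, depending on the ONTAP release, a sample may first have to be collected with 'statistics start':

  ssh admin@rac50 <<'EOF'
  set -privilege diagnostic -confirmations off
  statistics show -object wafl_hya_per_aggr
  statistics show -object wafl_hya_per_vvol
  EOF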

Flash Pool behaviour: warm-up times
Using fio: 500 GB dataset, random IO
Read cache warms up slower than write cache (roughly 6 hours in this test)
Reads (ms) cost more than writes (μs)
Stats of SSD consumption can be retrieved using the wafl_hya_per_vvol object, at the nodeshell in diagnostic level
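A hypothetical fio job approximating this kind of test (500 GB dataset of random IO on an NFS-mounted volume); the block size, read/write mix and runtime are assumptions, not taken from the slide:

  # Random 8 KB IO, direct, for 6 hours against a scratch directory.
  fio --name=flashpool-warmup --directory=/ORA/dbs03/BENCH \
      --rw=randrw --rwmixread=70 --bs=8k --direct=1 \
      --ioengine=libaio --iodepth=32 --size=500g \
      --runtime=21600 --time_based --group_reporting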

Flash Pool: random reads
[Chart: random read IOPS equivalent to ~600 HDDs, achieved with just 48 HDDs in reality]

[Chart: 1 TB dataset, 100% in SSD, 56 sessions, random reads]

[Chart: 10 TB dataset, 36% in SSD, 32 sessions, random reads]

Flash Pool: long-running backups
[Chart: full backup running for 3.5 days]


Vol move
Powerful feature: rebalancing, interventions, … with whole-volume granularity
Transparent, but watch out on volumes with high IO (writes)
Based on SnapMirror technology: an initial transfer is followed by incremental updates and a final cutover
Example vol move command:
  rac50::> vol move start -vserver vs1rac50 -volume movemetest -destination-aggregate aggr1_rac5071 -cutover-window 45 -cutover-attempts 3 -cutover-action defer_on_failure
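A running move can then be followed from the cluster shell; a minimal sketch, reusing the names from the example above:

  ssh admin@rac50 <<'EOF'
  volume move show -vserver vs1rac50 -volume movemetest
  EOF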

Oracle 12c: online datafile move
Very robust, even with high IO load
It takes advantage of database memory buffers
Works with OMF (Oracle Managed Files)
Track it in the alert.log and in v$session_longops

Oracle 12c: online datafile move (II)
[Screenshot: an "alter database move datafile" statement and the alert.log entries confirming "Move was completed"]
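A hypothetical example of the statement and of tracking its progress; the file paths are placeholders:

  sqlplus -s / as sysdba <<'SQL'
  alter database move datafile '/ORA/dbs03/MYDB/data01.dbf' to '/ORA/dbs0a/MYDB/data01.dbf';

  select sofar, totalwork, units from v$session_longops where opname like 'Online data file move%';
  SQL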


DBaaS: backup management
Same backup procedure for all RDBMS
Backup workflow:
  1. Quiesce the database:
     mysql> FLUSH TABLES WITH READ LOCK; FLUSH LOGS;
     or Oracle> alter database begin backup;
     or PostgreSQL> SELECT pg_start_backup('$SNAP');
  2. Take a storage snapshot
  3. Resume:
     mysql> UNLOCK TABLES;
     or Oracle> alter database end backup;
     or PostgreSQL> SELECT pg_stop_backup(), pg_create_restore_point('$SNAP');
  … some time later: a new snapshot
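As an illustration, a minimal bash sketch of the Oracle flavour of this workflow (quiesce, snapshot, resume); the controller address, vserver and volume names are hypothetical, and the production tooling actually drives the snapshot through ZAPI (takesnap_zapi.pl) rather than ssh:

  sqlplus -s / as sysdba <<'SQL'
  alter database begin backup;
  SQL

  # Snapshot of the datafile volume; in practice this would target the
  # cluster management LIF, and all names below are placeholders.
  ssh admin@dbnasr0009 "volume snapshot create -vserver vs1rac50 \
      -volume ORA_dbs03_PUBSTG -snapshot backup_$(date +%Y%m%d%H%M)"

  sqlplus -s / as sysdba <<'SQL'
  alter database end backup;
  SQL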

Snapshots in Oracle
Storage-based technology
Speed-up of backups/restores: from hours/days to seconds
Handled by a plug-in in our backup and recovery solution. Example:
  /etc/init.d/syscontrol --very_silent -i rman_backup start -maxRetries 1 -exec takesnap_zapi.pl -debug -snap dbnasr0009-priv:/ORA/dbs03/PUBSTG level_EXEC_SNAP -i pubstg_rac50
Drawback: lack of integration with RMAN
  ONTAP commands: snap create/restore (SnapRestore requires a license)
  Snapshots are not available via the RMAN API, but some solutions exist: NetApp MML Proxy API, Oracle SnapManager
Measured snapshot times:
  pubstg: 280 GB size, ~1 TB archivelogs/day: 8 seconds
  adcr: 24 TB size, ~2.5 TB archivelogs/day: 9 seconds


Cloning
On the storage backend, a new license is required
Offers new possibilities, especially for clustered databases
Oracle 12c multi-tenancy and PostgreSQL both include special SQL to handle cloning
TR-4266: NetApp Cloning Plug-in for Oracle Multitenant Database 12c
It can be combined with any data protection strategy that involves snapshots (not in place yet), e.g. create a cloned DB from a backup snapshot to quickly test an application upgrade
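A minimal sketch of what the underlying storage clone could look like with FlexClone on clustered ONTAP; the volume, snapshot and junction path names are hypothetical:

  ssh admin@rac50 <<'EOF'
  volume clone create -vserver vs1rac50 -flexclone ORA_dbs03_PUBSTG_clone -parent-volume ORA_dbs03_PUBSTG -parent-snapshot backup_20150413 -junction-path /ORA/dbs03/PUBSTG_CLONE
  EOF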


Backup architecture
Custom solution: about 15k lines of code, Perl + Bash
Export policies restrict access to the NFS shares to the DB servers
Extensive use of compression and some deduplication
Storage compression does not require a license; an Oracle Advanced Compression license is required for the low, medium and high types (see later)
We send compressed: 1 out of 4 full backups, and all archivelogs
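For illustration, a hedged RMAN sketch of a compressed backup to an NFS-mounted destination; the paths are placeholders, and the MEDIUM, LOW and HIGH algorithms need the Advanced Compression license mentioned above (BASIC does not):

  rman target / <<'RMAN'
  configure compression algorithm 'MEDIUM';
  backup as compressed backupset database format '/backup/MYDB/full_%U.bkp';
  backup as compressed backupset archivelog all format '/backup/MYDB/arch_%U.bkp';
  RMAN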

Backup to disk: throughput (one head)
[Chart: backup throughput over time; annotations: data scrubbing, compression ratio]
720 TB used, 603 TB saved, mainly due to compression but also deduplication (control files)

Oracle 12c compression

Oracle , new servers (32 cores, 129 GB RAM); dataset: 392 GB (devdb11)
  basic: 62.24 GB (1h54') | low: 89.17 GB (27'30'') | medium: 73.84 GB (1h01') | high: 50.71 GB (7h17') | no-compressed-fs: 349 GB (22'35'') | inline compression NetApp 8.2P3: 137 GB (22'35'')
  Percentage saved: 82% | 74.4% | 78.8% | 85.4% | 0% | 62%

Oracle , new servers (32 cores, 129 GB RAM); dataset: 376 GB (devdb11 upgraded to 12c)
  basic: 45.2 GB (1h29') | low: 64.13 GB (22') | medium: 52.95 GB (48') | high: 34.17 GB (5h17') | no-compressed-fs: 252.8 GB (22') | inline compression NetApp 8.2P3: 93 GB (20')
  Percentage saved: 82.1% | 74.6% | 79% | 86.4% | 0% | 64.5%

Dataset: 229.2 GB (tablespace using Oracle Crypto)
  basic: 57.4 GB (2h45') | low: 57.8 GB (10') | medium: 58.3 GB (44'') | high: 56.7 GB (4h13') | no-compressed-fs: 230 GB (12'30'') | inline compression NetApp 8.2P3: 177 GB (15'45'')
  Percentage saved: 74.95% | 74.7% | 74.5% | 75.2% | 0% | 22.7%

Test CPU: Intel(R) Xeon(R) CPU E GHz


Oracle directNFS
Oracle 12c: enable dNFS by running
  $ORACLE_HOME/rdbms/lib/make -f ins_rdbms.mk dnfs_on
Relevant notes:
  Mount Options for Oracle files when used with NFS on NAS devices [ID ]
  RMAN backups for disk backups: kernel NFS [ID ]
  Linux/NetApp: RHEL/SUSE Setup Recommendations for NetApp Filer Storage (Doc ID )
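dNFS takes its mount points either from the operating system mount table or from an oranfstab file; a hypothetical entry is sketched below, with the server name, LIF addresses and paths as placeholders:

  cat >> $ORACLE_HOME/dbs/oranfstab <<'EOF'
  server: dbnas01
  local: 10.1.1.10
  path: 10.1.1.20
  export: /ORA/dbs03/MYDB mount: /ORA/dbs03/MYDB
  EOF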

Oracle directNFS (II)
[Chart: dNFS vs kernel NFS stack while using parallel queries]


In-house tools
Main aim is to give our DBAs and system admins access to the storage
Based on ZAPI (the NetApp storage API) and written in Perl and Bash; about 5000 lines of code
All scripts work on both C-mode and 7-mode, so there is no need to know how to connect to the controllers or the different ONTAP CLI versions

In-house tool: snaptool.pl
Actions: create, list, delete, clone, restore, …
API also available programmatically
[Screenshot: example invocation]

In-house tool: smetrics
Check online statistics of a particular file system or of the controller serving it
Volume stats & histograms
[Screenshot: example output]


NetApp monitoring/management tools
OnCommand Unified Manager 5.2 (Linux):
  Authentication using PAM
  Extensive use of reporting (in 7-mode)
  Works for both 7-mode and C-mode
  Performance management console (performance counter display)
  Alarms
OnCommand Performance Manager (OPM) & OnCommand Unified Manager (OUM):
  Used for C-mode
  Virtual machine (VM) that runs on a VMware ESX or ESXi server
System Manager: we use it mainly to check setups
My AutoSupport at the NOW website


iSCSI SAN
iSCSI requires an extra license
All storage features are available for LUNs
Access through the CERN routable network
Using the NetApp Cinder (OpenStack block storage) driver to export iSCSI SAN storage to KVM or Hyper-V hypervisors
The solution is being tested
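A hypothetical cinder.conf backend stanza for the NetApp driver speaking iSCSI to clustered ONTAP; the host name, credentials and SVM name are placeholders, and the backend also has to be listed in enabled_backends:

  cat >> /etc/cinder/cinder.conf <<'EOF'
  [netapp-iscsi]
  volume_backend_name     = netapp-iscsi
  volume_driver           = cinder.volume.drivers.netapp.common.NetAppDriver
  netapp_storage_family   = ontap_cluster
  netapp_storage_protocol = iscsi
  netapp_server_hostname  = rac50-mgmt.cern.ch
  netapp_server_port      = 443
  netapp_transport_type   = https
  netapp_login            = openstack
  netapp_password         = secret
  netapp_vserver          = vs1rac50
  EOF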

Conclusions
Positive experience so far running on C-mode
Mid- to high-end NetApp NAS provides good performance using the Flash Pool SSD caching solution
The flexibility of clustered ONTAP helps to reduce the investment
The design of stacks and network access requires careful planning

Acknowledgements
IT-DB colleagues, especially Lisa Azzurra Chinzer and Miroslav Potocky, members of the storage team

Questions