Implementing ASM Without HW RAID, A User’s Experience


Implementing ASM Without HW RAID, A User's Experience
Luca Canali, CERN
Dawid Wojcik, CERN
UKOUG, Birmingham, December 2008

Outline
Introduction to ASM: disk groups, failgroups, normal redundancy
Scalability and performance of the solution
Possible pitfalls, sharing experiences
Implementation details, monitoring, and tools to ease ASM deployment

Architecture and main concepts
Why ASM?
Provides the functionality of a volume manager and a cluster file system
Raw access to storage for performance
Why ASM-provided mirroring?
Allows the use of lower-cost storage arrays
Allows mirroring across storage arrays, so the arrays are not single points of failure
Array (HW) maintenance can be done in a rolling way
Enables stretch clusters

ASM and cluster DB architecture
Oracle architecture built from redundant low-cost components: servers, SAN, storage
This is the architecture deployed at CERN for the Physics DBs; more details at https://twiki.cern.ch/twiki/pub/PSSGroup/HAandPerf/Architecture_description.pdf

Files, extents, and failure groups
Files and extent pointers
Failgroups and ASM mirroring

ASM disk groups
Example: HW = 4 disk arrays with 8 disks each
An ASM diskgroup is created using all available disks
The end result is similar to a file system on RAID 1+0
ASM allows mirroring across storage arrays
Oracle RDBMS processes access the storage directly (raw disk access)
(Diagram: ASM diskgroup with mirroring between Failgroup1 and Failgroup2, striping within each failgroup)
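
A minimal sketch of the corresponding DDL, assuming two storage arrays exposed as multipath devices (names and paths are illustrative, not the actual CERN configuration):

```sql
-- Normal-redundancy diskgroup mirrored across two arrays:
-- each array is one failgroup, ASM stripes within and mirrors between them
CREATE DISKGROUP data_dg1 NORMAL REDUNDANCY
  FAILGROUP failgroup1 DISK
    '/dev/mpath/rstor901_1p1', '/dev/mpath/rstor901_2p1',
    '/dev/mpath/rstor901_3p1', '/dev/mpath/rstor901_4p1'
  FAILGROUP failgroup2 DISK
    '/dev/mpath/rstor902_1p1', '/dev/mpath/rstor902_2p1',
    '/dev/mpath/rstor902_3p1', '/dev/mpath/rstor902_4p1';
```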

Performance and scalability
ASM with normal redundancy
Stress tested for CERN's use cases
Scales and performs well

Case Study: the largest cluster I have ever installed, RAC5
The test used: 14 servers

Multipathed Fibre Channel
8 FC switches: 4 Gbps (10 Gbps uplinks)

Many spindles
26 storage arrays (16 SATA disks each)

Case Study: I/O metrics for the RAC5 cluster
Measured, sequential I/O – Read: 6 GB/s; Read-Write: 3+3 GB/s
Measured, small random I/O – Read: 40K IOPS (8 KB read ops)
Note: 410 SATA disks, 26 HBAs on the storage arrays
Servers: 14 x 4+4 Gbps HBAs, 112 cores, 224 GB of RAM

How the test was run
A custom SQL-based DB workload:
IOPS: randomly probe a large table (several TB) via several parallel query slaves, each reading a single block at a time
MBPS: read a large (several TB) table with parallel query
The test table used for the RAC5 cluster was 5 TB in size, created inside a disk group of 70 TB
Scripts are available on request
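
Since the scripts themselves are only available on request, the statements below are just a hedged approximation of the two workload types (table name, index name, parallel degree and bind variable are illustrative):

```sql
-- MBPS test (approximation): full parallel scan of a multi-TB table
SELECT /*+ FULL(t) PARALLEL(t 56) */ COUNT(*)
FROM   test_big_table t;

-- IOPS test (approximation): random single-block reads, executed
-- concurrently from many sessions, each probing one row by key
SELECT /*+ INDEX(t test_big_table_pk) */ t.payload
FROM   test_big_table t
WHERE  t.id = :random_id;
```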

Possible pitfalls: production stories, sharing experiences
3 years in production, 550 TB of raw capacity

Rebalancing speed
Rebalancing is performed (and mandatory) after space management operations, typically after HW failures (to restore the mirror)
Goal: balanced space allocation across disks, not based on performance or utilization
ASM instances are in charge of rebalancing
Scalability of rebalancing operations: in 10g serialization wait events can limit scalability
Even at maximum speed, rebalancing is not always I/O bound

Rebalancing, an example
Rebalancing speed is measured in MB/minute to match the units of V$ASM_OPERATION
Test conditions can change the results (OS, storage, Oracle version, number of ASM files, etc.)
It is a good idea to repeat the measurements whenever several parameters of the environment change, to keep the results meaningful
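
A minimal sketch of starting a rebalance at a chosen power level and observing its progress and speed (diskgroup name and power value are illustrative):

```sql
-- Kick off / re-prioritize a rebalance on one diskgroup
ALTER DISKGROUP data_dg1 REBALANCE POWER 8;

-- Observe progress from the ASM instance: EST_RATE is allocation units
-- moved per minute, EST_MINUTES the estimated time remaining
SELECT group_number, operation, state, power,
       sofar, est_work, est_rate, est_minutes
FROM   v$asm_operation;
```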

VLDB and rebalancing
Rebalancing operations can move more data than expected
Example: 5 TB allocated on ~100 disks of 200 GB each; one disk is replaced (diskgroup rebalance) and the total I/O workload is 1.6 TB (8x the disk size!)
How to see this: query V$ASM_OPERATION; the column EST_WORK keeps growing during the rebalance
The issue: excessive repartnering
Rebalancing in RAC fails over when an instance crashes, but does not restart if all instances are down (typical of single instance)
There is no obvious way to tell if a diskgroup has a pending rebalance operation
A partial workaround is to query V$ASM_DISK (TOTAL_MB and FREE_MB) to see if there are disk occupation imbalances (see the query below)
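
One possible form of that workaround, assuming a large spread of percent-used across the disks of a diskgroup points to a pending or incomplete rebalance:

```sql
-- Space imbalance per diskgroup: compare min and max percent-used across disks
SELECT group_number,
       MIN(ROUND(100 * (total_mb - free_mb) / total_mb, 1)) AS min_pct_used,
       MAX(ROUND(100 * (total_mb - free_mb) / total_mb, 1)) AS max_pct_used
FROM   v$asm_disk
WHERE  group_number > 0
  AND  total_mb > 0
GROUP  BY group_number;
```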

Rebalancing issues wrap-up
Rebalancing can be slow: many hours for very large disk groups
Associated risk: a 2nd disk failure while rebalancing; worst case, loss of the diskgroup because partner disks fail
Similar to the problems of volume rebuild in RAID 5 systems

Fast Mirror Resync
ASM 10g with normal redundancy does not allow part of the storage to be taken offline
A transient error in a storage array can cause several hours of rebalancing to drop and re-add the disks
This is a limiting factor for scheduled maintenance
11g has the new 'fast mirror resync' feature, which is great for rolling interventions on HW
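
A sketch of how the 11g feature can be used for a rolling array intervention (diskgroup, failgroup and repair window are illustrative; the diskgroup compatibility attributes must be at 11.1 or higher):

```sql
-- Give ASM a repair window instead of dropping offlined disks immediately
ALTER DISKGROUP data_dg1 SET ATTRIBUTE 'disk_repair_time' = '8h';

-- Take one failgroup (one storage array) offline for maintenance
ALTER DISKGROUP data_dg1 OFFLINE DISKS IN FAILGROUP failgroup2;

-- After the intervention only the stale extents are resynced,
-- avoiding a full drop/add rebalance
ALTER DISKGROUP data_dg1 ONLINE DISKS IN FAILGROUP failgroup2;
```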

ASM and filesystem utilities
Only a few tools can access ASM files: asmcmd, dbms_file_transfer, XDB, FTP
Limited operations (no copy, rename, etc.) and they require open DB instances, so file operations are difficult in 10g
In 11g, asmcmd has the copy (cp) command
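
For reference, a 10g-style workaround using dbms_file_transfer to copy a file from ASM to a filesystem staging area (directory paths and file names are illustrative):

```sql
-- Directory objects pointing inside ASM and to a filesystem staging area
CREATE DIRECTORY asm_src AS '+DATA_DG1/mydb/datafile';
CREATE DIRECTORY fs_dst  AS '/backup/staging';

BEGIN
  dbms_file_transfer.copy_file(
    source_directory_object      => 'ASM_SRC',
    source_file_name             => 'users.292.657641201',  -- illustrative ASM file name
    destination_directory_object => 'FS_DST',
    destination_file_name        => 'users01.dbf');
END;
/
```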

ASM and corruption
ASM metadata corruption: can be caused by bugs; one case in production after a disk eviction
Physical data corruption: ASM will automatically fix most corruption on the primary extent, typically when doing a full backup
Secondary extent corruption goes undetected until a disk failure or rebalance exposes it

For HA, our experience is that disaster recovery is needed
Corruption issues were fixed by using a physical standby to move to 'fresh' storage
Standby DB
On-disk (flash) copy of the DB

Implementation details

Storage deployment
Current storage deployment for Physics Databases at CERN:
SAN, FC (4 Gb/s) storage enclosures with SATA disks (8 or 16 per enclosure)
Linux x86_64, no ASMLib; device mapper instead (naming persistence + HA)
Over 150 FC storage arrays (production, integration and test) and ~2000 LUNs exposed
Biggest DB over 7 TB (more to come when the LHC starts – estimated growth up to 11 TB/year)

Storage deployment – ASM implementation details
Storage in JBOD configuration (1 disk -> 1 LUN)
Each disk partitioned at the OS level:
1st partition – 45% of the disk size – faster (outer) part of the disk – short stroke
2nd partition – the rest – slower (inner) part of the disk – full stroke

Storage deployment – diskgroups
Two diskgroups created for each cluster:
DATA – data files and online redo logs – outer part of the disks
RECO – flash recovery area destination – archived redo logs and on-disk backups – inner part of the disks
One failgroup per storage array
(Diagram: DATA_DG1 and RECO_DG1 spanning Failgroup1 to Failgroup4)
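
A hedged way to verify this layout from the ASM instance is to aggregate V$ASM_DISK per failgroup (one failgroup per array, DATA on the p1 partitions, RECO on the p2 partitions):

```sql
-- Disks and capacity per failgroup in each diskgroup
SELECT dg.name AS diskgroup, d.failgroup,
       COUNT(*) AS disks,
       ROUND(SUM(d.total_mb) / 1024) AS total_gb
FROM   v$asm_diskgroup dg
JOIN   v$asm_disk d ON d.group_number = dg.group_number
GROUP  BY dg.name, d.failgroup
ORDER  BY dg.name, d.failgroup;
```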

Storage management
SAN configuration in JBOD mode involves many steps and can be time consuming:
Storage level – logical disks, LUN mappings
FC infrastructure – zoning
OS – creating the device mapper configuration (multipath.conf – name persistency + HA)

Storage manageability
DBAs set up the initial configuration
ASM – extra maintenance in case of storage maintenance (disk failure)
Problems:
How to quickly set up the SAN configuration
How to manage disks and keep track of the mappings: physical disk -> LUN -> Linux disk -> ASM disk
Example: SCSI [1:0:1:3] & [2:0:1:3] -> /dev/sdn & /dev/sdax -> /dev/mpath/rstor901_3 -> ASM disk TEST1_DATADG1_0016
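
The ASM end of that mapping chain can be pulled out of the instance with a simple query; the OS-side SCSI-to-multipath step is what the custom tools shown below add on top (this is only a sketch, not the CERN tooling):

```sql
-- Map ASM disk names to the device-mapper paths they were created on
SELECT dg.name AS diskgroup, d.failgroup, d.name AS asm_disk, d.path
FROM   v$asm_disk d
LEFT   JOIN v$asm_diskgroup dg ON d.group_number = dg.group_number
ORDER  BY dg.name, d.failgroup, d.name;
```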

Storage management – solution
Configuration DB – repository of FC switches, port allocations and all SCSI identifiers for all nodes and storage arrays
Big initial effort, easy to maintain, high ROI
Custom tools:
Identify SCSI (block) devices <-> device mapper device <-> physical storage and FC port
Device mapper device <-> ASM disk
Automatic generation of the device mapper configuration

Storage management – custom-made script (lssdisks.py)
SCSI id (host, channel, id) -> storage name and FC port
SCSI ID -> block device -> device mapper name and status -> storage name and FC port

[ ~]$ lssdisks.py
The following storages are connected:
* Host interface 1:
  Target ID 1:0:0: - WWPN: 210000D0230BE0B5 - Storage: rstor316, Port: 0
  Target ID 1:0:1: - WWPN: 210000D0231C3F8D - Storage: rstor317, Port: 0
  Target ID 1:0:2: - WWPN: 210000D0232BE081 - Storage: rstor318, Port: 0
  Target ID 1:0:3: - WWPN: 210000D0233C4000 - Storage: rstor319, Port: 0
  Target ID 1:0:4: - WWPN: 210000D0234C3F68 - Storage: rstor320, Port: 0
* Host interface 2:
  Target ID 2:0:0: - WWPN: 220000D0230BE0B5 - Storage: rstor316, Port: 1
  Target ID 2:0:1: - WWPN: 220000D0231C3F8D - Storage: rstor317, Port: 1
  Target ID 2:0:2: - WWPN: 220000D0232BE081 - Storage: rstor318, Port: 1
  Target ID 2:0:3: - WWPN: 220000D0233C4000 - Storage: rstor319, Port: 1
  Target ID 2:0:4: - WWPN: 220000D0234C3F68 - Storage: rstor320, Port: 1

SCSI Id       Block DEV        MPath name           MP status  Storage            Port
------------- ---------------- -------------------- ---------- ------------------ -----
[0:0:0:0]     /dev/sda         -                    -          -                  -
[1:0:0:0]     /dev/sdb         rstor316_CRS         OK         rstor316           0
[1:0:0:1]     /dev/sdc         rstor316_1           OK         rstor316           0
[1:0:0:2]     /dev/sdd         rstor316_2           FAILED     rstor316           0
[1:0:0:3]     /dev/sde         rstor316_3           OK         rstor316           0
[1:0:0:4]     /dev/sdf         rstor316_4           OK         rstor316           0
[1:0:0:5]     /dev/sdg         rstor316_5           OK         rstor316           0
[1:0:0:6]     /dev/sdh         rstor316_6           OK         rstor316           0
. . .

Storage management – custom-made script (listdisks.py)
device mapper name -> ASM disk and status

[ ~]$ listdisks.py
DISK             NAME               GROUP_NAME    FG        H_STATUS  MODE    MOUNT_S  STATE   TOTAL_GB  USED_GB
---------------- ------------------ ------------- --------- --------- ------- -------- ------- --------- --------
rstor401_1p1     RAC9_DATADG1_0006  RAC9_DATADG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  111.8     68.5
rstor401_1p2     RAC9_RECODG1_0000  RAC9_RECODG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  119.9     1.7
rstor401_2p1     --                 --            --        UNKNOWN   ONLINE  CLOSED   NORMAL  111.8     111.8
rstor401_2p2     --                 --            --        UNKNOWN   ONLINE  CLOSED   NORMAL  120.9     120.9
rstor401_3p1     RAC9_DATADG1_0007  RAC9_DATADG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  111.8     68.6
rstor401_3p2     RAC9_RECODG1_0005  RAC9_RECODG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  120.9     1.8
rstor401_4p1     RAC9_DATADG1_0002  RAC9_DATADG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  111.8     68.5
rstor401_4p2     RAC9_RECODG1_0002  RAC9_RECODG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  120.9     1.8
rstor401_5p1     RAC9_DATADG1_0001  RAC9_DATADG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  111.8     68.5
rstor401_5p2     RAC9_RECODG1_0006  RAC9_RECODG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  120.9     1.8
rstor401_6p1     RAC9_DATADG1_0005  RAC9_DATADG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  111.8     68.5
rstor401_6p2     RAC9_RECODG1_0007  RAC9_RECODG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  120.9     1.8
rstor401_7p1     RAC9_DATADG1_0000  RAC9_DATADG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  111.8     68.6
rstor401_7p2     RAC9_RECODG1_0001  RAC9_RECODG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  120.9     1.8
rstor401_8p1     RAC9_DATADG1_0004  RAC9_DATADG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  111.8     68.6
rstor401_8p2     RAC9_RECODG1_0004  RAC9_RECODG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL  120.9     1.8
rstor401_CRS1
rstor401_CRS2
rstor401_CRS3
rstor402_1p1     RAC9_DATADG1_0015  RAC9_DATADG1  RSTOR402  MEMBER    ONLINE  CACHED   NORMAL  111.8     59.9
. . .

Storage management – custom-made script (gen_multipath.py)
device mapper alias – naming persistency and multipathing (HA)
SCSI [1:0:1:3] & [2:0:1:3] -> /dev/sdn & /dev/sdax -> /dev/mpath/rstor916_1

[ ~]$ gen_multipath.py
# multipath default configuration for PDB
defaults {
        udev_dir            /dev
        polling_interval    10
        selector            "round-robin 0"
        . . .
}
multipaths {
        multipath {
                wwid   3600d0230006c26660be0b5080a407e00
                alias  rstor916_CRS
        }
        multipath {
                wwid   3600d0230006c26660be0b5080a407e01
                alias  rstor916_1
        }
        . . .

Storage monitoring
ASM-based mirroring means Oracle DBAs need to be alerted of disk failures and evictions
Dashboard – global overview – custom solution – RACMon
ASM level monitoring: Oracle Enterprise Manager Grid Control; RACMon – alerts on missing disks and failgroups, plus dashboard
Storage level monitoring: RACMon – LUNs' health and storage configuration details – dashboard
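
A minimal sketch of the kind of check such alerting could be built on (an assumption about the approach, not the actual RACMon implementation):

```sql
-- Disks that are offline or that ASM expects but can no longer find
SELECT group_number, failgroup, name, path,
       mount_status, header_status, mode_status, state
FROM   v$asm_disk
WHERE  mode_status <> 'ONLINE'
   OR  mount_status = 'MISSING';
```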

Storage monitoring – examples
ASM instance level monitoring and storage level monitoring (screenshots)
Example alerts: new failing disk on RSTOR614; new disk installed on RSTOR903 slot 2

Conclusions
Oracle ASM diskgroups with normal redundancy are used at CERN instead of HW RAID
Performance and scalability are very good
Allows the use of low-cost HW
Requires more admin effort from the DBAs than high-end storage
11g brings important improvements
Custom tools ease the administration

Thank you – Q&A
Links: http://cern.ch/phydb http://www.cern.ch/canali