1
Scalable, Fault-Tolerant NAS for Oracle - The Next Generation
Kevin Closson, Chief Software Architect, Oracle Platform Solutions, PolyServe Inc.
2
The Un-"Show Stopper"
NAS for Oracle is not "file serving"; let me explain. Think of GbE NFS I/O paths from the Oracle servers to the NAS device that are totally direct, with no VLAN-style indirection.
– In these terms, NFS over GbE is just a protocol, exactly as FCP is over Fibre Channel.
– The proof is in the numbers: a single dual-socket/dual-core AMD server running Oracle10gR2 can push 273 MB/s of large I/Os (scattered reads, direct path read/write, etc.) through triple-bonded GbE NICs. Compare that to the infrastructure and hardware cost of 4Gb FCP (~450 MB/s, but you need two cards for redundancy). A rough bandwidth sanity check follows below.
– OLTP over modern NFS with GbE is not a challenging I/O profile. However, not all NAS devices are created equal, by any means.
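A rough sanity check on those figures (my arithmetic, not from the original slide): one GbE link carries at most about 125 MB/s of payload, so three bonded links cap out near 375 MB/s, while a 4Gb FCP link caps out near 500 MB/s, consistent with the ~450 MB/s quoted above.

```latex
3 \times \frac{1\ \text{Gb/s}}{8\ \text{bits/byte}} = 375\ \text{MB/s (triple-bonded GbE ceiling)},
\qquad \frac{273\ \text{MB/s}}{375\ \text{MB/s}} \approx 73\%\ \text{of line rate}
```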
3
Agenda
– Oracle on NAS
– NAS Architecture
– Proof of Concept Testing
– Special Characteristics
4
Oracle on NAS
5
Connectivity
– A Fantasyland Dream Grid™ would be nearly impossible with a FibreChannel switched fabric. For instance, 128 nodes means 256 HBAs and two switches, each with 256 ports, just for the servers; then you still have to work out the storage paths (see the port-count arithmetic after this list).
Simplicity
– NFS is simple. Anyone with a pulse can plug in Cat-5 and mount filesystems.
– MUCH, MUCH simpler than the alternatives: raw partitions for ASM; raw or OCFS2 for CRS; and where does Oracle Home go, local ext3 or UFS? What a mess.
– Supports a shared Oracle Home, and a shared APPL_TOP too.
– But not simpler than a certified third-party cluster filesystem; that is a different presentation.
Cost
– FC HBAs are always going to be more expensive than NICs.
– Ports on enterprise-level FC switches are very expensive.
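The port-count arithmetic behind the connectivity point, spelled out (assuming two HBAs per node for redundant fabrics, which is what the slide's numbers imply):

```latex
128\ \text{nodes} \times 2\ \tfrac{\text{HBAs}}{\text{node}} = 256\ \text{HBAs}
\quad\Rightarrow\quad
2\ \text{fabrics} \times 256\ \text{switch ports each, before counting any storage ports}
```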
6
Oracle on NAS: NFS Client Improvements
– Direct I/O: open() with O_DIRECT works with the Linux NFS client, the Solaris NFS client, and likely others (a minimal sketch follows below).
Oracle Improvements
– init.ora filesystemio_options=directIO. There is no async I/O on NFS, but look at the numbers.
– The Oracle runtime checks mount options. Caveat: it doesn't always get it right, but at least it tries (OSDS).
– Don't be surprised to see Oracle offer a platform-independent NFS client.
– NFS v4 will have more improvements.
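A minimal sketch of the Direct I/O call mentioned above, assuming a Linux NFS client; the path /u02/oradata/example.dbf and the 4 KB alignment are my illustrative choices, not from the slide.

```c
/* Minimal sketch: open a file on an NFS mount with O_DIRECT (Linux).
 * The path below is hypothetical. O_DIRECT requires the buffer, file
 * offset, and transfer size to be suitably aligned (typically to 512
 * bytes or the page size). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/u02/oradata/example.dbf";   /* hypothetical NFS path */
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open(O_DIRECT)");
        return 1;
    }

    size_t blksz = 4096;                  /* conservative alignment choice */
    void *buf = NULL;
    if (posix_memalign(&buf, blksz, blksz) != 0) {
        close(fd);
        return 1;
    }

    ssize_t n = pread(fd, buf, blksz, 0); /* one aligned, uncached read */
    if (n < 0)
        perror("pread");
    else
        printf("read %zd bytes directly (page cache bypassed)\n", n);

    free(buf);
    close(fd);
    return 0;
}
```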
7
NAS Architecture
8
Single-headed Filers
Clustered Single-headed Filers
Asymmetrical Multi-headed NAS
Symmetrical Multi-headed NAS
9
Single Headed Filer Architecture
10
NAS Architecture: Single-headed Filer [diagram: a single filer presents filesystems /u01, /u02, and /u03 over a GigE network]
11
Oracle Servers Accessing a Single-headed Filer: I/O Bottleneck [diagram: many Oracle database servers share filesystems /u01, /u02, and /u03 on one filer; a single database server has the same bus bandwidth as the entire filer, or more, so the filer head is the I/O bottleneck]
12
Oracle Servers Accessing a Single-headed Filer: Single Point of Failure [diagram: the filer head serving /u01, /u02, and /u03 is a single point of failure, even when the database tier is made highly available through failover HA, DataGuard, RAC, etc.]
13
Clustered Single-headed Filers
14
Architecture: Cluster of Single-headed Filers [diagram: one filer presents /u01 and /u02, its partner presents /u03; the cross paths become active only after failover]
15
Oracle Servers Accessing a Cluster of Single-headed Filers [diagram: Oracle database servers mount /u01 and /u02 from one filer and /u03 from its partner; the cross paths become active only after failover]
16
Architecture: Cluster of Single-headed Filers [same diagram, with the question: what if /u03 I/O saturates its filer?]
17
Filer I/O Bottleneck: Resolution == Data Migration [diagram: a new filesystem /u04 is added and some of the "hot" data is migrated from /u03 to /u04]
18
Data Migration Remedies the I/O Bottleneck [diagram: the hot data now lives on /u04, but the filer serving /u04 becomes a NEW single point of failure]
19
Summary: Single-headed Filers
Cluster to mitigate the single point of failure
– Clustering is a pure afterthought with filers
– Failover times? Long, really, really long.
– Transparent? Not in many cases.
Migrate data to mitigate I/O bottlenecks
– What if the data "hot spot" moves with time? The dog-chasing-his-tail syndrome.
Poor modularity
– Expanded in pairs for data availability
And what's all this talk about CNS?
20
Asymmetrical Multi-headed NAS Architecture
21
Asymmetrical Multi-headed NAS [diagram: Oracle database servers in front of a SAN gateway on a FibreChannel SAN, with three active NAS heads and three more for failover, each active head serving its own "pool of data"]. Note: some variants of this architecture support M:1 active:standby, but that doesn't really change much.
22
Asymmetrical NAS Gateway Architecture
Really not much different from clusters of single-headed filers:
– 1 NAS head to 1 filesystem relationship
– Migrate data to mitigate I/O contention
– Failover is not transparent
But:
– More modular: it is not necessary to scale up in pairs
23
Symmetric Multi-headed NAS
24
HP Enterprise File Services Clustered Gateway
25
Symmetric vs. Asymmetric [diagram: with asymmetric NAS heads, each head presents only its own files, e.g. one head for /Dir1/File1, another for /Dir2/File2, another for /Dir3/File3; with the symmetric EFS-CG, every NAS head presents all of /Dir1/File1, /Dir2/File2, and /Dir3/File3]
26
Enterprise File Services Clustered Gateway: Component Overview
Cluster Volume Manager
– RAID 0
– Expand online
Fully Distributed, Symmetric Cluster Filesystem
– The embedded filesystem is a fully distributed, symmetric cluster filesystem
Virtual NFS Services
– Filesystems are presented through Virtual NFS Services
Modular and Scalable
– Add NAS heads without interruption
– All filesystems can be presented for read/write through any and all NAS heads
27
EFS-CG Clustered Volume Manager
RAID 0
– The LUNs are RAID 1, so this implements S.A.M.E. (Stripe And Mirror Everything)
Expand online
– Add LUNs, grow the volume
Up to 16 TB
– In a single volume
28
The EFS-CG Filesystem
All NAS devices have embedded operating systems and file systems, but the EFS-CG is:
– Fully symmetric: distributed lock manager, no metadata server or lock server
– A general-purpose clustered file system
– Standard C library and POSIX support
– Journaled, with online recovery
– Proprietary on-disk format, but uses standard Linux file system semantics and system calls, including flock() and fcntl(), clusterwide (see the sketch below)
– A single filesystem can be expanded online up to 16 TB, with up to 254 filesystems in the current release
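A minimal sketch of the kind of standard POSIX locking call the slide refers to, using a hypothetical file on an EFS-CG mount; the clusterwide enforcement across NAS heads is the product claim above, not something this code demonstrates by itself.

```c
/* Minimal sketch: take a POSIX advisory write lock with fcntl().
 * The path /mnt/efs/shared/control.dat is hypothetical.
 * On a cluster filesystem with clusterwide lock semantics, this same
 * call is claimed to coordinate processes on different nodes as well. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/efs/shared/control.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct flock fl = {
        .l_type   = F_WRLCK,   /* exclusive write lock     */
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,         /* 0 = lock the whole file  */
    };

    if (fcntl(fd, F_SETLKW, &fl) == -1) {   /* block until granted */
        perror("fcntl(F_SETLKW)");
        close(fd);
        return 1;
    }

    puts("lock acquired; do the critical work here");

    fl.l_type = F_UNLCK;                    /* release the lock */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}
```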
29
EFS-CG Filesystem Scalability
30
Scalability: Single Filesystem Export Using x86 Xeon-based NAS Heads (Old Numbers)
[Chart: aggregate throughput from one exported filesystem vs. number of NAS heads]
   1 NAS head:    123 MB/s  (roughly the single-headed filer limit marked on the chart)
   2 NAS heads:   246 MB/s
   4 NAS heads:   493 MB/s
   6 NAS heads:   739 MB/s
   8 NAS heads:   986 MB/s
   9 NAS heads: 1,084 MB/s
  10 NAS heads: 1,196 MB/s
HP StorageWorks Clustered File System is optimized for both READ and WRITE performance.
31
Virtual NFS Services
– A specialized virtual host IP
– Filesystem groups are exported through a VNFS
– VNFS failover and rehosting are 100% transparent to the NFS client, including active file descriptors, file locks (e.g., fcntl/flock), etc.
32
EFS-CG Filesystems and VNFS
33
[Diagram: Oracle database servers mount filesystems /u01, /u02, /u03, and /u04 through virtual NFS services (vnfs1, vnfs1b, vnfs2b, vnfs3b) presented by the NAS heads of the Enterprise File Services Clustered Gateway]
34
EFS-CG Management Console
35
EFS-CG Proof of Concept
36
Goals
– Use Oracle10g (10.2.0.1) with a single high-performance filesystem for the RAC database and measure:
  – Durability
  – Scalability
  – Virtual NFS functionality
37
EFS-CG Proof of Concept
The 4 filesystems presented by the EFS-CG were:
– /u01: contained all Oracle executables (e.g., $ORACLE_HOME)
– /u02: contained the Oracle10gR2 clusterware files (e.g., OCR, CSS) plus some datafiles and external tables for ETL testing
– /u03: lower-performance space used for miscellaneous tests such as disk-to-disk backup
– /u04: resided on a high-performance volume that spanned two storage arrays and contained the main benchmark database
38
EFS-CG P.O.C. Parallel Tablespace Creation All datafiles created in a single exported filesystem –Proof of multi-headed, single filesystem write scalability
39
EFS-CG P.O.C. Parallel Tablespace Creation
40
EFS-CG P.O.C. Full Table Scan Performance All datafiles located in a single exported filesystem –Proof of multi-headed, single filesystem sequential I/O scalability
41
EFS-CG P.O.C. Parallel Query Scan Throughput
42
EFS-CG P.O.C. OLTP Testing
OLTP database based on an order-entry schema and workload
Test areas:
– Physical I/O scalability under Oracle OLTP
– Long-duration testing
43
EFS-CG P.O.C. OLTP Workload: Average Cost per Transaction
  Oracle statistic           Average per transaction
  SGA logical reads          33
  SQL executions             5
  Physical I/O               6.9 *
  Block changes              8.5
  User calls                 6
  GCS/GES messages sent      12
* Averages with RAC can be deceiving; be aware of CR sends.
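Combining these per-transaction averages with the I/O rate quoted in the long-duration test later in the deck gives a rough implied transaction rate (my derivation, not a number stated in the slides):

```latex
\frac{10{,}000\ \text{physical I/Os per second}}{6.9\ \text{physical I/Os per transaction}}
\approx 1{,}450\ \text{transactions per second}
```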
44
EFS-CG P.O.C. OLTP Testing
45
EFS-CG P.O.C. OLTP Testing. Physical I/O Operations
46
EFS-CG Handles all OLTP I/O Types Sufficiently—no Logging Bottleneck
47
Long Duration Stress Test
Benchmarks do not prove durability:
– Benchmarks are "sprints", typically 30-60 minute measured runs (e.g., TPC-C)
This long-duration stress test was no benchmark by any means:
– Ramp OLTP I/O up to roughly 10,000/sec
– Run non-stop until the aggregate I/O breaks through 10 billion physical transfers
– That is 10,000 physical I/O transfers per second for every second of nearly 12 days (the arithmetic is sketched below)
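A quick check of the "nearly 12 days" figure (my arithmetic, not from the slide):

```latex
\frac{10 \times 10^{9}\ \text{I/Os}}{10{,}000\ \text{I/Os/s}} = 10^{6}\ \text{s}
\approx \frac{10^{6}\ \text{s}}{86{,}400\ \text{s/day}} \approx 11.6\ \text{days}
```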
48
Long Duration Stress Test
52
Special Characteristics
53
The EFS-CG NAS Heads are Linux Servers
– Tasks can be executed directly on the EFS-CG NAS heads at FCP speed:
  – Compression
  – ETL, data importing
  – Backup
  – etc.
54
Example of EFS-CG Special Functionality
A table is exported on one of the RAC nodes; the export file is then compressed on the EFS-CG NAS head:
– Uses CPU on the NAS head instead of the database servers. The NAS heads are really just protocol engines; I/O DMAs are offloaded to the I/O subsystems, so there are plenty of spare cycles.
– Data movement happens at FCP rate instead of over GigE, offloading the I/O fabric (the NFS paths from the servers to the EFS-CG).
55
Export a Table to NFS Mount
56
Compress it on the NAS Head
57
Questions and Answers
58
Backup Slide
59
EFS-CG Scales "Up" and "Out" [diagram: Oracle servers reach an EFS-CG NAS head through an Ethernet switch over 3 GbE NFS paths, which can be triple-bonded, etc.; the NAS head connects to the SAN through FibreChannel switches]