Database Storage for Dummies Adam Backman President – White Star Software, LLC.

Slides:



Advertisements
Similar presentations
Archive Task Team (ATT) Disk Storage Stuart Doescher, USGS (Ken Gacke) WGISS-18 September 2004 Beijing, China.
Advertisements

Buying Database Hardware Adam Backman – President White Star Software, LLC.
Redundant Array of Independent Disks (RAID) Striping of data across multiple media for expansion, performance and reliability.
Data Storage Solutions Module 1.2. Data Storage Solutions Upon completion of this module, you will be able to: List the common storage media and solutions.
RAID Oh yes Whats RAID? Redundant Array (of) Independent Disks. A scheme involving multiple disks which replicates data across multiple drives. Methods.
Introduction to Storage Area Network (SAN) Jie Feng Winter 2001.
NAS vs. SAN 10/2010 Palestinian Land Authority IT Department By Nahreen Ameen 1.
Module – 3 Data protection – raid
T OP N P ERFORMANCE T IPS Adam Backman Partner, White Star Software.
Numbers, We don’t need no stinkin’ numbers Adam Backman Vice President DBAppraise, Llc.
File Systems.
Enhanced Availability With RAID CC5493/7493. RAID Redundant Array of Independent Disks RAID is implemented to improve: –IO throughput (speed) and –Availability.
RAID- Redundant Array of Inexpensive Drives. Purpose Provide faster data access and larger storage Provide data redundancy.
1 Magnetic Disks 1956: IBM (RAMAC) first disk drive 5 Mb – Mb/in $/year 9 Kb/sec 1980: SEAGATE first 5.25’’ disk drive 5 Mb – 1.96 Mb/in2 625.
R.A.I.D. Copyright © 2005 by James Hug Redundant Array of Independent (or Inexpensive) Disks.
Chapter 3 Presented by: Anupam Mittal.  Data protection: Concept of RAID and its Components Data Protection: RAID - 2.
Chapter 5: Server Hardware and Availability. Hardware Reliability and LAN The more reliable a component, the more expensive it is. Server hardware is.
Reliability Week 11 - Lecture 2. What do we mean by reliability? Correctness – system/application does what it has to do correctly. Availability – Be.
High Performance Computing Course Notes High Performance Storage.
Server Platforms Week 11- Lecture 1. Server Market $ 46,100,000,000 ($ 46.1 Billion) Gartner.
OS and Hardware Tuning. Tuning Considerations Hardware  Storage subsystem Configuring the disk array Using the controller cache  Components upgrades.
I/O Systems and Storage Systems May 22, 2000 Instructor: Gary Kimura.
VIRTUALIZATION AND YOUR BUSINESS November 18, 2010 | Worksighted.
© 2009 IBM Corporation Statements of IBM future plans and directions are provided for information purposes only. Plans and direction are subject to change.
Data Storage Willis Kim 14 May Types of storages Direct Attached Storage – storage hardware that connects to a single server Direct Attached Storage.
Virtual Network Servers. What is a Server? 1. A software application that provides a specific one or more services to other computers  Example: Apache.
12.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 12: Mass-Storage Systems.
Secondary Storage Unit 013: Systems Architecture Workbook: Secondary Storage 1G.
Storage System: RAID Questions answered in this lecture: What is RAID? How does one trade-off between: performance, capacity, and reliability? What is.
DAS Last Update Copyright Kenneth M. Chipps Ph.D. 1.
11 Capacity Planning Methodologies / Reporting for Storage Space and SAN Port Usage Bob Davis EMC Technical Consultant.
SQL Server 2008 Implementation and Maintenance Chapter 7: Performing Backups and Restores.
1 Storage Refinement. Outline Disk failures To attack Intermittent failures To attack Media Decay and Write failure –Checksum To attack Disk crash –RAID.
Chapter 10 : Designing a SQL Server 2005 Solution for High Availability MCITP Administrator: Microsoft SQL Server 2005 Database Server Infrastructure Design.
RAID: High-Performance, Reliable Secondary Storage Mei Qing & Chaoxia Liao Nov. 20, 2003.
Database Storage Considerations Adam Backman White Star Software DB-05:
Architecture of intelligent Disk subsystem
Hardware (The part you can kick). Overview  Selection Process  Equipment Categories  Processors  Memory  Storage  Support.
© 2009 IBM Corporation IBM Systems & Technology Group System x and BladeCenter® Why BladeCenter S SAN is the Right Choice Lowest cost, lowest complexity,
Nexenta Proprietary Global Leader in Software Defined Storage Nexenta Technical Sales Professional (NTSP) COURSE CONTENT.
Disk Access. DISK STRUCTURE Sector: Smallest unit of data transfer from/to disk; 512B 2/4/8 adjacent sectors transferred together: Blocks Read/write heads.
Lecture 9 of Advanced Databases Storage and File Structure (Part II) Instructor: Mr.Ahmed Al Astal.
Data Storage CPTE 433 John Beckett. The Paradox “If I can go to a computer store and buy 1000 gigabytes for $50, why does it cost more in your server.
Database Edition for Sybase Sales Presentation. Market Drivers DBAs are facing immense time pressure in an environment with ever-increasing data Continuous.
Module – 4 Intelligent storage system
Top 10 Performance Hints Adam Backman White Star Software
© Pearson Education Limited, Chapter 16 Physical Database Design – Step 7 (Monitor and Tune the Operational System) Transparencies.
DONE-08 Sizing and Performance Tuning N-Tier Applications Mike Furgal Performance Manager Progress Software
Chapter 12 – Mass Storage Structures (Pgs )
© 2011 IBM Corporation Sizing Guidelines Jana Jamsek ATS Europe.
The concept of RAID in Databases By Junaid Ali Siddiqui.
VMware vSphere Configuration and Management v6
Buying Database Hardware Adam Backman – President White Star Software, LLC.
Infrastructure for Data Warehouses. Basics Of Data Access Data Store Machine Memory Buffer Memory Cache Data Store Buffer Bus Structure.
Chapter 14: Mass-Storage Systems Disk Structure. Disk Scheduling. RAID.
Page 1 Mass Storage 성능 분석 강사 : 이 경근 대리 HPCS/SDO/MC.
1 CEG 2400 Fall 2012 Network Servers. 2 Network Servers Critical Network servers – Contain redundant components Power supplies Fans Memory CPU Hard Drives.
Hands-On Microsoft Windows Server 2008 Chapter 7 Configuring and Managing Data Storage.
Lecture Topics: 11/22 HW 7 File systems –block allocation Unix and NT –disk scheduling –file caches –RAID.
RAID Presentation Raid is an acronym for “Redundant array of independent Drives”, or Redundant array of inexpensive drives”. The main concept of RAID is.
What is raid? RAID is the term used to describe a storage systems' resilience to disk failure through the use of multiple disks and by the use of data.
Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9 th Edition Chapter 10: Mass-Storage Systems.
Adam Backman Chief Cat Wrangler – White Star Software
Vladimir Stojanovic & Nicholas Weaver
Storage Networking.
Introduction to Networks
Introduction to Networks
Storage Virtualization
Oracle Storage Performance Studies
Storage Networking.
Presentation transcript:

Database Storage for Dummies Adam Backman President – White Star Software, LLC

Introduction  Why is database storage important?  OpenEdge considerations  What is RAID?  Hardware vs. Software RAID  How to buy disk capacity  Hardware options  Wrap up

About the speaker  President – White Star Software −Serving the Progress community since 1985 −Consulting  Application support (design, build, review, …)  Administration (Database, operating system, storage) −Training (Application and administration)  Vice President – DBAppraise −Simplifying the task of monitoring and managing the worlds best business applications −Remote management of your OpenEdge environment by the worlds best OpenEdge administrators

Why is storage so important? Storage Goals −Reliability Protection of your data from data loss −Availability Data is available to the users −Performance Uniform application response time under varying workloads These different goals tend to work against each other

 Everything starts on the disk  When tuning, it is the second most likely cause for performance issues after application code  Disk is an order of magnitude slower than memory  Performance tuning is a process of pushing the bottleneck to the fastest resource  Network Disk Memory CPU Why is storage so important?

OpenEdge Considerations  Type II storage area  Database block size  Records per block  Before image cluster size  After image I/O

Records Type I Storage Areas  Data blocks are social −They allow data from any table in the area to be stored within a single block −Index blocks only contain data for a single index  Data and index blocks can be tightly interleaved potentially causing scatter

Database Blocks

Type II Storage Areas  Data is clustered  Clustering improves performance −Better proximity of data −Less disk head movement −Able to take advantage of read ahead algorithms  Performance increase has been tested and proven to be (nearly) universal  Dump and load or table/index move required to move from type I to type II areas

Type II Storage Areas  Data is clustered together  A cluster will only contain records from a single table  A cluster can contain 8, 64 or 512 blocks  This helps performance as data scatter is reduced  Disk arrays have a feature called read-ahead that really improves efficiency with type II areas

Type II Clusters Order Line OrderCustomer

Storage Areas Compared Data Block Index Block Data Block Index Block Data Block Index Block Data Block Index Block Type IType II Cluster

Importance of database block size  OpenEdge® default block size is still 1kb  Larger block allow for more records or index entries per physical operation  Matching operating system and database block sizes improves efficiency −Generally 8k block size is best for most −4k is best for Windows servers  Larger blocks *may* require a higher records per block setting

Records per block setting  Each area can have a different setting  Default is 64, this is generally too low for 8k blocks  Calculation: Database-block-size/(Mean-record-size + 20) = Maximum-Rec/block 20 is the approximate overhead per record Records per block setting should equal the next HIGHER binary number between 1 and 256  Set too low and you waste space and reduce the efficiency of every read operation  Set too high and run the risk of record fragmentation

Example: Records /Block  Mean record size = 90  Add 20 bytes for overhead ( = 110)  Divide product into database blocksize Example: 8192 ÷ 110 =  Choose next higher binary number 128  Default records per block is 64

Before Image Cluster Size  One the best ways to improve OLTP performance  Default has been increased to 512k but in most cases a setting above 4096k (4MB) is more appropriate  BI cluster size determines the amount of transactional (create, update, delete) work that is done between checkpoints  The goal is to have 120 seconds between checkpoints at you busiest update time of the day

After Image affect on I/O decisions  After Imaging provides an extra level of data protection in case of media loss  Everyone should be using after imaging  After image files should be isolated from the rest of the database −Best case: different physical hardware −Usual scenario: Different file system  Writes to this file have the potential to be expensive

What RAID really means Most commonly used RAID levels:  RAID 0: This level is also called striping.  RAID 1: This is referred to as mirroring.  RAID 5: Most controversial RAID level  RAID 10: This is mirroring and striping. Also known as RAID (OpenEdge preferred)

Raid 0: Striping Stripe 1 Stripe 2 Stripe 3 Stripe 4... Disk 1 Disk 2 Disk 3 Volume Group DiskArray Stripes 1, 4, … Stripes 2, 5, …Stripes 3, 6, …

RAID 0: Striping (cont.)  Good for read and write I/O performance  No failover protection  lower data reliability (1 fails they all fail)

What is Stripe Width?  Also called chunk size  Stripe width is the amount of data put on a physical volume before moving to the next disk in the set  128k is a good stripe width for 8k block size databases but performance has been proven to increase with even larger stripe widths (upto and including 2MB tested)

RAID 1: Mirroring Primary Parity Disk 1 Disk 2 Parity 1 Parity 2

RAID 1: Mirroring (cont.)  OK for read and write applications  Good failover protection  High data reliability  Most expensive in terms of hardware

RAID 5: Poor man’s mirroring  User information is striped  Parity information is striped with user info −Write primary data −Calculate parity −Write parity  Good for read intensive applications  Poor performance for writes after cache is exhausted  Single disk failure is protected but performance will suffer

RAID 10: Mirroring and Striping  Ideal for both read, write or mixed applications  High level of data reliability though not as high as RAID 1 due to striping  Just as expensive as RAID 1  Generally, the recommended RAID level for most OpenEdge applications

RAID 10 vs. RAID 5 cache fill rate Typical Production DB Example: 4GB / ( 200 io/sec – 800 io/sec ) = cache doesn’t fill! fillTime = cacheSize / (requestRate – serviceRate) Maintenance Example: 4GB / ( 5000 io/sec – 3200 io/sec ) = 583 sec. (≈ 10 min.) (RAID10) 4GB / ( 5000 io/sec – 200 io/sec ) = 218 sec (≈ 4 min.) (RAID5) Heavy Update Production DB Example: 4GB / ( 1200 io/sec – 800 io/sec ) = 2621 sec. (≈ 44 min.) (RAID10) 4GB / ( 1200 io/sec – 200 io/sec ) = 1049 sec. (≈ 17 min.) (RAID5) 4 disks RAID10 vs RAID5 4KB db blocks 4GB RAM cache ( blocks)

Hardware vs. Software RAID  Software RAID −Uses primary CPU resources −Less scalable −Generally less expensive  Hardware RAID −The preferred option −Dedicated resources (memory and CPU) for storage −Much greater scalability −More expensive up front but you pay once and reap the benefits for the life of the hardware

Buying Disks  Buy small disks (individual drives) Each disk regardless of it’s size is capable of doing approximately the same number of I/Os per second  Buy fast disks Slow disk = slow performance  Buy reliable disks  Buy many disks The outer portion of the disk is up to 20% faster than the inner portion of the disk  Try to leave room for inexpensive growth −Upgrades tend to be more expensive

Buying Disk Arrays Considerations Include:  Reliability  Features (remote mirroring, software, …)  Storage capacity  Throughput capacity  Support capacity  Replacement/upgrade path

Hardware options  Many excellent options for all size operations  Small to medium scale −iSCSI −Low cost −General availability components (SAS, SCSI, non-fibre channel)  Medium to large scale −SAN −mid-range cost (fibre channel drives, Switches, …) −Good scalability  Enterprise scale −Fibre throughout the array −Nearly unlimited scalability

Hardware Examples – Architecture  Direct Attached Storage −SAS and SATA drives −No dedicated array cache −Single path  iSCSI −SAS, SATA and fibre channel drives −Limited array cache −Multi-path  SAN −Fibre channel drives −Multi-path throughout the array

Network Attached Storage (NAS)  The most well known NAS company is NetApp  NAS devices are great for file storage  NetApp even calls their device a “filer”  These are NOT good devices for databases due to the file vs. block nature of the storage

Hardware options – Upgrade Path  Few people consider the replacement of their storage when first making the array purchase  Replacement is still the way that most small to medium size systems are upgraded  In-place upgrades are that require downtime or reconfiguration are the hallmark of midrange arrays  Zero downtime in-place upgrades are now the norm in enterprise arrays.

Hardware Examples  Equilogic – Great availability, reasonable pricing  EMC VNXe – Lower end EMC versus VMAX  HP EVA – Super ease of use, self tuning  Hitachi Data Systems – Ultra scalable  IBM XIV – Commodity hardware, enterprise features  FusionIO – Ultra fast but VERY expensive and mostly unproven technology

Points to Remember  Disks are a good place to put money in the hardware acquisition process  Take advantage by optimizing your database storage −Type II −Database block size −BI cluster size −Isolating After Image extents  Buy what you need but remember that you may need to upgrade so buy with growth in mind  People generally overbuy CPU capacity and under buy disk throughput capacity

Questions

Thank you for your time!