Building Things :Disks Databases. Introduction This is a mix and match section It contains all the information that does not have a home elsewhere in.

Slides:



Advertisements
Similar presentations
SE-292 High Performance Computing
Advertisements

File Management.
Memory.
Virtual Memory on x86. Virtual Memory Refresher Problems with using raw memory: – Isolation Protect system and programs from other programs Two programs.
1 Overview Assignment 4: hints Memory management Assignment 3: solution.
Paging: Design Issues. Readings r Silbershatz et al: ,
Virtual Memory In this lecture, slides from lecture 16 from the course Computer Architecture ECE 201 by Professor Mike Schulte are used with permission.
Part IV: Memory Management
SE-292 High Performance Computing Memory Hierarchy R. Govindarajan
 RAID stands for Redundant Array of Independent Disks  A system of arranging multiple disks for redundancy (or performance)  Term first coined in 1987.
File Systems.
RAID- Redundant Array of Inexpensive Drives. Purpose Provide faster data access and larger storage Provide data redundancy.
RAID Redundant Arrays of Inexpensive Disks –Using lots of disk drives improves: Performance Reliability –Alternative: Specialized, high-performance hardware.
R.A.I.D. Copyright © 2005 by James Hug Redundant Array of Independent (or Inexpensive) Disks.
REDUNDANT ARRAY OF INEXPENSIVE DISCS RAID. What is RAID ? RAID is an acronym for Redundant Array of Independent Drives (or Disks), also known as Redundant.
Chapter 10: File-System Interface
Memory Management Norman White Stern School of Business.
Other Disk Details. 2 Disk Formatting After manufacturing disk has no information –Is stack of platters coated with magnetizable metal oxide Before use,
Operating Systems File Systems (in a Day) Ch
CMPT 300: Final Review Chapters 8 – Memory Management: Ch. 8, 9 Address spaces Logical (virtual): generated by the CPU Physical: seen by the memory.
Computer Organization and Architecture
12: IO Systems1 I/O SYSTEMS This Chapter is About Processor I/O Interfaces: Connections exist between processors, memory, and IO devices. The OS must manage.
CPSC 231 Secondary storage (D.H.)1 Learning Objectives Understanding disk organization. Sectors, clusters and extents. Fragmentation. Disk access time.
I/O Systems and Storage Systems May 22, 2000 Instructor: Gary Kimura.
Memory Allocation CS Introduction to Operating Systems.
Storage and NT File System INFO333 – Lecture Mariusz Nowostawski Noria Foukia.
Day 10 Hardware Fault Tolerance RAID. High availability All servers should be on UPSs –2 Types Smart UPS –Serial cable connects from UPS to computer.
Database Storage Considerations Adam Backman White Star Software DB-05:
Disk Access. DISK STRUCTURE Sector: Smallest unit of data transfer from/to disk; 512B 2/4/8 adjacent sectors transferred together: Blocks Read/write heads.
CSI-09 COMMUNICATION TECHNOLOGY FAULT TOLERANCE AUTHOR: V.V. SUBRAHMANYAM.
Review of Memory Management, Virtual Memory CS448.
Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9 th Edition Chapter 8: Main Memory.
IOS110 Introduction to Operating Systems using Windows Session 5 1.
Chapter 3.5 Memory and I/O Systems. 2 Memory Management Memory problems are one of the leading causes of bugs in programs (60-80%) MUCH worse in languages.
File Systems CSCI What is a file? A file is information that is stored on disks or other external media.
Memory Management – Page 1 of 49CSCI 4717 – Computer Architecture Memory Management Uni-program – memory split into two parts –One for Operating System.
Chapter 8 – Main Memory (Pgs ). Overview  Everything to do with memory is complicated by the fact that more than 1 program can be in memory.
OSes: 11. FS Impl. 1 Operating Systems v Objectives –discuss file storage and access on secondary storage (a hard disk) Certificate Program in Software.
1 Address Translation Memory Allocation –Linked lists –Bit maps Options for managing memory –Base and Bound –Segmentation –Paging Paged page tables Inverted.
Adaptive Server IQ Multiplex. AS IQ Multiplex All of the released software for ASIQ is now Multiplex enabled All platforms are now Multiplex enabled You.
Disk & File System Management Disk Allocation Free Space Management Directory Structure Naming Disk Scheduling Protection CSE 331 Operating Systems Design.
Review °Apply Principle of Locality Recursively °Manage memory to disk? Treat as cache Included protection as bonus, now critical Use Page Table of mappings.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Lecture 10 Page 1 CS 111 Summer 2013 File Systems Control Structures A file is a named collection of information Primary roles of file system: – To store.
File Systems cs550 Operating Systems David Monismith.
It consists of two parts: collection of files – stores related data directory structure – organizes & provides information Some file systems may have.
1 Chapter 9 Tuning Table Access. 2 Overview Improve performance of access to single table Explain access methods – Full Table Scan – Index – Partition-level.
File Systems - Part I CS Introduction to Operating Systems.
LECTURE 12 Virtual Memory. VIRTUAL MEMORY Just as a cache can provide fast, easy access to recently-used code and data, main memory acts as a “cache”
CPSC 231 Secondary storage (D.H.)1 Learning Objectives Understanding disk organization. Sectors, clusters and extents. Fragmentation. Disk access time.
Hands-On Microsoft Windows Server 2008 Chapter 7 Configuring and Managing Data Storage.
Lecture Topics: 11/22 HW 7 File systems –block allocation Unix and NT –disk scheduling –file caches –RAID.
Lesson Objectives Aims Key Words Paging, Segmentation, Virtual Memory
Memory COMPUTER ARCHITECTURE
Lecture 12 Virtual Memory.
Virtual Memory - Part II
CS703 - Advanced Operating Systems
File System Structure How do I organize a disk into a file system?
CSE451 I/O Systems and the Full I/O Path Autumn 2002
Swapping Segmented paging allows us to have non-contiguous allocations
Database Performance Tuning and Query Optimization
Chapter 8: Main Memory.
RAID RAID Mukesh N Tekwani
O.S Lecture 13 Virtual Memory.
CS Introduction to Operating Systems
Create Database and Start Server
CSE451 Virtual Memory Paging Autumn 2002
Chapter 11 Database Performance Tuning and Query Optimization
2 Disks and Things.
RAID RAID Mukesh N Tekwani April 23, 2019
Presentation transcript:

Building Things :Disks Databases

Introduction This is a mix and match section It contains all the information that does not have a home elsewhere in this presentation It covers –Disk layouts –Disk Farms –Striping on Striping –Database build

Disk Layouts Basically in an IQ-M system you can have a number of different disk configurations We consider the following –Simple SCSI (or UDMA-[1]66) Disks –Basic Disk Farms (Collections of raw disks) –RAID Systems (NT RAID, Clariion etc.) –Disk Subsystems (EMC, MTI Guardian etc.)

Basic Disk Farms Here we are considering a number of dumb disks There is no intelligence in the disks – or the controllers Here is where we use IQ-M striping –Here we have a number of disk drives (on various controllers) –We stripe across a series of disks

Disk Striping Basic Disks 1-dimensional IQ Striping In the 1-dimensional striping we are spreading the data writes (and reads) across a series of disk drives – limiting the overall disk head movement.

Problems with Disk Striping IQ-M has to make location decisions as to where to write and read the data It is not as simple as writing to bit bucket or a serial list of disk blocks It is faster than simple 1 disk = 1 device The performance hit is measured in extra micro seconds per read/write – but with a 10 TBytes database this could be important

ADCB RAID – 0,1 Redundant Array of Inexpensive Disks Raid 1 - Mirror In RAID 1 – Mirror, we can consider that Disks A and B are mirrored and Disks C and D This improves read and write performance, as the controller will pick which disk heads are closer to the data required. Also there is an improvement in data security – 2 copies ADCB Raid 0 - Stripe In RAID 0 – Stripe, we can spread the reads and writes as per the last slide This dramatically improves the read and write performance, but does nothing for the data security

ADCB RAID – 0/1 and 5 Redundant Array of Inexpensive Disks Raid 0/1 – Stripe and Mirror In RAID 0/1 – Stripe and Mirror, we can see that the stripe sets on A and B mirror to C and D This gives security – in the mirroring (copies) of the data, and improved performance due both to the two copies and the stripe sets ADCB Raid 5 – Checksum Multi-read In RAID 5 – Csum/MR, The data is in stripes across the disks then a checksum is written to the final disk This gives a small performance improvement for reads, no improvement for writes (in fact it will slow writes down!). The only bonus is security. The system will allow a disk failure in the pack without requiring the 2 to 1 overhead of mirroring systems

Other RAID There is RAID 2, 3 and 4 These are variations on the mirroring/striping and checksum theme There are also proprietary RAID schemes –RAID SThe is EMCs RAID system –RAID MThis is the MTI RAID Scheme We will come on to talk about proprietary schemes in a few slides time

Well is RAID useful ? Yes it is – mainly ! If the RAID system handles the data distribution at the hardware level (with RAID controllers) then RAID is both fast and safe Windows NT (and 2000) can drive RAID through the operating system This can be slower than IQ-M performing the striping –Watch out for large numbers of reads – and individual read performance slowing down –Also Avoid RAID 5. It is very slow for writes

Disk Subsystems (emc, HDS) These are disk arrays where we have no idea what the internal organization of the disks is (and nor do we care) Generally we can consider that we are writing to memory (and usually we are!) The tuning of the disks is usually best left up to the Hardware Support guys

Striping on Striping There is one very contentious area in disk organisation This is 2 dimensional disk striping or striping on stripes This is driven by using an operating system or RAID unit to provide 1 dimensional striping on a series of disks Then we apply IQ striping onto the existing striping 2 Dimensional stripes Do it right – its very fast, do it wrong and it isnt!

A diagram Operating System Striping AS IQ-M Striping The reason this works is now the individual disks only have a very small amount of data to read and write so we rely on very fast disk -> processor communications If this is done properly it is the fastest disk access for ASIQ-M The secret, if there is one, is to make the block size the same as the micro stripe size… But this needs further experimentation

Disk Striping By default Disk_Striping is set ON for RAW devices and OFF for file system devices We should be using RAW devices – they are (generally) a lot faster and potentially safer than file system devices You may want to play with this parameter if you are running devices on disk farms or RAID array systems

File system Storage If you must have the IQ Store on Operating System File System OS_File_Cache_Buffering can be set off for Solaris and Windows NT (and 2000) However the system will slow down in the following areas –IQ Page Size < file system block size –During Loads –Solaris > 4Gb memory The description in the manual is awesome!

Create Database OK so we have built the device or devices – about time we considered some of the options to create the database Of all of the options when issuing a CREATE DATABASE command the following slide details the most important

CASE - 1 CASE –CASE RESPECT is the fastest –There is a 10-20% hit going to CASE IGNORE Implications to FP Indexes –Regardless of RESPECT vs. IGNORE all 1-byte and 2-byte FP indexes store all the binary values for the data –So ABC, abc, Abc, Abc are all stored even for CASE IGNORE

CASE - 2 Remember because we store all bitmaps we can go from 1-byte to 2-byte FP, or 2-byte to flat FP where we might not want to A solution to this is to set the server in CASE RESPECT (because it is faster) Then use an ETL tool to rtrim() and ucase() or lcase() all of the incoming character data

CASE - 2 The HG index stores data in what is called conditioned mode. –For a CASE IGNORE database there is only one entry per logical value ABC = abc = Abc etc. –For a CASE RESPECT database there has to be one entry per value ABC != abc != Abc etc.

CASE – 3 For an LF index we hold partially conditioned values –For CASE RESPECT and CASE IGNORE all values have a bit- map –This can be wasteful on space The reason for having this is two fold –To allow for the recreation of the FP index from the LF –To allow for some rare cases (some group bys) where we still project values from an LF index

COLLATION –Set to ISO_BINENG, this is the fastest –If you must have a collation sequence this will slow the system down by around 10% for 8 bit character sets and 50% for multi-byte character sets –There are substantial slow downs for all multi-byte character sets

IQ PAGE SIZE - 1 This is an area of extreme contention The IQ PAGE size (effectively) determines the the size of the smallest addressable area in memory NOT ON DISK ! The disk parameter is the BLOCK SIZE

IQ PAGE SIZE – 2 The rules for IQ PAGE SIZE (or memory buffer size) are simple Set to 64K unless…. –Set to 128K when the memory model exceeds around 1 Gbytes per cache, and the number of rows in the FACT table exceeds 100m –Set to 256K when the memory model exceeds 4 Gbytes per cache (this may be a rare case!) –Do not set to 512K – there is a little bug…

BLOCK SIZE The block size is set automatically when the IQ PAGE SIZE is set (In 11.x IQ you could set the Max Compression parameter that would vary the number of blocks per page) In IQ 12, you can vary the BLOCK SIZE but it does not do much except in some extreme cases

So what is BLOCK SIZE? Disk Device Memory Cache A Page A Chunk (A variable length object) When a page is written out of memory it is compressed. Only the resulting used blocks within the page are written to disk, this set of blocks is called a chunk

Writing to Disk Default Page in memory 64Kb – 16 Blocks Compress before Disk Write Operation 10 Blocks used before compression Compressor 4 blocks used after compression Disk Write 1 Chunk on Disk the size of 4 blocks - 16kbytes

Disk_Striping_Packed The problem with large systems is that we only have 1 freelist This means that if we want 8 blocks of space we will grab the first > 8 space we can find, which tends to fragment the devices If Disk_Striping_Packed is ON then we have one freelist for each number of blocks available –1 for 1 block free –1 for 2 blocks free –Etc. The trade-off is better space usage against slightly worse localisation.

BLOCKS So as can be seen the IO operation is performed in blocks but how many are written at one time depends upon –The usage (how much of the page is used) –The Compressibility (how well the used blocks compress) We have found the default works the best –That is for a 64K page 16 blocks of 4k

Block Size The compressor is designed to run in 2Kbyte chunks so –Never make the block size less than 2Kbyte –Always make is a multiple of 2Kbyte The compressor does not have a huge amount of memory to run in so –If the Page size is big (>128Kbyte) do not set the block size much bigger than 32-64Kbyte

Scoping - 1 ASIQ-M used the ASA scoping rules, not the ANSI rules In ANSI there are strict rules as to where aliases can be used, and what they are ASA has rules that an extension to the ANSI rules and allow much more freedom In ASA you can re-use and rename anything, anywhere

Scoping - 2 An example: Select alpha = 27, beta, gamma From tablename Where alpha between 27 and 29 In ASE rows are returned that match the where clause i.e. alpha between 27 and 29 In ASA/ASIQ all table rows are returned Why ?

Scoping – 3 Select alpha = 27, beta, gamma From tablename Where alpha between 27 and 29 In ASIQ the following happens –In the select we have defined an alias alpha and assigned a value to it (27) –Then in the where clause we check the value of alpha against the values 27 thru 29 and the condition is true because alpha is 27 Note: Alpha is defined as an alias here not a column identifier

A Picture Select alpha = 27, beta, gamma From tablename Where alpha between 27 and 29 ASE ASIQ In ASE this alpha is a alias In ASE this alpha is a column name In ASIQ this alpha is a alias In ASIQ this alpha is also a alias, because it has been defined To make the code correct for ASIQ you need to prepend the second alpha with The table name. E.g. tablename.alpha

Building Things - End