Storage Tuning for Relational Databases
Philippe Bonnet – Spring 2015
Agenda
- Disk abstraction
  – Unix file system
- Row store
  – Extent-based allocation
  – Page layout
  – Row layout
- Column store
  – Virtual IDs
  – PAX model
File Layer
How to represent files?
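A minimal sketch of the block and file layers, assuming a 512-byte block size and an in-memory device; `Device`, `Inode`, `index_to_block_number` and `inode_to_block` are hypothetical names chosen for illustration. The inode lists, in order, the numbers of the blocks holding the file, so a byte offset maps to a block by integer division.

```python
BLOCK_SIZE = 512  # assumed block size in bytes

class Device:
    """Block layer: a device is an array of numbered, fixed-size blocks."""
    def __init__(self, n_blocks):
        self.blocks = [bytearray(BLOCK_SIZE) for _ in range(n_blocks)]

    def block_number_to_block(self, b):
        return self.blocks[b]

class Inode:
    """File layer: the ordered block numbers that hold a file, plus its size."""
    def __init__(self, block_numbers, size):
        self.block_numbers = block_numbers
        self.size = size

def index_to_block_number(inode, index):
    """Map the index-th block of the file to a block number on the device."""
    return inode.block_numbers[index]

def inode_to_block(device, inode, offset):
    """Return the device block that contains byte `offset` of the file."""
    return device.block_number_to_block(index_to_block_number(inode, offset // BLOCK_SIZE))

# A 600-byte file stored in blocks 7 and 3 (the blocks need not be contiguous).
dev = Device(16)
dev.blocks[7][:5] = b"hello"
f = Inode([7, 3], size=600)
```

With this layout, `inode_to_block(dev, f, 550)` returns block 3, because byte 550 lies in the file's second block.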
Inode Number Layer
How to avoid carrying inodes around?
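One way to stop passing inodes around is to keep them in a table at a known place and name each inode by its index in that table. A minimal sketch, assuming a hypothetical in-memory `inode_table` and a 512-byte block size:

```python
class Inode:
    """Minimal inode: ordered block numbers and file size."""
    def __init__(self, block_numbers, size):
        self.block_numbers = block_numbers
        self.size = size

# Hypothetical inode table: in a real file system it lives at a fixed disk location.
inode_table = [None] * 8
inode_table[1] = Inode([20], size=0)      # e.g. the root directory
inode_table[5] = Inode([7, 3], size=600)  # a regular file

def inode_number_to_inode(n):
    return inode_table[n]

def inode_number_to_block_number(n, offset, block_size=512):
    """Combine the inode number layer with the file layer."""
    inode = inode_number_to_inode(n)
    return inode.block_numbers[offset // block_size]
```

Higher layers now only need to store a small integer (the inode number) to refer to a file.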
File Name Layer
How about user-friendly names?
File Name Layer
Representing directories
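A directory can itself be stored as a file whose contents are a table of (name, inode number) entries. The sketch below assumes fixed 16-byte entries (a 2-byte inode number followed by a 14-byte name, similar in spirit to early Unix) and uses inode number 0 to mark an unused entry; `make_directory` and `lookup` are illustrative names.

```python
import struct

# Hypothetical fixed-size directory entry: 2-byte inode number + 14-byte name.
ENTRY = struct.Struct("<H14s")

def make_directory(entries):
    """Serialize {name: inode_number} into the bytes of a directory file.
    struct pads short names with NUL bytes and truncates names over 14 bytes."""
    return b"".join(ENTRY.pack(ino, name.encode()) for name, ino in entries.items())

def lookup(name, dir_bytes):
    """Scan the directory entries; return the inode number bound to name, or None."""
    for ino, raw in ENTRY.iter_unpack(dir_bytes):
        if ino != 0 and raw.rstrip(b"\0").decode() == name:
            return ino
    return None

d = make_directory({"notes.txt": 5, "src": 9})
```

A linear scan keeps the sketch short; larger directories are often indexed instead.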
Path Name Layer
Hierarchy of directories
Path Name Layer
– How about flexible management of files?
– How to avoid cycles in a path?
– How to rename a file?
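The three questions can be illustrated together: resolve a relative path one component at a time, give files several names with hard links, refuse hard links to directories so the naming graph stays cycle-free, and rename by linking the new name before unlinking the old one. A minimal sketch over a hypothetical in-memory `inodes` dictionary; it only renames regular files.

```python
# Hypothetical in-memory file system: inode number -> inode.
# Directory inodes map names to inode numbers; file inodes count their names.
inodes = {
    1: {"type": "dir", "entries": {"usr": 2}},
    2: {"type": "dir", "entries": {"notes": 3}},
    3: {"type": "file", "refcount": 1},
}

def lookup(name, dir_ino):
    return inodes[dir_ino]["entries"].get(name)

def path_to_inode_number(path, dir_ino):
    """Resolve a relative path such as 'usr/notes' one component at a time."""
    first, _, rest = path.partition("/")
    ino = lookup(first, dir_ino)
    if ino is None:
        raise FileNotFoundError(path)
    return path_to_inode_number(rest, ino) if rest else ino

def link(target_ino, dir_ino, name):
    """Add another name for an existing file (a hard link)."""
    if inodes[target_ino]["type"] == "dir":
        # No hard links to directories: the name graph stays a tree, so no cycles.
        raise PermissionError("cannot hard-link a directory")
    if name in inodes[dir_ino]["entries"]:
        raise FileExistsError(name)
    inodes[dir_ino]["entries"][name] = target_ino
    inodes[target_ino]["refcount"] += 1

def unlink(dir_ino, name):
    """Remove a name; the file would be freed once its refcount reaches 0."""
    ino = inodes[dir_ino]["entries"].pop(name)
    inodes[ino]["refcount"] -= 1

def rename(dir_ino, old, new):
    """Rename a regular file as link-then-unlink, so it never loses all its names."""
    link(lookup(old, dir_ino), dir_ino, new)
    unlink(dir_ino, old)
```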
Absolute Path Name Layer
How to name a file regardless of the current working directory?
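An absolute path name starts resolution at a well-known root directory instead of the working directory. A minimal sketch, assuming the root is inode 1 and using a hypothetical `dirs` table that maps directory inode numbers to their entries:

```python
ROOT_INODE = 1  # assumed inode number of the root directory

# Hypothetical directory tree: directory inode -> {name: inode number}
dirs = {1: {"home": 2, "etc": 4}, 2: {"alice": 3}, 3: {"todo": 5}, 4: {}}

def path_to_inode_number(path, dir_ino):
    """Walk the components of path; empty components (from '/' or '//') are skipped."""
    for name in filter(None, path.split("/")):
        dir_ino = dirs[dir_ino][name]
    return dir_ino

def general_path_to_inode_number(path, cwd_ino):
    """Absolute paths start at the root; relative paths start at the working directory."""
    start = ROOT_INODE if path.startswith("/") else cwd_ino
    return path_to_inode_number(path, start)
```

Here `/home/alice/todo` names inode 5 whatever the working directory is, while `alice/todo` only does so from `/home`.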
Unix File System Naming Scheme
Disk layout for a file system
Symbolic Link Layer
How to attach new disks to a file system (mounting)?
Figure annotations:
– (1) represents a file system
– (2) device and root inode for the given file system
– inode pinned in memory for floppy
– inode pinned in memory for /dev/fd1
– (1) name of parent inode, i.e., floppy
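Mounting can be modelled as a table that maps the (device, inode) of a mount-point directory to the (device, root inode) of the file system attached there; name lookup checks that table after every step. A minimal sketch with hypothetical device names and the assumption that every file system's root is inode 1:

```python
ROOT_DEV, ROOT_INODE = "/dev/disk0", 1  # hypothetical root device and root inode

# Per-device directories: device -> {directory inode: {name: inode}}
fs = {
    "/dev/disk0": {1: {"floppy": 7, "etc": 2}, 7: {}, 2: {}},
    "/dev/fd1":   {1: {"data.txt": 4}},
}
# Mount table: (device, mount-point inode) -> (device, root inode of mounted FS)
mounts = {}

def mount(device, at_dev, at_ino):
    mounts[(at_dev, at_ino)] = (device, 1)  # assumes the mounted FS's root is inode 1

def lookup(name, dev, ino):
    child = (dev, fs[dev][ino][name])
    return mounts.get(child, child)  # cross into the mounted FS if child is a mount point

def resolve(path):
    dev, ino = ROOT_DEV, ROOT_INODE
    for name in filter(None, path.split("/")):
        dev, ino = lookup(name, dev, ino)
    return dev, ino
```

The mount-point inode (`floppy`) and the mounted root inode both have to stay reachable in memory while the file system is mounted, which matches the pinned inodes in the figure.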
Symbolic Link Layer
How to create links across file systems (where inode numbers are not unique)?
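A symbolic link stores a path name rather than an inode number, so following it means resolving that name again from the root. Because nothing depends on inode numbers being unique, the target may live in another mounted file system. A minimal sketch, assuming absolute link targets only and a hypothetical cap of 8 link expansions to stop loops:

```python
MAX_HOPS = 8  # assumed limit on nested symbolic-link expansions

# Hypothetical inode table with directories, a file and symbolic links.
inodes = {
    1: {"type": "dir", "entries": {"mnt": 2, "home": 3}},
    2: {"type": "dir", "entries": {"report.pdf": 4}},
    3: {"type": "dir", "entries": {"r": 5, "loop": 6}},
    4: {"type": "file"},
    5: {"type": "symlink", "target": "/mnt/report.pdf"},  # stores a name, not an inode
    6: {"type": "symlink", "target": "/home/loop"},       # points at itself
}

def resolve(path, hops=0):
    ino = 1  # start at the root
    for name in filter(None, path.split("/")):
        ino = inodes[ino]["entries"][name]
        if inodes[ino]["type"] == "symlink":
            if hops >= MAX_HOPS:
                raise OSError("too many levels of symbolic links")
            ino = resolve(inodes[ino]["target"], hops + 1)  # resolve the stored name
    return ino
```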
Naming Layers in Unix File System
Putting it all together: inode
API Calls: Open
(figure: f_table, fd_table)
API Calls: Read
(figure: f_table, fd_table)
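The two tables can be sketched as follows: a per-process `fd_table` maps a file descriptor to an entry in a system-wide `f_table`, and that entry holds the inode number and the current cursor. The names `open_`, `names` and the in-memory inode contents are hypothetical, path resolution is reduced to a dictionary, and descriptors start at 0 here rather than after stdin/stdout/stderr.

```python
class Inode:
    def __init__(self, data):
        self.data = bytearray(data)  # file contents, flattened for brevity

inode_table = {5: Inode(b"hello, world")}
names = {"/greeting": 5}  # stand-in for full path name resolution

f_table = []   # system-wide open-file table: one entry per open() call
fd_table = {}  # per-process table: file descriptor -> index into f_table

def open_(path):
    f_table.append({"inode": names[path], "cursor": 0})
    fd = 0
    while fd in fd_table:  # lowest unused descriptor
        fd += 1
    fd_table[fd] = len(f_table) - 1
    return fd

def read(fd, n):
    entry = f_table[fd_table[fd]]
    data = inode_table[entry["inode"]].data
    chunk = bytes(data[entry["cursor"]:entry["cursor"] + n])
    entry["cursor"] += len(chunk)  # cursor lives in f_table, not in the inode
    return chunk
```

Keeping the cursor in `f_table` rather than in the inode lets two `open` calls on the same file read independently, while processes that inherit a descriptor (for example after `fork`) share one cursor.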
What Would a DB Designer Do?
Similarities with FS:
- name mapping (from tables and attributes at the API level to arrays of bytes at the disk level)
- quantized IOs (block device abstraction of secondary storage)
Differences from FS:
- structured data
  – a table is a multiset of records
  – indexed access
Using SQL Server v7 as an example
Storage Architecture
1. Row store
rowid | Att1 | Att2 | Att3   | Att4
1     | A    | 098  | zerher | P
2. Column store
id | Att1    id | Att2    id | Att3      id | Att4
1  | A       1  | 098     1  | zerher    1  | P
Pages
structure page {
  block contents[PAGE_SIZE];
}
integer PAGE_SIZE = N; // N = 16 blocks for an 8KB page and 512B disk sectors
integer PAGE_SIZE_IN_BYTES = 8 * 1024;
procedure PAGE_ID_TO_PAGE(integer page_id) returns instance of page {
  offset = page_id * PAGE_SIZE; // number of the page's first block
  instance of page p;
  for i from 0 to PAGE_SIZE - 1 {
    p.contents[i] = BLOCK_NUMBER_TO_BLOCK(offset + i);
  }
  return p;
}
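A runnable version of `PAGE_ID_TO_PAGE`, assuming an in-memory `BlockDevice` (a hypothetical stand-in for the disk) with 512-byte blocks. An 8KB page is simply the concatenation of 16 consecutive blocks.

```python
SECTOR_SIZE = 512                               # disk block size in bytes
PAGE_SIZE_IN_BYTES = 8 * 1024                   # database page size
PAGE_SIZE = PAGE_SIZE_IN_BYTES // SECTOR_SIZE   # blocks per page

class BlockDevice:
    """Hypothetical in-memory block device addressed by block number."""
    def __init__(self, n_blocks):
        self.blocks = [bytes(SECTOR_SIZE) for _ in range(n_blocks)]

    def block_number_to_block(self, b):
        return self.blocks[b]

def page_id_to_page(dev, page_id):
    """Concatenate the PAGE_SIZE consecutive blocks that make up page page_id."""
    offset = page_id * PAGE_SIZE  # number of the page's first block
    return b"".join(dev.block_number_to_block(offset + i) for i in range(PAGE_SIZE))

dev = BlockDevice(64)                   # room for 4 pages
dev.blocks[16] = b"P1" + bytes(510)     # first block of page 1
```

Since one page spans 16 sectors, a crash in the middle of a page write can leave a partially written ("torn") page, which is why the database cannot treat a page write as atomic.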
Database Files
Extent-based allocation
– 1 extent = 8 pages
– Mixed and uniform extents
– GAM (Global Allocation Map): bitmap over extents. Is the extent allocated?
– SGAM (Shared Global Allocation Map): bitmap over extents. Is the extent mixed, with at least 1 unused page?
– PFS (Page Free Space) page: covers about 8000 pages, 1 byte per page. How full is each page?
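A simplified sketch of how the two bitmaps drive allocation. The booleans carry the meaning stated on the slide, not SQL Server's actual on-disk bit encoding, and `page_used` is a hypothetical stand-in for the PFS byte (it only records used/unused, not fullness). A uniform extent goes whole to one object; a single page for a small object comes from a mixed extent found through the SGAM, or from a newly opened one.

```python
PAGES_PER_EXTENT = 8

class DataFile:
    """Simplified extent allocator driven by GAM- and SGAM-like bitmaps."""

    def __init__(self, n_extents):
        self.gam = [False] * n_extents    # is the extent allocated?
        self.sgam = [False] * n_extents   # mixed extent with >= 1 unused page?
        self.page_used = [False] * (n_extents * PAGES_PER_EXTENT)  # PFS stand-in

    def _free_extent(self):
        for e, allocated in enumerate(self.gam):
            if not allocated:
                self.gam[e] = True
                return e
        raise RuntimeError("no free extent: the file must grow")

    def allocate_uniform_extent(self):
        """Give a whole extent (all 8 pages) to one object."""
        e = self._free_extent()
        first = e * PAGES_PER_EXTENT
        self.page_used[first:first + PAGES_PER_EXTENT] = [True] * PAGES_PER_EXTENT
        return e

    def allocate_mixed_page(self):
        """Give one page from a mixed extent, opening a new mixed extent if needed."""
        e = next((i for i, has_free in enumerate(self.sgam) if has_free), None)
        if e is None:
            e = self._free_extent()
            self.sgam[e] = True
        first = e * PAGES_PER_EXTENT
        extent = range(first, first + PAGES_PER_EXTENT)
        page = next(p for p in extent if not self.page_used[p])
        self.page_used[page] = True
        if all(self.page_used[p] for p in extent):
            self.sgam[e] = False  # the mixed extent is now full
        return page
```

For example, on a fresh 4-extent file the first mixed page is page 0, a uniform extent then takes extent 1 (pages 8 to 15), and the next seven mixed pages fill pages 1 to 7.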
Representing Tables
How to store this data? Bootstrapping problem: the metadata that describes tables is itself stored in tables.
Finding Data Pages
Row Store Page Layout
structure record_id {
  integer page_id;
  integer row_id;
}
procedure RECORD_ID_TO_BYTES(record_id rid) returns bytes {
  p = PAGE_ID_TO_PAGE(rid.page_id);
  byte byte_array[PAGE_SIZE_IN_BYTES] = p.contents;
  // the row offset array grows backward from the end of the page; each entry is 2B,
  // so entry 0 occupies the last 2 bytes of the page
  slot_address = byte_array + PAGE_SIZE_IN_BYTES - 2 * (rid.row_id + 1);
  // READ_2_BYTES (assumed helper): read the 2B row offset stored in the entry
  row_address = byte_array + READ_2_BYTES(slot_address);
  return RECORD_ADDRESS_TO_BYTES(row_address);
}
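A runnable sketch of such a slotted page. Assumptions: a 96-byte page header (the size SQL Server uses), rows written forward right after the header, and a hypothetical 2-byte length prefix on each row so the sketch can return the row's bytes (real row formats encode length differently). `SlottedPage` and its methods are illustrative names.

```python
import struct

PAGE_SIZE_IN_BYTES = 8 * 1024
HEADER_SIZE = 96              # assumed page header size; rows start right after it
U16 = struct.Struct("<H")     # 2-byte little-endian unsigned integer

class SlottedPage:
    """Rows grow forward from the header; the row offset array grows backward."""

    def __init__(self):
        self.data = bytearray(PAGE_SIZE_IN_BYTES)
        self.free_start = HEADER_SIZE  # first free byte for row data
        self.n_slots = 0

    def _slot_address(self, row_id):
        # entry 0 occupies the last 2 bytes of the page, entry 1 the 2 before, ...
        return PAGE_SIZE_IN_BYTES - U16.size * (row_id + 1)

    def insert(self, row):
        need = U16.size + len(row)               # length prefix + row bytes
        slot_addr = self._slot_address(self.n_slots)
        if self.free_start + need > slot_addr:
            raise ValueError("page full")
        U16.pack_into(self.data, self.free_start, len(row))
        self.data[self.free_start + 2:self.free_start + need] = row
        U16.pack_into(self.data, slot_addr, self.free_start)  # new offset entry
        self.free_start += need
        self.n_slots += 1
        return self.n_slots - 1                  # row_id of the new row

    def record_id_to_bytes(self, row_id):
        if not 0 <= row_id < self.n_slots:
            raise IndexError(row_id)
        (offset,) = U16.unpack_from(self.data, self._slot_address(row_id))
        (length,) = U16.unpack_from(self.data, offset)
        return bytes(self.data[offset + 2:offset + 2 + length])

p = SlottedPage()
a, b = p.insert(b"alice"), p.insert(b"bob")
```

The indirection through the offset array is what allows rows to be moved within the page (e.g. to compact free space) without changing their record IDs.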
Record Structure
procedure column_id_to_bytes returns bytes
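A sketch of a record format and of `column_id_to_bytes` over it. The layout is a hypothetical simplification, loosely modelled on SQL Server's fixed/variable row format: [2B status][2B end of fixed part][fixed columns][2B column count][NULL bitmap][2B variable column count][2B end offset per variable column][variable data]. A schema entry is (name, width) with width `None` for variable-length columns; column ids are 1-based like `syscolumns.colid`. The example schema has the same column types as the `Variable` table in the Inserting Data example.

```python
import struct

U16 = struct.Struct("<H")

def encode_row(schema, values):
    """Serialize values (bytes or None) using the hypothetical layout above."""
    fixed_end = 4 + sum(w for _, w in schema if w is not None)
    fixed = b"".join(v if v is not None else bytes(w)
                     for (_, w), v in zip(schema, values) if w is not None)
    bitmap = bytearray((len(schema) + 7) // 8)
    for i, v in enumerate(values):
        if v is None:
            bitmap[i // 8] |= 1 << (i % 8)       # bit i set means column i is NULL
    var_vals = [v or b"" for (_, w), v in zip(schema, values) if w is None]
    pos = fixed_end + 2 + len(bitmap) + 2 + 2 * len(var_vals)  # start of var data
    ends = []
    for v in var_vals:
        pos += len(v)
        ends.append(pos)                         # end offset of each var column
    return (U16.pack(0) + U16.pack(fixed_end) + fixed
            + U16.pack(len(schema)) + bytes(bitmap)
            + U16.pack(len(var_vals)) + b"".join(U16.pack(e) for e in ends)
            + b"".join(var_vals))

def column_id_to_bytes(schema, row, colid):
    """Return the bytes of column colid (1-based), or None if it is NULL."""
    i = colid - 1
    fixed_end = U16.unpack_from(row, 2)[0]
    n_cols = U16.unpack_from(row, fixed_end)[0]
    bitmap = row[fixed_end + 2:fixed_end + 2 + (n_cols + 7) // 8]
    if bitmap[i // 8] >> (i % 8) & 1:
        return None
    width = schema[i][1]
    if width is not None:  # fixed column: offset = 4 + widths of earlier fixed columns
        off = 4 + sum(w for _, w in schema[:i] if w is not None)
        return row[off:off + width]
    k = sum(1 for _, w in schema[:i] if w is None)  # k-th variable column
    var_hdr = fixed_end + 2 + len(bitmap)
    ends_at = var_hdr + 2
    n_var = U16.unpack_from(row, var_hdr)[0]
    start = ends_at + 2 * n_var if k == 0 else U16.unpack_from(row, ends_at + 2 * (k - 1))[0]
    end = U16.unpack_from(row, ends_at + 2 * k)[0]
    return row[start:end]

schema = [("Col1", 3), ("Col2", None), ("Col3", None), ("Col4", None), ("Col5", 2)]
row = encode_row(schema, [b"AAA", b"X" * 250, None, b"ABC", struct.pack("<h", 123)])
```

Fixed columns are found with pure arithmetic on the schema, while a variable column needs one extra read of the end-offset array.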
Storing Large Attributes
Inserting Data
CREATE TABLE Variable (
  Col1 char(3) NOT NULL,
  Col2 varchar(250) NOT NULL,
  Col3 varchar(5) NULL,
  Col4 varchar(20) NOT NULL,
  Col5 smallint NULL)
INSERT Variable VALUES ('AAA', REPLICATE('X', 250), NULL, 'ABC', 123)
syscolumns (figure): name, colid, xtype, length, xoffset for Col1 to Col5
sysindexes (figure): id, name, indid, first, minlen; values shown: Variable 0 0xC
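The catalog entries can be derived from the column definitions alone. The sketch below builds `syscolumns`-like rows (name, colid, length, xoffset) for the `Variable` table under assumptions of its own: rows start with 4 bytes of status and offset information, fixed-length columns get their byte offset in the row, and variable-length columns get a negative position (-1, -2, ...) in the variable-offset array. It is an illustration, not a reproduction of SQL Server's catalog; `type_width` and `catalog` are hypothetical names.

```python
import re

# Column definitions from the CREATE TABLE Variable statement.
columns = [("Col1", "char(3)"), ("Col2", "varchar(250)"), ("Col3", "varchar(5)"),
           ("Col4", "varchar(20)"), ("Col5", "smallint")]

FIXED_WIDTH = {"smallint": 2, "int": 4}  # byte widths of a few fixed-size types

def type_width(sql_type):
    """Return (is_fixed, max_length_in_bytes) for a simple SQL type."""
    m = re.fullmatch(r"(\w+)(?:\((\d+)\))?", sql_type)
    name, length = m.group(1), m.group(2)
    if name == "char":
        return True, int(length)
    if name == "varchar":
        return False, int(length)
    return True, FIXED_WIDTH[name]

def catalog(columns, header=4):
    """Fixed columns get a byte offset; variable columns a negative slot number."""
    rows, fixed_off, var_no = [], header, 0
    for colid, (name, sql_type) in enumerate(columns, start=1):
        fixed, length = type_width(sql_type)
        if fixed:
            rows.append((name, colid, length, fixed_off))
            fixed_off += length
        else:
            var_no += 1
            rows.append((name, colid, length, -var_no))
    return rows, fixed_off  # fixed_off is now the end of the fixed part

rows, fixed_end = catalog(columns)
```

Under these assumptions, `'AAA'` occupies bytes 4 to 6 of the inserted row and `123` bytes 7 to 8, even though Col5 is declared last: fixed-length columns are grouped together ahead of the variable-length ones.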
Operations on pages:
1. new row
2. row delete
3. row update: trx id
4. row update: roll pointer
5. row update: field
6. row offset array update
7. page header update
8. page trailer update
Other operations on pages:
9. checkpoint
Columnstore IDs
Explicit IDs
– Increase the size on disk
– Increase the amount of data transferred to RAM
Virtual IDs
– The offset of a value within its column serves as its ID
– Trades simple arithmetic for space, i.e., CPU time for IO time
– Assumes fixed-width attributes
– A challenge when using compression, since compressed values are generally no longer fixed-width
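A small comparison of the two options for one column of 4-byte integers (the values are made up for illustration). With explicit IDs every entry carries its (id, value) pair; with virtual IDs the ID is the position, and the value is found with one multiplication, which only works while every value has the same width.

```python
import struct

WIDTH = 4  # bytes per value; virtual IDs rely on this being fixed
values = [17, 42, 7, 99]

# Virtual IDs: values only, the ID is implied by the position.
column = struct.pack(f"<{len(values)}i", *values)

# Explicit IDs: (id, value) pairs, twice the space for this column.
explicit = b"".join(struct.pack("<ii", i, v) for i, v in enumerate(values))

def value_at(column_bytes, virtual_id):
    """Virtual ID -> value: offset = id * WIDTH, no stored ID needed."""
    return struct.unpack_from("<i", column_bytes, virtual_id * WIDTH)[0]

def reconstruct_row(columns, virtual_id):
    """Rebuild a row by reading the same position in every column."""
    return tuple(value_at(c, virtual_id) for c in columns)

other = struct.pack("<4i", 1, 2, 3, 4)  # a second column of the same table
```

Here the virtual-ID column takes 16 bytes against 32 for explicit IDs. If the values were compressed into variable-length codes, `virtual_id * WIDTH` would no longer point at the right value.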
Page Layout (figure source: IEEE)
– Row store: N-ary Storage Model (NSM)
– Decomposed Storage Model (DSM)
– PAX: Partition Attributes Across
PAX Model
– Invented by A. Ailamaki in the early 2000s
– Same IO pattern as NSM: a page still holds all the attributes of its records
– Great for cache utilization: the values of a column are packed together in cache lines
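The difference between NSM and PAX inside one page can be quantified by counting the cache lines touched when scanning a single attribute. The sketch assumes a hypothetical fixed-width schema, 100 records per page and 64-byte cache lines. Both layouts use the same 3200 bytes of the page (same IO), but PAX stores each attribute in its own minipage.

```python
CACHE_LINE = 64               # bytes; a common cache line size (assumption)

widths = [4, 8, 4, 16]        # hypothetical fixed-width schema, bytes per attribute
N = 100                       # records on the page
record_size = sum(widths)     # 32 bytes per record

def nsm_offset(r, a):
    """NSM: records one after another, attributes inside each record."""
    return r * record_size + sum(widths[:a])

def pax_offset(r, a):
    """PAX: same records on the same page, one minipage per attribute."""
    minipage_start = N * sum(widths[:a])
    return minipage_start + r * widths[a]

def lines_touched(offset_fn, a):
    """Distinct cache lines read when scanning attribute a of every record."""
    lines = set()
    for r in range(N):
        start = offset_fn(r, a)
        end = start + widths[a] - 1
        lines.update(range(start // CACHE_LINE, end // CACHE_LINE + 1))
    return len(lines)
```

Scanning the first attribute touches 50 cache lines under NSM (each line also carries the other attributes) but only 7 under PAX, where the 400 bytes of that attribute are contiguous.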