
Slide 1: Advanced Database Systems: DBS CB, 2nd Edition
Advanced Topics of Interest: In-Memory DB (IMDB) and Column-Oriented DB

Slide 2: Outline
- In-Memory Database (IMDB)
- Column-Oriented Database (C-Store)

Slide 3: In-Memory Database (IMDB)

Slide 4: Introduction
- In-Memory database system (IMDB): data resides permanently in main physical memory; a backup copy is (optionally) kept on disk.
- Disk-Resident database system (DRDB): data resides on disk and may be cached into memory for access.
- The main difference is that in an IMDB, the primary copy lives permanently in memory.

Slide 5: Questions About IMDB
- Is it reasonable to assume that the entire database fits in memory? Yes, for some applications!
- What is the difference between an IMDB and a DRDB with a very large cache? In a DRDB, even if all the data fits in memory, the structures and algorithms are designed for disk access.

Slide 6: Differences in Properties of Main Memory and Disk
- The access time for main memory is orders of magnitude less than for disk storage.
- Main memory is normally volatile, while disk storage is not.
- The layout of data on disk is much more critical than the layout of data in main memory.

Slide 7: Impact of Memory-Resident Data
The differences in the properties of main memory and disk have important implications for:
- Concurrency control
- Commit processing
- Access methods
- Data representation
- Query processing
- Recovery
- Performance

Slide 8: Concurrency Control
- Access to main memory is much faster than disk access, so transactions can be expected to complete more quickly in an IMDB system.
- Lock contention may not be as important as it is when the data is disk resident.

Slide 9: Commit Processing
- As protection against media failure, it is necessary to keep a backup copy and a log of transaction activity.
- The need for a stable log threatens to undermine the performance advantages that can be achieved with memory-resident data.
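
The slide does not name a fix, but one standard mitigation is group commit: batch the log records of many transactions into a single stable write so the flush cost is amortized. A minimal sketch, assuming a file-backed log (class and parameter names are illustrative, not from the slides):

```python
import os

class GroupCommitLog:
    """Sketch: amortize stable-log writes by flushing several transactions'
    records with one fsync. A real implementation would also flush on a
    timeout so a small batch is not stuck waiting for more commits."""
    def __init__(self, path, batch_size=32):
        self.f = open(path, "ab")
        self.batch_size = batch_size
        self.pending = []

    def commit(self, log_record: bytes):
        self.pending.append(log_record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        for rec in self.pending:
            self.f.write(rec)
        self.f.flush()
        os.fsync(self.f.fileno())   # one stable write covers the whole batch
        self.pending.clear()
```

With batch_size transactions per fsync, the stable-write cost per transaction drops by roughly that factor.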

Slide 10: Access Methods
- The costs to be minimized by the access structures (indexes) are different: computation time and memory space rather than disk accesses.

Slide 11: Data Representation
- Main-memory databases can take advantage of efficient pointer following for data representation.
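
A minimal sketch of what pointer following buys: since every record lives in memory, an index can store references to records instead of copies of key values, and a comparison just chases the pointer. The data and names here are illustrative, not from the paper:

```python
# Records are stored exactly once, as in-memory objects.
records = [
    {"name": "alice", "salary": 100},
    {"name": "bob", "salary": 90},
    {"name": "carol", "salary": 120},
]

# The index stores only pointers (Python references), kept sorted by key.
# Comparisons follow the pointer to the record instead of copying the key
# into the index -- cheap in RAM, where random access is uniformly fast.
index = sorted(records, key=lambda r: r["name"])

def lookup(name):
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if index[mid]["name"] < name:      # pointer following, no key copy
            lo = mid + 1
        else:
            hi = mid
    if lo < len(index) and index[lo]["name"] == name:
        return index[lo]
    return None

print(lookup("bob"))   # {'name': 'bob', 'salary': 90}
```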

Slide 12: A Study of Index Structures for Main Memory Database Management Systems
Tobin J. Lehman and Michael J. Carey, VLDB 1986

Slide 13: Disk Versus In-Memory
Primary goals of a disk-oriented index structure design:
- Minimize the number of disk accesses
- Minimize disk space
Primary goals of an in-memory index design:
- Reduce overall computation time
- Use as little memory as possible

Slide 14: Classic Index Structures
Arrays:
- Advantage: use minimal space, provided that the size is known in advance
- Disadvantage: impractical for anything but a read-only environment
AVL trees:
- A balanced binary search tree, kept balanced by executing rotation operations when needed
- Advantage: fast search
- Disadvantage: poor storage utilization

Slide 15: Classic Index Structures (cont.)
B-trees:
- Every node contains some ordered data items and pointers
- Good storage utilization
- Searching is reasonably fast
- Updating is also fast

Slide 16: Hash-Based Indexing
Chained bucket hashing:
- A static structure, used both in memory and on disk
- Advantage: fast, if the proper table size is known in advance
- Disadvantage: poor behavior in a dynamic environment
Extendible hashing:
- A dynamic hash table that grows with the data
- A hash node contains several data items and splits in two when an overflow occurs
- The directory grows in powers of two when a node overflows and has already reached the maximum depth for a particular directory size
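
A minimal sketch of chained bucket hashing as described above; the fixed bucket count is exactly what makes it static:

```python
class ChainedHashTable:
    """Static chained bucket hashing: the bucket count is fixed at build time,
    so chains grow without bound as data grows -- the 'dynamic environment'
    weakness named on the slide."""
    def __init__(self, nbuckets=1024):
        self.buckets = [[] for _ in range(nbuckets)]

    def insert(self, key, value):
        self.buckets[hash(key) % len(self.buckets)].append((key, value))

    def lookup(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return None
```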

Slide 17: Hash-Based Indexing (cont.)
Linear hashing:
- Uses a dynamic hash table
- Nodes are split in a predefined linear order
- Buckets can be ordered sequentially, allowing a bucket address to be calculated from a base address
- The event that triggers a node split can be based on storage utilization
Modified linear hashing:
- More oriented towards main memory
- Uses a directory that grows linearly
- Chained single-item nodes
- The splitting criterion is based on the average length of the hash chains
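
A sketch of the split-in-linear-order idea, using the 'modified' average-chain-length trigger; constants and names are illustrative:

```python
class LinearHashTable:
    """Sketch of linear hashing: buckets split in a fixed linear order, so a
    bucket's address is computable from the key and the split state alone.
    The split trigger here is average chain length (the 'modified' flavor)."""
    def __init__(self, n0=4, max_avg_chain=2.0):
        self.n0 = n0                     # initial bucket count
        self.level = 0                   # current doubling round
        self.next = 0                    # next bucket to split, in linear order
        self.buckets = [[] for _ in range(n0)]
        self.count = 0
        self.max_avg_chain = max_avg_chain

    def _addr(self, key):
        h = hash(key) % (self.n0 * 2 ** self.level)
        if h < self.next:                # already split this round: finer hash
            h = hash(key) % (self.n0 * 2 ** (self.level + 1))
        return h

    def insert(self, key, value):
        self.buckets[self._addr(key)].append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_avg_chain:
            self._split()

    def lookup(self, key):
        for k, v in self.buckets[self._addr(key)]:
            if k == key:
                return v
        return None

    def _split(self):
        self.buckets.append([])          # new bucket at the end
        old, self.buckets[self.next] = self.buckets[self.next], []
        self.next += 1
        if self.next == self.n0 * 2 ** self.level:
            self.level += 1              # finished a doubling round
            self.next = 0
        for k, v in old:                 # entries rehash to old or new bucket
            self.buckets[self._addr(k)].append((k, v))
```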

Slide 18: The T-Tree
- A binary tree with many elements kept in order in each node (evolved from the AVL tree and the B-tree)
- Intrinsic binary-search nature
- Good update and storage characteristics
- Every tree has an associated minimum and maximum count
- Internal nodes (nodes with two children) keep their occupancy in the range given by the minimum and maximum count
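
A minimal sketch of a T-tree node (field names and occupancy bounds are illustrative); each node holds an ordered array of items, so its smallest and largest items bound everything reachable through its children:

```python
MIN_COUNT, MAX_COUNT = 4, 8      # illustrative occupancy bounds

class TTreeNode:
    def __init__(self, items):
        self.items = sorted(items)   # many ordered elements per node
        self.left = None             # subtree with values < self.items[0]
        self.right = None            # subtree with values > self.items[-1]

    def bounds(self, x):
        """True if x falls between this node's minimum and maximum item."""
        return self.items[0] <= x <= self.items[-1]
```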

Slide 19: The T-Tree (figure)

Slide 20: Search Algorithm for the T-Tree
Similar to searching in a binary tree. Algorithm:
- Start at the root of the tree
- If the search value is less than the minimum value of the node, search down the left subtree
- If the search value is greater than the maximum value in the node, search the right subtree
- Else search the current node
- The search fails when a node is searched and the item is not found, or when a node that bounds the search value cannot be found
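
Continuing the TTreeNode sketch above, the search is a binary-tree descent that compares only against each node's minimum and maximum items, then binary-searches inside the bounding node:

```python
import bisect

def ttree_search(node, x):
    """Descend using node min/max bounds; binary-search the bounding node."""
    while node is not None:
        if x < node.items[0]:
            node = node.left
        elif x > node.items[-1]:
            node = node.right
        else:                            # this node bounds x: look inside it
            i = bisect.bisect_left(node.items, x)
            return node if i < len(node.items) and node.items[i] == x else None
    return None                          # no node bounds x: the search fails
```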

Slide 21: Insert Algorithm
Insert(x):
- Search to locate the bounding node
- If a bounding node is found:
  - Let A be this node
  - If the value fits, insert it into A and STOP
  - Else remove the minimum element Amin from the node, insert x, then go to the leaf containing the greatest lower bound of A and insert Amin into that leaf

Slide 22: Insert Algorithm (cont.)
- If a bounding node is not found:
  - Let A be the last node on the search path
  - If the insert value fits, insert it into the node
  - Else create a new leaf with x in it
- If a new leaf was added:
  - For each node in the search path (from the leaf to the root), if the heights of the two subtrees differ by more than one, rotate and STOP
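
A condensed sketch of the insert path, continuing the TTreeNode sketch above; rebalancing is left as a stub that would apply the rotations shown on slides 25-26, so treat this as an illustration of the slide's steps, not the paper's full pseudocode:

```python
import bisect

def rebalance(root, path):
    """Stub: walk `path` leaf-to-root and apply an LL/LR rotation at the first
    node whose subtree heights differ by more than one (slides 25-26)."""
    return root

def ttree_insert(root, x):
    if root is None:
        return TTreeNode([x])
    path, node = [], root
    while node is not None:                     # search for a bounding node
        path.append(node)
        if x < node.items[0]:
            node = node.left
        elif x > node.items[-1]:
            node = node.right
        else:                                   # bounding node found
            if len(node.items) < MAX_COUNT:     # value fits: insert and stop
                bisect.insort(node.items, x)
                return root
            amin = node.items.pop(0)            # full: evict minimum, insert x,
            bisect.insort(node.items, x)        # push amin toward the GLB leaf
            if node.left is None:
                node.left = TTreeNode([amin])
            else:
                leaf = node.left                # leaf holding A's greatest
                while leaf.right is not None:   # lower bound
                    leaf = leaf.right
                if len(leaf.items) < MAX_COUNT:
                    bisect.insort(leaf.items, amin)
                else:
                    leaf.right = TTreeNode([amin])
            return rebalance(root, path)
    last = path[-1]                             # no bounding node found
    if len(last.items) < MAX_COUNT:
        bisect.insort(last.items, x)            # x becomes last's new min or max
        return root
    if x < last.items[0]:
        last.left = TTreeNode([x])
    else:
        last.right = TTreeNode([x])
    return rebalance(root, path)
```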

Slide 23: Delete Algorithm
(1) Search for the node that bounds the delete value; search for the delete value within this node, reporting an error and stopping if it is not found.
(2) If the delete will not cause an underflow, delete the value and STOP. Else, if this is an internal node, delete the value and 'borrow' the greatest lower bound. Else delete the element.
(3) If the node is a half-leaf and can be merged with a leaf, do so, and go to (5).

Slide 24: Delete Algorithm (cont.)
(4) If the current node (a leaf) is not empty, STOP. Else free the node and go to (5).
(5) For every node along the path from the leaf up to the root, if the two subtrees of the node differ in height by more than one, perform a rotation operation. STOP when all nodes have been examined or a node with an even balance has been discovered.
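
A simplified sketch of the delete path under the same TTreeNode assumptions; step 3's half-leaf merging and step 4's node freeing need parent links and are omitted, and `rebalance` is the stub from the insert sketch:

```python
def ttree_delete(root, x):
    path, node = [], root
    while node is not None and not node.bounds(x):
        path.append(node)                        # (1) find the bounding node
        node = node.left if x < node.items[0] else node.right
    if node is None or x not in node.items:
        raise KeyError(x)                        # (1) delete value not found
    path.append(node)
    node.items.remove(x)                         # (2) delete the value
    if len(node.items) < MIN_COUNT and node.left and node.right:
        leaf = node.left                         # (2) internal-node underflow:
        while leaf.right is not None:            # borrow the greatest lower
            leaf = leaf.right                    # bound from the left subtree
        node.items.insert(0, leaf.items.pop())
    return rebalance(root, path)                 # (5) rotate along the path
```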

Slide 25: LL Rotation (figure)

Slide 26: LR Rotation (figure)
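
The rotations are the classic AVL ones applied to T-tree nodes. A sketch of the pointer surgery, assuming the TTreeNode fields from the earlier sketch (height bookkeeping omitted):

```python
def rotate_ll(a):
    """LL rotation: a's left subtree is too tall on its left side.
    a's left child b becomes the subtree root; a adopts b's right subtree."""
    b = a.left
    a.left, b.right = b.right, a
    return b                      # new subtree root

def rotate_lr(a):
    """LR rotation: a's left subtree is too tall on its right side.
    Rotate b-c left, then a-c right; c ends up as the subtree root."""
    b = a.left
    c = b.right
    b.right, a.left = c.left, c.right
    c.left, c.right = b, a
    return c                      # new subtree root
```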

Slide 27: Summary
- We introduced a new in-memory index structure, the T-tree.
- For unordered data, Modified Linear Hashing should give excellent performance for exact-match queries.
- For ordered data, the T-tree provides excellent overall performance for a mix of searches, inserts and deletes, and it does so at a relatively low cost in storage space.

Slide 28: But...
- Even though T-tree nodes hold many keys, only the two end keys (the minimum and maximum) are actually used for comparison during the descent.
- Since for every key in a node we store a pointer to the record, and most of the time those record pointers are not used, the space is 'wasted'.

Slide 29: Column-Oriented Database
C-Store: A Column-Oriented RDBMS; Michael Stonebraker

Slide 30: Traditional Row-Oriented Database
- Store the fields of one record contiguously on disk
- Use B-tree indexing
- Use small (e.g. 4K) disk blocks
- Align fields on byte or word boundaries
- Conventional (row-oriented) query optimizer and executor (technology from 1979)
- ARIES-style transactions

Slide 31: Terminology: "Row Store"
(figure: Record 1, Record 2, Record 3, Record 4 laid out one after another on disk)
E.g. DB2, Oracle, Sybase, SQLServer, ...

Slide 32: Row Stores Are Write-Optimized
- Can insert and delete a record in one physical write
- Good for on-line transaction processing (OLTP)
- But not for read-mostly applications:
  - Data warehouses
  - CRM

Slide 33: Elephants Have Extended Row Stores
- With bitmap indices
- Better sequential read
- Integration of "data cube" products
- Materialized views
But there may be a better idea...

Slide 34: Column Stores

Slide 35: At 100K Feet...
- Ad-hoc queries read 2 columns out of 20
- In a very large warehouse, the fact table is rarely clustered correctly
- A column store reads 10% of what a row store reads
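
The 10% figure is just the column ratio. A back-of-envelope sketch with illustrative sizes (row count and field width are assumptions, not from the slide):

```python
# A row store must scan whole records (no useful clustering), while a
# column store touches only the columns the query names.
n_rows = 100_000_000          # illustrative fact-table size
n_cols = 20
bytes_per_field = 4           # assume fixed-width fields for simplicity

row_store_bytes = n_rows * n_cols * bytes_per_field   # full-record scan
col_store_bytes = n_rows * 2 * bytes_per_field        # 2 of 20 columns

print(col_store_bytes / row_store_bytes)   # 0.1 -> the "reads 10%" claim
```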

Slide 36: C-Store (Column Store) Project
- Brandeis/Brown/MIT/UMass-Boston project
  - Usual suspects participating
  - Enough coded to get performance numbers for some queries
  - Complete status later
- Pioneering work:
  - Sybase IQ (early '90s)
  - MonetDB (see CIDR '05 for the most recent description)

Slide 37: C-Store Technical Ideas
- Compression: code the columns to save space
- No alignment
- Big disk blocks
- Only materialized views (perhaps many)
- Focus on sorting, not indexing; data ordered on anything, not just time
- Automatic physical DBMS design
- Optimize for grid computing
- Innovative redundancy
- Xacts, but no need for Mohan
- Column optimizer and executor

Slide 38: No Alignment
- Dense-pack the columns, e.g. a 5-bit field takes 5 bits
- Current CPU speed is going up faster than disk bandwidth, so it is faster to shift data in the CPU than to waste disk bandwidth
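
A sketch of dense packing: values of a 5-bit field are stored back-to-back with no byte alignment, and unpacking is a shift-and-mask, which is the "shift data in the CPU" trade the slide describes (the helper functions are illustrative):

```python
def pack(values, width):
    """Pack unsigned ints of `width` bits back-to-back into bytes."""
    buf, nbits = 0, 0
    for v in values:
        buf = (buf << width) | v
        nbits += width
    buf <<= (-nbits) % 8                  # pad the tail out to a whole byte
    return buf.to_bytes((nbits + 7) // 8, "big"), len(values)

def unpack(packed, width):
    data, n = packed
    buf = int.from_bytes(data, "big") >> ((-n * width) % 8)
    return [(buf >> (width * (n - 1 - i))) & ((1 << width) - 1)
            for i in range(n)]

vals = [3, 17, 31, 0, 9]                  # each value fits in 5 bits
packed = pack(vals, 5)
print(len(packed[0]), unpack(packed, 5))  # 4 bytes (vs 5 byte-aligned), round-trips
```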

Slide 39: Big Disk Blocks
- Tunable
- Big (minimum size is 64K)

Slide 40: Only Materialized Views
- A projection (materialized view) is some number of columns from a fact table, plus columns in a dimension table, with a 1-n join between the fact and dimension table
- Stored in the order of a storage key (or keys)
- Several may be stored, with a permutation, if necessary, to map between them
- The table (as the user specified it and sees it) is not stored!
- No secondary indexes (they are a one-column sorted MV plus a permutation, if you really want one)

Slide 41: Example
User view:
- EMP (name, age, salary, dept)
- Dept (dname, floor)
Possible set of MVs:
- MV-1 (name, dept, floor) in floor order
- MV-2 (salary, age) in age order
- MV-3 (dname, salary, name) in salary order
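
A sketch of what these projections are physically: each MV is just the named columns, stored as parallel arrays sorted on its sort key. The EMP/Dept rows below are made up for illustration:

```python
# Illustrative base data (the user-visible tables, which C-Store never stores).
EMP = [("alice", 30, 100, "sales"), ("bob", 25, 90, "eng"), ("carol", 35, 120, "eng")]
DEPT = {"sales": 1, "eng": 2}   # dname -> floor

# MV-1 (name, dept, floor) in floor order: fact columns plus a joined-in
# dimension column, materialized as parallel arrays sorted on floor.
rows = sorted(((n, d, DEPT[d]) for n, _, _, d in EMP), key=lambda r: r[2])
mv1 = {"name": [r[0] for r in rows],
       "dept": [r[1] for r in rows],
       "floor": [r[2] for r in rows]}

# MV-2 (salary, age) in age order.
rows = sorted(((s, a) for _, a, s, _ in EMP), key=lambda r: r[1])
mv2 = {"salary": [r[0] for r in rows], "age": [r[1] for r in rows]}

print(mv1["floor"])   # [1, 2, 2] -- each column is a dense, sorted array
```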

Slide 42: Automatic Physical DBMS Design
- There are not enough 4-star wizards to go around
- Accept a "training set" of queries and a space budget
- Choose the MVs auto-magically
- Re-optimize periodically based on a log of the interactions

Slide 43: Optimize for Grid Computing
- I.e. shared-nothing; DeWitt (Gamma) was right
- Horizontal partitioning and intra-query parallelism, as in Gamma

Slide 44: Innovative Redundancy
- Hardly any warehouse is recovered by a redo from the log; it takes too long!
- Store enough MVs at enough places to ensure K-safety
- Rebuild dead objects from elsewhere in the network
- K-safety is a DBMS-design problem!

Slide 45: Xacts: No Mohan
- Undo from a log (that does not need to be persistent)
- Redo by rebuilding from elsewhere in the network
- Snapshot isolation (run queries as of a tunable time in the recent past) to solve read-write conflicts
- Distributed Xacts without a prepare message (no two-phase commit)
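
A sketch of the "queries as of a time in the recent past" idea: keep versioned values and let a read-only query pick a snapshot epoch, so it never conflicts with writers. The epoch mechanics are illustrative, not C-Store's actual implementation:

```python
class VersionedStore:
    """Each write appends (epoch, value); a read at epoch t sees the latest
    version with epoch <= t, so read-only queries never block writers."""
    def __init__(self):
        self.versions = {}           # key -> list of (epoch, value), append-only
        self.epoch = 0

    def write(self, key, value):
        self.epoch += 1
        self.versions.setdefault(key, []).append((self.epoch, value))

    def read_asof(self, key, t):
        for e, v in reversed(self.versions.get(key, [])):
            if e <= t:
                return v
        return None

s = VersionedStore()
s.write("x", 1); snap = s.epoch     # a query runs "as of" this recent epoch
s.write("x", 2)
print(s.read_asof("x", snap))       # 1 -- unaffected by the later write
```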

Slide 46: The Storage (Sort) Key(s) Is Not Necessarily Time
- That would be too limiting
- So how do you do fast updates to dense-packed column storage that is not in entry sequence?

Slide 47: Solution: A Hybrid Store
(figure, three components:)
- Write-optimized column store (much like MonetDB)
- Tuple mover (a batch rebuilder)
- Read-optimized column store (what we have been talking about so far)
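
A sketch of the hybrid flow: inserts land in a small write-optimized store, and the tuple mover periodically drains them in bulk into the sorted, dense read store. Names and the threshold are illustrative:

```python
import bisect

class HybridStore:
    """Writes go to an unsorted WS; the tuple mover batch-merges WS into the
    sorted, read-optimized RS, so queries mostly scan dense sorted data."""
    def __init__(self, move_threshold=1000):
        self.ws = []                     # write-optimized: cheap appends
        self.rs = []                     # read-optimized: kept sorted
        self.move_threshold = move_threshold

    def insert(self, value):
        self.ws.append(value)
        if len(self.ws) >= self.move_threshold:
            self.move_tuples()

    def move_tuples(self):
        # One bulk merge is far cheaper than per-insert maintenance of RS.
        self.rs = sorted(self.rs + self.ws)
        self.ws.clear()

    def search(self, value):
        i = bisect.bisect_left(self.rs, value)
        return (i < len(self.rs) and self.rs[i] == value) or value in self.ws
```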

Slide 48: Column Optimizer and Executor
Column optimizer:
- Chooses the MVs on which to run the query (its most important task)
- Built around snowflake schemas, which are simple to optimize without exhaustive search
- Looking at extensions
Column executor:
- Column operations, not row operations
- Columns remain coded, if possible
- Late materialization of columns
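
A sketch of late materialization: operators work column-at-a-time and pass position lists between them, stitching rows together only at the very end. The column data echoes the MV example above; the operator style is illustrative:

```python
# Columns from a projection, stored as parallel arrays.
salary = [90, 100, 120, 80]
age    = [25, 30, 35, 40]
name   = ["bob", "alice", "carol", "dan"]

# Predicates run as column operations and produce position lists, not rows.
pos_salary = [i for i, s in enumerate(salary) if s >= 90]   # salary >= 90
pos_age    = [i for i in pos_salary if age[i] < 35]         # AND age < 35

# Only now -- late -- are the surviving positions materialized into rows.
result = [(name[i], salary[i], age[i]) for i in pos_age]
print(result)   # [('bob', 90, 25), ('alice', 100, 30)]
```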

Slide 49: Performance
- 100X a popular row store in 40% of the space
- 7X a popular row store in 1/6th of the space
- Code available with a BSD license

Slide 50: University Research
- Extension of the algorithms to non-snowflake schemas
- Study of L2 cache performance
- Study of coding strategies
- Study of executor options
- Study of recovery tactics
- Non-cursor interface
- Study of optimizer primitives

Slide 51: END

