TerarkDB Introduction
Peng Lei, Terark Inc. All rights reserved.


Why TerarkDB
- High compression with high performance
  - Not a space-time tradeoff: space and time are both reduced
  - Latency is stable and very low
- Schema with rich data types
  - Different data types are optimized in different ways
- Multiple indices on one table
- Column store, with column groups


Migrate to TerarkDB
- Native API: easy and simple, in C++ (with bindings for Java, Python, ...)
  - Leverages the full power of TerarkDB
- As a storage engine (MongoDB, MySQL, FUSE, ...)
  - Very low migration effort, zero training
  - High-level DB interface
  - Leverages most of the power of TerarkDB

Searchable index compression
- Indices are relatively small: compressed and kept in memory
- String fields: nested succinct tries
- Integer fields: small-range compression
- Fixed-length binary fields: configurable compression
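
The slide does not spell out how small-range compression works; a plausible sketch is to store the column's minimum once and bit-pack each value's offset from that minimum, which keeps random access O(1). The class and member names below are invented for illustration and are not TerarkDB's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of small-range compression for an integer column:
// keep the minimum as a base and store only bit-packed offsets from it.
struct SmallRangeColumn {
    int64_t base = 0;             // minimum value of the column
    unsigned bits = 0;            // bits needed per offset
    std::vector<uint64_t> words;  // packed offsets
    size_t count = 0;

    void build(const std::vector<int64_t>& values) {
        count = values.size();
        base = *std::min_element(values.begin(), values.end());
        int64_t range = *std::max_element(values.begin(), values.end()) - base;
        bits = 1;
        while ((int64_t(1) << bits) <= range) ++bits;
        words.assign((count * bits + 63) / 64, 0);
        for (size_t i = 0; i < count; ++i) {
            uint64_t off = uint64_t(values[i] - base);
            size_t bit = i * bits;
            words[bit / 64] |= off << (bit % 64);
            if (bit % 64 + bits > 64)                 // offset straddles a word
                words[bit / 64 + 1] |= off >> (64 - bit % 64);
        }
    }

    int64_t get(size_t i) const {                     // O(1) random access
        size_t bit = i * bits;
        uint64_t off = words[bit / 64] >> (bit % 64);
        if (bit % 64 + bits > 64)
            off |= words[bit / 64 + 1] << (64 - bit % 64);
        uint64_t mask = (bits == 64) ? ~uint64_t(0) : ((uint64_t(1) << bits) - 1);
        return base + int64_t(off & mask);
    }
};
```

Four values in the range [1000, 1007] need only 3 bits each instead of 64, which is where the compression comes from.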

Seekable data compression: Concepts
- Traditional database compression
  - Compresses multiple records into one block/page
  - Compressed on disk/SSD, uncompressed in memory: double caching
  - Large blocks gain compression ratio but lose read speed
- Our seekable data compression
  - Exact reads by record id, with no extra decompression
  - The larger the data set, the higher the compression ratio
  - No need to cache uncompressed records, so all free memory is left to the file system cache
  - Much faster read speed
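
The key idea of "exact read by record id" can be shown with a toy store that keeps a per-record offset index next to the payload, so reading record i touches only that record's bytes; a block store would have to decompress the whole enclosing block first. This sketch uses invented names and elides the compression itself to keep the addressing scheme visible.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy seekable store: offsets[i] marks where record i starts in the payload,
// so any record can be extracted directly without scanning or unpacking a block.
class SeekableStore {
    std::string payload;            // concatenated (notionally compressed) records
    std::vector<uint64_t> offsets;  // offsets[i] = start of record i; back() = end
public:
    SeekableStore() { offsets.push_back(0); }

    uint64_t append(const std::string& rec) {   // returns the new record id
        payload += rec;
        offsets.push_back(payload.size());
        return offsets.size() - 2;
    }

    std::string read(uint64_t id) const {       // exact read by record id
        return payload.substr(offsets[id], offsets[id + 1] - offsets[id]);
    }
};
```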

Seekable data compression: Algorithms
- Small binary and string fields: nested succinct tries
  - Very high compression ratio
  - Relatively slow reads (but much faster than block compression)
- Large binary and string fields: global + local dictionary compression (an LZ77 variation)
  - High compression ratio: higher than gzip, sometimes beats bzip2
  - Very fast reads (at memcpy speed)

TerarkDB Architecture
(architecture diagram omitted from the transcript)

Architecture highlights
- Novel technologies integrated together
- Loosely coupled components: flexible and extensible
- Transparent on-disk directory/file organization
- Minimum overhead

Glossary
- Column/Field: a typed atomic object
- Row/Record: an object comprising multiple columns
- Record id: an invariant integer record identifier (akin to an object pointer)
- Table: a collection of many records
- Index: an ordered map from keys (one or more fields) to record ids
- Unique index: an index in which all keys are distinct
- Column group: a collection holding a subset of the columns, addressed by record id
- Segment: a subset of a table in which all record ids are contiguous

A table is a 2D array

              Delmark (logical, physical)   Column[0]  Column[1]  ...  Column[N-1]
  Record[0]     0,0
  Record[1]     0,0
  Record[2]     1,0    logically deleted but still exists
  ...           1,1    physically deleted (must also be logically deleted)

- The record id is conceptually the array index of the first dimension; deleted records still occupy their record id(s).
- Logical delmark: the record is invisible.
- Physical delmark: the record doesn't exist.
- A physically deleted record must also be logically deleted; the physical deletion mark exists to keep record ids invariant.
- The logical-to-physical id mapping uses rank-select, so its overhead is very low (typically under 1%, even 0.1%).
- If there are no physical deletions, logical and physical ids are identical, and the mapping overhead is avoided entirely.

Record id invariant
- Same record id, same data
  - Reading the same record id returns the same data
  - Searching the same key returns the same record id
  - Segment changes (compression, merging, purging) do not affect record ids
- Record id invariant lifetime (configurable)
  - Permanent: invariant across table reloads, so the record id can serve as a permanent external id
  - Table life: physically deleted record ids are squeezed out on reload, which minimizes the id space and can slightly improve performance

Segment, Index, Column Group

            Index[0]  Index[1]  Column[0]  ...  Column[N-1]
  Seg[0]
  Seg[1]
  Seg[2]
  ...

(each segment holds its own piece of every index and column group)

- Index operations:
  1. Search a key to get its record id; iterate keys in sorted order
  2. Read the key back by record id, which reduces storage size: indexed columns do not need to be stored again in column groups, whereas traditional databases store all columns
- Non-index columns are stored in column groups, which are defined in the table schema; this is fine-grained column store.
- Typical column groups are compressed by the seekable data compression algorithms.
- Fixed-length column groups can be configured as inplace-updatable.

Two-stage searches
1. Search a key to get a (logical) record id
   - Exact search: fastest
   - Range search: via an index iterator
   - Regex search: makes a best effort to avoid a linear scan
2. Read record data by record id
   - Read the full record
   - Read specified columns
   - Read specified column groups (fastest)
   - Update, for inplace-updatable column groups
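
The two stages can be modeled with ordinary containers standing in for TerarkDB's compressed structures: a sorted index resolves a key to a record id, and then only the requested column group is read by that id. All type and member names below are invented for illustration, not TerarkDB's real API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy two-stage search. std::map keeps keys sorted, so it also supports the
// iterator-based range search mentioned on the slide (via lower_bound).
struct TinyTable {
    std::map<std::string, uint32_t> index;   // stage 1: key -> record id
    std::vector<std::string> colgroup_name;  // one column group...
    std::vector<int64_t>     colgroup_age;   // ...and another

    int64_t exactSearch(const std::string& key) const {  // stage 1
        auto it = index.find(key);
        return it == index.end() ? -1 : int64_t(it->second);
    }
    // Stage 2: read just one column group by record id (the fast path).
    int64_t readAge(uint32_t id) const { return colgroup_age[id]; }
};
```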

Writable & Readonly segments
- Writable segment
  - Can be writing or frozen
  - Implemented by a traditional database
  - Not compressed; access is slower than in a readonly segment
- Readonly segment (the core competitiveness)
  - High compression and high performance
  - Stable latency: no slow queries, much better P99 latency
  - No dedicated cache needed (double caching is gone)
  - Fixed-length columns are still inplace-updatable

Writing & Frozen segments
- A writing segment is a traditional database
  - It becomes frozen when it grows large enough; a new writable segment is then created and becomes the new writing segment
  - Insertions/updates/deletions happen in realtime and never block user threads
- A frozen segment
  - May be a readonly segment (readonly segments are always frozen)
  - May be a writable segment (waiting to be compressed, or being compressed)
  - Deletions (and inplace updates) happen in realtime
  - Changes may be registered for later syncing into the compressed (or merged) readonly segment
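
The freeze-and-replace cycle can be modeled as a tiny state machine: when the writing segment crosses a size threshold it is frozen and queued, a fresh writing segment takes its place so inserts never block, and a background step later turns the frozen segment into a readonly one. All names are invented, and real background threads are only simulated here.

```cpp
#include <cstddef>
#include <deque>
#include <memory>
#include <vector>

// Minimal sketch of the segment lifecycle: Writing -> Frozen -> Readonly.
struct Segment {
    enum State { Writing, Frozen, Readonly } state = Writing;
    size_t rows = 0;
};

class TableSketch {
    size_t freeze_threshold;
public:
    std::vector<std::unique_ptr<Segment>> segments;
    std::deque<Segment*> compression_queue;   // consumed by background threads

    explicit TableSketch(size_t threshold) : freeze_threshold(threshold) {
        segments.push_back(std::make_unique<Segment>());
    }
    Segment& writing() { return *segments.back(); }

    void insert() {
        writing().rows++;
        if (writing().rows >= freeze_threshold) {
            writing().state = Segment::Frozen;        // stop writing here
            compression_queue.push_back(&writing());  // compress in background
            segments.push_back(std::make_unique<Segment>()); // new writing seg
        }
    }
    // Background step: turn a frozen writable segment into a readonly one.
    void compressOne() {
        if (compression_queue.empty()) return;
        compression_queue.front()->state = Segment::Readonly;
        compression_queue.pop_front();
    }
};
```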

Writing, Writable, Frozen, Readonly

(diagram: the writing segment is the one writable segment that is not frozen; other writable segments are frozen, and readonly segments are always frozen)

The writing segment
- The writing segment is the newest writable segment
  - It is the first place new records go
  - Records in the writing segment can be updated directly
  - Hot data (especially frequently updated records) are likely to live here
- Index synchronization on writes (insert/update/delete)
  - Index sync can slow down insertion
  - Index sync can be disabled, for example when batch-importing data with no concurrent search operations; this can significantly improve performance
  - If not synced, an inserted record cannot be found through any index, but it is still accessible by record id

Frozen segments
- A frozen segment is either writable or readonly
- Records in a frozen segment are readonly (even in writable segments), except for inplace-updatable column groups
  - Deletion: set the logical deletion mark
  - Update: set the logical deletion mark and create a new record in the writing segment (even for writable segments)
- Physical deletion
  - Permanently purges the logically deleted records
  - Requires rebuilding the readonly segments

Readonly segments
- Most data lives in readonly segments (say 99% of the total)
- Fast searchable index compression
- Fast seekable data compression (up to 7 GB/s at an 8x compression ratio)
- Built by background threads; user threads are not blocked
- The larger the segment, the higher the compression ratio and the read speed
- Optimized for fine-grained column store

Building readonly segments
- Building a readonly segment means one of:
  - Compressing a writable segment into one readonly segment
  - Purging the logically deleted records
  - Merging multiple readonly segments into one readonly segment
- Once a writing segment is frozen
  - It is put into the compression queue and compressed into a readonly segment by background threads
  - Compression is usually slower than insertion, so many threads may run compression concurrently
- Once compressing (or merging/purging) is done
  - The source segments are replaced by the result segment and then deleted
  - Updates/deletions registered during compression are synced into the result
  - The record id invariant is kept: the same record id yields the same data

Inplace-updatable column groups
- Principle: less pain, more gain
- Only for fixed-length column groups
  - Cells can be accessed directly by memory address
  - Can be implemented with very low overhead
- Works for all segments, including readonly segments
- While a segment is being compressed, merged, or purged
  - Updates are registered, then synced when compressing/merging completes
  - User threads are not blocked

Questions