Published by Jazmine Anger. Modified over 9 years ago.
1
MVCC on Flash Memory
Fan Yulei, Lab of WAMDM, School of Information, Renmin University of China, Beijing, China, 2009-06-13
2
Outline
  Motivation
  MVCC
  Berkeley DB
  PostgreSQL
  Future work
3
Motivation
  Characteristics: not-in-place update
  (Figure: HDD vs. flash)
4
Motivation
  Transaction
    CC: 2PL, MVCC, conflict graph, timestamp, index CC
    Recovery: log, transaction, media
  CC
    2PL: 1st phase: lock; 2nd phase: release lock; kinds of lock
    MVCC: multiple versions; snapshot isolation
    Conflict graph: directed acyclic graph
    Timestamp: timestamp ordering
    Index: B+-tree
  Recovery
    Log: log file & data file; checkpoint: D & S
    Transaction: read log file; undo & redo
    Media: backup database; hot standby: mirrored media
5
MVCC
  Multiversion Schedule & Monoversion Schedule
  Monoversion schedule
    s  = r_1(x) w_1(x) r_2(x) w_2(y) r_1(y) w_1(z) c_1 c_2
    Conflict cycle: t_1, t_2
    s' = r_1(x) w_1(x) r_2(x) r_1(y) w_2(y) w_1(z) c_1 c_2
  Multiversion schedule
    m = r_1(x_0) w_1(x_1) r_2(x_1) w_2(y_2) r_1(y_0) w_1(z_1) c_1 c_2
    Version function h: h(r_i(x)) = w_j(x) and h(w_i(x)) = w_i(x)
  Monoversion schedule
    m = r_1(x_0) w_1(x_1) r_2(x_1) w_2(y_2) r_1(y_2) w_1(z_1) c_1 c_2
    s = r_1(x) w_1(x) r_2(x) w_2(y) r_1(y) w_1(z) c_1 c_2
  A monoversion schedule is a special case of a multiversion schedule
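The conflict cycle between t_1 and t_2 in s, and its absence in s', can be checked mechanically. The following is a minimal Python sketch, assuming the usual definition of a conflict (two operations of different transactions on the same item, at least one of them a write); the helper names `parse`, `conflict_edges` and `has_cycle` are hypothetical and not part of the slides.

```python
from itertools import combinations

def parse(schedule):
    """Turn 'r1(x) w1(x) c1' into [('r', 1, 'x'), ('w', 1, 'x')]; commits are skipped."""
    ops = []
    for token in schedule.split():
        if token[0] in "rw":
            txn, item = token[1:].rstrip(")").split("(")
            ops.append((token[0], int(txn), item))
    return ops

def conflict_edges(schedule):
    """Edge (a, b) means: an operation of t_a precedes a conflicting operation of t_b."""
    edges = set()
    # combinations() keeps schedule order, so the first element always comes earlier.
    for (k1, t1, x1), (k2, t2, x2) in combinations(parse(schedule), 2):
        if t1 != t2 and x1 == x2 and "w" in (k1, k2):
            edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search for a back edge in the directed graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "done":
            return False
        if state.get(node) == "visiting":
            return True
        state[node] = "visiting"
        if any(visit(nxt) for nxt in graph[node]):
            return True
        state[node] = "done"
        return False

    return any(visit(node) for node in graph)

S = "r1(x) w1(x) r2(x) w2(y) r1(y) w1(z) c1 c2"
S_PRIME = "r1(x) w1(x) r2(x) r1(y) w2(y) w1(z) c1 c2"
print(has_cycle(conflict_edges(S)), has_cycle(conflict_edges(S_PRIME)))
```

In s the edge from t_2 to t_1 comes from w_2(y) preceding r_1(y); in the multiversion schedule m the same read is served from y_0 instead, which is why m behaves like s'.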
6
MVCC
  Traditional conflict
    s = w_0(x) c_0 w_1(x) c_1 r_2(x) w_2(y) c_2
    m = w_0(x_0) c_0 w_1(x_1) c_1 r_2(x_0) w_2(y_2) c_2
  View equivalence
    Reads-from relationship
      RF(m) := {(t_i, x, t_j) | r_j(x_i) ∈ OP(m) & t_i, t_j ∈ trans(m)}
    View equivalent
      trans(m) = trans(m') and RF(m) = RF(m')
  Example
    m  = w_0(x_0) w_0(y_0) c_0 r_3(x_0) w_3(x_3) c_3 w_1(x_1) c_1 r_2(x_1) w_2(y_2) c_2
    m' = w_0(x_0) w_0(y_0) c_0 w_1(x_1) c_1 r_2(x_1) r_3(x_0) w_2(y_2) w_3(x_3) c_3 c_2
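The reads-from relation and the view-equivalence test for the example pair m and m' can be computed directly from the schedule text. This is a small Python sketch under the assumption that a schedule is written as space-separated tokens such as `r3(x0)`; the function names are illustrative only.

```python
import re

# Matches versioned operations such as r3(x0) or w1(x1); commit tokens (c0) do not match.
OP = re.compile(r"([rw])(\d+)\(([a-z])(\d+)\)")

def reads_from(schedule):
    """RF(m): triples (i, x, j) such that r_j(x_i) occurs in m."""
    return {(int(version), item, int(txn))
            for kind, txn, item, version in OP.findall(schedule)
            if kind == "r"}

def transactions(schedule):
    """trans(m), taken here as the transactions that issue a read or write."""
    return {int(txn) for _, txn, _, _ in OP.findall(schedule)}

def view_equivalent(m1, m2):
    """trans(m1) = trans(m2) and RF(m1) = RF(m2)."""
    return transactions(m1) == transactions(m2) and reads_from(m1) == reads_from(m2)

M = "w0(x0) w0(y0) c0 r3(x0) w3(x3) c3 w1(x1) c1 r2(x1) w2(y2) c2"
M_PRIME = "w0(x0) w0(y0) c0 w1(x1) c1 r2(x1) r3(x0) w2(y2) w3(x3) c3 c2"
print(sorted(reads_from(M)))
print(view_equivalent(M, M_PRIME))
```

Both schedules yield the same two triples, (t_0, x, t_3) and (t_1, x, t_2), so they are view equivalent even though their operations are interleaved differently.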
7
MVCC
  Multiversion View Serializability
  Serializable but not view equivalent
    m = w_0(x_0) w_0(y_0) c_0 r_1(x_0) r_1(y_0) w_1(x_1) w_1(y_1) c_1 r_2(x_0) r_2(y_1) c_2
    s = w_0(x) w_0(y) c_0 r_1(x) r_1(y) w_1(x) w_1(y) c_1 r_2(x) r_2(y) c_2
  MVSR: m is multiversion view serializable if there is an m' such that
    m' is a serial monoversion schedule
    trans(m) = trans(m')
    m and m' are view equivalent
  Example
    m  = w_0(x_0) w_0(y_0) c_0 w_1(x_1) c_1 r_2(x_1) r_3(x_0) w_3(x_3) c_3 w_2(y_2) c_2
    m' = w_0(x_0) w_0(y_0) c_0 r_3(x_0) w_3(x_3) c_3 w_1(x_1) c_1 r_2(x_1) w_2(y_2) c_2
    s  = w_0(x) w_0(y) c_0 r_3(x) w_3(x) c_3 w_1(x) c_1 r_2(x) w_2(y) c_2
8
MVCC
  Conflict graph
    G(m) = (V, E)
    V = trans(m); E = {(t_i, t_j) | r_j(x_i) ∈ OP(m) & t_i, t_j ∈ trans(m)}
    m and m' are view equivalent => G(m) = G(m')
  Version order
    m = w_0(x_0) w_0(y_0) w_0(z_0) c_0 r_1(x_0) r_2(x_0) r_2(z_0) r_3(z_0) w_1(y_1) w_2(x_2) w_3(y_3) w_3(z_3) c_1 c_2 c_3 r_4(x_2) r_4(y_3) r_4(z_3) c_4
    Version order = {x_0 « x_2, y_0 « y_1 « y_3, z_0 « z_3}
  MVSG
    MVSG = G(m) + version order
    For r_k(x_j) and w_i(x_i) with i, j, k pairwise distinct:
      if x_i « x_j then (t_i, t_j) ∈ E; else (t_k, t_i) ∈ E
    m ∈ MVSR iff there is a version order « for which MVSG(m, «) has no cycle
  (Figure: MVSG with nodes t_0, t_2, t_3, t_1, t_4; labels r_2(x_0) r_2(y_1) and r_2(x_1) r_2(y_0))
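The MVSG construction for the example schedule and its version order can be sketched in Python as follows. The representation is an assumption made for illustration: the schedule is a string of tokens like `r4(x2)`, and the version order is a dictionary mapping each item to the list of its version indices from oldest to newest. The names `mvsg_edges` and `is_acyclic` are hypothetical.

```python
import re

OP = re.compile(r"([rw])(\d+)\(([a-z])(\d+)\)")

def mvsg_edges(schedule, version_order):
    """Reads-from edges of G(m) plus the edges induced by the version order."""
    ops = [(kind, int(txn), item, int(ver)) for kind, txn, item, ver in OP.findall(schedule)]
    reads = [(txn, item, ver) for kind, txn, item, ver in ops if kind == "r"]
    writes = [(txn, item) for kind, txn, item, _ in ops if kind == "w"]

    # G(m): t_j -> t_k whenever t_k reads a version written by t_j.
    edges = {(j, k) for k, _, j in reads if j != k}

    # For r_k(x_j) and w_i(x_i) with i, j, k pairwise distinct:
    # x_i « x_j gives t_i -> t_j, otherwise t_k -> t_i.
    for k, item, j in reads:
        rank = version_order[item].index
        for i, written_item in writes:
            if written_item != item or len({i, j, k}) < 3:
                continue
            edges.add((i, j) if rank(i) < rank(j) else (k, i))
    return edges

def is_acyclic(edges):
    """Repeatedly strip nodes without incoming edges; a leftover means a cycle."""
    nodes = {n for edge in edges for n in edge}
    remaining = set(edges)
    while nodes:
        sources = {n for n in nodes if all(b != n for _, b in remaining)}
        if not sources:
            return False
        nodes -= sources
        remaining = {(a, b) for a, b in remaining if a not in sources}
    return True

M = ("w0(x0) w0(y0) w0(z0) c0 r1(x0) r2(x0) r2(z0) r3(z0) "
     "w1(y1) w2(x2) w3(y3) w3(z3) c1 c2 c3 r4(x2) r4(y3) r4(z3) c4")
VERSION_ORDER = {"x": [0, 2], "y": [0, 1, 3], "z": [0, 3]}

edges = mvsg_edges(M, VERSION_ORDER)
print(sorted(edges), is_acyclic(edges))
```

For this schedule the version order contributes the edges t_1 -> t_2, t_1 -> t_3 and t_2 -> t_3 on top of the reads-from edges, and the resulting graph admits the order t_0, t_1, t_2, t_3, t_4.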
9
MVCC
  Multiversion conflict
    r_i(x_j) and w_k(x_k) with r_i(x_j) < w_k(x_k)
  Multiversion conflict serializability (MCSR)
    m' is a serial monoversion schedule
    trans(m) = trans(m')
    Every pair of operations in multiversion conflict has the same ordering in m and m'
  Multiversion conflict graph
    E = {(t_i, t_k) | r_i(x_j) < w_k(x_k)}
    m ∈ MCSR iff the multiversion conflict graph of m has no cycle
  (Figure: nested classes of schedules: all, MVSR, MCSR, VSR, CSR)
10
MVCC
  Limit the number of versions: k = 2
    w_0(x_0) c_0 r_1(x_0) w_3(x_3) c_3 w_1(x_1) c_1 r_2(x_1) w_2(x_2) c_2
    w_0(x_0) c_0 r_1(x_0) w_1(x_1) c_1 r_2(x_1) w_2(x_2) c_2 w_3(x_3) c_3
    w_0(x_0) c_0 r_1(x_0) w_1(x_1) c_1 w_3(x_3) c_3 r_2(x_3) w_2(x_2) c_2
    w_0(x_0) c_0 r_2(x_0) w_2(x_2) c_2 r_1(x_2) w_1(x_1) c_1 w_3(x_3) c_3
    w_0(x_0) c_0 r_2(x_0) w_2(x_2) c_2 w_3(x_3) c_3 r_1(x_3) w_1(x_1) c_1
    w_0(x_0) c_0 w_3(x_3) c_3 r_1(x_3) w_1(x_1) c_1 r_2(x_1) w_2(x_2) c_2
    w_0(x_0) c_0 w_3(x_3) c_3 r_2(x_3) w_2(x_2) c_2 r_1(x_2) w_1(x_1) c_1
  k-version view serializability (kVSR)
    Serializable
    View equivalent
    k newest/nearest versions
  Hierarchy relationship
  (Figure labels: x_1, x_2; x_2, x_3; x_1, x_3; x_1, x_2)
11
MVCC
  MVCC protocols
    MVTO (multiversion timestamp ordering)
    MV2PL
      2V2PL: three kinds of locks: rl, wl, cl
    MVSGT
    ROMV: read-only transactions
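As an illustration of the first protocol in the list, here is a minimal Python sketch of the textbook MVTO read and write rules for a single data item. It is a simplification under stated assumptions: each transaction is identified by its timestamp, an initial version with timestamp 0 exists, and the commit-ordering rule (delaying a commit until the writers of all versions read have committed) is left out. The class and method names are hypothetical.

```python
class MVTOItem:
    """Versions of one data item under simplified MVTO rules."""

    def __init__(self, initial="initial"):
        self.versions = {0: initial}  # writer timestamp -> value
        self.reads = []               # (version timestamp, reader timestamp)

    def read(self, ts):
        """A read is never rejected: it gets the newest version with timestamp <= ts."""
        version = max(v for v in self.versions if v <= ts)
        self.reads.append((version, ts))
        return self.versions[version]

    def write(self, ts, value):
        """Reject the write if a younger reader already read an older version,
        i.e. some recorded read r_j(x_k) has ts(t_k) < ts < ts(t_j)."""
        if any(version < ts < reader for version, reader in self.reads):
            return False  # the writing transaction would have to abort
        self.versions[ts] = value
        return True

item = MVTOItem()
print(item.read(3))         # reader 3 is served the initial version
print(item.write(1, "a"))   # too late: reader 3 should have seen this version
print(item.write(5, "b"))   # accepted, creates the version with timestamp 5
print(item.read(4), item.read(6))
```

The point of the sketch is that reads always succeed by picking an older version, while only writes that would invalidate an earlier read are rejected.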
12
Berkeley DB
  Five components, each available as a standalone utility and as one or more library interfaces

  Component                        Standalone utility   Library interfaces
  Deadlock detection               db_deadlock          DB_ENV->lock_detect, DB_ENV->set_lk_detect
  Checkpoints                      db_checkpoint        DB_ENV->txn_checkpoint
  Database and log file archival   db_archive           DB_ENV->log_archive
  Log file removal                 db_archive           DB_ENV->log_archive
  Recovery procedures              db_recover           DB_ENV->open
13
Berkeley DB
  Transaction API
  Transaction subsystem and related methods
    DB_ENV->txn_checkpoint, DB_ENV->txn_recover, DB_ENV->txn_stat
    DB_ENV->open, DB_ENV->close, DB_ENV->remove
  Transaction subsystem configuration
    DB_ENV->set_timeout, DB_ENV->set_tx_max, DB_ENV->set_tx_timestamp
  Transaction operations
    DB_ENV->txn_begin
    DB_TXN->abort, DB_TXN->commit, DB_TXN->discard, DB_TXN->id
    DB_TXN->prepare, DB_TXN->set_name, DB_TXN->set_timeout
14
Berkeley DB
  2PL in Berkeley DB
    Locks are released during DB_TXN->abort or DB_TXN->commit.
    Guideline: if possible, use nested transactions to protect the parts of your transaction most likely to deadlock.
  Transaction limits
    Transaction IDs: 31-bit unsigned integer (0x80000000)
    Cursors: cannot span transactions; a cursor must be opened and closed within a single transaction
    Multiple threads of control:
15
Berkeley DB
  Several filesystem operations in Berkeley DB
    Disk seek to database file
    Database file read
    Disk seek to log file
    Log file write
    Disk seek to update log file metadata
    Log metadata write
    Flush log file information to disk
    Flush log file metadata to disk
  Ways to increase transactional throughput
    Berkeley DB supports group commit
    Additional tuning parameters
      Tune the size of the database cache
      Put the database and the log files on different disks
      Set the filesystem configuration
      Upgrade your hardware
      Turn on the DB_TXN_WRITE_NOSYNC or DB_TXN_NOSYNC flag: ACI, but not D
16
PostgreSQL
  PG: each transaction sees a snapshot of the data
    Reading never blocks writing
    Writing never blocks reading
  Three undesirable phenomena
    Dirty reads, non-repeatable reads, phantom reads
  SQL transaction isolation levels

  Isolation level     Dirty read     Non-repeatable read   Phantom read
  Read uncommitted    Possible       Possible              Possible
  Read committed      Not possible   Possible              Possible
  Repeatable read     Not possible   Not possible          Possible
  Serializable        Not possible   Not possible          Not possible
17
PostgreSQL
  Read Committed isolation level
    The default isolation level
    A SELECT query sees only data committed before the query began
    The SELECT does see the effects of previous updates executed within the same transaction
    Two successive SELECTs can see different data if other transactions commit changes between them
    NOT adequate for many applications that do complex queries and updates
  Serializable isolation level
    This level emulates serial transaction execution.
18
PostgreSQL
  Data consistency checks at the application level
    Readers in PostgreSQL don't lock data
    To ensure the current existence of a row and protect it against concurrent updates, one must use SELECT FOR UPDATE or an appropriate LOCK TABLE statement. (SELECT FOR UPDATE locks just the returned rows against concurrent updates, while LOCK TABLE protects the whole table.)
  Locks and tables
    Table-level locks
    Row-level locks: acquired when rows are being updated
  Locks and indexes
    GiST and R-tree: locks released after the statement is done
    Hash index: locks released after the page is processed
    B-tree: locks released immediately after each index tuple is fetched/inserted
19
Column headings abbreviate the lock modes in row order (ASL = AccessShareLock, ..., AEL = AccessExclusiveLock).

                             ASL  RSL  REL  SUEL  SL   SREL  EL   AEL
  AccessShareLock             √    √    √    √    √    √     √    ×
  RowShareLock                √    √    √    √    √    √     ×    ×
  RowExclusiveLock            √    √    √    √    ×    ×     ×    ×
  ShareUpdateExclusiveLock    √    √    √    ×    ×    ×     ×    ×
  ShareLock                   √    √    ×    ×    √    ×     ×    ×
  ShareRowExclusiveLock       √    √    ×    ×    ×    ×     ×    ×
  ExclusiveLock               √    ×    ×    ×    ×    ×     ×    ×
  AccessExclusiveLock         ×    ×    ×    ×    ×    ×     ×    ×

  Columns: SR DR IR UR AT DT CI LT
  AccessShareLock             √√√√√√√√
  RowShareLock                √√
  RowExclusiveLock            √√√√
  ShareUpdateExclusiveLock    √
  ShareLock                   √√
  ShareRowExclusiveLock       √
  ExclusiveLock               √
  AccessExclusiveLock         √√√
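The lock compatibility matrix on this slide translates directly into a lookup table. The Python sketch below encodes the eight rows as given (Y for √, N for ×) and is only an illustration of how a lock manager might consult such a matrix; the helper names `compatible` and `can_grant` are hypothetical and do not correspond to PostgreSQL internals.

```python
MODES = [
    "AccessShareLock", "RowShareLock", "RowExclusiveLock",
    "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
    "ExclusiveLock", "AccessExclusiveLock",
]

# One string per row of the matrix, columns in the same order as MODES.
ROWS = [
    "YYYYYYYN",  # AccessShareLock
    "YYYYYYNN",  # RowShareLock
    "YYYYNNNN",  # RowExclusiveLock
    "YYYNNNNN",  # ShareUpdateExclusiveLock
    "YYNNYNNN",  # ShareLock
    "YYNNNNNN",  # ShareRowExclusiveLock
    "YNNNNNNN",  # ExclusiveLock
    "NNNNNNNN",  # AccessExclusiveLock
]

COMPATIBLE = {
    (held, requested): ROWS[i][j] == "Y"
    for i, held in enumerate(MODES)
    for j, requested in enumerate(MODES)
}

def compatible(held, requested):
    """True if the two table-level lock modes can be held at the same time."""
    return COMPATIBLE[(held, requested)]

def can_grant(requested, held_modes):
    """A request is grantable only if it is compatible with every lock already held."""
    return all(compatible(held, requested) for held in held_modes)

print(can_grant("RowExclusiveLock", ["AccessShareLock", "RowShareLock"]))
print(can_grant("ShareLock", ["RowExclusiveLock"]))
```

The matrix is symmetric, so the order of the two arguments to `compatible` does not matter; ShareLock is the one mode outside the first three that is compatible with itself.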
20
Future work
  Experiment: BDB & PG code
  Transaction on flash memory
    Concurrency control: MVCC
    Recovery: log