History & Motivations –RDBMS
History & Motivations (cont’d) –Shared data accessed concurrently by many users –Handling failures
Transaction –A powerful abstraction that forms the “interface contract” between an application program and a transactional server. Figure: application lifecycle Program Start → Begin Transaction → Commit Transaction → Program End; Begin and Commit mark the transaction boundary.
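As a minimal sketch of the lifecycle above, the example below uses Python's built-in `sqlite3` module as a stand-in transactional server (an assumption; the slides name no product). The `with conn:` block plays the role of Begin/Commit Transaction: the module implicitly opens a transaction before the first `UPDATE`, and the context manager commits on normal exit or rolls back on an exception. The `account` table is hypothetical.

```python
import sqlite3

# Program Start
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()

# Transaction boundary: a transaction is begun implicitly before the first UPDATE,
# and `with conn:` commits it on normal exit (or rolls it back if an exception escapes).
with conn:
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")

balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(balances)  # [(1, 70), (2, 80)]

# Program End
conn.close()
```

Everything between the two boundaries is seen by the server as one unit; the program code outside it is ordinary application logic.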
Transaction (cont’d) –The core requirement on a DBMS is to provide ACID guarantees for the set of operations in the same transaction: a concurrency control component guarantees the isolation properties of transactions, for both committed and aborted transactions, and a recovery component guarantees the atomicity and durability of transactions.
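To illustrate atomicity for an aborted transaction, here is a small sketch, again assuming SQLite through Python's `sqlite3` module; the `account` table and `transfer` helper are hypothetical. The credit is applied before the debit, so when the debit violates the `CHECK` constraint, the already-applied credit must be undone by `ROLLBACK`, leaving no partial effect.

```python
import sqlite3

# Autocommit mode (isolation_level=None) so BEGIN/COMMIT/ROLLBACK are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one atomic transaction."""
    conn.execute("BEGIN")
    try:
        # Credit first: if the debit below fails, this update must be undone too.
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # CHECK violation: abort, so neither update survives.
        conn.execute("ROLLBACK")
        return False

def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()

print(transfer(conn, 1, 2, 500), balances(conn))  # False [(1, 100), (2, 50)]
print(transfer(conn, 1, 2, 30), balances(conn))   # True [(1, 70), (2, 80)]
```

The failed transfer leaves both balances untouched, which is exactly the all-or-nothing behavior the recovery component must guarantee.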
RDBMS Architecture – Heavy!!! Clients send requests to the database server, where request execution threads pass each request through five layers (Language and Interface Layer, Query Decomposition and Optimization Layer, Query Execution Layer, Access Layer, Storage Layer) down to data access in the database. Multiple request execution threads are used to facilitate disk I/O parallelism between different requests …
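The threading structure of the architecture can be sketched as follows. This is a toy under stated assumptions: every layer except storage is a hypothetical stub that only records that the request passed through it, and the storage layer simulates a blocking disk read with `time.sleep` (the `IO_DELAY` value is made up). Because each request runs on its own worker thread, one request's disk wait overlaps with the others'.

```python
import time
from concurrent.futures import ThreadPoolExecutor

IO_DELAY = 0.2  # hypothetical disk latency (seconds) for one page read

def stub(name):
    """Return a placeholder layer that only records that the request passed through it."""
    def layer(req):
        req["trace"].append(name)
        return req
    return layer

def storage_layer(req):
    """Storage layer: simulate a blocking disk read of the requested page."""
    time.sleep(IO_DELAY)
    req["trace"].append("storage")
    req["result"] = f"page {req['page']}"
    return req

LAYERS = [stub("language/interface"), stub("decomposition/optimization"),
          stub("execution"), stub("access"), storage_layer]

def handle(page):
    """Run one client request top-down through every layer."""
    req = {"page": page, "trace": []}
    for layer in LAYERS:
        req = layer(req)
    return req

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:  # the request execution threads
    results = list(pool.map(handle, [1, 2, 3, 4]))
elapsed = time.perf_counter() - start
# Roughly one IO_DELAY in total, instead of four if the requests ran serially.
print(f"{len(results)} requests in {elapsed:.2f}s")
```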
RDBMS Architecture – How data is stored. Page: 1) the minimum unit of data transfer between disk and main memory; 2) the unit of caching in memory. Slot = a page number + a slot number. A database usually has a certain amount of preallocated disk space, consisting of one or more extents. Each extent is a range of pages that are contiguous on disk. A page number is translated into a disk number + a physical address on disk by looking up an entry in an extent table and adding a relative offset.
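The page-number translation above can be sketched as a sorted extent table searched with binary search. The page size (`PAGE_SIZE = 8192`) and the class names `Extent`, `RecordId` and `ExtentTable` are assumptions for illustration, not taken from any particular DBMS.

```python
from bisect import bisect_right
from dataclasses import dataclass

PAGE_SIZE = 8192  # hypothetical page size in bytes

@dataclass(frozen=True)
class Extent:
    first_page: int  # first page number covered by this extent
    num_pages: int   # number of pages, contiguous on disk
    disk: int        # disk number holding the extent
    start_addr: int  # physical byte address of the extent's first page

@dataclass(frozen=True)
class RecordId:
    page_no: int  # which page holds the record
    slot_no: int  # which slot within that page

class ExtentTable:
    def __init__(self, extents):
        self.extents = sorted(extents, key=lambda e: e.first_page)
        self.starts = [e.first_page for e in self.extents]

    def locate(self, page_no):
        """Map a page number to (disk number, physical address)."""
        i = bisect_right(self.starts, page_no) - 1  # last extent starting at or before page_no
        if i < 0 or page_no >= self.extents[i].first_page + self.extents[i].num_pages:
            raise KeyError(f"page {page_no} is not allocated")
        e = self.extents[i]
        # Entry from the extent table plus a relative offset inside the extent.
        return e.disk, e.start_addr + (page_no - e.first_page) * PAGE_SIZE

table = ExtentTable([Extent(0, 100, disk=0, start_addr=0),
                     Extent(100, 50, disk=1, start_addr=1_048_576)])
rid = RecordId(page_no=105, slot_no=3)
print(table.locate(rid.page_no))  # (1, 1089536)
```

Page 105 falls in the second extent, 5 pages past its start, so its address is 1,048,576 + 5 × 8,192; the slot number is then resolved inside that page once it is in memory.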
RDBMS Computational Model – Page model: transactions are executed in parallel, and each request is processed as reads or writes of pages. To guarantee the ACID properties of transactions, concurrency control and recovery should be based on the page model. Example: t = r(x)r(y)r(z)w(u)w(x), written either as a total order or as a partial order over the steps r(x), r(y), r(z), w(u), w(x). ※ The details of how data is manipulated within the local variables of the executing programs are mostly irrelevant.
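To make the total-versus-partial-order point concrete, here is a sketch that assumes the usual page-model rule: two steps of a transaction must stay ordered only if they access the same page and at least one of them is a write. The helpers `parse` and `must_be_ordered` are hypothetical names.

```python
import re
from itertools import combinations

def parse(t):
    """Split a string like 'r(x)w(y)' into (operation, page) steps."""
    return re.findall(r"([rw])\((\w+)\)", t)

def must_be_ordered(steps):
    """Index pairs that a partial order must keep ordered:
    both steps touch the same page and at least one is a write."""
    return [(i, j)
            for (i, (op1, p1)), (j, (op2, p2)) in combinations(enumerate(steps), 2)
            if p1 == p2 and "w" in (op1, op2)]

steps = parse("r(x)r(y)r(z)w(u)w(x)")
for i, j in must_be_ordered(steps):
    print(f"{steps[i][0]}({steps[i][1]}) < {steps[j][0]}({steps[j][1]})")
# r(x) < w(x)
```

For t, only r(x) before w(x) is forced; r(y), r(z) and w(u) could run in any order, which is what lets concurrency control reason about pages alone, without looking at the programs' local variables.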
Google's need for huge data –More than 15,000 commodity-class PCs –Multiple clusters distributed worldwide –Thousands of queries served per second –One query reads hundreds of MB of data –One query consumes tens of billions of CPU cycles –Google stores dozens of copies of the entire Web! Conclusion: Google needs a large, distributed, highly fault-tolerant file system, which a traditional DBMS cannot provide.
Problems of RDBMS –Clustering an RDBMS adds data copy cost and transaction maintenance cost, so performance does not increase as much as expected.
Problems of RDBMS –Scale-up vs Scale-out (cost perspective): Intel Xeon E V3 (Haswell-EP): Intel (Socket 2011-V3) / tetradeca (14) core / 28 threads / 64(32)-bit / 2.6GHz / DDR4 / 40 PCI-Express lanes, ₩3,400,000. Intel Core i5 6th generation 6600 (Skylake): Intel (Socket 1151) / DDR4 / DDR3L / 64-bit / quad core / 4 threads / 3.3GHz / Intel HD 530 / 16 PCI-Express lanes, ₩250,000.
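The cost argument reduces to simple arithmetic. The sketch below uses only the prices and core/thread counts listed on the slide; pairing ₩3,400,000 with the Xeon and ₩250,000 with the i5 is an assumption read from the spec lists, and the comparison deliberately ignores clock speed, memory, power and licensing.

```python
# Prices (KRW) and core/thread counts from the slide; price-to-CPU pairing assumed.
cpus = {
    "Xeon E V3 (Haswell-EP)": {"price": 3_400_000, "cores": 14, "threads": 28},
    "Core i5-6600 (Skylake)": {"price": 250_000, "cores": 4, "threads": 4},
}

per_core = {name: c["price"] / c["cores"] for name, c in cpus.items()}
per_thread = {name: c["price"] / c["threads"] for name, c in cpus.items()}

for name in cpus:
    print(f"{name}: {per_core[name]:,.0f} KRW/core, {per_thread[name]:,.0f} KRW/thread")

# How much more one core costs when scaling up on the big server.
ratio = per_core["Xeon E V3 (Haswell-EP)"] / per_core["Core i5-6600 (Skylake)"]
print(f"scale-up costs about {ratio:.1f}x more per core")
```

Roughly 242,857 KRW per Xeon core versus 62,500 KRW per i5 core: about 3.9 times more per core when scaling up, which is the economic case for scaling out on commodity machines.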
Google File System –The beginning of big data platforms –Influenced Hadoop –Chunk: analogous to a block, except larger (typically 64 MB)
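Because chunks have a fixed size, a client can compute which chunk holds any byte offset without asking anyone. The following sketch splits a byte range into per-chunk pieces using the typical 64 MB size from the slide; `chunk_requests` is a hypothetical helper name.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # typical GFS chunk size (64 MB)

def chunk_requests(offset, length):
    """Split a file byte range into (chunk index, offset in chunk, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        index, within = divmod(offset, CHUNK_SIZE)   # which chunk, and where inside it
        n = min(CHUNK_SIZE - within, end - offset)  # stop at the chunk boundary
        pieces.append((index, within, n))
        offset += n
    return pieces

MB = 1024 * 1024
print([(i, w // MB, n // MB) for i, w, n in chunk_requests(100 * MB, 50 * MB)])
# [(1, 36, 28), (2, 0, 22)]
```

A 50 MB read starting at 100 MB touches chunk 1 (from 36 MB to its end) and the first 22 MB of chunk 2; the large chunk size keeps the number of such pieces, and hence metadata lookups, small.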
Google File System –Read Algorithm (1/2)
Google File System –Read Algorithm (2/2)
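The read-algorithm figures did not survive extraction, so here is a rough sketch of the read flow as commonly described for GFS: the client turns a byte offset into a chunk index, asks the master for the chunk handle and replica locations, caches that reply, and then reads the data directly from a chunkserver. All class names, file names and handles are hypothetical, and the sketch assumes the read stays within one chunk.

```python
CHUNK_SIZE = 64 * 1024 * 1024

class Master:
    """Metadata only: (file name, chunk index) -> (chunk handle, replica locations)."""
    def __init__(self, chunk_table):
        self.chunk_table = chunk_table

    def lookup(self, filename, chunk_index):
        return self.chunk_table[(filename, chunk_index)]

class ChunkServer:
    """Stores chunk data (simulated here as an in-memory dict)."""
    def __init__(self, chunks):
        self.chunks = chunks

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

class Client:
    def __init__(self, master, servers):
        self.master = master
        self.servers = servers
        self.cache = {}  # cached master replies; repeated reads skip the master

    def read(self, filename, offset, length):
        chunk_index, within = divmod(offset, CHUNK_SIZE)  # 1. offset -> chunk index
        key = (filename, chunk_index)
        if key not in self.cache:
            self.cache[key] = self.master.lookup(filename, chunk_index)  # 2-3. ask the master
        handle, locations = self.cache[key]
        server = self.servers[locations[0]]  # 4. pick one replica (e.g. the closest)
        return server.read(handle, within, length)  # 5. data comes from the chunkserver

master = Master({("/logs/a", 0): ("h1", ["cs2", "cs1"])})
servers = {"cs1": ChunkServer({"h1": b"hello gfs"}),
           "cs2": ChunkServer({"h1": b"hello gfs"})}
client = Client(master, servers)
print(client.read("/logs/a", 6, 3))  # b'gfs'
```

The design point is that file data never passes through the master; it only answers small metadata queries, which keeps it from becoming a bottleneck.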
Google File System –Write Algorithm (1/4)
Google File System –Write Algorithm (2/4)
Google File System –Write Algorithm (3/4)
Google File System –Write Algorithm (4/4)
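The write-algorithm figures are also missing here. As a hedged, single-process sketch of the write flow commonly described for GFS: the client asks the master which replica holds the lease (the primary), pushes the data to every replica, then sends the write request to the primary, which assigns a serial number, applies the mutation, and forwards it to the secondaries in the same order. Names are hypothetical, and real GFS pipelines the data push along a chain of chunkservers rather than looping over them.

```python
import itertools

class Replica:
    """A chunkserver holding one replica of a chunk."""
    def __init__(self, name):
        self.name = name
        self.buffer = {}          # pushed data waiting to be applied, keyed by data id
        self.chunk = bytearray()  # this replica's chunk contents
        self.applied = []         # serial numbers in the order they were applied

    def push(self, data_id, data):
        # Data flow: only buffer the bytes; nothing is written yet.
        self.buffer[data_id] = data

    def apply(self, serial, data_id, offset):
        # Control flow: write the buffered bytes at offset, in serial-number order.
        data = self.buffer.pop(data_id)
        end = offset + len(data)
        if len(self.chunk) < end:
            self.chunk.extend(bytes(end - len(self.chunk)))  # zero-fill any gap
        self.chunk[offset:end] = data
        self.applied.append(serial)
        return True

class Primary(Replica):
    """The replica holding the chunk lease; it decides the mutation order."""
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.serials = itertools.count()

    def write(self, data_id, offset):
        serial = next(self.serials)          # assign a serial number
        self.apply(serial, data_id, offset)  # apply locally
        acks = [s.apply(serial, data_id, offset) for s in self.secondaries]  # same order
        return all(acks)                     # reply to the client

class Master:
    """Metadata only: which replica currently holds the lease for each chunk."""
    def __init__(self, leases):
        self.leases = leases

    def lease_holder(self, handle):
        primary = self.leases[handle]
        return primary, primary.secondaries

_data_ids = itertools.count()

def client_write(master, handle, offset, data):
    primary, secondaries = master.lease_holder(handle)  # ask the master
    data_id = next(_data_ids)
    for replica in (primary, *secondaries):             # push data to all replicas
        replica.push(data_id, data)
    return primary.write(data_id, offset)               # write request to the primary

secondaries = [Replica("cs2"), Replica("cs3")]
primary = Primary("cs1", secondaries)
master = Master({"chunk-h1": primary})
client_write(master, "chunk-h1", 0, b"hello")
client_write(master, "chunk-h1", 5, b" world")
print([bytes(r.chunk) for r in (primary, *secondaries)])
```

Separating the data push from the control message lets the bulk transfer use the network efficiently, while the primary's serial numbers keep all replicas applying mutations in one consistent order.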
Hadoop –HDFS + MapReduce: each 128 MB block is stored as a file (e.g. /data/hdfs/block1) on the local filesystem
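As a small sketch of how a file maps onto blocks, assuming the 128 MB block size on the slide: a file is cut into full blocks plus one shorter final block, and in HDFS that last block only takes up its actual size on the local filesystem. The local paths follow the slide's `/data/hdfs/blockN` example and are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # block size from the slide (128 MB)

def block_layout(file_size):
    """Sizes of the blocks a file of file_size bytes is split into."""
    full, rest = divmod(file_size, BLOCK_SIZE)
    return [BLOCK_SIZE] * full + ([rest] if rest else [])

MB = 1024 * 1024
for i, size in enumerate(block_layout(300 * MB), start=1):
    # Hypothetical local path, in the style of the slide's example.
    print(f"/data/hdfs/block{i}: {size // MB} MB")
# /data/hdfs/block1: 128 MB
# /data/hdfs/block2: 128 MB
# /data/hdfs/block3: 44 MB
```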
Hadoop –HDFS + MapReduce (Computational Model): computation runs over blocks stored on the local filesystem
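The MapReduce computational model can be illustrated with the classic word count, here as a single-process Python sketch rather than Hadoop's actual Java API. Each inner list stands in for the records of one block; in Hadoop a map task would process each block on the node that stores it.

```python
from collections import defaultdict
from itertools import chain

def map_fn(line):
    """Map: emit (word, 1) for every word in one input record."""
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce: sum all counts emitted for one key."""
    return word, sum(counts)

def run_mapreduce(blocks):
    # Map phase: one pass per record of every block.
    mapped = chain.from_iterable(map_fn(line) for block in blocks for line in block)
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: one call per key, in sorted key order.
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))

blocks = [["big data big"], ["data platform"]]
print(run_mapreduce(blocks))  # {'big': 2, 'data': 2, 'platform': 1}
```

Because `map_fn` looks at one record at a time and `reduce_fn` at one key at a time, both phases can be spread across many machines, with only the shuffle moving data between them.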
Gartner’s hype cycle 2012
Gartner’s hype cycle 2013
Gartner’s hype cycle 2014
Gartner’s hype cycle 2015 –Big data dropped from the cycle: big data is now in practice
Thank you