- History and Motivations


1 - History and Motivations
Big Data Platforms - History and Motivations. Jae Hyung Kim, Ph.D. Candidate, Department of Computer Science, Yonsei University

2 Lab Vision and Research Area
The Data Engineering Lab works on both data processing system technologies for modern hardware and bioinformatics based on data mining. Dataware: data-centric systems over HW and SW. Storing and processing data using novel memory and storage: tiering data among DRAM, NVRAM, SSD and HDD. Cooperation between SW and storage: in-storage processing, and migrating recovery to the storage SW layer. Distributed processing on faster networks: data placement and task scheduling for the Hadoop stack on 10G networks. Bioinformatics: systems biology studies; developing tools for bio-data analysis. Our vision is to optimize typical data processing and management technologies for modern hardware and big data management. We also aim to research various computational methods for omics data analysis and high-throughput biological data analysis.

3 Data Management with Modern Hardware
Data Management Technologies with Modern Hardware: efficient page layout and file organization; query processing and index structures; column data store technologies. Big Data Management on Modern Hardware: boosting Hadoop performance using NVRAM and SSD; distributed graph processing; optimizing Hadoop on 10G networks. Data Processing in Solid State Drives: SSD guaranteeing ACID properties; in-storage processing (filtering records in SSD). Keywords: Flash SSD, PRAM, RDBMS, NVRAM, NVDIMM, SQL-on-Hadoop, NVMe, PCIe interface, NoSQL, 10G networks, distributed processing, graph-parallel computation, GPU, multi-core CPUs, modern hardware, Dataware technologies. Publications: VLDB, ICDE, CIKM, Information Systems, etc. Projects: SKTelecom, LG electronics, KISTI, etc. Patents: 11 applied (4 Int’l) patents, 5 issued patents.

4 Database, Data Mining, Bioinformatics
Systems biology studies: network biology, graph theory, machine learning, data integration. Various biological data: microarrays, protein abundance, literature data, clinical data, somatic mutation data. Research goal: disease analysis and functional genomics by computational approach. Developing tools for bio-data: analysis and visualization tools for various bio-data. Publications (~2016): Nucleic Acids Research, Bioinformatics, PLoS One, Information Sciences, ISMB, Informatics Sciences, Molecular BioSystems, Journal of Biomedical Informatics, Computer Methods and Programs in Biomedicine, etc.

5 Index: Introduction; RDBMS vs Big Data Platforms; Growing Big Data Platforms

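6 DB Market Size and Outlook. Domestic (Korean) RDBMS market outlook: about KRW 600 billion in 2017 (includes only DB license revenue and maintenance revenue)

7 DB Market Size and Outlook. Domestic (Korean) DB market share in 2013

8 Global DB Market Size. USD 50 billion (≈ KRW 60 trillion) in 2017 (includes only DB license revenue and maintenance revenue)

9 DB Market Size and Outlook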

10 Introduction

11 Introduction History & Motivations RDBMS

12 History & Motivations (cont’d)
Introduction: History & Motivations (cont’d). Concurrent access; handling failures; shared data; user.

13 Introduction: Transaction. A powerful abstraction concept which forms the “interface contract” between an application program and a transactional server. Application lifecycle: Program Start, Begin Transaction, . . . , Commit Transaction, Program End. The transaction boundary runs from Begin Transaction to Commit Transaction.

14 Transaction (cont’d)
Introduction: Transaction (cont’d). The core requirement on a DBMS is ACID guarantees for the set of operations in the same transaction. This needs a concurrency control component to guarantee the isolation properties of transactions, for both committed and aborted transactions, and a recovery component to guarantee the atomicity and durability of transactions.
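The transaction boundary and the atomicity guarantee can be illustrated with a minimal sketch using Python's standard `sqlite3` module. The `account` table, the `transfer` helper, and the balances are hypothetical names and values chosen for illustration; they do not come from the slides.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` inside one transaction."""
    try:
        # `with conn` marks the transaction boundary: it commits if the
        # block finishes and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # abort the transaction
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # committed
failed = transfer(conn, 1, 2, 500)  # aborted: the first UPDATE is undone
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

The aborted transfer leaves no trace in `balances`, which is the atomicity half of what the recovery component is responsible for; isolation between concurrent transactions is not exercised by this single-connection sketch.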

15 RDBMS Architecture – Heavy!!!
Introduction: RDBMS Architecture – Heavy!!! Clients send requests to the database server. Layers of the database server: Language and Interface Layer; Query Decomposition and Optimization Layer; Query Execution Layer; Access Layer; Storage Layer. Request execution threads: to facilitate disk I/O parallelism between different requests. Data accesses go to the database.

16 RDBMS Architecture – How data is stored
Introduction: RDBMS Architecture – How data is stored. A database usually has a certain amount of preallocated disk space, which consists of one or more extents. Each extent is a range of pages that are contiguous on disk. Page: 1) the minimum unit of data transfer between disk and main memory; 2) the unit of caching in memory. A slot is addressed by a page number plus a slot number. A page number → a disk number plus a physical address on disk, found by looking up an entry in an extent table and adding a relative offset.
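The page-number translation described on this slide can be sketched as follows. This is a minimal illustration, not any particular DBMS's layout: the `Extent` fields, the 8 KB `PAGE_SIZE`, and the sample extent table are all assumptions.

```python
from bisect import bisect_right
from dataclasses import dataclass

PAGE_SIZE = 8192  # bytes per page (assumed for illustration)

@dataclass(frozen=True)
class Extent:
    first_page: int  # page number of the first page in this extent
    num_pages: int   # number of contiguous pages in the extent
    disk: int        # disk that holds the extent
    start_addr: int  # byte address of the extent's first page on that disk

def locate_page(extents, page_no):
    """Translate a page number into (disk number, physical byte address).

    `extents` must be sorted by `first_page`.
    """
    starts = [e.first_page for e in extents]
    i = bisect_right(starts, page_no) - 1  # extent table lookup
    if i < 0 or page_no >= extents[i].first_page + extents[i].num_pages:
        raise KeyError(f"page {page_no} is not allocated")
    e = extents[i]
    # Physical address = extent start + relative offset of the page.
    return e.disk, e.start_addr + (page_no - e.first_page) * PAGE_SIZE

# Hypothetical extent table: two extents on two disks.
EXTENT_TABLE = [
    Extent(first_page=0, num_pages=100, disk=0, start_addr=1_048_576),
    Extent(first_page=100, num_pages=50, disk=1, start_addr=0),
]

page_no, slot_no = 120, 3  # a slot address: page number plus slot number
disk, addr = locate_page(EXTENT_TABLE, page_no)
```

Here page 120 falls in the second extent, 20 pages past its start, so the lookup resolves to disk 1 at byte offset 20 × 8192. The slot number is then resolved inside the page, which this sketch does not model.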

17 RDBMS Computational Model – Page model
Introduction: RDBMS Computational Model – Page model. Requests → processing of pages (read or write). ACID properties of transactions are page based: concurrency control and recovery should be based on the page model. ※ The details of how data is manipulated within the local variables of the executing programs are mostly irrelevant. Parallelized transaction execution: t = r(x)r(y)r(z)w(u)w(x), drawn as a partial order over the steps r(x), r(y), r(z), w(u), w(x).
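As a small sketch of the page model, the transaction `t = r(x)r(y)r(z)w(u)w(x)` can be treated as a list of read and write steps on pages. The helper names below are hypothetical. The ordering rule used in `must_be_ordered` (two steps on the same page, at least one of them a write, may not run in parallel) is the usual textbook convention for page-model partial orders and is an assumption here, since the slide only shows the diagram.

```python
import re
from itertools import combinations

def parse_transaction(text):
    """Parse 'r(x)w(y)...' into a list of (operation, page) steps."""
    return re.findall(r"([rw])\((\w+)\)", text)

def read_write_sets(steps):
    """Return the sets of pages read and pages written by the transaction."""
    reads = {page for op, page in steps if op == "r"}
    writes = {page for op, page in steps if op == "w"}
    return reads, writes

def must_be_ordered(steps):
    """Pairs of steps that touch the same page where at least one writes.

    Under the assumed rule, these pairs must be ordered; all other steps
    could in principle execute in parallel.
    """
    return [
        (a, b)
        for a, b in combinations(steps, 2)
        if a[1] == b[1] and "w" in (a[0], b[0])
    ]

steps = parse_transaction("r(x)r(y)r(z)w(u)w(x)")
reads, writes = read_write_sets(steps)
ordered_pairs = must_be_ordered(steps)
```

For this transaction only `r(x)` and `w(x)` are forced into an order, which is why the three reads can be drawn side by side in a partial order.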

18 Needs for huge data from Google
Introduction: Needs for huge data from Google. More than 15,000 commodity-class PCs. Multiple clusters distributed worldwide. Thousands of queries served per second. One query reads hundreds of MB of data. One query consumes tens of billions of CPU cycles. Google stores dozens of copies of the entire Web! Conclusion: need a large, distributed, highly fault-tolerant file system → Traditional DBMS cannot tolerate

19 RDBMS vs Big Data Platforms

20 RDBMS vs Big Data Platforms
Problems of RDBMS: RDBMS clustering. Transaction maintenance cost and data copy cost → performance does not increase as expected.

21 RDBMS vs Big Data Platforms
Problems of RDBMS: Scale-up vs Scale-out (cost perspective). Intel Xeon E5-2697 v3 (Haswell-EP), ₩3,400,000: Intel (Socket 2011-v3) / tetradeca (14) cores / 28 threads / 64(32)-bit / 2.6GHz / DDR4 / 40 PCI-Express lanes. Intel Core i5 6th generation 6600 (Skylake), ₩250,000: Intel (Socket 1151) / DDR4 / DDR3L / 64-bit / quad core / 4 threads / 3.3GHz / Intel HD 530 / 16 PCI-Express lanes.
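The cost argument can be made concrete with a little arithmetic on the figures from this slide. This is a rough sketch that assumes the larger price (3,400,000 won) belongs to the 14-core Xeon and the smaller one (250,000 won) to the 4-core Core i5, and it compares CPU prices only; motherboards, memory, networking, and power are ignored.

```python
# Prices in Korean won and core counts, as listed on the slide.
XEON_PRICE_KRW, XEON_CORES = 3_400_000, 14
I5_PRICE_KRW, I5_CORES = 250_000, 4

# Price per core for each CPU.
xeon_per_core = XEON_PRICE_KRW / XEON_CORES
i5_per_core = I5_PRICE_KRW / I5_CORES

# Scale-out alternative: how many i5 CPUs fit in one Xeon's budget,
# and how many cores that buys in total.
i5_count = XEON_PRICE_KRW // I5_PRICE_KRW
scale_out_cores = i5_count * I5_CORES

print(f"Xeon: {xeon_per_core:,.0f} won/core, i5: {i5_per_core:,.0f} won/core")
print(f"Same budget buys {i5_count} i5 CPUs = {scale_out_cores} cores")
```

Under these assumptions a core on the commodity part costs roughly a quarter as much, and the budget for one Xeon buys 13 commodity CPUs with 52 cores in total, which is the cost motivation for scaling out.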

22 RDBMS vs Big Data Platforms
Google File System: the beginning of the big data platforms; it influenced Hadoop. Chunk: analogous to a block, except larger (typically 64MB).
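Because chunks have a fixed size, a byte range in a file maps onto chunk indexes with simple integer arithmetic. The sketch below assumes the 64MB chunk size mentioned above; the function name `chunk_ranges` is hypothetical and this is not GFS client code.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, the typical chunk size

def chunk_ranges(offset, length):
    """Split a byte range into (chunk_index, offset_in_chunk, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        index, within = divmod(offset, CHUNK_SIZE)
        # Read up to the end of this chunk or the end of the request.
        nbytes = min(CHUNK_SIZE - within, end - offset)
        pieces.append((index, within, nbytes))
        offset += nbytes
    return pieces

# A 10-byte read that starts 4 bytes before a chunk boundary spans two chunks.
pieces = chunk_ranges(CHUNK_SIZE - 4, 10)
```

A client would then ask for the location of each chunk index separately, so a read that crosses a boundary touches more than one chunk.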

23 RDBMS vs Big Data Platforms
Google File System Read Algorithm (1/2)

24 RDBMS vs Big Data Platforms
Google File System Read Algorithm (2/2)

25 RDBMS vs Big Data Platforms
Google File System Write Algorithm (1/4)

26 RDBMS vs Big Data Platforms
Google File System Write Algorithm (2/4)

27 RDBMS vs Big Data Platforms
Google File System Write Algorithm (3/4)

28 RDBMS vs Big Data Platforms
Google File System Write Algorithm (4/4)

29 RDBMS vs Big Data Platforms
Hadoop: HDFS + MapReduce. Each HDFS block is stored as a file of up to 128MB (e.g. /data/hdfs/block1) on the local filesystem.

30 RDBMS vs Big Data Platforms
Hadoop HDFS + MapReduce (Computational Model) On Local Filesystem
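The MapReduce computational model can be sketched in a single process as three steps: map each input split to key/value pairs, group the pairs by key (the shuffle), and reduce each group. The word-count job and the function names below are illustrative assumptions; this is not the Hadoop API, and a real job would run the map tasks on the nodes that hold the HDFS blocks.

```python
from collections import defaultdict

def map_phase(split):
    """Map: emit (word, 1) for every word in one input split."""
    for word in split.split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)

def run_job(splits):
    """Run map over every split, shuffle, then reduce each key group."""
    intermediate = [kv for split in splits for kv in map_phase(split)]
    groups = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(groups.items()))

# Two hypothetical input splits, standing in for two HDFS blocks.
counts = run_job(["big data platforms", "big data history"])
```

Each split is processed independently in the map step, which is what lets the framework move computation to the data instead of copying data to the computation.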

31 Growing Big Data Platforms

32 Growing Big Data Platforms

33 Growing Big Data Platforms
Gartner’s hype cycle 2012

34 Growing Big Data Platforms
Gartner’s hype cycle 2013

35 Growing Big Data Platforms
Gartner’s hype cycle 2014

36 Growing Big Data Platforms
Gartner’s hype cycle 2015: big data dropped from the cycle; big data is now in practice.

37 Emerging Hardware

38 Emerging H/Ws History of Memory

39 Emerging H/Ws All flash array

40 Emerging H/Ws All flash array

41 Emerging H/Ws NVRAM

42 Emerging H/Ws NVDIMM

43 Q&A Thank you

