Hadoop & Neptune Feb. 2009 김형준.


1 Hadoop & Neptune Feb. 2009 http://www.openneptune.com http://www.jaso.co.kr 김형준

2 The Data 'Tsunami'

3 More CPU, Faster Disk, Program Tuning, More Memory

4 Uninstall

5 Where? Distributed File System
How? Distributed/Parallel Computing

6 Hadoop DFS
Unlimited storage; no backup, self-healing; thousands of nodes
But: no POSIX, no random write

7 [Diagram. Legend: machine, daemon process]
NameNode (DFS Master), JobTracker (Job Master), Secondary NameNode
Three machines, each with: DataNode (DFS Slave), TaskTracker (Task Mgmt.), Local Disk
Client API; arrows labeled control and data

8 Hadoop MapReduce
1TB group by -> 10 minutes
More machines -> 1 minute

9 map (k1, v1) → list(k2, v2)
reduce (k2, list(v2)) → result value
Input: This is a book. That book is on the desk. I like that book.
map(): (This, 1) (book, 1) (That, 1) (book, 1) … (I, 1) (that, 1) (book, 1) …
Grouped: (book, [1,1,1]) … (is, [1,1]) … (This, [1])
reduce(): (book, 3) … (is, 2) … (This, 1)
Exec: distributed/parallel Map&Reduce execution platform (Split, Partition, Merge, Sort)
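The word-count flow on this slide can be sketched in a single process. This is a minimal illustration, not Hadoop code: the names `map_fn`, `shuffle`, `reduce_fn`, and `word_count` are made up for the example, and the grouping step that the platform performs (partition, merge, sort) is stood in for by a plain in-memory dictionary.

```python
import re
from collections import defaultdict

# The three sentences from the slide, one record per sentence.
DOCS = [
    "This is a book.",
    "That book is on the desk.",
    "I like that book.",
]

def map_fn(_key, line):
    """map (k1, v1) -> list(k2, v2): emit (word, 1) for every word in the line."""
    for word in re.findall(r"[A-Za-z]+", line):
        yield word, 1

def shuffle(pairs):
    """Group values by key and sort the keys (done by the platform in a real job)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_fn(word, counts):
    """reduce (k2, list(v2)) -> result value: sum the 1s for one word."""
    return word, sum(counts)

def word_count(docs):
    mapped = [pair for i, line in enumerate(docs) for pair in map_fn(i, line)]
    grouped = shuffle(mapped)
    return dict(reduce_fn(word, counts) for word, counts in grouped.items())

if __name__ == "__main__":
    for word, total in word_count(DOCS).items():
        print(word, total)
```

As on the slide, words are counted case-sensitively, so "That" and "that" are separate keys.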

10 [Diagram. Legend: machine, daemon process]
NameNode (DFS Master), JobTracker (Job Master), Secondary NameNode
Three machines, each with: DataNode (DFS Slave), TaskTracker (Task Mgmt.), Local Disk
Client API; arrows labeled control and data

11 A piece of Cake

12 Neptune
Database running on DFS (Hadoop)
Unlimited structured data; no backup
But: no JOIN, no SQL, no multi-row operations, no aggregation functions

13 Operations
Create/Drop Table; put/get; like/between; scan; merge scan (join); MapReduce
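To make the operation names concrete, here is a toy in-memory table keyed by row key. It is not the Neptune client API: the class `ToyTable` and every method signature are hypothetical, and reading `like` as a row-key prefix match and `between` as an inclusive row-key range is an assumption made for this sketch.

```python
import bisect

class ToyTable:
    """Hypothetical single-process stand-in for a table sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept sorted
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key, column):
        return self._rows.get(row_key, {}).get(column)

    def between(self, start_key, end_key):
        # Assumed semantics: inclusive range over row keys.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_right(self._keys, end_key)
        return [(k, dict(self._rows[k])) for k in self._keys[lo:hi]]

    def like(self, prefix):
        # Assumed semantics: row keys starting with the given prefix.
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for k in self._keys[start:]:
            if not k.startswith(prefix):
                break
            matches.append((k, dict(self._rows[k])))
        return matches

    def scan(self):
        # Full scan in row-key order.
        for k in self._keys:
            yield k, dict(self._rows[k])

if __name__ == "__main__":
    t = ToyTable()
    t.put("user:002", "name", "B")
    t.put("user:001", "name", "A")
    t.put("page:001", "title", "Home")
    print(t.get("user:001", "name"))
    print(t.like("user:"))
```

Keeping rows sorted by key is what makes range-style reads such as `between` and `scan` cheap, even without JOIN or SQL.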

14 Why Neptune?
Distributed / parallel processing: Speed, Scalability
[Diagram] Make Map&Reduce function; run on Map&Reduce framework (JobTracker); get tablet list from META Table; tasks assigned to each node
TableA: Tablet A-1, Tablet A-2, Tablet A-3, …, Tablet A-N
Map Tasks and Reduce Tasks running on TaskTrackers
TableB: Tablet B-1, Tablet B-2
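The "get tablet list, then assign tasks to each node" step can be sketched as follows. This is an illustration only: the `META` dictionary, the node names, and the round-robin policy are all assumptions for the example, and a real scheduler would normally also consider where each tablet's data lives.

```python
from itertools import cycle

# Hypothetical stand-in for the META table: table name -> list of tablets.
META = {
    "TableA": ["Tablet A-1", "Tablet A-2", "Tablet A-3", "Tablet A-4"],
}

def plan_map_tasks(table, nodes):
    """Create one map task per tablet and hand the tasks out round-robin."""
    tablets = META[table]
    assignment = {node: [] for node in nodes}
    for tablet, node in zip(tablets, cycle(nodes)):
        assignment[node].append(tablet)
    return assignment

if __name__ == "__main__":
    plan = plan_map_tasks("TableA", ["node1", "node2", "node3"])
    for node, tablets in plan.items():
        print(node, tablets)
```

Because each tablet becomes an independent unit of work, adding nodes spreads the same tablets over more machines, which is the speed and scalability point of the slide.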

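15 [Architecture diagram]
User application
Neptune (large-scale distributed data store): Neptune Master, Neptune Master, TabletServer #1, TabletServer #2, TabletServer #n; logical Table
Distributed / parallel computing platform (Hadoop)
Cluster Management System
Distributed file system (Hadoop or other): physical storage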

16 When to use Neptune
Large data; online put/get and analysis; less complex
Google Personalized Search, Google Analytics

17 Finding developer

18 Principles of Google Infra
Cheap Hardware and Smart Software
  Use cheap commodity hardware → frequent failure
  Develop smart software for reducing the cost of failure
Easy Management
  High scalability by automatic discovery of new servers and racks
  High redundancy for failure of servers, racks, even data centers
Speed and Then More Speed
  High speed with low cost
  Rapid development and deployment of new products
Use existing technologies
  Use techniques from the leading edge of computer science
  Use open source code as a starting point

19 Google Infra
Google's software stack: Batch application, Online Services; Client API; Bigtable, Map & Reduce, Chubby, Cluster Mgmt; GFS; Google Linux
Hardware: low-end commodity servers, 40 or more pizza-box servers per rack
Google's core competency

20 Q&A

