Hadoop & Neptune Feb. 2009 김형준.

Hadoop & Neptune Feb. 2009 http://www.openneptune.com http://www.jaso.co.kr 김형준

The Data 'Tsunami'

More CPU, Faster Disk, Program Tuning, More Memory

Uninstall

Where? Distributed File System
How? Distributed/Parallel Computing

Hadoop DFS
Unlimited Storage
No Backup, Self-healing
Thousands of Nodes
But: No POSIX, No Random write

Legend: machine, daemon process
NameNode (DFS Master)
JobTracker (Job Master)
Secondary NameNode
Slave machines (three shown), each with: DataNode (DFS Slave), TaskTracker (Task Mgmt.), Local Disk
Client API
Flows: control, data

Hadoop MapReduce
1 TB group by -> 10 minutes
More Machines -> 1 minute

map (k1, v1) → list(k2, v2)
reduce (k2, list(v2)) → result value
Input: This is a book. That book is on the desk. I like that book.
map() output: (This, 1) (book, 1) (That, 1) (book, 1) … (I, 1) (that, 1) (book, 1) …
Grouped by key: (book, [1,1,1]) … (is, [1,1]) … (This, [1])
reduce() output: (book, 3) … (is, 2) … (This, 1)
Exec: distributed/parallel Map&Reduce execution platform (Split, Partition, Merge, Sort)
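The word-count flow on this slide can be sketched in a single process as follows. This is only an illustration of the map, group, and reduce steps, not Hadoop code: the function names (`map_fn`, `shuffle`, `reduce_fn`, `word_count`) and the regex-based tokenizer are assumptions made for the example, and words are counted case-sensitively, as on the slide.

```python
import re
from collections import defaultdict


def map_fn(_key, line):
    """map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in the line."""
    return [(word, 1) for word in re.findall(r"[A-Za-z]+", line)]


def shuffle(pairs):
    """Group values by key and sort the keys (stands in for partition/merge/sort)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_fn(key, values):
    """reduce(k2, list(v2)) -> result value: sum the ones for each word."""
    return key, sum(values)


def word_count(lines):
    pairs = []
    for line_no, line in enumerate(lines):
        pairs.extend(map_fn(line_no, line))
    return dict(reduce_fn(key, values) for key, values in shuffle(pairs).items())


counts = word_count([
    "This is a book.",
    "That book is on the desk.",
    "I like that book.",
])
print(counts["book"], counts["is"], counts["This"])
```

On a real cluster each input split would go to a separate map task and the grouping step would move data between machines; here everything runs in one process to keep the data flow visible.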

A piece of Cake

Neptune
Database running on DFS (Hadoop)
Unlimited Structured Data
No Backup
But: No JOIN, No SQL
No Multiple row operation
No Aggregation function

Operation
Create/Drop Table
put/get
like/between
scan/merge scan (join)
MapReduce
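To make these operations concrete, here is a toy in-memory model of a table kept sorted by row key. It is not the Neptune API: `ToyTable`, `merge_scan`, and all method signatures are hypothetical, and the semantics chosen here (`like` as a row-key prefix match, `between` as an inclusive key range) are assumptions for illustration.

```python
import bisect


class ToyTable:
    """Hypothetical stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys in sorted order
        self._rows = {}   # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def between(self, start, end):
        """Rows with start <= key <= end (inclusive range is an assumption)."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

    def like(self, prefix):
        """Rows whose key starts with prefix (prefix semantics is an assumption)."""
        out = []
        for k in self._keys[bisect.bisect_left(self._keys, prefix):]:
            if not k.startswith(prefix):
                break
            out.append((k, self._rows[k]))
        return out

    def scan(self):
        return [(k, self._rows[k]) for k in self._keys]


def merge_scan(left, right):
    """Join two tables on row key by walking both sorted scans in step."""
    a, b = left.scan(), right.scan()
    i = j = 0
    joined = []
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            joined.append((a[i][0], a[i][1], b[j][1]))
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return joined


users = ToyTable()
for key, name in [("u2", "bob"), ("u1", "alice"), ("u3", "carol")]:
    users.put(key, name)

orders = ToyTable()
for key, item in [("u3", "desk"), ("u4", "lamp"), ("u2", "book")]:
    orders.put(key, item)

print(users.between("u1", "u2"))
print(merge_scan(users, orders))
```

Because both scans return rows in key order, the join needs only one pass over each table, which is why a merge scan can stand in for a join when there is no general JOIN operation.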

Why Neptune?
Distributed/parallel processing: Speed, Scalability
TableA: Tablet A-1, Tablet A-2, Tablet A-3, … Tablet A-N
TableB: Tablet B-1, Tablet B-2
Make Map&Reduce function
Run on Map&Reduce framework
JobTracker: get tablet list from META Table, assign tasks to each node
TaskTrackers: run Map Tasks and Reduce Tasks
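The idea of assigning one map task per tablet and then combining the partial results in a reduce step can be sketched as below. This is a single-machine illustration using a thread pool, not Neptune or Hadoop code: the `TABLETS` dictionary is a hypothetical stand-in for the tablet list obtained from the META table, and `map_task`, `reduce_task`, and `run_job` are names invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tablet list; in the slide this would come from the META table.
TABLETS = {
    "A-1": ["book", "desk"],
    "A-2": ["book"],
    "A-3": ["desk", "book", "lamp"],
}


def map_task(tablet_rows):
    """Count values within a single tablet (one map task per tablet)."""
    counts = {}
    for row in tablet_rows:
        counts[row] = counts.get(row, 0) + 1
    return counts


def reduce_task(partials):
    """Merge the per-tablet partial counts into one result."""
    total = {}
    for partial in partials:
        for key, n in partial.items():
            total[key] = total.get(key, 0) + n
    return total


def run_job(tablets, workers=3):
    # Each tablet is processed independently, so the map tasks can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_task, tablets.values()))
    return reduce_task(partials)


result = run_job(TABLETS)
print(result)
```

Since no map task depends on another tablet's data, adding tablets and workers increases parallelism without changing the map or reduce functions, which is the speed and scalability argument the slide makes.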

User application
Neptune (large-scale distributed data store): Neptune Master, Neptune Master, TabletServer #1, TabletServer #2, TabletServer #n
Logical Table
Physical storage
Distributed/parallel computing platform (Hadoop)
Cluster Management System
Distributed file system (Hadoop or other)

When to use Neptune
Large Data
Online put/get and analysis
Less complex
Google Personalized Search
Google analytics

Finding developer

Principle of Google Infra
Cheap Hardware and Smart Software
Use cheap commodity hardware -> frequent failure
Develop smart software for reducing the cost of failure
Easy Management
High Scalability by automatic discovery of new servers and racks
High Redundancy for failure of servers, racks, even data centers
Speed and Then More Speed
High speed with low cost
Rapid development and deployment of new products
Use existing technologies
Use techniques from the leading edge of computer science
Use open source codes as a starting point

Google Infra
Software: Google Linux, GFS, Bigtable, Map & Reduce, Chubby, Cluster Mgmt, Client API
Applications: Batch application, Online Services
Hardware: Low-end commodity servers, 40 or more pizza box servers per rack
Diagram labels: Google's core competency, Google's software stack
