Distributed Data Storage and Processing over Commodity Clusters Sector & Sphere Yunhong Gu Univ. of Illinois at Chicago, Feb. 17, 2009.

What is Sector/Sphere? Sector: a distributed storage system. Sphere: run-time middleware that supports simplified distributed data processing. Open source software, GPL, written in C++. Started in 2006, current version

Overview Motivation Sector Sphere Experimental studies Future work

Motivation Supercomputer model: expensive, with a data I/O bottleneck. Sector/Sphere model: inexpensive, with parallel data I/O.

Motivation Parallel/distributed programming with MPI, etc.: flexible and powerful, BUT complicated, with no data locality. Sector/Sphere model: the cluster appears as a single entity to the developer, with a simplified programming interface and data-locality support from the storage layer. Limited to certain data-parallel applications.

Motivation Systems designed for a single data center: require additional effort to locate and move data. Sector/Sphere model: supports wide-area data collection and distribution.

Sector: Distributed Storage System [Architecture diagram] A security server and a master server, run by the service provider, manage a set of slave nodes that handle data storage and processing. Clients connect over SSL (user accounts, data protection, system security); the master provides storage system management and processing scheduling; data travels over UDT, with optional encryption. Clients use system access tools and application programming interfaces.

Sector: Distributed Storage System Sector stores files on the native/local file system of each slave node. Sector does not split files into blocks. Pro: simple and robust, suitable for wide-area deployment. Con: file size is limited by the capacity of a single node. Sector uses replication for better reliability and availability. The master node maintains the file system metadata; no permanent metadata is needed. Topology aware.

Sector: Write/Read Write is exclusive. Replicas are updated in a chained manner: the client updates one replica, that replica updates the next, and so on. All replicas have been updated by the time the Write operation completes. Read: different replicas can serve different clients at the same time, and the replica nearest to the client is chosen whenever possible.
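As a minimal sketch of the chained update (with illustrative stand-in types, not Sector's actual classes), the client writes only to the first replica and each replica forwards the data down the chain; the write completes when the tail acknowledges:

    #include <iostream>
    #include <string>

    // Illustrative stand-in for a replica node; not Sector's real classes.
    struct Replica {
        std::string host;
        Replica* next;   // next replica in the update chain

        void store(const std::string& data) {
            std::cout << host << " stored " << data.size() << " bytes\n";
        }

        // Apply the write locally, then forward it down the chain.
        // Returns true only when every downstream replica has acknowledged.
        bool write(const std::string& data) {
            store(data);                    // local write
            if (next != nullptr)
                return next->write(data);   // chained propagation
            return true;                    // tail of chain: ack travels back up
        }
    };

    int main() {
        // A chain of three replicas; the client talks only to the head.
        Replica r3{"slave-3", nullptr}, r2{"slave-2", &r3}, r1{"slave-1", &r2};
        if (r1.write("record payload"))
            std::cout << "write complete: all replicas updated\n";
    }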

Sector: Tools and API Supported file system operations: ls, stat, mv, cp, mkdir, rm, upload, download. Wildcard characters are supported. System monitoring: sysinfo. C++ API: list, stat, move, copy, mkdir, remove, open, close, read, write, sysinfo.
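A hypothetical sketch of what a client session over this API might look like; the header name, class names, and method signatures below are assumptions for illustration, not copied from the Sector distribution:

    #include <sector_client.h>   // assumed header name, not the real one

    int main() {
        SectorClient client;                       // assumed client class
        client.init("master.example.org", 6000);   // hypothetical master address/port
        client.login("user", "password");          // authenticated via the security server

        client.mkdir("/results");
        client.upload("local.dat", "/results/remote.dat");

        SectorFileInfo info;                       // assumed metadata struct
        client.stat("/results/remote.dat", info);  // metadata comes from the master

        client.logout();
        return 0;
    }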

Sphere: Simplified Data Processing Designed for data-parallel applications. Data is processed where it resides, or on the nearest possible node (locality). The same user-defined function (UDF) can be applied to all elements (records, blocks, or files). Processing output can be written to Sector files, on the same node or on other nodes. A generalized form of Map/Reduce.

Sphere: Simplified Data Processing [Dataflow diagram] Three patterns: Input -> UDF -> Output; Input -> UDF -> Intermediate -> UDF -> Output; and (Input 1 + Input 2) -> UDF -> Output.

Sphere: Simplified Data Processing

    // Serial version:
    for each file F in (SDSS datasets)
        for each image I in F
            findBrownDwarf(I, ...);

    // Sphere version:
    SphereStream sdss;
    sdss.init("sdss files");
    SphereProcess myproc;
    myproc.run(sdss, "findBrownDwarf", ...);
    myproc.read(result);

    // UDF signature:
    findBrownDwarf(char* image, int isize, char* result, int rsize);
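In sketch form, a UDF with this signature is an ordinary C-style function applied to one data segment at a time. Only the four-argument signature above comes from the slide; the body below (including the int return type) is a hypothetical placeholder rather than a real brown-dwarf detector:

    #include <cstdio>

    // Hypothetical UDF body; only the signature is from the slide.
    extern "C" int findBrownDwarf(char* image, int isize, char* result, int rsize) {
        // Placeholder detection logic: count bright pixels in the segment.
        int candidates = 0;
        for (int i = 0; i < isize; ++i)
            if (static_cast<unsigned char>(image[i]) > 200)
                ++candidates;

        // Write a short summary into the caller-provided result buffer.
        std::snprintf(result, rsize, "candidates=%d", candidates);
        return 0;
    }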

Sphere: Data Movement Slave -> Slave (Local) Slave -> Slaves (Shuffle/Hash) Slave -> Client

Load Balance & Fault Tolerance The number of data segments is much larger than the number of SPEs (Sphere Processing Engines). When an SPE completes a data segment, a new segment is assigned to it. If an SPE fails, the data segment assigned to it is re-assigned to another SPE and processed again. Faulty nodes are detected and removed.
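A minimal sketch of this policy with simplified stand-in types (not Sphere's internals): idle SPEs pull from a queue of pending segments, and a failed SPE's segment is re-queued for another SPE:

    #include <queue>
    #include <vector>

    // Simplified stand-ins for illustration; not Sector/Sphere internals.
    struct Segment { int id; };
    enum class SpeState { Idle, Busy, Failed };
    struct Spe { SpeState state; int segment; };

    void schedule(std::queue<Segment>& pending, std::vector<Spe>& spes) {
        for (auto& spe : spes) {
            if (spe.state == SpeState::Failed) {
                if (spe.segment >= 0) {              // re-queue its segment once
                    pending.push(Segment{spe.segment});
                    spe.segment = -1;
                }
                continue;                            // the faulty node gets no new work
            }
            // Segments far outnumber SPEs, so an idle SPE gets the next one at once.
            if (spe.state == SpeState::Idle && !pending.empty()) {
                spe.segment = pending.front().id;
                pending.pop();
                spe.state = SpeState::Busy;
            }
        }
    }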

Open Cloud Testbed 4 racks in Baltimore (JHU), Chicago (StarLight and UIC), and San Diego (Calit2). 10Gb/s inter-site connections over CiscoWave; 1Gb/s inter-rack connections. Each node: two dual-core AMD CPUs, 12GB RAM, a single 1TB disk.

Open Cloud Testbed

Example: Sorting a Terabyte Data is split into small files scattered across all slaves. Stage 1: on each slave, an SPE scans the local files and sends each record, according to its key, to a bucket file on a remote node, so that the buckets themselves are ordered by key range. Stage 2: on each destination node, an SPE sorts the data inside each bucket.

TeraSort [Diagram] Binary record: 100 bytes = 10-byte key + 90-byte value. Stage 1: hash each record on the first 10 bits of its key into one of 1024 bucket files (Bucket-0 .. Bucket-1023). Stage 2: sort each bucket on its local node.
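In sketch form (assuming the record layout shown above, not Sphere's actual TeraSort code), the stage-1 bucket ID is just the first 10 bits of the key, so concatenating the sorted buckets in ID order yields a globally sorted result:

    // A 100-byte TeraSort record: 10-byte key followed by a 90-byte value.
    struct Record {
        unsigned char key[10];
        unsigned char value[90];
    };

    // Stage 1: bucket ID = the first 10 bits of the key, i.e. 0..1023.
    int bucketOf(const Record& r) {
        return (r.key[0] << 2) | (r.key[1] >> 6);
    }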

Performance Results: TeraSort Run time in seconds, Sector v1.16 vs. Hadoop 0.17. The table compared Sphere, Hadoop (3 replicas), and Hadoop (1 replica) on four configurations: UIC (300GB); UIC + StarLight (600GB); UIC + StarLight + Calit2 (900GB); UIC + StarLight + Calit2 + JHU (1.2TB).

Performance Results: TeraSort Sorting 1.2TB on 120 nodes. Hash vs. local sort: 981 sec vs. 545 sec. Hash stage: per rack, 220GB in/out; per node, 10GB in/out; CPU 130%, memory 900MB. Local sort stage: no network I/O; CPU 80%, memory 1.4GB. Hadoop: CPU 150%, memory 2GB.

CreditStone [Diagram] Text record format: Trans ID|Time|Merchant ID|Fraud|Amount (e.g. a record ending in 0|66.49: not fraudulent, amount 66.49). Stage 1: transform each text record and hash it into a bucket according to merchant ID (key: 3-byte merchant ID plus time; buckets merch-000x .. merch-999x). Stage 2: compute the fraud rate for each merchant.
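A minimal sketch of both stages with illustrative helper types (the record format comes from the slide; everything else is assumed): stage 1 parses each text record, and stage 2 computes each merchant's fraud rate as fraudulent transactions over total transactions:

    #include <map>
    #include <sstream>
    #include <string>
    #include <vector>

    // Parsed form of a "Trans ID|Time|Merchant ID|Fraud|Amount" text record.
    struct Transaction { std::string merchant; bool fraud; };

    Transaction parse(const std::string& line) {
        std::istringstream in(line);
        std::string transId, time, merchant, fraud, amount;
        std::getline(in, transId, '|');
        std::getline(in, time, '|');
        std::getline(in, merchant, '|');
        std::getline(in, fraud, '|');
        std::getline(in, amount, '|');
        return Transaction{merchant, fraud == "1"};
    }

    // Stage 2: per-merchant fraud rate = fraudulent / total transactions.
    std::map<std::string, double> fraudRates(const std::vector<Transaction>& bucket) {
        std::map<std::string, int> total, fraudulent;
        for (const auto& t : bucket) {
            ++total[t.merchant];
            if (t.fraud) ++fraudulent[t.merchant];
        }
        std::map<std::string, double> rates;
        for (const auto& kv : total)
            rates[kv.first] = static_cast<double>(fraudulent[kv.first]) / kv.second;
        return rates;
    }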

Performance Results: CreditStone Racks: JHU; JHU + SL; JHU + SL + Calit2; JHU + SL + Calit2 + UIC. Size of dataset (rows): 15B; 29.5B; 44.5B; 58.5B. The table also listed the number of nodes, dataset size in GB, and run times in minutes for Hadoop, Sector with index, and Sector without index. * Courtesy of Jonathan Seidman of Open Data Group.

System Monitoring (Testbed)

System Monitoring (Sector/Sphere)

Future Work High availability: multiple master servers. Scheduling. Optimized data channel. Enhanced compute model and fault tolerance.

For More Information Sector/Sphere code & docs: Open Cloud Consortium: NCDM:

Inverted Index [Diagram] Stage 1: process each HTML page and hash each (word, page_id) pair to a bucket according to the word's first letter (Bucket-A .. Bucket-Z). Stage 2: sort each bucket on its local node and merge entries for the same word, e.g. word_z found on pages 1, 5, and 10 becomes word_z -> 1, 5, 10.
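A minimal sketch of both stages with illustrative types (not Sphere's actual code): stage 1 routes each (word, page_id) pair to one of 26 buckets by first letter, and stage 2 merges the page lists of identical words within a bucket:

    #include <cctype>
    #include <map>
    #include <set>
    #include <string>
    #include <utility>
    #include <vector>

    // Stage 1: route a (word, page_id) pair to a bucket by first letter (A..Z).
    int bucketOf(const std::string& word) {
        return std::toupper(static_cast<unsigned char>(word[0])) - 'A';
    }

    // Stage 2: within one bucket, merge the page lists of identical words,
    // e.g. ("word_z", 1), ("word_z", 5), ("word_z", 10) -> word_z: 1, 5, 10.
    std::map<std::string, std::set<int>>
    mergeBucket(const std::vector<std::pair<std::string, int>>& bucket) {
        std::map<std::string, std::set<int>> index;
        for (const auto& kv : bucket)
            index[kv.first].insert(kv.second);
        return index;
    }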