Combining the Power of Hadoop with Object-Based Dispersed Storage
Copyright © 2012 Cleversafe, Inc. All rights reserved.

Presentation transcript:

Slide 1: Combining the Power of Hadoop with Object-Based Dispersed Storage

Slide 2: How Cleversafe's Dispersed Storage Works
1. Data is expanded, virtualized, transformed, sliced and dispersed using Information Dispersal Algorithms (IDA).
2. Slices are distributed to separate disks, storage nodes and geographic locations (Site 1, Site 2, Site 3, Site 4).
3. Real-time, bit-perfect data is retrieved from a subset of slices.
[ Total slices = 'width' = N ]
[ Subset required to read = 'threshold' = K ]
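The width/threshold relationship above can be illustrated with a small K-of-N dispersal sketch. This is not Cleversafe's IDA, whose details the slides do not give. It is a generic polynomial-interpolation scheme over the prime field GF(257), chosen because it fits in a few lines. The function names `disperse` and `recover` are hypothetical, and slices are kept as lists of integers because a coded value can be 256.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through points [(xi, yi)], mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def disperse(data: bytes, n: int, k: int):
    """Split data into n slices (width) so that any k (threshold) recover it."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices = [[] for _ in range(n)]
    for off in range(0, len(padded), k):
        # The k data bytes are the polynomial's values at x = 1..k.
        pts = [(x + 1, padded[off + x]) for x in range(k)]
        for s in range(n):
            # The first k slices hold raw bytes; the rest hold coded values.
            slices[s].append(pts[s][1] if s < k else lagrange_eval(pts, s + 1))
    return slices


def recover(available: dict, k: int, length: int) -> bytes:
    """Rebuild the data from any k slices; available maps slice index -> slice."""
    chosen = sorted(available.items())[:k]
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        pts = [(idx + 1, sl[pos]) for idx, sl in chosen]
        out.extend(lagrange_eval(pts, x + 1) for x in range(k))
    return bytes(out[:length])


data = b"dispersed storage"
slices = disperse(data, n=6, k=4)
# Lose slices 0 and 2; the remaining four still meet the threshold.
survivors = {i: slices[i] for i in (1, 3, 4, 5)}
restored = recover(survivors, k=4, length=len(data))
```

In this sketch each slice is 1/K the size of the padded data, so total stored volume is N/K times the original.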

Slide 3: Object-based Access Methods

Slide 4: How Hadoop Works
– Popular open-source MapReduce implementation, commercialized by Cloudera and others
– Take the computation to the data, not the data to the computation
(Diagram labels: Compute, Storage)

Slide 5: Hadoop MapReduce Challenges
Enables computations where the data exists, but has limitations:
– HDFS uses a single server for metadata operations. If this server fails, data could become inaccessible or be permanently lost. Federation helps, but it is active/passive and requires manual management.
– HDFS maintains 3 copies of data for protection. This is not a big deal in the terabyte range, but at petabyte and exabyte scale the management and overhead costs become unmanageable.
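A rough calculation shows how the replication overhead grows with scale. The 3x factor comes from the slide. The width of 16 and threshold of 10 are hypothetical values chosen only for illustration, and the N/K expansion factor assumes a generic K-of-N dispersal scheme in which each slice is 1/K the size of the data.

```python
def raw_capacity_tb(usable_tb: float, expansion: float) -> float:
    """Raw storage needed to hold usable_tb of data at a given expansion factor."""
    return usable_tb * expansion


REPLICAS = 3               # HDFS-style triple replication, per the slide
WIDTH, THRESHOLD = 16, 10  # hypothetical dispersal parameters (N, K)

usable = 1000  # 1 PB of user data, expressed in TB
replicated = raw_capacity_tb(usable, REPLICAS)
dispersed = raw_capacity_tb(usable, WIDTH / THRESHOLD)

print(f"replication: {replicated:.0f} TB raw")
print(f"dispersal:   {dispersed:.0f} TB raw")
```

Under these assumed parameters the dispersed layout needs roughly half the raw capacity of triple replication while still tolerating the loss of six slices.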

Slide 6: Combining computation and dispersed storage
– Hadoop MapReduce computation runs directly on dsNet Slicestors
– Jobs are assigned to stores for completely local data access
– Replace underlying HDFS with Dispersed Storage® while maintaining HDFS interface to MapReduce process
(Diagram labels: dsNet Slicestor, Hadoop MapReduce, dsNet API, dsNet Storage, Local data access)
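The idea of assigning jobs to the stores that already hold the data can be sketched as a simple locality-aware scheduler. The slides do not describe the actual assignment policy, so the greedy least-loaded rule, the function name `assign_tasks`, and the node and chunk names below are all assumptions made for illustration.

```python
def assign_tasks(chunk_locations, nodes):
    """Assign each chunk's map task to the least-loaded node that holds it.

    chunk_locations maps a chunk id to the nodes storing that chunk locally.
    If no listed holder is a known node, the task goes to the least-loaded
    node overall and its read becomes a remote one.
    """
    load = {n: 0 for n in nodes}
    assignment = {}
    for chunk, holders in sorted(chunk_locations.items()):
        candidates = [n for n in holders if n in load] or list(nodes)
        # Break load ties by node name so the result is deterministic.
        target = min(candidates, key=lambda n: (load[n], n))
        assignment[chunk] = target
        load[target] += 1
    return assignment


locations = {
    "c1": ["s1", "s2"],
    "c2": ["s1"],
    "c3": ["s2"],
    "c4": ["s9"],  # holder is not part of this cluster
}
plan = assign_tasks(locations, ["s1", "s2", "s3"])
```

Here three of the four tasks read their input locally, and only `c4` needs a remote read.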

Slide 7: System Architecture
(Diagram labels: Master, Job Tracker, Log, Slaves, Task Tracker, Accessers, Maps, Reduces, Object Vaults, Metadata Vaults, Analytic Vaults)

Slide 8: New SliceStream™ Protocol
Concept:
– Manipulate input so that, after dispersal, raw data falls in contiguous chunks
– Read directly from raw slices, bypassing IDA reconstruction
  o Fall back to full IDA reconstruction if an error occurs
Result:
– Full reliability/availability of dispersal
– On a healthy dsNet, most reads for a MapReduce task can be satisfied locally
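The read path described above, a fast local read with a fallback to full reconstruction, might look roughly like the following. SliceStream's real interfaces are not shown in the slides. The function `read_chunk`, the CRC32 integrity check, and the `reconstruct` callback are hypothetical stand-ins for whatever error detection and IDA rebuild the system actually uses.

```python
import zlib


def read_chunk(chunk_id, local_store, checksums, reconstruct):
    """Return (data, source) for a chunk, preferring the local raw slice.

    local_store: dict of chunk id -> raw bytes held on this node (assumed)
    checksums:   dict of chunk id -> expected CRC32 of the raw bytes (assumed)
    reconstruct: callable that rebuilds a chunk from remote slices (assumed)
    """
    raw = local_store.get(chunk_id)
    if raw is not None and zlib.crc32(raw) == checksums[chunk_id]:
        return raw, "local"  # fast path: no reconstruction needed
    # Missing or corrupted locally: fall back to full reconstruction.
    return reconstruct(chunk_id), "reconstructed"


good = b"contiguous raw records"
checksums = {"a": zlib.crc32(good), "b": zlib.crc32(good)}
local_store = {"a": good, "b": b"corrupted raw records!"}

result_a = read_chunk("a", local_store, checksums, lambda cid: good)
result_b = read_chunk("b", local_store, checksums, lambda cid: good)
```

In this sketch chunk `a` is served from the local slice, while the corrupted chunk `b` is transparently rebuilt.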

Slide 9: Key Features and Benefits
Cost-effective scalability
– Infinite scalability in a single system
Increased performance and productivity
– Computation brought to the data
– dsNet Slicestors provide both computation and storage
– Geographic distribution enabled
Lower storage costs
– Information dispersal calls for one instance of the data vs. 3x with replication
Significantly higher reliability and availability
– Information dispersal eliminates single points of failure
– Continuous data availability with multiple simultaneous device or site failures
Drop-in replacement for existing MapReduce jobs via standard Hadoop File System interfaces