© 2014 MapR Technologies 1 Ted Dunning February 20, 2015.

Presentation transcript:

© 2014 MapR Technologies 1 Ted Dunning February 20, 2015

© 2014 MapR Technologies 2 Contact Information
Ted Dunning
Chief Applications Architect, MapR Technologies
Committer & PMC for Apache’s Drill, Zookeeper & Mahout
Mentor for Myriad & Apache’s Storm, Flink, Datafu, Optiq, Drill
Twitter
Hashtag today: #StrataHadoop

© 2014 MapR Technologies 3 Myriad Project
Very new open source / open community project
Started as collaboration between Mesosphere, MapR & eBay
Proposal to be an incubator project of the Apache Foundation submitted 12 February 2015
Goal: global resource management for multiple data centers

© 2014 MapR Technologies 4 Agenda
The need
Recap
How it works
Use Cases
Lessons Learned
The Future

© 2014 MapR Technologies 5 What We Need
Tight integration of resources and programming models
User specified resources and allocation models
Lightweight executive
Strong isolation
Fast task launch

© 2014 MapR Technologies 6 What We Need
Very fast scheduling
Very careful (slow) scheduling
Long-lived system tasks
Short-lived tasks
Long-lived ephemeral tasks
Pre-emption

© 2014 MapR Technologies 7 What We Need
Very good support of entire Hadoop eco-system
  – Tight integration of MapReduce2
  – Tez
  – Impala
  – Drill
  – Spark
Very good support of everything else
  – Arbitrary containers
  – Web servers
  – Systems processes without containers
  – User defined containers
  – Licensing constraints

© 2014 MapR Technologies 8 This is a problem

© 2014 MapR Technologies 9 And an opportunity

© 2014 MapR Technologies 10 What We Have - Yarn
Resource Manager, NodeManager, heartbeat
  – Direct lineage from JobTracker, TaskTracker
Application Master, Task containers
  – The other half of the JobTracker and TaskTracker
Monolithic scheduling
Pre-emption
Hadoop standard
Pre-defined resources
Good Hadoop eco support
  – MapReduce2, Tez, Impala, Drill, Spark

© 2014 MapR Technologies 11 What We Have - Mesos
Two level scheduling
  – Bottom level is application specific
  – Frameworks to ease complexity
  – Offers, Returns
Actor-based, bidi RPC
  – Super fast process launch
Marathon, Chronos
  – ISO8601, jboss, jetty, sinatra, rails
User defined resources, attributes
Some Hadoop (Spark native!)
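The two-level scheduling idea on this slide can be sketched in a few lines of Python. This is a toy model written for illustration, not the Mesos API: the `Offer` and `Framework` classes, the `consider` method, and the `offer_round` function are all hypothetical names, and the first-come ordering of frameworks is an assumption. The top level only hands out offers; each framework applies its own application-specific rule and either accepts an offer or returns it.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """A resource offer made by the (toy) cluster-level scheduler."""
    node: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """Application-level scheduler (hypothetical): decides which offers to use."""
    name: str
    cpus_per_task: float
    mem_per_task_mb: int
    pending_tasks: int
    launched: list = field(default_factory=list)

    def consider(self, offer: Offer) -> bool:
        """Accept the offer if a pending task fits; otherwise return it."""
        fits = (offer.cpus >= self.cpus_per_task
                and offer.mem_mb >= self.mem_per_task_mb)
        if self.pending_tasks > 0 and fits:
            self.pending_tasks -= 1
            self.launched.append(offer.node)
            return True
        return False


def offer_round(offers, frameworks):
    """Top level: pass each offer to the frameworks in turn until one accepts.

    Returns the offers that every framework declined.
    """
    unused = []
    for offer in offers:
        if not any(fw.consider(offer) for fw in frameworks):
            unused.append(offer)
    return unused


# Illustrative data only: three nodes, one small web task, one larger batch task.
offers = [Offer("node1", 4.0, 8192), Offer("node2", 1.0, 1024), Offer("node3", 4.0, 8192)]
web = Framework("web", cpus_per_task=1.0, mem_per_task_mb=512, pending_tasks=1)
batch = Framework("batch", cpus_per_task=2.0, mem_per_task_mb=4096, pending_tasks=1)
unused = offer_round(offers, [web, batch])
```

The point of the split is that the top level never needs to know what a "task" means to a framework; it only tracks which offers were taken and which came back.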

© 2014 MapR Technologies 12 Sound the same Very much not

© 2014 MapR Technologies 13 Myriad integrates Mesos and Yarn

© 2014 MapR Technologies 14 How It Works
Mesos creates virtual clusters
YARN uses resources provided by Mesos
Myriad can ask YARN to release some resources
Or give it more
[Diagram labels: Mesos, YARN cluster, Web Servers]
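As a rough illustration of the grow-and-shrink behaviour described here, the sketch below models a YARN cluster that borrows nodes from a shared pool and hands them back on request. It is a toy under stated assumptions, not Myriad's implementation: `ToyPool`, `ToyYarnCluster`, `flex_up`, and `flex_down` are hypothetical names, and resources are counted in whole nodes for simplicity.

```python
class ToyPool:
    """Stand-in for the Mesos-managed pool; it only tracks free nodes."""

    def __init__(self, total_nodes):
        self.free_nodes = total_nodes


class ToyYarnCluster:
    """A virtual YARN cluster whose size is set from outside (hypothetical)."""

    def __init__(self, pool):
        self.pool = pool
        self.node_managers = 0

    def flex_up(self, n):
        """Take up to n free nodes from the pool; return how many were granted."""
        granted = min(n, self.pool.free_nodes)
        self.pool.free_nodes -= granted
        self.node_managers += granted
        return granted

    def flex_down(self, n):
        """Give up to n nodes back to the pool; return how many were released."""
        released = min(n, self.node_managers)
        self.node_managers -= released
        self.pool.free_nodes += released
        return released


pool = ToyPool(total_nodes=10)
yarn = ToyYarnCluster(pool)
first = yarn.flex_up(6)     # pool has enough nodes, so all 6 are granted
second = yarn.flex_up(6)    # only 4 nodes remain, so the grant is capped
given_back = yarn.flex_down(3)
```

Several `ToyYarnCluster` instances can share one `ToyPool`, which mirrors the idea of multiple virtual clusters drawing on the same underlying resources.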

© 2014 MapR Technologies 19 How Myriad Works
Mesos runs Yarn
  – Yarn runs Yarn programs
  – Multiple Yarns supported
  – Multiple Yarn versions easy
Mesos runs program + Yarn fakeout
  – Gets resources back from Yarn quickly
  – High priority “Yarn” program
  – As Yarn executes “tasks”, resources given back to Mesos
  – Allows fast spinup/spindown of Yarn resources

© 2014 MapR Technologies 20 How Myriad Works Mesos Persistence Layer

© 2014 MapR Technologies 22 Let’s see some examples

© 2014 MapR Technologies 23 #1 – I wanna cluster

© 2014 MapR Technologies 24 I Want a Cluster
Very common need
  – Ephemeral clusters for multi-tenancy
  – Quick dev or QA clusters
  – Compatibility testing
Yarn doesn’t run Yarn well
  – Especially across incompatible versions
  – Encapsulation can’t be unrolled
Myriad does this trivially, but
  – Must have data localization, universal name space

© 2014 MapR Technologies 25 #2 – Version upgrade

© 2014 MapR Technologies 26 YARN Version Upgrade
Another very common need
  – Need to test first
  – Applications roll over to new cluster
  – Resources follow applications
  – Data layer must remain inter-operable
Yarn doesn’t run Yarn well (again)
  – Especially across incompatible versions
  – Encapsulation can’t be unrolled
Myriad does this trivially, but
  – Must have data localization, universal name space

© 2014 MapR Technologies 27 #3 – Resource slosh

© 2014 MapR Technologies 28 Resource Slosh
Resource slosh
  – Data ingestion pulse requires many web-servers
  – After ingestion, analytics pulse requires many Hadoop nodes
  – Data layer must remain inter-operable
Conflict between Sysop/Hadoop viewpoints
Myriad does this trivially, but
  – Must have data localization, universal name space
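The slosh pattern can be made concrete with a small worked example. The function below divides a fixed node pool between web servers and Hadoop nodes according to current demand. The name `split_nodes`, the proportional rule, and the demand figures are all assumptions chosen for illustration; the slides do not say how the split is actually decided.

```python
def split_nodes(total, web_demand, hadoop_demand):
    """Divide a fixed node pool between web servers and Hadoop by demand.

    If both demands fit, each side gets what it asked for. Otherwise the pool
    is split in proportion to demand (an illustrative rule, not Myriad's).
    Returns a (web_nodes, hadoop_nodes) tuple.
    """
    demand = web_demand + hadoop_demand
    if demand <= total:
        return web_demand, hadoop_demand
    web = total * web_demand // demand
    return web, total - web


# Hypothetical demand figures for the two pulses on a 20-node cluster.
phases = [("ingestion", 18, 4), ("analytics", 3, 30)]
for name, web_demand, hadoop_demand in phases:
    web, hadoop = split_nodes(20, web_demand, hadoop_demand)
    print(f"{name}: {web} web servers, {hadoop} Hadoop nodes")
```

The same 20 nodes serve mostly web traffic during ingestion and mostly Hadoop during analytics, which is only workable if the data layer stays reachable from whichever side currently holds a node.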

© 2014 MapR Technologies 30 Some Lessons Learned
Omega paper
  – Not news
  – Single scheduler framework not viable
Multi-cultural software is actually pretty cool
  – But you have to value both cultures
One incubator project (Slider) doesn’t change that

© 2014 MapR Technologies 32 The Future
Incubator
  – Proposal at
  – Initial team from Mesosphere, eBay, MapR
Community building
  – Diversity is good already
  – Starting with very lean team
Older whisky, faster horses, more features
  – Apologies to the cowboy and the poet
  – And Tom T. Hall

© 2014 MapR Technologies 33 World domination

© 2014 MapR Technologies 35 World domination Peaceful coexistence via specialization

© 2014 MapR Technologies 36 Myriad Project
Blog: “Project Myriad: No Hadoop is an Island” http://bit.ly/myriad-mapr-blog
Proposal to be an incubator project of the Apache Foundation submitted 12 February
Initial code on github:
Join us!
Twitter for Myriad
[no, it’s not an official project logo]
