CS162 Operating Systems and Systems Programming Lecture 24 Capstone: Cloud Computing April 29, 2013 Anthony D. Joseph


4/29/2013 Anthony D. Joseph CS162 ©UCB Spring 2013

Goals for Today
Distributed systems
Cloud computing programming paradigms
Cloud computing OS
Note: Some slides and/or pictures in the following are adapted from slides by Ali Ghodsi.

Background of Cloud Computing
1990: Heyday of parallel computing, multi-processors
–52% growth in performance per year!
2002: The thermal wall
–Speed (frequency) peaks, but transistors keep shrinking
The multicore revolution
–15-20 years later than predicted, we have hit the performance wall

At the same time…
Amount of stored data is exploding…

Data Deluge
Billions of users connected through the net
–WWW, FB, Twitter, cell phones, …
–80% of the data on FB was produced last year
Storage getting cheaper
–Store more data!

Solving the Impedance Mismatch
Computers are not getting faster, and we are drowning in data
–How to resolve the dilemma?
Solution adopted by web-scale companies
–Go massively distributed and parallel

Enter the World of Distributed Systems
Distributed systems/computing
–Loosely coupled set of computers, communicating through message passing, solving a common goal
Distributed computing is challenging
–Dealing with partial failures (examples?)
–Dealing with asynchrony (examples?)
Distributed computing versus parallel computing?
–Distributed computing = parallel computing + partial failures

Dealing with Distribution
We have seen several of the tools that help with distributed programming
–Message Passing Interface (MPI)
–Distributed Shared Memory (DSM)
–Remote Procedure Calls (RPC)
But distributed programming is still very hard
–Programming for scale, fault-tolerance, consistency, …

The Datacenter is the New Computer
“Program” == Web search, map/GIS, …
“Computer” == 10,000’s of computers, storage, network
Warehouse-sized facilities and workloads
Built from less reliable components than traditional datacenters

Datacenter/Cloud Computing OS
If the datacenter/cloud is the new computer
–What is its operating system?
–Note that we are not talking about a host OS

Classical Operating Systems
Data sharing
–Inter-process communication, RPC, files, pipes, …
Programming abstractions
–Libraries (libc), system calls, …
Multiplexing of resources
–Scheduling, virtual memory, file allocation/protection, …

Datacenter/Cloud Operating System
Data sharing
–Google File System, key/value stores
Programming abstractions
–Google MapReduce, Pig, Hive, Spark
Multiplexing of resources
–Apache projects: Mesos, YARN (MRv2), ZooKeeper, BookKeeper, …

Google Cloud Infrastructure
Google File System (GFS), 2003
–Distributed file system for the entire cluster
–Single namespace
Google MapReduce (MR), 2004
–Runs queries/jobs on data
–Manages work distribution & fault-tolerance
–Colocated with file system
Apache open source versions: Hadoop DFS and Hadoop MR

GFS/HDFS Insights
Petabyte storage
–Files split into large blocks (128 MB) and replicated across several nodes
–Big blocks allow high-throughput sequential reads/writes
Data striped on hundreds/thousands of servers
–Scan 100 TB on 1 node @ 50 MB/s = 24 days
–Scan on 1000-node cluster = 35 minutes
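The scan arithmetic above is easy to check back-of-the-envelope. A quick sketch (assuming decimal units and the slide's 50 MB/s per-node sequential read rate):

```python
# Back-of-the-envelope check of the scan numbers on the slide.
TB = 10**6                 # MB per TB (decimal units, as storage vendors count)
data_mb = 100 * TB         # 100 TB dataset
rate = 50                  # MB/s sequential read per node

one_node_days = data_mb / rate / 86400          # one node, seconds -> days
cluster_minutes = data_mb / rate / 1000 / 60    # 1000 nodes scanning in parallel

print(round(one_node_days, 1))    # 23.1  (the slide's "24 days")
print(round(cluster_minutes, 1))  # 33.3  (the slide's "35 minutes")
```

The small gaps from the slide's round numbers come only from rounding; the point is the three-orders-of-magnitude speedup from striping data across the cluster.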

GFS/HDFS Insights (2)
Failures will be the norm
–Mean time between failures for 1 node = 3 years
–Mean time between failures for 1000 nodes = 1 day
Use commodity hardware
–Failures are the norm anyway, buy cheaper hardware
No complicated consistency models
–Single writer, append-only data
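The failure arithmetic above follows from a simple independence approximation (a sketch; real failures are correlated, so this is optimistic):

```python
# If one node fails on average every 3 years, a 1000-node cluster sees a
# failure roughly every day: failures become routine, not exceptional.
node_mtbf_days = 3 * 365
cluster_mtbf_days = node_mtbf_days / 1000   # independent-failure approximation
print(round(cluster_mtbf_days, 2))          # 1.09, i.e. about one failure per day
```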

MapReduce Insights
Restricted key-value model
–Same fine-grained operations (Map & Reduce) repeated on big data
–Operations must be deterministic
–Operations must be idempotent/have no side effects
–Only communication is through the shuffle
–Operation (Map & Reduce) output is saved (on disk)
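As a minimal illustration of this restricted model, here is a single-process word-count sketch in Python (illustrative only, not the Hadoop API): the map and reduce functions are deterministic and side-effect free, and the shuffle is the only communication step.

```python
from collections import defaultdict

def map_fn(doc):
    # One input record in, a list of (key, value) pairs out; no side effects.
    return [(word, 1) for word in doc.split()]

def reduce_fn(key, values):
    # All values for one key in, one result out; deterministic.
    return (key, sum(values))

def mapreduce(docs):
    shuffle = defaultdict(list)          # the shuffle: group map output by key
    for doc in docs:
        for k, v in map_fn(doc):
            shuffle[k].append(v)
    return dict(reduce_fn(k, vs) for k, vs in shuffle.items())

print(mapreduce(["the cat", "the dog"]))  # {'the': 2, 'cat': 1, 'dog': 1}
```

Because map_fn and reduce_fn are deterministic and side-effect free, a real runtime can re-execute either on any machine after a failure, which is exactly the property the slide's fault-tolerance story depends on.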

What is MapReduce Used For?
At Google:
–Index building for Google Search
–Article clustering for Google News
–Statistical machine translation
At Yahoo!:
–Index building for Yahoo! Search
–Spam detection for Yahoo! Mail
At Facebook:
–Data mining
–Ad optimization
–Spam detection

MapReduce Pros
Distribution is completely transparent
–Not a single line of distributed programming (ease, correctness)
Automatic fault-tolerance
–Determinism enables re-running failed tasks somewhere else
–Saved intermediate data enables re-running only the failed reducers
Automatic scaling
–As operations are side-effect free, they can be distributed to any number of machines dynamically
Automatic load-balancing
–Move tasks, and speculatively execute duplicate copies of slow tasks (stragglers)

MapReduce Cons
Restricted programming model
–Not always natural to express problems in this model
–Low-level coding necessary
–Little support for iterative jobs (lots of disk access)
–High latency (batch processing)
Addressed by follow-up research
–Pig and Hive for high-level coding
–Spark for iterative and low-latency jobs

Pig
High-level language:
–Expresses sequences of MapReduce jobs
–Provides relational (SQL) operators (JOIN, GROUP BY, etc.)
–Easy to plug in Java functions
Started at Yahoo! Research
–Runs about 50% of Yahoo!’s jobs

Example Problem
Given user data in one file, and website data in another, find the top 5 most visited pages by users aged 18-25.
Dataflow: Load Users / Load Pages → Filter by age → Join on name → Group on url → Count clicks → Order by clicks → Take top 5

In MapReduce
(figure: the same pipeline hand-written as a chain of low-level MapReduce jobs)

In Pig Latin

Users = load 'users' as (name, age);
Filtered = filter Users by age >= 18 and age <= 25;
Pages = load 'pages' as (user, url);
Joined = join Filtered by name, Pages by user;
Grouped = group Joined by url;
Summed = foreach Grouped generate group, count(Joined) as clicks;
Sorted = order Summed by clicks desc;
Top5 = limit Sorted 5;
store Top5 into 'top5sites';
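For readers less familiar with Pig Latin, the same filter/join/group/count/order pipeline can be sketched over toy in-memory Python data (the records here are hypothetical; the real script runs over files in HDFS):

```python
# Toy data standing in for the 'users' and 'pages' files.
users = [("alice", 22), ("bob", 17), ("carol", 25)]
pages = [("alice", "a.com"), ("carol", "a.com"),
         ("alice", "b.com"), ("bob", "b.com")]

filtered = {name for name, age in users if 18 <= age <= 25}        # Filter by age
joined = [(n, url) for n, url in pages if n in filtered]           # Join on name
clicks = {}
for _, url in joined:                                              # Group on url, count
    clicks[url] = clicks.get(url, 0) + 1
top5 = sorted(clicks.items(), key=lambda kv: -kv[1])[:5]           # Order, take top 5
print(top5)  # [('a.com', 2), ('b.com', 1)]
```

The point of Pig is that each of these comment-labeled steps is one declarative statement, while the runtime handles distribution.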

Translation to MapReduce
Notice how naturally the components of the job translate into Pig Latin: each dataflow step (Load Users / Load Pages, Filter by age, Join on name, Group on url, Count clicks, Order by clicks, Take top 5) becomes one statement, and Pig compiles the script into a chain of MapReduce jobs (roughly one job for the join, one for the group/count, and one for the ordering).

Hive
Relational database built on Hadoop
–Maintains table schemas
–SQL-like query language (which can also call Hadoop Streaming scripts)
–Supports table partitioning, complex data types, sampling, some query optimization
Developed at Facebook
–Used for many Facebook jobs

Spark Motivation
Complex jobs, interactive queries and online processing all need one thing that MR lacks: efficient primitives for data sharing
(figure: iterative jobs with multiple stages, interactive mining with repeated queries, and stream processing with a sequence of jobs)
Problem: in MR, the only way to share data across jobs is using stable storage (e.g. a file system), which is slow!

Examples
(figure: an iterative job writes results to HDFS and reads them back between every iteration; interactive mining re-reads the same input from HDFS for every query)
Opportunity: DRAM is getting cheaper, so use main memory for intermediate results instead of disks

Goal: In-Memory Data Sharing
(figure: iterations and queries share intermediate results through distributed memory, with only one-time processing of the input)
Memory is 10-100× faster than network and disk

Solution: Resilient Distributed Datasets (RDDs)
Partitioned collections of records that can be stored in memory across the cluster
Manipulated through a diverse set of transformations (map, filter, join, etc.)
Fault recovery without costly replication
–Remember the series of transformations that built an RDD (its lineage) to recompute lost data
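A toy Python sketch of lineage-based recovery (not Spark's actual API): each dataset remembers its parent and the transformation that built it, so a lost in-memory partition can be recomputed instead of being restored from a replica.

```python
class RDD:
    def __init__(self, data=None, parent=None, transform=None):
        self.parent, self.transform = parent, transform   # the lineage
        self.cache = data                                 # in-memory data (may be lost)

    def map(self, f):
        return RDD(parent=self, transform=lambda xs: [f(x) for x in xs])

    def filter(self, p):
        return RDD(parent=self, transform=lambda xs: [x for x in xs if p(x)])

    def collect(self):
        if self.cache is None:            # lost or never computed: replay lineage
            self.cache = self.transform(self.parent.collect())
        return self.cache

nums = RDD(data=[1, 2, 3, 4])
evens_sq = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_sq.collect())   # [4, 16]
evens_sq.cache = None       # simulate losing the in-memory partition
print(evens_sq.collect())   # [4, 16], recomputed from lineage, no replica needed
```

Because transformations are deterministic, replaying the lineage yields exactly the lost data; Spark applies the same idea per partition, so only lost partitions are recomputed.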

Administrivia
Project 4
–Design doc due tonight (4/29) by 11:59pm, reviews Wed-Fri
–Code due next week Thu 5/9 by 11:59pm
Final exam review
–Monday 5/6, 2-5pm in 100 Lewis Hall
My RRR week office hours
–Monday 5/6, 1-2pm and Wednesday 5/8, 2-3pm
CyberBunker.com 300 Gb/s DDoS attack against Spamhaus
–35-year-old Dutchman “S.K.” arrested in Spain on 4/26
–Was using a van with “various antennas” as a mobile office

Break

Datacenter Scheduling Problem
Rapid innovation in datacenter computing frameworks (Pig, Dryad, Pregel, Percolator, CIEL, …)
No single framework is optimal for all applications
Want to run multiple frameworks in a single datacenter
–…to maximize utilization
–…to share data between frameworks

Where We Want to Go
(figure: Hadoop, Pregel, and MPI running on a shared cluster)
Today: static partitioning. Want: dynamic sharing.

Solution: Apache Mesos
Mesos is a common resource sharing layer over which diverse frameworks can run
(figure: Hadoop and Pregel executors mixed on every node, with Mesos underneath)
Run multiple instances of the same framework
–Isolate production and experimental jobs
–Run multiple versions of the framework concurrently
Build specialized frameworks targeting particular problem domains
–Better performance than general-purpose abstractions

Mesos Goals
High utilization of resources
Support diverse frameworks (current & future)
Scalability to 10,000’s of nodes
Reliability in the face of failures
Resulting design: small microkernel-like core that pushes scheduling logic to frameworks

Mesos Design Elements
Fine-grained sharing:
–Allocation at the level of tasks within a job
–Improves utilization, latency, and data locality
Resource offers:
–Simple, scalable, application-controlled scheduling mechanism

Element 1: Fine-Grained Sharing
(figure: coarse-grained sharing (HPC) statically assigns whole nodes to Framework 1, 2, or 3; fine-grained sharing (Mesos) interleaves tasks from all three frameworks across the nodes, over a shared storage system such as HDFS)
+ Improved utilization, responsiveness, data locality

Element 2: Resource Offers
Option: global scheduler
–Frameworks express needs in a specification language; the global scheduler matches them to resources
+ Can make optimal decisions
– Complex: language must support all framework needs
– Difficult to scale and to make robust
– Future frameworks may have unanticipated needs

Element 2: Resource Offers
Mesos: resource offers
–Offer available resources to frameworks; let them pick which resources to use and which tasks to launch
+ Keeps Mesos simple, lets it support future frameworks
– Decentralized decisions might not be optimal
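The offer mechanism can be sketched in a few lines of Python (purely illustrative; the names and the CPU-only resource model are assumptions, not the real Mesos API). The key inversion is that the framework, not the master, decides what to accept:

```python
def offer_loop(free_cpus, frameworks):
    """Master side: offer free resources to each framework in turn."""
    launched = []
    for fw in frameworks:                    # master picks whom to offer to
        accepted = fw["accept"](free_cpus)   # framework-side decision
        accepted = min(accepted, free_cpus)  # can't take more than was offered
        free_cpus -= accepted
        if accepted:
            launched.append((fw["name"], accepted))
    return launched, free_cpus

# Hypothetical frameworks with different scheduling policies.
frameworks = [
    {"name": "hadoop", "accept": lambda cpus: min(cpus, 4)},  # wants at most 4 CPUs
    {"name": "mpi",    "accept": lambda cpus: cpus},          # takes everything offered
]
print(offer_loop(6, frameworks))  # ([('hadoop', 4), ('mpi', 2)], 0)
```

The master stays simple (it only tracks free resources and picks whom to offer to), while all framework-specific logic lives in the `accept` callbacks, mirroring the slide's "keeps Mesos simple" point.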

Mesos Architecture
(figure: MPI and Hadoop jobs submit to their own schedulers; the Mesos master's allocation module picks a framework to offer resources to; Mesos slaves run MPI and Hadoop executors, which launch tasks)
Resource offer = list of (node, availableResources) pairs describing the free resources on each node
The master sends a resource offer to the chosen framework; the framework's scheduler makes the framework-specific decision of which resources to accept and which tasks to launch; the slaves launch and isolate the executors.

Deployments
–1,000’s of nodes running over a dozen production services
–Genomics researchers using Hadoop and Spark on Mesos
–Spark in use by Yahoo! Research
–Spark for analytics
–Hadoop and Spark used by machine learning researchers

Summary
Cloud computing/datacenters are the new computer
–Emerging “Datacenter/Cloud Operating System” appearing
Pieces of the DC/Cloud OS
–High-throughput filesystems (GFS/HDFS)
–Job frameworks (MapReduce, Spark, Pregel)
–High-level query languages (Pig, Hive)
–Cluster scheduling (Apache Mesos)