© 2015 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Iterative processing October 20, 2015

© 2015 A. Haeberlen, Z. Ives Announcements HW3 will be released tomorrow MS1 due October 29th at 10:00pm EDT MS2 due November 5th at 10:00pm EDT Emergency office hours after class Final project: Mini-Facebook application Two-person team project, due at the end of the semester Specifications will be available soon Please start thinking about a potential team member Once the spec is out, please send me an email and tell me who is on your team! There will be exactly one three-person team, which, in the interest of fairness, must do 50% more work. I will approve the first team that asks. All other teams must have two members. 2 University of Pennsylvania

© 2015 A. Haeberlen, Z. Ives Examples from earlier years 3 University of Pennsylvania

© 2015 A. Haeberlen, Z. Ives Some 'lessons learned' from earlier years The most common mistakes were: Started too late; tried to do everything at the last minute You need to leave enough time at the end to do debugging etc Underestimated amount of integration work Suggestion: Define clean interfaces, build dummy components for testing, exchange code early and throughout the project Suggestion: Work together from the beginning! (Pair programming?) Unbalanced team You need to pick your teammate wisely, make sure they pull their weight, keep them motivated,... You and your teammate should have compatible goals (Example: Go for the Facebook award vs. get a passing grade) Started coding without a design FIRST make a plan: what pages will there be? how will the user interact with them? how will the interaction between client and server work? what components will the program have? what tables will there be in the database, and what fields should be in them? etc. 4 University of Pennsylvania

© 2015 A. Haeberlen, Z. Ives Facebook award Facebook is sponsoring an award for the best final projects Backpack or duffle bag for each team member, with surprise contents Winners will be announced on the course web page ("NETS212 Hall of Fame") 5 University of Pennsylvania

© 2015 A. Haeberlen, Z. Ives Plan for today Generalizing the computation model Bulk synchronous parallelism (BSP) Pregel Spark 6 University of Pennsylvania NEXT

© 2015 A. Haeberlen, Z. Ives Recall: Iterative computation in MapReduce MapReduce is functional: map() and reduce() 'forget' all state between iterations Hence, we have no choice but to put the state into the intermediate results This is a bit cumbersome University of Pennsylvania [Figure: a chain of map/reduce jobs: init state and convert the input into input + state; iterative computation; test for convergence; discard state, output results]
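
A sketch of what this pattern looks like as a Hadoop driver, assuming hypothetical IterMapper/IterReducer classes (their bodies are not shown) and a user-defined counter for the convergence test: the driver runs one job per iteration, and each round's output (input + state) becomes the next round's input.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriver {

  // Hypothetical per-round logic. The real map()/reduce() bodies would carry the
  // algorithm's state inside the values they emit, and the reducer would bump the
  // "iteration:CHANGED" counter whenever some piece of state actually changed.
  public static class IterMapper extends Mapper<Object, Text, Text, Text> { }
  public static class IterReducer extends Reducer<Text, Text, Text, Text> { }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String current = args[0];   // output of an 'init' job: input converted to input + state

    for (int i = 0; ; i++) {
      Job job = Job.getInstance(conf, "iteration-" + i);
      job.setJarByClass(IterativeDriver.class);
      job.setMapperClass(IterMapper.class);
      job.setReducerClass(IterReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(Text.class);
      FileInputFormat.addInputPath(job, new Path(current));
      String next = args[1] + "/iter" + i;        // each round writes a fresh directory
      FileOutputFormat.setOutputPath(job, new Path(next));
      if (!job.waitForCompletion(true)) System.exit(1);

      long changed = job.getCounters()
          .findCounter("iteration", "CHANGED").getValue();
      current = next;
      if (changed == 0) break;    // convergence test: no state changed in this round
    }
    // A final job would strip the state off and write the actual results.
  }
}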

© 2015 A. Haeberlen, Z. Ives What if we could remember? Suppose we were to change things entirely: Graph is partitioned across a set of machines State is kept entirely in memory Computation consists of message passing, i.e., sending updates from one portion to another Let’s look at two versions of this: Pregel (Malewicz et al., SIGMOD'10 – Google's version) Spark (Zaharia et al., NSDI'12 – UC Berkeley's version) University of Pennsylvania 8

© 2015 A. Haeberlen, Z. Ives Let's think about the MapReduce model How does MapReduce process graphs? "Think like a vertex" What do the vertices do? What are the edges, really? How good a fit is MapReduce's keys → values model for this?... and what are the consequences? 9 University of Pennsylvania [Figure: a key/value record: the key is a vertex ID; the value holds the vertex value and the IDs of adjacent vertices]

© 2015 A. Haeberlen, Z. Ives The BSP model This is similar to the bulk-synchronous parallelism (BSP) model Developed by Leslie Valiant at Harvard during the 1980s BSP computations consist of: Lots of components that process data A network for communication, and a way to synchronize Three distinct phases: Concurrent computation Communication Barrier synchronization Repeat 10 University of Pennsylvania... Valiant, "A bridging model for parallel computation", CACM Vol. 33 No. 8, Aug. 1990
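
A minimal, self-contained sketch of the three phases on a single machine (everything here is illustrative): one thread per component, a shared array of mailboxes standing in for the network, and a CyclicBarrier for the synchronization step.

import java.util.concurrent.CyclicBarrier;

public class BspSketch {
  static final int P = 4;                         // number of components (workers)
  static final int SUPERSTEPS = 3;
  static final int[] mailboxes = new int[P];      // the "network": one mailbox per component
  static final CyclicBarrier barrier = new CyclicBarrier(P);

  public static void main(String[] args) throws Exception {
    Thread[] workers = new Thread[P];
    for (int i = 0; i < P; i++) {
      final int id = i;
      workers[i] = new Thread(() -> {
        try {
          int state = id;                              // each component's local data
          for (int s = 0; s < SUPERSTEPS; s++) {
            int received = mailboxes[id];              // messages from the previous superstep
            state = state + received + 1;              // phase 1: concurrent computation
            barrier.await();                           //   (everyone is done reading old mail)
            mailboxes[(id + 1) % P] = state;           // phase 2: communication (send to neighbor)
            barrier.await();                           // phase 3: barrier synchronization
          }                                            // ...then repeat
          System.out.println("component " + id + " final state: " + state);
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) t.join();
  }
}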

© 2015 A. Haeberlen, Z. Ives Properties of the BSP model Can BSP computations have: Deadlocks? Race conditions? If so, when? If not, why not? How well do BSP computations scale? Why? Are there algorithms for which it cannot (or should not) be used? 11 University of Pennsylvania

© 2015 A. Haeberlen, Z. Ives Plan for today Generalizing the computation model Bulk synchronous parallelism (BSP) Pregel Spark 12 University of Pennsylvania NEXT

© 2015 A. Haeberlen, Z. Ives The basic Pregel execution model University of Pennsylvania [Figure: in each superstep, the record (vertex ID, vertex value) is updated to (vertex ID, vertex value')] A sequence of supersteps; in each one, every vertex: receives incoming messages, runs Compute(), updates its value / state, sends outgoing messages, and optionally changes the topology Malewicz et al., "Pregel: a system for large-scale graph processing", Proc. ACM SIGMOD 2010

© 2015 A. Haeberlen, Z. Ives Pregel: Termination test How do we know when the computation is done? Vertexes can be active or inactive Each vertex can independently vote to halt, transition to inactive Incoming messages reactivate the vertex Algorithm terminates when all vertexes are inactive Examples of when a vertex might vote to halt? 14 University of Pennsylvania [State diagram: Active → Inactive on "vote to halt"; Inactive → Active on "message received"]

© 2015 A. Haeberlen, Z. Ives Pregel: A simple example (max value) University of Pennsylvania
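
A self-contained sketch of the max-value idea (not Pregel's real API; the graph and initial values below are made up): each vertex adopts the largest value it receives, forwards its value to its neighbors whenever it changes, and votes to halt otherwise. The run ends when every vertex is inactive and no messages are in flight; all vertices then hold the global maximum.

import java.util.*;

public class MaxValueSimulation {
  public static void main(String[] args) {
    // A small made-up directed graph: adjacency lists, one entry per vertex
    int[][] edges = { {1}, {0, 2}, {1, 3}, {2} };
    int[] value = { 3, 6, 2, 1 };               // initial vertex values
    boolean[] active = { true, true, true, true };

    // inbox[v] holds the messages delivered to v at the start of a superstep
    List<List<Integer>> inbox = new ArrayList<>();
    for (int v = 0; v < value.length; v++) inbox.add(new ArrayList<>());

    for (int superstep = 0; ; superstep++) {
      List<List<Integer>> outbox = new ArrayList<>();
      for (int v = 0; v < value.length; v++) outbox.add(new ArrayList<>());
      boolean anyActive = false;

      for (int v = 0; v < value.length; v++) {
        if (!active[v] && inbox.get(v).isEmpty()) continue;   // stays inactive
        active[v] = true;                                     // messages reactivate it
        int before = value[v];
        for (int msg : inbox.get(v)) value[v] = Math.max(value[v], msg);
        if (superstep == 0 || value[v] != before) {
          for (int dst : edges[v]) outbox.get(dst).add(value[v]);  // changed: tell neighbors
        } else {
          active[v] = false;                                  // unchanged: vote to halt
        }
        anyActive |= active[v];
      }

      System.out.println("after superstep " + superstep + ": " + Arrays.toString(value));
      if (!anyActive) break;                                  // all inactive, no messages: done
      inbox = outbox;                                         // messages arrive next superstep
    }
  }
}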

© 2015 A. Haeberlen, Z. Ives Pregel: Producing output Output is the set of values explicitly output by the vertices Often a directed graph isomorphic to the input but it doesn't have to be (edges can be added or removed) Example: Clustering algorithm What if we need some global statistic instead? Example: Number of edges in the graph, average value Each vertex can output a value to an aggregator in superstep S System combines these values using a form of 'reducer', and result is available to all vertexes in superstep S+1 Aggregators need to be commutative and associative (why?) 16 University of Pennsylvania
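
A tiny sketch of why the combine function must be commutative and associative (hypothetical out-degree values, not Pregel's real API): each worker can pre-combine its vertices' contributions locally, and the master can merge the partial results in whatever order they arrive, always getting the same total.

import java.util.Arrays;
import java.util.List;

public class AggregatorSketch {
  // The aggregator's combine function: commutative and associative, so partial
  // results can be merged in any order and grouping without changing the answer.
  static long combine(long a, long b) { return a + b; }

  public static void main(String[] args) {
    // Out-degrees reported by vertices on three (hypothetical) workers
    List<long[]> perWorker = Arrays.asList(
        new long[]{2, 3}, new long[]{1}, new long[]{4, 0, 1});

    // Each worker pre-combines locally...
    long[] partial = new long[perWorker.size()];
    for (int w = 0; w < perWorker.size(); w++)
      for (long d : perWorker.get(w))
        partial[w] = combine(partial[w], d);

    // ...and the master merges the partials in whatever order they arrive.
    long edges = 0;
    for (long p : partial) edges = combine(edges, p);
    System.out.println("Total edges: " + edges);
  }
}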

© 2015 A. Haeberlen, Z. Ives Example: PageRank in Pregel 17 University of Pennsylvania

class PageRankVertex : public Vertex<double, void, double> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    if (superstep() >= 1) {
      double sum = 0;
      for (; !msgs->Done(); msgs->Next())
        sum += msgs->Value();
      *MutableValue() = 0.15 / NumVertices() + 0.85 * sum;
    }
    if (superstep() < 30) {
      const int64 n = GetOutEdgeIterator().size();
      SendMessageToAllNeighbors(GetValue() / n);
    } else {
      VoteToHalt();
    }
  }
};

© 2015 A. Haeberlen, Z. Ives Pregel: Additional complications How to coordinate? Basic Master/worker design (just like MapReduce) How to achieve fault tolerance? Crucial!! Why? Failures detected via heartbeats (just like in MapReduce) Uses checkpointing and recovery Basic checkpointing vs. confined recovery How to partition the graph among the workers? Very tricky problem! Addressed in much more detail in later work 18 University of Pennsylvania

© 2015 A. Haeberlen, Z. Ives Summary: Pregel Bulk Synchronous Parallelism – sequence of synchronized supersteps Consider the nodes to have state (memory) that carries from superstep to superstep Connections to MapReduce model? University of Pennsylvania

© 2015 A. Haeberlen, Z. Ives Plan for today Generalizing the computation model Bulk synchronous parallelism (BSP) Pregel Spark 20 University of Pennsylvania NEXT

© 2015 A. Haeberlen, Z. Ives Another Abstraction: Spark Let’s think of just having a big block of RAM, partitioned across machines… And a series of operators that can be executed in parallel across the different partitions That’s basically Spark's resilient distributed datasets (RDDs) Spark programs are written by defining functions to be called over items within collections (similar model to LINQ, FlumeJava, Apache Crunch, and several other environments) University of Pennsylvania 21 Zaharia et al., "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing", Proc. NSDI 2012

© 2015 A. Haeberlen, Z. Ives Spark: Transformations and actions RDDs are read-only, partitioned collections Programmer starts by defining a new RDD based on data in stable storage Example: lines = spark.textFile("hdfs://foo/bar"); Programmer can create more RDDs by applying transformations to existing ones Example: errors = lines.filter(_.startsWith("ERROR")); Only when an action is performed does Spark do actual work: Example: errors.count() Example: errors.filter(_.contains("HDFS")).map(_.split("\t")(3)).collect() 22 University of Pennsylvania

© 2015 A. Haeberlen, Z. Ives Spark: Lineage Spark keeps track of how RDDs have been constructed Result is a lineage graph Vertices represent RDDs, edges represent transformations What could this be useful for? Fault tolerance: When a machine fails, the corresponding piece of the RDD can be recomputed efficiently How would a multi-stage MapReduce program achieve this? Efficiency: Not all RDDs have to be 'materialized' (i.e., kept in RAM as a full copy) 23 University of Pennsylvania
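
One way to peek at the lineage Spark records (a sketch reusing the earlier filter example; the exact output format depends on the Spark version): build a short chain of transformations and print the RDD's debug string. Nothing is computed or materialized here; the chain is just recorded.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class LineageSketch {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("LineageSketch"));

    // lines -> errors -> hdfsErrors is recorded as a lineage graph
    JavaRDD<String> lines = sc.textFile("hdfs://foo/bar");   // path from the earlier slide
    JavaRDD<String> errors = lines.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.startsWith("ERROR"); }
    });
    JavaRDD<String> hdfsErrors = errors.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("HDFS"); }
    });

    // Prints the chain of parent RDDs; if a partition of hdfsErrors is lost,
    // Spark re-runs just these transformations on the corresponding input split.
    System.out.println(hdfsErrors.toDebugString());
    sc.stop();
  }
}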

© 2015 A. Haeberlen, Z. Ives Spark: A closer look at transformations In Spark, we treat RDDs as another kind of parametrized collection JavaRDD<T> represents a partitioned set of items JavaPairRDD<K,V> represents a partitioned set of keys/values We can perform operations like: myFilterFn is an object that implements a callback, which is invoked for each element Receives an element as an argument; returns a boolean newRdd includes only elements for which this was 'true' Does this remind you of anything? University of Pennsylvania 24

JavaRDD<String> myRdd = context.textFile("myFile", 1);
JavaRDD<String> newRdd = myRdd.filter(myFilterFn);

© 2015 A. Haeberlen, Z. Ives Spark: Another example callback Original RDD contains Strings Callback transforms each String to a (String,String) tuple PairFunction<T, K, V> takes a T and returns a (K,V) New RDD contains pairs of strings 25 University of Pennsylvania

JavaPairRDD<String, String> derivedRdd = myRdd.mapToPair(
    new PairFunction<String, String, String>() {
      public Tuple2<String, String> call(String s) {
        String[] parts = s.split(" ");
        return new Tuple2<String, String>(parts[0], parts[1]);
      }
    });

© 2015 A. Haeberlen, Z. Ives Example Spark WordCount in Java University of Pennsylvania 26
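
A sketch of a Spark WordCount in Java, assuming the Spark 1.x Java API used in this course (anonymous inner classes rather than lambdas) and input/output paths passed on the command line:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

public class WordCount {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("WordCount");
    JavaSparkContext sc = new JavaSparkContext(conf);

    JavaRDD<String> lines = sc.textFile(args[0]);    // input path: one element per line

    // Split each line into words (transformation: lazy)
    JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
      public Iterable<String> call(String line) {
        return Arrays.asList(line.split(" "));
      }
    });

    // Emit (word, 1) pairs and sum the counts per word (still lazy)
    JavaPairRDD<String, Integer> counts = words.mapToPair(
        new PairFunction<String, String, Integer>() {
          public Tuple2<String, Integer> call(String w) {
            return new Tuple2<String, Integer>(w, 1);
          }
        }).reduceByKey(new Function2<Integer, Integer, Integer>() {
          public Integer call(Integer a, Integer b) { return a + b; }
        });

    counts.saveAsTextFile(args[1]);   // action: this is what actually launches the job
    sc.stop();
  }
}

Note how flatMap, mapToPair, and reduceByKey mirror the map and reduce phases from earlier in the course; the difference is that the chain is built lazily and only saveAsTextFile() triggers the computation.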

© 2015 A. Haeberlen, Z. Ives Parallel Operations in Spark University of Pennsylvania 27 Transformations: lazy operations that define new RDDs based on existing ones Actions: actually launch computations

© 2015 A. Haeberlen, Z. Ives Spark: Implementation Developer writes a driver program that connects to a cluster of workers Driver defines RDDs, invokes actions, tracks lineage Workers are long-lived processes that store pieces of RDDs in memory and perform computations on them Many of the details will sound familiar: Scheduling, fault detection and recovery, handling stragglers, etc. 28 University of Pennsylvania

© 2015 A. Haeberlen, Z. Ives What can you do easily in Spark? Global aggregate computations that produce program state – compute the count() of an RDD, compute the max diff, etc. Loops! Built-in abstractions for some other common operations like joins See also Apache Crunch / Google FlumeJava for a very similar approach University of Pennsylvania
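
A short sketch of that driver-side pattern, with made-up numbers and the Spark 1.x Java API: the loop lives in the driver, and each reduce() is an action that ships a global aggregate back to the driver to decide whether to keep iterating.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

public class DriverLoopSketch {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("DriverLoopSketch"));

    // Made-up data: repeatedly halve the values until the largest one
    // (a global aggregate computed by an action) drops below a threshold.
    JavaRDD<Double> values = sc.parallelize(Arrays.asList(64.0, 40.0, 8.0));

    double max = Double.MAX_VALUE;
    while (max > 1.0) {
      values = values.map(new Function<Double, Double>() {      // transformation (lazy)
        public Double call(Double v) { return v / 2.0; }
      }).cache();                                               // keep the working set in memory
      max = values.reduce(new Function2<Double, Double, Double>() {  // action: runs a job,
        public Double call(Double a, Double b) { return Math.max(a, b); }  // result to the driver
      });
    }
    System.out.println("converged, max = " + max);
    sc.stop();
  }
}

Without cache(), each iteration's lineage would drag along all earlier map steps; keeping the working set in memory across iterations is exactly what RDDs are for.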

© 2015 A. Haeberlen, Z. Ives What else might we want to do? Spark makes it much easier to do multi-stage MapReduce Later we will see a series of higher-level languages that support optimization, where alternative implementations are explored… Hybrid languages (Pig Latin) Database languages (Dremel, Hive, Shark, Hyrax, …) University of Pennsylvania 30

© 2015 A. Haeberlen, Z. Ives Stay tuned Next time you will learn about: Web programming 31 University of Pennsylvania

© 2015 A. Haeberlen, Z. Ives Spark: Example (PageRank) 32 University of Pennsylvania

// Load graph as an RDD of (URL, outlinks) pairs
val links = spark.textFile(...).map(...).persist()
var ranks = // RDD of (URL, rank) pairs
for (i <- 1 to ITERATIONS) {
  // Build an RDD of (targetURL, float) pairs
  // with the contributions sent by each page
  val contribs = links.join(ranks).flatMap {
    (url, (links, rank)) => links.map(dest => (dest, rank/links.size))
  }
  // Sum contributions by URL and get new ranks
  ranks = contribs.reduceByKey((x,y) => x+y)
                  .mapValues(sum => a/N + (1-a)*sum)
}

© 2015 A. Haeberlen, Z. Ives Backup slides Here be dragons 33 University of Pennsylvania