Cloud Computing Lecture #2 From Lisp to MapReduce and GFS


Cloud Computing Lecture #2: From Lisp to MapReduce and GFS. Jimmy Lin, The iSchool, University of Maryland. Wednesday, January 30, 2008. Material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under the Creative Commons Attribution 3.0 License). This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.

Today's Topics: Functional Programming, MapReduce, and the Google File System (GFS)

Functional Programming. MapReduce = functional programming meets distributed processing on steroids. Not a new idea: it dates back to the 1950s (or even the 1930s). What is functional programming? Computation as the application of functions, with a theoretical foundation provided by the lambda calculus. How is it different? Traditional notions of "data" and "instructions" are not applicable; data flows are implicit in the program; different orders of execution are possible. Exemplified by Lisp and ML.

Overview of Lisp. Lisp ≠ Lost In Silly Parentheses. We'll focus on a particular dialect: Scheme. Lists are primitive data types, and functions are written in prefix notation:

    '(1 2 3 4 5)
    '((a 1) (b 2) (c 3))
    (+ 1 2) → 3
    (* 3 4) → 12
    (sqrt (+ (* 3 3) (* 4 4))) → 5
    (define x 3) → x
    (* x 5) → 15

Functions. Functions are defined by binding lambda expressions to variables:

    (define foo
      (lambda (x y)
        (sqrt (+ (* x x) (* y y)))))

Scheme also offers syntactic sugar for defining functions; the expression above is equivalent to:

    (define (foo x y)
      (sqrt (+ (* x x) (* y y))))

Once defined, a function can be applied:

    (foo 3 4) → 5

Other Features. In Scheme, everything is an s-expression: there is no distinction between "data" and "code", which makes it easy to write self-modifying code. Higher-order functions are functions that take other functions as arguments:

    (define (bar f x) (f (f x)))

It doesn't matter what f is; bar just applies it twice:

    (define (baz x) (* x x))
    (bar baz 2) → 16

Recursion is your friend. A simple factorial example:

    (define (factorial n)
      (if (= n 1)
          1
          (* n (factorial (- n 1)))))

    (factorial 6) → 720

Even iteration is written with recursive calls!

    (define (factorial-iter n)
      (define (aux n top product)
        (if (= n top)
            (* n product)
            (aux (+ n 1) top (* n product))))
      (aux 1 n 1))

    (factorial-iter 6) → 720

Lisp → MapReduce? What does this have to do with MapReduce? After all, Lisp is about processing lists. Two important concepts in functional programming: Map (do something to everything in a list) and Fold (combine the results of a list in some way).

Map. Map is a higher-order function. How map works: the function is applied to every element in a list, and the result is a new list. [Diagram: a function f applied independently to each element of the input list, producing a new list of the same length.]

Fold. Fold is also a higher-order function. How fold works: the accumulator is set to an initial value; the function is applied to a list element and the accumulator; the result is stored in the accumulator; this is repeated for every item in the list; the result is the final value in the accumulator. [Diagram: a function f folds each list element into the accumulator in turn, from the initial value to the final value.]

Map/Fold in Action. A simple map example:

    (map (lambda (x) (* x x)) '(1 2 3 4 5)) → '(1 4 9 16 25)

Fold examples:

    (fold + 0 '(1 2 3 4 5)) → 15
    (fold * 1 '(1 2 3 4 5)) → 120

Sum of squares:

    (define (sum-of-squares v)
      (fold + 0 (map (lambda (x) (* x x)) v)))

    (sum-of-squares '(1 2 3 4 5)) → 55
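For readers more comfortable outside Scheme, here is the same map/fold pipeline as a small Python sketch (an added illustration, not part of the original slides): Python's built-in map plays the role of Lisp's map, and functools.reduce plays the role of fold.

    from functools import reduce

    # map: apply a function to every element of a list
    squares = list(map(lambda x: x * x, [1, 2, 3, 4, 5]))            # [1, 4, 9, 16, 25]

    # fold: combine the elements of a list with an accumulator
    total   = reduce(lambda acc, x: acc + x, [1, 2, 3, 4, 5], 0)     # 15
    product = reduce(lambda acc, x: acc * x, [1, 2, 3, 4, 5], 1)     # 120

    # sum of squares, exactly as on the slide: fold (+) over the mapped list
    def sum_of_squares(v):
        return reduce(lambda acc, x: acc + x, map(lambda x: x * x, v), 0)

    print(sum_of_squares([1, 2, 3, 4, 5]))                           # 55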

Lisp → MapReduce. Let's assume a long list of records: imagine if... we could distribute the execution of map operations to multiple nodes, and we had a mechanism for bringing the map results back together in the fold operation. That's MapReduce! (and Hadoop). Implicit parallelism: we can parallelize the execution of map operations since they are isolated, and we can reorder folding if the fold function is commutative and associative.

Typical Problem. Iterate over a large number of records; Map: extract something of interest from each; shuffle and sort intermediate results; Reduce: aggregate intermediate results; generate final output. Key idea: provide an abstraction at the point of these two operations (see the sketch below).
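A minimal single-machine sketch of this pattern in Python (illustrative only; the names map_reduce, mapper, and reducer are invented for this example): map over the records, group the intermediate pairs by key (the "shuffle and sort" step), then reduce each group.

    from collections import defaultdict

    def map_reduce(records, mapper, reducer):
        # Map: extract (key, value) pairs of interest from each record
        intermediate = defaultdict(list)
        for record in records:
            for key, value in mapper(record):
                intermediate[key].append(value)
        # Shuffle and sort: group intermediate values by key
        # Reduce: aggregate each group into the final output
        return {key: reducer(key, values)
                for key, values in sorted(intermediate.items())}

    # e.g. summing values per key from (key, value) records; purely illustrative
    map_reduce([("a", 1), ("b", 2), ("a", 3)],
               mapper=lambda kv: [kv],
               reducer=lambda key, values: sum(values))
    # {'a': 4, 'b': 2}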

MapReduce. Programmers specify two functions: map (k, v) → <k', v'>* and reduce (k', v') → <k', v'>*; all v' with the same k' are reduced together. Usually, programmers also specify: partition (k', number of partitions) → partition for k'. This is often a simple hash of the key, e.g. hash(k') mod n, and it allows reduce operations for different keys to run in parallel.
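A short Python illustration of such a partitioner (a sketch of the hash-mod-n idea on this slide, not any particular framework's API):

    import zlib

    def partition(key, num_partitions):
        # Hash the key and take it modulo the number of reduce tasks,
        # so every occurrence of the same key lands on the same reducer.
        # A stable hash (CRC32 here) is used because Python's built-in
        # hash() is randomized between runs.
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    partition("apple", 4)    # always the same partition for "apple"
    partition("banana", 4)   # may differ from "apple", but is stable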

It's just divide and conquer! [Diagram: initial key-value pairs are read from the data store by map tasks; each map task emits intermediate values for keys k1, k2, and k3; a barrier aggregates values by key; reduce tasks then produce the final values for k1, k2, and k3.]

Recall these problems? How do we assign work units to workers? What if we have more work units than workers? What if workers need to share partial results? How do we aggregate partial results? How do we know all the workers have finished? What if workers die?

MapReduce Runtime. Handles data distribution: gets initial data to map workers, shuffles intermediate key-value pairs to reduce workers, and optimizes for locality whenever possible. Handles scheduling: assigns workers to map and reduce tasks. Handles faults: detects worker failures and restarts tasks. Everything happens on top of GFS (more on that later).

"Hello World": Word Count

    Map(String input_key, String input_value):
      // input_key: document name
      // input_value: document contents
      for each word w in input_value:
        EmitIntermediate(w, "1");

    Reduce(String key, Iterator intermediate_values):
      // key: a word, same for input and output
      // intermediate_values: a list of counts
      int result = 0;
      for each v in intermediate_values:
        result += ParseInt(v);
      Emit(AsString(result));
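The pseudocode above follows the style of the MapReduce paper. As a runnable illustration only (a sketch using a sort-based shuffle on one machine, not any real framework), the same word count can be written as ordinary Python functions:

    from itertools import groupby
    from operator import itemgetter

    def map_fn(doc_name, doc_contents):
        # Emit (word, 1) for every word in the document
        for word in doc_contents.split():
            yield word, 1

    def reduce_fn(word, counts):
        # Sum all counts emitted for this word
        return word, sum(counts)

    docs = {"doc1": "the quick brown fox", "doc2": "the lazy dog"}

    # Map phase
    pairs = [kv for name, text in docs.items() for kv in map_fn(name, text)]
    # Shuffle and sort: bring identical keys together
    pairs.sort(key=itemgetter(0))
    # Reduce phase
    result = [reduce_fn(word, (c for _, c in group))
              for word, group in groupby(pairs, key=itemgetter(0))]
    print(result)
    # [('brown', 1), ('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 2)]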

Behind the scenes…

Bandwidth Optimizations. Take advantage of locality: move the process to where the data is! Use "combiner" functions: executed on the same machine as the mapper, they perform a "mini-reduce" right after the map phase, which reduces the number of key-value pairs shipped across the network and saves bandwidth. When can you use combiners?
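A hedged Python sketch of what a combiner buys you for word count (here folded into the mapper itself for brevity): the mapper pre-aggregates its own output, so only one (word, partial count) pair per distinct word leaves the machine instead of one pair per occurrence. This is safe because the reduce function, summation, is commutative and associative, which is essentially the answer to the question on this slide.

    from collections import Counter

    def map_with_combiner(doc_contents):
        # Plain mapper output: one (word, 1) pair per occurrence
        pairs = [(word, 1) for word in doc_contents.split()]
        # Combiner: a "mini-reduce" over the mapper's own output
        combined = Counter()
        for word, count in pairs:
            combined[word] += count
        # Far fewer pairs cross the network for repetitive text
        return list(combined.items())

    map_with_combiner("to be or not to be")
    # [('to', 2), ('be', 2), ('or', 1), ('not', 1)]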

Skew Problem. Issue: the reduce phase is only as fast as the slowest map task. Solution: redundantly execute map operations and use the results of the first copy to finish. This addresses hardware problems... but not issues related to the inherent distribution of the data.
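A toy Python sketch of the redundant-execution idea (often called backup or speculative tasks); the task body and shard name are invented purely for illustration: launch the same straggling map task twice and keep whichever copy finishes first.

    from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
    import random, time

    def map_task(shard):
        # Stand-in for a map task; one copy may land on a slow machine
        time.sleep(random.uniform(0.1, 2.0))
        return f"result for {shard}"

    with ThreadPoolExecutor(max_workers=2) as pool:
        # Speculatively run two copies of the same (straggling) task
        copies = [pool.submit(map_task, "shard-42") for _ in range(2)]
        done, not_done = wait(copies, return_when=FIRST_COMPLETED)
        result = next(iter(done)).result()   # use the first copy to finish
        for f in not_done:
            # cancel() is best effort: a copy already running simply has
            # its result ignored
            f.cancel()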

Data, Data, More Data. All of this depends on a storage system for managing all the data... That's where GFS (the Google File System) comes in, and by extension HDFS in Hadoop.

Assumptions. High component failure rates: inexpensive commodity components fail all the time. A "modest" number of HUGE files: just a few million (!), each 100 MB or larger, with multi-GB files typical. Files are write-once and mostly appended to, perhaps concurrently. Large streaming reads. High sustained throughput is favored over low latency.

GFS Design Decisions. Files stored as chunks of fixed size (64 MB). Reliability through replication: each chunk replicated across 3+ chunkservers. A single master coordinates access and keeps metadata: simple centralized management. No data caching: little benefit due to large data sets and streaming reads. A familiar interface, but a customized API: simplify the problem and focus on Google apps; add snapshot and record append operations.
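As described in the GFS paper, clients translate a (file name, byte offset) pair into a (file name, chunk index) pair before asking the master where that chunk lives. A minimal Python sketch of that arithmetic, with made-up helper names:

    CHUNK_SIZE = 64 * 1024 * 1024   # fixed 64 MB chunks

    def chunk_index(byte_offset):
        # Which chunk of the file holds this byte?
        return byte_offset // CHUNK_SIZE

    def chunk_range(chunk_idx):
        # Byte range covered by a given chunk
        start = chunk_idx * CHUNK_SIZE
        return start, start + CHUNK_SIZE - 1

    chunk_index(200 * 1024 * 1024)   # byte offset 200 MB falls in chunk 3
    # (the client then asks the master for chunk 3's replica locations)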

GFS Architecture. A single master and multiple chunkservers. Can anyone see a potential weakness in this design?

Single Master. From distributed systems we know this is a single point of failure and a scalability bottleneck. GFS solutions: shadow masters, and minimizing master involvement: never move data through the master, use it only for metadata (and cache metadata at clients); use a large chunk size; have the master delegate authority to primary replicas for data mutations (chunk leases). Simple, and good enough!

Metadata. Global metadata is stored on the master: file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas. All in memory (64 bytes per chunk): fast and easily accessible. The master keeps an operation log for persistent logging of critical metadata updates: persistent on local disk, replicated, with checkpoints for faster recovery.
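A back-of-the-envelope check, using only the numbers on this slide (so treat it as a rough illustration), of why keeping all metadata in memory is feasible:

    CHUNK_SIZE = 64 * 1024 * 1024        # 64 MB per chunk
    BYTES_PER_CHUNK_METADATA = 64        # ~64 bytes of metadata per chunk

    petabyte = 1024 ** 5
    chunks = petabyte // CHUNK_SIZE      # ~16.8 million chunks per PB of data
    metadata = chunks * BYTES_PER_CHUNK_METADATA
    print(metadata / 1024 ** 3)          # ~1.0 GB of metadata per PB stored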

Mutations. A mutation is a write or append, and must be done for all replicas. Goal: minimize master involvement. Lease mechanism: the master picks one replica as primary and gives it a "lease" for mutations; the primary defines a serial order of mutations; all replicas follow this order. Data flow is decoupled from control flow.

Atomic Record Append. The client specifies the data; GFS appends it to the file atomically at least once, and GFS picks the offset. This works for concurrent writers and is used heavily by Google apps, e.g. for files that serve as multiple-producer/single-consumer queues.

Relaxed Consistency Model. "Consistent" = all replicas have the same value. "Defined" = the replica reflects the mutation, and is consistent. Some properties: concurrent writes leave the region consistent, but possibly undefined; failed writes leave the region inconsistent. Some work has moved into the applications, e.g. self-validating, self-identifying records. Google apps can live with it. What about other apps?
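A hedged sketch of what "self-validating, self-identifying records" can look like in application code (the record format and function names are invented for illustration): each record carries a unique ID and a checksum, so a reader can skip corrupt regions and the duplicate appends that the relaxed consistency model permits.

    import json, uuid, zlib

    def make_record(payload):
        body = {"id": uuid.uuid4().hex, "payload": payload}   # self-identifying
        data = json.dumps(body, sort_keys=True).encode()
        return {"checksum": zlib.crc32(data), "body": body}   # self-validating

    def read_records(records):
        seen_ids = set()
        for rec in records:
            data = json.dumps(rec["body"], sort_keys=True).encode()
            if zlib.crc32(data) != rec["checksum"]:
                continue                      # skip inconsistent or corrupt regions
            if rec["body"]["id"] in seen_ids:
                continue                      # skip duplicates from retried appends
            seen_ids.add(rec["body"]["id"])
            yield rec["body"]["payload"]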

Master's Responsibilities (1/2). Metadata storage. Namespace management and locking. Periodic communication with chunkservers: give instructions, collect state, track cluster health. Chunk creation, re-replication, and rebalancing: balance space utilization and access speed; spread replicas across racks to reduce correlated failures; re-replicate data if redundancy falls below a threshold; rebalance data to smooth out storage and request load.

Master's Responsibilities (2/2). Garbage collection: simpler and more reliable than traditional file delete; the master logs the deletion, renames the file to a hidden name, and lazily garbage collects hidden files. Stale replica deletion: detect "stale" replicas using chunk version numbers.

Fault Tolerance. High availability: fast recovery (master and chunkservers are restartable in a few seconds), chunk replication (default 3 replicas), and shadow masters. Data integrity: checksum every 64 KB block in each chunk.
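A hedged Python sketch of block-level checksumming as described on this slide (one checksum per 64 KB block of each chunk); CRC32 is used here only as a stand-in, since the slide does not name the actual checksum algorithm.

    import zlib

    BLOCK_SIZE = 64 * 1024   # checksum granularity: 64 KB blocks within a chunk

    def block_checksums(chunk_bytes):
        # One 32-bit checksum per 64 KB block of the chunk
        return [zlib.crc32(chunk_bytes[i:i + BLOCK_SIZE])
                for i in range(0, len(chunk_bytes), BLOCK_SIZE)]

    def verify_block(chunk_bytes, block_index, stored_checksums):
        start = block_index * BLOCK_SIZE
        block = chunk_bytes[start:start + BLOCK_SIZE]
        # On a mismatch, the chunkserver would report the chunk as corrupt
        # so it can be re-replicated from a good replica.
        return zlib.crc32(block) == stored_checksums[block_index]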

Parallelization Problems. How do we assign work units to workers? What if we have more work units than workers? What if workers need to share partial results? How do we aggregate partial results? How do we know all the workers have finished? What if workers die?

Managing Dependencies Remember: Mappers run in isolation You have no idea in what order the mappers run You have no idea on what node the mappers run You have no idea when each mapper finishes Question: what if your computation is a non-commutative operation on mapper results? Answer: Cleverly “hide” dependencies in the reduce stage The reducer can hold state across multiple map operations Careful choice of partition function Careful choice of sorting function Example: computing conditional probabilities
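A hedged Python sketch of the conditional-probability example (the "*" marginal key and the toy data are conventions invented for this illustration, not taken from the slides): to compute P(b | a) = count(a, b) / count(a), emit a special (a, "*") key for the marginal count, partition only on the left element so everything for "a" reaches the same reducer, and sort so (a, "*") arrives before the individual (a, b) keys. The reducer then holds the marginal as state while it streams through the pairs.

    import zlib
    from itertools import groupby

    pairs = [("a", "b"), ("a", "b"), ("a", "c"), ("x", "y")]   # toy co-occurrence data

    # Map: emit one count for the pair and one for the marginal (a, "*")
    emitted = []
    for a, b in pairs:
        emitted.append(((a, b), 1))
        emitted.append(((a, "*"), 1))

    # Partition on the LEFT element only, so all keys for "a" go to one reducer
    # (shown for completeness; unused in this single-machine toy run)
    def partition(key, n):
        return zlib.crc32(key[0].encode()) % n

    # Sort so (a, "*") sorts before every (a, b): "*" is lexicographically small
    emitted.sort(key=lambda kv: kv[0])

    # Reduce: remember the marginal, then divide each pair count by it
    for (a, right), group in groupby(emitted, key=lambda kv: kv[0]):
        total = sum(c for _, c in group)
        if right == "*":
            marginal = total                    # state held across keys
        else:
            print(f"P({right} | {a}) = {total / marginal}")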

Other things to beware of… Object creation overhead Reading in external resources is tricky Possibility of creating hotspots in underlying file system