CPS110: Introduction to Google. Landon Cox, April 20, 2009.

A brief history of Google. BackRub: disk drives totaling 24 GB of storage.

A brief history of Google. Google: disk drives totaling 366 GB of storage.

A brief history of Google. Google: ?,000 machines, ? PB total storage.

A brief history of Google. At least 45 containers per data center: 45 containers x 1,000 servers x 36 sites = ~1.6 million servers (lower bound).

Google design principles  Workload: easy to parallelize  Want to take advantage of many processors, disks  Why not buy a bunch of supercomputers?  Leverage parallelism of lots of (slower) cheap machines  Supercomputer price/performance ratio is poor  What is the downside of cheap hardware?  Lots of hardware failures 1. Use lots of cheap, commodity hardware 2. Provide reliability in software

What happens on a query? DNS

What happens on a query? Spell Checker, Ad Server, Document Servers (TB), Index Servers (TB)

Google hardware model  Google machines are cheap and likely to fail  What must they do to keep things up and running?  Store data in several places (replication)  When one machine fails, shift load onto ones still around  Does replication get you anything else?  Enables more parallel reads  Better performance  Good since vast majority of Google traffic is reads

Fault tolerance and performance  Google machines are cheap and likely to fail  Does it matter how fast an individual machine is?  Somewhat, but not that much  Parallelism enabled by replication has a bigger impact  Any downside to having a ton of machines?  Space  Power consumption  Cooling costs

Fault tolerance and performance  Google machines are cheap and likely to fail  Any workloads where this wouldn’t work?  Lots of writes to the same data  Web examples? (web is mostly read)

Google power consumption  A circa 2003 mid-range server  Draws 90 W of DC power under load  55 W for two CPUs  10 W for disk drive  25 W for DRAM and motherboard  Assume 75% efficient ATX power supply  120 W of AC power per server  10 kW per rack

Google power consumption  A server rack fits comfortably in 25 ft²  Power density of 400 W/ft²  Higher-end server density = 700 W/ft²  Typical data centers provide far less per ft²  Google needs to bring down the power density  Requires extra cooling or space  Lower power servers?  Slower, but must not harm performance  Depreciate faster, but must not affect price/performance
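
Tracing the arithmetic across these two power slides (a back-of-the-envelope check; the servers-per-rack count is not stated on the slides, and ~83 is simply what the quoted figures imply):

$$
\frac{90\ \text{W DC}}{0.75} = 120\ \text{W AC per server},\qquad
\frac{10{,}000\ \text{W per rack}}{120\ \text{W per server}} \approx 83\ \text{servers per rack},\qquad
\frac{10\ \text{kW}}{25\ \text{ft}^2} = 400\ \text{W/ft}^2
$$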

Course administration  Project 3  Due on Wednesday  Several groups are done  Difficulty? Too easy? Too hard?  What if I added a DNS poisoning part?  Next week  Review session on Monday  Exam on Wednesday

OS Complexity  Lines of code  XP: 40 million  Linux 2.6: 6 million  (mostly driver code)  Sources of complexity  Multiple instruction streams (processes)  Multiple interrupt sources (I/O, timers, faults)  How can we keep everything straight?

Complexity in Google  Consider the Google hardware model  Thousands of cheap, commodity machines  Why is this a hard programming environment?  Speed through parallelism (concurrency)  Constant node failure (fault tolerance)

Complexity in Google Google provides abstractions to make programming easier.

Abstractions in Google  Google File System  Provides data-sharing and durability  MapReduce  Makes parallel programming easier  BigTable  Manages large structured (non-relational) data sets  Chubby  Distributed locking service

Problem: lots of data  Example:  20+ billion web pages x 20KB = 400+ terabytes  One computer can read a few tens of MB/sec from disk  ~four months to read the web  ~1,000 hard drives just to store the web  Even more to do something with the data

Solution: spread the load  Good news  Same problem with 1000 machines, < 3 hours  Bad news: programming work  Communication and coordination  Recovering from machine failures  Status reporting  Debugging and optimizing  Placement  Bad news II: repeat for every problem
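
Connecting the figures on the last two slides with a quick back-of-the-envelope check (the per-disk read rate is derived from the slides' own numbers rather than stated on them):

$$
\frac{400\ \text{TB}}{\sim 4\ \text{months}} \approx \frac{4\times 10^{14}\ \text{B}}{10^{7}\ \text{s}} = 40\ \text{MB/s per disk},
\qquad
\frac{4\times 10^{14}\ \text{B}}{1000 \times 40\ \text{MB/s}} = 10^{4}\ \text{s} \approx 3\ \text{hours}
$$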

Machine hardware reality  Multiple cores  2-6 locally-attached disks  2TB to ~12 TB of disk  Typical machine runs  GFS chunkserver  Scheduler daemon for user tasks  One or many tasks

Machine hardware reality  Single-thread performance doesn’t matter  Total throughput/$ more important than peak perf.  Stuff breaks  One server may stay up for three years (1,000 days)  If you have 10,000 servers, expect to lose 10/day  If you have 1,000,000 servers, expect to lose 1,000/day  “Ultra-reliable” hardware doesn’t really help  Scale trumps minor individual improvements  Still have to deal with fault-tolerance in software

Google hardware reality

MapReduce  Widely applicable, simple programming model  Hides lots of messy details  Automatic parallelization  Load balancing  Network/disk transfer optimization  Handling of machine failures  Robustness  Sounds like a pretty good abstraction!

Typical MapReduce problem 1. Read a lot of data (TBs) 2. Map  Extract something you care about from each record 3. Shuffle and sort Map output 4. Reduce  Aggregate, summarize, filter or transform sorted output 5. Write out the results. The outline remains the same; only the map and reduce functions change.

More specifically  Programmer specifies two main methods  Map(k, v) → (k', v')*  Reduce(k', (v')*) → (v'')*  All v' with the same k' are reduced together, in order  Usually also specify  Partition(k', total partitions) → partition for k'  Often a simple hash of the key  Allows Reduce to be parallelized
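
To make the partition step concrete, here is a minimal sketch of a default partitioner of the kind described above (a simple hash of the key modulo the number of reduce tasks). The function name and signature are illustrative only, not Google's actual API.

#include <functional>
#include <iostream>
#include <string>

// Hash-partition an intermediate key across total_partitions (R) reduce
// tasks. Every pair with the same key lands in the same partition, so a
// single reduce task sees all values for the keys it owns.
int Partition(const std::string& key, int total_partitions) {
  std::size_t h = std::hash<std::string>{}(key);
  return static_cast<int>(h % static_cast<std::size_t>(total_partitions));
}

int main() {
  // With R = 4,000 reduce tasks (a typical figure from a later slide),
  // every ("to", ...) pair is routed to the same partition.
  std::cout << Partition("to", 4000) << "\n";
  return 0;
}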

Example  Word frequencies in web pages  Input = files with one document per record  Map takes (key = document URL, value = document contents) and emits (key' = word, value' = count) pairs  E.g., Map(key = "document1", value = "to be or not to be") emits ("to", "1"), ("be", "1"), ("or", "1"), ("not", "1"), ("to", "1"), ("be", "1")

Example continued  The MapReduce library gathers all pairs with the same key (shuffle and sort)  Reduce combines the values for each key  Continuing the example, ("to", "1"), ("be", "1"), ("or", "1"), ("not", "1"), ("to", "1"), ("be", "1") reduce to ("to", "2"), ("be", "2"), ("or", "1"), ("not", "1")

Example pseudo-code

  Map(String input_key, String input_value):
    // input_key: document name
    // input_value: document contents
    for each word w in input_value:
      EmitIntermediate(w, "1");

  Reduce(String key, Iterator intermediate_values):
    // key: a word, same for input and output
    // intermediate_values: a list of counts
    int result = 0;
    for each v in intermediate_values:
      result += ParseInt(v);
    Emit(AsString(result));
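
Below is a self-contained, single-process C++ version of this word-count pseudo-code. It is not Google's MapReduce library: an in-memory std::map stands in for the distributed shuffle-and-sort, and Emit/EmitIntermediate are simulated directly.

#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Intermediate store: plays the role of the shuffle-and-sort phase.
// std::map keeps keys sorted and groups all values for a key together.
static std::map<std::string, std::vector<std::string>> intermediate;

// Map: emit (word, "1") for every word in the document.
void Map(const std::string& input_key, const std::string& input_value) {
  (void)input_key;  // the document name is unused in this example
  std::istringstream words(input_value);
  std::string w;
  while (words >> w) {
    intermediate[w].push_back("1");  // EmitIntermediate(w, "1")
  }
}

// Reduce: sum the counts collected for one word.
void Reduce(const std::string& key,
            const std::vector<std::string>& intermediate_values) {
  int result = 0;
  for (const std::string& v : intermediate_values) {
    result += std::stoi(v);  // ParseInt(v)
  }
  std::cout << key << "\t" << result << "\n";  // Emit(AsString(result))
}

int main() {
  Map("document1", "to be or not to be");
  for (const auto& [key, values] : intermediate) {
    Reduce(key, values);
  }
  return 0;
}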

Widely applicable at Google  Implemented as a C++ library  Linked to user programs  Can read and write many data types  Example uses: distributed grep, distributed sort, term-vector per host, document clustering, machine learning, web access log stats, web link-graph reversal, inverted index construction, statistical machine translation

Example: query freq. over time

Example: language model stats  Used in statistical machine translation  Need to count # of times every 5-word sequence occurs  Keep all those where count >= 4  Easy with MapReduce:  Map: extract (5-word sequence, count) pairs from each document  Reduce: combine counts, write out count if large enough
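
A minimal single-process sketch of that pattern (sliding 5-word windows in Map, sum-and-threshold in Reduce). As before, the in-memory map stands in for the shuffle, and the input text and threshold of 4 are only illustrative.

#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Intermediate store standing in for the shuffle phase.
static std::map<std::string, std::vector<int>> intermediate;

// Map: emit (5-word sequence, 1) for every sliding window in a document.
void Map(const std::string& doc) {
  std::istringstream in(doc);
  std::vector<std::string> words;
  std::string w;
  while (in >> w) words.push_back(w);
  for (std::size_t i = 0; i + 5 <= words.size(); ++i) {
    std::string window = words[i];
    for (std::size_t j = 1; j < 5; ++j) window += " " + words[i + j];
    intermediate[window].push_back(1);
  }
}

// Reduce: combine counts; only keep sequences seen at least 4 times.
void Reduce(const std::string& sequence, const std::vector<int>& counts) {
  int total = 0;
  for (int c : counts) total += c;
  if (total >= 4) std::cout << sequence << "\t" << total << "\n";
}

int main() {
  // Illustrative input: the same six words repeated four times.
  std::string doc;
  for (int i = 0; i < 4; ++i) doc += "to be or not to be ";
  Map(doc);
  for (const auto& [sequence, counts] : intermediate) Reduce(sequence, counts);
  return 0;
}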

Example: joining with other data  Generate per-doc summary  Include per-host info  E.g. # of pages on host, important terms on host  Easy with MapReduce:  Map  Extract hostname from URL  Lookup per-host info  Combine with per-doc data and emit  Reduce  Identity function (just emit key/value directly)
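
A small sketch of the map-side join described above, with a hypothetical in-memory per-host table standing in for the real per-host data source. Since Reduce is the identity function, only Map is shown; the names and URL parsing are simplified for illustration.

#include <iostream>
#include <string>
#include <unordered_map>

// Hypothetical per-host side table (e.g., number of pages on each host);
// in a real pipeline this would come from another data source.
static const std::unordered_map<std::string, int> pages_on_host = {
    {"example.com", 120}, {"cs.duke.edu", 45}};

// Very simplified hostname extraction for URLs of the form scheme://host/path.
std::string HostOf(const std::string& url) {
  std::string rest = url.substr(url.find("://") + 3);
  return rest.substr(0, rest.find('/'));
}

// Map: look up the document's host, combine the per-host info with the
// per-document summary, and emit the joined record. Reduce would be the
// identity function, so it is omitted here.
void Map(const std::string& url, const std::string& doc_summary) {
  auto it = pages_on_host.find(HostOf(url));
  int host_pages = (it == pages_on_host.end()) ? 0 : it->second;
  std::cout << url << "\t" << doc_summary
            << "\thost_pages=" << host_pages << "\n";
}

int main() {
  Map("http://example.com/index.html", "per-doc-summary");
  return 0;
}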

MapReduce programs at Google

New MapReduce Programs

MapReduce architecture  How is this implemented?  One master, many workers  Input data split into M map tasks (64MB each)  Reduce phase partitioned into R reduce tasks  Tasks are assigned to workers dynamically  Often: M=200,000; R=4,000; workers=2,000

MapReduce architecture 1. Master assigns each map task to a free worker  Considers locality of data to worker  Worker reads task input (often from local disk)  Worker produces R local files with k/v pairs 2. Master assigns each reduce task to a free worker  Worker reads intermediate k/v pairs from map workers  Worker sorts & applies user’s Reduce op to get output
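
The "R local files" detail in step 1 can be sketched as follows: a map worker buckets each intermediate pair by a hash of its key, so reduce task r only has to fetch bucket r from every map worker. The file-naming scheme here is made up for illustration.

#include <fstream>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Write one map task's intermediate pairs into R local files, one per
// reduce partition, so reduce worker r only needs to fetch file r from
// each map worker. File names are purely illustrative.
void WriteIntermediate(
    int map_task_id, int R,
    const std::vector<std::pair<std::string, std::string>>& pairs) {
  std::vector<std::ofstream> buckets;
  for (int r = 0; r < R; ++r) {
    buckets.emplace_back("map-" + std::to_string(map_task_id) +
                         "-part-" + std::to_string(r) + ".txt");
  }
  for (const auto& [key, value] : pairs) {
    std::size_t r =
        std::hash<std::string>{}(key) % static_cast<std::size_t>(R);
    buckets[r] << key << "\t" << value << "\n";
  }
}

int main() {
  WriteIntermediate(0, 4, {{"to", "1"}, {"be", "1"}, {"to", "1"}});
  return 0;
}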

MapReduce fault tolerance  Handled via re-execution  On worker failure:  Detect failure via periodic heartbeats  Re-execute completed and in-progress map tasks  Re-execute in progress reduce tasks  Task completion committed through master  What about Master failure?  Log state transformations to Google File System  New master uses log to recover and continue
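
A rough sketch of the master-side bookkeeping implied by the bullets above: workers send periodic heartbeats, and when one goes silent for too long, the map tasks it ran (completed or in progress) go back on the queue, because their intermediate output lived on that worker's local disk. This is illustrative only, not the actual master implementation.

#include <chrono>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Illustrative master-side state: last heartbeat per worker, which map
// tasks ran on each worker, and the list of tasks to re-execute.
struct Master {
  std::unordered_map<std::string, Clock::time_point> last_heartbeat;
  std::unordered_map<std::string, std::vector<int>> map_tasks_on_worker;
  std::vector<int> tasks_to_rerun;

  void Heartbeat(const std::string& worker) {
    last_heartbeat[worker] = Clock::now();
  }

  // Any worker that has missed heartbeats past the timeout is treated as
  // failed, and all of its map tasks are re-queued for re-execution.
  void CheckWorkers(Clock::duration timeout) {
    const auto now = Clock::now();
    for (const auto& [worker, last_seen] : last_heartbeat) {
      if (now - last_seen > timeout) {
        for (int task : map_tasks_on_worker[worker]) {
          tasks_to_rerun.push_back(task);  // re-execute on another worker
        }
      }
    }
  }
};

int main() {
  Master m;
  m.map_tasks_on_worker["workerA"] = {1, 2, 3};
  m.Heartbeat("workerA");
  // ... no further heartbeats arrive; a zero timeout forces detection here.
  m.CheckWorkers(Clock::duration::zero());
  std::cout << m.tasks_to_rerun.size() << " map tasks re-queued\n";
  return 0;
}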

MapReduce fault tolerance  How likely is the master to fail?  Not likely  An individual machine can run for about three years, so P(node failure) on any given day is small  How likely is it that at least one worker will fail?  Very likely  For N workers:  1 – P(no nodes fail)  = 1 – (P(worker1 doesn’t fail) * … * P(workerN doesn’t fail))  = 1 – ((1 – P(worker1 failure)) * … * (1 – P(workerN failure)))  = 1 – (1 – P(node failure))^N  Failure is exponentially more likely as N grows!
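
A quick numeric check of the formula above, plugging in the 1,000-day machine lifetime from the earlier hardware slide (so P(node failure) ≈ 1/1000 per day):

#include <cmath>
#include <cstdio>
#include <initializer_list>

int main() {
  // P(node failure) per day: one failure over a ~1,000-day lifetime.
  const double p = 1.0 / 1000.0;
  for (int N : {1, 1000, 10000}) {
    double p_any = 1.0 - std::pow(1.0 - p, N);  // 1 - (1 - p)^N
    std::printf("N = %5d workers: P(at least one failure today) = %.5f\n",
                N, p_any);
  }
  return 0;
}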

MapReduce performance Sort 10^10 100-byte records (~1 TB) in ~10.5 minutes. 50 lines of C++ code running on 1800 machines.

Goals for the class 1. Demystify the operating system  How does my computer start running?  How does a program load into memory? 2. Demystify Internet systems  How does my browser know where to go?  How does Google work?

Two roles of the OS?

Role 1: “illusionist”

Abstractions, hardware reality: a layered diagram with applications on top, the OS in the middle, and hardware at the bottom (layers labeled with related courses and projects: CPS 100, CPS 104, CPS 108, P1, P2, this lecture)

Role 2: “government”

Main government functions  Resource allocation (who gets what and when)  Lock acquisition  Processes  Disk requests  Page eviction  Isolation and security (law and order)  Access control  Kernel bit  Authentication

Thanks, it’s been fun! See you next week