Concurrency for data-intensive applications

Slides:

Advertisements

Similar presentations

Lecture 12: MapReduce: Simplified Data Processing on Large Clusters Xiaowei Yang (Duke University)

Advertisements

Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.

Parallel Computing MapReduce Examples Parallel Efficiency Assignment

CS 345 Computer System Overview

Weekly Report Ph.D. Student: Leo Lee date: Oct. 9, 2009.

Distributed Computations

MapReduce: Simplified Data Processing on Large Clusters Cloud Computing Seminar SEECS, NUST By Dr. Zahid Anwar.

1: Operating Systems Overview

MapReduce Dean and Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, Vol. 51, No. 1, January Shahram.

MapReduce Simplified Data Processing on Large Clusters Google, Inc. Presented by Prasad Raghavendra.

Distributed Computations MapReduce

L22: SC Report, Map Reduce November 23, Map Reduce What is MapReduce? Example computing environment How it works Fault Tolerance Debugging Performance.

Lecture 2 – MapReduce CPE 458 – Parallel Programming, Spring 2009 Except as otherwise noted, the content of this presentation is licensed under the Creative.

“Evaluating MapReduce for Multi-core and Multiprocessor Systems” Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, Christos Kozyrakis Computer.

MapReduce ： Simplified Data Processing on Large Clusters Hongwei Wang & Sihuizi Jin & Yajing Zhang

Advanced Topics: MapReduce ECE 454 Computer Systems Programming Topics: Reductions Implemented in Distributed Frameworks Distributed Key-Value Stores Hadoop.

SIDDHARTH MEHTA PURSUING MASTERS IN COMPUTER SCIENCE (FALL 2008) INTERESTS: SYSTEMS, WEB.

Hadoop Ida Mele. Parallel programming Parallel programming is used to improve performance and efficiency In a parallel program, the processing is broken.

Introduction to Parallel Programming MapReduce Except where otherwise noted all portions of this work are Copyright (c) 2007 Google and are licensed under.

By: Jeffrey Dean & Sanjay Ghemawat Presented by: Warunika Ranaweera Supervised by: Dr. Nalin Ranasinghe.

MapReduce. Web data sets can be very large – Tens to hundreds of terabytes Cannot mine on a single server Standard architecture emerging: – Cluster of.

Google MapReduce Simplified Data Processing on Large Clusters Jeff Dean, Sanjay Ghemawat Google, Inc. Presented by Conroy Whitney 4 th year CS – Web Development.

Computer System Architectures Computer System Software

CuMAPz: A Tool to Analyze Memory Access Patterns in CUDA

Süleyman Fatih GİRİŞ CONTENT 1. Introduction 2. Programming Model 2.1 Example 2.2 More Examples 3. Implementation 3.1 ExecutionOverview 3.2.

Map Reduce and Hadoop S. Sudarshan, IIT Bombay

LOGO OPERATING SYSTEM Dalia AL-Dabbagh

Process Management. Processes Process Concept Process Scheduling Operations on Processes Interprocess Communication Examples of IPC Systems Communication.

Operating System Review September 10, 2012Introduction to Computer Security ©2004 Matt Bishop Slide #1-1.

Take a Close Look at MapReduce Xuanhua Shi. Acknowledgement  Most of the slides are from Dr. Bing Chen,

Map Reduce for data-intensive computing (Some of the content is adapted from the original authors’ talk at OSDI 04)

Parallel Programming Models Basic question: what is the “right” way to write parallel programs –And deal with the complexity of finding parallelism, coarsening.

Operating System 4 THREADS, SMP AND MICROKERNELS

MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean and Sanjay Ghemawat.

1 The Map-Reduce Framework Compiled by Mark Silberstein, using slides from Dan Weld’s class at U. Washington, Yaniv Carmeli and some other.

MapReduce – An overview Medha Atre (May 7, 2008) Dept of Computer Science Rensselaer Polytechnic Institute.

MapReduce: Hadoop Implementation. Outline MapReduce overview Applications of MapReduce Hadoop overview.

Map Reduce: Simplified Processing on Large Clusters Jeffrey Dean and Sanjay Ghemawat Google, Inc. OSDI ’04: 6 th Symposium on Operating Systems Design.

MAP REDUCE : SIMPLIFIED DATA PROCESSING ON LARGE CLUSTERS Presented by: Simarpreet Gill.

MapReduce How to painlessly process terabytes of data.

MapReduce M/R slides adapted from those of Jeff Dean’s.

Evaluating FERMI features for Data Mining Applications Masters Thesis Presentation Sinduja Muralidharan Advised by: Dr. Gagan Agrawal.

CSC 660: Advanced Operating SystemsSlide #1 CSC 660: Advanced OS Concurrent Programming.

MapReduce Kristof Bamps Wouter Deroey. Outline Problem overview MapReduce o overview o implementation o refinements o conclusion.

MapReduce on FutureGrid Andrew Younge Jerome Mitchell.

Department of Computer Science MapReduce for the Cell B. E. Architecture Marc de Kruijf University of Wisconsin−Madison Advised by Professor Sankaralingam.

By Jeff Dean & Sanjay Ghemawat Google Inc. OSDI 2004 Presented by : Mohit Deopujari.

Chapter 5 Ranking with Indexes 1. 2 More Indexing Techniques n Indexing techniques:  Inverted files - best choice for most applications  Suffix trees.

MapReduce ： Simplified Data Processing on Large Clusters P 謝光昱 P 陳志豪 Operating Systems Design and Implementation 2004 Jeffrey Dean, Sanjay.

Full and Para Virtualization

C-Store: MapReduce Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY May. 22, 2009.

MapReduce: Simplified Data Processing on Large Clusters By Dinesh Dharme.

MapReduce: Simplied Data Processing on Large Clusters Written By: Jeffrey Dean and Sanjay Ghemawat Presented By: Manoher Shatha & Naveen Kumar Ratkal.

COMP7330/7336 Advanced Parallel and Distributed Computing MapReduce - Introduction Dr. Xiao Qin Auburn University

1.3 Operating system services An operating system provide services to programs and to the users of the program. It provides an environment for the execution.

MapReduce: Simplified Data Processing on Large Clusters Jeff Dean, Sanjay Ghemawat Google, Inc.

Lecture 3 – MapReduce: Implementation CSE 490h – Introduction to Distributed Computing, Spring 2009 Except as otherwise noted, the content of this presentation.

TensorFlow– A system for large-scale machine learning

Accelerating MapReduce on a Coupled CPU-GPU Architecture

MapReduce Simplied Data Processing on Large Clusters

COS 418: Distributed Systems Lecture 1 Mike Freedman

February 26th – Map/Reduce

Cse 344 May 4th – Map/Reduce.

Optimizing MapReduce for GPUs with Effective Shared Memory Usage

Distributed System Gang Wu Spring，2018.

5/7/2019 Map Reduce Map reduce.

Operating System Overview

COS 518: Distributed Systems Lecture 11 Mike Freedman

MapReduce: Simplified Data Processing on Large Clusters

Presentation transcript:

Concurrency for data-intensive applications MapReduce Concurrency for data-intensive applications Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems MapReduce Jeff Dean Sanjay Ghemawat Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems Motivation Application characteristics Large/massive amounts of data Simple application processing requirements Desired portability across variety of execution platforms Execution platforms Cluster CMP/SMP GPGPU Architecture SPMD MIMD SIMD Granularity Process Thread x 10 Thread x 100 Partition File Buffer Sub-array Bandwidth Scare GB/sec GB/sec x 10 Failures Common Uncommon Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems Motivation Programming model Purpose Focus developer time/effort on salient (unique, distinguished) application requirements Allow common but complex application requirements (e.g., distribution, load balancing, scheduling, failures) to be met by support environment Enhance portability via specialized run-time support for different architectures Pragmatics Model correlated with characteristics of application domain Allows simpler model semantics and more efficient support environment May not express well applications in other domains Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems MapReduce model Basic operations Map: produce a list of (key, value) pairs from the input structured as a (key value) pair of a different type (k1,v1)  list (k2, v2) Reduce: produce a list of values from an input that consists of a key and a list of values associated with that key (k2, list(v2))  list(v2) Note: inspired by map/reduce functions in Lisp and other functional programming languages. Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems Example map(String key, String value) : // key: document name // value: document contents for each word w in value: EmitIntermediate(w, “1”); reduce(String key, Iterator values) : // key: a word // values: a list of counts int result = 0; for each v in values: result += ParseInt(v); Emit(AsString(result)); Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems Example: map phase inputs tasks (M=3) partitions (intermediate files) (R=2) When in the course of human events it … (when,1), (course,1) (human,1) (events,1) (best,1) … map (in,1) (the,1) (of,1) (it,1) (it,1) (was,1) (the,1) (of,1) … It was the best of times and the worst of times… Over the past five years, the authors and many… map (over,1), (past,1) (five,1) (years,1) (authors,1) (many,1) … (the,1), (the,1) (and,1) … This paper evaluates the suitability of the … map (this,1) (paper,1) (evaluates,1) (suitability,1) … (the,1) (of,1) (the,1) … Note: partition function places small words in one partition and large words in another. Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems Example: reduce phase partition (intermediate files) (R=2) reduce task (in,1) (the,1) (of,1) (it,1) (it,1) (was,1) (the,1) (of,1) … sort run-time function (the,1), (the,1) (and,1) … (the,1) (of,1) (the,1) … (and, (1)) (in,(1)) (it, (1,1)) (the, (1,1,1,1,1,1)) (of, (1,1,1)) (was,(1)) reduce user’s function (and,1) (in,1) (it, 2) (of, 3) (the,6) (was,1) Note: only one of the two reduce tasks shown Dennis Kafura – CS5204 – Operating Systems

Execution Environment Dennis Kafura – CS5204 – Operating Systems

Execution Environment No reduce can begin until map is complete Tasks scheduled based on location of data If map worker fails any time before reduce finishes, task must be completely rerun Master must communicate locations of intermediate files Note: figure and text from presentation by Jeff Dean. Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems Backup Tasks A slow running task (straggler) prolong overall execution Stragglers often caused by circumstances local to the worker on which the straggler task is running Overload on worker machined due to scheduler Frequent recoverable disk errors Solution Abort stragglers when map/reduce computation is near end (progress monitored by Master) For each aborted straggler, schedule backup (replacement) task on another worker Can significantly improve overall completion time Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems Backup Tasks (2) with backup tasks (normal) (1) without backup tasks Dennis Kafura – CS5204 – Operating Systems

Strategies for Backup Tasks (1) Create replica of backup task when necessary Note: figure from presentation by Jerry Zhao and Jelena Pjesivac-Grbovic Dennis Kafura – CS5204 – Operating Systems

Strategies for Backup Tasks (2) Leverage work completed by straggler - avoid resorting Note: figure from presentation by Jerry Zhao and Jelena Pjesivac-Grbovic Dennis Kafura – CS5204 – Operating Systems

Strategies for Backup Tasks (3) Increase degree of parallelism – subdivide partitions Note: figure from presentation by Jerry Zhao and Jelena Pjesivac-Grbovic Dennis Kafura – CS5204 – Operating Systems

Positioning MapReduce Note: figure from presentation by Jerry Zhao and Jelena Pjesivac-Grbovic Dennis Kafura – CS5204 – Operating Systems

Positioning MapReduce Note: figure from presentation by Jerry Zhao and Jelena Pjesivac-Grbovic Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems MapReduce on SMP/CMP SMP memory L2 cache L1 cache . . . CMP memory L2 cache L1 . . . Dennis Kafura – CS5204 – Operating Systems

Phoenix runtime structure Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems Code size Comparison with respect to sequential code size Observations Concurrency add significantly to code size ( ~ 40%) MapReduce is code efficient in compatible applications Overall, little difference in code size of MR vs Pthreads Pthreads version lacks fault tolerance, load balancing, etc. Development time and correctness not known Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems Speedup measures Significant speedup is possible on either architecture Clear differences based on application characteristics Effects of application characteristics more pronounced than architectural differences Superlinear speedup due to Increased cache capacity with more cores Distribution of heaps lowers heap operation costs More core and cache capacity for final merge/sort step Dennis Kafura – CS5204 – Operating Systems

Execution time distribution Execution time dominated by Map task Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems MapReduce vs Pthreads MapReduce compares favorably with Pthreads on applications where the MapReduce programming model is appropriate MapReduce is not a general-purpose programming model Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems MapReduce on GPGPU General Purpose Graphics Processing Unit (GPGPU) Available as commodity hardware GPU vs. CPU 10x more processors in GPU GPU processors have lower clock speed Smaller caches on GPU Used previously for non-graphics computation in various application domains Architectural details are vendor-specific Programming interfaces emerging Question Can MapReduce be implemented efficiently on a GPGPU? Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems GPGPU Architecture Many Single-instruction, Multiple-data (SIMD) multiprocessors High bandwidth to device memory GPU threads: fast context switch, low creation time Scheduling Threads on each multiprocessor organized into thread groups Thread groups are dynamically scheduled on the multiprocessors GPU cannot perform I/O; requires support from CPU Application: kernel code (GPU) and host code (CPU) Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems System Issues Challenges Requires low synchronization overhead Fine-grain load balancing Core tasks of MapReduce are unconventional to GPGPU and must be implemented efficiently Memory management No dynamic memory allocation Write conflicts occur when two threads write to the same shared region Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems System Issues Optimizations Two-step memory access scheme to deal with memory management issue Steps Determine size of output for each thread Compute prefix sum of output sizes Results in fixed size allocation of correct size and allows each thread to write to pre-determined location without conflict Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems System Issues Optimizations (continued) Hashing (of keys) Minimizes more costly comparison of full key value Coalesced accesses Access by different threads to consecutive memory address are combined into one operation Keys/values for threads are arranged in adjacent memory locations to exploit coalescing Built in vector types Data may consist of multiple items of same type For certain types (char4, int4) entire vector can be read as a single operations Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems Mars Speedup Compared to Phoenix Optimizations Hashing (1.4-4.1X) Coalesced accesses (1.2-2.1X) Built-in vector types (1.1-2.1X) Dennis Kafura – CS5204 – Operating Systems

Execution time distribution Significant execution time in infrastructure operations IO Sort Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems Co-processing Co-processing (speed-up vs. GPU only) CPU – Phoenix GPU - Mars Dennis Kafura – CS5204 – Operating Systems

Dennis Kafura – CS5204 – Operating Systems Overall Conclusion MapReduce is an effective programming model for a class of data-intensive applications MapReduce is not appropriate for some applications MapReduce can be effectively implemented on a variety of platforms Cluster CMP/SMP GPGPU Dennis Kafura – CS5204 – Operating Systems