Concurrency in Go CS 240 – Fall 2018 Rec. 2.

Housekeeping Should have a working doMap() in Assignment 1

We Should Probably Teach You Map Reduce
The Hello World of Map Reduce: Word Count
If we have time: let's make a very basic Google Maps from raw data (a solution to the final project for CS 245 – Databases). You're welcome.

Abstract Map Reduce
map(key, value) -> list(<k', v'>)
Applies a function to each (key, value) pair; outputs a set of intermediate pairs.
reduce(key, list<value>) -> <k', v'>
Applies an aggregation function to the values of a key; outputs the result.

Word Count – The Hello World of Map Reduce
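A hedged sketch of word count in these MapReduce terms, written in Go. The `KeyValue` type and the function names are illustrative only, not Assignment 1's actual API:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// KeyValue is an illustrative intermediate pair.
type KeyValue struct {
	Key, Value string
}

// mapF emits one ("word", "1") pair per word in the contents.
func mapF(document, contents string) []KeyValue {
	var kvs []KeyValue
	for _, w := range strings.Fields(contents) {
		kvs = append(kvs, KeyValue{Key: w, Value: "1"})
	}
	return kvs
}

// reduceF sums the counts collected for one word.
func reduceF(key string, values []string) string {
	sum := 0
	for _, v := range values {
		n, _ := strconv.Atoi(v)
		sum += n
	}
	return strconv.Itoa(sum)
}

func main() {
	// Map phase: turn the input into intermediate pairs.
	kvs := mapF("doc1", "the quick fox jumps over the lazy dog")

	// Shuffle phase: group values by key.
	groups := map[string][]string{}
	for _, kv := range kvs {
		groups[kv.Key] = append(groups[kv.Key], kv.Value)
	}

	// Reduce phase: aggregate each group.
	fmt.Println("the ->", reduceF("the", groups["the"])) // the -> 2
}
```

The shuffle step in the middle is what the framework does between mappers and reducers: collect every value emitted for the same key.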

A Motivating Problem for Map Reduce
"Find me the closest Starbucks to KAUST. Actually, I'll give you a place and something to look for, and you find me the closest one. Here's a 1 TB text file … good luck"
GPS Coordinates | Site Name
[22.3, 39.1] | Tim Hortons (in KAUST)
[22.2, 39.1] | KAUST Library (in KAUST)
[35.7, 139.7] | Starbucks (in Tokyo, Japan)
... | ...
"It's ok, I didn't want to enjoy my weekend anyway"

A Motivating Problem for Map Reduce
[Diagram: the table of sites is laid over a grid of cells (0,0) through (3,4). Step 1: map each site to its grid cell. Step 2: reduce each cell's sites into a single file.]

Split the File and Map Each Chunk Independently (1/2)
[Diagram: the input table is split into chunks, and each chunk goes to its own mapper node. One mapper gets Tim Hortons, KAUST Library, and Starbucks; another gets Chanak Train Stop and Burger King.]

Split the File and Map Each Chunk Independently (2/2)
KEY <grid>: VALUE <location and name>
Mapper 1 output:
(1,2): [22.3, 39.1] Tim Hortons
(1,2): [22.2, 39.1] KAUST Library
(1,2): ...
(2,4): [35.7, 139.7] Starbucks
(2,4): ...
Mapper 2 output:
(1,3): [42.0, 69.0] Chanak Train Stop
(1,3): ...
(1,2): [22.2, 39.2] Burger King
(1,2): ...
Notice the duplicate grids (KEYS) across mappers.
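The mapper here is just computing a grid-cell key from coordinates. A minimal sketch in Go; the 20-degree cell size is an assumption for illustration, since the slides don't specify the actual grid:

```go
package main

import (
	"fmt"
	"math"
)

// gridCell maps a (lat, lon) coordinate to a grid-cell key by
// integer division. The cellSize (in degrees) is an assumed value.
func gridCell(lat, lon, cellSize float64) (int, int) {
	return int(math.Floor(lat / cellSize)), int(math.Floor(lon / cellSize))
}

func main() {
	// Sites in the same cell get the same key, so they all
	// end up at the same reducer.
	r, c := gridCell(22.3, 39.1, 20.0)
	fmt.Printf("(%d,%d)\n", r, c) // (1,1)
	r, c = gridCell(35.7, 139.7, 20.0)
	fmt.Printf("(%d,%d)\n", r, c) // (1,6)
}
```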

Collect the Mapper Results and Reduce to Single Files (1/2)
[Diagram: mapper outputs are routed by key. Reducer 1 handles key (1,2); Reducer 2 handles keys (1,3) and (2,4). The (1,2) pairs from both mappers all land on Reducer 1.]

Collect the Mapper Results and Reduce to Single Files (2/2)
KEY <grid>: [ VALUES <locations and names>, ... ]
Reducer 1 output:
(1,2): [ [22.3, 39.1] Tim Hortons, [22.2, 39.1] KAUST Library, [22.2, 39.2] Burger King, ... ]
Reducer 2 output:
(1,3): [ [42.0, 69.0] Chanak Train Stop, ... ]
(2,4): [ [35.7, 139.7] Starbucks, ... ]

How Hadoop Does it (1/2)
[Diagram: input splits feeding several mapper nodes.]

How Hadoop Does it (2/2)
[Diagram: mapper nodes shuffle their output to reducer nodes; every mapper can send pairs to every reducer.]

What is Concurrency? It’s like parallel that’s not in parallel

What is Parallelism?
[Diagram: timelines. Sequential: f(X) runs to completion (f(X) = A), then f(Y) runs (f(Y) = B). Parallel: f(X) and f(Y) run at the same time.]

Parallelism in Go Demo: parallel.go
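The demo file isn't reproduced here, but a minimal sketch of what running two computations in parallel with goroutines looks like (the file name aside, nothing below is the actual demo code):

```go
package main

import (
	"fmt"
	"sync"
)

// square stands in for an expensive f(X).
func square(n int) int { return n * n }

func main() {
	var wg sync.WaitGroup
	results := make([]int, 2)

	// Launch f(X) and f(Y) as goroutines; with more than one
	// core available, the Go runtime may run them in parallel.
	for i, n := range []int{3, 4} {
		wg.Add(1)
		go func(i, n int) {
			defer wg.Done()
			results[i] = square(n)
		}(i, n)
	}

	wg.Wait()            // block until both goroutines finish
	fmt.Println(results) // [9 16]
}
```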

What is Concurrency?
[Diagram: timelines. Sequential: f(X) runs to completion, then f(Y). Concurrent: slices of f(X) and f(Y) are interleaved, still ending with f(X) = A and f(Y) = B.]

Concurrency Could be Parallel but not Always
[Diagram: "concurrent but not parallel" interleaves f(X) and f(Y) on one processor. "Concurrent and parallel" interleaves work across two processors at once, here computing f(X), f(Y), f(Z), and f(W).]

Parallel is Always Concurrent
[Diagram: an attempt to draw "parallel but not concurrent" with f(X) and f(Y). Nope … still concurrent.]
Parallel → Concurrent
Concurrent ↛ Parallel

Why Care about Concurrency?
If something concurrent but not parallel takes as much time as something sequential, why make it concurrent?
[Diagram: the sequential and the concurrent-but-not-parallel timelines for f(X) and f(Y) take the same total time.]

Concurrency is a Design Pattern
"Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once." - Rob Pike
"Concurrency is not Parallelism" by Rob Pike: https://talks.golang.org/2012/waza.slide#1

Distributed Systems are Unpredictable
Servers need to react to:
Other servers
Crashes
Users
…

The Design Problem Concurrency Solves Demo: concurrent.go
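Again hedging that this is not the actual concurrent.go: a server that must react to several unpredictable event sources at once is a natural fit for goroutines and select, which is the design problem concurrency solves even without parallel speedup:

```go
package main

import "fmt"

func main() {
	requests := make(chan string)
	failures := make(chan string)

	// Two independent event sources, standing in for users
	// and other servers. Their timing is unpredictable.
	go func() { requests <- "user: deposit $10" }()
	go func() { failures <- "peer server down" }()

	// The server reacts to whichever event arrives first,
	// without caring about the order.
	for i := 0; i < 2; i++ {
		select {
		case r := <-requests:
			fmt.Println("handling", r)
		case f := <-failures:
			fmt.Println("recovering from", f)
		}
	}
}
```

The two print lines can come out in either order; that unpredictability is exactly what the select loop is built to absorb.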

Making Bank Deposits Concurrent (1/5)
[Diagram: two clients each send Add($10) to the server; the server talks to the database.]

Making Bank Deposits Concurrent (2/5)
[Diagram: for the first Add($10), the server reads the balance from the database: x = 0.]

Making Bank Deposits Concurrent (3/5)
[Diagram: the server computes x += 10 and writes x back; the database now holds $10.]

Making Bank Deposits Concurrent (4/5)
[Diagram: the second Add($10) now reads the balance: x = 10.]

Making Bank Deposits Concurrent (5/5)
[Diagram: the server adds $10 and writes x = 20; the database correctly holds $20.]

Concurrent Bank Deposits! Yay? (1/5)
[Diagram: two clients again each send Add($10), but this time the server handles the requests concurrently.]

Concurrent Bank Deposits! Yay? (2/5)
[Diagram: the first request reads the balance: x = 0.]

Concurrent Bank Deposits! Yay? (3/5)
[Diagram: before the first request writes, the second request also reads the balance: x = 0.]

Concurrent Bank Deposits! Yay? (4/5)
[Diagram: the first request computes x += 10 and writes x = 10; the database holds $10.]

Concurrent Bank Deposits! Yay? (5/5)
[Diagram: the second request also writes x = 10. One deposit is lost: the database holds $10 instead of $20.]
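The lost update above is exactly what happens with unsynchronized goroutines. A sketch (not the course's demo code) that loses deposits the same way:

```go
package main

import (
	"fmt"
	"sync"
)

// depositUnsafe runs n concurrent Add($10) operations with no
// synchronization, so the read-modify-write steps can interleave
// just like the two requests in the slides.
func depositUnsafe(n int) int {
	balance := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			balance += 10 // racy: read, add, write
		}()
	}
	wg.Wait()
	return balance
}

func main() {
	// With enough goroutines, updates are often lost and the
	// final balance comes out below the expected n * 10.
	fmt.Println("final balance:", depositUnsafe(1000))
}
```

Running this with `go run -race` flags the data race on balance explicitly.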

Concurrency Needs to be Synchronized
Locks – limit access using shared memory
Channels – pass information using a queue

Channels, Locks and More Demo: sync.go
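Not the actual sync.go, but a sketch of the two synchronization styles named above, both making the bank deposit come out correct:

```go
package main

import (
	"fmt"
	"sync"
)

// depositWithLock protects the shared balance with a mutex.
func depositWithLock(n int) int {
	balance := 0
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			balance += 10 // only one goroutine in here at a time
			mu.Unlock()
		}()
	}
	wg.Wait()
	return balance
}

// depositWithChannel funnels deposits through a channel to one
// owner goroutine, so the balance is never shared memory at all.
func depositWithChannel(n int) int {
	deposits := make(chan int)
	done := make(chan int)
	go func() { // the only goroutine that touches balance
		balance := 0
		for d := range deposits {
			balance += d
		}
		done <- balance
	}()
	for i := 0; i < n; i++ {
		deposits <- 10
	}
	close(deposits)
	return <-done
}

func main() {
	fmt.Println(depositWithLock(1000))    // 10000
	fmt.Println(depositWithChannel(1000)) // 10000
}
```

The channel version is the Go proverb "don't communicate by sharing memory; share memory by communicating" in miniature.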

Visualize Everything We’ve Learned
And also see many different methods of achieving synchronization: http://divan.github.io/posts/go_concurrency_visualize/