1
Data Parallel and Graph Parallel Systems for Large-scale Data Processing Presenter: Kun Li
2
Threads, Locks, and Messages
ML experts repeatedly solve the same parallel design challenges:
– Implement and debug a complex parallel system
– Tune for a specific parallel platform
– Two months later the conference paper contains: "We implemented ______ in parallel."
The resulting code:
– is difficult to maintain
– is difficult to extend
– couples the learning model to the parallel implementation
3
A better answer: Map-Reduce / Hadoop
Build learning algorithms on top of high-level parallel abstractions.
4
Motivation
Large-Scale Data Processing
– Want to use 1000s of CPUs, but don't want the hassle of managing things
MapReduce provides:
– Automatic parallelization & distribution
– Fault tolerance
– I/O scheduling
– Monitoring & status updates
5
Map/Reduce
map(key, val) is run on each item in the input set
– emits new-key / new-val pairs
reduce(key, vals) is run for each unique key emitted by map()
– emits the final output
6
Word Count in Docs
map(key=url, val=contents):
  For each word w in contents, emit (w, "1")
reduce(key=word, values=uniq_counts):
  Sum all "1"s in the values list
  Emit result "(word, sum)"
Input documents: "see bob throw" and "see spot run"
Map output: (see, 1) (bob, 1) (throw, 1) (see, 1) (spot, 1) (run, 1)
Reduce output: (bob, 1) (run, 1) (see, 2) (spot, 1) (throw, 1)
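A minimal Python sketch of this word-count job. The in-memory run_mapreduce driver and the function names are illustrative only, not Hadoop's API; the map/reduce signatures mirror the pseudocode above.

    from collections import defaultdict

    def map_fn(url, contents):
        # Emit (word, 1) for every word in the document.
        for word in contents.split():
            yield (word, 1)

    def reduce_fn(word, counts):
        # Sum all the partial counts for one word.
        yield (word, sum(counts))

    def run_mapreduce(inputs, map_fn, reduce_fn):
        # Shuffle: group all map outputs by key.
        groups = defaultdict(list)
        for key, value in inputs:
            for k, v in map_fn(key, value):
                groups[k].append(v)
        # Reduce each unique key.
        return [out for k, vs in sorted(groups.items()) for out in reduce_fn(k, vs)]

    docs = [("doc1", "see bob throw"), ("doc2", "see spot run")]
    print(run_mapreduce(docs, map_fn, reduce_fn))
    # [('bob', 1), ('run', 1), ('see', 2), ('spot', 1), ('throw', 1)]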
7
Grep
Input consists of (url+offset, single line)
map(key=url+offset, val=line):
  If the line matches the regexp, emit (line, "1")
reduce(key=line, values=uniq_counts):
  Don't do anything; just emit line
8
Reverse Web-Link Graph
Map
– For each URL linking to a target, output a (target, source) pair
Reduce
– Concatenate the list of all source URLs for a target
– Outputs: (target, list(source)) pairs
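The same shuffle-and-group idea covers the reverse web-link graph. A hedged sketch, again using an in-memory grouping step in place of a real Hadoop shuffle (the page data is invented for illustration):

    from collections import defaultdict

    def map_links(source_url, page_links):
        # For each URL the page links to, emit (target, source).
        for target in page_links:
            yield (target, source_url)

    def reduce_links(target, sources):
        # Concatenate all sources that link to this target.
        yield (target, sorted(sources))

    pages = [("a.com", ["b.com", "c.com"]), ("b.com", ["c.com"])]
    groups = defaultdict(list)
    for src, links in pages:
        for tgt, s in map_links(src, links):
            groups[tgt].append(s)
    print([out for tgt, srcs in sorted(groups.items()) for out in reduce_links(tgt, srcs)])
    # [('b.com', ['a.com']), ('c.com', ['a.com', 'b.com'])]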
9
Job Processing ("grep" job)
JobTracker, TaskTracker 0 through TaskTracker 5
1. Client submits the "grep" job, indicating code and input files
2. JobTracker breaks the input file into k chunks (in this case 6) and assigns work to tasktrackers
3. After map(), tasktrackers exchange map output to build the reduce() keyspace
4. JobTracker breaks the reduce() keyspace into m chunks (in this case 6) and assigns work
5. reduce() output may go to NDFS
10
Execution
11
Parallel Execution
18
Refinement: Locality Optimization
Master scheduling policy:
– Asks GFS for the locations of replicas of the input file blocks
– Map tasks are scheduled so that a GFS replica of their input block is on the same machine or the same rack
Effect:
– Thousands of machines read input at local-disk speed; without this, rack switches limit the read rate
Combiner:
– Useful for saving network bandwidth (see the sketch below)
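One way to picture the combiner: run a reduce-like pre-aggregation on each mapper's local output before anything crosses the network. A sketch under that assumption, reusing the word-count example (function names are illustrative):

    from collections import Counter

    def map_words(contents):
        for word in contents.split():
            yield (word, 1)

    def combine(local_pairs):
        # Combiner: sum counts locally so only one pair per word leaves the node.
        partial = Counter()
        for word, n in local_pairs:
            partial[word] += n
        return partial.items()

    # Without a combiner the node would ship one pair per word occurrence;
    # with it, "see see see bob" ships just [('bob', 1), ('see', 3)].
    print(sorted(combine(map_words("see see see bob"))))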
19
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Cross Validation, Feature Extraction, Computing Sufficient Statistics
Graph-Parallel: Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
Is there more to Machine Learning?
20
Properties of Graph-Parallel Algorithms
– Dependency Graph ("what I like" depends on "what my friends like")
– Factored Computation
– Iterative Computation
21
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Cross Validation, Feature Extraction, Computing Sufficient Statistics
Graph-Parallel (Map Reduce?): Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
22
Why not use Map-Reduce for Graph Parallel Algorithms?
23
Data Dependencies
Map-Reduce does not efficiently express dependent data; it assumes independent data rows:
– The user must code substantial data transformations
– Costly data replication
24
Iterative Algorithms
Map-Reduce does not efficiently express iterative algorithms.
[Figure: data flows through CPU 1–3 across repeated iterations, with a barrier after each one; a single slow processor stalls every other CPU at the barrier.]
25
MapAbuse: Iterative MapReduce
Only a subset of the data needs computation in each iteration:
[Figure: the same data is pushed through CPU 1–3 and a barrier on every iteration, even for the parts that no longer change.]
26
MapAbuse: Iterative MapReduce
The system is not optimized for iteration:
[Figure: every iteration pays a startup penalty to launch tasks and a disk penalty to re-read and re-write the data between barriers.]
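A caricature of that overhead, assuming each "MapReduce" iteration must serialize its whole state to disk and would also pay a job-startup cost on a real cluster (the file layout and the toy update are invented for illustration):

    import json, os, tempfile

    def one_iteration(state):
        # Stand-in for a full map+reduce pass: pull every value toward the global mean.
        mean = sum(state.values()) / len(state)
        return {k: 0.5 * v + 0.5 * mean for k, v in state.items()}

    state = {"a": 1.0, "b": 0.0, "c": 0.0}
    workdir = tempfile.mkdtemp()
    for it in range(10):
        # Startup penalty: a real cluster would launch fresh tasks here.
        # Disk penalty: the entire dataset is rewritten and reread every iteration,
        # even though most values barely change after the first few passes.
        path = os.path.join(workdir, f"iter_{it}.json")
        with open(path, "w") as f:
            json.dump(state, f)
        with open(path) as f:
            state = one_iteration(json.load(f))
    print(state)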
27
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Cross Validation, Feature Extraction, Computing Sufficient Statistics
Graph-Parallel (Map Reduce? GraphLab): Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
28
The GraphLab Framework
– Graph-Based Data Representation
– Update Functions (User Computation)
– Scheduler
– Consistency Model
29
Data Graph
A graph with arbitrary data (C++ objects) associated with each vertex and edge.
Example – a social network:
– Vertex data: user profile text, current interest estimates
– Edge data: similarity weights
30
Implementing the Data Graph (Multicore Setting, In Memory)
Relatively straightforward:
– vertex_data(vid) → data
– edge_data(vid, vid) → data
– neighbors(vid) → vid_list
Challenge:
– Fast lookup, low overhead
Solution:
– Dense data structures
– Fixed Vdata & Edata types
– Immutable graph structure
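A minimal sketch of such a graph in Python. GraphLab's actual implementation is C++; the class below is only meant to mirror the vertex_data / edge_data / neighbors lookups listed above, with an immutable structure and mutable per-vertex and per-edge data:

    class DataGraph:
        def __init__(self, num_vertices, edges, vdata, edata):
            # Structure is fixed at construction time; only the data is mutable.
            self.vdata = list(vdata)          # dense array: vid -> vertex data
            self.edata = {}                   # (src, dst) -> edge data
            self.adj = [[] for _ in range(num_vertices)]
            for (u, v), d in zip(edges, edata):
                self.edata[(u, v)] = d
                self.adj[u].append(v)
                self.adj[v].append(u)

        def vertex_data(self, vid):
            return self.vdata[vid]

        def edge_data(self, u, v):
            return self.edata[(u, v)] if (u, v) in self.edata else self.edata[(v, u)]

        def neighbors(self, vid):
            return self.adj[vid]

    g = DataGraph(3, [(0, 1), (1, 2)],
                  vdata=[{"likes": 0.1}, {"likes": 0.9}, {"likes": 0.5}],
                  edata=[{"w": 1.0}, {"w": 0.5}])
    print(g.neighbors(1), g.edge_data(2, 1))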
31
The GraphLab Framework
– Graph-Based Data Representation
– Update Functions (User Computation)
– Scheduler
– Consistency Model
32
Update Functions
An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

    label_prop(i, scope) {
      // Get neighborhood data
      (Likes[i], W[i,j], Likes[j]) ← scope;

      // Update the vertex data

      // Reschedule neighbors if needed
      if Likes[i] changes then
        reschedule_neighbors_of(i);
    }
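A runnable toy version of that update function, assuming (this is not spelled out on the slide) that Likes[i] is a single score recomputed as the weighted average of the neighbors' scores, and that the scope is passed in as plain dictionaries; the graph below is invented for illustration:

    def label_prop(i, likes, weights, neighbors):
        # Get neighborhood data and recompute Likes[i] from the neighbors' values.
        old = likes[i]
        total = sum(weights[(i, j)] for j in neighbors[i])
        likes[i] = sum(weights[(i, j)] * likes[j] for j in neighbors[i]) / total
        # Reschedule the neighbors only if this vertex's value actually changed.
        return set(neighbors[i]) if abs(likes[i] - old) > 1e-6 else set()

    likes = {0: 1.0, 1: 0.0, 2: 0.0}
    weights = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 1.0}
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    print(label_prop(1, likes, weights, neighbors), likes)
    # {0, 2} {0: 1.0, 1: 0.5, 2: 0.0}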
33
The GraphLab Framework
– Graph-Based Data Representation
– Update Functions (User Computation)
– Scheduler
– Consistency Model
34
The Scheduler
The scheduler determines the order in which vertices are updated.
[Figure: CPU 1 and CPU 2 each pull the next vertex from a shared scheduler, run its update function, and updated vertices may push their neighbors back onto the scheduler.]
The process repeats until the scheduler is empty.
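A single-threaded toy version of that loop, using a FIFO queue as the scheduler and an invented update function that takes the maximum of its neighbors' values and asks to reschedule them when its own value changes:

    from collections import deque

    def update(v, data, neighbors):
        # Toy update: take the max over me and my neighbors; reschedule them if I changed.
        new = max([data[v]] + [data[u] for u in neighbors[v]])
        changed = new != data[v]
        data[v] = new
        return neighbors[v] if changed else []

    data = {0: 3, 1: 0, 2: 0, 3: 0}
    neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    scheduler = deque(data)          # start with every vertex scheduled
    pending = set(data)
    while scheduler:                 # the process repeats until the scheduler is empty
        v = scheduler.popleft()
        pending.discard(v)
        for u in update(v, data, neighbors):
            if u not in pending:     # avoid duplicate entries for the same vertex
                pending.add(u)
                scheduler.append(u)
    print(data)                      # every vertex converges to 3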
35
Choosing a Schedule
GraphLab provides several different schedulers:
– Round Robin: vertices are updated in a fixed order
– FIFO: vertices are updated in the order they are added
– Priority: vertices are updated in priority order
The choice of schedule affects the correctness and parallel performance of the algorithm.
Obtain different algorithms by simply changing a flag:
  --scheduler=roundrobin
  --scheduler=fifo
  --scheduler=priority
36
The GraphLab Framework
– Graph-Based Data Representation
– Update Functions (User Computation)
– Scheduler
– Consistency Model
37
Ensuring Race-Free Code
How much can computation overlap?
38
GraphLab Ensures Sequential Consistency
For each parallel execution, there exists a sequential execution of update functions which produces the same result.
[Figure: a timeline comparing the parallel execution on CPU 1 and CPU 2 against an equivalent sequential execution on a single CPU.]
39
Common Problem: Write-Write Race
Processors running adjacent update functions simultaneously modify shared data:
[Figure: CPU 1 and CPU 2 both write to data on a shared edge; only one of the writes survives as the final value.]
40
Consistency Rules
Guaranteed sequential consistency for all update functions.
41
Full Consistency
Under full consistency, an update function has exclusive read/write access to its entire scope, so updates whose scopes overlap cannot run at the same time.
42
Obtaining More Parallelism
43
Edge Consistency
Under edge consistency, an update function has read/write access to its vertex and adjacent edges, but only read access to adjacent vertices.
[Figure: CPU 1 and CPU 2 update non-adjacent vertices at the same time; the overlapping reads of the shared neighbor's data remain safe.]
44
Consistency Through R/W Locks
Implement the consistency models with per-vertex read/write locks:
– Full Consistency: acquire write locks on the center vertex and all of its neighbors
– Edge Consistency: acquire a write lock on the center vertex and read locks on its neighbors
Locks are always acquired in a canonical ordering to avoid deadlock.
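A sketch of that locking discipline, simplified to a single mutex per vertex in place of real read/write locks, and acquiring locks in ascending vertex-ID order as the canonical ordering so two overlapping updates cannot deadlock (graph and function names are invented for illustration):

    import threading

    locks = {v: threading.Lock() for v in range(5)}
    neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

    def run_with_edge_scope_locked(v, fn):
        # Lock the center vertex plus its neighbors, always in ascending ID order.
        scope = sorted([v] + neighbors[v])
        for u in scope:
            locks[u].acquire()
        try:
            fn(v)                    # the update sees a stable neighborhood
        finally:
            for u in reversed(scope):
                locks[u].release()

    run_with_edge_scope_locked(2, lambda v: print("updating vertex", v))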
45
Consistency Through Scheduling
Edge Consistency Model:
– Two vertices can be updated simultaneously if they do not share an edge.
Graph Coloring:
– Two vertices can be assigned the same color if they do not share an edge.
Execute all vertices of one color in parallel, then barrier, then move on to the next color (Phase 1, Barrier, Phase 2, Barrier, Phase 3, ...).
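A sketch of that idea: greedily color the graph, then run the update color by color, with a barrier between phases. The coloring heuristic and the averaging update below are illustrative, not GraphLab's actual engine:

    def greedy_color(neighbors):
        # Assign each vertex the smallest color not used by an already-colored neighbor.
        color = {}
        for v in neighbors:
            taken = {color[u] for u in neighbors[v] if u in color}
            color[v] = next(c for c in range(len(neighbors)) if c not in taken)
        return color

    neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    color = greedy_color(neighbors)

    data = {v: float(v) for v in neighbors}
    for phase in sorted(set(color.values())):
        # All vertices of one color share no edges, so they could run in parallel.
        for v in [u for u in neighbors if color[u] == phase]:
            data[v] = sum(data[u] for u in neighbors[v]) / len(neighbors[v])
        # (a barrier between phases would go here in a parallel runtime)
    print(color, data)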