1
Data Parallel and Graph Parallel Systems for Large-scale Data Processing Presenter: Kun Li
2
Threads, Locks, and Messages
ML experts repeatedly solve the same parallel design challenges:
– Implement and debug a complex parallel system
– Tune for a specific parallel platform
– Two months later the conference paper contains: "We implemented ______ in parallel."
The resulting code:
– is difficult to maintain
– is difficult to extend
– couples the learning model to the parallel implementation
3
A better answer: Map-Reduce / Hadoop
Build learning algorithms on top of high-level parallel abstractions.
4
Motivation
Large-Scale Data Processing
– Want to use 1000s of CPUs, but don't want the hassle of managing things
MapReduce provides:
– Automatic parallelization & distribution
– Fault tolerance
– I/O scheduling
– Monitoring & status updates
5
Map/Reduce
map(key, val) is run on each item in the input set
– emits new-key / new-val pairs
reduce(key, vals) is run for each unique key emitted by map()
– emits the final output
6
Word Count in Docs
map(key=url, val=contents):
  For each word w in contents, emit (w, "1")
reduce(key=word, values=uniq_counts):
  Sum all "1"s in the values list
  Emit result "(word, sum)"
Input documents: "see bob throw" and "see spot run"
Map output: (see, 1) (bob, 1) (throw, 1) (see, 1) (spot, 1) (run, 1)
Reduce output: (bob, 1) (run, 1) (see, 2) (spot, 1) (throw, 1)
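A minimal Python sketch of this word-count job. The in-memory run_mapreduce driver and the function names are illustrative only, not Hadoop's API; the map/reduce signatures mirror the pseudocode above.

    from collections import defaultdict

    def map_fn(url, contents):
        # Emit (word, 1) for every word in the document.
        for word in contents.split():
            yield (word, 1)

    def reduce_fn(word, counts):
        # Sum all the partial counts for one word.
        yield (word, sum(counts))

    def run_mapreduce(inputs, map_fn, reduce_fn):
        # Shuffle: group all map outputs by key.
        groups = defaultdict(list)
        for key, value in inputs:
            for k, v in map_fn(key, value):
                groups[k].append(v)
        # Reduce each unique key.
        return [out for k, vs in sorted(groups.items()) for out in reduce_fn(k, vs)]

    docs = [("doc1", "see bob throw"), ("doc2", "see spot run")]
    print(run_mapreduce(docs, map_fn, reduce_fn))
    # [('bob', 1), ('run', 1), ('see', 2), ('spot', 1), ('throw', 1)]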
7
Grep
Input consists of (url+offset, single line)
map(key=url+offset, val=line):
  If the line matches the regexp, emit (line, "1")
reduce(key=line, values=uniq_counts):
  Don't do anything; just emit line
8
Reverse Web-Link Graph
Map
– For each URL linking to a target, output a (target, source) pair
Reduce
– Concatenate the list of all source URLs for a target
– Outputs: (target, list(source)) pairs
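The same shuffle-and-group idea covers the reverse web-link graph. A hedged sketch, again using an in-memory grouping step in place of a real Hadoop shuffle (the page data is invented for illustration):

    from collections import defaultdict

    def map_links(source_url, page_links):
        # For each URL the page links to, emit (target, source).
        for target in page_links:
            yield (target, source_url)

    def reduce_links(target, sources):
        # Concatenate all sources that link to this target.
        yield (target, sorted(sources))

    pages = [("a.com", ["b.com", "c.com"]), ("b.com", ["c.com"])]
    groups = defaultdict(list)
    for src, links in pages:
        for tgt, s in map_links(src, links):
            groups[tgt].append(s)
    print([out for tgt, srcs in sorted(groups.items()) for out in reduce_links(tgt, srcs)])
    # [('b.com', ['a.com']), ('c.com', ['a.com', 'b.com'])]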
9
Job Processing ("grep" job)
JobTracker, TaskTracker 0 through TaskTracker 5
1. Client submits the "grep" job, indicating code and input files
2. JobTracker breaks the input file into k chunks (in this case 6) and assigns work to tasktrackers
3. After map(), tasktrackers exchange map output to build the reduce() keyspace
4. JobTracker breaks the reduce() keyspace into m chunks (in this case 6) and assigns work
5. reduce() output may go to NDFS
10
Execution
11
Parallel Execution
18
Refinement: Locality Optimization
Master scheduling policy:
– Asks GFS for the locations of replicas of the input file blocks
– Map tasks are scheduled so that a GFS replica of their input block is on the same machine or the same rack
Effect:
– Thousands of machines read input at local-disk speed; without this, rack switches limit the read rate
Combiner:
– Useful for saving network bandwidth (see the sketch below)
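One way to picture the combiner: run a reduce-like pre-aggregation on each mapper's local output before anything crosses the network. A sketch under that assumption, reusing the word-count example (function names are illustrative):

    from collections import Counter

    def map_words(contents):
        for word in contents.split():
            yield (word, 1)

    def combine(local_pairs):
        # Combiner: sum counts locally so only one pair per word leaves the node.
        partial = Counter()
        for word, n in local_pairs:
            partial[word] += n
        return partial.items()

    # Without a combiner the node would ship one pair per word occurrence;
    # with it, "see see see bob" ships just [('bob', 1), ('see', 3)].
    print(sorted(combine(map_words("see see see bob"))))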
19
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Cross Validation, Feature Extraction, Computing Sufficient Statistics
Graph-Parallel: Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
Is there more to Machine Learning?
20
Properties of Graph-Parallel Algorithms
– Dependency Graph ("what I like" depends on "what my friends like")
– Factored Computation
– Iterative Computation
21
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Cross Validation, Feature Extraction, Computing Sufficient Statistics
Graph-Parallel (Map Reduce?): Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
22
Why not use Map-Reduce for Graph Parallel Algorithms?
23
Data Dependencies
Map-Reduce does not efficiently express dependent data; it assumes independent data rows:
– The user must code substantial data transformations
– Costly data replication
24
Iterative Algorithms
Map-Reduce does not efficiently express iterative algorithms.
[Figure: data flows through CPU 1–3 across repeated iterations, with a barrier after each one; a single slow processor stalls every other CPU at the barrier.]
25
MapAbuse: Iterative MapReduce
Only a subset of the data needs computation in each iteration:
[Figure: the same data is pushed through CPU 1–3 and a barrier on every iteration, even for the parts that no longer change.]
26
MapAbuse: Iterative MapReduce
The system is not optimized for iteration:
[Figure: every iteration pays a startup penalty to launch tasks and a disk penalty to re-read and re-write the data between barriers.]
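A caricature of that overhead, assuming each "MapReduce" iteration must serialize its whole state to disk and would also pay a job-startup cost on a real cluster (the file layout and the toy update are invented for illustration):

    import json, os, tempfile

    def one_iteration(state):
        # Stand-in for a full map+reduce pass: pull every value toward the global mean.
        mean = sum(state.values()) / len(state)
        return {k: 0.5 * v + 0.5 * mean for k, v in state.items()}

    state = {"a": 1.0, "b": 0.0, "c": 0.0}
    workdir = tempfile.mkdtemp()
    for it in range(10):
        # Startup penalty: a real cluster would launch fresh tasks here.
        # Disk penalty: the entire dataset is rewritten and reread every iteration,
        # even though most values barely change after the first few passes.
        path = os.path.join(workdir, f"iter_{it}.json")
        with open(path, "w") as f:
            json.dump(state, f)
        with open(path) as f:
            state = one_iteration(json.load(f))
    print(state)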
27
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Cross Validation, Feature Extraction, Computing Sufficient Statistics
Graph-Parallel (Map Reduce? GraphLab): Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
28
The GraphLab Framework
– Graph-Based Data Representation
– Update Functions (User Computation)
– Scheduler
– Consistency Model
29
Data Graph
A graph with arbitrary data (C++ objects) associated with each vertex and edge.
Example – a social network:
– Vertex data: user profile text, current interest estimates
– Edge data: similarity weights
30
Implementing the Data Graph (Multicore Setting, In Memory)
Relatively straightforward:
– vertex_data(vid) → data
– edge_data(vid, vid) → data
– neighbors(vid) → vid_list
Challenge:
– Fast lookup, low overhead
Solution:
– Dense data structures
– Fixed Vdata & Edata types
– Immutable graph structure
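A minimal sketch of such a graph in Python. GraphLab's actual implementation is C++; the class below is only meant to mirror the vertex_data / edge_data / neighbors lookups listed above, with an immutable structure and mutable per-vertex and per-edge data:

    class DataGraph:
        def __init__(self, num_vertices, edges, vdata, edata):
            # Structure is fixed at construction time; only the data is mutable.
            self.vdata = list(vdata)          # dense array: vid -> vertex data
            self.edata = {}                   # (src, dst) -> edge data
            self.adj = [[] for _ in range(num_vertices)]
            for (u, v), d in zip(edges, edata):
                self.edata[(u, v)] = d
                self.adj[u].append(v)
                self.adj[v].append(u)

        def vertex_data(self, vid):
            return self.vdata[vid]

        def edge_data(self, u, v):
            return self.edata[(u, v)] if (u, v) in self.edata else self.edata[(v, u)]

        def neighbors(self, vid):
            return self.adj[vid]

    g = DataGraph(3, [(0, 1), (1, 2)],
                  vdata=[{"likes": 0.1}, {"likes": 0.9}, {"likes": 0.5}],
                  edata=[{"w": 1.0}, {"w": 0.5}])
    print(g.neighbors(1), g.edge_data(2, 1))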
31
The GraphLab Framework
– Graph-Based Data Representation
– Update Functions (User Computation)
– Scheduler
– Consistency Model
32
Update Functions
An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

    label_prop(i, scope) {
      // Get neighborhood data
      (Likes[i], W[i,j], Likes[j]) ← scope;

      // Update the vertex data

      // Reschedule neighbors if needed
      if Likes[i] changes then
        reschedule_neighbors_of(i);
    }
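A runnable toy version of that update function, assuming (this is not spelled out on the slide) that Likes[i] is a single score recomputed as the weighted average of the neighbors' scores, and that the scope is passed in as plain dictionaries; the graph below is invented for illustration:

    def label_prop(i, likes, weights, neighbors):
        # Get neighborhood data and recompute Likes[i] from the neighbors' values.
        old = likes[i]
        total = sum(weights[(i, j)] for j in neighbors[i])
        likes[i] = sum(weights[(i, j)] * likes[j] for j in neighbors[i]) / total
        # Reschedule the neighbors only if this vertex's value actually changed.
        return set(neighbors[i]) if abs(likes[i] - old) > 1e-6 else set()

    likes = {0: 1.0, 1: 0.0, 2: 0.0}
    weights = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 1.0}
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    print(label_prop(1, likes, weights, neighbors), likes)
    # {0, 2} {0: 1.0, 1: 0.5, 2: 0.0}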
33
The GraphLab Framework
– Graph-Based Data Representation
– Update Functions (User Computation)
– Scheduler
– Consistency Model
34
The Scheduler
The scheduler determines the order in which vertices are updated.
[Figure: CPU 1 and CPU 2 each pull the next vertex from a shared scheduler, run its update function, and updated vertices may push their neighbors back onto the scheduler.]
The process repeats until the scheduler is empty.
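A single-threaded toy version of that loop, using a FIFO queue as the scheduler and an invented update function that takes the maximum of its neighbors' values and asks to reschedule them when its own value changes:

    from collections import deque

    def update(v, data, neighbors):
        # Toy update: take the max over me and my neighbors; reschedule them if I changed.
        new = max([data[v]] + [data[u] for u in neighbors[v]])
        changed = new != data[v]
        data[v] = new
        return neighbors[v] if changed else []

    data = {0: 3, 1: 0, 2: 0, 3: 0}
    neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    scheduler = deque(data)          # start with every vertex scheduled
    pending = set(data)
    while scheduler:                 # the process repeats until the scheduler is empty
        v = scheduler.popleft()
        pending.discard(v)
        for u in update(v, data, neighbors):
            if u not in pending:     # avoid duplicate entries for the same vertex
                pending.add(u)
                scheduler.append(u)
    print(data)                      # every vertex converges to 3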
35
Choosing a Schedule
GraphLab provides several different schedulers:
– Round Robin: vertices are updated in a fixed order
– FIFO: vertices are updated in the order they are added
– Priority: vertices are updated in priority order
The choice of schedule affects the correctness and parallel performance of the algorithm.
Obtain different algorithms by simply changing a flag:
  --scheduler=roundrobin
  --scheduler=fifo
  --scheduler=priority
36
The GraphLab Framework
– Graph-Based Data Representation
– Update Functions (User Computation)
– Scheduler
– Consistency Model
37
Ensuring Race-Free Code
How much can computation overlap?
38
GraphLab Ensures Sequential Consistency
For each parallel execution, there exists a sequential execution of update functions which produces the same result.
[Figure: a timeline comparing the parallel execution on CPU 1 and CPU 2 against an equivalent sequential execution on a single CPU.]
39
Common Problem: Write-Write Race
Processors running adjacent update functions simultaneously modify shared data:
[Figure: CPU 1 and CPU 2 both write to data on a shared edge; only one of the writes survives as the final value.]
40
Consistency Rules
Guaranteed sequential consistency for all update functions.
41
Full Consistency
Under full consistency, an update function has exclusive read/write access to its entire scope, so updates whose scopes overlap cannot run at the same time.
42
Obtaining More Parallelism
43
Edge Consistency
Under edge consistency, an update function has read/write access to its vertex and adjacent edges, but only read access to adjacent vertices.
[Figure: CPU 1 and CPU 2 update non-adjacent vertices at the same time; the overlapping reads of the shared neighbor's data remain safe.]
44
Consistency Through R/W Locks
Implement the consistency models with per-vertex read/write locks:
– Full Consistency: acquire write locks on the center vertex and all of its neighbors
– Edge Consistency: acquire a write lock on the center vertex and read locks on its neighbors
Locks are always acquired in a canonical ordering to avoid deadlock.
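A sketch of that locking discipline, simplified to a single mutex per vertex in place of real read/write locks, and acquiring locks in ascending vertex-ID order as the canonical ordering so two overlapping updates cannot deadlock (graph and function names are invented for illustration):

    import threading

    locks = {v: threading.Lock() for v in range(5)}
    neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

    def run_with_edge_scope_locked(v, fn):
        # Lock the center vertex plus its neighbors, always in ascending ID order.
        scope = sorted([v] + neighbors[v])
        for u in scope:
            locks[u].acquire()
        try:
            fn(v)                    # the update sees a stable neighborhood
        finally:
            for u in reversed(scope):
                locks[u].release()

    run_with_edge_scope_locked(2, lambda v: print("updating vertex", v))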
45
Consistency Through Scheduling
Edge Consistency Model:
– Two vertices can be updated simultaneously if they do not share an edge.
Graph Coloring:
– Two vertices can be assigned the same color if they do not share an edge.
Execute all vertices of one color in parallel, then barrier, then move on to the next color (Phase 1, Barrier, Phase 2, Barrier, Phase 3, ...).
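A sketch of that idea: greedily color the graph, then run the update color by color, with a barrier between phases. The coloring heuristic and the averaging update below are illustrative, not GraphLab's actual engine:

    def greedy_color(neighbors):
        # Assign each vertex the smallest color not used by an already-colored neighbor.
        color = {}
        for v in neighbors:
            taken = {color[u] for u in neighbors[v] if u in color}
            color[v] = next(c for c in range(len(neighbors)) if c not in taken)
        return color

    neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    color = greedy_color(neighbors)

    data = {v: float(v) for v in neighbors}
    for phase in sorted(set(color.values())):
        # All vertices of one color share no edges, so they could run in parallel.
        for v in [u for u in neighbors if color[u] == phase]:
            data[v] = sum(data[u] for u in neighbors[v]) / len(neighbors[v])
        # (a barrier between phases would go here in a parallel runtime)
    print(color, data)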