Data Parallel and Graph Parallel Systems for Large-scale Data Processing Presenter: Kun Li.

1 Data Parallel and Graph Parallel Systems for Large-scale Data Processing Presenter: Kun Li

2 Threads, Locks, and Messages
ML experts repeatedly solve the same parallel design challenges:
– Implement and debug a complex parallel system
– Tune for a specific parallel platform
– Two months later, the conference paper contains: “We implemented ______ in parallel.”
The resulting code:
– is difficult to maintain
– is difficult to extend
– couples the learning model to the parallel implementation

3 Map-Reduce / Hadoop
A better answer: build learning algorithms on top of high-level parallel abstractions.

4 Motivation
Large-scale data processing:
– Want to use 1000s of CPUs, but don't want the hassle of managing things
MapReduce provides:
– Automatic parallelization & distribution
– Fault tolerance
– I/O scheduling
– Monitoring & status updates

5 Map/Reduce
map(key, val) is run on each item in the input set
– emits new-key / new-value pairs
reduce(key, vals) is run for each unique key emitted by map()
– emits the final output

6 Count Words in Docs
map(key=url, val=contents):
– For each word w in contents, emit (w, "1")
reduce(key=word, values=uniq_counts):
– Sum all the "1"s in the values list
– Emit the result (word, sum)
Example: for the documents "see bob throw" and "see spot run", map emits (see, 1), (bob, 1), (throw, 1), (see, 1), (spot, 1), (run, 1), and reduce produces (bob, 1), (run, 1), (see, 2), (spot, 1), (throw, 1).
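
A minimal single-process sketch of the word-count logic in C++ (illustrative names, not Hadoop's actual API): map emits (word, 1) pairs, a shuffle step groups them by key, and reduce sums each group.

```cpp
// Minimal single-process sketch of the word-count Map/Reduce logic.
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// map(key=url, val=contents): emit (word, 1) for every word.
std::vector<std::pair<std::string, int>> map_fn(const std::string& contents) {
    std::vector<std::pair<std::string, int>> emitted;
    std::istringstream in(contents);
    std::string word;
    while (in >> word) emitted.push_back({word, 1});
    return emitted;
}

// reduce(key=word, values=counts): sum all the 1s for one word.
int reduce_fn(const std::vector<int>& values) {
    int sum = 0;
    for (int v : values) sum += v;
    return sum;
}

int main() {
    std::vector<std::string> docs = {"see bob throw", "see spot run"};

    // "Shuffle" phase: group intermediate pairs by key.
    std::map<std::string, std::vector<int>> groups;
    for (const auto& doc : docs)
        for (const auto& kv : map_fn(doc))
            groups[kv.first].push_back(kv.second);

    // Reduce phase: one call per unique key.
    for (const auto& g : groups)
        std::cout << g.first << " " << reduce_fn(g.second) << "\n";
    // Output: bob 1, run 1, see 2, spot 1, throw 1
}
```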

7 Grep
– Input consists of (url+offset, single line)
– map(key=url+offset, val=line): if the contents match the regexp, emit (line, "1")
– reduce(key=line, values=uniq_counts): don't do anything; just emit the line
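
A sketch of the same idea for grep, again with illustrative names rather than a real MapReduce API: map emits a line only when it matches the pattern, and reduce is the identity.

```cpp
// Sketch of distributed grep as a map/reduce pair (in-memory, single process).
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// map(key=url+offset, val=line): emit the line if it matches the pattern.
std::vector<std::string> map_fn(const std::string& line, const std::regex& pattern) {
    std::vector<std::string> emitted;
    if (std::regex_search(line, pattern)) emitted.push_back(line);
    return emitted;
}

// reduce(key=line, values=counts): identity -- just re-emit the line.
std::string reduce_fn(const std::string& line) { return line; }

int main() {
    std::regex pattern("spot");
    std::vector<std::string> lines = {"see bob throw", "see spot run"};
    for (const auto& line : lines)
        for (const auto& match : map_fn(line, pattern))
            std::cout << reduce_fn(match) << "\n";   // prints "see spot run"
}
```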

8 Reverse Web-Link Graph
Map:
– For each URL "source" that links to a URL "target", output (target, source) pairs
Reduce:
– Concatenate the list of all source URLs for each target
– Outputs (target, list(sources)) pairs
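
An in-memory sketch of the reverse web-link job; the page names and data layout are made up for illustration.

```cpp
// Sketch of the reverse web-link graph job.
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main() {
    // Input: (source page, pages it links to).
    std::vector<std::pair<std::string, std::vector<std::string>>> pages = {
        {"a.html", {"b.html", "c.html"}},
        {"b.html", {"c.html"}},
    };

    // Map: for each link source -> target, emit (target, source).
    // Shuffle/Reduce: concatenate all sources that point at a target.
    std::map<std::string, std::vector<std::string>> reversed;
    for (const auto& page : pages)
        for (const auto& target : page.second)
            reversed[target].push_back(page.first);

    // Output: (target, list(sources)) pairs.
    for (const auto& entry : reversed) {
        std::cout << entry.first << " <-";
        for (const auto& src : entry.second) std::cout << " " << src;
        std::cout << "\n";
    }
    // b.html <- a.html ; c.html <- a.html b.html
}
```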

9 Job Processing
A JobTracker coordinates TaskTrackers 0–5:
1. The client submits a "grep" job, indicating the code and input files
2. The JobTracker breaks the input file into k chunks (in this case 6) and assigns work to TaskTrackers
3. After map(), TaskTrackers exchange map output to build the reduce() keyspace
4. The JobTracker breaks the reduce() keyspace into m chunks (in this case 6) and assigns work
5. reduce() output may go to NDFS

10 Execution

11 Parallel Execution

12–17 (Image-only slides.)

18 Refinement: Locality Optimization
Master scheduling policy:
– Asks GFS for the locations of the replicas of the input file blocks
– Schedules map tasks so that a replica of the input block is on the same machine or the same rack
Effect:
– Thousands of machines read input at local-disk speed; without this, rack switches limit the read rate
Combiner:
– Useful for saving network bandwidth (a sketch follows below)
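
A sketch of what a combiner does for the word-count job above: it runs a local partial reduce over one mapper's output so that only per-word partial sums cross the network. The types mirror the earlier word-count sketch and are illustrative.

```cpp
// Sketch of a combiner: partially reduce map output on the mapper node
// before it crosses the network (same shape as reduce for word count).
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Local combine: collapse repeated (word, 1) pairs into (word, partial_sum).
std::vector<std::pair<std::string, int>> combine(
        const std::vector<std::pair<std::string, int>>& map_output) {
    std::map<std::string, int> partial;
    for (const auto& kv : map_output) partial[kv.first] += kv.second;
    return {partial.begin(), partial.end()};
}

int main() {
    // One mapper's raw output for "see bob see spot see".
    std::vector<std::pair<std::string, int>> raw = {
        {"see", 1}, {"bob", 1}, {"see", 1}, {"spot", 1}, {"see", 1}};
    for (const auto& kv : combine(raw))
        std::cout << kv.first << " " << kv.second << "\n";
    // Only (bob, 1), (see, 3), (spot, 1) are sent on to the reducers.
}
```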

19 Map-Reduce for Data-Parallel ML
Map-Reduce is excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics
Graph-Parallel: Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
Is there more to Machine Learning?

20 Properties of Graph Parallel Algorithms
– Dependency graph (e.g., what I like depends on what my friends like)
– Factored computation
– Iterative computation

21 Map-Reduce for Data-Parallel ML
Map-Reduce is excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics
Graph-Parallel (Map Reduce?): Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso

22 Why not use Map-Reduce for Graph Parallel Algorithms?

23 Data Dependencies
Map-Reduce assumes independent data rows and does not efficiently express dependent data:
– The user must code substantial data transformations
– Costly data replication

24 Iterative Algorithms
Map-Reduce does not efficiently express iterative algorithms: each iteration ends at a barrier, so a single slow processor stalls every CPU.
(Diagram: data flowing through CPUs 1–3 over repeated iterations, with a barrier after each iteration.)

25 MapAbuse: Iterative MapReduce
Only a subset of the data needs computation in each iteration, yet every pass reprocesses all of it.
(Diagram: the same barrier-separated iterations as the previous slide.)

26 MapAbuse: Iterative MapReduce
The system is not optimized for iteration: every iteration pays a job startup penalty and a disk penalty for rereading and rewriting the data.
(Diagram: barrier-separated iterations, each annotated with a startup penalty and a disk penalty.)
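
A sketch of the driver pattern this slide criticizes: an outer loop that launches one full MapReduce job per iteration and materializes all state between jobs. launch_job() and converged() are hypothetical stand-ins, not real Hadoop calls.

```cpp
// The "iterative MapReduce" driver pattern, sketched with placeholder calls.
#include <iostream>
#include <string>

void launch_job(const std::string& input, const std::string& output) {
    std::cout << "run job: " << input << " -> " << output << "\n";   // hypothetical
}
bool converged(int iteration) { return iteration >= 2; }             // pretend convergence

int main() {
    std::string state = "dfs://state_0";
    for (int i = 0; i < 10; ++i) {
        std::string next = "dfs://state_" + std::to_string(i + 1);
        // Each pass pays the full job startup penalty and writes all state back
        // to the distributed file system (the disk penalty), even for data that
        // no longer changes.
        launch_job(state, next);
        state = next;
        if (converged(i)) break;
    }
}
```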

27 Map-Reduce for Data-Parallel ML
Map-Reduce is excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics
Graph-Parallel (Map Reduce? GraphLab): Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso

28 The GraphLab Framework
– Graph-based data representation
– Update functions (user computation)
– Scheduler
– Consistency model

29 Data Graph
A graph with arbitrary data (C++ objects) associated with each vertex and edge.
Example (graph: social network):
– Vertex data: user profile text, current interest estimates
– Edge data: similarity weights
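
A sketch of what the vertex and edge data on this slide might look like as C++ types; the names are assumptions for illustration, not GraphLab's actual classes.

```cpp
// Illustrative C++ types for the social-network data graph on this slide.
#include <string>
#include <vector>

struct VertexData {               // one user
    std::string profile_text;     // user profile text
    std::vector<double> likes;    // current interest estimates
};

struct EdgeData {                 // one friendship edge
    double similarity;            // similarity weight between the two users
};

int main() {
    VertexData alice{"enjoys hiking and machine learning", {0.7, 0.3}};
    EdgeData alice_bob{0.85};
    (void)alice;
    (void)alice_bob;
}
```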

30 Implementing the Data Graph (multicore setting, in memory)
Relatively straightforward:
– vertex_data(vid) → data
– edge_data(vid, vid) → data
– neighbors(vid) → vid_list
Challenge: fast lookup, low overhead
Solution: dense data structures, fixed Vdata & Edata types, immutable graph structure
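
One way to get the dense, immutable layout the slide asks for is a compressed-sparse-row (CSR) style graph; this is a sketch of the idea, not GraphLab's implementation.

```cpp
// CSR-style layout: dense arrays indexed by vertex id, fixed data types,
// immutable structure after construction.
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

template <typename VData, typename EData>
struct DataGraph {
    std::vector<VData> vertex_data;    // vertex_data(vid) -> data
    std::vector<EData> edge_data;      // edge data, indexed by edge id
    std::vector<uint32_t> row_begin;   // offsets into adjacency, size |V| + 1
    std::vector<uint32_t> adjacency;   // neighbor vertex ids, size |E|

    // neighbors(vid) -> contiguous range of neighbor ids
    std::pair<const uint32_t*, const uint32_t*> neighbors(uint32_t vid) const {
        return {adjacency.data() + row_begin[vid],
                adjacency.data() + row_begin[vid + 1]};
    }
};

int main() {
    // Two vertices, one directed edge 0 -> 1.
    DataGraph<double, double> g{{1.0, 2.0}, {0.5}, {0, 1, 1}, {1}};
    auto range = g.neighbors(0);
    std::cout << "vertex 0 has " << (range.second - range.first) << " neighbor\n";
}
```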

31 The GraphLab Framework (recap): graph-based data representation, update functions (user computation), scheduler, consistency model

32 Update Functions
An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of that vertex.
label_prop(i, scope) {
  // Get neighborhood data
  (Likes[i], W_ij, Likes[j]) ← scope;
  // Update the vertex data
  Likes[i] ← Σ_j W_ij × Likes[j];
  // Reschedule neighbors if needed
  if Likes[i] changes then reschedule_neighbors_of(i);
}
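
A runnable C++ sketch of the label-propagation update function above, over a simple adjacency-list graph. reschedule_neighbors_of() is a placeholder for the framework's scheduler hook, and none of these names are GraphLab's real API.

```cpp
// A label-propagation update function over a simple adjacency-list graph.
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

struct Graph {
    std::vector<double> likes;                      // vertex data: Likes[i]
    std::vector<std::vector<uint32_t>> neighbors;   // graph structure
    std::vector<std::vector<double>> weights;       // edge data: W_ij
};

void reschedule_neighbors_of(uint32_t i) {          // placeholder scheduler hook
    std::cout << "reschedule neighbors of " << i << "\n";
}

void label_prop(Graph& g, uint32_t i) {
    double old_value = g.likes[i];
    // Update the vertex data: weighted sum of the neighbors' current values.
    double sum = 0.0;
    for (size_t k = 0; k < g.neighbors[i].size(); ++k)
        sum += g.weights[i][k] * g.likes[g.neighbors[i][k]];
    g.likes[i] = sum;
    // Reschedule neighbors only if this vertex's value actually changed.
    if (std::fabs(g.likes[i] - old_value) > 1e-9) reschedule_neighbors_of(i);
}

int main() {
    Graph g{{1.0, 0.0, 0.5}, {{1, 2}, {0}, {0}}, {{0.5, 0.5}, {1.0}, {1.0}}};
    label_prop(g, 0);
    std::cout << "Likes[0] = " << g.likes[0] << "\n";   // now 0.25
}
```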

33 The GraphLab Framework (recap): graph-based data representation, update functions (user computation), scheduler, consistency model

34 The Scheduler
The scheduler determines the order in which vertices are updated. CPU 1 and CPU 2 repeatedly pull vertices from the scheduler and apply update functions to them; update functions may in turn add vertices back to the scheduler. The process repeats until the scheduler is empty.
(Diagram: two CPUs pulling vertices a–k from a shared scheduler queue.)
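
A minimal sketch of the scheduler contract described here: a work queue of vertex ids that update functions can push to, drained until empty. Real GraphLab schedulers run in parallel and are more sophisticated; this version is sequential and illustrative.

```cpp
// Sequential sketch of a FIFO scheduler loop.
#include <iostream>
#include <queue>
#include <unordered_set>

int main() {
    std::queue<int> scheduler;        // FIFO order in this sketch
    std::unordered_set<int> queued;   // avoid scheduling a vertex twice

    auto schedule = [&](int v) {
        if (queued.insert(v).second) scheduler.push(v);
    };

    // Seed the scheduler with a few initial vertices.
    schedule(0);
    schedule(1);
    schedule(2);

    while (!scheduler.empty()) {
        int v = scheduler.front();
        scheduler.pop();
        queued.erase(v);
        std::cout << "update(" << v << ")\n";
        // A real update function would call schedule(neighbor) here whenever
        // its vertex data changes; the loop runs until no work is left.
    }
}
```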

35 Choosing a Schedule
GraphLab provides several different schedulers:
– Round Robin: vertices are updated in a fixed order
– FIFO: vertices are updated in the order they are added
– Priority: vertices are updated in priority order
The choice of schedule affects the correctness and parallel performance of the algorithm. Obtain different algorithms by simply changing a flag!
--scheduler=roundrobin --scheduler=fifo --scheduler=priority

36 The GraphLab Framework (recap): graph-based data representation, update functions (user computation), scheduler, consistency model

37 Ensuring Race-Free Code How much can computation overlap?

38 GraphLab Ensures Sequential Consistency
For each parallel execution, there exists a sequential execution of update functions which produces the same result.
(Diagram: a parallel timeline on CPU 1 and CPU 2 next to an equivalent single-CPU sequential timeline.)

39 Common Problem: Write-Write Race
Processors running adjacent update functions simultaneously modify shared data: CPU 1 and CPU 2 each write a value, and the final value is whichever write happens to land last.
(Diagram: CPU 1 and CPU 2 both writing to shared data on an edge.)

40 Consistency Rules
Guaranteed sequential consistency for all update functions.
(Diagram: update-function scopes over the data graph.)

41 Full Consistency (image-only slide: under full consistency, update functions with overlapping scopes never run concurrently)

42 Obtaining More Parallelism (image-only slide)

43 Edge Consistency
(Diagram: under edge consistency, CPU 1 and CPU 2 each write their own vertex and adjacent edges while a shared neighboring vertex remains a safe read.)

44 Consistency Through R/W Locks
Read/write locks:
– Full consistency: write-lock the vertex, its adjacent edges, and its neighbors
– Edge consistency: write-lock the vertex and its adjacent edges; read-lock its neighbors
Canonical lock ordering is used to avoid deadlock.
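
A sketch of edge consistency enforced with reader/writer locks taken in canonical (ascending vertex-id) order, so adjacent updates cannot deadlock; this illustrates the rule on the slide and is not GraphLab's actual locking code.

```cpp
// Edge consistency via R/W locks acquired in canonical (ascending id) order.
#include <algorithm>
#include <mutex>
#include <shared_mutex>
#include <vector>

std::vector<std::shared_mutex> locks(5);   // one lock per vertex

// Edge-consistency locking for an update on vertex v with neighbors nbrs:
// write-lock v, read-lock each neighbor, always acquiring in ascending id order.
void edge_consistent_update(int v, std::vector<int> nbrs) {
    nbrs.push_back(v);
    std::sort(nbrs.begin(), nbrs.end());   // canonical lock ordering
    std::vector<std::shared_lock<std::shared_mutex>> read_locks;
    std::unique_lock<std::shared_mutex> write_lock;
    for (int u : nbrs) {
        if (u == v) write_lock = std::unique_lock<std::shared_mutex>(locks[u]);
        else        read_locks.emplace_back(locks[u]);
    }
    // ... apply the update function to v's scope here ...
}   // all locks are released automatically when the lock guards go out of scope

int main() { edge_consistent_update(2, {1, 3}); }
```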

45 Consistency Through Scheduling
Edge consistency model: two vertices can be updated simultaneously if they do not share an edge.
Graph coloring: two vertices can be assigned the same color if they do not share an edge.
Execute one color per phase, with a barrier between phases (Phase 1, Barrier, Phase 2, Barrier, Phase 3, ...).
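
A sketch of this scheduling approach: greedily color the graph, then update one color class per phase with a barrier between phases. Any two vertices in the same phase share no edge, so the edge consistency model holds without per-vertex locks. The graph and output are illustrative.

```cpp
// Greedy graph coloring followed by color-by-color ("phase") execution.
#include <algorithm>
#include <iostream>
#include <set>
#include <vector>

int main() {
    // Small undirected graph as adjacency lists.
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 2}, {0, 1, 3}, {2}};

    // Greedy coloring: give each vertex the smallest color unused by its neighbors.
    std::vector<int> color(adj.size(), -1);
    for (size_t v = 0; v < adj.size(); ++v) {
        std::set<int> used;
        for (int u : adj[v])
            if (color[u] >= 0) used.insert(color[u]);
        int c = 0;
        while (used.count(c)) ++c;
        color[v] = c;
    }

    // Execute one color per phase; each phase could update its vertices in
    // parallel, with a barrier before the next phase starts.
    int num_colors = 1 + *std::max_element(color.begin(), color.end());
    for (int c = 0; c < num_colors; ++c) {
        std::cout << "phase " << c << ":";
        for (size_t v = 0; v < adj.size(); ++v)
            if (color[v] == c) std::cout << " update(" << v << ")";
        std::cout << "  <barrier>\n";
    }
}
```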

