
1 Extending Task Parallelism for Frequent Pattern Mining. Prabhanjan Kambadur, Amol Ghoting, Anshul Gupta and Andrew Lumsdaine. International Conference on Parallel Computing (ParCO), 2009.

2 Overview
Introduce Frequent Pattern Mining (FPM): formal definition; the Apriori algorithm for FPM.
Task-parallel implementation of Apriori: requirements for efficient parallelization.
Cilk-style task scheduling: shortcomings w.r.t. Apriori.
Clustered task scheduling policy.
Results.

3 FPM: A Formal Definition
Let I = {i₁, i₂, …, iₙ} be a set of n items, and let D = {T₁, T₂, …, Tₘ} be a set of m transactions such that each Tⱼ ⊆ I.
A set i ⊆ I of size k is called a k-itemset.
The support of an itemset i is ∑ⱼ₌₁..ₘ (1 : i ⊆ Tⱼ), i.e., the number of transactions in D that contain i as a subset.
The Frequent Pattern Mining problem is to find all itemsets i ⊆ I whose support is ≥ a user-supplied threshold.
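To make the definition concrete, here is a minimal C++ sketch; the names Itemset, Database and support are illustrative, not from the paper. Support simply counts the transactions that contain the candidate itemset.

    #include <algorithm>
    #include <cstddef>
    #include <set>
    #include <vector>

    using Itemset  = std::set<char>;         // e.g., {'A','B'}
    using Database = std::vector<Itemset>;   // D = {T1, ..., Tm}

    // Support of itemset i: the number of transactions in D that contain i.
    std::size_t support(const Itemset& i, const Database& D) {
        std::size_t count = 0;
        for (const Itemset& T : D)
            if (std::includes(T.begin(), T.end(), i.begin(), i.end()))
                ++count;
        return count;
    }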

4 Apriori Algorithm for FPM
Transaction Database:
TID  Items
1    A B C E
2    B C A F
3    G H A C
4    A D B H
5    E D A B
6    A B C D
7    B D A G
8    A C D B

5 Apriori Algorithm
From the transaction database, build a TID list per item (the set of transactions containing it):
A: 1 2 3 4 5 6 7 8
B: 1 2 4 5 6 7 8
C: 1 2 3 6 8
D: 4 5 6 7 8
E: 1 5
F: 2
G: 3 7
H: 3 4

6 Apriori Algorithm for FPM
Join: the TID list of a candidate itemset is the intersection of its generators' TID lists.
A: 1 2 3 4 5 6 7 8 joined with B: 1 2 4 5 6 7 8 gives AB: 1 2 4 5 6 7 8, so Support(AB) = 7/8 = 87.5%.
C: 1 2 3 6 8 joined with D: 4 5 6 7 8 gives CD: 6 8, so Support(CD) = 2/8 = 25%.
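The join step can be sketched as a sorted TID-list intersection; the candidate's support is the length of the result. TidList and join are illustrative names, not from the paper.

    #include <algorithm>
    #include <iterator>
    #include <vector>

    using TidList = std::vector<int>;   // sorted IDs of transactions containing an itemset

    // Join step: the TID list of X ∪ Y is the intersection of the TID lists of X and Y.
    TidList join(const TidList& x, const TidList& y) {
        TidList out;
        std::set_intersection(x.begin(), x.end(), y.begin(), y.end(),
                              std::back_inserter(out));
        return out;
    }
    // From the slide: A ∩ B has 7 TIDs (7/8 = 87.5%); C ∩ D = {6,8} (2/8 = 25%).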

7 Apriori Algorithm for FPM
[Figure: the Apriori enumeration over the transaction database expressed as tasks. With a support threshold of 37.5% (3/8), items A, B, C, D are frequent; each candidate extension (AB, AC, AD, BC, BD, then ABC, ABD) is spawned as a task, with a wait-all synchronizing the children of each node, while CD falls below the threshold.]
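A hedged sketch of this spawn/wait-all structure, using std::async as a stand-in for the Cilk-style runtime the paper actually targets (the paper does not use std::async). It reuses the Itemset/TidList/join sketches above; Node, mine and minsup (the absolute support threshold) are illustrative names.

    #include <cstddef>
    #include <future>
    #include <vector>

    // A node bundles an itemset with its TID list.
    struct Node { Itemset items; TidList tids; };

    // Extend 'parent' by each right sibling; every frequent extension becomes a
    // child task ("spawn"), and the parent blocks on all children ("wait all").
    void mine(Node parent, std::vector<Node> siblings, std::size_t minsup) {
        std::vector<Node> frequent;
        for (const Node& s : siblings) {
            Node child;
            child.items = parent.items;
            child.items.insert(s.items.begin(), s.items.end());
            child.tids = join(parent.tids, s.tids);
            if (child.tids.size() >= minsup) frequent.push_back(child);
        }
        std::vector<std::future<void>> spawned;   // spawned child tasks
        for (std::size_t i = 0; i < frequent.size(); ++i) {
            std::vector<Node> sibs(frequent.begin() + i + 1, frequent.end());
            spawned.push_back(std::async(std::launch::async, mine, frequent[i], sibs, minsup));
        }
        for (auto& f : spawned) f.get();          // wait all
    }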

8 Cilk-style parallelization
[Figure: task tree executed by one thread, annotated with order of discovery and order of completion.] With a single thread, tasks are discovered depth-first and finish in post-order; the thread-local deque holds tasks at depths n, n-1, n-2, … as the recursion proceeds.

9 Cilk-style parallelization
[Figure: two threads with thread-local deques; an idle thread steals the shallowest task from the victim's deque, e.g., Steal(n-1), then Steal(n-3).]
1. Breadth-first theft.
2. Steal one task at a time.
3. Stealing is expensive.
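The mechanism behind this figure is the classic thread-local deque: the owner works LIFO at one end (depth-first), thieves take single tasks from the other end (breadth-first theft). A minimal sketch, assuming a coarse lock rather than the lock-free deque real Cilk-style runtimes use; WorkDeque is an illustrative name.

    #include <deque>
    #include <mutex>
    #include <optional>
    #include <utility>

    template <typename Task>
    class WorkDeque {
        std::deque<Task> tasks_;
        std::mutex mutex_;
    public:
        void push(Task t) {                         // owner: push new (deepest) task
            std::lock_guard<std::mutex> g(mutex_);
            tasks_.push_back(std::move(t));
        }
        std::optional<Task> pop() {                 // owner: pop newest task (LIFO)
            std::lock_guard<std::mutex> g(mutex_);
            if (tasks_.empty()) return std::nullopt;
            Task t = std::move(tasks_.back()); tasks_.pop_back(); return t;
        }
        std::optional<Task> steal() {               // thief: oldest task, one at a time
            std::lock_guard<std::mutex> g(mutex_);
            if (tasks_.empty()) return std::nullopt;
            Task t = std::move(tasks_.front()); tasks_.pop_front(); return t;
        }
    };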

10 Efficient Parallelization of FPM
Shortcomings of Cilk-style scheduling w.r.t. FPM:
1. Exploits data locality only between parent and child tasks.
2. Stealing does not consider data locality.
3. Tasks are stolen one at a time.
What FPM needs: tasks with overlapping memory accesses (e.g., AB, AC, AD, or ABC, ABD) should be
1. executed by the same thread, and
2. stolen together by the same thread.

11 Clustered Scheduling Policy
Cluster k-itemsets based on their common (k-1)-prefix: AB, AC, AD share prefix A; ABC, ABD share prefix AB.
Each thread keeps a thread-local hash table of task buckets instead of a thread-local deque.
1. Hash table: std::hash_map.
2. Hash function: std::hash; the bucket key is the xor of the prefix items' hashes, e.g., Hash(A) for AB/AC/AD and Hash(A) xor Hash(B) for ABC/ABD.
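A minimal sketch of the clustered container, assuming items are single characters and using std::unordered_map in place of the (pre-C++11) std::hash_map named on the slide; prefix_key and ClusteredQueue are illustrative names, not PFunc's.

    #include <cstddef>
    #include <deque>
    #include <functional>
    #include <string>
    #include <unordered_map>

    // Key of an itemset = xor of the hashes of its (k-1)-prefix, so AB, AC, AD
    // map to Hash(A) and ABC, ABD map to Hash(A) ^ Hash(B).
    std::size_t prefix_key(const std::string& itemset) {
        std::size_t key = 0;
        for (std::size_t i = 0; i + 1 < itemset.size(); ++i)
            key ^= std::hash<char>{}(itemset[i]);
        return key;
    }

    // One such map per thread replaces the Cilk-style deque; each bucket holds
    // all ready tasks that share a prefix.
    template <typename Task>
    using ClusteredQueue = std::unordered_map<std::size_t, std::deque<Task>>;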

12 Clustered Scheduling Policy
[Figure: thread-local hash tables for two threads; AB, AC, AD cluster under the key Hash(A), while ABC, ABD cluster under the key Hash(A) xor Hash(B).]

13 Clustered Scheduling Policy
Stealing: a thief steals an entire bucket of tasks at once.
[Figure: one thread steals the whole Hash(A) bucket (AB, AC, AD) from the other thread's hash table, while the Hash(A) xor Hash(B) bucket (ABC, ABD) stays put.]

14 Where does PFunc fit in?
Customizable task scheduling and priorities.
Cilk-style, LIFO, FIFO, and priority-based scheduling built in.
Custom scheduling policies are simple to implement, e.g., the clustered scheduling policy.
The policy is chosen at compile time, much like STL containers (e.g., std::vector).
namespace pfunc {
  struct hashS : public schedS {};
  template <> struct scheduler<hashS> { … };
} // namespace pfunc
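The compile-time selection follows the familiar STL tag/policy-dispatch pattern. A hedged sketch of that general pattern only (schedS and hashS appear on the slide; cilkS and the specialization bodies are illustrative, and this is not PFunc's real interface):

    // Illustrative pattern, not PFunc's actual headers: a scheduling policy is a
    // tag type, and the scheduler template is specialized per tag, so the policy
    // is fixed at compile time just like choosing an STL container.
    struct schedS {};                      // base tag for scheduling policies
    struct cilkS : public schedS {};       // built-in Cilk-style policy (illustrative)
    struct hashS : public schedS {};       // new clustered, hash-table-based policy

    template <typename Policy> struct scheduler;   // primary template

    template <> struct scheduler<cilkS> { /* thread-local deques, steal one task */ };
    template <> struct scheduler<hashS> { /* thread-local hash tables, steal a bucket */ };

    // Selecting the policy:
    //   scheduler<hashS> sched;   // clustered scheduling, resolved at compile time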

15 So, how does it work?
Select the scheduling policy (hash table-based) and the priority type (a reference to the itemset) at compile time.
Program:
  Task T; SetPriority(T, ref(ABD)); Spawn(T);
Scheduler:
  GetPriority(T) returns the itemset (e.g., ABC);
  generate the hash key, e.g., Hash(A) xor Hash(B);
  place the task in the matching bucket of the task queue (e.g., ABC and ABD in one bucket, BCD and BCE in another).
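Putting the pieces together, a hedged sketch of this flow, reusing prefix_key and ClusteredQueue from the earlier sketch; SetPriority/Spawn above are the slide's PFunc calls, while spawn_clustered below is purely illustrative.

    #include <cstddef>
    #include <string>
    #include <utility>

    // Illustrative only (not PFunc's API): the itemset attached as the task's
    // "priority" is turned into a bucket key by hashing its prefix, and the
    // task is placed in the matching cluster of the thread-local queue.
    template <typename Task>
    void spawn_clustered(ClusteredQueue<Task>& local_queue, Task task,
                         const std::string& itemset) {
        std::size_t key = prefix_key(itemset);        // e.g., Hash(A) ^ Hash(B) for "ABD"
        local_queue[key].push_back(std::move(task));  // ABC and ABD land in the same bucket
    }
    // Usage sketch: spawn_clustered(my_queue, Task{/*...*/}, "ABD");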

16 Performance Analysis
[Chart omitted.] 8 threads; dual AMD 8356, Linux 2.6.24, GCC 4.3.2.

17 Performance Analysis - IPC (higher is better)
8 threads; dual AMD 8356, Linux 2.6.24, GCC 4.3.2
Dataset        Support   IPC (Cilk)   IPC (Clustered)
accidents      0.25      0.595        0.604
chess          0.6       0.560        0.669
connect        0.8       0.543        0.809
kosarak        0.0013    0.692        0.717
pumsb          0.75      0.494        0.719
pumsb_star     0.3       0.527        0.698
mushroom       0.10      0.570        0.705
T40I10D100K    0.005     0.627        0.727
T10I4D100K     0.00006   0.556        0.716

18 Performance Analysis – L1 DTLB Misses (lower is better)
8 threads; dual AMD 8356, Linux 2.6.24, GCC 4.3.2
Dataset        Support   Cilk DTLB L1M/L2H   Clustered DTLB L1M/L2H
accidents      0.25      0.000048            0.000046
chess          0.6       0.000797            0.000242
connect        0.8       0.000249            0.000112
kosarak        0.0013    0.000400            0.000185
pumsb          0.75      0.000230            0.000114
pumsb_star     0.3       0.000315            0.000145
mushroom       0.10      0.000477            0.000267
T40I10D100K    0.005     0.000368            0.000305
T10I4D100K     0.00006   0.000218            0.000144

19 Performance Analysis – L2 DTLB Misses (lower is better)
8 threads; dual AMD 8356, Linux 2.6.24, GCC 4.3.2
Dataset        Support   Cilk DTLB L1M/L2M   Clustered DTLB L1M/L2M
accidents      0.25      0.000161            0.000110
chess          0.6       0.001006            0.000032
connect        0.8       0.001204            0.000141
kosarak        0.0013    0.000659            0.000123
pumsb          0.75      0.001276            0.000126
pumsb_star     0.3       0.001082            0.000114
mushroom       0.10      0.000950            0.000022
T40I10D100K    0.005     0.000900            0.000021
T10I4D100K     0.00006   0.000876            0.000044

20 Conclusions
For task-parallel FPM, clustered scheduling outperforms Cilk-style scheduling: it exploits data locality and provides a better work-stealing policy.
PFunc provides support for facile customization: task scheduling policy, task priorities, etc.
Being released under COIN-OR (Eclipse Public License version 1.0).
Future work: task queues based on multi-dimensional index structures, e.g., k-d trees.

21 Fibonacci 37
Threads   Cilk (secs)   PFunc/Cilk   TBB/Cilk   PFunc/TBB
1         2.17          2.2718       4.431      0.5004
2         1.15          2.1135       4.1924     0.5041
4         0.55          2.2131       4.4183     0.5009
8         0.28          2.2114       4.9839     0.4437
16        0.15          2.4944       5.9370     0.4201
PFunc is roughly 2x faster than TBB and roughly 2x slower than Cilk, but provides more flexibility. Fibonacci is the worst-case behavior!

