Frequent Itemset Mining on Graphics Processors Wenbin Fang, Mian Lu, Xiangye Xiao, Bingsheng He 1, Qiong Luo Hong Kong Univ. of Sci. and Tech. Microsoft Research Asia 1 Presenter: Wenbin Fang
2/33 Outline Contribution Introduction Design Evaluation Conclusion
3/33 Contribution
Accelerate the Apriori algorithm for Frequent Itemset Mining using Graphics Processors (GPUs). Two GPU implementations:
1. Pure Bitmap-based implementation (PBI): processing entirely on the GPU.
2. Trie-based implementation (TBI): GPU/CPU co-processing.
4/33 Frequent Itemset Mining (FIM)
Finding groups of items, or itemsets, that co-occur frequently in a transaction database.
Transactions: TID 1: A, B, C, D; TID 2: A, B, D; TID 3: A, C, D; TID 4: C, D.
Minimum support: 2. Frequent 1-itemsets: A: 3, B: 2, C: 3, D: 4.
5/33 Frequent Itemset Mining (FIM)
Aims at finding groups of items, or itemsets, that co-occur frequently in a transaction database. (Same transaction database, minimum support 2.)
Frequent 1-itemsets: A, B, C, D. Frequent 2-itemsets: AB: 2, AC: 2, AD: 3, BD: 2, CD: 3.
6/33 Frequent Itemset Mining (FIM)
Aims at finding groups of items, or itemsets, that co-occur frequently in a transaction database. (Same transaction database, minimum support 2.)
1-itemsets: A, B, C, D. 2-itemsets: AB, AC, AD, BD, CD. 3-itemsets: ABD, ACD.
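The counts on these three slides can be checked with a brute-force sketch (Python is used here purely for illustration; it is not the paper's code, and Apriori exists precisely to avoid this exhaustive enumeration at scale):

```python
from itertools import combinations

# Toy transaction database and minimum support from the slides.
transactions = [
    {"A", "B", "C", "D"},
    {"A", "B", "D"},
    {"A", "C", "D"},
    {"C", "D"},
]
min_support = 2

def support(itemset, db):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in db if itemset <= t)

# Brute-force enumeration of all frequent itemsets (fine at toy scale).
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = support(set(combo), transactions)
        if s >= min_support:
            frequent["".join(combo)] = s
```

Running it reproduces the slides' results, e.g. AD with support 3 and the two frequent 3-itemsets ABD and ACD.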
7/33 Graphics Processors (GPUs)
Exist in commodity machines, mainly for graphics rendering. Specialized for compute-intensive, highly data-parallel applications. Compared with CPUs, GPUs provide 10x the computational horsepower and 10x higher memory bandwidth.
(CPU vs. GPU diagram, from the NVIDIA CUDA Programming Guide.)
8/33 Programming on GPUs OpenGL/DirectX AMD CTM NVIDIA CUDA The many-core architecture model of the GPU SIMD parallelism (Single Instruction, Multiple Data)
9/33 Hierarchical multi-threading in NVIDIA CUDA
Threads are grouped into thread blocks (configurable # of threads per block and # of thread blocks); a warp = 32 GPU threads is the SIMD scheduling unit.
10/33 General Purpose GPU Computing (GPGPU) Applications utilizing GPUs Scientific computing Molecular Dynamics Simulation Weather forecasting Linear algebra Computational finance Database applications Basic DB Operators [SIGMOD’04] Sorting [SIGMOD’06] Join [SIGMOD’08]
11/33 Our work
As a first step, we consider GPU-based Apriori, with the intention of extending to another efficient FIM algorithm, FP-growth. Why Apriori?
1. A classic algorithm for mining frequent itemsets.
2. Also applied in other data mining tasks, e.g., clustering and functional dependency.
12/33 The Apriori Algorithm
Input: 1) transaction database; 2) minimum support. Output: all frequent itemsets.
L_1 = {all frequent 1-itemsets}
k = 2
while (L_{k-1} != empty) {
  // Generate candidate k-itemsets.
  C_k <- self-join on L_{k-1}
  C_k <- (k-1)-subset test on C_k
  // Generate frequent k-itemsets.
  L_k <- support counting on C_k
  k += 1
}
Flow: frequent 1-itemsets -> candidate 2-itemsets -> frequent 2-itemsets -> candidate 3-itemsets -> frequent 3-itemsets -> ... -> candidate k-itemsets -> frequent k-itemsets.
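The pseudocode above can be rendered as a minimal runnable version (a sequential Python sketch of the same level-wise loop, not the paper's GPU implementation):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: self-join, (k-1)-subset test, support counting."""
    # L1: frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s for s, c in counts.items() if c >= min_support}
    frequent = set(L)
    k = 2
    while L:
        # Self-join: union two (k-1)-itemsets that share k-2 items.
        candidates = {a | b for a in L for b in L if len(a | b) == k}
        # Subset test: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # Support counting: keep candidates meeting the minimum support.
        L = {c for c in candidates
             if sum(1 for t in transactions if c <= t) >= min_support}
        frequent |= L
        k += 1
    return frequent
```

On the slides' toy database with minimum support 2, this returns the eleven frequent itemsets, including ABD and ACD.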
13/33 Outline Contribution Introduction Design Evaluation Conclusion
14/33 GPU-based Apriori
Input: 1) transaction database; 2) minimum support. Output: all frequent itemsets. The level-wise loop is the same as in the Apriori pseudocode: self-join, (k-1)-subset test, support counting.
Pure Bitmap-based Impl. (PBI): itemsets as bitmaps, candidate generation on the GPU; transactions as bitmaps, support counting on the GPU.
Trie-based Impl. (TBI): itemsets in a trie, candidate generation on the CPU; transactions as bitmaps, support counting on the GPU.
15/33 Horizontal and vertical data layout
Horizontal layout (support counting scans all transactions):
TID 1: A, B, C, D; TID 2: A, B, D; TID 3: A, C, D; TID 4: C, D.
Vertical layout (support counting works on specific itemsets):
AB: 1, 2; AC: 1, 3; AD: 1, 2, 3; BD: 1, 2; CD: 1, 3, 4; ABD: 1, 2; ACD: 1, 3.
1. Intersect two TID lists. 2. Count the number of transactions in the intersection result.
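The two vertical-layout counting steps can be sketched as follows (Python sets stand in for TID lists; the values come from the example database):

```python
# Vertical layout: each itemset maps to the set of TIDs containing it.
tid = {"AB": {1, 2}, "AD": {1, 2, 3}}

# Support of the candidate ABD:
abd_tids = tid["AB"] & tid["AD"]   # 1. intersect the two TID lists
support = len(abd_tids)            # 2. count the surviving transactions
```

The intersection {1, 2} says ABD occurs in transactions 1 and 2, so its support is 2.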
16/33 Bitmap representation for transactions
Rows are itemsets, columns are transactions T1..T4:
AB: 1100; AC: 1010; AD: 1110; BD: 1100; CD: 1011.
Intersection = bitwise AND operation. Counting = number of 1s in a string of bits.
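The same intersection as bit operations (one bit per transaction; putting T1 in the most significant bit is an illustrative choice, not mandated by the slides):

```python
# Transaction bitmaps: one row per itemset, bits T1..T4 from high to low.
bm = {"AB": 0b1100, "AC": 0b1010, "AD": 0b1110, "CD": 0b1011}

inter = bm["AB"] & bm["AD"]        # intersection = bitwise AND
support = bin(inter).count("1")    # counting = number of 1s in the bit string
```

AB AND AD gives 1100, whose two set bits are the support of ABD.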
17/33 Lookup table
The lookup table maps every 16-bit value to its count of 1s: index (0) -> 0, (1) -> 1, ..., (65534) -> 15, (65535) -> 16. With 2^16 one-byte entries it fits in constant memory: 1. cacheable; 2. 64 KB; 3. shared by all GPU threads.
Example: 2 = # of 1s = TABLE[12] (decimal 12 = binary 1100, a string of bits).
Candidate bitmaps for the running example: ABD: 1100; ACD: 1010.
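A sketch of the table and its use (the sizes match the slide: 2^16 one-byte entries = 64 KB; a Python list stands in for GPU constant memory):

```python
# One entry per 16-bit value: the entry is that value's count of 1-bits.
TABLE = [bin(i).count("1") for i in range(1 << 16)]

def popcount32(x):
    """Count the 1s of a 32-bit word with two 16-bit table lookups."""
    return TABLE[x & 0xFFFF] + TABLE[(x >> 16) & 0xFFFF]
```

TABLE[12] is 2 (12 is binary 1100) and TABLE[65535] is 16, matching the slide's endpoints.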
18/33 Support Counting on the GPU (cont.)
One thread block per candidate itemset: block 1 handles ABD, block 2 handles ACD.
Transaction bitmaps: AB: 1100; AC: 1010; AD: 1110; BD: 1100; CD: 1011. Candidates: ABD: 1100; ACD: 1010.
1. Intersect two transaction lists (bitwise AND). 2. Count the number of transactions in the intersection result via the lookup table. Support(ABD) = 2.
19/33 Support Counting on the GPU (cont.)
Within a thread block, each thread ANDs one 32-bit int of the two parent bitmaps (e.g., AB AND AD for ABD), looks up the count of 1s of each 16-bit half in the lookup table, and a parallel reduce sums the per-thread counts into the itemset's support. Memory is read through the vector type int4 (four ints in one instruction).
Example: counts of 1s: 2 -> support: 2.
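One thread block's work can be modelled sequentially (a Python stand-in: the loop body is what each thread does for its word, and the final sum stands in for the block's parallel reduction):

```python
# 16-bit popcount table, as on the previous slide.
TABLE = [bin(i).count("1") for i in range(1 << 16)]

def itemset_support(row_a, row_b, table=TABLE):
    """AND the 32-bit words of two parent bitmaps (e.g. AB and AD for ABD),
    look up the 1-count of each 16-bit half, and sum the partial counts
    (the sum models the thread block's parallel reduction)."""
    total = 0
    for wa, wb in zip(row_a, row_b):
        w = wa & wb
        total += table[w & 0xFFFF] + table[(w >> 16) & 0xFFFF]
    return total
```

For the 4-transaction example, itemset_support([0b1100], [0b1110]) gives 2, the support of ABD.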
20/33 GPU-based Apriori: candidate generation
(Support counting runs on the GPU.) Candidate generation has two steps:
1. Join: e.g., join two 2-itemsets to obtain a candidate 3-itemset: AC JOIN AD => ACD.
2. Subset test: e.g., test all 2-subsets of ACD: {AC, AD, CD}.
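The two candidate-generation steps, sketched on sorted item strings (hypothetical helper names for illustration, not the paper's API):

```python
from itertools import combinations

def join(a, b):
    """Join two sorted (k-1)-itemsets that share their first k-2 items."""
    if a[:-1] == b[:-1] and a[-1] < b[-1]:
        return a + b[-1:]
    return None

def subset_test(candidate, frequent_prev):
    """Keep a candidate only if all its (k-1)-subsets are frequent."""
    k = len(candidate)
    return all("".join(s) in frequent_prev
               for s in combinations(candidate, k - 1))

L2 = {"AB", "AC", "AD", "BD", "CD"}
cand = join("AC", "AD")        # "ACD"
ok = subset_test(cand, L2)     # its 2-subsets {AC, AD, CD} are all frequent
```

By contrast, ABC would fail the subset test because BC is not frequent.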
21/33 GPU-based Apriori: Pure Bitmap-based Impl. (PBI): itemsets as bitmaps, candidate generation on the GPU.
22/33 Pure Bitmap-based Impl. (PBI)
One GPU thread generates one candidate itemset. Itemsets are bitmaps over the items (columns A, B, C, D; rows = itemsets):
AB: 1100; AC: 1010; AD: 1001; BD: 0101; CD: 0011 -> ABD: 1101; ACD: 1011.
Join = bitwise OR (e.g., AB JOIN AD = ABD). Subset test = binary search (e.g., for the 2-subsets {AB, AD, BD}).
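A sketch of PBI's two operations (mapping items A, B, C, D to bits 3..0 is an illustrative assumption; the frequent 2-itemsets are kept sorted so membership can be tested by binary search, as on the GPU):

```python
from bisect import bisect_left

# Frequent 2-itemsets as bitmaps over the items (A,B,C,D -> bits 3..0).
L2 = sorted([0b1100, 0b1010, 0b1001, 0b0101, 0b0011])  # AB, AC, AD, BD, CD

def contains(sorted_bms, bm):
    """Binary-search a sorted array of itemset bitmaps."""
    i = bisect_left(sorted_bms, bm)
    return i < len(sorted_bms) and sorted_bms[i] == bm

ab, ad = 0b1100, 0b1001
abd = ab | ad                      # join = bitwise OR -> ABD (0b1101)
# Subset test: drop each set bit in turn and binary-search the result.
subsets = [abd & ~(1 << b) for b in range(4) if abd >> b & 1]
is_candidate = all(contains(L2, s) for s in subsets)
```

Here the 2-subsets of ABD are AB, AD, and BD, all frequent, so ABD survives; BC (0b0110) is correctly absent from L2.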
23/33 GPU-based Apriori: Trie-based Impl. (TBI): itemsets in a trie, candidate generation on the CPU; transactions as bitmaps, support counting on the GPU.
24/33 Trie-based Impl. (TBI)
1-itemsets {A, B, C, D} and 2-itemsets {AB, AC, AD, BD, CD} are stored in a trie: depth 0 is the root; depth 1 holds A, B, C, D; depth 2 holds B, C, D under A, D under B, and D under C.
Candidate generation joins siblings: AB JOIN AC = ABC (2-subsets {AB, AC, BC}); AB JOIN AD = ABD ({AB, AD, BD}); AC JOIN AD = ACD ({AC, AD, CD}).
Candidate 3-itemsets after the subset test: {ABD, ACD} (ABC is pruned because BC is not frequent).
Candidate generation runs on the CPU, since trie traversal on the GPU suffers from 1) irregular memory access and 2) branch divergence.
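TBI's CPU-side candidate generation can be sketched with a nested-dict trie (a simplified stand-in for the paper's trie; the subset-test pruning is the standard Apriori step):

```python
from itertools import combinations

def insert(trie, itemset):
    """Insert one itemset (a string of items) as a path in the trie."""
    node = trie
    for item in itemset:
        node = node.setdefault(item, {})

def gen_candidates(node, prefix, k, out):
    """Join every pair of sibling items sharing the same (k-2)-prefix."""
    if len(prefix) == k - 2:
        last = sorted(node)
        for i, a in enumerate(last):
            for b in last[i + 1:]:
                out.append(prefix + (a, b))
        return
    for item, child in sorted(node.items()):
        gen_candidates(child, prefix + (item,), k, out)

L2 = ["AB", "AC", "AD", "BD", "CD"]
trie = {}
for s in L2:
    insert(trie, s)

raw = []
gen_candidates(trie, (), 3, raw)   # joins under A give ABC, ABD, ACD
# (k-1)-subset test: ABC is pruned because its subset BC is not frequent.
C3 = ["".join(c) for c in raw
      if all("".join(sub) in set(L2) for sub in combinations(c, 2))]
```

The surviving candidates are ABD and ACD, as on the slide.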
25/33 Outline Contribution Introduction Design Evaluation Conclusion
26/33 Experimental setup
Platform: Intel Core2 quad-core CPU (2.66 GHz x 4) vs. NVIDIA GTX 280 GPU (1.3 GHz x 30 x 8 processors); the GPU's memory bandwidth (GB/sec) is roughly 10x the CPU's. Development environment: Windows XP + Visual Studio + CUDA.
Experimental datasets (Density = Avg. Length / # Items):
T40I10D100K (synthetic): 1,000 items, avg. length 40, 100,000 transactions, density 4%.
Retail: 16,470 items, avg. length 10.3, 88,162 transactions, density 0.06%.
Chess: 75 items, avg. length 37, 3,196 transactions, density 49%.
27/33 Apriori Implementations
Impl. | Candidate generation (itemsets) | Support counting (transactions)
BORGELT | single-threaded on the CPU (trie) | single-threaded on the CPU
GOETHALS | single-threaded on the CPU (trie) | multi-threaded on the CPU (horizontal layout)
TBI-CPU | single-threaded on the CPU (trie) | multi-threaded on the CPU (bitmap)
TBI-GPU | single-threaded on the CPU (trie) | multi-threaded on the GPU (bitmap)
PBI-GPU | multi-threaded on the GPU (bitmap) | multi-threaded on the GPU (bitmap)
BORGELT is the best Apriori implementation in the FIMI repository (Frequent Itemset Mining Implementations Repository).
28/33 TBI-CPU vs. GOETHALS
TBI-CPU: trie itemsets / candidate generation on the CPU; bitmap transactions / support counting on the CPU.
GOETHALS: trie / CPU; horizontal layout / CPU.
(Charts: dense dataset Chess, sparse dataset Retail.)
Shows the impact of the bitmap representation for transactions in support counting: 1.2x ~ 25.7x speedup.
29/33 TBI-GPU vs. TBI-CPU
TBI-GPU: trie itemsets / candidate generation on the CPU; bitmap transactions / support counting on the GPU.
TBI-CPU: trie / CPU; bitmap / CPU.
(Charts: dense dataset Chess, sparse dataset Retail.)
Shows the impact of GPU acceleration in support counting: 1.1x ~ 7.8x speedup.
30/33 PBI-GPU vs. TBI-GPU
PBI-GPU: bitmap itemsets / candidate generation on the GPU; bitmap transactions / support counting on the GPU.
TBI-GPU: trie / CPU; bitmap / GPU.
(Charts: dense dataset Chess, sparse dataset Retail.)
Shows the impact of bitmap-based vs. trie-based itemsets in candidate generation: PBI-GPU is faster on the dense dataset; TBI-GPU is better on the sparse dataset.
31/33 PBI-GPU/TBI-GPU vs. BORGELT
PBI-GPU: bitmap itemsets / candidate generation on the GPU; bitmap transactions / support counting on the GPU.
TBI-GPU: trie / CPU; bitmap / GPU.
BORGELT: trie / CPU.
(Charts: dense dataset Chess, sparse dataset Retail.)
Comparison to the best Apriori implementation in FIMI: 1.2x ~ 24.2x speedup.
32/33 Comparison to FP-growth (implementation from the PARSEC benchmark), with minsup 1%, 60%, and 0.01%.
33/33 Conclusion
GPU-based Apriori:
Pure Bitmap-based impl.: bitmap representation for itemsets and for transactions; GPU processing.
Trie-based impl.: trie representation for itemsets; bitmap representation for transactions; GPU + CPU co-processing.
Better than CPU-based Apriori, but still worse than CPU-based FP-growth.
Backup slide: time breakdown on the dense dataset Chess and on the sparse dataset Retail.