Weekly Report: Start learning GPU. Ph.D. Student: Leo Lee. Date: Sep. 18, 2009.

Outline: References, CUDA, Work plan


References

Frequent itemset mining on graphics Introduction
– Two representative algorithms: Apriori and FP-growth. FP-growth was generally faster than Apriori; Apriori-Borgelt was slightly faster when the support was high.
– No prior work focuses on GPU acceleration for FIM algorithms.
– Challenge: the data structures are not aligned and the access patterns are irregular (pointer chasing).

Frequent itemset mining on graphics Background and related work-GPGPU
– The parallel primitives [19] are a small set of common operations that exploit the architectural features of GPUs. We utilize the map, reduce, and prefix sum primitives in our two FIM implementations.
– Improvement (memory optimizations):
  – Local memory optimization for temporal locality
  – Coalesced access optimization of device memory for spatial locality
  – The built-in vector data type to reduce the number of memory accesses
– Difference: we study the GPU acceleration of Apriori for FIM, which incurs much more complex control flows and memory accesses than performing database joins or maintaining quantiles from data streams.
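To make the three primitives named above concrete, here is a minimal sequential sketch in Python of what map, reduce, and (exclusive) prefix sum compute. It is not GPU code and not the authors' implementation; the function names and the compaction example (keeping only candidates that meet a minimum support) are assumptions chosen for illustration.

```python
from functools import reduce
from itertools import accumulate


def map_primitive(fn, xs):
    """Apply fn to every element independently (on a GPU: one thread per element)."""
    return [fn(x) for x in xs]


def reduce_primitive(op, xs, identity):
    """Combine all elements into one value with an associative operator."""
    return reduce(op, xs, identity)


def exclusive_prefix_sum(xs):
    """out[i] is the sum of xs[0..i-1]; out[0] is 0."""
    inclusive = list(accumulate(xs))
    return [0] + inclusive[:-1] if inclusive else []


# Hypothetical use: compact the indices of candidates whose support
# count reaches min_support (values below are made up).
counts = [3, 1, 4, 0, 5]
min_support = 3
flags = map_primitive(lambda c: int(c >= min_support), counts)
total = reduce_primitive(lambda a, b: a + b, flags, 0)
offsets = exclusive_prefix_sum(flags)

compacted = [None] * total
for i, flag in enumerate(flags):
    if flag:
        compacted[offsets[i]] = i  # each survivor writes to its own slot
```

The prefix sum gives every surviving element a unique output position, which is why this pattern parallelizes well: no two writers ever target the same slot.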

Frequent itemset mining on graphics Implementation


Frequent itemset mining on graphics Implementation-Pure Bitmap Implementation

Frequent itemset mining on graphics Implementation-PBI
Given m frequent (K−1)-itemsets and n items, checking whether one (K−1)-itemset is frequent requires accessing log m × (n/128) × 16 bytes of data, where log m is the cost of performing a binary search and (n/128) × 16 is the size of a row (in bytes) in the bitmap of (K−1)-itemsets. Typically, if m = […] and n = 10000, we need to access about 16 KB to check only one (K−1)-subset. This problem in our pure bitmap-based solution motivates us to consider another data structure for the Candidate Generation procedure when the number of items is large.
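The cost estimate on this slide can be checked with a few lines of Python. This is only a worked calculation under stated assumptions: the logarithm is taken as base 2 (the usual cost of a binary search), and since the value of m did not survive in the transcript, m = 8192 below is a hypothetical stand-in, not the figure used by the authors.

```python
import math


def row_bytes(n_items):
    """Size in bytes of one bitmap row: (n / 128) * 16, as on the slide."""
    return (n_items / 128) * 16


def bytes_per_subset_check(m_itemsets, n_items):
    """Bytes touched by one binary search over m rows (log base 2 assumed)."""
    return math.log2(m_itemsets) * row_bytes(n_items)


n = 10000               # from the slide
m_hypothetical = 8192   # assumption: the slide's value of m is missing

one_row = row_bytes(n)
per_check = bytes_per_subset_check(m_hypothetical, n)
print(one_row, per_check)
```

With n = 10000 a single row is 1250 bytes, so about 13 binary-search steps are enough to reach roughly 16 KB per subset check, which is consistent with the order of magnitude quoted on the slide.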

Frequent itemset mining on graphics Implementation-Trie based Implementation
The candidate generation based on trie traversal is implemented on the CPU. This decision is based on the fact that the trie is an irregular structure and is difficult to share among SIMD threads. Thus, we store the trie representing itemsets in CPU memory and the bitmap representation of transactions in GPU device memory.
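The division of labor described here can be sketched in plain Python: a trie (nested dictionaries) for candidate generation, and per-item bitmaps for support counting. This is an illustrative sketch, not the authors' code. The function names are hypothetical, the candidate step only joins sibling leaves and omits Apriori's subset pruning, and the bitmaps are Python integers with one bit per transaction, standing in for the data that would live in GPU device memory.

```python
def build_trie(itemsets):
    """Store sorted itemsets as nested dicts; each root-to-leaf path is one itemset."""
    root = {}
    for itemset in itemsets:
        node = root
        for item in itemset:
            node = node.setdefault(item, {})
    return root


def generate_candidates(root, k):
    """Join sibling leaves of frequent k-itemsets into (k+1)-item candidates."""
    candidates = []

    def walk(node, prefix):
        if len(prefix) == k - 1:
            leaves = sorted(node)
            for i, a in enumerate(leaves):
                for b in leaves[i + 1:]:
                    candidates.append(prefix + (a, b))
            return
        for item in sorted(node):
            walk(node[item], prefix + (item,))

    walk(root, ())
    return candidates


def support(candidate, item_bitmaps):
    """AND the per-item transaction bitmaps and count the set bits."""
    bits = item_bitmaps[candidate[0]]
    for item in candidate[1:]:
        bits &= item_bitmaps[item]
    return bin(bits).count("1")


# Made-up data: 4 transactions, bit i set means transaction i contains the item.
item_bitmaps = {1: 0b1011, 2: 0b1111, 3: 0b1101}
trie = build_trie([(1, 2), (1, 3), (2, 3)])
candidates = generate_candidates(trie, 2)
supports = {c: support(c, item_bitmaps) for c in candidates}
```

The pointer-chasing walk stays on the CPU, while the bitwise AND and bit count are regular, data-parallel operations of the kind that suit SIMD threads.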

Frequent itemset mining on graphics Implementation-TBI

Frequent itemset mining on graphics Experiments


Frequent itemset mining on graphics Results


Outline: References, CUDA, Work plan

CUDA
Review the code of K-means
– CPU: 1101 S (10 S)
– GPU: still needs debugging; no results yet

Outline: References, CUDA, Work plan

Work Plan
– Summarize this month
– Make a plan for next month
– Try to implement a data mining algorithm
– Homework

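References
Key words, with search hits on Google Scholar and ACM Portal where the counts survive:
– GPU decision tree: 2,
– GPU k-means
– GPU SVM
– GPU Apriori: 1,
– GPU Expectation Maximization
– GPU PageRank: 4,260 (Google Scholar), 5 (ACM Portal)
– GPU AdaBoost
– GPU k-nn
– GPU Naive Bayes: 1042 (false positive)
– GPU CART: 1,040 (Google Scholar), 3 (ACM Portal) (false positive)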

Thank you for listening