Alva Erwin Department ofComputing Raj P. Gopalan, and N.R. Achuthan Department of Mathematics and Statistics Curtin University of Technology Kent St. Bentley.

Slides:



Advertisements
Similar presentations
Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu SIG KDD 2010 UP-Growth: An Efficient Algorithm for High Utility Itemset Mining 2010/8/25.
Advertisements

Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
Data Mining (Apriori Algorithm)DCS 802, Spring DCS 802 Data Mining Apriori Algorithm Spring of 2002 Prof. Sung-Hyuk Cha School of Computer Science.
Mining Compressed Frequent- Pattern Sets Dong Xin, Jiawei Han, Xifeng Yan, Hong Cheng Department of Computer Science University of Illinois at Urbana-Champaign.
PREFIXSPAN ALGORITHM Mining Sequential Patterns Efficiently by Prefix- Projected Pattern Growth
Frequent Closed Pattern Search By Row and Feature Enumeration
IT 433 Data Warehousing and Data Mining Association Rules Assist.Prof.Songül Albayrak Yıldız Technical University Computer Engineering Department
LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets L inear time C losed itemset M iner Takeaki Uno Tatsuya Asai Hiroaki Arimura Yuzo.
A Fast High Utility Itemsets Mining Algorithm Ying Liu,Wei-keng Liao,and Alok Choudhary KDD’05 Advisor : Jia-Ling Koh Speaker : Tsui-Feng Yen.
Data Mining Association Analysis: Basic Concepts and Algorithms
Mining Frequent Itemsets from Uncertain Data Presented by Chun-Kit Chui, Ben Kao, Edward Hung Department of Computer Science, The University of Hong Kong.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms
Weekly Report Start learning GPU Ph.D. Student: Leo Lee date: Sep. 18, 2009.
Data Mining Association Analysis: Basic Concepts and Algorithms
Efficient Data Mining for Path Traversal Patterns CS401 Paper Presentation Chaoqiang chen Guang Xu.
Fast Vertical Mining Using Diffsets Mohammed J. Zaki Karam Gouda
ACM SIGKDD Aug – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery.
1 UP-Growth: An Efficient Algorithm for High Utility Itemset Mining Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu SIG KDD 2010.
Chapter 5 Mining Association Rules with FP Tree Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2010.
Mining Association Rules between Sets of Items in Large Databases presented by Zhuang Wang.
USpan: An Efficient Algorithm for Mining High Utility Sequential Patterns Authors: Junfu Yin, Zhigang Zheng, Longbing Cao In: Proceedings of the 18th ACM.
林俊宏 Parallel Association Rule Mining based on FI-Growth Algorithm Bundit Manaskasemsak, Nunnapus Benjamas, Arnon Rungsawang.
Ch5 Mining Frequent Patterns, Associations, and Correlations
Mining High Utility Itemsets without Candidate Generation Date: 2013/05/13 Author: Mengchi Liu, Junfeng Qu Source: CIKM "12 Advisor: Jia-ling Koh Speaker:
VLDB 2012 Mining Frequent Itemsets over Uncertain Databases Yongxin Tong 1, Lei Chen 1, Yurong Cheng 2, Philip S. Yu 3 1 The Hong Kong University of Science.
AR mining Implementation and comparison of three AR mining algorithms Xuehai Wang, Xiaobo Chen, Shen chen CSCI6405 class project.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Efficient Data Mining for Calling Path Patterns in GSM Networks Information Systems, accepted 5 December 2002 SPEAKER: YAO-TE WANG ( 王耀德 )
LCM ver.2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets Takeaki Uno Masashi Kiyomi Hiroki Arimura National Institute of Informatics,
Mining High Utility Itemset in Big Data
Mining Frequent Patterns without Candidate Generation.
Mining Frequent Patterns without Candidate Generation : A Frequent-Pattern Tree Approach 指導教授:廖述賢博士 報 告 人:朱 佩 慧 班 級:管科所博一.
Implementation of “A New Two-Phase Sampling Based Algorithm for Discovering Association Rules” Tokunbo Makanju Adan Cosgaya Faculty of Computer Science.
A Test Paradigm for Detecting Changes in Transactional Data Streams Willie Ng and Manoranjan Dash DASFAA 2008.
Frequent Subgraph Discovery Michihiro Kuramochi and George Karypis ICDM 2001.
Frequent Item Mining. What is data mining? =Pattern Mining? What patterns? Why are they useful?
Mining Top-K High Utility Itemsets Date: 2013/04/08 Author: Cheng Wei Wu, Bai-En Shie, Philip S. Yu, Vincent S. Tseng Source: KDD ’12 Advisor: Dr. Jia-Ling.
Mining Frequent Itemsets from Uncertain Data Presenter : Chun-Kit Chui Chun-Kit Chui [1], Ben Kao [1] and Edward Hung [2] [1] Department of Computer Science.
1 Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining -SIGKDD’03 Mohammad El-Hajj, Osmar R. Zaïane.
MINING COLOSSAL FREQUENT PATTERNS BY CORE PATTERN FUSION FEIDA ZHU, XIFENG YAN, JIAWEI HAN, PHILIP S. YU, HONG CHENG ICDE07 Advisor: Koh JiaLing Speaker:
1/24 Novel algorithm for mining high utility itemsets Shankar, S. Purusothaman, T. Jayanthi, S. International Conference on Computing, Communication and.
1 AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Hong.
Interactive Discovery of Influential Friends from Social Networks By: Behzad Rezaie In the Name of God Professor: Dr. Mashayekhi May 11, 2014
From Path Tree To Frequent Patterns: A Framework for Mining Frequent Patterns Yabo Xu, Jeffrey Xu Yu, Guimei Liu, Hongjun Lu, Proc. of the 2002 IEEE International.
 Frequent Word Combinations Mining and Indexing on HBase Hemanth Gokavarapu Santhosh Kumar Saminathan.
HEMANTH GOKAVARAPU SANTHOSH KUMAR SAMINATHAN Frequent Word Combinations Mining and Indexing on HBase.
A Scalable Association Rules Mining Algorithm Based on Sorting, Indexing and Trimming Chuang-Kai Chiou, Judy C. R Tseng Proceedings of the Sixth International.
Indexing and Mining Free Trees Yun Chi, Yirong Yang, Richard R. Muntz Department of Computer Science University of California, Los Angeles, CA {
By Shivaraman Janakiraman, Magesh Khanna Vadivelu.
Differential Analysis on Deep Web Data Sources Tantan Liu, Fan Wang, Jiedan Zhu, Gagan Agrawal December.
CFI-Stream: Mining Closed Frequent Itemsets in Data Streams
Finding Maximal Frequent Itemsets over Online Data Streams Adaptively
Reducing Number of Candidates
Data Mining Association Analysis: Basic Concepts and Algorithms
UP-Growth: An Efficient Algorithm for High Utility Itemset Mining
Frequent Pattern Mining
The Concept of Maximal Frequent Itemsets
CARPENTER Find Closed Patterns in Long Biological Datasets
Mining Frequent Itemsets over Uncertain Databases
Data Mining Association Rules Assoc.Prof.Songül Varlı Albayrak
A Parameterised Algorithm for Mining Association Rules
Transactional data Algorithm Applications
COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong
732A02 Data Mining - Clustering and Association Analysis
Frequent-Pattern Tree
FP-Growth Wenlong Zhang.
DENSE ITEMSETS JOUNI K. SEPPANEN, HEIKKI MANNILA SIGKDD2004
Junqiang Liu, Rong Zhao, Xiangcai Yang, Yong Zhang, Xiaoning Jiang
Presentation transcript:

Alva Erwin Department ofComputing Raj P. Gopalan, and N.R. Achuthan Department of Mathematics and Statistics Curtin University of Technology Kent St. Bentley Western Australia PAKDD08 Efficient Mining of High Utility Itemsets from Large Datasets 1

Outline Introduction Preliminaries Method – Compressed Transaction Utility-Prol Experiments Conclusions 2

Introduction The goal of frequent itemset mining is to find items that co-occur in a transaction database above a user given frequency threshold, without considering the quantity or weight such as profit of the items. Quantity and weight are significant for addressing real world decision problems that require maximizing the utility in an organization. TwoPhase based on Apriori is suitable for sparse data sets with short patterns, CTU-Mine based on the pattern growth is suitable for dense data. 3

Definition u(3 4, t1) =$60 u(3 4, t3)=$60 u(3 4) = $120, 4

Definition Transaction Utility : Transaction weighted Utility: tu(1) = 80 twu(3 4)=$190 5

Compressed Transaction Utility-Prol 99<min_Utility(129.9) GlobalItem index Original item id Profit Quantity TWU

Compressed Utility Pattern-Tree Parallel projection of transaction database 7

CUP-tree Traverse index 1 (110) from 5, 2 (310) from (2,3,4), 3 (195) from 2, and 4 (190)from (3,5) 8

ProCUP-tree index 1 (110) from 5, cause 110<min_Utility(129.9) 2 (310) from (2,3,4),3 (195) from 2, and 4 (190)from (3,5) 9

ProCUP-tree oriUtility*itemQuantity + proUtility*proQuantity = Utility 35*2+25*2=120, 150*1+25*1=175, 10*5+25*3=125 High_Utility_Itemset = (3,2) (3,2,1) GlobalItem index Original item id ProItem index Profit Quantity TWU

Experiments 11

Conclusion CTU-Pro algorithm to mine the complete set of high utility itemsets from both sparse and relatively dense datasets with short or longer high utility patterns. The algorithm adapts to large data by constructing parallel subdivisions on disk that can be mined independently. 12