
1/24 Novel algorithm for mining high utility itemsets. Shankar, S.; Purusothaman, T.; Jayanthi, S. International Conference on Computing, Communication and Networking (ICCCN 2008), Dec, Page(s): 1-6. Speaker: 廖執善, 鄭仁傑

2/24 Outline
- Introduction
- Mining high utility itemsets
- Existing Umining algorithm
- Proposed FUM algorithm
- Experimental Results
- Conclusions
- Future Work

3/24 Introduction (1/4) One of the important issues in data mining is the interestingness problem. The fundamental idea behind mining frequent itemsets is that only itemsets with high frequency are of interest to users. A frequent itemset reflects only the statistical correlation between items; it does not reflect the semantic significance of the items.

4/24 Introduction (2/4)

5/24 Introduction (3/4)

6/24 Introduction (4/4) Motivation: we use a utility-based itemset mining approach to overcome this limitation. Utility-based data mining is a new research area that considers all types of utility factors in data mining processes and aims to incorporate utility considerations into data mining tasks. High utility itemset mining is a branch of utility-based data mining that aims to find itemsets that contribute high utility.

7/24 Mining high utility itemsets (1/3) A frequent itemset is a set of items that appears in at least a pre-specified number of transactions. Formally, let I = {I_1, I_2, ..., I_m} be a set of items and DB = {T_1, T_2, ..., T_n} a set of transactions, where every transaction is also a set of items (i.e. an itemset). Given a minimum support threshold minSup, an itemset S is frequent iff:
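The condition that followed "iff" on this slide was an image and is missing from the transcript. One standard way to write it, assuming minSup is an absolute transaction count as the first sentence suggests (if minSup is meant as a fraction, divide the left-hand side by |DB|):

```latex
\[
  \bigl|\{\, T \in DB : S \subseteq T \,\}\bigr| \;\ge\; \mathit{minSup}
\]
```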

8/24 Mining high utility itemsets (2/3) The following definitions are given in [6]; we illustrate them on a small example.
Definition 1: The external utility of an item i_p is a numerical value y_p defined by the user. It is transaction independent and reflects the importance (usually profit) of the item. External utilities are stored in a utility table. For example, the external utility of item B in Table 2 is 10.
Definition 2: The internal utility of an item i_p is a numerical value x_p which is transaction dependent. In most cases it is defined as the quantity of the item in the transaction. For example, the internal utility of item E in transaction T5 is 2 (see Table 1).
Definition 3: The utility function f is a function of two variables, f(x, y): (R+, R+) → R+. The most common form, also used in this paper, is the product of internal and external utility: x_p × y_p.

9/24 Mining high utility itemsets (3/3)
Definition 4: The utility of item i_p in transaction T is the quantitative measure computed with the utility function from Definition 3, i.e. u(i_p, T) = f(x_p, y_p), i_p ∈ T.
Definition 5: The utility of itemset S in transaction T is defined as u(S, T) = Σ_{i_p ∈ S} u(i_p, T), where S ⊆ T.
Definition 6: Itemset S is of high utility iff U(S) ≥ minUtil, where U(S) is the sum of u(S, T) over all transactions T in DB that contain S, and minUtil is a user-defined utility threshold given as a percentage of the total utility of the database.
Definition 7: High utility itemset mining is the problem of finding the set H defined as H = {S | S ⊆ I, U(S) ≥ minUtil}, where I is the set of items (attributes).
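The definitions above can be sketched in a few lines of Python. This is a brute-force illustration only, not the Umining or FUM algorithm. The quantities and profits below are hypothetical toy data (Tables 1 and 2 of the slides are not in the transcript), and the function names are made up for this sketch.

```python
from itertools import combinations

# Hypothetical external utilities y_p (profit per unit); not the slide's Table 2.
external_utility = {"A": 3, "B": 10, "C": 1}

# Hypothetical transactions: item -> internal utility x_p (quantity); not the slide's Table 1.
database = {
    "T1": {"A": 2, "B": 1},
    "T2": {"B": 3, "C": 4},
    "T3": {"A": 1, "B": 2, "C": 5},
}

def item_utility(item, transaction):
    """u(i_p, T) = x_p * y_p (Definitions 3 and 4)."""
    return transaction[item] * external_utility[item]

def itemset_utility_in_transaction(itemset, transaction):
    """u(S, T): sum of the item utilities of S in T (Definition 5)."""
    return sum(item_utility(i, transaction) for i in itemset)

def itemset_utility(itemset, db):
    """U(S): sum of u(S, T) over the transactions that contain S."""
    return sum(
        itemset_utility_in_transaction(itemset, t)
        for t in db.values()
        if set(itemset).issubset(t)
    )

def high_utility_itemsets(db, min_util_percent):
    """Enumerate every non-empty itemset and keep those with U(S) >= minUtil."""
    total = sum(itemset_utility_in_transaction(t, t) for t in db.values())
    threshold = total * min_util_percent / 100
    items = sorted(external_utility)
    result = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            if itemset_utility(combo, db) >= threshold:
                result.append(combo)
    return result

print(itemset_utility(("A", "B"), database))
print(high_utility_itemsets(database, 40))
```

Exhaustive enumeration is exponential in the number of items, which is why the algorithms in the following slides prune candidates instead.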

10/24 Existing Umining algorithm (1/4)

11/24 Existing Umining algorithm (2/4)

12/24 Existing Umining algorithm (3/4)

13/24 Existing Umining algorithm (4/4)
Minutil = 196 (threshold), k = 4 (number of levels)
Level 1:
- Scan function: I = {A, B, C, D}, assigned to C1.
- Calculate and store function: u(A)=110, u(B)=200, u(C)=190, u(D)=85.
- Discover function: H = {B}, since u(B) exceeds Minutil.
Level 2:
- Generation function: I = {AB, AC, AD, BC, BD, CD}.
- Prune function: b(AB)=310, b(AC)=300, b(AD)=195, b(BC)=390, b(BD)=285, b(CD)=275. Because b(AD) < Minutil, AD is omitted. Therefore C2 = {AB, AC, BC, BD, CD}.
- Calculate and store function: u(AB)=105, u(AC)=197, u(BC)=138, u(BD)=211, u(CD)=193.
- Discover function: H = {AC, BD}, since both exceed Minutil.
Level 3:
- Generation function: I = {ABC, ABD, ACD, BCD}.
- Prune function: b(ABC)=220, b(ABD)=225.5, b(ACD)=262.5, b(BCD)=271. None is below Minutil, so none is omitted.
- Calculate and store function: u(ABC)=143, u(ABD)=106, u(ACD)=150, u(BCD)=139.
- Discover function: H = {}.
Level 4:
- Generation function: I = {ABCD}.
- Prune function: b(ABCD)=179.3. Because b(ABCD) < Minutil, ABCD is omitted. No candidates remain.
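The transcript does not state how the Prune function computes the bound b. As an assumption, the sketch below takes b of a k-itemset to be the sum of the utilities of its (k-1)-subsets divided by k-1. That formula reproduces the slide's values wherever the needed utilities are listed. The utilities are copied from the slide; the function name `bound` is made up for this sketch.

```python
from itertools import combinations

MIN_UTIL = 196

# Utilities copied from the worked example on the slide.
# u(AD) is not listed there (AD was pruned), so bounds that need it are not computed.
utility = {
    frozenset("A"): 110, frozenset("B"): 200, frozenset("C"): 190, frozenset("D"): 85,
    frozenset("AB"): 105, frozenset("AC"): 197, frozenset("BC"): 138,
    frozenset("BD"): 211, frozenset("CD"): 193,
    frozenset("ABC"): 143, frozenset("ABD"): 106,
    frozenset("ACD"): 150, frozenset("BCD"): 139,
}

def bound(itemset, utility):
    """Assumed bound: sum of the utilities of all (k-1)-subsets, divided by (k-1)."""
    k = len(itemset)
    subsets = [frozenset(c) for c in combinations(sorted(itemset), k - 1)]
    return sum(utility[s] for s in subsets) / (k - 1)

for name in ["AB", "AD", "ABC", "BCD", "ABCD"]:
    b = bound(frozenset(name), utility)
    status = "kept" if b >= MIN_UTIL else "pruned"
    print(f"b({name}) = {b:.1f} -> {status}")

# Itemsets whose actual utility reaches the threshold (the Discover step).
high = sorted("".join(sorted(s)) for s, u in utility.items() if u >= MIN_UTIL)
print(high)
```

Calling `bound` on ABD or ACD would raise a `KeyError`, because u(AD) does not appear on the slide.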

14/24 Proposed FUM algorithm (1/2)

15/24 Proposed FUM algorithm (2/2) Candidate set = {A, B, C, D, E, AB, AC, AD, AE, BC, BD, BE, CD, CE, DE, ACD, ACE, ADE, BCE, BDE, CDE, ACDE}, 22 itemsets in total

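16/24 TWU tree Mining algorithm (1/2)
[Table: ten transactions with columns TID, A, B, C, D, E, TWU; cell values not preserved.]
Benefit table:
item     A  B  C  D  E
Benefit  3  5  1  3  5
Example transaction utilities: 16*1 + 1*5 = 21; 12*5 + 2*3 + 1*5 = 71; …
Reference: A Novel Algorithm for Mining High Utility Itemsets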
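As a rough illustration of the TWU column, the sketch below computes transaction utilities and the transaction-weighted utility (TWU) of an itemset, using the usual definition: the sum of the utilities of all transactions that contain the itemset. The benefit values are read from the slide's garbled "Benefit35135" row, which is an interpretation. The transaction quantities are hypothetical; the first two are chosen only so that their totals match the two sums shown on the slide.

```python
# Benefit per item, as read from the slide's garbled benefit row (an interpretation).
benefit = {"A": 3, "B": 5, "C": 1, "D": 3, "E": 5}

# Hypothetical quantities; the slide's transaction table did not survive extraction.
database = {
    "T1": {"C": 16, "E": 1},           # 16*1 + 1*5 = 21
    "T2": {"B": 12, "D": 2, "E": 1},   # 12*5 + 2*3 + 1*5 = 71
    "T3": {"A": 4, "C": 2},            # 4*3 + 2*1 = 14
}

def transaction_utility(transaction):
    """Total utility of one transaction: sum of quantity * benefit."""
    return sum(qty * benefit[item] for item, qty in transaction.items())

def twu(itemset, db):
    """Transaction-weighted utility: sum of the utilities of transactions containing the itemset."""
    return sum(
        transaction_utility(t) for t in db.values() if set(itemset).issubset(t)
    )

print(transaction_utility(database["T1"]))
print(transaction_utility(database["T2"]))
print(twu({"E"}, database))
print(twu({"B", "E"}, database))
```

Because the TWU of an itemset is never smaller than its actual utility and never grows when items are added, an itemset whose TWU is below min_util can be discarded together with all its supersets.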

17/24 TWU tree Mining algorithm (2/2)
min_util = 130
[Figure: WIT-tree for TWU-Mining, built from the transaction table (columns A, B, C, D, E, TWU); node labels and values not preserved.]
Benefit table:
item     A  B  C  D  E
Benefit  3  5  1  3  5
HUIs = {B, BD, BE, BDE}

18/24 Experimental Results (1/4)

19/24 Experimental Results (2/4)

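20/24 Experimental Results (3/4)
[Table: databases. Columns: Database, #Trans, #Items, Remark. Rows: BMS-POS (Modified), Retails (Modified); numeric values not preserved.]
[Table: experimental results on the BMS-POS database. Columns: Minutil (%), Two-phase, TWU-Mining, #HUIs; values not preserved.]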

21/24 Experimental Results (4/4)
[Table: experimental results on the Retails database. Columns: Minutil (%), Two-phase, TWU-Mining, #HUIs; values not preserved.]

22/24 Conclusions Utility-based itemset mining discovers itemsets that are significant according to their utility values. Utility constraints can express more complex semantics than the support measure. In this paper we have shown that the proposed FUM algorithm executes faster than the existing Umining algorithm (see Table III) when more itemsets are identified as high utility itemsets.

23/24 Future Work
A Fast Algorithm for Mining High Utility Itemsets, 2009 IEEE International Advance Computing Conference (IACC 2009), Patiala, India, 6-7 March 2009.
We have also suggested a novel method of generating different types of itemsets using a combination of the FUM and Fast Utility Frequent mining (FUFM) algorithms:
- High Utility and High Frequency itemsets (HUHF)
- High Utility and Low Frequency itemsets (HULF)
- Low Utility and High Frequency itemsets (LUHF)
- Low Utility and Low Frequency itemsets (LULF)
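The four categories amount to comparing an itemset's utility and its frequency against two thresholds. The sketch below shows only that labelling step. It is not the FUM or FUFM algorithm, and the function name, thresholds, and sample values are hypothetical.

```python
def classify(utility, support, min_util, min_sup):
    """Label an itemset HUHF, HULF, LUHF, or LULF from two threshold comparisons."""
    utility_part = "HU" if utility >= min_util else "LU"
    frequency_part = "HF" if support >= min_sup else "LF"
    return utility_part + frequency_part

# Hypothetical (utility, support) pairs for illustration only.
samples = {"BD": (211, 4), "AC": (197, 1), "AB": (105, 5), "D": (85, 1)}
for name, (u, s) in samples.items():
    print(name, classify(u, s, min_util=196, min_sup=3))
```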

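24/24 Question Why is the #HUI of Umining smaller than the #HUI of FUM? In principle, FUM's candidate set should be smaller than Umining's, so why does FUM mine more HUIs? The result would only make sense if FUM's candidate set were larger than Umining's, in which case Umining would miss some itemsets.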

25/24 Thank you, everyone!