Mining Top-K High Utility Itemsets Date: 2013/04/08 Author: Cheng Wei Wu, Bai-En Shie, Philip S. Yu, Vincent S. Tseng Source: KDD ’12 Advisor: Dr. Jia-Ling.

Slides:

Advertisements

Similar presentations

Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu SIG KDD 2010 UP-Growth: An Efficient Algorithm for High Utility Itemset Mining 2010/8/25.

Advertisements

Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.

gSpan: Graph-based substructure pattern mining

Correlation Search in Graph Databases Yiping Ke James Cheng Wilfred Ng Presented By Phani Yarlagadda.

Frequent Closed Pattern Search By Row and Feature Enumeration

Association rules The goal of mining association rules is to generate all possible rules that exceed some minimum user-specified support and confidence.

FP-Growth algorithm Vasiljevic Vladica,

FP (FREQUENT PATTERN)-GROWTH ALGORITHM ERTAN LJAJIĆ, 3392/2013 Elektrotehnički fakultet Univerziteta u Beogradu.

Sampling Large Databases for Association Rules ( Toivenon’s Approach, 1996) Farzaneh Mirzazadeh Fall 2007.

A Fast High Utility Itemsets Mining Algorithm Ying Liu,Wei-keng Liao,and Alok Choudhary KDD’05 Advisor ： Jia-Ling Koh Speaker ： Tsui-Feng Yen.

Data Mining Association Analysis: Basic Concepts and Algorithms

Context-aware Query Suggestion by Mining Click-through and Session Data Authors: H. Cao et.al KDD 08 Presented by Shize Su 1.

Rakesh Agrawal Ramakrishnan Srikant

Efficiently Mining Long Patterns from Databases Roberto J. Bayardo Jr. IBM Almaden Research Center.

Mining Frequent patterns without candidate generation Jiawei Han, Jian Pei and Yiwen Yin.

Association Rule Mining. Generating assoc. rules from frequent itemsets  Assume that we have discovered the frequent itemsets and their support  How.

Pattern Lattice Traversal by Selective Jumps Osmar R. Zaïane and Mohammad El-Hajj Department of Computing Science, University of Alberta Edmonton, AB,

FPtree/FPGrowth. FP-Tree/FP-Growth Algorithm Use a compressed representation of the database using an FP-tree Then use a recursive divide-and-conquer.

Association Analysis (3). FP-Tree/FP-Growth Algorithm Use a compressed representation of the database using an FP-tree Once an FP-tree has been constructed,

Mining Long Sequential Patterns in a Noisy Environment Jiong Yang, Wei Wang, Philip S. Yu, Jiawei Han SIGMOD 2002.

Performance and Scalability: Apriori Implementation.

SEG Tutorial 2 – Frequent Pattern Mining.

1 UP-Growth: An Efficient Algorithm for High Utility Itemset Mining Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu SIG KDD 2010.

USpan: An Efficient Algorithm for Mining High Utility Sequential Patterns Authors: Junfu Yin, Zhigang Zheng, Longbing Cao In: Proceedings of the 18th ACM.

Leveraging Conceptual Lexicon ： Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.

Mining Optimal Decision Trees from Itemset Lattices Dr, Siegfried Nijssen Dr. Elisa Fromont KDD 2007.

林俊宏 Parallel Association Rule Mining based on FI-Growth Algorithm Bundit Manaskasemsak, Nunnapus Benjamas, Arnon Rungsawang.

Mining High Utility Itemsets without Candidate Generation Date: 2013/05/13 Author: Mengchi Liu, Junfeng Qu Source: CIKM "12 Advisor: Jia-ling Koh Speaker:

VLDB 2012 Mining Frequent Itemsets over Uncertain Databases Yongxin Tong 1, Lei Chen 1, Yurong Cheng 2, Philip S. Yu 3 1 The Hong Kong University of Science.

1 Verifying and Mining Frequent Patterns from Large Windows ICDE2008 Barzan Mozafari, Hetal Thakkar, Carlo Zaniolo Date: 2008/9/25 Speaker: Li, HueiJyun.

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.

Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.

Association Rules. CS583, Bing Liu, UIC 2 Association rule mining Proposed by Agrawal et al in Initially used for Market Basket Analysis to find.

Efficient Data Mining for Calling Path Patterns in GSM Networks Information Systems, accepted 5 December 2002 SPEAKER: YAO-TE WANG ( 王耀德 )

Querying Structured Text in an XML Database By Xuemei Luo.

MINING FREQUENT ITEMSETS IN A STREAM TOON CALDERS, NELE DEXTERS, BART GOETHALS ICDM2007 Date: 5 June 2008 Speaker: Li, Huei-Jyun Advisor: Dr. Koh, Jia-Ling.

Mining High Utility Itemset in Big Data

Alva Erwin Department ofComputing Raj P. Gopalan, and N.R. Achuthan Department of Mathematics and Statistics Curtin University of Technology Kent St. Bentley.

Frequent Subgraph Discovery Michihiro Kuramochi and George Karypis ICDM 2001.

CS 8751 ML & KDDSupport Vector Machines1 Mining Association Rules KDD from a DBMS point of view –The importance of efficiency Market basket analysis Association.

Outline Introduction – Frequent patterns and the Rare Item Problem – Multiple Minimum Support Framework – Issues with Multiple Minimum Support Framework.

Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules S.D. Lee, David W. Cheung, Ben Kao The University of Hong.

Mining Frequent Itemsets from Uncertain Data Presenter : Chun-Kit Chui Chun-Kit Chui [1], Ben Kao [1] and Edward Hung [2] [1] Department of Computer Science.

MINING COLOSSAL FREQUENT PATTERNS BY CORE PATTERN FUSION FEIDA ZHU, XIFENG YAN, JIAWEI HAN, PHILIP S. YU, HONG CHENG ICDE07 Advisor: Koh JiaLing Speaker:

CanTree: a tree structure for efficient incremental mining of frequent patterns Carson Kai-Sang Leung, Quamrul I. Khan, Tariqul Hoque ICDM ’ 05 報告者：林靜怡.

Association Rule Mining

1/24 Novel algorithm for mining high utility itemsets Shankar, S. Purusothaman, T. Jayanthi, S. International Conference on Computing, Communication and.

1 AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Advisor ： Dr. Koh Jia-Ling Speaker ： Tu Yi-Lang Date ： Hong.

Date: 2012/07/02 Source: Marina Drosou, Evaggelia Pitoura (CIKM’11) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor ： Dr. Hsu Graduate ： Yu Cheng Chen Author: Chung-hung.

A Scalable Association Rules Mining Algorithm Based on Sorting, Indexing and Trimming Chuang-Kai Chiou, Judy C. R Tseng Proceedings of the Sixth International.

Using category-Based Adherence to Cluster Market-Basket Data Author : Ching-Huang Yun, Kun-Ta Chuang, Ming-Syan Chen Graduate : Chien-Ming Hsiao.

1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.

1 Mining the Smallest Association Rule Set for Predictions Jiuyong Li, Hong Shen, and Rodney Topor Proceedings of the 2001 IEEE International Conference.

Chapter 3 Data Mining: Classification & Association Chapter 4 in the text box Section: 4.3 (4.3.1),

Searching for Pattern Rules Guichong Li and Howard J. Hamilton Int'l Conf on Data Mining (ICDM),2006 IEEE Advisor ： Jia-Ling Koh Speaker ： Tsui-Feng Yen.

UP-Growth: An Efficient Algorithm for High Utility Itemset Mining

Frequent Pattern Mining

Byung Joon Park, Sung Hee Kim

Chang-Hung Lee, Jian Chih Ou, and Ming Syan Chen, Proc

CARPENTER Find Closed Patterns in Long Biological Datasets

Market Basket Analysis and Association Rules

Mining Frequent Itemsets over Uncertain Databases

Association Rule Mining

A Parameterised Algorithm for Mining Association Rules

Farzaneh Mirzazadeh Fall 2007

COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong

DENSE ITEMSETS JOUNI K. SEPPANEN, HEIKKI MANNILA SIGKDD2004

Junqiang Liu, Rong Zhao, Xiangcai Yang, Yong Zhang, Xiaoning Jiang

Presentation transcript:

Mining Top-K High Utility Itemsets Date: 2013/04/08 Author: Cheng Wei Wu, Bai-En Shie, Philip S. Yu, Vincent S. Tseng Source: KDD ’12 Advisor: Dr. Jia-Ling Koh Speaker: Yi-Hsuan Yeh

Outline Introduction Problem definition Mining top-k high utility −Baseline approach −Effective strategies Experiments Conclusion 2

Introduction 3 Mining high utility intems ： discovery of itemsets with utilities higher than a user-specified minimum utility threshold min_util.

Introduction Setting an appropriate minimum utility threshold is a difficult problem for users. User need to try different threshold by guessing and re-executing. 4 inconvenient time-consuming Setting k is more intuitive than setting the threshold because k represents the number of itemsets that the user wants to find.

Introduction Propose a new framework named TKU(Top-K Utility itemsets mining) for mining top-k high utility itemsets without setting min-util. 5 Transactions UP-Tree top-k high utility itemsets potential top-k high utility itemsets(PKHUIs) UP-Growth

Outline Introduction Problem definition Mining top-k high utility −Baseline approach −Effective strategies Experiments Conclusion 6

Problem definition I = {i 1, i 2, …, i m } ： a set of items. D = {T 1, T 2, …, T n } ： a transaction database where each transaction is a subset of I. p(i j, D) ： external utility (profit), a weight of an item q(i j,T c ) ： internal utility, the quantity of the item sold in the transaction. 7 Ex ： p(A, D) = 5p(C, D) = 1 q(A,T 1 ) = 1q(D,T 3 ) = 6

An itemset X = {i 1, i 2, …, i l } is a set of l distinct items, where i j ∈I, 1 ≤ j ≤ l, and l is the length of X. 8 Ex: X = {B,C} SC(X) = 3 support(X) = 3/5

Ex: 2.u(A,T 1 ) = 5*1 = 5 u(C,T 1 ) = 1*1 = X = {A,C} u(X,T 1 ) = u(A,T 1 ) + u(C,T 1 ) = 6 4.u(X) = u(X,T 1 ) + u(X,T 2 ) + u(X,T 3 ) = 6 + (5*2+1*6) + (5*1+1*1) = 28

10 high utility itemset ： u(X) ≥ min_util (D, min_util) ： the set of high utility itemset in D.

downward closure property ： any subset of a frequent itemset must also be frequent. We can not use the downward closure property to purn the search space. Ex: 11 u(X) = u(X,T 1 ) + u(X,T 2 ) + u(X,T 3 ) = (5*1) + (5*2) + (5*1) = 20 u(Y) = u(Y,T 1 ) + u(Y,T 2 ) + u(Y,T 3 ) = (5*1+1*1) + (5*2+1*6) + (5*1+1*1) = 28 Assume ： X = {A}, Y = {A,C}, min_util = 25 {A} is a low utility item, but its superset {A, C} is a high utility itemset.

12 7.TU(T 1 ) = u(A,T 1 ) + u(C,T 1 ) + u(D,T 1 ) = (5*1) + (1*1) + (2*1) = 8 Assume ： X = {A,C} 8.TWU(X) = TU(T 1 ) + TU(T 2 ) + TU(T 3 ) = = 65

13

Outline Introduction Problem definition Mining top-k high utility −Baseline approach −Effective strategies Experiments Conclusion 14

Mining top-k high utility 1.Baseline approach – Phase I Construction of UP-Tree Generation of potential top-k high utility itemsets(PKHUIs) ： UP-Growth −Phase II Identifying top-k high utility itemsets from PKHUIs 2.Effective strategies 15

Outline Introduction Problem definition Mining top-k high utility −Baseline approach −Effective strategies Experiments Conclusion 16

Phase I UP-Tree structure N.name N.count N.nu (node utility) N.parent N.link 17 Header table

Construction of UP-Tree 1.Build header table 2.Any item’s TWU > min_util will be discarded from transaction 3.Sort each items in every transaction 4.For each transaction build UP-Tree 18

19 Header table TIDTransaction T1(C,1) (A,1) (D,1) T2(C,6) (E,2) (A,2) (G,5) T3(C,1) (E,1) (A,1) (B,2) (D,6) (F,5) T4(C,3) (E,1) (B,4) (D,3) T5(C,2) (E,1) (B,2) (G,2) N.nu ： Sum of the utilities of the items which are before the item. C.nu = 1*1 = 1 A.nu = (1*1) + (5*1) = 6 D.nu = (1*1) + (5*1) + (2*1) = 8 = TU(T1) R C ： 1,1 A ： 1,6 D ： 1,8 count Assume ： min_util = 0

20 Header table TIDTransaction T1(C,1) (A,1) (D,1) T2(C,6) (E,2) (A,2) (G,5) T3(C,1) (E,1) (A,1) (B,2) (D,6) (F,5) T4(C,3) (E,1) (B,4) (D,3) T5(C,2) (E,1) (B,2) (G,2) C.nu = 1 + (1*6) = 7 E.nu = (1*6) + (3*2) = 12 A.nu = (1*6) + (3*2) + (5*2) = 22 G.nu = TU(T 2 ) = 27 R C ： 2,7 A ： 1,6 D ： 1,8 E ： 1,12 A ： 1,22 G ： 1,27

21 Header table TIDTransaction T1(C,1) (A,1) (D,1) T2(C,6) (E,2) (A,2) (G,5) T3(C,1) (E,1) (A,1) (B,2) (D,6) (F,5) T4(C,3) (E,1) (B,4) (D,3) T5(C,2) (E,1) (B,2) (G,2) C.nu = 7 + (1*1) = 8 E.nu = 12 + (1*1) + (3*1) = 16 A.nu = 22 + (1*1) + (3*1) + (5*1) = 31 B.nu = (1*1) + (3*1) + (5*1) + (2*2) = 13 D.nu = (1*1) + (3*1) + (5*1) + (2*2) + (2*6) = 25 F.nu = TU(T 3 ) = 30 R C ： 3,8 A ： 1,6 D ： 1,8 E ： 2,16 A ： 2,31 G ： 1,27 B ： 1,13 D ： 1,25 F ： 1,30

22 Header table TIDTransaction T1(C,1) (A,1) (D,1) T2(C,6) (E,2) (A,2) (G,5) T3(C,1) (E,1) (A,1) (B,2) (D,6) (F,5) T4(C,3) (E,1) (B,4) (D,3) T5(C,2) (E,1) (B,2) (G,2) C.nu = 8 + (1*3) = 11 E.nu = 16 + (1*3) + (3*1) = 22 B.nu = (1*3) + (3*1) + (2*4) = 14 D.nu = TU(T 4 ) = 20 R C ： 4,11 A ： 1,6 D ： 1,8 E ： 3,22 A ： 2,31 G ： 1,27 B ： 1,13 D ： 1,25 F ： 1,30 B ： 1,14 D ： 1,20

23 Header table TIDTransaction T1(C,1) (A,1) (D,1) T2(C,6) (E,2) (A,2) (G,5) T3(C,1) (E,1) (A,1) (B,2) (D,6) (F,5) T4(C,3) (E,1) (B,4) (D,3) T5(C,2) (E,1) (B,2) (G,2) C.nu = 11 + (1*2) = 13 E.nu = 22 + (1*2) + (3*1) = 27 B.nu = 14 + (1*2) + (3*1) + (2*2) = 23 G.nu = TU(T 5 ) = 11 R C ： 5,13 A ： 1,6 D ： 1,8 E ： 4,27 A ： 2,31 G ： 1,27 B ： 1,13 D ： 1,25 F ： 1,30 B ： 2,23 D ： 1,20 G ： 1,11

Generating PKHUIs from the UP- Tree ： UP-Gorwth miu(A) = u(A,T 1 ) = u(X,T 2 ) = 5*1 = 5 MIU(X) = (miu(A) + miu(C))*3 = ((5*1) + (1*1))*3 = Ex ： Assume ： X = {A,C}

25 mau(A) = u(A,T 2 ) = 5*2 = 10 MAU(X) = (mau(A) + mau(C))*3 = ((5*2) + (1*6))*3 = 48 Ex ： Assume ： X = {A,C}

For any itemset X, MIU(X) ≤ u(X) ≤ min{MAU(X),TWU(X)}. 26 Strategy 1. Raising the threshold by MIU of Candidate (MC)

border minimum utility threshold (denoted as border_min_util) which is initially set to 0 and raised dynamically after a sufficient number of itemsets with higher utilities has been captured during the generation of PKHUIs. border_min_util can be the k th highest utility of the 1-item. 27 u(A) = u(A,T 1 ) + u(A,T 2 ) + u(A,T 3 ) = (5*1) + (5*2) + (5*1) = 20 u(B) = u(B,T 3 ) + u(B,T 4 ) + u(B,T 5 ) = (2*2) + (2*4) + (2*2) = 16 u(C) = (1*1) + (1*6) + (1*1) + (1*3) +(1*2) = 13 u(D) = (2*1) + (2*6) + (2*3) = 20 u(E) = (3*2) + (3*1) + (3*1) + (3*1) = 15 u(F) = (1*5) = 5 u(G) = (1*5 ) + (1*2) = 7 ABCDEFG Assume ： k = 4

28 Header table Assume ： border_min_util = 15 conditional UP-Tree for F ： ItemABCDE Path utility ItemABCDEFG miu mau TWU(DF) = TU(T 3 ) = 30, MAX(DF) = (12+5)*1 = 17 MIU(DF) = (2+5)*1 = 7 < border_min_util = 15 TWU(AF) = TU(T 3 ) = 30, MAX(AF) = (10+5)*1 = 15 MIU(AF) = (5+5)*1 = 10 < border_min_util = 15 ΧTWU(EF) = TU(T 3 ) = 30, MAX(EF) = (6+5)*1 = 11 PKHUIs = ( {AF}, {DF} ) ΧTWU(F) = TU(T 3 ) = 30, MAX(F) = 5*1 = 5

29 ItemABCDEFG miu mau TWU(EAF) = TU(T 3 ) = 30 MAX(EAF) = (6+10+5) = 21 MIU(EAF) = (3+5+5)*1 = 13 < border_min_util = 15 PKHUIs = ( {AF}, {DF}, {EAF} ) conditional UP-Tree for AF ： border_min_util = 15 conditional UP-Tree for DF ： TWU(ADF) = TU(T 3 ) = 30 MAX(ADF) = ( ) = 27 MIU(ADF) = (5+2+5)*1 = 12 < border_min_util = 15 TWU(EDF) = TU(T 3 ) = 30 MAX(EDF) = (6+12+5) = 23 MIU(EDF) = (3+2+5)*1 = 10 < border_min_util = 15 PKHUIs = ( {AF}, {DF}, {EAF}, {ADF},{EDF} ) AF DF

30 ItemABCDEFG miu mau border_min_util = 15 TWU(EADF) = TU(T 3 ) = 30 MAX(EADF) = ( ) = 33 MIU(EADF) = ( )*1 = 15 ≤ border_min_util = 15 PKHUIs = ( {AF}, {DF}, {EAF}, {ADF},{EDF},{EADF} ) conditional UP-Tree for ADF ： ADF

Phase II Identifying top-k high utility itemsets from PKHUIs Exact utilities of PKHUIs are identified and top- k high utility itemsets are examined by scanning the original database. 31

Outline Introduction Problem definition Mining top-k high utility −Baseline approach −Effective strategies Experiments Conclusion 32

Effective strategies – Phase I Strategy 2. Pre-Evaluation (PE) Raise border_min_util at pre-evaluation step, before construct UP-Tree. 33 u(AC) T 1 ： u(AC,T 1 ) = (5*1) + (1*1) = 6 u(AD,T 1 ) = (5*1) + (2*1) = 7 T 2 ： u(AC,T 2 ) = (5*2) + (1*6) = 16 u(AE,T 2 ) = (5*2) + (3*2) = 16 u(AG,T 2 ) = (5*2) + (1*5) = 15 T 3 ： u(AB,T 3 ) = 9, u(AC,T 3 ) = 6, u(AD,T 3 ) = 17 u(AE,T 3 ) = 8, u(AF,T 3 ) = 10 T 4 ： u(BC,T 4 ) = 11, u(BD,T 4 ) = 14, u(BE,T 4 ) = 11 T 5 ： u(BC,T 5 ) = 6, u(BE,T 4 ) = 7, u(BG,T 5 ) = 6 u(AC) = = 28 If k = 4, border_min_util = 18

Strategy 3. Raising the threshold by Node Utilities (NU) during the construction of the UP-Tree 1.If there are more than k nodes in the current UPTree 2.k-th highest node utility ≥ border_min_util border_min_util = k-th highest node utility 34 Strategy 4. Raising the threshold by MIU of Descendents (MD) after the construction of UP-Tree and before the generation of PKHUIs. 1.For each node N α under the root in the UP-Tree, calculate the MIU with its descendent node N β 2.If there are more than k MIUs larger than border_min_util border_min_util = k-th highest MIU(N α N β )

Strategy 5. Sorting candidates & raising threshold by the exact utility of candidates. (SE) 1.The candidates are sorted by the descendent order of estimated utilities, i.e., min(TWU(X), MAU(X)) 2.If there are more than k HUIs whose exact utilities are larger than border_min_util border_min_util = k-th highest exact utility 3.If candidate Z’s min(TWU(Z), MAU(Z)) < border_min_util remaining candidates do not need to be checked 35 Effective strategies – Phase II

Outline Introduction Problem definition Mining top-k high utility −Baseline approach −Effective strategies Experiments Conclusion 36

Datasets 37 Foodmart and Chainstore contain unit profits and purchased quantities. Mushroom unit profits are generated between 1 and 1000 by log-normal distribution and quantities are generated randomly between 1 and 5.

Experiments 38 UP Optimal ： the threshold is optimal minimum utility threshold.

Foodmart 39

Mushroom 40 TWU values of itemsets are much larger than their exact utilities.

Chainstore 41 UP Low ： UP-Growth with a low minimum utility threshold.

42

Outline Introduction Problem definition Mining top-k high utility −Baseline approach −Effective strategies Experiments Conclusion 43

Conclusion 44 An efficient algrorithm named TKU for mining top-k high utility itemsets. Four strategies for phase I to raise the border minimum utility threshold and reduce the search space and number of generated canditions. A strategy is designed for phase II to decrease the number of checked candidations.