1/24 Novel algorithm for mining high utility itemsets Shankar, S. Purusothaman, T. Jayanthi, S. International Conference on Computing, Communication and.

1 1/24 Novel algorithm for mining high utility itemsets. Shankar, S.; Purusothaman, T.; Jayanthi, S. International Conference on Computing, Communication and Networking (ICCCN 2008), 18-20 Dec. 2008, pp. 1-6. Speakers: 89621003 廖執善, 69721042 鄭仁傑

2 2/24 Outline: Introduction; Mining high utility itemsets; Existing Umining algorithm; Proposed FUM algorithm; Experimental Results; Conclusions; Future Work

3 3/24 Introduction (1/4) One of the important issues in data mining is the interestingness problem. The fundamental idea behind mining frequent itemsets is that only itemsets with high frequency are of interest to users. However, a frequent itemset reflects only the statistical correlation between items; it does not reflect their semantic significance.

4 4/24 Introduction (2/4)

5 5/24 Introduction (3/4)

6 6/24 Introduction (4/4) Motivation: we use a utility-based itemset mining approach to overcome this limitation. Utility-based data mining is a new research area that considers all types of utility factors in data mining processes and aims to incorporate utility considerations into data mining tasks. High utility itemset mining is a branch of utility-based data mining that aims to find itemsets that contribute high utility.

7 7/24 Mining high utility itemsets (1/3) A frequent itemset is a set of items that appears in at least a pre-specified number of transactions. Formally, let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items and $DB = \{T_1, T_2, \ldots, T_n\}$ a set of transactions, where every transaction is also a set of items (i.e. an itemset). Given a minimum support threshold minSup, an itemset $S$ is frequent iff the number of transactions containing it reaches the threshold: $|\{T \in DB : S \subseteq T\}| \ge minSup$.
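The frequency test above can be sketched in a few lines of Python. This is only an illustration: the function names `support_count` and `is_frequent` and the four-transaction database are hypothetical and do not come from the slides.

```python
from typing import FrozenSet, Iterable, List


def support_count(itemset: Iterable[str], db: List[FrozenSet[str]]) -> int:
    """Number of transactions T in db with itemset as a subset of T."""
    s = frozenset(itemset)
    return sum(1 for t in db if s <= t)


def is_frequent(itemset: Iterable[str], db: List[FrozenSet[str]], min_sup: int) -> bool:
    """An itemset is frequent iff its support count is at least min_sup."""
    return support_count(itemset, db) >= min_sup


# Hypothetical database (assumption, not from the slides).
db = [frozenset("ABC"), frozenset("AC"), frozenset("BD"), frozenset("ACD")]

print(support_count("AC", db))   # {A, C} occurs in three transactions
print(is_frequent("BD", db, 2))  # {B, D} occurs in only one transaction
```

Note that the test looks only at how often items co-occur; quantities and profits play no role, which is the limitation the utility definitions address.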

8 8/24 Mining high utility itemsets (2/3) The following is the set of definitions given in [6], which we shall illustrate on a small example.
Definition 1: The external utility of an item $i_p$ is a numerical value $y_p$ defined by the user. It is transaction independent and reflects the importance (usually profit) of the item. External utilities are stored in a utility table. For example, the external utility of item B in Table 2 is 10.
Definition 2: The internal utility of an item $i_p$ is a numerical value $x_p$ which is transaction dependent. In most cases it is defined as the quantity of the item in a transaction. For example, the internal utility of item E in transaction T5 is 2 (see Table 1).
Definition 3: A utility function $f$ is a function of two variables, $f(x, y) : (\mathbb{R}^+, \mathbb{R}^+) \to \mathbb{R}^+$. The most common form, also used in this paper, is the product of internal and external utility: $x_p \times y_p$.

9 9/24 Mining high utility itemsets (3/3)
Definition 4: The utility of item $i_p$ in transaction $T$ is the quantitative measure computed with the utility function from Definition 3, i.e. $u(i_p, T) = f(x_p, y_p)$, $i_p \in T$.
Definition 5: The utility of itemset $S$ in transaction $T$ is defined as $u(S, T) = \sum_{i_p \in S} u(i_p, T)$, $S \subseteq T$.
Definition 6: Itemset $S$ is of high utility iff $U(S) \ge minUtil$, where $U(S) = \sum_{T \in DB,\ S \subseteq T} u(S, T)$ is the utility of $S$ over the whole database and minUtil is a user-defined utility threshold, given as a percentage of the total utility of the database.
Definition 7: High utility itemset mining is the problem of finding the set $H = \{S \mid S \subseteq I,\ U(S) \ge minUtil\}$, where $I$ is the set of items (attributes).
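To make the definitions concrete, here is a minimal brute-force sketch in Python. It assumes the product form of the utility function and, for simplicity, an absolute threshold rather than a percentage of the total utility. All function names and the three-transaction database are hypothetical; the enumeration is exponential and is meant only to illustrate the definitions, not any of the algorithms in this presentation.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Transaction = Dict[str, int]  # item -> internal utility (quantity)


def item_utility(item: str, t: Transaction, external: Dict[str, int]) -> int:
    """Definition 4 with f(x, y) = x * y."""
    return t[item] * external[item]


def itemset_utility_in_transaction(itemset: Tuple[str, ...], t: Transaction,
                                   external: Dict[str, int]) -> int:
    """Definition 5: sum of item utilities for an itemset contained in t."""
    return sum(item_utility(i, t, external) for i in itemset)


def itemset_utility(itemset: Tuple[str, ...], db: List[Transaction],
                    external: Dict[str, int]) -> int:
    """Utility over the database: only transactions containing the whole itemset count."""
    return sum(itemset_utility_in_transaction(itemset, t, external)
               for t in db if all(i in t for i in itemset))


def high_utility_itemsets(db: List[Transaction], external: Dict[str, int],
                          min_util: int) -> Dict[Tuple[str, ...], int]:
    """Definition 7 by exhaustive enumeration of every non-empty itemset."""
    items = sorted(external)
    result = {}
    for k in range(1, len(items) + 1):
        for s in combinations(items, k):
            u = itemset_utility(s, db, external)
            if u >= min_util:
                result[s] = u
    return result


# Hypothetical data (assumption, not Table 1 or Table 2 of the paper).
external = {"A": 2, "B": 10, "C": 1}
db = [{"A": 1, "B": 2}, {"B": 1, "C": 4}, {"A": 3, "C": 1}]

print(high_utility_itemsets(db, external, min_util=20))
```

In this toy database {A} has utility 8 while its superset {A, B} has utility 22, so utility is not downward closed the way support is. That is why utility mining relies on upper bounds for pruning instead of the plain Apriori property.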

10 10/24 Existing Umining algorithm (1/4)

11 11/24 Existing Umining algorithm (2/4)

12 12/24 Existing Umining algorithm (3/4)

13 13/24 Existing Umining algorithm (4/4)
minUtil = 196 (threshold), k = 4 (number of levels)
Level 1: The Scan function yields I = {A, B, C, D}, which is assigned to C1. Calculate-and-store gives u(A)=110, u(B)=200, u(C)=190, u(D)=85. Discover gives H = {B}, the only itemset whose utility exceeds minUtil.
Level 2: The Generation function yields I = {AB, AC, AD, BC, BD, CD}. Prune computes b(AB)=310, b(AC)=300, b(AD)=195, b(BC)=390, b(BD)=285, b(CD)=275. Because b(AD) < minUtil, AD is omitted, so C2 = {AB, AC, BC, BD, CD}. Calculate-and-store gives u(AB)=105, u(AC)=197, u(BC)=138, u(BD)=211, u(CD)=193. Discover gives H = {AC, BD}, the itemsets whose utility exceeds minUtil.
Level 3: The Generation function yields I = {ABC, ABD, ACD, BCD}. Prune computes b(ABC)=220, b(ABD)=225.5, b(ACD)=262.5, b(BCD)=271; none is omitted. Calculate-and-store gives u(ABC)=143, u(ABD)=106, u(ACD)=150, u(BCD)=139. Discover gives H = {}.
Level 4: The Generation function yields I = {ABCD}. Prune computes b(ABCD)=179.3; because b(ABCD) < minUtil, it is omitted and no candidates remain.
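The bound values b(.) in this walkthrough are consistent with the following rule, which is an assumption here because the slide does not state it: the bound of a k-itemset is the sum of the utilities of its (k-1)-subsets divided by (k-1). The Python sketch below recomputes the bounds that can be derived from the utilities listed on the slide. The name `upper_bound` is hypothetical. Since u(AD) is not listed, b(ABD) and b(ACD) are not recomputed.

```python
from itertools import combinations
from typing import Dict

MIN_UTIL = 196


def upper_bound(itemset: str, utilities: Dict[str, float]) -> float:
    """Assumed Umining-style bound: sum of u over (k-1)-subsets, divided by (k-1)."""
    k = len(itemset)
    subsets = combinations(sorted(itemset), k - 1)
    return sum(utilities["".join(s)] for s in subsets) / (k - 1)


# Utilities copied from the walkthrough (u(AD) is not given on the slide).
utilities = {
    "A": 110, "B": 200, "C": 190, "D": 85,
    "AB": 105, "AC": 197, "BC": 138, "BD": 211, "CD": 193,
    "ABC": 143, "ABD": 106, "ACD": 150, "BCD": 139,
}

# Level 2: keep only the pairs whose bound reaches the threshold.
pairs = ["".join(p) for p in combinations("ABCD", 2)]
level2 = [p for p in pairs if upper_bound(p, utilities) >= MIN_UTIL]
print(level2)                          # AD is dropped, its bound is 195

print(upper_bound("ABC", utilities))   # 220.0
print(upper_bound("BCD", utilities))   # 271.0
print(round(upper_bound("ABCD", utilities), 1))  # 179.3, below 196
```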

14 14/24 Proposed FUM algorithm (1/2)

15 15/24 Proposed FUM algorithm (2/2) Candidate set = {A, B, C, D, E, AB, AC, AD, AE, BC, BD, BE, CD, CE, DE, ACD, ACE, ADE, BCE, BDE, CDE, ACDE}, 22 itemsets in total

16 16/24 TWU tree Mining algorithm (1/2)
TID   A   B   C   D   E   TWU
T1    0   0  16   0   1    21
T2    0  12   0   2   1    71
T3    2   0   1   0   1    12
T4    1   0   0   2   1    14
T5    0   0   4   0   2    14
T6    1   2   0   0   0    13
T7    0  20   0   2   1   111
T8    3   0  25   6   1    57
T9    1   2   0   0   0    13
T10   0  12   2   0   2    72

Item      A   B   C   D   E
Benefit   3   5   1   3   5

TWU of T1 = 16*1 + 1*5 = 21
TWU of T2 = 12*5 + 2*3 + 1*5 = 71
…
Reference: A Novel Algorithm for Mining High Utility Itemsets
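The TWU column can be recomputed from the quantities and benefits on this slide. The short Python sketch below does that; the dictionary layout and the name `transaction_utility` are illustrative choices, not part of the original material. Zero quantities are simply left out of each transaction.

```python
from typing import Dict

# Benefit (external utility) per item, as read from the slide.
BENEFIT = {"A": 3, "B": 5, "C": 1, "D": 3, "E": 5}

# Quantities per transaction, as read from the slide (zeros omitted).
DB = {
    "T1": {"C": 16, "E": 1},
    "T2": {"B": 12, "D": 2, "E": 1},
    "T3": {"A": 2, "C": 1, "E": 1},
    "T4": {"A": 1, "D": 2, "E": 1},
    "T5": {"C": 4, "E": 2},
    "T6": {"A": 1, "B": 2},
    "T7": {"B": 20, "D": 2, "E": 1},
    "T8": {"A": 3, "C": 25, "D": 6, "E": 1},
    "T9": {"A": 1, "B": 2},
    "T10": {"B": 12, "C": 2, "E": 2},
}


def transaction_utility(t: Dict[str, int], benefit: Dict[str, int]) -> int:
    """Sum of quantity * benefit over the items of one transaction."""
    return sum(qty * benefit[item] for item, qty in t.items())


tu = {tid: transaction_utility(t, BENEFIT) for tid, t in DB.items()}
for tid, value in tu.items():
    print(tid, value)
```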

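17 17/24 TWU tree Mining algorithm (2/2) If min_util = 130, the WIT-tree for TWU-Mining is built over the same database (TWU per transaction: T1 21, T2 71, T3 12, T4 14, T5 14, T6 13, T7 111, T8 57, T9 13, T10 72; benefit: A 3, B 5, C 1, D 3, E 5).
TWU(A) = 12 + 14 + 13 + 57 + 13 = 109 < 130, so A is pruned.
Nodes under Root (itemset × tidset, TWU): B × {2, 6, 7, 9, 10}, 280; C × {1, 3, 5, 8, 10}, 176; D × {2, 4, 7, 8}, 253; E, 372.
Nodes under B: BD × {2, 7}, 182; BE × {2, 7, 10}, 254; BDE × {2, 7}, 182.
Utilities shown on the nodes: B 240, BD 172, BE 240, BDE 182, C 48, D 36, E 50, CE 83, DE 56.
HUIs = {B (240), BD (172), BE (240), BDE (182)}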
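The pruning on this slide can be reproduced with a small depth-first search over (itemset, tidset) pairs. The Python sketch below is written in the spirit of the WIT-tree and is not the authors' implementation: the function names are hypothetical, items are processed in alphabetical order rather than any order the paper may prescribe, and the database is repeated so that the block runs on its own.

```python
from typing import Dict, FrozenSet, List, Tuple

BENEFIT = {"A": 3, "B": 5, "C": 1, "D": 3, "E": 5}
DB = {
    1: {"C": 16, "E": 1}, 2: {"B": 12, "D": 2, "E": 1},
    3: {"A": 2, "C": 1, "E": 1}, 4: {"A": 1, "D": 2, "E": 1},
    5: {"C": 4, "E": 2}, 6: {"A": 1, "B": 2},
    7: {"B": 20, "D": 2, "E": 1}, 8: {"A": 3, "C": 25, "D": 6, "E": 1},
    9: {"A": 1, "B": 2}, 10: {"B": 12, "C": 2, "E": 2},
}
# Utility of each whole transaction.
TU = {tid: sum(q * BENEFIT[i] for i, q in t.items()) for tid, t in DB.items()}

Node = Tuple[Tuple[str, ...], FrozenSet[int]]  # (itemset, tidset)


def tidset(item: str) -> FrozenSet[int]:
    """Identifiers of the transactions that contain the item."""
    return frozenset(tid for tid, t in DB.items() if item in t)


def twu(tids: FrozenSet[int]) -> int:
    """Transaction-weighted utility: sum of TU over the tidset."""
    return sum(TU[tid] for tid in tids)


def utility(itemset: Tuple[str, ...], tids: FrozenSet[int]) -> int:
    """Exact utility of the itemset over the transactions that contain it."""
    return sum(DB[tid][i] * BENEFIT[i] for tid in tids for i in itemset)


def mine(min_util: int) -> Dict[str, int]:
    huis: Dict[str, int] = {}

    def extend(nodes: List[Node]) -> None:
        for idx, (items, tids) in enumerate(nodes):
            u = utility(items, tids)
            if u >= min_util:
                huis["".join(items)] = u
            children: List[Node] = []
            for other_items, other_tids in nodes[idx + 1:]:
                common = tids & other_tids
                if twu(common) >= min_util:  # prune on TWU, not on utility
                    children.append((items + other_items[-1:], common))
            extend(children)

    roots = [((i,), tidset(i)) for i in sorted(BENEFIT)]
    extend([(s, t) for s, t in roots if twu(t) >= min_util])
    return huis


print(twu(tidset("A")))  # 109, so A is pruned at min_util = 130
print(mine(130))
```

Pruning uses TWU because it never underestimates the utility of an itemset or of any of its supersets, so discarding a node with TWU below the threshold cannot lose a high utility itemset.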

18 18/24 Experimental Results (1/4)

19 19/24 Experimental Results (2/4)

20 20/24 Experimental Results (3/4)
Database   #Trans   #Items   Remark
BMS-POS    515597   1656     Modified
Retails    88162    16469    Modified

Experimental table in BMS-POS database
Minutil (%)   Two-phase   TWU-Mining   #HUIs
5             51.89       27.59        4
4             73.73       39.05        6
3             117.72      55.67        7
2             205.09      95.56        22
1             569.22      182.67       161

21 21/24 Experimental Results (4/4)
Experimental table in Retails database
Minutil (%)   Two-phase   TWU-Mining   #HUIs
1             7.67        7.46         20
0.8           11.38       11.31        29
0.6           24.63       23.23        45
0.4           60.25       57.69        64
0.2           210.78      178.19       239
0.1           546.03      426.27       800

22 22/24 Conclusions Utility-based itemset mining discovers the itemsets that are significant according to their utility values. Utility constraints are capable of expressing more complex semantics than the support measure. In this paper we have shown that the proposed FUM algorithm executes faster than the existing Umining algorithm (see Table III) when more itemsets are identified as high utility itemsets.

23 23/24 Future Work
A Fast Algorithm for Mining High Utility Itemsets, 2009 IEEE International Advance Computing Conference (IACC2009), Patiala, India, 6-7 March 2009.
We have also suggested a novel method of generating different types of itemsets, namely High Utility and High Frequency itemsets (HUHF), High Utility and Low Frequency itemsets (HULF), Low Utility and High Frequency itemsets (LUHF) and Low Utility and Low Frequency itemsets (LULF), using a combination of the FUM and Fast Utility Frequent mining (FUFM) algorithms.
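The four categories can be read as the combination of two threshold tests. The sketch below assumes that "high utility" means utility at or above a utility threshold and "high frequency" means support at or above a support threshold. The slide does not describe how FUM and FUFM are actually combined, so the function `classify` and its parameters are purely illustrative.

```python
def classify(utility: float, support: int, min_util: float, min_sup: int) -> str:
    """Label an itemset HUHF, HULF, LUHF or LULF from two assumed threshold tests."""
    utility_part = "HU" if utility >= min_util else "LU"
    frequency_part = "HF" if support >= min_sup else "LF"
    return utility_part + frequency_part


# Hypothetical thresholds and itemset statistics.
print(classify(utility=240, support=5, min_util=130, min_sup=3))  # HUHF
print(classify(utility=172, support=2, min_util=130, min_sup=3))  # HULF
print(classify(utility=50, support=8, min_util=130, min_sup=3))   # LUHF
print(classify(utility=36, support=2, min_util=130, min_sup=3))   # LULF
```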

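24 24/24 Question: Why does Umining find fewer #HUIs than FUM? In principle, FUM's candidate set should be smaller than Umining's, so why does it mine more #HUIs? If FUM's candidate set were larger than Umining's, Umining would be missing some itemsets, and only then would the result make sense.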

25 25/24 Thank you, everyone!

