Association Mining
Data Mining, Spring 2012
Transactional Database
Transaction: a row in the database, e.g. {Eggs, Cheese, Milk}

Transactional dataset:
T1: Eggs, Cheese, Milk
T2: Milk, Jam
T3: Cheese, Bacon, Eggs, Cat food
T4: Butter, Bread
T5: Bread, Butter, Eggs, Milk, Cheese
Items and Itemsets
Item: a single item, e.g. {Milk}, {Cheese}, {Bread}
Itemset: a set of items, e.g. {Milk}, {Milk, Cheese}, {Bacon, Bread, Milk}
An itemset doesn't have to appear in the dataset and can be of size 1 to n.
(Transactional dataset as above.)
The Support Measure
Support(X) = (# of transactions containing X) / (total # of transactions in the database)
Support Examples
Support({Eggs}) = 3/5 = 60%
Support({Eggs, Milk}) = 2/5 = 40%
(Transactional dataset as above.)
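Support is easy to compute directly. A minimal Python sketch (an illustration added here, not code from the slides) over the toy dataset above:

```python
# Toy transactional dataset from the slides.
dataset = [
    {"Eggs", "Cheese", "Milk"},
    {"Milk", "Jam"},
    {"Cheese", "Bacon", "Eggs", "Cat food"},
    {"Butter", "Bread"},
    {"Bread", "Butter", "Eggs", "Milk", "Cheese"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

print(support({"Eggs"}, dataset))          # 0.6
print(support({"Eggs", "Milk"}, dataset))  # 0.4
```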
Minimum Support
Minsup: the user-defined minimum support threshold for an itemset to be considered frequent.
Frequent itemset: an itemset in a database whose support is greater than or equal to minsup.
Support(X) ≥ minsup: frequent
Support(X) < minsup: infrequent
Minimum Support Examples
Minimum support = 50%
Support({Eggs}) = 3/5 = 60%: pass
Support({Eggs, Milk}) = 2/5 = 40%: fail
(Transactional dataset as above.)
Association Rules
An association rule X => Y says that transactions containing itemset X tend to also contain itemset Y.
Confidence(X => Y) = support(X ∪ Y) / support(X)
Confidence Example 1
{Eggs} => {Bread}
Confidence = sup({Eggs, Bread}) / sup({Eggs}) = (1/5) / (3/5) = 33%
(Transactional dataset as above.)
Confidence Example 2
{Milk} => {Eggs, Cheese}
Confidence = sup({Milk, Eggs, Cheese}) / sup({Milk}) = (2/5) / (3/5) = 66%
(Transactional dataset as above.)
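Confidence follows directly from the support() helper in the sketch above (again an illustrative sketch, not the slides' code):

```python
def confidence(antecedent, consequent, transactions):
    """conf(X => Y) = support(X ∪ Y) / support(X)."""
    x, y = set(antecedent), set(consequent)
    return support(x | y, transactions) / support(x, transactions)

print(confidence({"Eggs"}, {"Bread"}, dataset))           # 0.333... (33%)
print(confidence({"Milk"}, {"Eggs", "Cheese"}, dataset))  # 0.666... (66%)
```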
Strong Association Rules
Minimum confidence (minconf): a user-defined lower bound on confidence.
Strong association rule: a rule X => Y whose confidence is at least minconf; this is a potentially interesting rule for the user.
Conf(X => Y) ≥ minconf: strong
Conf(X => Y) < minconf: uninteresting
Minimum Confidence Example
Minconf = 50%
{Eggs} => {Bread}: confidence = (1/5) / (3/5) = 33%: fail
{Milk} => {Eggs, Cheese}: confidence = (2/5) / (3/5) = 66%: pass
Association Mining
Association mining finds the strong rules contained in a dataset from its frequent itemsets. It divides into two major subtasks:
1. Finding frequent itemsets
2. Rule generation
Transactional Database Revisited
Some algorithms map items to letters or numbers: numbers are more compact and easier to compare.
With Eggs = 1, Cheese = 2, Milk = 3, Cat food = 4, Jam = 5, Butter = 6, Bacon = 7, Bread = 8, the dataset becomes:
T1: {1, 2, 3}
T2: {3, 5}
T3: {2, 7, 1, 4}
T4: {6, 8}
T5: {8, 6, 1, 3, 2}
Basic Set Logic
Subset: an itemset X is a subset of an itemset Y if Y contains every item of X.
Superset: an itemset Y is a superset of an itemset X if Y contains X.
Example: X = {1,2}, Y = {1,2,3,5}, so X ⊆ Y (and Y ⊇ X).
Apriori
Organizes the itemsets of the database into a lattice structure to find associations.
Apriori principle:
1. Itemsets in the lattice with support < minsup only produce supersets with support < minsup.
2. The subsets of frequent itemsets are always frequent.
Pruning infrequent itemsets from the lattice using minsup reduces both the number of candidate itemsets and the number of comparisons.
Monotonicity
Monotone (upward closed): a measure f where X ⊆ Y implies f(X) ≤ f(Y).
Anti-monotone (downward closed): a measure f where X ⊆ Y implies f(Y) ≤ f(X).
Support is anti-monotone: if X is a subset of Y, then support(Y) cannot exceed support(X). Apriori uses this property to prune the lattice structure.
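The anti-monotone property is easy to check on the toy dataset, reusing the support() helper and dataset from the first sketch (illustrative only):

```python
# Anti-monotone: X ⊆ Y implies support(Y) <= support(X).
X = {"Eggs"}
Y = {"Eggs", "Milk"}
assert support(Y, dataset) <= support(X, dataset)  # 0.4 <= 0.6
```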
Itemset Lattice
Lattice Pruning
Lattice Example
Database: {1,2,3,4,5}, {2,4}, {1,2,4}, {1,4}
Count occurrences of each 1-itemset in the database and compute their support:
Support = #occurrences / #rows in db
Prune anything with support less than minsup = 30%: {3} (25%) and {5} (25%) are pruned; {1} (75%), {2} (75%), {4} (100%) survive.
Lattice Example
Count occurrences of each 2-itemset built from the surviving items and compute their support: {1,2} (50%), {1,4} (75%), {2,4} (75%).
Prune anything with support less than minsup = 30%: all three survive.
Lattice Example
Count occurrences of the last remaining 3-itemset, {1,2,4}, in the database and compute its support: 50%.
Prune if below minsup = 30%: it survives.
Example - Results
Frequent itemsets: {1}, {2}, {4}, {1,2}, {1,4}, {2,4}, {1,2,4}
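A compact level-wise sketch of this lattice walk (an illustration in Python, not the slides' code; candidate generation here is the simple join of frequent k-itemsets):

```python
db = [{1, 2, 3, 4, 5}, {2, 4}, {1, 2, 4}, {1, 4}]
minsup = 0.30

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

frequent = []
level = [frozenset([i]) for i in sorted(set().union(*db))]
while level:
    # Keep only frequent itemsets at this level (prune the rest of the lattice).
    level = [c for c in level if support(c, db) >= minsup]
    frequent.extend(level)
    # Candidate (k+1)-itemsets: unions of two frequent k-itemsets differing by one item.
    level = list({a | b for a in level for b in level if len(a | b) == len(a) + 1})

print(sorted(sorted(s) for s in frequent))
# [[1], [1, 2], [1, 2, 4], [1, 4], [2], [2, 4], [4]]
```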
Apriori Algorithm
Frequent Itemset Generation
Transactional Database: {1,2,3,4,5}, {2,3,5}, {1,3,5}, {1,5}
1. Minsup = 50%
2. Generate all 1-itemsets
3. Calculate the support of each itemset
4. Determine whether or not each itemset is frequent

Itemset  Support  Frequent
{1}      75%      Yes
{2}      50%      Yes
{3}      75%      Yes
{4}      25%      No
{5}      100%     Yes
Frequent Itemset Generation
Generate all 2-itemsets from the frequent 1-itemsets, minsup = 50%:
{1} ∪ {2}, {1} ∪ {3}, {1} ∪ {5}, {2} ∪ {3}, {2} ∪ {5}, {3} ∪ {5}

Itemset  Support  Frequent
{1,2}    25%      No
{1,3}    50%      Yes
{1,5}    75%      Yes
{2,3}    50%      Yes
{2,5}    50%      Yes
{3,5}    75%      Yes
Frequent Itemset Generation
Generate all 3-itemsets from the frequent 2-itemsets, minsup = 50%:
{1,3} ∪ {1,5} = {1,3,5}, {2,3} ∪ {2,5} = {2,3,5}

Itemset  Support  Frequent
{1,3,5}  50%      Yes
{2,3,5}  50%      Yes
Frequent Itemset Results
All frequent itemsets generated are output:
{1}, {2}, {3}, {5}
{1,3}, {1,5}, {2,3}, {2,5}, {3,5}
{1,3,5}, {2,3,5}
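Running the level-wise sketch above on this database reproduces the same output, which is a handy sanity check (again an illustration, not the slides' code):

```python
db = [{1, 2, 3, 4, 5}, {2, 3, 5}, {1, 3, 5}, {1, 5}]
minsup = 0.50
# Re-running the level-wise loop from the previous sketch now prints:
# [[1], [1, 3], [1, 3, 5], [1, 5], [2], [2, 3], [2, 3, 5], [2, 5], [3], [3, 5], [5]]
```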
Apriori Rule Mining
Rule Combinations
1. 2-itemsets, e.g. {1,2}:
{1} => {2}, {2} => {1}
2. 3-itemsets, e.g. {1,2,3}:
{1} => {2,3}, {2,3} => {1}
{1,2} => {3}, {3} => {1,2}
{1,3} => {2}, {2} => {1,3}
Every split of an itemset into a non-empty antecedent and consequent is a candidate rule (see the sketch below).
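The candidate splits can be enumerated with itertools (an illustrative sketch; rule_splits is a name introduced here, not from the slides):

```python
from itertools import combinations

def rule_splits(itemset):
    """Yield every split of `itemset` into (antecedent, consequent),
    both non-empty."""
    items = set(itemset)
    for r in range(1, len(items)):
        for ante in combinations(sorted(items), r):
            ante = set(ante)
            yield ante, items - ante

for x, y in rule_splits({1, 2, 3}):
    print(sorted(x), "=>", sorted(y))   # six candidate rules
```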
Strong Rule Generation
Transactional Database: {1,2,3,4,5}, {2,3,5}, {1,3,5}, {1,5}
1. Rules X => Y built over the frequent items {1}, {3}, {5}
2. Minconf = 80%

Rule      Confidence  Strong
{1}=>{3}  66%         No
{3}=>{1}  66%         No
{1}=>{5}  100%        Yes
{5}=>{1}  75%         No
{3}=>{5}  100%        Yes
{5}=>{3}  75%         No
Strong Rule Generation
Rules generated from the frequent 3-itemset {2,3,5}, minconf = 80%:

Rule        Confidence  Strong
{2}=>{3,5}  100%        Yes
{3,5}=>{2}  66%         No
{2,3}=>{5}  100%        Yes
{5}=>{2,3}  50%         No
{2,5}=>{3}  100%        Yes
{3}=>{2,5}  66%         No
Strong Rules Results
The strong rules found above are output:
{1}=>{5}, {3}=>{5}
{2}=>{3,5}, {2,3}=>{5}, {2,5}=>{3}
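Putting the pieces together, a self-contained sketch of confidence-based rule filtering over a few of the frequent itemsets above (illustrative only; strong_rules is a name chosen here):

```python
from itertools import combinations

db = [{1, 2, 3, 4, 5}, {2, 3, 5}, {1, 3, 5}, {1, 5}]

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def strong_rules(freq_itemsets, transactions, minconf):
    """Yield X => Y with conf >= minconf for each frequent itemset X ∪ Y."""
    for itemset in map(set, freq_itemsets):
        whole = support(itemset, transactions)
        for r in range(1, len(itemset)):
            for x in map(set, combinations(sorted(itemset), r)):
                conf = whole / support(x, transactions)
                if conf >= minconf:
                    yield sorted(x), sorted(itemset - x), conf

for x, y, c in strong_rules([{1, 5}, {3, 5}, {2, 3, 5}], db, 0.80):
    print(x, "=>", y, f"(confidence {c:.0%})")
# [1] => [5] (confidence 100%)
# [3] => [5] (confidence 100%)
# [2] => [3, 5] (confidence 100%)
# [2, 3] => [5] (confidence 100%)
# [2, 5] => [3] (confidence 100%)
```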
Other Frequent Itemsets
Closed frequent itemset: a frequent itemset X that has no immediate superset with the same support count as X.
Maximal frequent itemset: a frequent itemset none of whose immediate supersets are frequent.
Itemset Relationships
Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets
Targeted Association Mining
* Users may only be interested in specific results.
* Potential for smaller, faster, and more focused results.
* Examples:
1. A user wants to know how often only bread and garlic cloves occur together.
2. A user wants to know what items occur with toilet paper.
Itemset Trees
* Itemset tree: a data structure that helps users query for a specific itemset and its support.
* Items within a transaction are mapped to integer values and ordered so that each transaction is in lexical order:
{Bread, Onion, Garlic} = {1, 2, 3}
* Why use numbers?
- they make the tree more compact
- numbers follow the ordering easily
Itemset Trees
An itemset tree T contains:
* A root pair (I, f(I)), where I is an itemset and f(I) is its count.
* A (possibly empty) set {T1, T2, ..., Tk}, each element of which is an itemset tree.
* If an item Ij is in the root, then it will also be in the root's children.
* If Ij is not in the root, then it may still be in the root's children if first_item(I) < first_item(Ij) and last_item(I) < last_item(Ij).
Building an Itemset Tree
Let ci be a node in the itemset tree and let I be a transaction from the dataset. Loop:
Case 1: ci = I - increment ci's count.
Case 2: ci is a child of I - make I the parent node of ci.
Case 3: ci and I share a common lexical overlap (a common prefix), e.g. {1,2,4} vs. {1,2,6} overlap in {1,2} - make a node for the overlap and make I and ci its children.
Case 4: ci is a parent of I - loop to check ci's children; if none apply, make I a child of ci.
Note: {2,6} and {1,2,6} do not have a lexical overlap, because an overlap must be a shared prefix.
(A sketch of this construction follows below.)
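A simplified, runnable sketch of this construction (names like Node and insert, and the prefix-splitting details, are choices made here rather than taken from the slides; sibling order is kept lexical so queries can stop early):

```python
class Node:
    """One (itemset, count) pair; itemsets are sorted tuples, so a
    'lexical overlap' is simply a shared prefix."""
    def __init__(self, itemset, count=0):
        self.itemset = tuple(itemset)
        self.count = count
        self.children = []

def common_prefix(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]

def insert(node, t):
    """Insert sorted transaction t; every node on the path counts it."""
    node.count += 1
    if t == node.itemset:
        return                            # Case 1: exact match
    for i, child in enumerate(node.children):
        p = common_prefix(child.itemset, t)
        if len(p) <= len(node.itemset):
            continue                      # no overlap beyond this node
        if p == child.itemset:            # Case 4: child is a prefix of t
            insert(child, t)
            return
        if p == t:                        # Case 2: t is a prefix of the child
            parent = Node(t, child.count + 1)
            parent.children = [child]
            node.children[i] = parent
            return
        mid = Node(p, child.count + 1)    # Case 3: proper lexical overlap
        mid.children = sorted([child, Node(t, 1)], key=lambda n: n.itemset)
        node.children[i] = mid
        return
    node.children.append(Node(t, 1))      # no overlapping child: new leaf
    node.children.sort(key=lambda n: n.itemset)

root = Node(())                           # empty-itemset root
for t in [(2, 4), (1, 2, 3, 5), (3, 9), (1, 2, 6), (2,), (2, 9)]:
    insert(root, t)
```

Inserting the slides' dataset this way yields the same tree as the creation walkthrough below: {2} with count 3 and children {2,4} and {2,9}, an overlap node {1,2} with count 2 and children {1,2,3,5} and {1,2,6}, and {3,9}.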
Itemset Trees - Creation
Dataset: {2,4}, {1,2,3,5}, {3,9}, {1,2,6}, {2}, {2,9}
(Figures: the tree after each insertion.)
1. Insert {2,4}: child node of the root.
2. Insert {1,2,3,5}: child node of the root.
3. Insert {3,9}: child node of the root.
4. Insert {1,2,6}: lexical overlap with {1,2,3,5}; an overlap node {1,2} is created with both as children.
5. Insert {2}: {2,4} is a child of {2}, so {2} becomes its parent node.
6. Insert {2,9}: child node of {2}.
Itemset Trees - Querying
Let I be an itemset, let ci be a node in the tree, and let totalSup be the total count for I in the tree.
For all ci such that first_item(ci) ≤ first_item(I):
Case 1: if I is contained in ci, add ci's support to totalSup (ci's count already covers its subtree).
Case 2: if I is not contained in ci and last_item(ci) < last_item(I), proceed down the subtree.
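A matching sketch of the query routine, reusing Node and the root built in the construction sketch above (illustrative only; the traversal mirrors the two cases, with the lexical sibling order used to stop early):

```python
def query(node, q):
    """Total support of sorted query tuple q in the subtree of `node`."""
    total = 0
    for child in node.children:
        if child.itemset[0] > q[0]:
            break                      # lexical order: no later sibling can help
        if set(q) <= set(child.itemset):
            total += child.count       # Case 1: count covers the whole subtree
        elif child.itemset[-1] < q[-1]:
            total += query(child, q)   # Case 2: q may still complete deeper down
    return total

print(query(root, (2,)))     # 5
print(query(root, (2, 9)))   # 1
```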
Itemset Trees - Querying, Example 1
Query: {2}, totalSup = 0
1. Node {2}: 2 = 2, add its count: totalSup = 3
2. Node {1,2}: {1,2} contains 2, add its count: totalSup = 3 + 2 = 5
3. Node {3,9}: 3 > 2 and end of subtree, so return totalSup = 5
Itemset Trees - Querying, Example 2
Query: {2,9}, totalSup = 0
1. Node {2}: 2 ≤ 2 and 2 < 9: continue down this subtree
2. Node {2,4}: 2 ≤ 2 and 4 < 9, but {2,4} doesn't contain {2,9}: go to next sibling
3. Node {2,9}: {2,9} = {2,9}: add to support, totalSup = 1
4. Node {1,2}: 1 ≤ 2 and 2 < 9: continue down this subtree
5. Node {1,2,3,5}: 1 ≤ 2 and 5 < 9, but {1,2,3,5} doesn't contain {2,9}: go to next sibling
6. Node {1,2,6}: 1 ≤ 2 and 6 < 9, but {1,2,6} doesn't contain {2,9}: go to next node
7. Node {3,9}: 3 ≤ 2 fails and 9 < 9 fails: end of tree. totalSup = 1. Nodes = 8.