Published by Patrick Edwards. Modified over 9 years ago.
1
Mining Quantitative Association Rules in Large Relational Tables
ACM SIGMOD Conference 1996
Authors: R. Srikant and R. Agrawal
Presented by: Sasi Sekhar Kunta, March 23, 2008
(slides modified slightly from Biyu Liang’s version)
2
2 Outline
- Review of Association Analysis
- Introducing Quantitative AR Problem
- Partitioning Quantitative Attributes
- Identifying the Interesting Rules
- Extending the Apriori Algorithm
- Conclusions
3
3 Association Rule
- Itemsets X and Y
- Rule: X => Y
- Support = P(X and Y)
- Confidence = P(Y | X) = P(X and Y) / P(X)
- Find rules that satisfy MinSup and MinConf
4
4 Boolean Association Rules
TID  Item1  Item2  Item3  Item4
100  1      1      0      1
200  0      1      1      1
300  1      1      1      0
400  0      0      1      0
An attribute has a value of “1” if the transaction contains the corresponding item, and “0” otherwise.
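As a minimal sketch of the support and confidence definitions from slide 3, the table above can be evaluated directly. The function names `support` and `confidence` and the choice of the rule Item1 => Item2 are illustrative assumptions, not part of the slides.

```python
# Transactions as read from the Boolean table on slide 4: TID -> items present.
transactions = {
    100: {"Item1", "Item2", "Item4"},
    200: {"Item2", "Item3", "Item4"},
    300: {"Item1", "Item2", "Item3"},
    400: {"Item3"},
}

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(x, y):
    """Estimate of P(Y | X) = support(X and Y) / support(X)."""
    return support(set(x) | set(y)) / support(x)

# Illustrative rule (an assumption for this sketch): Item1 => Item2
print(support({"Item1", "Item2"}))       # 0.5
print(confidence({"Item1"}, {"Item2"}))  # 1.0
```

A rule is reported only if both values clear the user-chosen MinSup and MinConf thresholds.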
6
6 Quantitative Association Rules
RecordID  Age  Married  NumCars
100       23   No       1
200       25   Yes      1
300       29   No       0
400       34   Yes      2
500       38   Yes      2
<Age: 30..39> and <Married: Yes> => <NumCars: 2>
Support = 40%, Conf = 100%
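The stated 40% support and 100% confidence can be checked against the five records. The sketch below assumes the rule is <Age: 30..39> and <Married: Yes> => <NumCars: 2>, which is the rule consistent with those figures; the helpers `in_range`, `equals`, and `rule_stats` are hypothetical names introduced here.

```python
# Records from the table on slide 6.
records = [
    {"RecordID": 100, "Age": 23, "Married": "No", "NumCars": 1},
    {"RecordID": 200, "Age": 25, "Married": "Yes", "NumCars": 1},
    {"RecordID": 300, "Age": 29, "Married": "No", "NumCars": 0},
    {"RecordID": 400, "Age": 34, "Married": "Yes", "NumCars": 2},
    {"RecordID": 500, "Age": 38, "Married": "Yes", "NumCars": 2},
]

def in_range(attr, low, high):
    """Predicate for a quantitative item <attr: low..high> (inclusive)."""
    return lambda r: low <= r[attr] <= high

def equals(attr, value):
    """Predicate for a categorical item <attr: value>."""
    return lambda r: r[attr] == value

def rule_stats(antecedent, consequent):
    """Return (support, confidence) of antecedent => consequent."""
    n_x = sum(1 for r in records if all(p(r) for p in antecedent))
    n_xy = sum(1 for r in records if all(p(r) for p in antecedent + consequent))
    return n_xy / len(records), (n_xy / n_x if n_x else 0.0)

# Assumed rule: <Age: 30..39> and <Married: Yes> => <NumCars: 2>
sup, conf = rule_stats(
    [in_range("Age", 30, 39), equals("Married", "Yes")],
    [equals("NumCars", 2)],
)
print(sup, conf)  # 0.4 1.0
```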
7
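7 Mapping to Boolean Association Rules Problem
Use each <attribute, value> (or <attribute, interval>) pair as a new attribute that takes only boolean values.
RecordID  Age: 20..29  Age: 30..39  Married: Yes  Married: No  NumCars: 0  NumCars: 1
100       1            0            0             1            0           1
200       1            0            1             0            0           1
300       1            0            0             1            1           0
400       0            1            1             0            0           0
500       0            1            1             0            0           0
(The NumCars: 2 column, which would hold 1 for records 400 and 500, is not shown.)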
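A minimal sketch of this mapping, assuming the two age intervals shown on the slide and one boolean attribute per categorical value. The function name `to_boolean` is hypothetical, and a NumCars: 2 column is included here so that every record sets one NumCars flag.

```python
# Records from the table on slide 6.
records = [
    {"RecordID": 100, "Age": 23, "Married": "No", "NumCars": 1},
    {"RecordID": 200, "Age": 25, "Married": "Yes", "NumCars": 1},
    {"RecordID": 300, "Age": 29, "Married": "No", "NumCars": 0},
    {"RecordID": 400, "Age": 34, "Married": "Yes", "NumCars": 2},
    {"RecordID": 500, "Age": 38, "Married": "Yes", "NumCars": 2},
]

AGE_INTERVALS = [(20, 29), (30, 39)]  # intervals used on slide 7

def to_boolean(record):
    """Map one record to boolean <attribute, value/interval> attributes."""
    row = {}
    for low, high in AGE_INTERVALS:
        row[f"Age: {low}..{high}"] = int(low <= record["Age"] <= high)
    for value in ("Yes", "No"):
        row[f"Married: {value}"] = int(record["Married"] == value)
    for cars in (0, 1, 2):
        row[f"NumCars: {cars}"] = int(record["NumCars"] == cars)
    return row

for r in records:
    print(r["RecordID"], list(to_boolean(r).values()))
```

Once the records are in this form, any Boolean association rule algorithm can be applied, subject to the problems listed on the next slide.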
8
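8 Problems with Direct Mapping
- MinSup: if the number of intervals is large, the support of any single interval can be low, so rules involving it may fall below minimum support.
- MinConf: information is lost when values are partitioned into intervals. The fewer (and therefore wider) the intervals, the lower the confidence can be, so rules may fall below minimum confidence.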
9
9 The Tradeoff
- Increase the number of intervals (to reduce the information lost) while also combining adjacent intervals (to increase support).
- ExecTime: execution time blows up as the number of items per record increases.
- ManyRules: the number of rules also blows up, and many of them will not be interesting.
10
10 The Proposed Approach
- Partition quantitative attribute values and combine adjacent partitions as necessary
- Use a partial completeness measure to decide the partitions
- Use an interest measure (pruning) to address the “ManyRules” problem
- Extend the Apriori algorithm
11
11 5 Steps of the Proposed Approach
1. Determine the number of partitions for each quantitative attribute.
2. Map values/ranges to consecutive integers such that the order is preserved.
3. Find the support of each value of the attributes, and combine adjacent values as long as their support is less than MaxSup. Then find the frequent itemsets, whose support is larger than MinSup.
4. Use the frequent itemsets to generate association rules.
5. Prune out uninteresting rules.
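Step 3 can be sketched for a single quantitative attribute as follows. This is one possible reading, not the authors' implementation: values are assumed to be already mapped to consecutive integers (the list index), thresholds are given as record counts rather than fractions, the MinSup test is treated as inclusive, and a merged range is dropped once its count exceeds the MaxSup count. The name `frequent_ranges` is hypothetical.

```python
def frequent_ranges(value_counts, min_count, max_count):
    """Return (lo, hi, count) for single values and merged adjacent ranges.

    value_counts[i] is the number of records with the i-th (integer-mapped) value.
    A range lo..hi is kept if its count reaches min_count; adjacent values are
    merged only while the combined count stays within max_count.
    """
    items = []
    n = len(value_counts)
    for lo in range(n):
        total = 0
        for hi in range(lo, n):
            total += value_counts[hi]
            if hi > lo and total > max_count:
                break  # stop merging once the combined range exceeds MaxSup
            if total >= min_count:
                items.append((lo, hi, total))
    return items

# 10 records over 4 consecutive values; MinSup = 30% (3 records), MaxSup = 50% (5 records)
print(frequent_ranges([2, 1, 3, 4], min_count=3, max_count=5))
# [(0, 1, 3), (1, 2, 4), (2, 2, 3), (3, 3, 4)]
```

Each returned range becomes one item for the frequent-itemset search in the rest of step 3.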
14
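14 Partial Completeness
- R: rules obtained before partitioning
- R’: rules obtained after partitioning
- Partial completeness measures the maximum distance between a rule in R and its closest generalization in R’.
- $\hat{X}$ is a generalization of itemset $X$ if $\mathrm{attributes}(X) = \mathrm{attributes}(\hat{X})$ and $\forall x \in \mathrm{attributes}(X):\ \langle x, l, u \rangle \in X \wedge \langle x, l', u' \rangle \in \hat{X} \Rightarrow l' \le l \le u \le u'$
- The distance is defined by the ratio of supports.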
15
15 K-Complete
- C: the set of frequent itemsets
- For any K ≥ 1, P is K-complete w.r.t. C if:
  - P ⊆ C
  - For any itemset X (or any subset of X) in C, there exists a generalization in P whose support is no more than K times that of X (or of that subset)
- The smaller K is, the less information is lost
16
16 Theoretical Results
- Lemma 1: If P is a K-complete set w.r.t. C, then any rule R obtained from C has a generalization R’ obtained from P such that conf(R’) lies in [conf(R)/K, K × conf(R)].
- For a given partial completeness level K, equi-depth partitioning satisfies the completeness level with the minimum number of intervals, 2n / (m(K − 1)), and the MaxSup for each interval is m(K − 1) / (2n), where n is the number of quantitative attributes and m is the minimum support.
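The interval formula is easy to evaluate numerically. The sketch below assumes n is the number of quantitative attributes and m the minimum support as a fraction, and pairs the formula with a simple equi-depth split. Both function names are hypothetical, and the split ignores ties that straddle an interval boundary.

```python
import math

def num_intervals(n_quant_attrs, min_sup, k):
    """Number of intervals = 2n / (m (K - 1)), as stated on slide 16."""
    if k <= 1:
        raise ValueError("K must be greater than 1")
    return 2 * n_quant_attrs / (min_sup * (k - 1))

def equi_depth(values, n_intervals):
    """Split values into n_intervals groups of (nearly) equal size.

    Returns the (low, high) bounds of each non-empty interval.
    """
    ordered = sorted(values)
    bounds = []
    for i in range(n_intervals):
        start = i * len(ordered) // n_intervals
        end = (i + 1) * len(ordered) // n_intervals
        if start < end:
            bounds.append((ordered[start], ordered[end - 1]))
    return bounds

# 3 quantitative attributes, minimum support 0.70, partial completeness level 1.5
exact = num_intervals(3, 0.70, 1.5)
print(exact, math.ceil(exact))  # about 17.14, rounded up to 18 intervals

# Ages from slide 6 split into two equi-depth intervals
print(equi_depth([23, 25, 29, 34, 38], 2))  # [(23, 25), (29, 38)]
```

Rounding up is a choice made here so that the interval count is never below the value the formula asks for.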
18
18 Example of Uninteresting Rule
- Suppose a quarter of the people in age group 20..30 are in age group 20..25
- <Age: 20..30> => <Cars: 1..2>, with 8% sup, 70% conf
- <Age: 20..25> => <Cars: 1..2>, with 2% sup, 70% conf
- The second rule gives no additional information and is less general than the first rule
19
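19 Expected Values Based on Generalization
- Itemset $Z = \{\langle z_1, l_1, u_1\rangle, \ldots, \langle z_n, l_n, u_n\rangle\}$, with generalization $\hat{Z} = \{\langle z_1, l'_1, u'_1\rangle, \ldots, \langle z_n, l'_n, u'_n\rangle\}$
- The expected support of $Z$ based on the support of its generalization $\hat{Z}$ is defined as
$$E_{\Pr(\hat{Z})}[\Pr(Z)] = \frac{\Pr(\langle z_1, l_1, u_1\rangle)}{\Pr(\langle z_1, l'_1, u'_1\rangle)} \times \cdots \times \frac{\Pr(\langle z_n, l_n, u_n\rangle)}{\Pr(\langle z_n, l'_n, u'_n\rangle)} \times \Pr(\hat{Z})$$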
20
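20 Expected Values Based on Generalization
- The expected confidence of the rule $X \Rightarrow Y$ based on the confidence of its generalization $\hat{X} \Rightarrow \hat{Y}$, where $Y = \{\langle y_1, l_1, u_1\rangle, \ldots, \langle y_n, l_n, u_n\rangle\}$ and $\hat{Y} = \{\langle y_1, l'_1, u'_1\rangle, \ldots, \langle y_n, l'_n, u'_n\rangle\}$, is defined as
$$E_{\Pr(\hat{Y} \mid \hat{X})}[\Pr(Y \mid X)] = \frac{\Pr(\langle y_1, l_1, u_1\rangle)}{\Pr(\langle y_1, l'_1, u'_1\rangle)} \times \cdots \times \frac{\Pr(\langle y_n, l_n, u_n\rangle)}{\Pr(\langle y_n, l'_n, u'_n\rangle)} \times \Pr(\hat{Y} \mid \hat{X})$$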
21
21 Interest Measure
- Itemset X is R-interesting w.r.t. its generalization $\hat{X}$ if:
  - the support of X is no less than R times the expected support based on $\hat{X}$, and
  - for any specialization X’ of X, X − X’ is R-interesting w.r.t. $\hat{X}$
- Rule X => Y is R-interesting w.r.t. its generalization $\hat{X} \Rightarrow \hat{Y}$ if its support or its confidence is at least R times the expected value based on $\hat{X} \Rightarrow \hat{Y}$, and the itemset X ∪ Y is R-interesting w.r.t. $\hat{X} \cup \hat{Y}$
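The expected-value idea can be illustrated with the numbers from slide 18. This sketch covers only the threshold test (actual value at least R times the expected value), not the additional condition on specializations. The function names and R = 1.1 are assumptions for illustration. Because the age group 20..25 holds a quarter of the group 20..30, the antecedent ratio is 0.25, and the consequent is unchanged, so its ratio is 1.0.

```python
def expected_value(generalization_value, ratios):
    """Scale the generalization's support (or confidence) by Pr(item) / Pr(generalized item).

    For expected support, pass one ratio per item of the itemset;
    for expected confidence, pass one ratio per item of the consequent only.
    """
    expected = generalization_value
    for ratio in ratios:
        expected *= ratio
    return expected

def meets_interest(actual, expected, r):
    """Threshold part of the R-interest test: actual >= R * expected."""
    return actual >= r * expected

R = 1.1  # hypothetical interest level

# General rule: 8% support, 70% confidence. Specialized rule: 2% support, 70% confidence.
exp_sup = expected_value(0.08, [0.25, 1.0])  # antecedent ratio 1/4, consequent ratio 1
exp_conf = expected_value(0.70, [1.0])       # consequent ratio only

print(exp_sup, exp_conf)
print(meets_interest(0.02, exp_sup, R), meets_interest(0.70, exp_conf, R))
```

Both tests fail because the specialized rule has exactly the support and confidence its generalization predicts, which is why slide 18 calls it uninteresting.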
23
23 Candidate Generation
- Given the set L_{k-1} of all frequent (k-1)-itemsets, generate the set C_k of candidate k-itemsets
- The process has three parts:
  1. Join phase
  2. Subset prune phase
  3. Interest prune phase
24
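24 Join Phase
- L_{k-1} is joined with itself
- Join condition: the first k-2 items are the same, and the remaining (last) items have different attributes
- Example, L_2:
  { <Married: Yes> <Age: 20..24> }
  { <Married: Yes> <Age: 20..29> }
  { <Married: Yes> <NumCars: 0..1> }
  { <Age: 20..29> <NumCars: 0..1> }
- Result of self-join, C_3:
  { <Married: Yes> <Age: 20..24> <NumCars: 0..1> }
  { <Married: Yes> <Age: 20..29> <NumCars: 0..1> }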
25
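25 Subset Prune Phase
- Make sure every (k-1)-subset of a candidate is in L_{k-1}
- Example, L_2:
  { <Married: Yes> <Age: 20..24> }
  { <Married: Yes> <Age: 20..29> }
  { <Married: Yes> <NumCars: 0..1> }
  { <Age: 20..29> <NumCars: 0..1> }
- Result of self-join, C_3:
  { <Married: Yes> <Age: 20..24> <NumCars: 0..1> }
  { <Married: Yes> <Age: 20..29> <NumCars: 0..1> }
- Delete the first itemset in C_3, since its subset { <Age: 20..24> <NumCars: 0..1> } is not in L_2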
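The join and subset prune phases can be sketched together. The item values below are illustrative (assumed for this sketch), items are (attribute, range) pairs, and itemsets are tuples kept in a fixed attribute order. `ATTR_ORDER`, `join`, and `subset_prune` are hypothetical names, and this is not the authors' implementation.

```python
from itertools import combinations

# Assumed fixed attribute order used to keep itemsets in a canonical form.
ATTR_ORDER = {"Married": 0, "Age": 1, "NumCars": 2}

def join(l_prev):
    """Self-join L_{k-1}: same first k-2 items, last items on different attributes."""
    candidates = []
    for a in l_prev:
        for b in l_prev:
            # The strict order test enforces "different attributes" and avoids duplicates.
            if a[:-1] == b[:-1] and ATTR_ORDER[a[-1][0]] < ATTR_ORDER[b[-1][0]]:
                candidates.append(a + (b[-1],))
    return candidates

def subset_prune(candidates, l_prev):
    """Keep candidates whose every (k-1)-subset is in L_{k-1}."""
    frequent = set(l_prev)
    return [c for c in candidates
            if all(s in frequent for s in combinations(c, len(c) - 1))]

# Illustrative frequent 2-itemsets
L2 = [
    (("Married", "Yes"), ("Age", "20..24")),
    (("Married", "Yes"), ("Age", "20..29")),
    (("Married", "Yes"), ("NumCars", "0..1")),
    (("Age", "20..29"), ("NumCars", "0..1")),
]

C3 = join(L2)
print(len(C3))               # 2 candidates after the join
print(subset_prune(C3, L2))  # only the Age 20..29 candidate survives
```

The Age 20..24 candidate is dropped because its subset (Age 20..24, NumCars 0..1) is not in `L2`.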
26
26 Interest Prune Phase
- Given a user-specified interest level R
- Delete any itemset that contains an item with support greater than 1/R
- Lemma 5 guarantees that such itemsets cannot be R-interesting w.r.t. their generalizations
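A minimal sketch of this pruning rule, assuming item supports are available as fractions in a dictionary. The supports, item names, and the function `interest_prune` are hypothetical values chosen for illustration.

```python
def interest_prune(candidates, item_support, r):
    """Drop any candidate containing an item whose support exceeds 1/R."""
    threshold = 1.0 / r
    return [c for c in candidates
            if all(item_support[item] <= threshold for item in c)]

# Hypothetical single-item supports (fractions of all records)
item_support = {
    "Age: 20..39": 0.9,
    "Age: 20..24": 0.2,
    "NumCars: 0..1": 0.4,
}
candidates = [
    ("Age: 20..39", "NumCars: 0..1"),
    ("Age: 20..24", "NumCars: 0..1"),
]

# With R = 2 the threshold is 0.5, so the candidate containing Age: 20..39 is removed.
print(interest_prune(candidates, item_support, r=2))
# [('Age: 20..24', 'NumCars: 0..1')]
```

This step applies only when the user supplies an interest level R.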
28
28 Conclusions
- This paper introduced the problem of mining quantitative association rules in large relational tables
- It dealt with quantitative attributes by finely partitioning the values and combining adjacent partitions as necessary
- Partial completeness quantifies the information lost and helps decide the partitions
- An interest measure identifies the interesting rules
29
Thanks! Questions?
30
30 Final Exam Questions
- What is partial completeness? (p.14-15)
- Determine the number of intervals when there are 3 quantitative attributes, a minimum support of 0.70, and a partial completeness level of 1.5. (p.16)
- If intervals are too large, rules may not have MinConf, and if they are too small, rules may not have MinSup. How do you go about solving this catch-22 problem? (p.8-9)