Interestingness
Interestingness Measures - Lift
Lift is a measure of dependent/correlated events:
Lift(B, C) = c(B -> C) / s(C) = s(B ∪ C) / (s(B) × s(C))
(here s(B ∪ C) is the fraction of transactions that contain both B and C)
Lift(B, C) indicates how B and C are correlated:
Lift(B, C) = 1 => B and C are independent
Lift(B, C) > 1 => B and C are positively correlated
Lift(B, C) < 1 => B and C are negatively correlated
Lift is more telling than support (s) and confidence (c) alone.
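A minimal Python sketch of the lift definition above, assuming supports are already given as fractions of the total number of transactions. The names `lift`, `lift_via_confidence` and `interpret_lift` are hypothetical helpers, not part of any library; the second function checks that the c(B -> C) / s(C) form gives the same value.

```python
def lift(s_bc: float, s_b: float, s_c: float) -> float:
    """Lift(B, C) = s(B u C) / (s(B) * s(C)).

    s_bc is the fraction of transactions containing both B and C;
    s_b and s_c are the fractions containing B and C respectively.
    """
    if s_b <= 0 or s_c <= 0:
        raise ValueError("s(B) and s(C) must be positive")
    return s_bc / (s_b * s_c)


def lift_via_confidence(s_bc: float, s_b: float, s_c: float) -> float:
    """Same value computed as c(B -> C) / s(C), where c(B -> C) = s(B u C) / s(B)."""
    confidence = s_bc / s_b
    return confidence / s_c


def interpret_lift(value: float, tol: float = 1e-9) -> str:
    """Map a lift value to the three cases above (tol absorbs float rounding)."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"


# Hypothetical supports: 25% contain B and C, 50% contain B, 50% contain C.
value = lift(0.25, 0.5, 0.5)
print(value, interpret_lift(value))
```

Because the two forms are algebraically identical, either can be used; the support form is convenient when only counts are available.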
Lift Example
             B      ^B     Total (row)
C            400    350    750
^C           200    50     250
Total (col)  600    400    1000
Lift Solution
Lift(B, C) = (400/1000) / ((600/1000) × (750/1000)) = 0.89
Lift(B, ^C) = (200/1000) / ((600/1000) × (250/1000)) = 1.33
Thus B and C are negatively correlated, since Lift(B, C) < 1,
and B and ^C are positively correlated, since Lift(B, ^C) > 1.
Lift Calculations
s(B ∪ C) = 400/1000 = 2/5 = 0.4
s(B) = 600/1000 = 3/5 = 0.6
s(C) = 750/1000 = 3/4 = 0.75
Lift(B, C) = 0.4 / (0.6 × 0.75) = 0.4 / 0.45 = 0.89
s(B ∪ ^C) = 0.2, s(B) = 0.6, s(^C) = 0.25
Lift(B, ^C) = 0.2 / (0.6 × 0.25) = 0.2 / 0.15 = 1.33
Lift(^B, C) = ?
Lift(^B, ^C) = ?
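The calculations above can be checked, and the two open questions answered, with a short Python sketch that takes the four cell counts of the Lift Example table and computes lift for every cell. It uses exact `fractions.Fraction` arithmetic so values such as 8/9 are not blurred by rounding; the `lift_table` helper and the dictionary layout are hypothetical choices for illustration.

```python
from fractions import Fraction

# Cell counts from the Lift Example table, keyed by (B-side, C-side); N = 1000.
COUNTS = {
    ("B", "C"): 400,
    ("^B", "C"): 350,
    ("B", "^C"): 200,
    ("^B", "^C"): 50,
}


def lift_table(counts):
    """Return Lift(x, y) = s(x u y) / (s(x) * s(y)) for every cell, as exact fractions."""
    n = sum(counts.values())
    col_totals = {}  # occurrences of B and of ^B
    row_totals = {}  # occurrences of C and of ^C
    for (x, y), k in counts.items():
        col_totals[x] = col_totals.get(x, 0) + k
        row_totals[y] = row_totals.get(y, 0) + k
    return {
        (x, y): Fraction(k, n) / (Fraction(col_totals[x], n) * Fraction(row_totals[y], n))
        for (x, y), k in counts.items()
    }


lifts = lift_table(COUNTS)
for (x, y), value in lifts.items():
    print(f"Lift({x}, {y}) = {value} ~ {float(value):.2f}")
```

Under this sketch Lift(B, C) = 8/9 and Lift(B, ^C) = 4/3, matching the slide, while the two open cells come out as Lift(^B, C) = 7/6 (positive) and Lift(^B, ^C) = 1/2 (negative).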
Interestingness Measures - χ²
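Another measure for testing correlated events: χ² (chi-square)
χ² = Σ (Observed − Expected)² / Expected, summed over all cells of the contingency table,
where the Expected count of a cell = (row total × column total) / grand total
General rules:
χ² = 0 => independent
χ² > 0 => correlated, either positively or negatively, so an additional test is needed to find the direction (compare the observed and expected counts)
χ² is also more telling than support (s) and confidence (c).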
χ² Example (expected counts in parentheses)
             B            ^B           Total (row)
C            400 (450)    350 (300)    750
^C           200 (150)    50 (100)     250
Total (col)  600          400          1000
χ² Solution
χ² = (400 − 450)²/450 + (350 − 300)²/300 + (200 − 150)²/150 + (50 − 100)²/100
   = 5.56 + 8.33 + 16.67 + 25.00 ≈ 55.56
χ² shows that B and C are correlated, because the value is > 0.
Since the expected count for BC is 450 but only 400 is observed, B and C are negatively correlated.
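The χ² computation, including the expected counts and the direction check, can be sketched in plain Python as below. `chi_square` is a hypothetical helper; it computes the Pearson statistic with expected count = row total × column total / grand total, and the direction check on the BC cell assumes a 2x2 table like the one above.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows.

    Expected count of each cell = row total * column total / grand total.
    Returns (statistic, table of expected counts).
    """
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    stat = 0.0
    expected = []
    for i, row in enumerate(observed):
        expected_row = []
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            expected_row.append(exp)
            stat += (obs - exp) ** 2 / exp
        expected.append(expected_row)
    return stat, expected


# Rows are C and ^C; columns are B and ^B (the chi-square example table).
table = [[400, 350],
         [200, 50]]
stat, expected = chi_square(table)
print(f"chi2 = {stat:.2f}")   # chi2 = 55.56
print(expected)               # [[450.0, 300.0], [150.0, 100.0]]

# Direction: compare observed and expected counts for the BC cell.
obs_bc, exp_bc = table[0][0], expected[0][0]
if obs_bc < exp_bc:
    direction = "negative"
elif obs_bc > exp_bc:
    direction = "positive"
else:
    direction = "no"
print(f"B and C: {direction} correlation")  # B and C: negative correlation
```

If SciPy is available, `scipy.stats.chi2_contingency(table, correction=False)` should report the same statistic; its default `correction=True` applies Yates' continuity correction to 2x2 tables, which changes the value.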
Are Lift and χ² Always Good?
Null transactions -> transactions that contain neither B nor C.
Let's examine another dataset, D:
BC (100) is much rarer than B^C (1000) and ^BC (1000), but there are many ^B^C (100000).
So it is unlikely that B and C will happen together!
But Lift(B, C) = 8.44 >> 1 (strong positive correlation).
χ² = 670: observed BC (100) >> expected value (11.85).
Too many null transactions may “spoil the soup”!
χ² & Lift With Null Example
Dataset D (expected counts in parentheses)
             B                ^B                  Total (row)
C            100 (11.85)      1000 (1088.15)      1100
^C           1000 (1088.15)   100000 (99911.85)   101000
Total (col)  1100             101000              102100
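To see why null transactions “spoil the soup”, the following Python sketch recomputes Lift(B, C) and χ² for dataset D while changing only the number of null transactions (^B^C). The helper `lift_and_chi2` is a hypothetical name; the BC, B^C and ^BC counts are the ones given for dataset D.

```python
def lift_and_chi2(n_bc, n_b_not_c, n_not_b_c, n_null):
    """Lift(B, C) and chi-square for the 2x2 table built from four cell counts.

    n_null is the number of null transactions (neither B nor C).
    """
    n = n_bc + n_b_not_c + n_not_b_c + n_null
    n_b = n_bc + n_b_not_c        # transactions containing B
    n_c = n_bc + n_not_b_c        # transactions containing C
    lift = (n_bc / n) / ((n_b / n) * (n_c / n))
    # (observed, expected) for the cells BC, B^C, ^BC, ^B^C.
    cells = [
        (n_bc, n_b * n_c / n),
        (n_b_not_c, n_b * (n - n_c) / n),
        (n_not_b_c, (n - n_b) * n_c / n),
        (n_null, (n - n_b) * (n - n_c) / n),
    ]
    chi2 = sum((obs - exp) ** 2 / exp for obs, exp in cells)
    return lift, chi2


# Dataset D: BC = 100, B^C = 1000, ^BC = 1000; vary only the null count.
for nulls in (0, 1_000, 100_000):
    lift_value, chi2_value = lift_and_chi2(100, 1000, 1000, nulls)
    print(f"nulls={nulls:>7}: lift={lift_value:.2f}, chi2={chi2_value:.1f}")
```

With 100000 nulls the sketch reproduces lift ≈ 8.44 and χ² ≈ 670; with no nulls at all, lift drops below 1 even though the BC, B^C and ^BC counts are unchanged.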
Other Interestingness Measures
Null invariance: a measure is null-invariant if its value does not change with the number of null transactions.
Null-invariant interestingness measures:
AllConf(A, B)
Jaccard(A, B)
Cosine(A, B)
Kulczynski(A, B)
MaxConf(A, B)
Not all null-invariant measures are created equal.
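The slide lists these measures by name only. The Python sketch below assumes the standard textbook definitions: all_conf = s(A∪B)/max(s(A), s(B)), max_conf = max(P(A|B), P(B|A)), Kulc = (P(A|B) + P(B|A))/2, cosine = s(A∪B)/sqrt(s(A) × s(B)), and Jaccard = s(A∪B)/(s(A) + s(B) − s(A∪B)). It evaluates them, together with lift for contrast, on dataset D with 100 and with 100000 null transactions; `measures` is a hypothetical helper name.

```python
import math


def measures(n_ab, n_a_only, n_b_only, n_null):
    """Lift plus five null-invariant measures, from the four cell counts."""
    n = n_ab + n_a_only + n_b_only + n_null
    s_ab = n_ab / n
    s_a = (n_ab + n_a_only) / n
    s_b = (n_ab + n_b_only) / n
    p_b_given_a = s_ab / s_a
    p_a_given_b = s_ab / s_b
    return {
        "lift": s_ab / (s_a * s_b),  # NOT null-invariant, included for contrast
        "all_conf": s_ab / max(s_a, s_b),
        "max_conf": max(p_b_given_a, p_a_given_b),
        "kulczynski": (p_b_given_a + p_a_given_b) / 2,
        "cosine": s_ab / math.sqrt(s_a * s_b),
        "jaccard": s_ab / (s_a + s_b - s_ab),
    }


# Dataset D, with A and B here playing the roles of B and C:
# 100 transactions with both, 1000 with only one, 1000 with only the other.
few_nulls = measures(100, 1000, 1000, 100)
many_nulls = measures(100, 1000, 1000, 100_000)
for name in few_nulls:
    print(f"{name:>10}: {few_nulls[name]:.4f} vs {many_nulls[name]:.4f}")
```

In every null-invariant measure the total N cancels out of the ratio, so the two columns agree for all five; only lift moves when null transactions are added.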
Imbalance Ratio with Kulczynski
Imbalance Ratio (IR): measures the imbalance between two itemsets A and B in rule implications.
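The slide does not give the IR formula. Assuming the common textbook definition IR(A, B) = |s(A) − s(B)| / (s(A) + s(B) − s(A∪B)), which is 0 for perfectly balanced itemsets and approaches 1 as they become very unbalanced, a Python sketch might look like this. `imbalance_ratio` is a hypothetical helper working on raw counts (the total N cancels); it is applied to dataset D and to the milk/coffee counts used on the following slides.

```python
def imbalance_ratio(n_a: int, n_b: int, n_ab: int) -> float:
    """IR(A, B) = |s(A) - s(B)| / (s(A) + s(B) - s(A u B)), computed from counts.

    n_a, n_b: transactions containing A, B; n_ab: transactions containing both.
    Dividing every count by the same total N would not change the result.
    """
    denominator = n_a + n_b - n_ab  # transactions containing A or B
    if denominator <= 0:
        raise ValueError("A or B must occur at least once")
    return abs(n_a - n_b) / denominator


# Dataset D: B and C each occur 1100 times, together 100 times -> balanced.
ir_d = imbalance_ratio(1100, 1100, 100)
# Milk (10^6), coffee (10^4), both (10^2) -> highly imbalanced.
ir_milk_coffee = imbalance_ratio(10**6, 10**4, 10**2)
print(f"IR for dataset D: {ir_d:.4f}")
print(f"IR for milk/coffee: {ir_milk_coffee:.4f}")
```

IR is meant to be read alongside Kulczynski: Kulc summarizes how strongly the two itemsets are associated, while IR shows how unequal their supports are.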
Kulczynski
Negative pattern test: Kulc(A, B) = (P(B|A) + P(A|B))/2 < ε, where ε = 0.01
Example: A = milk, B = coffee
Total transactions: 1 billion = 1,000,000,000 = 10^9
A appears 1 million times = 1,000,000 = 10^6
B appears 10 thousand times = 10,000 = 10^4
A and B appear together one hundred times = 100 = 10^2
s(A) = 10^6 / 10^9 = 10^-3 = 1/1,000
s(B) = 10^4 / 10^9 = 10^-5 = 1/100,000
s(A ∪ B) = 10^2 / 10^9 = 10^-7 = 1/10,000,000
s(A) × s(B) = 10^-3 × 10^-5 = 10^-8
Kulczynski
P(B|A) = P(A ∪ B) / P(A) = 10^2 / 10^6 = 10^-4
P(A|B) = P(A ∪ B) / P(B) = 10^2 / 10^4 = 10^-2
(the total of 10^9 transactions cancels, so counts can be used directly)
(P(B|A) + P(A|B))/2 = (10^-4 + 10^-2)/2 = 0.00505 < 0.01
Therefore this is a negative pattern.
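The negative-pattern check can be reproduced in a few lines of Python with exact fractions; `kulczynski` and `EPSILON` are hypothetical names, and the threshold 0.01 is the one given on the slide. For contrast, the sketch also evaluates lift on the same counts, s(A ∪ B) / (s(A) × s(B)) = 10^-7 / 10^-8 = 10, which on its own would suggest a positive correlation.

```python
from fractions import Fraction

EPSILON = Fraction(1, 100)  # threshold from the slide: 0.01


def kulczynski(n_a: int, n_b: int, n_ab: int) -> Fraction:
    """Kulc(A, B) = (P(B|A) + P(A|B)) / 2, using counts (the total N cancels)."""
    return (Fraction(n_ab, n_a) + Fraction(n_ab, n_b)) / 2


N = 10**9                                   # total transactions
n_milk, n_coffee, n_both = 10**6, 10**4, 10**2

kulc = kulczynski(n_milk, n_coffee, n_both)
is_negative = kulc < EPSILON
# For contrast: lift = s(A u B) / (s(A) * s(B)) on the same counts.
lift = Fraction(n_both * N, n_milk * n_coffee)

print(f"Kulc = {float(kulc)}, negative pattern: {is_negative}")
print(f"lift = {lift}")
```

Kulczynski comes out as 101/20000 = 0.00505, below ε, while lift is 10; the two measures disagree because lift is not null-invariant and this dataset is dominated by transactions containing neither item.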