Interestingness.

Interestingness Measures - Lift
Measure of dependent/correlated events: lift
Lift(B, C) = c(B -> C) / s(C) = s(B ∪ C) / (s(B) × s(C))
Lift(B, C) tells how B and C are correlated:
Lift(B, C) = 1: B and C are independent
Lift(B, C) > 1: positively correlated
Lift(B, C) < 1: negatively correlated
Lift is more telling than support (s) and confidence (c)

Lift Example
               B      ^B     Tot Row
C              400    350    750
^C             200    50     250
Total Column   600    400    1000

Lift Solution
Lift(B, C) = (400/1000) / ((600/1000) × (750/1000)) = 0.89
Lift(B, ^C) = (200/1000) / ((600/1000) × (250/1000)) = 1.33
B and C are negatively correlated, since Lift(B, C) < 1
B and ^C are positively correlated, since Lift(B, ^C) > 1

Lift Calculations
s(B ∪ C) = 400/1000 = 2/5 = 0.4
s(B) = 600/1000 = 3/5 = 0.6
s(C) = 750/1000 = 3/4 = 0.75
Lift(B, C) = 0.4 / (0.6 × 0.75) = 0.4 / 0.45 = 0.89
s(B ∪ ^C) = 0.2, s(B) = 0.6, s(^C) = 0.25
Lift(B, ^C) = 0.2 / (0.6 × 0.25) = 0.2 / 0.15 = 1.33
Lift(^B, C) = ?
Lift(^B, ^C) = ?
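
The lift arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `lift` and its count-based parameters are illustrative choices, not something defined in the slides.

```python
def lift(n_xy, n_x, n_y, n_total):
    """Lift(X, Y) = s(X ∪ Y) / (s(X) * s(Y)), computed from raw counts."""
    s_xy = n_xy / n_total   # support of X and Y together
    s_x = n_x / n_total     # support of X
    s_y = n_y / n_total     # support of Y
    return s_xy / (s_x * s_y)

# Counts taken from the example table (1000 transactions in total).
N = 1000
n_B, n_notB = 600, 400
n_C, n_notC = 750, 250

print(round(lift(400, n_B, n_C, N), 2))     # Lift(B, C)  -> 0.89
print(round(lift(200, n_B, n_notC, N), 2))  # Lift(B, ^C) -> 1.33
```

The two remaining cells can be evaluated the same way, for example `lift(350, n_notB, n_C, N)` for Lift(^B, C).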

Interestingness Measures - χ²
Another measure to test correlated events: χ² (chi-square)
χ² = Σ (Observed - Expected)² / Expected
General rules:
χ² = 0: independent
χ² > 0: correlated, either positively or negatively, so additional tests are needed to find the direction
Like lift, χ² is more telling than support (s) and confidence (c)

χ² Example (expected values in parentheses)
               B            ^B           Tot Row
C              400 (450)    350 (300)    750
^C             200 (150)    50 (100)     250
Total Column   600          400          1000

χ² Solution
Each expected value is (row total × column total) / grand total, e.g. 750 × 600 / 1000 = 450 for (B, C)
χ² = (400 - 450)²/450 + (350 - 300)²/300 + (200 - 150)²/150 + (50 - 100)²/100 = 55.56
χ² shows that B and C are correlated, because the result is > 0
Since the expected value for (B, C) is 450 but only 400 is observed, B and C are negatively correlated
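
The chi-square computation can be reproduced in plain Python. The sketch below is illustrative only: the helper name `chi_square_2x2` and the row/column layout of `observed` are assumptions made for this example, not part of the slides.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a table of observed counts.

    `table` is a list of rows; here row 0 is C, row 1 is ^C,
    column 0 is B and column 1 is ^B.
    Returns (chi2, expected), where `expected` has the same shape as `table`.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return chi2, expected

observed = [[400, 350],   # row C
            [200, 50]]    # row ^C
chi2, expected = chi_square_2x2(observed)
print(expected)         # [[450.0, 300.0], [150.0, 100.0]]
print(round(chi2, 2))   # 55.56
```

Comparing `observed[0][0]` with `expected[0][0]` (400 against 450) gives the direction of the correlation, which the statistic alone does not.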

Are Lift and χ² Always Good?
Null transactions: transactions that contain neither B nor C
Let’s examine another dataset D:
BC (100) is much rarer than B^C (1000) and ^BC (1000), but there are many ^B^C (100000)
So it is unlikely that B and C will occur together!
But Lift(B, C) = 8.44 >> 1 (strong positive correlation)
χ² = 670: observed BC (100) >> expected value (11.85)
Too many null transactions may “spoil the soup”!

χ² & Lift With Null Example (expected values in parentheses)
               B                 ^B        Tot Row
C              100 (11.85)       1000      1100
^C             1000 (1088.15)    100000    101000
Total Column   1100              101000    102100
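
To see how null transactions drive both measures, one can recompute lift and chi-square (χ²) while changing only the number of transactions that contain neither B nor C. The following is a rough sketch: the function `lift_and_chi2` and the alternative null counts 100 and 10,000 are hypothetical choices for illustration, while 100,000 is the value from dataset D.

```python
def lift_and_chi2(n_bc, n_b_only, n_c_only, n_null):
    """Lift(B, C) and chi-square from the four cells of a 2x2 table.

    n_bc: both B and C; n_b_only: B without C; n_c_only: C without B;
    n_null: null transactions (neither B nor C).
    """
    total = n_bc + n_b_only + n_c_only + n_null
    n_b = n_bc + n_b_only
    n_c = n_bc + n_c_only
    lift = (n_bc / total) / ((n_b / total) * (n_c / total))

    observed = [[n_bc, n_c_only],    # row C:  columns B, ^B
                [n_b_only, n_null]]  # row ^C: columns B, ^B
    row_totals = [n_c, total - n_c]
    col_totals = [n_b, total - n_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            exp = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - exp) ** 2 / exp
    return lift, chi2

# Same B/C counts each time; only the number of null transactions changes.
for n_null in (100, 10_000, 100_000):
    lift, chi2 = lift_and_chi2(100, 1000, 1000, n_null)
    print(n_null, round(lift, 2), round(chi2, 1))
```

With 100,000 null transactions the lift comes out at 8.44 and chi-square at about 670, matching the figures above; with only 100 null transactions the same B/C counts give a lift below 1.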

Other Interestingness Algorithms
Null invariance: the value does not change with the number of null transactions.
Null-invariant interestingness measures:
AllConf(A, B)
Jaccard(A, B)
Cosine(A, B)
Kulczynski(A, B)
MaxConf(A, B)
Not all null-invariant measures are created equal
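
The slides list these measures without formulas. The sketch below uses the definitions commonly given in data mining textbooks, which is an assumption on my part rather than something stated in the source; the function name `null_invariant_measures` is likewise illustrative. Note that the total number of transactions never appears in any formula, which is what makes the measures null-invariant.

```python
import math

def null_invariant_measures(n_a, n_b, n_ab):
    """Five null-invariant measures from support counts.

    n_a, n_b: number of transactions containing A, containing B;
    n_ab: number of transactions containing both.
    Formulas follow common textbook definitions (assumed, not from the slides).
    """
    conf_a_to_b = n_ab / n_a   # P(B|A)
    conf_b_to_a = n_ab / n_b   # P(A|B)
    return {
        "AllConf": n_ab / max(n_a, n_b),
        "Jaccard": n_ab / (n_a + n_b - n_ab),
        "Cosine": n_ab / math.sqrt(n_a * n_b),
        "Kulczynski": (conf_a_to_b + conf_b_to_a) / 2,
        "MaxConf": max(conf_a_to_b, conf_b_to_a),
    }

# Dataset D from the null-transaction example: B = 1100, C = 1100, BC = 100.
for name, value in null_invariant_measures(1100, 1100, 100).items():
    print(f"{name}: {value:.3f}")
```

All five values stay the same however many null transactions are added to dataset D, because only the counts for B, C and BC enter the calculation.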

Imbalance Ratio with Kulczynski
Imbalance Ratio (IR): measures the imbalance of two itemsets A and B in rule implications

Kulczynski
A pattern is negative if (P(B|C) + P(C|B)) / 2 < ε, where ε = 0.01
Where A = milk, B = coffee:
1 billion transactions = 1,000,000,000
A occurs 1 million times = 1,000,000
B occurs 10 thousand times = 10,000
A and B occur together one hundred times = 100
s(A) = 10^6 / 10^9 = 10^-3 = 1/1,000
s(B) = 10^4 / 10^9 = 10^-5 = 1/100,000
s(A ∪ B) = 10^2 / 10^9 = 10^-7 = 1/10,000,000
s(A) × s(B) = 10^-3 × 10^-5 = 10^-8

Kulczynski
P(B|A) = P(A ∪ B) / P(A) = 10^2 / 10^6 = 10^-4
P(A|B) = P(A ∪ B) / P(B) = 10^2 / 10^4 = 10^-2
(P(B|A) + P(A|B)) / 2 = (10^-4 + 10^-2) / 2 = 0.00505 < 0.01
Therefore this is a negative pattern
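
The milk/coffee check can be scripted as follows. This is a hedged sketch: the Kulczynski formula is the one used above, but the imbalance ratio formula, |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A ∪ B)), is the commonly cited textbook definition and is an assumption here, since the slides describe IR only in words. The function and variable names are illustrative.

```python
def kulczynski(n_a, n_b, n_ab):
    """Kulczynski measure: average of P(B|A) and P(A|B), from counts."""
    return (n_ab / n_a + n_ab / n_b) / 2

def imbalance_ratio(n_a, n_b, n_ab):
    """IR(A, B) = |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A ∪ B)).

    Assumed textbook definition; not spelled out in the slides.
    """
    return abs(n_a - n_b) / (n_a + n_b - n_ab)

EPSILON = 0.01  # negative-pattern threshold from the example
n_milk, n_coffee, n_both = 1_000_000, 10_000, 100

kulc = kulczynski(n_milk, n_coffee, n_both)
ir = imbalance_ratio(n_milk, n_coffee, n_both)
print(f"Kulc = {kulc:.5f}")                    # Kulc = 0.00505
print(f"negative pattern: {kulc < EPSILON}")   # negative pattern: True
print(f"IR = {ir:.3f}")                        # IR = 0.980
```

An IR close to 1 indicates that the two itemsets are highly imbalanced (milk is far more frequent than coffee), which is useful context when reading the Kulczynski value.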
