Published by Rudolf Riley. Modified over 6 years ago.
1
On the Discovery of Interesting Patterns in Association Rules
Monday, November 9, 1998
DE Lab. Ha Mi-ra (하미라)
2
Abstract
complicated temporal patterns are described with a calendar algebra
  ex. “first working day of every month”
calendar algebra → calendric association rules
3
Introduction
existing association rules
  with no attention paid to segmenting the data over different time intervals
cyclic association rules
  simple periodicity
  fairly rigid
  did not deal with multiple units of time
calendric association rules
  complicated patterns, described with a calendar algebra
  hold for “most” but not all of the time : mismatch threshold
  can handle multiple units of time
4
Problem Definition
tj : the jth time unit, j ≥ 0
  corresponds to the time interval [j • t, (j+1) • t], where t is the unit of time
T[j] : the set of transactions executed in tj
support of an itemset X in T[j] : the fraction of transactions in T[j] that contain the itemset
confidence of a rule X → Y in T[j] : the fraction of transactions in T[j] containing X that also contain Y
An association rule X → Y holds in time unit tj if the support of X ∪ Y in T[j] exceeds supmin and the confidence of X → Y in T[j] exceeds conmin
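The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the function names `support`, `confidence`, and `rule_holds` are hypothetical, transactions are assumed to be frozensets of item names, and "exceeds" is read as a strict inequality.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of the transactions in one time unit that contain the itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    containing_x = [t for t in transactions if x <= t]
    if not containing_x:
        return 0.0
    return sum(1 for t in containing_x if y <= t) / len(containing_x)


def rule_holds(x: Itemset, y: Itemset, transactions: List[Itemset],
               sup_min: float, con_min: float) -> bool:
    """X -> Y holds in a time unit if both thresholds are strictly exceeded."""
    return (support(x | y, transactions) > sup_min
            and confidence(x, y, transactions) > con_min)


# Hypothetical contents of one segment T[j].
T_j = [frozenset("AB"), frozenset("ABC"), frozenset("A"), frozenset("C")]
A, B = frozenset("A"), frozenset("B")
print(support(A | B, T_j), confidence(A, B, T_j))
```

Here the support of A ∪ B is 2/4 and the confidence of A → B is 2/3, so the rule holds for thresholds such as supmin = 0.4 and conmin = 0.6.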
5
calendar C : set of time intervals {(s1, e1), (s2, e2), … , (sk, ek)}
  C contains time unit t if it contains an interval (sj, ej) such that sj ≤ t ≤ ej
mis-match threshold : m
  C belongs to an association rule X → Y if the rule has enough confidence and support for the time units contained in the calendar, with at most m mis-matches
  C belongs to an itemset X if the support of X exceeds supmin in at least w-m time units, where w is the number of time units contained in C
6
Example
C = {(31,31), (60,60), (89,89), (121,121), (152,152), (180,180), (213,213), (243,243), (274,274), (305,305), (334,334), (366,366)}
mismatch threshold : 0
  the rule has to have enough support and confidence on days 31, 60, 89, 121, 152, 180, 213, 243, 274, 305, 334, 366 of year 1996
mismatch threshold : 4
  the rule has to hold for at least 12-4 = 8 of the 12 days
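The membership test with a mismatch threshold can be sketched as follows. This is an illustrative reading of the definitions, not code from the presentation: the names `contains`, `time_units`, and `belongs_to_rule` are hypothetical, and the set `holds_in` (the days on which the rule holds) is made-up data.

```python
from typing import Iterable, List, Set, Tuple

Interval = Tuple[int, int]


def contains(calendar: Iterable[Interval], t: int) -> bool:
    """A calendar contains time unit t if some interval (s, e) has s <= t <= e."""
    return any(s <= t <= e for s, e in calendar)


def time_units(calendar: Iterable[Interval]) -> List[int]:
    """All distinct time units covered by the calendar's intervals."""
    return sorted({t for s, e in calendar for t in range(s, e + 1)})


def belongs_to_rule(calendar: Iterable[Interval], holds_in: Set[int], m: int) -> bool:
    """True if the rule fails in at most m of the calendar's time units."""
    mismatches = sum(1 for t in time_units(calendar) if t not in holds_in)
    return mismatches <= m


C = [(31, 31), (60, 60), (89, 89), (121, 121), (152, 152), (180, 180),
     (213, 213), (243, 243), (274, 274), (305, 305), (334, 334), (366, 366)]

# Assumed data: the rule holds on all of these days except three.
holds_in = set(time_units(C)) - {60, 152, 305}

print(belongs_to_rule(C, holds_in, m=0))  # three mismatches, none allowed
print(belongs_to_rule(C, holds_in, m=4))  # three mismatches, four allowed
```

With three failing days the calendar does not belong to the rule for a threshold of 0, but it does for a threshold of 4, since the rule still holds on 9 of the 12 days.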
7
An association rule can be represented as a binary sequence
  1 : a time unit in which the rule holds
  0 : a time unit in which the rule does not have the minimum support and confidence
ex : X → Y holds in T[3], T[4], T[8], T[10], T[12]
  the calendar {(4,4), (8,8), (12,12)}, which corresponds to a cycle of length 4, belongs to the rule
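A small sketch of the binary-sequence view, under the assumption that time units are numbered from 1 and that 12 time units are observed (the slide does not state either). The helper names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def to_binary_sequence(holds_in: Set[int], n_units: int) -> str:
    """Encode a rule as one character per time unit (assumed numbered 1..n_units)."""
    return "".join("1" if t in holds_in else "0" for t in range(1, n_units + 1))


def calendar_matches(seq: str, calendar: Iterable[Tuple[int, int]], m: int = 0) -> bool:
    """True if at most m time units of the calendar carry a 0 in the sequence."""
    misses = sum(1 for s, e in calendar
                 for t in range(s, e + 1) if seq[t - 1] == "0")
    return misses <= m


seq = to_binary_sequence({3, 4, 8, 10, 12}, 12)
print(seq)
print(calendar_matches(seq, [(4, 4), (8, 8), (12, 12)]))
```

Checking a candidate calendar then amounts to counting the zeros at the positions the calendar covers and comparing that count with the mismatch threshold.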
8
Calendar Algebra
Calendar : a structured collection of intervals
  calendar of order 1 : {(s1, e1), (s2, e2), … , (sk, ek)}
  calendar of order 2 : a collection of calendars of order 1
  and so on...
interval operators (int1 = (s1, e1), int2 = (s2, e2))
  int1 overlaps int2 ≡ (s1 ≤ s2 ≤ e1) ∨ (s2 ≤ s1 ≤ e2)
  int1 during int2 ≡ (s1 ≥ s2) ∧ (e1 ≤ e2)
  int1 meets int2 ≡ (e1 = s2)
  int1 < int2 ≡ (e1 ≤ s2)
  int1 ≤ int2 ≡ (s1 ≤ s2) ∧ (e1 ≤ e2)
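The named interval operators translate directly into predicates on (start, end) pairs. The sketch below covers only overlaps, during, and meets; representing intervals as tuples of integers with inclusive endpoints is an assumption made for illustration.

```python
from typing import Tuple

Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """a starts inside b or b starts inside a (endpoints inclusive)."""
    (s1, e1), (s2, e2) = a, b
    return (s1 <= s2 <= e1) or (s2 <= s1 <= e2)


def during(a: Interval, b: Interval) -> bool:
    """a lies entirely within b."""
    (s1, e1), (s2, e2) = a, b
    return s1 >= s2 and e1 <= e2


def meets(a: Interval, b: Interval) -> bool:
    """a ends exactly where b starts."""
    return a[1] == b[0]


print(overlaps((-3, 4), (1, 31)))  # week straddling the start of January
print(during((5, 11), (1, 31)))    # week fully inside January
print(meets((1, 4), (4, 9)))
```

Each predicate can be passed around as the relation R in the calendar operations that build on these operators.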
9
Dicing Operations
left : order 1, right : interval
result : order 1
  C:R:c’ ≡ {c ∩ c’ | c ∈ C ∧ c R c’} / {∅}
  C.R.c’ ≡ {c | c ∈ C ∧ c R c’} / {∅}
left : order 1, right : order 1
result : order 2
  C:R:C’ ≡ {{c ∩ c’ | c ∈ C ∧ c R c’} / {∅} | c’ ∈ C’}
  C.R.C’ ≡ {{c | c ∈ C ∧ c R c’} / {∅} | c’ ∈ C’}
ex.
  WeeksInJan96 = {(-3,4), (5,11), (12,18), (19,25), (26,32)}
  JanIn1996 = {(1,31)}
  WeeksInJan96:overlaps:JanIn1996 = {(1,4), (5,11), (12,18), (19,25), (26,31)}
  WeeksInJan96.overlaps.JanIn1996 = {(-3,4), (5,11), (12,18), (19,25), (26,32)}
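The two dicing forms can be sketched in Python and checked against the WeeksInJan96 example. The function names `dice_intersect` (for the `:` form) and `dice_select` (for the `.` form) are hypothetical, calendars are modelled as lists of inclusive integer intervals, and January 1996 is treated as the single interval (1, 31) as in the example.

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[int, int]
Relation = Callable[[Interval, Interval], bool]


def overlaps(a: Interval, b: Interval) -> bool:
    (s1, e1), (s2, e2) = a, b
    return (s1 <= s2 <= e1) or (s2 <= s1 <= e2)


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Intersection of two intervals, or None when it is empty."""
    s, e = max(a[0], b[0]), min(a[1], b[1])
    return (s, e) if s <= e else None


def dice_intersect(cal: List[Interval], rel: Relation, c2: Interval) -> List[Interval]:
    """C:R:c' keeps each c related to c' and clips it to c'."""
    pieces = (intersect(c, c2) for c in cal if rel(c, c2))
    return [p for p in pieces if p is not None]


def dice_select(cal: List[Interval], rel: Relation, c2: Interval) -> List[Interval]:
    """C.R.c' keeps each c related to c' unchanged."""
    return [c for c in cal if rel(c, c2)]


def dice_intersect_calendar(cal: List[Interval], rel: Relation,
                            cal2: List[Interval]) -> List[List[Interval]]:
    """C:R:C' yields one order-1 calendar per interval of C' (an order-2 result)."""
    return [dice_intersect(cal, rel, c2) for c2 in cal2]


weeks_in_jan96 = [(-3, 4), (5, 11), (12, 18), (19, 25), (26, 32)]
january = (1, 31)
print(dice_intersect(weeks_in_jan96, overlaps, january))
print(dice_select(weeks_in_jan96, overlaps, january))
```

The first call clips the partial weeks to January, while the second returns the weeks untouched, which is the difference between the two results in the example.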
10
Slicing Operations
C : calendar, p : integer
  (p) / C : replaces each order 1 calendar in C with its pth element
  [p] / C : replaces every order 1 calendar in C with a calendar consisting of the pth element
  if p is negative, indexing is done from the end of the calendar (ex. (-1) / C)
list of integers
  [p1, p2, …, pk] / C : replaces each order 1 calendar in C with a calendar consisting of its p1th, p2th, …, pkth elements
  (p1, p2, …, pk) / C : replaces each order 1 calendar in C with its p1th, p2th, …, pkth elements
11
ex. WeeksInJan96 = {(-3,4), (5,11), (12,18), (19,25), (26,32)}
flatten operator : takes an order k calendar and produces an order k-1 calendar
  flatten {{(-3,4), (5,11), (12,18), (19,25), (26,32)}} = {(-3,4), (5,11), (12,18), (19,25), (26,32)}
  flatten {{(1,1)}, {(5,5)}} = {(1,1), (5,5)}
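Slicing and flattening can be sketched on an order-2 calendar modelled as a list of lists of intervals. This is one possible reading of the operators: the function names are hypothetical, indexing is assumed to start at 1, and only the single-integer forms of slicing and the order 2 to order 1 case of flatten are shown.

```python
from typing import List, Tuple

Interval = Tuple[int, int]
Order1 = List[Interval]
Order2 = List[Order1]


def pick(cal: Order1, p: int) -> Interval:
    """pth element of an order-1 calendar; 1-based, negative p counts from the end."""
    if p == 0:
        raise ValueError("p must be non-zero")
    return cal[p - 1] if p > 0 else cal[p]


def slice_element(p: int, cal2: Order2) -> Order1:
    """(p) / C: replace each order-1 calendar by its pth element."""
    return [pick(cal, p) for cal in cal2]


def slice_calendar(p: int, cal2: Order2) -> Order2:
    """[p] / C: replace each order-1 calendar by a calendar holding its pth element."""
    return [[pick(cal, p)] for cal in cal2]


def flatten(cal2: Order2) -> Order1:
    """Merge the order-1 calendars of an order-2 calendar into one order-1 calendar."""
    return [interval for cal in cal2 for interval in cal]


weeks_in_jan96 = [(-3, 4), (5, 11), (12, 18), (19, 25), (26, 32)]
print(slice_element(2, [weeks_in_jan96]))
print(slice_calendar(-1, [weeks_in_jan96]))
print(flatten([[(1, 1)], [(5, 5)]]))
```

The difference between the two slicing forms is only whether the selected element stays wrapped in its own calendar, which determines the order of the result.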
12
The Sequential Algorithm
generate the rules in each time unit with one of the existing methods and then apply a pattern matching algorithm to discover calendars
13
Pruning, Skipping, Elimination
to reduce the number of itemsets for which support must be calculated
  “A calendar that belongs to the rule X → Y also belongs to the itemset X ∪ Y.”
skipping
  “If time unit tj is not contained in any calendar that belongs to an itemset X, then there is no need to calculate the support for X in time segment T[j].”
14
pruning
  Lemma 1 : If a calendar C belongs to an itemset X, then it must also belong to all of X’s subsets.
  used to compute the potential calendars of an itemset by merging the calendars of the itemset’s subsets
ex. If we know that the calendar {(4,4), (8,8), (12,12)} is the only calendar that belongs to items A and B,
  pruning : the only calendar that can belong to AB is also {(4,4), (8,8), (12,12)}
  skipping : we have to calculate the support of AB only in T[4], T[8], and T[12] rather than in all the segments
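The pruning and skipping steps can be sketched as set operations. The sketch reads "merging" as intersecting the calendar sets of all (k-1)-subsets, which follows from Lemma 1 but is an interpretation; the names `prune` and `units_to_scan` are hypothetical, and a calendar is modelled as a tuple of intervals so that it can be stored in a set.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Calendar = Tuple[Tuple[int, int], ...]
Itemset = FrozenSet[str]


def prune(itemset: Itemset, calendars_of: Dict[Itemset, Set[Calendar]]) -> Set[Calendar]:
    """Potential calendars of a k-itemset (k >= 2): those shared by all (k-1)-subsets."""
    subsets = [frozenset(s) for s in combinations(sorted(itemset), len(itemset) - 1)]
    result = set(calendars_of.get(subsets[0], set()))
    for s in subsets[1:]:
        result &= calendars_of.get(s, set())
    return result


def units_to_scan(calendars: Set[Calendar]) -> List[int]:
    """Skipping: support is needed only in time units covered by a potential calendar."""
    return sorted({t for cal in calendars for s, e in cal for t in range(s, e + 1)})


C: Calendar = ((4, 4), (8, 8), (12, 12))
known = {frozenset("A"): {C}, frozenset("B"): {C}}

potential_ab = prune(frozenset("AB"), known)
print(potential_ab)
print(units_to_scan(potential_ab))
```

For the example, the only potential calendar for AB is C, and the support of AB is needed only in time units 4, 8, and 12.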
15
elimination
  to eliminate certain calendars from further consideration once we have determined that they cannot exist
  Lemma 2 : If the support for an itemset X is below the minimum support threshold supmin in more than m time units contained in a calendar C, where m is the mis-match threshold, then C cannot belong to X.
ex. mis-match threshold : 0, itemset X does not have enough support in T[4]
  the calendar {(4,4), (8,8), (12,12)} cannot belong to X
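The elimination test can be sketched as a count of low-support time units that fall inside the calendar. The sketch assumes a calendar is eliminated once that count goes above the mis-match threshold m, in line with the "at most m mis-matches" definition; the function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]


def contains(calendar: Iterable[Interval], t: int) -> bool:
    return any(s <= t <= e for s, e in calendar)


def can_still_belong(calendar: Iterable[Interval],
                     low_support_units: Set[int], m: int) -> bool:
    """False once the itemset lacks support in more than m units of the calendar."""
    cal = list(calendar)
    misses = sum(1 for t in low_support_units if contains(cal, t))
    return misses <= m


C = [(4, 4), (8, 8), (12, 12)]
print(can_still_belong(C, {4}, m=0))  # one miss inside C, none allowed
print(can_still_belong(C, {5}, m=0))  # time unit 5 is not part of C
```

Time units outside the calendar never count against it, so a failure in T[5] leaves C untouched while a failure in T[4] eliminates it when m is 0.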
16
The Interleaved Algorithm
First phase : the calendars belonging to large itemsets are discovered using pruning, skipping, and elimination
Second phase : calendric association rules are generated using the calendars and supports of the itemsets, without scanning the database
ex. C = {(4,4), (8,8), (12,12)} item A : item B :
17
For each k, k ≥ 1 :
1. If k = 1, then all possible calendars are initially assumed to exist for each single itemset. Otherwise (if k > 1), pruning is applied to generate the potential calendars for k-itemsets using the calendars for (k-1)-itemsets. If the list of potential calendars is empty for every k-itemset, Phase One terminates.
2. Time segments are processed sequentially. For each time unit tj :
  2.1 Skipping determines, from the set of candidate calendars for k-itemsets, the set of k-itemsets for which support will be calculated in time segment T[j].
  2.2 If a k-itemset X chosen in Step 2.1 does not have the minimum support in time segment T[j], then the mis-match count is incremented by 1 for each potential calendar associated with X that contains tj. If this count exceeds the mis-match threshold for a particular calendar, that calendar is eliminated from the list of potential calendars for X.
Fig. 1 : Phase One of the interleaved algorithm