Temporal Data Mining Claudio Bettini, X.Sean Wang and Sushil Jajodia Presented by Zhuang Liu.


1 Temporal Data Mining Claudio Bettini, X.Sean Wang and Sushil Jajodia Presented by Zhuang Liu

2 Outline: (1) What is Data Mining? (2) Formal Problem Definition (3) TAG (Timed Automaton with Granularities) (4) A Naïve Solution (5) Techniques for Improving Performance (6) Experimental Results

3 What is Data Mining? Data mining is the non-trivial extraction of implicit, previously unknown, and potentially useful information from data. Common data mining techniques: association-rule mining; sequential (temporal) mining; clustering; classification; outlier detection.

4 Temporal Data Mining: Finding time-related frequent patterns (frequent sub-sequences), for example which pairs of events frequently occur one week after another. A simple example: a user may be interested in finding all the events that frequently follow within 2 business days of a rise in the IBM stock price.

5 Definitions. Event type (E): e.g., a deposit to an account, or a price increase of a specific stock. Event: an event e is a pair e = (E, t), where E is an event type and t is a positive integer, called the timestamp of e. Event sequence: an event sequence is a finite set of events. Each event (E, t) appearing in an event sequence represents the occurrence of event type E at time t.

6 Granularity: A granularity is a mapping $\mu$ from the set of positive integers to subsets of the time domain such that, for all positive integers $i$ and $j$ with $i < j$: (1) $\mu(i) \neq \emptyset$ and $\mu(j) \neq \emptyset$ imply that each number in $\mu(i)$ is less than all the numbers in $\mu(j)$, and (2) $\mu(i) = \emptyset$ implies $\mu(j) = \emptyset$. Examples: year, month, week, day, business-day, business-week, etc.
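As a rough illustration of how such granularities might be represented in code, the sketch below maps a timestamp to the index of the granule that contains it, or to `None` when no granule covers it. It assumes, purely for illustration, that timestamps count days and that day 1 is a Monday; the function names are hypothetical and not taken from the presentation.

```python
from typing import Optional


def day_index(t: int) -> Optional[int]:
    """Every timestamp is its own day granule."""
    return t


def week_index(t: int) -> Optional[int]:
    """Days 1-7 form week 1, days 8-14 form week 2, and so on."""
    return (t - 1) // 7 + 1


def b_day_index(t: int) -> Optional[int]:
    """Business days skip weekends, so Saturdays and Sundays have no index."""
    week, weekday = divmod(t - 1, 7)  # weekday 0 = Monday ... 6 = Sunday
    if weekday >= 5:
        return None
    return 5 * week + weekday + 1


# Friday of week 1, the following Saturday, and Monday of week 2.
print(b_day_index(5), b_day_index(6), b_day_index(8))  # prints: 5 None 6
```

Returning `None` for weekends reflects the fact that a granularity need not cover the whole time domain.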

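7 TCG: A temporal constraint with granularity (TCG) $[m,n]\,\mu$ is a binary relation on positive integers. For positive integers $t_1$ and $t_2$, $(t_1, t_2)$ satisfies $[m,n]\,\mu$ iff (1) $t_1 \le t_2$, (2) $\lceil t_1 \rceil^{\mu}$ and $\lceil t_2 \rceil^{\mu}$ are both defined, and (3) $m \le \lceil t_2 \rceil^{\mu} - \lceil t_1 \rceil^{\mu} \le n$, where $\lceil t \rceil^{\mu}$ denotes the index $i$ such that $t \in \mu(i)$. Examples of TCGs: [0,0]day, [0,2]hour, [1,1]month.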
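The three conditions of a TCG translate almost directly into a predicate. The sketch below is a minimal illustration: it assumes that timestamps count days with day 1 a Monday, and that a granularity is supplied as a hypothetical index function returning `None` where it is undefined.

```python
from typing import Callable, Optional

Index = Callable[[int], Optional[int]]


def b_day_index(t: int) -> Optional[int]:
    """Business-day index of day t, or None on a weekend (day 1 = Monday)."""
    week, weekday = divmod(t - 1, 7)
    return None if weekday >= 5 else 5 * week + weekday + 1


def satisfies(t1: int, t2: int, m: int, n: int, index: Index) -> bool:
    """Check whether (t1, t2) satisfies the TCG [m, n] for the given granularity."""
    if t1 > t2:                        # condition (1): t1 must not follow t2
        return False
    i1, i2 = index(t1), index(t2)
    if i1 is None or i2 is None:       # condition (2): both indices defined
        return False
    return m <= i2 - i1 <= n           # condition (3): distance within [m, n]


# Friday (day 5) to the next Monday (day 8) is exactly one business day apart.
print(satisfies(5, 8, 1, 1, b_day_index))  # prints: True
```

Note that a Friday and the following Monday are three days apart but only one business day apart, which is exactly the distinction the granularity captures.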

8 Event Structure: An event structure (with granularities) is a rooted directed acyclic graph $(W, A, \Gamma)$, where $W$ is a finite set of event variables, $A \subseteq W \times W$, and $\Gamma$ is a mapping from $A$ to the finite sets of TCGs. A complex event type derived from an event structure $S$ is $S$ with each variable associated with a specific event type. A complex event matching $S$ is $S$ with each variable associated with a distinct event such that the event timestamps satisfy the time constraints.

9 Example of Event Structure: Assigning the event types IBM-rise, IBM-earnings-report, HP-rise, and IBM-fall to $x_0$, $x_1$, $x_2$, and $x_3$, respectively, gives a complex event type. This complex event type describes that the IBM earnings were reported one business day after the IBM stock rose, and in the same or the next week the IBM stock fell, while the HP stock rose within 5 business days after the same rise of the IBM stock and within 8 hours before the same fall of the IBM stock. Figure 1: An event structure, with edges $x_0 \to x_1$ [1,1]b-day, $x_0 \to x_2$ [0,5]b-day, $x_2 \to x_3$ [0,8]hours, and $x_1 \to x_3$ [0,1]week.
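To make the example concrete, the sketch below checks whether four timestamps form a complex event matching this structure. The edge list is read off the prose description of Figure 1. Everything else is an assumption made for illustration: timestamps count hours, hour 1 is the first hour of a Monday, and all names are hypothetical.

```python
from typing import Callable, Dict, Optional

Index = Callable[[int], Optional[int]]


def hour_index(t: int) -> Optional[int]:
    return t


def week_index(t: int) -> Optional[int]:
    return (t - 1) // 168 + 1          # 168 hours per week


def b_day_index(t: int) -> Optional[int]:
    day = (t - 1) // 24                # 0-based day number
    week, weekday = divmod(day, 7)     # weekday 0 = Monday
    return None if weekday >= 5 else 5 * week + weekday + 1


def satisfies(t1: int, t2: int, m: int, n: int, index: Index) -> bool:
    i1, i2 = index(t1), index(t2)
    return t1 <= t2 and i1 is not None and i2 is not None and m <= i2 - i1 <= n


# Edges of the structure: (source, target) -> (m, n, granularity).
STRUCTURE = {
    ("x0", "x1"): (1, 1, b_day_index),
    ("x0", "x2"): (0, 5, b_day_index),
    ("x2", "x3"): (0, 8, hour_index),
    ("x1", "x3"): (0, 1, week_index),
}


def matches(timestamps: Dict[str, int]) -> bool:
    """True if the timestamps bound to x0..x3 satisfy every edge constraint."""
    return all(
        satisfies(timestamps[a], timestamps[b], m, n, index)
        for (a, b), (m, n, index) in STRUCTURE.items()
    )


# IBM-rise Monday 10:00, earnings report Tuesday, HP-rise Wednesday, IBM-fall 5 hours later.
print(matches({"x0": 10, "x1": 34, "x2": 63, "x3": 68}))  # prints: True
```

Moving the IBM-fall to hour 72 would break the [0,8]hours constraint, so the same call would return `False`.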

10 Formal Problem Definition: An event-mining problem is a quadruple $(S, \tau, E_0, \rho)$, where $S$ is an event structure, $\tau$ is the minimum confidence value, $E_0$ is an event type (the reference type), and $\rho$ is a partial mapping that assigns a set of event types to some of the variables (except the root). An event-mining problem is the problem of finding all complex event types such that each occurs frequently in the input sequence and is derived from $S$ by assigning $E_0$ to the root and a specific event type to each of the other variables. Example: $(S, 0.8, \text{IBM-rise}, \rho)$.

11 TAG (Timed Automaton with Granularities): A TAG is the basic component used to test whether a candidate complex event type appears frequently in an event sequence. A timed automaton with granularities is a 6-tuple $(\Sigma, S, S_0, C, T, F)$, where (1) $\Sigma$ is a finite set of input letters, (2) $S$ is a finite set of states, (3) $S_0 \subseteq S$ is a set of start states, (4) $C$ is a finite set of clocks, (5) $T \subseteq S \times S \times \Sigma \times 2^C \times \Phi(C)$ is a set of transitions, and (6) $F \subseteq S$ is a set of accepting states.

12 TAG: $\Phi(C)$ is the set of all formulas called clock constraints. A transition $(s, s', e, \lambda, \delta)$ represents a transition from state $s$ to state $s'$ on input symbol $e$. The set $\lambda \subseteq C$ gives the clocks to be reset with this transition, and $\delta$ is a clock constraint over $C$. A TAG is essentially a standard finite automaton with some modifications: each TAG maintains a set of clocks, and both the input symbol and the clock values determine the next state. A run is an accepting run if its last state is in the set $F$. An event sequence is accepted by a TAG if there exists an accepting run.
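The sketch below simulates a heavily simplified automaton of this kind. It is not the construction from the source papers. It assumes a single granularity given as a total index function, measures each clock as the index distance since its last reset, and lets the automaton skip any input event. All names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, NamedTuple, Set, Tuple


class Transition(NamedTuple):
    src: str
    dst: str
    symbol: str
    resets: FrozenSet[str]                    # clocks reset by this transition
    guard: Callable[[Dict[str, int]], bool]   # clock constraint


def day_index(t: int) -> int:
    return t


def accepts(transitions: List[Transition], start: Set[str], accepting: Set[str],
            events: Iterable[Tuple[str, int]], index: Callable[[int], int]) -> bool:
    """Return True if some run over the time-ordered events ends in an accepting state."""
    # A configuration is (state, sorted tuple of (clock, time of last reset)).
    configs = {(s, ()) for s in start}
    for symbol, t in sorted(events, key=lambda ev: ev[1]):
        successors = set(configs)             # assumption: any event may be skipped
        for state, resets in configs:
            reset_at = dict(resets)
            clocks = {c: index(t) - index(r) for c, r in reset_at.items()}
            for tr in transitions:
                if tr.src == state and tr.symbol == symbol and tr.guard(clocks):
                    updated = {**reset_at, **{c: t for c in tr.resets}}
                    successors.add((tr.dst, tuple(sorted(updated.items()))))
        configs = successors
    return any(state in accepting for state, _ in configs)


# IBM-rise followed by IBM-fall one or two days later.
TRANSITIONS = [
    Transition("s0", "s1", "IBM-rise", frozenset({"c"}), lambda clocks: True),
    Transition("s1", "s2", "IBM-fall", frozenset(), lambda clocks: 1 <= clocks["c"] <= 2),
]
sequence = [("IBM-rise", 1), ("HP-rise", 2), ("IBM-fall", 3)]
print(accepts(TRANSITIONS, {"s0"}, {"s2"}, sequence, day_index))  # prints: True
```

Tracking a set of configurations rather than a single state is what handles the nondeterminism: several partial matches can be alive at the same time.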

13 A Naïve Solution: Consider all the event types that occur in the given event sequence, and all the complex types derived from the given event structure, one for each assignment of these event types to the variables. Each of these complex types is called a candidate complex type for the event-mining problem. For each candidate complex type, start the corresponding TAG at every occurrence of $E_0$; that is, for each occurrence of $E_0$ in the event sequence, use the rest of the event sequence as the input to one copy of the TAG. Comparing the number of TAGs that reach a final state with the number of occurrences of $E_0$ yields all the solutions of the event-mining problem. However, the number of candidate types is exponential in the number of variables in the event structure (with $n$ event types and $k$ non-root variables there are up to $n^k$ candidates), so this approach is too costly.
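The counting step of the naïve approach can be sketched as follows. For brevity, a brute-force search over event combinations stands in for the TAG, so this illustrates what is counted, not how the automaton works. Timestamps are assumed to count days, and all names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Event = Tuple[str, int]
Index = Callable[[int], Optional[int]]
Edges = Dict[Tuple[str, str], Tuple[int, int, Index]]


def day_index(t: int) -> Optional[int]:
    return t


def satisfies(t1: int, t2: int, m: int, n: int, index: Index) -> bool:
    i1, i2 = index(t1), index(t2)
    return t1 <= t2 and i1 is not None and i2 is not None and m <= i2 - i1 <= n


def occurs_at(root_time: int, root: str, types: Dict[str, str],
              edges: Edges, sequence: List[Event]) -> bool:
    """True if the candidate type matches with its root bound to root_time."""
    others = [v for v in types if v != root]
    pools = [[t for e, t in sequence if e == types[v]] for v in others]
    for times in product(*pools):
        assigned = dict(zip(others, times))
        assigned[root] = root_time
        if len({(types[v], assigned[v]) for v in assigned}) < len(assigned):
            continue                   # variables must bind to distinct events
        if all(satisfies(assigned[a], assigned[b], m, n, index)
               for (a, b), (m, n, index) in edges.items()):
            return True
    return False


def frequency(root: str, types: Dict[str, str], edges: Edges,
              sequence: List[Event]) -> float:
    """Fraction of root-type occurrences that start a matching complex event."""
    roots = [t for e, t in sequence if e == types[root]]
    if not roots:
        return 0.0
    hits = sum(occurs_at(t, root, types, edges, sequence) for t in roots)
    return hits / len(roots)


sequence = [("A", 1), ("B", 2), ("A", 4), ("B", 7), ("A", 10), ("B", 11)]
edges = {("x0", "x1"): (1, 1, day_index)}
freq = frequency("x0", {"x0": "A", "x1": "B"}, edges, sequence)
print(round(freq, 3))  # prints: 0.667
```

Here two of the three occurrences of `A` are followed by a `B` exactly one day later, so the candidate would pass a minimum confidence of 0.6 but fail one of 0.7.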

14 Techniques to Improve Performance: The performance of this algorithm can be improved by (1) identifying possible inconsistencies in the given event structure before starting the process, (2) reducing the length of the sequence, (3) reducing the number of times an automaton has to be started, (4) reducing the number of different automata to be started, and (5) applying the naïve algorithm to what remains.

15 Recognition of Inconsistent Event Structures: An event structure is consistent if there exists a complex event that matches it. An inconsistent event structure should be discarded before the mining process even starts. Determining the consistency of an event structure exactly is computationally difficult, so approximate polynomial-time algorithms are used to check it.

16 Recognition of Inconsistent Event Structures: If one of the constraints implied by the given ones is the "empty" one, i.e., unsatisfiable, the whole event structure is inconsistent. A TCG $[m', n']\,\mu'$ is logically implied by a TCG $[m, n]\,\mu$ if each pair $(x, y)$ satisfying the second constraint also satisfies the first one. For example, the TCG [1,2]b-week can be converted into [3,18]day or [0,1]month, while it cannot be converted into [2,3]week-end or [1,3]week, since the resulting constraints are not implied by [1,2]b-week.

17 Reduction of the Event Sequence: We can reduce the event sequence by exploiting the granularities. For example, if a discovery problem is defined on the sub-structure excluding variable $x_3$, every remaining constraint is expressed in business days, so the input event sequence can be reduced by discarding any event that does not occur on a business day.
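A minimal sketch of this reduction, assuming that timestamps count days with day 1 a Monday and that every constraint of the structure uses the same business-day granularity (function names are hypothetical):

```python
from typing import Callable, List, Optional, Tuple

Event = Tuple[str, int]


def b_day_index(t: int) -> Optional[int]:
    """Business-day index of day t, or None on a weekend (day 1 = Monday)."""
    week, weekday = divmod(t - 1, 7)
    return None if weekday >= 5 else 5 * week + weekday + 1


def reduce_sequence(sequence: List[Event],
                    index: Callable[[int], Optional[int]]) -> List[Event]:
    """Drop events whose timestamp is not covered by the granularity."""
    return [(e, t) for e, t in sequence if index(t) is not None]


sequence = [("IBM-rise", 5), ("HP-rise", 6), ("IBM-earnings-report", 8)]
print(reduce_sequence(sequence, b_day_index))
# prints: [('IBM-rise', 5), ('IBM-earnings-report', 8)]
```

The Saturday event can never satisfy a business-day constraint, so dropping it cannot change the mining result.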

18 Reduction of the Occurrences of the Root: The basic idea is to remove those occurrences of the reference type that cannot be the root of a complex event matching the given structure. For some occurrences of the reference type in the sequence, a constraint may be unsatisfiable. Consider all the non-empty sets of explicit and implicit constraints between the root and each non-root node, and check whether any of them cannot be satisfied. For example, if no event occurs in the sequence on the business day following an IBM-rise event, that particular reference event can be discarded (no automaton is started for it).

19 Reduction of the Occurrences of the Root: Let $N$ be the number of occurrences of the reference event type in the sequence. Let $N'$ be the number of occurrences of reference events for which one of the constraints is unsatisfiable. These are reference events that are certainly not the root of a complex event satisfying the given event structure. If $N'/N > 1 - \tau$, where $\tau$ is the minimum confidence value, there cannot be any frequent complex event type and the empty set should be returned to the user. Otherwise, remove these occurrences of the reference type and replace $\tau$ with $\tau' = (\tau \cdot N) / (N - N')$.
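The adjustment can be checked with a small calculation. At most N − N' occurrences can be the root of a match, so the achievable frequency is at most (N − N')/N. If that bound is below the minimum confidence, nothing can be frequent. Otherwise the threshold is rescaled to the smaller set of roots. The sketch below writes the minimum confidence as `tau`; the function name is hypothetical.

```python
from typing import Optional


def adjust_confidence(tau: float, n: int, n_bad: int) -> Optional[float]:
    """Return the rescaled threshold tau', or None if no type can be frequent.

    n is the number of reference occurrences, n_bad the number of those
    that cannot be the root of any match.
    """
    remaining = n - n_bad
    if remaining <= 0 or remaining < tau * n:
        return None                    # even a perfect score on the rest falls short
    return tau * n / remaining


print(adjust_confidence(0.7, 200, 40))   # prints: 0.875
print(adjust_confidence(0.7, 200, 100))  # prints: None
```

With 200 occurrences of which 40 are discarded, a candidate must now match at 87.5% of the remaining 160 roots, which is the same 140 matches that a confidence of 0.7 required originally.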

20 Reduction of the Candidate Types: This step is based on the property that if a complex event type occurs frequently, then any of its sub-types also occurs frequently. In other words, if an assignment to two variables is not frequent, no candidate complex event type that includes this assignment can be frequent, so such types can be removed from the set of candidates. For each subset $W'$ of $W$, the induced approximated sub-structure of $W'$ is $(W', A', \Gamma')$, where $A'$ consists of all pairs $(X, Y) \in W' \times W'$ such that there is a path from $X$ to $Y$ in $S$ and there is at least one constraint on $(X, Y)$.
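The pruning idea can be sketched as follows, under the assumption that a set of frequent event types has already been computed for each non-root variable from its induced sub-structure. Only combinations of those surviving types remain candidates. The variable and event-type names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set


def prune_candidates(frequent_by_variable: Dict[str, Set[str]]) -> List[Dict[str, str]]:
    """Build candidate assignments using only types that were frequent per variable."""
    variables = sorted(frequent_by_variable)
    pools = [sorted(frequent_by_variable[v]) for v in variables]
    return [dict(zip(variables, combo)) for combo in product(*pools)]


# With 5 event types and 2 non-root variables there are 5 ** 2 = 25 naive candidates.
survivors = prune_candidates({"x1": {"B", "C"}, "x2": {"D"}})
print(len(survivors), survivors)
# prints: 2 [{'x1': 'B', 'x2': 'D'}, {'x1': 'C', 'x2': 'D'}]
```

Each surviving candidate still has to be tested against the full structure; the pruning only guarantees that nothing frequent has been thrown away.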

21 Reduction of the Candidate Types: Finding the solutions to the induced discovery problems is straightforward and has low time complexity. Indeed, for a single non-root variable $X$, the induced sub-structure gives the distance from the root to $X$ (in effect two distances, namely the minimum and the maximum). For each occurrence of $E_0$, this distance translates into a window, i.e., a period of time during which the event for $X$ must appear. The approach can be extended to sub-structures with more than one non-root variable, where these variables form a chain in $S$.
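For a single non-root variable, the window check might look like the sketch below. It assumes the induced constraint has already been expressed as a distance of `m` to `n` days, that timestamps count days, and that a type counts as frequent when its frequency is at least `tau`. All of these are simplifying assumptions, and the names are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Event = Tuple[str, int]


def frequent_types_in_window(sequence: List[Event], root_type: str,
                             m: int, n: int, tau: float) -> Set[str]:
    """Event types that appear in the window [t0 + m, t0 + n] often enough."""
    roots = [t for e, t in sequence if e == root_type]
    counts: Counter = Counter()
    for t0 in roots:
        # Each type is counted at most once per root occurrence.
        seen = {e for e, t in sequence
                if t0 + m <= t <= t0 + n and (e, t) != (root_type, t0)}
        counts.update(seen)
    return {e for e, c in counts.items() if c / len(roots) >= tau}


sequence = [("A", 1), ("B", 2), ("C", 2), ("A", 5), ("B", 6), ("A", 9), ("C", 12)]
print(frequent_types_in_window(sequence, "A", 1, 2, 0.6))  # prints: {'B'}
```

Type `B` falls in the window of two of the three `A` occurrences, while `C` does so only once, so only `B` stays a candidate for this variable.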

22 Experimental Results: The data set contains the closing prices of 439 stocks for 517 trading days. Price changes are partitioned into 7 categories: $(-\infty, -5\%)$, $(-5\%, -3\%)$, $(-3\%, 0)$, $(0, 0)$, $(0, 3\%)$, $(3\%, 5\%)$, $(5\%, \infty)$. The total number of event types is 2978. The number of events is 181089. The reference event type $X_0$ is a drop of the IBM stock of less than 3%. The minimum confidence value is 0.7. There is no other assignment to the other variables. Figure: The event structure used in the experiment, over variables $X_0$, $X_1$, $X_2$, $X_3$ with edge constraints [0,2]b-day, [1,2]b-day, and [0,0]b-week.
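The partitioning of price changes into event types could be sketched as below. The slide does not say which side of each boundary is inclusive, so the choices here are assumptions. The category labels and the `ticker:category` naming scheme are likewise hypothetical.

```python
def categorize(change_pct: float) -> str:
    """Map a daily price change (in percent) to one of seven categories."""
    if change_pct < -5:
        return "drop>5%"
    if change_pct < -3:
        return "drop3-5%"          # assumption: -5% itself falls here
    if change_pct < 0:
        return "drop<3%"           # assumption: -3% itself falls here
    if change_pct == 0:
        return "unchanged"
    if change_pct <= 3:
        return "rise<3%"           # assumption: 3% itself falls here
    if change_pct <= 5:
        return "rise3-5%"          # assumption: 5% itself falls here
    return "rise>5%"


def event_type(ticker: str, change_pct: float) -> str:
    """Combine a stock symbol and a change category into an event type name."""
    return f"{ticker}:{categorize(change_pct)}"


print(event_type("IBM", -1.2))  # prints: IBM:drop<3%
```

Under this scheme the reference type of the experiment would correspond to `IBM:drop<3%`.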

23 Experimental Results cont.

24 This experiment focuses on Step 4, namely the reduction of the candidate complex event types by using sub-structures. The results show that applying the heuristics reduces the number of candidate complex event types significantly.

25 Experimental Results cont. The two frequent event combinations discovered in the experiment

26 References: [1] C. Bettini, X. S. Wang, S. Jajodia, and J.-L. Lin. "Discovering Temporal Relationships with Multiple Granularities in Time Sequences". IEEE Transactions on Knowledge and Data Engineering, Vol. 10 (2), 1998. [2] C. Bettini, X. S. Wang, and S. Jajodia. "A General Framework for Time Granularity and its Application to Temporal Reasoning". Annals of Mathematics and Artificial Intelligence, Vol. 22 (1-2), pages 29-58, Baltzer Science Publishers, 1998. [3] C. Bettini, X. S. Wang, and S. Jajodia. "Testing Complex Temporal Relationships Involving Multiple Granularities and its Application to Data Mining". In Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'96), pages 68-78, Montreal, Canada, June 1996. [4] C. Bettini, X. S. Wang, and S. Jajodia. "Mining Temporal Relationships with Multiple Granularities in Time Sequences". Data Engineering Bulletin, 21:32-38, 1998.

27 Thank you. Questions?

