Temporal Data Mining
Claudio Bettini, X. Sean Wang and Sushil Jajodia
Presented by Zhuang Liu

Outline
- What is Data Mining?
- Formal Problem Definition
- TAG (Timed Automaton with Granularities)
- A Naive Solution
- Techniques for Improving Performance
- Experimental Results

What is Data Mining?
Data mining: the non-trivial extraction of implicit, previously unknown and potentially useful information from data.
Common data mining techniques:
- Association-rule mining
- Sequential mining (temporal mining)
- Clustering
- Classification
- Outlier detection

Temporal Data Mining
Finding time-related frequent patterns (frequent sub-sequences), for example which pairs of events frequently occur one week apart.
A simple example: a user may be interested in finding all those events that frequently follow within 2 business days of a rise of the IBM stock price.

Definitions
Event type (E): e.g. a deposit to an account, or a price increase of a specific stock.
Event: an event e is a pair e = (E, t), where E is an event type and t is a positive integer, called the timestamp of e.
Event sequence: an event sequence is a finite set of events. Each event (E, t) appearing in an event sequence represents an occurrence of event type E at time t.

Granularity
A granularity is a mapping $\mu$ from the set of positive integers to subsets of the time domain such that, for all positive integers $i$ and $j$ with $i < j$:
(1) $\mu(i) \neq \emptyset$ and $\mu(j) \neq \emptyset$ imply that each number in $\mu(i)$ is less than all the numbers in $\mu(j)$, and
(2) $\mu(i) = \emptyset$ implies $\mu(j) = \emptyset$.
Examples: year, month, week, day, business-day, business-week, etc.
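To make the definition concrete, here is a minimal sketch of three granularities. It is an illustration under stated assumptions, not code from the presentation: timestamps are taken to be hours, hour 1 is assumed to be the first hour of a Monday, and the function names are hypothetical. Each function returns the index of the granule that contains a timestamp, or `None` when no granule does (business-day has gaps at weekends).

```python
from typing import Optional

HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7
BUSINESS_DAYS_PER_WEEK = 5


def day(t: int) -> int:
    """Index of the day granule containing hour t (both 1-based)."""
    return (t - 1) // HOURS_PER_DAY + 1


def week(t: int) -> int:
    """Index of the week granule containing hour t."""
    return (day(t) - 1) // DAYS_PER_WEEK + 1


def business_day(t: int) -> Optional[int]:
    """Index of the business-day granule containing hour t.

    Returns None for weekend hours, which no business-day granule covers.
    """
    weeks, weekday = divmod(day(t) - 1, DAYS_PER_WEEK)  # weekday 0 = Monday
    if weekday >= BUSINESS_DAYS_PER_WEEK:
        return None
    return BUSINESS_DAYS_PER_WEEK * weeks + weekday + 1
```

Note that the indices grow with time and that business-day simply skips the weekend hours, which is exactly the kind of gap the definition allows.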

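TCG
A temporal constraint with granularity (TCG) $[m,n]\,\mu$ is a binary relation on positive integers. For positive integers $t_1$ and $t_2$, $(t_1, t_2)$ satisfies $[m,n]\,\mu$ iff
(1) $t_1 \le t_2$,
(2) $\lceil t_1 \rceil^{\mu}$ and $\lceil t_2 \rceil^{\mu}$ are both defined, and
(3) $m \le \lceil t_2 \rceil^{\mu} - \lceil t_1 \rceil^{\mu} \le n$,
where $\lceil t \rceil^{\mu}$ denotes the index $i$ of the granule $\mu(i)$ that contains $t$ (undefined if no granule contains $t$).
Examples of TCGs: [0,0]day, [0,2]hour, [1,1]month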
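The three conditions translate almost directly into code. The sketch below is illustrative only; it assumes timestamps in hours with hour 1 falling on a Monday, and the helper names (`day`, `business_day`, `satisfies_tcg`) are hypothetical.

```python
from typing import Callable, Optional

# A granularity is represented by a function giving the index of the granule
# that contains a timestamp, or None if no granule contains it.
Granule = Callable[[int], Optional[int]]


def day(t: int) -> Optional[int]:
    return (t - 1) // 24 + 1


def business_day(t: int) -> Optional[int]:
    weeks, weekday = divmod((t - 1) // 24, 7)  # weekday 0 = Monday
    return None if weekday >= 5 else 5 * weeks + weekday + 1


def satisfies_tcg(t1: int, t2: int, m: int, n: int, granule: Granule) -> bool:
    """Check whether (t1, t2) satisfies the TCG [m, n] over the given granularity."""
    if t1 > t2:                      # condition (1)
        return False
    g1, g2 = granule(t1), granule(t2)
    if g1 is None or g2 is None:     # condition (2)
        return False
    return m <= g2 - g1 <= n         # condition (3)
```

For instance, hours 24 and 25 are one hour apart but fall on different days, so they do not satisfy [0,0]day, while a Friday and the following Monday do satisfy [1,1]b-day.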

Event Structure
An event structure (with granularities) is a rooted directed acyclic graph $(W, A, \Gamma)$, where $W$ is a finite set of event variables, $A \subseteq W \times W$, and $\Gamma$ is a mapping from $A$ to the finite sets of TCGs.
Complex event type derived from S: the structure S with each variable associated with a specific event type.
Complex event matching S: the structure S with each variable associated with a distinct event such that the event timestamps satisfy the time constraints.

Example of Event Structure
Assigning the event types IBM-rise, IBM-earnings-report, HP-rise and IBM-fall to $x_0$, $x_1$, $x_2$ and $x_3$, respectively, yields a complex event type. This complex event type describes that the IBM earnings were reported one business day after the IBM stock rose, and that the IBM stock fell in the same or the next week; meanwhile, the HP stock rose within 5 business days after the same rise of the IBM stock and within 8 hours before the same fall of the IBM stock.
Figure 1: An event structure, with edges $x_0 \to x_1$: [1,1]b-day, $x_0 \to x_2$: [0,5]b-day, $x_1 \to x_3$: [0,1]week, $x_2 \to x_3$: [0,8]hours.
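The example structure can be written down as a small labelled graph and checked against a concrete assignment of events. This is a sketch only: the edge layout is read off the textual description above, timestamps are assumed to be hours with hour 1 on a Monday, and all identifiers are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Event = Tuple[str, int]                      # (event type, timestamp in hours)
Granule = Callable[[int], Optional[int]]


def hour(t: int) -> Optional[int]:
    return t


def week(t: int) -> Optional[int]:
    return (t - 1) // (24 * 7) + 1


def business_day(t: int) -> Optional[int]:
    weeks, weekday = divmod((t - 1) // 24, 7)  # weekday 0 = Monday
    return None if weekday >= 5 else 5 * weeks + weekday + 1


def satisfies_tcg(t1: int, t2: int, m: int, n: int, granule: Granule) -> bool:
    g1, g2 = granule(t1), granule(t2)
    return t1 <= t2 and g1 is not None and g2 is not None and m <= g2 - g1 <= n


# Variable -> event type, i.e. the complex event type of the example.
TYPES: Dict[str, str] = {
    "x0": "IBM-rise", "x1": "IBM-earnings-report",
    "x2": "HP-rise", "x3": "IBM-fall",
}
# Arc -> (m, n, granularity), one TCG per arc.
EDGES: Dict[Tuple[str, str], Tuple[int, int, Granule]] = {
    ("x0", "x1"): (1, 1, business_day),
    ("x0", "x2"): (0, 5, business_day),
    ("x1", "x3"): (0, 1, week),
    ("x2", "x3"): (0, 8, hour),
}


def matches(assignment: Dict[str, Event]) -> bool:
    """True if the assigned events form a complex event matching the structure."""
    if set(assignment) != set(TYPES):
        return False
    if len(set(assignment.values())) != len(assignment):   # events must be distinct
        return False
    if any(assignment[v][0] != TYPES[v] for v in TYPES):
        return False
    return all(
        satisfies_tcg(assignment[a][1], assignment[b][1], m, n, g)
        for (a, b), (m, n, g) in EDGES.items()
    )
```

With a rise on Monday morning, the report on Tuesday, and an HP rise two hours before an IBM fall on Wednesday, every constraint holds; moving the fall to Thursday breaks the [0,8]hours constraint.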

Formal Problem Definition
An event-mining problem is a quadruple $(S, \tau, E_0, \rho)$, where $S$ is an event structure, $\tau$ is the minimum confidence value, $E_0$ is an event type (the reference type), and $\rho$ is a partial mapping which assigns a set of event types to some of the variables (except the root).
Solving an event-mining problem means finding all complex event types that occur frequently in the input sequence and are derived from $S$ by assigning $E_0$ to the root and a specific event type to each of the other variables.
Example: $(S, 0.8, \text{IBM-rise}, \rho)$

TAG: Timed Automaton with Granularities
A TAG is the basic component used to test whether a candidate complex event type appears frequently in a time sequence.
A timed automaton with granularities is a 6-tuple $(\Sigma, S, S_0, C, T, F)$, where
(1) $\Sigma$ is a finite set of input letters,
(2) $S$ is a finite set of states,
(3) $S_0 \subseteq S$ is a set of start states,
(4) $C$ is a finite set of clocks,
(5) $T \subseteq S \times S \times \Sigma \times 2^{C} \times \Phi(C)$ is a set of transitions, and
(6) $F \subseteq S$ is a set of accepting states.

TAG
$\Phi(C)$ is the set of all formulas called clock constraints. A transition $(s, s', e, \lambda, \delta)$ represents a transition from state $s$ to state $s'$ on input symbol $e$; the set $\lambda \subseteq C$ gives the clocks to be reset with this transition, and $\delta$ is a clock constraint over $C$.
A TAG is essentially a standard finite automaton with some modifications:
- Each TAG maintains a set of clocks.
- Both the input symbol and the clock values determine the next state.
A run is an accepting run if its last state is in the set $F$. An event sequence is accepted by a TAG if there exists an accepting run.
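The following is a deliberately simplified sketch of how a TAG run could be simulated; it is not the construction from the paper. The assumptions are: timestamps are hours with hour 1 on a Monday, each clock ticks in one granularity and stores the granule index at its last reset, clock constraints are restricted to conjunctions of interval bounds, all clocks are treated as reset at the first event, and events are skipped through explicit wildcard self-loops (the `ANY` label is hypothetical).

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

ANY = "*"  # hypothetical wildcard label for "skip this event" self-loops

Event = Tuple[str, int]
Granule = Callable[[int], Optional[int]]
# (source state, input label, clock bounds, clocks to reset, destination state)
Transition = Tuple[str, str, Dict[str, Tuple[int, int]], Set[str], str]


def business_day(t: int) -> Optional[int]:
    weeks, weekday = divmod((t - 1) // 24, 7)  # weekday 0 = Monday
    return None if weekday >= 5 else 5 * weeks + weekday + 1


def elapsed_ok(now: Optional[int], reset: Optional[int], lo: int, hi: int) -> bool:
    """A bound holds only if both granule indices are defined."""
    return now is not None and reset is not None and lo <= now - reset <= hi


def accepts(starts: Set[str], accepting: Set[str], clocks: Dict[str, Granule],
            transitions: List[Transition], events: List[Event]) -> bool:
    """Simulate all runs at once; accept if some run ends in an accepting state."""
    if not events:
        return bool(starts & accepting)
    t0 = events[0][1]
    init = tuple(sorted(((c, g(t0)) for c, g in clocks.items()), key=lambda kv: kv[0]))
    configs = {(s, init) for s in starts}          # (state, clock reset points)
    for symbol, t in events:
        now = {c: g(t) for c, g in clocks.items()}
        successors = set()
        for state, resets in configs:
            reset_at = dict(resets)
            for src, label, bounds, to_reset, dst in transitions:
                if src != state or label not in (symbol, ANY):
                    continue
                if not all(elapsed_ok(now[c], reset_at[c], lo, hi)
                           for c, (lo, hi) in bounds.items()):
                    continue
                updated = {c: (now[c] if c in to_reset else v)
                           for c, v in reset_at.items()}
                successors.add((dst, tuple(sorted(updated.items(),
                                                  key=lambda kv: kv[0]))))
        configs = successors
    return any(state in accepting for state, _ in configs)


# TAG for "IBM-earnings-report exactly one business day after IBM-rise".
CLOCKS = {"c": business_day}
TRANSITIONS: List[Transition] = [
    ("s0", "IBM-rise", {}, {"c"}, "s1"),
    ("s1", ANY, {}, set(), "s1"),
    ("s1", "IBM-earnings-report", {"c": (1, 1)}, set(), "s2"),
    ("s2", ANY, {}, set(), "s2"),
]
```

Tracking a set of configurations rather than a single state is what handles the nondeterminism: the wildcard self-loop and the "real" transition can both fire on the same event.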

A Naïve Solution
Consider all the event types that occur in the given event sequence, and consider all the complex types derived from the given event structure, one for each assignment of these event types to the variables. Each of these complex types is called a candidate complex type for the event-mining problem.
For each candidate complex type, start the corresponding TAG at every occurrence of $E_0$. That is, for each occurrence of $E_0$ in the event sequence, use the rest of the event sequence as the input to one copy of the TAG.
Comparing the number of TAGs that reach a final state with the number of occurrences of $E_0$ yields all the solutions of the event-mining problem.
The number of candidate types is exponential in the number of variables in the event structure ($n^k$ candidates for $n$ event types and $k$ non-root variables), so this approach is too costly.
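The enumeration above can be sketched as follows. Two simplifications are assumed and should be read as such: a brute-force backtracking matcher stands in for the TAG, and every TCG uses a single "day" granularity with timestamps given directly in days, so a constraint [m,n] with m >= 0 reduces to a bound on the timestamp difference. The minimum confidence value is called `tau` here, and all function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Event = Tuple[str, int]                         # (event type, day)
Edges = Dict[Tuple[str, str], Tuple[int, int]]  # arc -> (m, n), in days


def has_match(assigned: Dict[str, Event], remaining: List[str],
              var_types: Dict[str, str], edges: Edges,
              sequence: List[Event]) -> bool:
    """Backtracking search for distinct events that satisfy every constraint."""
    if not remaining:
        return True
    var, rest = remaining[0], remaining[1:]
    for event in sequence:
        if event[0] != var_types[var] or event in assigned.values():
            continue
        trial = {**assigned, var: event}
        consistent = all(
            m <= trial[b][1] - trial[a][1] <= n
            for (a, b), (m, n) in edges.items()
            if a in trial and b in trial
        )
        if consistent and has_match(trial, rest, var_types, edges, sequence):
            return True
    return False


def mine_naive(sequence: List[Event], variables: List[str], edges: Edges,
               root_type: str, tau: float) -> Dict[Tuple[str, ...], float]:
    """Try every assignment of event types to the non-root variables.

    variables[0] is the root. Returns the frequent assignments with their frequency.
    """
    roots = [e for e in sequence if e[0] == root_type]
    if not roots:
        return {}
    event_types = sorted({etype for etype, _ in sequence})
    frequent = {}
    # len(event_types) ** (len(variables) - 1) candidates: the exponential part.
    for combo in product(event_types, repeat=len(variables) - 1):
        var_types = dict(zip(variables, (root_type,) + combo))
        hits = sum(
            has_match({variables[0]: r}, variables[1:], var_types, edges, sequence)
            for r in roots
        )
        frequency = hits / len(roots)
        if frequency > tau:
            frequent[combo] = frequency
    return frequent
```

Even in this toy form the cost is visible: the outer loop runs once per candidate, and each candidate is tested once per occurrence of the reference type.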

Techniques to Improve Performance
The performance of this algorithm can be improved by:
1. identifying possible inconsistencies in the given event structure before starting the process,
2. reducing the length of the sequence,
3. reducing the number of times an automaton has to be started,
4. reducing the number of different automata to be started, and
5. finally applying the naïve algorithm.

Recognition of Inconsistent Event Structures
An event structure is consistent if there exists a complex event that matches it.
If an event structure is inconsistent, it should be discarded even before the mining process starts.
Determining the consistency of an event structure exactly is difficult, so an approximate polynomial-time algorithm is used to check consistency.

Recognition of Inconsistent Event Structures
If one of the constraints implied by the given ones is the "empty" one, i.e. unsatisfiable, the whole event structure is inconsistent.
A TCG $[m',n']\,\mu'$ is logically implied by a TCG $[m,n]\,\mu$ if each pair $(x, y)$ satisfying the second constraint also satisfies the first one.
For example, the TCG [1,2]b-week can be converted into [3,18]day or [0,1]month, while it cannot be converted into [2,3]week-end or [1,3]week, since the resulting constraints are not implied by [1,2]b-week.

Reduction of the Event Sequence
The event sequence can be reduced by exploiting the granularities.
For example, if a discovery problem is defined on the sub-structure excluding variable $x_3$, the input event sequence can be reduced by discarding any event that does not occur on a business day.
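In code, this reduction is a simple filter. The sketch below assumes that every constraint in the sub-structure under consideration uses the business-day granularity, that timestamps are hours with hour 1 on a Monday, and that the names `business_day` and `reduce_sequence` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Event = Tuple[str, int]


def business_day(t: int) -> Optional[int]:
    weeks, weekday = divmod((t - 1) // 24, 7)  # weekday 0 = Monday
    return None if weekday >= 5 else 5 * weeks + weekday + 1


def reduce_sequence(sequence: List[Event],
                    granule: Callable[[int], Optional[int]] = business_day) -> List[Event]:
    """Keep only events whose timestamp is covered by the granularity.

    An event outside every granule can never satisfy a TCG over that
    granularity, so it cannot take part in any match.
    """
    return [(etype, t) for etype, t in sequence if granule(t) is not None]
```

The filter runs in one pass over the sequence, so it is cheap compared with the automaton runs it saves.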

Reduction of the Occurrences of the Root
The basic idea is to remove those occurrences of the reference type which cannot be the root of a complex event matching the given structure. For some occurrences of the reference type in the sequence, a constraint may be unsatisfiable.
Consider all the non-empty sets of explicit and implicit constraints between the root and each non-root node, and check whether one of the constraints cannot be satisfied.
For example, if no event occurs in the sequence on the business day following an IBM-rise event, this particular reference event can be discarded (no automaton is started for it).

Reduction of the Occurrences of the Root
Let $N$ be the number of occurrences of the reference event type in the sequence, and let $N'$ be the number of occurrences of reference events for which one of the constraints is unsatisfiable. These reference events are certainly not the root of a complex event satisfying the given event structure.
At most $N - N'$ occurrences can be the root of a match, so the frequency of any complex event type is at most $1 - N'/N$. Hence, if $N'/N > 1 - \tau$, where $\tau$ is the minimum confidence value, there cannot be any frequent complex event type and the empty set should be returned to the user.
Otherwise, remove these occurrences of the reference type and modify $\tau$ into $\tau' = (\tau \cdot N) / (N - N')$.
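The arithmetic of this step can be sketched in a few lines. The function name and the use of `None` to signal "no frequent type can exist" are assumptions made for illustration; `tau` stands for the minimum confidence value.

```python
from typing import Optional


def adjusted_threshold(n_total: int, n_discarded: int, tau: float) -> Optional[float]:
    """Return the rescaled confidence threshold after discarding hopeless roots.

    n_total     -- occurrences of the reference type in the sequence (N)
    n_discarded -- occurrences that cannot be the root of any match (N')
    Returns None when no complex event type can possibly be frequent.
    """
    if n_total == 0 or n_discarded >= n_total:
        return None
    # At most (N - N') roots can match, so the frequency is at most 1 - N'/N.
    if n_discarded / n_total > 1 - tau:
        return None
    # Frequencies are now measured against N - N' roots instead of N.
    return tau * n_total / (n_total - n_discarded)
```

For example, with 100 occurrences of the reference type, 20 of them discarded and a minimum confidence of 0.7, the threshold over the remaining 80 occurrences becomes 70/80 = 0.875.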

Reduction of the Candidate Type
This step is based on the following property: if a complex event type occurs frequently, then each of its sub-types also occurs frequently. In other words, if an assignment of event types to two variables is not frequent, no candidate complex event type that includes this assignment can be frequent, so such candidates can be removed from the set of candidate complex event types.
For each subset $W'$ of $W$, the induced approximated sub-structure of $W'$ is $(W', A', \Gamma')$, where $A'$ consists of all pairs $(X, Y) \in W' \times W'$ such that there is a path from $X$ to $Y$ in $S$ and there is at least one constraint on $(X, Y)$.

Reduction of the Candidate Type
Finding the solutions to the induced discovery problems is straightforward and cheap in time complexity. For a single non-root variable X, the induced sub-structure gives the distance from the root to X (in effect two distances, namely the minimum distance and the maximum distance). For each occurrence of $E_0$, this distance translates into a window, i.e., a period of time during which the event for X must appear.
The same idea extends to sub-structures with more than one non-root variable, for example variables that form a chain in S.
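For a single non-root variable, the window idea can be sketched as below. The simplifying assumptions are that timestamps and distances are both expressed in days, that the minimum and maximum distances (`lo`, `hi`) have already been derived from the sub-structure, and that `tau` is the minimum confidence value; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Event = Tuple[str, int]  # (event type, day)


def frequent_types_in_window(sequence: List[Event], root_type: str,
                             lo: int, hi: int, tau: float) -> Set[str]:
    """Event types that appear in the window [t+lo, t+hi] of enough root occurrences.

    Only these types need to be considered for the variable when building
    candidate complex event types.
    """
    roots = [t for etype, t in sequence if etype == root_type]
    if not roots:
        return set()
    hits: Dict[str, Set[int]] = defaultdict(set)   # type -> indices of roots it follows
    for i, root_time in enumerate(roots):
        for etype, t in sequence:
            is_root_itself = etype == root_type and t == root_time
            if lo <= t - root_time <= hi and not is_root_itself:
                hits[etype].add(i)
    return {etype for etype, seen in hits.items() if len(seen) / len(roots) > tau}
```

Any event type missing from the returned set can be dropped for that variable, which shrinks the number of automata that have to be started later.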

Experimental Results
- Data: closing prices of 439 stocks for 517 trading days.
- Price changes are partitioned into 7 categories: ($-\infty$, -5%), (-5%, -3%), (-3%, 0), (0, 0), (0, 3%), (3%, 5%), (5%, $\infty$).
- Total number of event types is [value missing]. The number of events is [value missing].
- Reference event type (assigned to X0): a drop of the IBM stock of less than 3%.
- Minimum confidence value is 0.7.
- No event types are assigned in advance to the other variables.
Figure: The event structure used in the experiment, over the variables X0, X1, X2 and X3 with the TCGs [0,2]b-day, [1,2]b-day and [0,0]b-week.

Experimental Results cont.
This experiment focuses on Step 4, namely the reduction of the candidate complex event types by using sub-structures. The result shows that applying the heuristics significantly reduces the number of candidate complex event types.

Experimental Results cont.
Figure: The two frequent event combinations discovered in the experiment.

References
- C. Bettini, X. S. Wang, S. Jajodia, and J.-L. Lin. "Discovering Temporal Relationships with Multiple Granularities in Time Sequences". IEEE Transactions on Knowledge and Data Engineering, Vol. 10 (2).
- C. Bettini, X. S. Wang, and S. Jajodia. "A General Framework for Time Granularity and its Application to Temporal Reasoning". Annals of Mathematics and Artificial Intelligence, Vol. 22 (1-2), pages 29-58, Baltzer Science Publishers.
- C. Bettini, X. S. Wang, and S. Jajodia. "Testing complex temporal relationships involving multiple granularities and its application to data mining". In Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'96), pages 68-78, Montreal, Canada, June 1996.
- C. Bettini, X. S. Wang, and S. Jajodia. "Mining temporal relationships with multiple granularities in time sequences". Data Engineering Bulletin, 21:32-38, 1998.

Thank you
Questions?