An Inductive Database for Mining Temporal Patterns in Event Sequences Alexandre Vautier, Marie-Odile Cordier and René Quiniou

Presentation transcript:

An Inductive Database for Mining Temporal Patterns in Event Sequences Alexandre Vautier, Marie-Odile Cordier and René Quiniou RENNES - France

The application: cardiac arrhythmias
[Figure: electrocardiograms with a labeled P wave, normal QRS complex and abnormal QRS complex; one trace shows a normal rhythm, the other an abnormal rhythm]
"Mr. Rabbit, you suffer from bigeminy, a severe cardiac arrhythmia." "Mr. Dog, you are OK!"
"...OK, but how can a specific arrhythmia be automatically characterized?"

A problem definition close to supervised machine learning
Discretized and labeled electrocardiograms, each sequence carrying a label P or N:
  p,Q,p,q,p,Q,p,Q
  p,q,p,q,p,q,p,q
  p,Q,p,p,Q,p,p,Q
"...OK, which patterns are frequent in the sequence labeled P but not frequent in the sequences labeled N?"
Goal: among the frequent temporal patterns, find those representative of the sequence labeled P.

Formalization of the problem: the framework of inductive databases (IDB)
An IDB contains:
  Sequences: $L_{bigeminy} \in P$ and $L_{lbbb}, L_{mobitz}, L_{normal} \in N$
  Temporal patterns: $\{C \mid \mathrm{freq}(C, L_{bigeminy}) \ge T_0\}$, $\{C \mid \mathrm{freq}(C, L_{lbbb}) \ge T_1\}$, $\{C \mid \mathrm{freq}(C, L_{mobitz}) \ge T_2\}$, $\{C \mid \mathrm{freq}(C, L_{normal}) \ge T_3\}$, $\{C \mid Qu_{expert}(P,N,T,C)\}$
"...OK, which temporal patterns C satisfy $Qu_{expert}(P,N,T,C) = (\exists L \in P,\ \mathrm{freq}(C,L) \ge T_L) \land (\forall L \in N,\ \mathrm{freq}(C,L) < T_L)$?"
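
The expert query can be read as a simple predicate over frequencies. The sketch below is only an illustration under assumptions: the function name `qu_expert`, the `freq` callback and the frequency values are hypothetical and do not come from the slides.

```python
def qu_expert(freq, chronicle, positives, negatives, thresholds):
    """Qu_expert(P, N, T, C): frequent in some positive sequence,
    infrequent in every negative sequence."""
    frequent_in_some_positive = any(
        freq(chronicle, seq) >= thresholds[seq] for seq in positives
    )
    infrequent_in_all_negatives = all(
        freq(chronicle, seq) < thresholds[seq] for seq in negatives
    )
    return frequent_in_some_positive and infrequent_in_all_negatives


# Hypothetical frequencies, invented purely for illustration.
toy_freq = {
    ("c1", "bigeminy"): 12, ("c1", "lbbb"): 0, ("c1", "mobitz"): 2, ("c1", "normal"): 1,
    ("c2", "bigeminy"): 15, ("c2", "lbbb"): 0, ("c2", "mobitz"): 1, ("c2", "normal"): 9,
}


def freq(chronicle, seq):
    return toy_freq[(chronicle, seq)]


thresholds = {"bigeminy": 10, "lbbb": 5, "mobitz": 5, "normal": 5}
P = ["bigeminy"]
N = ["lbbb", "mobitz", "normal"]

print(qu_expert(freq, "c1", P, N, thresholds))
print(qu_expert(freq, "c2", P, N, thresholds))
```

With these invented numbers, c1 passes the query while c2 is rejected because it is also frequent in the normal sequence.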

Plan
  Introduction
  Problem features
    Sequences
    Chronicles
    Inductive databases
    Order relation
    What is frequency?
  Algorithms
    Frequent Minimal Chronicles Search (Fmc Search)
    Querying the IDB
  Experiments and problems to be solved
  Conclusion and future work

Features of sequences
Long sequences of time-stamped events with few event types.
Numerical temporal information is of major importance.
An example of an event sequence:
[Figure: events A, B, A, B, A, B, B, A, A, B, ... placed along a time axis]

Features of temporal patterns: chronicles
A chronicle is a set of temporally constrained events. A chronicle:
  may contain several events of the same type;
  specifies numerical temporal constraints between events: the uncertain delay is represented by an interval $[d_{min}, d_{max}]$ with $d_{min}, d_{max} \in \mathbb{Z}$;
  is easily readable by an expert of the application domain.
Example chronicle C: events (C, $t_0$), (A, $t_1$), (B, $t_2$) with temporal constraints [5;10] and [-2;20].
Event sequence (an ordered list of time-stamped events): (C,1) (B,5) (A,8) (B,10) (C,15) (C,16) (A,26) (A,27) (B,34)
[Figure: the instances $I_C(L)$ of C highlighted in the sequence]
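
A naive way to enumerate the instances $I_C(L)$ is a backtracking search over the sequence. This is only a sketch: the data layout and the function name `find_instances` are hypothetical, and the assignment of [5;10] to the delay from C to A and of [-2;20] to the delay from A to B is an assumption, since the transcript does not say which pair each interval constrains.

```python
def find_instances(events, constraints, sequence):
    """Enumerate instances of a chronicle in an event sequence.

    events:      list of event types, one per chronicle event
    constraints: {(i, j): (dmin, dmax)} meaning dmin <= t_j - t_i <= dmax
    sequence:    list of (event_type, time) pairs
    Returns tuples of sequence indices, one index per chronicle event.
    """
    found = []

    def consistent(partial, k, t):
        # Check every constraint linking event k to an already placed event.
        for (i, j), (lo, hi) in constraints.items():
            if j == k and i < k:
                delay = t - sequence[partial[i]][1]
            elif i == k and j < k:
                delay = sequence[partial[j]][1] - t
            else:
                continue
            if not lo <= delay <= hi:
                return False
        return True

    def extend(partial):
        k = len(partial)
        if k == len(events):
            found.append(tuple(partial))
            return
        for idx, (etype, t) in enumerate(sequence):
            if etype == events[k] and idx not in partial and consistent(partial, k, t):
                extend(partial + [idx])

    extend([])
    return found


# Sequence from the slide; constraint placement is an assumption (see above).
L = [("C", 1), ("B", 5), ("A", 8), ("B", 10), ("C", 15),
     ("C", 16), ("A", 26), ("A", 27), ("B", 34)]
chronicle_events = ["C", "A", "B"]
chronicle_constraints = {(0, 1): (5, 10), (1, 2): (-2, 20)}

for inst in find_instances(chronicle_events, chronicle_constraints, L):
    print([L[i] for i in inst])
```

Under that assumed reading, two instances are found: (C,1) (A,8) (B,10) and (C,16) (A,26) (B,34). The search is exponential in the worst case and is meant for illustration, not for long sequences.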

Inductive database: required definitions
A query language that makes use of frequency constraints $\mathrm{freq}(C,L) \ge T$ and $\mathrm{freq}(C,L) \le T$.
If a query on frequency satisfies monotonicity or anti-monotonicity properties, then a search based on frequency is easier to compute.
An order relation on chronicles must be defined.

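An order relation on chronicles
C is more general than C' ($C \sqsubseteq C'$) if:
  each event of C can be matched to an event of C';
  each temporal constraint of C is more general than the corresponding constraint in C'.
Example: C has events (A, $t_0$), (B, $t_1$) with constraint [8;21]; C' has events (A, $t_2$), (B, $t_3$), (B, $t_4$) with constraints [5;10] and [9;20]; here $C \sqsubseteq C'$.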
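
The generality test can be sketched as a search for an injective, type-preserving matching of events under which every interval of the general chronicle contains the matched interval of the specific one. This is a simplified illustration: the dictionary layout and function names are hypothetical, missing constraints are treated as unbounded, implied constraints are not derived, and the placement of [5;10] and [9;20] in C' is an assumption.

```python
import math
from itertools import permutations


def get_constraint(chronicle, i, j):
    """Interval on t_j - t_i, unbounded when no constraint is stored."""
    cons = chronicle["constraints"]
    if (i, j) in cons:
        return cons[(i, j)]
    if (j, i) in cons:
        lo, hi = cons[(j, i)]
        return (-hi, -lo)
    return (-math.inf, math.inf)


def is_more_general(c, c_prime):
    """True if c is more general than c_prime under the simplified test above."""
    n, m = len(c["events"]), len(c_prime["events"])
    for mapping in permutations(range(m), n):
        if any(c["events"][i] != c_prime["events"][mapping[i]] for i in range(n)):
            continue
        matched = True
        for (i, j), (lo, hi) in c["constraints"].items():
            lo_p, hi_p = get_constraint(c_prime, mapping[i], mapping[j])
            # The general interval must contain the specific one.
            if not (lo <= lo_p and hi_p <= hi):
                matched = False
                break
        if matched:
            return True
    return False


# Chronicles from the slide; constraint placement in C' is an assumption.
C = {"events": ["A", "B"], "constraints": {(0, 1): (8, 21)}}
C_prime = {"events": ["A", "B", "B"],
           "constraints": {(0, 1): (5, 10), (0, 2): (9, 20)}}

print(is_more_general(C, C_prime))
print(is_more_general(C_prime, C))
```

Here C is found to be more general than C' because [9;20] is contained in [8;21], while the reverse test fails since C' has more events than C.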

How to compute the frequency of a chronicle in a sequence?
[Figure: a chronicle C over events A, B, B with constraints [2,3], [-1,3] and [1,5], a sequence L, and the instances $I_C(L)$]
Is the frequency the cardinality of the set of:
  all the instances?
  minimal occurrences? [Mannila, 97]
  earliest distinct instances? [Dousson, 99]
  distinct instances?

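Monotonicity and anti-monotonicity properties
Constraints on frequency should satisfy monotonicity or anti-monotonicity properties, i.e. $\mathrm{freq}(C,L) \ge \mathrm{freq}(C',L)$ when C is more general than C'.
[Figure: an example sequence L and chronicles over events A and B with constraint [-2,2]; one chronicle has 2 instances and the other 3 instances, compared with $\le$]
Minimal occurrences [Mannila, 97] do not have monotonicity and anti-monotonicity properties.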

Recognition criterion
Let $I_C(L)$ be the set of instances of the chronicle C in the sequence L.
A recognition criterion Q selects a unique set E of instances from $I_C(L)$.
The frequency of the chronicle C in the sequence L according to the recognition criterion Q is $\mathrm{freq}_Q(C,L) = |E|$.
A monotonic criterion is a recognition criterion Q such that $C \sqsubseteq C' \Rightarrow \mathrm{freq}_Q(C,L) \ge \mathrm{freq}_Q(C',L)$.
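
To make $\mathrm{freq}_Q(C,L) = |E|$ concrete, the sketch below plugs two criteria into the same frequency function. Both criteria are hypothetical illustrations: the greedy "distinct instances" selection is one possible reading of that name, not necessarily the definition used by the authors, and no claim is made here that it is monotonic.

```python
def all_instances(instances):
    """Criterion that keeps every instance."""
    return list(instances)


def distinct_instances(instances):
    """Greedy criterion: keep instances that share no event occurrence
    with an instance already kept, scanning by latest event index."""
    used = set()
    selected = []
    for inst in sorted(instances, key=lambda inst: (max(inst), inst)):
        if used.isdisjoint(inst):
            selected.append(inst)
            used.update(inst)
    return selected


def freq_q(instances, criterion):
    """freq_Q(C, L) = |E| where E is the set selected by the criterion."""
    return len(criterion(instances))


# Instances given as tuples of positions in the sequence (invented values).
I_C = [(0, 1), (0, 2), (3, 4), (3, 2)]

print(freq_q(I_C, all_instances))
print(freq_q(I_C, distinct_instances))
```

On these invented instances the first criterion counts 4 and the greedy one counts 2, which shows how much the frequency depends on the choice of Q.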

Fmc Search: Frequent Minimal Chronicles Search
Constraints: $\mathrm{freq}_Q(C,L) \ge T$ and $\mathrm{max}_{win}(C) \le W$
Input:
  L: an event sequence
  T: a minimum frequency threshold
  Q: a recognition criterion (application dependent)
  W: a maximal time window
Output: $Fmc_{Q,W}(L,T)$
Every chronicle from $Fmc_{Q,W}(L,T)$ satisfies 3 properties:
  it is as specific as possible;
  it generalizes at least T instances;
  these instances respect the recognition criterion Q.
Algorithm: Step 1, Step 2, Step 3 and Step 4 are applied in turn and produce $Fmc_{Q,W}(L,T)$.

Fmc Search, Step 1: chronicle instance extraction
The instances of every frequent chronicle are extracted from the sequence L. Their temporal constraints are set to [-W,W].
Implemented in the software FACE (Frequency Analyser for Chronicle Extraction).
The numerical temporal constraints of chronicles found by FACE are not specific enough.

Fmc Search, Step 2: fuzzy clustering of instances
A fuzzy clustering of each set $I_C(L)$ found at step 1 is performed.
An instance has a membership degree to each cluster.
[Figure: the instances of $I_C(L)$ grouped into fuzzy clusters]

Fmc Search, Step 3: chronicle construction from clusters
For each fuzzy cluster of step 2:
  instances are sorted in decreasing order of their membership degree;
  the first T instances that respect the Q criterion are kept to construct a chronicle;
  this chronicle is the lgg (least general generalization) of the selected instances.
The specificity of chronicles depends on the clustering.
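
For instances that share the same events, one simple reading of the lgg is the tightest set of intervals covering all selected instances. The sketch below assumes exactly that; the function name `lgg`, the data layout and the timestamps are hypothetical, and the sorting by membership degree and the Q criterion are left out.

```python
from itertools import combinations


def lgg(event_types, instances):
    """Least general generalization of instances with identical event types.

    instances: list of tuples of timestamps, one timestamp per event.
    For each pair of events (i, j), the constraint on t_j - t_i is the
    smallest interval containing the delays observed in all instances.
    """
    constraints = {}
    for i, j in combinations(range(len(event_types)), 2):
        delays = [times[j] - times[i] for times in instances]
        constraints[(i, j)] = (min(delays), max(delays))
    return {"events": list(event_types), "constraints": constraints}


# Two instances of a chronicle over events C, A, B (invented timestamps).
selected = [(1, 8, 10), (16, 26, 34)]
chronicle = lgg(["C", "A", "B"], selected)

for pair, interval in sorted(chronicle["constraints"].items()):
    print(pair, interval)
```

Adding an instance can only widen the intervals, so keeping fewer and more similar instances yields a more specific chronicle, which is consistent with the remark that specificity depends on the clustering.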

Fmc Search, Step 4: chronicle filtering, keep the most specific
Compute the set of frequent minimal (maximally specific) chronicles $Fmc_{Q,W}(L,T)$: only the most specific chronicles are retained.
Monotonicity property: a chronicle C that satisfies $\mathrm{freq}_Q(C,L) \ge T$ is more general than at least one chronicle of $Fmc_{Q,W}(L,T)$.
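
The filtering step can be sketched as removing every chronicle that is strictly more general than another candidate. In the sketch below the function name `most_specific` is hypothetical, and the generality test is a toy stand-in (set inclusion on event types, ignoring temporal constraints) used only to keep the example short.

```python
def most_specific(chronicles, is_more_general):
    """Keep chronicles that are not strictly more general than another one."""
    kept = []
    for c in chronicles:
        dominated = any(
            d is not c and is_more_general(c, d) and not is_more_general(d, c)
            for d in chronicles
        )
        if not dominated:
            kept.append(c)
    return kept


def toy_more_general(a, b):
    # Toy stand-in: a is "more general" than b if its event types are a subset.
    return a <= b


candidates = [frozenset({"p"}), frozenset({"p", "Q"}), frozenset({"q"})]
print(most_specific(candidates, toy_more_general))
```

In this toy run the pattern {p} is dropped because {p, Q} is more specific, while {p, Q} and {q} are both kept since neither generalizes the other.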

Querying the IDB
Remember my query: $Qu_{expert}(P,N,T,C)$. For the explanation, $P = \{L_P\}$ and $N = \{L_N\}$, so the query is $\mathrm{freq}(C,L_P) \ge T_P \land \mathrm{freq}(C,L_N) < T_N$.
A chronicle C satisfies this query iff:
  C is more general than at least one chronicle of $Fmc_{Q,W}(L_P,T_P)$ (monotonicity property);
  C is not more general than any chronicle of $Fmc_{Q,W}(L_N,T_N)$ (anti-monotonicity property).
The chronicles that satisfy both conditions form a version space. An adaptation of Mitchell's algorithm computes this version space.
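
Given the two Fmc sets, the membership test for a single chronicle follows directly from the two conditions. This sketch does not implement Mitchell's algorithm; the function name `satisfies_query` is hypothetical, and the generality test is a toy stand-in (set inclusion on event types, ignoring temporal constraints).

```python
def satisfies_query(c, fmc_pos, fmc_neg, is_more_general):
    """c is frequent in L_P and not frequent in L_N, decided from the Fmc sets."""
    frequent_in_positive = any(is_more_general(c, f) for f in fmc_pos)
    frequent_in_negative = any(is_more_general(c, f) for f in fmc_neg)
    return frequent_in_positive and not frequent_in_negative


def toy_more_general(a, b):
    # Toy stand-in: a is "more general" than b if its event types are a subset.
    return a <= b


fmc_pos = [frozenset({"p", "Q"})]  # invented Fmc of the positive sequence
fmc_neg = [frozenset({"p", "q"})]  # invented Fmc of the negative sequence

for candidate in (frozenset({"Q"}), frozenset({"p"}), frozenset({"q"})):
    print(sorted(candidate), satisfies_query(candidate, fmc_pos, fmc_neg, toy_more_general))
```

With these invented sets only {Q} satisfies the query: {p} is also frequent in the negative sequence, and {q} is not frequent in the positive one.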

Experiments: characterization of cardiac arrhythmias
Data: 4 sequences of cardiac events, built from electrocardiograms and labeled by an expert, containing ~4000 events of 3 types (P waves, normal QRS complexes, abnormal QRS complexes).
A typical query: $\mathrm{freq}_{Q_d}(C, L_{bigeminy}) \ge 5\% \land \mathrm{freq}_{Q_d}(C, L_{normal}) \le 10\% \land \mathrm{freq}_{Q_d}(C, L_{mobitz}) \le 10\% \land \mathrm{freq}_{Q_d}(C, L_{lbbb}) \le 10\% \land W = 3\,\mathrm{s}$

Experiments: an example of a cardiac chronicle
[Figure: a chronicle over p, q and Q events]
It characterizes the bigeminy arrhythmia.
It was also found by a supervised learning method (ILP) from ECGs [Carrault, 03].
Legend: p = P waves, q = normal QRS, Q = abnormal QRS.

Problems to be solved
The clustering step of Fmc Search (step 2) has to cluster up to 180,000 instances per chronicle.
For a minimum threshold of 5%, up to 1000 chronicles can be extracted from one sequence. This slows down Mitchell's algorithm dramatically.
Finding Fmc is an NP-complete problem.
The set $Fmc_{Q,W}(L_N,T_N)$ computed in practice is correct but not complete, so results have to be filtered in order to give the correct solution.
[Figure: the optimal $Fmc_{Q,W}(L_N,T_N)$ compared with the practical $Fmc_{Q,W}(L_N,T_N)$]

Conclusion
An original method to extract temporal patterns in the form of chronicles. Chronicles express constraints on time by numerical intervals.
A formalization of the problem in the framework of inductive databases, which provides the definition of:
  an order relation on temporal patterns;
  a monotonic recognition criterion and the related frequency.
A management of numerical temporal constraints (this task is very hard).
An algorithm that finds Fmc in sequences.
A method to reuse and adapt Mitchell's algorithm.

Future work
Control the clustering step of Fmc Search in order to compute only the Fmcs that are needed by Mitchell's algorithm.
Adapt Mitchell's algorithm in order to provide an approximate solution whose quality is user-defined.
Extend the method to other measures of interest.
Explore new applications, such as intrusion detection.


[Figure: maximum size of $I_C(L)$ as a function of the number of events in a chronicle]