Mining Time-Series Databases
Mohamed G. Elfeky

Introduction
A time-series database is a database that contains data for each point in time. Examples: weather data, stock prices.

What to Mine?
Full periodic patterns: every point in time contributes to the cyclic behavior of the time series in each period, e.g., describing the weekly pattern of stock prices considering all the days of the week.
Partial periodic patterns: describe the behavior of the time series at some but not all points in time, e.g., discovering that stock prices are high every Saturday and low every Tuesday.

Mining Partial Periodic Patterns
Problem definition
Methods: Apriori, Max-Subpattern Hit Set
Jiawei Han, Guozhu Dong, and Yiwen Yin, ICDE98

Problem Definition
The time series is $S = D_1 D_2 \dots D_n$, where each $D_i$ is the set of features present at time $i$. A pattern is $s = s_1 \dots s_p$ over the set of features L and the letter *, where each $s_i$ is either * or a nonempty subset of L. $|s| = p$ is the period of the pattern s. The L-length of s is the number of positions $s_i$ that are not *; a pattern of L-length j is called a j-pattern. A subpattern of s is a pattern $s' = s'_1 \dots s'_p$ such that, for each position i, $s'_i$ is * or $s'_i \subseteq s_i$.

Problem Definition (Cont.)
Each segment of the form $D_{i|s|+1} \dots D_{i|s|+|s|}$ (for $0 \le i < \lfloor n/|s| \rfloor$) is called a period segment. A period segment matches s if, for each position j, either $s_j$ is * or $s_j \subseteq D_{i|s|+j}$. The frequency count of s in a time series S is the number of period segments of S that match s. The confidence of s is its frequency count divided by the maximum number of periods of length |s| in S, that is, by $\lfloor n/|s| \rfloor$. A pattern is called frequent if its confidence is not less than a minimum threshold.

Problem Definition (Example)
The pattern a*{a,c}de has length 5 and L-length 4, so it is a 4-pattern. The patterns a*{a,c}** and **cde are subpatterns of it. In the series a{b,c}baebaced, the pattern a*b, whose period is 3, has frequency count 2: it matches the period segments a{b,c}b and aeb but not ace. Its confidence is 2/3, where $3 = \lfloor 10/3 \rfloor$ is the maximum number of periods of length 3.
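The matching rule, frequency count, and confidence can be sketched as follows. The series is encoded as a list of strings (one per time point, so "bc" is {b, c}), patterns use the same list-of-strings encoding with "*", and the function names are hypothetical. Confidence is returned as an exact `Fraction` so the threshold comparison avoids rounding.

```python
from fractions import Fraction

def matches(segment: list[str], s: list[str]) -> bool:
    """A period segment matches s if every s_j is * or a subset of D_j."""
    return all(sj == "*" or set(sj) <= set(dj) for sj, dj in zip(s, segment))

def frequency_count(series: list[str], s: list[str]) -> int:
    """Number of period segments D_{i|s|+1} ... D_{i|s|+|s|} that match s."""
    p = len(s)
    m = len(series) // p  # maximum number of whole periods of length |s|
    return sum(matches(series[i * p:(i + 1) * p], s) for i in range(m))

def confidence(series: list[str], s: list[str]) -> Fraction:
    """Frequency count divided by floor(n / |s|); assumes n >= |s|."""
    return Fraction(frequency_count(series, s), len(series) // len(s))

def is_frequent(series: list[str], s: list[str], min_conf: Fraction) -> bool:
    return confidence(series, s) >= min_conf
```

With `series = ["a", "bc", "b", "a", "e", "b", "a", "c", "e", "d"]`, `frequency_count(series, ["a", "*", "b"])` is 2 and `confidence(series, ["a", "*", "b"])` is `Fraction(2, 3)`, reproducing the example above.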

Apriori Method
Apriori property: each subpattern of a frequent pattern of period p is itself a frequent pattern of period p.
Method:
1. Find $F_1$, the set of frequent 1-patterns of period p.
2. Find all frequent i-patterns of period p, for i from 2 to p, based on the idea of Apriori, and terminate when the candidate i-pattern set is empty.
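A compact sketch of the Apriori method, under an assumption the slides do not spell out: each (position, letter) pair is treated as one item, so a level holds patterns with a given number of letters, and a*b becomes `frozenset({(0, "a"), (2, "b")})`. Candidates are joined from the frequent patterns of the previous level and pruned with the Apriori property; every level costs another pass over the series, in contrast to the two scans of the Max-Subpattern Hit Set algorithm. The function names are hypothetical.

```python
from itertools import combinations

def count(series, p, pattern):
    """Number of period segments containing every (position, letter) item of pattern."""
    m = len(series) // p
    return sum(
        all(letter in series[i * p + pos] for pos, letter in pattern)
        for i in range(m)
    )

def apriori_periodic(series, p, min_conf):
    """Return {pattern: frequency count} for all frequent patterns of period p."""
    m = len(series) // p
    min_count = min_conf * m
    # Level 1 candidates: every (position, letter) item seen in a whole period.
    candidates = {frozenset([(i % p, x)]) for i in range(m * p) for x in series[i]}
    frequent, k = {}, 1
    while candidates:  # terminate when the candidate set is empty
        counts = {c: count(series, p, c) for c in candidates}
        level = {c for c, n in counts.items() if n >= min_count}
        frequent.update((c, counts[c]) for c in level)
        k += 1
        # Join: unions of two frequent patterns that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-item subpattern must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent
```

On the example series a{b,c}baebaced with p = 3 and a minimum confidence of 0.6, this finds a**, *c*, **b, ac*, and a*b; the candidate *cb is counted once and rejected, so the 3-letter candidate acb is pruned without a scan.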

Max-Subpattern Hit Set Method
Definitions, algorithm, and the implementation data structure

Definitions
A candidate max-pattern $C_{max}$ is the maximal pattern that can be generated from $F_1$ (the set of frequent 1-patterns). Example: if $F_1$ = {a***, *b**, *c**, **d*}, then $C_{max}$ = a{b,c}d*.
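Forming $C_{max}$ amounts to taking, position by position, the union of the letters of all patterns in $F_1$. A sketch using the list-of-strings encoding ("*" for don't care, "bc" for {b, c}); `candidate_max_pattern` and the slide-notation printer `show` are hypothetical names.

```python
def candidate_max_pattern(f1: list[list[str]]) -> list[str]:
    """Union, position by position, of the letters of all frequent 1-patterns."""
    p = len(f1[0])
    letters = [set() for _ in range(p)]
    for pattern in f1:
        for pos, sym in enumerate(pattern):
            if sym != "*":
                letters[pos].update(sym)
    # Positions where no frequent 1-pattern has a letter stay *.
    return ["".join(sorted(ls)) if ls else "*" for ls in letters]

def show(pattern: list[str]) -> str:
    """Render in slide notation: single symbols bare, letter sets in braces."""
    return "".join(sym if len(sym) == 1 else "{" + ",".join(sym) + "}" for sym in pattern)
```

For $F_1$ = {a***, *b**, *c**, **d*}, `show(candidate_max_pattern(...))` yields `a{b,c}d*`, matching the example.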

Definitions (Cont.)
A subpattern of $C_{max}$ is hit in a period segment $S_i$ if it is the maximal subpattern of $C_{max}$ that $S_i$ matches. Example: for $C_{max}$ = a{b,c}d* and $S_i$ = a{b,c}ce, the hit subpattern is a{b,c}**. The hit set H is the set of all hit subpatterns of $C_{max}$ in S.
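The hit subpattern of a segment can be computed directly: at each position keep the letters of $C_{max}$ that also occur in the segment, and use * where none do. A sketch with the same hypothetical list-of-strings encoding:

```python
def max_subpattern(cmax: list[str], segment: list[str]) -> list[str]:
    """Maximal subpattern of C_max matched by one period segment."""
    hit = []
    for c_i, d_i in zip(cmax, segment):
        # Letters of C_max at this position that the segment actually contains.
        common = set(c_i) & set(d_i) if c_i != "*" else set()
        hit.append("".join(sorted(common)) if common else "*")
    return hit
```

For $C_{max}$ = a{b,c}d* and the segment a{b,c}ce, position 3 loses d (the segment has c) and position 4 is already *, giving a{b,c}**.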

Algorithm
1. Scan S once to find $F_1$ and form the candidate max-pattern $C_{max}$.
2. Scan S again; for each period segment, add its max-subpattern to the hit set with count 1 if it is not already there, or increase its count by 1.
3. Derive the frequent patterns from the hit set.
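An end-to-end sketch of the three steps, assuming a (position, letter) item encoding for patterns and a hypothetical function name. The hit set is a `collections.Counter`, which handles the "count 1 if new, otherwise add 1" update. Step 3 is simplified: it enumerates every subpattern of $C_{max}$ and sums the counts of the hits that contain it, which is exponential in the number of letters of $C_{max}$ and only meant for small examples.

```python
from collections import Counter
from itertools import combinations

def mine_hit_set(series, p, min_conf):
    m = len(series) // p
    min_count = min_conf * m
    segments = [series[i * p:(i + 1) * p] for i in range(m)]

    # Step 1 (first scan): count (position, letter) items to find F1;
    # C_max is the union of the frequent 1-patterns.
    item_counts = Counter(
        (pos, x) for seg in segments for pos, d in enumerate(seg) for x in set(d)
    )
    cmax = frozenset(it for it, n in item_counts.items() if n >= min_count)

    # Step 2 (second scan): a segment's max-subpattern keeps the letters of
    # C_max that the segment contains; Counter starts missing keys at 0.
    hits = Counter()
    for seg in segments:
        hit = frozenset((pos, x) for pos, x in cmax if x in seg[pos])
        if hit:  # assumption: segments sharing no letter with C_max are skipped
            hits[hit] += 1

    # Step 3: the frequency count of a subpattern of C_max is the total count
    # of the hits that contain it (brute-force enumeration, small inputs only).
    frequent = {}
    items = sorted(cmax)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            n = sum(c for h, c in hits.items() if pattern <= h)
            if n >= min_count:
                frequent[pattern] = n
    return cmax, hits, frequent
```

On the series a{b,c}baebaced with p = 3 and a minimum confidence of 0.6, $C_{max}$ is ac b (that is, acb), the hit set holds acb, a*b, and ac* once each, and the frequent patterns are a**, *c*, **b, ac*, and a*b, all obtained after two passes over the series.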

Implementation Data Structure: Max-Subpattern Tree
The root node is $C_{max}$. A child node is a subpattern of its parent with one non-* letter missing, and the link is labeled by this letter. A node containing only 2 non-* letters has no children, since its children would be 1-patterns, which are already in $F_1$. Each node has a count field that records its number of hits.

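Max-Subpattern Tree (Example)
[Figure: example max-subpattern tree. Root a{b,c}d*; first-level nodes *{b,c}d*, acd*, abd*, a{b,c}**; second-level nodes *cd*, *bd*, a*d*, ab**, ac**. Link labels and node counts are not recoverable.]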

Max-Subpattern Tree (Construction)
Find w, the max-subpattern of the current segment. Search for w in the tree, starting from the root and following the path that corresponds to the missing non-* letters, in order. If node w is found, increase its count by 1. Otherwise, create a new node w (with count 1) and any of its ancestors on the followed path that are missing (with count 0).

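Max-Subpattern Tree (Construction)
[Figure: inserting the max-subpattern *cd* into an empty tree creates the path a{b,c}d* (count 0) → *{b,c}d* via link a (count 0) → *cd* via link b (count 1).]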
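A sketch of the tree node and the insertion procedure, assuming patterns are frozensets of (position, letter) items and that "in order" means sorted by position, then by letter. `Node` and `insert` are hypothetical names; each child is keyed by the letter its pattern lacks relative to its parent, mirroring the labeled links.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    pattern: frozenset                            # (position, letter) items; other positions are *
    count: int = 0                                # segments whose hit is exactly this pattern
    children: dict = field(default_factory=dict)  # missing item (link label) -> child Node

def insert(root: Node, w: frozenset) -> None:
    """Add one hit w (a subpattern of root.pattern) to the tree."""
    node = root
    # Follow the links for the letters of C_max missing from w, in order.
    for item in sorted(root.pattern - w):
        child = node.children.get(item)
        if child is None:
            # A missing ancestor (or w itself): create it with count 0 for now.
            child = Node(node.pattern - {item})
            node.children[item] = child
        node = child
    node.count += 1  # node.pattern == w here; a newly created w ends up with count 1
```

Inserting *cd* into a tree whose root is a{b,c}d* follows the links a and then b, creating *{b,c}d* with count 0 and *cd* with count 1, as in the construction figure; inserting it again only raises the count of *cd*.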

Max-Subpattern Tree (Traversal)
After the second scan, the tree contains all the max-subpatterns (hits) of the time series. The tree must then be traversed to compute the confidence of each subpattern.

Max-Subpattern Tree (Traversal)
The frequency count of each node is the sum of its own count and the counts of all its reachable ancestors, i.e., all nodes in the tree that are proper superpatterns of it. For example, in the example tree the frequency count of *cd* is 78 and that of a*d* is 105.
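A sketch of the traversal, assuming the same hypothetical `Node` layout (redefined so the block stands alone). "Reachable ancestors" is read here as every node whose pattern is a superpattern of the current one, not only the nodes on its tree path; the quadratic comparison favors clarity over speed. Dividing each result by the number of periods m gives the confidence.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    pattern: frozenset                            # (position, letter) items; other positions are *
    count: int = 0                                # number of hits recorded at this node
    children: dict = field(default_factory=dict)  # link label -> child Node

def all_nodes(root: Node) -> list:
    """Iterative depth-first traversal collecting every node of the tree."""
    stack, out = [root], []
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children.values())
    return out

def frequency_counts(root: Node) -> dict:
    """Own count plus the counts of all nodes that are superpatterns of it."""
    nodes = all_nodes(root)
    return {
        n.pattern: sum(a.count for a in nodes if n.pattern <= a.pattern)
        for n in nodes
    }
```

With made-up counts a{b,c}d* = 3, *{b,c}d* = 5, acd* = 2, and *cd* = 7, the frequency count of *cd* comes out as 3 + 5 + 2 + 7 = 17: acd* contributes even though it is not on the path from the root to *cd*.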
