Generalized Sequential Pattern (GSP)

Step 1: Make the first pass over the sequence database D to yield all the 1-element frequent sequences.

Step 2: Repeat until no new frequent sequences are found:
– Candidate Generation: Merge pairs of frequent subsequences found in the (k-1)th pass to generate candidate sequences that contain k items.
– Candidate Pruning: Prune candidate k-sequences that contain infrequent (k-1)-subsequences.
– Support Counting: Make a new pass over the sequence database D to find the support for these candidate sequences.
– Candidate Elimination: Eliminate candidate k-sequences whose actual support is less than minsup.

GSP Example

Suppose we have 3 events, 1, 2, and 3, and let the minimum support (minsup) be 50%. The sequence database is shown in the following table:

Object   Sequence
A        (1), (2), (3)
B        (1, 2), (3)
C        (1), (2, 3)
D        (1, 2, 3)
E        (1, 2), (2, 3), (1, 3)

Step 1: Make the first pass over the sequence database D to yield all the 1-element frequent sequences.

Candidate 1-sequences are: <(1)>, <(2)>, <(3)>

Each event occurs in all five data sequences (support = 1.0), so all three 1-sequences are frequent.
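The first pass can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the tuple-of-tuples encoding of the database and the names `DB`, `MINSUP`, and `frequent_events` are assumptions made for the example.

```python
from collections import Counter

# Example database: each data sequence is a tuple of elements, and each
# element is a tuple of events that occur together (assumed encoding).
DB = {
    "A": ((1,), (2,), (3,)),
    "B": ((1, 2), (3,)),
    "C": ((1,), (2, 3)),
    "D": ((1, 2, 3),),
    "E": ((1, 2), (2, 3), (1, 3)),
}
MINSUP = 0.5


def frequent_events(db, minsup):
    """Return {event: support} for every event whose support is >= minsup."""
    counts = Counter()
    for seq in db.values():
        # An event counts at most once per data sequence.
        counts.update({event for element in seq for event in element})
    n = len(db)
    return {e: c / n for e, c in sorted(counts.items()) if c / n >= minsup}


f1 = frequent_events(DB, MINSUP)
print(f1)
```

Support here is the fraction of data sequences that contain the event, so an event repeated inside one sequence (as in object E) is still counted once.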

Step 2: Candidate Generation: Merge pairs of frequent subsequences found in the (k-1)th pass to generate candidate sequences that contain k items.

The frequent 1-sequences are: <(1)>, <(2)>, <(3)>

Base case (k=2): Merging two frequent 1-sequences <(i1)> and <(i2)> will produce two candidate 2-sequences: <(i1), (i2)> and <(i1, i2)>.

Candidate 2-sequences are: <(1), (2)>, <(1, 2)>, <(1), (3)>, <(1, 3)>, <(2), (3)>, <(2, 3)>
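The base case can be sketched as below. The function name `candidates_k2` is hypothetical, and the sketch follows this example's enumeration of six candidates (one pairing per unordered pair of events). A complete implementation would typically also consider the reversed order <(i2), (i1)> and repeated events such as <(i1), (i1)>.

```python
from itertools import combinations


def candidates_k2(frequent_events):
    """Build candidate 2-sequences from frequent events, pairing i1 < i2."""
    out = []
    for i1, i2 in combinations(sorted(frequent_events), 2):
        out.append(((i1,), (i2,)))  # i1 and i2 in separate elements
        out.append(((i1, i2),))     # i1 and i2 in the same element
    return out


c2 = candidates_k2([1, 2, 3])
for cand in c2:
    print(cand)
```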

Step 2: Candidate Pruning: Prune candidate k-sequences that contain infrequent (k-1)-subsequences.

Every 1-subsequence of the candidates is frequent, so after candidate pruning the 2-sequences remain the same: <(1), (2)>, <(1, 2)>, <(1), (3)>, <(1, 3)>, <(2), (3)>, <(2, 3)>

Step 2: Support Counting and Candidate Elimination:

After candidate elimination, the remaining frequent 2-sequences are: <(1), (2)> (support=0.6), <(1), (3)> (support=0.8), <(2), (3)> (support=0.6), <(1, 2)> (support=0.6), <(2, 3)> (support=0.6). The candidate <(1, 3)> appears only in D and E (support=0.4 < 0.5), so it is eliminated.
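Support counting for this step can be checked with a short sketch. The helpers `contains` and `support` are illustrative names, and the containment test assumes there are no timing constraints (no window or gap limits). The candidate list assumes the six pairings of events 1, 2, 3 in separate and shared elements.

```python
DB = {
    "A": ((1,), (2,), (3,)),
    "B": ((1, 2), (3,)),
    "C": ((1,), (2, 3)),
    "D": ((1, 2, 3),),
    "E": ((1, 2), (2, 3), (1, 3)),
}
MINSUP = 0.5


def contains(data_seq, pattern):
    """True if pattern is a subsequence of data_seq (greedy left-to-right match)."""
    pos = 0
    for elem in pattern:
        # Advance to the next data element that includes every event of elem.
        while pos < len(data_seq) and not set(elem) <= set(data_seq[pos]):
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1  # later pattern elements must match strictly later data elements
    return True


def support(db, pattern):
    """Fraction of data sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in db.values()) / len(db)


candidates = [
    ((1,), (2,)), ((1, 2),),
    ((1,), (3,)), ((1, 3),),
    ((2,), (3,)), ((2, 3),),
]
supports = {c: support(DB, c) for c in candidates}
frequent_2 = [c for c in candidates if supports[c] >= MINSUP]
for c in candidates:
    print(c, supports[c])
```

Matching each pattern element against the earliest possible data element is sufficient here because, without timing constraints, an earlier match never rules out a later one.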

Repeat Step 2: Candidate Generation

General case (k>2): A frequent (k-1)-sequence w1 is merged with another frequent (k-1)-sequence w2 to produce a candidate k-sequence if the subsequence obtained by removing the first event in w1 is the same as the subsequence obtained by removing the last event in w2.

The resulting candidate after merging is given by the sequence w1 extended with the last event of w2:
– If the last two events in w2 belong to the same element, then the last event in w2 becomes part of the last element in w1.
– Otherwise, the last event in w2 becomes a separate element appended to the end of w1.
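The merge rule above can be sketched as follows. The helper names `drop_first`, `drop_last`, and `merge` are hypothetical, and sequences are assumed to be tuples of sorted event tuples. The input list is the set of 2-sequences that meet the 50% threshold in the example database.

```python
def drop_first(seq):
    """Remove the first event of the first element (dropping an emptied element)."""
    head = seq[0][1:]
    return ((head,) if head else ()) + seq[1:]


def drop_last(seq):
    """Remove the last event of the last element (dropping an emptied element)."""
    tail = seq[-1][:-1]
    return seq[:-1] + ((tail,) if tail else ())


def merge(w1, w2):
    """Return the candidate formed from w1 and w2, or None if they do not join."""
    if drop_first(w1) != drop_last(w2):
        return None
    last = w2[-1][-1]
    if len(w2[-1]) > 1:
        # Last two events of w2 share an element: extend the last element of w1.
        return w1[:-1] + (w1[-1] + (last,),)
    # Otherwise the last event of w2 becomes a new element at the end of w1.
    return w1 + ((last,),)


frequent_2 = [((1,), (2,)), ((1,), (3,)), ((2,), (3,)), ((1, 2),), ((2, 3),)]
candidates_3 = {
    m for w1 in frequent_2 for w2 in frequent_2
    if (m := merge(w1, w2)) is not None
}
for cand in sorted(candidates_3):
    print(cand)
```

For example, merging (1), (2) with (2, 3) yields (1), (2, 3), because the last two events of the second sequence share an element.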

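Repeat Step 2: Candidate Generation

Generate 3-sequences from the remaining 2-sequences: <(1), (2)>, <(1), (3)>, <(2), (3)>, <(1, 2)>, <(2, 3)>

Candidate 3-sequences are:
– <(1), (2), (3)> (generated from <(1), (2)> and <(2), (3)>)
– <(1), (2, 3)> (generated from <(1), (2)> and <(2, 3)>)
– <(1, 2), (3)> (generated from <(1, 2)> and <(2), (3)>)
– <(1, 2, 3)> (generated from <(1, 2)> and <(2, 3)>)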

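Repeat Step 2: Candidate Pruning

Candidate 3-sequences:
– <(1, 2, 3)> should be pruned because one of its 2-subsequences, <(1, 3)>, is not frequent.
– <(1, 2), (3)> should not be pruned because all of its 2-subsequences, <(1, 2)>, <(1), (3)>, and <(2), (3)>, are frequent.
– <(1), (2, 3)> should not be pruned because all of its 2-subsequences, <(1), (2)>, <(1), (3)>, and <(2, 3)>, are frequent.
– <(1), (2), (3)> should not be pruned because all of its 2-subsequences, <(1), (2)>, <(1), (3)>, and <(2), (3)>, are frequent.

So after pruning, the remaining 3-sequences are: <(1), (2), (3)>, <(1), (2, 3)>, and <(1, 2), (3)>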
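The pruning check can be sketched as below. The helper names are hypothetical, and the sketch assumes no timing constraints, so every subsequence obtained by deleting a single event is checked. The inputs are the frequent 2-sequences of the example database and the 3-sequences that the merge rule produces from them.

```python
def one_event_removed(seq):
    """All (k-1)-subsequences obtained by deleting exactly one event from seq."""
    subs = set()
    for i, elem in enumerate(seq):
        for j in range(len(elem)):
            rest = elem[:j] + elem[j + 1:]
            # Drop the element entirely if deleting the event empties it.
            subs.add(seq[:i] + ((rest,) if rest else ()) + seq[i + 1:])
    return subs


def prune(candidates, frequent_prev):
    """Keep candidates whose (k-1)-subsequences are all frequent."""
    frequent_prev = set(frequent_prev)
    return [c for c in candidates if one_event_removed(c) <= frequent_prev]


frequent_2 = [((1,), (2,)), ((1,), (3,)), ((2,), (3,)), ((1, 2),), ((2, 3),)]
candidates_3 = [
    ((1,), (2,), (3,)),
    ((1,), (2, 3)),
    ((1, 2), (3,)),
    ((1, 2, 3),),
]
kept = prune(candidates_3, frequent_2)
for cand in kept:
    print(cand)
```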

Repeat Step 2: Support Counting

Remaining 3-sequences:
– <(1), (2), (3)>: support = 0.4 < 0.5, should be eliminated.
– <(1), (2, 3)>: support = 0.4 < 0.5, should be eliminated.
– <(1, 2), (3)>: support = 0.4 < 0.5, should be eliminated.

Thus, there are no 3-sequences left. So the final frequent sequences are: <(1)>, <(2)>, <(3)>, <(1), (2)>, <(1), (3)>, <(2), (3)>, <(1, 2)>, <(2, 3)>
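The whole procedure can be put together in one compact sketch. This is an illustrative implementation under stated assumptions, not the slides' own code: all function names are hypothetical, timing constraints are ignored, and the k=2 step enumerates both orders and repeated events, which is a superset of the pairings listed in the example.

```python
from itertools import combinations

DB = [
    ((1,), (2,), (3,)),          # A
    ((1, 2), (3,)),              # B
    ((1,), (2, 3)),              # C
    ((1, 2, 3),),                # D
    ((1, 2), (2, 3), (1, 3)),    # E
]


def contains(data_seq, pattern):
    """Greedy subsequence test: each pattern element needs a later superset element."""
    pos = 0
    for elem in pattern:
        while pos < len(data_seq) and not set(elem) <= set(data_seq[pos]):
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1
    return True


def support(db, pattern):
    return sum(contains(seq, pattern) for seq in db) / len(db)


def drop_first(seq):
    head = seq[0][1:]
    return ((head,) if head else ()) + seq[1:]


def drop_last(seq):
    tail = seq[-1][:-1]
    return seq[:-1] + ((tail,) if tail else ())


def merge(w1, w2):
    if drop_first(w1) != drop_last(w2):
        return None
    last = w2[-1][-1]
    if len(w2[-1]) > 1:
        return w1[:-1] + (w1[-1] + (last,),)
    return w1 + ((last,),)


def one_event_removed(seq):
    subs = set()
    for i, elem in enumerate(seq):
        for j in range(len(elem)):
            rest = elem[:j] + elem[j + 1:]
            subs.add(seq[:i] + ((rest,) if rest else ()) + seq[i + 1:])
    return subs


def gsp(db, minsup):
    """Return {frequent sequence: support}, found level by level."""
    events = sorted({e for seq in db for elem in seq for e in elem})
    level = {}
    for e in events:  # Step 1: frequent 1-sequences
        s = support(db, ((e,),))
        if s >= minsup:
            level[((e,),)] = s
    frequent, k = {}, 2
    while level:  # Step 2: repeat until no new frequent sequences are found
        frequent.update(level)
        prev = list(level)
        if k == 2:
            items = [p[0][0] for p in prev]
            cands = {((a,), (b,)) for a in items for b in items}
            cands |= {((a, b),) for a, b in combinations(items, 2)}
        else:
            cands = set()
            for w1 in prev:
                for w2 in prev:
                    merged = merge(w1, w2)
                    if merged is not None:
                        cands.add(merged)
        prev_set = set(prev)
        level = {}
        for c in cands:
            if one_event_removed(c) <= prev_set:  # candidate pruning
                s = support(db, c)                # support counting
                if s >= minsup:                   # candidate elimination
                    level[c] = s
        k += 1
    return frequent


result = gsp(DB, 0.5)
for seq, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(seq, sup)
```

On the example database the loop stops after the third pass, since no candidate 3-sequence reaches the 50% threshold.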