Data Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar.


© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

Sequence Data

Sequence Database:

Object  Timestamp  Events
A       10         2, 3, 5
A       20         6, 1
A       23         1
B       11         4, 5, 6
B       17         2
B       21         7, 8, 1, 2
B       28         1, 6
C       14         1, 8, 7
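As a rough illustration, the object/timestamp/events rows above can be grouped into one time-ordered sequence per object. The names `rows` and `build_sequences` are made up for this sketch and do not come from the lecture notes.

```python
from collections import defaultdict

# (object, timestamp, events) rows copied from the sequence database table.
rows = [
    ("A", 10, {2, 3, 5}),
    ("A", 20, {6, 1}),
    ("A", 23, {1}),
    ("B", 11, {4, 5, 6}),
    ("B", 17, {2}),
    ("B", 21, {7, 8, 1, 2}),
    ("B", 28, {1, 6}),
    ("C", 14, {1, 8, 7}),
]


def build_sequences(rows):
    """Group rows by object and order each object's elements by timestamp."""
    sequences = defaultdict(list)
    for obj, timestamp, events in rows:
        sequences[obj].append((timestamp, frozenset(events)))
    for elements in sequences.values():
        elements.sort(key=lambda pair: pair[0])
    return dict(sequences)


sequences = build_sequences(rows)
```

Each value in `sequences` is then a list of (timestamp, events) pairs, which is one possible in-memory form of a data sequence.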

Examples of Sequence Data

Sequence database: Customer
- Sequence: purchase history of a given customer
- Element (transaction): a set of items bought by a customer at time t
- Event (item): books, dairy products, CDs, etc.

Sequence database: Web data
- Sequence: browsing activity of a particular Web visitor
- Element (transaction): a collection of files viewed by a Web visitor after a single mouse click
- Event (item): home page, index page, contact info, etc.

Sequence database: Event data
- Sequence: history of events generated by a given sensor
- Element (transaction): events triggered by a sensor at time t
- Event (item): types of alarms generated by sensors

Sequence database: Genome sequences
- Sequence: DNA sequence of a particular species
- Element (transaction): an element of the DNA sequence
- Event (item): bases A, T, G, C

[Figure: a sequence drawn as a series of elements (transactions), each containing events (items) E1, E2, E3, E4.]

Formal Definition of a Sequence
- A sequence is an ordered list of elements (transactions): $s = \langle e_1\, e_2\, e_3 \ldots \rangle$
  - Each element contains a collection of events (items): $e_i = \{i_1, i_2, \ldots, i_k\}$
  - Each element is attributed to a specific time or location
- The length of a sequence, $|s|$, is the number of elements in the sequence
- A k-sequence is a sequence that contains k events (items)
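The difference between the length of a sequence (number of elements) and the k in "k-sequence" (number of events) is easy to mix up. A minimal sketch, using a hypothetical sequence `s` and helper names invented here:

```python
# A hypothetical sequence with three elements; each element is a set of events.
s = [{"a", "b"}, {"c"}, {"d", "e"}]


def length(seq):
    """Length |s|: the number of elements (transactions)."""
    return len(seq)


def num_events(seq):
    """The k in 'k-sequence': the total number of events over all elements."""
    return sum(len(element) for element in seq)
```

Here `length(s)` is 3 while `num_events(s)` is 5, so `s` is a 5-sequence of length 3.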

Examples of Sequence
- Web sequence
- Sequence of initiating events causing the nuclear accident at Three Mile Island
- Sequence of books checked out at a library

Formal Definition of a Subsequence
- A sequence $\langle a_1\, a_2 \ldots a_n \rangle$ is contained in another sequence $\langle b_1\, b_2 \ldots b_m \rangle$ ($m \ge n$) if there exist integers $i_1 < i_2 < \ldots < i_n$ such that $a_1 \subseteq b_{i_1}, a_2 \subseteq b_{i_2}, \ldots, a_n \subseteq b_{i_n}$
- The support of a subsequence w is defined as the fraction of data sequences that contain w
- A sequential pattern is a frequent subsequence (i.e., a subsequence whose support is ≥ minsup)

[Table: Data sequence, Subsequence, Contain?; three example rows answered Yes, No, Yes.]
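The containment test and the support definition can be sketched in a few lines. This is only an illustration: sequences are modeled as lists of Python sets, the function names are invented here, and the small `database` is example data chosen for this sketch. Without timing constraints, matching each pattern element at the earliest possible position is sufficient.

```python
def contains(data_seq, sub):
    """Return True if `sub` is a subsequence of `data_seq`.

    Both arguments are lists of sets. Each element of `sub` must be a subset
    of a distinct element of `data_seq`, with the order preserved.
    """
    pos = 0
    for element in sub:
        # Skip forward to the first remaining element that covers `element`.
        while pos < len(data_seq) and not element <= data_seq[pos]:
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1  # the next pattern element must match strictly later
    return True


def support(database, sub):
    """Fraction of data sequences in `database` that contain `sub`."""
    return sum(contains(seq, sub) for seq in database) / len(database)


# Illustrative data sequences (an assumption of this sketch).
database = [
    [{2, 4}, {3, 5, 6}, {8}],
    [{1, 2}, {3, 4}],
    [{2, 4}, {2, 4}, {2, 5}],
]
```

For example, `contains(database[1], [{1}, {2}])` is false because events 1 and 2 occur in the same element, so they cannot be matched in two different, ordered elements.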

Sequential Pattern Mining: Definition
- Given:
  - a database of sequences
  - a user-specified minimum support threshold, minsup
- Task:
  - Find all subsequences with support ≥ minsup

Sequential Pattern Mining: Challenge
- Given a sequence, many different subsequences can be extracted from it
- How many k-subsequences can be extracted from a given n-sequence?
  - Example with n = 9, k = 4: Y _ _ Y Y _ _ _ Y (Y marks an event that is kept in the subsequence)
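One way to read the Y/_ mask is that each k-subsequence corresponds to choosing which k of the n events to keep, so under that reading the count is the binomial coefficient C(n, k). The sketch below checks this by brute-force enumeration; treating the nine events as distinct positions is an assumption of the sketch.

```python
from itertools import combinations
from math import comb

n, k = 9, 4

# Enumerate every way of keeping k of the n event positions.
masks = [
    "".join("Y" if i in kept else "_" for i in range(n))
    for kept in combinations(range(n), k)
]

# The enumeration agrees with the closed form C(n, k).
assert len(masks) == comb(n, k)

print(len(masks))            # 126
print("Y__YY___Y" in masks)  # True: the mask shown in the example
```

Even for a single 9-event sequence there are 126 possible 4-subsequences, which is why candidates are generated level by level rather than enumerated exhaustively.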

Sequential Pattern Mining: Example
- Minsup = 50%
- Examples of frequent subsequences, with supports s = 60%, s = 80%, s = 60%

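Extracting Sequential Patterns
- Given n events: $i_1, i_2, i_3, \ldots, i_n$
- Candidate 1-subsequences: $\langle\{i_1\}\rangle, \langle\{i_2\}\rangle, \langle\{i_3\}\rangle, \ldots, \langle\{i_n\}\rangle$
- Candidate 2-subsequences: $\langle\{i_1, i_2\}\rangle, \langle\{i_1, i_3\}\rangle, \ldots, \langle\{i_1\}\{i_1\}\rangle, \langle\{i_1\}\{i_2\}\rangle, \ldots, \langle\{i_{n-1}\}\{i_n\}\rangle$
- Candidate 3-subsequences: $\langle\{i_1, i_2, i_3\}\rangle, \langle\{i_1, i_2, i_4\}\rangle, \ldots, \langle\{i_1, i_2\}\{i_1\}\rangle, \langle\{i_1, i_2\}\{i_2\}\rangle, \ldots, \langle\{i_1\}\{i_1, i_2\}\rangle, \langle\{i_1\}\{i_1, i_3\}\rangle, \ldots, \langle\{i_1\}\{i_1\}\{i_1\}\rangle, \langle\{i_1\}\{i_1\}\{i_2\}\rangle, \ldots$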
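To make the growth of the candidate space concrete, the following sketch enumerates all candidate 1- and 2-subsequences over a list of events. A 2-subsequence either puts two distinct events in one element or puts one event in each of two ordered elements (where repeats are allowed). The function names are invented for this illustration.

```python
from itertools import combinations, product


def candidate_1(events):
    """All 1-subsequences: one element holding one event."""
    return [(frozenset([e]),) for e in events]


def candidate_2(events):
    """All 2-subsequences over `events`."""
    # Two distinct events inside a single element.
    same_element = [(frozenset(pair),) for pair in combinations(events, 2)]
    # One event per element; order matters and an event may repeat.
    two_elements = [
        (frozenset([a]), frozenset([b])) for a, b in product(events, repeat=2)
    ]
    return same_element + two_elements


events = ["i1", "i2", "i3"]
print(len(candidate_1(events)))  # 3
print(len(candidate_2(events)))  # 12
```

For n events this gives n candidates of size 1 and C(n, 2) + n² candidates of size 2, already more than the C(n, 2) candidate 2-itemsets of ordinary itemset mining.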

Generalized Sequential Pattern (GSP)
- Step 1:
  - Make the first pass over the sequence database D to yield all the 1-element frequent sequences
- Step 2: Repeat until no new frequent sequences are found
  - Candidate Generation:
    - Merge pairs of frequent subsequences found in the (k-1)th pass to generate candidate sequences that contain k items
  - Support Counting:
    - Make a new pass over the sequence database D to find the support for these candidate sequences
  - Candidate Elimination:
    - Eliminate candidate k-sequences whose actual support is less than minsup
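The level-wise loop above can be sketched as follows. This is a simplified illustration and not GSP itself: instead of merging pairs of frequent (k-1)-sequences, it grows each frequent (k-1)-sequence by one frequent event, which is simpler to write but generates more candidates. Sequences are tuples of sorted tuples of events, and all function names are assumptions of this sketch.

```python
def contains(data_seq, sub):
    """True if `sub` (tuple of event tuples) is a subsequence of `data_seq`."""
    pos = 0
    for element in sub:
        while pos < len(data_seq) and not set(element) <= data_seq[pos]:
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1
    return True


def support(database, sub):
    """Fraction of data sequences that contain `sub`."""
    return sum(contains(seq, sub) for seq in database) / len(database)


def grow(seq, item):
    """Yield the candidates obtained by adding one event to `seq`."""
    yield seq + ((item,),)  # the event starts a new last element
    if item > seq[-1][-1]:  # the event joins the last element (kept sorted)
        yield seq[:-1] + (seq[-1] + (item,),)


def mine(database, minsup):
    """Return {frequent sequence: support} for a list of lists of sets."""
    if not 0 < minsup <= 1:
        raise ValueError("minsup must be in (0, 1]")
    items = sorted({e for seq in database for element in seq for e in element})
    frequent = {}
    frequent_items = None
    candidates = [((item,),) for item in items]  # step 1: 1-sequences
    while candidates:
        # Support counting and candidate elimination.
        survivors = {}
        for candidate in candidates:
            s = support(database, candidate)
            if s >= minsup:
                survivors[candidate] = s
        frequent.update(survivors)
        if frequent_items is None:
            frequent_items = [c[0][0] for c in survivors]
        # Candidate generation for the next level (simplified, see lead-in).
        candidates = [
            g for c in survivors for item in frequent_items for g in grow(c, item)
        ]
    return frequent


# Illustrative database (an assumption of this sketch).
database = [[{"a"}, {"b"}], [{"a", "b"}], [{"a"}, {"b"}, {"c"}]]
patterns = mine(database, minsup=0.6)
```

With this toy database, `patterns` holds the sequences ⟨{a}⟩, ⟨{b}⟩ and ⟨{a}{b}⟩; the candidate ⟨{a, b}⟩ is eliminated because only one of the three data sequences contains both events in a single element.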

GSP Example

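Timing Constraints (I)

Pattern: {A B} {C} {D E}
- $x_g$: max-gap (the gap between consecutive elements must be ≤ $x_g$)
- $n_g$: min-gap (the gap between consecutive elements must be > $n_g$)
- $m_s$: maximum span (the span of the whole pattern must be ≤ $m_s$)

[Table: Data sequence, Subsequence, Contain?; four example rows answered Yes, No, Yes, No for $x_g = 2$, $n_g = 0$, $m_s = 4$.]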
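A containment test under these three constraints might look like the sketch below. It assumes each data sequence is a list of (timestamp, event set) pairs, that min-gap is non-negative (so a positive gap also enforces the ordering), and that the span is measured from the first to the last matched element. Unlike the unconstrained case, matching at the earliest position is not always enough, so the sketch backtracks over all possible matches. The function names are invented here.

```python
def contains_timed(data_seq, sub, max_gap, min_gap, max_span):
    """True if `sub` occurs in `data_seq` within the timing constraints.

    data_seq: list of (timestamp, set_of_events) pairs.
    sub: list of sets of events.
    """

    def search(k, prev_t, first_t):
        if k == len(sub):
            return True
        for t, events in data_seq:
            if not sub[k] <= events:
                continue
            if prev_t is not None:
                gap = t - prev_t
                if gap <= min_gap or gap > max_gap:
                    continue  # violates min-gap or max-gap
                if t - first_t > max_span:
                    continue  # violates maximum span
            if search(k + 1, t, t if first_t is None else first_t):
                return True
        return False

    return search(0, None, None)


def with_positions(elements):
    """Use the position 1, 2, 3, ... of each element as its timestamp."""
    return list(enumerate(elements, start=1))


# Illustrative data sequence (an assumption of this sketch).
data = with_positions([{1}, {2}, {3}, {4}, {5}])
```

With max-gap 2, min-gap 0 and maximum span 4, the pattern ⟨{1}{2}⟩ is contained in `data`, whereas ⟨{1}{4}⟩ is not, because events 1 and 4 are three positions apart.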

Mining Sequential Patterns with Timing Constraints
- Approach 1:
  - Mine sequential patterns without timing constraints
  - Postprocess the discovered patterns
- Approach 2:
  - Modify GSP to directly prune candidates that violate timing constraints

Frequent Subgraph Mining
- Extend association rule mining to finding frequent subgraphs
- Useful for Web mining, computational chemistry, bioinformatics, spatial data sets, etc.

Graph Definitions

Representing Transactions as Graphs
- Each transaction is a clique of items
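As a small illustration of the clique view, a transaction can be turned into a graph whose vertices are the items and whose edges connect every pair of items. The function name and the sample transaction are assumptions of this sketch.

```python
from itertools import combinations


def transaction_to_clique(transaction):
    """Return (vertices, edges) of the clique formed by a transaction's items."""
    vertices = sorted(transaction)
    # Every pair of items is joined by an edge.
    edges = list(combinations(vertices, 2))
    return vertices, edges


vertices, edges = transaction_to_clique({"bread", "milk", "beer"})
```

A transaction with m items yields m(m-1)/2 edges, so the three-item transaction here produces three edges.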

Representing Graphs as Transactions

Edge Growing

Apriori-like Algorithm
- Find frequent 1-subgraphs
- Repeat
  - Candidate generation
    - Use frequent (k-1)-subgraphs to generate candidate k-subgraphs
  - Candidate pruning
    - Prune candidate subgraphs that contain infrequent (k-1)-subgraphs
  - Support counting
    - Count the support of each remaining candidate
  - Eliminate candidate k-subgraphs that are infrequent

In practice, it is not as easy; there are many other issues.
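The loop can be illustrated under a strong simplifying assumption: if every vertex label is unique within a graph, a graph is fully described by its set of labeled edges, and a k-subgraph is simply a set of k edges. Isomorphism testing then disappears and the problem reduces to itemset mining over edges. The sketch below makes that assumption, does not require candidates to be connected, and uses names invented here; it is not a general frequent subgraph miner.

```python
from itertools import combinations


def frequent_edge_sets(graphs, minsup):
    """Return {frozenset of edges: support} for graphs given as sets of edges."""
    n = len(graphs)

    def support(edge_set):
        return sum(edge_set <= g for g in graphs) / n

    # Frequent 1-subgraphs: single edges.
    all_edges = {e for g in graphs for e in g}
    level = {frozenset([e]) for e in all_edges}
    level = {c for c in level if support(c) >= minsup}

    result = {}
    k = 1
    while level:
        for c in level:
            result[c] = support(c)
        # Candidate generation: join two frequent k-edge sets sharing k-1 edges.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        # Candidate pruning: every k-edge subset must itself be frequent.
        candidates = {c for c in candidates if all(c - {e} in level for e in c)}
        # Support counting and elimination of infrequent candidates.
        level = {c for c in candidates if support(c) >= minsup}
        k += 1
    return result


# Illustrative graphs with unique vertex labels (an assumption of this sketch).
graphs = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("b", "c"), ("c", "d")},
    {("a", "b"), ("c", "d")},
]
found = frequent_edge_sets(graphs, minsup=0.6)
```

In this toy run the three-edge candidate is pruned without a database pass, because its subset {(b, c), (c, d)} is not frequent. With repeated vertex labels the same loop needs graph isomorphism tests at each step.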

Example: Dataset

Example

Adjacency Matrix Representation
- The same graph can be represented in many ways

Graph Isomorphism
- Two graphs are isomorphic if they are topologically equivalent, i.e., there is a one-to-one mapping between their vertices that preserves the edges

Graph Isomorphism
- A test for graph isomorphism is needed:
  - During the candidate generation step, to determine whether a candidate has already been generated
  - During the candidate pruning step, to check whether its (k-1)-subgraphs are frequent
  - During candidate counting, to check whether a candidate is contained within another graph
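One simple way to test isomorphism for very small graphs is to compute a canonical code: try every vertex ordering, read off the upper triangle of the resulting adjacency matrix as a string, and keep the largest string. Two graphs are isomorphic exactly when their codes match. The sketch below assumes undirected, unlabeled graphs given as 0/1 adjacency matrices; it is brute force (factorial in the number of vertices) and the function names are invented here.

```python
from itertools import permutations


def canonical_code(adj):
    """Largest upper-triangle string over all vertex orderings of `adj`."""
    n = len(adj)
    best = None
    for perm in permutations(range(n)):
        code = "".join(
            str(adj[perm[i]][perm[j]]) for i in range(n) for j in range(i + 1, n)
        )
        if best is None or code > best:
            best = code
    return best


def isomorphic(a, b):
    """True if the two adjacency matrices describe isomorphic graphs."""
    return len(a) == len(b) and canonical_code(a) == canonical_code(b)


# The same 3-vertex path under two vertex orderings, and a triangle.
path_1 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
path_2 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```

Here `path_1` and `path_2` are different matrices for the same graph and share one canonical code, while `triangle` gets a different code. Storing the code of each generated candidate is one way to detect duplicates during candidate generation.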