Mining Frequent Patterns II: Mining Sequential & Navigational Patterns
Bamshad Mobasher, DePaul University

Sequential pattern mining
- Association rule mining does not consider the order of transactions.
- In many applications such orderings are significant. E.g.,
  - in market basket analysis, it is interesting to know whether people buy some items in sequence, e.g., buying a bed first and then bed sheets some time later;
  - in Web usage mining, it is useful to find navigational patterns of users in a Web site from sequences of page visits.

Sequential Patterns: Extending Frequent Itemsets
- Sequential patterns add an extra dimension to frequent itemsets and association rules: time.
  - Items can appear before, after, or at the same time as each other.
  - General form: "x% of the time, when A appears in a transaction, B appears within z transactions."
  - Note that other items may appear between A and B, so sequential patterns do not necessarily imply consecutive appearances of items (in terms of time).
- Examples
  - Renting "Star Wars", then "Empire Strikes Back", then "Return of the Jedi", in that order
  - A collection of ordered events within an interval
- Most sequential pattern discovery algorithms are based on extensions of the Apriori algorithm for discovering itemsets.
- Navigational patterns
  - They can be viewed as a special form of sequential patterns that capture navigational patterns among users of a site.
  - In this case a session is a consecutive sequence of pageview references for a user over a specified period of time.

Objective
- Given a set S of input data sequences (a sequence database), the problem of mining sequential patterns is to find all sequences that have at least a user-specified minimum support.
- Each such sequence is called a frequent sequence, or a sequential pattern.
- The support of a sequence is the fraction of the data sequences in S that contain it.

Sequence Databases
- A sequence database consists of sequences, each an ordered list of elements or events.
- Each element can be a set of items or a single item (a singleton set).
- Transaction databases vs. sequence databases:

A transaction database
  TID   itemsets
  10    a, b, d
  20    a, c, d
  30    a, d, e
  40    b, e, f

A sequence database
  SID   sequences
  [rows not preserved in the transcript]

Elements in (…) are sets.

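Subsequence vs. supersequence
- A sequence is an ordered list of events, denoted $\langle e_1 e_2 \ldots e_l \rangle$.
- Given two sequences $\alpha = \langle a_1 a_2 \ldots a_n \rangle$ and $\beta = \langle b_1 b_2 \ldots b_m \rangle$, $\alpha$ is called a subsequence of $\beta$, denoted $\alpha \subseteq \beta$, if there exist integers $1 \le j_1 < j_2 < \ldots < j_n \le m$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, \ldots, a_n \subseteq b_{j_n}$.
- Example: $\langle 3, (4, 5), 8 \rangle$ is contained in (is a subsequence of) $\langle 6, (3, 7), 9, (4, 5, 8), (3, 8) \rangle$, because $3 \in (3, 7)$, $(4, 5) \subseteq (4, 5, 8)$, and $8 \in (3, 8)$, in that order.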
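The containment test and the support definition can be sketched in a few lines of Python. This is only an illustrative sketch: the names `seq`, `is_subsequence`, and `support` are hypothetical helpers, not taken from the slides, and each element is represented as a `frozenset` of items.

```python
from typing import FrozenSet, Hashable, List, Sequence

Element = FrozenSet[Hashable]


def seq(*elements) -> List[Element]:
    """Hypothetical helper: build a sequence from plain sets of items."""
    return [frozenset(e) for e in elements]


def is_subsequence(alpha: Sequence[Element], beta: Sequence[Element]) -> bool:
    """True if each element of alpha is a subset of a distinct element of
    beta, with the matched elements of beta appearing in the same order."""
    j = 0
    for a in alpha:
        # Greedily take the earliest element of beta that contains a.
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1  # the next element of alpha must match strictly later
    return True


def support(pattern: Sequence[Element], database: List[Sequence[Element]]) -> float:
    """Fraction of data sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


# The example from the slide.
alpha = seq({3}, {4, 5}, {8})
beta = seq({6}, {3, 7}, {9}, {4, 5, 8}, {3, 8})
print(is_subsequence(alpha, beta))  # True
```

Matching greedily from the left is sufficient here, because taking the earliest possible match for one element never rules out a match for a later one.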

What Is Sequential Pattern Mining?
- Given a set of sequences and a support threshold, find the complete set of frequent subsequences.
- An element may contain a set of items. Items within an element are unordered and are listed alphabetically.
- [Table: a sequence database (SID, sequence), with an example subsequence and an example sequential pattern for support threshold min_sup = 2; the sequences themselves are not preserved in the transcript.]

Another Example
[Table: transactions sorted by customer ID]

Example (continued)
[Tables: sequences produced from the transactions; final sequential patterns]

GSP mining algorithm
- Very similar to the Apriori algorithm
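To make the Apriori analogy concrete, here is a deliberately simplified level-wise miner. It is not GSP itself: it assumes every element is a single item, it extends frequent (k-1)-sequences with frequent items instead of using GSP's join-and-prune candidate generation, and it ignores time constraints. The function names and the toy database are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(pattern: Pattern, sequence: Sequence[str]) -> bool:
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # 'in' consumes the iterator


def mine_sequences(database: List[Sequence[str]], min_count: int) -> Dict[Pattern, int]:
    """Level-wise search: count candidates of length k, keep the frequent
    ones, and extend only those to length k + 1 (the Apriori property)."""
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    frequent: Dict[Pattern, int] = {}
    items = sorted({x for s in database for x in s})
    level: List[Pattern] = [(x,) for x in items]
    frequent_items: List[str] = []
    while level:
        counts = {c: sum(contains(c, s) for s in database) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        if not frequent_items:
            # First pass: remember which single items are frequent.
            frequent_items = [c[0] for c in survivors]
        frequent.update(survivors)
        # A frequent sequence must have a frequent prefix and a frequent last item.
        level = [c + (x,) for c in survivors for x in frequent_items]
    return frequent


# Toy database (hypothetical), minimum support count of 2.
db = [("a", "b", "c"), ("a", "c"), ("b", "a", "c"), ("a", "b")]
result = mine_sequences(db, min_count=2)
```

As in Apriori, the search stops at the first level that yields no frequent candidates.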

Sequential Pattern Mining Algorithms
- Apriori-based method: GSP (Generalized Sequential Patterns: Srikant & Agrawal, 1996)
- Pattern-growth methods: FreeSpan & PrefixSpan (Han et al., 2000; Pei et al., 2001)
- Vertical format-based mining: SPADE (Zaki, 2000)
- Constraint-based sequential pattern mining (SPIRIT: Garofalakis et al., 1999; Pei et al., 2002)
- Mining closed sequential patterns: CloSpan (Yan, Han & Afshar, 2003)
From: J. Han and M. Kamber. Data Mining: Concepts and Techniques,

Mining Navigation Patterns
- Each session induces a user trail through the site.
- A trail is a sequence of web pages followed by a user during a session, ordered by time of access.
- A sequential pattern in this context is a frequent trail.
- Sequential pattern mining can help identify common navigational sequences, which in turn helps in understanding common user behavioral patterns.
- If the goal is to make predictions about future user actions based on past behavior, approaches such as Markov models (e.g., Markov chains) can be used.

Mining Navigational Patterns
- Another approach: Markov chains
  - The idea is to model the navigational sequences through the site as a state-transition diagram without cycles (a directed acyclic graph).
  - A Markov chain consists of a set of states (pages or pageviews in the site) $S = \{s_1, s_2, \ldots, s_n\}$ and a set of transition probabilities $P = \{p_{1,1}, \ldots, p_{1,n}, p_{2,1}, \ldots, p_{2,n}, \ldots, p_{n,1}, \ldots, p_{n,n}\}$.
  - A path $r$ from a state $s_i$ to a state $s_j$ is a sequence of states $\langle s_i, s_{i+1}, \ldots, s_j \rangle$ in which the transition probabilities for all consecutive states are greater than 0.
  - The probability of reaching a state $s_j$ from a state $s_i$ via a path $r$ is the product of all the probabilities along the path: $\Pr(r) = \prod_{k=i}^{j-1} p_{k,k+1}$.
  - The probability of reaching $s_j$ from $s_i$ is the sum over all paths: $\Pr(s_j \mid s_i) = \sum_{r \in R} \Pr(r)$, where $R$ is the set of all paths from $s_i$ to $s_j$.

Construct Markov Chain from Web Navigational Data
- Add a unique start state.
  - The start state has a transition to the first page in each session (representing the start of a session).
  - Alternatively, it could have a transition to every state, assuming that every page can potentially be the start of a session.
- Add a unique final state.
  - The last page in each trail has a transition to the final state (representing the end of the session).
- The transition probabilities are obtained from counting click-throughs.
- The Markov chain built is called absorbing, since we always end up in the final state.
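The construction above can be sketched as follows. This is a minimal sketch, not the author's implementation: `build_chain`, the `Start`/`Final` labels, and the four sessions are hypothetical, and each session is assumed to have been observed exactly once.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

START, FINAL = "Start", "Final"  # hypothetical labels for the two added states


def build_chain(sessions: List[Sequence[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate transition probabilities by counting click-throughs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        # Wrap each session with the artificial start and final states.
        path = [START, *session, FINAL]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    # Normalize each row: Pr(dst | src) = count(src -> dst) / count(src -> anything).
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


# Hypothetical sessions, each assumed to occur once.
sessions = [["A", "B"], ["A", "B", "C"], ["B", "C"], ["A", "C"]]
chain = build_chain(sessions)
print(chain[START])  # {'A': 0.75, 'B': 0.25}
```

Because every session ends with a transition into `Final` and `Final` has no outgoing transitions, the resulting chain is absorbing in the sense described above.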

A Hypothetical Markov Chain
[Figure: an example Markov chain]
- What is the probability that a user who visits the Home page purchases a product?
  - Home -> Search -> PD -> $ = 1/3 * 1/2 * 1/2 = 1/12 ≈ 0.083
  - Home -> Cat -> PD -> $ = 1/3 * 1/3 * 1/2 = 1/18 ≈ 0.056
  - Home -> Cat -> $ = 1/3 * 1/3 = 1/9 ≈ 0.111
  - Home -> RS -> PD -> $ = 1/3 * 2/3 * 1/2 = 1/9 ≈ 0.111
  - Sum = 13/36 ≈ 0.361
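The same calculation can be checked mechanically. The sketch below assumes the chain is acyclic and includes only the transitions that appear in the four listed paths (the figure itself is not available, so any other edges are omitted); the function names are hypothetical. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F
from math import prod

# Transition probabilities read off the four paths listed on the slide.
P = {
    "Home":   {"Search": F(1, 3), "Cat": F(1, 3), "RS": F(1, 3)},
    "Search": {"PD": F(1, 2)},
    "Cat":    {"PD": F(1, 3), "$": F(1, 3)},
    "RS":     {"PD": F(2, 3)},
    "PD":     {"$": F(1, 2)},
}


def path_probability(path, P):
    """Product of the transition probabilities along one path."""
    return prod(P[a][b] for a, b in zip(path, path[1:]))


def reach_probability(src, dst, P):
    """Sum of path probabilities over all paths from src to dst.
    Assumes the chain is acyclic; a cycle would recurse forever."""
    if src == dst:
        return F(1)
    return sum(
        (p * reach_probability(nxt, dst, P) for nxt, p in P.get(src, {}).items()),
        F(0),
    )


print(path_probability(["Home", "Search", "PD", "$"], P))  # 1/12
print(reach_probability("Home", "$", P))                   # 13/36
```

Summing over successors recursively is equivalent to enumerating every path and adding up the products, but avoids listing the paths by hand.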

Markov Chain Example: calculating conditional probabilities for transitions
[Figure: Web site hyperlink graph over pages A, B, C, D, E]
Sessions:
  A, B
  A, B, C
  A, B, C, D
  A, B, C, E
  A, C, E
  A, B, D
  A, B, D, E
  B, C
  B, C, D
  B, C, E
  B, D, E
Transition B → C:
- Total occurrences of B: 14
- Total occurrences of BC: 8
- Pr(C|B) = 8/14 ≈ 0.57

Markov Chain Example (cont.): The Full Markov Chain
[Figure: the full Markov chain over Start, A, B, C, D, E, and Final, built from the same sessions as on the previous slide; for example, the B → C edge is labeled 0.57]
- Probability that someone will visit page C?
  S → B → C + S → A → C + S → A → B → C
  = (0.31 * 0.57) + (0.69 * 0.18) + (0.69 * 0.82 * 0.57) ≈ 0.62
- Probability that someone who has visited B will visit E?
  B → D → E + B → C → E + B → C → D → E
  = (0.21 * 0.33) + (0.57 * 0.40) + (0.57 * 0.20 * 0.33) ≈ 0.33
- Probability that someone visiting page C will leave the site?
  0.40 = 40%

Mining Frequent Trails Using Markov Chains
- Support s in [0,1): accept only trails whose initial probability is above s.
- Confidence c in [0,1): accept only trails whose probability is above c.
- Recall: the probability of a trail is obtained by multiplying the transition probabilities of the links in the trail.
- Mining for patterns
  - Find all trails whose initial probability is higher than s and whose trail probability is above c.
  - Use depth-first search on the Markov chain to compute the trails.
  - The average time needed to find the frequent trails is proportional to the number of web pages in the site.
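A depth-first search of this kind might look like the sketch below. Several details are assumptions rather than part of the slides: the initial probability of each state is given as input, only maximal trails (those that cannot be extended without dropping to the confidence threshold or below) are reported, and a hypothetical `max_len` guard stops the search if the chain contains a cycle of probability-1 links. The chain and thresholds in the example are made up.

```python
from typing import Dict, Tuple

Trail = Tuple[str, ...]


def frequent_trails(
    initial: Dict[str, float],
    P: Dict[str, Dict[str, float]],
    support: float,
    confidence: float,
    max_len: int = 10,  # assumed guard against cycles; not from the slides
) -> Dict[Trail, float]:
    """Return maximal trails whose first state has initial probability above
    `support` and whose product of transition probabilities is above
    `confidence`."""
    trails: Dict[Trail, float] = {}

    def dfs(trail: Trail, prob: float) -> None:
        extended = False
        if len(trail) < max_len:
            for nxt, p in P.get(trail[-1], {}).items():
                if prob * p > confidence:
                    extended = True
                    dfs(trail + (nxt,), prob * p)
        # Record only trails with at least one link that were not extended.
        if not extended and len(trail) > 1:
            trails[trail] = prob

    for state, p0 in initial.items():
        if p0 > support:
            dfs((state,), 1.0)
    return trails


# Hypothetical chain: C is too rare to start a trail, and C -> D is too weak.
initial = {"A": 0.5, "B": 0.45, "C": 0.05}
P = {"A": {"B": 1.0}, "B": {"C": 0.6, "D": 0.4}, "C": {"D": 0.2}}
found = frequent_trails(initial, P, support=0.1, confidence=0.5)
print(found)  # {('A', 'B', 'C'): 0.6, ('B', 'C'): 0.6}
```

Pruning works because each added link can only multiply the trail probability by a value of at most 1, so a trail at or below the confidence threshold can never recover by being extended.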

Markov Chains: Another Example
  ID   Session Trail
  1    A1 > A2 > A3
  2    [not preserved]
  3    A1 > A2 > A3 > A4
  4    A5 > A2 > A4
  5    A5 > A2 > A4 > A6
  6    A5 > A2 > A3 > A6

Frequent Trails From Example
Support = 0.1 and Confidence = 0.3
  Trail            Probability
  A1 > A2 > A3     0.67
  A5 > A2 > A3     0.67
  A2 > A3          0.67
  A1 > A2 > A4     0.33
  A5 > A2 > A4     0.33
  A2 > A4          0.33
  A4 > A

Frequent Trails From Example
Support = 0.1 and Confidence = 0.5
  Trail            Probability
  A1 > A2 > A3     0.67
  A5 > A2 > A3     0.67
  A2 > A

Efficient Management of Navigational Trails
- Approach: store sessions in an aggregated sequence tree.
  - Initially introduced in Web Utilization Miner (WUM), Spiliopoulou, 1998.
  - For each occurrence of a sequence, start a new branch or increase the frequency counts of matching nodes.
  - In the example (figure not preserved), note that s6 contains "b" twice, hence the sequence is [not preserved]

Mining Navigational Patterns
The aggregated sequence tree can be used directly to determine support and confidence for navigational patterns.
- Support = count at the node / count at the root
- Confidence = count at the node / count at the parent
- Note that each node represents a navigational path ending in that node.
- Navigation pattern a → b: Support = 11/35 = 0.31, Confidence = 11/21 = 0.52
- Navigation pattern a → b → e: Support = 11/35 = 0.31, Confidence = 11/11 = 1.00
- Navigation pattern a → b → e → f: Support = 3/35 ≈ 0.09, Confidence = 3/11 = 0.27
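A prefix tree with per-node counts is enough to reproduce these two ratios. The sketch below is an assumption-laden simplification, not WUM's actual data structure: `Node`, `build_tree`, and `support_confidence` are hypothetical names, the sessions are made up, and a repeated page is simply stored as a deeper node on the same branch.

```python
from typing import Dict, List, Sequence, Tuple


class Node:
    """One node of the aggregated tree: a count plus children keyed by page."""

    def __init__(self) -> None:
        self.count = 0
        self.children: Dict[str, "Node"] = {}


def build_tree(sessions: List[Sequence[str]]) -> Node:
    """Insert each session as a path from the root, incrementing counts of
    matching nodes and starting a new branch where none matches."""
    root = Node()
    for session in sessions:
        root.count += 1  # the root counts all sessions
        node = root
        for page in session:
            node = node.children.setdefault(page, Node())
            node.count += 1
    return root


def support_confidence(root: Node, pattern: Sequence[str]) -> Tuple[float, float]:
    """Support = node count / root count; confidence = node count / parent count.
    The pattern must be non-empty and is matched from the start of sessions."""
    if not pattern:
        raise ValueError("pattern must contain at least one page")
    parent, node = root, root
    for page in pattern:
        parent, node = node, node.children.get(page)
        if node is None:
            return 0.0, 0.0  # the path never occurs
    return node.count / root.count, node.count / parent.count


# Hypothetical sessions.
sessions = [["a", "b", "e"], ["a", "b", "e", "f"], ["a", "c"], ["b", "c"]]
tree = build_tree(sessions)
print(support_confidence(tree, ["a", "b", "e", "f"]))  # (0.25, 0.5)
```

Because each node stands for the whole path from the root, this structure only answers questions about patterns that begin at the start of a session.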