Mining Frequent Patterns II: Mining Sequential & Navigational Patterns
Bamshad Mobasher, DePaul University

Sequential pattern mining
- Association rule mining does not consider the order of transactions.
- In many applications such orderings are significant. For example:
  - In market basket analysis, it is interesting to know whether people buy some items in sequence, e.g., buying a bed first and then bed sheets some time later.
  - In Web usage mining, it is useful to find the navigational patterns of users in a Web site from their sequences of page visits.

Sequential Patterns: Extending Frequent Itemsets
- Sequential patterns add an extra dimension to frequent itemsets and association rules: time.
  - Items can appear before, after, or at the same time as each other.
  - General form: “x% of the time, when A appears in a transaction, B appears within z transactions.”
  - Note that other items may appear between A and B, so sequential patterns do not necessarily imply consecutive appearances of items (in terms of time).
- Examples
  - Renting “Star Wars”, then “Empire Strikes Back”, then “Return of the Jedi”, in that order
  - A collection of ordered events within an interval
- Most sequential pattern discovery algorithms are based on extensions of the Apriori algorithm for discovering itemsets.
- Navigational patterns
  - They can be viewed as a special form of sequential patterns that capture navigational patterns among the users of a site.
  - In this case a session is a consecutive sequence of pageview references for a user over a specified period of time.

Objective
- Given a set S of input data sequences (a sequence database), the problem of mining sequential patterns is to find all sequences that have a user-specified minimum support.
- Each such sequence is called a frequent sequence, or a sequential pattern.
- The support of a sequence is the fraction of all data sequences in S that contain it.

Sequence Databases
- A sequence database consists of sequences of ordered elements or events.
- Each element can be a set of items or a single item (a singleton set).
- Transaction databases vs. sequence databases

A sequence database (elements in parentheses are sets; the sequence column is not preserved in this transcript):
SID   sequence
10
20
30
40

A transaction database:
TID   itemset
10    a, b, d
20    a, c, d
30    a, d, e
40    b, e, f

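Subsequence vs. supersequence
- A sequence is an ordered list of events, denoted $\langle e_1\, e_2 \ldots e_l \rangle$.
- Given two sequences $\alpha = \langle a_1\, a_2 \ldots a_n \rangle$ and $\beta = \langle b_1\, b_2 \ldots b_m \rangle$, $\alpha$ is called a subsequence of $\beta$, denoted $\alpha \subseteq \beta$, if there exist integers $1 \le j_1 < j_2 < \ldots < j_n \le m$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, \ldots, a_n \subseteq b_{j_n}$.
- Example: $\langle 3, (4, 5), 8 \rangle$ is contained in (is a subsequence of) $\langle 6, (3, 7), 9, (4, 5, 8), (3, 8) \rangle$.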
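The containment test in this definition can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `is_subsequence` and the choice to represent each element as a Python set are assumptions made here. It scans β left to right and matches each element of α to the earliest later element of β that contains it, which is enough to decide whether suitable indices j_1 < j_2 < … < j_n exist.

```python
def is_subsequence(alpha, beta):
    """Return True if alpha is a subsequence of beta.

    Both arguments are lists of sets (one set per element). Each element
    of alpha must be a subset of a distinct element of beta, in order.
    """
    j = 0
    for a in alpha:
        # Advance to the first remaining element of beta that contains a.
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1  # this element of beta is used up; later matches must come after it
    return True


# The example from the slide: <3, (4, 5), 8> within <6, (3, 7), 9, (4, 5, 8), (3, 8)>
alpha = [{3}, {4, 5}, {8}]
beta = [{6}, {3, 7}, {9}, {4, 5, 8}, {3, 8}]
print(is_subsequence(alpha, beta))
```

Note that two elements of α cannot be matched to the same element of β: `[{3}, {5}]` is not a subsequence of `[{3, 5}]`, because the definition requires strictly increasing indices.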

What Is Sequential Pattern Mining?
- Given a set of sequences and a support threshold, find the complete set of frequent subsequences.
- An element of a sequence may contain a set of items. Items within an element are unordered and are listed alphabetically.
- A subsequence whose support meets the threshold is a sequential pattern; the slide's example uses min_sup = 2 on a sequence database with SIDs 10, 20, 30 and 40.

Another Example: Transactions Sorted by Customer ID

Example (continued): Sequences produced from transactions; final sequential patterns

GSP mining algorithm
- Very similar to the Apriori algorithm
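To make the Apriori connection concrete, here is a simplified level-wise miner in Python. It is a sketch under stated assumptions, not the full GSP algorithm: every element is a single item (so a data sequence is just a tuple of items), there are no time or gap constraints, and the candidate-pruning step is skipped because support counting filters the candidates anyway. The names `contains`, `mine_sequences` and the toy database are hypothetical.

```python
def contains(sequence, pattern):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def mine_sequences(database, min_count):
    """Level-wise search: frequent k-sequences are joined into (k+1)-candidates."""
    items = sorted({item for seq in database for item in seq})
    frequent = {}
    level = [(item,) for item in items]  # candidate 1-sequences
    while level:
        # Count support for each candidate and keep the frequent ones.
        survivors = {}
        for cand in level:
            count = sum(contains(seq, cand) for seq in database)
            if count >= min_count:
                survivors[cand] = count
        frequent.update(survivors)
        # Join step: a and b combine when a without its first item
        # equals b without its last item.
        keys = list(survivors)
        level = [a + (b[-1],) for a in keys for b in keys if a[1:] == b[:-1]]
    return frequent


# Hypothetical toy database of four customer sequences.
db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "d")]
for pattern, count in sorted(mine_sequences(db, min_count=2).items()):
    print(pattern, count)
```

The join is complete for this simplified setting because every frequent (k+1)-sequence has a frequent prefix and a frequent suffix of length k, which is the same downward-closure argument Apriori relies on for itemsets.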

Sequential Pattern Mining Algorithms
- Apriori-based method: GSP (Generalized Sequential Patterns: Srikant & Agrawal, 1996)
- Pattern-growth methods: FreeSpan & PrefixSpan (Han et al., 2000; Pei et al., 2001)
- Vertical format-based mining: SPADE (Zaki, 2000)
- Constraint-based sequential pattern mining (SPIRIT: Garofalakis et al., 1999; Pei et al., 2002)
- Mining closed sequential patterns: CloSpan (Yan, Han & Afshar, 2003)
From: J. Han and M. Kamber. Data Mining: Concepts and Techniques, www.cs.uiuc.edu/~hanji

Mining Navigation Patterns
- Each session induces a user trail through the site.
- A trail is a sequence of web pages followed by a user during a session, ordered by time of access.
- A sequential pattern in this context is a frequent trail.
- Sequential pattern mining can help identify common navigational sequences, which in turn helps in understanding common user behavioral patterns.
- If the goal is to make predictions about future user actions based on past behavior, approaches such as Markov models (e.g., Markov chains) can be used.

Mining Navigational Patterns
- Another approach: Markov chains
  - The idea is to model the navigational sequences through the site as a state-transition diagram without cycles (a directed acyclic graph).
  - A Markov chain consists of a set of states (pages or pageviews in the site) $S = \{s_1, s_2, \ldots, s_n\}$ and a set of transition probabilities $P = \{p_{1,1}, \ldots, p_{1,n}, p_{2,1}, \ldots, p_{2,n}, \ldots, p_{n,1}, \ldots, p_{n,n}\}$.
  - A path $r$ from a state $s_i$ to a state $s_j$ is a sequence of states in which the transition probabilities for all consecutive states are greater than 0.
  - The probability of reaching a state $s_j$ from a state $s_i$ via a path $r$ is the product of all the probabilities along the path: $P(r) = \prod_{(s_k, s_l) \in r} p_{k,l}$, where the product runs over the pairs of consecutive states $s_k, s_l$ on $r$.
  - The probability of reaching $s_j$ from $s_i$ is the sum over all paths: $P(s_j \mid s_i) = \sum_{r \in R} P(r)$, where $R$ is the set of all paths from $s_i$ to $s_j$.

Construct Markov Chain from Web Navigational Data
- Add a unique start state.
  - The start state has a transition to the first page in each session (representing the start of a session).
  - Alternatively, it could have a transition to every state, assuming that every page can potentially be the start of a session.
- Add a unique final state.
  - The last page in each trail has a transition to the final state (representing the end of the session).
- The transition probabilities are obtained from counting click-throughs.
- The Markov chain built this way is called absorbing, since we always end up in the final state.

A Hypothetical Markov Chain
(Figure: an example Markov chain)
- What is the probability that a user who visits the Home page purchases a product?
  - Home -> Search -> PD -> $ = 1/3 * 1/2 * 1/2 = 1/12 = 0.083
  - Home -> Cat -> PD -> $ = 1/3 * 1/3 * 1/2 = 1/18 = 0.056
  - Home -> Cat -> $ = 1/3 * 1/3 = 1/9 = 0.111
  - Home -> RS -> PD -> $ = 1/3 * 2/3 * 1/2 = 1/9 = 0.111
  - Sum = 0.361

Markov Chain Example
(Figure: Web site hyperlink graph over pages A, B, C, D, E)
- Sessions:
  - A, B
  - A, B, C
  - A, B, C, D
  - A, B, C, E
  - A, C, E
  - A, B, D
  - A, B, D, E
  - B, C
  - B, C, D
  - B, C, E
  - B, D, E
- Calculating conditional probabilities for transitions, e.g., the transition B → C:
  - Total occurrences of B: 14
  - Total occurrences of B → C: 8
  - Pr(C|B) = 8/14 = 0.57
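The counting step can be sketched in Python as follows. This is an illustrative sketch, with `build_chain` and the state names `"Start"` and `"Final"` chosen here. It assumes each listed session occurs exactly once. The slide's counts (14 occurrences of B, 8 of B → C) appear to include per-session frequencies that are not preserved in this transcript, so the estimate below for Pr(C|B) is 6/10 rather than 8/14.

```python
from collections import Counter, defaultdict


def build_chain(sessions):
    """Estimate transition probabilities from click-through counts.

    A unique Start state is linked to the first page of every session and
    the last page of every session is linked to a unique Final state.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        path = ["Start", *session, "Final"]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(out.values()) for dst, n in out.items()}
        for src, out in counts.items()
    }


# Sessions from the slide, each assumed to occur once.
sessions = [
    "AB", "ABC", "ABCD", "ABCE", "ACE", "ABD",
    "ABDE", "BC", "BCD", "BCE", "BDE",
]
chain = build_chain(sessions)
print(round(chain["B"]["C"], 2))
```

Because every session ends in `Final` and `Final` has no outgoing transitions, the resulting chain is absorbing, and each row of probabilities sums to 1.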

Markov Chain Example (cont.)
(Figure: the full Markov chain, built from the same sessions, over the states Start, A, B, C, D, E and Final)
- Probability that someone will visit page C?
  - S → B → C + S → A → C + S → A → B → C
  - (0.31 * 0.57) + (0.69 * 0.18) + (0.69 * 0.82 * 0.57) = 0.623
- Probability that someone who has visited B will visit E?
  - B → D → E + B → C → E + B → C → D → E
  - (0.21 * 0.33) + (0.57 * 0.40) + (0.57 * 0.20 * 0.33) = 0.335
- Probability that someone visiting page C will leave the site?
  - 0.40 = 40%
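Path sums like these can be checked mechanically. The sketch below is an illustration rather than part of the slides: `reach_probability` is a name chosen here, and the chain contains only the transitions that appear in the slide's own expressions (transitions into Final are left out because they are not needed for these two questions). It relies on the chain being acyclic, as the slides assume, so the recursion always terminates.

```python
def reach_probability(chain, source, target):
    """Probability of reaching target from source in an acyclic chain.

    Sums, over every path from source to target, the product of the
    transition probabilities along that path.
    """
    if source == target:
        return 1.0
    return sum(
        p * reach_probability(chain, nxt, target)
        for nxt, p in chain.get(source, {}).items()
    )


# Only the transitions used in the slide's calculations.
chain = {
    "Start": {"A": 0.69, "B": 0.31},
    "A": {"B": 0.82, "C": 0.18},
    "B": {"C": 0.57, "D": 0.21},
    "C": {"D": 0.20, "E": 0.40},
    "D": {"E": 0.33},
}
print(round(reach_probability(chain, "Start", "C"), 3))
print(round(reach_probability(chain, "B", "E"), 3))
```

Evaluated this way, the three paths into C sum to about 0.623 and the three paths from B to E sum to about 0.335.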

Mining Frequent Trails Using Markov Chains
- Support s in [0,1): accept only trails whose initial probability is above s.
- Confidence c in [0,1): accept only trails whose probability is above c.
- Recall: the probability of a trail is obtained by multiplying the transition probabilities of the links in the trail.
- Mining for patterns
  - Find all trails whose initial probability is higher than s and whose trail probability is above c.
  - Use depth-first search on the Markov chain to compute the trails.
  - The average time needed to find the frequent trails is proportional to the number of web pages in the site.

Markov Chains: Another Example
ID   Session trail
1    A1 > A2 > A3
2
3    A1 > A2 > A3 > A4
4    A5 > A2 > A4
5    A5 > A2 > A4 > A6
6    A5 > A2 > A3 > A6

Frequent Trails From Example: Support = 0.1 and Confidence = 0.3
Trail            Probability
A1 > A2 > A3     0.67
A5 > A2 > A3     0.67
A2 > A3          0.67
A1 > A2 > A4     0.33
A5 > A2 > A4     0.33
A2 > A4          0.33
A4 > A6          0.33

Frequent Trails From Example: Support = 0.1 and Confidence = 0.5
Trail            Probability
A1 > A2 > A3     0.67
A5 > A2 > A3     0.67
A2 > A3          0.67
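The depth-first search over the Markov chain can be sketched as below. Several things here are assumptions rather than content of the slides: the function name `frequent_trails`, the `max_len` guard against cycles, and above all the initial probabilities, which are hypothetical values since the slides do not list them. The transition probabilities are read off the trail tables (for instance, A1 > A2 is taken to be 1.0 because A1 > A2 > A3 has the same probability as A2 > A3), and transitions out of A3 are omitted because their values are not given. The sketch reports only maximal trails, i.e. trails that cannot be extended without dropping to or below the confidence threshold, which is what the tables appear to list.

```python
def frequent_trails(initial, transitions, support, confidence, max_len=10):
    """Depth-first search for maximal trails above the thresholds."""
    found = {}

    def extend(trail, prob):
        extended = False
        if len(trail) < max_len:  # guard in case the chain has cycles
            for nxt, p in transitions.get(trail[-1], {}).items():
                if prob * p > confidence:
                    extended = True
                    extend(trail + (nxt,), prob * p)
        # Report a trail only if it has at least one link and no extension survived.
        if not extended and len(trail) > 1:
            found[trail] = prob

    for page, p0 in initial.items():
        if p0 > support:  # support filters the starting page
            extend((page,), 1.0)
    return found


# Hypothetical initial probabilities (not given in the slides).
initial = {"A1": 0.15, "A2": 0.25, "A3": 0.20, "A4": 0.15, "A5": 0.15, "A6": 0.10}
# Transition probabilities inferred from the trail tables.
transitions = {
    "A1": {"A2": 1.0},
    "A5": {"A2": 1.0},
    "A2": {"A3": 0.67, "A4": 0.33},
    "A4": {"A6": 0.33},
}
for trail, prob in frequent_trails(initial, transitions, 0.1, 0.3).items():
    print(" > ".join(trail), round(prob, 2))
```

Raising the confidence threshold from 0.3 to 0.5 prunes every branch that goes through the 0.33 links, which is why the second table is so much shorter than the first.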

Efficient Management of Navigational Trails
- Approach: store sessions in an aggregated sequence tree
  - Initially introduced in Web Utilization Miner (WUM) - Spiliopoulou, 1998
  - For each occurrence of a sequence, start a new branch or increase the frequency counts of matching nodes.
  - In the example below, note that s6 contains “b” twice.

Mining Navigational Patterns
- The aggregated sequence tree can be used directly to determine support and confidence for navigational patterns.
  - Support = count at the node / count at the root
  - Confidence = count at the node / count at the parent
  - Note that each node represents a navigational path ending in that node.
- Navigation pattern a → b: Support = 11/35 = 0.31, Confidence = 11/21 = 0.52
- Navigation pattern a → b → e: Support = 11/35 = 0.31, Confidence = 11/11 = 1.00
- Navigation pattern a → b → e → f: Support = 3/35 = 0.086, Confidence = 3/11 = 0.27
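A minimal Python sketch of such a tree is shown below. It is not WUM's implementation: the `Node` class, the function names, and the four sessions are hypothetical and chosen only to illustrate the two ratios. Sessions sharing a prefix share a branch, and each node stores how many sessions passed through it.

```python
class Node:
    """One node of the aggregated tree: a visit count plus child pages."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_tree(sessions):
    """Merge sessions by common prefix, incrementing counts along each path."""
    root = Node()
    for session in sessions:
        root.count += 1
        node = root
        for page in session:
            node = node.children.setdefault(page, Node())
            node.count += 1
    return root


def support_confidence(root, pattern):
    """Support and confidence of a non-empty path that starts at the root."""
    parent, node = None, root
    for page in pattern:
        parent, node = node, node.children.get(page)
        if node is None:
            return 0.0, 0.0  # the path never occurs
    return node.count / root.count, node.count / parent.count


# Hypothetical sessions.
tree = build_tree([("a", "b", "e"), ("a", "b", "e", "f"), ("a", "c"), ("b", "c")])
print(support_confidence(tree, ("a", "b")))
print(support_confidence(tree, ("a", "b", "e", "f")))
```

Both measures come from a single walk down the tree, so once the sessions are aggregated no further pass over the raw sessions is needed to evaluate a pattern.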
