Discovering RFM Sequential Patterns From Customers' Purchasing Data
Prof. Yen-Liang Chen (陳彥良), Department of Information Management, National Central University
Date: 2015/10/14
Agenda
– Introduction
– Related Work
– Problem Definition
– Algorithm
– Performance Evaluation
– Conclusion
Sequential Pattern Mining (1)
– Sequential pattern mining finds relationships between occurrences of sequential events, i.e., whether the events occur in some specific order.
– Example: every time Microsoft stock drops 5%, IBM stock also drops at least 4% within three days.
Sequential Pattern Mining (2)
Applications of sequential pattern mining:
– Customer shopping sequences: first buy a computer, then a CD-ROM, and then a digital camera, all within 3 months.
– Medical treatments, natural disasters (e.g., earthquakes), science and engineering processes, stocks and markets, etc.
– Telephone calling patterns and Weblog click streams.
– DNA sequences and gene structures.
Sequential Patterns vs. Association Rules
– Association rules capture relationships within a transaction: which items are bought together?
– Sequential patterns capture correlations between transactions: which items are bought in a certain order?
[Table: example customer purchases (CID, Purchased Items).]
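What Is Sequential Pattern Mining?
– Given a set of sequences, find the complete set of frequent subsequences.
– A sequence database is a set of sequences, each identified by a sequence ID (SID).
– A sequence is an ordered list of elements; an element may contain a set of items. Items within an element are unordered, and we list them alphabetically.
– A sequence α is a subsequence of a sequence β if every element of α is contained in a distinct element of β, in the same order.
– Given a support threshold min_sup = 2, a sequence is a sequential pattern if it is a subsequence of at least 2 sequences in the database.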
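To make the containment test concrete, here is a minimal Python sketch (our own illustration, not the authors' code). Each element is a frozenset; a candidate is a subsequence if its elements are subsets of distinct, strictly later elements of the data sequence, and support counts the sequences that contain it. The helper names `seq`, `is_subsequence` and `support` and the toy database are hypothetical.

```python
from typing import FrozenSet, List, Sequence

# A sequence is an ordered list of elements; each element is a set of items.
Seq = List[FrozenSet[str]]


def seq(*elements: str) -> Seq:
    """Build a sequence from strings, one string per element: seq("a", "bc")."""
    return [frozenset(e) for e in elements]


def is_subsequence(alpha: Seq, beta: Seq) -> bool:
    """True if each element of alpha is a subset of a distinct element of beta,
    with the matched elements of beta appearing in the same order."""
    j = 0
    for element in alpha:
        # Skip elements of beta until one contains the current element of alpha.
        while j < len(beta) and not element <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1  # the next element of alpha must match a strictly later element
    return True


def support(pattern: Seq, db: Sequence[Seq]) -> int:
    """Number of sequences in db that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in db)


# Hypothetical toy database (not the one shown on the slide).
db = [
    seq("a", "abc", "ac", "d", "cf"),
    seq("ad", "c", "bc", "ae"),
    seq("ef", "ab", "df", "c", "b"),
]
min_sup = 2
pattern = seq("ab", "c")
print(support(pattern, db), support(pattern, db) >= min_sup)  # 2 True
```

Greedy earliest matching suffices: matching each element as early as possible never rules out a match for the elements that follow.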
An SPM Example and the Problems
Traditional SPM methods discover only the frequencies of the maximal sequential patterns, which causes two problems:
– In real life, the environment may change constantly, and users' behavior may also change over time.
– Many patterns are of little value.
RFM Definition in Marketing (Bult and Wansbeek)
– R (Recency): the period from the last purchase to now. R↓: the customer is more likely to make a repeat purchase.
– F (Frequency): the number of purchases made in a certain period. F↑: the customer is more loyal.
– M (Monetary): the amount of money spent during a certain period. M↑: the customer is more important.
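As a sketch of how the three values could be computed from raw purchase records, the Python below follows the definitions above. The record layout (customer_id, purchase_date, amount), the parameter names `now` and `period_start`, and the sample data are our own assumptions.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical record layout: (customer_id, purchase_date, amount).
Purchase = Tuple[str, date, float]


def rfm_scores(purchases: List[Purchase], now: date,
               period_start: date) -> Dict[str, Tuple[int, int, float]]:
    """Return {customer: (R, F, M)} where R = days since the last purchase,
    F = purchases in [period_start, now], M = money spent in that period."""
    last: Dict[str, date] = {}
    freq: Dict[str, int] = defaultdict(int)
    money: Dict[str, float] = defaultdict(float)
    for cid, day, amount in purchases:
        if day > now:
            continue  # ignore records after the reference date
        if cid not in last or day > last[cid]:
            last[cid] = day
        if day >= period_start:
            freq[cid] += 1
            money[cid] += amount
    return {cid: ((now - last[cid]).days, freq[cid], money[cid]) for cid in last}


purchases = [
    ("c1", date(2002, 12, 1), 300.0),
    ("c1", date(2002, 12, 20), 150.0),
    ("c2", date(2002, 6, 5), 80.0),
]
scores = rfm_scores(purchases, now=date(2002, 12, 31), period_start=date(2002, 1, 1))
print(scores)  # {'c1': (11, 2, 450.0), 'c2': (209, 1, 80.0)}
```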
The Proposed Algorithm: RFM-SPM
– Traditional SPM uses only a frequency constraint; RFM-SPM uses frequency, recency and monetary constraints.
– Each constraint has two thresholds, a lower and an upper threshold, so that each factor can be restricted to a specified range.
– By setting the three factors to different intervals, we can discover the patterns we are interested in.
Recency Constraint
– Specified by a range from Rtime_min to Rtime_max, both given as the number of days from the starting date of the sequence database.
– It ensures that the last transaction of the pattern occurs within this interval.
[Figure: timeline of the sequence DB from its starting date to its ending date, marking Rtime_min = 200 and Rtime_max.]
Monetary Constraint
– Given by a range from M_min to M_max; it ensures that the value of a discovered pattern lies between M_min and M_max.
– A data-sequence satisfies a pattern with respect to the monetary constraint if we can find an occurrence of the pattern in that data-sequence whose value lies within this range.
Frequency Constraint
– The frequency of a pattern is the percentage of data-sequences in the database that contain an occurrence of the pattern satisfying both the recency constraint and the monetary constraint.
– A pattern is output as an RFM-pattern if its frequency falls within the interval from minsup_min to minsup_max.
An Example of an RFM-Pattern
30% of customers who bought a computer recently came back to buy a scanner and a microphone, and the total amount spent on these products is greater than NT$55,000.
Related Work (factors considered in parentheses)
– Clustering (RFM): groups customers with similar needs and/or characteristics, who are likely to exhibit similar purchasing behaviors.
– Classification (RFM): classifies customers into categories of customer value; the resulting models can also classify unseen cases.
– Association rules (M): Extracting Share Frequent Itemsets with Infrequent Subsets.
– SPM (RF): Constraint-Based Sequential Pattern Mining: the Consideration of Recency and Compactness.
– SPM (RFM): Discovering RFM sequential patterns from customers' purchasing data.
Data-Sequence in RFM-SPM
[Tables: a traditional sequence DB (Sid, Sequence) and the transferred sequence DB used by RFM-SPM (Sid, Sequence), in which each item is recorded together with its purchase time and money, e.g., (a, 1, 10).]
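The transferred DB stores each item as an (item, time, money) triple. A minimal sketch of recovering the traditional itemset view from such triples is below; it assumes (our assumption) that items sharing the same time belong to one transaction, and it lists items alphabetically within each itemset. `to_traditional` is a hypothetical helper, and the input is the data-sequence A used in Example 3.2.

```python
from itertools import groupby
from typing import List, Tuple

Triple = Tuple[str, int, float]  # (item, purchase time, money)


def to_traditional(data_sequence: List[Triple]) -> List[Tuple[str, ...]]:
    """Drop time and money, grouping items bought at the same time into one itemset.
    Assumption: equal timestamps mean the same transaction."""
    ordered = sorted(data_sequence, key=lambda t: (t[1], t[0]))  # by time, then item
    return [tuple(item for item, _, _ in group)
            for _, group in groupby(ordered, key=lambda t: t[1])]


A = [("a", 1, 10), ("c", 3, 40), ("a", 4, 30), ("b", 4, 70),
     ("a", 6, 50), ("e", 6, 90), ("c", 10, 70)]
print(to_traditional(A))
# [('a',), ('c',), ('a', 'b'), ('a', 'e'), ('c',)]
```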
An Overview of Problem Definitions
Containment of itemset → Subsequence → Recent Subsequence → Recent Monetary Subsequence
Example 3.1 (subsequence)
Given a data-sequence A:
– Is itemset (ab) contained in A?
– Is sequence B a subsequence of A?
Answer: Yes.
Example 3.2 (recent subsequence)
Data-sequence A = <(a, 1, 10), (c, 3, 40), (a, 4, 30), (b, 4, 70), (a, 6, 50), (e, 6, 90), (c, 10, 70)>, where each triple is (item, time, money); Rtime_min = 5 and Rtime_max = 8.
Is sequence B = <(ab)(ae)> a recent subsequence of A? Yes:
– B is a subsequence of A;
– the occurring time of itemset (ae) is 6, and Rtime_min = 5 ≤ 6 < Rtime_max = 8.
Example 3.3 (recent monetary subsequence)
Given data-sequence A with Rtime_min = 5, Rtime_max = 8, M_min = 200 and M_max = 250.
Is sequence B a recent monetary subsequence of A? Yes:
– B is a recent subsequence of A;
– the total money of this subsequence is 240, and M_min = 200 ≤ 240 < M_max = 250.
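A minimal Python sketch of the two checks illustrated in Examples 3.2 and 3.3 follows (our own illustration, not the authors' implementation). It enumerates every occurrence of a non-empty pattern in a data-sequence. An occurrence is recent if its last itemset falls in [Rtime_min, Rtime_max), and it is recent monetary if, in addition, that same occurrence's total money falls in [M_min, M_max). We reuse the data-sequence A from Example 3.2 and the pattern B = <(ab)(ae)>. That Example 3.3 uses the same A is our assumption, although the computed total of 240 agrees with the slide. All function names are hypothetical, and the enumeration is exponential in the worst case, so it is only meant for small illustrations.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, int, float]              # (item, purchase time, money)
Transaction = Tuple[int, Dict[str, float]]   # (time, {item: money})


def transactions(a: List[Triple]) -> List[Transaction]:
    """Group a data-sequence into transactions: items with the same time form one itemset."""
    by_time: Dict[int, Dict[str, float]] = defaultdict(dict)
    for item, time, money in a:
        by_time[time][item] = money
    return sorted(by_time.items())


def occurrences(pattern: List[str], trans: List[Transaction],
                start: int = 0) -> Iterator[List[int]]:
    """Yield each list of transaction indices where the pattern's itemsets occur in order."""
    if not pattern:
        yield []
        return
    for i in range(start, len(trans)):
        if set(pattern[0]).issubset(trans[i][1]):
            for tail in occurrences(pattern[1:], trans, i + 1):
                yield [i] + tail


def occurrence_money(pattern: List[str], trans: List[Transaction], occ: List[int]) -> float:
    """Total money of the pattern's items at the matched transactions."""
    return sum(trans[i][1][item] for itemset, i in zip(pattern, occ) for item in itemset)


def is_recent(pattern: List[str], a: List[Triple], rtime_min: int, rtime_max: int) -> bool:
    """Recent subsequence: some occurrence ends at a time in [rtime_min, rtime_max)."""
    trans = transactions(a)
    return any(rtime_min <= trans[occ[-1]][0] < rtime_max
               for occ in occurrences(pattern, trans))


def is_recent_monetary(pattern: List[str], a: List[Triple], rtime_min: int,
                       rtime_max: int, m_min: float, m_max: float) -> bool:
    """Recent monetary subsequence: one occurrence must satisfy both ranges."""
    trans = transactions(a)
    return any(rtime_min <= trans[occ[-1]][0] < rtime_max
               and m_min <= occurrence_money(pattern, trans, occ) < m_max
               for occ in occurrences(pattern, trans))


A = [("a", 1, 10), ("c", 3, 40), ("a", 4, 30), ("b", 4, 70),
     ("a", 6, 50), ("e", 6, 90), ("c", 10, 70)]
B = ["ab", "ae"]
print(is_recent(B, A, 5, 8))                     # True  (Example 3.2 thresholds)
print(is_recent_monetary(B, A, 5, 8, 200, 250))  # True  (Example 3.3 thresholds)
```

Requiring the same occurrence to satisfy both ranges mirrors the slide's wording ("the total money of this subsequence").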
Definition 3.1 (f-pattern, rf-pattern, rfm-pattern)
Let B be a sequence of itemsets.
Call B an     | if data-sequences contain B as a | Denoted                | Threshold
f-pattern     | subsequence                      | f-support, B.sup_f     | no less than minsup_min
rf-pattern    | recent subsequence               | rf-support, B.sup_rf   | no less than minsup_min
rfm-pattern   | recent monetary subsequence      | rfm-support, B.sup_rfm | between minsup_min and minsup_max
Example 3.4 (RFM-patterns)
Given a data-sequence DB (Sid, Sequence) and six thresholds:
– R: Rtime_min = 10 ≤ time < Rtime_max = 21
– M: M_min = 150 ≤ money < M_max = 250
– F: minsup_min = 2 ≤ support < minsup_max = 4
The RFM-patterns, grouped by the number of itemsets they contain:
– 1 itemset: { … }
– 2 itemsets: { … }
– 3 itemsets: { … }
– 4 itemsets: { … }
RFM-Apriori Algorithm
The RFM-Apriori algorithm is developed by modifying the well-known Apriori (GSP) algorithm. GSP works as follows:
– Put all items into C1, the set of candidate f-patterns of length 1, and then scan the database to find the frequent 1-patterns (L1).
– Given the set of frequent (k-1)-patterns L_{k-1}, generate the set of candidate f-patterns C_k by joining L_{k-1} with L_{k-1}.
– Scan the database to determine the supports of the patterns in C_k, and then find L_k.
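The join step can be sketched in Python as follows. This is a minimal illustration of the standard GSP join, not the authors' code, and GSP's pruning step is omitted. Two (k-1)-sequences s1 and s2 join when s1 without its first item equals s2 without its last item. The candidate extends s1 with s2's last item, either as a new element or inside s1's last element, matching how that item sits in s2. Length-2 candidates are the special case built directly from frequent items. All function names are ours.

```python
from itertools import product
from typing import Iterable, Optional, Set, Tuple

# A sequence is a tuple of elements; each element is a sorted tuple of items.
Seq = Tuple[Tuple[str, ...], ...]


def drop_first(s: Seq) -> Seq:
    """Remove the first item of the first element (dropping the element if it empties)."""
    head = s[0][1:]
    return ((head,) if head else ()) + s[1:]


def drop_last(s: Seq) -> Seq:
    """Remove the last item of the last element (dropping the element if it empties)."""
    tail = s[-1][:-1]
    return s[:-1] + ((tail,) if tail else ())


def join(s1: Seq, s2: Seq) -> Optional[Seq]:
    """GSP join of two (k-1)-sequences, or None if they do not join."""
    if drop_first(s1) != drop_last(s2):
        return None
    item = s2[-1][-1]
    if len(s2[-1]) == 1:                  # item was a separate element in s2
        return s1 + ((item,),)
    return s1[:-1] + (s1[-1] + (item,),)  # item shared s2's last element


def candidates_2(frequent_items: Iterable[str]) -> Set[Seq]:
    """C2 from frequent items: <(x)(y)> for all x, y and <(x y)> for x < y."""
    items = sorted(frequent_items)
    seq_ext = {((x,), (y,)) for x, y in product(items, repeat=2)}
    set_ext = {((x, y),) for x, y in product(items, repeat=2) if x < y}
    return seq_ext | set_ext


def candidates_k(l_prev: Set[Seq]) -> Set[Seq]:
    """Ck (k > 2) by joining L(k-1) with itself (pruning omitted)."""
    out = set()
    for s1, s2 in product(l_prev, repeat=2):
        c = join(s1, s2)
        if c is not None:
            out.add(c)
    return out


L2 = {(("a",), ("b",)), (("b",), ("c",)), (("a", "b"),)}
print(len(candidates_2(["a", "b"])))  # 5
print(sorted(candidates_k(L2)))
# [(('a',), ('b',), ('c',)), (('a', 'b'), ('c',))]
```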
RFM-Apriori Algorithm: Overview
[Figure: candidate generation and support counting in Apriori (GSP) versus RFM-Apriori.]
– Apriori (GSP): C1 (all items) → L1 → C2 = L1 × L1 → L2 → … → Ck = L_{k-1} × L_{k-1} → Lk.
– RFM-Apriori: CI1 (all items) → LI1 = (LI1^f, LI1^rf, LI1^rfm) → CI2 = LI1^f × LI1^rf → LI2 = (LI2^rf, LI2^rfm) → … → CIk = LI_{k-1}^rf × LI_{k-1}^rf → LIk = (LIk^rf, LIk^rfm).
– CIk denotes the set of candidate rf-patterns of length k in RFM-Apriori.
– Support counting: B.sup_f (for CI1), then B.sup_rf and B.sup_rfm.
– Two modifications to Apriori: (1) candidate generation and (2) support counting with an inverse candidate tree.
Example 4.1 (Candidate generation: CI2)
Suppose LI1^f = {<(a)>, <(b)>, <(c)>, <(ab)>, <(bc)>} and LI1^rf = {<(b)>, <(c)>}. Then CI2 = LI1^f × LI1^rf:
– CI2 = {<(a)(b)>, <(a)(c)>, <(b)(b)>, <(b)(c)>, <(c)(b)>, <(c)(c)>, <(ab)(b)>, <(ab)(c)>, <(bc)(b)>, <(bc)(c)>}
[Illustration: each itemset in LI1^f (a, b, c, ab, bc) joined with each itemset in LI1^rf (b, c).]
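The CI2 step is a plain cross product, sketched below in Python with the itemsets read from the slide's illustration (a, b, c, ab, bc for LI1^f; b, c for LI1^rf). Our reading of why the first itemset only needs to be frequent is an assumption, not stated on the slide. Every subsequence of an rf-pattern is an f-pattern, and the last itemset of an rf-pattern must itself occur at a recent time, so it must appear in LI1^rf. The representation and the name `gen_ci2` are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# A pattern is a tuple of itemsets; each itemset is a string of items, e.g. ("ab", "c").
Pattern = Tuple[str, ...]


def gen_ci2(li1_f: List[str], li1_rf: List[str]) -> List[Pattern]:
    """CI2 = LI1^f x LI1^rf: a frequent itemset followed by a recent-frequent itemset."""
    return [(x, y) for x, y in product(li1_f, li1_rf)]


li1_f = ["a", "b", "c", "ab", "bc"]  # read from the slide's illustration
li1_rf = ["b", "c"]
ci2 = gen_ci2(li1_f, li1_rf)
print(len(ci2))  # 10
print(" ".join("".join(f"({s})" for s in p) for p in ci2))
# (a)(b) (a)(c) (b)(b) (b)(c) (c)(b) (c)(c) (ab)(b) (ab)(c) (bc)(b) (bc)(c)
```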
Example 4.2 (Candidate generation: CI_k, k > 2)
Suppose LI3^rf = { … }. Then CI4 is obtained by joining LI3^rf with LI3^rf:
– CI4 = { … }
[Illustration: joining two patterns from LI3^rf into a CI4 candidate.]
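The slide does not spell out the join condition for k > 2. The Python below is therefore a hypothetical Apriori-style self-join at itemset granularity, where pattern length counts itemsets. Patterns p and q join when p without its first itemset equals q without its last itemset, and the candidate appends q's last itemset to p. Because q is an rf-pattern, the candidate's last itemset is one that already occurs recently. The name `gen_cik` and the input set are our own.

```python
from itertools import product
from typing import Set, Tuple

Pattern = Tuple[str, ...]  # tuple of itemsets, e.g. ("ab", "c", "d")


def gen_cik(li_prev_rf: Set[Pattern]) -> Set[Pattern]:
    """Hypothetical self-join LI(k-1)^rf x LI(k-1)^rf: p and q join when p without its
    first itemset equals q without its last itemset; the candidate appends q's last itemset."""
    return {p + (q[-1],) for p, q in product(li_prev_rf, repeat=2) if p[1:] == q[:-1]}


# Hypothetical LI3^rf (not the sets on the slide).
li3_rf = {("a", "b", "c"), ("b", "c", "d"), ("b", "c", "e"), ("ab", "c", "d")}
print(sorted(gen_cik(li3_rf)))
# [('a', 'b', 'c', 'd'), ('a', 'b', 'c', 'e')]
```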
RFM-Apriori Algorithm: Example
Given a data-sequence DB and six thresholds Rtime_min = 10, Rtime_max = 21, M_min = 150, M_max = 250, minsup_min = 2 and minsup_max = 4, find the patterns that satisfy the RFM constraints.
[Tables: candidate set CI1 and the resulting LI1 for this example.]
Performance Evaluation: Synthetic Data Parameters
[Table: parameters of the synthetic data generator.]
Synthetic Data Parameter Settings
|S| = 4, |I| = 1.25, N_S = 5000, N_I = 25,000, N = 10,000, T_I = 10, H_price = 1000, M_price = 500, L_price = 100, H_quantity = 1, M_quantity = 3 and L_quantity = 1.
Real-Life Dataset: SC-POS
– Sales data from a supermarket chain in Taiwan, covering all transactions from twenty branches between 2001/12/27 and 2002/12/31.
– Each customer's record lists that customer's transactions; each transaction records the purchase date and time and the purchased items.
– After a series of data preprocessing and cleaning steps, the final dataset contained … items and … customer data-sequences.
Test 4.1. Comparing the runtimes and numbers of patterns of the two algorithms (GSP and RFM-Apriori)
– Varying minsup_min from 1.25% to 0.5% on the synthetic datasets.
– Varying minsup_min from 3.5% to 2.5% on the real-life dataset.
[Figures: results on SYN-DS1 and SC-POS.]
– RFM-Apriori uses a more complicated procedure to generate candidate patterns and compute supports,
– but it generates fewer candidate and frequent patterns.
Test 4.2. Scalability test
In each test, we increase the value of one selected parameter while keeping all other parameters constant, to determine how the algorithms scale up as that parameter grows.
– The first test varies the number of customers, |D|, from 250,000 to 750,000.
– The second varies the average number of transactions per customer, |C|, from 10 to 20.
– The final one varies the average number of items bought per transaction, |T|, from 2.5 to 4.5.
Observation: longer sequences result in more patterns.
Test 4.3. How the runtime and the number of patterns react to the following parameters
– Varying Rtime_min from 75 to 115.
– Varying M_min from 1000 to 5000.
Note: CI_k = LI_{k-1}^rf × LI_{k-1}^rf, so candidates are generated from rf-patterns only; changing Rtime_min can change the candidate set, whereas changing M_min cannot.
Test 4.4. Comparing the numbers of three kinds of interesting patterns: *F* (frequency constraint only), RF* (recency and frequency constraints) and RFM (all three constraints)
[Table: numbers (and percentages) of *F*, RF* and RFM patterns for the synthetic dataset C10-T2.5-S4-I1.25 with D = 25, 50, 75; C = 10, 15, 20; and T = 2.5, 3.5, 4.5 (each at a given minsup_min), and for SC-POS.]
Test 4.5. Segmenting the discovered patterns by RFM constraints
[Table: the divisions of R, F and M; each RFM segment, written R-F-M (e.g., 1-1-1); and the number of patterns in each segment.]
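Assigning a pattern to a segment amounts to locating each of its R, F and M values among the division boundaries. The Python sketch below does this with `bisect_right`, so a value equal to a boundary falls into the upper division, consistent with the half-open [min, max) intervals used in the examples. The five-way cut points, their units, and the helper names are hypothetical; the slide does not give the actual divisions.

```python
from bisect import bisect_right
from typing import Sequence


def level(value: float, cuts: Sequence[float]) -> int:
    """Level 1..len(cuts)+1 for value, given ascending cut points between divisions."""
    return bisect_right(cuts, value) + 1


def rfm_segment(r: float, f: float, m: float, r_cuts: Sequence[float],
                f_cuts: Sequence[float], m_cuts: Sequence[float]) -> str:
    """Segment label in the slide's R-F-M form, e.g. "1-1-1"."""
    return f"{level(r, r_cuts)}-{level(f, f_cuts)}-{level(m, m_cuts)}"


# Hypothetical cut points giving five divisions per factor.
R_CUTS = [73, 146, 219, 292]        # recency: days from the DB starting date
F_CUTS = [0.01, 0.02, 0.04, 0.08]   # frequency: support as a fraction
M_CUTS = [1000, 2000, 3000, 4000]   # monetary: pattern value
print(rfm_segment(250, 0.03, 4200, R_CUTS, F_CUTS, M_CUTS))  # 4-3-5
```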
Managerial Applications (segments written as R, F, M levels)
– Growing patterns: A(BC) in segments 122, 233, 334, 445, 555
– Weakening patterns: A(BC) in segments 134, 233, 322, 421, 511
– Dead patterns: A(BC) in segments 123, 211
– Emerging patterns: A(BC) in segments 412, …
Managerial Applications (cont.)
– Stable patterns: A(BC) in segments 132, 232, 332, 432, 532
– Sort all patterns with R = 3 according to M.
– Sort all patterns with R = 3 according to F.
Conclusion
– We have developed an efficient algorithm for mining sequential patterns that considers recency and monetary value in addition to frequency.
– These two factors help users identify patterns that have been active recently and carry high monetary value.
– In addition, the experiments showed that our approach is more efficient than the traditional GSP algorithm.
Thank you for your attention!