Discovering RFM Sequential Patterns From Customers’ Purchasing Data. Prof. Yen-Liang Chen (陳彥良), Department of Information Management, National Central University. Date: 2015/10/14.



Agenda
– Introduction
– Related Work
– Problem Definition
– Algorithm
– Performance Evaluation
– Conclusion

Sequential Pattern Mining (1)
Sequential pattern mining
– Finds the relationships between occurrences of sequential events
– Determines whether the occurrences follow any specific order
Example
– Every time Microsoft stock drops 5%, IBM stock also drops at least 4% within three days.

Sequential Pattern Mining (2)
Applications of sequential pattern mining
– Customer shopping sequences: first buy a computer, then a CD-ROM, and then a digital camera, within 3 months
– Medical treatments, natural disasters (e.g., earthquakes), science and engineering processes, stocks and markets, etc.
– Telephone calling patterns, weblog click streams
– DNA sequences and gene structures

Sequential Patterns vs. Association Rules
– Association rules: relationships within a transaction. Which items are bought together?
– Sequential patterns: correlation between transactions. Which items are bought in a certain order?
– Example table (columns: CID, Purchased Items) [...]

What Is Sequential Pattern Mining?
– Given a set of sequences, find the complete set of frequent subsequences.
– A sequence database (columns: SID, sequence) [...]
– A sequence [...]: an element may contain a set of items. Items within an element are unordered and are listed alphabetically.
– [...] is a subsequence of [...].
– Given support threshold min_sup = 2, [...] is a sequential pattern.
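The containment and support notions on this slide can be sketched in Python. This is only an illustrative sketch: the helper names (`seq`, `is_subsequence`, `support`) and the toy database are assumptions made for the example and are not taken from the slides.

```python
from typing import FrozenSet, List

Sequence = List[FrozenSet[str]]


def seq(*elements: str) -> Sequence:
    """Build a sequence from strings: seq("ab", "c") stands for <(ab)(c)>."""
    return [frozenset(e) for e in elements]


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """True if each pattern element is a subset of a distinct element of
    `sequence`, in the same order. Greedy earliest matching is sufficient."""
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later element
    return True


def support(pattern: Sequence, database: List[Sequence]) -> int:
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database)


# Hypothetical toy database (an assumption, not the one shown on the slide).
toy_db = [seq("a", "bc", "d"), seq("ab", "c"), seq("b", "a", "c"), seq("ac", "b")]
print(support(seq("a", "c"), toy_db))   # 3
print(support(seq("ab", "c"), toy_db))  # 1
```

With min_sup = 2 on this toy database, <(a)(c)> would count as a sequential pattern and <(ab)(c)> would not.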

An SPM Example and the Problems
Traditional SPM methods consider only the frequencies of the maximal sequential patterns. As a result:
– In real-life situations the environment may change constantly, and users’ behavior may also change over time
– Many of the discovered patterns are of little value

RFM Definition in Marketing (Bult and Wansbeek)
– R (Recency): the period from the last purchase to now
  – R↓: the customer is more likely to make a repeat purchase
– F (Frequency): the number of purchases made in a certain period
  – F↑: the customer has higher loyalty
– M (Monetary): the amount of money spent during a certain period
  – M↑: the customer is more important

The Proposed Algorithm: RFM-SPM
– Frequency constraint (traditional SPM) → Frequency, Recency and Monetary constraints (RFM-SPM)
– Each constraint has two thresholds
  – An upper threshold and a lower threshold
  – They ensure that the considered factor is restricted to a specified range
– By setting these three factors to different intervals, we can discover the patterns we are interested in

Recency Constraint
– Specified by a range from Rtime_min to Rtime_max, each given as a number of days from the starting date of the sequence database.
– Ensures that the last transaction of the pattern occurred in this interval.
– Figure: timeline of the sequence DB from its starting date to its ending date, with Rtime_min = 200 and Rtime_max = [...]

Monetary Constraint
– Given by a range from M_min to M_max. It ensures that the value of a discovered pattern lies between M_min and M_max.
– Suppose the pattern is [...]. A data-sequence satisfies this pattern with respect to the monetary constraint if it contains an occurrence of the pattern whose value is within this range.

Frequency Constraint
– The frequency of a pattern is the percentage of sequences in the database that contain the pattern while satisfying the recency constraint and the monetary constraint.
– A pattern is output as an RFM-pattern if its frequency falls within the interval from minsup_min to minsup_max.

An Example of an RFM-Pattern
30% of the customers who bought a computer have recently come back to buy a scanner and a microphone, and the total amount spent on these products is greater than NT$55,000.

Related Work
– Clustering
  – Groups customers with similar needs and/or characteristics, who are likely to exhibit similar purchasing behaviors
– Classification
  – Classifies customers into categories of customer value; the resulting classifiers are also used to classify unseen cases
– Association rules
  – Extracting Share Frequent Itemsets with Infrequent Subsets
– SPM
  – Constraint-Based Sequential Pattern Mining: the Consideration of Recency and Compactness
  – Discovering RFM sequential patterns from customers’ purchasing data
– Factor labels shown on the slide, in order: RFM, RFM, M, RF

Data-Sequence in RFM-SPM
– Table: traditional sequence DB (columns: Sid, Sequence) [...]
– Table: transferred sequence DB (columns: Sid, Sequence) [...]

An Overview of the Problem Definition
– Containment of itemset
– Subsequence
– Recent subsequence
– Recent monetary subsequence

Example 3.1 (subsequence)
Data-sequence A = [...]
– Is itemset (ab) contained in A? [ ]
– Is sequence B [...] a subsequence of A? [ ]
– Answer shown on the slide: Yes


Example 3.2 (recent subsequence)
Data-sequence A = <(a, 1, 10), (c, 3, 40), (a, 4, 30), (b, 4, 70), (a, 6, 50), (e, 6, 90), (c, 10, 70)>, Rtime_min = 5 and Rtime_max = 8.
Is sequence B = <(ab)(ae)> a recent subsequence of A? Yes
– Sequence B <(ab)(ae)> is a subsequence of A
– The occurring time of itemset (ae) is 6, and 6 ≥ Rtime_min and 6 < Rtime_max
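The check in Example 3.2 can be sketched as follows. This is a minimal sketch under stated assumptions: each triple is read as (item, time, price), items with the same time form one transaction, and the function names are hypothetical rather than taken from the slides. The interval is treated as Rtime_min ≤ time < Rtime_max, following the inequalities in the example.

```python
from collections import defaultdict


def to_transactions(data_sequence):
    """Group (item, time, price) triples into time-ordered transactions:
    [(time, {item: price}), ...]. Assumes one price per item per time."""
    by_time = defaultdict(dict)
    for item, time, price in data_sequence:
        by_time[time][item] = price
    return sorted(by_time.items())


def prefix_matches(pattern, transactions):
    """Greedy check that `pattern` (a list of itemset strings) is a
    subsequence of the given transactions."""
    pos = 0
    for itemset in pattern:
        while pos < len(transactions) and not set(itemset) <= set(transactions[pos][1]):
            pos += 1
        if pos == len(transactions):
            return False
        pos += 1
    return True


def is_recent_subsequence(pattern, data_sequence, rtime_min, rtime_max):
    """True if some occurrence of `pattern` ends in a transaction whose
    time lies in [rtime_min, rtime_max). `pattern` must be non-empty."""
    transactions = to_transactions(data_sequence)
    *prefix, last = pattern
    for j, (time, items) in enumerate(transactions):
        if rtime_min <= time < rtime_max and set(last) <= set(items):
            # The rest of the pattern must occur strictly before position j.
            if prefix_matches(prefix, transactions[:j]):
                return True
    return False


# Data-sequence A from Example 3.2.
A = [("a", 1, 10), ("c", 3, 40), ("a", 4, 30), ("b", 4, 70),
     ("a", 6, 50), ("e", 6, 90), ("c", 10, 70)]
print(is_recent_subsequence(["ab", "ae"], A, 5, 8))  # True: (ae) occurs at time 6
print(is_recent_subsequence(["a", "c"], A, 5, 8))    # False: c occurs only at times 3 and 10
```

The second call shows a sequence that is an ordinary subsequence of A but not a recent one for this interval.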


Example 3.3 (recent monetary subsequence)
Data-sequence A = [...], Rtime_min = 5, Rtime_max = 8, M_min = 200, M_max = 250.
Is sequence B [...] a recent monetary subsequence of A? Yes
– Sequence B is a recent subsequence of A
– The total money of this subsequence is 240, and 240 ≥ M_min and 240 < M_max
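A sketch of the monetary check follows. The data-sequence and pattern of this example are not preserved in the transcript, so the code assumes the data-sequence A of Example 3.2 and the pattern <(ab)(ae)>; with that assumption the total comes to 30 + 70 + 50 + 90 = 240, which agrees with the value quoted above. The function names and the (item, time, price) reading of the triples are also assumptions.

```python
from collections import defaultdict


def to_transactions(data_sequence):
    """Group (item, time, price) triples into [(time, {item: price}), ...]."""
    by_time = defaultdict(dict)
    for item, time, price in data_sequence:
        by_time[time][item] = price
    return sorted(by_time.items())


def occurrences(pattern, transactions, start=0):
    """Yield (time of last matched transaction, total price) for every
    occurrence of the non-empty `pattern` in transactions[start:]."""
    first, rest = pattern[0], pattern[1:]
    for j in range(start, len(transactions)):
        time, prices = transactions[j]
        if set(first) <= set(prices):
            value = sum(prices[item] for item in first)
            if not rest:
                yield time, value
            else:
                for last_time, tail_value in occurrences(rest, transactions, j + 1):
                    yield last_time, value + tail_value


def is_recent_monetary_subsequence(pattern, data_sequence, r_range, m_range):
    """True if some occurrence ends within [r_min, r_max) and has a total
    value within [m_min, m_max)."""
    (r_min, r_max), (m_min, m_max) = r_range, m_range
    return any(r_min <= t < r_max and m_min <= v < m_max
               for t, v in occurrences(pattern, to_transactions(data_sequence)))


# Assumption: the same data-sequence as in Example 3.2.
A = [("a", 1, 10), ("c", 3, 40), ("a", 4, 30), ("b", 4, 70),
     ("a", 6, 50), ("e", 6, 90), ("c", 10, 70)]
print(list(occurrences(["ab", "ae"], to_transactions(A))))  # [(6, 240)]
print(is_recent_monetary_subsequence(["ab", "ae"], A, (5, 8), (200, 250)))  # True
```

Unlike plain containment, the monetary test cannot rely on greedy matching, because different occurrences of the same pattern can have different values; the sketch therefore enumerates all occurrences.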

Definition 3.1 (f-pattern, rf-pattern, rfm-pattern)
Let B = [...] be a sequence of itemsets.

Call B an   | Data-sequences contain B as a | Denoted by                | Threshold
f-pattern   | subsequence                   | f-support or B.sup^f      | no less than minsup_min
rf-pattern  | recent subsequence            | rf-support or B.sup^rf    | no less than minsup_min
rfm-pattern | recent monetary subsequence   | rfm-support or B.sup^rfm  | between minsup_min and minsup_max
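The three supports in Definition 3.1 can be counted side by side, as in the sketch below. The toy database, the function names and the use of absolute counts (rather than percentages) are assumptions for illustration only; each data-sequence is given directly as a time-ordered list of (time, {item: price}) transactions.

```python
def occurrences(pattern, transactions, start=0):
    """Yield (time of last matched transaction, total price) for every
    occurrence of the non-empty `pattern` in transactions[start:]."""
    first, rest = pattern[0], pattern[1:]
    for j in range(start, len(transactions)):
        time, prices = transactions[j]
        if set(first) <= set(prices):
            value = sum(prices[item] for item in first)
            if not rest:
                yield time, value
            else:
                for last_time, tail in occurrences(rest, transactions, j + 1):
                    yield last_time, value + tail


def supports(pattern, database, r_range, m_range):
    """Return (f-support, rf-support, rfm-support) as absolute counts."""
    (r_min, r_max), (m_min, m_max) = r_range, m_range
    f = rf = rfm = 0
    for transactions in database:
        occs = list(occurrences(pattern, transactions))
        recent = [(t, v) for t, v in occs if r_min <= t < r_max]
        f += bool(occs)                                       # subsequence
        rf += bool(recent)                                    # recent subsequence
        rfm += any(m_min <= v < m_max for _, v in recent)     # recent monetary
    return f, rf, rfm


# Hypothetical toy database (an assumption, not the one on the slides).
toy_db = [
    [(2, {"a": 100}), (12, {"b": 80})],   # recent, value 180
    [(5, {"a": 100}), (15, {"b": 200})],  # recent, value 300 (too high)
    [(1, {"a": 100}), (4, {"b": 80})],    # not recent
    [(3, {"b": 80}), (11, {"a": 100})],   # b precedes a: no occurrence
]
print(supports(["a", "b"], toy_db, (10, 21), (150, 250)))  # (3, 2, 1)
```

With minsup_min = 2 and minsup_max = 4 on this toy data, <(a)(b)> would qualify as an f-pattern and an rf-pattern, but not as an rfm-pattern, since its rfm-support of 1 falls below minsup_min.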

Example 3.4 (RFM pattern)
Given a data-sequence DB (columns: Sid, Sequence) [...] and six thresholds:
– R: Rtime_min = 10 ≤ [...] < Rtime_max = 21
– M: M_min = 150 ≤ [...] < M_max = 250
– F: Minsup_min = 2 ≤ [...] < Minsup_max = 4
The RFM-patterns are listed as follows:
– Containing 1 itemset = { [...] }
– Containing 2 itemsets = { [...] }
– Containing 3 itemsets = { [...] }
– Containing 4 itemsets = { [...] }

RFM-Apriori Algorithm
The RFM-Apriori algorithm is developed by modifying the well-known Apriori (GSP) algorithm.
GSP
– Puts all items into C_1, the set of candidate f-patterns with length 1, and then scans the database to find the frequent 1-patterns (L_1)
– Assuming the set of frequent (k-1)-patterns L_{k-1} is already known, generates the set of candidate f-patterns C_k by joining L_{k-1} with L_{k-1}
– Afterwards, scans the database to determine the supports of the patterns in C_k, and then finds L_k
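The level-wise loop described for GSP can be sketched in a deliberately simplified form. This is not the RFM-Apriori algorithm and not full GSP: it assumes every element of a sequence is a single item, omits the candidate pruning step, and uses hypothetical names (`contains`, `mine_levelwise`) and a hypothetical toy database.

```python
def contains(sequence, pattern):
    """True if `pattern` is a (not necessarily contiguous) subsequence."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # `in` consumes the iterator


def mine_levelwise(database, minsup):
    """Return {pattern: support} for all patterns with support >= minsup.
    Simplifying assumption: each element is a single item."""
    items = sorted({item for seq in database for item in seq})
    candidates = [(item,) for item in items]                 # C_1
    frequent = {}
    while candidates:
        # Scan the database to count the support of each candidate in C_k.
        counts = {c: sum(contains(seq, c) for seq in database) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= minsup}  # L_k
        frequent.update(level)
        # Join L_k with L_k: s1 and s2 join when s1 minus its first item
        # equals s2 minus its last item; the result is s1 plus s2's last item.
        candidates = [s1 + s2[-1:] for s1 in level for s2 in level
                      if s1[1:] == s2[:-1]]                   # C_{k+1}
    return frequent


# Hypothetical toy database of single-item sequences.
toy_db = [("a", "b", "c"), ("a", "c"), ("b", "a", "c")]
print(mine_levelwise(toy_db, minsup=2))
```

On this toy database with minsup = 2, the frequent patterns are the three single items plus <a c> and <b c>; no candidate of length 3 can be formed, so the loop stops after the second level.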

RFM-Apriori Algorithm (diagram)
– Apriori: all items → C_1 → L_1 → C_2 → L_2 → … → L_{k-1} → C_k → L_k. Candidate generation joins L_1 × L_1, …, L_{k-1} × L_{k-1}; support counting counts B.sup^f.
– RFM-Apriori: CI_1 (LI_1^f) → LI_1 (LI_1^f, LI_1^rf, LI_1^rfm) → CI_2 → LI_2 (LI_2^rf, LI_2^rfm) → … → LI_{k-1} (LI_{k-1}^rf, LI_{k-1}^rfm) → CI_k → LI_k (LI_k^rf, LI_k^rfm). Candidate generation joins LI_1^f × LI_1^rf for CI_2 and LI_{k-1}^rf × LI_{k-1}^rf for CI_k; support counting counts B.sup^rf and B.sup^rfm using an inverse candidate tree.
– CI_k denotes the set of candidate rf-patterns with length k in RFM-Apriori.

Example 4.1 (candidate generation: CI_2)
Suppose LI_1^f = { [...] } and LI_1^rf = { [...] }. Then CI_2 = { [...] }; the entries that survive in the transcript include <(ab)(b)> and <(bc)(b)>.
Illustration: LI_1^f joined with LI_1^rf [...]

Example 4.2 (candidate generation: CI_k, k > 2)
Suppose LI_3^rf = { [...] }. Then CI_4 = { [...] }.
Illustration: LI_3^rf joined with itself to produce CI_4 [...]

RFM-Apriori Algorithm: Example
Given a data-sequence DB and six thresholds, Rtime_min = 10, Rtime_max = 21, M_min = 150, M_max = 250, Minsup_min = 2 and Minsup_max = 4, find the patterns that satisfy the RFM constraints.

Tables: CI_1 and LI_1 [...]

Synthetic data parameters

Synthetic data parameter settings: |S| = 4, |I| = 1.25, N_S = 5000, N_I = 25,000, N = 10000, T_I = 10, H_price = 1000, M_price = 500, L_price = 100, H_quantity = 1, M_quantity = 3 and L_quantity = 1.

Real-life dataset: SC-POS
– The sales data of a supermarket chain in Taiwan.
– The SC-POS dataset recorded all transactions from twenty branches between 2001/12/27 and 2002/12/31.
– Each data-sequence in the SC-POS dataset is the list of one customer’s transactions, each of which records the purchase date and time and the purchased items.
– After a series of data preprocessing and cleaning tasks, the final dataset contained [...] items and [...] customers’ data-sequences.

Test 4.1: comparing the runtimes and numbers of patterns of the two algorithms
– Varying minsup_min from 1.25% to 0.5% in the synthetic datasets
– Varying minsup_min from 3.5% to 2.5% in the real-life dataset

Results of Test 4.1 (figure panels: SYN-DS1, SC-POS)
– RFM-Apriori uses a more complicated procedure to generate candidate patterns and compute supports
– RFM-Apriori generates fewer candidate and frequent patterns

Test 4.2: scalability test
During this test, we vary the value of a selected parameter and keep all the other parameters constant. In each test, one parameter is increased to determine how the algorithms scale as that parameter grows.
– The first test varies the number of customers, |D|, from 250,000 to 750,000
– The second varies the average number of transactions per customer, |C|, from 10 to 20
– The final one varies the average number of items bought per transaction, |T|, from 2.5 to 4.5


Longer sequences result in more patterns.

Test 4.3: how the runtime and the number of patterns respond to varying the following parameters
– Varying Rtime_min from 75 to 115
– Varying M_min from 1000 to 5000

CI_k = LI_{k-1}^rf × LI_{k-1}^rf

Test 4.4: comparing the numbers of three kinds of interesting patterns: (*F*), (RF*) and (RFM)

Table: number of patterns and percentage for the RF*, RFM and *F* pattern types (base synthetic dataset C10-T2.5-S4-I1.25). Rows: D=25, D=50, D=75; C=10, C=15, C=20; T=2.5, T=3.5, T=4.5; SC-POS. The minsup_min settings and the cell values are [...]

Test 4.5: segmenting the discovered patterns by RFM constraints
– Table: divisions of R, F and M [...]
– Table: RFM-segmentation (R-F-M, e.g. 1-1-1) and # of patterns [...]

Managerial Applications
– Growing patterns (RFM): A(BC) in segments 122, 233, 334, 445, 555
– Weakening patterns: A(BC) in segments 134, 233, 322, 421, 511
– Dead patterns: A(BC) in segments 123, 211
– Emerging patterns: A(BC) in segments 412, [...]

Managerial Applications (continued)
– Stable patterns: A(BC) in segments 132, 232, 332, 432, 532
– Sort all patterns with R=3 according to M
– Sort all patterns with R=3 according to F

Conclusion
– We have developed an efficient algorithm for mining frequent patterns that takes recency and monetary value into account.
– These two factors help users identify patterns that have been active recently and have high monetary value.
– The experiments showed that our approach is more efficient than the traditional GSP algorithm.

Thanks for your attention!