Applying Data Mining Technique to Direct Marketing
Advisor: Dr. Hsu
Student: Sheng-Hsuan Wang
Intelligent Database Systems Lab, Department of Information Management
國立雲林科技大學 National Yunlin University of Science and Technology
Outline
Motivation
Objective
Introduction
Background
The Generalized SOM
Experiments
Conclusions
Motivation
Firms hold huge amounts of complex marketing data and need to analyze it further in order to make more profit. Clustering, a data mining technique, is especially suitable for segmenting such data. However, a firm's database usually consists of mixed data (numeric and categorical attributes).
Objective
We utilize a new visualized clustering algorithm, the generalized self-organizing map (GSOM), to segment customer data for direct marketing.
─ Unlike the conventional SOM, the GSOM can reasonably express the relative distances between categorical values.
Applying the GSOM to direct marketing should therefore generate more profit.
Introduction (1/5)
Marketing practices have shifted from traditional mass marketing to customer-oriented marketing. Firms usually perform market segmentation and devise different marketing strategies for different segments.
Introduction (2/5)
Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from a huge amount of data. Cluster analysis can assist marketers in identifying clusters of customers with similar characteristics.
Introduction (3/5)
The self-organizing map (SOM) network, proposed by Kohonen, is a useful visualization tool in data mining.
─ Dimensionality reduction and information visualization
─ Preserves the original topological relationships
Introduction (4/5)
The conventional SOM handles categorical data by binary encoding, which transforms each categorical value into a set of binary values (see the sketch below).
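As a minimal illustration of this binary encoding, the sketch below one-hot encodes the Drink values used in the later examples; the function name is ours, not from the paper.

```python
# One-hot (binary) encoding of a categorical attribute, as the conventional
# SOM requires. The attribute values follow the paper's Drink example.
drinks = ["Coke", "Pepsi", "Mocca"]

def one_hot(value, categories):
    """Encode a categorical value as a binary vector."""
    return [1.0 if value == c else 0.0 for c in categories]

print(one_hot("Coke", drinks))   # [1.0, 0.0, 0.0]
print(one_hot("Pepsi", drinks))  # [0.0, 1.0, 0.0]
```

Under this encoding, every pair of distinct values ends up equally far apart, which is exactly the drawback discussed on the following slides.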
Introduction (5/5)
In this paper, we propose an extended SOM, named the generalized SOM (GSOM), to overcome this drawback in handling categorical data.
─ We construct a concept hierarchy for each categorical attribute.
Background (1/2)
Self-organizing map (SOM)
─ Find the winner (best-matching unit, BMU) by (1): $q = \arg\min_{j} \lVert x - m_j \rVert$
─ Update the winner and its neighborhood by (2): $m_j(t+1) = m_j(t) + h_{qj}(t)\,[x(t) - m_j(t)]$, with the neighborhood function (3): $h_{qj}(t) = \alpha(t)\exp\!\left(-\lVert r_q - r_j \rVert^2 / 2\sigma^2(t)\right)$
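For concreteness, here is a minimal sketch of one SOM training step, assuming Euclidean distance, a Gaussian neighborhood, and a flat array of map units; the function and variable names are ours.

```python
import numpy as np

def som_step(weights, positions, x, lr, sigma):
    """One SOM training step: find the BMU (Eq. 1) and pull its
    neighborhood toward the input (Eqs. 2-3)."""
    # (1) winner = unit whose weight vector is closest to the input x
    q = np.argmin(np.linalg.norm(weights - x, axis=1))
    # (3) Gaussian neighborhood around the winner on the map grid
    grid_dist2 = np.sum((positions - positions[q]) ** 2, axis=1)
    h = lr * np.exp(-grid_dist2 / (2.0 * sigma ** 2))
    # (2) move each unit toward the input, weighted by the neighborhood
    weights += h[:, None] * (x - weights)
    return q

# Usage: an 8x8 map of 3-dimensional units (shapes chosen for illustration)
rng = np.random.default_rng(0)
positions = np.array([(i, j) for i in range(8) for j in range(8)], dtype=float)
weights = rng.random((64, 3))
bmu = som_step(weights, positions, rng.random(3), lr=0.5, sigma=2.0)
```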
Background (2/2)
Problems of the conventional SOM
─ With binary encoding, every pair of distinct categorical values is equally distant: D(Coke, Pepsi) = D(Coke, Mocca) = D(Pepsi, Mocca).
The Generalized SOM
We use concept hierarchies to help calculate the distances between categorical values.
─ An input pattern and a GSOM map vector are mapped onto their associated concept hierarchies.
─ The distance between the input pattern and the GSOM vector is calculated by aggregating the distances of the mapped points in the hierarchies.
[Figure: an input pattern x and a map vector m_q of the Drink attribute (1 Coke, 2 Pepsi, 3 Mocca) mapped onto the concept hierarchy Any → {Carbonated, Coffee, Juice} → {Coke, Pepsi, Latte, Mocca, Apple, Orange}.]
Concept hierarchies (1/3)
[Figure: the Drink concept hierarchy, with general concepts near the root (level 0) and specific concepts at the leaves (level 2); every link weight is 1.]
With the hierarchy, D(Coke, Pepsi) < D(Coke, Mocca) = D(Pepsi, Mocca).
Concept hierarchies (2/3)
A point is written X = (N_X, d_X), where
─ N_X is an anchor (a leaf node) of point X
─ d_X is a positive offset, the distance from X to the root
Example: x = (Coke, 2.0); m_q = (Pepsi, 1.7).
[Figure: the input pattern x and the map vector m_q located on the Drink concept hierarchy.]
Concept hierarchies (3/3)
The distance between two points is the sum of their offsets from the root minus twice the length of the duplicated (shared) portion of their paths, as in Eqs. (4)-(5): |x − m_q| = d_x + d_{m_q} − 2·d_dup.
Example: for x = (Coke, 2.0) and m_q = (Pepsi, 1.7), the two paths share only the link from Any to Carbonated (length 1), so |x − m_q| = 2 + 1.7 − 2×1 = 1.7.
[Figure: d_x (red) and d_{m_q} (blue) drawn on the Drink concept hierarchy, with the duplicated segment marked.]
A code sketch of this computation is given below.
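The sketch below computes this distance on the Drink hierarchy shown above, assuming every link weight is 1 as in the paper; the parent map, the helper names, and the clamping of the duplicated length are our own reading of Eqs. (4)-(5), not verbatim from the slides.

```python
# Distance between two points, each given as (anchor leaf, offset from root),
# on the Drink concept hierarchy with unit link weights.
PARENT = {
    "Coke": "Carbonated", "Pepsi": "Carbonated",
    "Latte": "Coffee", "Mocca": "Coffee",
    "Apple": "Juice", "Orange": "Juice",
    "Carbonated": "Any", "Coffee": "Any", "Juice": "Any",
}

def root_path(anchor):
    """Nodes on the path from a leaf anchor up to the root."""
    path = [anchor]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(node):
    """Number of links between the root and the node."""
    return len(root_path(node)) - 1

def hier_distance(x, mq):
    """|x - m_q| = d_x + d_mq - 2 * d_dup, where d_dup is the depth of the
    deepest node shared by both anchors' root paths."""
    (anchor_x, d_x), (anchor_m, d_m) = x, mq
    shared = set(root_path(anchor_x)) & set(root_path(anchor_m))
    d_dup = max(depth(n) for n in shared)
    # the duplicated portion can never exceed either point's own offset
    d_dup = min(d_dup, d_x, d_m)
    return d_x + d_m - 2 * d_dup

print(hier_distance(("Coke", 2.0), ("Pepsi", 1.7)))  # 1.7, as in the example
print(hier_distance(("Coke", 2.0), ("Mocca", 2.0)))  # 4.0, a larger distance
```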
Experiments
Experiment datasets
─ Synthetic dataset: consists of 6 groups over two categorical attributes, Department and Drink.
─ Real dataset: Adult from the UCI repository, with 48,842 patterns of 15 attributes (8 categorical attributes, 6 numeric attributes, and 1 class attribute, Salary); 76% of the patterns have the value ≤50K.
Experiments
Parameters were set according to the suggestions in the software package SOM_PAK.
─ Categorical values are transformed into binary values when we train the SOM.
─ Mixed data are used directly when we train the GSOM; each link weight of the concept hierarchies is set to 1.
Synthetic dataset (1/2)

Group No  Department  Drink   Data Count  Common Pattern
1         EE          Coke    20          Engineering College & Carbonated Drinks
2         CE          Pepsi   10          Engineering College & Carbonated Drinks
3         MIS         Latte   20          Management College & Coffee Drinks
4         MBA         Mocca   10          Management College & Coffee Drinks
5         VC          Orange  20          Design College & Juice Drinks
6         SD          Apple   10          Design College & Juice Drinks

[Figures: concept hierarchies for the Department and Drink attributes.]
Synthetic dataset (2/2)
─ An 8×8 SOM network is used for training. After 900 training iterations, the trained maps of the SOM and the GSOM under the same parameters are shown below.
[Figures: trained maps of the binary SOM and the GSOM.]
Real dataset (1/3)
We randomly draw 10,000 patterns, of which 75.76% have Salary ≤50K, similar to the Salary distribution of the original Adult dataset. The following attributes are used:
─ Three categorical attributes: Marital-status, Relationship, and Education.
─ Four numeric attributes: Capital-gain, Capital-loss, Age, and Hours-per-week.
Real dataset (2/3)
─ Concept hierarchies for the categorical attributes are constructed as shown below.
[Figures: concept hierarchies for Relationship, Marital-status, and Education.]
Real dataset (3/3)
─ A 15×15 SOM network is used for training. After 50,000 iterations, the trained maps of the SOM and the GSOM under the same parameters are shown below.
[Figures: trained maps of the binary SOM and the GSOM.]
Distributions of the Salary attribute in each cluster

Group No  Data Count  No. of >50K  No. of ≤50K  Ratio of >50K  Ratio of ≤50K
7         2,626       1,287        1,339        49.01%         50.99%
6         1,950       794          1,156        40.72%         59.28%
3         1,753       216          1,537        12.32%         87.68%
1         1,045       59           986          5.65%          94.35%
2         1,133       48           1,085        4.24%          95.76%
4         740         12           728          1.62%          98.38%
5         734         8            726          1.09%          98.91%
All       9,981       2,424        7,557        24.29%         75.71%
Application to Direct Marketing (1/2)
After we use the GSOM to cluster the data, the segmented dataset can be further applied to catalog marketing. Suppose that
─ the cost of mailing a catalog is $2;
─ for customers whose salary is over 50K, we make an average profit of $10 per person;
─ otherwise, we make an average profit of $1 per person.
A sketch of the resulting profit calculation is given below.
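The sketch below applies these cost and profit assumptions to the Salary distribution table above. Interpreting the comparison as "mail the most profitable GSOM clusters" versus "mail the same number of randomly drawn customers" is our reading of the figures quoted on the next slide, not a formula stated in the slides.

```python
# A minimal sketch of the catalog-mailing profit model, using the cost and
# profit assumptions above and the per-cluster Salary counts from the table
# on the previous slide. The targeting strategy (mail the most profitable
# clusters vs. an equally sized random draw) is our assumption.
MAIL_COST, PROFIT_HIGH, PROFIT_LOW = 2.0, 10.0, 1.0

# group no: (no. of >50K customers, no. of <=50K customers)
clusters = {7: (1287, 1339), 6: (794, 1156), 3: (216, 1537), 1: (59, 986),
            2: (48, 1085), 4: (12, 728), 5: (8, 726)}

def net_profit(n_high, n_low):
    """Net profit of mailing a catalog to every customer in a group."""
    return n_high * PROFIT_HIGH + n_low * PROFIT_LOW - (n_high + n_low) * MAIL_COST

# Strategy 1: mail only the most profitable clusters found by the GSOM.
top = sorted(clusters.values(), key=lambda c: net_profit(*c), reverse=True)[:3]
targeted_profit = sum(net_profit(h, l) for h, l in top)
mailed = sum(h + l for h, l in top)

# Strategy 2: mail the same number of customers drawn at random, i.e. with
# the overall Salary distribution of the whole dataset.
all_high = sum(h for h, _ in clusters.values())
all_low = sum(l for _, l in clusters.values())
random_profit = net_profit(all_high, all_low) / (all_high + all_low) * mailed

print(f"targeted mailing: ${targeted_profit:,.0f}")  # $14,344
print(f"random mailing:   ${random_profit:,.0f}")    # $7,505
```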
Application to Direct Marketing (2/2)
[Figure: profit comparison between GSOM-based targeted mailing ($14,344) and mailing to the same number of randomly drawn customers ($7,505).]
Conclusions
In this paper, we propose a data clustering method, the GSOM.
─ The GSOM extends the conventional SOM and overcomes its drawback in handling categorical data by utilizing concept hierarchies.
─ The experimental results confirm that the GSOM reveals the cluster structure of the data better than the conventional SOM does.
─ Marketing based on the GSOM segmentation results yields more profit than marketing to customers randomly drawn from the customer database.
Q & A