Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Student : Sheng-Hsuan Wang Department.

Presentation transcript:

Applying Data Mining Technique to Direct Marketing
Advisor: Dr. Hsu
Student: Sheng-Hsuan Wang
Intelligent Database Systems Lab, Department of Information Management, National Yunlin University of Science and Technology (國立雲林科技大學)

Outline
• Motivation
• Objective
• Introduction
• Background
• The Generalized SOM
• Experiments
• Conclusions

Motivation
• Firms hold huge amounts of complex marketing data and need to analyze it further in order to increase profits.
• Clustering, a data mining technique, is especially suitable for segmenting data.
• However, a firm's database usually consists of mixed data (numeric and categorical).

Objective
• We use a new visualized clustering algorithm, the generalized self-organizing map (GSOM), to segment customer data for direct marketing.
─ Unlike the conventional SOM, the GSOM can reasonably express the relative distances between categorical values.
• We then show that applying the GSOM to direct marketing can generate more profit.

Introduction (1/5)
• Marketing practice has shifted from traditional mass marketing to customer-oriented marketing.
• Firms usually perform market segmentation and devise different marketing strategies for different segments.

Introduction (2/5)
• Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from large amounts of data.
• Cluster analysis can help marketers identify clusters of customers with similar characteristics.

Introduction (3/5)
• The self-organizing map (SOM) network, proposed by Kohonen, is a useful visualization tool in data mining.
─ Dimensionality reduction and information visualization
─ Preserves the original topological relationships

Introduction (4/5)
• How the SOM handles categorical data
─ It uses binary encoding, which transforms each categorical value into a set of binary values.

Introduction (5/5)
• In this paper, we propose an extended SOM, named the generalized SOM (GSOM), to overcome the drawback of the SOM in handling categorical data.
─ We construct a concept hierarchy for each categorical attribute.

Background (1/2)
• Self-organizing map (SOM)
─ Find the winner (BMU) by Eq. (1)
─ Update the winner and its neighborhood by Eq. (2)
[Equations (1) to (3)]
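The equations on this slide did not survive in the transcript, so the following is only a minimal sketch of the two steps as they are usually stated for a Kohonen SOM, not a reconstruction of the slide's Equations (1) to (3). The Euclidean winner search, the Gaussian neighborhood function, and the names `find_bmu`, `som_step`, `alpha`, and `sigma` are all assumptions made for illustration.

```python
import math

def find_bmu(weights, x):
    """Return the index of the unit whose weight vector is closest to x.

    Assumption: Euclidean distance is used to pick the winner (BMU).
    """
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def som_step(weights, positions, x, alpha, sigma):
    """Run one training step in place and return the BMU index.

    weights   -- list of weight vectors, one per map unit
    positions -- grid coordinates of each unit on the map
    alpha     -- learning rate (hypothetical parameter name)
    sigma     -- neighborhood width (assumed Gaussian neighborhood)
    """
    c = find_bmu(weights, x)
    for i, w in enumerate(weights):
        # Squared distance between unit i and the winner on the map grid.
        grid_d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[c]))
        h = math.exp(-grid_d2 / (2 * sigma ** 2))
        # Move the unit toward the input, scaled by learning rate and neighborhood.
        weights[i] = [wi + alpha * h * (xi - wi) for wi, xi in zip(w, x)]
    return c
```

The winner moves furthest toward the input pattern, and units farther away on the map grid move less, which is what lets the trained map preserve topological relationships.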

Background (2/2)
• Problems of the conventional SOM
─ With binary encoding, every pair of distinct categorical values is equally far apart: D(Coke, Pepsi) = D(Coke, Mocca) = D(Pepsi, Mocca)
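The equality above can be checked with a small sketch. It assumes that the binary encoding is a one-hot code (one binary attribute per category) and that distances are Euclidean; the helper names `one_hot` and `binary_distance` are hypothetical.

```python
import math

def one_hot(value, categories):
    """Encode a categorical value as a list of binary values."""
    return [1.0 if value == c else 0.0 for c in categories]

drinks = ["Coke", "Pepsi", "Mocca"]
codes = {d: one_hot(d, drinks) for d in drinks}

def binary_distance(a, b):
    """Euclidean distance between the binary codes of two drinks."""
    return math.dist(codes[a], codes[b])

for pair in [("Coke", "Pepsi"), ("Coke", "Mocca"), ("Pepsi", "Mocca")]:
    print(pair, binary_distance(*pair))
```

Every pair differs in exactly two positions, so all three distances are the same and the encoding cannot express that Coke is more similar to Pepsi than to Mocca.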

The Generalized SOM
• We use concept hierarchies to help calculate the distances between categorical values.
─ An input pattern and the GSOM vector are mapped to their associated concept hierarchies.
─ The distance between the input pattern and the GSOM vector is calculated by measuring the aggregated distance between their mapping points in the hierarchies.
[Figure: an input pattern x and a GSOM vector m_q from the SOM network mapped onto the Drink concept hierarchy: Any → Juice (Orange, Apple), Coffee (Mocca, Latte), Carbonated (Coke, Pepsi). Input patterns (ID, Drink): 1 Coke, 2 Pepsi, 3 Mocca.]

Concept hierarchies (1/3)
• A hierarchy runs from general concepts to specific concepts.
• With a concept hierarchy: D(Coke, Pepsi) < D(Coke, Mocca) = D(Pepsi, Mocca)

Concept hierarchies (2/3)
• A point X = (N_X, d_X)
─ N_X: the anchor (leaf node) of point X
─ d_X: a positive offset (distance) from X to the root
• Example: x = (Coke, 2.0); m_q = (Pepsi, 1.7)
[Figure: x and m_q placed on the Drink concept hierarchy; input patterns (ID, Drink): 1 Coke, 2 Pepsi, 3 Mocca]

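Concept hierarchies (3/3)
• Example: x = (Coke, 2.0); m_q = (Pepsi, 1.7)
─ The paths d_x (red) and d_m_q (blue) both start at the root, so the part they share (the duplication, of length 1 from Any to Carbonated) is subtracted twice:
|x − m_q| = 2.0 + 1.7 − 2×1 = 1.7
[Equations (4) and (5)]
[Figure: x and m_q placed on the Drink concept hierarchy, with the duplicated path segment marked]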
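The worked example can be reproduced with a short sketch. Equations (4) and (5) are not preserved in the transcript, so the rule used here (add both offsets, then subtract twice the depth of the deepest point the two root paths share) is an assumption inferred from the example, and `PARENT`, `path_from_root`, and `hierarchy_distance` are hypothetical names. All link weights are taken to be 1.

```python
# Parent links of the Drink hierarchy; every link is assumed to have weight 1.
PARENT = {
    "Juice": "Any", "Coffee": "Any", "Carbonated": "Any",
    "Orange": "Juice", "Apple": "Juice",
    "Mocca": "Coffee", "Latte": "Coffee",
    "Coke": "Carbonated", "Pepsi": "Carbonated",
}

def path_from_root(node):
    """Return the list of nodes from the root down to the given node."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]

def hierarchy_distance(x, m):
    """Distance between two points (anchor, offset) on the hierarchy.

    Assumed rule: d_x + d_m - 2 * (depth of the deepest shared point).
    """
    (anchor_x, d_x), (anchor_m, d_m) = x, m
    shared_nodes = 0
    for a, b in zip(path_from_root(anchor_x), path_from_root(anchor_m)):
        if a != b:
            break
        shared_nodes += 1
    # With unit link weights, depth equals the number of links from the root.
    shared_depth = shared_nodes - 1
    # A point cannot share more of the path than its own offset from the root.
    d_common = min(shared_depth, d_x, d_m)
    return d_x + d_m - 2 * d_common

print(round(hierarchy_distance(("Coke", 2.0), ("Pepsi", 1.7)), 6))
print(hierarchy_distance(("Coke", 2.0), ("Mocca", 2.0)))
```

Under these assumptions the leaf values Coke and Pepsi are 2 apart, while Coke and Mocca are 4 apart, which matches the ordering D(Coke, Pepsi) < D(Coke, Mocca) = D(Pepsi, Mocca).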

Experiments
• Experiment datasets
─ Synthetic dataset: 6 groups over two categorical attributes, Department and Drink.
─ Real dataset: Adult, from the UCI repository.
  48,842 patterns with 15 attributes: 8 categorical attributes, 6 numeric attributes, and 1 class attribute, Salary.
  76% of the patterns have a Salary value of ≤50K.

Experiments
• Parameters were set according to the suggestions of the SOM_PAK software package.
─ To train the SOM, categorical values are transformed into binary values.
─ To train the GSOM, the mixed data are used directly. Every link weight in the concept hierarchies is set to 1.

Synthetic dataset (1/2)

Group No  Department  Drink   Data Count  Common Pattern
1         EE          Coke    20          Engineering College & Carbonated Drinks
2         CE          Pepsi   10          Engineering College & Carbonated Drinks
3         MIS         Latte   20          Management College & Coffee Drinks
4         MBA         Mocca   10          Management College & Coffee Drinks
5         VC          Orange  20          Design College & Juice Drinks
6         SD          Apple   10          Design College & Juice Drinks

[Figure: concept hierarchies for Department and Drink]

Synthetic dataset (2/2)
─ An 8×8 SOM network is used for training. After 900 training iterations, the trained maps of the SOM and the GSOM, obtained under the same parameters, are shown below.
[Figure: trained maps of the binary SOM and the GSOM]

Real dataset (1/3)
• We randomly draw 10,000 patterns, of which 75.76% have Salary ≤50K, similar to the Salary distribution of the original Adult dataset.
─ Three categorical attributes: Marital-status, Relationship, and Education.
─ Four numeric attributes: Capital-gain, Capital-loss, Age, and Hours-per-week.

Real dataset (2/3)
─ The concept hierarchies for the categorical attributes are constructed as shown below.
[Figure: concept hierarchies for Relationship, Marital-status, and Education]

Real dataset (3/3)
─ A 15×15 SOM network is used for training. After 50,000 iterations, the trained maps of the SOM and the GSOM, obtained under the same parameters, are shown below.
[Figure: trained maps of the binary SOM and the GSOM]

Distributions of the Salary attribute in each cluster

Group No  Data Count  No. of >50K  No. of ≤50K  Ratio of >50K  Ratio of ≤50K
7         2,626       1,287        1,339        49.01%         50.99%
6         1,…         …            …            40.72%         59.28%
3         1,…         …            …            12.32%         87.68%
1         1,…         …            …            5.65%          94.35%
2         1,133       48           1,085        4.24%          95.76%
…         …           …            …            1.62%          98.38%
…         …           …            …            1.09%          98.91%
All       9,981       2,424        7,557        24.29%         75.71%

Application to Direct Marketing (1/2)
• After clustering the data with the GSOM, the segmented dataset can be applied to catalog marketing.
• Suppose that
─ Mailing a catalog costs $2.
─ A customer whose salary is over 50K yields an average profit of $10.
─ Any other customer yields an average profit of $1.
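The assumptions above translate into a simple per-segment calculation, sketched below. It assumes the $10 and $1 figures are earned per mailed customer before the $2 mailing cost is subtracted; the function name `mailing_profit` is hypothetical. The counts come from the Salary distribution table, where the ≤50K counts for groups 7 and 2 are taken as the data count minus the >50K count.

```python
MAIL_COST = 2      # cost of mailing one catalog, in dollars
PROFIT_HIGH = 10   # average profit from a customer with salary >50K
PROFIT_LOW = 1     # average profit from any other customer

def mailing_profit(n_high, n_low):
    """Net profit of mailing a catalog to every customer in a segment.

    Assumption: profits are gross, and the mailing cost is paid per customer.
    """
    revenue = PROFIT_HIGH * n_high + PROFIT_LOW * n_low
    return revenue - MAIL_COST * (n_high + n_low)

# (number of >50K customers, number of <=50K customers) per segment
segments = {
    "group 7": (1287, 1339),
    "group 2": (48, 1085),
    "all": (2424, 7557),
}
for name, (n_high, n_low) in segments.items():
    print(name, mailing_profit(n_high, n_low))
```

Under this reading a segment is worth mailing only when more than one in nine of its customers earns over 50K, which is why mailing the cluster with the highest >50K ratio is profitable while mailing group 2 loses money.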

Application to Direct Marketing (2/2)
[Figure: profit comparison; the two labeled values are $14,344 and $7,505]

Conclusions
• In this paper, we propose a data clustering method.
─ The GSOM extends the conventional SOM and overcomes its drawback in handling categorical data by using concept hierarchies.
─ The experimental results confirm that the GSOM reveals the cluster structure of the data better than the conventional SOM does.
─ Marketing based on the GSOM segmentation results yields more profit than marketing to customers drawn at random from the customer database.

Q & A