2002/4/10 IDSL seminar
Estimating Business Targets
Advisor: Dr. Hsu
Graduate: Yung-Chu Lin
Data Source: Datta et al., KDD01, pp. 420-425.


Abstract
- Proposes a new solution to the classical econometric task of frontier analysis
- Combines nearest neighbor methods with classical statistical methods
- Identifies under-marketed customers
- Benchmarks regional directory divisions

Outline
- Motivation
- Objective
- Historical approaches
- Target estimation methodology
- Case study
- Conclusion
- Personal opinion

Motivation
- Setting targets is a critical business task
- Traditionally, the target of each entity is set to the average across all entities
- Two challenges:
  – The characteristics of the entities heavily influence the outcome
  – The problem is inherently unsupervised

Objective
- Provide a methodology for estimating unsupervised maximal or minimal targets
- Set revenue target expectations for individual customers
- Set revenue targets for regional yellow pages directories

Historical Approaches
- Mathematical programming
- Economics

Mathematical Programming
- Computes the target for x_i, the attribute vector of the ith observation, by solving a mathematical program over the observed data
- Sensitive to errors and outliers, since it assumes that all observed targets define the feasible space

Economics
- Models each observed value as deviating from a frontier function g(x_i) by a non-negative error term
- Requires a model both for the error term and for g

Target Estimation Methodology
- Nearest neighbor vs. clustering
- The neighborhoods
- The distance function
- Target estimation from the neighborhoods
- A heuristic for comparing neighborhoods

Nearest Neighbor vs. Clustering
- Time complexity
  – Clustering is cheaper than nearest neighbor
- Problem with clustering
  – Two similar entities may fall into different clusters
  – The higher the dimension, the more serious this problem becomes
  – Nearest neighbor does not suffer from this problem, since every entity gets its own neighborhood

The Neighborhoods
- x_i: the ith observation
- y_i: the variable containing its target value
- n_i: the neighborhood of x_i, i.e., a set of observations {x_i, x_j, …}
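A minimal sketch of building the neighborhood n_i, assuming plain Euclidean distance and assuming n_i holds x_i itself plus its k nearest other observations (the helper names `euclidean` and `neighborhood` are hypothetical, not taken from the paper):

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def neighborhood(i, X, k, dist=euclidean):
    """Hypothetical n_i: index i itself followed by the indices of its k nearest other observations."""
    others = sorted((j for j in range(len(X)) if j != i), key=lambda j: dist(X[i], X[j]))
    return [i] + others[:k]

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
print(neighborhood(0, X, k=2))  # [0, 1, 2]
```

Unlike a clustering, each observation gets its own neighborhood, so two similar entities are never separated by a cluster boundary.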

The Distance Function
- Continuous attributes: standardize first, e.g., (2, 1) vs. (3, 4)
- Nominal attributes: e.g., (a, b) vs. (a, c)
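The exact distance formula from the slide is not recoverable, so the following is only a sketch of one common mixed-attribute choice, assuming z-score standardization, squared differences for continuous attributes, and a 0/1 mismatch (overlap) score for nominal attributes. The names `standardize` and `mixed_distance` are hypothetical:

```python
import math
import statistics

def standardize(column):
    """Z-score a continuous column; a constant column maps to all zeros."""
    mu = statistics.mean(column)
    sd = statistics.pstdev(column)
    if sd == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sd for v in column]

def mixed_distance(cont_a, cont_b, nom_a, nom_b):
    """Hypothetical mixed distance: continuous values (assumed already standardized)
    add squared differences; nominal values add 0 on a match and 1 on a mismatch."""
    cont = sum((a - b) ** 2 for a, b in zip(cont_a, cont_b))
    nom = sum(0 if a == b else 1 for a, b in zip(nom_a, nom_b))
    return math.sqrt(cont + nom)

# The slide's two examples, treated separately
print(mixed_distance((2, 1), (3, 4), (), ()))          # sqrt(10), about 3.162
print(mixed_distance((), (), ("a", "b"), ("a", "c")))  # 1.0
```

Standardizing first keeps an attribute with a large numeric range from dominating the sum.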

Target Estimation From the Neighborhoods
- Let y_i(1), y_i(2), …, y_i(k) be the order statistics of the target values in n_i, so that y_i(1) is the largest
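As a sketch of how the order statistics could feed a maximal target, the code below assumes the target is the r-th largest value y_i(r) in the neighborhood. Both the function names and the parameter `r` are hypothetical; with r = 1 the target is simply the neighborhood maximum y_i(1):

```python
def order_statistics(values):
    """Sort neighborhood targets descending, so index 0 holds y_i(1), the largest."""
    return sorted(values, reverse=True)

def estimate_target(neighborhood_idx, y, r=1):
    """Hypothetical maximal target: the r-th largest y among the neighborhood members.
    r > 1 is an assumed way to damp a single outlying neighbor; r is capped at the neighborhood size."""
    ordered = order_statistics([y[j] for j in neighborhood_idx])
    return ordered[min(r, len(ordered)) - 1]

y = [10.0, 40.0, 25.0, 90.0]
n_0 = [0, 1, 2]  # neighborhood of observation 0
print(order_statistics([y[j] for j in n_0]))  # [40.0, 25.0, 10.0]
print(estimate_target(n_0, y))                # 40.0
print(estimate_target(n_0, y, r=2))           # 25.0
```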

A Heuristic for Comparing Neighborhoods
- Maximal frontier: E(x_i) ranges from 0 to 1
- Minimal frontier: E(x_i) ≥ 1
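One definition consistent with these ranges is the ratio E(x_i) = y_i / target, assuming positive values and a neighborhood that contains x_i. Under that assumption the maximal target (the neighborhood maximum) gives 0 ≤ E ≤ 1, and the minimal target (the neighborhood minimum) gives E ≥ 1. The ratio form and the name `efficiency` are assumptions for illustration, not confirmed by the slide:

```python
def efficiency(y_i, target):
    """Hypothetical comparison score E(x_i) = y_i / target; assumes target > 0."""
    if target <= 0:
        raise ValueError("target must be positive")
    return y_i / target

y = [25.0, 40.0, 10.0]  # target values in one neighborhood n_i; x_i is index 0
max_target = max(y)     # maximal frontier: y_i <= max, so E lies in [0, 1]
min_target = min(y)     # minimal frontier: y_i >= min, so E >= 1
print(efficiency(y[0], max_target))  # 0.625
print(efficiency(y[0], min_target))  # 2.5
```

An E(x_i) well below 1 on the maximal frontier would flag an entity performing far under its comparable neighbors.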

Case Study
- Target revenues for directory book advertisers
- Target revenue for regional directories

(1) Target Revenues for Directory Book Advertisers
- Goal
  – Find businesses whose spending is low relative to businesses with otherwise similar characteristics
- Three categories of data available
  – Advertiser: e.g., number of employees
  – Directory: e.g., distribution size
  – Market: e.g., median household income

Calculating Nearest Neighbors
- Standardize continuous data by taking the natural log
- K = 4
- Weight the variables equally
  – But decrease the weights for many of the directory and market variables
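A sketch of these settings, assuming a weighted Euclidean distance over log-transformed values. The three columns, the toy rows, and the 0.5 down-weight for directory and market variables are all hypothetical stand-ins; the actual weights are not given on the slide:

```python
import math

def log_standardize(rows):
    """Apply the natural log to every continuous value (assumes all values > 0)."""
    return [[math.log(v) for v in row] for row in rows]

def weighted_distance(a, b, w):
    """Weighted Euclidean distance; a smaller weight shrinks a variable's influence."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))

def k_nearest(i, rows, w, k=4):
    """Indices of the k nearest other rows to row i under the weighted distance."""
    others = [j for j in range(len(rows)) if j != i]
    return sorted(others, key=lambda j: weighted_distance(rows[i], rows[j], w))[:k]

# Hypothetical columns: [employees (advertiser), distribution size (directory), median income (market)]
raw = [[10, 50_000, 40_000], [12, 48_000, 41_000], [200, 50_000, 40_000],
       [9, 52_000, 39_000], [11, 300_000, 90_000], [15, 45_000, 42_000]]
weights = [1.0, 0.5, 0.5]  # hypothetical: down-weight directory and market variables
rows = log_standardize(raw)
print(k_nearest(0, rows, weights, k=4))  # [3, 1, 5, 4]
```

Taking the log compresses heavy-tailed quantities such as revenue or distribution size, so a business with 200 employees is not treated as twenty times "farther" than one with 10.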

Figure: Distribution of E(x) for advertisers

Figure: A decision tree to predict phi - xi

(2) Target Revenue for Regional Directories
- Goal
  – Benchmark regional directory divisions
- Split the data into two sets
  – Training set: 80%
  – Test set: 20%
- K = 4

Book Type
- System book
  – Covers an entire serving area
- System-neighborhood book
  – Covers a smaller number of geographic areas within the franchise area
- Neighborhood book
  – Covers areas outside the telephone company's franchise area

Figure: Four different distributions, labeled according to the legend

Figure: E(x) for neighborhood books, system books, and non-system books. The x-axis shows log(distribution) and the y-axis shows E(x).

Conclusion
- Presents a general data mining methodology for estimating business targets through frontier analysis
- First case
  – Increases sales focus on under-marketed customers
  – Increases potential revenue by several million dollars
- Second case
  – Estimates optimal revenue performance targets for directory divisions
  – The potential increase for directory books is at least several million dollars

Personal Opinion
- Combining several existing methodologies or disciplines can produce a powerful new one