Eco 5385 Predictive Analytics For Economists Spring 2014 Professor Tom Fomby Director, Richard B. Johnson Center for Economic Studies Department of Economics SMU
Presentation 6 The K-Nearest-Neighbors Model For Prediction or Classification Chapter 7 in SPB
OUTLINE I. K-NN: A Nonparametric Method A. No Parameters to Estimate as in Multiple Linear Regression B. Definition of Euclidean Distance between Vectors C. Recommendation: Standardize Input Variables before Proceeding
OUTLINE II. Un-Weighted Nearest Neighbor Scores: Simple Average of Training Neighbors Output Values III. Weighted Nearest Neighbor Scores: Weighted Average of Training Neighbors Output Values IV. Therefore K-NN is a sophisticated Step- Function Predictor that relies on an Average of Neighborhood Output Values taken from the Training Data Set
OUTLINE V. In the K-NN Prediction Problem the Neighborhood Size, K, is the tuning parameter VI. In the K-NN Classification Problem there are two tuning parameters: The Neighborhood Size and the cut-off probability for choice selection VII. The K-NN Tuning Parameters are Often Chosen so as to maximize the accuracy of scoring the Validation Data Set
Now for a Discussion of the Various Parts of this Outline go to the pdf file K-NN Method.pdf
Classroom Exercise: Exercise 4