Comparison of Instance-Based Techniques for Learning to Predict Changes in Stock Prices
iCML Conference, December 10, 2003
Presented by: David LeRoux

Goals of Paper
- Analyze k-nearest neighbor classification methods
- Use them to predict whether the S&P 500 stock index will increase by more than the median amount in a month
- Run the experiments using WEKA
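
The prediction target on this slide can be sketched as a simple labeling step. This is a minimal illustration, assuming the median is taken over the same set of monthly changes being labeled and that a "change" is a plain numeric difference; the function name and the sample values are hypothetical and are not data from the paper.

```python
from statistics import median

def label_above_median(monthly_changes):
    """Return 1 where a month's change exceeds the median change, else 0."""
    cutoff = median(monthly_changes)
    return [1 if change > cutoff else 0 for change in monthly_changes]

# Hypothetical monthly index changes, for illustration only.
changes = [0.8, -1.2, 2.5, 0.1, -0.4, 1.9]
labels = label_above_median(changes)
```

Splitting at the median gives two classes of roughly equal size, so a classifier that always guesses one class would be right about half the time.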

The Data
- 14 indexes from the Federal Reserve
  - interest rates
  - business conditions
  - employment
- 11.5 years of monthly data through 6/2003
- Actual values and month-to-month changes
- 28 features, 138 observations
- Timing issue: when are indexes published?
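
The feature count above follows from pairing each index's actual value with its month-to-month change (14 indexes give 28 features). The sketch below shows one way such rows could be built; the function name, the dictionary layout, and the sample series are assumptions made for illustration, not the paper's actual preprocessing.

```python
def build_features(series_by_index):
    """Build one feature row per month: actual values, then monthly changes.

    series_by_index maps an index name to its list of monthly values.
    The first month is skipped because it has no previous value to diff against.
    """
    names = sorted(series_by_index)
    n_months = len(series_by_index[names[0]])
    rows = []
    for t in range(1, n_months):
        levels = [series_by_index[name][t] for name in names]
        deltas = [series_by_index[name][t] - series_by_index[name][t - 1]
                  for name in names]
        rows.append(levels + deltas)
    return rows

# Two hypothetical indexes over three months.
sample = {"rate": [5.0, 5.25, 5.5], "jobs": [100.0, 101.0, 99.0]}
rows = build_features(sample)
```

The sketch does not address the timing issue: if an index for a given month is published only after that month ends, its row would need to be paired with a later target to avoid look-ahead.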

Instance-Based Classifiers
- Lazy learners: don’t develop a representation of the data
- Simple rule: classify the same way as similar situations in the training data
- Problems:
  - What is similar?
  - How many neighbors?
  - How to weight contributions?
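
The simple rule and the three open questions can be made concrete with a small k-nearest neighbor sketch. It assumes Euclidean distance for "similar", a caller-supplied `k` for "how many", and an optional inverse-distance vote for "how to weight"; these are common choices picked for illustration, not necessarily the settings used in the paper or in WEKA.

```python
import math
from collections import defaultdict

def knn_classify(train_X, train_y, query, k=3, weighted=False):
    """Classify query by a vote among its k nearest training points."""
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbors:
        # Inverse-distance weighting lets closer neighbors count for more;
        # the small constant guards against division by zero.
        votes[label] += 1.0 / (distance + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)

# Hypothetical two-feature training set with two classes.
train_X = [(0, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = [0, 0, 1, 1, 1]
plain = knn_classify(train_X, train_y, (1, 1), k=5)
by_distance = knn_classify(train_X, train_y, (1, 1), k=5, weighted=True)
```

With `k=5` the unweighted vote goes to the larger but more distant class, while the weighted vote follows the two nearby points, which shows why the choice of `k` and the weighting scheme interact.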

Similarity Metric
- Curse of dimensionality
- Identifying most important features
- Normalizing data
- Distance measurement
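
Normalization and distance measurement can be sketched together. The example assumes z-score scaling and a per-feature weighted Euclidean distance, where the weights stand in for feature importance; both are illustrative choices, and the function names are hypothetical rather than taken from the paper.

```python
import math
from statistics import mean, pstdev

def zscore_columns(rows):
    """Rescale each column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 to avoid dividing by zero.
    spreads = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, spreads)]
            for row in rows]

def weighted_distance(a, b, weights):
    """Euclidean distance with a non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Two hypothetical features on very different scales.
rows = [[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]]
scaled = zscore_columns(rows)
```

Without scaling, the second feature would dominate any distance purely because of its units. Setting a weight to zero drops a feature entirely, which is one simple way to reduce dimensionality once the important features are known.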

Number of Observations
- Trade-off between noise reduction and homogeneity
- Formula for estimating k
  - Estimate noise error using the Central Limit Theorem
  - Estimate heterogeneity error using bounds on the derivative of the function being estimated
  - Choose k where these errors are roughly equal
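
The slide does not state the formula itself, so the following is only one possible reading of the balancing idea, not the paper's formula. It assumes the noise error of an average over k neighbors shrinks like `sigma / sqrt(k)` (the Central Limit Theorem rate) and that the heterogeneity error is bounded by `slope_bound` times the distance to the k-th neighbor (a bound on the derivative times how far away the neighbor is). The names `choose_k`, `sigma`, and `slope_bound` are hypothetical.

```python
import math

def choose_k(sorted_distances, sigma, slope_bound):
    """Return the k at which the two error estimates are closest to equal.

    sorted_distances: distances from the query to its neighbors, ascending.
    sigma: assumed standard deviation of the label noise.
    slope_bound: assumed bound on the derivative of the target function.
    """
    best_k, best_gap = 1, float("inf")
    for k in range(1, len(sorted_distances) + 1):
        noise_error = sigma / math.sqrt(k)
        heterogeneity_error = slope_bound * sorted_distances[k - 1]
        gap = abs(noise_error - heterogeneity_error)
        if gap < best_gap:
            best_k, best_gap = k, gap
    return best_k

# Hypothetical neighbor distances and error parameters.
k = choose_k([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], sigma=0.5, slope_bound=0.5)
```

Small k leaves the noise term large, while large k reaches out to distant and less similar neighbors; the chosen k sits where the two estimates cross.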
Results