The GDB Cup: Applying “Real World” Financial Data Mining in an Academic Setting
Gary D. Boetticher
University of Houston - Clear Lake
Houston, Texas, USA

What is the GDB Cup?
Modeled after the KDD Cup
Start with $100,000 + Financial Data + Data Mining Techniques = Make As Much Money as Possible

Motivation
Availability of Data
Gain Experience with DM Process
Synthesize ML + Domain Knowledge
Pragmatic implications

Availability of Data
Different Time Series Perspectives
–1 minute to monthly
Different Financial Instruments
–Stocks, Futures, Options, Mutual Funds
Large Sample Size
– Stocks (Daily, 2.5 Years)
–EMini Future (5 Minute, 2 Years)
Inexpensive or Free Sources
–
–
–Screen Scraping (finance.yahoo.com)

DM Process: Data Cleansing
Low = 0
Volume = 0
Missing Data (e.g. no Open)
Missing Time Periods

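The four checks on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's actual cleansing code: the `Bar` record, its field names, the `find_problems` helper, and the fixed `interval` between bars are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Bar:
    """One price bar. Field names are hypothetical, chosen for this sketch."""
    time: datetime
    open: Optional[float]
    high: Optional[float]
    low: Optional[float]
    close: Optional[float]
    volume: Optional[int]


def find_problems(bars: List[Bar], interval: timedelta) -> List[str]:
    """Return one message per suspicious record or gap, in time order."""
    problems = []
    previous = None
    for bar in sorted(bars, key=lambda b: b.time):
        # Missing time periods: consecutive bars further apart than expected.
        if previous is not None and bar.time - previous.time > interval:
            problems.append(f"{previous.time} -> {bar.time}: missing time periods")
        # Missing data: any field that was never filled in (e.g. no Open).
        fields = {"open": bar.open, "high": bar.high, "low": bar.low,
                  "close": bar.close, "volume": bar.volume}
        missing = [name for name, value in fields.items() if value is None]
        if missing:
            problems.append(f"{bar.time}: missing {', '.join(missing)}")
        # Zero values that usually signal a bad record rather than a real trade.
        if bar.low == 0:
            problems.append(f"{bar.time}: low = 0")
        if bar.volume == 0:
            problems.append(f"{bar.time}: volume = 0")
        previous = bar
    return problems
```

Calling `find_problems(bars, timedelta(minutes=5))` on 5-minute data returns the list of messages to review. For real market data the gap check would also need a trading calendar, since weekends, holidays, and overnight closes would otherwise be reported as missing periods.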
Build Models (Synthesize ML & Domain Knowledge)
Tech. Analysis
–Moving Averages, RSI, MACD, Stochastics, PNF, etc.
Machine Learners
–Supervised NN, GP, SVM, Neuro Fuzzy, SOM, ILP, etc.

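One way to combine the two halves of this slide is to turn technical indicators into input features for a supervised learner. The sketch below is an illustration under stated assumptions, not the method used in the course: it computes a simple moving average and a plain-average RSI variant (not Wilder's smoothed version), then pairs them with a next-period up/down label. The function names, the window size, and the choice of features and label are all hypothetical.

```python
from typing import List, Optional, Tuple


def simple_moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Mean of the last `window` prices; None until enough history exists."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def simple_rsi(prices: List[float], window: int) -> List[Optional[float]]:
    """RSI from plain sums of gains and losses over the last `window` changes."""
    out: List[Optional[float]] = [None] * len(prices)
    changes = [b - a for a, b in zip(prices, prices[1:])]
    for i in range(window, len(prices)):
        recent = changes[i - window:i]
        gain = sum(c for c in recent if c > 0)
        loss = -sum(c for c in recent if c < 0)
        if gain + loss == 0:
            out[i] = 50.0  # flat prices: neutral value, a convention of this sketch
        else:
            out[i] = 100.0 * gain / (gain + loss)
    return out


def build_rows(prices: List[float], window: int) -> List[Tuple[Tuple[float, float], int]]:
    """Pair (price / SMA, RSI) features with a next-period up (1) or down (0) label."""
    sma = simple_moving_average(prices, window)
    rsi = simple_rsi(prices, window)
    rows = []
    for i in range(window, len(prices) - 1):
        features = (prices[i] / sma[i], rsi[i])
        label = 1 if prices[i + 1] > prices[i] else 0
        rows.append((features, label))
    return rows
```

The rows returned by `build_rows` could then be fed to any of the learners listed on the slide. The label deliberately uses only the following period's price, so no feature looks ahead of the bar it describes.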
Validating Models
Statistical Validation
Financial Validation
Ignore Market Conditions (Buy & Hold)
Start Date Value
End Date Value
Unrealistic Conditions (e.g. Drawdown)
Standardize portfolio management
Validate with EXCEL models

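Two of the financial checks named on this slide reduce to short calculations. The sketch below assumes that buy and hold is measured from the start-date and end-date values alone, and that drawdown means the largest peak-to-trough fall in account value; both function names and these readings of the slide are assumptions made for illustration.

```python
from typing import List


def buy_and_hold_return(start_value: float, end_value: float) -> float:
    """Return from buying on the start date and selling on the end date."""
    return (end_value - start_value) / start_value


def max_drawdown(equity: List[float]) -> float:
    """Largest fractional drop from a running peak in an account-value series."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                     # highest value seen so far
        worst = max(worst, (peak - value) / peak)   # fall from that peak
    return worst
```

A model's return can be compared against `buy_and_hold_return` over the same dates to see how much of the result came from the market simply rising, and a large `max_drawdown` flags a strategy that few real accounts could have survived even if its final ROI is high.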
Results - 1
Fall /31/99 - 5/31/ stocks: Annual ROI = 270%
Spring /31/99 - 5/31/ stocks: Annual ROI = 310%
Fall /14/02 - 6/12/03 S&P EMini (5 Min.): Annual ROI = 852%

Results - 2
Spring 2004 (Train) 10/12/ /26/03 S&P EMini (5 Min.): Annual ROI = 23,300%
Spring 2004 (Test) 12/29/ /16/04 S&P EMini (5 Min.): Annual ROI = 2,172%

Demo

Conclusions
Effective way to understand DM Process
–Data Cleansing
–Data Validation
Very Good Results
–ROI > 250% in all four cases
Pragmatic implications