Nonlinear Models

Data Mining in Finance, 8 February 1999
Andreas S. Weigend
Leonard N. Stern School of Business, New York University

RiskTeam/ Zürich, 6 July 1998 Andreas S. Weigend, Data Mining Group, Information Systems Department, Stern School of Business, NYU

The seven steps of model building
1. Task: predict the distribution of portfolio returns, understand structure in yield curves, find profitable time scales, discover trade styles, …
2. Data: which data to use, and how to code, preprocess, and represent them
3. Architecture
4. Objective/cost function (in-sample)
5. Search/optimization/estimation
6. Evaluation
7. Analysis and interpretation

How to make predictions?
- "Pattern" = input + output pair
- Keep all data
  - Nearest neighbor lookup
  - Local constant model
  - Local linear model
- Throw away data, only keep model
  - Global linear model
  - Global nonlinear model
    - Neural network with hidden units: sigmoids or hyperbolic tangents (tanh)
    - Radial basis functions
- Keep only a few representative data points
  - Support vector machines

Training data: inputs and corresponding outputs
[Figure: axes input1, input2, output]

What is the prediction for a new input?
[Figure: axes input1, input2, output; label: new input]

Nearest neighbor
- Use the output value of the nearest neighbor in input space as the prediction
[Figure: axes input1, input2, output; labels: new input, nearest neighbor, prediction]
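
A minimal sketch of the nearest neighbor lookup described on this slide. The Euclidean distance, the function name, and the toy patterns are assumptions made for illustration; the slide does not specify a distance measure.

```python
import math

def nearest_neighbor_predict(patterns, new_input):
    """Return the output of the stored pattern closest to new_input.

    patterns: list of (inputs, output) pairs; inputs is a tuple of floats.
    Euclidean distance is an assumption, not something the slide prescribes.
    """
    _, nearest_output = min(patterns, key=lambda p: math.dist(p[0], new_input))
    return nearest_output

# Hypothetical toy patterns: ((input1, input2), output)
patterns = [((0.0, 0.0), 1.0), ((1.0, 0.0), 2.0),
            ((0.0, 1.0), 3.0), ((1.0, 1.0), 4.0)]
prediction = nearest_neighbor_predict(patterns, (0.9, 0.2))
```

Here the new input lies closest to the pattern at (1.0, 0.0), so the prediction is that pattern's output. All patterns must be kept, since any of them may turn out to be the nearest neighbor of a future input.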

Local constant model
- Use the average of the outputs of nearby points in input space
[Figure: axes input1, input2, output; label: new input]
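
A minimal sketch of a local constant model. It assumes that "nearby" means the k closest patterns in Euclidean distance; the slide leaves the neighborhood definition open, and a fixed radius or a distance-weighted average would be equally valid readings. Names and data are hypothetical.

```python
import math

def local_constant_predict(patterns, new_input, k=3):
    """Average the outputs of the k patterns nearest to new_input.

    The choice of k nearest neighbors (rather than, say, a fixed radius)
    is an assumption made for this sketch.
    """
    nearby = sorted(patterns, key=lambda p: math.dist(p[0], new_input))[:k]
    return sum(output for _, output in nearby) / len(nearby)

# Hypothetical toy patterns: ((input1, input2), output)
patterns = [((0.0, 0.0), 1.0), ((1.0, 0.0), 2.0),
            ((0.0, 1.0), 3.0), ((1.0, 1.0), 4.0)]
prediction = local_constant_predict(patterns, (0.9, 0.2), k=3)
```

With k=1 this reduces to the nearest neighbor lookup; larger k smooths the prediction at the cost of blurring local structure.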

Local linear model
- Find the best-fitting plane (linear model) through nearby points in input space
[Figure: axes input1, input2, output; label: new input]
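
A minimal sketch of a local linear model. It assumes that "best-fitting" means ordinary least squares and that "nearby" means the k closest patterns in Euclidean distance; both are choices made for illustration. The helper `solve_linear_system` is a hypothetical utility, included only so the example needs no external library.

```python
import math

def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def local_linear_predict(patterns, new_input, k=4):
    """Fit a least-squares plane to the k nearest patterns and evaluate it.

    Fails with a division by zero if the neighbors do not determine a
    plane (for example, if they are collinear in input space).
    """
    nearby = sorted(patterns, key=lambda p: math.dist(p[0], new_input))[:k]
    # Design rows [1, input1, input2, ...]; the leading 1 is the plane's offset.
    rows = [[1.0, *inputs] for inputs, _ in nearby]
    targets = [output for _, output in nearby]
    d = len(rows[0])
    # Normal equations (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    beta = solve_linear_system(xtx, xty)
    return beta[0] + sum(b * x for b, x in zip(beta[1:], new_input))

# Hypothetical patterns lying exactly on output = 1 + 2*input1 + 3*input2
patterns = [((0.0, 0.0), 1.0), ((1.0, 0.0), 3.0), ((0.0, 1.0), 4.0),
            ((1.0, 1.0), 6.0), ((2.0, 2.0), 11.0)]
prediction = local_linear_predict(patterns, (0.5, 0.5), k=4)
```

Because the toy patterns lie exactly on a plane, the local fit recovers that plane and the prediction at (0.5, 0.5) matches it up to rounding.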

Nonlinear regression surface
- Minimize the "energy" stored in the "springs"
[Figure: axes input1, input2, output]
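
One way to read the spring picture, stated here as an assumption rather than as the slide's own formula: each data point $(x_i, y_i)$ is attached to the surface $f$ by an ideal spring along the output axis, with unit spring constant. The total stored energy is then half the sum of squared errors, so minimizing the energy is the same as least-squares fitting.

```latex
E(f) = \frac{1}{2} \sum_{i=1}^{N} \bigl( y_i - f(x_i) \bigr)^2
```

The symbols $N$ (number of patterns), $x_i$ (input vector of pattern $i$), and $y_i$ (its observed output) are notation introduced for this sketch.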

Throw away the data… just keep the surface!
[Figure: axes input1, input2, output]

Modeling – an iterative process
- Step 1: Task/problem definition
- Step 2: Data and representation
- Step 3: Architecture
- Step 4: Objective/cost function (in-sample)
- Step 5: Search/optimization/estimation
- Step 6: Evaluation (out-of-sample)
- Step 7: Analysis and interpretation

Modeling issues
- Step 1: Task and problem definition
- Step 2: Data and representation
- Step 3: Architecture
  - What are the "primitives" that make up the surface?
- Step 4: Objective/cost function (in-sample)
  - How flexible should the surface be?
    - Too rigid a model: stiff board (global linear model)
    - Too flexible a model: cellophane going through all points
    - Penalize models that are too flexible (regularization)
- Step 5: Search/optimization/estimation
  - How do we find the surface?
- Step 6: Evaluation (out-of-sample)
- Step 7: Analysis and interpretation

Step 3: Architecture – Example of neural networks
- Project the input vector x onto a weight vector w:
    w * x
- This projection is then nonlinearly "squashed" to give a hidden unit activation:
    h = tanh(w * x)
- Usually, a constant c in the argument allows the location of the nonlinearity to be shifted:
    h = tanh(w * x + c)
- There are several such hidden units, each responding to a different projection of the input vector
- Their activations are combined with weights v to form the output (another constant b can be added):
    output = v * h + b
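
The equations on this slide can be sketched as a forward pass through a network with one hidden layer. The symbols x, w, c, v, and b follow the slide, with `*` read as a dot product; the function name, the list-of-rows layout for the weight vectors, and the numerical values are assumptions made for illustration.

```python
import math

def forward(x, W, c, v, b):
    """Forward pass: h_j = tanh(w_j * x + c_j), output = v * h + b.

    W holds one weight vector w_j per hidden unit; c holds the matching
    constants. Returns the hidden activations and the scalar output.
    """
    hidden = [
        math.tanh(sum(w_i * x_i for w_i, x_i in zip(w_j, x)) + c_j)
        for w_j, c_j in zip(W, c)
    ]
    output = sum(v_j * h_j for v_j, h_j in zip(v, hidden)) + b
    return hidden, output

# Hypothetical network with two inputs and two hidden units
x = (1.0, 2.0)
W = [(0.5, -0.25), (1.0, 1.0)]
c = [0.0, -2.0]
v = [2.0, -1.0]
b = 0.5
hidden, output = forward(x, W, c, v, b)
```

In this example the first hidden unit sees a projection of exactly zero and stays at the center of its tanh, while the second sees an argument of 1 and is partly saturated. Changing c shifts where along each projection that transition happens.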

Neural networks compared to standard statistics
- Comparison between neural nets and standard statistics
  - Complexity
    - Statistics: fix order of interactions
    - Neural nets: fix number of features
  - Estimation
    - Statistics: find exact solution
    - Neural nets: focus on path
- Dimensionality
  - Number of inputs: curse of dimensionality
    - Points far away in input space
  - Number of parameters: blessing of dimensionality
    - Many hidden units make it easier to find a good local minimum
    - But need to control for model complexity

Step 4: Cost function
- Key problem: want to be good on new data... but we only have data from the past
- Always: observation y = f(input) + noise
- Assume
  - Large, sudden variations in output are due to noise
  - Small (systematic) variations are signal, expressed as f(input)
- Flexible models
  - Good news: can fit any signal
  - Bad news: can also fit any noise
- Requires modeling decisions
  - Assumptions about model complexity: weight decay, weight elimination, smoothness
  - Assumptions about noise: error model or noise model
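
A minimal sketch of how the two modeling decisions combine into one in-sample cost. It assumes a squared-error term (which corresponds to a Gaussian noise model) and the common quadratic form of weight decay; the slide names weight decay but gives no formula, so the exact penalty and the parameter name `decay` are assumptions.

```python
def regularized_cost(targets, predictions, weights, decay=0.01):
    """Squared error on the training data plus a weight-decay penalty.

    data term:       how badly the model misses the observed outputs
    complexity term: decay * sum of squared weights (assumed form)
    """
    data_term = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    complexity_term = decay * sum(w ** 2 for w in weights)
    return data_term + complexity_term

# Hypothetical values
cost = regularized_cost(targets=[1.0, 2.0], predictions=[1.5, 2.0],
                        weights=[3.0, -4.0], decay=0.1)
```

Setting `decay` to zero recovers the pure error measure, which a flexible model can drive toward zero by fitting noise. A larger `decay` pushes the weights toward zero and with them the surface toward a stiffer shape.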

Step 5: Determining the parameters
- Search with gradient descent: iterative
  - Vice to virtue: path important
  - Guide network through solution space
    - Hints
    - Weight pruning
    - Early stopping
    - Weight-elimination
    - Pseudo-data
    - Add noise
    - …
- Alternative approaches:
  - Model to match the local noise level of the data
    - Local error bars
    - Gated experts architecture with adaptive variances
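
A minimal sketch of gradient descent with early stopping, one of the guidance techniques listed on this slide. It assumes a network with a single tanh hidden unit and one input, a mean-squared-error cost, and a held-out validation set that decides when to stop; the learning rate, the patience rule, and the toy data are all hypothetical choices.

```python
import math

def predict(params, x):
    w, c, v, b = params
    return v * math.tanh(w * x + c) + b

def mean_squared_error(params, data):
    return sum((y - predict(params, x)) ** 2 for x, y in data) / len(data)

def gradient(params, data):
    """Gradient of the mean squared error with respect to (w, c, v, b)."""
    w, c, v, b = params
    grad = [0.0, 0.0, 0.0, 0.0]
    for x, y in data:
        h = math.tanh(w * x + c)
        err = v * h + b - y
        dh = 1.0 - h * h  # derivative of tanh at its argument
        grad[0] += 2 * err * v * dh * x
        grad[1] += 2 * err * v * dh
        grad[2] += 2 * err * h
        grad[3] += 2 * err
    return [g / len(data) for g in grad]

def train_with_early_stopping(train, valid, params, rate=0.1,
                              max_steps=2000, patience=50):
    """Descend on the training error; keep the parameters that were best
    on the validation set, and stop once it has not improved for
    `patience` consecutive steps."""
    best_params = list(params)
    best_error = mean_squared_error(params, valid)
    since_best = 0
    for _ in range(max_steps):
        params = [p - rate * g for p, g in zip(params, gradient(params, train))]
        error = mean_squared_error(params, valid)
        if error < best_error:
            best_params, best_error, since_best = list(params), error, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_params, best_error

# Hypothetical data: noisy training targets, clean validation targets
train = [(x / 4, math.tanh(x / 2) + 0.1 * (-1) ** i)
         for i, x in enumerate(range(-8, 9))]
valid = [(x / 4 + 0.125, math.tanh(x / 2 + 0.25)) for x in range(-8, 8)]
initial = [0.5, 0.0, 0.5, 0.0]
initial_error = mean_squared_error(initial, valid)
best_params, best_error = train_with_early_stopping(train, valid, initial)
```

The returned parameters are a point along the descent path, not necessarily its end point, which is one concrete sense in which the path matters: the search is stopped where out-of-sample error is lowest, before the model has fully fit the training noise.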

