Nonlinear Models (Data Mining in Finance)
Andreas S. Weigend, Data Mining Group, Information Systems Department, Stern School of Business, NYU
RiskTeam/Zürich, 6 July 1998


Slide 2.1: Nonlinear Models
8 February 1999, Data Mining in Finance
Andreas S. Weigend, Leonard N. Stern School of Business, New York University

Slide 2.2: The seven steps of model building
1. Task: predict the distribution of portfolio returns, understand structure in yield curves, find profitable time scales, discover trading styles, …
2. Data: which data to use, and how to code/preprocess/represent them
3. Architecture
4. Objective/cost function (in-sample)
5. Search/optimization/estimation
6. Evaluation
7. Analysis and interpretation

Slide 2.3: How to make predictions?
- A “pattern” = an input–output pair
- Keep all the data:
  - Nearest-neighbor lookup
  - Local constant model
  - Local linear model
- Throw away the data, keep only a model:
  - Global linear model
  - Global nonlinear model
    - Neural network with hidden units (sigmoids or hyperbolic tangents, tanh)
    - Radial basis functions
- Keep only a few representative data points:
  - Support vector machines

Slide 2.4: Training data: inputs and corresponding outputs
[Figure: 3-D scatter plot of training points over the axes input1, input2, and output]

Slide 2.5: What is the prediction for a new input?
[Figure: the same input1/input2/output plot, with a new input marked and its output unknown]

Slide 2.6: Nearest neighbor
- Use the output value of the nearest neighbor in input space as the prediction
[Figure: input1/input2/output plot showing the new input and its nearest-neighbor prediction]
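A minimal sketch of this rule, assuming Euclidean distance in input space (the function name and toy data are illustrative, not from the slides):

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, x_new):
    """Predict the output for x_new as the output of the closest training input."""
    # Euclidean distance from the new input to every training input
    distances = np.linalg.norm(X_train - x_new, axis=1)
    return y_train[np.argmin(distances)]

# Toy example: two inputs (input1, input2), one output
X_train = np.array([[0.1, 0.2], [0.9, 0.8], [0.5, 0.4]])
y_train = np.array([1.0, 3.0, 2.0])
print(nearest_neighbor_predict(X_train, y_train, np.array([0.45, 0.35])))  # -> 2.0
```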

Slide 2.7: Local constant model
- Use the average of the outputs of nearby points in input space
[Figure: input1/input2/output plot showing the new input and the local-constant prediction]
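A minimal sketch of the local constant model, assuming the “nearby points” are taken to be the k nearest neighbors (k is an illustrative choice):

```python
import numpy as np

def local_constant_predict(X_train, y_train, x_new, k=3):
    """Average the outputs of the k nearest training points (local constant model)."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]     # indices of the k closest inputs
    return y_train[nearest].mean()          # local constant = mean of their outputs
```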

Slide 2.8: Local linear model
- Find the best-fitting plane (linear model) through nearby points in input space
[Figure: input1/input2/output plot showing the new input and the locally fitted plane]
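A minimal sketch of the local linear model, again taking the k nearest neighbors as the nearby points and fitting the plane by least squares (assumes at least k training points):

```python
import numpy as np

def local_linear_predict(X_train, y_train, x_new, k=10):
    """Fit a plane by least squares to the k nearest points and evaluate it at x_new."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    # Add a constant column so the fitted plane has an intercept
    A = np.hstack([X_train[nearest], np.ones((k, 1))])
    coef, *_ = np.linalg.lstsq(A, y_train[nearest], rcond=None)
    return np.append(x_new, 1.0) @ coef     # evaluate the plane at the new input
```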

Slide 2.9: Nonlinear regression surface
- Minimize the “energy” stored in the “springs”: imagine each data point attached to the surface by a spring, so minimizing the spring energy amounts to minimizing the sum of squared distances between the observed outputs and the surface
[Figure: input1/input2/output plot with a fitted nonlinear regression surface]

Slide 2.10: Throw away the data… just keep the surface!
[Figure: the fitted regression surface shown without the training points]

Slide 2.11: Modeling – an iterative process
- Step 1: Task/problem definition
- Step 2: Data and representation
- Step 3: Architecture
- Step 4: Objective/cost function (in-sample)
- Step 5: Search/optimization/estimation
- Step 6: Evaluation (out-of-sample)
- Step 7: Analysis and interpretation

Slide 2.12: Modeling issues
- Step 1: Task and problem definition
- Step 2: Data and representation
- Step 3: Architecture (what are the “primitives” that make up the surface?)
- Step 4: Objective/cost function (in-sample): how flexible should the surface be? (see the sketch after this slide)
  - Too rigid a model: a stiff board (global linear model)
  - Too flexible a model: cellophane going through all the points
  - Penalize overly flexible models (regularization)
- Step 5: Search/optimization/estimation (how do we find the surface?)
- Step 6: Evaluation (out-of-sample)
- Step 7: Analysis and interpretation
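A minimal sketch of the rigid-vs-flexible trade-off under Step 4, using polynomial degree as a stand-in for model flexibility (the data, degrees, and noise level are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(20)   # signal + noise

for degree in (1, 3, 15):                       # stiff board, moderate, cellophane
    coeffs = np.polyfit(x, y, degree)           # least-squares polynomial fit
    in_sample_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree:2d}: in-sample MSE = {in_sample_mse:.4f}")
# The in-sample error keeps shrinking as flexibility grows,
# even when the extra flexibility is only fitting the noise.
```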

Slide 2.13: Step 3: Architecture – example of neural networks
- Project the input vector x onto a weight vector w: w * x
- This projection is then nonlinearly “squashed” to give a hidden-unit activation: h = tanh(w * x)
- Usually a constant c in the argument allows the location to be shifted: h = tanh(w * x + c)
- There are several such hidden units, responding to different projections of the input vector
- Their activations are combined with weights v to form the output (and another constant b can be added): output = v * h + b
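A minimal sketch of the forward pass this slide describes, assuming a single hidden layer of tanh units (the array sizes and values are illustrative):

```python
import numpy as np

def forward(x, W, c, v, b):
    """One-hidden-layer network: h = tanh(W x + c), output = v . h + b."""
    h = np.tanh(W @ x + c)   # hidden-unit activations, one per weight vector (row of W)
    return v @ h + b         # linear combination of hidden activations plus constant

# Illustrative sizes: 2 inputs, 3 hidden units, scalar output
x = np.array([0.5, -1.2])
W = np.array([[0.3, -0.1],
              [0.8,  0.4],
              [-0.5, 0.2]])          # one weight vector w per hidden unit
c = np.array([0.0, 0.1, -0.2])       # shifts the location of each tanh
v = np.array([1.0, -0.7, 0.3])       # output weights
b = 0.05
print(forward(x, W, c, v, b))
```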

Slide 2.14: Neural networks compared to standard statistics
- Comparison between neural nets and standard statistics
  - Complexity
    - Statistics: fix the order of interactions
    - Neural nets: fix the number of features
  - Estimation
    - Statistics: find the exact solution
    - Neural nets: focus on the path
- Dimensionality
  - Number of inputs: curse of dimensionality (points far away in input space)
  - Number of parameters: blessing of dimensionality
    - Many hidden units make it easier to find a good local minimum
    - But model complexity then needs to be controlled

Slide 2.15: Step 4: Cost function
- Key problem: we want to be good on new data... but we only have data from the past
- We always observe y = f(input) + noise
- Assume:
  - Large, sudden variations in the output are due to noise
  - Small, systematic variations are signal, expressed as f(input)
- Flexible models
  - Good news: they can fit any signal
  - Bad news: they can also fit any noise
- This requires modeling decisions:
  - Assumptions about model complexity: weight decay, weight elimination, smoothness (see the weight-decay sketch after this slide)
  - Assumptions about the noise: an error model or noise model
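A minimal sketch of an in-sample cost with a weight-decay penalty, one of the complexity assumptions listed above (the penalty strength lambda_ is an illustrative choice, not a value from the slides):

```python
import numpy as np

def cost(y_true, y_pred, weights, lambda_=0.01):
    """Sum of squared errors plus a weight-decay penalty on the parameters."""
    squared_error = np.sum((y_true - y_pred) ** 2)   # fit to the (noisy) training data
    weight_decay = lambda_ * np.sum(weights ** 2)    # penalizes overly flexible models
    return squared_error + weight_decay
```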

Slide 2.16: Step 5: Determining the parameters
Search with gradient descent (iterative):
- Vice turned into virtue: the path through solution space matters
- Guide the network through solution space:
  - Hints
  - Weight pruning
  - Early stopping (see the sketch after this slide)
  - Weight elimination
  - Pseudo-data
  - Add noise
  - …
Alternative approaches:
- Model to match the local noise level of the data
  - Local error bars
  - Gated-experts architecture with adaptive variances
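A minimal sketch of iterative gradient descent with early stopping, one of the ways listed above to guide the search (a linear model is used only to keep the example short; the learning rate and patience are illustrative):

```python
import numpy as np

def train_with_early_stopping(X_tr, y_tr, X_val, y_val,
                              lr=0.01, max_steps=5000, patience=50):
    """Gradient descent on squared error; stop when validation error stops improving."""
    w = np.zeros(X_tr.shape[1])
    best_w, best_val, wait = w.copy(), np.inf, 0
    for step in range(max_steps):
        grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)   # gradient of mean squared error
        w -= lr * grad                                       # one gradient-descent step
        val_error = np.mean((X_val @ w - y_val) ** 2)        # out-of-sample check
        if val_error < best_val:
            best_val, best_w, wait = val_error, w.copy(), 0
        else:
            wait += 1
            if wait >= patience:        # early stopping: no improvement for a while
                break
    return best_w
```

The path through solution space is what matters here: the returned weights are those with the best out-of-sample error seen along the way, not the final minimizer of the in-sample cost.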