
Chapter 7 (part 3) Neural Networks

What are Neural Networks?
An extremely simplified version of the brain.
Essentially a function approximator: it transforms inputs into outputs to the best of its ability.
[Figure: inputs → NN → outputs]

What are Neural Networks?

For the most part, neural networks are useful in the same situations where multiple regression is used. In a neural network the independent variables are called input cells and the dependent variable is called an output cell (depending on the network model there may be more than one output). As in regression, there is a certain number of observations (say N). Each observation contains a value for each input cell (independent variable) and for the output cell (dependent variable).

In an illustration of a neural network, each circle is a cell of the network. There are three different types of cells. Input cells are the cells in the first column of the network; this first column is called the Input Layer. The output cell is the cell in the last column of the network, and this last layer is the Output Layer. All other layers of the network are called Hidden Layers.

Cell 0 may be viewed as a special input cell, analogous to the constant term in multiple regression. Each other input cell in the Input Layer corresponds to an independent variable. Each cell in the Output Layer represents a dependent variable we want to predict. The arc in the network joining cells i and j has an associated weight w_ij.

The figure above has 4 input variables and one hidden layer. With the exception of cell 0, each cell in the network is connected by an arc to each cell in the next network layer. There is an arc connecting cell 0 with each cell in a Hidden or Output Layer.
[Figure: network with an Input Layer, a Hidden Layer, and an Output Layer]

INPUT AND OUTPUT VALUES
For any observation, each cell in the network has an associated input value and output value. For any observation, each input cell i (other than cell 0) has an input value equal to the value of input i for that observation, and an output value equal to that same value. For example, if the first input equals 3, then the input to and the output from cell 1 are both 3.

For cell 0, the input and output values always equal 1. Let INP(j) = input to cell j and OUT(j) = output from cell j. For now we suppress the dependence of INP(j) and OUT(j) on the particular observation. For any cell j not in the Input Layer,
INP(j) = Σ_i w_ij · OUT(i),
where the sum is over all cells i (including cell 0) connected by an arc to cell j.

The following equations illustrate the use of the previous formula by computing the inputs to cells 5-7 of the previous neural network figure. Suppose that for a given observation input cell j takes the value I_j. Then:
INP(5) = w_05(1) + w_15·I_1 + w_25·I_2 + w_35·I_3 + w_45·I_4
INP(6) = w_06(1) + w_16·I_1 + w_26·I_2 + w_36·I_3 + w_46·I_4
INP(7) = w_07(1) + w_17·I_1 + w_27·I_2 + w_37·I_3 + w_47·I_4

To determine the output from any cell not in the Input Layer we need to use a transfer function. We will later discuss the most commonly used transfer functions, but for now the transfer function f may stand for any function. Then for any cell j not in the Input Layer,
OUT(j) = f(INP(j)).
For our figure, this formula yields
OUT(5) = f(INP(5))
OUT(6) = f(INP(6))
OUT(7) = f(INP(7))

By using these formulas and the previous results, we can now determine the input and output for our output cell, cell 8:
INP(8) = w_08(1) + w_58·OUT(5) + w_68·OUT(6) + w_78·OUT(7)
OUT(8) = f(INP(8))
For any observation, OUT(8) is our prediction for the output cell.
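
To make these forward-pass formulas concrete, here is a minimal Python sketch of the figure's network (input cells 1-4 plus cell 0, hidden cells 5-7, output cell 8). The weight values and the choice of a sigmoid for f are illustrative assumptions, not values from the text.

```python
import numpy as np

def forward_pass(inputs, w_hidden, w_output, f):
    """Forward pass for the figure's network: 4 inputs (cells 1-4) plus
    cell 0, hidden cells 5-7, and output cell 8.

    inputs   : the 4 input values I_1..I_4 for one observation
    w_hidden : 3 x 5 array; row k holds [w_{0,5+k}, w_{1,5+k}, ..., w_{4,5+k}]
    w_output : [w_08, w_58, w_68, w_78]
    f        : transfer function applied to every cell outside the Input Layer
    """
    x = np.concatenate(([1.0], inputs))                        # cell 0 always outputs 1
    out_hidden = f(w_hidden @ x)                               # OUT(5), OUT(6), OUT(7)
    inp_8 = w_output @ np.concatenate(([1.0], out_hidden))     # INP(8)
    return f(inp_8)                                            # OUT(8): the prediction

# Illustrative run with made-up weights and a sigmoid transfer function
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(0)
w_hidden = rng.uniform(-1, 1, size=(3, 5))
w_output = rng.uniform(-1, 1, size=4)
print(forward_pass(np.array([3.0, 1.0, 0.5, 2.0]), w_hidden, w_output, sigmoid))
```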

The complex math involved in neural networks is used to compute weights that produce "good" predictions. More formally, let OUT_j(8) = output of cell 8 for observation j and O_j = actual value of the output for observation j. The real work in neural network analysis is to determine network weights w_ij that minimize
Σ_j (O_j - OUT_j(8))².
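
As a small illustration of the quantity being minimized, here is a sketch of the sum of squared errors; the observation values are made up for the example.

```python
def sum_of_squared_errors(actual, predicted):
    """Sum over observations j of (O_j - OUT_j(8))^2, the quantity the
    training procedure tries to minimize by adjusting the weights w_ij."""
    return sum((o - out) ** 2 for o, out in zip(actual, predicted))

# Hypothetical actual outputs and network predictions for 4 observations
print(sum_of_squared_errors([3.0, 5.0, 4.0, 6.0], [3.2, 4.8, 4.1, 5.9]))
```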

SCALING DATA
For the computations in a neural net to be tractable, it is desirable to scale all data (inputs and outputs) so that every value lies in one of the following two intervals:
Interval 1: [0, 1]
Interval 2: [-1, +1]

If you want to scale your data to lie on [0, 1], then any value x for Input i should be transformed into
(x - Smallest value for Input i) / Range.
Note: Range = Largest value for Input i - Smallest value for Input i. Similarly, any value x for the output should be transformed into
(x - Smallest output value) / (Largest output value - Smallest output value).

If you want to scale your data to lie on [-1, +1], then any value x for Input i should be transformed into
2(x - Smallest value for Input i) / Range - 1,
and any value x for the output should be transformed into
2(x - Smallest output value) / (Largest output value - Smallest output value) - 1.
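
A minimal sketch of both scalings, assuming the standard min-max transformations described above:

```python
def scale_01(x, smallest, largest):
    """Scale x from [smallest, largest] onto [0, 1]."""
    return (x - smallest) / (largest - smallest)

def scale_pm1(x, smallest, largest):
    """Scale x from [smallest, largest] onto [-1, +1]."""
    return 2.0 * (x - smallest) / (largest - smallest) - 1.0

# Example: an input observed over the range [10, 50]
print(scale_01(30, 10, 50))    # 0.5
print(scale_pm1(30, 10, 50))   # 0.0
```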

THE SIGMOID TRANSFER FUNCTION
Vast experience with neural networks indicates that the sigmoid transfer function usually yields the best predictions. If your data has been scaled to lie on [-1, +1], the relevant sigmoid function maps (-∞, +∞) onto (-1, +1); the usual choice is the hyperbolic tangent
f(x) = (e^x - e^(-x)) / (e^x + e^(-x)).
Note that for this function an input near -∞ yields an output near -1 and an input near +∞ yields an output near +1.

If your data has been scaled to lie on [0, 1], the relevant sigmoid function is given by
f(x) = 1 / (1 + e^(-x)).
Note that for this function an input near -∞ yields an output near 0 and an input near +∞ yields an output near 1.

REMARKS
1. The sigmoid function is often called the squashing function because it "squashes" values on the interval (-∞, +∞) to the unit interval [0, 1].
2. The slope of the sigmoid function for the [0, 1] interval is given by
f'(Input) = f(Input)(1 - f(Input)).
This implies that the sigmoid function is very steep for intermediate values of the input and very flat for extreme input values.
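
The two sigmoid variants and the slope identity from Remark 2 can be written out directly. A minimal sketch, assuming the [-1, +1] version is the hyperbolic tangent:

```python
import math

def sigmoid_01(x):
    """Squashes (-inf, +inf) onto (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_pm1(x):
    """Squashes (-inf, +inf) onto (-1, +1); the hyperbolic tangent."""
    return math.tanh(x)

def sigmoid_01_slope(x):
    """f'(x) = f(x) * (1 - f(x)): steep near 0, flat for extreme x."""
    fx = sigmoid_01(x)
    return fx * (1.0 - fx)

print(sigmoid_01_slope(0.0))   # 0.25 (steepest point)
print(sigmoid_01_slope(6.0))   # ~0.0025 (nearly flat)
```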

TESTING AND VALIDATION
When we fit a regression to data, we often use 80%-90% of the data to fit a regression equation, and use the remaining data to "validate" the equation. The same technique is used in neural networks. We begin by designating 80%-90% of our data as the training or learning data set. Then we "fit" a neural network to this data.

Suppose cell 8 is the output cell. Let
O_j(8) = the actual output for observation j,
OUT_j(8) = the value of the output cell for observation j, and
AVGOT(8) = the average value of the output for the training data.

Analogous to regression, define
SST(Train) = Sum of Squares Total for the Training data = Σ_j (O_j(8) - AVGOT(8))²
SSR(Train) = Σ_j (OUT_j(8) - AVGOT(8))²
R²(Train) = SSR(Train) / SST(Train)

If the network is to be useful for forecasting, the R² computed from the Test portion of the data should be close to R²(Train).
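
A minimal sketch of the R² calculation defined above, applied separately to training and test data; the actual and predicted values are hypothetical, and here the average output is computed from the same data being scored, which is an assumption.

```python
import numpy as np

def r_squared(actual, predicted):
    """R^2 = SSR / SST with SSR and SST defined as in the slides,
    using the average of the actual outputs as AVGOT(8)."""
    avgot = np.mean(actual)
    sst = np.sum((actual - avgot) ** 2)
    ssr = np.sum((predicted - avgot) ** 2)
    return ssr / sst

# Hypothetical outputs: actual values and network predictions
train_actual = np.array([3.0, 5.0, 4.0, 6.0])
train_pred   = np.array([3.1, 5.1, 3.9, 6.0])
test_actual  = np.array([4.0, 6.0, 5.0])
test_pred    = np.array([4.0, 5.9, 5.1])
print(r_squared(train_actual, train_pred))   # R^2(Train)
print(r_squared(test_actual,  test_pred))    # R^2(Test): should be close to R^2(Train)
```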

CONTINUOUS AND BINARY DATA
If your dependent variable assumes only two values (say 0 and 1), we say we have binary data. In this case the usual procedure is to train the network until as many outputs as possible are below 0.1 or above 0.9. Then those observations with predictions less than 0.1 are classified as 0 and those observations with predictions larger than 0.9 are classified as 1. If we do not have binary data, we say that the data is continuous.
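
A small sketch of the classification rule just described (below 0.1 means class 0, above 0.9 means class 1, anything in between is left unresolved):

```python
def classify_binary(prediction):
    """Classify a network output for binary (0/1) data."""
    if prediction < 0.1:
        return 0
    if prediction > 0.9:
        return 1
    return None   # network not yet confident enough; keep training

print([classify_binary(p) for p in (0.03, 0.95, 0.5)])   # [0, 1, None]
```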

Examples of the Use of Neural Networks
Example 1: The efficient market hypothesis of financial markets states that the "past history" of a stock's returns yields no information about the future return of the stock. White (1988) examines returns on IBM stock to see if the market is efficient. He begins by running a multiple regression where the dependent variable is the next day's return on IBM stock and the five independent variables are the returns on IBM during each of the last five days.

This regression yielded R² = 0.0079, which is consistent with the efficient market hypothesis. White then ran a neural network (containing one hidden layer) with the output cell corresponding to the next day's return on IBM and 5 input cells corresponding to the last five days' returns on IBM. This neural network yielded R² = 0.179. This implies that the past five days of IBM returns do contain information that can be used to make predictions about tomorrow's return on IBM! According to the October 9, 1993 Economist, Fidelity manages 2.6 billion dollars in assets using neural nets. One of the neural net funds has beaten the S&P 500 by 2-7% a quarter for more than three years.

Example 2: Researchers at Carnegie Mellon University have developed ALVINN (Autonomous Land Vehicle In a Neural Network), a neural network that can drive a car! It can tell if cars are nearby and then slow down the car. Within ten years a neural network may be driving your car!

Example 3: In finance and accounting it is important to accurately predict whether or not a company will go bankrupt during the next year. Altman (1968) developed a method (Altman's Z-statistic) to predict whether or not a firm will go bankrupt during the next year based on the firm's financial ratios. This method used a version of regression called discriminant analysis. Neural networks using financial ratios as input cells have outperformed Altman's Z.

Example 4: The September 22, 1993 New York Times reported that Otis Elevator uses neural networks to direct elevators. For example, if elevator 1 is on floor 10 and going up, elevator 2 is on floor 6 going down, and elevator 3 is on floor 2 and going up, which elevator should answer a call to go down from floor 7?
Example 5: Many banks (Mellon and Chase are two examples) and credit card companies use neural networks to predict (on the basis of past usage patterns) whether or not a credit card transaction should be disallowed. AVCO Financial used a neural net to determine whether or not to lend people money. They increased their loan volume by 25% and decreased their default rate by 20%!

Example 6: "Pen" computers and personal digital assistants often use neural nets to read the user's handwriting. The "inputs" to the network are a binary representation of what the user has written. For example, let a 1 = a place where we have written something and a 0 = a place where we have not written anything. An input to the network is then a grid of 0s and 1s.
[Figure: binary grid representing a handwritten character]

The neural net must decide whether to classify this input as an "S", a "5", or something else. LeCun tried to have a neural net "read" handwritten zip code digits; 7291 digits were used for training and 2007 for testing. Running the neural net took three days on a Sun workstation. The net correctly classified 99.86% of the Training data and 95.0% of the Test data.

Why Neural Networks Can Beat Regression: The XOR Example The classical XOR data set can be used to obtain a better understanding of how neural networks work, and why they can pick up patterns that regression often misses. The XOR data set also illustrates the usefulness of a hidden layer. The XOR data set contains two inputs, one output, and four observations.

The data set is given below:

Observation   Input 1   Input 2   Output
     1           0         0         0
     2           1         0         1
     3           0         1         1
     4           1         1         0

We see that the output equals 1 if either input (but not both) is equal to 1. If we try to use regression to predict the output from the two inputs, we obtain the equation Predicted Output = 0.5. This equation yields an R² = 0, which means that linear multiple regression yields poor predictions indeed.
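
The regression result can be checked directly; a minimal sketch using NumPy's least-squares routine on the four XOR observations:

```python
import numpy as np

# XOR data: columns are Input 1 and Input 2; y is the Output
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Add a constant column (the intercept) and fit by ordinary least squares
X1 = np.column_stack([np.ones(4), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
pred = X1 @ coef

print(coef)   # approximately [0.5, 0.0, 0.0]: Predicted Output = 0.5
print(pred)   # [0.5, 0.5, 0.5, 0.5] for every observation, hence R^2 = 0
```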

Now let's use the neural network of the following figure to predict the output. We assume that for some θ the transfer function for the output cell is defined by
f(x) = 1 if x ≥ θ and f(x) = 0 if x < θ.
Given this transfer function, it is natural to ask whether there are any values for θ and the w_ij's that will enable the above network to make the correct predictions for the data set in the table on the previous slide.

From the figure on the previous slide we find
INP(3) = w_03 + w_13·(Input 1) + w_23·(Input 2)
and OUT(3) = 1 if INP(3) ≥ θ, OUT(3) = 0 if INP(3) < θ.
This implies that this figure will yield correct predictions for each observation if and only if the following four inequalities hold:
Observation 1: w_03 < θ
Observation 2: w_03 + w_13 ≥ θ
Observation 3: w_03 + w_23 ≥ θ
Observation 4: w_03 + w_13 + w_23 < θ

There are no values of w_03, w_13, w_23 and θ that satisfy all four of these inequalities. To see this, note that Observation 3 (w_03 + w_23 ≥ θ) and Observation 4 (w_03 + w_13 + w_23 < θ) together imply w_13 < 0. Adding w_13 < 0 to Observation 1 (w_03 < θ) implies that w_03 + w_13 < θ, which contradicts Observation 2 (w_03 + w_13 ≥ θ). Thus we have seen that there is no way the neural network of this figure can correctly predict the output for each observation.
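
The impossibility argument can also be confirmed numerically. A small sketch that searches a grid of candidate weights and thresholds and finds none that reproduce the XOR outputs (the grid itself is an illustrative assumption):

```python
import itertools

data = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]
grid = [x / 2.0 for x in range(-10, 11)]   # candidate values -5.0, -4.5, ..., 5.0

def predicts_xor(w03, w13, w23, theta):
    """True if a single threshold cell with these weights matches every observation."""
    return all((1 if w03 + w13 * i1 + w23 * i2 >= theta else 0) == out
               for (i1, i2), out in data)

found = any(predicts_xor(*combo) for combo in itertools.product(grid, repeat=4))
print(found)   # False: no single threshold cell separates the XOR data
```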

Suppose we add a hidden layer with two nodes to the previous figure's neural net. This yields the neural network in the following figure. We define the transfer function f_i at cell i by f_i(x) = 1 if x ≥ θ_i and f_i(x) = 0 for x < θ_i. If we choose
θ_3 = 0.4, θ_4 = 1.2, θ_5 = 0.5,
w_03 = w_04 = w_05 = 0, w_13 = w_14 = w_23 = w_24 = 1, w_35 = 0.6, w_45 = -0.2,
then the neural network of the figure above will yield correct predictions for each observation. We now verify that this is the case.

Observation 1: Input 1 = Input 2 = 0, Output = 0
Cell 3 Input = 1(0) + 1(0) = 0, Cell 3 Output = 0 (since 0 < 0.4)
Cell 4 Input = 1(0) + 1(0) = 0, Cell 4 Output = 0 (since 0 < 1.2)
Cell 5 Input = 0.6(0) - 0.2(0) = 0, Cell 5 Output = 0 (since 0 < 0.5)

Observation 2: Input 1 = 1, Input 2 = 0, Output = 1
Cell 3 Input = 1(1) + 1(0) = 1, Cell 3 Output = 1 (since 1 ≥ 0.4)
Cell 4 Input = 1(1) + 1(0) = 1, Cell 4 Output = 0 (since 1 < 1.2)
Cell 5 Input = 0.6(1) - 0.2(0) = 0.6, Cell 5 Output = 1 (since 0.6 ≥ 0.5)

Observation 3: Input 1 = 0, Input 2 = 1, Output = 1
Cell 3 Input = 1(0) + 1(1) = 1, Cell 3 Output = 1 (since 1 ≥ 0.4)
Cell 4 Input = 1(0) + 1(1) = 1, Cell 4 Output = 0 (since 1 < 1.2)
Cell 5 Input = 0.6(1) - 0.2(0) = 0.6, Cell 5 Output = 1 (since 0.6 ≥ 0.5)

Observation 4: Input 1 = Input 2 = 1, Output = 0
Cell 3 Input = 1(1) + 1(1) = 2, Cell 3 Output = 1 (since 2 ≥ 0.4)
Cell 4 Input = 1(1) + 1(1) = 2, Cell 4 Output = 1 (since 2 ≥ 1.2)
Cell 5 Input = 0.6(1) - 0.2(1) = 0.4, Cell 5 Output = 0 (since 0.4 < 0.5)
We have found that the hidden layer enables us to perfectly fit the XOR data set!
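
The verification above can be reproduced in a few lines of Python, using the thresholds and weights given in the slides:

```python
def step(x, theta):
    """Threshold transfer function: 1 if x >= theta, else 0."""
    return 1 if x >= theta else 0

def xor_network(i1, i2):
    # Weights and thresholds from the slides:
    # theta_3 = 0.4, theta_4 = 1.2, theta_5 = 0.5,
    # w03 = w04 = w05 = 0, w13 = w14 = w23 = w24 = 1, w35 = 0.6, w45 = -0.2
    out3 = step(0 + 1 * i1 + 1 * i2, 0.4)   # fires if at least one input is 1
    out4 = step(0 + 1 * i1 + 1 * i2, 1.2)   # fires only if both inputs are 1
    return step(0 + 0.6 * out3 - 0.2 * out4, 0.5)

for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(i1, i2, xor_network(i1, i2))   # outputs 0, 1, 1, 0: exactly XOR
```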

REMARK One might think that having more hidden layers will lead to much more accurate predictions. Vast experience shows that more than one hidden layer is rarely a significant improvement over a single hidden layer.