Neural Networks Lecture 11: Setting Backpropagation Parameters
October 12, 2010

Slide 1: Exemplar Analysis

When building a neural network application, we must make sure that we choose an appropriate set of exemplars (training data):
- The entire problem space must be covered.
- There must be no inconsistencies (contradictions) in the data.
- We must be able to correct such problems without compromising the effectiveness of the network.

Slide 2: Ensuring Coverage

For many applications, we do not just want our network to classify any kind of possible input. Instead, we want it to recognize whether an input belongs to one of the given classes or whether it is "garbage" that cannot be classified. To achieve this, we train the network with both "classifiable" and "garbage" data (null patterns). For the null patterns, the network is supposed to produce a zero output, or a designated "null neuron" is activated.

Slide 3: Ensuring Coverage

In many cases, we use a 1:1 ratio for this training, that is, we use as many null patterns as there are actual data samples. We have to make sure that all of these exemplars taken together cover the entire input space. If it is certain that the network will never be presented with "garbage" data, then we do not need to use null patterns for training.

Slide 4: Ensuring Consistency

Sometimes there may be conflicting exemplars in our training set. A conflict occurs when two or more identical input patterns are associated with different outputs. Why is this problematic?

Slide 5: Ensuring Consistency

Assume a BPN with a training set including the exemplars (a, b) and (a, c). Whenever the exemplar (a, b) is chosen, the network adjusts its weights to produce an output for a that is closer to b. Whenever (a, c) is chosen, the network changes its weights toward an output closer to c, thereby "unlearning" the adaptation for (a, b). In the end, the network will associate input a with an output that lies "between" b and c but is neither exactly b nor c, so the network error caused by these exemplars will not decrease. For many applications, this is undesirable.

Slide 6: Ensuring Consistency

To identify such conflicts, we can apply a search algorithm to our set of exemplars. How can we resolve an identified conflict? Of course, the easiest way is to eliminate the conflicting exemplars from the training set. However, this reduces the amount of training data that is given to the network. Eliminating exemplars is the best way to go if it is found that these exemplars represent invalid data, for example, inaccurate measurements. In general, however, other methods of conflict resolution are preferable.
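A minimal sketch of such a conflict search, assuming exemplars are stored as (input, output) pairs of hashable tuples; the function name and data layout are illustrative, not taken from the lecture:

```python
from collections import defaultdict

def find_conflicts(exemplars):
    """Group exemplars by input pattern and report every input that is
    associated with more than one distinct output (a conflict)."""
    outputs_by_input = defaultdict(set)
    for x, y in exemplars:
        outputs_by_input[tuple(x)].add(tuple(y))
    return {x: ys for x, ys in outputs_by_input.items() if len(ys) > 1}

# The input (0, 0, 1, 1) appears with two different outputs -> conflict.
exemplars = [((0, 0, 1, 1), (0, 1, 0, 1)),
             ((0, 0, 1, 1), (0, 0, 1, 0)),
             ((1, 0, 0, 0), (1, 0, 0, 0))]
print(find_conflicts(exemplars))
```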

Slide 7: Ensuring Consistency

Another method combines the conflicting patterns. For example, if we have the exemplars (0011, 0101) and (0011, 0010), we can replace them with the following single exemplar: (0011, 0111). The way we compute the output vector of the new exemplar based on the two original output vectors depends on the current task. It should be the value that is most "similar" (in terms of the external interpretation) to the original two values.
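The slide's example corresponds to a component-wise OR of the two output vectors. A sketch of that particular merge rule, assuming binary output vectors; other tasks may call for a different combination (e.g., averaging):

```python
def combine_conflicting(exemplars):
    """Merge exemplars that share an input pattern into a single exemplar
    whose output is the component-wise OR of the conflicting outputs."""
    merged = {}
    for x, y in exemplars:
        key = tuple(x)
        if key in merged:
            merged[key] = tuple(a | b for a, b in zip(merged[key], y))
        else:
            merged[key] = tuple(y)
    return list(merged.items())

exemplars = [((0, 0, 1, 1), (0, 1, 0, 1)),
             ((0, 0, 1, 1), (0, 0, 1, 0))]
print(combine_conflicting(exemplars))   # [((0, 0, 1, 1), (0, 1, 1, 1))]
```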

Slide 8: Ensuring Consistency

Alternatively, we can alter the representation scheme. Let us assume that the conflicting measurements were taken at different times or places. In that case, we can just expand all the input vectors, and the additional values specify the time or place of measurement. For example, the exemplars (0011, 0101) and (0011, 0010) could be replaced by the following ones: (100011, 0101) and (010011, 0010).
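A sketch of this representation change, prefixing each input with a one-hot code for its measurement context (time or place); the context indices are supplied by the caller and are purely illustrative:

```python
def expand_with_context(exemplars, contexts, num_contexts):
    """Prefix each input vector with a one-hot context code so that
    formerly identical inputs become distinguishable."""
    expanded = []
    for (x, y), c in zip(exemplars, contexts):
        prefix = tuple(1 if i == c else 0 for i in range(num_contexts))
        expanded.append((prefix + tuple(x), tuple(y)))
    return expanded

exemplars = [((0, 0, 1, 1), (0, 1, 0, 1)),
             ((0, 0, 1, 1), (0, 0, 1, 0))]
print(expand_with_context(exemplars, contexts=[0, 1], num_contexts=2))
# reproduces the slide's (100011, 0101) and (010011, 0010)
```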

Slide 9: Ensuring Consistency

One advantage of altering the representation scheme is that this method cannot create any new conflicts. Expanding the input vectors cannot make two or more of them identical if they were not identical before.

Slide 10: Training and Performance Evaluation

How many samples should be used for training?

Heuristic: Use at least 5 to 10 times as many samples as there are weights in the network.

Formula (Baum & Haussler, 1989): P ≥ |W| / (1 - a), where P is the number of samples, |W| is the number of weights to be trained, and a is the desired accuracy (e.g., the proportion of correctly classified samples).
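A quick worked check of the bound as reconstructed above (the exact form of the formula is inferred from the variable descriptions on the slide): a network with 300 weights and a desired accuracy of 95% would need at least 300 / 0.05 = 6,000 training samples.

```python
def required_samples(num_weights, accuracy):
    """Sample-size bound in the reconstructed form P >= |W| / (1 - a)."""
    return num_weights / (1.0 - accuracy)

print(required_samples(num_weights=300, accuracy=0.95))  # 6000.0
```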

Slide 11: Training and Performance Evaluation

What learning rate η should we choose? The problems that arise when η is too small or too big are similar to those for the Adaline. Unfortunately, the optimal value of η entirely depends on the application. Values between 0.1 and 0.9 are typical for most applications. Often, η is initially set to a large value and is decreased during the learning process. This leads to better convergence and also decreases the likelihood of "getting stuck" in a local error minimum at an early learning stage.
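One simple way to realize the "start large, then decrease" strategy is a fixed decay schedule; the linear schedule below is an assumed example, not the lecture's prescription:

```python
def decayed_eta(initial_eta, final_eta, epoch, num_epochs):
    """Linearly anneal the learning rate eta from initial_eta (first epoch)
    down to final_eta (last epoch)."""
    t = epoch / max(1, num_epochs - 1)
    return initial_eta + t * (final_eta - initial_eta)

for epoch in (0, 25, 50, 75, 99):
    print(epoch, round(decayed_eta(0.9, 0.1, epoch, 100), 3))
```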

Slide 12: Training and Performance Evaluation

When training a BPN, what is the acceptable error, i.e., when do we stop the training? The minimum error that can be achieved depends not only on the network parameters, but also on the specific training set. Thus, for some applications the minimum error will be higher than for others.

Slide 13: Training and Performance Evaluation

An insightful way of performance evaluation is partial-set training. The idea is to split the available data into two sets: the training set and the test set. The network's performance on the second set indicates how well the network has actually learned the desired mapping. We should expect the network to interpolate, but not extrapolate. Therefore, this test also evaluates our choice of training samples.
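A sketch of such a split, shuffling the exemplars and holding out a fraction for testing; the 70/30 ratio matches the step count quoted on the next slide but is otherwise an assumption:

```python
import random

def train_test_split(exemplars, train_fraction=0.7, seed=0):
    """Randomly split the exemplars into a training set and a test set."""
    shuffled = list(exemplars)
    random.Random(seed).shuffle(shuffled)
    cutoff = int(len(shuffled) * train_fraction)
    return shuffled[:cutoff], shuffled[cutoff:]

data = [((i,), (i % 2,)) for i in range(1000)]
train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))  # 700 300
```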

Slide 14: Training and Performance Evaluation

If the test set contains only one exemplar, this type of training is called "hold-one-out" training. It is to be performed sequentially for every individual exemplar. This, of course, is a very time-consuming process. For example, if we have 1,000 exemplars and want to perform 100 epochs of training, this procedure involves 1,000 × 999 × 100 = 99,900,000 training steps. Partial-set training with a 70/30 split would only require 700 × 100 = 70,000 training steps. On the positive side, hold-one-out training uses all available exemplars (except one) for training, which might lead to better network performance.
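A sketch of the hold-one-out loop; `train` and `evaluate` stand for whatever training and error-measuring routines the application uses, and the dummy example at the end is purely illustrative:

```python
def hold_one_out(exemplars, train, evaluate):
    """For every exemplar, train on all the others and measure the error
    on the one that was held out; return the average held-out error."""
    errors = []
    for i, held_out in enumerate(exemplars):
        training_set = exemplars[:i] + exemplars[i + 1:]
        network = train(training_set)
        errors.append(evaluate(network, held_out))
    return sum(errors) / len(errors)

# Dummy stand-ins: "training" memorizes the mean output, evaluation is squared error.
data = [((i,), (i % 2,)) for i in range(10)]
train = lambda ts: sum(y[0] for _, y in ts) / len(ts)
evaluate = lambda net, ex: (net - ex[1][0]) ** 2
print(hold_one_out(data, train, evaluate))
```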

Slide 15: Example I: Predicting the Weather

Let us study an interesting neural network application. Its purpose is to predict the local weather based on a set of current weather data:
- temperature (degrees Celsius)
- atmospheric pressure (inches of mercury)
- relative humidity (percentage of saturation)
- wind speed (kilometers per hour)
- wind direction (N, NE, E, SE, S, SW, W, or NW)
- cloud cover (0 = clear … 9 = total overcast)
- weather condition (rain, hail, thunderstorm, …)

Slide 16: Example I: Predicting the Weather

We assume that we have access to the same data from several surrounding weather stations. There are eight such stations that surround our position in the following way:

[Figure: the eight stations arranged around our position; scale marker: 100 km]

Slide 17: Example I: Predicting the Weather

How should we format the input patterns? We need to represent the current weather conditions by an input vector whose elements range in magnitude between zero and one. When we inspect the raw data, we find that there are two types of data that we have to account for:
- scaled, continuously variable values
- n-ary representations of category values

Slide 18: Example I: Predicting the Weather

The following data can be scaled:
- temperature (-10 … 40 degrees Celsius)
- atmospheric pressure (26 … 34 inches of mercury)
- relative humidity (0 … 100 percent)
- wind speed (0 … 250 km/h)
- cloud cover (0 … 9)

We can just scale each of these values so that its lower limit is mapped to some small ε and its upper limit is mapped to (1 - ε). These numbers will be the components of the input vector.
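A sketch of this scaling, with ε chosen as 0.1 purely for illustration:

```python
EPS = 0.1  # assumed value for the small margin epsilon

def scale(value, lower, upper, eps=EPS):
    """Map a raw measurement linearly so that `lower` -> eps and `upper` -> 1 - eps."""
    return eps + (value - lower) / (upper - lower) * (1.0 - 2.0 * eps)

print(scale(25.0, -10.0, 40.0))  # temperature of 25 deg C -> 0.66
print(scale(30.0, 26.0, 34.0))   # pressure of 30 inHg     -> 0.5
```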

Slide 19: Example I: Predicting the Weather

Usually, wind speeds vary between 0 and 40 km/h. By scaling wind speed between 0 and 250 km/h, we can account for all possible wind speeds, but usually only make use of a small fraction of the scale. Therefore, only the most extreme wind speeds will exert a substantial effect on the weather prediction. Consequently, we will use two scaled input values:
- wind speed ranging from 0 to 40 km/h
- wind speed ranging from 40 to 250 km/h
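A possible encoding of wind speed into the two components just described; how values outside each sub-range are handled is not specified on the slide, so clamping at 40 km/h is an assumption:

```python
def encode_wind_speed(kmh, eps=0.1):
    """Two scaled components: one for the common 0-40 km/h range,
    one for the extreme 40-250 km/h range."""
    c_low = eps + min(kmh, 40.0) / 40.0 * (1.0 - 2.0 * eps)
    c_high = eps + (max(kmh, 40.0) - 40.0) / 210.0 * (1.0 - 2.0 * eps)
    return (c_low, c_high)

print(encode_wind_speed(20.0))   # (0.5, 0.1): typical wind
print(encode_wind_speed(120.0))  # (0.9, ~0.4): first component saturated
```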

Slide 20: Example I: Predicting the Weather

How about the non-scalable weather data?
- Wind direction is represented by an eight-component vector, where only one element (or possibly two adjacent ones) is active, indicating one out of eight wind directions.
- The subjective weather condition is represented by a nine-component vector with at least one, and possibly more, active elements.

With this scheme, we can encode the current conditions at a given weather station with 23 vector components:
- one for each of the four scaled parameters
- two for wind speed
- eight for wind direction
- nine for the subjective weather condition
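A sketch of how one station's 23-component vector might be assembled; the scaling margin, the particular nine condition labels, and the helper names are illustrative assumptions, not taken from the slides:

```python
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
# Nine condition categories are needed; these labels are made up for the example.
CONDITIONS = ["clear", "rain", "hail", "snow", "fog",
              "thunderstorm", "drizzle", "sleet", "overcast"]
EPS = 0.1

def scale(value, lower, upper, eps=EPS):
    return eps + (value - lower) / (upper - lower) * (1.0 - 2.0 * eps)

def encode_station(temp_c, pressure_inhg, humidity_pct, wind_kmh,
                   direction, cloud_cover, conditions):
    """23 components: 4 scaled values + 2 wind-speed components
    + 8 wind directions + 9 weather conditions."""
    vec = [scale(temp_c, -10, 40),
           scale(pressure_inhg, 26, 34),
           scale(humidity_pct, 0, 100),
           scale(cloud_cover, 0, 9),
           scale(min(wind_kmh, 40), 0, 40),
           scale(max(wind_kmh, 40), 40, 250)]
    vec += [1.0 if d == direction else 0.0 for d in DIRECTIONS]
    vec += [1.0 if c in conditions else 0.0 for c in CONDITIONS]
    return vec

v = encode_station(25, 30, 60, 20, "NW", 3, ["rain", "thunderstorm"])
print(len(v))  # 23
```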

Slide 21: Example I: Predicting the Weather

Since the input includes not only our station but also the eight surrounding ones, the input layer of the network consists of nine groups of 23 neurons each:

[Figure: input layer groups for our station, then the north station, …, through the northwest station]

The network has 9 × 23 = 207 input neurons, which accept 207-component input vectors.

Slide 22: Example I: Predicting the Weather

What should the output patterns look like? We want the network to produce a set of indicators that we can interpret as a prediction of the weather 24 hours from now. In analogy to the weather forecast on the evening news, we decide to demand the following four indicators:
- a temperature prediction
- a prediction of the chance of precipitation occurring
- an indication of the expected cloud cover
- a storm indicator (extreme conditions warning)

Slide 23: Example I: Predicting the Weather

Each of these four indicators can be represented by one scaled output value:
- temperature (-10 … 40 degrees Celsius)
- chance of precipitation (0% … 100%)
- cloud cover (0 … 9)
- storm warning, with two possibilities:
  - 0: no storm warning; 1: storm warning
  - probability of a serious storm (0% … 100%)

Of course, the actual network outputs range from ε to (1 - ε), and after their computation, if necessary, they are rescaled to match the ranges specified above.
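A sketch of mapping a network output back to its physical range, inverting the input scaling described earlier (the ε value is again an assumption):

```python
EPS = 0.1  # assumed margin epsilon, matching the input-scaling sketch

def unscale(activation, lower, upper, eps=EPS):
    """Map a network output in [eps, 1 - eps] back to its physical range."""
    return lower + (activation - eps) / (1.0 - 2.0 * eps) * (upper - lower)

print(round(unscale(0.66, -10, 40), 1))  # -> 25.0 degrees Celsius
print(round(unscale(0.50, 0, 100), 1))   # -> 50.0 % chance of precipitation
```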