More on NNs
This is lecture 17 of Biologically Inspired Computing, covering NN applications and overfitting.
NN Applications
ANNs are a mature, tried and tested technology, used for all sorts of things. There are countless applications; many may be ill-applied, and many may not be trained ideally. However, the extent to which there are successful applications reveals:
How useful it is to have some way of predicting/classifying without needing to know the “rules” underlying the task, i.e. how much of an advance this BIC technology provides over “classical” methods.
How flexible, reliable and scalable NNs are.
The next few slides contain application examples from just a single site about a single commercial NN package.
Stocks, Commodities and Futures
Currency Price Predictions. James O'Sullivan: Controls trading of more than 10 different financial markets with consistent profits.
Corporate Bond Rating. George Pugh: Predicts corporate bond ratings with 100% accuracy for consulting and trading.
Standard and Poor's 500 Prediction. LBS Capital Management, Inc: Predicts the S&P 500 one day ahead and one week ahead with better accuracy than traditional methods.
Forecasting Stock Prices. Walkrich Investments: Neural Networks rate underpriced stock; beating the S&P.
Business, Management, and Finance
Direct Marketing Mail Prediction. Microsoft: Improves response rates from 4.9% to 8.2%.
Credit Scoring. Herbert Jensen: Predicts loan application success with 75-80% accuracy.
Identifying Policemen with Potential for Misconduct. The Chicago Police Department predicts misconduct potential based on employee records.
Jury Summoning with Neural Networks. The Montgomery Court House in Norristown, PA saves $70 million annually using The Intelligent Summoner from MEA.
Forecasting Highway Maintenance with Neural Networks. Professor Awad Hanna at the University of Wisconsin in Madison has trained a neural network to predict which type of concrete is better than another for a particular highway problem.
Medical Applications
Breast Cancer Cell Analysis. David Weinberg, MD: Image analysis ignores benign cells and classifies malignant cells.
Hospital Expenses Reduced. Anderson Memorial Hospital: Improves the quality of care, reduces death rate, and saved $500,000 in the first 15 months of use.
Diagnosing Heart Attacks. J. Furlong, MD: Recognizes Acute Myocardial Infarction from enzyme data.
Emergency Room Lab Test Ordering. S. Berkov, MD: Saves time and money ordering tests using symptoms and demographics.
Classifying Patients for Psychiatric Care. G. Davis, MD: Predicts Length of Stay for Psychiatric Patients, saving money.
Sports Applications
Thoroughbred Horse Racing. Don Emmons: 22 races, 17 winning horses.
Thoroughbred Horse Racing. Rich Janeva: 39% of winners picked at odds better than 4.5 to 1.
Dog Racing. Derek Anderson: 94% accuracy picking first place.
Science
Solar Flare Prediction. Dr. Henrik Lundstet: Predicts the next major solar flare; helps prevent problems for power plants.
Mosquito Identification. Aubrey Moore: 100% accuracy distinguishing between male and female, two species.
Spectroscopy. StellarNet Inc: Analyzes spectral data to classify materials.
Weather Forecasting. Fort Worth National Weather Service: Predicts rainfall to 85% accuracy.
Air Quality Testing. Researchers at the Defense Research Establishment Suffield, Chemical & Biological Defense Section, in Alberta, Canada have trained a neural network to recognize, classify and characterize aerosols of unknown origin with a high degree of accuracy.
Manufacturing
Plastics Testing. Monsanto: Predicts plastics quality, saving research time, processing time, and manufacturing expense.
Computer Chip Manufacturing Quality. Intel: Analyzes chip failures to help improve yields.
Nondestructive Concrete Testing. Donald G. Pratt: Detects the presence and position of flaws in reinforced concrete.
Beer Testing. Anheuser-Busch: Identifies the organic content of competitors' beer vapors with 96% accuracy.
Steam Quality Testing. AECL Research in Manitoba, Canada has developed the INSIGHT steam quality monitor, an instrument used to measure steam quality and mass flowrate.
Overfitting
Suppose we train an NN to tell the difference between handwritten t and c, using only a handful of examples.
[Figure: a few handwritten examples of t and of c, used as the training set.]
The ANN will learn easily. Either BP or some other method will quickly find weights for the NN that give 100% correct prediction on these cases.
Overfitting
BUT this NN will probably generalise very poorly. E.g. here is its potential (very likely) performance on certain unseen cases.
[Figure: two unseen handwritten characters. For the first: "It will probably predict that this is a c". For the second: "It will probably predict that this is a t".]
Why?
Avoiding Overfitting
1. It can be avoided by using as much training data as possible, ensuring as much diversity as possible in the data. This cuts down on features that happen to be discriminative in the training data but are otherwise spurious.
2. It can be avoided by jittering (adding noise). During training, every time an input pattern is presented, it is randomly perturbed. The idea is that spurious features will be ‘washed out’ by the noise, but valid discriminatory features will remain. The problem with this approach is how to choose the right level of noise.
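Jittering can be sketched in a few lines. This is only an illustration, assuming zero-mean Gaussian noise added independently to each input value; the slide does not say which noise distribution to use, and the names `jitter` and `noise_level` and the example pattern are made up here.

```python
import random

def jitter(pattern, noise_level, rng=random):
    """Return a noisy copy of an input pattern.

    Assumption: zero-mean Gaussian noise with standard deviation
    `noise_level` is added independently to every input value.
    """
    return [x + rng.gauss(0.0, noise_level) for x in pattern]

# Hypothetical input pattern (e.g. four pixel intensities).
pattern = [0.0, 1.0, 0.5, 0.25]

rng = random.Random(0)  # fixed seed so the run is repeatable
for presentation in range(3):
    # A fresh perturbation is drawn every time the pattern is presented.
    noisy = jitter(pattern, noise_level=0.05, rng=rng)
    print(presentation, [round(v, 3) for v in noisy])
```

With `noise_level=0.0` the pattern is returned unchanged; choosing a larger value trades off washing out spurious features against destroying valid ones, which is the difficulty mentioned above.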
Avoiding Overfitting II
[Figure: error plotted against time (BP training, or EA/PSO generations). The training-data curve is a typical curve showing performance during training; the validation-data curve shows performance on unseen data, not in the training set. A marker shows the point where the network is starting to overfit.]
Avoiding Overfitting III
3. Another approach is early stopping. During training, keep track of the network’s performance on a separate validation set of data. When error continues to improve on the training set but starts to get worse on the validation set, training should be stopped, since the network is starting to overfit the training data. The problem here is that this point is often far from clear cut.
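One common way to cope with the stopping point not being clear cut is a "patience" rule: stop only after the validation error has failed to improve for several checks in a row, and keep the weights from the best check. The sketch below assumes that rule (it is not stated on the slide), and the validation errors are invented numbers, not results.

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return the epoch whose weights would be kept.

    Training is treated as stopped once `patience` epochs have passed
    without the validation error beating the best value seen so far.
    """
    best_epoch, best_error = 0, float("inf")
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_epoch, best_error = epoch, err
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch

# Made-up validation errors: falling, a small bump, then rising for good.
val_errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.39, 0.41, 0.44, 0.48, 0.55]

print(early_stopping_epoch(val_errors, patience=3))  # keeps epoch 5
print(early_stopping_epoch(val_errors, patience=1))  # keeps epoch 3
```

With a patience of 1 the small bump at epoch 4 ends training too early, which illustrates why the choice of stopping point is a judgement call.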
Some other important NN points
[Figure: a network with an input layer of square nodes feeding an output layer of round nodes.]
Round nodes are ‘proper’ nodes, which work out a weighted sum of their inputs and send it on. Square ‘input nodes’ don’t really count; they just distribute the inputs. An NN like the one above, with just one layer of processing nodes, is called a perceptron. Perceptrons usually have many inputs and one output, but can have more than one output. They work out one (or more) weighted sums of their inputs.
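The weighted-sum computation can be written out as a short sketch. The bias term, the thresholding of each sum at 0, and the particular weights are assumptions made for illustration; the slide itself only describes the weighted sum.

```python
def weighted_sum(inputs, weights, bias=0.0):
    """Weighted sum worked out by one processing (round) node."""
    return bias + sum(w * x for w, x in zip(weights, inputs))

def perceptron(inputs, weight_rows, biases):
    """One layer of processing nodes: one weighted sum per output node.

    Assumption: each sum is thresholded at 0 to give a 0/1 output.
    """
    return [1 if weighted_sum(inputs, w, b) > 0 else 0
            for w, b in zip(weight_rows, biases)]

# Hypothetical weights: output 0 behaves like AND, output 1 like OR.
weight_rows = [[1.0, 1.0], [1.0, 1.0]]
biases = [-1.5, -0.5]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, weight_rows, biases))
```

The input nodes do no work here: the same `inputs` list is simply handed to every output node.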
Linear separability
[Figure: 1s and 0s plotted at the corners of a grid with X and Y axes.]
In the figure, 1s and 0s are shown; these show the XOR of the x and y coordinates. Can you draw a straight line which has the 1s on one side of it and the 0s on the other side? It can’t be done; XOR is therefore not linearly separable. It turns out that perceptrons cannot solve linearly inseparable classification problems. However, with just two layers of processing nodes, all classification problems can in principle be solved. Standard ANNs usually have 3 layers (input, hidden, output), and are sometimes called Multilayer Perceptrons.
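The claim about XOR can be illustrated by brute force. The sketch below tries every combination of weights on a coarse grid for a single threshold unit and checks whether any of them reproduces a target truth table. The grid and the threshold at 0 are arbitrary choices, and a failed search is an illustration rather than a proof (the proof is the geometric argument about the straight line).

```python
from itertools import product

def unit(x, y, wx, wy, b):
    """Single threshold unit: which side of the line wx*x + wy*y + b = 0?"""
    return 1 if wx * x + wy * y + b > 0 else 0

def find_line(target):
    """Search a coarse weight grid for a line that separates the 1s from the 0s."""
    grid = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
    for wx, wy, b in product(grid, repeat=3):
        if all(unit(x, y, wx, wy, b) == out for (x, y), out in target.items()):
            return (wx, wy, b)
    return None

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print("AND:", find_line(AND))  # some separating line is found
print("XOR:", find_line(XOR))  # None: no single line works
```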
Perceptron can only draw one (hyper)line
[Figure: the same X/Y grid of 1s and 0s, with a single line drawn on it.]
Multilayer perceptron can draw many (hyper)lines
[Figure: the same X/Y grid of 1s and 0s, with several lines drawn on it.]
… but so can a perceptron (with several outputs, one line per output). The difference is that the extra layer can make decisions based on what side of each line the data are on.
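A hand-built two-layer network for XOR makes this concrete. The weights below are picked by hand for illustration (nothing is trained), and the threshold-at-0 activation is an assumption. Each hidden node draws one line; the output node decides from which side of each line the input falls.

```python
def step(total):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if total > 0 else 0

def xor_net(x, y):
    # Hidden layer: two lines in the (x, y) plane.
    h1 = step(x + y - 0.5)  # above line 1: at least one input is 1
    h2 = step(x + y - 1.5)  # above line 2: both inputs are 1
    # Output layer: fire when above line 1 but not above line 2.
    return step(h1 - h2 - 0.5)

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x, y), xor_net(x, y))
```

The two hidden nodes alone are just a two-output perceptron; it is the extra output node combining them that solves XOR.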
Talk to me
…about how you would use an EA to evolve a neural network for a pattern recognition task. Encoding? Operators? Fitness?
Next time
Associative Networks (Hopfield)
Self-Organising Maps