Neural networks (2): Reminder; Avoiding overfitting; Deep neural networks; Brief summary of supervised learning methods

Presentation transcript:

Neural networks (2): Reminder; Avoiding overfitting; Deep neural networks; Brief summary of supervised learning methods

Reminder
K-class classification: K nodes in the top layer.
Continuous outcome: a single node in the top layer.
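As an illustration (a small numpy sketch of my own, not from the slides), the two cases differ only in the top layer: K nodes with a softmax for K-class classification versus a single linear node for a continuous outcome. All variable names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 10))          # hidden-layer outputs: 5 observations, 10 hidden units

# K-class classification: K nodes in the top layer, softmax-transformed
K = 3
W_class, b_class = rng.normal(size=(10, K)), np.zeros(K)
scores = h @ W_class + b_class
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # each row sums to 1

# Continuous outcome: a single linear node in the top layer
W_reg, b_reg = rng.normal(size=(10, 1)), np.zeros(1)
y_hat = h @ W_reg + b_reg             # one prediction per observation
```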

Reminder
Each hidden node can be seen as defining a linear partition of the input space.

Fitting Neural Networks: Overfitting
The model is very flexible, involving many parameters, and may easily overfit the data.
Early stopping – do not let the algorithm run to convergence. Because the fit starts out nearly linear, stopping early gives a solution regularized toward the linear model.
Explicit regularization ("weight decay") – minimize R(θ) + λJ(θ), where R(θ) is the training error and J(θ) = Σ β² + Σ α² sums the squared weights; the related weight-elimination penalty, J(θ) = Σ β²/(1 + β²) + Σ α²/(1 + α²), tends to shrink smaller weights more. Cross-validation is used to estimate λ.
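A minimal scikit-learn sketch of the two ideas (illustrative only, not the lecture's code): `alpha` is the weight-decay penalty λ on the sum of squared weights, and `early_stopping=True` holds out part of the training data and stops before full convergence; cross-validation estimates how well each setting works.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

# Weight decay: minimize training error + alpha * (sum of squared weights)
wd_net = MLPRegressor(hidden_layer_sizes=(20,), alpha=1e-2, max_iter=2000, random_state=0)

# Early stopping: hold out 10% of the training data and stop when it stops improving
es_net = MLPRegressor(hidden_layer_sizes=(20,), early_stopping=True,
                      validation_fraction=0.1, max_iter=2000, random_state=0)

for name, net in [("weight decay", wd_net), ("early stopping", es_net)]:
    score = cross_val_score(net, X, y, cv=5).mean()   # cross-validation estimates performance
    print(name, round(score, 3))
```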

Fitting Neural Networks

Number of Hidden Units and Layers
Too few – might not have enough flexibility to capture the nonlinearities in the data.
Too many – overly flexible, but the extra weights can be shrunk toward zero if appropriate regularization is used.
Typical range: roughly 5 to 100 hidden units as a rule of thumb. Cross-validation can be used to choose.
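A hedged sketch of choosing the number of hidden units by cross-validation, assuming scikit-learn's `MLPClassifier`; the grid of sizes and the weight-decay setting are arbitrary choices for illustration.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Keep some weight decay (alpha) so that extra hidden units can be shrunk toward zero,
# and let cross-validation pick the number of hidden units.
grid = GridSearchCV(
    MLPClassifier(alpha=1e-3, max_iter=2000, random_state=0),
    param_grid={"hidden_layer_sizes": [(5,), (10,), (25,), (50,), (100,)]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)   # number of hidden units selected by cross-validation
```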

Examples “A radial function is in a sense the most difficult for the neural net, as it is spherically symmetric and with no preferred directions.”

Examples

Going beyond a single hidden layer
A benchmark problem: classification of handwritten numerals.

3x3  1 5x5  1 Going beyond single hidden layer same operation on different parts each of the units in a single 8 × 8 feature map share the same set of nine weights (but have their own bias parameter)  Decision boundaries of parallel lines 3x3  1 5x5  1 No weight sharing weight shared

Going beyond a single hidden layer

Going beyond a single hidden layer

Deep learning

Data → Features → Model
Finding the correct features is critical to success:
- Kernels in SVM
- Hidden-layer nodes in neural networks
- Predictor combinations in random forests
A successful machine learning technology needs to be able to extract useful features (data representations) on its own.
Deep learning methods:
- Composition of multiple non-linear transformations of the data
- Goal: more abstract, and ultimately more useful, representations
(IEEE Trans Pattern Anal Mach Intell 35(8))

Deep learning
Learn representations of data with multiple levels of abstraction. Example: image processing.
Layer 1: presence/absence of an edge at a particular location and orientation.
Layer 2: motifs formed by particular arrangements of edges; allows small variations in edge locations.
Layer 3: assemble motifs into larger combinations of familiar objects.
Layer 4 and beyond: higher-order combinations.
Key: the layers are not designed by an engineer, but learned from data using a general-purpose learner. (Nature 521)

Deep learning
Key to success: detect minute differences; ignore irrelevant variations. (Nature 521)

Deep learning (Nature 505:146–148; IEEE Trans Pattern Anal Mach Intell 35(8))

Deep learning (Nature 521)

Deep learning
Forward pass (figure panel c): the equations used for computing the forward pass in a neural net with two hidden layers and one output layer, each constituting a module through which one can backpropagate gradients. At each layer, we first compute the total input z to each unit, which is a weighted sum of the outputs of the units in the layer below. Then a non-linear function f(·) is applied to z to get the output of the unit. A common f(·) is the rectified linear unit (ReLU), f(z) = max(0, z). (Nature 521)
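A minimal numpy sketch of that forward pass, assuming a net with two hidden layers, ReLU units, and a single linear output; layer sizes and variable names are arbitrary.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # f(z) = max(0, z)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                  # input units
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)           # input -> hidden layer 1
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)           # hidden 1 -> hidden layer 2
W3, b3 = rng.normal(size=(1, 3)), np.zeros(1)           # hidden 2 -> output layer

z1 = W1 @ x + b1;  h1 = relu(z1)     # total input z, then the unit's output f(z)
z2 = W2 @ h1 + b2; h2 = relu(z2)
y_hat = W3 @ h2 + b3                 # linear output unit
print(y_hat)
```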

Deep learning
Backward pass (figure panel d): the equations used for computing the backward pass. At each hidden layer we compute the error derivative with respect to the output of each unit, which is a weighted sum of the error derivatives with respect to the total inputs to the units in the layer above. We then convert the error derivative with respect to the output into the error derivative with respect to the input by multiplying it by the gradient of f(z). At the output layer, the error derivative with respect to the output of a unit is computed by differentiating the cost function. (Nature 521)
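Continuing the sketch above, a hedged numpy version of the backward pass for the same two-hidden-layer net, assuming a squared-error cost: the error derivative at the output comes from differentiating the cost, and at each layer it is converted from "with respect to the output" to "with respect to the input" by multiplying by the gradient of f(z).

```python
import numpy as np

def relu(z): return np.maximum(0.0, z)
def relu_grad(z): return (z > 0).astype(float)

rng = np.random.default_rng(0)
x, y = rng.normal(size=4), 1.0
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)
W3, b3 = rng.normal(size=(1, 3)), np.zeros(1)

# forward pass
z1 = W1 @ x + b1;  h1 = relu(z1)
z2 = W2 @ h1 + b2; h2 = relu(z2)
y_hat = (W3 @ h2 + b3)[0]

# backward pass, with cost E = 0.5 * (y_hat - y)^2
d_out = y_hat - y                     # dE/dy_hat, from differentiating the cost
d_h2 = W3[0] * d_out                  # dE/d(output of hidden layer 2): weighted sum from above
d_z2 = d_h2 * relu_grad(z2)           # convert to dE/d(total input) via the gradient of f(z)
d_h1 = W2.T @ d_z2
d_z1 = d_h1 * relu_grad(z1)

grad_W3 = d_out * h2[None, :]         # gradients used to update the weights
grad_W2 = np.outer(d_z2, h1)
grad_W1 = np.outer(d_z1, x)
```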

Deep learning
Recurrent neural networks (RNNs) process an input sequence one element at a time, maintaining in their hidden units a 'state vector' that implicitly contains information about the history of all the past elements of the sequence. (Nature 521)
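A minimal numpy sketch of the state-vector idea for a plain (Elman-style) RNN; the weight matrices, sizes, and tanh nonlinearity are illustrative assumptions, not details from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W_xh = 0.1 * rng.normal(size=(8, 3))    # input -> hidden weights
W_hh = 0.1 * rng.normal(size=(8, 8))    # hidden -> hidden (recurrent) weights
b = np.zeros(8)

sequence = rng.normal(size=(5, 3))      # 5 time steps, 3 features each
h = np.zeros(8)                         # initial state vector
for x_t in sequence:                    # process one element at a time
    h = np.tanh(W_xh @ x_t + W_hh @ h + b)   # the state now summarizes the history so far
print(h)
```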

Deep learning (IEEE Trans Pattern Anal Mach Intell 35(8))
Major areas of application:
- Speech recognition and signal processing
- Object recognition
- Natural language processing
- ……
So far in bioinformatics:
- Training data size (number of subjects) is still too small compared to the number of variables (the N << p issue).
- Neural networks could be applied when human selection of variables is done first.
- Biological knowledge, in the form of existing networks, is already used explicitly rather than being learned from data; such knowledge-based approaches are hard to beat with a limited amount of data.

Brief summary

Brief summary
- Shallow classifiers (linear machines): linear partition; need expert input to select representations.
- Kernel machines: flexible nonlinear partition; but "do not generalize well far from the training examples."
- Deep neural networks: parametric nonlinear models; nonlinear partition; extract higher-order information; capability to learn highly abstract representations.

Ensemble learning
Combining weak learners: bagging, boosting, random forest
Combining strong learners: stacking

Ensemble learning  Uses cross validation to assess the individual performance of prediction algorithms  Combines algorithms to produce an asymptotically optimal combination 1.For each predictor, predict each observation in a V-fold cross-validation 2.Find a weight vector: 3.Combine the prediction from individual algoriths using the weights. Stat in Med. 34:106–117

Ensemble learning (Lancet Respir Med 3(1):42–52)