ANN Design and Training

ANN Design and Training. Deep Learning Workshop, April 11-12, 2017. With Daniel L. Silver, Ph.D., and Christian Frey, BBA.

Network Design & Training Issues
Design:
- architecture of the network
- structure of the artificial neurons
- learning rules
Training:
- ensuring optimum training
- learning parameters
- data preparation
- and more ...

Network Design
Architecture of the network: how many nodes? This determines the number of network weights.
- How many layers (input, hidden, output)?
- How many nodes per layer?
- Automated methods: augmentation (cascade correlation), weight pruning and elimination.

Network Design
Architecture of the network: connectivity? Connectivity defines the model or hypothesis space. Constraining the number of hypotheses:
- selective connectivity
- shared weights
- recursive connections
(Example: the character-recognition networks from Hinton's notes.)

Network Design
Structure of the artificial neuron nodes.
Choice of input integration: summed; squared and summed; multiplied.
Choice of activation (transfer) function (sketched below):
- sigmoid (logistic)
- hyperbolic tangent
- Gaussian
- linear
- softmax
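
A minimal NumPy sketch of the listed activation functions (function names and details are mine, not the workshop code):

    import numpy as np

    def sigmoid(x):
        # logistic function: squashes any input into (0, 1)
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        # hyperbolic tangent: squashes into (-1, 1)
        return np.tanh(x)

    def gaussian(x):
        # bell-shaped response, largest near x = 0
        return np.exp(-x ** 2)

    def linear(x):
        # identity: typically used for regression outputs
        return x

    def softmax(x):
        # turns a vector of scores into a probability distribution
        e = np.exp(x - np.max(x))   # shift for numerical stability
        return e / e.sum()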

Network Design
Loss function:
- Sum of squared errors seeks the maximum-likelihood hypothesis under the assumption that the training data can be modeled as the target function value plus Normally distributed noise. Fine for regression, but less natural for classification.
- Cross-entropy, E = -∑j [ tj log oj + (1 - tj) log(1 - oj) ], seeks the maximum-likelihood hypothesis under the assumption that the observed (1-of-n) Boolean outputs are a probabilistic function of the input instance.
Maximizing likelihood is cast as the equivalent minimization of the negative log likelihood. Both losses are sketched below.
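
Both losses for a single training example, as a NumPy sketch (targets t and network outputs o are arrays; the names are mine):

    import numpy as np

    def sse(t, o):
        # sum of squared errors: maximum likelihood under Gaussian output noise
        return 0.5 * np.sum((t - o) ** 2)

    def cross_entropy(t, o, eps=1e-12):
        # negative log likelihood for 1-of-n Boolean (Bernoulli) outputs
        o = np.clip(o, eps, 1.0 - eps)   # guard against log(0)
        return -np.sum(t * np.log(o) + (1.0 - t) * np.log(1.0 - o))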

Network Training
How do you ensure that a network has been well trained?
- Objective: achieve good generalization accuracy on new examples/cases.
- Establish a maximum acceptable error rate.
- Train the network using a validation set to tune it (an early-stopping sketch follows below).
- Validate the trained network against an independent test set.
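
One standard way to use the validation set for tuning is early stopping. A minimal sketch, with all model-specific operations passed in as hypothetical callables (one backprop pass, validation error, weight snapshot/restore):

    def train_with_early_stopping(train_epoch, valid_error, get_weights, set_weights,
                                  max_epochs=1000, patience=20):
        # the four callables are placeholders for the workshop's model code
        best_err, best_w, bad_epochs = float("inf"), None, 0
        for epoch in range(max_epochs):
            train_epoch()                    # one pass over the training set
            err = valid_error()              # error on the held-out validation set
            if err < best_err:
                best_err, best_w, bad_epochs = err, get_weights(), 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:   # no improvement for `patience` epochs
                    break
        set_weights(best_w)                  # restore the best snapshot
        return best_err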

Network Training
Approach #1: Large Sample. When the amount of available data is large:
- Divide the available examples randomly: 70% training set, 30% test set (plus a production set of unseen cases).
- The training set is used to develop one ANN model; the test error on the held-out 30% is the estimate of generalization error.
A code sketch of the split follows.
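
A minimal sketch of the random 70/30 division with NumPy (X holds input vectors, y the targets; both names are mine):

    import numpy as np

    def split_large_sample(X, y, train_frac=0.7, seed=0):
        # shuffle once, then cut: first 70% for training, last 30% for testing
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(X))
        cut = int(train_frac * len(X))
        return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]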

Network Training
Approach #2: Cross-validation. When the amount of available data is small:
- Repeat 10 times: randomly divide the examples into a 90% training set and a 10% test set (plus a production set), developing 10 different ANN models.
- Accumulate the 10 test errors; generalization error is reported as the mean test error and its standard deviation.
A code sketch follows.
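
A sketch of the 10-fold procedure; build_and_train and test_error are hypothetical placeholders for the workshop's model-building code:

    import numpy as np

    def cross_validate(X, y, build_and_train, test_error, k=10, seed=0):
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(X)), k)    # k roughly equal folds
        errors = []
        for i, test in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            net = build_and_train(X[train], y[train])         # fit on 90%
            errors.append(test_error(net, X[test], y[test]))  # score on 10%
        # report generalization error as mean and stddev over the k models
        return float(np.mean(errors)), float(np.std(errors))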

TUTORIAL 2
Training of a good model using a validation set. (Backpropagation.py)

Network Training
How do you select between two ANN designs?
A statistical test of hypothesis is required to ensure that a significant difference exists between the error rates of the two ANN models:
- If the Large Sample method has been used, apply McNemar's test.*
- If Cross-validation has been used, apply a paired t-test for the difference of two proportions.
* This assumes a classification problem; for function approximation, use a paired t-test for the difference of means.
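
A sketch of McNemar's test for this setting, assuming both classifiers were scored on the same test set (function and variable names are mine):

    import numpy as np

    def mcnemar_statistic(pred_a, pred_b, truth):
        # n01: examples model A classifies correctly but model B does not;
        # n10: the reverse. Agreements say nothing about the difference.
        a_ok, b_ok = (pred_a == truth), (pred_b == truth)
        n01 = int(np.sum(a_ok & ~b_ok))
        n10 = int(np.sum(~a_ok & b_ok))
        # chi-squared statistic with continuity correction, 1 degree of freedom;
        # values above 3.84 reject equal error rates at the 5% level
        return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)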

Network Training
Mastering ANN parameters:

    Parameter       Typical   Range
    learning rate   0.1       0.01 - 0.99
    momentum        0.8       0.1 - 0.9
    weight-cost     0.1       0.001 - 0.5

Fine tuning: adjust individual parameters at each node and/or connection weight; automatic adjustment during training. A sketch of where these parameters enter the weight update follows.
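
A rough sketch of how the three parameters enter a plain backprop weight update (w is a weight array, grad its error gradient; treating weight-cost as L2 weight decay is my assumption):

    def sgd_update(w, grad, velocity, learning_rate=0.1, momentum=0.8, weight_cost=0.1):
        # momentum reuses a fraction of the previous step;
        # weight-cost (weight decay) pulls weights toward zero each update
        velocity = momentum * velocity - learning_rate * (grad + weight_cost * w)
        return w + velocity, velocity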

Network Training
Network weight initialization:
- Random initial values within +/- some range.
- Smaller weight values for nodes with many incoming connections.
- Rule of thumb: the initial weight range should be approximately +/- 1/sqrt(number of weights coming into a node).
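
A sketch of that rule of thumb with NumPy (the uniform distribution is my choice):

    import numpy as np

    def init_weights(fan_in, fan_out, seed=0):
        # uniform in [-1/sqrt(fan_in), +1/sqrt(fan_in)]:
        # more incoming connections means smaller initial weights
        rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(fan_in)
        return rng.uniform(-bound, bound, size=(fan_in, fan_out))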

Network Training
Typical problems during training (plots of total error E versus number of iterations):
- Would like: a steady, rapid decline in total error.
- But sometimes the error plateaus: seldom a true local minimum; reduce the learning rate or momentum parameter.
- Or the error oscillates or fails to decline: reduce the learning parameters; this may indicate the data is not learnable.

Data Preparation
Garbage in, garbage out: the quality of results relates directly to the quality of the data. 50%-70% of ANN development time will be spent on data preparation.
The three steps of data preparation:
- Consolidation and Cleaning
- Selection and Preprocessing
- Transformation and Encoding

Data Preparation
Data types and ANNs. Four basic data types:
- nominal: discrete symbolic (blue, red, green)
- ordinal: discrete ranking (1st, 2nd, 3rd)
- interval: measurable numeric (-5, 3, 24)
- continuous: numeric (0.23, -45.2, 500.43)
BP ANNs accept only continuous numeric values (typically in the 0 - 1 range).

Data Preparation
Consolidation and Cleaning:
- Determine appropriate input attributes.
- Consolidate data into a working database.
- Eliminate or estimate missing values.
- Remove outliers (obvious exceptions).
- Determine prior probabilities of the categories and deal with volume bias.
A sketch of two of these steps follows.
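
A sketch of missing-value estimation and outlier removal on a single numeric column, assuming missing values are stored as NaN (the 3-standard-deviation threshold is my choice):

    import numpy as np

    def clean_column(x):
        # estimate missing values (NaN) with the column mean
        x = np.where(np.isnan(x), np.nanmean(x), x)
        # remove obvious exceptions: values beyond 3 standard deviations
        keep = np.abs(x - x.mean()) <= 3 * x.std()
        return x[keep]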

Data Preparation
Selection and Preprocessing:
- Select examples: random sampling; consider the number of training examples needed.
- Reduce attribute dimensionality: remove redundant and/or correlated attributes; combine attributes (sum, multiply, difference).
- Reduce attribute value ranges: group symbolic discrete values; quantize continuous numeric values.

Data Preparation
Transformation and Encoding: nominal or ordinal values.
Transform to discrete numeric values, then encode. For example, encode the value 4 (of 5 possible values) as:
- one-of-N code (0 0 0 1 0) - five inputs
- thermometer code (1 1 1 1 0) - five inputs
- real value (0.4)* - one input, if ordinal
Consider the relationship between the values: (single, married, divorced) vs. (youth, adult, senior). Sketches of these encodings follow.
* Target values should be in the 0.1 - 0.9 range, not 0.0 - 1.0.
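
Minimal sketches of the three encodings, assuming values are coded 1..N (function names are mine):

    import numpy as np

    def one_of_n(value, n):
        # one_of_n(4, 5) -> [0, 0, 0, 1, 0]
        code = np.zeros(n)
        code[value - 1] = 1.0
        return code

    def thermometer(value, n):
        # thermometer(4, 5) -> [1, 1, 1, 1, 0]
        return (np.arange(1, n + 1) <= value).astype(float)

    def ordinal_real(value):
        # single input; the slide's example maps 4 -> 0.4 (the scale is illustrative)
        return value / 10.0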

Data Preparation
Transformation and Encoding: interval or continuous numeric values.
De-correlate example attributes via normalization of values:
- Euclidean: n = x / sqrt(sum of all x^2)
- Percentage: n = x / (sum of all x)
- Variance-based (z-score): n = (x - mean of all x) / standard deviation of x
Scale values using a linear transform if the data is uniformly distributed, or a non-linear one (log, power) if the distribution is skewed. These normalizations are sketched below.
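
The three normalizations as NumPy one-liners, assuming x is a 1-D array holding one attribute's values across all examples:

    import numpy as np

    def euclidean_norm(x):
        return x / np.sqrt(np.sum(x ** 2))

    def percentage_norm(x):
        return x / np.sum(x)

    def zscore_norm(x):
        # variance-based: centre on the mean, scale by the standard deviation
        return (x - x.mean()) / x.std()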

Data Preparation
Transformation and Encoding: interval or continuous numeric values.
Encode the value 1.6 as:
- a single real-valued number (0.16)* - OK!
- bits of a binary number (010000) - BAD!
- one-of-N quantized intervals (0 1 0 0 0) - NOT GREAT! (introduces discontinuities)
- distributed (fuzzy) overlapping intervals (0.3 0.8 0.1 0.0 0.0) - BEST! (sketched below)
* Target values should be in the 0.1 - 0.9 range, not 0.0 - 1.0.
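
A sketch of the distributed (fuzzy) overlapping-interval code using Gaussian membership functions; the interval centres and width here are illustrative choices, not the slide's exact scheme:

    import numpy as np

    def fuzzy_encode(value, centres, width=0.5):
        # Gaussian membership in each interval: nearby values get similar,
        # smoothly overlapping codes, avoiding hard discontinuities
        return np.exp(-((value - np.asarray(centres)) / width) ** 2)

    # fuzzy_encode(1.6, [0, 1, 2, 3, 4]) yields a graded pattern
    # qualitatively like the slide's (0.3 0.8 0.1 0.0 0.0)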

Transfer Learning with csMTL
Example: learning to learn how to transform images. This requires methods of efficiently and effectively:
- retaining transform model knowledge, and
- using this knowledge to learn new transforms.
(Silver and Tu, 2010)

Transfer Learning with csMTL Demo

Two More Morphed Images
[Images: a passport photo with the "angry" filter applied, and the same photo with the "sad" filter applied.]

TUTORIAL 3
Develop and train a BP network, with and without a validation set, to prevent overfitting. (Python code: Backpropagation.py)