Slide 1: Lecture 8, CS567 – Neural Network Concepts
– Weight Matrix vs. NN
– MLP
– Network Architectures
– Overfitting
– Parameter Reduction
– Measures of Performance
– Sequence Encoding
Slide 2: Weight Matrices vs. Neural Networks
Weight Matrix
– Derived from a frequentist evaluation of the input
– For example, an information-theory-based weight matrix
– Not to be confused with the weight parameters of a neural network
– Assumes independence of each position (no mutual information)
– Does not handle non-linearity
An SLP may be considered equivalent to a WM (How? See the sketch below.)
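To illustrate the closing question, here is a minimal sketch (not from the slides) of why a single-layer perceptron can be viewed as a weight matrix: scoring a sequence against a log-odds weight matrix is the same dot product an SLP computes over a one-hot input whose weights are the matrix entries. The toy DNA sites, pseudocounts, and uniform background below are assumptions for illustration.

```python
import numpy as np

# Toy aligned binding-site sequences (made-up data; DNA alphabet for brevity).
sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
alphabet = "ACGT"
background = 0.25  # uniform background frequency (assumption)

# Frequentist estimate of per-position letter frequencies, with a small pseudocount.
counts = np.full((len(alphabet), 4), 0.5)            # shape: (letters, positions)
for s in sites:
    for pos, ch in enumerate(s):
        counts[alphabet.index(ch), pos] += 1
freqs = counts / counts.sum(axis=0, keepdims=True)

# Information-theory-style weight matrix: log2 odds against the background.
wm = np.log2(freqs / background)

def wm_score(seq):
    """Score a sequence by adding one weight per position (assumes independence)."""
    return sum(wm[alphabet.index(ch), pos] for pos, ch in enumerate(seq))

def slp_score(seq):
    """Same score computed as a single-layer perceptron: dot product with a one-hot input."""
    onehot = np.zeros((len(alphabet), len(seq)))
    for pos, ch in enumerate(seq):
        onehot[alphabet.index(ch), pos] = 1.0
    return float((wm * onehot).sum())                 # the SLP weights are the weight matrix entries

print(wm_score("ACGT"), slp_score("ACGT"))            # identical values
```

The same construction carries over to the 20-letter amino-acid alphabet discussed later under sequence encoding.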
Slide 3: Multilayer Perceptron (MLP)
Multiple layers
– One or more hidden layers, each containing multiple neurons
– The hidden layers re-represent a non-linear problem so that the output layer can separate it linearly
The commonest form of neural network in use
Uses backpropagation to update the parameters during training (see the sketch below)
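A minimal backpropagation sketch, assuming one hidden layer of three sigmoid units, a sigmoid output, squared-error loss, and a fixed learning rate; none of these choices are prescribed by the slides. It is trained on the XOR mapping introduced on the following slides, the classic example of a problem an MLP can solve but a single-layer perceptron cannot.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR training data (see the match table a few slides below).
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer with 3 neurons and one output neuron (illustrative sizes).
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
lr = 0.5  # learning rate (assumption)

for epoch in range(10000):
    # Forward pass: the hidden layer re-represents the input so the
    # output layer can separate it linearly.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backpropagation: push the squared-error gradient back layer by layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]; training can occasionally stall in a local minimum
```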
Slide 4: A Classical Non-Linear Problem
The tale of Kobe, Shaquille, and the Los Angeles Lakers
– Predicting the performance of a team with two star players who can't stand each other
– Case A (neither player plays): the team loses
– Case B (exactly one of the two players plays): the team wins
– Case C (both players play at the same time): the team loses
(This boggles the coach's brain: "Why can't they get along and have a win-win situation!!??")
Slide 5: The XOR Problem

Match   Player X   Player Y   Match Result
A       0          0          0
B       1          0          1
C       0          1          1
D       1          1          0
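A short derivation, not on the slide, making the non-linearity concrete: no single linear threshold unit (single-layer perceptron) with weights w1, w2 and bias b can reproduce the table above.

```latex
% The unit predicts "win" exactly when  w_1 x + w_2 y + b > 0,
% where x and y are the Player X and Player Y columns. The four matches require:
\begin{aligned}
\text{A: } (0,0) \mapsto 0 &\;\Rightarrow\; b \le 0, \\
\text{B: } (1,0) \mapsto 1 &\;\Rightarrow\; w_1 + b > 0, \\
\text{C: } (0,1) \mapsto 1 &\;\Rightarrow\; w_2 + b > 0, \\
\text{D: } (1,1) \mapsto 0 &\;\Rightarrow\; w_1 + w_2 + b \le 0.
\end{aligned}
% Adding B and C gives  w_1 + w_2 + 2b > 0,  i.e.  w_1 + w_2 + b > -b \ge 0,
% which contradicts D. Hence XOR is not linearly separable and needs a hidden layer.
```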
Slide 6: The XOR Problem
[Figure: a 2×2 grid over Kobe (absent/present) and Shaquille (absent/present); the team WINs only when exactly one of the two is present and LOSEs otherwise.]
Slide 7: Messy Problem Spaces
[Figure: a scatter of interleaved '+' and '−' examples with no simple boundary between the two classes – the kind of problem space neural networks are used for.]
Slide 8: Network Architectures
Feedforward network
– No cycles
Recurrent network
– Loops present: the input to a layer includes a contribution from an output
– Temporal component: the input at time t incorporates a contribution from the output at time t−1
– The loop iterates until the output converges (see the sketch below)
– Typically, the final output is a vector representative of a class
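A minimal sketch of the recurrent behaviour described above, assuming a single tanh layer whose input at step t includes its own output from step t−1; the weight scales, tolerance, and step limit are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n_in, n_out = 4, 3
W_in = rng.normal(scale=0.3, size=(n_in, n_out))      # input -> output weights (illustrative)
W_back = rng.normal(scale=0.3, size=(n_out, n_out))   # feedback loop: output at t-1 -> output at t

def run_recurrent(x, tol=1e-6, max_steps=100):
    """Iterate the loop until the output vector converges (or a step limit is hit)."""
    out = np.zeros(n_out)                              # output at t = 0
    for t in range(1, max_steps + 1):
        new_out = np.tanh(x @ W_in + out @ W_back)     # input at time t uses the output at t-1
        if np.max(np.abs(new_out - out)) < tol:
            return new_out, t
        out = new_out
    return out, max_steps

x = rng.normal(size=n_in)
final, steps = run_recurrent(x)
print(steps, final)   # the converged vector is read off as the class-representative output
```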
Slide 9: Overfitting
'Cramming versus understanding'
Goal: derive a general model from the data ('one shoe fits all', not one shoe size for each person)
Pitfall: memorizing the mapping between data and targets
– Likely when the number of parameters is large
– Essentially, separate subsets of parameters come to map exclusively to individual training examples
– Because a general model is not learnt, the network performs poorly when presented with unseen data
Slide 10: Overfitting
Example: a maximum-likelihood estimate from a small data set
Example: an NN to screen for unfair coins, using the results of several tosses as input (see the sketch below)
– Training examples (fair coins): H=5, T=5; H=10, T=10
– Prediction for H=6, T=6: "Not fair!"
– Instead of learning the rule (H ≈ T ⇒ fair coin), the network memorizes (F(10,10) = F(5,5) ⇒ fair coin)
This is possible because non-overlapping sets of parameters remember the mapping between each input and its target output
– "A conversation between an 'intelligent' computer and human beings based on memorized mappings"
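The coin example can be caricatured without a network at all; the sketch below (hypothetical function names, made-up tolerance) contrasts a model that memorizes the exact (H, T) training pairs with one that has learnt the H ≈ T rule.

```python
# Training data: (heads, tails) -> label; both examples are fair coins.
train = {(5, 5): "fair", (10, 10): "fair"}

def memorizer(h, t):
    """Overfit model: only recalls the exact input/target mappings it has seen."""
    return train.get((h, t), "not fair!?")   # unseen inputs get no sensible answer

def generalizer(h, t, tol=0.15):
    """General model: learns the underlying rule H ~ T (head ratio near 0.5)."""
    return "fair" if abs(h / (h + t) - 0.5) <= tol else "not fair"

print(memorizer(6, 6))    # "not fair!?"  -- memorized F(5,5) and F(10,10), not the rule
print(generalizer(6, 6))  # "fair"        -- generalizes to unseen counts
```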
Slide 11: Parameter Reduction
Overfitting may be avoided or minimized by balancing a reduction in the number of parameters against capturing maximum information from the input
Need to minimize
– The number of connections
– The number of independent weights
Slide 12: Parameter Reduction
Input transformations are used to reduce the number of inputs per training example
– Use the H/T ratio instead of the combination of head and tail counts, or worse, the actual permutation of heads and tails
– Generally, use compression techniques to encode the input
Priors are used to exploit translational invariance
– Periodicity in the input; for example, when every third element of the input is equally important
– Compensating for artificial differences in input representation, when combinations are important, not permutations
Weight sharing (see the sketch after this list)
– Sets of edges share the same weight
– Minimizes overfitting and also improves performance
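A sketch of weight sharing under an assumed translational-invariance prior: rather than giving each input position its own weights, one small weight vector is slid along the input (effectively a 1-D convolution), so many edges share the same few independent weights. The sizes below are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
seq_len, window = 12, 3

# Without sharing: one weight per (window position, window offset) pair.
n_unshared = (seq_len - window + 1) * window          # 10 windows x 3 weights = 30 independent weights

# With weight sharing: every window position reuses the same 3 weights.
shared_w = rng.normal(size=window)                    # 3 independent weights in total

def shared_layer(x, w):
    """Slide the shared weights along the input (a 1-D 'valid' convolution)."""
    return np.array([x[i:i + len(w)] @ w for i in range(len(x) - len(w) + 1)])

x = rng.normal(size=seq_len)
print(n_unshared, "independent weights without sharing vs", shared_w.size, "with sharing")
print(shared_layer(x, shared_w))                      # same output length either way
```

The number of connections is unchanged; only the number of independent weights drops, which is exactly the quantity the previous slide says should be minimized.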
Slide 13: Measures of Performance
Let P = the number of target positives and N = the number of target negatives in the test suite
Let p = the number of predicted positives and n = the number of predicted negatives in the test suite
Then p = TP + FP and n = TN + FN, where TP and FN are subsets of P, and FP and TN are subsets of N

                     Expected
                     P        N
Predicted  positive  TP       FP
           negative  FN       TN
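A minimal sketch that tallies TP, FP, TN and FN from paired target/predicted labels, checks the identities above, and computes the derived measures defined on the next slide. The test-set labels are made up.

```python
# 1 = positive, 0 = negative; made-up test-suite labels.
target    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # P = 4 target positives, N = 6 target negatives
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

TP = sum(t == 1 and p == 1 for t, p in zip(target, predicted))
FP = sum(t == 0 and p == 1 for t, p in zip(target, predicted))
TN = sum(t == 0 and p == 0 for t, p in zip(target, predicted))
FN = sum(t == 1 and p == 0 for t, p in zip(target, predicted))

assert TP + FN == target.count(1)        # TP and FN together make up P
assert TN + FP == target.count(0)        # TN and FP together make up N
assert TP + FP == predicted.count(1)     # p, the predicted positives
assert TN + FN == predicted.count(0)     # n, the predicted negatives

sensitivity = TP / (TP + FN)             # also called recall
specificity = TN / (TN + FP)
precision   = TP / (TP + FP)             # positive predictive value
print(TP, FP, TN, FN, sensitivity, specificity, precision)
```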
Slide 14: Measures of Performance
Sensitivity / Recall: TP / (TP + FN)
Specificity: TN / (TN + FP)
Precision / Positive Predictive Value: TP / (TP + FP)

Slide 15: Sequence Encoding
1–20 for the twenty residues?
– Introduces a spurious algebraic correlation
Orthogonal (20-bit) encoding
– Best
– Frequently used with weight sharing
log₂ A bit coding
Equivalence classes
– OK if relevant to the question asked, not otherwise
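A sketch contrasting the 1–20 integer scheme with the orthogonal (one-hot, 20-bit) encoding for amino-acid sequences; the residue ordering and the example peptide are arbitrary choices made for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"        # the 20 residues in an arbitrary fixed order

def integer_encode(seq):
    """1-20 scheme: compact, but implies a spurious ordering/arithmetic between residues."""
    return [AMINO_ACIDS.index(ch) + 1 for ch in seq]

def orthogonal_encode(seq):
    """Orthogonal (one-hot, 20-bit) scheme: one bit per residue, no spurious correlation."""
    out = np.zeros((len(seq), len(AMINO_ACIDS)), dtype=int)
    for i, ch in enumerate(seq):
        out[i, AMINO_ACIDS.index(ch)] = 1
    return out

peptide = "MKV"                      # made-up 3-residue peptide
print(integer_encode(peptide))       # [11, 9, 18] -- but 'K' is not "less than" 'V' in any meaningful sense
print(orthogonal_encode(peptide))    # 3 x 20 binary matrix, 20 bits per residue
```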