
Lecture 8, CS567: Neural Network Concepts
– Weight Matrix vs. Neural Network
– Multilayer Perceptron (MLP)
– Network Architectures
– Overfitting
– Parameter Reduction
– Measures of Performance
– Sequence Encoding

Weight Matrices vs. Neural Networks
Weight Matrix (WM)
– Derived from a frequentist evaluation of the input
– For example, an information-theory-based weight matrix
– Not to be confused with the weights of a neural network
– Assumes independence of each position (no mutual information)
– Does not handle non-linearity
A single-layer perceptron (SLP) may be considered equivalent to a WM (How? See the sketch below.)
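One way to see the SLP/WM equivalence: scoring a one-hot-encoded sequence against a weight matrix is a single dot product, which is exactly what a single-layer perceptron with the matrix entries as its weights computes. The sketch below is illustrative only; the DNA alphabet, motif length, and log-odds values are made up, and NumPy is assumed.

```python
import numpy as np

# Minimal sketch (not from the slides): scoring with a position weight matrix
# is a linear operation, so a single-layer perceptron whose weights equal the
# matrix entries computes the same score. Alphabet and log-odds values are
# invented for illustration.
ALPHABET = "ACGT"

# Toy log-odds weight matrix, shape (positions, alphabet size)
wm = np.array([
    [ 1.0, -0.5, -0.5, -1.0],
    [-1.0,  1.2, -0.3, -0.5],
    [-0.5, -0.5,  1.1, -0.8],
    [-1.0, -0.2, -0.4,  1.3],
])

def one_hot(seq):
    """Encode a sequence as a (length, 4) one-hot matrix."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, ch in enumerate(seq):
        x[i, ALPHABET.index(ch)] = 1.0
    return x

def wm_score(seq):
    """Sum the per-position scores of the observed residues."""
    return sum(wm[i, ALPHABET.index(ch)] for i, ch in enumerate(seq))

def slp_score(seq):
    """A single 'neuron' whose weights are the flattened weight matrix (no bias)."""
    return float(one_hot(seq).flatten() @ wm.flatten())

seq = "ACGT"
print(wm_score(seq), slp_score(seq))  # identical: the WM is a linear model
```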

Multilayer Perceptron (MLP)
Multiple layers
– Hidden layers, possibly more than one, each containing multiple neurons
– The hidden layers transform a non-linear problem into a representation that is linearly separable at the output
Commonest form in use
Uses backpropagation to update the weights during training (summarized below)
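As a compact summary (the notation here is introduced for illustration and is not taken from the slides), the forward pass of a one-hidden-layer MLP and the weight update performed by backpropagation can be written as:

```latex
% One-hidden-layer MLP: forward pass and weight update
% (sigma denotes a sigmoid activation; notation introduced for illustration)
\begin{aligned}
  \mathbf{h} &= \sigma\!\left(W^{(1)}\mathbf{x} + \mathbf{b}^{(1)}\right)
      &&\text{hidden layer}\\
  \hat{y}    &= \sigma\!\left(\mathbf{w}^{(2)\top}\mathbf{h} + b^{(2)}\right)
      &&\text{output}\\
  E          &= \tfrac{1}{2}\,(y - \hat{y})^{2}
      &&\text{error on one example}\\
  w &\leftarrow w - \eta\,\frac{\partial E}{\partial w}
      &&\text{backpropagation update, learning rate } \eta
\end{aligned}
```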

A Classical Non-linear Problem
The tale of Kobe, Shaquille, and the Los Angeles Lakers
– Predicting the performance of a team with two star players who can't stand each other
– Case A (neither player plays): the team loses
– Case B (exactly one of the two plays): the team wins
– Case C (both play at the same time): the team loses
(Boggles the mind of the coach: “Why can’t they get along and have a win-win situation!?”)

The XOR Problem

Match    Player X    Player Y    Match Result
A        0           0           0
B        1           0           1
C        0           1           1
D        1           1           0

The XOR Problem
[Figure: the four cases plotted with Kobe (absent/present) and Shaquille (absent/present) as the two axes; the WIN cases and the LOSE cases lie on opposite diagonals, so no single linear boundary separates them. Labels in the original figure: LOSE, WIN, LR, DT.]
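To make the point concrete, the following is a minimal sketch (mine, not the lecture's code; NumPy assumed) of a one-hidden-layer MLP trained by backpropagation on the XOR table above. A single-layer perceptron cannot fit this mapping, but a small hidden layer can.

```python
import numpy as np

# Minimal sketch: one-hidden-layer MLP trained by backpropagation on XOR.
rng = np.random.default_rng(0)

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)  # cases A-D
y = np.array([[0], [1], [1], [0]], dtype=float)              # match results

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights: 2 inputs -> 4 hidden units -> 1 output
W1 = rng.normal(scale=1.0, size=(2, 4))
b1 = np.zeros(4)
W2 = rng.normal(scale=1.0, size=(4, 1))
b2 = np.zeros(1)
eta = 1.0  # learning rate

for _ in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)      # hidden activations, shape (4, 4)
    out = sigmoid(h @ W2 + b2)    # predictions, shape (4, 1)

    # Backward pass (squared error, sigmoid derivatives)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates
    W2 -= eta * h.T @ d_out
    b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_h
    b1 -= eta * d_h.sum(axis=0)

print(np.round(out, 2))  # typically close to [[0], [1], [1], [0]]
```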

Messy Problem Spaces
[Figure: a scatter of + and - examples with an irregular boundary between them, labelled "Neural Networks": the kind of messy, non-linear problem space a neural network can model.]

Network Architectures
Feedforward network
– No cycles
Recurrent network
– Loops present: the input of a layer includes a contribution from the output
– Temporal component: the input at time t incorporates a contribution from the output at time t-1
– The loop iterates until the output converges
– Typically, the final output is a vector representative of a class
(A minimal recurrent update is sketched below.)
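As a rough sketch (the sizes, weights, and names below are made up; the recurrence is shown for a fixed input and without training, and NumPy is assumed), a recurrent layer whose input includes the previous output can be iterated until the output converges:

```python
import numpy as np

# Rough illustration: a recurrent layer whose input at step t includes the
# output from step t-1, iterated until the output stops changing.
rng = np.random.default_rng(1)

n_in, n_out = 3, 4
W_in = rng.normal(scale=0.3, size=(n_out, n_in))    # input -> output weights
W_rec = rng.normal(scale=0.3, size=(n_out, n_out))  # output(t-1) -> output(t) weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.0, 1.0])   # a fixed external input
out = np.zeros(n_out)           # output at t = 0

for t in range(100):
    new_out = sigmoid(W_in @ x + W_rec @ out)  # output(t) depends on output(t-1)
    if np.max(np.abs(new_out - out)) < 1e-6:   # stop once the output has converged
        break
    out = new_out

print(t, np.round(out, 3))  # converged output, readable as a vector of class scores
```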

Overfitting
‘Cramming versus understanding’
Goal: derivation of a general model from the data (‘one shoe fits all’, not one shoe size for each person)
Pitfall: memorization of the mapping between data and target
– Likely with a large number of parameters
– Essentially, subsets of parameters exclusively map to individual training examples
– Because a general model is not learnt, the network performs poorly when presented with unknown data

Overfitting
Example: a maximum-likelihood estimate from a small set of data
Example: an NN to screen for unfair coins, using the results of several tosses as input
– Training examples: H=5, T=5; H=10, T=10
– Prediction for H=6, T=6 => “Not fair!”
– Instead of learning the rule (H ≈ T => fair coin), the network memorizes the specific cases (F(10,10) = F(5,5) = fair coin)
Possible because non-overlapping sets of parameters remember the mapping between each input and its target output
– “A conversation between an ‘intelligent’ computer and human beings based on memorized mappings”

Parameter Reduction
Overfitting may be avoided or minimized by striking a balance between reducing the number of parameters and capturing the maximum information from the input
Need to minimize
– Number of connections
– Number of independent weights

Parameter Reduction
Input transformations used to reduce the number of inputs per training example
– Use the H/T ratio instead of the combination of heads and tails, or worse, the actual permutation of heads and tails
– Generally, use compression techniques to encode the input
Priors used to exploit translational invariance
– Periodicity in the input, for example when every third element of the input is equally important
– Compensating for artificial differences in input representation, when combinations are important rather than permutations
Weight sharing
– Sets of edges share the same weight (see the sketch below)
– Minimizes overfitting, and also improves performance
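The sketch below (illustrative only; the window size, weights, and input are invented, and NumPy is assumed) shows weight sharing in its simplest form: the same small weight vector is applied at every position of the input, so the number of independent weights does not grow with the input length.

```python
import numpy as np

# Illustrative sketch of weight sharing across positions: the same 3-element
# weight vector is applied at every window of the input, so the layer has
# 3 independent weights instead of one weight per connection.
rng = np.random.default_rng(2)

shared_w = rng.normal(size=3)   # one weight vector, shared by all positions
b = 0.0

def shared_layer(x):
    """Apply the shared weights at every position (a simple 1-D convolution)."""
    return np.array([shared_w @ x[i:i + 3] + b for i in range(len(x) - 2)])

x = rng.normal(size=10)          # an input of length 10
print(shared_layer(x).shape)     # 8 outputs, but still only 3 weights + 1 bias

# Without weight sharing, a fully connected layer producing the same 8 outputs
# would need 10 * 8 = 80 independent weights.
```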

Measures of Performance
Let P = number of target (expected) positives and N = number of target negatives in the test suite
Let p = number of predicted positives and n = number of predicted negatives in the test suite
Then p = TP + FP and n = TN + FN, where TP and FN are subsets of P, and FP and TN are subsets of N

             Predicted positive    Predicted negative
Expected P   TP                    FN
Expected N   FP                    TN
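A small worked example (the labels and predictions below are invented, and NumPy is assumed) of counting TP, FP, TN, and FN and computing the measures defined on the next slide:

```python
import numpy as np

# Invented test suite: 4 expected positives (P) and 6 expected negatives (N).
expected  = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
predicted = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0])

TP = int(np.sum((expected == 1) & (predicted == 1)))
FN = int(np.sum((expected == 1) & (predicted == 0)))
FP = int(np.sum((expected == 0) & (predicted == 1)))
TN = int(np.sum((expected == 0) & (predicted == 0)))

sensitivity = TP / (TP + FN)   # recall: fraction of true positives recovered
specificity = TN / (TN + FP)   # fraction of true negatives recovered
precision   = TP / (TP + FP)   # positive predictive value

print(TP, FN, FP, TN)                        # 3 1 1 5
print(sensitivity, specificity, precision)   # 0.75, 0.833..., 0.75
```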

Measures of Performance
– Sensitivity / Recall: TP / (TP + FN)
– Specificity: TN / (TN + FP)
– Precision / Positive Predictive Value: TP / (TP + FP)

Sequence Encoding
– 1-20 for the twenty residues?
  – Introduces a spurious algebraic correlation between residues
– Orthogonal (20-bit) encoding
  – Best
  – Frequently used with weight sharing (see the sketch below)
– log2(A)-bit coding, for an alphabet of size A
– Equivalence classes
  – OK if relevant to the question asked, not otherwise
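As a rough illustration (the alphabet order and the example sequence are arbitrary choices, and NumPy is assumed), the 1-20 integer encoding and the orthogonal 20-bit encoding compare as follows:

```python
import numpy as np

# Illustrative sketch of the two encodings contrasted above.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # an arbitrary ordering of the 20 residues

def integer_encoding(seq):
    """1-20 per residue: compact, but implies a spurious numeric ordering."""
    return np.array([AMINO_ACIDS.index(ch) + 1 for ch in seq])

def orthogonal_encoding(seq):
    """20-bit one-hot per residue: no artificial ordering between residues."""
    x = np.zeros((len(seq), 20))
    for i, ch in enumerate(seq):
        x[i, AMINO_ACIDS.index(ch)] = 1.0
    return x

seq = "MKV"
print(integer_encoding(seq))           # [11  9 18]
print(orthogonal_encoding(seq).shape)  # (3, 20): one 20-bit row per residue
```

For comparison, a log2(A)-bit coding of the same alphabet would need ceil(log2(20)) = 5 bits per residue, at the cost of residues sharing bits.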