Neural Networks Review


Neural Networks Review 1

Neural Function Brain function (thought) occurs as the result of the firing of neurons. Neurons connect to each other through synapses, which propagate action potential (electrical impulses) by releasing neurotransmitters. Synapses can be excitatory (potential-increasing) or inhibitory (potential-decreasing), and have varying activation thresholds. Learning occurs as a result of the synapses' plasticity: they exhibit long-term changes in connection strength. There are about 10^11 neurons and about 10^14 synapses in the human brain! 2 Based on slide by T. Finin, M. desJardins, L. Getoor, R. Par

Biology of a Neuron 3

Brain Structure Different areas of the brain have different functions. Some areas seem to have the same function in all humans (e.g., Broca's region for motor speech); the overall layout is generally consistent. Some areas are more plastic and vary in their function; also, the lower-level structure and function vary greatly. We don't know how different functions are "assigned" or acquired: partly the result of the physical layout / connection to inputs (sensors) and outputs (effectors), partly the result of experience (learning). We really don't understand how this neural structure leads to what we perceive as "consciousness" or "thought". 4 Based on slide by T. Finin, M. desJardins, L. Getoor, R. Par

The "One Learning Algorithm" Hypothesis [Figure: somatosensory cortex and auditory cortex] Auditory cortex learns to see [Roe et al., 1992]. Somatosensory cortex learns to see [Metin & Frost, 1989]. 5 Based on slide by Andrew Ng

Neural Networks Origins: algorithms that try to mimic the brain. Very widely used in the '80s and early '90s; popularity diminished in the late '90s. Recent resurgence: state-of-the-art technique for many applications. Artificial neural networks are not nearly as complex or intricate as the actual brain structure. 8 Based on slide by Andrew Ng

Neural networks [Figure: layered feed-forward network with input units, hidden units, and output units] Neural networks are made up of nodes or units, connected by links. Each link has an associated weight and activation level. Each node has an input function (typically summing over weighted inputs), an activation function, and an output. 9 Based on slide by T. Finin, M. desJardins, L. Getoor, R. Par

Neuron Model: Logistic Unit 10 Based on slide by Andrew Ng
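
As a concrete illustration (a minimal NumPy sketch with made-up weights, not taken from the slides), a logistic unit computes a weighted sum of its inputs plus a bias and squashes it with the sigmoid function:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) activation: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, w, b):
    """Output of a single logistic neuron: g(w . x + b)."""
    return sigmoid(np.dot(w, x) + b)

# Illustrative inputs and weights (assumed values, for demonstration only)
x = np.array([1.0, 0.5, -2.0])
w = np.array([0.2, -0.4, 0.1])
print(logistic_unit(x, w, b=0.3))  # a value strictly between 0 and 1
```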

Neural Network 12 Slide by Andrew Ng

Feed-Forward Process Input layer units are set by some exterior function (think of these as sensors), which causes their output links to be activated at the specified level. Working forward through the network, the input function of each unit is applied to compute the input value; usually this is just the weighted sum of the activations on the links feeding into this node. The activation function transforms this input value into the node's final output; typically this is a nonlinear function, often a sigmoid corresponding to the "threshold" of that node. 13 Based on slide by T. Finin, M. desJardins, L. Getoor, R. Par
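
A rough sketch of that process (illustrative function names, assuming sigmoid activations and one weight matrix plus bias vector per layer):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, layers):
    """Work forward through the network one layer at a time.
    `layers` is a list of (W, b) pairs; W[j] holds the weights on the links
    feeding node j of that layer, and b[j] is its bias."""
    activation = x
    for W, b in layers:
        next_activation = np.empty(W.shape[0])
        for j in range(W.shape[0]):
            net_input = np.dot(W[j], activation) + b[j]   # input function: weighted sum
            next_activation[j] = sigmoid(net_input)       # activation function: sigmoid
        activation = next_activation
    return activation                                     # output layer activations
```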

Neural Network 14 Slide by Andrew Ng

Vectorization 15 Based on slide by Andrew Ng
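
The same forward pass can be vectorized so that each layer is a single matrix-vector product; a sketch assuming Andrew Ng's convention of folding the bias unit into each Θ matrix:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """Vectorized forward pass: a(l+1) = g(Theta(l) @ [1; a(l)]).
    `thetas` is the list of weight matrices Theta(1), ..., Theta(L-1)."""
    a = x
    for Theta in thetas:
        a = np.concatenate(([1.0], a))   # prepend the bias unit
        a = sigmoid(Theta @ a)           # z = Theta a, then elementwise sigmoid
    return a                             # h_Theta(x)
```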

Other Network Architectures 16

Multiple Output Units: One-vs-Rest Pedestrian Car Motorcycle Truck 17 Slide by Andrew Ng

Multiple Output Units: One-vs-Rest 18 Based on slide by Andrew Ng
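
With multiple output units, each training label is encoded as a one-hot target vector with one component per class; a small sketch using the four classes shown on the earlier slide:

```python
import numpy as np

classes = ["pedestrian", "car", "motorcycle", "truck"]

def one_hot(label):
    """Target vector y with a 1 in the position of the true class."""
    y = np.zeros(len(classes))
    y[classes.index(label)] = 1.0
    return y

print(one_hot("car"))  # [0. 1. 0. 0.]
```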

Neural Network Classification 19 Slide by Andrew Ng

Understanding Representations 20

Representing Boolean Functions 21 Based on slide and example by Andrew Ng

Representing Boolean Functions 22

Combining Representations to Create Non-Linear Functions 23 Based on example by Andrew Ng
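
A concrete version of these two slides (a sketch following Andrew Ng's classic weight values, which are illustrative rather than learned): single logistic units with large weights can represent AND, OR, and NOR, and composing them in two layers yields the non-linear XNOR function:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Single-unit Boolean functions (bias term first, then input weights)
def and_unit(x1, x2):  return sigmoid(-30 + 20 * x1 + 20 * x2)  # ~1 only if x1 = x2 = 1
def or_unit(x1, x2):   return sigmoid(-10 + 20 * x1 + 20 * x2)  # ~1 if either input is 1
def nor_unit(x1, x2):  return sigmoid( 10 - 20 * x1 - 20 * x2)  # (NOT x1) AND (NOT x2)

def xnor(x1, x2):
    """XNOR is not linearly separable, so it needs a hidden layer."""
    h1 = and_unit(x1, x2)   # hidden unit 1
    h2 = nor_unit(x1, x2)   # hidden unit 2
    return or_unit(h1, h2)  # output unit

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xnor(a, b)))  # 1 when the inputs agree, 0 otherwise
```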

Layering Representations [Figure: the pixels x1 ... x400 of a digit image arranged row by row] 20 × 20 pixel images, d = 400, 10 classes. Each image is "unrolled" into a vector x of pixel intensities.
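
The "unrolling" step is just a reshape; a minimal sketch:

```python
import numpy as np

image = np.random.rand(20, 20)   # stand-in for a 20 x 20 grayscale digit image
x = image.reshape(-1)            # unrolled vector of pixel intensities
print(x.shape)                   # (400,)  i.e., d = 400
```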

Layering Representations [Figure: input layer x1 ... xd, a hidden layer, and an output layer with one unit per digit "0" ... "9"] Visualization of the hidden layer.

LeNet 5 Demonstration: http://yann.lecun.com/exdb/lenet/

Neural Network Learning

Perceptron Learning Rule Equivalent to the intuitive rules: if the output is correct, don't change the weights; if the output is low (h(x) = 0, y = 1), increment the weights for all inputs which are 1; if the output is high (h(x) = 1, y = 0), decrement the weights for all inputs which are 1. Perceptron Convergence Theorem: if there is a set of weights that is consistent with the training data (i.e., the data is linearly separable), the perceptron learning algorithm will converge [Minsky & Papert, 1969]. 29

Batch Perceptron Simplest case: α = 1 and no normalization, which yields the fixed-increment perceptron. Each iteration of the outer loop is called an epoch. Based on slide by Alan Fern 30
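
A sketch of the batch (fixed-increment) perceptron consistent with the rules on the previous slide, assuming 0/1 labels and a bias feature x[0] = 1; this is a reconstruction, not the exact pseudocode from the slide:

```python
import numpy as np

def predict(w, x):
    """Threshold unit: h(x) = 1 if w . x > 0, else 0."""
    return 1 if np.dot(w, x) > 0 else 0

def batch_perceptron(X, y, epochs=100):
    """Fixed-increment batch perceptron (alpha = 1, no normalization):
    accumulate the updates over all examples, then apply them once per epoch."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):                 # each pass over the data is an epoch
        delta = np.zeros_like(w)
        for xi, yi in zip(X, y):
            h = predict(w, xi)
            delta += (yi - h) * xi          # 0 if correct, +xi if output low, -xi if high
        if not delta.any():                 # no net correction this epoch: stop
            break
        w += delta
    return w
```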

Learning in NN: Backpropagation Similar to the perceptron learning algorithm, we cycle through our examples If the output of the network is correct, no changes are made If there is an error, weights are adjusted to reduce the error The trick is to assess the blame for the error and divide it among the contributing weights 31 Based on slide by T. Finin, M. desJardins, L Getoor, R. Par

Cost Function Based on slide by Andrew Ng
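
The formula itself is an image on the original slide; for reference, the standard regularized cross-entropy cost from Andrew Ng's course (on which this slide is based), for n training examples, K output units, and layer sizes s_l, is:

```latex
J(\Theta) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K}
    \left[ y_k^{(i)} \log \left( h_\Theta(x^{(i)}) \right)_k
         + \left(1 - y_k^{(i)}\right) \log \left( 1 - \left( h_\Theta(x^{(i)}) \right)_k \right) \right]
  + \frac{\lambda}{2n} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}}
    \left( \Theta_{ji}^{(l)} \right)^2
```

The first term is the logistic-regression cost summed over the K output units; the second term regularizes all non-bias weights.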

Optimizing the Neural Network Based on slide by Andrew Ng

Forward Propagation Based on slide by Andrew Ng

Backpropagation Intuition Each hidden node j is "responsible" for some fraction of the error δ_j^(l) in each of the output nodes to which it connects. δ_j^(l) is divided according to the strength of the connection between the hidden node and the output node. Then, the "blame" is propagated back to provide the error values for the hidden layer. 35 Based on slide by T. Finin, M. desJardins, L. Getoor, R. Par

Backpropagation Intuition 36 Based on slide by Andrew Ng

Backpropagation Intuition Based on slide by Andrew Ng

Backpropagation Intuition Based on slide by Andrew Ng

Backpropagation Intuition Based on slide by Andrew Ng

Backpropagation Intuition 40 Based on slide by Andrew Ng

Backpropagation: Gradient Computation Based on slide by Andrew Ng 41
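
A minimal sketch of that computation for one training example in a network with a single hidden layer (sigmoid activations, bias units folded into the Θ matrices; function and variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_single_example(x, y, Theta1, Theta2):
    """Gradients of the (unregularized) cross-entropy cost w.r.t. Theta1, Theta2."""
    # Forward pass
    a1 = np.concatenate(([1.0], x))                     # input activations + bias unit
    a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))  # hidden activations + bias unit
    a3 = sigmoid(Theta2 @ a2)                           # output activations h_Theta(x)

    # Backward pass: output error, then hidden error weighted by connection strength
    delta3 = a3 - y                                             # "blame" at the output layer
    delta2 = (Theta2.T @ delta3)[1:] * a2[1:] * (1.0 - a2[1:])  # propagate back; drop bias row

    # Gradients are outer products of errors and the activations feeding each layer
    grad2 = np.outer(delta3, a2)
    grad1 = np.outer(delta2, a1)
    return grad1, grad2
```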

Backpropagation Based on slide by Andrew Ng

Training a Neural Network via Gradient Descent with Backprop Based on slide by Andrew Ng
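
Putting the pieces together, a plain batch gradient-descent loop over the backprop gradients might look like the following sketch (it reuses backprop_single_example from the sketch above and averages its per-example gradients):

```python
import numpy as np

def gradient_descent(X, Y, Theta1, Theta2, alpha=0.5, num_iters=500):
    """Batch gradient descent with gradients computed by backpropagation."""
    n = X.shape[0]
    for _ in range(num_iters):
        G1 = np.zeros_like(Theta1)
        G2 = np.zeros_like(Theta2)
        for xi, yi in zip(X, Y):                 # accumulate over all examples
            g1, g2 = backprop_single_example(xi, yi, Theta1, Theta2)
            G1 += g1
            G2 += g2
        Theta1 -= alpha * G1 / n                 # step downhill; J(Theta) should decrease
        Theta2 -= alpha * G2 / n
    return Theta1, Theta2
```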

Backprop Issues "Backprop is the cockroach of machine learning. It's ugly, and annoying, but you just can't get rid of it." – Geoff Hinton. Problems: black box; local minima.

Implementation Details

Random Initialization Important to randomize the initial weight matrices. We can't initialize all the weights to the same value (as we could, e.g., in logistic regression) – otherwise every hidden unit computes the same function, all updates are identical, and the net won't learn. Based on slide by Andrew Ng
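
A common recipe (a sketch of the usual approach, not necessarily the exact slide contents) draws each weight independently from a small symmetric interval [-ε, ε]:

```python
import numpy as np

def random_init(rows, cols, epsilon=0.12):
    """Break symmetry: each weight drawn uniformly at random from [-epsilon, epsilon]."""
    return np.random.uniform(-epsilon, epsilon, size=(rows, cols))

Theta1 = random_init(10, 401)   # e.g., 10 hidden units, 400 inputs + 1 bias (illustrative sizes)
```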

Implementation Details For convenience, compress all parameters into θ: "unroll" Θ(1), Θ(2), ..., Θ(L-1) into one long vector θ. E.g., if Θ(1) is 10 × 10, then the first 100 entries of θ contain the values in Θ(1). Use the reshape command to recover the original matrices; e.g., if Θ(1) is 10 × 10, then theta1 = reshape(theta[0:100], (10, 10)). At each step, check that J(θ) decreases. Implement a gradient-checking procedure to ensure that the gradient is correct... Based on slide by Andrew Ng
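
In NumPy the unroll/reshape round trip from this slide looks like the following sketch (the 10 × 10 size comes from the slide's example; the second matrix's size is made up):

```python
import numpy as np

Theta1 = np.random.rand(10, 10)   # size from the slide's example
Theta2 = np.random.rand(1, 11)    # illustrative size

# "Unroll" all parameter matrices into one long vector theta
theta = np.concatenate([Theta1.ravel(), Theta2.ravel()])

# Use reshape to recover the original matrices
theta1 = theta[0:100].reshape(10, 10)    # the first 100 entries hold Theta1
theta2 = theta[100:111].reshape(1, 11)
```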

Gradient Checking Idea: estimate gradient numerically to verify implementation, then turn off gradient checking Based on slide by Andrew Ng

Gradient Checking Check that the approximate numerical gradient matches the entries in the D matrices Based on slide by Andrew Ng
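
The numerical estimate is the symmetric (two-sided) difference applied to each parameter in turn; a minimal sketch, where J is the cost as a function of the unrolled parameter vector:

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Approximate dJ/dtheta_i as (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2*eps)."""
    grad_approx = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus = theta.copy()
        theta_plus[i] += eps
        theta_minus = theta.copy()
        theta_minus[i] -= eps
        grad_approx[i] = (J(theta_plus) - J(theta_minus)) / (2 * eps)
    return grad_approx

# Compare against the unrolled backprop gradients (DVec):
# np.allclose(numerical_gradient(J, theta), DVec, atol=1e-6)
```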

Implementation Steps Implement backprop to compute DVec – DVec is the unrolled {D(1), D(2), ...} matrices. Implement numerical gradient checking to compute gradApprox. Make sure DVec has values similar to gradApprox. Turn off gradient checking and use the backprop code for learning. Important: be sure to disable your gradient-checking code before training your classifier; if you run the numerical gradient computation on every iteration of gradient descent, your code will be very slow. 51 Based on slide by Andrew Ng

Putting It All Together 52

Training a Neural Network Pick a network architecture (connectivity pattern between nodes): # input units = # of features in the dataset; # output units = # of classes. Reasonable default: 1 hidden layer; or, if >1 hidden layer, use the same # of hidden units in every layer (usually the more the better). Based on slide by Andrew Ng

Training a Neural Network Randomly initialize the weights. Implement forward propagation to get h_Θ(x^(i)) for any instance x^(i). Implement code to compute the cost function J(Θ). Implement backprop to compute the partial derivatives. Use gradient checking to compare the partial derivatives computed using backpropagation against the numerical gradient estimate; then disable the gradient-checking code. Use gradient descent with backprop to fit the network. Based on slide by Andrew Ng

Any Questions?