Neural Networks An Introduction


Neural Networks An Introduction Kasin Prakobwaitayakit Department of Electrical Engineering Chiangmai University

Brain and Machine The Brain: pattern recognition, association, complexity, noise tolerance. The Machine: calculation, precision, logic.

The contrast in architecture The Von Neumann architecture uses a single processing unit: tens of millions of operations per second, absolute arithmetic precision. The brain uses many slow, unreliable processors acting in parallel.

Features of the Brain Ten billion neurons. Several thousand connections per neuron on average. Hundreds of operations per second. Low reliability: neurons die off frequently and are never replaced. Compensates for problems by massive parallelism.

The biological inspiration The brain has been extensively studied by scientists. Vast complexity prevents all but rudimentary understanding. Even the behavior of an individual neuron is extremely complex

The biological inspiration Single “percepts” are distributed among many neurons. Localized parts of the brain are responsible for certain well-defined functions (e.g. vision, motion). Which features are integral to the brain's performance? Which are incidentals imposed by the fact of biology?

The Structure of Neurones

The Structure of Neurones A neurone has a cell body, a branching input structure (the dendrite) and a branching output structure (the axon). Axons connect to dendrites via synapses. Electro-chemical signals are propagated from the dendritic input, through the cell body, and down the axon to other neurones.

The Structure of Neurones A neurone only fires if its input signal exceeds a certain amount (the threshold) in a short time period. Synapses vary in strength: good connections allow a large signal, slight connections allow only a weak signal. Synapses can be either excitatory or inhibitory.

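A Classic Artificial Neuron (1) [Figure: inputs a0, a1, a2, ..., an, with a0 fixed at +1, are weighted by wj0, wj1, wj2, ..., wjn and summed to give Sj; the output is Xj = f(Sj).]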

A Classic Artificial Neuron (2) All neurons contain an activation function which determines whether the signal is strong enough to produce an output. [Figure: several functions that could be used as an activation function.]

Learning When the output is calculated, the desired output is then given to the program so that it can modify the weights. After repeated modifications, the same inputs should produce outputs close to those desired. Formula: Weight N = Weight N + learning rate * (Desired Output - Actual Output) * Input N
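The weight update can be tried out on a single threshold unit. The following is a minimal sketch, not the presenter's program: it uses the common perceptron form of the rule, weight += rate * (desired - actual) * input, and the logical-AND training set, the learning rate of 1.0, the epoch count and all function names are assumptions chosen for illustration.

```python
def step(x):
    """Hard-limiting threshold: output 1 when the activation is non-negative."""
    return 1 if x >= 0 else 0


def predict(weights, inputs):
    """Output of one threshold unit; weights[0] belongs to a constant +1 input."""
    x = [1.0] + list(inputs)
    return step(sum(w * xi for w, xi in zip(weights, x)))


def train_perceptron(samples, rate=1.0, epochs=20):
    """Apply the update rule to every sample, for a fixed number of epochs.

    samples: list of (inputs, desired_output) pairs (hypothetical data).
    """
    weights = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(epochs):
        for inputs, desired in samples:
            x = [1.0] + list(inputs)
            error = desired - predict(weights, inputs)
            # weight_n <- weight_n + rate * (desired - actual) * input_n
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
    return weights


# Assumed toy problem: logical AND, which is linearly separable.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights = train_perceptron(AND_SAMPLES)
```

After training, `predict(and_weights, (1, 1))` fires and the other three input pairs do not. The weights only change when the actual output differs from the desired one, so training settles once every sample is classified correctly.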

Tractable Architectures Feedforward neural networks: connections in one direction only; partial biological justification. Complex models with constraints (Hopfield and ART): feedback loops included; complex behaviour, limited by constraining architecture.

Fig. 1: Multilayer Perceptron [Figure labels: input signals (external stimuli), input layer, adjustable weights, output layer, output values.]

Types of Layer The input layer: introduces input values into the network; no activation function or other processing. The hidden layer(s): perform classification of features. In theory, two hidden layers are sufficient to represent any decision region; in practice, complex features may mean that more layers work better.

Types of Layer (continued) The output layer. Functionally just like the hidden layers Outputs are passed on to the world outside the neural network.

A Simple Model of a Neuron [Figure: inputs y1, y2, y3, ..., yi reach the neuron through weights w1j, w2j, w3j, ..., wij and produce the output O.] Each neuron has a threshold value. Each neuron has weighted inputs from other neurons. The input signals form a weighted sum. If the activation level exceeds the threshold, the neuron “fires”.

An Artificial Neuron [Figure: inputs y1, y2, y3, ..., yi reach the neuron through weights w1j, w2j, w3j, ..., wij; the weighted sum passes through f(x) to give the output O.] Each hidden or output neuron has weighted input connections from each of the units in the preceding layer. The unit performs a weighted sum of its inputs, and subtracts its threshold value, to give its activation level. The activation level is passed through a sigmoid activation function to determine the output.

Mathematical Definition Number all the neurons from 1 up to N. The output of the j'th neuron is $o_j$. The threshold of the j'th neuron is $\theta_j$. The weight of the connection from unit i to unit j is $w_{ij}$. The activation of the j'th unit is $a_j$. The activation function is written as $f(x)$.

Mathematical Definition Since the activation $a_j$ is given by the sum of the weighted inputs minus the threshold, we can write: $a_j = \sum_i w_{ij} o_i - \theta_j$ and $o_j = f(a_j)$.
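The definition translates almost directly into code. This is a small sketch under stated assumptions: the function names are hypothetical, and the logistic function is used as f only as an example.

```python
import math


def logistic(x):
    """One possible choice for the activation function f(x)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(weights, inputs, threshold, f=logistic):
    """Return o_j = f(a_j), where a_j = sum_i(w_ij * o_i) - theta_j.

    weights:   the w_ij for the connections arriving at unit j
    inputs:    the outputs o_i of the units feeding unit j
    threshold: theta_j
    """
    activation = sum(w * o for w, o in zip(weights, inputs)) - threshold
    return f(activation)
```

For example, `neuron_output([0.5, -0.5], [1.0, 1.0], 0.0)` has activation 0 and therefore returns 0.5, the mid-point of the logistic curve.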

Activation functions Transform a neuron's input into its output. Features of activation functions: a squashing effect is required, which prevents accelerating growth of activation levels through the network; simple and easy to calculate; monotonically non-decreasing (order-preserving).

Standard activation functions The hard-limiting threshold function: corresponds to the biological paradigm (either fires or not). Sigmoid functions ('S'-shaped curves): the logistic function, $f(x) = \frac{1}{1 + e^{-ax}}$, and the hyperbolic tangent (symmetrical). Both functions have a simple derivative. Only the shape is important.
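The "simple derivative" property can be checked numerically. In this sketch the derivative of the logistic function is written as a * f(x) * (1 - f(x)) and that of the hyperbolic tangent as 1 - tanh(x)^2; the helper names and the finite-difference check are illustrative assumptions.

```python
import math


def logistic(x, a=1.0):
    """Logistic function f(x) = 1 / (1 + exp(-a * x))."""
    return 1.0 / (1.0 + math.exp(-a * x))


def logistic_derivative(x, a=1.0):
    """Derivative expressed through the function value itself."""
    fx = logistic(x, a)
    return a * fx * (1.0 - fx)


def tanh_derivative(x):
    """Derivative of the hyperbolic tangent, again via the function value."""
    return 1.0 - math.tanh(x) ** 2


def numeric_derivative(f, x, h=1e-6):
    """Central finite difference, used only to check the formulas above."""
    return (f(x + h) - f(x - h)) / (2.0 * h)
```

Because each derivative reuses the already computed output, a training procedure that needs gradients can obtain them cheaply.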

Training Algorithms Adjust neural network weights to map inputs to outputs. Use a set of sample patterns where the desired output (given the inputs presented) is known. The purpose is to learn to generalize: recognize features which are common to good and bad exemplars.

Back-Propagation A training procedure which allows multi-layer feedforward Neural Networks to be trained; Can theoretically perform “any” input-output mapping; Can learn to solve linearly inseparable problems.
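Exclusive-OR is the usual example of a linearly inseparable problem. The sketch below does not perform back-propagation; it only illustrates, with hand-picked weights and thresholds (all of them assumptions), that a feedforward network with one hidden layer of threshold units can represent XOR, which is the kind of mapping a training procedure for multi-layer networks has to find.

```python
def step(x):
    """Hard-limiting threshold unit: fires when the activation is positive."""
    return 1 if x > 0 else 0


def xor_net(x1, x2):
    """Two hidden threshold units feeding one output unit."""
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are on
    # Output fires for "OR but not AND", which is exclusive-OR.
    return step(h_or - h_and - 0.5)


xor_table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
```

No single threshold unit can produce this table, because no straight line separates the inputs (0, 1) and (1, 0) from (0, 0) and (1, 1).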

Activation functions and training For feedforward networks: a continuous, differentiable activation function makes gradient descent possible. Back-propagation is an example of a gradient-descent technique. This is the reason for the prevalence of the sigmoid.
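Gradient descent with a differentiable activation can be sketched on a single sigmoid unit. This is not full back-propagation, which applies the same chain rule layer by layer; the logical-OR data, the squared-error measure, the learning rate and the epoch count are assumptions made for the example.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, inputs):
    """Output of one sigmoid unit; weights[0] belongs to a constant +1 input."""
    x = [1.0] + list(inputs)
    return logistic(sum(w * xi for w, xi in zip(weights, x)))


def squared_error(weights, samples):
    """E = 1/2 * sum over samples of (output - target)^2."""
    return 0.5 * sum((forward(weights, x) - t) ** 2 for x, t in samples)


def gradient_descent(samples, rate=0.5, epochs=2000):
    """Batch gradient descent on the squared error of one sigmoid unit."""
    weights = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for inputs, target in samples:
            out = forward(weights, inputs)
            # Chain rule: dE/da = (out - target) * f'(a), f'(a) = out * (1 - out)
            delta = (out - target) * out * (1.0 - out)
            for n, xi in enumerate([1.0] + list(inputs)):
                grad[n] += delta * xi
        # Move each weight a small step against its gradient.
        weights = [w - rate * g for w, g in zip(weights, grad)]
    return weights


# Assumed toy problem: logical OR.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
initial_error = squared_error([0.0, 0.0, 0.0], OR_SAMPLES)
or_weights = gradient_descent(OR_SAMPLES)
final_error = squared_error(or_weights, OR_SAMPLES)
```

With a hard-limiting threshold the term f'(a) would be zero almost everywhere, so no gradient would be available; the sigmoid keeps the same overall shape while remaining differentiable.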

Training versus Analysis Understanding how the network is doing what it does. Predicting behaviour under novel conditions.

Applications The properties of neural networks define where they are useful. Can learn complex mappings from inputs to outputs, based solely on samples. Difficult to analyze: firm predictions about neural network behaviour are difficult, so they are unsuitable for safety-critical applications. Require limited understanding from the trainer, who can be guided by heuristics.

Engine management The behaviour of a car engine is influenced by a large number of parameters: temperature at various points, fuel/air mixture, lubricant viscosity. A major company have used neural networks to dynamically tune an engine depending on current settings.

Signature recognition Each person's signature is different. There are structural similarities which are difficult to quantify. One company have manufactured a machine which recognizes signatures to within a high level of accuracy. Considers speed in addition to gross shape. Makes forgery even more difficult.

Sonar target recognition Distinguish mines from rocks on the sea-bed. The neural network is provided with a large number of parameters which are extracted from the sonar signal. The training set consists of sets of signals from rocks and mines.

Stock market prediction “Technical trading” refers to trading based solely on known statistical parameters, e.g. previous price. Neural networks have been used to attempt to predict changes in prices. Difficult to assess success since companies using these techniques are reluctant to disclose information.

Mortgage assessment Assess risk of lending to an individual. Difficult to decide on marginal cases. Neural networks have been trained to make decisions, based upon the opinions of expert underwriters. Neural network produced a 12% reduction in delinquencies compared with human experts.

Types of Problem Pattern classification: assign patterns to one of two or more classes. Regression: predict the value of a continuous variable.

Pattern Classification Decide which class a particular pattern belongs to. In the most common case, there are only two classes. This implies that the neural network is modelling a step-function. The most common use of neural networks.

Pattern Classification A feature is a measurement of some kind (a real number) and corresponds to an input of the neural network. A pattern is called a feature vector: a point in N-dimensional space. Classification bifurcates the feature space. The division is based on sample patterns.

Decision boundaries In simple cases, divide feature space by drawing a hyper-plane across it. Known as a decision boundary. Discriminant function: returns different values on opposite sides. Problems which can be thus classified are linearly separable.

Linear Separability [Figure: points of class A and class B plotted against axes X1 and X2, separated by a straight decision boundary.]

Nearest neighbour A new pattern is assigned the same class as its nearest neighbour. Can be improved by taking the k nearest neighbours and assigning the majority class.
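A short sketch of the nearest-neighbour rule follows. The training points, the labels and the function name are hypothetical; Euclidean distance is assumed as the distance measure.

```python
import math
from collections import Counter


def knn_classify(training, pattern, k=1):
    """Assign the majority class among the k nearest training patterns.

    training: list of (feature_vector, label) pairs.
    """
    by_distance = sorted(training, key=lambda item: math.dist(item[0], pattern))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical data: class A near the origin, class B further out,
# plus one stray B pattern sitting inside the A region.
TRAINING = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((4.0, 4.0), "B"), ((4.0, 5.0), "B"), ((5.0, 4.0), "B"),
    ((2.0, 2.0), "B"),
]
new_pattern = (2.1, 2.1)
class_k1 = knn_classify(TRAINING, new_pattern, k=1)
class_k3 = knn_classify(TRAINING, new_pattern, k=3)
```

With k = 1 the stray pattern decides the outcome, while with k = 3 the two surrounding A patterns outvote it, which is the improvement the majority vote is meant to provide.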

Hyper-plane partitions A single Perceptron (i.e. output unit) with connections from each input can perform, and learn, a linear separation. Perceptrons have a step function activation. Units with a sigmoid activation also act as a linear discriminant, if interpreted correctly. Use activation mid-point

Hyper-plane partitions An extra layer models a convex hull (“an area with no dents in it”). A Perceptron can model this, but can't learn it; the sigmoid function allows learning of convex hulls. Two layers add convex hulls together, which is sufficient to classify anything “sane”. In theory, further layers add nothing. In practice, extra layers may be better.

Different Non-Linearly Separable Problems Types of decision regions by structure. Single-layer: half plane bounded by a hyperplane. Two-layer: convex open or closed regions. Three-layer: arbitrary (complexity limited by number of nodes). [Figure columns for each structure: Exclusive-OR problem, classes with meshed regions, most general region shapes, each showing regions labelled A and B.]

Over-training With sufficient nodes, a network can classify any training set exactly, but may have poor generalisation ability. Cross-validation with some patterns: typically 50% of the training patterns are held out. The validation set error is checked each epoch. Stop training if the validation error goes up.
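The stopping rule can be sketched as a function over the validation errors recorded at each epoch. The `patience` parameter, which tolerates a few epochs without improvement before stopping, is an assumption added for illustration; `patience=1` corresponds to stopping as soon as the validation error fails to improve.

```python
def early_stopping_epoch(validation_errors, patience=1):
    """Return the index of the epoch whose weights should be kept.

    validation_errors: error on the held-out validation set after each epoch.
    patience: epochs without improvement tolerated before stopping (assumed).
    """
    best_epoch, best_error, bad_epochs = 0, float("inf"), 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error, bad_epochs = epoch, error, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation error stopped improving: stop training
    return best_epoch
```

In a real training loop the errors would be computed one epoch at a time and the weights from the best epoch saved; the list form is used here only to keep the example self-contained.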

Training time How many epochs of training? Stop if the error fails to improve (has reached a minimum). Stop if the rate of improvement drops below a certain level. Stop if the error reaches an acceptable level. Stop when a certain number of epochs have passed.

Rugby players & Ballet dancers [Figure: height (m), from 1 to 2, plotted against weight (kg), from 50 to 100; one region is marked “Ballet?”.]

Clustering K means Use Euclidean distance ||x - mean||. Randomly assign each point to 1 of K sets. Calculate the mean vector of each set. Reassign points to the set with the closest mean vector. Repeat until no further changes. Caution: scaling inputs is important.
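The K-means steps listed above can be sketched as follows. The function names, the fixed random seed, the iteration cap and the handling of an empty set (re-seeding its mean with a randomly chosen point) are assumptions that the slide does not specify.

```python
import math
import random


def mean_vector(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    """Return (labels, means) following the steps on the slide."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]   # random initial sets
    means = []
    for _ in range(max_iter):
        means = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption: an empty set gets a randomly chosen point as its mean.
            means.append(mean_vector(members) if members else rng.choice(points))
        # Reassign every point to the set with the closest mean vector.
        new_labels = [min(range(k), key=lambda c: math.dist(p, means[c]))
                      for p in points]
        if new_labels == labels:                  # no further changes
            break
        labels = new_labels
    return labels, means


# Hypothetical data: two loose groups of 2-D points.
POINTS = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, means = k_means(POINTS, k=2)
```

Because the distance is Euclidean, an input measured in large units dominates the result; rescaling each input to a comparable range before clustering addresses the caution about scaling.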

MLPs versus RBFs MLPs separate classes using hyper-planes; RBFs separate classes using hyper-spheres. MLPs may have one or more hidden layers; RBFs have just one. [Figure: two panels with axes X1 and X2, one for the MLP and one for the RBF.]

MLPs versus RBFs (2) MLPs use distributed learning, RBFs use localized learning. RBFs usually require more hidden units to model the problem. RBFs are said to be more robust for novel data and they train faster. However, RBFs can suffer from the “curse of dimensionality”.

3-Class Problem Classifying New Data using MLP [Figure legend: Class A, Class B, Class C, new data (unknown class), decision boundary.]

3-Class Problem Classifying New Data using RBFN [Figure legend: Class A, Class B, Class C, new data (unknown class), decision boundary.]

Radial Basis Function Architecture Centre pattern, stored in the 1st layer weights. Distance measure, determines how far an input pattern is from the centre. Gaussian transfer function. [Figure labels, from bottom to top: inputs, 1st layer weights, radial units, 2nd layer weights, outputs.]

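A Radial Basis Function Neuron [Figure: radial unit k with inputs X1, X2, X3, ..., Xn, centre weights ck1, ck2, ck3, ..., ckn, a constant +1 input, and output vk.] Euclidean summation: $I_k = \lVert X - c_k \rVert$. Transfer function: $v_k = \varphi(I_k)$.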

RBF 3 Stage Training 1. Find cluster centres by e.g. K-means clustering. 2. Find the width of the function (deviation) e.g. K-nearest neighbours. 3. Supervised training phase. Adjust 2nd layer weights to map input patterns onto the known output values.

Selecting Radial Centres Radial sampling (or sub-sampling): randomly select centres from the training points. K-means centre assignment: k-means clustering. Kohonen Self-Organising Maps: competitive learning (ref: Kohonen lecture).

Calculating Centre Widths Explicit assignment. K nearest neighbours: need to specify the size of K. Isotropic: determined by the number of centres and how spread out they are, where d is the distance between the most distant centres and k is the number of centres.

Supervised phase Optimize the second layer weights using the known outputs: linear optimization using the pseudo-inverse. Can use backpropagation, quick propagation or delta-bar-delta instead.
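The pseudo-inverse step can be sketched with NumPy. The Gaussian form exp(-d^2 / (2 * width^2)), the shared width, the extra constant +1 column for an output bias, the function names and the XOR example with two hand-placed centres are all assumptions for illustration; in practice the centres and widths would come from the earlier training stages.

```python
import numpy as np


def rbf_design_matrix(X, centres, width):
    """Radial-unit activations for each pattern, plus a constant +1 column."""
    # Euclidean distances ||x - c_k||, shape (n_patterns, n_centres).
    dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    phi = np.exp(-(dists ** 2) / (2.0 * width ** 2))   # Gaussian transfer function
    return np.hstack([phi, np.ones((X.shape[0], 1))])


def fit_output_weights(X, targets, centres, width):
    """Least-squares second-layer weights via the Moore-Penrose pseudo-inverse."""
    return np.linalg.pinv(rbf_design_matrix(X, centres, width)) @ targets


def rbf_predict(X, centres, width, weights):
    """Network output: weighted sum of the radial-unit activations."""
    return rbf_design_matrix(X, centres, width) @ weights


# Assumed toy problem: XOR with two centres placed by hand.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
targets = np.array([0.0, 1.0, 1.0, 0.0])
centres = np.array([[0.0, 0.0], [1.0, 1.0]])
weights = fit_output_weights(X, targets, centres, width=1.0)
outputs = rbf_predict(X, centres, 1.0, weights)
```

Since the output is linear in the second-layer weights, this fit is a single linear-algebra step rather than an iterative search, which is one reason RBF networks can train quickly.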

Number of Patterns A simple formula gives a reasonable guideline: use w/e training patterns, where w = number of weights and e = desired accuracy. Train until the error is less than e/2 on the training set. The mathematical justification is quite complex (omitted!). In practice, can RARELY meet this criterion.
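A quick calculation shows why the guideline is hard to meet. The sketch assumes that e is expressed as an error fraction (for example 0.1), that each unit has one extra bias weight, and that the network is fully connected between adjacent layers; the layer sizes are hypothetical.

```python
import math


def count_weights(layer_sizes):
    """Weights in a fully connected feedforward net, one bias weight per unit."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def patterns_needed(n_weights, e):
    """Guideline from the slide: use w / e training patterns."""
    return math.ceil(n_weights / e)


# Hypothetical network: 10 inputs, 5 hidden units, 1 output.
w = count_weights([10, 5, 1])        # (10 + 1) * 5 + (5 + 1) * 1 = 61
n_patterns = patterns_needed(w, 0.1)
```

Even this small network calls for about 610 training patterns at e = 0.1, and the requirement grows in proportion to the number of weights.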

Advantage & Disadvantage Advantages: pattern recognition; solves problems with many inputs; damage to the network does not ruin the output completely. Disadvantages: slow compared to conventional computers; black-box model.

Conclusion True AI can be achieved, but our limited understanding of the human brain and a lack of technology do not yet enable us to study this field further. It is highly possible that androids like Data could be created. What would happen if the androids were superior to us?