
1 Human Brain The brain is a highly complex, non-linear, and parallel computer, composed of some 10^11 neurons that are densely connected (~10^4 connections per neuron). We have just begun to understand how the brain works... A neuron is much slower (~10^-3 s) than a silicon logic gate (~10^-9 s); however, the massive interconnection between neurons makes up for the comparatively slow rate.

2 Human Brain Plasticity: Some of the neural structure of the brain is present at birth, while other parts are developed through learning, especially in early stages of life, to adapt to the environment (new inputs).

3 Biological Neuron

4 Biological Neuron
dendrites: nerve fibres carrying electrical signals to the cell
cell body: computes a non-linear function of its inputs
axon: single long fibre that carries the electrical signal from the cell body to other neurons
synapse: the point of contact between the axon of one cell and the dendrite of another, regulating a chemical connection whose strength affects the input to the cell

5 Inspiration from Neurobiology
A neuron is a many-inputs / one-output unit; its output can be excited or not excited, and the incoming signals from other neurons determine whether the neuron shall excite ("fire"). The output is subject to attenuation in the synapses, which are junction parts of the neuron.
Inputs: dendrites
Processing: soma
Outputs: axons
Synapses: electrochemical contact between neurons
Basically, a biological neuron receives inputs from other sources, combines them in some way, performs a generally nonlinear operation on the result, and then outputs the final result.

6 A Neuron on a microchip

7 A Biological Neuron

8 A Photomicrograph of Neurons

9 Biological Neuro-Signal

10 Neural networks
Neural network: an information processing paradigm inspired by biological nervous systems, such as our brain
Structure: a large number of highly interconnected processing elements (neurons) working together
Like people, they learn from experience (by example)

11 Neural networks Neural networks are configured for a specific application, such as pattern recognition or data classification, through a learning process. In a biological system, learning involves adjustments to the synaptic connections between neurons; the same holds for artificial neural networks (ANNs).

12 Where can neural network systems help
when we can't formulate an algorithmic solution
when we can get lots of examples of the behavior we require ('learning from experience')
when we need to pick out the structure from existing data

13 Complicated Example: Categorising Vehicles
Input to function: pixel data from vehicle images
Output: a number: 1 for a car; 2 for a bus; 3 for a tank
[Figure: four example images, each shown with its INPUT pixels and the corresponding OUTPUT label, e.g. OUTPUT = 1 for the car.]

14 Artificial Neuron – Feed Forward
[Diagram of neuron i: (1) summation of the weighted inputs, followed by (2) a transfer function producing the output.]
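The feed-forward step described above ((1) summation, (2) transfer) can be sketched in Python; the weights, inputs, and the hard-threshold transfer function here are illustrative assumptions, not values from the slides.

```python
def summation(weights, inputs):
    """Step (1): weighted sum of the neuron's inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

def step_transfer(s, threshold=0.0):
    """Step (2): a hard-threshold transfer function."""
    return 1 if s > threshold else -1

weights = [0.5, -0.3, 0.8]   # hypothetical weights
inputs = [1, 1, -1]
s = summation(weights, inputs)    # 0.5 - 0.3 - 0.8 = -0.6
output = step_transfer(s)         # -0.6 <= 0, so the neuron does not fire
```

Later slides swap the step function for smoother transfer functions such as the sigmoid.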

15 Transfer Functions

16 Artificial Neuron – Error Backward
[Diagram: the error signal propagated backward through neuron i.]

17 Perceptron
[Diagram: input layer X1, X2, X3 feeding a single output-layer unit Y.]

18 Perceptron (cont.) The perceptron was introduced by Rosenblatt in 1957.
He introduced the idea of training. A perceptron is a linear threshold gate. Given a classification problem, try to find a perceptron that fits it:

19 Perceptron – Feed Forward
[Diagram of neuron i, as on slide 14: (1) summation of the weighted inputs, then (2) a transfer function.]

20 Perceptron – Error Backward
[Diagram: the error signal propagated backward through neuron i.]

21 Perceptron : Weight Adjustment
The Perceptron learning rule: If the perceptron gives the correct answer, do nothing If the perceptron gives the wrong answer, adjust the weights and threshold “in the right direction”, so that it eventually gives the right answer.

22 Perceptron : Training Algorithm

23 Perceptron : Error value

24 Perceptron : Learning Rate η
η is the rate at which the training rule converges toward the correct solution. Typically η <= 1. Too small an η produces slow convergence; too large an η can cause oscillations in the process.

25 Multi-layer Perceptron
[Diagram: input layer (layer S-1), hidden layer (layer S), output layer (layer S+1).]

26 Function Learning
Map categorisation learning to a numerical problem:
Each category given a number
Or a range of real valued numbers (e.g., )
Function learning examples:
Input = 1, 2, 3, 4; Output = 1, 4, 9, 16. Here the concept to learn is squaring integers.
Input = [1,2,3], [2,3,4], [3,4,5], [4,5,6]; Output = 1, 5, 11, 19. Here the concept is [a,b,c] -> a*c - b. The calculation is more complicated than in the first example.
Neural networks: the calculation is much more complicated in general, but it is still just a numerical calculation.

27 Example Perceptron Categorisation of 2x2 pixel black & white images
Into “bright” and “dark”
Representation of this rule:
If it contains 2, 3 or 4 white pixels, it is “bright”
If it contains 0 or 1 white pixels, it is “dark”
Perceptron architecture:
Four input units, one for each pixel
Input encoding: +1 for white, -1 for dark

28 Example Perceptron
Example calculation: x1 = -1, x2 = 1, x3 = 1, x4 = -1
S = 0.25*(-1) + 0.25*(1) + 0.25*(1) + 0.25*(-1) = 0
0 > -0.1, so the output from the ANN is +1
So the image is categorised as “bright”
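The slide-28 calculation can be checked directly; the equal weights of 0.25 and the threshold of -0.1 are taken as reconstructed from the figures above.

```python
# Perceptron forward pass for the 2x2 bright/dark example.
weights = [0.25, 0.25, 0.25, 0.25]   # equal weights, per the slide's figures
threshold = -0.1
pixels = [-1, 1, 1, -1]              # x1..x4: -1 = dark pixel, +1 = white pixel

s = sum(w * x for w, x in zip(weights, pixels))   # weighted sum S = 0
output = 1 if s > threshold else -1               # 0 > -0.1, so output is +1
# +1 means the image is categorised as "bright"
```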

29 Worked Example Return to the “bright” and “dark” example
Use a learning rate of η = 0.1
Suppose we have set random weights: w0 = -0.5, w1 = 0.7, w2 = -0.2, w3 = 0.1, w4 = 0.9

30 Worked Example Use this training example, E, to update weights:
Here, x1 = -1, x2 = 1, x3 = 1, x4 = -1 as before Propagate this information through the network: S = (-0.5 * 1) + (0.7 * -1) + (-0.2 * +1) + (0.1 * +1) + (0.9 * -1) = -2.2 Hence the network outputs o(E) = -1 But this should have been “bright”=+1 So t(E) = +1

31 Calculating the Error Values
Δ0 = η(t(E)-o(E))x0 = 0.1 * (1 - (-1)) * (1) = 0.1 * (2) = 0.2
Δ1 = η(t(E)-o(E))x1 = 0.1 * (1 - (-1)) * (-1) = 0.1 * (-2) = -0.2
Δ2 = η(t(E)-o(E))x2 = 0.1 * (1 - (-1)) * (1) = 0.2
Δ3 = η(t(E)-o(E))x3 = 0.1 * (1 - (-1)) * (1) = 0.2
Δ4 = η(t(E)-o(E))x4 = 0.1 * (1 - (-1)) * (-1) = -0.2

32 Calculating the New Weights

33 New Look Perceptron
Calculate for the example, E, again:
S = (-0.3 * 1) + (0.5 * -1) + (0 * +1) + (0.3 * +1) + (0.7 * -1) = -1.2
Still gets the wrong categorisation, but the value is closer to zero (from -2.2 to -1.2)
In a few epochs' time, this example will be correctly categorised
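The worked example on slides 29-33 can be verified numerically: one update with η = 0.1 moves S from -2.2 to -1.2. The sign threshold at 0 for the output is a simplifying assumption (the exact threshold does not change this example).

```python
# One perceptron update, reproducing the slides' worked example.
eta = 0.1
weights = [-0.5, 0.7, -0.2, 0.1, 0.9]   # w0 (bias weight) .. w4
x = [1, -1, 1, 1, -1]                    # x0 = +1 (bias input), then x1..x4
target = 1                               # the image should be "bright"

s = sum(w * xi for w, xi in zip(weights, x))     # -2.2
o = 1 if s > 0 else -1                           # -1: wrong answer
# Perceptron rule: w_i <- w_i + eta * (t - o) * x_i
weights = [w + eta * (target - o) * xi for w, xi in zip(weights, x)]
s_new = sum(w * xi for w, xi in zip(weights, x)) # -1.2: closer to zero
```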

34 Boolean Functions
Take in two inputs (-1 or +1) and produce one output (-1 or +1); in other contexts, 0 and 1 are used
Example: the AND function produces +1 only if both inputs are +1
Example: the OR function produces +1 if either input is +1
Related to the logical connectives from F.O.L.

35 Boolean Functions as Perceptrons
Problem: the XOR boolean function
Produces +1 only if the inputs are different
Cannot be represented as a perceptron, because it is not linearly separable
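A small sketch makes the contrast concrete: AND and OR are computable by a single linear threshold gate (the weights below are one standard choice, not from the slides), while a brute-force search over a grid of weights finds no perceptron for XOR.

```python
def perceptron(w1, w2, theta, x1, x2):
    """Linear threshold gate on two inputs in {-1, +1}."""
    return 1 if w1 * x1 + w2 * x2 > theta else -1

cases = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
AND = {c: (1 if c == (1, 1) else -1) for c in cases}
OR  = {c: (-1 if c == (-1, -1) else 1) for c in cases}
XOR = {c: (1 if c[0] != c[1] else -1) for c in cases}

def fits(table, w1, w2, theta):
    return all(perceptron(w1, w2, theta, *c) == table[c] for c in cases)

and_ok = fits(AND, 1, 1, 1.5)    # w1 = w2 = 1, threshold 1.5 implements AND
or_ok = fits(OR, 1, 1, -0.5)     # threshold -0.5 implements OR

grid = [i / 4 for i in range(-8, 9)]        # weights/thresholds -2.0 .. 2.0
xor_found = any(fits(XOR, w1, w2, t)
                for w1 in grid for w2 in grid for t in grid)
# xor_found stays False: XOR is not linearly separable
```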

36 Linearly Separable Boolean Functions
Can use a line (dotted) to separate +1 and -1
Think of the line as representing the threshold
The angle of the line is determined by the two weights in the perceptron
The y-axis crossing is determined by the threshold

37 Linearly Separable Functions
Result extends to functions taking many inputs And outputting +1 and –1 Also extends to higher dimensions for outputs

38 Typical Activation Functions
F(x) = 1 / (1 + e^(-k·Σ wixi)), shown for k = 0.5, 1 and 10
Using a nonlinear function which approximates a linear threshold allows a network to approximate nonlinear functions
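The slide's activation can be evaluated for the three k values it mentions; larger k makes the sigmoid approach a hard threshold.

```python
import math

def sigmoid(s, k=1.0):
    """F = 1 / (1 + e^(-k*S)), where S is the weighted input sum."""
    return 1.0 / (1.0 + math.exp(-k * s))

vals = {k: sigmoid(2.0, k) for k in (0.5, 1, 10)}
# sigmoid(0, k) is 0.5 for every k; as k grows,
# sigmoid(2.0, k) climbs toward 1 (the "fire" side of the threshold)
```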

39 Learning performance Network architecture Learning method:
Unsupervised Reinforcement learning Backpropagation

40 Unsupervised learning
No help from the outside: no training data, no information available on the desired output
Learning by doing
Used to pick out structure in the input:
Clustering
Reduction of dimensionality (compression)
Example: Kohonen's Learning Law
The hidden neurons must find a way to organize themselves without help from the outside. In this approach, no sample outputs are provided to the network against which it can measure its predictive performance for a given vector of inputs. This is learning by doing.

41 Competitive learning: example
Example: Kohonen network
Winner takes all: only update the weights of the winning neuron
Topics: network topology, training patterns, activation rule, neighbourhood, learning

42 Reinforcement learning
A teacher scores the performance of the training examples (rather than supplying the desired outputs)
Use the performance score to shuffle the weights 'randomly'
Relatively slow learning due to the 'randomness'

43 Back propagation
The desired output of the training examples is known
Error = difference between actual & desired output
Change each weight relative to the error size
Calculate the output layer error, then propagate it back to the previous layer
Improved performance; very common!

44 Applications
Prediction: learning from past experience
pick the best stocks in the market
predict weather
identify people with cancer risk
Classification
image processing
predict bankruptcy for credit card companies
risk assessment

45 Applications
Recognition
Pattern recognition: SNOOPE (bomb detector in U.S. airports)
Character recognition
Handwriting: processing checks
Data association
Not only identify the characters that were scanned, but identify when the scanner is not working properly

46 Applications
Data conceptualization: infer grouping relationships, e.g. extract from a database the names of those most likely to buy a particular product
Data filtering: e.g. take the noise out of a telephone signal, signal smoothing
Planning: unknown environments, noisy sensor data; a fairly new approach to planning

47 Strengths of a Neural Network
Power: models complex functions, with nonlinearity built into the network
Ease of use: learns by example; very little user domain-specific expertise needed
Intuitively appealing: based on a model of biology; will it lead to genuinely intelligent computers/robots?
Neural networks cannot do anything that cannot be done using traditional computing techniques, BUT they can do some things which would otherwise be very difficult.

48 General Advantages
Advantages:
Adapt to unknown situations
Robustness: fault tolerance due to network redundancy
Autonomous learning and generalization
Disadvantages:
Not exact
Large complexity of the network structure
For motion planning?

49 Applications
Aerospace: high-performance aircraft autopilots, flight path simulations, aircraft control systems, autopilot enhancements, aircraft component simulations, aircraft component fault detectors
Automotive: automobile automatic guidance systems, warranty activity analyzers
Banking: check and other document readers, credit application evaluators
Defense: weapon steering, target tracking, object discrimination, facial recognition, new kinds of sensors, sonar, radar and image signal processing including data compression, feature extraction and noise suppression, signal/image identification
Electronics: code sequence prediction, integrated circuit chip layout, process control, chip failure analysis, machine vision, voice synthesis, nonlinear modeling

50 Applications
Financial: real estate appraisal, loan advisor, mortgage screening, corporate bond rating, credit line use analysis, portfolio trading program, corporate financial analysis, currency price prediction
Manufacturing: manufacturing process control, product design and analysis, process and machine diagnosis, real-time particle identification, visual quality inspection systems, beer testing, welding quality analysis, paper quality prediction, computer chip quality analysis, analysis of grinding operations, chemical product design analysis, machine maintenance analysis, project bidding, planning and management, dynamic modeling of chemical process systems
Medical: breast cancer cell analysis, EEG and ECG analysis, prosthesis design, optimization of transplant times, hospital expense reduction, hospital quality improvement, emergency room test advisement

51 Applications
Robotics: trajectory control, forklift robot, manipulator controllers, vision systems
Speech: speech recognition, speech compression, vowel classification, text-to-speech synthesis
Securities: market analysis, automatic bond rating, stock trading advisory systems
Telecommunications: image and data compression, automated information services, real-time translation of spoken language, customer payment processing systems
Transportation: truck brake diagnosis systems, vehicle scheduling, routing systems

52 Properties of ANNs
Learning from examples: labeled or unlabeled
Adaptivity: changing the connection strengths to learn things
Non-linearity: the non-linear activation functions are essential
Fault tolerance: if one of the neurons or connections is damaged, the whole network still works quite well

53 Artificial Neuron Model
[Diagram of neuron i: inputs x1 … xm with synaptic weights wi1 … wim and bias bi (clamped input x0 = +1) feed a summation Σ, whose result passes through the activation function f to give the output ai.]

54 ai = f(ni) = f(Σj=1..n wij·xj + bi)
An artificial neuron:
- computes the weighted sum of its inputs, and
- if that value exceeds its "bias" (threshold),
- it "fires" (i.e. becomes active)

55 Bias
Bias can be incorporated as another weight clamped to a fixed input of +1.0. This extra free variable (bias) makes the neuron more powerful.
ai = f(ni) = f(Σj=0..n wij·xj), with wi0 = bi and x0 = +1
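The bias-folding trick on this slide can be checked numerically; the particular weights, bias, and inputs below are illustrative assumptions.

```python
def activate(weights, inputs, bias):
    """Explicit bias: sum(w_j * x_j) + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def activate_folded(weights_with_bias, inputs):
    """Bias folded in: prepend w_0 = b and clamp x_0 = +1."""
    return sum(w * x for w, x in zip(weights_with_bias, [1.0] + inputs))

w, b = [0.4, -0.6], 0.25     # hypothetical weights and bias
x = [1.0, 2.0]
a1 = activate(w, x, b)               # 0.4 - 1.2 + 0.25 = -0.55
a2 = activate_folded([b] + w, x)     # identical result by construction
```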

56 Other Activation Functions

57 Different Network Topologies
Single-layer feed-forward networks: the input layer projects directly onto the output layer
[Diagram: single-layer network with an input layer and an output layer.]

58 Different Network Topologies
Multi-layer feed-forward networks: one or more hidden layers; input projects only from previous layers onto a layer
[Diagram: 2-layer (1-hidden-layer) fully connected network with input, hidden and output layers.]

59 Different Network Topologies
Recurrent networks: a network with feedback, where some of its inputs are connected to some of its outputs (discrete time)
[Diagram: recurrent network with input and output layers and feedback connections.]

60 How to Decide on a Network Topology?
# of input nodes? Number of features
# of output nodes? Suitable to encode the output representation
Transfer function? Suitable to the problem
# of hidden nodes? Not exactly known

61 Examples
[Diagram: input units, hidden units, output units and weights, illustrating autoassociation and heteroassociation.]

62 How is a function computed by a Multilayer Neural Network?
Typically, y1 = 1 for a positive example and y1 = 0 for a negative example
hj = g(Σi wji·xi), y1 = g(Σj wkj·hj), where g(x) = 1/(1 + e^(-x)) is the sigmoid
[Diagram: inputs x1 … x6 connect through weights wji to hidden units h1 … h3, which connect through weights wkj to the output y1; inset shows the sigmoid g rising from 0 through 1/2 toward 1.]
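The forward pass of a network with the figure's shape (6 inputs, 3 hidden units, 1 output) can be sketched as below; all weight values are illustrative assumptions.

```python
import math

def g(z):
    """Sigmoid: g(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(weight_rows, inputs):
    """Each unit computes g of the weighted sum of its inputs."""
    return [g(sum(w * x for w, x in zip(row, inputs))) for row in weight_rows]

x = [1, 0, 1, 0, 1, 1]                      # x1..x6
w_ji = [[0.1] * 6, [0.2] * 6, [-0.1] * 6]   # hypothetical hidden weights
w_kj = [[0.5, -0.5, 0.3]]                   # hypothetical output weights

h = layer(w_ji, x)          # hidden activations h1..h3
y1 = layer(w_kj, h)[0]      # output in (0, 1); near 1 => positive example
```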

63 Learning in Multilayer Neural Networks
Learning consists of searching through the space of all possible matrices of weight values for a combination of weights that satisfies a database of positive and negative examples (multi-class as well as regression problems are possible). Note that a Neural Network model with a set of adjustable weights defines a restricted hypothesis space corresponding to a family of functions. The size of this hypothesis space can be increased or decreased by increasing or decreasing the number of hidden units present in the network.

64 The Perceptron Training Rule
One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example. This process is repeated, iterating through the training examples as many times as needed until the perceptron classifies all training examples correctly. Weights are modified at each step according to the perceptron training rule, which revises the weight wi associated with input xi according to the rule wi ← wi + η(t - o)·xi.
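The procedure just described can be sketched as a short training loop. The AND dataset, the seed, and η = 0.1 are illustrative choices; since AND is linearly separable, the loop is guaranteed to terminate.

```python
import random

random.seed(0)
eta = 0.1
# AND of x1, x2 in {-1, +1}; the leading +1 is the clamped bias input x0.
examples = [([1, -1, -1], -1), ([1, -1, 1], -1),
            ([1, 1, -1], -1), ([1, 1, 1], 1)]

w = [random.uniform(-0.5, 0.5) for _ in range(3)]   # random initial weights

def out(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

for epoch in range(100):            # iterate until every example is correct
    errors = 0
    for x, t in examples:
        o = out(w, x)
        if o != t:                  # misclassified: apply the training rule
            errors += 1
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    if errors == 0:
        break

all_correct = all(out(w, x) == t for x, t in examples)
```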

65 Gradient Descent and Delta Rule
The delta training rule is best understood by considering the task of training an unthresholded perceptron; that is, a linear unit for which the output o is given by o = Σi wi·xi. In order to derive a weight learning rule for linear units, let us begin by specifying a measure for the training error of a hypothesis (weight vector) relative to the training examples: E(w) = ½ Σd (td - od)², summed over the training examples d.
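Gradient descent on this squared error gives the batch update wi ← wi + η Σd (td - od)·xid. A minimal sketch on an illustrative, exactly-fittable dataset (the data, η, and step count are assumptions):

```python
eta = 0.05
# (input vector, target) pairs; the targets are realised by w = (1, -1).
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.0)]
w = [0.0, 0.0]

def o(w, x):
    """Unthresholded linear unit: o = sum(w_i * x_i)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def error(w):
    """E(w) = 1/2 * sum over examples of (t - o)^2."""
    return 0.5 * sum((t - o(w, x)) ** 2 for x, t in data)

e0 = error(w)
for _ in range(200):                 # batch gradient descent steps
    grad = [sum((t - o(w, x)) * x[i] for x, t in data) for i in range(2)]
    w = [wi + eta * gi for wi, gi in zip(w, grad)]
e1 = error(w)                        # driven close to zero
```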

66 BACKPROPAGATION Algorithm
EECP0720 Expert Systems – Artificial Neural Networks

67 Error Function The Backpropagation algorithm learns the weights for a multilayer network, given a network with a fixed set of units and interconnections. It employs gradient descent to attempt to minimize the squared error between the network output values and the target values for those outputs. We begin by redefining E to sum the errors over all of the network output units: E(w) = ½ Σd Σk∈outputs (tkd - okd)², where outputs is the set of output units in the network, and tkd and okd are the target and output values associated with the kth output unit and training example d.

68 Architecture of Backpropagation

69 Backpropagation Learning Algorithm

70 Backpropagation Learning Algorithm (cont.)

71 Backpropagation Learning Algorithm (cont.)

72 Backpropagation Learning Algorithm (cont.)

73 Backpropagation Learning Algorithm (cont.)
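The algorithm steps shown on the preceding slides are not reproduced in this transcript; the following is a minimal sketch of backpropagation for a one-hidden-layer sigmoid network. It is trained on XOR (0/1 encoding), the function a bare perceptron cannot represent; the sizes, seed, learning rate, and epoch count are illustrative, and (as a later slide notes) such training can still get stuck in a local minimum.

```python
import math, random

random.seed(1)

def g(z):
    return 1.0 / (1.0 + math.exp(-z))       # sigmoid activation

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]   # XOR
H, eta = 3, 0.5
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]  # incl. bias
w_o = [random.uniform(-1, 1) for _ in range(H + 1)]                  # incl. bias

def forward(x):
    h = [g(w[0] + w[1] * x[0] + w[2] * x[1]) for w in w_h]
    o = g(w_o[0] + sum(w_o[j + 1] * h[j] for j in range(H)))
    return h, o

def sq_error():
    return sum((t - forward(x)[1]) ** 2 for x, t in data)

e_start = sq_error()
for _ in range(5000):
    for x, t in data:
        h, o = forward(x)
        delta_o = (t - o) * o * (1 - o)              # output-layer error term
        for j in range(H):                            # propagate error back
            delta_h = delta_o * w_o[j + 1] * h[j] * (1 - h[j])
            w_h[j][0] += eta * delta_h                # bias weight
            w_h[j][1] += eta * delta_h * x[0]
            w_h[j][2] += eta * delta_h * x[1]
        w_o[0] += eta * delta_o                       # then update output layer
        for j in range(H):
            w_o[j + 1] += eta * delta_o * h[j]
e_end = sq_error()    # typically far below e_start
```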

74 Output
The response function is normally nonlinear. Samples include:
Sigmoid
Piecewise linear

75 Backpropagation Preparation
Training set: a collection of input-output patterns that are used to train the network
Testing set: a collection of input-output patterns that are used to assess network performance
Learning rate η: a scalar parameter, analogous to step size in numerical integration, used to set the rate of adjustments

76 Network Error Total-Sum-Squared-Error (TSSE)
Root-Mean-Squared-Error (RMSE)
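The slide's formulas are not reproduced in this transcript; the definitions below are the common ones (an assumption, since conventions vary, e.g. some texts halve the TSSE), with illustrative target and output patterns.

```python
import math

# TSSE = sum over patterns p and output units k of (t_pk - o_pk)^2
# RMSE = sqrt(TSSE / (num_patterns * num_outputs))
targets = [[1.0, 0.0], [0.0, 1.0]]      # illustrative target patterns
outputs = [[0.9, 0.2], [0.1, 0.7]]      # illustrative network outputs

tsse = sum((t - o) ** 2
           for tp, op in zip(targets, outputs)
           for t, o in zip(tp, op))     # 0.01 + 0.04 + 0.01 + 0.09 = 0.15
rmse = math.sqrt(tsse / (len(targets) * len(targets[0])))
```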

77 Face Detection using Neural Networks
Training process: a face database (output = 1) and a non-face database (output = 0) are fed to the neural network
Testing process: the trained network answers "face or non-face?" for new images

78 Backpropagation Using Gradient Descent
Advantages:
Relatively simple implementation
Standard method that generally works well
Disadvantages:
Slow and inefficient
Can get stuck in local minima, resulting in sub-optimal solutions

79 Local Minima
[Figure: error surface with a local minimum and the global minimum.]

80 Other Ways To Minimize Error
Varying training data:
Cycle through input classes
Randomly select from input classes
Add noise to training data:
Randomly change the value of an input node (with low probability)
Retrain with expected inputs after initial training (e.g. speech recognition)

81 Other Ways To Minimize Error
Adding and removing neurons from layers Adding neurons speeds up learning but may cause loss in generalization Removing neurons has the opposite effect

