Neural Networks and Their Deep Structures
J.-S. Roger Jang (張智星), CSIE, National Taiwan University
In this talk, we are going to apply two neural network controller design techniques to fuzzy controllers and construct the so-called on-line adaptive neuro-fuzzy controllers for nonlinear control systems. We are going to use MATLAB, SIMULINK, and Handle Graphics to demonstrate the concepts, so you can also get a preview of some of the features of the Fuzzy Logic Toolbox (FLT), version 2.
Outline
Neural networks: ANN & DNN
Derivative-based optimization
Derivative-free optimization: genetic algorithms, simulated annealing, random search
Examples and demos
(Figure: artificial intelligence, machine learning)
Specifically, this is the outline of the talk. We'll start from the basics and introduce the concepts of fuzzy sets and membership functions. By using fuzzy sets, we can formulate fuzzy if-then rules, which are commonly used in our daily expressions. We can use a collection of fuzzy rules to describe a system's behavior; this forms the fuzzy inference system, or fuzzy controller if used in control systems. In particular, we can apply neural networks' learning methods in a fuzzy inference system. A fuzzy inference system with learning capability is called ANFIS, which stands for adaptive neuro-fuzzy inference system. Actually, ANFIS is already available in the current version of the FLT, but it has certain restrictions. We are going to remove some of these restrictions in the next version of the FLT. Most of all, we are going to have an on-line ANFIS block for SIMULINK; this block has on-line learning capability and is ideal for on-line adaptive neuro-fuzzy control applications. We will use this block in our demos; one is inverse learning and the other is feedback linearization.
Concept of Modeling
(Figure: unknown target system mapping inputs x1, ..., xn to y; model producing y*)
Given desired I/O pairs (training set) of the form (x1, ..., xn; y), construct a model to match the I/O pairs.
Two steps in modeling:
Structure identification: input selection, model complexity
Parameter identification: optimal parameters
Given an input condition, it is very easy to derive the overall output. However, our task is more than that. What we want to do is to construct an appropriate fuzzy inference system that can match a desired input/output data set of an unknown target system. This is called fuzzy modeling, and it is a branch of nonlinear system identification. In general, fuzzy modeling involves two steps. The first step is structure identification, where we have to find suitable numbers of fuzzy rules and membership functions. In the FLT, this is accomplished by subtractive clustering, proposed by Steve Chiu at the Rockwell Science Research Center. The second step is parameter identification, where we have to find the optimal parameters for membership functions as well as output linear equations. In the FLT, this is done by ANFIS, which I proposed earlier. Since ANFIS uses some neural network training techniques, it is also classified as one of the neuro-fuzzy modeling approaches.
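The idea of matching desired I/O pairs can be made concrete with a small MATLAB sketch. The data, model form, and parameter names below are hypothetical and only illustrate parameter identification by least squares for a linear-in-parameters model.

  % Hypothetical training set of I/O pairs (x1, x2; y)
  X = [1 2; 2 1; 3 4; 4 3; 5 5];        % each row is one input condition (x1, x2)
  y = [3; 3; 7; 7; 10];                 % desired outputs
  % Parameter identification for a model y* = a1*x1 + a2*x2 + a0
  A     = [X ones(size(X,1),1)];        % design matrix with a bias column
  theta = A \ y;                        % least-squares estimate of [a1; a2; a0]
  ystar = A * theta;                    % model outputs y* to compare with y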
Neural Networks
Supervised learning: multilayer perceptrons, radial basis function networks, modular neural networks, LVQ (learning vector quantization)
Unsupervised learning: competitive learning networks, Kohonen self-organizing networks, ART (adaptive resonance theory)
Others: Hopfield networks
Single-Layer Perceptrons
Network architecture (figure: inputs x1, x2, x3 with weights w1, w2, w3 and bias w0)
y = signum(Σ wi xi + w0) = -1 if Σ wi xi + w0 < 0, +1 if Σ wi xi + w0 > 0
Learning rule: Δwi = k t xi
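A minimal MATLAB sketch of this architecture, assuming a hypothetical 2-D training set and reading Δwi = k t xi as an update applied only to misclassified samples (k is the learning rate, t the target output):

  X = [0.2 0.9; 0.8 0.3; 0.3 0.8; 0.9 0.2];   % hypothetical inputs, one row per sample
  t = [-1; 1; -1; 1];                          % target outputs (-1 or +1)
  w = zeros(2,1); w0 = 0; k = 0.1;             % weights, bias, learning rate
  for epoch = 1:100
    for i = 1:size(X,1)
      y = sign(X(i,:)*w + w0);                 % signum activation
      if y ~= t(i)                             % update only when misclassified
        w  = w  + k*t(i)*X(i,:)';              % Δwi = k t xi
        w0 = w0 + k*t(i);                      % bias treated as a weight on a constant input
      end
    end
  end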
Single-Layer Perceptrons
Example: gender classification
Training data: h (hair length), v (voice frequency)
Network architecture: y = signum(h w1 + v w2 + w0), with y = -1 for female and y = +1 for male
Continuous Activation Functions
Commonly used continuous activation functions:
Logistic: y = 1/(1+exp(-x))
Hyperbolic tangent: y = tanh(x/2)
Identity: y = x
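The three functions are easy to compare side by side; a small MATLAB sketch (plot range chosen arbitrarily):

  logistic = @(x) 1./(1 + exp(-x));   % logistic: y = 1/(1+exp(-x))
  hypertan = @(x) tanh(x/2);          % hyperbolic tangent: y = tanh(x/2)
  identity = @(x) x;                  % identity: y = x
  x = linspace(-5, 5, 201);
  plot(x, logistic(x), x, hypertan(x), x, identity(x));
  legend('logistic', 'tanh(x/2)', 'identity'); grid on;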
Multilayer Perceptrons (MLPs)
Network architecture (figure: inputs x1, x2; hidden units with hyperbolic tangent or logistic activation functions; outputs y1, y2)
Learning rules:
Steepest descent (backprop)
Conjugate gradient method
All optimization methods using the first derivative
Derivative-free optimization
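A minimal MATLAB sketch of backprop (steepest descent) for a small MLP; the 2-2-1 layer sizes, logistic activations, learning rate, and epoch count are assumptions, and the training set is the XOR data of the next slide:

  X = [0 0; 0 1; 1 0; 1 1];            % inputs
  T = [0; 1; 1; 0];                    % XOR targets
  sig = @(x) 1./(1 + exp(-x));         % logistic activation
  W1 = randn(2,2); b1 = randn(1,2);    % input-to-hidden weights and biases
  W2 = randn(2,1); b2 = randn;         % hidden-to-output weights and bias
  eta = 0.5;                           % learning rate
  for epoch = 1:10000
    H  = sig(X*W1 + repmat(b1,4,1));   % hidden-layer activations
    Y  = sig(H*W2 + b2);               % network outputs
    dY = (Y - T) .* Y .* (1 - Y);      % output delta (logistic derivative)
    dH = (dY*W2') .* H .* (1 - H);     % delta backpropagated to the hidden layer
    W2 = W2 - eta * (H'*dY);  b2 = b2 - eta * sum(dY);
    W1 = W1 - eta * (X'*dH);  b1 = b1 - eta * sum(dH,1);
  end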
Multilayer Perceptrons (MLPs)
Example: the XOR problem
Training data: the four XOR patterns over inputs x1, x2 with target y = x1 XOR x2
(Figure: network architecture and decision regions for the XOR data)
MLP Decision Boundaries
Single-layer: half planes
(Figure: decision regions for the exclusive-OR problem, meshed regions, and the most general regions, with classes A and B)
MLP Decision Boundaries
Two-layer: convex regions
(Figure: decision regions for the exclusive-OR problem, meshed regions, and the most general regions, with classes A and B)
MLP Decision Boundaries
Three-layer: arbitrary regions
(Figure: decision regions for the exclusive-OR problem, meshed regions, and the most general regions, with classes A and B)
Summary: MLP Decision Boundaries
1-layer: half planes
2-layer: convex regions
3-layer: arbitrary regions
(Figure: example decision regions for the XOR, intertwined, and general cases, with classes A and B)
Radial Basis Function Networks
Network architecture (figure: inputs x1, x2; three Gaussian units with outputs w1, w2, w3; identity output mapping)
Gaussian activation: wi = g(x, si, mi), i = 1, 2, 3
Output: y = Σ fi * wi = f1*w1 + f2*w2 + f3*w3
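A minimal MATLAB sketch of this architecture with a one-dimensional input; the centers mi, spreads si, and weights fi below are arbitrary illustration values:

  g = @(x, s, m) exp(-(x - m).^2 ./ (2*s.^2));   % Gaussian receptive field g(x, si, mi)
  m = [-2 0 2];                 % centers mi
  s = [1 1 1];                  % spreads si
  f = [0.5 1.0 -0.8];           % output weights fi
  x = (-4:0.1:4)';
  W = zeros(numel(x), 3);
  for i = 1:3
    W(:,i) = g(x, s(i), m(i));  % wi = g(x, si, mi)
  end
  y = W * f(:);                 % identity output mapping: y = f1*w1 + f2*w2 + f3*w3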
Radial Basis Function Networks
Example of data fitting
(Figure: training data points, the Gaussian functions g(x, si, mi), the scaled Gaussian functions, and the resulting RBFN output)
Adaptive Networks
(Figure: an adaptive network with variables x, y, z)
Architecture: feedforward networks with different node functions (squares: nodes with parameters; circles: nodes without parameters)
Goal: to achieve an I/O mapping specified by training data
Basic training method: backpropagation (steepest descent)
Derivative-based Optimization
Based on first derivatives:
Steepest descent (gradient descent)
Conjugate gradient method
Gauss-Newton method
Levenberg-Marquardt method
and many others
Based on second derivatives:
Newton's method
Parameter ID: Steepest Descent
Synonyms: gradient descent, backpropagation
Concept:
General nonlinear model: y = f(x, θ)
Error measure: E(θ) = Σ [yi - f(xi, θ)]^2
Update formula: θnext = θnow - η ∇E(θ)
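A minimal MATLAB sketch of the update formula for a hypothetical one-exponential model; the data, initial parameters, and step size are assumptions:

  x  = (0:0.5:3)';                              % training inputs
  y  = 2*exp(-0.7*x);                           % assumed target outputs
  f  = @(x, th) th(1)*exp(th(2)*x);             % nonlinear model y = f(x, θ)
  th = [1; -0.1];                               % initial parameters θnow
  eta = 0.01;                                   % step size η
  for k = 1:2000
    e = y - f(x, th);                           % residuals yi - f(xi, θ)
    g = [-2*sum(e .* exp(th(2)*x));             % dE/dθ1
         -2*sum(e .* th(1).*x.*exp(th(2)*x))];  % dE/dθ2
    th = th - eta * g;                          % θnext = θnow - η ∇E(θ)
  end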
Demo: Steepest Descent
2018/6/2 Demo: Steepest Descent Find the min. of the “peaks” function z = f(x, y) = 3*(1-x)^2*exp(-(x^2) - (y+1)^2) - 10*(x/5 - x^3 - y^5)*exp(-x^2-y^2) -1/3*exp(-(x+1)^2 - y^2).
Demo: Steepest Descent
Derivatives of the "peaks" function:
dz/dx = -6*(1-x)*exp(-x^2-(y+1)^2) - 6*(1-x)^2*x*exp(-x^2-(y+1)^2) - 10*(1/5-3*x^2)*exp(-x^2-y^2) + 20*(1/5*x-x^3-y^5)*x*exp(-x^2-y^2) - 1/3*(-2*x-2)*exp(-(x+1)^2-y^2)
dz/dy = 3*(1-x)^2*(-2*y-2)*exp(-x^2-(y+1)^2) + 50*y^4*exp(-x^2-y^2) + 20*(1/5*x-x^3-y^5)*y*exp(-x^2-y^2) + 2/3*y*exp(-(x+1)^2-y^2)
d(dz/dx)/dx = 36*x*exp(-x^2-(y+1)^2) - 18*x^2*exp(-x^2-(y+1)^2) - 24*x^3*exp(-x^2-(y+1)^2) + 12*x^4*exp(-x^2-(y+1)^2) + 72*x*exp(-x^2-y^2) - 148*x^3*exp(-x^2-y^2) - 20*y^5*exp(-x^2-y^2) + 40*x^5*exp(-x^2-y^2) + 40*x^2*exp(-x^2-y^2)*y^5 - 2/3*exp(-(x+1)^2-y^2) - 4/3*exp(-(x+1)^2-y^2)*x^2 - 8/3*exp(-(x+1)^2-y^2)*x
d(dz/dy)/dy = -6*(1-x)^2*exp(-x^2-(y+1)^2) + 3*(1-x)^2*(-2*y-2)^2*exp(-x^2-(y+1)^2) + 200*y^3*exp(-x^2-y^2) - 200*y^5*exp(-x^2-y^2) + 20*(1/5*x-x^3-y^5)*exp(-x^2-y^2) - 40*(1/5*x-x^3-y^5)*y^2*exp(-x^2-y^2) + 2/3*exp(-(x+1)^2-y^2) - 4/3*y^2*exp(-(x+1)^2-y^2)
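Putting the function and its first derivatives together, a minimal MATLAB sketch of the demo; the starting point, step size, and iteration count are assumptions:

  f  = @(x,y) 3*(1-x).^2.*exp(-x.^2-(y+1).^2) ...
            - 10*(x/5 - x.^3 - y.^5).*exp(-x.^2-y.^2) - 1/3*exp(-(x+1).^2 - y.^2);
  fx = @(x,y) -6*(1-x).*exp(-x.^2-(y+1).^2) - 6*(1-x).^2.*x.*exp(-x.^2-(y+1).^2) ...
            - 10*(1/5 - 3*x.^2).*exp(-x.^2-y.^2) + 20*(1/5*x - x.^3 - y.^5).*x.*exp(-x.^2-y.^2) ...
            - 1/3*(-2*x - 2).*exp(-(x+1).^2-y.^2);
  fy = @(x,y) 3*(1-x).^2.*(-2*y-2).*exp(-x.^2-(y+1).^2) + 50*y.^4.*exp(-x.^2-y.^2) ...
            + 20*(1/5*x - x.^3 - y.^5).*y.*exp(-x.^2-y.^2) + 2/3*y.*exp(-(x+1).^2-y.^2);
  p = [0; -1.5];                  % assumed starting point (x, y)
  eta = 0.02;                     % assumed step size
  for k = 1:500
    p = p - eta * [fx(p(1), p(2)); fy(p(1), p(2))];   % steepest-descent step
  end
  fprintf('reached (%.3f, %.3f), z = %.3f\n', p(1), p(2), f(p(1), p(2)));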
Param. ID: Gauss-Newton Method
Synonyms: linearization method, extended Kalman filter method
Concept:
General nonlinear model: y = f(x, θ)
Linearization at θ = θnow: y = f(x, θnow) + a1(θ1 - θ1,now) + a2(θ2 - θ2,now) + ...
LSE solution: θnext = θnow + η (A^T A)^(-1) A^T B
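A minimal MATLAB sketch of a Gauss-Newton iteration for the same hypothetical exponential model as before; here A is the Jacobian of f at θnow and B is the residual vector:

  x  = (0:0.5:3)';
  y  = 2*exp(-0.7*x);                             % assumed target outputs
  f  = @(x, th) th(1)*exp(th(2)*x);
  th = [1; -0.1];  eta = 1;
  for k = 1:20
    A  = [exp(th(2)*x), th(1)*x.*exp(th(2)*x)];   % Jacobian columns df/dθ1, df/dθ2
    B  = y - f(x, th);                            % residuals
    th = th + eta * ((A'*A) \ (A'*B));            % θnext = θnow + η (A'A)^(-1) A'B
  end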
Param. ID: Levenberg-Marquardt
Formula: θnext = θnow + η (A^T A + λI)^(-1) A^T B
Effects of λ:
λ small → Gauss-Newton method
λ large → steepest descent
How to update λ:
greedy policy → make λ small
cautious policy → make λ large
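A minimal MATLAB sketch with a simple λ-update policy (decrease λ after a successful step, increase it otherwise); the model and data are the same hypothetical example as above:

  x  = (0:0.5:3)';
  y  = 2*exp(-0.7*x);
  f  = @(x, th) th(1)*exp(th(2)*x);
  E  = @(th) sum((y - f(x, th)).^2);              % squared-error measure
  th = [1; -0.1];  lambda = 0.01;
  for k = 1:50
    A    = [exp(th(2)*x), th(1)*x.*exp(th(2)*x)]; % Jacobian at θnow
    B    = y - f(x, th);                          % residuals
    step = (A'*A + lambda*eye(2)) \ (A'*B);       % (A'A + λI)^(-1) A'B
    if E(th + step) < E(th)
      th = th + step;  lambda = lambda/10;        % success: move toward Gauss-Newton
    else
      lambda = lambda*10;                         % failure: behave more like steepest descent
    end
  end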
Param. ID: Comparisons
Steepest descent (SD): treats all parameters as nonlinear
Hybrid learning (SD+LSE): distinguishes between linear and nonlinear parameters (see the sketch after this list)
Gauss-Newton (GN): linearizes and treats all parameters as linear
Levenberg-Marquardt (LM): switches smoothly between SD and GN
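A minimal MATLAB sketch of the hybrid (SD+LSE) idea for a hypothetical two-Gaussian model: the linear output weights are identified by LSE while the nonlinear centers get a steepest-descent step. All data and constants are illustration values.

  x = (-3:0.25:3)';
  y = sin(x);                                      % assumed target data
  m = [-1 1];  s = [1 1];  eta = 0.01;             % centers, spreads, step size
  for epoch = 1:200
    W = [exp(-(x-m(1)).^2/(2*s(1)^2)), exp(-(x-m(2)).^2/(2*s(2)^2))];
    f = W \ y;                                     % LSE for the linear parameters
    e = y - W*f;                                   % residuals with the optimal f
    for i = 1:2                                    % SD step on the nonlinear centers
      dWdm = W(:,i) .* (x - m(i)) / s(i)^2;        % d wi / d mi
      m(i) = m(i) - eta * sum(-2*f(i)*e.*dWdm);    % dE/dmi, treating f as fixed
    end
  end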
Derivative-free Optimization
Common derivative-free optimization methods:
Genetic algorithms (GAs)
Simulated annealing (SA)
Random search
Downhill simplex search
Tabu search
Characteristics:
Fast in development, slow in optimization
Good for parallelism
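As a concrete illustration of the "fast in development" point, a minimal MATLAB sketch of random search on the "peaks" surface from the earlier demo; the trial count and perturbation size are assumptions:

  f = @(p) 3*(1-p(1))^2*exp(-p(1)^2-(p(2)+1)^2) ...
         - 10*(p(1)/5 - p(1)^3 - p(2)^5)*exp(-p(1)^2-p(2)^2) - 1/3*exp(-(p(1)+1)^2 - p(2)^2);
  p = [0 0];  best = f(p);
  for k = 1:5000
    cand = p + 0.5*randn(1,2);          % random perturbation of the current point
    if f(cand) < best                   % keep the candidate only if it improves
      p = cand;  best = f(cand);
    end
  end
  fprintf('best point (%.3f, %.3f), z = %.3f\n', p(1), p(2), best);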