1
Neural Network Dong Wang CSLT ML Summer Seminar (3)
2
Content
– Connectionism
– Mapping models: MLP / RBF / CNN
– Memory models: Hopfield network and SOM
– Process models: RNN / LSTM / S2S
– Neural Turing machine
3
What is a neural network? Simon Haykin, 1998
4
In machine learning and cognitive science, artificial neural networks (ANNs) are a family of models inspired by biological neural networks (the central nervous systems of animals, in particular the brain) which are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. (Wikipedia, 2016.7.8)
5
Biological motivation https://en.wikipedia.org/wiki/Artificial_neural_network
6
So an ANN is… a family of models that can approximate various capabilities of the human brain:
– Memorization
– Induction and deduction
– Abstraction
– Prediction
– Description
ANNs in machine learning have gradually lost their original biological motivation. Instead we regard them as computational models and modify any component freely, regardless of its biological reality.
7
Competition between classical AI and NN. ANN, or connectionism, changed AI:
– Representation: from symbols to distributed signals
– Processing: from sequential to parallel
– Capability: from simple algorithmic pipelines to full processing
– Modeling: from design to learning
– More flexible, more robust
8
Shall we combine? Simon Haykin, 1998
9
There are some reasons for combining them, but I don't think symbolic processing needs much careful consideration. Simon Haykin, 1998
10
Content
– Connectionism
– Mapping models: MLP / RBF / CNN
– Memory models: Hopfield network and SOM
– Process models: RNN / LSTM / S2S
– Neural Turing machine
11
Start from a linear model
12
Or a log-linear model. The linear model is a simple neural network, but historically the learning methods were different.
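For reference (generic notation, not taken from the slide), the two forms can be written as

y(x) = w^T x + b, \qquad p(y = k \mid x) = \frac{\exp(w_k^T x + b_k)}{\sum_{k'} \exp(w_{k'}^T x + b_{k'})},

i.e. a single layer of weights, optionally followed by a normalized exponential (softmax) output.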
13
How to train?
– Closed-form solution
– Gradient descent
– Hebbian learning
– Perceptron convergence theorem
Bishop 95
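As a concrete illustration (not from the slides), a minimal perceptron-style training loop; the toy data and learning rate are made up:

```python
import numpy as np

# Toy linearly separable data (made up); labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1

# Perceptron rule: update the weights only on misclassified samples.
for epoch in range(100):
    errors = 0
    for xi, ti in zip(X, y):
        if ti * (xi @ w + b) <= 0:
            w += lr * ti * xi
            b += lr * ti
            errors += 1
    if errors == 0:  # converged (guaranteed for linearly separable data)
        break

print(w, b)
```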
14
The limitation: it cannot deal with many complex problems; it is even incapable of solving the simple XOR problem, due to its linearity.
15
How to deal with it? Oooh, the generalized linear model:
– If \phi_j(x) = \sum_i w_{i,j}\, \phi'_i(x): MLP
– If \phi_j(x) = \phi(\|x - v_j\|^2): RBF
– If K(x, y) = \sum_j \phi_j(x)^T \phi_j(y): kernel machine
– …
16
Multi-layer perceptron Use an adapted feature mapping!
17
General form: layer skipping is allowed, but no loops. An NN with two layers of weights and the most commonly used activation functions can approximate any continuous function.
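For reference (my notation, not the slide's), such a network with two layers of weights and sigmoidal hidden units has the form

y_k(x) = \sum_j w_{kj}^{(2)}\, \sigma\Big(\sum_i w_{ji}^{(1)} x_i + b_j^{(1)}\Big) + b_k^{(2)},

which is the family the universal-approximation statement above refers to.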
18
Some other variants
– High-order designed features
– Projection pursuit
– Generalized additive models
– Multivariate adaptive regression splines
19
Training for MLP: back-propagate errors from the output, using the chain rule of partial derivatives (a minimal sketch follows below). Some tricks for NN training:
– Feature normalization and transfer
– Appropriate activation functions
– Weight initialization
– Using momentum
– Second-order information
– Curriculum learning
– Learning to learn
– Regularization, e.g., L2, L1, batch norm, gradient clipping, early stopping (cross validation)
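A minimal sketch of back-propagation for a one-hidden-layer network with a squared-error loss; the shapes, learning rate, and random data are illustrative assumptions, not the presenter's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up toy regression data.
X = rng.normal(size=(32, 4))          # 32 samples, 4 inputs
T = rng.normal(size=(32, 1))          # targets

# One hidden layer with tanh units, small random initialization.
W1 = 0.1 * rng.normal(size=(4, 8)); b1 = np.zeros(8)
W2 = 0.1 * rng.normal(size=(8, 1)); b2 = np.zeros(1)
lr = 0.01

for step in range(200):
    # Forward pass.
    H = np.tanh(X @ W1 + b1)          # hidden activations
    Y = H @ W2 + b2                   # linear output
    loss = 0.5 * np.mean((Y - T) ** 2)

    # Backward pass: chain rule of partial derivatives.
    dY = (Y - T) / len(X)             # dL/dY
    dW2 = H.T @ dY; db2 = dY.sum(0)
    dH = dY @ W2.T * (1.0 - H ** 2)   # tanh'(a) = 1 - tanh(a)^2
    dW1 = X.T @ dH; db1 = dH.sum(0)

    # Gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(loss)
```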
20
Radial basis function (RBF): if we choose \phi_j(x) to be another parametric form, we get adaptable features, e.g. a Gaussian centered at a prototype \mu_j, \phi_j(x) = \exp(-\|x - \mu_j\|^2 / 2\sigma_j^2).
21
General form of RBF
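For reference (standard textbook notation, not quoted from the slide), the general RBF network is

y_k(x) = \sum_{j=1}^{M} w_{kj}\, \phi(\|x - \mu_j\|) + w_{k0},

with the Gaussian \phi(r) = \exp(-r^2 / 2\sigma^2) being the most common choice of basis function.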
22
Universal approximator
23
RBF training
– Use unsupervised learning to obtain the feature functions (k-means, GMM, etc.). Supervised learning is possible, but may lose the local property of the hidden units.
– Use a closed-form derivation to train the second layer.
(A small sketch of this two-stage recipe follows below.)
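A minimal NumPy sketch of the two-stage recipe (a hand-rolled k-means for the centers, then a least-squares solve for the output weights); the data, number of centers, and bandwidth are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                       # made-up inputs
t = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)    # made-up targets

# Stage 1 (unsupervised): a few k-means iterations to place the centers.
M, sigma = 10, 1.0
centers = X[rng.choice(len(X), M, replace=False)]
for _ in range(20):
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    for j in range(M):
        if np.any(assign == j):
            centers[j] = X[assign == j].mean(axis=0)

# Stage 2 (supervised, closed form): least squares for the output weights.
Phi = np.exp(-np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) ** 2
             / (2 * sigma ** 2))
Phi = np.hstack([Phi, np.ones((len(X), 1))])        # bias column
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

y_pred = Phi @ w
```

Note that the first stage never looks at the targets, which is the computational and data-efficiency advantage mentioned above.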
24
Some views of RBF. Interpolation: given a set of pairs (x, t), how do we get a t' for a new input x'? Least squares for training. Extended to adapted centers instead of training samples:
f(x) = \sum_m w_m\, h(\|x - c_m\|)
25
Some views of RBF. Kernel regression: K(x, x_n) forms the first layer, and the targets t_n form the second layer.
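For reference, this is presumably the Nadaraya-Watson kernel-regression view:

y(x) = \frac{\sum_n K(x, x_n)\, t_n}{\sum_n K(x, x_n)},

where the kernel evaluations act as (normalized) hidden-unit activations and the training targets t_n act as the output-layer weights.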
26
Some views of RBF. Applied to classification tasks it is more interesting: the first layer forms the class-specific kernels (RBFs), and the second layer encodes the class priors.
27
Comparison between MLP and RBF
– The feature mappings are different: the constant contours of an MLP hidden unit are planes in input space, while those of an RBF are spheres. Therefore the MLP is more 'distributed', since variation in the input affects most hidden units, whereas the RBF is more local.
– The MLP is much more difficult to train. Unsupervised learning of the RBF features is an advantage in both computation and data efficiency; however, it often requires more basis functions.
28
Constrained NN: convolutional networks (CNN)
Most data are structured (constrained):
– Lying in a subspace
– With cluster forms
– Dimension dependency and correlation
– Value sparsity
– …
Typical structures we may encounter:
– Spatial structure
– Temporal structure
– Spectral structure
Structured neural models:
– Full connection is computationally costly and statistically inefficient
– Full connection encodes few priors (no structure assumption)
– Structure helps both
29
CNN for structured data: locality and repetition via convolution, invariance via pooling. Suitable for feature extraction and easy to train (a small sketch follows below).
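A minimal NumPy sketch of the two operations named above (a shared local filter plus max pooling); the filter values and input signal are made up:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0])  # toy 1-D signal
w = np.array([1.0, 0.0, -1.0])                              # shared local filter

# Convolution (valid mode): the same filter is applied at every position,
# giving locality (small receptive field) and repetition (weight sharing).
conv = np.array([x[i:i + len(w)] @ w for i in range(len(x) - len(w) + 1)])

# Max pooling with width 2: small shifts of a pattern barely change the
# pooled output, which is the invariance the slide refers to.
pool = conv[: len(conv) // 2 * 2].reshape(-1, 2).max(axis=1)

print(conv, pool)
```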
30
More structure: mixture density networks. Train the NN to predict distribution parameters instead of the data itself, involving more structural assumptions.
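For reference (standard mixture density network notation, not quoted from the slide), the network outputs the parameters of a mixture:

p(t \mid x) = \sum_{k=1}^{K} \pi_k(x)\, \mathcal{N}\big(t \mid \mu_k(x), \sigma_k^2(x)\big),

where the mixing coefficients \pi_k(x), means \mu_k(x), and variances \sigma_k^2(x) are all computed by the network.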
31
Constraints on parameters: Bayesian neural networks. Place a prior on w and calculate the posterior of w, using the Laplace approximation.
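For reference (the standard Laplace-approximation form): with a prior p(w) and likelihood p(D \mid w), the posterior is approximated by a Gaussian around the MAP solution,

p(w \mid D) \approx \mathcal{N}\big(w \mid w_{\mathrm{MAP}}, A^{-1}\big), \qquad A = -\nabla\nabla \ln p(w \mid D)\big|_{w = w_{\mathrm{MAP}}},

i.e. the covariance is the inverse Hessian of the negative log-posterior at its mode.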
32
Content
– Connectionism
– Mapping models: MLP / RBF / CNN
– Memory models: Hopfield network and SOM
– Process models: RNN / LSTM / S2S
– Neural Turing machine
33
Hopfield net
– A set of nodes with symmetric connections
– An energy is associated with each configuration
– Hebbian learning
– Auto-associative memory
(A small storage-and-recall sketch follows below.)
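A minimal sketch of Hebbian storage and associative recall in a binary Hopfield net; the stored patterns are made up and states are in {-1, +1}:

```python
import numpy as np

# Two made-up orthogonal patterns to store; states are in {-1, +1}.
patterns = np.array([[1, 1, 1, 1, -1, -1, -1, -1],
                     [1, 1, -1, -1, 1, 1, -1, -1]], dtype=float)
n = patterns.shape[1]

# Hebbian learning: symmetric weights, no self-connections.
W = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(W, 0.0)

# Recall: start from a corrupted copy of the first pattern (one bit flipped)
# and update asynchronously until the configuration stops changing.
s = patterns[0].copy()
s[0] = -s[0]
for _ in range(10):
    prev = s.copy()
    for i in range(n):
        s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    if np.array_equal(s, prev):
        break

energy = -0.5 * s @ W @ s
print(s, energy)   # the stored pattern is recovered at a low-energy state
```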
34
Bidirectional associative memory (BAM)
– A two-layer NN
– Training with pairs (X, Y): M = \sum_i X_i^T Y_i
– Given X or Y, retrieve Y or X: Y = g(X^T M), where g(.) is a threshold function; simple matrix multiplication
– Worked example from the slide: X = A_1, A_1^T M = (4, 2, -2, -4), Y = (1, 1, 0, 0)
35
Boltzmann machine
– A stochastic recurrent neural network and a Markov random field
– The stochastic, generative counterpart of the Hopfield net
– Capable of learning internal representations
– After running for a long time, it reaches an equilibrium state
– Training is by gradient descent, minimizing the KL divergence between the true distribution and the model-generated distribution
(Slide figures: the running phase and the training phase.)
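For reference (the standard Boltzmann machine learning rule, stated here for completeness): with energy E(s) = -\sum_{i<j} w_{ij} s_i s_j and p(s) \propto e^{-E(s)}, the log-likelihood gradient is

\frac{\partial \ln p}{\partial w_{ij}} = \langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}},

i.e. the difference between correlations under the data distribution and under the model's equilibrium distribution.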
36
Auto-encoder: a way of coding data, and a way of describing the data structure, e.g., the manifold.
37
Self-organizing map (SOM) (Kohonen net): update neighbouring nodes in the low-dimensional space. Similar to k-means with a small number of nodes.
38
Content
– Connectionism
– Mapping models: MLP / RBF / CNN
– Memory models: Hopfield network and SOM
– Process models: RNN / LSTM / S2S
– Neural Turing machine
39
Recurrent neural networks (RNN)
– A way of learning temporal rules
– Not only mapping and memory, but processing
– Trained by back-propagation through time
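For reference (standard simple-RNN notation, not quoted from the slide), the recurrence is

h_t = \sigma(W_{hh} h_{t-1} + W_{xh} x_t + b), \qquad y_t = W_{hy} h_t,

and back-propagation through time applies the ordinary chain rule to this computation unrolled over the sequence.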
40
Gated networks (e.g., LSTM)
– Training plain RNNs: gradients vanish or explode
– Inference: information is lost quickly
– LSTM: context-aware remembering and retrieval
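For reference (a common formulation of the LSTM cell; biases omitted for brevity):

i_t = \sigma(W_i [h_{t-1}, x_t]), \quad f_t = \sigma(W_f [h_{t-1}, x_t]), \quad o_t = \sigma(W_o [h_{t-1}, x_t]),
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t]), \quad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad h_t = o_t \odot \tanh(c_t),

where the input, forget, and output gates decide what to write, keep, and read, which is the context-aware remembering and retrieval mentioned above.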
41
RNN as a generative model
– Marginal probability
– Conditional probability
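For reference, these presumably refer to the standard autoregressive factorizations:

p(x_1, \ldots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_{<t}), \qquad p(y_1, \ldots, y_T \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, x),

with the RNN state summarizing the history x_{<t} (or y_{<t} together with the condition x).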
42
More advanced RNNs
43
Neural Turing machines
– Use additional memory to store knowledge
– Use a recurrent model to learn programs
– More complex operations
44
Wrap up
– A neural network can be as simple as a linear model with adaptive features, but can also be as complex as the human brain.
– Neural networks are a large family of models that perform various functions; we classify them into mapping, memory, and processing models.
– Neural networks are closely related to kernel methods, Bayesian statistics, etc.
– Recent advances in computational graphs have moved NNs far from their biological origin. Is that good?