1
Neural Network Dong Wang CSLT ML Summer Seminar (3)
2
Content
– Connectionism
– Mapping models: MLP / RBF / CNN
– Memory models: Hopfield network and SOM
– Process models: RNN / LSTM / S2S
– Neural Turing machine
3
What is a neural network? Simon Haykin, 1998
4
In machine learning and cognitive science, artificial neural networks (ANNs) are a family of models inspired by biological neural networks (the central nervous systems of animals, in particular the brain) which are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. (Wikipedia, 2016.7.8)
5
Biological motivation https://en.wikipedia.org/wiki/Artificial_neural_network
6
So an ANN is… a family of models that can approximate various capabilities of the human brain:
– Memorization
– Induction and deduction
– Abstraction
– Prediction
– Description
ANNs in machine learning have gradually lost their original biological motivation. Instead we regard them as computational models and modify any component freely, regardless of its biological reality.
7
Competition between classical AI and NN. ANN, or connectionism, changed AI:
– Representation: from symbols to distributed signals
– Processing: from sequential to parallel
– Capability: from simple algorithmic pipelines to full processing
– Modeling: from design to learning
– More flexible, more robust
8
Shall we combine? Simon Haykin, 1998
9
There are some reasons for combining them, but I don't think symbolic processing needs much careful consideration. Simon Haykin, 1998
10
Content
– Connectionism
– Mapping models: MLP / RBF / CNN
– Memory models: Hopfield network and SOM
– Process models: RNN / LSTM / S2S
– Neural Turing machine
11
Start from a linear model
12
Or a log-linear model. The linear model is a simple neural network, but historically the learning methods were different.
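For reference (generic notation, not taken from the slide), the two forms can be written as

y(x) = w^T x + b, \qquad p(y = k \mid x) = \frac{\exp(w_k^T x + b_k)}{\sum_{k'} \exp(w_{k'}^T x + b_{k'})},

i.e. a single layer of weights, optionally followed by a normalized exponential (softmax) output.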
13
How to train?
– Closed-form solution
– Gradient descent
– Hebbian learning
– Perceptron convergence theorem
Bishop 95
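As a concrete illustration (not from the slides), a minimal perceptron-style training loop; the toy data and learning rate are made up:

```python
import numpy as np

# Toy linearly separable data (made up); labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1

# Perceptron rule: update the weights only on misclassified samples.
for epoch in range(100):
    errors = 0
    for xi, ti in zip(X, y):
        if ti * (xi @ w + b) <= 0:
            w += lr * ti * xi
            b += lr * ti
            errors += 1
    if errors == 0:  # converged (guaranteed for linearly separable data)
        break

print(w, b)
```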
14
The limitation: it cannot deal with many complex problems; it is even incapable of solving the simple XOR problem, due to its linearity.
15
How to deal with it? Oooh, the generalized linear model:
– If \phi_j(x) = \sum_i w_{i,j}\, \phi'_i(x): MLP
– If \phi_j(x) = \phi(\|x - v_j\|^2): RBF
– If K(x, y) = \sum_j \phi_j(x)^T \phi_j(y): kernel machine
– …
16
Multi-layer perceptron Use an adapted feature mapping!
17
General form: layer skipping is allowed, but no loops. An NN with two layers of weights and the most commonly used activation functions can approximate any continuous function.
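For reference (my notation, not the slide's), such a network with two layers of weights and sigmoidal hidden units has the form

y_k(x) = \sum_j w_{kj}^{(2)}\, \sigma\Big(\sum_i w_{ji}^{(1)} x_i + b_j^{(1)}\Big) + b_k^{(2)},

which is the family the universal-approximation statement above refers to.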
18
Some other variants
– High-order designed features
– Projection pursuit
– Generalized additive models
– Multivariate adaptive regression splines
19
Training for MLP: back-propagate errors from the output, using the chain rule of partial derivatives (a minimal sketch follows below). Some tricks for NN training:
– Feature normalization and transfer
– Appropriate activation functions
– Weight initialization
– Using momentum
– Second-order information
– Curriculum learning
– Learning to learn
– Regularization, e.g., L2, L1, batch norm, gradient clipping, early stopping (cross validation)
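A minimal sketch of back-propagation for a one-hidden-layer network with a squared-error loss; the shapes, learning rate, and random data are illustrative assumptions, not the presenter's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up toy regression data.
X = rng.normal(size=(32, 4))          # 32 samples, 4 inputs
T = rng.normal(size=(32, 1))          # targets

# One hidden layer with tanh units, small random initialization.
W1 = 0.1 * rng.normal(size=(4, 8)); b1 = np.zeros(8)
W2 = 0.1 * rng.normal(size=(8, 1)); b2 = np.zeros(1)
lr = 0.01

for step in range(200):
    # Forward pass.
    H = np.tanh(X @ W1 + b1)          # hidden activations
    Y = H @ W2 + b2                   # linear output
    loss = 0.5 * np.mean((Y - T) ** 2)

    # Backward pass: chain rule of partial derivatives.
    dY = (Y - T) / len(X)             # dL/dY
    dW2 = H.T @ dY; db2 = dY.sum(0)
    dH = dY @ W2.T * (1.0 - H ** 2)   # tanh'(a) = 1 - tanh(a)^2
    dW1 = X.T @ dH; db1 = dH.sum(0)

    # Gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(loss)
```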
20
Radial basis function (RBF): if we choose \phi_j(x) to be another parametric form, we get adaptable features, e.g. a Gaussian centered at a prototype \mu_j, \phi_j(x) = \exp(-\|x - \mu_j\|^2 / 2\sigma_j^2).
21
General form of RBF
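For reference (standard textbook notation, not quoted from the slide), the general RBF network is

y_k(x) = \sum_{j=1}^{M} w_{kj}\, \phi(\|x - \mu_j\|) + w_{k0},

with the Gaussian \phi(r) = \exp(-r^2 / 2\sigma^2) being the most common choice of basis function.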
22
Universal approximator
23
RBF training
– Use unsupervised learning to obtain the feature functions (k-means, GMM, etc.). Supervised learning is possible, but may lose the local property of the hidden units.
– Use a closed-form derivation to train the second layer.
(A small sketch of this two-stage recipe follows below.)
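A minimal NumPy sketch of the two-stage recipe (a hand-rolled k-means for the centers, then a least-squares solve for the output weights); the data, number of centers, and bandwidth are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                       # made-up inputs
t = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)    # made-up targets

# Stage 1 (unsupervised): a few k-means iterations to place the centers.
M, sigma = 10, 1.0
centers = X[rng.choice(len(X), M, replace=False)]
for _ in range(20):
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    for j in range(M):
        if np.any(assign == j):
            centers[j] = X[assign == j].mean(axis=0)

# Stage 2 (supervised, closed form): least squares for the output weights.
Phi = np.exp(-np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) ** 2
             / (2 * sigma ** 2))
Phi = np.hstack([Phi, np.ones((len(X), 1))])        # bias column
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

y_pred = Phi @ w
```

Note that the first stage never looks at the targets, which is the computational and data-efficiency advantage mentioned above.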
24
Some views of RBF. Interpolation: given a set of pairs (x, t), how do we get a t' for a new input x'? Least squares for training. Extended to adapted centers instead of training samples:
f(x) = \sum_m w_m\, h(\|x - c_m\|)
25
Some views of RBF. Kernel regression: K(x, x_n) forms the first layer, and the targets t_n form the second layer.
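For reference, this is presumably the Nadaraya-Watson kernel-regression view:

y(x) = \frac{\sum_n K(x, x_n)\, t_n}{\sum_n K(x, x_n)},

where the kernel evaluations act as (normalized) hidden-unit activations and the training targets t_n act as the output-layer weights.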
26
Some views of RBF. Applied to classification tasks it is more interesting: the first layer forms the class-specific kernels (RBFs), and the second layer encodes the class priors.
27
Comparison between MLP and RBF
– The feature mappings are different: the constant contours of an MLP hidden unit are planes in input space, while those of an RBF are spheres. Therefore the MLP is more 'distributed', since variation in the input affects most hidden units, whereas the RBF is more local.
– The MLP is much more difficult to train. Unsupervised learning of the RBF features is an advantage in both computation and data efficiency; however, it often requires more basis functions.
28
Constrained NN: convolutional networks (CNN)
Most data are structured (constrained):
– Lying in a subspace
– With cluster forms
– Dimension dependency and correlation
– Value sparsity
– …
Typical structures we may encounter:
– Spatial structure
– Temporal structure
– Spectral structure
Structured neural models:
– Full connection is computationally costly and statistically inefficient
– Full connection encodes few priors (no structure assumption)
– Structure helps both
29
CNN for structured data: locality and repetition via convolution, invariance via pooling. Suitable for feature extraction and easy to train (a small sketch follows below).
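A minimal NumPy sketch of the two operations named above (a shared local filter plus max pooling); the filter values and input signal are made up:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0])  # toy 1-D signal
w = np.array([1.0, 0.0, -1.0])                              # shared local filter

# Convolution (valid mode): the same filter is applied at every position,
# giving locality (small receptive field) and repetition (weight sharing).
conv = np.array([x[i:i + len(w)] @ w for i in range(len(x) - len(w) + 1)])

# Max pooling with width 2: small shifts of a pattern barely change the
# pooled output, which is the invariance the slide refers to.
pool = conv[: len(conv) // 2 * 2].reshape(-1, 2).max(axis=1)

print(conv, pool)
```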
30
More structure: mixture density networks. Train the NN to predict distribution parameters instead of the data itself, involving more structural assumptions.
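For reference (standard mixture density network notation, not quoted from the slide), the network outputs the parameters of a mixture:

p(t \mid x) = \sum_{k=1}^{K} \pi_k(x)\, \mathcal{N}\big(t \mid \mu_k(x), \sigma_k^2(x)\big),

where the mixing coefficients \pi_k(x), means \mu_k(x), and variances \sigma_k^2(x) are all computed by the network.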
31
Constraints on parameters: Bayesian neural networks. Place a prior on w and calculate the posterior of w, using the Laplace approximation.
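For reference (the standard Laplace-approximation form): with a prior p(w) and likelihood p(D \mid w), the posterior is approximated by a Gaussian around the MAP solution,

p(w \mid D) \approx \mathcal{N}\big(w \mid w_{\mathrm{MAP}}, A^{-1}\big), \qquad A = -\nabla\nabla \ln p(w \mid D)\big|_{w = w_{\mathrm{MAP}}},

i.e. the covariance is the inverse Hessian of the negative log-posterior at its mode.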
32
Content
– Connectionism
– Mapping models: MLP / RBF / CNN
– Memory models: Hopfield network and SOM
– Process models: RNN / LSTM / S2S
– Neural Turing machine
33
Hopfield net
– A set of nodes with symmetric connections
– An energy is associated with each configuration
– Hebbian learning
– Auto-associative memory
(A small storage-and-recall sketch follows below.)
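A minimal sketch of Hebbian storage and associative recall in a binary Hopfield net; the stored patterns are made up and states are in {-1, +1}:

```python
import numpy as np

# Two made-up orthogonal patterns to store; states are in {-1, +1}.
patterns = np.array([[1, 1, 1, 1, -1, -1, -1, -1],
                     [1, 1, -1, -1, 1, 1, -1, -1]], dtype=float)
n = patterns.shape[1]

# Hebbian learning: symmetric weights, no self-connections.
W = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(W, 0.0)

# Recall: start from a corrupted copy of the first pattern (one bit flipped)
# and update asynchronously until the configuration stops changing.
s = patterns[0].copy()
s[0] = -s[0]
for _ in range(10):
    prev = s.copy()
    for i in range(n):
        s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    if np.array_equal(s, prev):
        break

energy = -0.5 * s @ W @ s
print(s, energy)   # the stored pattern is recovered at a low-energy state
```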
34
Bidirectional associative memory (BAM)
– A two-layer NN
– Training with pairs (X, Y): M = \sum_i X_i^T Y_i
– Given X or Y, retrieve Y or X: Y = g(X^T M), where g(.) is a threshold function; simple matrix multiplication
– Worked example from the slide: X = A_1, A_1^T M = (4, 2, -2, -4), Y = (1, 1, 0, 0)
35
Boltzmann machine
– A stochastic recurrent neural network and a Markov random field
– The stochastic, generative counterpart of the Hopfield net
– Capable of learning internal representations
– After running for a long time, it reaches an equilibrium state
– Training is by gradient descent, minimizing the KL divergence between the true distribution and the model-generated distribution
(Slide figures: the running phase and the training phase.)
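For reference (the standard Boltzmann machine learning rule, stated here for completeness): with energy E(s) = -\sum_{i<j} w_{ij} s_i s_j and p(s) \propto e^{-E(s)}, the log-likelihood gradient is

\frac{\partial \ln p}{\partial w_{ij}} = \langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}},

i.e. the difference between correlations under the data distribution and under the model's equilibrium distribution.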
36
Auto-encoder: a way of coding data, and a way of describing the data structure, e.g., the manifold.
37
Self-organizing map (SOM) (Kohonen net): update neighbouring nodes in the low-dimensional space. Similar to k-means with a small number of nodes.
38
Content
– Connectionism
– Mapping models: MLP / RBF / CNN
– Memory models: Hopfield network and SOM
– Process models: RNN / LSTM / S2S
– Neural Turing machine
39
Recurrent neural networks (RNN)
– A way of learning temporal rules
– Not only mapping and memory, but processing
– Trained by back-propagation through time
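For reference (standard simple-RNN notation, not quoted from the slide), the recurrence is

h_t = \sigma(W_{hh} h_{t-1} + W_{xh} x_t + b), \qquad y_t = W_{hy} h_t,

and back-propagation through time applies the ordinary chain rule to this computation unrolled over the sequence.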
40
Gated networks (e.g., LSTM)
– Training plain RNNs: gradients vanish or explode
– Inference: information is lost quickly
– LSTM: context-aware remembering and retrieval
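For reference (a common formulation of the LSTM cell; biases omitted for brevity):

i_t = \sigma(W_i [h_{t-1}, x_t]), \quad f_t = \sigma(W_f [h_{t-1}, x_t]), \quad o_t = \sigma(W_o [h_{t-1}, x_t]),
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t]), \quad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad h_t = o_t \odot \tanh(c_t),

where the input, forget, and output gates decide what to write, keep, and read, which is the context-aware remembering and retrieval mentioned above.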
41
RNN as a generative model
– Marginal probability
– Conditional probability
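For reference, these presumably refer to the standard autoregressive factorizations:

p(x_1, \ldots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_{<t}), \qquad p(y_1, \ldots, y_T \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, x),

with the RNN state summarizing the history x_{<t} (or y_{<t} together with the condition x).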
42
More advanced RNNs
43
Neural Turing machines
– Use additional memory to store knowledge
– Use a recurrent model to learn programs
– More complex operations
44
Wrap up
– A neural network can be as simple as a linear model with adaptive features, but can also be as complex as the human brain.
– Neural networks are a large family of models that perform various functions; we classify them into mapping, memory, and processing models.
– Neural networks are closely related to kernel methods, Bayesian statistics, etc.
– Recent advances in computational graphs have moved NNs far from their biological origin. Is that good?