
1 Neural Network Dong Wang CSLT ML Summer Seminar (3)

2 Content Connectionism Mapping model: MLP\RBF\CNN Memory model: Hopfield network and SOM Process models: RNN\LSTM\S2S Neural Turing machine

3 What is a neural network? Simon Haykin, 1998

4 In machine learning and cognitive science, artificial neural networks (ANNs) are a family of models inspired by biological neural networks (the central nervous systems of animals, in particular the brain) which are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. (Wikipedia, 2016.7.8)

5 Biological motivation https://en.wikipedia.org/wiki/Artificial_neural_network

6 So ANN is… A family of models that can approximate various capabilities of the human brain – Memorization – Induction and deduction – Abstraction – Prediction – Description ANN in ML has gradually lost its original biological motivation. Instead, we regard it as a computational model and freely modify any component, regardless of its biological plausibility.

7 Competition between classical AI and NN ANN, or connectionism, changed AI – Representation: from symbols to distributed signals – Processing: from sequential to parallel – Capability: from simple algorithm pipelines to full processing – Modeling: from design to learning – More flexible, more robust

8 Shall we combine? Simon Haykin, 1998

9 There are some reasons to combine them, but I don't think symbolic processing needs much careful consideration. Simon Haykin, 1998

10 Content Connectionism Mapping model: MLP\RBF\CNN Memory model: Hopfield network and SOM Process models: RNN\LSTM\S2S Neural Turing machine

11 Start from a linear model

12 Or a log-linear model The linear model is a simple neural network, but historically the learning methods were different.

13 How to train? Closed-form solution Gradient descent Hebbian learning Perceptron convergence theorem Bishop 95
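
For concreteness, a minimal sketch (not from the slides) of training such a log-linear model by gradient descent; the data, sizes, and hyperparameters are illustrative only:

```python
import numpy as np

# A toy log-linear (logistic) model y = sigmoid(w.x + b), trained by
# batch gradient descent on the cross-entropy loss.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                  # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # linearly separable labels

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(200):
    p = sigmoid(X @ w + b)                     # model predictions
    grad_w = X.T @ (p - y) / len(y)            # gradient of the loss w.r.t. w
    grad_b = np.mean(p - y)                    # gradient w.r.t. b
    w -= lr * grad_w                           # gradient descent step
    b -= lr * grad_b
```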

14 The limitation It cannot deal with many complex problems It is even incapable of the simple XOR… due to the linearity

15 How to deal with it? Oooh, a generalized linear model! If $\phi_j(x) = \sum_i w_{i,j}\,\phi'(x)$, MLP If $\phi_j(x) = \phi(\|x - v_j\|^2)$, RBF If $K(x,y) = \sum_j \phi_j(x)^T \phi_j(y)$, kernel machine …

16 Multi-layer perceptron Use an adaptive feature mapping!

17 General form Layer skipping is allowed But no loops A NN with two layers of weights (one hidden layer) and the commonly used activation functions can approximate any continuous function.

18 Some other variants High-order designed feature Projection pursuit Generalized additive model Multivariate adaptive regression splines

19 Training for MLP Back-propagating errors from the output Using the chain rule of partial derivatives Some tricks for NN training – Feature normalization and transfer – Appropriate activation function – Weight initialization – Using momentum – Second-order information – Curriculum learning – Learning to learn – Regularization, e.g., L2, L1, batch norm, gradient clipping, early stopping (cross validation)
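
As a reminder of how the chain rule is applied layer by layer, a minimal back-propagation sketch (not the seminar's code) for a one-hidden-layer MLP with tanh hidden units and a sigmoid output, trained on the XOR problem from slide 14; sizes and hyperparameters are made up:

```python
import numpy as np

# One-hidden-layer MLP trained with back-propagation (cross-entropy loss).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])                 # XOR targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(scale=0.5, size=(8,)), 0.0             # hidden -> output
lr = 0.5

for _ in range(2000):
    # forward pass
    h = np.tanh(X @ W1 + b1)                   # hidden activations
    p = sigmoid(h @ W2 + b2)                   # output probabilities
    # backward pass: apply the chain rule layer by layer
    d_out = (p - y) / len(y)                   # dLoss/d(output pre-activation)
    grad_W2, grad_b2 = h.T @ d_out, d_out.sum()
    d_h = np.outer(d_out, W2) * (1 - h ** 2)   # back-propagate through tanh
    grad_W1, grad_b1 = X.T @ d_h, d_h.sum(0)
    # gradient descent updates
    W2, b2 = W2 - lr * grad_W2, b2 - lr * grad_b2
    W1, b1 = W1 - lr * grad_W1, b1 - lr * grad_b1
```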

20 Radial basis function (RBF) If we choose φ_j(x) as another parametric form, for example, we get an adaptive feature. E.g.,

21 General form of RBF

22 Universal approximator

23 RBF Training Using unsupervised learning to obtain the feature functions (k-means, GMM, etc.) Supervised learning is possible, but may lose the local property of the hidden units. Using a closed-form solution to train the second layer
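
A minimal sketch of this two-stage recipe (assumptions: Gaussian basis functions with a fixed width, centers from a few crude k-means iterations, output weights by least squares; all data and sizes are toy):

```python
import numpy as np

# RBF network: unsupervised centers + closed-form (least-squares) output layer.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
t = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)     # toy regression targets

# unsupervised step: choose K centers with a few k-means iterations
K = 10
centers = X[rng.choice(len(X), K, replace=False)]
for _ in range(5):
    assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([X[assign == k].mean(0) if np.any(assign == k)
                        else centers[k] for k in range(K)])

# Gaussian RBF features, then a closed-form second layer
Phi = np.exp(-((X[:, None] - centers[None]) ** 2).sum(-1) / 2.0)
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)           # least-squares weights
pred = Phi @ w
```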

24 Some views for RBF Interpolation: given a set of (x, t), how do we get a t' for a new input x'? Least squares for training Extended to adaptive centers instead of the training samples: $f(x) = \sum_m w_m h(\|x - c_m\|)$

25 Some views of RBF Kernel regression: $K(x, x_n)$ forms the first layer, and $t_n$ forms the second layer.

26 Some views for RBF Applied to classification tasks, it is more interesting: the first layer forms the class-specific kernels (RBFs), and the second layer holds the class priors.

27 Comparison between MLP and RBF The feature mappings are different: the constant contours of an MLP hidden unit lie on hyperplanes, while those of an RBF unit are hyperspheres. Therefore, the MLP is more 'distributed', as variation in the input will impact most hidden units; the RBF is more local. The MLP is much more difficult to train. Unsupervised learning of the RBF layer is an advantage in both computation and data efficiency; however, it often requires more RBF units.

28 Constrained NN: Convolutional networks (CNN) Most data are structured (constrained) – In a subspace – With cluster forms – Dimension dependency and correlation – Value sparsity – … Typical structures we may encounter – Spatial structure – Temporal structure – Spectral structure Structured neural models – Full connection is costly in computation and inefficient in its use of data – Full connection encodes few priors (no structural assumptions) – Structure helps both

29 CNN for structured data Locality and Repetition by convolution Invariance by pooling Suitable for feature extraction Easy to train
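
To make the locality/repetition/invariance point concrete, a minimal 1-D sketch (not from the slides): one shared kernel slides over the input, and max pooling keeps only the strongest local response; the signal and filter values are made up:

```python
import numpy as np

# One convolutional feature map followed by max pooling.
def conv1d(x, kernel):
    k = len(kernel)
    # the same small kernel is applied at every position (weight sharing)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def max_pool(x, size=2):
    # keep the strongest response in each window -> local shift invariance
    return np.array([x[i:i + size].max()
                     for i in range(0, len(x) - size + 1, size)])

x = np.array([0., 0., 1., 2., 1., 0., 0., 1., 2., 1.])
kernel = np.array([1., 2., 1.])                    # a small filter (here fixed)
feature_map = np.maximum(conv1d(x, kernel), 0.0)   # ReLU nonlinearity
pooled = max_pool(feature_map)
```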

30 More structure: Mixture density network Training the NN to predict distribution parameters, instead of the data itself Involving more structural assumptions
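
A minimal sketch of the idea, assuming a 1-D Gaussian mixture with K components: the network's output vector is interpreted as mixture weights, means, and log standard deviations, and training would minimize the negative log-likelihood below; all names and numbers are illustrative:

```python
import numpy as np

# Mixture density network output layer: predict distribution parameters.
def mdn_params(net_out, K):
    logits, means, log_sigmas = np.split(net_out, [K, 2 * K])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()               # softmax -> mixture weights
    sigmas = np.exp(log_sigmas)            # positive standard deviations
    return weights, means, sigmas

def mdn_nll(t, weights, means, sigmas):
    # negative log-likelihood of target t under the predicted mixture
    probs = weights * np.exp(-0.5 * ((t - means) / sigmas) ** 2) \
            / (np.sqrt(2 * np.pi) * sigmas)
    return -np.log(probs.sum())

net_out = np.array([0.1, -0.2, 0.0,  1.0, -1.0, 0.5,  0.0, 0.1, -0.1])
w, m, s = mdn_params(net_out, K=3)
loss = mdn_nll(0.8, w, m, s)
```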

31 Constraints on parameters: Bayesian neural network Place a prior on w and calculate the posterior of w, using the Laplace approximation
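
For reference, the standard form of the Laplace approximation (not copied from the slide): the posterior is approximated by a Gaussian centred at the MAP weights, with covariance given by the inverse Hessian of the negative log-posterior:

```latex
p(\mathbf{w}\mid\mathcal{D}) \approx
  \mathcal{N}\!\left(\mathbf{w}\,\middle|\,\mathbf{w}_{\mathrm{MAP}},\ \mathbf{A}^{-1}\right),
\qquad
\mathbf{A} = -\,\nabla\nabla \ln p(\mathbf{w}\mid\mathcal{D})\,\big|_{\mathbf{w}=\mathbf{w}_{\mathrm{MAP}}}
```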

32 Content Connectionism Mapping model: MLP\RBF\CNN Memory model: Hopfield network and SOM Process models: RNN\LSTM\S2S Neural Turing machine

33 Hopfield net A set of nodes with symmetric connections An energy is associated with each configuration Hebbian learning Auto-associative memory
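
A minimal Hopfield-net sketch (toy ±1 patterns, not from the slides): Hebbian learning of symmetric weights, an energy over configurations, and retrieval by repeated asynchronous updates:

```python
import numpy as np

# Hopfield network: store patterns with a Hebbian rule, recall by updates.
patterns = np.array([[ 1, -1,  1, -1,  1, -1],
                     [ 1,  1, -1, -1,  1,  1]])        # stored +/-1 patterns

W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)                                  # no self-connections

def energy(s):
    return -0.5 * s @ W @ s                             # energy of a configuration

def recall(s, steps=50):
    s = s.copy()
    for _ in range(steps):                              # asynchronous updates
        i = np.random.randint(len(s))
        s[i] = 1 if W[i] @ s >= 0 else -1
    return s

noisy = np.array([1, -1, 1, -1, -1, -1])                # corrupted copy of pattern 0
recovered = recall(noisy)
print(recovered, energy(recovered))
```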

34 Bidirectional associative memory (BAM) A two-layer NN Training with pairs (X, Y): $M = \sum_i X_i^T Y_i$ Given X or Y, retrieve Y or X: $Y = g(X^T M)$, with g(.) a threshold function Simple matrix multiplication E.g., for X = A_1: $A_1^T M = (4, 2, -2, -4)$, so $Y = (1, 1, 0, 0)$
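
A minimal BAM sketch with made-up ±1 patterns (the slide's example uses 0/1 coding for Y; the correlation-plus-threshold idea is the same):

```python
import numpy as np

# BAM: store pairs in M = sum_i X_i^T Y_i, recall by thresholded products.
X = np.array([[ 1, -1,  1, -1],
              [ 1,  1, -1, -1]])          # input patterns (+/-1 coding)
Y = np.array([[ 1,  1, -1, -1],
              [-1,  1, -1,  1]])          # associated output patterns

M = X.T @ Y                               # Hebbian-style correlation matrix

def threshold(v):
    return np.where(v >= 0, 1, -1)

y_recalled = threshold(X[0] @ M)          # forward recall: X -> Y
x_recalled = threshold(Y[0] @ M.T)        # backward recall: Y -> X
```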

35 Boltzmann machine A stochastic recurrent neural network and a Markov random field Stochastic and generative counterpart of the Hopfield net Capable of learning internal representations After running for a long time, it reaches an equilibrium state Training is by gradient descent, minimizing the KL divergence between the true and model-generated distributions (Figures: running; training)
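
For reference, the standard log-likelihood gradient of a Boltzmann machine (not taken from the slide), which is what the KL-minimizing gradient descent computes: a data-driven correlation term minus a model-driven one:

```latex
\frac{\partial \ln p(\mathbf{v})}{\partial w_{ij}}
  = \langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}}
```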

36 Auto-encoder A way of coding data A way of describing the data structure, e.g., a manifold

37 Self-organizing map (SOM) (Kohonen net) Update neighbouring nodes in the low-dimensional space Similar to k-means with a small number of nodes
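
A minimal 1-D SOM sketch (all sizes and schedules are made up): the winning node and its neighbours on the low-dimensional grid are pulled towards each sample, which is what makes it behave like a topology-preserving k-means:

```python
import numpy as np

# Self-organizing map with 10 nodes arranged on a 1-D grid.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))
nodes = rng.normal(size=(10, 2))            # node prototypes in input space

lr, radius = 0.5, 2.0
for epoch in range(20):
    for x in data:
        winner = np.argmin(((nodes - x) ** 2).sum(1))        # best-matching node
        grid_dist = np.abs(np.arange(len(nodes)) - winner)    # distance on the grid
        h = np.exp(-grid_dist ** 2 / (2 * radius ** 2))       # neighbourhood weights
        nodes += lr * h[:, None] * (x - nodes)                # pull neighbours too
    lr *= 0.9                                                 # shrink the schedules
    radius *= 0.9
```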

38 Content Connectionism Mapping model: MLP\RBF\CNN Memory model: Hopfield network and SOM Process models: RNN\LSTM\S2S Neural Turing machine

39 Recurrent neural networks (RNN) A way of learning temporal rules Not only mapping and memory, but also processing BP through time (BPTT)
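
A minimal sketch of the forward pass of a vanilla RNN (hypothetical sizes): the same weights are reused at every time step, and training would back-propagate through this unrolled loop:

```python
import numpy as np

# Vanilla RNN forward pass over a sequence.
rng = np.random.default_rng(0)
d_in, d_h = 3, 5
Wxh = rng.normal(scale=0.1, size=(d_in, d_h))   # input -> hidden weights
Whh = rng.normal(scale=0.1, size=(d_h, d_h))    # hidden -> hidden (recurrence)

def rnn_forward(xs):
    h = np.zeros(d_h)
    hs = []
    for x in xs:                                # unroll over time
        h = np.tanh(x @ Wxh + h @ Whh)          # state carries context forward
        hs.append(h)
    return np.array(hs)

xs = rng.normal(size=(7, d_in))                 # a length-7 input sequence
states = rnn_forward(xs)
```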

40 Gated networks (e.g., LSTM) Training: vanishing and exploding gradients Inference: information is lost quickly LSTM: context-aware remembering and retrieval
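
A minimal sketch of one LSTM step (hypothetical sizes; biases omitted): the input, forget, and output gates control what is written to, kept in, and read from the cell state, which is what context-aware remembering and retrieval refers to:

```python
import numpy as np

# A single LSTM step with all four gate blocks packed into one matrix.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = rng.normal(scale=0.1, size=(d_in + d_h, 4 * d_h))    # all gates at once

def lstm_step(x, h, c):
    z = np.concatenate([x, h]) @ W
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)          # gates in (0, 1)
    g = np.tanh(g)                                        # candidate content
    c = f * c + i * g                                     # keep old / write new
    h = o * np.tanh(c)                                    # read out selectively
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(6, d_in)):                      # a short sequence
    h, c = lstm_step(x, h, c)
```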

41 RNN as generative model Marginal probability Conditional probability
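
As a reminder of the two quantities named on the slide (standard formulation, not copied from it), an RNN generates a sequence by factorizing its probability step by step, either unconditionally or conditioned on an input sequence (as in sequence-to-sequence models):

```latex
p(y_1, \dots, y_T) = \prod_{t=1}^{T} p\!\left(y_t \mid y_1, \dots, y_{t-1}\right)
\qquad
p(y_1, \dots, y_T \mid x_1, \dots, x_S) = \prod_{t=1}^{T} p\!\left(y_t \mid y_{<t},\, x_1, \dots, x_S\right)
```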

42 More advanced RNNs

43 Neural Turing machines Use additional memory to store knowledge Use a recurrent model to learn programs More complex operations

44 Wrap up Neural networks can be as simple as a linear model with adaptive features, but can also be as complex as the human brain. Neural networks form a large family of models that perform various functions; we classify them into mapping, memory, and processing models. Neural networks are closely related to kernel methods, Bayesian statistics, etc. Recent advances in computational graphs have moved NNs far from their biological origin. Is that good?

