Presentation on theme: "Object Recognizing. Deep Learning Success in 2012 DeepNet and speech processing."— Presentation transcript:

1 Object Recognizing

2 Deep Learning Success in 2012: DeepNet and speech processing

3 David Corne and Nick Taylor, Heriot-Watt University - dwcorne@gmail.com These slides and related resources: http://www.macs.hw.ac.uk/~dwcorne/Teaching/dmml.html

4 ImageNet

5

6

7

8 DL is providing breakthrough results in speech recognition and image classification. Sources: Hinton et al. 2012 (http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/38131.pdf); the MNIST database (http://yann.lecun.com/exdb/mnist/); and the CVPR 2012 paper at http://people.idsia.ch/~juergen/cvpr2012.pdf

9 Continuous improvement: Microsoft, Dec 2015, 150 layers; an error rate of 3.5% and a localization error of 9%.

10

11

12

13

14 What are Deep Nets?

15 Neural networks in the brain: repeating layers; linear, non-linear, and pooling operations; learning by modifying synapses

16 Biology: Linear and non-linear operations

17 Biology: feed-forward, recurrent, and feed-back connections. DNN adopts the feed-forward path.

18 DNN Architecture

19 General structure: local connections, convolution, reduced sampling

20 Multiple filters

21 Repeating operations: Linear, Non-linear, Pooling

22

23 Depth – multiple filters

24

25 Repeating 3-layer arrangement
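As a minimal sketch of this repeating arrangement (slides 19-25), the block below stacks a convolution (the linear step, with local connections and multiple filters), a non-linearity, and a pooling step that reduces sampling. It assumes PyTorch, which the slides do not mention; the channel counts, kernel size, and input size are arbitrary choices for illustration.

```python
# A sketch of one repeating block: linear (convolution), non-linear, pooling.
# Assumes PyTorch; all sizes below are illustrative, not taken from the slides.
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=48, kernel_size=5, padding=2),  # linear: 48 local filters
    nn.ReLU(),                                                            # non-linear
    nn.MaxPool2d(kernel_size=2),                                          # pooling: reduced sampling
)

x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
y = block(x)                    # shape (1, 48, 16, 16): 48 feature maps, spatially halved
print(y.shape)
```

Stacking several such blocks, each taking the previous block's feature maps as input, gives the depth and "multiple filters" structure shown on the following slides.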

26 History of Deep Learning

27 LeNet 1998: essentially the same as the current generation

28 MNIST data set

29 Hinton, Trends in Cognitive Sciences 2007. The goal: unsupervised learning with Restricted Boltzmann Machines, combining a generative model and inference. CNNs, by contrast, are feed-forward and massively supervised.

30 Back-propagation 1986

31 The entire network is a large parametric function. The parameters are the network weights (60M in AlexNet) and are learned from examples. The learning algorithm is back-propagation: gradient descent in the space of parameters.
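Slide 31 can be written compactly as one parametric function trained by gradient descent on the error; this is just a restatement of the slide, with the learning rate $\eta$ an added symbol that does not appear on the slides.

```latex
% The network as one parametric function of the input x and the weights w_1..w_M,
% trained by gradient descent on the error E (eta = learning rate, not on the slides).
\[
  y = f(x;\, w_1, \dots, w_M),
  \qquad
  w_{ik} \;\leftarrow\; w_{ik} \;-\; \eta\, \frac{\partial E}{\partial w_{ik}} .
\]
```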

32 Back Propagation

33 Network diagram: input units 1 and 2, hidden units 3 and 4, output units 5 and 6.

34 Network diagram: inputs 1, 2; hidden units 3, 4; outputs $N_5$, $N_6$; weights such as $w_{13}$ and $w_{35}$. Linear signal: $L_k = \sum_i w_{ik} N_i$. Non-linear output: $N = \sigma(L)$, with $\sigma(x) = 1/(1 + e^{-\alpha x})$, so $dy/dx = \alpha\,y(1-y)$; the slides take $\alpha = 1$, giving $dN/dL = N(1-N)$ and $dL_k/dw_{ik} = N_i$.

35 Linear signal: $L_k = \sum_i w_{ik} N_i$. Non-linear output: $N = \sigma(L)$, where $\sigma(x) = 1/(1 + e^{-\alpha x})$ and $dy/dx = \alpha\,y(1-y)$ (equal to $y(1-y)$ for $\alpha = 1$).
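As a check of the derivative identity the later slides rely on (taking $\alpha = 1$), the step from the sigmoid to $dy/dx = y(1-y)$ is:

```latex
% Derivative of the sigmoid for alpha = 1, as used on slides 34-41.
\[
  y = \frac{1}{1 + e^{-x}}
  \quad\Longrightarrow\quad
  \frac{dy}{dx} = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}
  = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
  = y\,(1 - y).
\]
```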

36 Error: $E = \tfrac{1}{2}\left[(T_5 - N_5)^2 + (T_6 - N_6)^2\right]$. By the chain rule along the path, $dE/dw_{35} = dE/dN_5 \cdot dN_5/dL_5 \cdot dL_5/dw_{35} = (T_5 - N_5)\,N_5(1 - N_5)\,N_3$. Call $dE/dL_k = \delta_k$ the back-propagating error; the weight is adjusted by $\delta w_{ik} = \delta_k N_i$.

37 General rule: $dE/dL_k = \delta_k$ is the back-propagating error, and the weights are adjusted by $\delta w_{ik} = \delta_k N_i$.

38 The same general rule ($dE/dL_k = \delta_k$, $\delta w_{ik} = \delta_k N_i$) is true for $w_{13}$, with $\delta_3$ and $N_1$; it remains to compute $\delta_3$.

39 Network diagram again: inputs 1 and 2, hidden units 3 and 4, outputs $N_5$ and $N_6$, with weights $w_{13}$ and $w_{35}$ marked.

40 Computing $\delta_3$: $dE/dw_{13} = dE/dL_3 \cdot dL_3/dw_{13} = \delta_3 N_1$. The error splits over the two output units: $dE/dL_3 = dE_1/dL_3 + dE_2/dL_3 = \delta_{31} + \delta_{32}$, with $\delta_{31} = dE_1/dL_3 = dE_1/dN_3 \cdot dN_3/dL_3$ and $dE_1/dN_3 = dE_1/dL_5 \cdot dL_5/dN_3 = \delta_5\,w_{35}$. Hence $\delta_{31} = \delta_5\,w_{35}\,N_3(1-N_3)$ and $\delta_{32} = \delta_6\,w_{36}\,N_3(1-N_3)$, giving the adjustment $\delta w_{13} = (\delta_5 w_{35} + \delta_6 w_{36})\,N_3(1-N_3)\,N_1$.

41 In summary: propagate $\delta_5$ and $\delta_6$ back through $w_{35}$ and $w_{36}$, multiply by $N_3(1-N_3)$ to get $\delta_3$, then adjust $w_{13}$ by $\delta_3 N_1$, i.e. $\delta w_{13} = (\delta_5 w_{35} + \delta_6 w_{36})\,N_3(1-N_3)\,N_1$. This is iterated for all weights over many examples; supervision (target values) is required.
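The update rules of slides 33-41 can be run end to end on the same 2-2-2 network. The sketch below is plain Python (no framework), keeps the slides' unit numbering and sign convention (the $(T_k - N_k)$ term folded into $\delta_k$, so the update adds $\eta\,\delta_k N_i$), omits bias terms as the slides do, and uses a made-up training pair and learning rate purely for illustration.

```python
import math, random

random.seed(0)

def sigma(x):            # sigmoid non-linearity with alpha = 1 (slide 35)
    return 1.0 / (1.0 + math.exp(-x))

# Toy 2-2-2 network from slides 33-41: inputs 1,2 -> hidden 3,4 -> outputs 5,6.
# w[(i, k)] is the weight from unit i to unit k; names follow the slides. No biases.
w = {(i, k): random.uniform(-0.5, 0.5) for i in (1, 2) for k in (3, 4)}
w.update({(i, k): random.uniform(-0.5, 0.5) for i in (3, 4) for k in (5, 6)})

def train_step(x, target, eta=0.5):
    N = {1: x[0], 2: x[1]}                       # input activations
    for k in (3, 4):                             # hidden layer: L_k = sum_i w_ik N_i, N_k = sigma(L_k)
        N[k] = sigma(sum(w[(i, k)] * N[i] for i in (1, 2)))
    for k in (5, 6):                             # output layer
        N[k] = sigma(sum(w[(i, k)] * N[i] for i in (3, 4)))

    T = {5: target[0], 6: target[1]}
    delta = {}
    for k in (5, 6):                             # output deltas: (T_k - N_k) N_k (1 - N_k), slide 36
        delta[k] = (T[k] - N[k]) * N[k] * (1 - N[k])
    for j in (3, 4):                             # hidden deltas: propagate back through w_j5, w_j6, slide 40
        delta[j] = sum(delta[k] * w[(j, k)] for k in (5, 6)) * N[j] * (1 - N[j])

    for (i, k) in w:                             # weight adjustment: eta * delta_k * N_i, slide 41
        w[(i, k)] += eta * delta[k] * N[i]
    return 0.5 * sum((T[k] - N[k]) ** 2 for k in (5, 6))   # error E, slide 36

# Example: learn to map (0.05, 0.10) to (0.9, 0.1); the error shrinks over iterations.
for step in range(2000):
    E = train_step((0.05, 0.10), (0.9, 0.1))
print("final error:", E)
```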

42 Dropout

43 Dropout: an efficient way to average many large neural nets (http://arxiv.org/abs/1207.0580). Consider a neural net with one hidden layer. Each time we present a training example, we randomly omit each hidden unit with probability 0.5, so we are randomly sampling from 2^H different architectures. All architectures share weights.

44 Dropout, multi-layer: for each example, set units at all levels to 0 with some probability, usually p = 0.5, so each example has a different 'mask'. During the feed-forward flow these units are multiplied by 0 and do not participate in the computation, and similarly for back-propagation. The intuition is to avoid over-fitting. At test time all the units are used. Most implementations no longer use dropout; the issue of overfitting is actively studied, and for reasons not yet well understood, adding weights does not cause a drop in test performance.
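A small numpy sketch of the per-example mask described above. The test-time scaling by the keep probability is a common convention consistent with averaging the 2^H shared-weight networks, but it is an assumption on my part, not something stated on the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, train=True):
    """Slide 44: during training, zero each unit with probability p; each example
    (row of h) gets its own random mask, which is reused in back-propagation.
    At test time all units are kept; scaling by (1 - p) keeps expected activations
    comparable to training (an assumed convention, not stated on the slides)."""
    if not train:
        return h * (1.0 - p), None
    mask = (rng.random(h.shape) >= p).astype(h.dtype)
    return h * mask, mask

h = np.ones((4, 6))               # a batch of 4 examples, 6 hidden units
h_drop, mask = dropout(h, p=0.5)  # roughly half the entries are zeroed, per example
print(h_drop)
```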

45

46 Visualizing the features at different layers (Rob Fergus, NIPS 2013). Best 9 patches: showing, at each layer, the responses of 48 units. Each unit is in fact a layer of units: copies of the same unit at different locations, covering the image (a 'convolution' filter). A 'deconvolution' algorithm identifies the patches that caused the largest activation of the unit over a large set of test images; the 9 top patches for each unit are shown in a small 3x3 array.
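The ranking step behind the "best 9 patches" display can be sketched as follows: given one unit's response map over a set of test images, pick the nine images where it fires most strongly and the location of the peak response. This is a simplified numpy sketch of that step only; mapping the location back to an input patch (the deconvolution algorithm) is not shown, and the function name and shapes are illustrative.

```python
import numpy as np

def top9_locations(feature_maps):
    """feature_maps: responses of one convolutional unit over a set of test images,
    shape (num_images, H, W) -- each spatial location is a copy of the same filter.
    Returns (image index, row, col) for the 9 strongest responses, i.e. the
    locations whose receptive-field patches would fill the 3x3 display."""
    flat = feature_maps.reshape(feature_maps.shape[0], -1)
    best_per_image = flat.max(axis=1)                  # strongest response in each image
    top_images = np.argsort(best_per_image)[::-1][:9]  # the 9 best images for this unit
    locations = []
    for i in top_images:
        r, c = np.unravel_index(np.argmax(feature_maps[i]), feature_maps[i].shape)
        locations.append((int(i), int(r), int(c)))
    return locations
```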

47

48 First layer in AlexNet

49

50 Layer 3 top-9 patches for each unit

51

52 Layers 1, 2, 3, 5

53 Different visual tasks

54 Segmentation

55

56 Edge Detection

57 Captioning

58

59

60 Getting annotations: http://www.cs.tau.ac.il/~guylev1/index6.html April 2015 tauimageannotator@gmail.com

61 A woman with brown hair is sitting on her head. tauimageannotator@gmail.com

62 a man is standing in front woman in white shirt.

63 Some Future Recognition Challenges

64 Full object interpretation: part labels such as headlight, window, door knob, back wheel, mirror, front wheel, and bumper.

65 Actions: What are these people doing?

66 Agent interactions: Disagree; Hug.

67

