Overall Introduction for the Lecture


1

2 Overall Introduction for the Lecture
Lecture schedule (each lecture is accompanied by a practice session led by the same lecturer):
1 Introduction: Artificial Intelligence – Betz Johannes
2 Perception – Betz Johannes
3 Supervised Learning: Regression – Alexander Wischnewski
4 Supervised Learning: Classification – Jan-Cedric Mertens
5 Unsupervised Learning: Clustering – Jan-Cedric Mertens
6 Pathfinding: From British Museum to A* – Lennart Adenaw
7 Introduction: Artificial Neural Networks – Lennart Adenaw
8 Deep Neural Networks – Jean-Michael Georg
9 Convolutional Neural Networks – Jean-Michael Georg
10 Recurrent Neural Networks – Christian Dengler
11 Reinforcement Learning – Christian Dengler
12 AI-Development – Johannes Betz
13 Free Discussion – Betz/Adenaw

3 Objectives for Lecture 8: Neural Networks
Depth of understanding: Remember – Understand – Apply – Analyze – Evaluate – Develop
After the lecture you are able to…
… compute local gradients with a computational graph
… understand how backpropagation works in a neural network
… apply the backpropagation algorithm to small neural networks to calculate weight changes and local gradients
… understand the differences between activation functions and their use cases
… compute the correct dimensions for weight and bias matrices
… program fully connected neural networks
… initialize weights correctly
… train a neural network

4

5 Repetition

6 Repetition

7 Last Week: Today:

8

9 Computational Graph Two options for computing gradients: 1. Analytical 2. Difference quotient
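As a quick illustration (not part of the slides), the difference quotient can be used to approximate a derivative numerically and to sanity-check an analytical gradient; the function f below is purely an example:

```python
def f(x):
    return x**2 + 3*x                        # example function, analytically f'(x) = 2x + 3

def numerical_gradient(f, x, h=1e-5):
    # central difference quotient: (f(x+h) - f(x-h)) / (2h)
    return (f(x + h) - f(x - h)) / (2 * h)

print(numerical_gradient(f, 2.0))            # ~7.0, matches the analytical derivative at x = 2
```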

10 Computational Graph Simple example
Let's calculate some derivatives!

11 Computational Graph Simple example
Let's calculate some derivatives!

12 Computational Graph Simple example
Let's calculate some derivatives!

13 Computational Graph Simple example
Let's calculate some derivatives!

14 Computational Graph Simple example
Let's calculate some derivatives!

15 Computational Graph Simple example
Let's calculate some derivatives!

16 Computational Graph Simple example
Let's calculate some derivatives!

17 Computational Graph Simple example
Let's calculate some derivatives!

18 Computational Graph Simple example
The gradient "flows" backwards: upstream gradient -> downstream gradient. Let's calculate some derivatives!

19 Computational Graph Simple example
Let's calculate some derivatives!

20 Computational Graph Simple example
Let's calculate some derivatives!

21 Computational Graph Standard operations
Addition: the downstream gradient equals the upstream gradient (local gradient of 1)
Multiplication: the downstream gradient is the upstream gradient multiplied by the other factor
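A minimal sketch of these two rules for scalar nodes (the function names are illustrative, not from the lecture):

```python
# Backward rules for the two standard nodes, given the upstream gradient dL/dz

def add_backward(upstream):
    # z = x + y: both local gradients are 1, so the upstream gradient passes through unchanged
    return upstream, upstream

def mul_backward(upstream, x, y):
    # z = x * y: each input receives the upstream gradient times the *other* factor
    return upstream * y, upstream * x
```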

22 Computational Graph Simple example, with numbers

23 Computational Graph Simple example, with numbers

24 Computational Graph Simple example, with numbers

25 Computational Graph Simple example, with numbers

26 Computational Graph Simple example, with numbers

27 Computational Graph Simple example, with numbers

28 Computational Graph Abstract function

29 Computational Graph Abstract function
Forward path

30 Computational Graph Abstract function
Forward path: local gradients are saved during the forward path

31 Computational Graph Abstract function
Backward path: the local gradients were already calculated during the forward path

32 Computational Graph Abstract function
Backward path: the local gradients were already calculated during the forward path
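As an assumed sketch of this idea (not the lecture's code): a node can cache whatever it needs for its local gradients on the forward path, so the backward path only multiplies by the upstream gradient:

```python
class MultiplyNode:
    """Caches its inputs on the forward path so the local gradients are available on the backward path."""

    def forward(self, x, y):
        self.x, self.y = x, y        # saved for the backward path
        return x * y

    def backward(self, upstream):
        dx = upstream * self.y       # local gradient dz/dx = y, already known from the forward path
        dy = upstream * self.x       # local gradient dz/dy = x
        return dx, dy
```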

33 Computational Graph Abstract function

34 Computational Graph Complex Example

35 Computational Graph Complex Example

36 Computational Graph Complex Example

37 Computational Graph Complex Example

38 Computational Graph Neuron

39 Computational Graph Neuron

40 Computational Graph Neuron

41 Computational Graph Neuron
Local derivative

42 Computational Graph Neuron
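A worked sketch of backpropagation through a single neuron (the numbers are assumed, not necessarily those on the slide), for f(w, x) = sigmoid(w1*x1 + w2*x2 + b):

```python
import math

# Forward path of one neuron with sigmoid activation (illustrative values)
w1, w2, b = 2.0, -3.0, -3.0
x1, x2 = -1.0, -2.0

z = w1 * x1 + w2 * x2 + b            # pre-activation: 1.0
out = 1.0 / (1.0 + math.exp(-z))     # sigmoid output: ~0.73

# Backward path (upstream gradient assumed to be 1)
dz = out * (1.0 - out)               # local gradient of the sigmoid: ~0.20
dw1, dx1 = dz * x1, dz * w1          # multiplication rule: gradient switches to the other factor
dw2, dx2 = dz * x2, dz * w2
db = dz                              # addition rule: gradient passes through unchanged
```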

43

44 Backpropagation Neural Chain
Forward path Input Neuron Layer 1 Layer 2 Loss Function

45 Backpropagation Neural Chain
Input Neuron Layer 1 Layer 2 Loss Function Forward path

46 Backpropagation Neural Chain
Input Neuron Layer 1 Layer 2 Loss Function Forward path Backward path

47 Backpropagation Neural Chain

48 Backpropagation Neural Chain
Input Neuron Layer 1 Layer 2 Loss Function Forward path Backward path

49 Backpropagation Neural Chain

50 Backpropagation Neural Chain
Input Neuron Layer 1 Layer 2 Loss Function Forward path Backward path

51 Backpropagation Neural Chain
Input Neuron Layer 1 Layer 2 Loss Function Forward path Backward path

52 Backpropagation Neural Chain
Input Neuron Layer 1 Layer 2 Loss Function Forward path Backward path
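Since the slide figures are not in the transcript, here is a minimal scalar sketch of the chain shown (input -> layer 1 -> layer 2 -> loss function); all functions and values are assumptions:

```python
# Toy chain: input x -> layer 1 -> layer 2 -> loss
x = 0.5
w1, w2, y = 1.5, -2.0, 1.0

# Forward path
a1 = w1 * x                    # layer 1
a2 = w2 * a1                   # layer 2
L = (a2 - y) ** 2              # loss function

# Backward path: multiply the local gradients along the chain
dL_da2 = 2 * (a2 - y)
dL_da1 = dL_da2 * w2           # back through layer 2
dL_dw2 = dL_da2 * a1
dL_dw1 = dL_da1 * x            # back through layer 1
dL_dx = dL_da1 * w1
```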

53

54 Backpropagation Neural Network

55 Backpropagation Neural Network

56 Backpropagation Neural Network

57 Backpropagation Neural Network
Hadamard product
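A hedged sketch of where the Hadamard product appears: the error of a layer is the back-transformed error of the following layer multiplied element-wise with the derivative of the activation (sigmoid assumed; notation follows the usual backpropagation equations, not necessarily the slide's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_error(delta_next, W_next, z):
    """delta_next: error of layer l+1, W_next: weights of layer l+1, z: pre-activation of layer l."""
    # Hadamard product (*) of the back-transformed error with the local activation derivative
    return (W_next.T @ delta_next) * sigmoid(z) * (1.0 - sigmoid(z))
```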

58 Backpropagation Neural Network

59 Backpropagation Neural Network, one hidden layer

60 Backpropagation Neural Network, one hidden layer

61 Backpropagation Neural Network, one hidden layer

62 Backpropagation Neural Network, one hidden layer
Forward path Backward path
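A compact sketch of the forward and backward path for a network with one hidden layer (sigmoid activations and a squared-error loss are assumptions, as are all shapes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.random.randn(3, 1)                   # input column vector
y = np.array([[1.0]])                       # target
W1, b1 = np.random.randn(4, 3), np.zeros((4, 1))
W2, b2 = np.random.randn(1, 4), np.zeros((1, 1))

# Forward path
z1 = W1 @ x + b1;  a1 = sigmoid(z1)         # hidden layer
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)         # output layer

# Backward path
delta2 = (a2 - y) * a2 * (1 - a2)           # output error
delta1 = (W2.T @ delta2) * a1 * (1 - a1)    # Hadamard product with sigmoid'
dW2, db2 = delta2 @ a1.T, delta2            # weight/bias gradients, layer 2
dW1, db1 = delta1 @ x.T, delta1             # weight/bias gradients, layer 1
```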

63 Backpropagation Neural Network, one hidden layer

64 CPU vs GPU
CPU: low compute density, complex logic control, large caches, low latency tolerance
GPU: high compute density, many calculations per memory access, optimized for parallel computing, high latency tolerance

65

66 Activation Functions Sigmoid
Two main problems:
Causes vanishing gradients: the gradient is nearly zero for very large or very small x, which kills the gradient and the network stops learning
Output isn't zero-centered: the gradients are always all positive or all negative, leading to inefficient weight updates

67 Activation Functions Hyperbolic tangent (tanh)
Better than sigmoid: the output is zero-centered. But it still causes vanishing gradients.

68 Activation Functions Rectified Linear Unit (ReLU)
Most common activation function:
Computationally efficient
Converges very fast
Does not activate all neurons at the same time
Problems: the gradient is zero for x < 0 and can cause vanishing gradients -> dead ReLUs may happen; not zero-centered
Usage: mostly used in hidden layers; a small positive bias at initialization helps keep ReLUs active

69 Activation Functions Leaky ReLU
Improves on ReLU:
Removes the zero part of ReLU by adding a small slope
More stable than ReLU, but adds another parameter
Computationally efficient
Converges very fast
Doesn't die
The parameter a can also be learned by the network

70 Activation Functions Exponential Linear Unit (ELU)
Benefits of ReLU and Leaky ReLU, but the computation requires evaluating an exponential and is therefore more expensive

71 Activation Functions Softmax
A generalization of the sigmoid, handy for classification problems. Divides each output by the sum of all outputs, which allows a percentage (probability) representation. Example on the slide: classifier scores for Dog / Cat / Mouse before and after softmax.

72 Activation Functions Rule of thumb
Sigmoid / softmax for classifiers
Sigmoid and tanh are sometimes avoided due to vanishing gradients
ReLU is mostly used today, but should only be used in hidden layers
Start with ReLU; if you don't get optimal results, go for Leaky ReLU or ELU
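For reference, a small NumPy sketch of the activation functions discussed above (not from the slides):

```python
import numpy as np

def sigmoid(x):              return 1.0 / (1.0 + np.exp(-x))
def tanh(x):                 return np.tanh(x)
def relu(x):                 return np.maximum(0.0, x)
def leaky_relu(x, a=0.01):   return np.where(x > 0, x, a * x)
def elu(x, a=1.0):           return np.where(x > 0, x, a * (np.exp(x) - 1.0))

def softmax(x):
    e = np.exp(x - np.max(x))            # shift for numerical stability
    return e / e.sum()                   # divides by the sum of all outputs
```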

73

74 Fully Connected Layer Dimensions
Alternative:
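The slide's formulas are not in the transcript; a hedged sketch of the two common dimension conventions for a fully connected layer (all sizes are assumptions):

```python
import numpy as np

n_in, n_out, batch = 3, 5, 8

# Convention 1: column vectors, a_out = W @ a_in + b
W = np.zeros((n_out, n_in)); b = np.zeros((n_out, 1))
a_in = np.zeros((n_in, 1))
a_out = W @ a_in + b                        # shape (n_out, 1)

# Alternative: row vectors with the batch dimension first, A_out = A_in @ W + b
W_alt = np.zeros((n_in, n_out)); b_alt = np.zeros((1, n_out))
A_in = np.zeros((batch, n_in))
A_out = A_in @ W_alt + b_alt                # shape (batch, n_out)
```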

75 Fully Connected Layer Dimensions

76 Fully Connected Layer Image Data

77 Fully Connected Layer Image Data
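Since the slide content is only a figure, here is an assumed example of how image data reaches a fully connected layer: the image is flattened into one long vector (image size and layer width are made up):

```python
import numpy as np

image = np.random.rand(28, 28, 3)         # height x width x channels (assumed size)
x = image.reshape(-1, 1)                  # flatten to a (2352, 1) column vector
W = np.random.randn(100, x.shape[0])      # first fully connected layer with 100 neurons
b = np.zeros((100, 1))
a = W @ x + b                             # the layer only ever sees the flattened vector
```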

78

79 Gradient Descent Training: for n < max_epochs: 1. forward path 2. backward path 3. update weights
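A runnable sketch of this loop, illustrated on a tiny linear model instead of a full network (data, learning rate, and epoch count are assumptions):

```python
import numpy as np

X = np.random.randn(100, 3)                         # all the training data
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)
w = np.zeros(3)
lr, max_epochs = 0.1, 100

for n in range(max_epochs):
    y_pred = X @ w                                   # forward path over all the data
    grad = 2 * X.T @ (y_pred - y) / len(X)           # backward path: gradient of the MSE loss
    w -= lr * grad                                   # update weights
```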

80 Gradient Descent Each step (1. forward path, 2. forward path, 3. forward path, 4. forward path) runs over all the data -> 3x compute time

81 Stochastic Gradient Descent An alternative to Gradient Descent
Only uses one random data point at a time. Reduces the compute time per optimisation step, but finding a local minimum takes longer.

82 Batch Gradient Descent A Compromise
Split the dataset into batches => the global loss function is minimized via local cost functions computed on each batch
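A sketch of the same idea with mini-batches (continuing the toy linear model from above; batch size and epochs are assumptions):

```python
import numpy as np

X = np.random.randn(1000, 3)
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
lr, batch_size = 0.1, 32

for epoch in range(10):
    indices = np.random.permutation(len(X))             # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        X_b, y_b = X[batch], y[batch]
        grad = 2 * X_b.T @ (X_b @ w - y_b) / len(X_b)    # local cost of this batch
        w -= lr * grad                                   # one update per batch
```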

83 Dimensions with batch gradient descent
For batch size n: 3D matrix multiplication -> handled by TensorFlow
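A small sketch of what this batched multiplication looks like (shown with NumPy broadcasting; TensorFlow does the equivalent internally, and all sizes are assumptions):

```python
import numpy as np

n = 32                                     # batch size
W = np.random.randn(6, 4)                  # weights of one layer
X = np.random.randn(n, 4, 1)               # batch of column vectors stacked along a third axis

Z = np.matmul(W, X)                        # W is broadcast over the batch: result shape (n, 6, 1)
```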

84 Stochastic vs. Batch Stochastic training can miss local minima because of the randomness of each input and generally takes longer. Mini-batches find a minimum quickly but may take more computational power.

85 Code example TensorFlow:
Layer 1, Layer 2, Layer 3 – input size imwidth × imwidth, 4000 neurons, 3000 neurons
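The slide's original TensorFlow code is not in the transcript; a minimal tf.keras sketch that matches the listed layer sizes (imwidth, the activations, and the number of output classes are assumptions):

```python
import tensorflow as tf

imwidth = 28                                                   # assumed image width

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(imwidth, imwidth)),   # image -> flat vector
    tf.keras.layers.Dense(4000, activation="relu"),            # Layer 1: 4000 neurons
    tf.keras.layers.Dense(3000, activation="relu"),            # Layer 2: 3000 neurons
    tf.keras.layers.Dense(10, activation="softmax"),           # Layer 3: assumed 10-class output
])
model.summary()
```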

86 Weight Initialisation
$W^1 = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{pmatrix}$
Weight initialisation options:
1. Init as 0
2. Normal distribution $(0, \sigma = 0.4)$
3. Normal distribution $(0, \sigma \approx \sqrt{2 / n_i})$, where $n_i$ is the number of neurons in the previous layer (e.g. $n_i = 2 \Rightarrow \sigma = 1$)
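A sketch of the three initialisation options in NumPy (the matrix size is an assumption):

```python
import numpy as np

n_i = 3                                                    # neurons in the previous layer

W_zero = np.zeros((3, 3))                                  # init as 0 (every neuron learns the same thing)
W_fixed = np.random.normal(0.0, 0.4, size=(3, 3))          # normal distribution with fixed sigma = 0.4
W_scaled = np.random.normal(0.0, np.sqrt(2.0 / n_i),       # sigma ~ sqrt(2 / n_i), scaled with layer width
                            size=(3, 3))
```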

87

88 Training a Neural Network
1. Prepare the input data 2. Prepare the data pipeline to get the data into the network 3. Define the network 4. Initialize all variables 5. Train the network 6. Supervise the training

89 Training a Neural Network
1. Prepare the input data 2. Prepare the data pipeline to get the data into the network 3. Define the network 4. Initialize all variables 5. Train the network 6. Supervise the training

90 Training a Neural Network
1. Prepare the input data 2. Prepare the data pipeline to get the data into the network 3. Define the network 4. Initialize all variables 5. Train the network 6. Supervise the training

91 Training a Neural Network
1. Prepare the input data 2. Prepare the data pipeline to get the data into the network 3. Define the network 4. Initialize all variables 5. Train the network 6. Supervise the training

92 Training a Neural Network
1. Prepare the input data 2. Prepare the data pipeline to get the data into the network 3. Define the network 4. Initialize all variables 5. Train the network 6. Supervise the training

93 Training a Neural Network
1. Prepare the input data 2. Prepare the data pipeline to get the data into the network 3. Define the network 4. Initialize all variables 5. Train the network 6. Supervise the training

94 Training a Neural Network
1. Prepare the input data 2. Prepare the data pipeline to get the data into the network 3. Define the network 4. Initialize all variables 5. Train the network 6. Supervise the training
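A hedged end-to-end sketch of these six steps with tf.keras (the dataset, network size, and hyperparameters are all assumptions):

```python
import tensorflow as tf

# 1. Prepare the input data
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0                                   # normalize pixel values

# 2. Prepare the data pipeline
ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(10000).batch(32)

# 3. Define the network
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# 4. Initialize all variables (Keras initializes the weights when the model is built and compiled)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# 5. Train the network / 6. Supervise the training via the logged loss and accuracy
history = model.fit(ds, epochs=5)
```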

95 Summary What we learned today:
Using computational graphs to calculate local gradients of any mathematical function
Backpropagating local gradients to calculate weight updates
The most common activation functions and their use cases
Using batches for training neural networks
Initializing weights correctly
The steps to take when training a neural network

96 End

