Overall Introduction for the Lecture
Lecture schedule (each lecture has an accompanying practice session, P, given by the same instructor):
1 Introduction: Artificial Intelligence – Johannes Betz
2 Perception – Johannes Betz
3 Supervised Learning: Regression – Alexander Wischnewski
4 Supervised Learning: Classification – Jan-Cedric Mertens
5 Unsupervised Learning: Clustering – Jan-Cedric Mertens
6 Pathfinding: From British Museum to A* – Lennart Adenaw
7 Introduction: Artificial Neural Networks – Lennart Adenaw
8 Deep Neural Networks – Jean-Michael Georg
9 Convolutional Neural Networks – Jean-Michael Georg
10 Recurrent Neural Networks – Christian Dengler
11 Reinforcement Learning – Christian Dengler
12 AI-Development – Johannes Betz
13 Free Discussion – Betz/Adenaw
Objectives for Lecture 8: Neural Networks
Depth of understanding: Remember – Understand – Apply – Analyze – Evaluate – Develop
After the lecture you are able to…
… compute local gradients with a computational graph
… understand how backpropagation works in a neural network
… apply the backpropagation algorithm to small neural networks to calculate weight changes and local gradients
… understand the differences between activation functions and their use cases
… compute the correct dimensions for weight and bias matrices
… program fully connected neural networks
… initialize weights correctly
… train a neural network
Repetition
Last week: introduction to artificial neural networks. Today: deep neural networks (backpropagation, activation functions and training).
Computational Graph – Options for computing derivatives: 1. Analytical 2. Difference quotient
Computational Graph – Simple example
Let's calculate some derivatives!
The gradient "flows" backwards through the graph: at every node the upstream gradient is multiplied by the local gradient to give the downstream gradient.
Computational Graph – Standard operations
Addition: the downstream gradient equals the upstream gradient (it is passed through unchanged).
Multiplication: the downstream gradient is the upstream gradient multiplied by the other factor (the gradient "switches" to the other factor).
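As a quick check of these two rules, here is a minimal sketch of the forward and backward path for f(x, y, z) = (x + y) · z, using made-up numbers rather than the slide's example:

```python
# Minimal computational-graph example: f(x, y, z) = (x + y) * z
# Illustrative numbers only (not from the lecture slides).
x, y, z = -2.0, 5.0, -4.0

# Forward path: evaluate the graph and keep the intermediate value
q = x + y          # add node:      q = 3
f = q * z          # multiply node: f = -12

# Backward path: start with df/df = 1 and apply the two rules
df_df = 1.0
# Multiply node: downstream gradient = upstream gradient * other factor
df_dq = df_df * z  # -4
df_dz = df_df * q  #  3
# Add node: downstream gradient = upstream gradient (passed through)
df_dx = df_dq      # -4
df_dy = df_dq      # -4

print(f, df_dx, df_dy, df_dz)  # -12.0 -4.0 -4.0 3.0
```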
Computational Graph – Simple example, with numbers
Computational Graph – Abstract function
Forward path: evaluate the function node by node and save the local gradients during the forward path.
Backward path: multiply the upstream gradient by the local gradients that were already calculated during the forward path.
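One possible way to picture this pattern in code (a sketch, not the lecture's implementation): each node caches its local gradients during the forward path and reuses them in the backward path. The MultiplyNode class below is an illustrative assumption.

```python
# Sketch of a graph node that caches local gradients during the forward path
# and reuses them during the backward path (illustrative, not from the slides).
class MultiplyNode:
    def forward(self, a, b):
        # Save the local gradients: d(a*b)/da = b, d(a*b)/db = a
        self.local_grad_a = b
        self.local_grad_b = a
        return a * b

    def backward(self, upstream_grad):
        # Downstream gradient = upstream gradient * saved local gradient
        grad_a = upstream_grad * self.local_grad_a
        grad_b = upstream_grad * self.local_grad_b
        return grad_a, grad_b

node = MultiplyNode()
out = node.forward(3.0, -4.0)   # forward path: 3 * (-4) = -12
grads = node.backward(1.0)      # backward path: (-4.0, 3.0)
print(out, grads)
```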
Computational Graph – Complex example
Computational Graph – Neuron
The operations inside a neuron (weighted sum of the inputs plus bias, followed by an activation function) form a small computational graph; each operation contributes its own local derivative to the backward path.
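As a sketch (not the slide's worked example), a single sigmoid neuron written out as a computational graph, with made-up weights and inputs; the sigmoid's local derivative takes the convenient form a · (1 − a):

```python
import numpy as np

# One sigmoid neuron with two inputs (made-up numbers)
w = np.array([2.0, -3.0]); b = -3.0
x = np.array([-1.0, -2.0])

# Forward path through the neuron's computational graph
z = w @ x + b                    # weighted sum: 2*(-1) + (-3)*(-2) - 3 = 1
a = 1.0 / (1.0 + np.exp(-z))     # sigmoid:      ~0.731

# Backward path: the local derivative of the sigmoid is a * (1 - a)
da_dz = a * (1.0 - a)            # ~0.197
da_dw = da_dz * x                # gradients w.r.t. the weights
da_dx = da_dz * w                # gradients w.r.t. the inputs
da_db = da_dz                    # gradient w.r.t. the bias
print(a, da_dw, da_dx, da_db)
```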
Backpropagation – Neural Chain
Chain: Input → Neuron Layer 1 → Neuron Layer 2 → Loss Function
Forward path: the input is passed through the neuron layers up to the loss function.
Backward path: the gradient of the loss is passed back through the layers, multiplying the local gradients along the chain.
Backpropagation – Neural Network
In matrix form, the backward path combines the upstream gradient with the derivative of the activation function via the Hadamard (element-wise) product.
Backpropagation – Neural Network, one hidden layer
Forward path: compute the hidden activations and the output. Backward path: propagate the loss gradient back through both layers to obtain the weight updates.
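The following NumPy sketch walks once through the forward and backward path of such a one-hidden-layer network; the layer sizes, the sigmoid activations and the squared-error loss are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# One hidden layer, sigmoid activations, squared-error loss (illustrative choices)
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))          # input  (3 features)
t = np.array([[1.0], [0.0]])             # target (2 outputs)

W1, b1 = rng.standard_normal((4, 3)), np.zeros((4, 1))   # hidden layer: 4 neurons
W2, b2 = rng.standard_normal((2, 4)), np.zeros((2, 1))   # output layer: 2 neurons

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward path (saving intermediate values for the backward path)
z1 = W1 @ x + b1; a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)
loss = 0.5 * np.sum((a2 - t) ** 2)

# Backward path: the Hadamard product (*) combines the upstream gradient
# with the local derivative of the activation.
delta2 = (a2 - t) * a2 * (1 - a2)        # dL/dz2
dW2, db2 = delta2 @ a1.T, delta2
delta1 = (W2.T @ delta2) * a1 * (1 - a1) # dL/dz1
dW1, db1 = delta1 @ x.T, delta1

# Gradient-descent weight update
lr = 0.1
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1
print(loss)
```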
CPU vs. GPU
CPU: low compute density, complex control logic, large caches, low latency tolerance.
GPU: high compute density, many calculations per memory access, optimized for parallel computing.
Activation Functions – Sigmoid
Two main problems:
Causes vanishing gradients: the gradient is nearly zero for very large or very small x, which kills the gradient and the network stops learning.
Output is not zero-centered: the gradients are always either all positive or all negative, which leads to inefficient weight updates.
Activation Functions – Hyperbolic tangent (tanh)
Better than sigmoid: the output is zero-centered.
But it still causes vanishing gradients.
Activation Functions – Rectified Linear Unit (ReLU)
Most common activation function: computationally efficient, converges very fast, does not activate all neurons at the same time.
Problems: the gradient is zero for x < 0 and can cause vanishing gradients -> "dead" ReLUs may happen; not zero-centered.
Usage: mostly used in hidden layers; a positive bias at initialization helps to get active ReLUs.
Activation Functions – Leaky ReLU
Improves on ReLU: removes the zero part of ReLU by adding a small slope for x < 0.
More stable than ReLU, but adds another parameter.
Computationally efficient, converges very fast, does not die.
The parameter a can also be learned by the network.
Activation Functions – Exponential Linear Unit (ELU)
Combines the benefits of ReLU and Leaky ReLU.
Its computation requires evaluating an exponential, which makes it more expensive.
Activation Functions – Softmax
A type of sigmoid that is handy for classification problems.
It divides each output by the sum of all outputs, which allows a percentage (probability) interpretation per class (e.g. dog / cat / mouse in the classifier example).
Activation Functions – Rule of thumb
Sigmoid / softmax for classifiers (output layer).
Sigmoid and tanh are sometimes avoided due to the vanishing gradient.
ReLU is mostly used today, but should only be used in hidden layers.
Start with ReLU; if you don't get optimal results, go for Leaky ReLU or ELU.
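For reference, a small NumPy sketch of these activation functions; the slope a for Leaky ReLU and the alpha for ELU are common default values assumed here, not values given in the lecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):          # small slope a for x < 0 (assumed default)
    return np.where(x > 0, x, a * x)

def elu(x, alpha=1.0):              # exponential branch for x < 0 (assumed default)
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def softmax(x):
    e = np.exp(x - np.max(x))       # subtract the max for numerical stability
    return e / np.sum(e)            # divide by the sum of all outputs

scores = np.array([2.0, 1.0, 0.1])  # e.g. raw scores for dog / cat / mouse
print(softmax(scores))              # percentages that sum to 1
```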
Fully Connected Layer – Dimensions
The slide derives the dimensions of the weight matrix and bias vector; an alternative matrix notation is shown as well.
Fully Connected Layer – Image Data
Image data must be flattened into a vector before it can be fed into a fully connected layer; the weight matrix dimensions then follow from the flattened input size and the number of neurons.
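A small sketch of how the dimensions work out when an image is fed into a fully connected layer; the image size and the number of neurons are made-up values.

```python
import numpy as np

# Made-up example: a 28 x 28 grayscale image fed into a fully connected layer
imwidth = 28
image = np.random.rand(imwidth, imwidth)

x = image.reshape(-1, 1)                 # flatten: (784, 1) input vector
n_in, n_out = imwidth * imwidth, 100     # 100 neurons in this layer (assumed)

W = np.random.randn(n_out, n_in)         # weight matrix: (100, 784)
b = np.zeros((n_out, 1))                 # bias vector:   (100, 1)

a = np.maximum(0.0, W @ x + b)           # layer output:  (100, 1)
print(W.shape, b.shape, a.shape)
```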
Gradient Descent – Training:
for n < max_epochs:
    1. Forward path
    2. Backward path
    3. Update weights
Gradient Descent
Every optimization step requires forward paths over all the data, so each additional pass over the dataset multiplies the compute time.
Stochastic Gradient Descent – Alternative to Gradient Descent
Only use one random data point at a time.
This reduces the compute time per optimization step, but finding local minima takes longer.
Batch Gradient Descent – A Compromise
Split the dataset into batches => the global loss function is minimized through local cost functions computed on each batch.
Dimensions with Batch Gradient Descent
For batch size n the forward path becomes a 3D matrix multiplication -> handled by TensorFlow.
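A sketch of the batched dimensions; the batch size and layer sizes are made-up values, and plain NumPy broadcasting stands in here for the batched multiplication that TensorFlow performs.

```python
import numpy as np

n, n_in, n_out = 32, 784, 100            # batch size and layer sizes (assumed)

X = np.random.rand(n, n_in)              # batch of inputs:  (32, 784)
W = np.random.randn(n_in, n_out)         # weight matrix:    (784, 100)
b = np.zeros(n_out)                      # bias vector:      (100,)

Z = X @ W + b                            # whole batch in one multiplication
A = np.maximum(0.0, Z)                   # activations:      (32, 100)

# Stand-in per-sample losses, averaged into the local cost of this batch
loss_per_sample = np.mean(A, axis=1)
batch_loss = loss_per_sample.mean()
print(Z.shape, batch_loss)
```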
Stochastic vs. Batch
Stochastic training can miss local minima because of the randomness of each input and generally takes longer.
Mini-batches find the minimum quickly but may take more computational power.
Code example: TensorFlow
Network diagram: input of size imwidth x imwidth; Layer 1, Layer 2, Layer 3, with 4000 and 3000 neurons in the hidden layers.
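The lecture's TensorFlow code itself is not reproduced in this transcript; the tf.keras sketch below only illustrates a network of the shape in the diagram. The concrete input width, the number of output classes and the optimizer/loss choices are assumptions.

```python
import tensorflow as tf

imwidth = 64  # assumed image width; the lecture leaves this as a variable

# Fully connected network roughly matching the slide's diagram:
# flattened image input -> 4000 neurons -> 3000 neurons -> output layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(imwidth, imwidth)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4000, activation="relu"),
    tf.keras.layers.Dense(3000, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # assumed 10 classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```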
Weight Initialisation
For a weight matrix W_1 = [w11 w12 w13; w21 w22 w23; w31 w32 w33], the slide compares three options:
1. Init as 0
2. Normal distribution with mean 0 and fixed sigma = 0.4
3. Normal distribution with mean 0 and sigma = sqrt(2 / n_i), where n_i is the number of neurons in the previous layer (e.g. n_i = 2 => sigma = 1)
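A NumPy sketch of the three options for a 3 x 3 weight matrix; option 3 implements the sigma = sqrt(2 / n_i) rule from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, n_o = 3, 3                      # neurons in previous / current layer

# 1. Init as 0: every neuron computes the same output and gets the same
#    gradient, so the network cannot break symmetry (known problem).
W_zero = np.zeros((n_o, n_i))

# 2. Normal distribution with a fixed standard deviation
W_fixed = rng.normal(0.0, 0.4, size=(n_o, n_i))

# 3. Normal distribution scaled by the previous layer's size: sigma = sqrt(2 / n_i)
sigma = np.sqrt(2.0 / n_i)
W_scaled = rng.normal(0.0, sigma, size=(n_o, n_i))

print(W_fixed.round(2))
print(W_scaled.round(2))
```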
Training a Neural Network
1. Prepare the input data
2. Prepare a data pipeline to get the data into the network
3. Define the network
4. Initialize all variables
5. Train the network
6. Supervise the training
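A minimal end-to-end sketch of these six steps in tf.keras; the random stand-in dataset, the layer sizes and the training settings are illustrative assumptions rather than the lecture's code.

```python
import numpy as np
import tensorflow as tf

# 1. Prepare the input data (random stand-in data instead of a real dataset)
x = np.random.rand(1000, 32).astype("float32")
y = np.random.randint(0, 3, size=1000)

# 2. Prepare a data pipeline to get the data into the network
dataset = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(1000).batch(32)

# 3. Define the network
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# 4. Initialize all variables (Keras initializes the weights when the model is built)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# 5. Train the network and 6. supervise the training via the logged metrics
history = model.fit(dataset, epochs=5)
print(history.history["loss"])
```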
Summary – What we learned today:
Using computational graphs to calculate local gradients of any mathematical function
Backpropagating local gradients to calculate weight updates
The most common activation functions and their use cases
Using batches for training neural networks
Initializing weights correctly
The steps to take when training a neural network
End