1
Deep Learning
Shiyan Hu, Michigan Technological University
2
Neural Network
A "neuron" is the basic unit of a neural network; different connections lead to different network structures.
Network parameters θ: all the weights and biases in the "neurons".
3
Fully Connected Feedforward Network
[Figure: the input (1, −1) feeds two sigmoid neurons. Neuron 1: 1·1 + (−1)·(−2) + 1 = 4, σ(4) ≈ 0.98. Neuron 2: 1·(−1) + (−1)·1 + 0 = −2, σ(−2) ≈ 0.12, where σ is the sigmoid function σ(z) = 1/(1 + e^(−z)).]
4
Fully Connected Feedforward Network
[Figure: the same input (1, −1) propagates through three layers of sigmoid neurons, giving hidden activations (0.98, 0.12), then (0.86, 0.11), and final outputs (0.62, 0.83).]
5
Fully Connected Feedforward Network
[Figure: with input (0, 0), the same network gives hidden activations (0.73, 0.5), then (0.72, 0.12), and final outputs (0.51, 0.85). Given its parameters, the network defines a function from input vectors to output vectors.]
6
Fully Connected Feedforward Network
[Figure: general structure. The input layer feeds hidden Layer 1, Layer 2, …, Layer L; the output layer produces y1, y2, …, yM. Each node is a neuron; the layers between input and output are the hidden layers.]
7
Deep = Many hidden layers
[Figure: ImageNet error rates. AlexNet (2012), 8 layers: 16.4%. VGG (2014), 19 layers: 7.3%. GoogleNet (2014): 6.7%.]
8
Deep = Many hidden layers
[Figure: ImageNet error rates, continued. AlexNet (2012): 16.4%. VGG (2014): 7.3%. GoogleNet (2014): 6.7%. Residual Net (2015): 3.57%.]
9
Matrix Operation
The first layer of the earlier example is one matrix operation:
σ( [1 −2; −1 1] [1; −1] + [1; 0] ) = σ( [4; −2] ) = [0.98; 0.12]
In general, each layer computes a = σ(W x + b): a matrix-vector product plus a bias, followed by an element-wise activation.
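A minimal sketch, assuming NumPy, of this single-layer matrix operation; it reproduces the 0.98 / 0.12 outputs above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([[1.0, -2.0],
              [-1.0, 1.0]])   # weights
b = np.array([1.0, 0.0])      # biases
x = np.array([1.0, -1.0])     # input

a = sigmoid(W @ x + b)        # a = sigma(Wx + b)
print(a)                      # approximately [0.98, 0.12]
```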
10
Neural Network
With parameters W1, b1, W2, b2, …, WL, bL, the activations are computed layer by layer:
a1 = σ(W1 x + b1)
a2 = σ(W2 a1 + b2)
…
y = σ(WL aL−1 + bL)
11
Neural Network
The whole network is therefore one function y = f(x):
y = f(x) = σ( WL … σ( W2 σ( W1 x + b1 ) + b2 ) … + bL )
Since every layer is a matrix operation, parallel computing techniques (such as GPUs) can be used to speed it up.
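A minimal sketch, assuming NumPy, of this nested forward computation; the layer sizes and random parameters are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Propagate x through every layer: a <- sigma(W a + b)."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
sizes = [2, 3, 3, 2]    # input, two hidden layers, output
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

y = forward(np.array([1.0, -1.0]), weights, biases)
print(y)
```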
12
Output Layer
[Figure: the hidden layers act as an automatic feature engineering stage, and the output layer acts as a multi-class classifier, usually with a Softmax activation producing y1, y2, …, yM.]
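A minimal sketch, assuming NumPy, of the softmax used at the output layer; it turns the last layer's scores into a probability distribution over the classes.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)      # subtract the max for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

scores = np.array([3.0, 1.0, -2.0])
print(softmax(scores))     # sums to 1; the largest score gets the largest probability
```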
13
Example Application
Handwriting digit recognition. Input: a 16 × 16 image flattened to a 256-dimensional vector (ink → 1, no ink → 0). Output: a 10-dimensional vector y1, …, y10, where each dimension represents the confidence of a digit. For example, with y2 = 0.7 the largest entry (and, say, y1 = 0.1, y10 = 0.2), the network concludes that the image is "2".
14
Example Application
Handwriting digit recognition: what is needed is a function, realized by the neural network, that maps a 256-dimensional input vector to a 10-dimensional output vector (y1 = "is 1", y2 = "is 2", …, y10 = "is 0").
15
Example Application
A network structure defines a function set containing the candidates for handwriting digit recognition: the input layer takes the 256-dimensional image vector, the hidden layers (Layer 1 through Layer L) transform it, and the output layer produces y1 ("is 1") through y10 ("is 0"). You need to learn a good function in this function set that minimizes the classification error.
16
Classification Error
Given a set of parameters, compare the network's Softmax output y with the target ŷ. For the target "1", ŷ1 = 1 and all other ŷi = 0 (a one-hot vector). The loss is the cross entropy
C(y, ŷ) = −Σ_{i=1}^{10} ŷ_i ln y_i
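A minimal sketch, assuming NumPy, of this cross-entropy loss between a softmax output y and a one-hot target ŷ; the three-class vectors are illustrative.

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    # C(y, y_hat) = -sum_i y_hat_i * ln(y_i); eps guards against log(0)
    return -np.sum(y_hat * np.log(y + eps))

y = np.array([0.7, 0.2, 0.1])        # network output (illustrative, 3 classes)
y_hat = np.array([1.0, 0.0, 0.0])    # one-hot target
print(cross_entropy(y, y_hat))       # = -ln(0.7), roughly 0.357
```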
17
Total Error
For all training data,
L = Σ_{n=1}^{N} C^n
where C^n is the cross entropy between the network output y^n for example x^n and its target ŷ^n. Training means finding a function in the function set, i.e. the network parameters θ*, that minimizes the total loss L.
18
Gradient Descent
Collect all parameters into θ = {w1, w2, …, b1, …}. The gradient of the total loss is
∇L = [ ∂L/∂w1, ∂L/∂w2, …, ∂L/∂b1, … ]ᵀ
Compute each partial derivative and move every parameter a step against its gradient, scaled by the learning rate η, e.g.
w1: 0.2 → 0.2 − η ∂L/∂w1 = 0.15
w2: −0.1 → −0.1 − η ∂L/∂w2 = 0.05
b1: 0.3 → 0.3 − η ∂L/∂b1 = 0.2
19
Gradient Descent
The update is repeated: recompute the gradient at the new parameters and take another step, e.g.
w1: 0.2 → 0.15 → 0.09
w2: −0.1 → 0.05 → 0.15
b1: 0.3 → 0.2 → 0.10
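A minimal sketch, assuming NumPy, of the iterated update θ ← θ − η ∇L; the toy loss and starting values are illustrative stand-ins for the network's real loss.

```python
import numpy as np

def loss(theta):
    return np.sum((theta - 1.0) ** 2)      # toy loss with minimum at theta = 1

def grad(theta):
    return 2.0 * (theta - 1.0)             # its exact gradient

theta = np.array([0.2, -0.1, 0.3])         # starting parameters (w1, w2, b1)
eta = 0.1                                  # learning rate

for step in range(100):
    theta = theta - eta * grad(theta)      # move against the gradient

print(theta, loss(theta))                  # theta approaches [1, 1, 1]
```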
20
Gradient Descent
This is the "learning" of machines in deep learning. Even AlphaGo uses this approach. People imagine something far more mysterious; actually, it is just gradient descent on a loss function.
21
Backpropagation
Backpropagation: an efficient way to compute ∂L/∂w in a neural network.
22
Gradient Descent
A network has millions of parameters, and gradient descent needs the derivative of the loss with respect to every one of them, starting from the initial parameters. To compute the gradients efficiently, we use backpropagation.
23
Backpropagation
For training example x^n (with components x1, x2, …), the network with parameters θ produces y^n, and C^n measures the distance between y^n and the target ŷ^n. Since
L(θ) = Σ_{n=1}^{N} C^n(θ),   ∂L(θ)/∂w = Σ_{n=1}^{N} ∂C^n(θ)/∂w,
it suffices to compute ∂C/∂w for one example and sum over the training data.
24
Backpropagation
Consider a weight w feeding a neuron whose activation function input is z = x1 w1 + x2 w2 + b. By the chain rule,
∂C/∂w = (∂z/∂w)(∂C/∂z)
Forward pass: compute ∂z/∂w for all parameters.
Backward pass: compute ∂C/∂z for all activation function inputs z.
25
Backpropagation - Forward pass
Compute ∂z/∂w for all parameters. Since z = x1 w1 + x2 w2 + b,
∂z/∂w1 = x1,   ∂z/∂w2 = x2
The derivative is simply the value of the input connected to the weight.
26
Backpropagation - Forward pass
Compute ∂z/∂w for all parameters.
[Figure: in the earlier numerical example, ∂z/∂w is just the activation flowing into each weight, e.g. ∂z/∂w = −1 for a weight fed by the input −1, and ∂z/∂w = 0.12 or ∂z/∂w = 0.11 for weights fed by the hidden activations 0.12 and 0.11.]
27
Backpropagation - Backward pass
Compute ∂C/∂z for all activation function inputs z. The neuron outputs a = σ(z), which feeds the next layer through weights w3 and w4, producing inputs z′ and z″. By the chain rule,
∂C/∂z = (∂a/∂z)(∂C/∂a) = σ′(z) ∂C/∂a
28
Backpropagation - Backward pass
[Figure: σ(z) and its derivative σ′(z); for the sigmoid, σ′(z) = σ(z)(1 − σ(z)).]
∂C/∂z = σ′(z) ∂C/∂a
29
Backpropagation - Backward pass
The output a influences C through both z′ = a w3 + ⋯ and z″ = a w4 + ⋯, so by the chain rule
∂C/∂a = (∂z′/∂a)(∂C/∂z′) + (∂z″/∂a)(∂C/∂z″) = w3 ∂C/∂z′ + w4 ∂C/∂z″
30
Backpropagation - Backward pass
Combining the two steps:
∂C/∂z = σ′(z) [ w3 ∂C/∂z′ + w4 ∂C/∂z″ ]
31
Backpropagation - Backward pass
∂C/∂z = σ′(z) [ w3 ∂C/∂z′ + w4 ∂C/∂z″ ]
Here σ′(z) is a constant, since z was already determined in the forward pass.
32
Backpropagation - Backward pass
Case 1: the next layer is the output layer. Then z′ and z″ directly produce the outputs y1 and y2, so
∂C/∂z′ = (∂y1/∂z′)(∂C/∂y1),   ∂C/∂z″ = (∂y2/∂z″)(∂C/∂y2)
and both factors can be computed directly from the output activation and the loss.
33
Backpropagation - Backward pass
Case 2: the next layer is not the output layer. Then ∂C/∂z′ and ∂C/∂z″ are not yet known; they depend on the layers after it.
34
Backpropagation - Backward pass
Case 2 (not the output layer): compute ∂C/∂z′ recursively. The neuron with input z′ connects through weights w5 and w6 to the next layer's inputs za and zb, so ∂C/∂z′ = σ′(z′)[ w5 ∂C/∂za + w6 ∂C/∂zb ], and so on until we reach the output layer.
35
Backpropagation - Backward Pass
In practice, compute ∂C/∂z starting from the output layer.
[Figure: a network with inputs x1, x2, activation inputs z1, z2 (first layer), z3, z4 (second layer), z5, z6 (output layer), and outputs y1, y2. ∂C/∂z5 and ∂C/∂z6 are computed first, directly from the outputs.]
36
Backpropagation - Backward Pass
[Figure: the remaining derivatives follow by working backwards: ∂C/∂z3 and ∂C/∂z4 are obtained from ∂C/∂z5 and ∂C/∂z6 using σ′(z3) and σ′(z4), and then ∂C/∂z1 and ∂C/∂z2 using σ′(z1) and σ′(z2).]
37
Backpropagation - Summary
Forward pass: for every weight, record a = ∂z/∂w (the input connected to that weight). Backward pass: compute ∂C/∂z for every activation function input. Then
∂C/∂w = a × ∂C/∂z   for all w
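A minimal sketch, assuming NumPy, of these two passes on a tiny one-hidden-layer sigmoid network with squared-error loss C = ½‖y − ŷ‖²; all sizes and values are illustrative (the lecture's classifier would use cross entropy instead).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# parameters: layer 1 (2 -> 2) and layer 2 (2 -> 2), chosen arbitrarily
W1 = np.array([[1.0, -2.0], [-1.0, 1.0]]); b1 = np.array([1.0, 0.0])
W2 = np.array([[2.0, -1.0], [-2.0, -1.0]]); b2 = np.array([0.0, 0.0])

x = np.array([1.0, -1.0])
y_hat = np.array([1.0, 0.0])                 # target

# Forward pass: store the activation inputs z and activations a
z1 = W1 @ x + b1;  a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; y  = sigmoid(z2)
C = 0.5 * np.sum((y - y_hat) ** 2)

# Backward pass: dC/dz, starting from the output layer
dC_dz2 = (y - y_hat) * y * (1 - y)           # sigma'(z2) = y(1 - y)
dC_dz1 = (W2.T @ dC_dz2) * a1 * (1 - a1)     # sigma'(z1) * sum_k w_k * dC/dz_k

# dC/dw = (dz/dw) * (dC/dz) = input value times dC/dz
dC_dW2 = np.outer(dC_dz2, a1); dC_db2 = dC_dz2
dC_dW1 = np.outer(dC_dz1, x);  dC_db1 = dC_dz1
print(C, dC_dW1, dC_dW2)
```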
38
Example Application
Handwriting digit recognition: the machine reads an image and outputs the digit, e.g. "1".
MNIST data: the "Hello world" of deep learning. Keras provides a data set loading function.
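A sketch of loading MNIST with Keras' built-in loader, assuming a standard TensorFlow/Keras installation; the flattening and scaling shown are one common preprocessing choice.

```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()   # 28x28 grayscale images

x_train = x_train.reshape(-1, 28 * 28).astype("float32") / 255.0   # flatten + scale
x_test = x_test.reshape(-1, 28 * 28).astype("float32") / 255.0
y_train = to_categorical(y_train, 10)   # one-hot targets for cross entropy
y_test = to_categorical(y_test, 10)
```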
39
Keras
Network structure for MNIST: the 28 × 28 input image (784 dimensions) feeds two fully connected hidden layers of 500 neurons each, followed by a Softmax output layer producing y1, …, y10. Available activation functions include softplus, softsign, relu, tanh, hard_sigmoid, and linear.
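A sketch, assuming tensorflow.keras, of that architecture; relu is used here as one of the listed activation choices.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(500, activation="relu", input_shape=(28 * 28,)),  # hidden layer 1
    Dense(500, activation="relu"),                          # hidden layer 2
    Dense(10, activation="softmax"),                        # output layer
])
model.summary()
```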
40
Keras
41
Keras
Step 3.1: Configuration: choose the loss, an optimizer (SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam, …), and the metrics to report.
Step 3.2: Find the optimal network parameters by fitting the model on the training data (images) and their labels (digits). The batch size and number of epochs involved are discussed below.
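A sketch, assuming tensorflow.keras and the model and data from the previous sketches, of these two steps; the optimizer, batch size, and epoch count are illustrative choices.

```python
model.compile(loss="categorical_crossentropy",  # cross entropy, as in the loss slides
              optimizer="adam",                 # any optimizer from the list above
              metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=100, epochs=20)
```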
42
Keras
Save and load models, then use the trained network for testing. Case 1: evaluate on a test set with known labels. Case 2: predict outputs for new inputs.
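A sketch, assuming tensorflow.keras and the objects defined above, of saving, reloading, and the two testing cases; the file name is illustrative.

```python
from tensorflow.keras.models import load_model

model.save("mnist_model.h5")          # save architecture + weights
model = load_model("mnist_model.h5")

# case 1: testing with known labels
loss, acc = model.evaluate(x_test, y_test)
print("test accuracy:", acc)

# case 2: predicting on new data
probs = model.predict(x_test[:5])     # one 10-dim probability vector per image
print(probs.argmax(axis=1))           # predicted digits
```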
43
Mini-batch
We do not really minimize the total loss directly. Instead:
Randomly initialize the network parameters.
Pick the 1st mini-batch (e.g. x1, x31, …), compute its loss L′ = C1 + C31 + ⋯, and update the parameters once.
Pick the 2nd mini-batch (e.g. x2, x16, …), compute L″ = C2 + C16 + ⋯, and update the parameters once.
Continue until all mini-batches have been picked; that is one epoch.
Repeat the above process for many epochs.
44
Mini-batch
Batch size influences both training speed and performance, so you need to tune it. As before, each epoch picks mini-batch after mini-batch (L′ = C1 + C31 + ⋯, then L″ = C2 + C16 + ⋯, …), updating the parameters once per batch until all mini-batches have been picked, and the whole process is repeated, e.g. 20 times (20 epochs).
45
Shuffle the training examples for each epoch, so that the mini-batches differ from epoch to epoch (a batch containing x1 and x31 in epoch 1 may contain x1 and x17 in epoch 2). Don't worry: this is the default behavior of Keras.
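A minimal NumPy sketch of this per-epoch shuffling and per-batch updating; compute_gradient is a hypothetical stand-in for the backpropagation step, and the defaults are illustrative. In Keras, model.fit performs this loop itself (shuffle=True is the default).

```python
import numpy as np

def train(x, y, theta, compute_gradient, eta=0.1, batch_size=100, epochs=20):
    n = len(x)
    rng = np.random.default_rng(0)
    for epoch in range(epochs):
        order = rng.permutation(n)                  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            grad = compute_gradient(x[idx], y[idx], theta)  # gradient on this batch
            theta = theta - eta * grad              # one update per mini-batch
    return theta
```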
46
The Power of Deep?
Results on the training data: deeper usually does not imply better. Since these results are on the training data, the problem is not overfitting; deeper networks can simply be harder to train (see the vanishing gradient problem below).
47
Vanishing Gradient Problem
[Figure: in a deep sigmoid network, the layers closer to the input have smaller gradients. Intuition: perturb a weight w near the input by Δw and see how the cost changes, ∂C/∂w ≈ ΔC/Δw. The sigmoid turns a large change of its input into a small change of its output, so after passing through many layers the resulting ΔC is tiny and the gradient is small.]
48
Vanishing Gradient Problem
[Figure: layers near the input have smaller gradients and learn very slowly, so they stay almost random; layers near the output have larger gradients and learn very fast, so they converge quickly, but they converge based on the nearly random features from the early layers.]
49
ReLU
Rectified Linear Unit (ReLU): a = z if z > 0, and a = 0 if z ≤ 0.
Reasons for using it: 1. It is fast to compute. 2. It alleviates the vanishing gradient problem. [Xavier Glorot, AISTATS'11] [Andrew L. Maas, ICML'13] [Kaiming He, arXiv'15]
50
ReLU
[Figure: the ReLU activation function: a = z for z > 0 and a = 0 for z ≤ 0.]
51
ReLU
With ReLU, neurons whose input is negative output zero, so for a given input the active part of the network is a thinner linear network, which does not suffer from smaller gradients (the slope is 1 wherever a unit is active). With different input data, different neurons are active, so the network as a whole is a piecewise-linear approximation of a nonlinear function.
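A minimal NumPy sketch of the ReLU activation just described; in Keras this corresponds to activation="relu".

```python
import numpy as np

def relu(z):
    # a = z for z > 0, a = 0 otherwise
    return np.maximum(0.0, z)

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]
```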
52
Maxout
ReLU is a special case of Maxout, a learnable activation function [Ian J. Goodfellow, ICML'13].
[Figure: a two-layer Maxout network. Each Maxout unit groups several linear pre-activations and outputs the maximum of its group, e.g. max(5, 7) = 7 and max(−1, 1) = 1 in the first layer, then max(1, 2) = 2 and max(4, 3) = 4 in the second.]
53
Maxout
ReLU is a special case of Maxout. A ReLU neuron computes z = wx + b and outputs a = max(z, 0). A Maxout unit with two elements z1 = wx + b and z2 = 0 (the second element's weight and bias fixed to zero) outputs max(z1, z2), which is exactly the same function.
54
Maxout
Maxout is more than ReLU: it is a learnable activation function. With z1 = wx + b and z2 = w′x + b′ both learnable, the unit outputs a = max(z1, z2), a piecewise-linear activation whose shape depends on the learned parameters w, b, w′, b′.
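A minimal NumPy sketch (not the paper's reference implementation) of a Maxout layer with group size k; with k = 2 and the second element fixed at zero it reduces to ReLU. The shapes and random parameters are illustrative.

```python
import numpy as np

def maxout(x, W, b):
    """W has shape (out_dim, k, in_dim), b has shape (out_dim, k)."""
    z = np.einsum("okn,n->ok", W, x) + b   # k linear pre-activations per output unit
    return z.max(axis=1)                   # max within each group

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2, 4))         # 3 outputs, groups of 2, 4 inputs
b = rng.standard_normal((3, 2))
print(maxout(rng.standard_normal(4), W, b))
```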
55
Dropout
Training: each time before updating the parameters, each neuron has a p% chance to drop out.
56
Dropout
Training: each time before updating the parameters, each neuron has a p% chance to drop out. This changes the structure of the network, and the resulting thinner network is used for that update.
57
Dropout
Testing: no dropout. If the dropout rate at training is p%, multiply all the weights by (1 − p%) for testing. For example, if the dropout rate is 50% and a weight was trained to w = 1, set w = 0.5 for testing.
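A sketch, assuming tensorflow.keras, of adding dropout to the MNIST model defined earlier; the 0.5 rate is an illustrative choice. Keras' Dropout layer applies the random masking only during training and handles the train/test rescaling internally (it uses inverted dropout, scaling activations at training time), so no manual weight scaling is needed at test time.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(500, activation="relu", input_shape=(28 * 28,)),
    Dropout(0.5),   # each neuron dropped with probability 0.5, only during training
    Dense(500, activation="relu"),
    Dropout(0.5),
    Dense(10, activation="softmax"),
])
```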
58
Dropout - Intuitive Reason
Why multiply the weights by (1 − p%) at test time? Assume the dropout rate is 50%. During training roughly half of a neuron's inputs are dropped, so if the same weights w1, …, w4 were used with no dropout, the pre-activation would roughly double (z′ ≈ 2z). Multiplying the trained weights by 0.5 gives z′ ≈ z, so the test-time scale matches what the network saw during training.
59
Dropout - Intuitive Reason
When people team up, if everyone expects their partners to do the work, nothing gets done in the end. However, if you know your partner may drop out, you will work harder. At testing time no one actually drops out, so good results are obtained in the end.
60
Why Deep?
Layers × Size   Word Error Rate (%)
1 × 2k          24.2
2 × 2k          20.4
3 × 2k          18.4
4 × 2k          17.8
5 × 2k          17.2
7 × 2k          17.1
1 × 16k         22.1
A deep, thin network (7 × 2k) outperforms a shallow, fat one (1 × 16k) with a comparable number of parameters.
Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech 2011.
61
Fat + Short vs. Thin + Tall
[Figure: with the same number of parameters, which one is better: a shallow ("fat + short") network or a deep ("thin + tall") one?]
62
Modularization
Deep → Modularization. Analogy from programming: don't put everything in your main function; break the program into reusable modules.
63
Modularization
Deep → Modularization. Example: classify images directly into four classes with Classifier 1 (girls with long hair), Classifier 2 (boys with long hair), Classifier 3 (girls with short hair), and Classifier 4 (boys with short hair). Classes with few training examples yield weak classifiers.
64
Modularization
Deep → Modularization. Instead, first train basic classifiers for simpler attributes, "boy or girl?" and "long or short hair?", for which there is plenty of data. Each of the four final classifiers is then built on top of these basic classifiers, so it can do well even with few examples of its own.
65
Modularization - Less training data?
Deep → Modularization. In a deep network, the first layer learns the most basic classifiers; the second layer uses the first layer's outputs as modules to build more complex classifiers, and so on. This modularization is learned automatically from data, which is why a deep network can get by with less training data.
66
Universality Theorem
Any continuous function f can be realized by a network with one hidden layer (given enough hidden neurons). So yes, a shallow network can represent any function; however, using a deep structure is more effective.
67
Analogy
Logic circuits consist of gates; a neural network consists of neurons.
Two layers of logic gates can represent any Boolean function, but using multiple layers of gates to build some functions is much simpler (fewer gates are needed).
Likewise, a network with one hidden layer can represent any continuous function, but using multiple layers of neurons to represent some functions is much simpler (fewer parameters, and perhaps less data).
68
Analogy
E.g. parity check: for an input sequence of d bits, output 1 if the number of 1s is even and 0 if it is odd. A two-layer circuit needs O(2^d) gates, but with multiple layers (e.g. a chain of XNOR gates), we need only O(d) gates.