Machine Learning & Deep Learning
AI vs. Human
The new emergence of AI technology: Artificial Intelligence, Machine Learning, Deep Learning
Course plan: lectures, student presentations, and evaluation. Lecture topics (Machine Learning & Deep Learning basics, based on the Deep Learning Book): machine learning basic concepts, linear regression, logistic regression (classification), multivariable (vector) linear/logistic regression, neural networks. Presentations: Deep Learning Book. Evaluation: midterm exam 45%, final exam 45%, attendance 10%.
Course textbook (for presentations): http://www.deeplearningbook.org/ by Ian Goodfellow, Yoshua Bengio and Aaron Courville
Machine Learning Basics
Basic concepts: What is ML? What is learning? Supervised learning vs. unsupervised learning. What is regression? What is classification?
Supervised/Unsupervised learning: supervised learning is learning with labeled examples (a training set)
Supervised Learning
Supervised Learning: training data set
Supervised Learning: AlphaGo
Types of Supervised Learning
Predicting final exam score: regression
Predicting pass/non-pass: binary classification
Predicting grades (A, B, …): multi-class classification
Linear Regression
Predicting exam score: regression
Regression data
Linear Hypothesis: H(x) = Wx + b
Which hypothesis is better?
Cost function: how well the line fits our (training) data
Cost function
ML Goal: minimize the cost function, $\arg\min_{W,b} \mathrm{cost}(W, b)$
Hypothesis and Cost
Simplified hypothesis: H(x) = Wx (drop the bias b, so the cost depends only on W)
What does cost(W) look like?
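To make the shape of cost(W) concrete, here is a minimal sketch (not from the slides) that evaluates the cost for the simplified hypothesis H(x) = Wx over a range of W values; the toy data set is an assumed example.

```python
# A minimal sketch: compute cost(W) for the simplified hypothesis H(x) = W * x
# on a toy data set, to see its bowl shape (minimum at W = 1 for this data).
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # hypothetical training inputs
y = np.array([1.0, 2.0, 3.0])   # hypothetical training targets

def cost(W):
    # Mean squared error between predictions W*x and targets y
    return np.mean((W * x - y) ** 2)

for W in np.arange(-1.0, 3.5, 0.5):
    print(f"W = {W:4.1f}  cost(W) = {cost(W):6.3f}")
```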
How to minimize the cost
Gradient descent algorithm
Gradient descent algorithm: how does it work?
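As a rough illustration of how it works, the sketch below runs gradient descent on the same simplified hypothesis H(x) = Wx; the toy data, starting point, and learning rate are assumptions for demonstration only.

```python
# Gradient descent sketch for minimizing cost(W) with H(x) = W * x.
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # assumed toy inputs
y = np.array([1.0, 2.0, 3.0])   # assumed toy targets

W = 5.0          # arbitrary starting point
alpha = 0.1      # learning rate (assumed)

for step in range(20):
    # d/dW of mean((W*x - y)^2) = 2 * mean((W*x - y) * x)
    gradient = 2.0 * np.mean((W * x - y) * x)
    W -= alpha * gradient                      # move against the gradient
    cost = np.mean((W * x - y) ** 2)
    print(f"step {step:2d}  W = {W:.4f}  cost = {cost:.6f}")
```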
Cost function: formal definition
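The formula itself did not survive extraction; the standard definition consistent with the hypothesis H(x) = Wx + b used here is the mean squared error over the m training examples:

$$\mathrm{cost}(W, b) = \frac{1}{m} \sum_{i=1}^{m} \bigl(H(x^{(i)}) - y^{(i)}\bigr)^2, \qquad H(x) = Wx + b$$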
Cost function: convex function
Multi-variable linear regression
Predicting exam score: regression using two inputs (x1, x2)
Hypothesis: $H(x_1, x_2) = w_1 x_1 + w_2 x_2 + b$
Cost function
Matrix notation
Matrix notation Hypothesis without b
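One common way to write the hypothesis without a separate b is to fold the bias into the weight vector by adding a constant 1 feature; the exact layout on the slide is not recoverable, so the following is an assumed convention:

$$H(X) = XW, \qquad X = \begin{bmatrix} 1 & x_1 & x_2 \end{bmatrix}, \qquad W = \begin{bmatrix} b \\ w_1 \\ w_2 \end{bmatrix}$$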
Logistic regression
Classification
Classification using linear regression: threshold the output at 0.5
Classification using linear regression: the linear regression hypothesis H(x) = Wx + b can be greater than 1 or less than 0, so it must be scaled to a value between 0 and 1.
Logistic regression: use the sigmoid function (or logistic function)
Logistic hypothesis: $H(X) = \dfrac{1}{1 + e^{-WX}}$
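A minimal sketch of this hypothesis in code, with assumed toy inputs, showing that the output always lands between 0 and 1:

```python
# Logistic hypothesis H(x) = sigmoid(W*x + b): squashes the linear score into 0~1.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_hypothesis(x, W, b):
    return sigmoid(W * x + b)

print(logistic_hypothesis(np.array([-3.0, 0.0, 3.0]), W=1.0, b=0.0))
# -> values between 0 and 1, roughly [0.047, 0.5, 0.953]
```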
Cost function
New cost function for logistic regression
New cost function for logistic regression: when y = 1, predicting H(x) = 1 gives cost = 0 and predicting H(x) = 0 gives cost = ∞; when y = 0, predicting H(x) = 0 gives cost = 0 and predicting H(x) = 1 gives cost = ∞.
New cost function for logistic regression: $C(H(x), y) = -y \log H(x) - (1 - y) \log\bigl(1 - H(x)\bigr)$
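A quick numeric check of this cost for the four cases on the previous slide (H(x) is clipped slightly away from 0 and 1 so the logarithm stays finite):

```python
# Logistic cost C(H(x), y) = -y*log(H) - (1-y)*log(1-H) for the four cases.
import numpy as np

def logistic_cost(h, y):
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

eps = 1e-8
print(logistic_cost(1 - eps, 1))  # y=1, predict ~1 -> cost ~ 0
print(logistic_cost(eps, 1))      # y=1, predict ~0 -> cost ~ 18.4 (tends to infinity)
print(logistic_cost(eps, 0))      # y=0, predict ~0 -> cost ~ 0
print(logistic_cost(1 - eps, 0))  # y=0, predict ~1 -> cost ~ 18.4 (tends to infinity)
```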
Gradient descent algorithm
Multinomial classification
Logistic regression: $H_L(X) = WX$, $z = H_L(X)$, and $g(z)$ maps $z$ into the 0 ~ 1 range, so $H_R(X) = g(H_L(X))$ gives $\hat{Y}$ between 0 and 1.
Multinomial classification
Multinomial classification: three independent binary classifiers, each $X \to W \to z \to \hat{Y}$ with $\hat{Y}$ in 0 ~ 1.
Multinomial classification: one classifier per class, producing $\hat{Y}_A$, $\hat{Y}_B$, and $\hat{Y}_C$.
Multinomial classification: the three weight vectors stack into one matrix, $\begin{bmatrix} w_{A1} & w_{A2} & w_{A3} \\ w_{B1} & w_{B2} & w_{B3} \\ w_{C1} & w_{C2} & w_{C3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \hat{Y}_A \\ \hat{Y}_B \\ \hat{Y}_C \end{bmatrix}$
Softmax function: converts the score vector $Y = [\hat{Y}_A, \hat{Y}_B, \hat{Y}_C]^T$ into probabilities, e.g. $Y = [2.0,\ 1.0,\ 0.1]^T \Rightarrow S(Y) = [0.7,\ 0.2,\ 0.1]^T$.
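A short sketch of softmax that reproduces the example on the slide (the slide's 0.7 / 0.2 / 0.1 values are rounded):

```python
# Softmax: converts the scores Y = [2.0, 1.0, 0.1] into probabilities summing to 1.
import numpy as np

def softmax(y):
    e = np.exp(y - np.max(y))   # subtract max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))
# -> [0.659 0.242 0.099], roughly the 0.7 / 0.2 / 0.1 shown on the slide
```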
Cost function: cross-entropy between the softmax output and the true label, $D(S, L) = -\sum_i L_i \log(S_i)$, e.g. $S(Y) = [0.7,\ 0.2,\ 0.1]^T$ compared against the true label $L = Y = [1,\ 0,\ 0]^T$.
Cross-entropy cost function: $D(S, L) = \sum_i L_i \cdot \bigl(-\log S_i\bigr)$; for example, with the true label $Y = L = [0\ \ 1]^T$.
Logistic cost vs. cross-entropy cost: $C(H(x), y) = -y \log H(x) - (1 - y)\log(1 - H(x))$ and $D(S, L) = -\sum_i L_i \log(S_i)$; the logistic cost is the two-class special case of cross-entropy.
Final cost (loss) function over the training data: $\mathrm{Loss}(W) = \frac{1}{N} \sum_i D(S_i, L_i)$
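A minimal sketch of this averaged cross-entropy loss; the softmax outputs and one-hot labels below are assumed example values:

```python
# Loss(W) = (1/N) * sum_i D(S_i, L_i): average cross-entropy over the training set.
import numpy as np

def cross_entropy(s, l):
    # D(S, L) = -sum_i L_i * log(S_i)
    return -np.sum(l * np.log(s))

S = np.array([[0.7, 0.2, 0.1],    # softmax outputs for N examples (assumed values)
              [0.1, 0.8, 0.1]])
L = np.array([[1, 0, 0],          # one-hot true labels
              [0, 1, 0]])

loss = np.mean([cross_entropy(s, l) for s, l in zip(S, L)])
print(loss)   # ( -log(0.7) - log(0.8) ) / 2, about 0.29
```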
Gradient descent: $w_i \leftarrow w_i - \alpha \dfrac{\partial \mathrm{Loss}(W)}{\partial w_i}$
Learning rate
Large learning rate: overshooting
Small learning rate: takes too long
Optimal learning rate? Observe the cost function and check that it decreases at a reasonable rate.
Data preprocessing
Data preprocessing for gradient descent (figure: cost contours over w1 and w2)
Standardization
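A small sketch of standardization, assuming the usual definition x' = (x - mean) / std applied per feature; the raw feature values are made up for illustration:

```python
# Standardization: rescale each feature to zero mean and unit variance so that
# gradient descent behaves well even when features have very different scales.
import numpy as np

X = np.array([[ 1.0, 2000.0],     # hypothetical raw features on very different scales
              [ 2.0, 3000.0],
              [ 3.0, 4000.0]])

X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std)   # each column now has mean 0 and standard deviation 1
```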
Overfitting
Overfitting: the ML model performs very well only on the training data set (memorization) but poorly on the test set or in real use.
Solutions for overfitting: more training data, reduce the number of features, regularization.
Regularization: try not to let the weights grow too large (the case where a particular $w_i$ becomes large).
Regularization: $\mathrm{Loss}(W) = \frac{1}{N} \sum_i D\bigl(S(WX_i + b),\ L_i\bigr) + \lambda \sum_j w_j^2$, where $\lambda$ is the regularization strength (range: 0 ~ 1).
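A sketch of this regularized loss in code; the weights, data, and regularization strength below are illustrative assumptions, not values from the lecture:

```python
# Regularized loss: average cross-entropy plus lambda * sum of squared weights.
import numpy as np

def softmax(y):
    e = np.exp(y - np.max(y))
    return e / e.sum()

def regularized_loss(W, b, X, L, lambda_=0.1):
    data_loss = 0.0
    for x, l in zip(X, L):
        s = softmax(W @ x + b)                  # S(W x_i + b)
        data_loss += -np.sum(l * np.log(s))     # D(S, L_i)
    data_loss /= len(X)
    return data_loss + lambda_ * np.sum(W ** 2) # + lambda * sum(w_i^2)

W = np.zeros((3, 2)); b = np.zeros(3)           # assumed toy parameters
X = np.array([[1.0, 2.0], [2.0, 1.0]])          # assumed toy inputs
L = np.array([[1, 0, 0], [0, 1, 0]])            # one-hot labels
print(regularized_loss(W, b, X, L))             # log(3) ~ 1.099 with zero weights
```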