
1 Machine Learning & Deep Learning

2 AI vs. Human

3 The emergence of new AI technology: Artificial Intelligence, Machine Learning, Deep Learning

4 Course plan: lectures, presentations, grading
Lectures: Machine Learning & Deep Learning basics (machine learning basic concepts, linear regression, logistic regression (classification), multivariable (vector) linear/logistic regression, neural networks). Presentations: Deep Learning Book. Grading: midterm 45%, final 45%, attendance 10%

5 Course textbook (for presentations): http://www.deeplearningbook.org/
Authors: Ian Goodfellow, Yoshua Bengio, and Aaron Courville

6 Machine Learning Basics

7 Basic concepts
What is ML? What is learning? (supervised learning, unsupervised learning) What is regression? What is classification?

8 Supervised/Unsupervised learning
Supervised learning: learning with labeled examples (a training set)

9 Supervised/Unsupervised learning

10 Supervised Learning

11 Supervised Learning Training data set

12 Supervised Learning AlphaGo

13 Types of Supervised Learning

14 Predicting a final exam score: regression

15 Predicting pass/non-pass: binary classification

16 Predicting grades (A, B, …): multi-class classification

17 Linear Regression

18 Predicting exam score: regression

19 Regression data

20 Linear Hypothesis

21 Linear Hypothesis

22 Which hypothesis is better?

23 Cost function: how well the line fits our (training) data

24 Cost function: how well the line fits our (training) data

25 Cost function

26 ML goal: minimize the cost function, $\operatorname{argmin}_{W,b} \mathrm{cost}(W, b)$
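To make the goal concrete, here is a minimal NumPy sketch (not from the slides, using made-up toy data) of the mean-squared-error cost for the linear hypothesis H(x) = Wx + b:

```python
import numpy as np

def cost(W, b, x, y):
    """Mean squared error of the linear hypothesis H(x) = W*x + b."""
    return np.mean((W * x + b - y) ** 2)

# Toy training data (hypothetical, for illustration only).
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

print(cost(1.0, 0.0, x, y))  # 0.0: the line y = x fits these points exactly
print(cost(2.0, 0.0, x, y))  # larger cost for a worse (W, b)
```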

27 Hypothesis and Cost

28 Simplified hypothesis

29 What does cost(W) look like?

30 What does cost(W) look like?

31 What does cost(W) look like?

32 How to minimize the cost

33 Gradient descent algorithm

34 Gradient descent algorithm
How does it work?
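A minimal sketch of the idea, for the simplified hypothesis H(x) = Wx with the same toy data as above (the learning rate value is illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

W = 5.0        # arbitrary starting point
alpha = 0.1    # learning rate (illustrative value)

for step in range(100):
    # Gradient of cost(W) = mean((W*x - y)^2) with respect to W.
    gradient = np.mean(2 * (W * x - y) * x)
    W -= alpha * gradient  # step downhill along the slope

print(W)  # converges toward W = 1, the minimum of this convex cost
```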

35 Cost function: formal definition

36 Cost function: formal definition

37 Cost function: convex function

38 Cost function: convex function

39 Multi-variable linear regression

40 Predicting exam score: regression using two inputs (x1, x2)

41 Hypothesis

42 Cost function

43 Matrix notation

44 Matrix notation Hypothesis without b
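One common way to write the hypothesis "without b", sketched here with assumed toy shapes: append a constant-1 column to X and fold the bias into W.

```python
import numpy as np

# Three training examples with two features each (made-up numbers).
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.0]])
W = np.array([[0.5], [1.5]])   # one weight per feature
b = 0.3

H = X @ W + b                  # hypothesis with an explicit bias

# Same hypothesis without a separate b: add a constant-1 feature
# column to X and absorb b into the weight vector.
X1 = np.hstack([X, np.ones((3, 1))])
W1 = np.vstack([W, [[b]]])
assert np.allclose(X1 @ W1, H)
```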

45 Logistic regression

46 Classification

47 Classification

48 Classification

49 Classification using linear regression
(figure: regression output with a 0.5 decision threshold)

50 Classification using linear regression
(figure: regression output with a 0.5 decision threshold)

51 Classification using linear regression
The linear regression hypothesis H(x) = Wx + b can produce values greater than 1 or less than 0, so its output needs to be scaled into the range 0 to 1.

52 Logistic regression: uses the sigmoid function (or logistic function)

53 Logistic hypothesis
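A minimal sketch of the logistic hypothesis, assuming the sigmoid-of-a-linear-score form from the previous slide:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_hypothesis(X, W, b):
    """H(X) = sigmoid(XW + b): a linear score squashed into (0, 1),
    interpretable as the probability of the positive class."""
    return sigmoid(X @ W + b)

X = np.array([[1.0, 2.0]])       # one example, two features (made up)
W = np.array([0.5, -0.25])
print(logistic_hypothesis(X, W, 0.0))  # value strictly between 0 and 1
```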

54 Cost function

55 New cost function for logistic regression

56 New cost function for logistic regression
When y = 1: predicting H(x) = 1 gives cost = 0, while predicting H(x) = 0 gives cost = ∞. When y = 0: predicting H(x) = 0 gives cost = 0, while predicting H(x) = 1 gives cost = ∞.

57 New cost function for logistic regression
$C(H(x), y) = -y \log H(x) - (1 - y) \log(1 - H(x))$
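A direct NumPy translation of this cost; the small epsilon is my addition, guarding against log(0):

```python
import numpy as np

def logistic_cost(h, y):
    """C(H(x), y) = -y*log(H) - (1-y)*log(1-H), averaged over examples."""
    eps = 1e-12  # numerical guard, not part of the formula
    return np.mean(-y * np.log(h + eps) - (1 - y) * np.log(1 - h + eps))

# Matching the previous slide: confident correct predictions cost ~0,
# confident wrong ones blow up toward infinity.
print(logistic_cost(np.array([0.99]), np.array([1.0])))  # ~0.01
print(logistic_cost(np.array([0.01]), np.array([1.0])))  # ~4.6
```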

58 Gradient descent algorithm

59 Multinomial classification

60 Logistic regression: $H_L(X) = WX$; set $z = H_L(X)$ and choose $g(z)$ so that its output lies between 0 and 1; the hypothesis is then $H_R(X) = g(H_L(X))$.

61 Multinomial classification

62 Multinomial classification
(diagram: three independent binary classifiers, each mapping X through a weight matrix W to a score z, producing $\hat{Y}$ between 0 and 1)

63 Multinomial classification
(diagram: one classifier per class A, B, and C, producing scores $\hat{Y}_A$, $\hat{Y}_B$, $\hat{Y}_C$)

64 Multinomial classification
Combined into a single matrix product:
$\begin{bmatrix} \hat{y}_A \\ \hat{y}_B \\ \hat{y}_C \end{bmatrix} = \begin{bmatrix} w_{A1} & w_{A2} & w_{A3} \\ w_{B1} & w_{B2} & w_{B3} \\ w_{C1} & w_{C2} & w_{C3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

65 Softmax function: $S(Y)$ turns the raw scores $Y$ into probabilities. For example, it maps $Y = [2.0, 1.0, 0.1]^T$ to $S(Y) = [0.7, 0.2, 0.1]^T$.
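A minimal softmax sketch; note that the exact probabilities for these scores come out close to, not exactly, the rounded values on the slide:

```python
import numpy as np

def softmax(y):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    e = np.exp(y - np.max(y))  # shift by the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))
# -> approximately [0.66, 0.24, 0.10], the slide's [0.7, 0.2, 0.1] rounded
```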

66 Cost function: cross-entropy
With softmax output $S(Y) = [0.7, 0.2, 0.1]^T$ and one-hot label $L = Y = [1, 0, 0]^T$ (the ground truth), the cost is $D(S, L) = -\sum_i L_i \log(S_i)$.

67 Cross-entropy cost function
$D(S, L) = \sum_i L_i \cdot (-\log S_i)$. For example, take the ground truth $Y = L = [0, 1]^T$.
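The same cross-entropy in NumPy, checking that the correct one-hot label yields a small cost and a wrong one a larger cost (the epsilon guard is my addition):

```python
import numpy as np

def cross_entropy(S, L):
    """D(S, L) = -sum_i L_i * log(S_i) for a one-hot label L."""
    return -np.sum(L * np.log(S + 1e-12))

S = np.array([0.7, 0.2, 0.1])                 # softmax output
print(cross_entropy(S, np.array([1, 0, 0])))  # correct label: ~0.36
print(cross_entropy(S, np.array([0, 1, 0])))  # wrong label: ~1.61
```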

68 Logistic cost vs. Cross entropy cost
$C(H(x), y) = -y \log H(x) - (1 - y) \log(1 - H(x))$ versus $D(S, L) = -\sum_i L_i \log(S_i)$: with two classes, the logistic cost is exactly the cross-entropy cost.

69 Final cost (loss) function
$\mathrm{Loss}(W) = \frac{1}{N} \sum_i D(S_i, L_i)$, averaged over the training data

70 Gradient descent: $w_i \leftarrow w_i - \alpha \, \frac{\partial \mathrm{Loss}(W)}{\partial w_i}$

71 Learning rate

72 Large learning rate: overshooting

73 Small learning rate: takes too long

74 Optimal learning rates?
Observe the cost function and check that it goes down at a reasonable rate.
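A toy way to observe this, reusing the made-up linear-regression data from earlier (the specific alpha values are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

# Watch the cost under two learning rates.
for alpha in (0.3, 0.001):
    W = 5.0
    print(f"alpha = {alpha}")
    for step in range(5):
        print(f"  step {step}: cost = {np.mean((W * x - y) ** 2):.4f}")
        W -= alpha * np.mean(2 * (W * x - y) * x)
# alpha = 0.3:   the cost grows every step (overshooting).
# alpha = 0.001: the cost barely moves (takes too long).
```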

75 Data preprocessing

76 Data preprocessing for gradient descent
(figure: cost contours over the weights w1 and w2)

77 Data preprocessing for gradient descent
(figure: cost contours over the weights w1 and w2)

78 Data preprocessing for gradient descent

79 Standardization
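Standardization here presumably means the usual per-feature scaling $x' = (x - \mu) / \sigma$; a minimal sketch with made-up numbers:

```python
import numpy as np

def standardize(X):
    """Scale each feature (column) to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Features on wildly different scales (made-up numbers).
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
print(standardize(X))  # both columns now on a comparable scale
```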

80 Overfitting

81 Overfitting
The ML model performs well only on the training data set (memorization); it does poorly on the test set or in real use.

82 Overfitting

83 Solutions for overfitting
More training data; reduce the number of features; regularization

84 Regularization: try not to let the weights grow too large
(that is, avoid cases where particular $w_i$ become large)

85 Regularization
$\mathrm{Loss}(W) = \frac{1}{N} \sum_i D(S(WX_i + b), L_i) + \lambda \sum_i w_i^2$, where $\lambda$ is the regularization strength, in the range 0 to 1.
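A minimal sketch of adding this L2 penalty to a data loss (the function name and toy numbers are mine):

```python
import numpy as np

def regularized_loss(data_loss, W, lam):
    """Total loss = data term + lam * sum(w_i^2), an L2 penalty.
    `lam` is the regularization strength lambda from the slide."""
    return data_loss + lam * np.sum(W ** 2)

W = np.array([0.5, -3.0, 8.0])        # one large weight dominates the penalty
print(regularized_loss(1.2, W, 0.1))  # 1.2 + 0.1 * (0.25 + 9 + 64) = 8.525
```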

