Machine Learning & Deep Learning

AI vs. Human

The new emergence of AI technology: Artificial Intelligence, Machine Learning, Deep Learning

Course plan: lectures, presentations, evaluation. Lectures: Machine Learning & Deep Learning basics and the Deep Learning Book — machine learning basic concepts, linear regression, logistic regression (classification), multivariable (vector) linear/logistic regression, neural networks. Presentations: Deep Learning Book. Evaluation: midterm exam 45%, final exam 45%, attendance 10%.

Course textbook (for presentations): http://www.deeplearningbook.org/ Authors: Ian Goodfellow, Yoshua Bengio and Aaron Courville

Machine Learning Basics

Basic concepts What is ML? What is learning? Supervised learning and unsupervised learning. What is regression? What is classification?

Supervised/Unsupervised learning Supervised learning is learning with labeled examples (a training set)

Supervised/Unsupervised learning

Supervised Learning

Supervised Learning Training data set

Supervised Learning AlphaGo

Types of Supervised Learning

Predicting final exam score: regression

Predicting pass/non-pass: binary classification

Predicting grades (A, B, …): multi-class classification

Linear Regression

Predicting exam score: regression

Regression data

Linear Hypothesis

Linear Hypothesis

Which hypothesis is better?

Cost function How well the line fits our (training) data

Cost function How well the line fits our (training) data

Cost function

ML Goal To minimize the cost function: $\operatorname{argmin}_{W,b} \; \mathrm{cost}(W, b)$
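Below is a minimal Python sketch (not from the slides) of the cost being minimized: the mean squared error of the linear hypothesis H(x) = Wx + b. The training values are made up purely for illustration.

```python
import numpy as np

# Hypothetical 1-D training set (input -> score); illustrative values only.
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([2.0, 4.0, 6.0, 8.0])

def cost(W, b, x, y):
    """Mean squared error between the hypothesis H(x) = W*x + b and the labels y."""
    predictions = W * x + b
    return np.mean((predictions - y) ** 2)

print(cost(2.0, 0.0, x_train, y_train))  # 0.0: this (W, b) fits the toy data exactly
print(cost(1.0, 0.0, x_train, y_train))  # larger cost for a worse fit
```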

Hypothesis and Cost

Simplified hypothesis

What cost(W) looks like?

What cost(W) looks like?

What cost(W) looks like?

How to minimize the cost

Gradient descent algorithm

Gradient descent algorithm How does it work?
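A minimal sketch of the idea, assuming the same toy 1-D data and MSE cost as above: start from some (W, b) and repeatedly step in the direction that decreases the cost.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, steps=1000):
    """Minimize the MSE cost of H(x) = W*x + b by stepping down the gradient."""
    W, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        error = (W * x + b) - y
        grad_W = (2.0 / n) * np.sum(error * x)   # d cost / d W
        grad_b = (2.0 / n) * np.sum(error)       # d cost / d b
        W -= alpha * grad_W                      # move against the gradient
        b -= alpha * grad_b
    return W, b

x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([2.0, 4.0, 6.0, 8.0])
print(gradient_descent(x_train, y_train))       # approaches W = 2, b = 0
```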

Cost function: formal definition

Cost function: formal definition

Cost function: convex function

Cost function: convex function

Multi-variable linear regression

Predicting exam score: regression using two inputs (x1, x2)

Hypothesis

Cost function

Matrix notation

Matrix notation Hypothesis without the bias term b
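A short sketch of the matrix form, with a hypothetical design matrix X (one row per example, one column per feature) and a weight column vector W, so all predictions are computed at once.

```python
import numpy as np

# Hypothetical data: each row is one example with two input features (x1, x2).
X = np.array([[73.0, 80.0],
              [93.0, 88.0],
              [89.0, 91.0]])
W = np.array([[0.5],
              [0.5]])      # one weight per feature, as a column vector

H = X @ W                  # matrix-notation hypothesis: H(X) = XW, no bias term
print(H)                   # one prediction per row of X, shape (3, 1)
```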

Logistic regression

Classification

Classification

Classification

Classification using linear regression (threshold at 0.5)

Classification using linear regression (threshold at 0.5)

Classification using linear regression The linear regression hypothesis H(x) = Wx + b can be greater than 1 or less than 0, so it needs to be scaled to values between 0 and 1

Logistic regression Uses the sigmoid function (or logistic function)

Logistic hypothesis
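A minimal sketch of the logistic hypothesis, assuming a scalar weight and input for simplicity: the sigmoid squashes the linear score into the (0, 1) range so it can be read as a probability.

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_hypothesis(W, b, x):
    """H(x) = sigmoid(W*x + b), interpretable as P(y = 1 | x)."""
    return sigmoid(W * x + b)

print(logistic_hypothesis(1.0, 0.0, np.array([-5.0, 0.0, 5.0])))
# approximately [0.007, 0.5, 0.993]
```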

Cost function

New cost function for logistic regression

New cost function for logistic regression When y=1: predicting H(x)=1 gives cost = 0, predicting H(x)=0 gives cost = ∞. When y=0: predicting H(x)=0 gives cost = 0, predicting H(x)=1 gives cost = ∞.

New cost function for logistic regression $C(H(x), y) = -y \log(H(x)) - (1-y)\log(1 - H(x))$
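A small sketch of this cost, averaged over a batch; the predictions and labels below are made-up values for illustration, and the clipping constant is only there to avoid log(0).

```python
import numpy as np

def logistic_cost(h, y):
    """C(H(x), y) = -y*log(H(x)) - (1-y)*log(1-H(x)), averaged over the examples."""
    eps = 1e-12
    h = np.clip(h, eps, 1 - eps)        # keep log() well defined
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

h = np.array([0.9, 0.1, 0.8])           # hypothetical predictions H(x)
y = np.array([1.0, 0.0, 1.0])           # true labels
print(logistic_cost(h, y))              # small, because predictions match the labels
```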

Gradient descent algorithm

Multinomial classification

Logistic regression $H_L(X) = WX$; with $z = H_L(X)$, the function $g(z)$ maps $z$ to a value between 0 and 1, so $H_R(X) = g(H_L(X))$ produces $\hat{Y}$ in the range 0 to 1

Multinomial classification

Multinomial classification One binary logistic unit per class: each computes $z = WX$ and produces $\hat{Y}$ between 0 and 1

Multinomial classification Separate classifiers for classes A, B, and C, producing $\hat{Y}_A$, $\hat{Y}_B$, and $\hat{Y}_C$

Multinomial classification The three classifiers combine into a single weight matrix: $\begin{pmatrix} w_{A1} & w_{A2} & w_{A3} \\ w_{B1} & w_{B2} & w_{B3} \\ w_{C1} & w_{C2} & w_{C3} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} \hat{Y}_A \\ \hat{Y}_B \\ \hat{Y}_C \end{pmatrix}$

Softmax function The scores $Y = WX$ are converted to probabilities: e.g. $Y = (2.0, 1.0, 0.1)$ becomes $S(Y) = (0.7, 0.2, 0.1)$
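A minimal sketch of the softmax function, reproducing the slide's example scores; subtracting the maximum score is a standard numerical-stability trick, not something the slide specifies.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    exps = np.exp(scores - np.max(scores))   # shift by the max for numerical stability
    return exps / np.sum(exps)

print(softmax(np.array([2.0, 1.0, 0.1])))
# roughly [0.66, 0.24, 0.10], matching the slide's (0.7, 0.2, 0.1) example
```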

Cost function (cross-entropy) With prediction $S(Y) = (0.7, 0.2, 0.1)$ and true label $L = Y_{\text{true}} = (1, 0, 0)$: $D(S, L) = -\sum_i L_i \log(S_i)$

Cross-entropy cost function $D(S, L) = \sum_i L_i \cdot (-\log(S_i))$. For example, the true value is $Y = L = [0\ 1]^{T}$.

Logistic cost vs. cross-entropy cost $C(H(x), y) = -y \log(H(x)) - (1-y)\log(1 - H(x))$ vs. $D(S, L) = -\sum_i L_i \log(S_i)$ — the logistic cost is the two-class special case of cross-entropy

Final cost (loss) function over the training data: $Loss(W) = \frac{1}{N}\sum_i D(S_i, L_i)$
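A short sketch of this loss, assuming a hypothetical batch of two examples with one-hot labels (values invented for illustration).

```python
import numpy as np

def cross_entropy(S, L):
    """D(S, L) = -sum_i L_i * log(S_i) for one example."""
    return -np.sum(L * np.log(S + 1e-12))

def total_loss(S_batch, L_batch):
    """Loss(W) = (1/N) * sum over the training data of D(S_i, L_i)."""
    return np.mean([cross_entropy(S, L) for S, L in zip(S_batch, L_batch)])

S_batch = np.array([[0.7, 0.2, 0.1],     # predicted class probabilities
                    [0.1, 0.8, 0.1]])
L_batch = np.array([[1.0, 0.0, 0.0],     # one-hot true labels
                    [0.0, 1.0, 0.0]])
print(total_loss(S_batch, L_batch))      # average cross-entropy over the two examples
```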

Gradient descent $w_i \leftarrow w_i - \alpha \dfrac{\partial Loss(W)}{\partial w_i}$

Learning rate

Large learning rate: overshooting

Small learning rate: takes too long

Optimal learning rate? Observe the cost function and check that it decreases at a reasonable rate
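A toy sketch of that advice: run a few learning rates on a simple convex cost (a made-up quadratic, not the slides' model) and watch whether the cost diverges, barely moves, or decreases nicely.

```python
def loss(W):
    """Toy convex cost, minimized at W = 2."""
    return (W - 2.0) ** 2

def gradient_step(W, alpha):
    """One gradient-descent update for the toy cost above."""
    return W - alpha * 2.0 * (W - 2.0)

for alpha in (1.5, 0.5, 0.001):
    W = 0.0
    for _ in range(50):
        W = gradient_step(W, alpha)
    print(alpha, loss(W))
# alpha = 1.5: the cost grows (overshooting); alpha = 0.001: it barely decreases;
# alpha = 0.5: it converges quickly.
```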

Data preprocessing

Data preprocessing for gradient descent [contour plot of the cost over weights w1 and w2]

Data preprocessing for gradient descent [contour plot of the cost over weights w1 and w2]

Data preprocessing for gradient descent

Standardization
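A minimal sketch of standardization: each feature column is shifted to mean 0 and scaled to standard deviation 1 so that features with very different ranges (the values below are hypothetical) contribute comparably to gradient descent.

```python
import numpy as np

# Hypothetical feature matrix with very different scales per column.
X = np.array([[828.0, 0.001],
              [823.0, 0.002],
              [819.0, 0.004]])

# Standardization: subtract each column's mean and divide by its standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0))   # ~[0, 0]
print(X_std.std(axis=0))    # ~[1, 1]
```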

Overfitting

Overfitting The ML model performs very well only on the training data set (memorization) and poorly on the test set or in real use

Overfitting

Solutions for overfitting: more training data, reduce the number of features, regularization

Regularization Try not to let the weights become too large (the case where a particular $w_i$ grows large)

Regularization 𝐿𝑜𝑠𝑠 𝑊 = 1 𝑁 𝑖 𝐷(𝑆(𝑊𝑋 𝑖 +𝑏), 𝐿 𝑖 )+ 𝑤 𝑖 2 𝐿𝑜𝑠𝑠 𝑊 = 1 𝑁 𝑖 𝐷(𝑆(𝑊𝑋 𝑖 +𝑏), 𝐿 𝑖 )+ 𝑤 𝑖 2 : regularization strength 범위: 0 ~ 1