Machine Learning – Classification
David Fenyő
Contact:
Supervised Learning: Classification
Generative or Discriminative Algorithms
Generative algorithm: learns the probability of the data given the hypothesis, p(D|H), and the prior probability of the hypothesis, p(H), calculates the probability of the hypothesis given the data, p(H|D), using Bayes' rule, and derives the decision boundary from p(H|D). In general, a lot of data is needed to estimate the conditional probabilities.
Discriminative algorithm: learns the probability of the hypothesis given the data, p(H|D), or the decision boundary, directly.
Generative or Discriminative Algorithms
"One should solve the classification problem directly and never solve a more general problem as an intermediate step." – Vapnik, Statistical Learning Theory, John Wiley & Sons, 1998
Nguyen et al., "Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space"
Probability: Bayes' Rule
Multiplication rule: P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
Bayes' rule: P(A|B) = P(B|A)P(A)/P(B)
For a hypothesis H and data D: P(H|D) = P(D|H) P(H) / P(D), where P(H|D) is the posterior probability, P(D|H) is the likelihood, and P(H) is the prior probability.
Bayes' Rule: More Data
P(H|D) = P(D|H) P(H) / P(D), where P(H|D) is the posterior and P(H) is the prior.
Updating the posterior as data points D1, D2, D3, … arrive:
P(H|D1) = P(D1|H) P(H) / P(D1)
P(H|D1,D2) = P(D2|H) P(H|D1) / P(D2)
P(H|D1,D2,D3) = P(D3|H) P(H|D1,D2) / P(D3)
…
Assuming the data points are independent, $P(H \mid D_1, \dots, D_n) = P(H) \prod_{k=1}^{n} \frac{P(D_k \mid H)}{P(D_k)}$.
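A minimal sketch of this sequential update for a binary hypothesis, assuming the per-observation likelihoods P(D_k|H) and P(D_k|not H) are known; the numbers below are made up for illustration:

```python
def sequential_posterior(prior_h, likelihoods_h, likelihoods_not_h):
    """Update P(H) one observation at a time using Bayes' rule.

    likelihoods_h[k]     = P(D_k | H)
    likelihoods_not_h[k] = P(D_k | not H)
    Assumes the observations are conditionally independent given H.
    """
    posterior = prior_h
    for p_d_h, p_d_nh in zip(likelihoods_h, likelihoods_not_h):
        p_d = p_d_h * posterior + p_d_nh * (1 - posterior)  # P(D_k) by total probability
        posterior = p_d_h * posterior / p_d                  # Bayes' rule
    return posterior

# Illustrative numbers only: three observations that each favor H.
print(sequential_posterior(0.5, [0.8, 0.7, 0.9], [0.3, 0.4, 0.2]))
```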
Bayes Optimal Classifier
Assigns each observation to the most likely class given its predictor values. This requires knowing the conditional probabilities; they can be estimated from data, but a lot of training data is needed.
Estimating Conditional Probabilities
[Figures: distributions of Label 0 and Label 1, and the estimated probability of Label 1 as a function of the feature value.]
Naïve Bayes Classifier
Assumption: the features are independent given the class label. This reduces the amount of data needed to estimate the conditional probabilities.
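A minimal Gaussian naïve Bayes sketch for continuous features, fitting one Gaussian per feature and class; the function names are illustrative:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate the per-class prior and per-feature Gaussian parameters."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),        # prior P(class = c)
                     Xc.mean(axis=0),          # per-feature means
                     Xc.var(axis=0) + 1e-9)    # per-feature variances (small floor)
    return params

def predict_gaussian_nb(params, X):
    """Pick the class with the highest log posterior (up to a shared constant)."""
    scores = []
    for c, (prior, mu, var) in params.items():
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)
        scores.append((c, np.log(prior) + log_lik))
    classes, logps = zip(*scores)
    return np.array(classes)[np.argmax(np.vstack(logps), axis=0)]
```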
The Perceptron – A Simple Linear Classifier
Linear regression: $y = \mathbf{x} \cdot \mathbf{w} + \epsilon$, with $\mathbf{x} = (1, x_1, x_2, x_3, \dots, x_k)$ and $\mathbf{w} = (w_0, w_1, w_2, w_3, \dots, w_k)$.
Perceptron: $y = \begin{cases} 0 & \text{if } \mathbf{x} \cdot \mathbf{w} < 0 \\ 1 & \text{if } \mathbf{x} \cdot \mathbf{w} > 0 \end{cases}$
The Perceptron – A Simple Linear Classifier
Linear regression: $y = w_1 x_1 + w_0 + \epsilon$
Perceptron: $y = \begin{cases} 0 & \text{if } w_1 x_1 + w_0 < 0 \\ 1 & \text{if } w_1 x_1 + w_0 > 0 \end{cases}$
The Perceptron Learning Algorithm
The weight vector $\mathbf{w}$ is initialized randomly.
Repeat until there are no misclassifications:
  Select a data point at random.
  If it is misclassified, update $\mathbf{w} \leftarrow \mathbf{w} - \mathbf{x}\,\mathrm{sign}(\mathbf{x} \cdot \mathbf{w})$.
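A minimal sketch of this update rule, assuming 0/1 labels and a leading column of ones in X for the bias; for brevity it samples directly among the currently misclassified points, which only matters for efficiency:

```python
import numpy as np

def train_perceptron(X, y, max_epochs=1000, rng=np.random.default_rng(0)):
    """Perceptron learning: random picks, update only on misclassification.

    Converges (and stops early) only if the classes are linearly separable.
    """
    w = rng.normal(size=X.shape[1])           # random initialization
    for _ in range(max_epochs):
        preds = (X @ w > 0).astype(int)
        misclassified = np.flatnonzero(preds != y)
        if misclassified.size == 0:           # stop when nothing is misclassified
            return w
        i = rng.choice(misclassified)         # pick a misclassified point at random
        w = w - X[i] * np.sign(X[i] @ w)      # move the boundary toward correcting point i
    return w
```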
The Perceptron Learning Algorithm
Nearest Neighbors
[Figures: nearest-neighbor classification for K = 1, 2, 4, and 8.]
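A minimal k-nearest-neighbors sketch: each test point gets the majority label among its K closest training points; the names are illustrative:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=4):
    """Classify each test point by majority vote of its k nearest training points."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)       # Euclidean distance to all training points
        nearest = np.argsort(dists)[:k]                    # indices of the k closest points
        votes = np.bincount(y_train[nearest].astype(int))  # count labels among the neighbors
        preds.append(np.argmax(votes))                     # majority label
    return np.array(preds)
```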
Logistic Regression
Linear regression: $y = w_1 x_1 + w_0 + \epsilon$
Logistic regression: $y = \sigma(w_1 x_1 + w_0 + \epsilon)$, where $\sigma(t) = \frac{1}{1 + e^{-t}}$
[Figure: the logistic curve for $w_1 = 1$ and $w_1 = 10$.]
Logistic Regression
Linear regression: $y = \mathbf{x} \cdot \mathbf{w} + \epsilon$, with $\mathbf{x} = (1, x_1, x_2, x_3, \dots, x_k)$ and $\mathbf{w} = (w_0, w_1, w_2, w_3, \dots, w_k)$.
Logistic regression: $y = \sigma(\mathbf{x} \cdot \mathbf{w} + \epsilon)$, where $\sigma(t) = \frac{1}{1 + e^{-t}}$
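A minimal sketch of the logistic model's prediction, assuming X already carries a leading column of ones for the bias term:

```python
import numpy as np

def sigmoid(t):
    """Logistic function sigma(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def predict_proba(X, w):
    """Predicted probability of Label 1 for each row of X."""
    return sigmoid(X @ w)

def predict_label(X, w, threshold=0.5):
    """Hard 0/1 prediction by thresholding the probability."""
    return (predict_proba(X, w) >= threshold).astype(int)
```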
Logistic Regression
Sum of Square Errors as Loss Function
[Figures: the sum-of-squares error surface as a function of $w_0$ and $w_1$.]
Logistic Regression – Loss Function
$L(\mathbf{w}) = \log\!\left( \prod_{i=1}^{n} \sigma(\mathbf{x}_i \cdot \mathbf{w})^{y_i} \, \bigl(1 - \sigma(\mathbf{x}_i \cdot \mathbf{w})\bigr)^{1 - y_i} \right) = \sum_{i=1}^{n} \left[ y_i \log \sigma(\mathbf{x}_i \cdot \mathbf{w}) + (1 - y_i) \log\!\bigl(1 - \sigma(\mathbf{x}_i \cdot \mathbf{w})\bigr) \right]$, where $\sigma(t) = \frac{1}{1 + e^{-t}}$.
This is the log-likelihood of the labels; maximizing it is equivalent to minimizing its negative, the cross-entropy loss.
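A minimal sketch of this log-likelihood and the corresponding cross-entropy loss, with a small clip to avoid log(0):

```python
import numpy as np

def log_likelihood(w, X, y, eps=1e-12):
    """Sum over samples of y*log(sigma(x.w)) + (1-y)*log(1 - sigma(x.w))."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigma(x_i . w)
    p = np.clip(p, eps, 1 - eps)         # numerical safety near 0 and 1
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def cross_entropy_loss(w, X, y):
    """Negative log-likelihood; this is what gradient descent minimizes."""
    return -log_likelihood(w, X, y)
```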
Logistic Regression – Error Landscape
[Figures: the logistic-regression loss surface as a function of $w_0$ and $w_1$.]
Gradient Descent
Goal: $\min_{\mathbf{w}} L(\mathbf{w})$
Update rule: $\mathbf{w}_{n+1} = \mathbf{w}_n - \eta \nabla L(\mathbf{w}_n)$
The gradient can be approximated numerically with a forward difference, $w_{n+1} = w_n - \eta \, \frac{L(w_n + \Delta w) - L(w_n)}{\Delta w}$, or a central difference, $w_{n+1} = w_n - \eta \, \frac{L(w_n + \Delta w) - L(w_n - \Delta w)}{2\Delta w}$.
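A minimal gradient-descent sketch for the logistic-regression loss above, using the analytic gradient of the cross-entropy and assuming X includes a bias column of ones; the step size and iteration count are illustrative:

```python
import numpy as np

def gradient_descent_logistic(X, y, eta=0.1, n_steps=1000):
    """Minimize the cross-entropy loss with plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities sigma(x.w)
        grad = X.T @ (p - y) / len(y)         # mean gradient of the negative log-likelihood
        w = w - eta * grad                    # gradient-descent step: w_{n+1} = w_n - eta * grad
    return w
```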
Logistic Regression – Gradient Descent
[Figures: gradient-descent trajectories on the loss surface over $w_0$ and $w_1$.]
Hyperparameters: learning rate, learning rate schedule, gradient memory (momentum).
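A sketch of how those hyperparameters might enter the update; "gradient memory" is assumed here to mean a momentum term, and all names and values are illustrative:

```python
import numpy as np

def gd_with_momentum(grad_fn, w0, eta0=0.1, decay=0.01, momentum=0.9, n_steps=1000):
    """Gradient descent with a decaying learning rate and momentum (gradient memory)."""
    w = np.asarray(w0, dtype=float)
    velocity = np.zeros_like(w)
    for n in range(n_steps):
        eta = eta0 / (1.0 + decay * n)               # learning-rate schedule: slow decay
        velocity = momentum * velocity + grad_fn(w)  # keep a fraction of past gradients
        w = w - eta * velocity
    return w
```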
Estimating Conditional Probabilities
[Figures: distributions of Label 0 and Label 1, and the estimated probability of Label 1 as a function of the feature value.]
Logistic Regression and Fraction
[Figures: probability of Label 1 from the distribution, the fraction estimated on a sample, and the difference.]
Evaluation of Binary Classification Models

                  Predicted 0        Predicted 1
Actual 0          True Negative      False Positive
Actual 1          False Negative     True Positive

True Positive Rate / Sensitivity / Recall = TP/(TP+FN) – fraction of label 1 predicted to be label 1
False Positive Rate = FP/(FP+TN) – fraction of label 0 predicted to be label 1
Accuracy = (TP+TN)/total – fraction of correct predictions
Precision = TP/(TP+FP) – fraction of correct predictions among positive predictions
False Discovery Rate = 1 – Precision
Specificity = TN/(TN+FP) – fraction of correct predictions among label 0
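A minimal sketch computing these quantities from 0/1 label and prediction arrays:

```python
import numpy as np

def binary_classification_metrics(y_true, y_pred):
    """Confusion-matrix counts and the derived rates defined above."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)
    return {
        "recall / sensitivity / TPR": tp / (tp + fn),
        "false positive rate": fp / (fp + tn),
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "false discovery rate": 1 - precision,
        "specificity": tn / (tn + fp),
    }
```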
Evaluation of Binary Classification Models
[Figures: overlapping Label 0 and Label 1 distributions with the resulting true positives and false positives.]
Example: Species Identification
Teubl et al., Manuscript in preparation
Example: Detection of Transposon Insertions
Tang et al., "Human transposon insertion profiling: Analysis, visualization and identification of somatic LINE-1 insertions in ovarian cancer", PNAS 2017;114:E733-E740
Choosing Hyperparameters
[Figure: the data set is split into a training set and a test set.]
Cross-Validation: Choosing Hyperparameters
[Figure: the data set is split into a training set and a test set; the training set is further split into folds, each serving once as the validation set (Training 1 / Validation 1, …, Training 4 / Validation 4).]
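A minimal k-fold cross-validation sketch for scoring one hyperparameter setting; the fit/score interface is an assumed generic one, and 4 folds match the figure:

```python
import numpy as np

def cross_validate(fit_fn, score_fn, X, y, k=4, rng=np.random.default_rng(0)):
    """Average validation score over k folds of the training data.

    fit_fn(X_train, y_train) returns a fitted model;
    score_fn(model, X_val, y_val) returns a scalar score.
    """
    folds = np.array_split(rng.permutation(len(y)), k)  # shuffle, then split into k folds
    scores = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit_fn(X[train_idx], y[train_idx])
        scores.append(score_fn(model, X[val_idx], y[val_idx]))
    return np.mean(scores)
```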
Homework
Learn the nomenclature for evaluating binary classifiers (precision, recall, false positive rate, etc.).
Compare logistic regression and k-nearest neighbors on data from different distributions, variances, and sample sizes.