
1 Machine Learning – Classification David Fenyő
Contact:

2 Supervised Learning: Classification

3 Generative or Discriminative Algorithms
Generative algorithm: learns the probability of the data given the hypothesis, $p(D|H)$, and the prior probability of the hypothesis, $p(H)$; computes the probability of the hypothesis given the data, $p(H|D)$, using Bayes rule, and derives the decision boundary from $p(H|D)$. In general, a lot of data is needed to estimate the conditional probabilities.
Discriminative algorithm: learns the probability of the hypothesis given the data, $p(H|D)$, or the decision boundary directly.

4 Generative or Discriminative Algorithms
“One should solve the classification problem directly and never solve a more general problem as an intermediate step.” – Vapnik, Statistical Learning Theory, John Wiley & Sons, 1998
Nguyen et al., “Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space”

5 Probability: Bayes Rule
Multiplication rule: $P(A \cap B) = P(A|B)\,P(B) = P(B|A)\,P(A)$
Bayes rule: $P(A|B) = P(B|A)\,P(A)/P(B)$
For a hypothesis $H$ and data $D$:
$P(H|D) = P(D|H)\,P(H)/P(D)$
where $P(H|D)$ is the posterior probability, $P(H)$ the prior probability, and $P(D|H)$ the likelihood.

6 Bayes Rule: More Data
$P(H|D) = P(D|H)\,P(H)/P(D)$ (posterior = likelihood × prior / evidence)
Applying Bayes rule as data points arrive (assuming the observations are independent), each posterior becomes the prior for the next update:
$P(H|D_1) = P(D_1|H)\,P(H)/P(D_1)$
$P(H|D_1,D_2) = P(D_2|H)\,P(H|D_1)/P(D_2)$
$P(H|D_1,D_2,D_3) = P(D_3|H)\,P(H|D_1,D_2)/P(D_3)$
$P(H|D_1,\ldots,D_n) = P(H)\,\prod_{k=1}^{n} \frac{P(D_k|H)}{P(D_k)}$
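A minimal numerical sketch of this sequential updating (the coin-flip scenario and all numbers are invented for illustration):

```python
# Is a coin fair (H) or biased toward heads (not H)?
p_h = 0.5                      # prior P(H): coin is fair
p_heads_h = 0.5                # P(heads | H)
p_heads_not_h = 0.8            # P(heads | not H)

flips = [1, 1, 0, 1, 1]        # observed data: 1 = heads, 0 = tails

for d in flips:
    like_h = p_heads_h if d == 1 else 1 - p_heads_h
    like_not_h = p_heads_not_h if d == 1 else 1 - p_heads_not_h
    p_d = like_h * p_h + like_not_h * (1 - p_h)   # evidence P(D_k)
    p_h = like_h * p_h / p_d                      # posterior becomes the new prior
    print(f"after observing {d}: P(H | data so far) = {p_h:.3f}")
```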

7 Bayes Optimal Classifier
Assigns each observation to the most likely class, given its predictor values. This requires knowing the conditional probabilities; they can be estimated from data, but a lot of training data is needed.

8 Estimating Conditional Probabilities
[Figure: distributions of Label 0 and Label 1, and the estimated probability of Label 1 as a function of the feature value.]

9 Naïve Bayes Classifier
Assumption: the features are independent given the class. This reduces the amount of data needed to estimate the conditional probabilities.
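As an illustrative sketch, scikit-learn's GaussianNB applies this independence assumption with Gaussian class-conditional densities (the toy data below is invented):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Two Gaussian classes in 2D; each feature is modeled independently per class
X0 = rng.normal(loc=0.0, scale=1.0, size=(100, 2))   # label 0
X1 = rng.normal(loc=2.0, scale=1.0, size=(100, 2))   # label 1
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

clf = GaussianNB().fit(X, y)
print(clf.predict([[1.0, 1.0]]))        # predicted label
print(clf.predict_proba([[1.0, 1.0]]))  # estimated P(label | x)
```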

10 The Perceptron – A Simple Linear Classifier
Linear regression: $y = \mathbf{x} \cdot \mathbf{w} + \epsilon$, where $\mathbf{x} = (1, x_1, x_2, \ldots, x_k)$ and $\mathbf{w} = (w_0, w_1, w_2, \ldots, w_k)$.
Perceptron: $y = 0$ if $\mathbf{x} \cdot \mathbf{w} < 0$, and $y = 1$ if $\mathbf{x} \cdot \mathbf{w} > 0$.

11 The Perceptron – A Simple Linear Classifier
Linear regression: $y = w_1 x_1 + w_0 + \epsilon$.
Perceptron: $y = 0$ if $w_1 x_1 + w_0 < 0$, and $y = 1$ if $w_1 x_1 + w_0 > 0$.

12 The Perceptron Learning Algorithm
The weight vector $\mathbf{w}$ is initialized randomly.
Repeat until there are no misclassifications:
- Select a data point at random.
- If it is misclassified, update $\mathbf{w} \leftarrow \mathbf{w} - \mathbf{x}\,\mathrm{sign}(\mathbf{x} \cdot \mathbf{w})$.
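A minimal NumPy sketch of this algorithm, assuming a leading column of ones in X for the bias and 0/1 labels (the function name and defaults are ours):

```python
import numpy as np

def perceptron(X, y, max_epochs=1000, seed=0):
    """Perceptron learning as on the slide. X already contains a leading
    column of ones for the bias; y holds 0/1 labels."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])           # random initialization
    for _ in range(max_epochs):
        preds = (X @ w > 0).astype(int)
        wrong = np.flatnonzero(preds != y)
        if wrong.size == 0:                   # stop when nothing is misclassified
            break
        i = rng.choice(wrong)                 # a randomly selected misclassified point
        w = w - X[i] * np.sign(X[i] @ w)      # the slide's update rule
    return w
```

If the classes are linearly separable, the loop terminates with a separating weight vector; otherwise it stops after max_epochs.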

13 The Perceptron Learning Algorithm

14 The Perceptron Learning Algorithm

15 Nearest Neighbors K = 1

16 Nearest Neighbors
[Figure: decision boundaries for K = 8, K = 4, K = 2, and K = 1.]
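For comparing the effect of K, a short sketch using scikit-learn's KNeighborsClassifier on invented two-class Gaussian data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X0 = rng.normal(0.0, 1.0, size=(50, 2))   # label 0
X1 = rng.normal(2.0, 1.0, size=(50, 2))   # label 1
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

# Larger K gives smoother decision boundaries, as in the panels above
for k in [1, 2, 4, 8]:
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, clf.score(X, y))   # training accuracy; K = 1 memorizes the training set
```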

17 Logistic Regression
Linear regression: $y = w_1 x_1 + w_0 + \epsilon$.
Logistic regression: $y = \sigma(w_1 x_1 + w_0 + \epsilon)$, where $\sigma(t) = \frac{1}{1 + e^{-t}}$.
[Figure: logistic curves for $w_1 = 1$ and $w_1 = 10$.]

18 Logistic Regression
Linear regression: $y = \mathbf{x} \cdot \mathbf{w} + \epsilon$, where $\mathbf{x} = (1, x_1, x_2, \ldots, x_k)$ and $\mathbf{w} = (w_0, w_1, w_2, \ldots, w_k)$.
Logistic regression: $y = \sigma(\mathbf{x} \cdot \mathbf{w} + \epsilon)$, where $\sigma(t) = \frac{1}{1 + e^{-t}}$.
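In code, the model itself is only a few lines (a sketch; the function names are ours):

```python
import numpy as np

def sigmoid(t):
    """Logistic function: sigma(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def predict_proba(X, w):
    """P(y = 1 | x) = sigma(x . w) for each row of X (leading 1 column for bias)."""
    return sigmoid(X @ w)
```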

19 Logistic Regression

20 Sum of Squared Errors as Loss Function
[Figure: sum-of-squared-errors surface over $w_0$ and $w_1$.]

21 Sum of Squared Errors as Loss Function
[Figure: sum-of-squared-errors surface over $w_0$ and $w_1$.]

22 Sum of Squared Errors as Loss Function
[Figure: sum-of-squared-errors surface over $w_0$ and $w_1$.]

23 Logistic Regression – Loss Function
$L(\mathbf{w}) = \log\Big(\prod_{i=1}^{n} \sigma(\mathbf{x}_i \cdot \mathbf{w})^{y_i}\,\big(1 - \sigma(\mathbf{x}_i \cdot \mathbf{w})\big)^{1 - y_i}\Big) = \sum_{i=1}^{n} \big[\,y_i \log \sigma(\mathbf{x}_i \cdot \mathbf{w}) + (1 - y_i) \log\big(1 - \sigma(\mathbf{x}_i \cdot \mathbf{w})\big)\big]$
where $\sigma(t) = \frac{1}{1 + e^{-t}}$. Maximizing this log-likelihood is equivalent to minimizing its negative, the cross-entropy loss.
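A direct translation of this log-likelihood into NumPy (a sketch; the clipping guard against log(0) is our addition):

```python
import numpy as np

def log_likelihood(w, X, y, eps=1e-12):
    """L(w) = sum_i [ y_i log sigma(x_i.w) + (1 - y_i) log(1 - sigma(x_i.w)) ]."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))     # sigma(x_i . w) for all i
    p = np.clip(p, eps, 1 - eps)           # avoid log(0)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```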

24 Logistic Regression – Error Landscape
[Figure: error landscape over $w_0$ and $w_1$.]

25 Logistic Regression – Error Landscape
[Figure: error landscape over $w_0$ and $w_1$.]

26 Logistic Regression – Error Landscape
[Figure: error landscape over $w_0$ and $w_1$.]

27 Logistic Regression – Error Landscape
[Figure: error landscape over $w_0$ and $w_1$.]

28 Logistic Regression – Error Landscape
[Figure: error landscape over $w_0$ and $w_1$.]

29 Gradient Descent
$\min_{\mathbf{w}} L(\mathbf{w})$
$\mathbf{w}_{n+1} = \mathbf{w}_n - \eta\,\nabla L(\mathbf{w}_n)$
When the gradient is not available analytically, it can be approximated with finite differences:
$w_{n+1} = w_n - \eta\,\frac{L(w_n + \Delta w) - L(w_n)}{\Delta w}$ (forward difference)
$w_{n+1} = w_n - \eta\,\frac{L(w_n + \Delta w) - L(w_n - \Delta w)}{2\,\Delta w}$ (central difference)
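A sketch combining the update rule with the central-difference gradient from the slide, applied to the logistic regression loss (function names, step counts, and step sizes are ours):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def neg_log_likelihood(w, X, y, eps=1e-12):
    """Negative log-likelihood of logistic regression (the loss to minimize)."""
    p = np.clip(sigmoid(X @ w), eps, 1 - eps)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def numerical_gradient(L, w, dw=1e-6):
    """Central difference: dL/dw_j ~ (L(w + dw e_j) - L(w - dw e_j)) / (2 dw)."""
    g = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = dw
        g[j] = (L(w + e) - L(w - e)) / (2 * dw)
    return g

def gradient_descent(X, y, eta=0.1, n_steps=2000):
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        g = numerical_gradient(lambda v: neg_log_likelihood(v, X, y), w)
        w = w - eta * g / len(y)          # w_{n+1} = w_n - eta * grad L(w_n)
    return w
```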

30 Logistic Regression – Gradient Descent
[Figure: gradient descent trajectories on the error landscape over $w_0$ and $w_1$.]
Hyperparameters: learning rate, learning rate schedule, gradient memory.
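One common form of gradient memory is momentum; reading the slide's term as momentum is our assumption. A toy sketch on a quadratic loss $L(w) = w^2$:

```python
# Gradient descent with momentum: keep a running memory of past gradients.
w, v = 5.0, 0.0
eta, beta = 0.1, 0.9        # learning rate and memory coefficient
for step in range(100):
    grad = 2 * w            # dL/dw for L(w) = w^2
    v = beta * v + grad     # running memory of past gradients
    w = w - eta * v
print(w)                    # approaches the minimum at w = 0
```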

31 Estimating Conditional Probabilities
[Figure: distributions of Label 0 and Label 1, and the estimated probability of Label 1 as a function of the feature value.]

32 Logistic Regression and Fraction
[Figure: fraction of Label 1 on the sample, probability of Label 1 from the distribution, and their difference.]

33 Evaluation of Binary Classification Models
            Predicted 0      Predicted 1
Actual 0    True Negative    False Positive
Actual 1    False Negative   True Positive

True Positive Rate / Sensitivity / Recall = TP/(TP+FN): fraction of label 1 predicted to be label 1
False Positive Rate = FP/(FP+TN): fraction of label 0 predicted to be label 1
Accuracy = (TP+TN)/total: fraction of correct predictions
Precision = TP/(TP+FP): fraction of correct predictions among positive predictions
False Discovery Rate = 1 − Precision
Specificity = TN/(TN+FP): fraction of correct predictions among label 0
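A small helper that computes these quantities from the confusion-matrix counts (the counts in the example call are invented):

```python
def binary_metrics(tp, fp, tn, fn):
    """Evaluation metrics from the confusion matrix above."""
    return {
        "recall / sensitivity / TPR": tp / (tp + fn),
        "false positive rate": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "false discovery rate": fp / (tp + fp),   # equals 1 - precision
        "specificity": tn / (tn + fp),
    }

print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```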

34 Evaluation of Binary Classification Models
[Figure: overlapping distributions of Label 0 and Label 1 with classification thresholds, indicating the resulting true positives and false positives.]

35 Example: Species Identification
Teubl et al., Manuscript in preparation

36 Example: Detection of Transposon Insertions
Tang et al. “Human transposon insertion profiling: Analysis, visualization and identification of somatic LINE-1 insertions in ovarian cancer”, PNAS 2017;114:E733-E740

37 Example: Detection of Transposon Insertions
Tang et al. “Human transposon insertion profiling: Analysis, visualization and identification of somatic LINE-1 insertions in ovarian cancer”, PNAS 2017;114:E733-E740

38 Example: Detection of Transposon Insertions
Tang et al. “Human transposon insertion profiling: Analysis, visualization and identification of somatic LINE-1 insertions in ovarian cancer”, PNAS 2017;114:E733-E740

39 Choosing Hyperparameters
[Figure: the data set is split into a training set and a test set.]

40 Cross-Validation: Choosing Hyperparameters
The data set is split into a training set and a test set, and the training set is further divided into folds. Each fold serves once as the validation set while the remaining folds are used for training:
Training 1 / Validation 1
Training 2 / Validation 2
Training 3 / Validation 3
Training 4 / Validation 4
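A minimal sketch of generating such folds with NumPy (the function name and fold count are ours):

```python
import numpy as np

def k_fold_indices(n, k=4, seed=0):
    """Split n sample indices into k folds; each fold is the validation
    set once while the remaining folds form the training set."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# Usage: pick the hyperparameter with the best average validation score,
# then evaluate once on the held-out test set.
for train, val in k_fold_indices(20, k=4):
    print(len(train), len(val))
```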

41 Homework
Learn the nomenclature for evaluating binary classifiers (precision, recall, false positive rate, etc.).
Compare logistic regression and k-nearest neighbors on data drawn from different distributions, with different variances and sample sizes.


