COMP 328: Midterm Review
Spring 2010
Nevin L. Zhang
Department of Computer Science & Engineering
The Hong Kong University of Science & Technology
Can be used as a cheat sheet
Overview
• Algorithms for supervised learning
  ▪ Decision trees
  ▪ Naïve Bayes classifiers
  ▪ Neural networks
  ▪ Instance-based learning
  ▪ Support vector machines
• General issues regarding supervised learning
  ▪ Classification error and confidence interval
  ▪ Bias-variance tradeoff
  ▪ PAC learning theory
Supervised Learning
Decision Trees
Decision Trees
Reduced-Error Pruning
Decision Trees
• Issues with attributes
  ▪ Continuous attributes
  ▪ Attributes with many values: use GainRatio instead of Gain
  ▪ Missing values
• Tree construction is a search process
  ▪ Greedy search can end in a local optimum rather than the best tree
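Why GainRatio helps with many-valued attributes can be seen in a small sketch. The function names and the toy data below are illustrative assumptions, not taken from the slides; GainRatio is computed here as Gain divided by the entropy of the attribute's own value distribution (the split information).

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())

def gain_and_gain_ratio(values, labels):
    """Return (Gain, GainRatio) for one attribute.

    values[i] is the attribute value of example i, labels[i] is its class.
    """
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the attribute values themselves.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio

# Toy data (hypothetical): an ID-like attribute vs. a genuinely useful binary one.
labels = ["+", "+", "-", "-"]
id_attr = ["a", "b", "c", "d"]       # one distinct value per example
binary_attr = ["x", "x", "y", "y"]   # two values, perfectly predictive

id_gain, id_ratio = gain_and_gain_ratio(id_attr, labels)
bin_gain, bin_ratio = gain_and_gain_ratio(binary_attr, labels)
```

Both attributes reach the maximal Gain of 1 bit on this data, so Gain cannot tell them apart; GainRatio penalizes the ID-like attribute for its large split information and prefers the binary one.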
Naïve Bayes Classifier
• Can classify using this rule: $c^* = \arg\max_c P(c \mid a_1, \ldots, a_n)$
• But the joint distribution is too expensive to obtain
Naïve Bayes Classifier
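A sketch of how the naïve Bayes rule follows from Bayes' theorem. The notation is an assumption made here for illustration (class $c$, attribute values $a_1, \ldots, a_n$), not necessarily the symbols used on the original slide.

```latex
\begin{align*}
c^* &= \arg\max_{c} P(c \mid a_1, \ldots, a_n) \\
    &= \arg\max_{c} \frac{P(a_1, \ldots, a_n \mid c)\, P(c)}{P(a_1, \ldots, a_n)} \\
    &= \arg\max_{c} P(a_1, \ldots, a_n \mid c)\, P(c) \\
    &\approx \arg\max_{c} P(c) \prod_{i=1}^{n} P(a_i \mid c)
\end{align*}
```

The denominator drops out because it does not depend on $c$. The last step is the naïve (conditional independence) assumption: it replaces one table that grows exponentially in $n$ with $n$ small per-attribute tables.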
Learning Naïve Bayes Classifier
• Laplace smoothing
• Continuous attributes
• When the independence assumption does not hold, evidence is double counted
• Generalization: Bayesian networks
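A minimal sketch of Laplace (add-one) smoothing for one conditional probability. The function name, parameters, and toy data are assumptions for illustration; `num_values` is the number of distinct values the attribute can take.

```python
def smoothed_conditional(attr_values, labels, value, cls, num_values, alpha=1.0):
    """Estimate P(attribute = value | class = cls) with add-alpha smoothing.

    alpha = 1 gives Laplace smoothing; alpha = 0 gives the raw frequency.
    """
    class_count = sum(1 for y in labels if y == cls)
    joint_count = sum(
        1 for v, y in zip(attr_values, labels) if v == value and y == cls
    )
    return (joint_count + alpha) / (class_count + alpha * num_values)

# Toy data (hypothetical): "overcast" never occurs together with class "yes".
outlook = ["sunny", "sunny", "rain", "overcast", "rain"]
play = ["yes", "yes", "yes", "no", "no"]

raw = smoothed_conditional(outlook, play, "overcast", "yes", num_values=3, alpha=0.0)
smoothed = smoothed_conditional(outlook, play, "overcast", "yes", num_values=3)
```

Without smoothing the estimate is 0, which would zero out the whole product for class "yes"; with smoothing it becomes (0 + 1) / (3 + 3) = 1/6.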
Neural Networks
• For classification and regression
Neural Networks
• Activation function
  ▪ Step, sign
  ▪ Sigmoid, tanh (hyperbolic tangent)
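The four activation functions listed above, as a short sketch. The value each threshold function takes at exactly 0 is a convention chosen here, not something fixed by the slides.

```python
import math

def step(x):
    """Threshold unit with outputs in {0, 1} (1 at x = 0 by convention here)."""
    return 1.0 if x >= 0 else 0.0

def sign(x):
    """Threshold unit with outputs in {-1, +1} (+1 at x = 0 by convention here)."""
    return 1.0 if x >= 0 else -1.0

def sigmoid(x):
    """Logistic function: smooth, differentiable, range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: smooth, differentiable, range (-1, 1)."""
    return math.tanh(x)
```

Step and sign are not differentiable at the threshold, which is why gradient-based training uses sigmoid or tanh. The two smooth functions are rescaled versions of each other: tanh(x) = 2 * sigmoid(2x) - 1.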
Neural Network/Properties
• Perceptrons are linear classifiers
• A two-layer network with enough perceptron units can represent every Boolean function
• One hidden layer with enough sigmoid units can approximate any bounded continuous function well
Neural Network
• Perceptron training rule: converges only when the training data are linearly separable
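A sketch of the perceptron training rule, w ← w + η (t − o) x, showing the convergence behavior on a separable problem (AND) and a non-separable one (XOR). The function name, the learning rate, and the epoch cap are illustrative assumptions.

```python
def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Perceptron training rule on examples [(x, t)], with t in {-1, +1}.

    Returns (weights, converged); weights[0] is the bias weight.
    """
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in examples:
            xs = (1.0,) + tuple(x)  # prepend constant input for the bias
            o = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else -1
            if o != t:
                # Update only on misclassified examples.
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, xs)]
                mistakes += 1
        if mistakes == 0:
            return w, True   # a full pass with no errors: converged
    return w, False          # gave up after max_epochs

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

w_and, and_converged = train_perceptron(AND)
w_xor, xor_converged = train_perceptron(XOR)
```

AND is linearly separable, so training stops with an error-free pass. No line separates XOR, so every pass contains at least one mistake and the loop runs until the epoch cap.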
Neural Network
• Adaline learning: delta rule
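A sketch of the delta rule in its incremental form: the same update shape as the perceptron rule, but with the unthresholded linear output o = w · x, so each step moves down the gradient of the squared error. The function name, learning rate, epoch count, and data are illustrative assumptions.

```python
def train_adaline(examples, eta=0.05, epochs=500):
    """Incremental delta rule for a linear unit on examples [(x, t)].

    Returns the weight list; weights[0] is the bias weight.
    """
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, t in examples:
            xs = (1.0,) + tuple(x)  # prepend constant input for the bias
            o = sum(wi * xi for wi, xi in zip(w, xs))  # linear output, no threshold
            # Gradient step on the squared error (t - o)^2 / 2 for this example.
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, xs)]
    return w

# Toy data (hypothetical) generated from t = 1 + 2x.
data = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0), ((3.0,), 7.0)]
w = train_adaline(data)
```

Unlike the perceptron rule, the delta rule also approaches a minimum-error weight vector when the data are not linearly separable, provided the learning rate is small enough.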
Neural Network
Instance-Based Learning
• Lazy learning
  ▪ k-NN
  ▪ Distance-weighted k-NN (kernel regression)
  ▪ Locally weighted regression
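A sketch of plain k-NN classification and distance-weighted k-NN regression. Euclidean distance and inverse-square weights are assumptions chosen for illustration; other distances and kernels are equally valid. Requires Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Majority vote among the k training points nearest to query.

    train is a list of (point, label) pairs.
    """
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def weighted_knn_regress(train, query, k=3):
    """Average of the k nearest targets, each weighted by 1 / distance^2.

    train is a list of (point, target) pairs.
    """
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    num = den = 0.0
    for x, y in neighbors:
        d = math.dist(x, query)
        if d == 0:
            return y  # query coincides with a training point
        weight = 1.0 / d ** 2
        num += weight * y
        den += weight
    return num / den

# Toy data (hypothetical).
points = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
predicted_label = knn_classify(points, (0.5, 0.5), k=3)

curve = [((0.0,), 0.0), ((1.0,), 1.0), ((3.0,), 3.0)]
predicted_value = weighted_knn_regress(curve, (2.0,), k=2)
```

Both functions do no work at training time; all computation is deferred to the query, which is what makes the method lazy.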
Support Vector Machines
SVM
SVM
SVM
SVM
• Data not linearly separable
SVM
Nonlinear SVM
Impact of σ and C
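A sketch of how σ changes the similarity between two points, assuming σ is the width of a Gaussian (RBF) kernel K(x, z) = exp(−‖x − z‖² / (2σ²)). The slide content did not survive extraction, so this kernel form and the function name are assumptions.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian kernel between two points of equal dimension."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

x, z = (0.0, 0.0), (1.0, 1.0)
narrow = rbf_kernel(x, z, sigma=0.1)   # points look unrelated
wide = rbf_kernel(x, z, sigma=10.0)    # points look nearly identical
```

With a small σ each training point influences only its immediate neighborhood, so the decision boundary can wrap around individual points and overfit; with a large σ the boundary becomes smooth and may underfit. C plays a complementary role in the soft-margin formulation: a large C penalizes training errors heavily (narrower margin, closer fit), while a small C tolerates more violations in exchange for a wider margin.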
Classifier Evaluation
• Relationship between
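For the "classification error and confidence interval" topic listed in the overview, one standard way to relate the error measured on a test sample to the true error is a normal-approximation interval. The sketch below assumes n independently drawn test examples with n reasonably large (a common rule of thumb is n ≥ 30); the function name and the numbers are illustrative.

```python
import math

def error_confidence_interval(num_errors, n, z=1.96):
    """Approximate confidence interval for the true error rate.

    z = 1.96 corresponds to a 95% two-sided interval.
    """
    sample_error = num_errors / n
    half_width = z * math.sqrt(sample_error * (1.0 - sample_error) / n)
    return sample_error - half_width, sample_error + half_width

# Hypothetical result: 12 mistakes on 40 test examples.
low, high = error_confidence_interval(12, 40)
```

Here the sample error is 0.30 and the 95% interval is roughly 0.30 ± 0.14. The half-width shrinks in proportion to 1/√n, so halving it takes about four times as many test examples.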
Algorithm Evaluation/Model Selection
• Which learning algorithm to use?
• Given an algorithm, which model to use? (How many hidden units?)
Algorithm Evaluation/Model Selection
Bias-Variance Decomposition
Bias-Variance Tradeoff
• Also applies to classification problems
PAC Learning Theory
• Probably approximately correct (PAC)
• Relationship between
PAC Learning Theory
VC Dimension
Sample Complexity
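A sketch of one standard sample-complexity bound, assuming a finite hypothesis space H and a learner that outputs a hypothesis consistent with the training data: m ≥ (1/ε)(ln|H| + ln(1/δ)). The slide's own formula did not survive extraction, so this particular bound and the function name are assumptions.

```python
import math

def sample_complexity_finite(hypothesis_space_size, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite hypothesis
    space to have true error at most epsilon with probability at least 1 - delta.
    """
    bound = (math.log(hypothesis_space_size) + math.log(1.0 / delta)) / epsilon
    return math.ceil(bound)

# Conjunctions over 10 Boolean variables: each variable appears positively,
# negatively, or not at all, so |H| = 3^10.
m = sample_complexity_finite(3 ** 10, epsilon=0.1, delta=0.05)
```

For this hypothesis space 140 examples suffice. The bound grows only logarithmically in |H| and 1/δ but linearly in 1/ε; for infinite hypothesis spaces, ln|H| is replaced by a term based on the VC dimension.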