29 August 2013 Venkat Naïve Bayesian on CDF Pair Scores
Outline: Naïve Bayesian overview; adapting Naïve Bayesian to CDF pairscores; comparison with logistic regression; comparison with the voting scheme.
Bayesian Framework
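For reference, the framework rests on Bayes' rule. With $N$ classes and feature vector $X$, the posterior and the maximum a posteriori decision are:

```latex
P(Y = c \mid X) = \frac{P(X \mid Y = c)\, P(Y = c)}{\sum_{c'=1}^{N} P(X \mid Y = c')\, P(Y = c')},
\qquad
\hat{y}(X) = \arg\max_{c} \; P(X \mid Y = c)\, P(Y = c)
```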
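Unbiased learning of $P(X \mid Y)$ (no independence assumption), with $N$ classes and $d$ features each taking $K$ values, requires $O(NK^d)$ samples for reasonable parameter estimation. This is impractical for most values of $d$.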
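Naïve Bayes Assumption. Let $X = (X_1, \dots, X_d)$. The Naïve Bayes assumption is class-conditional independence of the features: $P(X_1, \dots, X_d \mid Y) = \prod_{i=1}^{d} P(X_i \mid Y)$. This requires only $O(NKd)$ samples, i.e. $O(NK)$ per feature, linear rather than exponential in $d$.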
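The gap between the two regimes can be made concrete by counting free likelihood parameters for discrete features. This sketch uses the standard counts $N(K^d - 1)$ for an unrestricted $P(X \mid Y)$ and $Nd(K - 1)$ under the Naïve Bayes assumption, and assumes each pairscore is discretised into $K = 10$ bins, a hypothetical choice since the slides do not fix $K$. The function names are illustrative.

```python
def full_joint_params(n_classes: int, n_values: int, n_features: int) -> int:
    """Free likelihood parameters of an unrestricted P(X | Y):
    one categorical distribution over all K**d joint feature values per class."""
    return n_classes * (n_values ** n_features - 1)


def naive_bayes_params(n_classes: int, n_values: int, n_features: int) -> int:
    """Free likelihood parameters under class-conditional independence:
    one categorical distribution over K values per feature per class."""
    return n_classes * n_features * (n_values - 1)


if __name__ == "__main__":
    N, d = 50, 1225  # classes and pairscores, as on the slides
    K = 10           # hypothetical: each pairscore discretised into 10 bins
    full = full_joint_params(N, K, d)
    nb = naive_bayes_params(N, K, d)
    # Python ints are arbitrary precision, so the huge count is exact.
    print(f"full joint : about 10^{len(str(full)) - 1} parameters")
    print(f"naive Bayes: {nb} parameters")
```

With $N = 50$ and $d = 1225$ the Naïve Bayes count is 551250, while the unrestricted count has more than a thousand digits.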
Gaussian Naïve Bayes. What parameters must be estimated? $N$ class priors $P(Y = c)$ and $Nd$ likelihood functions $P(X_i \mid Y = c)$, each a univariate Gaussian with its own mean and variance.
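A minimal NumPy sketch of GNB that estimates exactly these quantities: $N$ priors from class frequencies and an $N \times d$ grid of Gaussian means and variances. The class name and the small variance floor are our own choices, not part of the slides.

```python
import numpy as np


class GaussianNaiveBayes:
    """Minimal GNB: N class priors plus N*d univariate Gaussians (mean, variance)."""

    def __init__(self, var_floor: float = 1e-9):
        self.var_floor = var_floor  # keeps every variance strictly positive

    def fit(self, X: np.ndarray, y: np.ndarray) -> "GaussianNaiveBayes":
        self.classes_ = np.unique(y)
        # N priors: relative class frequencies
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        # N x d means and variances, one Gaussian per (class, feature)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + self.var_floor
        return self

    def joint_log_likelihood(self, X: np.ndarray) -> np.ndarray:
        # log P(y=c) + sum_i log N(x_i; mu_ci, var_ci), shape (n_samples, N)
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_gauss = -0.5 * (np.log(2 * np.pi * self.var_)[None, :, :]
                            + diff ** 2 / self.var_[None, :, :])
        return self.log_prior_[None, :] + log_gauss.sum(axis=2)

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.classes_[np.argmax(self.joint_log_likelihood(X), axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (100, 3)), rng.normal(3.0, 1.0, (100, 3))])
    y = np.array([0] * 100 + [1] * 100)
    model = GaussianNaiveBayes().fit(X, y)
    print("training accuracy:", (model.predict(X) == y).mean())
```

Working in log space avoids underflow when the per-feature likelihoods are multiplied over many features.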
Naïve Bayes on CDF Pairscores. Applying GNB directly to the CDF pairscores is guaranteed to give poor results. We must exploit knowledge of which features are irrelevant conditioned on a class. For instance, conditioned on class 7, the score of, say, the class-36-vs-class-9 model is irrelevant.
Naïve Bayes on CDF Pairscores. We have $N = 50$ classes and $d = 1225$ pairscores, one per pair of classes ($50 \cdot 49 / 2 = 1225$).
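The pair bookkeeping can be sketched as follows, assuming (hypothetically) 0-based class labels and score columns ordered as unordered pairs $(a, b)$ with $a < b$. Conditioned on a class $c$, only the $N - 1 = 49$ columns whose model involves $c$ are relevant; the class-36-vs-class-9 model, stored here as (9, 36), is not among them for class 7. The names `PAIRS` and `relevant_columns` are illustrative.

```python
from itertools import combinations

N_CLASSES = 50

# One pairwise model per unordered class pair (a, b) with a < b,
# in a fixed order so that column j of the pairscore matrix is PAIRS[j].
PAIRS = list(combinations(range(N_CLASSES), 2))


def relevant_columns(c: int) -> list[int]:
    """Columns whose pairwise model involves class c; all other columns
    are treated as irrelevant when conditioning on y = c."""
    return [j for j, (a, b) in enumerate(PAIRS) if c in (a, b)]


if __name__ == "__main__":
    print(len(PAIRS))                          # 1225 = 50 * 49 / 2
    print(len(relevant_columns(7)))            # 49 = N - 1
    print((36, 9) in PAIRS, (9, 36) in PAIRS)  # False True
```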
Likelihood Distributions [figure: plots of $P(s(c,c') \mid y=c)$ and $P(s(c',c) \mid y=c)$]
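One way the adapted classifier could be implemented is sketched below. It rests on our own assumptions, not the slides': one univariate Gaussian per class and relevant pairscore, with the irrelevant pairscores simply dropped from each class's sum. Because a separate Gaussian is fitted for every (class, column), the two orientations $s(c,c')$ and $s(c',c)$ automatically get their own likelihoods. Every class uses exactly $N - 1$ scores, so the per-class sums have equally many terms. Function names are hypothetical, and every class is assumed to appear in the training labels.

```python
import numpy as np
from itertools import combinations


def fit_relevant_gnb(S, y, n_classes, var_floor=1e-6):
    """For each class c, fit a Gaussian to every pairscore column whose model involves c.

    S: (n_samples, d) pairscores; column j holds the score of model pairs[j] = (a, b), a < b.
    y: integer labels in 0..n_classes-1 (every class assumed present).
    """
    pairs = list(combinations(range(n_classes), 2))
    log_prior = np.log(np.bincount(y, minlength=n_classes) / len(y))
    params = []
    for c in range(n_classes):
        cols = np.array([j for j, (a, b) in enumerate(pairs) if c in (a, b)])
        Sc = S[y == c][:, cols]
        # Column (c, c') models P(s(c,c') | y=c); column (c', c) models P(s(c',c) | y=c).
        params.append((cols, Sc.mean(axis=0), Sc.var(axis=0) + var_floor))
    return pairs, log_prior, params


def predict_relevant_gnb(S, log_prior, params):
    """Score class c by log P(y=c) plus the Gaussian log-likelihoods of its N-1 relevant scores."""
    scores = np.empty((S.shape[0], len(params)))
    for c, (cols, mu, var) in enumerate(params):
        x = S[:, cols]
        ll = -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        scores[:, c] = log_prior[c] + ll.sum(axis=1)
    return scores.argmax(axis=1)
```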
Results (2nd level): Naïve Bayesian 57.18%, Voting 59.01%, Logistic Regression 57.51%. So which scheme is best overall?
Naïve Bayes vs Logistic Regression. GNB (generative) and LR (discriminative) converge to essentially the same classifier when the Naïve Bayes assumptions hold. However, LR approaches its asymptotic accuracy more slowly than GNB. This is because LR needs a number of training samples linear in $d$ for good parameter estimates, whereas GNB needs only a number logarithmic in $d$, i.e. exponentially fewer.
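Why the two model the same classifier can be checked numerically in the binary case, assuming (as the standard derivation does) that each feature's variance is shared by both classes. Under that assumption the GNB posterior has exactly the logistic form $P(Y=1 \mid x) = 1/(1 + \exp(w_0 + w^\top x))$ with $w_i = (\mu_{i0} - \mu_{i1})/\sigma_i^2$ and $w_0 = \ln\frac{1-\pi}{\pi} + \sum_i \frac{\mu_{i1}^2 - \mu_{i0}^2}{2\sigma_i^2}$, where $\pi = P(Y=1)$. The function names are illustrative.

```python
import numpy as np


def gnb_posterior(x, mu0, mu1, var, prior1):
    """P(Y=1 | x) straight from Bayes' rule, with Gaussian class conditionals
    N(mu_k, var) per feature (variance shared by both classes)."""
    def log_lik(mu):
        return np.sum(-0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var))
    a1 = np.log(prior1) + log_lik(mu1)
    a0 = np.log(1 - prior1) + log_lik(mu0)
    return 1.0 / (1.0 + np.exp(a0 - a1))


def gnb_as_logistic(mu0, mu1, var, prior1):
    """Weights (w0, w) such that P(Y=1 | x) = 1 / (1 + exp(w0 + w @ x))."""
    w = (mu0 - mu1) / var
    w0 = np.log((1 - prior1) / prior1) + np.sum((mu1 ** 2 - mu0 ** 2) / (2 * var))
    return w0, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu0, mu1 = rng.normal(size=3), rng.normal(size=3)
    var = rng.uniform(0.5, 2.0, size=3)
    w0, w = gnb_as_logistic(mu0, mu1, var, prior1=0.3)
    x = rng.normal(size=3)
    print(gnb_posterior(x, mu0, mu1, var, 0.3), 1.0 / (1.0 + np.exp(w0 + w @ x)))
```

The two printed values agree. LR fits $w_0, w$ directly, while GNB reaches them through the $2d + 1$ generative parameters, which is where their sample requirements differ.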
Naïve Bayes vs Logistic Regression [figures: panels titled LOGISTIC REGRESSION and NAÏVE BAYESIAN]
Naïve Bayes vs Logistic Regression. When training data is scarce, theory predicts that GNB outperforms LR. Moreover, if LR outperforms GNB only marginally, GNB should still be chosen because of its lower variance.
Naïve Bayes vs Voting Scheme. Naïve Bayes is equivalent to a weighted voting scheme. The unweighted voting scheme gives every pairwise model one equal vote, ignoring the scores and their scales. The binary structure of the unweighted scheme has ill-defined bias-variance properties; one can argue that it just happens to work well in this case.
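Naïve Bayes decides by summing one real-valued log-likelihood term per pairwise model, which is what makes it a weighted vote. The toy sketch below contrasts that additive structure with the unweighted scheme. The log-confidence weights are a hypothetical stand-in for fitted Naïve Bayes log-likelihoods, the reading of a score as "confidence that the first class is correct" is an assumption, and the three scores are illustrative values, not data from the slides.

```python
import math


def tally(pair_scores, n_classes, vote):
    """Sum per-pair contributions into per-class totals.

    pair_scores maps (a, b), a < b, to a score in (0, 1), read (hypothetically)
    as the a-vs-b model's confidence that a is the true class.
    vote(s) returns the contributions (to a, to b) of one pairwise model.
    """
    totals = [0.0] * n_classes
    for (a, b), s in pair_scores.items():
        va, vb = vote(s)
        totals[a] += va
        totals[b] += vb
    return totals


def unweighted_vote(s):
    # One vote to the winner, whatever the margin.
    return (1.0, 0.0) if s > 0.5 else (0.0, 1.0)


def log_score_vote(s):
    # Hypothetical real-valued weights: the log confidence assigned to each side.
    return (math.log(s), math.log(1.0 - s))


if __name__ == "__main__":
    scores = {(0, 1): 0.55, (0, 2): 0.55, (1, 2): 0.99}
    for name, vote in [("unweighted", unweighted_vote), ("weighted", log_score_vote)]:
        totals = tally(scores, 3, vote)
        print(name, max(range(3), key=totals.__getitem__))
```

Here the unweighted scheme picks class 0 (two narrow wins), while the weighted scheme picks class 1, whose single confident win and narrow loss outweigh class 0's two narrow wins.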