29 August 2013 Venkat Naïve Bayesian on CDF Pair Scores.


Outline
Naïve Bayesian Overview
Adapting Naïve Bayesian to CDF Pairscores
Comparisons with Logistic Regression
Comparisons with the Voting Scheme

Bayesian Framework

Unbiased learning requires O(N·K^d) samples for reasonable parameter estimation, which is impractical for most values of d.

Naïve Bayes Assumption The Naïve Bayes assumption is class-conditional independence of the features: P(x_1, …, x_d | y) = ∏_i P(x_i | y). Under this assumption, reasonable parameter estimation requires only O(N·K) samples.
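The gap between the two sample/parameter counts above can be made concrete. A sketch with illustrative numbers of my own choosing (K = 2 binary features, d = 10), not figures from the slides:

```python
# Parameter counts for a discrete classifier with N classes and
# d features, each taking one of K values (illustrative numbers).
N, K, d = 50, 2, 10

# Full joint model: one distribution over all K^d feature
# combinations per class -> O(N * K^d) free parameters.
full_joint = N * (K ** d - 1)

# Naive Bayes: d independent per-feature distributions per class
# -> O(N * K * d) free parameters.
naive_bayes = N * d * (K - 1)

print(full_joint)   # 51150
print(naive_bayes)  # 500
```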

Gaussian Naïve Bayes What are the parameters to be estimated? N priors, and N·d likelihood functions (one Gaussian mean and variance per class/feature pair).
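All of these parameters are closed-form sample statistics. A minimal sketch of the GNB fit and the resulting log-posterior (function names and toy data are my own, not from the talk):

```python
import numpy as np

def fit_gnb(X, y):
    """Estimate GNB parameters: one prior per class, plus one
    (mean, variance) Gaussian per class/feature pair -- i.e. N
    priors and N*d likelihood functions."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    means = {c: X[y == c].mean(axis=0) for c in classes}
    vars_ = {c: X[y == c].var(axis=0) + 1e-9 for c in classes}  # variance floor
    return priors, means, vars_

def log_posterior(x, priors, means, vars_):
    """Unnormalized log P(y=c | x) under class-conditional independence."""
    scores = {}
    for c in priors:
        ll = -0.5 * np.sum(np.log(2 * np.pi * vars_[c])
                           + (x - means[c]) ** 2 / vars_[c])
        scores[c] = np.log(priors[c]) + ll
    return scores

# Toy usage: two classes with means near (0, 0) and (3, 3).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
priors, means, vars_ = fit_gnb(X, y)
scores = log_posterior(np.array([3.0, 3.0]), priors, means, vars_)
print(max(scores, key=scores.get))  # the class whose mean is nearest (3, 3)
```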

Naïve Bayes on CDF Pairscores Direct application of GNB to CDF pairscores is guaranteed to give poor results. We must make use of the fact that some features are irrelevant conditioned on a class: for instance, conditioned on class 7, the score of the class-36-vs-class-9 model is irrelevant.
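One way to encode this relevance structure is a per-class feature mask: conditioned on y = c, keep only the N-1 pairscores whose pairwise model involves class c. A sketch (the pair-indexing convention here is my assumption):

```python
from itertools import combinations

N = 50  # number of classes
pairs = list(combinations(range(N), 2))  # one pairwise model per class pair

def relevant_features(c):
    """Indices of the pairscores that involve class c -- the only
    features whose class-conditional likelihood is modeled for y = c."""
    return [i for i, (a, b) in enumerate(pairs) if c in (a, b)]

print(len(pairs))                 # 1225 pairscores in total
print(len(relevant_features(7)))  # 49 relevant features given class 7
```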

Naïve Bayes on CDF Pairscores We have N = 50 classes and d = 1225 pairscores (one per pair of classes: 50 choose 2 = 1225).

Naïve Bayes on CDF Pairscores

Likelihood Distributions

The class-conditional score distributions P(s(c,c') | y=c) and P(s(c',c) | y=c)

Results (2nd level)
Naïve Bayesian: 57.18%
Voting: 59.01%
Logistic Regression: 57.51%
So, which is the overall best scheme?

Naïve Bayes vs Logistic Regression GNB (generative) and LR (discriminative) model essentially the same classifier when the Naïve Bayes assumptions hold. However, LR converges to its asymptotic accuracy more slowly than GNB, because LR requires exponentially more samples than GNB (on the order of d versus log d examples) for good parameter estimates.
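This convergence gap can be illustrated with a small numpy-only simulation of my own construction (not from the talk): with only a handful of training points, GNB's closed-form statistics are already usable, while LR is still estimating d separate weights from the same few examples.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_half = 20, 5  # 10 training examples total, 20 Gaussian features

def make_data(n_half):
    # Two classes with per-feature means 0 and 1, unit variance.
    y = np.repeat([0, 1], n_half)
    X = rng.normal(0.0, 1.0, (2 * n_half, d)) + y[:, None]
    return X, y

Xtr, ytr = make_data(n_half)
Xte, yte = make_data(1000)

# GNB with a shared per-feature variance: closed-form sample statistics.
m0, m1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
var = Xtr.var(0) + 1e-3
# Log-likelihood ratio: positive when x is closer to the class-1 mean.
llr = (((Xte - m0) ** 2 - (Xte - m1) ** 2) / (2 * var)).sum(1)
gnb_acc = np.mean((llr > 0) == yte)

# LR: full-batch gradient descent on the same 10 points.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Xtr @ w + b)))
    g = p - ytr
    w -= 0.1 * Xtr.T @ g / len(ytr)
    b -= 0.1 * g.mean()
lr_acc = np.mean(((Xte @ w + b) > 0) == yte)

print(f"GNB: {gnb_acc:.3f}  LR: {lr_acc:.3f}")
```

With both classifiers trained on the same ten examples, the point is only the relative data-efficiency at small n; both converge to similar accuracy as n grows.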

Naïve Bayes vs Logistic Regression

LOGISTIC REGRESSION

Naïve Bayes vs Logistic Regression NAÏVE BAYESIAN

Naïve Bayes vs Logistic Regression When training data is scarce, GNB theoretically outperforms LR. Moreover, even if LR only marginally outperforms GNB, GNB should still be chosen due to its low-variance property.

Naïve Bayes vs Voting Scheme Naïve Bayes is equivalent to a weighted voting scheme. The unweighted voting scheme takes unbiased votes from the pairwise models, ignoring score magnitudes and scales. The binary structure of the unweighted scheme has ill-defined bias/variance properties; one can argue that it just happens to work well in this case.
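The contrast between the two schemes can be sketched with made-up pairscores (the numbers below are mine, not from the experiments): the unweighted scheme counts wins, while a Naïve-Bayes-style scheme sums log-odds, so confident scores weigh more and can overturn a narrow vote count.

```python
import numpy as np

# Toy setup: 3 classes; scores[(a, b)] in (0, 1) is read as the
# pairwise model's evidence that a beats b (made-up numbers).
scores = {(0, 1): 0.55, (0, 2): 0.55, (1, 2): 0.05}

def unweighted_vote(scores, n_classes=3):
    # Each pairwise model casts one full vote, ignoring score magnitude.
    votes = np.zeros(n_classes)
    for (a, b), s in scores.items():
        votes[a if s > 0.5 else b] += 1
    return int(np.argmax(votes))

def weighted_vote(scores, n_classes=3):
    # Naive-Bayes-style: sum log-odds per class, so strong pairwise
    # evidence outweighs a pile of marginal wins.
    logit = np.zeros(n_classes)
    for (a, b), s in scores.items():
        logit[a] += np.log(s / (1 - s))
        logit[b] += np.log((1 - s) / s)
    return int(np.argmax(logit))

# Class 0 wins two weak 0.55 votes; class 2 has one decisive 0.95 win.
print(unweighted_vote(scores))  # 0
print(weighted_vote(scores))    # 2
```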