1
Midterm Exam: 03/01 (Thursday), take home; turn in by noon on 03/02 (Friday)
2
Project
03/15 (Phase 1): 10% of the training data is available for algorithm development
04/05 (Phase 2): full training data and test examples are available
04/18 (submission): submit your predictions before 11:59 pm on Apr. 18 (Wednesday)
04/24 and 04/26: project presentations; competition results announced
04/30: project report is due
3
Logistic Regression Rong Jin
4
Logistic Regression
Generative models often lead to a linear decision boundary.
Linear discriminative model: directly model the linear decision boundary; w is the parameter to be learned.
5
Logistic Regression
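The formula on this slide is not preserved in the transcript; the standard logistic model it presumably shows (my reconstruction) is

    p(y | x; w) = 1 / (1 + exp(-y w^T x)),   y in {-1, +1},

which gives the linear decision boundary w^T x = 0.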
6
Learn the parameter w by Maximum Likelihood Estimation (MLE), given the training data.
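The corresponding log-likelihood (written out here as a sketch, following the ±1 label convention above) is

    l(w) = sum_{i=1}^n log p(y_i | x_i; w) = - sum_{i=1}^n log(1 + exp(-y_i w^T x_i)),

which is maximized over w; equivalently, the sum of the log(1 + exp(-y_i w^T x_i)) terms is minimized.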
7
Logistic Regression
The objective function is convex, so gradient descent reaches the global optimum. (Figure: classification error over the iterations.)
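A minimal NumPy sketch of the gradient-descent update for this objective (not the original course code; the step size eta and iteration count are illustrative assumptions):

import numpy as np

def gradient_descent_logreg(X, y, eta=0.1, n_iters=1000):
    """Gradient descent on the negative log-likelihood of logistic
    regression with labels y in {-1, +1}."""
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(n_iters):
        margins = y * (X @ w)                          # y_i * w^T x_i
        grad = X.T @ (-y / (1.0 + np.exp(margins)))    # gradient of sum_i log(1 + exp(-margin_i))
        w -= eta * grad
    return w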
9
Illustration of Gradient Descent
10
How to Decide the Step Size? Backtracking line search.
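A sketch of backtracking line search for choosing the step size (the Armijo constants alpha and beta below are common defaults, not values from the slides):

import numpy as np

def backtracking_step(f, grad_f, w, alpha=0.3, beta=0.8):
    """Return a step size t satisfying the sufficient-decrease condition
    f(w - t*g) <= f(w) - alpha * t * ||g||^2 for the descent direction -g."""
    g = grad_f(w)
    t = 1.0
    while f(w - t * g) > f(w) - alpha * t * np.dot(g, g):
        t *= beta    # shrink the step until sufficient decrease holds
    return t

At each gradient-descent iteration one would call t = backtracking_step(f, grad_f, w) and then update w with that step size instead of a fixed eta.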
11
Example: Heart Disease
Input feature x: age group ID (1: 25-29, 2: 30-34, 3: 35-39, 4: 40-44, 5: 45-49, 6: 50-54, 7: 55-59, 8: 60-64)
Output y: whether the patient has heart disease (y = 1: heart disease, y = -1: no heart disease)
12
Example: Heart Disease
13
Example: Text Categorization
Learn to classify text into two categories.
Input d: a document, represented by a word histogram.
Output y: +1 for a political document, -1 for a non-political document.
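Purely for illustration (the vocabulary, documents, and labels named here are hypothetical, not from the slides), a document can be turned into a word-count histogram and passed to the gradient-descent sketch above:

from collections import Counter
import numpy as np

def word_histogram(document, vocabulary):
    """Represent a document as a vector of word counts over a fixed vocabulary."""
    counts = Counter(document.lower().split())
    return np.array([counts[word] for word in vocabulary], dtype=float)

# Hypothetical usage, assuming vocabulary, documents, and labels exist:
# X = np.vstack([word_histogram(d, vocabulary) for d in documents])
# y = np.array(labels)                # +1 political, -1 non-political
# w = gradient_descent_logreg(X, y)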
14
Example: Text Categorization Training data
15
Example 2: Text Classification
Dataset: Reuters-21578
Classification accuracy: Naïve Bayes 77%, logistic regression 88%
16
Logistic Regression vs. Naïve Bayes
Both produce linear decision boundaries.
Naïve Bayes: the weights follow from the estimated class-conditional probabilities; logistic regression: the weights are learned by MLE.
Both can be viewed as modeling p(d|y): Naïve Bayes makes a conditional independence assumption, while logistic regression assumes an exponential-family distribution for p(d|y) (a broader assumption).
17
Logistic Regression vs. Naïve Bayes
18
Discriminative vs. Generative
Discriminative models: model P(y|x). Pros: usually good performance. Cons: slow convergence, expensive computation, sensitive to noisy data.
Generative models: model P(x|y). Pros: usually fast convergence, cheap computation, robust to noisy data. Cons: usually worse performance.
19
Overfitting Problem
Consider text categorization: what is the weight for a word j that appears in only one training document d_k?
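To see the problem (my reasoning, not preserved slide text): if word j occurs only in document d_k, then only one term of the log-likelihood depends on w_j, and its derivative

    d l / d w_j = y_k x_{k,j} / (1 + exp(y_k w^T x_k))

never vanishes for finite w, so unregularized MLE drives w_j toward +infinity or -infinity and the model memorizes that single document.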
20
Overfitting Problem
21
Overfitting Problem (Figure: classification accuracy on test data over the iterations, with and without regularization; without regularization, the test accuracy decreases as training continues.)
22
Solution: Regularization
Regularized log-likelihood (see the sketch below).
Effects of the regularizer: it favors small weights, guarantees a bounded norm of w, and guarantees a unique solution.
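The regularized log-likelihood is not reproduced in the transcript; the standard L2 form (the squared-norm penalty is my assumption) is

    max_w  - sum_{i=1}^n log(1 + exp(-y_i w^T x_i)) - (lambda / 2) ||w||_2^2,

where a larger lambda pushes the weights toward zero.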
23
Regularized Logistic Regression (Figure: classification performance over the iterations, with and without regularization.)
24
Regularization as Robust Optimization
Assume each data point is not known exactly but is only known to lie within a sphere of radius r centered at x_i.
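A sketch of the connection (a standard argument, stated here from memory rather than from the slide): since the logistic loss decreases in the margin, the worst-case perturbation within the sphere gives

    max_{||delta|| <= r} log(1 + exp(-y_i w^T (x_i + delta))) = log(1 + exp(-(y_i w^T x_i - r ||w||_2))),

so minimizing the worst-case loss over bounded perturbations behaves like adding a norm penalty on w: robustness to bounded input noise and regularization coincide.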
25
Sparse Solution by Lasso Regularization
RCV1 collection: 800K documents, 47K unique words
26
Sparse Solution by Lasso Regularization
How do we solve the optimization problem? Subgradient descent; a minimax formulation.
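A minimal subgradient-descent sketch for the L1-regularized objective (lam, eta, and the iteration count are illustrative assumptions; this is not the original course code):

import numpy as np

def lasso_logreg_subgradient(X, y, lam=0.1, eta=0.01, n_iters=1000):
    """Subgradient descent on sum_i log(1 + exp(-y_i w^T x_i)) + lam * ||w||_1."""
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(n_iters):
        margins = y * (X @ w)
        grad_loss = X.T @ (-y / (1.0 + np.exp(margins)))
        w -= eta * (grad_loss + lam * np.sign(w))   # np.sign(0) = 0 is a valid subgradient choice
    return w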
27
Bayesian Treatment
Compute the posterior distribution of w; use the Laplace approximation.
28
Bayesian Treatment
Laplace approximation
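The Laplace approximation (standard form, stated here as a sketch) replaces the posterior with a Gaussian centered at the MAP estimate:

    p(w | D) ≈ N(w; w_MAP, H^{-1}),   where   w_MAP = argmax_w [ log p(D | w) + log p(w) ]   and   H = - Hessian of [ log p(D | w) + log p(w) ] evaluated at w_MAP.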
29
Multi-class Logistic Regression
How do we extend the logistic regression model to multi-class classification?
30
Conditional Exponential Model
Let the classes be 1, ..., K; we need to learn a weight vector for each class. The normalization factor (partition function) makes the probabilities sum to one (see the formula below).
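The model presumably has the standard softmax form (my reconstruction, since the slide's formula is not preserved):

    p(y = k | x) = exp(w_k^T x) / sum_{k'=1}^K exp(w_{k'}^T x),

where the denominator is the normalization factor (partition function); for K = 2 this reduces to binary logistic regression with w = w_1 - w_2.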
31
Conditional Exponential Model
Learn the weights w_1, ..., w_K by maximum likelihood estimation. Any problem?
32
Modified Conditional Exponential Model
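The content of this slide is not preserved; one standard issue with the model above (my assumption about what "Any problem?" points to) is that the weights are only identified up to a common shift: adding the same vector to every w_k leaves p(y | x) unchanged, so the MLE is not unique. A common modification fixes one class as a reference, e.g. w_K = 0, giving

    p(y = k | x) = exp(w_k^T x) / (1 + sum_{k'=1}^{K-1} exp(w_{k'}^T x)).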