Published by Gertrude Cobb. Modified over 9 years ago.
1
Empirical Research Methods in Computer Science Lecture 7 November 30, 2005 Noah Smith
2
Using Data: Data → Model (estimation; regression; learning; training) → Action (classification; decision). Related fields: pattern classification, machine learning, statistical inference, ...
3
Probabilistic Models Let X and Y be random variables. (continuous, discrete, structured,...) Goal: predict Y from X. A model defines P(Y = y | X = x). 1. Where do models come from? 2. If we have a model, how do we use it?
4
Using a Model We want to classify a message, x, as spam or mail: y ∈ {spam, mail}. x → Model → P(spam | x), P(mail | x)
5
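Bayes’ Rule
$$P(y \mid x) = \frac{P(x \mid y)\, P(y)}{\sum_{y'} P(x \mid y')\, P(y')}$$
$P(y \mid x)$: what we said the model must define. $P(x \mid y)$: likelihood, one distribution over complex observations per $y$. $P(y)$: prior. The denominator normalizes into a distribution.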
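As a rough illustration of Bayes' rule for the spam/mail case, the sketch below multiplies a likelihood by a prior for each label and normalizes. The prior reuses the P(mail), P(spam) numbers from the "Forms of Models" slide; the likelihood values and the function name `posterior` are hypothetical, chosen only for this example.

```python
def posterior(prior, likelihood):
    """Bayes' rule: P(y | x) = P(x | y) P(y) / sum over y' of P(x | y') P(y')."""
    # Unnormalized score P(x | y) * P(y) for each label y.
    joint = {y: likelihood[y] * prior[y] for y in prior}
    # P(x), the normalizer, is the sum of those scores over all labels.
    evidence = sum(joint.values())
    return {y: joint[y] / evidence for y in joint}


prior = {"mail": 0.545, "spam": 0.455}
# Assumed values of P(x | y) for one particular message x (not from the slides).
likelihood = {"mail": 0.001, "spam": 0.02}

post = posterior(prior, likelihood)
print(post)
```

With these assumed numbers the message is far more likely under the spam distribution, so the posterior favors spam even though the prior slightly favors mail.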
6
Naive Bayes Models Suppose $X = (X_1, X_2, X_3, \ldots, X_m)$. Let
$$P(X = x \mid Y = y) = \prod_{i=1}^{m} P(X_i = x_i \mid Y = y)$$
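A minimal sketch of how a Naive Bayes model could be used to classify, assuming each X_i is conditionally independent of the others given Y. The two binary features and all conditional probabilities below are made up for illustration; only the prior comes from the slides. Log probabilities are used to avoid underflow when m is large.

```python
import math


def log_joint(x, y, log_prior, log_cond):
    """log P(y) + sum_i log P(X_i = x_i | y), the Naive Bayes factorization."""
    return log_prior[y] + sum(log_cond[y][i][xi] for i, xi in enumerate(x))


def classify(x, log_prior, log_cond):
    """Return the label with the highest joint score (equivalently, posterior)."""
    return max(log_prior, key=lambda y: log_joint(x, y, log_prior, log_cond))


# Hypothetical parameters: cond[y][i][v] stands for P(X_i = v | Y = y).
prior = {"mail": 0.545, "spam": 0.455}
cond = {
    "spam": [{0: 0.2, 1: 0.8}, {0: 0.6, 1: 0.4}],
    "mail": [{0: 0.9, 1: 0.1}, {0: 0.5, 1: 0.5}],
}
log_prior = {y: math.log(p) for y, p in prior.items()}
log_cond = {
    y: [{v: math.log(p) for v, p in dist.items()} for dist in dists]
    for y, dists in cond.items()
}

print(classify((1, 0), log_prior, log_cond))
print(classify((0, 1), log_prior, log_cond))
```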
7
Naive Bayes: Graphical Model Y → X_1, X_2, X_3, ..., X_m (Y is the single parent of every X_i)
8
Part II Where do the model parameters come from?
9
Using Data: Data → Model (estimation; regression; learning; training) → Action
10
Warning This is a HUGE topic. We will barely scratch the surface.
11
Forms of Models Recall that a model defines P(x | y) and P(y). These can have a simple multinomial form, like P(mail) = 0.545, P(spam) = 0.455 Or they can take on some other form, like a binomial, Gaussian, etc.
12
Example: Gaussian Suppose y is {male, female}, and one observed variable is H, height. $P(H \mid \text{male}) \sim \mathcal{N}(\mu_m, \sigma_m^2)$, $P(H \mid \text{female}) \sim \mathcal{N}(\mu_f, \sigma_f^2)$. How to estimate $\mu_m, \sigma_m^2, \mu_f, \sigma_f^2$?
13
Maximum Likelihood Pick the model that makes the data as likely as possible max P(data | model)
14
Maximum Likelihood (Gaussian) Estimating the parameters $\mu_m, \sigma_m^2, \mu_f, \sigma_f^2$ can be seen as: fitting the data; estimating an underlying statistic (point estimate).
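For a Gaussian, the maximum likelihood estimates have a well-known closed form: the sample mean, and the average squared deviation from it (dividing by n, not n - 1). The sketch below applies that to one class; the height values and the name `gaussian_mle` are hypothetical.

```python
def gaussian_mle(samples):
    """Return the maximum likelihood estimates (mu, sigma^2) for a Gaussian."""
    n = len(samples)
    mu = sum(samples) / n
    # The MLE of the variance divides by n (the unbiased estimator uses n - 1).
    var = sum((h - mu) ** 2 for h in samples) / n
    return mu, var


# Made-up heights in centimeters for one class; not data from the lecture.
heights_m = [178.0, 182.0, 170.0, 175.0]
mu_m, var_m = gaussian_mle(heights_m)
print(mu_m, var_m)
```

The same function would be run separately on the samples for each value of y to obtain all four parameters.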
15
Using the model
17
Example: Regression Suppose y is actual runtime, and x is input length. Regression tries to predict some continuous variables from others.
18
Regression Linear: assume linear relationship, fit a line. We can turn this into a model!
19
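Linear Model Given x, predict y. $y = \beta_1 x + \beta_0 + \mathcal{N}(0, \sigma^2)$, where $\beta_1 x + \beta_0$ is the true regression line and the $\mathcal{N}(0, \sigma^2)$ term is the random deviation.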
20
Principle of Least Squares Minimize the sum of squared vertical deviations. Unique, closed form solution! (Figure: a fitted line, with the vertical deviation of a point marked.)
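The closed form for a single input variable can be sketched as follows, using the standard formulas slope = S_xy / S_xx and intercept = mean(y) - slope × mean(x). The data points and the function name are illustrative assumptions; the solution is unique only when the x values are not all identical.

```python
def least_squares_line(xs, ys):
    """Fit y = beta1 * x + beta0 by minimizing squared vertical deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-deviations and squared deviations around the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)  # zero if all xs are equal
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical (input length, runtime) pairs lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
beta0, beta1 = least_squares_line(xs, ys)
print(beta0, beta1)
```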
21
Other kinds of regression: transform one or both variables (e.g., take a log); polynomial regression (least squares → linear system); multivariate regression; logistic regression.
22
Example: text categorization Bag-of-words model: x is a histogram of counts for all words; y is a topic.
23
MLE for Multinomials “Count and Normalize”
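"Count and normalize" can be sketched in a few lines: the maximum likelihood estimate of each word's probability is its count divided by the total number of tokens. The toy token list and the name `multinomial_mle` are assumptions made for the example.

```python
from collections import Counter


def multinomial_mle(tokens):
    """Estimate P(word) as count(word) / total tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


# A made-up six-token "document" for one topic.
tokens = "buy now buy cheap now buy".split()
probs = multinomial_mle(tokens)
print(probs)
```

In a text categorization model, this would be done once per topic y, using only the documents labeled with that topic.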
24
The Truth about MLE You will never see all the words. For many models, MLE isn’t safe. To understand why, consider a typical evaluation scenario.
25
Evaluation Train your model on some data. How good is the model? Test on different data that the system never saw before. Why?
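One simple way to set up this kind of evaluation is to shuffle the labeled examples and hold out a fraction that the training step never touches. The sketch below assumes a 20% held-out fraction and a fixed seed; the helper names and toy examples are hypothetical.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, examples):
    """Fraction of (x, y) pairs for which predict(x) equals y."""
    correct = sum(1 for x, y in examples if predict(x) == y)
    return correct / len(examples)


# Toy labeled data: odd ids are "spam", even ids are "mail".
examples = [(i, "spam" if i % 2 else "mail") for i in range(10)]
train, test = train_test_split(examples)
print(len(train), len(test))
print(accuracy(lambda x: "spam" if x % 2 else "mail", test))
```

Accuracy measured on `train` would tend to overstate how well the model does on new data, which is why the score is computed on `test` only.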
26
Tradeoff At one extreme: overfits the training data, doesn’t generalize. At the other: low variance, low accuracy.
27
Text categorization again Suppose ‘v1@gra’ never appeared in any document in training, ever. What is the above probability for a new document containing ‘v1@gra’ at test time?
28
Solutions Regularization: prefer less extreme parameters. Smoothing: “flatten out” the distribution. Bayesian estimation: construct a prior over model parameters, then train to maximize P(data | model) × P(model).
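As one concrete instance of smoothing, the sketch below uses add-alpha (Laplace) smoothing, which is a common choice but not necessarily the one intended in the lecture. With alpha = 0 it reduces to the plain MLE, which assigns probability zero to a word never seen in training; with alpha = 1 every word in the vocabulary gets some mass. The token list and vocabulary are made up.

```python
from collections import Counter


def smoothed_probs(tokens, vocab, alpha=1.0):
    """Add-alpha estimate: (count + alpha) / (total + alpha * |vocab|)."""
    counts = Counter(tokens)  # unseen words count as 0
    total = len(tokens) + alpha * len(vocab)
    return {word: (counts[word] + alpha) / total for word in vocab}


train_tokens = "buy now buy cheap now buy".split()
# The vocabulary includes one word that never occurs in training.
vocab = set(train_tokens) | {"v1@gra"}

mle = smoothed_probs(train_tokens, vocab, alpha=0.0)
smooth = smoothed_probs(train_tokens, vocab, alpha=1.0)
print(mle["v1@gra"], smooth["v1@gra"])
```

Because a Naive Bayes score multiplies per-word probabilities, a single zero from the unsmoothed estimate would zero out the whole document's likelihood, which is the failure the smoothed version avoids.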
29
One More Point Building models is not the only way to be empirical: neural networks, SVMs, instance-based learning. MLE and smoothed/Bayesian estimation are not the only ways to estimate: minimize error, for example (“discriminative” estimation).
30
Assignment 3: Spam detection. We provide a few thousand examples. Perform EDA and pick features. Estimate probabilities. Build a Naive Bayes classifier.