1
INTRODUCTION TO Machine Learning: Parametric Methods
2
Parametric Estimation
$X = \{x^t\}_{t=1}^N$ where $x^t \sim p(x)$
Parametric estimation: assume a form for $p(x|\theta)$ and estimate $\theta$, its sufficient statistics, using $X$
e.g., $\mathcal{N}(\mu, \sigma^2)$ where $\theta = \{\mu, \sigma^2\}$
3
Maximum Likelihood Estimation
Likelihood of $\theta$ given the sample $X$: $l(\theta|X) = p(X|\theta) = \prod_t p(x^t|\theta)$
Log likelihood: $L(\theta|X) = \log l(\theta|X) = \sum_t \log p(x^t|\theta)$
Maximum likelihood estimator: $\theta^* = \arg\max_\theta L(\theta|X)$
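As an illustration (not from the slides), here is a minimal sketch of maximum likelihood estimation for a Gaussian model, with the argmax found by brute-force grid search; the data, grid, and seed are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.5, size=200)   # sample X = {x^t}

def log_likelihood(X, mu, sigma):
    # L(theta|X) = sum_t log p(x^t|theta) for p = N(mu, sigma^2)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                  - (X - mu) ** 2 / (2 * sigma ** 2))

# theta* = argmax_theta L(theta|X), here by brute-force grid search
mus = np.linspace(0.0, 4.0, 201)
sigmas = np.linspace(0.5, 3.0, 201)
best = max((log_likelihood(X, m, s), m, s) for m in mus for s in sigmas)
print("MLE over grid: mu ~ %.3f, sigma ~ %.3f" % (best[1], best[2]))
```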
4
Examples: Bernoulli/Multinomial
Bernoulli: two states, failure/success, $x \in \{0, 1\}$
$P(x) = p_0^x (1 - p_0)^{1 - x}$
$L(p_0|X) = \log \prod_t p_0^{x^t} (1 - p_0)^{1 - x^t}$
MLE: $\hat{p}_0 = \sum_t x^t / N$
Multinomial: $K > 2$ states, $x_i \in \{0, 1\}$
$P(x_1, x_2, \ldots, x_K) = \prod_i p_i^{x_i}$
$L(p_1, p_2, \ldots, p_K|X) = \log \prod_t \prod_i p_i^{x_i^t}$
MLE: $\hat{p}_i = \sum_t x_i^t / N$
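A quick sketch of both MLEs as relative frequencies, assuming a one-hot encoding for the multinomial case; all data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

# Bernoulli: p0_hat = sum_t x^t / N, the success frequency
x = rng.binomial(1, 0.7, size=500)
p0_hat = x.mean()

# Multinomial: p_i_hat = sum_t x_i^t / N, with x^t one-hot over K states
K, N = 3, 500
labels = rng.choice(K, size=N, p=[0.2, 0.5, 0.3])
one_hot = np.eye(K)[labels]        # rows are x^t = (x_1^t, ..., x_K^t)
p_hat = one_hot.mean(axis=0)       # per-state relative frequencies
print(p0_hat, p_hat)
```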
5
Gaussian (Normal) Distribution
$p(x) = \mathcal{N}(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$
MLE for $\mu$ and $\sigma^2$: $m = \frac{\sum_t x^t}{N}$, $s^2 = \frac{\sum_t (x^t - m)^2}{N}$
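A minimal sketch of the closed-form Gaussian MLE (note the division by $N$, not $N - 1$); the sample is synthetic:

```python
import numpy as np

def gaussian_mle(X):
    m = X.mean()                    # m = sum_t x^t / N
    s2 = ((X - m) ** 2).mean()      # s^2 = sum_t (x^t - m)^2 / N
    return m, s2

X = np.random.default_rng(2).normal(2.0, 1.5, size=1000)
print(gaussian_mle(X))              # close to (2.0, 2.25)
```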
6
Bias and Variance
Unknown parameter $\theta$; estimator $d_i = d(X_i)$ on sample $X_i$
Bias: $b_\theta(d) = E[d] - \theta$
Variance: $E[(d - E[d])^2]$
Mean square error: $r(d, \theta) = E[(d - \theta)^2] = (E[d] - \theta)^2 + E[(d - E[d])^2] = \text{Bias}^2 + \text{Variance}$
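To make the decomposition concrete, a simulation sketch (all specifics assumed): the MLE variance estimator above divides by $N$ and is therefore biased, with $b_\theta(d) \approx -\sigma^2/N$:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 1.5 ** 2                      # true parameter theta
N, M = 10, 100_000                     # sample size, number of samples X_i

# d(X_i) = MLE variance estimate on each sample (ddof=0, i.e., divide by N)
d = np.array([rng.normal(0.0, 1.5, N).var() for _ in range(M)])

bias = d.mean() - sigma2               # b_theta(d) = E[d] - theta
variance = d.var()                     # E[(d - E[d])^2]
mse = ((d - sigma2) ** 2).mean()       # r(d, theta)
print(bias, variance, bias ** 2 + variance, mse)   # last two nearly equal
```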
7
Bayes Estimator
Treat $\theta$ as a random variable with prior $p(\theta)$
Bayes' rule: $p(\theta|X) = \frac{p(X|\theta)\, p(\theta)}{p(X)}$
Full: $p(x|X) = \int p(x|\theta)\, p(\theta|X)\, d\theta$
Maximum a posteriori (MAP): $\theta_{MAP} = \arg\max_\theta p(\theta|X)$
Maximum likelihood (ML): $\theta_{ML} = \arg\max_\theta p(X|\theta)$
Bayes: $\theta_{Bayes} = E[\theta|X] = \int \theta\, p(\theta|X)\, d\theta$
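A sketch of the three estimators side by side, assuming the conjugate Beta-Bernoulli case (a Beta$(a, b)$ prior on the Bernoulli parameter); this setup is not spelled out on the slide but makes all three available in closed form:

```python
import numpy as np

# Bernoulli likelihood with a conjugate Beta(a, b) prior on theta.
# Posterior: p(theta|X) = Beta(a + k, b + N - k), k = number of successes.
a, b = 2.0, 2.0
X = np.random.default_rng(4).binomial(1, 0.7, size=20)
N, k = len(X), X.sum()

theta_ml = k / N                                # argmax_theta p(X|theta)
theta_map = (a + k - 1) / (a + b + N - 2)       # mode of the posterior
theta_bayes = (a + k) / (a + b + N)             # E[theta|X], posterior mean
print(theta_ml, theta_map, theta_bayes)
```

With more data the prior washes out and all three estimates converge.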
8
Parametric Classification
9
Parametric Classification
Given the sample $X = \{x^t, r^t\}_{t=1}^N$, where $r_i^t = 1$ if $x^t \in C_i$ and 0 otherwise, the ML estimates are
$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}$, $m_i = \frac{\sum_t x^t r_i^t}{\sum_t r_i^t}$, $s_i^2 = \frac{\sum_t (x^t - m_i)^2 r_i^t}{\sum_t r_i^t}$
and the discriminant becomes
$g_i(x) = -\frac{1}{2}\log 2\pi - \log s_i - \frac{(x - m_i)^2}{2 s_i^2} + \log \hat{P}(C_i)$
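A minimal sketch of this one-dimensional Gaussian classifier; the helper names, class parameters, and data are illustrative assumptions:

```python
import numpy as np

def fit_gaussian_classes(x, r):
    """x: (N,) inputs; r: (N, K) one-hot labels. Returns the ML estimates
    of the priors P(C_i), means m_i, and variances s_i^2."""
    priors = r.mean(axis=0)                        # sum_t r_i^t / N
    means = (x[:, None] * r).sum(0) / r.sum(0)     # weighted class means
    vars_ = (((x[:, None] - means) ** 2) * r).sum(0) / r.sum(0)
    return priors, means, vars_

def g(x, priors, means, vars_):
    # g_i(x) = -0.5*log(2*pi) - log(s_i) - (x - m_i)^2/(2 s_i^2) + log P(C_i)
    return (-0.5 * np.log(2 * np.pi) - 0.5 * np.log(vars_)
            - (x[:, None] - means) ** 2 / (2 * vars_) + np.log(priors))

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(-1, 1, 100), rng.normal(2, 1.5, 100)])
r = np.eye(2)[np.repeat([0, 1], 100)]
params = fit_gaussian_classes(x, r)
pred = g(x, *params).argmax(axis=1)   # choose the class with largest g_i(x)
```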
10
(a) and (b) for two classes when the input is one-dimensional. Variances are equal and the posteriors intersect at one point, which is the threshold of decision.
11
Parametric Classification: (a) and (b) for two classes when the input is one-dimensional. Variances are unequal and the posteriors intersect at two points. In (c), the expected risks are shown for the two classes and for the reject action.
12
Regression
$r = f(x) + \varepsilon$: the numeric output $r$ is an unknown function $f(x)$ plus zero-mean Gaussian noise $\varepsilon \sim \mathcal{N}(0, \sigma^2)$; we estimate $f$ by $g(x|\theta)$, so $p(r|x) \sim \mathcal{N}(g(x|\theta), \sigma^2)$
13
Regression: From LogL to Error
Under Gaussian noise, maximizing the log likelihood of the sample is equivalent to minimizing the sum of squared errors, $E(\theta|X) = \frac{1}{2}\sum_t [r^t - g(x^t|\theta)]^2$.
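A reconstructed version of the step this slide summarizes (the slide's own derivation is an image); dropping terms that do not depend on $\theta$:

```latex
\begin{aligned}
L(\theta|X) &= \log \prod_{t=1}^{N} p(x^t, r^t)
             = \sum_t \log p(r^t|x^t) + \sum_t \log p(x^t) \\
            &= -N \log\!\left(\sqrt{2\pi}\,\sigma\right)
               - \frac{1}{2\sigma^2} \sum_t \left[r^t - g(x^t|\theta)\right]^2
               + \text{const} \\
\Rightarrow\ \theta^* &= \arg\max_\theta L(\theta|X)
             = \arg\min_\theta E(\theta|X), \qquad
E(\theta|X) = \frac{1}{2} \sum_t \left[r^t - g(x^t|\theta)\right]^2
\end{aligned}
```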
14
Linear Regression
$g(x^t|w_1, w_0) = w_1 x^t + w_0$; setting the derivatives of $E(\theta|X)$ with respect to $w_0$ and $w_1$ to zero gives two linear equations (the normal equations) in the two unknowns.
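A sketch of solving the normal equations directly with NumPy; the data-generating line is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 5, 50)
r = 2.0 * x + 1.0 + rng.normal(0, 0.5, 50)       # r = f(x) + noise

# Normal equations for g(x) = w1*x + w0:
#   [ N       sum(x)   ] [w0]   [ sum(r)   ]
#   [ sum(x)  sum(x^2) ] [w1] = [ sum(x*r) ]
A = np.array([[len(x), x.sum()], [x.sum(), (x ** 2).sum()]])
y = np.array([r.sum(), (x * r).sum()])
w0, w1 = np.linalg.solve(A, y)
print(w0, w1)                                    # close to (1.0, 2.0)
```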
15
Polynomial Regression
$g(x^t|w_k, \ldots, w_1, w_0) = w_k (x^t)^k + \cdots + w_1 x^t + w_0$, a model that is still linear in the parameters and is solved by least squares on the design matrix of powers of $x^t$.
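A least-squares sketch using the Vandermonde design matrix; polyfit_ls is a hypothetical helper name and the data are synthetic:

```python
import numpy as np

def polyfit_ls(x, r, k):
    # Design matrix D with rows [1, x^t, (x^t)^2, ..., (x^t)^k];
    # least-squares solution of D w = r, i.e., w = (D^T D)^-1 D^T r
    D = np.vander(x, k + 1, increasing=True)
    w, *_ = np.linalg.lstsq(D, r, rcond=None)
    return w

x = np.linspace(0, 5, 30)
r = 2 * np.sin(1.5 * x) + np.random.default_rng(7).normal(0, 1, 30)
print(polyfit_ls(x, r, 3))   # coefficients w0..w3
```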
16
Other Error Measures
Square error: $E(\theta|X) = \sum_t [r^t - g(x^t|\theta)]^2$
Relative square error: $E(\theta|X) = \frac{\sum_t [r^t - g(x^t|\theta)]^2}{\sum_t (r^t - \bar{r})^2}$
Absolute error: $E(\theta|X) = \sum_t |r^t - g(x^t|\theta)|$
$\varepsilon$-sensitive error: $E(\theta|X) = \sum_t \mathbf{1}(|r^t - g(x^t|\theta)| > \varepsilon)\,(|r^t - g(x^t|\theta)| - \varepsilon)$
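A small sketch computing all four measures for a vector of predictions; the function name and the default $\varepsilon$ are illustrative:

```python
import numpy as np

def errors(r, g, eps=0.1):
    """Error measures for predictions g(x^t|theta) against targets r^t."""
    res = r - g
    return {
        "square": np.sum(res ** 2),
        "relative_square": np.sum(res ** 2) / np.sum((r - r.mean()) ** 2),
        "absolute": np.sum(np.abs(res)),
        # eps-sensitive: residuals inside the eps tube cost nothing,
        # since 1(|res| > eps)(|res| - eps) == max(|res| - eps, 0)
        "eps_sensitive": np.sum(np.maximum(np.abs(res) - eps, 0.0)),
    }

rng = np.random.default_rng(12)
r = rng.normal(0, 1, 50)
g = r + rng.normal(0, 0.3, 50)
print(errors(r, g))
```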
17
Bias and Variance
At a fixed $x$, the expected squared error decomposes as
$E[(r - g(x))^2 \mid x] = \underbrace{E[(r - E[r|x])^2 \mid x]}_{\text{noise}} + \underbrace{(E[r|x] - g(x))^2}_{\text{squared error}}$
and, averaging over samples $X$, the squared-error term decomposes further as
$E_X[(E[r|x] - g(x))^2 \mid x] = \underbrace{(E[r|x] - E_X[g(x)])^2}_{\text{bias}^2} + \underbrace{E_X[(g(x) - E_X[g(x)])^2]}_{\text{variance}}$
18
Estimating Bias and Variance
$M$ samples $X_i = \{x^t_i, r^t_i\}$, $i = 1, \ldots, M$, are used to fit $g_i(x)$, $i = 1, \ldots, M$:
$\bar{g}(x) = \frac{1}{M}\sum_i g_i(x)$
$\text{Bias}^2(g) = \frac{1}{N}\sum_t [\bar{g}(x^t) - f(x^t)]^2$
$\text{Variance}(g) = \frac{1}{NM}\sum_t \sum_i [g_i(x^t) - \bar{g}(x^t)]^2$
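A sketch of this procedure on the $f(x) = 2\sin(1.5x)$ example used on the following slides; $M$, $N$, the polynomial order, and the evaluation grid are assumptions:

```python
import numpy as np

f = lambda x: 2 * np.sin(1.5 * x)
rng = np.random.default_rng(8)
x_eval = np.linspace(0, 5, 50)
M, N, order = 100, 20, 3

# Fit g_i on M independent noisy samples X_i = {x^t_i, r^t_i}
preds = []
for _ in range(M):
    x = rng.uniform(0, 5, N)
    r = f(x) + rng.normal(0, 1, N)
    preds.append(np.polyval(np.polyfit(x, r, order), x_eval))
preds = np.array(preds)                      # row i holds g_i(x^t)

g_bar = preds.mean(axis=0)                   # average fit over the M samples
bias2 = np.mean((g_bar - f(x_eval)) ** 2)    # (1/N) sum_t [g_bar - f]^2
variance = np.mean((preds - g_bar) ** 2)     # (1/NM) sum_t sum_i [g_i - g_bar]^2
print(bias2, variance)
```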
19
Bias/Variance Dilemma
Example: $g_i(x) = 2$ has no variance but high bias; $g_i(x) = \sum_t r^t_i / N$ has lower bias but nonzero variance.
As we increase complexity, bias decreases (a better fit to data) and variance increases (the fit varies more with the data).
20
Bias/Variance Dilemma: (a) Function $f(x) = 2\sin(1.5x)$ and one noisy ($\mathcal{N}(0, 1)$) dataset sampled from the function. Five samples are taken, each containing twenty instances. (b), (c), (d) show five polynomial fits $g_i(\cdot)$ of order 1, 3, and 5; in each case, the dotted line is the average of the five fits, $\bar{g}(\cdot)$.
21
Polynomial Regression: In the same setting as the previous figure but using one hundred models instead of five: bias, variance, and error for polynomials of order 1 to 5. The best fit is the order that minimizes error.
22
Model Selection
Cross-validation: measure generalization accuracy by testing on data unused during training (a sketch follows below)
Regularization: penalize complex models, $E' = \text{error on data} + \lambda \cdot \text{model complexity}$
Akaike's information criterion (AIC), Bayesian information criterion (BIC)
Minimum description length (MDL): Kolmogorov complexity, the shortest description of the data
Structural risk minimization (SRM)
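As referenced above, a minimal cross-validation sketch for choosing polynomial order; the fold count, data, and the helper name cv_select_order are illustrative:

```python
import numpy as np

def cv_select_order(x, r, orders, n_folds=5):
    """Pick the polynomial order with the lowest average validation error."""
    idx = np.random.default_rng(10).permutation(len(x))
    folds = np.array_split(idx, n_folds)
    avg_err = {}
    for k in orders:
        errs = []
        for i in range(n_folds):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(n_folds) if j != i])
            w = np.polyfit(x[trn], r[trn], k)      # fit on training folds only
            errs.append(np.mean((r[val] - np.polyval(w, x[val])) ** 2))
        avg_err[k] = np.mean(errs)                 # error on unused data
    return min(avg_err, key=avg_err.get), avg_err

rng = np.random.default_rng(11)
x = rng.uniform(0, 5, 60)
r = 2 * np.sin(1.5 * x) + rng.normal(0, 1, 60)
best, errs = cv_select_order(x, r, orders=range(1, 8))
print(best, errs)
```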
23
Model Selection: the best fit lies at the "elbow" of the error curve.
24
Bayesian Model Selection
Prior on models, $p(\text{model})$
Regularization, when the prior favors simpler models
Bayes' rule: $p(\text{model}|\text{data}) = \frac{p(\text{data}|\text{model})\, p(\text{model})}{p(\text{data})}$
MAP of the posterior, $p(\text{model}|\text{data})$
Average over a number of models with high posterior probability
25
Regression example. Coefficients increase in magnitude as order increases:
1: [-0.0769, 0.0016]
2: [0.1682, -0.6657, 0.0080]
3: [0.4238, -2.5778, 3.4675, -0.0002]
4: [-0.1093, 1.4356, -5.5007, 6.0454, -0.0019]