Ch3: Model Building through Regression

3.1 Introduction

Regression is to model, or to determine through statistical data analysis, the explicit relationship between a set of random variables. Mathematically,
    d = f(x) + ε,
where
    d : dependent variable (response),
    x : independent variables (regressors),
    ε : expectational error, accounting for uncertainty in the model.
Regression model: d_i = f(x_i) + ε_i, i = 1, ..., N, where the (x_i, d_i) are sample values and ε_i is the expectational error accounting for uncertainty.
3.2 Linear Regression Model

Linear function f : linear regression model. Nonlinear function f : nonlinear regression model.

Problem: The unknown stochastic environment is to be probed using a set of examples (x, d). Consider the linear regression model:
    d = w^T x + ε,
where w is the fixed but unknown parameter vector of the model.
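As a sketch (all numbers here are illustrative assumptions, not from the text), the linear regression model d = w^T x + ε can be simulated in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" environment (unknown to any estimator)
M, N = 3, 200                       # regressor dimension, sample size
w_true = np.array([1.0, -2.0, 0.5]) # hypothetical parameter vector w
sigma = 0.1                         # std of the expectational error eps

X = rng.normal(size=(N, M))         # regressors x_i, one per row
eps = rng.normal(scale=sigma, size=N)
d = X @ w_true + eps                # responses d_i = w^T x_i + eps_i
```

This synthetic training sample {(x_i, d_i)} is the kind of data assumed in the estimation problems below.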
3.3 ML and MAP Estimations of w

Problem (under the stochastic environment): Given the joint statistics of X, D, and W, estimate w.

Methods of estimation: i) maximum likelihood (ML), ii) maximum a posteriori (MAP).

Two remarks: (i) x bears no relation to w; (ii) the information about w is contained in d. Focus on the joint probability density function
    p(w, d | x) = p(w | d, x) p(d | x) = p(d | w, x) p(w | x) = p(d | w, x) p(w),
where the last step uses remark (i).
From the conditional probability (Bayesian eq.),
    p(w | d, x) = p(d | w, x) p(w) / p(d),
where p(d | w, x) is the observation density of the response d due to the regressor x, given the parameter w; it is often reformulated as the likelihood function, i.e.,
    l(w | d, x) = p(d | w, x).
Let
    π(w) = p(w) : prior density of w before any observation,
    p(w | d, x) : posterior density of w after observation of d,
    p(d) : evidence of the information contained in d.
The Bayesian eq. becomes
    p(w | d, x) = l(w | d, x) π(w) / p(d).   (A)
Maximum likelihood (ML) estimate of w:
    ŵ_ML = arg max_w l(w | d, x).
Maximum a posteriori (MAP) estimate of w:
    ŵ_MAP = arg max_w [l(w | d, x) π(w)],
or, since p(d) does not depend on w,
    ŵ_MAP = arg max_w [log l(w | d, x) + log π(w)].
ML ignores the prior π(w).

How to come up with an approximate estimator? Consider a Gaussian environment. Let T = {(x_i, d_i)}_{i=1}^N be the training sample.
Assumptions:
(1) The N samples {(x_i, d_i)}_{i=1}^N are iid.
(2) Each error ε_i is described by a Gaussian density of zero mean and common variance σ², i.e.,
    p(ε_i) = (1/√(2πσ²)) exp(-ε_i² / (2σ²)).
(3) The elements of w are iid. Each element is governed by a Gaussian density function of zero mean and common variance σ_w².
The likelihood function measures the similarity between d_i and w^T x_i, and in turn their difference ε_i = d_i - w^T x_i. From Assumption (2),
    l(w | d_i, x_i) = (1/√(2πσ²)) exp(-(d_i - w^T x_i)² / (2σ²)).
From Assumption (1), the overall likelihood function is
    l(w | d, x) = ∏_{i=1}^N l(w | d_i, x_i)
                = (2πσ²)^{-N/2} exp(-(1/(2σ²)) Σ_{i=1}^N (d_i - w^T x_i)²).   (B)
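A minimal numerical check of the iid factorization, on synthetic data with assumed values of N, σ, and w:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, sigma = 50, 2, 0.3            # assumed sample size, dimension, noise std
X = rng.normal(size=(N, M))
w = np.array([0.7, -1.2])           # hypothetical parameter vector
d = X @ w + rng.normal(scale=sigma, size=N)

def log_likelihood(w, X, d, sigma):
    # log l(w|d,x) = -(N/2) log(2*pi*sigma^2) - (1/(2 sigma^2)) sum_i (d_i - w^T x_i)^2
    r = d - X @ w
    return -0.5 * len(d) * np.log(2 * np.pi * sigma**2) - (r @ r) / (2 * sigma**2)

# iid factorization: joint log-likelihood = sum of per-sample log-likelihoods
per_sample = sum(log_likelihood(w, X[i:i+1], d[i:i+1], sigma) for i in range(N))
assert np.isclose(per_sample, log_likelihood(w, X, d, sigma))
```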
From Assumption (3), the prior is
    π(w) = ∏_{k=1}^M π(w_k) = (2πσ_w²)^{-M/2} exp(-(1/(2σ_w²)) ||w||²),   (C)
where ||w||² = Σ_{k=1}^M w_k² and M is the dimension of w.
Substitute Eqs. (B) and (C) into Eq. (A):
    p(w | d, x) ∝ exp(-(1/(2σ²)) Σ_{i=1}^N (d_i - w^T x_i)² - (1/(2σ_w²)) ||w||²).
Substitute into ŵ_MAP = arg max_w log p(w | d, x):
    ŵ_MAP = arg min_w [ (1/2) Σ_{i=1}^N (d_i - w^T x_i)² + (σ²/(2σ_w²)) ||w||² ].
Let λ = σ² / σ_w². Setting the gradient of the cost with respect to w to zero, obtain
    ŵ_MAP(N) = (R_xx(N) + λI)^{-1} r_dx(N),
where
    R_xx(N) = Σ_{i=1}^N x_i x_i^T : correlation matrix of the regressor,
    r_dx(N) = Σ_{i=1}^N x_i d_i : cross-correlation vector between regressor and response.
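A sketch of the closed-form MAP estimate on synthetic data (σ, σ_w, and the data are assumptions for illustration); the final check confirms that the gradient of the regularized cost vanishes at ŵ_MAP:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 100, 3                       # assumed sample size and dimension
sigma, sigma_w = 0.2, 1.0           # assumed noise std and prior std
lam = sigma**2 / sigma_w**2         # lambda = sigma^2 / sigma_w^2

X = rng.normal(size=(N, M))
w_true = rng.normal(scale=sigma_w, size=M)
d = X @ w_true + rng.normal(scale=sigma, size=N)

R_xx = X.T @ X                      # R_xx(N) = sum_i x_i x_i^T
r_dx = X.T @ d                      # r_dx(N) = sum_i x_i d_i
w_map = np.linalg.solve(R_xx + lam * np.eye(M), r_dx)

# Gradient of the cost, -(r_dx - R_xx w) + lam w, must vanish at the minimizer
grad = -(r_dx - R_xx @ w_map) + lam * w_map
assert np.allclose(grad, 0)
```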
If σ_w² is large, the prior distribution of each element of w is close to uniform and λ is close to zero; the MAP estimate then reduces to the ML estimate:
    ŵ_ML(N) = R_xx^{-1}(N) r_dx(N).
The ML estimator is unbiased, i.e., E[ŵ_ML] = w, while the MAP estimator is biased (shrunk toward zero by the prior).
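The λ → 0 limit can be seen numerically: growing σ_w² shrinks λ, and ŵ_MAP approaches ŵ_ML (again on assumed synthetic data):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, sigma = 60, 2, 0.2            # assumed sample size, dimension, noise std
X = rng.normal(size=(N, M))
d = X @ np.array([0.8, -0.3]) + rng.normal(scale=sigma, size=N)

R_xx, r_dx = X.T @ X, X.T @ d
w_ml = np.linalg.solve(R_xx, r_dx)  # ML estimate (lambda = 0)

# lambda = sigma^2 / sigma_w^2 -> 0 as the prior variance sigma_w^2 grows
gaps = []
for sigma_w in (1.0, 10.0, 1000.0):
    lam = sigma**2 / sigma_w**2
    w_map = np.linalg.solve(R_xx + lam * np.eye(M), r_dx)
    gaps.append(np.linalg.norm(w_map - w_ml))

# The gap between MAP and ML shrinks as sigma_w grows
assert gaps[0] > gaps[1] > gaps[2]
```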
3.4 Relationship between Regularized LS and MAP Estimations of w

Least Squares (LS) Estimation: Define the cost function as
    E(w) = (1/2) Σ_{i=1}^N (d_i - w^T x_i)².
Solve for w by minimizing E(w); obtain
    ŵ_LS(N) = R_xx^{-1}(N) r_dx(N),
which is the same solution as the ML one.

Regularized LS Estimation: Modify the cost function by adding a structural regularizer:
    E_λ(w) = (1/2) Σ_{i=1}^N (d_i - w^T x_i)² + (λ/2) ||w||².
Solve for w by minimizing E_λ(w); obtain
    ŵ(N) = (R_xx(N) + λI)^{-1} r_dx(N),
which is the same solution as the MAP one. The structural regularizer thus plays the role of the Gaussian prior, with λ = σ²/σ_w².
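The equivalence can be checked numerically: minimizing the regularized cost by plain gradient descent recovers the closed-form (R_xx + λI)^{-1} r_dx solution (synthetic data; step size chosen as 1/L for the quadratic cost):

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, lam = 80, 2, 0.5              # assumed sample size, dimension, lambda
X = rng.normal(size=(N, M))
d = X @ np.array([1.5, -0.5]) + rng.normal(scale=0.1, size=N)

# Closed-form regularized-LS / MAP solution
w_closed = np.linalg.solve(X.T @ X + lam * np.eye(M), X.T @ d)

# Minimize E_lam(w) = 1/2 sum_i (d_i - w^T x_i)^2 + lam/2 ||w||^2 by gradient descent
w = np.zeros(M)
L = np.linalg.norm(X, 2) ** 2 + lam  # Lipschitz constant of the gradient
eta = 1.0 / L                        # step size
for _ in range(2000):
    grad = -X.T @ (d - X @ w) + lam * w
    w -= eta * grad

assert np.allclose(w, w_closed, atol=1e-6)
```

The same minimizer would be reached by any convex solver; gradient descent is used here only to show that the iterative and closed-form answers coincide.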