Slide 1: Chapter 8, Model Inference and Averaging. Presented by Hui Fang.
Slide 2: Basic Concepts
– Statistical inference: using data to infer the distribution that generated the data. We observe X_1, ..., X_n ~ F and want to infer (or estimate, or learn) F, or some feature of F such as its mean.
– Statistical model: a set of distributions (or a set of densities).
  – Parametric model
  – Non-parametric model
Slide 3: Statistical Model (1)
– Parametric model: a set of distributions that can be parameterized by a finite number of parameters.
– E.g., if we assume the data come from a normal distribution, the model is F = { f(x; μ, σ) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)) : μ ∈ ℝ, σ > 0 }.
– In general, a parametric model takes the form F = { f(x; θ) : θ ∈ Θ }, where Θ is the parameter space.
Slide 4: Statistical Model (2)
– Non-parametric model: a set of distributions that cannot be parameterized by a finite number of parameters.
– E.g., assume only that the data come from F_ALL = { all CDFs }.
– Probability density function (PDF) f(x): f(x) = F′(x).
– Cumulative distribution function (CDF) F(x): F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt.
Slide 5: Outline
– Model inference
  – Maximum likelihood inference (8.2.2); EM algorithm (8.5)
  – Bayesian inference (8.3); Gibbs sampling (8.6)
  – Bootstrap (8.2.1, 8.2.3, 8.4)
– Model averaging and improvement
  – Bagging (8.7)
  – Bumping (8.9)
Slide 6: Parametric Inference
– Parametric models: F = { f(x; θ) : θ ∈ Θ }.
– The problem of inference reduces to the problem of estimating the parameter θ.
– Methods:
  – Maximum likelihood inference
  – Bayesian inference
Slide 7: An Example of MLE
– Suppose you have data x_1, ..., x_N drawn from a normal distribution N(μ, σ²), but you do not know μ or σ².
– MLE: for which values of (μ, σ²) are the observed data most likely?
Slide 8: A General MLE Strategy
Suppose θ = (θ_1, ..., θ_k) is a vector of parameters. Task: find the MLE for θ.
1. Write down the likelihood function L(θ) = ∏_{i=1}^{N} f(x_i; θ) and the log-likelihood ℓ(θ) = ∑_{i=1}^{N} log f(x_i; θ).
2. Work out the partial derivatives ∂ℓ/∂θ_j using high-school calculus.
3. Solve the set of simultaneous equations ∂ℓ/∂θ_j = 0, j = 1, ..., k.
4. Check that you are at a maximum.
The maximum likelihood estimator is the value of θ that maximizes the likelihood (equivalently, the log-likelihood).
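To make the recipe concrete, here is a minimal Python sketch that maximizes the Gaussian log-likelihood numerically with scipy.optimize; the simulated data, starting values, and use of scipy are my own choices, not part of the slides.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical data standing in for the Gaussian example on the slides.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)

def neg_log_likelihood(params, data):
    """Negative Gaussian log-likelihood; sigma is passed on the log scale so it stays positive."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Steps 1-3 of the slide, done numerically: maximize the log-likelihood.
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)  # close to the sample mean and standard deviation
```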
Slide 9: Properties of the MLE
– The sampling distribution of the maximum likelihood estimator has a limiting normal distribution (p. 230): as N → ∞, θ̂ → N(θ_0, i(θ_0)^{-1}), where θ_0 denotes the true value of θ.
– Information matrix: I(θ) = −∑_{i=1}^{N} ∂²ℓ(θ; z_i) / ∂θ ∂θᵀ.
– Fisher information: i(θ) = E[I(θ)].
Slide 10: An Example for the EM Algorithm (1)
– Model Y as a mixture of two normal distributions: Y = (1 − Δ)·Y_1 + Δ·Y_2, where Y_1 ~ N(μ_1, σ_1²), Y_2 ~ N(μ_2, σ_2²), and Δ ∈ {0, 1} with Pr(Δ = 1) = π.
– The density of Y is g_Y(y) = (1 − π) φ_{θ_1}(y) + π φ_{θ_2}(y), with φ_θ the normal density.
– The parameters are θ = (π, θ_1, θ_2) = (π, μ_1, σ_1², μ_2, σ_2²).
– The log-likelihood based on the N training cases is ℓ(θ; Z) = ∑_{i=1}^{N} log[(1 − π) φ_{θ_1}(y_i) + π φ_{θ_2}(y_i)].
– A sum of terms sits inside the logarithm, which makes direct maximization difficult.
Slide 11: An Example for the EM Algorithm (2)
– Consider unobserved latent variables Δ_i: if Δ_i = 1, observation y_i comes from model 2; otherwise it comes from model 1.
– If we knew the values of the Δ_i, maximizing the log-likelihood would be easy.
EM algorithm for the two-component mixture (see the sketch below):
1. Take initial guesses for the parameters π̂, μ̂_1, σ̂_1², μ̂_2, σ̂_2².
2. Expectation step: compute the responsibilities γ̂_i = π̂ φ_{θ̂_2}(y_i) / [(1 − π̂) φ_{θ̂_1}(y_i) + π̂ φ_{θ̂_2}(y_i)].
3. Maximization step: compute the values of the parameters that maximize the expected log-likelihood given the γ̂_i (weighted means and variances, and π̂ = ∑_i γ̂_i / N).
4. Iterate steps 2 and 3 until convergence.
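A minimal sketch of these four steps for the two-component Gaussian mixture of slide 10; the initialization, fixed iteration count, and variable names are my own simplifications rather than anything prescribed by the slides.

```python
import numpy as np
from scipy.stats import norm

def em_two_gaussians(y, n_iter=100):
    """EM sketch for a two-component Gaussian mixture."""
    # Initial guesses: split the data roughly into a lower and an upper group.
    pi = 0.5
    mu1, mu2 = np.percentile(y, 25), np.percentile(y, 75)
    s1 = s2 = np.var(y)
    for _ in range(n_iter):
        # E-step: responsibilities gamma_i = Pr(Delta_i = 1 | y_i, current parameters).
        p1 = (1 - pi) * norm.pdf(y, mu1, np.sqrt(s1))
        p2 = pi * norm.pdf(y, mu2, np.sqrt(s2))
        gamma = p2 / (p1 + p2)
        # M-step: weighted means, weighted variances, and mixing proportion.
        mu1 = np.sum((1 - gamma) * y) / np.sum(1 - gamma)
        mu2 = np.sum(gamma * y) / np.sum(gamma)
        s1 = np.sum((1 - gamma) * (y - mu1) ** 2) / np.sum(1 - gamma)
        s2 = np.sum(gamma * (y - mu2) ** 2) / np.sum(gamma)
        pi = np.mean(gamma)
    return pi, mu1, s1, mu2, s2
```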
Slide 12: An Example for the EM Algorithm (3)
Slide 13: Bayesian Inference
– Prior (knowledge before we see the data): Pr(θ).
– Sampling model: Pr(Z | θ).
– After observing data Z, we update our beliefs and form the posterior distribution: Pr(θ | Z) = Pr(Z | θ) Pr(θ) / ∫ Pr(Z | θ) Pr(θ) dθ.
– The posterior is proportional to the likelihood times the prior.
– Doesn't it cause a problem to throw away the normalizing constant? No: we can always recover it, since the posterior must integrate to one, so the constant is ∫ Pr(Z | θ) Pr(θ) dθ.
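As a deliberately simple illustration of "posterior is proportional to likelihood times prior", here is a conjugate Beta-Bernoulli update in Python; the prior values and data are hypothetical and not from the slides.

```python
import numpy as np

# Prior: theta ~ Beta(a, b); data: z_i ~ Bernoulli(theta).
a, b = 2.0, 2.0                      # prior Pr(theta)
z = np.array([1, 0, 1, 1, 0, 1])     # observed data Z
heads, tails = z.sum(), len(z) - z.sum()

# Posterior is proportional to likelihood * prior; for this conjugate pair the
# normalizing constant is available in closed form and the posterior is again a Beta.
a_post, b_post = a + heads, b + tails
print("posterior mean:", a_post / (a_post + b_post))
```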
Slide 14: Prediction Using Inference
– Task: predict the value of a future observation z_new.
– Bayesian approach: Pr(z_new | Z) = ∫ Pr(z_new | θ) Pr(θ | Z) dθ, averaging over the posterior.
– Maximum likelihood approach: Pr(z_new | θ̂), i.e., plug in the maximum likelihood estimate, ignoring the uncertainty in θ.
Slide 15: MCMC (1)
– General problem: evaluating E[f(X)] = ∫ f(x) p(x) dx can be difficult, where p(x) is the target distribution.
– However, if we can draw samples X^(1), ..., X^(n) ~ p(x), then we can estimate E[f(X)] ≈ (1/n) ∑_{t=1}^{n} f(X^(t)).
– This is Monte Carlo (MC) integration.
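A minimal sketch of Monte Carlo integration; the target p(x) = N(0, 1) and the function f(x) = x² are my own choices for illustration (the exact answer here is 1).

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=100_000)   # draws X^(t) from p(x) = N(0, 1)
estimate = np.mean(samples ** 2)     # (1/n) * sum_t f(X^(t)) with f(x) = x^2
print(estimate)                      # close to E[X^2] = 1
```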
Slide 16: MCMC (2)
– A stochastic process is an indexed collection of random variables {X^(t)}, where t may be time.
– A Markov chain is generated by sampling X^(t+1) ~ p(x | x^(t)). So X^(t+1) depends only on x^(t), not on x^(0), ..., x^(t−1). Here p is the transition kernel.
– As t → ∞, the Markov chain converges to its stationary distribution.
Slide 17: MCMC (3)
– Problem: how do we construct a Markov chain whose stationary distribution is our target distribution p(x)? This is called Markov chain Monte Carlo (MCMC).
– Two key objectives:
  1. Generate a sample from a joint probability distribution.
  2. Estimate expectations using the generated sample averages (i.e., do MC integration).
Slide 18: Gibbs Sampling (1)
– Purpose: draw from a joint distribution.
– Method: iterative conditional sampling. The target is the joint distribution p(x_1, ..., x_K); we repeatedly draw each component from its conditional distribution given the current values of all the others.
Slide 19: Gibbs Sampling (2)
– Suppose that X = (X_1, ..., X_K) has joint distribution p(x_1, ..., x_K).
– Sample or update in turn:
  X_1^(t+1) ~ p(x_1 | x_2^(t), ..., x_K^(t))
  X_2^(t+1) ~ p(x_2 | x_1^(t+1), x_3^(t), ..., x_K^(t))
  ...
  X_K^(t+1) ~ p(x_K | x_1^(t+1), ..., x_{K−1}^(t+1))
– Always use the most recent values.
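A minimal Gibbs-sampling sketch for a standard bivariate normal with correlation rho, where both full conditionals are known in closed form; the example distribution, burn-in length, and seed are my choices, not from the slides.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples=5000, burn_in=500):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Full conditionals: X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for X2."""
    rng = np.random.default_rng(0)
    x1, x2 = 0.0, 0.0
    draws = []
    for t in range(n_samples + burn_in):
        # Update each coordinate from its conditional, always using the most recent value.
        x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))
        x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))
        if t >= burn_in:
            draws.append((x1, x2))
    return np.array(draws)

samples = gibbs_bivariate_normal(rho=0.8)
print(np.corrcoef(samples.T))  # empirical correlation should be near 0.8
```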
Slide 20: An Example of Conditional Sampling
– Target distribution: (formula shown on the slide).
– How do we draw samples from it?
Slide 21: Recall: The Same Example as for EM (1)
– Model Y as a mixture of two normal distributions, Y = (1 − Δ)·Y_1 + Δ·Y_2, with Y_1 ~ N(μ_1, σ_1²), Y_2 ~ N(μ_2, σ_2²), and Pr(Δ = 1) = π.
– For simplicity, assume the variances σ_1², σ_2² and the mixing proportion π are known, so the unknown parameters are θ = (μ_1, μ_2).
Slide 22: Comparison Between EM and Gibbs Sampling
EM:
1. Take initial guesses for the parameters.
2. Expectation step: compute the responsibilities γ̂_i.
3. Maximization step: compute the values of the parameters that maximize the log-likelihood given the γ̂_i.
4. Iterate steps 2 and 3 until convergence.
Gibbs (see the sketch below):
1. Take initial guesses for the parameters.
2. Repeat for t = 1, 2, ...:
  (a) For i = 1, 2, ..., N, generate Δ_i^(t) ∈ {0, 1} with Pr(Δ_i^(t) = 1) = γ̂_i(θ^(t)).
  (b) Generate μ_1^(t) and μ_2^(t) from their conditional distributions given the current Δ_i^(t).
3. Continue step 2 until the joint distribution of (Δ^(t), μ_1^(t), μ_2^(t)) doesn't change.
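A sketch of the Gibbs sampler for the mixture example, assuming known variances and mixing proportion as on slide 21; the flat prior on the means, the initialization, and the empty-component guard are my own simplifications.

```python
import numpy as np
from scipy.stats import norm

def gibbs_mixture(y, pi, s1, s2, n_iter=2000):
    """Gibbs sampler sketch for the two-component mixture with known
    variances s1, s2 and mixing proportion pi; unknowns are mu1, mu2."""
    rng = np.random.default_rng(0)
    mu1, mu2 = np.percentile(y, 25), np.percentile(y, 75)  # initial guesses
    draws = []
    for _ in range(n_iter):
        # (a) Sample the latent indicators Delta_i given the current means.
        p1 = (1 - pi) * norm.pdf(y, mu1, np.sqrt(s1))
        p2 = pi * norm.pdf(y, mu2, np.sqrt(s2))
        delta = rng.random(len(y)) < p2 / (p1 + p2)
        # (b) Sample each mean given the indicators (flat prior on mu1, mu2).
        n1, n2 = np.sum(~delta), np.sum(delta)
        if n1 > 0:
            mu1 = rng.normal(y[~delta].mean(), np.sqrt(s1 / n1))
        if n2 > 0:
            mu2 = rng.normal(y[delta].mean(), np.sqrt(s2 / n2))
        draws.append((mu1, mu2))
    return np.array(draws)
```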
Slide 23: Bootstrap (0)
Basic idea:
– Randomly draw datasets with replacement from the training data.
– Each bootstrap sample has the same size as the original training set.
(Figure: the training sample and the bootstrap samples drawn from it.)
Slide 24: Example for Bootstrap (1)
– A bioequivalence study with measurements Y and Z.
(Data table shown on the slide.)
Slide 25: Example for Bootstrap (2)
– We want to estimate a parameter θ of the joint distribution of (Y, Z).
– The estimator θ̂ is the corresponding statistic computed from the sample (the plug-in estimate).
– What is the accuracy of this estimator?
Slide 26: Bootstrap (1)
– The bootstrap was introduced as a general method for assessing the statistical accuracy of an estimator.
– Data: X_1, ..., X_n ~ F.
– Statistic (any function of the data): T_n = g(X_1, ..., X_n).
– We want to know Var_F(T_n).
– Real world: F produces X_1, ..., X_n, from which we compute T_n.
– Bootstrap world: the empirical distribution F̂_n produces X*_1, ..., X*_n, from which we compute T*_n.
– Var_F(T_n) can be estimated with Var_{F̂_n}(T*_n).
Slide 27: Bootstrap (2): A Detour
– Suppose we draw a sample X_1, ..., X_B from a distribution G. By the law of large numbers, sample averages (such as the sample mean and sample variance) converge to the corresponding features of G as B → ∞, so any feature of G can be approximated by drawing enough samples.
Slide 28: Bootstrap (3)
Bootstrap variance estimation (real world vs. bootstrap world):
1. Draw a bootstrap sample X*_1, ..., X*_n ~ F̂_n.
2. Compute T*_n = g(X*_1, ..., X*_n).
3. Repeat steps 1 and 2, B times, to get T*_{n,1}, ..., T*_{n,B}.
4. Let v_boot = (1/B) ∑_{b=1}^{B} (T*_{n,b} − T̄*)², where T̄* = (1/B) ∑_{b=1}^{B} T*_{n,b}.
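A minimal sketch of this four-step recipe; the example data and the choice of the sample median as the statistic are mine, for illustration only.

```python
import numpy as np

def bootstrap_variance(data, statistic, B=1000, seed=0):
    """Bootstrap variance estimate of a statistic (the 4-step recipe above)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    t_star = np.empty(B)
    for b in range(B):
        # Step 1: draw n points with replacement from the data, i.e. sample from F_hat.
        sample = rng.choice(data, size=n, replace=True)
        # Step 2: compute the statistic on the bootstrap sample.
        t_star[b] = statistic(sample)
    # Step 4: the variance of the B bootstrap replications (1/B convention).
    return np.var(t_star)

# Usage: accuracy of the sample median of some hypothetical data.
data = np.random.default_rng(1).exponential(size=100)
print(bootstrap_variance(data, np.median))
```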
Slide 29: Bootstrap (4)
– Non-parametric bootstrap: uses the raw data, not a specific parametric model, to generate new datasets.
– Parametric bootstrap: simulates new responses by adding Gaussian noise to the predicted values.
– Example from the book: fit the model to obtain the estimate μ̂(x) and noise variance σ̂², then simulate new responses by y*_i = μ̂(x_i) + ε*_i with ε*_i ~ N(0, σ̂²), keeping the x_i fixed.
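A sketch of the parametric bootstrap for a simple linear fit (a stand-in for the book's B-spline example, which is not reproduced here): keep the x_i fixed, simulate y* from the fitted model plus Gaussian noise, and refit. The data and model are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=x.size)   # hypothetical data

# Fit a straight line by least squares and estimate the noise level.
X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
sigma_hat = np.sqrt(np.sum(resid ** 2) / (len(y) - X.shape[1]))

# Parametric bootstrap: y* = mu_hat(x) + Gaussian noise, refit, repeat.
boot_betas = []
for _ in range(1000):
    y_star = X @ beta_hat + rng.normal(scale=sigma_hat, size=len(y))
    boot_betas.append(np.linalg.lstsq(X, y_star, rcond=None)[0])
print(np.std(boot_betas, axis=0))  # bootstrap standard errors of the coefficients
```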
Slide 30: Bootstrap (5): Summary
– Non-parametric bootstrap: makes no assumption about the underlying distribution.
– Parametric bootstrap: agrees with maximum likelihood inference.
– The bootstrap distribution approximates the posterior distribution of the parameters under a non-informative prior.
Slide 31: Bagging (1)
– Bootstrap: a way of assessing the accuracy of a parameter estimate or of a prediction.
– Bagging (bootstrap aggregating): fit the model (e.g., a classifier) to each bootstrap sample and average the resulting predictions; for classification, the aggregation becomes majority voting.
(Figure: original sample, bootstrap samples, and the bootstrap estimators built from them.)
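A minimal majority-voting sketch of bagging; the scikit-learn decision tree as base classifier, the integer label encoding, and the parameter names are my own choices, not part of the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, B=50, seed=0):
    """Bagging sketch: fit one tree per bootstrap sample, then take a majority vote."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = np.empty((B, len(X_test)), dtype=int)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # bootstrap sample (with replacement)
        tree = DecisionTreeClassifier(random_state=b).fit(X_train[idx], y_train[idx])
        votes[b] = tree.predict(X_test)
    # Majority vote over the B bootstrap classifiers (labels assumed to be 0, 1, 2, ...).
    return np.array([np.bincount(col).argmax() for col in votes.T])
```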
Slide 32: Bagging (2)
– Pros:
  – The estimator can be significantly improved if the learning algorithm is unstable, i.e., a small change to the training set causes a large change in the output hypothesis.
  – Bagging reduces variance and leaves bias roughly unchanged.
– Cons:
  – It can degrade the performance of stable procedures.
  – The structure of the base model (e.g., a single interpretable tree) is lost after bagging.
Slide 33: Bumping
– A stochastic flavor of model selection (Bootstrap Umbrella of Model Parameters).
– Draw a bootstrap sample, train a model on it, and repeat until we are satisfied (or tired).
– Compare the resulting models by their fit to the original training data and keep the best one.
(Figure: original sample, bootstrap samples, bootstrap estimators.)
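A minimal bumping sketch, again assuming scikit-learn decision trees as the candidate models (my choice); following the book, the original sample is included as one candidate, and models are compared by their error on the original training data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bumping(X, y, B=25, seed=0):
    """Bumping sketch: fit one model per bootstrap sample, keep the single model
    that fits the ORIGINAL training data best (lowest 0-1 training error here)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    best_model, best_err = None, np.inf
    for b in range(B + 1):
        # b = 0 uses the original sample itself, so the original fit can also win.
        idx = np.arange(n) if b == 0 else rng.integers(0, n, size=n)
        model = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
        err = np.mean(model.predict(X) != y)   # error on the original training set
        if err < best_err:
            best_model, best_err = model, err
    return best_model
```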
Slide 34: Conclusions
– Maximum likelihood vs. Bayesian inference
– EM vs. Gibbs sampling
– Bootstrap
  – Bagging
  – Bumping