Slide 1: Chapter 8, Model Inference and Averaging. Presented by Hui Fang.
Slide 2: Basic Concepts
– Statistical inference: using data to infer the distribution that generated the data. We observe X_1, ..., X_n ~ F and want to infer (or estimate, or learn) F, or some feature of F such as its mean.
– Statistical model: a set of distributions (or a set of densities).
  – Parametric model
  – Non-parametric model
Slide 3: Statistical Model (1)
– Parametric model: a set of distributions that can be parameterized by a finite number of parameters.
– E.g., if we assume the data come from a normal distribution, the model is F = { f(x; μ, σ) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)) : μ ∈ ℝ, σ > 0 }.
– In general, a parametric model takes the form F = { f(x; θ) : θ ∈ Θ }, where Θ is the parameter space.
Slide 4: Statistical Model (2)
– Non-parametric model: a set of distributions that cannot be parameterized by a finite number of parameters.
– E.g., assume only that the data come from F_ALL = { all CDFs }.
– Probability density function (PDF) f(x): f(x) = F′(x).
– Cumulative distribution function (CDF) F(x): F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt.
Slide 5: Outline
– Model inference
  – Maximum likelihood inference (8.2.2); EM algorithm (8.5)
  – Bayesian inference (8.3); Gibbs sampling (8.6)
  – Bootstrap (8.2.1, 8.2.3, 8.4)
– Model averaging and improvement
  – Bagging (8.7)
  – Bumping (8.9)
Slide 6: Parametric Inference
– Parametric models: F = { f(x; θ) : θ ∈ Θ }.
– The problem of inference reduces to the problem of estimating the parameter θ.
– Methods:
  – Maximum likelihood inference
  – Bayesian inference
Slide 7: An Example of MLE
– Suppose you have data x_1, ..., x_N drawn from a normal distribution N(μ, σ²), but you do not know μ or σ².
– MLE: for which values of (μ, σ²) are the observed data most likely?
Slide 8: A General MLE Strategy
Suppose θ = (θ_1, ..., θ_k) is a vector of parameters. Task: find the MLE for θ.
1. Write down the likelihood function L(θ) = ∏_{i=1}^{N} f(x_i; θ) and the log-likelihood ℓ(θ) = ∑_{i=1}^{N} log f(x_i; θ).
2. Work out the partial derivatives ∂ℓ/∂θ_j using high-school calculus.
3. Solve the set of simultaneous equations ∂ℓ/∂θ_j = 0, j = 1, ..., k.
4. Check that you are at a maximum.
The maximum likelihood estimator is the value of θ that maximizes the likelihood (equivalently, the log-likelihood).
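To make the recipe concrete, here is a minimal Python sketch that maximizes the Gaussian log-likelihood numerically with scipy.optimize; the simulated data, starting values, and use of scipy are my own choices, not part of the slides.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical data standing in for the Gaussian example on the slides.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)

def neg_log_likelihood(params, data):
    """Negative Gaussian log-likelihood; sigma is passed on the log scale so it stays positive."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Steps 1-3 of the slide, done numerically: maximize the log-likelihood.
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)  # close to the sample mean and standard deviation
```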
Slide 9: Properties of the MLE
– The sampling distribution of the maximum likelihood estimator has a limiting normal distribution (p. 230): as N → ∞, θ̂ → N(θ_0, i(θ_0)^{-1}), where θ_0 denotes the true value of θ.
– Information matrix: I(θ) = −∑_{i=1}^{N} ∂²ℓ(θ; z_i) / ∂θ ∂θᵀ.
– Fisher information: i(θ) = E[I(θ)].
Slide 10: An Example for the EM Algorithm (1)
– Model Y as a mixture of two normal distributions: Y = (1 − Δ)·Y_1 + Δ·Y_2, where Y_1 ~ N(μ_1, σ_1²), Y_2 ~ N(μ_2, σ_2²), and Δ ∈ {0, 1} with Pr(Δ = 1) = π.
– The density of Y is g_Y(y) = (1 − π) φ_{θ_1}(y) + π φ_{θ_2}(y), with φ_θ the normal density.
– The parameters are θ = (π, θ_1, θ_2) = (π, μ_1, σ_1², μ_2, σ_2²).
– The log-likelihood based on the N training cases is ℓ(θ; Z) = ∑_{i=1}^{N} log[(1 − π) φ_{θ_1}(y_i) + π φ_{θ_2}(y_i)].
– A sum of terms sits inside the logarithm, which makes direct maximization difficult.
Slide 11: An Example for the EM Algorithm (2)
– Consider unobserved latent variables Δ_i: if Δ_i = 1, observation y_i comes from model 2; otherwise it comes from model 1.
– If we knew the values of the Δ_i, maximizing the log-likelihood would be easy.
EM algorithm for the two-component mixture (see the sketch below):
1. Take initial guesses for the parameters π̂, μ̂_1, σ̂_1², μ̂_2, σ̂_2².
2. Expectation step: compute the responsibilities γ̂_i = π̂ φ_{θ̂_2}(y_i) / [(1 − π̂) φ_{θ̂_1}(y_i) + π̂ φ_{θ̂_2}(y_i)].
3. Maximization step: compute the values of the parameters that maximize the expected log-likelihood given the γ̂_i (weighted means and variances, and π̂ = ∑_i γ̂_i / N).
4. Iterate steps 2 and 3 until convergence.
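A minimal sketch of these four steps for the two-component Gaussian mixture of slide 10; the initialization, fixed iteration count, and variable names are my own simplifications rather than anything prescribed by the slides.

```python
import numpy as np
from scipy.stats import norm

def em_two_gaussians(y, n_iter=100):
    """EM sketch for a two-component Gaussian mixture."""
    # Initial guesses: split the data roughly into a lower and an upper group.
    pi = 0.5
    mu1, mu2 = np.percentile(y, 25), np.percentile(y, 75)
    s1 = s2 = np.var(y)
    for _ in range(n_iter):
        # E-step: responsibilities gamma_i = Pr(Delta_i = 1 | y_i, current parameters).
        p1 = (1 - pi) * norm.pdf(y, mu1, np.sqrt(s1))
        p2 = pi * norm.pdf(y, mu2, np.sqrt(s2))
        gamma = p2 / (p1 + p2)
        # M-step: weighted means, weighted variances, and mixing proportion.
        mu1 = np.sum((1 - gamma) * y) / np.sum(1 - gamma)
        mu2 = np.sum(gamma * y) / np.sum(gamma)
        s1 = np.sum((1 - gamma) * (y - mu1) ** 2) / np.sum(1 - gamma)
        s2 = np.sum(gamma * (y - mu2) ** 2) / np.sum(gamma)
        pi = np.mean(gamma)
    return pi, mu1, s1, mu2, s2
```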
Slide 12: An Example for the EM Algorithm (3)
Slide 13: Bayesian Inference
– Prior (knowledge before we see the data): Pr(θ).
– Sampling model: Pr(Z | θ).
– After observing data Z, we update our beliefs and form the posterior distribution: Pr(θ | Z) = Pr(Z | θ) Pr(θ) / ∫ Pr(Z | θ) Pr(θ) dθ.
– The posterior is proportional to the likelihood times the prior.
– Doesn't it cause a problem to throw away the normalizing constant? No: we can always recover it, since the posterior must integrate to one, so the constant is ∫ Pr(Z | θ) Pr(θ) dθ.
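As a deliberately simple illustration of "posterior is proportional to likelihood times prior", here is a conjugate Beta-Bernoulli update in Python; the prior values and data are hypothetical and not from the slides.

```python
import numpy as np

# Prior: theta ~ Beta(a, b); data: z_i ~ Bernoulli(theta).
a, b = 2.0, 2.0                      # prior Pr(theta)
z = np.array([1, 0, 1, 1, 0, 1])     # observed data Z
heads, tails = z.sum(), len(z) - z.sum()

# Posterior is proportional to likelihood * prior; for this conjugate pair the
# normalizing constant is available in closed form and the posterior is again a Beta.
a_post, b_post = a + heads, b + tails
print("posterior mean:", a_post / (a_post + b_post))
```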
Slide 14: Prediction Using Inference
– Task: predict the value of a future observation z_new.
– Bayesian approach: Pr(z_new | Z) = ∫ Pr(z_new | θ) Pr(θ | Z) dθ, averaging over the posterior.
– Maximum likelihood approach: Pr(z_new | θ̂), i.e., plug in the maximum likelihood estimate, ignoring the uncertainty in θ.
Slide 15: MCMC (1)
– General problem: evaluating E[f(X)] = ∫ f(x) p(x) dx can be difficult, where p(x) is the target distribution.
– However, if we can draw samples X^(1), ..., X^(n) ~ p(x), then we can estimate E[f(X)] ≈ (1/n) ∑_{t=1}^{n} f(X^(t)).
– This is Monte Carlo (MC) integration.
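A minimal sketch of Monte Carlo integration; the target p(x) = N(0, 1) and the function f(x) = x² are my own choices for illustration (the exact answer here is 1).

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=100_000)   # draws X^(t) from p(x) = N(0, 1)
estimate = np.mean(samples ** 2)     # (1/n) * sum_t f(X^(t)) with f(x) = x^2
print(estimate)                      # close to E[X^2] = 1
```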
Slide 16: MCMC (2)
– A stochastic process is an indexed collection of random variables {X^(t)}, where t may be time.
– A Markov chain is generated by sampling X^(t+1) ~ p(x | x^(t)). So X^(t+1) depends only on x^(t), not on x^(0), ..., x^(t−1). Here p is the transition kernel.
– As t → ∞, the Markov chain converges to its stationary distribution.
Slide 17: MCMC (3)
– Problem: how do we construct a Markov chain whose stationary distribution is our target distribution p(x)? This is called Markov chain Monte Carlo (MCMC).
– Two key objectives:
  1. Generate a sample from a joint probability distribution.
  2. Estimate expectations using the generated sample averages (i.e., do MC integration).
Slide 18: Gibbs Sampling (1)
– Purpose: draw from a joint distribution.
– Method: iterative conditional sampling. The target is the joint distribution p(x_1, ..., x_K); we repeatedly draw each component from its conditional distribution given the current values of all the others.
Slide 19: Gibbs Sampling (2)
– Suppose that X = (X_1, ..., X_K) has joint distribution p(x_1, ..., x_K).
– Sample or update in turn:
  X_1^(t+1) ~ p(x_1 | x_2^(t), ..., x_K^(t))
  X_2^(t+1) ~ p(x_2 | x_1^(t+1), x_3^(t), ..., x_K^(t))
  ...
  X_K^(t+1) ~ p(x_K | x_1^(t+1), ..., x_{K−1}^(t+1))
– Always use the most recent values.
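A minimal Gibbs-sampling sketch for a standard bivariate normal with correlation rho, where both full conditionals are known in closed form; the example distribution, burn-in length, and seed are my choices, not from the slides.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples=5000, burn_in=500):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Full conditionals: X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for X2."""
    rng = np.random.default_rng(0)
    x1, x2 = 0.0, 0.0
    draws = []
    for t in range(n_samples + burn_in):
        # Update each coordinate from its conditional, always using the most recent value.
        x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))
        x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))
        if t >= burn_in:
            draws.append((x1, x2))
    return np.array(draws)

samples = gibbs_bivariate_normal(rho=0.8)
print(np.corrcoef(samples.T))  # empirical correlation should be near 0.8
```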
Slide 20: An Example of Conditional Sampling
– Target distribution: (formula shown on the slide).
– How do we draw samples from it?
Slide 21: Recall: The Same Example as for EM (1)
– Model Y as a mixture of two normal distributions, Y = (1 − Δ)·Y_1 + Δ·Y_2, with Y_1 ~ N(μ_1, σ_1²), Y_2 ~ N(μ_2, σ_2²), and Pr(Δ = 1) = π.
– For simplicity, assume the variances σ_1², σ_2² and the mixing proportion π are known, so the unknown parameters are θ = (μ_1, μ_2).
Slide 22: Comparison Between EM and Gibbs Sampling
EM:
1. Take initial guesses for the parameters.
2. Expectation step: compute the responsibilities γ̂_i.
3. Maximization step: compute the values of the parameters that maximize the log-likelihood given the γ̂_i.
4. Iterate steps 2 and 3 until convergence.
Gibbs (see the sketch below):
1. Take initial guesses for the parameters.
2. Repeat for t = 1, 2, ...:
  (a) For i = 1, 2, ..., N, generate Δ_i^(t) ∈ {0, 1} with Pr(Δ_i^(t) = 1) = γ̂_i(θ^(t)).
  (b) Generate μ_1^(t) and μ_2^(t) from their conditional distributions given the current Δ_i^(t).
3. Continue step 2 until the joint distribution of (Δ^(t), μ_1^(t), μ_2^(t)) doesn't change.
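A sketch of the Gibbs sampler for the mixture example, assuming known variances and mixing proportion as on slide 21; the flat prior on the means, the initialization, and the empty-component guard are my own simplifications.

```python
import numpy as np
from scipy.stats import norm

def gibbs_mixture(y, pi, s1, s2, n_iter=2000):
    """Gibbs sampler sketch for the two-component mixture with known
    variances s1, s2 and mixing proportion pi; unknowns are mu1, mu2."""
    rng = np.random.default_rng(0)
    mu1, mu2 = np.percentile(y, 25), np.percentile(y, 75)  # initial guesses
    draws = []
    for _ in range(n_iter):
        # (a) Sample the latent indicators Delta_i given the current means.
        p1 = (1 - pi) * norm.pdf(y, mu1, np.sqrt(s1))
        p2 = pi * norm.pdf(y, mu2, np.sqrt(s2))
        delta = rng.random(len(y)) < p2 / (p1 + p2)
        # (b) Sample each mean given the indicators (flat prior on mu1, mu2).
        n1, n2 = np.sum(~delta), np.sum(delta)
        if n1 > 0:
            mu1 = rng.normal(y[~delta].mean(), np.sqrt(s1 / n1))
        if n2 > 0:
            mu2 = rng.normal(y[delta].mean(), np.sqrt(s2 / n2))
        draws.append((mu1, mu2))
    return np.array(draws)
```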
Slide 23: Bootstrap (0)
Basic idea:
– Randomly draw datasets with replacement from the training data.
– Each bootstrap sample has the same size as the original training set.
(Figure: the training sample and the bootstrap samples drawn from it.)
Slide 24: Example for Bootstrap (1)
– A bioequivalence study with measurements Y and Z.
(Data table shown on the slide.)
Slide 25: Example for Bootstrap (2)
– We want to estimate a parameter θ of the joint distribution of (Y, Z).
– The estimator θ̂ is the corresponding statistic computed from the sample (the plug-in estimate).
– What is the accuracy of this estimator?
Slide 26: Bootstrap (1)
– The bootstrap was introduced as a general method for assessing the statistical accuracy of an estimator.
– Data: X_1, ..., X_n ~ F.
– Statistic (any function of the data): T_n = g(X_1, ..., X_n).
– We want to know Var_F(T_n).
– Real world: F produces X_1, ..., X_n, from which we compute T_n.
– Bootstrap world: the empirical distribution F̂_n produces X*_1, ..., X*_n, from which we compute T*_n.
– Var_F(T_n) can be estimated with Var_{F̂_n}(T*_n).
Slide 27: Bootstrap (2): A Detour
– Suppose we draw a sample X_1, ..., X_B from a distribution G. By the law of large numbers, sample averages (such as the sample mean and sample variance) converge to the corresponding features of G as B → ∞, so any feature of G can be approximated by drawing enough samples.
Slide 28: Bootstrap (3)
Bootstrap variance estimation (real world vs. bootstrap world):
1. Draw a bootstrap sample X*_1, ..., X*_n ~ F̂_n.
2. Compute T*_n = g(X*_1, ..., X*_n).
3. Repeat steps 1 and 2, B times, to get T*_{n,1}, ..., T*_{n,B}.
4. Let v_boot = (1/B) ∑_{b=1}^{B} (T*_{n,b} − T̄*)², where T̄* = (1/B) ∑_{b=1}^{B} T*_{n,b}.
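A minimal sketch of this four-step recipe; the example data and the choice of the sample median as the statistic are mine, for illustration only.

```python
import numpy as np

def bootstrap_variance(data, statistic, B=1000, seed=0):
    """Bootstrap variance estimate of a statistic (the 4-step recipe above)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    t_star = np.empty(B)
    for b in range(B):
        # Step 1: draw n points with replacement from the data, i.e. sample from F_hat.
        sample = rng.choice(data, size=n, replace=True)
        # Step 2: compute the statistic on the bootstrap sample.
        t_star[b] = statistic(sample)
    # Step 4: the variance of the B bootstrap replications (1/B convention).
    return np.var(t_star)

# Usage: accuracy of the sample median of some hypothetical data.
data = np.random.default_rng(1).exponential(size=100)
print(bootstrap_variance(data, np.median))
```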
Slide 29: Bootstrap (4)
– Non-parametric bootstrap: uses the raw data, not a specific parametric model, to generate new datasets.
– Parametric bootstrap: simulates new responses by adding Gaussian noise to the predicted values.
– Example from the book: fit the model to obtain the estimate μ̂(x) and noise variance σ̂², then simulate new responses by y*_i = μ̂(x_i) + ε*_i with ε*_i ~ N(0, σ̂²), keeping the x_i fixed.
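A sketch of the parametric bootstrap for a simple linear fit (a stand-in for the book's B-spline example, which is not reproduced here): keep the x_i fixed, simulate y* from the fitted model plus Gaussian noise, and refit. The data and model are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=x.size)   # hypothetical data

# Fit a straight line by least squares and estimate the noise level.
X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
sigma_hat = np.sqrt(np.sum(resid ** 2) / (len(y) - X.shape[1]))

# Parametric bootstrap: y* = mu_hat(x) + Gaussian noise, refit, repeat.
boot_betas = []
for _ in range(1000):
    y_star = X @ beta_hat + rng.normal(scale=sigma_hat, size=len(y))
    boot_betas.append(np.linalg.lstsq(X, y_star, rcond=None)[0])
print(np.std(boot_betas, axis=0))  # bootstrap standard errors of the coefficients
```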
Slide 30: Bootstrap (5): Summary
– Non-parametric bootstrap: makes no assumption about the underlying distribution.
– Parametric bootstrap: agrees with maximum likelihood inference.
– The bootstrap distribution approximates the posterior distribution of the parameters under a non-informative prior.
Slide 31: Bagging (1)
– Bootstrap: a way of assessing the accuracy of a parameter estimate or of a prediction.
– Bagging (bootstrap aggregating): fit the model (e.g., a classifier) to each bootstrap sample and average the resulting predictions; for classification, the aggregation becomes majority voting.
(Figure: original sample, bootstrap samples, and the bootstrap estimators built from them.)
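A minimal majority-voting sketch of bagging; the scikit-learn decision tree as base classifier, the integer label encoding, and the parameter names are my own choices, not part of the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, B=50, seed=0):
    """Bagging sketch: fit one tree per bootstrap sample, then take a majority vote."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = np.empty((B, len(X_test)), dtype=int)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # bootstrap sample (with replacement)
        tree = DecisionTreeClassifier(random_state=b).fit(X_train[idx], y_train[idx])
        votes[b] = tree.predict(X_test)
    # Majority vote over the B bootstrap classifiers (labels assumed to be 0, 1, 2, ...).
    return np.array([np.bincount(col).argmax() for col in votes.T])
```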
Slide 32: Bagging (2)
– Pros:
  – The estimator can be significantly improved if the learning algorithm is unstable, i.e., a small change to the training set causes a large change in the output hypothesis.
  – Bagging reduces variance and leaves bias roughly unchanged.
– Cons:
  – It can degrade the performance of stable procedures.
  – The structure of the base model (e.g., a single interpretable tree) is lost after bagging.
Slide 33: Bumping
– A stochastic flavor of model selection (Bootstrap Umbrella of Model Parameters).
– Draw a bootstrap sample, train a model on it, and repeat until we are satisfied (or tired).
– Compare the resulting models by their fit to the original training data and keep the best one.
(Figure: original sample, bootstrap samples, bootstrap estimators.)
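A minimal bumping sketch, again assuming scikit-learn decision trees as the candidate models (my choice); following the book, the original sample is included as one candidate, and models are compared by their error on the original training data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bumping(X, y, B=25, seed=0):
    """Bumping sketch: fit one model per bootstrap sample, keep the single model
    that fits the ORIGINAL training data best (lowest 0-1 training error here)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    best_model, best_err = None, np.inf
    for b in range(B + 1):
        # b = 0 uses the original sample itself, so the original fit can also win.
        idx = np.arange(n) if b == 0 else rng.integers(0, n, size=n)
        model = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
        err = np.mean(model.predict(X) != y)   # error on the original training set
        if err < best_err:
            best_model, best_err = model, err
    return best_model
```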
Slide 34: Conclusions
– Maximum likelihood vs. Bayesian inference
– EM vs. Gibbs sampling
– Bootstrap
  – Bagging
  – Bumping