Lecture 6: Bootstraps and Maximum Likelihood Methods

Bootstrapping: a way to generate empirical probability distributions. Very handy for making estimates of uncertainty.

100 realizations of a normal distribution p(y) with mean y = 50 and σ_y = 100

What is the distribution of y_est = (1/N) Σᵢ yᵢ ?

We know this should be a Normal distribution with expectation = y = 50 and √variance = σ_y/√N = 10. [Figure: p(y) vs. y and p(y_est) vs. y_est]

Here's an empirical way of determining the distribution, called bootstrapping.

Original data: y₁, y₂, y₃, y₄, y₅, y₆, y₇, …, y_N. Draw N random integers in the range 1–N and use them to index the original data, giving the N resampled data y′₁, y′₂, y′₃, …, y′_N. Compute the estimate (1/N) Σᵢ y′ᵢ. Now repeat a gazillion times and examine the resulting distribution of estimates.

Note that we are doing random sampling with replacement of the original dataset y to create a new dataset y′. Note: the same datum, yᵢ, may appear several times in the new dataset, y′.

A pot of an infinite number of y's with distribution p(y); a cup of N y's drawn from the pot. Does a cup drawn from the pot capture the statistical behavior of what's in the pot?

Take 1 cup from the pot with distribution p(y), duplicate the cup an infinite number of times, and pour it into a new pot with distribution ≈ p(y). More or less the same thing in the 2 pots?

Random sampling is easy to code in MatLab:
yprime = y(unidrnd(N,N,1));
Here unidrnd(N,N,1) is a vector of N random integers between 1 and N, y is the original data, and yprime is the resampled data.
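A minimal MatLab sketch of the whole bootstrap for the sample mean (variable names and the choice of 10⁵ realizations are illustrative, not from the original slides):

N = 100;
y = 50 + 100*randn(N,1);           % illustrative original data: normal, mean 50, sigma 100
Nboot = 100000;                    % number of bootstrap realizations
yest = zeros(Nboot,1);
for k = 1:Nboot
    yprime = y(unidrnd(N,N,1));    % resample the data with replacement
    yest(k) = mean(yprime);        % the estimate for this realization
end
hist(yest,50);                     % empirical distribution of the estimate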

The theoretical and bootstrap results match pretty well! [Figure: theoretical distribution vs. bootstrap with 10⁵ realizations]

Obviously bootstrapping is of limited utility when we know the theoretical distribution (as in the previous example)

but it can be very useful when we don't. For example, what's the distribution of σ_y^est, where (σ_y^est)² = 1/(N−1) Σᵢ (yᵢ − y_est)² and y_est = (1/N) Σᵢ yᵢ? (Yes, I know a statistician would point out that (N−1)(σ_y^est)²/σ_y² follows a chi-squared distribution …)

To do the bootstrap we calculate y′_est = (1/N) Σᵢ y′ᵢ, (σ_y′^est)² = 1/(N−1) Σᵢ (y′ᵢ − y′_est)², and σ_y′^est = √((σ_y′^est)²) many times – say 10⁵ times.
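A hedged MatLab sketch of this second bootstrap, reusing the resampling idea from above (the names are illustrative):

Nboot = 100000;
sigest = zeros(Nboot,1);
for k = 1:Nboot
    yprime = y(unidrnd(N,N,1));                               % resample with replacement
    sigest(k) = sqrt( sum((yprime-mean(yprime)).^2)/(N-1) );  % sigma estimate for this realization
end
hist(sigest,50);                                              % empirical distribution of sigma_est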

Here's the bootstrap result (10⁵ realizations). [Figure: p(σ_y^est) vs. σ_y^est, with the true value σ_y marked.] I numerically calculate an expected value of 92.8 and a √variance of 6.2. Note that the distribution is not quite centered about the true value of 100. This is random variation: the original N=100 data are not quite representative of an infinite ensemble of normally-distributed values.

So we would be justified in saying σ_y ≈ 92.6 ± 12.4, that is, ±2×6.2, the 95% confidence interval.

The Maximum Likelihood Method: a way to fit parameterized probability distributions to data. Very handy when you have good reason to believe the data follow a particular distribution.

Likelihood Function, L: the logarithm of the probable-ness of a given dataset.

N data y are all drawn from the same distribution p(y). The probable-ness of a single measurement yᵢ is p(yᵢ), so the probable-ness of the whole dataset is p(y₁) × p(y₂) × … × p(y_N) = Πᵢ p(yᵢ), and L = ln Πᵢ p(yᵢ) = Σᵢ ln p(yᵢ).
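As a concrete illustration (not from the slides), the log-likelihood of a dataset y under a normal distribution with trial parameters mu and sigma can be computed in MatLab as:

mu = 50; sigma = 100;                   % trial parameter values (illustrative)
L = sum(log(normpdf(y, mu, sigma)));    % L = sum over i of ln p(y_i)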

Now imagine that the distribution p(y) is known up to a vector m of unknown parameters. Write p(y; m), with the semicolon as a reminder that it's not a joint probability. Then L is a function of m: L(m) = Σᵢ ln p(yᵢ; m).

The Principle of Maximum Likelihood: choose m so that it maximizes L(m), that is, ∂L/∂mᵢ = 0. The dataset that was in fact observed is then the most probable one that could have been observed.

Example – normal distribution of unknown mean y and variance σ²:
p(yᵢ) = (2π)^(−½) σ⁻¹ exp{ −½ σ⁻² (yᵢ − y)² }
L = Σᵢ ln p(yᵢ) = −½ N ln(2π) − N ln(σ) − ½ σ⁻² Σᵢ (yᵢ − y)²
∂L/∂y = 0 = σ⁻² Σᵢ (yᵢ − y)
∂L/∂σ = 0 = −N σ⁻¹ + σ⁻³ Σᵢ (yᵢ − y)²
The N's arise because the sums run from 1 to N.

Solving for y and σ:
0 = σ⁻² Σᵢ (yᵢ − y)  ⇒  y = N⁻¹ Σᵢ yᵢ
0 = −N σ⁻¹ + σ⁻³ Σᵢ (yᵢ − y)²  ⇒  σ² = N⁻¹ Σᵢ (yᵢ − y)²

Interpreting the results: y = N⁻¹ Σᵢ yᵢ and σ² = N⁻¹ Σᵢ (yᵢ − y)². The sample mean is the maximum likelihood estimate of the expected value of the normal distribution, and the sample variance (more-or-less*) is the maximum likelihood estimate of the variance of the normal distribution. (*the issue of N vs. N−1 in the formula)

Example – 100 data drawn from a normal distribution with true y = 50 and σ = 100.

[Figure: the likelihood surface L(y, σ) plotted over y and σ; the maximum is at y = 62, σ = 107]
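A minimal MatLab sketch of how such a likelihood surface could be evaluated on a grid and its maximum located (grid ranges and variable names are illustrative assumptions):

ybar = linspace(0,100,201);             % trial values of the mean
sig = linspace(50,200,151);             % trial values of sigma
L = zeros(length(ybar),length(sig));
for i = 1:length(ybar)
    for j = 1:length(sig)
        L(i,j) = sum(log(normpdf(y, ybar(i), sig(j))));   % L(ybar, sigma)
    end
end
[Lmax,kmax] = max(L(:));                % find the grid point with the largest L
[imax,jmax] = ind2sub(size(L),kmax);
ybar_ml = ybar(imax);                   % maximum likelihood estimates
sigma_ml = sig(jmax);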

Another Example – exponential distribution: p(yᵢ) = ½ σ⁻¹ exp{ −σ⁻¹ |yᵢ − y| }
Check normalization … use z = yᵢ − y:
∫ p(yᵢ) dyᵢ = ½ σ⁻¹ ∫_(−∞)^(+∞) exp{ −σ⁻¹ |yᵢ − y| } dyᵢ = ½ σ⁻¹ · 2 ∫_0^(+∞) exp{ −σ⁻¹ z } dz = σ⁻¹ (−σ) exp{ −σ⁻¹ z } |_0^(+∞) = 1 ✓
Is the parameter y really the expectation? Is the parameter σ really the √variance?

Is y the expectation?
E(yᵢ) = ∫_(−∞)^(+∞) yᵢ ½ σ⁻¹ exp{ −σ⁻¹ |yᵢ − y| } dyᵢ; use z = yᵢ − y:
E(yᵢ) = ½ σ⁻¹ ∫_(−∞)^(+∞) (z + y) exp{ −σ⁻¹ |z| } dz = ½ σ⁻¹ 2y ∫_0^(+∞) exp{ −σ⁻¹ z } dz = −y exp{ −σ⁻¹ z } |_0^(+∞) = y
(z exp(−σ⁻¹ |z|) is an odd function times an even function, so its integral is zero.) YES!

Is σ the √variance?
var(yᵢ) = ∫_(−∞)^(+∞) (yᵢ − y)² ½ σ⁻¹ exp{ −σ⁻¹ |yᵢ − y| } dyᵢ; use z = σ⁻¹(yᵢ − y):
var(yᵢ) = ½ σ⁻¹ ∫_(−∞)^(+∞) σ² z² exp{ −|z| } σ dz = σ² ∫_0^(+∞) z² exp{ −z } dz = 2σ² ≠ σ²
(The CRC Math Handbook gives this integral as equal to 2.) Not quite …

Maximum likelihood estimate:
L = N ln(½) − N ln(σ) − σ⁻¹ Σᵢ |yᵢ − y|
∂L/∂y = 0  ⇒  Σᵢ sgn(yᵢ − y) = 0
∂L/∂σ = 0 = −N σ⁻¹ + σ⁻² Σᵢ |yᵢ − y|
The first condition uses d|x|/dx = sgn(x) = ±1; Σᵢ sgn(yᵢ − y) is zero when half the yᵢ's are bigger than y and half of them smaller, i.e. y is the median of the yᵢ's.

Once y is known, then … ∂L/∂σ = 0 = −N σ⁻¹ + σ⁻² Σᵢ |yᵢ − y| gives σ = N⁻¹ Σᵢ |yᵢ − y|, with y = median(y). Note that when N is even, y is not unique, but can be anything between the two middle values in a sorted list of the yᵢ's.
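A hedged MatLab sketch of these two maximum likelihood estimates for the two-sided exponential distribution (the simulated data and variable names are only illustrative):

N = 100;
ytrue = 50; sigtrue = 100;
y = ytrue + exprnd(sigtrue,N,1) - exprnd(sigtrue,N,1);   % difference of two one-sided exponentials gives a two-sided exponential
yml = median(y);                 % ML estimate of the central value y
sigml = mean(abs(y - yml));      % ML estimate of sigma = (1/N) * sum_i |y_i - y|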

Comparison: for the Normal distribution, the best estimate of the expected value is the sample mean; for the exponential distribution, the best estimate of the expected value is the sample median.

Comparison: the Normal distribution is short-tailed, so an outlier is extremely uncommon and the expected value should be chosen to make outliers have as small a deviation as possible. The exponential distribution is relatively long-tailed, so an outlier is relatively common and the expected value should ignore the actual value of outliers. [Figure: data yᵢ with an outlier, showing the positions of the median and the mean in each case]

Another important distribution: the Gutenberg-Richter distribution (e.g. earthquake magnitudes). For earthquakes greater than some threshold magnitude m₀, the probability that the earthquake will have a magnitude greater than m is P(m) = 10^(−b(m−m₀)), or P(m) = exp{ −log(10) b (m−m₀) } = exp{ −b′ (m−m₀) } with b′ = log(10) b.

This is a cumulative distribution, so the probability that the magnitude is greater than m₀ is unity: P(m₀) = exp{ −b′ (m₀−m₀) } = exp{0} = 1. The probability density distribution is (minus) its derivative: p(m) = b′ exp{ −b′ (m−m₀) }.

The maximum likelihood estimate of b′: L(b′) = N ln(b′) − b′ Σᵢ (mᵢ − m₀); ∂L/∂b′ = 0 = N/b′ − Σᵢ (mᵢ − m₀), so b′ = N / Σᵢ (mᵢ − m₀).
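A minimal MatLab sketch of this estimator applied to simulated magnitudes (the threshold, catalogue size, and names are illustrative assumptions, not data from the slides):

m0 = 4.0;                            % threshold magnitude
btrue = log(10)*1.0;                 % b' corresponding to a Gutenberg-Richter b-value of 1
m = m0 + exprnd(1/btrue,1000,1);     % simulated magnitudes with density b'*exp(-b'*(m-m0))
bprime = length(m)/sum(m - m0);      % maximum likelihood estimate of b'
b = bprime/log(10);                  % corresponding b-value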

Originally Gutenberg & Richter made a mistake … by estimating the slope, b, using least-squares, and not the Maximum Likelihood formula. [Figure: log₁₀ P(m) vs. magnitude m, showing slope = −b and the least-squares fit]

Yet another important distribution: the Fisher distribution on a sphere (e.g. paleomagnetic directions). Given unit vectors xᵢ that scatter around some mean direction x, the probability distribution for the angle θ between xᵢ and x (that is, cos(θ) = xᵢ · x) is p(θ) = κ sin(θ) exp{ κ cos(θ) } / (2 sinh(κ)), where κ is called the "precision parameter".

Rationale for the functional form p(θ) ∝ exp{ κ cos(θ) }: for θ close to zero, cos(θ) ≈ 1 − ½θ², so p(θ) ∝ exp{ κ cos(θ) } ≈ exp{κ} exp{ −½ κ θ² }, which is a Gaussian in θ.

I'll let you figure out the maximum likelihood estimate of the central direction, x, and the precision parameter, κ.