Bootstrapping in regular graphs

Bootstrapping in regular graphs Gesine Reinert, Oxford, with Susan Holmes, Stanford

What is the bootstrap? Efron (1979), Bickel and Freedman (1981), Singh (1981). A resampling procedure, used to construct confidence intervals and to calculate standard errors for statistics.

The bootstrap procedure Have a random sample of size n, say. Draw M observations out of the n, with replacement. Calculate the statistic of interest for this sample of size M. Repeat many times. Use the standard deviation across these bootstrap samples to estimate the standard deviation in the population.
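A minimal sketch of this procedure in Python (NumPy only). The function name bootstrap_se, the default of 1000 replicates, and the choice M = n are illustrative assumptions, not from the slides:

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=1000, rng=None):
    """Bootstrap standard error: resample with replacement, recompute
    the statistic, and take the standard deviation of the replicates."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    n = len(data)
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(data, size=n, replace=True)  # M = n draws, with replacement
        replicates[b] = statistic(sample)
    return replicates.std(ddof=1)
```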

Example: median Suppose we would like to estimate the median of a population from a sample of size n. Sample M = n observations with replacement from the observed data, and take the median of this simulated data set. Repeat these steps B times, giving B simulated medians. These medians are approximately draws from the sampling distribution of the median of n observations: calculate their standard deviation to estimate the standard error of the median.
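Continuing from the sketch above, the median example could read as follows (the exponential data, sample size, and seed are made up for illustration):

```python
rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=100)   # a sample of size n = 100
se_median = bootstrap_se(data, np.median, n_boot=2000, rng=rng)
print(f"estimated standard error of the median: {se_median:.3f}")
```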

When does the bootstrap work? The underlying idea is that of Russian dolls: the bootstrap samples should relate to the original sample just as the original sample relates to the unknown population. (Count the freckles on the faces of Russian dolls.)

Empirical measures Each observation can be represented by a point mass in space. The average of these point masses is called the empirical measure: a random quantity taking values in the set of measures.
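In standard notation (not on the slide): for observations X_1, ..., X_n, the empirical measure is

```latex
\mathbb{P}_n \;=\; \frac{1}{n}\sum_{i=1}^{n}\delta_{X_i},
\qquad \delta_x \text{ the point mass at } x,
```

a random probability measure, since it depends on the random observations.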

Limits of empirical measures Under the right conditions this empirical measure converges to a limit, just as in the law of large numbers. And just as for real-valued random quantities, for independent identically distributed observations an approximation by a Gaussian measure holds. We say that the bootstrap works when the bootstrap empirical measure can be approximated by a Gaussian measure centred around the true measure.

Conditions for validity? The theoretical arguments proving that the bootstrap works rely on large independent samples. With dependent observations, the standard deviation would be estimated wrongly. For time series there is the blockwise bootstrap, Kuensch (1989), Carlstein et al. (1998): sample whole blocks of observations from the series, and use the blocks to approximate the standard deviation.
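A minimal moving-block bootstrap sketch, under the same assumptions as before; the block length and the exact resampling scheme vary between the cited papers, so treat this as one concrete variant:

```python
import numpy as np

def block_bootstrap_se(series, statistic, block_len, n_boot=1000, rng=None):
    """Moving-block bootstrap: resample whole contiguous blocks (to
    preserve local dependence) instead of single observations."""
    rng = np.random.default_rng() if rng is None else rng
    series = np.asarray(series)
    n = len(series)
    n_blocks = int(np.ceil(n / block_len))
    max_start = n - block_len + 1            # valid block start positions
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, max_start, size=n_blocks)
        blocks = [series[s:s + block_len] for s in starts]
        sample = np.concatenate(blocks)[:n]  # trim to the original length
        replicates[b] = statistic(sample)
    return replicates.std(ddof=1)
```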

Dependency graphs For a collection of random variables we can construct a graph with the random variables as the vertices. Two vertices are linked by an edge if and only if the corresponding random variables are dependent. The set of all neighbours of a vertex is then the set of all random variables which are dependent on that vertex's random variable.

Bootstrapping in such graphs To capture the dependence structure, we bootstrap not isolated vertices but whole dependency neighbourhoods together with the vertex. The observations then have to be weighted and re-scaled.

Regular graph If all dependency neighbourhoods have the same size, i.e. every vertex has the same degree, then we have a regular graph. If the dependency neighbourhoods are small, then the bootstrap works (we have a numerical bound).

Re-weighting If the graph is not only regular but all pairwise intersections of dependency neighbourhoods also have the same size, g say, then adjust the variance estimate by multiplying by M and dividing by (n - g), where M is the size of the bootstrap sample and n is the original sample size. The same weights apply when the intersections are all empty.
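Written out (notation mine, following the slide):

```latex
\widehat{\sigma}^2_{\text{adjusted}}
  \;=\; \frac{M}{n-g}\,\widehat{\sigma}^2_{\text{bootstrap}},
```

where M is the bootstrap sample size, n the original sample size, and g the common size of the pairwise neighbourhood intersections; taking g = 0 covers the case of empty intersections.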

Weights in K-nearest neighbour graphs Place the vertices on a circle and connect each vertex to its k nearest neighbours to the left and to the right, so each vertex has degree 2k. The variance estimator then has to be multiplied by M and divided by n - 2k. The covariance part also has to be weighted differently, depending on the size of the dependency neighbourhood overlaps.
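A simplified sketch of the circular construction and the stated variance correction. The neighbourhood-resampling step and the choice of statistic (the mean) are illustrative assumptions, and the extra covariance re-weighting mentioned on the slide is omitted here:

```python
import numpy as np

def circular_knn_neighbourhoods(n, k):
    """Closed neighbourhoods on a circle: vertex i together with its
    k nearest neighbours on each side (degree 2k)."""
    return [[(i + d) % n for d in range(-k, k + 1)] for i in range(n)]

def neighbourhood_bootstrap_var(values, k, n_boot=1000, rng=None):
    """Resample whole dependency neighbourhoods, then apply the
    correction factor M / (n - 2k) from the slide."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    n = len(values)
    hoods = circular_knn_neighbourhoods(n, k)
    size = 2 * k + 1
    n_hoods = int(np.ceil(n / size))
    M = n_hoods * size                       # bootstrap sample size
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        picks = rng.integers(0, n, size=n_hoods)
        sample = np.concatenate([values[hoods[p]] for p in picks])
        replicates[b] = sample.mean()
    return replicates.var(ddof=1) * M / (n - 2 * k)  # slide's correction
```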

Example: Bucky ball

Weighted network For each edge, simulate an i.i.d. standard normal and fix a random orientation of the edge. For each vertex, add the normals for the edges going into the vertex and subtract the normals for the edges going out of it. What is the sampling distribution of the variance of these vertex values?
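A sketch of this construction for an arbitrary edge list; the representation (pairs of vertex indices) and the function name are mine:

```python
import numpy as np

def weighted_network_vertex_values(n_vertices, edges, rng=None):
    """Draw an i.i.d. standard normal per edge, fix a random orientation,
    and give each vertex the sum of the normals on its incoming edges
    minus the normals on its outgoing edges."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(n_vertices)
    for u, v in edges:
        z = rng.standard_normal()
        if rng.random() < 0.5:   # orient the edge u -> v
            x[v] += z            # z flows into v
            x[u] -= z            # ...and out of u
        else:                    # orient the edge v -> u
            x[u] += z
            x[v] -= z
    return x
```

The slide's question then concerns the sampling distribution of the variance of these vertex values.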

Realisation

Dependency bucky graph

Variances

Numerical values

Summary Bootstrapping from a dependency graph, where edges indicate dependence, works when the graph is (reasonably) regular, provided that the variance estimates are multiplied by the correction factor. Independent bootstrapping may lead to wrong standard error estimates.

Reference S. Holmes and G. Reinert: Stein’s method for the bootstrap. In: Stein’s Method: Expository Lectures and Applications. P. Diaconis and S. Holmes, eds, IMS, Hayward, 2004. http://www.stats.ox.ac.uk/~reinert/papers/steinbootstrap.pdf