Bootstrapping in regular graphs
Gesine Reinert, Oxford
With Susan Holmes, Stanford
What is the bootstrap?
  Efron (1979), Bickel and Freedman (1981), Singh (1981)
  Resampling procedure, used to construct confidence intervals and calculate standard errors for statistics
The bootstrap procedure
  Start with a random sample of size n
  Draw M observations out of the n, with replacement
  Calculate the statistic of interest for this sample of size M
  Repeat many times
  Use the standard deviation of the statistic across these resamples to estimate its standard error
Example: median
  Suppose we would like to estimate the median of a population from a sample of size n
  Sample M = n observations with replacement from the observed data
  Take the median of this simulated data set
  Repeat these steps B times, giving B simulated medians
  These medians are approximately draws from the sampling distribution of the median of n observations
  Calculate their standard deviation to estimate the standard error of the median
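The median example can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `bootstrap_se_median`, the default B and the simulated data are assumptions made for the sketch, not part of the talk.

```python
import random
import statistics


def bootstrap_se_median(data, B=1000, seed=0):
    """Estimate the standard error of the sample median by the bootstrap."""
    rng = random.Random(seed)
    n = len(data)
    medians = []
    for _ in range(B):
        # Draw M = n observations with replacement from the observed data.
        resample = rng.choices(data, k=n)
        medians.append(statistics.median(resample))
    # The spread of the B simulated medians estimates the standard error.
    return statistics.stdev(medians)


# Hypothetical data: 50 draws from a standard normal population.
data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
se = bootstrap_se_median(sample, B=500)
```

Fixing the seed makes the resampling reproducible; in practice B is chosen large enough that the estimate is stable across seeds.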
When does the bootstrap work?
  The underlying idea is that of Russian dolls: the bootstrap samples should relate to the original sample just as the original sample relates to the unknown population
  (Count the freckles on the faces of Russian dolls)
Empirical measures
  Each observation can be represented by a point mass in space
  The average of these point masses is called the empirical measure: a random quantity taking values in the set of measures
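In symbols, the empirical measure can be written as below. The notation is assumed here for illustration and is not taken from the slides: X_1, ..., X_n denote the observations, \delta_x the point mass at x, and A a measurable set.

```latex
P_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i},
\qquad
P_n(A) = \frac{1}{n} \, \#\{\, i : X_i \in A \,\}.
```

So P_n(A) is simply the fraction of observations that fall in A.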
Limits of empirical measures
  Under suitable conditions the empirical measure converges to a limit, as in the law of large numbers
  As for real-valued random quantities, for independent identically distributed observations an approximation by a Gaussian measure holds
  We say that the bootstrap works when the bootstrap empirical measure can be approximated by a Gaussian measure centred around the true measure
Conditions for validity?
  The theoretical arguments proving that the bootstrap works rely on large independent samples
  For dependent observations the standard deviation would be estimated wrongly
  In time series, use the blockwise bootstrap, Kuensch (1989), Carlstein et al. (1998): sample whole blocks of observations from the time series and use the blocks to approximate the standard deviation
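One common variant of the blockwise idea, the moving-block scheme, can be sketched as follows. It is swapped in here purely for illustration and is not claimed to be the exact procedure of the cited papers; the function names, the block length and the simulated autoregressive series are all assumptions.

```python
import random
import statistics


def moving_block_resample(series, block_len, rng):
    """Build one resampled series by concatenating randomly chosen blocks."""
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    n_starts = n - block_len + 1  # number of admissible block start positions
    out = []
    while len(out) < n:
        start = rng.randrange(n_starts)
        # Copy a whole block so that short-range dependence is preserved.
        out.extend(series[start:start + block_len])
    return out[:n]  # trim to the original length


def block_bootstrap_se_mean(series, block_len, B=500, seed=0):
    """Standard error of the mean, estimated from B block resamples."""
    rng = random.Random(seed)
    means = [statistics.fmean(moving_block_resample(series, block_len, rng))
             for _ in range(B)]
    return statistics.stdev(means)


# Hypothetical dependent data: an AR(1) series with coefficient 0.6.
data_rng = random.Random(1)
series = [0.0]
for _ in range(199):
    series.append(0.6 * series[-1] + data_rng.gauss(0.0, 1.0))
se_block = block_bootstrap_se_mean(series, block_len=10)
```

The block length trades off bias against variance: blocks must be long enough to contain the dependence, yet short enough that many distinct blocks are available.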
Dependency graphs
  For a collection of random variables we can construct a graph with the random variables as the vertices
  Two vertices are linked by an edge if and only if the corresponding random variables are dependent
  The set of all neighbours of a vertex is then the set of all random variables which are dependent on the vertex random variable
Bootstrapping in such graphs
  To capture the dependence structure, we bootstrap not isolated vertices but whole neighbourhoods of dependence together with the vertex
  Have to weight and re-scale observations
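The resampling step alone might look like the sketch below: each draw keeps a vertex together with its dependency neighbourhood. The weighting and re-scaling of observations is deliberately left out, since the slides do not spell it out. The adjacency-dictionary representation, the function name `resample_neighbourhoods` and the 4-cycle example are assumptions.

```python
import random


def resample_neighbourhoods(neighbours, M, seed=0):
    """Draw M vertices with replacement, each with its dependency neighbourhood."""
    rng = random.Random(seed)
    vertices = sorted(neighbours)
    draws = []
    for _ in range(M):
        v = rng.choice(vertices)
        # Keep the vertex together with everything it depends on.
        draws.append((v, sorted(neighbours[v])))
    return draws


# Hypothetical dependency graph: the 4-cycle 0-1-2-3-0.
graph = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
blocks = resample_neighbourhoods(graph, M=6)
```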
Regular graph
  If all dependency neighbourhoods have the same size, i.e. every vertex has the same degree, then we have a regular graph
  If the dependency neighbourhoods are small, then the bootstrap works (have numerical bound)
Re-weighting
  If the graph is not only regular, but all pairwise intersections of dependency neighbourhoods also have the same size, g say, then adjust the variance estimate by multiplying by M and dividing by (n-g), where M is the size of the bootstrap sample and n is the original sample size
  Same weights as above also if intersections are all empty
Weights in k-nearest neighbour graphs
  Place vertices on a circle and connect each vertex to its k nearest neighbours to the left and to the right, so each vertex has degree 2k
  Have to multiply the variance estimator by M and divide by n-2k
  But also have to weight the covariance part differently, depending on the size of the dependency neighbourhood overlaps
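A small sketch of the circular graph makes the degree, the overlaps and the factor M/(n-2k) concrete. It only builds the graph and computes the multiplicative factor; the overlap-dependent weighting of the covariance part is not implemented, since the slides do not give its form. The function names and the choice n = 12, k = 2 are assumptions.

```python
def circular_knn_graph(n, k):
    """Vertices 0..n-1 on a circle, each linked to its k nearest neighbours on either side."""
    if n <= 2 * k:
        raise ValueError("need n > 2k so that the 2k neighbours are distinct")
    return {v: {(v + d) % n for d in range(-k, k + 1) if d != 0}
            for v in range(n)}


def variance_correction(M, n, k):
    """Factor M / (n - 2k) applied to the bootstrap variance estimator."""
    return M / (n - 2 * k)


graph = circular_knn_graph(n=12, k=2)
degrees = {len(nb) for nb in graph.values()}   # every vertex has degree 2k = 4
overlap_adjacent = len(graph[0] & graph[1])    # neighbourhoods of adjacent vertices share 2 vertices
overlap_distant = len(graph[0] & graph[6])     # opposite vertices share none
factor = variance_correction(M=12, n=12, k=2)  # 12 / 8 = 1.5
```

The overlaps differ with the distance between vertices, which is why a single constant g does not apply here and the covariance part needs separate weights.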
Example: Bucky ball
Weighted network
  For each edge: simulate i.i.d. standard normals
  Fix a random orientation of the edge
  For each vertex: add the normals for edges going into the vertex, and subtract the normals for edges going out of the vertex
  Sampling distribution for the variance?
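The construction of the vertex values can be sketched as follows. A 6-cycle stands in for the graph here, as a hypothetical example rather than the Bucky ball itself; the function names and the edge-list representation are likewise assumptions.

```python
import random


def orient_edges(edges, rng):
    """Fix a random orientation (tail, head) for each undirected edge."""
    return [(u, v) if rng.random() < 0.5 else (v, u) for u, v in edges]


def simulate_vertex_values(oriented_edges, n_vertices, rng):
    """One draw of the vertex values from i.i.d. standard normal edge weights."""
    values = [0.0] * n_vertices
    for tail, head in oriented_edges:
        z = rng.gauss(0.0, 1.0)
        values[head] += z  # the edge goes into `head`: add its normal
        values[tail] -= z  # the edge goes out of `tail`: subtract its normal
    return values


# Hypothetical stand-in graph: a cycle on 6 vertices.
rng = random.Random(0)
edges = [(i, (i + 1) % 6) for i in range(6)]
oriented = orient_edges(edges, rng)
values = simulate_vertex_values(oriented, 6, rng)
```

Two vertices share a normal exactly when they are joined by an edge, so the original graph is also the dependency graph of the vertex values. Each edge contributes +z and -z, so the values sum to zero up to rounding.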
Dependency bucky graph
Numerical values
Summary
  Bootstrapping from dependency graphs, where edges indicate dependence, works when the graph is (reasonably) regular, provided that the variance estimates are multiplied by the correction factor
  Independent bootstrapping may lead to wrong standard error estimates
Reference
  S. Holmes and G. Reinert: Stein's method for the bootstrap. In: Stein's Method: Expository Lectures and Applications. P. Diaconis and S. Holmes, eds, IMS, Hayward, 2004.