1
Lab 3: R Data Manipulation and Bootstrapping
2
Today is the last day of basic R.
- Working directory
- Combining datasets
- Creating and deleting variables
- Bootstrapping
- How to make a function in R!
3
Get/Set Working Directory
- Set the working directory to the folder on the hard drive that holds your data files and will receive your output files
- Can also use the top menu: Session -> Set Working Directory
- Useful to have the working directory set before running analyses
- You won't have to specify a path when writing output
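A minimal sketch of getting and setting the working directory from code. The folder used here is a temporary directory standing in for your own lab folder, and the file name example_output.csv is made up for illustration.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()
print(old_wd)

# Hypothetical stand-in for your lab folder; replace with your own path
lab_dir <- tempdir()
setwd(lab_dir)

# No path given, so the file is written into the working directory
write.csv(data.frame(x = 1:3), "example_output.csv", row.names = FALSE)
file_written <- file.exists(file.path(lab_dir, "example_output.csv"))
print(file_written)

# Restore the original working directory
setwd(old_wd)
```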
4
Manipulating Datasets
- Load up SchoolA, SchoolB, and SchoolC for the following examples
- SchoolA: our original dataset
- SchoolB: has two additional variables
  - CGPA
  - Credit
- SchoolC: has additional rows of data
- We will merge these datasets together
5
Combining R Data Sets
You will look at combining data sets in two ways, concatenation and merging.
- Stacking observations from multiple data sets (concatenation)
- Joining multiple data sets side by side (merging)
[Diagram: Data Set 1 and Data Set 2 combined into Data Set 3, once by stacking and once by joining side by side]
Rows might or might not have matches in each data set.
6
Merging R Datasets
- A merge combines two or more existing data sets by joining observations side by side.
- Datasets are linked together by some key variable
- Merge function
7
Merging R Datasets: Use the merge function
General form of a merge procedure
- New object is defined
- Use merge function
- Specify datasets to merge
- Specify name of key variable
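The general form above might look like the following sketch. The data frames are small made-up stand-ins for the lab's SchoolA and SchoolB files, and the key variable name ID is an assumption, since the slides do not name the key.

```r
# Made-up stand-ins for the lab data sets; "ID" is an assumed key variable
SchoolA <- data.frame(ID = c(1, 2, 3), HSGPA = c(3.2, 3.8, 2.9))
SchoolB <- data.frame(ID = c(1, 2, 4),
                      CGPA = c(3.0, 3.6, 3.4),
                      Credit = c(15, 12, 18))

# New object <- merge(first data set, second data set, by = key variable)
SchoolAB <- merge(SchoolA, SchoolB, by = "ID")
print(SchoolAB)      # by default only IDs found in both data sets are kept

# all = TRUE also keeps rows without a match and fills the gaps with NA
SchoolAB_all <- merge(SchoolA, SchoolB, by = "ID", all = TRUE)
print(SchoolAB_all)
```

The default drops unmatched rows, so it is worth checking the row count after a merge when rows might not have matches in each data set.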
8
Concatenating R Data Sets
Use the rbind function (binds rows)
General form of an rbind procedure
- New object is defined
- Use rbind function
- Specify datasets
- Need to make sure datasets have the same variables
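A small sketch of the rbind form, using made-up data frames in place of the lab files. The object names SchoolAB, SchoolC, and SchoolABC are chosen for illustration only.

```r
# Made-up stand-ins; both data frames carry the same variables
SchoolAB <- data.frame(ID = c(1, 2), HSGPA = c(3.2, 3.8), CGPA = c(3.0, 3.6))
SchoolC  <- data.frame(ID = c(5, 6), HSGPA = c(3.5, 2.7), CGPA = c(3.3, 2.9))

# rbind matches data frame columns by name, so check the variables first
same_vars <- setequal(names(SchoolAB), names(SchoolC))
print(same_vars)

# New object <- rbind(first data set, second data set)
SchoolABC <- rbind(SchoolAB, SchoolC)
print(SchoolABC)
```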
9
Create Or Delete Variables
As an example, suppose the data set SchoolAB contains variables HSGPA and CGPA. A new variable, avgGPA, can be added to the data set with a single assignment. Variables can be deleted just as easily.
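The slide's own code is not preserved in this transcript, so the following is only a sketch. It assumes avgGPA is meant to be the mean of HSGPA and CGPA, and it uses a made-up SchoolAB data frame.

```r
# Made-up stand-in for the merged lab data set
SchoolAB <- data.frame(ID = 1:3,
                       HSGPA = c(3.2, 3.8, 2.9),
                       CGPA = c(3.0, 3.6, 3.1))

# Create: assign to a new column name
# (assumed definition: the mean of the two GPA variables)
SchoolAB$avgGPA <- (SchoolAB$HSGPA + SchoolAB$CGPA) / 2
avg_copy <- SchoolAB$avgGPA   # kept only so the values can be inspected later
print(SchoolAB)

# Delete: assign NULL to the column
SchoolAB$avgGPA <- NULL
print(names(SchoolAB))
```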
10
Eliminating variables from an existing data set
Using subset:
- Specify dataset
- Use the select option to define which variables to keep in the new dataset
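A sketch of subset with the select option, again on a made-up SchoolAB data frame. The names kept and dropped are hypothetical.

```r
# Made-up stand-in for the lab data set
SchoolAB <- data.frame(ID = 1:3,
                       HSGPA = c(3.2, 3.8, 2.9),
                       CGPA = c(3.0, 3.6, 3.1),
                       Credit = c(15, 12, 18))

# Keep only the listed variables in the new data set
kept <- subset(SchoolAB, select = c(ID, CGPA))
print(names(kept))

# A minus sign in select drops a variable and keeps everything else
dropped <- subset(SchoolAB, select = -Credit)
print(names(dropped))
```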
11
Bootstrapping!
12
Bootstrapping?
Bootstrapping is a nonparametric technique that makes no distributional assumptions about the data, such as normality. We do not rely on any seriously restrictive assumptions concerning the shape of the sampled populations.
This technique is useful when we are uncomfortable with:
- Small sample size
- Non-normal distribution of the sample
- A statistic whose confidence interval is complicated to calculate or has no formula at all
13
General Bootstrap Procedure
1. Sample with replacement from the original data, with each resample the same size as the original (resampled N = original N)
2. Calculate the statistic of interest with the resampled data
3. Repeat steps 1-2 for many replications (~10,000)
4. Sort the empirical values in ascending order, find the values at the 2.5th and 97.5th percentiles, and use them as the lower and upper critical values for two-tailed null hypothesis significance testing
5. If the interval does not contain zero, then the effect is statistically significant (p < .05)
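The procedure can be sketched by hand in base R before turning to a package. The data vector x is made up, the statistic is the mean, and the seed is fixed only so the run is repeatable.

```r
set.seed(123)                      # fixed only for repeatability

# Made-up sample; in the lab this would be a variable from the school data
x <- c(2.1, 3.4, 2.8, 3.9, 3.0, 2.5, 3.7, 3.2, 2.9, 3.6)
n <- length(x)
n_reps <- 10000

boot_means <- numeric(n_reps)
for (i in seq_len(n_reps)) {
  resample <- sample(x, size = n, replace = TRUE)  # step 1: resample, same N
  boot_means[i] <- mean(resample)                  # step 2: statistic
}                                                  # step 3: many replications

# Step 4: quantile() orders the values internally and returns the
# 2.5th and 97.5th percentiles
ci <- quantile(boot_means, probs = c(0.025, 0.975))
print(ci)
```

For a mean, zero is rarely the interesting comparison value; the "does the interval contain zero" check in step 5 applies naturally to statistics such as a difference or a regression slope.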
14
Bootstrap Procedure In R
- Install the boot package to use the boot function
- The boot function requires three things
  - Dataset for resampling
  - A function that specifies the statistic returned
  - Number of bootstrap replications
15
How to create a function
- Specify object name for function
- Specify arguments or parameters fed into function
- Write code that uses the arguments provided
- Return some object
  - Can be of any type (single value, list, dataset…)

    myfunction <- function(arg1, arg2, ... ){
      code
      return(object)
    }
    myfunction(arg1, arg2)
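As a concrete instance of that template, here is a small sketch. The function gpa_range and its digits argument are hypothetical names invented for this example.

```r
# Hypothetical function: the spread between the highest and lowest GPA
gpa_range <- function(gpa, digits = 2) {
  spread <- max(gpa) - min(gpa)     # code that uses the arguments
  return(round(spread, digits))     # return a single value
}

# Call the function like any built-in one
result <- gpa_range(c(3.2, 3.8, 2.9))
print(result)
```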
16
Example Bootstrap Function
- Requires two specific parameters
  - Data
  - Indices
- Within the function, need to define d, the resampled dataset
- Write code to obtain some statistic from d
- Return the object that serves as output
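A function of that shape might look like the sketch below, here for a bootstrapped mean. The name boot_mean and the cgpa values are made up. The function is called by hand with explicit indices to show what boot would do on each replication.

```r
# data: the original data; indices: positions picked for one resample
boot_mean <- function(data, indices) {
  d <- data[indices]     # resampled data (use data[indices, ] for a data frame)
  return(mean(d))        # the statistic that serves as output
}

# Made-up CGPA values
cgpa <- c(3.0, 3.6, 3.1, 2.8, 3.9)

# Indices 1..N in order reproduce the original sample
original_mean <- boot_mean(cgpa, seq_along(cgpa))
print(original_mean)

# Repeated indices mimic one draw with replacement
one_draw <- boot_mean(cgpa, c(1, 1, 2, 5, 5))
print(one_draw)
```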
17
Running Bootstrap
- Let's get the bootstrapped mean for CGPA in our final school dataset
- Use the boot function
- Specify the dataset or variable, the function you created, and the number of replications
- Print results to see the original estimate, its bias, and its standard error
- Plot results to get a histogram of the resampled values
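A sketch of the run, assuming the boot package is available (it ships as a recommended package with standard R installations; run install.packages("boot") only if library(boot) fails). The cgpa values are made up and stand in for the CGPA column of the lab dataset.

```r
library(boot)
set.seed(2024)                      # fixed only for repeatability

# Made-up stand-in for the CGPA variable
cgpa <- c(3.0, 3.6, 3.1, 2.8, 3.9, 3.3, 2.5, 3.7, 3.4, 3.2, 2.9, 3.8)

# Statistic function with the (data, indices) signature boot expects
boot_mean <- function(data, indices) mean(data[indices])

# data, statistic function, number of replications
results <- boot(data = cgpa, statistic = boot_mean, R = 2000)

print(results)          # original estimate, bias, standard error
head(results$t)         # the individual resampled statistics are stored in t
plot(results)           # histogram and normal Q-Q plot of the replicates
```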
18
Getting 95% Confidence Interval
boot.ci typically requires three parameters
- Output from the boot function (results)
- Confidence level
- Type of bootstrap
  - Recommend the bias-corrected and accelerated bootstrap (BCa)
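A sketch of the confidence-interval call, repeating a small made-up bootstrap so the block runs on its own. The cgpa values and the boot_mean function are illustrative assumptions, not the lab data.

```r
library(boot)
set.seed(2024)                      # fixed only for repeatability

# Made-up stand-in for the CGPA variable
cgpa <- c(3.0, 3.6, 3.1, 2.8, 3.9, 3.3, 2.5, 3.7, 3.4, 3.2, 2.9, 3.8)
boot_mean <- function(data, indices) mean(data[indices])
results <- boot(data = cgpa, statistic = boot_mean, R = 5000)

# boot output, confidence level, type of interval
ci <- boot.ci(results, conf = 0.95, type = "bca")
print(ci)

# The last two entries of the bca component are the lower and upper limits
limits <- ci$bca[4:5]
print(limits)
```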
19
Other Notes on Bootstrap
- Can return more than one bootstrapped statistic
  - Just return a vector instead of one value in the boot function
- Lab code includes an example of bootstrapped regression
  - Produces bootstrapped beta weights
  - Requires an additional formula parameter in the boot function
  - For your own personal interest/use
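The lab code itself is not part of this transcript, so the following is only a sketch of how a bootstrapped regression could be set up. The simulated school data, the function name boot_coefs, and the model CGPA ~ HSGPA are assumptions. The coefficients returned are unstandardized; standardize the variables first if standardized beta weights are wanted.

```r
library(boot)
set.seed(2024)                      # fixed only for repeatability

# Simulated stand-in data
n <- 50
school <- data.frame(HSGPA = runif(n, 2, 4))
school$CGPA <- 0.5 + 0.8 * school$HSGPA + rnorm(n, sd = 0.3)

# Extra formula argument; returns a vector (intercept and slope)
boot_coefs <- function(formula, data, indices) {
  d <- data[indices, ]              # resampled rows
  fit <- lm(formula, data = d)
  return(coef(fit))
}

# formula is passed through boot to the statistic function
results <- boot(data = school, statistic = boot_coefs, R = 1000,
                formula = CGPA ~ HSGPA)
print(results)

# index = 2 selects the second returned statistic, the slope
slope_ci <- boot.ci(results, type = "bca", index = 2)
print(slope_ci)
```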
20
Homework 3: due next lab (Sept. 21st)