R Data Manipulation Bootstrapping


Lab 3: R Data Manipulation and Bootstrapping

Today is the last day of basic R. Topics:
- Working directory
- Combining datasets
- Creating and deleting variables
- Bootstrapping
- How to make a function in R!

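Get/Set Working Directory
- Set the working directory to the folder on your hard drive that holds your data files and will receive your output files. Use getwd() to check the current directory and setwd() to change it.
- You can also use the top menu: Session -> Set Working Directory.
- It is useful to set the working directory before running analyses, because you won't have to specify a path when writing output.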
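The get/set calls can be sketched as follows. This is a minimal illustration, not the lab's own code: tempdir() stands in for the lab folder and the file name summary.csv is hypothetical.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# tempdir() stands in for your lab folder (an assumption for this sketch);
# in practice you would pass the path of the folder holding the lab files.
setwd(tempdir())

# With the working directory set, a bare file name is enough for output.
write.csv(data.frame(x = 1:3), "summary.csv", row.names = FALSE)
wrote_file <- file.exists("summary.csv")

# Restore the original working directory.
setwd(old_wd)
```

Saving and restoring the old directory keeps the sketch from leaving your session pointed at a temporary folder.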

Manipulating Datasets
Load SchoolA, SchoolB, and SchoolC for the following examples:
- SchoolA is our original dataset.
- SchoolB has two additional variables: CGPA and Credit.
- SchoolC has additional rows of data.
We will merge these datasets together.

Combining R Data Sets
You will look at combining data sets in two ways, concatenation and merging:
- Concatenation: stacking observations from multiple data sets.
- Merging: joining multiple data sets side by side.
Rows might or might not have matches in each data set.

Merging R Datasets
- A merge combines existing data sets by joining observations side by side.
- Datasets are linked together by some key variable.
- Merging is done with the merge function, which joins two data sets at a time; to combine more, merge repeatedly.

Merging R Datasets
Use the merge function. General form of a merge:
- Define a new object.
- Call the merge function.
- Specify the datasets to merge.
- Specify the name of the key variable.
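A minimal sketch of that general form, using small made-up data frames in place of the lab files. The key variable name ID and all values are assumptions for illustration only.

```r
# Hypothetical stand-ins for the lab files; the values are made up.
SchoolA <- data.frame(ID = c(1, 2, 3), HSGPA = c(3.2, 3.8, 2.9))
SchoolB <- data.frame(ID = c(1, 2, 4), CGPA = c(3.0, 3.6, 3.1),
                      Credit = c(30, 45, 15))

# Default merge: keeps only the IDs that appear in both data sets.
SchoolAB <- merge(SchoolA, SchoolB, by = "ID")

# all = TRUE keeps unmatched rows from both sides and fills the gaps with NA.
SchoolAB_all <- merge(SchoolA, SchoolB, by = "ID", all = TRUE)
```

The all argument is how merge handles rows that do not have a match in the other data set.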

Concatenating R Data Sets
Use the rbind function (binds rows). General form of an rbind:
- Define a new object.
- Call the rbind function.
- Specify the datasets.
- Make sure the datasets have the same variables.
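A minimal sketch of stacking rows, again with made-up data. The object name SchoolABC and the ID column are hypothetical.

```r
# Hypothetical stand-ins for the merged data and the extra rows; values are made up.
SchoolAB <- data.frame(ID = c(1, 2), HSGPA = c(3.2, 3.8), CGPA = c(3.0, 3.6))
SchoolC  <- data.frame(ID = c(5, 6), HSGPA = c(3.5, 2.7), CGPA = c(3.3, 2.9))

# rbind matches data-frame columns by name, so both need the same variables.
stopifnot(setequal(names(SchoolAB), names(SchoolC)))

# Stack the rows of SchoolC underneath the rows of SchoolAB.
SchoolABC <- rbind(SchoolAB, SchoolC)
```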

Create Or Delete Variables
As an example, suppose the data set SchoolAB contains the variables HSGPA and CGPA. A new variable avgGPA can be added to the data set by assigning a value to a new column name, such as SchoolAB$avgGPA. Variables can be deleted just as easily, by assigning NULL to the column.
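A minimal sketch of both operations. The data values are made up, and defining avgGPA as the simple average of the two GPAs is an assumption, since the transcript does not show the formula.

```r
# Hypothetical stand-in for the lab data; values are made up.
SchoolAB <- data.frame(ID = c(1, 2), HSGPA = c(3.2, 3.8), CGPA = c(3.0, 3.6))

# Create: assign to a new column name (simple average is an assumed definition).
SchoolAB$avgGPA <- (SchoolAB$HSGPA + SchoolAB$CGPA) / 2

# Delete: assign NULL to the column.
SchoolAB$HSGPA <- NULL
```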

Eliminating variables from an existing data set
Using subset:
- Specify the dataset.
- Use the select option to define which variables to keep in the new dataset.
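A minimal sketch of subset with the select option, on made-up data. The object names kept and dropped are hypothetical.

```r
# Hypothetical stand-in for the lab data; values are made up.
SchoolAB <- data.frame(ID = c(1, 2), HSGPA = c(3.2, 3.8),
                       CGPA = c(3.0, 3.6), Credit = c(30, 45))

# Keep only the listed variables.
kept <- subset(SchoolAB, select = c(ID, CGPA))

# A minus sign drops the listed variable and keeps the rest.
dropped <- subset(SchoolAB, select = -Credit)
```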

Bootstrapping!

Bootstrapping?
Bootstrapping is a nonparametric technique that makes no distributional assumptions about the data, such as normality. We do not rely on any seriously restrictive assumptions concerning the shape of the sampled populations.
This technique is useful when we are uncomfortable with:
- A small sample size.
- A non-normal distribution in the sample.
- A statistic whose confidence interval is complicated to calculate or has no known formula.

General Bootstrap Procedure
1. Sample with replacement from the original data, with each resample the same size N as the original.
2. Calculate the statistic of interest on the resampled data.
3. Repeat steps 1-2 for many replications (~10,000).
4. Sort the empirical values in ascending order, find the values at the 2.5th and 97.5th percentiles, and use them as lower and upper critical values for two-tailed null hypothesis significance testing. If the interval does not contain zero, then the effect is statistically significant (p < .05).
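The four steps can be sketched in base R without any package. This is a minimal illustration on a made-up sample, bootstrapping the mean; the variable names are hypothetical.

```r
set.seed(1)  # make the resampling reproducible

# Made-up sample standing in for real data.
x <- c(2.1, 3.4, 2.9, 3.8, 3.0, 2.5, 3.6, 3.3, 2.7, 3.9)
n_reps <- 10000

# Steps 1-3: resample with replacement (same N), compute the mean, repeat.
boot_means <- replicate(
  n_reps,
  mean(sample(x, size = length(x), replace = TRUE))
)

# Step 4: quantile() sorts internally and returns the 2.5th and 97.5th percentiles.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```

This gives the simple percentile interval described in step 4; other interval types adjust these cut points.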

Bootstrap Procedure In R
Install the boot package to use the boot function. The boot function requires three things:
- A dataset for resampling.
- A function that specifies the statistic returned.
- The number of bootstrap replications.

How to create a function
- Specify an object name for the function.
- Specify the arguments (parameters) fed into the function.
- Write code that uses the arguments provided.
- Return some object; it can be of any type (single value, list, dataset…).

    myfunction <- function(arg1, arg2, ...) {
      code
      return(object)
    }
    myfunction(arg1, arg2)
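A concrete instance of that template may help. The function name describe and its na.rm argument are hypothetical, chosen only to illustrate the pattern.

```r
# A function that returns the mean and standard deviation as a named vector.
describe <- function(x, na.rm = TRUE) {
  out <- c(mean = mean(x, na.rm = na.rm), sd = sd(x, na.rm = na.rm))
  return(out)
}

# Call it like any built-in function; the NA is ignored because na.rm = TRUE.
result <- describe(c(2, 4, 6, NA))
result
```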

Example Bootstrap Function
- Requires two specific parameters: the data and the indices.
- Within the function, define d, the resampled dataset.
- Write code to obtain some statistic from d.
- Return the object that serves as output.
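A minimal sketch of such a function for the mean of a numeric vector. The name boot_mean is hypothetical and the input values are made up.

```r
# boot() calls the statistic with the data and a vector of resampled indices.
boot_mean <- function(data, indices) {
  d <- data[indices]  # d is the resampled data (use data[indices, ] for a data frame)
  return(mean(d))
}

# Called directly with the indices in their original order,
# it reproduces the ordinary mean of the data.
plain <- boot_mean(c(3.0, 3.6, 3.1, 2.8), 1:4)
plain
```

Calling the function by hand like this is a quick way to check it before handing it to boot.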

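Running Bootstrap
Let’s get the bootstrapped mean for CGPA in our final school dataset.
- Use the boot function.
- Specify the dataset or variable, the function you created, and the number of replications.
- Print the results to see the original estimate, its bias, and its standard error; the individual resampled values are stored in the t component of the results.
- Plot the results to get a histogram of the resampled values.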
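A minimal sketch of this run, assuming the boot package is installed. The cgpa values are made up in place of the lab data, and boot_mean is a hypothetical name for the statistic function.

```r
library(boot)

set.seed(1)  # make the resampling reproducible

# Made-up CGPA values standing in for the lab dataset.
cgpa <- c(3.0, 3.6, 3.1, 2.8, 3.4, 3.9, 2.5, 3.2, 3.7, 3.3)

# Statistic function: mean of the resampled values.
boot_mean <- function(data, indices) {
  mean(data[indices])
}

results <- boot(data = cgpa, statistic = boot_mean, R = 10000)

print(results)    # original estimate, bias, and standard error
head(results$t)   # the first few individual bootstrap replicates

# Histogram and normal Q-Q plot of the replicates (interactive sessions only).
if (interactive()) plot(results)
```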

Getting 95% Confidence Interval
boot.ci typically requires three parameters:
- The output from the boot function (results).
- The confidence level.
- The type of bootstrap interval.
The bias-corrected and accelerated bootstrap (BCa) is recommended.
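A minimal sketch of the boot.ci call, assuming the boot package is installed. The bootstrap is rebuilt here on made-up values so the example runs on its own; cgpa and boot_mean are hypothetical names.

```r
library(boot)

set.seed(1)  # make the resampling reproducible

# Made-up CGPA values and a statistic function for the mean.
cgpa <- c(3.0, 3.6, 3.1, 2.8, 3.4, 3.9, 2.5, 3.2, 3.7, 3.3)
boot_mean <- function(data, indices) mean(data[indices])
results <- boot(data = cgpa, statistic = boot_mean, R = 10000)

# Bias-corrected and accelerated interval at the 95% level.
ci <- boot.ci(results, conf = 0.95, type = "bca")
ci

# The last two entries of the bca component are the lower and upper limits.
limits <- ci$bca[4:5]
limits
```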

Other Notes on Bootstrap
- Can return more than one bootstrapped statistic: just return a vector instead of one value from the statistic function passed to boot.
- Lab code includes an example of bootstrapped regression, for your own personal interest/use:
  - Produces bootstrapped beta weights.
  - Requires an additional formula parameter in the boot function.
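The lab's regression code is not included in this transcript, so the following is only a sketch of the idea under stated assumptions: the data are made up, boot_coefs is a hypothetical name, and the function returns raw lm coefficients (standardize the variables first if you want beta weights).

```r
library(boot)

set.seed(1)  # make the resampling reproducible

# Made-up data; the column names follow the lab's variables.
school <- data.frame(
  HSGPA = c(3.2, 3.8, 2.9, 3.5, 3.0, 3.9, 2.6, 3.4, 3.7, 3.1),
  CGPA  = c(3.0, 3.6, 3.1, 3.3, 2.8, 3.8, 2.5, 3.2, 3.5, 3.3)
)

# The extra formula argument is passed through boot()'s "..." to the statistic.
boot_coefs <- function(data, indices, formula) {
  d <- data[indices, ]          # resample rows of the data frame
  fit <- lm(formula, data = d)
  return(coef(fit))             # a vector: one value per coefficient
}

reg_results <- boot(data = school, statistic = boot_coefs, R = 2000,
                    formula = CGPA ~ HSGPA)

reg_results$t0        # coefficients from the original data
dim(reg_results$t)    # one row per replicate, one column per coefficient
```

Because the statistic returns a vector, each coefficient gets its own column of bootstrap replicates.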

Homework 3: due next lab (Sept. 21st)