Programming in R Data Management Module: Creating, Adding and Dropping Variables
Data Management Module
Importing and Exporting; Inputting data directly into R; Creating, Adding and Dropping Variables; Assigning objects; Subsetting and Formatting; Working with SAS Files; Using SQL in R
Managing Variables: Accessing Variables
There are two basic ways to access variables. You can reference the column number (the variable's position in the data set), or you can reference the variable name. PRIESTLEY/STAT4030
Managing Variables: Accessing Variables
Data frames in R are indexed like matrices, although internally they are lists of equal-length columns. Matrices have rows and columns (r x c). To access a cell: X[r,c]. To access a column: X[,c]. To access a row: X[r,].
Managing Variables: Accessing Variables
For example, this will access all rows of column 1:
fallsurvey[,1]
This will access row 1 and all columns:
fallsurvey[1,]
This will access just the first observation in the first column:
fallsurvey[1,1]
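The indexing rules above can be tried on a small stand-in data set. This is a sketch only: survey_demo and its values are made up for illustration and are not the course's fallsurvey data; only the column names are borrowed from the slides.

```r
# Hypothetical stand-in for fallsurvey; the values are invented.
survey_demo <- data.frame(
  "Sem...Year" = c("Fall 2015", "Fall 2015", "Spring 2016"),
  "Adj.GPA" = c(3.2, 2.8, 3.6),
  stringsAsFactors = FALSE
)

# Access a column by number, by name inside [ , ], and by name with $.
col_by_number <- survey_demo[, 2]
col_by_name <- survey_demo[, "Adj.GPA"]
col_by_dollar <- survey_demo$Adj.GPA

# Access a whole row (the result is a one-row data frame) and a single cell.
first_row <- survey_demo[1, ]
first_cell <- survey_demo[1, 1]
```

All three column forms return the same vector, so the choice between number and name is mostly about readability and about how stable the column order is.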
Managing Variables - Keep
There are a few different ways to specify the variables to keep in the data set. We can select columns by their column number, or we can keep columns by specifying their names.
Managing Variables - Keep
# Keep based on column number
fallsurvey1 <- fallsurvey[,1:2]
head(fallsurvey1)
str(fallsurvey1)
# Keep based on variable name
fallsurvey2 <- fallsurvey[,c("Sem...Year","Adj.GPA")]
head(fallsurvey2)
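One detail worth checking when keeping columns: selecting a single column with [ , ] returns a plain vector unless drop = FALSE is given. The sketch below uses a made-up stand-in (survey_demo, with invented values), not the real fallsurvey data.

```r
# Hypothetical stand-in for fallsurvey; the values are invented.
survey_demo <- data.frame(
  "Sem...Year" = c("Fall 2015", "Fall 2015", "Spring 2016"),
  "Adj.GPA" = c(3.2, 2.8, 3.6),
  "Drinks.before.Noon" = c(0, 1, 2),
  stringsAsFactors = FALSE
)

# Keeping one column this way returns a vector, not a data frame.
keep_one <- survey_demo[, "Adj.GPA"]

# drop = FALSE keeps the result as a one-column data frame.
keep_one_df <- survey_demo[, "Adj.GPA", drop = FALSE]

# Keeping two or more columns always returns a data frame.
keep_two <- survey_demo[, c("Sem...Year", "Adj.GPA")]
str(keep_two)
```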
Managing Variables - Drop
Similarly, there are a few different ways to specify the variables to drop from the data set. We can drop specific columns by putting a minus sign in front of the column number:
# Drop based on column number
fallsurvey1 <- fallsurvey[,-2]
head(fallsurvey1)
# Drop columns 2 and 3 by keeping the names of all the other columns
fallsurvey2 <- fallsurvey[,names(fallsurvey)[c(-2,-3)]]
head(fallsurvey2)
While variables CAN be dropped by referencing the variable name, this is more difficult and will be covered later.
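As a preview of dropping by name, here is one common base-R pattern. It is not necessarily the approach the course covers later. The data set survey_demo and its values are made up for illustration.

```r
# Hypothetical stand-in for fallsurvey; the values are invented.
survey_demo <- data.frame(
  "Sem...Year" = c("Fall 2015", "Fall 2015", "Spring 2016"),
  "Adj.GPA" = c(3.2, 2.8, 3.6),
  "Drinks.before.Noon" = c(0, 1, 2),
  "Drinks.after.Noon" = c(3, 0, 4),
  stringsAsFactors = FALSE
)

# Keep every column whose name is NOT in the drop list.
drop_names <- c("Adj.GPA", "Drinks.after.Noon")
dropped <- survey_demo[, !(names(survey_demo) %in% drop_names), drop = FALSE]

# Alternative: remove a single column from a copy by assigning NULL to it.
survey_copy <- survey_demo
survey_copy$Adj.GPA <- NULL

names(dropped)
names(survey_copy)
```

A negative sign does not work with character names (survey_demo[, -"Adj.GPA"] is an error), which is why the logical test with %in% is used instead.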
Managing Variables - Renaming
To rename a variable, use the "names" function. Here you call the function on the data set, then pick out the element of the names vector (the column name) that you are renaming:
names(fallsurvey)[names(fallsurvey)=="Sem...Year"] <- "Sem/Year"
str(fallsurvey)
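The rename can be checked on a small stand-in data set (survey_demo, with invented values, not the real fallsurvey). Note that "Sem/Year" is not a syntactically valid R name because of the slash, so after renaming the column has to be quoted when it is accessed.

```r
# Hypothetical stand-in for fallsurvey; the values are invented.
survey_demo <- data.frame(
  "Sem...Year" = c("Fall 2015", "Fall 2015", "Spring 2016"),
  "Adj.GPA" = c(3.2, 2.8, 3.6),
  stringsAsFactors = FALSE
)

# Replace only the name that matches "Sem...Year"; other names are untouched.
names(survey_demo)[names(survey_demo) == "Sem...Year"] <- "Sem/Year"

# Because of the slash, use backticks with $ or a quoted name with [[ ]].
sem_by_dollar <- survey_demo$`Sem/Year`
sem_by_bracket <- survey_demo[["Sem/Year"]]
str(survey_demo)
```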
Managing Variables - New variables
Variables can be easily created from other variables available in the data set. Variables can also be created in the context of SQL statements. Care must be taken to ensure the resulting variable appears in the expected location (for example, as a column of the data frame rather than as a separate object in the workspace).
Managing Variables - New variables
Think about the difference between these two sets of code…
# Set 1
total.drinks <- fallsurvey$Drinks.before.Noon+fallsurvey$Drinks.after.Noon
head(total.drinks)
ls()
# Set 2
fallsurvey$total.drinks <- fallsurvey$Drinks.before.Noon+fallsurvey$Drinks.after.Noon
head(fallsurvey$total.drinks)
head(fallsurvey)
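The difference between the two sets of code can be made visible with a small stand-in data set (survey_demo, with invented values, not the real fallsurvey). The first assignment creates a free-standing vector in the workspace. The second adds a new column to the data frame.

```r
# Hypothetical stand-in for fallsurvey; the values are invented.
survey_demo <- data.frame(
  "Drinks.before.Noon" = c(0, 1, 2),
  "Drinks.after.Noon" = c(3, 0, 4)
)

# Set 1: the result is a separate object; the data frame is unchanged.
total.drinks <- survey_demo$Drinks.before.Noon + survey_demo$Drinks.after.Noon
in_df_after_set1 <- "total.drinks" %in% names(survey_demo)   # FALSE
in_workspace <- exists("total.drinks")                        # TRUE

# Set 2: the result is stored as a new column of the data frame.
survey_demo$total.drinks <- survey_demo$Drinks.before.Noon + survey_demo$Drinks.after.Noon
in_df_after_set2 <- "total.drinks" %in% names(survey_demo)   # TRUE

head(survey_demo)
```

The free-standing vector is not carried along if the data frame is later subset, sorted, or exported, which is usually the reason to prefer the second form.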