Data Management Module: Concatenating, Stacking, Merging and Recoding


1 Data Management Module: Concatenating, Stacking, Merging and Recoding
Programming in R Data Management Module: Concatenating, Stacking, Merging and Recoding

2 Data Management Module
Topics: Importing and Exporting; Inputting data directly into R; Creating, Adding and Dropping Variables; Assigning objects; Subsetting and Formatting; Working with SAS Files; Using SQL in R; Concatenating, Stacking and Merging; Replacing values

3 Data Management: SAS Files in R
You have likely noticed that SAS files have a .sas7bdat extension. To get these files into R, you must: 1) Convert them into xport files in SAS. 2) Install the Hmisc package in R (or the foreign package); note that package names are case-sensitive. 3) Use the sasxport.get function from Hmisc (or read.xport from foreign) to read the SAS data into an R dataframe.
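The last step can be sketched as follows. This is a minimal example, assuming the foreign package and its read.xport function; the helper name read_sas_xport and the file name "mydata.xpt" are hypothetical and stand in for whatever xport file you exported from SAS.

```r
# Sketch only: assumes the data were already exported from SAS in XPORT format.
library(foreign)  # ships with standard R installations

# Hypothetical helper: checks that the file exists before reading it.
read_sas_xport <- function(path) {
  if (!file.exists(path)) {
    stop("XPORT file not found: ", path)
  }
  # Returns a dataframe (or a list of dataframes if the file holds several datasets).
  read.xport(path)
}

# Usage, with a hypothetical file name:
# sasdata <- read_sas_xport("mydata.xpt")
```

With Hmisc installed, sasxport.get("mydata.xpt") could be used in place of read.xport.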

4 Data Management: Concatenating
To “concatenate” basically means to bring together columns (vectors) of data side by side. In R, this is accomplished through the function cbind: Newdata <- cbind(data1, data2) The result has as many columns as data1 and data2 combined. Note that data1 and data2 should have the same number of rows, because rows are paired by position; a “matchkey” is not needed.
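A small illustration of cbind, using made-up dataframes (the column names and values are invented for this sketch):

```r
# Two toy dataframes with the same number of rows (3)
data1 <- data.frame(id = 1:3, height = c(150, 160, 170))
data2 <- data.frame(weight = c(50, 60, 70), sex = c("F", "M", "F"))

# Bind the columns side by side; rows are paired by position, not by a key
Newdata <- cbind(data1, data2)

ncol(Newdata)   # 2 + 2 = 4 columns
nrow(Newdata)   # still 3 rows
names(Newdata)  # "id" "height" "weight" "sex"
```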

5 Data Management: Stacking
To “stack” basically means to bring together rows of data. In R, this is accomplished through the function rbind: Newdata <- rbind(data1, data2) The result has as many rows as data1 and data2 combined. Note that data1 and data2 MUST have the same column names. Note that a “matchkey” is not needed.
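A small illustration of rbind, using made-up dataframes (the column names and values are invented for this sketch). For dataframes, rbind matches columns by name, so the column order may differ as long as the names agree:

```r
# Two toy dataframes with the same column names (in a different order)
data1 <- data.frame(id = 1:2, score = c(90, 85))
data2 <- data.frame(score = c(70, 60, 75), id = 3:5)

# Stack the rows; columns are matched by name
Newdata <- rbind(data1, data2)

nrow(Newdata)  # 2 + 3 = 5 rows
ncol(Newdata)  # still 2 columns
```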

6 Data Management: Merging
To “merge” basically means to bring together dataframes by matching rows on a shared key. In R, this is accomplished through the function merge: Newdata <- merge(data1, data2, by="PrimaryKey", all=TRUE) Note that all=TRUE (the logical value, not the string "TRUE") keeps every row from both data1 and data2 and fills the unmatched columns with NA; this is essentially an outer join. all=FALSE (the default) keeps only rows whose key appears in both data1 and data2; this is essentially an inner join. In both cases the result contains the columns of both dataframes. Note that a “matchkey” IS needed.
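The difference between the two joins can be seen on a small made-up example (the key values and the columns x and y are invented for this sketch):

```r
# Keys 2 and 3 appear in both dataframes; key 1 only in data1, key 4 only in data2
data1 <- data.frame(PrimaryKey = c(1, 2, 3), x = c("a", "b", "c"))
data2 <- data.frame(PrimaryKey = c(2, 3, 4), y = c(10, 20, 30))

# Outer join: every key from either dataframe, NA where there is no match
outer_join <- merge(data1, data2, by = "PrimaryKey", all = TRUE)

# Inner join: only keys present in both dataframes
inner_join <- merge(data1, data2, by = "PrimaryKey", all = FALSE)

nrow(outer_join)  # 4 rows (keys 1, 2, 3, 4)
nrow(inner_join)  # 2 rows (keys 2, 3)
```

merge also accepts all.x=TRUE or all.y=TRUE to keep unmatched rows from only one side (a left or right join).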

7 Data Management: Recoding values
Frequently when you receive data either a) the values are incorrect or b) they are “coded” with designated values – like 999. These values need to be replaced. This process is called recoding or imputation – depending on the logic behind the replacement. These notes will cover recoding (simply replacing one value with another). Later notes will cover statistical imputation methods.

8 Data Management: Recoding values
At this point, let's recode values using the same logic you would use in Excel: IF(condition, value if true, value if false) In R: newvariable <- ifelse(test on oldvariable, value if true, value if false), where the test is a logical condition such as oldvariable == 999.
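A minimal sketch of recoding with ifelse, assuming a made-up variable age in which 999 is the designated code to be replaced (here with NA; any other replacement value works the same way):

```r
# Toy data: 999 is a designated code, not a real age
age <- c(34, 999, 27, 999, 51)

# Where age equals 999, use NA; otherwise keep the original value
age_clean <- ifelse(age == 999, NA, age)

age_clean  # 34 NA 27 NA 51
```

ifelse works element by element, so the whole column is recoded in one call and the original variable is left unchanged.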

