DATA MANAGEMENT MODULE: Subsetting and Formatting

1 DATA MANAGEMENT MODULE: Subsetting and Formatting
STAT 4030 – Programming in R
DATA MANAGEMENT MODULE: Subsetting and Formatting
Jennifer Lewis Priestley, Ph.D.
Kennesaw State University

2 DATA MANAGEMENT MODULE
Importing and Exporting
Inputting data directly into R
Creating, Adding and Dropping Variables
Assigning objects
Subsetting and Formatting
Working with SAS Files
Using SQL in R

3 Data Management: Subsetting
There are different ways to subset a data frame, that is, to select rows based on some criterion. First, check the values of the variable:
    pricedata$region==3
This line of R code returns a logical vector with one element per row: TRUE where the value of region is 3 and FALSE otherwise.
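To make the logical vector concrete, here is a minimal sketch. The small pricedata data frame is a hypothetical stand-in with made-up values; only the column names region, line and cost come from the slides.

```r
# Hypothetical toy data standing in for pricedata (values are made up)
pricedata <- data.frame(
  region = c(1, 3, 2, 3, 1),
  line   = c(4, 4, 2, 1, 4),
  cost   = c(120, 95, 210, 150, 80)
)

# Element-wise comparison: one TRUE/FALSE per row
is_region3 <- pricedata$region == 3
print(is_region3)      # FALSE  TRUE FALSE  TRUE FALSE

# TRUE counts as 1, so sum() gives the number of matching rows
n_match <- sum(is_region3)      # 2

# which() converts the logical vector into row positions
positions <- which(is_region3)  # 2 4
```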

4 Data Management: Subsetting
Then select all the rows of the data frame for which the criterion is true:
    pricedata[pricedata$region==3,]
This line of R code returns all rows of pricedata where region is 3. The comma after the condition is required: the condition selects rows, and the empty position after the comma keeps every column. Using ‘<-’ we can assign the result to a new object:
    Newdata<-pricedata[pricedata$region==3,]
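A short sketch of bracket subsetting follows, again on a hypothetical toy version of pricedata with made-up values. It also shows that the position after the comma can name the columns to keep.

```r
# Hypothetical toy data standing in for pricedata (values are made up)
pricedata <- data.frame(
  region = c(1, 3, 2, 3, 1),
  line   = c(4, 4, 2, 1, 4),
  cost   = c(120, 95, 210, 150, 80)
)

# Rows go before the comma, columns after it; an empty slot means "all"
Newdata <- pricedata[pricedata$region == 3, ]
nrow(Newdata)       # 2
rownames(Newdata)   # "2" "4": the original row names are kept

# Same rows, but only the region and cost columns
region3_cost <- pricedata[pricedata$region == 3, c("region", "cost")]
```

The original data frame is not modified; the subset only exists in the new object.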

5 Data Management: Subsetting
We can also use the function subset(data.frame, criteria):
    subset(pricedata, region==3)
This line of R code returns the rows where region is 3, as in the bracket form described previously; inside subset() the column can be named without the pricedata$ prefix. Again, we can assign the result to a new object:
    newpricedata <- subset(pricedata, region==3)
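One practical difference between the two forms shows up when the column contains missing values. The sketch below uses a hypothetical toy data frame (made-up values, with one NA inserted on purpose) to compare them.

```r
# Hypothetical toy data with one missing region (values are made up)
pricedata <- data.frame(
  region = c(1, 3, NA, 3, 1),
  cost   = c(120, 95, 210, 150, 80)
)

# Bracket form: an NA in the condition yields an extra row filled with NA
by_bracket <- pricedata[pricedata$region == 3, ]
nrow(by_bracket)   # 3

# subset(): rows where the condition is NA are dropped
by_subset <- subset(pricedata, region == 3)
nrow(by_subset)    # 2

# subset() can also pick columns through its select argument
cost_only <- subset(pricedata, region == 3, select = cost)
```

If the bracket form is preferred, wrapping the condition in which() drops the NA rows as well.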

6 Data Management: Subset
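We can also check for multiple criteria using the logical operators & (“and”) and | (“or”). We can select records where region==3 and line==4. Only rows that satisfy both conditions are returned, that is, the intersection of the two sets of rows. (This is a logical AND on the rows of one data frame; the term “inner join” properly refers to combining two tables.)
    subset(pricedata, region == 3 & line==4)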

7 Data Management: Subset
We can also select records where region is 3 or line is 4. This condition returns at least as many records, since it requires only one of the two criteria to be true. The result is the union of the two sets of rows (a logical OR on one data frame, not an “outer join”, which is an operation on two tables).
    subset(pricedata, region == 3 | line==4)
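The two operators can be compared side by side. This is a minimal sketch on a hypothetical toy version of pricedata; the values are made up for illustration.

```r
# Hypothetical toy data standing in for pricedata (values are made up)
pricedata <- data.frame(
  region = c(1, 3, 2, 3, 1),
  line   = c(4, 4, 2, 1, 4),
  cost   = c(120, 95, 210, 150, 80)
)

# & keeps rows where BOTH conditions hold
both <- subset(pricedata, region == 3 & line == 4)
nrow(both)     # 1

# | keeps rows where AT LEAST ONE condition holds
either <- subset(pricedata, region == 3 | line == 4)
nrow(either)   # 4

# Every row selected with & is also selected with |
all(rownames(both) %in% rownames(either))   # TRUE
```

Note that & and | work element by element on whole columns. The doubled forms && and || are meant for single TRUE/FALSE values and are not suitable for row selection.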

8 Data Management: Sorting
If you are new to R, the natural function to consider is sort(). However, sort() only sorts a single vector and cannot reorder the rows of a data frame. The better function to use is order(), which returns the row positions in sorted order:
    order(variable(s), decreasing=FALSE)
To sort the data frame, place the call to order() in the row index:
    data.frame[order(variables),]

9 Data Management: Sorting
If we want to sort the price data by the cost of the devices, then:
    pricedata[order(pricedata$cost),]
We can also sort by multiple variables; rows are ordered by the first variable, and ties are broken by the next:
    pricedata[order(pricedata$region, pricedata$cost),]
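The sketch below shows what order() actually returns and how it differs from sort(). The toy pricedata is hypothetical, with made-up values.

```r
# Hypothetical toy data standing in for pricedata (values are made up)
pricedata <- data.frame(
  region = c(1, 3, 2, 3, 1),
  cost   = c(120, 95, 210, 150, 80)
)

# sort() returns the sorted values of one vector
sort(pricedata$cost)               # 80 95 120 150 210

# order() returns the row positions that would sort the vector
idx <- order(pricedata$cost)       # 5 2 1 4 3
by_cost <- pricedata[idx, ]

# Descending order
idx_desc <- order(pricedata$cost, decreasing = TRUE)    # 3 4 1 2 5

# Several keys: region first, then cost within each region
idx_multi <- order(pricedata$region, pricedata$cost)    # 5 1 3 2 4
by_region_cost <- pricedata[idx_multi, ]

# For numeric keys, a minus sign reverses just that key
idx_mixed <- order(pricedata$region, -pricedata$cost)   # 1 5 3 4 2
```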

10 Data Management: Labels
A label gives a categorical variable that is coded with numbers a meaningful description. For instance, the data may have region coded as 1. The value 1 really means “East”, so we want to label the value 1 as East. Labels are stored as part of an object’s attributes, which can be inspected with the function attributes(). Use the function str() to see a variable’s structure and how it is stored.
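As a rough illustration of where labels live, the sketch below inspects a plain numeric vector and then a labelled version built with factor(). The region codes are made up, and the labels for codes 2 and 3 ("Central", "West") are assumptions; the slides only state that 1 means East.

```r
# Hypothetical region codes (made-up values)
region <- c(1, 3, 2, 3, 1)

# A plain numeric vector carries no attributes
attributes(region)   # NULL
str(region)          # reports a numeric vector

# Attach labels; "Central" and "West" are assumed for illustration
region_f <- factor(region, levels = c(1, 2, 3),
                   labels = c("East", "Central", "West"))

# The labels are now stored in the "levels" attribute
attr_names <- names(attributes(region_f))
levels(region_f)     # "East" "Central" "West"
str(region_f)        # reports a factor with 3 levels

# Internally the values are stored as integer codes
codes <- as.integer(region_f)   # 1 3 2 3 1
```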

11 Data Management: Labels
In R, there are two functions to use:
If the data is nominal, use the function factor:
    factor(variable, levels = values, labels = labels)
If the data is ordinal, use the function ordered:
    ordered(variable, levels = values, labels = labels)
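A minimal sketch of both functions follows. The region labels other than "East" and the whole size variable are hypothetical, introduced only to illustrate the difference between nominal and ordinal data.

```r
# Nominal: region codes with no natural order
# ("Central" and "West" are assumed labels; only "East" is from the slides)
region <- c(1, 3, 2, 3, 1)
region_f <- factor(region, levels = c(1, 2, 3),
                   labels = c("East", "Central", "West"))
as.character(region_f)   # "East" "West" "Central" "West" "East"
table(region_f)          # counts per label

# Ordinal: a hypothetical size variable where 1 < 2 < 3 is meaningful
size <- c(2, 1, 3, 2)
size_o <- ordered(size, levels = c(1, 2, 3),
                  labels = c("Small", "Medium", "Large"))
is.ordered(size_o)       # TRUE

# Ordered factors support comparisons between levels
below_large <- size_o < "Large"   # TRUE TRUE FALSE TRUE
```

The order of the levels argument defines the ranking for ordered(), so the values should be listed from lowest to highest.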

