
1 Programming in R: Subset, Sort, and Format Data

2 In this session, I will introduce three topics: Subsetting the observations in a data frame. Sorting a data frame. Formatting values in a data frame.

3 Subset There are different ways to subset a data frame, that is, to select rows based on some criteria. First, check the values of the variable. > pricedata$region==3 This line of R code returns a logical vector with one value per row: TRUE where the value of region is 3 and FALSE otherwise.

4 Subset Then select all the rows of the data frame for which the criterion is true. > pricedata[pricedata$region==3,] This line of R code returns all rows of pricedata where region is 3; leaving the position after the comma empty keeps every column. Using ‘<-’ we can assign the result to a new object.
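The two steps above can be sketched as follows. The slides do not show the contents of pricedata, so the small data frame here, including the line and cost columns and every value in it, is a hypothetical stand-in.

```r
# Hypothetical stand-in for pricedata; the real data are not shown in the slides.
pricedata <- data.frame(
  region = c(1, 3, 2, 3, 1),
  line   = c(4, 4, 2, 1, 4),
  cost   = c(250, 180, 320, 95, 410)
)

# Step 1: the comparison yields one logical value per row.
in_region3 <- pricedata$region == 3
print(in_region3)

# Step 2: use the logical vector as the row index. The empty slot
# after the comma means "keep all columns".
region3 <- pricedata[in_region3, ]
print(region3)
```

Storing the logical vector in its own object is optional; it is done here only to make the intermediate result visible.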

5 Subset We can also use the function subset(data.frame, criteria). > subset(pricedata, region==3) This line of R code does the same thing as the bracket form described previously; inside subset() the column can be named without the pricedata$ prefix. Again, we can assign the result to a new object with > newpricedata <- subset(pricedata, region==3)
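One caveat is worth a short sketch: the bracket form and subset() agree when the variable has no missing values, but they treat NA differently. The data frame below is hypothetical and includes one missing region on purpose.

```r
# Hypothetical data with one missing region value.
pricedata <- data.frame(
  region = c(1, 3, NA, 3),
  cost   = c(250, 180, 320, 95)
)

# Bracket indexing: an NA in the logical index produces a row of NAs.
by_bracket <- pricedata[pricedata$region == 3, ]

# subset(): rows where the condition is NA are dropped.
by_subset <- subset(pricedata, region == 3)

# which() keeps only the positions that are TRUE, so the bracket
# form then matches subset().
by_which <- pricedata[which(pricedata$region == 3), ]

print(nrow(by_bracket))
print(nrow(by_subset))
print(nrow(by_which))
```

If the data may contain missing values, subset() or which() is usually the safer choice.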

6 Subset We can also check multiple criteria using the and/or operators. & is “and.” | is “or.” We can select records when region==3 and line==4. This returns only the rows that satisfy both conditions, the intersection of the two selections. > subset(pricedata, region == 3 & line==4)

7 Subset We can also select records when region is 3 or line is 4. –This criterion returns at least as many records, since it does not require both conditions to be true at the same time. –The result is the union of the two selections. > subset(pricedata, region == 3 | line==4)
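A minimal sketch comparing the two operators on the same data. The data frame is a hypothetical stand-in for pricedata, with invented values.

```r
# Hypothetical stand-in for pricedata.
pricedata <- data.frame(
  region = c(1, 3, 2, 3, 1),
  line   = c(4, 4, 2, 1, 4),
  cost   = c(250, 180, 320, 95, 410)
)

# "and": both conditions must hold for a row to be kept.
both <- subset(pricedata, region == 3 & line == 4)

# "or": a row is kept if at least one condition holds.
either <- subset(pricedata, region == 3 | line == 4)

print(both)
print(either)
```

Note that the single-character forms & and | work element by element, which is what row selection needs; the doubled forms && and || are meant for single TRUE/FALSE values, as in an if() condition.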

8 Sorting If you are new to R, the natural function to consider is sort(), but sort() only sorts a single vector and returns its values. To sort a data frame, the better function is order(), which returns the row positions in sorted order. The call is order(variable(s), decreasing=FALSE). To sort the data frame, place the call in the row index: data.frame[order(variables),]

9 Sorting If we want to sort the price data by the cost of the devices, we would use > pricedata[order(pricedata$cost),] We can also sort by multiple variables; later variables break ties in earlier ones. > pricedata[order(pricedata$region, pricedata$cost),]
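The difference between sort() and order() can be sketched as below, again on a hypothetical stand-in for pricedata with invented values.

```r
# Hypothetical stand-in for pricedata.
pricedata <- data.frame(
  region = c(1, 3, 2, 3, 1),
  line   = c(4, 4, 2, 1, 4),
  cost   = c(250, 180, 320, 95, 410)
)

# sort() returns the sorted values of one vector; the link to the
# other columns is lost.
sorted_cost <- sort(pricedata$cost)

# order() returns row positions, which can be used as the row index.
by_cost <- pricedata[order(pricedata$cost), ]

# Sort by region first, then by cost within each region.
by_region_cost <- pricedata[order(pricedata$region, pricedata$cost), ]

# Largest cost first.
by_cost_desc <- pricedata[order(pricedata$cost, decreasing = TRUE), ]

print(by_cost)
print(by_region_cost)
print(by_cost_desc)
```

The sorted data frames keep their original row names, which makes it easy to see where each row came from.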

10 Labels A label can be used to give a categorical variable that is coded with numbers a meaningful description. For instance, the data may have region coded as 1. The value 1 really means “East”, so we want to label the value 1 as East.

11 Labels Labels are stored as part of an object's attributes. To check the attributes, use the function attributes(). Use the function str() to see how the variable is stored.
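A short sketch of both inspection functions, using a hypothetical stand-in for pricedata. A plain numeric column carries no attributes, which is why the region codes have no labels yet.

```r
# Hypothetical stand-in for pricedata.
pricedata <- data.frame(
  region = c(1, 3, 2, 3, 1),
  cost   = c(250, 180, 320, 95, 410)
)

# A data frame has attributes: its column names, class, and row names.
df_attrs <- attributes(pricedata)
print(names(df_attrs))

# A plain numeric vector has none, so attributes() returns NULL.
region_attrs <- attributes(pricedata$region)
print(region_attrs)

# str() shows the storage type of each column in compact form.
str(pricedata)
```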

12 Labels In R, there are two functions to use. If the data are nominal, use the function > factor(variable, levels, labels) where levels lists the coded values and labels gives the description for each one, in the same order. If the data are ordinal, use the function > ordered(variable, levels, labels)
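A minimal sketch of both functions. Only the label "East" for region 1 comes from the slides; the labels for regions 2 and 3, and the whole rating variable, are invented for illustration.

```r
# Nominal data: region codes with no natural order.
# "East" for code 1 is from the slides; "Central" and "West" are assumed.
region <- c(1, 3, 2, 3, 1)
region_f <- factor(region,
                   levels = c(1, 2, 3),
                   labels = c("East", "Central", "West"))
print(region_f)
print(table(region_f))

# Ordinal data: a hypothetical rating where 1 < 2 < 3 is meaningful.
rating <- c(2, 1, 3, 2)
rating_o <- ordered(rating,
                    levels = c(1, 2, 3),
                    labels = c("Low", "Medium", "High"))
print(rating_o)

# Comparisons are defined for ordered factors but not for plain factors.
below_high <- rating_o < "High"
print(below_high)

# The labels are now stored in the attributes of the variable.
print(attributes(region_f))
```

The design choice matters later: modelling and plotting functions treat ordered factors differently from unordered ones, so ordered() should be reserved for variables whose levels really have a ranking.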

