
R Course 3rd lecture.



Presentation on theme: "R Course 3rd lecture."— Presentation transcript:

1 R Course 3rd lecture

2 Topics to cover Last class we covered creating and naming vectors and using vectors for operations This class we will cover: Vector Subsetting, Matrices, dataframes and arrays

3 Vector subsetting
> remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)
> remain[1]
spades
    11
> remain[3]
diamonds
      11

4 Further subsetting
> remain["spades"]
spades
    11
> remain[c(1, 4)]
spades  clubs
    11     13
> remain[c(4, 1)]
 clubs spades
    13     11

5
> remain[-1]
All elements except index 1 are returned
> remain[-c(1, 2)]
> remain[-"spades"]
Error in -"spades" : invalid argument to unary operator
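Because a name cannot be negated, dropping an element by name needs a different route. The lines below are a minimal sketch of two common ways to do it with base R; the variable names without_first, without_spades and without_two are made up for this illustration.

```r
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

# Negative indices drop elements by position
without_first <- remain[-1]

# Names cannot be negated, so compare against names() instead
without_spades <- remain[names(remain) != "spades"]

# setdiff() removes several names at once
without_two <- remain[setdiff(names(remain), c("spades", "hearts"))]

without_spades
without_two
```

Both approaches build a selection that keeps everything except the unwanted names, which is what the minus sign does for positions.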

6 Subsetting using logical vectors
> remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)
> remain[c(FALSE, TRUE, FALSE, TRUE)]
hearts  clubs
    12     13
> selection_vector <- c(FALSE, TRUE, FALSE, TRUE)
> remain[selection_vector]
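A logical selection vector does not have to be typed by hand. As a small sketch, a relational operator can generate it from the data; the threshold of 11 and the names more_than_eleven and n_more are chosen only for this example.

```r
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

# A comparison returns one TRUE/FALSE per element
selection_vector <- remain > 11

# Use the logical vector inside square brackets to keep the TRUE positions
more_than_eleven <- remain[selection_vector]

# TRUE counts as 1, so sum() counts the matches
n_more <- sum(selection_vector)

more_than_eleven
n_more
```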

7
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Poker results of Wednesday:
poker_wednesday
# Roulette results of Friday:
roulette_friday

8 Task
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Poker results of Wednesday:
poker_wednesday
# Roulette results of Friday:
roulette_friday

9
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Mid-week poker results:
poker_midweek
# End-of-week roulette results:
roulette_endweek

10
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Create logical vector corresponding to profitable poker days:
selection_vector
# Select amounts for profitable poker days:
poker_profits

11 More advanced subsetting
Player:
House:
# Select the player's score for the third game:
player_third
# Select the scores where player exceeds house:
winning_scores
# Count number of times player < 18:
n_low_score
Hints:
With square brackets, select the player's score for the third game, using any of the techniques that you've learned about. Store the result in player_third.
Subset the player vector to select only the scores that exceeded the scores of house, that is, the games the player won. Use subsetting in combination with the relational operator >. Assign the subset to the variable winning_scores.
Count the number of times the score inside player was lower than 18. This time, use a relational operator in combination with sum(). Save the resulting value in a new variable, n_low_score.
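The pattern behind these hints can be sketched on made-up data. The scores below are hypothetical (the slide's actual player and house values are not reproduced in this transcript), and the names player_demo, house_demo, third, wins and n_low are illustrative only.

```r
# Hypothetical scores, NOT the values from the slide
player_demo <- c(14, 17, 20, 21, 20, 18, 14)
house_demo  <- c(20, 15, 21, 20, 20, 17, 19)

# Positional subsetting: the third game
third <- player_demo[3]

# Element-wise comparison gives a logical vector used as the selector
wins <- player_demo[player_demo > house_demo]

# sum() over a logical vector counts the TRUE values
n_low <- sum(player_demo < 18)

third   # 20
wins    # 17 21 18
n_low   # 3
```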

12 Matrix
2-dimensional, 1 type
We briefly looked at creating a matrix using matrix(data = c(1:9), nrow = 3, ncol = 3).
Now we'll look at creating:
        One Two
Alpha   245 304
Bravo   178 257
Charlie 314 260

13
Alpha <- c(245, 304)
Bravo <- c(178, 257)
Charlie <- c(314, 260)
ABCmatrix <- rbind(Alpha, Bravo, Charlie)
colnames(ABCmatrix) <- c("One", "Two")
Another way of naming is dimnames(ABCmatrix) <- list(rowname, colname), where rowname and colname are vectors holding the row and column names.
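As a rough sketch of the dimnames() route, the same matrix can be built with matrix() and named in one step. The vectors rowname and colname follow the placeholder names used on the slide; ABCmatrix2 is a name invented here to show an equivalent one-call form.

```r
# Same data as the slide, filled row by row
ABCmatrix <- matrix(c(245, 304, 178, 257, 314, 260), nrow = 3, byrow = TRUE)

rowname <- c("Alpha", "Bravo", "Charlie")
colname <- c("One", "Two")

# dimnames() takes a list: row names first, column names second
dimnames(ABCmatrix) <- list(rowname, colname)

# Equivalent: pass the names when the matrix is created
ABCmatrix2 <- matrix(c(245, 304, 178, 257, 314, 260), nrow = 3, byrow = TRUE,
                     dimnames = list(rowname, colname))

ABCmatrix
```

The order inside the list matters: swapping rowname and colname here would fail, because the lengths would no longer match the 3 x 2 shape.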

14 Subsetting matrices
Matrix[row, col]
E.g.
ABCmatrix[1, ]
ABCmatrix[2, 2]
ABCmatrix[c(1, 3), ]
ABCmatrix[-3, ]
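To see what each of these calls returns, here is a small runnable sketch. It rebuilds the matrix so the block is self-contained, and it adds one call not on the slide, drop = FALSE, which is a standard argument of the [ operator that keeps a single row as a matrix.

```r
ABCmatrix <- rbind(Alpha = c(245, 304), Bravo = c(178, 257), Charlie = c(314, 260))
colnames(ABCmatrix) <- c("One", "Two")

# A whole row: the result is simplified to a named vector
first_row <- ABCmatrix[1, ]

# A single cell
single <- ABCmatrix[2, 2]

# Rows 1 and 3, all columns: still a matrix
rows_1_3 <- ABCmatrix[c(1, 3), ]

# Everything except row 3
not_charlie <- ABCmatrix[-3, ]

# drop = FALSE keeps a one-row selection as a 1 x 2 matrix
first_row_matrix <- ABCmatrix["Alpha", , drop = FALSE]

first_row
single
```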

15 Task
A vector box is defined that represents the box office numbers from the first three Star Wars movies (A New Hope, Empire Strikes Back, and Return of the Jedi). The first, third and fifth elements correspond to the US box office revenue for the movies; the second, fourth and sixth elements represent the non-US box office revenue.
box <- c( , 314.4, , , , 165.8)
Construct a matrix with one row for each movie. The first column is for the US box office revenue, and the second column for the non-US box office revenue. Name the matrix star_wars_matrix.

16 Another method
new_hope <- c(460.998, 314.4)
empire_strikes <- c( , )
return_jedi <- c( , 165.8)
Use rbind() to create star_wars_matrix, and name the columns and rows.

17 Which options are correct?
Hint: If you're stuck, try them out to see what they do.
Answer choices: A; A & C; B & D; D
# option A
star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
rownames(star_wars_matrix) <- c("US", "non-US")
colnames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
# option B
col <- c("US", "non-US")
row <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
rbind(new_hope, empire_strikes, return_jedi, names = c(col, row))
# option C
col <- c("US", "non-US")
row <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
star_wars_matrix <- matrix(c(new_hope, empire_strikes, return_jedi), byrow = TRUE, nrow = 3, dimnames = list(col, row))
# option D
col <- c("US", "non-US")
row <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
star_wars_matrix <- matrix(c(new_hope, empire_strikes, return_jedi), byrow = TRUE, nrow = 3, dimnames = list(row, col))

18 The answer was D! Options A and C both swap the row and column names, so they generate an error. Option B does not work either: rbind() has no names argument, so the names vector is bound as an extra row instead of naming anything.

19 Practicing subsetting
Using star_wars_matrix:
# Select all US box office revenue
# Select revenue for "A New Hope"
# Calculate the average non-US revenue per movie:
non_us_all
# Calculate the average non-US revenue of first two movies:
non_us_some
# Create a submatrix for all figures for "A New Hope" and "Return of the Jedi"

20 Using names and logical vectors to subset
Using names, select the US revenues for "A New Hope" and "The Empire Strikes Back". Using logical vectors, select the last two rows and both columns from star_wars_matrix. Finally, select the non-US revenue for "The Empire Strikes Back" with whatever technique you like.

21 Which one of these calls selects the total revenue for the 2nd, 4th and 6th movie in the matrix?
Answer choices: A & B; All 4; A & D; C & D; Only B
all_wars_matrix
     US non-US total
ANH      314.4
ESB      247.9
ROTJ     165.8
TPM      552.5
AOTC     338.7
ROTS     468.5
# option A:
all_wars_matrix[seq(2, 6, by = 2), "total"]
# option B:
all_wars_matrix[c(F,T,F,T,F,T), c(F,T)]
# option C:
all_wars_matrix[c("The Empire Strikes Back", 4, 6), c(T,T,F)]
# option D:
all_wars_matrix[c(F,T), "total"]

22 The answer was A & D!
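The reason A and D agree is that a logical index shorter than the dimension is recycled. The sketch below uses a hypothetical 6 x 3 stand-in matrix called demo, filled with the integers 1 to 18 rather than real revenue figures, to show the recycling on both rows and columns.

```r
# Hypothetical stand-in for all_wars_matrix; values are 1..18, not revenues
demo <- matrix(1:18, nrow = 6,
               dimnames = list(c("ANH", "ESB", "ROTJ", "TPM", "AOTC", "ROTS"),
                               c("US", "non-US", "total")))

# Option A style: explicit positions 2, 4, 6
by_position <- demo[seq(2, 6, by = 2), "total"]

# Option D style: c(FALSE, TRUE) is recycled over the six rows
by_recycling <- demo[c(FALSE, TRUE), "total"]

# Option B style: c(FALSE, TRUE) over three columns recycles to
# FALSE, TRUE, FALSE, so it selects "non-US" rather than "total"
option_b <- demo[c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE), c(FALSE, TRUE)]

by_position
by_recycling
option_b
```

Option C fails for a different reason: mixing a name with numbers inside c() coerces everything to character, so 4 and 6 are looked up as row names "4" and "6".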

23 Dataframe
2-dimensional, any type
df <- data.frame(name = c("Matt", "Joe", "Chris"),
                 age = c(52, 29, 25),
                 relationshipStatus = c("married", "single", "married"))
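A short sketch of how such a data frame can be inspected and subset follows. It passes stringsAsFactors = FALSE explicitly, which is an addition to the slide's call so that text columns stay character in every R version; the names ages, married and over_26_names are illustrative.

```r
df <- data.frame(name = c("Matt", "Joe", "Chris"),
                 age = c(52, 29, 25),
                 relationshipStatus = c("married", "single", "married"),
                 stringsAsFactors = FALSE)  # keep text as character

# Compact overview: number of observations, variables and column types
str(df)

# One column as a vector
ages <- df$age

# Rows matching a condition, all columns (same [row, col] idea as matrices)
married <- df[df$relationshipStatus == "married", ]

# Rows matching a condition, one column
over_26_names <- df[df$age > 26, "name"]

married
over_26_names
```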

24 Task
planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532, , 9.449, 4.007, 3.883)
rotation <- c(58.64, , 1, 1.03, 0.41, 0.43, -0.72, 0.67)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
Create a dataframe from these vectors.
Use str() to make sure that you've actually created a data frame with 8 observations and 5 variables.

25 Array
n-dimensional, same type
a <- array(data = c(1:6), dim = c(2, 2, 2)) gives a 2x2x2 array (the six values are recycled to fill all eight cells)
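Array elements are addressed with one index per dimension. The sketch below uses 1:8 so that every cell gets its own value, then repeats the slide's 1:6 call to show the recycling; the name recycled is made up for the example.

```r
# Eight values fill a 2 x 2 x 2 array exactly
a <- array(data = 1:8, dim = c(2, 2, 2))

dim(a)       # 2 2 2

# One index per dimension: row 1, column 2, layer 2
a[1, 2, 2]   # 7

# Leaving an index empty keeps that whole dimension: the first 2 x 2 layer
a[, , 1]

# With only six values, array() starts again from the first value
recycled <- array(data = 1:6, dim = c(2, 2, 2))
recycled[, , 2]
```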

26 Factors
> blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
> blood_factor <- factor(blood)
> blood_factor
[1] B  AB O  A  O  O  A  B
Levels: A AB B O
> str(blood_factor)
 Factor w/ 4 levels "A","AB","B","O": 3 2 4 1 4 4 1 3
Can also do various things such as renaming levels and ordering
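Renaming and ordering can be sketched as follows. The new level labels (BT_A and so on) and the sizes example are invented for illustration and are not from the lecture.

```r
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
blood_factor <- factor(blood)

# Levels are sorted alphabetically by default
levels(blood_factor)   # "A" "AB" "B" "O"

# Rename levels: the new names must follow the order of levels()
renamed <- blood_factor
levels(renamed) <- c("BT_A", "BT_AB", "BT_B", "BT_O")

# Ordered factor: give the levels from lowest to highest
sizes <- factor(c("M", "L", "S", "S", "L"),
                levels = c("S", "M", "L"), ordered = TRUE)

# Comparisons now respect the level order
sizes[1] < sizes[2]    # TRUE, because M < L

# Count how often each level occurs
table(blood_factor)
```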

27 Lists - Used to store R objects with no coercion, but some loss of functionality
> song <- list("Rsome times", 190, 5)
> names(song) <- c("title", "duration", "track")
Or, equivalently, name the elements when creating the list:
> song <- list(title = "Rsome times", duration = 190, track = 5)
Extending lists
> friends <- c("Kurt", "Florence", "Patti", "Dave")
> song$sent <- friends
Can also subset lists: [ returns a sub-list (names included), [[ returns the element itself, e.g. song[1] vs song[[1]]
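The difference between the two bracket forms is easiest to see by checking what comes back. A minimal sketch, with result names (sub_list, title, second_friend) chosen for this example:

```r
song <- list(title = "Rsome times", duration = 190, track = 5)

# Single brackets return a list containing the selected element(s)
sub_list <- song[1]
class(sub_list)   # "list"

# Double brackets (or $) return the element itself
title <- song[["title"]]
class(title)      # "character"
duration <- song$duration

# Extend the list, then reach inside the new element
song$sent <- c("Kurt", "Florence", "Patti", "Dave")
second_friend <- song[["sent"]][2]

second_friend     # "Florence"
```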

28 Transposing A very useful command is t()
This transposes data and can be very useful when manipulating data sets
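As a quick sketch, t() swaps rows and columns and carries the names along. The matrix m repeats the Alpha/Bravo/Charlie data from earlier in the lecture; the name tm is made up here.

```r
m <- rbind(Alpha = c(245, 304), Bravo = c(178, 257), Charlie = c(314, 260))
colnames(m) <- c("One", "Two")

# Transpose: 3 x 2 becomes 2 x 3
tm <- t(m)

dim(m)    # 3 2
dim(tm)   # 2 3

# Row and column names swap places too
tm["Two", "Bravo"]   # 257

tm
```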

