1
R Course 2nd lecture
2
Recap of last class
Basics
Data types and coercion
Operators and precedence
Data structures (atomic vectors, lists, matrices, data frames, arrays)
3
Atomic vectors: one-dimensional, and all elements must be of the same type, e.g. c(1:9) or c("A", "B", "C")
Vectorised functions such as sum() can be applied directly to numeric vectors.
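A short sketch of the two points above: arithmetic summaries work on a whole numeric vector at once, and mixing types inside c() forces coercion to a single type.

```r
x <- c(1:9)            # integer vector 1, 2, ..., 9
sum(x)                 # 45
mean(x)                # 5
y <- c(1, "A", TRUE)   # mixed input is coerced to one type, here character
class(y)               # "character"
y                      # "1" "A" "TRUE"
```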
4
Example: how many cards remain after drawing some cards from a deck
remain <- c(11, 12, 11, 13)
suits <- c("spades", "hearts", "diamonds", "clubs")
names(remain) <- suits
Other methods:
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)
remain <- c("spades" = 11, "hearts" = 12, "diamonds" = 11, "clubs" = 13)
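A quick check that the three naming methods above build exactly the same named vector (the suffixes 1, 2, 3 are only added here so the results can be compared side by side):

```r
suits <- c("spades", "hearts", "diamonds", "clubs")
remain1 <- c(11, 12, 11, 13)
names(remain1) <- suits
remain2 <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)
remain3 <- c("spades" = 11, "hearts" = 12, "diamonds" = 11, "clubs" = 13)
identical(remain1, remain2)  # TRUE
identical(remain2, remain3)  # TRUE: quoting the names makes no difference
```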
5
List: one-dimensional, and elements can be of any type (including other lists)
x <- list(c(1, 2, 3), 100, c(TRUE, FALSE, TRUE), list("a", "b", "c"))
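One way to inspect the list above: each top-level element keeps its own type, and [[ ]] can be chained to reach into the nested list.

```r
x <- list(c(1, 2, 3), 100, c(TRUE, FALSE, TRUE), list("a", "b", "c"))
length(x)         # 4 top-level elements
sapply(x, class)  # "numeric" "numeric" "logical" "list"
x[[4]][[3]]       # "c": third element of the nested list
```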
6
Task: Create a logical vector with the following elements, in order: TRUE, FALSE, TRUE. Name the elements of the following vector after the days of the week (Monday to Friday): poker_vector <- c(140, -50, 20, -120, 240). Create the variable roulette_vector to represent your winnings/losses from playing roulette: on Monday you lost $24, on Tuesday you lost $50, on Wednesday you won $100, on Thursday you lost $350, and on Friday you won $10. Make sure to name the elements!
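A possible solution sketch. The task does not name the logical vector, so logical_vector below is a hypothetical name; losses are entered as negative numbers.

```r
logical_vector <- c(TRUE, FALSE, TRUE)   # hypothetical name, not given in the task
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- days_vector
roulette_vector <- c(-24, -50, 100, -350, 10)  # losses are negative
names(roulette_vector) <- days_vector
roulette_vector
```

Reusing one days_vector for both names() calls keeps the two vectors' names guaranteed to match.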
7
Consider the following
poker_vector1 <- c(140, -50, 20, -120, 240)
names(poker_vector1) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
poker_vector2 <- c(Monday = 140, -50, 20, -120, 240)
roulette_vector1 <- c(-24, -50, 100, -350, 10)
days_vector <- names(poker_vector1)
names(roulette_vector1) <- days_vector
roulette_vector2 <- c(-24, -50, 100, -350, 10)
names(roulette_vector2) <- "Monday"
Which of the following statements is true?
1. The code to define poker_vector2 is syntactically invalid.
2. poker_vector1 and poker_vector2 have different lengths.
3. poker_vector1 and roulette_vector1 have the same names, while poker_vector2 and roulette_vector2 show a names mismatch.
4. You can only use names() to set the names of a vector, making days_vector <- names(poker_vector1) invalid.
8
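Answer: statement 3. Assigning a single name with names(roulette_vector2) <- "Monday" pads the remaining names with NA, so roulette_vector2 is named Monday <NA> <NA> <NA> <NA>, whereas naming only the first element inside c() gives poker_vector2 the names "Monday" "" "" "" "" (empty strings, not NA).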
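The difference between the two names vectors can be checked directly:

```r
poker_vector2 <- c(Monday = 140, -50, 20, -120, 240)
names(poker_vector2)      # "Monday" "" "" "" ""   (unnamed elements get empty names)
roulette_vector2 <- c(-24, -50, 100, -350, 10)
names(roulette_vector2) <- "Monday"
names(roulette_vector2)   # "Monday" NA NA NA NA   (too-short names are padded with NA)
length(poker_vector2) == length(roulette_vector2)  # TRUE: both have 5 elements
```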
9
Task
A_vector <- c(1, 2, 3)
B_vector <- c(4, 5, 6)
# Take the sum of A_vector and B_vector: total_vector
# Print total_vector
# Calculate the difference between A_vector and B_vector: diff_vector
# Print diff_vector
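A solution sketch: + and - work element by element, pairing the first element of A_vector with the first of B_vector, and so on.

```r
A_vector <- c(1, 2, 3)
B_vector <- c(4, 5, 6)
total_vector <- A_vector + B_vector  # element-wise sum: 5 7 9
print(total_vector)
diff_vector <- A_vector - B_vector   # element-wise difference: -3 -3 -3
print(diff_vector)
```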
10
Hint: this question requires use of the sum function
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Calculate your daily earnings: total_daily
# Total winnings with poker: total_poker
# Total winnings with roulette: total_roulette
# Total winnings overall: total_week
# Print total_week
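A solution sketch: daily earnings come from element-wise addition, while the weekly totals use sum() to collapse each vector to a single number.

```r
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
total_daily <- poker_vector + roulette_vector  # element-wise, keeps the day names
total_poker <- sum(poker_vector)               # 230
total_roulette <- sum(roulette_vector)         # -314
total_week <- total_poker + total_roulette     # -84
print(total_week)
```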
11
Create a new vector of logicals, poker_better, that tells whether your poker gains exceeded your roulette results on each day. Calculate total_poker and total_roulette as in the previous exercise. Using total_poker and total_roulette, check whether your total gains in poker are higher than for roulette by using a comparison. Assign the result of this comparison to the variable choose_poker and print it out. What do you conclude: should you focus on roulette or on poker?
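A solution sketch: > compares the two vectors day by day and returns a named logical vector, while comparing the two totals gives a single TRUE or FALSE.

```r
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
poker_better <- poker_vector > roulette_vector  # TRUE FALSE FALSE TRUE TRUE
total_poker <- sum(poker_vector)                # 230
total_roulette <- sum(roulette_vector)          # -314
choose_poker <- total_poker > total_roulette    # TRUE
print(choose_poker)
```

With these numbers choose_poker is TRUE, so the data point towards focusing on poker.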
12
Another example
poker_past: -70 90 110 -120 30
roulette_past:
poker_present (Monday, Tuesday, Wednesday, Thursday, Friday):
# Calculate total gains for your entire past week: total_past
# Difference of past to present performance: diff_poker
13
Vector subsetting
> remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)
> remain[1]
spades
    11
> remain[3]
diamonds
      11
14
Further subsetting
> remain["spades"]
spades
    11
> remain[c(1, 4)]
spades  clubs
    11     13
> remain[c(4, 1)]
 clubs spades
    13     11
15
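> remain[-1]          # all elements except index 1 are returned
> remain[-c(1, 2)]    # all elements except indices 1 and 2
> remain[-"spades"]   # negative subsetting does not work with names
Error in -"spades" : invalid argument to unary operator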
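Since a minus sign cannot be applied to a name, one common workaround (an alternative, not shown on the slide) is to build a logical test on names() and subset with that:

```r
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)
remain[-1]                                         # hearts diamonds clubs
remain[-c(1, 2)]                                   # diamonds clubs
remain[names(remain) != "spades"]                  # drop one element by name
remain[!(names(remain) %in% c("spades", "hearts"))]  # drop several by name
```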
16
Subsetting using logical vectors
> remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)
> remain[c(FALSE, TRUE, FALSE, TRUE)]
hearts  clubs
    12     13
> selection_vector <- c(FALSE, TRUE, FALSE, TRUE)
> remain[selection_vector]
hearts  clubs
    12     13
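In practice the logical vector is usually computed with a relational operator rather than typed by hand. The sketch also shows that a shorter logical index is recycled along the vector.

```r
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)
selection_vector <- remain > 11  # FALSE TRUE FALSE TRUE, computed from the data
remain[selection_vector]         # hearts 12, clubs 13
remain[c(FALSE, TRUE)]           # recycled to FALSE TRUE FALSE TRUE: same result
```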
17
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Poker results of Wednesday: poker_wednesday
# Roulette results of Friday: roulette_friday
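A solution sketch showing that selecting by name and by position give the same element:

```r
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
poker_wednesday <- poker_vector["Wednesday"]  # by name; poker_vector[3] is equivalent
roulette_friday <- roulette_vector[5]         # by position; keeps the name "Friday"
poker_wednesday
roulette_friday
```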
19
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Mid-week poker results: poker_midweek
# End-of-week roulette results: roulette_endweek
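A solution sketch, assuming "mid-week" means Tuesday to Thursday and "end of week" means Thursday and Friday (the slide does not define either). A vector of positions selects several elements at once.

```r
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
poker_midweek <- poker_vector[c(2, 3, 4)]     # assumed: Tuesday to Thursday
roulette_endweek <- roulette_vector[c(4, 5)]  # assumed: Thursday and Friday
poker_midweek
roulette_endweek
```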
20
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Create logical vector corresponding to profitable poker days: selection_vector
# Select amounts for profitable poker days: poker_profits
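A solution sketch: a profitable day is one with a positive result, so > 0 builds the selection vector, which is then used inside square brackets.

```r
poker_vector <- c(140, -50, 20, -120, 240)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
selection_vector <- poker_vector > 0            # TRUE FALSE TRUE FALSE TRUE
poker_profits <- poker_vector[selection_vector] # Monday 140, Wednesday 20, Friday 240
poker_profits
```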
21
More advanced subsetting
Player:
House:
# Select the player's score for the third game: player_third
# Select the scores where player exceeds house: winning_scores
# Count number of times player < 18: n_low_score
Hints:
With square brackets, select the player's score for the third game, using any of the techniques that you've learned about. Store the result in player_third.
Subset the player vector to only select the scores that exceeded the scores of house, i.e. the scores where the player won. Use subsetting in combination with the relational operator >. Assign the subset to the variable winning_scores.
Count the number of times the score inside player was lower than 18. This time, use a relational operator in combination with sum(). Save the resulting value in a new variable, n_low_score.
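The player and house scores did not survive on this slide, so the three steps are sketched here as a function of any two equal-length score vectors; score_summary is a hypothetical helper name, not part of the exercise. The count works because sum() treats TRUE as 1 and FALSE as 0.

```r
# Hypothetical helper: applies the three steps from the hints to any player/house scores
score_summary <- function(player, house) {
  player_third <- player[3]                 # third game, selected by position
  winning_scores <- player[player > house]  # element-wise comparison, then logical subsetting
  n_low_score <- sum(player < 18)           # number of TRUEs, i.e. scores below 18
  list(player_third = player_third,
       winning_scores = winning_scores,
       n_low_score = n_low_score)
}
```

Calling score_summary(player, house) with the exercise's vectors returns all three answers in one list.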
22
Matrix: 2-dimensional, all elements of one type
We briefly looked at matrix(data = c(1:9), nrow = 3, ncol = 3). Now we'll create the following matrix with rbind(), which combines vectors and matrices row by row into a matrix:
         One  Two
Alpha    245  304
Bravo    178  257
Charlie  314  260
23
Alpha <- c(245, 304)
Bravo <- c(178, 257)
Charlie <- c(314, 260)
ABCmatrix <- rbind(Alpha, Bravo, Charlie)   # row names are taken from the argument names
colnames(ABCmatrix) <- c("One", "Two")
Another way of naming is dimnames(ABCmatrix) <- list(rowname, colname), where rowname and colname are character vectors holding the row and column names.
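A sketch of the dimnames() route, with rowname and colname standing in for the placeholder vectors mentioned above:

```r
Alpha <- c(245, 304)
Bravo <- c(178, 257)
Charlie <- c(314, 260)
ABCmatrix <- rbind(Alpha, Bravo, Charlie)
rownames(ABCmatrix)   # "Alpha" "Bravo" "Charlie", taken from the argument names
# Set row and column names in one step: a list of (row names, column names)
rowname <- c("Alpha", "Bravo", "Charlie")
colname <- c("One", "Two")
dimnames(ABCmatrix) <- list(rowname, colname)
ABCmatrix
```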
24
Subsetting matrices: matrix[row, col]
Try the following and see if you can work out what they do:
ABCmatrix[1, ]
ABCmatrix[2, 2]
ABCmatrix[c(1, 3), ]
ABCmatrix[-3, ]
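For checking your guesses, here are the four calls again with a comment on what each returns; the matrix is rebuilt with named arguments to rbind() so the snippet runs on its own.

```r
ABCmatrix <- rbind(Alpha = c(245, 304), Bravo = c(178, 257), Charlie = c(314, 260))
colnames(ABCmatrix) <- c("One", "Two")
ABCmatrix[1, ]        # row 1 (Alpha), all columns: One 245, Two 304
ABCmatrix[2, 2]       # single element, row 2 column 2: 257
ABCmatrix[c(1, 3), ]  # rows 1 and 3 (Alpha and Charlie) as a 2 x 2 matrix
ABCmatrix[-3, ]       # every row except row 3 (Charlie is dropped)
```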
25
Matrix elements can be accessed with matrix[row,column] notation
Omitting the row index requests all rows, and omitting the column index requests all columns.
a <- matrix(1:6, nrow = 2)
# row 2, column 3
a[2, 3]
## [1] 6
# all rows, column 2
a[, 2]
## [1] 3 4
# all columns, row 1
a[1, ]
## [1] 1 3 5
26
Task: A vector, box, is defined that represents the box office numbers from the first three Star Wars movies. The first, third and fifth elements correspond to the US box office revenue for the movies; the second, fourth and sixth elements represent the non-US box office revenue.
box <- c( , 314.4, , , , 165.8)
Construct a matrix with one row for each movie. The first column is for the US box office revenue, and the second column for the non-US box office revenue. Name the matrix star_wars_matrix.
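Several values of box are missing here, so the sketch uses 1:6 as a stand-in with the same layout (US, non-US, US, non-US, US, non-US). It shows why byrow = TRUE is the argument that matters: it fills the matrix row by row, so each row receives one movie's pair.

```r
box_demo <- 1:6  # stand-in for box, same element order
by_row <- matrix(box_demo, nrow = 3, byrow = TRUE)  # rows: (1, 2), (3, 4), (5, 6)
by_row
by_col <- matrix(box_demo, nrow = 3)  # default column-wise fill mixes the movies
by_col
# With the real data this would be: star_wars_matrix <- matrix(box, nrow = 3, byrow = TRUE)
```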
27
Another method
new_hope <- c(460.998, 314.4)
empire_strikes <- c( , )
return_jedi <- c( , 165.8)
Use rbind() to create star_wars_matrix, and name its columns and rows.
28
Which options are correct? (a) A  (b) A & C  (c) B & D  (d) D
# option A
star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
rownames(star_wars_matrix) <- c("US", "non-US")
colnames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
# option B
col <- c("US", "non-US")
row <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
rbind(new_hope, empire_strikes, return_jedi, names = c(col, row))
# option C
col <- c("US", "non-US")
row <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
star_wars_matrix <- matrix(c(new_hope, empire_strikes, return_jedi), byrow = TRUE, nrow = 3, dimnames = list(col, row))
# option D
col <- c("US", "non-US")
row <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
star_wars_matrix <- matrix(c(new_hope, empire_strikes, return_jedi), byrow = TRUE, nrow = 3, dimnames = list(row, col))
29
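The answer was D! Options A and C both swap the row and column names: the matrix has three rows (one per movie), so supplying the two-element c("US", "non-US") as row names raises an error. Option B is also wrong: rbind() has no names argument, so names = c(col, row) is simply bound as one more row, and the result is never assigned to star_wars_matrix.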
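A sketch of the mistake in options A and C, using a placeholder 3 x 2 matrix (1:6) instead of the box office figures: two names for three rows are rejected, while row names for rows and column names for columns are accepted.

```r
col <- c("US", "non-US")
row <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
m <- matrix(1:6, nrow = 3, byrow = TRUE)  # placeholder values
# Option A's mistake: 2 row names for a 3-row matrix
err <- tryCatch({ rownames(m) <- col; "no error" },
                error = function(e) conditionMessage(e))
print(err)
# Corrected assignment: movies as rows, regions as columns
rownames(m) <- row
colnames(m) <- col
m
```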
30
Using star_wars_matrix
# Select all US box office revenue
# Select revenue for "A New Hope"
# Average non-US revenue per movie: non_us_all
# Average non-US revenue of the first two movies: non_us_some
# Create a submatrix with all figures for "A New Hope" and "Return of the Jedi"
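A sketch of each step. Because not all box office figures survived on these slides, the matrix is filled with placeholder values 1:6; us_all, a_new_hope and submatrix are hypothetical names for the steps the exercise leaves unnamed.

```r
# Placeholder values; rows = movies, columns = regions
star_wars_matrix <- matrix(1:6, nrow = 3, byrow = TRUE,
                           dimnames = list(c("A New Hope", "The Empire Strikes Back",
                                             "Return of the Jedi"),
                                           c("US", "non-US")))
us_all <- star_wars_matrix[, "US"]                    # whole US column
a_new_hope <- star_wars_matrix["A New Hope", ]        # whole first row
non_us_all <- mean(star_wars_matrix[, "non-US"])      # mean over all three movies
non_us_some <- mean(star_wars_matrix[1:2, "non-US"])  # first two movies only
submatrix <- star_wars_matrix[c("A New Hope", "Return of the Jedi"), ]
submatrix
```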
31
Using names and logical vectors to subset
Using names, select the US revenues for "A New Hope" and "The Empire Strikes Back". Using logical vectors, select the last two rows and both columns from star_wars_matrix. Finally, select the non-US revenue for "The Empire Strikes Back" with whatever technique you like.
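A sketch of the three selections on the same kind of placeholder matrix (values 1:6, since the real figures are incomplete here); us_two, last_two and esb_non_us are hypothetical names.

```r
star_wars_matrix <- matrix(1:6, nrow = 3, byrow = TRUE,
                           dimnames = list(c("A New Hope", "The Empire Strikes Back",
                                             "Return of the Jedi"),
                                           c("US", "non-US")))
# By names: US revenue for the first two movies
us_two <- star_wars_matrix[c("A New Hope", "The Empire Strikes Back"), "US"]
# By logical vectors: last two rows, both columns
last_two <- star_wars_matrix[c(FALSE, TRUE, TRUE), c(TRUE, TRUE)]
# Any technique, here names: non-US revenue for The Empire Strikes Back
esb_non_us <- star_wars_matrix["The Empire Strikes Back", "non-US"]
```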
32
Which one of these calls selects the total revenue for the 2nd, 4th and 6th movie in the matrix?
Answer choices: A & B / All 4 / A & D / C & D / Only B
all_wars_matrix
        US   non-US   total
ANH          314.4
ESB          247.9
ROTJ         165.8
TPM          552.5
AOTC         338.7
ROTS         468.5
# option A:
all_wars_matrix[seq(2, 6, by = 2), "total"]
# option B:
all_wars_matrix[c(F, T, F, T, F, T), c(F, T)]
# option C:
all_wars_matrix[c("The Empire Strikes Back", 4, 6), c(T, T, F)]
# option D:
all_wars_matrix[c(F, T), "total"]
33
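The answer was A & D! Option D works because the logical row index c(F, T) is recycled to F, T, F, T, F, T. In option B the column index c(F, T) is recycled to c(F, T, F), which selects non-US instead of total. In option C, c() coerces the title and the numbers to character, and "4" and "6" are not row names, so the call fails with a subscript out of bounds error.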
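The four options can be compared on a placeholder 6 x 3 matrix (values 1:18 and the abbreviated row names from the slide, since the US and total figures were lost). Options A and D return the same elements; option B returns the non-US column; option C errors.

```r
movies <- c("ANH", "ESB", "ROTJ", "TPM", "AOTC", "ROTS")
all_wars_demo <- matrix(1:18, nrow = 6,
                        dimnames = list(movies, c("US", "non-US", "total")))
opt_a <- all_wars_demo[seq(2, 6, by = 2), "total"]
opt_b <- all_wars_demo[c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE), c(FALSE, TRUE)]
opt_c <- tryCatch(all_wars_demo[c("The Empire Strikes Back", 4, 6), c(TRUE, TRUE, FALSE)],
                  error = function(e) "error")
opt_d <- all_wars_demo[c(FALSE, TRUE), "total"]  # row index recycled to length 6
identical(opt_a, opt_d)  # TRUE
opt_b                    # rows ESB, TPM, ROTS of the non-US column
opt_c                    # "error"
```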
34
Data frame: 2-dimensional, and each column can hold a different type
df <- data.frame(name = c("George", "Joe", "Chris"), age = c(52, 29, 25), relationshipStatus = c("married", "single", "married"))
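A quick way to see the "one type per column" idea is to ask for the class of every column. Note that string columns stay character only in R 4.0.0 or later; older versions converted them to factors by default.

```r
df <- data.frame(name = c("George", "Joe", "Chris"),
                 age = c(52, 29, 25),
                 relationshipStatus = c("married", "single", "married"))
dim(df)            # 3 rows, 3 columns
sapply(df, class)  # one class per column; age is "numeric"
```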
35
# a logical vector and a numeric vector of equal length
mydata <- data.frame(diabetic = c(TRUE, FALSE, TRUE, FALSE), height = c(65, 69, 71, 73))
mydata
##   diabetic height
## 1     TRUE     65
## 2    FALSE     69
## 3     TRUE     71
## 4    FALSE     73
36
With a two-dimensional structure, data frames can be subset like matrices: [rows, columns].
# row 3, column 2
mydata[3, 2]
## [1] 71
# rows 1 and 2 of the column named "height"
mydata[1:2, "height"]
## [1] 65 69
# all rows of column "diabetic"
mydata[, "diabetic"]
## [1]  TRUE FALSE  TRUE FALSE
37
We can subset data frames like lists as well
The columns are treated as the list elements, so we can use either [[ ]] or $ to extract a column. Extracted columns are vectors.
# $ extracts height as a numeric vector, which is then subset
mydata$height[2:3]
## [1] 69 71
# [[ ]] also returns a numeric vector
mydata[["height"]]
## [1] 65 69 71 73
mydata[["height"]][2]
## [1] 69
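One related contrast worth seeing: single brackets with a column name keep the data frame structure, while [[ ]] and $ return the column itself.

```r
mydata <- data.frame(diabetic = c(TRUE, FALSE, TRUE, FALSE),
                     height = c(65, 69, 71, 73))
col_df <- mydata["height"]     # single brackets: a one-column data frame
col_vec <- mydata[["height"]]  # double brackets: the numeric vector itself
class(col_df)                  # "data.frame"
class(col_vec)                 # "numeric"
identical(col_vec, mydata$height)  # TRUE: $ and [[ ]] extract the same vector
```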
38
colnames(data_frame) returns the column names of data_frame (or of a matrix). colnames(data_frame) <- c("some", "names") assigns column names to data_frame.
# get column names
colnames(mydata)
## [1] "diabetic" "height"
# assign column names
colnames(mydata) <- c("Diabetic", "Height")
colnames(mydata)
## [1] "Diabetic" "Height"
# to change one variable name, just use indexing
colnames(mydata)[1] <- "Diabetes"
colnames(mydata)
## [1] "Diabetes" "Height"
39
Use dim() on two-dimensional objects to get the number of rows and columns. Use str() to see the structure of the object, including its class (discussed later) and the data types of its elements.
# number of rows and columns
dim(mydata)
## [1] 4 2
# mydata is of class "data.frame"; its variables are of type logical and numeric
str(mydata)
## 'data.frame':	4 obs. of  2 variables:
##  $ Diabetes: logi  TRUE FALSE TRUE FALSE
##  $ Height  : num  65 69 71 73
40
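Array: n-dimensional, all elements of the same type
a <- array(data = c(1:6), dim = c(2, 2, 2)) gives a 2x2x2 array. It has 8 cells but only 6 values are supplied, so the data are recycled: the last two cells hold 1 and 2.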
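Arrays are indexed like matrices with one extra position per dimension; leaving the first two empty pulls out a whole 2 x 2 slice. The fill order is column by column within each slice.

```r
a <- array(data = c(1:6), dim = c(2, 2, 2))
dim(a)      # 2 2 2
a[, , 1]    # first slice: 1 to 4, filled column by column
a[, , 2]    # second slice: 5 and 6, then the recycled 1 and 2
a[2, 1, 2]  # row 2, column 1, slice 2: 6
```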