Recoding III: Introducing apply()

Slides:



Advertisements
Similar presentations
R for Macroecology Aarhus University, Spring 2011.
Advertisements

1 R Programming Zhi Wei. 2 Language layout  Three types of statement expression: it is evaluated, printed, and the value is lost (3+5) assignment: passes.
Lecture 7 Sept 19, 11 Goals: two-dimensional arrays (continued) matrix operations circuit analysis using Matlab image processing – simple examples Chapter.
Lecture 6 Sept 15, 09 Goals: two-dimensional arrays matrix operations circuit analysis using Matlab image processing – simple examples.
Concatenation MATLAB lets you construct a new vector by concatenating other vectors: – A = [B C D... X Y Z] where the individual items in the brackets.
Introduction to R Lecture 3: Data Manipulation Andrew Jaffe 9/27/10.
Session 3: More features of R and the Central Limit Theorem Class web site: Statistics for Microarray Data Analysis.
Arrays 1 Multiple values per variable. Why arrays? Can you collect one value from the user? How about two? Twenty? Two hundred? How about… I need to collect.
Matlab for Engineers Manipulating Matlab Matrices Chapter 4.
Advanced Topics- Functions Introduction to MATLAB 7 Engineering 161.
Accuracy Chapter 5.1 Data Screening. Data Screening So, I’ve got all this data…what now? – Please note this is going to deviate from the book a bit and.
What does C store? >>A = [1 2 3] >>B = [1 1] >>[C,D]=meshgrid(A,B) c) a) d) b)
MA/CS 375 Fall 2002 Lecture 3. Example 2 A is a matrix with 3 rows and 2 columns.
Course A201: Introduction to Programming 09/30/2010.
Introduction to Loops For Loops. Motivation for Using Loops So far, everything we’ve done in MATLAB, you could probably do by hand: Mathematical operations.
Recap Saving Plots Summary of Chapter 5 Introduction of Chapter 6.
1 Writing Better R Code Hui Zhang Research Analytics.
Lecture 26: Reusable Methods: Enviable Sloth. Creating Function M-files User defined functions are stored as M- files To use them, they must be in the.
Programming Fundamentals. Topics to be covered Today Recursion Inline Functions Scope and Storage Class A simple class Constructor Destructor.
MA/CS 375 Fall 2002 Lecture 2. Motivation for Suffering All This Math and Stuff Try the Actor demo from
SCRIPTS AND FUNCTIONS DAVID COOPER SUMMER Extensions MATLAB has two main extension types.m for functions and scripts and.mat for variable save files.
Multidimensional Arrays Computer and Programming.
Arrays and Loops. Learning Objectives By the end of this lecture, you should be able to: – Understand what a loop is – Appreciate the need for loops and.
Ch.6 Efficient calculations. 6.1 Vectorized computations  Replacing numbers with ‘for’ expression tmp
1 Introduction to R A Language and Environment for Statistical Computing, Graphics & Bioinformatics Introduction to R Lecture 3
Control Structures Hara URL:
Operator Overloading Introduction
Trace Tables In today’s lesson we will look at:
Topic: Python Lists – Part 1
Manipulating MATLAB Matrices Chapter 4
Loops BIS1523 – Lecture 10.
Introduction to R Carolina Salge March 29, 2017.
Repetition Structures Chapter 9
Matlab Training Session 4: Control, Flow and Functions
GC211Data Structure Lecture2 Sara Alhajjam.
CMPT 120 Topic: Python strings.
CS1010 Discussion Group 11 Week 12 – Structured Structures.
Peter Kuriyama FISH 512 April 9th, 2014
Data manipulation in R: dplyr
Numerical Descriptives in R
Functions BIS1523 – Lecture 17.
Chapter 6 Methods: A Deeper Look
Recoding III: Introducing apply()
Recoding II: Numerical & Graphical Descriptives
CSCI N207 Data Analysis Using Spreadsheet
Matlab tutorial course
L07 Apply and purrr EPID 799C Fall 2018.
Topics Sequences Introduction to Lists List Slicing
Stat 251 (2009, Summer) Lab 1 TA: Yu, Chi Wai.
Loop Statements & Vectorizing Code
CSCI N317 Computation for Scientific Applications Unit R
Multidimensional array
Topics Introduction to Value-returning Functions: Generating Random Numbers Writing Your Own Value-Returning Functions The math Module Storing Functions.
SYMMETRIC SKEWED LEFT SKEWED RIGHT
Loops and Arrays in JavaScript
Lets Play with arrays Singh Tripty
Topics Sequences Lists Copying Lists Processing Lists
Functions continued.
IPC144 Introduction to Programming Using C Week 4 – Lesson 2
R Course 1st Lecture.
Topics Sequences Introduction to Lists List Slicing
Data analysis with R and the tidyverse
Advanced data management
Loop Statements & Vectorizing Code
R programming.
R for Epi Workshop Module 1: Learning Base R
General Computer Science for Engineers CISC 106 Lecture 11
Software Development Techniques
CMPT 120 Topic: Python strings.
Announcements Exam 2 Next Week!! Final Exam :
Presentation transcript:

Recoding III: Introducing apply() EPID 799C, Lecture 6 Wednesday, Sept. 19, 2018

Introducing the Apply() Family Functions in base R to manipulate slices of data from matrices, arrays, lists, and dataframes in a repetitive way without explicit use of loop constructs. Also known as “functional functions” (functions taking a function as an argument). Typically require very few lines of code. Family members: apply() lapply() sapply() vapply() mapply() rapply() tapply() eapply() Q: How can I use a loop to […insert task here…]? A: Don’t. Use one of the apply functions. https://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/

Motivation 1 from Last Week’s Lecture #Need to recode 99’s as R’s missing value NA births$mdif[births$mdif==99] <- NA births$wksgest[births$wksgest==99] <- NA births$mage[births$mage==99] <- NA births$totpreg[births$totpreg==99] <- NA births$visits[births$visits==99] <- NA # …and so on…

Motivation 1 from Last Week’s Lecture #Need to recode 99’s as R’s missing value NA births$mdif[births$mdif==99] <- NA births$wksgest[births$wksgest==99] <- NA births$mage[births$mage==99] <- NA births$totpreg[births$totpreg==99] <- NA births$visits[births$visits==99] <- NA # …and so on… This is repetitive and error-prone!!

Motivation 2 from Last Week’s Lecture #Need min, max, mean, standard deviation, and interquartile range for all continuous variables min(births$mage, na.rm = T) max(births$mage, na.rm = T) mean(births$mage, na.rm = T) sd(births$mage, na.rm = T) IQR(births$mage, na.rm = T) # …and now repeat for a bunch of other variables…

Motivation 2 from Last Week’s Lecture #Need min, max, mean, standard deviation, and interquartile range for all continuous variables min(births$mage, na.rm = T) max(births$mage, na.rm = T) mean(births$mage, na.rm = T) sd(births$mage, na.rm = T) IQR(births$mage, na.rm = T) # …and now repeat for a bunch of other variables…

General Idea “For each of these things… (columns or rows in a dataframe, elements of a vector, etc.) …do this… (apply a function) …put it together… (compile the results of applying that function to each thing) …and give me the results.” (create a new object or print results to the console)

apply() Syntax: apply(X= , MARGIN= , FUN=, …) X = the matrix you want to perform a function on MARGIN = defines how the function is applied (1 for rows, 2 for columns) MARGIN=1 applies the function on each observation (row) at a time. MARGIN=2 applies the function on each variable (column) at a time. FUN = any R function you want to apply Could be built-in or user-defined

apply() example apply(X = births[,c("mage","wksgest","totpreg","visits")], MARGIN = 2, FUN = mean, na.rm=T) #Mean of non-missing values for each variable

lapply() Applies a function to each element of a list or vector, returning a list X = a list or vector FUN = R function to be applied No longer need margin since function is applied to each element of the list

apply() vs. lapply() a <- apply(X = births[,c("mage","wksgest","totpreg","visits")], MARGIN = 2, FUN = mean, na.rm=T) l <- lapply(X = births[,c("mage","wksgest","totpreg","visits")], What’s the difference between the R objects we created, a and l?

lapply() for one of our motivating examples lapply(X = births[,c("mage","wksgest","totpreg","visits")], FUN = function(x) rbind(min = min(x, na.rm=T), max = max(x, na.rm=T), mean = mean(x, na.rm=T), sd = sd(x, na.rm=T), iqr = IQR(x, na.rm=T)))

lapply() for one of our motivating examples lapply(X = births[,c("mage","wksgest","totpreg","visits")], FUN = function(x) rbind(min = min(x, na.rm=T), max = max(x, na.rm=T), mean = mean(x, na.rm=T), sd = sd(x, na.rm=T), iqr = IQR(x, na.rm=T))) This is an anonymous function.

lapply() for one of our motivating examples lapply(X = births[,c("mage","wksgest","totpreg","visits")], FUN = function(x) rbind(min = min(x, na.rm=T), max = max(x, na.rm=T), mean = mean(x, na.rm=T), sd = sd(x, na.rm=T), iqr = IQR(x, na.rm=T))) This is an anonymous function. Alternatively, we could have created it outside of apply() and named it in our environment. my_key_stats <- function(x) { rbind(min = min(x, na.rm=T), max = max(x, na.rm=T), mean = mean(x, na.rm=T), sd = sd(x, na.rm=T), iqr = IQR(x, na.rm=T)) } Then, in apply() we would specify FUN = my_key_stats.

lapply() for one of our motivating examples #Create R object with output my_summary <- lapply(X = births[,c("mage","wksgest","totpreg","visits")], FUN = function(x) rbind(min = min(x, na.rm=T), max = max(x, na.rm=T), mean = mean(x, na.rm=T), sd = sd(x, na.rm=T), iqr = IQR(x, na.rm=T)))

lapply() for one of our motivating examples #Create R object with output my_summary <- lapply(X = births[,c("mage","wksgest","totpreg","visits")], FUN = function(x) rbind(min = min(x, na.rm=T), max = max(x, na.rm=T), mean = mean(x, na.rm=T), sd = sd(x, na.rm=T), iqr = IQR(x, na.rm=T))) data.frame(my_summary) #outputs a dataframe instead of a list

sapply() Very similar to lapply, but tries to simplify output to a vector or matrix X = list or vector FUN = R function to be applied No margin specified

sapply() vs. lapply() What’s the difference in output? lapply(births, mean, na.rm=T) sapply(births, mean, na.rm=T) What’s the difference in output?

sapply() for the other motivating example #Our missing data function fix_missing <- function(x) { x[x == 99] <- NA x } #Variables with missing data coded as 99 vars_to_fix <-births[,c("mage","wksgest","totpreg","visits")] #Recoding each variable with our function vars_fixed <- sapply(X = vars_to_fix, FUN = fix_missing)

tapply() Run a function by group (i.e., levels of a factor variable). Syntax is similar, but now includes INDEX= to specify the grouping variable. Simple example: tapply(X = births$visits, INDEX = births$pnc5_f, FUN = quantile, na.rm=T)

Final Thoughts The best way to prepare for apply()… is to dive into examples! Oftentimes, it takes writing out repetitive/inefficient/error-prone code to realize that an apply() function could have saved the day. Developing intuition for apply() takes practice… which we’ll continue throughout this semester.