
Before the class starts:
1) Log in to a computer
2) Start RStudio
3) Download Intro.R from MyCourses
4) Open Intro.R in RStudio
5) Download “R in Action” from Zotero and open it

Statistical software: SPSS, Stata, and R
Description: SPSS and Stata are command-driven statistical programs; R is a statistical programming environment that also allows interactive use
Audience: SPSS is designed for corporate use; Stata is designed for researchers/scientists; R is designed to be general
Documentation: SPSS explains how to use SPSS; Stata explains the analyses; R points to original sources
Availability: SPSS installed on all Aalto computers?; Stata installed on all TUAS computers; R installed on all Aalto computers
Cost: SPSS, Aalto has a site license; Stata, student version $35; R, free

My take on the software
I use Stata and R.
I am more productive with Stata in the tasks that it is designed for (and Stata has excellent documentation).
R is more flexible, better for data management, and better for making examples.
People in the DIEM department mainly use SPSS and Stata. Some are moving from SPSS to Stata, but no one moves the other way.
Students on my courses tend to slightly prefer R because they can install it (legally) on their home computers, and they do just fine with that.
But R is not the best choice for everyone. You cannot go wrong with Stata.

Datasets and command files
Datasets
  Observations on rows
  Variables on columns
  Stata works with one file at a time
  R can work with multiple files at a time
  Manipulated with commands
  Data files are never edited!
Command files
  A sequence of data manipulation and analysis commands to be applied to the data
  Stores the logic of your analysis
  Should contain a lot of comments where you explain the logic

Using the software: Menus vs. Typing commands vs. Command file
Menus
  Good for learning the program
  Good if you do not remember the command for a particular analysis
  (Lack of menus is one of the reasons why R has a steeper learning curve)
Typing commands
  This is normally the fastest way to explore the data and experiment with the analyses
Command file
  Should always be used for the analyses that you want to publish

Run Intro.R

Overview of the user interface

Compile a notebook

Introduction to R

1. Using the software as calculator
2. Accessing and reading the documentation
3. Creating and running projects as analysis files
4. Loading and manipulating datasets (e.g. merging, sorting, filtering)
5. Basic exploratory data analysis including means, correlations, etc.
6. Basics of graphics
7. Generating data and running simple simulations
8. Creating loops in analysis files and other very basic automation

Packages
Install package “lmtest”
Load package “lmtest”
R in Action: 1.4 Packages

Using R as calculator
Type these into the console:
    100+2/3      Basic math
    (100+2)/3    You can use round brackets to group operations so that they are carried out first
    5*10^2       The symbol * means multiply, and ^ means "to the power", so this gives 5 times (10 squared), i.e. 500
    1/0          R knows about infinity (and minus infinity)
    0/0          Undefined results take the value NaN ("not a number")
    sqrt(4)      Square root function
Source: https://en.wikibooks.org/wiki/Statistical_Analysis:_an_Introduction_using_R/R/R_as_a_calculator

Using the help
Try looking up the following in the help: regression, lm
R in Action: 1.3.2 Getting help
  Read the section
  Try the examples

Help page for lm

Working with datasets

Accessing built-in datasets
Load the packages “car” and “psych”
List available datasets
    data()
Datasets are accessed by their name
    mtcars
Inspect the dataset
    describe(mtcars)
    scatterplotMatrix(mtcars)

Loading CSV files
Load a dataset from the UCLA website
    read.csv("http://www.ats.ucla.edu/stat/data/test.csv")
Store the dataset with a name
    myData <- read.csv("http://www.ats.ucla.edu/stat/data/test.csv")
Print the dataset
    myData
Source: http://www.ats.ucla.edu/stat/r/modules/raw_data.htm

Loading CSV files from your computer
R loads files from and saves files to the working directory
Download the datasets for Data Analysis Assignment 4 (optional) from MyCourses and unzip the file
Set your working directory to the directory where you unzipped the files and load the CSV file
    read.csv("Orbis_Export_1.csv")
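A minimal sketch of how the working directory affects file loading. The file example.csv and its contents are hypothetical stand-ins for the course dataset, written to a temporary directory so that the block runs on its own.

```r
# Write a small example file into a temporary directory (hypothetical data)
exampleDir <- tempdir()
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)),
          file.path(exampleDir, "example.csv"), row.names = FALSE)

getwd()                            # show the current working directory
oldDir <- setwd(exampleDir)        # change it; setwd() returns the previous one
myData <- read.csv("example.csv")  # relative path is resolved against the working directory
setwd(oldDir)                      # restore the original working directory
myData
```

In RStudio the working directory can also be set from the Session menu, but putting setwd() in the command file keeps the analysis reproducible.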

Setting up
Start a new R Script and copy-paste Listing 4.1 into the file
    manager <- c(1, 2, 3, 4, 5)
    date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09")
    country <- c("US", "US", "UK", "UK", "UK")
    gender <- c("M", "F", "F", "M", "F")
    age <- c(32, 45, 25, 39, 99)
    q1 <- c(5, 3, 3, 3, 2)
    q2 <- c(4, 5, 5, 3, 2)
    q3 <- c(5, 2, 5, 4, 1)
    q4 <- c(5, 5, 5, NA, 2)
    q5 <- c(5, 5, 2, NA, 1)
    leadership <- data.frame(manager, date, country, gender, age,
                             q1, q2, q3, q4, q5, stringsAsFactors=FALSE)

Selecting cases and variables (subsetting)
A data frame has two dimensions: rows and columns
    leadership[,]
The value on the left side of the comma selects rows, the value on the right side selects columns
    leadership[1,]
    leadership[,1]
    leadership[1,1]
Selecting with names
    leadership[,"date"]
    leadership$date
    leadership$date[1]

Creating vectors and selecting with vectors
A vector is a sequence of numbers or strings
    3:5
    c(1,2,4)
    c("gender","age")
Selecting with a vector
    leadership[3:5,]
    leadership[c(1,2,4), c("gender","age")]

Comparisons
Comparisons return vectors of TRUE and FALSE
    leadership$age > 40
    leadership$age > 40 & leadership$country == "UK"
Converting from TRUE and FALSE to indices
    which(leadership$age > 40)

Selecting cases with comparison
    leadership[leadership$age > 40,]
People often forget the comma here

The subset command
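A minimal sketch of subset(), assuming the leadership data frame from Listing 4.1 (abbreviated here to a few columns so that the block runs on its own). The first argument after the data is the row condition and the optional select argument picks columns; variable names are evaluated inside the data frame, so the leadership$ prefix is not needed.

```r
# Abbreviated version of the leadership dataset from Listing 4.1
leadership <- data.frame(
  manager = 1:5,
  country = c("US", "US", "UK", "UK", "UK"),
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  stringsAsFactors = FALSE
)

# Rows where age is over 40, keeping all columns
older <- subset(leadership, age > 40)

# UK managers under 40, keeping only gender and age
ukYoung <- subset(leadership, country == "UK" & age < 40,
                  select = c(gender, age))
older
ukYoung
```

Unlike bracket indexing, subset() needs no trailing comma, and rows where the condition evaluates to NA are dropped rather than returned as rows of NA.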

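Manipulating data
Setting outlier to missing value (index the age variable only, so that the rest of the row is kept)
    leadership$age[leadership$age == 99] <- NA
Locating observations with missing data
    leadership[is.na(leadership$age),]
In an assignment, the left side selects what to update and the right side assigns the new value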

Creating variables
Select a non-existing variable
    leadership$agecat[leadership$age > 75] <- "Elder"
    leadership$agecat[leadership$age >= 55 & leadership$age <= 75] <- "Middle Aged"
    leadership$agecat[leadership$age < 55] <- "Young"

Renaming variables
Assign new values to names
    names(leadership)[1] <- "managerID"
A better approach with the reshape package
    leadership <- rename(leadership, c(manager="managerID", date="testDate"))
Recreate the leadership dataset after trying these

Sorting datasets
Get the order of values
    order(leadership$age)
    order(-leadership$age)
Sort by selecting with the order
    leadership[order(leadership$age),]
If you want to keep the new order, store the result with the same name
    leadership <- leadership[order(leadership$age),]

Merging datasets
Merge two datasets (add columns)
    hairColor <- cbind(gender = c("M","F"), hair = c("Blonde","Brunette"))
    merge(leadership, hairColor, by="gender")
Alternatively, you can use cbind if you know that the data are in the same order and have the same number of rows
Append datasets (add rows)
    rbind(leadership, leadership)

Applying what we just went through
Hints:
  Use scale to standardize Math, Science, and English
  Use quantile to calculate grade cutoffs
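The two hints can be sketched as follows. The student names and scores below are made up for illustration and are not the exercise data, and splitting into five grades at the 20th, 40th, 60th, and 80th percentiles is an assumption about what the exercise asks for.

```r
# Hypothetical scores on three tests with very different scales
scores <- data.frame(
  student = c("s1", "s2", "s3", "s4", "s5"),
  Math    = c(502, 600, 412, 358, 495),
  Science = c(95, 99, 80, 82, 75),
  English = c(25, 22, 18, 15, 20)
)

# scale() standardizes each column to mean 0 and SD 1
z <- scale(scores[, c("Math", "Science", "English")])

# Combine the standardized scores into one performance score per student
scores$score <- rowMeans(z)

# quantile() gives the cutoffs; cut() turns the score into grade categories
cutoffs <- quantile(scores$score, c(0.2, 0.4, 0.6, 0.8))
scores$grade <- cut(scores$score,
                    breaks = c(-Inf, cutoffs, Inf),
                    labels = c("F", "D", "C", "B", "A"))
scores[order(-scores$score), ]
```

Standardizing first matters because otherwise Math, with the largest raw values, would dominate the combined score.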

Basics of exploratory data analysis

Statistical functions
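A short sketch of some common statistical functions in base R, applied to one variable of the built-in mtcars dataset. The choice of functions is illustrative and is not taken from the slide.

```r
mpg <- mtcars$mpg

mean(mpg)     # arithmetic mean
median(mpg)   # median
sd(mpg)       # standard deviation
var(mpg)      # variance (the square of the standard deviation)
range(mpg)    # minimum and maximum
quantile(mpg) # minimum, quartiles, and maximum
summary(mpg)  # several of the above at once

# Most of these return NA if the data contain missing values,
# unless na.rm = TRUE is given
mean(c(1, 2, NA))
mean(c(1, 2, NA), na.rm = TRUE)
```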

Applying functions to data frames
    ?apply
    apply(mtcars, 2, mean)
    apply(mtcars, 2, sd)
    apply(mtcars, 2, quantile)

More convenient way to get descriptive statistics using the psych package
    describe(mtcars)
    describeBy(mtcars, group = mtcars$cyl)

Frequency tables (7.2)
    table(mtcars$cyl)
    table(mtcars$gear)
    table(mtcars$gear, mtcars$cyl)
    prop.table(table(mtcars$gear, mtcars$cyl))
    prop.table(table(mtcars$gear, mtcars$cyl), 1)
    prop.table(table(mtcars$gear, mtcars$cyl), 2)

Correlations (7.3)
    cor(mtcars)
    lowerCor(mtcars)
    corr.test(mtcars)

Basics of graphics

Plot example (3.1 Working with Graphs)
    plot(mtcars$wt, mtcars$mpg)
    abline(lm(mpg~wt, data = mtcars))
    title("Regression of MPG on Weight")

Examples Browse graph examples at: http://shinyapps.org/apps/RGraphCompendium/index.php

Exporting graphics as files
    pdf("myGraph.pdf")
    plot(mtcars$wt, mtcars$mpg)
    abline(lm(mpg~wt, data = mtcars))
    title("Regression of MPG on Weight")
    dev.off()

Kernel density plot
    plot(density(mtcars$mpg))

Scatter plot matrix
    scatterplotMatrix(mtcars)

Scatter plot matrix of the first three variables
    scatterplotMatrix(mtcars[,1:3])

Aggregating and restructuring data

Aggregating data
    ?aggregate
    aggregate(mtcars, mtcars$cyl, mean)        # fails: 'by' must be a list
    aggregate(mtcars, list(mtcars$cyl), mean)
    aggregate(mtcars, list(cyl = mtcars$cyl), mean)
    aggregate(mtcars, list(cyl = mtcars$cyl, mtcars$gear), mean)

Reshaping data using reshape2 package

Reshape
    dw <- data.frame(id = 1001:1004,
                     y_1 = 1:4, y_2 = 11:14,
                     x_1 = 1:4, x_2 = 11:14,
                     w = 1:4)
    library(reshape2)
    dm <- melt(dw, measure.vars = c("y_1","y_2","x_1","x_2"))
    ds <- colsplit(dm$variable, pattern="_", names = c("variable", "time"))
    dm <- cbind(dm[,-3], ds)
    dl <- dcast(dm, ... ~ variable)

Simple simulations

Generating random numbers
Throw ten dice
    sample(1:6, 10, replace = TRUE)
Generate ten standard normal variables (mean = 0, SD = 1)
    rnorm(10)

Effects of model misspecification on regression
    x1 <- rnorm(1000)
    x2 <- x1 + rnorm(1000)
    y <- x1 + x2 + rnorm(1000)
    lm(y ~ x1 + x2)
    lm(y ~ x1)
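A sketch of what the two regressions show, using the same data-generating process. The seed is an arbitrary choice added to make the run repeatable. Because x2 equals x1 plus noise, leaving x2 out of the model pushes its effect onto x1: the coefficient of x1 should be close to 1 in the correctly specified model and close to 1 + 1 = 2 in the misspecified one.

```r
set.seed(123)  # arbitrary seed, not part of the original example
n  <- 1000
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)       # x2 is correlated with x1
y  <- x1 + x2 + rnorm(n)  # both true coefficients are 1

fullModel  <- coef(lm(y ~ x1 + x2))  # correctly specified
shortModel <- coef(lm(y ~ x1))       # x2 omitted

fullModel
shortModel
```

The bias equals the true coefficient of the omitted variable (1) times the slope from regressing x2 on x1 (also 1 here), which is why the estimate roughly doubles.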

Mean of ten dice
    dice <- sample(1:6, 10, replace = TRUE)
    mean(dice)
    reps <- replicate(10000, {
      dice <- sample(1:6, 10, replace = TRUE)
      mean(dice)
    })
    plot(density(reps))

Loops and other basic automation

Loops and conditions
    for(counter in 1:10){
      if(counter == 5){
        print("Five")
      } else{
        print("Not five")
      }
    }
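Loops are often used to repeat an analysis and store each result. A minimal sketch that builds on the dice example; the sample sizes and the seed are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed so that the run is repeatable
sampleSizes <- c(10, 100, 1000)

# Create the result vector before the loop and fill it in one element at a time
results <- numeric(length(sampleSizes))

for (i in seq_along(sampleSizes)) {
  dice <- sample(1:6, sampleSizes[i], replace = TRUE)
  results[i] <- mean(dice)
}

names(results) <- sampleSizes
results
```

With more dice, the mean should tend to settle closer to the expected value of 3.5.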

Conclusion

Getting started
1. Study R in Action
2. Search for online examples
3. Ask for help online (e.g. course forum)
   If you have a problem, it often helps to post your full analysis file or log: https://gist.github.com
4. Online courses
   https://www.datacamp.com/courses/free-introduction-to-r

http://www.ats.ucla.edu/stat/dae/
