Published by Regina Anthony. Modified over 8 years ago.
1
Before the class starts:
1) Log in to a computer
2) Start RStudio
3) Download Intro.R from MyCourses
4) Open Intro.R in RStudio
5) Download “R in Action” from Zotero and open it
2
Statistical software: SPSS, Stata, and R
Description: Command driven statistical program / Statistical programming environment that also allows interactive use
Audience
  SPSS: Designed for corporate use
  Stata: Designed for researchers/scientists
  R: Designed to be general
Documentation
  SPSS: Explains how to use SPSS
  Stata: Explains the analyses
  R: Points to original sources
Availability
  SPSS: Installed on all Aalto computers?
  Stata: Installed on all TUAS computers
  R: Installed on all Aalto computers
Cost
  SPSS: Aalto has a site license
  Stata: Student version 35$
  R: Free
3
My take on the software
I use Stata and R
  I am more productive with Stata in the tasks that it is designed for (and Stata has excellent documentation)
  R is more flexible, better for data management, and better for making examples
People in the DIEM department mainly use SPSS and Stata
  Some are moving from SPSS to Stata, but no one moves the other way
Students on my courses tend to slightly prefer R because they can install it (legally) on their home computers, and they do just fine with that.
But R is not the best choice for everyone. You cannot go wrong with Stata.
4
Datasets and command files
Datasets
  Observations on rows
  Variables on columns
  Stata works with one file at a time
  R can work with multiple files at a time
  Manipulated with commands
  Data files are never edited!
Command files
  A sequence of data manipulation and analysis commands to be applied to the data
  Stores the logic of your analysis
  Should contain a lot of comments where you explain the logic
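As an illustration of what a commented command file might look like, here is a minimal sketch. It uses the built-in mtcars data in place of a real data file, and the object names (cars, small, fit) are hypothetical choices for this example only.

```r
# Hypothetical command file: every step is a command, the data file is never edited.

# Step 1: load the data (here the built-in mtcars stands in for a real file)
cars <- mtcars

# Step 2: data manipulation, keep only cars with 4 or 6 cylinders
small <- cars[cars$cyl %in% c(4, 6), ]

# Step 3: analysis, regress fuel economy on weight in the reduced sample
fit <- lm(mpg ~ wt, data = small)
summary(fit)
```

Rerunning the file from top to bottom reproduces the whole analysis, which is the point of keeping the logic in commands rather than in manual edits.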
5
Using the software: Menus vs. Typing commands vs. Command file
Menus
  Good for learning the program
  Good if you do not remember the command for a particular analysis
  (Lack of menus is one of the reasons why R has a steeper learning curve)
Typing commands
  This is normally the fastest way to explore the data and experiment with the analyses
Command file
  Should always be used for the analyses that you want to publish
6
Run Intro.R
7
Overview of the user interface
8
Compile a notebook
9
Introduction to R
10
1. Using the software as calculator
2. Accessing and reading the documentation
3. Creating and running projects as analysis files
4. Loading and manipulating datasets (e.g. merging, sorting, filtering)
5. Basic exploratory data analysis including means, correlations, etc.
6. Basics of graphics
7. Generating data and running simple simulations
8. Creating loops in analysis files and other very basic automation
11
Packages
Install package "lmtest"
Load package "lmtest"
R in Action: 1.4 Packages
12
Using R as calculator (type into console)
Type this        Explanation
100+2/3          Basic math
(100+2)/3        You can use round brackets to group operations so that they are carried out first
5*10^2           The symbol * means multiply, and ^ means "to the power", so this gives 5 times (10 squared), i.e. 500
1/0              R knows about infinity (and minus infinity)
0/0              Undefined results take the value NaN ("not a number")
sqrt(4)          Square root function
https://en.wikibooks.org/wiki/Statistical_Analysis:_an_Introduction_using_R/R/R_as_a_calculator
13
Using the help
Try the following: regression, lm
R in Action: 1.3.2 Getting help
  Read the section
  Try the examples
14
Help page for lm
19
Working with datasets
21
Accessing built-in datasets
Load the packages "car" and "psych"
    library(car)
    library(psych)
List available datasets
    data()
Datasets are accessed by their name
    mtcars
Inspect the dataset
    describe(mtcars)
    scatterplotMatrix(mtcars)
22
Loading CSV files
Load a dataset from UCLA website
    read.csv("http://www.ats.ucla.edu/stat/data/test.csv")
Store the dataset with name
    myData <- read.csv("http://www.ats.ucla.edu/stat/data/test.csv")
Print the dataset
    myData
http://www.ats.ucla.edu/stat/r/modules/raw_data.htm
23
Loading CSV files from your computer
R will load and save files to the working directory
Download the datasets for Data Analysis Assignment 4 (optional) from MyCourses and unzip the file
Set your working directory to the directory where you unzipped the files and load the CSV file
    read.csv("Orbis_Export_1.csv")
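The working-directory round trip can be sketched without the course files. The example below is an assumption-laden illustration: the data frame scores and the file name example.csv are hypothetical, and a temporary directory is used so nothing on your disk is overwritten.

```r
# Hypothetical data; the course file Orbis_Export_1.csv is not used here.
scores <- data.frame(id = 1:3, score = c(4.5, 3.0, 5.0))

# setwd() returns the previous working directory, so we can restore it later.
old_dir <- setwd(tempdir())
getwd()                                  # shows where R now reads and writes files

# A relative file name is resolved against the working directory.
write.csv(scores, "example.csv", row.names = FALSE)
loaded <- read.csv("example.csv")
str(loaded)

setwd(old_dir)                           # go back to where we started
```

In RStudio the same effect is available from the menus, but putting setwd() in the command file keeps the step visible.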
25
Setting up
Start a new R Script and copy-paste Listing 4.1 into the file
    manager <- c(1, 2, 3, 4, 5)
    date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09")
    country <- c("US", "US", "UK", "UK", "UK")
    gender <- c("M", "F", "F", "M", "F")
    age <- c(32, 45, 25, 39, 99)
    q1 <- c(5, 3, 3, 3, 2)
    q2 <- c(4, 5, 5, 3, 2)
    q3 <- c(5, 2, 5, 4, 1)
    q4 <- c(5, 5, 5, NA, 2)
    q5 <- c(5, 5, 2, NA, 1)
    leadership <- data.frame(manager, date, country, gender, age, q1, q2, q3, q4, q5, stringsAsFactors=FALSE)
26
Selecting cases and variables (subsetting)
A data.frame has two dimensions: rows and columns
    leadership[,]
The value on the left side of the comma selects rows, the value on the right side selects columns
    leadership[1,]
    leadership[,1]
    leadership[1,1]
Selecting with names
    leadership[,"date"]
    leadership$date
    leadership$date[1]
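One detail worth seeing in practice is what type each selection returns. The sketch below uses a trimmed, hypothetical copy of the leadership data (three columns only) so that it runs on its own.

```r
# Trimmed, hypothetical copy of the leadership data from these slides.
leadership <- data.frame(manager = 1:5,
                         country = c("US", "US", "UK", "UK", "UK"),
                         age     = c(32, 45, 25, 39, 99),
                         stringsAsFactors = FALSE)

row1    <- leadership[1, ]                 # one row: still a data frame
col1    <- leadership[, 1]                 # one column: simplified to a plain vector
col1_df <- leadership[, 1, drop = FALSE]   # drop = FALSE keeps it a data frame

class(row1)
class(col1)
class(col1_df)

# Different ways of reaching the same column give the same vector.
same_by_name   <- identical(leadership[, "age"], leadership$age)
same_by_double <- identical(leadership[["age"]], leadership$age)
```

The drop = FALSE argument matters mostly in scripts, where code further down may expect a data frame regardless of how many columns were selected.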
27
Creating vectors and selecting with vectors
A vector is a sequence of numbers or strings
    3:5
    c(1,2,4)
    c("gender","age")
Selecting with a vector
    leadership[3:5,]
    leadership[c(1,2,4), c("gender","age")]
28
Comparisons
Comparisons return vectors of TRUE and FALSE
    leadership$age > 40
    leadership$age > 40 & leadership$country == "UK"
Converting from TRUE and FALSE to indices
    which(leadership$age > 40)
29
Selecting cases with comparison
    leadership[leadership$age > 40,]
People often forget the comma here
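Comparisons and missing values interact in a way that is easy to overlook. The following sketch uses a hypothetical version of the leadership data in which one age is missing, and compares selecting with the raw logical vector against selecting with which().

```r
# Hypothetical data: manager 4 has a missing age.
leadership <- data.frame(manager = 1:5,
                         country = c("US", "US", "UK", "UK", "UK"),
                         age     = c(32, 45, 25, NA, 99),
                         stringsAsFactors = FALSE)

over40 <- leadership$age > 40          # the comparison with NA is itself NA
over40

# Selecting rows with a logical vector: the NA produces a row filled with NA.
with_na <- leadership[over40, ]

# which() returns only the positions that are TRUE, so the NA is dropped.
without_na <- leadership[which(over40), ]

# Note the comma in both selections: without it, R would treat the
# index as a column selector instead of a row selector.
with_na
without_na
```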
30
The subset command
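A minimal sketch of how subset() could be used for the same kind of selection, assuming a trimmed, hypothetical copy of the leadership data. Inside subset() the column names can be written without the leadership$ prefix.

```r
# Trimmed, hypothetical copy of the leadership data.
leadership <- data.frame(manager = 1:5,
                         country = c("US", "US", "UK", "UK", "UK"),
                         gender  = c("M", "F", "F", "M", "F"),
                         age     = c(32, 45, 25, 39, 99),
                         stringsAsFactors = FALSE)

# Rows where age is over 40, keeping only two columns.
older <- subset(leadership, age > 40, select = c(manager, age))

# Rows that satisfy two conditions at once; all columns are kept.
uk_women <- subset(leadership, country == "UK" & gender == "F")

older
uk_women
```

subset() is convenient for interactive work; the bracket notation from the previous slides does the same job and is what you will meet in most example code.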
31
Manipulating data
Setting outlier to missing value
    leadership$age[leadership$age == 99] <- NA
  (the left side selects what to update, the right side assigns the new value)
Locating observations with missing data
    leadership[is.na(leadership$age),]
32
Creating variables
Select a non-existing variable
    leadership$agecat[leadership$age > 75] <- "Elder"
    leadership$agecat[leadership$age >= 55 & leadership$age <= 75] <- "Middle Aged"
    leadership$agecat[leadership$age < 55] <- "Young"
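The same categorisation can also be written as a single expression. This is an alternative technique, not the one shown on the slide: a sketch using nested ifelse() on a hypothetical vector of ages that includes the boundary values 55 and 75.

```r
# Hypothetical ages, including the boundary values and one missing value.
age <- c(32, 45, 25, 39, 99, 55, 75, NA)

# Nested ifelse(): the first condition that is TRUE decides the label.
agecat <- ifelse(age > 75, "Elder",
          ifelse(age >= 55, "Middle Aged", "Young"))

data.frame(age, agecat)
```

A missing age stays missing in agecat, because a comparison with NA is NA and ifelse() passes it through.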
33
Renaming variables
Assign new values to names
    names(leadership)[1] <- "managerID"
A better approach with the reshape package
    library(reshape)
    leadership <- rename(leadership, c(manager="managerID", date="testDate"))
Recreate the leadership dataset after trying these
34
Sorting datasets
Get the order of values
    order(leadership$age)
    order(-leadership$age)
Sort by selecting with the order
    leadership[order(leadership$age),]
If you want to keep the new order, store the result with the same name
    leadership <- leadership[order(leadership$age),]
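order() also accepts several sort keys. The sketch below, on a trimmed and hypothetical copy of the leadership data, first shows what order() returns and then sorts by country with the oldest manager first within each country.

```r
# Trimmed, hypothetical copy of the leadership data.
leadership <- data.frame(manager = 1:5,
                         country = c("US", "US", "UK", "UK", "UK"),
                         age     = c(32, 45, 25, 39, 99),
                         stringsAsFactors = FALSE)

# order() returns row positions, not sorted values:
# the youngest manager is in row 3, the next youngest in row 1, and so on.
age_order <- order(leadership$age)
age_order

# Two keys: country ascending, then age descending (note the minus sign).
by_country_age <- leadership[order(leadership$country, -leadership$age), ]
by_country_age
```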
35
Merging datasets
Merge two datasets (add columns)
    hairColor <- cbind(gender = c("M","F"), hair = c("Blonde","Brunette"))
    merge(leadership, hairColor, by="gender")
Alternatively, you can use cbind if you know that the data are in the same order and have the same number of rows
Append datasets (add rows)
    rbind(leadership, leadership)
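What happens when the lookup table does not cover every key is worth checking before relying on merge(). This sketch uses hypothetical data frames; the lookup partial deliberately has no row for "F".

```r
# Hypothetical data: five managers and two lookup tables keyed by gender.
leadership <- data.frame(manager = 1:5,
                         gender  = c("M", "F", "F", "M", "F"),
                         stringsAsFactors = FALSE)
hairColor  <- data.frame(gender = c("M", "F"),
                         hair   = c("Blonde", "Brunette"),
                         stringsAsFactors = FALSE)
partial    <- data.frame(gender = "M", hair = "Blonde",
                         stringsAsFactors = FALSE)

# Every manager finds a match, so all five rows survive.
merged <- merge(leadership, hairColor, by = "gender")

# Default merge keeps only rows with a match in both tables.
inner <- merge(leadership, partial, by = "gender")

# all.x = TRUE keeps every row of the first table and fills gaps with NA.
left <- merge(leadership, partial, by = "gender", all.x = TRUE)

nrow(merged); nrow(inner); nrow(left)
```

Comparing the row count before and after a merge is a cheap way to notice that observations were silently dropped.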
36
Applying what we just went through
Hints:
  Use scale to standardize Math, Science, and English
  Use quantile to calculate grade cutoffs
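To show how the two hinted functions behave, here is a small sketch on made-up numbers. The roster data frame, its values, and the column names score and top are hypothetical and are not the exercise data.

```r
# Hypothetical test scores on three very different scales.
roster <- data.frame(student = c("A", "B", "C", "D", "E"),
                     Math    = c(502, 600, 412, 358, 495),
                     Science = c(95, 99, 80, 82, 75),
                     English = c(25, 22, 18, 15, 20),
                     stringsAsFactors = FALSE)

# scale() centres each column at 0 and rescales it to standard deviation 1.
z <- scale(roster[, c("Math", "Science", "English")])

# After standardizing, the three subjects can be averaged into one score.
roster$score <- rowMeans(z)

# quantile() returns the score values at the requested percentiles.
cutoffs <- quantile(roster$score, c(0.8, 0.6, 0.4, 0.2))
cutoffs

# Example use of one cutoff: flag students at or above the 80th percentile.
roster$top <- roster$score >= cutoffs["80%"]
roster
```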
37
Basics of exploratory data analysis
39
Statistical functions
40
Applying functions to data frames
    ?apply
    apply(mtcars, 2, mean)
    apply(mtcars, 2, sd)
    apply(mtcars, 2, quantile)
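The second argument of apply() decides the direction of the calculation. A short sketch on the built-in mtcars data, comparing it with two base R shortcuts that give the same column means:

```r
# MARGIN = 2: apply the function to each column (one result per variable).
col_means <- apply(mtcars, 2, mean)

# MARGIN = 1: apply the function to each row (one result per car).
row_means <- apply(mtcars, 1, mean)

length(col_means)   # number of variables
length(row_means)   # number of cars

# For column means, sapply() and colMeans() are common alternatives.
same_as_sapply   <- isTRUE(all.equal(col_means, sapply(mtcars, mean)))
same_as_colMeans <- isTRUE(all.equal(col_means, colMeans(mtcars)))
```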
41
More convenient way to get descriptive statistics using the psych package
    describe(mtcars)
    describeBy(mtcars, group = mtcars$cyl)
42
Frequency tables (7.2)
    table(mtcars$cyl)
    table(mtcars$gear)
    table(mtcars$gear, mtcars$cyl)
    prop.table(table(mtcars$gear, mtcars$cyl))
    prop.table(table(mtcars$gear, mtcars$cyl), 1)
    prop.table(table(mtcars$gear, mtcars$cyl), 2)
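The second argument of prop.table() controls what the proportions add up to. A small check on the built-in mtcars data (the object names are chosen for this example only):

```r
# Cross-tabulation of gears (rows) by cylinders (columns).
tab <- table(gear = mtcars$gear, cyl = mtcars$cyl)

p_all <- prop.table(tab)      # proportions of the grand total
p_row <- prop.table(tab, 1)   # proportions within each row (gear)
p_col <- prop.table(tab, 2)   # proportions within each column (cyl)

sum(p_all)       # the whole table sums to 1
rowSums(p_row)   # each row sums to 1
colSums(p_col)   # each column sums to 1

# addmargins() appends row and column totals to the count table.
addmargins(tab)
```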
43
Correlations (7.3)
    cor(mtcars)
    lowerCor(mtcars)
    corr.test(mtcars)
44
Basics of graphics
45
Plot example (3.1 Working with Graphs)
    plot(mtcars$wt, mtcars$mpg)
    abline(lm(mpg~wt, data = mtcars))
    title("Regression of MPG on Weight")
46
Examples
Browse graph examples at: http://shinyapps.org/apps/RGraphCompendium/index.php
47
Exporting graphics as files
    pdf("myGraph.pdf")
    plot(mtcars$wt, mtcars$mpg)
    abline(lm(mpg~wt, data = mtcars))
    title("Regression of MPG on Weight")
    dev.off()
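The export follows an open, draw, close pattern: nothing is written to disk until dev.off() closes the device. The sketch below writes to a temporary directory and sets a page size; the path out_file and the width and height values are assumptions for illustration.

```r
# Write to a temporary directory so that no existing file is overwritten.
out_file <- file.path(tempdir(), "myGraph.pdf")

pdf(out_file, width = 6, height = 4)     # open the PDF device (size in inches)
plot(mtcars$wt, mtcars$mpg)              # everything drawn now goes to the file
abline(lm(mpg ~ wt, data = mtcars))
title("Regression of MPG on Weight")
dev.off()                                # close the device; the file is complete

file.exists(out_file)
```

Forgetting dev.off() is a common reason for an empty or unreadable output file.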
48
Kernel density plot
    plot(density(mtcars$mpg))
49
Scatter plot matrix
    scatterplotMatrix(mtcars)
50
scatterplotMatrix(mtcars[,1:3])
51
Aggregating and restructuring data
52
Aggregating data
    ?aggregate
    aggregate(mtcars, mtcars$cyl, mean)    # gives an error: 'by' must be a list
    aggregate(mtcars, list(mtcars$cyl), mean)
    aggregate(mtcars, list(cyl = mtcars$cyl), mean)
    aggregate(mtcars, list(cyl = mtcars$cyl, mtcars$gear), mean)
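aggregate() also has a formula interface, which is a different calling style from the list-based calls above and avoids the need to wrap grouping variables in list(). A sketch on the built-in mtcars data, with object names chosen for this example:

```r
# Mean fuel economy for each number of cylinders.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mpg_by_cyl

# Several outcome variables and two grouping variables at once.
by_cyl_gear <- aggregate(cbind(mpg, hp) ~ cyl + gear, data = mtcars, FUN = mean)
by_cyl_gear
```

Only combinations of cyl and gear that actually occur in the data appear as rows in the result.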
53
Reshaping data using reshape2 package
54
Reshape
    dw <- data.frame(id = 1001:1004, y_1 = 1:4, y_2 = 11:14, x_1 = 1:4, x_2 = 11:14, w = 1:4)
    library(reshape2)
    dm <- melt(dw, measure.vars = c("y_1","y_2","x_1","x_2"))
    ds <- colsplit(dm$variable, pattern="_", names = c("variable", "time"))
    dm <- cbind(dm[,-3], ds)
    dl <- dcast(dm, ... ~ variable)
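For comparison, the same wide-to-long conversion can be sketched with the reshape() function that ships with base R. This is a swapped-in alternative to the reshape2 approach on the slide, shown only so the result can be inspected without installing a package.

```r
# Same wide data as on the slide: y and x are each measured at times 1 and 2.
dw <- data.frame(id = 1001:1004,
                 y_1 = 1:4, y_2 = 11:14,
                 x_1 = 1:4, x_2 = 11:14,
                 w = 1:4)

# sep = "_" tells reshape() to split names such as y_1 into variable "y" and time 1.
long <- reshape(dw,
                direction = "long",
                varying   = c("y_1", "y_2", "x_1", "x_2"),
                sep       = "_",
                idvar     = "id")

long          # one row per id and time; w is repeated for each time
nrow(long)
```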
55
Simple simulations
56
Generating random numbers
Throw ten dice
    sample(1:6, 10, replace = TRUE)
Generate ten standard normal variables (mean = 0, SD = 1)
    rnorm(10)
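Random draws differ on every run unless the random number generator is seeded. A minimal sketch showing set.seed(); the seed value 123 is an arbitrary choice.

```r
# Setting the same seed before each draw reproduces the same "random" result.
set.seed(123)
first <- sample(1:6, 10, replace = TRUE)

set.seed(123)
second <- sample(1:6, 10, replace = TRUE)

identical(first, second)

# Without resetting the seed, the next draw continues the sequence and differs.
third <- rnorm(10)
```

Putting set.seed() at the top of a command file makes a simulation reproducible for whoever reruns it.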
57
Effects of model misspecification on regression
    x1 <- rnorm(1000)
    x2 <- x1 + rnorm(1000)
    y <- x1 + x2 + rnorm(1000)
    lm(y ~ x1 + x2)
    lm(y ~ x1)
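The size of the bias can be worked out from the data-generating code: x2 equals x1 plus noise, so substituting it into y gives y = 2*x1 + noise. Leaving x2 out should therefore push the x1 coefficient from about 1 to about 2. The sketch below repeats the simulation with a fixed seed (an arbitrary choice) and stores the coefficients for comparison.

```r
set.seed(1)                    # arbitrary seed, for reproducibility only
n  <- 1000
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)            # x2 is correlated with x1 by construction
y  <- x1 + x2 + rnorm(n)       # both true coefficients are 1

full  <- coef(lm(y ~ x1 + x2)) # correctly specified model
short <- coef(lm(y ~ x1))      # x2 omitted: x1 also picks up the effect of x2

round(full, 2)
round(short, 2)
```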
58
Mean of ten dice
    dice <- sample(1:6, 10, replace = TRUE)
    mean(dice)
    reps <- replicate(10000, {
      dice <- sample(1:6, 10, replace = TRUE)
      mean(dice)
    })
    plot(density(reps))
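The simulated distribution can be checked against theory. A single fair die has mean 3.5 and variance 35/12, so the mean of ten dice should have mean 3.5 and standard deviation sqrt((35/12)/10). A sketch with a fixed, arbitrary seed:

```r
set.seed(42)                                   # arbitrary seed

# Repeat the experiment "mean of ten dice" 10000 times.
reps <- replicate(10000, mean(sample(1:6, 10, replace = TRUE)))

# Theoretical values for the mean of ten fair dice.
theory_mean <- 3.5
theory_sd   <- sqrt((35 / 12) / 10)

c(simulated = mean(reps), theory = theory_mean)
c(simulated = sd(reps),   theory = theory_sd)
```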
59
Loops and other basic automation
60
Loops and conditions
    for(counter in 1:10){
      if(counter == 5){
        print("Five")
      } else{
        print("Not five")
      }
    }
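Many loops of this kind can also be written without a loop, because R functions usually work on whole vectors. The sketch below stores the labels instead of printing them and compares the loop with a vectorized ifelse(); the names labels_loop and labels_vec are hypothetical.

```r
# Loop version: fill a result vector one element at a time.
labels_loop <- character(10)
for (counter in 1:10) {
  if (counter == 5) {
    labels_loop[counter] <- "Five"
  } else {
    labels_loop[counter] <- "Not five"
  }
}

# Vectorized version: ifelse() tests all ten values in one call.
labels_vec <- ifelse(1:10 == 5, "Five", "Not five")

identical(labels_loop, labels_vec)
```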
61
Conclusion
62
Getting started
1. Study R in Action
2. Search for online examples
3. Ask for help online (e.g. course forum)
   1. If you have a problem, it often helps to post your full analysis file or log: https://gist.github.com
4. Online courses
   1. https://www.datacamp.com/courses/free-introduction-to-r
63
http://www.ats.ucla.edu/stat/dae/