Before the class starts: 1) log in to a computer 2) start RStudio 3) download Intro.R from MyCourses 4) open Intro.R in RStudio 5) download “R in Action”

Presentation transcript:

Before the class starts: 1) log in to a computer 2) start RStudio 3) download Intro.R from MyCourses 4) open Intro.R in RStudio 5) download “R in Action” from Zotero and open it

Statistical software: SPSS, Stata, and R
Description. SPSS and Stata: command-driven statistical program. R: statistical programming environment that also allows interactive use.
Audience. SPSS: designed for corporate use. Stata: designed for researchers/scientists. R: designed to be general.
Documentation. SPSS: explains how to use SPSS. Stata: explains the analyses. R: points to original sources.
Availability. SPSS: installed on all Aalto computers? Stata: installed on all TUAS computers. R: installed on all Aalto computers.
Cost. SPSS: Aalto has a site license. Stata: student version $35. R: free.

My take on the software
I use Stata and R.
I am more productive with Stata in the tasks that it is designed for (and Stata has excellent documentation).
R is more flexible, better for data management, and better for making examples.
People in the DIEM department use mainly SPSS and Stata. Some are moving from SPSS to Stata, but no one moves the other way.
Students on my courses tend to slightly prefer R because they can install it (legally) on their home computers, and they do just fine with that.
But R is not the best choice for everyone. You cannot go wrong with Stata.

Datasets and command files
Datasets
  Observations on rows, variables on columns
  Stata works with one file at a time; R can work with multiple files at a time
  Manipulated with commands
  Data files are never edited!
Command files
  A sequence of data manipulation and analysis commands to be applied to the data
  Stores the logic of your analysis
  Should contain a lot of comments where you explain the logic

Using the software: menus vs. typing commands vs. command file
Menus
  Good for learning the program
  Good if you do not remember the command for a particular analysis
  (Lack of menus is one of the reasons why R has a steeper learning curve)
Typing commands
  This is normally the fastest way to explore the data and experiment with the analyses
Command file
  Should always be used for the analyses that you want to publish

Run Intro.R

Overview of the user interface

Compile a notebook

Introduction to R

1. Using the software as a calculator
2. Accessing and reading the documentation
3. Creating and running projects as analysis files
4. Loading and manipulating datasets (e.g. merging, sorting, filtering)
5. Basic exploratory data analysis including means, correlations, etc.
6. Basics of graphics
7. Generating data and running simple simulations
8. Creating loops in analysis files and other very basic automation

Packages
Install package “lmtest”
Load package “lmtest”
R in Action: 1.4 Packages
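The two steps on this slide can also be done from a script instead of the RStudio Packages pane. The following is only a sketch: it checks whether lmtest is present and prints a reminder rather than downloading anything, an assumption made here so that the script is safe to rerun.

```r
pkg <- "lmtest"

# requireNamespace() returns TRUE if the package is installed, without attaching it
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
  # Loading is needed once per R session
  library(lmtest)
} else {
  # Installing is needed once per computer and requires an internet connection
  message("Package not found. Run install.packages(\"lmtest\"), then library(lmtest).")
}
```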

Using R as calculator
Type this into the console, with explanation:
    100+2/3      Basic math
    (100+2)/3    You can use round brackets to group operations so that they are carried out first
    5*10^2       The symbol * means multiply, and ^ means "to the power", so this gives 5 times (10 squared), i.e. 500
    1/0          R knows about infinity (and minus infinity)
    0/0          Undefined results take the value NaN ("not a number")
    sqrt(4)      Square root function
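The calculator expressions can be collected into a short script so the results are stored and printed together. The variable names are illustrative, and the infinity example assumes the expression is 1/0.

```r
a       <- 100 + 2 / 3    # division is carried out before addition
b       <- (100 + 2) / 3  # brackets force the addition first
c2      <- 5 * 10^2       # ^ is evaluated before *
pos_inf <- 1 / 0          # Inf
neg_inf <- -1 / 0         # -Inf
not_num <- 0 / 0          # NaN
root    <- sqrt(4)        # square root function

print(c(a, b, c2, pos_inf, neg_inf, not_num, root))
```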

Using the help
Try the following: regression, lm
R in Action: Getting help
Read the section
Try the examples

Help page for lm

Working with datasets

Accessing built-in datasets
Load the packages “car” and “psych”
List available datasets
    data()
Datasets are accessed by their name
    mtcars
Inspect the dataset
    describe(mtcars)
    scatterplotMatrix(mtcars)

Loading CSV files
Load a dataset from UCLA website
    read.csv(“
Store the dataset with name
    myData <- read.csv(“
Print the dataset
    myData

Loading CSV files from your computer
R loads and saves files in the working directory
Download the datasets for Data Analysis Assignment 4 (optional) from MyCourses and unzip the file
Set your working directory to the directory where you unzipped the files and load the CSV file
    read.csv("Orbis_Export_1.csv")
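How the working directory interacts with read.csv can be tried without the course files. The sketch below writes a small CSV to a temporary directory and reads it back; the file name example.csv and its contents are hypothetical stand-ins for a downloaded dataset.

```r
# Hypothetical example data; stands in for a downloaded file
example_data <- data.frame(firm = c("A", "B", "C"),
                           revenue = c(10.5, 20.1, 7.3))

old_dir <- getwd()      # remember the current working directory
setwd(tempdir())        # switch to a scratch directory for this sketch

write.csv(example_data, "example.csv", row.names = FALSE)

# A relative file name is resolved against the working directory
loaded <- read.csv("example.csv")
print(loaded)

setwd(old_dir)          # restore the original working directory
```

In RStudio the same effect is available through Session > Set Working Directory, but setting it in the command file keeps the analysis reproducible.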

Setting up
Start a new R Script and copy-paste Listing 4.1 into the file
    manager <- c(1, 2, 3, 4, 5)
    date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09")
    country <- c("US", "US", "UK", "UK", "UK")
    gender <- c("M", "F", "F", "M", "F")
    age <- c(32, 45, 25, 39, 99)
    q1 <- c(5, 3, 3, 3, 2)
    q2 <- c(4, 5, 5, 3, 2)
    q3 <- c(5, 2, 5, 4, 1)
    q4 <- c(5, 5, 5, NA, 2)
    q5 <- c(5, 5, 2, NA, 1)
    leadership <- data.frame(manager, date, country, gender, age,
                             q1, q2, q3, q4, q5, stringsAsFactors=FALSE)

Selecting cases and variables (subsetting)
A data frame has two dimensions: rows and columns
    leadership[,]
The value on the left side of the comma selects rows, the value on the right side selects columns
    leadership[1,]
    leadership[,1]
    leadership[1,1]
Selecting with names
    leadership[,"date"]
    leadership$date
    leadership$date[1]

Creating vectors and selecting with vectors
A vector is a sequence of numbers or strings
    3:5
    c(1,2,4)
    c("gender","age")
Selecting with vectors
    leadership[3:5,]
    leadership[c(1,2,4), c("gender","age")]
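The indexing forms above can be tried on a reduced copy of the Listing 4.1 data (only four of the columns are kept here to keep the sketch short). The negative index in the last line is an addition that is not on the slides.

```r
leadership <- data.frame(
  manager = 1:5,
  country = c("US", "US", "UK", "UK", "UK"),
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  stringsAsFactors = FALSE
)

first_row  <- leadership[1, ]        # rows go before the comma: a one-row data frame
age_column <- leadership[, "age"]    # columns go after the comma: a plain vector
picked     <- leadership[c(1, 2, 4), c("gender", "age")]  # vectors select several at once
dropped    <- leadership[-1, ]       # a negative index drops row 1

print(picked)
```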

Comparisons
Comparisons return vectors of TRUE and FALSE
    leadership$age > 40
    leadership$age > 40 & leadership$country == "UK"
Converting from TRUE and FALSE to indices
    which(leadership$age > 40)

Selecting cases with comparison
    leadership[leadership$age > 40,]
People often forget the comma before the closing bracket
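A short sketch of selecting rows with a comparison, again on a reduced four-column copy of the leadership data. The last statement shows what happens when the comma is forgotten: the logical vector is then applied to columns rather than rows, which in this four-column copy raises an error (with more columns it can silently select the wrong thing instead).

```r
leadership <- data.frame(
  manager = 1:5,
  country = c("US", "US", "UK", "UK", "UK"),
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  stringsAsFactors = FALSE
)

over_40      <- leadership$age > 40            # logical vector, one value per row
rows_over_40 <- leadership[over_40, ]          # keep rows where the comparison is TRUE
uk_over_40   <- leadership[over_40 & leadership$country == "UK", ]
idx          <- which(over_40)                 # row numbers instead of TRUE/FALSE

# Forgetting the comma: the error message is captured instead of stopping the script
no_comma <- tryCatch(leadership[over_40],
                     error = function(e) conditionMessage(e))
print(no_comma)
```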

The subset command
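As an illustration of the subset command, the sketch below uses base R's subset() on a reduced copy of the leadership data. Inside subset(), column names can be written without the leadership$ prefix, and the select argument picks columns.

```r
leadership <- data.frame(
  manager = 1:5,
  country = c("US", "US", "UK", "UK", "UK"),
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  stringsAsFactors = FALSE
)

# Rows where age is above 40, all columns
older <- subset(leadership, age > 40)

# Rows matching two conditions, keeping only two columns
uk_women <- subset(leadership, country == "UK" & gender == "F",
                   select = c(manager, age))
print(uk_women)
```

subset() is convenient interactively; the bracket notation is usually the safer choice inside functions and loops.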

Manipulating data
Setting outlier to missing value (select what to update, then assign the new value)
    leadership[leadership$age == 99, "age"] <- NA
Locating observations with missing data
    leadership[is.na(leadership$age),]
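A minimal sketch of recoding the outlier, assuming the intent is to blank only the age value and leave the rest of the row intact. It uses a reduced copy of the leadership data; the na.rm argument in the last line shows how to compute a statistic once missing values are present.

```r
leadership <- data.frame(
  manager = 1:5,
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  stringsAsFactors = FALSE
)

# Select what to update on the left, assign the new value on the right
leadership$age[leadership$age == 99] <- NA

missing_rows  <- leadership[is.na(leadership$age), ]   # rows with a missing age
complete_rows <- leadership[!is.na(leadership$age), ]  # rows with a known age

mean_age <- mean(leadership$age, na.rm = TRUE)  # without na.rm the result would be NA
print(mean_age)
```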

Creating variables
Select a non-existing variable
    leadership$agecat[leadership$age > 75] <- "Elder"
    leadership$agecat[leadership$age >= 55 & leadership$age <= 75] <- "Middle Aged"
    leadership$agecat[leadership$age < 55] <- "Young"

Renaming variables
Assign new values to names
    names(leadership)[1] <- "managerID"
A better approach with the reshape package
    leadership <- rename(leadership, c(manager="managerID", date="testDate"))
Recreate the leadership dataset after trying these
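If the reshape package is not installed, renaming by name is also possible in base R. The sketch below matches on the old name instead of the column position, so it keeps working if the column order changes; it uses a reduced copy of the leadership data.

```r
leadership <- data.frame(
  manager = 1:5,
  date    = c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09"),
  age     = c(32, 45, 25, 39, 99),
  stringsAsFactors = FALSE
)

# Find the position of each old name and overwrite it with the new one
names(leadership)[names(leadership) == "manager"] <- "managerID"
names(leadership)[names(leadership) == "date"]    <- "testDate"

print(names(leadership))
```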

Sorting datasets
Get the order of values
    order(leadership$age)
    order(-leadership$age)
Sort by selecting with the order
    leadership[order(leadership$age),]
If you want to keep the new order, store the result with the same name
    leadership <- leadership[order(leadership$age),]
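order() also accepts several keys, which the slide does not show. The sketch below sorts a reduced copy of the leadership data by gender and, within gender, by descending age.

```r
leadership <- data.frame(
  manager = 1:5,
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  stringsAsFactors = FALSE
)

# order() returns row positions, not sorted values
positions <- order(leadership$age)

# Several keys: ties in the first key are broken by the second
by_gender_age <- leadership[order(leadership$gender, -leadership$age), ]
print(by_gender_age)
```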

Merging datasets
Merge two datasets (add columns)
    hairColor <- cbind(gender = c("M","F"), hair = c("Blonde","Brunette"))
    merge(leadership, hairColor, by = "gender")
Alternatively, you can use cbind if you know that the data are in the same order and have the same number of rows
Append datasets (add rows)
    rbind(leadership, leadership)
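The merge can be checked on a small scale. In this sketch the lookup table is built with data.frame rather than cbind (an assumption made here because it keeps the column types explicit), and the hair values are the illustrative ones from the slide.

```r
leadership <- data.frame(
  manager = 1:5,
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  stringsAsFactors = FALSE
)

hairColor <- data.frame(
  gender = c("M", "F"),
  hair   = c("Blonde", "Brunette"),
  stringsAsFactors = FALSE
)

# Each leadership row picks up the hair value whose gender matches
merged <- merge(leadership, hairColor, by = "gender")

# rbind appends rows; both data frames must have the same columns
stacked <- rbind(leadership, leadership)

print(merged)
```

Note that merge() sorts the result by the key column by default, so the row order differs from the original data frame.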

Applying what we just went through
Hints:
Use scale to standardize Math, Science, and English
Use quantile to calculate grade cutoffs
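The two hints can be sketched as follows. The student scores below are made up for illustration, and the grading rule (quintiles of the average standardized score) is an assumption about the exercise, not its official solution.

```r
# Hypothetical scores; the real exercise data come from the course material
scores <- data.frame(
  student = c("A", "B", "C", "D", "E"),
  Math    = c(502, 600, 412, 358, 495),
  Science = c(95, 99, 80, 82, 75),
  English = c(25, 22, 18, 15, 20),
  stringsAsFactors = FALSE
)

# scale() centers each column at 0 and rescales it to SD 1,
# which makes tests with different point ranges comparable
z <- scale(scores[, c("Math", "Science", "English")])
scores$combined <- rowMeans(z)

# quantile() gives the cutoffs that separate the grade bands
cutoffs <- quantile(scores$combined, c(0.8, 0.6, 0.4, 0.2))

scores$grade <- "F"
scores$grade[scores$combined >= cutoffs[4]] <- "D"
scores$grade[scores$combined >= cutoffs[3]] <- "C"
scores$grade[scores$combined >= cutoffs[2]] <- "B"
scores$grade[scores$combined >= cutoffs[1]] <- "A"

print(scores[order(-scores$combined), c("student", "combined", "grade")])
```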

Basics of exploratory data analysis

Statistical functions

Applying functions to data frames
    ?apply
    apply(mtcars, 2, mean)
    apply(mtcars, 2, sd)
    apply(mtcars, 2, quantile)
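The second argument of apply chooses the direction: 1 applies the function to each row, 2 to each column. The sketch below compares apply with two base R alternatives, colMeans and sapply, which are additions not shown on the slide.

```r
# 2 = apply the function to every column
col_means <- apply(mtcars, 2, mean)

# colMeans() is a shortcut for the same calculation
same <- all.equal(col_means, colMeans(mtcars))

# 1 = apply the function to every row (here: the larger of mpg and hp per car)
row_max <- apply(mtcars[, c("mpg", "hp")], 1, max)

# sapply() loops over columns and can return several statistics at once
stats <- sapply(mtcars, function(x) c(mean = mean(x), sd = sd(x)))
print(round(stats, 2))
```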

More convenient way to get descriptive statistics using the psych package
    describe(mtcars)
    describeBy(mtcars, group = mtcars$cyl)

Frequency tables (7.2)
    table(mtcars$cyl)
    table(mtcars$gear)
    table(mtcars$gear, mtcars$cyl)
    prop.table(table(mtcars$gear, mtcars$cyl))
    prop.table(table(mtcars$gear, mtcars$cyl), 1)
    prop.table(table(mtcars$gear, mtcars$cyl), 2)
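The second argument of prop.table decides what the proportions sum to: nothing gives shares of the whole table, 1 gives shares within each row, and 2 gives shares within each column. The sketch below verifies this; naming the dimensions and adding totals with addmargins are additions not on the slide.

```r
# Naming the arguments labels the table dimensions
tab <- table(gear = mtcars$gear, cyl = mtcars$cyl)

overall <- prop.table(tab)      # all cells sum to 1
by_row  <- prop.table(tab, 1)   # each row (gear) sums to 1
by_col  <- prop.table(tab, 2)   # each column (cyl) sums to 1

with_totals <- addmargins(tab)  # adds row and column sums
print(with_totals)
```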

Correlations (7.3)
    cor(mtcars)
    lowerCor(mtcars)
    corr.test(mtcars)

Basics of graphics

Plot example (3.1 Working with Graphs)
    plot(mtcars$wt, mtcars$mpg)
    abline(lm(mpg~wt, data = mtcars))
    title("Regression of MPG on Weight")

Examples Browse graph examples at:

Exporting graphics as files
    pdf("myGraph.pdf")
    plot(mtcars$wt, mtcars$mpg)
    abline(lm(mpg~wt, data = mtcars))
    title("Regression of MPG on Weight")
    dev.off()
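The export pattern is always the same: open a file device, draw, then close the device with dev.off(). The sketch below writes to a temporary directory (an assumption, so that nothing is left in the working directory); the width, height, and axis labels are illustrative additions.

```r
out_file <- file.path(tempdir(), "myGraph.pdf")

pdf(out_file, width = 6, height = 4)   # open the file device; size is in inches
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars))    # add the fitted regression line
title("Regression of MPG on Weight")
dev.off()                              # the file is only complete after this call

print(file.exists(out_file))
```

Forgetting dev.off() is the most common reason for an empty or unreadable output file.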

Kernel density plot
    plot(density(mtcars$mpg))

Scatter plot matrix
    scatterplotMatrix(mtcars)

    scatterplotMatrix(mtcars[,1:3])

Aggregating and restructuring data

Aggregating data
    ?aggregate
    aggregate(mtcars, mtcars$cyl, mean)    # fails: 'by' must be a list
    aggregate(mtcars, list(mtcars$cyl), mean)
    aggregate(mtcars, list(cyl = mtcars$cyl), mean)
    aggregate(mtcars, list(cyl = mtcars$cyl, mtcars$gear), mean)
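aggregate also has a formula interface, which is not shown on the slide and is often easier to read: the variables to summarize go on the left of ~ and the grouping variables on the right. A minimal sketch with mtcars:

```r
# Mean mpg for each number of cylinders
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Several outcome variables and several grouping variables
by_cyl_gear <- aggregate(cbind(mpg, hp) ~ cyl + gear, data = mtcars, FUN = mean)

print(mpg_by_cyl)
print(by_cyl_gear)
```

Only combinations of cyl and gear that actually occur in the data appear in the result.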

Reshaping data using reshape2 package

Reshape
    dw <- data.frame(
      id = 1001:1004,
      y_1 = 1:4, y_2 = 11:14,
      x_1 = 1:4, x_2 = 11:14,
      w = 1:4)
    library(reshape2)
    dm <- melt(dw, measure.vars = c("y_1","y_2","x_1","x_2"))
    ds <- colsplit(dm$variable, pattern = "_", names = c("variable", "time"))
    dm <- cbind(dm[,-3], ds)
    dl <- dcast(dm, ... ~ variable)
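For comparison, the same wide-to-long conversion can be sketched with base R's reshape() function instead of reshape2. This is a swapped-in alternative, not the approach on the slide; the output column names y, x, and time are chosen here to mirror the slide's result.

```r
dw <- data.frame(
  id  = 1001:1004,
  y_1 = 1:4, y_2 = 11:14,
  x_1 = 1:4, x_2 = 11:14,
  w   = 1:4)

dl_base <- reshape(
  dw,
  direction = "long",
  varying   = list(c("y_1", "y_2"), c("x_1", "x_2")),  # one group per long variable
  v.names   = c("y", "x"),   # names of the long variables
  timevar   = "time",        # new column holding the time index
  times     = 1:2,
  idvar     = "id")          # identifies the unit each row belongs to

print(dl_base)
```

The time-invariant column w is simply repeated for each time point.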

Simple simulations

Generating random numbers
Throw ten dice
    sample(1:6, 10, replace = TRUE)
Generate ten standard normal variables (mean = 0, SD = 1)
    rnorm(10)

Effects of model misspecification on regression
    x1 <- rnorm(1000)
    x2 <- x1 + rnorm(1000)
    y <- x1 + x2 + rnorm(1000)
    lm(y ~ x1 + x2)
    lm(y ~ x1)
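What to look for in this simulation: because x2 = x1 + noise, substituting into y = x1 + x2 + noise gives y = 2·x1 + noise, so leaving x2 out should push the x1 coefficient from about 1 to about 2. The sketch below adds a seed (an assumption, for reproducibility) and extracts the coefficients for comparison.

```r
set.seed(1)   # fixed seed so the run can be repeated; the value is arbitrary
n <- 1000

x1 <- rnorm(n)
x2 <- x1 + rnorm(n)        # x2 is correlated with x1
y  <- x1 + x2 + rnorm(n)   # true coefficients are 1 and 1

full  <- coef(lm(y ~ x1 + x2))   # correctly specified model
short <- coef(lm(y ~ x1))        # x2 omitted: x1 absorbs its effect

print(round(full, 2))
print(round(short, 2))
```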

Mean of ten dice
    dice <- sample(1:6, 10, replace = TRUE)
    mean(dice)
    reps <- replicate(10000, {
      dice <- sample(1:6, 10, replace = TRUE)
      mean(dice)
    })
    plot(density(reps))
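The simulated distribution can be checked against theory. A single fair die has mean 3.5 and variance 35/12, so the mean of ten dice has mean 3.5 and standard deviation sqrt(35/12/10), roughly 0.54. The sketch below compares these values with the simulation; the seed is an assumption added for reproducibility.

```r
set.seed(42)   # arbitrary seed so the result can be reproduced

reps <- replicate(10000, {
  dice <- sample(1:6, 10, replace = TRUE)
  mean(dice)
})

theory_mean <- 3.5
theory_sd   <- sqrt(35 / 12 / 10)   # SD of one die divided by sqrt(10)

print(c(simulated_mean = mean(reps), theory_mean = theory_mean))
print(c(simulated_sd = sd(reps), theory_sd = theory_sd))
```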

Loops and other basic automation

Loops and conditions
    for(counter in 1:10){
      if(counter == 5){
        print("Five")
      } else{
        print("Not five")
      }
    }
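In an analysis file the result of a loop is usually stored rather than printed. The sketch below fills a vector inside the loop and then shows the vectorized equivalent with ifelse, which is an addition not on the slide; the variable names are illustrative.

```r
# Pre-allocate the result vector, then fill it one element per iteration
labels <- character(10)
for (counter in 1:10) {
  if (counter == 5) {
    labels[counter] <- "Five"
  } else {
    labels[counter] <- "Not five"
  }
}

# ifelse() applies the same condition to the whole vector at once
vectorized <- ifelse(1:10 == 5, "Five", "Not five")

print(labels)
```

For simple element-wise conditions the vectorized form is shorter and faster; loops remain useful when each step depends on the previous one.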

Conclusion

Getting started
1. Study R in Action
2. Search for online examples
3. Ask for help online (e.g. course forum)
   1. If you have a problem, it often helps to post your full analysis file or log
Online courses
1. courses/free-introduction-to-r