ALEXANDER C. LOPILATO R: Because the names of other stat programs don’t make sense so why should this one?

Slides:



Advertisements
Similar presentations
Statistical Software Packages: How do I get this into that? Gillian Byrne Memorial University of Newfoundland Atlantic DLI Training - April 23, 2004.
Advertisements

Introduction to R Brody Sandel. Topics Approaching your analysis Basic structure of R Basic programming Plotting Spatial data.
R for Macroecology Aarhus University, Spring 2011.
MATLAB – A Computational Methods By Rohit Khokher Department of Computer Science, Sharda University, Greater Noida, India MATLAB – A Computational Methods.
The INFILE Statement Reading files into SAS from an outside source: A Very Useful Tool!
Statistical Methods Lynne Stokes Department of Statistical Science Lecture 7: Introduction to SAS Programming Language.
Knowing Understanding the Basics Writing your own code part 2 SAS Lab.
Basics of Using R Xiao He 1. AGENDA 1.What is R? 2.Basic operations 3.Different types of data objects 4.Importing data 5.Basic data manipulation 2.
Intro to R Stephanie Lee Dept of Sociology, CSSCR University of Washington September 2009.
R for Research Data Analysis using R Day1: Basic R Baburao Kamble University of Nebraska-Lincoln.
Stata Introduction Sociology 229A, Class 2 Copyright © 2008 by Evan Schofer Do not copy or distribute without permission.
Good Data Management Practices Patty Glynn 10/31/05
Introduction to Array The fundamental unit of data in any MATLAB program is the array. 1. An array is a collection of data values organized into rows and.
Introduction to R. Statistical Software Statistical software – Wide variety of software tools that researchers use to analyze data – Common examples are.
Digital Text and Data Processing Introduction to R.
STAT 3130 Statistical Methods II Missing Data and Imputation.
 Overview of SPSS  Interface  Getting Started  Managing Data  Descriptive Statistics  Basic Analysis  Additional Resources.
732A44 Programming in R.  Self-studies of the course book  2 Lectures (1 in the beginning, 1 in the end)  Labs (computer). Compulsory submission of.
Hands-on Introduction to R. Outline R : A powerful Platform for Statistical Analysis Why bother learning R ? Data, data, data, I cannot make bricks without.
Data, graphics, and programming in R 28.1, 30.1, Daily:10:00-12:45 & 13:45-16:30 EXCEPT WED 4 th 9:00-11:45 & 12:45-15:30 Teacher: Anna Kuparinen.
1 Experimental Statistics - week 4 Chapter 8: 1-factor ANOVA models Using SAS.
Introduction to to R Emily Kalah Gade University of Washington Credit to Kristin Siebel for development of much of this PowerPoint.
Introduction to R Part 2. Working Directory The working directory is where you are currently saving data in R. What is the current working directory?
Analysis of Variance (Two Factors). Two Factor Analysis of Variance Main effect The effect of a single factor when any other factor is ignored. Example.
Introduction to STATA for Clinical Researchers Jay Bhattacharya August 2007.
ARGUMENT 'FUN' IS MISSING, WITH NO DEFAULT: AN R WORKSHOP.
Questionnaire Development: SPSS and Reliability Personality Lab October 8, 2010.
Presented by : Sébastien Lauzon (Finance Canada).
Hands-on Introduction to R. We live in oceans of data. Computers are essential to record and help analyse it. Competent scientists speak C/C++, Java,
ISU Basic SAS commands Laboratory No. 1 Computer Techniques for Biological Research Animal Science 500 Ken Stalder, Professor Department of Animal Science.
Contact: Phil Benjamin: Web site for this presentation: eden.rutgers.edu/~pmben Office hour: Wed:
United Nations Economic Commission for Europe Statistical Division The Importance of Databases in the Dissemination Process Steven Vale, UNECE.
Introduction to Programming in R Department of Statistical Sciences and Operations Research Computation Seminar Series Speaker: Edward Boone
MGT-491 QUANTITATIVE ANALYSIS AND RESEARCH FOR MANAGEMENT OSMAN BIN SAIF Session 26.
Class Opener:. Identifying Matrices Student Check:
A Simple Guide to Using SPSS ( Statistical Package for the Social Sciences) for Windows.
1 Data Manipulation (with SQL) HRP223 – 2010 October 13, 2010 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This.
Introduction to R October 10, 2008 Please go to: Find the workshop materials for this course Download them.
28. Multiple regression The Practice of Statistics in the Life Sciences Second Edition.
PSC 47410: Data Analysis Workshop  What’s the purpose of this exercise?  The workshop’s research questions:  Who supports war in America?  How consistent.
An Introduction Katherine Nicholas & Liqiong Fan.
Computing with SAS Software A SAS program consists of SAS statements. 1. The DATA step consists of SAS statements that define your data and create a SAS.
Visit us on the Web at UM Stats Camp Intro to SPSS for Windows Sam Gordji Spring 2009
Social Science Research Design and Statistics, 2/e Alfred P. Rovai, Jason D. Baker, and Michael K. Ponton Selecting Cases PowerPoint Prepared by Alfred.
Chapter 3-4 More R functions Graphs!. Random note The package DSUR from the Field book is not a thing. ◦ That’s ok! We’ll figure it out.
BMTRY 789 Lecture9: Proc Tabulate Readings – Chapter 11 & Selected SUGI Reading Lab Problems , 11.2 Homework Due Next Week– HW6.
Analyzing Data. Learning Objectives You will learn to: – Import from excel – Add, move, recode, label, and compute variables – Perform descriptive analyses.
The Research Process First, Collect data and make sure that everything is coded properly, things are not missing. Do this for whatever program your using.
Chris Knight Beginners’ workshop.
Basics of R INSTRUCTOR: AMANDA MCGOUGH TUESDAY, MARCH 29, 2016.
Conjoint Analysis. 1. Managers frequently want to know what utility a particular product feature or service feature will have for a consumer. 2. Conjoint.
With the support of the LPP programme of the European Union 1 This project has been funded with support from the European Commission. This publication.
Introduction to R Dr. Satish Nargundkar. What is R? R is a free software environment for statistical computing and graphics. It compiles and runs on a.
Understanding SPSS II Workshop Series August 9, 2016.
Introduction to R user-friendly and absolutely free
Categorical Variables in Regression
SPSS: Using statistical software — a primer
Development Environment
Understanding SPSS II Workshop Series July 19, 2017.
Statistics in SPSS Lecture 2
M.Sc. In Financial Analysis
ECONOMETRICS ii – spring 2018
SPSS STATISTICAL PACKAGE FOR SOCIAL SCIENCES
Lab 1 Introductions to R Sean Potter.
Bivariate Testing (Chi Square)
Communication and Coding Theory Lab(CS491)
funCTIONs and Data Import/Export
Data Analysis Module: Chi Square
Understand the importance of data context
Ordinary Least Square estimator using STATA
Presentation transcript:

ALEXANDER C. LOPILATO R: Because the names of other stat programs don’t make sense so why should this one?

Outline The three Ws of R: What, Where, and Why Commonly used operators Formatting your data for R Working with data in R Exporting data from R

What is R? “R is a language and environment for statistical computing and graphics.” ( It’s a programming language first and a statistical analysis tool second Entirely syntax based – Similar to the SAS and SPSS syntax User can download “packages” which are similar to SPSS Modules

Where is R? Available for download at:  Works on PCs, Macs, and Linux OS Doesn’t require a ton of computer memory (I find that it runs smoother than both SAS and SPSS)

Why use R? It’s an open-source project aka FREE! It’s gaining traction in both industry (Google, Facebook, & Kickstarter) and academia It’s combination of programming flexibility and statistical analyses capabilities makes it one of the more powerful data analysis programs out there

Commonly used operators <- :Assignment operator # :Comment operator >, <, ==, | :Boolean operators +, -, *, ^ :Mathematical operators

Formatting your data for R: A Brief Intro R can read SAS, SPSS, STATA, txt files, and csv files I recommend that you store your data in a csv file  R can easily read csv files  Csv files can be imported to and exported from SAS and SPSS  Other statistical programs can easily read csv files I write all of my code in notepad (more habit than anything else), but R has many different GUIs

Formatting your data for R: Three easy steps 1) Turn your data file into a csv file 2) Use the read.csv() function  Dataset <- read.csv(‘Dataset location.csv’) 3) Dataset is now a user-defined object (in this particular case it’s a dataframe in R) that contains all of your data

Formatting your data for R: Common Mistakes (That I’ve made 100 times over) R cannot read \ (the backslash), thus when you write the location of your dataset you have to use either / or \\ R is case sensitive, so ‘C:\\Dataset.csv’ and ‘C:\\dataset.csv’ are not the same in R speak Always make sure you include the file extension (.csv,.txt,.whatever)!

Working with data in R: Things to check I always check the dimensions of my dataset  dim(Dataset) – this will return two numbers: row x column. Rows = number of cases and columns = number of variables Check the names of your dataset  names(Dataset) Check the descriptive statistics for anything out of the ordinary:  colMeans(Dataset[,1:10],na.rm=T)  sapply(Dataset[,1:10],sd,na.rm=T) Notice the brackets?

Working with data in R: Subsetting your dataset First, begin thinking about your dataset as a matrix  Rows = cases and columns = variables Dataset[5,1] means return the observation stored in row 5 column 1 Dataset[,1] means return all of the rows in column 1 Dataset[2,1:5] means return all of the observations in row 2 and columns 1 through 5

Working with data in R: Subsetting your dataset Alternatively, you can reference a column directly by using the $ operator:  Dataset$Var1 will return the entire Var1 column from Dataset What if I want to filter by some variable?  ds.Female <- Dataset[Dataset$Var11 == ‘Female’,]  The above creates a dataframe called ds.Female that filtered out any case where Var11 equaled ‘Male’

Working with data in R: Reverse Coding What do you do if you have some variables that need to be reverse coded?  (1 + highest scale value – Variable) is the general formula  Dataset$Var12 <- 8 – Dataset$Var10 – This does two things. 1) Creates another column in Dataset labeled Var12 and 2) Sets Var12 equal to 8 – Var10  Check with cor(Dataset$Var10, Dataset$Var12, use=‘complete.obs’)

Working with data in R: Internal R Functions mean(Dataset$Var1,na.rm=T) = mean(Dataset[,1],na.rm=T) sd(Dataset$Var4, na.rm=T) min(Dataset$Var5,na.rm=T) and max(Dataset$Var5, na.rm=T) cor(Dataset, use=‘complete.obs’)

Working with data in R: Internal R Functions modlm <- lm(Var2 ~ Var3, data=Dataset)  Ordinary Least Squares Regression, regressing variable 2 onto variable 3 modanova <- lm(Var4 ~ as.factor(Var11), data=Dataset)  OLS Regression, regressing variable 4 onto the categorical gender variable – This is an ANOVA! modanova1 <- aov(Var4 ~ as.factor(Var11), data=Dataset)  aov is R’s built in ANOVA function dif <- TukeyHSD(modanova1)  Tukey’s Honestly Significant Difference Test

Exporting Data from R write.csv(Dataset, ‘Location.csv’) BOOM goes the dynamite

Thank you!