Intro to R Stephanie Lee Dept of Sociology, CSSCR University of Washington September 2009.


Class Outline
I. What is R?
II. The R Environment
III. Reading in Data
IV. Viewing and Manipulating Data
V. Data Analysis

What is R? R is frequently thought of as another statistics package, like SPSS, Stata or SAS. While many people use R for statistical analysis, R is actually a full programming environment.

What is R? R is completely command-driven. There are very few menu items, so you must use the R language to do anything. Another important distinction between traditional stats packages and R is that R is object-oriented.

Why Use R?
  Free!
  Extremely flexible
  Many additional packages available
  Excellent graphics
Disadvantages
  Steep learning curve
  Difficult data entry

Download R Download R: Available for Linux, MacOS, and Windows

The R Environment A traditional stats program like SPSS or Stata only contains one rectangular dataset at a time. All analysis is done on the current dataset. In contrast, the R environment is like a sandbox. It can contain a large number of different objects.

The R Environment R is also function-driven. The functions act on objects and return objects. Functions themselves are objects, too!
Input arguments (objects) -> function works its black-box magic! -> Output (objects)
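
As a minimal sketch of the idea that functions take objects in, return objects, and are objects themselves (the names x, m, avg, and results are made up for illustration):

```r
# A function takes objects as input and returns an object as output.
x <- c(2, 4, 6)      # input object: a numeric vector
m <- mean(x)         # output object: a single number (4)

# Functions are objects too, so they can be given another name...
avg <- mean
same <- avg(x)

# ...or passed as an argument to another function.
results <- sapply(list(a = 1:3, b = 4:6), avg)

print(class(mean))   # "function"
print(results)
```

Here sapply() receives the function avg as an ordinary argument and applies it to each element of the list.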

Rectangular Dataset (Excel, SPSS, Stata, SAS)
          Variable 1   Variable 2   Variable 3
Case 1
Case 2
Case 3
Case 4
Case 5

R Environment (Object-Oriented)
Function 1, Function 2, Results, Vector 1, Vector 2, Matrix, Data Frame, String, Numeric Value

Help Function
help(function name)
help.search("search term")
Note: R is case-sensitive!
Try: help(help), ls()

Help Function Sometimes one help file will contain information for several functions. Usage: Shows syntax for command and required arguments (input) and any default values for arguments. Value: the output object of the function
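
The defaults listed under Usage in a help file can also be inspected from code. The sketch below uses read.csv() as the example function; the variable name defaults is made up for illustration.

```r
# help(read.csv) or ?read.csv opens the help file in an interactive session.
# args() prints the same argument list that appears under "Usage".
args(read.csv)

# formals() returns the arguments and their default values as a list.
defaults <- formals(read.csv)
print(defaults$header)   # TRUE: the first line is treated as column names
print(defaults$sep)      # ",": fields are separated by commas
```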

Setting Up Our Data
> library(datasets)
> mtcars
> ?mtcars
> write.csv(mtcars, "C:/temp/cars.csv")

Creating Objects
Assignment operator: = or <-
Objects need to be assigned a name; otherwise they are printed to the console and not saved to the environment.
c() is a useful function for creating vectors.
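
A short sketch of the difference between assigning a result and just evaluating it (heights and names_vec are hypothetical object names):

```r
# Assigned: the vector is stored in the environment under the name "heights".
heights <- c(150, 162.5, 171)

# "=" does the same job at the top level.
names_vec = c("Ann", "Bo", "Cy")

# Not assigned: the result is printed to the console and then discarded.
c(1, 2, 3)

# ls() lists the objects currently saved in the environment.
print(ls())
```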

Reading in Data
read.table(filename, ...)
> cars = read.csv("C:/temp/cars.csv")
I prefer the CSV (comma-separated values) format. Almost every stats program will export to this format.
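
The write-then-read round trip can be tried without a C:/temp folder. This sketch assumes a temporary directory is acceptable and uses tempdir() in place of the path on the slides; path is a made-up variable name.

```r
# Build a file path inside R's temporary directory (assumption: any
# writable location works just as well as C:/temp).
path <- file.path(tempdir(), "cars.csv")

# Write the built-in mtcars data to CSV, then read it back in.
write.csv(mtcars, path)
cars <- read.csv(path)

# The row names (car models) come back as an extra first column named "X".
print(dim(cars))            # 32 rows, 12 columns
print(colnames(cars)[1:3])  # "X" "mpg" "cyl"
```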

Viewing Data
What does the dataset look like?
> str(cars)
> colnames(cars)
> dim(cars)
> nrow(cars)
> ncol(cars)
You can also assign row and column names with rownames() and colnames().
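
A brief sketch of inspecting a data frame and then assigning names. It assumes the built-in mtcars data is used directly (so there is no "X" column), and the new names are arbitrary examples.

```r
cars <- mtcars

str(cars)                        # structure: one line per column
print(dim(cars))                 # 32 rows, 11 columns
print(head(colnames(cars), 3))   # "mpg" "cyl" "disp"

# colnames() and rownames() also work on the left side of an assignment.
colnames(cars)[1] <- "miles.per.gallon"
rownames(cars)[1] <- "first car"
print(cars[1:2, 1:3])
```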

Common Mode Types
Mode        Possible Values
Logical     TRUE or FALSE or NA
Integer     Whole numbers
Numeric     Real numbers
Character   Single character or string (in double quotes)

Common Object Types
Object       Modes                        More than one mode?
vector       Logical, Char, or Numeric    No
factor       Logical, Char, or Numeric    No
matrix       Logical, Char, or Numeric    No
data frame   Logical, Char, and Numeric   Yes

Creating Objects
Object       Create Function
vector       c(), vector()
factor       factor()
matrix       matrix()
data frame   data.frame()
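
To make the two tables concrete, here is a small sketch that builds one object of each type with invented values. It also shows what the "more than one mode" column means in practice.

```r
v  <- c(1, 2, 3)                           # numeric vector
f  <- factor(c("low", "high", "low"))      # factor with two levels
m  <- matrix(1:6, nrow = 2)                # 2 x 3 integer matrix
df <- data.frame(id    = 1:3,
                 label = c("a", "b", "c"),
                 ok    = c(TRUE, FALSE, TRUE))

# A vector holds only one mode, so mixed input is coerced to a common one.
mixed <- c(1, "a", TRUE)
print(class(mixed))        # "character"

# A data frame keeps a separate mode for each column.
print(sapply(df, class))
```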

Viewing Data: Indexing
datasetname[rownum, columnnum]
> cars[1,4]     displays value at row 1, column 4
> cars[2:5, 6]  displays rows 2-5, column 6

Viewing Data: Indexing
> cars[, 2]  displays all rows, column 2
> cars[4,]   displays row 4, all columns
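
The indexing forms above can be tried on a self-contained copy of the data. This sketch recreates cars through a temporary CSV file (an assumption standing in for C:/temp/cars.csv), so the column positions match the slides: column 1 is "X", column 2 is mpg, and column 4 is disp.

```r
path <- file.path(tempdir(), "cars.csv")
write.csv(mtcars, path)
cars <- read.csv(path)

one_value <- cars[1, 4]     # single value: row 1, column 4
some_rows <- cars[2:5, 6]   # vector of 4 values from column 6
whole_col <- cars[, 2]      # all 32 values of column 2
whole_row <- cars[4, ]      # a one-row data frame with every column

print(one_value)            # 160
print(whole_row)
```

Note that selecting a single column returns a plain vector, while selecting a single row returns a data frame.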

Viewing Data
You can also access columns (variables) using the ‘$’ symbol if the data frame has column names:
> cars$mpg
> cars$wt

Manipulating Data
Now we can give that first column (variable) a better name than "X".
> colnames(cars) = c("name", colnames(cars)[2:ncol(cars)])

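Manipulating Data
> str(cars)
R has the unfortunate habit of trying to turn vectors of character strings into factors (categorical data). (Since R 4.0.0, read.csv() keeps strings as character by default, so the conversion below is only needed in older versions.)
> cars$name = as.character(cars$name)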
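
Whether the name column arrives as a factor depends on the stringsAsFactors argument of read.csv(). The sketch below sets it explicitly both ways so the behavior does not depend on the R version; the temporary file path and the names as_factor and as_text are assumptions for illustration.

```r
path <- file.path(tempdir(), "cars.csv")
write.csv(mtcars, path)

# Read the same file twice, with strings converted to factors and without.
as_factor <- read.csv(path, stringsAsFactors = TRUE)
as_text   <- read.csv(path, stringsAsFactors = FALSE)

print(class(as_factor$X))   # "factor"
print(class(as_text$X))     # "character"

# as.character() converts the factor back to plain strings.
converted <- as.character(as_factor$X)
print(identical(converted, as_text$X))
```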

Manipulating Data: Operators
Arithmetic: + - * / ^
Comparison:
  <   less than
  >   greater than
  <=  less than or equal to
  >=  greater than or equal to
  ==  is equal to
  !=  is not equal to
Logical:
  !      not
  &      and
  |      or
  xor()  exclusive or

Manipulating Data
Viewing subsets of data using column names and operators:
> cars[cars$vs == 1,]
> cars[cars$cyl >= 6,]
> cars$name[cars$hp > 100]
> cars$name[cars$wt > 3]
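
Comparison and logical operators can be combined inside the brackets. As a sketch, the code below builds cars from the built-in mtcars data with a name column added at the end (a shortcut, not the CSV route used on the slides) and shows subset() as an alternative spelling.

```r
cars <- mtcars
cars$name <- rownames(mtcars)   # shortcut: car names as an ordinary column

# Rows where a condition is TRUE; the empty slot after the comma keeps all columns.
four_cyl <- cars[cars$cyl == 4, ]
print(nrow(four_cyl))           # 11

# Combine conditions with & (and) or | (or).
heavy_strong <- cars$name[cars$wt > 3 & cars$hp > 100]

# subset() expresses the same kind of filter without repeating "cars$".
sixes <- subset(cars, cyl >= 6 & mpg > 20, select = c(name, mpg, cyl))
print(sixes)
```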

Analyzing Data
What do the variables look like?
> table(cars$gear)
> hist(cars$qsec)
> mean(cars$mpg)
> sd(cars$mpg)
> cor(cars$mpg, cars$wt)
> mean(cars$mpg[cars$cyl == 4])
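
The last command computes a mean for one group at a time. One possible way to get all the group means in a single call is tapply(), sketched here on the built-in mtcars data (overall, by_cyl, and r are made-up names):

```r
cars <- mtcars

print(table(cars$gear))    # counts of cars by number of gears
print(summary(cars$mpg))   # min, quartiles, mean, max

overall <- mean(cars$mpg)

# Mean mpg within each value of cyl, returned as a named vector.
by_cyl <- tapply(cars$mpg, cars$cyl, mean)
print(by_cyl)

# Correlation between fuel economy and weight (negative: heavier cars get lower mpg).
r <- cor(cars$mpg, cars$wt)
print(r)
```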

Manipulating Data
Transforming variables:
> wt.lb = cars$wt * 1000
The wt variable is recorded in units of 1000 lbs, so this converts it to pounds. It creates a new vector called wt.lb of length 32 (our number of cases).

Manipulating Data
We can use wt.lb without “adding” it to our data frame. But if you like the rectangular dataset concept, you can column bind it to the existing data frame:
> cars = cbind(cars, wt.lb)
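
As a sketch of the two options, the code below adds the new variable with cbind() and then adds a second one by assigning directly to a new column name. The wt.kg column is a hypothetical extra, and the built-in mtcars data stands in for the CSV copy.

```r
cars <- mtcars

# A free-standing vector, not yet part of the data frame.
wt.lb <- cars$wt * 1000
print(length(wt.lb))            # 32

# Option 1: column bind the vector onto the data frame.
cars <- cbind(cars, wt.lb)

# Option 2: assign to a new column name with $ (1 lb = 0.45359237 kg).
cars$wt.kg <- cars$wt.lb * 0.45359237

print(tail(colnames(cars), 2))  # "wt.lb" "wt.kg"
```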

Data Analysis
Hypothesis Testing: t.test(), prop.test()
Regression: lm(), glm()

Data Analysis: OLS Regression
> regr = lm(cars$mpg ~ wt.lb + cars$hp + cars$cyl)
The output of the regression is also an object. We’ve named it regr.
> summary(regr)
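
An alternative way to write the same model is to store wt.lb in the data frame and pass it through the data argument of lm(), which avoids repeating "cars$". This is a sketch of that style on the built-in mtcars data, not a change to the analysis.

```r
cars <- mtcars
cars$wt.lb <- cars$wt * 1000

# Same model as on the slide, with variables looked up in "cars".
regr <- lm(mpg ~ wt.lb + hp + cyl, data = cars)

print(class(regr))     # "lm": the fit is an object like any other
print(summary(regr))   # coefficient table, R-squared, and so on

# Pieces of the fitted object can be extracted with helper functions.
print(coef(regr))
print(head(residuals(regr)))
```

Because regr is an ordinary object, it can be saved, passed to other functions such as predict(), or compared with other fitted models.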

Saving Data You can use write.csv() or write.table() to save your dataset. When you quit R, it will ask if you want to save the workspace. This includes all the objects you have created, but it does not include the code you’ve written. You can also use save.image() to save the workspace. You should always save your code in a *.r file.
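
A minimal sketch of the saving options, assuming a temporary directory as the destination (the file names are invented):

```r
out_dir <- tempdir()
cars <- mtcars

# Save one dataset as a CSV file that other programs can read.
csv_path <- file.path(out_dir, "cars_out.csv")
write.csv(cars, csv_path)

# Save every object in the workspace to a single file.
ws_path <- file.path(out_dir, "workspace.RData")
save.image(file = ws_path)

# Later, load() restores the saved objects under their original names.
load(ws_path)
print(file.exists(c(csv_path, ws_path)))
```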

Other Useful Functions
ifelse(), is.na(), match(), merge(), apply(), order(), sort()
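
A quick tour of these functions on small invented inputs gives a feel for what each one returns:

```r
x <- c(5, NA, 2, 9)

print(is.na(x))                  # FALSE TRUE FALSE FALSE
filled <- ifelse(is.na(x), 0, x) # replace missing values with 0: 5 0 2 9
print(sort(x))                   # 2 5 9 (NA is dropped by default)
print(order(x))                  # 3 1 4 2 (positions in sorted order, NA last)

pos <- match("b", c("a", "b", "c"))   # position of the first match: 2

# apply() runs a function over the rows (1) or columns (2) of a matrix.
row_sums <- apply(matrix(1:6, nrow = 2), 1, sum)   # 9 12

# merge() joins two data frames on a shared column.
left   <- data.frame(id = c(1, 2, 3), a = c("x", "y", "z"))
right  <- data.frame(id = c(2, 3, 4), b = c(TRUE, FALSE, TRUE))
joined <- merge(left, right, by = "id")
print(joined)                    # rows for id 2 and 3 only
```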

Other Resources
Main R website:
UW CSSS Intro to R
UW CSDE Intro to R
UCLA Statistical Computing

Advanced Topics
More on factors
Lists (data type)
Loops
String manipulation
Writing your own functions
Graphics