Computing for Research I Spring 2013 Primary Instructor: Elizabeth Garrett-Mayer Introduction to R March 5.



Check out online resources R-intro.pdf

R. Kabacoff on learning R after SPSS and SAS

Why R has A Steep Learning Curve: A long answer to a simple question...

I have been a hardcore SAS and SPSS programmer for more than 25 years, a Systat programmer for 15 years and a Stata programmer for 2 years. But when I started learning R recently, I found it frustratingly difficult. Why? I think that there are two reasons why R can be challenging to learn quickly.

First, while there are many introductory tutorials (covering data types, basic commands, the interface), none alone are comprehensive. In part, this is because much of the advanced functionality of R comes from hundreds of user contributed packages. Hunting for what you want can be time consuming, and it can be hard to get a clear overview of what procedures are available.

The second reason is more ephemeral. As users of statistical packages, we tend to run one prescribed procedure for each type of analysis. Think of PROC GLM in SAS. We can carefully set up the run with all the parameters and options that we need. When we run the procedure, the resulting output may be a hundred pages long. We then sift through this output pulling out what we need and discarding the rest.

The paradigm in R is different. Rather than setting up a complete analysis at once, the process is highly interactive. You run a command (say fit a model), take the results and process them through another command (say a set of diagnostic plots), take those results and process them through another command (say cross-validation), etc. The cycle may include transforming the data, and looping back through the whole process again. You stop when you feel that you have fully analyzed the data. It may sound trite, but this reminds me of the paradigm shift from top-down procedural programming to object oriented programming we saw a few years ago. It is not an easy mental shift for many of us to make.

In the end, however, I believe that you will feel much more intimately in touch with your data and in control of your work. And it's fun!

Installing R
Choose appropriate interface
– Windows
– Mac
– Linux
Follow install instructions

R interface
Batching file: File -> open script
Run commands: Ctrl-R
Save session output: sink([filename]) … sink()
Quit session: q()
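
A minimal sketch of the sink() … sink() pattern. The file name comes from tempfile() and is only a placeholder for illustration. Note that sink() captures printed output, not the commands themselves.

```r
# Divert console output to a file, then restore it.
log_file <- tempfile(fileext = ".txt")   # hypothetical file name for illustration

sink(log_file)                   # start sending printed output to the file
print(summary(c(1, 2, 3, 4)))    # this goes to the file, not the console
sink()                           # stop diverting; output returns to the console

saved <- readLines(log_file)     # read back what was captured
saved
```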

General Syntax
result <- function(object(s), options…)
function(object(s), options…)
Object-oriented programming
Note that ‘result’ is an object
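
The two call forms can be sketched as follows, using made-up data. The variable names are illustrative only.

```r
x <- c(2, 4, 6)                  # made-up data for illustration

result <- mean(x)                # with assignment: value is stored, nothing is printed
mean(x)                          # without assignment: value is printed and not kept

class(result)                    # 'result' is an object that can be inspected...
rounded <- round(result, digits = 1)   # ...or passed to another function with options
```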

First things first:
help([function]) or ?function
help.search("linear model") or ??"linear model"
help.start()

Choosing your default
setwd("[pathname for directory]")
getwd()
need "\\" instead of "\" when giving paths (forward slashes "/" also work)
.RData
.Rhistory
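
A small sketch of changing and restoring the working directory. It uses tempdir() so that it runs anywhere; the Windows-style paths are hypothetical and are never accessed.

```r
old_dir <- getwd()                # remember where we started
demo_dir <- tempdir()             # a directory that is guaranteed to exist

setwd(demo_dir)                   # change the working directory
same <- normalizePath(getwd()) == normalizePath(demo_dir)

# Inside an R string a single backslash starts an escape sequence,
# so a literal backslash must be written twice.
win_path <- "C:\\Users\\me\\data" # hypothetical path, not accessed
alt_path <- "C:/Users/me/data"    # forward slashes avoid the doubling
n_chars <- nchar("\\")            # "\\" is one character

setwd(old_dir)                    # restore the original directory
```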

Start with data
read.table
read.csv
scan
dget
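
A hedged example of the reading functions. The small file is written to a temporary location first so the code is self-contained; the column names (ID, AGE, REGION, LOS) and the values are invented for illustration.

```r
# Create a small comma-separated file to read (made-up values).
csv_file <- tempfile(fileext = ".csv")
writeLines(c("ID,AGE,REGION,LOS",
             "1,45,1,3",
             "2,62,2,12",
             "3,38,1,7",
             "4,71,3,15"), csv_file)

data  <- read.csv(csv_file)                              # header and comma separator by default
data2 <- read.table(csv_file, header = TRUE, sep = ",")  # same file, options spelled out

ages <- scan(text = "45 62 38 71", quiet = TRUE)         # scan reads a plain vector of values
```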

Extracting variables from data
Use $: data$AGE
note it is case-sensitive!
attach([data]) and detach([data])
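
A short illustration with an invented data frame. with() is not on the slide; it is shown here only as one alternative to attach/detach.

```r
data <- data.frame(AGE = c(45, 62, 38, 71),   # made-up values
                   LOS = c(3, 12, 7, 15))

mean_age <- mean(data$AGE)        # extract a column with $
wrong_case <- data$age            # names are case-sensitive: this is NULL

attach(data)                      # make columns visible by name...
mean_los_attached <- mean(LOS)
detach(data)                      # ...and remove them from the search path again

mean_los <- with(data, mean(LOS)) # alternative (not from the slide): scoped to one call
```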

Descriptive statistics
summary
mean, median
var
quantile
range, max, min
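
These functions applied to a small made-up vector:

```r
age <- c(45, 62, 38, 71, 50)      # made-up ages

summary(age)                      # min, quartiles, mean, max in one call
m   <- mean(age)                  # 53.2
med <- median(age)                # 50
v   <- var(age)                   # sample variance (divides by n - 1)
q   <- quantile(age, c(0.25, 0.75))
r   <- range(age)                 # same as c(min(age), max(age))
```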

Missing values
sometimes cause ‘error’ message
na.rm=T
na.action=na.omit
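
A sketch of the two options with invented data. Summary functions such as mean() take na.rm, while model-fitting functions such as lm() take na.action.

```r
x <- c(45, NA, 38, 71)            # made-up data with one missing value

m_default <- mean(x)              # NA: the missing value propagates
m_clean   <- mean(x, na.rm = TRUE)  # drop NAs before computing

d <- data.frame(y = c(1.1, 1.9, NA, 4.2),
                x = c(1, 2, 3, 4))
fit <- lm(y ~ x, data = d, na.action = na.omit)  # rows with NA are dropped
n_used <- nobs(fit)               # number of rows actually used in the fit
```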

Data Objects
data.frame, as.data.frame, is.data.frame
– names([data])
– row.names([data])
matrix, as.matrix, is.matrix
– dimnames([data])
factor, as.factor, is.factor
– levels([factor])
arrays
lists
functions
vectors
scalars
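
A brief tour of the main object types, using invented values:

```r
df <- data.frame(id = 1:3, sex = c("F", "M", "F"))  # made-up data frame
col_names <- names(df)            # column names
rn <- row.names(df)               # row names (as character strings)

m <- as.matrix(df[, "id", drop = FALSE])  # convert one column to a matrix
m_is_matrix <- is.matrix(m)

f <- as.factor(df$sex)            # categorical variable
f_levels <- levels(f)             # its distinct categories, sorted

lst <- list(n = 3, labels = c("a", "b"))  # a list can mix types and lengths
```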

Creating and manipulating
c: combine
cbind: combine as columns
rbind: combine as rows
list: make a list
rep(x,n): repeat x n times
seq(a,b,i): create a sequence between a and b in increments of i
seq(a,b, length=k): create a sequence between a and b with length k with equally spaced increments
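
For example, with made-up values. The full argument name is length.out; length=k works through partial argument matching.

```r
v  <- c(1, 2, 3)                  # combine values into a vector
cb <- cbind(v, v * 2)             # 3 rows, 2 columns
rb <- rbind(v, v * 2)             # 2 rows, 3 columns

zeros <- rep(0, 4)                # 0 0 0 0
s_by  <- seq(0, 1, 0.25)          # step size given
s_len <- seq(0, 1, length.out = 5)  # number of points given
```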

ifelse
ifelse(condition, true, false)
– agelt50 <- ifelse(data$AGE<50,1,0)
– for equality must use "=="
– "or" is indicated by "|"
e.g., young.or.old <- ifelse(data$AGE<[cutoff] | data$AGE>65,1,0)
cut(x, breaks)
– agegrp <- cut(data$AGE, breaks=c(0,50,60,130))
– agegrp <- cut(data$AGE, breaks=c(0,50,60,130), labels=c(0,1,2))
– agegrp <- cut(data$AGE, breaks=c(0,50,60,130), labels=F)
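
A worked sketch on a made-up age vector. The lower cutoff of 30 in young.or.old is an assumption chosen for illustration. By default cut() uses intervals that are open on the left and closed on the right.

```r
age <- c(25, 50, 55, 60, 70)      # made-up ages

agelt50 <- ifelse(age < 50, 1, 0)                  # 1 0 0 0 0
young.or.old <- ifelse(age < 30 | age > 65, 1, 0)  # cutoff 30 is hypothetical

agegrp  <- cut(age, breaks = c(0, 50, 60, 130))    # factor with interval labels
grp_lab <- levels(agegrp)                          # "(0,50]" "(50,60]" "(60,130]"

agecode <- cut(age, breaks = c(0, 50, 60, 130), labels = FALSE)  # integer codes
```

Because the intervals are right-closed, an age of exactly 50 falls in the first group and an age of exactly 60 in the second.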

Looking at objects
dim
length
sort
attributes

Subsetting
Use [ ]
Vectors
– data$AGE[data$REGION==1]
– data$AGE[data$LOS<10]
Matrices & Dataframes
– data[data$AGE<50, ]
– data[, 2:5]
– data[data$AGE<50, 2:5]
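
The same patterns on an invented four-column data frame (so the column range here is 2:4 rather than 2:5):

```r
data <- data.frame(ID = 1:4,                 # made-up data
                   AGE = c(45, 62, 38, 71),
                   REGION = c(1, 2, 1, 3),
                   LOS = c(3, 12, 7, 15))

age_region1 <- data$AGE[data$REGION == 1]    # ages where REGION is 1
age_shortlos <- data$AGE[data$LOS < 10]      # ages where LOS is under 10

young      <- data[data$AGE < 50, ]          # selected rows, all columns
some_cols  <- data[, 2:4]                    # all rows, columns 2 to 4
young_cols <- data[data$AGE < 50, 2:4]       # both at once
```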

Some math
abs(x)
sqrt(x)
x^k
log(x) (natural log, by default)
choose(n,k)
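
A few quick checks of these functions. log10() and the base argument are not on the slide; they are included to show how to get logs other than the natural log.

```r
a  <- abs(-3)                     # 3
s  <- sqrt(16)                    # 4
p  <- 2^10                        # 1024
l1 <- log(exp(1))                 # natural log: 1
l2 <- log10(1000)                 # base-10 log
l3 <- log(8, base = 2)            # log with an explicit base
ck <- choose(5, 2)                # number of ways to pick 2 of 5: 10
```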

Matrix Manipulation
Matrix multiplication: A%*%B
transpose: t(X)
diag(X)
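
A small example contrasting %*% with element-wise *, using a made-up 2 x 2 matrix. Note that matrix() fills column by column.

```r
A <- matrix(1:4, nrow = 2)        # columns are (1, 2) and (3, 4)
I <- diag(2)                      # diag(k) with a single number gives an identity matrix

AA_mat  <- A %*% A                # matrix product
AA_elem <- A * A                  # element-wise product (not the same thing)
At      <- t(A)                   # transpose
d       <- diag(A)                # diagonal entries of an existing matrix
```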

Table
table(x,y)
tabulate(x)
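
For instance, with invented variables. table() cross-classifies any values, while tabulate() counts positive integers into bins 1, 2, 3, and so on.

```r
sex    <- c("F", "M", "F", "F", "M")   # made-up data
region <- c(1, 1, 2, 2, 2)

tab <- table(sex, region)         # 2 x 2 cross-tabulation
f_in_2 <- tab["F", "2"]           # count of F in region 2

bins <- tabulate(c(1, 3, 3, 5))   # counts for the values 1 through 5
```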

Statistical Tests and CI’s
t.test
fisher.test and binom.exact
wilcox.test
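
A hedged sketch on simulated data. binom.test() from base R is used here as a stand-in for the exact binomial interval, on the assumption that binom.exact on the slide refers to an add-on package function. All numbers are invented.

```r
set.seed(1)                       # make the simulated data reproducible
x <- rnorm(20, mean = 0)          # simulated group 1
y <- rnorm(20, mean = 1)          # simulated group 2

tt <- t.test(x, y)                # two-sample t-test
tt_p  <- tt$p.value               # results are stored in the returned object
tt_ci <- tt$conf.int              # confidence interval for the mean difference

ft <- fisher.test(matrix(c(8, 2, 3, 7), nrow = 2))  # exact test on a 2 x 2 table
wt <- wilcox.test(x, y)           # rank-based alternative to the t-test
bt <- binom.test(7, 20)           # exact test and CI for a proportion (base R)
bt_ci <- bt$conf.int
```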

Plots
hist
boxplot
plot
– pch, type, lwd
– xlab, ylab
– xlim, ylim
– xaxt, yaxt
axis

Plot Layout
par(mfrow=c(2,1))
par(mfrow=c(1,1))
par(mfcol=c(2,2))
help(par)
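
A sketch that combines the plotting options with a two-panel layout. It draws to a temporary PDF file (a placeholder name) so it runs without a graphics window; the data are simulated and the axis labels are invented.

```r
plot_file <- tempfile(fileext = ".pdf")   # hypothetical output file
pdf(plot_file)                            # open a file device instead of a window

par(mfrow = c(2, 1))                      # two rows, one column of plots

set.seed(1)
age <- rnorm(50, mean = 55, sd = 10)      # simulated data
los <- 2 + 0.1 * age + rnorm(50)

hist(age, xlab = "Age (years)", main = "Histogram of age")

plot(age, los, pch = 16,                  # solid points
     xlab = "Age (years)", ylab = "LOS",
     xlim = c(20, 90), ylim = c(0, 15),
     xaxt = "n")                          # suppress the default x axis...
axis(1, at = seq(20, 90, by = 10))        # ...and draw a custom one

par(mfrow = c(1, 1))                      # reset the layout
dev.off()                                 # close the device and write the file

plot_written <- file.exists(plot_file)
```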

Probability Distributions
Normal:
– rnorm(N,m,s): generate random normal data
– dnorm(x,m,s): density at x for normal with mean m, std dev s
– qnorm(p,m,s): quantile associated with cumulative probability of p for normal with mean m, std dev s
– pnorm(q,m,s): cumulative probability at quantile q for normal with mean m, std dev s
Binomial
– rbinom
– etc.
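
The r/d/p/q prefixes can be sketched as below; the parameter values are arbitrary. qnorm and pnorm are inverses of each other.

```r
set.seed(42)                      # reproducible random draws
draws <- rnorm(1000, 10, 2)       # 1000 values from a normal with mean 10, sd 2

d0 <- dnorm(0, 0, 1)              # density of the standard normal at 0
p  <- pnorm(1.96)                 # cumulative probability, about 0.975
q  <- qnorm(0.975)                # quantile, about 1.96

round_trip <- qnorm(pnorm(1.5, 10, 2), 10, 2)  # returns 1.5

flips <- rbinom(5, size = 10, prob = 0.5)  # 5 draws, each the number of successes in 10 trials
```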

Libraries
Additional packages that can be loaded (next lecture)
Example: epitools library
library(help=[libname])

Keeping things tidy
ls() and objects()
rm()
rm(list=ls())
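
A small illustration with throwaway object names (scratch.a and scratch.b are invented). The final line is left commented out because it deletes every object in the workspace.

```r
scratch.a <- 1                    # throwaway objects for illustration
scratch.b <- "two"

before <- ls()                    # names of objects currently defined
rm(scratch.a)                     # remove one object by name
after <- ls()

# rm(list = ls())                 # removes everything; run only when you mean it
```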