(Re) introduction to Linux and R Sarah Medland Boulder 2013.

Slides:



Advertisements
Similar presentations
Easily retrieve data from the Baan database
Advertisements

Statistical Software Packages: How do I get this into that? Gillian Byrne Memorial University of Newfoundland Atlantic DLI Training - April 23, 2004.
R for Macroecology Aarhus University, Spring 2011.
User Interface. What is a User Interface  A user interface is a link between the user and the computer. It allows the user and the computer to communicate.
Statistical Methods Lynne Stokes Department of Statistical Science Lecture 7: Introduction to SAS Programming Language.
Using Excel Biostatistics 212 Lecture 4. Housekeeping Questions about Lab 3? –replace vs. recode Final Project Dataset! –“Housekeeping” commands vs. data.
Ann Arbor ASA ‘Up and Running’ Series: SPSS Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association, in cooperation with.
R for Research Data Analysis using R Day1: Basic R Baburao Kamble University of Nebraska-Lincoln.
Introduction to Statistical Computing in Clinical Research Biostatistics 212 Course director: Mark Pletcher Teaching Assistant: Lee Zane.
EViews. Agenda Introduction EViews files and data Examining the data Estimating equations.
RESEARCH HUB AT THE UNIVERSITY LIBRARIES PENN STATE UNIVERSITY TOUR OF STATISTICAL PACKAGES.
What is R By: Wase Siddiqui. Introduction R is a programming language which is used for statistical computing and graphics. “R is a language and environment.
 Overview of SPSS  Interface  Getting Started  Managing Data  Descriptive Statistics  Basic Analysis  Additional Resources.
ALEXANDER C. LOPILATO R: Because the names of other stat programs don’t make sense so why should this one?
Carolina Environmental Program UNC Chapel Hill The Analysis Engine – A New Tool for Model Evaluation, Sensitivity and Uncertainty Analysis, and more… Alison.
An introduction to R: get familiar with R Guangxu Liu Bio7932.
Hands-on Introduction to R. Outline R : A powerful Platform for Statistical Analysis Why bother learning R ? Data, data, data, I cannot make bricks without.
A very brief introduction to R
Objectives Understand what MATLAB is and why it is widely used in engineering and science Start the MATLAB program and solve simple problems in the command.
A very brief introduction to using R & MX - Matthew Keller Some material cribbed from: UCLA Academic Technology Services Technical Report Series (by Patrick.
Introduction to R Part 2. Working Directory The working directory is where you are currently saving data in R. What is the current working directory?
Programming in R Getting data into R. Importing data into R In this session we will learn: Some basic R commands How to enter data directly into R How.
Arko Barman with modification by C.F. Eick COSC 4335 Data Mining Spring 2015.
Introduction to STATA for Clinical Researchers Jay Bhattacharya August 2007.
Piotr Wolski Introduction to R. Topics What is R? Sample session How to install R? Minimum you have to know to work in R Data objects in R and how to.
Matlab Basics Tutorial. Vectors Let's start off by creating something simple, like a vector. Enter each element of the vector (separated by a space) between.
I❤RI❤R Kin Wong (Sam) Game Plan Intro R Import SPSS file Descriptive Statistics Inferential Statistics GraphsQ&A.
What is MATLAB? MATLAB is one of a number of commercially available, sophisticated mathematical computation tools. Others include Maple Mathematica MathCad.
Hands-on Introduction to R. We live in oceans of data. Computers are essential to record and help analyse it. Competent scientists speak C/C++, Java,
Then click the box for Normal probability plot. In the box labeled Standardized Residual Plots, first click the checkbox for Histogram, Multiple Linear.
Developed By Information Technology Services University Of Saskatchewan.
R packages/libraries Data input/output Rachel Carroll Department of Public Health Sciences, MUSC Computing for Research I, Spring 2014.
Introduction to Programming in R Department of Statistical Sciences and Operations Research Computation Seminar Series Speaker: Edward Boone
240-Current Research Easily Extensible Systems, Octave, Input Formats, SOA.
Lecture 20: Choosing the Right Tool for the Job. What is MATLAB? MATLAB is one of a number of commercially available, sophisticated mathematical computation.
Matlab Introduction  Getting Around Matlab  Matrix Operations  Drawing Graphs  Calculating Statistics  (How to read data)
Introduction to CADStat. CADStat and R R is a powerful and free statistical package [
1 PEER Session 02/04/15. 2  Multiple good data management software options exist – quantitative (e.g., SPSS), qualitative (e.g, atlas.ti), mixed (e.g.,
Math 252: Math Modeling Eli Goldwyn Introduction to MATLAB.
Welcome  Log on using the username and password you received at registration  Copy the folder: F:/sarah/mon-morning To your H drive.
BMTRY 789 Lecture9: Proc Tabulate Readings – Chapter 11 & Selected SUGI Reading Lab Problems , 11.2 Homework Due Next Week– HW6.
1 EPIB 698C Lecture 1 Instructor: Raul Cruz-Cano
Chris Knight Beginners’ workshop.
Hands-on Introduction to R. We live in oceans of data. Computers are essential to record and help analyse it. Competent scientists speak C/C++, Java,
MIS2502: Data Analytics Introduction to Advanced Analytics and R.
Pinellas County Schools
With the support of the LPP programme of the European Union 1 This project has been funded with support from the European Commission. This publication.
Introduction to R Dr. Satish Nargundkar. What is R? R is a free software environment for statistical computing and graphics. It compiles and runs on a.
A quick guide to other statistical software
Introduction to R user-friendly and absolutely free
Block 1: Introduction to R
(Re) introduction to Linux Sarah Medland Boulder 2017
Lecture 2: Introduction to R
A very brief introduction to R
(Re) introduction to Linux Sarah Medland Boulder 2016
R Programming.
Data Analysis Introduction To Online Education BIT
Lab 1 Introductions to R Sean Potter.
(Re) introduction to Linux Sarah Medland Boulder 2015
Introduction to R.
An introduction to data analysis using R
Use of Mathematics using Technology (Maltlab)
Using AMOS With SPSS Files.
%SUBMIT R A SAS® Macro to Interface SAS and R
CSCI N207 Data Analysis Using Spreadsheet
John D. Cook M. D. Anderson Cancer Center
MIS2502: Data Analytics ICA #7 Introduction to R and RStudio - Recap
Amos Introduction In this tutorial, you will be briefly introduced to the student version of the SEM software known as Amos. You should download the current.
Introduction to Matlab
SEM: Step by Step In AMOS and Mplus.
Presentation transcript:

(Re) introduction to Linux and R Sarah Medland Boulder 2013

Superfast intro to R

What is it? R is an interpreted computer language. – System commands can be called from within R R is used for data manipulation, statistics, and graphics. It is made up of: – operators (+ - <- * %*% …) for calculations on arrays & matrices – large, coherent, integrated collection of functions – facilities for making unlimited types of publication quality graphics – user written functions & sets of functions (packages); 800+ contributed packages so far & growing

Advantages o Fast** and free. o State of the art: Statistical researchers provide their methods as R packages. SPSS and SAS are years behind R! o 2 nd only to MATLAB for graphics. o Active user community o Excellent for simulation, programming, computer intensive analyses, etc. o Forces you to think about your analysis. o Interfaces with database storage software (SQL)

Disadvantages o Not user start - steep learning curve, minimal GUI. o No commercial support; figuring out correct methods or how to use a function on your own can be frustrating. o Easy to make mistakes and not know. o Working with large datasets is limited by RAM!!! o Data prep & cleaning can be messier & more mistake prone in R vs. SPSS or SAS o Hostility on the R listserve

Learning R....

R-help listserve....

Using R this week – Can use R studio if you want R-studio

Setting this up at home Install R first Install R studio Install packages

Start up R via R studio 4 windows: Syntax – can be opened in regular txt file - saved Terminal – output & temporary input - usually unsaved Data manager – details of data sets and variables Plots etc

R sessions are interactive

GETTING STARTED

How to use help in R? R has a help system built in. If you know which function you want help with simply use ?_______ or help(_____) with the function in the blank. ?hist. help(hist) If you don’t know which function to use, then use help.search(“_______”). help.search(“histogram”).

Importing Data First make sure your data is in an easy to read format such as space, tab or CSV Use code: D <- read.table(“ozbmi2.txt”,header=TRUE) D <-read.table(“ozbmi2.txt”,na.strings=“- 99”,header=TRUE) D <- read.table(“ozbmi2.csv”, sep=“,” header=TRUE) D <- read.csv(“ozbmi2.csv”, header=TRUE)

Exporting Data Tab delimited write.table(D, “newdata.txt”,sep=“\t”) To xls library(xlsReadWrite) write.xls(D, “newdata.xls")

Checking data #list the variables in D names(D) # dimensions of D dim(D) # print the first 10 rows of D head(D, n=10) #referring to variables in D #format is Object$variable head(D$age, n=10)

Basic Manipulation #You can make new variables within an existing object D$newage<- D$age*100 #Or overwrite a variable D$age<- D$age*100 #Or recode a variable #D$catage 30, c("older"), c("younger"))

Checking data #Mean and variance mean(D$age, na.rm =TRUE) var(D$age, na.rm =TRUE) #For a number of variables lapply(D, mean, na.rm=TRUE) sapply(D, mean, na.rm=TRUE)

Checking data A bit more info summary(D$age) summary(D$age[which(D$agecat==1)]) What about a categorical variable table(D$agecat) table(D$agecat,D$zyg)

Some basic analysis typing D$ is getting annoying so we can attach the data attach(D) table(agecat,zyg) # detach(D) Correlations anyone? cor(wt1,bmi1, use="complete") cor(ht1,bmi1, use="complete")

regression Multiple Linear Regression fit <- lm(bmi1 ~ age + zyg, data=D) summary(fit) # Other useful functions coefficients(fit) # model coefficients confint(fit, level=0.95) # CIs for model parameters anova(fit) # anova table vcov(fit) # covariance matrix for model parameters

Basic plots Histogram #basic hist(age) #basic hist(age, breaks=12, col=‘red’) # Add labels hist(age, breaks=12, col='red', xlab='age in years',main='Histogram of age‘)

Looking at your data... #Kernal density plot d <- density(age, na.rm = "TRUE") # returns the density data plot(d) # plots the results

Looking at your data... #Kernal density plot by zyg? library(sm) # create value labels zyg.f <- factor(zyg, levels= seq(1,5), labels = c("MZF", "MZM", "DZF", "DZM", "DZOS")) # plot densities sm.density.compare(age, zyg, xlab="Years") title(main="Years by ZYG") # add legend colfill<-c(2:(2+length(levels(zyg.f)))) legend(.8,3, levels(zyg.f), fill=colfill)

Huh what? > library(sm) Error in library(sm) : there is no package called 'sm' > sm.density.compare(age, zyg, xlab="Years") Error: could not find function "sm.density.compare"

Adding a package... install.packages()

Looking at your data... #Kernal density plot by zyg? library(sm) # create value labels zyg.f <- factor(zyg, levels= seq(1,5), labels = c("MZF", "MZM", "DZF", "DZM", "DZOS")) # plot densities sm.density.compare(age, zyg, xlab="Years”) title(main="Years by ZYG") # add legend colfill<-c(2:(2+length(levels(zyg.f)))) legend(.8,3, levels(zyg.f), fill=colfill)

That’s great but how do I save it? # make a png file to hold the plot png("zygdensity.png") # create value labels zyg.f <- factor(zyg, levels= seq(1,5), labels = c("MZF", "MZM", "DZF", "DZM", "DZOS")) # plot densities sm.density.compare(age, zyg, xlab="Years”) title(main="Years by ZYG") # add legend via mouse click colfill<-c(2:(2+length(levels(zyg.f)))) legend(.8,3, levels(zyg.f), fill=colfill) # close the png file to allow viewing dev.off()

Final Words of Warning “Using R is a bit akin to smoking. The beginning is difficult, one may get headaches and even gag the first few times. But in the long run, it becomes pleasurable and even addictive. Yet, deep down, for those willing to be honest, there is something not fully healthy in it.” --Francois Pinard

Vienna