Introduction to R Arin Basu MD MPH DataAnalytics

Slides:



Advertisements
Similar presentations
MATLAB – A Computational Methods By Rohit Khokher Department of Computer Science, Sharda University, Greater Noida, India MATLAB – A Computational Methods.
Advertisements

Introduction to Matlab
Introduction to Matlab Workshop Matthew Johnson, Economics October 17, /13/20151.
Stata and logit recap. Topics Introduction to Stata – Files / directories – Stata syntax – Useful commands / functions Logistic regression analysis with.
 Statistics package  Graphics package  Programming language  Can be used to share/reproduce analyses  Many new packages being created - can be downloaded.
Basics of Using R Xiao He 1. AGENDA 1.What is R? 2.Basic operations 3.Different types of data objects 4.Importing data 5.Basic data manipulation 2.
Latent Growth Curve Modeling In Mplus:
Srinivasulu Rajendran Centre for the Study of Regional Development (CSRD) School of Social Sciences (SSS) Jawaharlal Nehru University (JNU) New Delhi -
Introduction to GTECH 201 Session 13. What is R? Statistics package A GNU project based on the S language Statistical environment Graphics package Programming.
R for Research Data Analysis using R Day1: Basic R Baburao Kamble University of Nebraska-Lincoln.
Introduction to Matlab By: Dr. Maher O. EL-Ghossain.
Linux+ Guide to Linux Certification, Second Edition
Alternative text for elementary statistics –Elementary Concepts –Basic Statistics.
Lecture 2 LISAM. Statistical software.. LISAM What is LISAM? Social network for Creating personal pages Creating courses  Storing course materials (lectures,
EViews. Agenda Introduction EViews files and data Examining the data Estimating equations.
How to Use the R Programming Language for Statistical Analyses Part I: An Introduction to R Jennifer Urbano Blackford, Ph.D. Department of Psychiatry Kennedy.
SPSS Statistical Package for the Social Sciences is a statistical analysis and data management software package. SPSS can take data from almost any type.
RESEARCH HUB AT THE UNIVERSITY LIBRARIES PENN STATE UNIVERSITY TOUR OF STATISTICAL PACKAGES.
Introduction to R Statistical Software Anthony (Tony) R. Olsen USEPA ORD NHEERL Western Ecology Division Corvallis, OR (541)
Applied Bioinformatics Introduction to Linux and R Bing Zhang Department of Biomedical Informatics Vanderbilt University
EPSII 59:006 Spring Topics Using TextPad If Statements Relational Operators Nested If Statements Else and Elseif Clauses Logical Functions For Loops.
What is R By: Wase Siddiqui. Introduction R is a programming language which is used for statistical computing and graphics. “R is a language and environment.
Basic R Programming for Life Science Undergraduate Students Introductory Workshop (Session 1) 1.
ALEXANDER C. LOPILATO R: Because the names of other stat programs don’t make sense so why should this one?
732A44 Programming in R.  Self-studies of the course book  2 Lectures (1 in the beginning, 1 in the end)  Labs (computer). Compulsory submission of.
Data, graphics, and programming in R 28.1, 30.1, Daily:10:00-12:45 & 13:45-16:30 EXCEPT WED 4 th 9:00-11:45 & 12:45-15:30 Teacher: Anna Kuparinen.
SPSS Presented by Chabalala Chabalala Lebohang Kompi Balone Ndaba.
Introduction to to R Emily Kalah Gade University of Washington Credit to Kristin Siebel for development of much of this PowerPoint.
Introduction to R Part 2. Working Directory The working directory is where you are currently saving data in R. What is the current working directory?
®® Microsoft Windows 7 for Power Users Tutorial 13 Using the Command-Line Environment.
1 Lab of COMP 406 Teaching Assistant: Pei-Yuan Zhou Contact: Lab 1: 12 Sep., 2014 Introduction of Matlab (I)
Linux+ Guide to Linux Certification, Third Edition
Math 15 Lecture 10 University of California, Merced Scilab Programming – No. 1.
Piotr Wolski Introduction to R. Topics What is R? Sample session How to install R? Minimum you have to know to work in R Data objects in R and how to.
Guide to Linux Installation and Administration, 2e1 Chapter 7 The Role of the System Administrator.
Week 3 Exploring Linux Filesystems. Objectives  Understand and navigate the Linux directory structure using relative and absolute pathnames  Describe.
What is MATLAB? MATLAB is one of a number of commercially available, sophisticated mathematical computation tools. Others include Maple Mathematica MathCad.
Hands-on Introduction to R. We live in oceans of data. Computers are essential to record and help analyse it. Competent scientists speak C/C++, Java,
Introduction to R. Why use R Its FREE!!! And powerful, fairly widely used, lots of online posts about it Uses S -> an object oriented programing language.
R packages/libraries Data input/output Rachel Carroll Department of Public Health Sciences, MUSC Computing for Research I, Spring 2014.
What does C store? >>A = [1 2 3] >>B = [1 1] >>[C,D]=meshgrid(A,B) c) a) d) b)
Introduction to Enterprise Guide Jennifer Schmidt Rhonda Ellis Cassandra Hall.
Laboratory 1. Introduction to SAS u Statistical Analysis System u Package for –data entry –data manipulation –data storage –data analysis –reporting.
Introduction to Matlab. Outline:  What is Matlab? Matlab Screen Variables, array, matrix, indexing Operators (Arithmetic, relational, logical ) Display.
Introduction to R Introductions What is R? RStudio Layout Summary Statistics Your First R Graph 17 September 2014 Sherubtse Training.
Basics of Biostatistics for Health Research Session 1 – February 7 th, 2013 Dr. Scott Patten, Professor of Epidemiology Department of Community Health.
Lecture 20: Choosing the Right Tool for the Job. What is MATLAB? MATLAB is one of a number of commercially available, sophisticated mathematical computation.
Chapter Six Introduction to Shell Script Programming.
Lecture 26: Reusable Methods: Enviable Sloth. Creating Function M-files User defined functions are stored as M- files To use them, they must be in the.
Introduction to R Carol Bult The Jackson Laboratory Functional Genomics (BMB550) Spring 2011.
DTC Quantitative Methods Summary of some SPSS commands Weeks 1 & 2, January 2012.
INTRODUCTION TO MATLAB Dr. Hugh Blanton ENTC 4347.
Learn R Toolkit D Kelly O'DayExcel & R WorldsMod 2 - Excel & R Worlds: 1 Module 2 Moving Between Excel & R Worlds Do See & HearRead Learning PowerPoint.
Today Introduction to Stata – Files / directories – Stata syntax – Useful commands / functions Logistic regression analysis with Stata – Estimation – GOF.
Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association, in cooperation with the Department of Statistics and the Center.
NET 222: COMMUNICATIONS AND NETWORKS FUNDAMENTALS ( NET 222: COMMUNICATIONS AND NETWORKS FUNDAMENTALS (PRACTICAL PART) Tutorial 2 : Matlab - Getting Started.
R objects  All R entities exist as objects  They can all be operated on as data  We will cover:  Vectors  Factors  Lists  Data frames  Tables 
Linux+ Guide to Linux Certification, Second Edition Chapter 4 Exploring Linux Filesystems.
Introduction to CADStat. CADStat and R R is a powerful and free statistical package [
Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association, in cooperation with the Department of Statistics and the Center.
Lecture 11 Introduction to R and Accessing USGS Data from Web Services Jeffery S. Horsburgh Hydroinformatics Fall 2013 This work was funded by National.
1 Introduction to Perl / R 26 – 27 April, 2010 By Fabian Glaser and Michael Shmoish Bioinformatics Knowledge Unit, The Lorry I. Lokey Interdisciplinary.
Topics Introduction to Stata – Files / directories – Stata syntax – Useful commands / functions Logistic regression analysis with Stata – Estimation –
Outline Matlab tutorial How to start and exit Matlab Matlab basics.
Use of Mathematics using Technology (Maltlab)
Basics of R, Ch Functions Help Managing your Objects
Amos Introduction In this tutorial, you will be briefly introduced to the student version of the SEM software known as Amos. You should download the current.
R Course 1st Lecture.
A brief introduction to the nutrient tool-kit, getting R Studio to work and checking the data Martyn Kelly
> Introduction to Nelson Rios, Tulane University
Presentation transcript:

Introduction to R Arin Basu MD MPH DataAnalytics

We’ll Cover What is R How to obtain and install R How to read and export data How to do basic statistical analyses Econometric packages in R

What is R Software for Statistical Data Analysis Based on S Programming Environment Interpreted Language Data Storage, Analysis, Graphing Free and Open Source Software

Obtaining R Current Version: R Comprehensive R Archive Network: Binary source codes Windows executables Compiled RPMs for Linux Can be obtained on a CD

Installing R Binary (Windows/Linux): One step process – exe, rpm (Red Hat/Mandrake), apt-get (Debian) Linux, from sources: $ tar –zxvf “filename.tar.gz” $ cd filename $./configure $ make $ make check $ make install

Starting R Windows, Double-click on Desktop Icon Linux, type R at command prompt $ R

Strengths and Weaknesses Strengths – Free and Open Source – Strong User Community – Highly extensible, flexible – Implementation of high end statistical methods – Flexible graphics and intelligent defaults Weakness – Steep learning curve – Slow for large datasets

Basics Highly Functional – Everything done through functions – Strict named arguments – Abbreviations in arguments OK (e.g. T for TRUE) Object Oriented – Everything is an object – “ <- ” is an assignment operator – “X <- 5”: X GETS the value 5

Getting Help in R From Documentation: –?WhatIWantToKnow –help(“WhatIWantToKnow”) –help.search(“WhatIWantToKnow”) –help.start() –getAnywhere(“WhatIWantToKnow”) –example(“WhatIWantToKnow”) Documents: “Introduction to R” Active Mailing List – Archives – Directly Asking Questions on the List

Data Structures Supports virtually any type of data Numbers, characters, logicals (TRUE/ FALSE) Arrays of virtually unlimited sizes Simplest: Vectors and Matrices Lists: Can Contain mixed type variables Data Frame: Rectangular Data Set

Data Structure in R LinearRectangular All Same TypeVECTORSMATRIX* MixedLISTDATA FRAME

Running R Directly in the Windowing System (Console) Using Editors – Notepad, WinEdt, Tinn-R: Windows – Xemacs, ESS (Emacs speaks Statistics) On the Editor: –source(“filename.R”) – Outputs can be diverted by using sink(“filename.Rout”)

R Working Area This is the area where all commands are issued, and non-graphical outputs observed when run interactively

In an R Session… First, read data from other sources Use packages, libraries, and functions Write functions wherever necessary Conduct Statistical Data Analysis Save outputs to files, write tables Save R workspace if necessary (exit prompt)

Specific Tasks To see which directories and data are loaded, type: search() To see which objects are stored, type: ls() To include a dataset in the searchpath for analysis, type: attach(NameOfTheDataset, expression) To detach a dataset from the searchpath after analysis, type: detach(NameOfTheDataset)

Reading data into R R not well suited for data preprocessing Preprocess data elsewhere (SPSS, etc…) Easiest form of data to input: text file Spreadsheet like data: – Small/medium size: use read.table() – Large data: use scan() Read from other systems: – Use the library “foreign”: library(foreign) – Can import from SAS, SPSS, Epi Info – Can export to STATA

Reading Data: summary Directly using a vector e.g.: x <- c(1,2,3…) Using scan and read.table function Using matrix function to read data matrices Using data.frame to read mixed data library(foreign) for data from other programs

Accessing Variables edit( ) Subscripts essential tools – x[1] identifies first element in vector x – y[1,] identifies first row in matrix y – y[,1] identifies first column in matrix y $ sign for lists and data frames – myframe$age gets age variable of myframe – attach(dataframe) -> extract by variable name

Subset Data Using subset function – subset() will subset the dataframe Subscripting from data frames – myframe[,1] gives first column of myframe Specifying a vector – myframe[1:5] gives first 5 rows of data Using logical expressions – myframe[myframe[,1], < 5,] gets all rows of the first column that contain values less than 5

Graphics Plot an object, like: plot(num.vec) – here plots against index numbers Plot sends to graphic devices – can specify which graphic device you want postscript, gif, jpeg, etc… you can turn them on and off, like: dev.off() Two types of plotting – high level: graphs drawn with one call – Low Level: add additional information to existing graph

High Level: generated with plot()

Low Level: Scattergram with Lowess

Programming in R Functions & Operators typically work on entire vectors Expressions surrounded by {} Codes separated by newlines, “;” not necessary You can write your own functions and use them

Statistical Functions in R Descriptive Statistics Statistical Modeling – Regressions: Linear and Logistic – Probit, Tobit Models – Time Series Multivariate Functions Inbuilt Packages, contributed packages

Descriptive Statistics Has functions for all common statistics summary() gives lowest, mean, median, first, third quartiles, highest for numeric variables stem() gives stem-leaf plots table() gives tabulation of categorical variables

Statistical Modeling Over 400 functions – lm, glm, aov, ts Numerous libraries & packages – survival, coxph, tree (recursive trees), nls, … Distinction between factors and regressors – factors: categorical, regressors: continuous – you must specify factors unless they are obvious to R – dummy variables for factors created automatically Use of data.frame makes life easy

How to model Specify your model like this: – y ~ x i +c i, where – y = outcome variable, x i = main explanatory variables, c i = covariates, + = add terms – Operators have special meanings + = add terms, : = interactions, / = nesting, so on… Modeling -- object oriented – each modeling procedure produces objects – classes and functions for each object

Synopsis of Operators nesting onlyno specific%in% limiting interaction depthsexponentiation^ interaction onlysequence: main effect and nestingdivision/ main effect and interactionsmultiplication* add or remove termsadd or subtract+ or - In Formula meansUsually meansOperator

Modeling Example: Regression carReg <- lm(speed~dist, data=cars) carReg = becomes an object to get summary of this regression, we type summary(carReg) to get only coefficients, we type coef(carReg), or carReg$coef don’t want intercept? add 0, so carReg <- lm(speed~0+dist, data=cars)

Multivariate Techniques Several Libraries available – mva, hmisc, glm, – MASS: discriminant analysis and multidim scaling Econometrics packages – dse (multivariate time series, state-space models), ineq: for measuring inequality, poverty estimation, its: for irregular time series, sem: structural equation modeling, and so on… [

Summarizing… Effective data handling and storage large, coherent set of tools for data analysis Good graphical facilities and display – on screen – on paper well-developed, simple, effective programming

For more resources, check out… R home page R discussion group Search Google for R and Statistics

For more information, contact