Using the R software


R is an open-source, comprehensive statistical package that is increasingly used around the world. When downloading it from the R project web site, choose the mirror nearest to you.

Wise to read this first!

Install with the PDF manual included! Check this box! (All boxes can be checked if you have enough disk space.)

Start → (All) Programs → R → R 2.9.0

R comes with no typical menu-driven graphical user interface (GUI). Everything must be entered at command level (or by writing scripts).
Entering data
Functions: c, matrix, cbind, data.frame, read.table
Help on functions is available in the R GUI from Help → R functions (text) …

Entering data from keyboard
Example: We want to enter the vector x = (1, 2) and a matrix.
To enter something (whatever) we use the assignment operator “<-”.
The function c() combines individual values (comma-separated) into a vector.
Assigning a vector:
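A minimal sketch of the assignment described on this slide; the second vector z is a hypothetical addition to show that c() also accepts existing vectors.

```r
# Assign the vector x = (1, 2) with the assignment operator and c()
x <- c(1, 2)

# c() can also combine an existing vector with further values (illustrative)
z <- c(x, 3, 4)
```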

Printing the value on screen: Either enter the variable name or use the function print().
Note that the output begins with [1]. This is the index of the first element shown on that output line; x is printed horizontally, like a row vector.
Listing defined objects (vectors, matrices, data frames): Use the function ls() with no arguments.
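As a small sketch of printing and listing, assuming x has been assigned as on the previous slide (the variable name defined is only for illustration):

```r
x <- c(1, 2)

x          # typing the name prints the value
print(x)   # explicit printing gives the same output

# ls() returns the names of the currently defined objects
defined <- ls()
print(defined)
```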

What if we just type ls? The source code of the function ls() is printed on screen.

Removing objects: Use the function rm(). (Then enter x again.)
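A short sketch of removing an object and entering it again; the helper variable x_still_defined is hypothetical and only records whether x exists after rm().

```r
x <- c(1, 2)

rm(x)                                             # remove x
x_still_defined <- exists("x", inherits = FALSE)  # FALSE after rm()

x <- c(1, 2)                                      # enter x again
```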

Assigning a matrix:
Alternative 1: Use the function matrix()
a<-matrix( values,nrow= m,ncol= n )
values is a set of values enclosed in c(), or an already defined vector.
m is the number of rows and n is the number of columns of the matrix.
The number of values should equal m·n; otherwise R recycles or drops values, usually with a warning.
The values are entered column-wise.
The identifiers nrow= and ncol= can be omitted.
Note the double indexing: the first number is for the row and the second number for the column.
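A minimal sketch of matrix() with and without the identifiers. The values are borrowed from the script later in these slides; the name a2 is only for illustration.

```r
# Values are filled column-wise: first column (2, 1), second column (1, -1)
a <- matrix(c(2, 1, 1, -1), nrow = 2, ncol = 2)

# The same matrix with the identifiers nrow= and ncol= omitted
a2 <- matrix(c(2, 1, 1, -1), 2, 2)

print(a)
a[1, 2]   # row 1, column 2
a[2, 2]   # row 2, column 2
```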

Identifiers skipped.
If row and column numbers are “erroneously” specified: Note! There is still a result, but the fourth value is omitted.
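One hypothetical way to reproduce this situation is to supply four values for a 3 x 1 matrix. The exact call used on the slide is not known, so the dimensions here are an assumption; suppressWarnings() only hides the warning that R normally issues.

```r
values <- c(1, 2, 3, 4)

# Four values do not fit a 3 x 1 matrix; R still returns a result
# that contains only the first three values
m <- suppressWarnings(matrix(values, 3, 1))
print(m)
```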

Alternative 2: Concatenating (already existing) columns
Use the function cbind() with already existing columns (vectors).
Note! The columns will now be indexed by the original column (vector) names.
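A small sketch of cbind() with two existing vectors; the names x1 and x2 and their values are illustrative.

```r
x1 <- c(2, 1)
x2 <- c(1, -1)

# Bind the vectors together as columns of a matrix
a <- cbind(x1, x2)
print(a)

colnames(a)   # the columns carry the vector names
a[, "x2"]     # a column can be addressed by its name
```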

Collecting vectors and matrices with the same number of rows in a data frame
Use the function data.frame( object 1, object 2, …, object k )
Matrices need to be protected; otherwise each column of a matrix will be identified as a single object in the data frame. Protection is made with the function I().
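The effect of I() can be sketched as follows; the object names and values are made up for illustration.

```r
y <- c(10, 20)
a <- matrix(c(2, 1, 1, -1), 2, 2)

# Without protection each matrix column becomes its own object
unprotected <- data.frame(y, a)

# With I() the matrix is kept as one object named "a"
protected <- data.frame(y, a = I(a))

names(unprotected)
names(protected)
dim(protected$a)
```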

Objects within a data frame can be called upon using the syntax dataframe$object

Names of objects within a data frame can be retrieved, set or changed with the function names()
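A brief sketch of the $ syntax and of names(), using a hypothetical data frame df:

```r
df <- data.frame(y = c(10, 20), x = c(1, 2))

df$y                                      # call an object by name
names(df)                                 # retrieve the names

names(df) <- c("response", "predictor")   # set all names
names(df)[2] <- "x1"                      # change a single name

df$response
```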

Reading from an external data file
Assume we have our data stored in the file demo.dat in the directory D:\undv\732A26
Set the correct working directory in R:
Note! The path must be specified with forward slashes ( / ), as in Unix, and not with single backslashes ( \ ), as in DOS.
To see which is the current working directory:
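Setting and checking the working directory can be sketched with setwd() and getwd(). To keep the sketch runnable anywhere, a temporary directory stands in for the directory on the slide; that substitution is an assumption of this example.

```r
old_dir <- getwd()        # remember where we started

# On the slide's machine this would be setwd("D:/undv/732A26")
work_dir <- tempdir()     # stand-in directory for illustration
setwd(work_dir)

current <- getwd()        # shows the current working directory
print(current)

setwd(old_dir)            # restore the original directory
```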

To read from the file, use the function
read.table( filename,header= logical_value,sep= separator )
filename is the name of the file enclosed in double quotes ( " " ). It can be specified with the whole path if it is not in the current working directory.
logical_value is set to TRUE if the columns in the file have headers; otherwise it should be set to FALSE (it is set automatically if omitted, but the result may be “unexpected”).
separator is set to the separator sign for the columns in the file (default is "", i.e. columns separated by blanks or other white space).

Note! read.table treats every column of the file as an individual column, i.e. it cannot be used to read a matrix directly into the workspace. The columns of a stored matrix must be recombined to create the matrix.

The matrix can be added to the data frame by using cbind()
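The reading steps above can be sketched as follows. The contents of demo.dat are not shown in the transcript, so a small stand-in file with hypothetical columns y, a1 and a2 is written to a temporary location first.

```r
# Create a small blank-separated stand-in file with a header row
demo_file <- tempfile(fileext = ".dat")
writeLines(c("y a1 a2", "1.5 2 1", "2.5 1 -1"), demo_file)

# Read it; every column of the file becomes an individual column
demo <- read.table(demo_file, header = TRUE, sep = "")

# Recombine the two stored matrix columns into a matrix
a <- cbind(demo$a1, demo$a2)

# Add the (protected) matrix to a data frame with cbind()
demo2 <- cbind(demo["y"], a = I(a))
print(demo2)
```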

Writing to an external file
The function
write.table( dataframe, filename,append= logical_value,sep= separator, quote= logical_value,row.names= logical_value,col.names= logical_value )
can be used for different formats of the output.
dataframe is the name of the data frame to be written to file.
filename is the name of the file to write to.
separator is the separator sign written between the columns.
logical_value is either TRUE or FALSE.
If append=FALSE (default) a file will be created and any existing file with that name will be overwritten. If append=TRUE the data frame will be added (vertical concatenation) to an existing file.

Examples: Exploring demo1.dat with Notepad (“Anteckningar” in Swedish) Row numbers!

Nothing in output will be quoted

Tab-separated, but the first header does not line up vertically with the first column. The first column of the file is the row number.

( append=FALSE is default and can therefore be omitted for new file creation) Row numbers have now been removed and headers correspond vertically with the columns.

Note! Multiple lines can be used for a command input. Pressing Enter before the command is complete opens a new line with the prompt “+”. Column names (headers) have been removed.
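The output variants described in these examples can be sketched as follows. The data frame and the temporary file are stand-ins, since the slides' own data are not part of the transcript.

```r
df <- data.frame(y = c(1.5, 2.5), x = c(2, 1))
f <- tempfile(fileext = ".dat")

# Default: quoted names, blank-separated, row numbers in the first column
write.table(df, f)
default_lines <- readLines(f)

# Tab-separated, nothing quoted, row numbers removed
write.table(df, f, quote = FALSE, sep = "\t", row.names = FALSE)
tab_lines <- readLines(f)

# A command may span several lines; headers are removed as well
write.table(df, f, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)
bare_lines <- readLines(f)
```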

Calculation
The ordinary arithmetic operators “+”, “-”, “*” and “/” work element-wise.

For matrix multiplication use “ %*% ”
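The difference between the element-wise operator and matrix multiplication can be seen in a small sketch; the matrix values are borrowed from the script later in the slides.

```r
a <- matrix(c(2, 1, 1, -1), 2, 2)

elementwise <- a * a     # each element multiplied by itself
product <- a %*% a       # ordinary matrix product

print(elementwise)
print(product)
```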

Matrix operators/functions:
transpose: b=t(a) gives $b = a^{T}$
inverse: b=solve(a) gives $b = a^{-1}$ (when needed)
QR-factorization: qr=qr(a) (additional arguments possible)
qr.Q(qr) → Q
qr.R(qr) → R
x=qr.solve(A,b) → solves A·x = b
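A minimal sketch of these functions on a 2 x 2 system; the variable names at, ainv and fact are chosen here only for illustration.

```r
a <- matrix(c(2, 1, 1, -1), 2, 2)
b <- c(1, 2)

at <- t(a)          # transpose
ainv <- solve(a)    # inverse

fact <- qr(a)       # QR-factorization
Q <- qr.Q(fact)     # orthogonal factor
R <- qr.R(fact)     # upper triangular factor

x <- qr.solve(a, b) # solves a %*% x = b
print(x)
```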

Solving a linear system of equations, regression estimation

Regression model

Alternatively: “reg” becomes an object as output from qr. This object has a number of members (coef, res, fitted).
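The regression code on these slides is not part of the transcript. As a sketch under assumptions, least-squares estimates can be obtained with qr.solve(), or from a stored QR object with the base R helpers qr.coef(), qr.resid() and qr.fitted(); whether these correspond exactly to the members named above is a guess. The data follow the simulation script later in the slides, with a seed added for reproducibility.

```r
set.seed(1)   # assumption: fixed seed, only for reproducibility
x1 <- c(2, 3, 5, 6, 9, 10, 10, 12, 13, 15)
x2 <- c(1, 0, 0, 1, 0, 1, 1, 0, 1, 1)
y <- 12 + 1.1 * x1 - 4.7 * x2 + rnorm(10, 0, 2)

X <- cbind(1, x1, x2)        # design matrix with intercept column

b <- qr.solve(X, y)          # least-squares estimates in one step

reg <- qr(X)                 # alternatively, keep the QR object
coefs <- qr.coef(reg, y)     # coefficients
res <- qr.resid(reg, y)      # residuals
fit <- qr.fitted(reg, y)     # fitted values
```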

A more comprehensive regression analysis is done with the function lm() (linear model). Use help("lm") to learn more about this function.
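A short sketch of lm() on simulated data of the kind used later in the slides; the seed and the comparison with qr.solve() are additions for illustration.

```r
set.seed(1)   # assumption: fixed seed, only for reproducibility
x1 <- c(2, 3, 5, 6, 9, 10, 10, 12, 13, 15)
x2 <- c(1, 0, 0, 1, 0, 1, 1, 0, 1, 1)
y <- 12 + 1.1 * x1 - 4.7 * x2 + rnorm(10, 0, 2)

fit <- lm(y ~ x1 + x2)   # intercept is included automatically
print(coef(fit))         # estimated coefficients
print(summary(fit))      # standard errors, t-tests, R-squared

# The coefficients agree with the QR-based least-squares solution
b_qr <- qr.solve(cbind(1, x1, x2), y)
```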

Putting it together in a script
Gather command rows in a text file. Give it the extension “.r”. Call the script file with the command source().
a<-matrix(c(2,1,1,-1),2,2)
b<-c(1,2)
x=qr.solve(a,b)
print(x)
Store in d:\undv\732A26\macro.r
“#” precedes a comment.
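Calling a script with source() can be sketched as follows. A temporary file stands in for macro.r, and local = TRUE is used here so that the objects are created in the calling environment; both are choices of this example.

```r
# Write the script lines to a temporary .r file (stand-in for macro.r)
script_file <- tempfile(fileext = ".r")
writeLines(c(
  "# Solve the linear system a %*% x = b",
  "a <- matrix(c(2, 1, 1, -1), 2, 2)",
  "b <- c(1, 2)",
  "x <- qr.solve(a, b)",
  "print(x)"
), script_file)

# Run all commands in the file
source(script_file, local = TRUE)
```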

When exiting R
The workspace can be saved for future sessions: save.image("core.RData") saves the workspace into the file core.RData, where core is replaced by a suitable filename base.
To restore a saved workspace: load("core.RData")
To exit from R type q()

More programming
Regular sequences:
Note! “<-” can be reversed (“->”), and most often “<-” can be replaced by “=”.
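Regular sequences are typically built with the colon operator or seq(). The sequences below are illustrative and not necessarily those shown on the slide.

```r
s1 <- 1:10                    # integers 1, 2, ..., 10
s2 <- seq(0, 1, by = 0.25)    # 0.00, 0.25, 0.50, 0.75, 1.00
s3 <- seq(10, 1, by = -3)     # 10, 7, 4, 1

5 -> z                        # reversed assignment
w = 7                         # "=" used instead of "<-"
```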

Repeating patterns
Note! The identifier should be specified (times or each); an unnamed second argument is interpreted as times.
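The two identifiers of rep() behave differently, as this small sketch with illustrative values shows:

```r
r_times <- rep(c(1, 2), times = 3)   # whole pattern repeated: 1 2 1 2 1 2
r_each <- rep(c(1, 2), each = 3)     # each value repeated:    1 1 1 2 2 2
r_plain <- rep(c(1, 2), 3)           # unnamed argument acts as times
```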

Looping and conditioning

Conditions must be within parentheses. Normally: Put “else” directly after “}”

The equality condition must be given with the operator “==”.
Multiple statements following a for, if, else or while must be enclosed in braces { } and separated by semicolons ( ; ) or line breaks.
runif(1) gives a random U(0,1) number.
General usage: runif( n, a, b ), where n is the number of values; default: a=0, b=1.
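The rules for looping and conditioning can be sketched in one small example. The coin-flip counting and the variable names are made up for illustration, and the seed is set only for reproducibility.

```r
set.seed(42)
heads <- 0; tails <- 0        # two statements separated by ";"

for (i in 1:100) {
  u <- runif(1)               # one U(0,1) number
  if (u < 0.5) {              # condition within parentheses
    heads <- heads + 1
  } else {                    # "else" directly after "}"
    tails <- tails + 1
  }
}

k <- 0
while (k * k < 50) {          # loop until k squared reaches 50
  k <- k + 1
}
is_eight <- (k == 8)          # equality is tested with "=="

u3 <- runif(3, 2, 5)          # three U(2,5) numbers
```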

A more complex example: Simulating regression data
Script:
x1=c(2,3,5,6,9,10,10,12,13,15) # First x-variable
x2=c(1,0,0,1,0,1,1,0,1,1) # Second x-variable
y<-as.numeric(1:10) # Dimensioning y
for (i in (1:10)) {
  # Computing y using beta1=1.1 and beta2=-4.7
  # Random error is N(0,2); rnorm takes the standard deviation as third argument
  y[i]=12+1.1*x1[i]-4.7*x2[i]+rnorm(1,0,2)
}
plot(x1,y) # generates a scatter plot y vs. x1
# Estimating the coefficients:
x=cbind(rep(1,each=10),x1,x2)
b=qr.solve(x,y)
print(b)
Store in file regress.r

Suppose we would like to get empirically derived confidence limits for β1, i.e. not using the normal distribution.
beta1<-as.numeric(1:500) # Dimensioning array of b1-values
x1=c(2,3,5,6,9,10,10,12,13,15) # First x-variable
x2=c(1,0,0,1,0,1,1,0,1,1) # Second x-variable
y<-as.numeric(1:10) # Dimensioning y
for (trial in 1:500) {
  for (i in (1:10)) {
    # Computing y using beta1=1.1 and beta2=-4.7
    # Random error is N(0,2)
    y[i]=12+1.1*x1[i]-4.7*x2[i]+rnorm(1,0,2)
  }
  # Estimating the coefficients:
  x=cbind(rep(1,each=10),x1,x2)
  b=qr.solve(x,y)
  # Storing b1 in array
  beta1[trial]=b[2]
}
Store in file regress2.r
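The script stores the 500 estimates but does not yet turn them into limits. One possible continuation, sketched here under assumptions, is to take empirical quantiles of beta1 with quantile(). The vectorised rnorm(10, 0, 2) call and the seed are simplifications made for this example, not the slides' own code.

```r
set.seed(2009)   # assumption: fixed seed, only for reproducibility
x1 <- c(2, 3, 5, 6, 9, 10, 10, 12, 13, 15)
x2 <- c(1, 0, 0, 1, 0, 1, 1, 0, 1, 1)
X <- cbind(1, x1, x2)

beta1 <- numeric(500)                 # storage for the b1-values
for (trial in 1:500) {
  # Simulate all ten responses at once
  y <- 12 + 1.1 * x1 - 4.7 * x2 + rnorm(10, 0, 2)
  beta1[trial] <- qr.solve(X, y)[2]   # keep the estimate of beta1
}

# Empirical 95% limits from the simulated estimates
limits <- quantile(beta1, c(0.025, 0.975))
print(limits)
```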

Bootstrapping the estimated 90th percentile of a sample
Assume we wish to assess the 90th percentile of a sample from a Poisson distribution. This means that we wish to assess the properties of the sample percentile as an estimator of the population percentile in terms of bias and a 95% confidence interval.
Simulate a sample of 40 observations from a Po(7)-distribution and show an initial histogram of the sample values.
Draw 500 pseudo-samples with replacement from the original sample.
In each pseudo-sample, compute the sample percentile.
Collect the pseudo-sample percentiles, translate them by subtracting the original sample percentile and estimate bias and 95% percentile confidence limits.

Formulae for the 90th sample percentile:
Let $x_{(1)}, \ldots, x_{(n)}$ denote the sample arranged in ascending order, i.e. $x_{(1)}$ is the smallest value and $x_{(n)}$ is the largest value.
Calculate $i = 0.90 \cdot n$.
If $i$ is non-integer, let the 90th percentile be $x_{(\lfloor i \rfloor + 1)}$.
If $i$ is an integer, let the 90th percentile be $(x_{(i)} + x_{(i+1)})/2$.
This construction ensures that at least 90% of the sample values are ≤ the 90th percentile and at least 10% of the sample values are ≥ the 90th percentile.
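The rule above can be written as a small function. The name p90_sample and the numerical tolerance used to decide whether i is an integer are assumptions of this sketch.

```r
# 90th sample percentile following the rule above
p90_sample <- function(x) {
  xsort <- sort(x)
  i <- 0.90 * length(x)
  if (abs(i - round(i)) > 1e-9) {
    # i is non-integer: take the value at position floor(i) + 1
    xsort[floor(i) + 1]
  } else {
    # i is an integer: average the values at positions i and i + 1
    i <- round(i)
    (xsort[i] + xsort[i + 1]) / 2
  }
}

p90_sample(1:40)   # i = 36, gives (36 + 37) / 2 = 36.5
p90_sample(1:45)   # i = 40.5, gives the 41st value, 41
```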

# R-script for illustrating bootstrapping of the 90th sample percentile
n=40 # Sample size
b=500 # Number of bootstrap replications
pvec<-as.numeric(1:b) # Dimensioning vector of bootstrapped estimates
x=rpois(n,7) # Generate 40 independent Po(7)-observations
hist(x,main="Histogram from sample data",xlab=NULL)
xsort=sort(x) # Sort the data
p90index=0.90*n # Calculate decimal order for 90th percentile
if (p90index-floor(p90index)>0) {
  # 90th perc. if decimal order is non-integer
  p90=xsort[floor(p90index)+1]
} else {
  # 90th perc. if decimal order is integer
  p90=(xsort[floor(p90index)]+xsort[floor(p90index)+1])/2
}

# Bootstrapping loop
for (i in 1:b) {
  u=floor(n*runif(n,0,1)+1); # Vector of integers uniformly on {1,2,...,n}
  xstar=x[u]; # Pseudo sample
  xstarsort=sort(xstar);
  if (p90index-floor(p90index)>0) {
    p90star=xstarsort[floor(p90index)+1]
  } else { # Copying estimation method
    p90star=(xstarsort[p90index]+xstarsort[p90index+1])/2
  }
  pvec[i]=p90star;
}
pvec_sort=sort(pvec) # Sorting the bootstrapped estimates
pvec_trans=pvec_sort-p90 # Subtracting original estimate from sorted bootstr. est.

# Histogram of translated bootstrap estimates
readline("Press Enter to show next graph")
hist(pvec_trans,main="Histogram of p90star-p90",xlab=NULL)
# Finding 2.5th and 97.5th percentiles:
L025index=0.025*b
U975index=0.975*b
if (L025index-floor(L025index)>0) {
  L025=pvec_trans[floor(L025index)+1]
} else {
  L025=(pvec_trans[L025index]+pvec_trans[L025index+1])/2
}
if (U975index-floor(U975index)>0) {
  U975=pvec_trans[floor(U975index)+1]
} else {
  U975=(pvec_trans[U975index]+pvec_trans[U975index+1])/2
}

# Bias estimate:
bias=mean(pvec_trans)
# 95% percentile confidence interval:
lower=p90-U975
upper=p90-L025
output<-data.frame(p90,bias,lower,upper)
names(output)<-c("90th perc.","Bias","Lower 95% limit","Upper 95% limit")
print(output)
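For comparison, the same bootstrap can be sketched more compactly. This version swaps in sample() for the runif()-based index vector and quantile() for the hand-coded 2.5th and 97.5th percentiles. quantile() uses a different interpolation rule by default, so the limits may differ slightly from those of the script above. The helper name p90_of and the seed are assumptions of this sketch.

```r
set.seed(732)   # assumption: fixed seed, only for reproducibility

# 90th sample percentile by the rule used in the slides
p90_of <- function(x) {
  xsort <- sort(x)
  i <- 0.90 * length(x)
  if (abs(i - round(i)) > 1e-9) {
    xsort[floor(i) + 1]
  } else {
    i <- round(i)
    (xsort[i] + xsort[i + 1]) / 2
  }
}

n <- 40                   # sample size
b <- 500                  # number of bootstrap replications
x <- rpois(n, 7)          # original sample from Po(7)
p90 <- p90_of(x)          # original estimate

# Resample with replacement and recompute the percentile b times
pvec <- replicate(b, p90_of(sample(x, size = n, replace = TRUE)))
pvec_trans <- sort(pvec - p90)   # translated bootstrap estimates

bias <- mean(pvec_trans)
limits <- quantile(pvec_trans, c(0.025, 0.975), names = FALSE)
lower <- p90 - limits[2]
upper <- p90 - limits[1]

output <- data.frame(p90, bias, lower, upper)
print(output)
```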

There is much more to find out!
Use the PDF manual (read at least the first chapter).
Use the help function ( help("function") or ?function ).
Use Google (search for “R: what you are looking for”).