Introduction to Programming in R Department of Statistical Sciences and Operations Research Computation Seminar Series Speaker: Edward Boone



What is R? The R statistical programming language is a free, open-source package based on the S language developed at Bell Labs. The language is very powerful for writing programs. Many statistical functions are already built in. Contributed packages expand the functionality to cutting-edge research. Because R is a programming language, you must write code to complete tasks.

Getting Started Where to get R? Go to Downloads: CRAN. Set your mirror: any one in the USA is fine. Select Windows 95 or later. Select base. Select R win32.exe. The other downloads are for developers who wish to change the source code.

Getting Started The R GUI?

Getting Started Opening a script. This gives you a script window.

Getting Started Submitting a program: use the button, or right mouse click and run the selection (Submit Selection).

Getting Started Basic assignment and operations. Arithmetic operations: +, -, *, /, ^ are the standard arithmetic operators. Matrix arithmetic: * is element-wise multiplication; %*% is matrix multiplication. Assignment: to assign a value to a variable use "<-".
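
The difference between * and %*% is easiest to see on a small case. The sketch below uses two made-up 2 x 2 matrices (A and B are illustrative names, not part of the seminar data).

```r
# Two small 2 x 2 matrices (illustrative values only)
A <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled column by column
B <- matrix(c(0, 1, 1, 0), nrow = 2)

elementwise <- A * B     # multiplies matching entries
matprod     <- A %*% B   # row-by-column matrix product

x <- 2^3 + 10 / 4        # ordinary arithmetic, stored with <-

print(elementwise)
print(matprod)
print(x)
```

The two products differ in every entry here, which is a quick reminder to check which operator a formula actually calls for.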

Getting Started How to use help in R? R has a very good help system built in. If you know which function you want help with, simply use ?_______ with the function name in the blank. Ex: ?hist. If you don’t know which function to use, then use help.search("_______"). Ex: help.search("histogram").

Importing Data How do we get data into R? Remember we have no point and click… First make sure your data is in an easy-to-read format such as CSV (Comma Separated Values). Use code:
    D <- read.table("path", sep=",", header=TRUE)
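
A self-contained way to try the import step is sketched below. It writes a tiny made-up CSV to a temporary file first; the column names Gender and Score and the values are assumptions for illustration only.

```r
# Write a tiny CSV to a temporary file so the example needs no external data
path <- tempfile(fileext = ".csv")
writeLines(c("Gender,Score", "M,71", "F,84", "M,65"), path)

# Same call as on the slide, with the real path filled in
D <- read.table(path, sep = ",", header = TRUE)

# read.csv() is a wrapper around read.table() with sep = "," and header = TRUE preset
D2 <- read.csv(path)

str(D)
```

Either call should give the same data frame for a simple file like this one.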

Working with data. Accessing columns. D has our data in it…. But you can’t see it directly. To select a column use D$column.

Working with data. Subsetting data. Use a logical operator to do this. ==, !=, >, <, >=, <= are all logical operators. Note that the "equals" logical operator is two = signs, and "not equal" is != (R has no <> operator). Example: D[D$Gender == "M",] This will return the rows of D where Gender is "M". Remember R is case sensitive! This code does nothing to the original dataset. D.M <- D[D$Gender == "M",] gives a dataset with the appropriate rows.
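
The subsetting pattern can be tried on a small made-up data frame. The columns Gender and Score below are assumptions for illustration, not the seminar's dataset.

```r
# Small illustrative data frame
D <- data.frame(Gender = c("M", "F", "M", "F"),
                Score  = c(71, 84, 65, 90))

D$Score                          # one column, selected with $

D.M    <- D[D$Gender == "M", ]   # rows where Gender equals "M"
D.notM <- D[D$Gender != "M", ]   # rows where Gender is anything else
D.hi   <- D[D$Score >= 80, ]     # rows with Score of at least 80

nrow(D)                          # the original data frame is unchanged
```

Note the trailing comma inside the brackets: the condition selects rows, and leaving the column position empty keeps every column.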

Creating a Vector To create a vector use the c() function.
    b <- c(3,1,0.3,0.1)
This creates the vector b = (3, 1, 0.3, 0.1), which is treated as a column vector in a matrix product such as X%*%b.

Random Number Generation Random number generation is important in simulations as well as some model fitting techniques. Consider:
    X1 <- rnorm(100,5,2)
This generates a vector of 100 normal random variables with mean 5 and standard deviation 2.

Random Number Generation Generate two more vectors:
    X2 <- rnorm(100,15,3)
    X3 <- rnorm(100,22,5)
This gives us two more vectors of normally distributed values.
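
Because these draws are random, every run gives different numbers. A minimal sketch of making a simulation reproducible with set.seed() follows; the seed value 42 is arbitrary and X1_again is an illustrative name.

```r
# Fix the random number generator state, then draw
set.seed(42)
X1 <- rnorm(100, mean = 5, sd = 2)

# Resetting the same seed reproduces exactly the same draws
set.seed(42)
X1_again <- rnorm(100, mean = 5, sd = 2)

identical(X1, X1_again)
mean(X1)   # close to 5, but not exactly 5
sd(X1)     # close to 2, but not exactly 2
```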

Determining the Size of a Vector Use the length function.
    n1 <- length(X1)
Use this only for vectors. On a matrix, length() returns the total number of elements rather than the number of rows; use nrow(), ncol(), or dim() for matrices.
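
The caution about matrices can be checked directly. The objects v and M below are made up for illustration.

```r
v <- c(3, 1, 0.3, 0.1)       # a vector with 4 elements
M <- matrix(1:6, nrow = 2)   # a 2 x 3 matrix

len_v <- length(v)   # number of elements in the vector
len_M <- length(M)   # total number of cells in the matrix, not its row count

rows <- nrow(M)
cols <- ncol(M)
dim(M)
```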

Creating a Vector of Repeated Values Often we want a vector of ones available. Use the rep() function.
    ones <- rep(1,n1)
This creates a vector of ones of length n1.

Creating a Matrix from Vectors Use the cbind() function.
    X <- cbind(ones,X1,X2,X3)
This binds the column vectors together into a matrix.
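
A small deterministic sketch shows what cbind() returns. The three short vectors are made-up values standing in for the simulated ones.

```r
ones <- rep(1, 3)
X1   <- c(5.1, 4.8, 6.0)
X2   <- c(14.2, 15.9, 13.7)

X <- cbind(ones, X1, X2)   # each vector becomes one column

dim(X)        # rows, then columns
colnames(X)   # column names are taken from the variable names
```

All vectors should have the same length; otherwise cbind() recycles the shorter ones, which is rarely what is wanted for a design matrix.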

Create a Regression Relationship Using our randomly generated data create a regression relationship. Use the code: Y <- X%*%b + rnorm(100,0,1)

Estimate a Regression Model Form the two sides of the normal equations, (X'X) b = X'Y. Use the code:
    XtX <- t(X)%*%X
    XtY <- t(X)%*%Y

Solve the normal equations To estimate the regression parameters solve the normal equations. Use the following code.
    bhat <- solve(XtX)%*%XtY
Check it:
    bhat
    lm(Y ~ X1 + X2 + X3)
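
The whole check can be run end to end as sketched below. The seed is an arbitrary assumption added for reproducibility, and solve(A, B) is used in place of solve(A) %*% B; it solves the same system without forming the inverse explicitly.

```r
set.seed(1)   # arbitrary seed, only so the run is repeatable
n  <- 100
X1 <- rnorm(n, 5, 2)
X2 <- rnorm(n, 15, 3)
X3 <- rnorm(n, 22, 5)
X  <- cbind(ones = rep(1, n), X1, X2, X3)
b  <- c(3, 1, 0.3, 0.1)
Y  <- X %*% b + rnorm(n, 0, 1)

# Normal equations: (X'X) bhat = X'Y
XtX  <- t(X) %*% X
XtY  <- t(X) %*% Y
bhat <- solve(XtX, XtY)

# Built-in fit for comparison (lm adds its own intercept)
fit <- lm(Y ~ X1 + X2 + X3)

# Largest absolute difference between the two sets of estimates
max_diff <- max(abs(as.vector(bhat) - as.vector(coef(fit))))
print(max_diff)
```

The difference should be at the level of floating-point rounding, since both approaches compute the same least squares solution.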

Create a Regression Function Use the function() format.
    reg1 <- function(Y,X){
      res <- solve(t(X)%*%X)%*%t(X)%*%Y
      return(res)
    }
Don’t forget to return the result. Remember the code in braces is the function.

Try the function Use the data already created.
    reg1(Y,X)

Add to the function Use the list function to return more than one result. Essentially, you are adding elements to the list that the function returns.
    reg2 <- function(Y,X){
      coeff <- solve(t(X)%*%X)%*%t(X)%*%Y
      resid <- Y - X%*%coeff
      # residual degrees of freedom are n - p; p = length(coeff) already counts the column of ones
      mse <- t(resid)%*%resid/(length(Y)-length(coeff))
      res <- list(coeff,resid,mse)
      return(res)
    }

Try the function Use the data already created.
    reg2(Y,X)

Add names to the function properties The names function allows you to name the elements of the returned list.
    reg3 <- function(Y,X){
      coeff <- solve(t(X)%*%X)%*%t(X)%*%Y
      resid <- Y - X%*%coeff
      # residual degrees of freedom are n - p; p = length(coeff) already counts the column of ones
      mse <- t(resid)%*%resid/(length(Y)-length(coeff))
      res <- list(coeff,resid,mse)
      names(res) <- c('coeff','residuals','mse')
      return(res)
    }
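
Once the list has names, each piece can be pulled out with $. The sketch below uses a hypothetical variant, reg3b, that names the elements directly inside list(); the data are made up so that the fit is exact.

```r
# Hypothetical variant: name the elements when the list is built
reg3b <- function(Y, X) {
  coeff <- solve(t(X) %*% X, t(X) %*% Y)
  resid <- Y - X %*% coeff
  list(coeff = coeff, residuals = resid)
}

# Made-up data lying exactly on the line Y = 1 + 2x
X <- cbind(1, c(1, 2, 3, 4))
Y <- c(3, 5, 7, 9)

out <- reg3b(Y, X)
names(out)      # the element names
out$coeff       # intercept and slope
out$residuals   # essentially zero for this exact fit
```

Naming inside list() and calling names() afterwards give the same result; which to use is a matter of taste.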

Programming Goal: PRESS PRESS will give us the ability to demonstrate basic programming constructs in an application: matrix operations, creating functions, loops, and data subsetting and storage.

Programming Goal: PRESS PRESS is the predictive sum of squares of a regression model. It is computed via: PRESS = \sum_{i=1}^{n} (y_i - \hat{y}_{(i)})^2, where \hat{y}_{(i)} is the predicted value of y_i using a model fit with all of the data except observation i.

Loops To construct a for loop use the following structure:
    for(i in 1:n){
      Operations…
    }
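
A concrete instance of this structure is sketched below: it fills a preallocated vector with squares. The names n and squares are illustrative.

```r
n <- 5
squares <- numeric(n)   # preallocate storage before the loop

for (i in 1:n) {
  squares[i] <- i^2     # store the result for iteration i
}

total <- sum(squares)
print(squares)
print(total)
```

When n might be zero, seq_len(n) is a safer range than 1:n, because 1:0 counts down through 1 and 0 instead of producing an empty range.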

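PRESS
    PRESS <- function(Y,X){
      n1 <- length(Y)
      ind1 <- 1:n1
      presshold <- rep(0,n1)
      for(i in 1:n1){
        # fit the model on everything except observation i
        X1 <- X[ind1 != i,]
        Y1 <- Y[ind1 != i]
        coef1 <- reg3(Y1,X1)$coeff
        # predict the held-out observation
        X2 <- X[ind1 == i,]
        Y2 <- Y[ind1 == i]
        Yp <- X2%*%coef1
        presshold[i] <- (Y2 - Yp)^2
      }
      # mean squared prediction error; use sum(presshold) for the PRESS total
      res <- mean(presshold)
      return(res)
    }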

Try the function Use the data already created. PRESS(Y,X)
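
One way to check a leave-one-out loop is against the hat-matrix shortcut, a standard least squares identity that is not part of the slides: the leave-one-out prediction error for observation i equals e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii is the i-th diagonal entry of H = X (X'X)^{-1} X'. The sketch below uses made-up data and reports the sum of squares.

```r
set.seed(2)   # arbitrary seed
n <- 30
X <- cbind(1, rnorm(n), rnorm(n))
Y <- as.vector(X %*% c(1, 2, -1) + rnorm(n))

# Explicit leave-one-out loop
loo_sq <- numeric(n)
for (i in seq_len(n)) {
  Xi <- X[-i, , drop = FALSE]                 # design matrix without row i
  Yi <- Y[-i]                                 # response without observation i
  bi <- solve(t(Xi) %*% Xi, t(Xi) %*% Yi)     # refit without observation i
  loo_sq[i] <- (Y[i] - sum(X[i, ] * bi))^2    # squared prediction error
}
press_loop <- sum(loo_sq)

# Hat-matrix shortcut: no refitting required
H <- X %*% solve(t(X) %*% X) %*% t(X)
e <- Y - as.vector(H %*% Y)                   # ordinary residuals
press_hat <- sum((e / (1 - diag(H)))^2)

print(c(press_loop, press_hat))
```

The two numbers should agree up to rounding, and the shortcut avoids n separate model fits.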

If…then constructs If you are interested in an if…then statement on a vector use the ifelse() function: ifelse(condition, True action, False action). Example:
    X1 <- runif(15,0,1)
    X2 <- ifelse(X1<.5,1,0)
    cbind(X1,X2)
Did it work?

If…then constructs If the condition is a single value rather than a vector, use the if(){} else {} construct.
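
A minimal sketch of the scalar form follows; the variable names and the 0.5 cutoff are illustrative.

```r
x <- 0.3

# The condition must be a single TRUE or FALSE
if (x < 0.5) {
  label <- "low"
} else {
  label <- "high"
}

print(label)
```

At the top level of a script, keep else on the same line as the closing brace of the if block; otherwise R treats the if statement as finished before it reaches else.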

Source Files Source files allow you to store all of your created functions in a single file and have all those functions available to you. To load a self-created library use: source(Path) Don’t forget that each \ in a Windows path needs to be replaced with \\ (or with /).
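
The idea can be tried without touching any real files, as sketched below. The function add_one and the temporary file are hypothetical stand-ins for a personal library of functions.

```r
# Create a small "library" file in a temporary location
lib_path <- tempfile(fileext = ".R")
writeLines(c(
  "add_one <- function(x) {",
  "  x + 1",
  "}"
), lib_path)

# source() runs the file, which defines add_one in the workspace
source(lib_path)

result <- add_one(41)
print(result)
```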

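Writing to a file To write to a file use the write.table() function.
    write.table(dataset, path, sep=",", row.names=FALSE, col.names=TRUE)
This will produce a comma separated value (csv) file. Note that write.table() has no header argument; col.names=TRUE writes the header row, and row.names=FALSE keeps the row numbers out of the file.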
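
A round trip is a simple way to confirm the file was written as intended. The data frame below is made up, and a temporary file stands in for a real path.

```r
dataset <- data.frame(id = 1:3, value = c(2.5, 3.5, 4.5))
path <- tempfile(fileext = ".csv")

# Write with a header row and without row names
write.table(dataset, path, sep = ",", row.names = FALSE, col.names = TRUE)

# Read it back and compare with the original
back <- read.table(path, sep = ",", header = TRUE)
same <- isTRUE(all.equal(dataset, back))
print(same)
```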

Linear Algebra Extras For eigenvalues and eigenvectors use the eigen() function. This gives an object that contains both the eigenvalues and eigenvectors. Example: eigen(XtX) returns $values (the eigenvalues) and $vectors (a 4 x 4 matrix whose columns are the eigenvectors). [Numeric output not recoverable from the slide.]
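
The structure of the returned object is easier to see on a small matrix with known eigenvalues. The 2 x 2 matrix A below is made up; its eigenvalues are 3 and 1.

```r
A <- matrix(c(2, 1, 1, 2), nrow = 2)   # symmetric 2 x 2 matrix

e <- eigen(A)
e$values    # eigenvalues, largest first
e$vectors   # columns are the corresponding eigenvectors

# Check the defining property A v = lambda v for the first pair
v1 <- e$vectors[, 1]
check <- max(abs(A %*% v1 - e$values[1] * v1))
print(check)
```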

Summary R is a programming environment with many standard programming structures already included. Easy to create functions. No support. Allows users to create a library of functions.

Summary All of the R code and files can be found at: