Published by Rodney Edwards. Modified over 9 years ago.
1
Introduction to Contributed Packages in R Department of Statistical Sciences and Operations Research Computation Seminar Series Speaker: Edward Boone Email: elboone@vcu.edu
2
What is R? The R statistical programming language is a free, open-source package based on the S language developed at Bell Labs. The language is very powerful for writing programs. Many statistical functions are already built in. Contributed packages extend the functionality to cutting-edge research methods. Because it is a programming language, you must write code to complete tasks.
3
Getting Started Where to get R? Go to www.r-project.org Downloads: CRAN Set your mirror: any mirror in the USA is fine. Select Windows 95 or later. Select base. Select R-2.4.1-win32.exe The other downloads are for developers who wish to change the source code.
4
Getting Started The R GUI?
5
Getting Started Opening a script. This gives you a script window.
6
Getting Started Submitting a program: Select the code, then use the run button, or right-click and choose to run the selection.
7
Getting Started Basic assignment and operations. Arithmetic operations: +, -, *, /, ^ are the standard arithmetic operators. Matrix arithmetic: * is element-wise multiplication; %*% is matrix multiplication. Assignment: to assign a value to a variable use <-
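A short sketch of the difference between the two multiplication operators may help here. The matrices A and B and the variable x are made-up examples, not part of the original slides.

```r
# Two small example matrices (made up for illustration)
A <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled column by column: rows are (1, 3) and (2, 4)
B <- matrix(c(0, 1, 1, 0), nrow = 2)   # rows are (0, 1) and (1, 0)

elementwise <- A * B      # multiplies matching entries
matprod     <- A %*% B    # true matrix product

x <- 2^3 + 4 / 2          # ordinary arithmetic, assigned with <-

print(elementwise)
print(matprod)
print(x)
```

Here A * B keeps only the off-diagonal entries of A, while A %*% B swaps the columns of A.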
8
Getting Started How to use help in R? R has a very good help system built in. If you know which function you want help with, simply use ?_______ with the function name in the blank. Ex: ?hist. If you don’t know which function to use, then use help.search("_______"). Ex: help.search("histogram").
9
Importing Data How do we get data into R? Remember we have no point and click… First make sure your data is in an easy-to-read format such as CSV (Comma Separated Values). Use code: D <- read.table("path", sep=",", header=TRUE)
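The import step can be tried without an existing data file. The sketch below writes a tiny CSV to a temporary file and reads it back; the column names Gender and Age and the values are invented for illustration.

```r
# Write a tiny CSV to a temporary file so the example is self-contained
# (the columns and values are hypothetical)
path <- tempfile(fileext = ".csv")
writeLines(c("Gender,Age", "M,34", "F,29", "M,41"), path)

# Read it back; header = TRUE takes the column names from the first line
D <- read.table(path, sep = ",", header = TRUE)

str(D)    # shows the column names and types
head(D)   # prints the first rows
```

For comma-separated files, read.csv(path) is a shorthand with sep = "," and header = TRUE as its defaults.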
10
Working with data. Accessing columns. D now holds our data, but it is not displayed automatically. To select a column use D$column.
11
Working with data. Subsetting data. Use a comparison operator to do this. ==, !=, >, <, >=, <= are all comparison operators. Note that the “equals” operator is two = signs. Example: D[D$Gender == "M",] This will return the rows of D where Gender is "M". Remember R is case sensitive! This code does nothing to the original dataset. D.M <- D[D$Gender == "M",] stores the matching rows in a new dataset.
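A minimal sketch of column access and subsetting, using a small made-up data frame in place of the imported data (the Gender and Age values are hypothetical):

```r
# Small made-up data frame standing in for the imported data
D <- data.frame(Gender = c("M", "F", "M", "F"),
                Age    = c(34, 29, 41, 52))

ages <- D$Age                       # select one column as a vector

D.M <- D[D$Gender == "M", ]         # rows where Gender is "M"; D itself is unchanged

# Conditions can be combined with & (and) and | (or)
older <- D[D$Age >= 40 & D$Gender != "M", ]

print(D.M)
print(older)
```

The comma inside the brackets matters: the condition before it picks rows, and leaving the part after it empty keeps all columns.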
12
Source Files Source files allow you to store all of your own functions in a single file and have all those functions available to you. To load such a file use: source(Path) Don’t forget that each \ in a Windows path needs to be written as \\ (or replaced with /).
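As a sketch of how this works, the example below writes a one-function source file to a temporary location and loads it. The helper cv() and the Windows path in the comment are hypothetical.

```r
# Create a source file containing one hypothetical helper function
src <- tempfile(fileext = ".R")
writeLines(c(
  "# coefficient of variation (hypothetical helper)",
  "cv <- function(x) sd(x) / mean(x)"
), src)

# Load every definition in the file into the current session
source(src)

# On Windows a real path would be written with doubled backslashes or
# forward slashes, e.g. "C:\\myfiles\\functions.R" or "C:/myfiles/functions.R"
# (hypothetical path)

result <- cv(c(2, 4, 6))
print(result)
```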
13
Libraries In order to keep R’s memory footprint small, additional functionality is stored in libraries. These libraries can be loaded through the GUI or from scripts. Beware that some contributed packages may conflict with other libraries, for example by defining functions with the same name.
14
Contributed Packages Since R is open source and the developers are well organized, developing and finding contributed packages is easy. Currently there are 964 contributed packages. These range from wavelets and financial mathematics to spatial data analysis.
15
Contributed Packages One popular library is lattice.
16
Contributed Packages You can install contributed packages using the GUI.
17
Contributed Packages You can install the package by selecting it from the list. Note: Installing a package does not make it immediately available for use. You still need to use the library() statement to make the functionality available to you. library(lattice)
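The distinction between installing and loading can be sketched in a script. This example only checks whether lattice is installed and attaches it if so; it deliberately does not call install.packages(), which needs a network connection.

```r
pkg <- "lattice"

# requireNamespace() reports whether the package is installed
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  # Attaching the package makes its functions available by name
  library(pkg, character.only = TRUE)
} else {
  message("Run install.packages(\"", pkg, "\") first, then library(", pkg, ").")
}

# The package shows up on the search path only after library() has run
attached <- paste0("package:", pkg) %in% search()
print(attached)
```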
18
Help on contributed packages Once a contributed package is loaded you can access the help for the package and a list of functions available in the package by: library(help="lattice")
19
The CircStats Package Many times data may come in a circular format. For example, the direction of migration or flight of birds from their nest. The data is an angle, not a “linear” measurement. The data can only go between 0 and 2π.
20
The CircStats Package Use the CircStats Package.
    library(CircStats)
Consider the following:
    data <- runif(50, 0, pi)
    mean.dir <- circ.mean(data)
    mean.dir
    [1] 1.446502
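To see why angles need their own mean, here is a base R sketch that does not use CircStats. It computes the mean direction as the angle of the summed unit vectors, which is the standard definition; that circ.mean returns the same quantity is an assumption here. The two angles are made up.

```r
# Two directions just either side of 0, given in degrees and converted to radians
angles <- c(350, 10) * pi / 180

# The ordinary mean points the opposite way (180 degrees)
arith_mean <- mean(angles)

# Mean direction: the angle of the resultant of the unit vectors
circ_mean <- atan2(sum(sin(angles)), sum(cos(angles)))

print(arith_mean * 180 / pi)
print(circ_mean * 180 / pi)
```

atan2() returns a value in (-pi, pi], so a result near 0 here means the mean direction lies between the two observations, as expected.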
21
The CircStats Package Randomly generate data from a von Mises distribution
    data.vm <- rvm(100, 0, 3)
Create a plot of it using circ.plot:
    circ.plot(data.vm, stack=TRUE, bins=150, shrink=1.5)
22
The CircStats Package Regression with circular data: Create some data
    data1 <- runif(50, 0, 2*pi)
    data2 <- atan2(0.15*cos(data1) + 0.25*sin(data1), 0.35*sin(data1)) + rvm(50, 0, 5)
Run the regression using circ.reg:
    circ.lm <- circ.reg(data1, data2, order=1)
    circ.lm
    (Intercept) -0.01365604 -0.02939188
    cos.alpha   -0.29872673  0.41344126
    sin.alpha    0.78894271  0.72908521
23
The CircStats Package Plot the data
    plot(data1, data2)
Plot the predicted line
    circ.lm$fitted[circ.lm$fitted>pi] <- circ.lm$fitted[circ.lm$fitted>pi] - 2*pi
    points(data1[order(data1)], circ.lm$fitted[order(data1)], type='l')
24
The norm Contributed Package Although the name norm sounds as if the package would have something to do with the normal distribution, it is in fact a package for dealing with missing data. It implements the Data Augmentation and Multiple Imputation scheme of Schafer (1997). It is similar to SAS PROC MI.
25
The norm Contributed Package Load the library. library(norm)
26
The norm Contributed Package Generate some data.
    X1 <- rnorm(100, 6, 1)
    X2 <- rnorm(100, 10, 3)
    X3 <- rnorm(100, 3, .2)
    X4 <- rnorm(100, 31, 2)
    Y <- 5 + .4*X1 - .3*X2 + rnorm(100, 0, 1)
27
The norm Contributed Package Generate some missing data.
    X1a <- ifelse(runif(100, 0, 1) < .1, NA, X1)
    X2a <- ifelse(runif(100, 0, 1) < .1, NA, X2)
Put the data together.
    YX <- cbind(Y, X1a, X2a, X3, X4)
28
The norm Contributed Package Prep the data and parameters for multiple imputation.
    # do preliminary manipulations
    s <- prelim.norm(YX)
    # find the mle
    thetahat <- em.norm(s)
    # set random number generator seed
    rngseed(1234567)
29
The norm Contributed Package Create lists to store the individual results in.
    betaout <- vector("list", 10)
    betasterrout <- vector("list", 10)
30
The norm Contributed Package Run a multiple imputation loop
    for (i in 1:10) {
      # impute the missing values to get one completed data set
      ximp <- imp.norm(s, thetahat, YX)
      # fit the regression once and reuse the fit
      fit <- lm(ximp[,1] ~ ximp[,2] + ximp[,3] + ximp[,4] + ximp[,5])
      betaout[[i]] <- fit$coefficients
      # second column of the coefficient table holds the standard errors
      betasterrout[[i]] <- summary(fit)$coefficients[,2]
    }
31
The norm Contributed Package Analyze the results mi.inference(betaout,betasterrout,confidence=0.95)
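The pooling step can be sketched in base R using Rubin's rules, the standard way to combine estimates across imputations. That mi.inference follows exactly these formulas is an assumption here; the function pool_rubin() and the toy estimates are hypothetical.

```r
# Combine m sets of estimates and standard errors with Rubin's rules.
# pool_rubin() is a hypothetical helper, not part of the norm package.
pool_rubin <- function(est, se) {
  m <- length(est)
  Q <- do.call(rbind, est)        # m x p matrix of estimates
  U <- do.call(rbind, se)^2       # m x p matrix of squared standard errors
  qbar  <- colMeans(Q)            # pooled point estimate
  W     <- colMeans(U)            # within-imputation variance
  B     <- apply(Q, 2, var)       # between-imputation variance
  r     <- (1 + 1 / m) * B / W    # relative increase in variance
  total <- W + (1 + 1 / m) * B    # total variance
  df    <- (m - 1) * (1 + 1 / r)^2
  list(est = qbar, std.err = sqrt(total), df = df, r = r)
}

# Toy input: three imputations, two coefficients (made-up numbers)
est <- list(c(a = 1.0, b = 2.0), c(a = 1.2, b = 2.1), c(a = 0.8, b = 1.9))
se  <- list(c(a = 0.1, b = 0.2), c(a = 0.1, b = 0.2), c(a = 0.1, b = 0.2))

pooled <- pool_rubin(est, se)
print(pooled)
```

The pooled standard error is always at least as large as the average within-imputation standard error, because the between-imputation spread is added on top.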
32
The norm Contributed Package Look at the output
    $est
    (Intercept)   ximp[, 2]   ximp[, 3]   ximp[, 4]   ximp[, 5]
     6.75624286  0.30502706 -0.32846960  0.05157696 -0.04154060
    $std.err
    (Intercept)   ximp[, 2]   ximp[, 3]   ximp[, 4]   ximp[, 5]
     2.70312542  0.13431178  0.04240159  0.65908509  0.05596610
    $df
    (Intercept)   ximp[, 2]   ximp[, 3]   ximp[, 4]   ximp[, 5]
      1318.8371    222.2528  13269.2373   1770.6680  27689.4900
    $signif
     (Intercept)    ximp[, 2]    ximp[, 3]    ximp[, 4]    ximp[, 5]
    1.256048e-02 2.410251e-02 1.021405e-14 9.376337e-01 4.579447e-01
    $r
    (Intercept)   ximp[, 2]   ximp[, 3]   ximp[, 4]   ximp[, 5]
     0.09004737  0.25192843  0.02673983  0.07676697  0.01835967
33
The lpSolve Package The lpSolve package allows for the solving of linear and integer programs. library(lpSolve)
34
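The lpSolve Package Consider the following linear program:
    maximize    x1 + 9 x2 + 3 x3
    subject to  x1 + 2 x2 + 3 x3 <= 9
                3 x1 + 2 x2 + 2 x3 <= 15
                x1, x2, x3 >= 0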
35
The lpSolve Package Set up the vectors and matrices
    f.obj <- c(1, 9, 3)
    f.con <- matrix(c(1, 2, 3, 3, 2, 2), nrow=2, byrow=TRUE)
    f.dir <- c("<=", "<=")
    f.rhs <- c(9, 15)
36
The lpSolve Package The lp() function will attempt to solve the linear program.
    lp("max", f.obj, f.con, f.dir, f.rhs)
    Success: the objective function is 40.5
37
The lpSolve Package To obtain the solution, extract the solution component from the object.
    lp("max", f.obj, f.con, f.dir, f.rhs)$solution
    [1] 0.0 4.5 0.0
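The reported solution can be checked by hand in base R, without lpSolve. This sketch plugs the solution from the slide back into the constraints and the objective; only the variable names x, lhs, feasible and objval are new.

```r
# Problem data as defined on the earlier slide
f.obj <- c(1, 9, 3)
f.con <- matrix(c(1, 2, 3, 3, 2, 2), nrow = 2, byrow = TRUE)
f.rhs <- c(9, 15)

# Solution reported by lp()
x <- c(0, 4.5, 0)

lhs      <- as.vector(f.con %*% x)            # left-hand side of each constraint
feasible <- all(lhs <= f.rhs) && all(x >= 0)  # both constraints and non-negativity hold
objval   <- sum(f.obj * x)                    # objective value at x

print(lhs)
print(feasible)
print(objval)
```

The first constraint holds with equality (9 <= 9) while the second has slack (9 <= 15), and the objective matches the reported 40.5.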
38
The lpSolve Package Sensitivity analyses can be obtained from the lp() object. The following are components of an lp() object.
     [1] "direction"      "x.count"        "objective"      "const.count"
     [5] "constraints"    "int.count"      "int.vec"        "objval"
     [9] "solution"       "presolve"       "compute.sens"   "sens.coef.from"
    [13] "sens.coef.to"   "duals"          "duals.from"     "duals.to"
    [17] "status"
39
The lpSolve Package To solve an integer program, use int.vec to specify which variables need to be integers.
    lp("max", f.obj, f.con, f.dir, f.rhs, int.vec=1:3)
    Success: the objective function is 37
40
The lpSolve Package To obtain the solution to the integer program, use the solution component as before:
    lp("max", f.obj, f.con, f.dir, f.rhs, int.vec=1:3)$solution
    [1] 1 4 0
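Because this integer program is tiny, the result can be cross-checked by brute force in base R. The sketch below enumerates every non-negative integer point within bounds implied by the constraints; this is a check for small problems only, not how lpSolve works internally.

```r
# Problem data as defined on the earlier slide
f.obj <- c(1, 9, 3)
f.con <- matrix(c(1, 2, 3, 3, 2, 2), nrow = 2, byrow = TRUE)
f.rhs <- c(9, 15)

# Upper bounds implied by the constraints: 3*x1 <= 15, 2*x2 <= 9, 3*x3 <= 9
grid <- as.matrix(expand.grid(x1 = 0:5, x2 = 0:4, x3 = 0:3))

# One row per candidate point, one column per constraint
lhs <- grid %*% t(f.con)
ok  <- lhs[, 1] <= f.rhs[1] & lhs[, 2] <= f.rhs[2]

feasible   <- grid[ok, , drop = FALSE]
values     <- as.vector(feasible %*% f.obj)
best       <- feasible[which.max(values), ]
best_value <- max(values)

print(best)
print(best_value)
```

The enumeration agrees with the slide: the best integer point is (1, 4, 0) with objective value 37, slightly below the 40.5 of the relaxed problem.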
41
Summary R is a programming environment with many standard programming structures already included. A large number of contributed packages are available. Many packages allow the use of modern statistical procedures without having to code them yourself. Familiarity with R is required to actually use the packages. No support is provided. Users can create new packages.
42
Summary All of the R code and files can be found at: www.people.vcu.edu/~elboone2/CSS.htm