Using R for Statistical Instruction: Getting Started
By Buddy Bilbrey, Lander University
jbilbrey@lander.edu
Advantages of R
- Open source (free)
- A 2016 KDnuggets survey ranked R as the top software among professionals
- Industry professionals use R because of its price and capabilities
- Extremely powerful, thanks to contributions from many different developers
- Runs stably on virtually any platform
- Performs more than statistics (data mining and other analytic functions)
- Gives students an extremely desirable skill
Disadvantages of R
- STEEP learning curve
- Uses typed commands instead of a GUI
- Newer versions (of R or its packages) sometimes force you to upgrade the software
- No warranty or formal help desk (you must rely on the web)
- Solutions can be hard to find because online examples are limited (this gets better every day)
Finding and Installing the Software
- https://www.r-project.org/
- Must choose a mirror site for downloading
- Each add-on package must be installed separately
- A throwback to 1990s command-line software
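Installing add-on packages from the R console can be sketched as follows. The helper `install_if_missing()` is a hypothetical name introduced only for illustration (it is not part of base R); `requireNamespace()` and `install.packages()` are standard base R functions.

```r
# Hypothetical helper (not part of base R): install only the packages
# that are not already available, and return their names invisibly.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
  have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  needed <- pkgs[!have]
  if (length(needed) > 0) {
    install.packages(needed)  # downloads from the selected CRAN mirror
  }
  invisible(needed)
}

# Example for class (commented out because it needs internet access):
# install_if_missing(c("Rcmdr", "qcc"))
# library(Rcmdr)
```

Checking first avoids re-downloading packages students already have on lab machines.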
Downloading R
https://cran.r-project.org/mirrors.html
- All mirror sites have identical copies
- This prevents any one server from becoming overloaded
Mirror Sites List by Country
From inside the R software: select a mirror site when installing new packages (R asks once per session unless a default mirror has been set)
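Choosing a mirror from inside R can be sketched with two base R options. The mirror URL below (`https://cloud.r-project.org`, CRAN's automatic redirecting mirror) is an assumption for illustration; any mirror from the list works the same way.

```r
# Option 1: pick a mirror interactively from a numbered list in the console
# chooseCRANmirror(graphics = FALSE)

# Option 2: set a default mirror for this session so R does not prompt
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm which repository install.packages() will use
getOption("repos")
```

Placing the `options(repos = ...)` line in a `.Rprofile` file makes the choice persist across sessions, which can save time in a computer lab.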
List of Mirror Sites by country
List of Packages Available for Installing
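The package lists can also be queried from the console. This is a minimal sketch using base R's `installed.packages()` and `available.packages()`; the second call needs internet access, so it is commented out.

```r
# Packages already installed on this computer (row names are package names)
installed <- rownames(installed.packages())
length(installed)          # how many packages are installed
"stats" %in% installed     # is a given package installed?

# Packages available from the selected CRAN mirror (requires internet access)
# avail <- available.packages()
# nrow(avail)              # number of packages on the mirror
# "Rcmdr" %in% rownames(avail)
```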
Some Available GUIs
- R-Commander (most common)
- Sciviews-K
- RKWard
- PMG
- Red-R
- R Analytic Flow
R-Commander (GUI)
- Must be installed separately as a package
- Includes the most common commands
- Does graphing, including 3D graphics
- Incomplete: some commands need to be run in R on the command line
- Other packages may need to be installed (qcc for quality control, DOE, etc.)
- Unavailable functions are greyed out automatically
NOTE: It is difficult to go from a GUI to typing commands on the command line
Opening R-Commander (case sensitive)
Type the following AFTER the Rcmdr package is installed in R:
> library(Rcmdr)
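Because R is case sensitive, `Rcmdr` must be typed with exactly that capitalization. A quick classroom demonstration of the same rule uses ordinary variable names; `Price` here is just an illustrative name.

```r
# R is case sensitive: Price and price are two different names
Price <- 10
exists("Price")   # TRUE
exists("price")   # FALSE, because no object named price was created
```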
R-Commander Examples
Importing Data into Rcmdr
- Can import from the clipboard (copy/paste)
- Can import from files
- .csv files work best for importing an entire file
- Excel files are sometimes difficult to import
Importing into Rcmdr with the Clipboard or a File
- Name the dataset for later reference
- Choose clipboard or .csv file
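The same imports can be done from the R command line. This sketch writes a small .csv file of made-up toy values (for illustration only) to a temporary location and reads it back with `read.csv()`; the clipboard and Excel lines are commented alternatives, and the `readxl` package and the file name `prices.xlsx` are assumptions.

```r
# Toy values for illustration only, written to a temporary .csv file
tmp <- tempfile(fileext = ".csv")
writeLines(c("Price,Qty",
             "9.1,10",
             "7.8,20",
             "7.2,30",
             "5.9,40",
             "5.1,50"), tmp)

# Read the .csv file into a data frame named priceData
priceData <- read.csv(tmp)
str(priceData)   # check the variable names and types

# Clipboard import on Windows (copy the cells in Excel first):
# priceData <- read.table("clipboard", header = TRUE, sep = "\t")
# On macOS: read.table(pipe("pbpaste"), header = TRUE, sep = "\t")

# Excel files via the add-on readxl package (hypothetical file name):
# priceData <- readxl::read_excel("prices.xlsx")
```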
Reference Books for Use with R Instruction
- R for Business Analytics by A. Ohri (recommended for beginners)
- Business Analytics for Managers by Wolfgang Jank
- An Introduction to Statistical Learning (ISLR): free .pdf book (advanced analytics) at www.StatLearning.com, which takes you to http://www-bcf.usc.edu/~gareth/ISL/
ISLR Online Book eBook .pdf download
ISLR Basic Commands 2.3.1
Type in a single column of data:
> x = c(1,6,2)   # stores the three values in variable x
> y = c(1,4,3)   # stores the three values in variable y
> x              # print values stored in x
[1] 1 6 2
> length(x)      # number of values in x
[1] 3
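A natural next step for students is element-wise arithmetic on the two vectors. This short sketch uses only base R; vectors added together should have the same length, otherwise R silently recycles the shorter one.

```r
x <- c(1, 6, 2)
y <- c(1, 4, 3)
x + y                    # element-wise sum: 2 10 5
length(x) == length(y)   # TRUE; equal lengths avoid silent recycling
ls()                     # list the objects defined so far
```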
ISLR Basic Commands 2.3.1 (cont.)
> x = matrix(data = c(1,2,3,4), nrow = 2, ncol = 2)   # create a table of data
> x
     [,1] [,2]
[1,]    1    3
[2,]    2    4
ISLR Basic Commands 2.3.1 (cont.)
> sqrt(x)   # take the square root of each value
         [,1]     [,2]
[1,] 1.000000 1.732051
[2,] 1.414214 2.000000
> x^2       # square each value
     [,1] [,2]
[1,]    1    9
[2,]    4   16
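The matrix output above shows that `matrix()` fills values column by column. This sketch contrasts that default with the base R argument `byrow = TRUE`, which fills row by row.

```r
# Default: values fill the first column, then the second
x <- matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2)
x[1, ]    # first row: 1 3

# byrow = TRUE fills the first row, then the second
xr <- matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE)
xr[1, ]   # first row: 1 2

# Element-wise operations such as ^ keep the matrix shape
x^2
```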
Other Common Commands
[Create a linear regression model* and store it in the variable priceModel]
> priceData <- read.csv(file.choose())              # choose the data file in a dialog box
> priceModel <- lm(Price ~ Qty, data = priceData)   # build the regression model
> summary(priceModel)                               # print the results
*Data came from “Business Analytics for Managers” by Wolfgang Jank.
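For a demonstration that does not need the dialog box or the Jank data set, the same regression can be run on a small data frame typed in directly. The values below are made-up toy numbers for illustration only; `coef()` and `predict()` are standard base R functions.

```r
# Toy data for illustration only (not the Jank data set)
priceData <- data.frame(
  Price = c(9.1, 7.8, 7.2, 5.9, 5.1),
  Qty   = c(10, 20, 30, 40, 50)
)

priceModel <- lm(Price ~ Qty, data = priceData)  # Price explained by Qty
summary(priceModel)   # coefficients, R-squared, p-values
coef(priceModel)      # intercept about 9.99, slope about -0.099

# Predicted Price for a new quantity
predict(priceModel, newdata = data.frame(Qty = 35))
```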
Experiences with Teaching Statistics in R
- Small, bite-sized examples (one-sample t-test, two-sample t-test, regression, etc.)
- With practice, students retain the commands well
- Students who start with the GUI find it difficult to switch to the command line
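Small, bite-sized examples like those described above can use data sets built into R, so students need no files. This sketch uses the standard `mtcars` data set; the choice of variables and the comparison value `mu = 20` are assumptions for illustration.

```r
# One-sample t-test: is mean fuel economy different from 20 mpg?
one <- t.test(mtcars$mpg, mu = 20)
one

# Two-sample (Welch) t-test: mpg for automatic (am = 0) vs manual (am = 1)
two <- t.test(mpg ~ am, data = mtcars)
two

# Simple linear regression: mpg explained by car weight
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)
```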