MIS2502: Data Analytics ICA #7 Introduction to R and RStudio - Recap

Aaron Zhi Cheng

Agenda
How to run an R script?
Key functions in ICA #7
T-test

How to run an R script?
Step 1: Downloading files. Download the R script and any additional files (such as .csv data files) and store them in the same folder. Make sure that the file names are not changed.
Step 2: Opening R script. Open the R script in RStudio.
Step 3: Setting Working Directory. Set the working directory by going to the Session menu and selecting Set Working Directory/To Source File Location.
Step 4: Running R script. Run the R script by going to the Code menu and selecting Run Region/Run All.
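The menu steps above also have console equivalents. The following is a minimal sketch, assuming a hypothetical project folder (here a temporary directory stands in for it, and the script name in the comment is made up):

```r
# Hypothetical folder: replace tempdir() with the folder that holds
# the R script and its data files.
project_dir <- tempdir()

# setwd() changes the working directory and returns the previous one.
old_dir <- setwd(project_dir)

print(getwd())        # confirm where R is now looking for files
print(list.files())   # the script and .csv files should be listed here

# Running a whole script from the console would look like this
# (hypothetical file name, so it is left commented out):
# source("my_script.R")

# Restore the previous working directory.
setwd(old_dir)
```

If the data file does not appear in the list.files() output, the working directory is probably not the folder where the files were saved.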

Key functions used in ICA #7
Package related: install.packages(), require()
Reading data from a csv file: read.csv()
For descriptive statistics: summary(), describe(), describeBy()
t.test()
hist()
For generating output: sink(), pdf()

Package related functions
In the script, we have

    if (!require("psych")) {
      install.packages("psych")
      require("psych")
    }

if (!require("psych")): tries to load the psych package and checks whether that succeeded. If the package is already installed, it is loaded and nothing else happens. If the psych package is not yet installed, R downloads and installs it, and then loads it.
install.packages("psych"): download and install the psych package (once per R installation)
require("psych"): load the psych package (once per R session)
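The same install-if-missing pattern can be wrapped in a small helper. This is only a sketch: the function name load_or_install is hypothetical, and the usage line loads the built-in stats package so that nothing has to be downloaded.

```r
# Hypothetical helper: load a package, installing it first if needed.
load_or_install <- function(pkg) {
  # character.only = TRUE lets require() accept the name as a string variable.
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg)               # runs only when loading failed
    require(pkg, character.only = TRUE)
  }
  # Report whether the package ended up loaded.
  invisible(pkg %in% loadedNamespaces())
}

# "stats" ships with R, so this call never triggers an install.
loaded <- load_or_install("stats")
print(loaded)
```

Calling load_or_install("psych") would behave like the script's if block.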

Reading data from a csv file
In line 21, we have
    INPUT_FILENAME <- "NBA14Salaries.csv"
In line 40, we have
    dataSet <- read.csv(INPUT_FILENAME)
read.csv() reads data from a CSV file and creates a data frame called dataSet that stores the data table.
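To see read.csv() at work without the course data file, the sketch below writes a tiny made-up CSV to a temporary file and reads it back. The player names, the Position column name, and the salary values are all invented for illustration; only the Salary column name comes from the slides.

```r
# Write a small, made-up CSV file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Player,Position,Salary",
             "Player A,PG,1000000",
             "Player B,SF,2500000",
             "Player C,PG,4000000"),
           csv_path)

# read.csv() uses the first line as column names by default.
demo_data <- read.csv(csv_path)

str(demo_data)    # column names and data types
head(demo_data)   # first rows of the data frame
```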

Descriptive statistics – summary()
summary() presents summary statistics about a data set or an individual data field (column). The form of the results returned by summary() depends on the data type of the fields. In line 61, we have
    summary(dataSet$Salary)
This statement references the Salary column in the dataSet data frame. The result is printed in the console.

Descriptive statistics – summary()
summary() also accepts a column with character values or an entire data frame.
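The three uses of summary() can be compared side by side on a small made-up data frame. The values and the Position column name below are assumptions for illustration, not the course data.

```r
# Made-up data: four players, two positions.
demo_data <- data.frame(Position = c("PG", "SF", "PG", "SF"),
                        Salary = c(1, 2, 3, 6),
                        stringsAsFactors = FALSE)

# Numeric column: minimum, quartiles, median, mean, maximum.
num_summary <- summary(demo_data$Salary)
print(num_summary)

# Character column: only length, class, and mode are reported.
chr_summary <- summary(demo_data$Position)
print(chr_summary)

# Whole data frame: one summary per column.
all_summary <- summary(demo_data)
print(all_summary)
```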

Descriptive statistics – describe()
describe() is provided by the psych package and presents more summary statistics than summary(). In line 66, we have
    describe(dataSet$Salary)
The result is printed in the console.

Descriptive statistics – describeBy()
describeBy() is also provided by the psych package, but it presents summary statistics by a grouping variable. It is called in line 73.
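As a rough illustration of describe() versus describeBy(), the sketch below uses made-up data with an assumed Position grouping column. The psych calls run only if the package is installed. The tapply() line is a base R stand-in that shows the grouping idea and is not part of the course script.

```r
# Made-up data: two point guards and two small forwards.
demo_data <- data.frame(Salary = c(1, 3, 2, 6),
                        Position = c("PG", "PG", "SF", "SF"))

if (requireNamespace("psych", quietly = TRUE)) {
  # One row of statistics for the whole column.
  print(psych::describe(demo_data$Salary))
  # The same statistics, computed separately for each position.
  print(psych::describeBy(demo_data$Salary, demo_data$Position))
}

# Base R stand-in: mean salary per group.
group_means <- tapply(demo_data$Salary, demo_data$Position, mean)
print(group_means)
```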

t.test(): are the average salaries significantly different across positions?
In lines 87 & 93: Comparing point guards (PG) to small forwards (SF). In the output, we see that the alternative hypothesis (H1) is: true difference in means is not equal to 0. The null hypothesis (H0) is simply the opposite: there is no difference between the means.

t.test(): are the average salaries significantly different across positions?
In lines 87 & 93:
If p-value > 0.05, you fail to reject the null hypothesis. Therefore, there is insufficient evidence to conclude that the average salary is different for PG vs SF players.
If p-value ≤ 0.05, you reject the null hypothesis. Therefore, the average salary is statistically different for PG vs SF players.
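The decision rule can be written out in code. This sketch uses two small made-up salary vectors (in millions) rather than the course data, and the variable names are hypothetical. t.test() with two vectors runs a two-sided Welch two-sample test by default.

```r
# Made-up salaries (in millions) for two positions.
pg_salary <- c(2.1, 3.5, 4.0, 5.2, 6.8, 7.1)
sf_salary <- c(2.4, 3.1, 4.6, 5.0, 6.2, 7.9)

# Two-sided test of H0: the true difference in means is 0.
result <- t.test(pg_salary, sf_salary)
print(result)

# Apply the 0.05 threshold to the p-value.
p_value <- result$p.value
decision <- if (p_value > 0.05) {
  "Fail to reject H0: insufficient evidence of a difference in means"
} else {
  "Reject H0: the means are statistically different"
}
print(decision)
```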

Histogram
In lines 24-27:
    NUM_BREAKS <- 25
    HISTLABEL <- "Salary"
    HIST_BARCOLOR <- "lightgrey"
    HIST_TITLE <- "Histogram of NBA Player Salary Data 2013/2014"
These settings are then passed to hist():
    hist(dataSet$Salary, breaks=NUM_BREAKS, col=HIST_BARCOLOR, xlab=HISTLABEL)
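One detail worth knowing about the breaks argument: when it is a single number, hist() treats it as a suggestion and may pick a nearby number of bins that gives round break points. A small sketch with randomly generated values (not the salary data) shows how to inspect what hist() actually chose:

```r
set.seed(1)                                # make the random values repeatable
values <- rnorm(200, mean = 50, sd = 10)   # made-up data

# plot = FALSE returns the histogram object without drawing it.
h <- hist(values, breaks = 25, plot = FALSE)

print(length(h$counts))   # number of bins actually used
print(h$breaks)           # the bin boundaries hist() selected
```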

Save output to a file – sink()
In addition to output displayed in the console, we want our output to go to a file. This will make it easier to read later. In line 46:
    sink(OUTPUT_FILENAME, append=FALSE, split=TRUE)
This redirects the output to the file OUTPUT_FILENAME. We also instruct R to NOT append (append=FALSE), so it will overwrite the old file each time, and to also send the output to the screen (split=TRUE) so we can see it's doing what it should.
In line 96, we have sink() again, which stops R from writing any more output to the text output file.
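A minimal sketch of the open-then-close pattern, using a temporary file in place of OUTPUT_FILENAME (the out_file name is made up for illustration):

```r
# Temporary file standing in for the script's output file.
out_file <- tempfile(fileext = ".txt")

# Start redirecting printed output to the file, and keep showing it on screen.
sink(out_file, append = FALSE, split = TRUE)
print(summary(c(1, 2, 3, 4)))

# Calling sink() with no arguments stops the redirection.
sink()

# Read the file back to confirm the output was saved.
saved <- readLines(out_file)
print(saved)
```

Forgetting the closing sink() is a common slip: later console output keeps going to the file until the redirection is stopped.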

Save output to a file – pdf()
In the script, we have
    pdf(OUTPUT_HISTNAME)
    hist(dataSet$Salary, breaks=NUM_BREAKS, col=HIST_BARCOLOR, xlab=HISTLABEL, main=HIST_TITLE)
    dev.off()
This creates a PDF file named OUTPUT_HISTNAME, plots a histogram in the PDF file (using hist()), and closes the PDF file (using dev.off()).
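The same three-step sequence can be tried on made-up numbers. In this sketch a temporary file stands in for OUTPUT_HISTNAME, and the values and title are invented:

```r
# Temporary file standing in for the script's PDF output file.
pdf_file <- tempfile(fileext = ".pdf")

pdf(pdf_file)                            # open the PDF graphics device
hist(c(1, 2, 2, 3, 3, 3, 4, 4, 5, 9),    # made-up values
     breaks = 5, col = "lightgrey",
     xlab = "Salary", main = "Demo histogram")
dev.off()                                # close the device so the file is written

print(file.exists(pdf_file))
```

Until dev.off() is called, the PDF may be incomplete or unreadable.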

Final Words: Debugging in R
Anyone who starts programming in R will soon run into unexpected error messages in the console.

Debugging in R
Debugging is the art and science of fixing unexpected problems in your code.
Some strategies for debugging:
Figure out where the bug is
Try to find typos
Are variable names correct?
Are file names correct?
Did I set the working directory correctly?
Search for help (Google, Stack Overflow)
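Some of these checks can be done from the console. The sketch below is one possible approach, not part of the course script. It uses the data file name and data frame name mentioned in the slides.

```r
# File name and data frame name taken from the slides.
data_file <- "NBA14Salaries.csv"

# Is the file name correct, and is the working directory the right folder?
found <- file.exists(data_file)
if (!found) {
  message("Cannot find ", data_file, " in ", getwd())
}

# Is the variable name correct? exists() checks whether an object is defined.
has_data <- exists("dataSet")
print(has_data)
```

A FALSE from file.exists() usually points to a renamed file or a working directory that was not set to the source file location.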
