MIS2502: Data Analytics ICA #7 Introduction to R and RStudio - Recap

1 MIS2502: Data Analytics ICA #7 Introduction to R and RStudio - Recap
Aaron Zhi Cheng

2 Agenda
How to run an R script?
Key functions in ICA #7
t-test

3 How to run an R script?
Step 1: Downloading files. Download the R script and any additional files (such as .csv data files) and store them in the same folder. Make sure that the file names are not changed.
Step 2: Opening R script. Open the R script in RStudio.
Step 3: Setting Working Directory. Set the working directory by going to the Session menu and selecting Set Working Directory/To Source File Location.
Step 4: Running R script. Run the R script by going to the Code menu and selecting Run Region/Run All.
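Steps 3 and 4 can also be done from the R console instead of the menus. The following is a minimal sketch, assuming a hypothetical folder and script name (ica7_demo, demo_script.R) that are created on the fly so the example runs on its own. With the real ICA files, setwd() would point at the folder where you saved them.

```r
# Hypothetical folder and script, created only so this sketch is self-contained.
script_dir <- file.path(tempdir(), "ica7_demo")
dir.create(script_dir, showWarnings = FALSE)
writeLines("demo_result <- 1 + 1", file.path(script_dir, "demo_script.R"))

# Step 3 from the console: make the script's folder the working directory.
old_wd <- setwd(script_dir)

# Step 4 from the console: run every line of the script.
source("demo_script.R")

# Restore the previous working directory.
setwd(old_wd)
```

Because the working directory is the script's folder, the script can refer to its data files by name alone, which is why the files must sit in the same folder.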

4 Key functions used in ICA #7
Package related: install.packages(), require()
Reading data from a csv file: read.csv()
For descriptive statistics: summary(), describe(), describeBy()
t.test()
hist()
For generating output: sink(), pdf()

5 Package related functions
In the script, we have
if (!require("psych")) {
  install.packages("psych")
  require("psych")
}
require("psych") tries to load the psych package and reports whether it succeeded, so if (!require("psych")) effectively checks whether the package is already installed on your computer. If it is installed, the psych package is loaded. If it is not yet installed, R downloads and installs the package, and then loads it.
install.packages("psych"): download and install the psych package (once per R installation)
require("psych"): load the psych package (once per R session)
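The same install-if-missing pattern can be wrapped in a small function when a script needs several packages. This is only a sketch: ensure_package is a hypothetical helper name, not part of the ICA script, and it uses requireNamespace() and library() from base R in place of require().

```r
# Hypothetical helper (not in the ICA script): install a package only when
# it is missing, then load it for the current session.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)                 # needed once per R installation
  }
  library(pkg, character.only = TRUE)     # needed once per R session
  invisible(TRUE)
}

# "stats" ships with R, so this call loads it without downloading anything.
ensure_package("stats")
```

For the ICA script the call would be ensure_package("psych").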

7 Reading data from a csv file
In line 21, we have
INPUT_FILENAME <- "NBA14Salaries.csv"
In line 40, we have
dataSet <- read.csv(INPUT_FILENAME)
This reads data from a CSV file and creates a data frame called dataSet that stores the data table.
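To see what read.csv() returns without the course data file, here is a small sketch that writes a toy CSV to a temporary file and reads it back. The rows are invented, and the Player and Position column names are assumptions; only Salary is confirmed by the script.

```r
# Toy stand-in for NBA14Salaries.csv. All rows are invented for illustration.
toy_file <- tempfile(fileext = ".csv")
writeLines(c("Player,Position,Salary",
             "A,PG,1000000",
             "B,SF,2500000",
             "C,PG,4000000"),
           toy_file)

toyData <- read.csv(toy_file)   # the first line of the file becomes the column names

str(toyData)    # column names and data types
head(toyData)   # first rows of the data frame
```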

9 Descriptive statistics – summary()
summary() presents summary statistics about a data set or an individual data field (column). The form of the results returned by summary() depends on the data type of the fields.
In line 61, we have
summary(dataSet$Salary)
This statement references the Salary column in the dataSet data frame. The result is printed in the console.
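As a quick illustration of summary() on a numeric column, the sketch below uses five invented salary values (not the NBA data) and shows that the result can also be stored and queried by name.

```r
# Invented values, used only to show the shape of the output.
salary <- c(500000, 1200000, 2500000, 4000000, 20000000)

s <- summary(salary)
print(s)          # reports Min., 1st Qu., Median, Mean, 3rd Qu., Max.

s[["Median"]]     # a single statistic can be extracted by its name
```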

10 Descriptive statistics – summary()
summary() also accepts columns with character values, or an entire data frame.
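A short sketch of how the output differs by data type, using an invented four-row data frame (the Position column name is an assumption):

```r
# Invented data, for illustration only.
toy <- data.frame(Position = c("PG", "SF", "PG", "C"),
                  Salary   = c(1000000, 2500000, 4000000, 800000),
                  stringsAsFactors = FALSE)

summary(toy$Position)                    # character column: length, class, and mode
counts <- summary(factor(toy$Position))  # as a factor: number of rows per value
print(counts)

summary(toy)                             # whole data frame: one summary per column
```

Converting a character column to a factor is often the more useful view, because it counts how many rows fall into each category.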

11 Descriptive statistics – describe()
describe(), provided by the psych package, presents more summary statistics than summary().
In line 66, we have
describe(dataSet$Salary)
The result is printed in the console.
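The sketch below calls describe() on invented salary values when the psych package is available, and always computes a few of the same statistics with base R so it can run without psych installed. It is an illustration, not the script's own code.

```r
# Invented values, for illustration only.
salary <- c(500000, 1200000, 2500000, 4000000, 20000000)

# A few of the statistics describe() reports, computed with base R.
base_stats <- c(n      = length(salary),
                mean   = mean(salary),
                sd     = sd(salary),
                median = median(salary))
print(base_stats)

# describe() adds further columns such as trimmed mean, skew, and kurtosis.
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(salary))
}
```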

12 Descriptive statistics – describeBy()
describeBy() is also provided by the psych package, but it presents summary statistics by a grouping variable. It is called in line 73.
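To make "by a grouping variable" concrete, this sketch computes group means with base R's tapply() on invented data and, if psych is installed, prints the fuller describeBy() output for the same groups. The Position column name is an assumption.

```r
# Invented data, for illustration only.
toy <- data.frame(Position = c("PG", "PG", "SF", "SF"),
                  Salary   = c(1000000, 3000000, 2000000, 6000000))

# Base R: one statistic per group.
group_means <- tapply(toy$Salary, toy$Position, mean)
print(group_means)

# psych: the full describe() table, repeated once per group.
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describeBy(toy$Salary, group = toy$Position))
}
```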

13 t.test(): are the average salaries significantly different across positions?
In lines 87 & 93: comparing point guards (PG) to small forwards (SF).
We see that the alternative hypothesis (H1) is: the true difference in means is not equal to 0. The null hypothesis (H0) is simply the opposite: there is no difference between the means.

14 t.test(): are the average salaries significantly different across positions?
In lines 87 & 93:
If the p-value > 0.05, you fail to reject the null hypothesis. Therefore, there is insufficient evidence to conclude that the average salary is different for PG vs SF players.
If the p-value ≤ 0.05, you reject the null hypothesis. Therefore, the average salary is statistically different for PG vs SF players.
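The decision rule can be applied in code by reading the p-value from the object that t.test() returns. This is a sketch with invented salaries (in millions); the exact call used in lines 87 & 93 of the script is not shown in the slides, so the two-vector form here is an assumption.

```r
# Invented salaries in millions, for illustration only.
pg <- c(1.2, 2.5, 3.1, 4.8, 6.0)   # point guards
sf <- c(0.9, 2.2, 3.5, 5.1, 7.4)   # small forwards

# Two-sample t-test; by default the alternative is two-sided
# (true difference in means is not equal to 0).
result <- t.test(pg, sf)
print(result)

# Apply the 0.05 rule to the p-value.
if (result$p.value <= 0.05) {
  decision <- "Reject H0: the average salaries differ."
} else {
  decision <- "Fail to reject H0: insufficient evidence of a difference."
}
print(decision)
```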

15 Histogram
In lines 24-27:
NUM_BREAKS <- 25
HISTLABEL <- "Salary"
HIST_BARCOLOR <- "lightgrey"
HIST_TITLE <- "Histogram of NBA Player Salary Data 2013/2014"
These settings are then passed to hist():
hist(dataSet$Salary, breaks=NUM_BREAKS, col=HIST_BARCOLOR, xlab=HISTLABEL)
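A self-contained sketch of the same hist() call on simulated values follows. The data and the chart title are invented. hist() also returns an object describing the bins, which is handy for checking what was drawn.

```r
NUM_BREAKS    <- 25
HISTLABEL     <- "Salary"
HIST_BARCOLOR <- "lightgrey"

# Simulated right-skewed values standing in for the Salary column.
set.seed(1)
salary <- rexp(200, rate = 1 / 4000000)

h <- hist(salary, breaks = NUM_BREAKS, col = HIST_BARCOLOR,
          xlab = HISTLABEL, main = "Toy salary data")

# A single number for breaks is only a suggestion; R chooses round
# breakpoints, so the actual number of bins may differ from NUM_BREAKS.
length(h$counts)
sum(h$counts)      # every observation falls into exactly one bin
```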

17 Save output to a file – sink()
In addition to output displayed in the console, we want our output to go to a file. This will make it easier to read later.
In line 46:
sink(OUTPUT_FILENAME, append=FALSE, split=TRUE)
This redirects the output to the file OUTPUT_FILENAME. We also instruct R NOT to append (append=FALSE), so it overwrites the old file each time, and to also send the output to the screen (split=TRUE) so we can see it's doing what it should.
In line 96, we have sink() again, which stops R from writing any more output to the text output file.
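The open-and-close pairing of sink() can be tried in isolation. This sketch writes to a temporary file (a hypothetical stand-in for OUTPUT_FILENAME) and reads the file back to confirm that the printed output landed there.

```r
# Hypothetical output file, used in place of OUTPUT_FILENAME.
out_file <- tempfile(fileext = ".txt")

sink(out_file, append = FALSE, split = TRUE)  # start writing output to the file
print(summary(c(1, 2, 3, 4)))                 # goes to the file and the screen
sink()                                        # stop writing to the file

captured <- readLines(out_file)               # what ended up in the file
print(captured)
```

If a script stops with an error between the two calls, output keeps going to the file until sink() is called again, which is a common source of "missing" console output.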

18 Save output to a file – pdf()
In the script:
pdf(OUTPUT_HISTNAME)
hist(dataSet$Salary, breaks=NUM_BREAKS, col=HIST_BARCOLOR, xlab=HISTLABEL, main=HIST_TITLE)
dev.off()
This creates a PDF file with the name stored in OUTPUT_HISTNAME, plots a histogram in the PDF file (using hist()), and closes the PDF file (using dev.off()).
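The same three-step pattern (open the PDF, draw, close) is sketched below with a temporary file standing in for OUTPUT_HISTNAME and invented values standing in for the Salary column.

```r
# Hypothetical PDF path, used in place of OUTPUT_HISTNAME.
pdf_file <- tempfile(fileext = ".pdf")

pdf(pdf_file)                                   # open the PDF file for plotting
hist(c(1, 2, 2, 3, 3, 3, 4, 4, 5),              # invented values
     col = "lightgrey", xlab = "Value", main = "Toy histogram")
invisible(dev.off())                            # close the file so it is written

file.exists(pdf_file)                           # the PDF now exists on disk
```

Forgetting dev.off() typically leaves a PDF that cannot be opened, because the file is only finalized when the device is closed.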

19 Final Words: Debugging in R
Anyone who starts programming in R will soon run into issues like this:

20 Debugging in R
Debugging is the art and science of fixing unexpected problems in your code.
Some strategies for debugging:
Figure out where the bug is
Try to find typos: Are variable names correct? Are file names correct?
Did I set the working directory correctly?
Search for help (Google, Stack Overflow)
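Several of these checks can be run directly in the console. The sketch below uses a hypothetical helper, check_input, that reports whether a file can be found and which folder R is looking in. It is an illustration and not part of the ICA script.

```r
# Hypothetical helper: does the file exist where R is looking for it?
check_input <- function(path) {
  found <- file.exists(path)
  if (!found) {
    message("File not found: ", path,
            " (working directory is ", getwd(), ")")
  }
  found
}

getwd()                          # Did I set the working directory correctly?
check_input("NBA14Salaries.csv") # Is the file name correct and in that folder?
exists("dataSet")                # Is the variable name correct and defined?
```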

