MIS2502: Data Analytics ICA #7 Introduction to R and RStudio - Recap


MIS2502: Data Analytics ICA #7 Introduction to R and RStudio - Recap Aaron Zhi Cheng http://community.mis.temple.edu/zcheng/ acheng@temple.edu

Agenda
- How to run an R script?
- Key functions in ICA #7
- T-test

How to run an R script?
Step 1: Downloading files. Download the R script and any additional files (such as .csv data files) and store them in the same folder. Make sure that the file names are not changed.
Step 2: Opening the R script. Open the R script in RStudio.
Step 3: Setting the working directory. Go to the Session menu and select Set Working Directory > To Source File Location.
Step 4: Running the R script. Go to the Code menu and select Run Region > Run All.
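The menu actions in Steps 3 and 4 have console equivalents. The following is a minimal sketch, assuming a throwaway folder and a hypothetical one-line script named demo_script.R (neither is part of ICA #7):

```r
# Hypothetical stand-in for the folder that holds the downloaded script and data.
script_dir <- tempdir()

# Step 3 equivalent: setwd() changes the working directory and
# returns the previous one, so it can be restored afterwards.
old_dir <- setwd(script_dir)

# Create a tiny script so the sketch is self-contained.
writeLines("x <- 1 + 1", "demo_script.R")

# Relative file names are now resolved against the working directory.
script_found <- file.exists("demo_script.R")

# Step 4 equivalent: source() runs every line of the script.
source("demo_script.R")
print(x)

# Restore the original working directory.
setwd(old_dir)
```

With a real assignment, the path passed to setwd() would be the folder where the script and its .csv files were saved.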

Key functions used in ICA #7
- Package related: install.packages(), require()
- Reading data from a csv file: read.csv()
- For descriptive statistics: summary(), describe(), describeBy(), t.test(), hist()
- For generating output: sink(), pdf()

Package related functions
In lines 29-31 of the script, we have:
    if (!require("psych")) {
      install.packages("psych")
      require("psych")
    }
require("psych") tries to load the psych package and reports whether it succeeded. If the package is already installed on your computer, it is loaded. If it is not yet installed, R downloads and installs it, and then loads it.
- install.packages("psych"): download and install the psych package (once per R installation)
- require("psych"): load the psych package (once per R session)
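The same load-or-install pattern can be wrapped in a small helper. This is only a sketch: load_or_install is a hypothetical name, not something from the ICA #7 script, and it is demonstrated on the stats package (which ships with R) so that nothing is downloaded.

```r
# Hypothetical helper: load a package, installing it first only if loading fails.
load_or_install <- function(pkg) {
  # character.only = TRUE tells require() that pkg holds the package name as a string.
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg)
    require(pkg, character.only = TRUE)
  }
  # Report whether the package is now attached to the search path.
  paste0("package:", pkg) %in% search()
}

# stats is part of every R installation, so no install step is triggered here.
ok <- load_or_install("stats")
print(ok)
```

Calling load_or_install("psych") would follow the same path as lines 29-31 of the script.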


Reading data from a csv file
In line 21, we have:
    INPUT_FILENAME <- "NBA14Salaries.csv"
In line 40, we have:
    dataSet <- read.csv(INPUT_FILENAME)
This reads data from the CSV file and creates a data frame called dataSet that stores the data table.
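To see what read.csv() produces without the course data file, the sketch below writes a tiny CSV to a temporary location and reads it back. The rows and the Player and Position column names are made up for illustration; only the Salary column name comes from the slides.

```r
# Write a small, made-up CSV file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Player,Position,Salary",
             "A,PG,1000000",
             "B,SF,2500000",
             "C,PG,4000000"),
           csv_path)

# read.csv() treats the first line as column names and returns a data frame.
dataSet <- read.csv(csv_path)

# str() shows the column names and the data type R assigned to each column.
str(dataSet)
```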


Descriptive statistics – summary()
summary() presents summary statistics about a data set or an individual data field (column). The form of the results returned by summary() depends on the data type of the fields.
In line 61, we have:
    summary(dataSet$Salary)
This statement references the Salary column in the dataSet data frame. The result is printed in the console.

Descriptive statistics – summary()
summary() also accepts a column with character values or an entire data frame.
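The three uses of summary() described above can be tried on a small data frame. The values below are made up, and the Position column name is an assumption; only Salary is named in the slides.

```r
# Made-up data; stringsAsFactors = FALSE keeps Position as character values.
dataSet <- data.frame(
  Position = c("PG", "SF", "PG", "SF", "PG"),
  Salary   = c(1, 2, 3, 4, 10),
  stringsAsFactors = FALSE
)

# Numeric column: minimum, quartiles, median, mean and maximum.
salary_summary <- summary(dataSet$Salary)
print(salary_summary)

# Character column: the output takes a different form than for numbers.
print(summary(dataSet$Position))

# Entire data frame: one block of output per column.
print(summary(dataSet))
```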

Descriptive statistics – describe()
describe() is provided by the psych package and presents more summary statistics than summary().
In line 66, we have:
    describe(dataSet$Salary)
The result is printed in the console.

Descriptive statistics – describeBy()
describeBy() is also provided by the psych package, but it presents summary statistics by a grouping variable. The script calls it in line 73.
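A hedged sketch of both psych functions on made-up data follows. The Position column name is an assumption, and the psych calls only run if the package is installed. The final tapply() line is a base R stand-in that computes per-group means only; it is not what describeBy() does internally.

```r
# Made-up data; only the Salary column name comes from the slides.
dataSet <- data.frame(
  Position = c("PG", "PG", "SF", "SF"),
  Salary   = c(1, 3, 2, 6)
)

if (requireNamespace("psych", quietly = TRUE)) {
  # Overall statistics for one column.
  print(psych::describe(dataSet$Salary))
  # The same statistics, computed separately for each value of the grouping variable.
  print(psych::describeBy(dataSet$Salary, group = dataSet$Position))
} else {
  message("psych is not installed; skipping describe() and describeBy()")
}

# Base R stand-in: mean salary per position.
group_means <- tapply(dataSet$Salary, dataSet$Position, mean)
print(group_means)
```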

t.test(): are the average salaries significantly different across positions?
In lines 87 & 93: comparing point guards (PG) to small forwards (SF).
In the output, we see that the alternative hypothesis (H1) is: true difference in means is not equal to 0. The null hypothesis (H0) is simply the opposite: there is no difference between the means.

t.test(): are the average salaries significantly different across positions?
In lines 87 & 93:
- If p-value > 0.05, you fail to reject the null hypothesis. Therefore, there is insufficient evidence to conclude that the average salary is different for PG vs SF players.
- If p-value ≤ 0.05, you reject the null hypothesis. Therefore, the average salary is statistically different for PG vs SF players.
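The decision rule can be applied in code by reading the p-value from the object that t.test() returns. The sketch below uses made-up salary values (in millions), and the way the two groups are built is an assumption; the slides do not show how the script subsets the data.

```r
# Made-up salaries (in millions) for two positions.
pg <- c(1.2, 3.4, 2.2, 5.1, 0.9, 4.0)
sf <- c(2.1, 6.3, 3.8, 7.2, 1.5, 5.5)

# With two vectors, t.test() runs a two-sided, two-sample test by default.
result <- t.test(pg, sf)
print(result)

# Apply the 0.05 rule from the slide to the reported p-value.
decision <- if (result$p.value > 0.05) {
  "fail to reject H0: insufficient evidence of a difference in means"
} else {
  "reject H0: the means are statistically different"
}
print(decision)
```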

Histogram
In lines 24-27:
    NUM_BREAKS <- 25
    HISTLABEL <- "Salary"
    HIST_BARCOLOR <- "lightgrey"
    HIST_TITLE <- "Histogram of NBA Player Salary Data 2013/2014"
These constants are then passed to hist():
    hist(dataSet$Salary, breaks=NUM_BREAKS, col=HIST_BARCOLOR, xlab=HISTLABEL)
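The sketch below calls hist() with the same arguments on randomly generated values that stand in for the salary column. The data and the title are made up. The pdf(NULL) line is only there so the example draws nothing to disk when run outside RStudio; it can be dropped in an interactive session.

```r
NUM_BREAKS    <- 25
HISTLABEL     <- "Salary"
HIST_BARCOLOR <- "lightgrey"
HIST_TITLE    <- "Histogram of made-up salary values"

# Made-up, right-skewed values standing in for dataSet$Salary.
set.seed(1)
salary <- rexp(200, rate = 1 / 4e6)

# Open a graphics device that creates no file.
pdf(NULL)

# hist() draws the plot and also returns the bin boundaries and counts.
h <- hist(salary, breaks = NUM_BREAKS, col = HIST_BARCOLOR,
          xlab = HISTLABEL, main = HIST_TITLE)
invisible(dev.off())

# Number of bars actually drawn.
print(length(h$counts))
```

When breaks is a single number, R treats it as a suggestion and picks round bin boundaries, so the number of bars may not be exactly 25.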


Save output to a file – sink()
In addition to output displayed in the console, we want our output to go to a file. This will make it easier to read later.
In line 46:
    sink(OUTPUT_FILENAME, append=FALSE, split=TRUE)
This redirects the output to the file OUTPUT_FILENAME. We also instruct R NOT to append (append=FALSE), so it overwrites the old file each time, and to also send the output to the screen (split=TRUE) so we can see it’s doing what it should.
In line 96, we have sink() again, which stops R from writing any more output to the text output file.
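A minimal sketch of the sink() open and close pair, writing to a temporary file rather than the script's OUTPUT_FILENAME (whose value the slides do not show):

```r
# Temporary file standing in for OUTPUT_FILENAME.
out_file <- tempfile(fileext = ".txt")

# Start redirecting console output: overwrite the file, and keep echoing to the screen.
sink(out_file, append = FALSE, split = TRUE)

# Anything printed between the two sink() calls is written to the file.
print(summary(c(1, 2, 3, 4, 10)))

# Calling sink() with no arguments stops the redirection.
sink()

# Read the file back to confirm the output was captured.
captured <- readLines(out_file)
print(captured)
```

If the closing sink() is forgotten, later output keeps going to the file, which is a common source of confusion when a script stops partway through.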

Save output to a file – pdf()
In lines 111-113:
    pdf(OUTPUT_HISTNAME)
    hist(dataSet$Salary, breaks=NUM_BREAKS, col=HIST_BARCOLOR, xlab=HISTLABEL, main=HIST_TITLE)
    dev.off()
This creates a PDF file with the name stored in OUTPUT_HISTNAME, plots a histogram in the PDF file (using hist()), and closes the PDF file (using dev.off()).
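The same three-step pattern can be tried with a temporary file in place of OUTPUT_HISTNAME. The data values and the title below are made up for illustration.

```r
# Temporary file standing in for OUTPUT_HISTNAME.
pdf_path <- tempfile(fileext = ".pdf")

# Made-up values standing in for dataSet$Salary.
set.seed(42)
salary <- rexp(100, rate = 1 / 4e6)

# 1. Open the PDF device: plots now go to the file instead of the screen.
pdf(pdf_path)

# 2. Draw the histogram into the file.
hist(salary, breaks = 25, col = "lightgrey",
     xlab = "Salary", main = "Histogram of made-up salary values")

# 3. Close the device; the file is only complete after this call.
invisible(dev.off())

# Confirm that a non-empty file was written.
pdf_created <- file.exists(pdf_path) && file.size(pdf_path) > 0
print(pdf_created)
```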

Final Words: Debugging in R
Anyone who starts programming in R will soon run into error messages in the console.

Debugging in R
Debugging is the art and science of fixing unexpected problems in your code.
Some strategies for debugging:
- Figure out where the bug is
- Try to find typos
  - Are variable names correct?
  - Are file names correct?
  - Did I set the working directory correctly?
- Search for help (Google, Stack Overflow)
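Several of these checks can be run directly in the console. The sketch below is one possible set of checks, not a prescribed procedure; it reuses the NBA14Salaries.csv file name from the script and creates a small made-up dataSet so that it runs on its own.

```r
# Did I set the working directory correctly?
wd <- getwd()
print(wd)

# Are file names correct? file.exists() looks in the working directory.
input_file <- "NBA14Salaries.csv"
found <- file.exists(input_file)
if (!found) {
  message("Cannot find ", input_file, " in ", wd)
  # List the CSV files that are actually there, to spot a typo or a renamed file.
  print(list.files(wd, pattern = "\\.csv$"))
}

# Are variable names correct? R is case-sensitive, so check the exact spelling.
dataSet <- data.frame(Salary = c(1, 2, 3))  # made-up stand-in
has_dataSet <- exists("dataSet")
print(has_dataSet)
```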