MIS2502: Data Analytics Introduction to R and RStudio

Slides:



Advertisements
Similar presentations
Introduction to R Brody Sandel. Topics Approaching your analysis Basic structure of R Basic programming Plotting Spatial data.
Advertisements

R for Macroecology Aarhus University, Spring 2011.
R Language. What is R? Variables in R Summary of data Box plot Histogram Using Help in R.
MATLAB – What is it? Computing environment / programming language Tool for manipulating matrices Many applications, you just need to get some numbers in.
 Statistics package  Graphics package  Programming language  Can be used to share/reproduce analyses  Many new packages being created - can be downloaded.
Introduction to GTECH 201 Session 13. What is R? Statistics package A GNU project based on the S language Statistical environment Graphics package Programming.
How to Use the R Programming Language for Statistical Analyses Part I: An Introduction to R Jennifer Urbano Blackford, Ph.D. Department of Psychiatry Kennedy.
SPSS Statistical Package for the Social Sciences is a statistical analysis and data management software package. SPSS can take data from almost any type.
Introduction to SPSS Short Courses Last created (Feb, 2008) Kentaka Aruga.
Introduction to SPSS (For SPSS Version 16.0)
Introduction to MATLAB ENGR 1187 MATLAB 1. Programming In The Real World Programming is a powerful tool for solving problems in every day industry settings.
Basic R Programming for Life Science Undergraduate Students Introductory Workshop (Session 1) 1.
Applications Software. Applications software is designed to perform specific tasks. There are three main types of application software: Applications packages.
732A44 Programming in R.  Self-studies of the course book  2 Lectures (1 in the beginning, 1 in the end)  Labs (computer). Compulsory submission of.
Introduction to SPSS Edward A. Greenberg, PhD
Data, graphics, and programming in R 28.1, 30.1, Daily:10:00-12:45 & 13:45-16:30 EXCEPT WED 4 th 9:00-11:45 & 12:45-15:30 Teacher: Anna Kuparinen.
TUTORIAL 10: PROGRAMMING WITH JAVASCRIPT Session 2: What is JavaScript?
Piotr Wolski Introduction to R. Topics What is R? Sample session How to install R? Minimum you have to know to work in R Data objects in R and how to.
A Brief Introduction to R Programming Darren J. Fitzpatrick, PhD The Bioinformatics Support Team 27/08/2015.
Introduction to SPSS. Object of the class About the windows in SPSS The basics of managing data files The basic analysis in SPSS.
Outline Comparison of Excel and R R Coding Example – RStudio Environment – Getting Help – Enter Data – Calculate Mean – Basic Plots – Save a Coding Script.
Introduction to MATLAB ENGR 1181 MATLAB 1. Opening MATLAB  Students, please open MATLAB now.  CLICK on the shortcut icon → Alternatively, select… start/All.
Introduction to R Introductions What is R? RStudio Layout Summary Statistics Your First R Graph 17 September 2014 Sherubtse Training.
STAT 534: Statistical Computing Hari Narayanan
R objects  All R entities exist as objects  They can all be operated on as data  We will cover:  Vectors  Factors  Lists  Data frames  Tables 
Math 252: Math Modeling Eli Goldwyn Introduction to MATLAB.
Lecture 11 Introduction to R and Accessing USGS Data from Web Services Jeffery S. Horsburgh Hydroinformatics Fall 2013 This work was funded by National.
Descriptive Statistics using R. Summary Commands An essential starting point with any set of data is to get an overview of what you are dealing with You.
MIS2502: Data Analytics Introduction to Advanced Analytics and R.
16BIT IITR Data Collection Module If you have not already done so, download and install R from download.
Introduction to R Dr. Satish Nargundkar. What is R? R is a free software environment for statistical computing and graphics. It compiles and runs on a.
Introduction to the SPSS Interface
Block 1: Introduction to R
EMPA Statistical Analysis
R Brown-Bag Seminar 2.1 Topic: Introduction to R Presenter: Faith Musili ICRAF-Geoscience Lab.
ECE 1304 Introduction to Electrical and Computer Engineering
Lecture 2: Introduction to R
Programming in R Intro, data and programming structures
Introduction to R Samal Dharmarathna.
By Dr. Madhukar H. Dalvi Nagindas Khandwala college
Introduction to R Commander
DEPARTMENT OF COMPUTER SCIENCE
Introduction to R Studio
Engineering Innovation Center
MATLAB DENC 2533 ECADD LAB 9.
User Defined Functions
ECONOMETRICS ii – spring 2018
Lab 1 Introductions to R Sean Potter.
Introduction to R.
How to Run a DataOnDemand Report
Use of Mathematics using Technology (Maltlab)
Code is on the Website Outline Comparison of Excel and R
PHP.
T. Jumana Abu Shmais – AOU - Riyadh
Digital Image Processing
Communication and Coding Theory Lab(CS491)
MIS2502: Data Analytics Introduction to Advanced Analytics and R
Spreadsheets, Modelling & Databases
Stata Basic Course Lab 2.
MIS2502: Data Analytics ICA #7 Introduction to R and RStudio - Recap
Tutorial 10: Programming with javascript
R Course 1st Lecture.
Data analysis with R and the tidyverse
MIS2502: Data Analytics Advanced Analytics Using R
Using R for Data Analysis and Data Visualization
MIS2502: Data Analytics Introduction to Advanced Analytics and R
L L Line CSE 420 Computer Games Lecture #3 Introduction to Python.
Introduction to the SPSS Interface
Introduction to Excel 2007 Part 1: Basics and Descriptive Statistics Psych 209.
PYTHON - VARIABLES AND OPERATORS
Presentation transcript:

MIS2502: Data Analytics Introduction to R and RStudio Aaron Zhi Cheng http://community.mis.temple.edu/zcheng/ acheng@temple.edu Acknowledgement: David Schuff

Now we will start using R and RStudio heavily in class activities and assignments R/RStudio has become one of the dominant software environments for data analysis And has a large user community that contribute functionality Make sure you download both on your computer!

Software development platform and programming language Open source, free Many, many, many statistical add-on “packages” that perform data analysis Integrated Development Environment for R Nicer interface that makes R easier to use Requires R to run (The pretty face) (The base/engine) If you have both installed, you only need to interact with Rstudio Mostly, you do not need to touch R directly

(just like a command line) RStudio Interface Environment: info of your data History: Previous commands Console (just like a command line) Files Plots Packages Help Viewer It may have an additional window for R script(s) and data view if you have any of them open

The Basics: Calculations R will do math for you: Type commands into the console and it will give you an answer

The Basics: Functions sqrt(), log(), abs(), and exp() are functions. Functions accept parameters (in parentheses) and return a value

x, y, and z are variables that can be manipulated The Basics: Variables Variables are named containers for data The assignment operator in R is: <- or = Variable names can start with a letter or digits. Just not a number by itself. Examples: result, x1, 2b (not 2) R is case sensitive (i.e. Result is a different variable than result) <- and = do the same thing rm() removes the variable from memory x, y, and z are variables that can be manipulated

Basic Data Types Type Range Assign a Value Numeric Numbers Character x<-1 y<--2.5 Character Text strings name<-"Mark" color<-"red" Logical (Boolean) TRUE or FALSE female<-TRUE

Advanced Data Types Vectors Lists Data frames

Vectors of Values A vector is a sequence of data elements of the same basic type. > scores<-c(65,75,80,88,82,99,100,100,50) > scores [1] 65 75 80 88 82 99 100 100 50 > studentnum<-1:9 > studentnum [1] 1 2 3 4 5 6 7 8 9 > ones<-rep(1,4) > ones [1] 1 1 1 1 > names<-c("Nikita","Dexter","Sherlock") > names [1] "Nikita" "Dexter" "Sherlock" c() and rep() are functions

Indexing Vectors We use brackets [ ] to pick specific elements in the vector. In R, the index of the first element is 1 > scores [1] 65 75 80 88 82 99 100 100 50 > scores[1] [1] 65 > scores[2:3] [1] 75 80 > scores[c(1,4)] [1] 65 88

Simple statistics with R You can get descriptive statistics from a vector > scores [1] 65 75 80 88 82 99 100 100 50 > length(scores) [1] 9 > min(scores) [1] 50 > max(scores) [1] 100 > mean(scores) [1] 82.11111 > median(scores) [1] 82 > sd(scores) [1] 17.09857 > var(scores) [1] 292.3611 > summary(scores) Min. 1st Qu. Median Mean 3rd Qu. Max. 50.00 75.00 82.00 82.11 99.00 100.00 Again, length(), min(), max(), mean(), median(), sd(), var() and summary() are all functions. These functions accept vectors as parameter.

Lists A list can contain many different types of elements inside it > studentnum<-1:9 > names<-c("Nikita","Dexter","Sherlock") > list1<-list(c(2,5,3), 21.3, "hello", studentnum, names) > list1 [[1]] [1] 2 5 3 [[2]] [1] 21.3 [[3]] [1] "hello" [[4]] [1] 1 2 3 4 5 6 7 8 9 [[5]] [1] "Nikita" "Dexter" "Sherlock"

Indexing Lists We use double brackets [[ ]] to pick specific elements in the list. > list1[[1]] [1] 2 5 3 > list1[[5]] [1] "Nikita" "Dexter" "Sherlock“ > list1[[5]][1] [1] "Nikita"

Data Frames A data frame is a type of variable used for storing data tables is a special type of list where every element of the list has same length (i.e. data frame is a “rectangular” list) > BMI<-data.frame( + gender = c("Male","Male","Female"), + height = c(152,171.5,165), + weight = c(81,93,78), + Age = c(42,38,26) + ) > BMI gender height weight Age 1 Male 152.0 81 42 2 Male 171.5 93 38 3 Female 165.0 78 26 > nrow(BMI) [1] 3 > ncol(BMI) [1] 4

Identify elements of a data frame To retrieving cell values > BMI[1,3] [1] 81 > BMI[1,] gender height weight Age 1 Male 152 81 42 > BMI[,3] [1] 81 93 78 More ways to retrieve columns as vectors > BMI[[2]] [1] 152.0 171.5 165.0 > BMI$height > BMI[["height“]]

Packages Packages (add-ons) are collections of R functions and code in a well-defined format. To install a package: install.packages("pysch") For every new R session (i.e., every time you re-open Rstudio), you must load the package before it can be used library(psych) or require(psych) Each package only needs to be installed once

Creating and opening a .R file The R script is where you keep a record of your work in R/RStudio. To create a .R file Click “File|New File|R Script” in the menu To save the .R file click “File|Save” To open an existing .R file click “File|Open File” to browse for the .R file

Working directory The working directory is where Rstudio will look first for scripts and files Keeping everything in a self contained directory helps organize code and analyses Check you current working directory with getwd()

To change working directory Use the Session | Set Working Directory Menu If you already have an .R file open, you can select “Set Working Directory>To Source File Location”.

Reading data from a file Usually you won’t type in data manually, you’ll get it from a file Example: 2009 Baseball Statistics (http://www2.stetson.edu/~jrasp/data.htm) reads data from a CSV file and creates a data frame called teamData that store the data table. reference the HomeRuns column in the data frame using TeamData$HomeRuns

Looking for differences across groups: The setup We want to know if National League (NL) teams scored more runs than American League (AL) Teams And if that difference is statistically significant To do this, we need a package that will do this analysis In this case, it’s the “psych” package Downloads and installs the package (once per R installation)

Looking for differences across groups: The analysis (t-test) Descriptive statistics, broken up by group (League) Results of t-test for differences in Runs by League)

Histogram hist() first parameter – data values hist(teamData$BattingAvg, xlab="Batting Average", main="Histogram: Batting Average") hist() first parameter – data values xlab parameter – label for x axis main parameter - sets title for chart

Plotting data plot() first parameter – x data values plot(teamData$BattingAvg,teamData$WinningPct, xlab="Batting Average", ylab="Winning Percentage", main="Do Teams With Better Batting Averages Win More?") plot() first parameter – x data values second parameter – y data values xlab parameter – label for x axis ylab parameter – label for y axis main parameter - sets title for chart

Running this analysis as a script Use the Code | Run Region | Run All Menu Commands can be entered one at a time, but usually they are all put into a single file that can be saved and run over and over again.

Getting help general help help about function mean() help.start() general help help(mean) help about function mean() ?mean same. Help about function mean() example(mean) show an example of function mean() help.search("regression") get help on a specific topic such as regression.

Online Tutorials If you’d like to know more about R, check these out: Quick-R (http://www.statmethods.net/index.html) R Tutorial (http://www.r-tutor.com/r-introduction) Learn R Programing (https://www.tutorialspoint.com/r/index.htm) Programming with R (https://swcarpentry.github.io/r-novice-inflammation/ There is also an interactive tutorial to learn R basics, highly recommended! (http://tryr.codeschool.com/)