Arko Barman COSC 6335 Data Mining Fall 2014.  Free, open source statistical analysis software  Competitor to commercial softwares like MATLAB and SAS.

Slides:



Advertisements
Similar presentations
Summary Statistics/Simple Graphs in SAS/EXCEL/JMP.
Advertisements

R for Macroecology Aarhus University, Spring 2011.
PRE-SCHOOL QUANT WORKSHOP II R THROUGH EXCEL. NEW YORK TIMES INFOGRAPHICS GALARY The Jobless Rate for People Like You Home Prices in Selected Cities For.
Basics of Using R Xiao He 1. AGENDA 1.What is R? 2.Basic operations 3.Different types of data objects 4.Importing data 5.Basic data manipulation 2.
SPSS 1: An Introduction to the Statistical Package SPSS Suzie Cro MRC Clinical Trials Unit.
Basic R Programming for Life Science Undergraduate Students Introductory Workshop (Session 1) 1.
Carolina Environmental Program UNC Chapel Hill The Analysis Engine – A New Tool for Model Evaluation, Sensitivity and Uncertainty Analysis, and more… Alison.
An introduction to R: get familiar with R Guangxu Liu Bio7932.
732A44 Programming in R.  Self-studies of the course book  2 Lectures (1 in the beginning, 1 in the end)  Labs (computer). Compulsory submission of.
Introduction to R Part 2. Working Directory The working directory is where you are currently saving data in R. What is the current working directory?
Arko Barman with modification by C.F. Eick COSC 4335 Data Mining Spring 2015.
Sébastien Lê Agrocampus Rennes A very short introduction to “R” The “Rcmdr” package and its environment.
Piotr Wolski Introduction to R. Topics What is R? Sample session How to install R? Minimum you have to know to work in R Data objects in R and how to.
R Programming Yang, Yufei. Normal distribution.
R packages/libraries Data input/output Rachel Carroll Department of Public Health Sciences, MUSC Computing for Research I, Spring 2014.
Introduction to Programming in R Department of Statistical Sciences and Operations Research Computation Seminar Series Speaker: Edward Boone
Scientific Computing (w2) An Introduction to Scientific Computing workshop 2.
Scientific Computing (w1) R Computing Workshops An Introduction to Scientific Computing workshop 1.
Files: By the end of this class you should be able to: Prepare for EXAM 1. create an ASCII file describe the nature of an ASCII text Use and describe string.
Bioinformatics for biologists
R objects  All R entities exist as objects  They can all be operated on as data  We will cover:  Vectors  Factors  Lists  Data frames  Tables 
Introduction to CADStat. CADStat and R R is a powerful and free statistical package [
Lecture 11 Introduction to R and Accessing USGS Data from Web Services Jeffery S. Horsburgh Hydroinformatics Fall 2013 This work was funded by National.
Descriptive Statistics using R. Summary Commands An essential starting point with any set of data is to get an overview of what you are dealing with You.
Before the class starts: 1) login to a computer 2) start RStudio 3) download Intro.R from MyCourses 4) open Intro.R in Rstudio 5) Download “R in Action”
1-2 What is the Matlab environment? How can you create vectors ? What does the colon : operator do? How does the use of the built-in linspace function.
MIS2502: Data Analytics Introduction to Advanced Analytics and R.
Review > x[-c(1,4,6)] > Y[1:3,2:8] > island.data fishData$weight[1] > fishData[fishData$weight < 20 & fishData$condition.
16BIT IITR Data Collection Module If you have not already done so, download and install R from download.
Working with data in R 2 Fish 552: Lecture 3. Recommended Reading An Introduction to R (R Development Core Team) –
Introduction to R and Data Science Tools in the Microsoft Stack Jamey Johnston.
Introduction to R Dr. Satish Nargundkar. What is R? R is a free software environment for statistical computing and graphics. It compiles and runs on a.
Physics 114: Lecture 1 Overview of Class Intro to MATLAB
Training Call Centre Reports START
EMPA Statistical Analysis
Development Environment
Programming in R Intro, data and programming structures
Chapter 5: Enhancing Your Output with ODS
R programming language
Introduction to R Samal Dharmarathna.
IET 603 Minitab Chapter 12 Christopher Smith
Arko Barman COSC 6335 Data Mining Fall 2014
DATA MANAGEMENT MODULE: Getting Data Into and Out of R
Introduction to R Carolina Salge March 29, 2017.
Welcome to Math’s Tutorial Session-3 Data handling
Microsoft Office 2013 Coming to a PC near you!.
MATLAB DENC 2533 ECADD LAB 9.
DATA MANAGEMENT MODULE: Getting Data Into and Out of R
ECONOMETRICS ii – spring 2018
Lab 1 Introductions to R Sean Potter.
R Data Manipulation Bootstrapping
JavaScript Charting Library
Advanced Data Import & Export Jeff Henrikson
Use of Mathematics using Technology (Maltlab)
%SUBMIT R A SAS® Macro to Interface SAS and R
This is where R scripts will load
Microsoft Excel 2007 – Level 2
funCTIONs and Data Import/Export
CSCI N317 Computation for Scientific Applications Unit R
Topics Introduction to Value-returning Functions: Generating Random Numbers Writing Your Own Value-Returning Functions The math Module Storing Functions.
Amos Introduction In this tutorial, you will be briefly introduced to the student version of the SEM software known as Amos. You should download the current.
This is where R scripts will load
This is where R scripts will load
Data analysis with R and the tidyverse
Creating a dataset in R Instructor: Li, Han
Using R for Data Analysis and Data Visualization
MIS2502: Data Analytics Introduction to Advanced Analytics and R
R tutorial
Introduction to Python
Presentation transcript:

Arko Barman COSC 6335 Data Mining Fall 2014

 Free, open source statistical analysis software  Competitor to commercial softwares like MATLAB and SAS – none of which are free  Need to install only basic functionalities  Install packages as and when needed – somewhat like Python  Excellent visualization features!

 Link for installation:  Choose a mirror link to download and install  Latest version is R  Choose default options for installation  Don’t worry about customizing startup options!  Just click ‘Okay’, ‘Next’ and ‘Finish’!

Command line

 R is an interpreter  Case sensitive  Comment lines start with #  Variable names can be alphanumeric, can contain ‘.’ and ‘_’  Variable names must start with a letter or ‘.’  Names starting with ‘.’ cannot have a digit as the second character  To use an additional package, use library( ) e.g. library(matrixStats)  Package you are trying to use must be installed before using

 Variable declaration not needed (like MATLAB)  Variables are also called objects  Commands are separated by ‘;’ or newline  If a command is not complete at the end of a newline, prompt shows ‘+’  Use command q() to quit  For help on a command, use help( ) or ?  For examples using a command, use example( )

 setwd( )  source( ) – runs the R script  sink( ) – saves outputs to  sink() – restores output to console  ls() – prints list of objects in workspace  rm(,, … ) – removes objects from workspace

 Writing scripts - use Notepad/Notepad++ and save with extension.r OR use Rcmdr package  Installing a package - Packages>Install package(s)… choose the package you want to install

assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7)) x  c() concatenates vectors/numbers end to end x<- c(10.4, 5.6, 3.1, 6.4, 21.7) x y<-c(x,0,x) y v<-2*x+y+1 v min(v) max(y) range(x) cat(v)

v<-c(1:10) w<-1:10 x<-2*1:10 (Note operator precedence here) y<-seq(-4,7) y<-seq(-4,7,2)

x <- array(1:20, dim=c(4,5)) X i <- array(c(1:3,3:1), dim=c(3,2)) i x[i] <- 0 x xy <- x %o% y #(outer product of x and y) xy z <- aperm(xy, c(2,1)) (Permutations of an array xy) z

seq1<-seq(1:6) mat1<-matrix(seq1,2) mat1 mat2<-matrix(seq1,2,byrow=T) mat2  General use: matrix(data,nrow,ncol,byrow)  Notice that ncol is redundant and optional! Number of rows

 Efficient way to store and manipulate tables e.g. v1 = c(2, 3, 5) v2 = c("aa", "bb", "cc") v3 = c(TRUE, FALSE, TRUE) df = data.frame(v1, v2, v3) df

 A vector of categorical data e.g. > data = c(1,2,2,3,1,2,3,3,1,2,3,3,1) > fdata = factor(data) > fdata [1] Levels: 1 2 3

mydata <- read.table("c:/mydata.csv", header=TRUE, sep=",", row.names="id") Path to the file Note the use of / instead of \ Whether or not to treat the first row as header Delimiter (default – White space) Row names (optional) mydata, header = TRUE, row.names = ) Commonly used for CSV files

Download the file Notice that the data is in CSV format and there are no headers! mydata, header = FALSE)  You can also read xls files using read.xls which is a part of gdata package! install.packages(pkgs="gdata") library(gdata) data ) data

 Download file ‘sampledata1’ from TA homepage>Resources mydata,header=TRUE) mydata myvars<-c("x1","x3") OR myvars<-c(1,3) mySubset<-mydata[myvars] mySubset mySubset1<-mydata[c(-1,-2)] mySubset1

 Selecting observations mySubset2<-mydata[1:6,] mySubset2  Selecting observations based on variable values mySubset3 2),] mySubset3  To avoid using mydata$ prefix in columns attach(mydata) mySubset3 2),] mySubset3 detach(mydata) Be careful when using multiple datasets!!! Remember this if you use multiple datasets!!!

mySubset4<-mydata[sample(nrow(mydata), 3),] mySubset4 mydata1<-split(mydata,mydata$class) mydata1 Groups data of different classes together! mySubset5 2,select=c(x1,x2,class)) mySubset5 Most general form for manipulating data! Total number of observations Number of samples

 merge() - used for merging two or more data frames into a single frame  cbind() – adds new column(s) to an existing data frame  rbind() – adds new row(s) to an existing data frame

 Different types of plots – helpful for data visualization and statistical analysis  Basic plot Download ‘sampledata2’ from TA webpage>Resources mydata,header=TRUE) attach(mydata) plot(x1,x2)

 abline() – adds one or more straight lines to a plot  lm() – function to fit linear regression model abline(lm(x2~x1)) title('Regression of x2 on x1')

 Boxplot boxplot(x2~x3,data=mydata,main="x2 versus x3", xlab="x3",ylab="x2")

 Histogram z = rnorm(1000, mean=0,sd=1) hist (z) hist (z, breaks = 5) bins = seq (-5, 5, by = 0.5) hist (z, breaks = bins) Generate 1000 samples of a standard normal random variable

 scatterplot() – for 2D scatterplots  scatterplot3d() – for 3D scatterplots  plot3d() – for interactive 3D plots (rgl package)  scatter3d() – another interactive 3d plot (Rcmdr package)  dotchart() – for dot plots  lines() – draws one or more lines  pie() – for drawing pie chart  pie3d() – for drawing 3D exploded pie charts (plotrix package)

Questions?