Introduction to R
Dr. Satish Nargundkar
What is R?
R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows, and macOS.
Advantages of R
  Free and open source
  No license restrictions
  4800 packages for specialized topics
  Runs on many operating systems
  Active user groups
Limitations of R
  R can consume all the available memory
  Use a 64-bit system rather than a 32-bit one to ease this problem
  String comparisons can be slow; numeric operations run at normal speed compared with other software
  R is slower for some matrix operations
Downloading R
  Windows (32/64 bit)
  R for Mac OS X
Downloading RStudio
Why get RStudio?
  Ease of use
  Datasets readily accessible for viewing
  Commands at your fingertips
  Variable names at your fingertips
  Multiple windows for a comprehensive view:
    Script
    Global Environment
    Console
    Home Screen
Screenshot for the script
  R script: upper left-hand corner of the RStudio screen
  Store all your commands in the script
Screenshot for Global Environment window
  Upper right-hand corner of the RStudio screen
  Shows all the datasets that have been imported or created through dataset transformations
Screenshot for Console
  R console: lower left-hand corner
  Runs your commands (press Enter)
Screenshot for Home
  Home: lower right-hand corner
  Lists the files in your working directory
Reading data into R
  Set the working directory with setwd("C:/…")
  Use forward slashes when setting the working directory
  Save the dataset as a CSV file in your working directory
  Ensure the dataset is in your working directory
  Use the Import Dataset option in RStudio
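The steps on this slide can also be run from a script. The following is a minimal sketch, not part of the original slides: the file name wine.csv and its columns are hypothetical, and a temporary folder stands in for a real working directory. Substitute your own path (with forward slashes) and file.

```r
# Hypothetical folder and file names, used only for illustration.
work_dir <- tempdir()          # stand-in for your own folder
old_dir <- setwd(work_dir)     # setwd() returns the previous directory

# Create a small CSV so the example runs on its own.
write.csv(data.frame(pH = c(3.0, 3.3, 3.2), quality = c(6, 5, 7)),
          "wine.csv", row.names = FALSE)

# Confirm the file is in the working directory, then read it.
file.exists("wine.csv")
wine <- read.csv("wine.csv")
str(wine)

setwd(old_dir)                 # restore the original working directory
```

Reading the file with read.csv() does the same job as the Import Dataset option, and keeping the call in a script makes the import repeatable.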
Common Commands 1
  Getting the list of variables
    names(datasetname)
  Getting an individual variable (data column)
    datasetname$variable
  Getting summary statistics
    summary(datasetname)
  Merging datasets
    totaldata <- merge(dataset1, dataset2, by = "common variable")
  Sorting data
    attach(trainwhite)
    sorteddata <- trainwhite[order(variable), ]
  Testing for missing values
    is.na(trainwhite)
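A small runnable sketch of these commands follows. The data frames chem and taste and their columns are made up for illustration; they are not the trainwhite dataset used in the slides. The sort uses the $ form instead of attach(), which is one common way to avoid name clashes.

```r
# Two small made-up data frames sharing the key column "id".
chem <- data.frame(id = 1:4, pH = c(3.2, 3.0, NA, 3.4))
taste <- data.frame(id = 4:1, quality = c(6, 7, 5, 6))

names(chem)      # list of variables
chem$pH          # one column
summary(chem)    # summary statistics for each column

# Merge on the common variable.
totaldata <- merge(chem, taste, by = "id")

# Sort by pH; rows with a missing pH are placed last by default.
sorteddata <- totaldata[order(totaldata$pH), ]

# TRUE wherever a value is missing.
is.na(totaldata)
```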
Common Commands 2
  Finding rows that have missing values
    dataset[!complete.cases(dataset), ]
  Regression
    regression1 <- lm(y ~ x1 + x2 + x3, data = dataset)
  Summary of regression
    summary(regression1)
  Coefficients of regression
    coefficients(regression1)
  Predicted values and residuals
    fitted(regression1)
    residuals(regression1)
  Two-way frequency table
    attach(trainwhite)
    freqtable <- table(pH, fixed_acidity)
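The commands on this slide can be tried on mtcars, a dataset that ships with R. This is a sketch under assumptions: mtcars and the chosen variables (mpg, wt, hp, cyl, gear) stand in for the slides' own dataset, and one value is set to missing purely for illustration.

```r
# mtcars ships with R; one value is set to NA for illustration.
cars <- mtcars
cars$hp[3] <- NA

# Rows that contain at least one missing value.
cars[!complete.cases(cars), ]

# Regress mpg on weight and horsepower.
# By default lm() drops rows with missing values.
regression1 <- lm(mpg ~ wt + hp, data = cars)
summary(regression1)
coefficients(regression1)
head(fitted(regression1))
head(residuals(regression1))

# Two-way frequency table of cylinders by gears.
freqtable <- table(cars$cyl, cars$gear)
freqtable
```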
Common Commands 3
  Making histograms
    hist(dataset$variable)
  Creating a graph
    attach(datasetname)
    plot(v2, v1)
    abline(lm(v1 ~ v2))
    title("Regression of v1 on v2")
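A runnable sketch of the plotting commands, again assuming the built-in mtcars dataset in place of the slides' data. The plots are written to a temporary PDF so the script runs anywhere; in RStudio, dropping the pdf() and dev.off() lines should show the plots in the Plots pane instead. Note that the response (mpg) goes on the vertical axis so that the fitted line from lm(mpg ~ wt) matches the scatter plot.

```r
# Send plots to a temporary PDF file (assumption: no interactive display).
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Histogram of a single variable.
hist(mtcars$mpg, main = "Histogram of mpg", xlab = "mpg")

# Scatter plot with the predictor on the x axis and the response on the y axis.
plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")
abline(lm(mpg ~ wt, data = mtcars))   # add the fitted regression line
title("Regression of mpg on wt")

dev.off()                             # close the file
```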
Best Practices in R
  Use RStudio to import datasets
  Convert the dataset into CSV format before importing
  Use online R communities for quick help; they are often more helpful than R's help pages
  Save scripts for common commands
  Make notes using ## for future reference
  Close other work before using R, since R can slow other programs down
Common Errors: Watch out
  Type variable names exactly as they appear (R is case sensitive)
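Case sensitivity applies to function names as well as variable names. A short sketch, using a made-up data frame called scores:

```r
# Made-up data frame with a capitalized column name.
scores <- data.frame(Age = c(21, 35, 48))

names(scores)   # works: names() is all lowercase

# Names() is not a base R function, so calling it raises an error.
result <- tryCatch(Names(scores),
                   error = function(e) "function not found")
result

# Column names are case sensitive too.
is.null(scores$age)   # TRUE: there is no column called "age"
scores$Age            # the actual column
```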