Unit- 3 R for Data Analysis.


R Language
R is a programming language and software environment for statistical analysis, graphical representation, and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team.

Windows Installation
You can download the Windows installer version of R, R-3.2.2 for Windows (32/64 bit), and save it in a local directory. It is a Windows installer (.exe) with a name of the form "R-version-win.exe"; double-click it and run it with the default settings. On 32-bit Windows it installs the 32-bit version; on 64-bit Windows it installs both the 32-bit and 64-bit versions. After installation, the program can be found at "R\R-3.2.2\bin\i386\Rgui.exe" under the Windows Program Files directory. Clicking this icon brings up the R GUI, which is the R console used for R programming.

Linux Installation
R is available as a binary for many versions of Linux at the location R Binaries. The installation instructions vary between flavors of Linux and are listed under each Linux version at that location. If you are in a hurry, you can use the yum command to install R as follows:
$ yum install R
The above command installs the core functionality of R along with the standard packages.

Basic Syntax: First Hello World Program
> a <- " Hello World"
> print(a)
[1] " Hello World"

Comments
A single-line comment is written using # at the beginning of the statement, as follows:
# This is my First Program
R does not support multi-line comments.

Data Types
In a program we use variables to store various kinds of information. Variables are reserved memory locations that store values, so creating a variable reserves some space in memory. In R, variables are not declared with a data type. Instead, a variable is assigned an R object, and the data type of the R object becomes the data type of the variable. There are many types of R objects. The frequently used ones are:
Vectors
Lists
Matrices
Arrays
Factors
Data Frames
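As a minimal sketch of the point that a variable takes its type from the object assigned to it, the same name can be rebound to objects of different classes. The variable names here are illustrative only and do not come from the slides.

```r
# The same name can hold objects of different classes over time.
x <- 42.5              # numeric
class_1 <- class(x)

x <- "forty-two"       # character
class_2 <- class(x)

x <- TRUE              # logical
class_3 <- class(x)

print(c(class_1, class_2, class_3))
```

No declaration is needed at any point; class() simply reports the class of whatever object the name currently refers to.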

Vector Object
The simplest of these objects is the vector, and there are six data types of these atomic vectors, also termed the six classes of vectors. The other R objects are built upon the atomic vectors.
Logical
> v <- TRUE
> class(v)
[1] "logical"
Numeric
> v <- 77.5
> class(v)
[1] "numeric"

Integer
> v <- 4L
> class(v)
[1] "integer"
Complex
> v <- 3.5-8i
> class(v)
[1] "complex"

Character
> v <- "yes"
> class(v)
[1] "character"
> v <- 'yes'
> class(v)
[1] "character"
Raw
> v <- charToRaw("Hello")
> v
[1] 48 65 6c 6c 6f
> class(v)
[1] "raw"

When you want to create a vector with more than one element, use the c() function, which combines the elements into a vector.
> apple <- c('red','green','yellow')
> apple
[1] "red"    "green"  "yellow"
> print(apple)
[1] "red"    "green"  "yellow"
> print(class(apple))
[1] "character"

List
A list is an R object that can contain many different types of elements, such as vectors, functions, and even another list.
> list1 <- list(c(1,2,5),32.3,sin,"shweta")
> list1
[[1]]
[1] 1 2 5
[[2]]
[1] 32.3
[[3]]
function (x)  .Primitive("sin")
[[4]]
[1] "shweta"
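A short sketch of how the elements of such a mixed list can be pulled back out. It rebuilds the same list as the slide; the names first_vector, sub_list and sine_of_zero are made up for illustration.

```r
# Build the same kind of mixed list as in the slide.
list1 <- list(c(1, 2, 5), 32.3, sin, "shweta")

first_vector <- list1[[1]]      # [[ ]] extracts the element itself
sub_list     <- list1[1]        # [ ] returns a list of length one
sine_of_zero <- list1[[3]](0)   # the stored function can be called directly

print(class(first_vector))
print(class(sub_list))
print(sine_of_zero)
```

The difference between [[ ]] and [ ] matters in practice: only the former gives back the numeric vector that arithmetic can be applied to.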

Questions
What are the different data types available in R?
What is the difference between the numeric and integer vector classes?
What is the difference between cat and print?
What is the significance of a list?
What are the different classes of vectors?

Matrix
Syntax: matrix(data, nrow, ncol, byrow, dimnames)
By default the data fill the matrix column by column.
> M = matrix(c('a','a','a','b','b','b'),nrow=2,ncol=3)
> M
     [,1] [,2] [,3]
[1,] "a"  "a"  "b"
[2,] "a"  "b"  "b"
> M = matrix(c('a','a','a','b','b','b'),nrow=3,ncol=2)
> M
     [,1] [,2]
[1,] "a"  "b"
[2,] "a"  "b"
[3,] "a"  "b"

> M = matrix(c(3:14),nrow=4,byrow=TRUE)
> M
     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    6    7    8
[3,]    9   10   11
[4,]   12   13   14
> M = matrix(c(3:14),nrow=4,byrow=FALSE)
> M
     [,1] [,2] [,3]
[1,]    3    7   11
[2,]    4    8   12
[3,]    5    9   13
[4,]    6   10   14
To access the matrix values:
> M[1,3]
[1] 11
> M[,3]
[1] 11 12 13 14
> M[3,]
[1]  5  9 13

> rowname = c("r1","r2","r3","r4")
> colname = c("c1","c2","c3")
> M = matrix(c(3:14),nrow=4,byrow=TRUE,dimnames=list(rowname,colname))
> M
   c1 c2 c3
r1  3  4  5
r2  6  7  8
r3  9 10 11
r4 12 13 14

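Array
Arrays store data in more than two dimensions.
Syntax: array(data, dim)
> v1 = c(1,2,3)
> v2 = c(4,5,6,7,8,9)
> A = array(c(v1,v2),dim=c(3,3,2))
> A
, , 1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
, , 2
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
The nine data values are recycled to fill the second 3x3 matrix.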
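Elements of an array are addressed with one index per dimension. The sketch below reuses v1, v2 and A from the slide; element and second_slice are illustrative names.

```r
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6, 7, 8, 9)
A <- array(c(v1, v2), dim = c(3, 3, 2))

element      <- A[2, 3, 1]   # row 2, column 3 of the first matrix
second_slice <- A[, , 2]     # the whole second 3x3 matrix

print(dim(A))
print(element)
print(second_slice)
```

Leaving an index empty, as in A[, , 2], selects everything along that dimension, the same convention used for matrices.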

Question
Create two 2-D arrays.
Assign variable1 as 4,5,6,7 (using the <- operator).
Assign variable2 as “My”,“Name”,“is”,“shweta” (using the = operator).
Assign variable3 as TRUE,1 (using the -> operator).
Display the output as: Variable1 is ……… variable2 is …………… & variable3 is ………………. The cursor should then move to a new line.
Find out the class of each variable.

Factor
Factors are data objects used to categorize data and store it as levels. They can store both strings and integers. They are useful for columns that have a limited number of unique values, such as "Male", "Female" or TRUE, FALSE. They are useful in data analysis for statistical modeling.
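To make "stored as levels" concrete, here is a small sketch with made-up gender values: the factor keeps the distinct categories once, plus an integer code per observation.

```r
# Illustrative data; not taken from the slides.
gender <- c("Male", "Female", "Female", "Male", "Female")
gender_factor <- factor(gender)

print(levels(gender_factor))       # the distinct categories, sorted
print(as.integer(gender_factor))   # the level code behind each value
print(table(gender_factor))        # number of observations per level
```

Because the levels are fixed, functions such as table() can count categories directly, which is what makes factors convenient in statistical modeling.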

Variable Names
Variable Name          Validity   Reason
var_name2.             valid      Has letters, numbers, dot and underscore
var_name%              invalid    Has the character '%'. Only dot (.) and underscore are allowed.
2var_name              invalid    Starts with a number
.var_name, var.name    valid      Can start with a dot (.), but the dot (.) should not be followed by a number
.2var_name             invalid    The starting dot is followed by a number, making it invalid
_var_name              invalid    Starts with _, which is not valid
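One way to check these rules in R itself is base R's make.names(), which rewrites a string into a syntactically valid name. In this sketch, a name that comes back unchanged is treated as valid; the vector names candidates and is_valid are illustrative.

```r
# Candidate names from the list above.
candidates <- c("var_name2.", "var_name%", "2var_name",
                ".var_name", "var.name", ".2var_name", "_var_name")

# make.names() returns a syntactically valid version of each string;
# a name that is returned unchanged was already valid.
is_valid <- candidates == make.names(candidates)

print(data.frame(name = candidates, valid = is_valid))
```

Note that make.names() also alters reserved words such as if, so this check is slightly stricter than the character rules alone.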

Variable Assignment
> var.1 <- c(4,5,7,9)
> var.2 = c("Hello","we","r","learning R")
> c(TRUE,1) -> var.3
> print(var.1)
[1] 4 5 7 9
> cat("var 1 is", var.1, "\n")
var 1 is 4 5 7 9
> cat("var 2 is", var.2, "\n")
var 2 is Hello we r learning R
> cat("var 3 is", var.3, "\n")
var 3 is 1 1

> class(var.1)
[1] "numeric"
> class(var.2)
[1] "character"
> class(var.3)
[1] "numeric"
To list all the variables currently available in the workspace, use the ls() function.
> print(ls())
[1] "a"     "apple" "list1" "v"     "var.1" "var.2" "var.3"

Operators: Arithmetic Operators
> a <- c(1,2,3)
> b <- c(4,5,6)
> a+b
[1] 5 7 9
> a-b
[1] -3 -3 -3
> a*b
[1]  4 10 18
> a/b
[1] 0.25 0.40 0.50
> a^b
[1]   1  32 729
> a%%b
[1] 1 2 3
> b%a
Error: unexpected input in "b%a"
> b%%a
[1] 0 1 0

Relational Operators
> a>b
[1] FALSE FALSE FALSE
> a<b
[1] TRUE TRUE TRUE
> a==b
[1] FALSE FALSE FALSE
> a<=b
[1] TRUE TRUE TRUE
> a!=b
[1] TRUE TRUE TRUE

Miscellaneous Operators (: and %in%)
> c = 5:18
> c
 [1]  5  6  7  8  9 10 11 12 13 14 15 16 17 18
> v1 <- 5
> v2 <- 16
> t = 1:10
> print(v1%in%t)
[1] TRUE
> print(v2%in%t)
[1] FALSE

Decision Making (if-else)
> x <- 40L
> if(is.integer(x)){
+ print("x is integer")}
[1] "x is integer"
> x <- c("what","is","R")
> if("R"%in%x){
+ print("R is in x")
+ }else{
+ print("R is not in x")}
[1] "R is in x"
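The same construct extends to more than two branches with else if, and base R's ifelse() applies a condition to every element of a vector. The sketch below is an illustration only; classify and values are hypothetical names, not part of the slides.

```r
# Hypothetical helper: classify a single number with an else-if chain.
classify <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

print(classify(-3))

# ifelse() evaluates the condition element by element over a vector.
values <- c(-2, 0, 5)
labels <- ifelse(values >= 0, "non-negative", "negative")
print(labels)
```

A plain if expects a single TRUE or FALSE, so ifelse() is the usual choice when the test involves a whole vector.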

Functions
> z <- 3.5-8i
> Re(z)
[1] 3.5
> Im(z)
[1] -8
> Mod(z)
[1] 8.732125
> Conj(z)
[1] 3.5+8i
> is.complex(z)
[1] TRUE

> is.numeric(z)
[1] FALSE
> as.numeric(z)
[1] 3.5
Warning message:
imaginary parts discarded in coercion
> as.complex(z)
[1] 3.5-8i
> floor(5.98)
[1] 5
> ceiling(5.9)
[1] 6
> ceiling(5.1)
[1] 6
> floor(5.18)
[1] 5

> trunc(5.4)
[1] 5
> trunc(-5.4)
[1] -5
> signif(12345678,6)
[1] 12345700
> signif(12345678,5)
[1] 12346000
> log(10)
[1] 2.302585
> sin(pi)
[1] 1.224647e-16
> pi
[1] 3.141593
> sin(pi/2)
[1] 1

> seq(3,8)
[1] 3 4 5 6 7 8
> 3:8
[1] 3 4 5 6 7 8
> mean(3:6)
[1] 4.5
> sum(4,5)
[1] 9
> sum(4:8)
[1] 30
> new <- function(a)
+ {for(i in 1:a){
+ b <- i^2
+ print(b)}}
> new(4)
[1] 1
[1] 4
[1] 9
[1] 16

> new <- function(a,b,c){
+ result <- (a*b+c)
+ print(result)
+ }
> new(2,3,2)
[1] 8
> new(a=3,b=5,c=2)
[1] 17
+ print(a)
+ print(b)
+ print(c)

Factor
> data <- c("East","West","East","North","North","East")
> print(is.factor(data))
[1] FALSE
> Factor_data = factor(data)
> Factor_data
[1] East  West  East  North North East
Levels: East North West
> is.factor(Factor_data)
[1] TRUE

We can generate factor levels by using the gl() function.
Syntax: gl(n, k, labels)
> v <- gl(4,4,labels=c("East","West","North","South"))
> v
 [1] East  East  East  East  West  West  West  West  North North North North
[13] South South South South
Levels: East West North South

Data Frame
Data frames are tabular data objects. Unlike a matrix, each column of a data frame can contain a different mode of data.
> empdata <- data.frame(
+ emp_id=c(1:5),
+ emp_name=c("Shweta","Sonal","Shipra","Manisha","Varsha"))
> empdata
  emp_id emp_name
1      1   Shweta
2      2    Sonal
3      3   Shipra
4      4  Manisha
5      5   Varsha

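> height = c(132,166,123,145)
> weight = c(48,50,64,44)
> gender = c("Female","Male","Male","Female")
> data <- data.frame(height,weight,gender)
> data
  height weight gender
1    132     48 Female
2    166     50   Male
3    123     64   Male
4    145     44 Female
> is.factor(data$gender)
[1] TRUE
Note: this result holds for R versions before 4.0.0, such as the R 3.2.2 used here. From R 4.0.0 onward, data.frame() no longer converts character columns to factors by default, so the same call returns FALSE.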
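Passing stringsAsFactors explicitly makes the outcome independent of the default in the installed R version. A minimal sketch using the same columns as the slide:

```r
height <- c(132, 166, 123, 145)
weight <- c(48, 50, 64, 44)
gender <- c("Female", "Male", "Male", "Female")

# Request the character-to-factor conversion explicitly.
data <- data.frame(height, weight, gender, stringsAsFactors = TRUE)

print(is.factor(data$gender))
print(levels(data$gender))
```

Setting the argument to FALSE instead keeps gender as a plain character column.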

Question: Data Frame
Create student data as a data frame with the following fields:
Student_Rollno
Student_Name
Student_Gender
Student_marksinDS

To check the structure of a data frame, use str(emp). To get a summary, use summary(emp). Both function names are lowercase; R is case-sensitive.
> emp <- data.frame(emp_id=c(1:3),emp_name=c("Shweta","Gargi","Sumit"),emp_salary=c(300,200,400))
> emp
  emp_id emp_name emp_salary
1      1   Shweta        300
2      2    Gargi        200
3      3    Sumit        400
> str(emp)
'data.frame': 3 obs. of 3 variables:
 $ emp_id    : int 1 2 3
 $ emp_name  : Factor w/ 3 levels "Gargi","Shweta",..: 2 1 3
 $ emp_salary: num 300 200 400
> emp[1:2,]
  emp_id emp_name emp_salary
1      1   Shweta        300
2      2    Gargi        200
> emp[2:3,]
  emp_id emp_name emp_salary
2      2    Gargi        200
3      3    Sumit        400

> emp[c(2,3),c(2,3)]
  emp_name emp_salary
2    Gargi        200
3    Sumit        400
> emp$emp_dept <- c("CSE","ECE","Medical")
> emp
  emp_id emp_name emp_salary emp_dept
1      1   Shweta        300      CSE
2      2    Gargi        200      ECE
3      3    Sumit        400  Medical

> emp.new <- data.frame(emp_id=34,emp_name="dddd",emp_salary=2222,emp_dept="dd")
> rbind(emp,emp.new)
  emp_id emp_name emp_salary emp_dept
1      1   Shweta        300      CSE
2      2    Gargi        200      ECE
3      3    Sumit        400  Medical
4     34     dddd       2222       dd

Function
> new <- function(a)
+ {for(i in 1:a){
+ b <- i^2
+ print(b)}}
> new(4)
[1] 1
[1] 4
[1] 9
[1] 16

Question
Create a function to display the multiplication table of any number:
with an argument
without an argument
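One possible approach, offered as a sketch rather than the expected answer: a single function with a default argument covers both cases. The name print_table, the default of 2 and the range 1 to 10 are all assumptions made for illustration.

```r
# Hypothetical function; n defaults to 2 so it also runs without an argument.
print_table <- function(n = 2) {
  for (i in 1:10) {
    cat(n, "x", i, "=", n * i, "\n")
  }
  invisible(n * 1:10)   # return the products without printing them again
}

print_table(7)   # called with an argument
print_table()    # called without an argument, uses the default
```

cat() is used instead of print() so that each line of the table appears without the [1] index prefix.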

To get the current working directory:
> print(getwd())
Create a CSV file in that directory with the columns id, name, salary, dept, and fill in the data with the values separated by commas.
To read the data from the CSV file:
> data <- read.csv("input.csv")

> print(getwd())
[1] "C:/Users/SHWETA MONGIA/Documents"
> data <- read.csv("input.csv")
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
  incomplete final line found by readTableHeader on 'input.csv'
> data
  id name salary
1  1 Rick  40000
2  2 Gary 200000
3  3 Ryan 300000
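The "incomplete final line" warning appears when the file's last line does not end with a newline, which is common for hand-edited files. As a sketch, writing the file from R with write.csv() avoids it. The example writes to a temporary directory instead of the working directory, and the names emp_sample and csv_path are illustrative.

```r
# Sample rows matching the output shown in the slide.
emp_sample <- data.frame(id = 1:3,
                         name = c("Rick", "Gary", "Ryan"),
                         salary = c(40000, 200000, 300000))

# Write the file to a temporary location, then read it back.
csv_path <- file.path(tempdir(), "input.csv")
write.csv(emp_sample, csv_path, row.names = FALSE)

data <- read.csv(csv_path)
print(data)
print(is.data.frame(data))
```

row.names = FALSE keeps R from adding an extra unnamed column of row numbers to the file.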

> is.data.frame(data)
[1] TRUE
> ncol(data)
[1] 3
> nrow(data)
[1] 3
> sal <- max(data$salary)
> sal
[1] 300000

> subset(data, salary==max(salary))
  id name salary
3  3 Ryan 300000
> subset(data, salary>100000 & id>2)
  id name salary
3  3 Ryan 300000
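The same row selections can also be written with logical indexing, which is what subset() does behind the scenes. This sketch rebuilds the three rows shown earlier so that it runs on its own; top_paid and high_paid are illustrative names.

```r
data <- data.frame(id = 1:3,
                   name = c("Rick", "Gary", "Ryan"),
                   salary = c(40000, 200000, 300000))

# Rows where the condition is TRUE are kept; the empty column index keeps all columns.
top_paid  <- data[data$salary == max(data$salary), ]
high_paid <- data[data$salary > 100000 & data$id > 2, ]

print(top_paid)
print(high_paid)
```

Inside subset() the column names can be used bare, whereas with [ ] each column has to be qualified with data$.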