Introduction to R.

Slides:



Advertisements
Similar presentations
Statistical Techniques I EXST7005 Start here Measures of Dispersion.
Advertisements

Statistical Methods Lynne Stokes Department of Statistical Science Lecture 7: Introduction to SAS Programming Language.
Measures of Variation Sample range Sample variance Sample standard deviation Sample interquartile range.
Example of multivariate data What is R? R is available as Free Software under the terms of the Free Software Foundation'sFree Software Foundation GNU General.
Edpsy 511 Homework 1: Due 2/6.
Business Statistics BU305 Chapter 3 Descriptive Stats: Numerical Methods.
End Show Introduction to Electronic Spreadsheets Unit 3.
Digital Text and Data Processing Introduction to R.
What is R By: Wase Siddiqui. Introduction R is a programming language which is used for statistical computing and graphics. “R is a language and environment.
Chapter 4 MATLAB Programming Combining Loops and Logic Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Spreadsheets Objective 6.02
Mathcad Variable Names A string of characters (including numbers and some “special” characters (e.g. #, %, _, and a few more) Cannot start with a number.
Summary statistics Using a single value to summarize some characteristic of a dataset. For example, the arithmetic mean (or average) is a summary statistic.
Data, graphics, and programming in R 28.1, 30.1, Daily:10:00-12:45 & 13:45-16:30 EXCEPT WED 4 th 9:00-11:45 & 12:45-15:30 Teacher: Anna Kuparinen.
JMB Chapter 1EGR Spring 2010 Slide 1 Probability and Statistics for Engineers  Descriptive Statistics  Measures of Central Tendency  Measures.
Piotr Wolski Introduction to R. Topics What is R? Sample session How to install R? Minimum you have to know to work in R Data objects in R and how to.
Introduction to R. Why use R Its FREE!!! And powerful, fairly widely used, lots of online posts about it Uses S -> an object oriented programing language.
What does C store? >>A = [1 2 3] >>B = [1 1] >>[C,D]=meshgrid(A,B) c) a) d) b)
Chapter 2 Analysis using R. Few Tips for R Commands included here CANNOT ALWAYS be copied and pasted directly without alteration. –One major reason is.
Numerical Measures of Variability
Numerical descriptors BPS chapter 2 © 2006 W.H. Freeman and Company.
Lecture 26: Reusable Methods: Enviable Sloth. Creating Function M-files User defined functions are stored as M- files To use them, they must be in the.
Introduction to R Carol Bult The Jackson Laboratory Functional Genomics (BMB550) Spring 2011.
STAT 534: Statistical Computing Hari Narayanan
© 2015 by Wade Rogers Introduction to R Cytomics Workshop December, 2015.
MDH Chapter 1EGR 252 Fall 2015 Slide 1 Probability and Statistics for Engineers  Descriptive Statistics  Measures of Central Tendency  Measures of Variability.
Variability Introduction to Statistics Chapter 4 Jan 22, 2009 Class #4.
1 An Introduction to R © 2009 Dan Nettleton. 2 Preliminaries Throughout these slides, red text indicates text that is typed at the R prompt or text that.
R objects  All R entities exist as objects  They can all be operated on as data  We will cover:  Vectors  Factors  Lists  Data frames  Tables 
Python Programing: An Introduction to Computer Science
Math 252: Math Modeling Eli Goldwyn Introduction to MATLAB.
R Roger Barlow HEP Computing seminar 21 st February 2008.
Basics in R part 2. Variable types in R Common variable types: Numeric - numeric value: 3, 5.9, Logical - logical value: TRUE or FALSE (1 or 0)
Lecture 11 Introduction to R and Accessing USGS Data from Web Services Jeffery S. Horsburgh Hydroinformatics Fall 2013 This work was funded by National.
Descriptive Statistics using R. Summary Commands An essential starting point with any set of data is to get an overview of what you are dealing with You.
Introduction Dispersion 1 Central Tendency alone does not explain the observations fully as it does reveal the degree of spread or variability of individual.
Pinellas County Schools
16BIT IITR Data Collection Module If you have not already done so, download and install R from download.
MAT 135 Introductory Statistics and Data Analysis Adjunct Instructor
Introduction to python programming
Lesson 3: Using Formulas
Excel AVERAGEIF Function
Descriptive Statistics
Programming in R Intro, data and programming structures
R programming language
Introduction to R Samal Dharmarathna.
Measures of dispersion
Data Types and Structures
Introduction to MATLAB
Algorithms Problem: Write pseudocode for a program that keeps asking the user to input integers until the user enters zero, and then determines and outputs.
EGR 115 Introduction to Computing for Engineers
Introduction to C Topics Compilation Using the gcc Compiler
R Programming Language
Central Tendency and Variability
Description of Data (Summary and Variability measures)
Review- vector analyses
Introduction to Python
Introduction to Python
I/O in C Lecture 6 Winter Quarter Engineering H192 Winter 2005
Matlab tutorial course
Web DB Programming: PHP
Communication and Coding Theory Lab(CS491)
Spreadsheets 2 Explain advanced spreadsheet concepts and functions
Spreadsheets Objective 6.02
Programming For Big Data
R Course 1st Lecture.
CHAPTER 2: Basic Summary Statistics
Data analysis with R and the tidyverse
Spreadsheets Objective 6.02
Introduction to Computer Science
Presentation transcript:

Introduction to R

R Suite of software facilities for data manipulation, calculation and graphical display. Effective data handling and storage facility, A suite of operators for calculations on arrays, large, coherent, integrated collection of intermediate tools for data analysis Graphical facilities for data analysis and display Simple and effective programming language (called ‘S’)

R Environment within which many classical and modern statistical techniques have been implemented. A few of these are built into the base R environment, but many are supplied as packages . To do geostatistical analysis and mapping, many recognize R+gstat as the only complete and fully-operational package.

Variable Assignment We assign values to variables with the assignment operator "=". Just typing the variable by itself at the prompt will print out the value. We should note that another form of assignment operator "<-" is also in use. > x = 1  > x  [1] 1

Functions R functions are invoked by its name, then followed by the parenthesis, and zero or more arguments. The following apply the function c to combine three numeric values into a vector. > c(1, 2, 3)  [1] 1 2 3 TC_g.kg = Min. 1st Qu. Median Mean 3rd Qu. Max. 1.327 7.800 11.730 33.900 22.760 523.300 Log(TC_g.kg) = Min. 1st Qu. Median Mean 3rd Qu. Max. 0.2831 2.0540 2.4620 2.7300 3.1250 6.2600

Comments All text after the pound sign "#" within the same line is considered a comment. > 1 + 1 # this is a comment [1] 2 Q1-Q3 = encompass 50% of all the data (this is 50% around the median – which remember is the midpoint of the data). The central 50% is not influenced by any of the extreme values so will give you a better estimate of the variability you may expect. TC_g.kg = Min. 1st Qu. Median Mean 3rd Qu. Max. 1.327 7.800 11.730 33.900 22.760 523.300 Log(TC_g.kg) = Min. 1st Qu. Median Mean 3rd Qu. Max. 0.2831 2.0540 2.4620 2.7300 3.1250 6.2600

Basic data Types Numeric Integer Complex Logical Character

Data Types: Numeric Decimal values are called numerics in R. It is the default computational data type. If we assign a decimal value to a variable x as follows: > x = 10.5       # assign a decimal value  > x              # print the value of x  [1] 10.5  > class(x)       # print the class name of x  [1] "numeric"

Data Types: Numeric Furthermore, even if we assign an integer to a variable k, it is still being saved as a numeric value. > k = 1  > k              # print the value of k  [1] 1  > class(k)       # print the class name of k  [1] "numeric"

Data Type: Integer In order to create an integer variable in R, we invoke the as.integer function. We can be assured that y is indeed an integer by applying the is.integer function. > y = as.integer(3)  > y              # print the value of y  [1] 3  > class(y)       # print the class name of y  [1] "integer"  > is.integer(y)  # is y an integer?  [1] TRUE

Data Type: Integer Incidentally, we can coerce a numeric value into an integer with the same as.integer function. > as.integer(3.14)    # coerce a numeric value  [1] 3

Data Types: Logical A logical value is often created via comparison between variables. > x = 1; y = 2   # sample values  > z = x > y      # is x larger than y?  > z              # print the logical value  [1] FALSE  > class(z)       # print the class name of z  [1] "logical" Transformations may include: Log, square root, power, arc-sine, box-cox (this runs through the variety of available transformations and allows you to select).

Data Type:Character A character object is used to represent string values in R. We convert objects into character values with the as.character() function: > x = as.character(3.14)  > x              # print the character string  [1] "3.14"  > class(x)       # print the class name of x  [1] "character" Two character values can be concatenated with the paste function: > fname = "Joe"; lname ="Smith"  > paste(fname, lname)  [1] "Joe Smith"

Vectors A vector is a sequence of data elements of the same basic type. Members in a vector are officially called components or members. Here is a vector containing three numeric values 2, 3 and 5. > c(2, 3, 5) [1] 2 3 5 And here is a vector of logical values. > c(TRUE, FALSE, TRUE, FALSE, FALSE) [1] TRUE FALSE TRUE FALSE FALSE A vector can contain character strings. > c("aa", "bb", "cc", "dd", "ee") [1] "aa" "bb" "cc" "dd" "ee" Incidentally, the number of members in a vector is given by the length function. > length(c("aa", "bb", "cc", "dd", "ee")) [1] 5

Vectors Extract a vector from a matrix obs=values[,4] # store last column in a variable

Input/Output Read data from a file Write data on a file setwd("C:\\Users\\morti\\Desktop") # set working directory values=read.csv("random.txt", header = TRUE, sep = "") # read from txt file Write data on a file write("RESULTS", file="data.txt")

Statistical Descriptors Calculate the statistics of this file: mean, median, standard deviation, variance, quartiles (1 and 3) mean(a) median(a) sd(a) var(a) quantile(a,prob=seq(0,1,0.25))

Statistical Descriptors obs=values[,4] # store last column in a variable print (obs) # set working directory mean_obs= mean(obs) # calculate mean of the observations median_obs=median(obs) # calculate the median of the observations variance=var(obs) # calculate variance st_dev=sqrt(var) # calculate standard deviation range_obs= range(obs) # calculate range range_res=abs(max(obs)-min(obs)) # calculate range q25=quantile(obs, c(.25)) # calculate quantile 25

Normal Distribution in R pnorm(84, mean=72, sd=15.2, lower.tail=TRUE)