Why R? Free Powerful (add-on packages) Online help from statistical community Code-based (can build programs) Publication-quality graphics.


Why R? Free; powerful (add-on packages); online help from the statistical community; code-based (can build programs); publication-quality graphics.

Why not? Time to learn code. Very simple statistics may be faster with “point-and-click” software (e.g. Statistica, JMP).

Why generalized linear models (GLMs)? Most ecological data FAIL these two assumptions of parametric statistics: (1) variance is independent of the mean (“homoscedasticity”); (2) data are normally distributed.

Taylor's power law: Variance = a * Mean^b. Most ecological data have 1 < b < 2. [Plot: variance against mean]
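
As a rough illustration of how the exponent b could be estimated, the power law becomes a straight line after taking logs of both sides, so a linear regression of log(variance) on log(mean) recovers b as the slope. The group means and the values a = 2 and b = 1.5 below are made up for the sketch and do not come from the workshop data.

```r
# Hypothetical group means (e.g. mean counts at six sites); made-up values.
group_mean <- c(1, 2, 5, 10, 20, 50)

# Assumed power-law parameters, used only to generate exact variances.
a <- 2
b <- 1.5
group_var <- a * group_mean^b

# Taking logs turns the power law into a straight line:
#   log(variance) = log(a) + b * log(mean)
fit <- lm(log(group_var) ~ log(group_mean))

a_hat <- unname(exp(coef(fit)[1]))  # back-transformed intercept estimates a
b_hat <- unname(coef(fit)[2])       # slope estimates b
print(c(a = a_hat, b = b_hat))
```

With real data the points scatter around the line, and a slope between 1 and 2 means the variance rises with the mean, which is the pattern that breaks the homoscedasticity assumption.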

Many types of ecological data are expected to be non-normal. Count data are expected to be Poisson (examples: population size, species richness). Binary (0,1) data are expected to be binomial (examples: survivorship, species presence).

Workshop in R & GLMs
Session 1: Basic commands + linear models
Session 2: Testing parametric assumptions
Session 3: How generalized linear models work
Session 4: Model simplification and overdispersion

Exercise
1. Open R. “>” is the command prompt.
2. Write:
x <- "hello"
x
3. What do the arrow keys do? And the “End” key?
Ready!

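Exercise
x <- 5
y <- 1
x+y; x*y; x/y; x^y
sqrt(x); log(x); exp(x)
Careful! Capitalization matters: Y and y are different. Spaces around an operator do not matter: x<-5 is the same as x <- 5. Do not put a space inside the arrow, though: x < - 5 asks whether x is less than -5. “;” means a new command follows.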

Vectors
x <- c(8,2,5,9)
“c” means combine.

Vectors
x <- rep(0,4)          # 0,0,0,0
x <- 1:4               # 1,2,3,4
x <- seq(1,7, by=2)    # 1,3,5,7
Create a vector called “test” containing 0,0,0,0,2,4,6,8,10 using all of the commands c, rep, seq:
test <- c(rep(0,4), seq(2,10,by=2))

Vectors
Select an element of your vector (x = 1,3,5,7):
x[2]           # 3
x[c(1,3)]      # 1,5
x[2:4]         # 3,5,7
Change an element of your vector (x = 1,3,5,7):
x[1] <- 9; x   # 9,3,5,7

Matrices
Dog <- c(1,4,6,8)            # a vector
Cat <- c(2,3,5,7)
Animals <- cbind(Dog, Cat)   # a matrix with columns Dog and Cat
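
A short sketch of how the resulting matrix might be inspected and indexed. The object `by_row` is a name introduced here for illustration; it shows `rbind`, the row-wise counterpart of `cbind`.

```r
Dog <- c(1, 4, 6, 8)
Cat <- c(2, 3, 5, 7)

# cbind() binds the vectors together as columns.
Animals <- cbind(Dog, Cat)

dim(Animals)       # number of rows, then number of columns
Animals[2, ]       # second row, all columns
Animals[, "Cat"]   # the Cat column, selected by name

# rbind() binds the same vectors together as rows instead.
by_row <- rbind(Dog, Cat)
dim(by_row)
```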

Logical operators
x <- 5; y <- 6
x > y     # false
x < y     # true
x == y    # false
x != y    # true
True is the same as 1, false is the same as 0:
2 + (x>=y)   # 2
2 + (x<=y)   # 3

Logical operators
x <- c(1,2,3,4); y <- c(5,6,7,8)
z <- x[y>=7]; z    # 3,4
Useful for quickly making subsets of your data!
x <- c(1,0.01,3,0.02)
In this vector, change all values <1 to 0:
x[x<1] <- 0
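
Because true counts as 1 and false as 0, a logical test can also be summed or averaged to count matches. A minimal sketch; the names `big`, `n_big`, `prop_big`, `x_big` and `both` are introduced here for illustration.

```r
x <- c(1, 2, 3, 4)
y <- c(5, 6, 7, 8)

big <- y >= 7           # logical vector, one TRUE/FALSE per element of y
n_big <- sum(big)       # how many elements pass the test
prop_big <- mean(big)   # what proportion pass the test
x_big <- x[big]         # elements of x where the matching y passes

# Combine two conditions with & (and); | means or.
both <- x[y >= 6 & x < 4]
print(list(n_big = n_big, prop_big = prop_big, x_big = x_big, both = both))
```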

Conditional operators
x <- 5; z <- 0
if (x>4) {z <- 2}; z    # 2
Could have a large program running in { }.

Loops
y <- 0; x <- 0
for (y in 1:20) {x <- x + 0.5; print(x)}
Useful for programming randomization procedures. Bootstrap example:
y <- 0; x <- 1:50
output <- rep(0,1000)
for (y in 1:1000) {output[y] <- var(sample(x, replace=TRUE))}
mean(output)
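
The bootstrap loop can be extended slightly to give a rough percentile interval for the variance as well as its mean. This is a sketch only: the seed is an arbitrary choice made so the run can be repeated, and `n_boot`, `boot_mean` and `boot_ci` are names introduced here.

```r
set.seed(1)      # arbitrary seed, assumed here only for repeatability
x <- 1:50
n_boot <- 1000   # number of bootstrap resamples

output <- numeric(n_boot)   # storage, one slot per resample
for (i in seq_len(n_boot)) {
  # Resample x with replacement and record the variance of the resample.
  output[i] <- var(sample(x, replace = TRUE))
}

boot_mean <- mean(output)
boot_ci <- quantile(output, c(0.025, 0.975))   # rough 95% percentile interval
print(boot_mean)
print(boot_ci)
```

Creating `output` at its full length before the loop, as the slide does with `rep(0,1000)`, avoids growing the vector on every pass.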

Writing programs
I encourage you to use the script editor!
File > New script
Write your code
Select the code you want to run (CTRL-A selects all code)
Run code (CTRL-R)
File > Save as
R script files are always *.R

Entering data
1. In Excel, give your data columns/rows and text data simple one-word labels (e.g. "treatment").
2. Format cells so there are < 8 digits per cell.
3. Save as a "csv" file.
4. Use the following command to find and load your file ("diane" is a dataframe name that you invent):
diane <- read.table(file.choose(), sep=",", header=TRUE)
5. Check it is there!
diane
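
To try the import step without an Excel file to hand, the sketch below writes a tiny made-up csv to a temporary file and reads it back with the same `read.table` arguments. The column names, values and the `csv_path` variable are hypothetical; `file.choose()` is replaced by a fixed path only so the example runs without a file dialog.

```r
# Hypothetical data written to a temporary file so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("treatment,count",
             "control,12",
             "herbicide,7",
             "control,15"),
           csv_path)

# Same call as on the slide, with a fixed path instead of file.choose().
diane <- read.table(csv_path, sep = ",", header = TRUE)

str(diane)    # column names and types
head(diane)   # first few rows
```

`read.csv(csv_path)` should give the same result here, since it is a wrapper around `read.table` with `sep=","` and `header=TRUE` as defaults.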

Dataframes
Dataframes are analogous to spreadsheets. Best if all columns in your dataframe have the same length. Missing values are coded as "NA" in R. If you coded your missing values with a different label in your spreadsheet (e.g. "none") then:
read.table(….., na.strings="none")
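
A small sketch of what `na.strings` does, using made-up data supplied through the `text` argument of `read.table` rather than a file. The `raw` vector and its values are hypothetical.

```r
# Made-up csv content in which one missing value was typed as "none".
raw <- c("treatment,count",
         "control,12",
         "herbicide,none",
         "control,15")

# na.strings tells read.table which label to convert to NA.
diane <- read.table(text = raw, sep = ",", header = TRUE, na.strings = "none")

is.na(diane$count)                 # TRUE marks the missing value
mean(diane$count)                  # NA, because one value is missing
mean(diane$count, na.rm = TRUE)    # mean of the values that are present
```

Without `na.strings="none"` the whole count column would be read as text, because "none" is not a number.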

Dataframes
Two ways to identify a column (called "treatment") in your dataframe (called "diane"):
diane$treatment
OR
attach(diane); treatment
At the end of the session, remember to:
detach(diane)
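
For comparison, here is a sketch of column access on a small made-up dataframe. It also shows `with()`, a base R function that evaluates an expression inside a dataframe without attaching it, so there is nothing to detach afterwards. The data values and the name `mean_count` are hypothetical.

```r
# Made-up dataframe standing in for the imported data.
diane <- data.frame(treatment = c("control", "herbicide", "control"),
                    count = c(12, 7, 15))

diane$treatment     # column by name with $
diane[["count"]]    # the same idea with double brackets

# with() looks up 'count' inside diane for this one expression only.
mean_count <- with(diane, mean(count))
print(mean_count)
```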

Summary statistics
length(x)
mean(x)
var(x)
cor(x,y)
sum(x)
summary(x)    # minimum, maximum, mean, median, quartiles
What is the correlation between two variables in your dataset?
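
The commands above can be tried on two short made-up vectors before moving to a real dataset. The values of `x` and `y` below are hypothetical.

```r
# Two made-up variables of equal length.
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

n <- length(x)      # number of observations
m <- mean(x)        # arithmetic mean
v <- var(x)         # sample variance (divides by n - 1)
s <- sum(x)         # total
r <- cor(x, y)      # Pearson correlation between x and y

summary(x)          # minimum, quartiles, median, mean, maximum
print(c(n = n, mean = m, var = v, sum = s, cor = r))
```

For columns of a dataframe the same call would look like `cor(diane$a, diane$b)`, where `a` and `b` stand for whatever two numeric columns your own data contain.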

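Factors
A factor has several discrete levels (e.g. control, herbicide). In R versions before 4.0.0, read.table automatically converts text columns to factors; from R 4.0.0 onward, text stays as character unless you set stringsAsFactors=TRUE. A text vector made with c() is never converted automatically.
To manually convert a numeric or text vector to a factor:
x <- as.factor(x)
To check if your vector is a factor, and what the levels are:
is.factor(x); levels(x)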
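
A brief sketch of converting and checking factors, using made-up vectors named `treatment` and `dose`. It also shows one common pitfall: converting a numeric factor back to numbers should go through `as.character` first, because `as.numeric` applied directly to a factor returns the internal level codes.

```r
# A plain text vector made with c() is character, not a factor.
treatment <- c("control", "herbicide", "control", "herbicide")
is.factor(treatment)

# Convert it, then check the result and its levels.
treatment <- as.factor(treatment)
is.factor(treatment)
levels(treatment)
table(treatment)    # number of observations per level

# A numeric vector can be converted too; its levels are stored as text.
dose <- as.factor(c(10, 20, 10))
levels(dose)
dose_numbers <- as.numeric(as.character(dose))   # recover the original numbers
```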

Homework
1. Download R on your computer by following the CRAN download links.
2. Instruction manuals for R are found at the main webpage: follow the links to Documentation > Manuals. I recommend "An Introduction to R".

3. Write a short program that:
Allows you to import the data from Lakedata_06.csv (posted on …)
Makes lake area into a factor called AreaFactor:
Area 0 to 5 ha: small
Area 5.1 to 10 ha: medium
Area > 10 ha: large

Hints
You will need to:
1. Tell R how long AreaFactor will be.
2. Assign cells in AreaFactor to each of the 3 levels.
3. Make AreaFactor into a factor, then check that it is a factor.
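
One possible way to follow the three hints is sketched below. The `Area` values are made up, and the column name `Area` is an assumption; the real values would come from Lakedata_06.csv. Other approaches work equally well.

```r
# Made-up lake areas in hectares, standing in for the Area column of the real data.
Area <- c(2.5, 7, 12, 5, 10.4, 0.8)

# Hint 1: tell R how long AreaFactor will be.
AreaFactor <- rep("", length(Area))

# Hint 2: assign cells to each of the 3 levels using logical subsetting.
AreaFactor[Area <= 5] <- "small"
AreaFactor[Area > 5 & Area <= 10] <- "medium"
AreaFactor[Area > 10] <- "large"

# Hint 3: make it a factor (levels listed in size order), then check.
AreaFactor <- factor(AreaFactor, levels = c("small", "medium", "large"))
is.factor(AreaFactor)
levels(AreaFactor)
table(AreaFactor)
```

Listing the levels explicitly keeps them in size order; otherwise R sorts them alphabetically (large, medium, small).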