Basic statistic inference in R


Basic statistic inference in R Shai Meiri

“We expect to find differences between x and y” is a trivial statement: everything differs! The statistician within you asks: “Are the differences we found larger than expected by chance?” The biologist within you asks: “Why do the differences I found have the direction and the magnitude that they do?”

Moments of central tendency: the mean
Arithmetic mean: Σxi/n
Geometric mean: (x1*x2*…*xn)^(1/n)
Harmonic mean: n/Σ(1/xi)

Moments of central tendency in R
1. Arithmetic mean: Σxi/n. Use the function “mean”:
    data<-c(2,3,4,5,6,7,8)
    mean(data)
    [1] 5
You can also use the .csv file:
    dat<-read.csv("island_type_final2.csv")
    attach(dat)
    mean(lat)
    [1] 17.40439
2. Geometric mean: (x1*x2*…*xn)^(1/n)
    data<-c(2,3,4,5,6,7,8)
    exp(mean(log(data)))
    [1] 4.549163
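
The three means listed above can be compared side by side. This is a minimal sketch: base R has no built-in geometric or harmonic mean, so the helper names geometric_mean and harmonic_mean are hypothetical and defined here only for illustration.

```r
# Hypothetical helper names; base R only provides mean()
geometric_mean <- function(x) exp(mean(log(x)))   # requires x > 0
harmonic_mean  <- function(x) 1 / mean(1 / x)     # requires x != 0

data <- c(2, 3, 4, 5, 6, 7, 8)
am <- mean(data)
gm <- geometric_mean(data)
hm <- harmonic_mean(data)

# For positive, non-identical values: harmonic < geometric < arithmetic
c(arithmetic = am, geometric = gm, harmonic = hm)
```

Computing the geometric mean through logs, as in the slide, avoids overflow when the product of many values becomes very large.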

Moments of central tendency: A. Mean B. Median C. Mode
General example:
    data<-c(2,3,4,5,6,7,8)
    median(data)
    [1] 5
Example from the .csv:
    median(mass)
    [1] 0.69

Moments of central tendency: mean, variance
http://www.statmethods.net/management/functions.html
Variance = Σ(xi-μ)^2 / n for a whole population. R's var() returns the sample variance, which divides by n-1 instead of n.
Is the mean a good measure of what is happening in the population when the variance is low?
Example:
    data<-c(2,3,4,5,6,7,8)
    var(data)
    [1] 4.666667
    var(lat)
    [1] 89.20388

Moments of central tendency: mean, variance
The second moment measures how much the data are scattered around the first moment (the mean). Measures based on the second moment include the variance, the standard deviation, the standard error, the coefficient of variation, and the 90%, 95% and 99% confidence intervals.

Moments of central tendency
    #for: data<-c(2,3,4,5,6,7,8)
Sample size: length(data)
Variance: var(data)
Standard deviation: sd(data)
Standard error:
    se<-(sd(data)/length(data)^0.5)
    se
    [1] 0.8164966
Coefficient of variation:
    CV<-sd(data)/mean(data)
    CV
    [1] 0.4320494

Moments of central tendency: mean, variance, skew
A skewed frequency distribution is not symmetric. Do you think that the arithmetic mean is a good measure of central tendency for a skewed frequency distribution? What is the mean salary of the students here together with Bill Gates?

Moments of central tendency: skew
    skew<-function(data){
      m3<-sum((data-mean(data))^3)/length(data)
      s3<-sqrt(var(data))^3
      m3/s3}
    skew(data)
The SE of skewness:
    sdskew<-function(x) sqrt(6/length(x))

Moments of central tendency Mean Variance Skew Kurtosis

Moments of central tendency: kurtosis
    kurtosis<-function(x){
      m4<-sum((x-mean(x))^4)/length(x)
      s4<-var(x)^2
      m4/s4-3 }
    kurtosis(x)
SE of kurtosis:
    sdkurtosis<-function(x) sqrt(24/length(x))

A normal distribution can have any mean and variance, but its skewness and kurtosis should both equal zero. Estimates of skew and kurtosis have their own sampling variance, so zero must lie outside their confidence interval for them to be significantly different from zero.
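
The check described above (is zero outside the confidence interval of the skew?) can be sketched as follows. The functions skew and sdskew repeat the definitions from the slides; the simulated log-normal sample and the ±1.96 multiplier for an approximate 95% interval are assumptions made for this illustration.

```r
# skew() and sdskew() follow the definitions given in the slides
skew <- function(x) {
  m3 <- sum((x - mean(x))^3) / length(x)
  s3 <- sqrt(var(x))^3
  m3 / s3
}
sdskew <- function(x) sqrt(6 / length(x))

set.seed(1)                      # hypothetical simulated data
x <- rlnorm(500)                 # log-normal values: strongly right-skewed

z  <- skew(x) / sdskew(x)                      # estimate divided by its SE
ci <- skew(x) + c(-1.96, 1.96) * sdskew(x)     # approximate 95% interval
z
ci                                             # zero lies outside: skew is significant

skew(c(2, 3, 4, 5, 6, 7, 8))     # perfectly symmetric data: skew is 0
```

The same pattern applies to kurtosis, with sdkurtosis in place of sdskew.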

Residuals
When doing statistics we are creating models of reality. One of the simplest models is the mean:
The mean height of Israeli citizens is 173 cm
The mean salary is 9271 ₪ (correct for April 2014)
The mean service in the IDF is 24 months (I guess)
Photo captions: 46,699 ₪ a month (excluding the bottles); Rabbi Dov Lior served in the IDF for 1 month; 2.06 m
http://www.haaretz.co.il/1.2057452

Residuals
When doing statistics we are creating models of reality. We can see here that our models (24 months, 9271 ₪ and 173 cm) are not very successful.
The residual is how far a certain value is from the prediction of the model.
Omri Caspi is 33 cm away from the model “Israeli = 173” and 29 cm away from the more complicated model “Israeli man = 177, Israeli woman = 168”.
Residual = 37428 ₪ (salary); residual = -23 months (IDF service); residual = 33 cm (height)
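
A residual is simply observed minus predicted. The sketch below first reproduces the height example from the slide, then fits the "mean" model with lm() on a small vector of heights that is hypothetical and invented only for illustration.

```r
# Residual of one observation under the model "Israeli = 173 cm"
observed  <- 206            # height in cm, from the example above
predicted <- 173
observed - predicted        # 33 cm

# Hypothetical heights (cm); an intercept-only lm() is the "mean" model
heights    <- c(158, 165, 171, 173, 180, 186, 206)
mean_model <- lm(heights ~ 1)
res        <- residuals(mean_model)
res                         # each value minus the mean of all values
```

Residuals from a model with an intercept always sum to zero, which is why a single extreme value pulls the mean toward itself.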

Residuals
When doing statistics we are creating models of reality.
    dat<-read.csv("island_type_final2.csv")
    model<-lm(mass~iso+area+age+lat, data=dat)
    out<-model$residuals
    out
    write.table(out, file = "residuals.txt",sep="\t",col.names=F,row.names=F)
    #note that residual values are in the order entered (i.e., not alphabetic, not by residual size: first in, first out)

Theoretical statistics and statistical inference
When we have data it is best to describe them first: plot graphs, calculate the mean and so on.
In statistical inference we test the behavior of our data against a certain hypothesis.
We can present our hypothesis as a statistical model. For example:
The distribution of heights is normal
Number of species increases with area
Number of species increases with area as a power function with an exponent of 0.25

Frequency distribution*
How many observations are in each bin? Describes the distribution of all observations.
    dat<-read.csv("island_type_final2.csv")
    attach(dat)
    names(dat)
    hist(mass)
*graphic form = “histogram”

Frequency distribution: what did we learn?
    dat<-read.csv("island_type_final2.csv")
    attach(dat)
    hist(mass)
There are no masses smaller than one tenth of a gram or larger than 100 kg
Lizards with a mass between 1 and 10 g are very common; larger or smaller lizards are rare
The distribution is unimodal and skewed to the right

Frequency distribution Histograms don’t have to be so ugly dat<-read.csv("island_type_final2.csv") attach(dat) hist(mass, col="purple",breaks=25,xlab="log mass (g)",main="masses of island lizards - great data by Maria",cex.axis=1.2,cex.lab=1.5)

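Presenting a categorical predictor with a continuous response variable
    dat<-read.csv("island_type_final2.csv")
    attach(dat)
    plot(as.factor(type),brood)   # type must be a factor to get a boxplot; since R 4.0 read.csv reads text columns as character
Always prefer a boxplot to a barplot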

Presenting a continuous variable against another continuous variable
    dat<-read.csv("island_type_final2.csv")
    attach(dat)
    plot(mass,clutch)
    plot(mass,clutch,pch=16, col="blue")

Which test should we choose?
It depends on the nature of our response variable (= y variable), and mostly on the nature of our predictor variables.
If the response variable is “success or failure” and the null hypothesis is that both are equally likely, we’ll use a binomial test.
If the response variable is counts, we’ll usually use chi-square or G.
In many cases our response variable will be continuous (14 species, 78 individuals, 54 heartbeats per second, 7.3 eggs, 23 degrees).

Which test should we choose? What is your response variable?
Continuous (14 species, 78 individuals, 23 degrees, 7.3 eggs): soon…
Counts (frequency: 6 females, 4 males): chi-square or G (= log-likelihood)
Success or failure (found the cheese/idiot): binomial

Binomial test in R
You need to give the number of successes and the total sample size. For example, 19 out of 34 is not significant, whereas 19 out of 20 is significant.
    binom.test(19,34)
    Exact binomial test
    data: 19 and 34
    number of successes = 19, number of trials = 34, p-value = 0.6076
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval: 0.3788576 0.7281498
    sample estimates: probability of success 0.5588235

    binom.test(19,20)
    Exact binomial test
    data: 19 and 20
    number of successes = 19, number of trials = 20, p-value = 4.005e-05
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval: 0.7512672 0.9987349
    sample estimates: probability of success 0.95

Chi-square test in R: chisq.test
Data: lizard insularity and diet (number of species)
    habitat    carnivore  herbivore  omnivore
    island     488        43         177
    mainland   1901       101        269
    M<-as.table(rbind(c(1901,101,269),c(488,43,177)))
    chisq.test(M)
    data: M
    χ2 = 80.04, df = 2, p-value < 2.2e-16

Chi-square test in R: chisq.test
Now let's use our dataset:
    dat<-read.csv("island_type_final2.csv")
    install.packages("reshape")
    library(reshape)
    cast(dat, type ~ what, length)
    type         anoles  else  gecko
    Continental  7       45    45
    Land_bridge  1       30    14
    Oceanic      23      110   44
    M<-as.table(rbind(c(7,45,45),c(1,30,14),c(23,110,44)))
    chisq.test(M)
    data: M
    χ2 = 17.568, df = 4, p-value = 0.0015
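
As a sketch of what chisq.test does with this table, the example below re-enters the counts from the slide, adds row and column names (a presentational choice made here), and inspects the expected counts. The commented table() call is an assumed base-R alternative to the reshape package for building the same table from the data frame.

```r
# Counts taken from the slide above (rows: island type, columns: lizard group)
M <- as.table(rbind(Continental = c(7, 45, 45),
                    Land_bridge = c(1, 30, 14),
                    Oceanic     = c(23, 110, 44)))
colnames(M) <- c("anoles", "else", "gecko")

# One expected count is below 5, so R warns that the approximation may be poor
res <- suppressWarnings(chisq.test(M))
res$statistic            # chi-square statistic
res$parameter            # degrees of freedom: (3 - 1) * (3 - 1)
round(res$expected, 1)   # counts expected if type and group were independent

# With the raw data frame, base R can build the same table directly:
# M <- table(dat$type, dat$what)
```

Comparing M with res$expected shows which cells drive the result.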

Which test should we choose?
If our response variable is continuous, we choose our test based on the predictor variables.
If our predictor variable is categorical (Area 1, Area 2, Area 3 or species A, species B, species C), we’ll use ANOVA.
If our predictor variable is continuous (temperature, body mass, height), we’ll use regression.

t-test in R: t.test(x,y)
Data (ssd.csv: columns Sex and size, one row per sex per species). Female and male sizes of the species shown: 79.7, 85; 120, 133.0; 118, 126.0; 105.8, 112; 106, 121.0; 95, 111.0; 86, 93.0; 65, 75.0; 230, 240.0
    dimorphism<-read.csv("ssd.csv",header=T)
    attach(dimorphism)
    names(dimorphism)
    males<-size[Sex=="male"]
    females<-size[Sex=="female"]
    t.test(females,males)
    Welch Two Sample t-test
    data: females and males
    t = -2.1541, df = 6866.57, p-value = 0.03127
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval: -7.5095545 -0.3536548
    sample estimates: mean of x 88.17030, mean of y 92.10191

t-test in R (2): lm(x~y)
Same ssd.csv data (columns Sex and size) as for t.test.
    dimorphism<-read.csv("ssd.csv",header=T)
    attach(dimorphism)
    names(dimorphism)
    model<-lm(size~Sex,data=dimorphism)
    summary(model)
                 Estimate  Std. Error  t      p value
    (Intercept)  88.17     1.291       68.32  <2e-16 ***
    Sexmale      3.932     1.825       2.154  0.031 *

Paired t-test in R: t.test(x,y,paired=TRUE)
Data (ssd.csv holds columns Species, size and Sex; shown here with one row per species):
    Species                    female  male
    Xenagama_zonura            79.7    85
    Xenosaurus_grandis         120     133.0
    Xenosaurus_newmanorum      118     126.0
    Xenosaurus_penai           105.8   112
    Xenosaurus_platyceps       106     121.0
    Xenosaurus_rectocollaris   95      111.0
    Zonosaurus_anelanelany     86      93.0
    Zootoca_vivipara           65      75.0
    Zygaspis_nigra             230     240.0
    Zygaspis_quadrifrons       195     227.0
    dimorphism<-read.csv("ssd.csv",header=T)
    attach(dimorphism)
    names(dimorphism)
    males<-size[Sex=="male"]
    females<-size[Sex=="female"]
    t.test(females,males, paired=TRUE)
    Paired t-test
    data: females and males
    t = -10.192, df = 3503, p-value < 2.2e-16
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval: -4.688 -3.175
    sample estimates: mean of the differences -3.931
    tapply(size,Sex,mean)
    female male
    88.17  92.10
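
To see why pairing matters, the sketch below uses only the ten species pairs listed in the excerpt above (not the full ssd.csv file, so the numbers differ from the output shown). It also shows that a paired t-test is the same as a one-sample t-test on the differences.

```r
# The ten species pairs listed in the excerpt above (female and male sizes)
females <- c(79.7, 120, 118, 105.8, 106, 95, 86, 65, 230, 195)
males   <- c(85, 133, 126, 112, 121, 111, 93, 75, 240, 227)

paired  <- t.test(females, males, paired = TRUE)
onesamp <- t.test(females - males)      # the same test, written explicitly
welch   <- t.test(females, males)       # ignores the pairing

c(paired = unname(paired$statistic), welch = unname(welch$statistic))
c(paired = paired$p.value, welch = welch$p.value)
```

Because species differ far more from one another than the sexes differ within a species, the unpaired test loses the signal in the between-species variance, while the paired test removes it.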

ANOVA in R: aov
model<-aov(x~y)
Data excerpt (columns species, type, clutch), e.g. Trachylepis_sechellensis, Continental, 0.6; Amblyrhynchus_cristatus, Oceanic, 0.35. The three island types are Continental, Land_bridge and Oceanic.
    island<-read.csv("island_type_final2.csv",header=T)
    names(island)
    [1] "species" "what" "family" "insular" "Archipelago" "largest_island"
    [7] "area" "type" "age" "iso" "lat" "mass"
    [13] "clutch" "brood" "hatchling" "productivity"
    model<-aov(clutch~type,data=island)
    summary(model)
               Df   Sum Sq  Mean Sq  F value  Pr(>F)
    type       2    0.466   0.23296  2.784    0.0635 .
    Residuals  289  24.184  0.08368

Post-hoc test for ANOVA in R
    TukeyHSD(model)
    Tukey multiple comparisons of means
    95% family-wise confidence level
    Fit: aov(formula = clutch ~ type, data = island)
    $type
                             diff     lwr      upr     p adj
    Land_bridge-Continental  0.124    -0.0025  0.2505  0.0561
    Oceanic-Continental      0.0218   -0.0671  0.1108  0.8318
    Oceanic-Land_bridge      -0.102   -0.2206  0.0163  0.1066
None of the differences is significant: notice that zero is inside every confidence interval. The difference between land-bridge islands and continental islands is very close to significance (p = 0.056).
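
The link between the Tukey confidence interval and the adjusted p-value can be checked directly. This sketch uses PlantGrowth, a dataset built into R, as a stand-in; it is not the lecture's island data.

```r
# PlantGrowth (built into R) is a stand-in here, not the lecture's data
model <- aov(weight ~ group, data = PlantGrowth)
summary(model)

tuk <- TukeyHSD(model)
tab <- tuk$group                       # columns: diff, lwr, upr, p adj

# A pair differs significantly exactly when its interval excludes zero
excludes_zero <- tab[, "lwr"] > 0 | tab[, "upr"] < 0
cbind(round(tab, 3), excludes_zero)
```

Calling plot(tuk) draws the same intervals, which makes it easy to see which ones cross zero.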

Correlation in R: cor.test(x,y)
Data: the lat and mass columns of island_type_final2.csv.
    island<-read.csv("island_type_final2.csv",header=T)
    names(island)
    [1] "species" "what" "family" "insular" "Archipelago" "largest_island"
    [7] "area" "type" "age" "iso" "lat" "mass"
    [13] "clutch" "brood" "hatchling" "productivity"
    attach(island)
    cor.test(mass,lat)
    Pearson's product-moment correlation
    data: mass and lat
    t = -1.138, df = 317, p-value = 0.256
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval: -0.17239 0.04635
    sample estimates: cor -0.06378
The value “cor” is the correlation coefficient r

Regression in R
Same data as in the previous example. lm (= “linear model”): lm(y~x)
    model<-lm(mass~lat,data=island)
    summary(model)
    Call: lm(formula = mass ~ lat, data = island)
    Residuals:
    Min     1Q      Median  3Q     Max
    -4.708  -1.774  0.470   1.465  3.725
    Coefficients:
                 Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)  0.958034  0.096444    9.934    <2e-16 ***
    lat          -0.00554  0.004872    -1.138   0.256
    Residual standard error: 0.8206 on 317 degrees of freedom
    Multiple R-squared: 0.004069, Adjusted R-squared: 0.0009268
    F-statistic: 1.295 on 1 and 317 DF, p-value: 0.256
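
The correlation test and the regression above report the same t (-1.138) and p-value (0.256), and that is not a coincidence. The sketch below checks the equivalence on cars, a dataset built into R that is used here only as a stand-in for the island data.

```r
# cars (built into R) is a stand-in here, not the lecture's data
ct <- cor.test(cars$speed, cars$dist)
m  <- summary(lm(dist ~ speed, data = cars))

t_cor <- unname(ct$statistic)                   # t from the correlation test
t_lm  <- m$coefficients["speed", "t value"]     # t for the regression slope
c(t_cor = t_cor, t_lm = t_lm)

# The squared correlation coefficient equals the regression R-squared
c(r_squared_from_cor = unname(ct$estimate)^2, r_squared_from_lm = m$r.squared)
```

With a single continuous predictor, testing whether r differs from zero and testing whether the slope differs from zero are the same test.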

lm vs. aov
We can also use ‘lm’ on data that suit ANOVA. In this case we receive everything that ‘summary’ reports for a regression fitted with ‘lm’: parameter estimates, SEs, differences between factor levels, and p-values for contrasts between categories of our predictor variable.

aov vs. lm
    island<-read.csv("island_type_final2.csv",header=T)
    model<-aov(clutch~type,data=island)
    model2<-lm(clutch~type,data=island)
    summary(model)
    summary(model2)
aov results:
               Df   Sum Sq  Mean Sq  F value  Pr(>F)
    type       2    0.466   0.23296  2.784    0.0635 .
    Residuals  289  24.184  0.08368
lm results:
                     Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)      0.33149   0.02984     11.11    <2e-16 ***
    typeLand_bridge  0.12399   0.05369     2.309    0.0216 *
    typeOceanic      0.02184   0.03777     0.578    0.5635
    Residual standard error: 0.2893 on 289 degrees of freedom
    (27 observations deleted due to missingness)
    Multiple R-squared: 0.0189, Adjusted R-squared: 0.01211
    F-statistic: 2.784 on 2 and 289 DF, p-value: 0.06346
More later on
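
Both outputs above report F = 2.784 on 2 and 289 DF because aov and lm fit the same model. The sketch below confirms this on PlantGrowth, a built-in dataset used as a stand-in, and shows how to read the lm coefficients.

```r
# PlantGrowth (built into R) is a stand-in here, not the lecture's data
fit_aov <- aov(weight ~ group, data = PlantGrowth)
fit_lm  <- lm(weight ~ group, data = PlantGrowth)

f_aov <- summary(fit_aov)[[1]][["F value"]][1]
f_lm  <- unname(summary(fit_lm)$fstatistic["value"])
c(f_aov = f_aov, f_lm = f_lm)          # identical F statistics

# The intercept is the mean of the first factor level ("ctrl");
# the other coefficients are differences from that level
ctrl_mean <- mean(PlantGrowth$weight[PlantGrowth$group == "ctrl"])
c(intercept = unname(coef(fit_lm)["(Intercept)"]), ctrl_mean = ctrl_mean)
```

In the island output the intercept (0.33149) is therefore the mean clutch value of the first level, Continental, and the other two rows are differences from it.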

Assumptions of statistical tests (all statistical tests)
Random sampling (an assumption of all tests, not only parametric ones)
Independence (spatial, phylogenetic etc.)
Photo caption: a non-random, non-independent sample of Israeli people

Assumptions of parametric tests. A. ANOVA
In addition to the assumptions of all tests:
Homoscedasticity
Normal distribution of the residuals
"Comments on earlier drafts of this manuscript made it clear that for many readers who analyze data but who are not particularly interested in statistical questions, any discussion of statistical methods becomes uncomfortable when the term ‘‘error variance’’ is introduced." Smith, R. J. 2009. Use and misuse of the reduced major axis for line-fitting. American Journal of Physical Anthropology 140: 476-486.
Photo caption: Richard Smith & 3 friends
Reading material: Sokal & Rohlf 1995. Biometry. 3rd edition. Pages 392-409 (especially 406-407 for normality)

Always look at your data. Don’t just rely on the statistics!
Anscombe's quartet: summary statistics are the same for all four data sets:
n = 11
means of x and y (9, 7.5)
variance of y (4.12)
regression and residual SS
correlation r = 0.816 (R^2 = 0.67)
regression line (y = 3 + 0.5x)
Anscombe 1973. Graphs in statistical analysis. The American Statistician 27: 17–21.
http://en.wikipedia.org/wiki/Anscombe%27s_quartet
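
Anscombe's quartet ships with R as the data frame anscombe (columns x1 to x4 and y1 to y4), so the claim is easy to check. This is a minimal sketch; the names given to the summary rows are chosen here for illustration.

```r
# anscombe is a built-in data frame with columns x1..x4 and y1..y4
stats <- sapply(1:4, function(i) {
  x   <- anscombe[[paste0("x", i)]]
  y   <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(mean_x    = mean(x),
    mean_y    = mean(y),
    var_y     = var(y),
    r         = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]))
})
round(stats, 2)        # one column per data set: nearly identical summaries

# The plots tell four very different stories
par(mfrow = c(2, 2))
for (i in 1:4) plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
                    xlab = paste0("x", i), ylab = paste0("y", i), pch = 16)
par(mfrow = c(1, 1))
```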

Assumptions of parametric tests. B. Regression
1. Homoscedasticity
2. The explanatory variable was sampled without error
3. Normal distribution of the residuals of each response variable
4. Equality of variance between the values of the explanatory variables
5. Linear relationship between the response and the predictor
Smith, R. J. 2009. Use and misuse of the reduced major axis for line-fitting. American Journal of Physical Anthropology 140: 476-486.

How will we test whether our model meets the assumptions?
R has very useful model diagnostic functions that let us evaluate graphically how well our model meets the model assumptions (especially in regression).
https://www.youtube.com/watch?v=eTZ4VUZHzxw
See also: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/plot.lm.html
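
The graphical diagnostics referred to above are produced by calling plot() on a fitted lm object. A minimal sketch, using the built-in cars dataset as a stand-in for the lecture's data:

```r
# cars (built into R) is a stand-in here, not the lecture's data
model <- lm(dist ~ speed, data = cars)

par(mfrow = c(2, 2))   # show the four default diagnostic plots together
plot(model)            # residuals vs fitted, normal Q-Q,
                       # scale-location, residuals vs leverage
par(mfrow = c(1, 1))

res <- residuals(model)   # the values that the plots are built from
length(res)
```

Roughly speaking, the residuals vs fitted and scale-location plots address linearity and homoscedasticity, and the normal Q-Q plot addresses normality of the residuals.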

What can we do when our data do not meet the assumptions?
We can ignore it and hope that our test is robust to violations of the assumptions: this is not as unreasonable as it sounds
Use non-parametric tests
Use generalized linear models (glm), which means: transformation (in glm this means changing the link function); changing the error distribution (in glm) to a non-normal distribution
Use non-linear tests
Use randomization (more about it in Roi’s lessons)

Non-parametric tests
Photo caption: I think it is really wrong to have a presentation without any animal pictures in it
Non-parametric tests do not assume equality of variance or a normal distribution. They are based on ranks.
Disadvantages:
There are no tests for models with multiple predictors
Their statistical power is often much lower than that of an equivalent parametric test
They do not give you parameter estimates (slopes, intercepts)

A few useful non-parametric tests
Photo: Orycteropus afer. The animal photographed is not related to the lectures.
The chi-square test is a non-parametric test
Kolmogorov-Smirnov is a non-parametric test used to compare two frequency distributions (or to compare “our” distribution to a known distribution, for example a normal distribution)
Mann-Whitney U (= Wilcoxon rank sum) is the non-parametric equivalent of Student's t-test
The Wilcoxon signed-rank (matched-pairs) test replaces the paired t-test
Kruskal-Wallis replaces one-way ANOVA
The Spearman and Kendall’s tau tests replace the (Pearson) correlation test
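
Each test in the list above has a base-R function. The sketch below uses the ten female and male sizes from the paired t-test excerpt plus two built-in datasets (PlantGrowth and cars) as stand-ins; setting exact = FALSE is a choice made here to avoid warnings about tied values.

```r
# Female and male sizes from the paired t-test excerpt
females <- c(79.7, 120, 118, 105.8, 106, 95, 86, 65, 230, 195)
males   <- c(85, 133, 126, 112, 121, 111, 93, 75, 240, 227)

# exact = FALSE uses the normal approximation and avoids warnings about ties
mw <- wilcox.test(females, males, exact = FALSE)                 # Mann-Whitney U
sr <- wilcox.test(females, males, paired = TRUE, exact = FALSE)  # signed-rank
kw <- kruskal.test(weight ~ group, data = PlantGrowth)           # Kruskal-Wallis
sp <- cor.test(cars$speed, cars$dist, method = "spearman", exact = FALSE)
kt <- cor.test(cars$speed, cars$dist, method = "kendall",  exact = FALSE)

p_values <- sapply(list(mann_whitney = mw, signed_rank = sr,
                        kruskal_wallis = kw, spearman = sp, kendall = kt),
                   function(h) h$p.value)
p_values
```

Note that the Mann-Whitney and signed-rank tests share the function wilcox.test and differ only in the paired argument.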

Non-parametric tests in R
Kolmogorov-Smirnov is a non-parametric test used to compare two frequency distributions (or to compare “our” distribution to a known distribution, for example a normal distribution).
We need to define the grouping variable and the response in R. Let's say we want to compare the frequency distributions of lizard body mass on oceanic and land-bridge islands:
    island<-read.csv("island_type_final2.csv",header=T)
    attach(island)
    levels(factor(type))
    [1] "Continental" "Land_bridge" "Oceanic"
    Land_bridge<-mass[type=="Land_bridge"]
    Oceanic<-mass[type=="Oceanic"]
    ks.test(Land_bridge, Oceanic)
    Two-sample Kolmogorov-Smirnov test
    data: Land_bridge and Oceanic
    D = 0.1955, p-value = 0.1288
    alternative hypothesis: two-sided
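
The slide shows the two-sample form; the comparison against a known distribution works with the same function by naming a cumulative distribution function. The sketch below uses simulated values (hypothetical, not the lizard data) and assumes the reference parameters are known in advance.

```r
set.seed(42)                               # hypothetical simulated data
log_mass <- rnorm(200, mean = 1, sd = 0.8)

# One-sample test against a normal distribution with known mean and sd
ks_norm <- ks.test(log_mass, "pnorm", mean = 1, sd = 0.8)
ks_norm

# A clearly non-normal sample: exponential values against a standard normal
ks_exp <- ks.test(rexp(200), "pnorm")
ks_exp$statistic      # D: the largest gap between the two cumulative curves
ks_exp$p.value
```

D is the maximum vertical distance between the empirical cumulative distribution and the reference curve, so it always lies between 0 and 1.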