Introduction to R, Statistics, and the grammar of graphics Thomas INGICCO E. Delacroix, Dante et Virgile aux Enfers E. Delacroix, Dante and Virgile in.


1 Introduction to R, Statistics, and the grammar of graphics
Thomas INGICCO
E. Delacroix, Dante et Virgile aux Enfers (Dante and Virgil in Hell)

2-6 … but above all it is a language with its own grammar, made of:
Classes – how you present your data
- Vector – series of values in 1 dimension
- Matrix – series of values in 2 dimensions
- Array – series of values in n dimensions
- Data frame – series of values in columns
- List – series of objects
- Table – contingency table

# Array
ee <- array(1:4, dim = c(2, 3, 2))   # 1:4 is recycled to fill the 12 cells
ee <- array(1:4, c(2, 3, 2))         # same call, dim matched by position
Data3d1 <- matrix(c(0.72, 100.32, 0.75, 100.36, 0.77, 100.32, 0.81, 100.32, 0.77, 100.29, 0.77, 100.24, 0.73, 100.28, 0.7, 100.26, 0.7, 100.3, 0.67, 100.33), 10, 2, byrow = TRUE)
colnames(Data3d1) <- c("x", "y")
rownames(Data3d1) <- paste("Lan", 1:10, sep = "")
t(Data3d1)                           # Transposition
Data3d2 <- Data3d1
array(cbind(Data3d1, Data3d2), dim = c(10, 2, 2))
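The stacking of two matrices into a 3-dimensional array can be tried on a smaller case. The following is a minimal sketch with made-up values; the names `m1`, `m2` and `stacked` are illustrative and do not come from the slides.

```r
# Two small 3 x 2 matrices with made-up values (illustrative only)
m1 <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)
m2 <- m1 * 10
colnames(m1) <- colnames(m2) <- c("x", "y")

# Stack them as the two slices of a 3 x 2 x 2 array.
# R fills arrays column by column, so m1 becomes slice 1 and m2 slice 2.
stacked <- array(c(m1, m2), dim = c(3, 2, 2),
                 dimnames = list(NULL, c("x", "y"), c("first", "second")))

dim(stacked)            # rows, columns, slices
stacked[, , "second"]   # the second slice holds the values of m2
```

Indexing with two commas, as in `stacked[, , 2]`, extracts one full slice, which is a matrix again.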

7 Classes – how you present your data
# List
ff <- list(aa, bb, cc, dd)

8 Classes – how you present your data
# Table
hh <- table(gg)
hh <- table(gg, dd[1:6,11])

9 Classes – how you present your data
hhh <- data.frame(gg, dd[1:6,11])
colnames(hhh) <- c("gg","Lip")         # Rename the columns
hhhh <- table(hhh)
data.frame(gg, na.omit(dd[1:6,11]))    # Function na.omit drops missing values
data.frame(gg, na.omit(dd[1:7,11]))
dim(hhhh)                              # Number of rows and columns
dimnames(hhhh)                         # Names of the rows and columns

10 Classes – how you present your data
margin.table(hhhh)          # Calculate the margins: grand total
margin.table(hhhh, 1)       # Row totals
margin.table(hhhh, 2)       # Column totals
hhhh[3,] <- c(1000,2000)    # Replace row 3
cbind(hhhh,hhh)             # Concatenate the columns of two tables
t(hhhh)                     # Transposition
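Since `dd` is not defined in this part of the deck, the margin calculations can be tried on a small stand-in. In this sketch the vectors `rim` and `lip` are hypothetical data invented for the illustration.

```r
# Hypothetical observations on six pots (invented for this example)
rim <- c("Everted", "Round", "Round", "Flat", "Flat", "Flat")
lip <- c("thin", "thick", "thin", "thin", "thick", "thin")

tab <- table(rim, lip)               # 3 x 2 contingency table

total      <- margin.table(tab)      # grand total of all cells
row_totals <- margin.table(tab, 1)   # one total per rim type
col_totals <- margin.table(tab, 2)   # one total per lip type

dim(tab)   # number of rows and columns
t(tab)     # transposed table: lip types in rows
```

The second argument of `margin.table()` names the dimension that is kept: 1 for rows, 2 for columns.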

11 Classes – how you present your data
# Factor
gg <- rep(c("Everted", "Round", "Flat"), c(1,2,3))
is.vector(gg)
is.character(gg)
gg1 <- factor(gg)
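What `factor()` changes can be inspected directly. This short sketch reuses the `gg` vector from the slide and only adds inspection calls.

```r
gg  <- rep(c("Everted", "Round", "Flat"), c(1, 2, 3))
gg1 <- factor(gg)

levels(gg1)       # distinct categories, sorted alphabetically by default
as.integer(gg1)   # integer codes pointing into the levels
counts <- table(gg1)   # number of observations per level
```

A factor stores each category once, as a level, and represents the observations as integer codes. This is why `is.character(gg1)` is `FALSE` although `is.character(gg)` is `TRUE`.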

12 Individuals (rows) × Variables (columns)

Individuals   Variables
              1      …   j      …   p
1             x_11   …   x_1j   …   x_1p
…             …          …          …
i             x_i1   …   x_ij   …   x_ip
…             …          …          …
n             x_n1   …   x_nj   …   x_np

FAM      GEN        SP    ID           UNW   LNW    MTW    ATW    ANW   MDW    ADW    TLN     H      AGE
Gibbons  Hylobates  H.sp  1880_1167_D  7.11  7.74   10.99  9.26   8.16  9.42   9.59   188,31  10.37  A
Gibbons  Hylobates  H.sp  1880_1167_G  6.12  8.53   11.3   9.29   8.54  9.5    9.42   187,5   10.13  A
Gibbons  Hylobates  H.sp  1880_1170_D  6.18  9.72   10.81  8.91   7.69  8.05   8.78   177,24  8.94   A
Gibbons  Hylobates  H.sp  1880_1170_G  6.44  10.09  10.68  8.96   9.07  8.05   8.69   177,59  9.29   A
Gibbons  Hylobates  H.sp  1901_102_D   6.31  11.69  15.19  11.79  9.26  11.83  11.6   206,69  11.49  A
Gibbons  Hylobates  H.sp  1901_102_G   7.14  11.13  14.93  11.68  9.06  11.76  11.3   205,32  11.49  A

13 Different kinds of data
Continuous quantitative variable: length of dog calcaneum
{67.0 54.7 7.0 48.5 14.0 17.2 20.7 13.0 43.4 40.2 38.9 54.5 59.8 48.3 22.9 11.5 34.4 35.1 38.7 30.8 30.6 43.1 56.8 40.8 41.8 42.5 31.0 31.7 30.2 25.9 49.2 37.0 35.9 15.0 30.2 7.2 36.2 45.5 7.8 33.4 36.1 40.2 42.7 42.5 16.2 39.0 35.0 37.0 31.4 37.6 39.9 36.2 42.8 46.4 24.7 49.1 46.0 35.9 7.8 48.2 15.2 32.5 44.7 42.6 38.8 17.4 40.8 29.1 14.6 59.2}
Discrete quantitative variable: number of flakes per context
{1 0 3 3 0 0 1 1 0 0 1 1 0 2 2 1 0 1 0 0 1 3 0 0 0 2 0 2 5 0 0 0 0 1 1 0 0 0 1 0 0 1 4 0 2 2 1 2 2 2 1 1 0 2 0 0 1 0 4 2 0 0 2 3 1 1 1 0 0 1 0 0 2 0 0 0 2 2 0 0 1 0 2 2 0 1 0 3 3 0 2 0 2 2 3 0 3 1 0 0}
Qualitative variable: colour of the pot
{black, red, black, red, brown, brown, black, grey, red, black}
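Each kind of variable calls for a different first summary in R. The sketch below uses the ten pot colours from the slide, and only the first ten flake counts and the first five calcaneum lengths to stay short; the object names are illustrative.

```r
# Qualitative variable: store as a factor, summarise with counts
colour <- factor(c("black", "red", "black", "red", "brown",
                   "brown", "black", "grey", "red", "black"))
colour_counts <- table(colour)

# Discrete quantitative variable: counts per value are still meaningful
flakes <- c(1, 0, 3, 3, 0, 0, 1, 1, 0, 0)       # first ten values only
flake_counts <- table(flakes)

# Continuous quantitative variable: numeric summaries instead of counts
calcaneum <- c(67.0, 54.7, 7.0, 48.5, 14.0)     # first five values only
summary(calcaneum)   # minimum, quartiles, mean, maximum
```

Counting distinct values makes little sense for a continuous variable, where almost every value is unique. There one usually looks at summaries or at a histogram with `hist()`.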


16-20 Descriptive and inferential statistics
Position parameters: Mean, Mode, Median
Dispersion parameters: Standard deviation, Variance, Maximum, Minimum, Coefficient of variation, Covariance, Coefficient of correlation

Mean: we add all the measurements and we divide by the number of measurements.
sum(DATA[1:49,6]) / length(DATA[1:49,6])
mean(DATA[1:49,6])
colMeans(DATA[1:49,6:11])
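The equivalence between the hand computation and `mean()` can be checked on a small vector. This is a minimal sketch with made-up values standing in for `DATA[1:49,6]`; `x`, `x_na` and `toy` are illustrative names.

```r
# Hypothetical measurements, standing in for DATA[1:49, 6]
x <- c(2, 4, 6, 8)

by_hand  <- sum(x) / length(x)   # add all values, divide by their count
built_in <- mean(x)              # same result

# A missing value makes the mean NA unless it is dropped explicitly
x_na <- c(x, NA)
mean(x_na)                  # NA
mean(x_na, na.rm = TRUE)    # computed on the four observed values

# colMeans() applies the same computation to every column at once
toy <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
col_means <- colMeans(toy)
```

Note that `length()` counts missing values too, so the hand computation `sum(x) / length(x)` needs the `NA`s removed first when the data are incomplete.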

21 Descriptive and inferential statistics
Example: You are told that you have a serious illness for which the mean survival period is six months… Statistics interest you!

22-27 Descriptive and inferential statistics
The mode is the most frequent value.
The median splits the ordered sample in two: as many values lie below it as above it.
median(DATA[1:49,6])
quantile(DATA[1:49,6])
min(DATA[1:49,6])
max(DATA[1:49,6])
range(DATA[1:49,6])
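Base R has no function that returns the statistical mode (`mode()` reports the storage type of an object instead). A small helper can fill the gap. The function `stat_mode` below is a hypothetical helper written for this illustration, applied to the first ten flake counts from the earlier slide.

```r
# Hypothetical helper: most frequent value(s) of a vector.
# Returns several values when there is a tie.
stat_mode <- function(x) {
  values <- unique(x)
  counts <- tabulate(match(x, values))   # occurrences of each distinct value
  values[counts == max(counts)]
}

flakes <- c(1, 0, 3, 3, 0, 0, 1, 1, 0, 0)

stat_mode(flakes)   # most frequent value
median(flakes)      # middle of the ordered sample
quantile(flakes)    # minimum, quartiles and maximum
range(flakes)       # minimum and maximum together
```

With an even number of values, `median()` returns the average of the two middle values, so the result does not have to be a value that occurs in the sample.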

28-32 Descriptive and inferential statistics
- We calculate the difference between every value and the mean
- We square each difference
- We sum the squared differences
- And we divide by the number of values
The variance is the mean of the squared deviations from the mean.
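The four steps above can be followed literally in R. The sketch below uses made-up values. One point worth knowing is that R's `var()` divides by n - 1 (the sample variance) rather than by n, so it does not return exactly the quantity described on the slide.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values
n <- length(x)

deviations <- x - mean(x)        # step 1: difference to the mean
squared    <- deviations^2       # step 2: square each difference
total      <- sum(squared)       # step 3: sum the squares

pop_var    <- total / n          # step 4 as on the slide: divide by n
sample_var <- total / (n - 1)    # what var() computes: divide by n - 1

var(x)                           # equals sample_var
var(x) * (n - 1) / n             # rescaled to the divide-by-n version
```

For large samples the two versions are almost identical; the difference matters mostly for small n.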

33 Descriptive and inferential statistics
The standard deviation is the square root of the variance.
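As a quick check with made-up values, `sd()` is indeed the square root of `var()`. Both use the n - 1 denominator, so a divide-by-n standard deviation has to be computed by hand, as in this sketch.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values
n <- length(x)

sd_builtin  <- sd(x)             # sample standard deviation
sd_from_var <- sqrt(var(x))      # same value

# Divide-by-n version, following the definition of the variance on the slides
pop_sd <- sqrt(sum((x - mean(x))^2) / n)
```

Unlike the variance, the standard deviation is expressed in the same unit as the variable itself.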


35 Descriptive and inferential statistics
The coefficient of variation expresses the standard deviation relative to the mean of the variable. It makes it possible to compare the dispersion of two variables. Problem: when the mean is close to zero, it becomes useless.
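Base R has no dedicated function for the coefficient of variation, but it is a one-line computation. In this sketch `cv` is a hypothetical helper and all values are made up.

```r
# Hypothetical helper: standard deviation relative to the mean
cv <- function(x) sd(x) / mean(x)

# The same lengths expressed in two units give the same coefficient
length_mm <- c(100, 120, 140)
length_cm <- length_mm / 10
cv(length_mm)
cv(length_cm)

# When the mean is close to zero the ratio blows up and stops being informative
near_zero <- c(-1.1, 0, 1.2)
cv(near_zero)
```

Because the unit cancels out in the ratio, the coefficient can be compared across variables measured on different scales, which is not possible with raw standard deviations.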

36 Descriptive and inferential statistics
To measure the deviation from the mean in units of standard deviation, we use the centered-reduced (standardized) variable: z = (x - mean) / standard deviation. It has mean = 0 and variance = 1.
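The centring and reduction can be done by hand or with the base function `scale()`. This is a minimal sketch on made-up values.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values

# Centre (subtract the mean), then reduce (divide by the standard deviation)
z <- (x - mean(x)) / sd(x)

mean(z)   # 0, up to floating-point rounding
var(z)    # 1

# scale() does both steps; it returns a one-column matrix, hence as.vector()
z_scale <- as.vector(scale(x))
```

A value of `z` equal to 2 means that the observation lies two standard deviations above the mean, whatever the original unit was.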


38-41 Descriptive and inferential statistics
Covariance measures how two variables vary together: do the values of each measurement drift away from the centre of gravity independently, or do they drift away together? If x and y are independent, then the covariance is equal to 0.
- We multiply each x-deviation from the mean by its associated y-deviation
- We sum these products
- We divide by the number of values
So covariance is the mean of the cross-products of the deviations.

42-43 Descriptive and inferential statistics
# By hand, with the shortcut formula and the n - 1 denominator
( sum(DATA[1:49,6] * DATA[1:49,7]) - prod(sum(DATA[1:49,6]), sum(DATA[1:49,7])) / length(DATA[1:49,6]) ) / ( length(DATA[1:49,6]) - 1 )
# With the built-in function (lower case: R is case-sensitive)
cov(DATA[1:49,6], DATA[1:49,7])
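The two ways of computing the covariance can be compared on a small example. The sketch below uses made-up vectors `x` and `y`. Like `var()`, R's `cov()` divides by n - 1, not by n.

```r
x <- c(1, 2, 3, 4, 5)   # made-up values
y <- c(2, 4, 5, 4, 5)
n <- length(x)

# Definition: cross-products of the deviations, summed, divided by n - 1
cov_def <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Shortcut formula used on the slide
cov_short <- (sum(x * y) - sum(x) * sum(y) / n) / (n - 1)

cov(x, y)   # built-in, same value

# A covariance of 0 does not imply independence:
# here v is fully determined by u, yet the covariance is 0
u <- -2:2
v <- u^2
cov(u, v)
```

The last example shows that the implication only goes one way: independent variables have zero covariance, but zero covariance only rules out a linear relationship.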

44-46 Descriptive and inferential statistics
Pearson's correlation coefficient differs from the covariance in that it has no unit and is bounded between -1 and 1.
cov(DATA[1:49,6], DATA[1:49,7]) / (sd(DATA[1:49,6]) * sd(DATA[1:49,7]))
cor(DATA[1:49,6], DATA[1:49,7])
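Both properties, the absence of unit and the bounds, can be verified numerically. This is a minimal sketch on made-up vectors.

```r
x <- c(1, 2, 3, 4, 5)   # made-up values
y <- c(2, 4, 5, 4, 5)

# Covariance divided by the product of the standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_cor    <- cor(x, y)   # built-in, Pearson by default

# No unit: rescaling or shifting the variables leaves the coefficient unchanged,
# whereas the covariance is multiplied by the scale factor
cor(x * 10, y + 3)
cov(x * 10, y + 3)

# Bounds: a perfect decreasing linear relation gives exactly -1
cor(x, -2 * x)
```

Dividing by the two standard deviations is what removes the units, which makes coefficients comparable across pairs of variables.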

