Published by Nicholas Bruce. Modified over 9 years ago.
Introduction to R, Statistics, and the grammar of graphics
Thomas INGICCO
E. Delacroix, Dante et Virgile aux Enfers (Dante and Virgil in Hell)
… but above all, it is a language with its own grammar, made of:

Classes – how you present your data
- Vector – series of values of 1 dimension
- Matrix – series of values of 2 dimensions
- Array – series of values of n dimensions
- Data Frame – series of values in columns
- List – series of objects
- Table – contingency table

# Array
ee <- array(1:4, dim=c(2, 3, 2))   # 1:4 is recycled to fill the 12 cells
ee <- array(1:4, c(2, 3, 2))       # same call, dim given by position
Data3d1 <- matrix(c(0.72,100.32,0.75,100.36,0.77,100.32,0.81,100.32,0.77,100.29,0.77,100.24,0.73,100.28,0.7,100.26,0.7,100.3,0.67,100.33), 10, 2, byrow=TRUE)
colnames(Data3d1) <- c("x", "y")
rownames(Data3d1) <- paste("Lan", 1:10, sep="")
t(Data3d1)
Data3d2 <- Data3d1
array(cbind(Data3d1, Data3d2), dim=c(10, 2, 2))
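The array construction above can be sketched on a smaller scale. This is only an illustration: the three rows reuse the first coordinates from the slide, while the second matrix `m2` and all dimension names are assumptions added to make the two slices distinguishable.

```r
# Two small 3 x 2 coordinate matrices (m2 is a hypothetical second data set)
m1 <- matrix(c(0.72, 100.32,
               0.75, 100.36,
               0.77, 100.32), nrow = 3, ncol = 2, byrow = TRUE)
m2 <- m1 + 1   # shifted copy, so that the two slices differ

# array() fills column by column, so c(m1, m2) lays out m1 first, then m2
a <- array(c(m1, m2), dim = c(3, 2, 2),
           dimnames = list(paste0("Lan", 1:3), c("x", "y"), c("set1", "set2")))

dim(a)                   # 3 2 2
a[, , 1]                 # first slice: the values of m1
a["Lan2", "y", "set2"]   # a single cell, addressed by name
```

Passing `cbind(m1, m2)` instead of `c(m1, m2)` gives the same array, because `array()` reads the values of its first argument in column-major order either way.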
# List
ff <- list(aa, bb, cc, dd)
# Table
hh <- table(gg)
hh <- table(gg, dd[1:6,11])
hhh <- data.frame(gg, dd[1:6,11])
colnames(hhh) <- c("gg","Lip")   # Rename the columns
hhhh <- table(hhh)
data.frame(gg, na.omit(dd[1:6,11]))   # na.omit() drops the missing values
data.frame(gg, na.omit(dd[1:7,11]))
dim(hhhh)   # Number of rows and columns
dimnames(hhhh)
margin.table(hhhh)      # Calculate the margins (grand total)
margin.table(hhhh, 1)   # Row totals
margin.table(hhhh, 2)   # Column totals
hhhh[3,] <- c(1000,2000)   # Replace row 3
cbind(hhhh,hhh)   # Concatenate the columns of two tables
t(hhhh)   # Transposition
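The table commands above depend on objects (`gg`, `dd`) defined elsewhere in the course. A self-contained sketch of the same operations follows; the `shape` and `colour` vectors are hypothetical records invented for illustration.

```r
# Hypothetical pottery records: shape and colour of six sherds
shape  <- c("Everted", "Round", "Round", "Flat", "Flat", "Flat")
colour <- c("black", "red", "black", "red", "red", "black")

tab <- table(shape, colour)   # 3 x 2 contingency table

dim(tab)               # number of rows and columns
dimnames(tab)          # level names of each dimension
margin.table(tab)      # grand total
margin.table(tab, 1)   # row totals (one per shape)
margin.table(tab, 2)   # column totals (one per colour)
t(tab)                 # transposed table
```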
# Factor
gg <- rep(c("Everted", "Round", "Flat"), c(1,2,3))
is.vector(gg)
is.character(gg)
gg1 <- factor(gg)
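To see what `factor()` changes, the sketch below repeats the construction from the slide and inspects the result. Only base R functions are used.

```r
# Same construction as on the slide: 1 "Everted", 2 "Round", 3 "Flat"
gg  <- rep(c("Everted", "Round", "Flat"), c(1, 2, 3))
gg1 <- factor(gg)

is.character(gg)   # TRUE: gg is a plain character vector
is.factor(gg1)     # TRUE: gg1 stores integer codes plus a set of levels
levels(gg1)        # the distinct categories, sorted alphabetically by default
as.integer(gg1)    # the underlying integer codes
table(gg1)         # counts per level
```

A factor is the natural class for a qualitative variable: functions such as `table()` and most plotting and modelling functions treat its levels as categories rather than as free text.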
Data matrix: individuals in rows (1 … n), variables in columns (1 … p)

      1      …   j      …   p
1     x_11   …   x_1j   …   x_1p
…     …          …          …
i     x_i1   …   x_ij   …   x_ip
…     …          …          …
n     x_n1   …   x_nj   …   x_np

FAM      GEN        SP    ID           UNW   LNW    MTW    ATW    ANW   MDW    ADW    TLN     H      AGE
Gibbons  Hylobates  H.sp  1880_1167_D  7.11  7.74   10.99  9.26   8.16  9.42   9.59   188,31  10.37  A
Gibbons  Hylobates  H.sp  1880_1167_G  6.12  8.53   11.3   9.29   8.54  9.5    9.42   187,5   10.13  A
Gibbons  Hylobates  H.sp  1880_1170_D  6.18  9.72   10.81  8.91   7.69  8.05   8.78   177,24  8.94   A
Gibbons  Hylobates  H.sp  1880_1170_G  6.44  10.09  10.68  8.96   9.07  8.05   8.69   177,59  9.29   A
Gibbons  Hylobates  H.sp  1901_102_D   6.31  11.69  15.19  11.79  9.26  11.83  11.6   206,69  11.49  A
Gibbons  Hylobates  H.sp  1901_102_G   7.14  11.13  14.93  11.68  9.06  11.76  11.3   205,32  11.49  A
Different kinds of data

Continuous quantitative variable: length of dog calcaneum
{67.0 54.7 7.0 48.5 14.0 17.2 20.7 13.0 43.4 40.2 38.9 54.5 59.8 48.3 22.9 11.5 34.4 35.1 38.7 30.8 30.6 43.1 56.8 40.8 41.8 42.5 31.0 31.7 30.2 25.9 49.2 37.0 35.9 15.0 30.2 7.2 36.2 45.5 7.8 33.4 36.1 40.2 42.7 42.5 16.2 39.0 35.0 37.0 31.4 37.6 39.9 36.2 42.8 46.4 24.7 49.1 46.0 35.9 7.8 48.2 15.2 32.5 44.7 42.6 38.8 17.4 40.8 29.1 14.6 59.2}

Discrete quantitative variable: number of flakes per context
{1 0 3 3 0 0 1 1 0 0 1 1 0 2 2 1 0 1 0 0 1 3 0 0 0 2 0 2 5 0 0 0 0 1 1 0 0 0 1 0 0 1 4 0 2 2 1 2 2 2 1 1 0 2 0 0 1 0 4 2 0 0 2 3 1 1 1 0 0 1 0 0 2 0 0 0 2 2 0 0 1 0 2 2 0 1 0 3 3 0 2 0 2 2 3 0 3 1 0 0}

Qualitative variable: colour of the pot
{black, red, black, red, brown, brown, black, grey, red, black}
Descriptive and inferential statistics

Position parameters: Mean, Mode, Median
Dispersion parameters: Standard deviation, Variance, Maximum, Minimum, Coefficient of variation, Covariance, Coefficient of correlation

Mean: we add all the measurements and we divide by the number of measurements.

sum(DATA[1:49,6]) / length(DATA[1:49,6])
mean(DATA[1:49,6])
colMeans(DATA[1:49,6:11])
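Because `DATA` is loaded elsewhere in the course, here is a self-contained sketch of the same calculation. The vector `x` holds the first five calcaneum lengths from the continuous-variable example; the two-column matrix `m` is a hypothetical object added only to show `colMeans()`.

```r
# First five calcaneum lengths from the example data
x <- c(67.0, 54.7, 7.0, 48.5, 14.0)

mean_by_hand <- sum(x) / length(x)   # add all the measurements, divide by their number
mean_builtin <- mean(x)              # same calculation with the built-in function

all.equal(mean_by_hand, mean_builtin)   # TRUE

# colMeans() does the same for every column of a numeric matrix or data frame
m <- cbind(a = x, b = x * 2)   # hypothetical two-column matrix
colMeans(m)
```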
Example: you are told that you have a serious illness for which the mean survival period is six months… Statistics interest you!
The mode is the most frequent value.
The median splits the ordered sample in two: half of the values lie below it and half above.

median(DATA[1:49,6])
quantile(DATA[1:49,6])
min(DATA[1:49,6])
max(DATA[1:49,6])
range(DATA[1:49,6])
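Base R has no built-in function for the statistical mode (`mode()` returns the storage type of an object), so it has to be derived from a frequency table. The sketch below does this for the pot colours of the qualitative-variable example; the numeric vector `x` is a hypothetical sample used for the median and quantiles.

```r
# Pot colours from the qualitative-variable example
colour <- c("black", "red", "black", "red", "brown",
            "brown", "black", "grey", "red", "black")

# Count each value and keep the most frequent one(s)
counts    <- table(colour)
stat_mode <- names(counts)[counts == max(counts)]
stat_mode   # "black"

# Median, quartiles and range of a small numeric sample (hypothetical values)
x <- c(1, 3, 3, 0, 5, 2, 2)
median(x)     # middle value of the sorted sample
quantile(x)   # minimum, quartiles and maximum
range(x)      # same as c(min(x), max(x))
```

If several values tie for the highest count, `stat_mode` contains all of them.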
Variance:
We calculate the difference between every value and the mean.
We square this difference.
We sum the squared differences.
And we divide by the number of values.
The variance is the mean of the squared differences from the mean.
The standard deviation is the square root of the variance.
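The variance steps and the square-root relationship can be sketched as follows, on a small subset of the calcaneum lengths. One point worth checking when comparing with R: dividing by the number of values gives the population variance, whereas `var()` and `sd()` divide by n - 1 (the sample estimate). Both versions are computed here.

```r
x <- c(67.0, 54.7, 7.0, 48.5, 14.0)   # small subset of the calcaneum lengths

d  <- x - mean(x)   # difference between every value and the mean
ss <- sum(d^2)      # sum of the squared differences

var_pop  <- ss / length(x)         # divide by n, as in the steps above
var_samp <- ss / (length(x) - 1)   # divide by n - 1, which is what var() does

all.equal(var_samp, var(x))        # TRUE
all.equal(sqrt(var_samp), sd(x))   # TRUE: sd() is the square root of var()
```

For large samples the two denominators give nearly the same result; for a sample of five values, as here, the difference is noticeable.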
The coefficient of variation expresses the standard deviation relative to the mean of the variable.
It makes it possible to compare the dispersion of two variables.
Problem: when the mean is close to zero, it becomes useless.
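A minimal sketch of the coefficient of variation, assuming the usual definition (standard deviation divided by the mean). The helper `cv()` and the example vectors are hypothetical; base R does not provide this function.

```r
# The same hypothetical lengths expressed in two units
x_cm <- c(6.70, 5.47, 4.85, 4.34, 4.02)
x_mm <- x_cm * 10

cv <- function(v) sd(v) / mean(v)   # coefficient of variation, unit-free

cv(x_cm)
cv(x_mm)   # same value: rescaling changes sd and mean by the same factor

# With a mean close to zero the ratio explodes and stops being informative
near_zero <- c(-1.0, 0.5, 0.6)   # mean is about 0.033
cv(near_zero)
```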
To measure the difference from the mean in units of standard deviation, we use: z = (x - mean) / standard deviation.
This is the centred-reduced (standardized) variable, with mean = 0 and variance = 1.
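Standardization can be checked directly in R. The sketch below computes the centred-reduced variable by hand and with `scale()`, which centres and scales by default; the data are a small subset of the calcaneum lengths.

```r
x <- c(67.0, 54.7, 7.0, 48.5, 14.0)   # small subset of the calcaneum lengths

z_by_hand <- (x - mean(x)) / sd(x)   # centre on the mean, divide by the standard deviation
z_scale   <- as.vector(scale(x))     # scale() performs both steps by default

all.equal(z_by_hand, z_scale)   # TRUE
mean(z_by_hand)                 # 0, up to floating-point rounding
var(z_by_hand)                  # 1
```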
Covariance measures the degree of dependence between two variables: do the values of each measurement drift away from the centre of gravity independently, or do they drift away together?
If x and y are independent, then the covariance is equal to 0.
We multiply each x-deviation from the mean by its associated y-deviation.
We sum these products.
We divide by the number of values.
So the covariance is the sum of the cross-products of the deviations, divided by the number of values.
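The covariance steps can be sketched as follows. The paired vectors `x` and `y` are hypothetical illustrative values. As with the variance, dividing by the number of values gives the population covariance, while `cov()` divides by n - 1.

```r
# Hypothetical paired measurements
x <- c(7.11, 6.12, 6.18, 6.44, 6.31, 7.14)
y <- c(7.74, 8.53, 9.72, 10.09, 11.69, 11.13)

cross <- (x - mean(x)) * (y - mean(y))   # x-deviation times its associated y-deviation

cov_pop  <- sum(cross) / length(x)         # divide by n, as in the steps above
cov_samp <- sum(cross) / (length(x) - 1)   # divide by n - 1, which is what cov() does

all.equal(cov_samp, cov(x, y))   # TRUE
```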
# Covariance by hand (computational formula, n - 1 denominator)
( sum(DATA[1:49,6] * DATA[1:49,7]) - prod(sum(DATA[1:49,6]), sum(DATA[1:49,7])) / length(DATA[1:49,6]) ) / ( length(DATA[1:49,6]) - 1 )
# Same result with the built-in function
cov(DATA[1:49,6], DATA[1:49,7])
Unlike the covariance, Pearson's correlation coefficient has no unit and is bounded between -1 and 1.

# Correlation by hand: covariance divided by the product of the standard deviations
cov(DATA[1:49,6], DATA[1:49,7]) / (sd(DATA[1:49,6]) * sd(DATA[1:49,7]))
# Same result with the built-in function
cor(DATA[1:49,6], DATA[1:49,7])
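The absence of unit can be illustrated with a small sketch: changing the unit of one variable multiplies the covariance but leaves the correlation unchanged. The vectors `x` and `y` are hypothetical illustrative values.

```r
# Hypothetical paired measurements
x <- c(7.11, 6.12, 6.18, 6.44, 6.31, 7.14)
y <- c(7.74, 8.53, 9.72, 10.09, 11.69, 11.13)

r <- cov(x, y) / (sd(x) * sd(y))   # Pearson's correlation coefficient by hand
all.equal(r, cor(x, y))            # TRUE

# Express x in a unit ten times smaller
cov(x * 10, y)   # ten times cov(x, y): the covariance depends on the units
cor(x * 10, y)   # unchanged: the correlation does not

abs(r) <= 1      # TRUE: r always lies between -1 and 1
```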