Research methodology
R Statistics – Introduction
Dr. Sanna Härkönen, R&D Manager, Bitcomp Oy
Contents (date, topic, contents)
15.11. Introduction to R. Basic use of RStudio; basic commands (reading and writing data, using data frames)
17.11. Modeling examples. Example studies with R
21.11. Introduction to group work; model fitting. Aggregating, plotting, linear regression and its interpretation
23.11. (1) Model validation. RMSE, bias, t-test
23.11. (2) GIS analysis with R. Rasters, shapefiles -> mapping
25.11. Group presentations. Best practices: using R for data interpretation in scientific reports and studies
R Statistics
Script language
Great for data analysis and statistical computing
Efficient vector and matrix calculations
Advantages:
Suitable for any programming task, modeling, etc.
Versatile packages for environmental analysis, for example data clustering, decision trees, kNN imputation and GIS data analysis
Links:
https://www.r-project.org/
https://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf
https://www.analyticsvidhya.com/blog/2015/07/guide-data-visualization-r/
Just Google it: there is a huge number of tutorials and code examples available!
RStudio
[Screenshot of RStudio with two labeled panes: code window and console]
Tips
R code: a <- b + c is the same as a = b + c
Variable names are case sensitive (a is not the same as A)
Running code: select the desired line(s) in the code window and press CTRL + ENTER
Clear the console: CTRL + L
Show previous code lines in the console: up arrow
https://www.rstudio.com/wp-content/uploads/2016/10/r-cheat-sheet-3.pdf
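The two tips on assignment and case sensitivity can be tried directly in the console. A minimal sketch; the variable names and values are made up for illustration:

```r
# Illustrative values only
b <- 2
c <- 3

a <- b + c   # assignment with <-
A = b * c    # assignment with =; A is a different variable from a

print(a)     # 5
print(A)     # 6
```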
1 Basic commands
Reading data in: read.csv()
Checking the first lines: head()
Checking summary statistics: summary()
Data frames: data.frame()
Creating a new column in a data frame and calculating its value:
    my_dataframe$my_new_variable <- my_dataframe$var1 + my_dataframe$var2
Removing a column:
    my_dataframe$my_new_variable <- NULL
Conditionals: ifelse()
Taking a subset: subset()
Plotting data: plot()
Writing data out: write.csv()
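The commands listed above can be chained on a small example. This is only a sketch: the data frame contents, the size_class column and the threshold of 25 are invented for illustration, and a temporary file stands in for a real CSV path.

```r
# Toy data frame; column values are invented for illustration
my_dataframe <- data.frame(var1 = c(1, 2, 3, 4), var2 = c(10, 20, 30, 40))

# New column calculated from existing ones
my_dataframe$my_new_variable <- my_dataframe$var1 + my_dataframe$var2

# Conditional column: label rows by an arbitrary threshold
my_dataframe$size_class <- ifelse(my_dataframe$my_new_variable > 25, "large", "small")

# Keep only the rows that meet a condition
large_rows <- subset(my_dataframe, size_class == "large")

# Remove a column
my_dataframe$my_new_variable <- NULL

# Write to a temporary file and read it back
out_file <- tempfile(fileext = ".csv")
write.csv(my_dataframe, out_file, row.names = FALSE)
check <- read.csv(out_file)
head(check)
summary(check)
```

Setting row.names = FALSE keeps write.csv() from adding an extra column of row numbers to the file.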
Exercise 1
Download "Modeling_data_all.csv" from the Wiki
Read the modeling data set into RStudio as an object called A:
    A <- read.csv("C:/temp/Modeling_data_all.csv")
Check the first lines of your data set:
    head(A)
Check the summary statistics of data set A:
    summary(A)
Calculate a new variable N in data frame A (number of stems per hectare, based on mean diameter D, cm, and total basal area BA, m2/ha):
    A$N <- A$BA / (pi * (0.5 * A$D / 100)^2)
Calculate a new variable "mean_stem_volume1" in data frame A, based on total volume and N:
    A$mean_stem_volume1 <- A$TOTAL_VOLUME / A$N
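To see what the stem number formula does, it may help to work through a single stand by hand. The values below are made up, and the unit of total volume (m3/ha) is an assumption, since the slide does not state it:

```r
# Made-up stand values for illustration
BA <- 20             # total basal area, m2/ha
D <- 20              # mean diameter, cm
TOTAL_VOLUME <- 150  # total volume, assumed to be m3/ha

# Basal area of one mean tree, m2: circle area with radius D/2, converted from cm to m
tree_ba <- pi * (0.5 * D / 100)^2

# Stems per hectare: total basal area divided by the basal area of one tree
N <- BA / tree_ba

# Mean stem volume, m3 per stem
mean_stem_volume1 <- TOTAL_VOLUME / N

round(N)                      # 637
round(mean_stem_volume1, 3)   # 0.236
```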
Calculate a new variable mean_stem_volume2 using the Laasasenaho volume function (note: the natural logarithm ln is log() in R)
Laasasenaho volume function (V, liters) based on diameter D (cm), Scots pine:
    ln(V) = -5.39417 + 3.48060 * ln(2 + 1.25 * D) - 0.039884 * D
    A$mean_stem_volume2 <- exp(-5.39417 + 3.48060 * log(2 + 1.25 * A$D) - 0.039884 * A$D) / 1000
(dividing by 1000 converts liters to m3)
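The same equation can also be wrapped in a function, which makes it easier to reuse and to check against single values. A sketch only: the function name laasasenaho_pine_m3 and the test diameters are hypothetical, while the coefficients are the ones given above.

```r
# Hypothetical helper wrapping the Scots pine equation from the slide
# d_cm: diameter in cm; returns stem volume in m3
laasasenaho_pine_m3 <- function(d_cm) {
  # Natural logarithm of volume in liters
  ln_v <- -5.39417 + 3.48060 * log(2 + 1.25 * d_cm) - 0.039884 * d_cm
  # Back-transform and convert liters to m3
  exp(ln_v) / 1000
}

# Vectorized: works on a whole column at once
d_values <- c(10, 20, 30)   # illustrative diameters, cm
laasasenaho_pine_m3(d_values)
```

With the exercise data, the call would be A$mean_stem_volume2 <- laasasenaho_pine_m3(A$D). For a 20 cm tree the function gives roughly 0.2 m3.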
Print summary statistics of your data set:
    summary(A)
Check visually how well the two mean stem volumes correlate with each other:
    plot(x, y)
Draw boxplots showing 1) mean stem volume, 2) total volume and 3) the difference between mean_stem_volume1 and mean_stem_volume2 by tree species class and site type (x is the variable to plot, y the grouping variable):
    boxplot(x~y)
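The plot(x, y) and boxplot(x~y) placeholders could be filled in along these lines. The data values are invented and the numeric coding of SP_GROUP is an assumption; only the column names follow the exercise. The helper column volume_diff and the 1:1 line are additions for illustration.

```r
# Toy data; values are invented, column names follow the exercise
A <- data.frame(
  SP_GROUP = c(1, 1, 2, 2, 3, 3),
  mean_stem_volume1 = c(0.10, 0.25, 0.15, 0.30, 0.08, 0.20),
  mean_stem_volume2 = c(0.12, 0.22, 0.17, 0.33, 0.07, 0.18)
)

# Scatter plot of the two volume estimates, with a 1:1 reference line
plot(A$mean_stem_volume1, A$mean_stem_volume2,
     xlab = "mean_stem_volume1 (m3)", ylab = "mean_stem_volume2 (m3)")
abline(0, 1)

# Correlation coefficient as a numeric complement to the plot
r <- cor(A$mean_stem_volume1, A$mean_stem_volume2)

# Difference between the two estimates, shown by species group
A$volume_diff <- A$mean_stem_volume1 - A$mean_stem_volume2
boxplot(volume_diff ~ SP_GROUP, data = A,
        xlab = "SP_GROUP", ylab = "volume difference (m3)")
```

Points close to the 1:1 line indicate that the two estimates agree; a boxplot centered away from zero suggests a systematic difference for that group.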
Aggregate the data by species and site type:
    A_agg <- aggregate(A, list(A$SP_GROUP, A$FOREST_TYPE), mean)
Consider how you could use R to interpret your modeling data in the "Material" chapter of a scientific report
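The aggregate() call above applies mean() to every column of A; for any non-numeric column, mean() returns NA with a warning. One possible alternative is the formula interface, which names the columns to average explicitly. A sketch with invented data; only the column names SP_GROUP, FOREST_TYPE, D and BA come from the exercise:

```r
# Toy data; values are invented for illustration
A <- data.frame(
  SP_GROUP = c(1, 1, 2, 2),
  FOREST_TYPE = c(1, 1, 1, 2),
  D = c(20, 24, 18, 30),
  BA = c(22, 26, 15, 28)
)

# Mean of D and BA for each combination of species group and site type
A_agg <- aggregate(cbind(D, BA) ~ SP_GROUP + FOREST_TYPE, data = A, FUN = mean)
A_agg
```

A table like A_agg, one row per species group and site type, is the kind of summary that could go into the "Material" chapter.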