Useful packages for visualisation, GIS analysis and more


R statistics: Useful packages for visualisation, GIS analysis and more

Packages in R
A huge list of packages is available (statistics, analysis, visualization, …):
https://cran.r-project.org/
https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages
Installing a package: install.packages("package_name")
Once it is installed, load it with: library(package_name)
You can also make your own package.
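The install-then-load pattern above can be wrapped so that a package is only downloaded when it is missing. The following is a minimal sketch; the helper name load_or_install is hypothetical and not part of the slides, and it assumes a CRAN mirror is reachable if an installation is actually needed.

```r
# Hypothetical helper (not from the slides): load a package and
# install it from CRAN first only if it is not available yet.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers an installation.
load_or_install("stats")
```

The same call with "ggplot2" or "raster" would install the package on first use and simply load it afterwards.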

Data visualization in R
Basic commands: plot(x, y), boxplot(y ~ x), hist(y), …
    plot(A$VOLUME ~ A$SP_GROUP, xlab = "Species", ylab = "Total volume, m3")
    hist(A$VOLUME, xlab = "Total volume, m3/ha")
    plot(A$D, A$VOLUME, xlab = "Diameter, cm", ylab = "Total volume, m3")

ggplot2 package for visualization
http://r4stats.com/examples/graphics-ggplot2/
http://t-redactyl.io/blog/2016/04/creating-plots-in-r-using-ggplot2-part-10-boxplots.html

Code examples
Plotting with class-specific colours and modifying the y-axis range:
    library(ggplot2)
    ggplot(A2, aes(y = TOTAL_VOLUME, x = H)) +
      geom_point(aes(colour = as.factor(SP_GROUP))) +
      xlab("Mean height, m") +
      ylab("Total volume, m3/ha") +
      guides(colour = guide_legend(title = "Species group")) +
      ylim(0, 600)   # sets the y-axis range; points outside it are dropped

Code examples
Plotting data with a fitted line, using a data set called A2:
    ggplot(A2, aes(y = TOTAL_VOLUME, x = H)) +
      geom_point() +
      geom_smooth(method = "lm")
Adding axis labels as well:
    ggplot(A2, aes(y = TOTAL_VOLUME, x = H)) +
      geom_point() +
      geom_smooth(method = "lm") +
      xlab("Mean height, m") +
      ylab("Total volume, m3/ha")

Code examples
Barplot showing the frequency of forest types within each species group:
    ggplot(A2, aes(x = as.factor(SP_GROUP), fill = as.factor(FOREST_TYPE))) +
      geom_bar() +
      guides(fill = guide_legend(title = "Forest type")) +
      xlab("Species group")

Code examples
Pie chart showing the frequency of the different classes:
    ggplot(A2, aes(x = factor(""), fill = as.factor(FOREST_TYPE))) +
      geom_bar() +
      coord_polar(theta = "y") +
      scale_x_discrete("") +
      guides(fill = guide_legend(title = "Forest type"))

Code examples
Visualising several distributions in one graph:
    # refer to the column by name inside aes(), not as A2$SP_GROUP
    ggplot(A2, aes(x = TOTAL_VOLUME)) +
      geom_density(aes(fill = factor(SP_GROUP)), alpha = 0.8) +
      guides(fill = guide_legend(title = "Tree species group")) +
      xlab("Total volume (m3/ha)") +
      ylab("Density")

GIS analysis in R
Several packages are available:
    raster
    sp
    rgdal
Nice tutorials are available, see e.g. http://neondataskills.org/R/Raster-Data-In-R/

Code examples
Reading raster data in:
    library(raster)
    P <- raster("C:/Temp/testraster.tif")
Resampling to a lower resolution (current pixel size * 100):
    P2 <- aggregate(P, fact = 100, fun = mean)
Reclassifying:
    # triplets of (from, to, new value): values in (1, 3] become 1 and
    # values in (4, 7] become 2; intervals are open on the left by default
    classes <- c(1, 3, 1, 4, 7, 2)
    P3 <- reclassify(P, classes)
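To try aggregation and reclassification without a GeoTIFF on disk, the same steps can be run on a small in-memory raster. This is only a sketch under stated assumptions: the 10 x 10 grid and its values 1 to 7 are made up for illustration, and the breaks 0, 3, 7 are chosen so that the integer values 1 to 3 and 4 to 7 each fall inside a left-open interval.

```r
library(raster)

# Made-up 10 x 10 raster holding the integers 1..7 (stands in for a GeoTIFF)
P <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(P) <- rep(1:7, length.out = ncell(P))

# Merge each 2 x 2 block of cells into one cell holding the block mean
P2 <- aggregate(P, fact = 2, fun = mean)

# Reclassification table, one row per class: from, to, new value.
# With the default right = TRUE the intervals are (0, 3] and (3, 7].
rcl <- matrix(c(0, 3, 1,
                3, 7, 2), ncol = 3, byrow = TRUE)
P3 <- reclassify(P, rcl)

# Cell counts per new class
table(values(P3))
```

Writing the table as a three-column matrix makes the from/to/new-value triplets explicit, which is easier to check than a flat vector.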

Code examples
    VAR1 <- raster("C:/Temp/mytestdata.tif")
    VAR2 <- raster("C:/Temp/mytestdata2.tif")
    # Calculating a new raster by applying some function
    # (here just a nonsense one) to the input rasters
    VAR3 <- VAR1^2 + 0.51 * VAR2

ggmap
https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf
    # Example using a data frame df, which contains the columns
    # lat, lon and class for 3 locations near Joensuu
    library(ggmap)
    mapsat <- get_map(location = c(lon = mean(df$lon), lat = mean(df$lat)),
                      zoom = 10, maptype = "satellite", scale = 2)
    # plotting the map with some points; constant settings such as
    # alpha go outside aes()
    ggmap(mapsat) +
      geom_point(data = df,
                 aes(x = lon, y = lat, fill = as.factor(class)),
                 alpha = 0.8, size = df$class, shape = 21) +
      guides(fill = "none")

    maproad <- get_map(location = c(lon = mean(df$lon), lat = mean(df$lat)),
                       zoom = 10, maptype = "roadmap", scale = 2)
    # plotting the map with some points; fill = "red" is set outside
    # aes() so that the points are actually drawn in red
    ggmap(maproad) +
      geom_point(data = df, aes(x = lon, y = lat),
                 fill = "red", alpha = 0.8, size = df$class, shape = 21)

Questions?

R modeling exercise in a nutshell
Linear regression
    Visually check the correlation of the dependent vs. independent variables in your data set
    Check the normality of the variables
    Do you need to transform some variables?
    Try fitting with the most relevant variables
    Check the model output: significance of variables, R-squared, residuals (homoscedasticity, normality)
Model validation
    RMSE, bias, significance of bias (t-test)
Applying the selected model with raster data
Making a thematic map by reclassifying the resulting raster
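The fitting and validation steps listed above can be sketched in base R as follows. The data are simulated (made-up numbers), the column names H and TOTAL_VOLUME are borrowed from the plotting examples, and the 150/50 split into fitting and validation data is an arbitrary assumption for illustration.

```r
# Simulated stand data (made-up values, for illustration only)
set.seed(1)
n  <- 200
A2 <- data.frame(H = runif(n, 5, 30))
A2$TOTAL_VOLUME <- 10 + 12 * A2$H + rnorm(n, sd = 25)

# Split into fitting and validation data (assumed 150 / 50 split)
idx   <- sample(n, 150)
train <- A2[idx, ]
valid <- A2[-idx, ]

# Correlation and normality of the variables
r_xy <- cor(train$H, train$TOTAL_VOLUME)
shapiro.test(train$TOTAL_VOLUME)

# Fit the model and inspect significance and R-squared
fit <- lm(TOTAL_VOLUME ~ H, data = train)
summary(fit)

# Residual checks: normality test here; plotting residuals against
# fitted(fit) would show whether their spread is constant
res <- residuals(fit)
shapiro.test(res)

# Validation on the held-out data
pred <- predict(fit, newdata = valid)
err  <- pred - valid$TOTAL_VOLUME      # predicted minus observed
rmse <- sqrt(mean(err^2))              # root mean square error
bias <- mean(err)                      # mean error
bias_test <- t.test(err, mu = 0)       # is the bias different from zero?
bias_test$p.value
```

A large p-value in the last step means the validation data give no evidence of systematic over- or underestimation.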

Group work
Deadline 9th December (late submissions won't be accepted!)
Try applying some of the "advanced" plotting (ggplot package)
~5-10 pages
sanna.harkonen@helsinki.fi