Research methodology R Statistics – Introduction


Research methodology R Statistics – Introduction Dr. Sanna Härkönen, R&D Manager, Bitcomp Oy

Contents
15.11. Introduction to R: basic use of RStudio; basic commands (reading and writing data, using data frames)
17.11. Modeling examples: example studies with R
21.11. Introduction to group work; model fitting: aggregating, plotting, linear regression and its interpretation
23.11. (1) Model validation: RMSE, bias, t-test
23.11. (2) GIS analysis with R: rasters, shapefiles -> mapping
25.11. Group presentations; best practices: using R for data interpretation in scientific reports and studies

R Statistics
A scripting language, well suited to data analysis and statistical computing
Efficient vector and matrix calculations
Advantages: suitable for almost any programming task, modeling, etc.
Versatile packages for environmental analysis, for example data clustering, decision trees, kNN imputation and GIS data analysis
Links:
https://www.r-project.org/
https://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf
https://www.analyticsvidhya.com/blog/2015/07/guide-data-visualization-r/
Just search online: a huge number of tutorials and code examples are available.
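A minimal sketch of what "efficient vector and matrix calculations" means in practice: arithmetic on a vector is applied to every element at once, without an explicit loop, and matrices have their own operators. The diameter values below are hypothetical and chosen only for illustration.

```r
# Vectorised arithmetic: each operation is applied element by element,
# with no explicit loop.
d_cm <- c(12, 18, 25)        # hypothetical stem diameters, cm
d_m  <- d_cm / 100           # all elements converted to metres at once
g    <- pi * (d_m / 2)^2     # basal area of each stem, m2
print(g)

# Matrix calculations: t() transposes and %*% is matrix multiplication.
M   <- matrix(1:4, nrow = 2) # filled column-wise: rows are (1, 3) and (2, 4)
MtM <- t(M) %*% M            # 2 x 2 cross-product matrix
print(MtM)
```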

RStudio
[Screenshot: the RStudio window, showing the code window and the console]

Tips
In R code, a <- b + c does the same as a = b + c (both assign the result to a).
Variable names are case sensitive (a is not the same as A).
Running code: select the desired line(s) in the code window and press CTRL + ENTER.
Clear the console: CTRL + L
Recall previous commands in the console: Up arrow
https://www.rstudio.com/wp-content/uploads/2016/10/r-cheat-sheet-3.pdf
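A short sketch of the first two tips. One caveat: inside a function call, = names an argument and does not assign. That is one reason <- is the usual convention for assignment.

```r
a <- 2 + 3      # assignment with the arrow operator
b = 2 + 3       # assignment with '=' gives the same result here
identical(a, b) # TRUE

A <- 100        # a different variable, because names are case sensitive
a == A          # FALSE: a is 5, A is 100
```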

1 Basic commands
Reading data in: read.csv()
Checking the first lines: head()
Checking summary statistics: summary()
Data frames: data.frame()
Creating a new column in a data frame and calculating its value:
my_dataframe$my_new_variable <- my_dataframe$var1 + my_dataframe$var2
Removing a column:
my_dataframe$my_new_variable <- NULL
Conditionals: ifelse()
Taking a subset: subset()
Plotting data: plot()
Writing data out: write.csv()
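The commands above can be tried together on a small data frame. This is a sketch only. The `species` and `conifer` columns and all values are hypothetical. Data are written to a temporary file so the example does not depend on any particular folder.

```r
# A small, made-up data frame (values are purely illustrative)
my_dataframe <- data.frame(
  species = c("pine", "spruce", "birch", "pine"),
  var1    = c(10, 20, 30, 40),
  var2    = c(1, 2, 3, 4)
)
head(my_dataframe)     # first rows (up to 6 by default)
summary(my_dataframe)  # summary statistics for each column

# New column calculated from existing ones (vectorised over all rows)
my_dataframe$my_new_variable <- my_dataframe$var1 + my_dataframe$var2

# ifelse() is a vectorised conditional: the test is evaluated for each row
my_dataframe$conifer <- ifelse(my_dataframe$species %in% c("pine", "spruce"), 1, 0)

# subset() keeps the rows that meet a condition
pines <- subset(my_dataframe, species == "pine")

# Remove a column by assigning NULL to it
my_dataframe$my_new_variable <- NULL

# Write the data out and read it back in; tempfile() avoids a hard-coded path
out_file <- tempfile(fileext = ".csv")
write.csv(my_dataframe, out_file, row.names = FALSE)
back_in <- read.csv(out_file)
```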

Exercise 1
Download "Modeling_data_all.csv" from the Wiki.
Read the modeling data set into RStudio as an object called A (in Windows paths, use forward slashes or doubled backslashes):
A <- read.csv("C:/temp/Modeling_data_all.csv")
Check the first lines of your data set:
head(A)
Check the summary statistics of data set A:
summary(A)
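A defensive way to load the exercise file is to check that it exists and contains the columns used on the following slides (D, BA, TOTAL_VOLUME, SP_GROUP, FOREST_TYPE) before any calculations start. This is only a sketch. `read_modeling_data` is a hypothetical helper, not part of R or of the course material, and the path is the one from the exercise.

```r
# Hypothetical helper: read the CSV and stop early if expected columns are absent
read_modeling_data <- function(path,
                               needed = c("D", "BA", "TOTAL_VOLUME",
                                          "SP_GROUP", "FOREST_TYPE")) {
  if (!file.exists(path)) stop("File not found: ", path)
  A <- read.csv(path)
  absent <- setdiff(needed, names(A))
  if (length(absent) > 0) {
    stop("Missing columns: ", paste(absent, collapse = ", "))
  }
  A
}

path <- "C:/temp/Modeling_data_all.csv"  # adjust to where you saved the file
if (file.exists(path)) {
  A <- read_modeling_data(path)
  print(head(A))
  print(summary(A))
}
```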

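Calculate a new variable N in data frame A (number of stems per ha, based on mean diameter D, cm, and total basal area BA, m2/ha). The term pi * (0.5 * D / 100)^2 is the basal area (m2) of one stem of mean diameter, so dividing the total basal area by it gives the number of stems:
A$N <- A$BA / (pi * (0.5 * A$D / 100)^2)
Calculate a new variable "mean_stem_volume1" in data frame A, based on total volume and N:
A$mean_stem_volume1 <- A$TOTAL_VOLUME / A$N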
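A worked example of the two formulas. The values of D, BA and TOTAL_VOLUME below are hypothetical. TOTAL_VOLUME is assumed to be in m3/ha, which makes mean_stem_volume1 a volume per stem in m3. For the first row, one stem of 20 cm has a basal area of pi * 0.1^2 ≈ 0.0314 m2, so 20 m2/ha corresponds to about 637 stems/ha.

```r
# Hypothetical plot-level values, for illustration only
A <- data.frame(D = c(20, 30), BA = c(20, 25), TOTAL_VOLUME = c(150, 250))

# Basal area (m2) of one stem of mean diameter: pi * r^2 with r = 0.5 * D / 100
g_mean <- pi * (0.5 * A$D / 100)^2

A$N <- A$BA / g_mean                          # stems per hectare
A$mean_stem_volume1 <- A$TOTAL_VOLUME / A$N   # m3 per stem (if volume is m3/ha)
print(A)
```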

Calculate a new variable mean_stem_volume2 using the Laasasenaho volume function (note: the natural logarithm ln is log() in R).
Laasasenaho volume function for Scots pine (V = stem volume in liters, D = diameter in cm):
ln(V) = -5.39417 + 3.48060 * ln(2 + 1.25 * D) - 0.039884 * D
A$mean_stem_volume2 <- exp(-5.39417 + 3.48060 * log(2 + 1.25 * A$D) - 0.039884 * A$D) / 1000
(dividing by 1000 converts liters to m3)
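The same calculation can be wrapped in a small function so that it is written once and applied to a whole column. The name `laasasenaho_pine_m3` is hypothetical. The coefficients are the Scots pine values from the slide. For D = 20 cm the formula gives about 196 liters, i.e. roughly 0.196 m3.

```r
# Laasasenaho diameter-only volume function for Scots pine (coefficients from the slide).
# D: diameter in cm. The formula gives ln(volume in liters); the result is returned in m3.
laasasenaho_pine_m3 <- function(D) {
  ln_v_liters <- -5.39417 + 3.48060 * log(2 + 1.25 * D) - 0.039884 * D
  exp(ln_v_liters) / 1000   # liters -> m3
}

laasasenaho_pine_m3(c(10, 20, 30))   # works on a whole vector at once
```

With the course data this would be used as `A$mean_stem_volume2 <- laasasenaho_pine_m3(A$D)`.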

Print summary statistics for your data set:
summary(A)
Check visually how well the two mean stem volumes correlate:
plot(x, y)
Draw boxplots of 1) mean stem volume, 2) total volume and 3) the difference between mean_stem_volume1 and mean_stem_volume2, by tree species class and site type:
boxplot(x ~ y)
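A sketch of these checks on hypothetical toy data laid out like the course data set. The column `vol_diff` is a hypothetical name for the difference variable. The formula `y ~ SP_GROUP + FOREST_TYPE` draws one box for each combination of species class and site type. The plots go to a temporary PDF file so the script also runs outside RStudio. In RStudio you can omit the `pdf()` and `dev.off()` calls to see the plots in the Plots pane.

```r
# Hypothetical toy data standing in for the course data set
A <- data.frame(
  D            = c(15, 20, 25, 18, 22, 28),
  BA           = c(18, 22, 25, 15, 20, 27),
  TOTAL_VOLUME = c(110, 160, 210, 90, 150, 240),
  SP_GROUP     = c(1, 1, 1, 2, 2, 2),
  FOREST_TYPE  = c(1, 2, 1, 2, 1, 2)
)
A$N <- A$BA / (pi * (0.5 * A$D / 100)^2)
A$mean_stem_volume1 <- A$TOTAL_VOLUME / A$N
A$mean_stem_volume2 <- exp(-5.39417 + 3.48060 * log(2 + 1.25 * A$D) - 0.039884 * A$D) / 1000
A$vol_diff <- A$mean_stem_volume1 - A$mean_stem_volume2

r <- cor(A$mean_stem_volume1, A$mean_stem_volume2)  # Pearson correlation
print(r)

pdf(tempfile(fileext = ".pdf"))  # omit this and dev.off() when working in RStudio
plot(A$mean_stem_volume1, A$mean_stem_volume2,
     xlab = "mean_stem_volume1", ylab = "mean_stem_volume2")
abline(0, 1, lty = 2)            # 1:1 line; points on it agree exactly
boxplot(mean_stem_volume1 ~ SP_GROUP + FOREST_TYPE, data = A)
boxplot(TOTAL_VOLUME ~ SP_GROUP + FOREST_TYPE, data = A)
boxplot(vol_diff ~ SP_GROUP + FOREST_TYPE, data = A)
invisible(dev.off())
```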

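Aggregate the data by species and site type:
A_agg <- aggregate(A, list(A$SP_GROUP, A$FOREST_TYPE), mean)
(mean is applied to every column of A, so any non-numeric column gives NA with a warning; the grouping columns appear as Group.1 and Group.2 in the result.)
Consider how you could use R to interpret your modeling data in the "Material" chapter of a scientific report.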
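An alternative to the slide's call is the formula interface of `aggregate()`. It averages only the columns you list, and the grouping columns keep their own names. Counting the observations per group with `FUN = length` gives the kind of descriptive table a "Material" chapter usually needs. The toy data and the column name `n_plots` are hypothetical.

```r
# Hypothetical toy data in the same layout as the course data set
A <- data.frame(
  SP_GROUP     = c(1, 1, 2, 2, 2),
  FOREST_TYPE  = c(1, 1, 1, 2, 2),
  D            = c(18, 22, 20, 25, 27),
  TOTAL_VOLUME = c(150, 170, 120, 200, 220)
)

# Group means of selected numeric columns; SP_GROUP and FOREST_TYPE keep their names
A_agg <- aggregate(cbind(D, TOTAL_VOLUME) ~ SP_GROUP + FOREST_TYPE,
                   data = A, FUN = mean)
print(A_agg)

# Number of observations per group, e.g. for a descriptive table
A_n <- aggregate(D ~ SP_GROUP + FOREST_TYPE, data = A, FUN = length)
names(A_n)[3] <- "n_plots"
print(A_n)
```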