Lecture 1 Introduction Olivier MISSA, Advanced Research Skills.

2 Aims
- Introduce the use of R for advanced statistical analyses, beyond "Statistics for Ecologists".
- Demonstrate these analyses on a broad range of questions and situations.
- Develop your understanding of statistical programming.
- Empower you to tackle future analytical challenges on your own.

3 Aims (continued)
Other skills will be developed too:
- Produce posters using CorelDraw (a graphics package).
- Learn how to write a grant proposal.

4 Learning Outcomes
At the end of the module, you should be able to:
- Determine which test to use for significance testing.
- Explore the inherent structure of your data through a wide range of multivariate techniques.
- Work out which model "best explains" the variable you are interested in.
- Produce high-quality, publication-ready graphs using the full graphical capabilities of R.

5 Organisation: Staff
- Olivier Missa (OM): module organiser, R sessions
- Emma Rand (ER): R sessions
- Phil Roberts (PTR): CorelDraw session
- Peter Mayhew (PJM): grant writing session

6 Organisation: Structure
- 9 theoretical lectures (OM) on advanced statistics
- 9 practical sessions (OM & ER) on using R
- 1 practical session (PTR) on CorelDraw
- 1 tutorial session (PJM) on grant writing

7 Organisation: Content
- L1: Introduction
- L2-L4: Linear Models
- L5-L6: GLMs & Mixed-effects Models
- L7: Non-Linear Models
- L8-L9: Multivariate Analyses
Each lecture is accompanied by a practical session.

8 Organisation: Assessment
An open data analysis exercise, written up as a report with Introduction, Materials & Methods, Results and Discussion sections. Particular emphasis is placed on justifying the analyses and interpreting the results properly.

9 What is R?
"R is a language and environment for statistical computing and graphics" (R website).
R is a programming language, in fact a dialect of S, which was developed at Bell Labs, chiefly by John Chambers, from the mid-1970s onwards. Bell Labs then sold S to MathSoft (now Insightful Corp.), which developed it further into S-PLUS, a commercial statistical package. In the 1990s, two statisticians from New Zealand, Ross Ihaka and Robert Gentleman, wrote R from scratch as a new, free implementation of the S language. Since then, R has continued to grow in scale and scope; it is currently maintained by a core team of about 20 people across the globe.

10 Why use R? The key benefits
- Free: it won't cost you a penny, ever.
- Open: how things are calculated is not hidden.
- Fully customisable: the user is in full control.
- Cutting edge: professional statisticians use it to create new techniques.
- Very widespread (increasingly so): thousands of contributors (packages), millions of users.
- Supported by an international user community happy to provide help and assistance.

11 Why use R? The drawback
A steep learning curve:
- you need to learn the language;
- you need to know what you are doing (statistics).

12 What is R good for?
Absolutely everything (to do with data):
- Statistics
- Modelling
- Programming / simulations
- Graphics (from very simple to complex, 2D, 3D, ...)
- Database (simple relational functions)
- Bioinformatics (Bioconductor project)
- A platform for interacting with other software (e.g. GGobi, WinBUGS, MySQL, GRASS GIS)

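13 Example of a session
```r
data(volcano)        # built-in elevation matrix of Maunga Whau (Auckland), 10 m grid
dim(volcano)         # 87 rows x 61 columns
volcano              # prints the whole matrix: columns [,1] ... [,61], rows [1,] ... [87,]
volcano[1:3, 1:3]    # top-left 3 x 3 corner of the matrix
```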

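14
```r
range(volcano)       # lowest and highest elevation
mean(volcano)
sd(volcano)          # on the lecture's R version this printed one SD per column (see source below)
?sd                  # help('sd') does the same
sd                   # a function name without () prints its source code
```
Source of sd() as printed by the R version used for this lecture (note the column-by-column branch for matrices):
```r
function (x, na.rm = FALSE)
{
    if (is.matrix(x))
        apply(x, 2, sd, na.rm = na.rm)
    else if (is.vector(x))
        sqrt(var(x, na.rm = na.rm))
    else if (is.data.frame(x))
        sapply(x, sd, na.rm = na.rm)
    else sqrt(var(as.vector(x), na.rm = na.rm))
}
...
```
In current versions of R, sd() no longer has this matrix branch: sd(volcano) returns a single standard deviation computed from all the values.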
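Because sd() on a matrix has behaved differently across R versions, it is safer to state explicitly which standard deviation you want. A minimal sketch using only base R: one SD per column (what the old `apply(x, 2, sd)` branch computed) versus one SD of all values pooled together.

```r
data(volcano)

# One standard deviation per column, as the old matrix branch of sd() did
col.sd <- apply(volcano, 2, sd)

# One standard deviation of all 5307 elevations pooled together
all.sd <- sd(as.vector(volcano))

length(col.sd)   # 61
all.sd
```

Writing `apply()` (or `as.vector()`) explicitly gives the same answer whatever R version runs the script.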

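15
```r
sd(as.vector(volcano))            # SD of all 5307 elevations pooled together
summary(as.vector(volcano))       # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
volcano.v <- as.vector(volcano)   # flatten the matrix into a plain vector
dim(volcano.v)                    # NULL: a vector has no dim attribute
length(volcano.v)                 # 5307
61 * 87                           # 5307
volcano.v[1:87] == volcano[, 1]   # all 87 values TRUE
volcano.v[1:61] == volcano[1, ]   # only three values (out of 61) are TRUE
```
R stores a matrix column by column, so as.vector() strings the columns together one after another: the first 87 values of volcano.v are column 1, not row 1.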
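Column-major storage means that cell [i, j] of an n-row matrix sits at position (j - 1) * n + i of the flattened vector. The sketch below checks that formula on every cell of volcano; the helper name `flat_index` is hypothetical, introduced only for this illustration.

```r
data(volcano)
volcano.v <- as.vector(volcano)

# Hypothetical helper: position of cell [i, j] in the column-major vector
flat_index <- function(i, j, nrow) (j - 1) * nrow + i

n <- nrow(volcano)   # 87
# Every (i, j) pair of the grid, then compare the two ways of reading the data
cells <- expand.grid(i = seq_len(n), j = seq_len(ncol(volcano)))
same <- volcano.v[flat_index(cells$i, cells$j, n)] ==
        volcano[cbind(cells$i, cells$j)]
all(same)            # TRUE

# Reading row 1 from the vector means jumping 87 positions at a time
row1 <- volcano.v[flat_index(1, 1:61, n)]
all(row1 == volcano[1, ])   # TRUE
```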

16
```r
plot(volcano)   # a matrix is plotted as column 1 (x axis) against column 2 (y axis)
```
Not very useful: it only shows that elevations in columns 1 and 2 tend to be correlated.
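The pattern seen in that plot can be put into a single number. A short sketch using base R's cor() on the same two columns that plot(volcano) uses:

```r
data(volcano)
# plot(volcano) draws volcano[, 1] against volcano[, 2];
# Pearson's correlation quantifies how closely they track each other
r12 <- cor(volcano[, 1], volcano[, 2])
r12
```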

17
```r
plot(volcano.v, pch = 20)          # elevations in storage (column-major) order
hist(volcano, prob = TRUE,
     xlab = "volcano elevation (m)")
x <- seq(90, 200, 1)               # not actually needed: curve() generates its own x values
curve(dnorm(x, mean = mean(volcano.v),
            sd = sd(volcano.v)), add = TRUE)   # overlay the fitted normal density
shapiro.test(volcano.v)
## Error in shapiro.test(volcano.v) : sample size must be between 3 and 5000
smpl <- sample(volcano.v, 5000)    # random subsample of 5000 elevations
shapiro.test(smpl)
##   Shapiro-Wilk normality test
## data:  smpl
## W = (value not shown), p-value < 2.2e-16
```
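Because sample() draws a different subsample each time, the W statistic changes from run to run. A minimal sketch of a reproducible version; `shapiro_subsample` and its `n` and `seed` arguments are hypothetical names chosen for this illustration, not part of base R.

```r
# Hypothetical helper: Shapiro-Wilk test on a reproducible random subsample,
# because shapiro.test() only accepts between 3 and 5000 values
shapiro_subsample <- function(x, n = 5000, seed = 1) {
  x <- x[is.finite(x)]          # drop NA, NaN and Inf values
  if (length(x) > n) {
    set.seed(seed)              # fix the random number stream
    x <- sample(x, n)           # draw the subsample without replacement
  }
  shapiro.test(x)
}

data(volcano)
res <- shapiro_subsample(as.vector(volcano))
res$p.value
```

Note that calling set.seed() inside the helper also resets the global random number stream, which matters if the rest of the script relies on it.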

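18
```r
library(nortest)          # package of normality tests (install.packages("nortest") if needed)
ad.test(volcano)          # Anderson-Darling
##   Anderson-Darling normality test
## data:  volcano
## A = (value not shown), p-value < 2.2e-16
cvm.test(volcano)         # Cramer-von Mises
lillie.test(volcano)      # Lilliefors (Kolmogorov-Smirnov)
pearson.test(volcano)     # Pearson chi-square
sf.test(smpl)             # Shapiro-Francia: also limited to 5000 values, hence smpl
qqnorm(volcano.v)         # normal Q-Q plot
qqline(volcano.v, col = "red")   # reference line through the first and third quartiles
```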
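qqnorm() pairs each observation with the standard normal quantile of its rank, using ppoints() for the plotting positions. A sketch that rebuilds those theoretical quantiles by hand and compares them with what qqnorm() returns when asked not to draw:

```r
data(volcano)
volcano.v <- as.vector(volcano)
n <- length(volcano.v)

# Normal quantiles at the plotting positions ppoints(n),
# re-ordered so that each one lines up with the rank of its observation
theo <- qnorm(ppoints(n))[order(order(volcano.v))]

# Without drawing, qqnorm() returns list(x = theoretical, y = observed)
q <- qqnorm(volcano.v, plot.it = FALSE)
all.equal(q$x, theo)   # TRUE
```

If the data were normal, plotting q$y against q$x would give points close to a straight line.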

19
```r
x <- 10 * (1:nrow(volcano))   # 10, 20, ..., 870 (one value per row)
y <- 10 * (1:ncol(volcano))   # 10, 20, ..., 610 (one value per column)
image(x, y, volcano)
```
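Since volcano has 87 rows and 61 columns on a 10 m grid, x runs along the rows and y along the columns. A quick check of the coordinate vectors before plotting:

```r
data(volcano)
x <- 10 * (1:nrow(volcano))   # one x coordinate (m) per row
y <- 10 * (1:ncol(volcano))   # one y coordinate (m) per column
range(x)   # 10 870
range(y)   # 10 610
# Here image() receives one coordinate per row and one per column (cell centres)
c(length(x) == nrow(volcano), length(y) == ncol(volcano))
```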

20 (x and y as on the previous slide)
```r
image(x, y, volcano)
image(x, y, volcano, asp = 1)   # asp = 1: one metre has the same length on both axes
```

21 (x and y as before)
```r
image(x, y, volcano, asp = 1,
      col = terrain.colors(100),   # terrain palette of 100 colours, low to high
      axes = FALSE)                # give asp only once: repeating it makes R stop with an error
```
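By default image() splits the range of the data into length(col) equal-width intervals and paints each cell with the colour of its interval. The sketch below mimics that mapping with cut(), assuming the default zlim (the data range); it imitates, rather than calls, image()'s internal code.

```r
data(volcano)
pal <- terrain.colors(100)                             # 100 colours, low to high
zr <- range(volcano)                                   # default zlim: the data range
brks <- seq(zr[1], zr[2], length.out = length(pal) + 1)  # 101 equal-width breaks

# Colour bin (1 to 100) of every cell; include.lowest keeps the minimum in bin 1
bin <- cut(volcano, breaks = brks, include.lowest = TRUE, labels = FALSE)
cell.col <- pal[bin]

head(table(bin))   # number of cells in the lowest occupied bins
```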

22 (contours added on top of the image from the previous slide)
```r
contour(x, y, volcano,
        levels = seq(90, 200, by = 5),   # a contour line every 5 m
        add = TRUE, col = "peru")        # draw over the existing image
```
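Requested contour levels that fall outside the data range simply produce no line. A sketch using base R's contourLines(), which returns contour paths instead of drawing them, to see which of the requested levels actually appear:

```r
data(volcano)
x <- 10 * (1:nrow(volcano))
y <- 10 * (1:ncol(volcano))
lev <- seq(90, 200, by = 5)      # the levels requested on the slide

# A list with one element per contour path, each a list(level, x, y)
cl <- contourLines(x, y, volcano, levels = lev)
drawn <- sort(unique(sapply(cl, `[[`, "level")))

range(volcano)                   # levels outside this range cannot appear
setdiff(lev, drawn)              # requested levels that produced no line
```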

23 (x and y as before: contour lines every 5 m, then every 10 m)
```r
image(x, y, volcano, asp = 1,
      col = terrain.colors(100), axes = FALSE)
contour(x, y, volcano,
        levels = seq(90, 200, by = 5),    # a contour line every 5 m
        add = TRUE, col = "peru")
image(x, y, volcano, asp = 1,
      col = terrain.colors(100), axes = FALSE)
contour(x, y, volcano,
        levels = seq(90, 200, by = 10),   # coarser: a contour line every 10 m
        add = TRUE, col = "peru")
```
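The two maps above differ only in the contour interval. A sketch wrapping the repeated calls in a small function so that the interval becomes a single argument; `volcano_map` and its `by` argument are hypothetical names, and pdf(NULL) is used only to draw on a throwaway device when running non-interactively.

```r
# Hypothetical helper: image + contour map of volcano with a chosen interval
volcano_map <- function(by = 10, z = volcano) {
  x <- 10 * (1:nrow(z))                    # 10 m grid spacing
  y <- 10 * (1:ncol(z))
  image(x, y, z, asp = 1, col = terrain.colors(100), axes = FALSE)
  lev <- seq(90, 200, by = by)
  contour(x, y, z, levels = lev, add = TRUE, col = "peru")
  invisible(lev)                           # return the levels requested
}

pdf(NULL)                                  # off-screen device, nothing is saved
op <- par(mfrow = c(1, 2))                 # two maps side by side
lev5  <- volcano_map(by = 5)
lev10 <- volcano_map(by = 10)
par(op)
dev.off()
```

In an interactive session, drop the pdf(NULL) and dev.off() lines to see the maps on screen.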

24 Gallery of other volcano graphs (figures): image + contour, persp, surface3d, persp with shading
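A shaded perspective view like the persp panels can be drawn with base R's persp(). This sketch is adapted from the volcano example in R's own ?persp help page, which doubles the elevations to exaggerate relief; surface3d() belongs to the rgl package and is not shown here.

```r
data(volcano)
z <- 2 * volcano                 # vertical exaggeration x 2
x <- 10 * (1:nrow(z))            # 10 m grid spacing
y <- 10 * (1:ncol(z))

pdf(NULL)                        # off-screen device for a non-interactive run
pm <- persp(x, y, z, theta = 135, phi = 30, col = "green3",
            scale = FALSE,                  # keep the x:y:z aspect ratios
            ltheta = -120, shade = 0.75,    # light direction and shading strength
            border = NA, box = FALSE)
dev.off()
```

persp() invisibly returns its 4 x 4 viewing transformation matrix, which trans3d() can use to add points or lines to the same view.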

25 More classical graphs (figures): histogram + theoretical curve, boxplot, stripchart, barplot, pie chart, 3D models
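Most of these classical graphs have one-line base R equivalents. A sketch drawing four of them for the volcano elevations; grouping elevations into 10 m classes for the barplot and pie chart is an assumption made for this illustration.

```r
data(volcano)
volcano.v <- as.vector(volcano)

# Elevation classes (90,100], (100,110], ... for the barplot and pie chart
classes <- cut(volcano.v, breaks = seq(90, 200, by = 10))
counts <- table(classes)

pdf(NULL)                                  # off-screen device
op <- par(mfrow = c(2, 2))
bp <- boxplot(volcano.v, main = "Boxplot", ylab = "elevation (m)")
stripchart(volcano.v, method = "jitter", pch = ".", main = "Stripchart")
barplot(counts, main = "Barplot", las = 2)
pie(counts[counts > 0], main = "Pie chart")  # empty classes left out
par(op)
dev.off()
```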