EART20170 Computing, Data Analysis & Communication skills


EART20170 Computing, Data Analysis & Communication skills
Lecturer: Dr Paul Connolly (F18, Sackville Building), p.connolly@manchester.ac.uk

1. Data analysis (statistics): 3 lectures & practicals; open-book statistics test (2 hours).
2. Computing (Excel statistics/modelling): 2 lectures; assessed practical work.

Course notes etc.: http://cloudbase.phy.umist.ac.uk/people/connolly
Recommended reading: Cheeney (1983), Statistical Methods in Geology, George Allen & Unwin.

Recap – last lecture
- The four measurement scales: nominal, ordinal, interval and ratio.
- Two types of error: random errors (precision) and systematic errors (accuracy).
- Basic graphs: histograms, frequency polygons, bar charts, pie charts.
- Gaussian statistics describe random errors; the central limit theorem.
- Central values, dispersion, symmetry; the weighted mean.

Some common problems

Use tables

x        x − x̄      (x − x̄)²
1       −3.1667     10.0278
4       −0.1667      0.0278
4       −0.1667      0.0278
6        1.8333      3.3611
3       −1.1667      1.3611
7        2.8333      8.0278
Σ = 25              22.8333
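The deviation table above can be sketched in a few lines of Python. The six data values {1, 4, 4, 6, 3, 7} are an assumption, inferred from the quoted sums (Σx = 25 and Σ(x − x̄)² = 22.8333):

```python
# Minimal sketch of the deviation table; the data set is inferred, not given.
xs = [1, 4, 4, 6, 3, 7]
xbar = sum(xs) / len(xs)              # 25 / 6 = 4.1667
devs = [x - xbar for x in xs]         # the x - xbar column
sq_devs = [d ** 2 for d in devs]      # the (x - xbar)^2 column
total_sq = sum(sq_devs)               # 22.8333
```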

Lecture 2
- Correlation between two variables
- Classical linear regression
- Reduced major axis regression
- Propagation of errors in compound quantities

Correlation Many real-life quantities depend on something else, e.g. the dependence of rock permeability on porosity. How can we quantify the strength and direction of a linear relationship between the x and y variables?

Correlation  y = sum of all y-values  x = sum of all x-values Linear correlation (Pearson’s coefficient)  y = sum of all y-values  x = sum of all x-values  x2 = sum of all x2 values  y2 = sum of all y2 values  xy = sum of the x times y values Like other numerical measures, the population correlation coefficient is (the Greek letter ``rho'‘, ) and the sample correlation coefficient is denoted by r.

Correlation Values of r. [Figure: three scatter plots. r = +1: perfect positive correlation; r = −1: perfect negative correlation; r = 0: no correlation.]

Correlation coefficient r and r², the fraction of explained variation. r² is the fraction of the variation in y that is explained by the linear relationship; it is often called the 'goodness of fit'. E.g. if r = 0.97 is obtained then r² ≈ 0.94, so 100 × 0.94 = 94% of the total variation is explained by the linear relationship, but the remaining 6% of the variation is due to "other" causes. [Figure: r², the fraction of explained variation, plotted against r over the range −1.0 to +1.0.]

Regression analysis How can we fit an equation to a set of numerical data x, y such that it yields the best fit for all the data?

Classical linear regression An approximate fit yields a straight line that passes through the set of points in the best possible manner without being required to pass exactly through any of the points.

Classical linear regression [Figure: a straight fit line through scattered data, marking a deviation ei, the gradient m and the intercept c.] The fit line is y = mx + c, where ei is the deviation of data point i from the fit line, c is the intercept and m is the gradient. Classical linear regression assumes that error is present only in y.

How do we define a good fit?
- If the sum of all deviations is a minimum? Σ ei
- If the sum of all the absolute deviations is a minimum? Σ |ei|
- If the maximum deviation is a minimum? emax
- If the sum of all the squares of the deviations is a minimum? Σ ei²

Classical linear regression The best approach is to minimise the sum of the squares of the deviations. Formally this involves some mathematics. At each value xi the fit line predicts m·xi + c, so the deviation of the data point from the line is ei = yi − (m·xi + c). The sum of the squares is S = Σ ei² = Σ (yi − m·xi − c)².

Classical linear regression How do you find the minimum of a function? Use calculus: differentiate and set to zero. Setting ∂S/∂m = 0 and ∂S/∂c = 0 gives two simultaneous equations in m and c.

Classical linear regression Solving the two simultaneous equations yields:

m = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²)
c = (Σy − m·Σx) / n = ȳ − m·x̄
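The closed-form least-squares solution can be sketched in Python using the standard expressions for m and c (the data set is made up purely for illustration):

```python
# Least-squares fit of y = mx + c from the raw sums.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 7.1, 8.9]

n = len(x)
sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
sxy = sum(a * b for a, b in zip(x, y))

m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # gradient
c = (sy - m * sx) / n                           # intercept
```

For this data set the fit is m = 1.98, c = 1.08, close to the underlying trend of roughly 2x + 1.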

Classical linear regression [Worked example: a table with columns x, y, xy and x², with the sums Σ in the bottom row, giving the quantities needed to evaluate m and c.]

Classical linear regression Classical linear regression only considers errors in the y values of the data. How can we account for errors in both the x and y values? Use reduced major axis regression.

Reduced major axis regression [Figure: a fit line with intercept c, showing the deviations dx and dy of a data point from the line.] A method to quantify a linear relationship where both variables are subject to error. Instead of minimising Σe² = Σ(Y − y)², we minimise Σ(dy² + dx²).

Reduced major axis regression The gradient of the reduced major axis line is m = ±sqrt[ Σ(y − ȳ)² / Σ(x − x̄)² ], taking the sign of the correlation coefficient, and the intercept is c = ȳ − m·x̄, so the line passes through the centroid of the data.
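Reduced major axis regression can be sketched as below, using the standard result that the RMA gradient is sqrt[Σ(y − ȳ)²/Σ(x − x̄)²] with the sign of the correlation, and that the line passes through (x̄, ȳ). The data set is made up purely for illustration:

```python
import math

# Reduced major axis (RMA) fit for data with errors in both x and y.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.2, 4.9, 7.1, 9.0]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
ssx = sum((v - xbar) ** 2 for v in x)                    # sum (x - xbar)^2
ssy = sum((v - ybar) ** 2 for v in y)                    # sum (y - ybar)^2
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) # sign of correlation

m = math.copysign(math.sqrt(ssy / ssx), sxy)    # RMA gradient
c = ybar - m * xbar                             # intercept through centroid
```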

Reduced major axis regression [Worked example: a table with columns x, y, x − x̄, y − ȳ, (x − x̄)² and (y − ȳ)², with the sums Σ in the bottom row.]

Error propagation Every measurement of a variable has an error. Often the quoted error is one standard deviation (mean ± standard deviation). The sample standard deviation is usually our best estimate of the population standard deviation.

Error propagation Error propagation is a way of combining two or more random errors to obtain a third. The equations assume that the errors are Gaussian in nature. It is used when you must measure more than one quantity to get your final result; for example, predicting permeability from a measured porosity and grain size. The equations introduced here let you propagate the uncertainties on your data through the calculation and arrive at an uncertainty on your result. How, then, do we combine variables which have errors?

Error propagation – commonly quoted relationships (k = constant; σ denotes the Gaussian error on each variable):

Relationship                Error propagation
z = x + y  or  z = x − y    σz = sqrt(σx² + σy²)
z = k·x                     σz = k·σx
z = x·y  or  z = x/y        σz/z = sqrt((σx/x)² + (σy/y)²)
z = x^k                     σz/z = k·(σx/x)
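A minimal sketch of the standard Gaussian propagation rules: absolute errors add in quadrature for sums and differences, relative errors add in quadrature for products and quotients. The function names are my own, for illustration:

```python
import math

def err_sum(sig_x, sig_y):
    """Error on z = x + y or z = x - y (quadrature sum)."""
    return math.sqrt(sig_x ** 2 + sig_y ** 2)

def err_product(z, x, sig_x, y, sig_y):
    """Error on z = x * y or z = x / y (quadrature sum of relative errors)."""
    return abs(z) * math.sqrt((sig_x / x) ** 2 + (sig_y / y) ** 2)
```

For the bed-thickness example in the next slides, err_sum(0.3, 0.2) gives about 0.36 cm.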

Example of propagation of error Suppose we measure the thickness of a rock bed using a tape measure. The tape measure is shorter than the bed thickness, so we have to do it in two steps, x and y. We repeat the measurements 100 times and obtain the following mean and standard deviation values: x = 12.1 ± 0.3 cm and y = 4.2 ± 0.2 cm. The thickness of the bed should simply be x + y = 16.3 cm. But what about the error on the total thickness?

Example of propagation of error It is given by propagating the individual errors in quadrature: σ = sqrt(0.3² + 0.2²) = sqrt(0.13) ≈ 0.36 cm, which rounds to 0.4 cm. So the final answer for the total thickness of the bed is 16.3 ± 0.4 cm. Error propagation formulae are non-intuitive, and understanding how they are derived requires some mathematical knowledge.

More complex examples What if we have several functions of several variables? E.g. calculating density using Archimedes' Principle. The equation contains two functions and two variables, so error propagation is best done in parts: first work out the value and error of the denominator, then the value and error of the full expression. In a few weeks we will use a Monte Carlo method for solving more complex functions.
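The Monte Carlo approach mentioned above can be sketched now: draw each measured quantity from a Gaussian, push the samples through the formula, and take the spread of the results as the propagated error. This uses the bed-thickness example (x = 12.1 ± 0.3 cm, y = 4.2 ± 0.2 cm) so the result can be checked against the analytic answer:

```python
import random

# Monte Carlo error propagation for z = x + y.
random.seed(1)
N = 100_000
totals = [random.gauss(12.1, 0.3) + random.gauss(4.2, 0.2) for _ in range(N)]

mc_mean = sum(totals) / N
mc_std = (sum((t - mc_mean) ** 2 for t in totals) / (N - 1)) ** 0.5
# mc_mean is close to 16.3; mc_std is close to sqrt(0.3^2 + 0.2^2) = 0.36
```

The advantage of the Monte Carlo method is that it works unchanged for formulas too complicated for the quoted propagation rules.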

Reminder Statistics practical #2. Those not taking BIOL20451: Roscoe 3.5, 1100–1300 Tuesday. Those taking BIOL20451: Williamson 1.12, 1400–1600 Tuesday.

Some common problems Weighted mean. [Worked example: a table with columns f (weight/frequency) and x.]

What does adding two variables really mean?