Math 15 Introduction to Scientific Data Analysis Lecture 5 Association Statistics & Regression Analysis University of California, Merced.




2 Course Lecture Schedule

| Week | Date | Concepts | Quiz / Project Due |
|---|---|---|---|
| 1 | | | |
| 2 | January 28 | Introduction to data analysis | |
| 3 | February 4 | Excel #1 – General Techniques | |
| 4 | February 11 | Excel #2 – Plotting Graphs/Charts | Quiz #1 |
| 5 | February 18 | Holiday | |
| 6 | February 25 | Excel #3 – Statistical Analysis | Quiz #2 |
| 7 | March 3 | Excel #4 – Regression Analysis | |
| 8 | March 10 | Excel #5 – Interactive Programming | Quiz #3 |
| 9 | March 17 | Introduction to Computer Programming – Part I | |
| | March 24 | Spring Recess | |
| 10 | March 31 | Introduction to Computer Programming – Part II | Project #1 |
| 11 | April 7 | Programming – #1 | Quiz #4 |
| 12 | April 14 | Programming – #2 | |
| 13 | April 21 | Programming – #3 | Quiz #5 |
| 14 | April 28 | Programming – #4 | |
| 15 | May 5 | Programming – #5 | Quiz #6 |
| 16 | May 12 | Movies / Evaluations | Project #2 |
| Final | May ??? | Final Examination | |

Quiz Next Week!

3 Project #1 – Due March 31st, 2008
- Projects can be done individually or in groups of up to three, under the following rules:
- Each team turns in one project report, and all team members get the same grade.
- A team consists of at most 3 people; no copying between teams!
- The team project report must include a title page describing each team member's contribution.
- 10% bonus for projects done individually.
- Individual projects must not be copied from anyone else.
- No late projects will be accepted!
- Project #1 will be posted on UCMCROP by next Monday!

4 Review: Measures of dispersion or variability
- Variance or standard deviation
- The distribution on the left is more dispersed than the one on the right: it has a higher variance (and a higher standard deviation).
[Figure: two distributions side by side, each marked with its average and mode]
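
The following minimal Python sketch shows the idea of "more dispersed means higher variance." The two data sets are hypothetical and do not come from the lecture. Both have the same mean, so only their spread differs. It assumes Python's standard `statistics` module, whose `variance` and `stdev` compute the sample versions, as Excel's `VAR` and `STDEV` do.

```python
import statistics

def dispersion(data):
    """Return (mean, sample variance, sample standard deviation) of a data set."""
    return statistics.mean(data), statistics.variance(data), statistics.stdev(data)

# Hypothetical data (not from the lecture): same mean of 50, different spread.
left = [30, 40, 50, 60, 70]    # widely spread, like the left-hand distribution
right = [46, 48, 50, 52, 54]   # tightly clustered, like the right-hand distribution

for name, data in (("left", left), ("right", right)):
    mean, var, sd = dispersion(data)
    print(f"{name}: mean = {mean}, variance = {var}, sd = {sd:.2f}")
```

Both sets are centered on 50, but the left set's larger deviations from the mean give it the larger variance and standard deviation.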

5 Which is the more precise measurement?
- Although the standard deviation is a good measure of the precision of a given set of data, it can be difficult to compare the standard deviations of two different types of measurements directly.
- You might need to make such a comparison to determine the largest source of uncertainty in an experimentally determined answer.

| | Measurement 1 | Measurement 2 |
|---|---|---|
| Average | 446 mg | 35.49 mL |
| σ (standard deviation) | 23 | 4.5 |

6 Get the Right Tool for the Job!

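7 Measures of dispersion or variability
- One way to make this comparison is the relative standard deviation, RSD: the ratio of the standard deviation to the mean, usually expressed as a percentage:
  $\mathrm{RSD} = 100 \times \sigma / \bar{x}$

| | Measurement 1 | Measurement 2 |
|---|---|---|
| Average | 446 mg | 35.49 mL |
| σ | 23 | 4.5 |
| RSD | 100 × (23/446) = 5.2% | 100 × (4.5/35.49) = 12.7% |

- The smaller RSD means that, relative to its size, the 446 mg measurement is the more precise of the two.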
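
Here is the RSD formula as a small Python function, applied to the two measurements from the slide. The function name `relative_std` is illustrative. The sketch assumes a positive mean, since RSD is not meaningful for a mean near zero.

```python
def relative_std(sd, mean):
    """Relative standard deviation in percent: 100 * sd / mean (assumes mean > 0)."""
    return 100 * sd / mean

# Values from the slide: 446 mg with sd 23, and 35.49 mL with sd 4.5.
rsd_mg = relative_std(23, 446)
rsd_ml = relative_std(4.5, 35.49)
print(f"RSD (mg): {rsd_mg:.1f}%")   # RSD (mg): 5.2%
print(f"RSD (mL): {rsd_ml:.1f}%")   # RSD (mL): 12.7%
```

Because RSD has no units, the two numbers can be compared directly even though one measurement is in mg and the other in mL.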

8 Any Questions?

9 Common Practice for Data Analysis
- A common task in data analysis is to investigate an association between two variables:
  - To see whether two variables vary together: correlation
  - To see how one variable affects another: regression

10 Correlation
- A correlation tells us whether two variables vary together, i.e., as one goes up, the other goes up (or goes down).
- It is measured by the correlation coefficient (the Pearson product-moment correlation coefficient, or Pearson's r).
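
For reference, Pearson's r for paired values $(x_i, y_i)$ is

$r = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$

The function below is a minimal Python sketch of that formula. The name `pearson_r` is hypothetical and illustrative only. It refuses to compute r when either variable is constant, because the denominator is then zero; Excel's `CORREL` reports a `#DIV/0!` error in that case.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations from the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0  (perfect positive correlation)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0 (perfect negative correlation)
```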

11 Correlation Coefficient
- Values range from +1 (perfect positive correlation) through 0 (no correlation) to -1 (perfect negative correlation).

12 Correlation Coefficient – cont.
- Always draw a diagram (scatter plot) to check that:
  - there are no OUTLIERS (if there are outliers, the guidelines below may not apply);
  - the relation is not curved (r measures only LINEAR correlation).

| r (approx.) | Strength of tendency | What goes with what |
|---|---|---|
| 0.9 to 1 | strong | high y with high x, and low y with low x |
| 0.7 to 0.9 | some | high y with high x, and low y with low x |
| 0.3 to 0.7 | little | high y with high x, and low y with low x |
| -0.3 to 0.3 | none | neither high nor low y with high or low x |
| -0.3 to -0.7 | little | low y with high x, and high y with low x |
| -0.7 to -0.9 | some | low y with high x, and high y with low x |
| -0.9 to -1 | strong | low y with high x, and high y with low x |
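
The short Python sketch below shows why the two checks above matter. Both data sets are hypothetical. First, y = x² is a perfect but curved relationship, yet on x values symmetric about 0 its r is exactly 0. Second, a single outlier added to a perfect straight line drags r from 1 down to a negative value.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r (the quantity Excel's CORREL/PEARSON compute)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# 1) A perfect but CURVED relationship: y = x**2 on x values symmetric about 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r_curved = pearson_r(xs, ys)
print(f"curved: r = {r_curved:.3f}")   # curved: r = 0.000

# 2) A perfect straight line, then the same line plus ONE outlier at (6, -10).
line_x = [1, 2, 3, 4, 5]
line_y = [1, 2, 3, 4, 5]
r_line = pearson_r(line_x, line_y)
r_outlier = pearson_r(line_x + [6], line_y + [-10])
print(f"line: r = {r_line:.3f}, with one outlier: r = {r_outlier:.3f}")
# line: r = 1.000, with one outlier: r = -0.438
```

In the first case, y depends exactly on x, yet r reports no linear tendency at all. In both cases, a scatter plot would reveal the problem immediately.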

13 Excel Function – Correlation Coefficient
- =CORREL(array1, array2) or =PEARSON(array1, array2)
[Figure: scatter plot showing a positive correlation; lengths of a leg bone (in cm) in penguin mating pairs]

14 Ice cream sales vs. number of people who drown at sea
- Correlation coefficient: 0.927

15 Wait! What kinds of conclusions can we draw from a correlation?

16 Examples
- Ice cream sales correlate with the number of people who drown at sea. Therefore, ice cream causes people to drown.
- Since the 1950s, both the atmospheric CO2 level and crime levels have increased sharply. Hence, atmospheric CO2 causes crime.
- Not good conclusions!


18 Correlation does not imply causation
- No conclusion can be drawn about the existence or the direction of a cause-and-effect relationship merely from the fact that A is correlated with B. The correlation coefficient only tells you whether the two variables vary together.
- Determining whether there is an actual cause-and-effect relationship requires further investigation, even when the relationship between A and B is statistically significant, a large effect size is observed, or a large part of the variance is explained.
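
The ice cream example can be illustrated with a small Python simulation. The model is entirely hypothetical: a hidden common cause (daily temperature) drives both quantities, and neither is computed from the other. The coefficients (20 and 0.3), the noise levels, and the seed are arbitrary choices for illustration. Even so, the two series come out strongly correlated. Once the assumed temperature effect is subtracted, the leftover variation shows essentially no correlation.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical model: temperature (degrees C) is the hidden common cause.
temps = [random.uniform(10, 35) for _ in range(200)]
ice_cream = [20 * t + random.gauss(0, 40) for t in temps]    # sales rise with heat
drownings = [0.3 * t + random.gauss(0, 1.0) for t in temps]  # more swimmers in heat

r_raw = pearson_r(ice_cream, drownings)

# Remove the (known, assumed) temperature effect from each series.
ice_left = [ic - 20 * t for ic, t in zip(ice_cream, temps)]
drown_left = [d - 0.3 * t for d, t in zip(drownings, temps)]
r_adjusted = pearson_r(ice_left, drown_left)

print(f"r(ice cream, drownings) = {r_raw:.2f}")
print(f"r after removing temperature effect = {r_adjusted:.2f}")
```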

19 Any Questions?

20 Regression
- Regression is used when we have some reason to believe that changes in one variable cause changes in the other. (A correlation coefficient by itself is not evidence of a causal relationship.)
- The simplest kind of causal relationship is a straight-line (linear) relationship, which is modeled by linear regression.

21 Linear regression
- Linear regression assumes a linear relationship between two variables: a dependent variable, y, and an independent variable, x.
- Mathematically, this relationship can be described by the linear equation
  $y = ax + b$
  where a is called the slope and b is called the intercept.
- This equation lets you calculate y (dependent) from x (independent). The values of a and b are found by the least-squares method.
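
The least-squares method chooses a and b to minimize the sum of squared vertical distances $\sum_i \big(y_i - (a x_i + b)\big)^2$. The minimizing values are

$a = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad b = \bar{y} - a\,\bar{x}$

Below is a minimal Python sketch of that formula; the function name `least_squares_line` is hypothetical. These are the same values that Excel's `SLOPE(known_y's, known_x's)` and `INTERCEPT(known_y's, known_x's)` return. The check uses points lying exactly on y = 3x + 8, so the fit should recover a = 3 and b = 8.

```python
def least_squares_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # slope (what Excel's SLOPE computes)
    b = mean_y - a * mean_x    # intercept (what Excel's INTERCEPT computes)
    return a, b

xs = [0, 1, 2, 3, 4]
ys = [3 * x + 8 for x in xs]        # points exactly on y = 3x + 8
print(least_squares_line(xs, ys))   # (3.0, 8.0)
```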

22 Review – Math: Linear Equation, Slope and Intercept
- y = 3x + 8 has slope 3 and intercept 8 (the line crosses the y-axis at y = 8).

23 Slope & Intercept formula
[Figure: worksheet of Y-values and X-values; lengths of a leg bone (in cm) in penguin mating pairs]

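24 y = ax + b (a: slope, b: intercept)
- To compute the predicted Y-values, enter =$C$10*B3+$C$11 in the first row and fill it down the column. Here B3 holds the X-value, C10 holds the slope a, and C11 holds the intercept b.
- Don't forget the $ signs! They make C10 and C11 absolute references, so every row uses the same slope and intercept.
[Figure: worksheet with columns of X-values and predicted Y-values]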
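
Filling down `=$C$10*B3+$C$11` amounts to a loop in which the X-value changes from row to row while the slope and intercept stay fixed. That fixed behavior is exactly what the `$` signs guarantee. Without them, the next row's formula would point at C11 and C12 instead. The Python sketch below uses hypothetical slope, intercept, and X-values purely for illustration.

```python
# Hypothetical fitted coefficients, playing the role of cells C10 and C11.
slope = 0.5        # like $C$10 (absolute: the same for every row)
intercept = 1.5    # like $C$11 (absolute: the same for every row)

x_values = [10.0, 11.0, 12.0]   # like B3, B4, B5 (relative: changes each row)

# "Fill down": apply y = a*x + b to each X-value with the same a and b.
predicted = [slope * x + intercept for x in x_values]
print(predicted)
```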

25 Plot a linear regression (or trend) line – Part 1
- You can add a linear regression line to a chart of your data.

26 Plot a linear regression (or trend) line – Part 2
1. Right-click on any data point on the graph.
2. Choose Add Trendline.
3. Click on the Options tab, and select Display equation and Display R-squared.
4. Click "OK".
Don't forget to check these two options!

27 Plot a linear regression (or trend) line – Part 2 – cont.
- R² value (R-squared; Excel function RSQ): a "measure of scatter" around the trendline.
- The closer this value is to 1, the better the line fits the data, and the more accurate its predictions.
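
One common way to compute R² for a fitted line is $R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}}$. Here $SS_{\text{res}}$ is the scatter of the points around the line, and $SS_{\text{tot}}$ is their scatter around the mean of y. For a least-squares straight line, this equals the square of Pearson's r, which is what Excel's `RSQ` returns. The Python sketch below uses hypothetical data and illustrative function names (`fit_line`, `r_squared`).

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx

def r_squared(xs, ys, a, b):
    """R^2 = 1 - SS_res / SS_tot for the line y = a*x + b."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))  # scatter around the line
    ss_tot = sum((y - my) ** 2 for y in ys)                        # scatter around the mean
    return 1 - ss_res / ss_tot

# Hypothetical data lying close to, but not exactly on, a straight line
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
print(f"y = {a:.3f}x + {b:.3f},  R^2 = {r_squared(xs, ys, a, b):.4f}")
```

Points that hug the line leave only a small $SS_{\text{res}}$, so R² is close to 1. If the points scatter widely around the line, R² falls toward 0.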

28 Let's review the process!
- To see whether two variables vary together: correlation
- To see how one variable affects another: regression. If there is some reason to believe that one variable causes changes in the other, plot a graph and add a regression line!
[Figure: lengths of a leg bone (in cm) in penguin mating pairs]

29 Any Questions?

