

Presentation of Data
Tables and graphs are convenient for presenting data. They present the data in an organized format, enabling the reader to find information quickly.
Goal: Bar graphs including error bars (SD or SEM).

Data Tables
Tables are most easily constructed using your word processor's table function or a spreadsheet such as Excel.

Tables should be numbered sequentially beginning with Table 1. Include a descriptive title. The title should enable the reader to understand the table without reading the rest of the document. When creating tables, be sure to state the units of all measurements.
Example: Table 1. Number of bird species observed in 10 different woodlots in Clinton County, NY on January 18, 2006. Each bird count was done by two observers over a 1-hour period beginning at 8:00 AM.

Graphs There are four common types of graphs used in biology: bar graph, frequency histogram, XY scatterplot, XY line graph.

Bar Graphs Bar graphs are best when the data are in groups or categories.

The data in the example below come from population counts of several different kinds of mammals in a woodlot in Clinton County, NY in July 2006.
Grey squirrels – 8
Red squirrels – 4
Chipmunks – 17
White-footed mice – 26
White-tailed deer – 2
A bar graph is best for these data because they are categories; it is not possible to have a data point that is between grey squirrels and red squirrels.

Parts of a Graph: This is an example of a typical bar graph with the various component parts labeled in red. Figure 1: Mean germination (%) (+SD) of gourd seeds following various pregermination treatments

Frequency histogram

Frequency histograms (also called frequency distributions) are bar-type graphs that show how the measured individuals are distributed along an axis of the measured variable. Frequency (the Y axis) can be absolute (i.e., number of counts) or relative (i.e., percent or proportion of the sample).

A familiar example would be a histogram of exam scores, showing the number of students who achieved each possible score. UC Davis

When an investigation involves measurement data, one of the first steps is to construct a histogram of the data’s distribution to see whether it approximates a normal distribution. Creating this kind of graph requires setting up bins, which are uniform range intervals that together cover the entire range of the data. The number of measurements that fall in each bin (range of units) is then counted and graphed on a frequency diagram, or histogram.
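The binning step described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the function name `bin_counts`, the bin width of 10, and the leaf-length values are all made up for the example and do not come from the slides.

```python
import math

def bin_counts(values, bin_width, start=None):
    """Count how many measurements fall in each uniform bin.

    Each bin is the half-open interval [lower, lower + bin_width), and the
    bins together cover the entire range of the data.
    """
    if start is None:
        # Align the first bin edge to a multiple of the bin width.
        start = math.floor(min(values) / bin_width) * bin_width
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return [(start + i * bin_width, counts[i]) for i in range(n_bins)]

# Hypothetical leaf lengths in mm (illustrative values only).
leaf_lengths = [52, 57, 61, 63, 64, 66, 68, 70, 71, 75, 79, 84]

# Print a rough text histogram: one '#' per measurement in the bin.
for lower, count in bin_counts(leaf_lengths, bin_width=10):
    print(f"{lower:>3}-{lower + 10:<3} {'#' * count}")
```

Each returned pair is the lower edge of a bin and the number of measurements in it; plotting those counts as adjacent columns gives the frequency histogram.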

Bar charts are very much like histograms, except that the columns are not usually adjacent to one another. This emphasizes that the categories are not directly related to one another; they could be placed in any order.

Error Bars

Error bars are a graphical representation of the variability of data and are used on graphs to indicate the error, or uncertainty in a reported measurement. They give a general idea of how accurate a measurement is, or conversely, how far from the reported value the true (error free) value might be. Error bars often represent one standard deviation of uncertainty or one standard error. These quantities are not the same and so the measure selected should be stated explicitly in the graph or supporting text.
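To make the distinction concrete, the two quantities are usually defined as follows for a sample of n measurements x_1, ..., x_n with mean x̄. These are the standard textbook definitions, written out here for reference rather than taken from the slides:

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
\mathrm{SEM} = \frac{s}{\sqrt{n}}
```

The standard deviation s describes the spread of the individual measurements, while the standard error of the mean (SEM) describes the uncertainty in the sample mean and shrinks as n grows.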

Many questions and investigations in biology call for a comparison of populations. For example, Are the spines on fish in one lake without predators shorter than the spines on fish in another lake with predators? or Are the leaves of ivy grown in the sun different from the leaves of ivy grown in the shade? If the variables are measured variables, then the best graph to represent the data is probably a bar graph of the means of the two samples with standard error indicated (Figure 1).

In Figure 1, the sample standard error bar (also known as the standard error of the sample mean) is the notation at the top of each shaded bar that shows the sample standard error (SE; in this case, ±1).

Sample standard error bars are not particularly easy to plot on a graph, however. In Excel, for example, the user needs to choose the “custom error bar” option. An Internet search will yield links to video instructions on “how to plot error bars in Excel.” http://www.youtube.com/watch?v=G10_qGcuELA Most of the time, bar graphs should include standard error rather than standard deviation. The standard error bars provide more information about how different the two means may be from each other.

For some labs, students should include standard error (or standard deviation) in their analysis and use standard error bars on their graphical displays when appropriate. Error bars or not? Always include error bars (SD or SEM) when plotting means. In some courses you may be asked to plot other measures associated with the mean, such as confidence intervals.

Error bars can be used to compare two quantities visually, provided various other conditions hold. Such a comparison can suggest whether a difference is likely to be statistically significant.

The graph shows overlapping error bars. If two SE error bars overlap, you can conclude that the difference is not statistically significant. Sample A and Sample B are not significantly different.

If the two error bars do not overlap then we CANNOT conclude that they are statistically different. At this stage the student should proceed to a t-test to determine any statistically significant difference.
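As a rough sketch of that next step, the t statistic for two independent samples can be computed by hand. The slides do not say which form of the t-test is intended, so this example assumes Welch's unequal-variance version; the function name `welch_t` and the spine-length values are hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    se_a2 = var_a / n_a                    # squared standard error of mean A
    se_b2 = var_b / n_b                    # squared standard error of mean B
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_a2 + se_b2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a2 + se_b2) ** 2 / (se_a2 ** 2 / (n_a - 1) + se_b2 ** 2 / (n_b - 1))
    return t, df

# Hypothetical spine lengths (mm) from two lakes (illustrative values only).
lake_without_predators = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2]
lake_with_predators = [4.9, 5.3, 4.7, 5.1, 5.0, 4.8]

t, df = welch_t(lake_without_predators, lake_with_predators)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The p-value then comes from a t table with the computed degrees of freedom, or from a statistics package; in SciPy, for instance, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test.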

Making a Bar Graph with Error Bars with EXCEL

Example: Some students grow tomato plants with and without fertilizer.
(1) Create a data table using EXCEL.
(2) Calculate the mean, standard deviation, and standard error (SEM) of their data.
(3) Make a bar graph comparing their means, including SEM error bars.
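The same calculation can be checked outside Excel. The following is a minimal Python sketch of step (2); the function name `summarize` and the plant heights are invented for illustration, and the sample standard deviation uses the same n − 1 convention as Excel's STDEV function.

```python
import math
import statistics

def summarize(values):
    """Return (mean, SD, SEM) for one treatment group."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(len(values))    # standard error of the mean
    return mean, sd, sem

# Hypothetical tomato plant heights in cm (illustrative values only).
groups = {
    "control": [30, 32, 28, 35, 30],
    "fertilizer": [40, 38, 44, 41, 37],
}

for name, heights in groups.items():
    mean, sd, sem = summarize(heights)
    print(f"{name:<10} mean = {mean:.1f}  SD = {sd:.2f}  SEM = {sem:.2f}")
```

The SEM values are the numbers to enter as the custom error bar amounts for each column in the bar graph.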

Click on both of these

Highlight the two average cells and the column titles (so they show up on graph) Insert | Chart | Column Add title, axes labels

Click grey area

Change to white

4. Click on the left axis. Change the minimum to 0 and the maximum to 50.
5. Click on series 1… delete.

YouTube video that explains how to add error bars: http://www.youtube.com/watch?v=G10_qGcuELA
Click the chart | Choose Chart on the menu bar | Source data | Columns. This will allow you to add error bars to each column separately.

Click on the chart | Options

Make the overlap negative This will separate the columns

Now click on one column and then add Y error bars… the value is on your data table

Add the labels (+/- SEM, control, fertilizer) with a text box.

PROBLEM 6: Do these steps and print out your bar graph. http://www.youtube.com/watch?v=G10_qGcuELA
Example: Some students grow tomato plants with and without fertilizer.
(1) Create a data table using EXCEL.
(2) Calculate the mean, standard deviation, and standard error (SEM) of their data.
(3) Make a bar graph comparing their means, including SEM error bars.

X,Y scatterplot These are plots of X,Y coordinates showing each individual's or sample's score on two variables. When plotting data this way we are usually interested in knowing whether the two variables show a "relationship", i.e. do they change in value together in a consistent way? When comparing one measured variable against another—looking for trends or associations— it is appropriate to plot the individual data points on an x-y plot, creating a scatterplot.

A scatter plot is a type of graph that shows how two sets of data might be connected. When you plot a series of points on a graph, you’ll have a visual idea of whether your data might have a linear, exponential or some other kind of connection. Creating scatter plots by hand can be cumbersome, especially if you have a large number of plot points. Microsoft Excel has a built in graphing utility that can instantly create a scatter plot from your data. This enables you to look at your data and perform further tests without having to re-enter your data. For example, if your scatter plot looks like it might be a linear relationship, you can perform linear regression in one or two clicks of your mouse.

If the relationship is thought to be linear, a linear regression line can be calculated and plotted to help filter out the pattern that is not always apparent in a sea of dots (Figure 3).

In this example, the value of r (the square root of R², taking the sign of the slope) can be used to help determine whether there is a statistical correlation between the x and y variables, which may in turn suggest possible causal mechanisms. Such correlations point to further questions in which variables are manipulated to test hypotheses about how the variables are related.

Students can also use scatterplots to plot a manipulated independent x-variable against the dependent y-variable. Students should become familiar with the shapes they’ll find in such scatterplots and the biological implications of these shapes.

A concave upward curve is associated with exponentially increasing functions (for example, in the early stages of bacterial growth).

In ecology, a species-area curve is a relationship between the area of a habitat, or of part of a habitat, and the number of species found within that area.

A sine wave–like curve is associated with a biological rhythm. Figure 1: Predator-Prey Curve

Elements of effective graphing
Students will usually use computer software to create their graphs. In so doing, they should keep in mind the following elements of effective graphing:
• A graph must have a title that informs the reader about the experiment and tells the reader exactly what is being measured.
• The reader should be able to easily identify each line or bar on the graph.

Big or little? For course-related papers, a good rule of thumb is to size your figures to fill about one-half of a page. Readers should not have to reach for a magnifying glass to make out the details. Compound figures may require a full page

• Axes must be clearly labeled with units as follows:
––The x-axis shows the independent variable. Time is an example of an independent variable. Other possibilities for an independent variable might be light intensity or the concentration of a hormone or nutrient.
––The y-axis denotes the dependent variable, that is, the variable that is being affected by the condition (independent variable) shown on the x-axis.

Intervals must be uniform. For example, if one square on the x-axis equals five minutes, each interval must be the same and not change to 10 minutes or one minute. The intervals do not have to be the same on both axes, because the axes represent different quantities. If there is a break in the graph, such as a time course over which little happens for an extended period, it should be noted with a break in the axis and a corresponding break in the data line.

Tick marks - Use common sense when deciding on major (numbered) versus minor ticks. Major ticks should be used to reasonably break up the range of values plotted into integer values. Within the major intervals, it is usually necessary to add minor interval ticks that further subdivide the scale into logical units (i.e., an interval that is a factor of the major tick interval). For example, when using major tick intervals of 10, minor tick intervals of 1, 2, or 5 might be used, but not 4.
––It is not necessary to label each interval. Labels can identify every five or 10 intervals, or whatever is appropriate.
––The labels on the x-axis and y-axis should allow the reader to easily see the information.

Parts of a Graph: This is an example of a typical line graph with the various component parts labeled in red.

More than one condition of an experiment may be shown on a graph by the use of different lines. For example, the appearance of a product in an enzyme reaction at different temperatures can be compared on the same graph. In this case, each line must be clearly differentiated from the others—by a label, a different style, or colors indicated by a key. These techniques provide an easy way to compare the results of experiments.

Figure 3: Release of reducing sugars from alfalfa straw by crude extracellular enzymes from thermophilic and nonthermophilic fungi.

• The graph should clarify whether the data start at the origin (0,0) or not. The line should not be extended to the origin if the data do not start there. In addition, the line should not be extended beyond the last data point (extrapolation) unless a dashed line (or some other demarcation) clearly indicates that this is a prediction about what may happen.

Scatterplot A scatterplot is a useful summary of a set of bivariate data (two variables), usually drawn before working out a linear correlation coefficient or fitting a regression line. It gives a good visual picture of the relationship between the two variables, and aids the interpretation of the correlation coefficient or regression model.

Each unit contributes one point to the scatterplot, on which points are plotted but not joined. The resulting pattern indicates the type and strength of the relationship between the two variables. The following plots demonstrate the appearance of positively associated, negatively associated, and non-associated variables.
Plots shown: Positive correlation, Negative correlation, No correlation

A scatterplot can be a helpful tool in determining the strength of the relationship between two variables. If there appears to be no association between the proposed explanatory and dependent variables (i.e., the scatterplot does not indicate any increasing or decreasing trends), then fitting a linear regression model to the data probably will not provide a useful model.

Correlation Statistics – allow one to determine/describe the relationship between variables.
a. Linear Regression – Line of best fit used to express the relationship between two variables and predict potential outcomes based on a given value for a variable.
i. The line of best fit follows the familiar equation y = mx + b, where b is the y-intercept and m is the slope of the line.
ii. A steep slope indicates a strong effect.
iii. A shallow slope indicates a weak effect.
iv. A negative slope indicates a negative effect. That is, an increase in X corresponds to a decrease in Y.
v. The line of best fit can be used to predict a value of one variable given a value for the other variable.

Linear Regression
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an independent (explanatory) variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model. R² is a statistic that will give some information about the goodness of fit of a model. In regression, the R² coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An R² of 1 indicates that the regression line perfectly fits the data.
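A short Python sketch may help show where the slope, intercept, and R² come from. It uses the ordinary least-squares formulas and only built-in functions; the function name `fit_line` and the height and weight values are hypothetical and chosen purely for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b; returns (m, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx                 # slope
    b = mean_y - m * mean_x       # intercept
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return m, b, 1 - ss_res / ss_tot

# Hypothetical heights (cm) and weights (kg); illustrative values only.
heights = [150, 160, 165, 170, 180, 185]
weights = [52, 58, 63, 66, 77, 80]

m, b, r2 = fit_line(heights, weights)
print(f"weight = {m:.3f} * height + {b:.2f}   (R^2 = {r2:.3f})")
```

An R² near 1 means the points lie close to the fitted line; an R² near 0 means the line explains little of the variation in y.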

A valuable numerical measure of association between two variables is the correlation coefficient, which is a value between -1 and 1 indicating the strength of the association of the observed data for the two variables.

Correlation positive

A positive correlation indicates a positive association between the variables (increasing values in one variable correspond to increasing values in the other variable), while a negative correlation indicates a negative association between the variables (increasing values in one variable correspond to decreasing values in the other variable). A correlation value close to 0 indicates no linear association between the variables.

Correlation in Linear Regression The square of the correlation coefficient, R², is a useful value in linear regression. This value represents the fraction of the variation in one variable that may be explained by the other variable. Thus, if a correlation of 0.8 is observed between two variables (say, height and weight, for example), then a linear regression model attempting to explain either variable in terms of the other variable will account for 64% of the variability in the data. The correlation coefficient also relates directly to the regression line Y = a + bX for any two variables. Because the least-squares regression line will always pass through the means of x and y, the regression line may be entirely described by the means, standard deviations, and correlation of the two variables under investigation.
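The two statements above can be written out as a short worked example. The identities are the standard least-squares results; the symbols s_X and s_Y (the sample standard deviations of X and Y) are introduced here for the sketch and do not appear in the slides.

```latex
r = 0.8 \;\Rightarrow\; R^2 = r^2 = 0.8^2 = 0.64 \quad (64\%\ \text{of the variability})

b = r\,\frac{s_Y}{s_X}, \qquad a = \bar{Y} - b\,\bar{X}
```

The second line shows why the line Y = a + bX is fully determined by the two means, the two standard deviations, and r: substituting X = X̄ gives Y = Ȳ, so the line passes through the point of means.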

A linear regression line has an equation of the form y = mx + b, where x is the independent variable, y is the dependent variable, m is the slope of the line, and b is the intercept (the value of y when x = 0).

Least-Squares Regression The most common method for fitting a regression line is the method of least-squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (if a point lies on the fitted line exactly, then its vertical deviation is 0). Because the deviations are first squared, then summed, there are no cancellations between positive and negative values.
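In symbols, using the y = mx + b form from the other slides, the quantity being minimized and the resulting slope and intercept are usually written as follows. These are the standard results, stated here for reference:

```latex
S(m, b) = \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2

m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x}
```

Each term of S is one squared vertical deviation, so a point lying exactly on the line contributes 0 to the sum.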

Given a scatter plot, we can draw the line that best fits the data

There are two common measures of correlation: the Pearson correlation coefficient (r) and Spearman's rank-order correlation coefficient (rs). Both vary from +1 (perfect positive correlation) through 0 (no correlation) to –1 (perfect negative correlation). If your data are continuous and normally distributed, use Pearson; otherwise use Spearman.
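The difference between the two coefficients can be seen in a small Python sketch. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied values given the average of their ranks; the function names and the data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rs(xs, ys):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# A curved but strictly increasing relationship (made-up values).
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(f"Pearson r   = {pearson_r(xs, ys):.3f}")
print(f"Spearman rs = {spearman_rs(xs, ys):.3f}")
```

Because y always increases with x, the ranks agree perfectly and rs is 1, while r is slightly below 1 because the relationship is not a straight line.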

What is the Pearson Correlation Coefficient?
Correlation between variables is a measure of how well the variables are related. The most common measure of correlation in statistics is the Pearson Correlation (technically called the Pearson Product Moment Correlation or PPMC), which shows the linear relationship between two variables. Two letters are used to represent the Pearson correlation: the Greek letter rho (ρ) for a population and the letter “r” for a sample.
http://www.statisticshowto.com/articles/what-is-the-pearson-correlation-coefficient/


In linear least squares regression with an estimated intercept term, R² equals the square of the Pearson correlation coefficient between the observed and modeled (predicted) data values of the dependent variable.

What are the Possible Values for the Pearson Correlation?
Results are between -1 and 1. A result of -1 means that there is a perfect negative correlation between the two variables, while a result of 1 means that there is a perfect positive correlation between the two variables. A result of 0 means that there is no linear relationship between the two variables.

You will very rarely get a correlation of 0, -1 or 1. You’ll get somewhere in between. The closer the value of r gets to zero, the greater the scatter of the data points around the line of best fit.
High correlation: 0.5 to 1.0 or -0.5 to -1.0
Medium correlation: 0.3 to 0.5 or -0.3 to -0.5
Low correlation: 0.1 to 0.3 or -0.1 to -0.3

Pearson Product Moment (PPM) Correlation – unit-less value ranging from –1.0 to +1.0 that describes the goodness of fit of the relationship between two variables.
i. An |r| value of 1.00 represents a perfect correlation.
ii. An |r| value above 0.85 represents a very high correlation.
iii. An |r| value of 0.70 – 0.84 represents a high correlation.
iv. An |r| value of 0.55 – 0.69 represents a moderate correlation.
v. An |r| value of 0.40 – 0.54 represents a low correlation.
vi. An |r| value of 0.00 – 0.39 represents no correlation.
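The bands in this list can be turned into a small lookup, sketched below in Python. The list leaves the exact boundaries open (for example, whether 0.85 itself counts as "very high"), so this sketch assumes each band starts at its lower bound; the function name `describe_correlation` is made up for the example.

```python
def describe_correlation(r):
    """Label a Pearson r using the |r| bands listed above.

    Assumption: each band includes its lower bound (e.g. |r| >= 0.85 is
    'very high'), since the slide does not specify the boundary cases.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the sign gives direction only, not strength
    if size == 1.0:
        return "perfect"
    if size >= 0.85:
        return "very high"
    if size >= 0.70:
        return "high"
    if size >= 0.55:
        return "moderate"
    if size >= 0.40:
        return "low"
    return "no correlation"

# Illustrative r values only.
for r in (0.92, 0.715, -0.673, 0.45, 0.378):
    print(f"r = {r:+.3f}: {describe_correlation(r)}")
```

Other sources use different cut-offs, so labels such as "high" should always be read alongside the numerical r value.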

In statistics, the Pearson product-moment correlation coefficient (sometimes referred to as the PPMCC or PCC, or Pearson's r) is a measure of the linear correlation (dependence) between two variables X and Y, giving a value between +1 and −1 inclusive. It is widely used in the sciences as a measure of the strength of linear dependence between two variables. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s.

What Do I Have to Consider When Using the Pearson product-moment correlation? The PPMC does not differentiate between dependent and independent variables. For example, if you are investigating the correlation between a high caloric diet and diabetes, you might find a high correlation of 0.8. However, you could also run a PPMC with the variables switched around (diabetes causes a high caloric diet), which would make no sense. Therefore, as a researcher you have to be mindful of the variables you are plugging in. In addition, the PPMC will not give you any information about the slope of the line — it only tells you whether there is a high correlation.
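Both points can be demonstrated numerically. In the Python sketch below (function names and data are hypothetical), swapping the variables leaves r unchanged, and multiplying y by 10 changes the slope of the best-fit line but not r.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Made-up measurements for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ys_times_10 = [10 * y for y in ys]

print(pearson_r(xs, ys), pearson_r(ys, xs))   # r is the same either way round
print(pearson_r(xs, ys_times_10))             # r is unchanged (up to rounding)
print(slope(xs, ys), slope(xs, ys_times_10))  # the slope is 10 times larger
```

This is why r on its own cannot say which variable is the dependent one or how steep the relationship is; those questions need the regression line and the design of the study.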

Real Life Example Pearson correlation is used in thousands of real life situations. For example, scientists in China wanted to know if there was a correlation between spatial distribution and genetic differentiation in weedy rice populations in a study to determine the evolutionary potential of weedy rice.

Real Life Example The graph below shows the observed heterozygosity of weedy rice plotted against the multilocus outcrossing rate. Pearson’s correlation between the two groups was analyzed, showing a significant positive correlation of between 0.783 and 0.895 for weedy rice populations.

Analysis of 4999 Online Physician Ratings Indicates That Most Patients Give Physicians a Favorable Rating Kadry B, Chu LF, Kadry B, Gammas D, Macario A - J. Med. Internet Res. (2011) http://openi.nlm.nih.gov/gridquery.php?q=pearson%20correlation Figure 2: Pearson correlation comparing overall rating versus staff rating (n = 4999, Pearson correlation, r = .715, P < .001).

Impulsivity, gender, and the platelet serotonin transporter in healthy subjects
Figure: A) Positive correlation between the Bmax and the cognitive complexity factor in men (Pearson correlation = 0.378, P = 0.006). B) Negative correlation between the Kd and the motor impulsivity factor in men (Pearson correlation = −0.673, P = 0.023).
http://openi.nlm.nih.gov/gridquery.php?q=pearson%20correlation%20&atab=2
Women are more impulsive

Comparison Between Dynamic Contour Tonometry and Goldmann Applanation Tonometry
A new method to measure IOP: comparing methods of measuring eye pressure.
Figure 1: Pearson correlation analysis of intraocular pressure (IOP) measurements obtained by Goldmann tonometry and dynamic contour tonometry (n=451, R=0.853, p<0.001).

Which of these has the highest Pearson coefficient? Abstract: Gene expression profiles provide important information about the biology of breast tumors and can be used to develop prognostic tests. However, the implementation of quantitative RNA-based testing in routine molecular pathology has not been accomplished, so far. The EndoPredict assay has recently been described as a quantitative RT-PCR-based multigene expression test to identify a subgroup of hormone-receptor-positive tumors that have an excellent prognosis with endocrine therapy only. To transfer this test from bench to bedside, it is essential to evaluate the test-performance in a multicenter setting in different molecular pathology laboratories. In this study, we have evaluated the EndoPredict (EP) assay in seven different molecular pathology laboratories in Germany, Austria, and Switzerland. Fig4: Correlation analysis of the EndoPredict test results in the seven different pathology laboratories. a–g Results of the individual laboratories. h Pearson correlation coefficients

Making an XY plot with a regression line and error bars with EXCEL

How to Create a Linear Regression Equation with Microsoft Excel
A scatter plot shows you where your points lie and gives you a visual clue about whether your data follow a linear, exponential or some other type of relationship. Therefore, if you aren’t sure your data are linear in nature, create a scatter plot.

Finding a linear regression equation via a scatter plot and a trendline.

If you know that one variable causes the changes in the other variable, then you can use linear regression to investigate the relationship. This fits a straight line to the data and gives the values of the slope and intercept of that line (m and b in the equation y = mx + b). The simplest way to do this in Excel is to plot a scatter graph of the data and use the trend line feature of the graph. Right-click on a data point on the graph, select Add Trend line, and choose Linear. Click on the Options tab, and select Display equation on chart. You can also choose to set the intercept to zero (or some other value). The full equation, with the slope and intercept values, is now shown on the chart.
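Setting the intercept to zero changes the fit itself, not just the way it is displayed. A minimal Python sketch of the two fits is given below; the function names and the enzyme data are hypothetical, and the through-origin slope uses the standard least-squares result m = Σxy / Σx² for the model y = mx.

```python
def fit_with_intercept(xs, ys):
    """Least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def fit_through_origin(xs, ys):
    """Least-squares slope for y = m*x, with the intercept fixed at zero."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical enzyme concentrations and reaction rates (illustrative only).
concentration = [1, 2, 3, 4, 5]
rate = [2.3, 4.1, 6.4, 7.9, 10.2]

m, b = fit_with_intercept(concentration, rate)
m0 = fit_through_origin(concentration, rate)
print(f"free intercept:   y = {m:.3f}x + {b:.3f}")
print(f"intercept at 0:   y = {m0:.3f}x")
```

Forcing the line through (0,0) is only appropriate when there is a reason to expect y to be zero when x is zero, for example no reaction when no enzyme is present.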

Step 1: Enter your data into an EXCEL file. The left column is x; the right column is y.
Step 2: Create a scatter plot for your data: INSERT / Chart / select XY(scatter) in the chart wizard.
Step 3: Click anywhere on the graph.
Step 4: Click the “Chart” tab and then chart options to modify things on the graph.
Step 5: Click anywhere on the graph.
Step 6: Click the “Chart” tab and then “add trendline”.
Step 7: In the add trendline menu, click the option button.
Step 8: Click on the boxes:
Set intercept = 0 (if you want the line to include 0,0)
Display equation on chart
Display R-squared value on chart

You can move this and add a white background. The R² value is close to +1… what does this mean? What is the Pearson correlation coefficient?

This is the same graph with the y-intercept set to 0. Why should it pass through 0,0?
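To see what forcing the intercept to zero does numerically, here is a minimal sketch, assuming the line is fitted by least squares with b fixed at 0. In that case the slope reduces to Σxy / Σx². The function name and the data are hypothetical.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope for y = m*x with the intercept fixed at 0."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

# Hypothetical illustrative data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m0 = fit_through_origin(xs, ys)
print(f"y = {m0:.3f}x")
```

Forcing the line through 0,0 is only sensible when the underlying relationship requires it, for example when zero input must give zero response.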

Adding error bars...

Click on data points

Example: The size of breeding pairs of penguins was measured to see if there was correlation between the sizes of the two sexes. In Excel r is calculated using the formula: =CORREL(X range, Y range). Insert | Function | CORREL. It is usual to draw a scatter graph of the data whenever a correlation is being investigated.

r can be calculated from R²: for a straight-line fit with a free intercept, r is the square root of R², taking the sign of the slope. The scatter graph and both correlation coefficients clearly indicate a strong positive correlation. In other words, large females do pair with large males. Of course this doesn't say why, but it shows there is a correlation to investigate further.
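The Pearson coefficient that CORREL returns can be sketched from its definition, together with the step of recovering r from R². This is an illustrative sketch only: the function names and the data are hypothetical, and a decreasing data set is used on purpose to show why the sign of the slope matters.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares slope of the straight-line fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical illustrative data with a downward trend (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 1]

r = pearson_r(xs, ys)
r_squared = r ** 2
# Recover r from R²: take the square root and give it the sign of the slope.
r_from_r2 = math.copysign(math.sqrt(r_squared), slope(xs, ys))
print(r, r_squared, r_from_r2)
```

Taking only the positive square root of R² would wrongly report a positive correlation for this data set.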

THE END

Causation and correlation? 1.1.6 Explain that the existence of a correlation does not establish that there is a causal relationship between two variables. Typically in Biology your experiment may involve a continuous independent variable and a continuous dependent variable, e.g. the effect of enzyme concentration on the rate of an enzyme-catalyzed reaction. The statistical analysis would set out to test the strength of the relationship (correlation). Once a correlation between two factors has been established from experimental data, it would be necessary to advance the research to determine what the causal relationship might be.

Causation: Correlation does not imply causation! It is important to realize that if the statistical analysis of data indicates a correlation between the independent and dependent variable, this does not prove any causation. Only further investigation will reveal the causal effect between the two variables. Skirt lengths and stock prices are highly correlated (as stock prices go up, skirt lengths get shorter). The number of cavities in elementary school children and vocabulary size have a strong positive correlation. Clearly there is no real interaction between the factors involved, simply a coincidence in the data.

Correlation vs. Causation: We have been discussing correlation. We have looked at situations where there exists a strong positive relationship between our variables x and y. However, just because we see a strong relationship between two variables, this does not imply that a change in one variable causes a change in the other variable. Correlation does not imply causation! Consider the following: In the 1990s, researchers found a strong positive relationship between the number of television sets per person x and the life expectancy y of the citizens in different countries. That is, countries with many TV sets had higher life expectancies. Does this imply causation? By increasing the number of TVs in a country, can we increase the life expectancy of their citizens? Are there any hidden variables that may explain this strong positive correlation?

There is a strong positive correlation between ice cream sales and shark attacks. That is, as ice cream sales increase, the number of shark attacks increases. Is it reasonable to conclude the following? Ice cream consumption causes shark attacks.

All of the previous examples show a strong positive correlation between the variables. However, in each example it is not the case that one variable causes a change in the other variable. For example, increasing the number of ice cream sales does not increase the number of shark attacks. There are outside factors, also known as lurking variables, which cause the correlation between these variables.

Correlation does not imply causation! Correlation does not always mean that one thing causes the other (causation), because something else might have caused both. For example, on hot days people buy ice cream, and people also go to the beach, where some are attacked by sharks. There is a correlation between ice cream sales and shark attacks (in this case they both go up as the temperature goes up). But a rise in ice cream sales does not cause more shark attacks. Correlation does not imply causation!
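The lurking-variable idea can be sketched with a toy data set in which temperature drives both quantities. All numbers below are invented purely for illustration and are not real sales or attack figures. The function `pearson_r` is a plain implementation of the Pearson coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented toy data: both series were made up to rise with temperature.
temperature = [20, 22, 25, 28, 30, 33, 35]             # lurking variable
ice_cream_sales = [205, 215, 255, 275, 305, 325, 355]  # hypothetical units sold
shark_attacks = [1, 2, 2, 3, 4, 4, 5]                  # hypothetical counts

r_ice_sharks = pearson_r(ice_cream_sales, shark_attacks)
r_temp_ice = pearson_r(temperature, ice_cream_sales)
r_temp_sharks = pearson_r(temperature, shark_attacks)
print(r_ice_sharks, r_temp_ice, r_temp_sharks)
```

Ice cream sales and shark attacks come out strongly correlated even though neither series was built from the other; each simply follows temperature.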

You may be interested to know that global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of Pirates since the 1800s. For your interest, I have included a graph of the approximate number of pirates versus the average global temperature over the last 200 years. As you can see, there is a statistically significant inverse relationship between pirates and global temperature.

THE END

In Excel r is calculated using the formula: =CORREL(X range, Y range). To calculate rs (the Spearman rank correlation coefficient), first make two new columns showing the ranks (or order) of the X and Y data (either by hand or using Excel's =RANK command), and then calculate the Pearson correlation on the rank data. It is usual to draw a scatter graph of the data whenever a correlation is being investigated.

In the illustrated example the size of breeding pairs of penguins was measured to see if there was correlation between the sizes of the two sexes. The scatter graph and both correlation coefficients clearly indicate a strong positive correlation.

In other words large females do pair with large males. Of course this doesn't say why, but it shows there is a correlation to investigate further.
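The rank-then-correlate procedure for rs can be sketched as follows. This is a minimal sketch under two assumptions: tied values receive the average of their ranks (a common convention, which is not what a plain =RANK gives for ties), and ranks run in ascending order (the direction does not change rs as long as both columns are ranked the same way). The data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ascending 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rs(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y always increases with x, but not along a straight line.
xs = [10, 20, 30, 40, 50]
ys = [1, 4, 9, 16, 100]

rs = spearman_rs(xs, ys)
r = pearson_r(xs, ys)
print(rs, r)
```

Because the ranks agree perfectly, rs reaches its maximum while the Pearson r stays below it, which is why the two coefficients can differ on curved data.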

Linear Regression: In statistics, the coefficient of determination, denoted R² and pronounced R squared, indicates how well data points fit a line or curve. It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses, on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, as the proportion of total variation of outcomes explained by the model.
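The "proportion of total variation explained" reading of R² can be made concrete with a short sketch that computes R² = 1 - SS_res / SS_tot for a least-squares line. The function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line."""
    m, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    # Variation left unexplained by the line (residual sum of squares).
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total variation of the observed outcomes about their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical illustrative data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r2 = r_squared(xs, ys)
print(f"R^2 = {r2:.4f}")
```

A value near 1 means the line leaves very little of the variation in y unexplained.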

Before attempting to fit a linear model to observed data, a modeler should first determine whether or not there is a relationship between the variables of interest. This does not necessarily imply that one variable causes the other (for example, higher SAT scores do not cause higher college grades), but that there is some significant association between the two variables.

Graphing
Graphs are a basic, but surprisingly powerful, tool for communicating information because they can identify trends that are not visible in a large group of numbers.
Organization of the data
1. The first step is to organize the data into a matrix or table consisting of rows and columns. This allows visual inspection of the data to ensure that there are no missing data points and that none of the data points are too far out of the “normal” range of the other data. One or two excessively large (or small) values (outliers) often indicate that a mistake has been made.

2. The second step is to reduce or condense the data by averaging the values for each group (i.e. calculating the mean or average).
3. The graph is generally constructed using the averaged data for each condition. An individual set of data may also be graphed to illustrate a point.
Setting up the graph
1. Two types of graphs are most commonly used to present biological data:
a. Line or scatter plots (X-Y graphs)
b. Bar graphs or histograms

2. Effective graphs need to be able to stand on their own; therefore, as a minimum they should include:
a. Title – the title must tell exactly what the graph illustrates. The usual format for a title is “The effect of X (the independent variable) on Y (the dependent variable)”.
i. Recall that the independent variable is the one controlled or manipulated by the experimenter. The dependent variable changes in response to the independent variable.
b. Axes – must be labeled with the name of the variable and the units of measurement. The data will be displayed much more effectively if the axes represent only the range of the variables (no big gaps).


Sample question: Create a scatter plot in Microsoft Excel plotting the following data from a study investigating the relationship between height and weight of pre-diabetic patients:
Height (inches): 72, 71, 70, 67, 65, 64, 64, 63, 62, 60
Weight (lb): 180, 178, 190, 150, 145, 132, 170, 120, 143, 98
Step 1: Type your data into a spreadsheet. For the scatter plot to work correctly, your data must be entered into two columns. The example shows data entered for height (column A) and weight (column B).
http://www.statisticshowto.com/articles/how-to-create-a-scatter-plot-in-microsoft-excel/

Step 2: Highlight your data. To highlight your data, left click at the top left of your data and then drag the mouse to the bottom right.

Step 3: Click the “Insert” button on the ribbon, then click “Scatter,” then click “Scatter with only markers.” Microsoft Excel will create a scatter plot from your data and display the graph next to your data in the spreadsheet.

Tip: If you want to change the data (and therefore your graph), there’s no need to redo the whole procedure. When you type new data into either column, Microsoft Excel will automatically calculate the change and instantly display the new graph.

Scatter Diagrams and Regression Lines If data is given in pairs then the scatter diagram of the data is just the points plotted on the xy-plane. The scatter plot is used to visually identify relationships between the first and the second entries of paired data.

The scatter plot below represents the age vs. size of a plant. It is clear from the scatter plot that as the plant ages, its size tends to increase. If it seems to be the case that the points follow a linear pattern well, then we say that there is a high linear correlation, while if it seems that the data do not follow a linear pattern, we say that there is no linear correlation. If the data somewhat follow a linear path, then we say that there is a moderate linear correlation.

Since the formula for calculating the correlation coefficient standardizes the variables, changes in scale or units of measurement will not affect its value. For this reason, the correlation coefficient is often more useful than a graphical depiction in determining the strength of the association between two variables.
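The claim that units do not affect the correlation coefficient can be checked with a short sketch using the height and weight values from the sample question above. The conversion factors (2.54 cm per inch, 0.45359237 kg per lb) are standard; the function name `pearson_r` is just an illustrative choice.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Height and weight data from the sample question.
height_in = [72, 71, 70, 67, 65, 64, 64, 63, 62, 60]
weight_lb = [180, 178, 190, 150, 145, 132, 170, 120, 143, 98]

# The same measurements expressed in metric units.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_imperial = pearson_r(height_in, weight_lb)
r_metric = pearson_r(height_cm, weight_kg)
print(r_imperial, r_metric)
```

Both calls give the same value up to floating-point rounding, because rescaling a variable by a positive constant scales the numerator and the denominator of the formula by the same factor.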