The Reasons for the Steps of Descriptive Statistics

Comparing the number of AMY1 (salivary amylase) genes in people from cultures with high-starch diets to people from cultures with low-starch diets. These ten people were sampled randomly from a much larger random sample.

First, a note on sampling: we take sample measurements from a population of possible measurements in an attempt to estimate parameters of the population, such as the population mean and the population standard deviation. The descriptive statistics we generate from a sample are therefore themselves estimates, and every estimate carries some error and uncertainty.
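A minimal Python sketch of this idea, assuming a simulated population (the values are generated for illustration and are not real AMY1 counts): each random sample of ten gives a slightly different estimate of the same population mean.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated AMY1 copy numbers (NOT real data).
random.seed(1)
population = [max(1, round(random.gauss(7, 2.5))) for _ in range(10_000)]
mu = statistics.fmean(population)       # population mean (the parameter)
sigma = statistics.pstdev(population)   # population standard deviation

# Draw five independent random samples of n = 10 and estimate the mean from each.
sample_means = [statistics.fmean(random.sample(population, 10)) for _ in range(5)]

print(f"population mean = {mu:.2f}, population SD = {sigma:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.2f} (error = {m - mu:+.2f})")
```

Each sample mean misses the population mean by a different amount; that sampling error is what the later steps try to quantify.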

After obtaining the sample and calculating the sample mean, we square the difference between each measurement in a group and its sample mean. Squaring does two things. First, it makes every term positive: the raw differences from the mean always sum to zero, so without squaring they would cancel out. Second, it gives big differences more weight than little ones (big differences may come from notable outliers, while little differences may be chance/accidental variation). For example, a difference of 3 becomes 9, while a difference of 0.5 shrinks to 0.25.
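The sketch below uses a small set of hypothetical copy numbers (illustrative values, not the slide's data) to show both effects: the raw deviations cancel to zero, and squaring shrinks deviations below 1 while inflating the large ones.

```python
import statistics

# Hypothetical AMY1 copy numbers for one group (illustrative values only).
group = [4, 6, 7, 7, 8, 9, 10, 12]
mean = statistics.fmean(group)  # 7.875

deviations = [x - mean for x in group]
squared = [d ** 2 for d in deviations]

# Raw deviations from the mean always sum to zero, so they cannot measure spread.
print(round(sum(deviations), 10))  # prints 0.0
# Squaring makes every term non-negative and weights large deviations more:
# 0.125 shrinks to about 0.016, while 4.125 grows to about 17.0.
for x, d, s in zip(group, deviations, squared):
    print(f"x={x:2d}  deviation={d:+.3f}  squared={s:.3f}")
print("sum of squares:", sum(squared))
```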

The sample variance is the sum of the squared differences divided by n − 1, which makes it, approximately, the average squared difference for the sample. Dividing by n − 1 instead of n makes the estimate of our experimental error (noise) slightly larger, which keeps us a little more conservative when making generalizations from our measurement sample. This is not an arbitrary trick: because the differences are measured from the sample mean rather than from the unknown population mean, the sum of squared differences is systematically a bit too small, so dividing by n would underestimate the population variance on average. The underlying reason is that when we calculated the sample mean, we lost a degree of freedom: once the mean is fixed, only n − 1 of the differences are free to vary.
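A small simulation can check the n − 1 argument. This sketch assumes a normal population with a known variance of 4.0 (an illustrative choice, not from the slides) and averages both versions of the variance over many samples of size 5.

```python
import random
import statistics

# Assumed normal population with SD 2.0, so the true variance is 4.0.
random.seed(42)
TRUE_SD = 2.0
n, trials = 5, 50_000

sum_div_n, sum_div_n_minus_1 = 0.0, 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, TRUE_SD) for _ in range(n)]
    m = statistics.fmean(sample)
    ss = sum((x - m) ** 2 for x in sample)   # sum of squared deviations
    sum_div_n += ss / n                      # divide by n
    sum_div_n_minus_1 += ss / (n - 1)        # divide by n - 1

avg_n = sum_div_n / trials
avg_n1 = sum_div_n_minus_1 / trials
print(f"true variance         = {TRUE_SD ** 2:.3f}")
print(f"average of SS / n     = {avg_n:.3f}  (theory: {(n - 1) / n * TRUE_SD ** 2:.3f})")
print(f"average of SS / (n-1) = {avg_n1:.3f}")
```

Dividing by n lands near 3.2, a consistent underestimate; dividing by n − 1 lands near the true value of 4.0.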

Next, we calculate the sample standard deviation by taking the square root of the sample variance. Taking the square root returns the measure of spread to the units of measurement we started with: in this case, the number of genes a person has, instead of genes² (genes squared).
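Putting the steps so far together, a minimal sketch with hypothetical copy numbers (illustrative only) computes the variance by hand, takes its square root, and checks both against Python's `statistics` module, which also uses the n − 1 denominator.

```python
import math
import statistics

# Hypothetical AMY1 copy numbers (illustrative values only).
group = [4, 6, 7, 7, 8, 9, 10, 12]
n = len(group)
mean = statistics.fmean(group)

variance = sum((x - mean) ** 2 for x in group) / (n - 1)  # units: genes squared
sd = math.sqrt(variance)                                   # units: genes

# The standard library's sample statistics agree with the hand calculation.
assert math.isclose(variance, statistics.variance(group))
assert math.isclose(sd, statistics.stdev(group))
print(f"variance = {variance:.3f} genes^2, SD = {sd:.3f} genes")
```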

The equation for the sample standard error of the mean (SEM) follows from the relationship between two kinds of dispersion: the dispersion of individual observations around the population mean (the standard deviation) and the dispersion of sample means around the population mean (the standard error). If many samples of the same size n are taken from a population, the mean of those sample means converges on the actual population mean. The standard deviation of those sample means equals the population's true standard deviation divided by the square root of the sample size, σ/√n, so it shrinks as the size of each sample increases. So to estimate the SEM from a single sample, you take the standard deviation of the sample (your estimate of the population's standard deviation) and divide by √n: SEM = s/√n. See the figures on the next slide.
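The σ/√n relationship can be checked by simulation. This sketch assumes a normal population with mean 7 and SD 2.5 (illustrative values), draws many samples of each size, and compares the spread of the sample means with σ/√n; the last lines show the practical estimate s/√n from one sample.

```python
import math
import random
import statistics

# Assumed normal population (mean 7, SD 2.5); simulated values, not real AMY1 data.
random.seed(7)
POP_MEAN, POP_SD = 7.0, 2.5
REPEATS = 5_000   # number of samples drawn for each sample size

results = {}
for n in (4, 16, 64):
    means = [statistics.fmean([random.gauss(POP_MEAN, POP_SD) for _ in range(n)])
             for _ in range(REPEATS)]
    observed = statistics.stdev(means)    # spread of the sample means
    predicted = POP_SD / math.sqrt(n)     # sigma / sqrt(n)
    results[n] = (observed, predicted)
    print(f"n={n:3d}  SD of sample means={observed:.3f}  sigma/sqrt(n)={predicted:.3f}")

# In practice sigma is unknown, so the SEM is estimated from one sample as s / sqrt(n).
one_sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(10)]
sem_estimate = statistics.stdev(one_sample) / math.sqrt(len(one_sample))
print(f"estimated SEM from one sample of 10: {sem_estimate:.3f}")
```

Quadrupling n halves the spread of the sample means, as σ/√n predicts.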

Recall that the sample mean, the sample standard deviation, and the sample standard error of the mean are all estimates of the corresponding quantities for the actual population. The graphs below show what happens to these estimates as the sample size increases. From: Krzywinski, M. & N. Altman. (2013). Points of significance: Importance of being uncertain. Nature Methods 10:809-810.
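As a rough stand-in for those graphs, this sketch (assuming a normal population with mean 7 and SD 2.5, chosen only for illustration) grows one random sample and recomputes the three estimates. The mean and SD settle near their population values, while the SEM keeps shrinking.

```python
import math
import random
import statistics

# Assumed normal population (mean 7, SD 2.5); simulated data for illustration.
random.seed(3)
data = [random.gauss(7.0, 2.5) for _ in range(1000)]

rows = []
for n in (5, 10, 50, 100, 500, 1000):
    sample = data[:n]   # first n observations of one growing random sample
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    sem = sd / math.sqrt(n)
    rows.append((n, mean, sd, sem))
    print(f"n={n:4d}  mean={mean:.2f}  SD={sd:.2f}  SEM={sem:.3f}")
```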

When the sample size is large (n ≥ 20), the 95% confidence interval (CI) extends roughly 2 × SEM on either side of the mean. When the sample size is smaller than 20, the 95% CI becomes wider than ±2 × SEM (Fig. b). This is because the actual method for calculating the 95% CI uses a critical value from the t distribution, which grows as the sample size shrinks (about 2.09 for n = 20 and 2.26 for n = 10). Fig. a: 95% CIs are expected to capture the true population mean about 19 out of every 20 times (n = 10 in this example). Fig. b: the relationship between 95% CIs and SEM for increasing sample sizes. From: Krzywinski, M. & N. Altman. (2013). Points of significance: Error bars. Nature Methods 10:921-922.
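The "19 out of every 20" claim can be checked by simulation. This sketch assumes a normal population (mean 7, SD 2.5, illustrative values) and n = 10, and uses the two-sided 95% t critical value for 9 degrees of freedom, 2.262, taken from a standard t table. It also shows that a plain ±2 × SEM interval captures the true mean less often at this sample size.

```python
import math
import random
import statistics

# Assumed normal population; t critical value for df = 9 from a standard t table.
random.seed(11)
POP_MEAN, POP_SD, n, trials = 7.0, 2.5, 10, 20_000
T_CRIT_DF9 = 2.262

hits_t = hits_2 = 0
for _ in range(trials):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    m = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    hits_t += (m - T_CRIT_DF9 * sem) <= POP_MEAN <= (m + T_CRIT_DF9 * sem)
    hits_2 += (m - 2 * sem) <= POP_MEAN <= (m + 2 * sem)

coverage_t = hits_t / trials
coverage_2 = hits_2 / trials
print(f"mean ± 2.262 SEM captured the true mean in {coverage_t:.1%} of samples")
print(f"mean ± 2 SEM captured the true mean in {coverage_2:.1%} of samples")
```

Expect roughly 95% for the t-based interval and a few points less for ±2 × SEM; any single interval either captured the true mean or did not.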

Error bars are not intended to let us decide whether two means are significantly different from each other; they simply show the uncertainty of a sample mean. However, when we compare the relative uncertainty of two sample means, error bars can lead us to hypothesize that the two means may be significantly different. Fig. 1: error bar width, and the interpretation of the spacing between bars, depend on the error bar type (n = 10 in both a and b). Fig. 2: size and position of SEM and 95% CI error bars for different p-values (n = 10 in all cases). From: Krzywinski, M. & N. Altman. (2013). Points of significance: Error bars. Nature Methods 10:921-922.
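To see why the error bar type matters, this sketch computes SD, SEM, and 95% CI half-widths for the same hypothetical sample of ten values (illustrative only), again assuming the table value 2.262 for df = 9. The same data produce three very different bar lengths.

```python
import math
import statistics

# Hypothetical sample of n = 10 values (illustrative only).
sample = [5, 6, 6, 7, 7, 8, 8, 9, 10, 11]
n = len(sample)
mean = statistics.fmean(sample)
sd = statistics.stdev(sample)
sem = sd / math.sqrt(n)
T_CRIT_DF9 = 2.262   # two-sided 95% t critical value for df = 9 (standard t table)

half_widths = {"SD": sd, "SEM": sem, "95% CI": T_CRIT_DF9 * sem}
for label, hw in half_widths.items():
    print(f"{label:>6} bar: {mean:.2f} ± {hw:.2f}")
```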

A note on interpreting 95% CI and SEM error bars

Incorrect:
- For 95% CIs: “I am 95% confident that the true mean lies somewhere within the error bars.”
- For SEM: “I am 68% confident that the true mean lies somewhere within the error bars.”
- “If the 95% error bars do not overlap, the means are significantly different.”

Correct:
- For 95% CIs: “The true population mean should be captured by the error bars 95% of the time.”
- For SEM: “The true population mean should be captured by the error bars roughly 68% of the time.”
- “My error bars either captured the true population mean or they didn’t; I can’t be sure.”
- “If the error bars do not overlap, the means may or may not be significantly different; an additional statistical test (the t-test in this case) is required for more confidence.”

From Strode and Brokaw (2015). HHMI Teacher’s Guide: Mathematics and Statistics in Biology.
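The reverse case also shows why a test is needed: 95% CIs can overlap while a t-test still finds a significant difference. This sketch uses two hypothetical groups summarized by mean and SEM (n = 10 each; numbers chosen for illustration) and Welch's two-sample t-test, with critical values 2.262 (df = 9) and 2.101 (df = 18) taken from a standard t table.

```python
import math

# Two hypothetical groups summarized by mean and SEM (n = 10 each).
n = 10
mean_a, sem_a = 10.0, 1.0
mean_b, sem_b = 13.0, 1.0
T_CRIT_DF9 = 2.262    # 95% CI for each group (df = 9), standard t table
T_CRIT_DF18 = 2.101   # two-sided alpha = 0.05 for df = 18, standard t table

ci_a = (mean_a - T_CRIT_DF9 * sem_a, mean_a + T_CRIT_DF9 * sem_a)
ci_b = (mean_b - T_CRIT_DF9 * sem_b, mean_b + T_CRIT_DF9 * sem_b)
overlap = ci_a[1] > ci_b[0]

# Welch's two-sample t statistic computed from summary statistics.
se_diff = math.sqrt(sem_a ** 2 + sem_b ** 2)
t_stat = (mean_b - mean_a) / se_diff
# Welch-Satterthwaite degrees of freedom (18 here, because n and SEM are equal).
df = se_diff ** 4 / (sem_a ** 4 / (n - 1) + sem_b ** 4 / (n - 1))

print(f"CI A = {ci_a[0]:.2f} to {ci_a[1]:.2f}, CI B = {ci_b[0]:.2f} to {ci_b[1]:.2f}, overlap: {overlap}")
print(f"t = {t_stat:.3f} on {df:.0f} df; critical value = {T_CRIT_DF18}")
print("significant at 0.05:", abs(t_stat) > T_CRIT_DF18)
```

Here the intervals overlap, yet t ≈ 2.12 exceeds 2.101, so the difference is significant at the 0.05 level. Overlap alone settles nothing either way.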