Standard Deviation and Standard Error Tutorial

This is very important. Get out your AP Equations and Formulas sheet.

The Basics
Let’s start with a review of the basics of statistics.
Mean: What most people consider the “average”: the sum of all scores divided by the number of scores. The mean is a good measure of the center for normally distributed data.
Median: The middle value when the data are put in order. If you have an even number of values, it’s the mean of the two middle values. The median is a better measure of the center for data that are not normally distributed (for example, skewed data).
Mode: The most frequently seen value in the data. If no value repeats, the data set has no mode (the mode is not 0).
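A minimal Python sketch of the three measures, using the standard library’s `statistics.mean` and `statistics.median`. The mode uses a small hypothetical helper, `mode_or_none`, because `statistics.mode` (Python 3.8+) simply returns the first value when nothing repeats instead of reporting that there is no mode. The plant heights reuse the values from the variance example in this tutorial; the quiz points are made up.

```python
from collections import Counter
from statistics import mean, median


def mode_or_none(data):
    """Return the most frequent value, or None when no value repeats.

    Hypothetical helper: if several values tie for most frequent,
    the first one encountered is returned.
    """
    counts = Counter(data)
    value, count = counts.most_common(1)[0]
    return value if count > 1 else None


plant_heights = [10, 7, 6, 8, 9]  # cm; no value repeats
quiz_points = [2, 3, 3, 5]        # made-up data; 3 appears twice

print(mean(plant_heights), median(plant_heights), mode_or_none(plant_heights))
print(mean(quiz_points), median(quiz_points), mode_or_none(quiz_points))
```

For the plant heights, the mean and median are both 8 and there is no mode; for the quiz points, the single repeated value 3 is the mode.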

Data Distribution
Feast your eyes on this data and try to get a rough sense of how a histogram (frequency chart) would look. Where would the peak be?

Distribution Chart of Heights of 100 Control Plants
Height of plants (cm) | Number of plants
0.0-0.9 | 3
1.0-1.9 | 10
2.0-2.9 | 21
3.0-3.9 | 30
4.0-4.9 | 20
5.0-5.9 | 14
6.0-6.9 | 2
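A quick way to check your guess is to print a text histogram. This Python sketch copies the counts from the chart of 100 control plants, draws one `#` per plant, and reports the bin with the largest count (the peak).

```python
# Counts copied from the chart of heights of 100 control plants.
bins = ["0.0-0.9", "1.0-1.9", "2.0-2.9", "3.0-3.9", "4.0-4.9", "5.0-5.9", "6.0-6.9"]
counts = [3, 10, 21, 30, 20, 14, 2]

# Draw one '#' per plant so the shape of the distribution is visible.
for label, count in zip(bins, counts):
    print(f"{label} cm | {'#' * count} ({count})")

# The peak is the bin with the largest count.
peak_count = max(counts)
peak_bin = bins[counts.index(peak_count)]
print("Total plants:", sum(counts))
print("Peak bin:", peak_bin, "cm")
```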

Data Distribution
This is an approximately normal distribution, also known as a bell curve. Most individuals are “medium,” with fewer and fewer plants toward the very short and very tall ends.

Abnormal Distribution?
Human height follows a fairly normal distribution. The average U.S. woman (age 20+) is 5’ 4”, and the average U.S. man (age 20+) is 5’ 9.5”. Because a normal distribution is symmetric, about 50% of people are at or above average and 50% are at or below average. What, then, is not a normal distribution? Imagine if most women were 5’ 4” but no one were taller. That’s not a normal distribution, and it won’t be a bell curve.

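Abnormal Distribution
The same goes for test scores. A class average of 80% doesn’t mean the scores are normally distributed. Scores can’t go above 100%, so they often bunch up near the top with a long tail of low scores (a skewed distribution). A few very low scores pull the mean down much more than the median, which is why the median is often better than the mean for test scores. Imagine if the average were 100%: every score would be 100%, definitely not a normal distribution.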
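A small Python illustration, using made-up scores (hypothetical, not from a real class), of how one very low score in a skewed set pulls the mean down while the median stays near the typical score.

```python
from statistics import mean, median

# Hypothetical class scores (percent): most students did well, one did poorly.
scores = [95, 92, 90, 88, 85, 40]

print(f"mean   = {mean(scores):.1f}%")
print(f"median = {median(scores):.1f}%")
```

Here the mean (about 81.7%) ends up below five of the six scores, while the median (89%) sits in the middle of the group.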

Back to Standard Deviation/Error
Suppose two students take a test. One gets 100% and one gets 0%. What’s the mean? 50%. Now suppose instead that both students get 50%. The mean is again 50%, but we got there very differently, and that difference could tell us a lot about the test. Variance measures how spread out the data are: roughly, the average squared difference between each value and the mean.
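A minimal Python sketch comparing the two pairs of test scores from the example. It uses `statistics.variance`, which divides by n - 1 as in the sample variance formula used in this tutorial.

```python
from statistics import mean, variance

spread_out = [100, 0]  # one student aces the test, one scores zero
identical = [50, 50]   # both students score 50%

for name, scores in [("spread_out", spread_out), ("identical", identical)]:
    # Same mean, very different spread.
    print(name, "mean:", mean(scores), "variance:", variance(scores))
```

Both pairs have a mean of 50, but the variance is 5000 for the 100/0 pair and 0 for the 50/50 pair.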

Variance
Variance is given by the symbol s². A high variance indicates a lot of deviation from the mean. A low variance indicates relatively stable (consistent) values.

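Calculating Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Σ means “sum of”: you perform the operation in the numerator for each number in the data set and add up the results. x_i is an individual number in your data set. x̄ (read: “x bar”) is the mean of your data. n is your sample size. Dividing by n - 1 rather than n is the usual convention for a sample.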

Variance Example
Let’s try calculating the variance:
Plant | Height, x_i (cm) | Deviation from mean, (x_i - x̄) | Squared deviation, (x_i - x̄)²
A | 10 | 2 | 4
B | 7 | -1 | 1
C | 6 | -2 | 4
D | 8 | 0 | 0
E | 9 | 1 | 1
Mean x̄ = 8 cm. Σ(x_i - x̄)² = 10. Divided by n - 1: s² = 10 / (5 - 1) = 2.5 cm².
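The same calculation, written as a Python sketch that follows the table column by column (deviation, squared deviation, sum, divide by n - 1) and then cross-checks the result with `statistics.variance`.

```python
from statistics import variance

heights = {"A": 10, "B": 7, "C": 6, "D": 8, "E": 9}  # plant heights in cm

n = len(heights)
x_bar = sum(heights.values()) / n  # mean height

sum_sq = 0.0
for plant, x_i in heights.items():
    deviation = x_i - x_bar          # (x_i - x_bar)
    sum_sq += deviation ** 2         # add (x_i - x_bar)^2 to the running sum
    print(f"{plant}: x_i={x_i}, deviation={deviation:+.0f}, squared={deviation ** 2:.0f}")

s_squared = sum_sq / (n - 1)  # divide by n - 1 for a sample
print("mean =", x_bar, "| sum of squares =", sum_sq, "| variance =", s_squared)

# Cross-check against the standard library, which also divides by n - 1.
assert abs(s_squared - variance(list(heights.values()))) < 1e-12
```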

Whoo, variance! Now what?
The standard deviation is simply the square root of the variance, so its symbol is s. In our example, s² (variance) is 2.5 cm², so s (standard deviation) is √2.5 ≈ 1.58 cm. You may be asking why we bother computing this statistic if variance seems to do the same thing. One reason is units: variance is in squared units (cm²), while the standard deviation is in the same units as the data (cm). The bigger reason is that we can use it to make inferences and statements about the data, in the same way we used chi-squared tables to make inferences about the role of chance.
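A short Python check of the square-root step, assuming the variance of 2.5 cm² from the worked example; `statistics.stdev` should give the same sample standard deviation directly from the five heights.

```python
import math
from statistics import stdev

heights_cm = [10, 7, 6, 8, 9]

s_squared = 2.5           # sample variance from the worked table (cm^2)
s = math.sqrt(s_squared)  # standard deviation, back in cm
print(f"s = {s:.2f} cm")

# statistics.stdev computes the same sample standard deviation (n - 1 denominator).
print(f"stdev = {stdev(heights_cm):.2f} cm")
```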

Standard Deviation (SD) Inferences
If you assume a normal distribution of data, 68.27% of the data is within 1 SD of the mean; values in this range are typical, with no real difference from average. 95.45% of the data is within 2 SD; a value outside this range is unusual (only about 1 in 22 values) and may be an outlier. 99.73% of the data is within 3 SD; a value outside this range is rare (only about 3 in 1,000 values) and is very likely an outlier.
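Where do 68.27%, 95.45%, and 99.73% come from? For a normal distribution, the fraction of values within k standard deviations of the mean is erf(k/√2). A minimal Python sketch using `math.erf`:

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {100 * fraction_within(k):.2f}%")
```

Subtracting from 100% gives the share outside each range: about 4.55% beyond 2 SD and about 0.27% beyond 3 SD.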

Standard Deviation (SD) Inferences
Suppose the average height of a population is 6 feet (SD = 0.5 feet). If the population is normally distributed:
68.27% of the population is between 5.5’ and 6.5’ (mean ± 1 SD).
95.45% of the population is between 5’ and 7’ (mean ± 2 SD).
99.73% of the population is between 4.5’ and 7.5’ (mean ± 3 SD).
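The same intervals in Python, as a sketch using `statistics.NormalDist` (available in Python 3.8+) with the example’s mean of 6 ft and SD of 0.5 ft. The cumulative distribution function (`cdf`) confirms the share of the population inside each interval.

```python
from statistics import NormalDist

mean_ft, sd_ft = 6.0, 0.5
heights = NormalDist(mu=mean_ft, sigma=sd_ft)

for k in (1, 2, 3):
    low, high = mean_ft - k * sd_ft, mean_ft + k * sd_ft
    # Share of the population between low and high.
    share = heights.cdf(high) - heights.cdf(low)
    print(f"{k} SD: {low:.1f} ft to {high:.1f} ft -> {100 * share:.2f}% of the population")
```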

Standard Deviation
The standard deviation, together with the mean and variance, allows us to learn something about an entire population from just a sample, assuming the data are normally distributed. For example, if we measured the heights of a sample of pro basketball players, we could generalize from our sample to the entire NBA. Key: the more players we measure (and the more samples, and therefore means, we collect), the closer our estimate will tend to get to the actual mean of the entire league.

Standard Error
The standard error of the mean (SEM), or just the standard error, tells us how far our sample mean is likely to be from the true population mean purely because of chance, that is, because of which individuals happened to end up in our sample. In that sense it plays a role a little like χ². Example: consider the NBA player height survey. We could sample 10 players, find their average height, and calculate the standard deviation from that sample. If we kept sampling 10 players over and over again, the mean of all our calculated means would get closer and closer to the true mean. The standard error helps us estimate how close a single calculated mean is likely to be to the true mean, even without knowing the true mean.
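The repeated-sampling idea can be simulated. This Python sketch builds a hypothetical league of 450 players whose heights are drawn from a normal distribution (the mean of 78 in, SD of 3.5 in, and league size are made-up values, not real NBA data), surveys 10 random players 5,000 times, and measures how spread out the resulting sample means are. That spread is what the standard error estimates, so it should come out close to SD/√n.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical league: heights (inches) drawn from a normal distribution.
# TRUE_MEAN, TRUE_SD, and the league size are made-up values for illustration.
TRUE_MEAN, TRUE_SD = 78.0, 3.5
league = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(450)]

n = 10          # players per survey
repeats = 5000  # how many surveys we run

# Each survey: pick n different players at random and record their mean height.
sample_means = [mean(random.sample(league, n)) for _ in range(repeats)]

print("true league mean:     ", round(mean(league), 2))
print("mean of sample means: ", round(mean(sample_means), 2))
print("SD of sample means:   ", round(stdev(sample_means), 2))
print("SD / sqrt(n) estimate:", round(stdev(league) / n ** 0.5, 2))
```

The mean of the sample means lands very close to the true league mean, and the SD of the sample means comes out close to the SD/√n estimate.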

Standard Error
Put another way: if we survey 10 players, that’s a small sample. Is it likely that those 10 players represent the whole league well? Probably not. If we survey 300 players, that’s a large sample. No sample is perfect, but those 300 players are much more likely to represent the league closely.

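Standard Error
The formula for standard error should now make sense:
$$SE = \frac{s}{\sqrt{n}}$$
where s is the standard deviation and n is the sample size. Because n is under the square root in the denominator, a larger sample gives a smaller standard error. The closer the standard error is to 0, the more precisely our sample mean estimates the population mean.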
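A minimal Python sketch of the formula. The hypothetical helper `standard_error` divides the sample standard deviation by √n; it is applied to the five plant heights from the variance example, and then to the 10-player versus 300-player surveys, assuming (hypothetically) the same SD of 3.5 in for both.

```python
import math
from statistics import stdev


def standard_error(data):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(data) / math.sqrt(len(data))


plant_heights_cm = [10, 7, 6, 8, 9]
print(f"SE of plant heights = {standard_error(plant_heights_cm):.2f} cm")

# Same spread, different sample sizes: the larger survey gives a smaller SE.
s = 3.5  # hypothetical SD of player heights, inches
for n in (10, 300):
    print(f"n = {n:3d}: SE = {s / math.sqrt(n):.2f} in")
```

With the same spread, the 300-player survey’s standard error is √30 ≈ 5.5 times smaller than the 10-player survey’s.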

Standard Error vs. Standard Deviation
Key: Standard deviation describes how far the individual data points typically are from the sample’s mean. Think of how much an individual NBA player’s height differs from the average of the players we surveyed.
Key: Standard error describes how far the sample’s mean is likely to be from the actual population mean. Think of how much our surveyed players’ mean height differs from the true mean height of all players in the league.

One last way to understand this…
Remember the potato cores? You can calculate the average potato core mass, but that doesn’t tell you how consistent the masses were. That’s why we have standard deviation. Once you have a mean for your samples, it also doesn’t tell you whether your set of potato cores was representative of all the cores I was slicing. That’s why we have standard error.

Standard Error vs. Standard Deviation
Interpreting data: Generally you want a low standard deviation, which means your underlying data set is more consistent. Why is that important? You definitely want a low standard error. How can we minimize standard error? Since SE = s/√n, there are two ways: have a low standard deviation (out of our control) or have a large sample size (in our control). Because of the square root, quadrupling the sample size halves the standard error.

Confidence Intervals & Error Bars
In addition to the inferences about data from before (68% within 1 SD, et cetera), we can also make inferences using the SEM, and these are more important for biology. Traditionally, 95% is the confidence level we need in our data (just like in chi-squared analyses). Using the SEM, we can build a 95% confidence interval around each mean and draw it on a graph as error bars. Let’s take a closer look.

Confidence Intervals & Error Bars
Suppose you want to see if Central Bucks HS students are significantly taller than Council Rock HS students. You can’t do a χ² analysis because there’s no “expected” value. So you measure a sample of students from each district (you can’t measure all of them; that’d take forever) and find the mean. You get the SD and SEM as shown:
District | Mean | Standard Deviation | Standard Error
Council Rock | 72 in. | 6 in. | 1.90 in.
Central Bucks | 80 in. | 4 in. | 1.26 in.
Let’s graph the means.
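Where do the standard errors in the table come from? The lesson doesn’t state the sample size, but SE = s/√n with n = 10 students per district reproduces both values, so this Python sketch assumes n = 10 (an assumption, not given in the original).

```python
import math

# Summary statistics from the district table. The sample size is NOT given in
# the lesson; n = 10 per district is an assumption that reproduces the SE column.
districts = {
    "Council Rock": {"mean": 72, "sd": 6},
    "Central Bucks": {"mean": 80, "sd": 4},
}
n = 10

for name, stats in districts.items():
    se = stats["sd"] / math.sqrt(n)  # SE = s / sqrt(n)
    print(f"{name}: mean = {stats['mean']} in, SD = {stats['sd']} in, SE = {se:.2f} in")
```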

[Figure: bar graph, “Mean Height of High School Students”; y-axis: Height (in); x-axis: District (Council Rock, Central Bucks).]

Confidence Intervals & Error Bars
Okay, now let’s figure out a 95% confidence interval. The 95% confidence interval is traditionally taken as about ± 2 SEM around the mean (the exact multiplier for a normal distribution is 1.96). Using the standard errors from the table:
C. Rock = 72 in ± 3.80 in (since 1.90 in × 2 = 3.80 in), or 68.20 in to 75.80 in.
C. Bucks = 80 in ± 2.52 in (since 1.26 in × 2 = 2.52 in), or 77.48 in to 82.52 in.
Now let’s draw the intervals on the graph.
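The interval arithmetic and the overlap check, as a Python sketch. It uses this lesson’s ± 2 SE rule of thumb and a hypothetical helper, `overlaps`, that tests whether two intervals share any value.

```python
# Mean and standard error (inches) for each district, taken from the table.
summary = {
    "Council Rock": (72.0, 1.90),
    "Central Bucks": (80.0, 1.26),
}

intervals = {}
for name, (m, se) in summary.items():
    # Rule of thumb used in this lesson: 95% CI is roughly mean +/- 2 SE.
    intervals[name] = (m - 2 * se, m + 2 * se)
    low, high = intervals[name]
    print(f"{name}: {m:.0f} in +/- {2 * se:.2f} in -> ({low:.2f}, {high:.2f})")


def overlaps(a, b):
    """True if closed intervals a and b share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


cr, cb = intervals["Council Rock"], intervals["Central Bucks"]
print("Intervals overlap:", overlaps(cr, cb))
```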

[Figure: the same “Mean Height of High School Students” bar graph, now with error bars; y-axis: Height (in); x-axis: District (Council Rock, Central Bucks).]
The error bars are the 95% confidence intervals. Since they don’t overlap between the districts, there is probably a significant difference between the mean heights of the two.

Confidence Interval “Frame of Mind”
When you construct a graph with confidence intervals and find that they do overlap, it suggests the difference is not significant (a null result). It’s possible that the real average height of ALL Council Rock students is actually equal to that of ALL Central Bucks students, and the difference we measured is just sampling error. In other words, there is some average height, within both confidence intervals, that could make the two districts equal. If there is no overlap, it suggests the difference is significant.

Practice: Standard Deviation and Standard Error (Procedural Practice)

Practice
How else are we going to practice standard deviation and standard error? With your data! Find the potato core masses you recorded in your lab notebooks. With your lab group, calculate the standard deviation and standard error for your data set. See why I had you measure the masses of the cores individually?

Practice
Calculate the standard deviation: What is the SD of your set of three cores before the study, and the SD of your three cores afterward?
Calculate the standard error: For each set of data, how close is your average potato core mass likely to be to the actual average mass of all the slices I cut for our lab? No error bars needed.
Last key note: your units for SD and SE match the units of the mean (here, grams).
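If you want to check your hand calculations, here is a minimal Python sketch. The masses below are made-up placeholders, not real lab data; replace them with the three before and three after masses from your notebook. The hypothetical helper `summarize` returns the mean, SD (n - 1 denominator), and SE, all in grams.

```python
import math
from statistics import mean, stdev


def summarize(masses_g):
    """Return (mean, SD, SE) for a list of core masses in grams."""
    m = mean(masses_g)
    s = stdev(masses_g)                # sample SD (n - 1 denominator)
    se = s / math.sqrt(len(masses_g))  # standard error of the mean
    return m, s, se


# Made-up example masses (grams); replace with the values from your notebook.
before = [2.0, 2.2, 2.4]
after = [1.6, 1.8, 2.3]

for label, masses in (("before", before), ("after", after)):
    m, s, se = summarize(masses)
    print(f"{label}: mean = {m:.2f} g, SD = {s:.2f} g, SE = {se:.2f} g")
```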