Statistical Significance of Data


Box and Whisker Plot

A box-and-whisker plot can be useful when dealing with many data values. Rather than showing all of the data, it displays five statistics, known together as the five-number summary:
• Median
• Lower quartile (Q1)
• Upper quartile (Q3)
• Minimum
• Maximum

Make a Box and Whisker Plot from these numbers: 54 68 18 93 87 27 100 91 52 85 34 61 56 78 82
1. Put the numbers in numerical order.
2. Find the Minimum (the smallest value of the entire set).
3. Find the Maximum (the largest value of the entire set).
4. Find the Median (the number in the middle of the ordered set of numbers).*
5. Find the Lower Quartile (Q1): take the numbers to the left of the median and find their median.
6. Find the Upper Quartile (Q3): take the numbers to the right of the median and find their median.
7. Draw a box to represent the IQR (interquartile range) and solve: Q3 – Q1 = IQR
* If you are finding the median of an even number of values, find the two middle numbers, add them together, and divide by two.
(Number line for the plot: 10 to 100, in steps of 10.)

Make a Box and Whisker Plot: 54 68 18 93 87 27 100 91 52 85 34 61 56 78 82
1. Put the numbers in numerical order: 18 - 27 - 34 - 52 - 54 - 56 - 61 - 68 - 78 - 82 - 85 - 87 - 91 - 93 - 100
2. Find the Minimum (the smallest value of the entire set): 18
3. Find the Maximum (the largest value of the entire set): 100
4. Find the Median (the number in the middle of the ordered set of numbers): 68
5. Lower Quartile (Q1): the median of the seven numbers to the left of the median (18, 27, 34, 52, 54, 56, 61): 52
6. Upper Quartile (Q3): the median of the seven numbers to the right of the median (78, 82, 85, 87, 91, 93, 100): 87
7. Draw a box to represent the IQR (interquartile range) and solve: Q3 – Q1 = IQR, so 87 – 52 = 35
(Number line for the plot: 10 to 100, in steps of 10.)

Make a Box and Whisker Plot: 46 39 10 48 46 45 51 42 49
1. Put the numbers in numerical order: 10 - 39 - 42 - 45 - 46 - 46 - 48 - 49 - 51
2. Find the Minimum (the smallest value of the entire set): 10
3. Find the Maximum (the largest value of the entire set): 51
4. Find the Median (the number in the middle of the ordered set of numbers): 46
5. Lower Quartile (Q1): the median of the numbers to the left of the median: (39 + 42)/2 = 40.5
6. Upper Quartile (Q3): the median of the numbers to the right of the median: (48 + 49)/2 = 48.5
7. Draw a box to represent the IQR (interquartile range) and solve: Q3 – Q1 = IQR, so 48.5 – 40.5 = 8
8. Is 10 an outlier that should be ignored? Multiply 1.5 × IQR = 1.5 × 8 = 12; then Q1 – 12 = 40.5 – 12 = 28.5. Since 10 is well below 28.5, it is an outlier. Is 51 an outlier? Q3 + 12 = 48.5 + 12 = 60.5. Since 51 is below 60.5, it is NOT an outlier.
Note: when finding the upper and lower quartiles, include the minimum and maximum, but do not include the median itself.
Because 10 is an outlier, change the minimum to the next smallest number (39), draw the whisker to 39, and mark 10 separately with an asterisk (*).
(Number line for the plot: 10 to 60, in steps of 10.)
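The steps in this example can be sketched in Python. This is only an illustration, not part of the original slides: the helper names `median_of_sorted` and `five_number_summary` are made up here, and the quartiles are found by taking the median of each half with the overall median left out, as the example describes.

```python
def median_of_sorted(values):
    """Median of a list that is already in numerical order."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    # Even count: average the two middle numbers.
    return (values[mid - 1] + values[mid]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # numbers to the left of the median
    upper = ordered[half + (n % 2):]  # numbers to the right of the median
    return (ordered[0], median_of_sorted(lower), median_of_sorted(ordered),
            median_of_sorted(upper), ordered[-1])


data = [46, 39, 10, 48, 46, 45, 51, 42, 49]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr    # anything below this is treated as an outlier
high_fence = q3 + 1.5 * iqr   # anything above this is treated as an outlier
outliers = [x for x in data if x < low_fence or x > high_fence]

print("Q1 =", q1, "Median =", med, "Q3 =", q3, "IQR =", iqr)
print("Outlier limits:", low_fence, "to", high_fence)
print("Outliers:", outliers)
```

With this data set the only value outside the limits is 10, which matches the hand calculation.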

Dot Plot …can be useful when trying to find patterns, trends, or clusters of data. In some cases it may show a discrepancy in data that could be due to unavoidable error or avoidable/user error.

Using this data, create a: 1. Box and Whisker Plot 2. Dot Plot

Box and Whisker Plot
1) Ordered data: 1 3 4 4 4 5 5 6 7 7 7 7 8 10 10 11 11 12 13 22
2) Median = (7 + 7)/2 = 7 grams of sugar/serving
3) Lowest = 1 gram of sugar/serving
4) Highest = 22 grams of sugar/serving
5) Lower Quartile = (4 + 5)/2 = 4.5 grams of sugar/serving
6) Upper Quartile = (10 + 11)/2 = 10.5 grams of sugar/serving

Dot Plot
(Number line for the dot plot: 5 to 25, in steps of 5.)
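A dot plot of the same sugar data can be sketched as text. The snippet below is a rough illustration only (the function name `dot_plot_rows` is hypothetical): it prints one row per whole-number value and one asterisk per observation, so clusters and gaps are easy to see.

```python
from collections import Counter

# Grams of sugar per serving, from the ordered list in this example.
sugar = [1, 3, 4, 4, 4, 5, 5, 6, 7, 7, 7, 7, 8, 10, 10, 11, 11, 12, 13, 22]


def dot_plot_rows(values):
    """Build one text row per value on the number line, one '*' per observation."""
    counts = Counter(values)
    return [f"{x:>2} | " + "*" * counts[x]
            for x in range(min(values), max(values) + 1)]


rows = dot_plot_rows(sugar)
print("\n".join(rows))
```

In the printed plot, most values cluster between 3 and 13, while 22 sits alone after a long run of empty rows.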

Chi-Squared

Introduction: The Chi-Square test (χ²) is often used in science to test whether the data you observe in an experiment match the data you would expect from the experiment. Calculating a χ² value allows you to determine whether the difference between observed and expected results can be attributed to random chance. If the data differ greatly from the expected values and the difference is not due to chance, other factors must be influencing your results.

Objectives:
• Determine the degrees of freedom (df) for this investigation: the number of categories or classes minus 1 (df = n – 1).
• Calculate the χ² value for a given set of data: $\chi^2 = \sum \frac{(\text{observed value} - \text{expected value})^2}{\text{expected value}}$
• Use the Chi-Square Table to determine whether the calculated value is less than or equal to the critical value.
• Determine whether the Chi-Square value exceeds the critical value and whether the null hypothesis is accepted or rejected.

Biologists generally use a probability value of 0.05 (p = 0.05) in a Chi-Square Table. That p-value means that, if only chance were at work, a difference this large would occur fewer than 1 time in 20. When a result is significant at this level, it is like saying you are 95% certain that the results are not due to random chance.

The degrees of freedom are the number of choices (n) minus 1: df = n – 1. The p-value and the degrees of freedom together determine the critical value.

If the chi-square value ≤ the critical value, the null hypothesis is accepted as statistically reasonable. If the chi-square value > the critical value, the difference is "statistically significant": the results are "unlikely to have occurred by chance," so the null hypothesis is rejected.

Difference Between Null Hypothesis and Experimental Hypothesis

The null hypothesis is the hypothesis that the dependent variable in an experiment is not affected by the independent variable. The experimental hypothesis is that the dependent variable is affected by the independent variable.

For example, suppose you are testing whether playing violent video games affects aggressiveness. Playing or not playing video games is the independent variable, and aggressiveness is the dependent variable.
1) Your experimental hypothesis could be that playing violent video games affects aggressiveness in the test subjects. The null hypothesis is that violent video games do not affect aggressiveness in any way.
2) Alternatively, your experimental hypothesis could be that playing violent video games makes people more aggressive, in which case the null hypothesis would be that it does not make people more aggressive. In this case you get a directional null hypothesis, i.e., that there is either no effect or the effect is the opposite of what you expect.

Question: Is there a statistically significant difference in the data?

Scenario 1: While reviewing zoo records, a zookeeper notices that the baboon exhibits each average 42 incidents of aggressive behavior a month. He hypothesizes that changing the intensity of the light in the primate exhibit will reduce the amount of aggression between the baboons. In exhibit A, with a lower light intensity, he observes 32 incidents of aggression over a one-month period. In exhibit B, with normal lights, he observes 45 incidents of aggression. Would you accept or reject his experimental hypothesis?

Exhibit A: (32 – 42)² / 42 = 100 / 42 = 2.38
Exhibit B: (45 – 42)² / 42 = 9 / 42 = 0.21
χ² = 2.38 + 0.21 = 2.59

P-value = 0.05
Degrees of Freedom (df) = number of choices – 1 (n is aggressive or not aggressive): 2 – 1 = 1 df
Critical Value = 3.84

Accept or reject his experimental hypothesis?
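The Scenario 1 arithmetic can be checked with a few lines of Python. This is a minimal sketch rather than part of the original slides; the function name `chi_square` is made up for the example, and the critical value 3.84 is the one quoted in the scenario.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [32, 45]     # Exhibit A (lower light), Exhibit B (normal light)
expected = [42, 42]     # monthly average from the zoo records
critical_value = 3.84   # p = 0.05, df = 1, as given in the scenario

chi2 = chi_square(observed, expected)
significant = chi2 > critical_value

print("chi-square =", round(chi2, 3))
print("exceeds critical value:", significant)
```

The unrounded total is slightly larger than the 2.59 obtained by adding the two rounded terms, but either way it falls below 3.84, so by the decision rule used in these slides the difference is not statistically significant.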

Scenario 2: A behavioral psychologist notices that gate 4 in the polar bear exhibit is preferred over gates 1, 2, and 3. If there were no impetus for the polar bears to go through that particular gate, one would expect each gate to be used equally, 25% of the time. However, based on her observations, the polar bears entered gate 1 (9%), gate 2 (20%), gate 3 (25%), and gate 4 (46%). She believes something is making them select that gate in such great numbers. Would you accept or reject the null hypothesis (that nothing is impacting their decision)?

Gate 1: (9 – 25)² / 25 = 256 / 25 = 10.24
Gate 2: (20 – 25)² / 25 = 25 / 25 = 1.00
Gate 3: (25 – 25)² / 25 = 0 / 25 = 0.00
Gate 4: (46 – 25)² / 25 = 441 / 25 = 17.64
χ² = 10.24 + 1.00 + 0.00 + 17.64 = 28.88

P-value = 0.05
Degrees of Freedom (df) = number of choices – 1 (n is 4 gates): 4 – 1 = 3 df
Critical Value = 7.82

Accept or reject the null hypothesis?
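For a case with more than two categories, the same test can be sketched with a per-category breakdown. The snippet is illustrative only: the dictionary `CRITICAL_AT_P05` simply holds the two critical values quoted in these scenarios, and the percentages are treated as if they were counts out of 100, as the slide does.

```python
# Critical values at p = 0.05 as quoted in the scenarios (df -> critical value).
CRITICAL_AT_P05 = {1: 3.84, 3: 7.82}

# Observed gate use in percent; assumption: treated as counts out of 100.
observed = {"Gate 1": 9, "Gate 2": 20, "Gate 3": 25, "Gate 4": 46}
expected = 100 / len(observed)   # equal use of every gate: 25 each

# Contribution of each gate to the chi-square total.
contributions = {gate: (o - expected) ** 2 / expected
                 for gate, o in observed.items()}
chi2 = sum(contributions.values())
df = len(observed) - 1
reject_null = chi2 > CRITICAL_AT_P05[df]

for gate, value in contributions.items():
    print(gate, round(value, 2))
print("chi-square =", round(chi2, 2), "df =", df, "reject null:", reject_null)
```

The breakdown shows that gates 1 and 4 account for nearly all of the total, which is far above the critical value for 3 degrees of freedom.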

Statistics Worksheet

Mean ($\bar{x}$): same as the average; add up the values and divide by the number of trials: $\bar{x} = \frac{\sum x_i}{n}$
Sum of the Squares (SS): $SS = \sum (x_i - \bar{x})^2$
Variance ($s^2$): $s^2 = \frac{SS}{n - 1}$
Standard Deviation ($s$): $s = \sqrt{s^2}$
Standard Error of the Mean (SE): $SE = \frac{s}{\sqrt{n}}$

N: total number of individuals in a population
n: total number of individuals in a sample of the population
$x_i$: a single measurement
∑: summation
$\bar{x}$: sample mean

Data of Non-Survivors
Data on 100 medium ground finches from Peter and Rosemary Grant's 40 years of research in the Galápagos. n = 5 samples.

Sample #    Non-Survivor Beak Depth (mm), x_i    Squared Difference (x_i – x̄_1)²
1           7.52
2           9.31
3           8.20
4           8.39
5           10.50

Mean = (total / sample #) = 43.92 / 5 = 8.78
Sum of Squares: SS = xxxx
Variance: s² = xxxx
Standard deviation: s = xxxx
Standard error of the mean: SE = xxxx
95% CI = 2s/√n = xxxx
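The blank worksheet entries can be worked out step by step. The sketch below is not from the original slides and rests on stated assumptions: the usual sample formulas (variance = SS / (n – 1), SE = s / √n) and the slide's 2s/√n rule for the 95% confidence interval.

```python
from math import sqrt

# Non-survivor beak depths in mm, from the table above.
beak_depths = [7.52, 9.31, 8.20, 8.39, 10.50]

n = len(beak_depths)
mean = sum(beak_depths) / n
squared_differences = [(x - mean) ** 2 for x in beak_depths]
ss = sum(squared_differences)   # sum of squares
variance = ss / (n - 1)         # sample variance (assumed n - 1 denominator)
sd = sqrt(variance)             # standard deviation
se = sd / sqrt(n)               # standard error of the mean
ci_95 = 2 * se                  # slide's rule: 95% CI = 2s / sqrt(n)

for label, value in [("Mean", mean), ("SS", ss), ("Variance", variance),
                     ("SD", sd), ("SE", se), ("95% CI", ci_95)]:
    print(f"{label}: {value:.2f}")
```

The 95% CI value is the half-width of the interval, so the interval itself runs from the mean minus that value to the mean plus that value.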

Data of Survivors
Data on 100 medium ground finches from Peter and Rosemary Grant's 40 years of research in the Galápagos. n = 5 samples.

Sample #    Survivor Beak Depth (mm), x_i    Squared Difference (x_i – x̄_2)²
1           9.10
2           8.80
3           9.15
4           11.01
5           10.86

Mean = (total / sample #) = 48.92 / 5 = 9.78
Sum of Squares: SS = xxxx
Variance: s² = xxxx
Standard deviation: s = xxxx
Standard error of the mean: SE = xxxx
95% CI = 2s/√n = xxxx

t-Test Statistics

The t-test estimates the probability (p) that the observed difference between the means of the two samples (i.e., non-survivors and survivors) occurred simply by chance rather than through natural selection.

| | = absolute value, always a positive number
n = 5 birds in each sample
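The t formula itself did not survive in this transcript, so the following Python sketch assumes a common form for comparing two sample means: t = |x̄_1 – x̄_2| / √(s_1²/n_1 + s_2²/n_2). The function name `t_statistic`, the choice of df = n_1 + n_2 – 2, and the critical value 2.306 (two-tailed, p = 0.05, df = 8) are assumptions made for illustration, not taken from the slides.

```python
from math import sqrt
from statistics import mean, variance

# Beak depths in mm from the two data tables in this presentation.
sample_1 = [7.52, 9.31, 8.20, 8.39, 10.50]
sample_2 = [9.10, 8.80, 9.15, 11.01, 10.86]


def t_statistic(a, b):
    """Assumed form: |mean difference| / sqrt(s_a^2 / n_a + s_b^2 / n_b)."""
    # statistics.variance is the sample variance (n - 1 denominator).
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / standard_error


t = t_statistic(sample_1, sample_2)
critical_t = 2.306   # assumption: two-tailed, p = 0.05, df = 5 + 5 - 2 = 8
significant = t > critical_t

print("t =", round(t, 2), "critical t =", critical_t, "significant:", significant)
```

Under these assumptions the calculated t is smaller than the critical value, so with only five birds per sample the difference in mean beak depth would not count as statistically significant.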