Normal Distribution Chapter 5 Normal distribution


Normal Distribution. STAT 250, Dr. Kari Lock Morgan. Chapter 5: Normal distribution for p-values (5.1); Normal distribution for confidence intervals (5.2).

Question of the Day: How do malaria parasites impact mosquito behavior?

Malaria Parasites and Mosquitoes. Mosquitoes were randomized to eat from either a malaria-infected mouse or a healthy mouse. After infection, the parasites go through two stages: oocyst (not yet infectious), days 1-8; sporozoite (infectious), days 9-28. Response variable: whether the mosquito approached a human (in a cage with them). Does this behavior differ between infected and control mosquitoes? Does it differ by infection stage? Dr. Andrew Read, Professor of Biology and Entomology at Penn State, is a co-author. Cator LJ, George J, Blanford S, Murdock CC, Baker TC, Read AF, Thomas MB. (2013). ‘Manipulation’ without the parasite: altered feeding behaviour of mosquitoes is not dependent on infection with malaria parasites. Proc R Soc B 280: 20130711.

Malaria Parasites and Mosquitoes. Malaria parasites would benefit if: (1) mosquitoes sought fewer blood meals after getting infected but before becoming infectious (oocyst stage), because blood meals are risky; (2) mosquitoes sought more blood meals after becoming infectious (sporozoite stage), to pass on the infection. Does infecting mosquitoes with malaria actually impact their behavior in this way?

Oocyst Stage. We’ll first look at the oocyst stage, after the infected group has been infected but before they are infectious. pC: proportion of controls that approach the human; pI: proportion of infected mosquitoes that approach the human. What are the relevant hypotheses? (a) H0: pI = pC, Ha: pI < pC (b) H0: pI = pC, Ha: pI > pC (c) H0: pI < pC, Ha: pI = pC (d) H0: pI > pC, Ha: pI = pC

Data: Oocyst Stage

Randomization Test. (Figure: randomization distribution with the observed statistic and the p-value marked.)

Randomization and Bootstrap Distributions. What do you notice? (Figure panels: mate choice and offspring fitness; tea and immunity; mercury in fish; sleep and anxiety. Statistics shown: difference in proportions, difference in means, single mean, correlation.)

Normal Distribution. The symmetric, bell-shaped curve we have seen for almost all of our distributions of statistics is called a normal distribution. The normal distribution is fully described by its mean and standard deviation: N(mean, standard deviation).

Central Limit Theorem. For a sufficiently large sample size, the distribution of sample statistics for a mean or a proportion is approximately normal. The larger the sample size, the closer to normal the distribution of the statistic will be. www.lock5stat.com/statkey
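
The claim above can be checked with a small simulation. The following is a minimal sketch, not part of the original slides: it assumes a right-skewed population (exponential with rate 1, chosen here only for illustration) and uses the skewness of the simulated sample means as a rough measure of how far their distribution is from a symmetric bell shape. The function name `sample_mean_skewness`, the sample sizes, and the seed are hypothetical choices.

```python
import random
import statistics


def sample_mean_skewness(n, reps=2000, seed=0):
    """Simulate `reps` sample means of size n from an exponential(1)
    population and return the skewness of those sample means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]
    center = statistics.fmean(means)
    spread = statistics.pstdev(means)
    # Skewness: average cubed z-score; 0 for a perfectly symmetric distribution.
    return statistics.fmean(((m - center) / spread) ** 3 for m in means)


skew_small = sample_mean_skewness(n=2)    # small samples: still clearly skewed
skew_large = sample_mean_skewness(n=100)  # larger samples: much closer to symmetric
print(f"skewness of sample means, n=2:   {skew_small:.2f}")
print(f"skewness of sample means, n=100: {skew_large:.2f}")
```

Skewness is only one symptom of non-normality, so this is a quick sanity check rather than a formal test; a histogram of `means` in StatKey or any plotting tool shows the same pattern visually.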

Randomization Distributions. Recall the notation N(mean, standard deviation). If a randomization distribution is normally distributed, we can write it as: (a) N(null value, se) (b) N(statistic, se) (c) N(parameter, se)

Malaria and Mosquitoes. Which normal distribution should we use to approximate this randomization distribution? (a) N(0, -0.131) (b) N(0, 0.056) (c) N(-0.131, 0.056) (d) N(0.056, 0)

Normal Distribution We can compare the original statistic to this Normal distribution to find the p-value!

p-value from N(null, SE). Exact same idea as the randomization test, just using a smooth curve! (Figure: N(null, SE) curve with the observed statistic marked.)

Standardized Data. Often, we standardize the statistic to have mean 0 and standard deviation 1. How? z-scores! What is the equivalent for the null distribution? $z = \dfrac{\text{statistic} - \text{null value}}{SE}$

Standardized Statistic. The standardized test statistic (also known as a z-statistic or SDS) is $z = \dfrac{\text{statistic} - \text{null value}}{SE}$. Calculating the number of standard errors a statistic is from the null value lets us assess extremity on a common scale.

Standardized Statistic. Malaria and Mosquitoes: from original data, statistic = -0.131; from null hypothesis, null value = 0; from randomization distribution, SE = 0.056. So z = (-0.131 - 0) / 0.056 ≈ -2.34. Compare to N(0,1) to find the p-value…
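
As a rough illustration of this calculation, the sketch below uses Python's standard-library `statistics.NormalDist` in place of StatKey. It assumes the statistic is pI − pC and that the test is lower-tailed (Ha: pI < pC), which matches the oocyst-stage reasoning in the slides but is stated here as an assumption. It also computes the same tail area directly from N(null, SE) to show that standardizing does not change the p-value.

```python
from statistics import NormalDist

statistic = -0.131   # observed difference in proportions (from the slide)
null_value = 0.0     # from H0: pI = pC
se = 0.056           # standard error from the randomization distribution

# Standardized statistic: number of standard errors from the null value.
z = (statistic - null_value) / se

# Lower-tail p-value from the standard normal (assumes Ha: pI < pC).
p_from_standard_normal = NormalDist().cdf(z)

# Same tail area computed on the original scale, from N(null, SE).
p_from_null_distribution = NormalDist(mu=null_value, sigma=se).cdf(statistic)

print(f"z = {z:.2f}")
print(f"p-value from N(0,1):      {p_from_standard_normal:.4f}")
print(f"p-value from N(null, SE): {p_from_null_distribution:.4f}")
```

For a two-sided alternative, the tail area would be doubled; the one-sided version is used here only because the alternative in this example is directional.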

Standard Normal. The standard normal distribution is the normal distribution with mean 0 and standard deviation 1. Standardized statistics are compared to the standard normal distribution.

p-value from N(0,1). If a statistic is normally distributed under H0, the p-value can be calculated as the proportion of a N(0,1) distribution beyond $z = \dfrac{\text{statistic} - \text{null value}}{SE}$

p-value from N(0,1). Exact same idea as before, just standardized! (Figure: N(0,1) curve with the standardized statistic and the p-value marked.)

Randomization test, then replace with a smooth curve to get N(null, SE), then standardize to get N(0,1). The p-value is always the proportion in the tail(s) beyond the relevant statistic! We have evidence that mosquitoes exposed to malaria parasites are less likely to approach a human before they become infectious than mosquitoes not exposed to malaria parasites.

Sporozoite Stage. For the data from the sporozoite stage, after the infected group has become infectious, what are the relevant hypotheses? pC: proportion of controls that approach the human; pI: proportion of infected mosquitoes that approach the human. (a) H0: pI = pC, Ha: pI < pC (b) H0: pI = pC, Ha: pI > pC (c) H0: pI < pC, Ha: pI = pC (d) H0: pI > pC, Ha: pI = pC

Data Oocyst Stage Sporozoite Stage

Sporozoite Stage. The difference in proportions is 0.15 and the standard error is 0.05. Is this significant? (a) Yes (b) No. (Figure: standard normal.)
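
The same standardization can be sketched for the sporozoite-stage numbers. This is an illustrative calculation only: it assumes a null value of 0 and an upper-tailed alternative (Ha: pI > pC), and the 0.05 cutoff named `alpha` is a conventional choice that the slide does not specify.

```python
from statistics import NormalDist

difference = 0.15   # observed difference in proportions (from the slide)
se = 0.05           # standard error (from the slide)
alpha = 0.05        # assumed significance level, not given in the slide

# Standardized statistic with null value 0.
z = (difference - 0) / se

# Upper-tail area of N(0,1) beyond z (assumes Ha: pI > pC).
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.1f}, upper-tail p-value = {p_value:.4f}")
print("significant at the assumed level" if p_value < alpha else "not significant at the assumed level")
```

A statistic three standard errors above the null value is far out in the tail of the standard normal, which is why the tail area is so small.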

Malaria and Mosquitoes. It appears that mosquitoes infected by malaria parasites do, in fact, behave in ways advantageous to the parasites! Oocyst stage results: they are less likely to engage in risky blood meals before becoming infectious (so more likely to stay alive). Sporozoite stage results: they are more likely to engage in risky blood meals after becoming infectious (so more likely to pass on disease). Toxoplasmosis parasites impact rat and human behavior, malaria parasites impact mosquito behavior… what else might parasites be impacting that we have yet to discover?

Proportion of Infected. All mosquitoes in the infected group were exposed to the malaria parasites, but not all mosquitoes were actually infected. Of the 201 mosquitoes in the infected group that we actually have data on, only 90 were actually infected (90/201 = 0.448). What proportion of mosquitoes eating from a malaria-infected mouse become infected? We want a confidence interval!

Bootstrap Interval 95% Confidence Interval

Bootstrap Distributions. If a bootstrap distribution is normally distributed, we can write it as: (a) N(parameter, sd) (b) N(statistic, sd) (c) N(parameter, se) (d) N(statistic, se). Here sd = standard deviation of data values; se = standard error = standard deviation of the statistic.

Normal Distribution We can find the middle P% of this Normal distribution to get the confidence interval!

CI from N(statistic, SE). Same idea as the bootstrap, just using a smooth curve! (Figure: N(statistic, SE) curve with the 95% CI marked.)

Bootstrap interval (95% CI), then replace with a smooth curve to get N(statistic, SE) and its 95% CI.

(Un)-standardization. Standardized scale: $z = \dfrac{x - \text{mean}}{\text{sd}}$. To un-standardize: $x = \text{mean} + z \times \text{sd}$.

(Un)-standardization. In testing, we go to a standardized statistic. In intervals, we find (-z*, z*) for a standardized distribution and return to the original scale. Un-standardization is the reverse of z-scores. What’s the equivalent for the distribution of the statistic (the bootstrap distribution)? statistic ± z* × SE

P% Confidence Interval. 1. Find values (-z* and z*) that capture the middle P% of N(0,1). 2. Return to the original scale with statistic ± z* × SE. (Figure: N(0,1) curve with the middle P% between -z* and z*.)

Confidence Interval using N(0,1). If a statistic is normally distributed, we find a confidence interval for the parameter using statistic ± z* × SE, where the proportion between -z* and +z* in the standard normal distribution is the desired level of confidence.

Confidence Intervals Find z* for a 99% confidence interval. www.lock5stat.com/statkey

Proportion of Infected. Proportion of infected mosquitoes: sample statistic (from data): 90/201 = 0.448; z* (from standard normal): 2.576; SE (from bootstrap distribution): 0.037. Give a 99% confidence interval for the proportion of mosquitoes who get infected.
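
One way to carry out this calculation is sketched below, using the rounded statistic and the bootstrap SE given on the slide. The only thing not taken from the slide is how z* is obtained: here it comes from the inverse CDF of the standard normal in Python's `statistics.NormalDist` rather than from StatKey.

```python
from statistics import NormalDist

statistic = 0.448    # sample proportion infected, 90/201 rounded (from the slide)
se = 0.037           # standard error from the bootstrap distribution (from the slide)
confidence = 0.99    # desired confidence level

# z* leaves (1 - confidence)/2 in each tail of N(0,1).
z_star = NormalDist().inv_cdf((1 + confidence) / 2)

# Un-standardize: statistic ± z* × SE.
margin = z_star * se
lower, upper = statistic - margin, statistic + margin

print(f"z* = {z_star:.3f}")
print(f"{confidence:.0%} CI: ({lower:.3f}, {upper:.3f})")
```

Changing `confidence` to 0.95 gives the narrower interval, since a smaller z* is needed to capture the middle 95% of the standard normal.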

z*. Why use the standard normal? z* is always the same, regardless of the data! Common confidence levels: 90%: z* = 1.645; 95%: z* = 1.96 (but 2 is close enough); 99%: z* = 2.576.
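
These values can be reproduced from the standard normal itself. The helper below is a small sketch; the name `z_star` is a hypothetical choice, and it relies only on `NormalDist.inv_cdf` from Python's standard library.

```python
from statistics import NormalDist


def z_star(confidence):
    """Return z* such that the middle `confidence` proportion of N(0,1)
    lies between -z* and +z*."""
    tail = (1 - confidence) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")
```

Because the function depends only on the confidence level, the same z* can be reused for any statistic whose distribution is approximately normal.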

Bootstrap interval (middle 95%), then replace with a smooth curve to get N(statistic, SE) (middle 95%), which is N(0, 1) (middle 95%) un-standardized: statistic ± z* × SE = 0.448 ± 1.96 × 0.037 = (0.375, 0.521). We are 95% confident that the proportion of mosquitoes exposed to infection that actually get infected is between 0.375 and 0.521.

Malaria and Mosquitoes Should we limit our analysis to only those mosquitoes that actually got infected? Why or why not? In favor of yes: In favor of no:

Confidence Interval Formula. If the distribution of the statistic is normal: statistic ± z* × SE, where the statistic is from the original data, z* is from N(0,1), and SE is from the bootstrap distribution.

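Formula for p-values. If the distribution of the statistic is normal: $z = \dfrac{\text{statistic} - \text{null value}}{SE}$, where the statistic is from the original data, the null value is from H0, and SE is from the randomization distribution. Compare z to N(0,1) for the p-value.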

Standard Error Wouldn’t it be nice if we could compute the standard error without doing thousands of simulations? We can!!! Or at least we’ll be able to next class…

To Do HW 5.1, 5.2 (due Monday, 3/27)