Published by Alban Perkins. Modified over 8 years ago.
2
P Values - part 2: Samples & Populations. Robin Beaumont, 2011. With much help from Professor Chris Wild's material, University of Auckland.
3
Aspects of the P value
4
Résumé: P value = P(observed summary value + those more extreme | population value = x). A P value is a conditional probability considering a range of outcomes. [Figure labels: sample value; hypothesised population value.]
5
Populations and samples. Population value: ever constant, at least for your study! = parameter. Sample estimate = statistic.
6
One sample
7
Size matters – single samples
8
Size matters – multiple samples
9
We only have a rippled mirror
10
Standard deviation - individual level = measure of variability. 'Standard Normal distribution': total area = 1, mean = 0, and 1 unit = 1 SD. About 68% of the area lies within 1 SD of the mean and about 95% within 2 SDs. Between + and - three standard deviations from the mean = 99.7% of the area, therefore only 0.3% of the area (scores) is more than 3 standard deviations ('units') away. But this does not take sample size into account; for that we need the t distribution, whose shape is defined by the sample size aspect ~ df. Area! Wait and see.
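The areas quoted on this slide can be checked with a few lines of Python. This is only a sketch using the standard library; the function name `area_within` is made up for illustration.

```python
import math

def area_within(k):
    """Area under the standard normal curve between -k and +k SDs."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Roughly 68%, 95% and 99.7% of the total area of 1
    print(f"within {k} SD: {area_within(k):.4f}")
```

The area more than 3 SDs away is simply `1 - area_within(3)`.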
11
Sampling level - 'accuracy' of estimate. From: http://onlinestatbook.com/stat_sim/sampling_dist/index.html. With SD = 5: for n = 5, SEM = 5/√5 = 2.236; for n = 25, SEM = 5/√25 = 1. We can predict the accuracy of your estimate (the mean) from a single sample, just by using the SEM formula. We are talking about means here.
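The two SEM values on this slide follow from SEM = SD/√n. A minimal sketch (the function name `sem` is an illustrative choice, not something from the slides):

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

print(round(sem(5, 5), 3))   # n = 5
print(round(sem(5, 25), 3))  # n = 25
```

Multiplying the sample size by 5 here shrinks the SEM by a factor of √5, which is why larger samples give more precise estimates of the mean.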
12
Example - Bradford Hill (Bradford Hill, 1950, p. 92): mean systolic blood pressure for 566 males around Glasgow = 128.8 mm, standard deviation = 13.05. Determine the 'precision' of this mean. SEM = 13.05/√566 = 0.5485. "We may conclude that our observed mean may differ from the true mean by as much as ± 2.194 (.5485 x 4) but not more than that in around 95% of samples." (p. 93, edited). [Figure axis: all possible values of the POPULATION mean.]
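The arithmetic behind this example can be reproduced as follows. This is a sketch only; the variable names are illustrative. Note that the quoted 2.194 equals 4 x SEM, while a band of plus or minus 2 x SEM (the usual rough 95% range) would be about 1.097 either side of the mean.

```python
import math

n = 566        # males around Glasgow
mean = 128.8   # mean systolic blood pressure
sd = 13.05     # standard deviation

sem = sd / math.sqrt(n)
print(f"SEM     = {sem:.4f}")
print(f"2 x SEM = {2 * sem:.3f}")
print(f"4 x SEM = {4 * sem:.3f}")  # the figure quoted on the slide

# Rough 95% range for the population mean: mean +/- 2 SEM
print(f"{mean - 2 * sem:.2f} to {mean + 2 * sem:.2f}")
```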
13
Sampling summary. The SEM formula allows us to predict the accuracy of your estimate (i.e. the mean value of our sample) from a single sample. It assumes a random sample.
14
Variation: what have we ignored? On to probability now.
15
Probabilities are relative frequencies. The probabilities of all outcomes at any one time add up to 1.
16
Multiple outcomes at any one time. Probability density function. [Figure: density plot of 48 scores; x-axis: scores from 33 to 87; y-axis: probability / density; total area = 1; regions A and B shaded.] p(score < 45) = area A. p(score > 50) = area B. P(score 50) = just add up the individual outcomes.
17
Conditional probability. [Tree diagram: the first branch splits into male, with P(male), and female; each then splits into Disease X and No Disease X, with P(disease X | male) on the male branch; the end value is Disease X AND male.] What happens in the past affects the present. Multiply along each branch of the tree to get the end value: P(disease X AND male) = P(male) x P(disease X | male). Rearranged: P(disease X AND male) / P(male) = P(disease X | male).
18
Screening example. 0.1% of the population carry a particular faulty gene. A test exists for detecting whether an individual is a carrier of the gene. In people who actually carry the gene, the test provides a positive result with probability 0.9. In people who don't carry the gene, the test provides a positive result with probability 0.01 (these are the errors). Let G = person carries gene, P = test is positive for gene, N = test is negative for gene. If someone gets a positive result when tested, find the probability that they actually are a carrier of the gene. We want to find P(G | P). First, P(P) = P(G and P) + P(G' and P) = 0.0009 + 0.00999 = 0.01089. Then P(G | P) = P(G and P) / P(P) = 0.0009 / 0.01089 ≈ 0.083. We were given P(P | G), but P(P | G) ≠ P(G | P): ORDER MATTERS.
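The screening calculation can be laid out step by step, multiplying along each branch of the tree and then dividing. This is a sketch with made-up variable names; the three input probabilities are the ones given on the slide.

```python
p_g = 0.001               # P(G): person carries the gene
p_pos_given_g = 0.9       # P(P | G): positive test in carriers
p_pos_given_not_g = 0.01  # P(P | G'): positive test in non-carriers

# Multiply along each branch of the tree
p_g_and_pos = p_g * p_pos_given_g
p_not_g_and_pos = (1 - p_g) * p_pos_given_not_g

# All routes to a positive result
p_pos = p_g_and_pos + p_not_g_and_pos

# Reverse the conditioning: P(G | P)
p_g_given_pos = p_g_and_pos / p_pos

print(f"P(P)     = {p_pos:.5f}")
print(f"P(G | P) = {p_g_given_pos:.4f}")
print(f"P(P | G) = {p_pos_given_g}")
```

Even with a positive result, only about 8% of such people are carriers, although 90% of carriers test positive. The order of the conditioning matters.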
19
Survival analysis. Each year's survival depends on the previous ones, or does it?
20
Probability summary. All outcomes at any one time add up to 1. Probability histogram = area under curve = 1 -> specific areas = set of outcomes -> specific areas = 'equal to or more extreme'. Conditional probability: present dependent on past. ORDER MATTERS.
21
Putting it all together
22
Statistics. Summary measure: SEM, average etc. T statistic: different types, the simplest being t = (observed sample mean - hypothesised population mean) / SEM. So when t = 0, the numerator is 0 (0/anything = 0): the estimated and hypothesised population means are equal. When t = 1, the observed difference is the same size as the SEM. When t = 10, the observed difference is much greater than the SEM.
23
T statistic example. Serum amylase values from a random sample of 15 apparently healthy subjects: mean = 96, SD = 35 units/100 ml. How likely is it that such a sample would be obtained from a population of serum amylase determinations with a mean of 120? (Adapted from Daniel 1991, p. 202.) This looks like a rare occurrence? But rare given what? GIVEN the population value = the null hypothesis.
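For the serum amylase example, the t statistic is the observed difference divided by the SEM. A minimal sketch (the function name `one_sample_t` is illustrative):

```python
import math

def one_sample_t(sample_mean, hypothesised_mean, sd, n):
    """Return (t, SEM) for a one-sample t statistic."""
    sem = sd / math.sqrt(n)
    t = (sample_mean - hypothesised_mean) / sem
    return t, sem

t, sem = one_sample_t(sample_mean=96, hypothesised_mean=120, sd=35, n=15)
print(f"SEM = {sem:.3f}")
print(f"t   = {t:.3f}")
```

The observed mean is about 2.66 SEMs below the hypothesised population mean of 120.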
24
[Figure: t density, s_x̄ = 9.037, n = 15; horizontal axis in t units (-2.656, 0, 2.656) and in original units (96, 120); shaded area = 0.0188.] What does the shaded area mean? Given that the sample was obtained from a population with a mean of 120, a sample with a t statistic (n = 15) of -2.656 or 2.656, or one more extreme, will occur 1.8% of the time, that is just under two samples per hundred on average. Equivalently, given that the sample was obtained from a population with a mean of 120, a sample of 15 producing a mean of 96 (120 - x, where x = 24) or 144 (120 + x, where x = 24), or one more extreme, will occur 1.8% of the time. But is this not a P value? P value = 2 · P(t(n-1) < -|t| | H0 is true) = 2 · [area to the left of -|t| under a t distribution with df = n - 1].
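The shaded area can be approximated without a statistics package by integrating the t density numerically. The sketch below uses Simpson's rule from the standard library only; the function names and the number of integration steps are illustrative choices, and a library routine such as a t-distribution CDF would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided P value: 1 minus the area between -|t| and +|t|.

    The central area is found with Simpson's rule (steps must be even).
    """
    a = abs(t)
    h = 2 * a / steps
    total = t_pdf(-a, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-a + i * h, df)
    central_area = total * h / 3
    return 1 - central_area

p = two_sided_p(-2.656, df=14)  # n = 15, so df = n - 1 = 14
print(f"P value = {p:.4f}")
```

The result is conditional on the population mean being 120: it is the probability of a t value this extreme or more so, in either tail, if the null hypothesis is true.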
25
P value and probability for the t statistic. P value = 2 x P(t(n-1) values more extreme than the observed t(n-1) | H0 is true) = 2 · [area to the left of -|t| under a t distribution with n - 1 shape]. A P value is a special type of probability: it covers multiple outcomes and is conditional upon the specified parameter value.
26
Putting it all together Do we need it!
27
Rules. [Figure: t density, s_x̄ = 9.037, n = 15; horizontal axis in t units (-2.656, 0, 2.656) and in original units (96, 120); shaded area = 0.0188.] Set a level of acceptability = critical value (CV)! Say one in twenty (1/20 = 0.05), or 1/100, or 1/1000, or... If our result has a P value of less than our level of acceptability, reject the parameter value. Say 1 in 20 (i.e. CV = 0.05): given that the sample was obtained from a population with a mean (parameter value) of 120, a sample with a t statistic (n = 15) of -2.656 or 2.656, or one more extreme, will occur 1.8% of the time. This is less than one in twenty, therefore we dismiss the possibility that our sample came from a population with a mean of 120. What do we replace it with?
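The rule on this slide is a simple comparison of the P value with the chosen level of acceptability. A sketch, with an illustrative function name and the P value taken from the serum amylase example:

```python
def reject_parameter_value(p_value, level):
    """Apply the rule: reject when the P value is below the chosen level."""
    return p_value < level

p_value = 0.0188  # shaded area from the serum amylase example

for level in (1 / 20, 1 / 100, 1 / 1000):
    decision = "reject" if reject_parameter_value(p_value, level) else "do not reject"
    print(f"level {level:.3f}: {decision} a population mean of 120")
```

The same result is rejected at 1 in 20 but not at 1 in 100 or 1 in 1000, so the conclusion depends on the level set in advance.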
28
Fisher: we only know, and only consider, the model we have, i.e. the parameter we have used in our model. When we reject it, we accept that any value but that one can replace it. Neyman and Pearson + Gossling: we must have an alternative specified value for the parameter.
29
If there is an alternative, what is it? Another distribution! Power: sample size. Effect size: an indication of clinical importance. Example: serum amylase values from a random sample of 15 apparently healthy subjects, mean = 96, SD = 35 units/100 ml. How likely is it that such a sample would be obtained from a population of serum amylase determinations with a mean of 120? (Adapted from Daniel 1991, p. 202.)
30
[Figure: two distributions, one centred on 120 and one on 96; α = the reject region; areas labelled as correct decisions and incorrect decisions.]
31
Insufficient power: unlikely to get a significant result even when the effect size is large. Too much power: get a significant result even with a trivial effect size.
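Both situations can be illustrated with a rough power calculation. The sketch below is an assumption-laden approximation, not the method used in the slides: it treats the test statistic as normal rather than t, and it takes the effect size to be the difference in means divided by the SD (a standardised effect size). The function name `approx_power` is made up for illustration.

```python
import math
from statistics import NormalDist

def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample test.

    Normal approximation: the statistic is shifted by effect_size * sqrt(n)
    when the alternative is true. Assumed for illustration only.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect_size) * math.sqrt(n)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Standardised effect size for the serum amylase example (assumed definition)
d = (96 - 120) / 35

print(f"effect size d = {d:.3f}")
print(f"large effect, tiny sample   : {approx_power(0.8, 3):.2f}")
print(f"serum amylase example, n=15 : {approx_power(d, 15):.2f}")
print(f"trivial effect, huge sample : {approx_power(0.01, 1_000_000):.2f}")
```

With three subjects even a large effect is usually missed, while with a million subjects a trivial effect is almost always declared significant, which is why the effect size should be reported alongside the P value.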
32
Life after P values: confidence intervals; effect size; description / analysis; Bayesian statistics (a qualitative approach by the back door!). Planning to do statistics for your dissertation? See my medical statistics courses. Course 1: www.robin-beaumont.co.uk/virtualclassroom/stats/course1.html. YouTube videos to accompany course 1: http://www.youtube.com/playlist?list=PL9F0EBD42C0AB37D0. Course 2: www.robin-beaumont.co.uk/virtualclassroom/stats/course2.html. YouTube videos to accompany course 2: http://www.youtube.com/playlist?list=PL05FC4785D24C6E68.
33
Your attitude to your data
34
Where do they fit in!
35
Students' bloomers: "The p value did not indicate much statistic significance." "Given that the population comes from one population." "The p value is 0.003 thus rejecting the null hypothesis and there is a statistical significance." "Correlation = 0.25 (p<0.001) indicating that assuming that the data come from a bivariate normal distribution with a correlation of zero you would obtain a correlation of <0.000." "There is 95% chance that the relationship among the variables is not due to chance."