Statistical Genomics Zhiwu Zhang Washington State University Lecture 4: Statistical inference.


1 Statistical Genomics Zhiwu Zhang Washington State University Lecture 4: Statistical inference

2 Administration: Homework 1 is due Wednesday, Feb 3, at 3:10 PM.

3 Outline
- χ² test on contingency table
- Empirical null distribution
- χ² test on variance
- t test
- Hypothesis test
- Two types of error
- Power

4 Observed and expected frequency

Observed:
                Transgenic   Non-transgenic   SUM
Herbicide           35              5           40
No herbicide        35             25           60
SUM                 70             30          100

Expected:
                Transgenic   Non-transgenic   SUM
Herbicide           28             12           40
No herbicide        42             18           60
SUM                 70             30          100

5 Approximate Distributions
- Poisson distribution: Mean = Var = Expected
- (Observed-Expected)/Sqrt(Expected) ~ N(0,1)
- SUM[(Observed-Expected)² / Expected] ~ χ²(df)
- df = number of independent cells

6 Observed and expected frequency (same observed and expected tables as on slide 4)
Every observed count differs from its expected count by 7, so each squared difference is 49:
χ² = 49/28 + 49/12 + 49/42 + 49/18 = 9.72
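The arithmetic on this slide can be reproduced in R. The following is a minimal sketch, assuming the observed counts are entered by hand from the table; the variable names are illustrative and not from the lecture. It computes the expected counts from the margins, the χ² statistic, and an upper-tail p-value, then cross-checks against the built-in `chisq.test` with the continuity correction turned off.

```r
# Observed counts from the slide (rows: herbicide / no herbicide).
observed <- matrix(c(35,  5,
                     35, 25),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Herbicide", "No herbicide"),
                                   c("Transgenic", "Non-transgenic")))

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic and its degrees of freedom.
chi2 <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail p-value from the chi-square distribution.
p_value <- pchisq(chi2, df, lower.tail = FALSE)

# Cross-check with the built-in test (no continuity correction).
builtin <- chisq.test(observed, correct = FALSE)

print(round(c(chi2 = chi2, df = df, p_value = p_value), 4))
```

With a 2 x 2 table only one cell is free once the margins are fixed, which is why `df` is 1 here.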

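7 Distribution of χ²(1)
Observed statistic: 9.72. The 99% percentile of the simulated distribution is 6.97, so P < 1%.
    k=10000   # number of simulated values; k is not defined on the slide, 10000 is an assumption
    par(mfrow=c(2,2),mar = c(3,4,1,1))
    x=rchisq(k,1)
    d=density(x)
    plot(x)
    plot(d)
    hist(x)
    plot(ecdf(x))
    quantile(x,.99)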

8 Tests on samples
- A sample has a mean of 103.6 and a variance of 27.82
- The sample has 10 observations
- Q1: What is the probability that the sample was from a normal distribution with variance of 25?
- Q2: What is the probability that the sample was from a normal distribution with mean of 100?

9 Q1: distribution with variance of 25
Empirical solution:
- Sample ten observations from a normal distribution with variance of 25.
- Calculate the observed variance.
- Repeat the sampling to get the null distribution of the sample variances.
- Find the percentile of the observed variance on the null distribution.

10 Observed variance: 27.82. The 75% percentile of the null distribution is 31.6, so P > 25%.
    x=replicate(10000, {
      s=rnorm(10,0,5)
      var(s)
    })
    > length(x[x>27.82])/10000
    [1] 0.3516
    par(mfrow=c(2,2),mar = c(3,4,1,1))
    d=density(x)
    plot(x)
    plot(d)
    hist(x)
    plot(ecdf(x))
    quantile(x,.75)

11 Q1: distribution with variance of 25
Theoretical solution: under H0, (n-1)*s²/σ² follows χ²(n-1).
    v=(10-1)*27.82/25    # = 10.015
    1-pchisq(v,9)        # about 0.349
vs. 0.3516 from the empirical solution
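The theoretical and empirical answers to Q1 can be put side by side in one short script. This is a sketch under stated assumptions: `var_chisq_test` is a hypothetical helper name (not an R built-in and not from the lecture), the test is one-sided (upper tail) as on the slides, and the seed is arbitrary.

```r
# One-sided chi-square test for a variance (hypothetical helper).
var_chisq_test <- function(s2, n, sigma2_0) {
  stat <- (n - 1) * s2 / sigma2_0            # ~ chi-square(n - 1) under H0
  p <- pchisq(stat, df = n - 1, lower.tail = FALSE)
  list(statistic = stat, df = n - 1, p_value = p)
}

# Theoretical p-value from the sample summary given on the slides.
theory <- var_chisq_test(s2 = 27.82, n = 10, sigma2_0 = 25)

# Empirical null distribution of the sample variance (sd = 5, variance = 25).
set.seed(1)  # arbitrary seed, only for reproducibility
null_var <- replicate(10000, var(rnorm(10, mean = 0, sd = 5)))
empirical_p <- mean(null_var > 27.82)

print(c(theoretical = theory$p_value, empirical = empirical_p))
```

The two p-values should agree to within simulation noise, which shrinks as the number of replicates grows.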

12 Q2: distribution with mean of 100
Empirical solution:
- Sample ten observations from N(100, 25), i.e. mean 100 and variance 25 (standard deviation 5)
- Calculate the mean
- Repeat the process 10,000 times
- The 10,000 means form the null distribution
- Determine the percentile of the tested mean (103.6) on the null distribution

13 Q2: distribution with mean of 100
Observed mean: 103.6. It lies between the 95% percentile (102.6) and the 99% percentile (about 103.7) of the null distribution, so 1% < P < 5%.
    x=replicate(10000, {
      s=rnorm(10,100,5)
      mean(s)
    })
    > length(x[x>103.6])/10000
    [1] 0.0132
    par(mfrow=c(2,2),mar = c(3,4,1,1))
    d=density(x)
    plot(x)
    plot(d)
    hist(x)
    plot(ecdf(x))
    quantile(x,.95)
    quantile(x,.99)
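When the variance is treated as known (25, as in the simulation), the null distribution of the sample mean is itself normal, so the simulated percentiles can be checked against theory. The sketch below assumes that known-variance setting; the variable names are illustrative and the seed is arbitrary.

```r
n <- 10; mu0 <- 100; sigma <- 5; xbar <- 103.6

# Under H0 the sample mean is N(mu0, sigma^2 / n).
se <- sigma / sqrt(n)
z <- (xbar - mu0) / se
p_theory <- pnorm(z, lower.tail = FALSE)

# Theoretical 95% and 99% percentiles of the null distribution of the mean.
q95 <- qnorm(0.95, mean = mu0, sd = se)
q99 <- qnorm(0.99, mean = mu0, sd = se)

# Empirical counterpart, following the sampling scheme on the slides.
set.seed(1)  # arbitrary seed, only for reproducibility
null_mean <- replicate(10000, mean(rnorm(n, mu0, sigma)))
p_emp <- mean(null_mean > xbar)

print(round(c(z = z, p_theory = p_theory, p_emp = p_emp,
              q95 = q95, q99 = q99), 4))
```

The observed mean of 103.6 falls between the two theoretical percentiles, consistent with 1% < P < 5%.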

14 t test

15
    T=(103.6-100)/(5/sqrt(10))
    P=1-pt(T,9)
    c(T,P)
    [1] 2.27683992 0.02440704
At the 5% threshold, reject the hypothesis that the sample came from a distribution with mean of 100.
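The calculation on this slide plugs in 5 as the standard deviation. A one-sample t test is more commonly computed with the sample standard deviation, which here would be sqrt(27.82) from the summary on slide 8. The sketch below shows that variant as an alternative for comparison, not as the lecture's own calculation; the variable names are illustrative.

```r
n <- 10; xbar <- 103.6; s2 <- 27.82; mu0 <- 100

# t statistic with the standard error estimated from the sample variance.
t_stat <- (xbar - mu0) / sqrt(s2 / n)

# One-sided (upper-tail) and two-sided p-values with n - 1 degrees of freedom.
p_one_sided <- pt(t_stat, df = n - 1, lower.tail = FALSE)
p_two_sided <- 2 * p_one_sided

print(round(c(t = t_stat, p_one_sided = p_one_sided,
              p_two_sided = p_two_sided), 4))
```

With the raw observations available, `t.test(x, mu = 100, alternative = "greater")` would give the same one-sided result directly.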

16 F test
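The transcript preserves only the title of this slide. As a hedged illustration of one common F test, the sketch below compares the variances of two samples: under the null hypothesis of equal variances, the ratio of sample variances follows an F distribution. The two samples are simulated and purely hypothetical; they are not data from the lecture.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Two simulated samples (hypothetical data) with the same true variance.
a <- rnorm(10, mean = 100, sd = 5)
b <- rnorm(12, mean = 100, sd = 5)

# F statistic: ratio of the two sample variances.
f_stat <- var(a) / var(b)
df1 <- length(a) - 1
df2 <- length(b) - 1

# Two-sided p-value from the F distribution.
p_upper <- pf(f_stat, df1, df2, lower.tail = FALSE)
p_two_sided <- 2 * min(p_upper, 1 - p_upper)

# Cross-check with the built-in variance-ratio test.
builtin <- var.test(a, b)

print(c(F = f_stat, p_two_sided = p_two_sided))
```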

17 Hypothesis test
- Null hypothesis (H0): the initial assumption
- Alternative hypothesis (Ha): the opposite of the assumption
- Find the probability, under H0, of a result at least as extreme as the one observed
- If that probability is below the threshold (e.g. 5%), reject H0 and accept Ha
- Otherwise, do not reject (accept) H0

18 Two types of errors and power
- Type I error: rejecting a true H0 (false positive); its probability is the threshold used, e.g. α=5%
- Type II error: accepting a false H0 (false negative); its probability is β
- Power: the probability of rejecting a false H0, (1-β)
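Both error rates can be estimated by simulation. The sketch below is one way to do it, under assumptions that are not from the lecture: a one-sided one-sample t test at α = 5%, samples of size 10 with standard deviation 5, and a true mean of 103.6 standing in for the "false H0" case. The helper name `reject` is hypothetical.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
alpha <- 0.05; n <- 10; n_rep <- 5000

# Draw one sample with the given true mean and test H0: mean = 100.
# Returns TRUE when H0 is rejected at level alpha.
reject <- function(true_mean) {
  x <- rnorm(n, mean = true_mean, sd = 5)
  t.test(x, mu = 100, alternative = "greater")$p.value < alpha
}

# H0 true: the rejection rate estimates the Type I error rate (alpha).
type1 <- mean(replicate(n_rep, reject(100)))

# H0 false (true mean 103.6, an assumed alternative): rejection rate is power.
power <- mean(replicate(n_rep, reject(103.6)))
beta <- 1 - power  # estimated Type II error rate

print(c(type1 = type1, power = power, beta = beta))
```

The estimated Type I rate should sit close to the chosen threshold, while the power depends on how far the true mean is from 100 and on the sample size.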

19 Summary

Test                    H0 is True                   H0 is False
Positive (reject H0)    False positive, Type I: α    Power = 1-β
Negative (accept H0)    Specificity = 1-α            False negative, Type II: β
Sum                     100%                         100%

20 Highlight
- χ² test on contingency table
- Empirical null distribution
- χ² test on variance
- t test
- Hypothesis test
- Two types of error
- Power

