Sequential Multiple Decision Procedures (SMDP) for Genome Scans Q.Y. Zhang and M.A. Province Division of Statistical Genomics Washington University School.


1 Sequential Multiple Decision Procedures (SMDP) for Genome Scans Q.Y. Zhang and M.A. Province Division of Statistical Genomics Washington University School of Medicine Statistical Genetics Forum, April, 2006

2 References
R.E. Bechhofer, J. Kiefer, M. Sobel. 1968. Sequential Identification and Ranking Procedures. The University of Chicago Press, Chicago.
M.A. Province. 2000. A single, sequential, genome-wide test to identify simultaneously all promising areas in a linkage scan. Genetic Epidemiology, 19:301-332.
Q.Y. Zhang, M.A. Province. 2005. Simplified sequential multiple decision procedures for genome scans. 2005 Proceedings of the American Statistical Association, Biometrics Section: 463-468.

3 SMDP (Sequential Multiple Decision Procedures): sequential test; multiple hypothesis test

4 Idea 1: Sequential
Start from a small sample size n0.
Increase the sample size, with a sequential test at each stage (SPRT): n0, n0+1, n0+2, …, n0+i, …
Stop when the stopping rule is satisfied.
Remaining data: experiment in next stage; extra data for validation.

5 Idea 2: Multiple Decision
Independent tests (binary hypothesis tests): SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, …, SNPn are tested one at a time (test 1, test 2, …, test n); test-wise error and experiment-wise error; p-value correction.
Simultaneous test (multiple hypothesis test): SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, …, SNPn are tested together and divided into a signal group and a noise group.
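The gap between test-wise and experiment-wise error can be made concrete with a small calculation. The sketch below assumes the n tests are independent and uses a Bonferroni-style correction as one example of a p-value correction; the function names are illustrative and do not come from the slides.

```python
def experiment_wise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests,
    each run at test-wise level alpha (independence is an assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Test-wise level that keeps the experiment-wise error at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # 5841 is the number of SNPs in the data set described later in the talk.
    for n in (1, 10, 100, 5841):
        print(n, experiment_wise_error(0.05, n), bonferroni_threshold(0.05, n))
```

With thousands of SNPs the corrected test-wise threshold becomes very small, which is the cost that a single simultaneous test is meant to avoid.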

6 Binary Hypothesis Test
SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, …, SNPn
test 1: H0: Eff.(SNP1) = 0 vs. H1: Eff.(SNP1) ≠ 0
test 2: H0: Eff.(SNP2) = 0 vs. H1: Eff.(SNP2) ≠ 0
test 3, test 4, test 5, test 6: ……
test n: H0: Eff.(SNPn) = 0 vs. H1: Eff.(SNPn) ≠ 0

7 Multiple Hypothesis Test
SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, …, SNPn
H1: SNP1, 2, 3 are truly different from the others
H2: SNP1, 2, 4 are truly different from the others
H3, H4: ……
H5: SNP4, 5, 6 are truly different from the others
H6: ……
Hu: SNPn, n-1, n-2 are truly different from the others
H: any t SNPs are truly different from the other (n-t)
u = number of all possible combinations of t out of n

8 SMDP: a sequential test combined with a multiple hypothesis test gives the Sequential Multiple Decision Procedure

9 Koopman-Darmois (K-D) Populations (Bechhofer et al., 1968)
The frequency/density function of a K-D population can be written in the form: f(x) = exp{P(x)Q(θ) + R(x) + S(θ)}
A. The normal density function with unknown mean and known variance;
B. The normal density function with unknown variance and known mean;
C. The exponential density function with unknown scale parameter and known location parameter;
D. The Bernoulli distribution with unknown probability of "success" on a single trial;
E. The Poisson distribution with unknown mean;
……
The distance between two K-D populations is defined as: (formula shown on slide)
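As a check on the K-D form, cases A and D from the list can be written out explicitly. This is a worked sketch from the standard densities rather than material from the slides; σ² denotes the known variance in case A and p the success probability in case D.

```latex
% Case A: normal density, unknown mean \theta, known variance \sigma^2
\begin{align*}
f(x) &= \frac{1}{\sqrt{2\pi\sigma^2}}
        \exp\left\{-\frac{(x-\theta)^2}{2\sigma^2}\right\} \\
     &= \exp\left\{ x\,\frac{\theta}{\sigma^2}
        - \frac{x^2}{2\sigma^2} - \tfrac{1}{2}\ln(2\pi\sigma^2)
        - \frac{\theta^2}{2\sigma^2} \right\},
\end{align*}
% so P(x) = x, Q(\theta) = \theta/\sigma^2,
% R(x) = -x^2/(2\sigma^2) - \tfrac{1}{2}\ln(2\pi\sigma^2),
% S(\theta) = -\theta^2/(2\sigma^2).

% Case D: Bernoulli, unknown success probability p, x \in \{0, 1\}
\begin{align*}
f(x) &= p^{x}(1-p)^{1-x}
      = \exp\left\{ x \ln\frac{p}{1-p} + \ln(1-p) \right\},
\end{align*}
% so P(x) = x, Q(p) = \ln\frac{p}{1-p}, R(x) = 0, S(p) = \ln(1-p).
```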

10 SMDP (Bechhofer et al., 1968)
Selecting the t best of M K-D populations.
Populations: Pop. 1, Pop. 2, …, Pop. t-1, Pop. t, Pop. t+1, Pop. t+2, …, Pop. M (distance D marked on the figure).
Sequential sampling at stages 1, 2, …, h, h+1, … gives Y_{1,h}, Y_{2,h}, …, Y_{i,h}, …, Y_{M,h}.
U possible combinations of t out of M; for each combination u: (expression shown on slide).
Stopping rule: Prob. of correct selection (PCS) > P* whenever D > D*.

11 SMDP: P*, t, D*
P*: arbitrary, 0.95
t: fixed or varied
D*: indifference zone
SMDP stopping rule: Prob. of correct selection (PCS) > P* whenever D > D*
Correct selection: populations with Q(θ) > Q(θ_t) + D* are selected
(Figure: Pop. 1, Pop. 2, …, Pop. t-1, Pop. t, Pop. t+1, Pop. t+2, …, Pop. M, with D and D* marked against Q(θ_t), Q(θ_t) + D* and Q(θ_t) + D.)
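The transcript does not capture the stopping formula itself, so the following is only a sketch under an assumption: that the rule takes the form usually attributed to Bechhofer et al. (1968), in which sampling stops once W = Σ exp{-D* (best sum - other sum)}, taken over all non-best combinations of t populations, drops to (1 - P*)/P* or below. The names `stopping_statistic` and `should_stop` are hypothetical, and the brute-force enumeration is only practical for small M.

```python
from itertools import combinations
from math import exp


def stopping_statistic(y, t, d_star):
    """W for the current stage, given the cumulative statistics y[i] = Y_{i,h}.

    Assumed form: sum over every non-best t-subset of
    exp(-D* * (best subset sum - this subset sum)).
    """
    sums = sorted((sum(c) for c in combinations(y, t)), reverse=True)
    best = sums[0]
    return sum(exp(-d_star * (best - s)) for s in sums[1:])


def should_stop(y, t, d_star, p_star):
    """Stop sampling once W <= (1 - P*) / P* (assumed threshold)."""
    return stopping_statistic(y, t, d_star) <= (1.0 - p_star) / p_star


if __name__ == "__main__":
    # Two populations clearly ahead of the rest: the rule is satisfied.
    print(should_stop([10.0, 9.0, 1.0, 0.5], t=2, d_star=10.0, p_star=0.95))
    # All populations tied: no basis for a selection yet.
    print(should_stop([1.0, 1.0, 1.0], t=1, d_star=10.0, p_star=0.95))
```

A larger D* or a wider separation between the leading sums makes the exponential terms shrink faster, so the rule is met at an earlier stage.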

12 SMDP: Computational Problem
Sequential stages: 1, 2, 3, …, h, h+1, …, N
At stage h: Y_{1,h}, Y_{2,h}, …, Y_{t,h}, Y_{t+1,h}, Y_{t+2,h}, …, Y_{M,h}
U sums, one for each of the U possible combinations of t out of M; each sum contains t members of Y_{i,h}
Computer time?
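To see why computer time is a concern, it helps to count the combinations. The sketch below evaluates U = C(M, t) for M = 5841, the number of SNPs in the data set described later in the talk; the values of t are arbitrary illustrations and `n_combinations` is a hypothetical helper name.

```python
from math import comb


def n_combinations(m: int, t: int) -> int:
    """U: the number of ways to choose t populations out of m."""
    return comb(m, t)


if __name__ == "__main__":
    m = 5841  # number of SNPs in the example data set
    for t in (1, 2, 3, 5, 10):
        u = n_combinations(m, t)
        # Print the digit count as well, since U grows very quickly with t.
        print(f"t={t:2d}  U={u}  ({len(str(u))} digits)")
```

Every one of these U sums would have to be formed at every sequential stage, which motivates a stopping rule that uses only the top few combinations.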

13 Simplified Stopping Rule (Bechhofer et al., 1968)
U-S+1 = Top Combination Number (TCN)
TCN = 2 (i.e. S = U-1, U-S = 1) => the simplest stopping rule
TCN = U (i.e. S = 1, U-S = U-1) => the original stopping rule
How to choose TCN? Balance between computational accuracy and computational time

14

15 SMDP Combined With Regression Model (M.A. Province, 2000, pages 320-321)
Data pairs for a marker: (Z_1, X_1), (Z_2, X_2), (Z_3, X_3), …, (Z_h, X_h), (Z_{h+1}, X_{h+1}), …, (Z_N, X_N)
Y: sequential sum of squares of regression residuals
Y_{i,h} denotes Y for marker i at stage h
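One possible reading of the sequential sum of squares of regression residuals is sketched below: at each stage h, a simple linear regression of Z on X is refitted to the first h data pairs and the residual sum of squares is recorded as Y at that stage. Treating Z as the response and X as the predictor, refitting at every stage, and the names `residual_ss` and `sequential_residual_ss` are all assumptions; the cited paper may define the statistic differently.

```python
def residual_ss(z, x):
    """Residual sum of squares from least-squares regression of z on x."""
    n = len(z)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    # If x does not vary, fall back to an intercept-only fit.
    slope = sxz / sxx if sxx > 0 else 0.0
    intercept = mean_z - slope * mean_x
    return sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))


def sequential_residual_ss(z, x, h_min=3):
    """Map each stage h (h_min..N) to the residual SS of the first h pairs."""
    return {h: residual_ss(z[:h], x[:h]) for h in range(h_min, len(z) + 1)}


if __name__ == "__main__":
    genotype = [0, 1, 2, 1, 0, 2]                # hypothetical X values
    phenotype = [1.1, 2.0, 2.9, 2.2, 0.8, 3.1]   # hypothetical Z values
    for h, y in sequential_residual_ss(phenotype, genotype).items():
        print(h, round(y, 4))
```

Running this once per marker would give the Y_{i,h} values that feed the stopping rule at stage h.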

16 Combine SMDP With Regression Model (M.A. Province, 2000, page 319)
Case B: the normal density function with unknown variance and known mean
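Case B can be put into the K-D form f(x) = exp{P(x)Q(θ) + R(x) + S(θ)} directly. The derivation below is a sketch from the standard normal density, with μ the known mean and σ² the unknown variance; taking μ = 0 for regression residuals is an assumption made here to connect with the sum of squared residuals, not a statement from the slide.

```latex
\begin{align*}
f(x) &= \frac{1}{\sqrt{2\pi\sigma^2}}
        \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} \\
     &= \exp\left\{ (x-\mu)^2 \left(-\frac{1}{2\sigma^2}\right)
        - \tfrac{1}{2}\ln(2\pi\sigma^2) \right\},
\end{align*}
% so P(x) = (x-\mu)^2, Q(\sigma^2) = -1/(2\sigma^2),
% R(x) = 0, S(\sigma^2) = -\tfrac{1}{2}\ln(2\pi\sigma^2).
% With \mu = 0 (assumed for residuals e_1, \dots, e_h),
% \sum_{j=1}^{h} P(e_j) = \sum_{j=1}^{h} e_j^2,
% i.e. the sum of squared residuals after h observations.
```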

17 Simplified Stopping Rule (M.A. Province, 2000, pages 321-322)

18 A Real Data Example (M.A. Province, 2000, page 310)

19 A Real Data Example (M.A. Province, 2000, page 308)

20 Simulation Results (1) (M.A. Province, 2000, page 312)

21 Simulation Results (2) (M.A. Province, 2000, page 313)

22

23 Simplified SMDP (Bechhofer et al., 1968)
U-S+1 = Top Combination Number (TCN)
How to choose TCN? Balance between computational accuracy and computational time

24 Data
Sample size: 85 cell lines
Genotype: 5841 SNPs (category: 0, 1, 2)
Phenotype: ViabFu7 (continuous)

25 Relation of W and t (h = 50, D* = 10); Effective Top Combination Number (ETCN) (Zhang & Province, 2005, page 465)

26 ETCN Curve (Zhang & Province, 2005, page 466)

27 t = ? (Zhang & Province, 2005, page 466)

28 P* = 0.95, D* = 10, TCN = 10000; 72 SNPs; P < 0.01 (Zhang & Province, 2005, page 467)

29 SMDP Summary
Advantages:
Tests and identifies all signals simultaneously, with no multiple comparisons
Uses a "minimal" N to find significant signals: efficient
Tight control of statistical errors (Type I, II): powerful
Saves the rest of N for validation: reliable
Further studies:
Computer time
Extension to more methods/models
Extension to non-K-D distributions

30 Thanks !

