Published by Jemima Cole. Modified over 9 years ago.
1
Sequential Multiple Decision Procedures (SMDP) for Genome Scans
Q.Y. Zhang and M.A. Province
Division of Statistical Genomics, Washington University School of Medicine
Statistical Genetics Forum, April 2006
2
References
R.E. Bechhofer, J. Kiefer, M. Sobel. 1968. Sequential identification and ranking procedures. The University of Chicago Press, Chicago.
M.A. Province. 2000. A single, sequential, genome-wide test to identify simultaneously all promising areas in a linkage scan. Genetic Epidemiology, 19:301-332.
Q.Y. Zhang, M.A. Province. 2005. Simplified sequential multiple decision procedures for genome scans. 2005 Proceedings of the American Statistical Association, Biometrics Section: 463-468.
3
SMDP: Sequential Multiple Decision Procedures
Sequential test
Multiple hypothesis test
4
Idea 1: Sequential
Start from a small sample size n0.
Increase the sample size (n0+1, n0+2, …, n0+i, …) and run a sequential test at each stage (SPRT).
Stop when the stopping rule is satisfied.
Diagram labels: experiment in next stage; extra data for validation.
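The staged scheme on this slide can be illustrated with Wald's SPRT for a normal mean with known variance. This is only a minimal sketch of a generic sequential test, not the procedure used in the talk: the function name `sprt_normal_mean`, the hypotheses `mu0` and `mu1`, and the error rates `alpha` and `beta` are all assumptions made for the example.

```python
import math

def sprt_normal_mean(data, mu0, mu1, sigma, alpha=0.05, beta=0.05, n0=1):
    """Wald SPRT for H0: mean = mu0 vs. H1: mean = mu1 (sigma known).

    Returns (decision, n_used). The decision is 'H0', 'H1', or
    'continue' if the data run out before a boundary is crossed.
    """
    upper = math.log((1 - beta) / alpha)   # crossing above accepts H1
    lower = math.log(beta / (1 - alpha))   # crossing below accepts H0
    llr = 0.0
    for n, x in enumerate(data, start=1):
        # Log-likelihood-ratio increment of one normal observation.
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2) / sigma ** 2
        if n < n0:
            continue  # no test before the initial sample size n0
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "continue", len(data)

# Hypothetical data: 8 observations available, the test stops early.
observations = [1.2, 0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.4]
decision, n_used = sprt_normal_mean(observations, mu0=0.0, mu1=1.0, sigma=1.0)
unused = observations[n_used:]  # left over, e.g. for validation
```

Observations that were not needed to reach a decision (`unused` here) play the role of the extra data for validation.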
5
Idea 2: Multiple Decision
Independent tests (binary hypothesis tests): test 1, test 2, …, test n, one for each of SNP1, SNP2, …, SNPn. Test-wise error and experiment-wise error; p value correction.
Simultaneous test (multiple hypothesis test): SNP1, SNP2, …, SNPn are considered together and divided into a signal group and a noise group.
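The gap between test-wise and experiment-wise error can be made concrete with a small calculation. The sketch below assumes the n tests are independent and uses the Bonferroni adjustment as one example of a p value correction; the function names are hypothetical and the slide does not say which correction the authors had in mind.

```python
def experiment_wise_error(alpha, n_tests):
    """Chance of at least one false positive among n independent tests,
    each run at test-wise level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test level that keeps the experiment-wise error at or below alpha."""
    return alpha / n_tests

# With 100 independent tests at alpha = 0.05, a false positive is almost certain.
ewe = experiment_wise_error(0.05, 100)
per_test = bonferroni_threshold(0.05, 100)
```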
6
Binary Hypothesis Test
SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, …, SNPn
test 1: H0: Eff.(SNP1)=0 vs. H1: Eff.(SNP1)≠0
test 2: H0: Eff.(SNP2)=0 vs. H1: Eff.(SNP2)≠0
test 3: ……
test 4: ……
test 5: ……
test 6: ……
test n: H0: Eff.(SNPn)=0 vs. H1: Eff.(SNPn)≠0
7
Multiple Hypothesis Test
SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, …, SNPn
H1: SNP1,2,3 are truly different from the others
H2: SNP1,2,4 are truly different from the others
H3: ……
H4: ……
H5: SNP4,5,6 are truly different from the others
H6: ………
Hu: SNPn,n-1,n-2 are truly different from the others
H: any t SNPs are truly different from the other (n-t) SNPs
u = number of all possible combinations of t out of n
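The count u is the binomial coefficient "n choose t", which grows very quickly with n. A short sketch (the helper names are hypothetical) that counts the hypotheses and enumerates them for a small case:

```python
import math
from itertools import combinations

def n_hypotheses(n, t):
    """Number of ways to pick t 'different' SNPs out of n."""
    return math.comb(n, t)

def enumerate_hypotheses(snps, t):
    """List every candidate set of t SNPs; only feasible for small inputs."""
    return list(combinations(snps, t))

small = enumerate_hypotheses(["SNP1", "SNP2", "SNP3", "SNP4", "SNP5", "SNP6"], 3)
u_small = n_hypotheses(6, 3)     # matches len(small)
u_large = n_hypotheses(5841, 3)  # on the order of 10**10, far too many to list
```

Enumeration is practical only for toy inputs, so for a genome scan the count itself is the useful quantity.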
8
SMDP
Sequential test
Multiple hypothesis test
Sequential Multiple Decision Procedure
9
Koopman-Darmois (K-D) Populations (Bechhofer et al., 1968)
The frequency/density function of a K-D population can be written in the form:
f(x) = exp{P(x)Q(θ) + R(x) + S(θ)}
A. The normal density function with unknown mean and known variance;
B. The normal density function with unknown variance and known mean;
C. The exponential density function with unknown scale parameter and known location parameter;
D. The Bernoulli distribution with unknown probability of "success" on a single trial;
E. The Poisson distribution with unknown mean;
……
The distance between two K-D populations is defined as: (formula not preserved in this transcript)
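To see how the listed families fit the form f(x) = exp{P(x)Q(θ) + R(x) + S(θ)}, the decompositions for cases A, B, D and E can be written out. These are standard algebraic rewritings of the usual densities rather than material recovered from the slide. The decomposition is not unique: constants can be moved between R and S, and P and Q can be rescaled by reciprocal factors.

```latex
\begin{align*}
\text{A. } N(\theta,\sigma^2),\ \sigma^2 \text{ known:}\quad
  & P(x)=x, && Q(\theta)=\frac{\theta}{\sigma^2}, \\
  & R(x)=-\frac{x^2}{2\sigma^2}-\tfrac12\log(2\pi\sigma^2), && S(\theta)=-\frac{\theta^2}{2\sigma^2} \\[4pt]
\text{B. } N(\mu,\theta),\ \mu \text{ known:}\quad
  & P(x)=(x-\mu)^2, && Q(\theta)=-\frac{1}{2\theta}, \\
  & R(x)=-\tfrac12\log(2\pi), && S(\theta)=-\tfrac12\log\theta \\[4pt]
\text{D. Bernoulli}(\theta):\quad
  & P(x)=x, && Q(\theta)=\log\frac{\theta}{1-\theta}, \\
  & R(x)=0, && S(\theta)=\log(1-\theta) \\[4pt]
\text{E. Poisson}(\theta):\quad
  & P(x)=x, && Q(\theta)=\log\theta, \\
  & R(x)=-\log(x!), && S(\theta)=-\theta
\end{align*}
```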
10
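SMDP (Bechhofer et al., 1968)
Selecting the t best of M K-D populations
Sequential sampling at stages 1, 2, …, h, h+1, …
Populations: Pop. 1, Pop. 2, …, Pop. t-1, Pop. t are separated by a distance D from Pop. t+1, Pop. t+2, …, Pop. M
Statistics at stage h: Y_{1,h}, Y_{2,h}, …, Y_{i,h}, …, Y_{M,h}
U possible combinations of t out of M; a sum is formed for each combination u
Stopping rule: (formula not preserved in this transcript)
Prob. of correct selection (PCS) > P*, whenever D > D*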
11
SMDP: P*, t, D*
P*: arbitrary, e.g. 0.95
t: fixed or varied
D*: indifference zone
Populations: Pop. 1, Pop. 2, …, Pop. t-1, Pop. t are separated by a distance D from Pop. t+1, Pop. t+2, …, Pop. M
SMDP stopping rule: Prob. of correct selection (PCS) > P* whenever D > D*
Correct selection: populations with Q(θ) > Q(θ_t) + D* are selected
Axis labels: Q(θ_t), Q(θ_t) + D*, Q(θ_t) + D
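The stopping-rule formula itself did not survive in this transcript. The sketch below assumes the form usually given for the Bechhofer-Kiefer-Sobel procedure: order the U combination sums, and stop at the first stage where W = Σ exp{-D*(top sum - sum_u)}, taken over all combinations u other than the top one, is at most (1 - P*)/P*. The function names and the brute-force enumeration are illustrative choices, so the cited references should be checked for the exact rule.

```python
import math
from itertools import combinations

def stopping_statistic(y, t, d_star):
    """W for one stage, assuming the rule described in the lead-in.

    y      : current statistics Y_{i,h}, one per population
    t      : number of populations to select
    d_star : indifference-zone parameter D*
    """
    # Sum of every t-subset of y, largest first (brute force over all U).
    sums = sorted((sum(c) for c in combinations(y, t)), reverse=True)
    top = sums[0]
    return sum(math.exp(-d_star * (top - s)) for s in sums[1:])

def should_stop(y, t, d_star, p_star):
    """True when W <= (1 - P*) / P*."""
    return stopping_statistic(y, t, d_star) <= (1.0 - p_star) / p_star

# Two populations clearly ahead of the rest: the rule is satisfied.
separated = should_stop([10.0, 9.0, 1.0, 0.5], t=2, d_star=1.0, p_star=0.95)
# Populations nearly tied: sampling would continue.
tied = should_stop([1.0, 0.9, 0.8, 0.7], t=2, d_star=1.0, p_star=0.95)
```

Under this assumed rule, W shrinks as the top combination pulls away from the others, which is why clearly separated populations stop early while near ties keep sampling.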
12
SMDP: Computational Problem
Sequential stages: 1, 2, 3, …, h, h+1, …, N
At stage h: Y_{1,h}, Y_{2,h}, …, Y_{t,h}, Y_{t+1,h}, Y_{t+2,h}, …, Y_{M,h}
U sums for the U possible combinations of t out of M; each sum contains t members of Y_{i,h}
Computer time?
13
Simplified Stopping Rule (Bechhofer et al., 1968)
U-S+1 = Top Combination Number (TCN)
When TCN=2 (i.e. S=U-1, U-S=1) => the simplest stopping rule
When TCN=U (i.e. S=1, U-S=U-1) => the original stopping rule
How to choose TCN? Balance between computational accuracy and computational time
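One way to read the simplification is that only the TCN largest combination sums enter the stopping statistic, so the remaining combinations never need to be formed. The sketch below rests on that reading, and on an assumed statistic W = Σ exp{-D*(top sum - sum_u)} over the retained non-top combinations, since the formula is not preserved here. The best-first search and all function names are illustrative choices, not taken from the cited papers.

```python
import heapq
import math

def top_combination_sums(y, t, tcn):
    """Return the tcn largest sums of t-element subsets of y, largest first,
    without enumerating all combinations (best-first search on sorted values)."""
    vals = sorted(y, reverse=True)
    m = len(vals)
    if not 1 <= t <= m:
        raise ValueError("t must be between 1 and len(y)")
    start = tuple(range(t))              # indices of the t largest values
    heap = [(-sum(vals[:t]), start)]     # max-heap via negated sums
    seen = {start}
    out = []
    while heap and len(out) < tcn:
        neg_sum, idx = heapq.heappop(heap)
        out.append(-neg_sum)
        for j in range(t):
            nxt = idx[j] + 1
            limit = idx[j + 1] if j + 1 < t else m
            if nxt < limit:              # slide one index to the next smaller value
                child = idx[:j] + (nxt,) + idx[j + 1:]
                if child not in seen:
                    seen.add(child)
                    child_sum = -neg_sum - vals[idx[j]] + vals[nxt]
                    heapq.heappush(heap, (-child_sum, child))
    return out

def simplified_statistic(y, t, d_star, tcn):
    """Assumed W restricted to the tcn top combinations (tcn >= 2)."""
    sums = top_combination_sums(y, t, tcn)
    return sum(math.exp(-d_star * (sums[0] - s)) for s in sums[1:])

top4 = top_combination_sums([5, 1, 4, 2, 3], t=2, tcn=4)
```

Because the dropped terms are all positive, the truncated statistic can only be smaller than the full one. Under this reading, a small TCN saves time at the cost of stopping somewhat earlier than the original rule would.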
15
SMDP Combined With Regression Model (M.A. Province, 2000, pages 320-321)
Data pairs for a marker: (Z_1, X_1), (Z_2, X_2), (Z_3, X_3), …, (Z_h, X_h), (Z_{h+1}, X_{h+1}), …, (Z_N, X_N)
Sequential sum of squares of regression residuals
Y_{i,h} denotes Y for marker i at stage h
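A rough sketch of a sequential residual sum of squares for one marker follows. Several details are assumptions made for illustration, because the slide does not state them: Z is treated as the response and X as the predictor, a simple linear regression is refitted on the first h pairs at each stage, and the starting stage `n0` is arbitrary. The exact definition of Y_{i,h} is given in Province (2000).

```python
def residual_sum_of_squares(xs, zs):
    """RSS of the least-squares line z = a + b*x fitted to the given pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx if sxx > 0 else 0.0   # constant x: fit the mean only
    intercept = mean_z - slope * mean_x
    return sum((z - intercept - slope * x) ** 2 for x, z in zip(xs, zs))

def sequential_rss(xs, zs, n0=3):
    """Map each stage h = n0..N to the RSS computed from the first h pairs."""
    return {h: residual_sum_of_squares(xs[:h], zs[:h])
            for h in range(n0, len(xs) + 1)}

# Hypothetical marker: genotype codes as X, a continuous phenotype as Z.
genotype = [0, 1, 2, 0, 1, 2]
phenotype = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]
rss_by_stage = sequential_rss(genotype, phenotype)
```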
16
Combine SMDP With Regression Model (M.A. Province, 2000, page 319)
Case B: the normal density function with unknown variance and known mean
17
Simplified Stopping Rule (M.A. Province, 2000, pages 321-322)
18
A Real Data Example (M.A. Province, 2000, page 310)
19
A Real Data Example (M.A. Province, 2000, page 308)
20
Simulation Results (1) (M.A. Province, 2000, page 312)
21
Simulation Results (2) (M.A. Province, 2000, page 313)
23
Simplified SMDP (Bechhofer et al., 1968)
U-S+1 = Top Combination Number (TCN)
How to choose TCN? Balance between computational accuracy and computational time
24
Data
Sample size: 85 cell lines
Genotype: 5841 SNPs (category: 0, 1, 2)
Phenotype: ViabFu7 (continuous)
25
Relation of W and t (h=50, D*=10)
Effective Top Combination Number (ETCN)
(Zhang & Province, 2005, page 465)
26
ETCN Curve (Zhang & Province, 2005, page 466)
27
t = ? (Zhang & Province, 2005, page 466)
28
(Zhang & Province, 2005, page 467)
P*=0.95, D*=10, TCN=10000
72 SNPs, P<0.01
29
SMDP Summary
Advantages:
Tests and identifies all signals simultaneously, with no multiple comparisons
Uses the "minimal" N to find significant signals: efficient
Tight control of statistical errors (Type I, II): powerful
Saves the rest of N for validation: reliable
Further studies:
Computer time
Extension to more methods/models
Extension to non-K-D distributions
30
Thanks!