Sequential Multiple Decision Procedures (SMDP) for Genome Scans Q.Y. Zhang and M.A. Province Division of Statistical Genomics Washington University School.

Presentation transcript:

Sequential Multiple Decision Procedures (SMDP) for Genome Scans Q.Y. Zhang and M.A. Province Division of Statistical Genomics Washington University School of Medicine Statistical Genetics Forum, April, 2006

References
R.E. Bechhofer, J. Kiefer, M. Sobel (1968). Sequential identification and ranking procedures. The University of Chicago Press, Chicago.
M.A. Province (2000). A single, sequential, genome-wide test to identify simultaneously all promising areas in a linkage scan. Genetic Epidemiology, 19:
Q.Y. Zhang, M.A. Province (2005). Simplified sequential multiple decision procedures for genome scans. Proceedings of American Statistical Association, Biometrics section: 463-468.

SMDP: Sequential Multiple Decision Procedures
Sequential test
Multiple hypothesis test

Idea 1: Sequential
Start from a small sample size n0.
Increase the sample size stage by stage (n0+1, n0+2, ..., n0+i, ...) and run a sequential test (SPRT) at each stage.
Stop when the stopping rule is satisfied.
Remaining data: experiment in the next stage, or extra data for validation.
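The sequential idea can be illustrated with Wald's sequential probability ratio test (SPRT), which the slide names. The following is a minimal sketch, assuming normal observations with known standard deviation and two simple hypotheses about the mean. The function name `sprt_normal_mean` and all parameter values are illustrative and do not come from the presentation.

```python
from math import log


def sprt_normal_mean(data, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald SPRT for H0: mean = mu0 vs. H1: mean = mu1 (sigma known).

    Returns (decision, n_used). The decision is "accept_h1", "accept_h0",
    or "continue" when the data run out before a boundary is crossed.
    """
    upper = log((1 - beta) / alpha)   # crossing it accepts H1
    lower = log(beta / (1 - alpha))   # crossing it accepts H0
    llr = 0.0                         # cumulative log-likelihood ratio
    n = 0
    for n, x in enumerate(data, start=1):
        # Log-likelihood ratio contribution of one normal observation.
        llr += (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2)
        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n
    return "continue", n


# Illustrative data only: the test stops as soon as a boundary is crossed,
# so the unused observations remain available, e.g. for validation.
decision, n_used = sprt_normal_mean([1.0] * 20, mu0=0.0, mu1=1.0, sigma=1.0)
print(decision, n_used)
```

The point of the sketch is the early stop: the sample size actually used is decided by the data rather than fixed in advance.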

Idea 2: Multiple Decision
Binary hypothesis test: independent tests (test 1, test 2, ..., test n), one per SNP (SNP1, SNP2, ..., SNPn); test-wise error and experiment-wise error; p value correction.
Multiple hypothesis test: one simultaneous test over SNP1, SNP2, ..., SNPn that separates a signal group from a noise group.
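To make the gap between test-wise and experiment-wise error concrete, here is a small sketch. It assumes the n tests are independent and uses the Bonferroni rule as one common example of a p value correction; the slide does not say which correction is meant, and the function names are illustrative.

```python
def experimentwise_error(alpha, n_tests):
    """Chance of at least one false positive across n independent tests,
    each run at test-wise level alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test level that keeps the experiment-wise error at or below alpha."""
    return alpha / n_tests


for n in (1, 10, 100, 1000):
    print(n, experimentwise_error(0.05, n), bonferroni_threshold(0.05, n))
```

As n grows, the uncorrected experiment-wise error approaches 1 while the corrected per-test level shrinks toward 0, which is the cost the multiple-decision formulation tries to avoid.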

Binary Hypothesis Test (SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, ..., SNPn)
test 1: H0: Eff.(SNP1)=0 vs. H1: Eff.(SNP1)≠0
test 2: H0: Eff.(SNP2)=0 vs. H1: Eff.(SNP2)≠0
test 3: ……
test 4: ……
test 5: ……
test 6: ……
test n: H0: Eff.(SNPn)=0 vs. H1: Eff.(SNPn)≠0

Multiple Hypothesis Test (SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, ..., SNPn)
H1: SNP1,2,3 are truly different from the others
H2: SNP1,2,4 are truly different from the others
H3: ……
H4: ……
H5: SNP4,5,6 are truly different from the others
H6: ……
Hu: SNPn,n-1,n-2 are truly different from the others
H: any t SNPs are truly different from the other (n-t)
u = number of all possible combinations of t out of n
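The hypothesis set above can be enumerated directly for a small case. This is a sketch with n = 6 and t = 3, chosen only for illustration; the helper name `candidate_hypotheses` is hypothetical.

```python
from itertools import combinations
from math import comb


def candidate_hypotheses(snps, t):
    """Every way to pick t SNPs as the group that differs from the rest."""
    return list(combinations(snps, t))


snps = ["SNP1", "SNP2", "SNP3", "SNP4", "SNP5", "SNP6"]
hypotheses = candidate_hypotheses(snps, 3)

print(hypotheses[0])    # first hypothesis: SNP1, SNP2, SNP3
print(hypotheses[1])    # second hypothesis: SNP1, SNP2, SNP4
print(len(hypotheses))  # u, the number of hypotheses
print(comb(6, 3))       # the same count from the binomial coefficient
```

Exactly one of the u hypotheses is selected, so a single decision replaces n separate binary tests.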

SMDP
Sequential test + Multiple hypothesis test = Sequential Multiple Decision Procedure

Koopman-Darmois (K-D) Populations (Bechhofer et al., 1968)
The frequency/density function of a K-D population can be written in the form:
f(x) = exp{P(x)Q(θ) + R(x) + S(θ)}
A. The normal density function with unknown mean and known variance;
B. The normal density function with unknown variance and known mean;
C. The exponential density function with unknown scale parameter and known location parameter;
D. The Bernoulli distribution with unknown probability of "success" on a single trial;
E. The Poisson distribution with unknown mean;
……
The distance between two K-D populations is defined as: [equation not preserved in the transcript]
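As a check on the K-D form, the sketch below writes cases A and D with explicit choices of P, Q, R and S and compares them numerically with the usual density formulas. The particular decompositions are standard exponential-family algebra supplied here for illustration, not copied from the slides, and the names `kd_density`, `NORMAL_KD` and `BERNOULLI_KD` are hypothetical.

```python
from math import exp, log, pi, sqrt

SIGMA = 2.0  # known standard deviation for case A (illustrative value)


def kd_density(x, theta, P, Q, R, S):
    """Evaluate f(x) = exp{P(x)Q(theta) + R(x) + S(theta)}."""
    return exp(P(x) * Q(theta) + R(x) + S(theta))


# Case A: normal density, unknown mean theta, known variance SIGMA**2.
NORMAL_KD = dict(
    P=lambda x: x,
    Q=lambda theta: theta / SIGMA**2,
    R=lambda x: -x**2 / (2 * SIGMA**2) - 0.5 * log(2 * pi * SIGMA**2),
    S=lambda theta: -theta**2 / (2 * SIGMA**2),
)

# Case D: Bernoulli distribution, unknown success probability theta.
BERNOULLI_KD = dict(
    P=lambda x: x,
    Q=lambda theta: log(theta / (1 - theta)),
    R=lambda x: 0.0,
    S=lambda theta: log(1 - theta),
)


def normal_pdf(x, mu, sigma):
    """Textbook normal density, used only for comparison."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)


print(kd_density(1.3, 0.7, **NORMAL_KD), normal_pdf(1.3, 0.7, SIGMA))
print(kd_density(1, 0.3, **BERNOULLI_KD), kd_density(0, 0.3, **BERNOULLI_KD))
```

In both cases the unknown parameter enters the data-dependent term only through Q(θ), which is the scale on which the later slides compare populations.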

SMDP (Bechhofer et al., 1968)
Selecting the t best of M K-D populations
Populations: Pop. 1, Pop. 2, ..., Pop. t-1, Pop. t and Pop. t+1, Pop. t+2, ..., Pop. M, separated by distance D
Sequential sampling at stages 1, 2, ..., h, h+1, ... gives Y_1,h, Y_2,h, ..., Y_i,h, ..., Y_M,h
U possible combinations of t out of M
For each combination u
Stopping rule
Prob. of correct selection (PCS) > P*, whenever D>D*

SMDP: P*, t, D*
P*: arbitrary, 0.95
t: fixed or varied
D*: indifference zone
SMDP stopping rule: Prob. of correct selection (PCS) > P* whenever D>D*
Correct selection: populations with Q(θ) > Q(θ_t)+D* are selected
(Diagram: Pop. 1, ..., Pop. t and Pop. t+1, ..., Pop. M separated by distance D; axis marks at Q(θ_t), Q(θ_t)+D* and Q(θ_t)+D.)

SMDP: Computational Problem
Sequential stages: ..., h, h+1, ..., N
At stage h: Y_1,h, Y_2,h, ..., Y_t,h, Y_t+1,h, Y_t+2,h, ..., Y_M,h
U sums for the U possible combinations of t out of M
Each sum contains t members of Y_i,h
Computer time?
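The size of the problem is easy to quantify. The sketch below counts U for M = 5841, the number of SNPs in the data set described later in the presentation; the values of t are hypothetical. It also shows that the single largest combination sum does not need enumeration, because it is simply the sum of the t largest Y values. Function names are illustrative.

```python
from math import comb


def n_combinations(m, t):
    """U: the number of ways to choose t populations out of m."""
    return comb(m, t)


def best_combination_sum(y, t):
    """Largest of the U combination sums, obtained by sorting rather
    than by enumerating every combination."""
    return sum(sorted(y, reverse=True)[:t])


for t in (1, 2, 3, 5, 10):
    print(t, n_combinations(5841, t))

print(best_combination_sum([0.4, 2.5, 1.1, 3.0, 0.2], 2))
```

Every additional stage h repeats this work, which is why restricting attention to the top-ranked combinations matters.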

Simplified Stopping Rule (Bechhofer et al., 1968)
U-S+1 = Top Combination Number (TCN)
TCN=2 (i.e. S=U-1, U-S=1) => the simplest stopping rule
TCN=U (i.e. S=1, U-S=U-1) => the original stopping rule
How to choose TCN? Balance between computational accuracy and computational time.
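The transcript does not preserve the stopping-rule formula, so the following is only a sketch under an assumed form: the statistic is taken to be a sum of exp(-D* × (best sum - sum_u)) over the next-best combinations, and sampling stops when it falls to (1 - P*)/P* or below. That form, the threshold, and the names `stopping_statistic` and `should_stop` are assumptions made for illustration. Only the role of TCN, the number of top-ranked combinations that enter the calculation, follows the slide.

```python
from itertools import combinations
from math import exp


def stopping_statistic(y, t, d_star, tcn=None):
    """Assumed statistic: sum over the top-ranked combinations (excluding
    the best one) of exp(-d_star * (best_sum - combination_sum)).

    tcn=None uses all U combinations (TCN = U, the original rule);
    tcn=2 compares the best with the runner-up only (the simplest rule).
    Brute-force enumeration, suitable for small M only.
    """
    sums = sorted((sum(c) for c in combinations(y, t)), reverse=True)
    if tcn is None:
        tcn = len(sums)
    best = sums[0]
    return sum(exp(-d_star * (best - s)) for s in sums[1:tcn])


def should_stop(y, t, d_star, p_star, tcn=None):
    """Assumed rule: stop once the statistic is at most (1 - P*) / P*."""
    return stopping_statistic(y, t, d_star, tcn) <= (1 - p_star) / p_star


y_stage = [10.0, 9.0, 1.0, 0.0]  # hypothetical Y values at one stage
print(stopping_statistic(y_stage, t=2, d_star=1.0))         # TCN = U
print(stopping_statistic(y_stage, t=2, d_star=1.0, tcn=2))  # TCN = 2
print(should_stop(y_stage, t=2, d_star=1.0, p_star=0.95))
```

Under this assumed form every term is positive, so a smaller TCN can only lower the statistic. Truncation therefore trades accuracy for speed, which matches the balance described on the slide.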

SMDP Combined With Regression Model (M.A. Province, 2000, page )
Data pairs for a marker: (Z_1, X_1), (Z_2, X_2), (Z_3, X_3), ..., (Z_h, X_h), (Z_h+1, X_h+1), ..., (Z_N, X_N)
Sequential sum of squares of regression residuals
Y_i,h denotes Y for marker i at stage h
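A sequential residual sum of squares can be sketched as follows. The exact definition of Y_i,h is not preserved in the transcript, so this shows only the ingredient the slide names: for one marker, refit a simple least-squares regression on the first h data pairs and record the residual sum of squares. Treating Z as the response and X as the predictor, the starting stage, and the function name `sequential_rss` are all assumptions.

```python
def sequential_rss(z, x, start=3):
    """Residual sum of squares of the least-squares fit z ~ a + b*x,
    using the first h pairs, for h = start, ..., N.

    Returns a dict {h: rss_h}.
    """
    rss_by_stage = {}
    for h in range(start, len(z) + 1):
        zs, xs = z[:h], x[:h]
        mean_x = sum(xs) / h
        mean_z = sum(zs) / h
        sxx = sum((xi - mean_x) ** 2 for xi in xs)
        sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(xs, zs))
        slope = sxz / sxx if sxx > 0 else 0.0  # constant x: fit the mean only
        intercept = mean_z - slope * mean_x
        rss_by_stage[h] = sum(
            (zi - intercept - slope * xi) ** 2 for xi, zi in zip(xs, zs)
        )
    return rss_by_stage


# Hypothetical genotype codes (0, 1, 2) and a phenotype that follows them exactly.
genotype = [0, 1, 2, 0, 1, 2]
phenotype = [1.0, 3.0, 5.0, 1.0, 3.0, 5.0]
print(sequential_rss(phenotype, genotype))
```

A marker that explains the phenotype well leaves a small residual sum of squares at every stage, so the per-marker sequences separate as h grows.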

Combine SMDP With Regression Model (M.A. Province, 2000, page 319)
Case B: the normal density function with unknown variance and known mean.

Simplified Stopping Rule M.A. Province, 2000 page

A Real Data Example (M.A. Province, 2000, page 310)

A Real Data Example (M.A. Province, 2000, page 308)

Simulation Results (1) M.A. Province, 2000, page 312

Simulation Results (2) M.A. Province, 2000, page 313

Simplified SMDP (Bechhofer et al., 1968)
U-S+1 = Top Combination Number (TCN)
How to choose TCN? Balance between computational accuracy and computational time.

Data
Sample size: 85 cell lines
Genotype: 5841 SNPs (category: 0, 1, 2)
Phenotype: ViabFu7 (continuous)

Relation of W and t (h=50, D*=10)
Effective Top Combination Number (ETCN)
Zhang & Province, 2005, page 465

ETCN Curve (Zhang & Province, 2005, page 466)

t = ? (Zhang & Province, 2005, page 466)

Zhang & Province, 2005, page 467
P*=0.95, D*=10, TCN=
SNPs, P<0.01

SMDP Summary
Advantages:
- Tests and identifies all signals simultaneously; no multiple comparisons
- Uses a "minimal" N to find significant signals; efficient
- Tight control of statistical errors (Type I, II); powerful
- Saves the rest of N for validation; reliable
Further studies:
- Computer time
- Extension to more methods/models
- Extension to non-K-D distributions

Thanks!