
1 IAP workshop, Ghent, Sept. 18th, 2008
Mixed model analysis to discover cis-regulatory haplotypes in A. thaliana
Fanghong Zhang*, Stijn Vansteelandt*, Olivier Thas*, Marnik Vuylsteke#
* Ghent University; # VIB (Flanders Institute for Biotechnology)

2 Overview
- Genetic background
- Objectives
- Data
- Methodology
- Results
- Conclusions

3 Genetic background
Regulation of gene expression is affected either in:
- Cis: affecting the expression of only one of the two alleles in a heterozygous individual;
- Trans: affecting the expression of both alleles in a heterozygous individual.

4 Genetic background
Why search for cis-regulatory variants?
- "Low-hanging fruit": the search window is a small genomic region.
- Fast screening for markers in LD (linkage disequilibrium) with the expression trait.
How to search for cis-regulatory variants?
- Using the GASED (Genome-wide Allelic Specific Expression Difference) approach (Kiekens et al., 2006).
- The approach is based on a diallel design, which is widely used in plant breeding to estimate GCA (general combining ability) and SCA (specific combining ability).

5 Genetic background
What is the GASED approach?
The expression of a gene in an F1 hybrid, taken as the kth offspring of the cross i × j, can be written as a sum of a contribution from parent i, a contribution from parent j, and a contribution from both (cross-terms), where c denotes a cis-element and t a trans-element. [Equation not preserved in the transcript.]
Cases annotated on the slide [equations not preserved]:
- In case there is no trans-effect
- In case there is a cis-effect
- In case homozygous
Genotypic variation: a cis-regulatory divergence completely explains the difference between two parental lines.

6 Objectives of this study
- Use mixed model analysis to discover cis-regulated Arabidopsis genes.
- Based on the GASED approach, partition the F1 hybrid genotypic variation for mRNA abundance into additive and non-additive variance components, in order to differentiate between cis- and trans-regulatory changes and to assign allele-specific expression differences to cis-regulatory variation.
- Find the associated haplotypes (sets of SNPs) for the selected cis-regulated genes.
- Survey cis-regulatory variation systematically to identify "superior alleles".

7 Flow chart
Data contains all expressed genes (25527 genes).
- Step I: choose genes with significant genotypic variation.
- Step II: choose genes from Step I with no trans-regulatory variation.
- Step III: choose genes from Step II displaying significant allelic imbalance due to cis-regulatory variation.
- Step IV: choose genes from Step III showing significant association with the haplotype blocks found.

8 Data
Data acquisition:
1) Scan the arrays
2) Quantitate each spot
3) Subtract background noise
4) Normalize
5) Export table
The exported table is the data we analyze.

9 Methodology - Step I
Mixed-model equations for gene X (y = expression values):
Full model: y_{klnm} = μ + dye_k + replicate_l + genotype_n + array_m + error_{klnm}
Reduced model: y_{klnm} = μ + dye_k + replicate_l + array_m + error_{klnm}
Terms: fixed effects (dye, replicate), random effects (genotype, array), residual (error).
- error ~ N(0, Σ_e), Σ_e = I_{220} σ_e^2
- array ~ N(0, Σ_a), Σ_a = I_{110} σ_a^2
- genotype ~ N(0, Σ_genotype), Σ_genotype = G = K σ_g^2
K is a 55 × 55 marker-based relatedness matrix, calculated as 1 − d_R, where d_R is Rogers' distance (Rogers, 1972; Reif et al., 2005).
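In matrix form, the full model and the covariance it implies can be sketched as below. The design matrices X, Z_g and Z_a are notation introduced here for illustration (they do not appear on the slide), and independence of the genotype, array and error terms is assumed; the dimensions follow from the 220 observations, 110 arrays and 55 genotypes given on the slide.

```latex
\begin{aligned}
y &= X\beta + Z_g g + Z_a a + e, \\
g &\sim N(0,\, K\sigma_g^2), \qquad
a \sim N(0,\, I_{110}\sigma_a^2), \qquad
e \sim N(0,\, I_{220}\sigma_e^2), \\
\operatorname{Var}(y) &= Z_g K Z_g^{\top}\sigma_g^2
  + Z_a Z_a^{\top}\sigma_a^2 + I_{220}\sigma_e^2 .
\end{aligned}
```

The reduced model drops the term Z_g g, so comparing the two models amounts to testing whether σ_g^2 = 0.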

10 Methodology - Step I
Mixed-model equations: K is the 55 × 55 marker-based relatedness matrix, computed from Rogers' distance (Rogers, 1972; Reif et al., 2005; Melchinger et al., 1991). [Formula not preserved in the transcript.]
- p_{ij} and q_{ij} are the allele frequencies of the jth allele at the ith locus in the two lines being compared
- n_i is the number of alleles at the ith locus (here n_i = 2)
- m is the number of loci (here m = 210,205)
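Since the slide's formula did not survive, here is a hedged sketch using the form of Rogers' distance commonly given in the literature (e.g. Reif et al., 2005): d_R = (1/m) Σ_i sqrt(0.5 Σ_j (p_ij − q_ij)^2), with relatedness taken as 1 − d_R as the slides describe. The function names and the toy allele frequencies are illustrative assumptions, not the authors' code or data.

```python
import math

def rogers_distance(p, q):
    """Rogers' distance between two lines.

    p[i][j] and q[i][j] are the frequencies of allele j at locus i
    in the two lines being compared.
    """
    m = len(p)  # number of loci
    total = 0.0
    for p_i, q_i in zip(p, q):
        total += math.sqrt(0.5 * sum((a - b) ** 2 for a, b in zip(p_i, q_i)))
    return total / m

def relatedness_matrix(lines):
    """K[a][b] = 1 - d_R between line a and line b."""
    n = len(lines)
    return [[1.0 - rogers_distance(lines[a], lines[b]) for b in range(n)]
            for a in range(n)]

# Hypothetical toy data: three inbred lines, four biallelic loci.
lines = [
    [[1, 0], [1, 0], [0, 1], [1, 0]],
    [[1, 0], [0, 1], [0, 1], [1, 0]],
    [[0, 1], [0, 1], [1, 0], [0, 1]],
]
K = relatedness_matrix(lines)
```

For fully inbred lines with biallelic markers, each locus contributes either 0 or 1 to the sum, so 1 − d_R is simply the fraction of loci at which two lines carry the same allele.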

11 Methodology - Step I
Gene X (25527 genes): likelihood ratio test (REML) → p-value → multiple testing correction → adjusted q-value (FDR).
- Null distribution of the test statistic: LRT ~ 0.5 χ²(0) + 0.5 χ²(1).
- FDR (false discovery rate): how many of the called positives are false? A 5% FDR means that 5% of the calls are false positives.
- Storey et al. (2002) use the q-value to represent the FDR; this requires an estimate of the proportion of features that are truly null. [Formula not preserved in the transcript.]
- We use an adjusted q-value to represent the FDR.
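As a minimal sketch of how a p-value follows from the 50:50 mixture null above: χ²(0) is a point mass at zero, so only the χ²(1) half contributes to the upper tail. The function names are hypothetical, and the χ²(1) tail is computed with the standard-library identity P(χ²(1) > x) = erfc(sqrt(x/2)).

```python
import math

def chi2_1_sf(x):
    """Upper-tail probability of a chi-square with 1 df, for x >= 0."""
    return math.erfc(math.sqrt(x / 2.0))

def mixture_pvalue(lrt):
    """p-value under the 0.5*chi2(0) + 0.5*chi2(1) null mixture.

    An LRT statistic of exactly 0 maps to a p-value of 0.5.
    """
    lrt = max(lrt, 0.0)  # guard against tiny negative values from the optimizer
    return 0.5 * chi2_1_sf(lrt)
```

With this convention every p-value lies in (0, 0.5], and a fitted variance component of zero (LRT = 0) yields exactly 0.5.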

12 Methodology - Step I
Multiple testing correction:
- Storey et al. estimate π0 = m0/m under the assumption that the true null p-values are uniformly distributed on (0, 1).
- We estimate π0-adj = m0/m under the assumption that 50% of the true null p-values are uniformly distributed on (0, 0.5) and the other 50% are exactly 0.5.
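The authors' adjusted estimator is not shown in the transcript, so the following is only one possible reading. Storey's fixed-λ estimator is π0(λ) = #{p_i > λ} / (m (1 − λ)). Under the adjusted null described above, P(p > λ | null) = 0.5 (1 − 2λ) + 0.5 = 1 − λ for any λ < 0.5, so the same ratio remains valid as long as λ is kept below 0.5. The function name and the default λ are assumptions.

```python
def pi0_estimate(pvalues, lam=0.25):
    """Fixed-lambda estimate of the proportion of true nulls.

    Assumes lam < 0.5 so that the null tail probability equals 1 - lam
    even when half of the null p-values sit exactly at 0.5.
    """
    if not 0.0 <= lam < 0.5:
        raise ValueError("lam must lie in [0, 0.5)")
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))
```

For example, with six p-values of 0.01 and three of 0.3, the estimate at λ = 0.25 is 3 / (9 × 0.75).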

13 Methodology - Step II
Mixed-model equations for gene X (y = expression values):
Full model: y_{klijm} = μ + dye_k + replicate_l + gca_i + gca_j + sca_{ij} + array_m + error_{klijm}
Reduced model: y_{klijm} = μ + dye_k + replicate_l + gca_i + gca_j + array_m + error_{klijm}
The slide labels the terms as fixed effects, random effects and residual.
L is the Cholesky decomposition. [The accompanying formula is not preserved in the transcript.]

14 Methodology - Step II
Gene X (20976 genes): likelihood ratio test (REML) → p-value → multiple testing correction → qa-value (FNR).
- Null distribution of the test statistic: LRT ~ 0.5 χ²(0) + 0.5 χ²(1).
- FNR (false non-discovery rate; Genovese et al., 2002): how many of the called negatives are false? A 5% FNR means that 5% of the negative calls are false negatives.
- Since we are interested in selecting genes that test negative for the sca_{ij} effect, we control the FNR instead of the FDR.
- We use the qa-value to represent the FNR.

15 Methodology - Step II
Multiple testing correction: false non-discovery rate (FNR), where π0 is the estimate of the proportion of features that are truly null. [Formula not preserved in the transcript.]
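To make the two error rates concrete, the sketch below computes the realized false discovery and false non-discovery proportions for a set of calls whose truth is known, as in a simulation. FDR and FNR are the expectations of these proportions. This is an illustration of the definitions only, not the qa-value computation from the slide; the function and argument names are hypothetical.

```python
def error_proportions(truth_is_null, called_positive):
    """Return (false discovery proportion, false non-discovery proportion)."""
    pairs = list(zip(truth_is_null, called_positive))
    n_pos = sum(1 for _, pos in pairs if pos)
    n_neg = len(pairs) - n_pos
    false_pos = sum(1 for null, pos in pairs if null and pos)
    false_neg = sum(1 for null, pos in pairs if not null and not pos)
    fdp = false_pos / n_pos if n_pos else 0.0   # false among called positives
    fnp = false_neg / n_neg if n_neg else 0.0   # false among called negatives
    return fdp, fnp
```

In Step II the genes of interest are the called negatives (no detectable sca effect), which is why the second proportion is the one to control there.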

16 Methodology - Step III
Mixed-model equation for gene X:
y_{klijm} = μ + dye_k + replicate_l + gca_i + gca_j + array_m + error_{klijm}
Pairwise tests: g1 = g2? g1 = g3? g1 = g4? ... g1 = g10? g2 = g3? g2 = g4? g2 = g5? ... g2 = g10? ... g9 = g10?
- Test 45 pairs with a two-sample dependent t-test.
- Non-standard p-value: the true null p-values are not uniformly distributed on (0, 1).
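The count of 45 follows from choosing 2 of the 10 effects g1, ..., g10, i.e. 10 × 9 / 2. A short sketch that enumerates the pairs in the order listed above (the labels are just strings used for illustration):

```python
from itertools import combinations

genotypes = [f"g{i}" for i in range(1, 11)]   # g1 ... g10
pairs = list(combinations(genotypes, 2))      # (g1, g2), (g1, g3), ..., (g9, g10)
```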

17 Methodology - Step III
Gene X (1380 genes): two-sample t-test on the BLUPs → simulation-based p-value (H0 distribution simulated from the real data) → multiple testing correction → q-value (FDR).
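How the null distribution is simulated from the real data is not described in the transcript. Assuming a set of test statistics simulated under H0 is already available, a simulation-based p-value can be sketched as the share of simulated statistics at least as extreme as the observed one. The function name, the two-sided comparison and the +1 correction are assumptions made here, not details taken from the slides.

```python
def empirical_pvalue(observed, null_stats):
    """Two-sided Monte Carlo p-value with the usual +1 correction."""
    exceed = sum(1 for s in null_stats if abs(s) >= abs(observed))
    return (exceed + 1) / (len(null_stats) + 1)
```

The +1 in numerator and denominator keeps the p-value strictly positive, which matters when the values are later passed to a q-value procedure.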

18 Methodology - Step IV
Mixed-model equations for gene X (cis-regulated), with tag SNPs SNP_1, SNP_2, SNP_3, ..., SNP_i located around the gene on the chromosome:
Full model: y_{klim} = μ + dye_k + replicate_l + [SNP terms not preserved in the transcript] + genotype_i + array_m + error_{klim}
Reduced model: y_{klim} = μ + dye_k + replicate_l + genotype_i + array_m + error_{klim}
The slide labels the terms as fixed effects, random effects and residual.
- genotype ~ N(0, Σ_genotype), Σ_genotype = G = K σ_g^2; K is the 55 × 55 marker-based relatedness matrix
- array ~ N(0, Σ_a), Σ_a = I_{110} σ_a^2
- error ~ N(0, Σ_e), Σ_e = I_{220} σ_e^2

19 Methodology - Step IV
Gene X (cis-regulated; 836 genes): likelihood ratio test (ML) → p-value → multiple testing correction → q-value (FDR).
- Null distribution of the test statistic: LRT ~ χ²(2n), where n is the number of SNPs.
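Because the degrees of freedom 2n are always even, the χ² upper tail has a closed form: P(χ²(2n) > x) = exp(−x/2) Σ_{i=0}^{n−1} (x/2)^i / i!. The sketch below uses that identity to turn an LRT statistic into a p-value with the standard library only; the function name is hypothetical.

```python
import math

def chi2_even_df_sf(x, n_snps):
    """P(chi-square with 2*n_snps df > x), closed form for even df."""
    if n_snps < 1:
        raise ValueError("n_snps must be at least 1")
    half = max(x, 0.0) / 2.0
    term = 1.0    # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, n_snps):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total
```

For a single SNP (2 degrees of freedom) this reduces to exp(−x/2).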

20 Results
Data contains all expressed genes (25527 genes).
- Step I (adjusted q-value < 0.0005): 20979 genes
- Step II (adjusted qa-value < 0.01): 1328 genes
- Step III (q-value < 0.01): 972 genes
- Step IV: 859 genes

21 Results
- Among all 25527 genes, 20979 genes have significant genotypic variation (q-value < 0.0005). (Step I)
- Among these 20979 genes, 1328 genes show no trans-regulatory effect (qa-value < 0.01). (Step II)
- Among these 1328 genes, 972 genes show significantly different allelic expression (q-value < 0.01); these 972 genes are identified as cis-regulated. (Step III)
- We confirm the discovery of these 972 cis-regulated genes in Step IV:
  - an allelic expression difference caused by a cis-regulatory variant implies a nearby polymorphism (SNP), in LD, that controls expression;
  - we indeed found that 96.5% of the selected cis-regulated genes have associated polymorphisms (haplotype blocks) nearby.

22 Conclusions
- The mixed-model approach used here for association mapping, with the kinship matrix included, is more appropriate than other recent methods for identifying cis-regulated genes (the p-values are more reliable).
- The statistical method in each step is controlled more accurately when specifying statistical significance (with reference to FDR and FNR).
- Using simulation-based p-values when testing differences between random effects increases the power to detect association.
- A comprehensive analysis of gene expression variation in plant populations has been described.
- This mixed-model analysis strategy provides a detailed characterization of both the genetic and the positional effects in the genome.
- This detailed statistical analysis provides a robust and useful framework for future analyses of gene expression variation in large samples.
- Advanced statistical methods look promising for making interesting discoveries in genetics.

23 Many thanks for your attention!

