Published by Alban Stewart. Modified over 9 years ago.
1
IAP workshop, Ghent, Sept. 18th, 2008
Mixed model analysis to discover cis-regulatory haplotypes in A. thaliana
Fanghong Zhang*, Stijn Vansteelandt*, Olivier Thas*, Marnik Vuylsteke#
* Ghent University
# VIB (Flanders Institute for Biotechnology)
2
Overview
- Genetic background
- Objectives
- Data
- Methodology
- Results
- Conclusions
3
Genetic background
Regulation of gene expression is affected either in:
- Cis: affecting the expression of only one of the two alleles in a heterozygous individual;
- Trans: affecting the expression of both alleles in a heterozygous individual.
4
Genetic background
Why search for cis-regulatory variants?
- "Low-hanging fruit": the search window is a small genomic region.
- Fast screening for markers in LD with the expression trait.
How to search for cis-regulatory variants?
- Using the GASED (Genome-wide Allelic Specific Expression Difference) approach (Kiekens et al., 2006).
- The approach is based on a diallel design, which is widely used in plant breeding to estimate GCA (general combining ability) and SCA (specific combining ability).
5
Genetic background
What is the GASED approach?
The expression of a gene in an F1 hybrid, for the kth offspring of the cross i × j, can be written as a sum of terms from parent i, terms from parent j, and terms from both (cross-terms), where c denotes a cis-element and t a trans-element. [Equations not preserved in this transcript.]
Cases considered:
- in case there is no trans-effect;
- in case there is a cis-effect;
- in case the genotype is homozygous.
Genotypic variation: a cis-regulatory divergence completely explains the difference between two parental lines.
6
Objectives of this study
- To use mixed model analysis to discover cis-regulated Arabidopsis genes. Based on the GASED approach, F1 hybrid genotypic variation for mRNA abundance is partitioned into additive and non-additive variance components, in order to differentiate between cis- and trans-regulatory changes and to assign allele-specific expression differences to cis-regulatory variation.
- To find the associated haplotypes (sets of SNPs) for the selected cis-regulated genes.
- To survey cis-regulatory variation systematically in order to identify "superior alleles".
7
Flow chart
Data contains all expressed genes (25527 genes)
- Step I: Choose genes with significant genotypic variation.
- Step II: Choose genes from Step I with no trans-regulatory variation.
- Step III: Choose genes from Step II displaying significant allelic imbalance due to cis-regulatory variation.
- Step IV: Choose genes from Step III showing significant association with the haplotype blocks found.
8
Data
Data acquisition:
1) Scan the arrays
2) Quantitate each spot
3) Subtract background noise
4) Normalize
5) Export table
The exported table is the data we analyze.
9
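Methodology - Step I
Mixed-model equations for gene X (y: expression values)
Full model: $y_{klnm} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l + \mathrm{genotype}_n + \mathrm{array}_m + \mathrm{error}_{klnm}$
Reduced model: $y_{klnm} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l + \mathrm{array}_m + \mathrm{error}_{klnm}$
The terms comprise fixed effects, random effects (genotype and array, distributed as below), and the residual error.
- $\mathrm{error} \sim N(0, \Sigma_e)$, $\Sigma_e = I_{220}\,\sigma^2_e$
- $\mathrm{array} \sim N(0, \Sigma_a)$, $\Sigma_a = I_{110}\,\sigma^2_a$
- $\mathrm{genotype} \sim N(0, \Sigma_{genotype})$, $\Sigma_{genotype} = G = K\,\sigma^2_g$
K is a 55 × 55 marker-based relatedness matrix, calculated as $1 - d_R$, where $d_R$ is Rogers' distance (Rogers, 1972; Reif et al., 2005).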
10
Methodology - Step I
Mixed-model equations
K = 55 × 55 marker-based relatedness matrix, based on Rogers' distance (Rogers, 1972; Reif et al., 2005; Melchinger et al., 1991), where:
- $p_{ij}$ and $q_{ij}$ are the allele frequencies of the jth allele at the ith locus;
- $n_i$ is the number of alleles at the ith locus (i.e. $n_i = 2$);
- $m$ is the number of loci (i.e. $m = 210{,}205$).
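The distance formula itself did not survive in this transcript. A minimal sketch, assuming the slide used the usual form of Rogers' distance, $d_R = \frac{1}{m}\sum_{i=1}^{m}\sqrt{\tfrac{1}{2}\sum_{j=1}^{n_i}(p_{ij}-q_{ij})^2}$, with relatedness taken as $1 - d_R$. The function names and the toy lines below are hypothetical and are not taken from the presentation.

```python
from math import sqrt

def rogers_distance(p, q):
    """Rogers' distance between two lines.

    p[i][j] and q[i][j] are the frequencies of allele j at locus i
    in the first and second line, respectively.
    """
    m = len(p)  # number of loci
    total = 0.0
    for p_i, q_i in zip(p, q):
        total += sqrt(0.5 * sum((a - b) ** 2 for a, b in zip(p_i, q_i)))
    return total / m

def relatedness_matrix(lines):
    """Pairwise relatedness K[a][b] = 1 - d_R(line a, line b)."""
    n = len(lines)
    return [[1.0 - rogers_distance(lines[a], lines[b]) for b in range(n)]
            for a in range(n)]

# Hypothetical toy data: 3 inbred lines, 4 biallelic loci.
# Each locus is [freq of allele 1, freq of allele 2].
line_a = [[1, 0], [1, 0], [0, 1], [0, 1]]
line_b = [[1, 0], [0, 1], [0, 1], [1, 0]]
line_c = [[0, 1], [0, 1], [1, 0], [1, 0]]

K = relatedness_matrix([line_a, line_b, line_c])
```

For fully inbred lines and biallelic markers, each locus contributes either 0 (same allele) or 1 (different allele), so $1 - d_R$ reduces to the fraction of loci at which two lines carry the same allele.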
11
Methodology - Step I
Gene X (25527 genes): likelihood ratio test (REML) → p-value → multiple testing correction → adjusted q-value (FDR)
- Null distribution of the test statistic: $\mathrm{LRT} \sim 0.5\,\chi^2(0) + 0.5\,\chi^2(1)$.
- The multiple testing correction requires an estimate of the proportion of features that are truly null.
- FDR: false discovery rate. How many of the called positives are false? A 5% FDR means that 5% of the called positives are expected to be false positives.
- Storey et al. (2002) use the q-value to represent FDR.
- We use an adjusted q-value to represent FDR.
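As an illustration of the p-value step, here is a minimal sketch of a p-value under the 50:50 mixture of $\chi^2(0)$ and $\chi^2(1)$. It assumes the convention that the p-value is half the $\chi^2(1)$ tail probability, so that LRT = 0 maps to p = 0.5; that convention is chosen to match the null p-value distribution described for the multiple testing correction and is not confirmed by the slides. The function names are hypothetical.

```python
from math import erfc, sqrt

def lrt_statistic(loglik_full, loglik_reduced):
    """Likelihood ratio statistic from the two maximised (RE)ML log-likelihoods."""
    return 2.0 * (loglik_full - loglik_reduced)

def chi2_1_sf(x):
    """P(X > x) for X ~ chi-square with 1 df, since X = Z^2 with Z standard normal."""
    return erfc(sqrt(x / 2.0))

def mixture_pvalue(lrt):
    """p-value under 0.5*chi2(0) + 0.5*chi2(1).

    Assumed convention: p = 0.5 * P(chi2_1 > lrt), giving p = 0.5 when lrt == 0.
    Small negative values caused by numerical noise are clipped to 0.
    """
    lrt = max(lrt, 0.0)
    return 0.5 * chi2_1_sf(lrt)
```

With this convention the usual 5% critical value of $\chi^2(1)$, about 3.84, corresponds to a mixture p-value of 0.025.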
12
Methodology - Step I
Multiple testing correction
- Storey et al. estimate $\pi_0 = m_0/m$ under the assumption that the true null p-values are uniformly distributed on (0, 1).
- We estimate $\pi_{0\text{-adj}} = m_0/m$ under the assumption that the true null p-values follow a 50:50 mixture: 50% are uniformly distributed on (0, 0.5) and 50% are exactly 0.5.
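How the adjusted estimate was computed is not shown in the transcript. One estimator that is consistent with the stated null distribution is sketched below, as an assumption rather than the authors' method: for a tuning value λ strictly between 0 and 0.5, a null p-value exceeds λ with probability 0.5·(1 − 2λ) + 0.5 = 1 − λ, so the Storey-type ratio #{p > λ} / (m(1 − λ)) still estimates π0 as long as λ stays below 0.5. The function name and the default λ are hypothetical.

```python
def pi0_estimate(pvalues, lam=0.25):
    """Estimate the proportion of true nulls from p-values.

    Assumes null p-values are half uniform on (0, 0.5) and half equal to 0.5,
    so that P(p > lam | null) = 1 - lam for 0 < lam < 0.5.
    """
    if not 0.0 < lam < 0.5:
        raise ValueError("lam must lie strictly between 0 and 0.5")
    m = len(pvalues)
    tail = sum(1 for p in pvalues if p > lam)
    # Cap at 1 because a proportion cannot exceed 1.
    return min(1.0, tail / (m * (1.0 - lam)))
```

Restricting λ to values below 0.5 matters here: with λ at or above 0.5 no null p-value can exceed the threshold, and the ratio would collapse to zero.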
13
Methodology - Step II
Mixed-model equations for gene X (y: expression values)
Full model: $y_{klijm} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l + \mathrm{gca}_i + \mathrm{gca}_j + \mathrm{sca}_{ij} + \mathrm{array}_m + \mathrm{error}_{klijm}$
Reduced model: $y_{klijm} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l + \mathrm{gca}_i + \mathrm{gca}_j + \mathrm{array}_m + \mathrm{error}_{klijm}$
The terms comprise fixed effects, random effects, and the residual error.
L is the Cholesky decomposition.
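To make the gca and sca terms concrete, the sketch below enumerates the genotypes of a half diallel and builds the incidence rows that link each genotype to its two parental gca effects. It assumes 10 parental lines (inferred from the effects g1 to g10 tested in Step III) and that the 55 genotypes are the 10 parents plus their 45 pairwise crosses; counting a parental line as twice its own gca is a further assumption. All names are hypothetical.

```python
from itertools import combinations_with_replacement

def diallel_genotypes(n_parents):
    """Genotypes of a half diallel including parents: pairs (i, j) with i <= j."""
    return list(combinations_with_replacement(range(1, n_parents + 1), 2))

def gca_row(i, j, n_parents):
    """Incidence row for gca_i + gca_j; a parent (i == j) gets a 2 in column i."""
    row = [0] * n_parents
    row[i - 1] += 1
    row[j - 1] += 1
    return row

N_PARENTS = 10  # assumption, see lead-in
genotypes = diallel_genotypes(N_PARENTS)
Z_gca = [gca_row(i, j, N_PARENTS) for i, j in genotypes]

# One sca effect per distinct cross i < j.
sca_labels = [(i, j) for i, j in genotypes if i != j]
```

Under these assumptions there are 55 genotypes and 45 crosses, which agrees with the 55 × 55 relatedness matrix of Step I and the 45 pairwise comparisons of Step III.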
14
Methodology - Step II
Gene X (20976 genes): likelihood ratio test (REML) → p-value → multiple testing correction → qa-value (FNR)
- Null distribution of the test statistic: $\mathrm{LRT} \sim 0.5\,\chi^2(0) + 0.5\,\chi^2(1)$.
- FNR: false non-discovery rate (Genovese et al., 2002). How many of the called negatives are false? A 5% FNR means that 5% of the called negatives are expected to be false negatives.
- Since we are interested in selecting genes that test negative for the $\mathrm{sca}_{ij}$ effect, we control FNR instead of FDR.
- We use the qa-value to represent FNR.
15
Methodology - Step II
Multiple testing correction
False non-discovery rate (FNR): $\pi_0$ is the estimate of the proportion of features that are truly null.
16
Methodology - Step III
Mixed-model equations for gene X
Model: $y_{klijm} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l + \mathrm{gca}_i + \mathrm{gca}_j + \mathrm{array}_m + \mathrm{error}_{klijm}$
Hypotheses tested (45 pairs):
- g1 = g2? g1 = g3? g1 = g4? ... g1 = g10?
- g2 = g3? g2 = g4? g2 = g5? ... g2 = g10?
- ...
- g9 = g10?
Test: two-sample dependent t-test.
Non-standard p-value: the distribution of the true null p-values is not uniform on (0, 1).
17
Methodology - Step III
Gene X (1380 genes): two-sample t-test on BLUPs → simulation-based p-value → multiple testing correction → q-value (FDR)
The H0 distribution is simulated from the real data.
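The transcript does not say how the null distribution was simulated. Given a set of test statistics already simulated under H0, a simulation-based p-value can be computed as sketched below. This uses the common add-one empirical estimator and a two-sided comparison, both of which are assumptions here; the function name is hypothetical.

```python
def simulation_pvalue(observed, null_stats):
    """Two-sided empirical p-value of `observed` against simulated null statistics.

    Uses (exceedances + 1) / (B + 1), which never returns exactly zero.
    """
    b = len(null_stats)
    exceed = sum(1 for s in null_stats if abs(s) >= abs(observed))
    return (exceed + 1) / (b + 1)
```

Because the smallest attainable value is 1/(B + 1), the number of simulated statistics B limits how small a p-value can be reported, which matters when many genes are tested.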
18
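Methodology - Step IV
Mixed-model equations for gene X (cis-regulated)
Tag SNPs along the chromosome around the gene: SNP 1, SNP 2, SNP 3, ..., SNP i.
Full model: $y_{klim} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l$ + [tag-SNP terms, not preserved in this transcript] $+ \mathrm{genotype}_i + \mathrm{array}_m + \mathrm{error}_{klim}$
Reduced model: $y_{klim} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l + \mathrm{genotype}_i + \mathrm{array}_m + \mathrm{error}_{klim}$
The terms comprise fixed effects, random effects (genotype and array, distributed as below), and the residual error.
- $\mathrm{genotype} \sim N(0, \Sigma_{genotype})$, $\Sigma_{genotype} = G = K\,\sigma^2_g$; K is the 55 × 55 marker-based relatedness matrix.
- $\mathrm{array} \sim N(0, \Sigma_a)$, $\Sigma_a = I_{110}\,\sigma^2_a$
- $\mathrm{error} \sim N(0, \Sigma_e)$, $\Sigma_e = I_{220}\,\sigma^2_e$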
19
Methodology - Step IV
Gene X (cis-regulated; 836 genes): likelihood ratio test (ML) → p-value → multiple testing correction → q-value (FDR)
Null distribution of the test statistic: $\mathrm{LRT} \sim \chi^2(2n)$, where n is the number of SNPs.
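Because the degrees of freedom 2n are always even, the tail probability of the reference distribution has a closed form, $P(\chi^2_{2n} > x) = e^{-x/2}\sum_{k=0}^{n-1}\frac{(x/2)^k}{k!}$. The sketch below evaluates it with the standard library only; the function name is hypothetical and the code is an illustration, not the software used in the study.

```python
from math import exp

def chi2_even_df_sf(x, n):
    """P(X > x) for X ~ chi-square with 2n degrees of freedom, n a positive integer."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return exp(-half) * total
```

For a gene with n tag SNPs, the p-value of the Step IV test would be `chi2_even_df_sf(lrt, n)`, where `lrt` is twice the difference in maximised log-likelihood between the full and reduced models.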
20
Results
Data contains all expressed genes (25527 genes)
- Step I: 20979 genes (adjusted q-value < 0.0005)
- Step II: 1328 genes (adjusted qa-value < 0.01)
- Step III: 972 genes (q-value < 0.01)
- Step IV: 859 genes
21
Results
- Among all 25527 genes, 20979 genes have significant genotypic variation (q-value < 0.0005). (Step I)
- Among these 20979 genes, 1328 genes show no trans-regulatory effect (qa-value < 0.01). (Step II)
- Among these 1328 genes, 972 genes show significantly different allelic expression (q-value < 0.01); these 972 genes are identified as cis-regulated. (Step III)
- Step IV confirms the discovery for these 972 cis-regulated genes: an allelic expression difference caused by a cis-regulatory variant implies a nearby polymorphism (SNP) in LD that controls expression. We indeed found that 96.5% of the selected cis-regulated genes have associated polymorphisms (haplotype blocks) nearby.
22
Conclusions
- The mixed-model approach used here for association mapping, with the kinship matrix included, is more appropriate than other recent methods for identifying cis-regulated genes (the p-values are more reliable).
- The statistical method in each step is controlled more accurately when specifying statistical significance (with reference to FDR and FNR).
- Using simulation-based p-values when testing the difference between random effects increases the power to detect association.
- A comprehensive analysis of gene expression variation in plant populations has been described.
- This mixed-model analysis strategy provides a detailed characterization of both the genetic and the positional effects in the genome.
- This detailed statistical analysis provides a robust and useful framework for the future analysis of gene expression variation in large samples.
- Advanced statistical methods look promising for making interesting discoveries in genetics.
23
Many thanks for your attention!