IAP workshop, Ghent, Sept. 18th, 2008: Mixed model analysis to discover cis-regulatory haplotypes in A. thaliana


IAP workshop, Ghent, Sept. 18th, 2008
Mixed model analysis to discover cis-regulatory haplotypes in A. thaliana
Fanghong Zhang*, Stijn Vansteelandt*, Olivier Thas*, Marnik Vuylsteke#
* Ghent University
# VIB (Flanders Institute for Biotechnology)

Overview
- Genetic background
- Objectives
- Data
- Methodology
- Results
- Conclusions

Genetic background
Regulation of gene expression can be affected either in:
- Cis: affecting the expression of only one of the two alleles in a heterozygous individual;
- Trans: affecting the expression of both alleles in a heterozygous individual.

Genetic background
Why search for cis-regulatory variants?
- "Low-hanging fruit": the search window is a small genomic region.
- Fast screening for markers in LD with the expression trait.
How to search for cis-regulatory variants?
- Using the GASED (Genome-wide Allelic Specific Expression Difference) approach (Kiekens et al., 2006).
- The approach is based on a diallel design, which is very popular in plant breeding for estimating GCA (general combining ability) and SCA (specific combining ability).

Genetic background
What is the GASED approach?
The expression of a gene in an F1 hybrid, for the kth offspring of the cross i × j, can be written as a sum of terms from parent i, from parent j, and from both (cross-terms), where c denotes a cis-element and t a trans-element. [Equations not preserved in the transcript.]
The slide then considers three cases: no trans-effect, a cis-effect, and the homozygous case.
Genotypic variation: a cis-regulatory divergence completely explains the difference between two parental lines.

Objectives of this study
- Use mixed-model analysis to discover cis-regulated Arabidopsis genes.
- Based on the GASED approach, partition the between-F1-hybrid genotypic variation for mRNA abundance into additive and non-additive variance components, in order to differentiate between cis- and trans-regulatory changes and to assign allele-specific expression differences to cis-regulatory variation.
- Find the associated haplotypes (sets of SNPs) for the selected cis-regulated genes.
- Carry out systematic surveys of cis-regulatory variation to identify "superior alleles".

Flow chart
The data contain all expressed genes (25527 genes).
Step I: Choose genes with significant genotypic variation.
Step II: Choose genes from Step I with no trans-regulatory variation.
Step III: Choose genes from Step II displaying significant allelic imbalance attributable to cis-regulatory variation.
Step IV: Choose genes from Step III showing significant association with the haplotype blocks found.

Data
Data acquisition:
1) Scan the arrays
2) Quantitate each spot
3) Subtract noise from background
4) Normalize
5) Export table
The exported table is the data we analyze.

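Methodology - Step I: Mixed-model equations
For each gene X, the expression values are modelled as fixed effects plus random effects plus a residual.
Full model: $y_{klnm} = \mu + \text{dye}_k + \text{replicate}_l + \text{genotype}_n + \text{array}_m + \text{error}_{klnm}$
Reduced model: $y_{klnm} = \mu + \text{dye}_k + \text{replicate}_l + \text{array}_m + \text{error}_{klnm}$
- Fixed effects: $\mu$, $\text{dye}_k$, $\text{replicate}_l$
- Random effects: $\text{genotype} \sim N(0, \Sigma_{\text{genotype}})$ with $\Sigma_{\text{genotype}} = G = K\sigma^2_g$; $\text{array} \sim N(0, \Sigma_a)$ with $\Sigma_a = I_{110}\,\sigma^2_a$
- Residual: $\text{error} \sim N(0, \Sigma_e)$ with $\Sigma_e = I_{220}\,\sigma^2_e$
K is the 55 × 55 marker-based relatedness matrix, calculated as $1 - d_R$, where $d_R$ is Rogers' distance (Rogers, 1972; Reif et al., 2005).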

Methodology - Step I: Mixed-model equations
K = 55 × 55 marker-based relatedness matrix, based on Rogers' distance (Rogers, 1972; Reif et al., 2005; Melchinger et al., 1991), where:
- $p_{ij}$ and $q_{ij}$ are the allele frequencies of the jth allele at the ith locus;
- $n_i$ is the number of alleles at the ith locus (i.e. $n_i = 2$);
- $m$ is the number of loci (i.e. $m$ = 210,205).
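The distance formula itself did not survive in the transcript. As a minimal sketch, the code below uses Rogers' distance as it is usually written, $d_R = \frac{1}{m}\sum_{i=1}^{m}\sqrt{\tfrac{1}{2}\sum_{j=1}^{n_i}(p_{ij}-q_{ij})^2}$, and builds $K = 1 - d_R$ for a few toy lines. The function names, the helper `to_freqs`, and the toy genotypes are illustrative assumptions, not taken from the slides.

```python
import math

def to_freqs(alleles):
    """Turn 0/1 allele calls of an inbred line into per-locus frequency pairs.

    Assumption for this toy example: biallelic loci, fully homozygous lines.
    """
    return [[1.0 - a, float(a)] for a in alleles]

def rogers_distance(p, q):
    """Rogers' distance between two lines given per-locus allele frequencies."""
    m = len(p)
    total = 0.0
    for p_i, q_i in zip(p, q):
        total += math.sqrt(0.5 * sum((pj - qj) ** 2 for pj, qj in zip(p_i, q_i)))
    return total / m

def relatedness_matrix(lines):
    """K[a][b] = 1 - d_R(a, b) for every pair of lines."""
    return [[1.0 - rogers_distance(a, b) for b in lines] for a in lines]

# Three hypothetical lines scored at four loci.
toy_lines = [to_freqs([0, 0, 1, 1]), to_freqs([0, 1, 1, 1]), to_freqs([1, 1, 0, 0])]
K = relatedness_matrix(toy_lines)
```

For biallelic loci the inner square root reduces to the absolute difference in allele frequency, so $d_R$ is simply the mean absolute frequency difference across loci.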

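Methodology - Step I: Likelihood ratio test and multiple testing correction
For each gene X, a likelihood ratio test (REML) compares the full and reduced models, with $\mathrm{LRT} \sim 0.5\,\chi^2(0) + 0.5\,\chi^2(1)$, giving one p-value per gene.
Multiple testing correction across genes: p-value → adjusted q-value (FDR).
- FDR: false discovery rate. How many of the called positives are false? A 5% FDR means that 5% of the called positives are expected to be false positives.
- The correction requires an estimate of the proportion of features that are truly null.
- Storey et al. (2002) use the q-value to represent the FDR; we use an adjusted q-value to represent the FDR.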
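A minimal sketch of how a p-value follows from the 50:50 mixture of $\chi^2(0)$ and $\chi^2(1)$, using only the standard library. The function names are illustrative. The convention that an LRT of exactly 0 yields p = 0.5 is an assumption chosen to match the slides' statement that half of the null p-values equal 0.5.

```python
import math

def chi2_1_sf(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def mixture_pvalue(lrt):
    """P-value under the 0.5*chi2(0) + 0.5*chi2(1) null distribution.

    Negative statistics (numerical noise) are treated as 0.
    """
    lrt = max(lrt, 0.0)
    return 0.5 * chi2_1_sf(lrt)
```

In effect, the mixture p-value is half of the ordinary $\chi^2(1)$ p-value, so no p-value can exceed 0.5.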

Methodology - Step I: Multiple testing correction
- Storey et al. estimate $\pi_0 = m_0/m$ under the assumption that the true null p-values are uniformly distributed on (0, 1).
- We estimate $\pi_{0\text{-adj}} = m_0/m$ under the assumption that 50% of the true null p-values are uniformly distributed on (0, 0.5) and the other 50% are exactly 0.5.
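The slides do not give the adjusted estimator itself, so the sketch below is only one possible reading. `storey_pi0` is the usual fixed-λ estimator, #{p > λ} / (m(1 − λ)). `adjusted_pi0` is a hypothetical variant that counts only p-values strictly between λ and 0.5 and rescales by (0.5 − λ), which is what the stated null distribution would imply for the uniform half. Both function names and the toy p-values are assumptions.

```python
def storey_pi0(pvals, lam=0.25):
    """Standard fixed-lambda estimate of the null proportion (uniform(0,1) nulls)."""
    m = len(pvals)
    return min(1.0, sum(1 for p in pvals if p > lam) / (m * (1.0 - lam)))

def adjusted_pi0(pvals, lam=0.25):
    """Hypothetical variant for nulls that are 50% uniform(0,0.5) and 50% exactly 0.5.

    A null p-value lies in (lam, 0.5) with probability 0.5 - lam, for lam < 0.5.
    """
    m = len(pvals)
    inside = sum(1 for p in pvals if lam < p < 0.5)
    return min(1.0, inside / (m * (0.5 - lam)))

# Toy data: 100 nulls (50 spread over (0, 0.5), 50 at exactly 0.5) plus 100 strong signals.
toy_nulls = [(i + 0.5) / 100 for i in range(50)] + [0.5] * 50
toy_pvals = toy_nulls + [0.001] * 100
```

On these toy values both estimators recover a null proportion of 0.5 at λ = 0.25. The standard estimator fails once λ ≥ 0.5, because no p-value can exceed 0.5, which is presumably why an adjustment is needed.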

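Methodology - Step II: Mixed-model equations
For each gene X, the expression values are modelled as fixed effects plus random effects plus a residual.
Full model: $y_{klijm} = \mu + \text{dye}_k + \text{replicate}_l + \text{gca}_i + \text{gca}_j + \text{sca}_{ij} + \text{array}_m + \text{error}_{klijm}$
Reduced model: $y_{klijm} = \mu + \text{dye}_k + \text{replicate}_l + \text{gca}_i + \text{gca}_j + \text{array}_m + \text{error}_{klijm}$
L is the Cholesky decomposition. [The accompanying formula is not preserved in the transcript.]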

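Methodology - Step II: Likelihood ratio test and multiple testing correction
For each gene X, a likelihood ratio test (REML) compares the full and reduced models, with $\mathrm{LRT} \sim 0.5\,\chi^2(0) + 0.5\,\chi^2(1)$, giving one p-value per gene.
Multiple testing correction across genes: p-value → qa-value (FNR).
- FNR: false non-discovery rate (Genovese et al., 2002). How many of the called negatives are false? A 5% FNR means that 5% of the called negatives are expected to be false negatives.
- Since we are interested in selecting genes that are called negative for the $\text{sca}_{ij}$ effect (that is, genes with no trans-regulatory variation), we control the FNR instead of the FDR.
- We use the qa-value to represent the FNR.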

Methodology - Step II: Multiple testing correction
False non-discovery rate (FNR): [formula not preserved in the transcript], where $\pi_0$ is the estimate of the proportion of features that are truly null.

Methodology - Step III: Mixed-model equations
Model: $y_{klijm} = \mu + \text{dye}_k + \text{replicate}_l + \text{gca}_i + \text{gca}_j + \text{array}_m + \text{error}_{klijm}$
For each gene X, test all 45 pairs: g1 = g2? g1 = g3? g1 = g4? ... g1 = g10? g2 = g3? g2 = g4? g2 = g5? ... g2 = g10? ... g9 = g10?
- Test: two-sample dependent t-test.
- Non-standard p-value: the true null p-values are not uniformly distributed on (0, 1).
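The count of 45 tests follows from comparing every unordered pair among the ten effects g1, ..., g10. A small sketch that enumerates the comparisons (the labels are taken from the slide; the variable names are illustrative):

```python
from itertools import combinations

# The ten effects compared on the slide, labelled g1 ... g10.
parents = [f"g{i}" for i in range(1, 11)]

# Every unordered pair (g_a, g_b) with a < b is one hypothesis "g_a = g_b?".
pairs = list(combinations(parents, 2))
n_tests = len(pairs)
```

Here `n_tests` equals 10 · 9 / 2 = 45, matching the slide.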

Methodology - Step III: Multiple testing correction
For each gene X (1380 genes), a two-sample t-test is applied to the BLUPs.
- The H0 distribution is simulated from the real data, giving a simulation-based p-value.
- Multiple testing correction: simulation-based p-value → q-value (FDR).
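The slides do not say how the null distribution is simulated, so the sketch below covers only the last step: turning an observed statistic and a set of simulated null statistics into a p-value. The two-sided comparison and the "+1" correction are common conventions assumed here, and the function name and toy values are illustrative.

```python
def simulation_pvalue(observed, null_stats):
    """Two-sided simulation-based p-value.

    Counts simulated null statistics at least as extreme as the observed one;
    the +1 terms keep the p-value strictly positive.
    """
    exceed = sum(1 for s in null_stats if abs(s) >= abs(observed))
    return (1 + exceed) / (1 + len(null_stats))

# Toy null statistics standing in for t-statistics simulated under H0.
toy_null = [-3.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
p_toy = simulation_pvalue(2.5, toy_null)
```

Because the reference distribution comes from simulation rather than from a t-table, the result remains usable when the null p-values of the standard test are not uniform.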

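Methodology - Step IV: Mixed-model equations
For each cis-regulated gene X, the tag SNPs around the gene are tested. [Diagram: a gene on a chromosome with tag SNPs SNP 1, SNP 2, SNP 3, ..., SNP i.]
Full model: $y_{klim} = \mu + \text{dye}_k + \text{replicate}_l + [\text{tag-SNP terms}] + \text{genotype}_i + \text{array}_m + \text{error}_{klim}$
Reduced model: $y_{klim} = \mu + \text{dye}_k + \text{replicate}_l + \text{genotype}_i + \text{array}_m + \text{error}_{klim}$
As in Step I, the terms are grouped into fixed effects, random effects, and a residual:
- $\text{genotype} \sim N(0, \Sigma_{\text{genotype}})$ with $\Sigma_{\text{genotype}} = G = K\sigma^2_g$, where K is the 55 × 55 marker-based relatedness matrix;
- $\text{array} \sim N(0, \Sigma_a)$ with $\Sigma_a = I_{110}\,\sigma^2_a$;
- $\text{error} \sim N(0, \Sigma_e)$ with $\Sigma_e = I_{220}\,\sigma^2_e$.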

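Methodology - Step IV: Likelihood ratio test and multiple testing correction
For each cis-regulated gene X (836 genes), a likelihood ratio test (ML) compares the full and reduced models, with $\mathrm{LRT} \sim \chi^2(2n)$, where n is the number of SNPs.
Multiple testing correction: p-value → q-value (FDR).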
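Because the reference distribution has an even number of degrees of freedom (2n), its upper-tail probability has a closed form: $P(\chi^2_{2n} > x) = e^{-x/2}\sum_{k=0}^{n-1}\frac{(x/2)^k}{k!}$. The sketch below evaluates it with the standard library only; the function name is illustrative and n ≥ 1 is assumed.

```python
import math

def chi2_even_df_sf(x, n_snps):
    """Upper-tail probability of a chi-square variable with 2 * n_snps degrees of freedom."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, n_snps):
        term *= half / k
        total += term
    return math.exp(-half) * total
```

For one SNP (2 degrees of freedom) this reduces to exp(-x/2), so an LRT of 2·ln(20) ≈ 5.99 gives p = 0.05.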

Results
The data contain all expressed genes (25527 genes).
Step I: [count not preserved] genes (adjusted q-value < [threshold not preserved])
Step II: [count not preserved] genes (adjusted qa-value < 0.01)
Step III: 972 genes (q-value < 0.01)
Step IV: 859 genes

Results
- Among all genes, [count not preserved] genes have significant genotypic variation (q-value < [threshold not preserved]). (Step I)
- Among these genes, 1328 genes show no trans-regulatory effect (qa-value < 0.01). (Step II)
- Among these 1328 genes, 972 genes showed significantly different allelic expression (q-value < 0.01); these 972 genes are identified as cis-regulated. (Step III)
- We confirm the discovery of these 972 cis-regulated genes in Step IV:
  - an allelic expression difference caused by a cis-regulatory variant implies a nearby polymorphism (SNP), in LD, that controls expression;
  - we indeed found that 96.5% of the selected cis-regulated genes have associated polymorphisms (haplotype blocks) nearby.

Conclusions
- The mixed-model approach used here for association mapping, with the kinship matrix included, is more appropriate than other recent methods for identifying cis-regulated genes (the p-values are more reliable).
- At each step, statistical significance is controlled more accurately (in terms of FDR and FNR).
- Using simulation-based p-values when testing differences between random effects increases the power to detect association.
- A comprehensive analysis of gene expression variation in plant populations has been described.
- This mixed-model analysis strategy provides a detailed characterization of both the genetic and the positional effects in the genome.
- This detailed statistical analysis provides a robust and useful framework for future analyses of gene expression variation in large samples.
- Advanced statistical methods look promising for making interesting discoveries in genetics.

Many thanks for your attention!