Genome-wide genetic association of complex traits in outbred mice William Valdar, Leah C. Solberg, Dominique Gauguier, Stephanie Burnett, Paul Klenerman,


Genome-wide genetic association of complex traits in outbred mice William Valdar, Leah C. Solberg, Dominique Gauguier, Stephanie Burnett, Paul Klenerman, William O. Cookson, Martin Taylor, J. Nicholas P. Rawlins, Richard Mott, Jonathan Flint.

Genetic Traits Quantitative (height, weight) Dichotomous (affected/unaffected) Factorial (blood group) Mendelian - controlled by single gene (cystic fibrosis) Complex – controlled by multiple genes*environment (diabetes, asthma)

Quantitative Trait Loci QTL: Quantitative Trait Locus chromosome genes

Quantitative Trait Loci QTL: Quantitative Trait Locus QTG: Quantitative Trait Gene chromosome

Quantitative Trait Loci QTL: Quantitative Trait Locus QTG: Quantitative Trait Gene QTN: Quantitative Trait Nucleotide chromosome

Association Studies: Map in Humans or Animal Models?
Humans: disease studied directly; population and environment stratification; very many SNPs (1,000,000?) required; hard to detect trait loci, since very large sample sizes (5,000-10,000) are required to detect loci of small effect; potentially very high mapping resolution (single gene); very expensive.
Animal models: animal model required; population and environment controlled; fewer SNPs required (~ ,000); easy to detect QTL with ~500 animals; poorer mapping resolution, 1 Mb (10 genes); relatively inexpensive.

Mosaic Crosses [Figure labels: inbred founders, G3, GN, F20; mixing, chopping up, inbreeding; F2, diallel; Heterogeneous Stock, Advanced Intercross, Random Outbreds; Recombinant Inbred Lines]

Sizes of Behavioural QTL in rodents (% of total phenotypic variance)

Effect size of cloned genes

Mapping Resolution
F2 crosses
–Powerful at detecting QTL
–Poor at localisation (20 cM)
–Too few recombinants
Increase number of recombinants:
–more animals
–more generations in cross

Heterogeneous Stocks cross 8 inbred strains for >10 generations

Heterogeneous Stocks cross 8 inbred strains for >10 generations 0.25 cM

Multiple Phenotype QTL Experiment

Multiple Phenotypes measured on a Heterogeneous Stock
2000 HS mice (Northport, Bob Hitzeman)
84 families
40th generation
150 traits measured on each animal
–Standardised phenotyping protocol
–Covariates recorded: experimenter, time/date, litter
–Microchipping

Phenotypes Anxiety (Conditioned and Unconditioned Tests) Asthma (Plethysmography) Diabetes (Glucose Tolerance Test) Haematology Immunology Biochemistry Wound Healing (Ear Punch) Gene Expression ….others….

High throughput phenotyping facility

Neophobia

Fear Potentiated Startle

Ovalbumin sensitization

Plethysmograph

Intraperitoneal Glucose Tolerance Test

Ears

Genotyping SNPs genotyped by Illumina –2000 HS mice –300 HS parents –8 inbred HS founders –500 other inbreds SNPs successful 99.8% accuracy (parent-offspring)

Distribution of Marker Spacing (chromosome X) (9 Markers)

LD Decay with distance: 99.2% of marker pairs on different autosomes have $R^2 < 0.05$.
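The slide does not say how R² was computed. As a minimal sketch, assuming genotypes are coded as 0/1/2 allele counts, linkage disequilibrium between two markers can be summarised as the squared Pearson correlation of their genotype vectors. The function name and the toy vectors are illustrative only.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return sxy * sxy / (sxx * syy)


# Toy genotypes (allele counts) for six animals; not real data.
marker_a = [0, 1, 2, 1, 0, 2]
marker_b = [0, 1, 2, 1, 0, 2]   # identical to marker_a: complete LD
marker_c = [2, 0, 1, 1, 0, 2]   # only weakly related to marker_a

ld_ab = r_squared(marker_a, marker_b)
ld_ac = r_squared(marker_a, marker_c)
```

A pair of markers would count toward the 99.2% figure when a value like `ld_ac` falls below 0.05.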

Genetic Drift in HS
–40 generations of breeding
–Allele frequency in founders will drift
–8% of genome fixed
[Figure axes: allele frequency in founders vs. allele frequency in HS]

Analysis
Automated analysis pipeline
–R HAPPY package
–Single Marker Association
Each phenotype analysed independently
–Transformed to Normality, outliers removed
–Tailored set of covariates
–Linear models for most phenotypes
–Survival models for latency phenotypes

Twisted Pair Analysis of Heterogeneous Stock
–Want to predict ancestral strain from genotype
–We know the alleles in the founder strains
–Single marker association lacks power, can’t distinguish all strains
–Multipoint analysis: combine data from neighbouring markers
[Figure labels: chromosome, markers, alleles]

Twisted Pair Analysis of Heterogeneous Stock
Hidden Markov model HAPPY
–Hidden states = ancestral strains
–Observed states = genotypes
–Unknown phase of genotypes
–Analyse both chromosomes simultaneously
–Twisted pair of HMMs
Mott et al 2000 PNAS
[Figure labels: chromosome, markers, alleles]
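To make the hidden Markov idea concrete, here is a deliberately simplified sketch. It is not the HAPPY implementation. It assumes a single phased haplotype (HAPPY handles unphased diploid genotypes with a pair of chains), biallelic markers, a uniform prior over founders, and a hypothetical genotyping error rate `err`. It runs the standard forward-backward recursion to get the posterior probability of each founder strain at each marker.

```python
def ancestry_posteriors(obs, founder_alleles, recomb, err=0.01):
    """Posterior P(founder strain | all observed alleles) at each marker.

    obs[m]              observed allele (0 or 1) at marker m
    founder_alleles[m]  each founder strain's allele at marker m
    recomb[m]           probability of an ancestry switch between m and m+1
    """
    n_markers = len(obs)
    n_strains = len(founder_alleles[0])

    def emit(m, s):
        return 1 - err if founder_alleles[m][s] == obs[m] else err

    def trans(m, s, t):
        # Stay on the same founder unless a recombination draws a random strain.
        return (1 - recomb[m]) * (s == t) + recomb[m] / n_strains

    fwd = [[emit(0, s) / n_strains for s in range(n_strains)]]
    for m in range(1, n_markers):
        fwd.append([
            emit(m, t) * sum(fwd[m - 1][s] * trans(m - 1, s, t)
                             for s in range(n_strains))
            for t in range(n_strains)
        ])

    bwd = [[1.0] * n_strains for _ in range(n_markers)]
    for m in range(n_markers - 2, -1, -1):
        bwd[m] = [
            sum(trans(m, s, t) * emit(m + 1, t) * bwd[m + 1][t]
                for t in range(n_strains))
            for s in range(n_strains)
        ]

    posteriors = []
    for m in range(n_markers):
        weights = [f * b for f, b in zip(fwd[m], bwd[m])]
        total = sum(weights)
        posteriors.append([w / total for w in weights])
    return posteriors


# Toy example: three founders, three markers; only founder 0 matches everywhere.
founders = [[0, 0, 1], [0, 1, 1], [0, 1, 0]]
post = ancestry_posteriors([0, 0, 0], founders, recomb=[0.05, 0.05])
```

Note how marker 0 alone cannot separate founders 0 and 1, yet the neighbouring markers tip the posterior toward founder 0. That is the gain from multipoint analysis over single-marker association.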

Testing for a QTL
$p_{iL}(s,t)$ = Prob(animal $i$ is descended from strains $s,t$ at locus $L$)
$p_{iL}(s,t)$ is calculated by the HMM using
–genotype data
–founder strains’ alleles
Phenotype is modelled as
$E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$
$\mathrm{Var}(y_i) = \sigma^2$
Test for no QTL at locus $L$
–$H_0$: $T(s,t)$ are all the same
–ANOVA partial F test
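The partial F test compares the residual sum of squares of the model with strain effects against that of the covariates-only model. A minimal sketch of the statistic follows, assuming both models have already been fitted by least squares. The function name, argument names and example numbers are hypothetical and not taken from the study.

```python
def partial_f(rss_null, rss_full, p_null, p_full, n):
    """Partial F statistic for nested linear models.

    rss_null, rss_full  residual sums of squares without / with the locus terms
    p_null, p_full      number of fitted parameters in each model
    n                   number of animals
    """
    numerator = (rss_null - rss_full) / (p_full - p_null)
    denominator = rss_full / (n - p_full)
    return numerator / denominator


# Hypothetical numbers: 2000 animals, 3 covariate parameters,
# 7 extra strain-effect parameters under an additive model with 8 founders.
f_stat = partial_f(rss_null=120.0, rss_full=100.0, p_null=3, p_full=10, n=2000)
```

Under the null hypothesis the statistic is referred to an F distribution with `p_full - p_null` and `n - p_full` degrees of freedom.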

Genome Scan
Additive and dominance models
Record all peaks that exceed 5% genome-wide significance
–Threshold based on 200 permutations
–9000 preliminary candidate QTL found
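A permutation threshold of this kind can be sketched as follows: shuffle the phenotype, rerun the scan, record the genome-wide maximum, and take the upper 5% point of those maxima. This is only an illustration under simple assumptions (no covariates or family structure, which a real analysis would need to respect). `toy_scan` is a made-up stand-in for the actual scan statistic.

```python
import math
import random


def permutation_threshold(phenotype, scan_statistic, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold from the maxima of permuted scans."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                      # break genotype-phenotype link
        maxima.append(max(scan_statistic(shuffled)))
    maxima.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm))
    return maxima[index]


def toy_scan(genotypes):
    """Hypothetical scan: absolute covariance sum of phenotype with each marker."""
    def scan(pheno):
        n = len(pheno)
        mean_y = sum(pheno) / n
        stats = []
        for g in genotypes:
            mean_g = sum(g) / n
            stats.append(abs(sum((a - mean_g) * (y - mean_y)
                                 for a, y in zip(g, pheno))))
        return stats
    return scan


# Toy data: three markers, eight animals.
genotypes = [[0, 1, 2, 1, 0, 2, 1, 0],
             [2, 2, 0, 1, 1, 0, 0, 1],
             [1, 0, 0, 2, 2, 1, 0, 1]]
phenotype = [1.2, 0.4, 3.1, 2.2, 0.9, 2.8, 1.7, 0.3]
threshold = permutation_threshold(phenotype, toy_scan(genotypes))
```

Peaks in the unpermuted scan would then be recorded only if their statistic exceeds `threshold`.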

Results

Many peaks: mean red cell volume

How to select peaks: a simulated example

Simulate 7 x 5% QTLs (ie, 35% genetic effect) + 20% shared environment effect + 45% noise = 100% variance

Simulated example: 1D scan

Peaks from 1D scan phenotype ~ covariates + ?

1D scan: condition on 1 peak phenotype ~ covariates + peak 1 + ?

1D scan: condition on 2 peaks phenotype ~ covariates + peak 1 + peak 2 + ?

1D scan: condition on 3 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + ?

1D scan: condition on 4 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + ?

1D scan: condition on 5 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + ?

1D scan: condition on 6 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + ?

1D scan: condition on 7 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + ?

1D scan: condition on 8 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + ?

1D scan: condition on 9 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + ?

1D scan: condition on 10 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + ?

1D scan: condition on 11 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + peak 11 + ?

Peaks chosen by forward selection
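The sequence of conditioned scans above amounts to forward selection: at each step, add the locus that most improves the fit given the peaks already in the model, and stop when no candidate passes the threshold. The sketch below is a simplified stand-in, assuming one numeric predictor per locus (the study fits strain-probability models instead) and a user-supplied F threshold. Projecting each chosen predictor out of the phenotype and out of the remaining predictors makes every later step conditional on all earlier peaks.

```python
def forward_select(y, markers, f_threshold, max_peaks=20):
    """Forward selection of marker predictors by sequential orthogonalisation."""
    n = len(y)

    def centre(v):
        mean = sum(v) / len(v)
        return [x - mean for x in v]

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    resid = centre(y)
    cols = {name: centre(g) for name, g in markers.items()}
    chosen = []
    while cols and len(chosen) < max_peaks:
        rss = dot(resid, resid)
        best, best_gain = None, 0.0
        for name, g in cols.items():
            gg = dot(g, g)
            if gg > 1e-12:
                gain = dot(g, resid) ** 2 / gg    # drop in RSS if this peak is added
                if gain > best_gain:
                    best, best_gain = name, gain
        if best is None:
            break
        df_resid = n - len(chosen) - 2            # intercept + chosen + candidate
        rest = rss - best_gain
        f_stat = float("inf") if rest <= 1e-12 else best_gain / (rest / df_resid)
        if f_stat < f_threshold:
            break
        g = cols.pop(best)
        gg = dot(g, g)
        slope = dot(g, resid) / gg
        resid = [r - slope * x for r, x in zip(resid, g)]
        # Condition the remaining candidates on the peak just added.
        for name, h in cols.items():
            c = dot(g, h) / gg
            cols[name] = [x - c * z for x, z in zip(h, g)]
        chosen.append(best)
    return chosen


# Toy data: the phenotype is roughly 2 * m1; m2 is unrelated.
toy_markers = {"m1": [0, 1, 2, 0, 1, 2, 0, 1],
               "m2": [1, 0, 1, 0, 2, 1, 0, 2]}
toy_y = [0.1, 2.0, 4.1, -0.1, 1.9, 3.9, 0.0, 2.1]
peaks = forward_select(toy_y, toy_markers, f_threshold=10.0)
```

In this toy case only `m1` survives: once it is in the model, `m2` explains almost none of what remains.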

Bootstrap sampling subjects

Bootstrap sampling subjects: sample with replacement; bootstrap sample from 10 subjects
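Sampling with replacement can be sketched in a few lines. The subject identifiers and the seed below are arbitrary choices for illustration.

```python
import random


def bootstrap_sample(subject_ids, rng):
    """Draw len(subject_ids) subjects with replacement."""
    return [rng.choice(subject_ids) for _ in subject_ids]


subjects = list(range(1, 11))                # ten subjects, labelled 1..10
sample = bootstrap_sample(subjects, random.Random(1))
```

Some subjects typically appear more than once in `sample` and others not at all, which is what makes each bootstrap replicate a slightly different data set.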

Forward selection on a bootstrap sample

Bootstrap evidence mounts up…

In 700 bootstraps… Bootstrap Posterior Probability (BPP)

Model averaging by bootstrap aggregation
Choosing only one model:
–very data-dependent, arbitrary
–can’t get all the true QTLs in one model
Bootstrap aggregation averages over models
–true QTLs get included more often than false ones
References:
–Broman & Speed (2002)
–Hackett et al (2001)
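One way to read the bootstrap posterior probability (BPP) is as the fraction of bootstrap replicates in which a locus is selected. The sketch below assumes that reading. `select_loci` is a hypothetical callback standing in for forward selection on one bootstrap sample, and `toy_selector` and the locus labels are invented for the example.

```python
import random
from collections import Counter


def bootstrap_posterior_probability(subjects, select_loci, n_boot=700, seed=0):
    """Fraction of bootstrap samples in which each locus is selected."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        sample = [rng.choice(subjects) for _ in subjects]   # with replacement
        counts.update(set(select_loci(sample)))             # count each locus once
    return {locus: c / n_boot for locus, c in counts.items()}


def toy_selector(sample):
    """Hypothetical selector: one robust locus, one that hinges on subject 0."""
    loci = ["chr1:10Mb"]
    if 0 in sample:
        loci.append("chr5:40Mb")
    return loci


bpp = bootstrap_posterior_probability(list(range(10)), toy_selector)
```

The robust locus ends up with a BPP of 1, while the locus that depends on a single subject gets an intermediate value. A cutoff on BPP then decides which loci are reported.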

Results

We identified 843 QTLs for 97 phenotypes with BPP greater than 0.25 of which on the basis of simulations we expect 590 to be genuine

Performance of multiple QTL modelling [Table columns: BPP threshold (a); number of QTL (b); proportion of detected QTLs that are true (c); proportion of true QTLs detected (d); expected number of false QTLs per genome scan (e)]

Where to find the results

Distribution of effect sizes

Resolution [Figure: 95% confidence intervals, in megabases]

Number of Genes per Locus

Results summary
–843 peaks found with BPP > 0.25, i.e. about 8.7 peaks per phenotype on average (843 peaks over 97 phenotypes)
–Based on simulation, we expect ~590 to be genuine
–Mean 95% CI width 2.78 Mb
–Mean number of genes under each 95% CI is 28.9

Results
–~7 jointly significant QTL per phenotype
–95% confidence interval ~2 Mb
–~50% of QTL have a significant non-additive component
–Only 3 phenotypes were explained by a single major QTL
–Most phenotypes are complex

Distribution of QTL Effects Mean Effect size 2.7%

%Variance Explained [% Additive Genetic Variance calculated using 3-generation pedigree data, not genotypes]

Coat colour genes

A known QTL: HDL Wang et al, 2003 HS mapping

New QTLs: two examples Ear Punch Hole Area Regrowth –wound healing Cue Conditioning Freeze.During.Tone –measure of fear

Cue Conditioning Freeze.During.Tone: huge effect, small number of genes cntn1: Contactin precursor (Neural cell surface protein) chr15

What do we want?
Biological:
–Joint QTL that contain the functional genes and lead to their identification
–But genetic mapping finds the variants, not the genes
Statistical:
–Multi-locus QTL selection algorithms that predict the phenotype of new animals accurately
–Model averaging: no best choice?
–Ghost QTL
Are statistical QTL algorithms consistent?
–Do they find the biological QTL given a large enough sample size?
–Simulations of multiple QTL models indicate mapping accuracy declines as complexity increases [Valdar et al 2006 Genetics in press]

Work of many hands Carmen Arboleda-Hitas Amarjit Bhomra Stephanie Burnett Peter Burns Richard Copley Stuart Davidson Simon Fiddy Jonathan Flint Polinka Hernandez Sue Miller Richard Mott Chela Nunez Gemma Peachey Sagiv Shifman Leah Solberg Amy Taylor Martin Taylor William Valdar Binnaz Yalcin Dave Bannerman Shoumo Bhattacharya Bill Cookson Rob Deacon Dominique Gauguier Doug Higgs Tertius Hough Paul Klenerman Nick Rawlins Jennifer Taylor Chris Holmes Project funded by The Wellcome Trust, UK

Data are publicly available

Gene x Environment, Gene x Sex
Repeat analysis looking for QTLs that interact with
–Gender
–Litter number
–Season, Month, etc
–Experimenter
Compare models
$E(y) = \mu + \text{locus} + \text{env}$
$E(y) = \mu + \text{locus} * \text{env}$

Gene x Environment
431 jointly significant GxE QTLs
–27 gene x experimenter
–81 gene x litter number
–67 gene x age
–105 gene x study day
–151 gene x season
13% of variation is GxE
25 GxE QTLs overlapped with original joint QTL
–defined as lying within 4 Mb of the peak position
42 GxSex QTLs

Gene Expression Data (with Binnaz Yalcin, Jennifer Taylor) Illumina 40k chip Livers, Lungs (Brains) –190 HS –HS founders

Phenotype-gene expression correlation Liver gene expression in 180 HS mice Slc4a7

Testing for Functional Variants Is a SNP functional for a trait? Is a functional assay measured in founders related to a trait? –Gene expression –DNA-Protein binding

Testing for non-Functional Variants Is a SNP’s pattern of variation inconsistent with the QTL’s pattern of action ? Is a functional assay’s distribution inconsistent with the QTL’s pattern of action ?

Merge Analysis (Yalcin et al 2005 Genetics)
Require sequence of HS founders
–Determine all variants and their strain distribution patterns (SDP)
Don’t genotype every variant in the HS
–Instead predict genotypes in HS at all variants based on a sparse skeleton of genotypes

Merge Analysis
A variant v will partition the HS founder strains into 2 or more groups, depending on its strain distribution pattern (SDP).
If v is functional for the trait, then the strain effects at the QTL must be identical for strains with the same allele.
–So if merging founders according to v’s SDP destroys significance, then we reject v

Merge Analysis Model Comparison
$p_{iL}(s,t)$ = Prob(animal $i$ is descended from strains $s,t$ at locus $L$)
Replace strains $s,t$ by merged pseudo-strains $g,h$
–Add together probabilities for strains with the same allele
–Phenotypic effect of merged strains $g,h$ is $\tau(g,h)$
$v_{iL}(g,h)$ = Prob(animal $i$ is descended from merged strains $g,h$ at locus $L$)
Compare fits of nested models
$E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$ (unmerged)
$E(y_i) = \sum_{g,h} v_{iL}(g,h)\,\tau(g,h) + \mu_i$ (merged)
$E(y_i) = \mu_i$ (null)
Require no significant difference between merged and unmerged models,
–and for both to be significant compared to the null model
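The merging step, adding together probabilities for strains that share an allele, can be sketched as below. This assumes the probabilities for one animal at one locus are held in a dictionary keyed by unordered strain pairs. The strain names, allele codes and probabilities are made up for illustration.

```python
def merge_probabilities(strain_pair_probs, sdp):
    """Collapse strain-pair probabilities into allele-pair probabilities.

    strain_pair_probs  {(s, t): probability of descent from strains s and t}
    sdp                {strain: allele carried at the variant}
    """
    merged = {}
    for (s, t), prob in strain_pair_probs.items():
        key = tuple(sorted((sdp[s], sdp[t])))     # unordered pair of alleles
        merged[key] = merged.get(key, 0.0) + prob
    return merged


# Hypothetical variant: strains A and B carry allele 0, strain C carries allele 1.
sdp = {"A": 0, "B": 0, "C": 1}
p = {("A", "A"): 0.1, ("A", "B"): 0.2, ("B", "B"): 0.1,
     ("A", "C"): 0.3, ("B", "C"): 0.2, ("C", "C"): 0.1}
v = merge_probabilities(p, sdp)
```

Here six strain-pair states collapse to three allele-pair states, so the merged model has fewer effects to estimate than the unmerged one.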

Merge Analysis Open Field Activity, Chr 1

Merge Analysis rgs18

Functional Merge Analysis
Measure functional assay on HS founders
–$F_L(s)$ is the value at locus $L$ on founder $s$
–e.g. gene expression
Expected value in HS, assuming additivity, is
$E(f_i) = \sum_{s,t} p_{iL}(s,t)\,[F(s) + F(t)]$
If the assay is related to phenotype $y$ then
$E(y_i) = \beta\,E(f_i) + \mu_i$
Compare nested models (thanks to Chris Holmes)
$E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$ (unmerged)
$E(y_i) = \beta \sum_{s,t} p_{iL}(s,t)\,[F(s) + F(t)] + \mu_i$ (merged)
$E(y_i) = \mu_i$ (null)
Require no significant difference between merged and unmerged models,
–and for both to be significant compared to the null model
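The expected assay value for one HS animal is a probability-weighted sum of founder values. A minimal sketch under the additivity assumption stated on the slide follows. The founder assay values and descent probabilities are invented for the example.

```python
def expected_assay(strain_pair_probs, founder_value):
    """E(f_i): sum over strain pairs of p(s, t) * [F(s) + F(t)]."""
    return sum(prob * (founder_value[s] + founder_value[t])
               for (s, t), prob in strain_pair_probs.items())


# Hypothetical founder expression levels and descent probabilities.
F = {"A": 1.0, "B": 2.0, "C": 4.0}
p = {("A", "A"): 0.1, ("A", "B"): 0.2, ("B", "B"): 0.1,
     ("A", "C"): 0.3, ("B", "C"): 0.2, ("C", "C"): 0.1}
e_f = expected_assay(p, F)
```

Computing this value for every animal gives the single predictor used in the merged model, in place of one effect per strain pair.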

Using Gene Expression in HS founders [Figure: model difference (logP)]

Future Work

Extensions to basic model
–Generalised linear models
–Multivariate data
–Mixture Models, EM (Chris Holmes)
–Family Effects, Variance Components, REML (Peter Visscher, Allan McRae)
–Gene Annotation Data (Kate Elliot)
–Multiple QTL models
–Epistasis
–Pleiotropy