Complex trait analysis, development, and genomics

Presentation transcript:

Complex trait analysis, development, and genomics
The Complex Trait Consortium and the Collaborative Cross
Rob Williams, Gary Churchill, and members of the Complex Trait Consortium

Elias Zerhouni: The NIH Roadmap. Science 302:63 (2003)
“Solving the puzzle of complex diseases, from obesity to cancer, will require a holistic understanding of the interplay between factors such as genetics, diet, infectious agents, environment, behavior, and social structures.”
Material included in handouts and on the CD; also see www.complextrait.org

What is the CTC?
A group of ~150 mouse geneticists, most of whom have interests in pervasive diseases and differences in disease susceptibility.
General aim: Improve resources for complex trait analysis using mice.
Main catalysts and models: ENU mutagenesis programs; sequencing and SNP consortia
Catalyze genotyping of strains
Simulation studies of crosses
Planning a collaborative cross
Improved use of resources
Project 1. Catalyzed and assisted in SNP and SSLP analysis of 48 to 68 strains.
Project 2. Initiated simulation studies of novel types of crosses by Broman, Churchill, Gaile, Flint and Mott, and Williams.
Project 3. The development of a collaborative cross.
Project 4. Improved resource, data, methods, and idea sharing.

The short chronology of the CTC
Established Nov 2001, Edinburgh (n = 20)
1st CTC Conference, May 2002, Memphis (n = 80; hosted by R Williams)
CTC Collaborative Cross design workshop, Aug 2002, JHU (K Broman and R Reeves, hosts)
CTC Satellite meeting at IMGC, Nov 2003 (n = 40)
2nd CTC Conference, July 2003, Oxford (n = 80; hosted by R Mott and J Flint)
CTC strain selection workshop, Sept 2003 (M Daly, host)
3rd CTC Conference, July 2004, TJL

Lusis et al. 2002: Genetic Basis of Common Human Disease
Are mouse models appropriate? Yes and No.
“If you want to understand where the war on cancer has gone wrong, the mouse is a pretty good place to start.” (Clifton Leaf, Fortune, March 2004)

Mixing mouse genomes (reluctantly) Current practice: Keep it simple: high power with low n

Genetic dissection
Simple model: Vp = Vg + Ve
Full model: Vp = Vg + Ve + 2 Cov(G,E) + Vgxe + Vtech
Here Vp is the phenotypic variance, Vg the genetic variance, Ve the environmental variance, Cov(G,E) the covariance of genotype and environment, Vgxe the gene-by-environment interaction variance, and Vtech the technical (measurement) variance.
Aim 1: Convert genetic variation into a small set of responsible gene loci called QTLs.
Aim 2: Develop mechanistic insights into virtually any genetically modulated process or disease.
Note: Only a subset of genes will be variable in a particular sample population. The fixed, invariant genes are initially genetically invisible. We are tracking down the genes associated with “natural” variation. These genes are critical in the clinic: they are the susceptibility genes. These normal variants may not have big effects. The size of the gene effect is NOT a measure of its importance. In fact, one can argue that the most important genes will not tolerate much sequence variation.
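The variance partition above can be checked numerically. The following is a minimal sketch, not part of the original slides: it simulates hypothetical genetic and environmental values (the genotype classes, the 0.5 coupling, and the sample size are all assumptions made for illustration) and confirms that the phenotypic variance equals Vg + Ve + 2 Cov(G,E). The interaction and technical terms are deliberately left out, so the phenotype here is simply G + E.

```python
import random

def variance(xs):
    """Population variance (divide by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    """Population covariance (divide by n)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def simulate(n=1000, seed=1):
    """Return hypothetical genetic values, environmental values, and phenotypes."""
    rng = random.Random(seed)
    # Assumption: three genotype classes at a single locus.
    g = [rng.choice([-1.0, 0.0, 1.0]) for _ in range(n)]
    # Assumption: environment is deliberately correlated with genotype,
    # so the covariance term is not zero.
    e = [0.5 * gi + rng.gauss(0.0, 1.0) for gi in g]
    # Phenotype is the plain sum; no GxE or technical noise in this sketch.
    p = [gi + ei for gi, ei in zip(g, e)]
    return g, e, p

g, e, p = simulate()
vp = variance(p)
parts = variance(g) + variance(e) + 2 * covariance(g, e)
print(f"Vp = {vp:.4f}; Vg + Ve + 2Cov(G,E) = {parts:.4f}")
```

Dropping the covariance term from `parts` would leave a visible gap between the two numbers, which is why the simple two-term model only holds when genotype and environment are uncorrelated.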

Standard recombinant inbred (RI) strains
Parental strains: female C57BL/6J (B) and male DBA/2J (D), each fully inbred (identical chromosome pairs)
F1: isogenic
F2: heterogeneous
20 generations of brother-sister matings
BXD RI strain set: BXD1 + BXD2 + … + BXD80, each inbred, with isogenic siblings
Recombined chromosomes are needed for mapping.
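Why roughly 20 generations of brother-sister mating are enough can be illustrated with the standard full-sib recursion for expected heterozygosity, H(t) = H(t-1)/2 + H(t-2)/4. This is a sketch under stated assumptions rather than anything taken from the slides: it tracks a single locus at which the two parental strains differ, ignores selection and mutation, and counts the F2 as the first sib-mating generation. The function name is hypothetical.

```python
def sib_mating_heterozygosity(generations):
    """Expected heterozygosity at a locus where the parental strains differ.

    Index 0 is the F1 (every animal is B/D), index 1 is the F2, and each
    later entry is one more generation of brother-sister mating.
    """
    h_prev2 = 0.0  # inbred parental strains: no heterozygotes
    h_prev1 = 1.0  # F1: all heterozygous
    series = [h_prev1]
    for _ in range(generations):
        # Standard recursion for full-sib mating.
        h = h_prev1 / 2 + h_prev2 / 4
        series.append(h)
        h_prev2, h_prev1 = h_prev1, h
    return series

series = sib_mating_heterozygosity(20)
for gen, h in enumerate(series):
    print(f"generation index {gen:2d}: expected heterozygosity {h:.4f}")
```

The residual heterozygosity shrinks by a roughly constant factor each generation, so the lines approach, but never exactly reach, full homozygosity.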

Proposal for a Collaborative Cross www.complextrait.org

Integrative and cumulative analysis/synthesis
[Figure: a 1K Reference Population at the center, surrounded by the data types to be integrated by meta-analysis: physiology, anatomy, pathology, development, pharmacokinetics, endocrine profile, immune response, environment, pathogens, metabolism, epigenetic modifications, proteomics, cancer susceptibility, transcriptome]

Design criteria for a Collaborative Cross
Broad utility: a resource that combines diverse haplotypes and that harbors a broad spectrum of alleles
Freedom from genotyping: lowering the entry barrier into this field
Unrestricted access to strains, tissues, data, and statistical analysis suites (on-line mapping)
Improved power and precision for trait mapping. Epistasis!
Powerful new approaches to analysis of complex systems. Pleiotropy
Analysis of gene-by-environment interactions
A systems biology resource
A new type of complex animal model to study common human diseases

A set of 420 RI lines

Mapping with sequence data in hand

B6 and D2 haplotype contrast map of Chr 1
Notes: Three pairs of homologous chromosomes from three strains of mice (strains 1, 2, 3) are shown with color-coding. There are green and red regions, each of which geneticists would call a haplotype. Genes in stretches of one color are inherited in blocks. There may be as many as 1000 genes in each of these blocks, and we refer to the common derivation of a chunk of a chromosome as linkage disequilibrium. This is just an intimidating way of saying that genes that are close together on a chromosome tend to stay together and do not assort randomly, as would be expected from the law of independent assortment. Each part of the genome of the mouse has a unique strain distribution pattern (SDP). The SDP for the QTL in this figure is GRR on the left and hhR on the right. The difference between left and right is that the right side is the F1 hybrid of the left side.

Celera SNP DB

Coincidence analysis!

Integrative and cumulative analysis/synthesis
[Figure: a 1K Reference Population at the center, surrounded by the data types to be integrated by meta-analysis: physiology, anatomy, pathology, development, pharmacokinetics, endocrine profile, immune response, environment, pathogens, metabolism, epigenetic modifications, proteomics, cancer susceptibility, transcriptome]

www.webqtl.org
QTL/QT gene
Wilt Chamberlain: 7 feet 1 inch. Willie Shoemaker: 4 feet 11 inches. A 1.44-fold difference.
Phenotypes: from highly complex, such as body size, to highly specific, such as a transcript expression difference.
Notes: This slide juxtaposes some extraordinarily complex differences between two humans with some very simple differences in allele types at a single gene locus. Neither Wilt Chamberlain nor Willie Shoemaker would appreciate being called a mutant. They are at the extremes of the normal range of variation, and they illustrate the extraordinary scope of morphological variation. Biochemical and molecular variation is also significant. Mice have just as much variation at virtually all levels, but do not have such stylish attire. Clinicians live with and appreciate the tremendous variation among their patients and their families (more so than bench scientists). They are confronted daily by the complex mixture of environmental and genetic factors (and the gene-by-environment interactions) that produce differences in disease susceptibility, progression, and prognosis. Complex trait analysis is a suite of methods that now, for the first time, makes it possible to extract single genes and loci that contribute to the variation. Those contributions can be complex and contingent, and are usually individually quite small. The number of genes that contribute to the size difference between Wilt and Willie: 10 to 100 genes, at 10% to 1% each. I’ll illustrate how we use the simple genotypes on the right to help discover these genes.

Grin2b Cis QTL Trans QTL

Ret mRNA correlations in a small data set

Ret and Sh3d5

Ret GO analysis

The App neighborhood
Hand-drawn sketch of the App neighborhood
Notes: Many of the data types in the previous slide are hot-linked, and it is easy to generate a small web of correlations between any transcript of interest and many other transcripts. In this case, we have used green lines between transcripts that have positive correlations, and red lines between transcripts that have negative correlations. Correlations have been multiplied by 100: the correlation of 0.96 between App and Hsp84-1 reads 96. These are Pearson product-moment correlations, and they are sensitive to outliers. If you prefer, you can recompute Spearman rank-order correlations.
Where did Ndr4 (lower left) come from? It is not in the list in the previous slide. Actually it is. Nomenclature changes rapidly. If you click on R74996 in the previous slide (the active WebQTL version) you will see that it now has a new symbol and name.
What are all of the conventions in this correlation network sketch?
1. The official gene symbol, for example App.
2. Our estimate of the location of the gene in the Mouse Genome Sequencing Consortium version 3 build (MGSCv3): chromosome followed by the megabase position relative to the centromere. (Mouse chromosomes have only one arm, so this is an unambiguous coordinate.)
3. The pair of numbers: the top number is the highest expression among the strain set; the lower number is the lowest expression of that transcript among the strain set.
4. Vertical number on the right side of each box: this is the probe set ID given by Affymetrix. We have truncated these probe set IDs, so you will not see the usual “at”. A single gene may be represented by more than 10 probe sets, so this ID number is essential to identify the actual data source.
5. Lower right corner: a two-digit number followed by plus and minus signs. The number is the correlation value (absolute value) of the 100th best correlated transcript. The plus and minus signs indicate the mean polarity of the correlations.
6. The set of numbers that read 2@140* etc. These are the approximate locations of additive-effect QTLs detected by WebQTL that we will describe in other slides. Read this as: App has a suggestive QTL on Chr 2 at about 140 Mb, and the D allele inherited from DBA/2J confers a higher expression level at this marker. If there is no star symbol, then the QTL is not even formally “suggestive” but does make an interesting-looking blip on the QTL radar screen.
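The remark that Pearson correlations are sensitive to outliers, and that Spearman rank-order correlations can be recomputed instead, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two expression vectors are invented, not taken from WebQTL, and the functions are plain re-implementations rather than the WebQTL code. Seven of the eight hypothetical strains show a weak negative trend, and a single extreme strain dominates the Pearson value.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(xs):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(xs, ys):
    """Spearman rank-order correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical expression values for two transcripts across eight strains.
# The last strain is an extreme outlier in both transcripts.
transcript_a = [8.1, 8.3, 8.0, 8.4, 8.2, 8.5, 8.6, 12.0]
transcript_b = [9.4, 9.1, 9.5, 9.0, 9.3, 9.2, 8.9, 13.0]

r = pearson(transcript_a, transcript_b)
rho = spearman(transcript_a, transcript_b)
# Report on the slide's convention of correlations multiplied by 100.
print(f"Pearson x100: {round(100 * r)}, Spearman x100: {round(100 * rho)}")
```

With these made-up values the two statistics disagree even in sign, which is the kind of case where checking the rank-based correlation before drawing a green or red line in the network would matter.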

Associational Networks QTL networks add layer of shared causality

Integrative and cumulative analysis/synthesis
[Figure: a 1K Reference Population at the center, surrounded by the data types to be integrated by meta-analysis: physiology, anatomy, pathology, development, pharmacokinetics, endocrine profile, immune response, environment, pathogens, metabolism, epigenetic modifications, proteomics, cancer susceptibility, transcriptome]

Cost Components: 24–28 M over 7–8 yrs
Per diem for 8,000 to 10,000 cages (~1500 K/year)
Genotyping intermediate generations (~500 K/year)
Prospective tissue harvesting and cryopreservation (~500 K/year)
Molecular phenotyping of select tissue as proof-of-principle (500 K/year)
Bioinformatics, statistical modeling, administration, colony management (~500 K/year)
Cryopreservation of final lines at F25+ (~200 K)
Sequencing of parental strains (unfunded)
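The headline range can be roughly reproduced from the line items. This is a back-of-the-envelope sketch that makes two assumptions not stated on the slide: the approximate (~) figures are treated as exact, and the final-line cryopreservation is treated as a one-time cost. The unfunded sequencing item is excluded.

```python
# Annual costs in thousands of dollars (K/year), as listed on the slide.
annual_k = {
    "per diem, 8,000 to 10,000 cages": 1500,
    "genotyping intermediate generations": 500,
    "tissue harvesting and cryopreservation": 500,
    "molecular phenotyping (proof-of-principle)": 500,
    "bioinformatics, modeling, administration, colony management": 500,
}
# Assumption: cryopreservation of final lines at F25+ is paid once.
one_time_k = 200

def total_millions(years):
    """Total cost in millions of dollars over the given number of years."""
    return (sum(annual_k.values()) * years + one_time_k) / 1000

low, high = total_millions(7), total_millions(8)
print(f"7 years: {low:.1f} M; 8 years: {high:.1f} M")
```

Under these assumptions the totals land close to the 24–28 M range quoted in the slide title.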

NIH Portfolio

Collaborators Ken Manly (UTHSC) David Threadgill (UNC Chapel Hill) Bob Hitzemann (OHSU) Gary Churchill (TJL) Fernando Pardo Manuel de Villena (UNC) Karl Broman (JHU) Dan Gaile (SUNY Buffalo) Kent Hunter (NCI) Jay Snoddy (ORNL) Jim Cheverud (Wash U) Tim Wiltshire (GNF) Lu Lu Elissa Chesler David Airey Siming Shou Jing Gu Yanhua Qu Supported by: NIAAA-INIA Program, NIMH, NIDA, and the National Science Foundation (P20-MH 62009), NEI, a Human Brain Project and the William and Dorothy Dunavant Endowment.

Chromosome models
B, D, F1, F2
Each marker samples a piece of chromosome (~50 Mb). The genotype/haplotype can be estimated for $50 per case.
Example: BDHBH at D1Mit004
Microsatellite alleles: F1-CACACACA-R1 and F1-CACA-R1
Genotypes as a vector: BDHBHBDHBBBHHHDBHHHHBHHH
Notes: Genotyping is simple now and relies on PCR reactions and microsatellites. Trinucleotide repeat expansion diseases arise from the difficulties that polymerase has with slippery, boring repeats. Microsatellites are much more polymorphic than the SNPs that everyone is talking about these days.
Next slide, RI strains: What would happen if we cloned every F2 mouse? Then we would have terrific advantages. We could use the same mice to map different traits. We could study the development of a particular genotype of mouse. We could see how particular genotypes respond to particular treatments. We could explore the interaction between genes and environment with a fixed genetic resource. That is what RI strains are good for.

Mouse Strain Lineages China Japan Castle’s Mice Swiss Wild C57-related Other Strains Wild Derived

Fernando’s Interval on Chr 11
Genes shown: Cct6b, Ap2b1
Strains: CAST/Ei, CASA/Rk, CALB/Rk, JF1/Ms, MOLC/Rk, CZECHI/Ei, PWK/Ph, SKIVE/Ei, PERA/Ei, PERC/Rk, ZALENDE/Ei, TIRANO/Ei, LEWES/Ei, RBA/Dn, DDK/Pas, BALB/cJ, C57BL/6J, DBA/2J, A/J, 129X1/SvJ
[Figure: number of SNPs per 10 kb (y-axis, 10 to 50) plotted along the interval (x-axis, 100 to 600)]
Fernando Pardo Manuel de Villena (March 2004)

Visualizing QTLs
Cerebellar Size QTLs
Significant collective effects
Modest individual effects
Clues to CNS development and even to abnormalities such as autism
Notes: There is interest here because of the curious association of autism with a somewhat larger than normal cerebellum. This is a case where the QTLs that we are discovering are being passed to colleagues studying the genetics of autism in humans to see if we can track down some good candidate genes. It is a long shot, but… The Chr 8 QTL may be a gene characterized by Jim Morgan at St. Jude Children’s Research Hospital called Cerebellin 1 that has remarkably high expression in the cerebellum.
Airey et al, J Neurosci 2001

Information flow
Genetics: QTL, genes
Genomics: cis-regulation, mRNA expression, cell type specificity, transcripts
Proteomics: expression, structure, interaction, localization, pathways
Physiomics: structure, activity, function; cell-cell interactions, tissue dynamics
Systems biology and biological process: physiology, pharmacology, pathology, clinical outcome

Variation at different levels of organization
Whole body-brain; functional regions; behaviors; cell populations; single cell properties; protein expression; transcript level
Notes: We really want to get down to Levels 3 and 4, but we should work down from higher levels. Variation in cell number may actually be due to a global variation: whole body size differences or whole brain differences. We need the basic data before we can explore the more refined phenotypes. On the right side of this slide I illustrate one line of research supported by the NIMH in which we are mapping and characterizing the genes (or gene loci) that modulate the size of different cell populations in the brain. Here, in red, we are interested in the population of dentate granule cells in the hippocampus. This is the only cortical population of neurons that continues to be generated, at a low level, even in adult humans. Virtually all other neuron populations have no renewal capacity. The question we are asking is what genes make this cell population and brain region so unusual. Can we discover the genes that modulate the proliferation of these neurons using a model such as the mouse?