TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.


1 Comparative Genomics: Function
[Title slide background sequence: TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT CCCTGTTTCCAGGTTTGTTGTCCCAAAATAGTGACCATTTCATATGTATA]

2 Overview
I. Comparing genome sequences
  - Concepts and terminology
  - Methods
    - Whole-genome alignments
    - Quantifying evolutionary conservation (PhastCons, PhyloP)
    - Identifying conserved elements
  - Available datasets at UCSC
II. Comparative analyses of function
  - Evolutionary dynamics of gene regulation
  - Case studies
  - Insights into regulatory variation within and across species

3 Distribution of evolutionary constraint in the human genome
Lindblad-Toh et al., Nature 478:476 (2011)
  - 4.2% of the genome is putatively constrained
  - ~1 million putative regulatory elements

4 Goals of comparative genomics
  - Infer the course of past evolution using statistical models of sequence evolution
  - Identify sequence elements evolving more slowly or more rapidly than the neutral rate
  - Evaluate the precise degree of constraint on specific positions
  - Predict the functional effects of nucleotide or amino acid mutations in constrained sequences

5 Functional variation in the genome
Cooper and Shendure, Nat Rev Genet 12:628 (2011)
Changes to:
  - Methylation patterns
  - Transcription factor binding
  - Histone modification states
  - Gene expression levels

6 Vertebrate genomes available for comparative studies
[Figure: genomes grouped as primates, mammals, tetrapods, vertebrates]

7 Commonly used (and misused) terms
Mutation vs. substitution
  - Mutations occur in individuals and segregate in populations
  - Substitutions are mutations that have become fixed
  - Mutations = within species; substitutions = between species
Conservation vs. constraint
  - Conservation = an observation of sequence similarity
  - Constraint = a hypothesis about the effect of purifying selection
Homology, orthology and paralogy
  - Homologous sequences = derived from a common ancestor
  - Orthologous sequences = homologous sequences separated by a speciation event (e.g., human HOXA and mouse Hoxa)
  - Paralogous sequences = homologous sequences separated by gene duplication (e.g., human HOXA and human HOXB)

8 Basic premises in comparative sequence analysis
  - Most mutations that affect function are eliminated by purifying selection
    - Constrained elements have lower substitution rates than expected from the neutral rate
    - The size of the reduction depends on the effect of the mutation and the degree of constraint on the function
    - Manifests as sequence conservation, even among distant species
  - Beneficial mutations may be driven to fixation by positive selection
    - May be detected as a “faster-than-neutral” substitution rate
    - Expected to be rare
  - Most sequence differences among genomes are neutral
    - Involve substitutions with minimal or no functional impact
    - Fixed by random genetic drift
    - The neutral substitution (fixation) rate equals the mutation rate
    - Genomes become more dissimilar with greater phylogenetic distance
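The claim that neutral substitutions accumulate at the mutation rate follows from a standard population-genetics argument. Assuming a diploid population of constant size N and a neutral mutation rate μ per site per generation, the neutral substitution rate k works out as:

```latex
\begin{aligned}
\text{new neutral mutations per generation} &= 2N\mu \\
\Pr(\text{a given new neutral mutation eventually fixes}) &= \frac{1}{2N} \\
k = 2N\mu \cdot \frac{1}{2N} &= \mu
\end{aligned}
```

Because N cancels, neutral divergence between lineages accumulates in proportion to time regardless of population size, which is what makes the neutral rate usable as a baseline for detecting constraint.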

9 Phylogenies
Phylogenetic trees show two things:
  - Evolutionary relationships among species or sequences: branching order
  - Evolutionary distance (e.g., degree of similarity or divergence): branch length
[Figure labels: internal node, terminal node, branch]

10 Phylogenies (continued)
[Figure panels: species tree; gene tree]

11 Orthologs and paralogs in gene trees
Capra et al. 2013
[Figure labels: HMGCS1, HMGCS2]

12 Orthologs and paralogs in gene trees
Capra et al. 2013
[Figure labels: orthologs, paralogs, duplication]

13 Orthologs and paralogs in gene trees
Capra et al. 2013
[Figure labels: 1:1 orthologs; 1:2; human HMGCS1; human HMGCS2]

14 Ortholog assignments at Ensembl

15

16

17 Steps in sequence comparisons
  - Sequence alignment
    - Global vs. local
    - Whole-genome vs. genome segments (e.g., genes)
    - Identify sites that are homologous (not necessarily identical)
  - Measure similarity and divergence of sequences
    - Sequence similarity: level of conservation
    - Rates of change among sequences: divergence
  - Infer degree of evolutionary constraint
    - Are the sequences more conserved than expected from neutral evolution?
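Measuring similarity and divergence starts from the aligned columns. The sketch below computes percent identity and the uncorrected p-distance (proportion of differing sites) for two rows of an existing alignment; the function name is hypothetical, and gapped columns are simply skipped, which is one common convention among several.

```python
def compare_aligned(seq_a: str, seq_b: str) -> tuple[float, float]:
    """Return (percent identity, p-distance) over gap-free aligned columns.

    seq_a and seq_b are assumed to be two rows of the same alignment,
    with '-' marking gaps. Columns with a gap in either row are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gaps are indels, not substitutions
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    p = mismatches / compared  # observed proportion of differing sites
    return 100.0 * (1.0 - p), p


identity, p = compare_aligned("ACGT-ACGTA", "ACGTTACCTA")
print(f"identity = {identity:.1f}%, p-distance = {p:.3f}")  # identity = 88.9%, p-distance = 0.111
```

The p-distance underestimates true divergence once sites have changed more than once, which is why the substitution models on the next slides are needed.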

18 Rates of sequence change are estimated using models of the substitution process
[Figure: substitution model and its transition probabilities]
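The specific model shown on this slide is not preserved in the transcript. As an illustration, the sketch below uses the simplest substitution model, Jukes-Cantor (JC69), which assumes equal base frequencies and a single rate for all changes. It gives the transition probabilities along a branch of length d and the inverse correction from an observed difference proportion to a distance; the function names are hypothetical.

```python
import math


def jc69_transition_probs(d: float) -> tuple[float, float]:
    """Jukes-Cantor (JC69) transition probabilities along a branch.

    d is the branch length in expected substitutions per site.
    Returns (P(base unchanged), P(change to one specific other base)).
    """
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_specific_change = 0.25 - 0.25 * decay
    return p_same, p_specific_change


def jc69_distance(p: float) -> float:
    """Convert an observed proportion of differing sites p into a JC69 distance.

    The correction accounts for multiple substitutions at the same site,
    which make the raw proportion underestimate the true divergence.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p_same, p_change = jc69_transition_probs(0.1)
observed_p = 3 * p_change  # chance the two ends of the branch differ
print(round(jc69_distance(observed_p), 6))  # recovers d = 0.1
```

Richer models (e.g., with unequal base frequencies or transition/transversion bias) replace these closed forms with the matrix exponential of a rate matrix, but the idea of mapping branch length to substitution probabilities is the same.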

19 Substitution rates are calculated for each lineage in a sequence phylogeny
[Figure: phylogeny]
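For three sequences, per-lineage branch lengths can be recovered directly from the three pairwise distances, assuming the distances are additive on an unrooted three-taxon tree. This is only a minimal illustration of "a rate per lineage"; tools such as phyloFit (part of PHAST) instead fit branch lengths by maximum likelihood on larger trees. The function name and the distances in the example are hypothetical.

```python
def three_taxon_branch_lengths(d_ab: float, d_ac: float, d_bc: float) -> dict[str, float]:
    """Split pairwise distances among three sequences into per-lineage branch lengths.

    Assumes an unrooted three-taxon tree and additive distances, so that
    d_ab = a + b, d_ac = a + c and d_bc = b + c.
    """
    a = (d_ab + d_ac - d_bc) / 2.0
    b = (d_ab + d_bc - d_ac) / 2.0
    c = (d_ac + d_bc - d_ab) / 2.0
    return {"A": a, "B": b, "C": c}


# Hypothetical corrected distances between three aligned sequences
lengths = three_taxon_branch_lengths(d_ab=0.03, d_ac=0.31, d_bc=0.32)
print({k: round(v, 3) for k, v in lengths.items()})  # {'A': 0.01, 'B': 0.02, 'C': 0.3}
```

Unequal branch lengths in the output are exactly the lineage-specific differences in substitution rate that the slide refers to.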

20 Conserved noncoding sequences are identified by local reductions in substitution rate
[Figure: local vs. neutral substitution rate along aligned positions]
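The idea of a local reduction in substitution rate can be sketched with a sliding window over a pairwise alignment: compute the mismatch rate in each window, divide by a neutral rate estimated elsewhere (for example from ancestral repeats or fourfold-degenerate sites), and report windows whose ratio falls below a cutoff. The window size, cutoff and function name are hypothetical choices; real methods work on multiple alignments with phylogenetic models.

```python
def flag_slow_windows(seq_a: str, seq_b: str, neutral_rate: float,
                      window: int = 50, max_ratio: float = 0.5) -> list[tuple[int, float]]:
    """Return (window start, local/neutral rate ratio) for slowly evolving windows.

    neutral_rate: substitutions per site expected without constraint.
    max_ratio: hypothetical cutoff; windows with a lower ratio are reported.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        # Keep only gap-free columns inside the window
        cols = [(x, y) for x, y in zip(seq_a[start:start + window],
                                       seq_b[start:start + window])
                if x != "-" and y != "-"]
        if not cols:
            continue
        local_rate = sum(x != y for x, y in cols) / len(cols)
        ratio = local_rate / neutral_rate
        if ratio < max_ratio:
            hits.append((start, ratio))
    return hits


# Toy alignment: identical for 20 columns, then every column differs
seq_a = "ACGT" * 10
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
seq_b = seq_a[:20] + "".join(swap[base] for base in seq_a[20:])
slow = flag_slow_windows(seq_a, seq_b, neutral_rate=0.5, window=10)
print([start for start, _ in slow])
```

Overlapping flagged windows would normally be merged into a single candidate element.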

21 Tools for quantifying evolutionary conservation across genomes
Alignment: Multiz
  - Generates a multiple-species alignment relative to a base (reference) genome
  - Constructed from pairwise alignments of individual genomes to the reference
  - 46-way and 100-way alignments to hg19; 30-way to mm9; 60-way to mm10
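UCSC distributes its Multiz alignments in MAF (Multiple Alignment Format), where each block starts with an "a" line and lists one "s" line per species: source, start, size, strand, source size and aligned text. The sketch below reads only "a" and "s" lines into simple records and skips the other line types (such as "i", "e" and "q"); the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MafRow:
    src: str       # e.g. "hg19.chr1" (assembly.chromosome)
    start: int     # 0-based start of the aligned region in the source
    size: int      # number of non-gap bases in the aligned text
    strand: str    # "+" or "-"
    src_size: int  # full length of the source sequence
    text: str      # aligned bases, with '-' for gaps


def parse_maf(lines):
    """Yield alignment blocks (lists of MafRow) from MAF-formatted lines."""
    block: list[MafRow] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#"):
            continue  # header and comment lines
        if not line or line.startswith("a"):
            if block:
                yield block  # a blank line or a new 'a' line closes the block
            block = []
            continue
        if line.startswith("s"):
            _, src, start, size, strand, src_size, text = line.split()
            block.append(MafRow(src, int(start), int(size), strand, int(src_size), text))
    if block:
        yield block


maf = """##maf version=1
a score=23262.0
s hg19.chr1 100 10 + 249250621 ACGTACGTAC
s mm10.chr4 200 9 - 156508116 ACGT-CGTAC

a score=5.0
s hg19.chr1 300 4 + 249250621 ACGT
"""
blocks = list(parse_maf(maf.splitlines()))
print(len(blocks), [row.src for row in blocks[0]])
```

The coordinates and scores in the toy input are invented; only the column layout follows the MAF format.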

22 100-way Multiz alignment in hg19
Green = level of sequence similarity at each site

23 Conservation of synteny: “net” alignments
  - Conservation of genome segments
  - Order and orientation of genes and regulatory sequences

24 Conservation of synteny: “net” alignments
  - Synteny is frequently conserved on megabase scales

25 Tools for quantifying evolutionary conservation across genomes
PhastCons
  - Estimates the probability that a nucleotide belongs to a conserved element
  - Sensitive to ‘runs’ of conserved sites, so effective for identifying conserved blocks
  - For hg19, elements are calculated at three phylogenetic scopes (vertebrate, placental mammal, primate)
PhyloP
  - Measures conservation independently at individual positions
  - Provides per-base conservation scores (-log10 p-values under a hypothesis of neutrality)
  - Positive scores suggest constraint; negative scores suggest accelerated evolution
Alignment: Multiz
  - Generates a multiple-species alignment relative to a base genome
  - Constructed from pairwise alignments of individual genomes to the reference
  - 46-way and 100-way alignments to hg19; 30-way to mm9; 60-way to mm10
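PhastCons is a phylogenetic hidden Markov model with a conserved and a nonconserved state, which is why it responds to runs of conserved sites rather than to isolated columns. The sketch below keeps only the HMM part: it takes each column's likelihood under the two states as given (in phastCons these come from phylogenetic models fitted to the alignment) and computes the posterior probability of the conserved state with the forward-backward algorithm. The transition probabilities, prior and example likelihoods are hypothetical values, not phastCons defaults.

```python
def posterior_conserved(lik_cons, lik_neut, p_enter=0.01, p_leave=0.05, prior_cons=0.1):
    """Posterior probability that each alignment column is in the conserved state.

    lik_cons[i], lik_neut[i]: likelihood of column i under each state.
    p_enter, p_leave: hypothetical per-site probabilities of switching state.
    """
    n = len(lik_cons)
    if n == 0 or n != len(lik_neut):
        raise ValueError("need two equal-length, non-empty likelihood lists")
    trans = [[1 - p_enter, p_enter],  # from state 0 (neutral)
             [p_leave, 1 - p_leave]]  # from state 1 (conserved)
    emit = [lik_neut, lik_cons]       # emit[state][column]
    start = [1 - prior_cons, prior_cons]

    # Forward pass, rescaled at every column to avoid underflow
    fwd = []
    for i in range(n):
        if i == 0:
            cur = [start[s] * emit[s][0] for s in (0, 1)]
        else:
            cur = [sum(fwd[-1][r] * trans[r][s] for r in (0, 1)) * emit[s][i]
                   for s in (0, 1)]
        total = sum(cur)
        fwd.append([x / total for x in cur])

    # Backward pass, also rescaled per column (scaling cancels in the posterior)
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        nxt = [sum(trans[s][t] * emit[t][i + 1] * bwd[i + 1][t] for t in (0, 1))
               for s in (0, 1)]
        total = sum(nxt)
        bwd[i] = [x / total for x in nxt]

    # Combine passes: P(state | all columns) at each column
    post = []
    for f, b in zip(fwd, bwd):
        joint = [f[0] * b[0], f[1] * b[1]]
        post.append(joint[1] / sum(joint))
    return post


# Hypothetical per-column likelihoods: columns 3-6 fit the conserved model better
lik_neut = [0.01] * 10
lik_cons = [0.0001] * 3 + [0.2] * 4 + [0.0001] * 3
post = posterior_conserved(lik_cons, lik_neut)
print([round(x, 2) for x in post])
```

Raising p_leave shortens the runs the model prefers; phastCons exposes comparable controls as model parameters rather than fixed constants.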

26 Identifying conserved elements: PhastCons
[Figure tracks: PhastCons scores; PhastCons elements]
  - lod score: log probability under the conserved model minus log probability under the neutral model
  - Score: lod score normalized to a 0-1000 scale
  - Use scores to rank elements by estimated constraint
[Example element: lod = 882, score = 694]
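The lod definition above can be illustrated directly. The sketch treats columns as independent, so each element's log-likelihood under a model is the sum of per-column log-likelihoods, and then ranks elements by lod. Natural logs are used here; the per-column likelihoods are hypothetical, and UCSC's 0-1000 normalization is not reproduced.

```python
import math


def element_lod(lik_cons, lik_neut):
    """lod = log P(columns | conserved model) - log P(columns | neutral model).

    Columns are treated as independent, so each log-likelihood is a sum of
    per-column log-likelihoods (natural log in this sketch).
    """
    return sum(math.log(c) - math.log(n) for c, n in zip(lik_cons, lik_neut))


def rank_elements(elements):
    """Sort (name, lik_cons, lik_neut) records by lod score, highest first."""
    scored = [(name, element_lod(cons, neut)) for name, cons, neut in elements]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical per-column likelihoods for two candidate elements
ranked = rank_elements([
    ("element_short", [0.05], [0.01]),
    ("element_long", [0.2, 0.2, 0.2], [0.01, 0.01, 0.01]),
])
print(ranked[0][0])  # element_long
```

Because lod sums over columns, longer elements with consistent evidence rank higher, matching the slide's use of scores to order elements by estimated constraint.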

27 PhastCons elements estimated at 3 phylogenetic scopes
[Figure tracks: primate; placental; vertebrate]

28 Level of conservation decays with increasing evolutionary distance

29 PhyloP: measuring basewise conservation
[Figure track: PhyloP scores]
  - Scores are calculated independently for each base
  - Scores are -log10 p-values under the hypothesis of neutral evolution
  - Positive scores = constraint
  - Negative scores = acceleration
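Since a phyloP score is a signed -log10 p-value, converting it back is a one-line calculation: the magnitude gives the p-value and the sign gives the direction (conservation or acceleration). The function name and the "no signal" label for a score of exactly 0 are hypothetical.

```python
def phylop_to_pvalue(score: float) -> tuple[float, str]:
    """Convert a phyloP score (signed -log10 p-value) to a p-value and a direction.

    Positive scores indicate conservation (fewer substitutions than neutral);
    negative scores indicate acceleration (more substitutions than neutral).
    """
    p = 10.0 ** (-abs(score))
    if score > 0:
        direction = "conserved"
    elif score < 0:
        direction = "accelerated"
    else:
        direction = "no signal"
    return p, direction


for score in (4.49, 1.77, -0.96):
    p, direction = phylop_to_pvalue(score)
    print(f"{score:+.2f}: p = {p:.2g} ({direction})")
```

For the per-site values on the next slide, 4.49 corresponds to p ≈ 3.2 × 10^-5 (strong constraint), 1.77 to p ≈ 0.017, and -0.96 to p ≈ 0.11 in the acceleration direction, i.e. weak evidence either way.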

30 Per-site phyloP conservation scores
[Figure: example per-site scores 4.49, 1.77, -0.96]
  - Use PhastCons to identify conserved elements
  - Use phyloP to evaluate individual sites within elements
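The two-step strategy on this slide (elements from PhastCons, sites from phyloP) can be sketched as a filter: walk the positions inside each element and keep those whose phyloP score passes a cutoff. The 0-based, half-open intervals, the dictionary of scores and the cutoff of 2.0 (p = 0.01) are illustrative assumptions.

```python
def constrained_sites_in_elements(elements, phylop, min_score=2.0):
    """List (position, score) for sites inside conserved elements with high phyloP.

    elements: (start, end) half-open, 0-based intervals, e.g. phastCons elements.
    phylop: mapping from position to per-base phyloP score.
    min_score: hypothetical cutoff; 2.0 corresponds to p = 0.01.
    """
    hits = []
    for start, end in elements:
        for pos in range(start, end):
            score = phylop.get(pos)
            if score is not None and score >= min_score:
                hits.append((pos, score))
    return hits


# Toy data reusing the slide's example scores at hypothetical positions
elements = [(10, 13)]
phylop = {10: 4.49, 11: 1.77, 12: -0.96, 20: 5.0}
print(constrained_sites_in_elements(elements, phylop))  # [(10, 4.49)]
```

Position 20 is ignored even though it scores highly, because it lies outside every element; lowering min_score to 1.5 would also admit position 11.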

31 Accessing conservation data
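Among the formats UCSC offers for base-level phyloP and phastCons scores are fixedStep wiggle ("wigFix") files, in which a header such as `fixedStep chrom=chr1 start=100 step=1` is followed by one value per line. The sketch below streams such lines into (chromosome, position, score) tuples; positions stay 1-based as in the wiggle format, an optional `span` is ignored, and the function name is hypothetical.

```python
def parse_wigfix(lines):
    """Yield (chrom, position, score) from fixedStep wiggle lines (1-based positions)."""
    chrom, pos, step = None, None, 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track")):
            continue
        if line.startswith("fixedStep"):
            # Header fields look like key=value, e.g. chrom=chr1 start=100 step=1
            fields = dict(item.split("=", 1) for item in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields.get("step", "1"))
            continue
        if chrom is None:
            raise ValueError("data line before any fixedStep header")
        yield chrom, pos, float(line)
        pos += step


wig = ["fixedStep chrom=chr1 start=100 step=1", "0.5", "-1.2",
       "fixedStep chrom=chr2 start=5 step=2", "3.0"]
print(list(parse_wigfix(wig)))
```

Because it is a generator, it can consume a compressed download line by line, e.g. `parse_wigfix(gzip.open(path, "rt"))` with a locally downloaded file; the path is up to the user.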

32 Multiple genome alignments and conservation metrics are calculated independently for each reference genome
[Figure: orthologous region in mouse, 30-way Multiz alignment]

33 Conservation identifies critical binding sites in regulatory elements
[Figure tracks: regulatory information (ENCODE); conservation]
Important binding sites and variants that affect function are expected in the conserved portions of regulatory elements
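A basic step behind this slide is intersecting regulatory annotations (for example ENCODE peaks or DNase sites) with conserved elements. The sketch below performs a naive overlap test on BED-style, 0-based half-open intervals; the function names and coordinates are hypothetical, and for genome-scale files a dedicated tool such as bedtools intersect would be the practical choice.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def regions_with_conservation(regions, elements):
    """Return regions (chrom, start, end) that overlap at least one conserved element."""
    by_chrom = {}
    for chrom, start, end in elements:
        by_chrom.setdefault(chrom, []).append((start, end))
    result = []
    for chrom, start, end in regions:
        # Linear scan per chromosome: simple, fine for small inputs
        if any(overlaps(start, end, s, e) for s, e in by_chrom.get(chrom, [])):
            result.append((chrom, start, end))
    return result


regions = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
elements = [("chr1", 150, 160), ("chr1", 400, 410)]
print(regions_with_conservation(regions, elements))  # [('chr1', 100, 200)]
```

The region chr1:300-400 is not reported because half-open intervals ending at 400 and starting at 400 share no base.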

34 Genetic drivers of gene regulatory variation
Furey and Sethupathy, Science 2013

35 Comparative analysis of ChIP-seq datasets
[Figure: H3K4me2 and H3K27ac tracks in human and mouse]
  - Compare TF binding, histone modifications and DNase hypersensitivity in equivalent tissues
  - Requires a statistical framework to reliably quantify changes in ChIP-seq signals
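The first ingredient of any such framework is normalizing for sequencing depth before comparing signals. The sketch below computes a library-size-normalized log2 fold change per region, assuming read counts have already been collected over orthologous regions in the two species (mapping regions between genomes, e.g. with UCSC liftOver, is outside the sketch). It is not a statistical test: without replicates and a count model, as provided by packages such as DESeq2 or edgeR, it cannot separate real divergence from experimental noise. The function name and counts are hypothetical.

```python
import math


def log2_fold_changes(counts_a, counts_b, pseudocount=1.0):
    """Library-size-normalized log2 fold change (B vs. A) for each region.

    counts_a, counts_b: read counts per orthologous region in two samples.
    Counts are scaled to counts per million; the pseudocount avoids log(0).
    """
    total_a = sum(counts_a)
    total_b = sum(counts_b)
    lfc = []
    for a, b in zip(counts_a, counts_b):
        cpm_a = a / total_a * 1e6
        cpm_b = b / total_b * 1e6
        lfc.append(math.log2((cpm_b + pseudocount) / (cpm_a + pseudocount)))
    return lfc


# Hypothetical H3K27ac read counts over two orthologous regions
lfc = log2_fold_changes([100, 100], [300, 100])
print([round(x, 3) for x in lfc])
```

Sample B has twice the sequencing depth, so the second region, with equal raw counts, comes out at roughly -1 (half the normalized signal).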

36 Issues in comparative functional genomics
  - Input data are noisy: ChIP-seq and RNA-seq data are signal-based and subject to considerable experimental variation
  - Comparable biological states must be used within and across species (e.g., human liver vs. mouse liver); how much of the observed difference reflects variation across tissues?
  - How do epigenetic states and gene expression diverge among individuals and across species (neutrally? under constraint?)
  - Can we identify variants or substitutions that drive regulatory changes?

37 Science 342:747 (2013)
  - 10 human lymphoblastoid cell lines
  - 1 population group (Nigerian)
  - All analyzed by HapMap and 1000 Genomes
  - Targets: RNA polymerase II; H3K4me1, H3K4me3, H3K27ac, H3K27me3; DNase hypersensitivity

38 Measuring allelic imbalance in histone modification profiles
[Figure: ChIP-seq reads assigned to the G allele and the T allele, showing allelic imbalance]
  - Need to map reads reliably to individual alleles
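Once reads at a heterozygous site have been assigned to the two alleles, a common way to test for imbalance is an exact binomial test against a 50:50 expectation. The sketch below assumes allele assignment is already done (and free of reference-mapping bias, which the slide notes is the hard part); the function name and read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(allele1_reads: int, allele2_reads: int) -> float:
    """Two-sided exact binomial test of allele read counts against a 50:50 expectation."""
    n = allele1_reads + allele2_reads
    if n == 0:
        raise ValueError("no reads overlap the heterozygous site")
    k = min(allele1_reads, allele2_reads)
    # P(X <= k) under Binomial(n, 0.5); the distribution is symmetric,
    # so the two-sided p-value is twice the smaller tail (capped at 1)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical ChIP-seq read counts: 9 reads on the G allele, 1 on the T allele
print(allelic_imbalance_pvalue(9, 1))
```

With 9 vs. 1 reads the p-value is 22/1024 ≈ 0.021, while a balanced 5 vs. 5 gives 1.0; genome-wide use would also require correcting for the many sites tested.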

39 Cis-quantitative trait loci ~1200 identified

