Anatomy of a Genome Project A.Sequencing 1. De novo vs. ‘resequencing’ 2.Sanger WGS versus ‘next generation’ sequencing 3.High versus low sequence coverage.


1 Anatomy of a Genome Project
A. Sequencing
  1. De novo vs. ‘resequencing’
  2. Sanger WGS versus ‘next generation’ sequencing
  3. High versus low sequence coverage
B. Assembly
  1. Draft assembly
  2. Gap closure
C. Annotation
  1. Gene, intron, RNA prediction
  2. De novo vs. homology-based prediction
  3. Assessing confidence
D. Comparison
  1. Comparing gene content, lineage-specific gene loss, gain, emergence
  2. Comparing genome structure (chromosomes, breakpoints, etc.)
  3. Comparing evolutionary rates of change (rates of amino-acid, nucleotide substitution)

2 Anatomy of a Genome Project: non-model challenges
A. Sequencing
  1. De novo vs. ‘resequencing’
     … resequencing not possible without a close, syntenic relative
  2. Sanger WGS versus ‘next generation’ sequencing
  3. High versus low sequence coverage
     … need high coverage and long reads (or mate-pair reads) to assemble
B. Assembly
  1. Draft assembly
  2. Gap closure
     … time consuming no matter what
C. Annotation
  1. Gene, intron, RNA prediction
  2. De novo vs. homology-based prediction
  3. Assessing confidence
  De novo predictions are challenging if gene models are different in your species
  … can rely less on homology for identifications and assessing confidence
D. Comparison
  1. Comparing gene content, lineage-specific gene loss, gain, emergence
  2. Comparing genome structure (chromosomes, breakpoints, etc.)
  3. Comparing evolutionary rates of change (rates of amino-acid, nucleotide substitution)
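As a rough illustration of what "high versus low coverage" means in numbers, the sketch below uses the standard Lander-Waterman idealization: expected coverage is (number of reads × read length) / genome size, and, if reads land uniformly at random, the fraction of bases left unsequenced is approximately e^(-coverage). The function names and the example read counts are illustrative assumptions, not values from the slides, and real genomes (repeats, GC bias) will do worse than this estimate.

```python
import math


def expected_coverage(num_reads, read_length, genome_size):
    """Average number of reads covering each base (Lander-Waterman C = N*L/G)."""
    return num_reads * read_length / genome_size


def fraction_uncovered(coverage):
    """Approximate fraction of bases with zero reads, assuming uniform random reads."""
    return math.exp(-coverage)


# Hypothetical run: 1 million reads of 100 bp on a 10 Mb genome.
c = expected_coverage(1_000_000, 100, 10_000_000)
print(f"coverage = {c:.1f}x, uncovered fraction ~ {fraction_uncovered(c):.2e}")
```

Even at high coverage, this idealized calculation says nothing about whether reads are long enough to span repeats, which is why the slide pairs coverage with read length and mate pairs.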

3 The power of comparison
For many non-model organisms, most of the predicted genes will be uncharacterized & may not have homology to known genes.
But comparison within and between species can still reveal interesting features:
1. Comparing gene content, lineage-specific gene loss, gain, emergence
2. Comparing genome structure (chromosomes, breakpoints, etc.)
3. Comparing evolutionary rates of change (rates of amino-acid, nucleotide substitution)
4. Comparing population data (SNPs, expression response, phenotypic variation … mapping studies)
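To make "rates of nucleotide substitution" concrete, here is a minimal sketch of one common way to estimate divergence between two aligned sequences: the raw proportion of differing sites (p-distance), followed by the Jukes-Cantor correction for multiple hits, d = -(3/4) ln(1 - 4p/3). The slides do not say which substitution model the course uses, so Jukes-Cantor is an assumption chosen for simplicity, and the function names are made up for this example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance (substitutions per site) from a nucleotide p-distance."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy alignment: one difference in eight sites.
p = p_distance("ACGTACGT", "ACGTACGA")
print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as the raw p-distance, because some sites will have changed more than once.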

4 Science April 25, 2014
Tsetse fly: blood-feeding insect that gives birth to live larvae & ‘lactates’
- 366 Mb genome = double the size of Drosophila melanogaster
- Identified orthologs across 5 insects
  … comparison of ortholog presence/absence suggests unique evolutionary trajectories
- Blood feeding evolved independently 12 times in Diptera
  … identified shared proteins unique to several blood-suckers
- Some gene families have been expanded, others contracted in numbers
  … functional annotations (“GO” = gene ontology predictions) suggest selection
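The ortholog presence/absence comparison described above can be sketched with plain set operations: find ortholog groups present in every species of a focal set (for example, blood feeders) and absent from every species in a comparison set. The species labels and group IDs below are invented placeholders, not data from the paper, and the function name is hypothetical.

```python
def lineage_specific_groups(presence, focal, background):
    """Ortholog groups found in all `focal` species and in none of the `background` species.

    `presence` maps a species name to the set of ortholog-group IDs detected in it.
    """
    if not focal:
        raise ValueError("need at least one focal species")
    shared = set.intersection(*(presence[s] for s in focal))
    seen_elsewhere = set().union(*(presence[s] for s in background))
    return shared - seen_elsewhere


# Made-up example: two focal species, one background species.
presence = {
    "focal_species_1": {"OG1", "OG2", "OG3"},
    "focal_species_2": {"OG1", "OG2", "OG4"},
    "background_species": {"OG1", "OG4"},
}
print(lineage_specific_groups(
    presence, ["focal_species_1", "focal_species_2"], ["background_species"]
))
```

In practice, an apparent absence can reflect a gap in the assembly or annotation rather than true gene loss, so calls like this are only as good as the underlying genomes.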

5
- Sequenced 4 bat genomes & compared orthologs across 22 mammals
- Used phylogenetic analysis and protein trees to identify cases of lineage-specific evolution

6 To detect convergent evolution, look for proteins with unusual sequence relationships
Found ~2,300 genes with signatures of convergent evolution.
* Enriched for genes linked to hearing, ear development, and … vision

7 The power of comparison
For many non-model organisms, most of the predicted genes will be uncharacterized & may not have homology to known genes.
But comparison within and between species can still reveal interesting features:
1. Comparing gene content, lineage-specific gene loss, gain, emergence
2. Comparing genome structure (chromosomes, breakpoints, etc.)
3. Comparing evolutionary rates of change (rates of amino-acid, nucleotide substitution)
4. Comparing population data (SNPs, expression response, phenotypic variation … mapping studies)


9 Evolutionary Genetics Recap

10 Evolutionary Genetics: Recurring Themes
* Duplication facilitates change
- Duplications can be tandem, segmental, or whole genome
- Most duplications are lost quickly through neutral (or selective) processes
- Facilitates subfunctionalization and neofunctionalization
- Baker et al. 2013 paper: paralog interference could drive evolution
- Benefits of duplication operate at all levels
  - Gene duplication for novel functions
  - Gene duplication for novel regulation
  - Gene duplication for novel network rewiring
  - Regulatory element duplication for novel gene regulation
  - Regulatory protein duplication for novel module regulation
  - Regulatory system duplication for novel network rewiring

11 Evolutionary Genetics: Recurring Themes
* Biological systems are more plastic than we might think
- Much of the genome is under constraint from evolution
  → purifying selection removes variation
- Many features of cellular systems appear to evolve, even if the cellular function or output is conserved
  → stabilizing selection can explain poor conservation of important features, if the cell finds a ‘quick fix’ to maintain the phenotype
Examples: pervasive evidence of positive selection in fly and rodent coding genes … transcription factor binding-site turnover … phospho-site turnover … genetic/protein rewiring??
  → strongest constraints may promote wholesale rewiring as stabilizing evolution (e.g. rewiring of ribosomal protein regulon)
De novo genes also appear to emerge frequently from the genomic ether

12 Evolutionary Genetics: Recurring Themes
* Evolutionary pressures vary over time and space
- Neutral variation can suddenly become advantageous
  … therefore accumulation of neutral variation can be a future conduit
- Deleterious polymorphisms can be stabilized in the presence of other polymorphisms
  … splitting up alleles by recombination can unmask deleterious alleles

13 Evolutionary Genetics: Recurring Themes
* Use a model for null/neutral expectation for your tests
- Likelihood ratio: comparing how likely one model is versus another
    QTL analysis
    motif model vs. background model
    selection model vs. neutral model
    etc., etc., etc.
- Random sampling or simulations to assess what you expect by chance
- More complicated simulations (e.g. coalescence)
This is especially true for whole-genome scans … many things look striking until you do the statistics
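Two of the ideas on this slide, the likelihood ratio and random sampling to get a chance expectation, can be sketched in a few lines. The first function compares a fixed null proportion against the best-fitting proportion for binomial counts and converts the statistic to a p-value using the chi-square distribution with one degree of freedom. The second shuffles group labels to ask how often a difference in means as large as the observed one arises by chance. Both are generic textbook versions under simple assumptions (independent observations, nested models), not the specific tests used in the course; all names are illustrative.

```python
import math
import random


def _log_lik(k, n, p):
    """Binomial log-likelihood up to a constant, treating 0 * log(0) as 0."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1 - p)
    return ll


def likelihood_ratio_test(k, n, p0):
    """Test H0: success probability = p0 against a freely estimated probability.

    Returns (statistic, p_value); the p-value uses the chi-square(1 df) tail,
    which equals erfc(sqrt(statistic / 2)).
    """
    if n <= 0 or not 0 <= k <= n or not 0 < p0 < 1:
        raise ValueError("need n > 0, 0 <= k <= n and 0 < p0 < 1")
    stat = 2 * (_log_lik(k, n, k / n) - _log_lik(k, n, p0))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    return stat, math.erfc(math.sqrt(stat / 2))


def permutation_p_value(group_a, group_b, n_perm=999, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)

    def mean_diff(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if mean_diff(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


print(likelihood_ratio_test(90, 100, 0.5))
print(permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4]))
```

For genome-wide scans, either p-value would still need a multiple-testing correction, since thousands of such tests are run at once.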

14 Evolutionary Genetics: Recurring Themes
* Value of a phylogenetic perspective
- Use the tree if you have one
  * may not be the same tree across the entire genome
- Inferring the state of the common ancestor can aid in analysis
Can be very useful for inferring evolutionary trajectory, timing, order of events
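One simple way to infer the state of a common ancestor is Fitch parsimony, sketched below for a single discrete character (for example, gene present vs. absent) on a rooted binary tree. This is only the bottom-up pass: it returns the set of equally parsimonious root states and the minimum number of changes. The slides do not name a reconstruction method, so parsimony is an assumption here (likelihood-based methods are a common alternative), and the tree encoding as nested tuples is a convenience for the example.

```python
def fitch(tree, leaf_states):
    """Bottom-up Fitch pass on a rooted binary tree.

    `tree` is either a leaf name (str) or a pair (left_subtree, right_subtree).
    `leaf_states` maps each leaf name to its observed state.
    Returns (candidate_states_at_this_node, minimum_number_of_changes_below_it).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, leaf_states)
    right_states, right_changes = fitch(right, leaf_states)
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_changes + right_changes
    # Children disagree: keep both options and count one change.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical four-species tree ((a, b), (c, d)) with one gene's presence/absence.
tree = (("a", "b"), ("c", "d"))
states = {"a": "present", "b": "present", "c": "absent", "d": "present"}
print(fitch(tree, states))
```

Here the most parsimonious reading is that the ancestor had the gene and it was lost once on the lineage leading to species c.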

15 Evolutionary Genetics: Recurring Themes
* Control for co-variates
  Example: controlling for expression levels re. rate of protein evolution
  Often hard to know what to even look/control for
* Best evidence if >1 test is significant
* Know your dataset
  Know how the data were collected, what types of noise are associated
  e.g. genome sequences by short-read deep sequencing
       protein-protein interaction data
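A minimal way to see what "controlling for a co-variate" does is the first-order partial correlation: the correlation between x and y after removing the linear effect of a third variable z, computed as (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). In the slide's example, z would play the role of expression level. The toy numbers below are constructed so that x and y are related only through z; they are not real data, and a linear adjustment is only one of several ways to control for a confounder.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after linearly controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# Toy data: x and y each equal z plus independent noise (z is the hypothetical confounder).
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 2, 5, 6, 5, 6, 9]
y = [2, 3, 2, 3, 4, 5, 8, 9]
print(pearson(x, y), partial_correlation(x, y, z))
```

The raw correlation between x and y is strong, but it vanishes once z is accounted for, which is the pattern to watch for when a shared driver such as expression level sits behind two measurements.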

16 Evolutionary Genetics: Remaining Questions & Challenges
What is the relative contribution of adaptive vs. neutral evolution?
Epistasis & environmental interactions
- how much does epistasis contribute in nature?
- challenges associated with gene-gene/gene-environment signals
Detecting signatures of selection, esp. recent/transient
- human evolution
- how will tests, statistics, caveats change with 10,000 genomes?
What is the relative contribution of regulatory vs. coding evolution?
What features contribute to the evolution of new forms and functions?

