Anatomy of a Genome Project


Anatomy of a Genome Project
A. Sequencing
   1. De novo vs. ‘resequencing’
   2. Sanger WGS versus ‘next generation’ sequencing
   3. High versus low sequence coverage
B. Assembly
   1. Draft assembly
   2. Gap closure
C. Annotation
   1. Gene, intron, RNA prediction
   2. De novo vs. homology-based prediction
   3. Assessing confidence
D. Comparison
   1. Comparing gene content, lineage-specific gene loss, gain, emergence
   2. Comparing genome structure (chromosomes, breakpoints, etc.)
   3. Comparing evolutionary rates of change (rates of amino-acid, nucleotide substitution)

Anatomy of a Genome Project: non-Model challenges
A. Sequencing
   1. De novo vs. ‘resequencing’
      … resequencing not possible without a close, syntenic relative
   2. Sanger WGS versus ‘next generation’ sequencing
   3. High versus low sequence coverage
      … need high coverage and long reads (or mate-pair reads to assemble)
B. Assembly
   1. Draft assembly
   2. Gap closure
      … time consuming no matter what
C. Annotation
   1. Gene, intron, RNA prediction
   2. De novo vs. homology-based prediction
   3. Assessing confidence
   De novo predictions are challenging if gene models are different in your species
   … can rely less on homology for identifications and assessing confidence
D. Comparison
   1. Comparing gene content, lineage-specific gene loss, gain, emergence
   2. Comparing genome structure (chromosomes, breakpoints, etc.)
   3. Comparing evolutionary rates of change (rates of amino-acid, nucleotide substitution)
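To make "high versus low sequence coverage" concrete, here is a minimal sketch using the standard Lander-Waterman idealization (reads placed uniformly at random, which real libraries only approximate). The genome size, read length, and read counts are hypothetical numbers chosen for illustration, not values from any project discussed in these slides.

```python
import math


def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth c = N * L / G (number of reads x read length / genome size)."""
    return n_reads * read_length / genome_size


def fraction_uncovered(coverage: float) -> float:
    """Under uniform random read placement, the expected fraction of bases
    hit by zero reads is approximately exp(-c)."""
    return math.exp(-coverage)


# Hypothetical project: 100 Mb genome, 100 bp reads (illustrative values only).
genome_size = 100_000_000
read_length = 100

for n_reads in (2_000_000, 30_000_000):
    c = expected_coverage(n_reads, read_length, genome_size)
    missing = fraction_uncovered(c)
    print(f"{n_reads:>11,d} reads -> {c:4.1f}x coverage, "
          f"~{missing:.2e} of bases expected to be uncovered")
```

At low coverage a noticeable share of the genome is never sampled, which is one reason a de novo assembly of a non-model genome needs far more depth than resequencing against a close reference. The model ignores repeats and sequencing bias, so real gaps are typically worse than it predicts.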

The power of comparison
For many non-model organisms, most of the predicted genes will be uncharacterized & may not have homology to known genes.
But comparison within and between species can still reveal interesting features:
1. Comparing gene content, lineage-specific gene loss, gain, emergence
2. Comparing genome structure (chromosomes, breakpoints, etc.)
3. Comparing evolutionary rates of change (rates of amino-acid, nucleotide substitution)
4. Comparing population data (SNPs, expression response, phenotypic variation … mapping studies)

Science April 25, 2014
Tsetse fly: blood-feeding insect that gives birth to live larvae & ‘lactates’
Mb genome = double the size of Drosophila melanogaster
- Identified orthologs across 5 insects … comparison of ortholog presence/absence suggests unique evolutionary trajectories
- Blood feeding evolved independently 12 times in Diptera … identified shared proteins unique to several blood-suckers
- Some gene families have been expanded, others contracted in numbers … functional annotations (“GO” = gene ontology predictions) suggest selection
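The presence/absence comparison described above reduces to set operations once orthologs have been grouped. The sketch below assumes that grouping has already been done by some upstream tool; the species names and ortholog-group IDs are placeholders, not data from the tsetse study.

```python
# Hypothetical ortholog-group (OG) membership per species; IDs are made up.
presence = {
    "species_A": {"OG1", "OG2", "OG3", "OG5", "OG6"},
    "species_B": {"OG1", "OG2", "OG4"},
    "species_C": {"OG1", "OG3", "OG4", "OG5"},
}


def lineage_specific(presence: dict, focal: str) -> set:
    """Groups found in the focal species and in no other species
    (candidate lineage-specific gains or newly emerged genes)."""
    others = set().union(*(g for s, g in presence.items() if s != focal))
    return presence[focal] - others


def missing_only_in(presence: dict, focal: str) -> set:
    """Groups present in every other species but absent from the focal one
    (candidate lineage-specific losses, or annotation gaps)."""
    others = [g for s, g in presence.items() if s != focal]
    return set.intersection(*others) - presence[focal]


def shared_unique_to(presence: dict, group: set) -> set:
    """Groups present in all species of `group` and in none outside it,
    e.g. proteins shared only among blood-feeding lineages."""
    in_all = set.intersection(*(presence[s] for s in group))
    outside = set().union(*(presence[s] for s in presence if s not in group))
    return in_all - outside


print(sorted(lineage_specific(presence, "species_A")))
print(sorted(missing_only_in(presence, "species_A")))
print(sorted(shared_unique_to(presence, {"species_A", "species_C"})))
```

An apparent absence can reflect an incomplete assembly or a missed gene model rather than true loss, so candidates from a table like this are a starting point for checking, not a result.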

- Sequenced 4 bat genomes & compared orthologs across 22 mammals
- Used phylogenetic analysis and protein trees to identify cases of lineage-specific evolution

To detect convergent evolution, look for proteins with unusual sequence relationships
Found ~2,300 genes with signatures of convergent evolution.
* enriched for genes linked to hearing, ear development, and … vision



Evolutionary Genetics Recap

Evolutionary Genetics: Recurring Themes
* Duplication facilitates change
- Duplications can be tandem, segmental, or whole genome
- Most duplications are lost quickly through neutral (or selective) processes
- Facilitates subfunctionalization and neofunctionalization
- Baker et al paper: paralog interference could drive evolution
- Benefits of duplication operate at all levels
  - Gene duplication for novel functions
  - Gene duplication for novel regulation
  - Gene duplication for novel network rewiring
  - Regulatory element duplication for novel gene regulation
  - Regulatory protein duplication for novel module regulation
  - Regulatory system duplication for novel network rewiring

Evolutionary Genetics: Recurring Themes
* Biological systems are more plastic than we might think
- Much of the genome is under constraint from evolution … purifying selection removes variation
- Many features of cellular systems appear to evolve, even if the cellular function or output is conserved … stabilizing selection can explain poor conservation of important features, if the cell finds a ‘quick fix’ to maintain the phenotype
Examples:
  pervasive evidence of positive selection in fly and rodent coding genes
  … transcription factor binding-site turnover
  … phospho-site turnover
  … genetic/protein rewiring??
  … strongest constraints may promote wholesale rewiring as stabilizing evolution (e.g. rewiring of ribosomal protein regulon)
De novo genes also appear to emerge frequently from the genomic ether

Evolutionary Genetics: Recurring Themes
* Evolutionary pressures vary over time and space
Neutral variation can suddenly become advantageous
… therefore accumulated neutral variation can be a conduit for future adaptation
Deleterious polymorphisms can be stabilized in the presence of other polymorphisms
… splitting up alleles by recombination can unmask deleterious alleles

Evolutionary Genetics: Recurring Themes
* Use a model for null/neutral expectation for your tests
- Likelihood ratio: comparing how likely one model is versus another
  QTL analysis
  motif model vs background model
  selection model vs neutral model
  etc.
- Random sampling or simulations to assess what you expect by chance
- More complicated simulations (e.g. coalescence)
This is especially true for whole-genome scans … many things look striking until you do the statistics
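As a rough illustration of the two ideas above (a likelihood ratio between a motif model and a background model, and random sampling to see what arises by chance), here is a small sketch. The motif probabilities, background frequencies, and test sequence are all invented for the example, and shuffling the sequence is only one of several reasonable null models.

```python
import math
import random

# Hypothetical background base frequencies (assumed, not estimated from data).
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}


def column(consensus: str) -> dict:
    """One motif position: 0.85 for the consensus base, 0.05 for each other."""
    return {b: (0.85 if b == consensus else 0.05) for b in "ACGT"}


# Hypothetical 4-bp motif with consensus GATC.
MOTIF = [column(b) for b in "GATC"]


def log_likelihood_ratio(site: str) -> float:
    """log[ P(site | motif model) / P(site | background model) ]."""
    return sum(math.log(MOTIF[i][b] / BACKGROUND[b]) for i, b in enumerate(site))


def best_score(seq: str) -> float:
    """Highest log-likelihood ratio over all windows of motif width."""
    w = len(MOTIF)
    return max(log_likelihood_ratio(seq[i:i + w]) for i in range(len(seq) - w + 1))


def empirical_p(seq: str, n_shuffles: int = 1000, seed: int = 0) -> float:
    """Fraction of composition-preserving shuffles scoring at least as well
    as the real sequence (with +1 so the estimate is never exactly zero)."""
    rng = random.Random(seed)
    observed = best_score(seq)
    letters = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if best_score("".join(letters)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


seq = "TTTTGATCTTTT"  # made-up sequence containing the consensus once
print("best log-likelihood ratio:", round(best_score(seq), 3))
print("empirical p-value:", empirical_p(seq))
```

In a genome-wide scan the same test is repeated at a very large number of positions, so the per-site p-value also needs a multiple-testing correction before a hit is called striking.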

Evolutionary Genetics: Recurring Themes
* Value of a phylogenetic perspective
- use the tree if you have one
  * may not be the same tree across the entire genome
- inferring the state of the common ancestor can aid in analysis
Can be very useful for inferring evolutionary trajectory, timing, order of events

Evolutionary Genetics: Recurring Themes
* Control for co-variates
Example: controlling for expression levels when analyzing the rate of protein evolution
Often hard to know what to even look/control for
* Best evidence if >1 test is significant
* Know your dataset
Know how the data were collected and what types of noise are associated with them
e.g. genome sequences by short-read deep sequencing
     protein-protein interaction data
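One simple way to control for a co-variate such as expression level is a partial correlation, sketched below. The three lists are toy numbers constructed so that the two gene properties are correlated only through expression; they are not measurements, and the variable names are assumptions made for the example.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x: list, y: list, z: list) -> float:
    """Correlation of x and y after removing the linear effect of z:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data for four hypothetical genes (illustrative values only).
expression = [1, 3, 5, 7]       # the co-variate
rate = [3, 3, 5, 9]             # e.g. a measure of protein evolutionary rate
other_property = [2, 8, 4, 10]  # any other gene-level property of interest

print("raw correlation:    ", round(pearson(rate, other_property), 3))
print("partial correlation:", round(partial_corr(rate, other_property, expression), 3))
```

Here the raw correlation looks substantial, yet it vanishes once expression is accounted for. A partial correlation only removes linear effects of the co-variates you thought to include, which is exactly the difficulty noted above.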

Evolutionary Genetics: Remaining Questions & Challenges
What is the relative contribution of adaptive vs. neutral evolution?
Epistasis & environmental interactions
- how much does epistasis contribute in nature?
- challenges associated with gene-gene/gene-environment signals
Detecting signatures of selection, esp. recent/transient
- human evolution
- how will tests, statistics, caveats change with 10,000 genomes?
What is the relative contribution of regulatory vs. coding evolution?
What features contribute to the evolution of new forms and functions?