Interpreting the human genome Manolis Kellis CSAIL MIT Computer Science and Artificial Intelligence Lab Broad Institute of MIT and Harvard for Genomics.

Presentation transcript:

Interpreting the human genome Manolis Kellis CSAIL MIT Computer Science and Artificial Intelligence Lab Broad Institute of MIT and Harvard for Genomics in Medicine

The age of comparative genomics
32 mammals, 17 yeasts, 12 flies
Mammals: human, mouse, rat, chimp, dog, opossum, armadillo, rabbit, cow, hyrax, elephant, bat, dolphin, lemur, bushbaby, pika, hedgehog, tenrec, pangolin, tree shrew, llama, etc.

Resolving power in mammals, flies, fungi
Mammals: Neutral: 2.57 subs/site (opp: sps: 4.87) Coding: 1.16 subs/site Detect: 6-mer at FP
12 flies: Neutral: 4.13 subs/site Coding: 1.65 subs/site Detect: 6-mer at
17 yeasts: Neutral: 15.5 subs/site (Yeast: 6.5, Candida: 6.5) Coding: 7.91 subs/site Detect: 3-mer at
[Figure: phylogenetic trees for each clade; fungal tree labeled 8 Candida, 9 Yeasts, Post-duplication, Pre-dup, Diploid, Haploid]

Comparative Genomics 101: Conservation → Function
Conserved elements are typically functional (and vice versa)
–For example: exons are deeply conserved to mouse, chicken, fish
Some conserved elements are still uncharacterized
–How do we make sense of them?
–How do we distinguish each type of functional element?
Answer: evolutionary signatures (Comp. Genomics 201)
–Tell me how you evolve, I’ll tell you who you are
–Patterns of change → selective pressures → specific function

Gene identification
Study known genes → Derive conservation rules → Discover new genes
Evolutionary signatures
–“Tell me how you evolve, I’ll tell you who you are”
–Each type of functional element evolves in its own specific ways

Distinguishing genes from non-coding regions
Dmel TGTTCATAAATAAA-----TTTACAACAGTTAGCTG-GTTAGCCAGGCGGAGTGTCTGCGCCCATTACCGTGCGGACGAGCATGT---GGCTCCAGCATCTTC
Dsec TGTCCATAAATAAA-----TTTACAACAGTTAGCTG-GTTAGCCAGGCGGAGTGTCTGCGCCCATTACCGTGCGGACGAGCATGT---GGCTCCAGCATCTTC
Dsim TGTCCATAAATAAA-----TTTACAACAGTTAGCTG-GTTAGCCAGGCGGAGTGTCTGCGCCCATTACCGTGCGGACGAGCATGT---GGCTCCAGCATCTTC
Dyak TGTCCATAAATAAA-----TTTACAACAGTTAGCTG-GTTAGCCAGGCGGAGTGCCTTCTACCATTACCGTGCGGACGAGCATGT---GGCTCCAGCATCTTC
Dere TGTCCATAAATAAA-----TTTACAACAGTTAGCTG-CTTAGCCATGCGGAGTGCCTCCTGCCATTGCCGTGCGGGCGAGCATGT---GGCTCCAGCATCTTT
Dana TGTCCATAAATAAA-----TCTACAACATTTAGCTG-GTTAGCCAGGCGGAGTGTCTGCGACCGTTCATG------CGGCCGTGA---GGCTCCATCATCTTA
Dpse TGTCCATAAATGAA-----TTTACAACATTTAGCTG-CTTAGCCAGGCGGAATGGCGCCGTCCGTTCCCGTGCATACGCCCGTGG---GGCTCCATCATTTTC
Dper TGTCCATAAATGAA-----TTTACAACATTTAGCTG-CTTAGCCAGGCGGAATGCCGCCGTCCGTTCCCGTGCATACGCCCGTGG---GGCTCCATTATTTTC
Dwil TGTTCATAAATGAA-----TTTACAACACTTAACTGAGTTAGCCAAGCCGAGTGCCGCCGGCCATTAGTATGCAAACGACCATGG---GGTTCCATTATCTTC
Dmoj TGATTATAAACGTAATGCTTTTATAACAATTAGCTG-GTTAGCCAAGCCGAGTGGCGCC------TGCCGTGCGTACGCCCCTGTCCCGGCTCCATCAGCTTT
Dvir TGTTTATAAAATTAATTCTTTTAAAACAATTAGCTG-GTTAGCCAGGCGGAATGGCGCC------GTCCGTGCGTGCGGCTCTGGCCCGGCTCCATCAGCTTC
Dgri TGTCTATAAAAATAATTCTTTTATGACACTTAACTG-ATTAGCCAGGCAGAGTGTCGCC------TGCCATGGGCACGACCCTGGCCGGGTTCCATCAGCTTT
[Figure label: Splice]
Protein-coding genes have specific evolutionary constraints
–Gaps are multiples of three (preserve amino acid translation)
–Mutations are largely 3-periodic (silent codon substitutions)
–Specific triplets exchanged more frequently (conservative substs.)
–Conservation boundaries are sharp (pinpoint individual splicing signals)
Encode as ‘evolutionary signatures’
–Computational test for each of them
–Combine and score systematically
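The first constraint on this slide (gaps are multiples of three) lends itself to a very small computational test. The following is a minimal sketch, not the talk's actual implementation: the function names `gap_run_lengths` and `fraction_frame_preserving` are hypothetical, and the input is assumed to be a single row of a multiple alignment with `-` marking gaps.

```python
import re


def gap_run_lengths(aligned_seq):
    """Return the length of every run of '-' in one alignment row."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned_seq)]


def fraction_frame_preserving(aligned_seq):
    """Fraction of gap runs whose length is a multiple of three.

    Returns None when the row contains no gaps at all, since the
    test is then uninformative.
    """
    runs = gap_run_lengths(aligned_seq)
    if not runs:
        return None
    return sum(1 for n in runs if n % 3 == 0) / len(runs)


if __name__ == "__main__":
    # Toy row (illustrative, not taken from the slide's alignment).
    row = "ATG---AAA-TT"
    print(gap_run_lengths(row), fraction_frame_preserving(row))
```

A value near 1 across species would be consistent with coding sequence; a real test would also need to handle gaps at alignment edges and compare against an intergenic background.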

Signature 1: Reading frame conservation (RFC)
              Genes    Intergenic   Separation
Mutations     30%      58%          2-fold
Gaps          1.3%     14%          10-fold
Frameshifts   0.14%    10.2%        75-fold
[Figure: RFC scores per aligned species]
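One way to turn the frameshift signal above into a number is to ask what fraction of aligned nucleotides stay in the same relative reading frame. The sketch below is a simplified pairwise version under stated assumptions: the function name `rfc_score` is hypothetical, the two inputs are equal-length rows of an alignment with `-` for gaps, and no windowing or multi-species voting is attempted.

```python
from collections import Counter


def rfc_score(ref, other):
    """Simplified reading frame conservation between two aligned rows.

    For every column where both rows carry a nucleotide, record the
    frame offset (difference of ungapped positions, modulo 3). The
    score is the share of such columns that agree with the most
    common offset.
    """
    if len(ref) != len(other):
        raise ValueError("aligned rows must have equal length")
    i = j = 0  # ungapped positions in ref and other
    offsets = Counter()
    for a, b in zip(ref, other):
        if a != "-" and b != "-":
            offsets[(i - j) % 3] += 1
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    total = sum(offsets.values())
    if total == 0:
        return 0.0
    return max(offsets.values()) / total


if __name__ == "__main__":
    # A 3-nt gap keeps the frame; a 1-nt gap shifts everything after it.
    print(rfc_score("ATGAAACCCGGG", "ATG---CCCGGG"))
    print(rfc_score("ATGAAACCCGGG", "ATGA-ACCCGGG"))
```

In this toy setup a gap of length three leaves the score at its maximum, while a single-base gap splits the columns between two offsets and lowers it.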

Signature 2: Distinct patterns of codon substitution
Codon substitution patterns specific to genes
–Genetic code dictates substitution patterns
–Amino acid properties dictate substitution patterns
[Figure: two codon substitution matrices, Genes and Intergenic; axes: codon observed in species 1 vs. codon observed in species 2]

Codon Substitution Matrix (CSM)
[Figure: human vs. mouse codon substitution matrix, codons grouped by amino acid class: aliphatic, aromatic, negative, polar, positive, polar]
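A codon substitution matrix can be used as a log-odds score: how much more often is a given codon pair seen in aligned genes than in aligned intergenic sequence? The following is a rough sketch of that idea, not the talk's trained matrix. The names `codon_pairs`, `train_csm` and `csm_score`, the add-one pseudocount, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter


def codon_pairs(seq1, seq2, frame=0):
    """Aligned codon pairs in the given frame, skipping gapped codons."""
    pairs = []
    for k in range(frame, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[k:k + 3], seq2[k:k + 3]
        if "-" not in c1 and "-" not in c2:
            pairs.append((c1, c2))
    return pairs


def train_csm(coding_pairs, noncoding_pairs, pseudocount=1.0):
    """Return a function giving the log-odds (coding vs. non-coding) of a codon pair."""
    coding, noncoding = Counter(coding_pairs), Counter(noncoding_pairs)
    n_cells = 64 * 64  # every possible ordered codon pair
    c_total = sum(coding.values()) + pseudocount * n_cells
    n_total = sum(noncoding.values()) + pseudocount * n_cells

    def log_odds(pair):
        p = (coding[pair] + pseudocount) / c_total
        q = (noncoding[pair] + pseudocount) / n_total
        return math.log(p / q)

    return log_odds


def csm_score(seq1, seq2, log_odds, frame=0):
    """Sum of log-odds over all aligned codon pairs in one frame."""
    return sum(log_odds(p) for p in codon_pairs(seq1, seq2, frame))


if __name__ == "__main__":
    # Toy training data: a silent change is common in genes,
    # an amino-acid-changing one is common in intergenic sequence.
    lo = train_csm([("GCT", "GCC")] * 10, [("GCT", "GAT")] * 10)
    print(csm_score("GCTGCT", "GCCGCC", lo))
    print(csm_score("GCTGCT", "GATGAT", lo))
```

Positive totals indicate protein-like substitutions and negative totals indicate atypical ones; scoring all three frames and both strands would be a natural extension.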

Signatures 3, 4, 5, 6, 7, etc.
Mutation patterns of splicing signals
–Real splice acceptor/donor evolve in specific ways
Evolution of other motifs associated with splicing
–Exonic/Intronic Splicing Enhancers/Silencers (ESE, ESI)
–Density of motif clouds surrounding real exons
Sharp conservation boundaries
–Relative conservation exon vs. surrounding regions
Length of longest ‘open’ reading frame
–Frequency of stop codons in each frame / each species
[Figure: real exon with acceptor site, donor site, ESEs and ISEs]

Putting it all together: probabilistic framework
Hidden Markov Models (HMMs)
–Generative model, learn emission, transition probabilities
–Easy to train, hard to integrate long-range signals
Conditional Random Fields (CRFs)
–Discriminative dual of HMMs, learn weights on features
–Easy to integrate diverse signals, gradient ascent for training

From HMMs … to CRFs
[Figure: graphical model with hidden sequence y_{i-1}, y_i, y_{i+1}, observed sequence X, and feature functions F(i-1), F(i), F(i+1)]

From HMMs … to CRFs
HMM: generative model, transition and emission probabilities
CRF: discriminative model
–For example, features can simply be e_i and a_ij
–Or pretty much anything
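The contrast on this slide can be written out in standard textbook notation. This is a sketch of the usual formulation, not a recovery of the slide's own equations: a_{ij} and e_i are the transition and emission probabilities named above, while the feature functions f_k, weights w_k, and normalizer Z(x) are assumed notation.

```latex
% HMM (generative): joint probability of labels y and observations x
P(x, y) = \prod_{i=1}^{n} a_{y_{i-1} y_i} \, e_{y_i}(x_i)

% Linear-chain CRF (discriminative): probability of labels y given x
P(y \mid x) = \frac{1}{Z(x)}
  \exp\Big( \sum_{i=1}^{n} \sum_{k} w_k \, f_k(y_{i-1}, y_i, x, i) \Big)

% Special case: using \log a_{y_{i-1} y_i} and \log e_{y_i}(x_i) as the
% features, each with weight 1, makes the exponent equal \log P(x, y).
```

Because each f_k may look at the whole observed sequence x, signals such as conservation scores from many species can enter as features without modeling how x itself is generated.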

Running on real genomes
Obtain optimal weights (from training set)
–Experimentally-defined, genetics, curation, cDNA
Apply CRF systematically to new genome
–Revisit existing genomes
–Annotate new genomes

Revisiting fly genome annotation
Power of evolutionary signatures
–New genes and exons, dubious genes and exons
–Adjust gene boundaries: ATG, frame, splice site, seq errors
Signatures more powerful than primary signals
–Recognize unusual gene structures → read-through, uORFs, editing
Towards a revised genome annotation
–Curation: FlyBase integrates prediction with cDNA, protein, literature
–Experimentation: BDGP large-scale functional validation
[Figure: D. melanog. compared with D. simulans, D. erecta, D. persimilis: 10,845 fully confirmed (…); 579 fully rejected; novel exons: 1,454 exons (~800 genes); +668 exons in 443 genes; 2,499 not aligned]

Systematic application leads to
Exon-level changes
–Ex 1: New genes
–Ex 2: New exons
–Ex 3: Dubious genes
More subtle changes
–Ex 4: Start/end adjustments
–Ex 5: Wrong reading frame
–Ex 6: Splice site adjustments
–Ex 7: Sequencing errors fixed
Unusual gene structures
–W1: Stop-codon read-through
–W2: uORFs & dicistronic
–W3: Internal frame-shifts
[Figure: Reading Frame Conservation; Codon Substitution Matrix, Genes vs. Intergenic (codon observed in species 1 vs. codon observed in species 2)]

Example 1: Known genes stand out
Sharp conservation boundaries. Known exons stand out. High sensitivity and specificity.
[Figure legend: conserved, substitution, insertion, frameshift, gap]

Example 2: Novel multi-exon gene
1,454 novel exons outside known genes
–Many cluster in new multi-exon genes
–Others are isolated high-confidence exons

Example 2b: Novel exons inside known genes
(sorry, this example is from human, mouse, dog, rat)
668 cases in fly
–New candidate alternatively spliced gene forms
–New protein domains

Novel genes and exons
1,454 novel exons outside existing genes
–60% cluster in 300 multi-exon genes
–40% isolated exons
668 novel exons inside existing genes
–Alternative splicing: Many with cDNA support
–Nested genes: Few known examples
Human curation
–Collaboration with FlyBase
–Hundreds of changes in release 5.1, more in 5.2
Systematic experimentation
–Sue Celniker and Berkeley Genome Project
–Thousands of new genes in the pipeline

Example 3: Dubious single-exon gene
Only evidence was an open reading frame
–Comparative information much stronger

579 Dubious Genes
Classification approach: Yes / No answer
–Closely related species: both genes and intergenic aligned
–Show very different patterns of mutation
Comparative analysis provides negative evidence
–Alignment is unambiguous, orthologous, spans entire gene
–Sequence shows mutations and indels in every species
Weak or missing experimental evidence
–100 of these independently rejected by FlyBase
–These are missing from systematic clone collections
–Only 34 (6%) have assigned names (vs. 36% of all fly genes)

Example 4: Start codon adjustment
Codon substitution patterns suggest new start in 200 genes
–Score each substitution using Codon Substitution Matrix (CSM)
[Figure: CG6664/FBtr, annotated start codon vs. conserved start codon (ATG); regions labeled "poor CSM score, atypical substitution" and "high CSM score, protein-like substitution"]

Example 5: Gene annotated on wrong reading frame
cDNA evidence supports overlapping reading frames, both open
–Annotation traditionally selects longer one
–Conservation enables distinguishing the two
[Figure: annotated ORF (345 nt) vs. real ORF (315 nt); mRNA supports both ORFs; conservation only supports shorter ORF; shorter ORF is the correct one; CG7738-RA is incorrect]

Example 6: Incorrect splice causes wrong frame
Second exon annotated in the wrong frame
–Due to splice site boundary error
–Correction is supported by cDNA evidence
[Figure: first exon: correct frame; 2nd exon: incorrect frame; fix exon boundary]

Example 7: Detect seq. errors / strain mutations
Insertion/deletion causes frameshift
–Conservation signature shifts from ‘frame1’ to ‘frame2’
–All other species disagree with D. melanogaster indel
–Sequencing error or species-specific mutation
chr3R:6,953,865-6,953,927 (Ugt86Dd)
dm     CAGTACATATTTGTGGAGAGTTACTTGAAAG-CTTGGCAGCTAAGGGTCATCAGGTGACCGTTA
droSec CAGTACATATTTTTGGAGAGCTACTTGAAAGCCTTGGCAGCTAAGGGTCACCAGGTGACCGTTA
droSim CAGTACATATTTATGGAGAGCTACTTGAAAGCCTTGGCAGCTAAGGGTCACCAGGTGACCGTTA
droYak CAGTACATTTTTGTGGAGACCTACTTGAAAGCCCTGGCAGCCAAGGGTCACCAGGTGACCGTTA
droEre CAGTACATTTTTGTGGAGACCTACTTGAAAGCCCTGGCAGCTAGGGGTCACCAGGTGACTGTTA
droAna CAGTACATCTTTGTGGAGACCTATCTGAAGGCTTTGGCCGACAAAGGTCACCAGGTGACTGTTA
droWil CAATACATATTCATTGAGGCGTATCTAAAGGCATTGGCTGCCAAAGGACATCAGTTAACTGTGA
droMoj CAGTACATATTCGCCGAGGCGTATTTGAAGGCGCTAGCAGCCCGGGGCCATGAGGTCACCGTGA
droVir CAGTATATATTTGCCGAGTCGTATTTGAAGGCCTTGGCAGCGCGGGGTCATGAGGTGACAGTGA
[Figure labels: conservation in correct frame; frame-shift (sequencing error / recent mutation); conservation in 2nd frame]

Example 8: Dubious gene is a miRNA transcript Evolutionary signatures reveal specific function

Unusual genes 1: Stop codon read-through
Method #1 (single exons)
–112 events, 95 extending known genes → Manual curation: 82
–Enriched in neuronal function
Method #2 (after splicing)
–256 events, looser cutoff, large overlap, needs manual curation
–Enriched in transcription factors
[Figure: protein-coding conservation; stop codon read-through; continued protein-coding conservation; 2nd stop codon; no more conservation]

Unusual genes 2: Polycistronic messages / uORFs
Method
–High-scoring ORFs with cDNA evidence
–Disjoint from the annotated ORF
Results
–217 cases
[Figure: protein-coding conservation in the 5’UTR]

Unusual genes 3: Frame-shift in the middle of exons
Method
–Exons changing high-scoring frame
–Far from splice junctions
Results
–68 cases in 44 genes
chrX:2,226,518-2,226,639 (CG14047)
dm     GACTATTTCAACAATCAGCAGCGCGAGCGACACTACCAGCTCCGGCGGCAGAGCCAGCGGCAGACC---TCCGAGATTTGTACCGCCGCCACCGCCTCCGCGTCGCTTGCTCCTCACGCAGACCG
droSim GACTATTTCAACAACCAGCAACGCGAGCGACACTACCAGCTCCGGCGGCAGAGCCAGCGGCAGACC---TCCGAGATTTGTACCGCCGCCACCGCCTCCGCGTCGCTTGCTCCTCACGCAGACCG
droSec GACTATTTCAACAACCAACAACGCGAGCGACACTACCAGCTCCGGCGGCAGAGCCAGCGGCAGACC---TCCGAGATTTGTACCGCCGCCACCGCCTCCGCGTCGCTTGCTCCTCACGCAGACCG
droYak GACTACTTCAACAATCAGCAACGCGAGCGACACTACCAGCTCCGGCGGCAGAGCCAGCGGCAGACC---GGCGAGATTTGTACCGCCTCCACCGCCTCCGCGTCGCTTGCTGCTCACGCAGACCG
droEre GACTATTTCAACAATCAGCAACGCGAGCGACACTACCAGCTCCGGCGGCAGAGCCAGCGGCAGACC---GCCGAGATTTGTACCGCCGCCACCGCCTCCGCGTCGCTTGCTTCTCACGCAGACCG
droAna GACTACTACAACAATCAGCAGCGGGAGCGGCACTACCAGCTCCGGCGGCAGAGCCAGCGGCAGGCCAGCGGCGAAGTTCGTCCCTCCTCCGCCGCCTCCGCGACGTTTGCTTCTCACGCAGACAG
droPse GACTACTACAACAACCAGCAGCGGGAGCGACACTACGAGCTCCGGAGGCAGAGCCAGCGGCAGGCC---AGCGAGGTTTATACCACCGCCGCCGCCTCCGCGTCGCTTGCTGCTCACGCAGACCA
droPer GACTACTACAACAACCAGCAGCGGGAGCGACACTACGAGCTCCGGAGGCAGAGCCAGCGGAAGGCC---AGCGAGGTTTATACCACCGCCGCCGCCTCCGCGTCGCTTGCTGCTCACGCAGACCA
droWil GACTACTACAACAATCAGCAGAGGGAGCGACACTACGAGCAACGTCGCCAAAGCCAGCGGCAGGCC---AGCCAAATTTATACCACCGCCACCGCCTCCACGTCGACTGCTGCTAACGCAGACAA
droMoj GACTACTACAACAACCAGCAGCGGGAGCGGCACTACCAGCTGCGCCACCAGAGCCAACGTCAAGCC---ACCGAGATTTATACCACCACCGCCGCCGCCTCGTCGTCTGCTGCTCACGCAGACAA
droVir GACTACTACAACAACCAACAGCGGGAGCGGCACTACCAGCAGCGCCGCCAGAGCCAACGTCAAGCC---ACCGAGATTCATTCCACCGCCGCCGCCGCCTCGTCGTCTGCTGCTCACGCAGACAA
droGri GACTACTACAACAATCAGCAGCGGGAGCGGCACTATCAACAGCGTCGCCAGAGTCATCGTCAAGCC---ACCGAGATTTATACCACCACCACCGCCACCTCGTCGTCTATTGCTCACGCAGACAA
[Figure labels: Frame 1 is high-scoring; Frame 2 is high-scoring]

Initial results for the whole human genome
Fully rejected genes: weak/no evidence
New exons: existing & novel experimental evidence
Need: large-scale functional annotation for novel genes
[Figure: human compared with dog, mouse, rat: 9,862 fully confirmed; 7,717 refined; 1,065 fully rejected; 454 novel (2591 exons); 1,919 not aligned]

Discriminative framework shows continued increase in power
[Figure: distributions of reading frame conservation (RFC) score and codon substitution matrix (CSM) score as the number of species grows: 2 species, 3 species, 5 species, 12 species]

Overview
Part 1. Genome interpretation
–Evolutionary signatures of genes
–Revisiting the human and fly genomes
–Unusual gene structures
Part 2. Gene regulation
–Regulatory motif discovery
–microRNA regulation
–Enhancer identification
Part 3. Genome evolution
–Phylogenomics
–The two forces of gene evolution
–Accurate gene trees in complete genomes

Who’s actually doing the work
–Matt Rasmussen: Phylogenomics
–Erez Lieberman: Motif evolution
–Aviva Presser: Network evolution
–Mike Lin: Gene identification
–Alex Stark: Fly motifs and miRNAs
–Pouya Kheradpour: Human enhancers
–Josh Grochow: Network motif discovery
–Ameya Deoras: Spectral genomics