CS273A Lecture 17: Cross Species Comparisons


CS273A Lecture 17: Cross Species Comparisons http://cs273a.stanford.edu [Bejerano Fall16/17]

Announcements
Your project should be coming along nicely!

Genome Evolution
[Slide graphic: a wall of raw genomic DNA sequence.]

Terminology
Orthologs: genes related via speciation (e.g. C, M, H3).
Paralogs: genes related through duplication (e.g. H1, H2, H3).
Homologs: genes that share a common origin (e.g. C, M, H1, H2, H3).
[Figure: a gene tree descending from a single ancestral gene, drawn inside the species tree, with speciation, duplication, and loss events marked.]
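One way to make these definitions concrete is to label each pair of genes by the event at their last common ancestor in the gene tree: speciation gives orthologs, duplication gives paralogs, and every pair in the tree is homologous. The sketch below assumes a made-up tree topology for C, M, H1, H2, H3 (the slide's figure is not recoverable from the transcript), so treat the tree itself as hypothetical.

```python
# Hypothetical gene tree: an internal node is (event, left, right) and a
# leaf is a gene name. The topology is an assumption for illustration only.
GENE_TREE = (
    "speciation",
    "C",
    ("speciation",
     "M",
     ("duplication", "H1", ("duplication", "H2", "H3"))),
)


def leaves(node):
    """Return the gene names under a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def relations(node, out=None):
    """Label every gene pair by the event at its last common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, left, right = node
    kind = "orthologs" if event == "speciation" else "paralogs"
    # Genes on opposite sides of this node have this node as their LCA.
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = kind
    relations(left, out)
    relations(right, out)
    return out


rel = relations(GENE_TREE)
print(rel[frozenset(("C", "H3"))])   # orthologs
print(rel[frozenset(("H1", "H3"))])  # paralogs
```

With this toy tree, C, M, and H3 are pairwise orthologs and H1, H2, H3 are pairwise paralogs, matching the examples on the slide.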

Chains join together related local alignments
[Figure: chained alignments for Protease Regulatory Subunit 3, annotated "likely ortholog", "likely paralogs", and "shared domain?".]

Before and After Chaining

Netting Alignments
Commonly, multiple mouse alignments can be found for a particular human region, including for most coding regions.
The net finds the best mouse match for each human region.
The highest-scoring chains are used first.
Lower-scoring chains fill in gaps within chains, inducing a natural hierarchy.
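The "highest-scoring chains first, lower-scoring chains fill the gaps" idea can be sketched as a greedy pass over the target (human) coordinates. This is a simplified illustration, not the UCSC netting program: it ignores net levels, the query side, and strand, and the chain tuples and scores below are invented.

```python
def greedy_net(chains):
    """Assign each target base to the best-scoring chain that covers it.

    chains: list of (score, name, blocks), where blocks is a list of
    half-open (start, end) aligned intervals in target coordinates.
    Returns sorted, disjoint (start, end, name) pieces.
    """
    covered = []
    for score, name, blocks in sorted(chains, key=lambda c: -c[0]):
        pieces = list(blocks)
        # Trim this chain's blocks against everything already claimed.
        for c_start, c_end, _ in covered:
            trimmed = []
            for p_start, p_end in pieces:
                if c_end <= p_start or p_end <= c_start:
                    trimmed.append((p_start, p_end))  # no overlap
                    continue
                if p_start < c_start:
                    trimmed.append((p_start, c_start))
                if c_end < p_end:
                    trimmed.append((c_end, p_end))
            pieces = trimmed
        covered.extend((s, e, name) for s, e in pieces)
    return sorted(covered)


# Hypothetical chains: A has a gap at 100-200 that B can fill.
CHAINS = [
    (900, "A", [(0, 100), (200, 300)]),
    (500, "B", [(80, 220)]),
    (100, "C", [(90, 150)]),
]
print(greedy_net(CHAINS))
# [(0, 100, 'A'), (100, 200, 'B'), (200, 300, 'A')]
```

Chain C disappears because every base it covers is already claimed by a higher-scoring chain, which is the single-coverage behavior the slide describes.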

Net highlights rearrangements
A large gap in the top level of the net is filled by an inversion containing two genes. Numerous smaller gaps are filled in by local duplications and processed pseudogenes.

Nets attempt to computationally capture orthologs (they also hide everything else)

Nets/chains can reveal retrogenes (and when they jumped in!)

Nets
A net is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top and their gaps filled in, where possible, by lower-scoring chains, for several levels.
A net is single-coverage for the target but not for the query. Because it is single-coverage in the target, it is no longer symmetrical.
The netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal-best process uses that output: the query-referenced (but target-centric, target single-coverage) net is turned back into its component chains, and those are then netted to get single coverage in the query too. The two outputs of that netting are reciprocal-best in query and target coordinates. Reciprocal-best nets are symmetrical again.
Nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.
GB: for human inspection always prefer looking at the chains!
[Angie Hinrichs, UCSC wiki]

Before and After Netting

Convert / LiftOver
"LiftOver chains" are actually chains extracted from nets, or chains filtered by the netting process.
LiftOver: batch utility.
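Conceptually, lifting a coordinate means finding the aligned block that contains it and applying that block's offset. The sketch below is a toy version under strong assumptions: a single chain, same strand, and a made-up block list. It is not the liftOver utility's actual implementation or file format.

```python
def lift_position(pos, blocks):
    """Map a 0-based position through gapless aligned blocks.

    blocks: list of (old_start, new_start, size), all on the same strand.
    Returns the position in the new assembly, or None if pos falls in a
    gap between blocks and so has no counterpart.
    """
    for old_start, new_start, size in blocks:
        if old_start <= pos < old_start + size:
            return new_start + (pos - old_start)
    return None


# Hypothetical chain: two aligned blocks separated by a 20-base gap
# in the old assembly.
BLOCKS = [(100, 1000, 50), (170, 1050, 30)]

print(lift_position(120, BLOCKS))  # 1020
print(lift_position(160, BLOCKS))  # None
print(lift_position(175, BLOCKS))  # 1055
```

Positions inside a gap return None here; a real tool has to decide how to report such unmapped coordinates.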

Drawbacks
Inversions are not handled optimally.
[Figure: chain and net tracks around an inversion. Chains: two forward-strand chr1 blocks, a reverse-strand chr5 block, and a reverse-strand chr1 block. Nets: the two forward-strand chr1 blocks and the reverse-strand chr5 block only.]

What nets can’t show, but chains will

Same Region… same in all the other fish

Drawbacks
High copy number genes can break orthology.

Gene Families

Self Chain reveals (some) paralogs (self net is meaningless)

The Biggest Challenge in Genomics…
… is computational: How does this (the program) encode this (the output)?

Xkcd Take – It’s Actually Not That Bad

Why compare to Chimp?

Humans and Chimpanzees Possess Many Vastly Different Phenotypes
[Figure panels: A, chimp; B, human.]
Along with the genomic differences, there are also a large number of phenotypic differences between human and chimpanzee. A somewhat comprehensive list is shown here. A few examples many people are familiar with are: [examples shown on the slide].
[Varki, A. and Altheide, T., Genome Res., 2005]

Disease Susceptibility Differences

What human-chimp changes do we find?
Small, medium, and large.

Large differences
Fusion (HSA 2)
18 pericentromeric inversions

Medium Sized Differences
Gene families expand and contract
Mobile element insertion and mediated deletion

Small Differences
1% difference at the base level

Genetic basis of human phenotypes?
[Figure: genotype versus phenotype; number of rearrangements.]
Most mutations are neutral or nearly neutral. How do we know? 4D (fourfold degenerate) sites and ARs (ancestral repeats).

The Genotype - Phenotype divide
Can we find evolutionary patterns that are distinct enough to be phenotypically revealing?
Problem #1: Too many nucleotide changes between any pair of related species (or individuals). The vast majority of these are neutral or nearly neutral.
[Figure: Species A versus Species B.]

Is it in our protein coding genes?
70-80% of all human-chimp orthologous proteins differ. On average they differ by 1-2 amino acids. Which amino acid changes matter?
One can also compare non-synonymous (amino-acid-changing) substitutions with synonymous changes, and look for proteins unusually enriched for the former. Those may be evolving under positive selection.
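A minimal sketch of the synonymous versus non-synonymous comparison: walk two aligned, in-frame coding sequences codon by codon and classify each differing codon by whether the encoded amino acid changes. The sequences below are invented, and the function returns raw counts only. A real dN/dS estimate also normalizes by the number of synonymous and non-synonymous sites and corrects for multiple substitutions, which this sketch does not attempt.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_changes(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, gap-free, and in frame")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: synonymous
        else:
            nonsyn += 1   # amino acid changed: non-synonymous
    return syn, nonsyn


# Hypothetical toy sequences: GCT->GCC is silent (Ala), AAA->AGA is Lys->Arg.
HUMAN_CDS = "ATGGCTAAAGGT"
CHIMP_CDS = "ATGGCCAGAGGT"
print(count_codon_changes(HUMAN_CDS, CHIMP_CDS))  # (1, 1)
```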

Positive and negative gene selection in the human genome

Candidate genes for human specific evolution ...

What if we did an unbiased search?
Human-specific substitutions in conserved sequences.
[Figure: a sequence conserved in chimp shows rapid change in human.]
HAR1: novel ncRNA with 18 unique human substitutions.
HAR1 is expressed in Cajal-Retzius neurons at the border of the marginal zone.
[Pollard, K. et al., Nature, 2006] [Beniaminov, A. et al., RNA, 2008]
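The pattern on this slide (conserved everywhere else, changed in human) can be illustrated by scanning alignment columns for positions where all non-human species agree and human differs. This is only a counting sketch over an invented toy alignment with hypothetical species names; it is not the statistical test used in the cited papers.

```python
def human_specific_substitutions(alignment, human="human"):
    """Return columns where all other species agree and human differs.

    alignment: dict of species name -> aligned sequence (equal lengths).
    Columns with a gap in human or in the other species are skipped.
    """
    others = [seq for name, seq in alignment.items() if name != human]
    hits = []
    for col, h in enumerate(alignment[human]):
        column = {seq[col] for seq in others}
        conserved = len(column) == 1 and "-" not in column
        if conserved and h != "-" and h not in column:
            hits.append(col)
    return hits


# Hypothetical alignment: column 3 is conserved except in human;
# column 6 varies in chicken, so it is not counted.
TOY_ALIGNMENT = {
    "human":   "ACGTTGCA",
    "chimp":   "ACGATGCA",
    "mouse":   "ACGATGCA",
    "chicken": "ACGATGTA",
}
print(human_specific_substitutions(TOY_ALIGNMENT))  # [3]
```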

Different Unbiased Search: Loss vs Gain
Human Accelerated Regions: sequence conserved in chimp shows rapid change in human, with 4-18 unique human substitutions. HAR1 is expressed in Cajal-Retzius neurons at the border of the marginal zone. [Pollard, K. et al., Nature, 2006] [Prabhakar, S. et al., Science, 2008]
Human Conserved Sequence Deletions (hCONDELs): sequence conserved in chimp is deleted in human. Complete human loss of sequence. Likely to confer human-specific phenotypes. [McLean, Reno, Pollen et al., Nature, 2011]

Identifying hCONDELs
[Figure: sequence conserved in chimp and deleted in human.]
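The core of the search can be pictured as an interval question: which conserved chimp elements have no aligned human sequence at all? The sketch below assumes both inputs are already available as intervals in chimp coordinates on one chromosome; the interval values are invented, and the published screen involves additional filtering and validation not shown here.

```python
def find_candidate_deletions(conserved, human_aligned):
    """Return conserved chimp intervals that no human alignment overlaps.

    Both arguments are lists of half-open (start, end) intervals in
    chimp coordinates on a single chromosome.
    """
    candidates = []
    for c_start, c_end in conserved:
        overlaps = any(a_start < c_end and c_start < a_end
                       for a_start, a_end in human_aligned)
        if not overlaps:
            candidates.append((c_start, c_end))
    return candidates


# Hypothetical data: human aligns to chimp 0-300 and 600-1000, so the
# conserved element at 400-480 sits entirely in a human-missing stretch.
CONSERVED = [(100, 150), (400, 480), (900, 950)]
HUMAN_ALIGNED = [(0, 300), (600, 1000)]
print(find_candidate_deletions(CONSERVED, HUMAN_ALIGNED))  # [(400, 480)]
```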

hCONDEL genomic distribution
Median size: 2.8 kb
Not enriched in highly variable genomic regions
Most do not disrupt proteins: only 1 validated exonic deletion

Deletions of functional non-coding DNA
[Figure: genes with nearby conserved elements, some of which are hCONDELs. Legend: gene with function, e.g. “neuronal gene”; gene without function; hCONDEL; conserved element.]
http://great.stanford.edu [McLean et al., Nat. Biotechnol., 2010]

Functional enrichments of hCONDELs

Ontology                          Term                                          p-value
Gene Ontology                     Steroid hormone receptor activity             3.73 x 10^-4
InterPro                          Fibronectin, type III                         1.01 x 10^-4
InterPro                          Zinc finger, nuclear hormone receptor type    1.80 x 10^-4
InterPro                          CD80-like, immunoglobulin C2 set              1.37 x 10^-3
Entrez Gene                       Neuronal genes                                1.11 x 10^-4
Monoallelically-Expressed Genes   Monoallelic expression                        8.62 x 10^-3

GO also has ligand-dependent nuclear receptor activity at p-value 7.83 x 10^-4. The single false positive threshold is 8.00 x 10^-4. For ½ a false positive the threshold is 3.19 x 10^-4. InterPro is more lenient: for a single false positive the threshold is 2.09 x 10^-3, and for ½ a false positive the threshold is 1.07 x 10^-3. The empirical p-value for neuronal genes is 0.003. The empirical p-value for monoallelic genes is 0.011.
These enrichments are unique to hCONDELs.
http://great.stanford.edu

hCONDEL near Androgen Receptor
The deletion appears fixed in humans and also appears deleted in Neandertal.

Androgen Receptor chimpanzee enhancer assay
[Figure: human and chimp loci; reporter construct made of the genomic fragment, the Hsp68 promoter, and the LacZ reporter gene.]
From genomic chimp DNA… To investigate the phenotypic consequence of the loss of particular enhancers, we PCR-amplified the enhancer region from chimpanzee genomic DNA and inserted it into a LacZ expression vector driven by a heat shock promoter. These vectors were then injected into mouse embryos, in which the localized expression territories driven by these enhancers could be monitored.
[Phil Reno, David Kingsley]

The human deletion near AR acts as an enhancer within known AR expression domains
[Figure: reporter expression driven by the chimp enhancer (E16.5) and by the mouse enhancer (E16.5 and 8 weeks) in genital tubercle, sensory whiskers, and penile spines.]
[Phil Reno, David Kingsley]

Androgen Receptor
Transcription factor.
Activated by binding androgens (testosterone or its metabolite dihydrotestosterone); binding induces a conformational change.
Androgen Receptor activity is assayed by addition/removal of testosterone.
[Figure: testosterone (T) binds the Androgen Receptor in the cell; the AR+T dimer acts as a transcription factor in the nucleus. Human and chimp panels.]

Androgen responsiveness in domains of expression
[Figure: sensory whiskers and penile spines in chimp, human, and galago; sensory whisker length (mm).]
Don’t say enhancer rot… A clear deletion that functions as an enhancer in these structures, but that does not exclude the role of other genes in the loss of these structures… We hypothesize that this genomic deletion can lead to the loss of these morphological structures.
To summarize:
- Deletions of conserved non-coding sequences have occurred during human evolution.
- Some of these deletions appear to have regulatory effects on genes that affect mammalian development.
- These changes can be associated with morphological transitions that have occurred during human evolution, especially in the case of the androgen receptor enhancer, where expression is found to occur in structures lost in humans.
Mice with Ar coding region mutations lack penile spines.
[Ibrahim & Wright 1983] [Dixson, 1976] [Murakami, 1987]

Could sequence loss lead to tissue gain?
hCONDELs are enriched for suppressors of cell proliferation or cell migration expressed in cortex (P = 1.3 x 10^-3).
[Figure: non-human mammals suppress proliferation; humans do not suppress proliferation.]

The Genotype - Phenotype divide
Can we find evolutionary patterns that are distinct enough to be phenotypically revealing?
Problem #1: Too many nucleotide changes between any pair of related species (or individuals). The vast majority of these are neutral or nearly neutral.
[Figure: Species A versus Species B.]

Genotype -> Phenotype screens
Define a “dramatic” (non-neutral) genomic scenario: an hCONDEL, conserved in chimp and deleted in human.
Problem #2: What is the phenotype?
[McLean, Pollen, Reno et al, 2011]

Testing is Exciting… and Humbling
These are “wild rides”: often not what we expected, often not what we can understand. Are we looking at the right place? Did we test at the right time?
We are creating the humanized mice KOs.
[McLean, Pollen, Reno et al, 2011]

What about a tree of related species?
What if we could find evolutionary patterns that were distinct enough to be phenotypically revealing?
Genomes: Inherited and Modified. Traits: Come and Go.
[Figure: species tree from the ancestor to Species A, Species B, ..., Species H.]

What happens when an ancestral trait “goes”?
Trait information is no longer under selection.
It erodes away over evolutionary time.
[Figure: phenotype and genome along the tree, starting from the ancestor's ancestral trait information.]

A lot of DNA and many traits vary between any two species.
Trait information is no longer under selection.
It erodes away over evolutionary time.
[Figure: phenotype and genome along the tree, starting from the ancestor's ancestral trait information.]

A lot of DNA and many traits vary between any two species.
What about independent trait loss? Vitamin C synthesis, tail, body hair, dentition features, etc.
Trait information is no longer under selection.
It erodes away over evolutionary time.
[Figure: phenotype and genome along the tree, starting from the ancestor's ancestral trait information.]

Phenotype and Genome
Trait information is no longer under selection.
It erodes away over evolutionary time.
[Figure: phenotype and genome along the tree, starting from the ancestor's ancestral trait information.]

The PG screen
Matches the trait presence/absence pattern.
[Hiller et al., 2012a]

The PG screen
Capture the independent genomic switch from purifying selection -> neutral evolution in all and only the trait-loss species.
Robust to:
- Different trait disabling times.
- Different trait disabling mutations.

Branding ;-)
Forward Genetics: phenotype -> genotype
Forward Genetics: Search for mutations that segregate with a trait of interest.
Forward Genomics: Search for regions that are lost only in species lacking the trait.
But does it work?

Vitamin C Synthesis
Rats & mice synthesize vitamin C.
Humans cannot synthesize vitamin C.

The Vitamin C synthesis “phenotree”
Vitamin C synthesis was lost 3-4 times independently in mammalian evolution.
Fwd Genomics asks: Do one or more genomic loci look like THAT?

We quantify divergence by comparing sequences to the reconstructed ancestral sequence
Mutation in species 1 or 2? Insertion in species 1 or deletion in species 2? Use an outgroup to reconstruct the ancestral sequence.

species 1   ACCCTATCGATTGCA
species 2   TCCGTATCG-TT-CA
outgroup    ACTCT-TCGATT-AA
ancestor    ACCCTATCGATT-CA

Identical bases relative to the ancestor: species 1, 14; species 2, 11.
Percent of identical bases: species 1, 93%; species 2, 79% -> more diverged.
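The percentages on this slide can be reproduced with a short function. The slide does not state the denominator, so the rule below is an inference: columns where both sequences have a gap are skipped, and a gap opposite a base counts as a difference. That choice gives 14/15 and 11/14 for the two species, which round to the slide's 93% and 79%.

```python
def percent_identity(ancestor, species):
    """Percent of identical bases between two aligned sequences.

    Columns where both sequences have a gap are skipped (assumption);
    a gap opposite a base counts as a difference.
    """
    if len(ancestor) != len(species):
        raise ValueError("sequences must be aligned to the same length")
    identical = compared = 0
    for a, s in zip(ancestor, species):
        if a == "-" and s == "-":
            continue
        compared += 1
        if a == s:
            identical += 1
    return 100.0 * identical / compared


ANCESTOR = "ACCCTATCGATT-CA"
SPECIES_1 = "ACCCTATCGATTGCA"
SPECIES_2 = "TCCGTATCG-TT-CA"
print(round(percent_identity(ANCESTOR, SPECIES_1)))  # 93
print(round(percent_identity(ANCESTOR, SPECIES_2)))  # 79
```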

Sequencing errors mimic divergence

ancestor    ACCCTATCGATT-CAATGG
species 1   ACCCTATCGATTGCAAGGG   89% identical bases
species 2   TCCGTAACG--T-CTATCG   61% identical bases

[Figure: sequence quality scores for species 2.]
High sequencing error rate -> treat species 2 as missing data.

Assembly gaps mimic divergence
[Figure: Sanger reads for species 1 leave an assembly gap (?????????) over a region that is conserved in species 2 through 5.]
-> Treat species 1 as missing data.
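Both filters amount to returning "no data" instead of a misleading divergence value. The sketch below is one possible way to express that, assuming per-base quality scores are available and that assembly gaps are marked with 'N' or '?'. The threshold values and the function name are hypothetical, not taken from the lecture.

```python
def mask_unreliable(sequence, qualities, min_quality=20, max_bad_fraction=0.2):
    """Return the sequence, or None (missing data) if it looks unreliable.

    sequence: aligned bases; 'N' or '?' marks an assembly gap.
    qualities: one quality score per character of the sequence.
    Both thresholds are illustrative assumptions.
    """
    bad = 0
    for base, quality in zip(sequence, qualities):
        is_assembly_gap = base in ("N", "?")
        is_low_quality = base != "-" and quality < min_quality
        if is_assembly_gap or is_low_quality:
            bad += 1
    if bad / len(sequence) > max_bad_fraction:
        return None  # treat this species as missing data for the region
    return sequence


GOOD = mask_unreliable("ACGTACGTAC", [40] * 10)
NOISY = mask_unreliable("ACGTACGTAC", [40, 5, 5, 5, 40, 40, 40, 40, 40, 40])
GAPPED = mask_unreliable("??????ACGT", [0] * 6 + [40] * 4)
print(GOOD, NOISY, GAPPED)  # ACGTACGTAC None None
```

Returning None rather than a low identity value keeps such species from being counted as diverged in later steps.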

Reconstruct the evolutionary history of all conserved regions, coding and non-coding
For each of the 544,549 conserved regions:
- Reconstruct the ancestral sequence.
- Measure extant species divergence (e.g. 93%, 70%, 85% identity to the reconstructed ancestral locus).
- Avoid low quality sequence and assembly gaps.
Result: a matrix of 33 species x 544,549 regions.
Seek a perfect phenotree match.

We quantify the match to the vitamin C pattern by counting the number of species that violate the pattern
[Figure: percent identity per species (axis up to 100) for two example regions, one with 1 violation and one with 2 violations.]
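One way to count violations, offered as a sketch rather than the published procedure: try every identity threshold that could separate "diverged" from "conserved" species, and report the smallest number of species that end up on the wrong side given the trait pattern. Species with missing data are ignored. Species names and identity values below are invented.

```python
def count_violations(identity, trait_lost):
    """Minimum number of species that break a clean high/low identity split.

    identity: dict of species -> percent identity to the ancestor, or
              None for missing data (such species are ignored).
    trait_lost: set of species that lack the trait.
    """
    known = {sp: v for sp, v in identity.items() if v is not None}
    best = len(known)
    # Candidate thresholds: every observed value, plus one above them all.
    for threshold in sorted(set(known.values())) + [float("inf")]:
        violations = 0
        for sp, value in known.items():
            diverged = value < threshold
            if diverged != (sp in trait_lost):
                violations += 1
        best = min(best, violations)
    return best


TRAIT_LOST = {"s1", "s2", "s3"}  # hypothetical trait-loss species
PERFECT = {"s1": 70, "s2": 65, "s3": 72, "s4": 95, "s5": 93, "s6": 90, "s7": None}
ONE_OFF = dict(PERFECT, s6=68)   # s6 keeps the trait but looks diverged
print(count_violations(PERFECT, TRAIT_LOST))  # 0
print(count_violations(ONE_OFF, TRAIT_LOST))  # 1
```

A region with zero violations is a perfect match to the trait presence/absence pattern.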

Regions matching the vitamin C trait are clustered
[Figure: the 544,549 conserved regions ranked by number of violating species, from perfect match (0) to no match (10).]
-> These conserved regions are all exons of a single gene.

This gene is more diverged in all non-vitamin C synthesizing species

What is the function of this gene?
33 genomes x 544,549 regions, vitamin C pattern -> Gulo (gulonolactone (L-) oxidase), which encodes the enzyme responsible for vitamin C biosynthesis.
Note: No likely shared disabling mutation.
We learned about both evolution and function.

The Power of Forward Genomics
33 genomes x 544,549 regions, vitamin C pattern -> Gulo (gulonolactone (L-) oxidase).
Forward genomics works. Can it work for continuous traits? With only two independent losses? And many unknown values?

Bile
Bile is a fluid produced by the liver that aids the digestion of lipids in the small intestine.

Bile Phospholipids
Different mammals have remarkably different levels of biliary phospholipids.
[Figure: biliary phospholipid levels across mammals.]

ABCB4 is a phospholipid transporter

Find “Cure” Models for Human Disease
Human ABCB4 mutations lower patient biliary phospholipid levels to guinea pig levels but are detrimental.
Our discovery: Guinea pig and horse have inactivated the Abcb4 gene in their natural state. How can they do it?
[Figure: the usual route is to create a KO gene, then try to fix/treat; with a natural KO, find nature’s cure!]

Reverse Genomics
Reverse Genetics: phenotype <- genotype
Reverse Genetics: Pick interesting loci, mutate, and try to figure out the phenotype/s.
Reverse Genomics: Compute independent loss for ALL genomic loci, match to traits.
We have now collected:
- Million genomic loci by Fifty mammals
- Thousands of scored mammalian traits
And we are playing MATCH and TEST.

Reverse Genomics of Enhancers

Back of an Envelope Wish

Poster Child Example