The Human Genome Source Code


CS273A: The Human Genome Source Code. Lecture 10: Gene Regulation III, Repeats I. TTh 1:30-2:50pm, mostly Always M106*. Prof: Gill Bejerano. CAs: Boyoung (Bo) Yoo & Yatish Turakhia. * Track class on Piazza. http://cs273a.stanford.edu [Bejerano Winter 2018/19]

Announcements: None from us…

Genome Content [Slide background: a wall of raw DNA sequence]

Transcription & its regulation happen in open chromatin

Nucleosomes, Histones, Transcription: Genome packaging provides a critical layer of gene regulation. [Figure labels: Chromatin / Proteins; DNA / Proteins]

Gene Activation / Repression via Chromatin Remodeling: Dedicated machinery opens and closes chromatin. Interactions with this machinery turn genes and/or gene regulatory regions, such as enhancers and repressors, on or off by making the genomic DNA accessible or inaccessible.

Epigenomics: The histone code

Histone Tails, Histone Marks: DNA is wrapped around nucleosomes. Nucleosomes are made of histones. Histones have free tails. Residues in the tails are modified in specific patterns in conjunction with specific gene regulation activity.

Histone Mark Correlation Examples: Active gene promoters are marked by H3K4me3. Silenced gene promoters are marked by H3K27me3. p300, a protein component of many active enhancers, acetylates H3K27 (producing the H3K27ac mark).
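
The three correlations on this slide can be written down as a small lookup table. The sketch below is only an illustration: the function name `annotate` and the state labels are made up here, and real chromatin-state calling uses many more marks and a statistical model rather than a dictionary.

```python
# Mark -> correlated regulatory state, taken from the slide text.
# These are correlations, not rules; the labels are illustrative.
MARK_TO_STATE = {
    "H3K4me3": "active promoter",
    "H3K27me3": "silenced promoter",
    "H3K27ac": "active enhancer",
}


def annotate(marks):
    """Return the sorted set of states suggested by the observed marks.

    Marks that are not in the table are ignored.
    """
    return sorted({MARK_TO_STATE[m] for m in marks if m in MARK_TO_STATE})


if __name__ == "__main__":
    print(annotate(["H3K27ac", "H3K4me3", "H3K9me3"]))
```

Unknown marks (here `H3K9me3`) are simply skipped, which mirrors the fact that the slide lists examples rather than a complete code.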

Measuring these different states: Note that the DNA itself doesn’t change. We sequence different portions of it that are currently in different states (bound by a TF, wrapped around a nucleosome, etc.).

Epigenomics: study all these marks genome-wide. Translate observations into the current genome state.

Histone Code Hypothesis: Histone modifications serve to recruit other proteins by specific recognition of the modified histone via protein domains specialized for such purposes, rather than through simply stabilizing or destabilizing the interaction between histone and the underlying DNA. [Figure labels: histone modification: writer, eraser … reader]

Epigenomics is not Epigenetics: Epigenetics is the study of heritable changes in gene expression or cellular phenotype caused by mechanisms other than changes in the underlying DNA sequence. There are objections to using the term epigenetic to describe chemical modification of histones, since it remains unknown whether these modifications are heritable.

Gene Regulation [Figure labels: Chromatin / Proteins; Extracellular signals; DNA / Proteins]

Cis-Regulatory Components. Low level (“atoms”): promoter motifs (TATA box, etc.), transcription factor binding sites (TFBS). Mid level: promoter, enhancers, repressors/silencers, insulators/boundary elements, locus control regions. High level: epigenomic domains / signatures, gene expression domains, gene regulatory networks.

If you only measure gene expression, it’s like only seeing the values change in RAM as a program is running.

Inferring Gene Expression Causality: Measuring gene expression over time provides sets of genes that change their expression in synchrony. But who regulates whom? Some of the necessary regulators may not change their expression level when measured, and yet be essential. “Reading” enhancers can provide gene regulatory logic: If present(TF A, TF B, TF C) then turn on nearby gene X.
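
The "If present(TF A, TF B, TF C) then turn on nearby gene X" rule can be sketched as a plain AND over a set of required factors. This is a toy model under stated assumptions: the names `ENHANCER_X_REQUIRES`, `enhancer_active`, and `gene_x_on` are hypothetical, and real enhancers are rarely a strict AND (binding is graded, and some factors repress).

```python
# Hypothetical enhancer for gene X: the TFs it needs, per the slide's rule.
ENHANCER_X_REQUIRES = frozenset({"TF_A", "TF_B", "TF_C"})


def enhancer_active(required_tfs, present_tfs):
    """True when every required TF is among the TFs present in the cell."""
    return set(required_tfs) <= set(present_tfs)


def gene_x_on(present_tfs):
    """Apply the slide's rule: all of A, B, C present -> gene X is on."""
    return enhancer_active(ENHANCER_X_REQUIRES, present_tfs)


if __name__ == "__main__":
    print(gene_x_on({"TF_A", "TF_B", "TF_C", "TF_D"}))  # all three present
    print(gene_x_on({"TF_A", "TF_B"}))                  # TF_C missing
```

Note that the rule depends only on which factors are present, not on whether their own expression changed, which is the slide's point about essential regulators that look "flat" in expression data.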

Obtain a network of all active genes & DNA: a “ridiculogram”. Now what? (to be revisited)

Gene Regulation is in Data Deluge mode: “Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom.”

Transcription Factors have Large “fan outs”: We could have had one TF regulate two TFs, each of which regulates two other TFs, etc., with each of those contributing to the regulation of a modest number of target genes (that do the real work). Instead, TFs reproducibly bind to thousands of genomic locations almost anywhere we’ve looked. Gene regulation forms a dense network. [Figure labels: pathway genes; TFs]

Some important genes have large “fan ins”
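
"Fan out" and "fan in" are simply out-degree and in-degree in a regulator-to-target graph. A minimal sketch follows, using a made-up edge list (`EDGES`, with hypothetical TF and gene names) rather than any real binding data.

```python
from collections import Counter

# Hypothetical toy network: each pair is (regulating TF, target gene).
EDGES = [
    ("TF1", "geneA"), ("TF1", "geneB"), ("TF1", "geneC"),
    ("TF2", "geneA"),
    ("TF3", "geneA"),
]


def fan_out(edges):
    """Number of distinct targets listed for each TF."""
    return Counter(tf for tf, _ in set(edges))


def fan_in(edges):
    """Number of distinct TFs listed as regulating each gene."""
    return Counter(gene for _, gene in set(edges))


if __name__ == "__main__":
    print(fan_out(EDGES).most_common(1))  # TF with the largest fan out
    print(fan_in(EDGES).most_common(1))   # gene with the largest fan in
```

In this toy graph TF1 has the largest fan out and geneA the largest fan in; in the real networks described on the slides the fan outs run into the thousands.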

We are technically DONE with genome function. Biology – not that complicated!! Functional part list. In our genome: genes (protein coding; non-coding / RNA genes) and gene regulatory elements (“atomic” event: transcription factor binding site; build-up: promoters, enhancers, silencers, gene regulatory domain). “Around” our genome: chromatin (open / closed); epigenomic (and some epigenetic) marks.

A Quiz! Let’s quiz Gill on the board…

So Why This?? To be continued

The Functional Genome
Type            # in genome
genes           20,000
ncRNA           -
cis elements    1,000,000

The Functional Genome
Type            # in genome    % of genome
genes           20,000         2-3%
ncRNA           -              2%
cis elements    1,000,000      10-15%
Corollary: most of the genome is devoid of function (which we understand).
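
The corollary follows from adding up the "% of genome" column. The sketch below does that arithmetic, assuming the three categories do not overlap (an assumption made here for illustration; the slide does not say so). The names `FUNCTIONAL_PCT` and `functional_range` are hypothetical.

```python
# (low, high) percent of the genome per category, from the table above.
FUNCTIONAL_PCT = {
    "genes": (2.0, 3.0),
    "ncRNA": (2.0, 2.0),
    "cis elements": (10.0, 15.0),
}


def functional_range(parts):
    """Sum the low and high ends, assuming the categories do not overlap."""
    low = sum(lo for lo, _ in parts.values())
    high = sum(hi for _, hi in parts.values())
    return low, high


if __name__ == "__main__":
    low, high = functional_range(FUNCTIONAL_PCT)
    print(f"functional: {low:.0f}-{high:.0f}% of the genome")
    print(f"remainder:  {100 - high:.0f}-{100 - low:.0f}% of the genome")
```

Under that assumption the functional share comes out at 14-20%, leaving 80-86% without a function we currently understand.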

Genome Evolution [Slide background: a wall of raw DNA sequence]

“Nothing in Biology Makes Sense Except in the Light of Evolution” (Theodosius Dobzhansky)

One Cell, One Genome, One Replication: Every cell holds a copy of all its DNA = its genome. The human body is made of ~10^13 cells. All originate from a single cell through repeated cell divisions. [Figure labels: egg; DNA strings = chromosomes; egg cell; cell division; genome = all DNA; chicken, egg, chicken ≈ 10^13 copies (DNA) of egg (DNA)]
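
To get a feel for "repeated cell divisions", one can ask how many rounds of doubling it takes to go from one cell to ~10^13. The sketch below assumes an idealized model in which every cell divides in every round and none die, which is not how development actually proceeds; `min_doublings` is a name introduced here.

```python
def min_doublings(target_cells):
    """Smallest n such that 2**n >= target_cells, for target_cells >= 1.

    Idealized: every cell divides each round and no cell ever dies.
    """
    if target_cells < 1:
        raise ValueError("target_cells must be at least 1")
    # (k - 1).bit_length() equals ceil(log2(k)) for integers k >= 1.
    return (target_cells - 1).bit_length()


if __name__ == "__main__":
    print(min_doublings(10**13))
```

Even in this idealized model the genome in a typical cell has been through dozens of successive rounds of copying, each of which is a chance for a copying error.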

Every Genome is Different: DNA replication is imperfect, so genomes differ between individuals of the same species, and even between the cells of an individual. This has bad implications (disease) and good implications (adaptation). [Figure: the sequence ...ACGTACGACTGACTAGCATCGACTACGA... passed from chicken to egg to chicken, with changes (TT, CAT) marked; in “junk” regions “anything goes”, while in “functional” regions many changes are not tolerated.]

Human Mutation Rate: Recent sequencing analysis suggests ~40-60 new mutations in a child that were not present in either parent. Mutations range from the smallest possible (a single base pair change) to the largest, whole genome duplication (to be discussed). Selection does not tolerate all of these mutations, but it sure does tolerate some. [Figure labels: chicken, egg, chicken]
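
The 40-60 figure can be turned into a rough per-base rate by dividing by the number of bases copied. The genome size used below (about 3 billion bp per haploid copy, so about 6 billion per child) is an assumption brought in for this illustration and does not come from the slides; `per_base_rate` is a hypothetical helper.

```python
# Assumption (not from the slides): ~3e9 bp per haploid human genome.
HAPLOID_GENOME_BP = 3_000_000_000
DIPLOID_GENOME_BP = 2 * HAPLOID_GENOME_BP


def per_base_rate(new_mutations, genome_bp=DIPLOID_GENOME_BP):
    """New mutations per base pair per generation."""
    return new_mutations / genome_bp


if __name__ == "__main__":
    for n in (40, 60):
        print(f"{n} new mutations -> {per_base_rate(n):.1e} per bp per generation")
```

Under that assumption the rate is on the order of one new mutation per hundred million bases per generation.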

Genome Content [Slide background: a wall of raw DNA sequence]

Why this cartoon?

Genome Composition: The functional genome takes about 20% of the genome. The remaining 80% is far from homogeneous…

Sequences that repeat many times in the genome. Cumulatively they take up a whopping half of the genome, and they come in two major, very different flavors (I and II).

I. Interspersed Repeats / TEs [Adapted from Lunter]

Classes of Interspersed Repeats

LINE & SINE Elements

Genomic Transmission: For repeat copies to accumulate through human generations, they must make it into the germline cells (eggs and sperm). The same is true for any genomic mutation. [Figure labels: egg; DNA strings = chromosomes; egg cell; cell division; genome = all DNA; chicken, egg, chicken ≈ 10^13 copies (DNA) of egg (DNA)]

DNA Transposons

Retrovirus-like Elements

Every Genome is Different: DNA replication is imperfect, so genomes differ between individuals of the same species, and even between the cells of an individual. This has bad implications (disease) and good implications (adaptation). [Figure: the sequence ...ACGTACGACTGACTAGCATCGACTACGA... passed from chicken to egg to chicken, with changes (TT, CAT) marked; in “junk” regions “anything goes”, while in “functional” regions many changes are not tolerated.]

TE composition and assortment vary among eukaryotic genomes. [Figure: share (0-100%) of DNA transposons, LTR retrotransposons, and non-LTR retrotransposons in rice, Fugu, mouse, human, slime mold, Neurospora, Arabidopsis, nematode, mosquito, Drosophila, budding yeast, and fission yeast.] Feschotte & Pritham 2006. http://cs273a.stanford.edu [Bejerano Fall09/10]

Repeats: mostly neutral. Most repeat events/instances are neutral, i.e., a repeat instance is dropped in a new place and joins the rest of the neutral DNA, gradually decaying over time. Many repeat copies are “dead as a duck” on arrival at their new location (e.g., 5’ truncation). Some instances may be active (spawn new instances) for a while, but when an active copy is hit by a mutation, the host is not affected; the instance is inactivated and decays away.
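
"Gradually decaying over time" can be illustrated with a toy simulation in which a repeat copy accumulates random substitutions with no selection acting on it. Everything here is a simplifying assumption: a uniform per-base, per-generation substitution rate, no insertions or deletions, and the helper names `decay` and `divergence` are made up for the sketch.

```python
import random

BASES = "ACGT"


def decay(seq, rate, generations, rng=None):
    """Apply neutral substitutions: each base changes with probability `rate`
    per generation, always to one of the three other bases."""
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    bases = list(seq)
    for _ in range(generations):
        for i, base in enumerate(bases):
            if rng.random() < rate:
                bases[i] = rng.choice([b for b in BASES if b != base])
    return "".join(bases)


def divergence(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    original = "ACGTACGACTGACTAGCATCGACTACGA"
    for gens in (0, 10, 100, 1000):
        copy = decay(original, rate=0.001, generations=gens)
        print(gens, round(divergence(original, copy), 3))
```

The longer a copy has been sitting in the genome, the further it tends to drift from the sequence that was inserted, which is the intuition behind dating repeats by how diverged their copies are.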

Repeat Ages

Interspecies variation in genome size within various groups of organisms. Figure from Ryan Gregory (2005).

The amount of TE DNA correlates positively with genome size. [Figure: genomic DNA, TE DNA, and protein-coding DNA in Mb (axis up to 3000) for rice, Plasmodium, slime mold, Brassica, maize, mosquito, Neurospora, Arabidopsis, sea squirt, nematode, Drosophila, zebrafish, Fugu, budding yeast, fission yeast, mouse, and human.] Feschotte & Pritham 2006

The proportion of protein-coding genes decreases with genome size, while the proportion of TEs increases with genome size. [Figure labels: TEs; protein-coding genes] Gregory, Nat Rev Genet 2005