Shai Carmi, Erez Levanon Bar-Ilan University

Large scale DNA editing of retrotransposons accelerates mammalian genome evolution
Shai Carmi, Erez Levanon
Bar-Ilan University, 2010

What’s in the genome?
Protein-coding sequences are only 2% of the human genome.
Lots of other stuff: introns, promoters, enhancers, telomeres, rRNA, tRNA, miRNA, snRNA, …
Complexity is determined by non-coding DNA (all animals have a few tens of thousands of genes).

Mobile elements
Mobile elements comprise half of the human genome.
Pieces of 100 to 10,000 base pairs that move around the genome by a cut-and-paste or copy-and-paste mechanism.
Retrotransposons (RTs): ancient retroviruses.
Retroviral replication: viral RNA is reverse transcribed → DNA is integrated into the genome → RNA is transcribed → proteins are translated → a new virus is assembled!

Retrotransposons
Transcription: genomic DNA → RNA.
Translation: viral RNA → proteins (optional).
Reverse transcription: viral RNA → DNA.
Insertion into new genomic locations.

The effect of retrotransposons
Mutations, genetic disorders.
But also: a reservoir of sequences for genetic innovation, and rewiring of gene regulation networks.
Accumulation of mutations and other mechanisms inhibit most RTs.

DNA Editing of retroviruses

DNA Editing of the genome
1. Genome (DNA): 5’ RT G G G 3’ / 3’ RT C C C 5’
2. Transcription. RNA: 5’ RT G G G 3’
3. Reverse transcription. RNA: 5’ RT G G G 3’ / DNA: 3’ RT C C C 5’
4. Digestion of RNA strand. DNA: 3’ RT C C C 5’
5. Editing. DNA: 3’ RT U U U 5’
6. Synthesis of second DNA strand. DNA: 5’ RT A A A 3’ / 3’ RT U U U 5’
7. Integration into a different locus, with G→A mutations: 5’ RT A A A 3’ / 3’ RT T T T 5’
How often has this happened?
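The pathway on this slide can be mimicked on a toy sequence. The sketch below is only an illustration: the function names and the example sequence are hypothetical, and the RNA intermediate is written in the DNA alphabet for simplicity. It shows why C→U editing of the minus-strand DNA appears as G→A on the plus strand after second-strand synthesis.

```python
# Toy model of the editing pathway (hypothetical helper names).
# U pairs with A, so a deaminated C (now U) templates an A on the plus strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}

def reverse_complement(seq):
    """Return the strand complementary to seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def edit_minus_strand(minus, positions):
    """Deaminate C to U at the given 0-based positions of the minus strand."""
    bases = list(minus)
    for i in positions:
        if bases[i] != "C":
            raise ValueError(f"position {i} is {bases[i]}, not C")
        bases[i] = "U"
    return "".join(bases)

plus_original = "ATGGGTC"                         # plus strand of the source RT
minus = reverse_complement(plus_original)         # minus-strand DNA after reverse transcription
minus_edited = edit_minus_strand(minus, [2, 3, 4])
plus_new = reverse_complement(minus_edited)       # second-strand synthesis

print(plus_original, "->", plus_new)
```

Comparing `plus_original` with `plus_new` position by position shows the three G→A changes carried into the new genomic copy.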

An algorithm
1. Extract all retrotransposons (of a given family).
2. Align pairwise using BLAST.
3. Search for high quality alignments with G→A clusters.
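The cluster search in the last step might look like the sketch below. The slides do not define a cluster precisely, so this assumes a cluster is a run of G→A mismatches not interrupted by any other mismatch type (matching positions in between are allowed). The function name `find_ga_clusters` and the `min_len` parameter are hypothetical. The subject line may use `.` for identity, as in BLAST's query-anchored output.

```python
def find_ga_clusters(query, sbjct, min_len=3):
    """Return runs of G->A mismatches (lists of positions) with at least min_len sites.

    Assumption: a run ends at the first mismatch that is not G->A
    (query G, subject A). A '.' in sbjct means 'same as query'.
    """
    clusters = []
    run = []  # positions of the current uninterrupted run of G->A mismatches
    for i, (q, s) in enumerate(zip(query, sbjct)):
        if s == "." or q == s:
            continue  # matching position, does not break a run
        if q == "G" and s == "A":
            run.append(i)
        else:
            # any other mismatch (or gap) closes the current run
            if len(run) >= min_len:
                clusters.append(run)
            run = []
    if len(run) >= min_len:
        clusters.append(run)
    return clusters

print(find_ga_clusters("GATGGCGTGA", "A..A..A.T."))
```

Applied to each pairwise alignment of a family, a filter of this kind would keep only alignments that contain at least one sufficiently long cluster.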

An algorithm
Define the transition probability: p = [#(C→T) + #(T→C)] / (2*alignment_length).
k: cluster length; n: sequence length.
How many clusters do we expect by chance?
Control: use p = [#(G→A) + #(A→G)] / (2*alignment_length), and search for clusters of C→T.
Editing is strand-specific, and we align only positive strands, so true DNA editing will show no C→T clusters.
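The transition probability defined on this slide is simple to compute from an alignment. The sketch below implements only that formula. The helper names are hypothetical, and the chance-expectation formula in terms of k and n is not given on the slide, so it is not reproduced here.

```python
def mismatch_counts(query, sbjct):
    """Count mismatches by (query base, subject base); '.' in sbjct means identity."""
    counts = {}
    for q, s in zip(query, sbjct):
        if s == "." or q == s or "-" in (q, s):
            continue  # skip matches and gapped columns
        counts[(q, s)] = counts.get((q, s), 0) + 1
    return counts

def transition_probability(query, sbjct, x, y):
    """p = [#(x->y) + #(y->x)] / (2 * alignment_length), as defined on the slide."""
    counts = mismatch_counts(query, sbjct)
    return (counts.get((x, y), 0) + counts.get((y, x), 0)) / (2 * len(query))

query = "GCTGACTG"
sbjct = "ATTAACTG"
p_ct = transition_probability(query, sbjct, "C", "T")  # background for G->A clusters
p_ga = transition_probability(query, sbjct, "G", "A")  # background for the C->T control
print(p_ct, p_ga)
```

Estimating the background from C↔T transitions when searching for G→A clusters keeps the edited sites themselves out of the null model. The control swaps the two roles.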

The results
Retrotransposon family   Total elements   Edited elements     Edited nucleotides   Edited elements    Edited nucleotides
                         in family        (high confidence)   (high confidence)    (low confidence)   (low confidence)
Mouse IAP                26504            195                 3539                 446                7144
Mouse MusD               12147            22                  563                  125                1418
Mouse LINE1              884320           1602                28876                6542               92248
Human HERV               18593            21                  528                  284                2938
Human LINE1              927393           30                  492                  1319               13460
Human SVA                3425             690                 8940                 2248               41391
Chimpanzee HERV          19772            38                  614                  98                 1029
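As a quick arithmetic check on the table, the fraction of high-confidence edited elements per family can be computed directly. The numbers below are copied from the table, and the variable names are arbitrary.

```python
# (total elements in family, high-confidence edited elements), from the table
families = {
    "Mouse IAP": (26504, 195),
    "Mouse MusD": (12147, 22),
    "Mouse LINE1": (884320, 1602),
    "Human HERV": (18593, 21),
    "Human LINE1": (927393, 30),
    "Human SVA": (3425, 690),
    "Chimpanzee HERV": (19772, 38),
}

# fraction of each family with high-confidence editing
fractions = {name: edited / total for name, (total, edited) in families.items()}

for name, frac in sorted(fractions.items(), key=lambda item: -item[1]):
    print(f"{name:16s} {frac:.2%}")
```

Human SVA stands out with roughly a fifth of its elements edited, while every other family is below 1%.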

The results Mouse IAP

An example
Mouse chr8:28575443-28581824 (6,382 nts) vs. chr9:114987516-114993954.
176 G→A mismatches and only 26 other mismatches.

More examples

Mouse IAP
Query  4059  AAAACTGGCATAGGTGCCTATGTGGCTAATGGTAAAGTGGTATCCAAACAATATAATGAA  4118
Sbjct  960   ............A..................A.........................A..  1019
Query  4119  AATTCACCTCAAGTGGTAGAATGTTTAGTGGTCTTAGAAGTTTTAAAAACCTTTTTAAAA  4178
Sbjct  1020  ..................A........A........A.......................  1079
Query  4179  CCCCTTAATATTGTGTCAGATTCCTGTTATGTGGTTAATGCAGTAAATCTTTTAGAAGTG  4238
Sbjct  1080  .........................A............................A.....  1139
Query  4239  GCTGGAGTGATTAAGCCTTCCAGTAGAGTTGCCAATATTTTTCAGCAGATACAATTAGTT  4298
Sbjct  1140  ...A........................................................  1199
Query  4299  TTGTTATCTAGAAGATCTCCTGTTTATATTACTCATGTTAGAGCCCATTCAGGCCTACCT  4358
Sbjct  1200  .....................A......................................  1259
Query  4359  GGCCCCATGGCTCTGGGAAATGATTTGGCAGATAAGGCCACTAAAGTGGTGGCTGCTGCC  4418
Sbjct  1260  ..............AAA..........A................................  1319
Query  4419  CTATCATCCCCGGTAGAGGCTGCAAGAAATTTTCATAACAATTTTCATGTGACGGCTGAA  4478
Sbjct  1320  .....................A...................................A..  1379
Query  4479  ACATTACGCAGTCGTTTCTCCTTGACAAGAAAAGAAGCCCGTGACATTGTTACTCAATGT  4538
Sbjct  1380  .......A.........................A..........................  1439

Mouse MusD
Query  1381  GCCGCACGCCGTGCTTGGGGAAGGTTGCCTGTCAAAGGAGAGATTGGTGGAAGTTTAGCT  1440
Sbjct  1381  ...A................................A...........AA..A.......  1440
Query  1441  AGCATTCGGCAGAGTTCTGATGAACCATATCAGGATTTTGTGGACAGGCTATTGATTTCA  1500
Sbjct  1441  .A...................A......................................  1500
Query  1501  GCTAGTAGAATCCTTGGAAATCCGGACACGGGAAGTCCTTTCGTTATGCAATTGGCTTAT  1560
Sbjct  1501  .......A.......AA......AA...................................  1560
Query  1561  GAGAATGCTAATGCAATTTGCCGAGCTGCGATTCAACCGCATAAGGGAACGACAGATTTG  1620
Sbjct  1561  ..............................................A.............  1620
Query  1621  GCGGGATATGTCCGCCTTTGCACAGACATCGGGCCTTCCTGCGAGACCTTGCAGGGAACC  1680
Sbjct  1621  .......................................................A....  1680
Query  1681  CACGCGCAGGCAATGTTCTCAAGGAAACGAGGGAAAAATGTATGCTTTAAGTGTGGAAGT  1740
Sbjct  1681  .........A......................A...........................  1740

More examples

Human HERV
Query  235   TCCTTTAAACAAGGAACAGGTTAGACAAGCCTTTATCAATTCTGGTGCATGGA-AGATTG  293
Sbjct  1256  ............AA....AA...A.....................AAT..-A.C.A....  1314
Query  294   ATCTTGCTGATTTTGT-GAGAATTATTGACAGTCATTACCCAAAAACAAAAATCTTCCAG  352
Sbjct  1315  G....A..A.....A.AA.A...........A............................  1374
Query  353   TTTTAAAAATTGACTACTTGGATTTTACCTAAAAATGCCAGACATAAACCTTTAGAAAAT  412
Sbjct  1375  ....T..............AA.............T.A...A.............A.....  1434
Query  413   GCTCTGACGGTATTTACTGATGGTTCCAGCAATGAAAAAGCAACTTACACCAGGCCAAAA  472
Sbjct  1435  A....A.....G......A..A......A....A.....A.............A......  1494
Query  473   GAACGAGTCCTTGAAACTCAATGTCACTCGGCTCAAAGAGCAGAGTT-GTTGTTGTCAAT  531
Sbjct  1495  A...A....A..A...............TAA......A.A..A.A..A.C.AC....-..  1553
Query  532   T-CAGTGTTACAAAATTTTAATCAGCCTATTAACATTGTATCAGATTCTGCATATGTAGT  590
Sbjct  1554  .A..A.A....................................A.....A.....A..A.  1613

Human SVA
Query  300   TGCCGGGATTGCAGACGGAGTCTGGTTCGCTCGGTGCTCGGTGGTGCCCAGGCTGGAGTG  359
Sbjct  412   ............................A...A......AA...................  471
Query  360   CAGTGGCGTGGTCTCGGCTCGCTGCAGCCTCCATCTCCCGGCCGCCTGCCTTGGCCGCCC  419
Sbjct  472   ..........A....A.......A..A............A................T...  531
Query  420   AGAGTGCCGAGATTGCAGCCTCTGCCCGGCCTCCACCCCGTCTGGGAGGTGGGGAGCGTC  479
Sbjct  532   .A......A......................A...............A..AA........  591
Query  480   TCTGCCTGGCCGCCCATCGTCTGGGACGTGGGGAGCCCCTCTGCCTGGCTGCCCAGTCTG  539
Sbjct  592   ..........T...................A.............................  651
Query  540   GAGGGTGGGGAGCATCTCTGCCCGGCCGCCATCCCGTCTGGGAGGTGGGGAGCGCCTCTT  599
Sbjct  652   ..AA...A.....G.....................A...A...A...A............  711
Query  600   CCCGGCAGCCATCCCATCTGGGAGGTGGGGAGCGTCTCTGCCCGGCCGCCCATCGTCTGA  659
Sbjct  712   .......................A...A................................  771

Editing Motifs
Motifs were evaluated statistically based on the nucleotide composition of the RTs.
IAP (total 446 elements), nucleotide counts (columns A, C, G, T) around edited sites, with empty cells lost in extraction:
2 nts upstream: 4, 7
1 nt upstream: 10
1 nt downstream: 12
2 nts downstream: 43, 13
GxA→AxA motif.
Mouse LINE: GG→AG.
Human SVA: AG→AA.
Figure panels: IAP, MusD.
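A motif table of this kind could be tallied from pairwise alignments as sketched below. This is an assumption about the bookkeeping, not the authors' procedure: the flanking base is read from the unedited sequence, the function name `flanking_counts` is hypothetical, and the statistical evaluation against nucleotide composition is not included.

```python
from collections import Counter

def flanking_counts(unedited, edited, offset):
    """Tally the base at position i + offset (in the unedited sequence)
    for every G->A site i. offset = -1 is 1 nt upstream, +1 is 1 nt downstream.
    """
    counts = Counter()
    for i, (u, e) in enumerate(zip(unedited, edited)):
        j = i + offset
        if u == "G" and e == "A" and 0 <= j < len(unedited):
            counts[unedited[j]] += 1
    return counts

unedited = "TGGAGTA"
edited = "TAGAATA"   # G->A at positions 1 and 4
for offset in (-2, -1, 1, 2):
    print(offset, dict(flanking_counts(unedited, edited, offset)))
```

A base that dominates one offset, relative to its overall frequency in the family, would point to a sequence preference of the editing enzyme.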

Are edited RTs expressed?
8% (35) of edited IAPs are in exons, compared with only 3.5% of all IAPs.
This could be facilitated by the increase in weak A-T pairs.
24 exons are alternative.
Editing modified the 5’-splice site from the consensus G|GT to A|GT.

Other mammals
Animal (elements): P-value; minimal cluster length; G→A clusters / nucleotides; C→T clusters / nucleotides
Rat (ERV): 10^-8; 8; 877 / 12173; 30 / 289
Orangutan (HERV): 10^-7; 7; 182 / 2126; 61 (one of the two C→T values is missing)
Rhesus: 146 / 1959; 4 / 29
Marmoset: 38 / 410; 53 (one of the two C→T values is missing)

But in organisms that have no APOBEC3…
Retrotransposon family: total no. of elements in family; edited elements and nucleotides as listed (the high/low confidence columns cannot be told apart)
Fly LTR: 15925; 17, 119
Yeast Ty1: 267; 4, 29, -
Chicken LTR: 36318; 1, 13
Frog LTR: 10493
Zebrafish LTR: 133895
Worm LTR: 617

Editing is ongoing
SVA RTs are hominoid-specific.
The largest fraction of elements is edited (690, 20%).
262 human-specific edited elements.
16 polymorphic elements.

Phylogenetics
The molecular clock paradigm is wrong!
Editing must be masked to construct phylogenetic trees.
Figure: IAPLTR4_I
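The effect of masking can be seen on a toy pair of sequences. The sketch below assumes that masking means dropping every alignment column where the two sequences differ by G versus A. The slide does not say how masking is done, and the function name `distance` is arbitrary.

```python
def distance(a, b, mask_editing=False):
    """Fraction of mismatching columns between two aligned sequences.

    With mask_editing=True, columns that differ by G vs A are removed
    before counting (an assumed masking rule).
    """
    compared = mismatches = 0
    for x, y in zip(a, b):
        if mask_editing and {x, y} == {"G", "A"}:
            continue  # potential editing site, ignore the whole column
        compared += 1
        mismatches += x != y
    return mismatches / compared if compared else 0.0

a = "GGGCTA"
b = "AAACTT"   # three G->A sites plus one unrelated mismatch
print(distance(a, b), distance(a, b, mask_editing=True))
```

A single editing event introduces many mismatches at once, so the unmasked distance overstates how long ago the two copies diverged.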

Tracing evolution
Editing is directional (G→A only), so the order of replication events can be reconstructed.
Figure: sequences (1) G G G; (2) A G G; (3) G A G; (4) A G A; (5) A A A, linked by editing events.

Tracing evolution
1. Create an edge connecting a sequence with G to a sequence with A.
2. Eliminate short cycles.
3. For each RT, keep only the edge to the common ancestor that is genetically nearest (based on non-G→A mismatches).
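A minimal sketch of this graph construction, under stated assumptions: an edge a→b is created only when every G/A difference between the two sequences has G in a and A in b, which means no cycles can arise and the cycle-elimination step is not needed here. Ties in non-G→A distance are broken by the number of G→A differences. Both rules, the function names and the toy sequences are assumptions, not taken from the slides.

```python
def compare(a, b):
    """Return (#G->A, #A->G, #other mismatches) going from sequence a to b."""
    ga = ag = other = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if (x, y) == ("G", "A"):
            ga += 1
        elif (x, y) == ("A", "G"):
            ag += 1
        else:
            other += 1
    return ga, ag, other

def nearest_parents(seqs):
    """For each sequence, pick the candidate ancestor with the fewest
    non-G->A mismatches (ties: fewest G->A differences, then smallest id)."""
    parents = {}
    for child in seqs:
        best = None
        for cand in seqs:
            if cand == child:
                continue
            ga, ag, other = compare(seqs[cand], seqs[child])
            if ga > 0 and ag == 0:          # directed edge cand -> child
                key = (other, ga, cand)
                if best is None or key < best:
                    best = key
        if best is not None:
            parents[child] = best[2]
    return parents

seqs = {1: "GGGCT", 2: "AGGCT", 3: "GAGCT", 4: "AGACT", 5: "AAACC"}
print(nearest_parents(seqs))
```

On real data the first rule would probably need relaxing, since sequencing errors and back-mutations can produce the occasional A→G difference. That is where short cycles, and the need to eliminate them, would come from.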

Tracing evolution IAPLTR4_I

Discussion
Editing can explain the successful exaptation of RTs.
Editing accelerates evolution, as demonstrated for HIV.
Our method detects only a small fraction of edited elements.
De novo genes from edited RTs are probably not here yet.

Future directions
An editing-based algorithm to reconstruct the history of retrotransposon evolution.
A comprehensive survey of editing in the reference genome.
A systematic search for functions of edited elements (expression with RNA-seq, positive selection).
Searching for editing in non-reference DNA: different individuals (polymorphism), different tissues (somatic editing).

Thank you
CGACAAGAGTGTACGATGACGTC
|||||*||||||*|||||*||||
CGACCGGAGTGTGCGCTGGCGTC