Finishing Phage Genomes
How to identify circularly permuted genomes, physical ends, 3’ overhangs, terminal repeats, and nicks.



Circularly Permuted Genomes
Some phages have circularly permuted genomes. This means a linear concatamer of phage DNA is synthesized, used to fill a phage head, and then cut when the head is full. Generally, one head holds more than 100% of a genome. This ensures that wherever the DNA is cut, at least one working copy of each gene is present. The remaining part of the concatamer goes on to fill a new head, is cut, and so on.
Think of it as though the complete genome of a phage were the alphabet. Complete genome of phage “AlphaCirc” with 26 “genes” (A through Z):
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Circularly Permuted Genomes
First, a long concatamer of the genome is synthesized:
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRS
Next, that concatamer is packaged into a phage head until the head is full. Then the concatamer is cut:
ABCDEFGHIJKLMNOPQRSTUVWXYZAB | CDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRS
And packaging begins again with a new head, followed by cutting:
CDEFGHIJKLMNOPQRSTUVWXYZABCD | EFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRS
Until…
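The packaging steps above can be sketched in a few lines of Python. This is only an illustration of the alphabet analogy: the function name `package_headful`, the head capacity of 28 letters, and the idea of reading from an endless concatamer are assumptions made for the example, not part of any real tool.

```python
def package_headful(genome: str, head_capacity: int, n_heads: int) -> list[str]:
    """Cut successive head-sized pieces from an (assumed endless) concatamer."""
    heads = []
    pos = 0
    for _ in range(n_heads):
        # Index modulo the genome length to read along the concatamer.
        piece = "".join(
            genome[(pos + i) % len(genome)] for i in range(head_capacity)
        )
        heads.append(piece)
        pos += head_capacity  # the next head starts where the last cut was made
    return heads


ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
for head in package_headful(ALPHABET, head_capacity=28, n_heads=4):
    print(head)
```

Every printed head contains all 26 letters, yet each one starts two letters later than the one before it, which is why the ends differ from particle to particle.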

Circularly Permuted Genomes
…an entire series of heads has had DNA packaged:
ABCDEFGHIJKLMNOPQRSTUVWXYZAB
CDEFGHIJKLMNOPQRSTUVWXYZABCD
EFGHIJKLMNOPQRSTUVWXYZABCDEF
GHIJKLMNOPQRSTUVWXYZABCDEFGH
IJKLMNOPQRSTUVWXYZABCDEFGHIJ
KLMNOPQRSTUVWXYZABCDEFGHIJKL
MNOPQRSTUVWXYZABCDEFGHIJKLMN
Note that:
• each new phage does have a complete complement of genes (A through Z, plus 2 duplicates)
• there are ends within each individual phage, but the ends are not conserved among particles
So what does this mean for finishing genomes?

Circularly Permuted Genomes
A phage with a circularly permuted genome will not have any defined ends:
• No primer walks will result in the “glorious A” typical of physical ends.
• No clone/read build-up will occur at the ends.
• All reads will assemble into one large contig with a sequence match at the “ends.”
Figure: Assembly View of a finished, circularly permuted phage (Troll4). Green lines are clone/read coverage depth. Purple lines show paired reads (F/R) from clones across the ends. The orange line shows the location of sequence matches.

Circularly Permuted Genomes
We can tell this phage is circularly permuted because there is strong clone and read coverage throughout, and overlap at the ends. As long as we’ve checked for weak areas throughout the contig and verified that the overlap is of high enough quality, this phage is considered finished. Keep in mind that the “ends” we see here are not real ends, only an artifact of consed, which cannot show DNA in a circle and so chooses a breaking point.
Figure: Assembly View of a finished, circularly permuted phage (Troll4).
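The “sequence match at the ends” can be illustrated with a toy check for a contig whose tail repeats its head. This is a minimal sketch that assumes an exact, error-free match; the function name `end_overlap` and the example contig are made up for illustration, and real assemblies need a tolerant alignment instead.

```python
def end_overlap(contig: str, min_len: int = 3) -> int:
    """Length of the longest contig suffix that equals the contig prefix.

    Returns 0 when no overlap of at least `min_len` bases is found.
    Assumes exact matching (no sequencing errors).
    """
    for k in range(len(contig) - 1, min_len - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


# A toy contig that the assembler "broke" at Q: it ends by repeating QRST.
contig = "QRSTUVWXYZABCDEFGHIJKLMNOPQRST"
print(end_overlap(contig))
```

A nonzero result is consistent with a circular assembly; it does not by itself prove the phage is circularly permuted.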

Physical Ends
Some phages package their DNA differently. In these phages, the DNA molecule that is packaged always has the same start and end positions:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
These phages have “physical ends,” meaning the left end and the right end are the same in every particle, unlike in circularly permuted phages.
So what does this mean for finishing genomes?

Physical Ends
In sequencing data, physical ends can be identified in two basic ways:
• A build-up of clones/reads with identical start positions.
• Primer walks into the end that terminate in a “glorious A” (an artificial, strong base added at physical ends by the sequencing polymerase).
Let’s see what each method looks like in raw data form…

Physical Ends
Finding a potential physical end from a build-up of clones.
Note that many clones start having high-quality (Q>20) sequence from almost exactly the same base. This would be extremely unlikely by chance, so this is likely a physical end.
Figure: A screenshot of the Aligned Reads view from consed, from the phage Giles.

Physical Ends
Finding a potential physical end from a build-up of clones.
Looking at the assembly view of that same phage, we see several important things:
• No orange line indicating overlap at the ends.
• No purple clones linking the ends.
• A higher than average amount of coverage at each end (green line).
Figure: Assembly View from phage Giles.

Physical Ends
Finding a potential physical end from a build-up of clones.
The build-up may not always be as pronounced, but even 4 clones that start at the same position are unlikely by chance and should arouse suspicion.
Figure: Another screenshot of the Aligned Reads view from consed, this time from the phage Fruitloop.
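The build-up check can be sketched as a simple count of read start positions. The input format (a plain list of start coordinates), the function name `start_pileups`, and the example numbers are assumptions for illustration; the threshold of 4 follows the rule of thumb above. The sketch also requires identical start positions, whereas real reads may start within a base or two of each other.

```python
from collections import Counter


def start_pileups(read_starts, min_reads=4):
    """Return (position, read_count) pairs where at least `min_reads` reads start."""
    counts = Counter(read_starts)
    return sorted((pos, n) for pos, n in counts.items() if n >= min_reads)


# Hypothetical start coordinates of high-quality sequence for ten reads.
starts = [1, 1, 1, 1, 1, 57, 130, 130, 402, 977]
print(start_pileups(starts))
```

Any position this flags is only a candidate; it still needs a primer walk to confirm it.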

Physical Ends
Verifying a physical end with a primer walk.
To verify that you truly have a physical end, and to pinpoint the precise base where the genome ends, a primer walk toward the end is necessary. The sequencing polymerase will add a single false A nucleotide if it reaches the end of a piece of DNA.
Figure: Aligned Reads view from consed for the phage Fruitloop. This is a primer walk using primer 12 and genomic DNA as the template.
Figure: The chromatogram of that primer walk.
Notice that the sequence has high quality with clear peaks, reaches a “glorious A” peak at the end, and then dies out. This is very strong evidence that this is a physical end, and since the glorious A is not real, we can call the last few bases of the genome:
…TGCGCGGCCC

Physical Ends
Verifying a physical end with a primer walk.
At the other end of the genome, things work much the same. Just remember that the “glorious A” will now be a “glorious T,” since the chromatogram is reverse complemented. Again, remembering that the glorious T is false, we can call the start of the genome:
TGCAGATTT…
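The two end calls can be reproduced with a short sketch that strips the untemplated A and, for the left end, reverse complements the read. The raw read strings below are hypothetical, constructed so that they give the Fruitloop end calls shown above; the function names are likewise made up for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def strip_glorious_a(raw_read: str) -> str:
    """Drop the final untemplated A from a primer walk that ran off an end."""
    if not raw_read.endswith("A"):
        raise ValueError("read does not end in a glorious A")
    return raw_read[:-1]


# Walk toward the right end, already in genome orientation.
right_end = strip_glorious_a("TGCGCGGCCCA")

# Walk toward the left end, as sequenced; shown reverse complemented,
# its glorious A appears as a leading T.
raw_left = "AAATCTGCAA"
left_end = reverse_complement(strip_glorious_a(raw_left))

print(right_end)  # TGCGCGGCCC
print(left_end)   # TGCAGATTT
```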

Physical Ends
So we know both ends precisely, and the genome has acceptable coverage throughout (at least one high-quality read on each strand in all locations). Is it finished?
Not quite. Most Mycobacterium phages that have physical ends also have a short (4–14 bp) 3’ sticky-end overhang. We’d like to know the length and sequence of this overhang before we consider the phage completely finished.
It would be nice to simply primer walk into this overhang and get the sequence that way. Why doesn’t that work?

3’ Overhangs
Here’s what we know about the end of the Fruitloop genome (assuming some 3’ overhang):
    5’ …TGCGCGGCCC?????????? 3’
    3’ …ACGCGCCGGG 5’
A primer heading towards the end of the genome will always use the bottom strand as template:
    5’ …TGCGCGGCCCA 3’   (new strand, ending in the glorious A)
    3’ …ACGCGCCGGG 5’    (template)
Note that the glorious A is added, but that we still have not been enlightened about the overhang sequence at all.
So how do we figure out the overhang sequence? The answer is that we ligate some genomic DNA:
    5’ …TGCGCGGCCC??????????TGCAGATTTT… 3’
    3’ …ACGCGCCGGG??????????ACGTCTAAAA… 5’
The sticky 3’ overhangs from each end align, ligase covalently bonds them, and now we have a continuous template on which we can run the same primer!

3’ Overhangs
Before we ligated our genomic DNA, primer walks at the ends died at the “glorious A” (or “glorious T”); now they can reveal the overhang sequence.
We knew the right end of the genome was: …TGCGCGGCCC
And the left end of the genome was: TGCAGATTT…
Now, with primer walks on ligated DNA, we can call the 3’ overhang between the two: CGGAAGGCGC
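Calling the overhang amounts to reading whatever lies between the two known end sequences in a read that crosses the ligated junction. The sketch below assumes exact matches and uses a hypothetical read built from the Fruitloop sequences above; `overhang_from_ligated_read` is an illustrative name, not a consed feature.

```python
def overhang_from_ligated_read(read: str, right_end: str, left_start: str):
    """Return the bases between the known right end and the known left start.

    Returns None if either flanking sequence is not found in order.
    """
    i = read.find(right_end)
    if i == -1:
        return None
    after_right = i + len(right_end)
    j = read.find(left_start, after_right)
    if j == -1:
        return None
    return read[after_right:j]


# Hypothetical primer walk across the ligated junction.
ligated_read = "TGCGCGGCCC" + "CGGAAGGCGC" + "TGCAGATTT"
overhang = overhang_from_ligated_read(ligated_read, "TGCGCGGCCC", "TGCAGATTT")
print(overhang, len(overhang))  # CGGAAGGCGC 10
```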

Terminally Repetitive Genomes
So some genomes are circularly permuted, and some have physical ends with overhangs. There are also terminally repetitive genomes, where the ends are consistent, but more than 100% of the genome is packaged:
ABCDEFGHIJKLMNOPQRSTUVWXYZAB
ABCDEFGHIJKLMNOPQRSTUVWXYZAB
Note that:
• each phage particle has duplicates of section AB of the genome
• each phage particle has the same ends
T5 is an E. coli phage that has a terminally repetitive genome. The total genome length is about 122 kb, but the first and last 10 kb are 100% identical. Awesome is a T5-like phage finished at PBI.

Terminally Repetitive Genomes
The easiest way to identify a terminally repetitive genome is by a BLAST search that matches a known terminally repetitive genome. Another possible way is to look for an unusually well-defined section of double coverage in the data.
Figure: Assembly View of the phage Awesome genome. The red circle identifies a contiguous area of unusually high coverage.
Notice that the true physical ends (on either side of “AB” in the phage particles) are somewhere within the contig, since the assembly software combines the AB section from both ends:
ABCDEFGHIJKLMNOPQRSTUVWXYZAB
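Looking for a sharply bounded stretch of double coverage can be sketched as a scan for runs well above the median depth. The per-base coverage list, the 1.7× cutoff, the minimum run length, and the function name `high_coverage_runs` are all assumptions chosen for this example; suitable values depend on the library and the assembly.

```python
from statistics import median


def high_coverage_runs(coverage, factor=1.7, min_len=3):
    """Return (start, end) half-open index ranges where depth >= factor * median."""
    baseline = median(coverage)
    runs, start = [], None
    for i, depth in enumerate(coverage):
        if depth >= factor * baseline:
            if start is None:
                start = i  # a high-coverage run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the contig.
    if start is not None and len(coverage) - start >= min_len:
        runs.append((start, len(coverage)))
    return runs


# Toy per-base depth: roughly 10x everywhere, about 2x that in the middle.
depth = [10] * 20 + [21] * 6 + [10] * 20
print(high_coverage_runs(depth))  # [(20, 26)]
```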

Terminally Repetitive Genomes
You may also see a build-up of clones/reads at the edges of the double coverage area, within the contig.
Figure: Assembly View of the phage Awesome genome, with an area of detail showing a suspicious build-up of reads, only this time it’s not at the end of a contig.

Terminally Repetitive Genomes
To confirm that this is really a terminal repeat, and to find the precise base where the repeat begins and ends, primer walks are again necessary.
Figure: Complete Awesome Genome; Terminal Repeat.
We want to design primers as though walking into physical ends. These would normally give us glorious As and define the precise ends, but each primer now has a secondary binding site. This means that when running these primers, we will get sequence from two areas of the genome. The reads from each binding site will be identical within the terminal repeat. When the end of the terminal repeat is reached, half the signal will end in a glorious A (like the yellow primer on the right) and the other half will continue into unique sequence (like the yellow primer on the left).
Thus, to find the ends of the terminal repeat (and genome), we look for primer walks with a glorious A that continue along after it at ~½ the signal strength.

Terminally Repetitive Genomes
Here is the chromatogram from Awesome that comes from running the equivalent of the yellow primer below.
Figure: Complete Awesome Genome; Terminal Repeat.
We can see the glorious A in the contig, and the purple lines show the drop of about ½ in average signal strength.
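The “continues at about half strength” pattern can be sketched as a comparison of average peak height before and after each base. The list of peak heights, the window size, the 0.35 to 0.65 acceptance band, and the function name `half_signal_drop` are assumptions for illustration; they are not values taken from consed or from the Awesome data.

```python
def half_signal_drop(signal, window=5, low=0.35, high=0.65):
    """Index of the first base after the sharpest drop, if that drop is ~1/2.

    Compares the mean of `window` peaks after each position with the mean of
    `window` peaks before it, and returns None when the sharpest drop falls
    outside the [low, high] ratio band (for example, a read that dies fully).
    """
    best_i, best_ratio = None, None
    for i in range(window, len(signal) - window + 1):
        before = sum(signal[i - window:i]) / window
        after = sum(signal[i:i + window]) / window
        if before == 0:
            continue
        ratio = after / before
        if best_ratio is None or ratio < best_ratio:
            best_i, best_ratio = i, ratio
    if best_ratio is not None and low <= best_ratio <= high:
        return best_i
    return None


# Hypothetical peak heights: full strength, then half strength after the repeat.
peaks = [1000] * 10 + [500] * 10
print(half_signal_drop(peaks))  # 10
```

With these toy numbers, a read that stops completely after the glorious A (heights falling to 0) returns None, which separates a plain physical end from the end of a terminal repeat.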

Terminally Repetitive Genomes
And here is the chromatogram from Awesome for the equivalent of the red primer below.
Figure: Complete Awesome Genome; Terminal Repeat.
Now we can call both ends of the terminal repeat (and genome).

Terminally Repetitive Genomes
One important note, whose relevance will become clear later: if we treat genomic DNA from this type of phage with ligase, the chromatogram is unchanged.

DNA Nicks
One other feature of some genomes (such as Awesome and T5) is the presence of nicks in the DNA. Nicks are present in one strand only, at the same place in the genome each time. Some nicks are “minor” (meaning a small percentage of DNA molecules possess the nick), and these are unlikely to show up in sequencing data. Others are “major” (most of the DNA molecules possess the nick), and these are likely to show up in the sequencing data.
    5’ …TGCGCGGCCCGTCTAGCATGCAGATTTT… 3’
    3’ …ACGCGCCGGGCAGATCGTACGTCTAAAA… 5’
    Nick: a break in one of the two strands
So how do nicks show up in sequencing data?

DNA Nicks
In an assembly, major nicks will appear as a build-up of clones on one strand and a “smear” of clustered clones on the other strand. This is because of the way DNA is sheared and repaired for library construction.
Figure: The red circle shows the build-up of clones. The purple line shows the “smear” on the opposite strand.

DNA Nicks
Again, primer walks are needed to verify the nick. Primer walks on one strand will be unaffected (those that use the non-nick strand as template), and walks on the other strand will die suddenly with a glorious A.
Figures: Primer walk with the non-nick strand as template; primer walk with the nick strand as template.

DNA Nicks
If a nick is present in only 50% of DNA molecules, it will look almost identical to an end of a terminally repetitive genome. The easiest way to distinguish them is to treat the DNA with ligase, which will repair a nick but not (remember from earlier!) an end.
Figure: The same primer on ligated and unligated DNA. The signal is restored after ligation, so this must be a nick, not an end!
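The ligase test boils down to a small decision rule on the signal ratio (average peak height after the glorious A divided by the height before it) measured with the same primer on unligated and ligated DNA. The function name `interpret_ligase_test`, the tolerance of 0.15, and the example ratios are assumptions for this sketch.

```python
def interpret_ligase_test(ratio_unligated: float, ratio_ligated: float,
                          tol: float = 0.15) -> str:
    """Interpret a half-strength drop using the ligase comparison.

    Each ratio is (mean signal after the glorious A) / (mean signal before it).
    """
    if abs(ratio_unligated - 0.5) > tol:
        return "not a half-strength drop"
    if abs(ratio_ligated - 1.0) <= tol:
        return "nick"  # ligase repaired it, so the signal is restored
    if abs(ratio_ligated - ratio_unligated) <= tol:
        return "end of terminal repeat"  # unchanged by ligase
    return "inconclusive"


print(interpret_ligase_test(0.5, 1.0))   # nick
print(interpret_ligase_test(0.5, 0.52))  # end of terminal repeat
```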