Bellerophon: a program to detect chimeric sequences in multiple sequence alignments

Thomas Huber † and Philip Hugenholtz #
# DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
† ComBinE group, Advanced Computational Modelling Centre, The University of Queensland, Brisbane 4072, Australia

Introduction: A PCR-generated chimeric sequence usually comprises two phylogenetically distinct parent sequences. It arises when a prematurely terminated amplicon reanneals to a foreign DNA strand and is copied to completion in the following PCR cycles. The point at which the chimeric sequence changes from one parent to the next is called the conversion, recombination or break point. Chimeras are problematic in culture-independent surveys of microbial communities because they suggest the presence of non-existent organisms (von Wintzingerode et al., 1997). Several methods have been developed for detecting chimeric sequences (Cole et al., 2003; Komatsoulis and Waterman, 1997; Liesack et al., 1991; Robinson-Cox et al., 1995); these generally rely on direct comparison of individual sequences to one or two putative parent sequences at a time. Here we present an alternative approach based on how well sequences fit into their complete phylogenetic context.

References:
Hugenholtz, P. and Huber, T. (2003) Chimeric 16S rDNA sequences of diverse origin are accumulating in the public databases. Int. J. Syst. Evol. Microbiol., 53, 289–293.
Huber, T., Faulkner, G. and Hugenholtz, P. (2004) Bellerophon: a program to detect chimeric sequences in multiple sequence alignments. Bioinformatics, –2319.

Parent sequences are assigned to each putative chimera by selecting the two sequences with the highest opposing paired distance contributions (dm[i][j]) to the dme at the optimal break point.
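One possible reading of this selection rule is sketched below in Python. It assumes that "opposing" refers to the sign of the difference between the left-fragment and right-fragment distances: the parent of the left half is close to the chimera on the left but distant on the right, and the reverse holds for the parent of the right half. The function name `assign_parents`, the matrix names `dm_left` and `dm_right`, and the toy distance values are all hypothetical and are not taken from Bellerophon itself.

```python
def assign_parents(chimera, dm_left, dm_right):
    """Return (left_parent, right_parent) indices for a putative chimera.

    Assumption: the signed difference dm_left - dm_right is most negative
    for the parent of the left half and most positive for the parent of
    the right half.
    """
    others = [j for j in range(len(dm_left)) if j != chimera]
    signed = {j: dm_left[chimera][j] - dm_right[chimera][j] for j in others}
    left_parent = min(signed, key=signed.get)   # near on the left, far on the right
    right_parent = max(signed, key=signed.get)  # far on the left, near on the right
    return left_parent, right_parent


# Toy symmetric distance matrices for four sequences; sequence 3 is the chimera.
dm_left = [
    [0.0,   0.25, 0.5, 0.125],
    [0.25,  0.0,  0.5, 0.25],
    [0.5,   0.5,  0.0, 0.5],
    [0.125, 0.25, 0.5, 0.0],
]
dm_right = [
    [0.0,   0.25, 0.375, 0.5],
    [0.25,  0.0,  0.5,   0.5],
    [0.375, 0.5,  0.0,   0.125],
    [0.5,   0.5,  0.125, 0.0],
]

parents = assign_parents(3, dm_left, dm_right)
print(parents)
```

With these toy values, sequence 0 is selected as the parent of the left half and sequence 2 as the parent of the right half.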
The parent sequences of a chimera are most likely to be found in the same PCR-clone library, so as many sequences as possible from that library should be included in the analysis. However, even if the exact parent sequences of a given chimera are not present in the dataset, Bellerophon will identify and report the closest phylogenetic neighbours of the parents. In addition, the output from Bellerophon includes the location of the optimal break point relative to an Escherichia coli reference alignment (Brosius et al., 1978) and the percentage identities of the parent sequences to the chimera on either side of the break point. These features aid in verification of chimeras.

Mutually incompatible chimeras are screened from the Bellerophon output. That is, once a sequence (A) has been identified as chimeric, subsequent putative chimeras with lower preference scores that identify sequence A as one of their parents are removed from the output list.

Method: Bellerophon detects chimeras based on a partial treeing approach (Wang and Wang, 1997; Hugenholtz and Huber, 2003), i.e. phylogenetic trees are inferred from independent regions (fragments) of a multiple sequence alignment and the branching patterns are compared for incongruencies that may be indicative of chimeric sequences. No trees are actually built during the procedure; the only calculations required are distance (sequence similarity) calculations. A full matrix of distances (dm) between all pairs of sequences is calculated for the fragments left and right of an assumed break point. The total absolute deviation of the distance matrices (distance matrix error, dme) of n sequences is then

$$ dme = \sum_{i=1}^{n} \sum_{j=1}^{n} \left| dm_{l}[i][j] - dm_{r}[i][j] \right| \qquad (1) $$

where dm[i][j] denotes the distance between two sequences i and j, and the subscripts l and r mark the fragments left and right of the break point. The largest contribution to the dme is expected to arise from chimeras, since fragments from these sequences have distinctly different locations relative to all other sequences in the dataset, and therefore distinctly different distance matrices.
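As a minimal illustration of the dme calculation, the Python sketch below computes the two distance matrices for one assumed break point and sums their absolute differences. It assumes a plain p-distance (the fraction of mismatching alignment columns), which is a stand-in because the distance measure is not specified here. The function names and the toy sequences are hypothetical.

```python
def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length fragments differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(fragments):
    """Full n x n matrix of pairwise p-distances."""
    n = len(fragments)
    return [[p_distance(fragments[i], fragments[j]) for j in range(n)]
            for i in range(n)]


def distance_matrix_error(seqs, break_point, window):
    """Total absolute deviation between the left and right distance matrices."""
    # Equally sized windows immediately left and right of the break point.
    left = [s[break_point - window:break_point] for s in seqs]
    right = [s[break_point:break_point + window] for s in seqs]
    dm_left = distance_matrix(left)
    dm_right = distance_matrix(right)
    n = len(seqs)
    return sum(abs(dm_left[i][j] - dm_right[i][j])
               for i in range(n) for j in range(n))


# Toy aligned sequences: the third joins the left half of the first
# to the right half of the second.
seqs = [
    "AAAAAAAACCCCCCCC",
    "GGGGGGGGTTTTTTTT",
    "AAAAAAAATTTTTTTT",
]

print(distance_matrix_error(seqs, break_point=8, window=8))      # 4.0
print(distance_matrix_error(seqs[:2], break_point=8, window=8))  # 0.0
```

In this toy case the two non-chimeric sequences alone give a dme of zero, and the whole error comes from the pairs that involve the chimera.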
To rank the sequences by their contribution to the dme value, we calculate the ratio of the dme value from all sequences over the dme value (dme[i]) of a reference dataset lacking the sequence i under consideration. This ratio is called the preference score of the sequence. Chimeric sequences will have a preference score >1, whereas non-chimeric sequence scores are expected to be ~1.

To detect all putative chimeras in a dataset, preference scores have to be calculated for all sequences. Naively, the calculation would require a computationally expensive distance matrix comparison for each sequence in the dataset. This can, however, be implemented more efficiently by taking advantage of previously performed calculations. Because the calculation of the dme involves column sums of the form

$$ \sum_{j=1}^{n} \left| dm_{l}[i][j] - dm_{r}[i][j] \right| $$

and because the distance matrices are symmetric and the distances between identical sequences dm[i][i] are by definition zero, equation (1) can be rewritten for the reduced dataset as:

$$ dme[i] = dme - 2 \sum_{j=1}^{n} \left| dm_{l}[i][j] - dm_{r}[i][j] \right| $$

which only involves calculation of a single matrix and some intermediate storage of the column sums.

To determine the optimal break point for putative chimeras, all sequences are scanned along their length by dividing the alignment into fragment pairs at 10-character intervals. Distances are calculated from equally sized windows (200, 300 or 400 characters) of the fragments left and right of the break point to obtain similar signal-to-noise ratios for each fragment. The highest preference score calculated for each sequence over all fragment pairs indicates the optimal break point. Sequences are ranked according to their highest recorded preference score and reported as potentially chimeric if that score is >1. Absolute preference scores are dataset-dependent and should only be used for relative ranking of putative chimeras within a given dataset.

For manual confirmation of identified chimeras and phylogenetic placement of the chimeric halves, it is necessary to specify the most likely parent sequences in the dataset that gave rise to the chimera.
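The preference-score calculation described above can be sketched in Python as follows. The sketch assumes symmetric distance matrices with a zero diagonal and compares the column-sum shortcut with the naive recomputation for each reduced dataset. The function names and the toy matrices are hypothetical, and returning infinity when the reduced dme is zero is a choice made for this example only.

```python
def preference_scores(dm_left, dm_right):
    """Preference score dme / dme[i] for every sequence, using column sums."""
    n = len(dm_left)
    col_sums = [sum(abs(dm_left[i][j] - dm_right[i][j]) for j in range(n))
                for i in range(n)]
    dme = sum(col_sums)
    scores = []
    for i in range(n):
        # Dropping sequence i removes row i and column i of the deviation
        # matrix; symmetry and the zero diagonal give the factor of two.
        dme_i = dme - 2 * col_sums[i]
        scores.append(dme / dme_i if dme_i > 0 else float("inf"))
    return scores


def preference_scores_naive(dm_left, dm_right):
    """Same scores, recomputing the dme from scratch for each reduced dataset."""
    n = len(dm_left)

    def dme_of(keep):
        return sum(abs(dm_left[a][b] - dm_right[a][b])
                   for a in keep for b in keep)

    total = dme_of(list(range(n)))
    scores = []
    for i in range(n):
        reduced = dme_of([k for k in range(n) if k != i])
        scores.append(total / reduced if reduced > 0 else float("inf"))
    return scores


# Toy symmetric distance matrices for four sequences; sequence 3 is the chimera.
dm_left = [
    [0.0,   0.25, 0.5, 0.125],
    [0.25,  0.0,  0.5, 0.25],
    [0.5,   0.5,  0.0, 0.5],
    [0.125, 0.25, 0.5, 0.0],
]
dm_right = [
    [0.0,   0.25, 0.375, 0.5],
    [0.25,  0.0,  0.5,   0.5],
    [0.375, 0.5,  0.0,   0.125],
    [0.5,   0.5,  0.125, 0.0],
]

scores = preference_scores(dm_left, dm_right)
print(scores)
```

With these toy values the chimera (sequence 3) receives a score of 9.0, the highest in the set. In a dataset this small every score lies above 1, so only the relative ranking is informative.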
Usage: More than 275 users worldwide
To date (10 November 2004), Bellerophon has been used by more than 275 researchers worldwide (Fig. 1) to detect chimeric sequences in more than 2500 PCR clone libraries. Fig. 2 shows the total number of monthly requests processed by Bellerophon. The screening of approximately 250 clone libraries for chimeric sequences each month is a direct reflection of Bellerophon’s popularity. This should be seen in the context of the importance of 16S marker genes in molecular microbial biology for identifying new species in microbial communities, and of the experimental time involved in generating a single PCR clone library from an environmental sample.

Fig. 1: User locations.
Fig. 2: Server usage.

Summary: Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaptation of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments.

Availability: Bellerophon is available as an interactive web server at

Contact: