Download presentation
Presentation is loading. Please wait.
Published byDaniela Kelley Modified over 9 years ago
1
Novel computational methods for large scale genome comparison PhD Director: Dr. Xavier Messeguer Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya Novel computational methods for large scale genome comparisons of related species DOCTORAL THESIS, 2008 Todd James Treangen Novel computational methods for large scale genome comparison Todd James Treangen The current wealth of available genomic data embodies a wide variety of species, spanning all domains of life, thus providing an unprecedented opportunity to compare and contrast evolutionary histories of closely and distantly related organisms. The identification of homologous DNA is a fundamental building block of comparative genomic and molecular evolution studies. To date, sequence alignment has proven to be a versatile tool for comparing closely and distantly related organisms, nucleotide at a time. For pairwise comparisons, optimal global and local alignment for a given scoring scheme requires O (m x n) time and space with respect to the lengths m,n of the genomes. This computational bottleneck becomes exponential for multiple genomes, O (n k ) for k genomes, making large scale multiple genome alignment a daunting task using traditional methods. The focus of this dissertation is on developing novel algorithms and software for efficient global and local comparison of multiple genomes and the application of these methods for a biologically relevant case study. The thesis research is organized following the main focus into three successive phases, specifically: (1) multiple genome alignment of closely related species, (2) local multiple alignment of interspersed repeats, and finally, (3) a comparative genomics case study of DNA uptake sequences (DUS) in Nesseria. In Phase 1, we first develop an efficient algorithm and data structure for maximal substring match search in multiple genome sequences. Our approach for searching for maximal unique substrings yields significant improvements, in both time and space, over existing methods when dealing with multiple genome sequences. Specifically, given S 1 … S m genome sequences (where S 1 is the smallest genome), we are able to find the maximal unique substrings among all of the sequence in linear time O(|S 1 |+ … + S m |) and linear space O(|S 1 |). We then implement these contributions into an interactive multiple genome comparison and alignment tool, M-GCAT, which can efficiently construct multiple genome comparison frameworks in closely related species. In Phase 2, we present a novel computational method for local multiple alignment of interspersed repeats. Our method for local alignment of interspersed repeats features a novel method for gapped extensions of chained seed matches, joining global multiple alignment with a homology test based on a hidden Markov model. We implement our method for local alignment of interspersed repeats into the open source procrastAlign software. In Phase 3, using the results from the previous two phases we perform a case study of neisserial genomes by tracking the propagation of repeat sequence elements in attempt to understand why the important pathogens of the neisserial group have sexual exchange of DNA by natural transformation. In conclusion, our global contributions in this dissertation have focused on comparing and contrasting evolutionary histories of related organisms via their genomes. We have proposed novel data structures, algorithms, and software for active areas of research in comparative genomics. We have experimentally demonstrated the efficiency and accuracy of the computational methods presented in this thesis for multiple alignment, and have implemented all data structures and algorithms in freely available software.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.