Probabilistic Global Alignment of DNA Under Multiple Conservation Models
Chuong B. Do, Michael Brudno, Serafim Batzoglou
Department of Computer Science, Stanford University, California, USA
{chuongdo, brudno, serafim}@cs.stanford.edu

Approximate logarithmic gap penalty
Affine gap functions are linear in gap length, γ(x) = αx + β, so a single choice of α and β cannot model both long gaps and small indels well. Logarithmic gap penalties handle both problems by penalizing small indels linearly but reducing the additional penalty as gaps grow larger. True logarithmic gap penalties increase alignment runtime by a factor of the logarithm of the length of the longer sequence, but they can be approximated using multiple affine penalties.

Why another aligner?
Comparative analysis of genomic sequences from different organisms has long been considered a powerful method for inferring biological function. Most standard global nucleotide aligners use some variation of, or approximation to, the basic Needleman-Wunsch algorithm for modeling evolutionary changes. When organisms have diverged sufficiently, however, simple DNA-level conservation can provide too weak a signal for reliable comparison, leading to inaccurate alignments. Better results for distantly related species can be achieved by combining several alignment models that account for the local characteristics of sequence regions during alignment. We present LAGAN2, a multi-state probabilistic extension of the LAGAN alignment method for pairwise sequence comparison.

Given a pair of sequences X and Y, LAGAN proceeds in three steps:
Step One: Generate local alignments between X and Y.
Step Two: Construct a rough global map by O(n log n) chaining of the local alignments.
Step Three: Compute the global alignment, restricted to the areas surrounding the chained alignments.
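Step Two can be illustrated with a generic chaining routine. The sketch below is not LAGAN's code: it assumes each local alignment from Step One has been reduced to a hypothetical `Anchor` with inclusive start and end coordinates in X and Y and a positive weight, and it finds a maximum-weight chain of anchors that are strictly increasing in both sequences. Sweeping the anchors by their X coordinates and keeping the best chain score in a Fenwick (binary indexed) tree keyed by Y end coordinate gives O(n log n) time for n anchors, matching the bound stated for this step.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Tuple


class Anchor(NamedTuple):
    """Hypothetical local alignment: inclusive spans [x1, x2] in X and [y1, y2] in Y."""
    x1: int
    x2: int
    y1: int
    y2: int
    weight: float  # assumed positive


class MaxFenwick:
    """Fenwick tree over 1-based positions supporting prefix-maximum queries."""

    def __init__(self, size: int) -> None:
        # (score, anchor index); (0.0, -1) means "no predecessor"
        self.tree: List[Tuple[float, int]] = [(0.0, -1)] * (size + 1)

    def update(self, pos: int, value: Tuple[float, int]) -> None:
        while pos < len(self.tree):
            if value > self.tree[pos]:
                self.tree[pos] = value
            pos += pos & -pos

    def query(self, pos: int) -> Tuple[float, int]:
        """Maximum over positions 1..pos."""
        best = (0.0, -1)
        while pos > 0:
            if self.tree[pos] > best:
                best = self.tree[pos]
            pos -= pos & -pos
        return best


def chain_anchors(anchors: List[Anchor]) -> List[int]:
    """Indices of a maximum-weight chain of anchors increasing in both X and Y."""
    if not anchors:
        return []
    ys = sorted({a.y2 for a in anchors})  # compressed Y end coordinates
    tree = MaxFenwick(len(ys))
    score = [0.0] * len(anchors)
    prev = [-1] * len(anchors)

    # kind 0 = start, 1 = end; on equal X, starts come first, so an anchor
    # ending at X = c cannot precede one starting at X = c
    events = sorted([(a.x1, 0, i) for i, a in enumerate(anchors)]
                    + [(a.x2, 1, i) for i, a in enumerate(anchors)])

    for _, kind, i in events:
        a = anchors[i]
        if kind == 0:
            # best chain whose last anchor ends strictly before a.y1 in Y
            best, j = tree.query(bisect_left(ys, a.y1))
            score[i] = a.weight + best
            prev[i] = j
        else:
            # anchor i is finished in X, so later starts may chain onto it
            tree.update(bisect_left(ys, a.y2) + 1, (score[i], i))

    # backtrack from the highest-scoring anchor
    i = max(range(len(anchors)), key=score.__getitem__)
    chain = []
    while i != -1:
        chain.append(i)
        i = prev[i]
    return chain[::-1]
```

For example, given anchors 0 = (0, 9, 0, 9, 10), 1 = (20, 29, 15, 24, 8), 2 = (5, 30, 5, 30, 12) and 3 = (40, 49, 35, 44, 5), the third overlaps the first two, and `chain_anchors` returns [0, 1, 3] with total weight 23. In a full pipeline, Step Three would then restrict its dynamic programming to a neighborhood of the returned chain.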
[Figure: LAGAN alignment in a nutshell, illustrating steps 1, 2 and 3]

Multiple Conservation Models
[Figure: classical three-state alignment model with states M (match), GX (gap in sequence X) and GY (gap in sequence Y)]
[Figure: LAGAN2 model with a nonconserved state (N-M/G); conserved noncoding states (CN-M, CN-GX, CN-GY, CN-GX′, CN-GY′); and conserved coding states (CC-M, CC-GX, CC-GY) for both the forward and the reverse strand]

Traditional dynamic programming-based alignment methods (such as LAGAN) maximize the score for aligning sequences under a scoring matrix for nucleotide matches and an affine gap penalty, as depicted in the three-state finite-state machine above. Such techniques do not exploit biological signals that could provide stronger evidence for alignment; however, the computational cost of more sophisticated, biologically motivated alignment models has generally been prohibitive for sequences longer than 1 megabase. In LAGAN2, we introduce an extended model for more accurate nucleotide-level pairwise alignment of such long genomic DNA.

Amino acid level conservation
Met Glu Val Leu Phe Tyr Ser Asp
ATG GAG GTG CTG TTC TAT TCA GAT
ATG GAA GTC CTC --- TAC AGC GAC
Met Glu Val Leu     Tyr Ser Asp

While the correct alignment is clear in the translated sequences, low nucleotide similarity (see serine, TCA versus AGC) poses a challenge for traditional aligners. Note also that the trailing TC of the leucine codon CTC in the second sequence stays aligned within its codon, followed by an in-frame gap, whereas a regular aligner would shift the TC to the right to match the TC of the phenylalanine codon TTC, generating an out-of-frame gap.

Choosing α and β for an affine gap penalty γ(x) = αx + β:
When α is low and β is high, the few large gaps introduced correctly align long conserved stretches, while small indels are missed.
When α is high and β is low, small insertions and deletions are modeled correctly, but overall alignment quality drops as broad areas of conservation are broken into small nonorthologous regions of misleadingly high nucleotide identity.
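The approximation of a logarithmic gap penalty by multiple affine penalties can be sketched as follows. This is an illustration under stated assumptions, not LAGAN2's implementation: the penalty d + c·ln(L), its constants c and d, the tangent construction in `tangent_pieces`, and the ±1 match/mismatch scores are all hypothetical choices. Each affine piece (α_k, β_k) charges α_k·L + β_k for a gap of length L. Giving every piece its own pair of gap states in a Gotoh-style dynamic program lets the maximization pick the cheapest piece for each gap, so the effective penalty is the minimum over pieces, at a cost of O(K·n·m) time for K pieces rather than the extra logarithmic factor of a true logarithmic penalty.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")
Piece = Tuple[float, float]  # (alpha, beta): a gap of length L costs alpha * L + beta


def log_penalty(length: int, c: float = 4.0, d: float = 6.0) -> float:
    """Hypothetical logarithmic gap penalty d + c * ln(length), for length >= 1."""
    return d + c * math.log(length)


def tangent_pieces(lengths: Sequence[int], c: float = 4.0, d: float = 6.0) -> List[Piece]:
    """Affine pieces tangent to d + c * ln(L) at the chosen gap lengths.

    ln is concave, so each tangent lies on or above the curve; their minimum is a
    piecewise-linear upper approximation that touches the curve at each length."""
    return [(c / L0, d + c * math.log(L0) - c) for L0 in lengths]


def gap_cost(length: int, pieces: Sequence[Piece]) -> float:
    """Effective penalty: the cheapest affine piece for this gap length."""
    return min(alpha * length + beta for alpha, beta in pieces)


def align_score(x: str, y: str, pieces: Sequence[Piece],
                match: float = 1.0, mismatch: float = -1.0) -> float:
    """Global alignment score in which every gap run is charged by one affine piece."""
    n, m = len(x), len(y)

    def grid() -> List[List[float]]:
        return [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    mat = grid()                   # x[i-1] aligned with y[j-1]
    gx = [grid() for _ in pieces]  # y[j-1] opposite a gap in X, using piece k
    gy = [grid() for _ in pieces]  # x[i-1] opposite a gap in Y, using piece k
    mat[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                prev = max([mat[i - 1][j - 1]] + [g[i - 1][j - 1] for g in gx + gy])
                mat[i][j] = prev + (match if x[i - 1] == y[j - 1] else mismatch)
            for k, (alpha, beta) in enumerate(pieces):
                if i > 0:  # open a gap in Y (after M or a gap in X), or extend piece k
                    opener = max([mat[i - 1][j]] + [g[i - 1][j] for g in gx])
                    gy[k][i][j] = max(opener - alpha - beta, gy[k][i - 1][j] - alpha)
                if j > 0:  # open a gap in X (after M or a gap in Y), or extend piece k
                    opener = max([mat[i][j - 1]] + [g[i][j - 1] for g in gy])
                    gx[k][i][j] = max(opener - alpha - beta, gx[k][i][j - 1] - alpha)

    return max([mat[n][m]] + [g[n][m] for g in gx + gy])
```

With the hypothetical pieces [(4.0, 2.0), (0.5, 10.0)], the first piece makes short gaps cheap and the second makes long gaps cheap: aligning twenty A's against two A's charges 19 for the 18-long gap instead of 74 under the first piece alone, so the score is -17.0 rather than -72.0. The separate short-gap and long-gap conserved noncoding states (CN-GX, CN-GX′) follow a similar idea, although their parameters are not given here.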
Key to the LAGAN2 model states:
N-M/G: nonconserved match/gap
CN-M: conserved noncoding match
CN-GX, CN-GY: conserved noncoding short gap in X, in Y
CN-GX′, CN-GY′: conserved noncoding long gap in X, in Y
CC-M, CC-GX, CC-GY: conserved coding match, gap in X, gap in Y (one set each for the forward and the reverse strand)

[Figure: gap penalty γ(x) plotted against gap length x]
[Figure: Classical alignment models]

The Nonconserved State
The nonconserved state allows us to model regions in which low or nonexistent conservation fails to produce a meaningful alignment; with such a state, we can prevent overprediction of aligned regions and suggest boundaries for local conservation.

Results
The systems were tested on sequence from the cystic fibrosis transmembrane conductance regulator (CFTR) region in 12 species. Pairwise alignments were performed between the human sequence and each of the other 11 species, including baboon, cat, chicken, chimp, cow, dog, fugu, pig, rat and zebrafish. Representative species illustrate the distribution of predicted states per nucleotide (Table 2). Despite the low frequencies of the conserved coding states, exon alignment accuracy remained high in all species, as measured by the percentage of exon length covered in the alignments (Table 1).

Table 1. Exon coverage and nucleotide-level prediction accuracy
(Columns 2 to 5 report exon coverage; columns 6 and 7 report prediction accuracy.)
Organism  | <70% correct | 70%+ correct | 90%+ correct | 100% correct | Exon sensitivity | Exon specificity
Fugu      | 5%           | 99%          | 87%          | 81%          | 86.7%            | 63.5%
Zebrafish | 0%           | 100%         | 98%          | 94%          | 97.6%            | 57.4%
Chicken   | 0%           | 100%         |              | 99%          | 88.1%            | 79.6%

Table 2. Representative state distributions (percentage of nucleotides assigned to each state)
Organism  | Nonconserved | Conserved noncoding | Conserved coding
Fugu      | 99.081%      | 0.053%              | 0.867%
Chicken   | 99.080%      | 0.553%              | 0.368%
Mouse     | 59.834%      | 39.321%             | 0.845%

Thanks to Eugene Davydov and Marina Sirota for useful conversations.

References
Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, NISC Comparative Sequencing Program, Green ED, Sidow A, Batzoglou S. LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA. Genome Research, 2003 Apr; 13(4):721-731.
Kent WJ, Zahler AM. Conservation, Regulation, Synteny, and Introns in a Large-Scale C. briggsae-C. elegans Genomic Alignment. Genome Research, 2000 Aug; 10(8):1115-1125.
Miller W, Myers EW. Sequence comparison with concave weighting functions. Bulletin of Mathematical Biology, 1988; 50(2):97-120.

