Bioinformatics (4) Sequence Analysis. figure NA1: Common & simple DNA2: the last 5000 generations Sequence Similarity and Homology.

Slides:



Advertisements
Similar presentations
Sequence comparison: Dynamic programming Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Advertisements

Pairwise Sequence Alignment Sushmita Roy BMI/CS 576 Sushmita Roy Sep 10 th, 2013 BMI/CS 576.
1 Applications of Dynamic Programming zTo sequence analysis Shotgun sequence assembly Multiple alignments Dispersed & tandem repeats Bird song alignments.
Alignment methods Introduction to global and local sequence alignment methods Global : Needleman-Wunch Local : Smith-Waterman Database Search BLAST FASTA.
Introduction to Bioinformatics Burkhard Morgenstern Institute of Microbiology and Genetics Department of Bioinformatics Goldschmidtstr. 1 Göttingen, March.
Sequence Similarity Searching Class 4 March 2010.
Sequence Alignment Storing, retrieving and comparing DNA sequences in Databases. Comparing two or more sequences for similarities. Searching databases.
1 DNA1: Last week's take-home lessons Types of mutants Mutation, drift, selection Binomial for each Association studies  2 statistic Linked & causative.
Heuristic alignment algorithms and cost matrices
Expected accuracy sequence alignment
Overview of sequence database searching techniques and multiple alignment May 1, 2001 Quiz on May 3-Dynamic programming- Needleman-Wunsch method Learning.
1 DNA1: (Last week) Types of mutants Mutation, drift, selection Binomial for each Association studies  2 statistic Linked & causative alleles Alleles,
Alignment methods and database searching April 14, 2005 Quiz#1 today Learning objectives- Finish Dotter Program analysis. Understand how to use the program.
Sequence Alignment Bioinformatics. Sequence Comparison Problem: Given two sequences S & T, are S and T similar? Need to establish some notion of similarity.
Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal and MULTICLUSTAL Arunesh Mishra CMSC 838 Presentation Authors : Dmitri Mikhailov,
Tue Sep 18 Intro 1: Computing, statistics, Perl, Mathematica Tue Sep 25 Intro 2: Biology, comparative genomics, models & evidence, applications Tue Oct.
Alignment methods June 26, 2007 Learning objectives- Understand how Global alignment program works. Understand how Local alignment program works.
Pairwise Alignment Global & local alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.
Similar Sequence Similar Function Charles Yan Spring 2006.
Sequence Alignment Oct 9, 2002 Joon Lee Genomics & Computational Biology.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003.
BNFO 602 Multiple sequence alignment Usman Roshan.
Sequence analysis of nucleic acids and proteins: part 1 Based on Chapter 3 of Post-genome Bioinformatics by Minoru Kanehisa, Oxford University Press, 2000.
Finding the optimal pairwise alignment We are interested in finding the alignment of two sequences that maximizes the similarity score given an arbitrary.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Alignment II Dynamic Programming
Bioinformatics Workshop, Fall 2003 Algorithms in Bioinformatics Lawrence D’Antonio Ramapo College of New Jersey.
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.
Incorporating Bioinformatics in an Algorithms Course Lawrence D’Antonio Ramapo College of New Jersey.
Sequence alignment, E-value & Extreme value distribution
From Pairwise Alignment to Database Similarity Search.
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
Sequence comparison: Local alignment
TM Biological Sequence Comparison / Database Homology Searching Aoife McLysaght Summer Intern, Compaq Computer Corporation Ballybrit Business Park, Galway,
Developing Pairwise Sequence Alignment Algorithms
Sequence Alignment.
Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch , Ch 5.1, get what you can.
Pair-wise Sequence Alignment What happened to the sequences of similar genes? random mutation deletion, insertion Seq. 1: 515 EVIRMQDNNPFSFQSDVYSYG EVI.
Space-Efficient Sequence Alignment Space-Efficient Sequence Alignment Bioinformatics 202 University of California, San Diego Lecture Notes No. 7 Dr. Pavel.
CISC667, S07, Lec5, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Pairwise sequence alignment Needleman-Wunsch (global alignment)
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
Lecture 6. Pairwise Local Alignment and Database Search Csc 487/687 Computing for bioinformatics.
BLAST Anders Gorm Pedersen & Rasmus Wernersson. Database searching Using pairwise alignments to search databases for similar sequences Database Query.
CISC667, F05, Lec9, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Sequence Database search Heuristic algorithms –FASTA –BLAST –PSI-BLAST.
Multiple Sequence Alignments Craig A. Struble, Ph.D. Department of Mathematics, Statistics, and Computer Science Marquette University.
Bioinformatics Multiple Alignment. Overview Introduction Multiple Alignments Global multiple alignment –Introduction –Scoring –Algorithms.
Chapter 3 Computational Molecular Biology Michael Smith
Expected accuracy sequence alignment Usman Roshan.
Pairwise Sequence Alignment Part 2. Outline Summary Local and Global alignments FASTA and BLAST algorithms Evaluating significance of alignments Alignment.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Doug Raiford Phage class: introduction to sequence databases.
Protein Sequence Alignment Multiple Sequence Alignment
Techniques for Protein Sequence Alignment and Database Searching G P S Raghava Scientist & Head Bioinformatics Centre, Institute of Microbial Technology,
Aligning Genomes Genome Analysis, 12 Nov 2007 Several slides shamelessly stolen from Chr. Storm.
INTRODUCTION TO BIOINFORMATICS
The ideal approach is simultaneous alignment and tree estimation.
Sequence comparison: Dynamic programming
BLAST Anders Gorm Pedersen & Rasmus Wernersson.
Sequence comparison: Local alignment
Bioinformatics: The pair-wise alignment problem
Sequence Alignment 11/24/2018.
Pairwise sequence Alignment.
Intro to Alignment Algorithms: Global and Local
BCB 444/544 Lecture 7 #7_Sept5 Global vs Local Alignment
Bioinformatics Lecture 2 By: Dr. Mehdi Mansouri
Basic Local Alignment Search Tool
Sequence alignment, E-value & Extreme value distribution
Presentation transcript:

Bioinformatics (4) Sequence Analysis

figure NA1: Common & simple DNA2: the last 5000 generations Sequence Similarity and Homology

Alignments & Scores Global (e.g. haplotype) ACCACACA ::xx::x: ACACCATA Score= 5(+1) + 3(-1) = 2 Suffix (shotgun assembly) ACCACACA ::: ACACCATA Score= 3(+1) =3 Local (motif) ACCACACA :::: ACACCATA Score= 4(+1) = 4

"Hardness" of (multi-) sequence alignment Align 2 sequences of length N allowing gaps. ACCAC-ACA ACCACACA ::x::x:x: :xxxxxx: AC-ACCATA, A-----CACCATA, etc. 2N gap positions, gap lengths of 0 to N each: A naïve algorithm might scale by O(N 2N ). For N= 3x10 9 this is rather large. Now, what about k>2 sequences? or rearrangements other than gaps?

Increasingly complex (accurate) searches Exact (stringsearch) CGCG Regular expression (PrositeSearch) CGN{0-9}CG = CGAACG Substitution matrix (BlastN) CGCG ~= CACG Profile matrix (PSI-blast) CGc(g/a) ~ = CACG Gaps (Gap-Blast) CGCG ~= CGAACG Dynamic Programming (NW, SM) CGCG ~= CAGACG

Pearson WR Protein Sci 1995 Jun;4(6): Comparison of methods for searching protein sequence databases. Methods Enzymol 1996;266: Effective protein sequence comparison. Algorithm: FASTA, Blastp, Blitz Substitution matrix:PAM120, PAM250, BLOSUM50, BLOSUM62 Database: PIR, SWISS-PROT, GenPept Comparisons of homology scores

Scoring matrix based on large set of distantly related blocks: Blosum62

Scoring Functions and Alignments Scoring function: –  (match) = +1; or substitution matrix –  (mismatch) = -1; " –  (indel) = -2; –  (other) = 0. Alignment score: sum of columns. Optimal alignment: maximum score.

Calculating Alignment Scores

What is dynamic programming? A dynamic programming algorithm solves every subsubproblems just once and then saves its answer in a table, avoiding the work of recomputing the answer every time the subsubproblem is encountered. -- Cormen et al. "Introduction to Algorithms", The MIT Press.

Pairwise sequence alignment by the dynamic programming algorithm. The algorithm involves finding the optimal path in the path matrix. (a), which is equivalent to searching the optimal solution in the search tree (b). (a) Path Matrix(b) Search Tree AIMS A M O S Alignment AIM-S A-MOS Pruning by an optimization function XX

Methods for computing the optimal score in the dynamic programming algorithm (a ) the gap penalty is a constant. (b) the gap penalty is a linear function of the gap length. (a) (b) D i, j-l d D i-1, j D i-1, j-1 D i-1, j D i, j-l d w s(i), t(j) D i,j D i, j (2) b w s(i), t(j) D i,j (1) D i,j (3) b

Recursion of Optimal Global Alignments

Concepts of global and local optimality in the pairwise sequence alignment. The distinction is made as to how the initial values are assigned to the path matrix. (a) Global vs. Global (b) Local vs. Global X (c) Local vs. Local

Recursion of Optimal Local Alignments

The dynamic programming algorithm can be applied to limited areas, rather than to the entire matrix, after rapidly searching the diagonals that contain candidate markers. n 1 m m n +m -1 j 1 1 i l l

Computing Row-by-Row

Traceback Optimal Global Alignment

Local and Global Alignments

Time and Space Complexity of Computing Alignments

The order of computing matrix elements in the path matrix, which is suitable for (a) sequential processing and (b) parallel processing. (I, j -1) (i, j) (i +1, j-1) (i +1, j ) (i -1, j -1) (i -1, j ) (a) (i, j -2) (i, j -1) (i, j) (i+1, j -2) (i +1, j -1)(i -1, j -1) (i -1, j ) (b)

Time and Space Problems Comparing two one-megabase genomes. Space: –An entry: 4 bytes; –Table: 4 * 10^6 * 10^6 = 4 G bytes memory. Time: –1000 MHz CPU: 1M entries/second; –10^12 entries: 1M seconds = 10 days.

A Multiple Alignment of Immunoglobulins

A multiple alignment Dynamic programming on a hyperlattice From G. Fullen, 1996.

Computing a Node on Hyperlattice V S A