Pairwise Sequence Alignment


Pairwise Sequence Alignment LESSON 3(2)

HOMEWORK 2
Try a pairwise alignment of human alpha and beta globin at the NCBI protein BLAST site, using the available matrices (PAM30, PAM70, PAM250, BLOSUM45, BLOSUM62, BLOSUM80). Which gives the highest bit score?

Protein alignment vs. DNA alignment
Protein alignment can be more informative than DNA alignment. BUT, ……

Percentage identity (% ID)

Without gaps:
CCATCAAGTCC
CCATGTACAGAGTCC
5/15 = 33 %

With gaps:
CCAT---CA-AGTCC
CCATGTACAGAGTCC
11/15 = 73 %
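The two percentages above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `percent_identity` is illustrative, and it assumes the denominator is the length of the alignment (15 columns here), which is what the slide uses. Other definitions of % ID divide by the shorter sequence or by the number of aligned positions.

```python
def percent_identity(row1, row2):
    """Percent identity = identical columns / alignment length * 100."""
    # Assumption: an ungapped comparison is padded with trailing gaps
    # so that both rows have the same number of columns.
    length = max(len(row1), len(row2))
    row1 = row1.ljust(length, "-")
    row2 = row2.ljust(length, "-")
    # A column counts as a match only if both residues are equal and not gaps.
    matches = sum(a == b and a != "-" for a, b in zip(row1, row2))
    return 100.0 * matches / length


print(round(percent_identity("CCATCAAGTCC", "CCATGTACAGAGTCC")))      # 33
print(round(percent_identity("CCAT---CA-AGTCC", "CCATGTACAGAGTCC")))  # 73
```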

CCATCAAGTCC
CCATGTACAGAGTCC

CCAT---CA-AGTCC
CCATGTACAGAGTCC

Dotplot
CCAT---CA-AGTCC
CCATGTACAGAGTCC
[Figure: dot plot of the two sequences; axes labeled with the bases C, A, T, G]
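A dot plot can be sketched in text form as follows. This is only an illustration, not the figure from the slide: the function name `dotplot` and the choice of `*` for a matching pair and `.` for a non-matching pair are assumptions made here. Runs of `*` along a diagonal correspond to stretches that align without gaps.

```python
def dotplot(horizontal, vertical):
    """Return a text dot plot: '*' where the two bases are identical."""
    lines = ["  " + " ".join(horizontal)]  # header row: horizontal sequence
    for v in vertical:
        cells = ["*" if v == h else "." for h in horizontal]
        lines.append(v + " " + " ".join(cells))
    return "\n".join(lines)


print(dotplot("CCATGTACAGAGTCC", "CCATCAAGTCC"))
```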

Scoring Matrices
CCATCAAGTCC
CCATGTACAGA
Identity matrix (e.g. match=1 and mismatch=−1)
Substitution matrix

A transition (a purine becomes another purine, or a pyrimidine becomes another pyrimidine) happens frequently.
A transversion (a purine becomes a pyrimidine, or the reverse) occurs far less frequently.
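The distinction can be expressed as a small classifier. This is a sketch with an illustrative function name (`substitution_type`); it only labels a pair of DNA bases and does not assign scores.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a, b):
    """Classify a pair of DNA bases as identity, transition or transversion."""
    if a == b:
        return "identity"
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine).
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for pair in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(pair, substitution_type(*pair))
```

A DNA substitution matrix can use this distinction to penalize transversions more heavily than transitions.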

Codons are degenerate: changes in the third position often do not alter the amino acid that is specified.
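A small illustration of degeneracy, using only a fragment of the standard genetic code (the eight codons listed are the only ones included; the helper name `third_position_variants` is made up for this sketch):

```python
# Partial standard genetic code (DNA codons), just enough for the example.
CODON_TABLE = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GAT": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu",
}


def third_position_variants(codon):
    """Map each third-position variant of a codon to its amino acid."""
    prefix = codon[:2]
    return {prefix + base: CODON_TABLE[prefix + base] for base in "TCAG"}


print(third_position_variants("GGT"))  # every variant encodes Gly
print(third_position_variants("GAT"))  # variants encode Asp or Glu
```

The second call shows why the slide says "often" rather than "always": for GA*, the third base decides between Asp and Glu.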

DNA alignments are appropriate:
  To confirm
  To study polymorphism
  To study non-coding regions of DNA

DNA alignments for finding regulatory elements in DNA sequences
Non-coding DNA? It is full of regulatory elements, which give rise to the differences between organisms.
Each gene is associated with thousands of nucleotides of non-coding DNA.

Best alignment
Generate all possible gapped alignments. Find the score for each. Select the highest-scoring alignment.
Time consuming: for two sequences of 100 a.a. there are about 10^75 alignments.
Solution: dynamic programming algorithm
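The size of the search space can be checked with a short sketch. It assumes one particular way of counting: every path through the alignment grid built from diagonal, horizontal and vertical steps is treated as a distinct alignment. Other counting conventions give different numbers, so the result should be read as an order of magnitude only. The function name `count_alignments` is illustrative.

```python
def count_alignments(n, m):
    """Count gapped alignments of sequences of lengths n and m.

    Assumption: each alignment ends in one of three ways (residue/residue,
    residue/gap, gap/residue), so
    count(i, j) = count(i-1, j-1) + count(i-1, j) + count(i, j-1).
    """
    prev = [1] * (m + 1)            # row for i = 0: only gaps are possible
    for _ in range(n):
        cur = [1]                   # column j = 0: only gaps are possible
        for j in range(1, m + 1):
            cur.append(prev[j - 1] + prev[j] + cur[j - 1])
        prev = cur
    return prev[m]


print(count_alignments(4, 3))                 # two very short sequences
print(f"{count_alignments(100, 100):.1e}")    # two 100-residue sequences
```

Under this counting convention the value for two 100-residue sequences is on the order of 10^75, which is why exhaustive enumeration is not practical.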

Global Sequence Alignment: Needleman and Wunsch Algorithm

Sequences: GGTT and GAT
Match : +1 Mismatch : -1 Gap : -2

GGTT
GAT-
+1-1+1-2 = -1

GG-TT
-GAT-
-2+1-2+1-2 = -4

GGTT
G-AT
+1-2-1+1 = -1

Introducing gaps greatly increases the number of different comparisons between two sequences and in the general case it is impossible to do them all.
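The three scores can be recomputed column by column. A minimal sketch, assuming the scoring scheme stated on the slide (match +1, mismatch -1, gap -2) and an illustrative function name `score_alignment`:

```python
def score_alignment(row1, row2, match=1, mismatch=-1, gap=-2):
    """Sum the column scores of a gapped pairwise alignment."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have the same length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total += gap        # residue aligned with a gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


for rows in [("GGTT", "GAT-"), ("GG-TT", "-GAT-"), ("GGTT", "G-AT")]:
    print(rows[0], rows[1], score_alignment(*rows))
# GGTT GAT- -1
# GG-TT -GAT- -4
# GGTT G-AT -1
```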

Alignment by Dynamic Programming
Global Alignment: Needleman & Wunsch algorithm (1970), used in major alignment software packages (e.g. the ALIGN tool in the FASTA package)
Local Alignment: Smith & Waterman algorithm (1981)

[Figure labels: “mismatch”, “gap”, “gap”]

Four possible outcomes in aligning two sequences
[1] identity (stay along a diagonal)
[2] mismatch (stay along a diagonal)
[3] gap in sequence 1 (move vertically!)
[4] gap in sequence 2 (move horizontally!)

Global Alignment by Dynamic Programming
Sequences: GGTT (horizontal) and GAT (vertical)
Match : +1 Mismatch : -1 Gap : -2

      -    G    G    T    T
 -
 G
 A
 T

Fill in the matrix using “dynamic programming”

Dynamic programming: the three ways to leave a cell
→ (Rightward) insert gap in vertical sequence
↓ (Downward) insert gap in horizontal sequence
↘ (Diagonal) match or mismatch

      -    G    G    T    T
 -    0   -2   -4   -6   -8
 G   -2
 A   -4
 T   -6

Global Alignment by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2

      -    G    G    T    T
 -    0   -2   -4   -6   -8
 G   -2   +1
 A   -4
 T   -6

Global Alignment by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2

      -    G    G    T    T
 -    0   -2   -4   -6   -8
 G   -2   +1   -1
 A   -4
 T   -6

Candidates for the new cell:
↓ : -4-2 = -6
→ : +1-2 = -1
↘ : -2+1 = -1
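The arithmetic for a single cell can be written as a small helper. This is a sketch: the name `cell_candidates` and its argument names are illustrative, and the scoring values are the ones given on the slide.

```python
def cell_candidates(diag, up, left, a, b, match=1, mismatch=-1, gap=-2):
    """Scores of the three ways to reach a cell from its neighbors."""
    return {
        "diagonal": diag + (match if a == b else mismatch),  # align a with b
        "down": up + gap,      # arriving from the cell above
        "right": left + gap,   # arriving from the cell to the left
    }


# Second cell of the G row: neighbors are -2 (diagonal), -4 (above), +1 (left).
candidates = cell_candidates(diag=-2, up=-4, left=1, a="G", b="G")
print(candidates)                # {'diagonal': -1, 'down': -6, 'right': -1}
print(max(candidates.values()))  # -1
```

The cell receives the maximum of the three candidates. Here the diagonal and rightward moves tie at -1.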

Global Alignment by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2

      -    G    G    T    T
 -    0   -2   -4   -6   -8
 G   -2   +1   -1   -3   -5
 A   -4   -1    0   -2   -4
 T   -6   -3   -2   +1   -1

Final alignment score: -1 (bottom-right cell)

Global Alignment by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2
Traceback pointer: follow the pointers from the bottom-right cell back to the top-left cell.

GGTT
G-AT
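The fill and traceback steps can be put together in a compact sketch. It is not the implementation used by any particular package: the function name `needleman_wunsch`, the linear gap penalty and the tie-breaking order (diagonal first, then left, then up) are choices made here. Because several cells have ties, more than one alignment reaches the optimal score of -1, so the alignment printed may differ from the one on the slide while having the same score.

```python
def needleman_wunsch(horizontal, vertical, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns the score matrix and one optimal alignment (two gapped rows).
    """
    rows, cols = len(vertical) + 1, len(horizontal) + 1
    F = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):            # first row: leading gaps
        F[0][j] = j * gap
    for i in range(1, rows):            # first column: leading gaps
        F[i][0] = i * gap

    def sub(i, j):
        return match if vertical[i - 1] == horizontal[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j),   # diagonal
                          F[i - 1][j] + gap,             # from above
                          F[i][j - 1] + gap)             # from the left

    # Traceback from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            top.append(horizontal[j - 1]); bottom.append(vertical[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and F[i][j] == F[i][j - 1] + gap:
            top.append(horizontal[j - 1]); bottom.append("-")
            j -= 1
        else:
            top.append("-"); bottom.append(vertical[i - 1])
            i -= 1
    return F, "".join(reversed(top)), "".join(reversed(bottom))


F, top, bottom = needleman_wunsch("GGTT", "GAT")
for row in F:
    print(row)
print("score:", F[-1][-1])
print(top)
print(bottom)
```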

http://www.ebi.ac.uk/Tools/emboss/


Local Alignment : Smith and Waterman Algorithm

Fail to identify functionally important residues

Global vs. Local Global alignments Local alignments Comparing sequences over their entire length Comparing sequences with partial homology Making high-quality alignments

Global alignment (top) includes matches ignored by local alignment (bottom) 15% identity 30% identity NP_824492, NP_337032

Domain
Parts of sequence / particular functional site
Sequence-structure-function relation

Local Alignments
Align only the most similar portions of the sequences.
Look for small parts of the sequences that are similar to each other.
Useful when searching for functionally related sequences.
Programs for database searching: FASTA, BLAST

Alignments by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2
S1 = GCCCTAGCG
S2 = GCGCAATG

Needleman-Wunsch method (Global Alignment)
GCCCTAGCG
|| | |  |
GCGC-AATG

Smith-Waterman method (Local Alignment)
GCCCTAGCG
      |||
      GCGCAATG

Smith-Waterman method
Dynamic programming algorithm for performing local sequence alignment.
Traces only continue as long as the scores are positive. Whenever a score becomes negative it is set to 0.
H(i, j) = max { diagonal, horizontal, vertical, 0 (start again) }
Smith–Waterman is a dynamic programming algorithm too. No values in the scoring matrix can be negative! H ≥ 0

Needleman-Wunsch method (Global Alignment)
Match : +1, Mismatch : -1, Gap : -2
GCCCTAGCG
|| | |  |
GCGC-AATG

Smith-Waterman method (Local Alignment)
Match : +1, Mismatch : -1, Gap : -2
GCCCTAGCG
      |||
      GCGCAATG

The highest-scoring cell does not need to be at the bottom right-hand corner; it can be anywhere in the matrix. The traceback begins at the highest-scoring cell in the matrix and follows the arrows back until a 0 is reached.
GCCCTAGCG
      |||
      GCGCAATG
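The local alignment procedure described above can be sketched as follows. This is a minimal illustration with a linear gap penalty, not the code of any specific tool. The function name `smith_waterman` and the tie-breaking rules (first best cell found, diagonal move preferred during traceback) are assumptions of this sketch.

```python
def smith_waterman(s1, s2, match=1, mismatch=-1, gap=-2):
    """Local alignment: returns (best score, aligned part of s1, of s2)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    H = [[0] * cols for _ in range(rows)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if s1[i - 1] == s2[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            # The extra 0 means "start again": no cell can be negative.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the highest-scoring cell until a 0 is reached.
    a1, a2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            a1.append(s1[i - 1]); a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            a1.append(s1[i - 1]); a2.append("-")
            i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1])
            j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


print(smith_waterman("GCCCTAGCG", "GCGCAATG"))  # (3, 'GCG', 'GCG')
```

With the slide's scoring scheme, the best local alignment pairs the final GCG of S1 with the initial GCG of S2, which corresponds to the three match bars shown above.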