Definitions. Optimal alignment: the alignment that exhibits the most correspondences, i.e. the one with the highest score. It may or may not be biologically meaningful. Global alignment: Needleman-Wunsch (1970) maximizes the number of matches between the sequences along their entire length. Local alignment: Smith-Waterman (1981) gives the highest-scoring local match between two sequences.

Pairwise Global Alignment. Global alignment: Needleman-Wunsch (1970) maximizes the number of matches between the sequences along their entire length. Reasons for making a global alignment: checking minor differences between two sequences; analyzing polymorphisms (e.g. SNPs) between closely related sequences; …

Pairwise Global Alignment, computationally. Given: a pair of sequences (strings of characters). Output: an alignment that maximizes the similarity.

How can we find an optimal alignment?

    1                        27
    ACGTCTGATACGCCGTATAGTCTATCT
    CTGAT---TCG-CATCGTC--T-ATCT

How many possible alignments? C(27,7) gap positions = ~888,000 possibilities. Dynamic programming: the Needleman & Wunsch algorithm.

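Time Complexity. Consider two sequences, AAGT and AGTC. How many possible alignments do the two sequences have? Stirling's formula gives $n! \approx \sqrt{2\pi n}\,(n/e)^n$, so for two sequences of length $n$: $\binom{2n}{n} = \frac{(2n)!}{(n!)^2} \approx \frac{2^{2n}}{\sqrt{\pi n}} = \Omega(2^n)$. The number of alignments grows exponentially with sequence length.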
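To get a feel for these counts, the short Python sketch below evaluates the binomial coefficients exactly and compares the central binomial coefficient with the Stirling-based estimate. The function names are illustrative and are not part of the original slides.

```python
import math


def central_binomial(n):
    """Exact value of C(2n, n) = (2n)! / (n!)^2."""
    return math.comb(2 * n, n)


def stirling_estimate(n):
    """Stirling-based approximation 2^(2n) / sqrt(pi * n)."""
    return 4 ** n / math.sqrt(math.pi * n)


if __name__ == "__main__":
    # Number of ways to choose 7 gap positions among 27 columns.
    print(math.comb(27, 7))
    # Growth of C(2n, n) next to its Stirling estimate.
    for n in (4, 10, 50):
        print(n, central_binomial(n), round(stirling_estimate(n)))
```

Even for sequences of a few hundred characters the count is far beyond what can be enumerated, which is the motivation for dynamic programming.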

Scoring a sequence alignment. Match/mismatch score: +1/+0. Gap open/extension penalty: -2/-1.

    ACGTCTGATACGCCGTATAGTCTATCT
        ||||| |||   || ||||||||
    ----CTGATTCGC---ATCGTCTATCT

Matches: 18 × (+1). Mismatches: 2 × 0. Open: 2 × (-2). Extension: 5 × (-1). Score = +9.
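The scoring rule on this slide can be sketched in Python as follows. The sketch assumes, as the slide's arithmetic implies, that the first character of each gap run is charged the open penalty and every further character in that run the extension penalty. The function name `score_alignment` is illustrative.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap_open=-2, gap_extend=-1):
    """Score two aligned strings of equal length, with '-' marking a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    previous_gap_row = None  # which row held a gap in the previous column
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            # Continuing a gap in the same row costs the extension penalty;
            # starting a new gap costs the open penalty.
            score += gap_extend if previous_gap_row == row else gap_open
            previous_gap_row = row
        else:
            score += match if x == y else mismatch
            previous_gap_row = None
    return score


if __name__ == "__main__":
    print(score_alignment("ACGTCTGATACGCCGTATAGTCTATCT",
                          "----CTGATTCGC---ATCGTCTATCT"))
```

For the slide's alignment this reproduces 18 - 4 - 5 = 9.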

Needleman & Wunsch
1. Place each sequence along one axis.
2. Place score 0 at the upper-left corner.
3. Fill in the 1st row and column with multiples of the gap penalty.
4. Fill in the matrix with the max value of 3 possible moves. Vertical move: score + gap penalty. Horizontal move: score + gap penalty. Diagonal move: score + match/mismatch score.
5. The optimal alignment score is in the lower-right corner.
6. To reconstruct the optimal alignment, trace back where the max at each step came from; stop when the origin is reached.

Example. Let gap = -2, match = 1, mismatch = -1. Align AAAC with AGC:

              (empty)    A    A    A    C
    (empty)      0      -2   -4   -6   -8
    A           -2       1   -1   -3   -5
    G           -4      -1    0   -2   -4
    C           -6      -3   -2   -1   -1

Optimal alignments (score -1):

    AAAC    AAAC
    A-GC    -AGC
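A minimal Python sketch of the Needleman & Wunsch procedure, using the linear gap penalty of the example (gap = -2, match = 1, mismatch = -1 as defaults). The function name and the tie-breaking order in the traceback (diagonal, then vertical, then horizontal) are assumptions made for illustration. When several alignments are optimal, only one of them is returned.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    # First row and column hold multiples of the gap penalty.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    # Each cell is the max over the diagonal, vertical and horizontal moves.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Traceback from the lower-right corner to the origin.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return F[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("AAAC", "AGC"))
```

Filling the matrix takes time proportional to the product of the two lengths, and the traceback visits at most len(a) + len(b) cells.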

Time Complexity Analysis. Initialize matrix values: $O(n)$, $O(m)$. Filling in rest of matrix: $O(nm)$. Traceback: $O(n+m)$. If the strings are the same length, total time is $O(n^2)$.

Local Alignment. Problem first formulated by Smith and Waterman (1981). Problem: find an optimal alignment between a substring of s and a substring of t. Algorithm: a variant of the basic algorithm for global alignment.

Motivation. Searching for unknown domains or motifs within proteins from different families: proteins encoded by homeobox genes (conserved in only one region, the homeodomain, 60 amino acids long); identifying active sites of enzymes. Comparing long stretches of anonymous DNA. Querying databases where the query word is much smaller than the sequences in the database. Analyzing repeated elements within a single sequence.

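Local Alignment. Let gap = -2, match = 1, mismatch = -1. Align GATCACCT with GATACCC:

              (empty)   G   A   T   C   A   C   C   T
    (empty)      0      0   0   0   0   0   0   0   0
    G            0      1   0   0   0   0   0   0   0
    A            0      0   2   0   0   1   0   0   0
    T            0      0   0   3   1   0   0   0   1
    A            0      0   1   1   2   2   0   0   0
    C            0      0   0   2   1   3   1   0   0
    C            0      0   0   1   1   2   4   2   0
    C            0      0   0   1   0   2   3   3   0

The maximum entry is 4. Tracing back from it until a zero is reached gives the local alignment:

    GATCACC
    GAT-ACC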

Smith & Waterman
1. Place each sequence along one axis.
2. Place score 0 at the upper-left corner.
3. Fill in the 1st row and column with 0s.
4. Fill in the matrix with the max of 4 possible values. Zero. Vertical move: score + gap penalty. Horizontal move: score + gap penalty. Diagonal move: score + match/mismatch score.
5. The optimal alignment score is the max in the matrix.
6. To reconstruct the optimal alignment, trace back where the max at each step came from; stop when a zero is hit.
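The Smith & Waterman steps can be sketched in Python along the same lines as the global algorithm. The function name, the choice of the first maximal cell when several cells tie, and the traceback order (diagonal, then vertical, then horizontal) are illustrative assumptions.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # First row and column stay at 0.
    H = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Max of four values: zero, diagonal, vertical, horizontal.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Traceback from the maximal cell until a zero is hit.
    out_a, out_b = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("GATCACCT", "GATACCC"))
```

The only differences from the global version are the zero floor in each cell, the zero-filled borders, and a traceback that starts at the maximum instead of the lower-right corner.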

Exercise. Find the best local alignment of CGATG and AAATGGA. Let gap = -2, match = 1, mismatch = -1.

Semi-global Alignment. Example:

    CAGCA-CTTGGATTCTCGG
    ---CAGCGTGG--------

    CAGCACTTGGATTCTCGG
    CAGC----G--T----GG

We like the first alignment much better. In semiglobal comparison, we score the alignments ignoring some of the end spaces.

Global Alignment. Example: aligning AAACCC with ACCC globally gives

    AAACCC
    A--CCC

First rows of the matrix:

              (empty)    A    A    A    C    C    C
    (empty)      0      -2   -4   -6   -8  -10  -12
    A           -2       1   -1   -3   -5   -7   -9

We would prefer to see

    AAACCC
      ACCC

We do not want to penalize the end spaces.

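SemiGlobal Alignment. Example: s = AAACCC, t = ACCC. The first row is initialized with zeros, so the spaces before t are not penalized:

              (empty)    A    A    A    C    C    C
    (empty)      0       0    0    0    0    0    0
    A           -2       1    1    1   -1   -1   -1
    C           -4      -1    0    0    2    0    0
    C           -6      -3   -2   -1    1    3    1
    C           -8      -5   -4   -3    0    2    4

The score in the lower-right corner is 4.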

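SemiGlobal Alignment. Example: s = AAACCCG, t = ACCC. The matrix gains one column for G:

              (empty)    A    A    A    C    C    C    G
    (empty)      0       0    0    0    0    0    0    0
    A           -2       1    1    1   -1   -1   -1   -1
    C           -4      -1    0    0    2    0    0   -2
    C           -6      -3   -2   -1    1    3    1   -1
    C           -8      -5   -4   -3    0    2    4    2

Taking the max in the last row (4) instead of the lower-right entry (2) leaves the space after t unpenalized.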

SemiGlobal Alignment. Summary of end space charging procedures:

    Place where spaces are not penalized    Action
    Beginning of 1st sequence               Initialize 1st row with zeros
    End of 1st sequence                     Look for max in last row
    Beginning of 2nd sequence               Initialize 1st column with zeros
    End of 2nd sequence                     Look for max in last column
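The four rules in this summary can be sketched as four switches on the global dynamic-programming matrix. In the Python sketch below the 1st sequence indexes the rows and the 2nd sequence the columns. The function and parameter names are illustrative, and the scoring defaults (gap = -2, match = 1, mismatch = -1) are taken from the earlier examples.

```python
def semiglobal_score(first, second,
                     free_begin_first=False, free_end_first=False,
                     free_begin_second=False, free_end_second=False,
                     match=1, mismatch=-1, gap=-2):
    """Best alignment score with selected end spaces left unpenalized."""
    m, n = len(first), len(second)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    # Beginning of 1st sequence free: initialize the 1st row with zeros.
    for j in range(1, n + 1):
        F[0][j] = 0 if free_begin_first else j * gap
    # Beginning of 2nd sequence free: initialize the 1st column with zeros.
    for i in range(1, m + 1):
        F[i][0] = 0 if free_begin_second else i * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if first[i - 1] == second[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    candidates = [F[m][n]]
    # End of 1st sequence free: look for the max in the last row.
    if free_end_first:
        candidates.extend(F[m])
    # End of 2nd sequence free: look for the max in the last column.
    if free_end_second:
        candidates.extend(row[n] for row in F)
    return max(candidates)


if __name__ == "__main__":
    print(semiglobal_score("ACCC", "AAACCCG",
                           free_begin_first=True, free_end_first=True))
```

With all four switches off the function reduces to the global score, so the same routine covers both cases.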

Pairwise Sequence Comparison over Internet (Bioinformatics for Dummies)

    lalign       www.ch.embnet.org/software/LALIGN_form.html          Global/Local
                 fasta.bioch.virginia.edu/fasta_www/plalign.htm
    USC          www-hto.usc.edu/software/seqaln/seqaln-query.html
    alion        fold.stanford.edu/alion
                 genome.cs.mtu.edu/align.html
    align        www.ebi.ac.uk/emboss/align
    xenAliTwo    www.soe.ucsc.edu/~kent/xenoAli/xenAliTwo.html        Local for DNA
    blast2seqs   www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html           Local BLAST
                 web.umassmed.edu/cgi-bin/BLAST/blast2seqs
    lalnview     www.expasy.ch/tools/sim-prot.html                    Visualization
    prss         www.ch.embnet.org/software/PRSS_form.html            Evaluation
                 Fasta.bioch.virginia.edu/fasta/prss.htm
    graph-align  Darwin.nmsu.edu/cgi-bin/graph_align.cgi

Significance of Sequence Alignment. Consider randomly generated sequences. What distribution do you think the best local alignment score of two sequences of sample length should follow? Uniform distribution; normal distribution; binomial distribution (n Bernoulli trials); Poisson distribution (n → ∞, np = λ); others. Binomial distribution: in n Bernoulli trials (p for heads, q for tails), the probability of seeing k successes; the number of successes follows the binomial distribution. The composition of the two sequences is the same as that of the two test sequences.

Extreme Value Distribution: $Y_{ev} = \exp(-x - e^{-x})$

Extreme Value Distribution vs. Normal Distribution. P-value: the area under the shaded curve.
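As a small numerical illustration, the Python sketch below evaluates the normalized extreme value density exp(-x - e^(-x)) and the area under its right tail. It relies on the standard fact that this density has cumulative distribution exp(-e^(-x)), so the tail area beyond x is 1 - exp(-e^(-x)). The function names are illustrative, and the sketch assumes scores have already been normalized. Alignment programs apply their own scaling parameters, which are not covered here.

```python
import math


def evd_density(x):
    """Normalized extreme value density exp(-x - e^(-x))."""
    return math.exp(-x - math.exp(-x))


def evd_p_value(x):
    """Area under the density to the right of x: 1 - exp(-e^(-x))."""
    # expm1 keeps precision when the tail area is very small.
    return -math.expm1(-math.exp(-x))


if __name__ == "__main__":
    for x in (0.0, 2.0, 5.0, 10.0):
        print(x, evd_density(x), evd_p_value(x))
```

For large x the tail area is close to e^(-x), so it decays much more slowly than the tail of a normal distribution.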

“Twilight Zone”. Some proteins with less than 15% similarity have exactly the same 3-D structure, while some proteins with 20% similarity have different structures. Homology or non-homology is never guaranteed in the twilight zone.