
1 Pairwise Sequence Alignment

2 Biological motivation
Main algorithms for pairwise sequence alignment: global alignment

ATTGCGTCGATCGCAC-GCACGCT
ATTGCAGTG-TCGAGCGTCAGGCT

CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT

3 Biological motivation
Main algorithms for pairwise sequence alignment: local alignment

ATTGCGTCGATCGCAC-GCACGCT
ATTGCAGTG-TCGAGCGTCAGGCT

CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT

4 Discover function
Sequences that are similar probably have the same function.

5 Study evolution
If two sequences from different organisms are similar, they may share a common ancestor.

6 Find crucial features
– Regions that are strongly conserved between different sequences can indicate their functional importance.
Figure: conservation of IGFALS (insulin-like growth factor binding protein, acid labile subunit) between human and mouse.

7 Identify cause of disease
– Comparing sequences between individuals can detect changes that are related to disease.

8 Sickle Cell Anemia
Caused by a single base substitution (an A replaced by a T) in the beta-globin gene, so that valine is incorporated instead of glutamic acid in hemoglobin.

9 Healthy Individual (the highlighted base and residue are shown in brackets)
>gi| |ref|NM_ | Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GG[A]GAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi| |ref|NP_ | beta globin [Homo sapiens]
MVHLTP[E]EKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH

10 Diseased Individual (the highlighted base and residue are shown in brackets)
>gi| |ref|NM_ | Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GG[T]GAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi| |ref|NP_ | beta globin [Homo sapiens]
MVHLTP[V]EKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH

11 Sequence Modifications
Three types of mutation:
– Substitution (point mutation)
– Insertion
– Deletion
Example sequences: TCAGTTCGAGT, TCCGT, TCGT, TCAGT
Insertions and deletions are together called indels (for example, caused by replication slippage).

12 How do we quantitate similarity?

13 Scoring Similarity
Assume an independent mutation model:
– Each site is considered separately
Score at each site:
– Positive if the bases are the same
– Negative if they differ
Sum the site scores to make the final score:
– The total can be positive or negative
– Its significance depends on sequence length
Example: GTAGTC vs. CTAGCG

14 Substitutions Only
Pretend there are no indels:
– Compare the sequences base by base
– Count the number of matches and mismatches
– Matches score +2, mismatches score -1
TTCGTCGTAGTCGGCTCGACCTG
GTACGTCTAGCGAGCGTGATCCT
9 matches: +18
14 mismatches: -14
Total score: +4 (a weak match)
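The substitution-only scoring above can be checked with a minimal Python sketch. It assumes equal-length sequences and the slide's scores (+2 per match, -1 per mismatch). The function name `ungapped_score` is illustrative, not from the slides.

```python
def ungapped_score(a: str, b: str, match: int = 2, mismatch: int = -1):
    """Score two equal-length sequences base by base, with no indels allowed.

    Returns (matches, mismatches, total_score).
    """
    if len(a) != len(b):
        raise ValueError("substitution-only scoring needs equal-length sequences")
    # Independent-site model: each position is scored on its own.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    mismatches = len(a) - matches
    return matches, mismatches, matches * match + mismatches * mismatch


print(ungapped_score("TTCGTCGTAGTCGGCTCGACCTG", "GTACGTCTAGCGAGCGTGATCCT"))
# (9, 14, 4)
```

The same function scores the shorter pair from the previous slide: GTAGTC vs. CTAGCG has 3 matches and 3 mismatches, for a total of +3.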

15 Including Indels
Create an 'alignment':
– Count matches within the alignment
– Required if the sequences have different lengths
TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT-
17 matches: +34
2 mismatches: -2
8 indels: -8
Total score: +24 (a strong match)
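A gapped alignment is scored column by column. The sketch below assumes a linear gap penalty of -1 per gap column, which is consistent with "8 indels: -8" on this slide. The name `alignment_score` is hypothetical.

```python
def alignment_score(top: str, bottom: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Score a gapped alignment given as two equal-length strings, '-' marking gaps.

    Returns (counts, total), where counts tallies matches, mismatches and indels.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    counts = {"match": 0, "mismatch": 0, "indel": 0}
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot pair a gap with a gap")
        if x == "-" or y == "-":
            counts["indel"] += 1  # linear penalty: every gap column costs `gap`
        elif x == y:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    total = counts["match"] * match + counts["mismatch"] * mismatch + counts["indel"] * gap
    return counts, total


counts, total = alignment_score("TT-CGTCGTAGTCG-GC-TCGACC-TG", "GTACGTC-TAG-CGAGCGT-GATCCT-")
print(counts, total)  # {'match': 17, 'mismatch': 2, 'indel': 8} 24
```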

16 Choosing an Alignment
Many different alignments are possible:
– Consider all possible alignments
– Take the best score found
– There may be more than one best alignment
TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT-

TTCGT-CGTAGTC-GGCTCG-ACCTG
GTAC-GTCTA-GCGAGCGT-GATCC-T
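"Consider all possible alignments, take the best" can be written literally as an exhaustive search. The sketch below is only practical for very short sequences. It enumerates every global alignment of the worked-example sequences A = ACGCTG and B = CATGT and keeps all alignments that tie for the best score. It assumes scores of +2 for a match, -1 for a mismatch and -1 per gap column. All function names are illustrative.

```python
from typing import Iterator, List, Tuple


def all_alignments(a: str, b: str) -> Iterator[Tuple[str, str]]:
    """Yield every global alignment of a and b as a pair of gapped strings."""
    if not a and not b:
        yield "", ""
        return
    if a and b:  # first column pairs a[0] with b[0]
        for x, y in all_alignments(a[1:], b[1:]):
            yield a[0] + x, b[0] + y
    if a:  # first column pairs a[0] with a gap
        for x, y in all_alignments(a[1:], b):
            yield a[0] + x, "-" + y
    if b:  # first column pairs a gap with b[0]
        for x, y in all_alignments(a, b[1:]):
            yield "-" + x, b[0] + y


def score(top: str, bottom: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Column-by-column score of one alignment."""
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def best_alignments(a: str, b: str) -> Tuple[int, List[Tuple[str, str]]]:
    """Score every alignment and keep all that tie for the best score."""
    scored = [(score(t, u), t, u) for t, u in all_alignments(a, b)]
    best = max(s for s, _, _ in scored)
    return best, [(t, u) for s, t, u in scored if s == best]


best, ties = best_alignments("ACGCTG", "CATGT")
print(best, len(ties))
for t, u in ties:
    print(t)
    print(u)
    print()
```

Even for these 6- and 5-letter sequences, the search visits 3653 alignments. This cost motivates the next slide.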

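17 Why is it hard?
Alignment without gaps requires a number of comparisons roughly proportional to the square of the average sequence length, because every offset of one sequence against the other must be tried. If gaps are allowed, the number of possible alignments grows exponentially with sequence length, so trying them all becomes astronomically expensive.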
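The two growth rates can be compared directly. The sketch below counts the residue comparisons for all ungapped offsets, which come to exactly n*m. It also counts the distinct gapped global alignments, which are the Delannoy numbers. It assumes an alignment is a sequence of columns of three kinds: residue/residue, residue/gap and gap/residue. The function names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n: int, m: int) -> int:
    """Number of distinct global alignments of sequences of lengths n and m.

    The last column is residue/residue, residue/gap or gap/residue, so the
    count follows the same three-way recursion as the alignment table.
    """
    if n == 0 or m == 0:
        return 1  # whatever remains can only be aligned against gaps
    return (count_alignments(n - 1, m - 1)
            + count_alignments(n - 1, m)
            + count_alignments(n, m - 1))


def ungapped_comparisons(n: int, m: int) -> int:
    """Residue comparisons needed to score every ungapped offset of two sequences."""
    total = 0
    for offset in range(-(m - 1), n):
        # number of positions of the first sequence that overlap the second
        total += min(n, m + offset) - max(0, offset)
    return total


for length in (10, 50, 100):
    print(length, ungapped_comparisons(length, length), count_alignments(length, length))
```

For two sequences of length 10 there are 100 ungapped comparisons but 8,097,453 gapped alignments. At length 100 the alignment count has more than 70 digits.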

18 Algorithms for pairwise alignments
– Dot plots: Gibbs and McIntyre, 1970
– Dynamic programming:
– Local alignment: Smith-Waterman
– Global alignment: Needleman-Wunsch

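19 Dot Plots
– Early method
– Sequences are placed along the top and the left side
– Dots indicate matched bases
– Diagonal series of dots show matched regions
Figure: dot plot of GTAGTCGG (top) against TAGCGAGC (left); the diagonal series corresponds to the alignment TAGTCG / TAG-CG.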
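A dot plot is a boolean matrix with a dot wherever the two bases agree. Matched regions show up as diagonal runs of dots. This minimal sketch uses the slide's sequences: GTAGTCGG along the top and TAGCGAGC down the left. The run-finding helper and its `min_len` threshold are illustrative assumptions, not part of the original method description.

```python
from typing import List, Tuple


def dot_plot(top: str, left: str) -> List[List[bool]]:
    """dots[r][c] is True when left[r] matches top[c]."""
    return [[t == l for t in top] for l in left]


def render(top: str, left: str, dots: List[List[bool]]) -> str:
    """Text rendering: '*' marks a dot, '.' marks no match."""
    lines = ["  " + " ".join(top)]
    for base, row in zip(left, dots):
        lines.append(base + " " + " ".join("*" if d else "." for d in row))
    return "\n".join(lines)


def diagonal_runs(dots: List[List[bool]], min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Return (row, col, length) for each diagonal run of at least min_len dots."""
    runs = []
    rows, cols = len(dots), len(dots[0])
    for r in range(rows):
        for c in range(cols):
            # a run starts at a dot whose upper-left neighbour is not a dot
            if dots[r][c] and (r == 0 or c == 0 or not dots[r - 1][c - 1]):
                k = 0
                while r + k < rows and c + k < cols and dots[r + k][c + k]:
                    k += 1
                if k >= min_len:
                    runs.append((r, c, k))
    return runs


top, left = "GTAGTCGG", "TAGCGAGC"
dots = dot_plot(top, left)
print(render(top, left, dots))
print(diagonal_runs(dots, min_len=2))
```

The runs (0, 1, 3) and (3, 5, 2) are the TAG and CG pieces of TAGTCG / TAG-CG, offset by one position where the gap falls. The third run, (5, 2, 2), is a short AG match elsewhere. It shows why isolated short diagonals are usually treated as noise.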

20 Dynamic Programming
A method for reducing a complex problem to a set of smaller sub-problems of the same form. The best solution to one sub-problem is independent of the best solutions to the other sub-problems.


22 What does it mean?
If a path from X→Z passes through Y, the best path from X→Y is independent of the best path from Y→Z.

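23 Example
Sequences: A = ACGCTG, B = CATGT
Figure: an empty scoring table with A = ACGCTG across the top and B = CATGT down the left side. What score goes in each cell?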

24 Example
Sequences: A = ACGCTG, B = CATGT
Scoring: match +2, anything else -1
Score of the best alignment between AC and CATG: -1
…between AC and CATGT: -2
…between ACG and CATG: 2
Calculate the score between ACG and CATGT.

25 Needleman-Wunsch Example
Each cell can be reached in three ways:
– Insertion in the first sequence
– Align the next letter in sequences 1 and 2
– Insertion in the second sequence

26 Needleman-Wunsch Example
Sequences: A = ACGCTG, B = CATGT
– Diagonal: -1 (from before) plus -1 for the mismatch of G against T → -2
– From above: 2 (from before) plus -1 for T against a gap (–) → 1
– From the left: -2 (from before) plus -1 for G against a gap (–) → -3
The cell gets the highest of -2, 1 and -3 → 1
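The single-cell update on this slide can be sketched as a small function. It assumes the slide's layout (A across the top, B down the left) and scores of +2 for a match, -1 for a mismatch and -1 per gap. The name `nw_cell` is hypothetical. The diagonal neighbour is the best score for AC vs. CATG, which is -1 under this scheme.

```python
def nw_cell(diag: int, up: int, left: int, a_char: str, b_char: str,
            match: int = 2, mismatch: int = -1, gap: int = -1):
    """Score one Needleman-Wunsch cell from its three already-filled neighbours.

    With A across the top and B down the left, 'up' holds one fewer letter of B
    and 'left' holds one fewer letter of A.
    """
    candidates = {
        "diagonal": diag + (match if a_char == b_char else mismatch),  # a_char paired with b_char
        "up": up + gap,      # b_char aligned against a gap
        "left": left + gap,  # a_char aligned against a gap
    }
    return max(candidates.values()), candidates


# Cell for ACG vs CATGT: diagonal = AC vs CATG, up = ACG vs CATG, left = AC vs CATGT
print(nw_cell(diag=-1, up=2, left=-2, a_char="G", b_char="T"))
# (1, {'diagonal': -2, 'up': 1, 'left': -3})
```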

27 Needleman-Wunsch Example
Sequences: A = ACGCTG, B = CATGT

28 Table: A = ACGCTG across the top (columns 0–6), B = CATGT down the left (rows 0–5); each cell holds the best score for a prefix of A against a prefix of B, and the top-left cell is 0.

29 Table: first-row cell for A aligned against a gap (A / -).

30 Table: first row filled with -1 to -6, i.e. prefixes of ACGCTG aligned entirely against gaps.

31 Table: first column filled with -1 to -5, i.e. prefixes of CATGT aligned entirely against gaps.

32 Table: filling cell (A, C) = -1.

33 Table: cell (AC, C) = 1, from the alignment AC / -C.

34 Table: cell (ACG, C) = 0, from the alignment ACG / -C-.

35 Table: cell (ACGC, C) = -1, from either ACGC / -C-- or ACGC / ---C.

36 Table: cell (ACG, CA) = 0, from the alignment ACG / -CA.

37–38 Table: the remaining cells are filled in the same way.

39 Completed scoring table (match +2, mismatch -1, gap -1)
          A   C   G   C   T   G
      0  -1  -2  -3  -4  -5  -6
  C  -1  -1   1   0  -1  -2  -3
  A  -2   1   0   0  -1  -2  -3
  T  -3   0   0  -1  -1   1   0
  G  -4  -1  -1   2   1   0   3
  T  -5  -2  -2   1   1   3   2
Global alignment score (bottom-right cell): 2

40 Tracing back from the bottom-right cell gives an optimal alignment:
ACGCTG-
-C-ATGT

41 Another optimal alignment:
ACGCTG-
-CA-TGT

42 Another optimal alignment:
-ACGCTG
CATG-T-

43 Needleman-Wunsch Alignment (summary)
– Global alignment between sequences: compare each entire sequence against the other
– Create a scoring table with sequence A across the top and B down the left
– The cell at column i and row j contains the score of the best alignment between the first i elements of A and the first j elements of B
– The global alignment score is in the bottom-right cell
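The summary above fits in one function. This is a minimal Python sketch, not the slides' own code. It assumes a linear gap penalty and the example scores (+2, -1, -1). Its traceback prefers the diagonal, then up, then left when there are ties, so it reports one of the possibly several optimal alignments. The name `needleman_wunsch` and the tie-breaking order are illustrative choices.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 2, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str, List[List[int]]]:
    """Global alignment with a linear gap penalty.

    F[j][i] is the best score for the first i letters of a (across the top)
    against the first j letters of b (down the left).
    """
    n, m = len(a), len(b)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, n + 1):
        F[0][i] = i * gap  # prefix of a aligned entirely against gaps
    for j in range(1, m + 1):
        F[j][0] = j * gap  # prefix of b aligned entirely against gaps

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for j in range(1, m + 1):
        for i in range(1, n + 1):
            F[j][i] = max(F[j - 1][i - 1] + sub(i, j),  # pair a[i-1] with b[j-1]
                          F[j - 1][i] + gap,            # b[j-1] against a gap
                          F[j][i - 1] + gap)            # a[i-1] against a gap

    # Trace back from the bottom-right cell; ties prefer diagonal, then up, then left.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[j][i] == F[j - 1][i - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif j > 0 and F[j][i] == F[j - 1][i] + gap:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
        else:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
    return F[m][n], "".join(reversed(top)), "".join(reversed(bottom)), F


score, top, bottom, table = needleman_wunsch("ACGCTG", "CATGT")
for row in table:
    print(row)
print(score)   # 2
print(top)     # ACGCTG-
print(bottom)  # -C-ATGT
```

With this tie-breaking order, the traceback returns ACGCTG- / -C-ATGT. A different order would return one of the other optimal alignments with the same score.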

44 Global vs. Local alignment
Figure: DOROTHY HODGKIN aligned against DOROTHYCROWFOOTHODGKIN, shown once as a global alignment and once as a local alignment.

45 Global Alignment versus Local Alignment
Global alignment:
ATTGCAGTG-TCGAGCGTCAGGCT
ATTGCGTCGATCGCAC-GCACGCT
Local alignment:
CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT

46 Local Alignment
– Best score for aligning parts of the sequences
– Often beats the global alignment score
– Similar algorithm: Smith-Waterman
– Table cells never score below zero

47 Local Alignment: how do we do it?
1. We can start a new match instead of extending a previous alignment.
– This means that at each cell we can restart the score from 0, even if this means ignoring the prefix.
– We do this only if it is better than the alternative, i.e. only if the alternative score is negative.
2. Instead of looking only at the far corner, we look anywhere in the table for the best score, even if this means ignoring the suffix.
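The two changes above turn the global recursion into a local one. Each cell takes a floor of 0, and the best cell is searched for anywhere in the table. This is a minimal Python sketch of Smith-Waterman, assuming a linear gap penalty and the example scores (+2, -1, -1). The name `smith_waterman` and the tie-breaking order in the traceback are illustrative choices.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -1) -> Tuple[int, str, str]:
    """Local alignment: cells never drop below zero; the best cell can be anywhere."""
    n, m = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]  # first row and column stay 0
    best, best_i, best_j = 0, 0, 0

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for j in range(1, m + 1):
        for i in range(1, n + 1):
            H[j][i] = max(0,                            # start a new local alignment here
                          H[j - 1][i - 1] + sub(i, j),  # pair a[i-1] with b[j-1]
                          H[j - 1][i] + gap,            # b[j-1] against a gap
                          H[j][i - 1] + gap)            # a[i-1] against a gap
            if H[j][i] > best:                          # look anywhere, not just the corner
                best, best_i, best_j = H[j][i], i, j

    # Trace back from the best cell until reaching a zero cell (start of the local match).
    top: List[str] = []
    bottom: List[str] = []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[j][i] > 0:
        if H[j][i] == H[j - 1][i - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[j][i] == H[j - 1][i] + gap:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
        else:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("ACGCTG", "CATGT"))  # (5, 'C-TG', 'CATG')
```

On the worked-example sequences, the local score is 5, from C-TG / CATG. This is higher than the global score of 2 for the same pair, which illustrates why local alignment often beats the global score.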