1 ALIGNMENT OF NUCLEOTIDE & AMINO-ACID SEQUENCES.

Presentation transcript:

1 ALIGNMENT OF NUCLEOTIDE & AMINO-ACID SEQUENCES

2 An alignment is an evolutionarily meaningful comparison of two or more sequences (DNA, RNA, or proteins). In the case of two DNA sequences, an alignment consists of a series of paired bases, one base from each sequence. There are three types of pairs:
(1) matches = the same nucleotide appears in both sequences.
(2) mismatches = different nucleotides are found in the two sequences.
(3) gaps = a base in one sequence and a null base in the other.
GCGGCCCATCAGGTAGTTGGTG-G
GCGTTCCATC--CTGGTTGGTGTG
***..*****  .*.******* *
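
The three pair types can be illustrated with a minimal Python sketch that rebuilds the annotation line under an aligned pair. The function name `match_line` is hypothetical, and the marking convention ('*' for a match, '.' for a mismatch, a space for a gap) is assumed from the example on the slide.

```python
def match_line(seq_a: str, seq_b: str) -> str:
    """Annotate an aligned pair: '*' = match, '.' = mismatch, ' ' = gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            marks.append(" ")   # a base paired with a null base
        elif a == b:
            marks.append("*")   # same nucleotide in both sequences
        else:
            marks.append(".")   # different nucleotides
    return "".join(marks)


top = "GCGGCCCATCAGGTAGTTGGTG-G"
bottom = "GCGTTCCATC--CTGGTTGGTGTG"
print(top)
print(bottom)
print(match_line(top, bottom))
```

Printed in a fixed-width font, the third line sits directly under the two sequences, one mark per aligned column.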

3 Alignment: A hypothesis concerning positional homology among residues in a sequence. Positional homology = A pair of nucleotides from two aligned sequences that have descended from one nucleotide in the ancestor of the two sequences.
GCGGCCCATCAGGTAGTTGGTG-G
GCGTTCCATC--CTGGTTGGTGTG
***..*****  .*.******* *

4 Positional homology = A pair of nucleotides from two aligned sequences that have descended from one nucleotide in the ancestor of the two sequences.
GCGGCCCATCAGGTAGTTGGTG-G
GCGTTCCATC--CTGGTTGGTGTG
***..*****  .*.******* *
These two nucleotides are derived from the ancestor of cats and armadillos.

5 Homology: The term was coined by Richard Owen. Definition: Similarity resulting from common ancestry.

6 Homology: A qualitative statement. Homology designates a relationship of common descent between entities. Two genes are either homologs or not:
- it doesn’t make sense to say “two genes are 43% homologous.”
- it doesn’t make sense to say “Linda is 43% pregnant.”

7 Homology: By comparing homologous characters, we can reconstruct the evolutionary events that have led to the formation of the extant sequences from the common ancestor.

8 Homology: When dealing with sequences, we are interested in POSITIONAL HOMOLOGY. We identify positional homology by ALIGNMENT.

9 Ancestral sequence: ACTGGGCCCAAATC
Descendant with 1 substitution and 1 insertion: ACAGGGCCACAAATC
Descendant with 1 deletion and 1 substitution: ACTGGCCCAGATC
Correct alignment:
ACT-GGCC-CAGATC
ACAGGGCCACAAATC
**.-****-**.***
Incorrect alignment:
ACTGGCCCAGATC--
ACAGGGCCACAAATC
**.**.***.*..--

10 Ancestral sequence: unknown
Descendant sequences: ACAGGGCCACAAATC and ACTGGCCCAGATC
Correct alignment? Incorrect alignment?
ACTGGCCCAGATC--
ACAGGGCCACAAATC
**.**.***.*..--
Correct alignment? Incorrect alignment?
ACT-GGCC-CAGATC
ACAGGGCCACAAATC
**.-****-**.***

11 Sequence alignment = The identification of the locations of deletions or insertions that might have occurred in either of the two lineages since their divergence from a common ancestor. Insertion + Deletion = Indel or Gap

12 Sequence alignment 1. Pairwise alignment 2. Multiple alignment

13 - Two DNA sequences: A and B. - Lengths are m and n, respectively. - The number of matched pairs is x. - The number of mismatched pairs is y. - Total number of bases in gaps is z.
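
As a rough illustration of these quantities, the sketch below counts x, y, and z for an aligned pair and checks the bookkeeping identity m + n = 2(x + y) + z (every matched or mismatched pair uses one base from each sequence, and every gap position uses one base). The function name `count_pairs` is hypothetical, and '-' is assumed to be the gap character.

```python
def count_pairs(aligned_a: str, aligned_b: str):
    """Return (x, y, z): matched pairs, mismatched pairs, bases in gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    x = y = z = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            z += 1          # a base opposite a null base
        elif a == b:
            x += 1          # matched pair
        else:
            y += 1          # mismatched pair
    return x, y, z


A = "GCGGCCCATCAGGTAGTTGGTG-G"
B = "GCGTTCCATC--CTGGTTGGTGTG"
x, y, z = count_pairs(A, B)
m = len(A.replace("-", ""))   # length of sequence A without gap symbols
n = len(B.replace("-", ""))   # length of sequence B without gap symbols
print(x, y, z, m, n)
print(m + n == 2 * (x + y) + z)
```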

14 A gap indicates that a deletion or an insertion has occurred in one of the two lineages.
GCGG-CCATCAGGTAGTTGGTG--
GCGTTCCATC--CTGGTTGGTGTG

15 The alignment is the first step in many evolutionary and functional studies. Errors in alignment tend to amplify in later computational stages.

16 Methods of alignment: 1. Manual 2. Dot matrix 3. Algorithmic (scoring matrices and gap penalties)

17 Manual alignment. When there are few gaps and the two sequences are not too different from each other, a reasonable alignment can be obtained by visual inspection.
GCG-TCCATCAGGTAGTTGGTGTG
GCGTTCCATCAGGTGGTTGGTGTG
*** **********.*********

18 Advantages of manual alignment: (1) use of a powerful and trainable tool (the brain, well…, some brains). (2) ability to integrate additional data, e.g., domain structure, 3D structure, biological function.

19 Disadvantages of manual alignment: 1. Subjectivity = the inability to formally specify the algorithm. 2. Irreproducibility = the inability of two researchers to reach the same result. 3. Unscalability = the inability to apply the method to long sequences. 4. Incommensurability = the inability to compare the results to those derived from other methods.

20 The dot-matrix method: The two sequences are written out as column and row headings of a two-dimensional matrix. A dot is put in the dot-matrix plot at a position where the nucleotides in the two sequences are identical.

21 The alignment is defined by a path from the upper-left element to the lower-right element.
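
A minimal sketch of the dot-matrix construction, assuming plain strings as sequences. The names `dot_matrix` and `render` are hypothetical, and the short example sequences are made up for illustration.

```python
def dot_matrix(row_seq: str, col_seq: str):
    """matrix[i][j] is True where row_seq[i] and col_seq[j] are identical."""
    return [[r == c for c in col_seq] for r in row_seq]


def render(row_seq: str, col_seq: str) -> str:
    """Text rendering: column headings on top, row headings on the left."""
    lines = ["  " + " ".join(col_seq)]
    for base, row in zip(row_seq, dot_matrix(row_seq, col_seq)):
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(base + " " + cells)
    return "\n".join(lines)


print(render("GCGTT", "GCGGT"))
```

In the rendering, '*' stands for a dot and '.' for an empty element.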

22 There are 4 possible steps in the path: (1) a diagonal step through a dot = match. (2) a diagonal step through an empty element of the matrix = mismatch. (3) a horizontal step = a gap in the sequence on the top of the matrix. (4) a vertical step = a gap in the sequence on the left of the matrix.
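
The four step types can be sketched as a small path walker. This is an illustration under assumptions: the step codes 'D', 'H', and 'V' and the function name `classify_path` are hypothetical, rows are indexed by the sequence on the left, and columns by the sequence on the top.

```python
def classify_path(top: str, left: str, steps: str):
    """Label each step of a path that starts at the upper-left element.

    steps uses 'D' (diagonal), 'H' (horizontal), 'V' (vertical).
    """
    i = j = 0  # i = row (left sequence), j = column (top sequence)
    labels = []
    for step in steps:
        if step == "D":
            if i >= len(left) or j >= len(top):
                raise ValueError("path leaves the matrix")
            # A diagonal step through a dot is a match, otherwise a mismatch.
            labels.append("match" if left[i] == top[j] else "mismatch")
            i, j = i + 1, j + 1
        elif step == "H":
            labels.append("horizontal gap step")
            j += 1
        elif step == "V":
            labels.append("vertical gap step")
            i += 1
        else:
            raise ValueError(f"forbidden step: {step!r}")
    if (i, j) != (len(left), len(top)):
        raise ValueError("path does not end at the lower-right element")
    return labels


print(classify_path(top="GCGT", left="GCT", steps="DDHD"))
```

Any other step code is rejected, which mirrors the idea that only these directions are allowed in a path.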

23 allowed directions forbidden directions

24 A dot matrix may become cluttered. With DNA sequences, ~25% of the elements will be occupied by dots by chance alone.
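
The ~25% figure follows if the four nucleotides are assumed to be equally frequent and independent: the chance that two random positions carry the same letter is the sum of the squared letter frequencies. The sketch below makes that arithmetic explicit; the function name is hypothetical, and the equal-frequency assumption is a simplification (real base and amino-acid compositions are uneven).

```python
def chance_match_fraction(frequencies):
    """Probability that two independently drawn letters are identical."""
    return sum(p * p for p in frequencies)


dna = chance_match_fraction([0.25] * 4)         # four equally frequent bases
protein = chance_match_fraction([1 / 20] * 20)  # twenty equally frequent amino acids
print(dna)      # 0.25
print(protein)  # about 0.05
```

A larger alphabet therefore produces a much cleaner plot at the same window size and stringency.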

25 The number of spurious matches is determined by: window size, stringency, & alphabet size. window size =1 stringency = 1 alphabet size = 4

26 window size =1 stringency = 1 alphabet size = 4 window size = 3 stringency = 2 alphabet size = 4
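
A hedged sketch of how window size and stringency filter spurious dots. It assumes a dot is placed at the start of a window when at least `stringency` of its `window` positions match. Real dot-plot programs may anchor the dot at the window centre instead, and the function name `windowed_dots` is hypothetical.

```python
def windowed_dots(row_seq: str, col_seq: str, window: int = 1, stringency: int = 1):
    """Return the set of (row, column) window starts that receive a dot."""
    dots = set()
    for i in range(len(row_seq) - window + 1):
        for j in range(len(col_seq) - window + 1):
            # Count identical positions inside the diagonal window.
            matches = sum(row_seq[i + k] == col_seq[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots


loose = windowed_dots("ACGT", "AGGT", window=1, stringency=1)
strict = windowed_dots("ACGT", "AGGT", window=3, stringency=2)
print(sorted(loose))
print(sorted(strict))
```

With window size 1 the made-up example keeps an off-diagonal dot; with window size 3 and stringency 2 only dots on the main diagonal remain.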

27 window size = 1 stringency = 1 alphabet size = 20

28 Dot-matrix methods: Advantages: May unravel information on the evolution of sequences.

29 Advantages: Highlighting Information. The vertical gap indicates that a coding region corresponding to ~75 amino acids has either been deleted from the human gene or inserted into the bacterial gene. Window size = 60 amino acids; Stringency = 24 matches

30 Advantages: Highlighting Information. The two diagonally oriented parallel lines most probably indicate that a small internal duplication has occurred in the bacterial gene. Window size = 60 amino acids; Stringency = 24 matches

31 Dot-matrix methods: Disadvantage: May not identify the best alignment.

32 Scoring Matrices & Gap Penalties

33 The true alignment between two sequences is the one that reflects accurately the evolutionary relationships between the sequences. Since the true alignment is unknown, in practice we look for the optimal alignment, which is the one in which the numbers of mismatches and gaps are minimized according to certain criteria.
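
One standard way to search for an optimal alignment, which these slides do not describe, is dynamic programming in the style of the Needleman-Wunsch global alignment algorithm. The sketch below computes only the optimal score. The scoring values (match +1, mismatch -1, -2 per gap position) and the function name are assumptions chosen for illustration, not the criteria used by the author.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b (score only, no traceback)."""
    # prev[j] = best score of aligning the first i-1 letters of a
    # with the first j letters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # a[:i] aligned against nothing but gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap       # a[i-1] opposite a gap
            left = cur[j - 1] + gap  # b[j-1] opposite a gap
            cur.append(max(diag, up, left))
        prev = cur
    return prev[-1]


print(global_alignment_score("ACTGGCCCAGATC", "ACAGGGCCACAAATC"))
```

Changing the gap value shifts the balance between accepting mismatches and opening gaps, so the "optimal" alignment always depends on the chosen criteria.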

34 Unfortunately, reducing the number of mismatches results in an increase in the number of gaps, and vice versa.

35 Symbols used: matches, mismatches, nucleotides in gaps, gaps.

36 The scoring scheme comprises a gap penalty and a scoring matrix, M(a,b), that specifies the score for each type of match (a = b) or mismatch (a ≠ b). The units in a scoring matrix may be the nucleotides in the DNA or RNA sequences, the codons in protein-coding regions, or the amino acids in protein sequences.
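
A minimal sketch of how such a scoring scheme can be applied to a given alignment. The matrix values (+1 for a match, -1 for a mismatch), the linear penalty of 2 per gap position, and the function name `score_alignment` are all assumptions made for illustration. The two example alignments are built from the sequences ACTGGCCCAGATC and ACAGGGCCACAAATC.

```python
def score_alignment(aligned_a: str, aligned_b: str, M: dict, gap_penalty: int) -> int:
    """Sum M(a, b) over paired units and subtract gap_penalty per gap position."""
    score = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            score -= gap_penalty
        else:
            score += M[(a, b)]
    return score


bases = "ACGT"
# Hypothetical nucleotide scoring matrix: +1 when a = b, -1 otherwise.
M = {(a, b): (1 if a == b else -1) for a in bases for b in bases}

internal_gaps = score_alignment("ACT-GGCC-CAGATC", "ACAGGGCCACAAATC", M, gap_penalty=2)
terminal_gaps = score_alignment("ACTGGCCCAGATC--", "ACAGGGCCACAAATC", M, gap_penalty=2)
print(internal_gaps, terminal_gaps)
```

Under these assumed values the alignment with two internal gaps scores higher than the one with two terminal gaps, because it trades the same number of gap positions for fewer mismatches.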

37 If you want to know the secrets behind the black box of sequence alignment, you will have to take a class in BIOINFORMATICS.

38 Multiple Sequence Alignment is infinitely more complicated than pairwise alignment

39 Multiple Sequence Alignment has no practical exact solution for more than a few sequences. It is solved heuristically.

40 A Multiple Sequence Alignment
GCGGCTCA TCAGGTAGTT GGTG-G  Spinach
GCGGCCCA TCAGGTAGTT GGTG-G  Rice
GCGTTCCA TC--CT-GTT GGTGTG  Mosquito
GCGTCCCA TCAGCTAGTT GTTG-G  Monkey
GCGGCGCA TTAGCTAGTT GGTG-A  Human
***...** *.--.*-*** *.**-.
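
The annotation line under a multiple alignment can be recomputed column by column. The sketch below assumes the marking convention that appears to be used on this slide: '*' when all sequences agree, '-' when any sequence has a gap, and '.' otherwise. The function name `conservation_line` is hypothetical, and the block spacing of the slide is omitted.

```python
def conservation_line(rows):
    """Mark each alignment column: '*' identical, '-' contains a gap, '.' variable."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned sequences must have equal length")
    marks = []
    for column in zip(*rows):
        if "-" in column:
            marks.append("-")
        elif len(set(column)) == 1:
            marks.append("*")
        else:
            marks.append(".")
    return "".join(marks)


alignment = {
    "Spinach":  "GCGGCTCATCAGGTAGTTGGTG-G",
    "Rice":     "GCGGCCCATCAGGTAGTTGGTG-G",
    "Mosquito": "GCGTTCCATC--CT-GTTGGTGTG",
    "Monkey":   "GCGTCCCATCAGCTAGTTGTTG-G",
    "Human":    "GCGGCGCATTAGCTAGTTGGTG-A",
}
line = conservation_line(list(alignment.values()))
print(line)
```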