Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________.

Slides:



Advertisements
Similar presentations
Alignment methods Introduction to global and local sequence alignment methods Global : Needleman-Wunch Local : Smith-Waterman Database Search BLAST FASTA.
Advertisements

Sequence allignement 1 Chitta Baral. Sequences and Sequence allignment Two main kind of sequences –Sequence of base pairs in DNA molecules (A+T+C+G)*
Sources Page & Holmes Vladimir Likic presentation: 20show.pdf
Measuring the degree of similarity: PAM and blosum Matrix
1 ALIGNMENT OF NUCLEOTIDE & AMINO-ACID SEQUENCES.
Lecture 8 Alignment of pairs of sequence Local and global alignment
Definitions Optimal alignment - one that exhibits the most correspondences. It is the alignment with the highest score. May or may not be biologically.
Introduction to Bioinformatics Burkhard Morgenstern Institute of Microbiology and Genetics Department of Bioinformatics Goldschmidtstr. 1 Göttingen, March.
Sequence Similarity Searching Class 4 March 2010.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Reminder -Structure of a genome Human 3x10 9 bp Genome: ~30,000 genes ~200,000 exons ~23 Mb coding ~15 Mb noncoding pre-mRNA transcription splicing translation.
Alignment methods and database searching April 14, 2005 Quiz#1 today Learning objectives- Finish Dotter Program analysis. Understand how to use the program.
Sequence Alignment Bioinformatics. Sequence Comparison Problem: Given two sequences S & T, are S and T similar? Need to establish some notion of similarity.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2004.
Introduction to bioinformatics
Sequence similarity.
Alignment methods June 26, 2007 Learning objectives- Understand how Global alignment program works. Understand how Local alignment program works.
Pairwise Alignment Global & local alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Computational Biology, Part 2 Sequence Comparison with Dot Matrices Robert F. Murphy Copyright  1996, All rights reserved.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
Sequence comparison: Local alignment
1 Introduction to Bioinformatics 2 Introduction to Bioinformatics. LECTURE 3: SEQUENCE ALIGNMENT * Chapter 3: All in the family.
Sequencing a genome and Basic Sequence Alignment
TM Biological Sequence Comparison / Database Homology Searching Aoife McLysaght Summer Intern, Compaq Computer Corporation Ballybrit Business Park, Galway,
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Developing Pairwise Sequence Alignment Algorithms
Sequence Alignment.
Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch , Ch 5.1, get what you can.
Bioinformatics Sequences alignment ____________________________________________________________________________________________________________________.
Sequence Alignment Algorithms Morten Nielsen Department of systems biology, DTU.
Pairwise alignments Introduction Introduction Why do alignments? Why do alignments? Definitions Definitions Scoring alignments Scoring alignments Alignment.
Pairwise & Multiple sequence alignments
An Introduction to Bioinformatics
Pairwise Alignment, Part I Constructing the Values and Directions Tables from 2 related DNA (or Protein) Sequences.
Protein Evolution and Sequence Analysis Protein Evolution and Sequence Analysis.
CISC667, S07, Lec5, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Pairwise sequence alignment Needleman-Wunsch (global alignment)
Computational Biology, Part 3 Sequence Alignment Robert F. Murphy Copyright  1996, All rights reserved.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Sequence Alignment Goal: line up two or more sequences An alignment of two amino acid sequences: …. Seq1: HKIYHLQSKVPTFVRMLAPEGALNIHEKAWNAYPYCRTVITN-EYMKEDFLIKIETWHKP.
Alignment methods April 26, 2011 Return Quiz 1 today Return homework #4 today. Next homework due Tues, May 3 Learning objectives- Understand the Smith-Waterman.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Pairwise alignment of DNA/protein sequences I519 Introduction to Bioinformatics, Fall 2012.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
Scoring Matrices April 23, 2009 Learning objectives- 1) Last word on Global Alignment 2) Understand how the Smith-Waterman algorithm can be applied to.
Lecture 6. Pairwise Local Alignment and Database Search Csc 487/687 Computing for bioinformatics.
Sequencing a genome and Basic Sequence Alignment
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Applied Bioinformatics Week 3. Theory I Similarity Dot plot.
Bioinformatics GBIO Biological Sequences ____________________________________________________________________________________________________________________.
Alignment methods April 21, 2009 Quiz 1-April 23 (JAM lectures through today) Writing assignment topic due Tues, April 23 Hand in homework #3 Why has HbS.
Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Sequence Alignment.
Sequence Alignment Abhishek Niroula Department of Experimental Medical Science Lund University
Bioinformatics GBIO Biological Sequences ____________________________________________________________________________________________________________________.
Copyright OpenHelix. No use or reproduction without express written consent1.
GBIO Bioinformatics ____________________________________________________________________________________________________________________ Kirill.
Techniques for Protein Sequence Alignment and Database Searching G P S Raghava Scientist & Head Bioinformatics Centre, Institute of Microbial Technology,
Sequence comparison: Local alignment
Sequence Alignment.
Pairwise sequence Alignment.
Pairwise Sequence Alignment
BCB 444/544 Lecture 7 #7_Sept5 Global vs Local Alignment
Pairwise Alignment Global & local alignment
Sequence Alignment Practical
Basic Local Alignment Search Tool (BLAST)
Presentation transcript:

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p1 Sequence Alignment Tutorial Presented by Kirill Bessonov Oct 2012

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p2 Talk Structure Introduction to sequence alignments Methods / Logistics – Global Alignment: Needleman-Wunsch – Local Alignment: Smith-Waterman Illustrations of two types of alignments – step by step local alignment Computational implementation of alignment – Retrieval of sequences using R – Alignment of sequences using R Homework

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p3 Sequence Alignments Comparing two objects is intuitive. Likewise sequence pairwise alignments provide info on: – evolutionary distance between species (e.g. homology) – new functional motifs / regions – genetic manipulation (e.g. alternative splicing) – new functional roles of unknown sequence – identification of binding sites of primers / TFs – de novo genome assembly alignment of the short “reads” from high-throughput sequencer (e.g. Illumina or Roche platforms)

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p4 Comparing two sequences There are two ways of pairwise comparison – Global using Needleman-Wunsch algorithm (NW) – Local using Smith-Waterman algorithm (SW) Both approaches use similar methodology, but have completely different objectives – Global alignment (NW) tries to align the “whole” sequence more restrictive than local alignment – Local alignment (SW) tries to align portions (e.g. motifs) of given sequences more flexible as considers “parts” of the sequence works well on highly divergent sequences entire sequence perfect match unaligned rest of the sequence aligned portion

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p5 Global alignment (NW) Sequences are aligned end-to-end along their entire length Many possible alignments are produced – The alignment with the highest score is chosen Naïve algorithm is very inefficient (O exp ) – To align sequence of length 15, need to consider Possibilities # = (insertion, deletion, gap) 15 = 3 15 = 1,4*10 7 – Impractical for sequences of length >20 nt Used to analyze homology/similarity of entire: – genes and proteins – assess gene/protein overall homology between species

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p6 Methodology of global alignment (1 of 4)

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p7 Methodology of global alignment (2 of 4) The matrix should have extra column and row – M +1 columns, where M is the length sequence M – N +1 rows, where N is the length of sequence N Initialize the matrix by introducing gap penalty at every initial position along rows and columns Scores at each cell are cumulative WHAT W -2 H -4 Y -6 -2

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p8 Methodology of global alignment (3 of 4) For each cell consider all three possibilities 1)Gap (horiz/vert) 2)Match (diag) 3)Mismatch(diag) Select the maximum score for each cell and fill the matrix WHAT W H Y WH 0 -4 W -2-4 WH W -2+2 WH W

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p9 Methodology of global alignment (4 of 4) Select the most very bottom right cell Consider different path(s) going to very top left cell – Path is constructed by finding the source cell w.r.t. the current cell – How the current cell value was generated? From where? WHAT WHY- WH-Y Overall score = 1 Overall score = 1 Select the best alignment(s) WHAT W H Y WHAT W H Y

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p10 Local alignment (SW) Sequences are aligned to find regions (part of the original sequence) where the best alignment occurs Local similarity context (aligning parts) More detailed / “micro” appoach Ideal for finding short motifs, DNA binding sites Works well on highly divergent sequences – Sequences with highly variable introns but highly conserved and sparse exons

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p11 Methodology of local alignment (1 of 4)

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p12 Methodology of local alignment (2 of 4) Construct the MxN alignment matrix with M+1 columns and N+1 rows Initialize the matrix by introducing gap penalty at 1 st row and 1 st column WHAT W 0 H 0 Y 0

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p13 Methodology of local alignment (3 of 4) For each subsequent cell consider all possibilities (i.e. motions) 1)Vertical 2)Horizontal 3)Diagonal For each cell select the highest score – If score is negative  assign zero WHAT W H Y

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p14 Methodology of local alignment (4 of 4) Select the initial cell with the highest score(s) Consider different path(s) leading to score of zero – Trace-back the cell values – Look how the values were originated (i.e. path) WH Mathematically – where S(I, J) is the score for subsequences I and J WHAT W H Y total score of 4 B A J I

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p15 Local alignment illustration (1 of 2)

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p16 Local alignment illustration (2 of 2) GGCTCAATCA A C C T A A G G GGCTCAATCA A 0 C 0 C 0 T 0 A 0 A 0 G 0 G 0 GGCTCAATCA A C 0 C 0 T 0 A 0 A 0 G 0 G 0 GGCTCAATCA A C 0 C 0 T 0 A 0 A 0 G 0 G 0 GGCTCAATCA A C 0 C 0 T 0 A 0 A 0 G 0 G 0 GGCTCAATCA A C 0 C 0 T 0 A 0 A 0 G 0 G 0 GGCTCAATCA A C C 0 T 0 A 0 A 0 G 0 G 0 GGCTCAATCA A C C T 0 A 0 A 0 G 0 G 0 GGCTCAATCA A C C T A 0 A 0 G 0 G 0 GGCTCAATCA A C C T A A 0 G 0 G 0 GGCTCAATCA A C C T A A G 0 G 0 GGCTCAATCA A C C T A A G G 0 GGCTCAATCA A C C T A A G G

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p17 Local alignment illustration (3 of 3) CTCAAGGCTCAATCA CT-AA ACCT-AAGG Best score: 6 GGCTCAATCA A C C T A A G G in the whole seq. context

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p18 Aligning proteins Globally and Locally

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p19 Protein Alignment Protein local and global alignment follows the same rules as we saw with DNA/RNA Differences – alphabet of proteins is 22 residues long – special scoring/substitution matrices used – conservation and protein proprieties are taken into account E.g. residues that are totally different due to charge such as polar Lysine and apolar Glycine are given a low score

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p20 Substitution matrices Since protein sequences are more complex, matrices are collection of scoring rules These are 2D matrices reflecting comparison between sequence A and B Cover events such as – mismatch and perfect match Need to define gap penalty separately Popular BLOcks SUbstitution Matrix (BLOSUM)

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p21 BLOSUM-x matrices Constructed from aligned sequences with specific x% similarity – matrix built using sequences with no more then 50% similarity is called BLOSUM-50 For highly mutating / dissimilar sequences use – BLOSUM-45 and lower For highly conserved / similar sequences use – BLOSUM -62 and higher

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p22 BLOSUM 62 What diagonal represents? What is the score for substitution E  D (acid a.a.)? More drastic substitution K  I (basic to non-polar)? perfect match between a.a. Score = 2 Score = -3

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p23 Practical problem: Align following sequences both globally and locally using BLOSUM 62 matrix with gap penalty of -8 Sequence A: AAEEKKLAAA Sequence B: AARRIA

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p24 Aligning globally using BLOSUM 62 AAEEKKLAAA AA--RRIA-- Score: -14 Other alignment options? Yes AAEEKKLAAA A A R R I A

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p25 Aligning locally using BLOSUM 62 KKLA RRIA Score: 10 AAEEKKLAAA A A R R I A

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p26 Using R and SeqinR library for: Sequence Retrieval and Analysis

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p27 Protein database UniProt database ( has high quality protein data manually curatedhttp:// It is manually curated Each protein is assigned UniProt ID

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p28 Retrieving data from In search field one can enter either use UniProt ID or common protein name – example: myelin basic protein We will use retrieve data for P02686 Uniprot ID

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p29 Understanding fields Information is divided into categories Click on ‘Sequences’ category and then FASTA

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p30 FASTA format FASTA format is widely used and has the following parameters – Sequence name start with > sign – The fist line corresponds to protein name Actual protein sequence starts from 2 nd line

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p31 Retrieving protein data with R and SeqinR Can “talk” to UniProt database using R – SeqinR library is suitable for “Biological Sequences Retrieval and Analysis” Detailed manual could be found herehere – Install this library in your R environment install.packages("seqinr") library("seqinr") – Choose database to retrieve data from choosebank("swissprot") – Download data object for target protein (P02686) query("MBP_HUMAN", "AC=P02686") – See sequence of the object MBP_HUMAN MBP_HUMAN_seq = getSequence(MBP_HUMAN); MBP_HUMAN_seq

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p32 Dot Plot (comparison of 2 sequences) (1of2) 2D way to find regions of similarity between two sequences – Each sequence plotted on either vertical or horizontal dimension – If two a.a. at given positions are identical the dot is plotted – matching sequence segments appear as diagonal lines

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p33 Dot Plot (comparison of 2 sequences) (2of2) Visualize dot plot dotPlot(MBP_HUMAN_seq[[1]], MBP_MOUSE_seq[[1]],xlab="MBP - Human", ylab = "MBP - Mouse") - Is there similarity between human and mouse form of MBP protein? - Where is the difference in the sequence between the two isoforms? Let’s compare two protein sequences – Human MBP (Uniprot ID: P02686) – Mouse MBP (Uniprot ID: P04370) Download 2 nd mouse sequence query("MBP_MOUSE", "AC=P04370"); MBP_MOUSE_seq = getSequence(MBP_MOUSE);

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p34 Using R and Biostrings library for: Pairwise global and local alignments

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p35 Installing Biostrings library DNA_subst_matrix

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p36 Global alignment using R and Biostrings Create two sting vectors (i.e. sequences) seqA = "GATTA" seqB = "GTTA" Use pairwiseAlignment() and the defined rules globalAlignAB = pairwiseAlignment(seqA, seqB, substitutionMatrix = DNA_subst_matrix, gapOpening = -2, scoreOnly = FALSE, type="global") Visualize best paths (i.e. alignments) globalAlignAB Global PairwiseAlignedFixedSubject (1 of 1) pattern: [1] GATTA subject: [1] G-TTA score: 2

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p37 Local alignment using R and Biostrings Input two sequences seqA = "AGGATTTTAAAA" seqB = "TTTT" The scoring rules will be the same as we used for global alignment globalAlignAB = pairwiseAlignment(seqA, seqB, substitutionMatrix = DNA_subst_matrix, gapOpening = -2, scoreOnly = FALSE, type="local") Visualize alignment globalAlignAB Local PairwiseAlignedFixedSubject (1 of 1) pattern: [5] TTTT subject: [1] TTTT score: 8

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p38 Aligning protein sequences Protein sequences alignments are very similar except the substitution matrix is specified data(BLOSUM62) BLOSUM62 Will align sequences seqA = "PAWHEAE" seqB = "HEAGAWGHEE" Execute the global alignment globalAlignAB <- pairwiseAlignment(seqA, seqB, substitutionMatrix = "BLOSUM62", gapOpening = -2, gapExtension = -8, scoreOnly = FALSE)

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p39 Summary We had touched on practical aspects of – Global and local alignments Thoroughly understood both algorithms Applied them both on DNA and protein seq. Learned on how to retrieve sequence data Learned on how to retrieve sequences both with R and using UniProt Learned how to align sequences using R

Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________ Kirill Bessonov CH4 :p40 Resources Online Tutorial on Sequence Alignment – bioinformatics.readthedocs.org/en/latest/src/chapter4.html bioinformatics.readthedocs.org/en/latest/src/chapter4.html Graphical alignment of proteins – Documentation on libraries – Biostings: – SeqinR: