Sequence alignment BI420 – Introduction to Bioinformatics

Sequence alignment BI420 – Introduction to Bioinformatics BI420 Fall 2012 Department of Biology, Boston College

Biologically significant alignment 1. Find two evolutionarily related sequences (subunits of human hemoglobin) in GenBank: http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi hba_human hbb_human 2. Save sequences on the Desktop and rename: hba_human.fasta & hbb_human.fasta

Biologically significant alignment 3. Visit a web-based pair-wise alignment program: http://artedi.ebc.uu.se/programs/pairwise.html 4. Upload our two proteins:

Biologically significant alignment 5. Create a pair-wise alignment between the two protein sequences:

Biologically plausible alignment Retrieve another sequence, leghemoglobin: Leghemoglobin Create a pair-wise alignment with human hemoglobin A:

Biologically plausible alignment http://en.wikipedia.org/wiki/Leghemoglobin

Spurious alignment Retrieve the sequence of a human BRCA1 gene variant, clearly not related to hemoglobin: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein&cmd=search&term=NP_009225.1 Make the pair-wise alignment: Examples from: Biological sequence analysis. Durbin, Eddy, Krogh, Mitchison

How Alignment Works

Alignment types
How do we align the words CRANE and FRAME?
    CRANE
     || |
    FRAME
3 matches, 2 mismatches.
How do we align words that differ in length?
    COELACANTH
      || |||
    P-ELICAN--
    COELACANTH
      || |||
    -PELICAN--
5 matches, 2 mismatches, 3 gaps.
In this case, if we assign +1 point for each match and -1 for each mismatch or gap, we get 5 x 1 + 2 x (-1) + 3 x (-1) = 0. This is the alignment score.
Examples from: BLAST. Korf, Yandell, Bedell
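The counting above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function name `score_alignment` and its parameters are illustrative assumptions, and `-` is assumed to mark a gap.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-1):
    """Count matches, mismatches and gaps in two aligned strings and score them.

    Both strings must already be aligned (same length); '-' marks a gap.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    total = matches * match + mismatches * mismatch + gaps * gap
    return matches, mismatches, gaps, total


print(score_alignment("CRANE", "FRAME"))            # (3, 2, 0, 1)
print(score_alignment("COELACANTH", "P-ELICAN--"))  # (5, 2, 3, 0)
```

Changing the `match`, `mismatch` and `gap` arguments gives other score functions without touching the counting logic.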

Finding the “best” alignment
    COELACANTH
       | |||
    PE-LICAN--    S = -2
    COELACANTH
      ||
    P-EL-ICAN-    S = -6
    COELACANTH
    PELICAN---    S = -10
    COELACANTH
      || |||
    P-ELICAN--    S = 0

How should we align JACKALOPE and ANTELOPE?
More mismatches:
    JACKALOPE
    -ANTELOPE
More gaps:
    JACKA---LOPE
    ----ANTELOPE
The choice depends on the score function.
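To make the dependence on the score function concrete, here is a small Python sketch (an illustration added here, not from the slides). The two scoring schemes are assumed values chosen only to show that the preferred alignment can flip.

```python
def alignment_score(top, bottom, match, mismatch, gap):
    """Sum per-column scores of two aligned strings; '-' marks a gap."""
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


# The two candidate alignments of JACKALOPE and ANTELOPE.
candidates = {
    "more mismatches": ("JACKALOPE", "-ANTELOPE"),
    "more gaps": ("JACKA---LOPE", "----ANTELOPE"),
}

# Two hypothetical score functions: mild vs. harsh mismatch penalty.
schemes = {
    "mismatch -1, gap -1": dict(match=1, mismatch=-1, gap=-1),
    "mismatch -3, gap -1": dict(match=1, mismatch=-3, gap=-1),
}


def best_candidate(scheme):
    """Return the name of the candidate alignment with the highest score."""
    return max(candidates, key=lambda name: alignment_score(*candidates[name], **scheme))


for label, scheme in schemes.items():
    print(label, "->", best_candidate(scheme))
```

With equal penalties the alignment with more mismatches wins (score 1 against -2); once mismatches cost three times as much as gaps, the gapped alignment wins (-2 against -5).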

Global vs. local alignment
Aligning the words SHAKE and SPEARE:
1. Global alignment: aligning the two sequences along their entire length (even if it means adding many “gaps”):
    SH-AKE
    |  | |
    SPEARE
  -OR-
    SHAKE---
    |   |
    SP--EARE
2. Local alignment: aligning only a well-matching section of the two sequences (possibly leaving the ends unaligned):
     SHAKE
       | |
    SPEARE
Example from: Higgs and Attwood

MATLAB example – global alignment
MATLAB Bioinformatics Toolbox sequence analysis demo: Aligning pairs of sequences
    >> s1 = 'ACGATT'
    >> s2 = 'CCGACTA'
    >> [score, ga] = nwalign(s1,s2)
    score = 7.3333
    ga =
    ACGA-TT
     ||| |:
    CCGACTA

MATLAB example – local alignment
MATLAB Bioinformatics Toolbox sequence analysis demo: Aligning pairs of sequences
    >> s1 = 'ACGATT'
    >> s2 = 'CCGACTA'
    >> [score, sa] = swalign(s1,s2)
    score = 10
    sa =
    CGATT
    ||| |
    CGACT

Score function
Pair-wise amino-acid scores S(a_i, b_j) (PAM250 scoring scheme) plus a gap penalty g = 6, so each gap position contributes -6 to the score.
Example from: Higgs and Attwood

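Global alignment – Needleman-Wunsch
Exact recursion scheme to calculate scores from already known scores:
$$
H(i,j) = \max
\begin{cases}
H(i-1,j-1) + S(a_i,b_j) & \text{(diagonal)} \\
H(i-1,j) - g & \text{(vertical)} \\
H(i,j-1) - g & \text{(horizontal)}
\end{cases}
$$
Here g is the gap penalty, subtracted once per gap position (g = 6 in this example, i.e. a gap score of -6).
Example from: Higgs and Attwood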

Global alignment – Needleman-Wunsch Example: Align the two sequences SHAKE and SPEARE Example from: Higgs and Attwood

Global alignment – Needleman-Wunsch Initialization (filling the top row and left column from gap scores): Example from: Higgs and Attwood

Global alignment – Needleman-Wunsch Filling cell (1,1): Example from: Higgs and Attwood

Global alignment – Needleman-Wunsch Filling the rest of the cells (i,j): Example from: Higgs and Attwood

Global alignment – Needleman-Wunsch
Tracing back to read out the alignment. Best global alignment:
    S-HAKE
    SPEARE
Example from: Higgs and Attwood
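The initialization, fill and trace-back steps can be sketched in a few lines of Python. This is an illustrative sketch, not the code behind the slides: it assumes a simple match/mismatch score instead of PAM250, and the function name and parameters are hypothetical. The gap penalty `g` is subtracted once per gap position, as in the recursion above.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, g=1):
    """Return (score, aligned_a, aligned_b) for one best global alignment.

    Assumes a simple match/mismatch score rather than PAM250;
    g is a gap penalty, subtracted once per gap position.
    """
    n, m = len(a), len(b)
    # H[i][j] = best score for aligning a[:i] with b[:j].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # Initialization: first column and first row consist of gaps only.
    for i in range(1, n + 1):
        H[i][0] = -i * g
    for j in range(1, m + 1):
        H[0][j] = -j * g
    # Fill: each cell takes the best of diagonal, vertical and horizontal moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] - g, H[i][j - 1] - g)
    # Trace back from the lower right corner to the upper left corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and H[i][j] == H[i - 1][j] - g:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return H[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("COELACANTH", "PELICAN")
print(score)   # 0
print(top)
print(bottom)
```

When several alignments tie for the best score, the trace-back above reports only one of them; which one depends on the order in which the three moves are tested.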

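Global alignment – Needleman-Wunsch
The Needleman-Wunsch procedure is exact. Because every cell takes the best of all the ways to reach it, the recursion implicitly accounts for every possible alignment without listing them one by one. It is therefore guaranteed to find the best global alignment under the chosen score function.
Example from: Higgs and Attwood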

Local alignment – Smith-Waterman
The Smith-Waterman algorithm finds the optimal LOCAL alignment. It works similarly to the Needleman-Wunsch GLOBAL alignment algorithm. The recursion scheme changes in two ways:
1. If the best score for a cell is negative, we replace it by 0 (start over).
2. Gaps at the boundary are ignored: the top row and left column get a score of 0.
$$
H(i,j) = \max
\begin{cases}
H(i-1,j-1) + S(a_i,b_j) & \text{(diagonal)} \\
H(i-1,j) - g & \text{(vertical)} \\
H(i,j-1) - g & \text{(horizontal)} \\
0 & \text{(start over)}
\end{cases}
$$
Here g is the gap penalty, subtracted once per gap position (g = 6 in this example, i.e. a gap score of -6).
Example from: Higgs and Attwood

Local alignment – Smith-Waterman Initialization Example from: Higgs and Attwood

Local alignment – Smith-Waterman Filling the cells Example from: Higgs and Attwood

Local alignment – Smith-Waterman
Trace-back: find the path that contains the highest score. Best local alignment:
     SHAKE
    SPEARE
Example from: Higgs and Attwood
Exercise: align the two sequences TTCAC and CTCAA using scores of +1 for a match and -1 for either a gap or a mismatch.
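One way to work through the TTCAC / CTCAA exercise is the Python sketch below. It is an illustration under the exercise's stated scores (+1 match, -1 mismatch, -1 gap), not the code used in the lecture; the function name and parameters are assumptions. The trace-back starts at the highest-scoring cell and stops at the first cell with score 0.

```python
def smith_waterman(a, b, match=1, mismatch=-1, g=1):
    """Return (score, aligned_a, aligned_b) for one best local alignment.

    g is a gap penalty, subtracted once per gap position.
    """
    n, m = len(a), len(b)
    # Boundary cells stay at 0: gaps at the boundary are ignored.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets the alignment start over instead of going negative.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] - g, H[i][j - 1] - g)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a 0 is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - g:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTCAC", "CTCAA"))  # (3, 'TCA', 'TCA')
```

Under these scores the best local alignment is the shared stretch TCA, with a score of 3; the unmatched ends of both sequences are simply left out.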

Local alignment – Smith-Waterman
The Smith-Waterman procedure is also exact. The recursion implicitly accounts for every possible local alignment without listing them one by one, so it is guaranteed to find the best local alignment under the chosen score function.
Example from: Higgs and Attwood

Example of a scoring matrix for Amino Acids The scoring matrix describes the scores for amino acid matches/mismatches. Scores are affected by biochemical similarity of amino acids. Note: this is not an alignment matrix!

Similar algorithms can be used for multiple alignment
The multiple alignment of 24 hexokinase protein sequences from various species. However, real multiple alignment programs (e.g. ClustalW) are usually heuristic rather than exact.

Applications of Alignment

Alignment is used for mapping sequence reads to the genome

Alignment is used in similarity search
Alignment: determining how sequences have descended from a common ancestor.
Similarity search: determining which sequences are related to one another. Requires scoring of each alignment.
(Figure labels: query, database)
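As a rough illustration of a similarity search, the Python sketch below scores one query against every database entry with a local alignment score and ranks the hits. The database contents and sequence names are made up for the example, and real search tools such as BLAST use faster heuristics rather than a full dynamic-programming pass per entry.

```python
def local_score(a, b, match=1, mismatch=-1, g=1):
    """Best Smith-Waterman local alignment score of a and b (score only)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(0, previous[j - 1] + s, previous[j] - g, current[j - 1] - g)
            best = max(best, current[j])
        previous = current
    return best


def search(query, database):
    """Return (name, score) pairs sorted from best to worst local score."""
    hits = [(name, local_score(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database; names and sequences are invented for illustration.
database = {"seq1": "CTCAA", "seq2": "GGGGG", "seq3": "ATTCACG"}
print(search("TTCAC", database))  # [('seq3', 5), ('seq1', 3), ('seq2', 0)]
```

Only the score is kept here, so two rows of the matrix are enough; the full matrix is needed only when the alignment itself has to be traced back.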

Alignment Exercises

Visualizing pair-wise alignments Visit a web server running a dot-plotter: http://bioweb.pasteur.fr/seqanal/interfaces/dotmatcher.html Upload hba_human and hbb_human, and create dot-plot:
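The idea behind a dot plot can be tried offline with a short Python sketch. This is a simplified illustration that marks only identical residues; the dotmatcher server scores windows of residues with a scoring matrix, which this sketch does not attempt. The function name is an assumption, and the short words from the earlier slides stand in for the hemoglobin sequences.

```python
def dot_plot(rows_seq, cols_seq):
    """Return one text row per residue of rows_seq; '*' marks identical residues."""
    return ["".join("*" if r == c else "." for c in cols_seq) for r in rows_seq]


print(" ", "SPEARE")
for residue, row in zip("SHAKE", dot_plot("SHAKE", "SPEARE")):
    print(residue, row)
```

Runs of `*` along a diagonal correspond to stretches where the two sequences line up without gaps; a shift from one diagonal to another indicates an insertion or deletion.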

MATLAB example MATLAB bioinformatics toolbox sequence analysis demo: Aligning pairs of sequences