Lecture 14 Algorithm Analysis

Arne Kutzner, Hanyang University / Seoul, Korea

Sequence Alignments

Sequence Alignment Problem
Given two sequences a and b over some alphabet Σ.
Problem: find a scheme (by inserting gaps, written "-") so that a and b fit together.
Example: a = GATTACATAAGTTTT, b = GCATGCUTGCTCTT
Possible alignment (equal symbols in a column are a match, different symbols a mismatch, "-" marks a gap):
G-ATTACATAAG-TTTT
GCATG-CUT--GCTCTT
2016/11 Algorithm Analysis
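To make the scoring of an alignment concrete, here is a minimal Python sketch that scores an alignment given as two rows of equal length. It assumes the scores match +1, mismatch -1 and gap -1 (the values used on the later slides); the function name `alignment_score` is hypothetical.

```python
def alignment_score(top: str, bottom: str,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Score an alignment given as two equal-length rows, '-' marking a gap."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap        # gap column
        elif x == y:
            score += match      # match column
        else:
            score += mismatch   # mismatch column
    return score


a = "GATTACATAAGTTTT"
b = "GCATGCUTGCTCTT"
top = "G-ATTACATAAG-TTTT"
bottom = "GCATG-CUT--GCTCTT"

# Removing the gaps must give back the original sequences.
assert top.replace("-", "") == a and bottom.replace("-", "") == b
print(alignment_score(top, bottom))  # 1 (9 matches, 3 mismatches, 5 gaps)
```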

Instances of the Alignment Problem
Global alignment: end-to-end alignment of the complete sequences
Local alignment: alignment of the best-matching subsections
Example: align FTALLLAAV to FTFTALILLAVAV.

Needleman-Wunsch Algorithm
For global alignments
Input:
- two strings a and b over some alphabet Σ
- a scoring system that defines
  - the bonus or penalty for matches and mismatches
  - the penalty for inserting a gap of one symbol into a
  - the penalty for inserting a gap of one symbol into b (equivalent to deleting one symbol of a)
Technique used by the algorithm: dynamic programming based on the computation of a matrix

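Matrix Initialization
The rows of the matrix F correspond to the symbols of a, the columns to the symbols of b. The first column is initialized with $F_{i,0} = d \cdot i$ and the first row with $F_{0,j} = d \cdot j$, where d is the gap penalty; for d = -1 both borders read 0, -1, -2, -3, …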

Compute Cell Values
Each cell $F_{i,j}$ is computed from its three neighbors. Example with $F_{i-1,j-1} = 3$, $F_{i-1,j} = 2$, $F_{i,j-1} = 1$ and the scores match +1, mismatch -1, gap -1:
- diagonal, $a_i = b_j$ (match, +1): 3 + 1 = 4
- diagonal, $a_i \neq b_j$ (mismatch, -1): 3 - 1 = 2
- delete (gap in b, -1), from $F_{i-1,j}$: 2 - 1 = 1
- insert (gap in a, -1), from $F_{i,j-1}$: 1 - 1 = 0
Take the maximum of these values as the value of $F_{i,j}$ and store the direction (arrow) it came from:
$$F_{i,j} = \max\{F_{i-1,j-1} + S(a_i, b_j),\; F_{i-1,j} + d,\; F_{i,j-1} + d\}$$

Pseudocode for Matrix Computation: NW-Matrix(A, B, S, d, F)
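The slide's pseudocode itself is not reproduced; as a stand-in, the following Python sketch fills the matrix F as described on the previous slides. It follows the signature NW-Matrix(A, B, S, d, F) only loosely: S is passed as a function, d is the (negative) gap penalty, and F is returned instead of being an output parameter. The names `nw_matrix` and `simple_score` are hypothetical.

```python
from typing import Callable


def nw_matrix(a: str, b: str, s: Callable[[str, str], int], d: int) -> list[list[int]]:
    """Fill the Needleman-Wunsch matrix F; rows follow a, columns follow b."""
    m, n = len(a), len(b)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    # Initialization: F[i][0] = d * i and F[0][j] = d * j.
    for i in range(m + 1):
        F[i][0] = d * i
    for j in range(n + 1):
        F[0][j] = d * j
    # Each inner cell takes the maximum of its three candidate values.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # match / mismatch
                F[i - 1][j] + d,                          # delete (gap in b)
                F[i][j - 1] + d,                          # insert (gap in a)
            )
    return F


def simple_score(x: str, y: str) -> int:
    """Match +1, mismatch -1."""
    return 1 if x == y else -1


F = nw_matrix("GATTACA", "GCATGCU", simple_score, d=-1)
print(F[-1][-1])  # 0, the optimal global alignment score
```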

Example (figure): the completed matrix, with arrows marking match, mismatch, insertion, and deletion steps.

Alignment Computation on the Foundation of the Matrix
Start at the bottom cell of the rightmost column and follow the stored arrows until you reach the leftmost column or the topmost row. The situation can be ambiguous (a tie in the maximum leaves a cell with more than one arrow), so there can be more than one optimal alignment.

Pseudocode for Alignment Computation
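A possible Python sketch of the alignment computation, assuming the scoring match +1, mismatch -1, gap -1 and the example sequences GATTACA and GCATGCU from the continued example. Instead of storing arrows, this version recomputes at each cell which neighbor produced its value, checking diagonal, then up, then left; that tie-breaking order is an assumption and decides which of several optimal alignments is returned. The names `nw_align` and `simple_score` are hypothetical.

```python
from typing import Callable


def simple_score(x: str, y: str) -> int:
    """Match +1, mismatch -1."""
    return 1 if x == y else -1


def nw_align(a: str, b: str, s: Callable[[str, str], int], d: int) -> tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b) of one optimal global alignment."""
    m, n = len(a), len(b)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        F[i][0] = d * i
    for j in range(n + 1):
        F[0][j] = d * j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)

    # Traceback from the bottom-right cell back to the top-left corner.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1])      # match or mismatch
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            top.append(a[i - 1])      # deletion (gap in b)
            bottom.append("-")
            i -= 1
        else:
            top.append("-")           # insertion (gap in a)
            bottom.append(b[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(top)), "".join(reversed(bottom))


print(nw_align("GATTACA", "GCATGCU", simple_score, d=-1))  # (0, 'G-ATTACA', 'GCA-TGCU')
```

With this tie-breaking order the sketch returns G-ATTACA / GCA-TGCU; preferring other directions on ties yields the other optimal alignments.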

Example (cont.)
Tracing back through the matrix for a = GATTACA and b = GCATGCU gives three alignments, each with score 0 (match +1, mismatch -1, gap -1):
G-ATTACA
GCATG-CU
(columns 2, 5, 6, 8: insertion, mismatch, deletion, mismatch)
G-ATTACA
GCAT-GCU
G-ATTACA
GCA-TGCU

Complexity Analysis
Let m = length(a) and n = length(b).
Matrix computation: $\Theta(m \cdot n)$
Alignment computation: $O(\max\{m, n\})$
Together: $\Theta(m \cdot n)$
In practice this is quite expensive with respect to both time and space, because the whole matrix is filled and stored.

Similarity Matrix
The static values for match and mismatch can be replaced by a similarity matrix.
In the field of Bio-IT, several predefined similarity matrices exist for amino acids:
- BLOSUM (BLOcks SUbstitution Matrix)
- PAM (Point Accepted Mutation)
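As an illustration of replacing the static match/mismatch values by a matrix lookup, the sketch below uses a hypothetical toy nucleotide matrix (identical bases +2, transitions A↔G and C↔T -1, transversions -2). It is not BLOSUM or PAM; the published matrices are available in libraries such as Biopython via `Bio.Align.substitution_matrices.load("BLOSUM62")`. The names `SIMILARITY` and `nw_score` and the gap penalty -2 are assumptions.

```python
BASES = "ACGT"
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

# Hypothetical toy similarity matrix (NOT a published matrix such as BLOSUM or PAM).
SIMILARITY = {
    (x, y): 2 if x == y else (-1 if frozenset((x, y)) in TRANSITIONS else -2)
    for x in BASES
    for y in BASES
}


def nw_score(a: str, b: str, sim: dict[tuple[str, str], int], d: int) -> int:
    """Needleman-Wunsch score where S(a_i, b_j) is looked up in a similarity matrix."""
    m, n = len(a), len(b)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        F[i][0] = d * i
    for j in range(n + 1):
        F[0][j] = d * j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sim[(a[i - 1], b[j - 1])],  # table lookup
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
    return F[m][n]


print(nw_score("AC", "GT", SIMILARITY, d=-2))  # -2: two transitions (A/G, C/T)
```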

Smith-Waterman Algorithm
For local alignments
Input: two strings a and b over some alphabet Σ, a similarity scoring scheme $s(a_i, b_j)$ over the alphabet Σ, and a gap-scoring scheme $W_k$ (the score deducted for a gap of length k)
Output: scoring matrix H
A variation of the Needleman-Wunsch algorithm, adapted so that it works for local alignments.

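Matrix Computation
The matrix H is built as follows, where m is the length of a and n is the length of b:
$$H_{k,0} = H_{0,l} = 0 \quad (0 \le k \le m,\; 0 \le l \le n)$$
$$H_{i,j} = \max\begin{cases} H_{i-1,j-1} + s(a_i, b_j) \\ \max_{k \ge 1}\{H_{i-k,j} - W_k\} \\ \max_{l \ge 1}\{H_{i,j-l} - W_l\} \\ 0 \end{cases} \quad (1 \le i \le m,\; 1 \le j \le n)$$
Differences to the NW algorithm: the first row and column are initialized with 0, and 0 is part of the maximum, so no cell becomes negative.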
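A minimal Python sketch of this matrix computation, assuming a linear gap scheme $W_k = k \cdot g$ (for which the inner maxima over k and l reduce to the direct neighbors $H_{i-1,j}$ and $H_{i,j-1}$) and the scores match +1, mismatch -1, g = 1. The name `sw_matrix` and the example sequences are hypothetical.

```python
def sw_matrix(a: str, b: str, match: int = 1, mismatch: int = -1,
              gap: int = 1) -> list[list[int]]:
    """Smith-Waterman matrix H, assuming the linear gap scheme W_k = k * gap."""
    m, n = len(a), len(b)
    # Difference 1 to NW: the first row and the first column stay 0.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                H[i - 1][j - 1] + s,   # align a_i with b_j
                H[i - 1][j] - gap,     # gap in b
                H[i][j - 1] - gap,     # gap in a
                0,                     # difference 2 to NW: never below 0
            )
    return H


H = sw_matrix("CCCCGATTACACCCC", "GGGGATTACAGGGG")
print(max(max(row) for row in H))  # 7, the score of the shared region GATTACA
```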

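Computation of Alignments
Backtracking works as in the NW algorithm, but with a significant difference: search for the cell with the highest score and start there. Backtracking stops at the first cell with score 0, so only a backtracking area around the best local match is visited.
(Figure: matrix H with the cell with the highest score and the backtracking area.)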
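The backtracking can be sketched in Python as follows, again assuming a linear gap penalty and match +1, mismatch -1, gap 1. The sketch remembers the highest-scoring cell while filling H, then walks back until it reaches a cell with score 0. The name `smith_waterman` and the test sequences are hypothetical; in them GATTACA is the only shared region, so it is the expected local alignment.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1,
                   gap: int = 1) -> tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b) of a best local alignment (linear gaps)."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] - gap, H[i][j - 1] - gap, 0)
            if H[i][j] > best:          # remember the cell with the highest score
                best, best_i, best_j = H[i][j], i, j

    # Backtracking starts at the highest-scoring cell and stops at a 0 cell.
    top, bottom = [], []
    i, j = best_i, best_j
    while H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            top.append(a[i - 1])        # gap in b
            bottom.append("-")
            i -= 1
        else:
            top.append("-")             # gap in a
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("CCCCGATTACACCCC", "GGGGATTACAGGGG"))  # (7, 'GATTACA', 'GATTACA')
```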

Complexity Analysis
Same as for the Needleman-Wunsch algorithm: let m = length(a) and n = length(b).
Matrix computation: $\Theta(m \cdot n)$
Backtracking (alignment computation): $O(\max\{m, n\})$
Together: $\Theta(m \cdot n)$
In practice quite expensive with respect to both time and space.

How to overcome the demanding space and time requirements of the NW and SW algorithms?
Many solutions exist; it is a long story.
Heuristic approaches:
- BLAST (one of today's standard tools for sequence alignment)
- BLAT (fast, but considerably less sensitive than BLAST)
- BWT-based approaches, e.g. Bowtie or BWA
Many of these tools/algorithms rely on some form of seeding before starting the core alignment.

Seeding Technique
Step 1: "Digest" (break into smaller pieces) sequence b; let us call b the query sequence.
Step 2 (seeding): Align these short segments quickly using some form of precomputed dictionary that comprises data for sequence a; let us call a the reference sequence.
Step 3: Use the output of step 2 to limit the search space for further alignment activities.

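Example: Local alignment by seeding
1. Digest: split the query sequence into segments (numbered 1 to 5 in the figure).
2. Align (seeding): locate the segments in the reference sequence.
3. Compute the area of interest: the section of the reference indicated by the seed hits.
4. Cut and SW-align: cut this section out of the reference and align it with the query sequence by the Smith-Waterman algorithm, which gives the local alignment.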
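The following Python sketch is a toy version of steps 1 to 3. It assumes non-overlapping k-mers as the query segments, an exact-match dictionary of all k-mers of the reference as the precomputed data, and a simple vote over diagonals (reference position minus query offset) to choose the area of interest. These choices, the padding, and the function names are all hypothetical; they are not how BLAST, BLAT, Bowtie or BWA actually work.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Precomputed dictionary: every k-mer of the reference -> its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def digest(query: str, k: int) -> list[tuple[int, str]]:
    """Step 1: break the query into non-overlapping k-mers, kept with their offsets."""
    return [(off, query[off:off + k]) for off in range(0, len(query) - k + 1, k)]


def area_of_interest(reference: str, query: str, k: int = 4, pad: int = 2):
    """Steps 2 and 3: seed the k-mers, then return a (start, end) window of the reference."""
    index = build_kmer_index(reference, k)
    votes: Counter[int] = Counter()
    for off, kmer in digest(query, k):
        for pos in index.get(kmer, []):
            votes[pos - off] += 1   # each exact hit votes for a diagonal
    if not votes:
        return None                 # no seed hit: nothing to align
    diagonal, _ = votes.most_common(1)[0]
    start = max(0, diagonal - pad)
    end = min(len(reference), diagonal + len(query) + pad)
    return start, end


reference = "T" * 10 + "GATTACACATGC" + "T" * 10
query = "GATTACACATGC"
start, end = area_of_interest(reference, query)
print(start, end, reference[start:end])  # 8 24 TTGATTACACATGCTT
```

Step 4 would then cut `reference[start:end]` out and align it with the query by Smith-Waterman.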

Efficient Seeding Technique
Suffix array: contains the starting positions of all suffixes of a string, in lexicographical order of the suffixes.
Example: for the word banana$, sort all suffixes and store their starting positions as an array.
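A naive Python construction of the suffix array for banana$. It sorts the suffixes directly, which is fine for a small example but far slower than the dedicated construction algorithms used in practice; the name `suffix_array` is hypothetical. In Python, "$" sorts before the letters, matching the usual convention for the end marker.

```python
def suffix_array(text: str) -> list[int]:
    """Naive construction: start positions sorted by the suffix starting there."""
    return sorted(range(len(text)), key=lambda i: text[i:])


text = "banana$"
sa = suffix_array(text)
for rank, pos in enumerate(sa):
    print(rank, pos, text[pos:])
# 0 6 $
# 1 5 a$
# 2 3 ana$
# 3 1 anana$
# 4 0 banana$
# 5 4 na$
# 6 2 nana$
```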

Searching a Suffix Array
Where does "an" occur in banana$? All suffixes that start with "an" are adjacent in the suffix array, so their range can be found by binary search.
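A sketch of this search by two binary searches over the suffix array, in Python. It builds the suffix array naively by sorting the suffixes, so the block runs on its own; the names `suffix_array` and `find_occurrences` are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    """Naive construction: start positions sorted by their suffixes."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the start positions of pattern in text (in suffix-array order)."""
    m = len(pattern)

    def prefix(rank: int) -> str:
        # The first m symbols of the suffix with the given rank.
        return text[sa[rank]:sa[rank] + m]

    # Lower bound: first rank whose prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first rank whose prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sa[start:lo]


text = "banana$"
sa = suffix_array(text)                   # [6, 5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "an"))   # [3, 1]
```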

How to search suffix arrays efficiently?
FM-index:
- makes use of the Burrows-Wheeler transform (BWT)
- stores precomputed symbol counts in tabular form (occurrence table)
- foundation of the aligners Bowtie and BWA
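A compact Python sketch of an FM-index for banana$: the BWT is read off the suffix array, the table C counts the symbols smaller than each symbol, the occurrence table stores prefix counts of every symbol in the BWT, and backward search narrows the suffix-array range one pattern symbol at a time, starting with the last one. Keeping the full suffix array and a full occurrence table is a simplification; production aligners such as Bowtie and BWA store compressed or sampled versions. All names are hypothetical.

```python
def build_fm_index(text: str):
    """Build suffix array, BWT, C table and occurrence table for text ending in '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the symbol preceding each sorted suffix (text[-1] == '$' wraps around).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of symbols in text that are lexicographically smaller than c.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[c][i]: number of occurrences of c in bwt[:i] (the occurrence table).
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, sym in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (sym == ch)
    return sa, bwt, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return the sorted start positions of pattern via FM-index backward search."""
    lo, hi = 0, len(sa)  # current suffix-array range [lo, hi)
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, bwt, c_table, occ = build_fm_index("banana$")
print(bwt)                                      # annb$aa
print(backward_search("an", sa, c_table, occ))  # [1, 3]
```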

ITBE working group at Hanyang University
Projects in the joint field of Information Technology (Computer Science) and Microbiology (Genetics):
- analysis of genes/gene families by using and combining available computational tools
- development of specially tailored solutions/algorithms for specific kinds of problems
Example (big data analysis): taxonomic heat diagrams that show the expression/occurrence of some gene with respect to a given taxonomy

Example: heat diagram for gene FAM72 (figure)