Lecture 14 Algorithm Analysis

1 Lecture 14 Algorithm Analysis
Arne Kutzner Hanyang University / Seoul Korea

2 Sequence Alignments

3 Sequence Alignment Problem
Given two sequences a, b over some alphabet Σ.
Problem: Find some scheme so that a and b fit together.
Example:
a = GATTACATAAGTTTT
b = GCATGCUTGCTCTT
Possible alignment:
G - A T T A C A T A A G - T T T T
G C A T G - C U T - - G C T C T T
Figure labels: match, mismatch, gap.
2016/11 Algorithm Analysis

4 Instances of the Alignment Problem
Global Alignment = end-to-end alignment
Local Alignment = best subsection alignment
Example: Align FTALLLAAV to FTFTALILLAVAV.

5 Needleman-Wunsch Algorithm
For global alignments.
Input:
Two strings a and b over some alphabet Σ
A scoring system that defines:
  the bonus and penalty for matches and mismatches
  the penalty for inserting a gap comprising one symbol into a
  the penalty for inserting a gap comprising one symbol into b (this is equal to the deletion of one symbol in a)
Technique used by the algorithm: dynamic programming based on matrix computation.

6 Matrix Initialization
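Initialize the border cells with $F_{i,0} = d \cdot i$ and $F_{0,j} = d \cdot j$, where $d$ is the gap penalty, $i$ indexes a and $j$ indexes b.
In the figure, $d = -1$, so both borders read -1, -2, -3, ...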

7 Compute Cell Values
Each cell $F_{i,j}$ is computed from its three neighbours $F_{i-1,j-1}$, $F_{i-1,j}$ and $F_{i,j-1}$:
Match (+1) or Mismatch (-1): added to $F_{i-1,j-1}$
Delete (gap in b) (-1): added to $F_{i-1,j}$
Insert (gap in a) (-1): added to $F_{i,j-1}$
Take the maximum of these values as the value of $F_{i,j}$ and store the direction of the blue arrow.
Figure example with $F_{i-1,j-1} = 3$, $F_{i-1,j} = 2$, $F_{i,j-1} = 1$: a match yields 3 + 1 = 4, a mismatch 3 - 1 = 2, a deletion 2 - 1 = 1, and an insertion 1 - 1 = 0.

8 Pseudocode for Matrix Computation
NW-Matrix(A, B, S, d, F)
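A minimal Python sketch of the matrix computation, assuming the standard Needleman-Wunsch recurrence with a single linear gap penalty d. The names nw_matrix, simple_score and the "diag"/"up"/"left" direction labels are illustrative choices, not taken from the slide; the parameters loosely mirror NW-Matrix(A, B, S, d, F), with S passed as a scoring function.

```python
def simple_score(x, y):
    """Illustrative scoring function: +1 for a match, -1 for a mismatch."""
    return 1 if x == y else -1


def nw_matrix(a, b, score, d):
    """Fill the Needleman-Wunsch matrix F and record one arrow per cell.

    i indexes a, j indexes b; d is the (negative) gap penalty.
    """
    m, n = len(a), len(b)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    arrow = [[None] * (n + 1) for _ in range(m + 1)]

    # Border initialization: F[i][0] = d * i and F[0][j] = d * j.
    for i in range(1, m + 1):
        F[i][0] = d * i
        arrow[i][0] = "up"
    for j in range(1, n + 1):
        F[0][j] = d * j
        arrow[0][j] = "left"

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            candidates = [
                (F[i - 1][j - 1] + score(a[i - 1], b[j - 1]), "diag"),  # match/mismatch
                (F[i - 1][j] + d, "up"),                                # delete (gap in b)
                (F[i][j - 1] + d, "left"),                              # insert (gap in a)
            ]
            # Keep the maximum; on ties the first candidate wins.
            F[i][j], arrow[i][j] = max(candidates, key=lambda c: c[0])
    return F, arrow


if __name__ == "__main__":
    F, arrow = nw_matrix("GATTACA", "GCATGCU", simple_score, -1)
    for row in F:
        print(row)
```

Only one arrow is stored per cell here, so ties are resolved arbitrarily; storing all maximal directions would be needed to enumerate every optimal alignment.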

9 Example
Figure legend: match, mismatch, insertion, deletion.

10 Alignment Computation on the Foundation of the Matrix
Start at the bottom cell in the rightmost column and follow the arrows until you reach the top-left cell. Once the leftmost column or the topmost row is reached, the remaining symbols are aligned against gaps.
The path can be ambiguous, so there can be more than one best alignment.

11 Pseudocode for Alignment Computation
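A minimal Python sketch of the alignment computation, assuming a matrix F that was filled with the standard Needleman-Wunsch recurrence. Instead of stored arrows, this version re-derives the direction from the cell values; nw_traceback and simple_score are illustrative names, and the small matrix in the usage part was filled by hand for a = GAT, b = GT with match +1, mismatch -1 and gap penalty -1.

```python
def simple_score(x, y):
    """Illustrative scoring function: +1 for a match, -1 for a mismatch."""
    return 1 if x == y else -1


def nw_traceback(F, a, b, score, d):
    """Walk from the bottom-right cell of F back to the top-left cell.

    Returns one optimal global alignment as two gapped strings.
    """
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            # Diagonal arrow: match or mismatch.
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            # Upward arrow: symbol of a against a gap in b.
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            # Leftward arrow: symbol of b against a gap in a.
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    # Matrix for a = "GAT" (rows) and b = "GT" (columns), d = -1.
    F = [
        [0, -1, -2],
        [-1, 1, 0],
        [-2, 0, 0],
        [-3, -1, 1],
    ]
    aligned_a, aligned_b = nw_traceback(F, "GAT", "GT", simple_score, -1)
    print(aligned_a)
    print(aligned_b)
```

The fixed order of the checks (diagonal, up, left) picks one path when several arrows are possible.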

12 Example (cont.)
Alignment 1:
G - A T T A C A
G C A T G - C U
Alignment 2:
G - A T T A C A
G C A T - G C U
Alignment 3:
G - A T T A C A
G C A - T G C U
Figure labels: insertion, mismatch, deletion, mismatch.

13 Complexity Analysis
Let $m$ = length(a) and $n$ = length(b).
Matrix computation: $\Theta(m \cdot n)$
Alignment computation: $O(\max\{m, n\})$
Together: $\Theta(m \cdot n)$
Practically quite expensive with respect to time as well as space.

14 Similarity Matrix
Static values for match and mismatch can be replaced by a similarity matrix.
Example: (similarity matrix shown as a figure on the slide)
In the field of Bio-IT, several predefined similarity matrices exist for amino acids:
BLOSUM (BLOcks SUbstitution Matrix)
PAM (Point Accepted Mutation)
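A small Python sketch of how a similarity matrix can stand in for fixed match/mismatch values. The matrix below is a made-up toy table for the DNA alphabet, used only for illustration; its values are assumptions and are not taken from BLOSUM, PAM or the slide.

```python
# Hypothetical toy similarity matrix (values invented for illustration).
# Identical symbols score highest; A<->G and C<->T are penalized less
# than the other substitutions.
SIMILARITY = {
    "A": {"A": 2, "C": -2, "G": -1, "T": -2},
    "C": {"A": -2, "C": 2, "G": -2, "T": -1},
    "G": {"A": -1, "C": -2, "G": 2, "T": -2},
    "T": {"A": -2, "C": -1, "G": -2, "T": 2},
}


def matrix_score(x, y):
    """Look up the similarity of symbols x and y in the table."""
    return SIMILARITY[x][y]


def ungapped_score(a, b, score):
    """Sum the pairwise scores of two equally long, already aligned strings."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(score(x, y) for x, y in zip(a, b))


if __name__ == "__main__":
    print(ungapped_score("GATTACA", "GACTATA", matrix_score))
```

A function such as matrix_score can be passed wherever an alignment routine expects a scoring function, so the rest of the algorithm stays unchanged.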

15 Smith-Waterman Algorithm
For local alignments.
Input:
Two strings a and b over some alphabet Σ
Similarity scoring scheme $s(a_i, b_j)$ over the alphabet Σ
Gap-scoring scheme $W_i$
Output: Scoring matrix $H$
A variation of the Needleman-Wunsch algorithm that makes the NW-Alg. work for local alignments.

16 Matrix Computation
The matrix $H$ is built as follows (formula shown as a figure on the slide), where $m$ is the length of a and $n$ is the length of b.
Highlighting in the formula marks the differences to the NW-Alg.
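A minimal Python sketch of the Smith-Waterman matrix computation, assuming a linear gap penalty (a single value gap per gap symbol) rather than a general gap-scoring scheme $W_i$. The two differences to Needleman-Wunsch in this sketch are that the borders are initialized with 0 and that every cell is clamped at 0. The names sw_matrix and dna_score and the scoring values are illustrative assumptions.

```python
def dna_score(x, y):
    """Illustrative scoring: +2 for a match, -1 for a mismatch."""
    return 2 if x == y else -1


def sw_matrix(a, b, score, gap):
    """Fill the Smith-Waterman matrix H.

    Returns H, the highest score and the (i, j) position of the first cell
    (in row-major order) that holds it.
    """
    m, n = len(a), len(b)
    # Difference 1 to NW: first row and first column stay 0.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(
                0,  # Difference 2 to NW: scores never drop below 0.
                H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                H[i - 1][j] + gap,
                H[i][j - 1] + gap,
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return H, best, best_pos


if __name__ == "__main__":
    H, best, best_pos = sw_matrix("TTACGTT", "GGACGAA", dna_score, -2)
    for row in H:
        print(row)
    print(best, best_pos)
```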

17 Computation of alignments
Backtracking as in the NW-Alg., but with one significant difference: search for the cell with the highest score and start from there.
Figure labels: $H$, backtracking area, cell with highest score.
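A minimal Python sketch of this backtracking, assuming a matrix H filled with the Smith-Waterman recurrence under a linear gap penalty. The walk starts at the cell with the highest score and, as is usual for Smith-Waterman, stops at the first cell with score 0. The names sw_traceback and dna_score are illustrative; the matrix in the usage part was filled by hand for a = TAC, b = GAC with match +2, mismatch -1 and gap -2.

```python
def dna_score(x, y):
    """Illustrative scoring: +2 for a match, -1 for a mismatch."""
    return 2 if x == y else -1


def sw_traceback(H, a, b, score, gap):
    """Backtrack from the highest-scoring cell of H until a 0 cell is hit."""
    # Search the cell with the highest score (first one in row-major order).
    cells = ((r, c) for r in range(len(H)) for c in range(len(H[0])))
    i, j = max(cells, key=lambda rc: H[rc[0]][rc[1]])

    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    # Matrix for a = "TAC" (rows) and b = "GAC" (columns), gap = -2.
    H = [
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 2, 0],
        [0, 0, 0, 4],
    ]
    local_a, local_b = sw_traceback(H, "TAC", "GAC", dna_score, -2)
    print(local_a)
    print(local_b)
```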

18 Complexity Analysis
Same story as for the Needleman-Wunsch Alg.:
Let $m$ = length(a) and $n$ = length(b).
Matrix computation: $\Theta(m \cdot n)$
Backtracking (alignment computation): $O(\max\{m, n\})$
Together: $\Theta(m \cdot n)$
Practically quite expensive with respect to time as well as space.

19 How to overcome the demanding space-time requirements of NW-Alg. and SW-Alg.?
Many solutions … Long story…
Heuristic approaches:
BLAST (one of the standard tools for sequence alignment nowadays)
BLAT: fast but considerably less sensitive than BLAST
BWT-based approaches, e.g. Bowtie or BWA
Many of the above tools/algorithms rely on some form of seeding before starting the core alignment.

20 Seeding technique
Step 1: Somehow "digest" (break into smaller pieces) sequence b (let us call it the query sequence).
Step 2 (seeding): Align these short segments quickly using some form of precomputed dictionary that comprises data for sequence a (let us call a the reference sequence).
Step 3: Take the output of step 2 in order to limit the search space for further alignment activities.
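The three steps can be sketched in Python as follows. This is only one simple way to realize them, assuming fixed-length pieces (k-mers), a plain hash-table dictionary over the reference, and exact matching of pieces; real tools use more elaborate schemes. All function names and the parameter k are illustrative.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Precomputed dictionary: k-mer -> list of start positions in the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_seeds(query, index, k):
    """Steps 1 and 2: digest the query into non-overlapping k-mers and look them up.

    Returns (query_position, reference_position) pairs for every exact hit.
    """
    seeds = []
    for q in range(0, len(query) - k + 1, k):
        for r in index.get(query[q:q + k], []):
            seeds.append((q, r))
    return seeds


def area_of_interest(seeds, query_len, ref_len):
    """Step 3: reference interval that covers the query placed at every seed."""
    if not seeds:
        return None
    offsets = [r - q for q, r in seeds]  # implied start of the query in the reference
    start = max(0, min(offsets))
    end = min(ref_len, max(offsets) + query_len)
    return start, end


if __name__ == "__main__":
    reference = "CCCCGATTACAGGGG"
    query = "GATTACA"
    k = 3
    index = build_kmer_index(reference, k)
    seeds = find_seeds(query, index, k)
    start, end = area_of_interest(seeds, len(query), len(reference))
    print(seeds, reference[start:end])
```

Only the slice reference[start:end] then needs to be passed to an expensive aligner, which is what limits the search space.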

21 Example: Local alignment by seeding
1. Digest the query sequence into pieces (1 2 3 4 5).
2. Align the pieces to the reference sequence (seeding).
3. Compute the area of interest (a section of the reference).
4. Cut out the area of interest and SW-align the query sequence against it to obtain the local alignment.

22 Efficient seeding technique
Suffix array: Contains the starting positions of the suffixes of a string in lexicographical order.
Example: the suffixes of the word banana$ are sorted and their starting positions are stored as an array.
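A minimal Python sketch of the banana$ example, assuming 0-based positions and plain character ordering, in which $ sorts before the letters. Sorting the suffixes directly is simple but not efficient for long strings; the function name suffix_array is illustrative.

```python
def suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographical order.

    Naive construction: sort the positions by the suffix that starts there.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    word = "banana$"
    for pos in suffix_array(word):
        print(pos, word[pos:])
```

For banana$ the sorted suffixes are $, a$, ana$, anana$, banana$, na$, nana$, which gives the array 6, 5, 3, 1, 0, 4, 2.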

23 Search a suffix array …
Where is "an" in banana$?
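Because the suffixes are sorted, all suffixes that start with a given pattern form one contiguous block of the suffix array, and that block can be located by binary search. The following Python sketch assumes 0-based positions and a naively built suffix array; the function names are illustrative.

```python
def suffix_array(text):
    """Naive suffix array: positions sorted by the suffix that starts there."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return all start positions of pattern in text, using binary search on sa."""
    k = len(pattern)

    # Lower bound: first suffix whose k-symbol prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose k-symbol prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    word = "banana$"
    sa = suffix_array(word)
    print(find_occurrences(word, sa, "an"))
```

For "an" in banana$ the block covers the suffixes ana$ and anana$, so the pattern occurs at positions 1 and 3.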

24 How to search suffix arrays efficiently?
FM-Index:
Makes use of the Burrows-Wheeler transform (BWT)
Stores precomputed symbol counts in tabular form (occurrence table)
Foundation of the aligners Bowtie and BWA
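A minimal Python sketch of the idea, assuming a text that ends with a unique terminator $ and an uncompressed occurrence table that stores counts for every position. Production aligners such as Bowtie and BWA use sampled and compressed tables, so this shows the principle only; all names are illustrative.

```python
def bwt(text):
    """Burrows-Wheeler transform via a naive suffix array.

    For each sorted suffix, take the symbol that precedes it in the text.
    """
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)


def build_fm_index(transformed):
    """Build the two tables used for searching.

    C[c]      = number of symbols in the text that are smaller than c
    occ[c][i] = number of occurrences of c in transformed[:i]
    """
    alphabet = sorted(set(transformed))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += transformed.count(c)
    occ = {c: [0] * (len(transformed) + 1) for c in alphabet}
    for i, symbol in enumerate(transformed):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if symbol == c else 0)
    return C, occ


def count_occurrences(pattern, C, occ, n):
    """Backward search: process the pattern from its last symbol to its first."""
    lo, hi = 0, n  # current suffix array interval [lo, hi)
    for symbol in reversed(pattern):
        if symbol not in C:
            return 0
        lo = C[symbol] + occ[symbol][lo]
        hi = C[symbol] + occ[symbol][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    text = "banana$"
    transformed = bwt(text)
    C, occ = build_fm_index(transformed)
    print(transformed, count_occurrences("an", C, occ, len(text)))
```

Each pattern symbol costs two table lookups, so counting takes time proportional to the pattern length and is independent of the text length.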

25 ITBE working group at Hanyang University
Projects in the joint field of Information Technology (Computer Science) and Microbiology (Genetics):
Analysis of genes/gene families by using/combining available computational tools
Development of specially tailored solutions/algorithms for specific kinds of problems
Example (big data analysis): Taxonomic heat diagrams that show the expression/occurrence of some gene with respect to some given taxonomy

26 Heat diagram for Gene FAM72
Example: taxonomic heat diagram for the gene FAM72 (figure).

