CS 262 Discussion Section 1

Purpose of discussion sections: To clarify difficulties and ambiguities in the problem set questions and lecture material. To supplement class material by going into the biological concepts and motivations underlying this field. To discuss more algorithms on a topic, wherever needed.

Antiparallel vs Parallel strands

The DNA strand has a chemical polarity

The members of each base pair can fit together within the double helix only if the two strands of the helix are antiparallel

Prokaryotes do not have a nucleus, eukaryotes do

Eukaryotic DNA is packaged into chromosomes. A chromosome is a single, enormously long, linear DNA molecule associated with proteins that fold and pack the fine thread of DNA into a more compact structure. Human genome: 3.2 × 10^9 base pairs distributed over 24 different chromosomes (22 autosomes plus X and Y); a diploid cell carries 46 chromosomes in total.

A display of the full set of 46 chromosomes

Sequence similarity

Biological motivation: Sequence similarity is useful in hypothesizing the function of a new sequence, assuming that sequence similarity implies structural and functional similarity. [Diagram: a new sequence is sent as a query to a sequence database; the response is a list of similar matches.]

Case Study: Multiple Sclerosis. Multiple sclerosis is an autoimmune disorder in which the T-cells of the immune system start attacking the body's own nerve cells. The T-cells recognize the myelin sheath protein of neurons as foreign.

Why does this happen? A hypothesis: possibly, the myelin sheath proteins identified by the T-cells are similar to bacterial/viral sheath proteins from an earlier infection. How to test this hypothesis? Use sequence alignment. [Diagram: the myelin sheath proteins are sent as a query to a sequence database; the response is a list of similar bacterial/viral sequences, which lab tests then use to identify the cause of the immune dysfunction.]

Dynamic Programming: a way of solving problems that involve recurrence relations by storing partial results. Consider the Fibonacci series: F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1. A naive recursive algorithm takes exponential time to find F(n), because it recomputes the same values many times. A dynamic programming solution takes only n steps (linear time).
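The contrast between the two approaches can be sketched in a few lines of Python. The function names `fib_recursive` and `fib_dp` are illustrative and do not come from the slides.

```python
def fib_recursive(n):
    """Naive recursion: recomputes the same subproblems, exponential time."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_dp(n):
    """Bottom-up dynamic programming: each F(k) is computed once, linear time."""
    if n < 2:
        return n
    prev, curr = 0, 1  # F(0), F(1)
    for _ in range(2, n + 1):
        # Store only the two most recent partial results.
        prev, curr = curr, prev + curr
    return curr
```

Both functions return the same values; only the amount of repeated work differs.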

Needleman-Wunsch algorithm

$$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(X[i], Y[j]) \\ F(i-1,j) - d \\ F(i,j-1) - d \end{cases}$$

[Diagram: F(i,j) is reached from F(i-1,j-1) by adding s(X[i],Y[j]), and from F(i-1,j) or F(i,j-1) by subtracting d.]

Assume that match = 1, mismatch = 0, indel = 0 (so d = 0).
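As a minimal sketch of the recurrence, the matrix can be filled row by row. The function name `nw_matrix` and its parameter names are assumptions made for illustration; the default scores follow the slide (match = 1, mismatch = 0, indel = 0).

```python
def nw_matrix(x, y, match=1, mismatch=0, d=0):
    """Fill the Needleman-Wunsch matrix F for strings x and y.

    F[i][j] is the best score for aligning x[:i] with y[:j].
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundary: a prefix aligned against nothing costs one indel per letter.
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] - d
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] - d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align x[i] with y[j]
                          F[i - 1][j] - d,      # x[i] against a gap
                          F[i][j - 1] - d)      # y[j] against a gap
    return F
```

The optimal score is the bottom-right entry, `nw_matrix(x, y)[-1][-1]`.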

Needleman-Wunsch example [Figure: the dynamic programming matrix for the sequences GAATTCAGTTA and GGATCGA, shown over several slides.]

Traceback [Figure: the traceback through the same matrix, shown over several slides.]

The solution: the optimal alignment has a score of 6.
G_AATTCAGTTA
GGA_T_C_G__A
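The fill and traceback steps can be sketched together as follows. The function name `needleman_wunsch` and the tie-breaking order (diagonal first, then a gap in the second sequence) are assumptions. Because several alignments share the optimal score, the alignment returned may differ from the one on the slide while scoring the same.

```python
def needleman_wunsch(x, y, match=1, mismatch=0, d=0):
    """Return (score, aligned_x, aligned_y) for a global alignment."""
    n, m = len(x), len(y)

    def s(a, b):
        return match if a == b else mismatch

    # Fill the matrix: F[i][j] is the best score for x[:i] versus y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] - d
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] - d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)

    # Traceback from the bottom-right cell to the top-left cell.
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            top.append(x[i - 1])
            bottom.append("_")
            i -= 1
        else:
            top.append("_")
            bottom.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For the two sequences of the example, `needleman_wunsch("GAATTCAGTTA", "GGATCGA")` returns a score of 6 together with two gapped strings of equal length.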

Linear Space Alignment: Serafim talked about the Myers-Miller algorithm in class. Another variant of the Hirschberg algorithm is given in Durbin (p. 35).

Suppose we know that characters X[i] and Y[j] are aligned to each other in the optimal alignment of X[1..n] and Y[1..m]. How can we compute the alignment using this information? We can partition the alignment into two parts and align X[1..i-1] with Y[1..j-1] and X[i+1..n] with Y[j+1..m] separately.
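A small check of this partitioning idea: if X[i] and Y[j] really are aligned in an optimal alignment, the scores of the two parts plus s(X[i], Y[j]) add up to the optimal score. The helper names `nw_score` and `score_via_anchor` are illustrative. The pair used in the usage note (X[6] = C with Y[5] = C, for GAATTCAGTTA and GGATCGA) is read off the alignment on the solution slide.

```python
def nw_score(x, y, match=1, mismatch=0, d=0):
    """Optimal global alignment score, keeping only one row of F at a time."""
    prev = [-d * j for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [-d * i]
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s, prev[j] - d, curr[j - 1] - d))
        prev = curr
    return prev[-1]


def score_via_anchor(x, y, i, j, match=1, mismatch=0, d=0):
    """Score obtained when x[i] and y[j] (1-based) are forced to be aligned."""
    left = nw_score(x[:i - 1], y[:j - 1], match, mismatch, d)   # X[1..i-1], Y[1..j-1]
    right = nw_score(x[i:], y[j:], match, mismatch, d)          # X[i+1..n], Y[j+1..m]
    pair = match if x[i - 1] == y[j - 1] else mismatch
    return left + pair + right
```

With the slide's scoring, `score_via_anchor("GAATTCAGTTA", "GGATCGA", 6, 5)` gives 3 + 1 + 2 = 6, the same as `nw_score` on the full sequences.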

Middle column [Figure: the dynamic programming matrix with its middle column marked and a cell F(i,j), shown over several slides.]

For a cell (i,j), find the cell in the middle column from which the traceback path of (i,j) leaves that column. Store the coordinates of that cell together with the value of F(i,j), and call them c(i,j).

For every cell in the right half of the matrix, maintain three things. First, the F(i,j) value. Second, the coordinates of the cell in the middle column from which its traceback path leaves the middle column; call this c(i,j). Third, the direction of that jump out of the middle column as given by the pointer (either horizontal or diagonal, since a vertical move stays inside the column); call this P(i,j).

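If (i’,j’) is the cell preceding (i,j), from which F(i,j) is derived, then c(i,j) = c(i’,j’) and P(i,j) = P(i’,j’). If (i’,j’) itself lies in the middle column, then c(i,j) is (i’,j’) and P(i,j) is the direction of the move from (i’,j’) to (i,j). We need only linear space to compute the F, c, and P values as we proceed across the matrix.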

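[Figure: cells F(i’,j’) and F(i,j) to the right of the middle column, with c(i’,j’) and c(i,j) pointing to the same cell in the middle column.] We know where the traceback from (i’,j’) leaves the middle column. Since the traceback from (i,j) passes through (i’,j’), it leaves the middle column at the same cell, so c(i,j) has the same value. We are interested in the value of c(n,m).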

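We use the c(n,m) and P(n,m) values to split the dynamic programming matrix into two parts. How? c(n,m) and P(n,m) give one step of the optimal alignment: the cell where the optimal path leaves the middle column and the move it makes there. When that move is diagonal, we now know one aligned pair of letters in the optimal alignment, and the parts before and after it can be aligned separately.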
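The whole procedure can be sketched as a recursive function. This is a sketch under stated assumptions rather than the version given in class or in Durbin: columns are indexed by position in X, the middle column is u = n // 2, ties prefer the diagonal move, and the name `align_linear_space` and the labels "diag" and "horiz" are made up for illustration. Only one column of F, c, and P is kept at a time.

```python
def align_linear_space(x, y, match=1, mismatch=0, d=0):
    """Global alignment of x and y as a list of (x_char, y_char) columns."""
    n, m = len(x), len(y)
    if n == 0:
        return [("_", b) for b in y]      # nothing left in x: all gaps
    u = n // 2                            # middle column
    F = [-d * j for j in range(m + 1)]    # column 0 of the matrix
    c = P = None                          # defined only right of column u
    for i in range(1, n + 1):
        newF = [F[0] - d]
        if i > u:
            # Cell (i, 0) can only be reached horizontally from (i-1, 0).
            newc = [0 if i - 1 == u else c[0]]
            newP = ["horiz" if i - 1 == u else P[0]]
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            diag, horiz, vert = F[j - 1] + s, F[j] - d, newF[j - 1] - d
            best = max(diag, horiz, vert)
            newF.append(best)
            if i > u:
                if best == diag:          # predecessor is (i-1, j-1)
                    cp = (j - 1, "diag") if i - 1 == u else (c[j - 1], P[j - 1])
                elif best == horiz:       # predecessor is (i-1, j)
                    cp = (j, "horiz") if i - 1 == u else (c[j], P[j])
                else:                     # predecessor is (i, j-1), same column
                    cp = (newc[j - 1], newP[j - 1])
                newc.append(cp[0])
                newP.append(cp[1])
        F = newF
        if i > u:
            c, P = newc, newP
    row, move = c[m], P[m]                # c(n,m) and P(n,m)
    left = align_linear_space(x[:u], y[:row], match, mismatch, d)
    if move == "diag":
        mid, rest_y = (x[u], y[row]), y[row + 1:]
    else:
        mid, rest_y = (x[u], "_"), y[row:]
    right = align_linear_space(x[u + 1:], rest_y, match, mismatch, d)
    return left + [mid] + right
```

Each call removes the middle letter of x and recurses on strictly shorter pieces, so the recursion terminates. Joining the first and second entries of the returned columns gives the two gapped strings.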