Physical Mapping of DNA Shanna Terry March 2, 2004.

Slides:



Advertisements
Similar presentations
Proof of correctness; More reductions
Advertisements

NP-Hard Nattee Niparnan.
2012: J Paul GibsonT&MSP: Mathematical FoundationsMAT7003/L2-GraphsAndTrees.1 MAT 7003 : Mathematical Foundations (for Software Engineering) J Paul Gibson,
Midwestern State University Department of Computer Science Dr. Ranette Halverson CMPS 2433 – CHAPTER 4 GRAPHS 1.
Lecture 24 Coping with NPC and Unsolvable problems. When a problem is unsolvable, that's generally very bad news: it means there is no general algorithm.
1/44 A simple Test For the Consecutive Ones Property.
1 NP-completeness Lecture 2: Jan P The class of problems that can be solved in polynomial time. e.g. gcd, shortest path, prime, etc. There are many.
3.3 Spanning Trees Tucker, Applied Combinatorics, Section 3.3, by Patti Bodkin and Tamsen Hunter.
Graphs Chapter 12. Chapter Objectives  To become familiar with graph terminology and the different types of graphs  To study a Graph ADT and different.
Graph Classes A Survey 張貿翔 中正大學資訊工程學系. Meeting Room Reservation One meeting room Reservation: –Gary10:00~11:00 –Mary10:30~12:00 –Jones13:30~14:30 –Amy14:00~15:30.
Complexity 11-1 Complexity Andrei Bulatov NP-Completeness.
Computational problems, algorithms, runtime, hardness
Applied Discrete Mathematics Week 12: Trees
Physical Mapping I CIS 667 February 26, Physical Mapping A physical map of a piece of DNA tells us the location of certain markers  A marker is.
Graphs Chapter 12. Chapter 12: Graphs2 Chapter Objectives To become familiar with graph terminology and the different types of graphs To study a Graph.
Analysis of Algorithms CS 477/677
Computational Complexity, Physical Mapping III + Perl CIS 667 March 4, 2004.
Fall 2007CS 2251 Graphs Chapter 12. Fall 2007CS 2252 Chapter Objectives To become familiar with graph terminology and the different types of graphs To.
ARCHEOLOGICAL SERIATION AND INTERVAL GRAPHS
Physical Mapping II + Perl CIS 667 March 2, 2004.
The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis Zo ë Abrams Ho-Lin Chen
Graphs, relations and matrices
Applied Discrete Mathematics Week 10: Equivalence Relations
1 Physical Mapping --An Algorithm and An Approximation for Hybridization Mapping Shi Chen CSE497 04Mar2004.
Data Structures and Algorithms Graphs Minimum Spanning Tree PLSD210.
Minimal Spanning Trees What is a minimal spanning tree (MST) and how to find one.
Chapter 9 – Graphs A graph G=(V,E) – vertices and edges
MCS312: NP-completeness and Approximation Algorithms
MAPS OF DNA AND INTERVAL GRAPHS by Akshita Gurram.
Nattee Niparnan. Easy & Hard Problem What is “difficulty” of problem? Difficult for computer scientist to derive algorithm for the problem? Difficult.
Graphs and DNA sequencing CS 466 Saurabh Sinha. Three problems in graph theory.
Graph Theory Topics to be covered:
Advanced Algorithm Design and Analysis (Lecture 13) SW5 fall 2004 Simonas Šaltenis E1-215b
Theory of Computing Lecture 17 MAS 714 Hartmut Klauck.
Week 11 - Wednesday.  What did we talk about last time?  Graphs  Euler paths and tours.
Tree A connected graph that contains no simple circuits is called a tree. Because a tree cannot have a simple circuit, a tree cannot contain multiple.
Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.
JM - 1 Introduction to Bioinformatics: Lecture III Genome Assembly and String Matching Jarek Meller Jarek Meller Division of Biomedical.
Based on slides by Y. Peng University of Maryland
Combinatorial Optimization Problems in Computational Biology Ion Mandoiu CSE Department.
1 Combinatorial Problem. 2 Graph Partition Undirected graph G=(V,E) V=V1  V2, V1  V2=  minimize the number of edges connect V1 and V2.
Data Structures & Algorithms Graphs
NP-COMPLETE PROBLEMS. Admin  Two more assignments…  No office hours on tomorrow.
Lecture 6 NP Class. P = ? NP = ? PSPACE They are central problems in computational complexity.
NP-Complete problems.
Graphs A ‘Graph’ is a diagram that shows how things are connected together. It makes no attempt to draw actual paths or routes and scale is generally inconsequential.
NP-Completeness (Nondeterministic Polynomial Completeness) Sushanth Sivaram Vallath & Z. Joseph.
Graphs and MSTs Sections 1.4 and 9.1. Partial-Order Relations Everybody is not related to everybody. Examples? Direct road connections between locations.
CS 461 – Nov. 30 Section 7.5 How to show a problem is NP-complete –Show it’s in NP. –Show that it corresponds to another problem already known to be NP-complete.
Graphs Chapter 12. Chapter 12: Graphs2 Chapter Objectives To become familiar with graph terminology and the different types of graphs To study a Graph.
Strings Basic data type in computational biology A string is an ordered succession of characters or symbols from a finite set called an alphabet Sequence.
LIMITATIONS OF ALGORITHM POWER
CPS Computational problems, algorithms, runtime, hardness (a ridiculously brief introduction to theoretical computer science) Vincent Conitzer.
Lecture 25 NP Class. P = ? NP = ? PSPACE They are central problems in computational complexity.
Chapter 13 Backtracking Introduction The 3-coloring problem
Final Review Chris and Virginia. Overview One big multi-part question. (Likely to be on data structures) Many small questions. (Similar to those in midterm.
Graphs. Graph Definitions A graph G is denoted by G = (V, E) where  V is the set of vertices or nodes of the graph  E is the set of edges or arcs connecting.
Introduction to NP Instructor: Neelima Gupta 1.
1/44 A simple Test For the Consecutive Ones Property Without PC-trees!
An Algorithm for the Consecutive Ones Property Claudio Eccher.
1 Combinatorial Problem. 2 Graph Partition Undirected graph G=(V,E) V=V1  V2, V1  V2=  minimize the number of edges connect V1 and V2.
COSC 3101A - Design and Analysis of Algorithms 14 NP-Completeness.
The NP class. NP-completeness Lecture2. The NP-class The NP class is a class that contains all the problems that can be decided by a Non-Deterministic.
Week 11 - Wednesday.  What did we talk about last time?  Graphs  Paths and circuits.
1 Lecture 5 (part 2) Graphs II (a) Circuits; (b) Representation Reading: Epp Chp 11.2, 11.3
Lecture 20. Graphs and network models 1. Recap Binary search tree is a special binary tree which is designed to make the search of elements or keys in.
Discrete Structures Li Tak Sing( 李德成 ) Lectures
The NP class. NP-completeness
Source Code for Data Structures and Algorithm Analysis in C (Second Edition) – by Weiss
Lecture 22 Complexity and Reductions
Presentation transcript:

Physical Mapping of DNA Shanna Terry March 2, 2004

Overview Background Types of Mapping Mathematical Models Enhanced Double Digest Problem

Background Given a sequence of DNA, how do we figure out where on some larger chromosome the sequence lies? ?

Background II Look for markers that match in both the chromosome and the shorter sequence. –Markers: Usually short, precisely defined sequences match!

Background III How do we create the original map? Generate fingerprints (markers) with: –Restriction site mapping –Hybridization

Background IV But why? Can’t we just expand the sequence assembly techniques we’ve already learned? NO! (with one exception) Why not? A chromosome isn’t just 150k bps long. … Human chromosomes range in length from 51 million to 245 million base pairs.

Overview Background Types of Mapping Mathematical Models Enhanced Double Digest Problem

Restriction Site Mapping In this situation, the fingerprint is the length between restriction sites of given enzymes (recall from previous lectures). Make three copies of target DNA: strings A, B, C. Apply one enzyme (α) to string A, another (β) to string B, and both (α and β) to string C. Line up the fragments in A and B so they match C: this is the double digest problem

Restriction Site Mapping II A variant is the partial digest approach: Use only one enzyme, but allow it to act for different time periods. Different restriction sites will be recognized. Fragment sites: 6, 20, 27, 32; 14, 21, 26; 7, 12; and

Restriction Site Mapping III 6 20 … etc … Fragment sites: 6, 20, 27, 32; 14, 21, 26; 7, 12; and

Hybridization Mapping Check whether specific small sequences (called probes) bind (hybridize) to fragments (clones) The fingerprint is the subset of probes that successfully hybridize to the clone. If some portion of one clone’s fingerprint matches another, they are likely to be from overlapping regions of the target.

Hybridization Mapping II Probes x, y, z, bound to clone A; x, w and z bound to clone B… overlap in x and z. yx z w Except… we don’t know that much. We only know which probes bind to which clones. Not ordering or even relative lengths!

Background Types of Mapping Mathematical Models Enhanced Double Digest Problem

Restriction Site Models Back to the double digest problem: we’ve split the strings A, B, C into fragments with two enzymes. We have the multisets made up of the fragment lengths: –From previous example: A = {5, 6, 7, 14} B = {3, 4, 8, 15} C = {1, 2, 3, 4, 6, 7, 9} Find permutations of A and B such that there is a one-to-one correspondence between all the subintervals and C. Not too bad, right?

Restriction Site Models II BAD NEWS: The double digest problem is NP-complete. It is a generalization of the set-partition problem, already known to be NP-complete. To give you an idea…the number of solutions is (k-1)! for k = number of restriction sites. BUT… we will see a heuristic later…

Interval Graph Models Model hybridization mapping in terms of interval graphs –Interval graph: A graph G which is mapped from a series of intervals. For each interval there is a vertex in G. For each intersection of intervals there is an edge in G.

Interval Graph Models II Ex: a b c d e b a c e d

Interval Graph Models III To apply this to the hybridization mapping problem… We create graphs with vertices representing clones (fragments), and edges representing overlaps between clones. Two graphs: one for known overlaps and one with known and unknown overlap information (neither are necessarily interval graphs).

Interval Graph Models IV Now find the true interval graph (a subgraph of the known and unknown graph) given the two graphs. Known overlaps + Known/Unknown Actual Interval Overlaps Graph Hmm… Not too easy.

Interval Graph Models V MORE BAD NEWS! This is NP-hard. Maybe another model? –There are two other possible models for hybridization mapping (described in the book). But…. Those are NP-hard too!

Consecutive Ones Property We’re sick of NP hard problems. Give us something a little easier. The Consecutive Ones Property Model (C1P) can be solved in linear time! Assumptions: 1.The probes are unique. 2.There are no errors. (!!) 3.All of the correspondences of clones and probes have been found. (!!!)

Consecutive Ones Property II Build a matrix (n x m), n = # of clones, m = # of probes. Entry i,j is a binary code for whether probe j hybridized to clone i. above: probe 1 hybridized to clone 1, probe 2 hybridized to clone 1, probe 1 hybridized to clone 3, probe 4 hybridized to clone

Consecutive Ones Property III Find a permutation of the columns (probes) such that all the 1s in each row (clone) are consecutive

Consecutive Ones Property IV This algorithm can be run in linear time! Unfortunately, the assumption that there are no errors isn’t useful because biology isn’t a mathematical model. Probes may not bind, DNA may be replicated incorrectly. And generalizations make the problem NP-hard again! We need a good heuristic…

Background Types of Mapping Mathematical Models Enhanced Double Digest Problem

The Enhanced Double Digest (EDD) problem is NP-hard in the general case, but if the lengths of fragments in C (the string acted upon by both types of enzymes) are distinct, it can be solved in linear time! Why do the fragments have to be distinct? … What if all the fragments are the same length?

Problem Formulation We have the multisets A and B. A = {6, 14, 7, 5} B = {8, 3, 15, 4} Take the actual fragments corresponding to each member of the either set (since the sets are only lengths). Apply the other enzyme to the fragment (i.e. apply enzyme β to fragments from A, and vice versa) to create subfragments. AB i is the multiset of subfragments created by applying enzyme β to fragments from A; BA j from applying enzyme α to fragments of B.

Problem Formulation II Example: 8 {2,6} 5 {1,4} A={5, 6, 7, 14} B={3, 4, 8, 15} AB 1 ={1,4}, AB 2 ={6}, AB 3 ={7}, AB 4 ={2,3,9} BA 1 ={3}, BA 2 ={4}, BA 3 ={2,6}, BA 4 ={1,7,9}

Algorithm Given A, B, AB i and BA j for all i, j, construct an undirected graph that connects each element of A and B to its corresponding AB/BA. Note that all elements in C will be covered A: C: B:

Algorithm II Create a spanning tree: Start at random node, follow all paths from the node, don’t repeat edges. 6 A 6 C 8 B 2 C 14 A 9 C 15 B 1 C 5 A 4 C 4 B 3 C 7 C 3 B 7 A

Properties The graph (G) will always be connected, and every node in A and B will only be adjacent to nodes from C. Each node from C connects to only one node each from A and B. If the problem can be solved: G will be a spanning tree, and any subtree that “hangs” on the longest path will be a 2-node length path (dangler).

Properties II Danglers: 6 A 6 C 8 B 2 C 14 A 9 C 15 B 1 C 5 A 4 C 4 B 3 C 7 C 3 B 7 A

Algorithm III If the graph G is not a spanning tree, and not all subtrees hanged off the longest path are danglers, then there is no valid permutation. Perform Dangler-first search on the graph G…

Dangler-First Search Traverse G starting at one end of a path S with the largest number of edges, reading only the nodes from C. Whenever reaching a node with degree greater than 2 (must have a dangler), read the nodes in C from the hanging danglers first, then continue to traverse S. This sequence is π c.

Dangler-First Search II π c = 6, 2, 3, 9, 7, 1, 4. 6 A 6 C 8 B 2 C 14 A 9 C 15 B 1 C 5 A 4 C 4 B 3 C 7 C 3 B 7 A

Algorithm IV The elements in each AB i form a consecutive subsequence in π c. Likewise, the elements in each BA j also form a consecutive subsequence in π c. –This permutation is a valid permutation… meaning: you have the answer!

Solution A= 6, 14, 7, 5 π c = 6, 2, 3, 9, 7, 1, 4 B= 8, 3, 15, 4 6 A 6 C 8 B 2 C 14 A 9 C 15 B 1 C 5 A 4 C 4 B 3 C 7 C 3 B 7 A

Enhanced Double Digest Problem II This can be solved in O(n) time! A generalization (assuming a constant number of duplicates) can also be solved in O(n) time. The general enhanced double digest problem is still NP-Hard. This can be shown by reduction from the Hamiltonian Path problem.