Graph Theory Aiding DNA Fragment Assembly. Jonathan Kaptcianos; advisor: Professor Jo Ellis-Monaghan.

Presentation transcript:

Graph Theory Aiding DNA Fragment Assembly
Jonathan Kaptcianos
Advisor: Professor Jo Ellis-Monaghan
Work supported by the Vermont Genetics Network through NIH Grant Number P20 RR16462 from the INBR program of the National Center for Research Resources.

DNA Sequencing: An Overview
• DNA sequencing is a lab technique that reads fragments of DNA (anywhere from 500 to 1200 nucleotides long) and determines the order of the entire genome from these individual fragments.
• Modern science has enabled us to determine the DNA sequences of animals and other organisms.
• Previous approaches to fragment assembly follow the “overlap-layout-consensus” algorithm:
  overlap: comparing all pairs of reads and finding any overlaps
  layout: finding the order of the reads along the DNA and putting them together
  consensus: deriving how the sequence will appear based on the layout

Problems in DNA Sequencing
• There can be multiple ways to reconstruct the original strand from the fragment pieces, or “snippets,” only one of which is correct.
• The human genome contains a large number of sequences that repeat an even larger number of times.
• If a repeating sequence is longer than the reads, reconstructing the genome becomes almost impossible.
Solutions: Some tools from graph theory, specifically Eulerian paths and de Bruijn graphs, help us reach some conclusions about how strands of DNA can be reassembled.

Eulerian Circuits and Paths
Eulerian circuit: a walk that traverses each edge in a graph exactly once and ends at the vertex at which it started.
[Figure: graph on vertices a, b, c, d, e, f] a-d-b-f-e-d-f-c-b-a is an Eulerian circuit in this graph.
Eulerian path: a walk that traverses each edge in a graph exactly once.
[Figure: graph on vertices a through j] a-b-c-d-e-f-g-c-h-f-i-j is an Eulerian path in this graph.
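The circuit quoted on this slide can be checked against the classical criterion: a connected undirected graph has an Eulerian circuit exactly when every vertex has even degree. The sketch below is only illustrative. Because the figure is not available, it assumes the edge set of the graph is exactly the set of edges traversed by the quoted circuit, and the function name `has_eulerian_circuit` is hypothetical.

```python
from collections import defaultdict

def has_eulerian_circuit(edges):
    """Return True if the connected undirected graph given by `edges`
    has an Eulerian circuit (all vertex degrees even)."""
    degree = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].add(v)
        adj[v].add(u)
    if not degree:
        return False
    # Depth-first search to confirm the graph is connected.
    start = next(iter(degree))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(degree) and all(d % 2 == 0 for d in degree.values())

# Assumption: the graph's edges are exactly those used by the quoted circuit.
walk = "a-d-b-f-e-d-f-c-b-a".split("-")
edges = list(zip(walk, walk[1:]))
print(has_eulerian_circuit(edges))
```

Removing any single edge from this list leaves two vertices of odd degree, so the same check then fails.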

DNA Strands and de Bruijn Graphs
de Bruijn graph: a directed graph whose vertices represent sequences of symbols from an alphabet and whose edges indicate where the sequences may overlap.
Example: For the strand ATCGACTATAAGGCATCGAA, the de Bruijn graph uses “snippets” of length 4. Each vertex is a sequence of length 3, and each directed edge between two vertices represents a 4-letter snippet.
[Figure: de Bruijn graph with vertices ATC, TCG, CGA, GAC, ACT, CTA, TAT, ATA, TAA, AAG, AGG, GGC, GCA, CAT, GAA] S 2007
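As a rough illustration of the example on this slide, the de Bruijn graph can be built by sliding a window of length 4 along the strand: each window is an edge from its first three letters to its last three. The function name `de_bruijn_edges` and the list-of-pairs representation are assumptions made for this sketch, not part of the presentation.

```python
def de_bruijn_edges(strand, k=4):
    """List one directed edge (prefix, suffix) per length-k snippet of `strand`.
    Each vertex is a sequence of length k - 1; repeated snippets give repeated edges."""
    return [(strand[i:i + k - 1], strand[i + 1:i + k])
            for i in range(len(strand) - k + 1)]

strand = "ATCGACTATAAGGCATCGAA"
edges = de_bruijn_edges(strand)
vertices = {v for edge in edges for v in edge}

print(len(edges), "edges on", len(vertices), "vertices")
for u, v in edges[:3]:
    print(u, "->", v, "  snippet:", u + v[-1])
```

The snippets ATCG and TCGA each occur twice in this strand, so the corresponding edges appear twice in the list. Repeats like these are what make the reconstruction ambiguous.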

Eulerian Path Approach to DNA Fragment Assembly
• Abandons the previously mentioned “overlap-layout-consensus” approach.
• Ultimately converts an NP-complete Hamiltonian Path Problem into a simpler Eulerian Path Problem through construction of a de Bruijn graph.
• The number of ways to reconstruct the strand equals the number of paths that follow the edge directions and traverse every edge.
• The resulting problem is that there may be several different Eulerian paths through this graph, and we cannot tell which one corresponds to the original strand.
E-M 2006
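A minimal sketch of this approach, assuming the graph is given as a list of directed edges and that an Eulerian path exists: Hierholzer's algorithm (a standard method, not named in the slides) finds one Eulerian path, and spelling out the vertices along it yields one candidate strand. The function names are hypothetical.

```python
from collections import defaultdict

def eulerian_path(edges):
    """Find one Eulerian path in a directed multigraph with Hierholzer's algorithm.
    Assumes such a path exists."""
    out_edges = defaultdict(list)
    balance = defaultdict(int)          # out-degree minus in-degree
    for u, v in edges:
        out_edges[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # Start where out-degree exceeds in-degree by one, if such a vertex exists.
    start = next((v for v, b in balance.items() if b == 1), edges[0][0])
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if out_edges[u]:
            stack.append(out_edges[u].pop())   # follow an unused edge
        else:
            path.append(stack.pop())           # dead end: emit vertex
    return path[::-1]

def spell(path):
    """Spell the strand read along a path of overlapping vertices."""
    return path[0] + "".join(v[-1] for v in path[1:])

strand = "ATCGACTATAAGGCATCGAA"
k = 4
edges = [(strand[i:i + k - 1], strand[i + 1:i + k])
         for i in range(len(strand) - k + 1)]
path = eulerian_path(edges)
assembled = spell(path)
print(assembled)
```

The assembled strand uses every snippet exactly once, but it need not equal the original strand. Any Eulerian path is a valid answer, which is exactly the ambiguity described in the last bullet.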

Eulerian Superpath Problem
Eulerian Superpath Problem (ESPP): Given an Eulerian graph and a collection of paths in this graph, find an Eulerian path that contains all these paths as subpaths.
• The original Eulerian Path Problem (EPP) is the special case of the ESPP in which every path is a single edge.
Solving:
• Take the graph G and the system of paths P, and transform them into a new graph G_1 and a new system of paths P_1.
• The goal is an equivalent transformation, one with a one-to-one correspondence between the Eulerian superpaths of (G, P) and those of (G_1, P_1). We then make a series of such transformations: (G, P) → (G_1, P_1) → (G_2, P_2) → … → (G_k, P_k)
• The transformations should lead to a system P_k in which every path is a single edge.
• Since every transformation in the series is equivalent, every solution of the EPP in (G_k, P_k) provides a solution of the ESPP in (G, P).
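To make the definition concrete, the sketch below checks whether a given walk is an Eulerian superpath: it must use every edge exactly once and contain each path of the system as a contiguous subpath. Walks and paths are represented as lists of vertices, and the small graph is made up for illustration. Both choices are assumptions of this sketch.

```python
def contains_subpath(walk, sub):
    """True if `sub` occurs as a contiguous run of vertices in `walk`."""
    m = len(sub)
    return any(walk[i:i + m] == sub for i in range(len(walk) - m + 1))

def is_eulerian_superpath(edges, paths, walk):
    """True if `walk` uses every edge exactly once and contains every path."""
    uses_each_edge_once = sorted(zip(walk, walk[1:])) == sorted(edges)
    return uses_each_edge_once and all(contains_subpath(walk, p) for p in paths)

# Hypothetical graph: two loops through v, entered from s and left toward t.
edges = [("s", "v"), ("v", "a"), ("a", "v"), ("v", "b"), ("b", "v"), ("v", "t")]
paths = [["s", "v", "b"], ["a", "v", "t"]]

walk_1 = ["s", "v", "a", "v", "b", "v", "t"]   # Eulerian, but violates s-v-b
walk_2 = ["s", "v", "b", "v", "a", "v", "t"]   # Eulerian and contains both paths
print(is_eulerian_superpath(edges, paths, walk_1))
print(is_eulerian_superpath(edges, paths, walk_2))
```

Both walks are Eulerian paths of the same graph, yet only the second is consistent with the path system. This is how the extra paths narrow down the choice among Eulerian paths.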

An x,y-detachment with no multiple edges
• Let x = (v_in, v_mid) and y = (v_mid, v_out) be two consecutive edges in G, and let P_{x,y} be the collection of all paths from P that include x,y as a subpath.
• P_{→x} is the collection of paths from P that end with x, and P_{y→} is the collection of paths from P that start with y.
• The x,y-detachment adds a new edge z = (v_in, v_out) and deletes the edges x and y.
• We substitute z for x,y in all paths from P_{x,y}, for x in all paths from P_{→x}, and for y in all paths from P_{y→}.
• In this way, the ESPP is reduced step by step toward an EPP.
PTW 2001
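The substitution rules above can be sketched as follows. This is a simplified illustration, not the authors' implementation: edges are stored in a dictionary from an edge name to its (tail, head) pair, paths are lists of edge names, and the function name `detach` is hypothetical. The sketch assumes x and y are simple edges and raises an error for any path that uses x or y in a way the rules do not cover.

```python
def detach(edges, paths, x, y, z):
    """Perform an x,y-detachment: replace consecutive edges x and y by a new edge z."""
    v_in, v_mid = edges[x]
    v_mid_again, v_out = edges[y]
    if v_mid != v_mid_again:
        raise ValueError("x and y are not consecutive edges")
    new_edges = {name: ends for name, ends in edges.items() if name not in (x, y)}
    new_edges[z] = (v_in, v_out)

    new_paths = []
    for p in paths:
        q, i = [], 0
        while i < len(p):
            if p[i] == x and i + 1 < len(p) and p[i + 1] == y:
                q.append(z)            # path in P_{x,y}: x,y becomes z
                i += 2
            elif p[i] == x and i == len(p) - 1:
                q.append(z)            # path in P_{->x}: final x becomes z
                i += 1
            elif p[i] == y and i == 0:
                q.append(z)            # path in P_{y->}: initial y becomes z
                i += 1
            elif p[i] in (x, y):
                raise ValueError("path uses x or y in a way not covered here")
            else:
                q.append(p[i])
                i += 1
        new_paths.append(q)
    return new_edges, new_paths

edges = {"w": ("a", "b"), "x": ("b", "c"), "y": ("c", "d"), "u": ("d", "e")}
paths = [["w", "x"], ["x", "y"], ["y", "u"]]
new_edges, new_paths = detach(edges, paths, "x", "y", "z")
print(new_edges)
print(new_paths)
```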

Detachment for Multiple Edges
• Let x = (v_in, v_mid) be the only edge entering v_mid, with multiplicity 2, and let y_1 = (v_mid, v_out1) and y_2 = (v_mid, v_out2) be the two outgoing edges, each with multiplicity 1.
• Since x is a multiple edge, the Eulerian path will visit x twice, once followed by y_1 and once followed by y_2.
• If an edge z is used in an x,y_1-detachment, it shortens the paths in P_{x,y_1} by replacing x,y_1 with the single edge z, and substitutes z for y_1 in all paths from P_{y_1→}.
• The transformation is equivalent only if P_{→x} is empty; if it is not, it is ambiguous whether the last edge of a path P in P_{→x} should be assigned to z or to the remaining copy of x.
• This is resolved by looking at how each such path P relates to P_{x,y_1} and P_{x,y_2}.
PTW 2001

Paths and Consistency
Two paths are consistent if their union is again a path, that is, the union has no branching vertices.
Case 1: P is inconsistent with both P_{x,y_1} and P_{x,y_2}
• In this situation there is no solution to the Eulerian Superpath Problem, because the sequencing data are inconsistent.
• [Figure: three paths, each visiting edge x in a different way]
PTW 2001
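The definition of consistency can be tested directly on small examples. The sketch below assumes each path is a list of distinct vertices (a simple path) and that the union is taken over the directed edges of both paths. The function name `consistent` is hypothetical, and paths that revisit vertices would need a representation by edge names instead.

```python
def consistent(p, q):
    """True if the union of two simple directed paths is itself a single path."""
    succ, pred = {}, {}
    for path in (p, q):
        for u, v in zip(path, path[1:]):
            # A second, different successor or predecessor is a branching vertex.
            if succ.setdefault(u, v) != v or pred.setdefault(v, u) != u:
                return False
    # Without branching, the union is a single path only if it has one start
    # vertex and every edge is reached by walking forward from it.
    starts = [u for u in succ if u not in pred]
    if len(starts) != 1:
        return False
    steps, u = 0, starts[0]
    while u in succ:
        u = succ[u]
        steps += 1
    return steps == len(succ)

print(consistent(list("abc"), list("bcd")))   # overlap on b-c
print(consistent(list("abc"), list("abd")))   # branch at b
print(consistent(list("ab"), list("cd")))     # disjoint pieces
```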

Case 2: P is consistent with exactly one of P_{x,y_1} and P_{x,y_2}
• P is resolvable, as it can be matched to one of the two systems of paths.
• When P is consistent with P_{x,y_1}, it is assigned to the edge z created in the x,y_1-detachment.
• When P is consistent with P_{x,y_2}, it is assigned to the edge x and no further action is needed.
• The edge x is resolvable if all paths in P_{→x} are resolvable; in that case the detachment is an equivalent transformation.
[Figure: an example in which P is consistent with P_{x,y_1}] PTW 2001

Case 3: P is consistent with both P_{x,y_1} and P_{x,y_2}
• When this occurs for at least one path in P_{→x}, the edge x is considered unresolvable, and its detachment is postponed in the hope that further transformations will resolve it.
• [Figure: a sequence of transformations: y4,x1-detachment, x2,y1-detachment, z,x2-detachment]
• Through this series of transformations, the final graph is a simplified graph equivalent to the first.
PTW 2001

The x-cut
• Consider the graph G with 5 edges and the 4 given paths with two edges each.
• In this situation, none of the detachments discussed so far gives an equivalent transformation.
• An edge x = (v, w) is removable if it is the only edge leaving v and the only edge entering w, and if it is either the initial or the final edge of every path in the system of paths P.
• An x-cut turns P into a new system of paths by removing x from all paths in P_{→x} and P_{x→}.
• Once x is removed from each path, only the single-edge paths y1, y2, y3, y4 remain.
• This is an equivalent transformation, as each Eulerian superpath in (G, P) corresponds to one in (G_1, P_1).
PTW 2001
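A rough sketch of the x-cut on a system of paths, with paths written as lists of edge names. The slide's figure is not available, so the four two-edge paths used here (y1,x; y2,x; x,y3; x,y4) are an assumption about what the figure shows, and the function name `x_cut` is hypothetical.

```python
def x_cut(paths, x):
    """Remove edge x from every path that starts or ends with it.
    Raises ValueError if x occurs in the interior of a path, since x is
    then not removable in the sense described above."""
    new_paths = []
    for p in paths:
        if x in p[1:-1]:
            raise ValueError("x is an interior edge of a path; not removable")
        shortened = [e for e in p if e != x]
        if shortened:                  # drop paths that consisted only of x
            new_paths.append(shortened)
    return new_paths

# Assumed system of paths: two paths ending with x and two starting with x.
paths = [["y1", "x"], ["y2", "x"], ["x", "y3"], ["x", "y4"]]
print(x_cut(paths, "x"))
```

After the cut every path is a single edge, so the remaining problem is an ordinary Eulerian Path Problem.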

Some Conclusions
• Through a series of detachments and cuts, it is possible to transform a tangled and overwhelming graph into a simplified, equivalent, and more easily resolvable graph.
• The Eulerian superpath approach to DNA fragment assembly does not eliminate the ambiguity about the original construction of the genome; it only makes the problem neater and easier to work with.
• Scientists and researchers can treat large groups of edges, vertices, and paths as a significantly smaller number of elements, instead of having to focus on every element in the strand of DNA.