Genome Halving – work in progress Fulton Wang 8-17-2011 ACGT Group Meeting.

Slides:



Advertisements
Similar presentations
CS 336 March 19, 2012 Tandy Warnow.
Advertisements

Chapter 8 Topics in Graph Theory
Graph Theory Arnold Mesa. Basic Concepts n A graph G = (V,E) is defined by a set of vertices and edges v3 v1 v2Vertex (v1) Edge (e1) A Graph with 3 vertices.
Midwestern State University Department of Computer Science Dr. Ranette Halverson CMPS 2433 – CHAPTER 4 GRAPHS 1.
Bart Jansen, Utrecht University. 2  Max Leaf  Instance: Connected graph G, positive integer k  Question: Is there a spanning tree for G with at least.
R. Johnsonbaugh Discrete Mathematics 5 th edition, 2001 Chapter 8 Network models.
1 Routing and Wavelength Assignment in Wavelength Routing Networks.
Management Science 461 Lecture 2b – Shortest Paths September 16, 2008.
 Graph Graph  Types of Graphs Types of Graphs  Data Structures to Store Graphs Data Structures to Store Graphs  Graph Definitions Graph Definitions.
Five Problems CSE 421 Richard Anderson Winter 2009, Lecture 3.
Computability and Complexity 23-1 Computability and Complexity Andrei Bulatov Search and Optimization.
1 Discrete Structures & Algorithms Graphs and Trees: II EECE 320.
Graphs. Overview What is a graph? Some terminology Types of graph Implementing graphs (briefly) Some graph algorithms Graphs 2/18.
Approximation Algorithms: Combinatorial Approaches Lecture 13: March 2.
CSE 326: Data Structures NP Completeness Ben Lerner Summer 2007.
CS541 Advanced Networking 1 Routing and Shortest Path Algorithms Neil Tang 2/18/2009.
FPGA Acceleration of Gene Rearrangement Analysis Jason D. Bakos Dept. of Computer Science and Engineering University of South Carolina Columbia, SC USA.
Chapter 4 Graphs.
The Shortest Path Problem
Chapter 15 Graph Theory © 2008 Pearson Addison-Wesley.
Combinatorial and Statistical Approaches in Gene Rearrangement Analysis Jijun Tang Computer Science and Engineering University of South Carolina
A Simplified View of DCJ-Indel Distance Phillip Compeau A Simplified View of DCJ- Indel Distance Phillip Compeau University of California-San Diego Department.
GRAPH Learning Outcomes Students should be able to:
Copyright © Cengage Learning. All rights reserved.
Dijkstra’s Algorithm. 2 Shortest-path Suppose we want to find the shortest path from node X to node Y It turns out that, in order to do this, we need.
Programming for Geographical Information Analysis: Advanced Skills Online mini-lecture: Introduction to Networks Dr Andy Evans.
SPANNING TREES Lecture 21 CS2110 – Spring
Graph Theory Topics to be covered:
Lecture 16 Maximum Matching. Incremental Method Transform from a feasible solution to another feasible solution to increase (or decrease) the value of.
NP-complete Problems SAT 3SAT Independent Set Hamiltonian Cycle
CSNB143 – Discrete Structure Topic 9 – Graph. Learning Outcomes Student should be able to identify graphs and its components. Students should know how.
CSCI 115 Chapter 7 Trees. CSCI 115 §7.1 Trees §7.1 – Trees TREE –Let T be a relation on a set A. T is a tree if there exists a vertex v 0 in A s.t. there.
Sorting by Cuts, Joins and Whole Chromosome Duplications
Incidentor coloring: methods and results A.V. Pyatkin "Graph Theory and Interactions" Durham, 2013.
DP TSP Have some numbering of the vertices We will think of all our tours as beginning and ending at 1 C(S, j) = length of shortest path visiting each.
Chap. 7 Genome Rearrangements Introduction to Computational Molecular Biology Chapter 7.1~7.2.4.
Data Structures & Algorithms Graphs
Graph Colouring L09: Oct 10. This Lecture Graph coloring is another important problem in graph theory. It also has many applications, including the famous.
CSE 589 Part VI. Reading Skiena, Sections 5.5 and 6.8 CLR, chapter 37.
CS 361 – Chapter 16 Final thoughts on minimum spanning trees and similar problems Flow networks Commitment: –Decide on presentation order.
CSCI 115 Chapter 8 Topics in Graph Theory. CSCI 115 §8.1 Graphs.
Comp. Genomics Recitation 10 Clustering and analysis of microarrays.
Homework - hints Problem 1. Node weights  Edge weights
CSC 413/513: Intro to Algorithms
Introduction to Graph Theory
Chapter 20: Graphs. Objectives In this chapter, you will: – Learn about graphs – Become familiar with the basic terminology of graph theory – Discover.
Graphs Definition: a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. The interconnected.
Tree - in “math speak” An ________ graph is a set of vertices/nodes and a set of edges, each edge connects two vertices. Any undirected graph in which.
C&O 355 Lecture 19 N. Harvey TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A A A.
Approximation Algorithms by bounding the OPT Instructor Neelima Gupta
Instructor Neelima Gupta Table of Contents Introduction to Approximation Algorithms Factor 2 approximation algorithm for TSP Factor.
Matching in bipartite graphs Given: non-weighted bipartite graph not covered node extending alternating path initial matching Algorithm: so-called “extending.
COSC 3101A - Design and Analysis of Algorithms 14 NP-Completeness.
Proof of correctness of Dijkstra’s algorithm: Basically, we need to prove two claims. (1)Let S be the set of vertices for which the shortest path from.
1 Minimum Spanning Tree: Solving TSP for Metric Graphs using MST Heuristic Soheil Shafiee Shabnam Aboughadareh.
Network Flow.
Original Synteny Vincent Ferretti, Joseph H. Nadeau, David Sankoff, 1996 Presented by: Suzy Sun.
Eulerian tours Miles Jones MTThF 8:30-9:50am CSE 4140 August 15, 2016.
COMP 6/4030 ALGORITHMS Prim’s Theorem 10/26/2000.
Network Flow.
Bipartite Matching and Other Graph Algorithms
The Taxi Scheduling Problem
Discrete Maths 9. Graphs Objective
Lecture 3: Genome Rearrangements and Duplications
Instructor: Shengyu Zhang
Problem Solving 4.
Double Cut and Join with Insertions and Deletions
Network Flow.
Graphs G = (V, E) V are the vertices; E are the edges.
Network Flow.
Presentation transcript:

Genome Halving – work in progress Fulton Wang ACGT Group Meeting

Motivation: chromosomes have undergone whole duplication in the past, going from “pre duplicated genome” to “perfectly duplicated genome” “pre duplicated genome” A: single circular chr with exactly 1 copy of each gene “perfectly duplicated chr” A+A: genome that has 2 copies of every adjacency in A A+A could be either 1 or 2 chrs, ie A-A or A A depending on how you label the adjacencies in A+A Eg: > or

After the duplication event, rearrangements occur that alter the adjacencies in the perfectly duplicated chr to give us present day genome G Eg: > We will assume that G is a single circular chr Given G, want to infer A

Natural solution: want to find preduplicated chr A such that dist(A+A, G) is minimized Simple distance measure: breakpoint distance: N - # shared adjacencies between 2 chrs Eg: 1234 and 1243 has breakpoint distance of 1 But A+A and G each have 2 copies of every gene

What is breakpoint distance between 1212 and 1122? Not well defined. Need to match up the genes in the 2 chr to each other. Need to label every gene copy and gives 0 shared adjacencies and gives 1 shared adjacency

If G1, G2 are duplicated genomes(exactly 2 copies of every gene), then dist(G1,G2) is the minimum of bp(G1,G2) over all possible labellings of G1, G2 Halving problem: given a genome G with exactly 2 copies of each gene, we want to find the minimum over all pre duplicated genomes A of dist(A+A, G). Assume A and G are both single circular chr

Restriction is that the pre duplicated chr A be a single circular chr If you didn’t care about # pieces in A, then there is simple algorithm to solve halving problem – involves just maximum matching where the vertices are the gene ends, edges between each pair of gene ends weighed by how many times that adjacency appears in G

Restrictions on A make the problem harder Right now, I am considering the simpler case when I restrict all the genes to be positively oriented, ie no backwards facing genes

Graph representation of problem A chr can be represented as a directed graph, where each vertex is a specific copy of a gene becomes 1 1 ->2 1 ->1 2 ->2 2 Genome depicted is Grouped members of same gene family together. View each family as 1 “big” vertex, so that graph has 4 vertices

Picking A corresponds to picking a tour of the “big” vertices. Suppose A = Before labeling the adjacencies in A, you don’t know if A+A is A-A or A A Can you label the adjacencies in A+A such that the number of shared adjacencies between A+A and G is the number of shared adjacencies between A+A and G, if the genes are unlabeled?

Yes. Note that this is only true because we don’t care if A+A is 1 or 2 chr This means that if you fix A, then bp(A+A) is equal to the number of “small” edges parallel to A This means we can work with alternate graph, with n vertices, each representing a gene family, where the weight of an edge is the number of adjacencies between the 2 gene families in G

Now, genome halving is reduced to finding max weight directed ham cycle in the complete graph, with edge weights of 0,1,2 dist(A+A,G) = min_A min_labellings (A+A, G) Can simplify problem – if A-A/A A is the minimum possible distance from G, and some adjacency A->B appears twice in G, then A must contain that adjacency, and can merge the 2 vertices

So then, halving is reduced to finding max weight Ham cycle in complete directed graph, where all the edges out of a vertex have weight 0 except 2 of them, which have weight 1 Alternatively, want to find minimum path cover in a directed 2-regular graph. If you have a minimum path cover, you can link the paths with edges of weight 0 to get a max weight ham cycle.

Guided halving Same situation as before, but now we know an ancestor of the current organism, call it B. Then, we wish to minimize bp(A+A,G) + d(A,B) Note that B only contains 1 copy of each gene, same with A

Then, have similar problem to before – given complete graph where edges are of weight 0,1,2, find maximum weight ham cycle This is the exact same problem that is needed to solve in the breakpoint median problem. Likely means guided halving is NP-complete.