Gao Song 2010/04/27. Outline Concepts Problem definition Non-error Case Edge-error Case Disconnected Components Simulated Data Future Work.

Presentation transcript:

Gao Song 2010/04/27

Outline: Concepts, Problem Definition, Non-error Case, Edge-error Case, Disconnected Components, Simulated Data, Future Work

Concepts
Contig: a contiguous assembled sequence (a node of the graph)
Edge (PET): library size
Scaffolding: a sequence of contigs
Happy edge: the real distance is <= the expected distance, and the orientation of both contigs is correct
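The happy-edge test can be sketched in Python. This is a minimal illustration, not the presenter's implementation: the `Placement` and `Edge` records, their field names, and the simplified orientation model (each edge simply states which orientation it expects for each contig) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical record: where a contig sits in the scaffold."""
    start: int      # left coordinate of the contig
    length: int     # contig length
    forward: bool   # True if placed in forward orientation


@dataclass(frozen=True)
class Edge:
    """Hypothetical record: a PET-derived link between two contigs."""
    a: str
    b: str
    a_forward: bool    # orientation of contig a that the edge expects
    b_forward: bool    # orientation of contig b that the edge expects
    max_distance: int  # expected upper bound on the gap between the contigs


def is_happy(edge, layout):
    """Return True if both orientations match and the gap is within bounds."""
    pa, pb = layout[edge.a], layout[edge.b]
    if pa.forward != edge.a_forward or pb.forward != edge.b_forward:
        return False
    left, right = (pa, pb) if pa.start <= pb.start else (pb, pa)
    gap = right.start - (left.start + left.length)
    return gap <= edge.max_distance
```

For instance, two forward contigs of length 100 placed at coordinates 0 and 150 leave a gap of 50, so an edge expecting at most 100 is happy and one expecting at most 30 is not.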

Problem Definition
Version 1: Given a set of contigs and a set of edges, find a scaffold that has at most p unhappy edges.
Version 2: Given a set of contigs and a set of edges, find a scaffold that has at most p unhappy edges and is also the optimal solution.

Non-error Case
Connected graph
Partial layout:
Dangling edge: an edge with only one end in the partial layout
Active region: the sequence from the first contig that has dangling edges to the end of the partial layout; it is shorter than the library size
Domain of a partial layout: all nodes in the partial layout
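A minimal Python sketch of these definitions, assuming a partial layout is an ordered list of contig names and each edge is a pair of names. Both representations, and the function names, are assumptions for illustration only.

```python
def dangling_edges(layout, edges):
    """Edges with exactly one end inside the partial layout."""
    placed = set(layout)
    return {e for e in edges if (e[0] in placed) != (e[1] in placed)}


def active_region(layout, edges):
    """Suffix of the layout starting at the first contig with a dangling edge."""
    touched = {v for e in dangling_edges(layout, edges) for v in e}
    for i, contig in enumerate(layout):
        if contig in touched:
            return tuple(layout[i:])
    return ()  # no dangling edges, so the active region is empty


def domain(layout):
    """All nodes in the partial layout."""
    return set(layout)
```

With layout `["A", "B", "C"]` and edges `A-B`, `B-D`, `C-D`, the dangling edges are `B-D` and `C-D`, and the active region is `("B", "C")`.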

Non-error Case
Theorem: if two partial layouts l1 and l2 have the same active region and the same dangling set, then (1) they have the same domain, and (2) either both or neither of them can be extended to a solution.
Proof:

Procedure
Find the unassigned node
Select the nearest node as the next assigned node
Update the current partial layout:
  Remove all dangling edges incident to the new node
  Add the new dangling edges of the new node
  Remove contigs from the active region

Main Procedure
Find all nodes that have no ancestors and select one to start
From an active region, get all unassigned nodes and update the partial layout
Remember all visited partial layouts
If the dangling edge set is empty, output the result
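The search with remembered partial layouts can be sketched as follows. This is a deliberately simplified model, not the method from the slides: contigs have unit length, orientations are ignored, the library size is replaced by a hypothetical integer `width` (an edge is happy if its two contigs are at most `width` positions apart), and every node is tried as a starting point instead of only nodes without ancestors. Following the theorem, visited states are keyed by (active region, dangling set), which presumes a connected graph.

```python
def find_layout(nodes, edges, width):
    """Return an ordering in which every edge spans <= width positions, or None."""
    neighbours = {v: set() for v in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    visited = set()  # remembered (active region, dangling set) states

    def signature(layout):
        placed = set(layout)
        dangling = frozenset(
            e for e in edges if (e[0] in placed) != (e[1] in placed)
        )
        touched = {v for e in dangling for v in e}
        start = next(
            (i for i, v in enumerate(layout) if v in touched), len(layout)
        )
        return tuple(layout[start:]), dangling

    def extend(layout):
        if len(layout) == len(nodes):
            return layout  # no dangling edges remain: output the result
        key = signature(layout)
        # An active region longer than width cannot satisfy its first
        # dangling edge; a visited state cannot do better a second time.
        if len(key[0]) > width or key in visited:
            return None
        visited.add(key)
        position = {v: i for i, v in enumerate(layout)}
        for v in nodes:
            if v in position:
                continue
            placed_nbrs = [u for u in neighbours[v] if u in position]
            if all(len(layout) - position[u] <= width for u in placed_nbrs):
                result = extend(layout + [v])
                if result is not None:
                    return result
        return None

    return extend([])
```

For a star graph with centre `c` and four leaves, `find_layout(..., width=2)` has to place `c` in the middle position, and `width=1` has no solution.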

Time and space complexity
Two possibilities:
  k vertices in the active region: one possible next node; complexity O(n^k) * O(1)
  Fewer than k vertices in the active region: n possible next nodes; complexity O(n^(k-1)) * O(n)
Total time complexity: O(n^k)
Total space complexity: store all visited partial orders

Introduce Edge Error
Types of edge error:
  Chimeric PETs:
  Mapping error
  Misassembled contigs
Solution:
  Filtering: filter chimeric PETs
    Select x% of PETs
    Shuffle them to get chimeric PETs
    Cluster them to find the threshold
  Local threshold

Introduce Edge Error
There are p unhappy edges in the final scaffolding
Partial layout
Dangling edges: real dangling edges; wrong edges

Equivalence Class
Active region, set of dangling edges, count of current wrong edges
Same domain
Assumption: the partial order is a connected graph

Get Unassigned Nodes
Sort the unassigned nodes
Properties of nodes:
  Steps to reach this node
  Distance to the end of the active region
  Unhappy edges introduced by this node

Sort Unassigned Nodes
Breadth-first search
Select the smallest possible distance: > threshold
Sort nodes: for fewer than 5 steps, compare by distance; for the same distance, compare by unhappy edges
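One possible reading of this ordering rule, as a Python sort key. The `Candidate` record and its field names are hypothetical. The slides do not say how nodes that need 5 or more steps are ranked, so placing them last and ordering them by step count is purely an assumption.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record of the three node properties listed above."""
    name: str
    steps: int     # breadth-first steps needed to reach the node
    distance: int  # distance to the end of the active region
    unhappy: int   # unhappy edges introduced by placing this node


def sort_key(c, max_steps=5):
    if c.steps < max_steps:
        # Near nodes: compare by distance, then by unhappy edges.
        return (0, c.distance, c.unhappy)
    # Assumption: far nodes come last, ordered by steps.
    return (1, c.steps, c.distance, c.unhappy)


def rank_candidates(candidates, max_steps=5):
    return sorted(candidates, key=lambda c: sort_key(c, max_steps))
```

Encoding the rule as a tuple key keeps the tie-breaking order explicit and easy to change once a better sorting criterion is found.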

Update Partial Layout
Check whether all incident dangling edges that are not marked wrong are happy
  If yes, just remove all those edges and add the new node
  If no, check whether marking all unhappy edges as omitted would disconnect the graph
    If no, just add the new node and remove the dangling edges
    If yes, discard the current partial layout, to avoid inserting a disconnected component into the sequence
Add the new dangling edges
Remove all dangling edges that are not happy, and check connectedness
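The connectedness check in this step can be sketched with a breadth-first search. The function names and the edge representation (pairs of node names) are assumptions for illustration; the slides do not specify how the check is implemented.

```python
from collections import deque


def is_connected(nodes, edges):
    """Breadth-first search from an arbitrary node; True if all are reached."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacent = {v: [] for v in nodes}
    for a, b in edges:
        adjacent[a].append(b)
        adjacent[b].append(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        v = queue.popleft()
        for w in adjacent[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(nodes)


def can_omit(nodes, edges, unhappy):
    """True if the graph stays connected after omitting the unhappy edges."""
    kept = [e for e in edges if e not in unhappy]
    return is_connected(nodes, kept)
```

If `can_omit` returns False, the partial layout would be discarded under the rule above.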

Main Procedure
If the active region is empty:
  The current connected component is finished
  Check whether the dangling edge set is empty
    If yes, output the result
    If no, use the dangling edges to find a new node and start another scaffolding

Disconnected Components
First find all the connected components and sort them by number of nodes.
From the first component, find a solution, which omits p_1 edges.
For the ith component, if there is no solution that omits p - sum(p_1, ..., p_(i-1)) edges, remember the stop point, return to the (i-1)th component, and see whether it has a solution that omits fewer than p_(i-1) edges. If yes, continue from the stop point of the ith component.

If the ith component finishes the whole search and finds more than one solution, remember only the solution with the minimum p_i. Later, whenever the search comes back to this component, just use this solution as part of the partial result.
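The first step, finding the components and sorting them, plus the edge budget left for the ith component, might look like this in Python. The slides do not say whether the sort is ascending or descending, so largest-first is an assumption, as are the function names.

```python
def connected_components(nodes, edges):
    """Return the components as lists of nodes, largest first (assumed order)."""
    adjacent = {v: [] for v in nodes}
    for a, b in edges:
        adjacent[a].append(b)
        adjacent[b].append(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            v = stack.pop()
            component.append(v)
            for w in adjacent[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    components.sort(key=len, reverse=True)
    return components


def remaining_budget(p, omitted_so_far):
    """Edges the next component may still omit: p - sum(p_1, ..., p_(i-1))."""
    return p - sum(omitted_so_far)
```

For example, with p = 5 and earlier components that omitted 2 and 1 edges, the next component may omit at most 2.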

Optimal Solution Branch and Bound P’ edges
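The slide only names the technique, so the following is a generic branch-and-bound sketch, not the presenter's algorithm. It reuses a simplified model that is an assumption of this example: unit-length contigs, no orientations, and an edge counted as unhappy when its contigs end up more than `width` positions apart. The bound is the best unhappy-edge count found so far.

```python
def min_unhappy(nodes, edges, width):
    """Return (fewest unhappy edges, one ordering that achieves it)."""
    neighbours = {v: set() for v in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    best = {"count": len(edges) + 1, "layout": None}

    def extend(layout, position, unhappy):
        # Bound: the unhappy count never decreases as nodes are added,
        # so a branch that already matches the best cannot improve on it.
        if unhappy >= best["count"]:
            return
        if len(layout) == len(nodes):
            best["count"] = unhappy
            best["layout"] = list(layout)
            return
        here = len(layout)
        for v in nodes:
            if v in position:
                continue
            # An edge is judged when its second end is placed.
            added = sum(
                1 for u in neighbours[v]
                if u in position and here - position[u] > width
            )
            position[v] = here
            layout.append(v)
            extend(layout, position, unhappy + added)
            layout.pop()
            del position[v]

    extend([], {}, 0)
    return best["count"], best["layout"]
```

On a triangle with `width=1`, any ordering leaves exactly one edge spanning two positions, so the minimum is 1.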

Simulated Data Result
Node num: 1522 nodes
Contig length: ,000
Table (columns: Wrong edges, p, Time (ms))

Future Work
Find the optimal solution
Wrong contigs
Repeats
How to deal with large p
Find a good way to sort the unassigned nodes

Thank you