Learning a hidden graph with adaptive algorithms

Learning a hidden graph with adaptive algorithms Hung-Lin Fu Department of Applied Mathematics National Chiao Tung University Hsin Chu, Taiwan

Introduction: motivated by bioinformatics applications

Random shotgun approach: a genomic segment is cut many times at random (shotgun).

Whole-genome shotgun sequencing: short reads are obtained that cover the genome with redundancy and possible gaps. (Figure: circular genome.)

Reads are assembled into contigs with unknown relative placement.

Primers: (short) fragments of DNA characterizing the ends of contigs.

A PCR (polymerase chain reaction) reveals whether two primers are proximate (adjacent to the same gap). Multiplex PCR can treat multiple primers simultaneously and reports whether the input set contains a pair of adjacent primers, and sometimes even the number of such pairs.

Two primers of each contig are “mixed together”: find a Hamiltonian cycle by PCRs!

Primers are treated independently: find a perfect matching by PCRs.

Goal: to provide an experimental protocol that identifies all pairs of adjacent primers with as few queries, that is, PCRs (or multiplex PCRs, respectively), as possible.

Mathematical models. Hidden graphs (to be reconstructed): topology-known graphs (e.g. Hamiltonian cycle, matching, star, clique, bipartite graph, etc.); graphs of bounded degree; hypergraphs; graphs with a known number of edges.

Learning a hidden graph by edge-detecting queries. Models: multi-vertex model; quantitative multi-vertex model; k-vertex model; quantitative k-multi-vertex model.
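The query models above can be pictured with a small oracle. The following is a minimal sketch, not taken from the slides: the class name `HiddenGraphOracle`, its method names, and the example edge set are hypothetical, and it assumes that the k-vertex models simply cap the size of each queried set at k.

```python
class HiddenGraphOracle:
    """Toy oracle for a hidden graph (all names here are illustrative)."""

    def __init__(self, edges):
        self._edges = {frozenset(e) for e in edges}
        self.num_queries = 0  # how many queries have been asked so far

    def count(self, vertices, k=None):
        """Quantitative query: number of hidden edges inside `vertices`."""
        s = frozenset(vertices)
        if k is not None and len(s) > k:
            # Assumed meaning of the k-vertex models: at most k vertices per query.
            raise ValueError("query exceeds the size limit k")
        self.num_queries += 1
        return sum(1 for e in self._edges if e <= s)

    def detect(self, vertices, k=None):
        """Edge-detecting query Q(S): 1 if S contains an edge, else 0."""
        return 1 if self.count(vertices, k) > 0 else 0


oracle = HiddenGraphOracle([(3, 5), (4, 7), (6, 8)])  # hypothetical hidden edges
print(oracle.detect({1, 2, 3, 4}))  # 0: no hidden edge inside {1, 2, 3, 4}
print(oracle.count(range(1, 9)))    # 3: all hidden edges lie inside {1, ..., 8}
```

An adaptive algorithm would choose its next set after looking at such answers, while a nonadaptive one would fix the whole list of sets before asking anything.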

Described in mathematical terms (Part II: Hidden Graph). Algorithms: in adaptive algorithms, a query can depend on the answers obtained by previous queries; in nonadaptive algorithms, queries are independent and can be processed in parallel.

Example: a hidden graph G on the vertices 1, 2, ..., 8 (figure).

Q({1,2,3,4,5,6,7,8}) = 1

Q({1,2,3,4}) = 0

Q({1,2,3,4,5,7}) = 1

v = {5}, S \ {v} = {1, 2, 3, 4}: Q({1,2,3,4,5}) = 1, Q({5,1,2}) = 0, Q({5,3}) = 1.
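The three queries in this example follow a binary search for a neighbor of v inside the independent set S \ {v}. A minimal sketch of that search is below; the helper names `make_oracle` and `find_neighbor` and the edge set are assumptions chosen to be consistent with the answers shown on the slides, since the drawing of G is not part of the transcript.

```python
def make_oracle(edges):
    """Return an edge-detecting query function and a log of the queried sets."""
    edge_set = {frozenset(e) for e in edges}
    log = []

    def query(vertices):
        s = frozenset(vertices)
        log.append(s)
        return any(e <= s for e in edge_set)

    return query, log


def find_neighbor(query, v, independent):
    """Binary search for one neighbor of v inside an independent set."""
    candidates = list(independent)
    if not query([v] + candidates):
        return None  # v has no neighbor in the set
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if query([v] + half):
            candidates = half                              # a neighbor is in the first half
        else:
            candidates = candidates[len(candidates) // 2:]  # otherwise in the second half
    return candidates[0]


# Hypothetical hidden edges; {1, 2, 3, 4} is independent, as in the example.
query, log = make_oracle([(3, 5), (4, 7), (6, 8)])
neighbor = find_neighbor(query, 5, [1, 2, 3, 4])
print(neighbor)  # 3
```

On this graph the search asks exactly Q({1,2,3,4,5}), Q({5,1,2}) and Q({5,3}), which matches the sequence on the slide.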

Known results (matching): the information-theoretic lower bound for matching is (1+o(1)) n lg n; the bound can be reached by an adaptive algorithm [Bouvel et al. 05’]. Nonadaptive algorithms require […] queries [Alon, Beigel, Kasif, Rudich, Sudakov 02’].

Strategy: first find one vertex. Theorem [Angluin 06’]: a vertex in a hidden graph on n vertices can be reconstructed with at most […] queries.
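One way to locate an endpoint of an edge, and then the edge itself, with roughly 2 lg n edge-detecting queries is a binary search over prefixes of an arbitrary vertex ordering. This is only a sketch under that assumption and is not necessarily the procedure behind the cited theorem; `make_oracle`, `find_edge` and the example graph are hypothetical.

```python
def make_oracle(edges):
    """Return an edge-detecting query function and a log of the queried sets."""
    edge_set = {frozenset(e) for e in edges}
    log = []

    def query(vertices):
        s = frozenset(vertices)
        log.append(s)
        return any(e <= s for e in edge_set)

    return query, log


def find_edge(query, vertices):
    """Return one edge inside `vertices`, or None if the set is independent."""
    order = list(vertices)
    if not query(order):
        return None
    # Step 1: smallest prefix order[:hi] that contains an edge.
    # Invariant: order[:hi] contains an edge, order[:lo - 1] does not.
    lo, hi = 2, len(order)
    while lo < hi:
        mid = (lo + hi) // 2
        if query(order[:mid]):
            hi = mid
        else:
            lo = mid + 1
    u = order[hi - 1]        # u is an endpoint of some edge
    prefix = order[:hi - 1]  # independent, and contains a neighbor of u
    # Step 2: smallest j such that prefix[:j] together with u contains an edge.
    lo, hi = 1, len(prefix)
    while lo < hi:
        mid = (lo + hi) // 2
        if query(prefix[:mid] + [u]):
            hi = mid
        else:
            lo = mid + 1
    return (prefix[hi - 1], u)


edges = [(2, 6), (4, 5)]  # hypothetical hidden graph on vertices 1..6
query, log = make_oracle(edges)
edge = find_edge(query, range(1, 7))
print(edge, len(log))
```

The first search identifies one vertex of an edge; the second finds its partner among the earlier vertices, which form an independent set by construction.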

Example of Find-One-Vertex

Known results on other graphs: Hamiltonian cycle (lower bound, upper bound); star.

Hamiltonian cycle (adaptive): the O(n lg n) bound can be reached by an adaptive algorithm [Grebinski, Kucherov 1997]. Proof idea: process the vertices one by one, storing them in an independent set of chains. Case I: no/no; case II: yes/no; case III: yes/yes. At most 2n lg n queries.

How about more general graphs?

Lower bound. Theorem 3: for any […], […] edge-detecting queries are required to identify a graph drawn from the class of all graphs with […] vertices and […] edges.
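A standard counting argument gives a lower bound of this kind; the sketch below is not necessarily the proof from the slide. It assumes that n denotes the number of vertices, m the number of edges, N the number of vertex pairs, and q the number of queries, and it uses only the fact that q yes/no answers can distinguish at most 2^q graphs.

```latex
\[
  2^{q} \;\ge\; \binom{N}{m}, \qquad N = \binom{n}{2},
\]
\[
  q \;\ge\; \lg \binom{N}{m}
    \;\ge\; \lg \left( \frac{N}{m} \right)^{m}
    \;=\; m \, \lg \frac{n(n-1)}{2m}.
\]
```

When m is at most n^{2-ε} for a fixed ε > 0, the right-hand side is Ω(m lg n).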

Main ideas: if there are edges between two independent sets A and B, we can find all of them by running the (a, B)-algorithm for each a ∈ A. We start by finding a maximal matching! Algorithm 1: MAXIMAL_MATCHING(V). Algorithm 2: PARTITION_OF_VERTEX_SET(V). Algorithm 3: HIDDEN_GRAPH(V).

Reference
Reconstructing a Hamiltonian cycle by querying the graph: Application to DNA physical mapping [Grebinski and Kucherov, 98’]
Learning a hidden matching [N. Alon et al., 04’]
Learning a hidden graph using O(lg n) queries per edge [Angluin and Chen, 04’]
Learning a hidden subgraph [Alon and Asodi, 05’]
Combinatorial search on graphs motivated by bioinformatics applications: a brief survey [Bouvel, Grebinski and Kucherov, 05’]
Learning a hidden hypergraph [Angluin and Chen, 06’]

Example of MAXIMAL_MATCHING(V), where Algorithm A(V) finds an edge on V: Algorithm A({1,2,3,4,5,6,7,8}) finds the edge {1,3}; Algorithm A({2,4,5,6,7,8}) finds {2,4}; Algorithm A({5,6,7,8}) finds {5,7}; Q({8,6}) = 0, so no further edge can be added and the matching is maximal.
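The loop in this example can be sketched as follows: find an edge, remove its two endpoints, and repeat until the remaining vertices answer 0. The edge-finding routine used here (a prefix binary search) is a stand-in for Algorithm A, whose details are not in the transcript; the function names and the hidden edge set are hypothetical, chosen so that the run reproduces the matching from the slide.

```python
def make_oracle(edges):
    """Return an edge-detecting query function for a hidden edge set."""
    edge_set = {frozenset(e) for e in edges}

    def query(vertices):
        s = frozenset(vertices)
        return any(e <= s for e in edge_set)

    return query


def find_edge(query, vertices):
    """Stand-in for Algorithm A: return one edge inside `vertices`, or None."""
    order = list(vertices)
    if not query(order):
        return None
    lo, hi = 2, len(order)  # smallest prefix containing an edge
    while lo < hi:
        mid = (lo + hi) // 2
        if query(order[:mid]):
            hi = mid
        else:
            lo = mid + 1
    u, prefix = order[hi - 1], order[:hi - 1]
    lo, hi = 1, len(prefix)  # first neighbor of u inside the prefix
    while lo < hi:
        mid = (lo + hi) // 2
        if query(prefix[:mid] + [u]):
            hi = mid
        else:
            lo = mid + 1
    return (prefix[hi - 1], u)


def maximal_matching(query, vertices):
    """Greedy maximal matching; also returns the leftover independent set."""
    remaining = list(vertices)
    matching = []
    while True:
        edge = find_edge(query, remaining)
        if edge is None:
            return matching, remaining
        matching.append(edge)
        remaining = [v for v in remaining if v not in edge]


# Hypothetical hidden graph on vertices 1..8.
query = make_oracle([(1, 3), (2, 4), (5, 7), (3, 6), (4, 8)])
matching, leftover = maximal_matching(query, range(1, 9))
print(matching, leftover)
```

The leftover vertices form an independent set, because the final query on them returned 0.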

Algorithm 2: PARTITION_OF_VERTEX_SET(V) (figure: the graph G on the vertices 1, ..., 8).

Algorithm 3: it remains to find all the edges between the independent sets. With that, a general graph is reconstructed.
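Finding the edges between two independent sets A and B can be sketched by fixing a vertex a of A and recursively halving B, discarding every half that answers 0 together with a. This is one plausible reading of the (a, B)-algorithm, not a transcription of it; the names `neighbors_in` and `edges_between` and the example graph are hypothetical.

```python
def make_oracle(edges):
    """Return an edge-detecting query function for a hidden edge set."""
    edge_set = {frozenset(e) for e in edges}

    def query(vertices):
        s = frozenset(vertices)
        return any(e <= s for e in edge_set)

    return query


def neighbors_in(query, a, B):
    """All neighbors of a inside the independent set B, by recursive halving."""
    B = list(B)
    if not B or not query([a] + B):
        return []   # B is independent, so answer 0 means a has no neighbor in B
    if len(B) == 1:
        return B    # the single remaining vertex is a neighbor of a
    mid = len(B) // 2
    return neighbors_in(query, a, B[:mid]) + neighbors_in(query, a, B[mid:])


def edges_between(query, A, B):
    """All edges between two independent sets A and B."""
    return [(a, b) for a in A for b in neighbors_in(query, a, B)]


# Hypothetical bipartite hidden graph: A and B are both independent.
hidden = [(1, 5), (1, 6), (2, 6), (3, 7)]
query = make_oracle(hidden)
found = edges_between(query, [1, 2, 3, 4], [5, 6, 7, 8])
print(found)
```

The halving only works because B is independent: a positive answer on a together with part of B can then only come from an edge at a.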

Don’t Stop!

Complexity: the number of queries is less than 2m(log n + 9). (Table for Algorithm 1: number of queries at lines 2 and 3, and in total.)
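To get a feel for the bound, here is a worked instance with illustrative numbers (n = 1000 vertices and m = 100 edges, which are not from the slides), assuming the logarithm is base 2, compared with querying every pair of vertices separately.

```latex
\[
  2m(\lg n + 9) = 2 \cdot 100 \cdot (\lg 1000 + 9)
    \approx 200 \cdot (9.966 + 9) \approx 3793,
  \qquad
  \binom{n}{2} = \binom{1000}{2} = 499500 .
\]
```

For a sparse graph of this size, the bound is two orders of magnitude below the number of pairwise queries.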

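(Tables for Algorithm 2 and Algorithm 3: number of queries per line and in total. Algorithm 2: lines 2 and 3. Algorithm 3: lines 1, 7, 14+17, 15+18 and 26; lines 14+17 cost 0 queries, since all of those queries are already answered in Algorithm 2, line 10.)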

Concluding remarks: reduce the number of rounds of Algorithm 1 (i.e., obtain an efficient algorithm to find a maximal matching); learn a hidden graph in the quantitative k-multi-vertex model.

References
[1] N. Alon, R. Beigel, S. Kasif, S. Rudich, and B. Sudakov. Learning a hidden matching. The 43rd Annual IEEE Symposium on Foundations of Computer Science, 197–206, 2002.
[2] D. Angluin and J. Chen. Learning a hidden graph using O(log n) queries per edge. Manuscript, 2006.
[3] D. Angluin and J. Chen. Learning a hidden hypergraph. Journal of Machine Learning Research 7, 2215–2236, 2006.
[4] R. Beigel, N. Alon, S. Kasif, M. S. Apaydin, and L. Fortnow. An optimal procedure for gap closing in whole genome shotgun sequencing. In RECOMB, 22–30, 2001.
[5] V. Grebinski and G. Kucherov. Optimal query bounds for reconstructing a Hamiltonian cycle in complete graphs. In Fifth Israel Symposium on the Theory of Computing Systems, 166–173, 1997.
[6] V. Grebinski and G. Kucherov. Reconstructing a Hamiltonian cycle by querying the graph: Application to DNA physical mapping. Discrete Applied Math., 88(1-3): 147–165, 1998.

Thank you for your attention!