Published by Dieter Gerstle. Modified over 5 years ago.
1
Learning a hidden graph with adaptive algorithms
Hung-Lin Fu, Department of Applied Mathematics, National Chiao Tung University, Hsin Chu, Taiwan
2
Motivated by bioinformatics applications
Introduction
3
Random shotgun approach
A genomic segment is cut many times at random (shotgun).
4
Whole-genome shotgun sequencing
Short reads are obtained that cover the genome with redundancy and possible gaps.
Circular genome
5
Reads are assembled into contigs with unknown relative placement.
6
Primers: (short) fragments of DNA characterizing the ends of contigs.
7
A PCR (polymerase chain reaction) reveals whether two primers are proximate (adjacent to the same gap).
Multiplex PCR can treat multiple primers simultaneously. It reports whether the input set contains a pair of adjacent primers, and sometimes even the number of such pairs.
8
Two primers of each contig are “mixed together”
Find a Hamiltonian cycle by PCRs!
9
Primers are treated independently.
Find a perfect matching by PCRs.
10
Goal
Our goal is to provide an experimental protocol that identifies all pairs of adjacent primers using as few PCRs (queries), or multiplex PCRs respectively, as possible.
11
Mathematical Models
Hidden Graphs (Reconstructed)
Topology-known graphs, e.g. Hamiltonian cycle, matching, star, clique, bipartite graph, etc.
Graphs of bounded degree
Hypergraphs
Graphs with a known number of edges
12
Models
Multi-vertex model
Quantitative multi-vertex model
k-vertex model
Quantitative k-multi-vertex model
Learning a hidden graph by edge-detecting queries
13
Described in Mathematical Terms, Part II
Algorithms
Adaptive algorithms: a query can depend on the answers obtained by previous queries.
Nonadaptive algorithms: queries are independent of one another and can be processed in parallel.
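The query model can be sketched in a few lines of Python. This is only an illustration: the class name `HiddenGraph`, its `query` method, and the edge set used below are assumptions made for the example and do not come from the slides. A query on a vertex set returns 1 if the set contains both endpoints of at least one hidden edge, and 0 otherwise.

```python
class HiddenGraph:
    """Hypothetical oracle for a hidden graph; it also counts the queries asked."""

    def __init__(self, edges):
        # Edges are stored as frozensets so that (u, v) and (v, u) coincide.
        self._edges = {frozenset(e) for e in edges}
        self.queries = 0

    def query(self, vertices):
        """Edge-detecting query: 1 if `vertices` induces at least one edge, else 0."""
        self.queries += 1
        s = set(vertices)
        return int(any(e <= s for e in self._edges))


# Illustrative edge set (an assumption, not the graph drawn on the slides).
g = HiddenGraph([(3, 5), (4, 7), (6, 8)])

# Nonadaptive use: the list of queries is fixed before any answer is seen.
fixed_queries = [[1, 2], [3, 5], [6, 7, 8]]
nonadaptive_answers = [g.query(q) for q in fixed_queries]

# Adaptive use: the second query is chosen after seeing the first answer.
first = g.query([1, 2, 3, 4, 5])
second = g.query([3, 5] if first else [6, 7, 8])
```

An adaptive learner may pick each set after looking at earlier answers, while a nonadaptive learner has to commit to the whole list up front, which is what allows its queries to run in parallel.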
14
Example
[Figure: hidden graph G on vertices 1 to 8]
15
Q({1,2,3,4,5,6,7,8}) = 1
16
Q({1,2,3,4}) = 0
17
Q({1,2,3,4,5,7}) = 1
18
Q({1,2,3,4,5}) = 1
v = 5, S \ {v} = {1, 2, 3, 4}
Q({5,1,2}) = 0
Q({5,3}) = 1
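The last step of this example is a binary search for a neighbor of v inside an independent set. A minimal Python sketch follows, assuming the helper names `make_query` and `find_neighbor` (both hypothetical) and assuming that the only hidden edge that matters here is {3, 5}, as the answers above suggest.

```python
def make_query(edges):
    """Build a hypothetical edge-detecting oracle and a log of the sets asked."""
    edge_set = {frozenset(e) for e in edges}
    log = []

    def query(vertices):
        s = frozenset(vertices)
        log.append(sorted(s))
        return int(any(e <= s for e in edge_set))

    return query, log


def find_neighbor(query, v, independent):
    """Return one neighbor of v in `independent` by binary search.

    Assumes `independent` induces no edge and query(independent + [v]) == 1,
    so every positive answer must come from an edge incident to v.
    """
    candidates = list(independent)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if query(half + [v]):
            candidates = half                      # a neighbor lies in the first half
        else:
            candidates = candidates[len(half):]    # otherwise it lies in the second half
    return candidates[0]


query, log = make_query([(3, 5)])
neighbor = find_neighbor(query, 5, [1, 2, 3, 4])
print(neighbor, log)  # 3 [[1, 2, 5], [3, 5]]
```

The two logged sets are exactly the queries Q({5,1,2}) and Q({5,3}) shown above. The search uses about lg |S| queries for an independent set S.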
19
Known Results (Matching)
The information-theoretic lower bound for matching is (1+o(1)) n lg n, and this bound can be reached by an adaptive algorithm [Bouvel et al. 05'].
Nonadaptive algorithms require [...] queries [Alon, Beigel, Kasif, Rudich, Sudakov 02'].
20
Strategy: first find one vertex
Theorem [Angluin 06']: A vertex in a hidden graph on n vertices can be reconstructed with at most [...] queries.
21
Results Example of Find-One-Vertex
22
Known Results on Other Graphs
Hamiltonian cycle: [lower] [upper]
Star
23
Hamiltonian cycle: adaptive algorithm
The O(n lg n) bound can be reached by an adaptive algorithm [Grebinski, Kucherov 1997].
Proof. Process all vertices one by one, storing them in the independent set of chains.
Case I: no/no
Case II: yes/no
Case III: yes/yes
At most 2n lg n queries are used.
24
How about more general graphs?
25
Lower bound
Theorem 3. For any [...], [...] edge-detecting queries are required to identify a graph drawn from the class of all graphs with [...] vertices and [...] edges.
26
Main Ideas
If there are edges between two independent sets A and B, we may find all of them by using the (a, B)-algorithm for each a ∈ A.
We start by finding a maximal matching!
Algorithm 1. MAXIMAL_MATCHING(V)
Algorithm 2. PARTITION_OF_VERTEX_SET(V)
Algorithm 3. HIDDEN_GRAPH(V)
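The slides do not spell out the (a, B)-algorithm. One plausible reading, sketched below in Python, is a recursive halving search that finds every neighbor of a single vertex a inside an independent set B. The names `make_query`, `neighbors_in` and `edges_between`, and the sample edge set, are assumptions made for this illustration.

```python
def make_query(edges):
    """Hypothetical edge-detecting oracle over a hidden edge set."""
    edge_set = {frozenset(e) for e in edges}

    def query(vertices):
        s = set(vertices)
        return int(any(e <= s for e in edge_set))

    return query


def neighbors_in(query, a, B):
    """All neighbors of a in B, assuming B is independent and a is not in B."""
    B = list(B)
    if not B or not query(B + [a]):
        return []          # no edge between a and this part of B
    if len(B) == 1:
        return B           # the single remaining vertex is a neighbor
    mid = len(B) // 2
    return neighbors_in(query, a, B[:mid]) + neighbors_in(query, a, B[mid:])


def edges_between(query, A, B):
    """All edges between A and B, found by running the search once per a in A."""
    return [(a, b) for a in A for b in neighbors_in(query, a, B)]


query = make_query([(1, 5), (1, 6), (2, 7)])
print(edges_between(query, [1, 2, 3], [5, 6, 7, 8]))  # [(1, 5), (1, 6), (2, 7)]
```

Because B is independent, a positive answer on B ∪ {a} can only be caused by an edge incident to a, which is what makes the halving step sound. Each neighbor found costs on the order of lg |B| queries.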
27
Reference
Reconstructing a Hamiltonian cycle by querying the graph: Application to DNA physical mapping [Grebinski and Kucherov 98']
Learning a hidden matching [N. Alon et al. 04']
Learning a hidden graph using O(lg n) queries per edge [Angluin and Chen 04']
Learning a hidden subgraph [Alon and Asodi 05']
Combinatorial search on graphs motivated by bioinformatics applications: a brief survey [Bouvel, Grebinski and Kucherov 05']
Learning a hidden hypergraph [Angluin and Chen 06']
28
Example (Algorithm A(V): Finding an edge on V)
MAXIMAL_MATCHING(V)
Algorithm A({1,2,3,4,5,6,7,8}) finds the edge {1, 3}
Algorithm A({2,4,5,6,7,8}) finds the edge {2, 4}
Algorithm A({5,6,7,8}) finds the edge {5, 7}
Q({8,6}) = 0
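The trace above can be reproduced with a short Python sketch. It is one possible realization, not necessarily the speaker's Algorithm A: `find_edge` first locates the shortest prefix of the vertex list that contains an edge, then binary-searches the (independent) rest of that prefix for a neighbor of its last vertex. The function names and the edge set are assumptions, and the graph is assumed to have no self-loops.

```python
def make_query(edges):
    """Hypothetical edge-detecting oracle over a hidden edge set."""
    edge_set = {frozenset(e) for e in edges}

    def query(vertices):
        s = set(vertices)
        return int(any(e <= s for e in edge_set))

    return query


def find_edge(query, vertices):
    """Return one edge induced by `vertices`, or None if the set is independent."""
    vs = list(vertices)
    if len(vs) < 2 or not query(vs):
        return None
    # Invariant: vs[:lo] is independent and vs[:hi] contains an edge.
    lo, hi = 1, len(vs)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if query(vs[:mid]):
            hi = mid
        else:
            lo = mid
    v = vs[hi - 1]                 # v closes an edge with the independent prefix
    candidates = vs[: hi - 1]
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if query(half + [v]):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0], v


def maximal_matching(query, vertices):
    """Repeatedly remove a found edge until the remaining set is independent."""
    remaining = list(vertices)
    matching = []
    while True:
        edge = find_edge(query, remaining)
        if edge is None:
            return matching, remaining
        matching.append(edge)
        remaining = [x for x in remaining if x not in edge]


# Illustrative hidden graph (an assumption, chosen to match the trace above).
query = make_query([(1, 3), (2, 4), (5, 7), (3, 4)])
print(maximal_matching(query, range(1, 9)))
# ([(1, 3), (2, 4), (5, 7)], [6, 8])
```

The leftover set {6, 8} is independent, which corresponds to the final answer Q({8,6}) = 0. Each call to `find_edge` uses roughly 2 lg n queries, so the matching phase runs one round per matched edge.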
29
Algorithm 2. PARTITION_OF_VERTEX_SET(V)
[Figure: graph G on vertices 1 to 8]
30
Algorithm 3
It remains to find all the edges between the independent sets.
The general graph is then reconstructed.
31
Don’t Stop!
32
Complexity
The number of queries is less than 2m(log n + 9).
Algorithm 1.
Line / Number of queries
2: [...]
3: [...]
total: [...]
33
Algorithm 2. Algorithm 3.
Line / Number of queries: 2, 3, total
Line: 1; 7; 14+17: 0 (all of these queries are already answered in Algorithm 2, line 10); 15+18; 26; total
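To get a feel for the bound 2m(log n + 9), the following Python sketch compares it with the naive strategy of querying every pair of vertices. The logarithm base is not stated on the slide; base 2 is assumed here, and the function names and sample values are illustrative only.

```python
import math


def adaptive_bound(n, m):
    """Upper bound 2m(log n + 9) from the slide; base-2 logarithm is an assumption."""
    return 2 * m * (math.log2(n) + 9)


def all_pairs(n):
    """Number of queries when every pair of vertices is tested separately."""
    return n * (n - 1) // 2


n, m = 1024, 100
print(adaptive_bound(n, m), all_pairs(n))  # 3800.0 523776
```

For a sparse graph such as this one, the bound is more than a hundred times smaller than the number of vertex pairs. For dense graphs, where m approaches n(n-1)/2, the comparison reverses.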
34
Concluding remarks
Reduce the number of rounds of Algorithm 1 (i.e., obtain an efficient algorithm to find a maximal matching).
Learn a hidden graph in the quantitative k-multi-vertex model.
35
References
[1] N. Alon, R. Beigel, S. Kasif, S. Rudich, and B. Sudakov. Learning a hidden matching. The 43rd Annual IEEE Symposium on Foundations of Computer Science, 197–206, 2002.
[2] D. Angluin and J. Chen. Learning a hidden graph using O(log n) queries per edge. Manuscript, 2006.
[3] D. Angluin and J. Chen. Learning a hidden hypergraph. Journal of Machine Learning Research 7, 2007.
[4] R. Beigel, N. Alon, S. Kasif, M. S. Apaydin, and L. Fortnow. An optimal procedure for gap closing in whole genome shotgun sequencing. In RECOMB, 22–30, 2001.
[5] V. Grebinski and G. Kucherov. Optimal query bounds for reconstructing a Hamiltonian cycle in complete graphs. In Fifth Israel Symposium on the Theory of Computing Systems, 1997.
[6] V. Grebinski and G. Kucherov. Reconstructing a Hamiltonian cycle by querying the graph: Application to DNA physical mapping. Discrete Applied Math., 88(1-3): 147–165, 1998.
36
Thank you for your attention!