Presentation is loading. Please wait.

Presentation is loading. Please wait.

Optimizing pooling strategies for the massive next-generation sequencing of viral samples Pavel Skums 1 Joint work with Olga Glebova 2, Alex Zelikovsky.

Similar presentations


Presentation on theme: "Optimizing pooling strategies for the massive next-generation sequencing of viral samples Pavel Skums 1 Joint work with Olga Glebova 2, Alex Zelikovsky."— Presentation transcript:

1 Optimizing pooling strategies for the massive next-generation sequencing of viral samples Pavel Skums 1 Joint work with Olga Glebova 2, Alex Zelikovsky 2, Ion Mandoiu 3 and Yury Khudyakov 1 1 Centers for Disease Control and Prevention, Atlanta, GA 2 Georgia State University, Atlanta, GA 3 University of Connecticut, Storrs, CT

2 1. Massive NGS of viral samples 2. Optimal pooling design problem 3. Algorithm and results Outline

3 NGS in epidemiology Molecular surveillance Prediction of the epidemics progress Transmission networks Epidemiological parameters Vaccination strategies

4 NGS in epidemiology A large-scale molecular surveillance requires sequencing of unprecedentedly large sets of viral samples. NGS of tens of thousands of samples is highly cost- and labor- intensive. Example: sequencing 100K samples using 454 senior system with 50 MIDs doing one sequencing run per day Cost:  5000*(100 000)/50 = 10 000 000$ Time: (100 000)/50 = 2000 days  5.5 years

5 Optimal Pooling Design Problem Goal: a framework for identification of viral sequences from large number of samples using the smallest possible number of NGS runs. Idea: for n samples generate m pools (i.e. mixtures of samples) with m << n in such a way that every sample is uniquely identified by the pools to which it belongs.

6 Optimal Pooling Design Problem E Example. 8 samples: 1,2,3,4,5,6,7,8 Sequencing each sample separately: 8 runs 4 pools: M 1 = {1,2,3,4} M 2 = {5,6,7,8} M 3 = {1,2,5,6} M 4 = {1,3,5,7} Sequencing pools: 4 runs

7 Optimal Pooling Design Problem 4 pools: M 1 = {1,2,3,4} M 2 = {5,6,7,8} M 3 = {1,2,5,6} M 4 = {1,3,5,7} {1} = M 1  M 3  M 4 {2} = (M 1  M 3 ) \ M 4 …………………………………… {8} = (M 2 \ M 3 ) \ M 4

8 Optimal Pooling Design Problem Problem 1 (Optimal Pooling Design Problem). Given: a set of samples S = {S 1,...,S n } Find: a set of pools P = {P 1,…,P m }, P k  S for k=1,…,m such that 1)P 1  …P m = S 2) for every S i,S j  S there exists P k  P such that |P k  {S i,S j }| = 1 3) m is minimal Theorem1. There exists a solution of Problem 1 with m = log(n) + 1 (P k separates S i and S j )

9 Optimal Pooling Design Problem Additional conditions for the problem ConditionReasons Each pool contains at most k samples |P j | ≤ k for j = 1,…,m number of reads which could be obtained by each NGS technology is bounded if large number of samples are mixed in one pool, some of them may be lost due to a PCR bias

10 Optimal Pooling Design Problem Additional conditions for the problem ConditionReasons Each pool contains at most k samples |P j | ≤ k for j = 1,…,m number of reads which could be obtained by each NGS technology is bounded if large number of samples are mixed in one pool, some of them may be lost due to a PCR bias Each sample belongs to at least l pools |{j : S i  P j }| ≥ l for i = 1,…,n to ensure sufficient coverage for sequences of each sample

11 Optimal Pooling Design Problem Additional conditions for the problem ConditionReasons Each pool contains at most k samples |P j | ≤ k for j = 1,…,m number of reads which could be obtained by each NGS technology is bounded if large number of samples are mixed in one pool, some of them may be lost due to a PCR bias Each sample belongs to at least l pools |{j : S i  P j }| ≥ l for i = 1,…,n to ensure sufficient coverage for sequences of each sample Some pairs of samples should not be put into a same pool Some samples may intersect (if they belong to the same transmission cluster)

12 Optimal Pooling Design Problem ConditionReasons Each pool P j should be a clique of the graph G(S) Some samples may intersect (if they belong to the same transmission cluster) A graph G(S) = (V,E), where V = S S i S j  E if and only if there is a confidence that the samples S i and S j do not intersect.

13 Optimal Pooling Design Problem Problem 2 (Minimum Clique Test Set Problem). Given: a graph G=G(S), natural numbers k,l Find: a set of cliques P = {P 1,…,P m } of G such that 1) |P i | ≤ k for every i=1,…,m 2) every vertex v  V(G) belongs to at least l cliques from P 3) for every u,v  V(G) there exists P i  P such that |P i  {u,v}| = 1 4) m is minimal

14 Optimal Pooling Design Problem Minimum Test Set Problem (Garey, Johnson) Given: collection Q={Q 1,…,Q n } of subsets of a finite set S Find: a subcollection P = {P 1,…,P m }  Q such that 1) for every s i,s j  S there exists P r  P such that |P r  {s i,s j }| = 1 2) m is minimal

15 Problem reformulations Minimum Clique Test Set Problem Given: a graph G=G(S), natural numbers k,l Find: a set of cliques P = {P 1,…,P m } of G such that 1) |P i | ≤ k for every i=1,…,m 2) every vertex v  V(G) belongs to at least l cliques from P 3) for every u,v  V(G) there exists P i  P such that |P i  {u,v}| = 1 4) m is minimal Only some pairs of vertices should be separated A graph H with V(H)=V(G), uv  E(H) if and only if u and v should be separated

16 Problem reformulations Minimum Clique Test Set Problem Given: a graphs G and H on the same set of vertices V, natural numbers k,l Find: a set of cliques P = {P 1,…,P m } of G such that 1) |P i | ≤ k for every i=1,…,m 2) every vertex v  V(G) belongs to at least l cliques from P 3) for every uv  E(H) there exists P i  P such that |P i  {u,v}| = 1 4) m is minimal Only some pairs of vertices should be separated A graph H with V(H)=V(G), uv  E(H) if and only if u and v should be separated

17 Problem reformulations Minimum Clique Test Set Problem Given: a graphs G and H on the same set of vertices V, natural numbers k,l Find: a set of cliques P = {P 1,…,P m } of G such that 1) |P i | ≤ k for every i=1,…,m 2) every vertex v  V belongs to at least l cliques from P 3) for every uv  E(H) there exists P i  P such that |P i  {u,v}| = 1 4) m is minimal Replace each vertex v  V(G) with l pairwise non-adjacent copies

18 Problem reformulations Minimum Clique Test Set Problem Given: a graphs G and H on the same set of vertices V, natural number k Find: a set of cliques P = {P 1,…,P m } of G such that 1) |P i | ≤ k for every i=1,…,m 2) P 1  …P m = V(G) 3) for every uv  E(H) there exists P i  P such that |P i  {u,v}| = 1 4) m is minimal Replace each vertex v  V(G) with l pairwise non-adjacent copies

19 Problem reformulations Minimum Clique Test Set Problem Given: a graphs G and H on the same set of vertices V, natural number k Find: a set of cliques P = {P 1,…,P m } of G such that 1) |P i | ≤ k for every i=1,…,m 2) P 1  …P m = V 3) for every uv  E(H) there exists P i  P such that |P i  {u,v}| = 1 4) m is minimal For every u  V add a vertex x u and an edge ux u  E(H) 1 2 3

20 Problem reformulations Minimum Clique Test Set Problem Given: a graphs G and H on the same set of vertices V, natural number k Find: a set of cliques P = {P 1,…,P m } of G such that 1) |P i | ≤ k for every i=1,…,m 2) P 1  …P m = V 3) for every uv  E(H) there exists P i  P such that |P i  {u,v}| = 1 4) m is minimal For every u  V add a vertex x u and an edge ux u  E(H) 1 2 3 x1x1 x2x2 x3x3

21 Problem reformulations Minimum Clique Test Set Problem Given: a graphs G and H on the same set of vertices, natural number k Find: a set of cliques P = {P 1,…,P m } of G such that 1) |P i | ≤ k for every i=1,…,m 3) for every uv  E(H) there exists P i  P such that |P i  {u,v}| = 1 4) m is minimal For every u  V add a vertex x u and an edge ux u  E(H) 1 2 3 x1x1 x2x2 x3x3

22 Heuristic algorithm Input: a graphs G and H on the same set of vertices V, natural number k P =  WHILE  C  P C  V OR E(H)   find maximum cut (A,B) in H (using local search); for every a  A put w(a) = # of neighbors of a from B in H; for every b  B put w(b) = # of neighbors of b from A in H; find maximum clique C 1 with |C 1 |≤k in a subgraph G[A] with weights w; find maximum clique C 2 with |C 2 |≤k in a subgraph G[B] with weights w; C := argmax{w(C 1 ),w(C 2 )} ; P:= P  {C}; E(H) := E(H) \ {uv : u  C, v  V\C} ENDWHILE

23 Algorithm results 1. All samples are unrelated (i.e. G is complete graph) # samples# generated pools 43 84 165 326 647 1288 2569 51210 102411 204812

24 Algorithm results 2. Some samples are related (G is a random graph with the edge probability p)

25 Thank you!


Download ppt "Optimizing pooling strategies for the massive next-generation sequencing of viral samples Pavel Skums 1 Joint work with Olga Glebova 2, Alex Zelikovsky."

Similar presentations


Ads by Google