Computational Molecular Biology Pooling Designs – Inhibitor Models
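An Inhibitor Model
The sample space may contain inhibitors. An inhibitor acts as an anti-positive: a pool containing positives together with an inhibitor tests negative, i.e. (Positives + Inhibitor) = Negative.
[Figure: clones marked negative (_), positive (+), and inhibitor (x); a pool containing both + and x gives a negative outcome.]
This makes pooling designs more challenging, because clones now come in a third type, the inhibitor.
My T. Thai mythai@cise.ufl.edu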
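The outcome rule above can be sketched in a few lines of Python. This is only an illustration: the function name `pool_outcome` and the clone labels are hypothetical, and the sketch assumes a single inhibitor is enough to cancel any number of positives in the same pool.

```python
def pool_outcome(pool, positives, inhibitors):
    """Return True if the pool tests positive under the inhibitor model.

    Assumption (not stated beyond the slide's rule): one inhibitor in the
    pool cancels every positive clone in that pool.
    """
    pool = set(pool)
    has_positive = bool(pool & set(positives))
    has_inhibitor = bool(pool & set(inhibitors))
    # (Positives + Inhibitor) = Negative
    return has_positive and not has_inhibitor


# Hypothetical clone labels, chosen only for illustration.
positives = {2, 5}
inhibitors = {7}

with_positive_only = pool_outcome({1, 2, 3}, positives, inhibitors)
with_inhibitor = pool_outcome({2, 7}, positives, inhibitors)
all_negative = pool_outcome({1, 3}, positives, inhibitors)
```

Note that a negative outcome is ambiguous here: the pool may hold no positives, or it may hold positives masked by an inhibitor.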
An Example of Inhibitors
Inhibitor Model
Definition: Given a sample with d positive clones and at most r inhibitors, find a pooling design that identifies all the positive clones with the minimum number of tests, together with a decoding algorithm for that design.
Inhibitors with Fault Tolerance Model
Definition: Given n clones with at most d positive clones and at most r inhibitors, subject to at most e testing errors, identify all positive clones with as few tests as possible.
Preliminaries
2-Stage Algorithm
What is AI? The set AI should contain all the inhibitors and no positives. Hence the set PN contains all positives (and some negatives) but no inhibitors.
2-Stage Algorithm
At this stage, the problem becomes the e-error-correcting problem.
Non-adaptive Solution (1 stage)
P contains all positives.
N contains all negatives.
O contains all inhibitors and no positives.
Non-adaptive Solution
Generalization
A positive outcome may be due to the combination effect of several items.
Items are molecules.
The outcome depends on a complex: a subset of molecules.
Example: complexes of eukaryotic DNA transcription and RNA translation.
Now let's look at the pooling design problem from a different angle. Up to this point, a pool has been positive if it contains a positive item. In some scenarios, however, the positive outcome of a test is due to the combination effect of several items. For example, if the items are molecules, many biological processes depend on the presence of a complex, which is a subset of molecules. This makes the general version worth studying.
A Complex Model
Definition: Given n items and a collection of at most d positive subsets, identify all positive subsets with the minimum number of tests.
Pool: set of subsets of items.
Positive pool: contains a positive subset.
We call this general version the complex model. Given n items and a collection of at most d positive subsets, the goal is to identify all positive subsets with the minimum number of tests. In this context, a pool is a set of subsets of items, and a pool is positive if it contains a positive subset.
What is a Hypergraph H?
H = (V, E), where:
V is a set of n vertices (items).
E is a set of m hyperedges E_j, where each E_j is a subset of V.
Rank: r = max { |E_j| : E_j ∈ E }.
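As a small illustration of these definitions, a hypergraph can be held as a vertex set plus a list of frozensets, and the rank is the size of the largest hyperedge. The vertex labels and the helper name `rank` below are hypothetical.

```python
# A toy hypergraph: vertices are items, hyperedges are subsets of items.
V = {1, 2, 3, 4, 5}
E = [frozenset({1, 2}), frozenset({2, 3, 4}), frozenset({5})]


def rank(edges):
    """Rank r of the hypergraph: the largest hyperedge cardinality."""
    return max(len(e) for e in edges)


# Every hyperedge must be a subset of V.
edges_are_valid = all(e <= V for e in E)
r = rank(E)
```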
Group Testing in Hypergraph H
Definition: Given H with at most d positive hyperedges, identify all positive hyperedges with the minimum number of tests.
Hyperedges = suspect subsets.
Positive hyperedges = positive subsets.
Positive pool: contains a positive hyperedge.
Assume that E_i ⊄ E_j for all i ≠ j.
d(H)-disjunct Matrix
Definition: M is a binary matrix with t rows and n columns. For any d + 1 edges E_0, E_1, …, E_d of H, there exists a row containing E_0 but none of E_1, …, E_d.
Decoding Algorithm: Remove every edge that is contained in a negative pool. The remaining edges are positive.
How can we solve this problem? Can we extend the d-disjunct matrix of the classical problem to the complex model? Fortunately, the answer is yes; we call the result a d(H)-disjunct matrix. It is also a binary matrix, with n columns representing the items and t rows representing the pools. Using the same method as in the classical model, one can prove that at most d edges are not contained in any negative pool. Hence the decoding algorithm is the same as before.
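The decoding rule can be sketched as follows. This is a minimal illustration, assuming error-free tests and pools given as sets of items (the rows of M); the names `run_tests` and `decode` and the toy pools are hypothetical.

```python
def run_tests(pools, positive_edges):
    """Simulate outcomes: a pool is positive if it contains a positive edge."""
    return [any(e <= pool for e in positive_edges) for pool in pools]


def decode(pools, outcomes, edges):
    """Remove every edge contained in a negative pool; return what remains."""
    candidates = set(edges)
    for pool, positive in zip(pools, outcomes):
        if not positive:
            # An edge fully inside a negative pool cannot be positive.
            candidates = {e for e in candidates if not e <= pool}
    return candidates


edges = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
# Each toy pool contains exactly one edge, so the design is d(H)-disjunct
# for this small hypergraph.
pools = [{1, 2, 4}, {2, 3}, {1, 3, 4}]

truth = {frozenset({2, 3})}
outcomes = run_tests(pools, truth)
recovered = decode(pools, outcomes, edges)
```

With a d(H)-disjunct design and at most d positive edges, every negative edge lies inside some negative pool, which is why the surviving candidates are exactly the positive edges.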
Construction Algorithms
Consider a finite field GF(q). Choose k, s, and q:
Step 1: For each v in V, associate v with a polynomial p_v of degree k - 1 over GF(q).
A Proposed Algorithm
Step 2: Construct the s × m matrix A as follows (rkd ≤ s < q):
for x from 0 to s - 1
  for each edge E_j ∈ E
    A[x, E_j] = P_{E_j}(x) = { p_v(x) | v ∈ E_j }
[Figure: matrix A with columns E_1, E_2, …, E_j, …, E_m and rows indexed by x up to s - 1; the entry in row x, column E_j is P_{E_j}(x).]
A Proposed Algorithm
Step 3: Construct the t × n matrix B from the s × m matrix A as follows:
for x from 0 to s - 1
  for each P_{E_j}(x)
    for each vertex v in V
      if p_v(x) ∈ P_{E_j}(x), then B[(x, P_{E_j}(x)), v] = 1
      else B[(x, P_{E_j}(x)), v] = 0
[Figure: matrix A with columns E_1, …, E_m and entries P_{E_j}(x), and matrix B with columns v_1, v_2, …, v_n and rows indexed by the pairs (x, P_{E_j}(x)), up to (s - 1, P_{E_m}(s - 1)).]
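Steps 1 to 3 can be sketched in Python under some simplifying assumptions that are not in the slides: q is prime, so GF(q) arithmetic is plain arithmetic mod q; each vertex gets a distinct polynomial of degree at most k - 1, chosen by enumerating coefficient tuples; and the evaluation points are 0, …, s - 1. The names `poly_eval` and `build_matrices` are hypothetical.

```python
from itertools import product


def poly_eval(coeffs, x, q):
    """Evaluate a polynomial (lowest-degree coefficient first) at x mod q."""
    result = 0
    for c in reversed(coeffs):  # Horner's rule
        result = (result * x + c) % q
    return result


def build_matrices(vertices, edges, q, k, s):
    """Return (A, B). A is a list of s rows; B maps (x, F) to a 0/1 row."""
    if q ** k < len(vertices):
        raise ValueError("need q**k >= n to give each vertex its own polynomial")
    if s > q:
        raise ValueError("need s <= q distinct evaluation points")
    # Step 1: one distinct coefficient tuple (polynomial) per vertex.
    coeff_tuples = product(range(q), repeat=k)
    poly = {v: next(coeff_tuples) for v in vertices}
    # Step 2: A[x][j] = P_{E_j}(x) = {p_v(x) : v in E_j}.
    A = [[frozenset(poly_eval(poly[v], x, q) for v in e) for e in edges]
         for x in range(s)]
    # Step 3: one row of B per distinct pair (x, P_{E_j}(x)).
    B = {}
    for x in range(s):
        for F in A[x]:
            B[(x, F)] = [1 if poly_eval(poly[v], x, q) in F else 0
                         for v in vertices]
    return A, B


# Toy instance: rank r = 2, d = 1, k = 2, q = 5, s = r*d*(k-1) + 1 = 3.
vertices = [0, 1, 2, 3]
edges = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})]
A, B = build_matrices(vertices, edges, q=5, k=2, s=3)
```

Storing B as a dictionary keyed by (x, F) merges identical pairs automatically, so the number of tests t is the number of distinct pairs rather than s times m.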
Analysis
Theorem: If rd(k - 1) + 1 ≤ s ≤ q, then B is d(H)-disjunct.
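On small instances, the d(H)-disjunct property can be checked directly from its definition by brute force. The sketch below is only a sanity-check aid, not part of the construction; it assumes each row of the matrix is given as the set of items it contains, and the name `is_dH_disjunct` is hypothetical.

```python
from itertools import combinations


def is_dH_disjunct(pools, edges, d):
    """Check: for any E_0 and any d other edges, some pool contains E_0
    but none of the others. Exponential in d; for small examples only."""
    for e0 in edges:
        others = [e for e in edges if e != e0]
        # If fewer than d other edges exist, testing against all of them
        # is the strongest requirement.
        for group in combinations(others, min(d, len(others))):
            separated = any(
                e0 <= pool and not any(e <= pool for e in group)
                for pool in pools
            )
            if not separated:
                return False
    return True


edges = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
good_pools = [{1, 2, 4}, {2, 3}, {1, 3, 4}]   # each pool holds exactly one edge
bad_pools = [{1, 2, 3}, {3, 4}]               # {1, 2} is never isolated from {2, 3}

good = is_dH_disjunct(good_pools, edges, d=2)
bad = is_dH_disjunct(bad_pools, edges, d=1)
```

To apply the check to a 0/1 matrix, convert each row to the set of column indices holding a 1.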
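Proof of d(H)-disjunct Matrix Construction
Matrix A has this property: for any d + 1 columns C_0, …, C_d, there exists a row at which the entry of C_0 does not contain the entry of C_j for every j = 1, …, d.
Proof: By contradiction. Assume that no such row exists. Since A has at least rd(k - 1) + 1 rows, there exists a j in 1, …, d such that the entries of C_0 contain the corresponding entries of C_j in at least r(k - 1) + 1 rows. Then P_{E_j}(x) ⊆ P_{E_0}(x) for at least r(k - 1) + 1 distinct values of x. Since E_0 has at most r vertices, each v in E_j satisfies p_v(x) = p_u(x) for some u in E_0 at k or more of these values of x. Two polynomials of degree k - 1 that agree at k points are identical, so v = u (provided distinct vertices receive distinct polynomials). This means that E_j ⊆ E_0, which contradicts the assumption on the hyperedges of H.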
Proof of d(H)-disjunct Matrix Construction (cont)
Claim: B is d(H)-disjunct.
Proof: A has a row x such that the entry F in cell (x, E_0) does not contain the entry in cell (x, E_j) for all j = 1, …, d. Then the row (x, F) of B contains E_0 but not E_j for all j = 1, …, d.