Computational Molecular Biology

Presentation transcript:

Computational Molecular Biology Group Testing – Pooling Designs

Group Testing (GT)
Definition: Given n items, at most d of which are positive, identify all positive items using the minimum number of tests.
Each test is performed on a subset of items.
Positive test outcome: the subset contains at least one positive item.
My T. Thai mythai@cise.ufl.edu

An Idea of GT
[Figure: a set of items, mostly negative (_) with a few positive (+); pools are labeled Positive or Negative.]

Example 1 – Sequential Method
[Figure: successive tests on items 1–9, then on items 1–5, then on items 4–5.]

Example 2 – Non-adaptive Method
      p4  p5  p6
p1    1   2   3
p2    4   5   6
p3    7   8   9
Non-adaptive group testing is called pooling design in biology.

Sequential and Non-adaptive
Sequential GT needs fewer tests but takes longer.
Non-adaptive GT needs more tests but takes less time.
In molecular biology, non-adaptive GT is usually preferred. Why?

Because…
The same library is screened with many different probes. It is expensive to prepare a pool for testing the first time. Once a pool is prepared, it can be screened many times with different probes.
Screening one pool at a time is expensive. Screening pools in parallel with the same probe is cheaper.
There are constraints on pool sizes. If a pool contains too many different clones, then positive pools can become too dilute and could be mislabeled as negative pools.

Pooling Designs
Problem Definition: Given a set of n clones with at most d positive clones, identify all positive clones with the minimum number of tests.
Pool: a subset of clones.
Positive pool: a pool that contains at least one positive clone.
Clones = Items

Relation to Pooling Designs
A pooling design is a t × n binary matrix M whose rows are the pools p1, …, pt and whose columns are the clones c1, …, cn.
M[i, j] = 1 iff the ith pool contains the jth clone.
Testing M yields a t × 1 outcome vector V.
Decoding Algorithm: Given M and V, identify all positive clones.
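The testing step can be sketched in a few lines of Python. This is a minimal illustration that assumes error-free tests; the function name `pool_outcomes` and the small 3 × 4 matrix are hypothetical and do not come from the slides.

```python
def pool_outcomes(M, positives):
    """Return the outcome vector V for pooling matrix M.

    M is a list of t rows, each a list of n 0/1 entries
    (M[i][j] == 1 iff pool i contains clone j).
    `positives` holds 0-based indices of the positive clones.
    Assumption: every test is error-free.
    """
    # A pool is positive iff it contains at least one positive clone.
    return [int(any(row[j] for j in positives)) for row in M]


# Hypothetical design with 3 pools and 4 clones.
M = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
print(pool_outcomes(M, {2}))  # [0, 1, 1]
```

The outcome vector is the Boolean sum of the columns of the positive clones, which is why decoding reduces to reasoning about unions of columns.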

Observation
        c1  c2  c3  c4  c5  c6  c7  c8  c9
p1      1   1   1   0   0   0   0   0   0
p2      0   0   0   1   1   1   0   0   0
p3      0   0   0   0   0   0   1   1   1
p4      1   0   0   1   0   0   1   0   0
p5      0   1   0   0   1   0   0   1   0
p6      0   0   1   0   0   1   0   0   1
Observation: All columns are distinct.
To identify up to d positives, all unions of up to d columns should be distinct!
Union of d columns: Boolean sum of these d columns.
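The observation can be checked by brute force. The sketch below rebuilds the 3 × 3 grid design of Example 2 (row pools followed by column pools) and tests whether all unions of up to d columns are distinct. The helper names `grid_design` and `unions_distinct` are hypothetical, and the enumeration is exponential in d, so it is only meant for small examples.

```python
from itertools import combinations


def grid_design(rows, cols):
    """Pooling matrix for items laid out on a rows x cols grid.

    One pool per grid row, then one pool per grid column.
    """
    n = rows * cols
    M = [[1 if j // cols == r else 0 for j in range(n)] for r in range(rows)]
    M += [[1 if j % cols == c else 0 for j in range(n)] for c in range(cols)]
    return M


def unions_distinct(M, d):
    """True iff the Boolean sums of all column sets of size 0..d are distinct."""
    n = len(M[0])
    seen = set()
    for size in range(d + 1):
        for chosen in combinations(range(n), size):
            union = tuple(int(any(row[j] for j in chosen)) for row in M)
            if union in seen:
                return False
            seen.add(union)
    return True


M = grid_design(3, 3)
print(unions_distinct(M, 1))  # True: all columns are distinct
print(unions_distinct(M, 2))  # False
```

The grid fails for d = 2 because, for example, items {1, 5} and items {2, 4} make the same four pools positive (p1, p2, p4, p5), so two positives cannot always be told apart.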

Challenges
Challenge 1: How to construct the binary matrix M such that the unions of any d columns are distinct.
Challenge 2: How to design a decoding algorithm with efficient time complexity, O(tn).

d-separable Matrix
All unions of d columns are distinct.
[Figure: t × n binary matrix with pools p1, …, pt as rows and clones c1, …, cn as columns.]

d-separable Matrix
All unions of up to d columns are distinct.
[Figure: t × n binary matrix with pools p1, …, pt as rows and clones c1, …, cn as columns.]
Decoding: O(n^d)

d-disjunct Matrix
Definition: A t × n binary matrix M is a d-disjunct matrix (d < t) if the union of any d columns does not contain any other column.
Example: a 2-disjunct matrix
M =
1 0 0
0 1 0
0 0 1
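The definition translates directly into a brute-force check. This is a sketch for small matrices only, since it enumerates every choice of d columns; the function name `is_d_disjunct` is hypothetical.

```python
from itertools import combinations


def is_d_disjunct(M, d):
    """True iff no column of M is contained in the union of d other columns."""
    t, n = len(M), len(M[0])
    # Represent each column by the set of rows (pools) where it has a 1.
    cols = [frozenset(i for i in range(t) if M[i][j]) for j in range(n)]
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for group in combinations(others, d):
            covered = set().union(*(cols[g] for g in group))
            if cols[j] <= covered:
                return False
    return True


identity = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
print(is_d_disjunct(identity, 2))  # True
```

In the identity matrix every column has a row of its own, so no union of the other two columns can contain it.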

d-disjunct Matrix (cont)
A d-disjunct matrix can efficiently identify up to d positive clones. Why?
Theorem 1: In a d-disjunct matrix, all unions of d distinct columns are distinct (thus d-disjunct implies d-separable).
Theorem 2: The number of clones not in negative pools is always at most d.
Corollary 1: The tests with negative outcomes determine all negative clones.
Decoding time complexity: O(tn)

Proof of Theorem 2
Note that an item does not appear in any negative pool iff its corresponding column is contained in the union of the (at most d) positive columns.
Therefore, if the number of items not appearing in any negative pool were more than d, there would be at least one non-positive item whose column is contained in the union of the positive columns.
But M is d-disjunct, so no such item exists, and Theorem 2 follows.

Decoding Algorithm
Input: d-disjunct matrix M and output vector V
Output: All positive clones
for each clone c in n clones
    if c is in a negative pool
        remove c
return remaining clones
Example (the last column is the output vector V):
      c1  c2  c3  c4  c5  c6    V
p1    1   1   1   0   0   0     1
p2    1   0   0   1   1   0     0
p3    0   1   0   1   0   1     0
p4    0   0   1   0   1   1     1
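A runnable version of this decoding procedure, applied to the 4 × 6 example on the slide, might look as follows. It is a sketch that assumes error-free outcomes; the function name `decode` and the 0-based clone indices are choices made here, not part of the slides.

```python
def decode(M, V):
    """Return 0-based indices of clones that appear in no negative pool.

    M: t x n pooling matrix as a list of rows; V: list of t outcomes (0 or 1).
    Runs in O(t * n) time. Assumes M is d-disjunct and V has no errors.
    """
    n = len(M[0])
    candidates = set(range(n))
    for row, outcome in zip(M, V):
        if outcome == 0:
            # Every clone in a negative pool must be negative.
            candidates -= {j for j in range(n) if row[j]}
    return sorted(candidates)


M = [
    [1, 1, 1, 0, 0, 0],  # p1
    [1, 0, 0, 1, 1, 0],  # p2
    [0, 1, 0, 1, 0, 1],  # p3
    [0, 0, 1, 0, 1, 1],  # p4
]
V = [1, 0, 0, 1]
print(decode(M, V))  # [2], i.e. clone c3
```

Pools p2 and p3 are negative, which eliminates c1, c4, c5 and c2, c4, c6 respectively, leaving only c3.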

Fields
Field: any set of elements that satisfies the field axioms for both addition and multiplication and is a commutative division algebra.
E.g., the complex, rational, and real numbers.

Division Algebra

Finite Fields
Finite Field: a field with a finite order, i.e., a finite number of elements.
The order of a finite field is always a prime or a prime power (power of a prime).
E.g., 16 = 2^4 is a prime power, whereas 6 and 15 are not.
E.g., in GF(5), 4 + 3 = 7 is reduced to 2 modulo 5.
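For a prime order p, arithmetic in GF(p) is ordinary integer arithmetic reduced modulo p, as in the GF(5) example. The sketch below covers only that prime case; fields of prime-power order such as GF(16) need polynomial arithmetic and are not handled here. The function names are hypothetical.

```python
def gf_add(a, b, p):
    """Addition in GF(p), p prime."""
    return (a + b) % p


def gf_mul(a, b, p):
    """Multiplication in GF(p), p prime."""
    return (a * b) % p


def gf_inv(a, p):
    """Multiplicative inverse in GF(p) via Fermat's little theorem."""
    if a % p == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return pow(a, p - 2, p)


print(gf_add(4, 3, 5))  # 2, matching the slide's example
print(gf_inv(3, 5))     # 2, since 3 * 2 = 6 = 1 (mod 5)
```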

How to construct a d-disjunct matrix
Consider a finite field GF(q). Choose s, q, k satisfying n ≤ q^k and kd ≤ s ≤ q.
Step 1: Construct the s × n matrix A as follows:
for x from 0 to s - 1
    for each polynomial pj of degree k
        A[x, pj] = pj(x)
[Figure: matrix A with rows x = 0, 1, …, s - 1 and columns p1, p2, …, pj, …, pn; the entry in row x and column pj is pj(x).]
First, consider a finite field of order q. Construct the matrix A with s rows and n columns. Rows are indexed by the values 0 to s - 1, and columns are indexed by polynomials of degree k. Each cell is assigned as follows: for row x and column pj, the value is pj(x), evaluated over the finite field. Why n ≤ q^k? To make sure we have enough polynomials to associate with the n items.

Algorithm (cont)
Step 2: Construct the t × n matrix B from the s × n matrix A as follows:
for x from 0 to s - 1
    for y from 0 to q - 1
        for each polynomial pj of degree k
            if A[x, pj] == y
                B[(x, y), pj] = 1
            else
                B[(x, y), pj] = 0
[Figure: matrix A with rows x = 0, …, s - 1 and columns p1, …, pn, next to matrix B with rows (0,0), (0,1), …, (x,y), …, (s-1,q-1) and the same columns; an entry of B in row (x, y) is 1 where pj(x) = y and 0 where pj(x) ≠ y.]
Next, at the second step, we construct the matrix B from the matrix A. The matrix B has t rows and n columns. The columns are indexed by the polynomials of degree k. The rows are indexed by the ordered pairs (x, y), with x ranging over the s row values and y over the q field values. Each cell of B is assigned as follows: look at the row (x, y) and the column p2 in B. Then in the matrix A, if this cell is not equal to y, then … We claim that B is d-disjunct, so we can use the simple decoding algorithm just presented to identify all the positive clones.
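Steps 1 and 2 can be sketched together in Python. Two assumptions are made here that the slides do not state: q is prime, so that field arithmetic is plain arithmetic modulo q, and each column polynomial has k coefficients (degree at most k - 1), which gives exactly q^k polynomials and matches the bound n ≤ q^k. The function name `build_matrices` and the order in which polynomials are enumerated are hypothetical.

```python
from itertools import islice, product


def build_matrices(q, k, s, n):
    """Return (A, B) for the polynomial construction over GF(q).

    Assumptions (not from the slides): q is prime, and polynomials are
    given by k coefficients, listed from the highest-order term down.
    A is s x n with A[x][j] = p_j(x); B is (s*q) x n with a 1 in row
    (x, y) and column j iff p_j(x) == y. Row (x, y) has index x*q + y.
    """
    if s > q or n > q ** k:
        raise ValueError("need s <= q and n <= q**k")

    # Take the first n coefficient tuples as the column polynomials.
    polys = list(islice(product(range(q), repeat=k), n))

    def evaluate(coeffs, x):
        # Horner's rule, reducing modulo q at every step.
        acc = 0
        for c in coeffs:
            acc = (acc * x + c) % q
        return acc

    A = [[evaluate(p, x) for p in polys] for x in range(s)]
    B = [[1 if A[x][j] == y else 0 for j in range(n)]
         for x in range(s) for y in range(q)]
    return A, B


# Small instance: q = 3, k = 2, s = 2 gives t = q * s = 6 pools for n = 9 clones.
A, B = build_matrices(q=3, k=2, s=2, n=9)
for row in B:
    print(row)
```

Each column of B has exactly s ones, one for every row x of A. Two distinct polynomials with k coefficients agree on at most k - 1 points, which is the counting fact behind the disjunctness claim.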

Algorithm Analysis
Theorem 3: (Correctness) If kd ≤ s ≤ q, then the t × n matrix B is d-disjunct.
Theorem 4: The number of tests t obtained from this algorithm is t = qs = O(q^2), where:

Errors in Experiments
False negative: the pool contains some positive clones but returns a negative outcome.
False positive: the pool contains only negative clones but returns a positive outcome.
It is well known that biological experiments may contain errors. A test may return a false negative or a false positive result. In a false negative, the pool contains some positive clones and should return a positive outcome, but because of a testing error it returns a negative outcome. Likewise, in a false positive, the pool contains only negative clones but returns a positive outcome. So, how can we correct these errors?

An e-Error Correcting Model
Definition: Assume that there are at most e errors in testing. All positive clones can still be identified.
Hamming distance: the Hamming distance of two column vectors is the number of components in which they differ.
e-error-correcting: A matrix is said to be e-error-correcting if the Hamming distance of any two unions of d columns is at least 2e + 1.
We call this an e-error-correcting model. In this model, we assume that there are at most e errors in testing. After constructing the d-disjunct matrix and obtaining an outcome vector that contains at most e errors, the model can still correct these errors and identify all the positive clones.
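The Hamming-distance condition can be checked directly on small matrices. The sketch below computes the minimum distance over all pairs of unions of exactly d columns; the function names and the 6 × 3 example matrix (each identity row repeated twice) are hypothetical illustrations, not taken from the slides.

```python
from itertools import combinations


def hamming(u, v):
    """Number of positions where vectors u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def min_union_distance(M, d):
    """Smallest Hamming distance between unions of two different d-column sets.

    Brute force; assumes M has at least d + 1 columns so that at least
    two different column sets exist.
    """
    n = len(M[0])
    unions = [tuple(int(any(row[j] for j in chosen)) for row in M)
              for chosen in combinations(range(n), d)]
    return min(hamming(u, v) for u, v in combinations(unions, 2))


def is_e_error_correcting(M, d, e):
    """True iff any two unions of d columns are at distance >= 2e + 1."""
    return min_union_distance(M, d) >= 2 * e + 1


# Hypothetical example: each row of the 3 x 3 identity is used twice.
M = [
    [1, 0, 0], [1, 0, 0],
    [0, 1, 0], [0, 1, 0],
    [0, 0, 1], [0, 0, 1],
]
print(min_union_distance(M, 1))        # 4
print(is_e_error_correcting(M, 1, 1))  # True, since 4 >= 3
```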

(d,e)-disjunct Matrix
Definition: A t × n binary matrix M is (d, e)-disjunct if for any one column j and any other d columns j1, j2, …, jd, there exist e + 1 rows i0, i1, …, ie such that M[i_u, j] = 1 and M[i_u, j_v] = 0 for u = 0, 1, …, e and v = 1, 2, …, d.
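A brute-force check of this definition counts, for each column j and each set of d other columns, the rows where j has a 1 and all the others have a 0. This sketch is meant for small matrices only; the function name `is_de_disjunct` and the example matrices are hypothetical.

```python
from itertools import combinations


def is_de_disjunct(M, d, e):
    """True iff every column has at least e + 1 rows not covered by any d others."""
    n = len(M[0])
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for group in combinations(others, d):
            # Rows where column j is 1 and all columns in `group` are 0.
            private = sum(1 for row in M
                          if row[j] == 1 and not any(row[g] for g in group))
            if private < e + 1:
                return False
    return True


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
doubled = [row for row in identity for _ in range(2)]  # each row twice

print(is_de_disjunct(identity, 2, 0))  # True: (d, 0)-disjunct is d-disjunct
print(is_de_disjunct(doubled, 2, 1))   # True: every column has 2 private rows
```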

e-error Correcting
Theorem 5: For every (d, e)-disjunct matrix, the Hamming distance between any two unions of d columns is at least 2e + 2.

Theorem 6
Theorem 6: Suppose testing is based on a (d, e)-disjunct matrix. If the number of errors is at most e, then the number of negative pools containing a positive item is always smaller than the number of negative pools containing a negative item.

Proof of Theorem 6
Let i be a positive item and j a negative item. Suppose the number of negative pools containing i is m. Every pool containing a positive item should test positive, so these m pools must have received errors. Hence at most e – m errors turn a negative outcome into a positive one. Moreover, if there were no errors, the number of negative pools containing j would be at least e + 1, because M is (d, e)-disjunct. Hence the number of negative pools containing j is at least (e + 1) – (e – m) = m + 1 > m.

Decoding in e-error-correcting
Corollary: From Theorem 6, to decode the positives from testing based on a (d, e)-disjunct matrix, we only need to compute the number of negative pools containing each item and select the d items with the smallest counts. This runs in time O(nt).

Decoding Algorithm with e Errors
T = empty set
for each clone ci (i = 1…n)
    t(ci) = # negative pools containing ci
    T = T ∪ {t(ci)}
end for
Let Td = set of d smallest t(ci) in T
return ci if t(ci) in Td
Time complexity: O(tn)
In this proposed method, for each clone, we count the number of negative pools containing it. Then we select the d clones with the smallest counts. This decoding algorithm corrects up to e errors and finds all positive clones, by the theorem proved previously.
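The counting decoder can be sketched as follows. The example assumes exactly d positive clones and at most e flipped outcomes; the function name `decode_with_errors` and the 6 × 3 matrix (each identity row repeated twice, which is (1, 1)-disjunct) are hypothetical illustrations.

```python
def decode_with_errors(M, V, d):
    """Return the d clones (0-based) that lie in the fewest negative pools.

    M: t x n pooling matrix as a list of rows; V: observed outcomes,
    possibly containing errors. Counting takes O(t * n) time.
    Assumes M is (d, e)-disjunct, V has at most e errors, and there
    are exactly d positive clones.
    """
    n = len(M[0])
    counts = [sum(1 for row, outcome in zip(M, V) if outcome == 0 and row[j])
              for j in range(n)]
    ranked = sorted(range(n), key=lambda j: counts[j])
    return sorted(ranked[:d])


M = [
    [1, 0, 0], [1, 0, 0],
    [0, 1, 0], [0, 1, 0],
    [0, 0, 1], [0, 0, 1],
]
# Clone 1 is positive, so the true outcome is [0, 0, 1, 1, 0, 0].
# One false negative flips the third pool's outcome to 0.
V_observed = [0, 0, 0, 1, 0, 0]
print(decode_with_errors(M, V_observed, d=1))  # [1]
```

With the false negative, clone 1 sits in one negative pool while clones 0 and 2 each sit in two, so the smallest count still identifies the positive clone, as Theorem 6 predicts.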