Combinatorial Group Testing 傅恆霖應用數學系. Mathematics? You are you, you are the only one in the world just like you! Mathematics is mathematics, there is.

Slides:



Advertisements
Similar presentations
CS 336 March 19, 2012 Tandy Warnow.
Advertisements

Lecture 24 Coping with NPC and Unsolvable problems. When a problem is unsolvable, that's generally very bad news: it means there is no general algorithm.
Lecture 17 Path Algebra Matrix multiplication of adjacency matrices of directed graphs give important information about the graphs. Manipulating these.
Computational problems, algorithms, runtime, hardness
Turning Privacy Leaks into Floods: Surreptitious Discovery of Social Network Friendships Michael T. Goodrich Univ. of California, Irvine joint w/ Arthur.
Combinatorial Designs and Related Discrete Combinatorial Structures Sarah Spence Adams Fall 2008.
Important Problem Types and Fundamental Data Structures
1 Physical Mapping --An Algorithm and An Approximation for Hybridization Mapping Shi Chen CSE497 04Mar2004.
Physical Mapping of DNA Shanna Terry March 2, 2004.
MCS312: NP-completeness and Approximation Algorithms
© by Kenneth H. Rosen, Discrete Mathematics & its Applications, Sixth Edition, Mc Graw-Hill, 2007 Chapter 9 (Part 2): Graphs  Graph Terminology (9.2)
1 Software Testing. 2 Path Testing 3 Structural Testing Also known as glass box, structural, clear box and white box testing. A software testing technique.
Computational Molecular Biology Non-unique Probe Selection via Group Testing.
Pooling designs for clone library screening in the inhibitor complex model Department of Mathematics and Science National Taiwan Normal University (Lin-Kou)
Lecture 10 Applications of NP-hardness. Knapsack.
Combinatorial Optimization Problems in Computational Biology Ion Mandoiu CSE Department.
Nonunique Probe Selection and Group Testing Ding-Zhu Du.
Graph Colouring L09: Oct 10. This Lecture Graph coloring is another important problem in graph theory. It also has many applications, including the famous.
Computational Molecular Biology Non-unique Probe Selection via Group Testing.
Chapter 7 Dynamic Programming 7.1 Introduction 7.2 The Longest Common Subsequence Problem 7.3 Matrix Chain Multiplication 7.4 The dynamic Programming Paradigm.
Graphs Basic properties.
C&O 355 Lecture 19 N. Harvey TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A A A.
Introduction to NP Instructor: Neelima Gupta 1.
CSE280Stefano/Hossein Project: Primer design for cancer genomics.
An Algorithm for the Consecutive Ones Property Claudio Eccher.
Computational Molecular Biology
Learning Hidden Graphs Hung-Lin Fu 傅 恆 霖 Department of Applied Mathematics Hsin-Chu Chiao Tung Univerity.
Group Testing and Its Applications
Computational Molecular Biology Pooling Designs – Inhibitor Models.
Sorting by placement and Shift Sergi Elizalde Peter Winkler By 資工四 B 周于荃.
Group Testing – An Introduction 簡介群試理論
Search Methodology – An Introduction Hung-Lin Fu 傅 恆 霖.
Fundamental Graph Theory (Lecture 1) Lectured by Hung-Lin Fu 傅 恆 霖 Department of Applied Mathematics National Chiao Tung University.
ICS 353: Design and Analysis of Algorithms NP-Complete Problems King Fahd University of Petroleum & Minerals Information & Computer Science Department.
Group Testing and Its Applications
The NP class. NP-completeness
Eigenfaces (for Face Recognition)
Applied Discrete Mathematics Week 14: Trees
Computational problems, algorithms, runtime, hardness
Chapter 9 (Part 2): Graphs
Applied Discrete Mathematics Week 13: Graphs
Special Graphs By: Sandeep Tuli Astt. Prof. CSE.
Group Testing – a tool of protecting Network Security
Copyright © Zeph Grunschlag,
Lecture 2-6 Polynomial Time Hierarchy
Group Testing and Its Applications
Introduction to Graph Theory
Computational Molecular Biology
Chapter 5. Optimal Matchings
Degree and Eigenvector Centrality
Graphs.
Analysis and design of algorithm
ICS 353: Design and Analysis of Algorithms
Graphs Chapter 13.
More about the Learning Algorithm More about the Class being learned
Graphs All tree structures are hierarchical. This means that each node can only have one parent node. Trees can be used to store data which has a definite.
Chapter 2 Determinants Basil Hamed
Quantum One.
CSE 589 Applied Algorithms Spring 1999
CPS 173 Computational problems, algorithms, runtime, hardness
Systems of distinct representations
Lecture 2-5 Applications of NP-hardness
Pooling Designs in Group Testing
Computational Molecular Biology
Applied Discrete Mathematics Week 13: Graphs
Implementation of Learning Systems
Agenda Review Lecture Content: Shortest Path Algorithm
Learning a hidden graph with adaptive algorithms
Unit III – Chapter 3 Path Testing.
Presentation transcript:

Combinatorial Group Testing 傅恆霖應用數學系

Mathematics? You are you, you are the only one in the world just like you! Mathematics is mathematics, there is nothing else in the world just like mathematics! Mathematics is a part of “ Life ”.

What is Mathematics?

Mathematics is the study of quantity, structure, space, relation, change, and various topics of pattern, form and entity. Mathematicians seek out patterns whether found in numbers, space, natural science, computers, imaginary abstractions, or elsewhere. Mathematicians formulate new conjectures and establish truth by rigorous deduction from appropriately chosen axioms and definitions.quantity structurespacerelationMathematiciansnumbersnatural sciencecomputers conjectures rigorousdeductionaxiomsdefinitions

Combinatorial Theory What is Combinatorial Theory? We are not interested in “ limits ” and “ ε ”. We are interested in “ counting ”. In order to count, we start with Arithmetic and learn to count from 1, 2, 3, …. Existence of “ Configurations ” is an important theme in this theory.

Configuration? The term configuration has several meanings. In computing it may refer to: Computer configuration or system configuration Configure (computing) is the output of Autotools and used to detect system configuration. This is referred to as "./configure" in Unix Configuration file is a software file used to configure the initial settings...

Other usages include: Geometric configuration Molecular configuration Graphs … The following notion can be considered as one of the configurations!

Quiz Problem: I have at most two favor numbers in my pocket and the number are in {1,2,3,…,63}. Can you find the numbers by asking (me) as few queries ( 問題 ) as possible?

An Answer for “ one ” number

Syphilitic Blood Testing Tests (World war II)

One by one

More positives (72 tests)

Test a group at one time

Red is positive ( 陽性 )and Blue is negative ( 陰性 )

Some group is negative!

They are done!

Save 33-3 times Save a lot of Money! (Three Positive Groups) What ’ s next?

Group Testing Robert Dorfman ‘ s paper in 1943 introduced the field of (Combinatorial) Group Testing. The motivation arose during the Second World War when the United States Public Health Service and the Selective service embarked upon a large scale project. The objective was to weed out all syphilitic ( 梅毒 ) men called up for induction. However, syphilis testing back then was expensive and testing every soldier individually would have been very cost heavy and inefficient. A basic breakdown of a test is: Say we have soldiers, then this method of testing leads to tests. If we have 70-75% of the people infected then the method of individual testing would be reasonable. Our goal however, is to achieve effective testing in the more likely scenario where it does not make sense to test 100,000 people to get (say) 10 positives.

Formal Definitions Consider a set N of n items consisting of at most d positive (used to be called defective) items with the others being negative (used to be called good) items. A group test, sometimes called a pool, can be applied to an arbitrary set S of items with two possible outcomes; negative: all items in S are negative; positive: at least one positive item in S, not knowing which or how many.

Adaptive (Sequential) and Non-adaptive Can you see the difference between the above two ways in finding the answer? The first one (adaptive or sequential): You ask the second question (query) after knowing the answer of the first one and continue …. That is, the previous knowledge will be used later. The second one (non-adaptive): You can ask all the questions (queries) at the same time.

Non-adaptive Algorithm We can use a matrix to describe a non-adaptive algorithm. Items are indexed by columns and the tests are indexed by rows. Therefore, the (i, j) entry is 1 if the item j is included in the pool i (for test), and 0 otherwise.

An Example Let M = [m i,j ] be a txn matrix mentioned above. Then we can use n sets (ordered) S i ’ s to represent the matrix where S k = {i : m i,k = 1, i = 1, 2, …, t}, k = 1, 2, …, n. The following sets represent a (0,1)-matrix: {1,2,3}, {4,5,6}, {7,8,9}, {1,4,7}, {2,5,8}, {3,6,9}, {1,5,9}, {2,6,7}, {3,4,8}, {1,6,8}, {2,4,9}, {3,5,7}. (Have you seen this collection of sets before?)

Incidence Matrix

Can we find positives from the above matrix? Yes, we can if the number of positives is not too many, say at most 2, by running the 9 tests simultaneously corresponding to rows. The reason is that the union of (at most) 2 columns can not contain any other distinct column. (?)

d-separable and d-disjunct matrices A matrix is d-separable if  D   D ’ for any two distinct d-sets D and D ’ (columns), i.e. no two unions of d columns are the same. A matrix is d-disjunct if no column is contained in the union of any other d columns.

Important Facts A d-disjunct matrix is also a d-separable matrix. A d-disjunct matrix can be applied to find k (  d) positives. Proof. The union of k (  d) columns corresponds to distinct outcome vector. d-disjunct matrices have a simple decoding algorithm, namely, a column is positive if and only if it does not appear in a negative row.

More group testing models

Class Note A design is a pair (X,B) where X is a v-set and B is a collection of subsets of X. So, the matrix mentioned above can be viewed as a design. A design defined on a v-set can be viewed as a binary code of length v. A design (X,B) can also be viewed as a hypergraph with vertex set X and edge set B.

Application In screening clone library the goal is to determine which clones in the clone library hybridize with a given probe in an efficient fashion. A clone is said to be positive if it hybridize with the probe( 探針 ), and negative otherwise.

Shotgun Sequencing Shotgun sequencing is a throughput technique resulting in the sequencing of a large number of bacterial genomes, mouse genomes and the celebrated human genomes. In all such projectss, we are left with a collection of contigs that for special reasons cannot be assembled with general assembly algorithms. Continued …

Random shotgun approach cut many times at random (Shotgun) genomic segment 6

38 Whole-genome shotgun sequencing –Short reads are obtained and covering the genome with redundancy and possible gaps. Circular genome

–Reads are assembled into contigs with unknown relative placement.

–Primers : (short) fragments of DNA characterizing ends of contigs.

–A PCR (Polymerase Chain Reaction) reaction reveals if two primers are proximate (adjacent to the same gap). –Multiplex PCR can treat multiple primers simultaneously and outputs if there is a pair of adjacent primers in the input set and even sometimes the number of such pairs.

Two primers of each contig are “ mixed together ” –Find a Hamiltonian cycle by PCRs!

Primers are treated independently. –Find a perfect matching by PCRs.

Goal Our goal is to provide an experimental protocol that identifies all pairs of adjacent primers with as few PCRs (queries) (or multiplex PCRs respectively) as possible.

Mathematical Models Hidden Graphs (Reconstructed)

Models 1)Multi-vertex model 2)Quantitative multi-vertex model 3)k -vertex model 4)Quantitative k -multi-vertex model Learning a hidden graph by edge-detecting queries: 8

Then, … We can delete the edge and start it over again to find another edge until all the edges are found. Clearly, if the hidden graph we are looking for is of large size compare to n, then this algorithm is not going to be a good one. We may simply ask all the n(n-1)/2 2- subsets of [n].

Open Problems (Group Testing) What is the minimum number of tests we need in order to find at most two positives in 10,000 items? ( 懸賞 NT 2,000 dollars) How about three positives out of 10,000 items? ( 懸賞 NT 3,000 dollars) How about if mistakes (lies) happened sometime?