Tracing a Single User. Joint work with Noga Alon.

Group Testing Dorfman raised the following problem in 1941: all American inductees gave blood samples, which were tested for the presence of a syphilitic antigen. We assume that the number r of infected blood samples is much smaller than the total number m. Testing each sample separately requires m tests.

Group Testing (cont.) Instead, one can test pools that contain blood from a set of samples. If the outcome is negative, none of the samples in the pool is infected. Otherwise, the pool contains at least one infected sample, which can be identified by further tests. This way, fewer than m tests are needed.
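The pooling idea can be illustrated with a minimal sketch of a simple two-stage scheme: test each pool once, then retest every member of a positive pool individually. The function name `two_stage_pooling`, the fixed `pool_size`, and the boolean encoding of samples are assumptions made for this illustration and do not come from the slides.

```python
from typing import List, Sequence, Tuple


def two_stage_pooling(infected: Sequence[bool], pool_size: int) -> Tuple[List[int], int]:
    """Return (indices of infected samples, number of tests used).

    Hypothetical helper: infected[i] is True when sample i is infected.
    """
    found: List[int] = []
    tests = 0
    for start in range(0, len(infected), pool_size):
        pool = range(start, min(start + pool_size, len(infected)))
        tests += 1  # one test on the pooled blood
        if any(infected[i] for i in pool):
            # Positive pool: retest each member separately.
            for i in pool:
                tests += 1
                if infected[i]:
                    found.append(i)
    return found, tests


# m = 100 samples, r = 2 of them infected, pools of 10 samples each.
samples = [False] * 100
samples[17] = samples[62] = True
found, tests = two_stage_pooling(samples, pool_size=10)
print(found, tests)  # [17, 62] 30
```

Here 10 pool tests plus 20 individual retests suffice, compared with 100 tests when every sample is tested separately. Note that this scheme is adaptive, since the second stage depends on the outcomes of the first.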

Molecular Biology In recent years this problem has regained popularity in the field of molecular biology. For example, given a large set of DNA sequences, we may look for all those that contain a specific short subsequence, using a method similar to that of the blood testing problem.

Molecular Biology (cont.) In some applications, we are interested in finding one sequence that contains the short subsequence, rather than all of them.

Parallelization Often, we would prefer to conduct all experiments simultaneously, even at the cost of increasing the number of experiments. Thus, we need our tests to be non-adaptive, i.e. the pool tested in each experiment is independent of the outcomes of other experiments.

Non-Adaptive Tests [Figure: samples a_1, a_2, …, a_m and the pools T_1, T_2, …, T_n in which they are tested]

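r-SUT Definition Definition: Let F be a family of subsets of $[n] = \{1,\dots,n\}$. F is called r-single-user-tracing superimposed (r-SUT) if for all subfamilies $\mathcal{F}_1,\dots,\mathcal{F}_k \subseteq F$ with $|\mathcal{F}_i| \le r$,
$$\bigcup_{A \in \mathcal{F}_1} A = \dots = \bigcup_{A \in \mathcal{F}_k} A \;\Longrightarrow\; \bigcap_{i=1}^{k} \mathcal{F}_i \neq \emptyset.$$
In other words, given the union of up to r sets from F, one can identify at least one of those sets.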
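For very small families the definition can be checked by brute force. The sketch below groups all nonempty subfamilies of size at most r by their union and verifies that the subfamilies in each group share a common member. Checking each whole group is enough: the group is itself an allowed choice of subfamilies, and if its common intersection is nonempty, so is that of any sub-collection. The function name `is_r_sut` and the representation of sets as `frozenset` objects are assumptions of this illustration; the running time is exponential, and the family is assumed to contain distinct sets.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def is_r_sut(family: List[FrozenSet[int]], r: int) -> bool:
    """Brute-force check of the r-SUT property (small inputs only)."""
    # Maps each union to the set of member indices common to every
    # subfamily (of size 1..r) that produces this union.
    common: Dict[FrozenSet[int], Set[int]] = {}
    for size in range(1, r + 1):
        for idx in combinations(range(len(family)), size):
            union = frozenset().union(*(family[i] for i in idx))
            if union in common:
                common[union] &= set(idx)
            else:
                common[union] = set(idx)
    # r-SUT holds iff no union is explained by subfamilies with empty intersection.
    return all(common.values())


disjoint = [frozenset({1}), frozenset({2}), frozenset({3})]
ambiguous = [frozenset({1}), frozenset({2}), frozenset({1, 2})]
print(is_r_sut(disjoint, 2))   # True
print(is_r_sut(ambiguous, 2))  # False: {1} ∪ {2} equals {1, 2}, with no common member
```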

Communication Suppose that m users share a common channel. Each user is associated with a vector in $\{0,1\}^n$. All active users transmit their vectors, and a single receiver gets the OR of all transmitted vectors. Given that at most r users are active simultaneously, we would like the receiver to be able to identify at least one of them.
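The channel model can be sketched as follows, under the assumption that each user's vector is stored as an integer bitmask. The receiver here decodes by exhaustive search over all sets of at most r users, which is only meant to illustrate what "identify at least one of them" means and is not an efficient decoder. The names `channel_output` and `certainly_active`, and the example codebook, are hypothetical.

```python
from functools import reduce
from itertools import combinations
from operator import or_
from typing import Iterable, List, Set


def channel_output(codebook: List[int], active: Iterable[int]) -> int:
    """Bitwise OR of the vectors of all active users."""
    return reduce(or_, (codebook[u] for u in active), 0)


def certainly_active(codebook: List[int], received: int, r: int) -> Set[int]:
    """Users that belong to every set of at most r users consistent with `received`."""
    candidates = [
        set(users)
        for k in range(1, r + 1)
        for users in combinations(range(len(codebook)), k)
        if channel_output(codebook, users) == received
    ]
    return set.intersection(*candidates) if candidates else set()


# Three users with vectors in {0,1}^3, at most r = 2 active at a time.
codebook = [0b001, 0b011, 0b101]
received = channel_output(codebook, [0, 1])
print(bin(received))                            # 0b11
print(certainly_active(codebook, received, 2))  # {1}
```

In this example the receiver cannot tell whether user 0 is active, because the sets {1} and {0, 1} produce the same output, but user 1 is identified with certainty.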

Maximal r-SUT Families Let $g(n,r)$ denote the maximum size of an r-SUT family of subsets of $[n]$. Let $R_g(r) = \limsup_{n \to \infty} \log g(n,r) / n$. Csűrös and Ruszinkó: There exist constants $c_1, c_2 > 0$ such that [inequality missing from the transcript]. Our result: $R_g(r) = \Omega(1/r)$ (and hence $\Theta(1/r)$).

Lower Bound Let $m = 2^{n/(20r)}$. We construct a family $F = \{F_1,\dots,F_m\}$ of subsets of $[n]$ at random as follows: for every $1 \le i \le m$ and $1 \le j \le n$, independently, put j in $F_i$ with probability $1/r$.
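The random construction is straightforward to sample. The sketch below assumes that m is rounded down to an integer and uses a seeded generator for reproducibility; both choices, and the function name `random_family`, are assumptions of this illustration.

```python
import random
from typing import FrozenSet, List, Optional


def random_family(n: int, r: int, seed: Optional[int] = None) -> List[FrozenSet[int]]:
    """Sample F = {F_1, ..., F_m}: each j in [n] joins each F_i independently w.p. 1/r."""
    rng = random.Random(seed)
    m = int(2 ** (n / (20 * r)))  # m = 2^(n/(20r)), rounded down (assumption)
    p = 1.0 / r
    return [
        frozenset(j for j in range(1, n + 1) if rng.random() < p)
        for _ in range(m)
    ]


family = random_family(n=200, r=2, seed=0)
print(len(family))  # 32, since 2^(200/40) = 2^5
```

Each sampled set has expected size n/r. Sampling alone does not certify the r-SUT property; the slides argue only that the property holds with positive probability.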

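Lower Bound (cont.) We show that F is r-SUT with positive probability. We say that a configuration of subfamilies $\mathcal{F}_1,\dots,\mathcal{F}_k \subseteq F$ with $|\mathcal{F}_i| \le r$ and $\bigcap_{i=1}^{k} \mathcal{F}_i = \emptyset$ is bad if all the unions $\bigcup_{A \in \mathcal{F}_i} A$ are equal. We show that with positive probability there are no bad configurations.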

Lower Bound (cont.) We show that with probability > ½ no small configuration is bad, and that with probability > ½ no large configuration is bad. Each of the two failure events thus has probability < ½, so by the union bound, with positive probability there is no bad configuration.

Small Configurations Proposition: With probability > ½ the following holds: for every $s < 2r$ and distinct $A_1,\dots,A_s \in F$, there exists $j \in [n]$ that belongs to exactly one of the sets $A_1,\dots,A_s$. Corollary: With probability > ½ no small configuration is bad.
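The property in the proposition can be verified directly on a small family by enumerating every collection of fewer than 2r distinct sets. This is a brute-force sketch with hypothetical function names (`has_private_element`, `small_configurations_ok`), intended only to make the condition concrete.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Sequence


def has_private_element(sets: Sequence[FrozenSet[int]]) -> bool:
    """True if some element belongs to exactly one of the given sets."""
    counts = Counter(x for s in sets for x in s)
    return any(c == 1 for c in counts.values())


def small_configurations_ok(family: List[FrozenSet[int]], r: int) -> bool:
    """Check the property for every s < 2r and every choice of s distinct sets."""
    return all(
        has_private_element(subfamily)
        for s in range(1, 2 * r)
        for subfamily in combinations(family, s)
    )


path = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
triangle = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})]
print(small_configurations_ok(path, 2))      # True
print(small_configurations_ok(triangle, 2))  # False: every element lies in two sets
```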

Small Configurations (cont.) [Figure: sets A_1, …, A_9]

Large Configurations Proposition: With probability > ½ the following holds: for all distinct $A_1,\dots,A_r, B_1,\dots,B_r \in F$, [condition missing from the transcript]. Corollary: With probability > ½ no large configuration is bad.

Large Configurations (cont.) [Figure: sets A_1, A_2, A_3 and B_1, B_2, B_3, with a set A_i shown against B_1, B_2, B_3]

Tracing Multiple Users Recently, Laczay and Ruszinkó introduced the following generalization of r-SUT families. For integers $n$, $r \ge 2$, and $1 \le k \le r$, a family F of subsets of $[n]$ is called k-out-of-r multiple-user-tracing superimposed ($\mathrm{MUT}_k(r)$) if, given the union of any $\ell \le r$ sets from F, one can identify at least $\min(k, \ell)$ of them.

Tracing Multiple Users (cont.) Let $h(n,r,k)$ denote the maximum size of a $\mathrm{MUT}_k(r)$ family of subsets of $[n]$. Let $R_h(r,k) = \limsup_{n \to \infty} \log h(n,r,k) / n$. We have shown that there are constants $c_1, c_2, c_3, c_4 > 0$ such that [inequalities missing from the transcript].

Open Problems We have shown that $R_g(r) = \Theta(1/r)$, but the question of finding the exact constant is still open. This problem is open even for the case $r = 2$, where $1/3 \le R_g(2) \le 1/2 + o(1)$. The upper bound follows from a result of Coppersmith and Shearer; the lower bound is obtained by a careful analysis of the random construction.

Open Problems (cont.) We show how to construct an r-SUT family in time $m^{O(r)}$, where m is the size of the family. It would be interesting to find explicit constructions for all r. There are other related problems for which there are still gaps between lower and upper bounds: multiple-user tracing families; r-superimposed families; disjointly r-superimposed families; graph identifying codes.