Published by Francisco Harmond. Modified over 10 years ago.
1
The beauty of prime numbers vs the beauty of the random
Ely Porat, Bar-Ilan University, Israel
2
Outline
– Applications
– Prime Numbers Group Testing
– De-randomized approach for group testing
– Applications: getting into details
– Length Reduction
3
Pattern Matching
Given a text T and a pattern P, the problem is to find all the substrings of T that are equal to P.
4
Streaming Model
– The characters of T arrive one by one.
– We can't save T.
– Our goal is to do that without saving P, storing only Φ(P) instead.
5
Streaming Model
– The characters of T arrive one by one.
– We can't save T.
– Our goal is to do that without saving P, storing only Φ(P) instead.
– Automata?
6
Hamming distance with wildcards
Find a pattern in a text with 2 complications:
– Don't cares (wildcards Ø)
– Mismatches
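To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not one of the algorithms from the talk: it simply counts mismatches at every alignment in O(nm) time. The names `WILDCARD`, `mismatches` and `k_mismatch_positions` are hypothetical, and `?` stands in for the slide's wildcard symbol.

```python
WILDCARD = "?"  # assumption: stand-in for the slide's wildcard symbol


def mismatches(window, pattern):
    """Count positions that differ, ignoring any position holding a wildcard."""
    return sum(
        1
        for t, p in zip(window, pattern)
        if t != WILDCARD and p != WILDCARD and t != p
    )


def k_mismatch_positions(text, pattern, k):
    """Return every 0-based alignment where the pattern matches with at most k mismatches."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if mismatches(text[i:i + m], pattern) <= k
    ]


if __name__ == "__main__":
    print(k_mismatch_positions("abcabd", "ab?", 0))
    print(k_mismatch_positions("abcabd", "abd", 1))
```

Wildcards are allowed on either side here; restricting them to the pattern only would just drop one condition.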
7
Summary of results
Offline
– O(nk log^2 m): Hamming distance with wildcards
Online Pattern Matching
– Hamming distance
– O(k log^2 m): Hamming distance with wildcards
– O(k log m): Edit distance
Streaming
– O(log^2 m) space, O(log m) time: Exact match
– O(k^3 log^5 m) space, O(k^2 log^2 m) time: Hamming distance
8
Open problem
Online convolution in o(log^2 m) time per symbol. Offline is done by FFT in O(n log m).
Example with m = 5:
Text: t_1 t_2 t_3 t_4 t_5 t_6 ... t_n
Pattern: p_1 p_2 p_3 p_4 p_5
First alignment: t_1 p_1 + t_2 p_2 + … + t_5 p_5
Second alignment: t_2 p_1 + t_3 p_2 + … + t_6 p_5
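The quantity being computed can be written out directly. The sketch below is the naive O(nm) version in Python, shown only to pin down what each output value is; it is neither the FFT method nor an online algorithm. The function name `sliding_dot_products` is hypothetical.

```python
def sliding_dot_products(t, p):
    """For every alignment i, return t[i]*p[0] + t[i+1]*p[1] + ... + t[i+m-1]*p[m-1]."""
    m = len(p)
    return [
        sum(t[i + j] * p[j] for j in range(m))
        for i in range(len(t) - m + 1)
    ]


if __name__ == "__main__":
    text = [1, 2, 3, 4, 5, 6]
    pattern = [1, 0, 1, 0, 1]  # m = 5, as on the slide
    print(sliding_dot_products(text, pattern))
```

In the online setting the i-th value has to be reported as soon as the last text symbol of that alignment arrives, which is what rules out a single offline FFT over the whole text.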
9
Problem Definition
– m people, at most k are sick.
– Query: Is someone in this set sick?
– Goal: identify the sick people using only a few tests.
– Non-adaptive
10
Motivations
– Syphilis, HIV [Dor43]
– Mapping genomes [BLC91, BBK+95, TJP00]
– Quality control in product testing [SG59]
– Searching files in storage systems [KS64]
– Sequential screening of experimental variables [Li62]
– Efficient contention resolution algorithms for multiple access communication [KS64, Wol85]
– Data compression [HL00]
– Software testing [BG02, CDFP97]
– DNA sequencing [PL94]
– Molecular biology [DH00, FKKM97, ND00, BBKT96]
11
Background
Same conditions:
– Deterministic [KS64]
– Random [KS64]
– Heavy deterministic [AMS06]
Lower bound:
– [CR96]
Relaxed conditions:
– Fully adaptive
– Two staged group testing and selectors [CGR00, Kni95, BGV03, CMS01, BV03, BGV05]
– Optimal monotone encoding [AH08]
Similar problems:
– Inhibitors [FKKM97, Dam98, BV98, BGV03]
– Bayesian case [Kni95, BL02, BL03, A.J98, BGV03]
– Errors [BGV98]
DIMACS 2006
Scheme size table, with rows: Deterministic; Random and Heavy deterministic; Lower bound.
12
Our Results
– Deterministic
– Size
– Fast construction
Scheme size table, with rows: Deterministic; Random and Heavy deterministic; Lower bound.
13
Prime Numbers Group Testing
Positions of the sick people.
Bad event: there exists y s.t.
14
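Prime Numbers Group Testing
Bad event: there exists y s.t.
Sick positions: x_1, x_2, x_3, x_4, ..., x_k
There is a dot below each prime.
There exists an x_i and primes p_{i1}, p_{i2}, …, p_{id} with p_{i1} p_{i2} … p_{id} > n such that y mod p_{ij} = x_i mod p_{ij} for every j.
By CRT, x_i = y.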
15
Prime Numbers Group Testing
This gives a group testing scheme of size p_1 + p_2 + … + p_r.
By choosing good enough primes we get O(k^2 log^2 m).
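The construction can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the talk's exact choice of primes: people are numbered 0..n-1, there is one test per (prime p, residue r) containing everyone with x mod p = r, and the primes are simply the first ones whose product exceeds n^k. That product bound is an assumption that makes the CRT argument go through (some sick person would have to share primes with a healthy person whose product exceeds n). All function names are hypothetical.

```python
def first_primes_with_product_above(bound):
    """Return the smallest primes 2, 3, 5, ... whose product exceeds bound."""
    primes, product, candidate = [], 1, 2
    while product <= bound:
        if all(candidate % p for p in primes):  # not divisible by any smaller prime
            primes.append(candidate)
            product *= candidate
        candidate += 1
    return primes


def prime_tests(n, k):
    """One test per (prime, residue); the test holds every x in 0..n-1 with x % p == r."""
    primes = first_primes_with_product_above(n ** k)  # assumption: product > n^k
    return [(p, r) for p in primes for r in range(p)]


def run_tests(tests, sick):
    """A test is positive when at least one sick person falls in its residue class."""
    return [any(x % p == r for x in sick) for p, r in tests]


def decode(n, tests, outcomes):
    """Declare y sick unless some test containing y came back negative."""
    positive = {test for test, outcome in zip(tests, outcomes) if outcome}
    primes = sorted({p for p, _ in tests})
    return {y for y in range(n) if all((p, y % p) in positive for p in primes)}


if __name__ == "__main__":
    n, k = 100, 2
    tests = prime_tests(n, k)
    sick = {17, 42}
    print(len(tests), decode(n, tests, run_tests(tests, sick)))
```

The number of tests is the sum of the chosen primes, matching the p_1 + p_2 + … + p_r size stated above.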
16
Randomized Group Testing
Just choose O(k^2 log n) random sets of size n/k.
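A minimal Python sketch of the randomized scheme follows. The constant `c`, the natural logarithm, the fixed `seed`, and the function names are all assumptions made for illustration; the slide only states the O(k^2 log n) count and the n/k set size.

```python
import math
import random


def random_tests(n, k, c=3, seed=0):
    """Draw about c * k^2 * ln(n) random subsets of {0..n-1}, each of size n // k."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    count = math.ceil(c * k * k * math.log(n))
    size = max(1, n // k)
    return [set(rng.sample(range(n), size)) for _ in range(count)]


def decode(n, tests, outcomes):
    """Everyone who appears in a negative test is healthy; the rest are reported sick."""
    cleared = set()
    for group, positive in zip(tests, outcomes):
        if not positive:
            cleared |= group
    return set(range(n)) - cleared


if __name__ == "__main__":
    n, k = 60, 3
    tests = random_tests(n, k)
    sick = {4, 20, 57}
    outcomes = [bool(group & sick) for group in tests]
    print(sorted(decode(n, tests, outcomes)))
```

The decoder never misses a sick person, since a sick person cannot appear in a negative test. Whether it also clears every healthy person depends on the random draw, which is where the k^2 log n count comes in.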
17
Overall derandomization plan
Diagram boxes and labels:
– Good random error correction codes
– Method of conditional probabilities
– Derandomization
– Good deterministic error correction codes
– Good deterministic linear error correction codes
– Reduction from error correction codes to group testing schemes
– Good group testing schemes
18
Error correction codes
– Length of words = m
– Number of words =
– Distance =
– Rate = R
– Relative distance =
– Linear code (figure labels: Rm, m)
19
Good random linear error correction codes
GV bound: There exists with
Linear codes: faster construction
Algorithm: Pick the entries of the generating matrix uniformly and independently.
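The sampling step can be illustrated with a small Python sketch. It assumes a prime alphabet size q (so arithmetic mod q is a field) and enumerates all q^rows messages, which is only feasible for toy sizes. The function names and the fixed `seed` are hypothetical.

```python
import itertools
import random


def random_generator_matrix(rows, cols, q, seed=0):
    """Pick every entry uniformly and independently from {0, ..., q-1}."""
    rng = random.Random(seed)
    return [[rng.randrange(q) for _ in range(cols)] for _ in range(rows)]


def codewords(G, q):
    """Enumerate all codewords msg * G (mod q); q is assumed to be prime."""
    rows, cols = len(G), len(G[0])
    return {
        tuple(sum(msg[i] * G[i][j] for i in range(rows)) % q for j in range(cols))
        for msg in itertools.product(range(q), repeat=rows)
    }


def min_distance(words):
    """For a linear code, the minimum distance is the least weight of a nonzero codeword."""
    return min((sum(1 for s in w if s) for w in words if any(w)), default=0)


if __name__ == "__main__":
    G = random_generator_matrix(2, 3, 3)
    C = codewords(G, 3)
    print(G, len(C), min_distance(C))
```

As a fixed (non-random) example, the generator matrix [[1, 1, 1], [0, 1, 2]] over q = 3 yields nine codewords of length 3 with minimum distance 2.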
20
Method of conditional probabilities Algorithm: Pick the entries of the generating matrix one by one. In each step minimize the expected number of collisions between code words.
21
C = [3,2,2]_3-RS (codeword table)
22
Reduction from error correction codes to group testing schemes
C = [3,2,2]_3-RS:
1: 0 0 0
2: 1 1 1
3: 2 2 2
4: 0 1 2
5: 1 2 0
6: 2 0 1
7: 0 2 1
8: 2 1 0
9: 1 0 2
GT scheme:
{1,4,7} {2,5,9} {3,6,8}
{1,6,9} {2,4,8} {3,5,7}
{1,5,8} {2,6,7} {3,4,9}
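The example suggests the following reading of the reduction: each person gets a codeword, and there is one test per (position, symbol) pair, containing everyone whose codeword has that symbol at that position. The Python sketch below is an assumption inferred from the example rather than a definition given on the slides; it reproduces the nine sets listed. The names `CODEWORDS` and `code_to_tests` are hypothetical.

```python
# Codewords of C = [3,2,2]_3-RS as listed on the slide, keyed by person number.
CODEWORDS = {
    1: (0, 0, 0), 2: (1, 1, 1), 3: (2, 2, 2),
    4: (0, 1, 2), 5: (1, 2, 0), 6: (2, 0, 1),
    7: (0, 2, 1), 8: (2, 1, 0), 9: (1, 0, 2),
}


def code_to_tests(codewords, q):
    """Build one test per (position, symbol): the people whose codeword matches there."""
    length = len(next(iter(codewords.values())))
    return [
        {person for person, word in codewords.items() if word[pos] == sym}
        for pos in range(length)
        for sym in range(q)
    ]


if __name__ == "__main__":
    for group in code_to_tests(CODEWORDS, 3):
        print(sorted(group))
```

The scheme size is (code length) × (alphabet size), here 3 × 3 = 9 tests for 9 people; the saving only shows up for longer codes with many more codewords.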
23
Why should it work?
Theorem: Let C be an Then F(C) is a group testing scheme for n people with up to sick people.
C = [3,2,2]_3-RS:
1: 0 0 0
2: 1 1 1
3: 2 2 2
4: 0 1 2
5: 1 2 0
6: 2 0 1
7: 0 2 1
8: 2 1 0
9: 1 0 2
GT scheme (up to 2 sick people):
{1,4,7} {2,5,9} {3,6,8}
{1,6,9} {2,4,8} {3,5,7}
{1,5,8} {2,6,7} {3,4,9}
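The "up to 2 sick people" claim for this small scheme can be checked exhaustively. The Python sketch below uses the simple decoder that clears anyone appearing in a negative test; the decoder choice and the names `TESTS`, `decode` and `works_for` are assumptions for illustration.

```python
from itertools import combinations

# The nine tests of the GT scheme listed on the slide.
TESTS = [
    {1, 4, 7}, {2, 5, 9}, {3, 6, 8},
    {1, 6, 9}, {2, 4, 8}, {3, 5, 7},
    {1, 5, 8}, {2, 6, 7}, {3, 4, 9},
]


def decode(tests, outcomes):
    """Report as sick everyone who never appears in a negative test."""
    people = set().union(*tests)
    cleared = set().union(*(g for g, positive in zip(tests, outcomes) if not positive))
    return people - cleared


def works_for(tests, max_sick):
    """Check that every sick set of size at most max_sick is recovered exactly."""
    people = sorted(set().union(*tests))
    for size in range(max_sick + 1):
        for sick in combinations(people, size):
            outcomes = [bool(group & set(sick)) for group in tests]
            if decode(tests, outcomes) != set(sick):
                return False
    return True


if __name__ == "__main__":
    print(works_for(TESTS, 2), works_for(TESTS, 3))
```

The check succeeds for 2 because two distinct codewords of this code agree in at most one of the three positions, so two sick people can cover at most two of a healthy person's three tests. With three sick people, for example {1, 2, 3}, every test is positive and nobody can be cleared.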
24
Why should it work? Proof
Figure: a codeword representing a healthy man, compared with the k codewords representing sick men.
25
Worst Case
Figure: a codeword representing a healthy man, compared with the k codewords representing sick men.
26
What did we get?
Scheme size table, with rows: Deterministic; Random and Heavy deterministic; Lower bound.
27
Applications: getting into details
Streaming, up to 1 mismatch:
– Assume we have a black box for searching for an exact match.
P: p_1 p_2 p_3 p_4 p_5 … p_m
P_{1,2}: p_1 p_3 p_5 … p_m
P_{2,2}: p_2 p_4 …
If both subpatterns fail to match, there is more than one mistake. The other way around isn't true.
28
Streaming: Up to 1 mismatch
P: p_1 p_2 p_3 p_4 p_5 … p_m
P_{1,2}: p_1 p_3 p_5 … p_m
P_{2,2}: p_2 p_4 …
P_{1,3}: p_1 p_4 … p_m
P_{2,3}: p_2 p_5 …
P_{3,3}: p_3 …
…
P_{q,q}
2*3*5*7*11*…*q > m
With CRT we will be able to find the position of the mismatch.
To support more mistakes, we will add the prime numbers group testing on top of that.
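The CRT step can be sketched in Python as follows. This is an offline illustration under stated assumptions, not the streaming algorithm: the exact-match black box is replaced by direct comparison, positions and residues are 0-based (the slides count from 1), and the window has the same length as the pattern. All names, including the `MANY` sentinel, are hypothetical.

```python
MANY = -1  # hypothetical sentinel: more than one mismatch was detected


def primes_with_product_above(m):
    """Return 2, 3, 5, ... up to the first prime q where 2*3*5*...*q > m."""
    primes, product, candidate = [], 1, 2
    while product <= m:
        if all(candidate % p for p in primes):
            primes.append(candidate)
            product *= candidate
        candidate += 1
    return primes


def crt(residues):
    """Find the smallest x >= 0 with x % q == r for every (r, q); the q are distinct primes."""
    x, modulus = 0, 1
    for r, q in residues:
        while x % q != r:
            x += modulus  # stepping by modulus keeps the earlier congruences intact
        modulus *= q
    return x


def locate_single_mismatch(window, pattern):
    """Return None on an exact match, the 0-based mismatch position, or MANY."""
    residues = []
    for q in primes_with_product_above(len(pattern)):
        # Subpattern for class r holds positions r, r+q, r+2q, ...; == stands in for the black box.
        failing = [r for r in range(q) if window[r::q] != pattern[r::q]]
        if len(failing) > 1:
            return MANY
        if failing:
            residues.append((failing[0], q))
    return crt(residues) if residues else None


if __name__ == "__main__":
    print(locate_single_mismatch("abcdefgXij", "abcdefghij"))
```

Because the product of the primes exceeds m, two different mismatch positions must land in different residue classes for at least one prime, so with the full set of primes a second mismatch is always detected.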