The beauty of prime numbers vs the beauty of the random
Ely Porat, Bar-Ilan University, Israel

Outline
– Applications
– Prime numbers group testing
– De-randomized approach for group testing
– Applications: getting into details
– Length reduction

Pattern Matching
Given a text T and a pattern P, the problem is to find all substrings of T that are equal to P.
[Figure: example text T and pattern P]
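As a reference point for the streaming variants that follow, here is a minimal sketch of the offline problem as defined above: a naive O(nm) scan that keeps both T and P in memory. The function name `find_occurrences` and the 0-based positions are illustrative choices, not taken from the slides.

```python
def find_occurrences(text, pattern):
    """Return every 0-based start i such that text[i:i+m] == pattern.

    Naive O(nm) scan that needs all of T and P in memory, which is what the
    streaming model rules out.
    """
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


print(find_occurrences("abababa", "aba"))  # [0, 2, 4]
```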

Streaming Model
The characters of T arrive one by one, and we cannot save T. Our goal is to do the matching without saving P either, keeping only Φ(P).
[Figure: text T and pattern P]
Automata?
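One common choice for a small summary Φ(P) is a Karp-Rabin fingerprint; the slide does not say which Φ is used, so treat this as an assumption. The sketch below keeps only Φ(P) and m for the pattern, but it still buffers the last m text characters in a deque, so it uses O(m) space and is not the small-space streaming algorithm the talk refers to. The modulus `MOD` and the random `base` are assumed parameters; matches are reported correctly with high probability over the choice of base.

```python
import random
from collections import deque

MOD = (1 << 61) - 1  # Mersenne prime modulus (an assumption for this sketch)


def fingerprint(s, base):
    """Karp-Rabin fingerprint: sum of ord(s[i]) * base^(len(s)-1-i) mod MOD."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % MOD
    return h


def stream_matches(stream, phi_p, m, base):
    """Yield 0-based end positions where the last m streamed characters have fingerprint phi_p.

    Only phi_p and m describe the pattern. The deque still buffers m text
    characters, so this is O(m) space, not a small-space streaming algorithm.
    """
    top = pow(base, m - 1, MOD)  # weight of the oldest character in the window
    window = deque()
    h = 0
    for pos, ch in enumerate(stream):
        if len(window) == m:
            h = (h - ord(window.popleft()) * top) % MOD  # drop the oldest character
        window.append(ch)
        h = (h * base + ord(ch)) % MOD  # shift in the new character
        if len(window) == m and h == phi_p:
            yield pos


base = random.randrange(1 << 20, MOD)  # random base makes fingerprint collisions unlikely
phi = fingerprint("aba", base)  # the pattern itself is not needed after this line
print(list(stream_matches(iter("abababa"), phi, 3, base)))
```

The window is compared only through its fingerprint, so the text characters are never compared against P directly.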

Hamming distance with wildcards
Find a pattern in a text with two complications:
– Don't cares (wildcards, Ø)
– Mismatches
[Figure: example text and pattern]
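To make the two complications concrete, a naive O(nm) sketch: a wildcard on either side matches anything, and every alignment with at most k remaining mismatches is reported. The `?` character standing in for the slide's Ø symbol and the function names are assumptions for illustration.

```python
WILDCARD = "?"  # stands in for the slide's don't-care symbol (an assumption)


def mismatches(a, b):
    """Hamming distance between equal-length strings, where a wildcard matches anything."""
    return sum(1 for x, y in zip(a, b) if x != y and WILDCARD not in (x, y))


def k_mismatch_with_wildcards(text, pattern, k):
    """Return (i, distance) for every alignment i with at most k mismatches (naive O(nm))."""
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        d = mismatches(text[i:i + m], pattern)
        if d <= k:
            hits.append((i, d))
    return hits


print(k_mismatch_with_wildcards("abca?cab", "a?c", 1))
# [(0, 0), (2, 1), (3, 0), (4, 1)]
```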

Summary of results
Offline:
– Hamming distance with wildcards: $O(nk\log^2 m)$
Online pattern matching:
– Hamming distance
– Hamming distance with wildcards: $O(k\log^2 m)$
– Edit distance: $O(k\log m)$
Streaming:
– Exact match: $O(\log^2 m)$ space, $O(\log m)$ time
– Hamming distance: $O(k^3\log^5 m)$ space, $O(k^2\log^2 m)$ time

Open problem
Online convolution in $o(\log^2 m)$ time per symbol. Offline, it is done by FFT in $O(n\log m)$.
For $m = 5$, text $t_1 t_2 \dots t_n$ and pattern $p_1 \dots p_5$: the alignment at $t_1$ gives $t_1p_1 + t_2p_2 + \dots + t_5p_5$, the alignment at $t_2$ gives $t_2p_1 + t_3p_2 + \dots + t_6p_5$, and so on.
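The sums above can be sketched in two ways: directly in O(nm), and offline with a single FFT-based convolution of t with the reversed pattern. This pure-Python radix-2 FFT runs in O(n log n); the O(n log m) bound quoted on the slide comes from splitting the text into overlapping blocks of length about 2m, which this sketch does not do. Neither version addresses the online open problem. All function names are illustrative, and integer inputs are assumed so the FFT result can be rounded.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. The inverse is unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for j in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * j / n) * odd[j]
        out[j] = even[j] + w
        out[j + n // 2] = even[j] - w
    return out


def sliding_products_naive(t, p):
    """out[i] = t[i]*p[0] + t[i+1]*p[1] + ... + t[i+m-1]*p[m-1], in O(nm)."""
    m = len(p)
    return [sum(t[i + j] * p[j] for j in range(m)) for i in range(len(t) - m + 1)]


def sliding_products_fft(t, p):
    """Same values via one FFT-based convolution of t with reversed p (integer inputs)."""
    n, m = len(t), len(p)
    if m > n:
        return []
    size = 1
    while size < n + m - 1:
        size *= 2  # large enough that the cyclic convolution does not wrap around
    ft = fft([complex(x) for x in t] + [0j] * (size - n))
    fp = fft([complex(x) for x in reversed(p)] + [0j] * (size - m))
    conv = fft([x * y for x, y in zip(ft, fp)], invert=True)
    # conv[k] / size = sum_j t[k - (m - 1) + j] * p[j]; full windows are k = m-1 .. n-1
    return [round(conv[k].real / size) for k in range(m - 1, n)]


t, p = [1, 2, 3, 4, 5, 6], [1, 0, 2]
print(sliding_products_naive(t, p), sliding_products_fft(t, p))  # [7, 10, 13, 16] [7, 10, 13, 16]
```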

Problem Definition
$m$ people, at most $k$ of them are sick. Query: is someone in this set sick? Goal: identify the sick people using only a few tests. Non-adaptive: all tests are fixed before any result is seen.
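A small sketch of the non-adaptive setting: the tests are a fixed list of sets, each query answers "does this set contain a sick person?", and a simple decoder clears everyone who appears in a negative test. The decoder and the brute-force checker `identifies_up_to_k` are assumptions for illustration (the slide does not fix a decoder); people are numbered 0..n-1, with n playing the role of the slide's m.

```python
from itertools import combinations


def outcomes(tests, sick):
    """Results of the non-adaptive tests: a test is positive iff it contains a sick person."""
    return [bool(test & sick) for test in tests]


def decode(n, tests, results):
    """Anyone in a negative test is healthy; everyone never cleared is declared sick."""
    cleared = set()
    for test, positive in zip(tests, results):
        if not positive:
            cleared |= test
    return set(range(n)) - cleared


def identifies_up_to_k(n, tests, k):
    """Brute-force check that decoding recovers every sick set of size <= k."""
    return all(decode(n, tests, outcomes(tests, set(s))) == set(s)
               for size in range(k + 1)
               for s in combinations(range(n), size))


one_per_person = [{i} for i in range(5)]  # trivial scheme: one test per person
one_big_test = [set(range(5))]            # a single pooled test
print(identifies_up_to_k(5, one_per_person, 2))  # True
print(identifies_up_to_k(5, one_big_test, 1))    # False
```

The trivial scheme always works but uses as many tests as people; the point of the constructions that follow is to need far fewer.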

Motivations
– Syphilis, HIV [Dor43]
– Mapping genomes [BLC91, BBK+95, TJP00]
– Quality control in product testing [SG59]
– Searching files in storage systems [KS64]
– Sequential screening of experimental variables [Li62]
– Efficient contention resolution algorithms for multiple access communication [KS64, Wol85]
– Data compression [HL00]
– Software testing [BG02, CDFP97]
– DNA sequencing [PL94]
– Molecular biology [DH00, FKKM97, ND00, BBKT96]

Background
Same conditions:
– Deterministic [KS64]
– Random [KS64]
– Heavy deterministic [AMS06]
Lower bound:
– [CR96]
Relaxed conditions:
– Fully adaptive
– Two-staged group testing and selectors [CGR00, Kni95, BGV03, CMS01, BV03, BGV05]
– Optimal monotone encoding [AH08]
Similar problems:
– Inhibitors [FKKM97, Dam98, BV98, BGV03]
– Bayesian case [Kni95, BL02, BL03, A.J98, BGV03]
– Errors [BGV98]
DIMACS 2006
[Table: scheme size for deterministic, random and heavy deterministic constructions, and the lower bound]

Our Results
– Deterministic
– Size
– Fast construction
[Table: scheme size for deterministic, random and heavy deterministic constructions, and the lower bound]

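Prime Numbers Group Testing
For each prime $p_j$ and each residue $r$ modulo $p_j$, one test contains every person $y$ with $y \bmod p_j = r$.
[Figure: positions of the sick people]
Bad event: there exists a healthy $y$ such that every test containing $y$ is positive.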

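Prime Numbers Group Testing
Bad event: there exists a healthy $y$ such that, for every prime $p_j$, $y \bmod p_j = x_i \bmod p_j$ for some sick $x_i \in \{x_1, \dots, x_k\}$ (on the slide, a dot below each prime marks such an $x_i$).
Then there exists $x_i$ with $y \bmod p_{i_j} = x_i \bmod p_{i_j}$ for primes $p_{i_1}, \dots, p_{i_d}$ such that $p_{i_1}p_{i_2}\cdots p_{i_d} > n$ (by pigeonhole, when there are more than $k(d-1)$ primes and any $d$ of them multiply to more than $n$).
By the CRT, $x_i = y$, contradicting that $y$ is healthy.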

Prime Numbers Group Testing
This gives a group testing scheme of size $p_1 + p_2 + \dots + p_r$. By choosing good enough primes we get $O(k^2\log^2 m)$.
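A minimal sketch of the prime numbers scheme under one simple, untuned choice of primes (an assumption, not the "good enough primes" of the slide): take the smallest primes, let d be the number of them needed for the product to exceed n, and use k(d-1)+1 primes so that the pigeonhole argument above applies. People are numbered 0..n-1; the helper names are hypothetical.

```python
from itertools import combinations


def first_primes(count):
    """First `count` primes by trial division (fine for the small counts used here)."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def choose_primes(n, k):
    """Take the k*(d-1)+1 smallest primes, where d is the least count whose product exceeds n.

    Any d of the returned primes multiply to more than n, so by the CRT a healthy
    person agrees with each sick person modulo at most d-1 of them, and k*(d-1)+1
    primes leave at least one prime where no sick person shares its residue.
    This simple choice is not tuned for the smallest scheme size.
    """
    d, product = 0, 1
    while product <= n:
        d += 1
        product *= first_primes(d)[-1]
    return first_primes(k * (d - 1) + 1)


def build_tests(n, primes):
    """One test per (prime p, residue r): all people y in [0, n) with y % p == r."""
    return [{y for y in range(n) if y % p == r} for p in primes for r in range(p)]


def decode(n, tests, sick):
    """Everyone in a negative test is healthy; everyone else is reported sick."""
    cleared = set()
    for test in tests:
        if not test & sick:
            cleared |= test
    return set(range(n)) - cleared


n, k = 100, 2
primes = choose_primes(n, k)
tests = build_tests(n, primes)
print(primes, len(tests))  # [2, 3, 5, 7, 11, 13, 17] 58
print(all(decode(n, tests, set(s)) == set(s) for s in combinations(range(n), k)))  # True
```

The number of tests is exactly the sum of the chosen primes, matching the size $p_1 + \dots + p_r$ above.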

Randomized Group Testing
Just choose $O(k^2\log n)$ random sets of size $n/k$.
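A sketch of the randomized scheme: each test is a uniformly random set of n // k people. The constant `c` in the number of tests, ceil(c·k²·ln n), is an illustrative assumption, since the slide only gives the O(k² log n) order. With the decoder that clears everyone in a negative test, sick people are never cleared; healthy people may occasionally survive, with a probability that shrinks as c grows.

```python
import math
import random


def random_scheme(n, k, c=3, seed=None):
    """ceil(c * k^2 * ln n) tests, each a uniformly random set of n // k people.

    The constant c is an illustrative assumption; the slide only says O(k^2 log n).
    """
    rng = random.Random(seed)
    num_tests = math.ceil(c * k * k * math.log(n))
    size = max(1, n // k)
    return [set(rng.sample(range(n), size)) for _ in range(num_tests)]


def decode(n, tests, sick):
    """Clear everyone who appears in a negative test; everyone else is reported sick."""
    cleared = set()
    for test in tests:
        if not test & sick:
            cleared |= test
    return set(range(n)) - cleared


tests = random_scheme(1000, 3, seed=1)
sick = {17, 256, 731}
print(len(tests), decode(1000, tests, sick))
```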

Overall derandomization plan
1. Start from good random error correction codes.
2. Derandomize them with the method of conditional probabilities to get good deterministic (linear) error correction codes.
3. Apply the reduction from error correction codes to group testing schemes to get good group testing schemes.

Error correction codes
– Length of words: $m$
– Number of words: $q^{Rm}$
– Distance: $d$
– Rate: $R$
– Relative distance: $\delta = d/m$
– Linear code: $[m, Rm, d]_q$

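Good random linear error correction codes
GV bound: there exists a code with relative distance $\delta$ and rate $R \ge 1 - H_q(\delta) - \epsilon$ (for $0 \le \delta < 1 - 1/q$, any $\epsilon > 0$ and large enough $m$).
Linear codes: faster construction. Algorithm: pick the entries of the generating matrix uniformly and independently.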
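A sketch of the random linear construction: draw a k × m generator matrix over GF(q) with independent uniform entries, then measure its distance by brute force, using the fact that for a linear code the minimum distance equals the minimum weight of a nonzero codeword. Here q is assumed prime so arithmetic mod q is a field, and brute force limits this to tiny parameters; the names are illustrative.

```python
import itertools
import random


def random_generator_matrix(k, m, q, seed=None):
    """k x m generator matrix over GF(q), q prime, with uniform independent entries."""
    rng = random.Random(seed)
    return [[rng.randrange(q) for _ in range(m)] for _ in range(k)]


def encode(msg, G, q):
    """Codeword msg * G over GF(q)."""
    return tuple(sum(u * row[j] for u, row in zip(msg, G)) % q for j in range(len(G[0])))


def min_distance(G, q):
    """Brute force: for a linear code, the distance is the minimum weight of a nonzero codeword.

    A result of 0 means two messages share a codeword (G is rank-deficient).
    """
    k = len(G)
    return min(sum(1 for s in encode(msg, G, q) if s)
               for msg in itertools.product(range(q), repeat=k) if any(msg))


G = random_generator_matrix(k=4, m=12, q=2, seed=7)
print(min_distance(G, 2))
```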

Method of conditional probabilities
Algorithm: pick the entries of the generating matrix one by one. In each step, choose the value that minimizes the expected number of collisions between codewords.
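A small sketch of the method of conditional probabilities for a linear code, assuming q prime and tiny parameters (it enumerates all messages). A "collision" is taken to mean two codewords at distance less than d, which for a linear code happens exactly when some nonzero codeword has weight less than d. Unfixed entries (`None`) are treated as uniform, which makes each affected codeword symbol uniform and independent across columns. If the expected count starts below 1, fixing entries greedily keeps it below 1 and ends at 0, so the final code has distance at least d.

```python
import itertools


def prob_weight_below(probs, d):
    """P[sum of independent Bernoulli(p_j) < d], by dynamic programming over coordinates."""
    dist = [1.0]  # dist[w] = probability that the weight so far is w
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for w, pr in enumerate(dist):
            new[w] += pr * (1 - p)
            new[w + 1] += pr * p
        dist = new
    return sum(dist[:d])


def expected_bad(G, q, d):
    """Expected number of nonzero messages whose codeword has weight < d.

    G is a k x m matrix over GF(q), q prime; None marks an entry not chosen yet,
    treated as uniform over GF(q).
    """
    k, m = len(G), len(G[0])
    total = 0.0
    for msg in itertools.product(range(q), repeat=k):
        if not any(msg):
            continue
        probs = []
        for j in range(m):
            if any(u and G[i][j] is None for i, u in enumerate(msg)):
                probs.append((q - 1) / q)  # a free entry with nonzero coefficient: symbol is uniform
            else:
                s = sum(u * G[i][j] for i, u in enumerate(msg) if u) % q
                probs.append(1.0 if s else 0.0)
        total += prob_weight_below(probs, d)
    return total


def derandomized_code(k, m, q, d):
    """Fix the entries one by one, each to the value minimising the conditional expectation."""
    G = [[None] * m for _ in range(k)]
    for i in range(k):
        for j in range(m):
            best = None
            for v in range(q):
                G[i][j] = v
                e = expected_bad(G, q, d)
                if best is None or e < best[0]:
                    best = (e, v)
            G[i][j] = best[1]
    return G


def min_weight(G, q):
    """Minimum weight of a nonzero codeword (the minimum distance of the linear code)."""
    k, m = len(G), len(G[0])
    return min(sum(1 for j in range(m) if sum(u * G[i][j] for i, u in enumerate(msg)) % q)
               for msg in itertools.product(range(q), repeat=k) if any(msg))


k, m, q, d = 2, 8, 2, 3
print(expected_bad([[None] * m for _ in range(k)], q, d))  # 0.43359375 (= 111/256 < 1)
G = derandomized_code(k, m, q, d)
print(min_weight(G, q) >= d)  # True, since the expectation started below 1
```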

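Example: $C = [3,2,2]_3$ Reed-Solomon code, with codewords 1: 000, 2: 111, 3: 222, 4: 012, 5: 120, 6: 201, 7: 021, 8: 210, 9: 102.
Reduction from error correction codes to group testing schemes: for each coordinate $j$ and each symbol $s$, one test contains the people whose codeword has symbol $s$ at position $j$.
GT scheme: {1,4,7} {2,5,9} {3,6,8} {1,6,9} {2,4,8} {3,5,7} {1,5,8} {2,6,7} {3,4,9}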
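A sketch of the reduction F(C) for the $[3,2,2]_3$ Reed-Solomon example, assuming the code evaluates $a + bx$ at the points 0, 1, 2 over GF(3). Here person number 1 + a + 3b gets the codeword of message (a, b); with this numbering the printed tests equal the slide's GT scheme with people 8 and 9 swapped. The script also checks by brute force that every set of at most 2 sick people is recovered.

```python
from itertools import combinations

Q = 3               # alphabet GF(3)
POINTS = [0, 1, 2]  # evaluation points (an assumption), so the code length is m = 3


def rs_codeword(a, b):
    """Reed-Solomon codeword of the message (a, b): the polynomial a + b*x evaluated at POINTS."""
    return tuple((a + b * x) % Q for x in POINTS)


# Person number 1 + a + 3*b gets the codeword of message (a, b).
codewords = [rs_codeword(a, b) for b in range(Q) for a in range(Q)]


def code_to_tests(codewords, q):
    """F(C): for each coordinate j and symbol s, test the people whose codeword has s at j."""
    m = len(codewords[0])
    return [{person for person, c in enumerate(codewords, start=1) if c[j] == s}
            for j in range(m) for s in range(q)]


def decode(tests, sick, people):
    """Everyone in a negative test is healthy; everyone else is reported sick."""
    cleared = set()
    for test in tests:
        if not test & sick:
            cleared |= test
    return people - cleared


tests = code_to_tests(codewords, Q)
people = set(range(1, len(codewords) + 1))
print(tests)
print(all(decode(tests, set(s), people) == set(s)
          for r in range(3) for s in combinations(people, r)))  # True: up to 2 sick
```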

Why should it work?
Theorem: Let $C$ be an $[m, Rm, d]_q$ code. Then $F(C)$ is a group testing scheme for $n = q^{Rm}$ people with up to $k = \lfloor (m-1)/(m-d) \rfloor$ sick people.
For the $[3,2,2]_3$ Reed-Solomon example, this gives a scheme of 9 tests for 9 people with up to 2 sick people.

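Why should it work? Proof
[Figure: the codeword of a healthy man above the codewords of the $k$ sick men]
Each sick codeword agrees with the healthy codeword in at most $m - d$ coordinates, so the $k$ sick codewords agree with it in at most $k(m-d)$ coordinates. If $k(m-d) < m$, some coordinate $j$ is left over; the test for coordinate $j$ and the healthy man's symbol there contains no sick man, so it is negative and clears him.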

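Worst Case
[Figure: the codeword of a healthy man above the codewords of the $k$ sick men]
In the worst case the agreements with the $k$ sick codewords fall on disjoint coordinates, covering $k(m-d)$ of them; at least one coordinate remains as long as $k(m-d) \le m - 1$.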
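The counting behind the proof and the worst case can be written out as a short derivation. This is a sketch of the standard argument, assuming a code of length $m$ and minimum distance $d < m$, with one test per (coordinate, symbol) pair, so $mq$ tests in total.

```latex
\begin{align*}
&\text{Let } c_y \text{ be the codeword of a healthy person and } c_{x_1},\dots,c_{x_k} \text{ those of the sick people.}\\
&\bigl|\{\, j : c_y[j] = c_{x_i}[j] \,\}\bigr| \le m - d \quad \text{for each } i
  \quad (\text{distinct codewords differ in at least } d \text{ positions})\\
&\Rightarrow \bigl|\{\, j : c_y[j] = c_{x_i}[j] \text{ for some } i \,\}\bigr| \le k(m-d).\\
&\text{If } k(m-d) \le m-1, \text{ some coordinate } j \text{ is uncovered, so the test } (j,\, c_y[j]) \text{ is negative and clears } y.\\
&\Rightarrow k \le \left\lfloor \frac{m-1}{m-d} \right\rfloor,
  \qquad [3,2,2]_3:\ \left\lfloor \frac{3-1}{3-2} \right\rfloor = 2 .
\end{align*}
```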

What we got?
[Table: scheme size for deterministic, random and heavy deterministic constructions, and the lower bound]

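Applications: getting into details
Streaming, up to 1 mismatch:
– Assume we have a black box for exact-match search.
– Split $P = p_1 p_2 p_3 p_4 p_5 \dots p_m$ into $P_{1,2} = p_1 p_3 p_5 \dots p_m$ and $P_{2,2} = p_2 p_4 \dots$, and search for each with the black box.
– If both $P_{1,2}$ and $P_{2,2}$ fail to match, there is more than one mismatch.
– The other way around isn't true: two mismatches in the same subpattern leave the other subpattern matching.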
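The parity split can be sketched as follows, with a direct comparison of subsequences standing in for the exact-match black box (an assumption of this sketch). Positions are 0-based here, so the slide's $P_{1,2}$ corresponds to residue 0 and $P_{2,2}$ to residue 1. The second example shows the converse failing: two mismatches at odd positions make only one class fail.

```python
def failing_classes(window, pattern, q):
    """0-based residues r whose subsequences window[r::q] and pattern[r::q] differ.

    Each comparison plays the role of the exact-match black box on one subpattern.
    """
    return [r for r in range(q) if window[r::q] != pattern[r::q]]


pattern = "abcdefgh"
print(failing_classes("aXcdefYh", pattern, 2))  # [0, 1]: both classes fail, so at least two mismatches
print(failing_classes("aXcdefgY", pattern, 2))  # [1]: two mismatches, but only one class fails
```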

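Streaming: up to 1 mismatch
Split $P$ further for each prime: $P_{1,2} = p_1 p_3 p_5 \dots p_m$, $P_{2,2} = p_2 p_4 \dots$, $P_{1,3} = p_1 p_4 \dots p_m$, $P_{2,3} = p_2 p_5 \dots$, $P_{3,3} = p_3 \dots$, and so on up to $P_{q,q}$, where $2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdots q > m$.
With exactly one mismatch, for each prime exactly one subpattern fails, which gives the position of the mismatch modulo that prime; with the CRT we can find the position of the mismatch.
To support more mismatches, we build on this with the prime numbers group testing.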
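A sketch of locating a single mismatch with the CRT, again with direct subsequence comparison standing in for the exact-match black box and 0-based residues. It returns `None` when there is no mismatch or more than one: two distinct positions below m cannot agree modulo every prime whose product exceeds m, so some prime then has two failing classes. The helper names are illustrative; `pow(M, -1, p)` needs Python 3.8 or later.

```python
from math import prod


def primes_with_product_above(m):
    """Smallest primes 2, 3, 5, ... until their product exceeds m."""
    primes, candidate = [], 2
    while prod(primes) <= m:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def crt(residues, moduli):
    """Smallest x >= 0 with x % p == r for each (r, p); moduli pairwise coprime."""
    x, M = 0, 1
    for r, p in zip(residues, moduli):
        t = ((r - x) * pow(M, -1, p)) % p  # choose t so that x + M*t = r (mod p)
        x += M * t
        M *= p
    return x


def locate_single_mismatch(window, pattern):
    """0-based position of the unique mismatch, or None if there are zero or several."""
    primes = primes_with_product_above(len(pattern))
    residues = []
    for p in primes:
        failing = [r for r in range(p) if window[r::p] != pattern[r::p]]
        if len(failing) != 1:
            return None
        residues.append(failing[0])
    return crt(residues, primes)


print(locate_single_mismatch("abcdefgXij", "abcdefghij"))  # 7
print(locate_single_mismatch("aXcdefgXij", "abcdefghij"))  # None
```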