Lower Bounds for Data Structures
Mihai Pătrașcu
2nd Barriers Workshop, Aug. 29 '10

Model of Computation (word RAM)
Word = w-bit integer; memory = array of S words
Unit-time operations on words: random access to memory (a ↦ Mem[a]); +, -, *, /, %, <, ==, >, ^, &, |, ~, shifts
Word size: w = Ω(lg S)
Internal state: O(w) bits
Hardware: NC¹

Cell-Probe Model
Cell = w bits; memory = array of S cells
CPU: state of O(w) bits; may apply any function to the state [non-uniform!]; reads/writes a memory cell (a ↦ Mem[a]) in O(1) time
Hardware: anything

Classic Results
[Yao FOCS'78] (kind of) defines the model; membership with low space
[Ajtai '88] static lower bound: predecessor search
[Fredman, Saks STOC'89] dynamic lower bounds: partial sums, union-find
Have we broken barriers?

Dynamic Lower Bounds
Toy problem (n = # nodes, w = O(lg n), B = branching factor):
– update: set node v to {0,1}
– query: xor of a root–leaf path
[Fredman, Saks STOC'89] Any data structure with update time t_u = lg^O(1) n requires query time t_q = Ω(lg n / lglg n)

Hard Instance
[figure: updates laid out on a timeline, partitioned into "epochs" that grow geometrically going back from the query]

Proof Overview
Epoch k consists of B^k updates. Updates before epoch k are irrelevant to it; the epochs after it (its "future") write only O(B^(k-1) · t_u · lg n) bits in total.
Let B ≫ t_u lg n, so B^k ≫ B^(k-1) · t_u · lg n.
Claim. (∀) k, Pr[query reads something from epoch k] ≥ 0.1 ⇒ E[t_q] = Ω(log_B n)

Formal Proof
Claim. (∀) k, Pr[query reads something from epoch k] ≥ 0.1
Proof: Assume not. We encode N = B^k random bits with < N bits on average, a contradiction.

Formal Proof
[figure: timeline; epoch k's updates carry the N bits to be encoded; everything else, the earlier and later updates and the N queries, is public coins]

Formal Proof
Claim. (∀) k, Pr[query reads something from epoch k] ≥ 0.1
Proof: Assume not. Encode N = B^k random bits with < N bits on average.
Public coins: past updates, future updates, the N queries.
Equivalent task: encode the query answers.
Assumption ⇒ 90% of queries can be run ignoring epoch k.

Formal Proof
[figure: the encoder sends o(N) bits, namely what the future epochs wrote plus which queries read from epoch k; the decoder replays everything else from public coins]

Formal Proof
Claim. (∀) k, Pr[query reads something from epoch k] ≥ 0.1
Proof: Assume not. Encode N = B^k random bits with < N bits on average.
Public coins: past updates, future updates, the N queries.
Assumption ⇒ 90% of queries can be run ignoring epoch k.
Encoding:
– what future epochs wrote: o(N) bits
– which queries read from epoch k: lg (N choose N/10) ≪ N bits □

Applications
Partial sums: maintain an array A[1..n] under
– update(i, Δ): A[i] = Δ
– sum(i): return A[1] + … + A[i]
Incremental connectivity (union-find): maintain a graph under
– link(u,v): add an edge
– query(u,v): are u and v connected?
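As a reference point (not in the slides), the partial-sums lower bound is nearly matched by the classic Fenwick / binary-indexed tree, with t_u = t_q = O(lg n), just above the Ω(lg n / lglg n) bound. A minimal Python sketch:

```python
class Fenwick:
    """Binary indexed tree: partial sums with O(lg n) update and sum."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed

    def update(self, i, delta):
        """Add delta to A[i]; the slide's 'set A[i] = Δ' is simulated
        by adding Δ minus the old value."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)  # jump to the next node covering position i

    def sum(self, i):
        """Return A[1] + ... + A[i]."""
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)  # strip the lowest set bit
        return s
```

For example, after `update(3, 5)` and `update(1, 2)`, `sum(3)` returns 7.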

Fancy Application: Marked Ancestor
[Alstrup, Husfeldt, Rauhe FOCS'98]
– mark(v) / unmark(v)
– query(v): any marked ancestor?
Only mark a node with probability ≈ 1/lg n.
[figure: versions 1, 2, …, lg n over time] The query needs a cell that might have been written in another version!

Fancy Application: Buffer Trees
External memory: cell size w = B·lg n
Dictionary problem: t_u = t_q = O(1)
Buffer trees: t_u = O(λ/B) ≪ 1, t_q = O(log_λ n)
[Verbin, Zhang STOC'10] [Iacono, Pătraşcu '11] If t_u = O(λ/B) ≤ 0.99, then t_q = Ω(log_λ n)

Fancy Application: Buffer Trees
[Verbin, Zhang STOC'10] [Iacono, Pătraşcu '11] If t_u = O(λ/B) ≤ 0.99, then t_q = Ω(log_λ n)
Queries = { N elements from epoch k } ∪ { N random elements }. Which is which? It takes 2N bits to tell…
If the true queries read from epoch k and the false queries don't, we can distinguish the two sets.
So: a random false query reads from the epoch.

Higher Bounds
[Fredman, Saks STOC'89] [Alstrup, Husfeldt, Rauhe FOCS'98] t_q = Ω(lg n / lg t_u)
[chart: t_q vs. t_u trade-off; max{t_u, t_q} = Ω(lg n/lglg n); axes range over lg n/lglg n, lg n, n^ε, n^(1-o(1))]

Higher Bounds
[Fredman, Saks STOC'89] [Alstrup, Husfeldt, Rauhe FOCS'98] t_q = Ω(lg n / lg t_u)
[Pătraşcu, Thorup '11] t_u = o(lg n / lglg n) ⇒ t_q ≥ n^(1-o(1)) … for incremental connectivity
"Don't rush into a union. Take time to find your roots!"
[chart: t_q vs. t_u trade-off]
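For reference (not in the slides), the upper-bound side of incremental connectivity is the textbook disjoint-set forest, with near-constant amortized time per link and query once you "take time to find your roots". A minimal Python sketch:

```python
class UnionFind:
    """Disjoint-set forest with union by rank and path halving."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, u):
        """Return the root of u's component, compressing the path."""
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]  # path halving
            u = self.parent[u]
        return u

    def link(self, u, v):
        """Add edge (u, v): union the two components by rank."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1

    def connected(self, u, v):
        """query(u, v): are u and v in the same component?"""
        return self.find(u) == self.find(v)
```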

Hard Instance
[figure: a timeline of updates, each setting a random permutation π between consecutive levels of a tree, followed by a query]

Hard Instance (cont.)
Let M = "width of edges" (M = n^(1-ε)). Operations:
– macro-update (setting one π) ↦ M edge inserts
– macro-query: color root and leaf with C colors (C = n^ε) ↦ M edge inserts, then test consistency of the coloring ↦ C² connectivity queries

The Lower Bound
M = width of edges = n^(1-ε); C = # colors = n^ε
Operations: macro-update = M insertions; macro-query = M insertions + C² queries
Theorem: Let B ≫ t_u. The query needs to read Ω(M) cells from each epoch, in expectation.
So M·t_u + C²·t_q ≥ M · lg n / lg t_u. If t_u = o(lg n/lglg n), then t_q ≥ M/C² = n^(1-3ε).

Hardness for One Epoch
[figure: an epoch of B^i · M updates, followed by B^i queries; later epochs write only O(B^(i-1) · M · t_u) cells]

Communication
[figure: the updates outside the epoch are fixed by public coins; the O(B^(i-1) · M · t_u) cells written after the epoch form the first message]
Alice: B^i permutations on [M] (π_1, π_2, …)
Bob: for each π_i, a coloring of inputs & outputs with C colors
Goal: test if all colorings are consistent

Communication (cont.)
Alice: B^i permutations on [M] (π_1, π_2, …)
Bob: for each π_i, a coloring of inputs & outputs with C colors
Goal: test if all colorings are consistent
Lower bound: Ω(B^i · M · lg M) [the highest possible]
Upper bound: Alice & Bob simulate the data structure. The queries perform O(B^i · M · t_q) cell probes, and we spend O(lg n) bits per probe (address + contents).

Nondeterminism
W = { cells written by epoch i }, R = { cells read by the B^i queries }
The prover sends:
– address and contents of W ∩ R: cost |W ∩ R| · O(lg n) bits
– a separator between W \ R and R \ W: cost O(|W| + |R|) bits
Lower bound: Ω(B^i · M · lg M). Since |W|, |R| = B^i · M · O(lg n/lglg n), the separator is negligible. So |W ∩ R| = Ω(B^i · M). □

Higher Bounds
[Fredman, Saks STOC'89] [Alstrup, Husfeldt, Rauhe FOCS'98] t_q = Ω(lg n / lg t_u)
[Pătraşcu, Thorup '11] t_u = o(lg n / lglg n) ⇒ t_q ≥ n^(1-o(1))
[chart: t_q vs. t_u trade-off]

Higher Bounds
[Fredman, Saks STOC'89] [Alstrup, Husfeldt, Rauhe FOCS'98] t_q = Ω(lg n / lg t_u)
[Pătraşcu, Thorup '11] t_u = o(lg n / lglg n) ⇒ t_q ≥ n^(1-o(1))
[Pătraşcu, Demaine STOC'04] t_q = Ω(lg n / lg (t_u/lg n)); also t_u = o(lg n) ⇒ t_q ≥ n^(1-o(1))
[chart: the max{t_u, t_q} bound improves to Ω(lg n)]

Partial sums: maintain an array A[n] under
– update(i, Δ): A[i] = Δ
– sum(i): return A[0] + … + A[i]
The hard instance: π = a random permutation; for t = 1 to n: query sum(π(t)); then Δ_t = rand(), update(π(t), Δ_t)
[figure: the interleaved queries and updates Δ_1, …, Δ_n on a timeline]
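The loop above, spelled out against a naive array just to make the access pattern concrete (a sketch for intuition only; the name `hard_instance` and the 16-bit range for Δ are illustrative choices, not from the talk):

```python
import random

def hard_instance(n, seed=0):
    """Run the hard instance against a naive O(1)-update, O(n)-query
    array; returns the sequence of query answers."""
    rng = random.Random(seed)
    pi = list(range(n))
    rng.shuffle(pi)                    # pi = random permutation
    A = [0] * n
    answers = []
    for t in range(n):
        i = pi[t]
        answers.append(sum(A[:i + 1]))  # sum(pi(t)) = A[0] + ... + A[i]
        A[i] = rng.randrange(2 ** 16)   # update(pi(t), Delta_t)
    return answers
```

The first query always answers 0, since it runs before any update.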

[figure: the timeline of updates Δ_1, …, Δ_n split into two periods, t = 5,…,8 and t = 9,…,12]
Communication between the periods = 2w · # memory cells written during t = 5,…,8 and read during t = 9,…,12.
("How can Mac help PC run?")

Lower bound on entropy?
[figure: the queries in the second period return prefix sums such as Δ_1+Δ_5+Δ_3, then Δ_1+Δ_5+Δ_3+Δ_7+Δ_2, …, which depend on Δ's written in the first period]

The general principle
Lower bound = # of "down arrows", i.e. reads in one period of cells written in the other.
Over k operations, E[# down arrows] = Ω(k).

Recap
Communication = # memory locations written during the beige period and read during the mauve period.
Communication between periods of k items = Ω(k).

Putting it all together
Build a balanced binary hierarchy of time periods: the top pairing gives Ω(n/2), the next level 2 · Ω(n/4), then 4 · Ω(n/8), …, for a total of Ω(n lg n).
Every memory read is counted exactly once, at lowest_common_ancestor(write time, read time).

Dynamic Lower Bounds
[Fredman, Saks STOC'89] [Alstrup, Husfeldt, Rauhe FOCS'98] t_q = Ω(lg n / lg t_u)
[Pătraşcu, Demaine STOC'04] t_q = Ω(lg n / lg (t_u/lg n))
[Pătraşcu, Thorup '11] t_u = o(lg n) ⇒ t_q ≥ n^(1-o(1))
Some hope: max{t_u, t_q} = Ω*(lg² n)?
[chart: t_q vs. t_u trade-off]

Dynamic Lower Bounds
[Fredman, Saks STOC'89] [Alstrup, Husfeldt, Rauhe FOCS'98] t_q = Ω(lg n / lg t_u)
[Pătraşcu, Demaine STOC'04] t_q = Ω(lg n / lg (t_u/lg n))
[Pătraşcu, Thorup '11] t_u = o(lg n) ⇒ t_q ≥ n^(1-o(1))
[Pătraşcu STOC'10] NOF conjecture ⇒ max{t_u, t_q} = Ω(n^ε); 3SUM conjecture ⇒ RAM lower bound
3SUM: S = {n numbers}; (∃) x, y, z ∈ S with x+y+z = 0? Conjecture: requires Ω*(n²) on a RAM.
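For concreteness, here is the conjectured-optimal quadratic algorithm for 3SUM (sort plus two pointers), as a hedged sketch; the conjecture says no RAM algorithm beats this by a polynomial factor:

```python
def three_sum(S):
    """Decide whether three entries of S (at distinct indices) sum to 0.
    Sort + two pointers: O(n^2) time after an O(n lg n) sort."""
    a = sorted(S)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return True
            if s < 0:
                lo += 1    # total too small: raise the middle element
            else:
                hi -= 1    # total too large: lower the top element
    return False
```

For example, `three_sum([-5, 2, 3, 7])` is True because -5 + 2 + 3 = 0.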

The Multiphase Problem
Phase I: S_1, …, S_k ⊆ [u] arrive; preprocessing time O(k·u·X)
Phase II: T ⊆ [u] arrives; time O(u·X)
Phase III: query i; answer "S_i ∩ T = ∅?" in time O(X)
Conjecture: if u·X ≪ k, we must have X = Ω(u^ε)
⇒ reachability in dynamic graphs requires Ω(n^ε)
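The trivial end of the multiphase trade-off, as a sketch (the function names are illustrative, not from the talk): storing each S_i as a u-bit bitset makes a query one AND of u-bit words, i.e. X = O(u/w) word operations. The conjecture says that when u·X ≪ k, no preprocessing pushes X down to u^o(1).

```python
def multiphase(sets):
    """Phase I: encode each S_i as a u-bit bitset (a Python int)."""
    bitsets = [sum(1 << x for x in S) for S in sets]

    def phase2(T):
        """Phase II: encode T the same way."""
        t_bits = sum(1 << x for x in T)

        def query(i):
            """Phase III: is S_i ∩ T nonempty? One bitwise AND."""
            return bitsets[i] & t_bits != 0

        return query

    return phase2
```

Usage: `multiphase([{1, 2}, {3}])({2})(0)` asks whether S_0 = {1, 2} meets T = {2}.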

3-Party, Number-on-Forehead
[figure: the multiphase problem as a 3-party NOF game, with S_1, …, S_k, T, and the index i on the players' foreheads]
Phase I: time O(k·u·X); Phase II: time O(u·X); Phase III: answer S_i ∩ T = ∅? in time O(X)

Dynamic Lower Bounds
[chart: t_q vs. t_u trade-off; the region proved so far ("Now", starting from [Fredman, Saks '89]) vs. the conjectured polynomial bounds ("Future?")]

Classic Results
[Yao FOCS'78] (kind of) defines the model; membership with low space
[Ajtai '88] static lower bound: predecessor search
[Fredman, Saks STOC'89] dynamic lower bounds: partial sums, union-find
Have we broken barriers?

Communication Complexity → Data Structures
Alice (the CPU) has an O(w)-bit input; Bob (the memory) has an n-bit input.
Each cell probe: Alice sends lg S bits (an address), Bob replies with w bits (the contents).
This is asymmetric communication complexity.

Tools in Asymmetric C.C.
Round elimination: [Ajtai '88] [Miltersen, Nisan, Safra, Wigderson STOC'95] [Sen, Venkatesh '03]; also message compression [Chakrabarti, Regev FOCS'04]
Richness: [Miltersen, Nisan, Safra, Wigderson STOC'95]
Lopsided set disjointness (via information complexity): [Pătraşcu FOCS'08]

Round Elimination

Setup: Alice has an input vector (x_1, …, x_k); Bob has inputs y and i ∈ [k], and sees x_1, …, x_(i-1). Output: f(x_i, y). Call this game f^(k).
If Alice sends a first message of m ≪ k bits, we can fix i (plus an important technicality) and eliminate round 1.
Now: Alice has input x_i, Bob has input y, and the output is f(x_i, y), i.e. the game f.
[figure: Bob ("I want to talk to Alice") points at coordinate i of Alice's o(k)-bit message]

Predecessor Search
pred(q, S) = max { x ∈ S | x ≤ q }
[van Emde Boas FOCS'75]
if ⌊q/√u⌋ ∈ hash table: return pred(q mod √u, bottom structure)
else: return pred(⌊q/√u⌋, top structure)
Space: O(n). Query: O(lg lg u) = O(lg w)
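The recursion above, sketched in Python with dictionaries playing the hash tables. This is a hedged clarity sketch, not the talk's construction: it stores bucket minima and maxima so a query recurses into only one side, but it does not do the bookkeeping needed for O(n) space.

```python
def make_pred(S, u):
    """van Emde Boas-style predecessor over universe [0, u):
    split q into hi = q // b and lo = q % b with b ~ sqrt(u)."""
    S = sorted(set(S))
    if u <= 2 or len(S) <= 1:          # base case: linear scan
        def pred(q):
            best = None
            for x in S:
                if x <= q:
                    best = x
            return best
        return pred
    b = 1 << max(1, u.bit_length() // 2)   # b ~ sqrt(u), a power of two
    buckets = {}                           # hi-part -> keys with that hi-part
    for x in S:
        buckets.setdefault(x // b, []).append(x)
    bottom = {h: make_pred([x % b for x in xs], b) for h, xs in buckets.items()}
    bmin = {h: min(xs) for h, xs in buckets.items()}
    bmax = {h: max(xs) for h, xs in buckets.items()}
    top = make_pred(list(buckets), u // b + 1)   # predecessor among hi-parts

    def pred(q):
        if q < 0:
            return None
        h = q // b
        if h in buckets and bmin[h] <= q:   # answer lies inside bucket h
            return h * b + bottom[h](q % b)
        ph = top(h - 1)                     # largest nonempty bucket below h
        return None if ph is None else bmax[ph]
    return pred
```

For example, with S = {3, 17, 40, 1000} and u = 2048, `pred(999)` returns 40.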

Round Elimination ↦ Predecessor
[Ajtai '88] [Miltersen STOC'94] [Miltersen, Nisan, Safra, Wigderson STOC'95] [Beame, Fich STOC'99] [Sen, Venkatesh '03]
Alice: q = (q_1, q_2, …, q_k). Bob: i ∈ [k], (q_1, …, q_(i-1)), S. Goal: pred(q_i, S)
Reduction to pred(q, T): T = { (q_1, …, q_(i-1), x, 0, 0, …) | (∀) x ∈ S }
Space O(n) ⇒ set k = O(lg n) ⇒ lower bound: Ω(log_(lg n) w)

Richness Lower Bounds
Prove: "either Alice sends A bits or Bob sends B bits."
Assume Alice sends o(A) and Bob sends o(B) ⇒ there is a big rectangle (a 1/2^o(A) fraction of Alice's inputs × a 1/2^o(B) fraction of Bob's) on which the output is ≈ 1, i.e. monochromatic.
Then show that any big rectangle is bichromatic.
E.g. Alice has q ∈ {0,1,*}^d, Bob has S = n points in {0,1}^d. Goal: does the query match anything?
[Pătraşcu FOCS'08] A = Ω(d), B = Ω(n^(1-ε)) ⇒ t_q ≥ min { d/lg S, n^(1-ε)/w }

Richness Lower Bounds
E.g. Alice has q ∈ {0,1,*}^d, Bob has S = n points in {0,1}^d. Goal: does the query match anything?
[Pătraşcu FOCS'08] A = Ω(d), B = Ω(n^(1-ε)) ⇒ t_q ≥ min { d/lg S, n^(1-ε)/w }
What does this really mean? An "optimal space lower bound for constant query time":
– lower bound: S = 2^Ω(d/t_q)
– this ≈ matches the upper bounds: either exponential space (S = 2^Θ(d), t_q = Θ(d/lg n)) or near-linear query time (t_q = n^(1-o(1)), S = Θ(n))
Also: an optimal lower bound for decision trees.

Results
Partial match (database of n strings in {0,1}^d, query ∈ {0,1,*}^d):
– [Borodin, Ostrovsky, Rabani STOC'99] [Jayram, Khot, Kumar, Rabani STOC'03] A = Ω(d/lg n)
– [Pătraşcu FOCS'08] A = Ω(d)
Nearest neighbor on the hypercube (ℓ_1, ℓ_2):
– deterministic γ-approximate: [Liu '04] A = Ω(d/γ²)
– randomized exact: [Barkol, Rabani STOC'00] A = Ω(d)
– randomized (1+ε)-approximate: [Andoni, Indyk, Pătraşcu FOCS'06] A = Ω(ε⁻² lg n): "Johnson-Lindenstrauss space is optimal!"
Approximate nearest neighbor in ℓ_∞: [Andoni, Croitoru, Pătrașcu FOCS'08] "[Indyk FOCS'98] is optimal!"

The Barrier
Each probe costs Alice lg S bits (address) and Bob w bits (contents).
No separation between S = O(n) and S = n^O(1)!

Predecessor Search
[Pătrașcu, Thorup STOC'06] For w = (1+ε) lg n and space O(n), predecessor takes Ω(lglg n). Separation: O(n) space vs. n^(1+ε).
Claim: the 1st cell probe can be restricted to a set of O(√n) cells.
[figure: hard instance built by recursion on the universe segments [0, √u), [√u, 2√u), …]

Restricting the 1st Cell Probe
Let S_k be the set of cells that can be probed first when the query lies in segment k.
If (∃) k with |S_k| ≤ √n: place the query and the data set in segment k; the 1st memory access f(lo(q)) lands in the small set S_k.
[figure: universe segments with, e.g., S_0 = {M_1, M_5} and S_√u = {M_3, M_8}]

Restricting the 1st Cell Probe
Otherwise (∀) k: |S_k| ≥ √n. Choose T = { O(√n · lg n) cells } so that every S_k is hit.
The 1st memory access is f(hi(q), lo(q)). To make lo(q) irrelevant, fix it so that f(hi(q), ·) ∈ T, and recurse on the hi part.

What Did We Prove?
If there exists a solution to Pred(n, u) with
– space complexity O(n)
– query complexity t memory reads,
then there exists a solution to Pred(n, √u) with
– space complexity O(n)
– O(√n · lg n) "published cells", which can be read free of charge
– query complexity t - 1 memory reads.

Dealing with Public Bits
Hardness came from one "secret" bit; in the 2nd round there are O(√n · lg² n) published bits.
Direct sum: Pred(n, u) = k × Pred(n/k, u/k).
For k ≫ √n · lg² n, most sub-problems are still hard even with O(√n · lg² n) public bits.

New Induction Plan
Main Lemma: fix the algorithm to read from a set of (nk)^½ cells.
"Proof": [figure: k sub-problems side by side, each with its own sequence of reads]
– problem j is nice if (∃) α: |S_α^j| ≤ (n/k)^½ ⇒ fix the hi-part in problem j to α; over all j, this is at most k·(n/k)^½ = (nk)^½ cells
– problem j is not nice if (∀) α: |S_α^j| > (n/k)^½ ⇒ choose T to hit all such S_α^j; |T| ≈ n / (n/k)^½ = (nk)^½
Iterating: k = 1 ↦ k ≈ √n ↦ k ≈ n^(3/4) ↦ …, giving Ω(lglg n) rounds.

Main Lemma: "Proof" ↦ Proof
Main Lemma: fix the algorithm to read from a set of (nk)^½ cells.
– problem j is nice if (∃) α: |S_α^j| ≤ (n/k)^½ ⇒ fix the hi-part in problem j to α
– problem j is not nice if (∀) α: |S_α^j| > (n/k)^½ ⇒ choose T to hit all such S_α^j
But: the published bits are a function of the database, and the 1st cell read by a query is a function of the published bits.

Main Lemma: "Proof" ↦ Proof
New claim. We can publish (nk)^½ cells such that Pr[a random query reads a published cell] ≥ 1/100.
Induction: if the initial query time is < (lglg n)/100, then at the end E[query time] < 0, a contradiction.
Proof: Publish a random sample T = { (nk)^½ cells }. For each problem j where lo(q) is relevant (hi fixed to α), also publish S_α^j, but only if |S_α^j| ≤ (n/k)^½.
But why does this work?

An Encoding Argument
Assume Pr[a random query reads a published cell] < 1/100. Use the data structure to encode A[1..k] ∈ {0,1}^k with < k bits:
1. Choose one random query per subproblem: q_1, q_2, …, q_k.
2. Choose a random database: A[j] = 0 ⇒ lo(q_j) is relevant in problem j; A[j] = 1 ⇒ hi(q_j) is relevant in problem j.
3. Encode the published bits: o(k) bits.
4. The decoder classifies queries: when |S_(hi(q_j))^j| ≤ (n/k)^½, the query reads a published cell iff A[j] = 0; when |S_(hi(q_j))^j| > (n/k)^½, it reads a published cell iff A[j] = 1.
5. By assumption, this classification is correct for 99% of the k queries in expectation, so the decoder learns 99% of A[1..k]. □

Beyond Communication?
CPU → memory communication:
– one query: lg S bits
– k queries: lg (S choose k) = Θ(k · lg (S/k)) bits

Direct Sum
[figure: k independent sub-problems sharing one memory of S cells]
CPU → memory communication: one query costs lg S; k queries cost lg (S choose k) = Θ(k · lg (S/k)).
[Pătrașcu, Thorup FOCS'06] Any richness lower bound "Alice must send A or Bob must send B" ⇒ k×Alice must send k·A or k×Bob must send k·B.
Old: t_q = Ω(A/lg S). New: t_q = Ω(A/lg(S/k)).
Set k = n/lg^O(1) n ⇒ Ω(lg n/lglg n) time for space n·lg^O(1) n.

Ω(lg n/lglg n) for space n·lg^O(1) n
[Pătrașcu, Thorup FOCS'06] nearest neighbor
[Pătrașcu STOC'07] range counting
[Pătrașcu FOCS'08] range reporting
[Sommer, Verbin, Yu FOCS'09] distance oracles
[Greve, Jørgensen, Larsen, Truelsen '10] range mode
[Jørgensen, Larsen '10] range median
[Panigrahy, Talwar, Wieder FOCS'08] c-approximate nearest neighbor; also n^(1+Ω(1/c)) space for O(1) time

Classic Results
[Yao FOCS'78] (kind of) defines the model; membership with low space
[Ajtai '88] static lower bound: predecessor search
[Fredman, Saks STOC'89] dynamic lower bounds: partial sums, union-find
Have we broken barriers?

Succinct Data Structures
Membership: n values ∈ [u] ⇒ optimal space H = lg (u choose n) bits
Space = H + redundancy. What is the redundancy / time trade-off?
[Pagh '99] membership ↦ prefix sums
[Pătrașcu FOCS'08] prefix sums: time O(t), redundancy ≈ n / lg^t n
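To make "redundancy" concrete, here is the naive one-level trade-off for prefix sums over a bit array: keep the raw bits plus a prefix sum sampled every s positions, i.e. roughly (n/s) · lg n redundant bits against an O(s)-bit scan per query. The cited results achieve exponentially smaller redundancy; this sketch (with illustrative names) is only the baseline:

```python
def build_rank(bits, s=64):
    """Sampled prefix sums: ~ (n/s)*lg n redundant bits, query scans <= s bits."""
    samples = []            # prefix sum at positions 0, s, 2s, ...
    acc = 0
    for i in range(0, len(bits) + 1, s):
        samples.append(acc)
        acc += sum(bits[i:i + s])

    def rank(i):
        """Number of 1s among bits[0..i-1] (a prefix-sum query)."""
        b = i // s
        return samples[b] + sum(bits[b * s:i])  # sample + short scan
    return rank
```

Shrinking s lowers query time and raises redundancy, which is exactly the trade-off the slide is about.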

Succinct Lower Bounds
[Gál, Miltersen '03] polynomial evaluation ⇒ redundancy × query time ≥ Ω(n)
[Golynski SODA'09] store a permutation and query π(·), π⁻¹(·); if the space is (1 + o(1)) · n lg n, the query time is ω(1)
[Pătrașcu, Viola SODA'10] prefix sums: for query time t, redundancy ≥ n / lg^t n

The End