Sublinear time algorithms Ronitt Rubinfeld Blavatnik School of Computer Science Tel Aviv University TexPoint fonts used in EMF. Read the TexPoint manual.

Slides:



Advertisements
Similar presentations
Part VI NP-Hardness. Lecture 23 Whats NP? Hard Problems.
Advertisements

Problems and Their Classes
Introduction to Algorithms Greedy Algorithms
Great Theoretical Ideas in Computer Science for Some.
Introduction to Training and Learning in Neural Networks n CS/PY 399 Lab Presentation # 4 n February 1, 2001 n Mount Union College.
Lecture 12: Lower bounds By "lower bounds" here we mean a lower bound on the complexity of a problem, not an algorithm. Basically we need to prove that.
Introduction to Algorithms
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
Spring 2015 Lecture 5: QuickSort & Selection
Great Theoretical Ideas in Computer Science.
1 Introduction to Computability Theory Lecture12: Reductions Prof. Amos Israeli.
1 Amortized Analysis Consider an algorithm in which an operation is executed repeatedly with the property that its running time fluctuates throughout the.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 8 May 4, 2005
Michael Bender - SUNY Stony Brook Dana Ron - Tel Aviv University Testing Acyclicity of Directed Graphs in Sublinear Time.
Testing Metric Properties Michal Parnas and Dana Ron.
This material in not in your text (except as exercises) Sequence Comparisons –Problems in molecular biology involve finding the minimum number of edit.
Analysis of Algorithms CS 477/677
Lecture 10: Search Structures and Hashing
6/29/20151 Efficient Algorithms for Motif Search Sudha Balla Sanguthevar Rajasekaran University of Connecticut.
Lecture 20: April 12 Introduction to Randomized Algorithms and the Probabilistic Method.
DAST 2005 Week 4 – Some Helpful Material Randomized Quick Sort & Lower bound & General remarks…
The Design and Analysis of Algorithms
COMP s1 Computing 2 Complexity
Sublinear time algorithms Ronitt Rubinfeld Computer Science and Artificial Intelligence Laboratory (CSAIL) Electrical Engineering and Computer Science.
Section 11.4 Language Classes Based On Randomization
Johannes PODC 2009 –1 Coloring Unstructured Wireless Multi-Hop Networks Johannes Schneider Roger Wattenhofer TexPoint fonts used in EMF. Read.
Approximating the MST Weight in Sublinear Time Bernard Chazelle (Princeton) Ronitt Rubinfeld (NEC) Luca Trevisan (U.C. Berkeley)
The Best Algorithms are Randomized Algorithms N. Harvey C&O Dept TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A AAAA.
Complexity Classes (Ch. 34) The class P: class of problems that can be solved in time that is polynomial in the size of the input, n. if input size is.
计算机科学概述 Introduction to Computer Science 陆嘉恒 中国人民大学 信息学院
Merge Sort. What Is Sorting? To arrange a collection of items in some specified order. Numerical order Lexicographical order Input: sequence of numbers.
RESOURCES, TRADE-OFFS, AND LIMITATIONS Group 5 8/27/2014.
Searching. RHS – SOC 2 Searching A magic trick: –Let a person secretly choose a random number between 1 and 1000 –Announce that you can guess the number.
1 Sublinear Algorithms Lecture 1 Sofya Raskhodnikova Penn State University TexPoint fonts used in EMF. Read the TexPoint manual before you delete this.
COLOR TEST COLOR TEST. Social Networks: Structure and Impact N ICOLE I MMORLICA, N ORTHWESTERN U.
The Selection Problem. 2 Median and Order Statistics In this section, we will study algorithms for finding the i th smallest element in a set of n elements.
Analysis of Algorithms CSCI Previous Evaluations of Programs Correctness – does the algorithm do what it is supposed to do? Generality – does it.
1Computer Sciences Department. Book: Introduction to Algorithms, by: Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest Clifford Stein Electronic:
CSC 211 Data Structures Lecture 13
3.3 Complexity of Algorithms
Sorting: Implementation Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu Lecture 7.
Seminar on Sub-linear algorithms Prof. Ronitt Rubinfeld.
Complexity and Efficient Algorithms Group / Department of Computer Science Testing the Cluster Structure of Graphs Christian Sohler joint work with Artur.
CS216: Program and Data Representation University of Virginia Computer Science Spring 2006 David Evans Lecture 8: Crash Course in Computational Complexity.
Sub-linear Time Algorithms Ronitt Rubinfeld Tel Aviv University Fall 2015.
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
NOTE: To change the image on this slide, select the picture and delete it. Then click the Pictures icon in the placeholder to insert your own image. Fast.
The NP class. NP-completeness Lecture2. The NP-class The NP class is a class that contains all the problems that can be decided by a Non-Deterministic.
CS6045: Advanced Algorithms Sorting Algorithms. Sorting So Far Insertion sort: –Easy to code –Fast on small inputs (less than ~50 elements) –Fast on nearly-sorted.
CSE 589 Applied Algorithms Spring 1999 Prim’s Algorithm for MST Load Balance Spanning Tree Hamiltonian Path.
Errol Lloyd Design and Analysis of Algorithms Approximation Algorithms for NP-complete Problems Bin Packing Networks.
The NP class. NP-completeness
Algorithms for Big Data: Streaming and Sublinear Time Algorithms
The Best Algorithms are Randomized Algorithms
Property Testing (a.k.a. Sublinear Algorithms )
Stochastic Streams: Sample Complexity vs. Space Complexity
Introduction to Randomized Algorithms and the Probabilistic Method
The Design and Analysis of Algorithms
On Testing Dynamic Environments
Approximating the MST Weight in Sublinear Time
Lecture 18: Uniformity Testing Monotonicity Testing
On Learning and Testing Dynamic Environments
CS 154, Lecture 6: Communication Complexity
Enumerating Distances Using Spanners of Bounded Degree
TexPoint fonts used in EMF.
Bin Fu Department of Computer Science
3. Brute Force Selection sort Brute-Force string matching
Presentation transcript:

Sublinear time algorithms Ronitt Rubinfeld Blavatnik School of Computer Science Tel Aviv University TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.:

How can we understand?

Vast data Impossible to access all of it Potentially accessible data is too enormous to be viewed by a single individual Once accessed, data can change

Small world phenomenon each “node’’ is a person “edge’’ between people that know each other

Small world property “connected” if every pair can reach each other “distance’’ between two people is the minimum number of edges to reach one from another “diameter’’ is the maximum distance between any pair –“6 degrees of separation’’ is equivalent to “diameter of the world population is 6’’

Does earth have the small world property? How can we know? –data collection problem is immense –unknown groups of people found on earth –births/deaths/new introductions

Data sets can be massive examples: –sales logs –scientific measurements –genome project –world-wide web –network traffic, clickstream patterns in many cases, hardly fit in storage

The Gold Standard linear time algorithms: –for inputs encoded by n bits/words, allow cn time steps (constant c) Inadequate?

What can we hope to do without viewing most of the data? Can’t answer “for all” or “exactly” type statements: –are all individuals connected by at most 6 degrees of separation? –exactly how many individuals on earth are left-handed? Change our goals: ask about “approximate” statements –is there a large group of individuals connected by at most 6 degrees of separation? –approximately how many individuals on earth are left- handed?

Statistics! Sampling to approximate average, median values –Polling to predict outcome of elections –Approximate fraction of left-handed people, male-female ratios Can we use these ideas in algorithmic settings?

Quickly distinguish inputs that have specific property from those that are far from having the property Benefits: –natural question –just as good when data constantly changing –fast sanity check to rule out very “bad” inputs (i.e., restaurant bills) or to decide when expensive processing is worthwhile Property testing all inputs inputs with the property close to having property

An example

Monotonicity of a sequence Given: list y 1 y 2... y n Question: is the list sorted? Clearly requires n steps – must look at each y i

Monotonicity of a sequence Given: list y 1 y 2... y n Question: can we quickly test if the list close to sorted?

What do we mean by ``quick’’? query complexity measured in terms of list size n Our goal (if possible): –Very small compared to n, will go for clog n

What do we mean by “close’’? Definition: a list of size n is  -close to sorted if can delete at most.01n values to make it sorted. Otherwise,.99- far. Sorted: Close: Far: Requirements for algorithm: pass sorted lists if list passes test, can change at most.01 fraction of list to make it sorted

An attempt: Proposed algorithm : –Pick random i and test that y i ≤y i+1 Bad input type: –1,2,3,4,5,…n/4, 1,2,….n/4, 1,2,…n/4, 1,2,…,n/4 –Difficult for this algorithm to find “breakpoint” –But other algorithms work well i yiyi

A second attempt: Proposed algorithm: –Pick random i<j and test that y i ≤y j Bad input type: –n/m groups of m elements m,m-1,m-2,…,1,2m,2m-1,2m-2,…m+1,3m,3m-1,3m-2,…, –must pick i,j in same group –need at least (n/m) 1/2 choices to do this i yiyi

A test that works The test: (for distinct y i ) –Test several times: Pick random i Look at value of y i Do binary search for y i Does the binary search find any inconsistencies? If yes, FAIL Do we end up at location i? If not FAIL –Pass if never failed Running time: O(log n) time Why does this work? –If list is in order, then test always passes –If the test would pass when i and j are chosen, then y i and y j are in correct order –Since test usually passes, y i ’s in the right order

Other properties Graph properties: e.g. small world property, other connectivity properties String properties Function properties Lots more!

Conclusions Sublinear time possible in many contexts –Relatively new area, lots of techniques –What else can you compute in sublinear time? –Other applications...?