1
Sublinear time algorithms
Ronitt Rubinfeld
Blavatnik School of Computer Science, Tel Aviv University
2
How can we understand?
3
Vast data
–Impossible to access all of it
–Potentially accessible data is too enormous to be viewed by a single individual
–Once accessed, data can change
4
Small world phenomenon
–Each “node” is a person
–An “edge” joins two people who know each other
5
Small world property
–“Connected”: every pair of people can reach each other
–“Distance” between two people: the minimum number of edges needed to reach one from the other
–“Diameter”: the maximum distance between any pair
–“6 degrees of separation” is equivalent to “the diameter of the world population is 6”
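To make the definitions concrete, here is a minimal sketch (not from the slides) that computes distances by breadth-first search and takes the maximum over all pairs to get the diameter. The adjacency-dict graph format, the example names, and the functions `bfs_distances` and `diameter` are illustrative assumptions. Note that it runs one BFS per person and touches every edge each time, which is exactly the kind of whole-graph computation that is out of reach for the world population.

```python
from collections import deque

def bfs_distances(graph, source):
    """Return {person: distance from source} for everyone reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:          # first visit = shortest distance in an unweighted graph
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(graph):
    """Maximum distance over all pairs; infinity if the graph is not connected."""
    best = 0
    for s in graph:
        dist = bfs_distances(graph, s)
        if len(dist) < len(graph):     # someone is unreachable from s
            return float("inf")
        best = max(best, max(dist.values()))
    return best

# Hypothetical tiny "who knows whom" graph: a chain of four people.
friends = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cy"],
    "Cy": ["Bob", "Dee"],
    "Dee": ["Cy"],
}
print(diameter(friends))  # 3 (Ann to Dee)
```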
6
Does Earth have the small world property?
How can we know?
–The data collection problem is immense
–Unknown groups of people are still being found on Earth
–Births, deaths, and new introductions keep changing the graph
7
Data sets can be massive
Examples:
–sales logs
–scientific measurements
–genome project
–world-wide web
–network traffic, clickstream patterns
In many cases, they hardly fit in storage
8
The Gold Standard
Linear time algorithms:
–for inputs encoded by n bits/words, allow cn time steps (constant c)
Inadequate?
9
What can we hope to do without viewing most of the data?
Can’t answer “for all” or “exactly” type statements:
–are all individuals connected by at most 6 degrees of separation?
–exactly how many individuals on Earth are left-handed?
Change our goals: ask about “approximate” statements
–is there a large group of individuals connected by at most 6 degrees of separation?
–approximately how many individuals on Earth are left-handed?
10
Statistics!
Sampling to approximate average and median values
–Polling to predict the outcome of elections
–Approximating the fraction of left-handed people, male-female ratios
Can we use these ideas in algorithmic settings?
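As a sketch of the polling idea (an illustration, not taken from the slides), the fraction of people with some property can be estimated from a uniform random sample. By Hoeffding's inequality, about ln(2/δ)/(2ε²) samples suffice to be within ε of the true fraction with probability at least 1 − δ, and that number does not depend on the population size. The population list, the names `sample_size` and `estimate_fraction`, and the default ε and δ are hypothetical choices.

```python
import math
import random

def sample_size(eps, delta):
    """Samples needed so that, by Hoeffding's inequality, the sample fraction is
    within eps of the true fraction with probability at least 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def estimate_fraction(population, has_property, eps=0.05, delta=0.01, rng=None):
    """Estimate the fraction of `population` satisfying `has_property` by
    sampling uniformly with replacement; reads only sample_size(eps, delta) items."""
    rng = rng or random.Random()
    m = sample_size(eps, delta)
    hits = sum(1 for _ in range(m) if has_property(rng.choice(population)))
    return hits / m

# Hypothetical population: True marks a left-handed person (10% of 1000).
population = [True] * 100 + [False] * 900
print(sample_size(0.05, 0.01))                                    # 1060
print(estimate_fraction(population, bool, rng=random.Random(1)))  # roughly 0.1
```

The same 1060 samples would serve a population of a billion, which is what makes the approach sublinear.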
11
Property testing
Quickly distinguish inputs that have a specific property from those that are far from having the property
Benefits:
–natural question
–just as good when data is constantly changing
–fast sanity check to rule out very “bad” inputs (e.g., restaurant bills) or to decide when expensive processing is worthwhile
[Figure: all inputs, with the inputs having the property inside the inputs close to having the property]
12
An example
13
Monotonicity of a sequence
Given: list y_1, y_2, ..., y_n
Question: is the list sorted?
Clearly requires n steps: must look at each y_i (a single out-of-order value could be anywhere)
14
Monotonicity of a sequence
Given: list y_1, y_2, ..., y_n
Question: can we quickly test if the list is close to sorted?
15
What do we mean by “quick”?
–Query complexity, measured in terms of the list size n
–Our goal (if possible): very small compared to n; we will go for c log n
16
What do we mean by “close”?
Definition: a list of size n is close to sorted if we can delete at most .01n values to make it sorted. Otherwise, it is far from sorted.
Sorted: 1 2 4 5 7 11 14 19 20 21 23 38 39 45
Close: 1 4 2 5 7 11 14 19 20 39 23 21 38 45 (delete 2, 39, 21 to get 1 4 5 7 11 14 19 20 23 38 45)
Far: 45 39 23 1 38 4 5 21 20 19 2 7 11 14 (at best, deleting 8 values leaves 1 4 5 7 11 14)
Requirements for algorithm:
–pass sorted lists
–if the list passes the test, then at most a .01 fraction of the list needs to be changed to make it sorted
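The definition can be checked exactly, at full cost, as a reference point: the fewest deletions that leave a sorted list equals n minus the length of a longest increasing subsequence. The sketch below (an illustration, not part of the slides) computes that with the standard patience-sorting technique and reproduces the counts for the three example lists. It assumes distinct values, as in the examples; `deletions_to_sort` and `is_close_to_sorted` are hypothetical names.

```python
from bisect import bisect_left

def deletions_to_sort(ys):
    """Minimum number of values to delete so that the rest is in increasing order,
    i.e. n minus the length of a longest increasing subsequence (distinct values)."""
    tails = []  # tails[k]: smallest possible last value of an increasing subsequence of length k + 1
    for y in ys:
        k = bisect_left(tails, y)
        if k == len(tails):
            tails.append(y)   # y extends the longest subsequence found so far
        else:
            tails[k] = y      # y is a smaller tail for length k + 1
    return len(ys) - len(tails)

def is_close_to_sorted(ys, eps=0.01):
    """Exact (full-scan) check of the definition: at most eps * n deletions."""
    return deletions_to_sort(ys) <= eps * len(ys)

sorted_list = [1, 2, 4, 5, 7, 11, 14, 19, 20, 21, 23, 38, 39, 45]
close_list = [1, 4, 2, 5, 7, 11, 14, 19, 20, 39, 23, 21, 38, 45]
far_list = [45, 39, 23, 1, 38, 4, 5, 21, 20, 19, 2, 7, 11, 14]
print([deletions_to_sort(x) for x in (sorted_list, close_list, far_list)])  # [0, 3, 8]
```

This reads every value and takes O(n log n) time; the point of a tester is to decide the same question approximately while reading far fewer values.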
17
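An attempt:
Proposed algorithm:
–Pick random i and test that y_i ≤ y_{i+1}
Bad input type:
–1, 2, 3, 4, 5, ..., n/4, 1, 2, ..., n/4, 1, 2, ..., n/4, 1, 2, ..., n/4
–Difficult for this algorithm to find a “breakpoint”: only 3 of the n-1 adjacent pairs are out of order
–But other algorithms work well on this input
[Figure: plot of y_i against i]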
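A small simulation (not from the slides) shows why the first attempt struggles: the bad input has only 3 breakpoints among n-1 adjacent pairs, so a few random adjacent checks usually miss them all, even though roughly 3n/4 values would have to be deleted to sort it. The names `adjacent_pair_test` and `repeated_runs` and the choice of 20 trials are hypothetical.

```python
import random

def adjacent_pair_test(y, trials, rng):
    """First attempt: check y[i] <= y[i+1] at `trials` random positions i."""
    for _ in range(trials):
        i = rng.randrange(len(y) - 1)
        if y[i] > y[i + 1]:
            return False  # FAIL: found a breakpoint
    return True           # PASS

def repeated_runs(n):
    """Bad input from the slide: 1..n/4 repeated four times (n divisible by 4)."""
    return list(range(1, n // 4 + 1)) * 4

ys = repeated_runs(1000)
breakpoints = sum(ys[i] > ys[i + 1] for i in range(len(ys) - 1))
print(breakpoints)  # 3: a random adjacent pair is a breakpoint with probability 3/999
# With 20 trials the test misses all breakpoints with probability (1 - 3/999)**20, about 0.94.
print(adjacent_pair_test(ys, 20, random.Random()))
```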
18
A second attempt:
Proposed algorithm:
–Pick random i < j and test that y_i ≤ y_j
Bad input type:
–n/m groups of m elements: m, m-1, m-2, ..., 1, 2m, 2m-1, 2m-2, ..., m+1, 3m, 3m-1, 3m-2, ...
–must pick i, j in the same group to find a violation
–need at least (n/m)^{1/2} choices to do this (a birthday-paradox argument)
[Figure: plot of y_i against i]
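The sketch below (an illustration, not from the slides) builds the grouped input and confirms that every out-of-order pair lies inside a single group, so the random-pair test can only fail when both of its indices land in the same group of m. The names `grouped_descending`, `random_pair_test`, and `violating_pairs` are hypothetical, and the quadratic pair count is only meant for tiny inputs.

```python
import random

def grouped_descending(n, m):
    """Bad input from the slide: n/m groups of m elements, each group in
    descending order, groups increasing: m..1, 2m..m+1, 3m..2m+1, ..."""
    return [g * m + (m - k) for g in range(n // m) for k in range(m)]

def random_pair_test(y, trials, rng):
    """Second attempt: check y[i] <= y[j] for random pairs i < j."""
    for _ in range(trials):
        i, j = sorted(rng.sample(range(len(y)), 2))
        if y[i] > y[j]:
            return False  # FAIL
    return True           # PASS

def violating_pairs(y):
    """All pairs i < j with y[i] > y[j] (quadratic; for illustration only)."""
    n = len(y)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if y[i] > y[j]]

y = grouped_descending(12, 3)
print(y)  # [3, 2, 1, 6, 5, 4, 9, 8, 7, 12, 11, 10]
bad = violating_pairs(y)
print(len(bad), all(i // 3 == j // 3 for i, j in bad))  # 12 True: every violation is inside one group
print(random_pair_test(grouped_descending(1200, 3), 20, random.Random()))
```

Here only 12 of the 66 pairs are violations, a (m-1)/(n-1) fraction, while keeping a sorted subsequence means keeping just one value per group.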
19
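A test that works
The test (for distinct y_i):
–Test several times:
  Pick random i
  Look at the value of y_i
  Do binary search for y_i
  Does the binary search find any inconsistencies? If yes, FAIL
  Do we end up at location i? If not, FAIL
–Pass if never failed
Running time: O(log n) per repetition; a constant number of repetitions (depending on the .01 closeness parameter) gives O(log n) overall
Why does this work?
–If the list is in order, then the test always passes
–If the test would pass both when i is chosen and when j is chosen, then y_i and y_j are in the correct order (their binary searches split at a pivot lying between them)
–Since the test usually passes, most y_i's pass, and those are all in the right order relative to each other: deleting the few that fail leaves a sorted list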
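A minimal sketch of the binary-search test, under stated assumptions: values are distinct, an "inconsistency" is taken to mean a pivot value that falls outside the bounds set by earlier pivots (one reasonable reading of the slide), and the repetition count int(2/eps) + 1 is a hypothetical choice. With that count, a list where more than an eps fraction of locations fail is rejected with probability at least 1 - e^{-2}, about 0.86. `search_ends_at` and `monotonicity_test` are illustrative names.

```python
import random

def search_ends_at(y, i):
    """Binary search for y[i] (values assumed distinct). Return False on an
    inconsistency or if the search does not end at location i."""
    target = y[i]
    lo, hi = 0, len(y) - 1
    low_val, high_val = float("-inf"), float("inf")  # bounds implied by earlier pivots
    while lo <= hi:
        mid = (lo + hi) // 2
        v = y[mid]
        if not (low_val < v < high_val):
            return False          # inconsistency: pivot contradicts an earlier comparison
        if v == target:
            return mid == i       # did we end up at location i?
        if v < target:
            lo, low_val = mid + 1, v
        else:
            hi, high_val = mid - 1, v
    return False                  # y[i] not found where a sorted list would have it

def monotonicity_test(y, eps=0.01, rng=None):
    """Repeat the binary-search check at random locations; PASS if it never fails."""
    rng = rng or random.Random()
    trials = int(2 / eps) + 1     # assumed O(1/eps) repetition count
    for _ in range(trials):
        if not search_ends_at(y, rng.randrange(len(y))):
            return False          # FAIL
    return True                   # PASS

print(monotonicity_test(list(range(1000))))          # True: sorted lists always pass
print(monotonicity_test(list(range(1000, -1, -1))))  # almost surely False
```

In the reversed list only the middle location survives its own binary search, so nearly every random pick fails.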
20
Other properties
–Graph properties: e.g., the small world property, other connectivity properties
–String properties
–Function properties
–Lots more!
21
Conclusions
Sublinear time is possible in many contexts
–Relatively new area, lots of techniques
–What else can you compute in sublinear time?
–Other applications...?