Published by Erick Nadler Modified over 10 years ago
1
So Much Data So Little Time Bernard Chazelle Princeton University
2
So Many Slides So Little Time (before lunch) Bernard Chazelle Princeton University
3
computation math experimentation algorithms
4
Computers have two problems
5
1. They don’t have steering wheels
7
2. End of Moore’s Law: party’s over!
8
computation algorithms experimentation
9
32 x 17: 224 + 320 = 544 This is not me
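The arithmetic on this slide is grade-school long multiplication: 32 x 7 = 224, then 32 x 1 shifted one decimal place = 320, and the partial products sum to 544. A minimal sketch of that procedure follows; the function name `long_multiply` is hypothetical and only illustrates the digit-by-digit scheme.

```python
def long_multiply(a, b):
    """Grade-school multiplication of non-negative integers.

    Returns the list of shifted partial products (one per decimal
    digit of b, least significant first) and their sum.
    """
    partials = []
    shift = 0
    while b > 0:
        digit = b % 10                      # current decimal digit of b
        partials.append(a * digit * 10 ** shift)
        b //= 10                            # move to the next digit
        shift += 1
    return partials, sum(partials)


partials, total = long_multiply(32, 17)
print(partials, total)
```

For 32 and 17 the partial products are 224 and 320, matching the two rows on the slide.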
10
FFT RSA
13
noisy low entropy uncertain unevenly priced big
15
Biomedical imaging Sloan Digital Sky Survey 4 petabytes (~1MG) 10 petabytes/yr 150 petabytes/yr
16
Collected works of Micha Sharir My A(9,9)-th paper
17
Sublinear Algorithms massive input output Sample tiny fraction
18
Shortest Paths [C-Liu-Magen ’03] New York Delphi
19
Ray Shooting Volume Intersection Point location
20
Approximate MST [C-Rubinfeld-Trevisan ’01]
21
Reduces to counting connected components
22
E[estimator] = no. connected components var[estimator] << (no. connected components)^2 whp, the estimator is a good estimator of # connected components
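One way to see why sampling can count components: for a uniformly random vertex u in a graph with n vertices, the quantity n / (size of u's component) has expectation equal to the number of connected components. The sketch below is a simplified illustration of that idea, not the exact procedure of [C-Rubinfeld-Trevisan ’01]. The names `component_size_capped` and `estimate_components`, the adjacency-list input, and the hard `cap` on the search are all assumptions made for the example. Ignoring components larger than `cap` undercounts by fewer than n / cap components.

```python
import random
from collections import deque


def component_size_capped(adj, start, cap):
    """BFS from start. Return the component size, or None as soon as
    more than cap vertices have been discovered (search is abandoned)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if len(seen) > cap:
                    return None
                queue.append(v)
    return len(seen)


def estimate_components(adj, samples, cap, rng=random):
    """Estimate the number of connected components from sampled vertices.

    Each sampled vertex contributes 1/size of its component (0 if the
    component exceeds cap); the average is scaled by n.
    """
    n = len(adj)
    total = 0.0
    for _ in range(samples):
        u = rng.randrange(n)
        size = component_size_capped(adj, u, cap)
        if size is not None:
            total += 1.0 / size
    return n * total / samples


# Hypothetical input: 4 disjoint triangles on 12 vertices.
adj = [[(i // 3) * 3 + j for j in range(3) if (i // 3) * 3 + j != i]
       for i in range(12)]
print(estimate_components(adj, samples=50, cap=12, rng=random.Random(0)))
```

The work per sample is bounded by the cap rather than by the size of the graph, which is what keeps the total cost independent of n in this sketch.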
23
worst case input space average case (uniform)
24
worst case
25
average case = actuarial view
26
“OK, if you elect NOT to have the surgery, the insurance company offers 6 days and 7 nights in Barbados.”
27
arbitrary, unknown random source Self-Improving Algorithms
28
Yes! This could be YOU, too!
29
time T1 time T2 time T3 time T4 E[Tk] → Optimal expected time for random source
30
Clustering [Ailon-C-Liu-Comandur ’05] k-median over Hamming cube
31
minimize sum of distances
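To make the objective concrete, here is a small sketch of the k-median cost over the Hamming cube: each point is charged its Hamming distance to the nearest center, and the goal is to minimize the total. The function names are hypothetical. The `majority_center` helper covers only the k = 1 case, where the coordinate-wise majority bit minimizes the sum of Hamming distances; it is not the clustering algorithm from the talk.

```python
def hamming(a, b):
    """Number of coordinates where bit vectors a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def kmedian_cost(points, centers):
    """Sum over points of the distance to the closest center."""
    return sum(min(hamming(p, c) for c in centers) for p in points)


def majority_center(points):
    """Optimal single center (k = 1): majority bit in each coordinate.
    Ties are broken toward 0."""
    n = len(points)
    d = len(points[0])
    return tuple(1 if 2 * sum(p[i] for p in points) > n else 0
                 for i in range(d))


points = [(0, 0, 0), (0, 0, 1), (1, 0, 1)]
center = majority_center(points)
print(center, kmedian_cost(points, [center]))
```

With these three points the majority center is (0, 0, 1) and the cost is 2; allowing two centers, for example (0, 0, 0) and (1, 0, 1), lowers the cost to 1.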
33
[Kumar-Sabharwal-Sen ’04] COST ≤ OPT (1 + ε)
34
How to achieve linear limiting time? Input space {0,1}^dn prob < O(dn)/KSS Identify core Tail: Use KSS
35
Store sample of precomputed KSS Nearest neighbor Incremental algorithm
36
Main difficulty: How to spot the tail?
38
encode
39
decode
41
Data inaccessible before noise What makes you think it’s wrong?
42
Data inaccessible before noise must satisfy some property (e.g., convex, bipartite) but does not quite
43
f(x) = ? x f(x) data f = access function
44
f(x) = ? x f(x) f = access function
45
f(x) = ? x f(x) But life being what it is…
46
f(x) = ? x f(x)
47
Humans Define distance from any object to data class
48
f(x) = ? x g(x) x1, x2, … f(x1), f(x2), … filter g is access function for:
49
Online Data Reconstruction
50
Monotone function: [n]^d → R Filter requires polylog(n) lookups [Ailon-C-Liu-Comandur ’04]
51
Convex polygon Filter requires : lookups [C-Comandur ’06 ]
52
Convex terrain Filter requires : lookups
53
Iterated planar separator theorem
55
Iterated (weak) planar separator theorem in sublinear time!
56
Using epsilon-nets in spaces of unbounded VC dimension reconstruct
57
bipartite graph k-connectivity expander
58
denoising low-dim attractor sets
59
Priced computation & accuracy spectrometry/cloning/gene chip PCR/hybridization/chromatography gel electrophoresis/blotting [figure: grid of 0/1 entries] Linear programming
60
Pricing data Factoring is easy. Here’s why… Gaussian mixture sample: 00100101001001101010101 …
61
Collaborators: Nir Ailon, Seshadri Comandur, Ding Liu, Avner Magen, Ronitt Rubinfeld, Luca Trevisan