Presentation on theme: "Biomedical imaging Sloan Digital Sky Survey 4 petabytes (~1MG) 10 petabytes/yr 150 petabytes/yr."— Presentation transcript:

1

2 Biomedical imaging · Sloan Digital Sky Survey: 4 petabytes (~1 MG), 10 petabytes/yr, 150 petabytes/yr

3

4

5 Massive input → output: a sublinear algorithm samples only a tiny fraction of the input

6

7 Approximate MST [CRT ’01]

8 Reduces to counting connected components

9 E[estimate] = no. of connected components; Var[estimate] << (no. of connected components)²
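The reduction on slide 8 can be made concrete. For a connected graph with integer edge weights in {1, …, w}, the MST weight equals n - w + Σ_{i=1}^{w-1} c_i, where c_i is the number of connected components in the subgraph kept by the edges of weight ≤ i. A minimal sketch of that identity (exact union-find counting; the sublinear [CRT ’01] algorithm instead *estimates* each c_i from a few sampled graph explorations, which is where the unbiased-estimator/variance bound above comes in):

```python
def num_components(n, edges):
    """Count connected components of an n-vertex graph with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    comps = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            comps -= 1
    return comps

def mst_weight_via_components(n, weighted_edges, w):
    """MST weight = n - w + sum_{i=1}^{w-1} c_i, integer weights in 1..w."""
    total = n - w
    for i in range(1, w):
        sub = [(u, v) for u, v, wt in weighted_edges if wt <= i]
        total += num_components(n, sub)
    return total
```

On the triangle with weights {0-1: 1, 1-2: 2, 0-2: 2} this returns 3, matching the MST {0-1, 1-2}.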

10 Shortest Paths [CLM ’03]

11 Ray shooting · Volume · Intersection · Point location [CLM ’03]

12

13 Low-entropy data: Takens embeddings · Markov models (speech)

14 Self-Improving Algorithms, arbitrary unknown random source: sorting · matching · MaxCut · all-pairs shortest paths · transitive closure · clustering

15 Self-Improving Algorithms, arbitrary unknown random source: 1. Run an algorithm with the best worst-case behavior, or the best behavior under a uniform distribution or some postulated prior. 2. Learning phase: the algorithm fine-tunes itself as it learns about the random source through repeated use. 3. The algorithm settles to a stationary status: optimal expected complexity under the (still unknown) random source.

16 Self-Improving Algorithms: successive run times T1, T2, T3, T4, T5, …; E[Tk] → optimal expected time for the random source

17 Sorting: (x1, x2, …, xn), each xi drawn independently from Di; H = entropy of the rank distribution
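A hedged sketch of how such a sorter can work (class and method names are mine, not the paper's): the learning phase records sample instances and derives bucket boundaries from empirical quantiles; in the steady state each xi drops into its predicted bucket and only small local sorts remain, which is how an expected O(n + H) behavior can arise.

```python
import bisect

class SelfImprovingSorter:
    """Sketch: each x_i is drawn, round after round, from a fixed
    unknown distribution D_i. Learning fits bucket boundaries to the
    empirical quantiles of all samples seen so far."""

    def __init__(self, n):
        self.nbuckets = n
        self.pivots = []     # learned bucket boundaries
        self.samples = []

    def learn(self, instance):
        """Learning phase: absorb one more sample instance."""
        self.samples.extend(instance)
        flat = sorted(self.samples)
        step = max(1, len(flat) // self.nbuckets)
        self.pivots = flat[step::step][: self.nbuckets - 1]

    def sort(self, instance):
        """Steady state: bucket by learned quantiles, then local sorts."""
        buckets = [[] for _ in range(self.nbuckets)]
        for x in instance:
            buckets[bisect.bisect_left(self.pivots, x)].append(x)
        out = []
        for b in buckets:
            out.extend(sorted(b))   # cheap when buckets stay small
        return out
```

The output is always correctly sorted; only the running time depends on how well the learned pivots match the source.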

18 Clustering: k-median (k = 2)

19 Minimize sum of distances: Hamming cube {0,1}^d

20 Minimize sum of distances: Hamming cube {0,1}^d

21 Minimize sum of distances: Hamming cube {0,1}^d [KSS]

22 How to achieve linear limiting expected time? Input space {0,1}^dn. Identify the core; tail (probability < O(dn)/cost of KSS): use KSS

23 How to achieve linear limiting expected time? Store a sample of precomputed KSS solutions; nearest neighbor; incremental algorithm. NP vs P: input vicinity → algorithmic vicinity
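One way to read slides 22-23 as code (a toy with my own names; the real solver is [KSS], stood in for here by brute force over center pairs): cache precomputed solutions, answer "core" inputs by nearest-neighbor reuse, and fall back to the expensive solver only on the rare tail.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def kmedian_cost(points, centers):
    return sum(min(hamming(p, c) for c in centers) for p in points)

def exact_2median(points):
    """Stand-in for the [KSS] solver: brute-force 2-median with
    centers chosen among input points (fine at toy sizes only)."""
    best, best_cost = None, float("inf")
    for i in range(len(points)):
        for j in range(i, len(points)):
            c = (points[i], points[j])
            cost = kmedian_cost(points, c)
            if cost < best_cost:
                best, best_cost = c, cost
    return best

class SelfImprovingClusterer:
    """Hypothetical core/tail dispatch: if a new instance is within
    `threshold` Hamming distance of a cached one, reuse its centers;
    otherwise pay for the full solver and cache the result."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.cache = []   # (flattened instance, centers)

    def solve(self, points):
        flat = [b for p in points for b in p]
        for cached_flat, centers in self.cache:
            if hamming(flat, cached_flat) <= self.threshold:
                return centers           # core: cheap reuse
        centers = exact_2median(points)  # tail: full solver
        self.cache.append((flat, centers))
        return centers
```

A production version would adapt the reused centers incrementally rather than return them verbatim; this sketch only shows the dispatch.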

24 Main difficulty: How to spot the tail?

25

26 1. Data is accessible before noise. 2. Or it’s not.

27 1. Data is accessible before noise

28 encode decode

29 Data inaccessible before noise: assumptions are necessary!

30 Data inaccessible before noise: 1. Sorted sequence 2. Bipartite graph, expander 3. Solid w/ angular constraints 4. Low-dimensional attractor set

31 Data inaccessible before noise: the data must satisfy some property P, but does not quite

32 f = access function: query x → answer f(x). But life being what it is… the data is faulty

33 Query x → f(x)

34 Define the distance from any object to a data class (e.g., humans)

35 Filter: on query x, g issues queries x1, x2, … and receives f(x1), f(x2), …; g is the access function returning g(x)

36 Similar to self-correction [RS ’96, BLR ’93], except: about data, not functions; error-free; allows O(distance to property)

37 Monotone function: [n] → R^d. Filter requires polylog(n) queries

38 Offline reconstruction

39

40 Online reconstruction

41

42

43

44 monotone function

45

46 Frequency of a point x: the smallest interval I containing > |I|/2 violations involving f(x)

47 Frequency of a point

48 Given x: 1. estimate its frequency 2. if nonzero, find “smallest” interval around x with both endpoints having zero frequency 3. interpolate between f(endpoints)
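The three-step recipe above, sketched for a one-dimensional toy f: [n] → R (function names are mine). The real filter estimates frequencies by sampling with polylog(n) queries; this exact version simply counts violations:

```python
def violations(f, x, lo, hi):
    """Monotonicity violations involving position x within [lo, hi]."""
    return sum(1 for y in range(lo, hi + 1)
               if (y < x and f[y] > f[x]) or (y > x and f[y] < f[x]))

def frequency_nonzero(f, x):
    """x has nonzero frequency if some interval I around x contains
    more than |I|/2 violations involving f(x)."""
    n = len(f)
    for lo in range(0, x + 1):
        for hi in range(x, n):
            if violations(f, x, lo, hi) > (hi - lo + 1) / 2:
                return True
    return False

def filtered(f, x):
    """g(x): estimate frequency; if nonzero, expand to the nearest
    zero-frequency endpoints and interpolate between their values."""
    if not frequency_nonzero(f, x):
        return f[x]
    n = len(f)
    lo = next((i for i in range(x - 1, -1, -1)
               if not frequency_nonzero(f, i)), None)
    hi = next((i for i in range(x + 1, n)
               if not frequency_nonzero(f, i)), None)
    if lo is None:
        return f[hi]
    if hi is None:
        return f[lo]
    return (f[lo] + f[hi]) / 2
```

On f = [1, 2, 3, 100, 4, 5], only x = 3 has nonzero frequency; the filter answers 3.5 there and the original values elsewhere, so the filtered view is monotone.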

49 To prove: 1. Frequencies can be estimated in polylog time. 2. The function is monotone over the zero-frequency domain. 3. The zero-frequency domain occupies a (1 − 2ε) fraction.

50 Bivariate concave function: filter requires polylog(n) queries

51 Bipartite graph · k-connectivity · expander

52 Denoising low-dimensional attractor sets

53

54 Priced computation & accuracy: spectrometry/cloning/gene chip · PCR/hybridization/chromatography · gel electrophoresis/blotting · linear programming [figure: noisy binary data matrix]

55 Computation ↔ experimentation

56 Pricing data: ongoing project w/ Nir Ailon. “Factoring is easy. Here’s why…” Gaussian mixture sample: 00100101001001101010101 ….

57 Collaborators: Nir Ailon, Seshadri Comandur, Ding Liu

