So Much Data, So Little Time. Bernard Chazelle, Princeton University.

Presentation transcript:

So Much Data, So Little Time. Bernard Chazelle, Princeton University

So Many Slides, So Little Time (before lunch). Bernard Chazelle, Princeton University

computation math experimentation algorithms

Computers have two problems

1. They don’t have steering wheels

2. End of Moore’s Law: party’s over!

computation algorithms experimentation

32 x = 544 This is not me

FFT RSA

noisy low entropy uncertain unevenly priced big

Biomedical imaging Sloan Digital Sky Survey 4 petabytes (~1MG) 10 petabytes/yr 150 petabytes/yr

Collected works of Micha Sharir My A(9,9)-th paper

Sublinear Algorithms: massive input, output. Sample tiny fraction

Shortest Paths [C-Liu-Magen ’03] New York Delphi

Ray Shooting • Volume • Intersection • Point location

Approximate MST [C-Rubinfeld-Trevisan ’01]

Reduces to counting connected components
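The reduction can be written out as an identity. This is a restatement under assumptions the slide does not spell out: the graph G is connected with n vertices, edge weights are integers in {1, …, w}, G^{(i)} is the subgraph keeping only edges of weight at most i, and c^{(i)} is its number of connected components. The symbols are chosen here for illustration.

```latex
% MST weight M(G) expressed through connected-component counts
% (assumes a connected graph with integer edge weights in {1, ..., w})
\[
  M(G) \;=\; n - w + \sum_{i=1}^{w-1} c^{(i)}
\]
% Sanity check for w = 2: the MST uses n - c^{(1)} edges of weight 1
% and c^{(1)} - 1 edges of weight 2, so
\[
  M(G) \;=\; \bigl(n - c^{(1)}\bigr) + 2\bigl(c^{(1)} - 1\bigr) \;=\; n - 2 + c^{(1)} .
\]
```

Estimating each c^{(i)} by sampling therefore yields an estimate of the MST weight.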

E[estimate] = no. connected components; var[estimate] << (no. connected components)². Whp, the estimate is a good estimator of # connected components
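A minimal sketch of how such an estimator might look, assuming an adjacency-list graph. It uses the fact that the number of components equals the sum over all vertices u of 1/(size of u's component), samples a few vertices, and explores each one's component only up to a cap. This is a simplified illustration, not the exact procedure of the cited paper; the function names and the `cap` and `samples` parameters are hypothetical.

```python
import random
from collections import deque


def component_size_capped(adj, start, cap):
    """BFS from `start`; return the component size, or None if it exceeds `cap`."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if len(seen) > cap:
                    return None  # component too large: give up early
                queue.append(v)
    return len(seen)


def estimate_components(adj, samples, cap, rng):
    """Estimate the number of connected components from `samples` random vertices.

    Each sampled vertex contributes 1/size if its component has at most
    `cap` vertices and 0 otherwise, so components larger than `cap` are
    undercounted (a deliberate simplification in this sketch).
    """
    n = len(adj)
    total = 0.0
    for _ in range(samples):
        u = rng.randrange(n)
        size = component_size_capped(adj, u, cap)
        if size is not None:
            total += 1.0 / size
    return n * total / samples
```

The work per sample is bounded by the cap rather than by the size of the graph, which is what keeps the running time sublinear when few vertices are sampled.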

worst case input space average case (uniform)

worst case

average case = actuarial view

“OK, if you elect NOT to have the surgery, the insurance company offers 6 days and 7 nights in Barbados.”

Self-Improving Algorithms: arbitrary, unknown random source

Yes! This could be YOU, too!

E[T_k] → optimal expected time for random source. time T1, time T2, time T3, time T4

Clustering [Ailon-C-Liu-Comandur ’05] K-median over Hamming cube

minimize sum of distances
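To make the objective concrete, here is a minimal sketch of the k-median cost over the Hamming cube, assuming points are tuples of 0/1 bits. The function names are hypothetical. The `one_median` helper relies on a standard fact: for a single center, the coordinate-wise majority vote minimizes the sum of Hamming distances.

```python
def hamming(a, b):
    """Number of coordinates in which bit vectors a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def kmedian_cost(points, centers):
    """Sum, over all points, of the Hamming distance to the nearest center."""
    return sum(min(hamming(p, c) for c in centers) for p in points)


def one_median(points):
    """Optimal single center: coordinate-wise majority (ties resolved to 0)."""
    n = len(points)
    return tuple(int(2 * sum(column) > n) for column in zip(*points))
```

For example, `kmedian_cost(points, [one_median(points)])` gives the optimal cost for k = 1; for larger k the problem is where approximation algorithms come in.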

[Kumar-Sabharwal-Sen ’04] COST ≤ OPT (1 + ε)

How to achieve linear limiting time? Input space {0,1}^dn. prob < O(dn)/KSS. Identify core. Tail: Use KSS

Store sample of precomputed KSS Nearest neighbor Incremental algorithm

Main difficulty: How to spot the tail?

encode

decode

Data inaccessible before noise What makes you think it’s wrong?

Data inaccessible before noise must satisfy some property (eg, convex, bipartite) but does not quite

f(x) = ? x f(x) data f = access function

f(x) = ? x f(x) f = access function

f(x) = ? x f(x) But life being what it is…

f(x) = ? x f(x)

Humans Define distance from any object to data class

f(x) = ? x g(x) x 1, x 2,… f ( x 1), f ( x 2),… filter g is access function for:

Online Data Reconstruction

Monotone function: [n] → R^d. Filter requires polylog(n) lookups [Ailon-C-Liu-Comandur ’04]

Convex polygon Filter requires : lookups [C-Comandur ’06 ]

Convex terrain lookups Filter requires :

Iterated planar separator theorem

Iterated (weak) planar separator theorem in sublinear time!

Using epsilon-nets in spaces of unbounded VC dimension reconstruct

bipartite graph k-connectivity expander

denoising low-dim attractor sets

Priced computation & accuracy: spectrometry/cloning/gene chip, PCR/hybridization/chromatography, gel electrophoresis/blotting. Linear programming

Pricing data. Factoring is easy. Here’s why… Gaussian mixture sample: ….

Collaborators: Nir Ailon, Seshadri Comandur, Ding Liu, Avner Magen, Ronitt Rubinfeld, Luca Trevisan