Fast N-Body Algorithms for Massive Datasets Alexander Gray Georgia Institute of Technology.

Slides:

Advertisements

Similar presentations

Scaling Multivariate Statistics to Massive Data Algorithmic problems and approaches Alexander Gray Georgia Institute of Technology

Advertisements

Bayesian Belief Propagation

Principles of Density Estimation

Christian Lauterbach COMP 770, 2/16/2009. Overview  Acceleration structures  Spatial hierarchies  Object hierarchies  Interactive Ray Tracing techniques.

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) ETHEM ALPAYDIN © The MIT Press, 2010

Efficient access to TIN Regular square grid TIN Efficient access to TIN Let q := (x, y) be a point. We want to estimate an elevation at a point q: 1. should.

Yicheng Tu, § Shaoping Chen, §¥ and Sagar Pandit § § University of South Florida, Tampa, Florida, USA ¥ Wuhan University of Technology, Wuhan, Hubei, China.

Searching on Multi-Dimensional Data

Cse 521: design and analysis of algorithms Time & place T, Th pm in CSE 203 People Prof: James Lee TA: Thach Nguyen Book.

Support Vector Machines

Dynamic Bayesian Networks (DBNs)

Ai in game programming it university of copenhagen Statistical Learning Methods Marco Loog.

Computational Mathematics for Large-scale Data Analysis Alexander Gray Georgia Institute of Technology College of Computing FASTlab: Fundamental Algorithmic.

Visual Recognition Tutorial

Support Vector Machines and Kernel Methods

Bounding Volume Hierarchy “Efficient Distance Computation Between Non-Convex Objects” Sean Quinlan Stanford, 1994 Presented by Mathieu Brédif.

Instance Based Learning

Scalable Data Mining The Auton Lab, Carnegie Mellon University Brigham Anderson, Andrew Moore, Dan Pelleg, Alex Gray, Bob Nichols, Andy.

1 Efficient Algorithms for Non-Parametric Clustering With Clutter Weng-Keen Wong Andrew Moore.

Machine Learning on Massive Datasets Alexander Gray Georgia Institute of Technology College of Computing FASTlab: Fundamental Algorithmic and Statistical.

KNN, LVQ, SOM. Instance Based Learning K-Nearest Neighbor Algorithm (LVQ) Learning Vector Quantization (SOM) Self Organizing Maps.

Visual Recognition Tutorial

Efficient Distance Computation between Non-Convex Objects by Sean Quinlan presented by Teresa Miller CS 326 – Motion Planning Class.

1 Efficient Algorithms for Non-parametric Clustering With Clutter Weng-Keen Wong Andrew Moore (In partial fulfillment of the speaking requirement)

Statistical Learning: Pattern Classification, Prediction, and Control Peter Bartlett August 2002, UC Berkeley CIS.

Efficient Distance Computation between Non-Convex Objects By Sean Quinlan Presented by Sean Augenstein and Nicolas Lee.

Review Rong Jin. Comparison of Different Classification Models  The goal of all classifiers Predicating class label y for an input x Estimate p(y|x)

Tutorial on Statistical N-Body Problems and Proximity Data Structures

FLANN Fast Library for Approximate Nearest Neighbors

Radial Basis Function Networks

How to do Machine Learning on Massive Astronomical Datasets Alexander Gray Georgia Institute of Technology Computational Science and Engineering College.

How to do Fast Analytics on Massive Datasets Alexander Gray Georgia Institute of Technology Computational Science and Engineering College of Computing.

Binary Variables (1) Coin flipping: heads=1, tails=0 Bernoulli Distribution.

The Statistical Properties of Large Scale Structure Alexander Szalay Department of Physics and Astronomy The Johns Hopkins University.

Module 04: Algorithms Topic 07: Instance-Based Learning

Artificial Intelligence Lecture No. 28 Dr. Asad Ali Safi Assistant Professor, Department of Computer Science, COMSATS Institute of Information Technology.

Playing in High Dimensions Bob Nichol ICG, Portsmouth Thanks to all my colleagues in SDSS, GRIST & PiCA Special thanks to Chris Miller, Alex Gray, Gordon.

Data mining and machine learning A brief introduction.

Spatial Data Management Chapter 28. Types of Spatial Data Point Data –Points in a multidimensional space E.g., Raster data such as satellite imagery,

Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.

Stochastic Algorithms Some of the fastest known algorithms for certain tasks rely on chance Stochastic/Randomized Algorithms Two common variations – Monte.

The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.

K Nearest Neighbors Classifier & Decision Trees

The Selection Problem. 2 Median and Order Statistics In this section, we will study algorithms for finding the i th smallest element in a set of n elements.

Fast Statistical Algorithms in Databases Alexander Gray Georgia Institute of Technology College of Computing FASTlab: Fundamental Algorithmic and Statistical.

Forward-Scan Sonar Tomographic Reconstruction PHD Filter Multiple Target Tracking Bayesian Multiple Target Tracking in Forward Scan Sonar.

Clustering What is clustering? Also called “unsupervised learning”Also called “unsupervised learning”

Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.7: Instance-Based Learning Rodney Nielsen.

Randomized Algorithms for Bayesian Hierarchical Clustering

Sorting: Implementation Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.

Classification (slides adapted from Rob Schapire) Eran Segal Weizmann Institute.

Interacting Molecules in a Dense Fluid

Multi-dimensional Search Trees CS302 Data Structures Modified from Dr George Bebis.

An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-body Algorithm By Martin Burtscher and Keshav Pingali Jason Wengert.

KNN Classifier.  Handed an instance you wish to classify  Look around the nearby region to see what other classes are around  Whichever is most common—make.

SUPERVISED AND UNSUPERVISED LEARNING Presentation by Ege Saygıner CENG 784.

1 Kernel Machines A relatively new learning methodology (1992) derived from statistical learning theory. Became famous when it gave accuracy comparable.

CS 8751 ML & KDDInstance Based Learning1 k-Nearest Neighbor Locally weighted regression Radial basis functions Case-based reasoning Lazy and eager learning.

Clustering (2) Center-based algorithms Fuzzy k-means Density-based algorithms ( DBSCAN as an example ) Evaluation of clustering results Figures and equations.

Spatial Data Management

Data Transformation: Normalization

Data Science Algorithms: The Basic Methods

Distance Computation “Efficient Distance Computation Between Non-Convex Objects” Sean Quinlan Stanford, 1994 Presentation by Julie Letchner.

Real-Time Ray Tracing Stefan Popov.

KD Tree A binary search tree where every node is a

Instance Based Learning

Linear Discrimination

Data Mining CSCI 307, Spring 2019 Lecture 24

Data Mining CSCI 307, Spring 2019 Lecture 21

Data Mining CSCI 307, Spring 2019 Lecture 23

Presentation transcript:

Fast N-Body Algorithms for Massive Datasets Alexander Gray Georgia Institute of Technology

Is science in 2007 different from science in 1907? Instruments [Science, Szalay & J. Gray, 2001]

Is science in 2007 different from science in 1907? 1990 COBE 1, Boomerang 10, CBI 50, WMAP 1 Million 2008 Planck 10 Million Data: CMB Maps Data: Local Redshift Surveys 1986 CfA 3, LCRS 23, dF 250, SDSS 800,000 Data: Angular Surveys 1970 Lick 1M 1990 APM 2M 2005 SDSS 200M 2008 LSST 2B Instruments [Science, Szalay & J. Gray, 2001]

Sloan Digital Sky Survey (SDSS)

Size matters! Now possible: low noise: subtle patterns global properties and patterns rare objects and patterns more info: 3d, deeper/earlier, bands in parallel: more accurate simulations 2008: LSST – time-varying phenomena 1 billion objects 144 dimensions (~250M galaxies in 5 colors, ~1M 2000-D spectra)

Happening everywhere! Molecular biology microarray chips Earth sciences satellite topography Neuroscience functional MRI microprocessor nuclear mag. resonanceDrug discovery Physical simulation Internet fiber optics

1.How did galaxies evolve? 2.What was the early universe like? 3.Does dark energy exist? 4.Is our model (GR+inflation) right? Astrophysicist Machine learning/ statistics guy R. Nichol, Inst. Cosmol. Gravitation A. Connolly, U. Pitt Physics C. Miller, NOAO R. Brunner, NCSA G. Kulkarni, Inst. Cosmol. Gravitation D. Wake, Inst. Cosmol. Gravitation R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii Inst. Astronomy G. Richards, Princeton Physics A. Szalay, Johns Hopkins Physics

1.How did galaxies evolve? 2.What was the early universe like? 3.Does dark energy exist? 4.Is our model (GR+inflation) right? Astrophysicist Machine learning/ statistics guy O(N n ) O(N 2 ) O(N 3 ) O(c D T(N)) R. Nichol, Inst. Cosmol. Grav. A. Connolly, U. Pitt Physics C. Miller, NOAO R. Brunner, NCSA G. Kulkarni, Inst. Cosmol. Grav. D. Wake, Inst. Cosmol. Grav. R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii Inst. Astro. G. Richards, Princeton Physics A. Szalay, Johns Hopkins Physics Kernel density estimator n-point spatial statistics Nonparametric Bayes classifier Support vector machine Nearest-neighbor statistics Gaussian process regression Bayesian inference

R. Nichol, Inst. Cosmol. Grav. A. Connolly, U. Pitt Physics C. Miller, NOAO R. Brunner, NCSA G. Kulkarni, Inst. Cosmol. Grav. D. Wake, Inst. Cosmol. Grav. R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii Inst. Astro. G. Richards, Princeton Physics A. Szalay, Johns Hopkins Physics Kernel density estimator n-point spatial statistics Nonparametric Bayes classifier Support vector machine Nearest-neighbor statistics Gaussian process regression Bayesian inference 1.How did galaxies evolve? 2.What was the early universe like? 3.Does dark energy exist? 4.Is our model (GR+inflation) right? Astrophysicist Machine learning/ statistics guy O(N n ) O(N 2 ) O(N 3 ) O(c D T(N))

R. Nichol, Inst. Cosmol. Grav. A. Connolly, U. Pitt Physics C. Miller, NOAO R. Brunner, NCSA G. Kulkarni, Inst. Cosmol. Grav. D. Wake, Inst. Cosmol. Grav. R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii Inst. Astro. G. Richards, Princeton Physics A. Szalay, Johns Hopkins Physics Kernel density estimator n-point spatial statistics Nonparametric Bayes classifier Support vector machine Nearest-neighbor statistics Gaussian process regression Bayesian inference 1.How did galaxies evolve? 2.What was the early universe like? 3.Does dark energy exist? 4.Is our model (GR+inflation) right? But I have 1 million points Astrophysicist Machine learning/ statistics guy O(N n ) O(N 2 ) O(N 3 ) O(c D T(N))

Data: The Stack Apps (User, Science) Perception Computer Vision, NLP, Machine Translation, Bibleome, Autonomous vehicles ML / Opt Machine Learning / Optimization / Linear Algebra / Privacy Data Abstractions DBMS, MapReduce, VOTables, Clustering / Threading Programming with 1000s of powerful compute nodes O/S Network Motherboards / Datacenter ICs

Data: The Stack Apps (User, Science) Perception Computer Vision, NLP, Machine Translation, Bibleome, Autonomous vehicles ML / Opt Machine Learning / Optimization / Linear Algebra / Privacy Data Abstractions DBMS, MapReduce, VOTables, Data structures Clustering / Threading Programming with 1000s of powerful compute nodes O/S Network Motherboards / Datacenter ICs

Making fast algorithms There are many large datasets. There are many questions we want to ask them. –Why we must not get obsessed with one specific dataset. –Why we must not get obsessed with one specific question. The activity I’ll describe is about accerating computations which occur commonly across many ML methods.

Scope Nearest neighbor K-means Hierarchical clustering N-point correlation functions Kernel density estimation Locally-weighted regression Mean shift tracking Mixtures of Gaussians Gaussian process regression Manifold learning Support vector machines Affinity propagation PCA ….

Scope ML methods with distances underneath –Distances only –Continuous kernel functions ML methods with counting underneath

Scope Computational ideas in this tutorial: –Data structures –Monte Carlo –Series expansions –Problem/solution abstractions Challenges –Don’t introduce error, if possible –Don’t introduce tweak parameters, if possible

Two canonical problems Nearest-neighbor search Kernel density estimation

Ideas 1.Data structures and how to use them 2.Monte Carlo 3.Series expansions 4.Problem/solution abstractions

Slides by Jeremy Kubica Nearest Neighbor - Naïve Approach Given a query point X. Scan through each point Y: –Calculate the distance d(X,Y) –If d(X,Y) < best_seen then Y is the new nearest neighbor. Takes O(N) time for each query! 33 Distance Computations

Slides by Jeremy Kubica Speeding Up Nearest Neighbor We can speed up the search for the nearest neighbor: –Examine nearby points first. –Ignore any points that are further then the nearest point found so far. Do this using a KD-tree: –Tree based data structure –Recursively partitions points into axis aligned boxes.

Slides by Jeremy Kubica KD-Tree Construction PtXY ……… We start with a list of n-dimensional points.

Slides by Jeremy Kubica KD-Tree Construction PtXY ……… We can split the points into 2 groups by choosing a dimension X and value V and separating the points into X > V and X <= V. X>.5 PtXY ……… YES NO

Slides by Jeremy Kubica KD-Tree Construction PtXY ……… We can then consider each group separately and possibly split again (along same/different dimension). X>.5 PtXY ……… YES NO

Slides by Jeremy Kubica KD-Tree Construction PtXY ……… We can then consider each group separately and possibly split again (along same/different dimension). X>.5 PtXY ……… YES NO PtXY ……… Y>.1 NO YES

Slides by Jeremy Kubica KD-Tree Construction We can keep splitting the points in each set to create a tree structure. Each node with no children (leaf node) contains a list of points.

Slides by Jeremy Kubica KD-Tree Construction We will keep around one additional piece of information at each node. The (tight) bounds of the points at or below this node.

Slides by Jeremy Kubica KD-Tree Construction Use heuristics to make splitting decisions: Which dimension do we split along? Widest Which value do we split at? Median of value of that split dimension for the points. When do we stop? When there are fewer then m points left OR the box has hit some minimum width.

Slides by Jeremy Kubica Exclusion and inclusion, using point-node kd-tree bounds. O(D) bounds on distance minima/maxima:

Slides by Jeremy Kubica Exclusion and inclusion, using point-node kd-tree bounds. O(D) bounds on distance minima/maxima:

Slides by Jeremy Kubica Nearest Neighbor with KD Trees We traverse the tree looking for the nearest neighbor of the query point.

Slides by Jeremy Kubica Nearest Neighbor with KD Trees Examine nearby points first: Explore the branch of the tree that is closest to the query point first.

Slides by Jeremy Kubica Nearest Neighbor with KD Trees Examine nearby points first: Explore the branch of the tree that is closest to the query point first.

Slides by Jeremy Kubica Nearest Neighbor with KD Trees When we reach a leaf node: compute the distance to each point in the node.

Slides by Jeremy Kubica Nearest Neighbor with KD Trees When we reach a leaf node: compute the distance to each point in the node.

Slides by Jeremy Kubica Nearest Neighbor with KD Trees Then we can backtrack and try the other branch at each node visited.

Slides by Jeremy Kubica Nearest Neighbor with KD Trees Each time a new closest node is found, we can update the distance bounds.

Slides by Jeremy Kubica Nearest Neighbor with KD Trees Using the distance bounds and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.

Slides by Jeremy Kubica Nearest Neighbor with KD Trees Using the distance bounds and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.

Slides by Jeremy Kubica Nearest Neighbor with KD Trees Using the distance bounds and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.

Slides by Jeremy Kubica Simple recursive algorithm (k=1 case) NN(x q,R,d lo,x sofar,d sofar ) { if d lo > d sofar, return. if leaf(R), [x sofar,d sofar ]=NNBase(x q,R,d sofar ). else, [R1,d1,R2,d2]=orderByDist(x q,R.l,R.r). NN(x q,R1,d1,x sofar,d sofar ). NN(x q,R2,d2,x sofar,d sofar ). }

Slides by Jeremy Kubica Nearest Neighbor with KD Trees Instead, some animations showing real data… 1.kd-tree with cached sufficient statistics 2.nearest-neighbor with kd-trees 3.range-count with kd-trees For animations, see:

Range-count example

Pruned! (inclusion)

Range-count example

Pruned! (exclusion)

Range-count example

Some questions Asymptotic runtime analysis? –In a rough sense, O(logN) –But only under some regularity conditions How high in dimension can we go? –Roughly exponential in intrinsic dimension –In practice, in less than 100 dimensions, still big speedups

Another kind of tree Ball-trees, metric trees –Use balls instead of hyperrectangles –Can often be more efficient in high dimension (though not always) –Can work with non-Euclidean metric (you only need to respect the triangle inequality) –Many non-metric similarity measures can be bounded by metric quantities.

A Set of Points in a metric space

Ball Tree root node

A Ball Tree

J. Uhlmann, 1991 S. Omohundro, NIPS 1991

Ball-trees: properties Let Q be any query point and let x be a point inside ball B |x-Q|  |Q - B.center| - B.radius |x-Q|  |Q - B.center| + B.radius Q B.center x

How to build a metric tree, exactly? Must balance quality vs. build-time ‘Anchors hierarchy’ (farthest-points heuristic, 2-approx used in OR) Omohundro: ‘Five ways to build a ball- tree’ Which is the best? A research topic…

Some other trees Cover-tree –Provable worst-case O(logN) under an assumption (bounded expansion constant) –Like a non-binary ball-tree Learning trees –In this conference

‘All’-type problems Nearest-neighbor search All-nearest neighbor (bichromatic): Kernel density estimation ‘All’ version (bichromatic):

Almost always ‘all’-type problems Kernel density estimation Nadaraya-Watson & locally-wgtd regression Gaussian process prediction Radial basis function networks Monochromatic all-nearest neighbor (e.g. LLE) n-point correlation (n-tuples) Always ‘all’-type problems

Dual-tree idea If all the queries are available simultaneously, then it is faster to: 1. Build a tree on the queries as well 2. Effectively process the queries in chunks rather than individually  work is shared between similar query points

Single-tree:

Dual-tree (symmetric):

Exclusion and inclusion, using point-node kd-tree bounds. O(D) bounds on distance minima/maxima:

Exclusion and inclusion, using point-node kd-tree bounds. O(D) bounds on distance minima/maxima:

Exclusion and inclusion, using kd-tree node-node bounds. O(D) bounds on distance minima/maxima: (Analogous to point-node bounds.) Also needed: Nodewise bounds.

Exclusion and inclusion, using kd-tree node-node bounds. O(D) bounds on distance minima/maxima: Also needed: (Analogous to point-node bounds.) Nodewise bounds.

Single-tree: simple recursive algorithm (k=1 case) NN(x q,R,d lo,x sofar,d sofar ) { if d lo > d sofar, return. if leaf(R), [x sofar,d sofar ]=NNBase(x q,R,d sofar ). else, [R1,d1,R2,d2]=orderByDist(x q,R.l,R.r). NN(x q,R1,d1,x sofar,d sofar ). NN(x q,R2,d2,x sofar,d sofar ). }

Single-tree  Dual-tree x q  Q d lo (x q,R)  d lo (Q,R) x sofar  x sofar, d sofar  d sofar store Q.d sofar =max Q d sofar 2-way recursion  4-way recursion N x O(logN)  O(N)

Dual-tree: simple recursive algorithm (k=1) AllNN(Q,R,d lo,x sofar,d sofar ) { if d lo > Q.d sofar, return. if leaf(Q) & leaf(R), [x sofar,d sofar ]=AllNNBase(Q,R,d sofar ). Q.d sofar =max Q d sofar. else if !leaf(Q) & leaf(R), … else if leaf(Q) & !leaf(R), … else if !leaf(Q) & !leaf(R), [R1,d1,R2,d2]=orderByDist(Q.l,R.l,R.r). AllNN(Q.l,R1,d1,x sofar,d sofar ). AllNN(Q.l,R2,d2,x sofar,d sofar ). [R1,d1,R2,d2]=orderByDist(Q.r,R.l,R.r). AllNN(Q.r,R1,d1,x sofar,d sofar ). AllNN(Q.r,R2,d2,x sofar,d sofar ). Q.d sofar = max(Q.l.d sofar,Q.r.d sofar ). }

Query points Reference points Dual-tree traversal (depth-first)

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Query points Reference points Dual-tree traversal

Meta-idea: Higher-order Divide-and-conquer Break each set into pieces. Solving the sub-parts of the problem and combining these sub-solutions appropriately might be easier than doing this over only one set. Generalizes divide-and-conquer of a single set to divide-and-conquer of multiple sets.

Ideas 1.Data structures and how to use them 2.Monte Carlo 3.Series expansions 4.Problem/solution abstractions

2-point correlation r Characterization of an entire distribution? “How many pairs have distance < r ?” 2-point correlation function

The n-point correlation functions Spatial inferences: filaments, clusters, voids, homogeneity, isotropy, 2-sample testing, … Foundation for theory of point processes [Daley,Vere-Jones 1972], unifies spatial statistics [Ripley 1976] Used heavily in biostatistics, cosmology, particle physics, statistical physics 2pcf definition: 3pcf definition:

3-point correlation “How many triples have pairwise distances < r ?” r3r3 r1r1 r2r2 Standard model: n>0 terms should be zero!

How can we count n-tuples efficiently? “How many triples have pairwise distances < r ?”

Use n trees! [Gray & Moore, NIPS 2000]

“How many valid triangles a-b-c (where ) could there be? A B C r count{A,B,C} = ?

“How many valid triangles a-b-c (where ) could there be? count{A,B,C} = count{A,B, C.left } + count{A,B,C.right} A B C r

“How many valid triangles a-b-c (where ) could there be? A B C r count{A,B,C} = count{A,B,C.left} + count{A,B, C.right }

“How many valid triangles a-b-c (where ) could there be? A B C r count{A,B,C} = ?

“How many valid triangles a-b-c (where ) could there be? A B C r Exclusion count{A,B,C} = 0!

“How many valid triangles a-b-c (where ) could there be? AB C count{A,B,C} = ? r

“How many valid triangles a-b-c (where ) could there be? AB C Inclusion count{A,B,C} = |A| x |B| x |C| r Inclusion

Key idea (combinatorial proximity problems): for n-tuples: n-tree recursion

3-point runtime (biggest previous: 20K) VIRGO simulation data, N = 75,000,000 naïve: 5x10 9 sec. (~150 years) multi-tree: 55 sec. (exact) n=2: O(N) n=3: O(N log3 ) n=4: O(N 2 )

But… Depends on r D-1. Slow for large radii. VIRGO simulation data, N = 75,000,000 naïve: ~150 years multi-tree: large h: 24 hrs Let’s develop a method for large radii.

c = p T EASIER? known. hard. no dependence on N! but it does depend on p

c = p T no dependence on N! but it does depend on p

c = p T no dependence on N! but it does depend on p

c = p T no dependence on N! but it does depend on p

c = p T no dependence on N! but it does depend on p

This is junk: don’t bother c = p T

This is promising c = p T

Basic idea: 1. Remove some junk (Run exact algorithm for a while)  make p larger 2. Sample from the rest

Get disjoint sets from the recursion tree … … … [prune] all possible n-tuples

+ + = + + = Now do stratified sampling

Speedup Results VIRGO simulation data N = 75,000,000 naïve: ~150 years multi-tree: large h: 24 hrs multi-tree monte carlo: 99% confidence: 96 sec

Ideas 1.Data structures and how to use them 2.Monte Carlo 3.Multipole methods 4.Problem/solution abstractions

Kernel density estimation

How to use a tree… 1. How to approximate? 2. When to approximate? [Barnes and Hut, Science, 1987] if s R r

How to use a tree… 3. How to know potential error? Let’s maintain bounds on the true kernel sum At the beginning: R

Single-tree: Dual-tree (symmetric): [Gray & Moore 2000] How to use a tree… 4. How to do ‘all’ problem?

rRrR rQrQ s if Generalizes Barnes-Hut to dual-tree RQ

Case 1 – alg. gives no error bounds Case 2 – alg. gives error bounds, but must be rerun Case 3 – alg. automatically achieves error tolerance BUT: We have a tweak parameter: So far we have case 2; let’s try for case 3 Let’s try to make an automatic stopping rule

Finite-difference function approximation. Taylor expansion: Gregory-Newton finite form:

Finite-difference function approximation. assumes monotonic decreasing kernel approximate {Q,R} if

Speedup Results (KDE) One order-of-magnitude speedup over single-tree at ~2M points 12.5K K K K K1976*2 400K7904*5 800K31616*10 1.6M35 hrs23 dual- N naïve tree 5500x

Ideas 1.Data structures and how to use them 2.Monte Carlo 3.Multipole methods 4.Problem/solution abstractions

These are all examples of… Generalized N-body problems General theory and toolkit for designing algorithms for such problems All-NN: 2-point: 3-point: KDE: SPH:

For more… In this conference: Learning trees Monte Carlo for statistical summations Large-scale learning workshop EMST GNP’s and MapReduce-like parallelization Monte Carlo SVD