1 Three New Ideas in SDP-based Manifold Learning Alexander Gray Georgia Institute of Technology College of Computing FASTlab: Fundamental Algorithmic and Statistical Tools

2 The FASTlab: Fundamental Algorithmic and Statistical Tools Laboratory
1. Arkadas Ozakin: Research scientist, PhD Theoretical Physics
2. Dong Ryeol Lee: PhD student, CS + Math
3. Ryan Riegel: PhD student, CS + Math
4. Parikshit Ram: PhD student, CS + Math
5. William March: PhD student, Math + CS
6. James Waters: PhD student, Physics + CS
7. Hua Ouyang: PhD student, CS
8. Sooraj Bhat: PhD student, CS
9. Ravi Sastry: PhD student, CS
10. Long Tran: PhD student, CS
11. Michael Holmes: PhD student, CS + Physics (co-supervised)
12. Nikolaos Vasiloglou: PhD student, EE (co-supervised)
13. Wei Guan: PhD student, CS (co-supervised)
14. Nishant Mehta: PhD student, CS (co-supervised)
15. Wee Chin Wong: PhD student, ChemE (co-supervised)
16. Abhimanyu Aditya: MS student, CS
17. Yatin Kanetkar: MS student, CS
18. Praveen Krishnaiah: MS student, CS
19. Devika Karnik: MS student, CS
20. Prasad Jakka: MS student, CS

3 10 sample tasks
1. “Find engines like this one” (querying)
2. “Plot the distribution of engine sizes and emissions” (density estimation)
3. “Predict the lifetime maintenance cost” (regression)
4. “Predict existence of fault or not” (classification)
5. “Predict the number of failures next year” (time series analysis)
6. “Show all engines on a 2-d plot” (dimension reduction)
7. “Show or remove the unusual engines” (outlier detection)
8. “Show the different types of engines” (clustering)
9. “Is this group equivalent to this group?” (two-sample testing)
10. “What’s the best action to take based on this behavior?” (reinforcement learning/control)
Types of data: sensor measurements, documents, database records, etc.

4 Rankmap Can do manifold learning using only ordinal data

5 Isometric Separation Maps Preserve class proximity

6 Density-Preserving Maps Preserve densities, not distances

7 The problem: big datasets
Could be large: N (#data), D (#features), M (#models)

8 Dual-tree All-nearest-neighbors
O(N^2) → O(N)
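For reference, the quadratic baseline that the slide contrasts against can be sketched as follows. This is only the brute-force O(N^2) approach, not the dual-tree algorithm itself; the function name `all_nearest_neighbors_brute` is a hypothetical one chosen for illustration.

```python
import math

def all_nearest_neighbors_brute(points):
    """For each point, return the index of its nearest *other* point.

    Brute-force baseline: every pair is compared, so the cost is
    O(N^2) distance evaluations. (Illustrative only; not the dual-tree method.)
    """
    result = []
    for i, p in enumerate(points):
        best_j, best_d = None, math.inf
        for j, q in enumerate(points):
            if i == j:
                continue  # a point is not its own neighbor
            d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
            if d < best_d:
                best_j, best_d = j, d
        result.append(best_j)
    return result
```

The nested loop is what a tree-based method tries to avoid: pruning whole groups of far-away points at once instead of visiting every pair.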

9 Rank-approximate Nearest-neighbor Search
Distance approximation → rank approximation
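One simple way to see what a rank guarantee can look like is uniform sampling: if n points are drawn uniformly with replacement from N, the probability that at least one of them is among the true top-k neighbors is 1 - (1 - k/N)^n. The sketch below uses that fact; it is an illustration under that assumption and not necessarily the method behind this slide. The names `sample_size_for_rank` and `rank_approx_neighbor` are hypothetical.

```python
import math
import random

def sample_size_for_rank(n_points, max_rank, success_prob):
    """Smallest n such that n uniform draws (with replacement) hit the
    true top-`max_rank` neighbors with probability >= success_prob."""
    if not 0.0 < success_prob < 1.0:
        raise ValueError("success_prob must lie strictly between 0 and 1")
    if max_rank < 1:
        raise ValueError("max_rank must be at least 1")
    if max_rank >= n_points:
        return 1  # every point already satisfies the rank bound
    miss = 1.0 - max_rank / n_points  # chance one draw misses the top ranks
    return math.ceil(math.log(1.0 - success_prob) / math.log(miss))

def rank_approx_neighbor(query, points, max_rank, success_prob, rng=None):
    """Return the index of the closest point within a random sample.

    With probability >= success_prob the returned point is one of the
    `max_rank` nearest neighbors of `query`; its distance is not bounded.
    """
    rng = rng or random.Random()
    n = sample_size_for_rank(len(points), max_rank, success_prob)
    sampled = [rng.randrange(len(points)) for _ in range(n)]
    return min(sampled, key=lambda i: math.dist(query, points[i]))
```

The guarantee is stated in terms of rank rather than distance, which is the shift the slide title describes.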

10 Multi-scale Decompositions
e.g. kd-trees [Bentley 1975], [Friedman, Bentley & Finkel 1977], [Moore & Lee 1995]
How can we compute these efficiently?

11 A kd-tree: level 1

12 A kd-tree: level 2

13 A kd-tree: level 3

14 A kd-tree: level 4

15 A kd-tree: level 5

16 A kd-tree: level 6
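The level-by-level pictures above can be mimicked with a minimal kd-tree builder. This is a sketch under stated assumptions: it splits at the median and cycles through the coordinate axes, which is one common rule but is not confirmed by the slides; `KDNode`, `build_kdtree`, and `nodes_per_level` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class KDNode:
    points: List[Point]              # all points inside this node's region
    axis: Optional[int] = None       # splitting dimension (None for a leaf)
    split: Optional[float] = None    # splitting coordinate (None for a leaf)
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def build_kdtree(points, leaf_size=1, depth=0):
    """Recursively halve the point set at the median of one coordinate.

    Assumption: the split axis cycles with depth (0, 1, ..., D-1, 0, ...).
    """
    node = KDNode(points=list(points))
    if len(points) <= leaf_size:
        return node  # small enough: stop splitting
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    node.axis = axis
    node.split = ordered[mid][axis]
    node.left = build_kdtree(ordered[:mid], leaf_size, depth + 1)
    node.right = build_kdtree(ordered[mid:], leaf_size, depth + 1)
    return node

def nodes_per_level(root):
    """Count nodes at each depth, i.e. how many boxes each level would draw."""
    counts = []
    frontier = [root]
    while frontier:
        counts.append(len(frontier))
        frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
    return counts
```

Each level of the tree partitions the same data into twice as many regions as the level above it, which gives the multi-scale view of the dataset.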

17 Some application highlights
Our software is being put into the pipelines of the world’s massive-scale science projects
– Astronomy sky surveys (LSST, Pan-STARRS, DES): 1B objects/month
– Large Hadron Collider: 1M events/sec

18 Some application highlights
Others
– McAfee spam blacklisting: 300M emails/day
– Supermarket demand forecasting
– Algorithmic trading
– Audio fingerprint matching
– Legal document browsing and search

19 Software
MLPACK (C++)
– First scalable comprehensive ML library
MLPACK-db
– Fast data analytics in relational databases (SQL Server)
MLPACK Pro
– Very-large-scale data

