
Panel: New Opportunities in High Performance Data Analytics (HPDA) and High Performance Computing (HPC)
The 2014 International Conference on High Performance Computing & Simulation (HPCS 2014), July 21-25, 2014, The Savoia Hotel Regency, Bologna, Italy
Geoffrey Fox
School of Informatics and Computing, Digital Science Center, Indiana University Bloomington

SPIDAL (Scalable Parallel Interoperable Data Analytics Library)

Introduction to SPIDAL
- Learn from the success of PETSc, ScaLAPACK, etc. as HPC libraries
- Here we discuss Global Machine Learning (GML) as part of SPIDAL (Scalable Parallel Interoperable Data Analytics Library)
  - GML = machine learning parallelized over nodes
  - LML = pleasingly parallel machine learning run on each node
- Surprisingly little packaged scalable GML exists
  - Apache: Mahout has low performance and MLlib is just starting
  - R is largely sequential (best for local machine learning, LML)
- Our experience is based on four big data algorithms
  - Dimension reduction (Multidimensional Scaling, MDS)
  - Levenberg-Marquardt optimization
  - Clustering: similar to Gaussian Mixture Models, PLSI (probabilistic latent semantic indexing), and LDA (Latent Dirichlet Allocation)
  - Deep learning

Some Core Machine Learning Building Blocks

Algorithm | Applications | Features | Status | //ism
DA Vector Clustering | Accurate clusters | Vectors | P-DM | GML
DA Non-metric Clustering | Accurate clusters; biology, web | Non-metric, O(N²) | P-DM | GML
K-means: basic, fuzzy and Elkan | Fast clustering | Vectors | P-DM | GML
Levenberg-Marquardt Optimization | Non-linear Gauss-Newton, used in MDS | Least squares | P-DM | GML
SMACOF Dimension Reduction | DA-MDS with general weights | Least squares, O(N²) | P-DM | GML
Vector Dimension Reduction | DA-GTM and others | Vectors | P-DM | GML
TFIDF Search | Find nearest neighbors in a document corpus | Bag of "words" (image features) | P-DM | PP
All-pairs similarity search | Find pairs of documents with TFIDF distance below a threshold | | Todo | GML
Support Vector Machine (SVM) | Learn and classify | Vectors | Seq | GML
Random Forest | Learn and classify | Vectors | P-DM | PP
Gibbs sampling (MCMC) | Solve global inference problems | Graph | Todo | GML
Latent Dirichlet Allocation (LDA) with Gibbs sampling or variational Bayes | Topic models (latent factors) | Bag of "words" | P-DM | GML
Singular Value Decomposition (SVD) | Dimension reduction and PCA | Vectors | Seq | GML
Hidden Markov Models (HMM) | Global inference on sequence models | Vectors | Seq | PP & GML

Some General Issues: Parallelism

Some Parallelism Issues
- All of these algorithms use parallelism over data points, i.e. the entities to cluster or to map to Euclidean space
  - The exception is deep learning, which has parallelism over the pixel plane in neurons rather than over items in the training set, since Stochastic Gradient Descent only looks at small numbers of data items at a time
- Maximum likelihood or χ² both lead to a structure like
  Minimize Σ_{i=1}^{N} (positive nonlinear function of the unknown parameters for item i)
  (a minimal sketch of this structure follows this slide)
- All are solved iteratively with a (clever) first- or second-order approximation to the shift in the objective function
  - Sometimes the steepest-descent direction; sometimes Newton's method
  - They have the classic Expectation Maximization structure
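The sketch below is a minimal, illustrative example (not SPIDAL code) of the structure described above: an objective written as a sum over N data items of a positive nonlinear function of the unknown parameters, minimized iteratively with a first-order (steepest-descent) update. The per-item term used here is a hypothetical least-squares residual chosen only to make the example runnable.

```python
import numpy as np

def objective_and_gradient(theta, data):
    """Sum over items of a positive per-item term (hypothetical least-squares example)."""
    residuals = data @ theta[:-1] - theta[-1]      # one residual per data item
    obj = np.sum(residuals ** 2)                   # Minimize sum_{i=1}^{N} f_i(theta), f_i >= 0
    grad = np.empty_like(theta)
    grad[:-1] = 2.0 * data.T @ residuals
    grad[-1] = -2.0 * residuals.sum()
    return obj, grad

def steepest_descent(theta, data, steps=100, lr=1e-3):
    """First-order iterative approximation to the shift in the objective."""
    for _ in range(steps):
        _, grad = objective_and_gradient(theta, data)
        theta = theta - lr * grad
    return theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 5))              # N data items; parallelism is over these rows
    theta = steepest_descent(np.zeros(6), data)
```

In a parallel version, the sum over items (and hence the gradient) would be partitioned over nodes, which is exactly the data-point parallelism the slide refers to.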

Parameter "Server"
- Note that learning networks have a huge number of parameters (11 billion in the Stanford work), so it is inconceivable to look at the second derivative
- Clustering and MDS have lots of parameters, but it can be practical to look at the second derivative and use Newton's method to minimize
- Parameters are determined in a distributed fashion but are typically needed globally
  - MPI: use broadcast and "All.." collectives (a sketch of this pattern follows this slide)
  - AI community: use a parameter server and access parameters as needed
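As a rough illustration of the MPI-collective alternative to a parameter server, the sketch below (assuming mpi4py is available) has each rank compute a gradient over its local partition of the data and then uses Allreduce so that the summed gradient, and hence the updated global parameters, are available on every node. local_gradient() is a hypothetical placeholder for the per-node computation.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def local_gradient(theta, local_data):
    """Hypothetical gradient contribution from this rank's share of the data points."""
    residuals = local_data @ theta
    return local_data.T @ residuals

def distributed_step(theta, local_data, lr=1e-3):
    grad_local = local_gradient(theta, local_data)
    grad_global = np.empty_like(grad_local)
    comm.Allreduce(grad_local, grad_global, op=MPI.SUM)  # "All.." collective: global sum on every rank
    return theta - lr * grad_global                      # identical parameter update everywhere

if __name__ == "__main__":
    rng = np.random.default_rng(comm.Get_rank())
    local_data = rng.normal(size=(1000, 10))             # this rank's partition of the data
    theta = np.zeros(10)
    for _ in range(20):
        theta = distributed_step(theta, local_data)
```

A parameter-server design would instead keep theta on dedicated server processes that workers pull from and push gradients to asynchronously, which scales to the very large parameter counts mentioned above.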

Some Important Cases
- Need to cover both non-vector semimetric spaces and vector spaces for clustering and dimension reduction (N points in the space)
- Vector spaces have Euclidean distances and scalar products
  - Algorithms can be O(N), and these are best for clustering; but for MDS, O(N) methods may not be best, as the obvious objective function is O(N²)
- MDS minimizes the stress
  σ(X) = Σ_{i<j=1}^{N} weight(i, j) (δ(i, j) - d(X_i, X_j))²
  (a sketch of this computation follows this slide)
- Semimetric spaces just have pairwise distances δ(i, j) defined between points in the space
- Note that the matrix solvers all use conjugate gradient, which converges in far fewer than N iterations; a big gain for a matrix with a million rows, as it removes a factor of N in the time complexity
  - The matrices are full, not sparse as in HPCG
- In clustering, the ratio of #clusters to #points is important; new ideas are needed if this ratio is >~ 0.1
- There is quite a lot of work on clever methods of reducing O(N²) to O(N) and O(N log N)
  - This is extensively used in search but not in "arithmetic" as in MDS or semimetric clustering
  - The arithmetic is similar to fast multipole methods for O(N²) particle dynamics
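The following sketch (not the SMACOF implementation) just evaluates the weighted MDS stress defined above: given a dissimilarity matrix δ and a candidate embedding X, it sums weight(i, j)·(δ(i, j) − d(X_i, X_j))² over all pairs i < j. The explicit double loop makes the O(N²) cost discussed above visible.

```python
import numpy as np

def mds_stress(X, delta, weight=None):
    """Weighted stress of embedding X against the dissimilarity matrix delta."""
    n = X.shape[0]
    if weight is None:
        weight = np.ones((n, n))
    stress = 0.0
    for i in range(n):
        for j in range(i + 1, n):               # pairs with i < j: O(N^2) terms
            d_ij = np.linalg.norm(X[i] - X[j])  # Euclidean distance in the target space
            stress += weight[i, j] * (delta[i, j] - d_ij) ** 2
    return stress

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.normal(size=(50, 10))                                   # original high-dimensional data
    delta = np.linalg.norm(points[:, None] - points[None, :], axis=-1)   # given dissimilarities
    X = rng.normal(size=(50, 3))                                         # candidate 3D embedding
    print(mds_stress(X, delta))
```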

Some Futures
- Always run MDS; it gives insight into the data
  - It leads to a data browser, as GIS gives for spatial data
- The claim is that algorithm changes gave as much performance increase as hardware changes in simulations. Will this happen in analytics?
  - Today is like parallel computing 30 years ago, with regular meshes
  - We will learn how to adapt methods automatically to give "multigrid"- and "fast multipole"-like algorithms
- Need to start developing the libraries that support big data
  - Understand architecture issues
  - Have coupled batch and streaming versions
  - Develop much better algorithms
- Please join the SPIDAL (Scalable Parallel Interoperable Data Analytics Library) community