Interactive Website (https://dexterrules.github.io/SC-Demo-17/SC-Demo.html)

HPC-ABDS is Cloud-HPC interoperable software with the performance of HPC (High Performance Computing) and the rich functionality of the commodity Apache Big Data Stack. The concept is illustrated by Harp-DAAL, which is organized in three levels:
High level, usability: Python interface, well-documented and packaged modules
Middle level, data-centric abstractions: computation models and optimized communication patterns
Low level, optimized for performance: HPC kernels from Intel® DAAL and advanced hardware platforms such as Xeon and Xeon Phi
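As a rough picture of this layering, the Java sketch below mocks the separation of concerns: a user-facing entry point delegates to a data-centric layer that partitions data and combines partial results, which in turn calls a low-level compute kernel. None of the names are Harp-DAAL's real API; in the actual stack the top layer is exposed in Python and the bottom layer is an Intel DAAL kernel.

```java
import java.util.Arrays;

// Purely illustrative mock of the three-level design described above; none of
// these class or method names are Harp-DAAL's real API. It shows how a
// high-level call can delegate through a data-centric layer (partition data,
// combine partial results) to a low-level compute kernel.
public class LayeredStackSketch {

    // Low level: stands in for an optimized kernel (Intel DAAL in the real stack).
    static double[] kernelPartialSum(double[][] shard) {
        double sum = 0.0;
        for (double[] row : shard) sum += row[0];
        return new double[]{sum, shard.length};   // {sum, count}
    }

    // Middle level: data-centric abstraction - partition the data, run the
    // kernel per partition, and combine the partial results (in a distributed
    // run this combine step would be a collective such as allreduce).
    static double dataCentricMean(double[][] data, int partitions) {
        double sum = 0.0, count = 0.0;
        for (int p = 0; p < partitions; p++) {
            int start = p * data.length / partitions;
            int end = (p + 1) * data.length / partitions;
            double[] partial = kernelPartialSum(Arrays.copyOfRange(data, start, end));
            sum += partial[0];
            count += partial[1];
        }
        return sum / count;
    }

    // High level: the user-facing entry point (a Python interface in the real stack).
    public static void main(String[] args) {
        double[][] data = {{1.0}, {2.0}, {3.0}, {4.0}};
        System.out.println("Mean = " + dataCentricMean(data, 2));
    }
}
```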

Datasets: 5 million points, 10 thousand centroids, 10 feature dimensions; 10 to 20 nodes of Intel KNL 7250 processors; Harp-DAAL has 15x speedups over Spark MLlib.
Datasets: 500K or 1 million data points of feature dimension 300; running on a single KNL 7250 (Harp-DAAL) vs. a single K80 GPU (PyTorch); Harp-DAAL achieves 3x to 6x speedups.
Datasets: Twitter with 44 million vertices, 2 billion edges, subgraph templates of 10 to 12 vertices; 25 nodes of Intel Xeon E5-2670; Harp-DAAL has 2x to 5x speedups over the state-of-the-art MPI-Fascia solution.

Explore Algorithms: Alternating least squares (ALS), PCA, Subgraph Counting, Matrix Factorization-SGD

Solutions to Big Data Problems
Computation Models (Model-Centric Synchronization Paradigm): Locking, Rotation, Asynchronous, Allreduce
Communication Operations (Distributed Memory): broadcast, reduce, allgather, allreduce, regroup, push & pull, rotate, sendEvent, getEvent, waitEvent
High Performance Libraries: Intel® DAAL, MKL
Shared Memory Schedulers: Dynamic Scheduler, Static Scheduler
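Of these computation models, Rotation is the least obvious from the operation names alone: model partitions move between workers rather than being globally combined. The sketch below is a minimal single-process simulation of that idea, assuming nothing about Harp's actual classes or method signatures.

```java
import java.util.Arrays;

// Conceptual single-process simulation of the Rotation computation model:
// worker w holds one model partition at a time; after each local update the
// rotate collective passes partitions around a ring, so over one epoch every
// worker updates every partition without any worker holding the full model.
// This sketches the idea only; it is not Harp's actual API.
public class RotationSketch {
    public static void main(String[] args) {
        int workers = 4;
        double[] modelPartitions = new double[workers];  // partition p of the model
        int[] updatesPerPartition = new int[workers];

        for (int step = 0; step < workers; step++) {       // one epoch = 'workers' steps
            for (int w = 0; w < workers; w++) {
                int p = (w + step) % workers;              // partition worker w holds now
                modelPartitions[p] += 0.1 * (w + 1);       // stand-in for a real model update
                updatesPerPartition[p]++;
            }
            // A rotate operation would now move each partition to the next worker;
            // here the index arithmetic above plays that role.
        }
        System.out.println("Partitions: " + Arrays.toString(modelPartitions));
        System.out.println("Updates per partition: " + Arrays.toString(updatesPerPartition));
    }
}
```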

Example: K-means Clustering with the Allreduce Computation Model (Model / Worker)
When the model size is small: broadcast, reduce, allreduce
When the model size is large but can still be held in each machine's memory: allgather, regroup, push & pull
When the model size cannot be held in each machine's memory: rotate
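The allreduce case can be sketched without any framework: each worker computes partial centroid sums over its data shard, the partial sums are combined (the role of an allreduce collective in Harp), and every worker then derives the same updated centroids. The Java sketch below is a minimal single-process simulation of that pattern; the class and variable names are illustrative, not Harp's API.

```java
import java.util.Arrays;

// Minimal single-process simulation of the allreduce K-means pattern:
// each "worker" computes partial centroid sums over its shard, the partial
// sums are summed (the allreduce step), and every worker then derives the
// same updated centroids. Names are illustrative only.
public class AllreduceKMeansSketch {
    public static void main(String[] args) {
        double[][] data = {{1.0}, {1.2}, {8.0}, {8.4}, {0.8}, {7.6}};
        double[] centroids = {0.0, 10.0};          // k = 2, one-dimensional points
        int workers = 2, k = centroids.length;

        for (int iter = 0; iter < 5; iter++) {
            double[] sums = new double[k];
            int[] counts = new int[k];

            // Each worker processes its shard and produces partial sums.
            for (int w = 0; w < workers; w++) {
                int start = w * data.length / workers;
                int end = (w + 1) * data.length / workers;
                for (int i = start; i < end; i++) {
                    int nearest = 0;
                    for (int c = 1; c < k; c++) {
                        if (Math.abs(data[i][0] - centroids[c])
                                < Math.abs(data[i][0] - centroids[nearest])) {
                            nearest = c;
                        }
                    }
                    // In a distributed run these would be per-worker tables
                    // combined by an allreduce; here they share one array.
                    sums[nearest] += data[i][0];
                    counts[nearest]++;
                }
            }

            // After the (simulated) allreduce, every worker updates its model copy.
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) centroids[c] = sums[c] / counts[c];
            }
        }
        System.out.println("Centroids: " + Arrays.toString(centroids));
    }
}
```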

Result: sample images of the resulting clusters, one row per cluster.

Harp-DAAL: Prototype and Production Code
Source code became available on GitHub in February 2017. Harp-DAAL follows the same standards as DAAL's original code.
Twelve applications: Harp-DAAL K-means, Harp-DAAL MF-SGD, Harp-DAAL MF-ALS, Harp-DAAL SVD, Harp-DAAL PCA, Harp-DAAL Neural Networks, Harp-DAAL Naïve Bayes, Harp-DAAL Linear Regression, Harp-DAAL Ridge Regression, Harp-DAAL QR Decomposition, Harp-DAAL Low Order Moments, Harp-DAAL Covariance
Available at https://dsc-spidal.github.io/harp