Tutorial Overview February 2017


NSF 1443054: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science
Software: MIDAS HPC-ABDS
http://dsc.soic.indiana.edu/publications/SPIDALTutorialProgram-Feb2017.pdf

SPIDAL Project
Datanet: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science
NSF14-43054 started October 1, 2014
Indiana University (Fox, Qiu, Crandall, von Laszewski)
Rutgers (Jha)
Virginia Tech (Marathe)
Kansas (Paden)
Stony Brook (Wang)
Arizona State (Beckstein)
Utah (Cheatham)
A co-design project: Software, algorithms, applications

Status of NSF 1443054 Project
Big Data Application Analysis identifies features of data-intensive applications that need to be supported in software and represented in benchmarks. This analysis was started for the proposal and has been extended to support HPC-Simulations-Big Data convergence.
The project is a collaboration between computer and domain scientists in the application areas of Biomolecular Simulations, Network Science, Epidemiology, Computer Vision, Spatial Geographical Information Systems, Remote Sensing for Polar Science, and Pathology Informatics.
HPC-ABDS, Cloud-HPC interoperable software that combines the performance of HPC (High Performance Computing) with the rich functionality of the commodity Apache Big Data Stack, was a bold idea developed for the proposal. We have successfully delivered and extended this approach, which is one of the ideas described in the Exascale Big Data report.

Status of NSF 1443054 Project
MIDAS, the integrating middleware that links HPC and ABDS, now has several components, including an architecture for Big Data analytics and an integration of HPC in communication and scheduling on ABDS; it also has rules for obtaining high-performance Java scientific code.
SPIDAL (Scalable Parallel Interoperable Data Analytics Library) now has 20 members with domain-specific (general) and core algorithms.
Benchmarks: We reached out to the database community with a keynote and paper at the WBDB2015 Benchmarking Workshop.
Language: SPIDAL Java runs as fast as C++.
Designed and proposed HPCCloud as hardware-software infrastructure supporting:
  Big Data Big Simulation Convergence
  Big Data Management via Apache Stack ABDS
  Big Data Analytics using SPIDAL and other libraries

HPC and/or Cloud 1.0 2.0 3.0
Cloud 1.0: IaaS, PaaS
Cloud 2.0: DevOps
Cloud 3.0: Insight (Solution) as a Service from IBM; serverless computing; event-driven function as a service
HPC 1.0 and Cloud 1.0: separate ecosystems
HPCCloud or HPC-ABDS: Take performance of HPC and functionality of Cloud (Big Data) systems
HPCCloud 2.0: Use DevOps to invoke HPC or Cloud software on VM, Docker, HPC infrastructure
HPCCloud 3.0: Automate Solution as a Service using HPC-ABDS on varied infrastructure suitable for HPC and Big Data Management and Analytics

Contents of Tutorial I
Introduction
  HPC Cloud and Big Data, Big Simulation, HPC Convergence
  Big Data Use Cases: Ogres and Diamonds. Examples available
Examples of HPC Analytics and applications: Clustering, dimension reduction, visualization
  Deterministic Annealing Algorithms for Big Data Analytics
  WebPlotViz browser 3D viewer
  Tutorial to use these SPIDAL tools
General HPC-ABDS: High Performance Computing Enhanced Apache Big Data Stack
  SPIDAL Java making Java run fast
  Parallel Performance of core SPIDAL Data Analytics
  Comparison of Parallel and Distributed Computing in MPI, Spark, Flink

Contents of Tutorial II
HPC-ABDS MIDAS Components
  RADICAL-Pilot, Pilot-YARN, Pilot-Spark Technology and Application to Biomolecular Simulations
  Harp Plugin for Hadoop and several algorithms (LDA)
  Streaming Applications in HPC-ABDS
SPIDAL Algorithms
  General discussion
  Graph Algorithms
  SPIDAL Image & Model Fitting Abstractions

Contents of Tutorial III
Polar Science Applications
Pathology Applications
Public Health
Geospatial applications
MDAnalysis Biomolecular Simulations
HPC Cloud Convergence
HPCCloud and Software Defined Systems
SPIDAL Status and Challenges