NSF14-43054 start October 1, 2014 Datanet: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science Indiana University.

Presentation transcript:

NSF14-43054, start October 1, 2014
Datanet: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science
Indiana University (Fox, Qiu, Crandall, von Laszewski), Rutgers (Jha), Virginia Tech (Marathe), Kansas (Paden), Stony Brook (Wang), Arizona State (Beckstein), Utah (Cheatham)
Overview by Gregor von Laszewski, April 6, 2016
http://spidal.org/
http://news.indiana.edu/releases/iu/2014/10/big-data-dibbs-grant.shtml
http://www.nsf.gov/awardsearch/showAward?AWD_ID=1443054

Some Important Components of SPIDAL Dibbs
NIST Big Data Application Analysis: features of data-intensive applications
HPC-ABDS: Cloud-HPC interoperable software combining the performance of HPC (High Performance Computing) with the rich functionality of the commodity Apache Big Data Stack
  This is a reservoir of software subsystems, nearly all from outside the project, and a mix of HPC and Big Data communities
  Leads to Big Data - Simulation - HPC Convergence
MIDAS: Integrating Middleware, from the project
Applications: Biomolecular Simulations, Network and Computational Social Science, Epidemiology, Computer Vision, Spatial Geographical Information Systems, Remote Sensing for Polar Science, and Pathology Informatics
SPIDAL (Scalable Parallel Interoperable Data Analytics Library): Scalable Analytics for
  Domain-specific data analytics libraries, mainly from the project
  Add Core Machine Learning Libraries, mainly from the community
  Performance of Java and MIDAS Inter- and Intra-node
Benchmarks: the project adds to the community; see WBDB 2015, the Seventh Workshop on Big Data Benchmarking

64 Features in 4 views for Unified Classification of Big Data and Simulation Applications
[Figure: feature classification diagram. Recoverable labels: Simulations; Analytics (Model for Data); Both; (All Model); (Nearly all Data+Model); (Nearly all Data); (Mix of Data and Model)]

Java MPI performs better than Threads
128 24-core Haswell nodes on SPIDAL DA-MDS code
[Figure: performance plot. Legend: Best MPI, inter- and intra-node; Best Threads intra-node with MPI inter-node]
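The two layouts compared on this slide can be sketched as a global sum computed two ways. This is only an illustration in Python on a single machine, not the SPIDAL Java code and not real MPI: the function names (`flat_reduce`, `hybrid_reduce`, `split`) are hypothetical, the "ranks" are simulated by list chunks, and nothing here measures or explains the performance difference the slide reports.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Local work done by one rank or one thread."""
    return sum(chunk)


def split(data, parts):
    """Split data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for p in range(parts):
        end = start + size + (1 if p < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def flat_reduce(data, nodes, cores_per_node):
    """All-MPI layout (simulated): one rank per core, and every rank
    contributes directly to the global reduction."""
    rank_sums = [partial_sum(c) for c in split(data, nodes * cores_per_node)]
    return sum(rank_sums), len(rank_sums)


def hybrid_reduce(data, nodes, cores_per_node):
    """Hybrid layout (simulated): threads combine results inside each node,
    then one value per node enters the global reduction."""
    node_sums = []
    for node_data in split(data, nodes):
        with ThreadPoolExecutor(max_workers=cores_per_node) as pool:
            thread_sums = pool.map(partial_sum, split(node_data, cores_per_node))
            node_sums.append(sum(thread_sums))
    return sum(node_sums), len(node_sums)


if __name__ == "__main__":
    values = list(range(1000))
    print(flat_reduce(values, nodes=4, cores_per_node=3))
    print(hybrid_reduce(values, nodes=4, cores_per_node=3))
```

Each function returns the total and the number of participants in the inter-node step. For the configuration named on the slide (128 nodes with 24 cores each), the flat layout would have 128 x 24 = 3072 participants and the hybrid layout 128.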

Big Data and (Exascale) Simulation Convergence II

HPC-ABDS Mapping of Activities
Level 17: Orchestration. Apache Beam (Google Cloud Dataflow) integrated with Cloudmesh
Level 16: Applications. Datamining for molecular dynamics, image processing for remote sensing and pathology, graphs, streaming
Level 16: Algorithms. MLlib, Mahout, SPIDAL. Release SPIDAL Clustering, Dimension Reduction (multi-dimensional scaling), Web point set visualization
Level 14: Programming. Storm, Heron, Hadoop, Spark, Flink. Inter- and intra-node performance
Level 13: Communication. Enhanced Storm and Hadoop using HPC runtime technologies
Level 11: Data management. HBase and MongoDB integrated via use of Beam and other Apache tools
Level 9: Cluster Management. Integrate Pilot Jobs with Yarn, Spark, Hadoop
Level 6: DevOps. Python Cloudmesh virtual cluster interoperability
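For readers unfamiliar with the dimension reduction listed under Algorithms, multi-dimensional scaling places points in a low-dimensional space so that their pairwise distances approximate a given distance matrix. The following is a minimal sketch of plain unweighted SMACOF iteration in pure Python. It is an assumption-laden illustration, not the SPIDAL DA-MDS code: it omits deterministic annealing, weights, and all parallelism, and every function name is hypothetical.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def stress(delta, points):
    """Raw stress: sum over pairs of (target distance - embedded distance)^2."""
    d = pairwise_distances(points)
    n = len(points)
    return sum((delta[i][j] - d[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def smacof_step(delta, points):
    """One unweighted Guttman transform: x_i <- (1/n) * sum_j (delta_ij / d_ij) * (x_i - x_j)."""
    n, dim = len(points), len(points[0])
    d = pairwise_distances(points)
    new_points = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if i == j or d[i][j] == 0.0:
                continue  # coincident points contribute nothing
            ratio = delta[i][j] / d[i][j]
            for k in range(dim):
                row[k] += ratio * (points[i][k] - points[j][k])
        new_points.append(tuple(v / n for v in row))
    return new_points


def embed(delta, dim=2, iterations=100, seed=0):
    """Start from a random layout and record the stress after every step."""
    rng = random.Random(seed)
    points = [tuple(rng.uniform(-1, 1) for _ in range(dim)) for _ in range(len(delta))]
    history = [stress(delta, points)]
    for _ in range(iterations):
        points = smacof_step(delta, points)
        history.append(stress(delta, points))
    return points, history


if __name__ == "__main__":
    truth = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
    points, history = embed(pairwise_distances(truth))
    print("initial stress:", history[0], "final stress:", history[-1])
```

The stress never increases from one step to the next, which is the property that makes SMACOF usable as the inner loop of annealed variants. The iteration can still settle in a local minimum, so the final stress is not guaranteed to reach zero.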