Geoffrey Fox Panel Talk: February

Slides:



Advertisements
Similar presentations
Indiana University Faculty Geoffrey Fox, David Crandall, Judy Qiu, Gregor von Laszewski Dibbs Research at Digital Science
Advertisements

Cyberinfrastructure Supporting Social Science Cyberinfrastructure Workshop October Chicago Geoffrey Fox
Iterative computation is a kernel function to many data mining and data analysis algorithms. Missing in current MapReduce frameworks is collective communication,
1 Challenges Facing Modeling and Simulation in HPC Environments Panel remarks ECMS Multiconference HPCS 2008 Nicosia Cyprus June Geoffrey Fox Community.
Big Data and Clouds: Challenges and Opportunities NIST January Geoffrey Fox
DISTRIBUTED COMPUTING
Data Science at Digital Science October Geoffrey Fox Judy Qiu
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
NA-MIC National Alliance for Medical Image Computing Core 1b – Engineering Computational Platform Jim Miller GE Research.
Big Data to Knowledge Panel SKG 2014 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China August Geoffrey Fox
Applications and Requirements for Scientific Workflow May NSF Geoffrey Fox Indiana University.
DS-Grid: Large Scale Distributed Simulation on the Grid Georgios Theodoropoulos Midlands e-Science Centre University of Birmingham, UK Stephen John Turner,
Directions in eScience Interoperability and Science Clouds June Interoperability in Action – Standards Implementation.
SALSASALSA Large-Scale Data Analysis Applications Computer Vision Complex Networks Bioinformatics Deep Learning Data analysis plays an important role in.
Indiana University Faculty Geoffrey Fox, David Crandall, Judy Qiu, Gregor von Laszewski Data Science at Digital Science Center.
Centre of Excellence in Physics at Extreme Scales Richard Kenway.
1 Panel on Merge or Split: Mutual Influence between Big Data and HPC Techniques IEEE International Workshop on High-Performance Big Data Computing In conjunction.
Public Health February 2017
Big Data Analytics and HPC Platforms
Panel: Beyond Exascale Computing
SPIDAL Java Optimized February 2017 Software: MIDAS HPC-ABDS
Optimization: Algorithms and Applications
SPIDAL Analytics Performance February 2017
Geoffrey Fox, Shantenu Jha, Dan Katz, Judy Qiu, Jon Weissman
MIDAS- Molecular Dynamics Analysis Tutorial February 2017
Status and Challenges: January 2017
Image & Model Fitting Abstractions February 2017
Pathology Spatial Analysis February 2017
Design and Manufacturing in a Distributed Computer Environment
HPC Cloud Convergence February 2017 Software: MIDAS HPC-ABDS
National e-Infrastructure Vision
Modern Data Management
NSF start October 1, 2014 Datanet: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science Indiana University.
University of Technology
Interactive Website (
NSF : CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science PI: Geoffrey C. Fox Software: MIDAS HPC-ABDS.
Department of Intelligent Systems Engineering
Digital Science Center I
iSERVOGrid Architecture Working Group Brisbane Australia June
HPSA18: Logistics 7:00 am – 8:00 am Breakfast
I590 Data Science Curriculum August
Applications SPIDAL MIDAS ABDS
Summary of Streaming Data Workshop STREAM2015 October
High Performance Big Data Computing in the Digital Science Center
Data Science Curriculum March
Tutorial Overview February 2017
EXCITED Workshop Suvrajeet Sen, DMII, ENG – CI Coordinator Workshop
Department of Intelligent Systems Engineering
Data Science for Life Sciences Research & the Public Good
Hilton Hotel Honolulu Tapa Ballroom 2 June 26, 2017 Geoffrey Fox
Martin Swany Gregor von Laszewski Thomas Sterling Clint Whaley
Research in Digital Science Center
Scalable Parallel Interoperable Data Analytics Library
Cloud DIKW based on HPC-ABDS to integrate streaming and batch Big Data
Summary of Streaming Data Workshop STREAM2015 October
Panel: Revisiting Distributed Simulation and the Grid
Discussion: Cloud Computing for an AI First Future
Cyberinfrastructure and PolarGrid
Chapter 7 –Implementation Issues
Embedded and Real-Time Systems
Department of Intelligent Systems Engineering
Digital Science Center
$1M a year for 5 years; 7 institutions Active:
PHI Research in Digital Science Center
PolarGrid and FutureGrid
Panel on Research Challenges in Big Data
Digital Science Center
Big Data, Simulations and HPC Convergence
Geoffrey Fox High-Performance Big Data Computing: International, National, and Local initiatives COLLABORATORS China and IU: Fudan University, SICE, OVPR.
Convergence of Big Data and Extreme Computing
Presentation transcript:

Geoffrey Fox Panel Talk: February 15 2017 Software: MIDAS HPC-ABDS NSF 1443054: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science Geoffrey Fox Panel Talk: February 15 2017

Current challenges and opportunities for big data research and development

Some Questions? Discover what data analytics is being used or could be used in different research areas. e.g. How important is deep learning in a general science field? e.g. what data analytics is needed for Precision Medicine, SKA, explosion of light source image data This is non-trivial as need training as not so many experts in both application fields and modern big data approaches Get a better consensus as to requirements of infrastructure What fraction of data analysis can use modern cloud technology What fraction require high performance computing How important is streaming data How important is interactive use Integrate the many distributed instruments (microscopes, sequencers) Implications of exascale technologies and “End of Moore’s Law” Research replacement of O(N2) algorithms systematically to O(NlogN)

Status of NSF 1443054 Project Big Data Application Analysis identifies features of data intensive applications that need to be supported in software and represented in benchmarks. This analysis was started for proposal and has been extended to support HPC-Simulations-Big Data convergence. The project is a collaboration between computer and domain scientists in application areas in Biomolecular Simulations, Network Science, Epidemiology, Computer Vision, Spatial Geographical Information Systems, Remote Sensing for Polar Science and Pathology Informatics. HPC-ABDS as Cloud-HPC interoperable software with performance of HPC (High Performance Computing) and the rich functionality of the commodity Apache Big Data Stack was an idea developed for proposal. We have successfully delivered and extended this approach, which is one of ideas described in Exascale Big Data report.

Status of NSF 1443054 Project MIDAS integrating middleware that links HPC and ABDS now has several components including an architecture for Big Data analytics, an integration of HPC in communication and scheduling on ABDS; it also has rules to get high performance Java scientific code. SPIDAL (Scalable Parallel Interoperable Data Analytics Library) now has 20 members with domain specific (general) and core algorithms. Benchmarks. Need to develop benchmarks covering these issues as most big data benchmarks have a commercial focus Language: SPIDAL Java runs as fast as C++ Designed and Proposed HPCCloud as hardware-software infrastructure supporting Big Data Big Simulation Convergence Big Data Management via Apache Stack ABDS Big Data Analytics using SPIDAL and other libraries