Digital Science Center I

Geoffrey C. Fox, David Crandall, Judy Qiu, Gregor von Laszewski, Fugang Wang, Badi' Abdul-Wahid, Saliya Ekanayake, Supun Kamburugamuva, Jerome Mitchell, Bingjing Zhang, Pulasthi Wickramasinghe, Hyungro Lee, Andrew Younge
School of Informatics and Computing, Indiana University

Digital Science Center Research Areas
Digital Science Center Facilities
RaPyDLI Deep Learning Environment
SPIDAL Scalable Data Analytics Library and applications, including Bioinformatics and Polar Remote Sensing Data Analysis
MIDAS Big Data Software; Harp for HPC-ABDS
Big Data Ogres Classification and Big Data Analytics Performance, including communication, VM and Java overhead
CloudIOT Internet of Things Environment
Cloudmesh Cloud and Bare metal Automation and NIST use cases
XSEDE TAS Monitoring citations and system metrics
Visualization WebPlotviz

Figure captions
DA-MDS speedup for 200K with different optimization techniques.
Java K-Means, 1 million points and 1k centers: performance on 16 nodes for LRT-FJ and LRT-BSP with varying affinity patterns over varying threads and processes.
Best MPI; inter and intra node.

Polar Remote Sensing Algorithms: Ice Layer Detection Algorithm
The polar science community has built radars capable of surveying the polar ice sheets and, as a result, has collected terabytes of data. Its repository grows each year as signal processing techniques improve and the cost of hard drives decreases, enabling a new generation of high-resolution ice thickness and accumulation maps. Manually extracting layers from an enormous corpus of ice thickness and accumulation data is time-consuming and requires sparse hand-selection, so developing image processing techniques that automatically aid in the discovery of knowledge is of high importance.

Parallel Sparse LDA using Harp
Harp LDA on Juliet (36 core nodes).
Original LDA (orange) compared to LDA exploiting sparseness (blue).
Note data analytics making use of InfiniBand (i.e. limited by communication!).
Java code running under Harp (Hadoop plus HPC plugin).
Corpus: 3,775,554 Wikipedia documents; Vocabulary: 1 million words; Topics: 10k topics.
BR II is the Big Red II supercomputer with Cray Gemini interconnect.
Juliet is a Haswell cluster with Intel (switch) and Mellanox (node) InfiniBand (not optimized).

High Performance Molecular Dynamics in Cloud Infrastructure
Large potential for running MD simulations in virtualized infrastructure using the KVM hypervisor.
Advanced HW: InfiniBand and GPUs in VMs.
LAMMPS MD: 1.9% overhead = near native.
KVM can outperform native with Huge Pages.
Building scalable High Performance Virtual Clusters.
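
To make the "1.9% overhead" figure quoted for LAMMPS concrete, the sketch below shows one common way such a number is computed: the extra wall-clock time of the virtualized run, relative to the native run. The function name and the timings are hypothetical, chosen only to illustrate the arithmetic. They are not measurements from this work, and the poster does not state how its overhead was measured.

```python
def relative_overhead_percent(native_seconds: float, virtual_seconds: float) -> float:
    """Return the virtualized run's extra time as a percentage of the native time."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return (virtual_seconds - native_seconds) / native_seconds * 100.0


# Hypothetical timings for illustration only (NOT measurements from the poster).
slower = relative_overhead_percent(native_seconds=100.0, virtual_seconds=101.9)
faster = relative_overhead_percent(native_seconds=100.0, virtual_seconds=98.0)

print(f"virtualized run slower than native: {slower:.1f}% overhead")
print(f"virtualized run faster than native: {faster:.1f}% overhead")
```

Under this definition a negative value means the virtualized run finished sooner than the native one, which is the situation the poster describes when it says KVM can outperform native with Huge Pages.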