Applications SPIDAL MIDAS ABDS

Presentation transcript:

Community & Examples
    Applications: Govt. Operations; Commercial; Defense; Healthcare, Life Science; Deep Learning, Social Media; Research Ecosystems; Astronomy, Physics; Earth, Env., Polar Science; Energy

SPIDAL
    (Inter)disciplinary Workflow
    Analytics Libraries

Programming & Runtime Models
    Native ABDS: SQL-engines, Storm, Impala, Hive, Shark
    Native HPC: MPI
    HPC-ABDS MapReduce: Map Only, PP Many Task; Classic MapReduce; Map Collective; Map – Point to Point, Graph

MIDAS
    MIddleware for Data-Intensive Analytics and Science (MIDAS) API
    Communication (MPI, RDMA, Hadoop Shuffle/Reduce, HARP Collectives, Giraph point-to-point)
    Data Systems and Abstractions (In-Memory; HBase, Object Stores, other NoSQL stores, Spatial, SQL, Files)
    Higher-Level Workload Management (Tez, Llama)
    Workload Management (Pilots, Condor)
    Framework-specific Scheduling (e.g. YARN)
    External Data Access (Virtual Filesystem, GridFTP, SRM, SSH)

Resource Fabric
    Cluster Resource Manager (YARN, Mesos, SLURM, Torque, SGE)
    Compute, Storage and Data Resources (Nodes, Cores, Lustre, HDFS)