Cloud DIKW based on HPC-ABDS to integrate streaming and batch Big Data


1 Cloud DIKW based on HPC-ABDS to integrate streaming and batch Big Data
Diagram labels:
Source: Internet of Things (Smart Grid)
Components: Pub-Sub; Storm; Streaming Processing (Iterative MapReduce); Batch Processing (Iterative MapReduce); Archival Storage – NoSQL like HBase
DIKW chain: Raw Data, Data, Information, Knowledge, Wisdom, Decisions
Other labels: Analytics; System Orchestration / Dataflow / Workflow

2 Analytics System Orchestration / Dataflow / Workflow
Diagram labels:
Components: Data Ingest; Pub-Sub; Storm; Streaming Processing (Bolts); Batch Processing (MapReduce); Archival Storage – Accumulo
DIKW chain: Raw Data, Data, Information, Knowledge, Wisdom, Decisions
Other labels: Analytics; System Orchestration / Dataflow / Workflow
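The labels on this slide suggest one event feed serving both a streaming path and a batch path over archived data. The following is a minimal sketch of that idea in plain Python, not the presenters' implementation: `PubSub`, `archive`, `streaming_mean`, and `batch_mean_per_sensor` are hypothetical names, the in-memory list stands in for archival storage, and no Storm, Accumulo, or MapReduce API is used.

```python
from collections import defaultdict


class PubSub:
    """Tiny in-process publish/subscribe hub (a stand-in for a real broker)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Every subscriber of the topic sees every event.
        for handler in self.subscribers[topic]:
            handler(event)


archive = []                            # hypothetical stand-in for archival storage
running = {"count": 0, "total": 0.0}    # state kept by the streaming path


def archive_event(event):
    """Storage path: keep the raw event for later batch processing."""
    archive.append(event)


def streaming_mean(event):
    """Streaming path: update a running aggregate as each event arrives."""
    running["count"] += 1
    running["total"] += event["value"]


def batch_mean_per_sensor(events):
    """Batch path: group archived events by sensor (map), then average (reduce)."""
    groups = defaultdict(list)
    for e in events:
        groups[e["sensor"]].append(e["value"])
    return {sensor: sum(vals) / len(vals) for sensor, vals in groups.items()}


hub = PubSub()
hub.subscribe("readings", archive_event)
hub.subscribe("readings", streaming_mean)

for sensor, value in [("a", 1.0), ("b", 3.0), ("a", 5.0)]:
    hub.publish("readings", {"sensor": sensor, "value": value})

stream_result = running["total"] / running["count"]
batch_result = batch_mean_per_sensor(archive)
```

Here the streaming result is available after every event, while the batch result is computed on demand from everything archived so far; both consume the same published events.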

3 Big Data HPC

4 HPC-ABDS Integrated Software
Layer | Big Data ABDS | HPC, Cluster
Orchestration | Crunch, Tez, Cloud Dataflow | Kepler, Pegasus
Libraries | MLlib/Mahout, R, Python | Matlab, Eclipse, Apps
High Level Programming | Pig, Hive, Drill | Domain-specific Languages
Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, SQL, SPARQL | Fortran, C/C++
Streaming | Storm, Kafka, Kinesis | -
Parallel Runtime | MapReduce | MPI/OpenMP/OpenCL
Coordination | ZooKeeper | -
Caching | Memcached | -
Data Management | HBase, Neo4J, MySQL | iRODS
Data Transfer | Sqoop | GridFTP
Scheduling | YARN | Slurm
File Systems | HDFS, Object Stores | Lustre
Formats | Thrift, Protobuf | FITS, HDF
Virtualization | OpenStack | Docker, SR-IOV
Infrastructure | CLOUDS | SUPERCOMPUTERS

5 HPC-ABDS Integrated Software
Layer | Big Data ABDS | HPC, Cluster
17. Orchestration | Beam, Crunch, Tez, Cloud Dataflow | Kepler, Pegasus, Taverna
16. Libraries | MLlib/Mahout, TensorFlow, CNTK, R, Python | ScaLAPACK, PETSc, Matlab
15A. High Level Programming | Pig, Hive, Drill | Domain-specific Languages
15B. Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, Scala, Clojure, SQL, SPARQL, Python | Fortran, C/C++, Python
14B. Streaming | Storm, Kafka, Kinesis | -
13, 14A. Parallel Runtime | Hadoop, MapReduce | MPI/OpenMP/OpenCL
2. Coordination | ZooKeeper | -
12. Caching | Memcached | -
11. Data Management | HBase, Accumulo, Neo4J, MySQL | iRODS
10. Data Transfer | Sqoop | GridFTP
9. Scheduling | YARN, Mesos | Slurm
8. File Systems | HDFS, Object Stores | Lustre
1, 11A. Formats | Thrift, Protobuf | FITS, HDF
5. IaaS | OpenStack, Docker | Linux, Bare-metal, SR-IOV
Infrastructure labels: Intelligent CLOUDS; HPC Clusters; Classic SUPERCOMPUTERS
Additional labels: HPCCloud; CUDA, Exascale Runtime

6 HPC-ABDS Integrated Software
Layer | HPC-ABDS Stack | HPC, Cluster
17. Orchestration | Beam, Crunch, Tez, Cloud Dataflow | Kepler, Pegasus, Taverna
16. Libraries | SPIDAL, MLlib/Mahout, TensorFlow, R, Python | ScaLAPACK, PETSc, Matlab
15A. High Level Programming | Pig, Hive, Drill | Domain-specific Languages
15B. Platform as a Service | Twister2, App Engine, Elastic Beanstalk | HPC Software Stack
Languages | Java, Erlang, Scala, Clojure, SQL, SPARQL, Python | Fortran, C/C++, Python
14B. Streaming | Heron, Kafka, Kinesis | -
13, 14A. Parallel Runtime | Hadoop, Spark, Harp | MPI/OpenMP/OpenCL
2. Coordination | ZooKeeper | -
12. Caching | Memcached | -
11. Data Management | HBase, Accumulo, Neo4J, MySQL | iRODS
10. Data Transfer | Sqoop, Data Transfer DTP | GridFTP
9. Scheduling | YARN, Mesos, Kubernetes | Slurm
8. File Systems | HDFS, Object Stores | Lustre
1, 11A. Formats | Thrift, Protobuf | FITS, HDF
5. IaaS | OpenStack, Docker, KVM | Linux, Bare-metal, SR-IOV
Infrastructure labels: Intelligent CLOUDS; HPC Clusters; Global AI Supercomputer; Classic Supercomputers
Additional label: CUDA, Exascale Runtime

7 Initial Convergence Software
Layer | Big Data ABDS | HPC, Cluster
Orchestration | Crunch, Tez, Cloud Dataflow | Kepler, Pegasus, Taverna
Libraries | MLlib/Mahout, R, Python | ScaLAPACK, PETSc, Matlab
High Level Programming | Pig, Hive, Drill | Domain-specific Languages
Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, Scala, Clojure, SQL, SPARQL, Python | Fortran, C/C++, Python
Streaming | Storm, Kafka, Kinesis | -
Parallel Runtime | Hadoop, MapReduce | MPI/OpenMP/OpenCL
Coordination | ZooKeeper | -
Caching | Memcached | -
Data Management | HBase, Accumulo, Neo4J, MySQL | iRODS
Data Transfer | Sqoop | GridFTP
Scheduling | Mesos, Aurora, YARN | Slurm
File Systems | HDFS, Object Stores | Lustre
Formats | Thrift, Protobuf | FITS, HDF
IaaS | OpenStack, Docker | Linux, Bare-metal, SR-IOV
Infrastructure | CLOUDS | SUPERCOMPUTERS
Additional label: CUDA, Exascale Runtime

11 4 Forms of MapReduce
Correspond to first 4 of Identified Architectures.
(1) Map Only: BLAST Analysis; Local Machine Learning; Pleasingly Parallel
(2) Classic MapReduce: High Energy Physics (HEP) Histograms; Distributed search; Recommender Engines
(3) Iterative Map Reduce or Map-Collective: Expectation maximization; Clustering e.g. K-means; Linear Algebra, PageRank
(4) Point to Point or Map-Communication: Classic MPI; PDE Solvers and Particle Dynamics; Graph Problems
Diagram labels: Input, map, reduce, Iterations, Output, Local, Graph
Software: MapReduce and Iterative Extensions (Spark, Twister); MPI, Giraph; Integrated Systems such as Hadoop + Harp with Compute and Communication model separated
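To make the four forms on this slide concrete, here is a small single-process sketch in plain Python. It is illustrative only and assumes nothing about Hadoop, Spark, Twister, Harp, or MPI APIs: the function names (`map_only`, `classic_mapreduce`, `kmeans_1d`, `jacobi_1d`) and the toy inputs are hypothetical, and the parallelism a real runtime would provide is replaced by ordinary loops.

```python
from collections import defaultdict


def map_only(records, fn):
    """(1) Map Only: independent tasks, no reduce step."""
    return [fn(r) for r in records]


def classic_mapreduce(records, mapper, reducer):
    """(2) Classic MapReduce: map to (key, value) pairs, group by key, reduce."""
    groups = defaultdict(list)
    for r in records:
        for key, value in mapper(r):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


def kmeans_1d(points, centers, iterations=10):
    """(3) Iterative MapReduce: repeat a map + reduce round, here 1-D K-means."""
    for _ in range(iterations):
        def assign(p):
            # Map: emit (index of nearest center, point).
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            yield nearest, p

        # Reduce: the new center is the mean of the points assigned to it.
        means = classic_mapreduce(points, assign, lambda vs: sum(vs) / len(vs))
        centers = [means.get(i, c) for i, c in enumerate(centers)]
    return centers


def jacobi_1d(values, steps):
    """(4) Point to Point: each interior cell only needs its two neighbors."""
    for _ in range(steps):
        interior = [(values[i - 1] + values[i + 1]) / 2
                    for i in range(1, len(values) - 1)]
        values = [values[0]] + interior + [values[-1]]
    return values


lengths = map_only(["ACGT", "GG"], len)
histogram = classic_mapreduce([0.5, 1.2, 1.7, 2.1], lambda x: [(int(x), 1)], sum)
centers = kmeans_1d([1.0, 2.0, 9.0, 10.0], [0.0, 5.0])
relaxed = jacobi_1d([0.0, 0.0, 4.0], steps=1)
```

The histogram call mirrors the HEP histogram example in form (2), and the K-means call shows why form (3) needs the same map and reduce applied repeatedly with updated state between rounds.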

12 (6) Shared memory Map Communicates
(5) Map Streaming: maps, brokers, Events
(6) Shared memory Map Communicates: Map & Communicate, Shared Memory
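Form (5) can be pictured as events passing through a broker to a pool of map tasks. The sketch below is one possible reading of that picture using only the Python standard library: `run_stream` and its parameters are hypothetical names, a `queue.Queue` stands in for the broker, and threads stand in for distributed map tasks.

```python
import queue
import threading


def run_stream(events, map_fn, workers=2):
    """Feed events through a broker queue to several concurrent map tasks."""
    broker = queue.Queue()      # stand-in for a message broker
    results = []
    lock = threading.Lock()     # protects the shared results list

    def map_task():
        while True:
            event = broker.get()
            if event is None:   # sentinel: no more events for this task
                break
            out = map_fn(event)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=map_task) for _ in range(workers)]
    for t in threads:
        t.start()
    for e in events:
        broker.put(e)
    for _ in threads:
        broker.put(None)        # one sentinel per map task
    for t in threads:
        t.join()
    return results


squares = sorted(run_stream([1, 2, 3], lambda x: x * x))
```

The order in which results arrive depends on thread scheduling, which is why the usage line sorts them before use.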

13 6 Data Analysis Architectures
(1) BLAST Analysis; Local Machine Learning; Pleasingly Parallel
(2) High Energy Physics (HEP) Histograms; Web search; Recommender Engines
(3) Expectation maximization; Clustering; Linear Algebra, PageRank
(4) Classic MPI; PDE Solvers and Particle Dynamics; Graph
(5) Streaming images from Synchrotron sources, Telescopes, IoT; Apache Storm; Maps are Bolts
(6) Difficult to parallelize asynchronous parallel Graph Algorithms
Software: Classic Hadoop in classes 1) 2); MapReduce and Iterative Extensions (Spark, Twister); MPI, Giraph; Harp – Enhanced Hadoop

16 K-means Clustering
[Plot; axis labels: Time (Secs), Efficiency, # Cores]

17 Software-Defined Distributed System (SDDS) as a Service includes
SDDS-aaS Tools: Provisioning; Image Management; IaaS Interoperability; NaaS, IaaS tools; Expt management; Dynamic IaaS NaaS; DevOps
Dynamic Orchestration and Dataflow
Software (Application Or Usage) SaaS: Use HPC-ABDS; Class Usages e.g. run GPU & multicore; Applications; Control Robot
Platform PaaS: Cloud e.g. MapReduce; HPC e.g. PETSc, SAGA; Computer Science e.g. Compiler tools, Sensor nets, Monitors
Infrastructure IaaS: Software Defined Computing (virtual Clusters); Hypervisor, Bare Metal; Operating System
Network NaaS: Software Defined Networks; OpenFlow GENI
CloudMesh is an SDDSaaS tool that uses Dynamic Provisioning and Image Management to provide custom environments for general target systems. It involves (1) creating, (2) deploying, and (3) provisioning one or more images on a set of machines on demand.

18 Figure 3: Dual Convergence System
Data Management Model for Big Data and Simulation (diagram node labels: C, D, C, D)

19 C Data

