Published by Ουρανία Καλογιάννης. Modified over 6 years ago.
Cloud DIKW based on HPC-ABDS to integrate streaming and batch Big Data
[Diagram] Components: Internet of Things (Smart Grid) sources; Pub-Sub system; Storm streaming processing (iterative MapReduce); archival storage (NoSQL such as HBase); batch processing (iterative MapReduce); analytics; orchestration / dataflow / workflow. DIKW stages: Raw Data → Data → Information → Knowledge → Wisdom → Decisions.
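To make the streaming-plus-batch idea concrete, here is a minimal sketch under stated assumptions. A `queue.Queue` stands in for the pub-sub system (e.g. Kafka), a Python list stands in for archival NoSQL storage such as HBase, and all function names (`stream_process`, `batch_process`, `decide`) are hypothetical. The streaming path keeps running per-sensor sums as events arrive, the batch path recomputes over the full archive, and a final step turns the resulting information into decisions.

```python
from queue import Queue


def stream_process(broker: Queue, archive: list, running: dict) -> None:
    """Streaming path (hypothetical): drain the broker, update per-sensor
    running (sum, count) pairs and archive every raw event."""
    while not broker.empty():
        sensor, value = broker.get()
        archive.append((sensor, value))  # stand-in for a NoSQL write
        total, count = running.get(sensor, (0.0, 0))
        running[sensor] = (total + value, count + 1)


def batch_process(archive: list) -> dict:
    """Batch path (hypothetical): recompute per-sensor peaks over the archive."""
    maxima: dict = {}
    for sensor, value in archive:
        maxima[sensor] = max(value, maxima.get(sensor, value))
    return maxima


def decide(running: dict, maxima: dict, limit: float) -> list:
    """Decision step: flag sensors whose mean or peak exceeds the limit."""
    flagged = []
    for sensor, (total, count) in running.items():
        if total / count > limit or maxima[sensor] > limit:
            flagged.append(sensor)
    return sorted(flagged)
```

In a real deployment the two paths would run as separate systems (e.g. a Storm topology and a batch job) coordinated by the orchestration layer; here they are plain functions so the data flow is easy to follow.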
[Diagram] Components: Data Ingest; Pub-Sub system; Storm streaming processing (bolts); archival storage (Accumulo); batch processing (MapReduce); analytics; system orchestration / dataflow / workflow. DIKW stages: Raw Data → Data → Information → Knowledge → Wisdom → Decisions.
Big Data HPC
HPC-ABDS Integrated Software
Layer | Big Data ABDS | HPC, Cluster
Orchestration | Crunch, Tez, Cloud Dataflow | Kepler, Pegasus
Libraries | MLlib/Mahout, R, Python | Matlab, Eclipse, Apps
High Level Programming | Pig, Hive, Drill | Domain-specific Languages
Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, SQL, SPARQL | Fortran, C/C++
Streaming | Storm, Kafka, Kinesis |
Parallel Runtime | MapReduce | MPI/OpenMP/OpenCL
Coordination | Zookeeper |
Caching | Memcached |
Data Management | HBase, Neo4J, MySQL | iRODS
Data Transfer | Sqoop | GridFTP
Scheduling | Yarn | Slurm
File Systems | HDFS, Object Stores | Lustre
Formats | Thrift, Protobuf | FITS, HDF
Virtualization | OpenStack | Docker, SR-IOV
Infrastructure | CLOUDS | SUPERCOMPUTERS
HPC-ABDS Integrated Software
(HPCCloud)
Layer | Big Data ABDS | HPC, Cluster
17. Orchestration | Beam, Crunch, Tez, Cloud Dataflow | Kepler, Pegasus, Taverna
16. Libraries | MLlib/Mahout, TensorFlow, CNTK, R, Python | ScaLAPACK, PETSc, Matlab
15A. High Level Programming | Pig, Hive, Drill | Domain-specific Languages
15B. Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, Scala, Clojure, SQL, SPARQL, Python | Fortran, C/C++, Python
14B. Streaming | Storm, Kafka, Kinesis |
13, 14A. Parallel Runtime | Hadoop, MapReduce | MPI/OpenMP/OpenCL
2. Coordination | Zookeeper |
12. Caching | Memcached |
11. Data Management | HBase, Accumulo, Neo4J, MySQL | iRODS
10. Data Transfer | Sqoop | GridFTP
9. Scheduling | Yarn, Mesos | Slurm
8. File Systems | HDFS, Object Stores | Lustre
1, 11A. Formats | Thrift, Protobuf | FITS, HDF
5. IaaS | OpenStack, Docker | Linux, Bare-metal, SR-IOV
Infrastructure | Intelligent CLOUDS | HPC Clusters, Classic SUPERCOMPUTERS
Also listed | | CUDA, Exascale Runtime
HPC-ABDS Integrated Software
Layer | HPC-ABDS Stack | HPC, Cluster
17. Orchestration | Beam, Crunch, Tez, Cloud Dataflow | Kepler, Pegasus, Taverna
16. Libraries | SPIDAL, MLlib/Mahout, TensorFlow, R, Python | ScaLAPACK, PETSc, Matlab
15A. High Level Programming | Pig, Hive, Drill | Domain-specific Languages
15B. Platform as a Service | Twister2, App Engine, Elastic Beanstalk | HPC Software Stack
Languages | Java, Erlang, Scala, Clojure, SQL, SPARQL, Python | Fortran, C/C++, Python
14B. Streaming | Heron, Kafka, Kinesis |
13, 14A. Parallel Runtime | Hadoop, Spark, Harp | MPI/OpenMP/OpenCL
2. Coordination | Zookeeper |
12. Caching | Memcached |
11. Data Management | HBase, Accumulo, Neo4J, MySQL | iRODS
10. Data Transfer | Sqoop, Data Transfer DTP | GridFTP
9. Scheduling | Yarn, Mesos, Kubernetes | Slurm
8. File Systems | HDFS, Object Stores | Lustre
1, 11A. Formats | Thrift, Protobuf | FITS, HDF
5. IaaS | OpenStack, Docker, KVM | Linux, Bare-metal, SR-IOV
Infrastructure | Intelligent CLOUDS, Global AI Supercomputer | HPC Clusters, Classic Supercomputers
Also listed | | CUDA, Exascale Runtime
Initial Convergence Software
Layer | Big Data ABDS | HPC, Cluster
Orchestration | Crunch, Tez, Cloud Dataflow | Kepler, Pegasus, Taverna
Libraries | MLlib/Mahout, R, Python | ScaLAPACK, PETSc, Matlab
High Level Programming | Pig, Hive, Drill | Domain-specific Languages
Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, Scala, Clojure, SQL, SPARQL, Python | Fortran, C/C++, Python
Streaming | Storm, Kafka, Kinesis |
Parallel Runtime | Hadoop, MapReduce | MPI/OpenMP/OpenCL
Coordination | Zookeeper |
Caching | Memcached |
Data Management | HBase, Accumulo, Neo4J, MySQL | iRODS
Data Transfer | Sqoop | GridFTP
Scheduling | Mesos, Aurora, Yarn | Slurm
File Systems | HDFS, Object Stores | Lustre
Formats | Thrift, Protobuf | FITS, HDF
IaaS | OpenStack, Docker | Linux, Bare-metal, SR-IOV
Infrastructure | CLOUDS | SUPERCOMPUTERS
Also listed | | CUDA, Exascale Runtime
4 Forms of MapReduce Correspond to first 4 of Identified Architectures
(1) Map Only: Input → map → Output. Examples: BLAST analysis, local machine learning, pleasingly parallel.
(2) Classic MapReduce: Input → map → reduce → Output. Examples: High Energy Physics (HEP) histograms, distributed search, recommender engines.
(3) Iterative MapReduce or Map-Collective: Input → map → reduce, repeated over iterations. Examples: expectation maximization, clustering (e.g. K-means), linear algebra, PageRank.
(4) Point to Point or Map-Communication: maps communicate directly with one another (local graph). Examples: classic MPI, PDE solvers and particle dynamics, graph problems.
Software: MapReduce and iterative extensions (Spark, Twister) for (1) to (3); MPI and Giraph for (4). Integrated systems such as Hadoop + Harp separate the compute model from the communication model.
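The four forms differ mainly in how (and how often) the map tasks communicate. The sketch below illustrates each one in plain sequential Python, assuming "partitions" are lists of numbers held by separate map tasks; the function names are hypothetical, and the collective and halo exchange are emulated in a single process rather than run on Hadoop, Spark, Harp or MPI.

```python
from collections import defaultdict


# (1) Map Only: each input is processed independently; no communication.
def map_only(f, partitions):
    return [[f(x) for x in part] for part in partitions]


# (2) Classic MapReduce: maps emit (key, value) pairs, a shuffle groups them
# by key, and a reducer combines each group (e.g. histogram bin counts).
def map_reduce(mapper, reducer, partitions):
    groups = defaultdict(list)
    for part in partitions:
        for x in part:
            for key, value in mapper(x):
                groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


# (3) Iterative MapReduce / Map-Collective: 1-D K-means. Each map computes
# partial sums for its partition; an allreduce-style collective (emulated
# here by summing the partials) gives every map the same new centers.
def kmeans_map_collective(partitions, centers, iterations):
    k = len(centers)
    for _ in range(iterations):
        partials = []
        for part in partitions:  # "map" phase: one task per partition
            sums, counts = [0.0] * k, [0] * k
            for x in part:
                j = min(range(k), key=lambda c: abs(x - centers[c]))
                sums[j] += x
                counts[j] += 1
            partials.append((sums, counts))
        # "collective" phase: element-wise sum of all partial results
        sums = [sum(p[0][j] for p in partials) for j in range(k)]
        counts = [sum(p[1][j] for p in partials) for j in range(k)]
        centers = [sums[j] / counts[j] if counts[j] else centers[j] for j in range(k)]
    return centers


# (4) Point to Point / Map-Communication: one Jacobi smoothing step on a 1-D
# grid split across maps. Each map exchanges boundary ("halo") values with its
# neighbours only; the outer edges reuse their own end value.
def jacobi_step(partitions):
    new = []
    for i, part in enumerate(partitions):
        left = partitions[i - 1][-1] if i > 0 else part[0]
        right = partitions[i + 1][0] if i < len(partitions) - 1 else part[-1]
        padded = [left] + part + [right]
        new.append([(padded[j - 1] + padded[j + 1]) / 2 for j in range(1, len(padded) - 1)])
    return new
```

The design point the slide makes is visible in the code: forms (1) and (2) communicate at most once, form (3) communicates through a collective on every iteration, and form (4) needs fine-grained neighbour-to-neighbour exchange.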
(5) Map Streaming and (6) Shared Memory
(5) Map Streaming: events flow from brokers to maps.
(6) Shared Memory: maps compute and communicate (Map & Communicate) through shared memory.
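The two remaining forms can be sketched the same way. In this illustrative example a `queue.Queue` stands in for a broker (e.g. Kafka or a Storm spout) and Python threads stand in for maps sharing memory; the function names and the `None` end-of-stream sentinel are assumptions for the sketch, not part of any particular system.

```python
import threading
from queue import Queue


# (5) Map Streaming: a long-running map consumes events from a broker and
# emits a result per event, instead of waiting for all input to be read.
def streaming_map(broker: Queue, out: list, f) -> None:
    while True:
        event = broker.get()
        if event is None:  # sentinel: the stream has been closed
            break
        out.append(f(event))


# (6) Shared Memory: maps run as threads and communicate by updating a shared
# structure; the lock serializes concurrent updates to it.
def shared_memory_maps(partitions, f):
    shared = {"total": 0}
    lock = threading.Lock()

    def worker(part):
        for x in part:
            value = f(x)
            with lock:
                shared["total"] += value

    threads = [threading.Thread(target=worker, args=(p,)) for p in partitions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]
```

The lock in form (6) is where the difficulty arises for asynchronous graph algorithms: correctness depends on how concurrent updates to shared state are ordered.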
6 Data Analysis Architectures
(1) Map Only and (2) Classic MapReduce: the classes handled by classic Hadoop. Examples: BLAST analysis, local machine learning, pleasingly parallel; HEP histograms, web search, recommender engines.
(3) Iterative MapReduce or Map-Collective. Examples: expectation maximization, clustering, linear algebra, PageRank.
(4) Point to Point or Map-Communication. Examples: classic MPI, PDE solvers and particle dynamics, graph.
(5) Map Streaming. Examples: streaming images from synchrotron sources, telescopes, IoT.
(6) Shared Memory. Examples: difficult-to-parallelize, asynchronous parallel graph algorithms.
Software: MapReduce and iterative extensions (Spark, Twister); MPI, Giraph; Apache Storm (maps are bolts); Harp (enhanced Hadoop).
[Plot] K-means clustering: execution time (seconds) and parallel efficiency versus number of cores.
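Plots of time and efficiency against core count are usually derived from wall-clock timings. The slide does not say which baseline its efficiency uses, so the sketch below assumes the common relative definition, taking the smallest measured core count p0 as the baseline: speedup S(p) = T(p0) / T(p) and efficiency E(p) = p0 · T(p0) / (p · T(p)), so E(p0) = 1 by construction. The function name is hypothetical.

```python
def speedup_and_efficiency(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Relative speedup and parallel efficiency from wall-clock times.

    timings maps core count p -> time T(p) in seconds. The smallest core count
    p0 is the baseline (an assumption; a true serial run could be used instead):
        speedup(p)    = T(p0) / T(p)
        efficiency(p) = p0 * T(p0) / (p * T(p))
    """
    p0 = min(timings)
    t0 = timings[p0]
    return {
        p: (t0 / t, (p0 * t0) / (p * t))
        for p, t in sorted(timings.items())
    }
```

Efficiency near 1 means the extra cores are fully used; a falling curve usually reflects communication or load-imbalance overhead, which for K-means is dominated by the per-iteration collective.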
Software-Defined Distributed System (SDDS) as a Service includes:
Software (Application or Usage), SaaS: use HPC-ABDS; class usages, e.g. run GPU and multicore; applications; control robot.
Platform, PaaS: cloud, e.g. MapReduce; HPC, e.g. PETSc, SAGA; computer science, e.g. compiler tools, sensor nets, monitors.
Infrastructure, IaaS: software-defined computing (virtual clusters); hypervisor, bare metal; operating system.
Network, NaaS: software-defined networks; OpenFlow, GENI.
SDDS-aaS tools: provisioning; image management; IaaS interoperability; NaaS and IaaS tools; experiment management; dynamic IaaS and NaaS; DevOps; dynamic orchestration and dataflow.
CloudMesh is an SDDSaaS tool that uses dynamic provisioning and image management to provide custom environments for general target systems. This involves (1) creating, (2) deploying, and (3) provisioning one or more images on a set of machines on demand.
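The create, deploy, provision sequence can be read as a small lifecycle that each image must pass through in order. The sketch below is hypothetical throughout: it is not the CloudMesh API, and the `Image` record, the dictionary used as an image repository and all function names are invented for illustration. It only shows the ordering constraint and the on-demand reuse of an image that has already been deployed.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Hypothetical image record: name, software to install, lifecycle state."""
    name: str
    packages: list[str]
    state: str = "new"
    machines: list[str] = field(default_factory=list)


def create(name: str, packages: list[str]) -> Image:
    """(1) Create: build the image description."""
    return Image(name, list(packages), state="created")


def deploy(image: Image, repository: dict[str, Image]) -> None:
    """(2) Deploy: register the image where target systems can fetch it."""
    if image.state != "created":
        raise ValueError(f"cannot deploy image in state {image.state!r}")
    repository[image.name] = image
    image.state = "deployed"


def provision(image: Image, machines: list[str]) -> None:
    """(3) Provision: place a deployed image onto the requested machines."""
    if image.state not in ("deployed", "provisioned"):
        raise ValueError(f"cannot provision image in state {image.state!r}")
    for m in machines:
        if m not in image.machines:
            image.machines.append(m)
    image.state = "provisioned"


def on_demand(name: str, packages: list[str], machines: list[str],
              repository: dict[str, Image]) -> Image:
    """Run the three steps in order, reusing an image already deployed."""
    image = repository.get(name)
    if image is None:
        image = create(name, packages)
        deploy(image, repository)
    provision(image, machines)
    return image
```

Keeping image creation separate from provisioning is what lets the same custom environment be placed on additional machines later without rebuilding it.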
Figure 3: Dual Convergence System
[Diagram] Labels: C, D (repeated); Data Management; Model for Big Data and Simulation.