Big Data Open Source Software and Projects ABDS in Summary VI: Layer 6 Part 2 Data Science Curriculum March 5 2015 Geoffrey Fox


Big Data Open Source Software and Projects
ABDS in Summary VI: Layer 6 Part 2
Data Science Curriculum, March 2015
Geoffrey Fox
School of Informatics and Computing, Digital Science Center, Indiana University Bloomington
Helped by Gregor von Laszewski

Functionality of 21 HPC-ABDS Layers
1) Message Protocols
2) Distributed Coordination
3) Security & Privacy
4) Monitoring
5) IaaS Management from HPC to hypervisors
6) DevOps: Part 2
7) Interoperability
8) File systems
9) Cluster Resource Management
10) Data Transport
11) A) File management B) NoSQL C) SQL
12) In-memory databases & caches / Object-relational mapping / Extraction Tools
13) Inter-process communication: collectives, point-to-point, publish-subscribe, MPI
14) A) Basic Programming model and runtime, SPMD, MapReduce B) Streaming
15) A) High-level Programming B) Application Hosting Frameworks
16) Application and Analytics
17) Workflow-Orchestration
Here are 21 functionalities (including the 11, 14, and 15 subparts). The first 4 are cross-cutting at the top; the remaining 17 follow the layered diagram in order, starting at the bottom.

Cloudmesh
Cloudmesh (http://cloudmesh.github.io/) is an open source SDDSaaS toolkit that supports:
– A software-defined distributed system encompassing virtualized and bare-metal infrastructure, networks, application, systems and platform software, with a unifying goal of providing Computing as a Service.
– The creation of a tightly integrated mesh of services targeting multiple IaaS frameworks.
– The ability to federate a number of resources from academia and industry. This includes existing FutureSystems infrastructure, Amazon Web Services, Azure, HP Cloud, and Karlsruhe, using several IaaS frameworks.
– The creation of an environment in which it becomes easier to experiment with platforms and software services while assisting with their deployment and execution.
– The exposure of information to guide the efficient utilization of resources (monitoring).
– Support for reproducible computing environments.
– IPython-based workflow as an interoperable onramp.
Cloudmesh exposes both hypervisor-based and bare-metal provisioning to users and administrators, with access through command line, API, and Web interfaces.

Building Blocks of Cloudmesh
Uses internally: Libcloud and Cobbler; Celery task/query manager (AMQP – RabbitMQ); MongoDB
Accesses via abstractions external systems/standards: OpenPBS, Chef; OpenStack (including tools like Heat), AWS EC2, Eucalyptus, Azure; XSEDE user management (AMIE) via FutureGrid
Implementing: Docker, Slurm, OCCI, Ansible, Puppet
Evaluating: Razor, Juju, xCAT (the original Rain used this), Foreman
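To make the Libcloud building block concrete, here is a minimal sketch (not Cloudmesh's actual code) of the kind of provider-independent access Apache Libcloud gives; the credentials and region are placeholders.

```python
# A minimal Apache Libcloud sketch of the provider-independent access that
# Cloudmesh builds on.  Credentials and provider-specific keyword
# arguments below are placeholders.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

def list_nodes(provider, key, secret, **extra):
    """List the VMs on one IaaS cloud through the common Libcloud API."""
    driver_cls = get_driver(provider)
    driver = driver_cls(key, secret, **extra)
    return driver.list_nodes()

# The same function serves EC2, OpenStack, Azure, ... simply by switching
# the Provider constant (OpenStack additionally needs ex_force_auth_url
# and related keyword arguments).
for node in list_nodes(Provider.EC2, "ACCESS_KEY", "SECRET_KEY",
                       region="us-east-1"):
    print(node.name, node.state)
```

Cloudmesh then layers the other components named above (Celery task queues, MongoDB state storage, Cobbler bare-metal provisioning) on top of this kind of uniform driver interface.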

Cloudmesh and SDDSaaS Stack for HPC-ABDS
Orchestration: IPython, Pegasus, Kepler, FlumeJava, Tez, Cascading
SaaS: Mahout, MLlib, R
PaaS: Hadoop, Giraph, Storm
IaaS: OpenStack, bare metal
NaaS: OpenFlow
BMaaS: Cobbler
These are just examples from the ~150 HPC-ABDS components; abstract interfaces remove the tool dependency. HPC-ABDS appears here at 4 levels.

Cloudmesh Functionality

Rocks
The Rocks Cluster Distribution, developed at SDSC, automates deployment of real and virtual clusters. Rocks was initially based on the Red Hat Linux distribution; modern versions are based on CentOS, with a modified Anaconda installer that simplifies mass installation onto many computers. Rocks includes many tools (such as MPI) that are not part of CentOS but are integral components that make a group of computers into a cluster.
Installations can be customized with additional software packages at install time by using special user-supplied packages, or Rolls. The Rolls extend the system by integrating seamlessly and automatically into the management and packaging mechanisms used by the base software, greatly simplifying installation and configuration of large numbers of computers. Over a dozen Rolls have been created, including the SGE roll, the Condor roll, the Lustre roll, the Java roll, and the Ganglia roll.

Cisco Intelligent Automation for Cloud I
– Supports deployment on OpenStack, Amazon, vCloud, and bare metal
– Integrates Network as a Service

Cisco Intelligent Automation for Cloud II: Production Deployment

Facebook Tupperware
– Facebook uses containers, not hypervisors, to improve performance
– Tupperware predates Docker

AWS OpsWorks I
You define the stack's components by adding one or more layers. A layer is basically a blueprint that specifies how to configure a set of Amazon EC2 instances for a particular purpose, such as serving applications or hosting a database server. You assign each instance to at least one layer, which determines what packages are installed on the instance, how they are configured, whether the instance has an Elastic IP address or Amazon EBS volume, and so on.
AWS OpsWorks includes a set of built-in layers that support the following scenarios:
– Application server: Java App Server, Node.js App Server, PHP App Server, Rails App Server, Static Web Server
– Database server: Amazon RDS and MySQL
– Load balancer: Elastic Load Balancing, HAProxy
– Monitoring server: Ganglia
– In-memory key-value store: Memcached
If the built-in layers don't quite meet your requirements, you can customize or extend them by modifying packages' default configurations, adding custom Chef recipes to perform tasks such as installing additional packages, and more. You can also customize layers to work with AWS services that are not natively supported, such as using Amazon RDS as a database server. If that's still not enough, you can create a fully custom layer, which gives you complete control over which packages are installed, how they are configured, how applications are deployed, and more.
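As a hedged sketch of the stack → layer → instance workflow described above, the following uses the AWS SDK for Python (boto3); all names, ARNs, and instance types are placeholders, not values from the slide.

```python
# Hedged sketch of the OpsWorks stack/layer/instance model using boto3.
# All names, ARNs, and IDs are placeholders.
import boto3

opsworks = boto3.client("opsworks", region_name="us-east-1")

# A stack is the top-level container for layers and instances.
stack = opsworks.create_stack(
    Name="abds-demo-stack",
    Region="us-east-1",
    ServiceRoleArn="arn:aws:iam::123456789012:role/aws-opsworks-service-role",
    DefaultInstanceProfileArn="arn:aws:iam::123456789012:instance-profile/aws-opsworks-ec2-role",
)

# A built-in PHP application-server layer; its type determines which
# packages and Chef recipes OpsWorks applies to member instances.
layer = opsworks.create_layer(
    StackId=stack["StackId"],
    Type="php-app",
    Name="PHP App Server",
    Shortname="php-app",
)

# An instance assigned to that layer inherits the layer's configuration.
instance = opsworks.create_instance(
    StackId=stack["StackId"],
    LayerIds=[layer["LayerId"]],
    InstanceType="t2.micro",
)
print(instance["InstanceId"])
```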

AWS OpsWorks II

Google Kubernetes I
DevOps cluster management for Docker. Kubernetes underpins Google Container Engine, a hosted container management platform that runs and manages Docker containers on Google Compute Engine virtual machines.
– Container-optimized Google Compute Engine images pre-install Debian, Docker, and Kubernetes.
Kubernetes is an open source container cluster manager. It schedules any number of container replicas across a group of node instances. A master instance exposes the Kubernetes API, through which tasks are defined. Kubernetes spawns containers on nodes to handle the defined tasks. The number and type of containers can be dynamically modified according to need. An agent (a kubelet) on each node instance monitors containers and restarts them if necessary. Kubernetes is optimized for Google Cloud Platform, but can run on any physical or virtual machine.
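A small sketch of querying that master API with the official Kubernetes Python client (which postdates this 2015 slide); the kubeconfig location and namespace are assumptions.

```python
# Sketch of querying a Kubernetes master through its API using the
# official Python client; the kubeconfig and namespace are assumptions.
from kubernetes import client, config

config.load_kube_config()          # reads credentials from ~/.kube/config
v1 = client.CoreV1Api()

# Nodes: the machines on which the kubelet agents run containers.
for node in v1.list_node().items:
    print("node:", node.metadata.name)

# Pods: the replicated container groups the master has scheduled.
for pod in v1.list_namespaced_pod(namespace="default").items:
    print("pod:", pod.metadata.name, pod.status.phase)
```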

Google Kubernetes II (see the SlideShare deck "Kubernetes on CloudStack with CoreOS")

Buildstep, Gitreceive
Both are used by Dokku (layer 15B) to support application hosting on Docker by understanding Heroku buildpacks and interfacing to GitHub.
Buildstep uses Heroku's open source buildpacks and is responsible for building the base images that applications are built on. You can think of it as producing the "stack" for Dokku, to borrow a concept from Heroku.
Gitreceive is a project that provides a git user that you can push repositories to, so you can build systems triggered by pushes of software from GitHub.