Recipes for Success with Big Data using FutureGrid Cloudmesh SDSC Exhibit Booth New Orleans Convention Center November 19 2014 Geoffrey Fox, Gregor von.

Presentation transcript:

Recipes for Success with Big Data using FutureGrid Cloudmesh
SDSC Exhibit Booth, New Orleans Convention Center, November 19, 2014
Geoffrey Fox, Gregor von Laszewski
School of Informatics and Computing, Digital Science Center, Indiana University Bloomington

There are a lot of Big Data and HPC software systems.
Challenge: manage an environment offering these different components.

Maybe a Big Data Initiative would include
We don’t need 266 software packages, so we can choose, e.g.:
Workflow: IPython, Pegasus or Kepler (replaced by tools like Tez?)
Data Analytics: Mahout, R, ImageJ, ScaLAPACK
High level Programming: Hive, Pig
Parallel Programming model: Hadoop, Spark, Giraph (Twister4Azure, Harp), MPI
Streaming: Storm, Kafka or RabbitMQ (Sensors)
In-memory: Memcached
Data Management: HBase, MongoDB, MySQL or Derby
Distributed Coordination: Zookeeper
Cluster Management: Yarn, Slurm
File Systems: HDFS, Lustre
DevOps: Cloudmesh, Chef, Puppet, Docker, Cobbler
IaaS: Amazon, Azure, OpenStack, Libcloud
Monitoring: Inca, Ganglia, Nagios

Cloudmesh SDDSaaS Architecture
Cloudmesh is an open source toolkit:
– A software-defined distributed system encompassing virtualized and bare-metal infrastructure, networks, application, systems and platform software, with a unifying goal of providing Computing as a Service.
– The creation of a tightly integrated mesh of services targeting multiple IaaS frameworks.
– The ability to federate a number of resources from academia and industry. This includes existing FutureSystems infrastructure, Amazon Web Services, Azure, HP Cloud, and Karlsruhe, using several IaaS frameworks.
– The creation of an environment in which it becomes easier to experiment with platforms and software services while assisting with their deployment and execution.
– The exposure of information to guide the efficient utilization of resources. (Monitoring)
– Support for reproducible computing environments.
– IPython-based workflow as an interoperable onramp.
Cloudmesh exposes both hypervisor-based and bare-metal provisioning to users and administrators.
Access is through command line, API, and Web interfaces.

Cloudmesh and SDDSaaS Stack for HPC-ABDS
HPC-ABDS at 4 levels (just examples from 266 components):
Orchestration: IPython, Pegasus, Kepler, FlumeJava, Tez, Cascading
SaaS: Mahout, MLlib, R
PaaS: Hadoop, Giraph, Storm
IaaS: Docker, OpenStack, Bare metal
NaaS: OpenFlow
BMaaS: Cobbler
Abstract interfaces remove tool dependency.
One Chef recipe per IU CS Masters Student ….
Data: Distributed and Streaming …
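The note that abstract interfaces remove tool dependency can be illustrated with a minimal Python sketch. Everything named here (CloudProvider, FakeOpenStack, FakeAWS, start_cluster) is hypothetical and invented for illustration; it is not the Cloudmesh API, and the providers only return strings rather than contacting any cloud.

```python
from abc import ABC, abstractmethod
from typing import List


class CloudProvider(ABC):
    """Hypothetical abstract interface; an assumption, not the Cloudmesh API."""

    @abstractmethod
    def start_vm(self, name: str) -> str:
        """Start a VM and return an identifier for it."""


class FakeOpenStack(CloudProvider):
    # Stand-in backend: returns a label instead of calling OpenStack.
    def start_vm(self, name: str) -> str:
        return f"openstack:{name}"


class FakeAWS(CloudProvider):
    # Stand-in backend: returns a label instead of calling AWS.
    def start_vm(self, name: str) -> str:
        return f"aws:{name}"


def start_cluster(provider: CloudProvider, size: int) -> List[str]:
    """Caller code depends only on the interface, not on a specific tool."""
    return [provider.start_vm(f"node{i}") for i in range(size)]


print(start_cluster(FakeOpenStack(), 2))
print(start_cluster(FakeAWS(), 1))
```

Because start_cluster only sees the abstract interface, swapping one backend for another requires no change to the calling code.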

Cloudmesh: from IaaS(NaaS) to Workflow (Orchestration)
Workflow (SaaS Orchestration): IPython, Pegasus etc.
Virtual Cluster (IaaS Orchestration): Heat, Python
Components: Chef or Puppet (Recipes/Puppies)
Infrastructure: VMs, Docker, Networks, Baremetal
Images, Data
HPC-ABDS software components are defined in Chef. Python (Cloudmesh) controls deployment (virtual cluster) and execution (workflow).
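The split between deployment (virtual cluster) and execution (workflow) might look roughly like the following Python sketch. It is an assumption-laden illustration: VirtualCluster, deploy, and run_workflow are hypothetical names, and recipes are only recorded by name rather than handed to Chef or Puppet.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VirtualCluster:
    """Hypothetical record of a deployed cluster (not a Cloudmesh class)."""
    nodes: List[str]
    installed: List[str] = field(default_factory=list)


def deploy(node_count: int, recipes: List[str]) -> VirtualCluster:
    """Deployment step: create nodes, then apply each recipe by name."""
    cluster = VirtualCluster(nodes=[f"node{i}" for i in range(node_count)])
    for recipe in recipes:
        # A real system would invoke Chef or Puppet here; this only records it.
        cluster.installed.append(recipe)
    return cluster


def run_workflow(cluster: VirtualCluster,
                 steps: List[Callable[[VirtualCluster], str]]) -> List[str]:
    """Execution step: run workflow steps in order on the deployed cluster."""
    return [step(cluster) for step in steps]


cluster = deploy(3, ["hadoop", "hbase"])
results = run_workflow(cluster, [
    lambda c: f"ingest on {len(c.nodes)} nodes",
    lambda c: f"analyze with {', '.join(c.installed)}",
])
print(results)
```

Keeping the two steps as separate functions mirrors the layering on the slide: the same workflow could run on a differently deployed cluster.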

Cloudmesh Functionality

Cloudmesh Components I
Cobbler: Python-based provisioning of bare-metal or hypervisor-based systems.
Apache Libcloud: Python library for interacting with many of the popular cloud service providers using a unified API. (One Interface To Rule Them All)
Celery: an asynchronous task queue/job queue environment based on RabbitMQ or equivalent, written in Python.
OpenStack Heat: a Python orchestration engine for common cloud environments, managing the entire lifecycle of infrastructure and applications.
Docker (written in Go): a tool to package an application and its dependencies in a virtual Linux container.
OCCI: an Open Grid Forum cloud instance standard.
Slurm: an open source C-based job scheduler from the HPC community, with similar functionality to OpenPBS.
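The asynchronous task queue idea behind Celery can be sketched with the Python standard library alone. This is a stand-in, not Celery: it uses a local ThreadPoolExecutor instead of a RabbitMQ-backed worker pool, and the provision task and host names are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def provision(host: str) -> str:
    """Placeholder task; a real task might call Cobbler or a cloud API."""
    return f"{host}: provisioned"


def submit_all(hosts: List[str]) -> List[str]:
    """Queue one task per host, then collect results in submission order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # submit() returns a Future immediately; the work runs in the pool.
        futures = [pool.submit(provision, host) for host in hosts]
        # result() blocks until the corresponding task has finished.
        return [future.result() for future in futures]


print(submit_all(["host-a", "host-b"]))
```

The caller hands off work and gathers results later, which is the same shape as submitting jobs to a distributed task queue, only within one process.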

Cloudmesh Components II
Chef, Ansible, Puppet, Salt: system configuration managers. Scripts are used to define the system.
Razor: cloud bare-metal provisioning from EMC/Puppet.
Juju (from Ubuntu): orchestrates services and their provisioning, defined by charms, across multiple clouds.
xCAT (originally we used this): a rather specialized (IBM) dynamic provisioning system.
Foreman (written in Ruby/JavaScript): an open source project that helps system administrators manage servers throughout their lifecycle, from provisioning and configuration to orchestration and monitoring. Builds on Puppet or Chef.
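Configuration managers of this kind typically take a description of the desired system and work out the changes needed to reach it. The Python sketch below illustrates that idea only; plan and apply_actions are hypothetical helpers, package names are sample values, and no real tool's syntax or API is reproduced.

```python
from typing import List, Set


def plan(current: Set[str], desired: Set[str]) -> List[str]:
    """Return the actions needed to move `current` to `desired`."""
    actions = [f"install {pkg}" for pkg in sorted(desired - current)]
    actions += [f"remove {pkg}" for pkg in sorted(current - desired)]
    return actions


def apply_actions(current: Set[str], actions: List[str]) -> Set[str]:
    """Apply planned actions to a copy of the current state."""
    state = set(current)
    for action in actions:
        verb, pkg = action.split(" ", 1)
        if verb == "install":
            state.add(pkg)
        else:
            state.discard(pkg)
    return state


current = {"mysql"}
desired = {"hadoop", "zookeeper"}
actions = plan(current, desired)
new_state = apply_actions(current, actions)
print(actions)
# Planning again against the converged state yields no further actions.
print(plan(new_state, desired))
```

Running the plan a second time produces an empty list, which is the idempotence property that makes such definitions safe to reapply.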

… Working with VMs in Cloudmesh: VMs panel with VM table (HP), Search

Cloudmesh MOOC Videos