Big Data Open Source Software and Projects ABDS in Summary II: Layer 5 I590 Data Science Curriculum August 15 2014 Geoffrey Fox

Presentation transcript:

Big Data Open Source Software and Projects
ABDS in Summary II: Layer 5
I590 Data Science Curriculum, August 15 2014
Geoffrey Fox
School of Informatics and Computing, Digital Science Center
Indiana University Bloomington

HPC-ABDS Layers
1) Message Protocols
2) Distributed Coordination
3) Security & Privacy
4) Monitoring
5) IaaS Management from HPC to hypervisors
6) DevOps
7) Interoperability
8) File systems
9) Cluster Resource Management
10) Data Transport
11) SQL / NoSQL / File management
12) In-memory databases & caches / Object-relational mapping / Extraction Tools
13) Inter-process communication: collectives, point-to-point, publish-subscribe
14) Basic Programming model and runtime: SPMD, Streaming, MapReduce, MPI
15) High level Programming
16) Application and Analytics
17) Workflow-Orchestration
Here are 17 functionalities. Technologies are presented in this order: the 4 cross-cutting functionalities at the top, then the other 13 in the order of the layered diagram, starting at the bottom.

Xen
Xen supports a form of type 1 virtualization known as paravirtualization, in which guests run a modified operating system. The guests are modified to use a special hypercall ABI instead of certain architectural features.
Through paravirtualization, Xen can achieve high performance even on its host architecture (x86), which has a reputation for non-cooperation with traditional virtualization techniques.
Xen was developed at the University of Cambridge and commercialized by XenSource, which Citrix acquired in 2007; the open-source Xen Project has been hosted by the Linux Foundation since 2013.
The hypervisor is responsible for memory management and CPU scheduling of all virtual machines ("domains") and for launching the most privileged domain ("dom0"), the only virtual machine which by default has direct access to hardware.
From dom0 the hypervisor can be managed and unprivileged domains ("domU") can be launched.

KVM, VirtualBox
KVM is a GPL-licensed virtualization infrastructure for the Linux kernel that turns it into a hypervisor. It is classified here as type 2, although because it runs inside the kernel it is sometimes described as type 1. It was merged into the Linux kernel mainline in February 2007 (http://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine).
– It requires a processor with a hardware virtualization extension.
Oracle VirtualBox is another well-known type 2 hypervisor, with a GPL2 license https://
– Runs on many operating systems.
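The hardware requirement for KVM can be checked on an x86 Linux host by looking for the CPU feature flags `vmx` (Intel VT-x) or `svm` (AMD-V) in `/proc/cpuinfo`. The following is a minimal sketch under that assumption; the function names are hypothetical and other architectures report their capabilities differently.

```python
def hw_virt_flags(cpuinfo_text):
    """Return the hardware-virtualization flags found in /proc/cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        # On x86 Linux, CPU features appear on lines that start with "flags".
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            found.update(f for f in flags if f in ("vmx", "svm"))
    return found


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, returning an empty string if it is unavailable."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        return ""


if __name__ == "__main__":
    flags = hw_virt_flags(read_cpuinfo())
    print("hardware virtualization flags:", sorted(flags) or "none found")
```

An empty result does not always mean the CPU lacks the extension; it may simply be disabled in the firmware settings.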

Hyper-V
Microsoft's proprietary hypervisor, which supports Windows and some variants of Linux as guests.
There must be a parent partition running Windows Server.

OpenVZ
OpenVZ is GPL licensed (http://openvz.org/Main_Page). It is sometimes listed alongside type 2 hypervisors, but it is not true virtualization; it is really containerization, like FreeBSD jails.
OpenVZ (Open VirtualiZation), or Open Virtuozzo, is an operating system-level virtualization technology based on the Linux kernel and operating system.
OpenVZ allows a physical server to run multiple isolated operating system instances, known as containers, Virtual Private Servers (VPSs), or Virtual Environments (VEs).
Docker works well with containers.
Technologies like VMware and Xen are more flexible in that they virtualize the entire machine and can run multiple operating systems and different kernel versions.
OpenVZ uses a single patched Linux kernel and therefore can run only Linux; all containers share the same architecture and kernel version.
However, as it does not have the overhead of a true hypervisor, it is very fast and efficient.
The disadvantage of this approach is the single kernel: all guests must function with the same kernel version that the host uses.
LXC (LinuX Containers) and Linux-VServer are similar technologies.

OpenStack
OpenStack, OpenNebula, CloudStack, Nimbus and Eucalyptus are all cloud or virtual machine managers. They help users and system administrators use virtual machines with various characteristics.
– The big commercial public clouds have equivalent proprietary systems.
OpenStack is a free and open-source, Apache-licensed cloud computing software platform. Users primarily deploy it as an infrastructure as a service (IaaS) solution.
The technology consists of a series of interrelated projects that control pools of processing, storage, and networking resources throughout a data center, which users manage through a web-based dashboard, command-line tools, or a RESTful API.
OpenStack began in 2010 as a joint project of Rackspace Hosting and NASA. Currently, it is managed by the OpenStack Foundation, a non-profit corporate entity established in September 2012 to promote OpenStack software and its community.
More than 200 companies have joined the project, including Arista Networks, AT&T, AMD, Avaya, Canonical, Cisco, Dell, EMC, Ericsson, Go Daddy, Hewlett-Packard, IBM, Intel, Mellanox, Mirantis, NEC, NetApp, Nexenta, Oracle, PLUMgrid, Red Hat, SUSE Linux, VMware and Yahoo!.
The OpenStack community collaborates around a six-month, time-based release cycle with frequent development milestones.
During the planning phase of each release, the community gathers for the OpenStack Design Summit to facilitate developer working sessions and to assemble plans.
The most recent OpenStack Summit, in May 2014 in Atlanta, drew 4,500 attendees, a 50% increase from the Hong Kong Summit six months earlier.
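As an illustration of the RESTful API, the sketch below builds (but does not send) a password authentication request for the OpenStack Identity service. It assumes the Identity API v3 endpoint `/v3/auth/tokens`; deployments from 2014 may expose only the older v2.0 API. The host name, user name and password are placeholders, and `build_token_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_token_request(auth_url, username, password, domain_id="default"):
    """Build a POST request asking the Identity service for a token."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"id": domain_id},
                        "password": password,
                    }
                },
            }
        }
    }
    return urllib.request.Request(
        url=auth_url.rstrip("/") + "/v3/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint and credentials; nothing is sent over the network.
    req = build_token_request("http://controller.example:5000", "demo", "secret")
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` against a real deployment would return the token in the `X-Subject-Token` response header, which is then supplied as `X-Auth-Token` on calls to the compute, storage and networking services.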

Apache CloudStack
It has a reputation for solid software but has not seen the rapid adoption of OpenStack; it is unusual for the Apache solution not to be the most popular!
It came from Citrix via acquisitions.
Features include:
– Built-in high availability for hosts and VMs
– AJAX web GUI for management
– AWS API compatibility
– Hypervisor agnostic (VMware, KVM, XenServer, Xen Cloud Platform (XCP) and Hyper-V)
– Snapshot management
– Usage metering
– Network management (VLANs, security groups)
– Virtual routers, firewalls, load balancers
– Multi-role support

Eucalyptus, Nimbus
Eucalyptus was the top academic project in 2009; it was commercialized and recently purchased by Hewlett-Packard https://
– Eucalyptus had both commercial and open source (GPL3) tracks, but the latter was not developed as vigorously as other open source solutions.
– It was perhaps the first to offer an AWS-compatible interface.
The Apache-licensed Nimbus was probably the most effective academic cloud software after Eucalyptus was commercialized and before OpenStack became popular.

FutureGrid IaaS request popularity by year

OpenNebula
Apache License.
OpenNebula orchestrates storage, network, virtualization, monitoring, and security technologies to deploy multi-tier services (e.g. compute clusters) as virtual machines on distributed infrastructures, combining both data center resources and remote cloud resources, according to allocation policies.
The toolkit includes features for integration, management, scalability, security and accounting.
It also claims standardization, interoperability and portability, providing cloud users and administrators with a choice of several cloud interfaces (Amazon EC2 Query, OGF Open Cloud Computing Interface and vCloud) and hypervisors (Xen, KVM and VMware), and can accommodate multiple hardware and software combinations in a data center.
It is a good system which is strongly promoted in Europe but little used in the USA, where it is eclipsed by OpenStack.

VMware vCloud
VMware ESX is an enterprise-level computer virtualization product offered by VMware. ESX is a component of VMware's larger offering, VMware Infrastructure, which adds management and reliability services to the core server product.
VMware recommends that deployments running the earlier ESX architecture migrate to the newer ESXi hypervisor architecture.
VMware ESX and ESXi are VMware's enterprise software type 1 hypervisors for guest virtual servers; they run on host server hardware without an underlying operating system.
vSphere uses VMware's ESXi hypervisor, adding management (as in OpenStack).
Note that the desktop product VMware Workstation is a type 2 hypervisor.
VMware has historically been a software vendor focused on virtualization technologies. It entered the cloud IaaS market when it launched the VMware vCloud Hybrid Service (vCHS) into general availability in September.
This allows customers to migrate work on demand from their "internal cloud" of cooperating VMware hypervisors to a remote cloud of VMware hypervisors.
– This is called cloud bursting.
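The idea of cloud bursting can be illustrated with a toy placement rule: keep work on the internal cloud while it has free capacity and send the overflow to the remote cloud. This is only a sketch of the concept, not VMware's actual placement logic; `place_vms` and its capacity model (free vCPUs) are assumptions made for illustration.

```python
def place_vms(requested_vcpus, local_free_vcpus):
    """Place each VM request locally if it fits, otherwise burst to the remote cloud."""
    placements = []
    free = local_free_vcpus
    for vcpus in requested_vcpus:
        if vcpus <= free:
            # The request fits in the remaining internal capacity.
            placements.append("local")
            free -= vcpus
        else:
            # Not enough internal capacity left: burst to the remote cloud.
            placements.append("remote")
    return placements


if __name__ == "__main__":
    # Five VM requests against an internal cloud with 10 free vCPUs.
    for size, where in zip([4, 4, 2, 8, 2], place_vms([4, 4, 2, 8, 2], 10)):
        print(f"{size} vCPUs -> {where}")
```

A real scheduler would also weigh data locality, network cost and licensing before moving work off site.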

Amazon, Azure, Google Clouds
Gartner has a "Magic Quadrant" summarizing public clouds (28 May).
Note Amazon is way ahead!
Google, with GCE (Google Compute Engine), is just starting IaaS. Previously it offered PaaS with Google App Engine.
Microsoft has recently expanded Azure but is still catching up.

Amazon Web Services (AWS)
Compute: Elastic Compute Cloud (EC2) offers multitenant, fixed-size and nonresizable, Xen-virtualized VMs without autorestart. Single-tenant VMs are available via Dedicated Instances. There are special options for HPC, including graphics processing units (GPUs). AWS does not have any formal private cloud offerings, though it is willing to negotiate such deals (such as its deal for the U.S. intelligence community cloud).
Storage: VM storage is ephemeral. Persistence requires VM-independent block storage (Elastic Block Store). There is an option for SSDs, as well as storage performance guarantees (Provisioned IOPS). Object-based storage (Simple Storage Service [S3]) is integrated with a CDN (CloudFront), there is an option for long-term archive storage (Glacier), and AWS offers its own cloud storage gateway appliance.
Network: AWS offers a full range of networking options. Complex networking and IPsec VPN are done via Amazon Virtual Private Cloud (VPC). Third-party connectivity is via partner exchanges (AWS Direct Connect).
Security: RBAC (Role-Based Access Control) is per-element, with customer-defined roles and exceptional control over permissions. AWS has obtained many security and compliance-related certifications and audits.
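Per-element permissions of the kind described under Security are expressed in AWS as JSON policy documents. The sketch below builds one that allows starting and stopping a single EC2 instance; the account number and instance ID in the ARN are placeholders, and `start_stop_policy` is a hypothetical helper rather than part of any AWS SDK.

```python
import json


def start_stop_policy(instance_arn):
    """Return an IAM-style policy allowing start/stop on one EC2 instance."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ec2:StartInstances", "ec2:StopInstances"],
                # Scoping the resource to one instance is what makes the
                # permission per-element rather than account-wide.
                "Resource": instance_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"
    print(json.dumps(start_stop_policy(arn), indent=2))
```

Attaching such a document to a role gives that role exactly these two actions on that one instance, in contrast to the whole-account permissions noted for the other providers.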

Google Compute Engine
Google has been operating App Engine since 2008, but did not enter the IaaS market until the general-availability launch of GCE in December.
Compute: GCE offers multitenant, fixed-size and nonresizable, KVM-virtualized VMs, metered by the minute. Provisioning is exceptionally fast (typically under 1 minute).
Storage: VM storage is persistent, and there is also VM-independent block storage. All block storage is encrypted.
Network: Third-party private connectivity is not supported. Customers cannot bring their own private IP addresses (although this need may possibly be addressed by GCE's Advanced Routing features). There is no back-end load balancing.
Security: RBAC permissions apply to the whole account.
Google's strategy for Google Cloud Platform centers on the concept of allowing other organizations to "run like Google" by taking Google's highly innovative internal technology capabilities and exposing them as services that other companies can purchase.
Consequently, although Google is a late entrant to the IaaS market, it is primarily productizing existing capabilities, rather than having to engineer those capabilities from scratch. It will therefore be able to advance its offering more rapidly than most competitors.
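The practical effect of metering by the minute can be seen with a little arithmetic. The sketch below rounds a VM's runtime up to a billing granularity and compares per-minute with per-hour rounding; the hourly rate is a made-up figure, the per-hour scheme is included only for contrast, and any minimum charge a provider may apply is ignored.

```python
def billed_cost(runtime_seconds, hourly_rate, granularity_seconds):
    """Cost of a run when usage is rounded up to whole billing units."""
    # Integer ceiling division: number of started billing units.
    units = -(-runtime_seconds // granularity_seconds)
    return units * granularity_seconds / 3600 * hourly_rate


if __name__ == "__main__":
    runtime = 61 * 60   # a 61-minute job
    rate = 0.10         # hypothetical price per VM-hour
    print("per-minute metering:", round(billed_cost(runtime, rate, 60), 4))
    print("per-hour metering:  ", round(billed_cost(runtime, rate, 3600), 4))
```

For short or bursty jobs, finer granularity means paying for roughly the time actually used instead of for every started hour.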

Microsoft Azure
The Azure business was previously strictly PaaS with a Windows and .NET focus, but Microsoft launched Azure Infrastructure Services (which include Azure Virtual Machines and Azure Virtual Network) into general availability in April 2013, thus entering the cloud IaaS market.
Compute: Azure VMs (Linux or Windows) are fixed-size, paid-by-the-VM, and Hyper-V-virtualized; they are metered by the minute.
Storage: Block storage ("virtual hard disk") is persistent and VM-independent. Object-based cloud storage is integrated with a CDN.
Network: There is no support for complex network topologies. Third-party connectivity is via partner exchange (Azure ExpressRoute).
Security: Virtual network topology limitations prevent useful deployment of most security-related virtual appliances, such as a perimeter intrusion detection/prevention system (IDS/IPS). RBAC uses Azure Active Directory, but permissions are whole-account.

Google Cloud DNS & Amazon Route 53
Google Cloud DNS
– Authoritative DNS server available as a service in Google Cloud
– The service is efficient, fault-tolerant and available globally
– It can be used by the user's hosted services in Google Cloud or by third-party applications
Amazon Route 53
– Authoritative DNS server available as a service in Amazon AWS
– Provides a fault-tolerant, very fast DNS service
– Like Google Cloud DNS, it can be used by the hosted services in the Amazon cloud or by third-party applications
– The service is available on all continents except Africa
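Records in an authoritative DNS service of this kind are managed through an API rather than by editing zone files. As a rough illustration, the sketch below builds the change batch structure that Route 53 accepts for creating or updating an A record. The domain name and address are placeholders, `a_record_change` is a hypothetical helper, and submitting the batch (for example through an AWS SDK) is left out.

```python
import ipaddress


def a_record_change(name, ipv4, ttl=300):
    """Build a Route 53 style change batch that upserts one A record."""
    # Validate the address early; raises ValueError for a malformed IPv4 address.
    ipaddress.IPv4Address(ipv4)
    return {
        "Comment": "illustrative change batch",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or replace it if present
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ipv4}],
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder name and documentation-range address.
    print(a_record_change("www.example.com.", "192.0.2.10"))
```

Google Cloud DNS exposes the same idea through its own API, with additions and deletions of record sets grouped into a change request.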