Big Data Open Source Software and Projects ABDS in Summary III: Layer 5-Part 1 Data Science Curriculum March 5 2015 Geoffrey Fox

Big Data Open Source Software and Projects ABDS in Summary III: Layer 5-Part 1 Data Science Curriculum March Geoffrey Fox School of Informatics and Computing Digital Science Center Indiana University Bloomington

Functionality of 21 HPC-ABDS Layers
1) Message Protocols
2) Distributed Coordination
3) Security & Privacy
4) Monitoring
5) IaaS Management from HPC to hypervisors (Part 1 covered here)
6) DevOps
7) Interoperability
8) File systems
9) Cluster Resource Management
10) Data Transport
11) A) File management, B) NoSQL, C) SQL
12) In-memory databases & caches / Object-relational mapping / Extraction Tools
13) Inter-process communication: collectives, point-to-point, publish-subscribe, MPI
14) A) Basic programming model and runtime: SPMD, MapReduce; B) Streaming
15) A) High-level programming; B) Application hosting frameworks
16) Application and Analytics
17) Workflow-Orchestration
These are 21 functionalities (counting the subparts of layers 11, 14 and 15): the 4 cross-cutting functionalities appear at the top of the layered diagram, and the remaining 17 follow the diagram's order starting from the bottom.
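To make the count of 21 concrete, here is a small Python sketch that encodes the layer list as a dictionary and tallies the cross-cutting and layered functionalities, counting each subpart separately. The dictionary layout and the names `LAYERS`, `CROSS_CUTTING` and `count_functionalities` are illustrative assumptions, not part of HPC-ABDS itself.

```python
# Hypothetical encoding of the HPC-ABDS functionalities: layer number -> subparts.
LAYERS = {
    1: ["Message Protocols"],
    2: ["Distributed Coordination"],
    3: ["Security & Privacy"],
    4: ["Monitoring"],
    5: ["IaaS Management from HPC to hypervisors"],
    6: ["DevOps"],
    7: ["Interoperability"],
    8: ["File systems"],
    9: ["Cluster Resource Management"],
    10: ["Data Transport"],
    11: ["File management", "NoSQL", "SQL"],
    12: ["In-memory databases & caches / Object-relational mapping / Extraction Tools"],
    13: ["Inter-process communication"],
    14: ["Basic programming model and runtime", "Streaming"],
    15: ["High-level programming", "Application hosting frameworks"],
    16: ["Application and Analytics"],
    17: ["Workflow-Orchestration"],
}

CROSS_CUTTING = range(1, 5)  # layers 1-4 cut across the whole stack


def count_functionalities(layers):
    """Return (cross_cutting, layered, total), counting every subpart."""
    cross = sum(len(parts) for n, parts in layers.items() if n in CROSS_CUTTING)
    layered = sum(len(parts) for n, parts in layers.items() if n not in CROSS_CUTTING)
    return cross, layered, cross + layered


print(count_functionalities(LAYERS))  # (4, 17, 21)
```

The 13 stacked layers (5 to 17) become 17 functionalities because layer 11 contributes three subparts and layers 14 and 15 contribute two each.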

Xen
Xen is a type 1 (bare-metal) hypervisor that supports paravirtualization, in which guests run a modified operating system. The guests are modified to use a special hypercall ABI instead of certain architectural features. Through paravirtualization, Xen can achieve high performance even on x86, an architecture with a reputation for not cooperating with traditional virtualization techniques. Xen was developed at the University of Cambridge; Citrix acquired its commercial sponsor, XenSource, in 2007, and the open-source Xen Project has been hosted by the Linux Foundation since 2013.

The hypervisor's responsibilities include memory management and CPU scheduling for all virtual machines ("domains"), and launching the most privileged domain ("dom0"), the only virtual machine that by default has direct access to hardware. From dom0 the hypervisor can be managed and unprivileged domains ("domU") can be launched.
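The dom0/domU split of privilege can be illustrated with a toy Python model. This is not Xen's API or toolstack: the `Hypervisor` and `Domain` classes are hypothetical and capture only two rules from the slide, namely that by default only dom0 has direct hardware access and that unprivileged domUs are launched from dom0.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Domain:
    """A Xen-style virtual machine ("domain") in this toy model."""
    name: str
    privileged: bool = False  # only dom0 is created privileged

    @property
    def direct_hardware_access(self) -> bool:
        # By default only the privileged domain (dom0) talks to hardware directly.
        return self.privileged


@dataclass
class Hypervisor:
    """Toy hypervisor: boots dom0, then launches domUs on dom0's request."""
    domains: List[Domain] = field(default_factory=list)

    def boot(self) -> Domain:
        dom0 = Domain("dom0", privileged=True)
        self.domains.append(dom0)
        return dom0

    def launch_domu(self, requester: Domain, name: str) -> Domain:
        # Management requests are honored only when they come from dom0.
        if not requester.privileged:
            raise PermissionError(f"{requester.name} may not launch domains")
        domu = Domain(name)
        self.domains.append(domu)
        return domu


hv = Hypervisor()
dom0 = hv.boot()
web = hv.launch_domu(dom0, "domU-web")
print(dom0.direct_hardware_access, web.direct_hardware_access)  # True False
```

A request from `web` to launch another domain raises `PermissionError`, mirroring the rule that domUs are managed from dom0 rather than from one another.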

KVM, VirtualBox
KVM (Kernel-based Virtual Machine) is a GPL-licensed type 2 virtualization infrastructure for the Linux kernel that turns the kernel into a hypervisor; it was merged into the Linux kernel mainline in February 2007 (http://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine).
– It requires a processor with hardware virtualization extensions.
Oracle VirtualBox is another well-known type 2 hypervisor, with a GPLv2 license (https://).
– It runs on many operating systems.
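Because KVM needs hardware virtualization extensions, a common first check on an x86 Linux host is to look for the `vmx` (Intel VT-x) or `svm` (AMD-V) flag in `/proc/cpuinfo` and for the `/dev/kvm` device node. The Python sketch below assumes an x86 Linux host; the function names are hypothetical, and other architectures report CPU features differently.

```python
import os
from typing import Set

# x86 CPU flags reported in /proc/cpuinfo for hardware virtualization:
# "vmx" = Intel VT-x, "svm" = AMD-V.
VIRT_FLAGS = {"vmx": "Intel VT-x", "svm": "AMD-V"}


def virtualization_flags(cpuinfo_text: str) -> Set[str]:
    """Return the hardware-virtualization flags present in /proc/cpuinfo text."""
    found: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # One "flags" line per logical CPU; the set removes duplicates.
            found.update(flag for flag in value.split() if flag in VIRT_FLAGS)
    return found


def kvm_usable(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    """True if the CPU advertises VT-x/AMD-V and the /dev/kvm device node exists."""
    try:
        with open(cpuinfo_path) as f:
            text = f.read()
    except OSError:
        return False  # not Linux, or /proc is unavailable
    return bool(virtualization_flags(text)) and os.path.exists("/dev/kvm")


sample = "processor\t: 0\nflags\t\t: fpu vme de pse vmx sse2\n"
print(virtualization_flags(sample))  # {'vmx'}
```

The flag only shows that the CPU supports the extension; `/dev/kvm` appearing additionally indicates that the KVM kernel modules are loaded.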

Hyper-V
Hyper-V is a proprietary Microsoft hypervisor that supports Windows and some variants of Linux as guests. There must be a parent partition running Windows Server.

OpenVZ
OpenVZ is presented here as a type 2 hypervisor with a GPL license (http://openvz.org/Main_Page). OpenVZ (Open VirtualiZation, or Open Virtuozzo) is an operating-system-level virtualization technology based on the Linux kernel and operating system. It allows a physical server to run multiple isolated operating system instances, known as containers, Virtual Private Servers (VPSs), or Virtual Environments (VEs). Docker is likewise built around containers.

Strictly, OpenVZ is not true virtualization but containerization, like FreeBSD jails. Technologies like VMware and Xen are more flexible in that they virtualize the entire machine and can run multiple operating systems and different kernel versions. OpenVZ uses a single patched Linux kernel and therefore can run only Linux: all containers share the same architecture and kernel version, so every guest must work with the kernel version the host uses. That single kernel is the main disadvantage; the advantage is that, without the overhead of a true hypervisor, OpenVZ is very fast and efficient. LXC (LinuX Containers) and Linux-VServer are similar technologies.
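The single-kernel constraint can be written as a simple placement rule. The Python sketch below is a hypothetical model, not an OpenVZ or LXC tool: a container guest must be Linux and must match the host's kernel version and architecture, while a fully virtualized guest only needs a matching CPU architecture (we assume no cross-architecture emulation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    os_family: str       # e.g. "linux", "windows"
    kernel_version: str  # kernel the machine runs (host) or needs (guest)
    arch: str            # e.g. "x86_64"


def can_host(host: Machine, guest: Machine, mode: str) -> bool:
    """Simplified rule: can `host` run `guest` as a container or as a full VM?"""
    if guest.arch != host.arch:
        return False  # this model assumes no cross-architecture emulation
    if mode == "container":
        # OpenVZ/LXC style: every guest shares the host's single Linux kernel.
        return guest.os_family == "linux" and guest.kernel_version == host.kernel_version
    if mode == "full_vm":
        # Xen/VMware style: each guest boots its own operating system and kernel.
        return True
    raise ValueError(f"unknown mode: {mode!r}")


host = Machine("linux", "2.6.32", "x86_64")
windows = Machine("windows", "6.3", "x86_64")
print(can_host(host, windows, "container"), can_host(host, windows, "full_vm"))  # False True
```

The same rule rejects a Linux guest that needs a newer kernel as a container but accepts it as a full VM, which is exactly the flexibility-versus-overhead trade-off described above.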

OpenStack
OpenStack, OpenNebula, CloudStack, Nimbus and Eucalyptus are all cloud or virtual machine managers. They help users and system administrators use virtual machines with various characteristics.
– The big commercial public clouds have equivalent proprietary systems.

OpenStack is a free, open-source, Apache-licensed cloud computing software platform. Users primarily deploy it as an infrastructure as a service (IaaS) solution. The technology consists of a series of interrelated projects (such as Heat in DevOps and Swift in storage) that control pools of processing, storage, and networking resources throughout a data center, which users manage through a web-based dashboard, command-line tools, or a RESTful API.

OpenStack began in 2010 as a joint project of Rackspace Hosting and NASA. It is now managed by the OpenStack Foundation, a non-profit corporate entity established in September 2012 to promote OpenStack software and its community. More than 200 companies have joined the project, including Arista Networks, AT&T, AMD, Avaya, Canonical, Cisco, Dell, EMC, Ericsson, Go Daddy, Hewlett-Packard, IBM, Intel, Mellanox, Mirantis, NEC, NetApp, Nexenta, Oracle, PLUMgrid, Red Hat, SUSE Linux, VMware and Yahoo!.

The OpenStack community collaborates around a six-month, time-based release cycle with frequent development milestones. During the planning phase of each release, the community gathers for the OpenStack Design Summit to hold developer working sessions and assemble plans. The OpenStack Summit in Atlanta in May 2014 drew 4,500 attendees, a 50% increase over the Hong Kong Summit six months earlier.
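As a sketch of what "managing through a RESTful API" looks like, the Python code below builds (but does not send) a Compute (Nova) "list servers" request, authenticated with an `X-Auth-Token` header, and parses the `servers` array of a response body. The endpoint URL and token are hypothetical placeholders, and the helper names are illustrative; a real deployment publishes its endpoints in the Keystone service catalog and issues tokens through Keystone.

```python
import json
import urllib.request
from typing import List


def list_servers_request(compute_endpoint: str, token: str) -> urllib.request.Request:
    """Build, but do not send, a Compute API 'list servers' request."""
    return urllib.request.Request(
        url=compute_endpoint.rstrip("/") + "/servers",
        headers={"X-Auth-Token": token, "Accept": "application/json"},
        method="GET",
    )


def server_names(response_body: bytes) -> List[str]:
    """Extract server names from a 'list servers' JSON response body."""
    return [server["name"] for server in json.loads(response_body)["servers"]]


# Hypothetical endpoint and token, for illustration only.
req = list_servers_request("https://compute.example.com/v2.1/", "EXAMPLE-TOKEN")
print(req.get_method(), req.full_url)  # GET https://compute.example.com/v2.1/servers
```

Against a real cloud the request would be passed to `urllib.request.urlopen(req)` and the response body handed to `server_names`; in day-to-day use the `openstack server list` command wraps the same call.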

Apache CloudStack
Apache CloudStack has a reputation for solid software but has not seen the enthusiastic adoption of OpenStack; it is unusual that the Apache solution is not the most popular! It came to Apache from Citrix, which had obtained it through the acquisition of Cloud.com. Features include:
– Built-in high availability for hosts and VMs
– AJAX web GUI for management
– AWS API compatibility
– Hypervisor agnostic (VMware, KVM, XenServer, Xen Cloud Platform (XCP) and Hyper-V)
– Snapshot management
– Usage metering
– Network management (VLANs, security groups)
– Virtual routers, firewalls, load balancers
– Multi-role support

Eucalyptus, Nimbus
Eucalyptus was a top academic project in 2009; it was then commercialized and has recently been purchased by Hewlett-Packard (https://).
– Eucalyptus had both commercial and open-source (GPL3) tracks, but the open-source track was not developed as vigorously as other open-source solutions.
– It was perhaps the first to offer an AWS-compatible interface.
Apache-licensed Nimbus was probably the most effective academic cloud software after Eucalyptus was commercialized and before OpenStack became popular.