UPV-IBM’S BIG DATA OBSERVATORY & HADOOP INFRASTRUCTURE MANAGEMENT Damian Segrelles, Germán Moltó & Ignacio Blanquer,

Slides:



Advertisements
Similar presentations
FI-WARE – Future Internet Core Platform FI-WARE Cloud Hosting July 2011 High-level description.
Advertisements

CloudMoV: Cloud-based Mobile Social TV
© 2012 IBM Corporation Build a low-touch, highly scalable cloud with IBM SmartCloud Provisioning.
1/17 Distributed Systems Architecture Research Group Universidad Complutense de Madrid Execution of SGE Clusters on top of Hybrid Clouds using OpenNebula.
Software to Data model Lenos Vacanas, Stelios Sotiriadis, Euripides Petrakis Technical University of Crete (TUC), Greece Workshop.
 Cloud computing  Workflow  Workflow lifecycle  Workflow design  Workflow tools : xcp, eucalyptus, open nebula.
An emerging computing paradigm where data and services reside in massively scalable data centers and can be ubiquitously accessed from any connected devices.
Windows Azure: Microsoft’s Cloud Platform By Shahed Chowdhuri.
Microsoft Virtual Academy.
Large Scale Sky Computing Applications with Nimbus Pierre Riteau Université de Rennes 1, IRISA INRIA Rennes – Bretagne Atlantique Rennes, France
CloudNaaS: A Cloud Networking Platform for Enterprise Applications Theophilus Benson*, Aditya Akella*, Anees Shaikh +, Sambit Sahu + (*University of Wisconsin,
Luis Russi¹, Carlos R. Senna¹, Edmundo R. M. Madeira¹, Xuan Liu², Shuai Zhao², and Deep Medhi² Hadoop-in-a-Hybrid-Cloud GEC21 The 21st GENI Engineering.
Biomedical Big Data Training Collaborative biobigdata.ucsd.edu BBDTC UPDATES Biomedical Big Data Training Collaborative biobigdata.ucsd.edu.
Actualog Social PIM Helps Companies to Manage and Share Product Information Using Secure, Scalable Ease of Microsoft Azure MICROSOFT AZURE ISV PROFILE:
Big Data Open Source Software and Projects ABDS in Summary IV: Level 7 I590 Data Science Curriculum August Geoffrey Fox
WLCG infrastructure monitoring proposal Pablo Saiz IT/SDC/MI 16 th August 2013.
Windows Azure. Azure Application platform for the public cloud. Windows Azure is an operating system You can: – build a web application that runs.
Microsoft Azure Active Directory. AD Microsoft Azure Active Directory.
Recipes for Success with Big Data using FutureGrid Cloudmesh SDSC Exhibit Booth New Orleans Convention Center November Geoffrey Fox, Gregor von.
MidVision Enables Clients to Rent IBM WebSphere for Development, Test, and Peak Production Workloads in the Cloud on Microsoft Azure MICROSOFT AZURE ISV.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Running Big Data on the EGI Federated Cloud Javier Lopez Cacheiro, Álvaro.
StratusLab is co-funded by the European Community’s Seventh Framework Programme (Capacities) Grant Agreement INFSO-RI StratusLab Collaborations.
Enabling the Cloud OS Today  New high-density Web Sites with elastic cloud scaling and complete dev-ops experiences  New rich IaaS experience for self-service.
GRNET Cloud Services and Collaborations Kostas Koumantaros {kkoum at grnet.gr}
OpenNebula: Experience at SZTAKI Peter Kacsuk, Sandor Acs, Mark Gergely, Jozsef Kovacs MTA SZTAKI EGI CF Helsinki.
Grappling Cloud Infrastructure Services with a Generic Image Repository Javier Diaz Andrew J. Younge, Gregor von Laszewski, Fugang.
Deploying Highly Available SQL Server in Windows Azure A Presentation and Demonstration by Microsoft Cluster MVP David Bermingham.
PLATFORM TO EASE THE DEPLOYMENT AND IMPROVE THE AVAILABILITY OF TRENCADIS INFRASTRUCTURE IberGrid 2013 Miguel Caballer GRyCAP – I3M - UPV.
Overview of the global architecture Giacinto DONVITO INFN-Bari.
StratusLab is co-funded by the European Community’s Seventh Framework Programme (Capacities) Grant Agreement INFSO-RI Demonstration StratusLab First.
Possible contributions of UPV to WP5 (NDIGO- DataCloud) Germán Moltó UPV RIA
EGI-Engage is co-funded by the Horizon 2020 Framework Programme of the European Union under grant number WG on Python and WG on Workflows.
Platform & Engineering Services CERN IT Department CH-1211 Geneva 23 Switzerland t PES Agile Infrastructure Project Overview : Status and.
Instituto de Biocomputación y Física de Sistemas Complejos Cloud resources and BIFI activities in JRA2 Reunión JRU Española.
IBERGRID as RC Total Capacity: > 10k-20K cores, > 3 Petabytes Evolving to cloud (conditioned by WLCG in some cases) Capacity may substantially increase.
DIRAC for Grid and Cloud Dr. Víctor Méndez Muñoz (for DIRAC Project) LHCb Tier 1 Liaison at PIC EGI User Community Board, October 31st, 2013.
WP5 – Infrastructure Operations Test and Production Infrastructures StratusLab kick-off meeting June 2010, Orsay, France GRNET.
StratusLab is co-funded by the European Community’s Seventh Framework Programme (Capacities) Grant Agreement INFSO-RI StratusLab: Enhancing Grid.
© 2007 IBM Corporation IBM Software Strategy Group IBM Google Announcement on Internet-Scale Computing (“Cloud Computing Model”) Oct 8, 2007 IBM Confidential.
SCI-BUS is supported by the FP7 Capacities Programme under contract nr RI CloudBroker usage Zoltán Farkas MTA SZTAKI LPDS
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI /09/14 1 Appliance lifecycle services Marios Chatziangelou, et al.
Federated Cloud: Computing UPVLC-I3M Effort allocated: 6 pms. Proposal: Integration of an Infrastructure Broker with self-configuration and auto-scaling.
DreamFactory for Microsoft Azure Is an Open Source REST API Platform That Enables Mobilization of Data in Minutes across Frameworks and Storage Methods.
In Depth Azure StackIn Depth Azure Stack Resource Providers Damian Flynn MVP Daniel Savage Microsoft.
Project Cumulus Overview March 15, End Goal Unified Public & Private PaaS for GlassFish/Java EE Simplify deployment of Java EE Apps on top of.
1 EGI Federated Cloud Architecture Matteo Turilli Senior Research Associate, OeRC, University of Oxford Chair – EGI Federated Clouds Task Force
A cloud-based architecture for supporting medical imaging analysis through INDIGO-DataCloud Ignacio Blanquer, Germán Moltó, Miguel Caballer (UPV) Luis.
PaaS services for Computing and Storage
Course: Cluster, grid and cloud computing systems Course author: Prof
Use of HLT farm and Clouds in ALICE
Danilo Ardagna Politecnico di
StratusLab First Periodic Review
StratusLab Roadmap C. Loomis (CNRS/LAL) EGI TCB (Amsterdam)
Population Imaging Use Case - EuroBioImaging
IaaS Layer – Solutions for “Enablers”
StratusLab Final Periodic Review
StratusLab Final Periodic Review
Free Cloud Management Portal for Microsoft Azure Empowers Enterprise Users to Govern Their Cloud Spending and Optimize Cloud Usage and Planning MICROSOFT.
PaaS Core Session (Notes from UPV)
DI4R, 30th September 2016, Krakow
What are the most popular services offered by Amazon Web Services..?Amazon Web Services
Action U-E-5 Technical Coordination – User Technical Support
Nimble Streamer Helps Media Content Providers Create Streaming Networks Cost-Effectively and Easily by Utilizing Azure’s Worldwide Scalability MICROSOFT.
An easier path? Customizing a “Global Solution”
Management of Virtual Execution Environments 3 June 2008
OpenNebula Offers an Enterprise-Ready, Fully Open Management Solution for Private and Public Clouds – Try It Easily with an Azure Marketplace Sandbox MICROSOFT.
Dell Data Protection | Rapid Recovery: Simple, Quick, Configurable, and Affordable Cloud-Based Backup, Retention, and Archiving Powered by Microsoft Azure.
Orchestration & Container Management in EGI FedCloud
OpenStack Summit Berlin – November 14, 2018
Presentation transcript:

UPV-IBM’S BIG DATA OBSERVATORY & HADOOP INFRASTRUCTURE MANAGEMENT Damian Segrelles, Germán Moltó & Ignacio Blanquer,

IBM-UPV BIG DATA OBSERVATORY A joint action by the School of Informatics of the UPV and IBM to promote BigData practicals in post-graduate courses. IBM has offered a free Enterprise license to IBM InfoSphere BigInsights (An IBM’s version of Hadoop and other analytic tools for visualization, “R”, text analysis and tables). Usage restricted to academic and research activities in the UPV – not production. Main common interest is to educate students on BigData technologies Currently a post-graduate course and a Master Degree.

RESOURCES InfoSphere BigInsights is installed on our cloud on-premises UPV’s site is for application development and prototyping. IBM offered the possibility of using IBM Bluemix IaaS for production. Currently two projects under development Suitability of university degrees according to employment demand. Extraction of trending topics from live streamed content in radio and tv.

PLANS AND OPPORTUNITIES The work of the observatory aims at: Supporting the research part of the academic activity of forthcoming post-degree students (40-60 per year). Create success stories and new use cases. Potential links with EGI (To be proposed to IBM) Support BigInsights from UPV’s FedCloud site. Scale-up the concept of Observatory to other centres in EGI.

IAAS MANAGEMENT FRAMEWORK OF GRYCAP

IAAS MANAGING SERVICES IM COMPaaS EC3 VMCA VMRC CLUES A SLA management framework for IaaS. A Virtual Machine Consolidation Agent dons.php An energy management system for Clusters and Cloud infrastructures. Automatic configuration and recontextualization service. Creation of elastic virtual clusters on top of both public and on-premise IaaS providers A rich metadata repository of Virtual Machine Images.

INFRASTRUCTURE MANAGER General platform to deploy on demand customizable virtual computing infrastructures. – Multiple VMs with multiple configurations. – No need of pre-packaged VMIs. Enable re-using of VMIs. Infrastructure-Agnostic – Currently supports: OpenNebula, OpenStack, EC2, GCE, OCCI (FedCloud), FogBow, Docker, LibVirt. Service with two APIs: XML- RPC and REST. Working Scheme: – Receives a document (RADL) Resource and Application Description Language High level Language to define virtual infrastructures – Specify VM requirements – Contact the catalog to get a list with the most suitable VMI. – Obtain the list of IaaS providers available to the user. – Selects the best combination. – Contact the IaaS provider selected to deploy the infrastructure and finally configure it.

INFRASTRUCTURE MANAGER (II) network privada network publica (outbound = 'yes') system front ( cpu.arch='x86_64' and cpu.count>=2 and memory.size>=2048m and net_interface.0.connection = 'publica' and net_interface.1.connection = 'privada' and net_interface.1.dns_name = 'hadoopmaster' and disk.0.os.name='linux' and disk.0.os.flavour='ubuntu' and disk.0.os.version='12.04' and disk.0.applications contains (name='ansible.modules.micafer.hadoop') ) system wn ( cpu.arch='x86_64' and memory.size>=2048m and net_interface.0.connection='privada' and disk.0.os.name='linux' and disk.0.os.flavour='ubuntu') configure front - roles: - { role: 'micafer.hadoop', hadoop_master: 'hadoopmaster', hadoop_type_of_node: 'master' ) configure wn - roles: - { role: 'micafer.hadoop', hadoop_master: 'hadoopmaster' ) deploy front 1 deploy wn 5

HADOOP CLUSTER 11 nodes (EC2) 51 nodes (EC2) 6 nodes (ONE) 11 nodes (ONE) Creation9:2914:2612:0522:13 Node Addition5:4611:235:135:33 Node Removal3:176:361:181:32 Creation Times EC2: Master: c1.medium 1.5 GB de RAM and 2 cores WNs: m1.small. 1.5 GB de RAM and 1 cores ONE: Tailored requirements (1 core and 2 GB RAM). Overhead with many MVs due to bottleneck in network accesses Deploys a hadoop cluster (1 head node and 5 WNs) Uses a special application requirement ( ansible.modules.micafer.hadoop) It downloads roles from Ansible Galaxy ( The configure section uses the galaxy roles.

EC3 EC3 = IM + CLUES IM is used in conjunction with a energy management software for clusters called CLUES to enable the deployment of elastic virtual clusters on cloud platforms. Elastic Cloud Computing Cluster

EC3 CLUES and the IM interact at two levels: 1.EC3 launcher deploys a front-end node configuring CLUES and other instance of the IM service. 2.CLUES uses internally IM to deploy the VMs that will be used as working nodes for the cluster. For that, it uses a RADL document defined by the sysadmin, where the features of the working nodes are specified. Once these nodes are available, they are automatically integrated in the cluster as new available nodes for the LRMS. Elastic Cloud Computing Cluster

CONTACTS IBM-UPV Big Data Observatory Damià Segrelles: GRyCAP IaaS management ecosystem Germán Moltó: GRyCAP head Ignacio Blanquer: