Hadoop-in-a-Hybrid-Cloud
Luis Russi¹, Carlos R. Senna¹, Edmundo R. M. Madeira¹, Xuan Liu², Shuai Zhao², and Deep Medhi²
¹Institute of Computing, State University of Campinas – Brazil
²University of Missouri–Kansas City – USA
GEC21: The 21st GENI Engineering Conference
Oct 20-23, Bloomington, IN, USA

Agenda
Motivation and Objectives
Proposed Architecture
  – Web Cloud Portal
  – Execution Engine
  – Execution Service
Why use GENI
Testbed

Motivation and Objectives
Why
  – Hadoop installed in a private cloud may not have sufficient resources for all types of computational requirements
  – A seamless environment is needed in which Hadoop in a private cloud can access resources in other clouds
Hybrid Cloud
An architecture to orchestrate Hadoop applications in hybrid clouds
  – Automatic preparation of a cross-domain cluster
  – Provisioning of files
  – Making the results available to the user

Motivation and Objectives (cont.)
Executing Hadoop applications in a hybrid cloud is not easy!
  – Takes time
  – Requires technical knowledge
  – Requires continuous evaluation of cloud resources
  – Requires on-demand preparation of public cloud resources
  – Requires an appropriate model that combines performance with minimal cost
The GENI platform allows us to test the Hadoop-in-a-hybrid-cloud concept

The Proposed Architecture
HM – Hadoop Master Node
HW – Hadoop Worker Nodes

Web Cloud Portal
User interface
Management of files (application, data and submission)
Simple XML-based submission file
  – Number of Virtual Machines (VMs)
  – Image identification (Hadoop Master and Workers)
  – Requirements of VMs (memory, disk, flavor, etc.)
Organizing the application workspace

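The slides do not show the schema of the submission file, so the following is only a minimal sketch of how such a file could be read. Every element and attribute name (`submission`, `vms`, `images`, `requirements`, and their attributes) and every value is a hypothetical placeholder chosen to mirror the three items listed above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical submission file: all element and attribute names are
# assumptions for illustration; the real schema is not given in the slides.
SUBMISSION_XML = """
<submission>
  <vms count="4"/>
  <images master="hadoop-master-img" worker="hadoop-worker-img"/>
  <requirements memory_mb="4096" disk_gb="40" flavor="m1.medium"/>
</submission>
"""


@dataclass
class Submission:
    vm_count: int
    master_image: str
    worker_image: str
    memory_mb: int
    disk_gb: int
    flavor: str


def parse_submission(xml_text: str) -> Submission:
    """Extract VM count, image identifiers and VM requirements."""
    root = ET.fromstring(xml_text.strip())
    vms = root.find("vms")
    images = root.find("images")
    req = root.find("requirements")
    return Submission(
        vm_count=int(vms.get("count")),
        master_image=images.get("master"),
        worker_image=images.get("worker"),
        memory_mb=int(req.get("memory_mb")),
        disk_gb=int(req.get("disk_gb")),
        flavor=req.get("flavor"),
    )


submission = parse_submission(SUBMISSION_XML)
print(submission)
```

Keeping the parsed values in one typed record makes it easy to hand the same object to whichever component later provisions the cluster.
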
Orchestration Engine
Prepares a workspace in the private cloud's storage
Creates an Execution Service Instance (ESI) already associated with this cloud storage area
Releases the ESI to manage the application execution (asynchronously)
Copies the resulting files from the cloud storage to the user's workspace
Removes the ESI
Notifies the Web Cloud Portal (WCP)

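The sequence of steps on this slide can be sketched as follows. This is not the authors' implementation: cloud storage is emulated with a local temporary directory, the `ExecutionServiceInstance` class and the `orchestrate` and `notify_wcp` names are hypothetical, and the Hadoop run is replaced by writing a single placeholder result file.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


class ExecutionServiceInstance:
    """Hypothetical stand-in for an ESI bound to one cloud-storage area."""

    def __init__(self, storage_area: Path):
        self.storage_area = storage_area

    def run(self) -> None:
        # Placeholder for the real Hadoop execution: write one result file.
        (self.storage_area / "part-00000").write_text("hadoop\t1\n")


def orchestrate(user_workspace: Path, notify_wcp) -> list:
    # 1. Prepare a workspace in the (here: local, temporary) cloud storage.
    storage_area = Path(tempfile.mkdtemp(prefix="cloud-storage-"))
    try:
        # 2. Create an ESI already associated with that storage area.
        esi = ExecutionServiceInstance(storage_area)
        # 3. Release the ESI on a separate thread, then wait for completion.
        with ThreadPoolExecutor(max_workers=1) as pool:
            pool.submit(esi.run).result()
        # 4. Copy the result files to the user's workspace.
        copied = []
        for result_file in sorted(storage_area.iterdir()):
            shutil.copy(result_file, user_workspace / result_file.name)
            copied.append(result_file.name)
        # 5. Remove the ESI.
        del esi
    finally:
        shutil.rmtree(storage_area, ignore_errors=True)
    # 6. Notify the Web Cloud Portal.
    notify_wcp(copied)
    return copied


workspace = Path(tempfile.mkdtemp(prefix="user-workspace-"))
orchestrate(workspace, lambda files: print("WCP notified:", files))
```
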
Execution Engine
The ES Instance interacts with the private cloud monitoring system to evaluate the condition of the computational resources
Checks for extra resources in the public cloud (if needed)
Prepares the Hadoop cluster automatically (Master and Workers)
Copies the resulting files from HDFS to the cloud storage accessible by the Orchestration Engine
Removes all VMs involved
Notifies the Orchestration Engine when processing ends
Monitors all stages of processing

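The slide states only that the public cloud is checked for extra resources when needed; it does not describe the decision rule. The sketch below assumes a simple "private first" policy, in which workers are placed in the private cloud up to its free capacity and only the remainder is requested from the public cloud. The names `ClusterPlan` and `plan_cluster` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    private_workers: int
    public_workers: int


def plan_cluster(requested_workers: int, private_free: int) -> ClusterPlan:
    """Split the requested workers between private and public clouds.

    Assumed policy (not taken from the slides): fill the private cloud
    first, then request only the shortfall from the public cloud.
    """
    if requested_workers < 0 or private_free < 0:
        raise ValueError("worker counts must be non-negative")
    private = min(requested_workers, private_free)
    return ClusterPlan(private_workers=private,
                       public_workers=requested_workers - private)


# Six workers requested, four free slots reported by private-cloud monitoring.
print(plan_cluster(6, 4))
```

A cost-aware model, as called for in the motivation, would replace this rule with one that also weighs the price and performance of public cloud VMs.
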
Why use GENI?
Great environment for testing the hybrid cloud
High-speed networks
Provisionable environments for cloud computing
Public cloud deployment
Cluster installation automation
API integration

UNICAMP-UMKC Hybrid Testbed
Word Count Java software prototype

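The prototype itself is Java software and is not reproduced in the slides. For readers unfamiliar with the test application, the sketch below shows, in plain Python and without any Hadoop API, the map and reduce phases that a word count job performs. The function names and the lower-casing of words are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(pairs):
    """Sum the counts emitted for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def word_count(lines):
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    return reduce_phase(pairs)


print(word_count(["Hadoop in a cloud", "hadoop in GENI"]))
```

In a real Hadoop job the map calls run in parallel on the worker nodes and the framework groups pairs by word before reduction; here both phases run sequentially in one process.
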
Initial Results
Deployed ExoGENI virtual machines with Hadoop
Included the UMKC compute node in the UNICAMP cloud controller
Established a GRE tunnel between UMKC and UNICAMP

Future Work
Join the ExoGENI virtual machines and the cloud Hadoop cluster
Execute the WordCount Hadoop application on the cluster
Integrate the GENI API into the private cloud framework

Hadoop-in-a-Hybrid-Cloud
Thank you!