EVGM081 Multi-Site Virtual Cluster: A User-Oriented, Distributed Deployment and Management Mechanism for Grid Computing Environments. Takahiro Hirofuchi, Takeshi Yokoi, Tadashi Ebara, Yusuke Tanimura, Hirotaka Ogawa, Hidetomo Nakada, Yoshio Tanaka, and Satoshi Sekiguchi. National Institute of Advanced Industrial Science and Technology (AIST).
EVGM082 Background: Grid computing is large-scale, distributed, and heterogeneous. Grid middleware: Globus Toolkit, gLite.
EVGM083 Software Deployment and Management Problems: Software complexity (dependencies, configuration). Resource diversity: hardware (x86, x64, memory, storage, …), software (CentOS 4, CentOS 5, Solaris, …), and library versions. Organizations: differing administration policies. Multiplied by the number of sites (× sites), management cost explodes.
EVGM084 My Experience: VOMS installation in spring 2007. VOMS (Virtual Organization Membership Service) supported only SL3 (based on RHEL3 since …). It was a nightmare (Debian Etch, CentOS 5): VOMS depends on GT4, VOMS requires an old GCC for its C++ code, and GT4 requires a new GCC for its patched OpenSSL. Me, too…?
EVGM085 Our Concept: Virtualization. Isolate resources, encapsulate environments, and create new administrative domains. Create virtual machines at each site (Sites A, B, C).
EVGM086 Our Concept: Virtualization. Isolate resources, encapsulate environments, and create new administrative domains. Group the distributed VMs for a virtual organization (Sites A, B, C).
EVGM087 Our Concept: Virtualization. Isolate resources, encapsulate environments, and create new administrative domains. The result is a Multi-Site Virtual Cluster (Sites A, B, C).
EVGM088 Multi-Site Virtual Cluster: Integrates distributed VMs into a single cluster view. Allows a single administrative domain, with full control over OS installation and configuration. Enables easy system deployment on a large number of nodes. Applications running across Site A and Site B: scientific applications, emulation testbeds, and deployment and configuration.
EVGM089 System Components (1): Resource virtualization mechanism (Sites A, B, C).
EVGM0810 System Components (2): Web service API for virtualized resource control (Sites A, B, C).
EVGM0811 System Components (3): Easy management system for large-scale, distributed nodes (Sites A, B, C).
EVGM0812 Resource Virtualization Mechanism. Design criterion: create completely isolated VMs (a virtual cluster). Our virtual cluster system (free and open source): physical nodes running VMware Server and Xen, VLANs for network isolation, and LVM and iSCSI for storage.
EVGM0813 Allocating a Virtual Cluster (1): The cluster manager and the VMM hosts, connected by a private network and a public network.
EVGM0814 Allocating a Virtual Cluster (2): The cluster manager creates a new VLAN interface (eth0.1234) on the VMM host and bridges the VMs to that VLAN.
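Step (2) can be sketched as the commands a cluster manager might issue on a VMM host. This is a minimal sketch, assuming a Linux VMM with iproute2; the prototype's actual tooling is not stated on the slide (tools of that era such as vconfig and brctl do the same job). The function name `vlan_bridge_commands` and the bridge naming scheme `br<VLAN ID>` are hypothetical. The function only builds the commands; running them requires root on the VMM host.

```python
from typing import List


def vlan_bridge_commands(vlan_id: int, parent: str = "eth0") -> List[List[str]]:
    """Build iproute2 commands that create a tagged VLAN interface and a bridge on it.

    Hypothetical helper: the bridge name "br<vlan_id>" is an assumed convention.
    """
    # 802.1Q reserves IDs 0 and 4095, so usable VLAN IDs are 1..4094.
    if not 1 <= vlan_id <= 4094:
        raise ValueError("802.1Q VLAN IDs must be in 1..4094")
    vlan_if = f"{parent}.{vlan_id}"  # e.g. eth0.1234, as on the slide
    bridge = f"br{vlan_id}"          # hypothetical bridge name
    return [
        # Tagged sub-interface: frames sent through it carry VLAN tag vlan_id.
        ["ip", "link", "add", "link", parent, "name", vlan_if,
         "type", "vlan", "id", str(vlan_id)],
        # Software bridge that the VMs' virtual NICs will be attached to.
        ["ip", "link", "add", "name", bridge, "type", "bridge"],
        # Enslave the VLAN interface to the bridge, then bring both up.
        ["ip", "link", "set", vlan_if, "master", bridge],
        ["ip", "link", "set", vlan_if, "up"],
        ["ip", "link", "set", bridge, "up"],
    ]
```

For example, `print(shlex.join(cmd))` for each returned command shows the shell form, and `subprocess.run(cmd, check=True)` would apply them in order. VM interfaces added to the bridge then reach the physical network only through eth0.1234, so each virtual cluster's traffic stays on its own VLAN.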
EVGM0815 Allocating a Virtual Cluster (3): The cluster manager creates new storage volumes with LVM and attaches them to the VMM host via iSCSI.
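Step (3) pairs a logical volume with an iSCSI export. This is a sketch under stated assumptions: the slide names only LVM and iSCSI, so the target software (the Linux SCSI target framework, `tgtadm`), the initiator (open-iscsi, `iscsiadm`), the volume group `vg0`, the IQN prefix, and the `vc<ID>-vm<N>` naming scheme are all hypothetical choices made for illustration. As before, the functions only build command lists.

```python
from typing import List

Command = List[str]


def export_commands(vc_id: int, vm_index: int, size_gb: int, tid: int,
                    vg: str = "vg0",
                    iqn_prefix: str = "iqn.2007-01.org.example") -> List[Command]:
    """Storage-host side: create an LV and export it as an iSCSI target.

    Hypothetical naming: LV "vc<vc_id>-vm<vm_index>", IQN "<iqn_prefix>:<LV name>".
    """
    lv_name = f"vc{vc_id}-vm{vm_index}"
    device = f"/dev/{vg}/{lv_name}"
    iqn = f"{iqn_prefix}:{lv_name}"
    tgt = ["tgtadm", "--lld", "iscsi"]
    return [
        ["lvcreate", "-L", f"{size_gb}G", "-n", lv_name, vg],
        tgt + ["--op", "new", "--mode", "target", "--tid", str(tid), "-T", iqn],
        tgt + ["--op", "new", "--mode", "logicalunit", "--tid", str(tid),
               "--lun", "1", "-b", device],
        # "ALL" accepts any initiator; a real deployment would restrict this
        # to the VMM hosts' addresses.
        tgt + ["--op", "bind", "--mode", "target", "--tid", str(tid), "-I", "ALL"],
    ]


def attach_commands(iqn: str, portal: str) -> List[Command]:
    """VMM side: discover the storage host's targets, then log in to one."""
    return [
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal],
        ["iscsiadm", "-m", "node", "-T", iqn, "-p", portal, "--login"],
    ]
```

After login, the volume appears on the VMM host as a local SCSI disk that can be handed to a VM as its virtual disk, so VM storage can live on any host reachable over iSCSI.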
EVGM0816 Allocating a Virtual Cluster (4): The cluster manager launches the VMs on the VMM hosts; each VM boots from an LVM volume attached via iSCSI.
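Step (4) combines the earlier steps into one launch plan per VM. The slide does not describe how VMs are placed on VMM hosts, so this minimal sketch assumes simple round-robin placement with a uniform per-host capacity. The `LaunchSpec` record, the `plan_launch` function, the bridge and IQN naming, and the separate `vlan_id` parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LaunchSpec:
    vm_name: str
    vmm_host: str   # physical host running the VMM
    bridge: str     # bridge on the cluster's VLAN (step 2)
    disk_iqn: str   # iSCSI target backing the VM's LVM volume (step 3)


def plan_launch(vc_id: int, vlan_id: int, num_vms: int, vmm_hosts: List[str],
                per_host_capacity: int,
                iqn_prefix: str = "iqn.2007-01.org.example") -> List[LaunchSpec]:
    """Assign VMs to VMM hosts round-robin (a hypothetical placement policy)."""
    if not vmm_hosts:
        raise ValueError("no VMM hosts available")
    # Round-robin puts at most ceil(num_vms / len(vmm_hosts)) VMs on any host,
    # so this single check is enough to respect the per-host capacity.
    if num_vms > len(vmm_hosts) * per_host_capacity:
        raise ValueError("not enough VMM capacity for this virtual cluster")
    specs = []
    for i in range(num_vms):
        name = f"vc{vc_id}-vm{i}"
        specs.append(LaunchSpec(
            vm_name=name,
            vmm_host=vmm_hosts[i % len(vmm_hosts)],
            bridge=f"br{vlan_id}",
            disk_iqn=f"{iqn_prefix}:{name}",
        ))
    return specs
```

Round-robin spreads a virtual cluster evenly across hosts; a real allocator would likely also account for memory and existing load, which this sketch ignores.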
EVGM0817 Web Service API (REST): /api/vc: list virtual clusters (GET), create a new virtual cluster (POST). /api/vc/1234/: get the status of VC 1234 (GET). /api/vc/1234/vm: list the VMs (GET), add or delete a VM (POST). /api/vc/1234/vm/{0, 1, 2, 3, 4, 5}: get the status of a VM, start or stop a VM (POST). /api/vc/1234/vpn, /api/vc/1234/vpn/{0, 1, 2}.
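The resource layout above can be illustrated with an in-memory request dispatcher. This is a minimal sketch, not the project's actual service: the request bodies, the `action` field, the response shapes, and the status codes are assumptions; VC IDs start at the slide's example 1234; and the /vpn resources are omitted.

```python
from typing import Any, Dict, List, Optional, Tuple

Response = Tuple[int, Any]  # (HTTP status code, JSON-like body)


class VirtualClusterAPI:
    """In-memory sketch of the /api/vc resource tree (hypothetical semantics)."""

    def __init__(self) -> None:
        # vc_id -> {vm_id: state}, where state is "stopped" or "running"
        self.clusters: Dict[int, Dict[int, str]] = {}
        self.next_vc_id = 1234  # assumption: start at the slide's example ID

    def handle(self, method: str, path: str,
               body: Optional[Dict[str, Any]] = None) -> Response:
        body = body or {}
        parts: List[str] = [p for p in path.split("/") if p]
        if parts[:2] != ["api", "vc"]:
            return 404, {"error": "not found"}
        rest = parts[2:]

        # /api/vc: list (GET) or create (POST) virtual clusters
        if not rest:
            if method == "GET":
                return 200, sorted(self.clusters)
            if method == "POST":
                vc_id = self.next_vc_id
                self.next_vc_id += 1
                self.clusters[vc_id] = {}
                return 201, {"id": vc_id}
            return 405, {"error": "method not allowed"}

        if not rest[0].isdigit() or int(rest[0]) not in self.clusters:
            return 404, {"error": "no such virtual cluster"}
        vc_id = int(rest[0])
        vms = self.clusters[vc_id]

        # /api/vc/{id}/: status of one virtual cluster (GET)
        if len(rest) == 1:
            if method != "GET":
                return 405, {"error": "method not allowed"}
            running = sum(1 for state in vms.values() if state == "running")
            return 200, {"id": vc_id, "vms": len(vms), "running": running}

        # Only the vm sub-resource is sketched; /vpn would be handled similarly.
        if rest[1] != "vm":
            return 404, {"error": "not found"}

        # /api/vc/{id}/vm: list VMs (GET), add or delete a VM (POST)
        if len(rest) == 2:
            if method == "GET":
                return 200, sorted(vms)
            if method != "POST":
                return 405, {"error": "method not allowed"}
            action = body.get("action", "add")
            if action == "add":
                vm_id = max(vms, default=-1) + 1
                vms[vm_id] = "stopped"
                return 201, {"id": vm_id}
            if action == "delete":
                if vms.pop(body.get("id"), None) is None:
                    return 404, {"error": "no such VM"}
                return 200, {"deleted": body["id"]}
            return 400, {"error": "action must be 'add' or 'delete'"}

        # /api/vc/{id}/vm/{n}: VM status (GET), start or stop it (POST)
        if len(rest) == 3 and rest[2].isdigit() and int(rest[2]) in vms:
            vm_id = int(rest[2])
            if method == "POST":
                action = body.get("action")
                if action not in ("start", "stop"):
                    return 400, {"error": "action must be 'start' or 'stop'"}
                vms[vm_id] = "running" if action == "start" else "stopped"
            elif method != "GET":
                return 405, {"error": "method not allowed"}
            return 200, {"id": vm_id, "state": vms[vm_id]}
        return 404, {"error": "no such VM"}
```

A typical session: `POST /api/vc` creates VC 1234, `POST /api/vc/1234/vm` with `{"action": "add"}` adds VM 0, and `POST /api/vc/1234/vm/0` with `{"action": "start"}` starts it. Keeping every operation on a resource URL is what lets a portal or script drive allocation over plain HTTP.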
EVGM0818 Distributed Node Management: Use an existing cluster management system inside a multi-site virtual cluster. Such systems are designed for physical clusters and offer powerful node management: a node database, parallel command execution, and automatic node installation. Integrate the distributed VMs with an Ethernet VPN that bridges the internal networks of the single-site virtual clusters, transparently to admins and users. Transparent package caching at each site. A package-based software installer gives quick installation and reconfiguration, with flexible customizability.
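Parallel command execution, one of the node-management features listed above, amounts to fanning one command out to many nodes at once. This is a minimal sketch, assuming the nodes are reachable over ssh with key-based login; it is not the Rocks toolkit's own tool, and `run_on_all` and `ssh_runner` are hypothetical names. The runner is injectable so the fan-out logic can be exercised without real nodes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Result = Tuple[int, str]  # (exit code, captured stdout)


def ssh_runner(host: str, command: str) -> Result:
    """Run one command on one node over ssh (BatchMode fails instead of prompting)."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True, text=True, timeout=60,
    )
    return proc.returncode, proc.stdout


def run_on_all(hosts: List[str], command: str,
               runner: Callable[[str, str], Result] = ssh_runner,
               max_workers: int = 32) -> Dict[str, Result]:
    """Run `command` on every host concurrently and collect results by host name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `hosts`; an exception
        # raised by any runner call (e.g. a timeout) is re-raised here.
        results = list(pool.map(lambda host: runner(host, command), hosts))
    return dict(zip(hosts, results))
```

Because the VPN makes every VM look like a node on one local network, the same fan-out works unchanged whether a node sits at Site A or Site B.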
EVGM0819 VMs are connected by an Ethernet VPN among site-local VLANs, with a package cache repository at each site. The cluster is managed by the Rocks toolkit: a frontend node providing the console, node DB, and PXE installation server.
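The per-site package cache can be illustrated as a read-through cache: the first request for a package crosses the VPN to the frontend's repository, and later requests from nodes at the same site are served locally. This is a minimal sketch of the idea, not Squid's behavior; `SitePackageCache` is a hypothetical name, and the cache is unbounded (no eviction), unlike a real proxy.

```python
from typing import Callable, Dict


class SitePackageCache:
    """Read-through cache at one site: each package crosses the WAN at most once."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin   # e.g. an HTTP GET to the frontend over the VPN
        self._store: Dict[str, bytes] = {}
        self.wan_bytes = 0                # bytes pulled over the VPN so far

    def get(self, package: str) -> bytes:
        data = self._store.get(package)
        if data is None:
            # Miss: fetch once over the WAN and remember the result.
            data = self._fetch(package)
            self.wan_bytes += len(data)
            self._store[package] = data
        # Hit (or freshly cached): served on the site-local network.
        return data
```

If N nodes at one site install the same package set, WAN traffic is roughly the size of that set rather than N times it, which is why caching matters most when many identical nodes are reinstalled at once.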
EVGM0820 Prototype Implementation: Reservation portal (site resource monitoring, automatic reservation); NPACI Rocks 4.2; VMware Server; VLAN, iSCSI; OpenVPN 2.0; Squid 3.0.
EVGM0821 Evaluation: Demo over the Pacific: software configuration, parallel command-line tools, node status monitoring, and Condor job submission. Scalability evaluation: WAN operation, a large number of VMs, reconfiguration time, and network traffic.
EVGM0822 Experiment Setting: 16 nodes with AMD Opteron 244, 3 GB memory, 2× Gigabit Ethernet; a 134-node virtual cluster, reconfigured with 900 MB of packages; AMD Opteron 246, 6 GB memory, 2× Gigabit Ethernet; node configuration DB.
EVGM Node Reinstallation over WAN: only 20 minutes for a 134-node virtual cluster, with 900 MB of software per node.
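The reinstallation figures imply an average delivery rate. This back-of-the-envelope sketch assumes that all 900 MB is transferred to every node within the 20 minutes and treats MB uniformly (no MB/MiB distinction). Since the 20 minutes also covers installation work on each node, the results are averages over the whole run, not peak network rates; `reinstall_throughput` is a hypothetical helper name.

```python
from typing import Tuple


def reinstall_throughput(nodes: int, mb_per_node: float, minutes: float) -> Tuple[float, float]:
    """Return (aggregate MB/s, per-node MB/s), averaged over the whole reinstallation."""
    seconds = minutes * 60
    total_mb = nodes * mb_per_node
    return total_mb / seconds, mb_per_node / seconds


aggregate, per_node = reinstall_throughput(nodes=134, mb_per_node=900, minutes=20)
print(f"aggregate: {aggregate:.1f} MB/s, per node: {per_node:.2f} MB/s")
# prints "aggregate: 100.5 MB/s, per node: 0.75 MB/s"
```

In total, 134 × 900 MB = 120,600 MB is delivered in 1,200 s.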
EVGM Figure: WAN traffic over the VPN (RTT 20 ms) during reinstallation, with the cache enabled, pre-cached, and disabled (y-axis: MB/s; annotated rates: 800 KB/s and 10 MB/s). A cache server minimizes VPN traffic for reinstallation.
EVGM0825 Conclusion: The multi-site virtual cluster applies virtualization to Grid computing. It isolates resources, gives independent administrative domains, and offers an easy-to-use UI. Future work: Amazon EC2 support; live migration (demo