NTU Cloud 2010/05/30
System Diagram
Architecture
– Gluster File System: provides a distributed shared file system for migration
– NFS: prototype image storage space
[Diagram labels: Node, Gluster File System, Compute Img (C-Img), Storage Img (S-Img), NFS, Prototype Img]
Architecture
– Prototype Image: original image, e.g. Hadoop, MPI
– Compute Image: modified image for a user; its content is not preserved after cluster shutdown
Xen
– A hypervisor
– Virtualization
Cloud Master
– Monitors system state
– Scheduling
– Uses NFS to store the Prototype Image
– Web server
OpenNebula
– A middleware
– Provides an interface to manage virtual infrastructure (computation and network)
– VM migration
=> We use OpenNebula to manage VM deployment and migration, and to set up virtual local area networks (VLANs).
Gluster file system
– User-level distributed file system
– Client/server architecture
– Uses TCP/IP to transfer data
=> We use GlusterFS to build our shared file system environment for VM live migration.
=> Our deployment is "symmetrical": every machine is both a server and a client.
System Flow
Hadoop Benchmark
– Case 1: M1: Master + Slave-01 + Slave-02
– Case 2: M1: Master; M2: Slave-01 + Slave-02
– Case 3: M1: Master; M3: Slave-01 + Slave-02
– Case 4: M1: Master; M2: Slave-01; M3: Slave-02
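The four placements can be written down as data, which makes it easy to see which VM pairs have to communicate across physical machines in each case. This is only an illustrative sketch: the names `CASES`, `host_of` and `cross_host_pairs` are hypothetical and not part of the benchmark setup.

```python
from itertools import combinations

# Placement of the Hadoop VMs on physical hosts for the four benchmark cases.
CASES = {
    1: {"M1": ["Master", "Slave-01", "Slave-02"]},
    2: {"M1": ["Master"], "M2": ["Slave-01", "Slave-02"]},
    3: {"M1": ["Master"], "M3": ["Slave-01", "Slave-02"]},
    4: {"M1": ["Master"], "M2": ["Slave-01"], "M3": ["Slave-02"]},
}


def host_of(case, vm):
    """Return the physical host that runs `vm` in the given case."""
    for host, vms in CASES[case].items():
        if vm in vms:
            return host
    raise KeyError(vm)


def cross_host_pairs(case):
    """Return the VM pairs that sit on different physical hosts."""
    vms = sorted(vm for hosted in CASES[case].values() for vm in hosted)
    return [
        (a, b)
        for a, b in combinations(vms, 2)
        if host_of(case, a) != host_of(case, b)
    ]


for case in CASES:
    print(case, cross_host_pairs(case))
```

Case 1 has no cross-host pair, Cases 2 and 3 separate the master from both slaves, and Case 4 separates all three VMs.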
[Chart: run time (Sec) per Iteration for the four placements: all VMs in M1; slaves in M2; slaves in M3; slaves split across two machines]
Set 1

Set      Setup            VM      Host Machine  VCPU  Mem   Purpose
Set 1.1  Single machine   Master  M1            1     2.2G  Namenode+Datanode+Jobtracker+Tasktracker
                          Worker  M1            1     1.2G  Datanode+Tasktracker
Set 1.2  Two machines     Master  M1            1     2.2G  Namenode+Datanode+Jobtracker+Tasktracker
                          Worker  M2            1     1.2G  Datanode+Tasktracker

– M1 and M2 have the same CPU and memory size.
– HADOOP_HEAPSIZE=500MB
– mapred.child.java.opts=100MB
– Workload: RandomWriter (10M for 30 Maps), then Sorting
– HDFS_BYTES_READ= HDFS_BYTES_WRITTEN=
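As a rough sketch of how these two settings might be written out, the snippet below renders them as Hadoop configuration text. It rests on assumptions not stated in the slides: that the 100MB child limit is expressed as the JVM flag `-Xmx100m`, that `HADOOP_HEAPSIZE` is given in megabytes in `hadoop-env.sh`, and that the property goes in `mapred-site.xml`. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def mapred_site_xml(properties):
    """Render a dict of Hadoop properties as <configuration> XML text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def hadoop_env_line(heap_mb):
    """Render the hadoop-env.sh line for the daemon heap size (assumed to be in MB)."""
    return f"export HADOOP_HEAPSIZE={heap_mb}"


# Assumption: "100MB" on the slide corresponds to the JVM option -Xmx100m.
xml_text = mapred_site_xml({"mapred.child.java.opts": "-Xmx100m"})
env_line = hadoop_env_line(500)

print(env_line)
print(xml_text)
```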
Sort
– Therefore, putting both VMs into one machine slows performance down to 88.92% (two machine / single machine = %).
– Launched reduce tasks=4, Others=3
– Reduce shuffle bytes= (single machine), Reduce shuffle bytes= (two machine): exactly the same!
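The "two machine / single machine" figure can be read as a ratio of average run times. The sketch below assumes that reading. The timings are hypothetical placeholders, not the measurements behind the slides, and `relative_ratio` is an illustrative name.

```python
from statistics import mean


def relative_ratio(two_machine_secs, single_machine_secs):
    """Return mean two-machine time / mean single-machine time, in percent."""
    return 100.0 * mean(two_machine_secs) / mean(single_machine_secs)


# Hypothetical per-iteration run times in seconds (not data from the slides).
single_machine = [100.0, 110.0, 90.0]
two_machine = [80.0, 88.0, 72.0]

ratio = relative_ratio(two_machine, single_machine)
print(f"two machine / single machine = {ratio:.2f}%")
```

A value below 100% means the two-machine runs finished faster on average than the runs with both VMs on one host.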
Set 2

Set      Setup            VM      Host Machine  VCPU  Mem   Purpose
Set 2.1  Single machine   Master  M1            2     2.2G  Namenode+Datanode+Jobtracker+Tasktracker
                          Worker  M1            2     1.2G  Datanode+Tasktracker
Set 2.2  Two machines     Master  M1            2     2.2G  Namenode+Datanode+Jobtracker+Tasktracker
                          Worker  M2            2     1.2G  Datanode+Tasktracker

– Workload: 1. RandomWriter (10M for 30 Maps) 2. Sort
– HADOOP_HEAPSIZE=500MB
– mapred.child.java.opts=100MB
RandomWriter
– Therefore, putting both VMs into one machine slows performance down to 80.70% (two machine / single machine = %).
RandomWriter
[Table: Sec and HDFS_BYTES_WRITTEN per Iteration, for Single machine and Two machine, with an Avg row and an "Avg. on 1,2," row]
Sort
Current Progress
– Xen 4.0 is ready on each node.
– We can offer two kinds of images: Hadoop and MPI.
– VMs are started on the destination node automatically.
– The MPI and Hadoop environments are configured automatically for use.