Data-Intensive Cloud Control for GENI GEC 10 Orca control framework March 15 th, 2011 Michael Zink, Prashant Shenoy, Jim Kurose, David Irwin and Emmanuel Cecchet
2 DiCloud Project Overview Support researchers conducting data-intensive experiments Sensors network storage volumes processing Extend Orca with Data-centric Slices: storage as a first-class resource with Amazon Simple Storage Service (S3) and Elastic Block Storage (EBS) Cloud Computing: connect Amazon Elastic Compute Cloud (EC2) resources to GENI
3 Demo – Step 1 (Allocate resources) ViSE Sensor Network S3 (storage) EC2 servers Researcher creates a slice with: weather radar storage space on S3 compute servers on EC2 Orca DiCloud
4 Demo – Step 2 (Storing data) ViSE Sensor Network S3 (storage) Weather radar data is stored in S3 DiCloud S3 Proxy
5 Demo – Step 3 (Processing data) S3 (storage) EC2 servers Get radar data from S3 Process data on EC2 server Put generated image in S3 DiCloud AWS accounting
6 Demo – Step 4 (Visualizing data) S3 (storage) DiCloud S3 Proxy Get generated image in S3
7 Technical details Implemented on top of Amazon Web Service API DiCloud monitors EC2 Server hourly usage Network usage (in and out traffic using CloudWatch statistics) S3 storage space and put/get operations using a dedicated proxy EBS disk usage (storage space and number of IOs using CloudWatch statistics) All accounting operations are logged in a database Resources are automatically revoked when budget has expired
8 CF and GENI AM API Integration DiCloud implements building blocks for using Clouds Proxy to monitor cloud API calls Accounting software to track $ spent Mechanisms to prevent users from exceeding budget Software sits beneath the CF/GENI AM API Orca AM implements GENI AM API DiCloud hooks into Orca join/leave handlers Internal Orca AM interfaces Other CFs may plug in at different places
9 Challenges Network connectivity with AWS Monitoring CloudWatch does not differentiate between free and paying network traffic or disk IO S3 Proxy in the cloud would save network traffic and cost What security model for storage How to share data with others? Storage leases can last years What budget to allocate to make this useful?