Evaluating Clouds for Smart Grid Computing: Early Results Using the GE MARS App – Ketan Maheshwari
Agenda Objectives of this Study – Application Characterization – Clouds – Implementation – Results – Conclusions
Objectives 1. To evaluate cloud infrastructures for smart grid applications 2. To parallelize and port a smart grid application to clouds 3. To evaluate the parallel scripting paradigm for usability and performance on clouds
Application Characterization Two tasks: marsMain and marsOut. marsMain is compute intensive; marsOut is trivial (3-10 sec). A modest run = 100 marsMain tasks + 1 marsOut task. Intermediate results are crucial. 150M/run.
Clouds Considered Amazon EC2 – Commercial, large – provides a shared FS – Native interface Cornell RedCloud – Academic, small (96 CPUs) – Eucalyptus interface FutureGrid Cloud (NSF-funded) – Academic, medium (~3000 CPUs) – Multiple interfaces (Nimbus, Eucalyptus, OpenStack)
Implementation: Parallel Scripting App definition, control parameters, and parallel invocation; the application is expressed in < 30 lines of code
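The deck does not include the script itself; below is a minimal Python sketch of the run structure it describes: ~100 independent marsMain tasks fanned out in parallel, followed by one marsOut pass over the intermediate results. The binary names, flags, and file names are placeholders, not the actual GE MARS command lines, and the real implementation used a parallel scripting language rather than Python.

import subprocess
from concurrent.futures import ProcessPoolExecutor

def mars_main(case_id):
    # One compute-intensive marsMain case; binary name and flags are hypothetical.
    out = "case_%03d.out" % case_id
    subprocess.run(["./marsMain", "--case", str(case_id), "--out", out], check=True)
    return out

if __name__ == "__main__":
    # Fan out ~100 independent cases across available cores (or cloud instances).
    with ProcessPoolExecutor(max_workers=8) as pool:
        outputs = list(pool.map(mars_main, range(100)))
    # Single, cheap marsOut step that consumes all intermediate results.
    subprocess.run(["./marsOut"] + outputs, check=True)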
Overview of Results Experiments performed running the MARS app: – on a local machine: serial and parallel – on individual clouds: serial and parallel – on multiple clouds Data staging experiments performed: – local -> local – local -> cloud instances – cloud instance -> S3 Cloud elasticity evaluated All experiments were performed from a neutral external location to avoid network bias (especially since RedCloud is inside the Cornell network)
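For reference, a hedged sketch of how the staging paths above can be timed; the host name, bucket name, and use of scp/boto3 are illustrative assumptions, not details taken from the study.

import subprocess, time
import boto3  # assumes AWS credentials are configured

def time_scp(local_path, remote_dest):
    # local -> cloud instance over ssh/scp
    t0 = time.time()
    subprocess.run(["scp", local_path, remote_dest], check=True)
    return time.time() - t0

def time_s3_upload(local_path, bucket, key):
    # cloud instance -> S3 via the S3 API
    t0 = time.time()
    boto3.client("s3").upload_file(local_path, bucket, key)
    return time.time() - t0

print("local -> instance: %.1f s" % time_scp("input.dat", "user@instance-host:/data/"))
print("instance -> S3:    %.1f s" % time_s3_upload("input.dat", "mars-staging-bucket", "input.dat"))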
Local: With and Without Input Data Staging Dramatic speedup from 1 to 8 cores; steady speedup from 8 to 32 – a run can be only as fast as its slowest task
Serial and Parallel on Individual Clouds One cloud: fast CPUs (2.8 GHz) but low bandwidth; another: a new cluster with high bandwidth and fast CPUs (2.6 GHz); a third: seasoned hardware (2.3 GHz)
Multiple Clouds Slow CPUs and bottlenecks in data staging contribute to low scaling
Cloud Data Movement Locally mounted S3 is not the fastest!
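A hedged illustration of the comparison behind this observation: writing results through an s3fs-style mount versus uploading them with the S3 API directly. The mount point, bucket, and file names are assumptions for the sketch.

import shutil, time
import boto3

SRC = "results.tar"        # hypothetical result archive
MOUNT = "/mnt/s3bucket/"   # hypothetical locally mounted S3 path (e.g. via s3fs)
BUCKET = "mars-results"    # hypothetical bucket name

t0 = time.time()
shutil.copy(SRC, MOUNT + SRC)                     # transfer through the FUSE mount
print("via mounted S3: %.1f s" % (time.time() - t0))

t0 = time.time()
boto3.client("s3").upload_file(SRC, BUCKET, SRC)  # direct API upload
print("via direct API: %.1f s" % (time.time() - t0))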
Cloud Elasticity Elastic vs. not so elastic!
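One rough way to quantify elasticity is to time how long a batch of instances takes to reach the running state. The sketch below uses boto3 against an EC2-style API; the AMI id and instance type are placeholders, not values from the study.

import time
import boto3

ec2 = boto3.client("ec2")
t0 = time.time()
resp = ec2.run_instances(ImageId="ami-xxxxxxxx",   # placeholder image id
                         InstanceType="m1.small",  # placeholder instance type
                         MinCount=8, MaxCount=8)
ids = [inst["InstanceId"] for inst in resp["Instances"]]
ec2.get_waiter("instance_running").wait(InstanceIds=ids)
print("8 instances running after %.0f s" % (time.time() - t0))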
Inter-cloud Bandwidth (* = Gbit/s)
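Bandwidth figures like these can be measured with a point-to-point tool such as iperf run between instances on two clouds; the host name below is a placeholder.

import subprocess

# On cloud A, start a server:  iperf -s
# On cloud B, run the client against cloud A's public address for 30 seconds:
subprocess.run(["iperf", "-c", "cloudA.example.org", "-t", "30"], check=True)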
Conclusions Cloud environments are diverse in properties – interfaces, invocations, configurations, pricing – and require special tending to make them work seamlessly Academic clouds are “not quite there” – clouds can’t rescue slow, old infrastructure Data movement is a bottleneck: is a cloud-based, distributed data store required? Parallel scripting is well suited to multi-staged computing and interfaces well with clouds
Thanks! Thank you! Questions and comments welcome!