OOI CI LCA REVIEW August 2010 Ocean Observatories Initiative Common Execution Infrastructure Kate Keahey, Tim Freeman, Alex Clemesha, David LaBissoniere,


1 OOI CI LCA REVIEW August 2010
Ocean Observatories Initiative
Common Execution Infrastructure
Kate Keahey, Tim Freeman, Alex Clemesha, David LaBissoniere, John Bresnahan
Life Cycle Architecture Review
La Jolla, CA

2 Common Execution Infrastructure Purpose
- Basic capabilities in resource provisioning on IaaS clouds
  - Commercial
  - National infrastructure
- Highly Available (HA) services
- Allow OOI computations to scale to demand by leveraging elastically provisioned resources

3 R1 Use Cases
ID | Title | Description
UC.R1.14 | Use Service Anywhere | Messages go to services wherever they are
UC.R1.15 | Put Services Anywhere | Allocate services where need is greatest
UC.R1.16 | Scale the Processing | Increase processing quickly to meet demand
UC.R1.17 | Replicate Service | Configure service once, deploy many times
UC.R1.20 | Command A Resource | Send typical commands to specific resource
UC.R1.25 | Assure Reliability | Computer fails, messages resent, work resumes
UC.R1.26 | Virtualize Everything | Virtual processes embody all system services
UC.R1.28 | Operate System | Configure system and respond to requests
UC.R1.30 | Troubleshoot System | Diagnose issues using logs, feeds, tools

4 User’s View of the Architecture
[Diagram. Labels: HA Service (OOI Application); EPU; EPU Worker (Operational Unit), shown six times; VM (Deployable Unit), shown three times; Application Software (Deployable Type)]

5 Overall Architecture
[Diagram. Labels: Client; Exchange Point; HA App-v1; VM]
…and then a miracle occurs…

6 Overall Architecture
[Diagram. Component labels: Client; Exchange Point; HA App-v1; EPU Controller (App-v1); DE (Planner); Sensor Aggregator (App-v1); Deployable Type Registry Service; HA-P; Provisioner-0; Provisioner-2; IaaS; Context Broker; VM; Capability Container; App-v1; cc-agent; EPU Worker; ctx-agent]
[Arrow labels: updates; Queue length; uses; queries; contextualization; Launches VM; Health report; Per-node status]

7 [Diagram. Labels: Capability Container; One VM; HA Provisioner; HA-P; Provisioner-0; Provisioner-2; IaaS; Context Broker; Provisioner (Provisioner); Controller (HA-Provisioner); Sensor Aggregator (HA-Provisioner); Base CEI Instance; All other EPU controllers; Bottom Turtle: Operations]
[Arrow labels: Per-node status; Queue length; Monitors and restarts]

8 Bootstrapping and Monitoring
[Diagram. Labels: Daemonize and monitor; Base CEI Instance; Provisioner (Provisioner); Controller (HA-Provisioner); Sensor Aggregator (HA-Provisioner); Context Broker; Messaging Service; Core Services; epu_control; HA-P Service; Provisioner-0; Provisioner-2]
[Arrow labels: launch, test, monitor (repeated four times); launches]
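
The "launch, test, monitor" cycle named on this slide can be illustrated with a minimal sketch. It is not the epu_control implementation: the function names `bootstrap` and `monitor_once`, the retry limit, and the `launch`/`test` callables are all hypothetical, chosen only to show the pattern of starting each service, checking it before moving on, and relaunching it if a later check fails.

```python
def bootstrap(services, launch, test, max_attempts=3):
    """Launch each service in order and test it before starting the next.

    `launch(name)` returns an opaque handle; `test(handle)` returns True when
    the service looks healthy. Both are supplied by the caller (hypothetical).
    """
    started = []
    for name in services:
        for _ in range(max_attempts):
            handle = launch(name)
            if test(handle):
                started.append((name, handle))
                break
        else:
            # No attempt passed its test, so stop the bootstrap here.
            raise RuntimeError(f"{name} failed to start after {max_attempts} attempts")
    return started


def monitor_once(started, launch, test):
    """One monitoring pass: relaunch every service whose test now fails."""
    restarted = []
    for index, (name, handle) in enumerate(started):
        if not test(handle):
            started[index] = (name, launch(name))
            restarted.append(name)
    return restarted
```

In use, `monitor_once` would be called periodically by whatever process sits at the bottom of the stack; the slide deck leaves that role to operations.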

9 Summary of Implementation Status
- Detailed design and implementation documents
- All major components implemented: Provisioner, EPU Controller, Decision Engine and Planner, Sensor Aggregator, DTRS (Deployable Type Registry Service)
- Integrated with ION
- Some components needing refinement: bootstrap process, draft user and administrator process, image building and management
- Tested on infrastructure from Magellan to EC2

10 Technology Choices
- ION: Integrated Observatory Network
- boto
- txrabbitmq
- Twotp
- Nimboss
- Context Broker
- Fabric

11 The Testfest
Objectives:
- Test a fully experimental system
  - queue_length excepted to make progress
- Identify areas needing potential redesign
- Test the “muscle” of the system: no optimizations, no policies, no fancy improvements
- Scalability target: up to 1000 VMs
  - 237 achieved so far

12 R1 Use Cases Demonstrated
UC.R1.16: Scale the processing
- A load is put on the system
- Additional demand is recognized via different sensors
  - Message queue length, CPU loads, disk usage
- System scales up to meet increased demand
- System scales down when demand goes away
UC.R1.25: Assure reliability
- Failures happen
- Remedial actions happen
- No significant impact on observatory operation
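
The scale-up and scale-down behaviour described for UC.R1.16 can be sketched as a single decision function. This is an illustration, not the Decision Engine from the deck: `SensorReading`, `scaling_delta`, and all parameter defaults are hypothetical, and the sketch assumes that queue length alone stands in for total outstanding work (the slide also lists CPU load and disk usage as sensors).

```python
import math
from dataclasses import dataclass


@dataclass
class SensorReading:
    """One aggregated observation; the field names are illustrative."""
    queue_length: int      # messages waiting, treated here as outstanding work
    running_workers: int   # worker VMs currently running


def scaling_delta(reading, jobs_per_worker=1, min_workers=1, max_workers=20):
    """Return how many workers to add (positive) or remove (negative)."""
    wanted = math.ceil(reading.queue_length / jobs_per_worker)
    # Clamp to the policy bounds so the pool never empties or grows unbounded.
    wanted = max(min_workers, min(max_workers, wanted))
    return wanted - reading.running_workers
```

A positive result corresponds to "scales up to meet increased demand" and a negative one to "scales down when demand goes away"; a real policy would likely add damping so that short spikes do not cause VM churn.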

13 Testing Environment
[Diagram. Component labels: Client; Exchange Point; HA App-v1; EPU Controller (App-v1); DE (Planner); Sensor Aggregator (App-v1); Deployable Type Registry Service; HA-P; Provisioner-0; Provisioner-2; IaaS; Context Broker; VM; Capability Container; App-v1; cc-agent; EPU Worker; ctx-agent]
[Arrow labels: updates; Queue length; uses; queries; contextualization; Launches VM; Health report; Per-node status]
[Hosting labels: EC2 small; EC2 High-CPU XL; EC2 Small; UC; EC2 small]

14 Scale the Processing
Average load scenario
- 70 jobs, infinitely long
- One job per VM
- Submitted over 28 minutes, 5 jobs every 2 minutes
Worst-case scenario
- 70 jobs, infinitely long
- One job per VM
- Saturating the system
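
The arithmetic of the average load scenario (70 jobs in batches of 5 gives 14 batches, and 14 batches at 2-minute intervals span 28 minutes) can be checked with a short sketch. The function name `submission_schedule` is hypothetical, and the sketch assumes the first batch arrives at minute 2 so that the last one lands at minute 28; the slide does not say when the first batch is submitted.

```python
def submission_schedule(total_jobs=70, batch_size=5, interval_min=2):
    """Return (minute, cumulative_jobs) pairs for a batched submission.

    With one job per VM and jobs that never finish, the cumulative job
    count is also the number of VMs the system must be running.
    """
    schedule = []
    submitted = 0
    minute = interval_min  # assumption: first batch at the end of interval 1
    while submitted < total_jobs:
        submitted = min(total_jobs, submitted + batch_size)
        schedule.append((minute, submitted))
        minute += interval_min
    return schedule
```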

15 Assure Reliability
How does the system react to failure?
- Saturate the system with 10s jobs
- Bounded policy: 20 VMs
- Kill 2 VMs every 5 minutes
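
The failure-injection test above can be approximated with a small minute-by-minute simulation. It is a sketch under stated assumptions, not the test harness used at the Testfest: `simulate_failures` is a hypothetical name, the 3-minute replacement boot time is invented for illustration, and the controller is assumed to request replacements in the same minute it sees a shortfall.

```python
def simulate_failures(target=20, kill_count=2, kill_every_min=5,
                      boot_min=3, duration_min=30):
    """Return the number of running VMs at each minute, 0..duration_min."""
    running = target
    pending = []   # minutes at which replacement VMs finish booting
    history = []
    for minute in range(duration_min + 1):
        # Replacements that have finished booting join the pool.
        running += sum(1 for ready_at in pending if ready_at <= minute)
        pending = [ready_at for ready_at in pending if ready_at > minute]
        # Fault injection: kill VMs on the configured period.
        if minute > 0 and minute % kill_every_min == 0:
            running -= min(kill_count, running)
        # Bounded policy: request just enough replacements to return to target.
        shortfall = target - running - len(pending)
        pending.extend([minute + boot_min] * max(0, shortfall))
        history.append(running)
    return history
```

Under these assumptions the pool dips to 18 VMs after each kill and returns to 20 once the replacements boot, which is the shape of recovery the "remedial actions happen" use case calls for.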

16 Lessons Learned
- Many, MANY, tractable small issues and lessons learned
  - a.k.a., “an endless stream of simple bugs” ;-)
- Most significant unresolved issues:
  - Messaging system connections close unexpectedly
    - Currently prevents us from running at scale, need for scalability testing in COI
  - Inspecting message queue remotely needs to be rethought
  - Need for concurrency in the container
  - Unresolved issue in “pulling work”
- Lots of work to do!

17 Risk Assessment - CEI
Use Cases
ID | Name | Description | Risk of Non-Availability | Level of Maturity | Target Use
UC.R1.15 | Put Services Anywhere | Allocate services where need is greatest | Low | Expected | Developer
UC.R1.16 | Scale the Processing | Increase processing quickly to meet demand | Low | Expected | Developer
UC.R1.17 | Replicate Service | Configure service once, deploy many times | Low | Expected | Developer
UC.R1.26 | Virtualize Everything | Virtual processes embody all system services | Low | Expected | Developer
UC.R1.25 | Assure Reliability | Computer fails, messages resent, work resumes | Medium | Necessary | Developer
UC.R1.28 | Operate System | Configure system and respond to requests | Medium | Necessary | Operator
Services
Name | Risk of Non-Availability | Level of Maturity | Target Use
Elastic Computing | Low | Expected | Developer
Exec Engine Repository | Low | Expected | Developer
Resource Management Services | Medium | Necessary | Developer

18 Roadmap
Iteration 1: Finalize components and interactions
- Continue stress testing
- Refine Deployable Type Creation and Management
- Integration with Data Management
- Bootstrapping
Iteration 2: Prepare an Internal Release
- Refine the policy engine
- Continue testing
- Build and test harness
- Preliminary documentation
Iteration 3: Prepare an External Release
- Testing and robustness
- User and admin process
- Improve quality and documentation

19 Questions?

20 Use Cases at (Medium) Risk for Release 1
Type | Title | Impact
UC.R1.16 | Scale Processing | Potential known obstacles to scalability
UC.R1.25 | Assure Reliability | Potential known unreliable scenarios
UC.R1.28 | Operate System | Scaled down functionality, ease of use
UC.R1.30 | Troubleshoot System | Scaled down functionality, ease of use

