TOSCA Monitoring
Straw-man for Initial Minimal Monitoring Use Case
Roger Dev
CA Technologies
April 24, 2015

TOSCA Monitoring Use Cases (full, from Arch Ref Strawman)
• As an Application Architect, I want to know which metrics I can expect to be available from a given component.
• As an Application Architect, I want to define, within the Service Template, the Metrics to be collected for a component, as well as how they are to be collected and managed (thresholded, etc.).
  – Additionally, I may want to use my favorite monitoring tools rather than those provided by the Service Provider.
• As an Application Operator, I want to be able to access collected metrics and events for any or all of my deployed components, either:
  – Interactively
  – Programmatically
• As an Application Developer, I want to be able to produce custom metrics from my application(s) and have them stored and accessed along with any standard metrics.
• As a Service Provider, I may want to define Monitoring Policies for component types that may be different from those designated by the Application Architect.
• As a Service Provider, I want to be able to utilize my favorite monitoring tools rather than those supplied with an orchestration framework.
• As a Service Provider, I want to be able to access a robust set of Metrics and Events about the orchestration framework, since that is a critical component of my infrastructure.
• As a Service Provider, I want to be able to utilize the full set of topological information provided by the Nested Service Template(s) to enhance my knowledge of running applications. This includes the output sections of the Templates.

Revised approach based on feedback and discussion to date
• Simplify the initial use-case to the bare minimum
• Use that to work through the basic mechanisms
• Define and agree upon the fundamental mechanisms
• Expand from there

TOSCA Monitoring Reference Diagram
[Diagram] Monitoring Automation Point (MAP)
• MAD (Monitoring Act / De-act)
• MIA (Monitoring Info Access)
• MEA (Monitoring Extension Advert)
• OM (Orchestrator Monitoring)
Other elements: Service Template; External Process; Internal Process; External Monitoring System
Flow labels, in the order they appear:
  – Monitoring Template / Policy
  – Management Communication Info
  – Metric Availability
  – Metric Time Series
  – Events?
  – Metric Values
  – Events?
  – Metric Time Series
  – Events
Focus on Subset of MAD

Initial Minimal Use Case (1 of 2)
• Assume we have a mechanism for defining the metrics associated with a given component-type:
  – This is a tractable problem, so let’s come back to it after solving more fundamental issues
• Assume that we want to monitor all the metrics that each component can produce
  – Defer defining the mechanism whereby the Application Architect can define the monitoring policy
  – Defer defining the finer points of policy (e.g. events, actions, transformations, etc.)

Initial Minimal Use Case (2 of 2)
• Scenario:
  – A Service Template is deployed with a single SoftwareComponent running on a single ComputeNode (virtual). Metrics (Capabilities) are defined for both SoftwareComponent and ComputeNode types.
  – ComputeNode:
    » PercentCpuUtilization, IoBytesIn, IoBytesOut
  – SoftwareComponent:
    » PercentCpuUtilization, TransactionsProcessed, ErrorsEncountered
  – Metrics are collected for some time
  – The Service Template is de-deployed (what’s the right TOSCA word for this?)
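
The scenario can be made concrete with a small sketch. The metric names come from the slide; the catalog structure, the function names, and the component ids (`vm_1`, `app_1`) are hypothetical and only illustrate the "monitor every metric the component type can produce" assumption.

```python
# Metric names are from the scenario; the dictionary layout is an assumption,
# not a TOSCA-defined structure.
METRICS_BY_TYPE = {
    "ComputeNode": ["PercentCpuUtilization", "IoBytesIn", "IoBytesOut"],
    "SoftwareComponent": [
        "PercentCpuUtilization",
        "TransactionsProcessed",
        "ErrorsEncountered",
    ],
}


def metrics_to_monitor(component_type):
    """Minimal use case: return every metric the component type can produce."""
    return list(METRICS_BY_TYPE.get(component_type, []))


def monitoring_plan(components):
    """Map each (hypothetical) component id to the metrics to collect."""
    return {cid: metrics_to_monitor(ctype) for cid, ctype in components.items()}


# One SoftwareComponent running on one ComputeNode, as in the scenario.
plan = monitoring_plan({"vm_1": "ComputeNode", "app_1": "SoftwareComponent"})
```

A real policy mechanism would filter or threshold this list; the minimal use case defers that on purpose.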

Minimal Use Case Constraints
• The components themselves (e.g. the ComputeNode -- via virtualization software, and the SoftwareComponent) will not be required to implement a new monitoring protocol
  – One of several existing monitoring protocols could be used (SNMP, WMI, proprietary, etc.) depending on the service provider and the underlying technologies used
• The Monitoring Sub-System (MSS) is not required to be embedded within the Orchestrator.
  – An off-the-shelf monitoring system may be employed
• The MSS is running and attached to the Orchestrator before the Service Template is deployed

Notes on Monitoring Agent
• We should consider that there is always a Monitoring Agent (or give it a new name)
  – So-called “Agentless Monitoring” just means that the Monitoring Agent role is baked into the component and doesn’t have to be explicitly added in.
  – From the MSS side, the only difference is the particular protocol used, and elimination of the need for an agent deployment step.
• Coordination is, in any case, still needed between the MSS and the Agent role within the component (address, port, creds, and other identifiers).
• In many cases, the Agent capability of one component is used to monitor a different component (e.g. one might use the hypervisor’s Agent to monitor a VM; one might use the Host OS’s agent to monitor an application process)
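
One possible way to model "the Agent capability of one component is used to monitor a different component" is to walk up the hosting chain until a component with the Agent role is found. This is only a sketch: the `Component` class, the `has_agent` flag, and the idea that the agent is always located via the hosting relationship are assumptions for illustration, not something the slides prescribe.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Component:
    component_id: str
    has_agent: bool                  # True if the Agent role is available here
    hosted_on: Optional[str] = None  # id of the hosting component, if any


def find_agent(component_id: str, components: Dict[str, Component]) -> Optional[str]:
    """Return the id of the nearest component (self or host) with the Agent role."""
    seen = set()  # guards against accidental cycles in the hosting chain
    current = components.get(component_id)
    while current is not None and current.component_id not in seen:
        if current.has_agent:
            return current.component_id
        seen.add(current.component_id)
        current = components.get(current.hosted_on) if current.hosted_on else None
    return None


# Hypothetical instance data: the hypervisor's agent covers the VM and the app.
components = {
    "hypervisor": Component("hypervisor", has_agent=True),
    "vm_1": Component("vm_1", has_agent=False, hosted_on="hypervisor"),
    "app_1": Component("app_1", has_agent=False, hosted_on="vm_1"),
}
agent_for_app = find_agent("app_1", components)
```

In the "agentless" case the component itself carries the role, so the lookup returns the component's own id on the first step.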

Diagram for Scenario 1
[Diagram] Elements: Service Template; Virtual Machine (new); Software Component (new); Monitoring Sub-System (MSS)
Arrow labels:
  – Causes
  – HostedOn
  – Notify State Change: Create, Modify, Destroy
  – Deliver Metrics (any existing push or pull protocol)
Callouts: 1, 2, 3

Notes for Scenario 1 Diagram
1. New components are created. In some cases, there must be relationship information for components that are created outside of the ST (such as hypervisor or physical system -- see 2 below)
2. MSS is notified of new components to be monitored. MSS needs:
  – Service Template meta-data in order to know the ID and Type of the component
  – Instance Model in order to know the address, port, and credentials needed in order to collect metrics
  – Possibly the relationship to components not in the ST (e.g., the hypervisor) if info about the component is provided by that outside component.
3. If a push protocol is used, the monitoring agent within the component must be configured with the address of the MAP and the TOSCA id of the component. If a pull protocol is used, the Orchestrator and the monitoring agent must coordinate, or the details must be explicitly defined in the ST, so that the create notification can carry, e.g., the agent’s port address and creds
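
The information listed in these notes could be carried in a single create notification. The sketch below is illustrative only: the `CreateNotification` class, its field names, and both helper functions are hypothetical and are not part of any TOSCA specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CreateNotification:
    # From Service Template meta-data
    component_id: str
    component_type: str
    # From the Instance Model (needed to collect metrics via pull)
    address: Optional[str] = None
    port: Optional[int] = None
    credentials: Optional[str] = None  # placeholder; a real system would reference a secret
    # Relationships to components outside the ST, e.g. {"HostedOn": "hypervisor-7"}
    external_relationships: Dict[str, str] = field(default_factory=dict)


def missing_for_pull(n: CreateNotification) -> List[str]:
    """Names of the fields a pull-based MSS would still need before collecting."""
    required = {"address": n.address, "port": n.port, "credentials": n.credentials}
    return [name for name, value in required.items() if value is None]


def push_agent_config(n: CreateNotification, map_address: str) -> Dict[str, str]:
    """What a push agent must be configured with: MAP address and TOSCA id."""
    return {"map_address": map_address, "tosca_id": n.component_id}


note = CreateNotification("app_1", "SoftwareComponent", address="10.0.0.5")
gaps = missing_for_pull(note)
agent_cfg = push_agent_config(note, "map.example.org")
```

Here `gaps` names what the Orchestrator and agent would still have to agree on (or what the ST would have to define explicitly) before a pull-based MSS could start collecting.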

Minimal Scenario Questions to Answer
• What information is needed by the Monitoring Sub-System (MSS) in order to activate monitoring when the Service Template is deployed?
• What mechanisms could be used to notify the MSS of the significant state changes for the components?
  – Activate
  – Modify
  – Deactivate
• What is the simplest mechanism that could handle this scenario?
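
As a starting point for discussion, one very simple candidate mechanism is a single notification call that the Orchestrator makes on each state change. This is a sketch under stated assumptions, not a proposed answer: the `MonitoringSubSystem` class, the `notify` method, and the string values for the three state changes are all hypothetical.

```python
class MonitoringSubSystem:
    """Toy in-memory MSS that tracks which components are being monitored."""

    def __init__(self):
        self.monitored = {}  # component id -> collection info (address, port, ...)

    def notify(self, change, component_id, info=None):
        """Handle one state-change notification from the Orchestrator."""
        if change == "activate":
            self.monitored[component_id] = dict(info or {})
        elif change == "modify":
            if component_id not in self.monitored:
                raise KeyError(f"{component_id} is not being monitored")
            self.monitored[component_id].update(info or {})
        elif change == "deactivate":
            # Removing an unknown component is treated as a no-op.
            self.monitored.pop(component_id, None)
        else:
            raise ValueError(f"unknown state change: {change}")


# Lifecycle from the minimal scenario: deploy, change, de-deploy.
mss = MonitoringSubSystem()
mss.notify("activate", "vm_1", {"address": "10.0.0.5"})
mss.notify("modify", "vm_1", {"port": 161})
snapshot = dict(mss.monitored)
mss.notify("deactivate", "vm_1")
```

Whether this call is a REST request, a message on a bus, or a plugin hook is exactly the open question on this slide; the sketch only fixes the three verbs.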