TOSCA Monitoring Straw-man for Initial Minimal Monitoring Use Case
Roger Dev
CA Technologies
Revision 3
May 21, 2015

TOSCA Monitoring Use Cases (full – From Arch Ref Strawman)
As an Application Architect, I want to know which metrics I can expect to be available from a given component.
As an Application Architect, I want to define, within the Service Template, the Metrics to be collected for a component, as well as how they are to be collected and managed (thresholded, etc.).
  – Additionally, I may want to use my favorite monitoring tools rather than those provided by the Service Provider.
As an Application Operator, I want to be able to access collected metrics and events for any or all of my deployed components, either:
  – Interactively
  – Programmatically
As an Application Developer, I want to be able to produce custom metrics from my application(s) and have them stored and accessed along with any standard metrics.
As a Service Provider, I may want to define Monitoring Policies for component types that may be different from those designated by the Application Architect.
As a Service Provider, I want to be able to utilize my favorite monitoring tools rather than those supplied with an orchestration framework.
As a Service Provider, I want to be able to access a robust set of Metrics and Events about the orchestration framework, since that is a critical component of my infrastructure.
As a Service Provider, I want to be able to utilize the full set of topological information provided by the Nested Service Template(s) to enhance my knowledge of running applications. This includes the output sections of the Templates.

Revised approach based on feedback and discussion to date
Simplify the initial use case to the bare minimum
Use that to work through the basic mechanisms
Define and agree upon the fundamental mechanisms
Expand from there

TOSCA Monitoring Reference Diagram
[Diagram. Labels: Service Template; External Process; Internal Process; External Monitoring System; Monitoring Automation Point (MAP). MAP interfaces: MAD (Monitoring Act / De-act), MIA (Monitoring Info Access), MEA (Monitoring Extension Advert), OM (Orchestrator Monitoring). Flow labels: Monitoring Template / Policy; Management Communication Info; Metric Availability; Metric Time Series; Metric Values; Events (marked "?" on some flows). Callout: Focus on Subset of MAD.]

Initial Minimal Use Case (1 of 2)
Assume we have a mechanism for defining the metrics associated with a given component-type:
  – This is a tractable problem, so let’s come back to it after solving more fundamental issues
Assume that we want to monitor all the metrics that each component can produce
  – Defer defining the mechanism whereby the Application Architect can define the monitoring policy
  – Defer defining the finer points of policy (e.g. events, actions, transformations, etc.)

Initial Minimal Use Case (2 of 2)
Scenario:
  – A Service Template is deployed with a single SoftwareComponent running on a single ComputeNode (virtual). Metrics (Capabilities) are defined for both SoftwareComponent and ComputeNode types.
  – ComputeNode:
    » PercentCpuUtilization, IoBytesIn, IoBytesOut
  – SoftwareComponent:
    » PercentCpuUtilization, TransactionsProcessed, ErrorsEncountered
  – Metrics are collected for some time
  – The Service Template is removed (de-deployed)

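
To make the scenario concrete, the sketch below enumerates what would be collected when every metric of every component is monitored (no policy filtering, per the assumptions above). The metric names come from the scenario; the instance ids `vm-1` and `app-1` and the function `collection_plan` are hypothetical, introduced only for illustration.

```python
# Metric capabilities per component type, as listed in the scenario.
METRICS_BY_TYPE = {
    "ComputeNode": ["PercentCpuUtilization", "IoBytesIn", "IoBytesOut"],
    "SoftwareComponent": [
        "PercentCpuUtilization",
        "TransactionsProcessed",
        "ErrorsEncountered",
    ],
}

# Hypothetical deployed instances: (instance id, component type).
DEPLOYED = [("vm-1", "ComputeNode"), ("app-1", "SoftwareComponent")]


def collection_plan(instances, metrics_by_type):
    """Return every (instance id, metric name) pair to collect.

    Assumption: all metrics a component type can produce are monitored.
    """
    return [
        (instance_id, metric)
        for instance_id, component_type in instances
        for metric in metrics_by_type[component_type]
    ]


plan = collection_plan(DEPLOYED, METRICS_BY_TYPE)
for instance_id, metric in plan:
    print(f"{instance_id}: {metric}")
```

Note that PercentCpuUtilization appears for both components, so a metric name alone is not a unique key; the component identifier is needed as well.
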
Minimal Use Case Constraints
The components themselves (e.g. the ComputeNode, via virtualization software, and the SoftwareComponent) will not be required to implement a new monitoring protocol
  – One of several existing monitoring protocols could be used (SNMP, WMI, proprietary, etc.) depending on the service provider and the underlying technologies used
The Monitoring Sub-System (MSS) is not required to be embedded within the Orchestrator
  – An off-the-shelf monitoring system may be employed
The MSS is running and attached to the Orchestrator before the Service Template is deployed

Notes on Monitoring Agent
We should consider that there is always a Monitoring Agent (or give it a new name)
  – So-called “Agentless Monitoring” just means that the Monitoring Agent role is baked into the component and doesn’t have to be explicitly added.
  – From the MSS side, the only differences are the particular protocol used and the elimination of the agent deployment step.
Coordination is, in any case, still needed between the MSS and the Agent role within the component (address, port, creds, and other identifiers).
In many cases, the Agent capability of one component is used to monitor a different component (e.g. one might use the hypervisor’s Agent to monitor a VM; one might use the Host OS’s agent to monitor an application process)

Diagram for Scenario 1
[Diagram. Boxes: Service Template; Virtual Machine (new); Software Component (new); Monitoring Sub-System (MSS). Edge labels: Causes; HostedOn; Notify State Change (Create, Modify, Destroy); Deliver Metrics (any existing push or pull protocol). Numbered callouts: 1, 2, 3.]

Notes for Scenario 1 Diagram
1. New components are created. In some cases, there must be relationship information for components that are created outside of the ST (such as the hypervisor or physical system; see 2 below).
2. The MSS is notified of new components to be monitored. The MSS needs:
  – Service Template meta-data in order to know the ID and Type of the component
  – Instance Model in order to know the address, port, and credentials needed to collect metrics
  – Possibly the relationship to components not in the ST (e.g., the hypervisor) if info about the component is provided by that outside component
3. If a push protocol is used, the monitoring agent within the component must be configured with the address of the MAP and the TOSCA id of the component. If a pull protocol is used, the agent’s details must either be coordinated between the Orchestrator and the monitoring agent or be explicitly defined in the ST, so that the create notification can carry, e.g., the agent’s port address and creds.

Minimal Scenario Questions to Answer
What information is needed by the Monitoring Sub-System (MSS) in order to activate monitoring when the Service Template is deployed?
What mechanisms could be used to notify the MSS of the significant state changes for the components?
  – Activate
  – Modify
  – Deactivate
What is the simplest mechanism that could handle this scenario?

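
As one way to think about the second question, here is a minimal sketch of an MSS-side handler for the three state changes. It is not a proposed interface: the class `MonitoringSubSystem`, the method `notify`, and the shape of `details` are all hypothetical, and transport (how the notification reaches the MSS) is deliberately left out.

```python
from enum import Enum


class StateChange(Enum):
    ACTIVATE = "activate"
    MODIFY = "modify"
    DEACTIVATE = "deactivate"


class MonitoringSubSystem:
    """Toy MSS that only tracks which components it is monitoring."""

    def __init__(self):
        # component id -> latest known details (address, protocol, ...)
        self.monitored = {}

    def notify(self, change, component_id, details=None):
        """Apply one state-change notification from the Orchestrator."""
        if change is StateChange.ACTIVATE:
            self.monitored[component_id] = dict(details or {})
        elif change is StateChange.MODIFY:
            if component_id not in self.monitored:
                raise KeyError(f"unknown component: {component_id}")
            self.monitored[component_id].update(details or {})
        elif change is StateChange.DEACTIVATE:
            # Deactivating an unknown component is treated as a no-op.
            self.monitored.pop(component_id, None)
        else:
            raise ValueError(f"unsupported state change: {change!r}")


mss = MonitoringSubSystem()
mss.notify(StateChange.ACTIVATE, "vm-1", {"address": "192.0.2.10"})
mss.notify(StateChange.MODIFY, "vm-1", {"address": "192.0.2.11"})
print(mss.monitored)
mss.notify(StateChange.DEACTIVATE, "vm-1")
print(mss.monitored)
```
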
What information does the MSS need?
Agent Address and Protocol (how to talk to the agent)
Component Identifier (how to ask the question about the correct component)
Credentials to access the Agent
  – Might be able to set up a closed management network and not need creds?

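
The items above could be carried in a small record like the one sketched here. All field and function names are hypothetical, and treating credentials as optional only on a closed management network is an assumption drawn from the open question above, not a settled rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MonitoringTarget:
    """What the MSS needs to start collecting from one component."""

    agent_address: str                 # how to reach the agent
    protocol: str                      # e.g. "snmp", "wmi", or proprietary
    component_id: str                  # which component to ask about
    credentials: Optional[str] = None  # opaque secret; may be absent


def missing_info(target: MonitoringTarget, closed_network: bool = False) -> List[str]:
    """List the items still missing before monitoring can be activated."""
    missing = []
    if not target.agent_address:
        missing.append("agent_address")
    if not target.protocol:
        missing.append("protocol")
    if not target.component_id:
        missing.append("component_id")
    # Assumption: creds may be skipped only on a closed management network.
    if target.credentials is None and not closed_network:
        missing.append("credentials")
    return missing


target = MonitoringTarget("192.0.2.10:161", "snmp", "compute-node-1")
print(missing_info(target))
print(missing_info(target, closed_network=True))
```
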
Potential Mechanisms
M1 – When a component is activated, make the correlated Template Model and Instance Model available to the MSS. The MSS figures out how to monitor based on this info:
  – Proximate portions of the Models are extracted and passed, or an interactive API for browsing relationships is provided
  – Assume that Agents are either baked into the components, or are explicitly deployed by the Service Template
Others?

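
A rough sketch of the "extract proximate portions" variant of M1 follows. It assumes, purely for illustration, that both models are dictionaries keyed by node name, that "proximate" means the activated node plus its HostedOn chain, and that `hosted_on`, `address`, and `port` are the relevant keys; none of these are defined by TOSCA or by this proposal.

```python
# Hypothetical Template Model: node name -> type and HostedOn relationship.
TEMPLATE_MODEL = {
    "app": {"type": "SoftwareComponent", "hosted_on": "vm"},
    "vm": {"type": "ComputeNode", "hosted_on": None},
}

# Hypothetical Instance Model: node name -> runtime details.
INSTANCE_MODEL = {
    "app": {"address": "192.0.2.20", "port": 9100},
    "vm": {"address": "192.0.2.20", "port": 161},
}


def proximate_view(node_name, template_model, instance_model):
    """Merge template and instance data for a node and its hosting chain.

    Walks HostedOn links upward from node_name; stops at the top of the
    chain or if a node repeats (guards against a cyclic model).
    """
    view = {}
    current = node_name
    while current is not None and current not in view:
        template_entry = template_model[current]
        view[current] = {**template_entry, **instance_model.get(current, {})}
        current = template_entry.get("hosted_on")
    return view


for name, info in proximate_view("app", TEMPLATE_MODEL, INSTANCE_MODEL).items():
    print(name, info)
```

The alternative named in M1, an interactive API for browsing relationships, would replace the one-shot extraction with calls the MSS makes on demand.
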
What do we need to specify?
What metrics are available for each Component Type?
  – Metric Type ID
  – Description
  – Data Type (e.g. Numeric, String, etc.)
  – Units (e.g. Volts, Megabytes, Percent, etc.)
  – Constraints (Min, Max, Enumerated Values, etc.)
Monitoring Policy (controlled via Service Template):
  – Monitoring Disposition:
    » Required: don’t deploy if you can’t monitor
    » Best Effort: deploy anyhow, but enable monitoring if available
    » None
  – Metrics to Include? Exclude?
  – Components to Include? Exclude?
  – Minimum Sample Frequency?
  – Action Conditions (e.g. If A then Do B). Not in this phase?

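
The two halves of this slide can be sketched as data types. The field names, the `accepts` check, and the `decide` function are hypothetical; they are one possible reading of the attribute list and of the three dispositions, not an agreed schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


@dataclass(frozen=True)
class MetricDefinition:
    """One metric a Component Type can produce."""

    metric_type_id: str
    description: str
    data_type: str                          # e.g. "numeric" or "string"
    units: Optional[str] = None             # e.g. "Percent", "Megabytes"
    minimum: Optional[float] = None         # constraint: Min
    maximum: Optional[float] = None         # constraint: Max
    allowed_values: Optional[tuple] = None  # constraint: Enumerated Values

    def accepts(self, value) -> bool:
        """Check a sample against whichever constraints are set."""
        if self.allowed_values is not None and value not in self.allowed_values:
            return False
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


class Disposition(Enum):
    REQUIRED = "required"
    BEST_EFFORT = "best_effort"
    NONE = "none"


def decide(disposition: Disposition, monitoring_available: bool) -> Tuple[bool, bool]:
    """Return (deploy, enable_monitoring) for a component."""
    if disposition is Disposition.REQUIRED:
        # Don't deploy if you can't monitor.
        return (monitoring_available, monitoring_available)
    if disposition is Disposition.BEST_EFFORT:
        # Deploy anyhow, but enable monitoring if available.
        return (True, monitoring_available)
    # NONE: deploy without monitoring.
    return (True, False)


cpu = MetricDefinition(
    "PercentCpuUtilization", "CPU utilization", "numeric",
    units="Percent", minimum=0, maximum=100,
)
print(cpu.accepts(42.5), cpu.accepts(120))
print(decide(Disposition.REQUIRED, monitoring_available=False))
```
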
Metric Types
Availability:
  – Percent Available
Performance:
  – CPU Usage
  – I/O
  – Memory
Workload:
  – Units Processed
  – Failed Units
  – Bytes Processed
Security:
  – Access Attempts
  – Failed Access Attempts
Locally Defined?