Section 3 - Exploring Data Ingestion


1 Section 3 - Exploring Data Ingestion

2 Objectives
Ingest data into AIOps.
Correctly identify the best data ingestion method for the source data at hand.
Perform basic correlation of Events.

3 Topics In This Section
Brief Review of AIOps
Standard Event Fields
Generic LAMs
Vendor Specific LAMs
Event and Alert Fields
Core LAMs
Exploring REST LAM Config File

4 Brief Review

5 Putting It All Together
(Architecture diagram: three LAMs feed the MooMS Message Bus, alongside moogfarmd, MySQL, and the Apache web stack.)

6 Definition of an Event
An Event is simply a message received by AIOps.
A message can carry just a basic host name and description, or it may carry many different pieces of information, for example: ppClient=SW730SPLNKDA008 Domain=SW730SPLNKDA008 OriginSeverity=MAJOR FreeText=BMC: The Virtual Memory of java exceeded 2GB OriginEvtClass=CUSTOM_DEFAULT ObjectClass=suppressescl OrigObject=MEMORY Region=CONSOLE OriginType=WIN_Process_0116 ForwardFlag=N/A OriginKey=N/A OriginDateTime=2016/02/22 19:42:29 Origin=
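A message like the one above is a run of key=value pairs, and some values (such as FreeText) contain spaces. The sketch below shows one way to split such a message into a dictionary before mapping. It assumes every key is a single word immediately followed by "=". The helper name is hypothetical and is not part of any LAM.

```python
import re

# Assumption: a key is a run of word characters immediately followed by "=".
# A value runs until the next whitespace-separated "key=" or the end of the
# message, so values such as FreeText may contain spaces.
_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_key_values(message: str) -> dict[str, str]:
    """Split a 'Key=value Key=value ...' message into a dict (hypothetical helper)."""
    return {key: value.strip() for key, value in _PAIR.findall(message)}


sample = (
    "ppClient=SW730SPLNKDA008 Domain=SW730SPLNKDA008 OriginSeverity=MAJOR "
    "FreeText=BMC: The Virtual Memory of java exceeded 2GB "
    "OriginDateTime=2016/02/22 19:42:29"
)
fields = parse_key_values(sample)
print(fields["FreeText"])  # BMC: The Virtual Memory of java exceeded 2GB
```

This heuristic breaks if a value itself contains a space followed by "word=". Real feeds may need a stricter, vendor-specific parser.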

7 What Is Data Ingestion
Data Ingestion takes the message, in the vendor's format, and translates it into the standard Event fields. As an implementer, one of your responsibilities is to ensure that messages are translated in the best way possible for the specific customer's environment. The first component in this process is the LAM.

8 Data Ingestion Review
Performed by LAMs, which:
Take raw event data
Map it to Moogsoft event fields
Pass events onto the Message Bus
Five generic types of LAMs:
REST: reads JSON data from a local socket and returns an acknowledgement
Socket: reads data from a local socket (TCP transport)
REST Client: reads event data from RESTful services
Logfile: reads data from a locally accessible log file
Trap: parses SNMP v1/v2c traps sent to a local socket
The component that maps raw data to the Moogsoft event object is called a LAM (Link Access Module). You need one LAM per monitored system, and there are five fundamentally different types of LAMs, as listed above. Each LAM works with a configuration file that defines how to harvest the data and map it to the Moogsoft event object fields. The LAMs then publish the data on the Message Bus. We'll discuss the different types of LAMs in the configuration section.
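The three LAM steps above (take raw data, map it to event fields, publish to the bus) can be sketched in a few lines. This is an illustration of the flow only. The Message Bus is stood in for by an in-process queue, which is not how MooMS works. The raw keys "host", "msg" and "sev" are hypothetical. In a real LAM the mapping lives in the configuration file, not in code.

```python
import json
import queue
from typing import Callable

# Stand-in for the Message Bus (assumption: an in-process queue, not MooMS).
message_bus: "queue.Queue[str]" = queue.Queue()

# Hypothetical mapping: Moogsoft event field -> how to extract it from raw data.
FIELD_MAP: dict[str, Callable[[dict], object]] = {
    "source": lambda raw: raw["host"],
    "description": lambda raw: raw["msg"],
    "severity": lambda raw: int(raw.get("sev", 1)),
}


def run_lam(raw_events: list[dict]) -> int:
    """Take raw event data, map it to event fields, and publish it to the bus."""
    published = 0
    for raw in raw_events:
        event = {name: extract(raw) for name, extract in FIELD_MAP.items()}
        message_bus.put(json.dumps(event))  # publish the mapped event
        published += 1
    return published
```

For example, `run_lam([{"host": "hostA", "msg": "Disk full", "sev": "5"}])` publishes one event whose severity has been recast from the string "5" to the integer 5.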

9 Data Ingestion Component
LAM binaries are found in $MOOGSOFT_HOME/bin.
Default configuration files are found in $MOOGSOFT_HOME/config.
Invocation from the command prompt (e.g. REST LAM): rest_lam --config rest_lam.my.conf --loglevel WARN
The config file defines the input source, parsing, mapping, and initial signature logic.
In order to map the data of your events to the Moogsoft event fields, you have to be able to manipulate LAMs. Let's discuss some key considerations before trying to work with them.

10 Event Fields

11 Event and Alert Fields (field, data type: description)
signature (VARBINARY, essentially a string): A special attribute used to determine when Moogsoft de-duplicates events into Alerts.
source_id (Text): Any identifier for the source machine that generated the event (IP, MAC, CI number, etc.).
external_id: Any unique identifier provided in the source event (event ID, incident ID, etc.).
manager: A general identifier of the event generator or intermediary (NAGIOS, SCOM, etc.).
source: Any unique human-readable name (FQDN, hostname, etc.).
type: Class and type are generic classifications for the event, arranged in a hierarchy (class, then type), that allow you to maintain a simple event ontology.

12 Event and Alert Fields (cont.)
agent (Text): The specific agent that created the event (SCOM REST, NAGIOS SOCKET, SNMP TRAP NATIVE, etc.).
agent_location: Typically the geographic location of the agent and/or CI, such as "London".
agent_time (UNIX time): The timestamp, in epoch seconds, when the event occurred.
severity (Integer): Standard 0-5, but be mindful of what each level signifies across all event sources where possible.
description: The main text payload of the event.

13 Event and Alert Fields (cont.)
custom_info (Text): A special field that is the mechanism for extending the Moogsoft alert schema. It is a JSON-encoded string that should contain key/value pairs for each data element that is not supplied in the standard fields of the initial event, or that has been enriched via alert transformation.
Please note: the maximum total event size is 64k.
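The fields on the last three slides can be gathered into one record, with custom_info serialized as a JSON string and a size check against the 64k limit. This is a minimal sketch, not Moogsoft's internal schema. The dataclass is hypothetical, and reading "64k" as 64 KiB of serialized JSON is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field

MAX_EVENT_BYTES = 64 * 1024  # assumption: "64k" read as 64 KiB of serialized JSON


@dataclass
class Event:
    """Hypothetical container for the standard event fields."""
    signature: str
    source_id: str
    external_id: str
    manager: str
    source: str
    type: str
    agent: str
    agent_location: str
    agent_time: int  # epoch seconds
    severity: int  # 0-5
    description: str
    custom_info: dict = field(default_factory=dict)

    def to_json(self) -> str:
        payload = asdict(self)
        # custom_info travels as a JSON-encoded string of extra key/value pairs
        payload["custom_info"] = json.dumps(self.custom_info)
        text = json.dumps(payload)
        if len(text.encode("utf-8")) > MAX_EVENT_BYTES:
            raise ValueError("event exceeds the 64k size limit")
        return text
```

Checking the size at mapping time makes an oversized event fail loudly instead of surfacing later downstream.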

14 Data Mapping - Key Considerations
Recast input data to the correct internal format here, e.g. convert a textual severity to an integer 0-5.
Consult the Event and Alert Field Best Practice page to view the fields exposed in events.
Note that the value of the signature field is used for event de-duplication (more coming up!).
Events can be placed into named event streams, allowing separate AlertBuilders to be specialized if necessary.
Here are some tips for configuring data ingestion. Consider any need to recast data, and handle it at this point. For example, incoming data might tag an event as critical; internally, this corresponds to a severity level of 5. The AIOps Field Notes site is a great resource for implementers. Consult its Data Ingestion section, particularly the Event and Alert Field Best Practice page, for mapping data; that page lists all the exposed fields and their definitions. In a LAM you can also specify the zone for the event, so that a specific vhost is used to listen for and process the events when they are published to the Message Bus. When you map data to the signature field, remember that its value will be used for event de-duplication. This is very important, so let's look into it a bit deeper.
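Recasting a vendor's textual severity to the internal integer can be sketched as a lookup with a default. This assumes the standard Moogsoft scale (0 Clear, 1 Indeterminate, 2 Warning, 3 Minor, 4 Major, 5 Critical). The default of 1 for unknown values is an illustrative choice, and a given customer may want different names or defaults.

```python
# Assumed standard Moogsoft severity scale (0-5); adjust per deployment.
SEVERITY_BY_NAME = {
    "clear": 0,
    "indeterminate": 1,
    "warning": 2,
    "minor": 3,
    "major": 4,
    "critical": 5,
}


def to_severity(value: object, default: int = 1) -> int:
    """Recast a vendor severity (text or number) to an integer 0-5."""
    if isinstance(value, int) and 0 <= value <= 5:
        return value
    text = str(value).strip().lower()
    if text.isdigit() and 0 <= int(text) <= 5:
        return int(text)
    # Unknown or out-of-range values fall back to the default
    return SEVERITY_BY_NAME.get(text, default)
```

For example, `to_severity("CRITICAL")` returns 5 and `to_severity("bogus")` returns the default, 1.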

15 Importance of the Signature Field
First point of correlation: the AlertBuilder de-duplicates events into an existing alert when the signature is the same.
Made from a combination of one or more attributes (e.g. a combination of source, manager, and class).
Separate the combined field values with "::" (e.g. host1::nagios::cpu).
Length cannot exceed 745.
Does not need to be human readable.
The AlertBuilder uses the event's signature for de-duplication: events with the same signature are considered part of the same Alert. For example, if you used only the hostname, every Event from that host would be treated as one, with the description matching only the first Event. Conversely, if you included the Event's timestamp in the signature, every event would be treated as unique, which effectively disables de-duplication. Best practice for setting the signature field is to use source or source_id, agent, class, and perhaps manager.
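The signature rules above (join attributes with "::", stay within 745, leave out anything that changes per event) fit in a small helper. The function name is hypothetical. In practice the signature is built in the LAM configuration or a LAMbot.

```python
MAX_SIGNATURE_LENGTH = 745  # limit quoted on the slide


def build_signature(*parts: str) -> str:
    """Join event attributes with '::' to form a de-duplication signature."""
    cleaned = [str(p).strip() for p in parts]
    if not all(cleaned):
        raise ValueError("signature components must not be empty")
    signature = "::".join(cleaned)
    if len(signature) > MAX_SIGNATURE_LENGTH:
        raise ValueError(f"signature longer than {MAX_SIGNATURE_LENGTH}")
    return signature


print(build_signature("host1", "nagios", "cpu"))  # host1::nagios::cpu
```

Note what is not passed in: a timestamp or an event ID would make every signature unique and defeat de-duplication.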

16 De-Duplication at Data Ingestion
(Diagram: Fail events 1-3 and Clear events 1-3 pass through the LAMs; each event receives a unique signature.)
We have several opportunities to handle Event correlation, but this is the first good place to think about it. Say 3 fail events occurred at the same time on hosts A, B, and C, and then 3 clear events for those hosts came in right away. Without any correlation, all of these are treated as separate events. You could cluster these 6 events manually, but it would be nicer to handle them in a more elegant way.

17 De-Duplication at Data Ingestion
Use signature-based de-duplication.
System function (no rules).
Signature defined to match fail to clear: host::class::type, e.g. HostA::Disk::CapacityThreshold
(Diagram: Fail events 1-3 and Clear events 1-3 pass through the LAMs into Alerts 1-3, which are grouped into Situation 1.)
There are two ways to do this. We'll discuss one here, and the other later in the Alert Rules Engine section; in practice, you'll use them in combination. We can have our LAMs correlate related events automatically: with signature-based de-duplication, symmetrical fails and clears are bundled into the same Alert. If you look at the Alert's timeline, both the fail and the clear events appear, so you do not lose the history of the alert.
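The effect of a host::class::type signature on the fail/clear example can be simulated by grouping events by signature. This is a simplified model of what the AlertBuilder does, not its implementation. Each resulting "alert" here is just the list of its events, which keeps the timeline history the notes mention.

```python
def deduplicate(events: list[dict]) -> dict[str, list[dict]]:
    """Group events into alerts keyed by signature; each alert keeps its event history."""
    alerts: dict[str, list[dict]] = {}
    for event in events:
        # setdefault creates a new alert the first time a signature is seen
        alerts.setdefault(event["signature"], []).append(event)
    return alerts


# Three fails followed by three clears, as in the example above.
events = [
    {"signature": f"{host}::Disk::CapacityThreshold", "state": state}
    for state in ("fail", "clear")
    for host in ("HostA", "HostB", "HostC")
]
alerts = deduplicate(events)
print(len(alerts))  # 3
```

Six events collapse into three alerts, and each alert holds its fail followed by its clear.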

18 Correlation at Data Ingestion (cont.)
Effective in correlating binary events on the same entity.
Works out of the box for many event feeds.
Needs to be configured for event types with different event attributes (traps).
De-duplication with the Alert Rules Engine will be discussed later.
This method is effective at correlating binary events on the same entity. It relies on generic logic that can be applied across a wide range of events, and it works out of the box for many event feeds where the signature is the same by default. However, SNMP traps may need additional logic in a LAMbot, because the correlating fail and clear traps carry two different trap numbers. The second method of correlation is handled by the Alert Rules Engine; we'll discuss it when we get to the Alert processing stage.
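The trap case can be handled by mapping the clear trap's OID onto its fail trap's OID before the signature is built. The sketch shows the idea in Python, not in LAMbot syntax. It uses the standard linkDown/linkUp notifications as the example pair. The pairing table and the severity of 4 for the fail trap are hypothetical choices.

```python
# Standard SNMPv2 notification OIDs for linkDown and linkUp.
LINK_DOWN = "1.3.6.1.6.3.1.1.5.3"
LINK_UP = "1.3.6.1.6.3.1.1.5.4"

# Hypothetical pairing table: clear-trap OID -> the fail-trap OID it resolves.
CLEAR_TO_FAIL = {LINK_UP: LINK_DOWN}


def trap_signature(agent_address: str, trap_oid: str, if_index: int) -> tuple[str, int]:
    """Return (signature, severity) so a clear trap shares its fail trap's signature."""
    is_clear = trap_oid in CLEAR_TO_FAIL
    base_oid = CLEAR_TO_FAIL.get(trap_oid, trap_oid)
    signature = f"{agent_address}::{base_oid}::{if_index}"
    severity = 0 if is_clear else 4  # 0 = clear; 4 chosen for illustration
    return signature, severity
```

A linkDown and a later linkUp for the same interface now produce the same signature, so they de-duplicate into one Alert.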

19 Intent of the Event Time Field
Alert timestamps:
int_last_event_time: the time on the database server* when the last event was received
last_state_change: the time on the database server* when the Alert was last updated
first_event_time: calculated from the agent_time of the event
last_event_time: calculated from the agent_time of the event
* set using a stored procedure
Alerts have the four timestamps listed above, and because they are derived from different sources, care should be taken when setting the event time. Every event that is ingested needs to carry an agent_time timestamp indicating its time of occurrence; events without valid timestamps will not be processed. The AlertBuilder uses the agent_time value to set first_event_time and last_event_time, which are visible in the UI as "First Event Time" and "Last Event Time". However, Sigalisers do not use agent_time in their calculations: the timing of alerts is taken from the system time at which the event is processed by the Sigaliser. As a best practice, always set the agent_time value in the LAM configuration file, regardless of the final source of the timestamp. Make sure the value type is defined correctly and that a default value is populated. Consult the best practice documentation for more details.
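The advice above (always populate agent_time, with a correct type and a default) can be sketched as a conversion with a fallback. The format string matches the OriginDateTime value in the sample message from slide 6. Treating that timestamp as UTC and falling back to the current time are assumptions to confirm for each source.

```python
import time
from datetime import datetime, timezone
from typing import Optional


def to_agent_time(raw: Optional[str], fmt: str = "%Y/%m/%d %H:%M:%S") -> int:
    """Convert a vendor timestamp to epoch seconds, falling back to the current time."""
    if raw:
        try:
            # Assumption: the source timestamp is in UTC.
            parsed = datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc)
            return int(parsed.timestamp())
        except ValueError:
            pass  # unparseable value: use the default below
    return int(time.time())


print(to_agent_time("2016/02/22 19:42:29"))  # 1456170149
```

Falling back to "now" keeps the event processable, at the cost of a less accurate first_event_time and last_event_time.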

20 Generic LAMs

21 REST Data Ingestion
Accepts events via HTTP or HTTPS POST requests in JSON format.
Responds with standard HTTP return codes, or with REST response codes in JSON format for errors.
Can be configured to listen on a specific port.
Security:
Supports basic authentication.
Can use a header or body authentication token.
Can be configured to use HTTPS (TLS/SSL).
Multiple events can be sent in a single request.
Uses JSON properties to map to Moogsoft Event fields; unmapped values are assigned to overflow.
View the Generic REST Integration Quick Start for ingesting REST data.
Let's go over the other types of data ingestion. The REST LAM provides a method of data ingestion through RESTful services, enabling AIOps to accept events via HTTP or HTTPS POST requests in JSON format. The incoming REST messages must conform to HTTP message standards. The REST LAM can be configured to listen on a specific port and, if authentication is configured, it will reject incoming requests that do not contain the expected secret auth_token. When authentication_type is set to basic, the graze login is used. If it is set to none (the default) or not specified, basic authentication does not occur.
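A client sending several events in one POST with basic authentication could build its request as below, using only the Python standard library. Several details are assumptions to check against your rest_lam.conf: the top-level "events" array envelope, the example URL and port, and the credentials. The function builds the request without sending it.

```python
import base64
import json
import urllib.request


def build_rest_lam_request(url: str, events: list[dict],
                           user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying several events with basic auth."""
    # Assumption: the LAM is configured to accept a top-level "events" array.
    body = json.dumps({"events": events}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


req = build_rest_lam_request(
    "https://aiops.example.com:8888",  # hypothetical endpoint
    [{"source": "hostA"}, {"source": "hostB"}],
    "graze",
    "secret",  # hypothetical password
)
```

To send it, call `urllib.request.urlopen(req, timeout=10)` and inspect the HTTP status and any JSON error body the LAM returns.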

22 Socket Data Ingestion
SERVER mode: monitors data written to a UNIX TCP network socket.
CLIENT mode: connects to a remote TCP socket.
Data received is broken into tokens; multi-line events are accepted.
Delimiters specify how to split the data into tokens.
Variables: define friendly names for the tokenized values.
Limited to one socket connection.
Let's use Socket data ingestion as the example for actually configuring a LAM. This is the case where you want to monitor data written to a UNIX TCP network socket. Data mapping is specified in socket_lam.conf, so to add a new LAM for Socket data ingestion, start by copying this file. When AIOps is installed on premises, the majority of data ingestion will be based on the Socket LAM. Note the current limitation that the Socket LAM can only handle one socket connection.
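The tokenizing step (split on delimiters, then give each position a friendly variable name) can be illustrated as follows. This is a sketch of the concept only, not the Socket LAM's parser. The delimiter list and variable names are hypothetical; in socket_lam.conf they are set in configuration.

```python
import re

# Hypothetical settings mirroring the idea of delimiters and variables.
DELIMITERS = [",", "\t"]
VARIABLES = ["signature", "source", "severity", "description"]  # by position


def tokenize(line: str, delimiters: list[str] = DELIMITERS) -> list[str]:
    """Split one received line into tokens on any of the delimiters."""
    pattern = "|".join(re.escape(d) for d in delimiters)
    return [token.strip() for token in re.split(pattern, line.rstrip("\r\n"))]


def to_variables(line: str) -> dict[str, str]:
    """Name each token by its position; extra tokens are ignored."""
    return dict(zip(VARIABLES, tokenize(line)))
```

For example, `to_variables("hostA::disk,hostA,5,Disk full\n")` yields the four named values, with "::" left intact because it is not a delimiter.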

23 REST Client Data Ingestion
Pulls information from a REST source at configurable intervals.
The REST Client LAM interacts with the data source through its REST interface over HTTP or HTTPS. This allows data to be transmitted over an encrypted channel, supports batching, and provides some basic feedback to the data source on the ingestion process.
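The pull model (fetch a batch at each interval, then hand each event on) can be sketched with an injectable fetch function. The fetch function also makes the loop testable without a network. `fetch_json` assumes the source returns a JSON list of events, and any URL you pass it is your own. This illustrates polling, not the REST Client LAM's internals.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str) -> list[dict]:
    """Fetch one batch of events, assuming the source returns a JSON list."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def poll(fetch: Callable[[], list[dict]], handle: Callable[[dict], None],
         interval_s: float, max_polls: int) -> int:
    """Call fetch() every interval_s seconds and hand each event to handle()."""
    handled = 0
    for attempt in range(max_polls):
        for event in fetch():
            handle(event)
            handled += 1
        if attempt < max_polls - 1:
            time.sleep(interval_s)  # wait for the next polling interval
    return handled
```

A real run might look like `poll(lambda: fetch_json("https://source.example.com/api/events"), print, 60, 10)`, where the URL is a placeholder.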

24 TRAP Data Ingestion
The trapd LAM currently supports SNMP v1 and SNMP v2c traps (v3 support is on the roadmap).
Trap processing is provided by LAMbots specific to a MIB or set of MIBs.
The modules are created by converting MIBs to LAMbot modules using a Moogsoft-supplied utility.
To support native SNMP, the trap LAM reads SNMP traps and maps them to the AIOps data structure. Note that the trap LAM only supports v1 and v2c trap data.
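The idea of per-MIB processing modules can be pictured as a registry that routes each trap to the handler with the longest matching OID prefix, with a generic fallback. The registry, handler names and output fields below are hypothetical and are not generated by the Moogsoft utility. Only the standard snmpTraps OIDs are real.

```python
from typing import Callable

Handler = Callable[[dict], dict]

# Standard SNMPv2 notifications live under snmpTraps (1.3.6.1.6.3.1.1.5).
SNMP_TRAPS_PREFIX = "1.3.6.1.6.3.1.1.5"
SNMP_TRAP_NAMES = {"1": "coldStart", "2": "warmStart", "3": "linkDown",
                   "4": "linkUp", "5": "authenticationFailure"}


def snmp_traps_handler(trap: dict) -> dict:
    """Handler for the standard SNMPv2 notifications (illustrative mapping)."""
    name = SNMP_TRAP_NAMES.get(trap["oid"].rsplit(".", 1)[-1], "unknown")
    return {"description": f"{name} from {trap['agent']}", "class": "SNMPv2"}


def generic_handler(trap: dict) -> dict:
    """Fallback when no MIB-specific handler is registered."""
    return {"description": f"Trap {trap['oid']} from {trap['agent']}", "class": "Unknown"}


# Hypothetical registry: OID prefix -> handler built from a MIB.
HANDLERS: dict[str, Handler] = {SNMP_TRAPS_PREFIX: snmp_traps_handler}


def dispatch(trap: dict) -> dict:
    """Route a trap to the handler with the longest matching OID prefix."""
    oid = trap["oid"]
    matches = [p for p in HANDLERS if oid == p or oid.startswith(p + ".")]
    handler = HANDLERS[max(matches, key=len)] if matches else generic_handler
    return handler(trap)
```

Supporting a new MIB then means registering one more prefix and handler, which mirrors the "one module per MIB or set of MIBs" model.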

25 Syslog Ingestion
The syslog protocol is supported by a wide range of devices and can be used to log different types of events.
Syslog LAM types: UDP or TCP connection; Logfile.
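Whichever transport is used, each syslog line starts with a <PRI> value that encodes facility and severity as facility * 8 + severity, with syslog severities running from 0 (Emergency) to 7 (Debug). The sketch decodes that header. The table mapping syslog severities onto the 0-5 event scale is a hypothetical choice to tune per customer.

```python
import re

# Hypothetical mapping from syslog severity (0-7) to the 0-5 event severity scale.
SYSLOG_TO_EVENT_SEVERITY = {0: 5, 1: 5, 2: 5, 3: 4, 4: 2, 5: 1, 6: 1, 7: 1}

_PRI = re.compile(r"^<(\d{1,3})>")


def parse_pri(line: str) -> tuple[int, int]:
    """Return (facility, syslog_severity) from the leading <PRI> of a syslog line."""
    match = _PRI.match(line)
    if not match:
        raise ValueError("no <PRI> header")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        raise ValueError("PRI out of range")
    return divmod(pri, 8)


facility, sev = parse_pri("<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8")
print(facility, sev, SYSLOG_TO_EVENT_SEVERITY[sev])  # 4 2 5
```

This is the same recasting step described under Data Mapping, applied to a syslog-specific severity field.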

26 Vendor Specific LAMs

27 Vendor Specific LAMs
Started by default: AppDynamics, Nagios, Netcool, SolarWinds
Not started by default: New Relic

28 UI-Based Data Ingestion Options
Accessible from Menu > System Administration > Monitoring > Add Monitoring Integration.
A LAM specific to each source, with pre-configured Alert processing capability: AppDynamics®, NewRelic®, Nagios®, SolarWinds®.
For some of the most commonly used data sources, UI-based data ingestion is available! Rather than tweaking the configuration files, simply go to System Administration > Monitoring and click the Add Monitoring Integration button. Not only can these tools map the given data to the AIOps Event object, they also come with further configuration for processing the resulting Events into Alerts.

29 Questions

