Collectd 101.

Slides:



Advertisements
Similar presentations
Programming Types of Testing.
Advertisements

MP IP Strategy Stateye-GUI Provided by Edotronik Munich, May 05, 2006.
Copyright 2004 Monash University IMS5401 Web-based Systems Development Topic 2: Elements of the Web (g) Interactivity.
JXTA P2P Platform Denny Chen Dai CMPT 771, Spring 08.
CSCE 515: Computer Network Programming Chin-Tser Huang University of South Carolina.
Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access memory.
TCP/IP protocols Communication over Internet is mostly TCP/IP (Transmission Control Protocol over Internet Protocol) TCP/IP "stack" is software which allows.
Web Application Architecture and Communication. Displaying a Web page in a Browser
Presentation on Osi & TCP/IP MODEL
INTERNET APPLICATION DEVELOPMENT For More visit:
1 IP Forwarding Relates to Lab 3. Covers the principles of end-to-end datagram delivery in IP networks.
9 Chapter Nine Compiled Web Server Programs. 9 Chapter Objectives Learn about Common Gateway Interface (CGI) Create CGI programs that generate dynamic.
Module 10: Monitoring ISA Server Overview Monitoring Overview Configuring Alerts Configuring Session Monitoring Configuring Logging Configuring.
COEN 252: Computer Forensics Network Analysis and Intrusion Detection with Snort.
10/13/2015 ©2006 Scott Miller, University of Victoria 1 Content Serving Static vs. Dynamic Content Web Servers Server Flow Control Rev. 2.0.
© 2007 Cisco Systems, Inc. All rights reserved.Cisco Public ITE PC v4.0 Chapter 1 1 Introduction to Routing and Packet Forwarding Routing Protocols and.
Graphing and statistics with Cacti AfNOG 11, Kigali/Rwanda.
SCALABLE EVOLUTION OF HIGHLY AVAILABLE SYSTEMS BY ABHISHEK ASOKAN 8/6/2004.
Re-Configurable Byzantine Quorum System Lei Kong S. Arun Mustaque Ahamad Doug Blough.
McGraw-Hill©The McGraw-Hill Companies, Inc., 2004 Connecting Devices CORPORATE INSTITUTE OF SCIENCE & TECHNOLOGY, BHOPAL Department of Electronics and.
Module: Software Engineering of Web Applications Chapter 2: Technologies 1.
Marcelo R.N. Mendes. What is FINCoS? A set of tools for data generation, load submission, and performance measurement of CEP systems; Main Characteristics:
Ceilometer + Gnocchi + Aodh Architecture
+ Routing Concepts 1 st semester Objectives  Describe the primary functions and features of a router.  Explain how routers use information.
2.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition System Programs (p73) System programs provide a convenient environment.
© Copyright 2014 TONE SOFTWARE CORPORATION. Confidential and Proprietary. All rights reserved. ® Administrator Training – Release Alarms Administration.
Module 6: Administering Reporting Services. Overview Server Administration Performance and Reliability Monitoring Database Administration Security Administration.
RIP Routing Protocol. 2 Routing Recall: There are two parts to routing IP packets: 1. How to pass a packet from an input interface to the output interface.
1 Internetworking: IP Packet Switching Reading: (except Implementation; pp )
ACI – Monitoring the CLI
Topic 4: Distributed Objects Dr. Ayman Srour Faculty of Applied Engineering and Urban Planning University of Palestine.
An Introduction To Gateway Intrusion Detection Systems Hogwash GIDS Jed Haile Nitro Data Systems.
PART1 Data collection methodology and NM paradigms 1.
SOFTWARE TESTING TRAINING TOOLS SUPPORT FOR SOFTWARE TESTING Chapter 6 immaculateres 1.
Metrics data published Via different methods Monitoring Server
SDN controllers App Network elements has two components: OpenFlow client, forwarding hardware with flow tables. The SDN controller must implement the network.
Chapter 3: Packet Switching (overview)
Behrouz A. Forouzan TCP/IP Protocol Suite, 3rd Ed.
Collectd 101.
Instructor Materials Chapter 5: Ethernet
CMS High Level Trigger Configuration Management
Networking Devices.
Data Transport for Online & Offline Processing
BOSS: the CMS interface for job summission, monitoring and bookkeeping
BOSS: the CMS interface for job summission, monitoring and bookkeeping
RMON.
ITIS 3110 IT Infrastructure II
BOSS: the CMS interface for job summission, monitoring and bookkeeping
PREGEL Data Management in the Cloud
Chapter 6 Delivery & Forwarding of IP Packets
* Lecture # 7 Instructor: Rida Noor Department of Computer Science
Processes The most important processes used in Web-based systems and their internal organization.
Chapter 6: Network Layer
Introduction to Networking
PHP Introduction.
What is Bash Shell Scripting?
Dynamic Routing Protocols part2
Chapter 27 WWW and HTTP.
Chapter 8: Monitoring the Network
Dynamic Packet-filtering in High-speed Networks Using NetFPGAs
– Chapter 3 – Device Security (B)
70-293: MCSE Guide to Planning a Microsoft Windows Server 2003 Network, Enhanced Chapter 4: Planning and Configuring Routing and Switching.
Implementing an OpenFlow Switch on the NetFPGA platform
EE 122: HyperText Transfer Protocol (HTTP)
William Stallings Data and Computer Communications
The ELK stack - get to know logs
Rational Publishing Engine RQM Multi Level Report Tutorial
CST8177 Scripting 2: What?.
Multiprocessors and Multi-computers
Map Reduce, Types, Formats and Features
Presentation transcript:

Collectd 101

What is collectd? collectd is a system statistics collection daemon; it uses plugins to collect system statistics and can mangle/publish those statistics in a number of ways. Free open source project, (GPLv2) and MIT licensed Platform independent Extensible via plugins Most plugins in collectd fall under one of the following categories: Input plugins: read system statistics at a regular interval, and dispatch the values to the collectd daemon. Output plugins: receive dispatched values from the daemon and output/write/dispatch those values in various output formats (e.g. JSON, SNMP, AMQP, MySQL, HTTP, etc).

Collectd data flow

Other collectd plugin catagories Binding plugins: which provide language bindings for collectd to allow plugins to be written in languages other than C. Logging plugins: which write information to log files or dispatch messages to syslog. Notification plugins: enable limited monitoring support in collectd. Other plugins which can both read and dispatch values.

Collectd plugin architecture Many collectd plugins can be enabled simultaneously, with input plugins collecting statistics/metrics on the platform and dispatching those values to the collectd daemon. The daemon will then pass the values to all enabled write plugins

Why use collectd? The collectd API, the selection of available plugins (90+) coupled with the modular nature of collectd architecture, provides a generic path to expose platform statistics from all existing plugins and newly developed plugins to OpenStack or other fault management applications. collectd also has bindings for several programming languages, which allows a developer to implement a plug-in in their language of choice, without having to modify the receiver of the collected metrics, for example Ceilometer itself.

collectd Statistics Statistics in collectd consist of a value list. A value list includes: Values Value length: the number of values in the data set. Time: timestamp at which the value was collected. Interval: interval at which to expect a new value. Host: used to identify the host. Plugin: used to identify the plugin. Plugin instance (optional): used to group a set of values together. For e.g. values belonging to a DPDK interface. Type: unit used to measure a value. In other words used to refer to a data set. Type instance (optional): used to distinguish between values that have an identical type. meta data: an opaque data structure that enables the passing of additional information about a value list. “Meta data in the global cache can be used to store arbitrary information about an identifier”  Within collectd notifications and performance data are dispatched in the same way Host, plugin, plugin instance, type and type instance uniquely identify a collectd value. Host, plugin, plugin instance, type and type instance uniquely identify a collectd value

collectd Values Values, can be one of: Derive: used for values where a change in the value since it’s last been read is of interest. Can be used to calculate and store a rate. Counter: similar to derive values, but take the possibility of a counter wrap around into consideration. Gauge: used for values that are stored as is. Absolute: used for counters that are reset after reading

collectd Data Sets Values lists are often accompanied by data sets that describe the values in more detail. Data sets consist of: A type: a name which uniquely identifies a data set. One or more data sources (entries in a data set) which include: The name of the data source. If there is only a single data source this is set to “value”. The type of the data source, one of: counter, gauge, absolute or derive. A min and a max value. Values lists are often accompanied by data sets that describe the values in more detail. Data sets consist of: A type: a name which uniquely identifies a data set. One or more data sources (entries in a data set) which include: The name of the data source. If there is only a single data source this is set to “value”. The type of the data source, one of: counter, gauge, absolute or derive. A min and a max value. Types in collectd are defined in types.db. Examples of types in types.db: bitrate value:GAUGE:0:4294967295 counter value:COUNTER:U:U if_octets rx:COUNTER:0:4294967295, tx:COUNTER:0:4294967295 In the example above if_octets has two data sources: tx and rx.

Examples of types in collectd Examples of types in types.db: bitrate value:GAUGE:0:4294967295 counter value:COUNTER:U:U if_octets rx:COUNTER:0:4294967295, tx:COUNTER:0:4294967295 In the example above if_octets has two data sources: tx and rx.

collectd notifications/alerts Notifications in collectd are generic messages containing: An associated severity, which can be one of OKAY, WARNING, and FAILURE. A time. A Message A host. A plugin. A plugin instance (optional). A type. A types instance (optional). Meta-data.

Thresholding and Notification Generate notifications based on thresholds LoadPlugin "threshold" <Plugin threshold>  <Plugin "dpdkevents">     Instance "port.0"     <Type "gauge-link_status">       FailureMin 2.00 # AT THE MOMENT SIMULATING A FAILURE, 1 is  actually link up       Persist false # only send one notification if the value is not OK     </Type>   </Plugin> </Plugin>

Values and thresholds Hysteresis value: applied when checking minimum and maximum bounds. This is useful for values that increase slowly and fluctuate a bit while doing so. When these values come close to the threshold, they may "flap", i.e. switch between failure / warning case and okay case repeatedly. Hits: Delay creating the notification until the threshold has been passed Number times collectd will issue a notification if values monitored by the threshold plugin are not received for Timeout iterations. When a value comes within range again or is received after it was missing, an "OKAY-notification" is dispatched.

Exec Plugin Executes scripts / applications and reads values back that are printed to STDOUT by that program. Extends the daemon in an easy and flexible way. Can also be used to call a bash script that does something with the notification from the threshold plugin.        <Plugin exec>        #       Exec "user:group" "/path/to/exec"                NotificationExec "stack" "write_notification.sh"        </Plugin> write_notification.sh just writes the notification passed from exec through STDIN to a file (/tmp/notifications).        #!/bin/bash        rm /tmp/notifications        while read x y        do          echo $x$y >> /tmp/notifications          done

Exec Plugin II /tmp/notifications contents Severity:FAILURE Time:1472552207.385 Host:pod3-node1 Plugin:dpdkevents PluginInstance:dpdk0 Type:gauge TypeInstance:link_status DataSource:value CurrentValue:1.000000e+00 WarningMin:nan WarningMax:nan FailureMin:2.000000e+00 FailureMax:nan Hostpod3-node1, plugin dpdkevents (instance dpdk0) type gauge (instance link_status): Data source "value" is currently 1.000000. That is below the failure threshold of 2.000000.

Collectd Plain Text Protocol Submit statistics and notifications to the daemon as well as query the current value of collected statistics. Plugins currently using this protocol are Exec (partially) and UnixSock

Filter Configuration Starting with collectd 4.6 there is a powerful filtering infrastructure implemented in the daemon. The concept has mostly been copied from ip_tables. Terminology: Match:  criteria to select specific values.  Target: action that is to be performed with data. Rule: The combination of any number of matches and at least one target Chain: a list of rules and possibly default targets. Rules tried in order and if one matches, the associated target will be called.

Precache and post cache chains When "read" plugins call the dispatch functions to dispatch values, the pre-cache chain is run. The values are then added to the internal cache. The post-cache chain is run after the values have been added to the cache.  Allows you to remap value names. The cache is also used to convert counter values to rates. More info @ https://collectd.org/documentation/manpages/collect d.conf.5.shtml#filter_configuration

Networking Plugin Uses a binary protocol UDP transport Sends data to a remote instance of collectd, receives data from a remote instance, or both at the same time. Data which has been received from the network can be Forwarded again. It’s possible to sign or encrypt the network traffic.

Platform Metrics and Information Can be divided into static and dynamic information. Supported metrics by collectd: https://wiki.opnfv.org/display/fastpath/Collectd+Metrics+and+Events Next Steps: start defining list of static and dynamic metrics under category sets.

References https://www.netways.de/fileadmin/images/Events_Trainings/Events/OSMC/2015/S lides_2015/collectd_Thresholds_Plugin_and_Icinga_-_Florian_Forster.pdf https://collectd.org/documentation/manpages/collectd.conf.5.shtml https://collectd.org/wiki/index.php/Plain_text_protocol