Section 8 Monitoring SES 3

1 Section 8 Monitoring SES 3

2 Objectives
What should be monitored and why
Command line tools to monitor Ceph
Understand Calamari
Installing Calamari

3 What should be monitored?
So now we've got something running, one way or another. How do we know it's working? How do we know how well it's working?

4 Is the cluster working?
# ceph status
    cluster 565bbaaf-11e a-6b468f0b7b7e
    health HEALTH_OK
    monmap e1: 1 mons at {node1= :6789/0}, election epoch 1, quorum 0 node1
    osdmap e12: 2 osds: 2 up, 2 in
    pgmap v118: 64 pgs, 1 pools, 1024 kB data, 3 objects
        kB used, MB / MB avail
        active+clean
This status display looks OK: no errors or warnings. The graphic shows a healthy cluster, though with only 1 MON and 2 OSDs. All PGs are active and clean.
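A healthy status like this can be verified by a script instead of by eye. A minimal sketch follows, parsing a hand-made JSON document in the general shape that `ceph status --format json` produced in this Ceph generation; the `cluster_ok` helper and all sample values are illustrative assumptions, not output captured from a real cluster:

```python
import json

# Hand-made sample in the shape of `ceph status --format json`
# (field names follow the schema of SES 3-era Ceph; illustration only).
sample = json.loads("""
{
  "health": {"overall_status": "HEALTH_OK"},
  "osdmap": {"osdmap": {"num_osds": 2, "num_up_osds": 2, "num_in_osds": 2}},
  "pgmap": {"num_pgs": 64,
            "pgs_by_state": [{"state_name": "active+clean", "count": 64}]}
}
""")

def cluster_ok(status):
    """True when health is OK, all OSDs are up/in, and every PG is active+clean."""
    osd = status["osdmap"]["osdmap"]
    clean = sum(s["count"] for s in status["pgmap"]["pgs_by_state"]
                if s["state_name"] == "active+clean")
    return (status["health"]["overall_status"] == "HEALTH_OK"
            and osd["num_up_osds"] == osd["num_osds"] == osd["num_in_osds"]
            and clean == status["pgmap"]["num_pgs"])

print(cluster_ok(sample))  # True for the healthy sample above
```

On a live system the same dict would come from `json.loads()` over the output of `ceph status --format json`.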

5 How well is it working?
# ceph status
    cluster 565bbaaf-11e a-6b468f0b7b7e
    health HEALTH_WARN 33 pgs degraded; 35 pgs stuck...
    monmap e1: 3 mons at {ceph2=...,ceph3=...,ceph4=...}, election epoch 22, quorum 0,1,2 ceph2,ceph3,...
    osdmap e411: 52 osds: 52 up, 52 in
    pgmap v1014: 4288 pgs, 4 pools, 0 bytes data
        MB used, GB / GB avail
        active+degraded
        active+remapped
        active+clean
In this larger example things are not quite so healthy: the status display shows issues with the placement groups. Note their states; some are degraded or remapped rather than clean.
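The PG-state breakdown is what a monitoring script would key on. A minimal sketch, where the state counts are hand-made to mirror this sample (33 degraded, 35 remapped out of 4288) and the list shape follows the `pgs_by_state` array that `ceph status --format json` emits:

```python
# Hand-made sample mirroring the cluster above (illustration only).
pgs_by_state = [
    {"state_name": "active+clean", "count": 4220},
    {"state_name": "active+degraded", "count": 33},
    {"state_name": "active+remapped", "count": 35},
]

# Anything that is not plain active+clean deserves attention.
unhealthy = {s["state_name"]: s["count"]
             for s in pgs_by_state if s["state_name"] != "active+clean"}
print(unhealthy)  # {'active+degraded': 33, 'active+remapped': 35}
```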

6 If it is really not working
# ceph status
    cluster c9d3ae97-2f4c-4d91-a3f7-ff42bce754df
    health HEALTH_WARN 2174 pgs backfill; 367 pgs backfilling; 3271 pgs degraded; 23 pgs down; 57 pgs peering; 35 pgs recovering; 188 pgs recovery_wait; 227 pgs stale; 26 pgs stuck inactive; 3065 pgs stuck unclean; recovery / objects degraded (17.988%); 1/148 in osds are down
    monmap e3: 3 mons at {a001= :6789/0,a002= :6789/0,a003= :6789/0}, election ...
    osdmap e51357: 168 osds: 147 up, 148 in
    pgmap v : pgs, 5 pools, 1183 GB data, 5609 objects
        GB used, GB / GB avail
        / objects degraded (17.988%)
        inactive
        active+clean
        degraded+remapped
        active+degraded+remapped
        active+degraded+remapped+backfilling
        stale+active+degraded+remapped+wait_backfill
        peering
        active+recovery_wait
        stale+active+clean
        active+recovery_wait+degraded+remapped
        active+remapped+wait_backfill
        down+peering
        stale+active+degraded+remapped+backfilling
        stale+active+recovery_wait
        active+remapped
        degraded
        active+degraded
        active+remapped+backfilling
        remapped+peering
        active+recovery_wait+remapped
        active+recovery_wait+degraded
        stale+active+degraded
        active+degraded+remapped+wait_backfill
        active+recovering
And something much worse is happening now. This status display shows just a few problems! The output was transcribed from Andrew Cowie's Ceph talk at linux.conf.au 2015 ("we were having some network trouble..."). We cannot recreate this scale of error in class, but it is always nice to see what major problems look like.

7 What should we monitor?
What is most important for understanding the health of the cluster at any given point in time?
Overall cluster health
MON quorum
OSD status
PG status
Disk used/free
Are any nodes offline/dead
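The bullet points above map naturally onto one scripted pass over the cluster status. A minimal sketch, assuming a `ceph status --format json` document; the sample dict and the 80% disk-usage alert threshold are assumptions made for illustration:

```python
# Hand-made sample with the fields needed for the checklist above;
# the nesting follows `ceph status --format json` as emitted by
# SES 3-era Ceph (illustration only, not a verbatim capture).
status = {
    "health": {"overall_status": "HEALTH_OK"},
    "quorum": [0, 1, 2],                       # MON ranks currently in quorum
    "monmap": {"mons": [{}, {}, {}]},          # three monitors defined
    "osdmap": {"osdmap": {"num_osds": 52, "num_up_osds": 52,
                          "num_in_osds": 52}},
    "pgmap": {"bytes_used": 10 * 2**30, "bytes_total": 100 * 2**30},
}

def checks(s):
    """One boolean per checklist item: health, quorum, OSDs, disk headroom."""
    osd = s["osdmap"]["osdmap"]
    used_ratio = s["pgmap"]["bytes_used"] / s["pgmap"]["bytes_total"]
    return {
        "health_ok":  s["health"]["overall_status"] == "HEALTH_OK",
        "mon_quorum": len(s["quorum"]) > len(s["monmap"]["mons"]) // 2,
        "osds_up_in": osd["num_up_osds"] == osd["num_in_osds"] == osd["num_osds"],
        "disk_headroom": used_ratio < 0.80,    # 80% threshold is an assumption
    }

print(checks(status))
```

Any False value is a prompt to drill down with the more specific commands shown later in this section.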

8 Is the cluster working well?
What do we care about mid- to long-term?
What's CPU bound? What's disk bound? What's network bound?
Baseline and trending can help us understand whether the cluster is performing poorly. This may be less obvious than tracing an error, as the cluster may report no problems and show no obvious functional symptoms.

10 Monitoring at the command line

11 Checking Cluster Health
To get a quick overview of the current cluster health, use the ceph health command
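A cron job or monitoring agent often wants just that one-line result. A minimal sketch of splitting it into the overall code and the individual warning summaries; the sample string is modeled on the HEALTH_WARN line from slide 5, not captured from a live cluster:

```python
# `ceph health` prints one line: the overall code, then (for WARN/ERR)
# semicolon-separated problem summaries.
line = "HEALTH_WARN 33 pgs degraded; 35 pgs stuck unclean"

code, _, rest = line.partition(" ")
warnings = [w.strip() for w in rest.split(";")] if rest else []
print(code)      # HEALTH_WARN
print(warnings)  # ['33 pgs degraded', '35 pgs stuck unclean']
```

For `HEALTH_OK` the line carries no summaries and `warnings` stays empty.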

12 Checking Cluster Status
To view more details of the current cluster condition, use the ceph status command

13 Watching Cluster Activity
As the cluster operates, various changes of state and condition generate internal messages and updates. To view these updates in real time, use the ceph -w command to watch cluster activity
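Rather than watching the stream by eye, a script can filter it. A minimal sketch over hand-made lines in the general shape of the cluster log that `ceph -w` prints; the timestamps and messages are invented for illustration:

```python
# Hypothetical `ceph -w` output lines; real lines carry a timestamp,
# a daemon name, and a severity tag such as [INF], [WRN] or [ERR].
stream = [
    "2016-05-01 10:00:01 mon.0 [INF] pgmap v118: 64 pgs: 64 active+clean",
    "2016-05-01 10:00:09 mon.0 [WRN] osd.1 marked down",
]

# Keep only warnings and errors for alerting.
alerts = [line for line in stream if "[WRN]" in line or "[ERR]" in line]
print(alerts)  # ['2016-05-01 10:00:09 mon.0 [WRN] osd.1 marked down']
```

In practice the list would be replaced by iterating over the stdout of a long-running `ceph -w` subprocess.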

14 Health Commands – interactive shell
It is possible to run a ceph shell, and issue various commands within the shell

15 Monitoring with Calamari

16 Calamari and Romana
Browser-based GUI included with SUSE Enterprise Storage
Calamari is the backend (REST API)
Romana is the frontend (GUI)
Provides monitoring and some management

17 Cluster Status - Dashboard

18 Cluster Status - Workbench

19 Cluster Performance - Charts

20 Cluster Performance (2)

21 Cluster Performance (3)

22 Management At some point you will want to tweak something, or something's going to break, and you'll need to replace it.

23 Ongoing Management with Calamari
OSD management
Pool / Placement Group management
Users, authentication
Adding new nodes, disks, etc.

24 Cluster Settings

25 OSD Management

26 Pool / Placement Group Management

27 Pool / Placement Group Management

28 Using Calamari to diagnose problems
Dead disks Dead nodes Major cluster failures

29 Calamari will tell you about it...

30 ...and help you find the problem
Some OSDs are down; the different views here could tell you that a whole node is down. Then you investigate further and fall back to the various deployment tools, for example to deploy a replacement node.

31 Installing Calamari

32 Installing the Calamari Software
Install the calamari-clients package: zypper in calamari-clients
This installs the software on a node, but it still requires configuration
Configuration is started using calamari-ctl initialize

33 Configuring Calamari
During the initialization process the script prompts for:
User name – the user name for the Calamari interface
Email address – used for system messages, can be left blank
Password – the password for the user defined above
After the initial configuration the web interface is available for login. Connect to the Calamari server with a web browser; no plugins etc. are required.

34 Configuring Cluster Nodes for monitoring
When the Calamari server and web interface are configured, the cluster nodes need to be added for monitoring:
ceph-deploy calamari --master <monitoring_node> connect <node1> <node2>...<nodex>
This installs and configures the required services on each node and enables them for monitoring. It is managed by ceph-deploy but uses the Salt configuration management tools underneath. When configuration is complete, the web interface will prompt to add each configured server to the monitoring system.

35 Web Interface
When Calamari is installed, use a browser to connect and view the dashboard

36 Section 8 Exercises

