Import XRootD monitoring data from MonALISA Sergey Belov, JINR, Dubna DNG section meeting, 30.10.2012.

1 Import XRootD monitoring data from MonALISA Sergey Belov, JINR, Dubna Sergey.Belov@cern.ch DNG section meeting, 30.10.2012

2 Motivation
Monitoring of XRootD federations is important for ALICE, ATLAS and CMS
For now, it is more convenient for the experiments to collect the initial monitoring data on their side
Two collector types for MonALISA are in use:
  – Individual transfer statistics and some server statistics (in ALICE, based on ALICE developments)
  – Server statistics (in CMS and ATLAS, fed with information from the UCSD collector)
Goal: to have this information in the Dashboard

3 What could we get from ML?
ALICE: transfers
  – Individual transfer summary over 60 seconds
  – Server name, client IP
  – Read/written MB
  – NO transfer ID!
CMS, ATLAS: servers
  – Incoming and outgoing traffic
  – Current number of connections and total connections ever
  – Counts of authenticated and unauthenticated logins, number of authentication failures
  – Redirection count
→ For all these parameters, their rates (Hz) are also provided

4 How should it be done?
Requirements:
Standard way: send information via message brokers, in JSON format (ML only as a transport)
Reliability
  – In message handling along the whole chain
  – No information loss on failures
Reasonable behavior when sending messages
  – Send only consistent information
  – Respect connection frequency, authorization, timeouts
  – A few big messages instead of hundreds of small ones
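The "few big messages" requirement can be illustrated with a small batching helper. This is only a sketch: the function name `batch_records`, the `records` key and the default batch size are hypothetical and do not come from the slides.

```python
import json


def batch_records(records, max_per_message=500):
    """Group many small records into a few JSON messages (hypothetical layout)."""
    messages = []
    for start in range(0, len(records), max_per_message):
        chunk = records[start:start + max_per_message]
        # One JSON document per chunk instead of one per record.
        messages.append(json.dumps({"body": {"records": chunk}}))
    return messages
```

With 1200 records and a limit of 500 per message, this yields three messages rather than 1200.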

5 Dumping data from MonALISA (1)
Steps to get the data:
  – Set up an ML repository
  – Subscribe it to the appropriate monitoring groups (alice, xrootd_cms, xrootd_atlas)
  – Configure ML to consume only the required parameters, without storing anything
  – Set a custom filter (= handler) that passes the data to the outside: the dumper

6 Dumping data from MonALISA (2)
ML result object structure:
"farm", "cluster", "node"
  – "node" → xrootd server name
  – "Site name" can be obtained from "farm" or "cluster"
timestamp
arrays of parameter names and values
→ Most common case: the result object "decays" into objects with just a single parameter name and value in the corresponding arrays
  – Transfer information has to be gathered piece by piece
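Gathering the information "piece by piece" could look roughly like the sketch below. It assumes each result has already been converted to a plain dictionary; the key names (`farm`, `cluster`, `node`, `timestamp`, `param_names`, `param_values`) are illustrative and are not the actual MonALISA field names.

```python
from collections import defaultdict


def merge_results(results):
    """Merge single-parameter results that share farm, cluster, node and timestamp."""
    merged = defaultdict(dict)
    for result in results:
        key = (result["farm"], result["cluster"], result["node"], result["timestamp"])
        # Each result usually carries one name/value pair; zip handles longer arrays too.
        merged[key].update(zip(result["param_names"], result["param_values"]))
    return dict(merged)
```

Results for the same server and timestamp end up in one dictionary of parameters, which is the form an aggregator would need before building a message.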

7 Dumping data from MonALISA (3)
The dumper:
  – Is called each time the repository has results from the subscriptions
  – Should be fast enough not to slow everything down (consecutive calls for incoming results)
  – If message handling or sending is done here, there is no hope of a reliable or stable solution
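The idea of keeping the dumper fast by only handing data off to a local queue can be sketched with a directory used as a queue. The code below is a simplified stand-in written with the Python standard library only; it is not the dirq library API, and the function names `enqueue` and `drain` are hypothetical.

```python
import json
import os
import tempfile
import uuid


def enqueue(queue_dir, record):
    """Atomically add one record to a directory used as a queue."""
    os.makedirs(queue_dir, exist_ok=True)
    final_path = os.path.join(queue_dir, uuid.uuid4().hex + ".json")
    # Write to a temporary file first so readers never see a partial entry.
    fd, tmp_path = tempfile.mkstemp(dir=queue_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(record, handle)
    os.rename(tmp_path, final_path)  # atomic within one filesystem on POSIX
    return final_path


def drain(queue_dir):
    """Yield (path, record) for complete entries; the caller removes them after sending."""
    for name in sorted(os.listdir(queue_dir)):
        if name.endswith(".json"):
            path = os.path.join(queue_dir, name)
            with open(path) as handle:
                yield path, json.load(handle)
```

The producer returns as soon as the file is renamed, so slow operations such as sending to a broker happen in a separate consumer process.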

8 Proposed information handling chain
xrootd servers → collectors → MonALISA → local queue → Aggregator → local queue → Messaging Transfer Agent(s) → message brokers → Dashboard

9 Technical solutions (1)
ML filter (Dumper)
  – Java class catching incoming results from ML
  – Initial data transformation (decoding IPs, etc.)
  – Stores data in local directory queues
Aggregator
  – Python 2.4 program that aggregates the Dumper's queues and prepares the final messages to be sent by the MTA
  – Reads and writes messages from/to a local directory queue
  – Performs message aggregation and grouping

10 Technical solutions (2)
Directory queue libraries
  – Java implementation: ch.cern.dirq class (by Massimo Paladin)
  – python-dirq (available in the EPEL repository)
Messaging Transfer Agent
  – stompclt: a flexible tool to consume and dispatch messages between different sources in a configurable and reliable way (by Lionel Cons), available in the CERN SW repository, in EPEL soon
  – For now the STOMP protocol is enough (AMQP protocol support is on the way with the amqpclt tool)

11 Adding more reliability with supervision
Proven concept (Erlang/OTP)
  – Workers do their work
  – Supervisors monitor workers
  – All are defined in a supervision tree
Flexible implementation available (simplevisor)
  – Non-intrusive
  – Handles service evolution
Messaging Services and Client Software, Lionel Cons, Massimo Paladin, EGI Technical Forum, Prague, 18th September 2012
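The worker/supervisor idea can be shown in a few lines. The following is a toy, single-process sketch under the assumption that a worker is just a callable that may raise; it does not reflect how simplevisor is configured or implemented, and `supervise` is a hypothetical name.

```python
def supervise(worker, max_restarts=3):
    """Run worker(); restart it on failure and give up after max_restarts restarts.

    Returns a (result, restarts) pair on success.
    """
    restarts = 0
    while True:
        try:
            return worker(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise  # escalate, as a supervisor would to its own parent
            restarts += 1
```

A real supervisor would monitor long-running processes and apply a restart policy per child; the sketch only shows the restart-then-escalate pattern.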

12 Aggregator's internals
Accumulates statistics on xrootd servers (per timestamp) and groups them by hostname
Reconstructs transfer statistics from subsequent messages, aggregates transfers by server and timestamp
Passes a batch of messages (by type) to the MTA as one large message
Removes all the local queue messages involved once the aggregated message has been sent successfully
All semi-complete information chunks are sent on timeout; all (hopelessly) incomplete ones are wiped out
Three threads in the process:
  – Main (control)
  – Worker (periodically consumes, aggregates, republishes for the MTA)
  – Cleanup (removes temporary items in the directory queues involved)
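The timeout rule for semi-complete and hopelessly incomplete chunks might be implemented along these lines. This is a sketch under assumptions: the slides do not say which fields make a chunk "complete" or "semi-complete", so the `required` and `minimum` field sets, the `pending` layout and the function name `sweep` are all hypothetical.

```python
def sweep(pending, now, timeout, required, minimum):
    """Split pending entries into those to send and those to drop.

    pending maps key -> {"first_seen": seconds, "fields": dict}.
    Complete entries are sent at once; on timeout, entries holding at least
    the `minimum` fields are sent and the rest are dropped.
    """
    to_send, to_drop = [], []
    for key, entry in list(pending.items()):
        fields = entry["fields"]
        complete = required.issubset(fields)
        expired = now - entry["first_seen"] >= timeout
        if complete or (expired and minimum.issubset(fields)):
            to_send.append((key, fields))
        elif expired:
            to_drop.append(key)
        else:
            continue  # still waiting for more pieces
        del pending[key]
    return to_send, to_drop
```

Entries that are neither complete nor expired stay in `pending` for the next periodic pass of the worker thread.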

13 Message formats: xrootd transfers
Aggregated message:
{
  "header": {
    "message_id": "6061d13b….",
    "mon_service_fqdn": "mon.x.ch",
    "timestamp": "1223456789",
    "vo": "alice"
  },
  "body": {
    "transfers": [ transfers messages ]
  }
}
Each transfers message:
{
  "message_id": "05b179bb….",
  "server_host": "xr.cern.ch",
  "timestamp": "1123456789",
  "clients": [
    {
      "client_ip": "12.34.56.78",
      "read_mb": "1.234",
      "written_mb": "2.345",
      "transfer_speed_mb": "3.582"
    },
    …..
  ]
}
* Need VO, or just send to different queues?
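Assembling a message of this shape can be sketched as follows. The field names follow the slide; everything else is an assumption: the function name `build_transfers_message`, the input layout of `(host, timestamp, clients)` tuples, and the use of a UUID hex string as `message_id` (the slide shows only a truncated identifier).

```python
import json
import time
import uuid


def build_transfers_message(vo, mon_fqdn, transfers, now=None):
    """Build one aggregated transfers message from (host, timestamp, clients) tuples."""
    header = {
        "message_id": uuid.uuid4().hex,  # assumed ID scheme, not from the slides
        "mon_service_fqdn": mon_fqdn,
        "timestamp": str(int(now if now is not None else time.time())),
        "vo": vo,
    }
    body = {
        "transfers": [
            {
                "message_id": uuid.uuid4().hex,
                "server_host": host,
                "timestamp": str(timestamp),
                # The slide shows every value as a string, so convert them here.
                "clients": [{k: str(v) for k, v in c.items()} for c in clients],
            }
            for host, timestamp, clients in transfers
        ]
    }
    return json.dumps({"header": header, "body": body})
```

Using `json.dumps` avoids the hand-written pitfalls of JSON, such as trailing commas or typographic quotes, which a strict parser rejects.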

14 Message formats: xrootd servers (1)
Aggregated message:
{
  "header": {
    "message_id": "0d502ae9….",
    "timestamp": "122356789",
    "mon_host_fqdn": "mon.x.ch",
    "vo": "atlas|cms"
  },
  "body": {
    "servers_stats": [ stats messages are here ]
  }
}
Each stats message:
{
  "message_id": "25e3c2f8….",
  "timestamp": "1123456789",
  "server_host": "example.cern.ch",
  "link_in": "5048475",
  "link_in_R": "5.1234",
  "link_out": "10493857",
  "link_out_R": "7.2345",
  "link_tot": "16949274",
  "xrootd_lgn_af_R": "0.123",
  "xrootd_lgn_au_R": "2.345",
  "xrootd_lgn_ua_R": "0.5",
  ….
}
* Need VO, or just send to different queues?

15 Message formats: xrootd servers (2)
Parameter            Description
link_in[_R]          Incoming traffic [rate, B/s]
link_out[_R]         Outgoing traffic [rate, B/s]
link_tot[_R]         Total connections [rate, Hz]
link_num             Current connections number
xrootd_lgn_af[_R]    Authentication failures [rate, Hz]
xrootd_lgn_au[_R]    Authenticated login [rate, Hz]
xrootd_lgn_ua[_R]    Unauthenticated login [rate, Hz]
xrootd_rdr[_R]       Redirection count [rate, Hz]
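On the consuming side, a stats message with these parameters could be turned into numeric values roughly as shown below. This is a hypothetical consumer sketch, not part of the presented chain: the function name `parse_server_stats` is invented, and converting every parameter to `float` is a simplification.

```python
# Parameters that may appear with an optional "_R" (rate) variant, per the table above.
RATED_PARAMETERS = (
    "link_in", "link_out", "link_tot",
    "xrootd_lgn_af", "xrootd_lgn_au", "xrootd_lgn_ua", "xrootd_rdr",
)


def parse_server_stats(record):
    """Convert the string values of one stats message into numbers."""
    parsed = {
        "server_host": record["server_host"].strip(),
        "timestamp": int(record["timestamp"]),
    }
    names = ["link_num"]
    for name in RATED_PARAMETERS:
        names.extend([name, name + "_R"])
    for name in names:
        if name in record:  # parameters are optional in any given message
            parsed[name] = float(record[name])
    return parsed
```

Unknown keys such as `message_id` are ignored, so the parser keeps working if further fields are added to the message.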

16 Current state of development
The ML dumper filter is ready and works fine
  – Produces intermediate JSON messages to be consumed by the aggregator; no performance limits observed
The aggregator is ready and being tested
The chosen technical solution (directory queue libraries, stompclt) has proven to be appropriate, fast and scalable

17 Further steps
Tests of the full message processing chain (including stress tests)
Consumer on the Dashboard side
Tuning the settings of the ML dumper, aggregator and stompclt
Supervision of all components (ML repo, aggregator, MTA) with simplevisor
Packaging of the dumper, aggregator and all the configurations as RPMs within the Dashboard

18 Thanks for your attention!

