Xrootd Monitoring for the CMS Experiment

Abstract: During spring and summer 2011 CMS deployed Xrootd front-end servers on all US T1 and T2 sites. This allows for remote access to all experiment data and is used for user analysis, visualization, running of jobs at T2s and T3s when data is not available at local sites, and as a fail-over mechanism for data access in CMSSW jobs.

Monitoring of Xrootd infrastructure is implemented on three levels:
1. Service and data availability checks
2. Xrootd summary monitoring
   Custom analyzer MonALISA
3. Xrootd detailed monitoring
   GLED Web, Gratia, ROOT Trees, …

L.A.T. Bauerdick 1, K. Bloom 3, B.P. Bockelman 3, D.C. Bradley 4, S. Dasu 4, I. Sfiligoi 2, A. Tadel 2, M. Tadel 2, F. Wuerthwein 2, A. Yagil 2
1 FNAL, 2 UC San Diego, 3 University of Nebraska-Lincoln, 4 University of Wisconsin-Madison

#begin
unique_id=xrd
file_lfn=/store/data/Run2011B/…/XXXX.root
file_size=
start_time=
end_time=
read_bytes=
read_operations=196
read_min=300
read_max=
read_average=
read_sigma=
# single-read operation statistics removed
read_vector_bytes=
read_vector_operations=64
read_vector_min=
read_vector_max=
read_vector_average=
read_vector_sigma=
read_vector_count_min=3
read_vector_count_max=512
read_vector_count_average=
read_vector_count_sigma=
read_bytes_at_close=
# write operation statistics removed
user_dn=XXXX
user_vo=
user_role=
user_fqan=
client_domain=hep.wisc.edu
client_host=g22n10
server_username=cmsuser127
app_info=
server_domain=t2.ucsd.edu
server_host=uaf-7
#end

References:
AAA & FAX, at this CHEP
GLED http://gled.org/
MonALISA
ROOT http://root.cern.ch/
Xrootd http://xrootd.org/

1. Service & Data Availability

Nagios probes track the following core operations:
- Check redirection from sites
- Check authentication with CERN & OSG certificates
- Check that files can actually be read (get first 1 kB)
Mail alarms are sent in case of problems.

Checking of individual Xrootd servers:
- Some sites also use (historically)
- The plan is to delegate this to sites (RSV probes exist)
- Summary monitoring also reveals a lot about server state

2. Xrootd Summary Monitoring

All redirectors and servers send their summary monitoring UDP packets to a collector at UCSD, where the data is pre-processed and stored in the MonALISA repository.

Examples of collected data:
- Number of connected clients
- Rates of new connections, authentications, and various errors
- Incoming and outgoing network traffic caused by Xrootd
- Server's usage of system resources

Processing with ML plugins:
- Calculating per-site quantities, e.g. total traffic for each site
- Detecting error conditions and sending notifications

Presentation options:
- Standard ML graphs – for individual sites / hosts, totals
- Dashboard

[Data-flow diagram; labels: Summary UDP packets, Detailed UDP packets, xrd-rep-snatcher.pl, MonALISA, UDP ➙ TCP multiplexer, GLED, TTree writer, Development / testing]

3. Xrootd Detailed Monitoring

As with summary data, detailed monitoring UDP packets are also sent to UCSD. The streams are merged and made available via a UDP-to-TCP converter / multiplexer.

Contents of detailed monitoring streams:
- User authentication records, including their DN and VOMS info
- File-open records, including the LFN by which the file was requested
- All read and write requests (offset, length, and timestamp)
- Vector-read requests (# of elements, total length, timestamp)
  - Optionally, servers can send offset & length info for each element
- Redirection records

Default processing with GLED:
- A complete in-memory representation of all servers, sessions and open files is required, as packets are highly encoded.
- An embedded HTTP server shows currently ongoing user sessions
- When a file is closed, a detailed report is generated
  - Sent to OSG Gratia and written into ROOT trees for further analysis
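The `#begin` … `#end` sample shown on this poster appears to be such a file-close report. As a minimal sketch of how a downstream consumer might read it, the Python below parses records of that shape into dictionaries. It rests on assumptions: one `key=value` field per line, `#begin` and `#end` as record delimiters, and other `#` lines as comments, all inferred from the single sample rather than from a format specification. The names `parse_reports` and `int_field` are hypothetical helpers, not part of GLED, Gratia, or Xrootd.

```python
from typing import Dict, Iterable, List, Optional


def parse_reports(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return one dict per '#begin' ... '#end' record (assumed layout)."""
    reports: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None  # None while outside a record
    for raw in lines:
        line = raw.strip()
        if line == "#begin":
            current = {}
        elif line == "#end":
            if current is not None:
                reports.append(current)
            current = None
        elif current is None or not line or line.startswith("#"):
            continue  # outside a record, blank line, or comment line
        else:
            key, sep, value = line.partition("=")
            if sep:  # skip lines that carry no '='
                current[key.strip()] = value.strip()
    return reports


def int_field(report: Dict[str, str], name: str) -> Optional[int]:
    """Integer value of a field, or None when it is missing or empty.

    Only meant for integer-valued fields such as operation counts.
    """
    value = report.get(name, "")
    return int(value) if value else None


# Subset of the fields from the poster's sample; empty values are left empty.
SAMPLE = """\
#begin
file_size=
read_operations=196
read_min=300
# single-read operation statistics removed
read_vector_operations=64
read_vector_count_min=3
read_vector_count_max=512
client_domain=hep.wisc.edu
server_domain=t2.ucsd.edu
#end
"""

reports = parse_reports(SAMPLE.splitlines())
first = reports[0]
print(first["client_domain"], "->", first["server_domain"])
print(int_field(first, "read_operations"), int_field(first, "file_size"))
```

Values are kept as strings and converted only on request, since the sample leaves many fields empty and the types of the remaining fields are not stated.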