Quick Look on dCache Monitoring at FNAL

Slides:



Advertisements
Similar presentations
P3- Represent how data flows around a computer system
Advertisements

ManageEngine TM Applications Manager 8 Monitoring Custom Applications.
What is SDM? SDM : Server and Database Monitoring  SDM is the web-based real-time server and database monitoring and reporting tool  Service Items Server.
Network Administration Procedures Tools –Ping –SNMP –Ethereal –Graphs 10 commandments for PC security.
CP476 Internet Computing Browser and Web Server 1 Web Browsers A client software program that allows you to access and view Web pages on the Internet –Examples.
Linux Networking CIS Why Linux/Unix? Configurability ▫Customizable System to satisfy unique needs. Scalability ▫Able to serve an increasing number.
CERN IT Department CH-1211 Genève 23 Switzerland t Integrating Lemon Monitoring and Alarming System with the new CERN Agile Infrastructure.
IBM Software Group Washington Area Informix User Group Forum 2004 The DB2 DBA Checklist Dwaine R Snow, DB2 & Informix.
Google MapReduce Simplified Data Processing on Large Clusters Jeff Dean, Sanjay Ghemawat Google, Inc. Presented by Conroy Whitney 4 th year CS – Web Development.
Zhiling Chen (IPP-ETHZ) Doktorandenseminar June, 4 th, 2009.
Rsv-control Marco Mambelli – Site Coordination meeting October 1, 2009.
5 Chapter Five Web Servers. 5 Chapter Objectives Learn about the Microsoft Personal Web Server Software Learn how to improve Web site performance Learn.
The SAMGrid Data Handling System Outline:  What Is SAMGrid?  Use Cases for SAMGrid in Run II Experiments  Current Operational Load  Stress Testing.
Module 7: Fundamentals of Administering Windows Server 2008.
ITIS 1210 Introduction to Web-Based Information Systems Chapter 23 How Web Host Servers Work.
3rd June 2004 CDF Grid SAM:Metadata and Middleware Components Mòrag Burgon-Lyon University of Glasgow.
TELE 301 Lecture 10: Scheduled … 1 Overview Last Lecture –Post installation This Lecture –Scheduled tasks and log management Next Lecture –DNS –Readings:
A. Sim, CRD, L B N L 1 OSG Applications Workshop 6/1/2005 OSG SRM/DRM Readiness and Plan Alex Sim / Jorge Rodriguez Scientific Data Management Group Computational.
CERN IT Department CH-1211 Geneva 23 Switzerland t Daniel Gomez Ruben Gaspar Ignacio Coterillo * Dawid Wojcik *CERN/CSIC funded by Spanish.
D C a c h e Michael Ernst Patrick Fuhrmann Tigran Mkrtchyan d C a c h e M. Ernst, P. Fuhrmann, T. Mkrtchyan Chep 2003 Chep2003 UCSD, California.
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
08/30/05GDM Project Presentation Lower Storage Summary of activity on 8/30/2005.
Kurt Mueller San Diego Supercomputer Center NPACI HotPage Updates.
1 MONGODB: CH ADMIN CSSE 533 Week 4, Spring, 2015.
Local Monitoring at SARA Ron Trompert SARA. Ganglia Monitors nodes for Load Memory usage Network activity Disk usage Monitors running jobs.
Unified scripts ● Currently they are composed of a main shell script and a few auxiliary ones that handle mostly the local differences. ● Local scripts.
SRM Monitoring 12 th April 2007 Mirco Ciriello INFN-Pisa.
CIT 470: Advanced Network and System AdministrationSlide #1 CIT 470: Advanced Network and System Administration System Monitoring.
Youngil Kim Awalin Sopan Sonia Ng Zeng.  Introduction  Concept of the Project  System architecture  Implementation – HDFS  Implementation – System.
1 WWW. 2 World Wide Web Major application protocol used on the Internet Simple interface Two concepts –Point –Click.
DPM Python tools Ivan Calvet IT/SDC-ID DPM Workshop 10 th October 2014.
11.1 Silberschatz, Galvin and Gagne ©2005 Operating System Principles 11.5 Free-Space Management Bit vector (n blocks) … 012n-1 bit[i] =  1  block[i]
Handling of T1D0 in CCRC’08 Tier-0 data handling Tier-1 data handling Experiment data handling Reprocessing Recalling files from tape Tier-0 data handling,
BNL dCache Status and Plan CHEP07: September 2-7, 2007 Zhenping (Jane) Liu for the BNL RACF Storage Group.
1 5/4/05 Fermilab Mass Storage Enstore, dCache and SRM Michael Zalokar Fermilab.
7/8/2016 OAF Jean-Jacques Gras Stephen Jackson Blazej Kolad 1.
1 RIC 2009 Symbolic Nuclear Analysis Package - SNAP version 1.0: Features and Applications Chester Gingrich RES/DSA/CDB 3/12/09.
Storage discovery in AliEn
WEB TESTING
Compute and Storage For the Farm at Jlab
4.01 How Web Pages Work.
4.01 How Web Pages Work.
Chapter 19: Network Management
Simulation Production System
Introduction to Operating Systems
Troubleshooting Tools
WWW and HTTP King Fahd University of Petroleum & Minerals
Classic Storage Element
Shared Services with Spotfire
Software Architecture in Practice
Tango Administrative Tools
Operating Systems (CS 340 D)
WaterWare description
70-290: MCSE Guide to Managing a Microsoft Windows Server 2003 Environment Chapter 13: Administering Web Resources.
MONITORING MICROSOFT WINDOWS SERVER 2003
Processes The most important processes used in Web-based systems and their internal organization.
HC Hyper-V Module GUI Portal VPS Templates Web Console
Introduction to NetDB2 IST210.
DriveScale Log Collection Method of Procedure
TANGO MONITORING SYSTEM
Chapter 2: Operating-System Structures
A Network Operating System Edited By Maysoon AlDuwais
Prof. Leonardo Mostarda University of Camerino
Introduction to High Performance Computing Using Sapelo2 at GACRC
File System Implementation
Chapter 8, pp 171 – pp 200 Web Security, by Lincoln D. Stein
Digitask Consultants, Inc. (212)
4.01 How Web Pages Work.
Chapter 2: Operating-System Structures
IBM Tivoli Storage Manager
Presentation transcript:

Quick Look on dCache Monitoring at FNAL Alex Kulyavtsev, FNAL 09/13/05

FNAL dCache systems Four major dCache systems: FNDCA – public dcache http://fndcam.fnal.gov/ CDFDCA – CDF dCache http://cdfdca.fnal.gov/ CMSDCA – USCMS T1 dCache http://cmsdcam.fnal.gov/ LQCD – Lattice QCD cluster dCache http://lqcd-pm.fnal.gov/ More test and development systems

Data Movement CDFDCA: CMSDCA Read peaks about 30 TB/day Recent record 75 TB/day CMSDCA Read peaks about 50 TB/day, Recent record about 75 TB/day

Disk Space USCMS T1: ~ 90 TB total 14 TB - Resilient 40 TB - Read 25 TB - Write 12 TB - Eval. Read and Write pools ~ 90 TB total See “Pool Space” plots on USCMS T1 web page.

Standard dCache Administration interface through admin door Web Collector accumulates monitoring data, embedded dcache httpd cell serves data to clients (port :2288) Billing data stored in DB Log files Issue – log file grows, needs rotation and/or ‘nullifying’.

CGI scripts + Apache FTP door writes short files with transfer status. Python CGI script processes files and generates table with recent “FTP Transfers”. Shell script generates table for SRM Transfers - Easy to implement

Tomcat Servlets : 8090 Data about transfers stored in Billing DB. Servlets generate and display plots on “Data Movement” Volume of transfers Number of transfers dCache internals (transaction cost). Replica Manager Monitoring

Shell scripts Access dCache status and statistics data through admin interface Generate plain text or html files, located in ‘known’ place, user has access to files through web interface Pool Directory Listings Login List … more

Shell scripts: plots Shell scripts store historical data by appending snapshots to intermediate text files Other scripts generate historical plots (html + .gif + .ps) Door Logins Queue size (movers, stores, restores, p2p transfers)

Shell scripts: WatchDogs Some shell scripts regularly poll dcache cells info through admin interface and check that dcache operates correctly. May send e-mail with warning to admin list. Pool status (pool crashed or disabled because disk errors) IP tables are up Self check – watchDogs are running

More Shell Scripts In large scale storage system low probability bit errors sift through parity checks. Script regularly recalculate CRC sums for files in pools and reports broken files. Occasional dcache audits (script run manually) to verify dcache consistency (e.g. ‘cached’ files were saved on tape).

Page DCache Checks dcache is operational Regularly performs various basic dcache operations (transfer file through different doors) and rises alarm if transfer fails.

Ganglia Monitoring Status of the nodes in the cluster and historical data on CPU usage Memory Networking

dgang “rgang” is FNAL tool to perform distributed operations in cluster (e.g. distribute script, execute, collect output) “dgang” is dcache oriented wrapper for rgang. Knows about dcache cluster configuration Simplifies distributed administartion of dcache cluster: E.g.: execute command cmd on all pool nodes and return status and output.