Published by Cameron Cooper. Modified over 11 years ago.
1
PARYAVEKSHANAM: Status Monitoring Tool for the Indian National Grid, GARUDA
NORDUnet conference, Grid Monitoring: Paryavekshanam, 9th April 2008
Karuna (Karunap@cdacb.ernet.in)
Co-authors: Deepika H.V., Mangala N., Prahlada Rao BB, MohanRam N.
System Software Development Group, Center for Development of Advanced Computing (C-DAC), Bangalore, India
2
24th NORDUnet conference, 9th – 11th April 2008. GARUDA Grid Monitoring: PARYAVEKSHANAM
Presentation Plan
- GARUDA Overview
- GARUDA Architecture
- Monitoring Requirements
- Paryavekshanam Objectives
- Paryavekshanam Architecture
- Paryavekshanam Features
- Alert and Notification system
- Conclusion
3
Indian National Grid: GARUDA
- GARUDA was initiated by C-DAC and is funded by the Department of Information Technology, Government of India.
- GARUDA provides an amalgam of advanced capabilities to enable the increasingly interdisciplinary scientific environments required to solve complex problems.
- GARUDA connects 45 national research and academic institutions across 17 cities/locations in India.
- GARUDA is used by application communities such as Weather/Climate Modeling, Disaster Management, and Bioinformatics.
4
GARUDA Grid: Key Features
- Geographically distributed resources across 17 cities and 45 research and academic institutions
- Resources are dynamic and heterogeneous in nature (Linux, Solaris, AIX)
- Resources are under various administrative domains
- Network backbone of 2.43 GB, with 10/100 Mbps point-to-point bandwidth links
- GARUDA middleware: Globus 2.x
- Multi-institutional Virtual Organization
5
GARUDA Grid Architecture
[Figure: GARUDA grid architecture diagram. Labels: submit node, gridfs, GARUDA head node, cluster head nodes, compute nodes. Sites: IGIB (Linux), Chennai (Linux), C-DAC Bangalore (AIX), Pune (Linux), RRI Bangalore (Linux), C-DAC Hyderabad (Linux).]
6
GARUDA Components
- Management & Monitoring: Paryavekshanam
- Resources: Compute, Data Storage, Scientific Instruments, Software
- Resource Management & Scheduling: Moab from Cluster Resources; Load Leveler, Torque; Globus 2.x
- Application (PoC): Disaster Management, Bioinformatics, Climate Modeling
- Access Methods: Access Portal, Problem Solving Environments
- Data Management: Storage Resource Broker
- Development Environment: DIViA for Grid, GridIDE
7
GARUDA Network Fabric Features
- Ethernet based, high BW capacity of Layer 2/3 MPLS VPN
- Scalable over the entire geographic area
- High levels of reliability
- Fault tolerance and redundancy
- High security
- Effective network management
8
GARUDA Resources
- C-DAC centers are contributing computing resources at Bangalore, Pune, Chennai, and Hyderabad
- HPC systems from partner sites
- Total processors > 600
- Aggregated compute power = 3.5 TFlops
- Satellite terminals from SAC Ahmedabad
- Grid Labs at Bangalore, Pune, Hyderabad
9
GARUDA Resources (continued)
10
GARUDA Partners
- Institute of Plasma Research, Ahmedabad
- Physical Research Laboratory, Ahmedabad
- Space Applications Centre, Ahmedabad
- Harish Chandra Research Institute, Allahabad
- Motilal Nehru National Institute of Technology, Allahabad
- Raman Research Institute, Bangalore
- National Center for Biological Sciences
- Indian Institute of Astrophysics, Bangalore
- Indian Institute of Science, Bangalore
- Institute of Microbial Technology, Chandigarh
- Punjab Engineering College, Chandigarh
- Madras Institute of Technology, Chennai
- Indian Institute of Technology, Chennai
- Institute of Mathematical Sciences, Chennai
- ERNET, Delhi
- Indian Institute of Technology, Delhi
- Jawaharlal Nehru University, Delhi
- Institute for Genomics and Integrative Biology, Delhi
- Indian Institute of Technology, Guwahati
- Guwahati University, Guwahati
11
GARUDA Partners (continued)
- University of Hyderabad, Hyderabad
- Centre for DNA Fingerprinting and Diagnostics, Hyderabad
- Jawaharlal Nehru Technological University, Hyderabad
- Indian Institute of Technology, Kanpur
- Indian Institute of Technology, Kharagpur
- Saha Institute of Nuclear Physics, Kolkata
- Central Drug Research Institute, Lucknow
- Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow
- Bhabha Atomic Research Centre, Mumbai
- Indian Institute of Technology, Mumbai
- Tata Institute of Fundamental Research, Mumbai
- IUCAA, Pune
- National Centre for Radio Astrophysics, Pune
- National Chemical Laboratory, Pune
- Pune University, Pune
- Indian Institute of Technology, Roorkee
- Regional Cancer Centre, Thiruvananthapuram
- Vikram Sarabhai Space Centre, Thiruvananthapuram
- Institute of Technology, Banaras Hindu University, Varanasi
12
GARUDA Grid Monitoring: Purpose
- Detect, record, and report faults and service degradations
- Ensure GARUDA operates optimally
- Check the status, availability, and usage of grid resources
- Provide a monitoring data repository that developers and administrators can use for troubleshooting, scheduling, performance tuning, and analysis
13
Monitoring Requirements: GARUDA
- A simple, easy-to-use tool was needed
- Able to handle the perspectives of different users
- Information should be readily available
- Should offer more graphical views
- Should produce relevant, accurate, and timely data
- Should diagnose the problems of the GARUDA environment
14
Paryavekshanam: Monitoring Tool
- GARUDA is monitored by PARYAVEKSHANAM
- PARYAVEKSHANAM means "supervision" in Sanskrit
- PARYAVEKSHANAM is a web-based, user-friendly grid monitoring tool that monitors the health of the GARUDA grid to enhance its reliability, usability, and manageability
- PARYAVEKSHANAM is scalable and can be deployed on platforms such as AIX, Linux, and Solaris
- It assists users in resource allocation/selection through various GARUDA tools such as G-IDE
15
Components Monitored by Paryavekshanam
- Computing nodes
- Network
- Grid middleware
- Submitted jobs
- Software
- Storage and Storage Resource Broker
- Scientific instruments
16
Paryavekshanam Architecture
- Client-server architecture using a pull model with a centralized server
- Resource: everything connected to the grid
- Headnode: the contact node of a cluster
- Four components:
  - Information Generator
  - Information Receiver
  - Information Repository
  - Paryavekshanam Visualizer
17
Paryavekshanam Architecture
18
Paryavekshanam Architecture (continued)
- Information Generator
  - Daemon that resides on the cluster headnodes
  - Collects the cluster details and creates the data collection
  - The data collection is processed using the MDS schema and populated into Globus MDS
- Information Receiver
  - Daemon that resides on the monitoring server
  - Requests the Information Generator to produce the data collection and fetches it from Globus MDS
- Information Repository
  - The data collection obtained from Globus MDS is processed and stored in the Information Repository
  - It resides on the monitoring server
  - It has a mirror repository to provide fault tolerance
- Paryavekshanam Visualizer
  - User-friendly graphical user interface
  - Retrieves data from the Information Repository and displays it through well-structured graphs and tables
  - Helps in diagnosing the problem areas
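The pull model described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names (`InformationGenerator.produce`, `InformationReceiver.poll_once`, the `probe` callback) are hypothetical, and plain in-memory lists stand in for Globus MDS and the real repository, whose interfaces the slides do not describe.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InformationGenerator:
    """Stands in for the daemon on a cluster headnode (hypothetical API)."""
    headnode: str
    probe: Callable[[], Dict[str, float]]  # hypothetical metric collector

    def produce(self) -> Dict[str, object]:
        # The data collection is built only when the receiver asks (pull).
        return {"headnode": self.headnode, "metrics": self.probe()}


@dataclass
class InformationRepository:
    """Primary store plus a mirror copy, mimicking the mirror repository."""
    primary: List[dict] = field(default_factory=list)
    mirror: List[dict] = field(default_factory=list)

    def store(self, record: dict) -> None:
        self.primary.append(record)
        self.mirror.append(dict(record))  # separate (shallow) copy


@dataclass
class InformationReceiver:
    """Stands in for the daemon on the central monitoring server."""
    generators: List[InformationGenerator]
    repository: InformationRepository

    def poll_once(self) -> int:
        # One pull cycle: request a data collection from every headnode.
        for generator in self.generators:
            self.repository.store(generator.produce())
        return len(self.generators)
```

A visualizer would then read only from the repository, never from the headnodes directly, which is what keeps the server centralized.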
19
Paryavekshanam Features
- Hierarchical drill-down of information
- Bird's-eye view of grid health through a radar graph
- Dashboard providing the top-level view
- Status bar for quick, action-oriented insights
- Alert generation through emails
- Easy interface for adding new sites
- Multiple views: Grid, Nodes, GOC, and Network views
- Visualization of data in tabular and graphical formats
- Data Gallery for analysis of historical data
- Search facility for resources, software stack, and jobs
- Separate resolution for GOC monitoring
20
Dashboard of Paryavekshanam
[Screenshot: dashboard, with callouts for the following elements]
- GARUDA connected cities on India map
- Status bar
- Bird's-eye view of grid health through a radar graph
- Grid strength
21
Dashboard of Paryavekshanam (continued)
- Radar Graph
  - Compares the performance of different entities on axes starting from the same point
  - Easy inference of the utilization of quantitative parameters
  - Uniform utilization of various parameters can be inferred from the radar graphs
  - Provides a glimpse of the deviation from the ideal scenario
- Grid Strength
  - Defines the health of the grid and is mathematically derived from the radar graph parameters
  - Shown as a percentage on the dashboard
  - Colored bullets represent different values of grid strength
- Globus Strength
  - Globus strength is monitored based on an empirical formula
- Status Bar
  - Gives the instantaneous up/down status, which can be drilled down further
22
Alert & Notification system: AlNotis
- Through AlNotis, Paryavekshanam captures errors generated in the grid, such as failures of links, clusters, nodes, grid middleware, and jobs
- Provides more visibility into the health of the system
- Any failure or breakdown of resources needs to be captured and notified
- Necessary for corrective actions
- Generates Error emails whenever an error occurs
- Sends Warning emails when utilization crosses the threshold level
- Well-defined escalation procedure
  - Errors left unattended for 48 hrs are sent to the grid admins
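The escalation rule can be expressed as a small predicate. This is a minimal sketch, assuming an error carries a creation timestamp and an "attended" flag; the function and parameter names are hypothetical, and treating exactly 48 hours as already due for escalation is a choice made here, not something the slide specifies.

```python
from datetime import datetime, timedelta

ESCALATION_AFTER = timedelta(hours=48)  # threshold stated on the slide


def needs_escalation(raised_at: datetime, now: datetime, attended: bool) -> bool:
    """Return True when an unattended error should go to the grid admins."""
    # Assumption: an error that is exactly 48 hours old is escalated.
    return (not attended) and (now - raised_at >= ESCALATION_AFTER)
```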
23
Alert & Notification system (continued)

Error Message Description
Description          Error Code   Error Condition
Network link down    eLNK         pkt loss 100%
Cluster down         eCLS         HeadNode status down
Node down            eNOD         Node status down
Globus component     eGLB         Component fail
Jobs not running     eJOB         total jobs > 0, RJ = 0

Warning Message Description
Description             Warning Code   Warning Condition
Utilization of CPU      wCPU           Threshold reached (cpu load >= 1)
Utilization of memory   wMEM           Threshold reached (mem util >= 80%)
Bandwidth utilization   wB/W           Threshold reached (b/w util >= 90%)
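The error and warning conditions listed on this slide map naturally onto a rule check. The sketch below is illustrative only: the codes and thresholds come from the slide, while the sample field names (`pkt_loss_pct`, `headnode_up`, and so on) are hypothetical, and "RJ" is assumed to mean the number of running jobs.

```python
from typing import Dict, List


def evaluate_alerts(sample: Dict[str, object]) -> List[str]:
    """Return the AlNotis-style codes triggered by one monitoring sample."""
    codes: List[str] = []
    # Error conditions
    if sample["pkt_loss_pct"] >= 100:
        codes.append("eLNK")  # network link down
    if not sample["headnode_up"]:
        codes.append("eCLS")  # cluster down
    if not sample["node_up"]:
        codes.append("eNOD")  # node down
    if not sample["globus_ok"]:
        codes.append("eGLB")  # Globus component failed
    if sample["total_jobs"] > 0 and sample["running_jobs"] == 0:
        codes.append("eJOB")  # jobs present but none running (RJ = 0)
    # Warning conditions
    if sample["cpu_load"] >= 1:
        codes.append("wCPU")
    if sample["mem_util_pct"] >= 80:
        codes.append("wMEM")
    if sample["bw_util_pct"] >= 90:
        codes.append("wB/W")
    return codes
```

Codes starting with `e` would trigger Error emails and codes starting with `w` would trigger Warning emails.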
24
Alert & Notification system (continued)
- AlNotis tabulation showing the error ID, the date and time the error was generated, the affected resources, and the time taken to close the ticket
- Alert error messages generated during the last 6 months
25
GOC Desk: Paryavekshanam
- Grid Operation Center (GOC) help desk built for GARUDA monitoring, with a state-of-the-art wall display
- GOC is responsible for monitoring the grid infrastructure as a whole
- GOC operates in four regional areas, which report centrally to the GOC at Bangalore
- Apart from monitoring through Paryavekshanam, it coordinates its activities through video conferencing
26
GOC Desk Page
- The GOC Desk page is mainly used for daily monitoring
- Provides the overall performance of parameters such as BW utilization over 24 hrs
- Each graph is hyperlinked to the details of that parameter for the respective grid center
- An additional table allows accurate values to be read off the graphs
27
GOC Desk Page (continued)
28
Grid Overview Page: Paryavekshanam
- Summarizes the performance of the entire grid for users
- Provides information on all the parameters for all the centers in a tabular format
- Can be drilled down to fetch center resource details as a node-level summary
- Monitors the middleware components and provides a detailed status summary for error resolution
- Lists all the software available on the clusters
- Helps in knowing which components of Globus are up
29
Grid Overview Page: Paryavekshanam
30
Nodes view & Globus component status
[Screenshot callout: GSIFTP service is not available]
31
Software packages installed at headnodes
32
Network Info Page: Paryavekshanam
- Routers and switches are monitored
- Displays available bandwidth, used bandwidth, packet loss, RTT, and link status
- The report generation facility helps in maintaining the SLA for RTT, packet loss, and circuit uptime on a monthly basis
- Monitors the operation of the network on a 24x7x365 basis
33
Network Info Page: Paryavekshanam
34
SRB Server status check
- Status of the Storage Resource Broker is checked
- Space availability of storage servers
- Report generation in Word and Excel formats
35
Data Gallery Page: Paryavekshanam
- Archives data for reviewing the past performance of the grid
- Previous data can be viewed in both tabular and graphical formats
- Generates a report for the selected duration
36
Search Page: Paryavekshanam
- Resource and software search is provided for users
- Resources can be searched by OS, memory, CPU speed, etc.
- Software can be searched by category, such as debuggers and libraries
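A resource search of this kind amounts to filtering an inventory on a few attributes. The following is a minimal sketch, not the tool's actual implementation: the `Resource` fields and the `search_resources` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical record for one grid resource."""
    name: str
    os: str
    memory_gb: float
    cpu_ghz: float


def search_resources(
    resources: Iterable[Resource],
    os: Optional[str] = None,
    min_memory_gb: float = 0.0,
    min_cpu_ghz: float = 0.0,
) -> List[Resource]:
    """Return resources matching the OS (if given) and the minimum specs."""
    return [
        r for r in resources
        if (os is None or r.os.lower() == os.lower())
        and r.memory_gb >= min_memory_gb
        and r.cpu_ghz >= min_cpu_ghz
    ]
```

Leaving a criterion at its default means "do not filter on this attribute", so a search on OS alone or on memory alone uses the same function.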
37
Job search: Paryavekshanam
- Paryavekshanam tracks the progress of submitted jobs
- Shows the current status based on job ID
- Job reports are available by user, status, job ID, duration, and the cluster on which the job is running
38
GARUDA Resource usage
- Resources are extensively used
- More than 100 registered users
- >600 CPUs across 14 sites
- 65 TB of data transferred on the 2.43 GB backbone
39
Admin Page: Paryavekshanam
- New sites and resources are added to Paryavekshanam through a simple interface
- Managed by access control
- Modification and deletion of sites are supported
40
Conclusion
- Paryavekshanam has been successfully monitoring GARUDA for the last 2 years
- The dashboard has been a very useful feature, aggregating a lot of information
- The AlNotis system speeds up problem rectification
- Overall, Paryavekshanam improves the usability of GARUDA
41
Thank you
42
Globus Strength
- Each distinct value is indicative of the Globus status.
- The maximum value is 29, the sum of the individual distinct weights shown below.
The four major pillars of Globus:
1. Security: 10
2. Job Submission: 8
3. Data Management: 7
4. Information Services: 4
Total: 29
E.g.: Globus strength = 21
Result: Security, data management, and information services are up; job submission is not possible.
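Because every combination of the four weights sums to a different number, a Globus strength value can be decoded back into the set of services that are up. The sketch below uses the weights from this slide; the function names `strength` and `decode` are hypothetical, and the brute-force subset search is an illustrative choice, since the slides do not say how the tool itself decodes the value.

```python
from itertools import combinations
from typing import Iterable, Set

# Weights taken from the slide; together they sum to 29.
WEIGHTS = {
    "Security": 10,
    "Job Submission": 8,
    "Data Management": 7,
    "Information Services": 4,
}


def strength(services_up: Iterable[str]) -> int:
    """Sum the weights of the services that are currently up."""
    return sum(WEIGHTS[name] for name in services_up)


def decode(value: int) -> Set[str]:
    """Return the set of services that are up for a given strength value."""
    for size in range(len(WEIGHTS) + 1):
        for combo in combinations(WEIGHTS, size):
            if strength(combo) == value:
                return set(combo)
    raise ValueError(f"{value} is not a valid Globus strength")
```

For the slide's example, `decode(21)` returns Security, Data Management, and Information Services, so Job Submission (weight 8) is the service that is down.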
43
The value 22 shows that the Data Management service is down
44
GSIFTP service is not available