Sergey Belov, LIT JINR 15 September, NEC’2011, Varna, Bulgaria.


1 Sergey Belov, LIT JINR 15 September, NEC’2011, Varna, Bulgaria

2
• Grid support for the national nanotechnology network of Russia
  ◦ To provide science and industry with effective access to distributed computational, informational and networking facilities
  ◦ Expecting a breakthrough in nanotechnologies
  ◦ Supported by the special federal program
• Main technical points
  ◦ Based on a network of supercomputers (about 15-30)
  ◦ Has two grid operations centers (main and backup)
  ◦ Is a set of grid services with a unified interface
  ◦ Partially based on Globus Toolkit 4
S. Belov, GridNNN monitoring 2/15

3
• Main aim
  ◦ Integration of small and medium supercomputers into a unified distributed computing environment
• Highly heterogeneous grid environment (hardware, software)
• Oriented to parallel tasks rather than single batch tasks
• Workflow management
  ◦ Jobs consist of tasks
• Follows core OGSA principles
• GSI-based security model
• RESTful grid services

4 Based on the report of A. Kryukov et al., Architecture of GridNNN, GRID’2010

5
• WebUI server
• Resource Broker/metascheduler + Workflow management (RESTful)
• Information Service (RESTful / WS MDS)
• Monitoring & Accounting
• Registration service (RESTful)
• GSI services
  ◦ CA, MyProxy, VOMS
• GridFTP servers
Based on the report of A. Kryukov et al., Architecture of GridNNN, GRID’2010

6
• State of sites and services
  ◦ Availability
  ◦ Real operational state
• Monitoring of users' jobs and tasks
• Keeping a history of various system parameters
• Information representation
  ◦ Overall state of the infrastructure
  ◦ Running jobs and tasks
  ◦ Individual sites and services (real-time and history)
  ◦ Visualization of job events

7
• State of computational resources by site (based on data from the information index(es))
• Slots available for tasks
• Jobs (total on site), jobs belonging to GridNNN
• Structure and properties of clusters
  ◦ Subclusters, nodes, slots, operating system, architecture
  ◦ Application software
  ◦ Supported VOs (with ACLs, Access Control Lists)
• Monitoring of jobs running on sites (using information from Pilot servers)

8
• Goal: checks of services' operation
• Simple tests for services registered in the Service for Registration of Resources and Services
• Connection to the declared port of the machine (plain or secured, depending on the specified protocol)
• Information requests to some services
• Separate test scenarios for MDS information indexes and the Service for Registration of Resources and Services: information
• Web page with the history of functional test results
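The port-connection test described on this slide could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names `is_secured` and `check_port`, the protocol-to-security mapping, and the decision to treat a completed TLS handshake as "reachable" are all hypothetical and do not come from the GridNNN code.

```python
import socket
import ssl

# Hypothetical mapping: which declared protocols call for a secured connection.
SECURED_PROTOCOLS = {"https"}


def is_secured(protocol: str) -> bool:
    """Decide whether the declared protocol needs a TLS handshake."""
    return protocol.lower() in SECURED_PROTOCOLS


def check_port(host: str, port: int, secured: bool, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP (or TLS) connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if not secured:
                return True
            # Assumption: certificate verification is switched off because only
            # reachability is tested here; a real check would load the grid CA.
            context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # Covers refused connections, timeouts and TLS errors (ssl.SSLError).
        return False
```

A test runner would call `check_port(host, port, is_secured(protocol))` for every registered service and store the boolean result with a timestamp, which is enough to build the history page mentioned above.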

9
• Goal: to get information, both real-time and historical, on resource utilization and jobs running on the GridNNN infrastructure (by users, VOs, sites)
• Information sources: Pilot servers, GRAMs and local resource managers
• Collecting data on jobs and tasks in the system
  ◦ Timestamps of all job events, actual CPU time consumed
• Accounting information reports in different views:
  ◦ By sites, VOs and single users
• Aggregation of actual job execution time from all sites
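The per-site, per-VO and per-user views can be illustrated with a small aggregation sketch. The `JobRecord` layout and the `cpu_hours_by` helper are hypothetical; the slides do not describe the actual accounting schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    # Hypothetical record layout, not the real GridNNN accounting schema.
    user: str
    vo: str
    site: str
    cpu_seconds: float


def cpu_hours_by(records, key):
    """Sum consumed CPU time in hours, grouped by 'user', 'vo' or 'site'."""
    totals = defaultdict(float)
    for rec in records:
        totals[getattr(rec, key)] += rec.cpu_seconds / 3600.0
    return dict(totals)


# Made-up sample data collected from two sites.
records = [
    JobRecord(user="alice", vo="nano", site="JINR", cpu_seconds=3600.0),
    JobRecord(user="alice", vo="nano", site="PNPI", cpu_seconds=7200.0),
    JobRecord(user="bob", vo="chem", site="JINR", cpu_seconds=1800.0),
]
by_site = cpu_hours_by(records, "site")
by_user = cpu_hours_by(records, "user")
```

Grouping by a field name keeps one function for all three report views, at the cost of relying on the record attributes being spelled consistently.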

10
• Gathering statistics on CPU time consumed by users and VOs
  ◦ In plain hours, later adjusted for the performance of the computing system
• Displaying statistics on CPU resource usage
  ◦ Different kinds of reports: for users, VO managers, site admins, GridNNN project admins
  ◦ Role-based access to statistics to protect private information of users and VOs
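A rough sketch of the two ideas on this slide, performance-adjusted CPU hours and role-restricted reports, is given below. The scaling by a single per-site `performance_factor`, the role names and the dictionary record layout are assumptions made for illustration only; the slides do not say how GridNNN implements either.

```python
def normalized_cpu_hours(cpu_hours: float, performance_factor: float) -> float:
    """Scale plain CPU hours by a site's relative performance (assumed linear)."""
    return cpu_hours * performance_factor


# Hypothetical mapping from a role to the record field that limits its view.
# None means the role sees every record.
ROLE_SCOPE = {
    "user": "user",
    "vo_manager": "vo",
    "site_admin": "site",
    "project_admin": None,
}


def visible_records(records, role, identity):
    """Return only the accounting records that the given role may see."""
    field = ROLE_SCOPE[role]
    if field is None:
        return list(records)
    return [rec for rec in records if rec[field] == identity]


# Made-up sample data.
records = [
    {"user": "alice", "vo": "nano", "site": "JINR", "cpu_hours": 10.0},
    {"user": "bob", "vo": "nano", "site": "PNPI", "cpu_hours": 4.0},
    {"user": "carol", "vo": "chem", "site": "JINR", "cpu_hours": 2.0},
]
vo_view = visible_records(records, "vo_manager", "nano")
site_view = visible_records(records, "site_admin", "JINR")
```

Filtering before aggregation means a report never contains rows outside the caller's scope, which is one simple way to keep user and VO data private.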

12 Diagram labels: Monitoring and accounting data storage; Information collector; Pilot Job management services; Monitoring website; Monitoring data provisioning (Web Services); Accounting Information publisher; Functional tests of the services; Infosys central Information index

13
• Currently more than 15 resource centers in different regions of Russia
  ◦ RRC KI, «Chebyshev» (MSU), IPCP RAS, CC FEB RAS, ICMM RAS, JINR, SINP MSU, PNPI, KNC RAS, SPbSU, SPII RAS and others
http://mon.ngrid.ru

15
• The GridNNN project was successfully finished this summer
• The resulting software and infrastructure are to be used for developing the Russian Grid Network project
• Fully operational monitoring and accounting tools are in production
• Further user interface improvements are planned within the Russian Grid Network project

