Published by Camilla Campbell. Modified over 9 years ago.
1
New perfSonar Dashboard
Andy Lake, Tom Wlodek
2
Outline
Why do we need a dashboard?
Current dashboard – overview
New dashboard – proposed architecture
New dashboard – proposed components
New dashboard – interaction between components
New dashboard – work flow
3
Why we need a dashboard
Why not use a well-established monitoring tool like Nagios?
Nagios is good for monitoring large homogeneous computing farms, but not for distributed heterogeneous systems.
It lacks a configurable display and convenient configuration management.
4
Problem 1: The PerfSonar data model is not represented in existing monitoring packages
PS has three types of probes:
1-host probes (primitive services): owamp, bwctl probes, ... The service they correspond to runs on a single host.
2-host probes: traceroute probes. Check the traceroute from host1 to host2.
3-host probes (throughput, latency): check throughput from host1 to host2, monitored from host3.
Neither Nagios nor other monitoring systems account for multi-host services.
The Gratia data model can be modified to take multi-host services into account, but this requires work.
5
Problem 2: Monitoring systems are designed to gather and display time-dependent data
[Figure: some variable plotted against time]
What we really need is this:
[Figure: Variable 1, Variable 2, ..., Variable n plotted against time, with a time cut. What is the status NOW()?]
6
Our data use model implies a completely different database structure!
Most monitoring systems have "time flow" tables: records correspond to points in time.
We need to access the current status of several hundred or several thousand independent channels, and it has to be done FAST!
The number of channels in a cloud grows like the number of hosts squared.
Obtaining the USATLAS cloud status requires 471 queries.
Indexing the database tables helps, but is not enough.
7
The old dashboard database structure
[Diagram: the Dashboard works with three tables: History Table (disk resident), Current Status Table (memory resident) and Configuration Table (disk resident). Arrows are labeled UPDATE and INSERT.]
8
The database structure
Keeping the most recent data in a memory table improves system performance.
However, it still has its limitations: the database introduces its own overhead.
This is not a significant problem for small clouds, but for big clouds it can become an issue.
The LHCONE cloud requires approximately 1300 queries to have its status displayed, and that is without traceroute tests. The time delay is noticeable.
9
Old dashboard – overview
[Diagram: components are the user, the dashboard, the database, the Collector API, the Collector and the PS host.]
10
Old dashboard
http://perfsonar.usatlas.bnl.gov:8080/exda
https://perfsonar.usatlas.bnl.gov:8443/exda
11
Problems:
The code started as a PHP add-on to Nagios and relied heavily on the Nagios structure.
When we started, this looked like a perfectly reasonable approach.
It turned out to be completely the wrong approach, but we were not aware of this when we started.
Once it became clear that we should not copy Nagios, it was already too late to back off.
12
Problems:
The code was converted from nagios-php-apache to java-tomcat technology, but the Nagios legacy had to be carried along.
Initially we had no clear idea of what we intended to do with the code: new features were added when we realized we needed them, without any formal plan or design.
Result: the code contains features that were demanded by users (and are therefore useful), but it is increasingly hard to maintain.
13
Current status of the dashboard code
14
Problems:
Storing data in a database (even in a memory table) slows down the application.
The system is entirely self-contained: GUI, data store, alarm system and authentication are all in one application, often embedded in the same piece of code.
It is hard to establish a cooperative effort.
I suggest breaking the project into several independent subprojects.
15
New dashboard
Minimum objective: the new dashboard should keep all the functionality of the old one, but have code that is better organized, documented, extensible and scalable.
Beyond the minimum objective: the dashboard should interact with the centralized configuration management.
Other ideas or suggestions as to what should be included in "beyond the minimum objective"?
16
Proposed structure of new dashboard framework
[Diagram: core components are the Data Store, the Data Access API, the Data Persistence Layer and the Database. Modules around them: Display GUI, Object config GUI, Alarms, Authentication, Collector, User mgmt, Other?]
17
Data objects
The data is stored in data objects representing hosts, services, sites, matrices, clouds, etc.
The object definitions can be found in Andy's design document.
Each object has an internal (Java) and an external (JSON) representation.
Utility classes translate between the representations.
External modules talk to the data store by exchanging JSON objects.
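The translation between the internal and external representations might look roughly like the sketch below. The class names `HostObject` and `HostJson` and the field names are hypothetical (the real definitions are in the design document), and the JSON is written by hand only to keep the example free of external libraries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical data object; the real field names live in the design document.
class HostObject {
    final String id;
    final String hostName;
    final List<String> serviceIds = new ArrayList<>();

    HostObject(String id, String hostName) {
        this.id = id;
        this.hostName = hostName;
    }
}

// Hypothetical utility class: internal (Java) form to external (JSON) form.
class HostJson {
    // Escapes backslashes and double quotes only; control characters are not handled.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String toJson(HostObject h) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"id\":").append(quote(h.id));
        sb.append(",\"hostName\":").append(quote(h.hostName));
        sb.append(",\"services\":[");
        for (int i = 0; i < h.serviceIds.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(quote(h.serviceIds.get(i)));
        }
        return sb.append("]}").toString();
    }
}
```

In practice a JSON library would likely replace the hand-written serializer; the point is only that each object type gets one translator in each direction.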
18
Data Store and Access API
It is implemented as Java servlets executed within a Tomcat server.
The Data Access API is a servlet responding to GET and POST requests from clients.
The Data Store is a Java singleton object holding data description objects; it is called by the Data Access API.
Details follow.
19
New dashboard – design
The central store will store objects representing hosts, services, sites, matrices and clouds.
We wrote a formal description of the data objects; Andy Lake maintains a design document, available on request.
20
Central data store
The central store maintains lists of known hosts and services:
Vector hosts;
Vector services;
It also maintains lists of sites, matrices and clouds.
Site, matrix and cloud objects contain references to each other (clouds contain sites and matrices, sites contain hosts, hosts contain services, etc.).
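A minimal sketch of such a store, assuming a classic eager singleton and typed vectors. The class names (`Service`, `Host`, `Site`, `DataStore`) and the helper `addServiceToHost` are illustrative only, not taken from the actual code.

```java
import java.util.Vector;

class Service {
    final String id;
    Service(String id) { this.id = id; }
}

class Host {
    final String id;
    final Vector<Service> services = new Vector<>(); // hosts contain services
    Host(String id) { this.id = id; }
}

class Site {
    final String id;
    final Vector<Host> hosts = new Vector<>(); // sites contain hosts
    Site(String id) { this.id = id; }
}

// Hypothetical singleton store: one instance shared by all servlets in the web application.
class DataStore {
    private static final DataStore INSTANCE = new DataStore();

    final Vector<Host> hosts = new Vector<>();
    final Vector<Service> services = new Vector<>();
    final Vector<Site> sites = new Vector<>();

    private DataStore() {}

    static DataStore getInstance() { return INSTANCE; }

    // Attach a service to a host and make sure it is also in the global list.
    void addServiceToHost(Service s, Host h) {
        h.services.add(s);
        if (!services.contains(s)) services.add(s);
    }
}
```

Because containers hold references rather than copies, a status update on a service is visible through every site or cloud that contains it.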
21
Operations on objects
Dedicated Java classes are used to perform operations on objects:
ObjectCreator – creates new sites, clouds, hosts or services.
ObjectManipulator – adds or removes services on hosts, hosts in sites, hosts in matrices, and sites and matrices in clouds.
ObjectShredder – deletes hosts, services, sites, matrices and clouds.
22
Interaction between modules
Modules talk to the data store by exchanging JSON objects using POST and GET methods.
The Data Access API is a servlet that responds to GET and POST requests from clients and returns JSON objects.
Client programs obtain data from the JSON objects.
Details follow.
23
Data Store multithreading issues
The objects in the Data Store have to be thread safe, as they will be updated and accessed by different threads.
In practice this is not a big problem: the collector will update the objects once every several minutes, and users will not change the configuration more often than once a day.
In addition, humans and the collector will modify different fields in the objects.
The risk of collision is slim.
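One way to get this kind of thread safety cheaply, assuming the collector and the users really do write disjoint fields, is to make each field independently atomic and keep the objects in a concurrent map. The names `ServiceRecord` and `ServiceRegistry` below are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

class ServiceRecord {
    // Written by the collector thread.
    private final AtomicReference<String> lastStatus = new AtomicReference<>("UNKNOWN");
    // Written by users through the configuration path.
    private final AtomicReference<String> description = new AtomicReference<>("");

    void setStatus(String s) { lastStatus.set(s); }
    String getStatus() { return lastStatus.get(); }
    void setDescription(String d) { description.set(d); }
    String getDescription() { return description.get(); }
}

class ServiceRegistry {
    private final ConcurrentHashMap<String, ServiceRecord> byId = new ConcurrentHashMap<>();

    // Atomically returns the existing record or creates it exactly once.
    ServiceRecord getOrCreate(String id) {
        return byId.computeIfAbsent(id, k -> new ServiceRecord());
    }

    int size() { return byId.size(); }
}
```

This protects individual fields only; an update that must change several fields consistently would still need a lock or an immutable snapshot swapped in as a whole.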
24
Collector
The collector is a program that runs at a remote site.
Periodically it connects to the data store and asks "do you have jobs to run?"
The data store returns a list of JSON objects representing the jobs.
The collector executes the jobs and uploads the results back to the data store.
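The polling cycle described above can be sketched as follows. This is not the actual collector code; the interfaces `JobSource` and `ProbeRunner` and the class `CollectorCycle` are invented for illustration, and jobs and results are passed as raw JSON strings.

```java
import java.util.List;

// Hypothetical view of the data store as seen from the collector.
interface JobSource {
    List<String> fetchJobs();              // each entry is one job as a JSON string
    void uploadResult(String resultJson);  // send one result back to the data store
}

// Hypothetical probe runner: takes a job, returns its result as JSON.
interface ProbeRunner {
    String run(String jobJson);
}

class CollectorCycle {
    // One polling cycle: ask for jobs, execute each one, upload each result.
    // Returns the number of jobs processed.
    static int runOnce(JobSource store, ProbeRunner runner) {
        int done = 0;
        for (String job : store.fetchJobs()) {
            store.uploadResult(runner.run(job));
            done++;
        }
        return done;
    }
}
```

A scheduler on the remote site would call `runOnce` at a fixed interval; since the collector always initiates the connection, the data store never needs to reach into the remote site.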
25
Data Store and Collector status
The collector code has been written by Andy Lake.
The data store exists, but not all features are implemented.
We are able to build clouds and sites with primitive services only. No matrices yet.
The collector can connect to the data store, get jobs, execute them and upload the results. (This has been tested.)
26
What next?
GUI
Persistence/Db
History
Matrix
Users
Config GUI
Alarms and filters
Authentication
Interface to centralized config management?
27
Matrix
Matrix services are not implemented in the data store yet.
They will come in a week or two.
28
Data Persistence
The service data will stay in memory.
Once Tomcat is restarted, the application is reloaded or the computer is rebooted, it will be gone.
We need a persistence mechanism.
We also need a way to store historical data.
For short time periods we may store the data in memory, but if we want to go very far back in time we need a back-end database.
29
Data Persistence and Database
Plan A: use Hibernate. Map the objects in the Vector services list to database objects.
Plan B (if A fails): run a timer thread in Tomcat that wakes up periodically, reads the services in the list of services and dumps them into the back-end database using plain JDBC calls.
Neither method requires database access to display an overview of the current status of a cloud, so it should be faster than the current dashboard.
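Plan B might be sketched as below, using a `ScheduledExecutorService` in place of a hand-rolled timer thread. The names `ServiceSink` and `PersistenceTimer` are hypothetical, and the sink is left abstract: a real one would wrap the JDBC calls.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sink; a real implementation would write the rows with plain JDBC.
interface ServiceSink {
    void write(List<String> serviceJson);
}

class PersistenceTimer {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    // Every periodSeconds, take a snapshot of the service list and hand it to the sink.
    void start(Supplier<List<String>> snapshot, ServiceSink sink, long periodSeconds) {
        timer.scheduleAtFixedRate(() -> dumpOnce(snapshot, sink),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    // A single dump, also callable directly (for example at shutdown).
    static void dumpOnce(Supplier<List<String>> snapshot, ServiceSink sink) {
        sink.write(snapshot.get());
    }

    void stop() { timer.shutdown(); }
}
```

One caveat of `scheduleAtFixedRate`: if a run throws, later runs are suppressed, so a production sink should catch and log its own database errors. Reads for the status display never touch this path, which is where the expected speed-up comes from.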
30
History (requires: persistence)
We still have no design for how to represent historical data.
We need to sit down with Andy and write it down.
31
Display GUI
We need a client to display service information:
Connect to the data store and ask for a service: {url}/services?id={id}
Obtain the response as a string and parse it into JSON
Unpack the JSON
Display
Lucy is interested in doing this, but there is room for more people here.
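The first two client steps could look like the sketch below, assuming Java 11 or later for `java.net.http.HttpClient`. The class `ServiceClient` is hypothetical, `baseUrl` stands in for the `{url}` placeholder, and parsing the returned string is left to whichever JSON library the GUI ends up using.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class ServiceClient {
    // Build {url}/services?id={id}, URL-encoding the id.
    static URI serviceUri(String baseUrl, String id) {
        return URI.create(baseUrl + "/services?id="
                + URLEncoder.encode(id, StandardCharsets.UTF_8));
    }

    // Fetch the JSON text for one service; throws if the server does not answer 200.
    static String fetchServiceJson(String baseUrl, String id)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(serviceUri(baseUrl, id)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

Any technology that can issue a GET and parse JSON would do equally well, for example JavaScript in a browser.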
32
Config GUI (needs: back-end database and persistence)
Get the service info: {url}/services?id={id}
Unpack it into a JSON object
Fill a web form with the JSON fields and display it
Once the form is modified, pack its content into JSON
Upload the JSON into the data store
STATUS: developer needed.
33
Display and Config GUI
The actual technology to be used does not really matter.
We can have many different GUIs.
Unless you have a very good idea of how to do it, I think it would be prudent to follow the layout of the old dashboard, as users know it and seem to like it.
34
User Management
Define an object (bean) user with attributes: name, DN, e-mail, etc.
Store the users in a singleton.
Define a mapping onto the back-end database.
Provide a set of JSP pages to view and modify users.
A nice, self-contained project, good for a summer student or undergraduate intern. Must know basic Java and preferably have taken an internet programming class.
35
Authentication
Operation | URL | Servlet | Filter
Get services | {url}/services | Get services servlet | -
Update service | {url}/updateService | Update service servlet | DN filter
Build latency host | {url}/addLatencyServicesToHost?id={id} | Add latency servlet | DN filter
Upload probe results | {url}/results | Upload results servlet | Host filter
... etc.
36
We need two types of filters: DN and host
DN filter: obtain the client certificate (javax.security.cert.X509Certificate) and compare its DN to the list of DNs of known users.
Host filter: the client host IP is obtained from the request context and compared to the list of known hosts.
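The decision logic behind the two filters is just an allow-list lookup. The sketch below keeps it free of servlet classes so it can be read and tested on its own; `AccessChecks` and its method names are hypothetical.

```java
import java.util.Set;

// Hypothetical access checks; a real deployment would call these from servlet filters.
class AccessChecks {
    private final Set<String> knownDns;
    private final Set<String> knownHostIps;

    AccessChecks(Set<String> knownDns, Set<String> knownHostIps) {
        this.knownDns = knownDns;
        this.knownHostIps = knownHostIps;
    }

    // DN filter: the certificate subject DN must be on the list of known users.
    boolean dnAllowed(String subjectDn) {
        return subjectDn != null && knownDns.contains(subjectDn.trim());
    }

    // Host filter: the client IP from the request must belong to a known host.
    boolean hostAllowed(String remoteAddr) {
        return remoteAddr != null && knownHostIps.contains(remoteAddr.trim());
    }
}
```

Inside a servlet container, the DN would typically be read from the certificate chain exposed in the `javax.servlet.request.X509Certificate` request attribute and the IP from `ServletRequest.getRemoteAddr()`. The exact string comparison of DNs is an assumption here; DN formatting differs between tools, so some normalization may be needed.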
37
Alarms (requires: config GUI, Users)
Define what criteria have to be met to raise an alarm and who should be alerted.
The current dashboard has alarms for primitive services. It is hard to define them for matrix services.
The project is moderately hard for primitive services and hard for matrix services.
38
Status Filters (not to be confused with servlet filters, discussed earlier!)
Users should be able to define (via the GUI) which lower-level inputs (like service statuses) should be combined to obtain a higher-level status (like site status).
The filter definition should be configurable via the GUI (requires skill in UI design).
Filters should then be assigned to objects (sites or clouds).
Not easy, but probably a rather interesting project.
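As a rough illustration of what a status filter could be, the sketch below combines lower-level statuses by taking the most severe one. The `Status` values, their ordering and the names `StatusFilter` and `WorstOfFilter` are all assumptions made for the example, not part of any agreed design.

```java
import java.util.Collection;

// Assumed status values, loosely modeled on common monitoring conventions.
enum Status { OK, WARNING, CRITICAL, UNKNOWN }

// A status filter maps lower-level statuses to one higher-level status.
interface StatusFilter {
    Status combine(Collection<Status> inputs);
}

class WorstOfFilter implements StatusFilter {
    // Report the most severe input; an empty input set yields UNKNOWN.
    public Status combine(Collection<Status> inputs) {
        Status worst = null;
        for (Status s : inputs) {
            if (worst == null || severity(s) > severity(worst)) worst = s;
        }
        return worst == null ? Status.UNKNOWN : worst;
    }

    // Assumed ordering: OK < UNKNOWN < WARNING < CRITICAL.
    private static int severity(Status s) {
        switch (s) {
            case OK: return 0;
            case UNKNOWN: return 1;
            case WARNING: return 2;
            default: return 3; // CRITICAL
        }
    }
}
```

Other filters (majority vote, "ignore these services", weighted thresholds) would implement the same interface, and the GUI would then only need to pick a filter and its parameters for each site or cloud.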
39
Interface to centralized configuration
Aaron Brown is about to release the centralized config for testing.
We will have to interface the data store to it.
As of today we know next to nothing about what the interface should look like.
Very likely it will be a rather big and interesting project, with lots of design work.
40
Documentation
Andy's design document: ask Andy for access.
DataStore API and work progress document: ask me for access.
DataStore code: will be in the BNL public subversion soon; you will need a DOE cert to access it.
DataStore javadocs are available.
Mailing list: ps-dashboard-devel-l@lists.bnl.gov
http://perfsonar.racf.bnl.gov:8080/dashboard-1.0-SNAPSHOT/dump
41
That's All
Questions, comments, suggestions?