Introduction to R-GMA
Shi Jingyan
Computing Center, IHEP
Contents
- R-GMA
  - R-GMA concept
  - R-GMA components
- Accounting system
Relational Grid Monitoring Architecture (R-GMA): introduction
R-GMA models the information infrastructure of a Grid as three kinds of component:
- a set of Consumers, which request information;
- a set of Producers, which provide information;
- a single Registry, which mediates the communication between producers and consumers.
It imposes a standard query language, a subset of SQL. Producers publish tuples with INSERT statements, and consumers query tuples with SELECT statements. All tuples carry a time-stamp so that the system can support monitoring.
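As a rough illustration of the INSERT/SELECT model, the sketch below uses Python's built-in `sqlite3` module as a local stand-in for one virtual table. It is a minimal sketch under stated assumptions. The table `cpuLoad` and its columns, including the `ts` time-stamp column, are hypothetical. A real R-GMA deployment keeps no single local database: the producer and consumer would talk to R-GMA services instead.

```python
import sqlite3
import time

# Local stand-in for ONE virtual table. Hypothetical schema: R-GMA itself
# keeps no central copy of a virtual table; this only mimics the SQL model.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cpuLoad (host TEXT, load REAL, ts REAL)")


def publish(host: str, load: float) -> None:
    """Producer side: publish one tuple with INSERT, adding a time-stamp."""
    db.execute(
        "INSERT INTO cpuLoad (host, load, ts) VALUES (?, ?, ?)",
        (host, load, time.time()),
    )


def query(sql: str, params: tuple = ()) -> list:
    """Consumer side: run a SELECT and return the matching tuples."""
    return db.execute(sql, params).fetchall()


publish("node01", 0.75)
publish("node02", 1.20)
rows = query("SELECT host, load FROM cpuLoad WHERE load > ?", (1.0,))
print(rows)  # [('node02', 1.2)]
```

The time-stamp is added on the producer side, so consumers never have to supply it themselves.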
R-GMA Introduction (cont.)
The information resources of a VO are presented as a single virtual database containing a set of virtual tables.
- A single schema contains the name and structure of each virtual table in the system.
- A single registry holds, for each table, a list of the producers that have offered to publish rows for that table.
- A consumer runs an SQL query against a table, and the registry selects the best producers to answer it. This process is called mediation.
- The consumer then contacts each producer directly, combines the information, and returns a set of tuples.
The mediation process is hidden from the user. There is no central repository holding the contents of the virtual table.
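The mediation flow can be sketched in plain Python as follows. This is a hypothetical model, not the R-GMA API. The names `Registry`, `Producer` and `consumer_query` are invented for illustration, and queries are Python predicates rather than SQL. The registry also simply returns every producer registered for a table instead of applying R-GMA's own selection rules.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Row = Dict[str, object]


class Producer:
    """Holds its own rows; nothing is copied to a central store."""

    def __init__(self, rows: List[Row]):
        self.rows = rows

    def answer(self, predicate: Callable[[Row], bool]) -> List[Row]:
        return [r for r in self.rows if predicate(r)]


class Registry:
    """For each virtual table, lists the producers offering rows for it."""

    def __init__(self) -> None:
        self._producers: Dict[str, List[Producer]] = defaultdict(list)

    def register(self, table: str, producer: Producer) -> None:
        self._producers[table].append(producer)

    def mediate(self, table: str) -> List[Producer]:
        # Simplification: every registered producer is selected. Real
        # mediation chooses the "best" producers for the query.
        return list(self._producers.get(table, []))


def consumer_query(registry: Registry, table: str,
                   predicate: Callable[[Row], bool]) -> List[Row]:
    """Consumer: mediate via the registry, contact each producer, combine."""
    result: List[Row] = []
    for producer in registry.mediate(table):
        result.extend(producer.answer(predicate))
    return result


registry = Registry()
registry.register("jobs", Producer([{"site": "IHEP", "cpu": 10}]))
registry.register("jobs", Producer([{"site": "CERN", "cpu": 30},
                                   {"site": "CERN", "cpu": 5}]))
print(consumer_query(registry, "jobs", lambda r: r["cpu"] > 8))
```

The registry stores only references to producers and never the rows themselves. This mirrors the point that no central repository holds the virtual table's contents.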
R-GMA Introduction (cont.)
Producers:
- Primary producer: the user's code periodically inserts tuples, which are then stored internally by the producer. The producer answers consumer queries from its own storage.
- Secondary producer: populates its own storage by running its own query against the virtual table. The user code only sets the process running; the tuples come from other producers.
- On-demand producer: has no internal storage. Data is provided by the user code in direct response to a query forwarded to it by the producer service.
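The three producer types differ mainly in where their tuples come from, and the toy classes below sketch that difference. They are illustrative only: all class and method names are hypothetical, and a single `refresh()` pull stands in for the secondary producer's ongoing query.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class PrimaryProducer:
    """User code inserts tuples; the producer stores and serves them."""

    def __init__(self) -> None:
        self._store: List[Row] = []

    def insert(self, row: Row) -> None:
        self._store.append(dict(row))

    def answer(self) -> List[Row]:
        return list(self._store)


class SecondaryProducer:
    """Fills its own store from other producers' tuples."""

    def __init__(self, sources: List[object]) -> None:
        self._sources = sources
        self._store: List[Row] = []

    def refresh(self) -> None:
        # A single pull stands in for the secondary producer's ongoing query.
        self._store = [row for src in self._sources for row in src.answer()]

    def answer(self) -> List[Row]:
        return list(self._store)


class OnDemandProducer:
    """No storage: each query is handed to user code, which builds the answer."""

    def __init__(self, callback: Callable[[], List[Row]]) -> None:
        self._callback = callback

    def answer(self) -> List[Row]:
        return self._callback()


p1 = PrimaryProducer()
p1.insert({"site": "IHEP", "jobs": 3})
p2 = PrimaryProducer()
p2.insert({"site": "CERN", "jobs": 7})
archive = SecondaryProducer([p1, p2])
archive.refresh()
live = OnDemandProducer(lambda: [{"site": "IHEP", "queued": 2}])
print(archive.answer())
print(live.answer())
```

In this sketch the secondary producer only sees new primary tuples after its next `refresh()`.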
R-GMA Introduction (cont.)
- Consumer: each consumer represents a single SQL SELECT query on the virtual database. After mediation, it obtains the answer tuples from the producers.
- Mediation: the query is first passed to the Registry to identify which producers, for each virtual table in the query, must be contacted to answer it. This process is called mediation.
R-GMA Introduction (cont.)
Types of query:
- Continuous query: all new tuples matching the query are streamed into the consumer's tuple storage as soon as the producers insert them into the virtual table.
- One-time queries:
  - History query: all versions of any matching tuples are returned.
  - Latest query: only the tuples representing the "current state" are returned.
  - Static query: a database-like query whose results do not contain R-GMA time-stamps.
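The difference between continuous, history and latest queries can be illustrated with a toy time-stamped table. This is a sketch under stated assumptions, not R-GMA code. The `key` column that identifies versions of "the same" tuple is a hypothetical stand-in, queries are Python predicates, and static queries are not modelled.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class VirtualTable:
    """Toy time-stamped table illustrating continuous, history and latest queries."""

    def __init__(self, key: str) -> None:
        self.key = key  # column that identifies versions of "the same" tuple
        self.rows: List[Tuple[float, Row]] = []
        self._subscribers: List[Tuple[Predicate, List[Row]]] = []

    def insert(self, row: Row, ts: Optional[float] = None) -> None:
        stamp = time.time() if ts is None else ts
        self.rows.append((stamp, row))
        # Continuous queries: push each new matching tuple immediately.
        for predicate, sink in self._subscribers:
            if predicate(row):
                sink.append(row)

    def continuous(self, predicate: Predicate) -> List[Row]:
        sink: List[Row] = []
        self._subscribers.append((predicate, sink))
        return sink  # grows as matching tuples are inserted

    def history(self, predicate: Predicate) -> List[Row]:
        return [row for _, row in self.rows if predicate(row)]

    def latest(self, predicate: Predicate) -> List[Row]:
        newest: Dict[object, Tuple[float, Row]] = {}
        for stamp, row in self.rows:
            k = row[self.key]
            if k not in newest or stamp >= newest[k][0]:
                newest[k] = (stamp, row)
        return [row for _, row in newest.values() if predicate(row)]


t = VirtualTable(key="host")
stream = t.continuous(lambda r: r["load"] > 1.0)
t.insert({"host": "n1", "load": 0.5}, ts=1)
t.insert({"host": "n1", "load": 1.5}, ts=2)
t.insert({"host": "n2", "load": 2.0}, ts=3)
print(t.history(lambda r: r["host"] == "n1"))  # both n1 versions
print(t.latest(lambda r: True))                 # newest n1 and n2
print(stream)                                   # tuples streamed since subscribing
```

Only tuples inserted after `continuous()` is called reach the stream, which matches the "new tuples" wording above.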
R-GMA Introduction (cont.)
Retention periods:
- LatestRetentionPeriod: set in each tuple published by a Primary Producer. It stays with the tuple when the tuple is re-published by a Secondary Producer.
- HistoryRetentionPeriod: each producer declares a HistoryRetentionPeriod for every table to which it publishes tuples.
A latest query returns only those tuples that have not exceeded their LatestRetentionPeriod. A history query returns all versions of tuples that have not exceeded the producer's HistoryRetentionPeriod for the table.
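The two retention rules can be sketched as filters over stored tuples. This is a simplified model with several assumptions. Times are plain seconds, and a tuple is kept while its age is less than or equal to the period. The latest query picks the newest version per key before checking expiry. The `StoredTuple` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StoredTuple:
    key: str                  # identifies versions of the same tuple
    row: Dict[str, object]
    published_at: float       # seconds
    latest_retention: float   # LatestRetentionPeriod carried by the tuple


def latest_query(tuples: List[StoredTuple], now: float) -> List[Dict[str, object]]:
    """Newest version per key, dropped if it has outlived its LatestRetentionPeriod."""
    newest: Dict[str, StoredTuple] = {}
    for t in tuples:
        if t.key not in newest or t.published_at >= newest[t.key].published_at:
            newest[t.key] = t
    return [t.row for t in newest.values()
            if now - t.published_at <= t.latest_retention]


def history_query(tuples: List[StoredTuple], now: float,
                  history_retention: float) -> List[Dict[str, object]]:
    """Every version still inside the producer's HistoryRetentionPeriod."""
    return [t.row for t in tuples if now - t.published_at <= history_retention]


tuples = [
    StoredTuple("n1", {"host": "n1", "load": 0.5}, published_at=0, latest_retention=30),
    StoredTuple("n1", {"host": "n1", "load": 1.5}, published_at=50, latest_retention=30),
    StoredTuple("n2", {"host": "n2", "load": 2.0}, published_at=10, latest_retention=30),
]
print(latest_query(tuples, now=70))                          # n2 is stale
print(history_query(tuples, now=70, history_retention=65))   # first n1 version expired
```

The latest query reads the period from each tuple, while the history query takes a single per-producer period. This reflects where each period is declared.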
R-GMA Introduction (cont.)
Web Service Architecture: R-GMA conforms to the Web Services Architecture. It has six principal services:
- Primary Producer
- Secondary Producer
- On-demand Producer
- Consumer
- Registry
- Schema
Each service is described by one WSDL document. Clients communicate with the services by exchanging messages, whose sequence and format are also specified in the WSDL.
R-GMA Introduction (cont.)
R-GMA uses SOAP messaging over HTTP/HTTPS in a request/response pattern.
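To show what one message in a SOAP request/response exchange looks like, the sketch below builds and parses a SOAP 1.1 envelope with Python's `xml.etree.ElementTree`. The operation name `insert`, the parameter name `statement` and the service namespace are all hypothetical. The real messages are defined by the R-GMA WSDL documents, and the HTTP/HTTPS transport is omitted here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SVC_NS = "urn:example:rgma"  # hypothetical namespace, not R-GMA's real one

ET.register_namespace("soap", SOAP_NS)


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tags."""
    return tag.split("}", 1)[1]


def build_request(operation: str, params: Dict[str, str]) -> bytes:
    """Client side: wrap one operation call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SVC_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


def read_request(message: bytes) -> Tuple[str, Dict[str, str]]:
    """Service side: recover the operation name and its parameters."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("not a SOAP request")
    call = body[0]
    params = {_local_name(c.tag): c.text or "" for c in call}
    return _local_name(call.tag), params


msg = build_request("insert", {"statement": "INSERT INTO jobs VALUES ('IHEP', 3)"})
print(msg.decode("utf-8"))
print(read_request(msg))
```

In the request/response pattern, the service would answer with a second envelope of the same shape carrying the result.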
Apel — accounting in LCG-2
The Apel software consists of two parts, the Apel Log Processor and the Flexible Archiver.
- Apel Log Processor: parses log files to extract job information and publishes it using R-GMA.
- Flexible Archiver: located at the Grid Operation Center (GOC). It receives the data for the accounting table from all sites participating in the R-GMA configuration, so it holds an amalgamation of the accounting data from every site.
Apel — accounting in LCG-2 (cont.)
The Apel Log Processor is used to parse the GateKeeper and PBS event logs generated at a site. The extracted data is pieced together to form an accounting record that links the owner of a submitted job with the resources used to execute it. The accounting records are then published using R-GMA. An R-GMA Secondary Producer collates them into a centralised repository on the GOC.
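The record-building step can be pictured as a join between grid identity and batch resource usage, followed by a site-wide merge. This is a simplified sketch and not Apel's code. The single `job_id` join key and all field names (`owner_dn`, `cpu_seconds`, `wall_seconds`) are hypothetical, and the real Log Processor's way of linking the logs is not reproduced here. The `amalgamate` function only imitates the central repository's role.

```python
from typing import Dict, List

Record = Dict[str, object]


def build_accounting_records(gk_records: List[Record],
                             pbs_records: List[Record]) -> List[Record]:
    """Join grid identity (gatekeeper) with resource usage (PBS) per job."""
    pbs_by_job = {r["job_id"]: r for r in pbs_records}
    records: List[Record] = []
    for gk in gk_records:
        pbs = pbs_by_job.get(gk["job_id"])
        if pbs is None:
            continue  # no matching batch record (yet): skip this job
        records.append({
            "job_id": gk["job_id"],
            "owner_dn": gk["owner_dn"],
            "cpu_seconds": pbs["cpu_seconds"],
            "wall_seconds": pbs["wall_seconds"],
        })
    return records


def amalgamate(per_site: Dict[str, List[Record]]) -> List[Record]:
    """Central-repository stand-in: merge all sites' records, tagged by site."""
    return [dict(rec, site=site) for site, recs in per_site.items() for rec in recs]


gk = [{"job_id": "101", "owner_dn": "/C=CN/O=HEP/CN=alice"},
      {"job_id": "102", "owner_dn": "/C=CN/O=HEP/CN=bob"}]
pbs = [{"job_id": "101", "cpu_seconds": 600, "wall_seconds": 725}]
ihep = build_accounting_records(gk, pbs)
print(amalgamate({"IHEP": ihep}))
```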
Apel Log Processor (cont.)
Parsed log files:
- /var/log/globus-gatekeeper.log
- /var/log/message
- /var/spool/pbs/server_priv/accounting
Tables used in Apel:
- EventRecords
- GkRecords
- MessageRecords
- SpecRecords
- LcgRecords (published)
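PBS accounting files such as /var/spool/pbs/server_priv/accounting are commonly written as `date time;record type;job id;key=value ...` lines, with record type `E` marking a finished job. The sketch below parses such a line. It is a minimal sketch, not Apel's parser: the function names are hypothetical, the sample line is made up, and splitting the message on whitespace will break on values that contain spaces.

```python
from typing import Dict, Optional


def parse_pbs_end_record(line: str) -> Optional[Dict[str, str]]:
    """Parse a PBS accounting line 'date time;type;job_id;key=value ...'.

    Returns None unless the record type is 'E' (job ended). Splitting the
    message on whitespace is a simplification: values with spaces break it.
    """
    parts = line.rstrip("\n").split(";", 3)
    if len(parts) != 4 or parts[1] != "E":
        return None
    timestamp, _record_type, job_id, message = parts
    record = {"timestamp": timestamp, "job_id": job_id}
    for item in message.split():
        key, sep, value = item.partition("=")
        if sep:
            record[key] = value
    return record


def hms_to_seconds(hms: str) -> int:
    """Convert PBS 'HH:MM:SS' durations to seconds."""
    hours, minutes, seconds = (int(x) for x in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Made-up example line (not taken from a real log).
sample = ("04/15/2005 10:23:45;E;1234.pbsserver;user=alice group=bes "
          "jobname=sim queue=short start=1113531100 end=1113531825 "
          "Exit_status=0 resources_used.cput=00:10:00 "
          "resources_used.walltime=00:12:05")
rec = parse_pbs_end_record(sample)
print(rec["user"], rec["queue"], hms_to_seconds(rec["resources_used.walltime"]))
```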
Flexible Archiver
Examples – Two Servlets
The first servlet provides a web page as the user interface. It creates a consumer that displays statistics from the accounting data for the date the user enters.
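The statistics the first servlet displays can be sketched as a per-user aggregation over accounting records for one date. Only the computation is shown. The servlet, the R-GMA consumer plumbing and the Java side are omitted, and in practice the consumer's SELECT could already do the date filtering. `theDate` and `localUser` follow the accounting table's column names. `cpu_seconds` is a hypothetical pre-converted CPU-time field, and the records are illustrative.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable


def daily_statistics(records: Iterable[Dict[str, object]],
                     day: date) -> Dict[str, Dict[str, float]]:
    """Per-user job count and CPU hours for the requested date."""
    stats: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"jobs": 0, "cpu_hours": 0.0})
    for rec in records:
        if rec["theDate"] != day:
            continue
        entry = stats[rec["localUser"]]
        entry["jobs"] += 1
        entry["cpu_hours"] += rec["cpu_seconds"] / 3600
    return dict(stats)


records = [
    {"theDate": date(2005, 4, 15), "localUser": "alice", "cpu_seconds": 3600},
    {"theDate": date(2005, 4, 15), "localUser": "alice", "cpu_seconds": 1800},
    {"theDate": date(2005, 4, 16), "localUser": "bob", "cpu_seconds": 7200},
]
print(daily_statistics(records, date(2005, 4, 15)))
```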
Examples – Two Servlets (cont.)
The second servlet creates a primary producer that publishes statistics from the accounting data. These statistics can be queried from the browser servlet provided with the R-GMA software package.
IHEP Accounting plan
- PBS log file: /var/spool/pbs/server_priv/accounting
- A Perl program analyses the log file and loads the data into a database.
- A Java program uses a producer to publish the required accounting information, obtained by joining the database tables.
- The R-GMA server provides the registry function that maintains the virtual table.
- Accounting information is summarised per user.
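The per-user summary can be expressed as an SQL GROUP BY over the accounting data. The sketch uses Python's `sqlite3` in place of IHEP's MySQL database, and the table name `accounting` and the rows are hypothetical. The columns are a subset of the accounting table in this section, with CPU and wall time stored as integer seconds rather than the TIME type. Publishing the result through a producer is not shown.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Subset of the accounting table's columns. CPU and wall time are stored as
# integer seconds here for simplicity; the MySQL table uses the TIME type.
db.execute("""
    CREATE TABLE accounting (
        theDate TEXT,
        localUser TEXT,
        localGroup TEXT,
        resources_Used_cput INTEGER,
        resources_Used_walltime INTEGER
    )""")
db.executemany(
    "INSERT INTO accounting VALUES (?, ?, ?, ?, ?)",
    [
        ("2005-04-15", "alice", "bes", 600, 725),   # illustrative rows only
        ("2005-04-15", "alice", "bes", 1200, 1300),
        ("2005-04-16", "bob", "cms", 3000, 3600),
    ],
)
summary = db.execute("""
    SELECT localUser, COUNT(*), SUM(resources_Used_cput),
           SUM(resources_Used_walltime)
    FROM accounting
    GROUP BY localUser
    ORDER BY localUser""").fetchall()
for user, jobs, cpu, wall in summary:
    print(f"{user}: {jobs} jobs, {cpu} s CPU, {wall} s wall")
```

Keeping the aggregation in SQL means the Java producer would only need to publish the summary rows, not every job record.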
+-------------------------+-------------+------+-----+---------+-------+
| Field                   | Type        | Null | Key | Default | Extra |
+-------------------------+-------------+------+-----+---------+-------+
| theDate                 | date        | YES  |     | NULL    |       |
| eventID                 | varchar(60) | YES  |     | NULL    |       |
| siteName                | varchar(30) | YES  |     | NULL    |       |
| localUser               | varchar(20) | YES  |     | NULL    |       |
| localGroup              | varchar(20) | YES  |     | NULL    |       |
| jobName                 | varchar(30) | YES  |     | NULL    |       |
| queueName               | varchar(20) | YES  |     | NULL    |       |
| jobCreateTime           | varchar(10) | YES  |     | NULL    |       |
| jobQueuedTime           | varchar(10) | YES  |     | NULL    |       |
| jobEligibleTime         | varchar(10) | YES  |     | NULL    |       |
| startTime               | varchar(10) | YES  |     | NULL    |       |
| endTime                 | varchar(10) | YES  |     | NULL    |       |
| execHOST                | varchar(30) | YES  |     | NULL    |       |
| resource_List_cput      | time        | YES  |     | NULL    |       |
| resource_List_neednodes | varchar(30) | YES  |     | NULL    |       |
| sessionID               | int(10)     | YES  |     | NULL    |       |
| exitStatus              | int(2)      | YES  |     | NULL    |       |
| resources_Used_cput     | time        | YES  |     | NULL    |       |
| resources_Used_mem      | int(16)     | YES  |     | NULL    |       |
| resources_Used_vmem     | int(16)     | YES  |     | NULL    |       |
| resources_Used_walltime | time        | YES  |     | NULL    |       |
+-------------------------+-------------+------+-----+---------+-------+