INFSO-SSA International Collaboration to Extend and Advance Grid Education
Information System
Valeria Ardizzone, INFN Catania
Grid Computing Course, Catania
INFSO-SSA Valeria Ardizzone, Grid Computing Course, Univ. of Catania

Information System
What is it?
– A system that collects information on the state of resources.
Why?
– To discover the resources of the grid and their nature.
– To provide data that helps whoever manages the workload to do so more efficiently.
– To check the health status of resources.
How?
– By monitoring the state of resources locally and publishing the relevant information to the information system.
– By adopting a data model that MUST be well known to all components that want to access the monitored information.
– By using different approaches, which the next slides examine.
Adopted Information Systems
The BDII (Berkeley DB Information Index)
– Has been adopted in the LCG middleware as the Information System provider.
– It is an evolution of the Globus Monitoring and Discovery Service (MDS).
– LCG-2 currently adopts BDII as its Information System.
– It is based on Lightweight Directory Access Protocol (LDAP) servers.
The Relational Grid Monitoring Architecture (R-GMA)
– Is an implementation of the Grid Monitoring Architecture (GMA) standardized by the Global Grid Forum (GGF).
– It is a relational implementation of the GMA.
– It is strongly Web Services oriented.
– It will be adopted by the gLite middleware.
LCG Information System
Collecting Information
Information is gathered at different levels:
– Lower level: Grid Resource Information Server (GRIS)
    Collects information on the state of a given resource.
    One GRIS on top of each resource.
    A set of scripts and sensors that try to extract useful information on the resource.
– Medium level: Grid Index Information Server (GIIS)
    Collects information on the resources of a given site.
    One GIIS for each site.
– Higher level: BDII
    Collects information on the resources of a given VO.
    One BDII for each VO (suggested solution).
Way of collecting information:
– Pull model (higher-level servers periodically query lower-level servers)
– LDAP query model
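The three-level pull hierarchy can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: it does not use LDAP, the class names `Gris`, `Giis` and `Bdii` and their `pull()`/`refresh()` methods are hypothetical, and the resource attributes are invented sample data.

```python
class Gris:
    """Lowest level: reports the state of a single resource."""

    def __init__(self, resource_name, state):
        self.resource_name = resource_name
        self.state = state  # invented sample attributes, e.g. {"FreeCPUs": 2}

    def pull(self):
        return {self.resource_name: dict(self.state)}


class Giis:
    """Site level: pulls from every GRIS registered at the site."""

    def __init__(self, site_name, gris_list):
        self.site_name = site_name
        self.gris_list = gris_list

    def pull(self):
        merged = {}
        for gris in self.gris_list:
            merged.update(gris.pull())
        return {self.site_name: merged}


class Bdii:
    """Top level: pulls from a configured list of site GIIS servers."""

    def __init__(self, giis_list):
        self.giis_list = giis_list
        self.cache = {}

    def refresh(self):
        # Higher level queries lower level (pull model); a scheduler
        # would call this periodically.
        snapshot = {}
        for giis in self.giis_list:
            snapshot.update(giis.pull())
        self.cache = snapshot
        return self.cache


site = Giis("SITE-A", [Gris("ce01", {"FreeCPUs": 2}), Gris("se01", {"FreeGB": 100})])
bdii = Bdii([site])
view = bdii.refresh()
```

Each level only knows the level directly below it, which is what lets a client query one top-level server instead of every resource.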
BDII (the present)
The Berkeley Database Information Index (BDII)
– Developed within the context of the LCG project.
– Solves the instability problems that MDS shows when the number of sites grows too large.
– Sits on top of the site GIIS servers.
– One for each VO.
– Centralized system.
– Three levels of hierarchy.
– Accessed by the Workload Management System.
Way of working
– One GRIS for each resource.
– One GIIS for each site, collecting information from the GRIS servers below it.
– One BDII for a given VO, collecting information from the GIIS servers below it.
– Two LDAP servers, one for write access and one for read access.
– Every two minutes a cron job runs a script that collects information from a list of site GIIS servers.
– The list of GIIS servers is kept in the configuration file of the BDII.
Practicals
Ensure that the LCG_GFAL_INFOSYS environment variable is set:
– export LCG_GFAL_INFOSYS=grid004.ct.infn.it:2170
The commands to query the top-level BDII are:
– lcg-info
– lcg-infosites
lcg-info
Practicals: lcg-info (1/5)
-h/--help: shows the help.
--list-attrs: prints the list of the possible attributes.
--list-ce: lists the CEs which satisfy a query, or all the CEs if no query is given.
--list-se: lists the SEs which satisfy a query, or all the SEs if no query is given.
--bdii: specifies a BDII in the form <host>:<port>. If not given, the value of the environment variable LCG_GFAL_INFOSYS is used. If that is not defined, the command returns an error.
--vo: restricts the output to CEs or SEs where the given VO is authorized.
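The fallback rule for the BDII endpoint (explicit option first, then LCG_GFAL_INFOSYS, otherwise an error) can be sketched as follows. This is a minimal illustration, not the tool's actual code: `resolve_bdii` is a hypothetical helper, and it assumes the value has the `host:port` shape seen in the `export` example.

```python
import os


def resolve_bdii(cli_value=None, env=None):
    """Return (host, port) from an explicit value, else from LCG_GFAL_INFOSYS."""
    env = os.environ if env is None else env
    value = cli_value or env.get("LCG_GFAL_INFOSYS")
    if not value:
        raise ValueError("no BDII given and LCG_GFAL_INFOSYS is not set")
    # Split on the last colon so the host part may itself contain dots.
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return host, int(port)


endpoint = resolve_bdii(env={"LCG_GFAL_INFOSYS": "grid004.ct.infn.it:2170"})
```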
Practicals: lcg-info (2/5)
Examples:
$ lcg-info --vo gilda --list-ce
- CE: ce-gilda-edu.ceta-ciemat.es:2119/jobmanager-lcgpbs-infinite
- CE: ce-gilda-edu.ceta-ciemat.es:2119/jobmanager-lcgpbs-long
- CE: ce-gilda-edu.ceta-ciemat.es:2119/jobmanager-lcgpbs-short
- CE: ce-nano-37.to.infn.it:2119/jobmanager-lcgpbs-infinite
- CE: ce-nano-37.to.infn.it:2119/jobmanager-lcgpbs-long
…..
Practicals: lcg-info (3/5)
$ lcg-info --vo gilda --list-se
- SE: aliserv6.ct.infn.it
- SE: dgt02.ui.savba.sk
- SE: egee016.cnaf.infn.it
- SE: gilda-se-01.pd.infn.it
- SE: gilda04.ihep.ac.cn
- SE: gn0.hpcc.sztaki.hu
…..
Practicals: lcg-info (4/5)
$ lcg-info --vo gilda --list-attrs
…..
$ lcg-info --vo gilda --attrs TotalCPUs --list-ce
- CE: ce-gilda-edu.ceta-ciemat.es:2119/jobmanager-lcgpbs-infinite
  - TotalCPUs 1
- CE: ce-gilda-edu.ceta-ciemat.es:2119/jobmanager-lcgpbs-long
  - TotalCPUs 1
- CE: ce-gilda-edu.ceta-ciemat.es:2119/jobmanager-lcgpbs-short
  - TotalCPUs 1
- CE: ce-nano-37.to.infn.it:2119/jobmanager-lcgpbs-infinite
  - TotalCPUs 4
- CE: ce-nano-37.to.infn.it:2119/jobmanager-lcgpbs-long
  - TotalCPUs 4
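Output of this shape is easy to post-process in a script. The sketch below assumes one `- CE:` or `- SE:` entry per line followed by `- Attribute value` lines, as in the example above; `parse_lcg_info` is a hypothetical helper, and the real tool's spacing may differ.

```python
def parse_lcg_info(text):
    """Parse '- CE: name' / '- Attr value' lines into {name: {attr: value}}."""
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("-"):
            continue  # skip blank lines and anything that is not an entry
        body = line[1:].strip()
        if body.startswith(("CE:", "SE:")):
            # Split only on the first colon: CE names contain ':2119'.
            current = body.split(":", 1)[1].strip()
            result[current] = {}
        elif current is not None and body:
            attr, _, value = body.partition(" ")
            result[current][attr] = value.strip()
    return result


sample = """\
- CE: ce-nano-37.to.infn.it:2119/jobmanager-lcgpbs-infinite
  - TotalCPUs 4
- CE: ce-nano-37.to.infn.it:2119/jobmanager-lcgpbs-long
  - TotalCPUs 4
"""
cpus = parse_lcg_info(sample)
```

Values are kept as strings; converting `TotalCPUs` to an integer is left to the caller.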
Practicals: lcg-info (5/5)
$ lcg-info --vo gilda --attrs AvailableSpace,UsedSpace --list-se
- SE: aliserv6.ct.infn.it
  - AvailableSpace
  - UsedSpace
- SE: dgt02.ui.savba.sk
  - AvailableSpace
  - UsedSpace
- SE: egee016.cnaf.infn.it
  - AvailableSpace
  - UsedSpace
lcg-infosites
Practicals: lcg-infosites (1/6)
-h/--help: help option.
--vo: VO name (mandatory).
--is: specifies a non-default top-level BDII.
Some options:
– se: the names of the SEs supported by the user's VO.
– ce: information on the number of CPUs, running jobs, etc.
– rb: the names of the RBs available for each VO.
– sitenames: the names of the LCG sites.
– tag: the names of the tags for the software installed at each site, printed together with the corresponding CE.
– closeSE: the names of the CEs where the user's VO is allowed to run, together with their corresponding closest SEs.
Practicals: lcg-infosites (2/6)
Examples:
$ lcg-infosites --vo gilda ce
#CPU  Free  Total Jobs  Running  Waiting  ComputingElement
ce-gilda-edu.ceta-ciemat.es:2119/jobmanager-lcgpbs-short
ce-gilda-edu.ceta-ciemat.es:2119/jobmanager-lcgpbs-long
ce-gilda-edu.ceta-ciemat.es:2119/jobmanager-lcgpbs-infinite
grid010.ct.infn.it:2119/jobmanager-lcgpbs-short
grid010.ct.infn.it:2119/jobmanager-lcgpbs-long
grid010.ct.infn.it:2119/jobmanager-lcgpbs-infinite
Practicals: lcg-infosites (3/6)
$ lcg-infosites --vo gilda se
Avail Space(Kb)  Used Space(Kb)  Type  SEs
n.a  se01-gilda-edu.ceta-ciemat.es
n.a  grid0028.datagrid.cnr.it
n.a  grid0029.datagrid.cnr.it
n.a  aliserv6.ct.infn.it
n.a  trigriden01.unime.it
Practicals: lcg-infosites (3/6)
$ lcg-infosites --vo gilda sitenames
CE-CETA-CIEMAT
CNR-ROMA
GILDA-INFN-CATANIA
GILDA-ING-MESSINA
GILDA-TORINO
ICEAGE-CATANIA
IHEP-BEIJING
….
Practicals: lcg-infosites (4/6)
$ lcg-infosites --vo gilda tag
Name of the CE: grid010.ct.infn.it
VO-gilda-scons
VO-gilda-GILDA-echo
VO-gilda-GILDA-gilda
VO-gilda-GRELC_DAS_2_1
…..
Practicals: lcg-infosites (5/6)
$ lcg-infosites --vo gilda rb
glite-rb.ct.infn.it:7772
$ lcg-infosites --vo gilda lfc
lfc-gilda.ct.infn.it
gilda02.ihep.ac.cn
Practicals: lcg-infosites (6/6)
$ lcg-infosites --vo gilda closeSE
Name of the CE: ce-gilda-edu.ceta-ciemat.es:2119/jobmanager-lcgpbs-short
  se01-gilda-edu.ceta-ciemat.es
Name of the CE: grid0021.datagrid.cnr.it:2119/jobmanager-lcgpbs-infinite
  grid0028.datagrid.cnr.it
Name of the CE: grid010.ct.infn.it:2119/jobmanager-lcgpbs-short
  aliserv6.ct.infn.it
…..
RELATIONAL GRID MONITORING ARCHITECTURE (RGMA)
Grid Monitoring Architecture (GMA)
[Diagram: PRODUCER, CONSUMER and REGISTRY, connected by "Store location", "Lookup location" and "Transfer data".]
– The Producer stores its location (URL) in the Registry.
– The Consumer looks up producer URLs in the Registry.
– The Consumer contacts the Producer to get all the data, or the Consumer can listen to the Producer for new data.
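The three GMA roles can be sketched as a toy in-process model. This is an illustration only: the class and method names are hypothetical, and a Python object reference stands in for the URL that a real producer would store in the Registry.

```python
class Registry:
    """Maps a table name to the producers that publish it."""

    def __init__(self):
        self._locations = {}

    def register(self, table, producer):
        self._locations.setdefault(table, []).append(producer)

    def lookup(self, table):
        return list(self._locations.get(table, []))


class Producer:
    def __init__(self, registry, table):
        self.table = table
        self.rows = []
        self.listeners = []
        registry.register(table, self)  # "store location"

    def publish(self, row):
        self.rows.append(row)
        for callback in self.listeners:  # push new data to listening consumers
            callback(row)


class Consumer:
    def __init__(self, registry):
        self.registry = registry
        self.received = []

    def fetch_all(self, table):
        rows = []
        for producer in self.registry.lookup(table):  # "lookup location"
            rows.extend(producer.rows)                # "transfer data"
        return rows

    def listen(self, table):
        for producer in self.registry.lookup(table):
            producer.listeners.append(self.received.append)


registry = Registry()
producer = Producer(registry, "jobs")
producer.publish({"id": 1})

consumer = Consumer(registry)
snapshot = consumer.fetch_all("jobs")  # one-off request for all the data
consumer.listen("jobs")                # subscribe to new data
producer.publish({"id": 2})
```

The Registry never sees the data itself; it only tells consumers where to find producers.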
R-GMA: Schema-Registry-Mediator
[Diagram: the R-GMA Server presents a VIRTUAL DATABASE. The SCHEMA lists the column definitions of tables 1 to 4; the REGISTRY lists, for each table, the details of the producers (P1, P2, P3) publishing to it; the MEDIATOR sits alongside them.]
– SCHEMA: holds the names and definitions of all the tables in the virtual database, and their authorization rules.
– REGISTRY: holds the details of all producers that are publishing to tables in the virtual database, and also the details of "continuous" consumers.
– MEDIATOR: a set of rules for deciding which data providers to contact for any given query.
R-GMA: Producer-Consumer
[Diagram: the same R-GMA Server with SCHEMA, REGISTRY and MEDIATOR; producers P1, P2, P3 write with SQL "INSERT", consumers C1 and C2 read with SQL "SELECT".]
– Producers: the data providers for the virtual database. Writing data into the virtual database is known as publishing, and data is always published in complete rows, known as tuples. There are three types of producer: Primary, Secondary and On-demand.
– Consumer: represents a single SQL SELECT query on the virtual database. The query is matched against the list of available producers in the Registry. The consumer service then selects the best set of producers to contact and sends the query directly to each of them to obtain the answer tuples.
R-GMA command line tool
R-GMA Command Line Tool
Before you start the R-GMA command line tool, make sure you have a proxy certificate:
$ voms-proxy-info --all
Run the rgma command and you should receive the following message on startup:
$ rgma
Welcome to the R-GMA virtual database for Virtual Organisations.
================================
Your local R-GMA server is: ……
rgma>
Commands
– Commands are entered by typing at the rgma> prompt and pressing Enter to execute them.
– The history of executed commands can be accessed with the Up and Down arrow keys.
– To search the history for a command, press CTRL-R and type the first few letters of the command to recall.
– Command autocompletion is supported (press Tab when you have partly entered a command).
General Commands
help and help <command>
    Display general help information, or help specific to a command.
exit or q
    Exit from the R-GMA command line interface.
show and set
    … (trivial)
clear history
    Clear the current session's history of executed commands.
write history <file>
    Write the session command history to a file.
write results <file>
    Write query results to a file.
read <file>
    Read in the specified file and execute the commands contained in it.
Table Commands
show tables
    Display the names of all tables existing in the Schema.
describe <table>
    Show all information about the structure of a table.
create table
    Create a table in the R-GMA schema.
drop table
    Delete a table from the R-GMA schema.
Exercise 1: create a table (1/2)
rgma> create TABLE TutorTable(COD_Test INT PRIMARY KEY, Application VARCHAR(20), Status VARCHAR(10), PercStatus INT, Owner VARCHAR(25))
rgma> show tables
…….
| NetworkUDPPacketLoss          |
| NetworkFileTransferThroughput |
| TutorTable                    |
Exercise 1: create a table (2/2)
rgma> describe TutorTable
| Column name     | Type        | Primary key | Can be NULL |
| COD_Test        | INTEGER     | Yes         | No          |
| Application     | VARCHAR(20) | No          | Yes         |
| Status          | VARCHAR(10) | No          | Yes         |
| PercStatus      | INTEGER     | No          | Yes         |
| Owner           | VARCHAR(25) | No          | Yes         |
| MeasurementDate | DATE        | No          | No          |
| MeasurementTime | TIME        | No          | No          |
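To experiment with the table definition from Exercise 1 without an R-GMA server, the same statement can be tried against SQLite from Python. This is only a stand-in: SQLite is not R-GMA, it does not add the MeasurementDate and MeasurementTime columns shown by `describe`, and the inserted row is sample data taken from the later exercises.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE TutorTable("
    "COD_Test INT PRIMARY KEY, Application VARCHAR(20), "
    "Status VARCHAR(10), PercStatus INT, Owner VARCHAR(25))"
)

# PRAGMA table_info is SQLite's rough counterpart of 'describe':
# each row is (cid, name, type, notnull, default, pk).
columns = connection.execute("PRAGMA table_info(TutorTable)").fetchall()
column_names = [column[1] for column in columns]
primary_keys = [column[1] for column in columns if column[5]]

connection.execute(
    "INSERT INTO TutorTable VALUES (?, ?, ?, ?, ?)",
    (1, "TestProducer", "Start", 10, "Valeria"),
)
rows = connection.execute(
    "SELECT Application, Status FROM TutorTable"
).fetchall()
connection.close()
```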
Producer Types
– Primary Producer
– Secondary Producer
– On-Demand Producer
[Diagram: for each producer type, user code calls the Producer API, which talks to the Producer Service; consumers (C) send queries to the service and receive tuples. Labels shown: "Tuple Storage", "Control and inserted tuples", "Control only", "SELECT *".]
Query and Storage Types
– Continuous: as soon as new data becomes available, it is broadcast to all interested parties.
– Latest: corresponds to the intuitive idea of "current information".
– History: returns time-sequenced data.
[Diagram: producer P1, registered in the REGISTRY, with a latest store and a continuous and history store.]
The LATEST RETENTION PERIOD (LRP) and HISTORY RETENTION PERIOD (HRP) allow producers to periodically purge old tuples and give a precise meaning to the "current state".
The tuple store can be in memory or in a database.
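The effect of the two retention periods can be sketched with a toy tuple store. This is an illustration under stated assumptions, not R-GMA's implementation: `TupleStore` and its methods are hypothetical, times are plain seconds passed in by the caller, and the 50 second and 2 minute periods are just sample values.

```python
class TupleStore:
    """Toy store with a history list and a latest-per-key map."""

    def __init__(self, latest_retention, history_retention):
        self.latest_retention = latest_retention    # seconds
        self.history_retention = history_retention  # seconds
        self.history = []  # (timestamp, key, row), in insertion order
        self.latest = {}   # key -> (timestamp, row)

    def insert(self, key, row, now):
        self.history.append((now, key, row))
        self.latest[key] = (now, row)

    def purge(self, now):
        """Drop tuples older than their retention period."""
        self.history = [
            entry for entry in self.history
            if now - entry[0] <= self.history_retention
        ]
        self.latest = {
            key: value for key, value in self.latest.items()
            if now - value[0] <= self.latest_retention
        }


store = TupleStore(latest_retention=50, history_retention=120)
store.insert(1, "Start", now=0)
store.insert(1, "Step1", now=60)

store.purge(now=100)
latest_at_100 = dict(store.latest)    # Step1 is 40 s old: still "current"
history_at_100 = list(store.history)  # both tuples are within 120 s

store.purge(now=130)
latest_at_130 = dict(store.latest)    # Step1 is 70 s old: no longer "current"
history_at_130 = list(store.history)  # the tuple from t=0 has expired
```

The latest retention period is what makes "current state" precise: once a tuple is older than it, the key simply has no current value.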
Continuous
[Diagram: the Producer API sends SQL "CREATE TABLE" and SQL "INSERT" to the Producer Servlet, which stores the table description in the Schema (TableName, Column) and its location in the Registry (TableName, URL, Predicate). The Consumer API sends SQL "SELECT" to the Consumer Servlet, which looks up the location in the Registry and receives the result set from the Producer Servlet as a continuous stream.]
History or Latest
[Diagram: the Producer API sends SQL "CREATE TABLE" and SQL "INSERT" to the Producer Servlet, which stores the table description in the Schema (TableName, Column) and its location in the Registry (TableName, URL, Predicate). The Consumer API sends SQL "SELECT" to the Consumer Servlet, which looks up the location in the Registry, sends the query to the Producer Servlet and receives the result set.]
Producer Properties (cli)
Using the command line tool you may work with one producer at a time.
The current producer type can be displayed using:
rgma> show producer
Set the latest retention period for tuples published by the producer:
rgma> set producer latestretentionperiod | lrp <period> [<units>]
Set the history retention period for the producer. If the producer does not support history queries, this command has no effect:
rgma> set producer historyretentionperiod | hrp <period> [<units>]
The producer handles the INSERT statement. The SQL INSERT statement may be used to add data to the system:
rgma> INSERT INTO <table> VALUES ('a', 'b', 'c', 'd')
Exercise 2: create a Producer
rgma> set producer latest
Producer type : continuous latest
rgma> set producer latestretentionperiod 50 seconds
Set producer LRP to 50 seconds
rgma> set producer historyretentionperiod 2 minutes
Set producer HRP to 2 minutes
rgma> describe TutorTable
rgma> insert INTO TutorTable values(001,'TestProducer','Start',10,'Valeria');
Inserted 1 row into TutorTable
Consumer Properties (cli) 1
The behaviour of a Consumer varies according to the type of query being executed. In R-GMA there are three basic types of query:
– LATEST queries: return only the most recent tuple for each primary key.
– HISTORY queries: return all historical tuples for each primary key.
– CONTINUOUS queries: return tuples continuously as they are inserted.
The type of query can be changed using the SET QUERY command as follows:
rgma> SET QUERY LATEST | CONTINUOUS | HISTORY
The current query type can be displayed using:
rgma> SHOW QUERY
Consumer Properties (cli) 2
The maximum age of tuples to return can also be controlled. To limit the age of latest or historical tuples, use the MAXAGE property:
rgma> SET MAXAGE <age> seconds | minutes | hours | days
The current maximum tuple age can be displayed using:
rgma> SHOW MAXAGE
To disable the maximum age, set it to none:
rgma> SET MAXAGE none
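The difference between latest and history queries, and the effect of a maximum age, can be sketched over a plain list of timestamped tuples. This is a toy model, not R-GMA code: the function names, the dictionary layout with `key` and `time` fields, and the sample data are all assumptions made for the illustration.

```python
def history_query(tuples, now, max_age=None):
    """All tuples, oldest first, optionally limited to max_age seconds."""
    rows = [
        row for row in tuples
        if max_age is None or now - row["time"] <= max_age
    ]
    return sorted(rows, key=lambda row: row["time"])


def latest_query(tuples, now, max_age=None):
    """Only the most recent tuple for each primary key."""
    newest = {}
    for row in history_query(tuples, now, max_age):
        newest[row["key"]] = row  # later tuples overwrite earlier ones
    return sorted(newest.values(), key=lambda row: row["key"])


data = [
    {"key": 1, "time": 0, "status": "Start"},
    {"key": 1, "time": 40, "status": "Step1"},
    {"key": 2, "time": 50, "status": "Start"},
]
latest = latest_query(data, now=60)
recent = history_query(data, now=60, max_age=30)
```

A continuous query is not modelled here, since it delivers tuples as they arrive rather than answering from stored data.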
Consumer Properties (cli) 3
The final property affecting queries is the timeout.
– For a latest or history query, the timeout prevents a problem (e.g. a network failure) from keeping the query from ever completing.
– For a continuous query, the timeout indicates how long the query will continue to return new tuples.
The default timeout is 1 minute and it can be changed using:
rgma> SET TIMEOUT <timeout> seconds | minutes | hours | days
The current timeout can be displayed using:
rgma> SHOW TIMEOUT
Consumer Properties (cli) 4
Querying data uses the standard SQL SELECT statement:
rgma> SELECT * FROM <table>
Set the output format for results. table formats the results in a table, tsv outputs tab-separated results and csv outputs comma-separated results:
rgma> SET output table|tsv|csv
set output csv:
1,TestProducer,Start,10,Valeria, ,22:07:36,
set output tsv:
1 TestProducer Start 10 Valeria :07:36
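The csv and tsv formats differ only in the field separator, which can be illustrated with Python's standard `csv` module. `format_rows` is a hypothetical helper and the row is sample data; the result is not guaranteed to match the tool's output byte for byte (for instance, it writes no trailing separator).

```python
import csv
import io


def format_rows(rows, output="csv"):
    """Render rows as comma-separated or tab-separated text."""
    delimiter = {"csv": ",", "tsv": "\t"}[output]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = [(1, "TestProducer", "Start", 10, "Valeria")]
as_csv = format_rows(rows, "csv")
as_tsv = format_rows(rows, "tsv")
```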
Exercise 3: create a Consumer
rgma> set query latest
Set query type to latest
rgma> set maxage 1 minutes
Set max age to 1 minutes
rgma> set timeout 50 seconds
Set timeout to 50 seconds
rgma> set output tsv
Set output format to 'tsv'
rgma> select Application,Status FROM TutorTable
TestProducer Start
TestProducer Step1
TestProducer Step2
Exercise 4: Producer & Consumer
Continuous Producer and Consumer:
(NOTE: open two rgma client tools, one for the Consumer and the other for the Producer.)
Consumer's client:
rgma> set query continuous
rgma> set timeout 50 seconds
rgma> set maxage 30
rgma> set output csv
rgma> select * from TutorTable
Producer's client:
rgma> clear history
rgma> set producer continuous
rgma> insert INTO TutorTable values(002,'TestProducer','Step2',20,'Valeria');
rgma> insert INTO TutorTable values(003,'TestProducer','Step3',30,'Valeria');
rgma> insert INTO TutorTable values(004,'TestProducer','Step4',40,'Valeria');
rgma> insert INTO TutorTable values(005,'TestProducer','Step5',50,'Valeria');
rgma> write history Prod_comm.rgma
Batch mode
The command line tool can be used in batch mode in three ways:
rgma -c <command> [-c <command> …]
    Executes the commands and exits.
rgma -f <file>
    Executes the commands in the file sequentially and exits.
Commands embedded in a shell script:
#!/bin/sh
$RGMA_HOME/bin/rgma <<EOF
set query latest
select Application,Status FROM TutorTable
EOF
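When batch commands are generated by a program, it helps to build the `-c` arguments as a list so that quoting is handled once. The sketch below only constructs the argument list and does not run anything; `rgma_batch_argv` is a hypothetical helper, and it assumes the executable is reachable as `rgma`.

```python
import shlex


def rgma_batch_argv(commands, executable="rgma"):
    """Build an argument list with one -c option per command."""
    argv = [executable]
    for command in commands:
        argv += ["-c", command]
    return argv


argv = rgma_batch_argv([
    "set query latest",
    "select Application,Status FROM TutorTable",
])
# A shell-safe single line, useful for logging or for pasting into a terminal.
shell_line = shlex.join(argv)
```

On a machine where the tool is installed, `argv` could be passed to `subprocess.run`; passing a list avoids a second round of shell quoting.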
Exercise 5: Producer & Consumer
Continuous Producer and Consumer:
(NOTE: open one rgma client tool, for the Consumer only. Open one shell for the Producer.)
Consumer's client:
rgma> set query continuous
rgma> set timeout 60 seconds
rgma> set output csv
rgma> select * from TutorTable
Producer's shell:
rgma -c "set producer continuous"
rgma -c "insert INTO TutorTable values(006,'A','Starting',00,'Valeria');"
rgma -c "insert INTO TutorTable values(005,'A','Running',30,'Valeria');"
Exercise 6: Producer & Consumer
Continuous Producer and Consumer:
(NOTE: open one rgma client tool, for the Consumer only. Open one shell for the Producer.)
Consumer's client:
rgma> set query continuous
rgma> set timeout 60 seconds
rgma> write results Results.rgma
rgma> select * from TutorTable
Producer's shell:
rgma -f Prod_comm.rgma
R-GMA APIs
APIs exist in Java, C and C++.
– For clients (servlets are contacted behind the scenes).
They include methods for:
– Creating consumers.
– Creating primary and secondary producers.
– Setting the type of queries, the type of producers, retention periods, timeouts…
– Retrieving tuples and inserting data.
– …
You can create your own Producer or Consumer.
The R-GMA Browser
The easiest way to try out R-GMA.
– It is installed on the machine running the Registry and Schema:
Using the Browser you can do the following:
– Browse the tables in the schema.
– Look at the table definitions.
– See all the available producers for a table.
– Query a table.
– Query only selected producers.
R-GMA Web Browser