EDG Final Review Demonstration


1 EDG Final Review Demonstration
WP9 Earth Observation Applications: Metadata usage in EDG
Good afternoon. Christine and I will present metadata usage in Earth Observation.
Authors: Christine Leroy, Wim Som de Cerff

2 Earth Observation: Metadata usage in EDG
- Focus will be on the RMC: Replica Metadata Catalogue
- Validation use case: ozone profile validation
- Common EO problem: measurement validation
  - Applies to (almost) all instruments and data products, not only GOME, not only ozone profiles
  - Scientists involved are spread over the world
- Validation consists of finding, for example, fewer than 10 profiles out of 28,000 in coincidence with one lidar profile for a given day
- Tools available for metadata on the Grid: RMC, Spitfire

We will show our metadata usage through our validation use case: ozone profile validation. Profiles are produced using satellite data (point to satellite image) and validated using ground-based measurements (point to lidar picture), a very common way of validating satellite measurements; it is used for almost all satellite instruments. Why metadata in EO? All our data have time and geolocation information as parameters. Ground-based measurements are point measurements made during a measurement campaign; satellite measurements have global coverage and span several years. Validation consists of finding, for example, fewer than 10 profiles out of 28,000 in coincidence with one lidar profile for a given day. Within EDG we have two tools available for metadata: the RMC and Spitfire. We will focus on the RMC.
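The coincidence criterion described above can be sketched as a simple filter over time and geolocation. This is only an illustration of the idea (the profile records, station coordinates and distance threshold below are hypothetical; in the demo the selection is done by an RMC query):

```python
from datetime import datetime, timedelta

# Hypothetical profile records: (datetimestart, latitude, longitude)
profiles = [
    (datetime(1997, 1, 29, 12, 0), 44.1, 5.8),    # near the lidar station
    (datetime(1997, 1, 29, 12, 5), -10.0, 120.0), # far away
    (datetime(1997, 2, 3, 12, 0), 44.0, 5.7),     # wrong day
]

# Assumed station coordinates and day of the campaign
station_lat, station_lon = 43.9, 5.7
day = datetime(1997, 1, 29)

def coincident(t, lat, lon, max_dist_deg=2.0):
    """A profile is coincident if it falls on the given day and
    within a small latitude/longitude box around the station."""
    same_day = day <= t < day + timedelta(days=1)
    near = (abs(lat - station_lat) <= max_dist_deg
            and abs(lon - station_lon) <= max_dist_deg)
    return same_day and near

matches = [p for p in profiles if coincident(*p)]
print(len(matches))  # only a handful of the thousands of profiles survive
```

The point of putting the time and geolocation attributes in the metadata catalogue is exactly that this filter runs on the catalogue, not on the data files themselves.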

3 Demonstration outline: Replica Metadata Catalogue (RMC) usage
- Profile processing: using the RMC to register metadata of the resulting output
- Profile validation: using the RMC to find coincident files
- RMC usage from the command line: will show the content of the RMC and the attributes we use
- Show the result of the validation

Our demonstration will show profile processing, validation, and how the metadata is stored in the RMC. Profile processing and validation are jobs that take some time, so we will first submit these two jobs; while they run we will show you how the RMC can be used and what our catalogue looks like. Then we will see the results of our processing and validation jobs.

4 GOME NNO Processing
- Select an LFN from a precompiled list of non-processed orbits
- Verify that the Level 1 product is replicated on some SE
- Verify that the Level 2 product has not yet been processed
- Create a file containing the LFN of the Level 1 file to be processed
- Create a JDL file, submit the job, monitor execution
- During processing, profiles are registered in the RM and metadata is stored in the RMC
- Query the RMC for the resulting attributes

We will process a GOME Level 1 orbit file that has previously been replicated on a Grid SE but has not yet been processed (i.e. the Level 2 file is not yet registered in the RC). This shows how our metadata is produced at runtime and how it is stored in the RMC. [Christine starts NNO processing] First we select a logical file name from a list of non-processed orbits, then check that the file is available on some SE by querying the Replica Manager. The same is done to check that the Level 2 file is not there yet. Then a file is created which contains the LFN of the file to be processed. Normally this file will contain a number of filenames, but for the demo only one file will be processed. The next step is to generate a JDL file and submit it to the RB. During processing, the metadata is extracted from the resulting profiles and stored directly in the RMC. After successful completion, the RMC can be queried to show the resulting attributes; the attributes will be explained later in the presentation. [if time] We have two processing algorithms for producing ozone profiles. The one we showed is fast, based on neural networks, and is used for near-real-time production. The other is based on optimal interpolation; it is slower but more precise and is used for offline processing. Both are validated using the lidar ground-based measurements.
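The selection logic in the steps above (pick orbits whose Level 1 file is replicated but whose Level 2 product does not yet exist) amounts to a set difference. A minimal sketch, with hypothetical catalogue contents and a hypothetical naming convention standing in for the Replica Manager lookups:

```python
# Hypothetical catalogue contents (in EDG these come from edg-rm lookups)
level1_replicated = {"orbit_10843_lv1", "orbit_10844_lv1", "orbit_10845_lv1"}
level2_processed = {"orbit_10843_utv"}  # Level 2 LFNs already registered

def level2_name(lfn_lv1):
    # Hypothetical naming convention linking Level 1 and Level 2 LFNs
    return lfn_lv1.replace("_lv1", "_utv")

# Orbits whose Level 1 exists on some SE and whose Level 2 is absent
to_process = sorted(
    lfn for lfn in level1_replicated
    if level2_name(lfn) not in level2_processed
)
print(to_process)  # these LFNs go into the file handed to the batch script
```

In the demo the list holds a single LFN; in production the same file simply grows to cover many orbits.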

5 Validation Job submission
- Query the RMC for coincident data LFNs (lidar and profile data)
- Submit the job, specifying the LFNs found
- Get the data location for the LFNs from the RM
- Get the data to the WN from the SE and start the calculation
- Get the output data plot
- Show the result

As the validation job will take approximately 10 minutes, we will submit it first, before explaining the RMC attributes we use. The validation job compares previously processed ozone profiles with measurements from a lidar station. [Christine starts validation job] First the RMC is queried for the LFNs of the ozone profiles and lidar measurements that are coincident in time and geolocation. In our case all profiles are collected in coincidence with a measurement campaign at the lidar station located at Haute-Provence (in France) for the day 29 January 1997. The resulting LFNs are used in the submitted job. The running job on a WN gets its input files using the LFNs (using the RM to locate the SE) and starts the validation calculation. We will see the result of the validation later, as the calculation will take some time to finish.
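The submission step above can be sketched as generating a job description whose InputData lists the coincident LFNs returned by the catalogue query. The executable name and LFNs below are hypothetical; the attribute names follow the general EDG JDL conventions:

```python
def make_jdl(executable, lfns):
    """Build a minimal EDG-style JDL description for a validation job."""
    input_data = ", ".join(f'"lfn:{l}"' for l in lfns)
    return "\n".join([
        "[",
        f'Executable = "{executable}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        'OutputSandbox = {"std.out", "std.err"};',
        f"InputData = {{{input_data}}};",
        'DataAccessProtocol = {"file", "gridftp"};',
        "]",
    ])

jdl = make_jdl("validate.sh", ["profile_10844_utv", "lidar_ohp_19970129"])
print(jdl)
```

Because InputData names logical files only, the broker and the running job resolve physical locations through the Replica Manager, exactly as described in the bullet list.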

6 RMC usage: attributes
[Screenshot labels: command area; all attributes of the WP9 RMC; result area]
During processing we produced profiles which were stored on SEs and registered in the RM, with the extracted metadata stored in the RMC. We will now show the contents of our EO RMC. We do this using a specially built web interface, which constructs the RMC command, submits it and shows the result; normally we use the command-line rmc commands. In the area in the middle all the attributes are shown. As we have one table in the RMC database, we have put in all the attributes of our four different products (Level 1, Level 2 NNO, Level 2 Opera and lidar data). For each file type only the valid attributes are filled in; the others are left blank. The 'collectionname' attribute is used to differentiate between the different products. The main attributes used for the validation are datetimestart and datetimestop as time attributes, and the latitude and longitude parameters. [Christine creates a 'select all' query] Christine will now create a query to show the complete contents of the RMC, just to show the number of profiles produced for the demo. In the command area you will see the build-up of the command; the result will be shown in the result area. [Christine creates an attribute query] Christine will now show how to create a command to query the RMC for specific profiles, similar to the one used in the validation. The result area will show only the LFNs which match the query, which is of course only a fraction of the total number of files in the RMC. In this way the validation program does not have to scan through all the available profiles, but can directly access only those of interest. [if time, i.e. validation is not finished, Christine can show another query] [Christine now shows the validation result and explains it]
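The single-table layout described above can be pictured with a small in-memory stand-in for the RMC: every row carries all attributes, unused ones stay blank, and a range query returns only the matching LFNs. The rows and the query helper are invented for illustration; the attribute names are the ones on the slide:

```python
# In-memory stand-in for the single WP9 RMC table; absent attributes stay blank
rmc = [
    {"lfn": "orbit_10844_utv", "collectionname": "level2_nno",
     "datetimestart": 855575000.0, "latitudemin": 40.0, "longitudemin": 2.0},
    {"lfn": "lidar_ohp_19970129", "collectionname": "lidar",
     "datetimestart": 855576000.0, "latitudemin": 43.9, "longitudemin": 5.7},
    {"lfn": "orbit_10850_utv", "collectionname": "level2_nno",
     "datetimestart": 856000000.0, "latitudemin": -60.0, "longitudemin": 100.0},
]

def query(table, **criteria):
    """Return LFNs of rows matching all (attribute: (lo, hi)) range criteria."""
    out = []
    for row in table:
        # NaN for a missing attribute makes both comparisons fail, so
        # rows lacking the attribute never match (mirrors blank columns)
        if all(lo <= row.get(attr, float("nan")) <= hi
               for attr, (lo, hi) in criteria.items()):
            out.append(row["lfn"])
    return out

# Only files inside the time window are returned, so the validation job
# never has to scan the whole catalogue
hits = query(rmc, datetimestart=(855500000.0, 855600000.0))
print(hits)
```

This is exactly the property the speaker notes stress: the query narrows 28,000 profiles down to the handful of interest before any data file is touched.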

7 Metadata tools comparison
Replica Metadata Catalogue conclusions and future directions:
- The RMC provides possibilities for metadata storage
- Easy to use (CLI and API)
- No additional installation of software for the user
- RMC performance (response time) is sufficient for EO application usage
- More database functionality is needed: multiple tables, more data types, polygon queries, restricted access (VO, group, sub-group)

Many thanks to WP2 for helping us prepare the demo.

8 Backup slides

9 EO Metadata usage
Questions addressed by EO users: how to access a metadata catalogue using EDG Grid tools? How to select data corresponding to given geographical and temporal coordinates?
Context: in EO applications there is a large number of files (millions) with relatively small volume. Currently, metadata catalogues are built and queried to find the corresponding files.
GOME ozone profile validation use case: ~28,000 ozone profiles/day, i.e. 14 orbits with 2,000 profiles each. Validation with lidar data from 7 stations distributed worldwide.
Tools available for metadata on the Grid: RMC, Spitfire, MUIS (operational ESA catalogue) via the EO portal.
[Diagram: "Where is the right file?" — files spread over CEs and SEs, indexed by latitude, longitude and date]

10 Data and Metadata storage
- Data are stored on the SEs and registered using the RM commands
- Metadata are stored in the RMC, using the RMC commands
- Link between RM and RMC: the Grid Unique Identifier (GUID)
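The GUID link above can be pictured as two mappings sharing one key: the Replica Manager resolves LFN → GUID → physical replicas, while the RMC keys the metadata on the same GUID. A toy sketch (all identifiers are invented):

```python
# Replica Manager side: logical name -> GUID -> physical replicas (SURLs)
lfn_to_guid = {"orbit_10844_utv": "guid-8ab6f428"}
guid_to_surls = {"guid-8ab6f428": ["srm://se1.example.org/eo/file8ab6f428"]}

# Replica Metadata Catalogue side: attributes keyed on the same GUID
metadata = {"guid-8ab6f428": {"sensor": "GOME", "orbit": 10844, "datalevel": 2}}

def lookup(lfn):
    """Resolve an LFN to its replicas and its metadata via the shared GUID."""
    guid = lfn_to_guid[lfn]
    return guid_to_surls[guid], metadata[guid]

replicas, attrs = lookup("orbit_10844_utv")
print(attrs["orbit"], replicas[0])
```

Because both catalogues agree on the GUID, a metadata query yields LFNs that the Replica Manager can immediately turn into physical file locations.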

11 Usecase: Ozone profile validation
Step 1: Transfer Level 1 and LIDAR data to the Grid Storage Element
Step 2: Register Level 1 data with the Replica Manager; replicate to other SEs if necessary
Step 3: Submit jobs to process Level 1 data and produce Level 2 data
Step 4: Extract metadata from Level 2 data; store it in a database using Spitfire, or store it in the Replica Metadata Catalogue
Step 5: Transfer Level 2 data products to the Storage Element and register them with the Replica Manager
Step 6: Retrieve coincident Level 2 data by querying the Spitfire database or the Replica Metadata Catalogue
Step 7: Submit jobs to produce Level 2 / LIDAR coincident data and perform validation
Step 8: Visualize results

12 Which metadata tools in EDG?
Spitfire: a Grid-enabled middleware service for access to relational databases. Supports GSI and VOMS security. Consists of:
- the Spitfire server module, used to make your database accessible, via the Tomcat web server and Java servlets
- the Spitfire client libraries, used from the Grid to access your database (in Java and C++)
Replica Metadata Catalogue:
- an integral part of the data management services
- accessible via CLI and API (C++)
- no database management necessary
Both tools are developed by WP2. Our focus will be on the RMC.

13 Scalability (Demo)
This demonstrates just one job being submitted, processing just one orbit in a very short time, but the application tools we have developed (e.g. the batch and run scripts) can fully exploit the possibilities for parallelism:
- they allow submitting and monitoring tens or hundreds of jobs in one go
- each job may process tens or hundreds of orbits, just by adding more LFNs to the list of orbits to be processed
- the batch -b option specifies the number of orbits per job; the batch -c option specifies the number of jobs to generate
Used in this way, the Grid allows us to process and register several years of data very quickly. For example, just 47 jobs are needed to process one year of data (~4,700 orbits) at 100 orbits per job. This is very useful when re-processing large historical datasets and for testing differently 'tuned' versions of the same algorithm. The developed framework can easily be reused for any kind of job.
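The job-count example above is a ceiling division over the batch parameters:

```python
import math

orbits_per_year = 4700  # roughly 14 orbits/day over one year
orbits_per_job = 100    # the batch -b option
jobs_needed = math.ceil(orbits_per_year / orbits_per_job)
print(jobs_needed)  # 47 jobs cover a full year of data
```

Scaling orbits_per_job up or down trades fewer, longer jobs against more, shorter ones, which is what the -b and -c options control.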

14 GOME NNO Processing – Steps 1-2
Step 1) select an LFN from the precompiled list of non-processed orbits
>head proclist
Step 2) verify that the Level 1 product is replicated on some SE
>edg-rm --vo=eo lr lfn: lv1
srm://gw35.hep.ph.ic.ac.uk/eo/generated/2003/11/20/file8ab6f428-1b57-11d8-b587-e ff70

15 GOME NNO Processing – Steps 3-5
Step 3) verify that the Level 2 product has not yet been processed
>edg-rm --vo=eo lr lfn: utv
Lfn does not exist : lfn: utv
Step 4) create a file containing the LFN of the Level 1 file to be processed
>echo lv1 > lfn
Step 5) create a JDL file for the job (the batch script outputs the command to be executed)
>./batch nno-edg/nno -d jobs -l lfn -t
run jobs/0001/nno.jdl -t

16 GOME NNO Processing – Steps 6-7
Step 6) run the command to submit the job, monitor execution and retrieve results
>run jobs/0001/nno.jdl -t
Jan 14 16:28:45
Jan 14 15:31:42 Running grid001.pd.infn.it:2119/jobmanager-pbs-long
Jan 14 15:57:36 Done (Success) Job terminated successfully
Jan 14 16:24:01 Cleared user retrieved output sandbox
Step 7) query the RMC for the resulting attributes
>./listAttr utv
lfn= utv
instituteproducer=ESA
algorithm=NNO
datalevel=2
sensor=GOME
orbit=10844
datetimestart= E13
datetimestop= E13
latitudemax=89.756
latitudemin=
longitudemax=
longitudemin=0.1884
