EDG Final Review Demonstration


EDG Final Review Demonstration
WP9 Earth Observation Applications: Metadata Usage in EDG
Authors: Christine Leroy, Wim Som de Cerff
Email: Christine.Leroy@ipsl.jussieu.fr, sdecerff@knmi.nl

Good afternoon. Christine and I will present metadata usage in Earth Observation.

Earth Observation metadata usage in EDG

- Focus will be on the RMC: the Replica Metadata Catalogue
- Validation use case: ozone profile validation
- Measurement validation is a common EO problem; it applies to (almost) all instruments and data products, not only GOME and not only ozone profiles
- The scientists involved are spread over the world
- Validation consists of finding, for example, fewer than 10 profiles out of 28,000 in coincidence with one lidar profile for a given day
- Tools available for metadata on the Grid: RMC, Spitfire

Speaker notes: We will show our metadata usage through the validation use case. Ozone profiles are produced from satellite data (pointing to the satellite image) and validated using ground-based measurements (pointing to the lidar picture), a very common way of validating satellite measurements that is used for almost all satellite instruments. Why metadata in EO? All our data carry time and geolocation information as parameters. Ground-based measurements are point measurements made during a measurement campaign, while satellite measurements have global coverage and span several years. Validation consists of finding, for example, fewer than 10 profiles out of 28,000 in coincidence with one lidar profile for a given day. Within EDG we have two tools available for metadata, the RMC and Spitfire; we will focus on the RMC.
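The coincidence criterion described above (same day, close to the ground station) can be sketched in a few lines. This is an illustrative reconstruction, not the demo code; the radius, field names, and function names are our own assumptions:

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt


@dataclass
class Profile:
    lfn: str    # logical file name, as registered in the catalogue
    day: str    # acquisition date, e.g. "1997-01-29"
    lat: float  # degrees north
    lon: float  # degrees east


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def coincidences(profiles, station_lat, station_lon, day, radius_km=300.0):
    """Return the LFNs of profiles taken on the given day within radius_km
    of the ground station; the handful of matches out of thousands is what
    the validation job then fetches from the Grid."""
    return [p.lfn for p in profiles
            if p.day == day
            and haversine_km(p.lat, p.lon, station_lat, station_lon) <= radius_km]
```

With roughly 28,000 profiles per day, a query of this kind typically reduces the input to fewer than 10 matches per lidar station.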

Demonstration outline

- Replica Metadata Catalogue (RMC) usage
- Profile processing: using the RMC to register metadata of the resulting output
- Profile validation: using the RMC to find coincidence files
- RMC usage from the command line: we will show the content of the RMC and the attributes we use
- Show the result of the validation

Speaker notes: Our demonstration will show profile processing, validation, and how the metadata is stored in the RMC. Profile processing and validation are jobs that take some time, so we will first submit these two jobs; while they run we will show how the RMC can be used and what our catalogue looks like. Then we will look at the results of the processing and validation jobs.

GOME NNO Processing

- Select an LFN from a precompiled list of non-processed orbits
- Verify that the Level 1 product is replicated on some SE
- Verify that the Level 2 product has not yet been processed
- Create a file containing the LFN of the Level 1 file to be processed
- Create a JDL file, submit the job, monitor execution
- During processing, profiles are registered in the RM and metadata is stored in the RMC
- Query the RMC for the resulting attributes

Speaker notes: We will process a GOME Level 1 orbit file that has previously been replicated on a Grid SE but has not yet been processed (i.e. the Level 2 file is not yet registered in the RC). This shows how our metadata is produced at runtime and how it is stored in the RMC. [Christine starts the NNO processing] First we select a logical file name from a list of non-processed orbits, then check that the file is available on some SE by querying the Replica Manager; the same is done to check that the Level 2 file is not there yet. Next, a file is created containing the LFN of the file to be processed. Normally this file would contain a number of filenames, but for the demo only one file is processed. The next step is to generate a JDL file and submit it to the RB. During processing, metadata is extracted from the resulting profiles and stored directly in the RMC. After successful completion, the RMC can be queried to show the resulting attributes, which will be explained later in the presentation. [If time] We have two processing algorithms for producing ozone profiles: the one shown here is fast, based on neural networks, and used for near-real-time production; the other, based on optimal interpolation, is slower but more precise and used for offline processing. Both are validated using the lidar ground-based measurements.
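The JDL file itself is not shown on this slide. For orientation, a minimal EDG JDL for a job of this kind might look roughly as follows; the script and file names here are illustrative assumptions, not the actual demo files:

```
Executable    = "nno.sh";
Arguments     = "lfn";
StdOutput     = "nno.out";
StdError      = "nno.err";
InputSandbox  = {"nno.sh", "lfn"};
OutputSandbox = {"nno.out", "nno.err"};
```

The Resource Broker matches such a description to an available CE; the actual nno.jdl generated by the batch script will differ in detail.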

Validation job submission

- Query the RMC for the LFNs of coincident data (lidar and profile data)
- Submit the job, specifying the LFNs found
- Get the data locations for the LFNs from the RM
- Get the data to the WN from the SE and start the calculation
- Get the output data plot
- Show the result

Speaker notes: As the validation job takes approximately 10 minutes, we submit it before explaining the RMC attributes we use. The validation job compares previously processed ozone profiles with measurements from a lidar station. [Christine starts the validation job] First the RMC is queried for the LFNs of the ozone profiles and lidar measurements that are coincident in time and geolocation. In our case all profiles are collected in coincidence with a measurement campaign at the lidar station at Haute-Provence (France) for the day 29 January 1997. The resulting LFNs are used in the submitted job. The running job on a WN fetches its input files using the LFNs (using the RM to locate the SE) and starts the validation calculation. We will see the result of the validation later, as the calculation takes some time to finish.

RMC usage: attributes

(The slide shows a web interface with a command area, the list of all WP9 RMC attributes, and a result area.)

Speaker notes: During processing we produced profiles that were stored on SEs, registered in the RM, and whose extracted metadata was stored in the RMC. We will now show the contents of our EO RMC using a specially built web interface, which constructs the RMC command, submits it, and shows the result; normally we use the command-line rmc commands. In the area in the middle all the attributes are shown. As we have a single table in the RMC database, it contains the attributes of all four of our products (Level 1, Level 2 NNO, Level 2 OPERA, and lidar data). For each file type only the valid attributes are filled in; the others are left blank. The 'collectionname' attribute is used to differentiate between the products. The main attributes used for the validation are datetimestart and datetimestop as time attributes, and the latitude and longitude parameters. [Christine creates a 'select all' query] Christine will now create a query showing the complete contents of the RMC, to give an idea of the number of profiles produced for the demo; the command area shows the build-up of the command, and the result appears in the result area. [Christine creates an attribute query] Christine will now show how to build a command that queries the RMC for specific profiles, similar to the one used in the validation. The result area shows only the LFNs matching the query, of course only a fraction of the total number of files in the RMC. In this way the validation program does not have to scan all available profiles, but can directly access only those of interest. [If time, i.e. the validation is not finished, Christine can show another query.] [Christine now shows the validation result and explains it.]

Metadata tools comparison: the Replica Metadata Catalogue

Conclusions and future directions:
- The RMC provides possibilities for metadata storage
- Easy to use (CLI and API)
- No additional software installation needed by the user
- RMC performance (response time) is sufficient for EO application usage
- More database functionality is needed: multiple tables, more data types, polygon queries, restricted access (per VO, group, sub-group)

Many thanks to WP2 for helping us prepare the demo.

Backup slides

EO metadata usage

Questions addressed by EO users:
- How can a metadata catalogue be accessed using EDG Grid tools? Context: EO applications deal with large numbers of files (millions) of relatively small volume.
- How can data corresponding to given geographical and temporal coordinates be selected? Currently, metadata catalogues are built and queried to find the corresponding files.

GOME ozone profile validation use case:
- ~28,000 ozone profiles per day, i.e. 14 orbits of 2,000 profiles each
- Validation with lidar data from 7 stations distributed worldwide
- Tools available for metadata on the Grid: RMC, Spitfire, MUIS (the operational ESA catalogue) via the EO portal

(Slide diagram: "Where is the right file?" Files spread over CEs and SEs, selected by latitude, longitude, and date.)

Data and metadata storage

- Data are stored on the SEs and registered using the RM commands
- Metadata are stored in the RMC using the RMC commands
- The link between the RM and the RMC is the Grid Unique Identifier (GUID)
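As a toy model of this link, both catalogues can be thought of as tables keyed by the GUID, with the LFN as a user-level alias. The data below are illustrative, not the real EDG schema:

```python
# RM side: GUID -> physical replica locations (illustrative values)
replica_catalog = {
    "8ab6f428-1b57-11d8": ["srm://gw35.hep.ph.ic.ac.uk/eo/file8ab6f428"],
}

# LFN -> GUID alias table
lfn_to_guid = {
    "70104001.lv1": "8ab6f428-1b57-11d8",
}

# RMC side: GUID -> metadata attributes
metadata_catalog = {
    "8ab6f428-1b57-11d8": {"sensor": "GOME", "datalevel": 1},
}


def replicas_for(lfn):
    """Resolve an LFN to its physical replicas via the shared GUID."""
    return replica_catalog[lfn_to_guid[lfn]]


def attributes_for(lfn):
    """Resolve an LFN to its metadata via the shared GUID."""
    return metadata_catalog[lfn_to_guid[lfn]]
```

Because both lookups go through the same GUID, a file found by a metadata query can immediately be located on an SE, and vice versa.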

Use case: ozone profile validation

Step 1: Transfer Level 1 and lidar data to the Grid Storage Element
Step 2: Register the Level 1 data with the Replica Manager; replicate to other SEs if necessary
Step 3: Submit jobs to process the Level 1 data and produce Level 2 data
Step 4: Extract metadata from the Level 2 data; store it in a database using Spitfire, or store it in the Replica Metadata Catalogue
Step 5: Transfer the Level 2 data products to the Storage Element and register them with the Replica Manager
Step 6: Retrieve coincident Level 2 data by querying the Spitfire database or the Replica Metadata Catalogue
Step 7: Submit jobs to produce Level 2 / lidar coincident data and perform the validation
Step 8: Visualize the results

Which metadata tools in EDG?

Spitfire:
- A Grid-enabled middleware service for access to relational databases; supports GSI and VOMS security
- Consists of the Spitfire server module, used to make your database accessible (using the Tomcat web server and Java servlets), and the Spitfire client libraries, used from the Grid to access your database (in Java and C++)

Replica Metadata Catalogue:
- An integral part of the data management services
- Accessible via CLI and API (C++)
- No database management necessary

Both tools are developed by WP2. Our focus will be on the RMC.

Scalability

- This demo submits just one job, processing just one orbit in a very short time
- But the application tools we have developed (e.g. the batch and run scripts) can fully exploit the possibilities for parallelism:
  - they allow submitting and monitoring tens or hundreds of jobs in one go
  - each job may process tens or hundreds of orbits, just by adding more LFNs to the list of orbits to be processed
  - the batch -b option specifies the number of orbits per job; the batch -c option specifies the number of jobs to generate
- Used in this way, the Grid allows us to process and register several years of data very quickly: just 47 jobs are needed to process one year of data (~4,700 orbits) at 100 orbits per job
- This is very useful when re-processing large historical datasets, or when testing differently 'tuned' versions of the same algorithm
- The framework we developed can very easily be reused for any other kind of job
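The job-count arithmetic above is just a ceiling division (the function name is ours):

```python
from math import ceil


def jobs_needed(total_orbits, orbits_per_job):
    """Number of grid jobs needed to cover all orbits, i.e. what the
    batch -b (orbits per job) and -c (number of jobs) options control."""
    return ceil(total_orbits / orbits_per_job)
```

So one year of data, about 4,700 orbits, fits into 47 jobs at 100 orbits per job.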

GOME NNO Processing – Steps 1-2

Step 1) Select an LFN from a precompiled list of non-processed orbits:

  > head proclist
  70104001
  70104102
  70104184
  70105044
  70109192
  70206062
  70220021
  70223022
  70226040
  70227033

Step 2) Verify that the Level 1 product is replicated on some SE:

  > edg-rm --vo=eo lr lfn:70104001.lv1
  srm://gw35.hep.ph.ic.ac.uk/eo/generated/2003/11/20/file8ab6f428-1b57-11d8-b587-e6397029ff70

GOME NNO Processing – Steps 3-5

Step 3) Verify that the Level 2 product has not yet been processed:

  > edg-rm --vo=eo lr lfn:70104001.utv
  Lfn does not exist : lfn:70104001.utv

Step 4) Create a file containing the LFN of the Level 1 file to be processed:

  > echo 70104001.lv1 > lfn

Step 5) Create a JDL file for the job (the batch script outputs the command to be executed):

  > ./batch nno-edg/nno -d jobs -l lfn -t
  run jobs/0001/nno.jdl -t

GOME NNO Processing – Steps 6-7

Step 6) Run the command to submit the job, monitor execution, and retrieve the results:

  > run jobs/0001/nno.jdl -t
  Jan 14 16:28:45 https://boszwijn.nikhef.nl:9000/o1EABxUCrxzthayDTKP4_g
  Jan 14 15:31:42 Running grid001.pd.infn.it:2119/jobmanager-pbs-long
  Jan 14 15:57:36 Done (Success) Job terminated successfully
  Jan 14 16:24:01 Cleared user retrieved output sandbox

Step 7) Query the RMC for the resulting attributes:

  > ./listAttr 70517153.utv
  lfn=70517153.utv
  instituteproducer=ESA
  algorithm=NNO
  datalevel=2
  sensor=GOME
  orbit=10844
  datetimestart=1.9970499E13
  datetimestop=1.9970499E13
  latitudemax=89.756
  latitudemin=-76.5166
  longitudemax=354.461
  longitudemin=0.1884