1
CAS2K3, September 2003 BADC: badc.nerc.ac.uk, NERC DataGrid: www.ndg.badc.rl.ac.uk
The NERC DataGrid – Building Bridges for the Environmental Sciences
Bryan Lawrence (Head, NCAS/British Atmospheric Data Centre, Rutherford Appleton Laboratory, CCLRC)
Kerstin Kleese, Roy Lowry, Kevin O’Neill, Andrew Woolf & others
2
NDG Partners
As funded, a partnership between:
– British Atmospheric Data Centre (BADC, PI: Bryan Lawrence)
– British Oceanographic Data Centre (BODC, Co-I: Roy Lowry)
– CLRC E-science Centre (Co-I: Kerstin Kleese)
– PCMDI at LLNL in the US (Dean Williams, Bob Drach, Mike Fiorino)
The project has caught the imagination; extra funding now supports:
– A number of groups at the NERC Centre for Ecology and Hydrology (CEH: Ecology DataGrid)
– NERC Earth Observation Data Centre & Plymouth Marine Lab Remote Sensing
Major collaborators that are not directly funded will include:
– ClimatePrediction.net, GODIVA (NERC e-science projects)
– NCAS/CGAM: The Centre for Global Atmospheric Modelling at the University of Reading (via Lois Steenman-Clark and Katherine Bouton)
– Already required to provide technology to support the major UK project HIGEM (a collaboration between the Hadley Centre and the NERC academic community to develop the next generation of high resolution GCM models based on HadGEM).
3
Outline
Motivation:
– The BADC, BODC, and the Metadata Gateway
The NDG Goal
NDG Metadata Structures and Architecture
– Metadata Model
– Data Model
– ISO Context
NDG Prototype Status
Summary & Challenges
4
The British Oceanographic Data Centre (not for much longer, moving to a site on Liverpool University campus imminently)
5
BODC Mission Statement
To operate a world class data centre in support of UK marine science by:
– providing data management support for UK marine science projects
– maintaining and developing the UK’s national oceanographic database
– developing innovative marine data products and digital atlases
– collaborating, on behalf of the UK, in the international exchange and management of oceanographic data
– making high quality data readily available to UK research scientists in academia, government and industry
6
British Atmospheric Data Centre
The Role: Key words: Curation and Facilitation!
7
BADC Users
– 3800 registered in March 2003
– ~300 individual users per month
(Chart: Users by Discipline, November 2002, 2150 users)
8
BADC Storage Capacity
– Approx 50 TB (Nov 2002)
– Projected to quadruple well within the next couple of years given existing commitments
– Planning exercise under way now
– Committed to keeping as much as possible on spinning disk
– Further backup and extra storage at national archival centre (ATLAS, PB soon)
(Figure label: 2.5Gb)
9
Huge variety of Data Sets
10
Querying datasets
Complex Metadata, held in Ingres database: export DIF and Z39.50
No possibility of automatic data usage …
11
Different types of data returned: Wallingford
Supporting very diverse user community: NetCDF is not enough …
12
NERC Metadata Gateway - SST
No clean handover from discovery to browse and use!
– Geospatial coordinates forgotten.
– Time reference forgotten.
– Need to get entire field(s), and find correct time!
And if I want to compare data from different locations?
– multiple logins
– multiple formats
– discovery?
14
The NERC DataGrid
15
Metadata Origins
(Diagram labels: Wider Internet, Research Group, Satellite, SuperComputer, Shared Resources, DB, Research Group)
Consider a hierarchy of data users beginning with an individual scientist, who may herself be part of a research group, itself part of a community sharing resources, lying in the wider internet …
To be well integrated the metadata should have a role at each level! (The data portal client and server interface may be different at each level.)
At each level “extra” metadata will be required, probably produced by dedicated staff at the research group or data centre.
16
A google for data; the metadata carrot!
(Diagram labels: Wider Internet, Research Group, Satellite, SuperComputer, Shared Resources, DB, Research Group)
18
NDG Metadata Taxonomy
22
Separate data (A) and metadata (B) models
Clear separation of function
– Difference between data use and discovery etc.
– “Tuning” of metadata to include relevant detail
Allows increased reuse of metadata model
– Avoids tie-in to details of a particular field’s data formats
– Can plug in another data model
(Diagram labels: Metadata Model, Data Model, Data granule ID, Data summary)
23
(A) NDG Data Model: Overview
– Dataset: named container for a number of variables
– Variable: physical parameters within the dataset; controlled vocabularies, e.g. BODC data dictionary, CF standard names
– Array: multidimensional container for other arrays or numeric data
– Coordinate: may be shared between multiple Arrays; ‘anonymous’ if not georeferenced; MappedCoordinate vs ProductCoordinate; with respect to a Coordinate Reference System (ref ISO 19111, ISO 19115)
– GranuleDescriptor: describes data granule in terms of file storage; enables file aggregation; SQL/OGSA-DAI for RDBMS; physical or logical (e.g. SRB) files
“Profiles” of model defined for important data types
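The containment described in the overview can be sketched in a few Python dataclasses. This is only an illustration of the relationships named on the slide (Dataset holds Variables, a Variable holds an Array, Arrays share Coordinates and point at storage through a GranuleDescriptor). All field names, the string form of the reference system and the example file names are assumptions, and the MappedCoordinate/ProductCoordinate distinction and the RDBMS case are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Coordinate:
    """An axis that may be shared between several Arrays."""
    name: str
    values: List[float]
    # Hypothetical identifier of a coordinate reference system;
    # None marks the coordinate as 'anonymous' (not georeferenced).
    reference_system: Optional[str] = None

    @property
    def anonymous(self) -> bool:
        return self.reference_system is None


@dataclass
class GranuleDescriptor:
    """Describes a data granule in terms of file storage (paths only here)."""
    files: List[str]


@dataclass
class Array:
    """Multidimensional container for numeric data or for other Arrays."""
    coordinates: List[Coordinate]
    content: Union[GranuleDescriptor, List["Array"]]

    @property
    def shape(self) -> tuple:
        return tuple(len(c.values) for c in self.coordinates)


@dataclass
class Variable:
    """A physical parameter, named from a controlled vocabulary."""
    name: str
    standard_name: str
    array: Array


@dataclass
class Dataset:
    """Named container for a number of variables."""
    name: str
    variables: List[Variable] = field(default_factory=list)


# Two variables on the same grid share the same Coordinate objects.
lat = Coordinate("latitude", [-45.0, 0.0, 45.0], reference_system="WGS84")
lon = Coordinate("longitude", [0.0, 90.0, 180.0, 270.0], reference_system="WGS84")
temperature = Variable(
    "t2m", "air_temperature",
    Array([lat, lon], GranuleDescriptor(["example_t2m.nc"])),
)
pressure = Variable(
    "msl", "air_pressure_at_sea_level",
    Array([lat, lon], GranuleDescriptor(["example_msl.nc"])),
)
ds = Dataset("example_dataset", [temperature, pressure])
```

Keeping storage details inside GranuleDescriptor means the rest of the model does not change when a granule moves from one file to an aggregation of several.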
24
NDG Data Model
(Diagram label: Array)
25
(B) Metadata Model
26
(B) Metadata Model: an NDG Intermediate Schema, Conceptual Overview
28
ISO TC211
– ISO 19101: Geographic information – Reference model
– ISO 19103: Geographic information – Conceptual schema language
– ISO 19107: Geographic information – Spatial schema
– ISO 19108: Geographic information – Temporal schema
– ISO 19109: Geographic information – Rules for application schema
– ISO 19111: Geographic information – Spatial referencing by coordinates
– ISO 19115: Geographic information – Metadata
– ISO 19118: Geographic information – Encoding
– ISO 19121: Geographic information – Imagery and gridded data
29
ISO19115
– Dataset title
– Dataset reference date
– Dataset responsible party
– Metadata point of contact
– Dataset language
– Dataset character set
– Dataset topic category
– Abstract describing dataset
– Spatial resolution of dataset
– Spatial representation type
– Geographic location of dataset
– Vertical/temporal extent for dataset
– Reference system
– Lineage
– Distribution format
– On-line resource
– Metadata character set
– Metadata date stamp
– Metadata standard name
– Metadata standard version
– Metadata file identifier
– Metadata language
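As a rough illustration of how such an element list might be used when preparing discovery records, the sketch below reports which of the listed elements a record leaves empty. The element labels are copied from the slide, not from the ISO 19115 encoding; the dictionary record format and the function name are assumptions, and the slide does not say which elements are mandatory, so the check only reports absences.

```python
# Element labels as listed on the slide (not official ISO 19115 element names).
CORE_ELEMENTS = [
    "Dataset title",
    "Dataset reference date",
    "Dataset responsible party",
    "Metadata point of contact",
    "Dataset language",
    "Dataset character set",
    "Dataset topic category",
    "Abstract describing dataset",
    "Spatial resolution of dataset",
    "Spatial representation type",
    "Geographic location of dataset",
    "Vertical/temporal extent for dataset",
    "Reference system",
    "Lineage",
    "Distribution format",
    "On-line resource",
    "Metadata character set",
    "Metadata date stamp",
    "Metadata standard name",
    "Metadata standard version",
    "Metadata file identifier",
    "Metadata language",
]


def missing_core_elements(record: dict) -> list:
    """Return the listed elements that are absent or empty in `record`."""
    return [name for name in CORE_ELEMENTS if not record.get(name)]


# Hypothetical, partially filled record.
example_record = {
    "Dataset title": "Example atmospheric chemistry dataset",
    "Abstract describing dataset": "Illustrative record only.",
    "Dataset language": "en",
}
missing = missing_core_elements(example_record)
```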
30
Metadata extensions and profiles
(Source: ISO)
Direct relationship between ISO19115 and our (B) Intermediate schema.
31
Profiling of ISO 191xx
“The comprehensiveness and large number of options available in various base standards make it difficult to combine them for practical applications. … A profile integrates a set of base standards and/or modules (predefined subsets) of base standards to meet a specific implementation requirement.”
Registration of profiles
“A profile that is registered through an ISO registration procedure becomes an International Standardized Profile (ISP). National standards that are expressed as profiles of ISO base standards may be registered at a national level.”
(Source: ISO19101)
32
Further Application in NERC DataGrid
e.g. Data model “Coordinates”
(Diagram labels: ISO 19111, ISO 19108)
34
The Data Use Chain
35
Key Components – need APIs and standards
(Diagram labels: Globus, Harvest)
36
NDG Discovery Service Element
Traditional and Grid Service (GT3) Interfaces
37
Starting with the LAS (Live Access Server)
Deployment for UK users within a few weeks (constraint is primarily access control)
38
LAS – Simple Box fill Output
(Figure label: ERA40)
Work for us to do: labelling is inadequate as yet.
39
Cache management in LAS/CDAT
(Diagram labels: Internet User; LAS; CDAT; BADC/CDAT; localCache.py; ERA-40 4 TB Spectral Archive; ERA-40 < 1TB Grid Cache; 18 TB virtual dataset)
Flow:
– LAS calls cdms.open to open data file.
– BADC/CDAT intercepts command and checks cache.
– Locks access to cache.
– Checks if regular gridded file is in cache list.
– NO: Spectral file is converted on-the-fly and placed in cache. Cache also checks if enough room, deletes oldest files if necessary and checks against disk space limit.
– YES: Cache unlocked. New cdms.open command sent to CDAT and cache file opened.
– Data object delivered to LAS.
– NetCDF file, plot or animations delivered to user.
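The cache logic on this slide can be sketched roughly as follows. This is not the actual localCache.py: the class name, the method names, the `.grid` suffix and the pluggable `convert` callable are all assumptions, and the real system intercepts cdms.open rather than exposing a `resolve` method. The sketch only shows the lock, the lookup, the on-the-fly conversion on a miss, and the deletion of the oldest files when a size limit is exceeded.

```python
import os
import tempfile
import threading


class GridCache:
    """Hypothetical cache of gridded files converted from spectral files."""

    def __init__(self, cache_dir, max_bytes, convert):
        self.cache_dir = cache_dir
        self.max_bytes = max_bytes      # disk space limit for the cache
        self.convert = convert          # callable(spectral_path, gridded_path)
        self._lock = threading.Lock()   # serialises access to the cache
        self._files = []                # cached paths, oldest first

    def _cached_path(self, spectral_path):
        name = os.path.basename(spectral_path) + ".grid"
        return os.path.join(self.cache_dir, name)

    def _make_room(self, keep):
        """Delete oldest files until under the limit, never deleting `keep`."""
        total = sum(os.path.getsize(p) for p in self._files)
        while total > self.max_bytes and self._files[0] != keep:
            oldest = self._files.pop(0)
            total -= os.path.getsize(oldest)
            os.remove(oldest)

    def resolve(self, spectral_path):
        """Return the gridded file to open instead of `spectral_path`."""
        with self._lock:                         # lock access to cache
            target = self._cached_path(spectral_path)
            if target not in self._files:        # NO: not in cache list
                self.convert(spectral_path, target)
                self._files.append(target)
                self._make_room(keep=target)
            return target                        # lock released on exit


# Demonstration with a stand-in converter that writes 10 bytes per file.
calls = []


def fake_convert(src, dst):
    calls.append(src)
    with open(dst, "wb") as handle:
        handle.write(b"x" * 10)


cache = GridCache(tempfile.mkdtemp(), max_bytes=25, convert=fake_convert)
path_a = cache.resolve("era40/a.spec")
path_b = cache.resolve("era40/b.spec")
path_c = cache.resolve("era40/c.spec")   # pushes total to 30 bytes, evicts a
again_b = cache.resolve("era40/b.spec")  # cache hit, no conversion
```

In a deployment like the one described, the path returned by `resolve` would be what is handed to the second cdms.open call.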
40
NERC DataGrid Prototype (by hand)
Ingestion of ACSOE data from BADC and BODC.
NASA GCMD DIF based discovery
– Exported from Intermediate Schema
– Harvested by hand
Working on hand-over mechanism to pass dataset info to DataModel based LAS service
– Generate and populate LAS database in response
– Use standard LAS delivery
Next Steps: GT3 based services, improve LAS, improve delivery, implement multiple datamodel profiles, implement multiple discovery services.
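The export step from the Intermediate Schema to a DIF record might look something like the sketch below. The intermediate record is shown as a plain dictionary with hypothetical keys, since the slides do not give the schema, and only three DIF fields are written; the element names used here (Entry_ID, Entry_Title, Summary) should be checked against the GCMD DIF specification before any real use.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from intermediate-record keys to DIF element names.
FIELD_MAP = (
    ("Entry_ID", "id"),
    ("Entry_Title", "title"),
    ("Summary", "abstract"),
)


def to_dif(record: dict) -> str:
    """Serialise a small subset of an intermediate record as DIF-style XML."""
    root = ET.Element("DIF")
    for tag, key in FIELD_MAP:
        child = ET.SubElement(root, tag)
        child.text = record[key]
    return ET.tostring(root, encoding="unicode")


# Illustrative record; the identifier and text are made up.
example = {
    "id": "example-acsoe-001",
    "title": "Example ACSOE dataset",
    "abstract": "Illustrative discovery record.",
}
dif_xml = to_dif(example)
```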
41
Summary
NDG project running for a year now, aiming to provide grid-enabled tools to support:
– a diverse community
– with diverse datasets
NDG part of the UK National E-science programme, and will leverage off other projects to implement grid solutions.
– initial prototype web-service based
– GT3 prototype due early in the new year
Software development based on plagiarising the maximum amount from other groups, and a standards based approach within the NDG.
– All code will be in the public domain
Major challenge will not be technical; policy, attitudes, legal issues.
42
You’ve gone TOO FAR!