BADC, BODC, CCLRC, PML and SOC

Presentation on theme: "BADC, BODC, CCLRC, PML and SOC"— Presentation transcript:

1 BADC, BODC, CCLRC, PML and SOC
Distributed Data, Distributed Governance, Distributed Vocabularies: The NERC DataGrid
Bryan Lawrence (on behalf of a big team; note also that a substantial piece of work with specific authorship is included herein)
BADC, BODC, CCLRC, PML and SOC
[Speaker note: introduce myself and Wendy, and ask the audience to contact her.]

2 Outline
Motivation
Standards: Feature Types
Taxonomy
Overall Architecture
NDG Products: Discovery Portal, Data Extractor, MOLES (NumSim relationship with NMM)
CSML: Description, Prototyping in MarineXML, Round-Tripping
Vocabulary Issues in NDG (Hughes, Kondapalli, Lowry)
NDG Timeline

3 Complexity + Volume + Remote Access = Grid Challenge
British Atmospheric Data Centre NCAR British Oceanographic Data Centre

4 Integration – semantics
Want interdisciplinary semantic access to information, not abstract data:
    getData(potential temperature from ERA-40 dataset in North Atlantic from 1990 to 2000)
not:
    getData(“era40.nc”, ‘PTMP’, 20:50, 300:340, 190:200)
or even worse:
    for j=1990:2000 getData(“era40_”+j+“.nc”, ‘PTMP’, 20:50, 300:340)
Lossy is OK! Care less about completeness of representation than semantic unification.
What we really want is access to *semantic information*, not files. For the previous example, we’d like to be able to make a semantic query, e.g. all the temperature data in the North Atlantic between 1990 and 2000. Instead, we typically have to write messy loops and know file names, file formats, and internal structures. For the present, we’re more interested in providing a semantic view of the *core information* than in preserving every single piece of associated metadata inside the file.
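The contrast between a semantic query and hand-written file loops can be sketched in Python. This is only an illustration under stated assumptions: the lookup tables, the `resolve` function and the `FileRequest` type are hypothetical and are not part of NDG; the file pattern, variable code and index ranges are copied from the slide's own pseudocode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRequest:
    """One low-level, file-oriented request of the kind users write by hand."""
    filename: str
    variable: str
    lat_range: tuple
    lon_range: tuple


# Hypothetical lookup tables standing in for real catalogue metadata.
PHENOMENA = {"potential temperature": "PTMP"}
REGIONS = {"North Atlantic": ((20, 50), (300, 340))}
DATASETS = {"ERA-40": "era40_{year}.nc"}


def resolve(phenomenon, dataset, region, start_year, end_year):
    """Turn a semantic query into the file requests it hides from the user."""
    variable = PHENOMENA[phenomenon]
    lat_range, lon_range = REGIONS[region]
    pattern = DATASETS[dataset]
    return [
        FileRequest(pattern.format(year=year), variable, lat_range, lon_range)
        for year in range(start_year, end_year + 1)  # end year is inclusive
    ]


file_requests = resolve("potential temperature", "ERA-40", "North Atlantic", 1990, 2000)
```

The user states what is wanted (phenomenon, dataset, region, period); file names and internal variable codes stay inside the catalogue.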

5 Standards ISO 19101: Geographic information – Reference model
A geospatial dataset…
…consists of features and related objects…
…in a defined logical structure…
…delivered through services…
…and described by metadata.

6 Standards
Geographic ‘features’
  “abstraction of real world phenomena” [ISO 19101]
  Type or instance
  Encapsulate important semantics in universe of discourse
  “Something you can name”
Application schema
  Defines semantic content and logical structure
  ISO standards provide toolkit: spatial/temporal referencing; geometry (1-, 2-, 3-D); topology; dictionaries (phenomena, units, etc.)
  GML – canonical encoding
[from ISO “Geographic information – Rules for Application Schema”]

7 Architecture: NDG Metadata Taxonomy
CSML NCML+CF MOLES THREDDS CLADDIER DIF -> ISO19115 … not one schema, not one solution!

8 Architecture: Deployment
[Diagram: Users, NDG GUI Interface(s), NDG Core Services, Vocab Services, Data Providers]

9 Architecture: Deployment
[Diagram: Users, NDG GUI Interface(s), NDG Core Services, Vocab Services]

10 Architecture: Deployment
[Diagram: Users, NDG GUI Interface(s), Vocab Services]

11 Architecture: Deployment
[Diagram: Users, Vocab Services]

12 Current Status

13 NDG Products: Discovery Portal
Discovery Service: http://ndg.nerc.ac.uk/discovery
NB: Web Service interface (you can run the search from your own site and format and present the results there).

14

15

16 NDG Products: MOLES
Ugly as sin! A hint of things to come: the Detailed Browse Service is currently in development.

17 MOLES: implementation
Core linking concept is the deployment of a Data Production Tool at an Observation Station on behalf of an Activity that produces a Data Entity.
[Diagram: Activity, Data Production Tool, Observation Station, Deployment, Data Entity]
The Deployment links the metadata records into a structure that can be turned into a navigable structure.
Each of the main metadata objects has security data attached to it, so security can be applied to queries on the metadata (use of XQuery).
Each record has its own security metadata attached, thus giving a degree of granularity to the security.
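The linking concept can be sketched with plain Python classes. This is a rough illustration only, not the MOLES schema: the class and field names are assumptions, and the role-based filter stands in for the XQuery-based security checks mentioned on the slide.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Base metadata record; each record carries its own security metadata."""
    name: str
    allowed_roles: frozenset = frozenset({"public"})


class Activity(Record):
    """Project or campaign on whose behalf a tool is deployed."""


class DataProductionTool(Record):
    """Instrument or simulator that produces data."""


class ObservationStation(Record):
    """Place or platform where the tool is deployed."""


class DataEntity(Record):
    """Dataset produced by the deployment."""


@dataclass
class Deployment:
    """Links the four record types into one navigable structure."""
    activity: Activity
    tool: DataProductionTool
    station: ObservationStation
    data_entities: list


def visible_entities(deployments, role):
    """Return names of data entities whose own security metadata admits `role`."""
    return [
        entity.name
        for deployment in deployments
        for entity in deployment.data_entities
        if role in entity.allowed_roles
    ]


# Hypothetical records, used only to exercise the structure.
deployment = Deployment(
    activity=Activity("example campaign"),
    tool=DataProductionTool("example radar"),
    station=ObservationStation("example site"),
    data_entities=[
        DataEntity("open profiles"),
        DataEntity("embargoed profiles", frozenset({"project-member"})),
    ],
)
```

Because the security metadata sits on each record rather than on the deployment as a whole, two data entities from the same deployment can be exposed to different audiences.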

18 Simulators as data production tools: NumSim
NDG Products: NumSim

19 NumSim Example

20

21 NDG Products: DataExtractor

22

23 NDG Products: GEOSPLAT
Background activity being parallelised with GODIVA/CCLRC e-science collaboration (spectral -> gridpoint + CDMS + visualisation tools).
Download either the plot or the data that went into the plot.

24 ERA40: All driven from one CDML file, 9 TB online spherical harmonics, looking like 40 TB “virtual” gridded!

25 NDG-A: Climate Science Modelling Language
Aims:
  provide semantic integration mechanism for NDG data
  explore new standards-based interoperability framework
  emphasise content, not container
Design principles:
  offload semantics onto parameter type (‘phenomenon’, observable, measurand), e.g. wind-profiler, balloon temperature sounding
  offload semantics onto CRS, e.g. scanning radar, sounding radar
  ‘sensible plotting’ as discriminant: ‘in-principle’ unsupervised portrayal
  explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit)

26 Climate Science Modelling Language
CSML feature types are defined on the basis of geometric and topologic structure.
CSML feature type: Description. Examples
TrajectoryFeature: Discrete path in time and space of a platform or instrument. Examples: ship’s cruise track, aircraft’s flight path
PointFeature: Single point measurement. Examples: raingauge measurement
ProfileFeature: Single ‘profile’ of some parameter along a directed line in space. Examples: wind sounding, XBT, CTD, radiosonde
GridFeature: Single time-snapshot of a gridded field. Examples: gridded analysis field
PointSeriesFeature: Series of single datum measurements. Examples: tidegauge, rainfall timeseries
ProfileSeriesFeature: Series of profile-type measurements. Examples: vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature: Timeseries of gridded parameter fields. Examples: numerical weather prediction model, ocean general circulation model
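The seven feature types follow a regular pattern: a spatial shape, optionally repeated as a series. A small Python sketch of that pattern follows. The `Shape` enum and `csml_feature_type` function are hypothetical helpers for illustration, not part of CSML itself.

```python
from enum import Enum


class Shape(Enum):
    POINT = "Point"
    PROFILE = "Profile"
    GRID = "Grid"
    TRAJECTORY = "Trajectory"


def csml_feature_type(shape, is_series):
    """Name the CSML feature type for a spatial shape and a series flag."""
    if shape is Shape.TRAJECTORY:
        # A trajectory is already a path in time and space; the table lists
        # no separate series variant for it.
        return "TrajectoryFeature"
    return f"{shape.value}{'Series' if is_series else ''}Feature"
```

For example, a radiosonde ascent maps to `csml_feature_type(Shape.PROFILE, False)`, while a thermistor chain timeseries maps to `csml_feature_type(Shape.PROFILE, True)`.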

27 Climate Science Modelling Language
CSML feature types examples...
[Figures: ProfileSeriesFeature, ProfileFeature, GridFeature]

28 Climate Science Modelling Language
Numerical array descriptors:
  provide ‘wrapper’ architecture for legacy data files
  ‘connected’ to data model numerical content through ‘xlink:href’
  three subtypes: InlineArray, ArrayGenerator, FileExtract (NASAAmes, NetCDF, GRIB)
  Composite design pattern for aggregation

29 Climate Science Modelling Language
Inline array
<NDGInlineArray>
  <arraySize>5 2</arraySize>
  <uom>udunits.xml#degreeC</uom>
  <numericType>float</numericType>
  <regExpTransform>s/10/9/ge</regExpTransform>
  <numericTransform>+5</numericTransform>
  <values> </values>
</NDGInlineArray>
Array generator
<NDGArrayGenerator>
  <arraySize>10001</arraySize>
  <uom>udunits.xml#minute</uom>
  <numericType>float</numericType>
  <expression>0:5:50000</expression>
</NDGArrayGenerator>
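As a rough sketch of how an array generator descriptor might be expanded, the Python below parses the `NDGArrayGenerator` element from the slide and builds the values. It assumes the expression means start:step:stop with an inclusive stop, which is consistent with the declared `arraySize` of 10001; the function name `expand_generator` is hypothetical and this is not the NDG parser.

```python
import xml.etree.ElementTree as ET

DESCRIPTOR = """<NDGArrayGenerator>
  <arraySize>10001</arraySize>
  <uom>udunits.xml#minute</uom>
  <numericType>float</numericType>
  <expression>0:5:50000</expression>
</NDGArrayGenerator>"""


def expand_generator(xml_text):
    """Expand a start:step:stop expression and check it against arraySize."""
    node = ET.fromstring(xml_text)
    start, step, stop = (int(part) for part in node.findtext("expression").split(":"))
    # Assumption: stop is inclusive and step is positive.
    values = [float(v) for v in range(start, stop + 1, step)]
    declared = int(node.findtext("arraySize"))
    if len(values) != declared:
        raise ValueError(f"expression yields {len(values)} values, arraySize says {declared}")
    return values


minutes = expand_generator(DESCRIPTOR)
```

Checking the generated length against `arraySize` gives a cheap consistency test on the descriptor before any data model uses it.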

30 Climate Science Modelling Language
File extract
<NDGNASAAmesExtract>
  <arraySize>526</arraySize>
  <numericType>double</numericType>
  <fileName>/data/BADC/macehead/mh cf1</fileName>
  <variableName>CFC-12</variableName>
</NDGNASAAmesExtract>
<NDGNetCDFExtract gml:id="feat04azimuth">
  <arraySize>10000</arraySize>
  <fileName>radar_data.nc</fileName>
  <variableName>az</variableName>
</NDGNetCDFExtract>
<NDGGRIBExtract>
  <arraySize> </arraySize>
  <numericType>double</numericType>
  <fileName>/e40/ggas rsn.grb</fileName>
  <parameterCode>203</parameterCode>
  <recordNumber>5</recordNumber>
  <fileOffset>289412</fileOffset>
</NDGGRIBExtract>
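The ‘wrapper’ idea behind the file extracts can be sketched as a small dispatcher that reads a descriptor and records which legacy format and which variable it points at, without opening the data file. The `FileExtract` type and `parse_extract` function are hypothetical; the element names come from the slide, the `gml:id` attribute is omitted to keep the snippet namespace-free, and the GRIB file name is a placeholder.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Element name on the slide -> legacy file format it wraps.
FORMATS = {
    "NDGNASAAmesExtract": "NASAAmes",
    "NDGNetCDFExtract": "NetCDF",
    "NDGGRIBExtract": "GRIB",
}


@dataclass
class FileExtract:
    file_format: str
    file_name: str
    selector: str  # variableName, or parameterCode for GRIB


def parse_extract(xml_text):
    """Read one file-extract descriptor into a format-neutral record."""
    node = ET.fromstring(xml_text)
    selector = node.findtext("variableName") or node.findtext("parameterCode")
    return FileExtract(FORMATS[node.tag], node.findtext("fileName"), selector)


netcdf = parse_extract(
    "<NDGNetCDFExtract><arraySize>10000</arraySize>"
    "<fileName>radar_data.nc</fileName><variableName>az</variableName>"
    "</NDGNetCDFExtract>"
)
grib = parse_extract(
    "<NDGGRIBExtract><numericType>double</numericType>"
    "<fileName>example.grb</fileName><parameterCode>203</parameterCode>"
    "</NDGGRIBExtract>"
)
```

A reader for each format could then be selected from `file_format`, so the data model itself never needs to know how a given legacy file is laid out.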

31 MarineXML Testbed
Data from different parts of the marine community conforms to a variety of schema (XSD).
For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FT) defined by CSML. The FTs and XSLT are maintained in a ‘MarineXML registry’.
Phenomena in the XSD must have an associated portrayal.
Features in the source XSD must be present in the data dictionary.
The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) Features.
The FTs can then be translated to equivalent FTs for display in the ECDIS system. ECDIS acts as an example client for the data.
Features described using the S-57v3.1 Application Schema can be imported and are equivalent to the same features in CSML.
[Diagram labels: Biological Species, Chl-a from Satellite, Measured Hydrodynamics, Modelled Hydrodynamics, XML Parser, Marine GML (NDG) Feature Types, Data Dictionary, S52 Portrayal Library, SENC, SeeMyDENC, S-57v3 GML]
Slide adapted from Kieran Millard (AUKEGGS, 2005)
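The registry idea (one translator per source schema, plus a data dictionary check) can be sketched in Python. Plain functions stand in here for the XSLT stylesheets, since the standard library has no XSLT engine; the schema names, dictionary entries and record layouts are all hypothetical.

```python
# Hypothetical data dictionary of phenomena known to the testbed.
DATA_DICTIONARY = {"chlorophyll_a", "wave_height"}


def translate_chlorophyll(record):
    """Stand-in for an XSLT: satellite Chl-a grid -> generic grid feature."""
    return {"featureType": "GridFeature", "phenomenon": "chlorophyll_a", "values": record["grid"]}


def translate_waves(record):
    """Stand-in for an XSLT: wave climate timeseries -> generic point series."""
    return {"featureType": "PointSeriesFeature", "phenomenon": "wave_height", "values": record["series"]}


# Hypothetical registry: source schema name -> translator.
REGISTRY = {
    "satellite_chl.xsd": translate_chlorophyll,
    "wave_climate.xsd": translate_waves,
}


def to_feature(schema_name, record):
    """Translate a source record to a weakly typed feature via the registry."""
    feature = REGISTRY[schema_name](record)
    if feature["phenomenon"] not in DATA_DICTIONARY:
        raise KeyError(f"{feature['phenomenon']} is not in the data dictionary")
    return feature


wave_feature = to_feature("wave_climate.xsd", {"series": [1.2, 1.5, 0.9]})
```

A client such as the ECDIS viewer then only needs to understand the small set of generic feature types, not every community schema.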

32 MarineXML Testbed
Biological sampling station with attributes for the species sampled at each
Grid of Chl-a from the MERIS instrument on ENVISAT
Predicted and measured wave climate timeseries (height, direction and period)
Vectors of currents from instruments
Slide adapted from Kieran Millard (AUKEGGS, 2005)

33 The Concept of re-using Features
Here structured XML is converted to plain ASCII text in the form required for a numerical model.
HTML warning service pages are generated ‘on the fly’.
Here the same XML is converted to the SENC format used in a proprietary tool for viewing electronic navigation charts.
XML can also be converted to SVG to display data graphically.
All this requires agreement on standards.
Slide adapted from Kieran Millard (AUKEGGS, 2005)

34 CSML Round Tripping - 1
Managing semantics
[Diagram labels: conceptual model; UGAS; GML app schema (XML); New Dataset (101010); Application; produces; Conforms to; GML dataset; instance; parser]
V1.0 will be in NDG Alpha
<gml:featureMember>
  <NDGPointFeature gml:id="ICES_100">
    <NDGPointDomain>
      <domainReference>
        <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree">
          <location> </location>
        </NDGPosition>
      </domainReference>
    </NDGPointDomain>
    <gml:rangeSet>
      <gml:DataBlock>
        <gml:rangeParameters>
          <gml:CompositeValue>
            <gml:valueComponents>
              <gml:measure uom="#tn"/>
              <gml:measure uom="#amount"/>
              <gml:measure uom="#gsm"/>
            </gml:valueComponents>
          </gml:CompositeValue>
        </gml:rangeParameters>
        <gml:tupleList>
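A minimal round-trip check on a GML instance like the one on this slide might look as follows in Python. This is a sketch under assumptions: the slide's fragment is truncated, so the location and tuple values below are hypothetical placeholders, the closing tags are added to make the instance well formed, and the `gml` prefix is assumed to bind to the standard GML namespace URI.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"  # assumed binding for the gml: prefix

INSTANCE = f"""<gml:featureMember xmlns:gml="{GML_NS}">
  <NDGPointFeature gml:id="ICES_100">
    <NDGPointDomain><domainReference>
      <NDGPosition srsName="urn:EPSG:geographicCRS:4979"
                   axisLabels="Lat Long" uomLabels="degree degree">
        <location>55.0 -3.0</location>  <!-- hypothetical value -->
      </NDGPosition>
    </domainReference></NDGPointDomain>
    <gml:rangeSet><gml:DataBlock>
      <gml:rangeParameters><gml:CompositeValue><gml:valueComponents>
        <gml:measure uom="#tn"/>
        <gml:measure uom="#amount"/>
        <gml:measure uom="#gsm"/>
      </gml:valueComponents></gml:CompositeValue></gml:rangeParameters>
      <gml:tupleList>1.0,2.0,3.0</gml:tupleList>  <!-- hypothetical values -->
    </gml:DataBlock></gml:rangeSet>
  </NDGPointFeature>
</gml:featureMember>"""


def summarise(xml_text):
    """Extract the semantic essentials of a point feature instance."""
    feature = ET.fromstring(xml_text).find("NDGPointFeature")
    position = feature.find(".//NDGPosition")
    return {
        "id": feature.get(f"{{{GML_NS}}}id"),
        "srs": position.get("srsName"),
        "uoms": [m.get("uom") for m in feature.iter(f"{{{GML_NS}}}measure")],
    }


original = summarise(INSTANCE)
# Round trip: parse, serialise, parse again, and compare the summaries.
reserialised = ET.tostring(ET.fromstring(INSTANCE), encoding="unicode")
round_tripped = summarise(reserialised)
```

If `original` and `round_tripped` agree, the serialisation step has preserved the identifiers, reference system and units of measure that carry the semantics.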

35 CSML Round Tripping - 2
Managing data - 1
[Diagram labels: CF Dataset (101010); Application; produces; CF; scanner (V1.0 in NDG Alpha); GML app schema (XML); instance; GML dataset; parser (V1.0 in NDG Alpha)]
[GML dataset instance: the same NDGPointFeature fragment as on slide 34]

36 Managing Data 2
[Diagram labels: CF Dataset (101010); scanner; GML dataset; DECISION PROCESSES; Define Dataset; Add Information; XSLT; XML; ISO19115; PUBLISH]
[GML dataset instance: the same NDGPointFeature fragment as on slide 34]

37 Architecture: Deployment

38 Vocabulary Management for NERC DataGrid
Michael Hughes, V.Siva Kondapalli and Roy Lowry

39 Vocabulary Presentation Outline
Problem and Solution
NERC DataGrid Vocabulary Model
Vocabulary Technical Governance
Vocabulary Content Governance
Mappings and Thesaurus Server
Potential Role of Local Mappings

40 The Problem
NERC DataGrid cannot function operationally without metadata and data semantic interoperability.
This will never be achieved without:
  Readily accessible standard terms whose meaning is clearly understood
  Readily accessible semantic maps both within and between lists of standard terms
  Semantic maps between local terms and standard terms

41 The Solution? Implementation of a Vocabulary Server
Building OWL ontologies mapping between domain-relevant de-facto standard vocabularies
Deploying the ontologies through a Web Service thesaurus server
Making tools available for users to build and deploy local ontologies

42 NDG Vocabulary Model
Entry (timestamped): Key, Term, Definition
List (versioned, timestamped and labelled with key, name and definition)
Constraint (aggregation of lists across which entry keys are unique)

43 NDG Vocabulary Model
The vocabulary resource is built from Entries. An Entry is the representation of a single object in the real world, comprising:
  Key: a bit pattern that represents an entity. It must be unique, permanent and free from semantics.
  Term: text used to label the entity to facilitate human recognition.
  Abbreviation: a shortened version of the term for use where space is tight. Target size is bytes.
  Definition: text that unambiguously specifies the entity.
Entries are aggregated into Lists (entity class or subclass, e.g. UK post towns).
Lists are aggregated into Constraints (entity class, e.g. post towns of the world).
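The Entry / List / Constraint model can be sketched with Python dataclasses. This is an illustration only: the class names, field names and sample keys are assumptions, not the NDG Vocabulary Server schema. The one rule it encodes comes from the slides: entry keys must be unique across all lists in a Constraint.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Entry:
    key: str           # unique, permanent, free from semantics
    term: str          # human-readable label
    abbreviation: str  # shortened term for tight spaces
    definition: str    # unambiguous specification of the entity
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class VocabList:
    key: str
    name: str
    definition: str
    version: int
    entries: list = field(default_factory=list)


@dataclass
class Constraint:
    name: str
    lists: list = field(default_factory=list)

    def duplicate_keys(self):
        """Return entry keys that appear more than once across the lists."""
        seen, duplicates = set(), set()
        for vocab_list in self.lists:
            for entry in vocab_list.entries:
                if entry.key in seen:
                    duplicates.add(entry.key)
                seen.add(entry.key)
        return duplicates


# Hypothetical content following the slide's post-town example.
uk = VocabList("L1", "UK post towns", "Post towns in the UK", 1,
               [Entry("K001", "Liverpool", "LPL", "Post town of Liverpool")])
world = Constraint("Post towns of the world", [uk])
```

An empty result from `duplicate_keys()` means the Constraint's uniqueness rule holds for the lists it aggregates.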

44 Vocabulary Technical Governance
The story so far:
  Lists are available as flat ASCII files or XML documents at URLs, e.g. ftp://ftp.pol.ac.uk/pub/bodc/jgofs/datadict/new/parameter_group.csv
  Some (BODC, SEA-SEARCH) include keys
  Some (CF, BODC) include definitions
  None are properly versioned

45 Vocabulary Technical Governance
Versioning should:
  Provide a unique label for each instantiation of the list
  Enable any previous instantiation of the list to be recreated
  Provide timestamp information for creation and modification of every object in the vocabulary system
Delivery should:
  Be from the master, not a copy
  Be accessible to software agents to allow automated synchronisation of local copies
  Have a ‘hotline’ to content governance
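The versioning requirements above can be sketched as a small in-memory store. This is a simplified illustration, not the Vocabulary Server back end: the `VersionedList` class is hypothetical, and it timestamps each published version rather than every individual object.

```python
from datetime import datetime, timezone


class VersionedList:
    """Keeps every published instantiation of a vocabulary list."""

    def __init__(self, name):
        self.name = name
        self._history = []  # (version label, timestamp, snapshot) tuples

    def publish(self, entries):
        """Store a snapshot of key -> term and return its unique version label."""
        version = len(self._history) + 1
        self._history.append((version, datetime.now(timezone.utc), dict(entries)))
        return version

    def recreate(self, version):
        """Rebuild any previous instantiation of the list."""
        for label, _timestamp, snapshot in self._history:
            if label == version:
                return dict(snapshot)
        raise KeyError(f"no version {version} of {self.name}")

    def published_at(self, version):
        """Return the timestamp recorded when the version was published."""
        for label, timestamp, _snapshot in self._history:
            if label == version:
                return timestamp
        raise KeyError(f"no version {version} of {self.name}")


# Hypothetical keys and terms.
parameters = VersionedList("parameter_group")
v1 = parameters.publish({"K1": "chlorophyll"})
v2 = parameters.publish({"K1": "chlorophyll", "K2": "carotenoids"})
```

Storing full snapshots is the simplest way to guarantee that an old version can be recreated; a production system would more likely store timestamped changes per entry.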

46 Vocabulary Technical Governance
NERC DataGrid Vocabulary Server
Back End
  Fully automated record archive, timestamps and version numbering. Live April 2006. 47 (of 115) lists publicly accessible.
Front End
  Web Service API. Live June 2006.
  XML list downloads from website (July 2006?).
  Web-form tools (August 2006?).

47 Vocabulary Content Governance
Standard lists need to respond to ever expanding user requirements.
Change needs to be rapid or users lose interest.
Standard lists need to maintain information quality and internal consistency.
Content governance has to resolve these conflicting requirements.

48 Vocabulary Content Governance
Content governance in oceanographic and atmospheric domains is based on:
  Moderated discussion lists
  ‘Benign Dictator’ and well-meaning volunteers
Variable success, depending on the right people having ‘spare’ time at the right moments.
More formalism, underpinned by more resources, is required.
But need to be careful about going too far, or levels of service become unacceptable.

49 Mappings and Thesaurus Server
There will never be a single list for a given topic.
Term mapping is therefore an essential part of semantic interoperability.
Marine Metadata Interoperability (MMI) have developed tooling and trialled mappings in the measurement phenomena arena.

50 Mappings and Thesaurus Server
MMI approach:
  Harmonise lists to be mapped in OWL (Voc2OWL tool)
  Map on basis of ‘same as’, ‘broader than’ and ‘narrower than’ relationships (VINE tool)
  Place a Web Service API over the map to implement a term or thesaurus server

51 Mappings and Thesaurus Server
NERC DataGrid Plans
Use MMI technology plus domain expertise available in BODC, BADC and their user communities to build a complete map between:
  BODC Parameter Discovery Vocabulary (300 terms)
  CF Standard Names (500-600 terms)
  GCMD Parameter Valids (200-300 relevant terms)
Incorporate this map into the NDG Discovery Service, through the MMI Web Service, to facilitate smart searching (e.g. ‘pigments’ finds dataset labelled ‘chlorophyll’).
Integrate ontology maintenance into source list maintenance.
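The ‘smart searching’ idea can be sketched in Python as term expansion over a small set of mappings. This is an illustration under assumptions: the mapping triples, dataset names and function names are hypothetical, a real deployment would query the thesaurus Web Service rather than an in-memory list, and only the three relationship names come from the MMI approach described in the slides.

```python
# Hypothetical mappings as (term_a, relationship, term_b) triples.
MAPPINGS = [
    ("pigments", "broader than", "chlorophyll"),
    ("pigments", "broader than", "carotenoids"),
    ("chlorophyll", "same as", "chl"),
]


def expand(term):
    """Collect the term plus everything it is the same as or broader than."""
    found, todo = set(), [term]
    while todo:
        current = todo.pop()
        if current in found:
            continue
        found.add(current)
        for a, relation, b in MAPPINGS:
            if a == current and relation in ("broader than", "same as"):
                todo.append(b)
            elif b == current and relation in ("narrower than", "same as"):
                todo.append(a)
    return found


def search(term, datasets):
    """Return names of datasets whose label matches the expanded term set."""
    wanted = expand(term)
    return sorted(name for name, label in datasets.items() if label in wanted)


# Hypothetical discovery records: dataset name -> label.
DATASETS = {"cruise_42": "chlorophyll", "buoy_7": "sea temperature", "float_3": "chl"}
hits = search("pigments", DATASETS)
```

Expansion only walks towards narrower or equivalent terms, so a search for ‘chlorophyll’ does not pull in every dataset labelled with the broader term ‘pigments’.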

52 Role of Local Mappings
There will always be local terms and understanding.
‘Pigment data sets’ could mean:
  Chlorophyll OR carotenoids OR phaeopigments
  Chlorophyll AND carotenoids AND phaeopigments
Depends on point of view.

53 Role of Local Mappings
Possible solution to this:
  User builds an ontology reflecting local perception of the mapping between local terms and standard terms
  Discovery or data integration tools use the ontology as a ‘plug-in’, allowing the user to operate with local terminology
  Tools (e.g. VINE) could be made available to facilitate this
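A local mapping used as a ‘plug-in’ might be sketched as below. The dictionary layout, the "any"/"all" mode names and the `matches` function are assumptions made for illustration; they simply encode the OR and AND readings of ‘pigment data sets’ from the preceding slide.

```python
STANDARD_PIGMENTS = {"chlorophyll", "carotenoids", "phaeopigments"}

# Two hypothetical local ontologies reflecting different points of view.
ANY_PIGMENT = {"pigment data sets": {"terms": STANDARD_PIGMENTS, "mode": "any"}}
ALL_PIGMENTS = {"pigment data sets": {"terms": STANDARD_PIGMENTS, "mode": "all"}}


def matches(local_term, dataset_terms, ontology):
    """Decide whether a dataset satisfies a local term under a given ontology."""
    rule = ontology[local_term]
    hits = rule["terms"] & set(dataset_terms)
    if rule["mode"] == "all":
        return hits == rule["terms"]  # AND: every standard term must be present
    return bool(hits)                 # OR: any one standard term is enough


chlorophyll_only = ["chlorophyll", "sea temperature"]
```

The same dataset can therefore match under one user's ontology and not under another's, without any change to the standard lists.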

54 NDG Timeline
NDG2 runs until September 2007:
NDG-Alpha (June 2006)
  Not all components in place (particularly delivery broker).
  Not many (maybe only DX) products will be deployable by non-NDG participants (too much hard work installing things that haven’t been optimised for installation).
  Discovery portal will be (is now) usable, linking to NCAR data etc., but isn’t very user friendly (options not obvious etc.).
NDG-Beta (Feb 2007)
  Most components should work, but deployment of software may still be difficult for non-participants.
NDG-Prod (Jun 2007)
  Should be deployable and far more user friendly (spending Feb to June working on deployment and friendliness, no new functionality).
Last few months working on sustainability etc.

