1
2006-03-21, 9th Open Forum on Metadata Registries, Kobe, Japan

XMDR Project Overview
Frank Olken & Kevin D. Keck
{olken,kdkeck}@lbl.gov
Lawrence Berkeley National Laboratory
Presentation to Open Metadata Forum
Kobe, Japan
March 21, 2006
2
XMDR means: Extended Metadata Registry
3
The Cast
● Bruce Bargmeyer (LBNL) = Principal Investigator
● Kevin Keck (LBNL) = architect & stds. (design)
● Frank Olken (LBNL) = content characterization & stds. (design)
● John McCarthy (LBNL) = prototype development (management)
● Karlo Berket (LBNL) = prototype development
● Harold Solbrig (Mayo) = content preprocessing via LexGrid, stds.
● Gayle Hodge (USGS) = content characterization, acquisition
● Denise Warzel (NCI) = content acquisition, standards, design
● Larry Fitzwater (EPA) = program mgt. (vision, direction)
● Nancy Lawler (DOD) = program mgt. (vision, direction)
● Sam Chance (DOD) = program mgt. (vision, direction)
4
Organizational Cast
● Lawrence Berkeley National Laboratory
● Environmental Protection Agency
● National Cancer Institute
● Mayo Clinic
● United States Geological Survey
● Department of Defense
5
Goals
● Assist revisions of ISO/IEC 11179 Metadata Registry Standard to encompass additional semantic descriptions and resources
  ● Vocabularies, thesauri, etc.
  ● Ontologies
  ● Relationships
  ● Semantic types
● Design and implement prototype Extended Metadata Registry
● Load metadata content into prototype
● Demonstrate prototype
6
Why Metadata Registries?
● Facilitate reuse/standardization/integration/exchange of data
● Design time:
  ● Database / messaging / application / forms designers
  ● Data warehouse design
● Run-time:
  ● Query formulation / optimization
  ● Federated data query optimization / processing
  ● Extract, Transform, Load (ETL) of data warehouses
  ● Semantic services, composition, workflows, ...
● Users
  ● Finding, understanding data
  ● Understanding data entry forms
7
Why Standards?
● Developing metamodel to serve as design for next generation metadata registries
● Evolve ISO/IEC 11179 Metadata Registry Standard
  ● Edition 2 (current)
    ● UML modeling, relational DB technology implementation
  ● Edition 3 (new)
    ● UML + OWL (Web Ontology Language) / MOF (Meta Object Facility) / CL (Common Logic) modeling
    ● Add support for ontologies
8
More on Why MDR Standards?
● MDR Standards
  ● Can improve metadata creation practice
  ● Can improve metadata and data reuse
  ● Facilitate MDR adoption by organizations
  ● Facilitate MDR interoperability
  ● Facilitate MDR software marketing
  ● Facilitate MDR procurement
  ● Facilitate alignment / mapping among metadata schemas, ...
9
Proposed Changes to ISO/IEC 11179
● Support for ontologies, etc.
● More formal modeling of relationships
● Semantic types (?)
10
Changes to ISO/IEC 11179 Std.
● Add support for ontologies, vocabularies
  ● Add ontologies
  ● Add predicates (logical formulae)
  ● Add axioms (asserted to be true)
  ● Add support for modularization of ontologies
    ● Add inclusion mechanisms for concept systems and ontologies
    ● Assert axioms in context of containing ontology
11
Why add support for ontologies?
● More precise specification of data semantics (than natural language definitions)
● Machine processing of semantic specifications of data
  ● Classification, subsumption testing, alignment, spatial, temporal reasoning
● Reusable semantic specifications for subject domains
● Conceptual data models to facilitate data integration
● Encoding of much current work on data semantics and terminologies as ontologies
● Useful for machine learning
12
Issues in Including Ontologies in ISO/IEC 11179
● Lack of agreement on logical formalisms
  ● FOL, description logic (which?), ...
● Hence, MDR std. must be agnostic among logic formalisms
● Poses difficulties for:
  ● Standards specification
  ● MDR implementation
  ● MDR interoperability
● See work of OMG Ontology Definition Metamodel (ODM) standard
13
Changes to ISO/IEC 11179 Std.
● Formalize specification of semantic relationships
  ● Refinement of Edition 2 Classification Schemes
  ● Add relationships (types), roles, links (instances) among concepts
  ● Specify attributes of relationships
    ● Reflexivity, irreflexivity, symmetry, anti-symmetry, transitivity
  ● To support inference across semantic relationships
    ● e.g., transitive closure over is-a, part-of, ...
14
Relationship Modeling in ISO/IEC 11179 Edition 3
● Edition 2 has classification schemes and specialized relationships among various metamodel entities
● Proposed for Edition 3
  ● Binary and N-ary semantic relationships among concepts (a.k.a. relations)
  ● Treat data element concept, conceptual value domain, value meaning, etc. as subtypes of concept
  ● More detailed characterization of relationships:
    ● Roles / links
    ● Reflexivity, symmetry, anti-symmetry, transitivity, ...
15
Why care about relationship characterization?
● Who cares about reflexivity, irreflexivity, symmetry, transitivity?
● Answer: need this information for inference on semantic relationships (usually binary)
  ● Example: Does it make sense to compute transitive closure?
    ● Is-a: transitive
    ● Part-of: sometimes transitive
    ● Equals: transitive, symmetric
    ● Similar: usually symmetric, typically not transitive
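
The point about relationship characterization can be made concrete with a small Python sketch. It is an illustration only, not the XMDR prototype's code: the `RelationType` class, the `infer_links` function, and the sample concepts are all hypothetical names introduced here. The idea is that a registry which records whether a relationship is transitive or symmetric can decide mechanically which inferences are safe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationType:
    """Hypothetical descriptor for a binary semantic relationship."""
    name: str
    transitive: bool = False
    symmetric: bool = False


def infer_links(rel, links):
    """Return the links entailed by the declared properties of `rel`."""
    inferred = set(links)
    if rel.symmetric:
        # Symmetry: every (a, b) also implies (b, a).
        inferred |= {(b, a) for (a, b) in inferred}
    if rel.transitive:
        # Naive fixed-point iteration; adequate for small examples only.
        changed = True
        while changed:
            changed = False
            for (a, b) in list(inferred):
                for (c, d) in list(inferred):
                    if b == c and (a, d) not in inferred:
                        inferred.add((a, d))
                        changed = True
    return inferred


# Assumed declarations, following the examples on the slide.
IS_A = RelationType("is-a", transitive=True)
SIMILAR = RelationType("similar", symmetric=True)

is_a_closure = infer_links(IS_A, {("poodle", "dog"), ("dog", "mammal")})
similar_closure = infer_links(SIMILAR, {("red", "orange"), ("orange", "yellow")})
```

With these declarations, `is_a_closure` gains the link `("poodle", "mammal")`, while `similar_closure` gains only the reversed links and never concludes that "red" is similar to "yellow".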
16
Semantic Types for ISO/IEC 11179
● ISO/IEC 11179 Edition 2 has “datatypes”
  ● Associated with “value domain”
  ● i.e., datatypes are an aspect of representation NOT semantics
● Semantic Types
  ● Concern meaning rather than representation
  ● Uses:
    ● Constraints over relationship roles
    ● Attribute of concepts, conceptual value domains, ...
    ● Ubiquitous in ontologies, schemas, ...
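
As a rough illustration of "constraints over relationship roles", the Python sketch below checks whether the concepts filling the roles of a link carry an acceptable semantic type. Everything in it is an assumption made for the example: the class names, the `SUBTYPE_OF` table, and the type labels are hypothetical and are not taken from ISO/IEC 11179 or from the XMDR prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    semantic_type: str  # meaning-level type, independent of any datatype


@dataclass
class RelationshipType:
    name: str
    role_types: dict  # role name -> required semantic type


# Hypothetical semantic type hierarchy (child -> parent).
SUBTYPE_OF = {"Organ": "AnatomicalStructure", "Tissue": "AnatomicalStructure"}


def conforms(actual, required):
    """True if `actual` equals `required` or is one of its subtypes."""
    while actual is not None:
        if actual == required:
            return True
        actual = SUBTYPE_OF.get(actual)
    return False


def check_link(rel, link):
    """Return a list of role violations for one link (role name -> Concept)."""
    problems = []
    for role, required in rel.role_types.items():
        concept = link.get(role)
        if concept is None:
            problems.append(f"missing role {role!r}")
        elif not conforms(concept.semantic_type, required):
            problems.append(
                f"{role!r} expects {required}, got {concept.semantic_type}"
            )
    return problems


part_of = RelationshipType(
    "part-of", {"part": "AnatomicalStructure", "whole": "AnatomicalStructure"}
)
liver = Concept("liver", "Organ")
body = Concept("body", "AnatomicalStructure")
usd = Concept("USD", "Currency")

ok = check_link(part_of, {"part": liver, "whole": body})
bad = check_link(part_of, {"part": usd, "whole": body})
```

Here `ok` is empty because "Organ" is a subtype of the required type, while `bad` reports one violation for the "part" role. The same check would apply unchanged to N-ary relationships, since roles are held in a dictionary rather than as a fixed pair.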
17
Some Issues for Semantic Types
● Alternative approaches:
  ● Build semantic types into 11179 metamodel
  ● Reuse relationships for semantic type specifications
  ● Treat semantic types as unary predicates in ontologies + axioms
● Should we have a standard set of semantic types (at least base types)?
  ● Yes, for interoperability
  ● No, for flexibility
● Collection types, type constructors?
18
Why Construct A Prototype?
● To explore alternative revisions to ISO/IEC 11179
● To demonstrate that proposed revisions to ISO/IEC 11179 Metadata Registry Std. are:
  ● Feasible
  ● Useful
● To experiment with alternative architectures / technologies for constructing extended metadata registries
  ● Text retrieval engines: Lucene
  ● Inference engines: Jena, Kowari (?), ...
  ● Service oriented architecture (SOA)
● To facilitate deployment of revised ISO/IEC Metadata Registries
  ● Example implementation
  ● Open Source Code!
19
Why Content?
● Content characterization assists in shaping revisions to ISO/IEC 11179
● Content characterization assists in selection of content to load
● Content ingestion, installation, querying provides a means to exercise the prototype
  ● Testing
  ● Demonstration
  ● Performance evaluation
  ● Utility evaluation
20
Metadata Content Activities
● Content Characterization
  ● e.g., graph theoretic characterization
● Content Acquisition
● Content Preprocessing
  ● Into standard formats for loading (H. Solbrig)
● Content Loading
● Content Querying
21
Desiderata for Content Selection
● Accessibility
  ● Licensing, source cooperation, unclassified
● Documentation, familiarity to XMDR collaborators
● Funder interest
● Diversity of metadata types, subject areas
● Diverse graph structures (of semantic relationships)
● OWL encodings available
● Moderate size
● Opportunities for mappings among metadata sets
● Multi-linguality
22
Content Characterization
● Provenance: Name, source, contact, ...
● Type of metadata: thesauri, ontology, ISO/IEC 11179 metadata registry, ...
● Graph Characterization
  ● Tree, faceted classification, partial order (directed acyclic graph), cyclic graph, ...
● Size: # concepts, # links, # bytes
● Definitions?
● File Formats
● OWL encoding?
● Multilingual
● Availability / licensing issues
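
One way to picture the graph characterization step is the Python sketch below, which sorts a set of (child, parent) links into the categories named above. It is only a minimal illustration under stated assumptions: the `characterize` function and the sample links are hypothetical, faceted classifications are not handled, and the talk does not say how the XMDR project actually performed this analysis.

```python
from collections import defaultdict


def characterize(links):
    """Classify (child, parent) links as 'tree', 'forest', 'dag', or 'cyclic'."""
    parents = defaultdict(set)
    children = defaultdict(set)
    nodes = set()
    for child, parent in links:
        parents[child].add(parent)
        children[parent].add(child)
        nodes.update((child, parent))

    # Kahn's algorithm: repeatedly remove nodes with no remaining parents.
    remaining = {n: len(parents[n]) for n in nodes}
    frontier = [n for n in nodes if remaining[n] == 0]
    removed = 0
    while frontier:
        node = frontier.pop()
        removed += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                frontier.append(child)
    if removed < len(nodes):
        return "cyclic"  # some nodes sit on a directed cycle

    if any(len(parents[n]) > 1 for n in nodes):
        return "dag"  # acyclic, but some concept has several parents
    roots = [n for n in nodes if not parents[n]]
    return "tree" if len(roots) == 1 else "forest"


tree_kind = characterize(
    {("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal")}
)
dag_kind = characterize(
    {("bat", "mammal"), ("bat", "flier"), ("mammal", "animal"), ("flier", "animal")}
)
cyclic_kind = characterize({("a", "b"), ("b", "c"), ("c", "a")})
```

The distinction matters for the registry: a tree admits simple parent pointers, a partial order needs multi-parent storage, and a cyclic graph rules out algorithms that assume a topological order.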
23
Why Graph-theoretic Content Characterization?
● Important structural taxonomy
● Impacts:
  ● Expressivity required of registry
  ● Content representation, index structures
  ● Search, matching algorithms
  ● Computational complexity of search, matching, ...
  ● Inference algorithms
  ● Computational complexity of inference
  ● Design / implementation / performance of metadata registries
24
Loaded content metadatasets
● National Cancer Institute Thesaurus (NCIT)
● Defense Technical Information Center (DTIC) Thesaurus
● General Multilingual Environmental Thesaurus (GEMET)
● Adult Mouse Anatomical Dictionary
● EPA Terms of the Environment
● ISO 3166 Country Codes
● ISO 4217 Currency Codes
25
Other Metadatasets of Interest
● NCI Cancer Data Standards Repository (caDSR)
● EPA Environmental Data Registry (EDR)
● NLM Unified Medical Language System (UMLS)
● USGS Geographic Names Information System (GNIS)
● Integrated Taxonomic Information System (ITIS)
● NBII Biocomplexity Thesaurus
● ISO 639 Language Identifiers
● Logical Observation Identifiers Names and Codes (LOINC)
● Getty Thesaurus of Geographic Names (TGN)
● NASA Semantic Web for Earth and Environmental Terminology (SWEET)
● Dublin Core Metadata (?)
26
Conclusions
● XMDR Activities
  ● ISO/IEC 11179 Revisions
    ● Support for ontologies, etc.
    ● Relationships
    ● Semantic types
  ● Prototype Development
  ● Content (characterization, loading, query)
  ● Prototype testing, performance evaluation, demos
27
Coming in Second Part of Talk (Kevin Keck):
● Detailed discussion of the architecture and technology of the prototype...
28
Acknowledgements
● Financial support from U.S. Dept. of Defense, U.S. Environmental Protection Agency
● In-kind contributions from U.S. National Cancer Institute, Mayo Clinic, U.S. Geological Survey
● Support from program managers: Nancy Lawler (DOD) and Sam Chance (DOD)
● Comments on drafts of this talk by John L. McCarthy
29
Contact Information:
● Project: http://xmdr.org/
● Frank Olken:
  ● Lawrence Berkeley National Laboratory
  ● Email: olken@lbl.gov
  ● Tel: 510-486-5891
  ● URL: http://www.lbl.gov/~olken