Published by Aubrey Horn. Modified over 9 years ago.
1
1 eXtended Metadata Registry (XMDR)
Interagency/International Cooperation on Ecoinformatics
Ispra, Italy, January 17, 2006
Bruce Bargmeyer, Lawrence Berkeley National Laboratory, University of California
Tel: +1 510-495-2905
bebargmeyer@lbl.gov
2
2 XMDR Project Collaboration
• Collaborative, interagency effort
  - EPA, USGS, NCI, Mayo Clinic, DOD, LBNL … & others
• Draws on and contributes to Interagency/International Cooperation on Ecoinformatics
• Involves Ecoterm; international, national, state, and local government agencies; and other organizations as content providers and potential users
• Interacts with many organizations around the world through ISO/IEC standards committees
3
3 XMDR Project Results: Bootstrapping Semantic Computing
• Design for next generation metadata registries, expressed as a standard
• XMDR Prototype, open source software
• Content loaded in prototype: millions of concepts, terms, and relations between concepts
• Demonstrations for healthcare and the environment
4
4 Metadata Registry Extensions
• Register (and manage) any semantics that are useful for managing data.
  - E.g., this may include registering not only permissible values (concepts) and definitions, but may extend to registering the full concept systems in which the permissible values are found.
  - E.g., one may want to register keywords, thesauri, taxonomies, ontologies, axiomatized ontologies, …
• Support traditional data management and data administration
• Lay the foundation for semantic computing: Semantics Service Oriented Architecture, Semantic Grids, semantics-based workflows, Semantic Web, …
5
5 Where have we been? Where are we planning to go?
Diagram labels: System manuals; Data dictionaries; 11179 E1; 11179 E3; XML & related standards; Semantic grids; 11179 E2; Semantics services (SSOA); Complex semantics management; Data engineering; Data Standards; XMDR Project; Semantics: Semantic Web; Data + ontology lifecycle management; Terminologies, ontologies; Data Management/Data Administration
6
6 XMDR Draws Together
Diagram labels: Metadata Registry; Terminology; Thesaurus Themes; Data Standards; Ontology; GEMET; Structured Metadata; Users; Registries; Terminology
Semiotic triangle labels: CONCEPT; Referent; Refers To; Symbolizes; Stands For; “Rose”, “ClipArt”
7
7 Concept System Store
Diagram labels: Metadata Registry; Concept System; Thesaurus Themes; Data Standards; Ontology; GEMET; Structured Metadata; Users
Concept systems (essentially graphs: node-relation-node + axioms):
• Keywords
• Controlled Vocabularies
• Thesauri
• Taxonomies
• Ontologies
• Axiomatized Ontologies
8
8 Management of Concept Systems
Diagram labels: Metadata Registry; Concept System; Thesaurus Themes; Data Standards; Ontology; GEMET; Structured Metadata; Users
Concept system:
• Registration
• Harmonization
• Standardization
• Acceptance (vetting)
• Mapping (correspondences)
9
9 Life Cycle Management
Diagram labels: Metadata Registry; Concept System; Thesaurus Themes; Data Standards; Ontology; GEMET; Structured Metadata; Users
Life cycle management: data and concept systems (ontologies)
10
10 Grounding Semantics
Diagram labels: Metadata Registry; Concept System; Thesaurus Themes; Data Standards; Ontology; GEMET; Structured Metadata; Users; Registries; Semantic Web; Ontologies
RDF Triples:
• Subject (node URI)
• Verb (relation URI)
• Object (node URI)
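The subject/verb/object structure of an RDF triple can be sketched in a few lines of plain Python. This is only an illustration, not the XMDR or Jena representation: the `Triple` type, the `objects_of` helper, the `http://example.org/concepts#` namespace, and the concept names are all assumptions made for the example. Only the `rdfs:subClassOf` URI is a real, standard identifier.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    subject: str  # node URI
    verb: str     # relation URI (RDF calls this the predicate)
    obj: str      # node URI


# Hypothetical namespace and concept names, for illustration only.
EX = "http://example.org/concepts#"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# A graph is simply a set of triples: node-relation-node.
graph: Set[Triple] = {
    Triple(EX + "Marsh", SUBCLASS_OF, EX + "Wetland"),
    Triple(EX + "Wetland", SUBCLASS_OF, EX + "LandRegion"),
}


def objects_of(triples: Set[Triple], subject: str, verb: str) -> Set[str]:
    """Return every object linked to `subject` by `verb` in a single step."""
    return {t.obj for t in triples if t.subject == subject and t.verb == verb}


print(objects_of(graph, EX + "Marsh", SUBCLASS_OF))
```

Because every node and relation is named by a URI, triples from different registries can be merged into one graph without renaming anything.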
11
11 XMDR Prototype Architecture: Initial Modules
Diagram labels:
• Ontology Editor; Protege; 11179 OWL Ontology
• Modules: MetadataValidator; AuthenticationService; MappingEngine; Registry; External Interface; RetrievalIndex; FullTextIndex; LogicBasedIndex; RegistryStore; WritableRegistryStore
• Technologies: Jena, Xerces; Java; Lucene; Jena, OWI KS; Racer; Subversion
• Legend: Generalization; Composition (tight ownership); Aggregation (loose ownership)
12
12 XMDR Prototype Architecture: Initial Implemented Modules
Diagram labels:
• Ontology Editor; Protege; 11179 OWL Ontology
• Modules: Registry; External Interface; RetrievalIndex; FullTextIndex; LogicBasedIndex; RegistryStore; WritableRegistryStore
• Technologies: Java; Lucene; Jena; Racer, etc.; Subversion
• Legend: Generalization; Composition (tight ownership); Aggregation (loose ownership)
13
13 UML is Used for 11179 Metamodel, XMDR uses OWL, RDF & XML Schema
Diagram labels: OWL XMDR Ontology & annotations; XMDR’s Relax NG Schema; XMDR XML Schema; UML 11179 Metamodel; 11179 Relational Schema; Relational Metadata; RDF Spec; Trang; XML Schema Language spec; XML
Annotations:
• Objects, Types & Cardinalities
• What things go in own files?
• Which property direction stored?
• Sequential ordering of properties
• Triples: binary labeled relationships
14
14 Refined XMDR Subclasses Improve Organization & Enable Inference
15
15 XMDR Example Content Loaded from Diverse Sources via LexGrid & XSLT
Diagram labels: Original Source A; LexGrid Source A; XSLT script; Harold Solbrig (Mayo Clinic); Concept System A (A Concepts, A Relationships)
Content loaded to date: 2.7 million triples
16
16 XMDR Content List (partial)
• NBII Biocomplexity Thesaurus
• NCI Thesaurus (National Cancer Institute Thesaurus)
• NCI Data Elements (National Cancer Institute Data Standards Registry)
• UMLS (non-proprietary portions)
• GEMET (General Multilingual Environmental Thesaurus)
• EDR Data Elements (Environmental Data Registry)
• USGS Geographic Names Information System (GNIS)
• HL7 Terminology, Data Elements
• Mouse Anatomy
• GO (Gene Ontology)
• EPA Web Registry Controlled Vocabulary
• BioPAX Ontology
• NASA SWEET Ontologies
• …
17
17 NASA-JPL Semantic Web for Earth and Environmental Terminology
• SWEET written in OWL ontology language (W3C)
  - Can view with Internet Explorer 5+, Netscape 7+, etc.
  - Can also use OWL-specific tools (e.g., SWOOP, Protégé)
• Terms in other taxonomies can be mapped to SWEET using
  - Global Change Master Directory (GCMD)
  - CF Standard Names
• http://sweet.jpl.nasa.gov/ontology/
  - Earth Realms
  - Physical Phenomena (any transient feature)
  - Physical Processes
  - Physical Properties
  - Physical Substances
  - Sun Realms
  - Biosphere
  - Data
  - Data Centers
  - Human Activities
  - Material Things
  - Numerics
  - Sensors
  - Space
  - Time
  - Units
18
18 Content Loaded from EPA EDR and NASA SWEET
Diagram labels: Ontology concepts & relationships; XMDR ontology; SWEET (OWL); java; EDR; XMDR files (ontologies)
19
19 What happens to XMDR files before they can be used for text searching or inference?
Diagram labels:
• Concept System A (A Concepts, A Relationships); Concept System B (B Concepts, B Relationships); etc. … [all xmdr files]
• Lucene → Lucene indexes → Text queries (Lucene)
• Jena → Model A, Model B, XMDR Ontology, … etc. [each system (A, B, … etc.) loaded individually] → Union of all models → Inference queries (Jena)
• Search/query results are sets of URLs for the xmdr files pictured above
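The two preparation steps on this slide, building a text index over all files and merging per-system models into one union graph, can be sketched in plain Python. This is a toy stand-in under stated assumptions, not Lucene or Jena: the inverted index only splits on whitespace, a "model" is just a set of (subject, verb, object) tuples, and the file URLs, text, and helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Model = Set[Tuple[str, str, str]]  # a model is a set of triples


def build_text_index(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase word to the set of file URLs whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in files.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def text_query(index: Dict[str, Set[str]], *words: str) -> Set[str]:
    """Return the URLs of files that contain every query word."""
    hits = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*hits) if hits else set()


def union_model(*models: Model) -> Model:
    """Merge individually loaded models into one graph for inference queries."""
    merged: Model = set()
    for model in models:
        merged |= model
    return merged


# Hypothetical file URLs and contents, for illustration only.
files = {
    "http://example.org/xmdr/a1.xml": "Wetland marsh",
    "http://example.org/xmdr/b7.xml": "wetland forest",
}
index = build_text_index(files)

model_a: Model = {("Marsh", "subClassOf", "Wetland")}
model_b: Model = {("Wetland", "subClassOf", "LandRegion")}
all_models = union_model(model_a, model_b)

print(text_query(index, "wetland", "marsh"))
```

As on the slide, a text query returns a set of file URLs rather than the file contents, and inference runs over the union so that relations can cross concept-system boundaries.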
20
20 Enterprise Vocabulary Services (EVS) Concepts Unite NCI MDR
Diagram labels:
• Object Class: Chemopreventive Agent
• Property: NSCNumber
• Conceptual Domain: Agent
• Data Element Concept: Chemopreventive Agent NSC Number
• Data Element: Chemopreventive Agent Name
• Value Domain: NSC Code
• Context: caCORE
• Representation: Code
• Classification Schemes: caDSRTraining
• Valid Values: Cyclooxygenase Inhibitor, Doxercalciferol, Eflornithine, …, Ursodiol
But how can we search/query such a complex system of metadata and vocabularies?
21
21 How to Search/Query Complex Concepts & Relationships
Diagram legend: New Proposed Objects; Current 11179 Objects
22
22 How Can Terminologies and Ontologies Help Manage Metadata?
• At the metadata registry schema level (ISO/IEC 11179 metamodel)
  - Ontologies specify formal relationships
  - Compute across the nodes and relations in the metamodel
    · Inheritance, aggregation, …
  - Search sub-classes & inverses, specify semantic pathways for indexing
• At the level of metadata (concept system) instances in a registry
  - Compute across the nodes, relations and axioms in concept systems
  - Connect metadata entities via shared terms
    · Via automatic indexing of metadata words
    · Via text values from specific metadata elements
23
23 XMDR RDF Graph Query Facilities Complement Text Query Capabilities
• SQL-like queries
  - e.g., names of ontologies in a registry
• Span items that are only indirectly connected
  - e.g., data elements associated with a conceptual domain
• Expand queries to subsumed classes in hierarchy
  - e.g., ConceptualDomain includes EnnumeratedConc..
• Transitivity
  - e.g., all subclasses subsumed by a higher order class
  - e.g., all superclasses (ancestors) of a particular class
• Least common ancestor
  - e.g., closest subsuming concept for 2 concepts
24
24 Example Subclass Queries: (Inference with Transitivity)
• Environmental:
  - What are all the (sub)types of Wetland (in SWEET)?
    RDQL: SELECT ?x WHERE (?x rdfs:subClassOf earthrealm:Wetland) USING earthrealm FOR
• Health
  - Find all the types of "Lung Carcinoma"
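What "inference with transitivity" adds to a subclass query can be sketched as a breadth-first walk down the hierarchy, so that indirect subtypes are returned along with direct ones. This is a plain-Python illustration, not RDQL or the Jena reasoner. The `all_subclasses` helper is an assumption, and the class names below Wetland are hypothetical rather than taken from SWEET.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def all_subclasses(subclass_edges: Iterable[Tuple[str, str]], root: str) -> Set[str]:
    """Return every direct or indirect subclass of `root`.

    Each edge is a (subclass, superclass) pair, i.e. one rdfs:subClassOf triple.
    """
    children = defaultdict(set)
    for child, parent in subclass_edges:
        children[parent].add(child)

    found: Set[str] = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in found:  # guards against revisiting shared subclasses
                found.add(child)
                queue.append(child)
    return found


# Hypothetical hierarchy, for illustration only.
edges = [
    ("Marsh", "Wetland"),
    ("Swamp", "Wetland"),
    ("SaltMarsh", "Marsh"),
    ("Forest", "LandRegion"),
]

print(sorted(all_subclasses(edges, "Wetland")))
```

Without transitivity the same query would return only Marsh and Swamp; SaltMarsh appears only because the walk follows the chain through Marsh.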
25
25 More Complex “Sibling” Queries: Concepts with Multiple Ancestors
• Health
  - Find all the siblings of Breast Neoplasm
    · Note: This is complex, since Breast Neoplasm has two parents, Neoplasm by Site and Breast Disorder. The result would include both the Neoplasm by Site siblings, such as Eye Neoplasm and Respiratory System Neoplasm, and the Breast Disorder siblings, such as Non-Neoplastic Breast Disorder.
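The sibling query can be sketched as two steps: collect every parent of the concept, then collect every other child of those parents. The sketch below is plain Python over (child, parent) pairs, not a query against the NCI Thesaurus. The `siblings` helper is an assumption, and the edge list contains only the concepts named on this slide.

```python
from typing import Iterable, Set, Tuple


def siblings(edges: Iterable[Tuple[str, str]], concept: str) -> Set[str]:
    """Return all concepts that share at least one parent with `concept`."""
    edge_list = list(edges)
    # Step 1: a concept may have several parents.
    parents = {parent for child, parent in edge_list if child == concept}
    # Step 2: gather the other children of any of those parents.
    return {child for child, parent in edge_list
            if parent in parents and child != concept}


# Only the relationships mentioned on the slide.
edges = [
    ("Breast Neoplasm", "Neoplasm by Site"),
    ("Breast Neoplasm", "Breast Disorder"),
    ("Eye Neoplasm", "Neoplasm by Site"),
    ("Respiratory System Neoplasm", "Neoplasm by Site"),
    ("Non-Neoplastic Breast Disorder", "Breast Disorder"),
]

print(sorted(siblings(edges, "Breast Neoplasm")))
```

The result mixes siblings from both parents, which is exactly the complication the slide points out.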
26
26 Least Common Ancestor Queries: (Inference with Transitivity)
• Health:
  - "Morphine Sulfate" and "Acetaminophen"
    · The least common ancestor should be Analgesic Agent (with multiple intervening concepts).
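One way to compute a least common ancestor in a hierarchy where concepts may have several parents is to measure the distance from each concept to all of its ancestors and pick the shared ancestor with the smallest combined distance. This is a sketch of that one definition of "closest", not necessarily the one the XMDR prototype uses. The intervening concepts ("Intermediate A", "Intermediate B") and the "Root" node are placeholders, since the slide does not name them.

```python
from collections import deque
from typing import Dict, List, Set


def ancestor_distances(parents_of: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Breadth-first distances from `start` to itself and every ancestor."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents_of.get(node, ()):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def least_common_ancestors(parents_of: Dict[str, List[str]], a: str, b: str) -> Set[str]:
    """Return the shared ancestor(s) with the smallest combined distance to a and b."""
    da = ancestor_distances(parents_of, a)
    db = ancestor_distances(parents_of, b)
    common = da.keys() & db.keys()
    if not common:
        return set()
    best = min(da[c] + db[c] for c in common)
    return {c for c in common if da[c] + db[c] == best}


# Placeholder intermediates: the slide only says "multiple intervening concepts".
parents_of = {
    "Morphine Sulfate": ["Intermediate A"],
    "Intermediate A": ["Analgesic Agent"],
    "Acetaminophen": ["Intermediate B"],
    "Intermediate B": ["Analgesic Agent"],
    "Analgesic Agent": ["Root"],
}

print(least_common_ancestors(parents_of, "Morphine Sulfate", "Acetaminophen"))
```

The function returns a set because, with multiple parents, two different ancestors can tie for closest.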
27
27 Searching caDSR for Data Elements via Concepts and Vice-Versa
• Common Data Elements (CDEs) are 'connected' to concepts through the Object Class and Property of the CDE. A query such as this should look for the CDE's Object Class derivation rule and select only those data elements associated with those object classes.
• Alternatively, you could query the caDSR Concept Class and find all related Object Classes (OCs) where the concept was flagged as "primary concept", then get all the Data Elements by leveraging the ISO 11179 relationships: an Object Class has related Data Element Concepts (DECs), and DECs have related Data Elements (DEs).
• Concepts can also be associated with Value Meanings. So, search the Concept Class with the concept code, find all related Value Meanings, find all Value Domains that use those Value Meanings, and find all Data Elements that use those Value Domains.
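The two search paths described here (concept → Object Class → Data Element Concept → Data Element, and concept → Value Meaning → Value Domain → Data Element) can be sketched as a chain of lookups. This is a minimal illustration over in-memory dictionaries, not the caDSR API. The dictionary layout, the function name, and the concept codes "X-AGENT" and "X-DOXER" are assumptions, while the item names reuse the example from the earlier NCI MDR slide.

```python
from typing import Dict, List, Set

# Toy registry; the structure and the concept codes are hypothetical.
registry: Dict[str, Dict] = {
    "object_class_primary_concept": {"Chemopreventive Agent": "X-AGENT"},
    "dec_object_class": {"Chemopreventive Agent NSC Number": "Chemopreventive Agent"},
    "de_dec": {"Chemopreventive Agent Name": "Chemopreventive Agent NSC Number"},
    "value_meaning_concept": {"Doxercalciferol": "X-DOXER"},
    "value_domain_meanings": {"NSC Code": ["Doxercalciferol", "Ursodiol"]},
    "de_value_domain": {"Chemopreventive Agent Name": "NSC Code"},
}


def data_elements_for_concept(reg: Dict[str, Dict], concept_code: str) -> Set[str]:
    """Find data elements linked to a concept by either path from the slide."""
    # Path 1: concept -> Object Classes -> Data Element Concepts -> Data Elements
    ocs = {oc for oc, code in reg["object_class_primary_concept"].items()
           if code == concept_code}
    decs = {dec for dec, oc in reg["dec_object_class"].items() if oc in ocs}
    via_object_class = {de for de, dec in reg["de_dec"].items() if dec in decs}

    # Path 2: concept -> Value Meanings -> Value Domains -> Data Elements
    vms = {vm for vm, code in reg["value_meaning_concept"].items()
           if code == concept_code}
    vds = {vd for vd, meanings in reg["value_domain_meanings"].items()
           if vms & set(meanings)}
    via_value_domain = {de for de, vd in reg["de_value_domain"].items() if vd in vds}

    return via_object_class | via_value_domain


print(data_elements_for_concept(registry, "X-AGENT"))
print(data_elements_for_concept(registry, "X-DOXER"))
```

Both calls reach the same data element by different routes, which is why a registry search needs to follow more than one relationship chain.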
28
28 Reasoners Use OWL Ontologies to Augment RDF Graph Queries
Diagram labels: OWL 11179 Ontology; OWL built-in rules; RDF Query (rdql/nrdql/SPARQL); Reasoner; Jena (main memory); result set includes subclasses, inverses, etc.; 11179 metadata (xml/rdf files)
Jena is a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS and OWL, including a rule-based inference engine. Jena is open source and grew out of work with the HP Labs Semantic Web Programme.
Introduction: http://www-128.ibm.com/developerworks/java/library/j-jena/
Jena API Overview: http://www.jdocs.com/jena/2.1/api/overview-summary.html
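How a reasoner makes a result set "include subclasses, inverses, etc." can be sketched as forward chaining: apply rules repeatedly until no new triples appear, then query the enlarged graph. The sketch below is plain Python with only two rules (transitive subClassOf and inverse properties). It is not Jena's inference engine, and the relation names `usesDomain` and `usedBy` and the sample triples are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def infer(triples: Set[Triple], inverse_of: Dict[str, str]) -> Set[Triple]:
    """Materialize inferred triples until a fixed point is reached."""
    inferred = set(triples)
    while True:
        new: Set[Triple] = set()
        for s, p, o in inferred:
            # Rule 1: inverse properties, (s p o) implies (o inverse(p) s).
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
            # Rule 2: subClassOf is transitive.
            if p == "subClassOf":
                for s2, p2, o2 in inferred:
                    if p2 == "subClassOf" and s2 == o:
                        new.add((s, "subClassOf", o2))
        if new <= inferred:  # nothing new was derived, so stop
            return inferred
        inferred |= new


# Hypothetical metadata triples and inverse declarations.
facts: Set[Triple] = {
    ("A", "subClassOf", "B"),
    ("B", "subClassOf", "C"),
    ("de1", "usesDomain", "vd1"),
}
inverses = {"usesDomain": "usedBy", "usedBy": "usesDomain"}

closure = infer(facts, inverses)
print(sorted(closure - facts))
```

A query for the subclasses of C over `closure` now finds A as well as B, and a query starting from `vd1` can follow `usedBy` even though only `usesDomain` was stated.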
29
29 Comparison of Different Reasoners (on 2.7m triples)
30
30 Challenges and Future Goals for XMDR Prototype
• Scalability & performance
• Tools
  - RDF tool adaptation for metadata registries
  - User-friendly interface
  - Form interface for registration & uploading metadata
• References to externally maintained sources
  - Data, ontologies, terminologies
• Evaluate alternative technologies
  - For different modules
• Demonstrate for key use cases and ecoinformatics applications
31
31 Challenges and Future Goals (cont.)
• Progress proposals through standards committees
• Harmonization with W3C and OMG standards
• Incorporate Common Logic, Web Services, etc.
• Ontology Lifecycle Management (OLM)
• Improve link of concepts to data
• Generate schemas from axiomatized ontologies
32
32 Ecoinformatics Challenges
• How does this fit into the research, development, and demonstration activities of the Interagency/International Cooperation on Ecoinformatics?
• Should this be a part of the EU-US collaborative R&D?