Extended Metadata Registries and Semantics (Part 2: Implementation)
Karlo Berket
Ecoterm IV Environmental Terminology Workshop
April 18, 2007
Diplomatic Academy of Vienna, Vienna, Austria
printed 7/14/2006 9:05 AM page 2 of xxx XMDR-Prototype-Progress-July-2006-v2.ppt
XMDR Prototype Progress Outline
  REST API
  Revised packaging of XMDR prototype code
  Content loading
  Demonstrate current XMDR Prototype
XMDR Prototype Modular Architecture
[Architecture diagram; component labels:]
  Users: Web browsers, client software
  Human User Interface (HTML from JSP and JavaScript; Exhibit)
  Application Program Interface (REST)
  Authentication Service
  Registry Store
  Search & Content Serving (Jena, Lucene)
  Logic Index; Text Index
  Logic Indexer (Jena & Pellet)
  Text Indexer (Lucene)
  Validation (XML Schema)
  Mapping Engine
  Content Loading & Transformation (Lexgrid & custom)
  Metadata Sources: concept systems, data elements
  XMDR metamodel (OWL & XML Schema); standard XMDR files
  Metamodel specs (UML & editing) (Poseidon, Protege)
  XMDR data model & exchange format: XML, RDF, OWL
REST API (Search Methods)
REST API (Search Results)
REST API (Registry methods)
REST API (Registry Results)
REST API (Method Parameters)
Revised XMDR Text and SPARQL Searches run using REST API
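As a rough illustration of running both search types through one REST interface, the sketch below builds GET request URLs for a text search and a SPARQL search. The base URL, the /search/text and /search/sparql paths, and the query parameter name are all hypothetical placeholders; the actual XMDR method names and parameters are on the REST API slides and are not reproduced here.

```python
from urllib.parse import urlencode

# Hypothetical base URL; not the real XMDR prototype address.
BASE_URL = "http://example.org/xmdr"

def search_url(kind, query, base_url=BASE_URL):
    """Build a GET URL for a 'text' or 'sparql' search.

    The path layout and the 'query' parameter name are assumptions
    made for illustration only.
    """
    if kind not in ("text", "sparql"):
        raise ValueError(f"unknown search kind: {kind!r}")
    # urlencode percent-encodes the query so it is safe inside a URL.
    return f"{base_url}/search/{kind}?{urlencode({'query': query})}"

text_request = search_url("text", "country code")
sparql_request = search_url(
    "sparql", "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
)
```

Both requests differ only in the path segment and the query string, which is what lets a single client-side form issue either kind of search.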
Results Display
Faceted Browsing of Search Results
  NOTE: only interface-specified attributes are included in results from text searches.
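Faceted browsing over a result set can be sketched as counting how many results carry each value of each attribute, then narrowing the set when a value is picked. This is a minimal sketch in plain Python, not the Exhibit-based interface the prototype uses; the attribute names and sample records are hypothetical. It also shows why the note matters: only attributes actually present in the results can act as facets.

```python
from collections import Counter

def facet_counts(results, facets):
    """Count how many results carry each value of each facet attribute."""
    counts = {facet: Counter() for facet in facets}
    for item in results:
        for facet in facets:
            if facet in item:  # attributes missing from a result are skipped
                counts[facet][item[facet]] += 1
    return counts

def narrow(results, facet, value):
    """Keep only the results whose facet attribute equals the chosen value."""
    return [item for item in results if item.get(facet) == value]

# Hypothetical search results with two interface-specified attributes.
sample_results = [
    {"name": "water", "source": "GEMET", "type": "concept"},
    {"name": "water quality", "source": "GEMET", "type": "concept"},
    {"name": "Water", "source": "NCI Thesaurus", "type": "concept"},
]
sample_counts = facet_counts(sample_results, ["source", "type"])
gemet_only = narrow(sample_results, "source", "GEMET")
```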
Content loading, with XMDR metamodel used for inferred indexing and validation
[Diagram; labels grouped by stage:]
  Content: Terminology A, Terminology B, Thesaurus C, Terminology D, Data Element Source E, Terminology Source F, Ontology Source G, External Source H
  Transformation: Lexgrid; XSLT script E, XSLT script F, XSLT script G, XSLT script G
  Validation: XMDR metamodel in XML Schema, from OWL
  Registry information: XMDR Files A through G, XMDR Files H (virtual); Subversion
  Indexing: Reasoner (Pellet); Text Indexing (Lucene); Inferred Logic Index; Asserted Logic Index; Full Text; Text Index; Search & Inference Framework (Jena)
Streamlined Content Loading
  Data loading process enables parallel processing
    – (1) create inferred triples from raw files
    – (2) load everything to DB
    – (3) create text index from DB rather than from raw XMDR files
    – the old process created the text index from raw XMDR files, then inferred and loaded
  XML Schema is not needed to know item mappings
  XMDR software uses separate Jena models for different concept systems, so loading can be done in parallel on different machines
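The three loading steps and the per-concept-system parallelism can be sketched as follows. This is a toy illustration in Python under stated assumptions, not the prototype's code: the real system uses Jena, Pellet and Lucene, while here the "reasoner", the "DB" and the "text index" are hypothetical in-memory stand-ins, and threads on one machine stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

def infer_triples(raw_triples):
    """Toy 'reasoner' (assumption): add the inverse of every 'related' triple."""
    inferred = {(o, p, s) for (s, p, o) in raw_triples if p == "related"}
    return set(raw_triples) | inferred

def load_concept_system(name, raw_triples):
    """Run steps (1)-(3) for one concept system in its own independent model."""
    model = infer_triples(raw_triples)   # (1) raw files -> inferred triples
    store = {name: model}                # (2) load everything to the "DB"
    text_index = {}                      # (3) text index built from the DB
    for s, p, o in store[name]:
        for token in (s, o):
            text_index.setdefault(token.lower(), set()).add((s, p, o))
    return name, store[name], text_index

def load_all(concept_systems, max_workers=4):
    """Load every concept system in parallel; models share no state."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(load_concept_system, name, triples)
            for name, triples in concept_systems.items()
        ]
        return {f.result()[0]: f.result()[1:] for f in futures}
```

Because each concept system gets its own model and index, no locking is needed between workers, which is the property that makes it possible to spread the work across machines.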
Example content currently loaded into XMDR Prototype
  Concept Systems via Lexgrid
    – NBII_ (biodiversity)
    – NCI_Thesaurus_06.02d (health)
    – GEMET_ (Multilingual Environmental Thesaurus)
    – ISO4217_1981 (currency codes)
    – ISO3166_V-10 (country codes; only 2 letter codes)
    – Mouse_1.32 (anatomy)
    – DTIC_1.0 (Department of Defense)
    – Portions of EPA controlled vocabulary
    – SIC and NAICS
  Concept Systems & Ontologies via special purpose scripts
    – Omega ontology (reloading)
  Data Element Registries
    – caDSR (full NCI Cancer Data Standards Registry via ca-Core API) (reloading)
Load Times
Load Times (w/out NCI Thesaurus)
Load Times (zoom in)
Loading time for Omega Ontology
  Omega is a “terminological ontology”
    – reorganization & synthesis of WordNet & Mikrokosmos
    – adds higher level ontology to organize multiple ontologies
  Ready to try reloading
    – 1st try required over a week to process & load 4m files, ~250k/24 hrs
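A quick check of the Omega figures, assuming "4m" means about 4 million files and "~250k/24 hrs" means roughly 250,000 files processed per day (both readings are assumptions about the slide's shorthand):

```python
def estimated_load_days(total_files, files_per_day):
    """Days needed to process total_files at a constant daily rate."""
    return total_files / files_per_day

# Assumed readings of the slide's shorthand: 4m = 4,000,000; 250k = 250,000.
omega_days = estimated_load_days(4_000_000, 250_000)
```

Under those assumptions the load works out to 16 days, which is consistent with the slide's "over a week".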
Demo
  Tomorrow morning
XMDR Software
  Split into 2 packages
    – XMDR Core code (general purpose for RDF/OWL)
        Can be used with any RDF/OWL data files
    – 11179-specific code
        Smaller set of software is easier to replace when model changes, etc.
  Release featuring these changes and more
    – End of April 2007
XMDR Prototype Web Site has downloadable code & content
  Note tabs for other sections!