Ontologies: Introduction and Some Uses
Boyan Brodaric, Bertram Ludäscher
Uses of Concept Spaces / Ontologies

Concept Browsing and Searching
    find concept C, find all related concepts D; display: C ==R==> D

“Smart Data Discovery”
    find instances of data sets X that are related to C:
        X = {....your-tagged-data-here...} ==R==> C
    searching for instances of D ...
    ... and knowing that C ==IS-A==> D ...
    we can find X!
    => requires “Smart Source Registration”

Integrated Views and Querying
    access, iterate over, aggregate, group-by, ... concepts
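The “Smart Data Discovery” idea above can be sketched in a few lines of Python. This is only an illustration: the concept names, the `IS_A` and `REGISTERED` tables, and the functions `ancestors` and `discover` are hypothetical and not part of any system described in these slides.

```python
# Hypothetical IS-A edges (concept -> its direct parents); names are illustrative only.
IS_A = {
    "purkinje_cell": ["neuron"],
    "neuron": ["cell"],
}

# Hypothetical source registrations: data set -> concept it was tagged with.
REGISTERED = {
    "dataset_X": "purkinje_cell",
    "dataset_Y": "cell",
}


def ancestors(concept):
    """Return the concept itself plus everything reachable via IS-A edges."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(IS_A.get(c, []))
    return seen


def discover(target):
    """Find data sets tagged with `target` or with any concept that IS-A `target`."""
    return sorted(ds for ds, c in REGISTERED.items() if target in ancestors(c))


if __name__ == "__main__":
    # dataset_X is tagged with purkinje_cell, which IS-A neuron, so it is found.
    print(discover("neuron"))
```

A search for instances of D ("neuron") finds X even though X was registered under the more specific concept C ("purkinje_cell"), which is why the source has to be registered against the concept space first.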
Glue Knowledge for Semantic Mediation: Unified Medical Language System (UMLS)

Started by National Library of Medicine in 1986 ...

    to aid the development of systems that help health professionals and researchers retrieve and integrate electronic biomedical information from a variety of sources and to make it easy for users to link disparate information systems, including computer-based patient records, bibliographic databases, factual databases, and expert systems.

    The UMLS project develops "Knowledge Sources" that can be used by a wide variety of applications programs to overcome retrieval problems caused by differences in terminology and the scattering of relevant information across many databases.
Medical Subject Headings (MeSH) Tree Structures
Finding out about .... Paleontology
    Find *paleo*
    Find related concepts
Combining Ontologies: UMLS and Gene Ontology
UMLS Concept Space as Relational Tables

concept(CUI, LUI, SUI, STR)
    CUI = concept ID
    LUI = lexical ID
    SUI = string ID
    STR = string representation

relationship(CUI1, REL, CUI2, RELA, SAB, SL)
    REL     = {chd (child), par (parent), sib (sibling), ...}
    RELA    = {isa, has_part, adjacent_to, contains, contained_in, ...}
    SAB, SL = origin of definition (MeSH2001)
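To make the two-table layout concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The column names follow the slide; everything else is an assumption made for illustration: the rows are made up (not real UMLS identifiers), `DEMO` is a placeholder source tag, the helper `related` is hypothetical, and the sketch assumes the reading "CUI1 RELA CUI2" for the direction of a relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (CUI TEXT, LUI TEXT, SUI TEXT, STR TEXT);
CREATE TABLE relationship (CUI1 TEXT, REL TEXT, CUI2 TEXT,
                           RELA TEXT, SAB TEXT, SL TEXT);
""")

# Made-up rows for illustration; these are NOT real UMLS identifiers.
conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?)", [
    ("C1", "L1", "S1", "brain"),
    ("C2", "L2", "S2", "cerebellum"),
    ("C3", "L3", "S3", "cerebellar cortex"),
])
# Assumed direction: CUI1 has_part CUI2, with CUI2 the child of CUI1.
conn.executemany("INSERT INTO relationship VALUES (?, ?, ?, ?, ?, ?)", [
    ("C1", "chd", "C2", "has_part", "DEMO", "DEMO"),
    ("C2", "chd", "C3", "has_part", "DEMO", "DEMO"),
])


def related(string, rela):
    """Return the strings of concepts directly related to `string` via RELA = rela."""
    rows = conn.execute("""
        SELECT c2.STR
        FROM concept c1
        JOIN relationship r ON r.CUI1 = c1.CUI
        JOIN concept c2 ON c2.CUI = r.CUI2
        WHERE c1.STR = ? AND r.RELA = ?
        ORDER BY c2.STR
    """, (string, rela)).fetchall()
    return [s for (s,) in rows]


if __name__ == "__main__":
    print(related("brain", "has_part"))
```

Looking up related concepts is then a join from a string to its concept ID, across `relationship`, and back to a string.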
Model-Based Mediator Architecture

[Architecture diagram; labels recovered from the figure:]
    USER/Client
    CM Queries & Results (exchanged in XML)
    CM (Integrated View); Integrated View Definition IVD
    Mediator Engine: FL rule proc., LP rule proc., Graph proc., XSB Engine
    GCM
    Domain Maps DMs, Process Maps PMs, “Glue” Maps GMs
    semantic context CON(S)
    (XML-Wrapper) CM-Wrapper; CM S1, CM S2, CM S3
    CM(S) = OM(S)+KB(S)+CON(S)

First results & Demos: KIND prototype, formal DM semantics, PMs
[SSDBM00] [VLDB00] [ICDE01] [NIH-HB01] [BNCOD02] [ER02] [EDBT02] [BioInf02]
Source Contextualization & DM Refinement

In addition to registering (“hanging off”) data relative to existing concepts, a source may also refine the mediator’s domain map...
    sources can register new concepts at the mediator ...
Demonstration: Using Ontologies in Queries/Views

find data sets that are “inside” X
    inside = logical_inside PLUS spatially_inside
    logical_inside uses UMLS and NEURONAMES
    spatially_inside uses Oracle-Spatial
visualize @ client
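The combined “inside” predicate can be sketched as follows. This is a toy stand-in, not the demo's implementation: `PART_OF` replaces the UMLS / NEURONAMES lookups, axis-aligned bounding boxes in `EXTENT` replace the Oracle Spatial test, all names and coordinates are invented, and "PLUS" is read here as a logical OR of the two tests (an assumption).

```python
# Hypothetical part-of edges standing in for the UMLS / NEURONAMES lookups.
# Assumed to be acyclic.
PART_OF = {
    "purkinje_layer": "cerebellar_cortex",
    "cerebellar_cortex": "cerebellum",
}

# Hypothetical bounding boxes (xmin, ymin, xmax, ymax) standing in for the
# geometries a spatial database would hold.
EXTENT = {
    "cerebellum": (0, 0, 10, 10),
    "slice_17": (2, 2, 4, 4),
    "slice_99": (8, 8, 12, 12),
}


def logical_inside(a, b):
    """True if b is reached from a by following part-of edges."""
    while a in PART_OF:
        a = PART_OF[a]
        if a == b:
            return True
    return False


def spatially_inside(a, b):
    """True if the box of a lies entirely within the box of b."""
    if a not in EXTENT or b not in EXTENT:
        return False
    ax0, ay0, ax1, ay1 = EXTENT[a]
    bx0, by0, bx1, by1 = EXTENT[b]
    return bx0 <= ax0 and by0 <= ay0 and ax1 <= bx1 and ay1 <= by1


def inside(a, b):
    """Combined test: inside by the ontology OR inside by geometry."""
    return logical_inside(a, b) or spatially_inside(a, b)


if __name__ == "__main__":
    for item in ("purkinje_layer", "slice_17", "slice_99"):
        print(item, inside(item, "cerebellum"))
```

Here `purkinje_layer` qualifies through the ontology alone, `slice_17` through geometry alone, and `slice_99` through neither because its box extends past the region.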
Query Processing Demo

Mediator View Definition
(provided by the domain expert and mediation engineer; deductive OO language, here: F-logic)

    DERIVE
        protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
    WHERE
        I:protein_label_image[ proteins ->> {Protein};
                               organism -> Organism;
                               anatomical_structures ->>
                                   {AS:anatomical_structure[name->Anatom]}],   % from PROLAB
        NAE:neuro_anatomic_entity[name->Anatom;                                % from ANATOM
                                  located_in->>{Brain_region}],
        AS..segments..features[name->Feature_name; value->Value].

Contextualization: CON(Result) wrt. ANATOM
Query results in context
Inside Query Evaluation: Another Example

“How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?”

push selection
    @SENSELAB: X1 := select targets of “output from parallel fiber”;
determine source context
    @MEDIATOR: X2 := “find and situate” X1 in ANATOM Domain Map;
compute region of interest (here: downward closure)
    @MEDIATOR: X3 := subregion-closure(X2);
    @NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors);
compute protein distribution
    @MEDIATOR: X5 := compute aggregate(X4);
display in context
    @MEDIATOR/GUI: display X5 in context (ANATOM)
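The plan X1 to X5 can be traced with a small in-memory sketch. All data below is invented, the dictionaries merely stand in for SENSELAB, the ANATOM domain map, and NCMIR, and the function names are hypothetical. The slide does not say which aggregate is computed, so a per-region sum is assumed.

```python
# Hypothetical stand-in for SENSELAB: output name -> target structures.
SENSELAB_TARGETS = {"output from parallel fiber": ["purkinje_cell_dendrite"]}

# Hypothetical stand-in for the ANATOM domain map: region -> direct subregions.
ANATOM_CHILDREN = {
    "purkinje_cell": ["purkinje_cell_dendrite", "purkinje_cell_soma"],
    "purkinje_cell_dendrite": ["dendritic_spine"],
}

# Hypothetical stand-in for NCMIR: (protein, region, amount); values are made up.
NCMIR_PROT = [
    ("Ryanodine Receptor", "purkinje_cell_dendrite", 3.0),
    ("Ryanodine Receptor", "dendritic_spine", 1.5),
    ("Ryanodine Receptor", "purkinje_cell_soma", 4.0),
    ("Calbindin", "dendritic_spine", 2.0),
]


def subregion_closure(regions):
    """Downward closure: the given regions plus all their transitive subregions."""
    seen, stack = set(), list(regions)
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(ANATOM_CHILDREN.get(r, []))
    return seen


def run_plan(output_name, protein):
    """Follow the five steps of the plan and return amount per region."""
    known = set(ANATOM_CHILDREN) | {c for cs in ANATOM_CHILDREN.values() for c in cs}
    x1 = SENSELAB_TARGETS.get(output_name, [])       # push selection @SENSELAB
    x2 = [r for r in x1 if r in known]               # situate in the domain map
    x3 = subregion_closure(x2)                       # region of interest
    x4 = [(p, r, v) for (p, r, v) in NCMIR_PROT      # push selection @NCMIR
          if p == protein and r in x3]
    x5 = {}                                          # aggregate (assumed: sum)
    for _, region, amount in x4:
        x5[region] = x5.get(region, 0.0) + amount
    return x5


if __name__ == "__main__":
    print(run_plan("output from parallel fiber", "Ryanodine Receptor"))
```

Only the two selections touch source data; the closure and the aggregate run at the mediator, so the soma row is never requested because it lies outside the region of interest.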
Ecological Metadata Language (EML): Useful for Marking up GEON Data?