Tools for Classification Integration
Brian A. Carlsen, Apelon, Inc.
Networked Knowledge Organization Systems/Services Workshop, June 28, 2001

2 Presentation Outline
- State of the UMLS Metathesaurus
- Life-cycle of a Source
- Tools and Processes
- Challenges
- Further Approaches

3 State of the UMLS Metathesaurus
- Concept orientation, concept persistence
- Growth to over 800,000 concepts and over 60 vocabulary families
- Over 1,000 users worldwide
- Uses of the Metathesaurus:
  - Natural Language Processing
  - Knowledge Representation
  - Patient Record Systems
  - Linking Patient Data to Knowledge Sources
  - Automated Indexing/Retrieval

4 Concept and Name Counts By Release Year

5 English Word, String Counts by Release Year

6 Outline
- State of the UMLS Metathesaurus
- Life-cycle of a Source
- Tools and Processes
- Challenges
- Further Approaches

7 Life-cycle of a Source: Inversion
- Source arrives in “machine readable” format
  - Many formats are used, including PDF, Clipper dump files, WordPerfect files, unit-record formats, and relational flat files.
- Source undergoes “inversion”
  - Requires a human
  - Input is this machine-readable file
  - Process is source-specific
  - Output is a common relational flat-file format used internally
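
Inversion is source-specific and needs a human, so no single program performs it. As a rough sketch of what one inversion script might look like, the Python below assumes a hypothetical tab-delimited source layout (code, preferred term, semicolon-separated synonyms) and a hypothetical pipe-delimited common format with the fields source, code, term_type and string. Neither layout comes from the presentation, and the PT/SY term-type labels are illustrative.

```python
import csv
import io

# Hypothetical common format: one pipe-delimited row per term.
COMMON_FIELDS = ["source", "code", "term_type", "string"]

def invert_source(raw_text: str, source: str) -> list[dict]:
    """Convert an assumed 'CODE<TAB>PREFERRED<TAB>SYN;SYN' layout into common rows."""
    rows = []
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Pad so lines without a synonym column still unpack into three fields.
        code, preferred, synonyms = (line.split("\t") + ["", ""])[:3]
        rows.append({"source": source, "code": code, "term_type": "PT", "string": preferred})
        for syn in filter(None, (s.strip() for s in synonyms.split(";"))):
            rows.append({"source": source, "code": code, "term_type": "SY", "string": syn})
    return rows

def write_common(rows: list[dict]) -> str:
    """Serialize common-format rows as a pipe-delimited flat file (no header)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COMMON_FIELDS, delimiter="|", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```

Each new source would get its own invert_source-style function, while write_common stays the same; that split mirrors the slide's point that only the inversion step is source-specific.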

8 Life-cycle of a Source: Insertion
- A “Recipe” is created
- Test insertion to validate the recipe
- Insertion and matching:
  - Load common format into database
  - Match to existing content algorithmically
  - Use string normalization
  - Determine SAFE vs. UNSAFE matches
  - Prepare data for editing
- Process is fully undoable
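
The matching step can be illustrated with a small sketch. Here normalize() is a crude stand-in for LVG-style string normalization (lowercase, drop punctuation, sort words; the real LVG tools also handle inflection and more), and the SAFE/UNSAFE rule is an assumption: an exact string match is treated as SAFE, and a match found only after normalization is treated as UNSAFE so that it is queued for human review.

```python
import re

def normalize(s: str) -> str:
    """Crude normalization: lowercase, keep alphanumeric words, sort them."""
    words = re.findall(r"[a-z0-9]+", s.lower())
    return " ".join(sorted(words))

def match(new_strings, existing):
    """Classify incoming strings against existing content (string -> concept id).

    Returns (string, concept_id, status) tuples. Assumed rule: exact match is
    SAFE; a normalized match to exactly one concept is UNSAFE; anything else
    (no match, or a normalized form shared by several concepts) is NEW.
    """
    by_norm = {}
    for s, cid in existing.items():
        by_norm.setdefault(normalize(s), set()).add(cid)
    results = []
    for s in new_strings:
        candidates = by_norm.get(normalize(s), set())
        if s in existing:
            results.append((s, existing[s], "SAFE"))
        elif len(candidates) == 1:
            (cid,) = candidates
            results.append((s, cid, "UNSAFE"))
        else:
            results.append((s, None, "NEW"))
    return results
```

For example, with existing content {"Common cold": "C1"}, the incoming string "Cold, Common" normalizes to the same key and would be returned as an UNSAFE match to C1.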

9 Life-cycle of a Source: Editing
- Predicate-based partitioning
- Workflow management
  - Review ALL content for new sources
  - Review UNSAFE content for updates
- Human review
- QA-driven editing
  - Source-specific QA
  - Feedback QA
  - Conservation of Mass QA
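
A minimal sketch of predicate-based partitioning and of one possible conservation-of-mass check. The record fields, the worklist names, and the reading of “conservation of mass” (editing may move atoms between concepts but should never create or lose them) are assumptions made for illustration.

```python
from collections import Counter

def partition(records, predicates):
    """Put each record on the first worklist whose predicate it satisfies.

    predicates: ordered dict of worklist name -> predicate function.
    Records matching no predicate land on "not_reviewed".
    """
    worklists = {name: [] for name in predicates}
    worklists["not_reviewed"] = []
    for rec in records:
        for name, pred in predicates.items():
            if pred(rec):
                worklists[name].append(rec)
                break
        else:
            worklists["not_reviewed"].append(rec)
    return worklists

def conservation_of_mass(before, after, key):
    """Report atoms lost or gained between two snapshots of the edited data."""
    b = Counter(map(key, before))
    a = Counter(map(key, after))
    return {"lost": b - a, "gained": a - b}
```

With predicates such as “status is NEW” and “status is UNSAFE”, SAFE records fall through to not_reviewed, matching the slide's policy of reviewing everything new but only the UNSAFE part of an update.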

10 Life-cycle of a Source: Release
- Synchronize editing changes
  - State-based model
- Release data in desired format
  - Full release/partial release
- Transform base release: “MetamorphoSys”
  - Remove unlicensed data
  - Create “Content Views”
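
Under a state-based model the database holds the current state, so a partial release can be derived by comparing two snapshots. The sketch below assumes each snapshot is a hypothetical dict mapping a record id to its content; a full release would simply ship the new snapshot as-is.

```python
def delta_release(old_state, new_state):
    """Compute what a partial release would contain between two snapshots."""
    added = {k: v for k, v in new_state.items() if k not in old_state}
    changed = {k: v for k, v in new_state.items()
               if k in old_state and old_state[k] != v}
    deleted = sorted(k for k in old_state if k not in new_state)
    return {"added": added, "changed": changed, "deleted": deleted}
```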

11 Outline
- State of the UMLS Metathesaurus
- Life-cycle of a Source
- Tools and Processes
- Challenges
- Further Approaches

12 Tools and Processes: Overview
- Humans vs. Computers
  - Humans are good at making content decisions
  - Computers are good at automating tasks
- Tools vs. Processes
  - Tools enable computers to automate tasks
  - Processes keep humans productive

13 Tools and Processes: Pre-Editing
- No common data representation
- Source-by-source conversion to common format (Perl, Unix tools)
- What would a common format need?
  - Represent terms and attributes
  - Represent within-source relationships
  - Represent hierarchies
  - Represent external-source relationships
  - Represent classifications (e.g., Concept)
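
One way to sketch a common format that covers the five requirements listed above, using Python dataclasses. The class and field names are hypothetical. Hierarchies are modeled here as “isa” relationships, and a relationship is treated as external when its target belongs to a different source than the one asserting it.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    """A source term with arbitrary attributes."""
    source: str
    code: str
    string: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Relationship:
    """A relationship asserted by one source; "isa" rows form hierarchies."""
    asserting_source: str
    from_code: str
    rel: str
    to_code: str
    to_source: str

    @property
    def is_external(self) -> bool:
        # Within-source when the target lives in the asserting source.
        return self.to_source != self.asserting_source

@dataclass
class Classification:
    """A grouping of terms with consistent semantics, e.g. a concept."""
    class_id: str
    kind: str                                    # e.g. "concept"
    members: list = field(default_factory=list)  # (source, code) pairs

def parents(rels, source, code):
    """Direct within-source parents of a code, read from "isa" relationships."""
    return [r.to_code for r in rels
            if r.rel == "isa" and r.asserting_source == source
            and r.from_code == code and not r.is_external]
```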

14 Tools and Processes: Editing
- Workflow management
- Report generation
- State model vs. action model
  - State model: actions represented as new states
  - Action model: single state + actions as data
- Human editing interface enabling “high level cognitive editing”
- LVG: string normalization
- Automated editing: Safe vs. Unsafe, integrities
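
The difference between the two models can be sketched as follows. In a state model every edit would produce a new snapshot; in the action model below, one current state is kept and each edit is recorded as data, which also makes undo straightforward. The class, its single move_atom action, and the concept-to-atoms layout are hypothetical.

```python
class ActionModel:
    """One current state plus a log of edits stored as data."""

    def __init__(self, state):
        # concept id -> set of atom ids; copy the sets so the caller's data is untouched
        self.state = {cid: set(atoms) for cid, atoms in state.items()}
        self.log = []  # (action name, atom, source concept, target concept)

    def move_atom(self, atom, src, dst):
        """Move an atom between concepts and record the action."""
        self.state[src].remove(atom)
        self.state.setdefault(dst, set()).add(atom)
        self.log.append(("move_atom", atom, src, dst))

    def undo(self):
        """Reverse the most recent action using the data in the log."""
        _, atom, src, dst = self.log.pop()
        self.state[dst].remove(atom)
        self.state[src].add(atom)
```

Because the log holds every action, it can also drive report generation or be replayed to synchronize changes before a release.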

15 Tools and Processes: Release
- License agreements
- Content views (e.g., Indexing View)
  - Filter by semantic type
  - Filter by language
- Alternative release formats
- Updates
- MetamorphoSys
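
A content view amounts to a filter over the released rows. The sketch below is a hypothetical MetamorphoSys-style subset, not the real tool's logic or file layout: each row is assumed to be a dict with source, sty (semantic type) and lang fields, unlicensed sources are always removed, and the semantic-type and language filters are optional.

```python
def content_view(rows, licensed_sources, semantic_types=None, languages=None):
    """Return the rows a user may receive, optionally restricted to a view.

    None for semantic_types or languages means "no filter".
    """
    view = []
    for row in rows:
        if row["source"] not in licensed_sources:
            continue  # remove unlicensed data
        if semantic_types is not None and row["sty"] not in semantic_types:
            continue  # filter by semantic type
        if languages is not None and row["lang"] not in languages:
            continue  # filter by language
        view.append(row)
    return view
```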

16 Outline
- State of the UMLS Metathesaurus
- Life-cycle of a Source
- Tools and Processes
- Challenges
- Further Approaches

17 Challenges: Ambiguity
- Ambiguous strings, e.g., “Cold”
  - Solution: disambiguating strings, preferred names with “face validity”, integrity checks when merging
- Strings that are not fully specified, e.g., “Head of Pancreas” within “Malignant Neoplasm of Pancreas” (the term only means “Malignant Neoplasm of Head of Pancreas” in the context of its parent)
  - Solution: a fully specified preferred name
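
One listed solution is integrity checks when merging. The sketch below shows what such a check might look like for an ambiguous string such as “Cold” (the disease versus the temperature). The concept layout and both rules are assumptions for illustration, not the Metathesaurus's actual integrity checks.

```python
def merge_violations(c1, c2):
    """Return reasons why merging concepts c1 and c2 should be blocked.

    Each concept is a dict with "stys" (set of semantic types) and
    "atoms" (list of (source, code) pairs). Both rules are hypothetical.
    """
    reasons = []
    # Rule 1: concepts with no semantic type in common probably mean different things.
    if not (c1["stys"] & c2["stys"]):
        reasons.append("no shared semantic type")
    # Rule 2: if one source keeps the two under different codes, the source itself
    # distinguishes them, so a shared string alone is not enough to merge.
    codes1 = {}
    for source, code in c1["atoms"]:
        codes1.setdefault(source, set()).add(code)
    conflicts = sorted({source for source, code in c2["atoms"]
                        if source in codes1 and code not in codes1[source]})
    reasons.extend(f"source {s} keeps these under separate codes" for s in conflicts)
    return reasons
```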

18 Challenges: What is a Classification?
- A classification is any grouping of terms with consistent semantics.
- Thesauri typically group terms by meaning into concepts (synonymy).
- Alternatives:
  - Neighborhoods (e.g., Descriptors in MeSH)
  - Near-synonymy
  - No classification (identity or term classification)
  - Lexical
- Connecting relationships/attributes to classifiers

19 Challenges: Precedence
- Concepts (or other classifications) generally have a preferred name.
- A thesaurus will have terms from different sources competing for precedence.
- Source precedence should be a user-level choice.
- Preferred name should not be used as a proxy for concept-ness.
- Every level of classification should have a preferred term.
- Preferred name exists primarily for “face validity”.
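
Treating source precedence as a user-level choice means the preferred name is computed from a precedence list supplied by the user rather than stored as part of the concept. A minimal sketch, assuming hypothetical (source, term type, string) atoms and a user-supplied ordered list of (source, term type) pairs; the fallback to the first atom is an assumption so that every concept still gets a preferred term.

```python
def preferred_name(atoms, precedence):
    """Pick the string of the highest-precedence atom for one concept.

    atoms: list of (source, term_type, string)
    precedence: user-supplied ordered list of (source, term_type); first wins.
    """
    rank = {key: i for i, key in enumerate(precedence)}
    ranked = [a for a in atoms if (a[0], a[1]) in rank]
    if not ranked:
        # Fallback so the concept still has a name for "face validity".
        return atoms[0][2] if atoms else None
    return min(ranked, key=lambda a: rank[(a[0], a[1])])[2]
```

Two users with different precedence lists can see different preferred names for the same concept, while the concept itself (its membership) is unchanged.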

20 Challenges: Update Model
- Constituent sources of a thesaurus will be updated
- Editing cycle
  - Updated sources will require editing
  - Typically, the overlap between an updated source and its previous version is > 90%
  - Overlap can safely replace the old version’s content
  - Safe replacements should not be edited
  - Ideally, source providers would indicate replacements; otherwise they must be computed
- Release
  - Release changes
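
When a source provider does not indicate replacements, they must be computed. A minimal sketch, assuming each version of a source is a hypothetical dict mapping a source code to its string and treating an unchanged code-and-string pair as a safe replacement that needs no editing. Measuring overlap as the safe fraction of the new version is also an assumption.

```python
def compute_replacements(old, new):
    """Compare two versions of one source (dicts mapping code -> string)."""
    safe = {c for c in new if c in old and old[c] == new[c]}
    changed = {c for c in new if c in old and old[c] != new[c]}
    added = set(new) - set(old)
    retired = set(old) - set(new)
    overlap = len(safe) / len(new) if new else 0.0
    return {"safe": safe, "changed": changed, "added": added,
            "retired": retired, "overlap": overlap}
```

Only the changed and added codes would then go to editors; with an overlap above 90%, that is a small fraction of the update.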

21 Outline
- State of the UMLS Metathesaurus
- Life-cycle of a Source
- Tools and Processes
- Challenges
- Further Approaches

22 Further Approaches: Description Logic
- What is it?
  - Concepts (or other classifications) are axioms
  - Relationships (roles) are theorems
  - The transitive closure of the roles across the concepts is computed to ensure there are no violations, e.g., A isa B, B isa C, C isa A (violation!)
- When is it useful?
  - In formalized, static domains like Anatomy
- When is it not useful?
  - When performance matters more than formalism
  - In dynamic, loosely coupled domains like Genomics
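
The violation in the example above can be detected by computing each concept's ancestors along isa links (its row of the transitive closure) and flagging any concept that turns out to be its own ancestor. This Python sketch checks only that one kind of violation; it is not a Description Logic classifier, and the dict-of-parents layout is an assumption.

```python
def ancestors(isa, node):
    """All concepts reachable from node via isa links."""
    seen = set()
    stack = list(isa.get(node, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(isa.get(parent, ()))
    return seen

def transitive_closure(isa):
    """isa: dict child -> set of direct parents. Returns child -> all ancestors."""
    return {node: ancestors(isa, node) for node in isa}

def cycle_violations(isa):
    """Concepts that end up as their own ancestor, e.g. A isa B, B isa C, C isa A."""
    closure = transitive_closure(isa)
    return sorted(node for node, anc in closure.items() if node in anc)
```

The per-node search repeats work on large hierarchies, which hints at the slide's performance caveat; a production system would compute the closure more cleverly.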

23 Further Approaches: Standards
- XML
- Standardized Terminology/Ontology Representation
  - XML is the most likely candidate
  - Ideally would support:
    - Links to external sources
    - Relationships between different levels of classification
    - Update model
    - Description Logic
- Metadata
  - Standardized Thesaurus Representation
  - XML Repository
  - Standard Object Representations
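
Since XML is named as the most likely candidate, here is a small sketch of serializing one concept with Python's standard xml.etree.ElementTree module. The element and attribute names (concept, term, relationship, targetSource) are hypothetical and do not come from any published standard; the targetSource attribute is one way to carry links to external sources.

```python
import xml.etree.ElementTree as ET

def concept_to_xml(concept_id, terms, relationships):
    """Serialize one concept using a hypothetical element vocabulary.

    terms: list of (source, code, string)
    relationships: list of (rel_type, target_id, target_source)
    """
    root = ET.Element("concept", id=concept_id)
    for source, code, string in terms:
        ET.SubElement(root, "term", source=source, code=code).text = string
    for rel_type, target, target_source in relationships:
        ET.SubElement(root, "relationship", type=rel_type,
                      target=target, targetSource=target_source)
    return ET.tostring(root, encoding="unicode")
```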

24 Conclusion: Lessons Learned
- Use the Web
- Use current technology
- Use Description Logic where appropriate
- Make editing intuitive
- Automate tasks
  - “A well-understood, reproducible, automated process that succeeds 95% of the time is a vast improvement over a poorly understood, labor-intensive process that is believed to succeed 100% of the time.”
  - Review UNSAFE automated tasks
  - Stop automating when marginal utility falls below a threshold