The Future of Metadata Denise Bedford World Bank Presentation to Fall Metadata Forum November 2, 2005 Department of Homeland Security.

Meta-Future
Most of our information use and access today is based on an anonymous access model.
It is increasingly clear that anonymous access to information, and the packaging of information for single-use contexts, is neither sufficient for users nor an efficient use of development and engineering resources.
We need to think in terms of contextualizing and sensitizing information so that it can be used in any context where it pertains.
In the future, information will flow: information itself, not the systems in which it lives or was created, will be our focus.
Information needs to be agile and mobile. It needs to be sensitized to the contexts in which it might be used, to the interests of those who might use it, and to the applications that might consume it.

Meta-Future
Envision a future like that described in the Netcentric Information Models formulated by the Department of Defense.
Information is created, tagged, posted, and shared.
Any applications or users can, according to their security privileges, use any information they can find, in any application they need to do their work.
Technology becomes increasingly invisible but more logic-based.
More and different kinds of information, such as reference sources, need to be managed and maintained.
This meta-future depends heavily on the existence of rich, conceptual, sensitized, meaningful metadata.
This future is now: it is simply a practical view of the Semantic Web.

The problem with metadata
This future sounds wonderful and the contextualization vision is exciting, but there's just one problem: metadata.
Metadata:
–Is expensive and time-consuming to create
–Is sometimes subjective and not granular enough
–Doesn't always address the ways that users and systems think about the information it describes
–May not tell us enough about the information to trust it
–May address only one context, the context for which it was created
–May live only in the source application where it was created
–May not be as accessible as the information asset itself
If a Meta-Future depends on metadata, we have to solve these problems.

The problem with technologies
Many tools are so tightly integrated that, although you might generate rich metadata with them, it will not make your information agile or mobile.
Statistical clustering engines do not get us to persistent meaning or contextualization. Clustering engines are great for thresholding or pattern tracing, but they will not generate the kind of metadata we need to realize this future.
We need semantic engines at the base of all our metadata efforts, and these engines need to be available in multiple languages, because semantics vary by language.
Magic black-box approaches are neither meaningful nor sustainable: you need access to the programs through a user-friendly interface so you can adapt them to your environment without programming knowledge.
You need several different kinds of technologies to do what I'm going to describe today, not just one tool.

[Figure: Understanding the Dimensions of Contextualization. Dimensions: Content, User, Context. Access levels: Anonymous Access (Context Free); Information Diffusion (Context Sensitive – Group); Information Gathering & Transformation (Context Sensitive – Person).]

Vision of Contextualization
We need to address metadata challenges not in the traditional way but in the future context, with the idea that metadata is contextualizable and sensitized, to support information agility and mobility.
To achieve contextualization you need 'extreme metadata':
–Metadata about the information
–Metadata about the user
–Metadata about the context
–Rich metadata designed to meet many functional requirements
–Metadata in multiple languages
Metadata needs to be 'interpretable' for and in a context:
–Reference sources not only for traditional metadata but for all of the relationships and logic present in an ontology (simply different kinds of taxonomy representations)
–Metadata must reflect any context or interest that a user might express
–We still need some control over metadata to make it understandable in different contexts

[Figure: New View of Ontology. Nodes include Content Entity, Content Elements, Content Parts, Content Metadata, People Referenced, Orgs Referenced, User, Profile, Meaning in Context, Contextual Matrix & Sensing, Contextual Logic, Business Rule Logic, and reference schemes (Topic Class Scheme, Business Process Scheme, Thesaurus, Country Names, Region Names, Skill Sets/Competencies, Standard Statistical Variables), represented as flat, hierarchy, network, faceted, and ring taxonomies.]

Getting to Rich Metadata
Given the future demand for rich, contextualizable metadata, and all of the traditional drawbacks, how will we achieve this future?
We need to look for a different model for creating and sustaining metadata and reference sources.
We need to teach technologies how to capture the metadata we need and how to maintain our reference sources.
I'd like to show you an example of how we might achieve that future.
Please keep in mind that I'm showing you an example of what is possible: Enterprise Search, Authority Control/Entity Discovery.

Fueling Semantic Search With Metadata, or: If Metadata Is Dead, the Semantic Web and Semantic Search Are Dead

[Figure: Flat taxonomy, hierarchical taxonomy, ring taxonomy; fielded search = faceted taxonomy]

[Figure: Ring taxonomy, network taxonomy, metadata]

[Figure: A more explicit view of the faceted taxonomy]
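The taxonomy shapes on the preceding slides can be pictured as simple data structures. The following is a minimal Python sketch under stated assumptions, not the Bank's implementation: it reads 'ring taxonomy' as a synonym ring (a set of interchangeable terms), treats a faceted taxonomy as independent fields on each record so that fielded search becomes a conjunction of facet constraints, and uses hypothetical terms throughout.

```python
from dataclasses import dataclass, field

# Flat taxonomy: a controlled list of authorized terms with no structure.
FLAT = {"Education", "Health", "Water Supply"}

# Hierarchical taxonomy: each term points to its broader term (None at the top).
HIERARCHY = {
    "Human Development": None,
    "Education": "Human Development",
    "Primary Education": "Education",
}

# Ring taxonomy, read here as a synonym ring: variants treated as equivalent.
RINGS = [{"water supply", "drinking water", "potable water"}]


def ancestors(term: str) -> list[str]:
    """Walk up the hierarchy from a term to the top."""
    path = []
    parent = HIERARCHY.get(term)
    while parent is not None:
        path.append(parent)
        parent = HIERARCHY.get(parent)
    return path


def expand(term: str) -> set[str]:
    """Return every variant in the term's synonym ring, or just the term."""
    t = term.lower()
    for ring in RINGS:
        if t in ring:
            return set(ring)
    return {t}


# Faceted taxonomy: each record carries values from independent facets.
@dataclass
class Record:
    title: str
    facets: dict[str, str] = field(default_factory=dict)


def fielded_search(records: list[Record], **constraints: str) -> list[Record]:
    """Fielded search: keep records that match every requested facet value."""
    return [r for r in records
            if all(r.facets.get(k) == v for k, v in constraints.items())]


docs = [
    Record("Rural schools report", {"topic": "Education", "country": "Kenya"}),
    Record("Clinic survey", {"topic": "Health", "country": "Kenya"}),
]
print(ancestors("Primary Education"))  # ['Education', 'Human Development']
print(sorted(expand("Water Supply")))
print([d.title for d in fielded_search(docs, topic="Education", country="Kenya")])
```

Because each facet is an independent field, adding a new facet (say, region) only adds a key to the records; the search function does not change.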

Building and Maintaining Taxonomies
Moving towards automated metadata generation means that catalogers shift their effort to reviewing the generated metadata and to more fully developing and maintaining subject headings/thesauri and classification schemes as part of a suite of categorization tools.
The level of effort shifts to training and developing the tools and away from original cataloging and metadata capture.
Continue to work closely with subject experts to define the controlled vocabularies and classification schemes.
It means that you need a metadata infrastructure that looks something like the ontology we just reviewed.
There is no silver-bullet ontology tool out there that will do this work for you: your knowledge and skills are critical.

[Figure: Metadata Capture Methods. Capture methods: Human Capture, Programmatic Capture, Inherit from System Context, Extrapolate from Business Rules. Other labels: Identification/Distinction, Use, Management, Compliant Document Management, Search & Browse.]
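One way to picture the capture methods on this slide is as layers that each contribute fields to a single record. The sketch below is hypothetical: the function, field names, and values are illustrative, and the layering order (system context, then programmatic capture, then business-rule extrapolation, with human capture applied last so it can override) is an assumption rather than something the slide specifies.

```python
from typing import Callable

# A business rule: (condition on the record so far, field to set, value).
Rule = tuple[Callable[[dict], bool], str, str]


def capture_metadata(system_context: dict, programmatic: dict,
                     rules: list[Rule], human: dict) -> dict:
    """Assemble one metadata record from four capture methods.
    The layering order, and human capture having the final say, is assumed."""
    record = dict(system_context)            # 1. inherit from system context
    record.update(programmatic)              # 2. programmatic capture
    for applies, field_name, value in rules:  # 3. extrapolate from business rules
        if applies(record) and field_name not in record:
            record[field_name] = value
    record.update(human)                     # 4. human capture or review
    return record


# Hypothetical values for illustration only.
record = capture_metadata(
    system_context={"repository": "records-system", "language": "English"},
    programmatic={"topic": "Education", "document_type": "Appraisal"},
    rules=[(lambda r: r.get("document_type") == "Appraisal",
            "security_class", "Internal")],
    human={"title": "Rural Schools Appraisal", "topic": "Primary Education"},
)
print(record)
```

A rule only fills a field that no earlier layer has set, so explicitly captured values are never overwritten by extrapolation.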

Smart Use of Technologies
Sample structure: Bank Topics Classification Scheme (hierarchical taxonomy)
–Oracle data classes are used to represent the Topic Classification Scheme
  – hierarchical taxonomy as the reference source for the attribute Topic
  – used for Browse, Search, Content Syndication, Personalization
–The first challenge is to architect the hierarchy correctly
  – 3 distinct data classes, not a tree structure with inheritance
  – allows you to use the three data classes for distinct functions across systems but still enforce relationships across the classes

Relationships across data classes: 3 Oracle data classes

Topic Data Class

Subtopic Data Class

Subsubtopic Data Class
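The 'three distinct data classes' design can be sketched with any relational database. This example uses Python's built-in sqlite3 module as a stand-in for Oracle; the table names, columns, and sample values are hypothetical. Each level of the Topic scheme is its own table, and foreign keys enforce the links between levels instead of a single self-referencing tree.

```python
import sqlite3

# One table per level ("data class"); names and columns are hypothetical.
SCHEMA = """
CREATE TABLE topic (
    topic_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE subtopic (
    subtopic_id INTEGER PRIMARY KEY,
    topic_id    INTEGER NOT NULL REFERENCES topic(topic_id),
    name        TEXT NOT NULL
);
CREATE TABLE subsubtopic (
    subsubtopic_id INTEGER PRIMARY KEY,
    subtopic_id    INTEGER NOT NULL REFERENCES subtopic(subtopic_id),
    name           TEXT NOT NULL
);
"""


def build(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO topic VALUES (1, 'Education')")
    conn.execute("INSERT INTO subtopic VALUES (10, 1, 'Primary Education')")
    conn.execute("INSERT INTO subsubtopic VALUES (100, 10, 'Teacher Training')")
    conn.commit()


def full_path(conn: sqlite3.Connection, subsubtopic_id: int):
    """Join the three classes to recover the full Topic > Subtopic > Subsubtopic path."""
    return conn.execute("""
        SELECT t.name, s.name, ss.name
        FROM subsubtopic ss
        JOIN subtopic s ON s.subtopic_id = ss.subtopic_id
        JOIN topic t ON t.topic_id = s.topic_id
        WHERE ss.subsubtopic_id = ?""", (subsubtopic_id,)).fetchone()


conn = sqlite3.connect(":memory:")
build(conn)
print(full_path(conn, 100))  # ('Education', 'Primary Education', 'Teacher Training')

# A subtopic pointing at a nonexistent topic is rejected by the foreign key.
try:
    conn.execute("INSERT INTO subtopic VALUES (11, 99, 'Orphan')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Keeping the levels as separate tables means a system that only needs Topic values can reference the topic table directly, while the foreign keys still prevent orphaned subtopics.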

Categorizing and Indexing Content
Let's look at how we're categorizing our content to this structure automatically.
Examples: topic classification, geographical region assignment, keywording.
This approach can be applied to any kind of content.
It enables us to build a robust metadata repository model, with strong metadata quality, to move towards semantic interoperability (SI) at the functional level.
Also note that we can do this across many languages.

Semantic Analysis: Using the Technologies to Best Advantage
Semantic analysis tools support concept extraction, categorization, summarization, and pattern-matching rules engines.
Teragram works in 23 languages.
Use categorization to capture Topics, Business Activities, Regions, Sectors, Themes, etc.
Use concept extraction to capture keywords.
Use the rules engine to capture Loan #, Credit #, Project ID, Trust Fund #, etc.
Use summarization to generate a 'gist' of the content.
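The slides do not describe how Teragram produces a 'gist'. As a generic stand-in, the sketch below uses simple frequency-based extractive summarization: score each sentence by the average document-wide frequency of its non-stopword terms and keep the top-scoring sentences in their original order. The stopword list and sample text are hypothetical.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on",
             "by", "with", "was", "were", "are"}


def content_words(s: str) -> list[str]:
    return [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]


def gist(text: str, n_sentences: int = 1) -> str:
    """Frequency-based extractive summary: keep the sentences whose content
    words are, on average, most frequent in the whole text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        return sum(freq[w] for w in words) / (len(words) or 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


report = ("The project improved rural water supply. "
          "Water supply coverage rose in rural districts. "
          "Officials attended a ceremony.")
print(gist(report))  # The project improved rural water supply.
```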

How does semantic analysis work?

Semantic Analysis Basics
Once you have made some sense of the sentence (decompose), reconstruct entities for information extraction (compose):
–Identify names and other fixed-form expressions: people, organizations, actions, relationships, places
–Identify basic noun groups, verb groups, formatting elements, logic statements
–Construct complex noun groups and verb groups
–Identify event structures
–Identify common elements and associate them
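The first decompose/compose step, identifying names and other fixed-form expressions, can be sketched as longest-match lookup against a lexicon. This is a toy illustration, not how Teragram works internally: the lexicon entries and type labels are hypothetical, and the later steps (noun and verb groups, event structures) would need a real grammar.

```python
import re

# Hypothetical lexicon of fixed-form expressions, keyed by lowercase token tuples.
LEXICON = {
    ("world", "bank"): "ORGANIZATION",
    ("ministry", "of", "finance"): "ORGANIZATION",
    ("kenya",): "PLACE",
    ("nairobi",): "PLACE",
}
MAX_LEN = max(len(k) for k in LEXICON)


def tokenize(text: str) -> list[str]:
    """Decompose: split a sentence into word tokens."""
    return re.findall(r"[A-Za-z]+", text)


def extract_entities(text: str) -> list[tuple[str, str]]:
    """Compose: at each position, take the longest lexicon phrase that matches."""
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in LEXICON:
                found.append((" ".join(tokens[i:i + n]), LEXICON[key]))
                i += n
                break
        else:  # no phrase starts here; move to the next token
            i += 1
    return found


print(extract_entities("The World Bank met the Ministry of Finance in Nairobi, Kenya."))
# [('World Bank', 'ORGANIZATION'), ('Ministry of Finance', 'ORGANIZATION'), ('Nairobi', 'PLACE'), ('Kenya', 'PLACE')]
```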

Leveraging the Topic Structure
Each subtopic is a knowledge domain (hierarchical taxonomy).
Each subtopic has an extensive concept-level definition (1,000–5,000+ concepts).
Concepts are controlled vocabularies in their raw form (flat taxonomy).
Concepts with relationships (extensive, per the new Z39.19 standard) comprise a semantic network (network taxonomy).
Categorization tools work with the topic structure and concept definitions to categorize and index content.
The following screen illustrates how that same structure is embedded into the Teragram profile to support categorization.
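A much-simplified sketch of categorizing against the topic structure: each subtopic's concept list is matched against the text, subtopics whose hit count reaches a threshold are assigned, and each carries its parent topic with it. The topics, concepts, and threshold of 2 are hypothetical, and plain phrase counting stands in for the grammar-based rules a categorization engine would use.

```python
import re
from collections import Counter

# Hypothetical topic structure. Each subtopic is a knowledge domain defined by
# a concept list; real profiles are described as having 1,000-5,000+ concepts.
TOPICS = {
    "Education": {
        "Primary Education": ["primary school", "enrollment", "teacher training"],
        "Tertiary Education": ["university", "higher education"],
    },
    "Health": {
        "Communicable Diseases": ["malaria", "hiv", "tuberculosis"],
    },
}


def count_concepts(text: str, concepts: list[str]) -> int:
    """Count whole-word, case-insensitive occurrences of the concepts."""
    lowered = text.lower()
    return sum(len(re.findall(r"\b" + re.escape(c) + r"\b", lowered))
               for c in concepts)


def categorize(text: str, threshold: int = 2) -> list[tuple[str, str]]:
    """Return (topic, subtopic) pairs whose concept hits reach the threshold,
    strongest first; the parent topic is assigned along with the subtopic."""
    scores = Counter()
    for topic, subtopics in TOPICS.items():
        for subtopic, concepts in subtopics.items():
            hits = count_concepts(text, concepts)
            if hits >= threshold:
                scores[(topic, subtopic)] = hits
    return [pair for pair, _ in scores.most_common()]


doc = ("Primary school enrollment rose after the teacher training program; "
       "malaria cases also fell.")
print(categorize(doc))  # [('Education', 'Primary Education')]
```

The threshold trades precision for recall: lowering it to 1 would also assign Communicable Diseases on the strength of a single mention of malaria.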

[Screenshot: subtopics and their domain concepts, or controlled vocabulary]

Extensive operators allow us to write grammatical rules to manage typical semantic problems

A concept-based rules engine allows us to define patterns to capture other kinds of data

Example of using authority control to capture country names and extract the 'authorized' version of each country name.
Example of using a gazetteer + concept extraction + rules engine to support semantic interoperability.
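The authority-control idea, recognizing any variant of a country name but recording only the authorized form, can be sketched with a small gazetteer and one regular expression. The gazetteer entries below are hypothetical and the 'authorized' forms are illustrative, not taken from the Bank's reference sources.

```python
import re

# Hypothetical gazetteer: authorized country name -> variants seen in text.
# The authorized forms here are illustrative, not the Bank's authority file.
GAZETTEER = {
    "Russian Federation": ["Russian Federation", "Russia"],
    "Korea, Republic of": ["Korea, Republic of", "Republic of Korea", "South Korea"],
    "Egypt, Arab Republic of": ["Egypt, Arab Republic of", "Egypt"],
}

# Invert to variant -> authorized form, for case-insensitive lookup.
VARIANT_TO_AUTHORIZED = {
    variant.lower(): authorized
    for authorized, variants in GAZETTEER.items()
    for variant in variants
}

# Try longer variants first so "Russian Federation" wins over "Russia".
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in
                      sorted(VARIANT_TO_AUTHORIZED, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def authorized_countries(text: str) -> list[str]:
    """Return the authorized forms of all country mentions, in order of first
    appearance, without duplicates."""
    found: list[str] = []
    for match in PATTERN.finditer(text):
        authorized = VARIANT_TO_AUTHORIZED[match.group(1).lower()]
        if authorized not in found:
            found.append(authorized)
    return found


text = "Exports from Russia to South Korea rose, and Russia remains a key partner of Egypt."
print(authorized_countries(text))
# ['Russian Federation', 'Korea, Republic of', 'Egypt, Arab Republic of']
```

The trailing word boundary keeps adjectives such as "Russian" from being misread as the country "Russia".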

Use of concept extraction + rules engine to capture Loan #, Credit #, Project ID#
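A pattern-based rule for identifiers like these might look like the sketch below. The regular expressions assume hypothetical formats (a 'P' followed by six digits for Project IDs, and four digits, a hyphen, and a two- or three-letter code for Loan and Credit numbers); the real formats should be taken from the Bank's own standards.

```python
import re

# Hypothetical identifier formats, for illustration only.
RULES = {
    "project_id": re.compile(r"\bP\d{6}\b"),
    "loan_no": re.compile(r"\bLoan\s+(?:No\.?\s*)?(\d{4}-[A-Z]{2,3})\b"),
    "credit_no": re.compile(r"\bCredit\s+(?:No\.?\s*)?(\d{4}-[A-Z]{2,3})\b"),
}


def capture_identifiers(text: str) -> dict[str, list[str]]:
    """Apply each pattern rule and return the captured values per field."""
    captured: dict[str, list[str]] = {}
    for name, rule in RULES.items():
        # Use the capture group when the rule defines one, else the whole match.
        values = [m.group(1) if rule.groups else m.group(0)
                  for m in rule.finditer(text)]
        if values:
            captured[name] = values
    return captured


text = "Project P074123 is financed by Loan No. 7123-MX and Credit 3456-BD."
print(capture_identifiers(text))
# {'project_id': ['P074123'], 'loan_no': ['7123-MX'], 'credit_no': ['3456-BD']}
```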

Overview of Process & Tools
Create new facet
–Approach: Human review & consultation, data structures, governance
–Tools: Oracle DBMS; in future, Metadata Repository tools (ISO 11179); Oracle representation of data classes
Create new class
–Approach: Human review & harmonization of existing information structures; tool-based discovery of new structures through clustering & extraction
–Tools: Teragram dynamic concept extraction using grammars, categorization, clustering; Oracle representation of data classes
Create new concept
–Approach: Create training sets working with experts; identify & integrate existing vocabularies
–Tools: Teragram concept extraction; Oracle representation of values
Create new relationship
–Approach: Human relationship creation, augmented by technological discovery
–Tools: Teragram clustering engine; MultiTes Thesaurus Management System; Oracle copy of thesaurus relationships
Create new metadata
–Approach: Enterprise profile development, with human review in some cases and no review in others; metadata in the language of the document/content
–Tools: Teragram enterprise profile leveraging concept extraction, categorization, and summarization

[Figure: Enterprise Profile Development & Maintenance]
Enterprise Metadata Profile
–Concept Extraction Technology: Country, Organization Name, People Name, Series Name/Collection Title, Author/Creator, Title, Publisher, Standard Statistical Variable, Version/Edition
–Categorization Technology: Topic, Business Function, Region, Sector, Theme
–Rule-Based Capture: Project ID, Trust Fund #, Loan #, Credit #, Series #, Publication Date, Language
–Summarization
e-CDS reference sources and a data governance process for Topics, Business Function, Country, Region, Keywords, People, Organizations, Project ID
Systems shown: Teragram Team, TK240 Client, ISP, IRIS, ImageBank, Factiva, JOLIS, E-Journals
Enterprise profile creation and maintenance through UCM service requests and update & change requests

[Figure: Enterprise Metadata Capture – Functional Reference Model. A dedicated server runs the Teragram semantic engine (concept extraction, categorization, clustering, rule-based engine, language detection), connected through APIs and integration to content capture in ImageBank, ISP, IRIS, and the Factiva metadata database, with XML-wrapped metadata (TK240 Client XML output) and e-CDS reference sources. Roles: Content Owners, Business Analyst, IDU Indexers, SITRC Librarians, IRIS Functional Team.]
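The 'XML wrapped metadata' step can be sketched with Python's standard xml.etree.ElementTree module. The element and attribute names (metadataRecord, docId, and the field names) are hypothetical, chosen only to show how captured values might be serialized for downstream systems.

```python
import xml.etree.ElementTree as ET


def wrap_metadata(doc_id: str, metadata: dict[str, list[str]]) -> str:
    """Serialize captured metadata as a simple XML envelope.
    Element and attribute names are hypothetical, not the Bank's schema."""
    root = ET.Element("metadataRecord", {"docId": doc_id})
    for field_name, values in metadata.items():
        for value in values:  # repeatable fields become repeated elements
            ET.SubElement(root, field_name).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = wrap_metadata("DOC-001", {
    "country": ["Russian Federation"],
    "topic": ["Education"],
    "projectId": ["P074123"],
})
print(xml_text)
# <metadataRecord docId="DOC-001"><country>Russian Federation</country><topic>Education</topic><projectId>P074123</projectId></metadataRecord>
```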

Impacts & Outcomes
Information access impacts:
–Increased precision of search
–Better control over recall
–Searching like we talk
–Exact-match searching: known-item searching will work better
–Metadata-based searching begins to resemble full-text searching, but with all the advantages of structure and context and a significant reduction in noise
Productivity improvements:
–Can now assign deep metadata to all kinds of content
–Remove the human review step from metadata capture
–Reduce unit times where human review is still used
Information quality impacts:
–All metadata carries the information architecture with it
–Apply quality metrics at the metadata level to eliminate the need to build 'fuzzy search architectures', which rarely scale or improve in performance
–Use the technologies to identify and fix problems with our data
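'Apply quality metrics at the metadata level' could start with two simple measures per batch of records: completeness (all required fields present) and validity (values found in the reference sources). The sketch below is an illustration only; the required fields, reference lists, and metric definitions are assumptions.

```python
# Hypothetical reference sources (authority lists) and required fields.
REFERENCE = {
    "country": {"Kenya", "Russian Federation"},
    "topic": {"Education", "Health"},
}
REQUIRED = ("title", "country", "topic")


def quality_report(records: list[dict]) -> dict:
    """Measure completeness (all required fields present) and validity (values
    found in the reference sources), and list each problem found."""
    complete = valid = 0
    problems = []
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED if not rec.get(f)]
        invalid = [f for f, allowed in REFERENCE.items()
                   if rec.get(f) and rec[f] not in allowed]
        if not missing:
            complete += 1
        if not invalid:
            valid += 1
        problems += [(i, "missing", f) for f in missing]
        problems += [(i, "not in reference source", f) for f in invalid]
    n = len(records) or 1  # avoid dividing by zero on an empty batch
    return {"completeness": complete / n, "validity": valid / n,
            "problems": problems}


batch = [
    {"title": "Rural schools report", "country": "Kenya", "topic": "Education"},
    {"title": "Trade note", "country": "Russia", "topic": ""},
]
print(quality_report(batch))
```

Flagging "Russia" as not in the reference source is exactly the kind of problem that authority control would fix upstream, by mapping it to its authorized form at capture time.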

In Progress Impacts
The same methodology can be leveraged to develop a structure of lines of business, entities prominent in particular domains, relationships among entities in a domain, standard statistical variables, etc.
The richer the metadata and the more fully elaborated the reference structures, the closer we come to understanding at a system level what is happening in a particular domain at any point in time.
This overall structure can then be leveraged in other contexts, perhaps even a counter-terrorism context, to threshold events.
Without metadata, though, no information asset can be secured and still have its importance known.
Without metadata, no information is agile or mobile.

Thank You. Questions & Discussions