Published by Cecil Henderson. Modified over 8 years ago.
1
Enterprise Taxonomies – Finding the LCD to Support Interoperability
Denise A. D. Bedford, Ph.D., Senior Information Officer, Information Solutions Group, World Bank
2
Making sense of all the terms
What do we mean by an ontology?
What do we mean by a taxonomy?
What do we mean by semantics, and how do they relate?
What is architecture, and what kinds of architectures do you need to think about?
How does the approach you take impact your ability to achieve interoperability?
3
Context
Before I start down this path, please let me provide a little context for what we are trying to do, why, and how we are trying to do it.
You have all heard about the Bank’s experiences in knowledge management.
What you may not have heard about, though, is the primary lesson we learned in that initiative – that the basic component of any KM environment is an enterprise-wide integrated information foundation.
This supports a KLE, repurposing of content….
Disclosure Policy implementation
4
Big Picture Enterprise Functional Architecture
[Diagram labels, top to bottom]
Site Specific Searching; Publications Catalog; World Bank Catalog/Enterprise Search; Recommender Engines; Personal Profiles; Portal Content Syndication; Browse & Navigation Structures
Metadata Repository of Bank Standard Metadata
Reference Tables: Topics, Countries, Document Types
Transformation Rules
Data Governance Bodies
Metadata Extract (six extracts, one per source)
IRIS Doc Mgmt System; Web Content Mgmt. Metadata; Board Documents Metadata; IRAMS Metadata; JOLIS Metadata; InfoShop Metadata
Concept Extraction, Categorization & Summarization Technologies
5
How to get there
What approach should we take?
Many business processes with entirely appropriate business systems supporting them.
10,000 experts working in 30 subject domains, performing work in 30 business lines, generating all kinds of content, in at least six languages.
60 years of knowledge in various formats – most of it print.
Will an ontology work? Probably, but what does it look like?
6
Ontology - Definitions
…in the context of knowledge sharing, an ontology means a specification of a conceptualization
…a description or specification of the concepts and relationships that can exist for an agent or a community of agents
…set of concept definitions, or definitions of a formal vocabulary
…ontological commitment is an agreement to use a vocabulary in a way that is consistent with the specifications
7
Our interpretation of definition
An ontology is an architecture that can be developed to define the structural aspects of a domain – a complex architecture that supports resource description, categorization/classification, and semantics.
Semantics pertains to language, including its sense-making, lexical, and morphological aspects.
Semantics is sometimes confused with information architecture, and the result is suboptimal information, functional, technical, and presentation architectures.
8
Facet as Core Component
Domain Lexicon or Thesaurus; Topic Classification; Languages; Lines of Business Classification
9
Taxonomies, ontologies, topics, oh my!
What do you think of when you hear the word “taxonomies”?
A hierarchy…. Partially correct, but not completely correct.
There are four types of taxonomies, but one typically overshadows the others.
A quick review of taxonomies.
10
Definition of a taxonomy
“System for naming and organizing things into groups that share similar characteristics”
Taxonomy Review of Basic Concepts & Approaches
Essential concepts include: content metadata & metadata repositories; navigation architectures; search architectures; portal architectures.
Information architectures in portals and web environments are increasingly complex, integrating technologies to deliver value to users.
Seamless integration is the goal.
Taxonomies are critical data structures that help us to build a sustainable foundation, to integrate efficiently, and to manage the complexity.
The whole information architecture is more adaptable and sustainable over time if it adheres to established standards and best practices.
The whole information architecture is more cost effective if new technologies currently being researched or under development can be integrated without re-engineering the base.
Like the content and processes they support, each component of a portal has a life cycle.
Architectures
Semantics
11
Facet Taxonomy Architecture
Faceted taxonomy architecture looks like a star. Each node in the star structure is associated with the object in the center.
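As a rough illustration of the star shape, the sketch below puts one content object at the center and attaches facet values to it as independent points. The class name `FacetedRecord`, the helper `find`, the document ids, and the facet values are all hypothetical; only the idea of facets such as topic, country, and language comes from the slides.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class FacetedRecord:
    """One content object at the center of the star (hypothetical model)."""
    object_id: str
    # facet name -> set of values; each facet is one point of the star
    facets: dict[str, set[str]] = field(default_factory=dict)

    def add(self, facet: str, value: str) -> None:
        self.facets.setdefault(facet, set()).add(value)


def find(records, **criteria):
    """Return ids of records whose facets contain every requested value."""
    return [
        r.object_id
        for r in records
        if all(value in r.facets.get(facet, set())
               for facet, value in criteria.items())
    ]


# Illustrative data only.
report = FacetedRecord("doc-001")
report.add("topic", "Energy")
report.add("country", "Brazil")
report.add("language", "Portuguese")

note = FacetedRecord("doc-002")
note.add("topic", "Energy")
note.add("country", "Kenya")

records = [report, note]
energy_ids = find(records, topic="Energy")
brazil_energy_ids = find(records, topic="Energy", country="Brazil")
```

Because each facet is stored independently, any facet can serve as an access point without the others being rearranged.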
12
Flat Taxonomy Architecture
Energy, Environment, Education, Economics, Transport, Trade, Labor, Agriculture
13
Hierarchical Taxonomy Architecture
A hierarchical taxonomy is represented as a tree architecture. The tree consists of nodes and links. The relationships become ‘associations’ with meaning. Meanings in a hierarchy are fairly limited in scope – group membership, type, instance. In a hierarchical taxonomy, a node can have only one parent.
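The single-parent rule can be sketched in a few lines. This is a minimal illustration, not a taxonomy tool: the class `Hierarchy` and the node names below "Energy" are hypothetical.

```python
class Hierarchy:
    """Tree taxonomy sketch: every node has at most one parent."""

    def __init__(self):
        self.parent = {}  # node -> its single parent (None for a root)

    def add(self, node, parent=None):
        if node in self.parent:
            # A second placement would give the node two parents.
            raise ValueError(f"{node!r} is already placed in the tree")
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent {parent!r}")
        self.parent[node] = parent

    def path(self, node):
        """Return the chain of nodes from the root down to `node`."""
        chain = []
        while node is not None:
            chain.append(node)
            node = self.parent[node]
        return list(reversed(chain))


tree = Hierarchy()
tree.add("Energy")
tree.add("Renewable Energy", "Energy")
tree.add("Solar Power", "Renewable Energy")
solar_path = tree.path("Solar Power")

# Trying to file the same node under a second parent is rejected.
try:
    tree.add("Solar Power", "Environment")
    second_parent_allowed = True
except ValueError:
    second_parent_allowed = False
```

The only meaning a link carries here is "is placed under", which matches the limited scope of meanings available in a hierarchy.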
14
Critical Distinction
It is critical to understand the difference between a classification hierarchy, which is used to build collections of objects that fit the definition of a class, and concept hierarchies, which build broader/narrower relationships between concepts.
The reason for this will be clear as we walk through the different build approaches – it impacts the ‘edge’ condition for domain contextuality.
15
Network Taxonomy Architecture
A network taxonomy is a plex architecture. Each node can have more than one parent. Any item in a plex structure can be linked to any other item. In plex structures, links can be meaningful & different.
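A plex structure can be sketched as a graph whose links carry a type. In the illustration below, the class `Network`, the link types `narrower_than` and `related_to`, and the concept names are assumptions chosen for the example.

```python
from collections import defaultdict


class Network:
    """Plex taxonomy sketch: any node may link to any other, with typed links."""

    def __init__(self):
        # node -> set of (link_type, other_node) pairs
        self.links = defaultdict(set)

    def link(self, source, link_type, target):
        self.links[source].add((link_type, target))

    def targets(self, source, link_type):
        """Nodes reachable from `source` over links of one type, sorted."""
        return sorted(t for kind, t in self.links[source] if kind == link_type)


net = Network()
# One node with two parents, which a tree would not allow.
net.link("Biofuels", "narrower_than", "Energy")
net.link("Biofuels", "narrower_than", "Agriculture")
# A link with a different meaning between the same kinds of nodes.
net.link("Biofuels", "related_to", "Trade")

parents = net.targets("Biofuels", "narrower_than")
related = net.targets("Biofuels", "related_to")
```

Keeping the link type explicit is what lets links be "meaningful and different" rather than a single undifferentiated association.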
16
Critical Distinction
It is critical to understand where you have the ability to define the meaning of the linkages or relationships between concepts in this structure.
Linkages carry different meanings – the most primitive of which is statistical co-occurrence.
If you are using semantic engines that can expose verb groups (links) as well as noun groups (nodes) and begin to categorize the type of VGs associated with distinct NGs, then you are taking the first step towards sense making.
If you use only Bayesian or regression analysis, instead of building meaning into links over time, you will shift the meaning each time you add content.
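To make the last point concrete, here is a small sketch of the most primitive linkage, statistical co-occurrence, and of how the strongest link for a concept moves when content is added. Plain pair counting stands in for the Bayesian or regression models mentioned above; the function names and the tiny document sets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how many documents each unordered pair of concepts shares."""
    counts = Counter()
    for concepts in docs:
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return counts


def strongest_partner(counts, concept):
    """Concept that co-occurs most often with `concept` (ties: alphabetical)."""
    scored = {}
    for (a, b), n in counts.items():
        if concept == a:
            scored[b] = n
        elif concept == b:
            scored[a] = n
    return min(scored, key=lambda other: (-scored[other], other))


docs = [
    {"water", "irrigation", "agriculture"},
    {"water", "irrigation"},
    {"water", "sanitation"},
]
before = strongest_partner(cooccurrence(docs), "water")

# New content arrives; nothing about the concept "water" itself has changed.
docs += [{"water", "sanitation", "health"}, {"water", "sanitation"}]
after = strongest_partner(cooccurrence(docs), "water")
```

In this toy data the strongest link for "water" moves from "irrigation" to "sanitation" purely because of what was loaded, not because anyone decided what the link means.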
17
Are there gaps or lapses when we move from theory into practice?
Beyond basics….
18
So, how do you build an ontology or an enterprise content architecture?
Does it matter where you begin?
What do you mean by the ‘top’ of the ontology?
Is it the ‘top’ or starting point in the interface?
Or, by ‘top’, do you mean the core components in the overall architecture?
How will the approach you select facilitate or constrain your ability to grow the ontology over time?
Let’s look at a typical approach taken around the country today.
19
Does where you start matter?
How does your decision of ‘top’ or ‘core’ impact interoperability later?
What happens when you start with categorization or classification (hierarchy)?
What happens when you start with faceted description?
What happens when you start with network structure?
20
Starting with a hierarchy,…
This is how most ontology builds begin today.
Most of the ontologies I have seen begin with the programmatic topical clustering of concepts – which is different from classification of content.
Clustering of concepts is based on statistical modeling of single-word concepts that appear in a training set of documents, rather than on noun-phrase and verb-phrase decomposition/recomposition.
21
Starting with a hierarchy,…
Apply morphological rules to compensate after clustering for co-occurrence of words that are actually multiword concepts.
Concepts may or may not be contextualized – identified as the type of concept.
The cluster structure can be shallow or deep depending on statistical associations.
Clusters are progressively refined through statistical concentration.
22
Topic Hierarchies – Two Actions Creating Structures & Classifying Content
Topic classification structures can be:
created dynamically using statistical models applied to electronic content;
created by humans, with content dynamically associated with the structure;
created by humans, who also define the rules for associating content with each class.
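The third option, where people build the structure and also define the association rules, can be sketched as a simple term-matching classifier. The class names are taken from the flat taxonomy slide; the rule terms, the `classify` function, and the `min_hits` threshold are hypothetical.

```python
# Human-defined classes, each with a human-defined rule (a set of trigger terms).
RULES = {
    "Energy": {"electricity", "power", "renewable"},
    "Education": {"school", "teacher", "curriculum"},
    "Transport": {"road", "railway", "port"},
}


def classify(text, rules=RULES, min_hits=1):
    """Return every class whose rule terms appear at least `min_hits` times."""
    words = set(text.lower().split())
    return sorted(
        topic for topic, terms in rules.items()
        if len(words & terms) >= min_hits
    )


labels = classify("Rural electricity access and road rehabilitation")
unmatched = classify("Annual budget summary")
```

Unlike a statistically generated structure, the classes here stay fixed when new content arrives; only the assignment of content to classes changes.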
23
Creating a Faceted Classification Structure
When the structure becomes more than two dimensions, i.e., transforms from a hierarchy to a faceted taxonomy, how do you meet the application-level requirements without (1) compromising the hierarchy, or (2) building extensive and unique code to create the new application layer?
What if the blue classes and facets all represented policy statements – how would the user find them?
24
What happens when you add in the Real Semantics?
Morphological Rule & Sense Problems…
Semantics describing different aspects of policies
When you add the Real Semantics into the structure, you’re creating a semantic network structure at each classification node, depending on how you defined your classes, and potentially for each facet attached to a node.
In a faceted structure, you may need to apply the semantic network to common facets that are attached to different nodes, at different levels of the classification scheme, or miss some facets that logically use the semantics.
25
Taking that approach,….
You’ve built an architecture that is potentially confounded and disjointed at all levels – how will you build a better contextual search system?
You’re applying the solution to content at multiple levels in the architecture – this defies the OHIO principle even from a system perspective.
The architecture shifts each time you add new content – users have to relearn the interface every time they use it.
26
Taking that approach,….
You’ve excluded content that has electronic surrogates but is only available in print – domain information value varies – this could be a significant gap.
Information, functional, technical and presentation architectures are too tightly connected – there is no flexibility to design an open and efficient technical architecture over time, to repurpose the information architecture, or to design presentation architectures for different audiences.
If you used one tool to build this complex architecture and a better component comes along, how much value will you lose?
27
Taking hierarchy approach,….
The best you can do is to throw a full-text search engine against all of the architecture and the content – this results in complete loss of context.
Rather than leveraging domain lexicons, which are already well defined for established domains or can easily be developed for emerging domains, we revert to Kindergarten grammars.
Maintenance and use are a nightmare at all levels.
Multilingual access requires either an exponentially complex solution (to do it well) or a simple (and inherently suboptimal) one.
28
Starting with Network Structure
This means starting at the very bottom – preferably at the concept level, but too often only at the word level.
This approach is used to dynamically discover the ‘clusters’ described earlier.
You can use this approach to discover clusters which can be converted into classes.
However, if you use this approach to try to discover classes across domains, a simple statistical approach will surface what is statistically significant (the very generic concepts across all domains) and will submerge the domain-specific concepts.
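The cross-domain effect can be seen even with plain document-frequency counts. The sketch below uses a deliberately simple statistic and a made-up two-domain corpus; the function names and all terms are hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Number of documents in which each term appears."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def domain_specific(corpus_by_domain, domain):
    """Terms that occur in `domain` and in no other domain, sorted."""
    own = document_frequency(corpus_by_domain[domain])
    others = set()
    for name, docs in corpus_by_domain.items():
        if name != domain:
            for doc in docs:
                others.update(doc)
    return sorted(term for term in own if term not in others)


# Toy corpus: each document is a list of extracted terms.
corpus_by_domain = {
    "energy": [["project", "policy", "grid"], ["project", "tariff", "policy"]],
    "education": [["project", "policy", "curriculum"], ["project", "teacher"]],
}

# Pooling every domain lets the generic vocabulary dominate.
all_docs = [doc for docs in corpus_by_domain.values() for doc in docs]
top_pooled = [term for term, _ in document_frequency(all_docs).most_common(2)]

# Looking inside one domain brings the domain vocabulary back.
energy_terms = domain_specific(corpus_by_domain, "energy")
```

In the pooled counts the top terms are the generic "project" and "policy", while "grid" and "tariff" only become visible when the energy domain is examined against the others.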
29
Beginning at the Concept Level – Network Structures
What happens to structure when you add new documents?
What happens to concepts when you move old documents out of current index structures?
Or when a new concept is introduced?
How do you apply and manage this structure explicitly?
How do you integrate it into an existing architecture?
30
End Game Strategy
These two approaches assume navigation – and they assume that we can present more or deeper paths whenever a problem arises, or that we push all of the options to the interface for selection.
This approach assumes that the architecture is explicit – what is implicit, and where is it in the overall design?
Taking this approach is an efficient system application and maintenance nightmare.
Sure, you can find a way to solve problems at each level or each extension; however, you end up with something that is potentially worse than what we used to call ‘spaghetti code’.
31
Starting with Faceted Structure
Facet as core enables multiple ‘top’ and ‘deep’ ontologies; in effect, it builds the structure based on the access points for all kinds of users.
Look at your users and define the End Game…
Individual citizen looking for IRS information that pertains to her situation.
Homeland Security analyst looking for linkages across events, products, names, countries (each is a facet that provides a context).
DoD Program Officer and DARPA Program Officer looking for proposed or funded programs with common elements.
Dept. of Interior looking for successful techniques to maintain brush growth in Western forests.
32
Facet as Core Component
Domain Lexicon or Thesaurus; Topic Classification; Languages; Lines of Business Classification
33
Taking facet approach,….
Anchor the basic description framework on objects – any and all kinds of objects.
Allows you to extend the architecture up and down – from ‘whole’ to ‘components within a container’ to creating ‘containers of wholes’.
Allows you to integrate electronic & print content.
Allows you to apply class scheme values to objects, but maintain the schemes as distinct sources outside of a single application.
Allows you to develop deep domain semantics and apply them at the facet level – again maintained as distinct sources outside of a single application.
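One way to picture "class scheme values applied to objects but maintained as distinct sources" is to keep each scheme in its own table and store only scheme codes on the object. Everything named below (the codes, the ids, `describe`, `display`) is a hypothetical illustration.

```python
# Each class scheme lives in its own table, outside any one application.
TOPICS = {"T01": "Energy", "T02": "Education"}   # hypothetical codes
LANGUAGES = {"en": "English", "fr": "French"}    # hypothetical codes
SCHEMES = {"topic": TOPICS, "language": LANGUAGES}


def describe(object_id, medium, **facet_codes):
    """Describe any object, print or electronic, using scheme codes only."""
    for facet, code in facet_codes.items():
        scheme = SCHEMES.get(facet)
        if scheme is None or code not in scheme:
            raise ValueError(f"{code!r} is not a valid {facet} value")
    return {"id": object_id, "medium": medium, **facet_codes}


def display(record):
    """Resolve the stored codes against the current scheme tables."""
    return {facet: SCHEMES[facet][record[facet]]
            for facet in SCHEMES if facet in record}


# Print and electronic content share the same description framework.
print_report = describe("rpt-1975-014", "print", topic="T01", language="fr")
web_page = describe("web-2003-882", "electronic", topic="T01", language="en")
print_labels = display(print_report)
```

Because the object stores only codes, a scheme can be relabeled or extended in its own table without touching the described objects or the applications that hold them.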
34
Back to Interoperability
Which of these approaches has a better chance of achieving true interoperability in dynamic environments?
Which of these approaches has a better chance of fitting into and sustaining an Enterprise Architecture to achieve integration across source systems – at any level?
Can we separate the information, functional, technical and presentation architectures to get there?
The key question is: how do you anchor your ontology, or your enterprise taxonomy structure?
35
Hierarchy Network Facet Recommended
Model: Hierarchy
Description:
Begin with a structure with constraints and intuitive meanings for users – major constraint to interoperability with impact to usability.
Complex coding and technical architecture required to support multiple access points for users – access point proliferation across domains increases complexity.
Can add facets at different class levels, but users become confused when explicitly implemented.
While it may appear to pull together like content at high level, it can scatter content very quickly walking down the hierarchy.
Impacts business systems.
Recommendation: Not recommended.

Model: Network
Description:
Basic approach surfaces commonality across domains, which may be the most generic of concepts.
Cannot effectively be explicitly implemented across domains.
Too detailed for explicit implementation without a context.
Recommendation: Is best managed with overall classification scheme when implemented across domains.

Model: Facet
Description:
Supports extensive reference description of all kinds of objects.
Supports interoperability at the access point level.
Supports interoperability or distinction of classification.
Supports integration and harmonization at the network level.
No impact to business system.
Recommendation: Recommended.
36
Interoperable Enterprise Functional Architecture
[Diagram labels, top to bottom]
Site Specific Searching; Publications Catalog; World Bank Catalog/Enterprise Search; Recommender Engines; Personal Profiles; Portal Content Syndication; Browse & Navigation Structures
Metadata Repository of Bank Standard Metadata
Reference Tables: Topics, Countries, Document Types
Transformation Rules
Data Governance Bodies
Metadata Extract (six extracts, one per source)
IRIS Doc Mgmt System; Web Content Mgmt. Metadata; Board Documents Metadata; IRAMS Metadata; JOLIS Metadata; InfoShop Metadata
Architecture Interoperability
Semantic Interoperability
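The path from a source system through a metadata extract and transformation rules into the standard metadata repository might be sketched as follows. The system names IRIS and JOLIS come from the diagram; every field name, rule, and reference-table entry is an assumption made up for the example and does not describe the actual systems.

```python
# Hypothetical transformation rules: source field name -> standard field name.
TRANSFORMATION_RULES = {
    "IRIS": {"doc_title": "title", "subject": "topic", "ctry": "country"},
    "JOLIS": {"main_title": "title", "subject_heading": "topic", "geo": "country"},
}

# Hypothetical reference table that reconciles country spellings and codes.
COUNTRY_TABLE = {"BR": "Brazil", "Brasil": "Brazil", "Brazil": "Brazil"}


def harmonize(source, raw):
    """Map one extracted record onto the standard metadata fields."""
    rules = TRANSFORMATION_RULES[source]
    record = {standard: raw[local]
              for local, standard in rules.items() if local in raw}
    if "country" in record:
        # Unknown values pass through unchanged rather than being guessed.
        record["country"] = COUNTRY_TABLE.get(record["country"], record["country"])
    record["source"] = source
    return record


iris_record = harmonize(
    "IRIS", {"doc_title": "Power Sector Review", "subject": "Energy", "ctry": "BR"}
)
jolis_record = harmonize(
    "JOLIS", {"main_title": "Energy in Brazil", "subject_heading": "Energy", "geo": "Brasil"}
)
```

The source systems keep their own field names and values; only the extract is transformed, which is one reading of how the repository can sit above the business systems without changing them.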
37
Technical Architecture for Interoperability
[Diagram labels, in extraction order]
Content Contributor; End User; Content Systems; DELIVERY; Metadata Management and Security Services; ePublish; PDS; ….; access rules; Content Access Services; Content Management Services; view; multilingual srch; workflow; create/del.; check in/out; retention schedule; search; syndication; versioning; declare; Metadata; browsing; notification; Business Activity; Topic Class Scheme; Content Integration and Archives Services; relate; Connector; Concept extraction; rules evaluator; harmonize; Adapter; thesaurus; Series Names; monitors; SAP (R/3, BW); Notes / Domino; Archives Store Over Time; Documents, Images, Audio, Data records; Metadata warehouse; logs; People Soft; iLAP; Repositories; Services; Business Systems
38
Metadata as Faceted Taxonomy
Identification/Distinction; Search & Browse; Compliant Document Management; Use Management
Flat Taxonomy; Hierarchical Taxonomy; Network Taxonomy; Faceted Taxonomy
39
Understand Behavior of facets
Each facet in the enterprise structure may be understood to be an ontology.
There is an architecture, there are semantics – we have commitments to use both the architecture and the semantics.
Implementing the commitments, though, requires a different strategy for each facet – depending on how it behaves.
Back to taxonomies for a minute….
40
Types of Tools to Use Concept Extraction Classification & Clustering
Identification/Distinction; Search & Browse; Compliant Document Management; Use Management
Pattern Matching & Rule Based Capture; Classification; Concept Extraction & Clustering
41
Given your End Game,… What kind of interoperability do you need?
What kind of an enterprise architecture do you need to support it?
What do you have to work with?
What kinds of tools do you need to build it?
What kinds of tools do you need to sustain it?