A Global Open Standard For Building Your Controlled Vocabulary
Ron Schuldt, Chair, The Open Group UDEF Project
July 13, 2009
Agenda
- The Problem
- The Missing Metadata Layer
- A Global Metadata Managed Architecture
- ISO/IEC – Metadata Registry Standard
- Universal Data Element Framework (UDEF) Overview
- Leveraging Online Gap Analysis
- Sample Controlled Vocabularies
- The Need for Distributed UDEF
- Example Distributed UDEF Use Cases
- UDEF Features Summary
- Some UDEF Tools
Enterprises Want This
[Figure: information flow across an External “In” Space, an Internal Space, and an External “Out” Space. Process labels: Procuring, Manufacturing, Legal, Finance, Assembling, Customer Support. System labels: Procurement Systems, Design Systems, Online Systems, ERP Systems, Requirements Systems.]
But Experience This*
“At present, this problem is typically addressed within the services, and by use as far as possible of enterprise-standard and industry-standard vocabularies. In the future, it may be addressed by semantic technology that is incorporated in the infrastructure, which will be a more scalable way to deliver the ideal of Boundaryless Information Flow.”
* Extract from The Open Group SOA White Paper
Problem – Global Perspective
- Each organization is attempting to set its own semantics standard
- Each must interface with organizations it does not control
- The problem is the lack of a controlled vocabulary
[Figure: an Organization surrounded by the parties it exchanges data with. Labels: Customers, Suppliers, Partners, Other, Retail, Local, State, Trans, Utilities, Banks, Federal.]
Problem – Internal Perspective
- Conflicting semantic overlaps between back-office systems
- Though semantically equal, the following are 4 different XML tag names
[Figure labels: App A, App B, App C, Other Apps, Legacy Data.]
The Goal
- Current Point-to-Point Approach: n(n-1)
- Global Semantics Standard Approach: 2n
- $$ Savings
- Reduce requirements and design-time phase semantics analysis time and cost
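The two counts on this slide can be compared with a few lines of Python. This is only an illustrative sketch: it assumes n is the number of systems to be interfaced, and the function names are made up for the example.

```python
def point_to_point_mappings(n: int) -> int:
    """Mappings needed when each of n systems maps directly to every other one: n(n-1)."""
    return n * (n - 1)


def standard_mappings(n: int) -> int:
    """Mappings needed when each system maps once to and once from a shared standard: 2n."""
    return 2 * n


if __name__ == "__main__":
    for n in (3, 5, 10, 50):
        p2p = point_to_point_mappings(n)
        std = standard_mappings(n)
        print(f"n={n}: point-to-point={p2p}, via standard={std}, difference={p2p - std}")
```

With these formulas the two approaches break even at n = 3 (6 mappings each); the shared standard only pulls ahead once more than three systems are involved, and the gap then grows quadratically.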
Data-Information-Knowledge-Wisdom
Data - Information Reference Model Databases Data Formats Information discovery Information retrieval File Servers Repositories Naming Conventions Indexing Methods Registry Ontologies Taxonomies Vocabularies Search engines Inference engines File retrieval Data standards Middleware DBMS Vaults Format standards Change control Structured Data Stores Un-Structured Data Files Data - Metadata Management Data Management Information - Metadata Management Information Management Logical Data Model File Labels Analogous to the OSI 7 layer reference model Example functions and technologies for each layer The layer typically overlooked
Wiki - Controlled Vocabularies
“Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri and taxonomies. Controlled vocabulary schemes mandate the use of predefined, authorized terms that have been preselected by the designer of the vocabulary, in contrast to natural language vocabularies, where there is no restriction on the vocabulary.”
The above is a good description of the function that the UDEF offers to any enterprise: the UDEF is open, infinitely extensible, and can transcend any domain or function of any enterprise.
The Open Group Architecture Framework (TOGAF) The Open Group Vision Statement The Open Group is a vendor-neutral and technology-neutral consortium, whose vision of Boundaryless Information Flow™ will enable access to integrated information, within and among enterprises, based on open standards and global interoperability. TOGAF does a great job with Enterprise Architecture but does not address Boundaryless Information Flow
TOGAF v8.1 Principle 10 Statement “Users have access to the data necessary to perform their duties; therefore, data is shared across enterprise functions and organizations.” Rationale “… The speed of data collection, creation, transfer, and assimilation is driven by the ability of the organization to efficiently share these islands of data across the organization. …” Implications “… need to develop standard data models, data elements, and other metadata that defines this shared environment and develop a repository system for storing this metadata to make it accessible. …” Chapter 29, Data is Shared
Metadata Management
- Metadata describes the characteristics of an artifact, such as its name, type of content, sensitivity, lifecycle phase, perceived importance to specific audiences, quality or value to the enterprise, and its relationships to other artifacts.
- Managed Metadata with a Controlled Vocabulary … Standard Metadata
  - Will enable organization to reduce the costs of building and maintaining application interfaces
  - Is a necessary enabler of a Service Oriented Architecture (SOA)
  - Will improve data and information assets discovery and reuse … data mining
  - Will improve knowledge worker productivity
To achieve enterprise IT agility requires effective metadata management
[Figure: an invoice record in which the column names (Invoice No., Qty, Total Price, Order Date, Buyer) are labeled Metadata and the row values (for example John Doe, $210.00) are labeled Data.]
Metadata Registry ISO/IEC Standards
- Part 1: Metadata Registries - Framework
- Part 2: Metadata Registries - Classification
- Part 3: Metadata Registries - Registry Metamodel and Basic Attributes
- Part 4: Metadata Registries - Formulation of Data Definitions
- Part 5: Metadata Registries - Naming and Identification Principles
- Part 6: Metadata Registries - Registration
ISO/IEC Summary* Provides guidance for the naming or identification of data constructs administered in a metadata registry. Names are applied to data constructs through the use of naming conventions. There are semantic, syntactic and lexical rules used to form a naming convention. Identifiers are designations meant for dereferencing data constructs for metadata management and exchange. ISO/IEC :2004(E) – section *Extracts from ISO/IEC :2004(E) – section 8.1.5
ISO/IEC – Terminology
[Figure labels: Data Element Concept, Data Element, Value Domain, Object Class, Property, Representation, Core Data Element, Application Data Element.]
The “Key” for a Controlled Vocabulary of Structured Data
A Metadata Managed Architecture
- Centralized metadata registry/repository
- Enables reuse to reduce costs
- Encourages standardization
[Figure labels: Design Time (Data Modelers and Apps Developers); Run Time (Interface Developers); UDEF-Indexed Metadata Registry/Repository (Data Dictionary, Mapping Matrices, Std XML Schema); EAI Transformation Engines; Interfaces to Back-Office Systems; Build/Extend Schema; Extend Matrices; Use Matrices; Internet; Global UDEF Registry; UDEF Extension Process; Vendors with Canonical Models; Software Vendors with UDEF ID APIs; Web Public; UDEF-Indexed Metadata Registries.]
UDEF Thumbnail Sketch
The Open Group’s Universal Data Element Framework (UDEF): the only truly open standard that will simplify building an enterprise-wide controlled vocabulary for structured data and facilitate normalization across data models, while enabling, through global indexing keys, SOA I/O interoperability across all services that adopt UDEF.
Oracle’s Daniel Serain recommends using UDEF global indexing keys for SOA services I/O.
UDEF is an Open Standard that Enables Interoperability
Universal Data Element Framework
UDEF names follow the rules of English – qualifiers precede the word they modify.
ISO/IEC Naming Convention: Data Element Concept Name = Object Class Term + Property Term
- Object Class Term: 0..n qualifiers + 1 or more required Object Class
- Property Term: 0..n qualifiers + 1 required Property
UDEF Object Class List: Entity, Document, Enterprise, Place, Program, Product, Process, Person, Asset, Law-Rule, Environment, Condition, Liability, Animal, Plant, Substance, Event
Property List*: Amount, Code, Date, Date Time, Graphic, Identifier, Indicator, Measure, Name, Percent, Picture, Quantity, Rate, Text, Time, Value, Sound, Video
Example UDEF-Based Data Element Concept Names:
- Document Abstract Text
- Enterprise Name
- Product Price Amount
- Product Scheduled Delivery Date
- Engineering Design Process Cost Amount
- Employee Person Family Name
UDEF is a proposed universal instantiation of ISO/IEC
* Based on Tables 8-1 and 8-3 in ISO
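The naming rule can be sketched in Python as follows. The object class and property lists are copied from this slide; the function name and its signature are hypothetical, and for simplicity the sketch accepts exactly one object class per name even though the slide allows one or more.

```python
# Object classes and properties as listed on the slide.
OBJECT_CLASSES = {
    "Entity", "Document", "Enterprise", "Place", "Program", "Product",
    "Process", "Person", "Asset", "Law-Rule", "Environment", "Condition",
    "Liability", "Animal", "Plant", "Substance", "Event",
}
PROPERTIES = {
    "Amount", "Code", "Date", "Date Time", "Graphic", "Identifier",
    "Indicator", "Measure", "Name", "Percent", "Picture", "Quantity",
    "Rate", "Text", "Time", "Value", "Sound", "Video",
}


def compose_name(object_qualifiers, object_class, property_qualifiers, prop):
    """Build a data element concept name: qualifiers precede the word they modify."""
    if object_class not in OBJECT_CLASSES:
        raise ValueError(f"unknown object class: {object_class!r}")
    if prop not in PROPERTIES:
        raise ValueError(f"unknown property: {prop!r}")
    words = [*object_qualifiers, object_class, *property_qualifiers, prop]
    return " ".join(words)


if __name__ == "__main__":
    print(compose_name(["Employee"], "Person", ["Family"], "Name"))
    print(compose_name([], "Enterprise", [], "Name"))
    print(compose_name(["Engineering", "Design"], "Process", ["Cost"], "Amount"))
```

Restricting the last word of each term to a fixed list is what makes the vocabulary "controlled": qualifiers are free to vary, but every name ends in a known object class followed by a known property.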
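Creating UDEF IDs
UDEF Trees: 17 Object Class Trees, 18 Property Trees
[Figure: excerpts of the object class trees (Entity, Asset, Person, …) and the property trees (Amount, Name, …). Under Person (5), Employee is node a and Patient is node au; under Name (10), Family is node 11.]
Patient Person_Family Name has UDEF ID = au.5_11.10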
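The way an ID mirrors its name can be illustrated with a small lookup. The codes below are only the handful that appear on these slides (Person = 5, Employee = a, Patient = au, Name = 10, Family = 11, and so on); the table layout and the function are assumptions made for the sketch, and it handles a single qualifier per term, whereas the real trees are deeper.

```python
# Codes taken from the slides; a real registry holds the complete trees.
OBJECT_CLASS_CODES = {"Person": "5"}
OBJECT_QUALIFIER_CODES = {
    ("Person", "Employee"): "a",
    ("Person", "Patient"): "au",
}
PROPERTY_CODES = {"Name": "10", "Date": "6", "Code": "4"}
PROPERTY_QUALIFIER_CODES = {
    ("Name", "Family"): "11",
    ("Name", "Given"): "12",
    ("Date", "Hired"): "49",
    ("Date", "Birth"): "51",
    ("Code", "Gender"): "3",
}


def udef_id(object_qualifier, object_class, property_qualifier, prop):
    """Replace each word of the name with its tree code, keeping the word order."""
    object_part = ".".join([
        OBJECT_QUALIFIER_CODES[(object_class, object_qualifier)],
        OBJECT_CLASS_CODES[object_class],
    ])
    property_part = ".".join([
        PROPERTY_QUALIFIER_CODES[(prop, property_qualifier)],
        PROPERTY_CODES[prop],
    ])
    # The underscore separates the object class term from the property term.
    return f"{object_part}_{property_part}"


if __name__ == "__main__":
    print(udef_id("Patient", "Person", "Family", "Name"))   # au.5_11.10
    print(udef_id("Employee", "Person", "Hired", "Date"))   # a.5_49.6
```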
UDEF Provides The Indexing Key
- Map to registry
- Assign UDEF ID alias
- UDEF Object Term identifies the table
- UDEF Property Term identifies the column
[Figure: System A and System B, each with External, Conceptual, and Physical levels, use different local names for the same element (Emp_Start, Empl-Hire-Dt, EmpStr, EmplHr). The Open Group UDEF Registry supplies the registry-derived name Employee PERSON Hired DATE (UDEF Object Term + UDEF Property Term) and the registry-derived ID a.5_49.6.]
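A minimal sketch of using the UDEF ID as an alias to line up fields across two systems. The names EmpStr and EmplHr and the ID a.5_49.6 come from the slide, but which system owns which name is an assumption; EmpLastNm and EmplBirthDt are hypothetical local names added for illustration.

```python
# Each system keeps its own local names and records a UDEF ID alias for each.
SYSTEM_A = {
    "EmpStr": "a.5_49.6",      # Employee Person Hired Date
    "EmpLastNm": "a.5_11.10",  # hypothetical local name
}
SYSTEM_B = {
    "EmplHr": "a.5_49.6",      # Employee Person Hired Date
    "EmplBirthDt": "a.5_51.6", # hypothetical local name
}


def match_by_udef_id(system_a, system_b):
    """Pair up local field names that carry the same UDEF ID alias."""
    b_by_id = {}
    for local_name, udef_id in system_b.items():
        b_by_id.setdefault(udef_id, []).append(local_name)
    matches = []
    for local_name, udef_id in system_a.items():
        for other_name in b_by_id.get(udef_id, []):
            matches.append((local_name, other_name, udef_id))
    return matches


if __name__ == "__main__":
    for a_name, b_name, udef_id in match_by_udef_id(SYSTEM_A, SYSTEM_B):
        print(f"{a_name} <-> {b_name} via {udef_id}")
```

Neither system has to rename anything; the match is made entirely on the shared alias.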
UDEF – Current Status
[Figure: roadmap with status markers (Initial, Started, Launch Ready). Labels: All enterprises use the UDEF; Controlled extension process; Baseline UDEF in usable form; Support by key vendors (IBM, SAP, Oracle..); Early adopters use the UDEF; Training; Demonstrated benefits; Promotion; Pilot – key vendor(s) & end user(s); Global Registry; Homeland Security & Electronic Health Record; UDEF in other languages; Disaster Response.]
Derive a structured ID based on the UDEF taxonomy that carries the UDEF inherited indexing scheme. For example:
UDEF ID    Object Type or Role   Object Class   Property Type   Property
a.5_51.6   Employee              Person         Birth           Date
HR Example fields: Employee Number, Hire Date, Last Name, First Name, Birth Date
UDEF IDs shown: a.5_, a.5_49.6, a.5_51.6, a.5_12.10, a.5_
Create UDEF Mappings
Map Both Systems and/or Both Standards Independently to the UDEF
[Figure: the purchase order identifier element in the OAG Purchase Order and in the xCBL Purchase Order each map to UDEF ID d.t.2_8.]
Source and Target Files OAGIS Standard xCBL Standard
Leveraging Online Tool
[Figure labels: Source System; Target System; Source XML Instance Doc with UDEF IDs; Target XML Instance Doc with UDEF IDs; Vocabulary Gap Software (Java and XSLT); On-the-Wire (SOAP); On-line Gap Analysis Report (DHTML).]
Online Gap Analysis Tool –
Gap Analysis Results OAGIS Standard xCBL Standard
Gap Analysis Results - Enhanced OAGIS Standard xCBL Standard
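The idea behind a gap analysis on UDEF IDs can be sketched in a few lines. This is not the Open Group tool (which the slides describe as Java and XSLT); it is a hypothetical Python illustration in which the element names, the `udef` attribute name, and the sample documents are all invented. Only the UDEF IDs come from the slides.

```python
import xml.etree.ElementTree as ET

# Invented sample documents; each element carries its UDEF ID in a "udef" attribute.
SOURCE_XML = """<PurchaseOrder>
  <OrderNumber udef="d.t.2_8">PO-1001</OrderNumber>
  <OrderDate udef="d.t.2_6">2009-07-13</OrderDate>
  <ItemQuantity udef="z.9_13.11">4</ItemQuantity>
</PurchaseOrder>"""

TARGET_XML = """<Order>
  <BuyerOrderID udef="d.t.2_8"/>
  <IssueDate udef="d.t.2_6"/>
  <LineTotal udef="z.9_6.2.1"/>
</Order>"""


def udef_index(xml_text):
    """Map each UDEF ID found in the document to the tag that carries it."""
    root = ET.fromstring(xml_text)
    return {el.get("udef"): el.tag for el in root.iter() if el.get("udef")}


def gap_report(source_xml, target_xml):
    """Compare two documents by UDEF ID rather than by tag name."""
    src, tgt = udef_index(source_xml), udef_index(target_xml)
    return {
        "matched": {i: (src[i], tgt[i]) for i in sorted(src.keys() & tgt.keys())},
        "source_only": {i: src[i] for i in sorted(src.keys() - tgt.keys())},
        "target_only": {i: tgt[i] for i in sorted(tgt.keys() - src.keys())},
    }


if __name__ == "__main__":
    for section, entries in gap_report(SOURCE_XML, TARGET_XML).items():
        print(section, entries)
```

Tags with different names but the same UDEF ID land in `matched`; anything left in `source_only` or `target_only` is a gap between the two vocabularies.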
HR Controlled Vocabulary Examples
For employee identification, the UDEF-based data element concept names and IDs are:
UDEF Name = Controlled Vocabulary              UDEF ID
Employee Person Employer Assigned Identifier   a.5_
Employee Person Hired Date                     a.5_49.6
Employee Person Family Name                    a.5_11.10
Employee Person Given Name                     a.5_12.10
Employee Person Gender Code                    a.5_3.4
Employee Person Birth Date                     a.5_51.6
All HR applications on the globe could use “a.5” as the indexing ID for identifying the application table that contains employee data. The portion of the UDEF ID that starts with the underscore (e.g., _11.10, _3.4) provides the indexing ID to identify the associated column in the HR table.
Procurement Controlled Vocabulary
For purchase order identification, the UDEF-based data element concept names and IDs are:
UDEF Name = Controlled Vocabulary                UDEF ID
Purchase Order Document Identifier               d.t.2_8
Purchase Order Document Date                     d.t.2_6
Purchase Order Document Line Item Identifier     d.t.2_17.8
Ordered Product Identifier                       z.9_8
Ordered Product Ordered Quantity                 z.9_13.11
Ordered Product Total Price Amount               z.9_6.2.1
All Procurement applications on the globe could use “d.t.2” as the indexing ID for identifying the application table that contains purchase order data. The portion of the UDEF ID that starts with the underscore (e.g., _8, _17.8, _6) provides the indexing ID to identify the associated column in the purchase order table.
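Splitting an ID into its table part and column part is a one-line operation. The sketch below assumes only what the two vocabulary slides state: everything before the underscore indexes the table, and the underscore plus everything after it indexes the column. The function names are hypothetical.

```python
from collections import defaultdict


def split_udef_id(udef_id: str) -> tuple[str, str]:
    """Return (table key, column key); the column key keeps its leading underscore."""
    object_part, sep, property_part = udef_id.partition("_")
    if not (sep and object_part and property_part):
        raise ValueError(f"not a complete UDEF ID: {udef_id!r}")
    return object_part, "_" + property_part


def group_by_table(udef_ids):
    """Group column keys under the table key they belong to."""
    tables = defaultdict(list)
    for udef_id in udef_ids:
        table_key, column_key = split_udef_id(udef_id)
        tables[table_key].append(column_key)
    return dict(tables)


if __name__ == "__main__":
    ids = ["d.t.2_8", "d.t.2_6", "d.t.2_17.8", "z.9_8", "z.9_13.11", "z.9_6.2.1"]
    print(group_by_table(ids))
```

Note that the same column key can appear under different tables (`_8` is the Identifier column for both d.t.2 and z.9), so the column key is only meaningful together with its table key.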
The Need for Distributed UDEF
Distributed UDEF
[Figure: UDEF trees at three levels (Global UDEF, Industry UDEF, Private UDEF) linked by descendant tree connectors.]
Example Industry SDEF Mappings
Example Private UDEF Scenario
[Figure: scenario participants: UCAV, MH-60, AWACS, F/A-18E/F, SOF, F-22, CAOC, SGU USTS Displays, RC-135, Global Hawk - SAM, Global Hawk - SCUD, Global Hawk - CSAR.]
PDEF Scenario – Mission Assigned
Participants: UCAV, F/A-18E/F, Global Hawk - SAM, CAOC TST Cell, SGU USTS Displays, RC-135, SAM
(1) Picks up SAM sites on RWR
(2) Transmit SAM site target info
(3) Task Global Hawk to provide target area images (ATO)
(4) ATO Ack.
(5) Provide image
(6) CAOC displays SAM site image
(7) Task F/A-18E/F to destroy SAM sites (ATO)
(8) ATO Ack
(9) Task UCAV to destroy SAM sites (ATO)
(10) ATO Ack
PDEF Scenario – Accomplish Mission
Participants: UCAV, F/A-18E/F, Global Hawk - SAM, CAOC TST Cell, SGU USTS Displays, RC-135, SAM
(11) UCAV drops bombs on SAM site
(12) Bomb dropped on SAM site
(13) Task Global Hawk to perform bomb damage assessment
(14) ATO Ack.
(15) Provide image/video
(16) CAOC displays SAM site image/video
Example PDEF Mappings
See the UDEF Asset tree and UDEF Measure tree, respectively.
1) Tail-ID-1234.RC-135.UDEF-P.Military.Aircraft.Asset_Latitude.Coordinate.Measure with UDEF ID = a.a.UDEF-P.a.o.1_
The UDEF ID “a.a.UDEF-P.a.o.1_ ” would be transmitted with the applicable latitude value to the applicable command center receiver and would be interpreted as Tail-ID-1234.RC-135.Military.Aircraft.Asset_Latitude.Coordinate.Measure, with the associated latitude value included within the message. The private portion of the UDEF name “Tail-ID-1234.RC-135” and the associated private ID “a.a.” would be known only to those military systems controlled by the private registry that assigned the UDEF extensions below the UDEF-P tag under Military.Aircraft.Asset.
Other examples include the following:
2) Tail-ID-1234.RC-135.UDEF-P.Military.Aircraft.Asset_Longitude.Coordinate.Measure with UDEF ID = a.a.UDEF-P.a.o.1_
3) Tail-ID-1234.RC-135.UDEF-P.Military.Aircraft.Asset_Altitude.Coordinate.Measure with UDEF ID = a.a.UDEF-P.a.o.1_
4) Tail-ID-9876.FA-18.UDEF-P.Military.Aircraft.Asset_Latitude.Coordinate.Measure with UDEF ID = f.b.UDEF-P.a.o.1_
Example PDEF Mappings
5) Tail-ID-9876.FA-18.UDEF-P.Military.Aircraft.Asset_Longitude.Coordinate.Measure with UDEF ID = f.b.UDEF-P.a.o.1_
6) Tail-ID-9876.FA-18.UDEF-P.Military.Aircraft.Asset_JDAM.Bomb.Available.Quantity with UDEF ID = f.b.UDEF-P.a.o.1_1.1.UDEF-P
7) Tail-ID-9876.FA-18.UDEF-P.Military.Aircraft.Asset_Available-Fuel.Weight.Measure with UDEF ID = f.b.UDEF-P.a.o.1_1.UDEF-P.8.13
The private portion of the UDEF name “JDAM.Bomb” and the associated private ID “1.1.” would be known only to those military systems controlled by the private registry that assigned the UDEF extensions below the UDEF-P tag under Available.Quantity. Similarly, the private portion of the UDEF name “Available-Fuel” and the associated private ID “1.” would be known only to those military systems controlled by the private registry that assigned the UDEF extensions below the UDEF-P tag under Weight.Measure.
8) Enemy.Surface-To-Air-Missile-Launcher.UDEF-P.Vehicles.Asset_Latitude.Coordinate.Measure with UDEF ID = a.a.UDEF-P.i.1_
9) Enemy.Surface-To-Air-Missile-Launcher.UDEF-P.Vehicles.Asset_Longitude.Coordinate.Measure with UDEF ID = a.a.UDEF-P.i.1_
The private portion of the UDEF name “Enemy.Surface-To-Air-Missile-Launcher” and the associated private ID “a.a.” would be known only to those military systems controlled by the private registry that assigned the UDEF extensions below the UDEF-P tag under Vehicles.Asset.
10) Enemy.MIG-29.UDEF-P.Military.Aircraft.Asset_Latitude.Coordinate.Measure with UDEF ID = a.x.UDEF-P.a.o.1_
Example PDEF Mappings
11) Enemy.MIG-29.UDEF-P.Military.Aircraft.Asset_Longitude.Coordinate.Measure with UDEF ID = a.x.UDEF-P.a.o.1_
12) Enemy.MIG-29.UDEF-P.Military.Aircraft.Asset_Altitude.Coordinate.Measure with UDEF ID = a.x.UDEF-P.a.o.1_
The private portion of the UDEF name “Enemy.MIG-29” and the associated private ID “a.x.” would be known only to those military systems controlled by the private registry that assigned the UDEF extensions below the UDEF-P tag under Military.Aircraft.Asset.
13) Tail-ID-9876.FA-18.UDEF-P.Military.Aircraft.Asset_Target-Engagement.UDEF-P.Type.Code with UDEF ID = f.b.UDEF-P.a.o.1_2.UDEF-P.33.4
Example target-engagement code values might be:
1 = Air-To-Surface
2 = Air-To-Air
The private portion of the UDEF name “Target-Engagement” and the associated private ID “2.” would be known only to those military systems controlled by the private registry that assigned the UDEF extensions below the UDEF-P tag under Type.Code.
14) Enemy.Surface-To-Air-Missile-Launcher.UDEF-P.Vehicles.Asset_Destroyed.Photo.Picture with UDEF ID = a.a.UDEF-P.i.1_1.UDEF-P
15) Enemy.MIG-29.UDEF-P.Military.Aircraft.Asset_Destroyed.Photo.Picture with UDEF ID = a.x.UDEF-P.a.o.1_1.UDEF-P.1.3
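The role of the UDEF-P tag in these IDs can be illustrated with a small parser. This is a sketch under two assumptions drawn from the examples above: codes to the left of UDEF-P are private extensions, and codes to the right belong to the shared UDEF trees. The function names and the returned dictionary layout are hypothetical.

```python
PRIVATE_TAG = "UDEF-P"


def split_private(term: str) -> tuple[str, str]:
    """Split one side of an ID into (private codes, shared codes)."""
    codes = term.split(".")
    if PRIVATE_TAG in codes:
        i = codes.index(PRIVATE_TAG)
        return ".".join(codes[:i]), ".".join(codes[i + 1:])
    return "", term  # no private extension on this side


def parse_distributed_id(udef_id: str) -> dict:
    """Separate private and shared codes for both the object and property terms."""
    object_term, sep, property_term = udef_id.partition("_")
    if not sep:
        raise ValueError(f"missing '_' separator in {udef_id!r}")
    object_private, object_shared = split_private(object_term)
    property_private, property_shared = split_private(property_term)
    return {
        "object_private": object_private,
        "object_shared": object_shared,
        "property_private": property_private,
        "property_shared": property_shared,
        # What remains once the private extensions are stripped.
        "shared_id": f"{object_shared}_{property_shared}",
    }


if __name__ == "__main__":
    print(parse_distributed_id("f.b.UDEF-P.a.o.1_2.UDEF-P.33.4"))
```

For example 13 this yields private codes `f.b` and `2`, and a shared ID of `a.o.1_33.4`. Resolving the private codes to “Tail-ID-9876.FA-18” and “Target-Engagement” would still require access to the private registry that assigned them.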
Some UDEF Features
- Conforms to ISO/IEC
- Establishes global indexing scheme for data element concepts, analogous to the Dewey Decimal System used in libraries to index books
- Enables normalization of data models on global scale
- Viewable by anyone with a browser
- Online gap analysis utility available to anyone with a browser
- Multi-lingual (German, French, Chinese soon) in addition to English baseline and Dutch equivalent
- Unlimited extensibility
- Anyone with an address can propose extensions
- Downloadable in XML and RDF formats to anyone with an address
- Supports industry or organization unique extensions in a manner analogous to Internet DNS
Free UDEF Tools
Free Tools
- Online Gap Analysis Tool on The Open Group Web Site
  - Download sample source and target XML files for purchase order based on two different standards
  - See Gap Analysis section of
  - Obtain gap analysis report
- UDEF Explorer Mapping Tool by Knotion
For Additional Information OPENGROUP UDEF Web Site – to download the UDEF The UDEF On-Line Dr. Chris Harding – Ron Schuldt –