1 EPA Data Architecture for the Data Reference Model: Build to Exchange, Share, and Reuse Brand L. Niemann, Senior Enterprise Architect, US EPA, and Co-Chair, Federal Semantic Interoperability Community of Practice (SICoP) July 30, 2007
2 Overview 1. Background 2. Model Driven Architecture and Ontology Development 3. EPA Data Architecture for DRM 3.0 / Web 3.0 Wiki Page and Knowledgebases 4. An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization 5. Some Next Steps
3 1. Background A. Data Reference Model B. Data Architecture Subcommittee and SICoP Matrix C. Improve Data Quality D. DRM 3.0/Web 3.0 Data and Information Architecture and Functionalities E. Line of Sight and Service Systems F. Recent Presentations G. Current Semantic Web Layer Cake
4 1. Background: A. Data Reference Model
Version | Sound Bite | Examples
1.0 | Build to Exchange | NEIN (CDX), NIEM
2.0 | Build to Share | LandView 6 (7), SICoP Pilots
3.0 | Build to Reuse | SICoP White Paper 3, EPA Service System
5 1. Background: A. Data Reference Model A Viewer for the Environmental Protection Agency, U.S. Census Bureau, and U.S. Geological Survey Data and Maps: Database management software (LandView) and mapping software (MARPLOT®).
6 1. Background: B. Data Architecture Subcommittee and SICoP Matrix
Data Architecture Subcommittee Action Plan | SICoP Activities | Comments
Data Quality Profile | Data Modeling and OWL: Two Ways to Structure Data | See next two slides
DRM 2.0 Implementation Guide | White Paper 3: DRM 3.0 and Web 3.0 Knowledgebases |
Person Harmonization, Etc. | Vocabulary Management in Semantic Wikis | See SICoP Special Conferences 1-3
7 1. Background: C. Improve Data Quality Data Modeling and OWL: Two Ways to Structure Data, David Hay, Essential Strategies, Inc.: –Objectives of a Data Model: Capture the semantics of an organization. Communicate these to the business without requiring technical skills. Provide an architecture to use as the basis for database design and system design. –Now: Provides the basis for designing Service Oriented Architectures. – http:// UpBW/Hay_David_2_2UpBW.pdf
8 1. Background: C. Improve Data Quality Data Modeling and OWL: Two Ways to Structure Data, David Hay, Essential Strategies, Inc. (continued): –Synopsis: Both data modeling and ontology languages represent the structure of business data (ontologies). Data modeling represents data being collected and filters it according to the rules. Ontology languages represent data being used, with the ability to have a computer make inferences. –Comment from Lucian Russell (SICoP White Paper 3 Author): So ontology can improve data quality in legacy systems! David Hay agreed.
9 1. Background: D. DRM 3.0/Web 3.0 Data and Information Architecture
Tool | Program | Purpose
Web Search | Federal Sitemaps (Google: Federal Sitemaps) | Locate: Most searches start with Google, Yahoo, and MSN
Wikis | COLAB (Google: COLAB Wiki) | Collaborate: Need to Share*
Semantic Wikis | Knowledgebases (Google: DRM 3.0 and Web 3.0) | Integrate: Responsibility to Provide*
* Mike McConnell, Director of National Intelligence: Move the intelligence community beyond the "need to share" philosophy toward a "responsibility to provide" model (March 6, 2007).
10 1. Background: D. DRM 3.0/Web 3.0 Data and Information Functionalities Metadata: –Full text of documents, standards, meeting notes, etc. Harmonization –Different ways in which the same words are used. Enhanced Search: –Across all content and showing context (e.g. words around the term or concepts) Mashups: –A website or application that combines content from more than one source into an integrated experience (repurposing, reuse).
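The "Enhanced Search" functionality above (search across all content, showing the words around the term) can be illustrated with a small keyword-in-context sketch. This is only an illustration under stated assumptions: the function name `keyword_in_context`, the `window` parameter, and the two sample documents are hypothetical and do not come from the knowledgebases themselves.

```python
import re

def keyword_in_context(documents, term, window=3):
    """Return (document name, snippet) pairs for every hit of `term`.

    Each snippet holds up to `window` words on either side of the hit,
    which is the "context" a reader sees before opening the document.
    """
    hits = []
    for name, text in documents.items():
        words = text.split()
        for i, word in enumerate(words):
            # Compare on letters and digits only, ignoring case and punctuation.
            if re.sub(r"\W", "", word).lower() == term.lower():
                start = max(0, i - window)
                hits.append((name, " ".join(words[start:i + window + 1])))
    return hits

# Hypothetical sample content standing in for documents, standards, and notes.
docs = {
    "meeting_notes": "The committee reviewed the indicator metadata for water quality.",
    "standard": "Each indicator must cite its data sources.",
}

for name, snippet in keyword_in_context(docs, "indicator", window=2):
    print(f"{name}: ...{snippet}...")
```

A real deployment would use a search engine or the wiki's own index; the point here is only that each hit is returned together with its surrounding words.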
11 1. Background: E. Line of Sight and Service Systems Line of Sight: –See FEA Reference Model Ontology Documentation: http://web-services.gov/lpBin22/lpext.dll/Folder19/Infobase5/1?fn=main-j.htm&f=templates&2.0 Service Systems: –See Case Study of CIO Council as a Service System: http://colab.cim3.net/file/work/BPC/ /BNiemannSSRI doc
12 1. Background: E. Line of Sight Definition: The indirect or direct cause and effect relationship from a specific IT investment to the processes it supports, and by extension the customers it serves and the mission-related outcomes it contributes to. –Line of Sight is described in the PRM Volume 2. The current PRM ontology includes Measurement Areas, Measurement Categories and Measurement Indicators, and the links between them, as outlined in FEA PRM Vol. 1. It also includes a model of the generic value chain from technology initiative, through process and activity, to Business and mission objects and Customer benefits, as outlined also in FEA PRM vol. 1. The linkage of the BRM lines of business as measurement categories and indicators for business and mission objectives is also achieved through OWL/RDFS connections between the FEA PRM ontology and the FEA BRM ontology.
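The value chain in this definition (technology initiative, through process and activity, to customers and mission outcomes) can be sketched as a small linked structure. This is a minimal sketch and not the FEA PRM ontology itself: the `Node` class, the `lines_of_sight` function, and every example name except EPA's Goal 2 ("Clean and Safe Water") are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    layer: str  # e.g. "technology", "process", "customer", "mission"
    supports: list = field(default_factory=list)  # nodes this one contributes to

def lines_of_sight(node, path=()):
    """Return every chain of names from `node` up to an end outcome."""
    path = path + (node.name,)
    if not node.supports:
        return [path]
    chains = []
    for target in node.supports:
        chains.extend(lines_of_sight(target, path))
    return chains

# Hypothetical chain ending in a mission goal.
mission = Node("Clean and Safe Water", "mission")
customer = Node("Hypothetical water system operators", "customer", [mission])
process = Node("Hypothetical water quality reporting", "process", [customer])
investment = Node("Hypothetical data exchange system", "technology", [process])

for chain in lines_of_sight(investment):
    print(" -> ".join(chain))
```

In an OWL/RDFS version the `supports` links would be properties between classes in the PRM and BRM ontologies; the plain objects here only show the cause-and-effect traversal.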
13 1. Background: E. Line of Sight See Next Slide
14 1. Background: E. Line of Sight What: What are the relevant people, technology, and/or fixed assets? –How: How do those inputs contribute to processes and activities – and by extension the organization’s mission? What: What are the processes and activities? The products and services? –How: How do these impact customers and contribute to Mission and Business results? Who: Who are the customers of these processes? –How: How are these customers impacted by the products and services provided? What: What is the purpose and mission of the organization? –How: How do these influence Strategic Outcomes? What: What is the highest level Policy Priority?
15 1. Background: E. Service Systems [Diagram] Service system elements: People, Business, Information Technology, Information. Offices and their content: Office of the Chief Information Officer (Enterprise Architecture); Office of Research & Development (2007 Report on the Environment); Office of the Chief Financial Officer (Strategic Plan and Performance & Accountability Report); Office of Human Resources (Innovation & Collaboration). Other labels: The “Medici Effect”; Stakeholders Input and Outreach. Purpose: Capture the Semantics of the Organization and the Line of Sight.
16 1. Background: F. Recent Presentations Data Architecture, Modeling, and Networks, January 5, 2007 (Initial outline of a book: free on-line collaborative approach): – http://colab.cim3.net/file/work/SICoP/EPADRM2.0/EPADataArchitecture ppt June 14, 2007, Gartner EA Summit Conference Keynote: Data and Information Architecture: Not Just for Enterprise Architects! – http://colab.cim3.net/file/work/SICoP/ /SICoPGartner ppt July 11-12, 2007, NCOIC and EPA: Building DRM 3.0 and Web 3.0 Knowledgebases: Where Do the Semantics Come From? – http://colab.cim3.net/file/work/SICoP/ /SICoP ppt
17 1. Background: F. Gartner EA Summit Conference Keynote Enterprise architecture in the Federal Government is evolving from compliance-driven to value-driven with SOA leading the way. SOA itself is evolving to deal with the semantics of data and information across the distributed enterprise. Service systems (networking communities of practice) are also in play to integrate people, business, information, and information technology in an information sharing environment. This keynote addresses what these all have in common and explains the evolution of the Federal Enterprise Architecture's Data Reference Model and of the Internet itself through (1) the Web, (2) the Social Web, (3) the Semantic Web, and (4) the Ubiquitous Web. A specific example of architecting and implementing an information sharing environment is provided and demonstrated. –Interagency and EPA Examples (see next two slides).
18 1. Background: F. Information Sharing Environment: US EPA Example [Diagram] Elements: People, Business, Information Technology, Information. Offices and their content: Office of the Chief Information Officer (Enterprise Architecture); Office of Research & Development (2007 Report on the Environment); Office of the Chief Financial Officer (Strategic Plan and Performance & Accountability Report); Office of Human Resources (Innovation & Collaboration). Other labels: The “Medici Effect”; Stakeholders Input and Outreach. Purpose: Capture the Semantics of the Organization and the Line of Sight.
19 1. Background: F. Building DRM 3.0 and Web 3.0 Knowledgebases: Where Do the Semantics Come From? 1. Preface 2. SICoP White Paper 3 3. WordNet 4. Language Computer Corporation 5. OpenCyc 6. TopBraid Composer 7. Example: An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization
20 1.Background: G. Current Semantic Web Layer Cake Note that RDF has moved into the XML space and been expanded with query and rules!
21 2. Model Driven Architecture and Ontology Development Dragan Gasevic, Dragan Djuric, and Vladan Devedzic, Model Driven Architecture and Ontology Development, Springer, 2006: –I. Basics: Existing technologies, tools, and standards including the Semantic Web. –II. The Model Driven Architecture and Ontologies: OMG's new ODM (Ontology Definition Metamodel) Initiative. –III. Applications: Practical aspects of developing ontologies using MDA-based languages. –Web Site: Many ontologies, UML and other MDA-based models, and the transformations between them.
22 2. Model Driven Architecture and Ontology Development Abstract: Defining a formal domain ontology is generally considered a useful, not to say necessary step in almost every software project. This is because software deals with ideas rather than with self-evident physical artifacts. However, this development step is hardly ever done, as ontologies rely on well-defined and semantically powerful AI concepts such as description logics or rule-based systems, and most software engineers are largely unfamiliar with these.
23 2. Model Driven Architecture and Ontology Development Defining a formal domain ontology is a useful and often necessary step in almost any software project. But certain commonly used words have multiple, equally valid meanings that cause much confusion if they are not adequately differentiated (e.g. by using Princeton WordNet). So describe the high-level structure of your software in the most expressive manner possible, but realize that different minds will still see the same thing (concepts) differently.
24 2. Model Driven Architecture and Ontology Development The book describes a practical strategy for realizing key elements of the Semantic Web and clearly demonstrates that the core technologies required for constructing the Semantic Web are available and are moving forward inexorably. Development of ontologies is still hard work. Ontologies have a price that must be paid for the benefits.
25 2. Model Driven Architecture and Ontology Development An initiative from the software engineering community called Model Driven Development (MDD) is being developed in parallel with the Semantic Web: –First develop a model of the system under study and then transform it into the real thing (e.g. an executable software entity). For example, start from an ontology, transform it into a UML platform-neutral domain model, and then generate a Java implementation. The Artificial Intelligence (in this case knowledge engineering) and Software Engineering (in this case MDA) approaches have many similarities, and their lifecycles could run in parallel.
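The last step of that chain (platform-neutral model to generated Java) can be sketched with a toy model-to-text transformation. This is only an illustration of the idea, not the ODM or any MDA tool: the `MODEL` dictionary, its `Indicator` class with its attributes, and the `generate_java` function are all hypothetical.

```python
# Hypothetical platform-neutral model: one class with typed attributes.
MODEL = {
    "name": "Indicator",
    "attributes": [("title", "String"), ("topic", "String"), ("year", "int")],
}

def generate_java(model):
    """Emit Java source text (fields plus getters) for one model class."""
    lines = [f"public class {model['name']} {{"]
    for attr, java_type in model["attributes"]:
        lines.append(f"    private {java_type} {attr};")
    for attr, java_type in model["attributes"]:
        getter = "get" + attr[0].upper() + attr[1:]
        lines.append(f"    public {java_type} {getter}() {{ return {attr}; }}")
    lines.append("}")
    return "\n".join(lines)

print(generate_java(MODEL))
```

Real MDA tooling works from UML or ODM metamodels and formal transformation languages; the sketch only shows that the implementation is derived from the model rather than written by hand.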
26 2. Model Driven Architecture and Ontology Development Knowledge is the understanding of a subject area: –Concepts and facts; –Relations among them; and –How to combine them to solve problems. Organizing knowledge in a structured way (usually with XML) and using those knowledgebases to solve problems efficiently requires: –Acquisition; –Storage; and –Retrieval. Ontological knowledge is the categories in the domain and the terms that people use to talk about them.
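The acquisition, storage, and retrieval steps above can be sketched with facts held as (subject, relation, object) triples and stored as XML. This is a minimal sketch under stated assumptions: the function names, the `knowledgebase`/`fact` element names, and the sample facts are hypothetical, and a production knowledgebase would more likely use RDF or OWL.

```python
import xml.etree.ElementTree as ET

def acquire(kb, subject, relation, obj):
    """Acquisition: record one fact relating two concepts."""
    kb.append((subject, relation, obj))

def store(kb):
    """Storage: serialize all facts to an XML string."""
    root = ET.Element("knowledgebase")
    for subject, relation, obj in kb:
        ET.SubElement(root, "fact", subject=subject, relation=relation, object=obj)
    return ET.tostring(root, encoding="unicode")

def retrieve(xml_text, relation):
    """Retrieval: return (subject, object) pairs for one relation."""
    root = ET.fromstring(xml_text)
    return [
        (fact.get("subject"), fact.get("object"))
        for fact in root.iter("fact")
        if fact.get("relation") == relation
    ]

kb = []
acquire(kb, "Water", "is_a", "Topic")
acquire(kb, "Air", "is_a", "Topic")
acquire(kb, "Indicator", "has_metadata", "Data Sources")

xml_text = store(kb)
print(retrieve(xml_text, "is_a"))
```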
27 3. EPA Data Architecture for DRM 3.0 / Web 3.0 Wiki Page and Knowledgebases
28 4. An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization DRM 3.0/Web 3.0 Knowledgebases That: –Address the Action Plan of the Data Architecture Subcommittee and SICoP; –Improve Data Quality; –Support the DRM 3.0 and Web 3.0 Data and Information Architecture and Functionalities; –Provide Line of Sight and a Service System; and –Support the FEA Practice Guidance: Enterprise Architecture, Segment Architecture, and Solutions Architecture: –A segment architecture is a scalable and repeatable process for architects to show business people how EAs can deliver value to the business area. This process helps to establish clear relationships between strategic goals, detailed business and information management requirements, and measurable performance improvements. » http:// uidance.pdf
29 4. An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization Initial Contents: –Office of Human Resources: See Innovation and Collaboration in EPA's Strategic Plan (below) –Office of the Chief Financial Officer: EPA’s Performance and Accountability Report 2006 and EPA's Strategic Plan –Office of the Chief Information Officer: EPA’s Enterprise Architecture –Office of Research & Development: EPA's 2007 Report on the Environment: Science Report and EPA's 2007 Report on the Environment: Highlights of National Trends
30 4. An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization Semantic Linking: –Not providing semantics in the links is one of the main navigational problems of the World Wide Web: It is not until one opens the destination page of a link that one finds out that its content is not of interest. Semantic Concept Extraction and Ontology Building: –Recall Slide 6 and see NCOIC Example: http://web-services.gov/lpBin22/lpext.dll/Folder21/Infobase14/1?fn=main-j.htm&f=templates&2.0
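The semantic linking point above can be sketched as links that carry a relation type, so a reader can filter by meaning before opening any destination page. This is a hypothetical illustration: the `SemanticLink` class, the relation names, and the page names are assumptions, not the knowledgebase's actual link vocabulary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SemanticLink:
    source: str
    target: str
    relation: str  # what the target means relative to the source

def targets_of_interest(links, wanted_relations):
    """Return only the targets whose link semantics match the reader's interest."""
    return [link.target for link in links if link.relation in wanted_relations]

# Hypothetical typed links from a strategic-plan goal page.
links = [
    SemanticLink("Goal 2", "Water indicators", "measured_by"),
    SemanticLink("Goal 2", "Performance results", "reported_in"),
    SemanticLink("Goal 2", "Press release archive", "mentioned_in"),
]

print(targets_of_interest(links, {"measured_by", "reported_in"}))
```

An untyped hyperlink would force the reader to open all three pages; with the relation attached, the uninteresting one is skipped up front.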
31 4. An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization EPA Strategic Plan: Charting Our Course ( ) Knowledgebase: –Goals: Goal 1—Clean Air and Global Climate Change Goal 2—Clean and Safe Water Goal 3—Land Preservation and Restoration Goal 4—Healthy Communities and Ecosystems Goal 5—Compliance and Environmental Stewardship –Cross-Goal Strategies: Results and Accountability Innovation and Collaboration Best Available Science
32 4. An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization
33 4. An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization U.S. EPA Enterprise Architecture 2007 Knowledgebase: –About Enterprise Architectures –EPA EA Framework Metamodel –CIO Policy Transmittal –2007 Solution Architecture Requirements –EA Program 2007 Self Assessment Supplemental Submission, March 23, 2007 –EA Program 2007 Self Assessment Status Report Version 1.0, February 28, 2007 –FY07 Transition Strategy February 28, 2007 –2007 EPA Sequencing Plan –2007 Architecture Development Standards and Guidance –Solution Architecture Business Case Integration Milestones (2007) –Strategy for Implementing Service Oriented Architecture at EPA –EA Architecture Development Methodology –Enterprise Architecture Governance Procedure –Data Standards Policy –EPA Data Areas and Data Classes (Draft )
34 4. An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization
35 4. An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization EPA’s Performance and Accountability Report 2006 Knowledgebase: –Section I. Management’s Discussion and Analysis –Section II.1 Performance Results –Section II.2 Annual Performance Goal Results: Detailed Results FY 2003–FY 2006 –Section III Management Accomplishments and Challenges –Section IV Financial Statements –Appendices
36 4. An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization
37 4. An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization EPA's 2007 Report on the Environment (ROE) Knowledgebase: Topics (5): –Air –Water –Land –Human Health –Ecological Condition Indicator Metadata* (9): –Introduction –What the Data Show –Limitations –Data Sources –References –Downloads –Data –Metadata –Data Table Elements * Data Quality Profile
38 4. An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization
39 4. An Information Sharing Environment for the US EPA: The Semantics and Line of Sight of the Organization Schematic of the Ontology for Indicators: Indicators: –Topics: The Economy; Society; Culture; The Environment; Cross-Cutting –Organizations: Publicly led; Privately led; Led by public-private partnership –Jurisdictions: U.S. local/regional level; U.S. state level; National level outside the United States; Supranational level Note that each of these classes can and do have multiple instances underneath them, etc.
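The schematic can be sketched as a small class hierarchy with instances attached under each subclass. The class and subclass names are read from the slide (the split of "Society" and "Culture" into two entries is an assumption about the flattened diagram); the function names and the sample instance are hypothetical.

```python
# Classes and subclasses as read from the schematic.
INDICATOR_ONTOLOGY = {
    "Topics": ["The Economy", "Society", "Culture", "The Environment", "Cross-Cutting"],
    "Organizations": ["Publicly led", "Privately led", "Led by public-private partnership"],
    "Jurisdictions": [
        "U.S. local/regional level",
        "U.S. state level",
        "National level outside the United States",
        "Supranational level",
    ],
}

def class_of(subclass):
    """Return the top-level class that contains `subclass`, or None."""
    for cls, subclasses in INDICATOR_ONTOLOGY.items():
        if subclass in subclasses:
            return cls
    return None

def add_instance(instances, subclass, name):
    """Attach an instance under a known subclass; reject unknown subclasses."""
    if class_of(subclass) is None:
        raise ValueError(f"unknown subclass: {subclass}")
    instances.setdefault(subclass, []).append(name)

instances = {}
add_instance(instances, "The Environment", "Hypothetical air quality indicator set")
print(class_of("U.S. state level"), instances)
```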
40 5. Some Next Steps Continue to build the EPA Data Architecture for DRM 3.0 / Web 3.0 Wiki Page and Knowledgebases: –Specifically introduce the use of Semantic Wikis next to support the “locate, collaborate, and integrate” paradigm. Connect Data Quality, Data Quality Profiles, and Indicator Metadata: –Specifically extract their concepts and facts, relations among them, and how to combine them to solve problems. Show How the Web Services Are Connected by SOA in a Service System: –Specifically explain the advantages of using REST versus SOAP for content versus messages.
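The REST-versus-SOAP distinction in the last step can be sketched by building both request shapes side by side: a REST request names a piece of content by URL, while a SOAP request wraps a message in an XML envelope. This is an illustrative sketch only; the base URL, the `SubmitIndicator` operation, and both function names are hypothetical, and only the SOAP 1.1 envelope namespace is a real identifier.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def rest_content_url(base, collection, item):
    """REST style: the content itself is addressed by a URL and fetched with GET."""
    return f"{base}/{quote(collection)}/{quote(item)}"

def soap_message(operation, params):
    """SOAP style: an operation and its parameters travel inside an XML envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, operation)
    for key, value in params.items():
        ET.SubElement(op, key).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

# Hypothetical endpoint and operation names.
print(rest_content_url("https://example.org/kb", "indicators", "water quality"))
print(soap_message("SubmitIndicator", {"topic": "Water", "year": 2007}))
```

The sketch builds the requests without sending them, so it shows only the difference in shape: a cacheable, linkable address for content versus a structured message for an operation.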