Enterprise Data Architecture and Implementation: Federated, Faceted, Semantic Search of Both EPA Metadata and Data with Governance, Brand Niemann, Senior Enterprise Architect

1 Enterprise Data Architecture and Implementation: Federated, Faceted, Semantic Search of Both EPA Metadata and Data with Governance Brand Niemann Senior Enterprise Architect EPA Enterprise Architecture Team March 26, 2008, Updated April 4, 2008

2 Brief History
March , 2008, Enterprise Data Architecture Discussions and Activities.
–Kevin Kirby, David Prompovitch, Michael Alford, and Brand Niemann.
March 12, 2008, Enterprise Data Architecture Program, Kevin Kirby, Overview Presentation for CIO Biweekly.
–Strategy for Program Growth (see next slide).
March 13, 2008, Enterprise Data Architecture Briefing, Kevin Kirby, Enterprise Architecture Working Group Session.
–Essentially a repeat of March 12th with suggestions (see slide 4).
March 13, 2008, Data Architecture Subcommittee Meeting, Brand Niemann, Informal Presentation.
–Vision & Implementation (see slides 5-8).
–Web 2.0 (see slides 9-10).
March 16-20, 2008, The DAMA International Symposium & Wilshire Meta-Data Conference, Kevin Kirby attending.
–At least nine presentations on Web 2.0, Wikis, etc. for Metadata and Data Management.
March 24, 2008, EPA Data Architecture: Overview of Metadata Strategy – Summary of Issues for Data Advisory Council, Kevin Kirby, Enterprise Architecture Team Call.
–Metadata Framework for Discovery & Evaluation and Conceptual Federated Search Architecture.

3 Vision and Implementation
Data Architecture | Component | Specific Artifact | Example
DRM 2.0 (1) | Description | Metadata (4) | Spreadsheet *
 | Context | Taxonomy/Ontology (5) | Web 2.5 Wiki
 | Sharing | Data (4) | Spreadsheet *
DRM 3.0 (2) | Semantics (3) | RDF/SPARQL (6) | Middleware
 | SOA | Services (7) | Web 2.5 Wiki (8), Web 3.0 Wiki (9)
Footnotes: See slide 4.
Our initial objective is to see whether this Web 2.0 Wiki can help bring about collaboration across the Metadata Management Functions Matrix, the Teams-Tasks Matrix, and the Data Architecture Documents. A longer-range goal is to see whether this Web 2.0 Wiki could be used as an Enterprise Metadata Management and Application Development Tool (e.g., data and metadata mashups).
Note: Web 2.0 supports DRM 2.0, and Web 3.0 supports both DRM 2.0 and DRM 3.0!

4 Footnotes
(1) FEA DRM 2.0 and Report to Congress (2005).
(2) February 6, 2007, and February 5, 2008.
(3) Combines Description and Context from DRM 2.0. See (2).
(4) The data and metadata are combined (see Brand Niemann).
(5) Information Architecture (topics and subtopics) and Data Architecture (data tables and data elements) are integrated. See Web 2.0 Wiki Pilot: Information Classifications.
(6) This specification defines the syntax and semantics of the SPARQL query language for RDF. SPARQL can be used to express queries across diverse data sources.
(7) EPA Data Architecture Enterprise Metadata.
(8) Video on data reuse in mashups that will revolutionize EPA data architecture, data management, and data reuse applications!
* Note: This also works with relational databases.
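Footnote (6) notes that SPARQL can express queries across diverse data sources. To illustrate the core idea, the sketch below evaluates a basic graph pattern (a conjunction of triple patterns whose variables start with `?`) over triples pooled from two sources. It is a toy in plain Python, not a SPARQL engine or any RDF library's API; the `ex:` terms, the sample triples, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to extend `binding` so that `pattern` matches `triple`."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if p in result and result[p] != t:
                return None
            result[p] = t
        elif p != t:
            # Constant term that does not match.
            return None
    return result


def query(patterns: List[Triple], triples: Iterable[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern: every pattern must match, sharing variables."""
    pool = list(triples)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in pool:
                extended = match_triple(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Two hypothetical sources: a facility registry and a release inventory.
registry = [
    ("ex:facility1", "ex:locatedIn", "ex:Baltimore"),
    ("ex:facility2", "ex:locatedIn", "ex:Annapolis"),
]
releases = [
    ("ex:facility1", "ex:reportsChemical", "ex:Lead"),
    ("ex:facility2", "ex:reportsChemical", "ex:Mercury"),
]

# Roughly: SELECT ?f ?c WHERE { ?f ex:locatedIn ex:Baltimore . ?f ex:reportsChemical ?c }
results = query(
    [("?f", "ex:locatedIn", "ex:Baltimore"), ("?f", "ex:reportsChemical", "?c")],
    registry + releases,
)
print(results)  # [{'?f': 'ex:facility1', '?c': 'ex:Lead'}]
```

The shared variable `?f` is what joins facts from the two sources; a real deployment would load RDF into a triple store and issue the commented SELECT query directly.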

5 Vision and Implementation (password required to see)

6 Vision and Implementation
The EPA Data Architecture Metadata Community of Interest (CoI) is working to integrate the following metadata sources for information sharing and integration across the enterprise and the world.
Type | Standard | Current Repository | Future Repository/Collaboration Tool
Web | Dublin Core | WebCMS | Web 2.0 Wiki (1)
Data Elements | ISO | Environmental Data Registry (EDR) | Web 2.0 Wiki
EPA Applications and Databases | Like Dublin Core | Registry of EPA Applications and Databases (READ) | Web 2.0 Wiki
Science Portals | Like Dublin Core | Environmental Information Management System (EIMS) | Web 2.0 Wiki
Geospatial | Geospatial Metadata Standard | GeoData Gateway | Web 2.0 Wiki
Indicators | Peer Review Process | Report on the Environment 2008 | Web 2.0 Wiki
(1) Web 2.0 Wiki pages are XML-based and have RSS feeds!
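Footnote (1) says the wiki pages are XML-based and expose RSS feeds. A minimal sketch, assuming a metadata record is a dictionary holding a few Dublin Core-style fields, of how such records could be published as an RSS 2.0 feed with Python's standard `xml.etree.ElementTree`. The Dublin Core namespace URI is the published one; the feed title, URLs, and record values are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Dublin Core element namespace, serialized with the "dc" prefix.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def metadata_to_rss(channel_title: str, channel_link: str, records: List[Dict[str, str]]) -> str:
    """Build an RSS 2.0 feed whose items carry Dublin Core metadata fields."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Metadata records"
    for rec in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = rec["title"]
        ET.SubElement(item, "link").text = rec["link"]
        # Written out as dc:creator and dc:subject in the Dublin Core namespace.
        ET.SubElement(item, f"{{{DC_NS}}}creator").text = rec["creator"]
        ET.SubElement(item, f"{{{DC_NS}}}subject").text = rec["subject"]
    return ET.tostring(rss, encoding="unicode")


feed = metadata_to_rss(
    "Metadata Wiki (hypothetical)",
    "https://wiki.example.org/",
    [{
        "title": "Indicators page",
        "link": "https://wiki.example.org/Indicators",
        "creator": "Data Architecture CoI",
        "subject": "Indicators",
    }],
)
print(feed)
```

Because each item is plain XML, a subscribing application can read both the page link and its metadata fields without screen-scraping the page itself.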

7 Web 2.0 Source: Mills Davis, Four Stages of the Web at

8 Web 2.0
Some basic functionalities:
–Authoring like Word
–Edit/comment on every page
–Some level of security for every page
–Tagging
–Versioning
–Watchlist
–RSS/XML between applications
–Search
–etc.
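The versioning, tagging, and watchlist functions listed above can be modeled very simply. The sketch below is a hypothetical in-memory model (`WikiPage` and `pages_with_tag` are illustrative names, not any wiki product's API): each edit appends a new version instead of overwriting, and every user on the page's watchlist is notified.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

Notify = Callable[[str, str], None]


@dataclass
class WikiPage:
    """Hypothetical wiki page model: version history, tags, and a watchlist."""
    title: str
    versions: List[str] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)
    watchers: Set[str] = field(default_factory=set)

    def edit(self, new_text: str, notify: Notify) -> int:
        """Save a new version (older versions are kept) and notify every watcher."""
        self.versions.append(new_text)
        version = len(self.versions)
        for user in sorted(self.watchers):
            notify(user, f"{self.title} changed to version {version}")
        return version

    def current(self) -> str:
        """Return the latest version, or an empty string for a new page."""
        return self.versions[-1] if self.versions else ""


def pages_with_tag(pages: List[WikiPage], tag: str) -> List[str]:
    """Simple tag lookup across pages."""
    return [p.title for p in pages if tag in p.tags]


messages: List[str] = []


def log(user: str, msg: str) -> None:
    """Stand-in notifier that records messages instead of sending them."""
    messages.append(f"{user}: {msg}")


page = WikiPage("Data Element Registry", tags={"metadata", "drm"})
page.watchers.add("analyst")
page.edit("First draft", log)
page.edit("Revised draft", log)
print(page.current())
print(messages[-1])
```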

9 Overview of Metadata Strategy
Direction from July 2007 Meeting:
–“Enable to share” means enabling EPA to share data within programs, across programs, with partners, and with the public.
Purpose and General Approach:
–Phase 1 (through April 14, 2008): objects include DBMS Data Sets, Unstructured Data ( , docs), Multimedia, etc.
Proposed Metadata Framework for Data “Objects”:
–Coverage is incomplete (see slide 10).
Federated Registries with a Common Front End Search Tool:
–Conceptual Architecture Using Faceted Search (see slide 11).
Governance Artifacts to Implement this Framework:
–A National Data Policy Modeled after NGD.

10 Metadata Framework for Discovery & Evaluation
Categories of metadata help the user assess the value of the data set. Standard taxonomies aid discovery; these might be specific to broad categories such as “Admin./Financial”. The EPA Data Classification is a start. Levels of metadata exist within an RDBMS data set, especially for evaluating quality and security issues.

11 Conceptual Federated Search Architecture
The major gap is for RDBMS data sets not managed by Informatica.
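Slides 9 and 11 describe federated registries behind a common front-end search tool. One way to sketch that pattern, assuming each registry can be wrapped as a function that takes a search term and returns records, is to fan the query out in parallel and merge the results with their source attached. The registry names reuse names from slide 6, but the in-memory contents, the `offline-node`, and all function names are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]
Registry = Callable[[str], List[Record]]


def federated_search(registries: Dict[str, Registry], term: str) -> List[Record]:
    """Send one query to every registry in parallel and merge the results."""
    merged: List[Record] = []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(search, term) for name, search in registries.items()}
        for name, future in futures.items():
            try:
                hits = future.result(timeout=5)
            except Exception:
                # An unavailable node reduces coverage but does not break the search.
                continue
            for hit in hits:
                merged.append({**hit, "source": name})
    return sorted(merged, key=lambda r: (r["title"], r["source"]))


def make_registry(titles: List[str]) -> Registry:
    """Hypothetical in-memory registry doing a case-insensitive substring match."""
    def search(term: str) -> List[Record]:
        return [{"title": t} for t in titles if term.lower() in t.lower()]
    return search


def offline(term: str) -> List[Record]:
    raise ConnectionError("node unavailable")


registries: Dict[str, Registry] = {
    "EDR": make_registry(["Facility ID data element", "Chemical name data element"]),
    "READ": make_registry(["TRI reporting database", "Facility registry application"]),
    "offline-node": offline,
}
results = federated_search(registries, "facility")
for r in results:
    print(r["source"], "|", r["title"])
```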

12 Demonstrations
–Federated
–Faceted
–Semantic Search
–Data
–Metadata
–Governance
–DRM 2.0 Compliance
–Information Architecture and Data Architecture
–DRM 3.0/Web 3.0
–Discovery (Centrifuge) (TRI data pilot slides coming)

13 Federated See Multiple Nodes on the Same or Different Web Servers.

14 Faceted See Hierarchy of Topics, Subtopics, etc. That Can be Searched.
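Faceted search over a topic hierarchy comes down to two operations: counting how many items fall under each facet value, and filtering (drilling down) by a selected value. A minimal sketch with made-up documents; the topic and subtopic labels and the function names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional

Doc = Dict[str, str]


def facet_counts(docs: List[Doc], facet: str) -> Dict[str, int]:
    """Count how many documents fall under each value of a facet."""
    return dict(Counter(d[facet] for d in docs))


def drill_down(docs: List[Doc], topic: str, subtopic: Optional[str] = None) -> List[Doc]:
    """Filter by a top-level topic and, optionally, one of its subtopics."""
    return [
        d for d in docs
        if d["topic"] == topic and (subtopic is None or d["subtopic"] == subtopic)
    ]


# Hypothetical documents tagged with a two-level topic hierarchy.
docs = [
    {"title": "Ozone trends", "topic": "Air", "subtopic": "Outdoor Air Quality"},
    {"title": "Radon exposure", "topic": "Air", "subtopic": "Indoor Air Quality"},
    {"title": "Lake nutrients", "topic": "Water", "subtopic": "Fresh Surface Waters"},
]
print(facet_counts(docs, "topic"))       # {'Air': 2, 'Water': 1}
air = drill_down(docs, "Air")
print(facet_counts(air, "subtopic"))     # {'Outdoor Air Quality': 1, 'Indoor Air Quality': 1}
```

Recomputing the counts on the filtered set is what lets the user see, at each level, how many items lie under the next level down.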

15 Semantic Search See Query Within Context and With Various Semantic Operators.

16 Data Screen-scrape This Table and Copy It to Excel and the Structure is Preserved.
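Slide 16 relies on HTML tables keeping their row and column structure when pasted into a spreadsheet. The same result can be obtained programmatically: the sketch below uses Python's standard `html.parser` to collect table cells row by row and `csv` to write output that Excel can open. The sample table and its values are made up, and the parser assumes well-formed `<tr>`, `<th>`, and `<td>` tags without nested tables.

```python
import csv
import io
from html.parser import HTMLParser
from typing import List


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._row.append("".join(self._cell).strip())
            self._in_cell = False
        elif tag == "tr":
            self.rows.append(self._row)

    def handle_data(self, data):
        # Ignore whitespace between tags; keep only text inside cells.
        if self._in_cell:
            self._cell.append(data)


def html_table_to_csv(html: str) -> str:
    """Convert the rows of an HTML table to CSV text (commas are quoted)."""
    parser = TableParser()
    parser.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer).writerows(parser.rows)
    return buffer.getvalue()


html = """
<table>
  <tr><th>Chemical</th><th>Facilities</th></tr>
  <tr><td>Lead</td><td>12</td></tr>
  <tr><td>Mercury, elemental</td><td>3</td></tr>
</table>
"""
output = html_table_to_csv(html)
print(output)
```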

17 Metadata This is the Highest Quality-Peer Reviewed Metadata the Agency Has Produced.

18 Taxonomy This Taxonomy Was Produced by Subject Matter Experts and Peer Reviewed.

19 Governance The Words Governance and Provenance Have Both Been Used.

20 DRM 2.0 Compliance The Three Requirements for Information Sharing Have Been Satisfied!

21 Information Architecture and Data Architecture
Level 1: Top-level Topics
Level 2: Next-level Subtopics
Level 3: Data Tables
Level 4: Data Elements
See: Getting to Web Semantics for Government Spreadsheets Pilot (RDF/SPARQL) – http://semanticommunity.wik.is/People/Brand_Niemann/2008_Semantic_Technology_Conference
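The four levels above form a simple containment hierarchy, which maps naturally onto the subject-predicate-object statements used in the RDF/SPARQL pilot. The sketch flattens a nested dictionary into plain tuples; the predicate names `hasSubtopic`, `hasTable`, and `hasElement` and the sample entries are hypothetical, and real RDF would use URIs rather than bare strings.

```python
from typing import Dict, List, Tuple

# Level 1 topic -> Level 2 subtopic -> Level 3 data table -> Level 4 data elements
Hierarchy = Dict[str, Dict[str, Dict[str, List[str]]]]


def hierarchy_to_triples(h: Hierarchy) -> List[Tuple[str, str, str]]:
    """Flatten the four-level hierarchy into subject-predicate-object triples."""
    triples: List[Tuple[str, str, str]] = []
    for topic, subtopics in h.items():
        for subtopic, tables in subtopics.items():
            triples.append((topic, "hasSubtopic", subtopic))
            for table, elements in tables.items():
                triples.append((subtopic, "hasTable", table))
                for element in elements:
                    triples.append((table, "hasElement", element))
    return triples


example: Hierarchy = {
    "Air": {
        "Outdoor Air Quality": {
            "Ozone Concentrations": ["Year", "Site", "Concentration (ppm)"],
        },
    },
}
triples = hierarchy_to_triples(example)
for t in triples:
    print(t)
```

Once the information architecture (topics) and the data architecture (tables and elements) are expressed as the same kind of statement, a single query can move from a topic down to individual data elements.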

22 DRM 3.0/Web 3.0 Source: Mills Davis, Four Stages of the Web at

23 Discovery (Centrifuge)
Centrifuge Systems is a leading provider of next-generation business intelligence software that helps organizations discover insights, patterns and relationships hidden in their data. The unique Centrifuge approach allows users to ask open-ended questions of their data by interacting directly with visual representations of the data. Traditional business intelligence solutions require users to define what they want to see in advance and present the results in static dashboards. With Centrifuge, users determine what is of interest “on the fly”, then manipulate the displays directly in a highly interactive fashion. The experience is refreshingly easy to use, and the resulting insights can be extraordinary. Centrifuge is used in some of the most demanding applications in the world, including law enforcement, counter-terrorism and homeland defense, to help analysts move from data to discovery.

24 Centrifuge Server
Centrifuge provides an interactive visualization layer on data such as the Toxics Release Inventory-Made Easy for the Web (TRI-ME WEB). Data can be viewed through a desktop client or a web browser; the sample data here is shown through a web browser. As a next-generation information visualization system, Centrifuge Server meets the following requirements:
–Ground-Breaking Interactive Visualization in a Browser
–A 100% browser-based thin client
–Collaborative Analysis
–Modern SOA Architecture
–Geospatial Integration with Google™ Earth
–Pluggable, Componentized and Extensible
–Easy to Use

25 Table View This view represents a sample table from the TRI-ME WEB dataset downloaded from the EPA website.

26 Relationship Graph This relational view represents a bundled graph of MD (Maryland) 2006 data showing companies linked to chemicals. The graph shows that two of the primary collections, PBT and TRI, share multiple companies. The chemicals have been bundled (grouped) by their chemical classifications.
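The bundling described on this slide can be sketched as grouping company-to-chemical edges by each chemical's classification, after which the companies shared by two collections are a set intersection. This illustrates the idea only and is not Centrifuge's implementation; the companies, chemicals, and classification labels are made up.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def bundle(edges: List[Tuple[str, str]], classification: Dict[str, str]) -> Dict[str, Set[str]]:
    """Collapse company->chemical edges into classification->companies bundles."""
    bundles: Dict[str, Set[str]] = defaultdict(set)
    for company, chemical in edges:
        bundles[classification[chemical]].add(company)
    return dict(bundles)


# Hypothetical company-to-chemical links and chemical classifications.
edges = [
    ("Company A", "Lead"), ("Company A", "Toluene"),
    ("Company B", "Mercury"), ("Company B", "Xylene"),
    ("Company C", "Toluene"),
]
classification = {"Lead": "PBT", "Mercury": "PBT", "Toluene": "TRI", "Xylene": "TRI"}

bundles = bundle(edges, classification)
shared = bundles["PBT"] & bundles["TRI"]
print(sorted(shared))  # ['Company A', 'Company B']
```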

27 Relationship Graph Spinoff A subset for specific chemicals of interest can be created (spun off). In this case, PBT chemicals are shown bundled and connected to the companies associated with them.

28 Table Spinoff The spinoff concept applies to all views, for example the table view shown on this page.
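A spinoff can be thought of as a filtered copy of the underlying rows that any view (table, graph, chart) can then render. A minimal sketch with hypothetical rows and a hypothetical `spinoff` helper; copying each row keeps the parent dataset unchanged if the subset is edited.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def spinoff(rows: List[Row], keep: Callable[[Row], bool]) -> List[Row]:
    """Return copies of the rows that satisfy `keep`; the parent rows are untouched."""
    return [dict(r) for r in rows if keep(r)]


# Hypothetical parent dataset.
rows = [
    {"company": "Company A", "chemical": "Lead", "class": "PBT"},
    {"company": "Company A", "chemical": "Toluene", "class": "TRI"},
    {"company": "Company B", "chemical": "Mercury", "class": "PBT"},
]
pbt_only = spinoff(rows, lambda r: r["class"] == "PBT")
print([r["chemical"] for r in pbt_only])  # ['Lead', 'Mercury']
```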

29 Relationship Graph of Table Spinoff This relational view represents a graph of the previous table spinoff.

30 Quantitative (Charts) View This quantitative view represents a simple distribution of the number of times a chemical is referenced across all companies and facilities in MD.
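The distribution on this slide is a frequency count of chemical references. A short sketch using `collections.Counter`, with made-up (company, facility, chemical) records and a text bar chart standing in for the graphical view.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical (company, facility, chemical) records; the values are made up.
records: List[Tuple[str, str, str]] = [
    ("Company A", "Plant 1", "Lead"),
    ("Company A", "Plant 2", "Lead"),
    ("Company B", "Plant 3", "Lead"),
    ("Company B", "Plant 3", "Mercury"),
    ("Company C", "Plant 4", "Toluene"),
]


def chemical_distribution(rows: List[Tuple[str, str, str]]) -> Counter:
    """Count how many times each chemical is referenced across all records."""
    return Counter(chemical for _company, _facility, chemical in rows)


dist = chemical_distribution(records)
for chemical, count in dist.most_common():
    # Text bar chart: one '#' per reference.
    print(f"{chemical:<10}{'#' * count} {count}")
```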

31 Timeline (Temporal) View This temporal view of sample data shows how time-based data can be viewed. For example, this could represent toxic release events if that data were available and time-stamped.

32 Geospatial View This geospatial view represents a spatial distribution of facilities across Maryland.

33 Detailed Geospatial View This geospatial view represents the locations of toxic chemicals in Baltimore, Maryland.

34 DRM 3.0/Web 3.0