1 Federal Sitemaps: An XML-Based Standard for Searching the Invisible Web Presentation at the XML CoP Meeting Mills Davis and Brand Niemann, SICoP Co-Chairs,

Presentation transcript:

1 Federal Sitemaps: An XML-Based Standard for Searching the Invisible Web
Presentation at the XML CoP Meeting
Mills Davis and Brand Niemann, SICoP Co-Chairs, and JL Needham, Google
January 29, 2007
Part of Building DRM 3.0 and Web 3.0 for Managing Context Across Multiple Documents and Organizations

2 Overview
1. History and Wiki Page
2. EPA Experience
3. Proposed Pilot
4. Schedule
5. Questions and Answers

3 1. History and Wiki Page
The Sitemap protocol is an open, XML-based standard for managing search engine crawling. The protocol provides website owners a means of communicating to search engines the location, priority, change frequency, and last modification date of all pages on a website or web-accessible database, which can ensure complete and efficient crawling of the site's contents.
The Sitemap protocol was introduced by Google in June 2005 under a Creative Commons License and was adopted in November 2006 as an industry standard by Google, Microsoft and Yahoo.
–See SearchEngineWatch - Search Engines Unite On Unified Sitemaps System, November 16,
FederalSitemaps is an initiative to help federal agencies make their websites more accessible to search engine users through sitemapping.
–See recent presentation to OMB and SICoP. See

4 2. EPA Experience
Sitemaps augments, but does not replace, regular crawling.
Sitemaps is focused on exposing the contents of databases, which estimates suggest may be as much as 90% of Web content.
The current Sitemaps protocol is the “lowest-common-denominator” approach (see next slide).
In EPA’s new template, we're including the Dublin Core fields that make us consistent with the eGov Act of 2002 and the OMB guidance pursuant to it (see slide 6).
I will meet with the Searchmasters and discuss how we might alter our existing "jump pages" to conform to the Sitemap protocol, or alter our jump-page creation process to also create Sitemaps.
Source: John Shirey, Notes on Federal Sitemaps Discussion, January 10-11, 2007.

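5 2. EPA Experience
Sitemap XML example (markup lost in extraction): a urlset element containing one url entry, with loc http://, a change frequency of monthly, and a priority of 0.8.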
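The Sitemap entry on this slide (a urlset holding a url with a loc, a monthly change frequency, and a priority of 0.8) could be generated along the following lines. This is a minimal sketch, not EPA's or Google's tooling: the function name `build_sitemap`, the example URL, and the `lastmod` date are hypothetical, and the namespace is the one published with version 0.9 of the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace published for the Sitemap protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a Sitemap XML document (as a string) for a list of page entries.

    Each entry is a dict with a required "loc" key and optional
    "lastmod", "changefreq", and "priority" keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child; the others are optional hints.
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page entry; the URL and date are placeholders.
example = build_sitemap([
    {
        "loc": "http://www.example.com/reports/1",
        "lastmod": "2007-01-10",
        "changefreq": "monthly",
        "priority": 0.8,
    },
])
print(example)
```

Because each entry is just a dict, the same function could be fed from a database query, which is the "jump page" scenario discussed on the previous slide.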

6 2. EPA Experience
Page Title | Area Name | US EPA
Source: John Shirey, New EPA Basic Template, January 8, 2007.
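Only the title pattern of the template survives here, so the Dublin Core fields it carried are not known. As a rough illustration of how such fields are commonly embedded in an HTML head, the sketch below follows the DCMI convention of a `schema.DC` link plus `DC.`-prefixed meta names. The function name `dublin_core_meta` and all field values are hypothetical and are not taken from the EPA template.

```python
from html import escape


def dublin_core_meta(fields):
    """Render a dict of Dublin Core element names and values as HTML tags."""
    # The link element declares what the "DC." prefix refers to.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, value in fields.items():
        lines.append(
            '<meta name="DC.%s" content="%s">'
            % (escape(name, quote=True), escape(value, quote=True))
        )
    return "\n".join(lines)


# Placeholder values that only mirror the title pattern shown on the slide.
head_tags = dublin_core_meta({
    "title": "Page Title | Area Name | US EPA",
    "language": "en",
    "type": "Text",
})
print(head_tags)
```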

7 2. EPA Experience
“Sitemaps as a method for discovering database content is something that I heartily endorse. It makes sense, and it's good to have a data standard for doing it. Google et al. are to be commended for that. Too bad it's such a minimalist protocol! As we work to expose database contents to our internal search engine, we will keep in mind the need to express that content in a Sitemap protocol as well. EIMS is our first target database, hopefully tackling it this spring.”
Source: John Shirey, Notes on Federal Sitemaps Discussion, January 10, 2007.

8 3. Proposed Pilots
Microformats:
–Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards.
–Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns (e.g. XHTML, blogging).
See

9 3. Proposed Pilots
Overview of microformats:
–People and Organizations: hCard
–Calendars and Events: hCalendar
–Opinions, Ratings and Reviews: VoteLinks, hReview
–Social Networks: XFN
–Licenses: rel-license
–Tags, Keywords, Categories: rel-tag
–Lists and Outlines: XOXO
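To make the "humans first, machines second" idea concrete, here is a small sketch of reading an hCard out of ordinary HTML using only the Python standard library. The class names `vcard`, `fn`, and `org` come from the hCard microformat; the `HCardParser` class and the sample markup are hypothetical, and the parser is deliberately simplified (it does not handle void elements such as `<br>` or markup nested inside a property).

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Collect hCard properties from elements inside class="vcard"."""

    PROPERTIES = {"fn", "org"}

    def __init__(self):
        super().__init__()
        self.cards = []       # one dict per vcard found
        self._depth = 0       # element depth inside the current vcard (0 = outside)
        self._current = None  # property whose text is being read, if any

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if self._depth:
            self._depth += 1
            hit = classes & self.PROPERTIES
            if hit:
                self._current = sorted(hit)[0]
        elif "vcard" in classes:
            self._depth = 1
            self.cards.append({})

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._depth and self._current and text:
            card = self.cards[-1]
            card[self._current] = card.get(self._current, "") + text


# Sample markup: readable as plain HTML, yet machine-extractable.
sample_html = (
    '<div class="vcard">'
    '<span class="fn">Brand Niemann</span>, '
    '<span class="org">US EPA</span>'
    "</div>"
)
parser = HCardParser()
parser.feed(sample_html)
cards = parser.cards
print(cards)
```

The page still renders as a normal line of text in a browser; the class attributes are the only addition needed for a crawler to recover the name and organization.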

10 3. Proposed Pilots
Pilot Outline:
–Tools, Collaboration, and Content (details to be added by Mills Davis) based on Kapow Technical Roundtable, January 9, 2007, and other meetings.
Example: Gleaning Resource Descriptions from Dialects of Languages (GRDDL), i.e. gleaning RDF from XML, as it relates to the Google Sitemaps, Semantic Wikis and XForms Use Cases (see next slide):
– sop.inria.fr/acacia/personnel/Fabien.Gandon/tmp/grddl/rdfaprimer/PrimerRDFaSection.html

11 3. Proposed Pilots
In this example the focus is on automating the construction of indexes. The idea is to crawl GRDDL source documents and extract embedded RDFa to feed an RDF store. SPARQL queries are then run against this store and the results rendered as web pages to automatically generate up-to-date indexes.
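The crawl, extract, store, query, and render pipeline described on this slide can be sketched in a few lines. This is only an illustration under heavy simplifying assumptions: a Python list of tuples stands in for the RDF store, a filter function stands in for a SPARQL query, the GRDDL transformation step is skipped, and only the RDFa `about` and `property` attributes are read. All names and sample documents are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class RDFaExtractor(HTMLParser):
    """Tiny RDFa subset: `about` sets the subject, `property` plus the
    element's text gives the predicate and object."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subject = None
        self._property = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self._subject = attrs["about"]
        self._property = attrs.get("property")

    def handle_endtag(self, tag):
        self._property = None

    def handle_data(self, data):
        text = data.strip()
        if self._subject and self._property and text:
            self.triples.append((self._subject, self._property, text))


def crawl(documents):
    """Extract triples from each document into one in-memory store."""
    store = []
    for html_text in documents:
        extractor = RDFaExtractor()
        extractor.feed(html_text)
        store.extend(extractor.triples)
    return store


def query(store, predicate):
    """Stand-in for: SELECT ?s ?o WHERE { ?s <predicate> ?o }."""
    return sorted((s, o) for s, p, o in store if p == predicate)


def render_index(rows):
    """Render query results as an HTML list of links."""
    items = "\n".join(
        '<li><a href="%s">%s</a></li>' % (escape(s, quote=True), escape(o))
        for s, o in rows
    )
    return "<ul>\n%s\n</ul>" % items


# Hypothetical source documents with embedded RDFa.
docs = [
    '<div about="http://example.org/doc/2"><span property="dc:title">Sitemaps</span></div>',
    '<div about="http://example.org/doc/1"><span property="dc:title">Microformats</span></div>',
]
store = crawl(docs)
index_html = render_index(query(store, "dc:title"))
print(index_html)
```

Re-running the crawl regenerates the index from whatever the source pages currently say, which is the "up-to-date indexes" property the slide is after.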

12 3. Proposed Pilots
Semantic SOA Meeting, January 18, 2007:
–Location: Kapow Technologies, Reston, VA
–Organizations: Paremus and SAIC have worked with GITI, SoftPro, Agent Logic, Siderian, and Kapow to deploy the SSOA fabric and begin wrapping software for deployment on the fabric.
–Focus: Use cases (2) and goals for DoDIIS.
–Contacts: Sam Chance and Brand Niemann, Jr. (SAIC) and Mills Davis (Project10x).

13 4. Schedule
January 17, 2007, XML CoP:
–About 15 minutes to introduce the protocol, and the effort we're undertaking to encourage its adoption, to ~25 XML experts and advocates across federal agencies. May discuss the white paper idea and the prospective conference on the protocol in coming months.
January 29, 2007, EPA:
–One-hour presentation to EPA web managers on how to implement the protocol to open EPA sites now closed to search engine crawlers. This is an opportunity to observe how we approach discussions with a major federal agency.
February 15, 2007, Web Content Managers Forum (tentative):
–Conference call involving Forum participants in which Google, SICoP, and representatives of NCES, OSTI, and PlainLanguage.gov will discuss in detail various approaches to opening flat file, fielded and other dynamic databases to crawling with the protocol.
March 20-22, 2007, FOSE 2007:
–Possible panel slot for an introduction of the protocol in the FIRM Forum at FOSE. Also requesting three one-hour tutorial sessions, like last year's on implementing DRM 2.0.
April-May 2007, Federal Sitemaps Conference/Workshop:
–A conference dedicated to the protocol, based on further discussions of the audience it would target and who would contribute. Vint Cerf, Google Vice President and Chief Internet Evangelist, who is following progress of this effort, would probably keynote.

14 4. Schedule
January 29th Meeting at EPA on Building DRM 3.0 and Web 3.0:
–Sitemaps: JL Needham, Google
–SICoP Special Conference, February 6, 2007, and Pilot (see slide 15): Mills Davis, SICoP Co-Chair
–General Discussion: Brand Niemann, EPA Enterprise Architecture Team

15 4. Schedule
SICoP Special Conference, February 6, 2007, and Pilot:
–Building DRM 3.0 and Web 3.0 for Managing Context Across Multiple Documents and Organizations: bin/wiki.pl?SICoPSpecialConference_2007_02_06
–Pilot: Tools, Collaboration, and Content (details to be added by Mills Davis) based on Kapow Technical Roundtable, January 9, 2007, and other meetings.

16 5. Questions and Answers
John Lewis (JL) Needham
–Strategic Partner Development Manager, Google, Inc.
Mills Davis
–Project10x and SICoP Co-Chair
Brand Niemann
–EPA Enterprise Architecture Team and SICoP Co-Chair