U.S. Geological Survey National Biological Information Infrastructure Technical Overview: NBII Metadata Clearinghouse May 2008 Mike Frame.

Presentation transcript:

U.S. Geological Survey National Biological Information Infrastructure Technical Overview: NBII Metadata Clearinghouse May 2008 Mike Frame

Topics for discussion
- Metadata CH Background
- New Metadata CH Design & Demo
- Underlying Architecture

Services Overview

NBII Metadata Resources

Metadata Resources:
- FGDC Metadata Program
- Tool reviews
- Training Opportunities
- Resources for using the Standard
- NBII Clearinghouse

Some basic metadata facts… about the FGDC Standard
7 Sections make up the FGDC Standard:
1. Identification Information
2. Data Quality Information
3. Spatial Data Organization Information
4. Spatial Reference Information
5. Entity and Attribute Information
6. Distribution Information
7. Metadata Reference Information

NBII Metadata CH

Rationale for Metadata CH Redesign
- User Feedback
- Metadata creation
- Metadata management
- Metadata integration with data
- Open architecture framework
- Speed and Reliability
- Data quality
- Data visualization
- License Costs

NBII Metadata CH provides:
- Single portal to information contained in disparate data management systems
- Free text, fielded, spatial, and temporal search capabilities
- Allows individuals and database managers to distribute their data while maintaining complete control and ownership
- Leverages investment in existing information systems and research
- NBII is part of the Mercury Consortium, operated by ORNL

NBII CH: New Functionalities
- Rich Client Interface
- Combined search results (status page)
- Filtering search results (Facet)
- Dynamic sorting of search results
- Bookmark brief and full metadata pages
- Based on open source technologies: Lucene, Solr
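As an illustration of how facet filtering and dynamic sorting can map onto a Solr-backed search, the sketch below builds a Solr select URL. The parameters q, fq, sort, facet and facet.field are standard Solr query parameters. The endpoint URL and the field names (partner, theme_keyword, title) are assumptions made for this example; the slides do not show the Clearinghouse's actual schema or requests.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint; the slides do not give the real one.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

def build_search_url(text, facet_filters=None, sort_field=None, descending=False, rows=10):
    """Return a Solr select URL for a free-text query with optional facet filters and sorting."""
    params = [("q", text), ("rows", str(rows)), ("facet", "true")]
    # Ask Solr for value counts on the fields shown as facets (assumed field names).
    for field in ("partner", "theme_keyword"):
        params.append(("facet.field", field))
    # Each facet value the user selects narrows the results through a filter query (fq).
    for field, value in (facet_filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    # Re-sorting only changes the sort parameter; the query itself stays the same.
    if sort_field:
        params.append(("sort", f"{sort_field} {'desc' if descending else 'asc'}"))
    return SOLR_SELECT_URL + "?" + urlencode(params)

url = build_search_url("wetland birds", {"partner": "USGS"}, sort_field="title")
query_params = parse_qs(urlsplit(url).query)
print(url)
print(query_params["fq"], query_params["sort"])
```

Keeping the user's text in q and the selected facet values in fq means a facet click or a re-sort can reuse the same base query.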

NBII CH New Functionalities (cont.)
- SOA based design
- Web services
- RSS services for search results
- Portlet support
- Search Sharing support
- Thesaurus Support
- Seamless data ordering/data extraction with various data partners
- Seamless data visualization integration with external visualization tools
- Improved User Statistics Collection
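An RSS service for search results can be pictured as a function that wraps a result list in an RSS 2.0 channel. The sketch below does this with Python's standard library. The record fields, titles and URLs are made up for the example and are not taken from the Clearinghouse.

```python
import xml.etree.ElementTree as ET

def results_to_rss(query, results, feed_link):
    """Build an RSS 2.0 document whose items are the given search results."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = f"Search results for: {query}"
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = f"{len(results)} matching metadata records"
    for record in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = record["title"]
        ET.SubElement(item, "link").text = record["link"]
        ET.SubElement(item, "description").text = record["abstract"]
    return ET.tostring(rss, encoding="unicode")

# Hypothetical search results.
sample_results = [
    {"title": "Example Amphibian Survey", "link": "http://example.org/record/1",
     "abstract": "Counts of amphibians at sample sites."},
    {"title": "Example Wetland Inventory", "link": "http://example.org/record/2",
     "abstract": "Mapped wetland polygons."},
]
feed_xml = results_to_rss("amphibians", sample_results, "http://example.org/search?q=amphibians")
feed_root = ET.fromstring(feed_xml)
print(feed_xml)
```

A user who subscribes to such a feed sees new matching records as they are harvested, without re-running the search by hand.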

The NBII Clearinghouse
- The Clearinghouse is operated for NBII by the Oak Ridge National Laboratory
- Over 38,000 records
- 41 partners contributing metadata records
- Ability to search in a variety of ways
- Redesigned in 2008

NBII CH Demo NBII Clearinghouse interface:

How does the NBII Clearinghouse work?

Metadata CH RSS World Data Center

NBII Metadata Clearinghouse Architecture

Metadata CH Architecture
- CH Function of the NBII Metadata Program
- Operated by ORNL
- NBII is 1 Organization in Mercury Consortium
- Established relationship in 2001
- Formerly based on “Blue Angel Technologies”
- Currently based on Lucene/Solr Open Source Technologies

Distributed Data Discovery and Access System
1. Principal investigators create detailed metadata and data files using local applications or ORNL-OME
2. NBII Mercury collects metadata and key data from contributing agencies’ servers distributed around the country and builds a centralized index
3. Remote users query the index via a Web-based browser
4. Metadata summaries are returned to the remote users, including links back to detailed information and data at the PIs’ server or data repository
5. Remote users select links to data of interest
6. Highly detailed data and documentation are downloaded directly from the contributing agency
Diagram labels: Index; Users; Virtual Internet Database
Index fields: P.I. Name, Product Number, Product Title, Site, Subject Area, Thematic Area, Keywords, etc.
Example P.I. Summary – John Smith: Product A (Container 1; 10/12/2003, Container 2; 01/20/2002, Container 3; 07/05/2001), Product B (Container 1; 03/05/1999), ….
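The six steps above can be sketched as a small in-memory model: contributors hold their own records, a central index keeps only summaries, and each search hit carries a link back to the contributor's server. The class and attribute names, the sample records and the example.org-style URLs are all hypothetical. The real Mercury system is not described at this level of detail in the slides.

```python
from dataclasses import dataclass, field

@dataclass
class MetadataSummary:
    """Summary fields named on the slide, plus a link back to the source (assumed attribute names)."""
    pi_name: str
    product_number: str
    product_title: str
    keywords: list = field(default_factory=list)
    data_url: str = ""  # where the detailed data stays (steps 4 to 6)

class CentralIndex:
    """Toy stand-in for the centralized index built in step 2."""
    def __init__(self):
        self._records = []

    def harvest(self, contributor_records):
        # Step 2: copy summaries from one contributing agency into the index.
        self._records.extend(contributor_records)

    def search(self, term):
        # Steps 3 and 4: match the term against titles and keywords, return summaries.
        needle = term.lower()
        return [r for r in self._records
                if needle in r.product_title.lower()
                or any(needle == k.lower() for k in r.keywords)]

# Step 1: each agency maintains its own records (made-up examples).
agency_a = [MetadataSummary("John Smith", "A-1", "Soil moisture, site 1",
                            ["soil", "hydrology"], "http://agency-a.example/data/A-1")]
agency_b = [MetadataSummary("Jane Doe", "B-7", "Canopy height survey",
                            ["forest"], "http://agency-b.example/data/B-7")]

index = CentralIndex()
index.harvest(agency_a)
index.harvest(agency_b)

hits = index.search("soil")
for hit in hits:
    # Step 5: the user follows data_url; step 6 happens on the contributor's server.
    print(hit.product_title, "->", hit.data_url)
```

The central site stores only what is needed for discovery, so the detailed data never has to leave the contributing agency.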

A Virtual Aggregate Database
- Metadata exists in remote legacy databases using any platform, OS or RDBMS
- Databases can be of different structures and content
- Export programs are easily written and automated
- Metadata are extracted into XML files yielding standardized data objects
- These files can be remotely harvested via the Internet
- Harvested metadata are combined at the central site, transformed (if needed), and indexed
- Frequent, automated harvesting and complete rebuilding of the index keeps the aggregate database up to date
- Users work with a single, simple, web-like interface to access all data simultaneously
- No re-programming of existing systems required
- Business as usual for contributing databases
Diagram labels: Existing Database; Custom Export Program; Encrypted XML; Z39.50 or WS; Index
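A minimal sketch of the export-and-harvest idea, under stated assumptions: sqlite3 stands in for a contributor's existing database, a short function plays the custom export program, and the central site merges the exported XML into one lookup table. The table layout, element names and source labels are invented for the example. Encryption and the Z39.50 or web-service transport shown in the diagram are left out.

```python
import sqlite3
import xml.etree.ElementTree as ET

def make_contributor_db(titles):
    """Stand-in for an existing legacy database (hypothetical one-table schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datasets (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO datasets (title) VALUES (?)", [(t,) for t in titles])
    return conn

def export_to_xml(conn, source_name):
    """Custom export program: dump the contributor's records as a small XML document."""
    root = ET.Element("records", source=source_name)
    for rec_id, title in conn.execute("SELECT id, title FROM datasets ORDER BY id"):
        rec = ET.SubElement(root, "record", id=str(rec_id))
        ET.SubElement(rec, "title").text = title
    return ET.tostring(root, encoding="unicode")

def harvest(xml_documents):
    """Central site: parse every exported file and rebuild the aggregate index from scratch."""
    index = {}
    for doc in xml_documents:
        root = ET.fromstring(doc)
        source = root.get("source")
        for rec in root.findall("record"):
            # Key on (source, id) so records from different contributors never collide.
            index[(source, rec.get("id"))] = rec.findtext("title")
    return index

db_a = make_contributor_db(["Stream temperature 2001", "Fish counts 2002"])
db_b = make_contributor_db(["Forest plots 1999"])
exports = [export_to_xml(db_a, "agency_a"), export_to_xml(db_b, "agency_b")]
aggregate = harvest(exports)
print(aggregate)
```

Both toy databases share a schema here only for brevity. In the design on the slide, each contributor writes its own export program, so the source databases can differ as long as the exported XML is consistent.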

NBII CH Design Diagram
Diagram components:
- External Metadata (http, ftp, web crawl)
- NBII CH Harvester
- FGDC-BIO Transformed Files
- MySQL: Mercury3_harvests_nbii
- DB updater tool (custom Java)
- Solr Indexer tool (custom Java); XML Beans to extract the contents
- Solr Schema for defining the fields
- Index metadata records
- SOLR Search Server; Extended Lucene Index
- UI: Solr Searcher (custom Java Spring)
- Web Service; RSS; Portlets
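The indexing step in the diagram (extract contents from FGDC files, then index them in Solr) can be illustrated as follows. The actual indexer is described as custom Java using XML Beans; this Python sketch only shows the shape of the transformation. The source element names (idinfo, citation, citeinfo, origin, title, descript, abstract, keywords, theme, themekey) are the FGDC CSDGM short names, and the add/doc/field output is Solr's XML update format. The Solr field names and the sample record are assumptions.

```python
import xml.etree.ElementTree as ET

# Made-up FGDC-style record using CSDGM short element names.
SAMPLE_FGDC = """<metadata>
  <idinfo>
    <citation><citeinfo>
      <origin>Example Agency</origin>
      <title>Example Amphibian Survey</title>
    </citeinfo></citation>
    <descript><abstract>Counts of amphibians at sample sites.</abstract></descript>
    <keywords><theme>
      <themekey>amphibians</themekey>
      <themekey>wetlands</themekey>
    </theme></keywords>
  </idinfo>
</metadata>"""

def fgdc_to_solr_add(record_id, fgdc_xml):
    """Extract a few fields from an FGDC record and wrap them in a Solr <add> message."""
    src = ET.fromstring(fgdc_xml)
    # Target field names are hypothetical; a real schema would define them.
    fields = [
        ("id", record_id),
        ("title", src.findtext("idinfo/citation/citeinfo/title", default="")),
        ("originator", src.findtext("idinfo/citation/citeinfo/origin", default="")),
        ("abstract", src.findtext("idinfo/descript/abstract", default="")),
    ]
    # Multi-valued field: one <field> element per theme keyword.
    fields += [("keyword", k.text) for k in src.findall("idinfo/keywords/theme/themekey")]

    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields:
        ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")

add_message = fgdc_to_solr_add("rec-1", SAMPLE_FGDC)
add_root = ET.fromstring(add_message)
print(add_message)
```

Posting such a message to Solr's update handler, followed by a commit, would make the record searchable. That network step is omitted so the sketch runs without a server.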

Future Development
Phase II (May 2008 to September 2008):
- Harvester engine to use open source tools (Remove COTS) (Phase I & II)
- Portal integration through JSR-168 Portlet standard: search portlets, portlets for recent datasets, top most searched words, etc.
- Web service implementation (Phase I & II):
  - Thesaurus support (semantic web integration support)
  - Gazetteer web service implementation
  - OGC Catalog Service (include Web Mapping/Coverage/Feature Servers in search)
  - Universal Description, Discovery, and Integration (UDDI) Directory Services
- Dynamic RSS support, including GeoRSS support
- ISO support
- OpenSearch support
- Documentation and Help (Phase I & II)
- User Statistics Application modifications
Phase III (October 2008 to January 2009):
- Save and retrieve user queries
- Possible integration to OPeNDAP
- Web Service Harvesting (OAI)
- Internationalization
- ????

Search technology using Lucene/SOLR
- Lucene Overview
- Who uses Lucene
- Solr Overview
- Who uses Solr

Lucene Overview
- High-performance, full-featured text search engine library written entirely in Java
- Mature Apache Open Source Java Project
- Index speed and integrity, search speed
- Uses file-based full-text and inverted indexing
- Extremely fast, with built-in caching
- Can easily handle millions of documents
- Very active mailing list for support
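Inverted indexing, mentioned above, maps each term to the set of documents that contain it, so a query touches only the postings for its terms rather than scanning every document. The toy sketch below shows the idea. It is not Lucene's implementation, which adds on-disk segments, scoring and text analysis, and the sample documents are made up.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_all_terms(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

documents = {
    1: "Bird survey of coastal wetlands",
    2: "Wetland soil chemistry",
    3: "Coastal bird migration",
}
inverted = build_inverted_index(documents)
print(search_all_terms(inverted, "coastal bird"))
print(search_all_terms(inverted, "wetlands"))
```

Because this sketch does no stemming, "wetlands" does not match document 2, which contains only "wetland". Closing that gap is the job of configurable text analysis in a real search engine.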

Who uses Lucene
- Wikipedia
- MediaWiki
- European Bioinformatics Institute
- Liferay
- Bigsearch.ca
- Monster
- Academic Archive On-line
Complete list:

SOLR Overview
- Open source enterprise search server based on the Lucene Java search library
- Apache project, sub-project of Lucene
- Advanced Full-Text Search Capabilities
- Optimized for High Volume Web Traffic
- Standards Based Open Interfaces - XML and HTTP
- Solr uses Lucene search library and extends it
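Because Solr's interface is XML over HTTP, any client that can issue a GET request and parse XML can use it. The sketch below parses a sample response in Solr's standard XML layout (a response element with a responseHeader list and a result element carrying numFound). The response is hard-coded so the example runs without a server, and the records and field names in it are invented.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in Solr's XML response layout; the documents are made up.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">rec-1</str><str name="title">Example Amphibian Survey</str></doc>
    <doc><str name="id">rec-2</str><str name="title">Example Wetland Inventory</str></doc>
  </result>
</response>"""

def parse_solr_response(xml_text):
    """Return (total hit count, list of documents as dicts) from a Solr XML response."""
    root = ET.fromstring(xml_text)
    result = root.find("result[@name='response']")
    # Each child of <doc> is a typed element whose name attribute is the field name.
    docs = [{f.get("name"): f.text for f in doc} for doc in result.findall("doc")]
    return int(result.get("numFound")), docs

total, docs = parse_solr_response(SAMPLE_RESPONSE)
print(total, [d["title"] for d in docs])
```

This handles only single-valued string fields. Multi-valued fields arrive as nested arr elements and would need an extra branch.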

SOLR Overview (cont.)
- A Real Data Schema, with Numeric Types, Date fields, Dynamic Fields
- Dynamic Faceted Browsing and Filtering
- Advanced, Configurable Text Analysis
- Highly Configurable and User Extensible Caching
- External Configuration via XML
- Scalability - Efficient Replication to other Solr Search Servers
- Administration Interface is available
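Faceted browsing comes down to two operations: counting how many results fall under each value of a field, and narrowing the results when the user picks a value. The sketch below shows both on a small in-memory list. Solr computes the same counts server-side from its index. The records, partner names and field names here are made up.

```python
from collections import Counter

# Hypothetical metadata records.
RECORDS = [
    {"title": "Amphibian survey", "partner": "Partner A", "keywords": ["amphibians", "wetlands"]},
    {"title": "Wetland inventory", "partner": "Partner A", "keywords": ["wetlands"]},
    {"title": "Forest plots", "partner": "Partner B", "keywords": ["forests"]},
]

def facet_counts(records, field):
    """Count records per value of a field; works for single-valued and list fields."""
    counts = Counter()
    for record in records:
        value = record.get(field, [])
        counts.update([value] if isinstance(value, str) else value)
    return counts.most_common()

def filter_by(records, field, value):
    """Keep only records whose field equals, or contains, the chosen facet value."""
    def matches(record):
        stored = record.get(field, [])
        return value == stored if isinstance(stored, str) else value in stored
    return [r for r in records if matches(r)]

partner_facet = facet_counts(RECORDS, "partner")
narrowed = filter_by(RECORDS, "partner", "Partner A")
keyword_facet = facet_counts(narrowed, "keywords")
print(partner_facet)
print(keyword_facet)
```

Recomputing the counts on the narrowed set is what makes the browsing dynamic: after each selection, the remaining facet values reflect only the records still in view.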

Who uses SOLR
- CNET Reviews
- shopper.com
- AOL Music
- Netflix
- search.com
- The Digital Commonwealth
- mindquarry
For complete list:

Mercury Instances Demo
- NBII Clearinghouse interface:
- ORNL DAAC interface:
- LBA Mercury interface:
- DADDI Mercury interface:
- GFIS RSS Portal interface:

User Statistics Report Generation Tool

Open source Harvester Re-design (Aperture)

Questions, Comments
Mike Frame
Thanks to:
- Giri Palanisamy, Systems Architect and Team Leader, Mercury Consortium
- Vivian Hutchison, NBII Metadata Program Manager