panFMP - An XML-based Framework for Metadata Portals. Talk and hands-on seminar at GFZ Potsdam. Uwe Schindler, MARUM – Universität Bremen, PANGAEA.

Presentation transcript:

1 panFMP - An XML-based Framework for Metadata Portals
Talk and hands-on seminar at GFZ Potsdam
Uwe Schindler, MARUM – Universität Bremen
PANGAEA ® - Publishing Network for Geoscientific & Environmental Data

2 Metadata Portals: Search Technology for Distributed Catalogues
Searching directly on distributed catalogues: In a distributed search infrastructure, every data provider has not only its own metadata catalogue but also a corresponding search interface to the portal (e.g., web service based). Search requests are sent to all data providers. The portal only needs to collect the search results from the providers, then rank and display them to the end user. Examples: NSDI Clearinghouse, GeoMIS.BUND
Harvesting catalogues into a central searchable catalogue: Every data provider has its own metadata catalogue, but the search engine is centralized. The portal periodically harvests all metadata records into a central index and serves search requests from there. Major web search engines like Google and the FGDC-related Geospatial One-Stop are based on this concept. Response times are short because only local components are involved in the search process.

3 Metadata Portals: Harvesting solutions from PANGAEA ®
WDC-MARE, with its information system PANGAEA ®, currently provides data portals for several EU/international projects. Not all data are stored centrally, so all datasets offered in the portals must be consolidated from different sources!
Features:
–Data stays at the data providers
–Metadata is harvested by the portal
–Search queries are handled by the centralized catalogue (Google-like search speed!)
–Scientist gets link to data at the provider

4 Metadata Harvesting Solutions
Web Accessible Folder (WAF): Harvesting by recursively collecting XML files from a web server's directory listing; simple, but inefficient
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH):
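The WAF approach on this slide can be sketched in a few lines of Python. This is only an illustration, not panFMP's harvester: the function `harvest_waf`, the `fetch` callback and the example.org pages are hypothetical, and the directory listing is assumed to be plain HTML with `<a href>` links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a directory listing page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def harvest_waf(url, fetch, seen=None):
    """Recursively collect the URLs of all .xml files below a folder URL.

    `fetch` is a hypothetical callback that returns the HTML of a listing.
    """
    seen = set() if seen is None else seen
    if url in seen:
        return []
    seen.add(url)
    parser = LinkParser()
    parser.feed(fetch(url))
    found = []
    for href in parser.links:
        target = urljoin(url, href)
        if not target.startswith(url):
            continue  # skip parent-directory and external links
        if target.endswith("/"):
            found += harvest_waf(target, fetch, seen)  # descend into subfolder
        elif target.lower().endswith(".xml"):
            found.append(target)
    return found


# Invented listing pages standing in for a real web server.
PAGES = {
    "http://example.org/waf/": '<a href="../">Parent</a><a href="a.xml">a.xml</a>'
                               '<a href="sub/">sub/</a><a href="readme.txt">readme</a>',
    "http://example.org/waf/sub/": '<a href="../">Parent</a><a href="b.xml">b.xml</a>',
}
records = harvest_waf("http://example.org/waf/", PAGES.__getitem__)
print(records)
```

The sketch also hints at why the slide calls WAF inefficient: every run has to walk the whole folder tree again, because a plain directory listing offers no way to ask only for records changed since the last harvest.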

5 Open Archives Protocol
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a protocol developed by the Open Archives Initiative.
–Almost all digital libraries support it (most famous ones: Fedora Commons, arXiv and the CERN Document Server; also GeoNetwork Opensource)
–Portals by Scientific Commons, OAIster, SUB use it during web crawling (if available)
–Very simple to implement (XML over HTTP-REST)
–Repository software for databases or file system metadata providers is widely available (e.g. DLESE jOAI software on the data provider side)
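To make "XML over HTTP" concrete, here is a minimal Python sketch of one OAI-PMH harvesting step. The verb `ListRecords`, the arguments `metadataPrefix` and `resumptionToken`, and the namespace are part of OAI-PMH 2.0; the base URL, the helper names and the sample response are invented for illustration, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    A resumptionToken is an exclusive argument: when it is present,
    metadataPrefix must not be sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Invented, heavily shortened response (real ones also carry metadata payloads).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai")
ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
print(first_url, ids, next_url, sep="\n")
```

A harvester would repeat the request with each returned token until the response no longer contains one.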

6 Current OAI-PMH software
1. Limited to Dublin Core metadata (libraries)!
2. Limited full text search functionality due to relational databases in the background!
3. No geographic retrievals (because of Dublin Core limitation)!
4. End user interface is part of the software; this limits usability in content management systems.

7 Central indexing requirements
1. Open for any XML metadata format
2. Any mappings to document fields should be done by XPath/XSLT
3. Possibility to map incompatible XML schemas during harvesting by XSLT on-the-fly
4. On-the-fly validation of (possibly previously transformed) documents during harvesting
5. No relational database, only a full text search engine that contains everything needed for operation
6. Range queries on specific fields (date/time or numeric)
7. Web service interface / programming API for the end user interface that is accessible from any language (Java/JSP, PHP, Perl,...)
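Requirement 2 (field mappings expressed as XPath) might look like the following Python sketch. It is not panFMP's configuration format: the field names, the `FIELDS` table and the sample record are invented, and the standard-library ElementTree module supports only a subset of XPath, which is enough for this illustration.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace, used by the invented sample record below.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical field definitions: index field name -> XPath expression.
FIELDS = {
    "title": ".//dc:title",
    "author": ".//dc:creator",
    "date": ".//dc:date",
}

RECORD = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sediment core data</dc:title>
  <dc:creator>Doe, J.</dc:creator>
  <dc:creator>Roe, R.</dc:creator>
  <dc:date>2008-05-01</dc:date>
</metadata>"""


def extract_fields(xml_text, field_paths, namespaces):
    """Evaluate each XPath and return {field name: list of text values}."""
    root = ET.fromstring(xml_text)
    doc = {}
    for name, path in field_paths.items():
        values = [el.text.strip() for el in root.findall(path, namespaces) if el.text]
        if values:
            doc[name] = values
    return doc


doc = extract_fields(RECORD, FIELDS, NS)
print(doc)
```

Because the mapping is data rather than code, supporting another metadata schema only means supplying a different table of expressions, which is the point of requirement 1.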

8 Ranked searching - best results returned first
–Many powerful query types: phrase queries, wildcard queries, proximity queries, range queries for date/time values and numbers
–Fielded searching: all fields can be searched as a whole, each field separately (e.g. author, parameter), or mixed
–Any combination of boolean operators between search terms (AND, OR, NOT, exact phrase)
–Sorting by any field
–Multiple-index searching with merged results
–Simultaneous searching and updates due to high-performance indexing
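Fielded and boolean searching can be illustrated with a toy inverted index in Python. This is a teaching sketch and not Lucene's data structure or API: the class `TinyIndex`, the catch-all field `"*"` and the sample documents are invented, and there is no ranking, phrase or wildcard support.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps (field, term) to the set of matching doc ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)
                self.postings[("*", term)].add(doc_id)  # "all fields" entry

    def term(self, term, field="*"):
        """Doc ids containing `term` in `field` (default: any field)."""
        return set(self.postings.get((field, term.lower()), set()))


idx = TinyIndex()
idx.add(1, {"author": "Doe", "title": "Metadata portals with full text search"})
idx.add(2, {"author": "Roe", "title": "Harvesting metadata for portals"})
idx.add(3, {"author": "Doe", "title": "Numeric range queries"})

# Boolean operators map directly onto set operations.
and_hits = idx.term("doe", "author") & idx.term("metadata", "title")  # AND
or_hits = idx.term("portals") | idx.term("queries")                   # OR
not_hits = idx.term("metadata") - idx.term("roe", "author")           # NOT
print(and_hits, or_hits, not_hits)
```

Looking a term up is a dictionary access regardless of how many documents are indexed, which is the basic reason a full text engine answers such queries quickly.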

9 Structure of a Lucene Index

10 panFMP – PANGAEA ® Framework for Metadata Portals
panFMP is a generic and flexible framework for building geoscientific metadata portals independent of content standards for metadata and protocols. Data providers can be harvested with commonly used protocols (e.g., Open Archives Initiative Protocol for Metadata Harvesting) and metadata standards like Dublin Core, DIF, or ISO 19115. The new Java-based portal software supports any XML encoding and makes metadata searchable through Apache Lucene. Software administrators are free to define searchable fields independent of their type using XPath and/or XSL Templates. In addition, by extending the full-text search engine (FTS) Apache Lucene, we have significantly improved queries for numerical and date/time ranges by supplying a new trie-based algorithm, thus enabling high-performance space/time retrievals in FTS-based geo portals. The harvested metadata are stored in separate indexes, which makes it possible to combine these into different portals. The portal-specific Java API and web service interface is highly flexible and supports custom front-ends for users, provides automatic query completion (AJAX), and dynamic visualization with conventional mapping tools.

11 panFMP – Components of a metadata portal

12 panFMP - Harvesting

13 panFMP - Search Interface
Supports all standard Lucene search features
Additional support for fast range queries to enable bounding boxes, etc.:
–implemented by redundant storage of "numerical terms" in different precisions
–recursive reduction of distinct terms (every numerical value is a term) on range query
–search time no longer dependent on index size
Accessible via Java API or AXIS web service

14 panFMP – Range Queries
Example of trie-based recursive splitting of a range query with three precisions (simplified for demonstration): The user wants to find all records with terms between "423" and "642". Instead of selecting all terms in the lowermost row, the query is optimized to match only on labelled terms with lower precision, where applicable. It is enough to select term "5" to match all records starting with "5" ("521", "522"), or "44" for "445", "446", "448". The query is therefore simplified to match all records containing terms "423", "44", "5", "63", "641", or "642".
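The splitting in this example can be reproduced with a short Python sketch. It is a simplified decimal illustration under stated assumptions, not the Lucene implementation (which works on binary precision steps): the function names are invented, and the indexed values 421, 632, 633, 634 and 644 are assumed in addition to those named on the slide.

```python
def index_terms(values, digits=3):
    """Store every value redundantly under each of its decimal prefixes."""
    postings = {}
    for v in values:
        s = str(v).zfill(digits)
        for k in range(1, digits + 1):
            postings.setdefault(s[:k], set()).add(v)
    return postings


def prefix_cover(lo, hi, digits=3, prefix=""):
    """Return the shortest prefixes whose value ranges exactly tile [lo, hi]."""
    start = int(prefix.ljust(digits, "0"))  # smallest value under this prefix
    end = int(prefix.ljust(digits, "9"))    # largest value under this prefix
    if end < lo or start > hi:
        return []                           # no overlap with the query range
    if prefix and lo <= start and end <= hi:
        return [prefix]                     # fully inside: one term is enough
    # Partial overlap: split recursively at the next higher precision.
    return [t for d in "0123456789"
            for t in prefix_cover(lo, hi, digits, prefix + d)]


def range_query(postings, lo, hi, digits=3):
    """Match only those covering prefixes that actually occur in the index."""
    terms = [t for t in prefix_cover(lo, hi, digits) if t in postings]
    hits = set().union(*(postings[t] for t in terms))
    return terms, hits


# Values partly taken from the slide, partly assumed (see lead-in).
VALUES = [421, 423, 445, 446, 448, 521, 522, 632, 633, 634, 641, 642, 644]
terms, hits = range_query(index_terms(VALUES), 423, 642)
print(terms)
print(sorted(hits))
```

The number of candidate prefixes depends only on the range bounds and the number of precisions, not on how many values are indexed, which is what the previous slide means by search time no longer depending on index size.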

15 Examples
http://pages-dataportal.unibe.ch/cgi-bin/WebObjects/dataportal
Currently not available:

16 Thank You!
The software is available as open source on SourceForge.net!