Uwe Schindler, GES 2007 – May 2-4, 2007: Data Information Service based on Open Archives Initiative Protocols and Apache Lucene

Data Information Service based on Open Archives Initiative Protocols and Apache Lucene
Uwe Schindler (1), Benny Bräuer (2), Michael Diepenbroek (1)
(1) PANGAEA® Group at MARUM, University of Bremen, Bremen, Germany
(2) Alfred Wegener Institute for Polar and Marine Research, Bremerhaven, Germany

Metadata Portals & Grid
WDC-MARE, with its information system PANGAEA®, currently provides data portals for several EU/international projects. Not all data are stored centrally, so all datasets offered in these portals must be consolidated from different sources.
Features:
– Data stays at the data providers
– Metadata is harvested by the portal
– Search queries are handled by the central catalogue (Google-like search speed)
– Scientists get a link to the data at the provider
The same metadata portal software is sufficient for C3-Grid, too.

Metadata in C3-Grid
Goal: build up an infrastructure for the earth system community in Germany
Problem: we need an architecture that makes it possible to:
– collect metadata files from data providers
– store them in a “central index”
– provide fast, generic access to this data for our users
Solution: Data Information Service

Open Archives Protocol
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a protocol developed by the Open Archives Initiative.
– Almost all digital libraries support it (the best known: Fedora, arXiv, and the CERN Document Server)
– Used by portals from Scientific Commons, OAIster, and SUB, and during web crawling (if available)
– Very simple to implement (XML over HTTP, REST-style)
– Repository software for database or file-system metadata providers is widely available (C3 mostly uses the DLESE jOAI software on the data provider side)
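
To show how simple the harvester side is, here is a minimal Java sketch using only the JDK. It builds `ListRecords` request URLs and parses one response page (record identifiers plus the `resumptionToken` that points to the next page); the HTTP fetch, OAI error responses such as `noRecordsMatch`, and the metadata payload are left out. The class and method names are hypothetical; the verb, the argument names, and the `http://www.openarchives.org/OAI/2.0/` namespace come from the OAI-PMH 2.0 specification.

```java
import java.io.ByteArrayInputStream;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical harvester-side OAI-PMH helpers: build ListRecords URLs, parse one response page. */
class OaiPmhSketch {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    /** One page of a ListRecords response. */
    static final class Page {
        final List<String> identifiers;
        final String resumptionToken; // null when this is the last page

        Page(List<String> identifiers, String resumptionToken) {
            this.identifiers = identifiers;
            this.resumptionToken = resumptionToken;
        }
    }

    /** The first request names the metadata format; follow-ups send ONLY the resumptionToken. */
    static String listRecordsUrl(String baseUrl, String metadataPrefix, String resumptionToken) {
        String arg = (resumptionToken == null)
            ? "metadataPrefix=" + URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8)
            : "resumptionToken=" + URLEncoder.encode(resumptionToken, StandardCharsets.UTF_8);
        return baseUrl + "?verb=ListRecords&" + arg;
    }

    /** Extracts record identifiers and the resumptionToken (absent or empty = no more pages). */
    static Page parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> ids = new ArrayList<>();
            NodeList headers = doc.getElementsByTagNameNS(OAI_NS, "header");
            for (int i = 0; i < headers.getLength(); i++) {
                Element header = (Element) headers.item(i);
                ids.add(header.getElementsByTagNameNS(OAI_NS, "identifier").item(0).getTextContent().trim());
            }
            NodeList tokens = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
            String token = tokens.getLength() == 0 ? "" : tokens.item(0).getTextContent().trim();
            return new Page(ids, token.isEmpty() ? null : token);
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable OAI-PMH response", e);
        }
    }
}
```

A harvester would request the first URL, index the page, and keep requesting `listRecordsUrl(base, prefix, page.resumptionToken)` until the token is `null`. Header `status='deleted'` markers would tell the central index which records to drop.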

Central indexing requirements
1. Open for any XML metadata format
2. All mappings to document fields are done by XPath
3. Incompatible XML schemas can be mapped on the fly during harvesting by XSLT
4. On-the-fly validation of (transformed) documents during harvesting
5. No relational database, only a full-text search engine that contains everything needed for operation
6. Range queries on specific fields (date/time or numeric)
7. Web service interface / programming API for the end-user interface, accessible from any language (Java/JSP, PHP, Perl, ...)
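
Requirements 2 to 4 describe a small per-record pipeline. This Java sketch (standard JAXP only) shows one way to chain them: an XSLT stylesheet maps a provider format onto a common format, the result is validated against an XML Schema, and configured XPath expressions pick out the index fields. The provider format (`dataset`/`bbox`), the common `record` format, the schema, and the field names are hypothetical examples, not the schemas actually used in C3-Grid.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical harvesting pipeline: XSLT on the fly, schema validation, XPath field mapping. */
class HarvestPipelineSketch {
    // Hypothetical stylesheet: maps a provider's <dataset> format onto a common <record> format.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/dataset'><record>"
      + "<title><xsl:value-of select='name'/></title>"
      + "<minLat><xsl:value-of select='bbox/@south'/></minLat>"
      + "<maxLat><xsl:value-of select='bbox/@north'/></maxLat>"
      + "</record></xsl:template></xsl:stylesheet>";

    // Hypothetical schema for the common format; latitudes must be decimals.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='record'><xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "<xs:element name='minLat' type='xs:decimal'/>"
      + "<xs:element name='maxLat' type='xs:decimal'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Hypothetical field configuration: index field name -> XPath into the common format.
    static final Map<String, String> FIELDS = Map.of(
        "title", "/record/title", "min_lat", "/record/minLat", "max_lat", "/record/maxLat");

    /** Applies the stylesheet to one harvested record and returns the result as XML text. */
    static String transform(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    /** Returns true if the (transformed) record conforms to the common schema. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema violation: in this sketch the record would be skipped, not indexed
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Evaluates each configured XPath and keeps the non-empty values as index fields. */
    static Map<String, String> extractFields(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> fields = new LinkedHashMap<>();
            for (Map.Entry<String, String> f : FIELDS.entrySet()) {
                String value = xpath.evaluate(f.getValue(), doc).trim();
                if (!value.isEmpty()) fields.put(f.getKey(), value);
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not map fields", e);
        }
    }
}
```

Keeping the stylesheet, schema, and XPath table as configuration rather than code is what would keep such an index open for any XML format (requirement 1).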

Apache Lucene features
– Ranked searching: best results returned first
– Many powerful query types: phrase, wildcard, proximity, and range queries (for date/time values and numbers)
– Fielded searching: all fields are searchable as a whole, each field separately (e.g. author, parameter), or mixed
– Any combination of boolean operators between search terms (AND, OR, NOT, exact phrase)
– Sorting by any field
– Multiple-index searching with merged results
– Simultaneous searching and updates thanks to high-performance indexing
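
To make fielded, boolean, and ranked searching concrete, here is a toy inverted index in Java. It is not Lucene's API and uses a plain tf*idf sum instead of Lucene's scoring; it only illustrates that each term is keyed by its field, that AND and NOT become intersection and difference of posting lists, and that hits are sorted by score. Field names and texts are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy fielded inverted index (NOT Lucene's API): AND/NOT matching plus tf*idf ranking. */
class ToyFieldedIndex {
    // "field:term" -> (doc id -> term frequency in that field)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int docCount = 0;

    /** Indexes a document given as field name -> text; tokens are lower-cased words. */
    int add(Map<String, String> fields) {
        int docId = docCount++;
        for (Map.Entry<String, String> f : fields.entrySet()) {
            for (String tok : f.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (tok.isEmpty()) continue;
                postings.computeIfAbsent(f.getKey() + ":" + tok, k -> new HashMap<>())
                        .merge(docId, 1, Integer::sum);
            }
        }
        return docId;
    }

    private Map<Integer, Integer> docs(String fieldTerm) {
        return postings.getOrDefault(fieldTerm, Map.of());
    }

    /** Docs containing ALL "must" terms and NONE of the "mustNot" terms, best score first. */
    List<Integer> search(List<String> must, List<String> mustNot) {
        if (must.isEmpty()) return List.of();
        Set<Integer> hits = new HashSet<>(docs(must.get(0)).keySet());
        for (String t : must) hits.retainAll(docs(t).keySet());    // AND = intersection
        for (String t : mustNot) hits.removeAll(docs(t).keySet()); // NOT = difference
        Map<Integer, Double> score = new HashMap<>();
        for (int doc : hits) {
            double s = 0;
            for (String t : must) {
                Map<Integer, Integer> p = docs(t);
                double idf = Math.log(1.0 + (double) docCount / p.size()); // rarer terms weigh more
                s += p.get(doc) * idf;
            }
            score.put(doc, s);
        }
        List<Integer> ranked = new ArrayList<>(hits);
        ranked.sort(Comparator.comparing((Integer d) -> score.get(d)).reversed()
                              .thenComparingInt(d -> d));
        return ranked;
    }
}
```

An OR clause would be a union of posting lists in the same style; searching "all fields as a whole" could be modelled by additionally indexing every token under a catch-all field name.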

Generic Framework

Search Interface
– Supports all standard Lucene search features
– Additional support for fast range queries, e.g. to enable bounding-box searches:
  – implemented by redundant storage of “numerical terms” in different precisions
  – a range query recursively reduces the number of distinct terms it has to visit (otherwise every numerical value is a separate term)
  – search time no longer depends on index size
– Accessible via Java API or AXIS web service
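
The idea of storing “numerical terms” in several precisions can be sketched as follows, assuming non-negative integer values of 16 bits and a precision step of 4 bits (both hypothetical parameters; signed or floating-point values would first need an order-preserving encoding). Each value is indexed once per level as `value >>> shift`. A range query is split into precise term ranges at its ragged edges and coarse term ranges in the middle, so the number of terms it enumerates is bounded by these parameters rather than by the number of distinct values in the index. The splitting loop closely follows the scheme Lucene later adopted for `NumericRangeQuery`, simplified here; it is not the DIS code itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy numeric index: every value is stored as several terms of decreasing precision. */
class TrieRangeSketch {
    static final int BITS = 16; // assumption: values are integers in [0, 2^16)
    static final int STEP = 4;  // assumption: each coarser level drops 4 low bits

    // shift -> sorted term dictionary (value >>> shift -> doc ids)
    private final Map<Integer, TreeMap<Long, Set<Integer>>> levels = new HashMap<>();

    void add(int docId, long value) {
        if (value < 0 || value >= (1L << BITS)) throw new IllegalArgumentException("out of range: " + value);
        for (int shift = 0; shift < BITS; shift += STEP) {
            levels.computeIfAbsent(shift, s -> new TreeMap<>())
                  .computeIfAbsent(value >>> shift, p -> new HashSet<>())
                  .add(docId);
        }
    }

    /** Splits [lo, hi] into {shift, fromPrefix, toPrefix} parts: precise at the edges, coarse inside. */
    static List<long[]> splitRange(long lo, long hi) {
        List<long[]> parts = new ArrayList<>();
        if (lo > hi) return parts;
        for (int shift = 0; ; shift += STEP) {
            long diff = 1L << (shift + STEP);         // size of one block at the next coarser level
            long mask = ((1L << STEP) - 1L) << shift; // bits dropped when moving one level up
            boolean hasLower = (lo & mask) != 0L;     // lo does not start a coarser block
            boolean hasUpper = (hi & mask) != mask;   // hi does not end a coarser block
            long nextLo = (hasLower ? lo + diff : lo) & ~mask;
            long nextHi = (hasUpper ? hi - diff : hi) & ~mask;
            if (shift + STEP >= BITS || nextLo > nextHi) {
                parts.add(new long[] {shift, lo >>> shift, hi >>> shift}); // rest fits one term range
                return parts;
            }
            if (hasLower) parts.add(new long[] {shift, lo >>> shift, (lo | mask) >>> shift});
            if (hasUpper) parts.add(new long[] {shift, (hi & ~mask) >>> shift, hi >>> shift});
            lo = nextLo;
            hi = nextHi;
        }
    }

    /** Union of the doc ids found in each sub-range's term interval. */
    Set<Integer> rangeQuery(long lo, long hi) {
        Set<Integer> hits = new TreeSet<>();
        for (long[] part : splitRange(Math.max(lo, 0), Math.min(hi, (1L << BITS) - 1))) {
            TreeMap<Long, Set<Integer>> terms = levels.get((int) part[0]);
            if (terms == null) continue;
            for (Set<Integer> docs : terms.subMap(part[1], true, part[2], true).values()) hits.addAll(docs);
        }
        return hits;
    }
}
```

With these parameters a query is split into at most 7 sub-ranges (two per level plus one final range). The price is index size: every value is stored as `BITS / STEP` terms, which is the redundancy the slide refers to.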

C3 Implementation
[Figure by T. Langhammer, ZIB: data providers (CERA, PANGAEA®, other data providers) publish metadata files (Metadata1.xml, Metadata2.xml, ...) via OAI-PMH. The DIS harvesting backend keeps the records in a document cache and indexes selected fields (e.g. identifier, abstract, min_lat, data_uri) in an Apache Lucene full-text index; a web service frontend answers Google-style and range queries from the portal.]
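
The figure separates a document cache (complete harvested records) from an index that holds only selected fields. A hypothetical Java sketch of that split: only `identifier` and `abstract` are indexed (identifiers as single terms, abstracts as words), while `data_uri` lives only in the cached record and is returned as the link to the data at the provider. The field selection and class name are assumptions for illustration; in the DIS the index is Apache Lucene, not in-memory maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical DIS storage split: full records in a cache, only selected fields in the index. */
class DisStoreSketch {
    // Hypothetical field selection; other fields (e.g. data_uri) are only kept in the cache.
    static final List<String> INDEXED = List.of("identifier", "abstract");

    private final Map<String, Map<String, String>> cache = new LinkedHashMap<>(); // identifier -> record
    private final Map<String, Set<String>> index = new HashMap<>();              // "field:term" -> identifiers

    /** Identifiers become one lower-cased term; free text is split into lower-cased words. */
    private static List<String> terms(String field, String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        if (field.equals("identifier")) return List.of(lower);
        List<String> out = new ArrayList<>();
        for (String w : lower.split("\\W+")) if (!w.isEmpty()) out.add(w);
        return out;
    }

    /** Adds or replaces a harvested record; a re-harvest of the same identifier overwrites it. */
    void put(Map<String, String> record) {
        String id = record.get("identifier");
        Map<String, String> old = cache.put(id, record);
        for (String field : INDEXED) {
            if (old != null && old.get(field) != null) {           // drop the outdated postings
                for (String t : terms(field, old.get(field))) {
                    Set<String> ids = index.get(field + ":" + t);
                    if (ids != null) ids.remove(id);
                }
            }
            String text = record.get(field);
            if (text == null) continue;
            for (String t : terms(field, text)) index.computeIfAbsent(field + ":" + t, k -> new TreeSet<>()).add(id);
        }
    }

    /** Searches one indexed field and returns the provider links of the cached hits. */
    List<String> findDataUris(String field, String term) {
        List<String> uris = new ArrayList<>();
        for (String id : index.getOrDefault(field + ":" + term.toLowerCase(Locale.ROOT), Set.of())) {
            uris.add(cache.get(id).get("data_uri"));
        }
        return uris;
    }
}
```

Because OAI-PMH harvests are incremental, replacing a record's postings on re-harvest matters as much as adding new ones; the data itself never enters the index, only the link to it.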

Future
[Figure: metadata of data and metadata of workflows; data query and workflow query; assemble workflow; processing]

Thank You!
The software will soon be available as open source on SourceForge.net. News: