The DRIVER initiative for networking repositories Wolfram Horstmann Universität Bielefeld.


DRIVER motivation
- Scholarly communication is changing towards distributed provision of text, data and services
- Repositories are thought of as a saviour in this development, building such a distributed system
- An infrastructure supporting distributed repositories and services is needed (and reactions) (needs explanation)

Some observations on repositories
They represent a shift towards …
- open internet exposure as opposed to closed databases (‚graveyards‘)
- content orientation as opposed to mere technical orientation (‚web-servers‘)
- distributed systems: centralized structures are not immediately required nowadays

„Everybody can be a publisher“
- Common description standards, e.g. Dublin Core Metadata Initiative
- Many subject-specific standards
- Common transfer protocols, e.g. OAI-PMH, but also FTP, XML-RPC, WS, etc.
- Searchability is possible!
- Still: many results are lost to re-use/remix
- Closed: too sensitive, weakly described, unimportant (???)
- Missing service frameworks / infrastructures
- Problems: data and service interoperability
- Solution: „Infrastructure“
- Repositories can solve the access problem
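To make the "common description standards" point concrete, here is a minimal sketch of reading a simple Dublin Core record as it is typically carried in the `oai_dc` format. The sample record and the function name `parse_dublin_core` are made up for illustration and are not taken from DRIVER.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample record; values are invented for illustration.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example thesis</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
  <dc:date>2008-05-14</dc:date>
  <dc:language>eng</dc:language>
</oai_dc:dc>
"""


def parse_dublin_core(xml_text):
    """Return a dict mapping each Dublin Core element name to a list of values."""
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        # ElementTree tags look like "{namespace}localname"; keep DC elements only.
        if child.tag.startswith("{" + DC_NS + "}"):
            name = child.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((child.text or "").strip())
    return fields


record = parse_dublin_core(SAMPLE_RECORD)
print(record["title"])
print(record["creator"])
```

Every element is kept as a list because Dublin Core elements are repeatable, which is one reason records from different repositories need normalisation before they can be searched together.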

What infrastructures are: DRIVER terms
Not an infrastructure
- Single repository
- Single application for search and retrieval (e.g. BASE)
- Only local operation
- Backwards causation on repositories is missing
Maybe an infrastructure
- Distributed repository landscape as a whole
- As a capacity for emergent properties, e.g. quality and quantity incentive for data population
- Nurturing development of service providers
Definitely an infrastructure
- Many service providers in one organisational and technical context (e.g. run-time environment)
- Enabling re-use and remix of data and services

DRIVER Objectives
- Organisational structure for repositories, e.g. the „Confederation“
- Improving quality and standards in local repositories, e.g. validation procedures
- Building a distributed runtime system, e.g. service and data sharing
Target Groups
- Repository Managers
- Service Providers
- Information System Executives

The DRIVER approach is incremental
- Start with publication metadata
  - Existing distributed system, somehow connected
  - Considerable homogeneity and formats: OAI-PMH
- Extend geographical coverage
  - From 5 countries, to 10, to 27, to ???
- Extend towards other contents
  - From publication metadata to enhanced publications, i.e. representations of „texts + data“
- Learn about subject specificity
  - Data bring in disciplinary requirements

The DRIVER Initiative
- DRIVER-I, 6/2006 – 11/2007: Organisational Models and Technical Test-Bed
- DRIVER-II, 12/2007 – 11/2009: Running Organisation and Production Infrastructure
- DRIVER-Confederation, 2010ff: Operations Office and Technical Deployment
NB: DRIVER is not an authoritative body; it is a liberal bottom-up initiative of stakeholders

DRIVER partners and related projects
- Networking, Support, Policy, Studies: Göttingen, Nottingham, SURF, Ghent, Ljubljana, Minho, Copenhagen
- Technical development and deployment: Athens, Bielefeld, Pisa, Warsaw
Partners make links to many other things
- OA-services: Sherpa-ROMEO, OpenDOAR, BASE…
- Projects: Europeana, PEER, DELOS, DL.org, D4Science, PARSE-Insight, NESTOR…
- Orgs: DINI, JISC, LIBER, SPARC, KE …
- Platforms: DSPACE/FEDORA/OPUS/ePrints

DRIVER-II Midterm Review, January 30, Pisa
Project structure
- Networking, Research, Service
- Running Infrastructure: Content & Functionality
- Construction of Services: ideas, design, development
- Technical Management
- Advocacy: attracting users, content and service providers
- Discovery: technology watch, EPs requirements

Some results

Some Results: Studies

Some Results: A Portal

Some Results: A Search

Some Results: Repository Registration

Some Results: Guidelines
- Build on knowledge from past & current IR projects (EU)
- 26 actively involved contributors (experts and repository managers) from 8 countries
- Practical answers on how to:
  - Improve full-text access
  - Standardize metadata quality
  - Create a reliable infrastructure for permanent identification, resolution, traceability and storage
  - Resolve semantic and classification issues
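As an illustration of what "standardize metadata quality" can mean in practice, the sketch below checks a record against the shape of the vocabularies this deck names for DRIVER (three-letter ISO 639-2 language codes, two-letter ISO 3166 country codes, ISO 8601 dates). The record layout and the function `validate_record` are assumptions made for this example, not the DRIVER validator. The checks look only at the shape of each value, not at membership in the official code lists, and the date check accepts complete calendar dates only.

```python
import re
from datetime import date

# Shape checks only: these patterns do not consult the official code lists.
LANGUAGE_CODE = re.compile(r"[a-z]{3}")  # shape of an ISO 639-2 code
COUNTRY_CODE = re.compile(r"[A-Z]{2}")   # shape of an ISO 3166 two-letter code


def is_calendar_date(value):
    """True if value parses as a complete ISO 8601 calendar date such as 2008-05-14."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def validate_record(record):
    """Return a list of problems found in a (hypothetical) flat metadata record."""
    problems = []
    if not LANGUAGE_CODE.fullmatch(record.get("language", "")):
        problems.append("language: expected a three-letter code")
    if not COUNTRY_CODE.fullmatch(record.get("country", "")):
        problems.append("country: expected a two-letter code")
    if not is_calendar_date(record.get("date", "")):
        problems.append("date: expected YYYY-MM-DD")
    return problems


good = {"language": "eng", "country": "DE", "date": "2008-05-14"}
bad = {"language": "en", "country": "Germany", "date": "14/05/2008"}
print(validate_record(good))
print(validate_record(bad))
```

Returning a list of problems rather than raising on the first one lets a repository manager see everything that needs fixing in a single validation pass.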

Some Results: Support structures

Some Results: Repositories
- 185+ harvested repositories
- 21 countries
- 856,264+ documents

Some Results: Service-Oriented Architecture
- 9 hosting nodes
- 25+ functionality typologies (services)
- 36 service instances
- 3 applications: DRIVER Main, Belgium, Spain-Recolecta

Some Results: Runtime-System & Hosting
Diagram labels: Enabling Layer; Data Layer; EU Open Access Repositories; Functionality Layer; Administrators; End users; Advanced User Interfaces; National portals; Project Applications

Another Compulsory Design Diagram

Some Results: Software
Meant for large service providers only!

Technicalities

DRIVER and standards
Service Resources
- Implemented as Web Services and accessed through the corresponding Web Service interface
- Call parameters are enveloped in SOAP messages
- The Enabling Services are also compatible with REST
XML is the lingua franca for the whole system
- Resource internal status, i.e. Resource profiles
- Profiles in the Information Service use the eXist XML engine
Vocabularies
- Names of languages: ISO 639-2 (three letters, B/T)
- Names of countries: ISO 3166 (two letters)
- Date format: ISO 8601:1988 (E)
DRIVER Aggregation
- Harvesting according to the OAI-PMH protocol
- Adopting OAI-Provenance best practice (OAI-about)
- To be extended to other object models and harvesting protocols
Queries to Search and Index obey the SRW/CQL standard
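Harvesting "according to the OAI-PMH protocol" boils down to issuing `ListRecords` requests and following resumption tokens until none is returned. The sketch below shows only the request building and response parsing. The base URL, the identifiers and the token value are hypothetical, no network call is made, and this is not the DRIVER harvester.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for the first or a follow-up page."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument next to the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    path = ".//oai:record/oai:header/oai:identifier"
    identifiers = [el.text for el in root.findall(path, OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical response page, trimmed to the parts used here.
SAMPLE_PAGE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint
first_url = list_records_url(BASE_URL)
identifiers, token = parse_list_records(SAMPLE_PAGE)
next_url = list_records_url(BASE_URL, resumption_token=token)
print(first_url)
print(identifiers, token)
print(next_url)
```

A real harvester would fetch each URL, repeat until the token comes back empty or missing, and handle protocol errors and retry hints, all of which are omitted here.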

Enabling Layer Developments
Service | Function | Task | Partner | Status | D-NET
IS-Store | Resource profile store | Enhanced Port (PERL > JAVA) | CNR | RC | 1.1
IS-S&N | W3C S&N/Topics | Enhanced Port (PERL > JAVA) | CNR | RC | 1.1
IS-Lookup | Resource discovery | Enhanced Port (PERL > JAVA) | CNR | RC | 1.1
IS-Registry | Resource registration/de-registration/update | Enhanced Port (PERL > JAVA) | CNR | RC | 1.1
Manager | Orchestration of DRIVER Info Space | Enhanced Port (PERL > JAVA) | CNR | RC | 1.1
Authn&Authz | Service-2-Service secure interaction/multiple applications | Enhanced Service (JAVA) | ICM | Proto | 2.0
Monitoring | Admin User Interface and autonomic administration | Novel Service (JAVA) | CNR | RC | 1.2

Data-Layer Developments
Service | Function | Task | Partner | Status | D-NET
Harvester | Collects arbitrary formats | Port (PERL > JAVA) | UniBi/CNR | Alpha | 2.0
Transformator | Eases arbitrary mappings | Novel service (JAVA) | UniBi/CNR | Alpha | 2.0
Feature Extraction | Executes transform.s. and utilities | Novel service (JAVA) | UniBi | Alpha | 2.0
Text-Engine | Utilities, e.g. language detection, full-text-extr. | Novel service (JAVA) | UniBi | Alpha | 1.1
MD-Store | Support special MD operations | Port (PERL > JAVA) | UniBi | Alpha | 1.1
Store | Generic store for binaries | Novel service (JAVA) | UniBi/ICM/CNR | Proto | 2.0
Index | Lookup table for stored information | Adapt from YADDA | ICM/UniBi | Prod. | 1.0
OAI-ORE Publisher | Exposure of stored information | Novel service (JAVA) | CNR | Spec. | 2.0
OAI-PMH Publisher | Exposure of stored information | -- | CNR | Prod. | 1.0
Content Service | Managing complex objects | Novel service (JAVA) | CNR | Proto | 2.0
Access Service | Generic service for using remote objects | Novel service (JAVA) | CNR | Proto | 2.0

Functional Layer Developments
Service | Function | Task | Partner | Status | D-NET
AID | Enhanced Publications management | Novel Service (JAVA) | NKUA | Spec. | 2.0
Advanced search | Optimized Search / Similarity Search | Enhanced Service (JAVA) / Novel Service (JAVA) | NKUA / ICM | Spec. | 2.0
User Services | Advanced personalization | Enhanced Service (JAVA) | NKUA | Spec. | 2.0
Community Service | Advanced Community management | Enhanced Service (JAVA) | NKUA | Spec. | 2.0
Web Interface | Generic to data model and services; Enhanced UIs | Enhanced Service (JAVA) | NKUA | Spec. | Spec

Current Work: DRIVER-II
- Networking
  - Confederation with who-is-who advisory board
  - Outreach: LIBER, SPARC, US, JAPAN etc…
- Consolidation DRIVER-I Services
  - Packaged and performing in production quality
- Enhancement DRIVER-I Services
  - Improved indexing and data aggregation functionalities
- DRIVER-II Services: D-NET v2.0
  - Enhanced publication management and functionality

DRIVER II – D-NET v2.0
Studies
- What are „Enhanced Publications“? >> PDF
- Technologies for „Enhanced Publications“ >> PDF
- Long-Term Preservation of „Enhanced Publications“
- „Technology Watch“: the Future >> PDF
Demonstrators
- „Enhanced Publications“ >> Live
- „Enhanced Publications“ Long-Term Preserv. >> Film
Infrastructure
- Specs. ready, Development in progress >> WIKI
- D-NET v1.1: Java-Porting & Build-System
- D-NET v1.2: New Aggregator, Installer (, Contracts)
- D-NET v2.0: Compound Object Management

Outlook: Enhanced Publications

Based on OAI-ORE

The Web-Capable Model – OAI-ORE
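The core of OAI-ORE is small: a Resource Map describes an Aggregation, and the Aggregation aggregates the parts of a compound object. The sketch below writes that structure down as plain RDF triples for an enhanced publication made of a text and a dataset. All `example.org` URIs and the helper names are hypothetical, and no RDF library is used.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"  # OAI-ORE vocabulary namespace


def resource_map_triples(rem_uri, aggregation_uri, part_uris):
    """Return (subject, predicate, object) triples for a minimal ORE Resource Map."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
        (rem_uri, ORE + "describes", aggregation_uri),
    ]
    for part in part_uris:
        # Each part of the compound object is an aggregated resource.
        triples.append((aggregation_uri, ORE + "aggregates", part))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join("<%s> <%s> <%s> ." % t for t in triples)


# Hypothetical enhanced publication: one article plus one dataset.
triples = resource_map_triples(
    "https://example.org/ep/1/rem",
    "https://example.org/ep/1",
    ["https://example.org/ep/1/article.pdf", "https://example.org/ep/1/data.csv"],
)
print(to_ntriples(triples))
```

The Resource Map and the Aggregation get separate URIs because the first is a document about the second; that separation is what makes the compound object addressable on the web.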

The Document Model for DRIVER

The Object Model – Internal Processing
- Primitives: Types, Sets and Objects
- Object: atoms, descriptions, relations
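The slide names only the primitives, so the sketch below is one possible reading and not the D-NET data model. It assumes that an object has a type, carries atoms (payloads such as files), descriptions (metadata) and relations (labelled links to other objects), and that sets group objects. All class, field and relation names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectType:
    """A named type, e.g. 'article' or 'dataset' (names are illustrative)."""
    name: str


@dataclass
class InfoObject:
    identifier: str
    object_type: ObjectType
    atoms: list = field(default_factory=list)         # payloads, e.g. file references
    descriptions: dict = field(default_factory=dict)  # metadata about the object
    relations: list = field(default_factory=list)     # (label, target identifier) pairs

    def relate(self, label, target):
        """Record a labelled relation from this object to another object."""
        self.relations.append((label, target.identifier))


@dataclass
class ObjectSet:
    name: str
    members: dict = field(default_factory=dict)  # identifier -> InfoObject

    def add(self, obj):
        self.members[obj.identifier] = obj

    def of_type(self, object_type):
        """Return the members that have the given type."""
        return [o for o in self.members.values() if o.object_type == object_type]


ARTICLE = ObjectType("article")
DATASET = ObjectType("dataset")

article = InfoObject("obj:1", ARTICLE, atoms=["article.pdf"],
                     descriptions={"title": "An example article"})
dataset = InfoObject("obj:2", DATASET, atoms=["data.csv"])
article.relate("hasDataset", dataset)

publications = ObjectSet("enhanced-publications")
publications.add(article)
publications.add(dataset)
print(article.relations)
print([o.identifier for o in publications.of_type(DATASET)])
```

Storing relations as identifier pairs rather than object references keeps each object serialisable on its own, which suits a system where objects live in different stores.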

The DRIVER-application

Compound Object Management
Diagram labels: Object Instances; DRIVER Processing; DRIVER Application; Web-Representation; Web-Processing

Conclusion

Lessons learnt
- Distributed data infrastructure requires links between organisational and technical concepts
  - Data specialists, computer scientists, service providers
  - Guidelines / content policies as a „glue“
- In distributed data provision, quality and access measures are the most ‚expensive‘ tasks
- Distributed service operation (not data provision) can be solved but raises novel questions (SLAs)
- „Infrastructure“ for novel paradigms of scholarly communication is hard to get across ;-)

Summary
- DRIVER tackles the data infrastructure challenge from the text-repository side (mostly OAI-PMH)
- DRIVER handshakes with primary & secondary data through „enhanced publications“
- DRIVER isn‘t only a project but a forum for information specialists
- ‚Products‘ include: studies, infrastructure run-time system in production, software, support …
- DRIVER has addressed many problems for data and service interoperability in a distributed repository environment and found some solutions

But… How could DRIVER link to serious processing of unstructured data?

Thanks