The DRIVER initiative for networking repositories
Wolfram Horstmann, Universität Bielefeld
DRIVER motivation
- Scholarly communication is changing towards the distributed provision of text, data and services
- Repositories are seen as a saviour in this development, building such a distributed system
- An infrastructure supporting distributed repositories and services is needed (and reactions)
Question today
- Is an overarching infrastructure bridging between distributed text data and primary/secondary data possible?
- DRIVER has addressed many problems and found many answers in the domain of distributed text repositories
- But we don't know yet whether these are transferable to the data domain
Some observations on data
- The data landscape is very diverse
  - Formats differ widely, unlike text publications
  - Descriptions are often highly subject-specific
  - Some have special provenance (e.g. vendor software)
  - Some require special rendering, education, caution …
- Data require disciplinary support
  - Better managed by researchers than by service providers
- Still, data interoperability is acknowledged
  - Double effort: many data are lost to re-use/remix
  - Good practice in research, also with respect to publications
  - Transparency, falsifiability, testability …
Some observations on repositories
- They represent a shift towards …
  - open internet exposure as opposed to closed databases (graveyards)
  - content orientation as opposed to mere technical orientation (web servers)
  - distributed systems: centralized structures are not immediately required nowadays
Everybody can be a publisher
- Common description standards
  - e.g. Dublin Core Metadata Initiative
  - Many subject-specific standards
- Common transfer protocols
  - e.g. OAI-PMH, but also FTP, XML-RPC, WS, etc.
- Searchability is possible!
- Still: many data are lost to re-use/remix
  - Closed: too sensitive, weakly described, unimportant (???)
  - Missing service frameworks / infrastructures
- Problems: data and service interoperability
- Solution: infrastructure
  - Repositories can solve the access problem
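As an illustration of how a common description standard and a common transfer protocol combine, here is a minimal sketch that parses an OAI-PMH ListRecords response carrying unqualified Dublin Core (oai_dc) using only the Python standard library. The namespace URIs are those defined for OAI-PMH 2.0 and Dublin Core; the sample response, repository and identifier are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces from the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, heavily trimmed ListRecords response from one repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:type>Text</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per record: the OAI header identifier plus DC fields.

    DC elements are repeatable, so every field maps to a list of values.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header_id = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        fields = {}
        if dc is not None:  # deleted records carry a header but no metadata
            for child in dc:
                # ElementTree tags look like "{namespace}local"; keep the local name.
                name = child.tag.split("}", 1)[1]
                fields.setdefault(name, []).append((child.text or "").strip())
        records.append({"header_id": header_id, "dc": fields})
    return records
```

Keeping the OAI header identifier separate from the Dublin Core fields avoids a clash with `dc:identifier`, which repositories typically use for the URL of the item itself.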
What infrastructures are: DRIVER terms
- Not an infrastructure
  - Single repository
  - Single application for search and retrieval (e.g. BASE)
  - Only local operation
  - Backwards causation on repositories is missing
- Maybe an infrastructure
  - Distributed repository landscape as a whole
  - As a capacity for emergent properties, e.g. quality and quantity incentives for data population
  - Nurturing the development of service providers
- Definitely an infrastructure
  - Many service providers in one organisational and technical context (e.g. a run-time environment)
  - Enabling re-use and remix of data and services
DRIVER Objectives
- Organisational structure for repositories
  - e.g. the Confederation
- Improving quality and standards in local repositories
  - e.g. validation procedures
- Building a distributed runtime system
  - e.g. service and data sharing
- Target groups
  - Repository Managers
  - Service Providers
  - Information System Executives
The DRIVER approach is incremental
- Start with publication metadata
  - Existing distributed system, somehow connected
  - Considerable homogeneity of formats: OAI-PMH
- Extend geographical coverage
  - From 5 countries, to 10, to 27, to ???
- Extend towards other contents
  - From publication metadata to enhanced publications, i.e. representations of texts + data
- Learn about subject specificity
  - Data bring in disciplinary requirements
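Starting from publication metadata over OAI-PMH means harvesting repository by repository and page by page. The following is a minimal sketch, assuming a caller-supplied `fetch` function and a hypothetical base URL: it builds ListRecords requests (optionally with the `from` argument for selective, incremental harvesting) and follows `resumptionToken` until a repository returns an empty one. Error handling, for example for OAI-PMH error codes such as `noRecordsMatch`, is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, token=None, from_date=None, prefix="oai_dc"):
    """Build a ListRecords request URL for one page of results."""
    if token:
        # resumptionToken is an exclusive argument: only 'verb' may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
        if from_date:
            params["from"] = from_date  # only records changed since this datestamp
    return base_url + "?" + urlencode(params)


def harvest(base_url, fetch, from_date=None):
    """Yield OAI record identifiers, following resumption tokens page by page.

    fetch(url) -> str is supplied by the caller (e.g. a thin wrapper around
    urllib.request.urlopen); injecting it keeps this sketch testable offline.
    """
    token = None
    while True:
        url = list_records_url(base_url, token=token, from_date=from_date)
        root = ET.fromstring(fetch(url))
        for ident in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS):
            yield ident.text
        token_el = root.find(".//oai:resumptionToken", OAI_NS)
        token = token_el.text if token_el is not None else None
        if not token:  # absent or empty token: the list is complete
            break
```

Passing the date of the last successful harvest as `from_date` is what lets an aggregator grow incrementally instead of re-fetching every repository in full.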
The DRIVER Initiative
- DRIVER-I, 6/2006 – 11/2007: Organisational models and technical test-bed
- DRIVER-II, 12/2007 – 11/2009: Running organisation and production infrastructure
- DRIVER-Confederation, 2010ff: Operations office and technical deployment
- NB: DRIVER is not an authoritative body; it is a liberal bottom-up initiative of stakeholders
DRIVER partners and related projects
- Networking, support, policy, studies
  - Göttingen, Nottingham, SURF, Ghent, Ljubljana, Minho, Copenhagen
- Technical development and deployment
  - Athens, Bielefeld, Pisa, Warsaw
- Partners make links to many other things
  - OA services: Sherpa-ROMEO, OpenDOAR, BASE …
  - Projects: Europeana, PEER, DELOS, DL.org, D4Science, PARSE-Insight, NESTOR …
  - Orgs: DINI, JISC, LIBER, SPARC, KE …
  - Platforms: DSpace/Fedora/OPUS/EPrints
Some Results: Studies
Some Results: A Portal
Some Results: A Search
Some Results: Repository Registration
Some Results: Guidelines
- Built on knowledge from past and current IR projects (EU)
- 26 actively involved contributors (experts and repository managers) from 8 countries
- Practical answers on how to:
  - Improve full-text access
  - Standardize metadata quality
  - Create a reliable infrastructure for permanent identification, resolution, traceability and storage
  - Resolve semantic and classification issues
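Validation procedures of this kind lend themselves to automation. The sketch below checks one harvested Dublin Core record, given as a dict mapping element names to lists of values, against a small rule set. The rules themselves (required elements, ISO 8601 dates, a URL in `dc:identifier` as a proxy for full-text access) are illustrative assumptions, not the content of the DRIVER Guidelines.

```python
import re

# Hypothetical rule set for illustration only; not the DRIVER Guidelines.
REQUIRED = ("title", "creator", "date", "type", "identifier")
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # YYYY, YYYY-MM or YYYY-MM-DD


def validate(record):
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    # Every required element must be present with at least one non-blank value.
    for name in REQUIRED:
        values = [v for v in record.get(name, []) if v.strip()]
        if not values:
            problems.append(f"missing dc:{name}")
    # Dates must follow the restricted ISO 8601 forms above.
    for d in record.get("date", []):
        if not ISO_DATE.match(d):
            problems.append(f"dc:date not ISO 8601: {d!r}")
    # At least one identifier should be a resolvable URL (full-text access).
    ids = record.get("identifier", [])
    if ids and not any(i.startswith(("http://", "https://")) for i in ids):
        problems.append("no resolvable URL in dc:identifier (full-text access)")
    return problems
```

Returning a list of problems rather than a single pass/fail flag makes the result usable as feedback to repository managers, which is where quality improvements actually have to happen.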
Some Results: Support structures
Some Results: Repositories
- 185+ harvested repositories
- 21 countries
- 856,264+ documents
Some Results: Service-Oriented Architecture
- 9 hosting nodes
- 25+ functionality typologies (services)
- 36 service instances
- 3 applications: DRIVER Main, Belgium, Spain-Recolecta
Some Results: Runtime-System & Hosting
[Figure: layered architecture with a Data Layer (EU Open Access Repositories), an Enabling Layer and a Functionality Layer, serving administrators and end users through advanced user interfaces, national portals and project applications]
Some Results: Software
- Meant for large service providers only!
Current Work: DRIVER-II
- Networking
  - Confederation with a who-is-who advisory board
  - Outreach: LIBER, SPARC, US, Japan etc. …
- Consolidation
  - DRIVER-I services packaged and performing at production quality
- Enhancement
  - DRIVER-I services: improved indexing and data aggregation functionalities
  - DRIVER-II services: enhanced publication management and functionality
Outlook: Enhanced Publications
Based on OAI-ORE
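In OAI-ORE, an enhanced publication can be expressed as an Aggregation that is described by a Resource Map and that aggregates the text together with its associated data. A minimal sketch with hypothetical URIs, emitting the core ORE relations as N-Triples; a complete Resource Map needs further metadata (the ORE specification asks for, e.g., a creator and modification date of the Resource Map) and would normally be produced with an RDF library rather than by string formatting.

```python
from dataclasses import dataclass, field

ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class EnhancedPublication:
    rem_uri: str          # the Resource Map document that describes the aggregation
    aggregation_uri: str  # the enhanced publication as a whole
    parts: list = field(default_factory=list)  # URIs of text and data resources

    def triples(self):
        """Yield (subject, predicate, object) URI triples for the core ORE model."""
        yield (self.rem_uri, RDF_TYPE, ORE + "ResourceMap")
        yield (self.rem_uri, ORE + "describes", self.aggregation_uri)
        yield (self.aggregation_uri, RDF_TYPE, ORE + "Aggregation")
        yield (self.aggregation_uri, ORE + "isDescribedBy", self.rem_uri)
        for part in self.parts:
            yield (self.aggregation_uri, ORE + "aggregates", part)

    def to_ntriples(self):
        """Serialize as N-Triples: one '<s> <p> <o> .' statement per line."""
        return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in self.triples())
```

Because the aggregated parts are plain URIs, a text held in an institutional repository and a dataset held in a disciplinary archive can be combined without either repository changing how it stores its content.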
Lessons learnt
- Distributed data infrastructure requires links between organisational and technical concepts
  - Data specialists, computer scientists, service providers
  - Guidelines / content policies as a glue
- In distributed data provision, quality and access measures are the most expensive tasks
- Distributed service operation (not data provision) can be solved but raises novel questions (SLAs)
- Infrastructure is a very tough concept to get across and eventually forms a complex system
  - Simplification makes it weaker, e.g. re-use is restricted
Summary
- DRIVER tackles the data infrastructure challenge from the text-repository side (mostly OAI-PMH)
- DRIVER handshakes with primary and secondary data through enhanced publications
- DRIVER isn't only a project but a forum for information specialists
- Products include: studies, an infrastructure run-time system in production, software, support …
- DRIVER has addressed many problems of data and service interoperability and found solutions
- What are the required steps to support data?
Thanks