The Discovery Landscape in Crystallography Monica Duke, UKOLN, University of Bath, UK – eBank UK project UKOLN: a centre of expertise in digital information management eBank/R4L/Spectra workshop, London, UK, 20th October 2006
Overview How is the discovery landscape shaping up in crystallography? What are the potential problems for discovery? Where do digital library technologies fit into the infrastructure? Questions, questions
Where are we now? How did we get here? Small number Tightly-managed Trusted Independent Distributed 1:1 communication Agreed formats Datalinks to journal articles Open access journals Individuals putting materials on websites e ChemRefer DAREnet OAIster PRE-WEB WEB SEMANTIC WEB?
Dimensions of repositories or services Management individual initiative institutional professional society Procedures Level of control Policies Formality Documentation Comprehensive Coverage subject national International Protocols
Discovery Dilemmas Services for human consumption Differences in web interface and searching capabilities With increase in numbers, user cannot search each individually Incomplete support for automated information exchange/agents
Digital Library Infrastructures & technologies Data providers Service providers Harvesting based on OAI-PMH
The OAI-PMH OAI Protocol for Metadata Harvesting simple protocol for sharing metadata records between applications currently at version 2.0 based on HTTP, XML, XML Schema and XML namespaces allows a harvester to ask a remote repository for some or all of its metadata records where some is based on date-stamps, sets, metadata formats
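As a rough illustration of selective harvesting, the sketch below builds an OAI-PMH 2.0 `ListRecords` request URL. The verb and the `metadataPrefix`, `from`, `until` and `set` arguments come from the protocol; the base URL, the set name and the helper function `build_list_records_url` are hypothetical and only serve the example.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Optional selective-harvesting arguments: date-stamps and sets.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint and set name, for illustration only.
url = build_list_records_url("http://repository.example.org/oai",
                             from_date="2006-01-01",
                             set_spec="crystallography")
print(url)
```

A harvester would issue this request over HTTP and parse the XML response; leaving out the optional arguments asks for all records in the given metadata format.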
Metadata in the eBank UK project Simple Dublin Core Intended for resource discovery Compatible with OAI-PMH Qualified to specify vocabularies Refinements: aid interpretation of element value E.g. seafood
Metadata terms Creator Rights Date Type Identifier Specified using XML schema and documented using an Application Profile Subject InChI ChemicalFormula Organic
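To make the idea of qualified terms more concrete, here is a minimal sketch of a record that carries the terms listed above. The Dublin Core namespace is the standard one; the `EBANK_NS` namespace, the `scheme` attribute and all field values are assumptions made up for illustration and are not the project's actual schema or application profile.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"    # Dublin Core element set
EBANK_NS = "http://example.org/ebank/terms/"  # hypothetical placeholder


def build_record(dc_fields, chem_subjects):
    """Return an XML element with simple DC fields and qualified subjects."""
    record = ET.Element("record")
    for name, value in dc_fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    # Each chemistry-specific value becomes a dc:subject whose (assumed)
    # 'scheme' attribute names the vocabulary used to interpret it.
    for scheme, value in chem_subjects.items():
        subject = ET.SubElement(record, f"{{{DC_NS}}}subject")
        subject.set("scheme", EBANK_NS + scheme)
        subject.text = value
    return record


# Invented sample values (benzene), for illustration only.
record = build_record(
    {"creator": "A. Researcher", "date": "2006-10-20", "type": "Dataset"},
    {"ChemicalFormula": "C6H6",
     "InChI": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"},
)
print(ET.tostring(record, encoding="unicode"))
```

The point of the qualification is that a service provider can tell a chemical formula from an InChI string without guessing from the value itself.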
OAI-PMH unsolved problems Partial solution – infrastructure needs to encompass other technical solutions Immature experience of service provider models Selection: identification of repositories of interest and subset of content therein Duplication of resources Metadata quality: what makes good metadata and how to generate (consistently)
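The duplication problem can be sketched as follows: a service provider that harvests from several repositories may see the same resource more than once. The example below is a naive approach that merges records sharing a normalised identifier. The record layout, the sample data and the normalisation rule (trim and lower-case) are assumptions; real identifiers may need more careful matching.

```python
def deduplicate(records):
    """Keep the first record seen per normalised identifier, noting sources."""
    seen = {}
    for record in records:
        key = record["identifier"].strip().lower()
        if key not in seen:
            # Copy the record and start a list of repositories it came from.
            seen[key] = dict(record, sources=[record["source"]])
        elif record["source"] not in seen[key]["sources"]:
            seen[key]["sources"].append(record["source"])
    return list(seen.values())


# Invented harvest results from two hypothetical repositories.
harvested = [
    {"identifier": "doi:10.1000/xyz", "title": "Structure A", "source": "repoA"},
    {"identifier": "DOI:10.1000/XYZ ", "title": "Structure A", "source": "repoB"},
    {"identifier": "doi:10.1000/abc", "title": "Structure B", "source": "repoA"},
]
merged = deduplicate(harvested)
print(len(merged), merged[0]["sources"])
```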
Questions, questions How can these resources be joined up to offer useful services to users? What is the role of OAI-PMH? What other interfaces need to be considered? Who are the communities of users of crystallography data? How can they be defined and described? What is a useful service?
Questions, questions Do users have overlapping (information) needs, interest in common (subsets of) sources? How can information needs be identified and described? What sorts of solutions are appropriate? What are the interface design implications? What discovery tools are already being used? Can tools/services be adapted, do we need new ones? What is the role of publishers?
Information sources for Crystallography Cross-discipline sources OAIster DAREnet Discipline-specific ChemRefer Chemistry Central Crystallography Open Database Reciprocal Net Texts/publications, chemistry general Data, crystallography
The discovery landscape Some within OAI-PMH infrastructure (metadata-based) Variety of (human) search interfaces (simple to advanced) Well established sources Cambridge Structural Database Protein Data Bank
OAIster An OAI-PMH aggregator Wide-ranging and inclusive: Any repository, all content types Metadata from 675 institutions Limit by resource type inc. datasets (5 results) Pointers to collections of data records for crystallography Results spread across several sources
OAIster
DAREnet Worldwide access to Dutch academic research results Simple search: crystallography (40 results) General advanced search (author, year)
DAREnet
ChemRefer Access to full text chemical, pharmaceutical literature Index Simple search interface
ChemRefer
ChemRefer display of results
Chemistry Central No search feature (through BioMed Central)
Crystallography Open Database (COD) Promotes open data Allows submission REF format also used 40K entries
COD
Reciprocal Net A distributed crystallography network for researchers, students and the general public Search engine Crystallography-specific search interface
Reciprocal Net Search Interface
Dataset result in Reciprocal Net
Joining up the landscape Technical infrastructure differences can be overcome Agreement on common APIs, metadata sets Hide API differences from user Survey in one application area – how similar are other disciplines?
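One way to picture "hide API differences from user" is a thin adapter layer: each source is wrapped in a function with the same signature, and a cross-search routine merges the results into one common shape. This is only a sketch under that assumption; the adapter functions below are stand-ins that return invented data and do not call any real service.

```python
from typing import Callable, Dict, List

# Common interface assumed for every source: query string in, list of hits out.
SearchFn = Callable[[str], List[dict]]


def cross_search(query: str, sources: Dict[str, SearchFn]) -> List[dict]:
    """Run one query against every source and return results in one format."""
    results = []
    for name, search in sources.items():
        for hit in search(query):
            results.append({"source": name, "title": hit["title"]})
    return results


# Stand-in adapters with invented results, for illustration only.
def fake_aggregator(query: str) -> List[dict]:
    return [{"title": f"Dataset about {query}"}]


def fake_discipline_db(query: str) -> List[dict]:
    return [{"title": f"{query} structure report"}]


hits = cross_search("crystallography", {
    "aggregator": fake_aggregator,
    "discipline_db": fake_discipline_db,
})
for hit in hits:
    print(hit["source"], "-", hit["title"])
```

Adding a new source then means writing one more adapter, while the user-facing search stays unchanged.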
Issues with cross-search Audiences Who are the user groups? What are their information needs? Selection Identifying subsets of interest Human Interface design Search options Presentation of heterogeneous information
URLs OAIster DAREnet ChemRefer http:// Chemistry Central Crystallography Open Database Reciprocal Net