The Role of Abstract and Citation Databases in Supporting Data Repositories
DataCite Workshop: Möglichkeiten und neue Lösungen im Forschungsdatenmanagement (Opportunities and New Solutions in Research Data Management), Köln, 12/12/2012
Michael Habib, MSLS, Product Manager, Scopus
Broadest source for research answers
A rich and extended coverage including:
19,804 active titles: 18,819 peer-reviewed journals; 405 trade journals; 332 book series; 248 conference series
49M total records: 28M post-1995 records; 21M pre-1996 records
5.3M total conference records (10%): 6.5k pre-1996 conference events; 10.5k post-1995 conference events
844k book items
More than 5,000 publishers
1,900 Open Access journals
“Articles in Press” from more than 3,850 titles
Abstracts going back to …; … languages covered
2 million new records are added each year via daily updates
Total average processing time: 5 days
Breadth of coverage across subject areas
More than 19,500 titles in Scopus; titles can be in more than one subject area.
Health Sciences, 6,300 (100% Medline): Nursing, Dentistry, etc.
Social Sciences, 6,350: Psychology, Economics, Business, A&H, etc.
Life Sciences, 4,050: Neuroscience, Pharmacology, Biology, etc.
Physical Sciences, 6,600: Chemistry, Physics, Engineering, etc.
Broader coverage than nearest peer: Scopus (total: 19,809), Web of Science (total: 12,311). [Figure: Scopus added value]
Broadest coverage of quality global content including Asia and emerging countries … [Figure: Scopus compared with nearest competitor] Elsevier constitutes approximately 15% of titles in Scopus.
More expansive coverage does not mean lower standards: Scopus Content Selection & Advisory Board (CSAB)
Scopus selection criteria
Journal policy: Convincing editorial concept/policy; Level of peer-review; Diversity in geographic distribution of editors; Diversity in geographic distribution of authors
Quality of content: Academic contribution to the field; Clarity of abstracts; Quality and conformity with stated aims & scope; Readability of articles
Journal standing: Citedness of journal articles in Scopus; Editor standing
Regularity: No delay in publication schedule
Online availability: Content available online; English-language journal home page; Quality of home page
Minimum criteria: Peer-review; English abstracts; Regular publication; References in Roman script; Publication ethics statement
Titles reviewed (n=2,279, January 2011 – 15 May 2012): 2,279 titles reviewed, of which 41% accepted. [Figure: number of titles reviewed and acceptance rate]
High importance but not easily accessible (Researchers, N = 3,824; study by Publishing Research Consortium, 2010)
What is DataCite?
– establish easier access to research data on the Internet
– increase acceptance of research data as legitimate, citable contributions to the scholarly record
– support data archiving that will permit results to be verified and re-purposed for future study.
From: http://datacite.org/whatisdatacite (emphasis my own)
Supplementary Material
Pros: Coupling of data and article; Peer review; Preservation (byte-wise); Citation mechanism
Cons: Limited data type support; Compatibility (format support); Limited capacity; Data not centrally stored
Connecting with Data Repositories
Supplementary material is not a perfect solution.
Many poor solutions in use: data on PCs, university websites, personal homepages, ...
Data repositories: the community’s answer?
– Scientists prefer independent data repositories over publishers
– Domain-specific coordination
– Centralized information “hubs”
“Raw data should be freely accessible to researchers”: “... believe that, as a general principle, data sets, raw data outputs of research, and sets or subsets of that data should wherever possible be made freely accessible to other scholars...” (Statement from STM & ALPSP, June 2006)
ScienceDirect Examples

| Database | Subject | Type of Linking |
|---|---|---|
| CCDC | Crystallography | Article-level |
| PANGAEA | Earth Sciences | Article-level* |
| EMBL Molecular Interactions | Chemistry | Entity, tagging |
| Molecular INTeraction DB | Chemistry | Entity, tagging |
| Genbank | Nucleotides | Entity, tagging |
| UniProt | Proteins | Entity, tagging |
| Protein Data Bank | Proteins | Entity, tagging |
| ClinicalTrials | Medicine | Entity, tagging |
| TAIR (Arabidopsis) | Model organism | Entity, tagging |
| Mendelian Inheritance in Man | Genetics, inheritance | Entity, tagging |

*: with Application
PANGAEA Supplementary Data
What is DataCite?
– establish easier access to research data on the Internet
– increase acceptance of research data as legitimate, citable contributions to the scholarly record
– support data archiving that will permit results to be verified and re-purposed for future study.
From: http://datacite.org/whatisdatacite (emphasis my own)
Scopus Example
High importance but not easily accessible (Researchers, N = 3,824; study by Publishing Research Consortium, 2010)
Scopus priorities moving forward
1. Pilot with specific community of authors, publishers, and data repositories, to try and change behaviours (in concept phase)
2. Track, count, and analyze citations to Documents as proof of Data impact (research needs to be done)
3. Establish links from Scopus Document Records to related Data sets to improve discovery (PANGAEA first step, looking to expand)
4. Ingest and index Data Repository (DataCite) records and enable searching from Scopus (the future)
5. Track Citations from Documents to Data sets (the more distant future)
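As a rough illustration of what priorities 3 and 5 might involve, the sketch below counts how many documents cite each data set by matching DOIs found in reference lists against a list of known data set DOIs. It is a minimal sketch under stated assumptions, not how Scopus or DataCite work: `DocumentRecord`, `normalize_doi`, and `count_dataset_citations` are hypothetical names, and the DOIs in the usage lines are placeholders.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DocumentRecord:
    """Hypothetical stand-in for a document record in an abstract database."""
    doc_id: str
    cited_dois: list[str] = field(default_factory=list)


def normalize_doi(doi: str) -> str:
    """DOIs are case-insensitive, so compare them in lower case."""
    return doi.strip().lower()


def count_dataset_citations(documents, dataset_dois):
    """Count, per data set DOI, how many documents cite it at least once."""
    known = {normalize_doi(d) for d in dataset_dois}
    counts = Counter()
    for doc in documents:
        # A set ensures repeated references within one document count once.
        cited = {normalize_doi(d) for d in doc.cited_dois}
        for doi in cited & known:
            counts[doi] += 1
    return counts


# Placeholder DOIs, for illustration only.
docs = [
    DocumentRecord("doc-1", ["10.1234/DATA.A", "10.1234/article.x"]),
    DocumentRecord("doc-2", ["10.1234/data.a", "10.1234/data.a", "10.1234/data.b"]),
]
citation_counts = count_dataset_citations(
    docs, ["10.1234/data.a", "10.1234/data.b", "10.1234/data.c"]
)
```

Counting each citing document once per data set mirrors how article citations are usually tallied. A real pipeline would also need to handle data sets that are cited without a DOI.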
Thank you
Michael Habib, MSLS, Product Manager, Scopus