Arnold H. Rots & Sherry L. Winkelman
Chandra Data Archive
Smithsonian Astrophysical Observatory
Rots & Winkelman - IAU XXIX 2015, FM3
Context
For the Chandra Data Archive we have:
- A complete bibliography for the entire mission, covering more than 50 journals
  - With complete linking to the individual datasets in the archive using persistent identifiers
  - With extensive bibliographic metadata, which are paired with the datasets’ metadata in the archive
  - Allowing custom compound datasets and submission of higher-level data products
- Bidirectional harvesting of links with the ADS
Bibliography Objectives
More or less in chronological order:
- Complete the record of a mission or observatory
  - Usually entrusted to the librarians: a compilation of articles selected on the basis of a carefully defined set of criteria
- Provide observatory performance metrics
  - Extracted from the existing bibliography: basically numbers of papers and numbers of citations
- Provide useful research information
  - This requires an additional skill set
Number of Chandra Science Papers
Average Citation Count per Article
Limited Information
- These metrics can only provide limited information
- We know the character of the papers changes as the mission ages
- They may be enhanced by impact factors
- But they don’t provide information on how, and how well, the repository’s data are being used
Enter the Science
Turning the bibliography into a research tool requires one more component:
- Linking articles to (individual!) datasets
The crucial benefit:
- It allows linking bibliographic metadata with observational metadata
Consequently:
- A rich parameter space for scientific data mining
- Opportunities for more informative performance metrics
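The linking step can be pictured as a simple join between two metadata tables. The sketch below is only an illustration under stated assumptions: the record layout, field names, and identifiers (`bib-A`, `obs-1`, and so on) are hypothetical and are not the Chandra Data Archive's actual schema.

```python
# Hypothetical records; field names, identifiers, and values are illustrative only.
articles = {
    "bib-A": {"pub_year": 2003, "journal": "ApJ"},
    "bib-B": {"pub_year": 2005, "journal": "MNRAS"},
}
datasets = {
    "obs-1": {"obs_year": 2001, "exposure_ks": 50.0, "instrument": "ACIS"},
    "obs-2": {"obs_year": 2002, "exposure_ks": 20.0, "instrument": "HRC"},
}
# Each link pairs one article with one individual dataset.
links = [("bib-A", "obs-1"), ("bib-B", "obs-1"), ("bib-B", "obs-2")]


def join_links(articles, datasets, links):
    """Return one merged row per (article, dataset) link."""
    rows = []
    for bibcode, obsid in links:
        row = {"bibcode": bibcode, "obsid": obsid}
        row.update(articles[bibcode])   # bibliographic metadata
        row.update(datasets[obsid])     # observational metadata
        rows.append(row)
    return rows


rows = join_links(articles, datasets, links)
for row in rows:
    print(row)
```

Each resulting row carries both kinds of metadata, so a quantity such as publication year minus observing year can be computed per link.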
New Parameters for Metrics
Some of the most useful pieces of information:
- Observing date and publication date
- Exposure time
- Instrument
- Observing mode
The next three slides show what the additional information allows one to do
Time till First Publication
Continued Publication
Percentage of Exposure Time Published
Getting Back to this Graph:
The next slides show what else can be learned
Amount of Unique Exposure Time (in ks) Published Annually
Percentage of Available Exposure Time Published Annually
The Answer is…
… 42
Regarding articles with archival content, in 2013:
- 20% of papers presented new data
- 20% of papers presented a mix of new and old data
- 60% of papers presented data previously published
Suggested Metrics
(These metrics are indicated in slide 11, "Percentage of Exposure Time Published")
- Median time τ till first publication
- Percentage of exposure time published at 4τ
- Percentage of exposure time published more than 5 times at 5τ
  - Or: time delay after which 50% of exposure time has been published more than 5 (TBD) times
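A minimal sketch of how these metrics might be computed, assuming each dataset is reduced to its exposure time and the delays (in years) between observation and each linked paper. The numbers are made up for illustration and are not Chandra results. The function names are hypothetical. The sketch also ignores that datasets observed less than 4τ or 5τ ago should arguably be left out of the denominator.

```python
from statistics import median

# Hypothetical datasets: (exposure in ks, delays in years from observation
# to each linked paper). Values are illustrative, not archive results.
datasets = [
    (10.0, [1.0, 3.0]),
    (20.0, [2.0]),
    (30.0, [3.0, 4.0, 5.0]),
    (40.0, []),  # never published
]


def median_tau(datasets):
    """Median delay till first publication, over datasets published at least once."""
    return median(min(delays) for _, delays in datasets if delays)


def pct_exposure_published(datasets, t, more_than=0):
    """Percentage of total exposure time published more than `more_than` times within delay t."""
    total = sum(exposure for exposure, _ in datasets)
    published = sum(
        exposure
        for exposure, delays in datasets
        if sum(1 for d in delays if d <= t) > more_than
    )
    return 100.0 * published / total


tau = median_tau(datasets)
print("tau (years):", tau)
print("% published at 4*tau:", pct_exposure_published(datasets, 4 * tau))
# The slide proposes "more than 5 times"; the toy data uses a smaller threshold.
print("% published more than 2 times at 5*tau:",
      pct_exposure_published(datasets, 5 * tau, more_than=2))
```

Keeping the publication-count threshold as a parameter reflects the "TBD" on the slide: the same function covers "published at all" (`more_than=0`) and "published more than 5 times" (`more_than=5`).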
Differentiate
It is now trivial to calculate these metrics for different types of instruments and observations
We found, for instance:
- That the percentage of exposure time published is not the same for different instruments
- Grating observations have a longer median time till first publication: analysis takes more work
- DDT observations have a shorter median time: they are hot subjects, and have shorter, or no, proprietary time
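Once the delay to first publication is attached to each observation, splitting a metric by category amounts to a group-by. The sketch below assumes a flat list of (category, delay) pairs. The category labels and delays are invented for illustration and do not reproduce the findings above.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (category label, delay in years till first publication).
observations = [
    ("imaging", 1.5), ("imaging", 2.0), ("imaging", 2.5),
    ("grating", 2.5), ("grating", 3.5),
    ("DDT", 0.5), ("DDT", 1.0), ("DDT", 1.5),
]


def median_delay_by_group(observations):
    """Group delays by category label and return the median delay per label."""
    groups = defaultdict(list)
    for label, delay in observations:
        groups[label].append(delay)
    return {label: median(delays) for label, delays in groups.items()}


medians = median_delay_by_group(observations)
for label, value in medians.items():
    print(label, value)
```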
Caveats
However,
- Metrics should only be considered in context
  - 93% of exposure time published is great for a pointed telescope like Chandra, but cannot be expected (and was never intended) for an all-sky monitor
- It takes resources, until we get better text analysis tools
- It needs to be a collaboration between librarians and scientists, which is what we are arguing for: both will benefit and it’s worth the effort
As a Research Tool
There are a host of other things that would be extremely helpful:
- Increasing bibliographic metadata through text analysis
- Devising a mechanism that allows variable granularity in PIDs and data retrieval
- Encouraging users to incorporate PIDs in manuscripts
But that is a different story (told in poster DB1.05)