Towards Data Attribution & Citation in the Life Sciences
Philip E. Bourne, UCSD
8/22/11, Data Attribution and Citation
Life Science Data Repositories
NLM is the elephant in the room. However, there are thousands of community-maintained efforts – all want an NAR publication
The ability to cite and attribute the data is highly variable:
– DOIs assigned in some cases, but not used
– Attribution is through the metadata in most cases
– Citation is typically by the associated literature reference, if it exists, and/or a database identifier
– The use of data repositories such as Dryad is compelling for the long tail problem
– Data journals are on the horizon
Consider the PDB as a Use Case
Oldest data resource in biology?
A resource used by ~200,000 individuals per month – increasing number of school kids!
A resource distributing worldwide the equivalent of ¼ of the Library of Congress each month
A bicoastal/worldwide resource
1TB
PDB Typical Growth Curve – But the Complexity!
[Figure: number of released entries by year]
People are doing more with the data
The numbers of visits and page views are growing faster than the number of unique visitors
The Data May Save Lives?
[Figure: Structure Summary page activity for H1N1 influenza-related structures; time axis in six-month ticks starting Jan. 2008]
* RUZ: 1918 H1 Hemagglutinin
* 3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir
PDB Data Attribution and Citation
About 25% of our budget has been spent on data remediation – multiple versions supported – the copy of record (as defined by the publication) is always available
Can't publish unless data are deposited – motivated by the community – very good data-to-publication correspondence
Data objects are discrete and we assign DOIs – but they are not used – database identifiers preferred
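To make the gap between DOIs and database identifiers concrete, here is a minimal sketch of how a four-character PDB identifier could be turned into a citable DOI. It assumes the wwPDB convention of the form 10.2210/pdbXXXX/pdb; the function names `pdb_doi` and `doi_url` are hypothetical helpers for illustration, not part of any PDB software.

```python
import re

# Classic PDB identifiers: four characters, the first one a digit.
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def pdb_doi(pdb_id: str) -> str:
    """Build a DOI string for a PDB entry.

    Assumption: entry DOIs follow the pattern 10.2210/pdb<id>/pdb
    with the identifier in lower case.
    """
    pdb_id = pdb_id.strip()
    if not _PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB identifier: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


def doi_url(doi: str) -> str:
    """Return the resolver URL for a DOI via the doi.org proxy."""
    return f"https://doi.org/{doi}"


if __name__ == "__main__":
    # 3B7E is the neuraminidase entry mentioned on an earlier slide.
    print(doi_url(pdb_doi("3B7E")))
```

Because the mapping is mechanical, a citation tool could accept the database identifier authors already prefer and emit the DOI alongside it, rather than asking authors to change habits.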
Ah yes.. But the CD4 Story…
The Knowledge and Data Cycle
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
Literature/Data Integration
1. User clicks on content
2. Metadata and web services to data provide an interactive view that can be annotated
3. Selecting features provides a data/knowledge mashup
4. Analysis leads to new content I can share
PLoS Comp. Biol (3) e34
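One small piece of this literature/data integration can be sketched in code: scanning stored paper text for PDB-style identifiers and attaching a link back to each entry. This is only an illustration under stated assumptions, not the system described on the slide. The names `find_pdb_ids` and `link_entries` are hypothetical, and the URL pattern https://www.rcsb.org/structure/<ID> is assumed for the entry pages.

```python
import re

# Candidate tokens: four characters, starting with a digit.
_CANDIDATE = re.compile(r"\b[0-9][A-Za-z0-9]{3}\b")


def find_pdb_ids(text: str) -> list[str]:
    """Return PDB-style identifiers found in text, in order of first appearance.

    Purely numeric tokens (for example years such as 1918) are skipped,
    so this heuristic misses all-digit identifiers and can still pick up
    unrelated tokens that happen to look like one.
    """
    found: list[str] = []
    for token in _CANDIDATE.findall(text):
        if not any(ch.isalpha() for ch in token):
            continue  # skip years and other plain numbers
        token = token.upper()
        if token not in found:
            found.append(token)
    return found


def link_entries(text: str) -> dict[str, str]:
    """Map each identifier found in text to an assumed entry-page URL."""
    return {
        pdb_id: f"https://www.rcsb.org/structure/{pdb_id}"
        for pdb_id in find_pdb_ids(text)
    }


if __name__ == "__main__":
    caption = (
        "Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain "
        "in complex with zanamivir (3B7E)"
    )
    print(link_entries(caption))
```

A production system would rely on curated metadata rather than pattern matching, which is the point the slide makes about metadata and web services.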
Example of Interoperability: The Database View BMC Bioinformatics :220
Example of Interoperability – The Literature View From Anita de Waard, Elsevier
Acknowledgements
Funding Agencies: NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK