UKOLN: A centre of expertise in digital information management
Monica Duke, Project Manager, SageCite Project (#sagecite)
Developing Data Attribution and Citation Practices and Standards: An International Symposium and Workshop, August 22-23, 2011
Citation in the domain of disease network modelling
Funded: August 2010 – July 2011
SageCite project overview
Review of data citation (issues, technology)
Understanding the domain
  – Sage Bionetworks partners in project
  – Site visit
  – Documenting processes (workflow tools)
SageCite project overview
Demonstrator
  – Adding support for data citation
  – Using DataCite services
Working with publishers
Benefits analysis: KRDS Taxonomy
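As an illustration of what "adding support for data citation" can involve, the sketch below assembles a human-readable citation string from a handful of dataset metadata fields. It is not the SageCite demonstrator's code. The field names, the `DatasetRecord` class and the `format_citation` helper are hypothetical, and the layout (Creator (PublicationYear): Title. Publisher. Identifier) is assumed from the citation pattern recommended in DataCite's metadata schema documentation. The `doi:` prefix on the identifier is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal metadata for a citable dataset."""
    creators: Tuple[str, ...]  # names as they should appear in the citation
    year: int                  # publication year
    title: str
    publisher: str             # e.g. the repository holding the data
    doi: str                   # bare DOI, without resolver prefix


def format_citation(record: DatasetRecord) -> str:
    """Build a citation string in the assumed DataCite-style layout."""
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.year}): {record.title}. "
        f"{record.publisher}. doi:{record.doi}"
    )


if __name__ == "__main__":
    # All values below are placeholders, not a real dataset or DOI.
    example = DatasetRecord(
        creators=("Example, A.", "Sample, B."),
        year=2011,
        title="Hypothetical curated expression dataset",
        publisher="Example Repository",
        doi="10.5072/example-1",
    )
    print(format_citation(example))
```

Keeping the record immutable reflects the idea that a citation should point at a fixed object. A revised dataset would get its own record and identifier.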
Sage Bionetworks
US-based non-profit organisation
Creating a resource for community-based, data-intensive biological discovery
Community-based analysis is required to build accurate models
Slide by Lara Mangravite, Sage Bionetworks
Sage data and processes
Idealised 7-stage process: Data curation → Statistical QC → Genomic analysis → Network construction → Network analysis → Data mining → Validation
A combination of phenotypic, genetic, and expression data is processed to determine a list of genes associated with diseases.
Different people are responsible for different stages of the modelling process. One person oversees the whole process.
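The seven stages and the division of responsibility described above can be sketched as a small data structure that yields an ordered attribution trail. This is only an illustrative model, not a description of Sage Bionetworks' tooling: `StageRecord`, `attribution_trail`, the "Oversight" label and all contributor names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# The seven stages of the idealised process, in pipeline order.
STAGES = (
    "Data curation",
    "Statistical QC",
    "Genomic analysis",
    "Network construction",
    "Network analysis",
    "Data mining",
    "Validation",
)


@dataclass(frozen=True)
class StageRecord:
    """Who was responsible for one stage (names are placeholders)."""
    stage: str
    responsible: str


def attribution_trail(
    records: Iterable[StageRecord], overseer: str
) -> List[Tuple[str, str]]:
    """Return (stage, person) pairs in pipeline order, then the overseer.

    Raises ValueError if a stage is unknown, duplicated, or missing,
    so that an incomplete trail is never silently reported.
    """
    by_stage = {}
    for record in records:
        if record.stage not in STAGES:
            raise ValueError(f"unknown stage: {record.stage!r}")
        if record.stage in by_stage:
            raise ValueError(f"stage recorded twice: {record.stage!r}")
        by_stage[record.stage] = record.responsible

    missing = [stage for stage in STAGES if stage not in by_stage]
    if missing:
        raise ValueError(f"no contributor recorded for: {missing}")

    trail = [(stage, by_stage[stage]) for stage in STAGES]
    trail.append(("Oversight", overseer))
    return trail


if __name__ == "__main__":
    # Placeholder contributors, one per stage, supplied out of order on purpose.
    records = [
        StageRecord(stage, f"person-{i}")
        for i, stage in reversed(list(enumerate(STAGES, start=1)))
    ]
    for stage, person in attribution_trail(records, overseer="lead"):
        print(f"{stage}: {person}")
```

Recording one contributor per stage is a simplification. A real trail would likely need several people per stage and a way to identify each of them unambiguously.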
Stage 1: Data Curation
  – Basic data validation to ensure integrity and completeness
  – Datasets include microarray data and clinical data
  – Ensures that the format of the data is understood and the required metadata is present
Data curation → Statistical QC → Genomic analysis → Network construction → Network analysis → Data mining → Validation
Agreeing standards to support sharing
Derry J et al. Developing predictive Molecular Maps of Human Disease through Community-based Modeling. npre pdf
Workflow capture using Taverna
Documenting data processes through workflow tools
  – Supports better citation
  – Makes the cited resource more re-usable
  – Strengthens the reproducibility and validation of the research
Data Citation Purposes
For attribution
  – Leading to credit and reward
For reproducibility
  – Supports validation, re-use
Eric Schadt at Sage Bionetworks Congress 2011 – ap_Building (start at 4.28)
Open challenges: attribution
Preserving link with original data
  – Some discipline-based repositories have their own identifiers
  – Bi-directional links
Attributing data creators
  – Including individuals?
Defining creation of a new intellectual object, e.g. curated dataset?
Cultural challenge in recognising non-standard contributions; microattribution
New metrics
Identification of contributors
Open challenges: reproducibility
Identification and granularity
  – Discipline identifiers, global identifiers
  – How much value has been added since the data entered the workflow?
Identifying processes and software
Acknowledgements
University of Manchester
  – Carole Goble
  – Peter Li
British Library
  – Max Wilkinson
  – Tom Pollard
Sage Bionetworks
UKOLN
  – Liz Lyon
  – Monica Duke
Nature Genetics
  – Myles Axton
PLoS Comp Bio
  – Phil Bourne