Metadata For CARMEN Phillip Lord and Frank Gibson
Problems “In the standard model, one collects data, publishes a paper or papers and then gradually loses the original dataset.” THE NEW KNOWLEDGE ECONOMY AND SCIENCE AND TECHNOLOGY POLICY Geoffrey Bowker, University of California, San Diego
The need for clear metadata Most neuroscience data is relatively simple in structure But often contextually complex Sometimes associated with behavioural features
Neuroscience spike data The raw data is just a waveform But what is the experiment for? What stimulus is the organism/tissue receiving? Even, which channel is which? The data sets being produced are (reasonably) large (tens of GB, or 1 TB in three months)
Information Extraction How do we extract the information? istockphoto.com
Multi-Author data
#  Author             PMID  Type                 Size
1  Davierwala et al         Synthetic_Lethality  627
2  Krogan et al             Affinity_Capture-MS  164
3  Hazbun et al             Affinity_Capture-MS  3210
4  Gavin et al              Affinity_Capture-MS  3596
5  Ho et al                 Affinity_Capture-MS  733
6  Ito et al                Two-hybrid           275
From Katherine James, NCL
How do we represent… Laboratory Experiments In silico Analysis Derived data
Joseph Whitworth
Metadata Description of results Sample How it was generated Equipment Processing steps Expensive to capture Important to validate result Lab-book
The need for standards! “established by consensus and approved by a recognized body, that provides, […] rules, […] for […] the optimum degree of order in a given context” BSI -
View from microarrays Content Standard – Minimal Information MAGE -- Structure MO -- Terminology From the MGED society
Life science communities
Society                                              Domain          Website
The Genomic Standards Consortium (GSC)               Genomics        http://darwin.nox.ac.uk/gsc/
Microarray and Gene Expression Data Society (MGED)   Genomics        www.mged.org
Proteomics Standards Initiative (PSI)                Proteomics      http://psidev.info
Metabolomics Standards Initiative (MSI)              Metabolomics    www.metabolomicssociety.org
Flow Cytometry experiment Community                  Flow Cytometry
MINI – electrophysiology General Features Study Subject Recording Location Task Stimulus Recording Time Series Data
Recording Location Recording Location Structure Brain Area Slice Thickness Slice Orientation Cell Type –Cell Type co-ordinates –Location conformation
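A minimal-information checklist of this kind can be checked mechanically. The sketch below is only an illustration: the section names are read off the MINI slide (how the slide text splits into seven sections is an assumption), while the dictionary layout, the function name `missing_sections` and the example values are hypothetical and not part of MINI.

```python
# Section names as read from the MINI slide; the split into seven
# sections is an assumption about how the slide text divides.
MINI_SECTIONS = [
    "General Features",
    "Study Subject",
    "Recording Location",
    "Task",
    "Stimulus",
    "Recording",
    "Time Series Data",
]


def missing_sections(record):
    """Return the checklist sections that are absent or empty in a record."""
    return [name for name in MINI_SECTIONS if not record.get(name)]


# Made-up example record (hypothetical values, for illustration only).
example = {
    "General Features": {"contact": "A. Researcher"},
    "Study Subject": {"species": "Rattus norvegicus"},
    "Recording Location": {"Brain Area": "hippocampus", "Slice Thickness": "400 um"},
    "Stimulus": {},  # present but empty, so it counts as missing
}

print(missing_sections(example))
```

A check like this only reports whether a section has been filled in; it says nothing about whether the values are correct, which still needs terminology and human curation.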
Functional Genomics Experiment (FuGE) Model of common components in science investigations, such as materials, data, protocols, equipment and software. Provides a framework for capturing complete laboratory workflows, enabling the integration of pre-existing data formats.
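To make the idea of capturing a laboratory workflow concrete, here is a rough sketch of steps that link protocols, equipment, software, materials and data. The class `Step`, the function `provenance` and the example workflow are illustrative assumptions; they do not reproduce the actual FuGE object model or its XML format.

```python
from dataclasses import dataclass, field
from typing import List


# Illustrative only: field names mirror the component types named on the
# slide, not the real FuGE classes.
@dataclass
class Step:
    """One workflow step: a protocol applied to inputs, yielding outputs."""
    protocol: str
    equipment: List[str] = field(default_factory=list)
    software: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)    # materials or data consumed
    outputs: List[str] = field(default_factory=list)   # materials or data produced


def provenance(steps, item):
    """List the protocols that led to `item`, walking back from outputs to inputs."""
    chain = []
    seen = set()
    frontier = [item]
    while frontier:
        current = frontier.pop()
        for step in steps:
            if current in step.outputs and step.protocol not in seen:
                seen.add(step.protocol)
                chain.append(step.protocol)
                frontier.extend(step.inputs)
    return chain


# Hypothetical three-step workflow, from tissue to derived data.
steps = [
    Step("slice preparation", inputs=["brain tissue"], outputs=["brain slice"]),
    Step("extracellular recording", equipment=["multi-electrode array"],
         inputs=["brain slice"], outputs=["raw waveform"]),
    Step("spike sorting", software=["sorting script"],
         inputs=["raw waveform"], outputs=["spike times"]),
]

print(provenance(steps, "spike times"))
```

Recording each step in this form is what lets a derived dataset be traced back to the sample, the equipment and the processing steps that produced it.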
Part of CISBAN in a nutshell [Figure: a robot screens a reference set of 5,000 mutant strains for sensitivity to damage/nutrition (e.g. ‘Folate’ +-+-, ‘MMS’ --++), followed by data curation, functional analysis, and interactions with the in silico programme.]
CISBAN dataflow Neil Wipat, Newcastle University
Data Entry with SyMBA Allyson Lister, Newcastle University
Data Entry with SyMBA
Summary We are generating metadata “standards” for neurosciences We are following a well-trodden path from bioinformatics We adopted FuGE and have built MINI
Future Work More neurosciences experimental datatypes. Minimal Information about a Service –Describe analysis software as well as lab experiments. Outreach!
Acknowledgements MINI: Frank Gibson, Paul G Overton, Tom V Smulders, Simon R Schultz, Stephen J Eglen, Colin D Ingram, Stefano Panzeri, Phil Bream, Evelyne Sernagor, Mark Cunningham, Christopher Adams, Christoph Echtermeyer, Jennifer Simonotto, Marcus Kaiser, Daniel C Swan, Martyn Fletcher, Phillip Lord CISBAN: Anil Wipat (PI), Allyson Lister (Research Associate),