Published by Kelley Hamilton. Modified over 9 years ago.
1
Publishing actionable data
Thomas Lemberger
Chief Editor, Molecular Systems Biology
Deputy Head, Scientific Publications, EMBO
3
Big Data
4
Structured data: structures; sequences; functional genomics; proteomics; metabolomics; genotype; phenotype
5
‘Publishing’ papers ‘Depositing’ datasets
6
“The two vital components of the scientific endeavor – the idea and the evidence – are too frequently separated”
8
Centralized vs distributed infrastructure (diagram: Research data, Database, Journals, Users)
9
Scientific publishing
Dominant channel for the dissemination of peer-reviewed data.
Journals function as a proxy for quality in research assessment.
The rate of publishing keeps increasing.
Papers are human-readable but poorly machine-readable.
11
Figure = Data
Text = Narrative
12
Data in figures
Use cases: understand the data; re-analyze the data; give / claim credit for the data; search for specific evidence; compare to related data; mine data systematically; browse through the data & the literature.
Issues: complexity; unstructured; no metadata standard; source data not available.
13
SourceData
Tools to publish figures as structured digital objects that link the human-readable illustrations with machine-readable metadata and ‘source data’ in order to: improve data transparency; make published data usable; enable data-oriented search.
14
What is a figure?
A scientific result converted into a collection of pixels…
15
16
17
Data archival service; data reproducibility; data reuse; data-oriented search
18
Reproducibility: figures as packages
descriptive metadata: RDF
experimental data: CSV
figure: JPEG
manifest: XML
caption: HTML
code: PY
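The package idea above can be made concrete with a small sketch. It assumes a manifest is simply an XML file that lists the other components of the package with their formats; the element names, attribute names, and file names below are hypothetical and are not taken from SourceData.

```python
import xml.etree.ElementTree as ET

# Component / format pairs as listed on the slide.
# The file names are hypothetical placeholders.
COMPONENTS = [
    ("descriptive metadata", "RDF", "metadata.rdf"),
    ("experimental data", "CSV", "data.csv"),
    ("figure", "JPEG", "figure.jpg"),
    ("caption", "HTML", "caption.html"),
    ("code", "PY", "analysis.py"),
]


def build_manifest(figure_id: str) -> str:
    """Return an XML manifest (as a string) listing every file in the package."""
    # <manifest> and <file> are assumed element names, chosen for illustration.
    root = ET.Element("manifest", attrib={"figure": figure_id})
    for role, fmt, filename in COMPONENTS:
        ET.SubElement(
            root, "file", attrib={"role": role, "format": fmt, "href": filename}
        )
    return ET.tostring(root, encoding="unicode")


manifest_xml = build_manifest("fig1")
print(manifest_xml)
```

Keeping the manifest separate from the data files means a reader or a program can discover what belongs to a figure without opening each component.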
19
Reproducibility: figures as packages
24
(A) Primary early-passage MEFs were infected with MSCV-Myc-ERTAM-IRES-GFP (Myc-ER) or MSCV-IRES-GFP (GFP) virus. GFP+ cells were then left untreated (−) or were treated (+) with 2 μM 4-HT ± Chx pretreatment (30 min) for 24 h and assessed for their expression of the indicated mRNAs (cks1, skp2, rcl and cdc) by SYBR-green real-time PCR analysis. Levels of mRNA were standardized to Ub.
26
Knowledge model
Type: chemical and biological OBJECTS: small molecules, genes, proteins, sub-cellular structures, cell type, tissue, species.
Role: role of each in the experimental DESIGN.
Assay: the type of ASSAY used to perform the measurements.
Diagram labels: measured object; target of intervention; experimental system; assay.
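One way to picture this knowledge model is as a small data structure: each figure panel records an assay and a list of entities, and each entity carries a type, an identifier, and a role. The sketch below is illustrative only. Treating the three diagram labels as the possible role values is an assumption, and the class names, the example panel, and the `EXAMPLE:` identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Role(Enum):
    # Role values assumed from the labels on the slide diagram.
    INTERVENTION = "target of intervention"
    MEASURED = "measured object"
    SYSTEM = "experimental system"


@dataclass(frozen=True)
class Entity:
    label: str        # name as it appears in the figure
    entity_type: str  # e.g. "small molecule", "protein", "tissue"
    identifier: str   # hypothetical placeholder identifier
    role: Role


@dataclass
class Panel:
    panel_id: str
    assay: str
    entities: List[Entity] = field(default_factory=list)

    def with_role(self, role: Role) -> List[str]:
        """Return the labels of all entities that play the given role."""
        return [e.label for e in self.entities if e.role is role]


# Hypothetical panel, not taken from any real paper.
panel = Panel(
    panel_id="Fig1A",
    assay="kinase activity assay",
    entities=[
        Entity("drug Z", "small molecule", "EXAMPLE:0001", Role.INTERVENTION),
        Entity("kinase Y", "protein", "EXAMPLE:0002", Role.MEASURED),
        Entity("tissue T", "tissue", "EXAMPLE:0003", Role.SYSTEM),
    ],
)

print(panel.with_role(Role.INTERVENTION))
print(panel.with_role(Role.MEASURED))
```

Separating what was perturbed from what was measured is what lets a query ask about the design of an experiment rather than just the words in its caption.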
27
Entity types
Identifiers: Uniprot:P02340; Uniprot:Q61769; Gene:20613; Gene:433759; Uniprot:P12004; Gene:9773; Gene:107932; CHEBI:6635; Gene:6635; PubChem:72511; Uniprot:P27661; Uniprot:Q6PDQ2; Uniprot:P02340; Uniprot:O09106
Labels: HDAC 1; SN1; A; B; C; Y; CHD4; HDAC1; p53; H2AX; Phleo; E-cadherin; Ki67
28
Experimental roles: ‘intervention’ and ‘observation’
Gene:107932 CHD4; Uniprot:Q6PDQ2 CHD4; Uniprot:O09106 HDAC1; Uniprot:P02340 p53; Uniprot:P27661 H2AX; PubChem:72511 Phleo
29
Curation tool for data editors
32
Data workflow
Steps: SourceData curation; check data integrity; check data presentation; check plagiarism; check expanded view files (each followed by: OK?).
Author loop: query author; author response.
Decisions: Major issues? REJECT. Positive decision? ACCEPT or REJECT.
33
Validation by authors
34
Application: ‘Smart Figures’
Use cases: understand; re-analyze; credit attribution; directed search; contextualization; data mining; browsing.
Issues: complexity; unstructured; no metadata standard; source data not available.
Smart Figures: panels as coherent units; descriptive metadata; standard identifiers; source data files; visual summarization; data-oriented queries; actionable data viewer.
35
Data-oriented search and data viewer (diagram: Paper 1, Paper 2)
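A data-oriented search can be pictured as a query over annotated panels rather than over free text. The sketch below assumes each panel has been tagged with the entities that were perturbed and the entities that were measured; the record layout, panel names, and contents are hypothetical examples, not real curated data.

```python
# Hypothetical annotated panels from two papers.
PANELS = [
    {"paper": "Paper 1", "panel": "1A",
     "intervention": {"drug Z"}, "measured": {"kinase Y"}},
    {"paper": "Paper 2", "panel": "3C",
     "intervention": {"kinase Y"}, "measured": {"protein X"}},
    {"paper": "Paper 2", "panel": "4B",
     "intervention": {"drug Z"}, "measured": {"protein X"}},
]


def search(panels, intervention=None, measured=None):
    """Return (paper, panel) pairs whose annotations match the given filters.

    A filter left as None is ignored, so either role can be queried alone.
    """
    hits = []
    for p in panels:
        if intervention is not None and intervention not in p["intervention"]:
            continue
        if measured is not None and measured not in p["measured"]:
            continue
        hits.append((p["paper"], p["panel"]))
    return hits


# "In which panels was drug Z the perturbation?"
print(search(PANELS, intervention="drug Z"))
# "Where was protein X measured after perturbing kinase Y?"
print(search(PANELS, intervention="kinase Y", measured="protein X"))
```

The same query spans both papers, which is the point of indexing panels by their experimental design instead of by the paper they appear in.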
36
Data integration
Paper 1: drug Z; kinase Y activity.
Paper 2: kinase Y; protein X; P; P.
Paper 3: gene x; disease D; tissue T.
Resulting hypothesis: test drug Z in disease D.
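The integration step can be sketched as chaining links from different papers until a path connects the drug to the disease. The reading of the diagram used here is an interpretation: Paper 1 links drug Z to kinase Y, Paper 2 links kinase Y to protein X, and Paper 3 links gene x to disease D. The step from protein X to gene x is an added assumption (X taken as the product of x), and the function name is hypothetical.

```python
from collections import deque

# (upstream entity, downstream entity, source of the link)
LINKS = [
    ("drug Z", "kinase Y", "Paper 1"),
    ("kinase Y", "protein X", "Paper 2"),
    ("protein X", "gene x", "identifier mapping (assumed)"),
    ("gene x", "disease D", "Paper 3"),
]


def find_chain(start, goal, links):
    """Breadth-first search for a chain of links from start to goal.

    Returns the list of (source, target, origin) steps, or None if no
    chain exists. Links are followed in their stated direction only.
    """
    graph = {}
    for src, dst, origin in links:
        graph.setdefault(src, []).append((dst, origin))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, origin in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, nxt, origin)]))
    return None


chain = find_chain("drug Z", "disease D", LINKS)
for src, dst, origin in chain:
    print(f"{src} -> {dst}  [{origin}]")
```

No single paper supports the hypothesis on its own; it only appears once the three results share identifiers for the entities they have in common.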
37
Centralized vs distributed infrastructure (diagram: Research data, Database, Journals, Users)
38
‘Next Gen’ Open Access Search
39
40
search
41
What is a paper?
Title; Abstract; Synopsis; Main paper; ‘Supplementary information’; Datasets & code
42
[Figure; labels: Smad3, Hey1, TGFbeta, VE-cdh, Rad51 foci, AR, Tsc2, Rad51, Nuclear complexes, TGFb]