Thomas Lemberger Chief Editor, Molecular Systems Biology Deputy Head, Scientific Publications, EMBO Publishing actionable data.


1 Thomas Lemberger, Chief Editor, Molecular Systems Biology; Deputy Head, Scientific Publications, EMBO Publishing. Actionable data

2

3 Big Data

4 Structured data: structures, sequences, functional genomics, proteomics, metabolomics, genotype, phenotype

5 ‘Publishing’ papers ‘Depositing’ datasets

6 “The two vital components of the scientific endeavor – the idea and the evidence – are too frequently separated”

7

8 Centralized vs. distributed infrastructure (research data, database, journals, users)

9 Scientific publishing is the dominant channel for disseminating peer-reviewed data. Journals function as a proxy for quality in research assessment. The rate of publishing keeps increasing. Papers are human-readable but poorly machine-readable.

10

11 Figure = data; text = narrative

12 Data in figures
Use cases: understand the data; re-analyze the data; give / claim credit for the data; search for specific evidence; compare to related data; mine data systematically; browse through the data and the literature.
Issues: complexity; unstructured data; no metadata standard; source data not available.

13 SourceData: tools to publish figures as structured digital objects that link the human-readable illustrations with machine-readable metadata and ‘source data’, in order to improve data transparency, make published data usable, and enable data-oriented search.
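To make "figure as a structured digital object" concrete, here is a minimal Python sketch in which each panel carries both its image and its source data plus a list of entity identifiers. The classes `Panel` and `Figure`, their fields, and the file names are hypothetical illustrations, not the actual SourceData schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Panel:
    """One figure panel: the illustration plus its machine-readable parts."""
    label: str               # e.g. "A"
    image_file: str          # human-readable illustration (pixels)
    source_data_file: str    # underlying measurements, e.g. a CSV file
    entities: list[str] = field(default_factory=list)  # e.g. "Uniprot:P02340"


@dataclass
class Figure:
    """A figure published as a structured object rather than a flat image."""
    figure_id: str
    caption: str
    panels: list[Panel] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so the panels are serialized too
        return json.dumps(asdict(self), indent=2)


fig = Figure(
    figure_id="fig1",
    caption="Example caption.",
    panels=[Panel("A", "fig1A.jpg", "fig1A.csv", ["Uniprot:P02340"])],
)
print(fig.to_json())
```

JSON is used here only because it is easy to inspect; any serialization that keeps panels, metadata and source data together would serve the same purpose.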

14 What is a figure? A scientific result converted into a collection of pixels…

15

16

17 Data archival service; data reproducibility; data reuse; data-oriented search

18 Reproducibility: figures as packages. Package contents: descriptive metadata (RDF), experimental data (CSV), figure (JPEG), manifest (XML), caption (HTML), code (PY).
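The manifest is what ties the package components together. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming a manifest layout of our own invention: the `<manifest>` and `<file>` element names, the `role`/`format`/`path` attributes and the file names are hypothetical, not a published format.

```python
import xml.etree.ElementTree as ET

# Component roles and formats as listed on the slide; file names are hypothetical.
PACKAGE_FILES = [
    ("descriptive metadata", "metadata.rdf", "RDF"),
    ("experimental data", "data.csv", "CSV"),
    ("figure", "figure.jpg", "JPEG"),
    ("caption", "caption.html", "HTML"),
    ("code", "analysis.py", "PY"),
]


def build_manifest(files):
    """Return an XML manifest (as a string) listing every file in the package."""
    root = ET.Element("manifest")
    for role, path, fmt in files:
        # One <file> element per component, with its role and format as attributes
        ET.SubElement(root, "file", role=role, format=fmt, path=path)
    return ET.tostring(root, encoding="unicode")


manifest_xml = build_manifest(PACKAGE_FILES)
print(manifest_xml)
```

A reader or a script can then open the manifest first and locate the data, code and caption belonging to the figure without guessing file names.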

19 Reproducibility: figures as packages

20

21

22

23

24 (A) Primary early-passage MEFs were infected with MSCV-Myc-ERTAM-IRES-GFP (Myc-ER) or MSCV-IRES-GFP (GFP) virus. GFP+ cells were then left untreated (−) or treated (+) with 2 μM 4-HT ± Chx pretreatment (30 min) for 24 h and assessed for expression of the indicated mRNAs (cks1, skp2, rcl and cdc) by SYBR-green real-time PCR analysis. Levels of mRNA were standardized to Ub.

25

26 Knowledge model
Type: the chemical and biological objects (small molecules, genes, proteins, subcellular structures, cell types, tissues, species).
Role: the role of each object in the experimental design (measured object, target of intervention, experimental system).
Assay: the type of assay used to perform the measurements.
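The three axes of the knowledge model (type, role, assay) map naturally onto a small data model. The sketch below uses Python enums and dataclasses with names of our own choosing; the annotation of panel A is our illustrative reading of the figure legend on slide 24, not curated SourceData output.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    # Object types named in the knowledge model
    SMALL_MOLECULE = "small molecule"
    GENE = "gene"
    PROTEIN = "protein"
    SUBCELLULAR_STRUCTURE = "subcellular structure"
    CELL_TYPE = "cell type"
    TISSUE = "tissue"
    SPECIES = "species"


class Role(Enum):
    # Roles an object can play in the experimental design
    MEASURED_OBJECT = "measured object"
    INTERVENTION_TARGET = "target of intervention"
    EXPERIMENTAL_SYSTEM = "experimental system"


@dataclass
class Annotation:
    text: str                 # mention as it appears in the caption
    entity_type: EntityType
    role: Role


@dataclass
class PanelAnnotation:
    assay: str                                      # assay used for the measurements
    entities: list[Annotation] = field(default_factory=list)

    def with_role(self, role: Role) -> list[str]:
        """Return the caption mentions that play the given role, in annotation order."""
        return [a.text for a in self.entities if a.role is role]


# Illustrative reading of the slide-24 legend (panel A); our interpretation only
panel_a = PanelAnnotation(
    assay="real-time PCR",
    entities=[
        Annotation("MEFs", EntityType.CELL_TYPE, Role.EXPERIMENTAL_SYSTEM),
        Annotation("Myc", EntityType.PROTEIN, Role.INTERVENTION_TARGET),
        Annotation("cks1", EntityType.GENE, Role.MEASURED_OBJECT),
        Annotation("skp2", EntityType.GENE, Role.MEASURED_OBJECT),
    ],
)
print(panel_a.with_role(Role.MEASURED_OBJECT))  # ['cks1', 'skp2']
```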

27 Entity types [figure: entities in panels A, B, C and Y (e.g. CHD4, HDAC1, p53, H2AX, Phleo, E-cadherin, Ki67) tagged with database identifiers such as Uniprot:P02340, Gene:107932, CHEBI:6635 and PubChem:72511]
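Entities on this slide are tagged with `Prefix:accession` identifiers from several databases (UniProt, Gene, ChEBI, PubChem). Matching them across papers needs a canonical form. A minimal sketch, assuming that prefixes should compare case-insensitively while accessions keep their case; the function names are hypothetical.

```python
def parse_identifier(curie: str) -> tuple[str, str]:
    """Split 'Prefix:accession' and lowercase the prefix; accessions keep their case."""
    prefix, sep, accession = curie.partition(":")
    if not sep or not prefix.strip() or not accession.strip():
        raise ValueError(f"not a prefixed identifier: {curie!r}")
    return prefix.strip().lower(), accession.strip()


def unique_identifiers(curies):
    """Parse and deduplicate identifiers, keeping first-seen order."""
    return list(dict.fromkeys(parse_identifier(c) for c in curies))


ids = ["Uniprot:P02340", "uniprot:P02340", "Gene:107932", "CHEBI:6635", "PubChem:72511"]
print(unique_identifiers(ids))
# [('uniprot', 'P02340'), ('gene', '107932'), ('chebi', '6635'), ('pubchem', '72511')]
```

Lowercasing only the prefix is a deliberate choice: accession strings in databases such as UniProt are case-sensitive identifiers and should not be altered.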

28 Experimental roles: ‘intervention’ vs. ‘observation’. Annotated entities: CHD4 (Gene:107932, Uniprot:Q6PDQ2), HDAC1 (Uniprot:O09106), p53 (Uniprot:P02340), H2AX (Uniprot:P27661), Phleo (PubChem:72511).
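Once every entity in a panel carries a role, one can ask questions such as "in which panels was HDAC1 perturbed and H2AX measured?". A minimal sketch; the panel IDs and role assignments below are invented for illustration, and only the identifier-to-name pairs come from this slide.

```python
from dataclasses import dataclass, field


@dataclass
class Panel:
    panel_id: str
    # Maps a standard identifier to its role: "intervention" or "observation"
    roles: dict[str, str] = field(default_factory=dict)


def find_panels(panels, intervention, observation):
    """Return IDs of panels where `intervention` is perturbed and `observation` is measured."""
    return [
        p.panel_id
        for p in panels
        if p.roles.get(intervention) == "intervention"
        and p.roles.get(observation) == "observation"
    ]


# Hypothetical annotations (roles invented for illustration)
panels = [
    # HDAC1 (Uniprot:O09106) perturbed, H2AX (Uniprot:P27661) measured
    Panel("fig2A", {"Uniprot:O09106": "intervention", "Uniprot:P27661": "observation"}),
    # CHD4 (Uniprot:Q6PDQ2) perturbed, p53 (Uniprot:P02340) measured
    Panel("fig2B", {"Uniprot:Q6PDQ2": "intervention", "Uniprot:P02340": "observation"}),
    # HDAC1 and H2AX both only measured
    Panel("fig3C", {"Uniprot:O09106": "observation", "Uniprot:P27661": "observation"}),
]
print(find_panels(panels, "Uniprot:O09106", "Uniprot:P27661"))  # ['fig2A']
```

Using identifiers rather than names as keys is what lets the same query run across papers that spell the entity differently.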

29 Curation tool for data editors

30

31

32 Data workflow [flowchart steps: SourceData curation; query author / author response; checks of data integrity, data presentation, plagiarism and expanded view files (each "OK?"); decision points "Major issues?" and "Positive decision?"; outcomes ACCEPT or REJECT]
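The workflow can be sketched as a sequence of checks gated by the editorial decision. This is a minimal sketch under stated assumptions: the check names follow the slide, but their order, the dictionary flags (`data_integrity_ok`, `curation_ok`, etc.) and the accept/reject logic are hypothetical simplifications, not EMBO's actual procedure.

```python
from typing import Callable

# A manuscript is modeled as a plain dict of hypothetical status flags.
Check = Callable[[dict], bool]

# Check names follow the slide; their order and the flag names are assumptions.
CHECKS: list[tuple[str, Check]] = [
    ("data integrity", lambda m: m.get("data_integrity_ok", False)),
    ("data presentation", lambda m: m.get("presentation_ok", False)),
    ("plagiarism", lambda m: m.get("plagiarism_ok", False)),
    ("expanded view files", lambda m: m.get("ev_files_ok", False)),
    ("SourceData curation", lambda m: m.get("curation_ok", False)),
]


def run_workflow(m: dict) -> str:
    """Return ACCEPT, REJECT, or an author query listing the failed checks."""
    if not m.get("positive_decision", False):
        return "REJECT"
    failed = [name for name, check in CHECKS if not check(m)]
    if not failed:
        return "ACCEPT"
    if m.get("major_issues", False):
        return "REJECT"
    # Fixable problems go back to the author; after the author responds,
    # run_workflow can be called again on the updated manuscript.
    return "QUERY AUTHOR: " + ", ".join(failed)


ok = {"positive_decision": True, "data_integrity_ok": True, "presentation_ok": True,
      "plagiarism_ok": True, "ev_files_ok": True, "curation_ok": True}
print(run_workflow({**ok, "curation_ok": False}))  # QUERY AUTHOR: SourceData curation
```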

33 Validation by authors

34 Application: ‘Smart Figures’
Use cases: understand; re-analyze; credit attribution; directed search; contextualization; data mining; browsing.
Issues: complexity; unstructured data; no metadata standard; source data not available.
Smart Figures: panels as coherent units; descriptive metadata; standard identifiers; source data files; visual summarization; data-oriented queries; actionable data viewer.

35 Paper 1, Paper 2: data viewer, data-oriented search

36 Data integration. Paper 1 links drug Z to kinase Y activity; Paper 2 links kinase Y to phosphorylation (P) of protein X; Paper 3 links gene x to disease D in tissue T. Resulting hypothesis: test drug Z in disease D.
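The data-integration example (drug Z, kinase Y, protein X, gene x, disease D) amounts to chaining findings from separate papers into a path. A minimal sketch with a breadth-first search over a link graph; the entities are the slide's placeholders, and the link between protein X and gene x is our assumption (that gene x encodes protein X), the kind of connection standard identifiers would provide.

```python
from collections import defaultdict, deque

# Links from the slide's schematic; entity names are placeholders, not real findings.
EDGES = [
    ("drug Z", "kinase Y", "Paper 1"),              # drug Z and kinase Y activity
    ("kinase Y", "protein X", "Paper 2"),           # kinase Y and phosphorylation of protein X
    ("protein X", "gene x", "identifier mapping"),  # assumed: gene x encodes protein X
    ("gene x", "disease D", "Paper 3"),             # gene x and disease D in tissue T
]


def build_graph(edges):
    """Index links by entity; links are treated as undirected for finding connections."""
    graph = defaultdict(list)
    for a, b, source in edges:
        graph[a].append((b, source))
        graph[b].append((a, source))
    return graph


def find_chain(graph, start, goal):
    """Breadth-first search for the shortest chain of links from start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbor, source in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, neighbor, source)]))
    return None  # no connection found


chain = find_chain(build_graph(EDGES), "drug Z", "disease D")
for a, b, source in chain:
    print(f"{a} -> {b} ({source})")
```

The returned chain records which paper supports each step, so the resulting hypothesis ("test drug Z in disease D") stays traceable to its evidence.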

37 Centralized vs. distributed infrastructure (research data, database, journals, users)

38 ‘Next Gen’ Open Access Search

39

40 search

41 What is a paper? Title, abstract, synopsis, main paper, ‘supplementary information’, datasets & code

42 [figure: entities (Smad3, Hey1, TGFbeta, VE-cdh, Rad51 foci, AR, Tsc2) linked to numbered papers, with groupings labeled ‘Rad51, nuclear complexes’ and ‘TGFb, Smad3’]
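Linking entities such as Smad3, TGFbeta or Rad51 to the papers whose figures involve them is, at its core, an inverted index. A minimal sketch; the paper-to-entity annotations below are made up for illustration and do not reproduce the slide's actual links.

```python
from collections import defaultdict

# Hypothetical paper-to-entity annotations (made up for illustration)
PAPER_ENTITIES = {
    "paper 1": {"Smad3", "TGFbeta", "Hey1"},
    "paper 2": {"Rad51", "AR"},
    "paper 3": {"TGFbeta", "VE-cdh"},
    "paper 4": {"Smad3", "TGFbeta"},
}


def build_index(paper_entities):
    """Invert paper -> entities into entity -> set of papers."""
    index = defaultdict(set)
    for paper, entities in paper_entities.items():
        for entity in entities:
            index[entity].add(paper)
    return index


def search(index, *entities):
    """Return, sorted, the papers whose figures involve all of the given entities."""
    if not entities:
        return []
    hits = set.intersection(*(index.get(e, set()) for e in entities))
    return sorted(hits)


index = build_index(PAPER_ENTITIES)
print(search(index, "Smad3", "TGFbeta"))  # ['paper 1', 'paper 4']
```

Intersecting per-entity sets gives an AND query; the union of the same sets would give an OR query for broader browsing.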

