Data management, curation, statistical analysis & display Bob Sinkovits AfCS Bioinformatics Lab San Diego Supercomputer Center UC San Diego.

1 Data management, curation, statistical analysis & display Bob Sinkovits AfCS Bioinformatics Lab San Diego Supercomputer Center UC San Diego

2 The data management problem: collecting and archiving data; tracking meta-data associated with experiments (reagents, technicians, labs, dates, machine settings, protocols, etc.); processing raw data; curation; organization and display; data distribution.

3 Data collection: Data acquisition for the AfCS involves the separate transfer of experimental data and the description of the experiment (meta-data). [Diagram: experimental lab to SDSC; data (results) via wget, meta-data via GUIs.]

4 Data collection: Experimental data files are transferred on a nightly basis using the UNIX wget utility under control of a cron job. [Diagram labels: Stanford, Caltech, SDSC, UTSW, UCSF, Vanderbilt, Myriad; Ca++, cAMP, phosphoprotein, cytokine, microarray, microscopy, single cell Ca++, lipid MS, Y2H.]
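The nightly transfer could be set up roughly as follows. This is only a sketch: the source URL, destination directory, and 02:00 schedule are hypothetical placeholders, and the particular wget options are one reasonable choice, not the settings the AfCS actually used.

```python
import shlex


def build_wget_command(source_url: str, dest_dir: str) -> list[str]:
    """Return a wget argument list that mirrors source_url into dest_dir."""
    return [
        "wget",
        "--mirror",               # recursive download with timestamp checks
        "--no-parent",            # never ascend above the starting directory
        "--no-host-directories",  # do not create a directory named after the host
        "--directory-prefix", dest_dir,
        "--quiet",
        source_url,
    ]


def crontab_line(command: list[str], hour: int = 2, minute: int = 0) -> str:
    """Format a crontab entry that runs the command once per night."""
    return f"{minute} {hour} * * * {shlex.join(command)}"


if __name__ == "__main__":
    # Hypothetical lab server and local landing directory.
    cmd = build_wget_command("ftp://lab.example.org/outgoing/", "/data/incoming/lab")
    print(crontab_line(cmd))
```

Because --mirror turns on timestamping, a nightly run only fetches files that are new or changed since the previous run.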

5 Data collection: Meta-data are inserted directly into the AfCS Oracle database through a set of GUIs. Sample, experiment, cell line, etc. IDs are generated automatically based on date, laboratory code, etc. Error checking, the use of pull-down menus, and database constraints ensure that only valid data are entered into the GUIs.
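As an illustration of automatic ID generation from a date and laboratory code, here is a minimal sketch. The ID layout (lab code, then YYYYMMDD, then a three-digit sequence number) and the function name are assumptions for illustration; the slides do not give the actual AfCS ID format.

```python
from datetime import date


def make_sample_id(lab_code: str, run_date: date, sequence: int) -> str:
    """Build an ID such as 'UTSW20030514-007' (hypothetical layout)."""
    if not lab_code.isalnum():
        raise ValueError("lab code must be alphanumeric")
    if sequence < 1:
        raise ValueError("sequence numbers start at 1")
    return f"{lab_code.upper()}{run_date:%Y%m%d}-{sequence:03d}"


if __name__ == "__main__":
    print(make_sample_id("utsw", date(2003, 5, 14), 7))
```

Generating the ID in code and not accepting it as free text removes one class of entry errors, in the same spirit as the pull-down menus and database constraints.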

6 Data collection

7 Barcoding: All experimental samples and materials (protein extracts, gels, cell preps, plasmids, solutions, reagents, etc.) are physically labeled using a 2-D barcode. Equipment: Zebra Z4M barcode printer; Symbol Cyclone scanner.

8 Data/information flow [Diagram labels: Labs, SDSC, parse.pl, SRB, Oracle 9i, disk / tape silo, off-site backup (Caltech), www, postprocess.pl, curation GUIs, data, meta-data.]

9 Storage of processed data: Each type/category of experimental data is stored in a separate database schema. Easier to work with schemas containing smaller numbers of tables; minimizes possibility of data loss/corruption; avoids confusion due to multiple developers working in a single schema (overlap of namespaces); easier recovery; privileges granted as needed between schemas.

10 DataCenter organization: Data organized into several main sections: ligand screen; two-ligand screen; microscopy; yeast two-hybrid; plasmid; antibody; lipid; FXM.

11 Ligand screen: Measure response of cells to stimulation by single ligands, using consistent conditions across all assays. Splenic B cell: Ca++, cAMP, phosphoprotein (11), microarray (cDNA). RAW 264.7: Ca++, cAMP, phosphoprotein (21), cytokine (18).

12 Ligand screen data archives: Results for ligand/assay combination; Y/N used to provide quick overview; assay details; ligand details.

13 Ligand screen: Results page contains explanation of assay, graphical display of data, and links to annotated tab-delimited files. [Example shown: CGS_30_uM_BC data]
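A downloaded annotated tab-delimited file could be read along these lines. The layout here is assumed, not documented in the slides: annotation lines are taken to start with "#", followed by a header row and data rows. The column names in the usage example are likewise hypothetical.

```python
import csv
import io


def read_annotated_tsv(handle):
    """Split a file into annotation lines and tab-delimited data rows.

    Assumes (hypothetically) that annotation lines begin with '#'.
    """
    annotations = []
    data_lines = []
    for line in handle:
        if line.startswith("#"):
            annotations.append(line[1:].strip())
        elif line.strip():
            data_lines.append(line)
    # DictReader takes the first remaining line as the header row.
    rows = list(csv.DictReader(data_lines, delimiter="\t"))
    return annotations, rows


if __name__ == "__main__":
    sample = io.StringIO("# assay: calcium\ntime_s\tresponse\n0\t1.0\n10\t1.8\n")
    notes, rows = read_annotated_tsv(sample)
    print(notes)
    print(rows)
```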

14 Ligand screen

15 Double ligand screen: Similar to single ligand screen, but involves stimulation by pairs of ligands, either sequentially or simultaneously. Splenic B cell: Ca++, cAMP. RAW 264.7: Ca++, cAMP, phosphoprotein (21), cytokine (18).

16 Double ligand screen: Link to results found at intersection of ligand pair. Annotation based on additivity of ligand responses.
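One simple way to annotate a ligand pair by additivity is to compare the paired response with the sum of the two single-ligand responses. The sketch below is only an illustration: the 20% tolerance, the category labels, and the assumption that responses are positive, baseline-subtracted values are all hypothetical and not taken from the AfCS analysis.

```python
def classify_additivity(response_a: float, response_b: float,
                        response_pair: float, tolerance: float = 0.2) -> str:
    """Compare a two-ligand response with the sum of single-ligand responses."""
    expected = response_a + response_b
    if expected <= 0:
        raise ValueError("expected a positive summed single-ligand response")
    ratio = response_pair / expected
    if ratio > 1 + tolerance:
        return "greater than additive"
    if ratio < 1 - tolerance:
        return "less than additive"
    return "additive"


if __name__ == "__main__":
    print(classify_additivity(1.0, 2.0, 3.1))
    print(classify_additivity(1.0, 2.0, 5.0))
    print(classify_additivity(1.0, 2.0, 1.5))
```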

17 Double ligand screen: Sample from phosphoprotein two-ligand display. Individual thumbnails linked to additional results.

18 Double ligand screen: All results for phosphoprotein, ligand1, ligand2 combination.

19 Phosphoprotein display in cell signaling context. Goals: quick overview of the signaling pathways activated; user-friendly and attractive presentation of the data; easy way to navigate through the data; highlight of the regulated proteins. http://biome.sdsc.edu:9080/WesternDisplay

20 Phosphoprotein/signaling map

25 Data archives Archives of data sets can be downloaded at ftp://ftp.afcs.org/pub/datacenter

26 Data curation: Need to provide a convenient way for the AfCS labs to curate data: by ligand (don't release until replicated); by experiment (flag bad experiments); by sample (flag bad samples without discarding the experiment). Web interfaces for curation have been developed and are restricted by user.

27 Data curation: Ligands, experiments, and samples can be annotated in three ways. Public – available to the public. Internal – restricted to internal use; validity of data still being investigated or experimental conditions not yet replicated. Invalid – experiment or sample flagged as being bad; not available to anyone.
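The three annotation levels map naturally onto a small visibility check. In this sketch the names CurationStatus, is_visible, and effective_status are hypothetical. The rule that the most restrictive of the ligand, experiment, and sample annotations wins is an assumption about how the levels combine, not something stated in the slides.

```python
from enum import Enum


class CurationStatus(Enum):
    PUBLIC = "public"      # available to the public
    INTERNAL = "internal"  # restricted to internal use
    INVALID = "invalid"    # flagged as bad; available to no one


# Assumed ordering: a higher number means more restrictive.
_RESTRICTIVENESS = {
    CurationStatus.PUBLIC: 0,
    CurationStatus.INTERNAL: 1,
    CurationStatus.INVALID: 2,
}


def effective_status(*levels: CurationStatus) -> CurationStatus:
    """Combine ligand, experiment, and sample annotations (assumed rule)."""
    return max(levels, key=_RESTRICTIVENESS.__getitem__)


def is_visible(status: CurationStatus, is_internal_user: bool) -> bool:
    """Decide whether a record should be shown to the requesting user."""
    if status is CurationStatus.INVALID:
        return False
    if status is CurationStatus.INTERNAL:
        return is_internal_user
    return True


if __name__ == "__main__":
    status = effective_status(CurationStatus.PUBLIC, CurationStatus.INTERNAL,
                              CurationStatus.PUBLIC)
    print(status, is_visible(status, is_internal_user=False))
```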

28 Data curation

29 Data curation by ligand: For curation by ligand, the interface is based on the public display with additional features.

30 Data curation by sample/expt Curate by experiment Curate by sample

31 Data curation by sample/expt Curate by experiment Curate by sample

32 Data curation by sample/expt: For some assays, such as cytokine and phosphoprotein, the large number of samples makes curation by sample ID impractical. Curation is limited to the experiment level.

33 Data curation by sample/expt: Similar curation interfaces have been set up for FXM data. Lentivirally transduced RAW264.7 cells.

34 Acknowledgements: Madhusudan, Ilango Vadivelu – LIMS; Stephen Lyon – web master; Brad Kroeger – systems administration; Chic Barna, Ray Bean – database administration; Sylvain Pradervand – phosphoprotein display; Shankar Subramaniam – “glue”; Ron Taussig, Gil Sambrano, Richard Scheuermann – data center design; Paul Sternweis – Ca++, cAMP display; Susie Mumby – phosphoprotein, cytokine display; Lonnie Sorrels, Keng-Mean Lin, Sangdun Choi, Nick Wong, Robert Hsueh, Heping Han, Ruth Levitz.

