Published by Eric Lucas. Modified over 9 years ago.
1
Data management, curation, statistical analysis & display
Bob Sinkovits
AfCS Bioinformatics Lab, San Diego Supercomputer Center, UC San Diego
2
The data management problem
- Collecting and archiving data
- Tracking meta-data associated with experiments (reagents, technicians, labs, dates, machine settings, protocols, etc.)
- Processing raw data
- Curation
- Organization and display
- Data distribution
3
Data collection
Data acquisition for the AfCS involves two separate transfers: the experimental data (results) and the description of the experiment (meta-data).
[Diagram: experimental labs send data (results) to SDSC via wget and meta-data via GUIs]
4
Data collection
Experimental data files are transferred on a nightly basis using the UNIX wget utility under control of a cron job.
[Diagram: data flowing to SDSC from Stanford, Caltech, UTSW, UCSF, Vanderbilt, and Myriad; assay labels: Ca++, cAMP, phosphoprotein, cytokine, microarray, microscopy, single-cell Ca++, lipid MS, Y2H]
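The nightly transfer could be set up roughly as follows. This is a minimal sketch, not the AfCS configuration: the lab URL, the destination directory, and the 02:00 schedule are hypothetical, and the choice of wget options is an assumption.

```python
import shlex


def build_wget_command(source_url: str, dest_dir: str) -> list[str]:
    """Return the argv for a wget call that mirrors source_url into dest_dir."""
    return [
        "wget",
        "--mirror",             # recursive download with timestamping; unchanged files are skipped
        "--no-parent",          # never climb above the starting directory
        "--no-verbose",         # keep the cron mail short
        "--directory-prefix", dest_dir,
        source_url,
    ]


def crontab_line(command: list[str], hour: int = 2, minute: int = 0) -> str:
    """Format a crontab entry that runs the command once a day."""
    return f"{minute} {hour} * * * {shlex.join(command)}"


# Hypothetical lab export URL and local landing directory.
cmd = build_wget_command("http://lab.example.org/export/", "/data/incoming/lab")
print(crontab_line(cmd))
```

Because `--mirror` turns on timestamping, a repeated nightly run only fetches files that are new or have changed since the previous run.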
5
Data collection
- Meta-data are inserted directly into the AfCS Oracle database through a set of GUIs
- Sample, experiment, cell line, etc. IDs are generated automatically based on date, laboratory code, etc.
- Error checking, pull-down menus, and database constraints ensure that only valid data are entered into the GUIs
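Automatic ID generation from a date and a laboratory code might look like the sketch below. The ID layout (`KIND-LAB-YYYYMMDD-NNN`), the kind prefixes, and the lab codes are all hypothetical; the slides do not give the actual AfCS format, and a production system would more likely draw the counter from the database.

```python
from datetime import date


class IdGenerator:
    """Hand out sequential IDs per (kind, lab code, day) combination."""

    def __init__(self) -> None:
        self._counters: dict[tuple[str, str, date], int] = {}

    def next_id(self, kind: str, lab_code: str, day: date) -> str:
        key = (kind, lab_code, day)
        seq = self._counters.get(key, 0) + 1
        self._counters[key] = seq
        # Hypothetical layout: kind prefix, lab code, date, zero-padded counter.
        return f"{kind}-{lab_code}-{day:%Y%m%d}-{seq:03d}"


gen = IdGenerator()
first = gen.next_id("SMP", "LAB1", date(2003, 5, 14))
second = gen.next_id("SMP", "LAB1", date(2003, 5, 14))
other_lab = gen.next_id("SMP", "LAB2", date(2003, 5, 14))
print(first, second, other_lab)
```

Generating the ID instead of letting a technician type it removes one source of invalid entries, which is the same goal as the pull-down menus and constraints.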
6
Data collection
7
Barcoding
All experimental samples and materials (protein extracts, gels, cell preps, plasmids, solutions, reagents, etc.) are physically labeled using a 2-D barcode.
Zebra Z4M barcode printer
Symbol Cyclone scanner
8
Data/information flow
[Diagram: data and meta-data flow from the labs to SDSC; labeled components: parse.pl, postprocess.pl, SRB, Oracle 9i, disk / tape silo, off-site backup (Caltech), curation GUIs, www]
9
Storage of processed data
Each type/category of experimental data is stored in a separate database schema
- Easier to work with schemas containing smaller numbers of tables
- Minimizes possibility of data loss/corruption
- Avoids confusion due to multiple developers working in a single schema (overlap of namespaces)
- Easier recovery
- Privileges granted as needed between schemas
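Granting privileges between schemas as needed can be illustrated by generating the GRANT statements for one consumer. This is only a sketch: the schema and table names are hypothetical, and the slides do not say how the AfCS grants were managed. In Oracle a schema belongs to a user of the same name, so the grantee is the other schema's user.

```python
def grant_statements(owner_schema, tables, grantee, privileges=("SELECT",)):
    """Build one GRANT statement per table for a single grantee."""
    privs = ", ".join(privileges)
    return [
        f"GRANT {privs} ON {owner_schema}.{table} TO {grantee};"
        for table in tables
    ]


# Hypothetical names: a web display schema that only needs to read
# two tables owned by the ligand screen schema.
statements = grant_statements(
    owner_schema="LIGAND_SCREEN",
    tables=["CALCIUM_RESULT", "CAMP_RESULT"],
    grantee="WEB_DISPLAY",
)
for stmt in statements:
    print(stmt)
```

Listing the tables explicitly keeps each schema's exposure limited to what the other side actually reads.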
10
DataCenter organization
Data organized into several main sections
- Ligand screen
- Two-ligand screen
- Microscopy
- Yeast two-hybrid
- Plasmid
- Antibody
- Lipid
- FXM
11
Ligand screen
Measure response of cells to stimulation by single ligands, using consistent conditions across all assays
Splenic B cell: Ca++, cAMP, phosphoprotein (11), microarray (cDNA)
RAW 264.7: Ca++, cAMP, phosphoprotein (21), cytokine (18)
12
Ligand screen data archives
- Results for ligand/assay combination
- Y/N used to provide quick overview
- Assay details
- Ligand details
13
Ligand screen
Results page contains explanation of assay, graphical display of data, and links to annotated tab-delimited files
[Screenshot: CGS_30_uM_BC data]
14
Ligand screen
15
Double ligand screen
Similar to single ligand screen, but involves stimulation by pairs of ligands, either sequentially or simultaneously
Splenic B cell: Ca++, cAMP
RAW 264.7: Ca++, cAMP, phosphoprotein (21), cytokine (18)
16
Double ligand screen
- Link to results found at intersection of ligand pair
- Annotation based on additivity of ligand responses
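An additivity annotation could be computed along the following lines. This is a sketch under stated assumptions: the slides do not give the AfCS criteria, so the 20% tolerance, the category labels, and the use of baseline-subtracted responses are all hypothetical.

```python
def classify_additivity(resp_a, resp_b, resp_ab, tolerance=0.2):
    """Compare the response to a ligand pair with the sum of the single-ligand responses.

    All responses are assumed to be baseline-subtracted and on the same scale.
    The tolerance is a fraction of the expected (summed) response.
    """
    expected = resp_a + resp_b
    margin = tolerance * abs(expected)
    if abs(resp_ab - expected) <= margin:
        return "additive"
    if resp_ab > expected:
        return "greater than additive"
    return "less than additive"


# Hypothetical response values in arbitrary units.
print(classify_additivity(1.0, 2.0, 3.1))
print(classify_additivity(1.0, 2.0, 5.0))
print(classify_additivity(1.0, 2.0, 1.5))
```

A fixed tolerance is the simplest choice; with replicate measurements, a statistical test on the difference would be a natural replacement.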
17
Double ligand screen
Sample from phosphoprotein two-ligand display. Individual thumbnails linked to additional results
18
Double ligand screen
All results for phosphoprotein, ligand1, ligand2 combination
19
Phosphoprotein display in cell signaling context
Goals
- Quick overview of the signaling pathways activated
- User-friendly and attractive presentation of the data
- Easy way to navigate through the data
- Highlight of the regulated proteins
http://biome.sdsc.edu:9080/WesternDisplay
20
Phosphoprotein/signaling map
25
Data archives
Archives of data sets can be downloaded at ftp://ftp.afcs.org/pub/datacenter
26
Data curation
Need to provide a convenient way for the AfCS labs to curate data
- By ligand (don't release until replicated)
- By experiment (flag bad experiments)
- By sample (flag bad samples without discarding the experiment)
Web interfaces for curation have been developed; access is restricted by user
27
Data curation
Ligands, experiments, and samples can be annotated in three ways
- Public: available to the public
- Internal: restricted to internal use; validity of data still being investigated or experimental conditions not yet replicated
- Invalid: experiment or sample flagged as bad; not available to anyone
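The three annotation levels map naturally onto a small visibility check. The sketch below is an illustration only: the rule that the most restrictive of the ligand, experiment, and sample annotations wins is an assumption, not something stated in the slides, and all names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    INVALID = "invalid"


# Higher rank means more restrictive.
_RANK = {Status.PUBLIC: 0, Status.INTERNAL: 1, Status.INVALID: 2}


def effective_status(ligand: Status, experiment: Status, sample: Status) -> Status:
    """Assumed rule: the most restrictive of the three annotations applies."""
    return max((ligand, experiment, sample), key=_RANK.__getitem__)


def is_visible(status: Status, internal_user: bool) -> bool:
    """Public data is open to all, internal data to AfCS users, invalid data to no one."""
    if status is Status.INVALID:
        return False
    if status is Status.INTERNAL:
        return internal_user
    return True


status = effective_status(Status.PUBLIC, Status.INTERNAL, Status.PUBLIC)
print(status.value, is_visible(status, internal_user=False))
```

Under this rule a sample in a public experiment stays hidden from outside users until its ligand has also been released.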
28
Data curation
29
Data curation by ligand
For curation by ligand, the interface is based on the public display with additional features
30
Data curation by sample/expt
- Curate by experiment
- Curate by sample
32
Data curation by sample/expt
For some assays, such as cytokine and phosphoprotein, the large number of samples makes curation by sample ID impractical. Curation is limited to the experiment level.
33
Data curation by sample/expt
Similar curation interfaces have been set up for FXM data
[Screenshot: Lentivirally-Transduced RAW264.7 cells]
34
Acknowledgements
Madhusudan, Ilango Vadivelu – LIMS
Stephen Lyon – web master
Brad Kroeger – systems administration
Chic Barna, Ray Bean – database administration
Sylvain Pradervand – phosphoprotein display
Shankar Subramaniam – “glue”
Ron Taussig, Gil Sambrano, Richard Scheuermann – data center design
Paul Sternweis – Ca++, cAMP display
Susie Mumby – phosphoprotein, cytokine display
Lonnie Sorrels, Keng-Mean Lin, Sangdun Choi, Nick Wong, Robert Hsueh, Heping Han, Ruth Levitz