Data management, curation, statistical analysis & display Bob Sinkovits AfCS Bioinformatics Lab San Diego Supercomputer Center UC San Diego.

The data management problem: collecting and archiving data; tracking meta-data associated with experiments (reagents, technicians, labs, dates, machine settings, protocols, etc.); processing raw data; curation; organization and display; data distribution.

Data collection: Data acquisition for the AfCS involves the separate transfer of experimental data and the description of the experiment (meta-data). [Diagram: the experimental lab sends data (results) to SDSC via wget and meta-data via GUIs.]

Data collection: Experimental data files are transferred on a nightly basis using the UNIX wget utility under the control of a cron job. [Diagram: sites shown are Stanford, Caltech, SDSC, UTSW, UCSF, Vanderbilt (lipid MS), and Myriad (Y2H); data types shown are Ca++, cAMP, phosphoprotein, cytokine, microarray, microscopy, and single-cell Ca++.]
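A minimal sketch of how such a nightly transfer could be scheduled. The URL, destination directory, and 02:00 run time are placeholders, not the AfCS configuration; the slide only states that wget runs under a cron job. The wget options used (`--mirror`, `--no-parent`, `--no-host-directories`, `--directory-prefix`) are standard GNU wget flags.

```python
import shlex


def build_wget_command(source_url: str, dest_dir: str) -> list[str]:
    """Return a wget invocation that mirrors source_url into dest_dir."""
    return [
        "wget",
        "--mirror",               # recursive download with timestamp checks
        "--no-parent",            # stay below the starting directory
        "--no-host-directories",  # do not create a directory named after the host
        "--directory-prefix", dest_dir,
        source_url,
    ]


def build_cron_line(command: list[str], hour: int = 2, minute: int = 0) -> str:
    """Return a crontab entry that runs the command once per night."""
    return f"{minute} {hour} * * * {shlex.join(command)}"


# Hypothetical lab URL and landing directory, for illustration only.
cmd = build_wget_command("http://lab.example.org/outgoing/", "/data/incoming/lab")
cron_line = build_cron_line(cmd)
print(cron_line)
```

Because `--mirror` turns on timestamp checking, a rerun should fetch only files that are new or changed since the previous night.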

Data collection: Meta-data are inserted directly into the AfCS Oracle database through a set of GUIs. Sample, experiment, cell line, etc. IDs are generated automatically based on date, laboratory code, etc. Error checking, the use of pull-down menus, and database constraints ensure that valid data are entered into the GUIs.
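The two ideas on this slide, automatically generated IDs and database constraints, can be sketched as follows. This is an illustration under assumptions: the ID layout (lab code, then date, then sequence number), the lab codes `LABA`/`LABB`, and the table definition are all hypothetical, and SQLite stands in for the Oracle database.

```python
import sqlite3
from datetime import date


def make_sample_id(lab_code: str, day: date, sequence: int) -> str:
    """Build an ID from lab code, date, and a per-day sequence number (hypothetical layout)."""
    return f"{lab_code}{day:%y%m%d}{sequence:03d}"


conn = sqlite3.connect(":memory:")
# CHECK constraints play the role of pull-down menus: only listed values are accepted.
conn.execute(
    """
    CREATE TABLE sample (
        sample_id TEXT PRIMARY KEY,
        lab_code  TEXT NOT NULL CHECK (lab_code IN ('LABA', 'LABB')),
        cell_type TEXT NOT NULL CHECK (cell_type IN ('B cell', 'RAW264.7'))
    )
    """
)


def insert_sample(lab_code: str, cell_type: str, day: date, sequence: int) -> str:
    """Generate the ID and insert the row; the database rejects invalid values."""
    sample_id = make_sample_id(lab_code, day, sequence)
    conn.execute("INSERT INTO sample VALUES (?, ?, ?)", (sample_id, lab_code, cell_type))
    return sample_id


good_id = insert_sample("LABA", "RAW264.7", date(2004, 5, 24), 1)

try:
    insert_sample("LABA", "HeLa", date(2004, 5, 24), 2)  # not an allowed cell type
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(good_id, rejected)
```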

Data collection

Barcoding: All experimental samples and materials (protein extracts, gels, cell preps, plasmids, solutions, reagents, etc.) are physically labeled using a 2-D barcode. Equipment shown: Zebra Z4M barcode printer and Symbol Cyclone scanner.

Data/information flow. [Diagram: data and meta-data move from the labs to SDSC; components shown are GUIs, parse.pl, postprocess.pl, SRB, Oracle 9i, disk / tape silo, off-site backup (Caltech), curation, and www.]

Storage of processed data: Each type/category of experimental data is stored in a separate database schema. Advantages: easier to work with schemas containing smaller numbers of tables; minimizes the possibility of data loss/corruption; avoids confusion due to multiple developers working in a single schema (overlap of namespaces); easier recovery. Privileges are granted as needed between schemas.
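The namespace point can be illustrated with a small sketch. It uses SQLite attached databases as a stand-in for Oracle schemas, which is an assumption made only to keep the example runnable; the schema and table names are hypothetical, and SQLite has no equivalent of the cross-schema privileges mentioned on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
schemas = ("ligand_screen", "microscopy")  # hypothetical schema names

# Each data category gets its own namespace, so both can own a table called "results".
for schema in schemas:
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute(f"CREATE TABLE {schema}.results (sample_id TEXT, value REAL)")

conn.execute("INSERT INTO ligand_screen.results VALUES ('S1', 1.5)")
conn.execute("INSERT INTO microscopy.results VALUES ('S1', 0.2)")
conn.execute("INSERT INTO microscopy.results VALUES ('S2', 0.4)")

counts = {
    schema: conn.execute(f"SELECT COUNT(*) FROM {schema}.results").fetchone()[0]
    for schema in schemas
}
print(counts)
```

The two `results` tables never collide because every query names the schema it targets.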

DataCenter organization: Data are organized into several main sections: ligand screen; two-ligand screen; microscopy; yeast two-hybrid; plasmid; antibody; lipid; FXM.

Ligand screen: Measure the response of cells to stimulation by single ligands, using consistent conditions across all assays. Splenic B cell assays: Ca++, cAMP, phosphoprotein (11), microarray (cDNA). RAW cell assays: Ca++, cAMP, phosphoprotein (21), cytokine (18).

Ligand screen data archives. [Screenshot callouts: results for each ligand/assay combination; Y/N used to provide a quick overview; assay details; ligand details.]
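A minimal sketch of the Y/N overview idea: one row per ligand, one column per assay, with a flag showing whether results exist for that combination. The ligand names and the set of available results are made up for illustration; only the assay names come from the slides.

```python
def overview_grid(ligands, assays, available):
    """Return rows of 'Y'/'N' flags, one row per ligand and one column per assay."""
    return {
        ligand: ["Y" if (ligand, assay) in available else "N" for assay in assays]
        for ligand in ligands
    }


assays = ["Ca++", "cAMP", "phosphoprotein"]
ligands = ["ligand_A", "ligand_B"]  # placeholder names
available = {
    ("ligand_A", "Ca++"),
    ("ligand_A", "cAMP"),
    ("ligand_B", "phosphoprotein"),
}

grid = overview_grid(ligands, assays, available)
print("ligand", *assays, sep="\t")
for ligand, flags in grid.items():
    print(ligand, *flags, sep="\t")
```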

Ligand screen: The results page contains an explanation of the assay, a graphical display of the data, and links to annotated tab-delimited files (example link: CGS_30_uM_BC data).
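A sketch of how a downloaded annotated tab-delimited file might be read. The slides do not describe the file layout, so this assumes annotation lines start with `#` and that the first remaining line is a header row; the column names and values are invented.

```python
import csv
import io


def read_annotated_tsv(handle):
    """Split a file into '#' annotation lines and tab-delimited data rows."""
    annotations, data_lines = [], []
    for line in handle:
        if line.startswith("#"):
            annotations.append(line[1:].strip())
        elif line.strip():
            data_lines.append(line)
    # The first non-annotation line is treated as the header row.
    rows = list(csv.DictReader(data_lines, delimiter="\t"))
    return annotations, rows


# Invented example content; real AfCS files may be laid out differently.
example = io.StringIO(
    "# ligand: example ligand\n"
    "# assay: cAMP\n"
    "time_s\tresponse\n"
    "0\t1.0\n"
    "30\t2.5\n"
)
annotations, rows = read_annotated_tsv(example)
print(annotations)
print(rows)
```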

Ligand screen

Double ligand screen: Similar to the single ligand screen, but involves stimulation by pairs of ligands, either sequentially or simultaneously. Splenic B cell assays: Ca++, cAMP. RAW cell assays: Ca++, cAMP, phosphoprotein (21), cytokine (18).

Double ligand screen: The link to results is found at the intersection of a ligand pair. Annotation is based on the additivity of ligand responses.
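One way an additivity annotation could be computed is sketched below. The slide gives no formula, so everything here is an assumption: responses are taken as baseline-subtracted numbers, the expected combined response is the sum of the two single-ligand responses, and a 20% tolerance decides what counts as additive.

```python
def classify_additivity(single_1, single_2, combined, tolerance=0.2):
    """Compare a two-ligand response with the sum of the single-ligand responses."""
    expected = single_1 + single_2
    margin = tolerance * abs(expected)  # allowed deviation from the expected sum
    if combined > expected + margin:
        return "more than additive"
    if combined < expected - margin:
        return "less than additive"
    return "additive"


# Invented response values, for illustration only.
print(classify_additivity(1.0, 2.0, 3.1))
print(classify_additivity(1.0, 2.0, 5.0))
print(classify_additivity(1.0, 2.0, 1.5))
```

A real analysis would likely derive the tolerance from replicate variability instead of using a fixed fraction.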

Double ligand screen: Sample from the phosphoprotein two-ligand display. Individual thumbnails are linked to additional results.

Double ligand screen: All results for a phosphoprotein, ligand1, ligand2 combination.

Phosphoprotein display in cell signaling context. Goals: quick overview of the signaling pathways activated; user-friendly and attractive presentation of the data; easy way to navigate through the data; highlighting of the regulated proteins.

Phosphoprotein/signaling map

Data archives: Archives of data sets can be downloaded at ftp://ftp.afcs.org/pub/datacenter

Data curation: A convenient way is needed for the AfCS labs to curate data: by ligand (don't release until replicated); by experiment (flag bad experiments); by sample (flag bad samples without discarding the experiment). Web interfaces for curation have been developed and are restricted by user.

Data curation: Ligands, experiments, and samples can be annotated in three ways. Public: available to the public. Internal: restricted to internal use; validity of the data is still being investigated or the experimental conditions have not yet been replicated. Invalid: experiment or sample flagged as bad; not available to anyone.
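The three annotation levels can be sketched as a small access check. The status names come from the slide; the rule that the most restrictive of the ligand, experiment, and sample annotations wins is an assumption about how the levels combine, not something the slides state.

```python
from enum import IntEnum


class Status(IntEnum):
    """Curation status, ordered from least to most restrictive."""
    PUBLIC = 0
    INTERNAL = 1
    INVALID = 2


def effective_status(ligand, experiment, sample):
    """Assumed rule: the most restrictive annotation at any level applies."""
    return max(ligand, experiment, sample)


def can_view(status, internal_user):
    """Invalid data are hidden from everyone; internal data from the public."""
    if status == Status.INVALID:
        return False
    if status == Status.INTERNAL:
        return internal_user
    return True


# A public ligand and sample inside an experiment that is still internal.
status = effective_status(Status.PUBLIC, Status.INTERNAL, Status.PUBLIC)
print(status.name, can_view(status, internal_user=False), can_view(status, internal_user=True))
```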

Data curation

Data curation by ligand: For curation by ligand, the interface is based on the public display with additional features.

Data curation by sample/expt: curate by experiment; curate by sample.

Data curation by sample/expt: For some assays, such as cytokine and phosphoprotein, the large number of samples makes curation by sample ID impractical. Curation is limited to the experiment level.

Data curation by sample/expt: Similar curation interfaces have been set up for FXM data (lentivirally transduced RAW264.7 cells).

Acknowledgements: Madhusudan, Ilango Vadivelu – LIMS; Stephen Lyon – web master; Brad Kroeger – systems administration; Chic Barna, Ray Bean – database administration; Sylvain Pradervand – phosphoprotein display; Shankar Subramaniam – “glue”; Ron Taussig, Gil Sambrano, Richard Scheuermann – data center design; Paul Sternweis – Ca++, cAMP display; Susie Mumby – phosphoprotein, cytokine display; Lonnie Sorrels, Keng-Mean Lin, Sangdun Choi, Nick Wong, Robert Hsueh, Heping Han, Ruth Levitz.