Larry Lam
Southern California Bioinformatics Summer Institute 2009
Graeber Lab – Crump Institute for Molecular Imaging, UCLA
A Data Management and Analysis Software Platform for Phospho-Proteomics Data

Outline
- Graeber Lab Background
- Project Objective
- My Experimental Project (Example Dataset)
- Software Design
- Software Demo
- Conclusion / Future Work
- Acknowledgements

Systems Biology of Cancer Signaling
Lab Goals
- Understand Cancer Signaling Through Systems Biology Approaches
- [long term] Improve Cancer Treatment
Signaling Pathway Modeling Through
- Kinetics
- Phospho-Profiling
- Adaptor Complex Analysis

Project Objective
Develop a Software Platform for Convenient Storage and Analysis of Large-Scale Data Sets
- Design Database to Collect and Store Large-Scale Proteomic Data Sets
- Allow for Comprehensive Meta Information
- Simplify Access to Multiple Data Sets
- Simplify The Use of Common Tools of Analysis

BCR/Abl Leukemia
BCR/Abl fusion protein found in
- 90%-95% of chronic myeloid leukemia
- 20% of adult acute lymphoblastic leukemia
- 5% of childhood acute lymphoblastic leukemia
Analyze the adaptor proteins in BCR/Abl signaling
- Adaptor proteins mediate protein interactions
[Figure: protein complex; labels: Bait (capture protein), Prey (interacting protein)]

Experimental Workflow: Current Workflow
[Flow diagram; box labels: Experimental Protocol, Complex Purification, Phospho Profiling, Mass Spectrometry, Quantitation Pipeline, IPI Proteomics Database, Quantitation Output File, [Complex] NS Filter/Consolidation, Manual Organization/Analysis]

Identifying Interactions of the Crk Adaptor Proteins
1. Genetic modification of pro-B-lymphocytes (Baf3): express adaptor + streptavidin binding peptide (SBP)
2. Culture
3. Lyse each culture for protein complex purification
Lysates: Crk I, Crk L, Crk II, NTAP

Protein Complex Purification
1. Separation of protein complex with streptavidin beads
2. Trypsin digestion from proteins to peptides
3. Separation of phosphorylated peptides with Fe(III)-NTA beads
4. Liquid Chromatography + Mass Spectrometry
5. Quantitation Pipeline

Quantitation Output File
Consolidation of quantified peptides and associated proteins per sample
- All peptides identified
- All adaptor proteins used
- Phosphorylation position within the peptide [optional]
Table columns: Peptide Sequence | Description/IPI Accession | Crk I | Crk L | Crk II | NTAP
Rows shown (peptide sequence | description/IPI accession):
K.ADAAEFWR.K | CBL IPI
R.QEAVALLQGQR.H | Isoform Crk-II IPI

NS Filter/Consolidation
Input: Quantitation Output File
- Collapse Peptides To Protein Quantity
- Remove Insignificant Proteins
- Heatmap Analysis
- Remove Known Contaminants
Table columns: Peptide Sequence | Description/IPI Accession | Crk I | Crk L | Crk II | NTAP
Rows shown (peptide sequence | description/IPI accession):
K.ADAAEFWR.K | CBL IPI
K.ALVIAHNNIEMAK.N | CBL IPI
R.QEAVALLQGQR.H | Isoform Crk-II IPI
K.IHYLDTTTLIEPVAR.S | Isoform Crk-II IPI
Quantity Is Normalized For Each Row

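A minimal Python sketch of the "collapse peptides to protein quantity" and per-row normalization steps named on this slide. The slide does not state how peptide quantities are combined or how a row is normalized, so summing peptides per protein and dividing each row by its maximum are assumptions made for illustration. The peptide sequences come from the slide; the quantities are invented.

```python
from collections import defaultdict

SAMPLES = ["Crk I", "Crk L", "Crk II", "NTAP"]

# (peptide sequence, protein, quantity per sample); quantities are invented.
peptides = [
    ("K.ADAAEFWR.K", "CBL", [8.0, 4.0, 2.0, 0.0]),
    ("K.ALVIAHNNIEMAK.N", "CBL", [2.0, 4.0, 2.0, 0.0]),
    ("R.QEAVALLQGQR.H", "Crk-II", [1.0, 0.0, 6.0, 0.0]),
    ("K.IHYLDTTTLIEPVAR.S", "Crk-II", [1.0, 0.0, 2.0, 0.0]),
]


def collapse_to_proteins(rows):
    """Sum peptide quantities sample by sample for each protein (assumed rule)."""
    totals = defaultdict(lambda: [0.0] * len(SAMPLES))
    for _peptide, protein, quantities in rows:
        for i, quantity in enumerate(quantities):
            totals[protein][i] += quantity
    return dict(totals)


def normalize_row(quantities):
    """Scale one row so its largest value becomes 1.0 (assumed normalization)."""
    peak = max(quantities)
    if peak == 0:
        return list(quantities)  # an all-zero row is left unchanged
    return [quantity / peak for quantity in quantities]


proteins = collapse_to_proteins(peptides)
normalized = {name: normalize_row(row) for name, row in proteins.items()}

for name, row in normalized.items():
    print(name, dict(zip(SAMPLES, row)))
```

Normalizing per row makes proteins of very different abundance comparable across the four samples, which is what a heatmap of relative quantities needs.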
NS Filter/Consolidation: Remove Insignificant Proteins
Protein Enrichment Factor = (Median - NTAP Median) / Protein NTAP

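The enrichment factor can be sketched in Python as follows. The slide's terms are not fully defined, so this sketch makes several assumptions: "Median" is read as the median of a protein's quantities across the adaptor samples, "NTAP Median" as the median of its quantities in the NTAP control, and the denominator "Protein NTAP" as that same NTAP median. The cutoff `MIN_ENRICHMENT` and all quantities are hypothetical.

```python
from statistics import median

MIN_ENRICHMENT = 2.0  # hypothetical cutoff; the slide gives no threshold


def enrichment_factor(adaptor_quantities, ntap_quantities):
    """(adaptor median - NTAP median) / NTAP median, under the assumptions above."""
    adaptor_median = median(adaptor_quantities)
    ntap_median = median(ntap_quantities)
    if ntap_median == 0:
        # Absent from the control: treat any signal as fully enriched (assumption).
        return float("inf") if adaptor_median > 0 else 0.0
    return (adaptor_median - ntap_median) / ntap_median


# protein -> (adaptor sample quantities, NTAP control quantities); values invented.
protein_quantities = {
    "CBL": ([10.0, 8.0, 4.0], [1.0, 1.0]),
    "Background protein": ([3.0, 2.0, 4.0], [2.0, 4.0]),
}

significant = {
    name
    for name, (adaptor, ntap) in protein_quantities.items()
    if enrichment_factor(adaptor, ntap) >= MIN_ENRICHMENT
}
print(sorted(significant))
```

A protein that is about as abundant in the NTAP control as in the adaptor samples scores near zero and is dropped, which matches the intent of removing non-specific binders.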
NS Filter/Consolidation: Remove Known Contaminants
Configuration File of Known Contaminants

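A minimal Python sketch of filtering against a contaminant configuration file. The slide does not show the file format, so the one-name-per-line layout with `#` comments is an assumption, and the entries (keratin and trypsin, both common mass spectrometry contaminants) are illustrative rather than taken from the lab's file.

```python
import io


def load_contaminants(handle):
    """Read one contaminant name per line, skipping blanks and '#' comments."""
    names = set()
    for line in handle:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            names.add(entry.lower())
    return names


def remove_contaminants(proteins, contaminants):
    """Keep only proteins whose name is not in the contaminant set."""
    return {
        name: quantities
        for name, quantities in proteins.items()
        if name.lower() not in contaminants
    }


# Stand-in for the configuration file; a real run would open a file on disk.
config_file = io.StringIO("# known contaminants (illustrative)\nKeratin\nTrypsin\n")
contaminants = load_contaminants(config_file)

proteins = {"CBL": [10.0, 8.0, 4.0, 0.0], "Keratin": [5.0, 5.0, 5.0, 5.0]}
filtered = remove_contaminants(proteins, contaminants)
print(sorted(filtered))
```

Keeping the list in a file rather than in code lets the list be updated without changing the program.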
Statistical Analysis: Peptide Quantity Heatmap
[Heatmap shown in Java TreeView. Columns: Crk I, Crk L, Crk II, NTAP. Color scale: low quantity to high quantity. Labeled row groups: Cbl peptides, Crk I peptides]

Experimental Workflow: Current Workflow and New Workflow
[Flow diagram; box labels: Experimental Protocol, Complex Purification, Phospho Profiling, Mass Spectrometry, Quantitation Pipeline, IPI Proteomics Database, Quantitation Output File, [Complex] NS Filter/Consolidation]
Current Workflow: Manual Organization/Analysis
New Workflow: Quantitation Import, Local DB, External Sources, Statistical Analysis

Program Design
[Diagram: C# GUI Application; labels: Quantitation Output File, DATA IMPORT, MySQL Database, DATA QUERY, Quantitation Data Set, R Statistical Function]
Programming Language: C#
Database: MySQL
- Free
Statistical Computing: R
- Free, Accessible to C#

Data Import Methodology
1. Define Meta Data (Descriptors) And Relationships About The Quantitation Values
2. Create The Tables In MySQL
3. Access Using MySQL Connector/Net

Statistical Analysis Methodology
R Language and Environment for Statistical Computing and Graphics
- Modeling
- Statistical Tests
- Clustering
- Heatmaps
Develop a Graphical User Interface To R Functions
- Access R Functions Through R-(D)COM Interface

Software Demo
Conclusion
Management Software
- Standardized approach to maintaining lab data
Analyze Data Sets
- Analysis tools highly accessible to biologists of various technical levels
Combine Data Sets
- Potentially leads to new discoveries

Future Work
- Add More Links To External Databases
- Enhance Data Query
- Include More Analysis Functions

Acknowledgments
Graeber Lab Members
- Dr. Thomas Graeber
- Dr. Björn Titz
SoCalBSI Faculty and Members
- Dr. Jamil Momand
- Dr. Sandy Sharp
- Dr. Nancy Warter-Perez
- Dr. Wendie Johnston
- Dr. Beverly Krilowicz
- Ronnie Cheng
Funding

Main Window
Main Window: Options
Batch Import
Batch Information
Sample Information
Sample Information: Technical Replicates
Feature Type
Features
Project Assignment
Batch Protocol Assignment
Biological System Assignment
Import
Batch Query
Feature Type Selection
Matrix/Heatmap Dialog
Heatmap Options
Data Import Design Methodology
1. Define Meta Data (Descriptors) About The Quantitation Values
   - Define Relationships
2. Create The Tables In MySQL
3. Develop Support for MySQL Access
   - MySQL Connector
[Schema diagram; entities and fields]
Batch: Label, Description, Experimenter, Date
Sample: Label, Description, Quality
Feature: Label, Description, Feature Type
Feature Value: Value, Value Type

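The entities on this slide can be sketched as relational tables. The platform itself used MySQL accessed from C#; this sketch uses Python's built-in `sqlite3` module as a stand-in so that it runs without a database server. The table and column names, the foreign keys (a sample belongs to a batch, a feature value links one sample to one feature), and all inserted values are assumptions for illustration, since the slide lists only the fields.

```python
import sqlite3

SCHEMA = """
CREATE TABLE batch (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    description TEXT,
    experimenter TEXT,
    date TEXT
);
CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    batch_id INTEGER NOT NULL REFERENCES batch(id),  -- assumed relationship
    label TEXT NOT NULL,
    description TEXT,
    quality TEXT
);
CREATE TABLE feature (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    description TEXT,
    feature_type TEXT
);
CREATE TABLE feature_value (
    sample_id INTEGER NOT NULL REFERENCES sample(id),    -- assumed relationship
    feature_id INTEGER NOT NULL REFERENCES feature(id),  -- assumed relationship
    value REAL,
    value_type TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Import step: one batch, its samples, one feature, one value (all invented).
batch_id = conn.execute(
    "INSERT INTO batch (label, description, experimenter, date) VALUES (?, ?, ?, ?)",
    ("Crk adaptors", "Example batch", "Example Experimenter", "2009-01-01"),
).lastrowid

sample_ids = {}
for label in ("Crk I", "Crk L", "Crk II", "NTAP"):
    cursor = conn.execute(
        "INSERT INTO sample (batch_id, label) VALUES (?, ?)", (batch_id, label)
    )
    sample_ids[label] = cursor.lastrowid

feature_id = conn.execute(
    "INSERT INTO feature (label, description, feature_type) VALUES (?, ?, ?)",
    ("K.ADAAEFWR.K", "CBL", "peptide"),
).lastrowid

conn.execute(
    "INSERT INTO feature_value (sample_id, feature_id, value, value_type) "
    "VALUES (?, ?, ?, ?)",
    (sample_ids["Crk I"], feature_id, 8.0, "quantity"),
)
conn.commit()

# Query step: every stored value for the batch, with sample and feature labels.
rows = conn.execute(
    """
    SELECT s.label, f.label, v.value
    FROM feature_value AS v
    JOIN sample AS s ON s.id = v.sample_id
    JOIN feature AS f ON f.id = v.feature_id
    WHERE s.batch_id = ?
    """,
    (batch_id,),
).fetchall()
print(rows)
```

Storing values in a long sample-by-feature table, rather than one column per sample, lets batches with different sample sets share one schema and be queried together.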