High throughput urine biomarker discovery and integrative analysis for translational medicine

Presentation transcript:

High throughput urine biomarker discovery and integrative analysis for translational medicine
Bruce Ling, Ph.D.

Biomarker: a molecular indicator of a specific biological property; a biochemical feature or facet that can be used to measure the progress of disease or the effects of treatment (NIH, 2002).

Biomarker examples
Small molecules:
- Glucose (diabetes)
- Serum cholesterol (cardiovascular disease)
Proteins:
- PSA (prostate cancer)
- HER2 (IHC) (breast cancer, Herceptin therapy)
- hCG (pregnancy test)
RNA/DNA:
- HER2 (FISH) (breast cancer)
- OncoDX (Genomic Health, breast cancer)

Pediatric Diseases
- Kidney transplant acute rejection
- Kawasaki disease
- Systemic juvenile idiopathic arthritis
- Necrotizing enterocolitis
- Inflammatory bowel disease
- Glioblastoma multiforme
- Preterm labor

Where to look for biomarkers
- Disease tissue
- Proximal/distal fluids: plasma/serum, urine, amniotic fluid, synovial fluid, CSF, saliva, tears, etc.

Why Urine?
- Patient consenting
- Non-invasive
- Easy to collect for time course analysis
- Abundant and stable

Urine is a rich resource for biomarker discovery
- Filtration of plasma: 900 liters daily
- Urine proteome: > 1500 proteins, ~30 mg/day (30% from circulation, 70% from urogenital tract)
- Urine peptidome: > 100,000 naturally occurring peptides, ~20 mg/day

Urine Peptidome: a fertile ground for biomarker discovery
1) An equal mass of protein and peptide in urine translates into at least a ten-fold greater molar abundance of peptides than proteins.
2) Urine peptide analysis is not hampered by highly abundant protein issues.
3) A one-hour, one-dimensional HPLC separation is sufficient for the analysis of more than 100,000 urine peptides, allowing high throughput biomarker discovery.
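The molar-abundance argument in point 1 is simple arithmetic: moles equal mass divided by molecular weight. The sketch below works it through for an equal mass of each; the two average molecular weights are illustrative assumptions, not values from the slides.

```python
# Molar amount = mass / molecular weight.
# Both average molecular weights are illustrative assumptions.
AVG_PROTEIN_DA = 30_000.0  # assumed average urinary protein mass (Da)
AVG_PEPTIDE_DA = 2_000.0   # assumed average urinary peptide mass (Da)


def micromoles(mass_mg: float, mw_da: float) -> float:
    """Convert a mass in mg to micromoles for a molecular weight in g/mol."""
    return mass_mg / 1000.0 / mw_da * 1e6


equal_mass_mg = 20.0  # same mass of protein and of peptide
protein_umol = micromoles(equal_mass_mg, AVG_PROTEIN_DA)
peptide_umol = micromoles(equal_mass_mg, AVG_PEPTIDE_DA)
molar_ratio = peptide_umol / protein_umol
print(f"peptide/protein molar ratio: {molar_ratio:.1f}")
```

Under these assumed masses the ratio is simply the ratio of the molecular weights, so any average peptide at least ten times lighter than the average protein gives the ten-fold figure.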

Challenges of Urine Analysis
- Dilution factor causing concentration variations
  - Solution: content normalization (creatinine; housekeeping urine-abundant peptides; equal peptide mass)
- Peptide content can be complicated by diet, exercise, circadian rhythm, circulatory levels of hormones
  - Solution: careful experimental design to avoid these confounding issues, e.g., cohorts of patients of similar demographics; multi-center sample collection and validation
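A minimal sketch of the creatinine option for content normalization: each peptide signal is divided by the creatinine concentration of the same sample, so that a dilute and a concentrated sample with the same underlying peptide content become comparable. The function name, the signal values and the creatinine values are hypothetical.

```python
def normalize_to_creatinine(peptide_signal, creatinine_mg_dl):
    """Divide each peptide signal by the sample's creatinine concentration."""
    if creatinine_mg_dl <= 0:
        raise ValueError("creatinine concentration must be positive")
    return {mz: s / creatinine_mg_dl for mz, s in peptide_signal.items()}


# Hypothetical data: the same patient sampled when urine is dilute and
# when it is four times more concentrated.
dilute = {1912.0: 50.0, 1047.6: 10.0}
concentrated = {1912.0: 200.0, 1047.6: 40.0}

norm_dilute = normalize_to_creatinine(dilute, creatinine_mg_dl=25.0)
norm_concentrated = normalize_to_creatinine(concentrated, creatinine_mg_dl=100.0)
```

After normalization the two profiles agree, which is the behaviour the normalization step is meant to provide.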

Urine Peptidome Profiling by Mass Spectrometry

Biomarker HTS Flows
1. Sample peptides (Class 1: 1, 2, 3…; Class 2: 1, 2, 3…; Class 3: 1, 2, 3…)
2. RP-HPLC: collect 120 fractions on MALDI plates
3. MALDI-TOF MS on each fraction
4. MASS-Conductor®: machine learning feature discovery and classification
5. Candidate biomarkers, etc.

Biomarker Confirmation/Validation
[Diagram labels: Identify differentiating markers; New sample sets; Validation; New center sample sets; Higher throughput quantitative methods (quantitative MS, immunoassay); Testing; New longitudinal sample sets; Exploration; Protein ID by MS/MS]

Data Challenges in Urine Peptide Biomarker Discovery
- Data tracking and storage
  - Patient demographics
  - Peptide profiles in various fractions/samples
- Dimension reduction and data reduction
  - Multi-dimensional data sets
  - Huge data sets and lots of noise
- A project of 40 samples produced gigabytes (GB) of raw data in a MySQL database
- Data dimensions: HPLC fraction, peptide mass, patient ID, patient demographics, peptide signal
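One way to picture the tracked dimensions (HPLC fraction, peptide mass, patient ID, patient demographics, peptide signal) is as a small relational schema. The sketch below uses Python's built-in sqlite3 as a stand-in for the MySQL database mentioned on the slide; the table names, column names and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    patient_id   TEXT PRIMARY KEY,
    demographics TEXT
);
CREATE TABLE peptide_signal (
    patient_id    TEXT    NOT NULL REFERENCES patient(patient_id),
    hplc_fraction INTEGER NOT NULL,
    peptide_mass  REAL    NOT NULL,
    signal        REAL    NOT NULL,
    PRIMARY KEY (patient_id, hplc_fraction, peptide_mass)
);
""")

# Hypothetical rows: one patient, one peptide mass seen in two fractions.
conn.execute("INSERT INTO patient VALUES (?, ?)", ("P001", "class=AR"))
conn.executemany(
    "INSERT INTO peptide_signal VALUES (?, ?, ?, ?)",
    [("P001", 13, 1912.05, 840.0), ("P001", 14, 1912.05, 310.0)],
)

rows = conn.execute(
    "SELECT hplc_fraction, signal FROM peptide_signal "
    "WHERE patient_id = ? ORDER BY hplc_fraction",
    ("P001",),
).fetchall()
print(rows)
```

Keying the signal table on patient, fraction and mass keeps every peak traceable to the fraction it came from.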

Decode the Urine Peptidome

                  Patient 1  Patient 2  Patient 3  Patient 4  …
peptide 1         signal     …          …          …          …
peptide 2         signal     …          …          …          …
peptide 3         …          …          …          …          …
peptide 4         …          …          …          …          …
peptide 5         …          …          …          …          …
…                 …          …          …          …          …
peptide 100,000   …          …          …          …          …

???

Decode the Urine Peptidome
1. Peak finding in each fraction for each sample
2. Align the peaks across the samples
3. Create common peak index
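The alignment and indexing steps can be sketched as follows, assuming peaks have already been found in each sample. This is a simple greedy clustering on m/z with a ppm tolerance, not the MASS-Conductor algorithm; the function name, the tolerance and the example peaks are hypothetical.

```python
def build_peak_index(samples, tol_ppm=50.0):
    """Cluster m/z values from all samples into a common peak index.

    samples: dict of sample name -> list of (mz, intensity) peaks.
    Returns (index_mz, matrix), where matrix[name][k] is the intensity of
    index peak k in that sample, or 0.0 if the peak is absent.
    """
    all_peaks = sorted(
        (mz, inten, name)
        for name, peaks in samples.items()
        for mz, inten in peaks
    )
    clusters = []  # each cluster is a list of (mz, intensity, sample name)
    for peak in all_peaks:
        if clusters:
            ref_mz = clusters[-1][0][0]  # lowest m/z of the current cluster
            if (peak[0] - ref_mz) / ref_mz * 1e6 <= tol_ppm:
                clusters[-1].append(peak)
                continue
        clusters.append([peak])

    # The index m/z of a cluster is the mean m/z of its member peaks.
    index_mz = [sum(p[0] for p in c) / len(c) for c in clusters]
    matrix = {name: [0.0] * len(clusters) for name in samples}
    for k, cluster in enumerate(clusters):
        for _, inten, name in cluster:
            matrix[name][k] = max(matrix[name][k], inten)
    return index_mz, matrix


samples = {
    "s1": [(1000.00, 10.0), (1500.00, 5.0)],
    "s2": [(1000.03, 8.0), (2000.00, 3.0)],
}
index_mz, matrix = build_peak_index(samples)
```

The result is the peptide-by-patient matrix: one column per index peak, one row per sample, with zeros where a peak was not observed.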

Data mining issues in Biomarker Discovery
- Peak number >> sample number
- False discovery in multiple hypothesis testing
- Multi-class classification and validation
- Discovery of biomarker signature

MASS-Conductor® Platform Supports Urine Peptide Biomarker Discovery
- Robust loading and tracking of high volume proteomic data
- Robust reduction of raw data sets, enabling efficient and accurate peak finding, alignment and indexing
- Robust and automatic high throughput computing for expensive algorithms
- Integration of FDR analysis and multi-class classification algorithms to obtain statistically differentiating feature panels
- Automatic generation of data reports with graphics

MASS-Conductor® Platform High Throughput Computing

Urine Biomarker Discovery: Case Study

Kidney Transplant Rejection
- Kidney transplant is the most effective treatment for end stage renal disease
- 16,000 per year in the US
- Grafts monitored by biopsy
- Unmet needs:
  - Less invasive and more frequent monitoring
  - Acute rejection vs. stable graft
  - Acute rejection vs. BK virus

Allograft Acute Rejection Urine Biomarker Discovery
1. LCMS data reduction: peak finding, peak alignment, peak indexing
2. Supervised data mining: feature selection, training, testing
3. Unsupervised data mining: 2D clustering
4. Quantitative LCMS validation

Biomarker Panel: Supervised Analysis

Biomarker Panel: Unsupervised Analysis

Urine THP Peptide Biomarkers Fall into a Tight Cluster in C-Terminus
[Figure: THP domain diagram with labels NH2, EGF-like Domain I, EGF-like Domain II, EGF-like Domain III, ZP-domain, COOH]
1. R.VLNLGPITR.K
2. G.SVIDQSRVLNLGPI.T
3. I.DQSRVLNLGPITR.K
4. R.SGSVIDQSRVLNLGPI.T
5. S.VIDQSRVLNLGPITR.K
6. R.SGSVIDQSRVLNLGPIT.R
7. G.SVIDQSRVLNLGPITR.K
8. R.SGSVIDQSRVLNLGPITR.K
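The "tight cluster" can be checked directly from the eight sequences: every peptide is a substring of the longest one, and their overlap defines a shared core. The sketch below uses only the peptides listed on the slide; the variable names are illustrative.

```python
# Peptides as listed, in flank.sequence.flank notation.
peptides = [
    "R.VLNLGPITR.K",
    "G.SVIDQSRVLNLGPI.T",
    "I.DQSRVLNLGPITR.K",
    "R.SGSVIDQSRVLNLGPI.T",
    "S.VIDQSRVLNLGPITR.K",
    "R.SGSVIDQSRVLNLGPIT.R",
    "G.SVIDQSRVLNLGPITR.K",
    "R.SGSVIDQSRVLNLGPITR.K",
]

# Drop the flanking residues and use the longest peptide as the reference.
sequences = [p.split(".")[1] for p in peptides]
reference = max(sequences, key=len)

spans = []
for seq in sequences:
    start = reference.find(seq)
    if start < 0:
        raise ValueError(f"{seq} does not map onto {reference}")
    spans.append((start, start + len(seq)))

# The shared core is the region covered by every peptide.
core_start = max(s for s, _ in spans)
core_end = min(e for _, e in spans)
shared_core = reference[core_start:core_end]
print(reference, shared_core)
```

All eight peptides map onto the 18-residue reference, which is what places them in one short stretch of the protein.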

MRM: Multiplexed Quantitative Biomarker Validation

ROC Analysis of THP Peptide Biomarkers Quantified by MRM
Sample: urine peptides
[Figure: ROC curves (sensitivity vs. 1 - specificity) for THP peptides VIDQSRVLNLGPITR and SGSVIDQSRVLNLGPITR in two comparisons, AR versus STA and AR versus BK. AUC values shown on the panels: 0.83, 0.74, 0.92, 0.83]
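For readers unfamiliar with the AUC figures, the area under a ROC curve equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. A minimal sketch with hypothetical scores (not the MRM data from the slide):

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair. Higher scores are assumed to
    indicate the case class.
    """
    wins = 0.0
    for c in case_scores:
        for n in control_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores for two groups of three samples.
ar_scores = [0.9, 0.8, 0.4]
sta_scores = [0.5, 0.3, 0.2]
auc = roc_auc(ar_scores, sta_scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect separation of the two groups.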

AR Urine Biomarkers are Collagen and THP Peptides

A. Collagen peptide biomarkers
1. COL1A APGDRGEPGPPGP
2. COL1A APGDRGEPGPPGP
3. COL1A APGDRGEPGPPGPA
4. COL1A DAGPVGPPGPPGPPG
5. COL1A GPPGPPGPPGPPGPPS
6. COL1A NGDDGEAGKPGRPGERGPPGP
7. COL1A NGDDGEAGKPGRPGERGPPGP
8. COL1A NGDDGEAGKPGRPGERGPPGPQ
9. COL1A GKNGDDGEAGKPGRPGERGPPGPQ
10. COL1A GKNGDDGEAGKPGRPGERGPPGPQ
11. COL1A GPPGKNGDDGEAGKPGRPGERGPPGPQ
12. COL1A PPGEAGKPGEQGVPGDLG
13. COL1A PPGEAGKPGEQGVPGDLGAPGP
14. COL1A ADGQPGAKGEPGDAGAKGDAGPPGP
15. COL1A ADGQPGAKGEPGDAGAKGDAGPPGP
16. COL1A ADGQPGAKGEPGDAGAKGDAGPPGPA
17. COL1A ADGQPGAKGEPGDAGAKGDAGPPGPA
18. COL1A GPPGADGQPGAKGEPGDAGAKGDAGPPGPA
19. COL1A EGSPGRDGSPGAKGDRGETGPA
20. COL1A AEGSPGRDGSPGAKGDRGETGPA
21. COL1A ESGREGAPGAEGSPGRDGSPGAKGDRGETGPA
22. COL1A SPGPDGKTGPPGPA
23. COL1A DGKTGPPGPAGQDGRPGPPGPPG
24. COL1A GRPGEVGPPGPPGPAGEKGSPG
25. COL1A DGPPGRDGQPGHKGERGYPG
26. COL1A NDGPPGRDGQPGHKGERGYPG
27. COL2A SNGNPGPPGPPGPSGKDGPK
28. COL3A NDGAPGKNGERGGPGGPGP
29. COL3A DGESGRPGRPGERGLPGPPG
30. COL3A DAGAPGAPGGKGDAGAPGERGPPG
31. COL3A GAPGQNGEPGGKGERGAPGEKGEGGPPG
32. COL3A KNGETGPQGPPGPTGPGGDKGDTGPPGPQG
33. COL4A PGQQGNPGAQGLPGP
34. COL4A GLPGLPGPKGFA
35. COL4A GEPGPPGPPGNLG
36. COL4A GLPGPPGPKGPRG
37. COL4A GPPGPPGPLGPLG
38. COL4A PGLDGMKGDPGLP
39. COL4A GIKGEKGNPGQPGLPGLP
40. COL4A GLPGPPGPPGPPS
41. COL5A KGPQGKPGLAGMPGANGPP
42. COL7A PGLPGQVGETGKPGAPGR
43. COL9A KRPDSGATGLPGRPGPPG
44. COL11A GPPGPPGLPGPQGPKG
45. COL11A DGPPGPPGERGPQGPQGPV
46. COL17A LPGPPGPPGSFLSN
47. COL18A GPPGPPGPPGPPS

B. THP peptide biomarkers
1. THP VLNLGPITR
2. THP SGSVIDQSRV
3. THP DQSRVLNLGPI
4. THP SRVLNLGPITR
5. THP IDQSRVLNLGPI
6. THP VIDQSRVLNLGPI
7. THP DQSRVLNLGPITR
8. THP SVIDQSRVLNLGPI
9. THP GSVIDQSRVLNLGPI
10. THP IDQSRVLNLGPITR
11. THP SGSVIDQSRVLNLGPI
12. THP VIDQSRVLNLGPITR
13. THP SGSVIDQSRVLNLGPIT
14. THP SVIDQSRVLNLGPITR
15. THP SGSVIDQSRVLNLGPITR
16. THP SGSVIDQSRVLNLGPITRK

Hypothesis of Molecular Mechanisms for AR Urine Biomarkers
- Hypothesis 1: gene expression alteration in AR
- Hypothesis 2: protease expression alteration in AR
- Hypothesis 3: protease inhibitor expression alteration in AR

Transcriptome Analysis of Allograft Biopsies
1. Exploration analysis: exploration data set 6 (TGCG), Affymetrix HG-U95Av2 (AR: PBL, n=6; BX, n=7) (STA: PBL, n=9; BX, n=10) (NR: PBL, n=8; BX, n=5) (HC: PBL, n=8; BX, n=9)
2. Confirmation analysis: confirmation data set (Stanford), Affymetrix HU-133 (AR: BX, n=37) (HC: BX, n=23)
3. Validation analysis: validation data set (Stanford), quantitative RT-PCR (AR: BX, n=14) (STA: BX, n=10) (HC: BX, n=10)
Analyses performed:
- Expression analysis of peptide biomarkers' corresponding precursor genes
- Expression analysis of metzincin superfamily genes
- Expression analysis of protease inhibitor genes
Discovery: mechanism biomarkers

Parental Protein Expression Analysis of Allograft Biopsies Contrasting Urine Peptide Biomarker Changes

Genome-wide Protease and Protease Inhibitor Expression Analysis of Allograft Biopsies Revealed Up Regulation of MMP7, SERPING1, TIMP1

Allograft Biopsies Expression Biomarkers Effectively Classified AR
[Figure: signal intensity of TIMP1, COL1A2, UMOD, SERPING1, MMP7 and COL3A in AR, STA and HC biopsies, with a ROC curve (sensitivity vs. specificity); mean AUC: 0.98]

Proposed Underlying Mechanisms for the AR Urine Peptide Biomarkers

Hypothesis: Collagen Breakdown and Deposition in AR
Integrated analysis of urine peptidomics and renal biopsy gene expression:
- Urine peptide analysis by MS: decreased collagen peptides in AR
- Biopsy gene expression (GSE): increased TIMP1 (collagenase inhibitor) in AR; increased MMP7 in AR; increased collagen expression in AR
- Decreased collagenase activity in AR tissue; decreased collagen breakdown in AR; increased collagen deposition in AR
- More graft fibrosis after an AR episode?

Urine Biomarker Discovery: Case Study

Unmet Medical Needs in Necrotizing Enterocolitis
Necrotizing enterocolitis (NEC) is a medical condition primarily seen in premature infants, where portions of the bowel undergo necrosis (tissue death). Despite decades of research, the pathogenesis of NEC remains obscure, the diagnostic parameters are unclear, and both treatment and prevention strategies remain inadequate and dated. There is a real need for better molecular identification of NEC in order to assist in altering its onset and progression.

Clinical parameters do not adequately predict outcome in Necrotizing Enterocolitis

MS NEC Clinical Parameters Based Model Stratifies Necrotizing Enterocolitis Patients
[Figure: rate of NEC-S occurrence (% patients) versus NEC score for the low, intermediate and high risk groups. Group sizes shown on the panels: M: n = 2, S: n = 15; M: n = 16, S: n = 10; M: n = 26, S: n = 0]

NEC Urine Naturally Occurring Peptide Biomarker Discovery
1. LCMS data reduction: peak finding, peak alignment, peak indexing
2. Supervised data mining: feature selection, training, testing
3. Unsupervised data mining: 2D clustering

Biomarker Panel: Supervised Analysis (Training and Testing)

Biomarker Panel: Unsupervised Analysis

Biomarker Panel: Combined data set and ROC analysis

Permutation based FDR analysis of the biomarker signature
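A minimal sketch of a permutation-based FDR estimate, assuming two classes and a simple difference-of-means statistic (the statistic used in the original analysis is not stated on the slide). Class labels are shuffled many times; the average number of features passing the threshold under shuffled labels, divided by the number passing with the true labels, estimates the FDR. Function names and data are hypothetical.

```python
import random


def mean_diff(values, labels):
    """Absolute difference between the class-1 and class-0 means."""
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_fdr(features, labels, threshold, n_perm=200, seed=0):
    """Estimate the FDR among features whose statistic is >= threshold."""
    rng = random.Random(seed)
    called = sum(mean_diff(f, labels) >= threshold for f in features)
    if called == 0:
        return 0.0  # nothing called, so no false discoveries by convention
    shuffled = list(labels)
    null_calls = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_calls += sum(mean_diff(f, shuffled) >= threshold for f in features)
    return min(1.0, (null_calls / n_perm) / called)


# Hypothetical data: one separating feature and one uninformative feature.
labels = [1] * 5 + [0] * 5
features = [
    [10, 10, 10, 10, 10, 0, 0, 0, 0, 0],
    [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
]
fdr = permutation_fdr(features, labels, threshold=8.0)
```

Because the null distribution comes from the data itself, this approach avoids distributional assumptions, which matters when peak number far exceeds sample number.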

Proposed Ensemble Approach to Diagnose Necrotizing Enterocolitis Patients
Flow: NEC patients, clinical model, NEC risk, urine biomarkers, NEC diagnosis.
[Figure: the discovery set is scored with the MS NEC medical scoring model into NEC risk groups (MS Low, MS Intermediate, MS High), then classified as M or S by urine peptide based classification and compared with the clinical diagnosis (diagnosed as M, diagnosed as S). Values shown: percent agreement with clinical diagnosis 70, 96, 83.3 %; MS Low n=7; clinical diagnosis N/A n=3; P = 0.01]

TABLE 2. Overlapping Urine Peptide Biomarkers for NEC
Columns in the original table: Cluster, Protein, Location, MH+, Sequence, Relative Abundance, U test P value, MS

Cluster  Protein  Sequence                         U test P value
1        COL1A    RGppGPPGKNGDDGEAGKPGRPGERGPpGp   E-03
1        COL1A    RGPPGppGKNGDDGEAGKpGRpGERGpPGP   E-03
2        COL1A    ARGEPGNIGFPGPKGPTGDPGKNGDKGHAG   E-05
3        COL1A    GRDGNpGNDGpPGRDGQpGHKGERGYpG     E-03
3        COL1A    DGpPGRDGQpGHKGERGYpG             E-03
4        COL1A    AGpPGKAGEDGHpGKPGRpGERG          E-02
4        COL1A    ARGpAGpPGKAGEDGHpGKPGRpGERG      E-02
4        COL1A    ARGpAGpPGKAGEDGHpGKpGRpGERG      E-02
4        COL1A    GpPGKAGEDGHPGKPGRpGERG           E-02
4        COL1A    GPpGKAGEDGHpGKPGRpGERG           E-02
5        COL3A    GApGQNGEPGGKGERGApGEKGEGGpPG     E-03
6        COL3A    NRGERGSEGSPGHPGQpGppGppGAPGP     E-02
6        COL3A    NRGERGSEGSpGHpGQpGPPGPpGApGp     E-02

Proposed Underlying Mechanisms of Urine Naturally Occurring Peptide Biomarkers

Prediction of drug response in SJIA
[Figure, panels A to C: drug response (PR, CR) to Enbrel and Anakinra]

Urine peptide biomarkers: the discovery process
1. Sample peptides (Class 1: 1, 2, 3…; Class 2: 1, 2, 3…; Class 3: 1, 2, 3…)
2. SCX/RP-HPLC: collect 100 fractions on MALDI plates
3. MALDI-TOF MS for each sample LC fraction (m/z, abundance)
4. MASS-Conductor®: machine learning feature discovery and classification
5. Biomarker panels
6. MSMS protein ID
7. Prospective validation with quantitative mass spec (MRM)

Interdisciplinary Skills for Biomarker Discovery: biology, analytic biochemistry, biostatistics, computer science, medicine

Q & A

Genome vs. Proteome

The Isotope Envelope

Allograft Acute Rejection Urine Biomarker Discovery
MASS-Conductor: urine biomarker discovery and testing
1. LCMS raw spectra: peak finding, peak alignment, feature extraction, unique features
2. Predictor discovery in training set: training set (10 AR, 10 STA, 6 BK); classifier training; six-fold cross-validation; classify AR, STA, BK
3. Predictor confirmation in testing set: testing set (10 AR, 10 STA, 4 BK); predictor sets; linear discriminant analysis (LDA); calculate estimates of predicted class probabilities; analysis of goodness of class separation
4. Pattern analysis in all set: cluster analysis; all set (20 AR, 20 STA, 10 BK, 10 NS, 10 HC); predictors of 40 peptides; 2D hierarchical clustering; heatmap plotting
5. Platform validation: correlation analysis; 2 peptide biomarkers; MRM assay development; MRM assay on AR, STA, BK, NS, HC training + testing samples; LC-MALDI to MRM
Also shown in the diagram: remove background signals; normalization
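The six-fold cross-validation in the training step can be sketched as below. The fold assignment balances the classes across folds; the classifier is a nearest-centroid rule used here only as a compact stand-in for the LDA named on the slide. Class sizes follow the training set (10 AR, 10 STA, 6 BK), but the two-dimensional features are synthetic and all function names are hypothetical.

```python
import random


def stratified_folds(labels, k=6, seed=0):
    """Assign sample indices to k folds, spreading each class across folds."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    folds = [[] for _ in range(k)]
    slot = 0
    for lab in sorted(by_class):
        members = by_class[lab]
        rng.shuffle(members)
        for i in members:
            folds[slot % k].append(i)
            slot += 1
    return folds


def nearest_centroid(train_x, train_y, query):
    """Predict the class whose mean feature vector is closest to query."""
    centroids = {}
    for lab in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == lab]
        centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(lab):
        return sum((a - b) ** 2 for a, b in zip(query, centroids[lab]))

    return min(centroids, key=sq_dist)


def cross_validated_accuracy(x, y, k=6, seed=0):
    """Hold out each fold in turn, train on the rest, score the held-out fold."""
    correct = 0
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train_x = [row for i, row in enumerate(x) if i not in held]
        train_y = [lab for i, lab in enumerate(y) if i not in held]
        correct += sum(
            nearest_centroid(train_x, train_y, x[i]) == y[i] for i in held_out
        )
    return correct / len(y)


# Synthetic, well separated features for 10 AR, 10 STA and 6 BK samples.
labels = ["AR"] * 10 + ["STA"] * 10 + ["BK"] * 6
features = (
    [[0.1 * i, 0.0] for i in range(10)]
    + [[5.0 + 0.1 * i, 0.0] for i in range(10)]
    + [[0.1 * i, 5.0] for i in range(6)]
)
accuracy = cross_validated_accuracy(features, labels)
```

Every sample is held out exactly once, so the accuracy is computed only on samples the classifier did not see during training.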

Correlation Studies Between LCMS and MRM Platforms

Analytical Challenges: high complexity and wide dynamic range

Plasma Proteins: Big Trees, Bushes, Grass + Bugs
Tirumalai, R. S. (2003) Mol. Cell. Proteomics 2

Analytical Challenges: detect low abundance proteins
- Big Trees = HAP (high abundance proteins)
- Bushes = MAP (medium abundance proteins)
- Grass + Bugs = LAP (low abundance proteins)

Bottom up LCMS Biomarker Discovery
- Sample preparation: protein mixture, digestion, digested peptides, peptide purification
- Multi-dimensional chromatography: SCX, RP
- Mass-spec: spectra, MS/MS protein ID
- Data analysis

Mass Spectrometry In A Nutshell
[Figure: ion source (hν), mass analyzer (F = ma, time), detector; the output is an MS spectrum plotted against m/z]
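The "time" and "F = ma" labels suggest a time-of-flight analyzer, consistent with the MALDI-TOF instruments used elsewhere in the talk. Assuming a simple linear TOF tube, an ion accelerated through a potential V gains kinetic energy zeV = ½mv², so its flight time over a drift length L is t = L·sqrt(m / (2zeV)). The accelerating voltage and tube length below are illustrative assumptions, not instrument settings from the slides.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
DALTON = 1.66053906660e-27  # unified atomic mass unit, kg


def flight_time_us(mz, accel_volts=20_000.0, tube_m=1.0):
    """Flight time in microseconds for an ion of the given m/z.

    Assumes an idealized linear TOF: all ions start at rest and drift a
    field-free length tube_m after acceleration through accel_volts.
    """
    # z*e*V = 1/2 * m * v^2  =>  v = sqrt(2 * e * V / (m/z in kg per charge))
    velocity = math.sqrt(2.0 * E_CHARGE * accel_volts / (mz * DALTON))
    return tube_m / velocity * 1e6


t_1000 = flight_time_us(1000.0)
t_4000 = flight_time_us(4000.0)
print(f"m/z 1000: {t_1000:.1f} us, m/z 4000: {t_4000:.1f} us")
```

Flight time grows with the square root of m/z, so a four-fold heavier ion takes twice as long to reach the detector.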

MS/MS Peptide Sequencing
[Figure: source (hν), 1st mass analyzer, gate, collision cell, fragment ions, 2nd mass analyzer, detector; the output is an MS/MS spectrum]

Differential Expression Analysis in Quantitative LCMS
- MS based (Peptide 1: M/Z; Peptide 2: M/Z'; Peptide 3: M/Z''): MASS-Conductor® exhaustive MS comparison
- MS/MS based (Peptide 1: protein ID; Peptide 2: protein ID'; Peptide 3: protein ID''): spectrum counting; labeling, e.g. iTRAQ

Qualitative Comparative Analysis – Spectrum Counting
[Figure: protein X in Sample A and Sample B is identified by MS/MS (protein identification); the number of detected peptides in each sample is compared in an IF ... THEN rule to infer the relative concentration of protein X]

MSMS Based Comparative Analysis – iTRAQ (isobaric tag)
- 4 samples (S1, S2, S3, S4): denature and digest in parallel, then mix
- Tag structure: Reporter-Balance-Peptide
- MS: intact labeled peptides from the 4 samples have identical m/z; they are chemically identical and migrate together in HPLC
- MS/MS: peptide fragments (b and y ions) are EQUAL; reporter ions are DIFFERENT
- Reporter ions: 114, 115, 116, 117
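Because the four labeled forms of a peptide differ only in their reporter ions, relative quantitation reduces to comparing the reporter intensities in one MS/MS spectrum. A minimal sketch with hypothetical intensities, using the 114 channel as the reference:

```python
REPORTER_MZ = (114, 115, 116, 117)  # reporter ions named on the slide


def reporter_ratios(intensities, reference=114):
    """Express each reporter intensity relative to the reference channel."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference reporter intensity must be positive")
    return {mz: intensities[mz] / ref for mz in REPORTER_MZ}


# Hypothetical reporter intensities from one MS/MS spectrum.
spectrum = {114: 1000.0, 115: 2000.0, 116: 500.0, 117: 1000.0}
ratios = reporter_ratios(spectrum)
print(ratios)
```

In this made-up spectrum the peptide is twice as abundant in the sample labeled 115 and half as abundant in the sample labeled 116, relative to the 114 sample.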

Issues in MS/MS Based Analysis
- More abundant proteins tend to get more sequence coverage in MS/MS, masking the MS/MS opportunities for peptides coming from low abundance proteins
- Spectrum counting is semi-quantitative
- iTRAQ is not scalable for moderate throughput biomarker discovery: iTRAQ cost; iTRAQ tag number

MS Based Comparative Analysis – Targeted MASS-Conductor® Approach
1. ALL peptide MS signals will be exhaustively compared, leading to the discovery of statistically differential signals.
2. ONLY peptides of interest, usually a very small number, will be tried with full attention for the MS/MS ID. If necessary, MS/MS signals can be enhanced by more loading or fraction enrichment before MS.

MASS-Conductor® Platform Data Mining Requirements
- Robust handling of high volume proteomic data
  - e.g. one SCX fraction and 120 RP fractions, 40 sample project, MYSQL data storage
  - Raw data: GB scale
  - Peak data: 4.4 GB
- Robust and automatic high throughput computing
- Robust reduction of raw data sets, enabling efficient and accurate feature discovery
- Sophisticated data mining approaches to obtain statistically differentiating features
- Graphic data analysis

“MASS-Conductor ®”
An in-house software platform, including JAVA, PERL, R, RUBY and MYSQL implementations
- Interface with AB and Thermo mass specs
  - Convert LC-MALDI T2D files in a batch manner to text files
- Extract mono-isotopic LC-MALDI peaks
- Track multiple scans of the same MALDI plate and the HPLC SCX/RP fractions where each peak resides
- Cluster mono-isotopic peaks across categorical samples for comparative analysis
- Interface and integrate SAM, PAM, 1d classifiers, 2d classifiers, margin tree, CART algorithm packages for differential feature selection and classification
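Mono-isotopic peak extraction can be illustrated with a simple de-isotoping pass: peaks separated by about 1.00335/z (the mass difference between 13C and 12C, divided by the charge) are treated as one isotope envelope, and only the lowest m/z peak of each envelope is kept. This is a simplified sketch, not the MASS-Conductor implementation; it assumes singly charged ions by default and no unrelated peak sitting inside an envelope. The function name, tolerance and example peaks are hypothetical.

```python
ISOTOPE_SPACING = 1.00335  # mass difference between 13C and 12C, in Da


def monoisotopic_peaks(peaks, charge=1, tol=0.02):
    """Keep the first (lowest m/z) peak of each isotope envelope.

    peaks: list of (mz, intensity) centroided peaks.
    """
    step = ISOTOPE_SPACING / charge
    mono = []
    last_mz = None
    for mz, inten in sorted(peaks):
        if last_mz is not None and abs((mz - last_mz) - step) <= tol:
            last_mz = mz  # next isotope peak of the current envelope
            continue
        mono.append((mz, inten))  # start of a new envelope
        last_mz = mz
    return mono


# Hypothetical centroided peaks: two envelopes of three and two peaks.
peaks = [
    (1000.50, 100.0), (1001.50, 55.0), (1002.51, 18.0),
    (1500.80, 40.0), (1501.80, 30.0),
]
mono = monoisotopic_peaks(peaks)
print(mono)
```

Collapsing each envelope to one peak reduces the number of features that have to be aligned and compared across samples.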

“MASS-Conductor ®”
[Diagram labels: Patient datasets; Spectrum raw datasets; Peak extraction; Peak datasets; Common feature alignment/extraction; Feature datasets; Feature indexing; Indexed datasets; Mass-Conductor database; Binary/multi-class classification; False discovery rate analysis; Biomarker discovery; Potential biomarkers; Web-service collaboration]

DATA REDUCTION in “MASS-Conductor ®”: Peak Extraction from Spectra Raw Data
[Figure: patient sample, LC-MALDI spot/fraction 13. Labels: m/z 900 – 4000, raw data points reduced to 1690 peak data points; 62 peaks; 2530 data points; m/z 1200 – 1250]
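Peak extraction is what turns thousands of raw data points into a short peak list. A minimal sketch, assuming a raw spectrum given as parallel m/z and intensity lists and a fixed intensity threshold; real peak pickers also smooth and model the baseline, which is omitted here. The function name and data are hypothetical.

```python
def extract_peaks(mz, intensity, min_intensity):
    """Return (m/z, intensity) for local maxima at or above min_intensity."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        is_local_max = (
            intensity[i] > intensity[i - 1]
            and intensity[i] >= intensity[i + 1]
        )
        if is_local_max and intensity[i] >= min_intensity:
            peaks.append((mz[i], intensity[i]))
    return peaks


# Hypothetical raw data points from a narrow m/z window.
raw_mz = [1200.0, 1200.5, 1201.0, 1201.5, 1202.0, 1202.5, 1203.0]
raw_intensity = [1.0, 2.0, 30.0, 3.0, 2.0, 12.0, 1.0]
found = extract_peaks(raw_mz, raw_intensity, min_intensity=10.0)
print(found)
```

Here seven raw points reduce to two peaks; the same idea applied to a full spectrum gives the kind of reduction described on the slide.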

DATA REDUCTION – One Peptide Example: Peak Extraction from Spectra Raw Data
[Figure: MS signal across fractions for Class A, Class B and Class C, before data reduction and after data reduction]

Urine THP Peptide Biomarkers Fall into Tight Clusters in C-Terminus
Human THP precursor, Swiss-Prot: P07911
SEQUENCE 640 AA; MW
001 MGQPSLTWML MVVVASWFIT TAATDTSEAR WCSECHSNAT CTEDEAVTTC TCQEGFTGDG
061 LTCVDLDECA IPGAHNCSAN SSCVNTPGSF SCVCPEGFRL SPGLGCTDVD ECAEPGLSHC
121 HALATCVNVV GSYLCVCPAG YRGDGWHCEC SPGSCGPGLD CVPEGDALVC ADPCQAHRTL
181 DEYWRSTEYG EGYACDTDLR GWYRFVGQGG ARMAETCVPV LRCNTAAPMW LNGTHPSSDE
241 GIVSRKACAH WSGHCCLWDA SVQVKACAGG YYVYNLTAPP ECHLAYCTDP SSVEGTCEEC
301 SIDEDCKSNN GRWHCQCKQD FNITDISLLE HRLECGANDM KVSLGKCQLK SLGFDKVFMY
361 LSDSRCSGFN DRDNRDWVSV VTPARDGPCG TVLTRNETHA TYSNTLYLAD EIIIRDLNIK
421 INFACSYPLD MKVSLKTALQ PMVSALNIRV GGTGMFTVRM ALFQTPSYTQ PYQGSSVTLS
481 TEAFLYVGTM LDGGDLSRFA LLMTNCYATP SSNATDPLKY FIIQDRCPHT RDSTIQVVEN
541 GESSQGRFSV QMFRFAGNYD LVYLHCEVYL CDTMNEKCKP TCSGTRFRSG SVIDQSRVLN
601 LGPITRKGVQ ATVSRAFSSL GLLKVWLPLL LSATLTLTFQ
Highlighted biomarker region: residues 589 to 607 (SGSVIDQSRVLNLGPITRK)