Computational Methods for Biomarker Discovery in Proteomics and Glycomics Vijetha Vemulapalli School of Informatics Indiana University Capstone Advisor: Dr. Haixu Tang
What are Biomarkers? Substances present in increased or decreased amounts in body fluids or tissues that indicate exposure, disease or susceptibility to disease. Problem Definition Background LC-MS Method Results CE Method Results Acknowledgements References
Some Uses of Biomarkers: Biomarkers are increasingly used for (1) prognosis and diagnosis of disease and (2) monitoring response to medication. With high sensitivity and throughput, proteomics and glycomics are capable of identifying many potential biomarkers simultaneously.
More on Biomarkers: Often, individual biomarkers have not been clearly identified. Even so, samples can be classified as healthy or diseased based on the signature pattern of glycans and proteins.
What is Proteomics? Proteomics: The study of proteins and proteomes using high-throughput technology. Proteome: All the proteins in a cell or bodily fluid at a given point in time under certain conditions. Proteins: Chains of amino acids; examples include enzymes, antibodies, and many hormones.
What is Glycomics? Glycoproteins: Proteins with attached polysaccharides. Glycans: Polysaccharide chains attached to a protein. Glycome: The entire set of glycans present in a cell or a bodily fluid at a given point in time under certain conditions. Glycomics: The study of the structure and function of oligosaccharides in a cell or organism.
High-Throughput Technologies to Identify Biomarkers: Genome-scale scanning (genome level); Microarrays (transcriptome level); Proteomics (proteome level); Glycomics (glycome level).
Why the Focus on Proteomics and Glycomics? [Figure: information content across Genome, Transcriptome, Proteome, and Glycome, on a scale from static to dynamic]
Biomarker Discovery using Proteomics
Liquid Chromatography / Mass Spectrometry (LC/MS): Why LC/MS for analysis of proteomes? LC spreads the complexity of the sample over time. MS identifies ions based on their mass-to-charge (m/z) value. Software already exists to identify proteins in a sample using data from an LC-MS experiment. Workflow: Protein sample -> Liquid Chromatography -> Mass Spectrometry -> Data.
Liquid Chromatography (LC): Liquid chromatography is a technique that separates ions or molecules dissolved in a solvent based on size of the ion/molecule, adsorption, ion exchange, or other similar characteristics.
What is a Mass Spectrometer? A mass spectrometer is an instrument that identifies ions based on their mass-to-charge ratio.
Visualization of LC/MS Data: 2D Map
How Do We Find Biomarkers From LC-MS Data? Workflow: Protein sample -> Liquid Chromatography -> Mass Spectrometry -> Data -> Identification software -> Identified proteins and peptides -> MSView -> Quantities of peptides identified from the sample.
How Do We Find Biomarkers From LC-MS Data? Continued… Sample 1, Sample 2, Sample 3, ..., Sample N -> MSView -> Quantification 1, Quantification 2, Quantification 3, ..., Quantification N -> Analyze to find biomarkers.
MSView. Components and purpose: Visualization (visual comparison and analysis); Relative Quantification (further analysis for biomarker discovery).
Extracted Ion Chromatogram (XIC): A chromatogram created by plotting the intensity of the signal observed at a chosen m/z value in a series of mass spectra, recorded as a function of retention time.
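A minimal Python sketch of how an XIC can be built from a series of spectra. The scan layout (a list of `(retention_time, mz_values, intensities)` tuples), the function name `extract_xic`, and the m/z tolerance are assumptions made for illustration; the slides do not describe how MSView does this.

```python
def extract_xic(scans, target_mz, tol=0.01):
    """Return (retention_times, intensities) for one chosen m/z value.

    scans: iterable of (retention_time, mz_list, intensity_list) tuples
           (hypothetical layout, one tuple per mass spectrum).
    tol:   absolute m/z half-window around target_mz (assumed value).
    """
    times, trace = [], []
    for rt, mzs, intensities in scans:
        # Sum every signal in this spectrum that falls inside the m/z window.
        total = sum(inten for mz, inten in zip(mzs, intensities)
                    if abs(mz - target_mz) <= tol)
        times.append(rt)
        trace.append(total)
    return times, trace


# Three toy spectra recorded at retention times 0, 1 and 2.
scans = [
    (0.0, [500.25, 600.10], [10.0, 5.0]),
    (1.0, [500.26, 700.40], [40.0, 2.0]),
    (2.0, [500.24, 500.90], [15.0, 9.0]),
]
times, trace = extract_xic(scans, target_mz=500.25, tol=0.02)
```

Plotting `trace` against `times` gives the chromatogram for the chosen m/z value.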
Visualization: XIC
Relative Quantification using Peptide Identification Results. Workflow in MSView: Data from LC-MS experiment -> Identification of peptides -> Extracted ion chromatogram of peptide -> Peak selection -> Area calculation.
Quantification: Peak Selection Algorithm. [Figure panels: actual data; after smoothing; selecting local maxima and minima; selecting peaks, with minima and maxima marked]
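The steps named on this slide (smoothing, selecting local maxima and minima, selecting peaks) followed by area calculation can be sketched in Python as below. The slides do not state which smoothing filter or selection thresholds are used, so the moving average, the `window` size, and the `min_height` cutoff are assumptions, and all function names are hypothetical.

```python
def moving_average(y, window=3):
    """Smooth y with a centered moving average (window shrinks at the edges)."""
    half = window // 2
    out = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def local_extrema(y):
    """Return (maxima, minima) as lists of interior indices."""
    maxima, minima = [], []
    for i in range(1, len(y) - 1):
        if y[i - 1] < y[i] >= y[i + 1]:
            maxima.append(i)
        elif y[i - 1] > y[i] <= y[i + 1]:
            minima.append(i)
    return maxima, minima


def select_peaks(y, min_height):
    """Return (left, apex, right) index triples for maxima >= min_height.

    Each peak is bounded by the nearest minimum (or trace end) on each side.
    """
    maxima, minima = local_extrema(y)
    bounds = [0] + minima + [len(y) - 1]
    peaks = []
    for apex in maxima:
        if y[apex] < min_height:
            continue
        left = max(b for b in bounds if b < apex)
        right = min(b for b in bounds if b > apex)
        peaks.append((left, apex, right))
    return peaks


def peak_area(x, y, left, right):
    """Trapezoidal area under y between indices left and right (inclusive)."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
               for i in range(left, right))


# Toy chromatogram with two peaks.
x = list(range(9))
raw = [0, 1, 5, 1, 0, 2, 8, 2, 0]
smooth = moving_average(raw, window=3)
peaks = select_peaks(smooth, min_height=2.0)
areas = [peak_area(x, smooth, left, right) for left, _, right in peaks]
```

Here the area is taken on the smoothed trace; integrating the raw trace over the same bounds is an equally plausible choice.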
Quantification: Sample Results
Biomarker Discovery using Glycomics
How does Capillary Electrophoresis (CE) work?
What does the data look like? Samples from different CE experiments:
Biomarker Discovery using Glycomics: Overview. Workflow in CE Analyze: Data from different samples -> Mapping areas corresponding to the same glycan from different samples -> Quantification of mapped peaks -> Analysis of quantification for identifying biomarkers.
Direct Comparison: Dynamic Time Warping (DTW). The DTW algorithm aligns two time series that have similar shapes but are skewed differently along the time axis.
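A minimal Python sketch of textbook DTW, shown to make the alignment idea concrete. It uses the absolute intensity difference as the local cost, which is an assumption; the cost function used in this work is not given on the slide. The function name `dtw` is hypothetical.

```python
def dtw(a, b):
    """Return (total_cost, path) for aligning sequences a and b.

    path is a list of (i, j) index pairs running from (0, 0)
    to (len(a) - 1, len(b) - 1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # both series advance
                                 cost[i - 1][j],      # only a advances
                                 cost[i][j - 1])      # only b advances
    # Trace back from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# The second series is the first one delayed by one time step.
a = [0, 1, 3, 1, 0]
b = [0, 0, 1, 3, 1, 0]
total, path = dtw(a, b)
```

The returned path pairs each point of `a` with the point of `b` it was matched to, which is what allows peaks in one trace to be mapped onto the other.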
Direct Comparison: DTW continued… The Sakoe-Chiba band is used to reduce time and space complexity. Parameters used in DTW: band width; peak extension penalty; difference in peak intensities; difference in peak direction. Reference: Stan Salvador and Philip Chan. FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space. KDD Workshop on Mining Temporal and Sequential Data, 2004.
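A hedged Python sketch of DTW restricted to a Sakoe-Chiba band. Only the band width and an extension penalty are modeled here, and treating the penalty as a fixed cost added to every non-diagonal step is one possible reading of "peak extension penalty", not a description of the author's cost function. The names `banded_dtw` and `stretch_penalty` are hypothetical.

```python
def banded_dtw(a, b, band, stretch_penalty=0.0):
    """DTW cost where only cells with |i - j| <= band are evaluated."""
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("band is too narrow to connect the end points")
    inf = float("inf")
    # For clarity the full matrix is allocated; storing only the band
    # would also give the space saving.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = abs(a[i - 1] - b[j - 1])  # difference in intensities
            cost[i][j] = d + min(
                cost[i - 1][j - 1],                # diagonal step
                cost[i - 1][j] + stretch_penalty,  # stretch b (assumed penalty)
                cost[i][j - 1] + stretch_penalty,  # stretch a (assumed penalty)
            )
    return cost[n][m]


a = [0, 1, 3, 1, 0]
b = [0, 0, 1, 3, 1, 0]
free = banded_dtw(a, b, band=1)
penalized = banded_dtw(a, b, band=1, stretch_penalty=0.5)
```

With a band of width `band`, the inner loop visits at most `2 * band + 1` cells per row, so the running time grows with `len(a) * band` rather than `len(a) * len(b)`.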
Method: Dynamic Time Warping. Consensus sample -> align to consensus sample -> align next sample to consensus sample.
Method continued… [Figure: unaligned sample and aligned sample, with corresponding peaks marked; area calculated for Peak 1]
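One way to carry a peak from the consensus sample over to an aligned sample and then calculate its area, sketched in Python. The slide shows this step only as a figure, so the approach (reading the matching index range off a DTW warping path, then trapezoidal integration) and the names `map_peak` and `trapezoid_area` are assumptions.

```python
def map_peak(path, left, right):
    """Map consensus index range [left, right] to the matching sample range.

    path: list of (consensus_index, sample_index) pairs from a DTW alignment.
    """
    matched = [j for i, j in path if left <= i <= right]
    if not matched:
        raise ValueError("no path entries fall inside the consensus peak")
    return min(matched), max(matched)


def trapezoid_area(times, signal, left, right):
    """Trapezoidal area under signal between indices left and right."""
    return sum((times[k + 1] - times[k]) * (signal[k] + signal[k + 1]) / 2.0
               for k in range(left, right))


# Warping path between a consensus trace and a sample delayed by one step.
path = [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
sample_times = [0, 1, 2, 3, 4, 5]
sample_signal = [0, 0, 1, 3, 1, 0]

# The consensus peak spans consensus indices 1..3.
s_left, s_right = map_peak(path, left=1, right=3)
area = trapezoid_area(sample_times, sample_signal, s_left, s_right)
```

Repeating this for every sample gives one area per sample for the same peak, which is the quantity compared across samples.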
Results: Corresponding peaks
Summary. Proteomics (MSView): LC-MS data and identified peptides -> quantification results for biomarker discovery. Glycomics (CE Analyze): CE data -> quantification results for biomarker discovery.
Acknowledgements: Dr. Haixu Tang (my advisor), Dr. Randy J. Arnold, Dr. Yehia Mechref, Dr. Milos Novotny, Dr. David E. Clemmer, Dr. Sun Kim, Dr. Jeong-Hyeon Choi, Dr. Stephen J. Valentine, Yin Wu, Manolo D. Plasencia, School of Informatics. Funding: NIH/NCRR, MetaCyt, Indiana University.