Identification of variables and parameters for protein data analysis in clinical diagnostics David Yang Leighton Ing Mentor: Dr. Tina Xiao JPL/NASA.

1 Identification of variables and parameters for protein data analysis in clinical diagnostics David Yang Leighton Ing Mentor: Dr. Tina Xiao JPL/NASA

2 Proteomics National Cancer Institute and Early Detection Resource Network - Clinical Diagnostics Analyzing protein signature for general characterization of normal vs. pathogenic states

3 Project Goals Characterize the experimental variables that affect Mass Spectrometry (MS) output and the necessary steps of MS data processing. What influences output, and how do we correct for those influences? What information do other users need? Identify parameters for software evaluation in the processing of MS data.

4 Methodology Research a method of protein analysis Research the mechanics Analyze how the mechanics influence the output Recognize data important to other users Identify the data processing steps for extracting a useful spectrum

5 Method of Protein Analysis Mass spectrometry: measures the quantity of molecules with specific mass-to-charge (m/z) ratios and produces output that could be used as a protein signature. Matrix-Assisted Laser Desorption/Ionization Time of Flight (MALDI-TOF) is the variant used for protein analysis.

6 Matrix Assisted Laser Desorption/Ionization (MALDI) [Diagram labels: Light, Protein sample, Mass Analyzer]

7 Time of Flight (TOF) Ionized particles are accelerated by an electric field; lighter ions (lower m/z) reach the detector sooner than heavier ones.
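
The dependence of flight time on m/z can be sketched numerically. The example below assumes the idealized textbook model, in which an ion of charge z accelerated through a potential V gains kinetic energy zeV and then drifts through a field-free tube of length L. The 20 kV voltage and 1 m drift length are illustrative assumptions, not values from the presentation.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # unified atomic mass unit in kilograms

def flight_time(mz, accel_voltage=20000.0, drift_length=1.0):
    """Return the drift time in seconds for an ion with the given m/z (Da per charge).

    Idealized model: z*e*V = (1/2)*m*v^2, so v = sqrt(2*e*V / (m/z)).
    accel_voltage (volts) and drift_length (metres) are assumed example values.
    """
    velocity = math.sqrt(2.0 * E_CHARGE * accel_voltage / (mz * DALTON))
    return drift_length / velocity

# Flight time grows with the square root of m/z:
# quadrupling m/z doubles the time of flight.
ratio = flight_time(4000.0) / flight_time(1000.0)
```

Because time scales with the square root of m/z, small timing errors translate into larger mass errors at high m/z, which is one reason calibration matters.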

8 MALDI-TOF-MS MALDI-TOF mass spectrometry of a protein sample has three elements with parameters that influence output. Inconsistencies in these parameters reduce the ability to compare samples and produce variation that is not necessarily caused by the protein composition of the sample.

9 Sample Freeze/thaw cycles Source of sample Serum vs tissue Fractionated? Digested w/ protease?

10 Laser Ionization/Desorption Plate and Matrix used in LDI Crystallization pattern Laser intensity

11 Plate and Matrix

12 Laser Ionization/Desorption Plate and Matrix used in LDI Crystallization pattern Laser intensity

13 Crystallization Randomized process Introduces variation between shots

14 Laser Ionization/Desorption Plate and Matrix used in LDI Crystallization pattern Laser intensity

15 Mass analyzer Mass calibration Internal vs external Reflectron usage Detector voltage Detector saturation

16 Mass Calibration Internal: Sample + Standard (measured together). External: Sample and Standard (measured separately).

17 Mass analyzer Mass calibration Internal vs external Reflectron usage Detector voltage Detector saturation

18 Reflectron

19 Mass analyzer Mass calibration Internal vs external Reflectron usage Detector voltage Detector saturation

20 Mass analyzer Mass calibration Internal vs external Reflectron usage Detector voltage Detector saturation

21 Output Processing Understanding the mechanics tells us what we need to do to process the output. The usability of raw output for protein signature comparison is limited.

22 Baseline Correction High-KE ions saturate the detector, resulting in a higher intensity output. (Figure source: Malyarenko et al., Enhancement of Sensitivity and Resolution of Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometric Records for Serum Peptides Using Time-Series Analysis Techniques.)
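
A minimal sketch of baseline correction follows. It uses a sliding-window minimum as the baseline estimate, which is a simple stand-in and not the time-series method of Malyarenko et al. The function name and the `half_window` parameter are hypothetical.

```python
def subtract_baseline(intensities, half_window=50):
    """Subtract a sliding-window-minimum baseline estimate from a spectrum.

    The baseline at each point is taken as the smallest intensity within
    half_window samples on either side (an assumed, simple estimator).
    """
    n = len(intensities)
    corrected = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        baseline = min(intensities[lo:hi])
        corrected.append(intensities[i] - baseline)
    return corrected

# Synthetic example: a slowly decaying baseline with one peak at index 100.
signal = [10.0 - 0.01 * i for i in range(200)]
signal[100] += 5.0
corrected = subtract_baseline(signal, half_window=20)
```

The window must be wider than the widest peak; otherwise the estimator follows the peak itself and removes real signal.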

23 Mass Calibration Required to convert the time-series output into m/z values
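
One common way to perform this conversion, sketched here under the idealized assumption that the square root of m/z is linear in flight time, is a two-point calibration against standards of known mass. The function names and the synthetic instrument constants are hypothetical.

```python
import math

def fit_two_point_calibration(t1, mz1, t2, mz2):
    """Fit sqrt(m/z) = slope * (t - t0) from two calibrant peaks.

    t1, t2 are the measured flight times of standards with known m/z values
    mz1, mz2. Returns (slope, t0).
    """
    slope = (math.sqrt(mz2) - math.sqrt(mz1)) / (t2 - t1)
    t0 = t1 - math.sqrt(mz1) / slope
    return slope, t0

def time_to_mz(t, slope, t0):
    """Convert a flight time to m/z using the fitted calibration."""
    return (slope * (t - t0)) ** 2

# Synthetic check: generate times from assumed "true" constants, then recover them.
true_slope, true_t0 = 2.0, 0.5

def synthetic_time(mz):
    return math.sqrt(mz) / true_slope + true_t0

slope, t0 = fit_two_point_calibration(
    synthetic_time(1000.0), 1000.0, synthetic_time(4000.0), 4000.0
)
recovered = time_to_mz(synthetic_time(2500.0), slope, t0)
```

With internal calibration the two standards come from the same spectrum as the sample; with external calibration they come from a separate acquisition.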

24 Normalization Scale the intensities based on the largest intensity. Improves the ability to compare samples by reducing the variability of intensity between spectra. (Source: www.psrc.usm.edu/mauritz/maldi.html)
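
Scaling by the largest intensity can be sketched as follows; the function name is hypothetical, and other schemes (for example, scaling by total ion current) are also used in practice.

```python
def normalize_to_max(intensities):
    """Scale a spectrum so that its largest intensity becomes 1.0."""
    peak = max(intensities)
    if peak <= 0:
        raise ValueError("spectrum has no positive intensity to normalize by")
    return [value / peak for value in intensities]

normalized = normalize_to_max([2.0, 4.0, 1.0])
```

After this step, two spectra acquired at different laser intensities can be compared on a common 0 to 1 scale.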

25 Smoothing Decrease effects of electrical system noise
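
A moving average is one of the simplest smoothing filters and serves here as an illustrative sketch; the presentation does not say which filter was used, and the function name and `half_window` parameter are assumptions.

```python
def moving_average(intensities, half_window=2):
    """Smooth a spectrum with a centered moving average.

    Each output point is the mean of the input points within half_window
    samples on either side; the window is truncated at the edges.
    """
    n = len(intensities)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = intensities[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

smoothed = moving_average([0.0, 0.0, 9.0, 0.0, 0.0], half_window=1)
```

A wider window suppresses more noise but also broadens and lowers genuine peaks, so the window width is a parameter worth recording alongside the data.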

26 Peak detection Identify potential masses. Reduces the number of features that need to be compared. Where am I?
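
As a hedged illustration, the sketch below detects peaks as local maxima above an intensity threshold. This is a deliberately simple stand-in and not the wavelet-based method cited in the presentation; the function name and `min_intensity` parameter are hypothetical.

```python
def find_peaks(mz, intensities, min_intensity=0.1):
    """Return (m/z, intensity) pairs for local maxima above a threshold.

    A point counts as a peak if it is at least min_intensity, strictly higher
    than its left neighbour, and not lower than its right neighbour.
    """
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y >= min_intensity and y > intensities[i - 1] and y >= intensities[i + 1]:
            peaks.append((mz[i], y))
    return peaks

mz_axis = [100, 101, 102, 103, 104, 105, 106]
spectrum = [0.0, 0.5, 0.05, 0.02, 0.9, 0.3, 0.0]
peaks = find_peaks(mz_axis, spectrum)
```

Here a seven-point spectrum is reduced to two features, which is the reduction in feature count the slide refers to.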

27 Peak alignment Aligns corresponding peaks across samples. Reduces phase variation across samples by ensuring that the same peptides share a common set of peak locations.
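
One simple way to obtain a shared set of peak locations, sketched here as an assumption and not as the method used in the presentation, is to pool the peaks from all samples, group neighbours that fall within a tolerance, and snap each peak to the nearest group centre. The function name and `tolerance` parameter are hypothetical.

```python
def align_peaks(samples, tolerance=0.5):
    """Align peak m/z values across samples onto shared locations.

    samples is a list of lists of peak m/z values. Pooled peaks are sorted and
    chained into clusters whenever consecutive values differ by at most
    tolerance; each cluster mean becomes a shared location. Every peak is then
    replaced by the nearest shared location.
    Returns (shared_locations, aligned_samples).
    """
    pooled = sorted(value for sample in samples for value in sample)
    clusters = []
    for value in pooled:
        if clusters and value - clusters[-1][-1] <= tolerance:
            clusters[-1].append(value)
        else:
            clusters.append([value])
    centers = [sum(cluster) / len(cluster) for cluster in clusters]

    def nearest(value):
        return min(centers, key=lambda center: abs(center - value))

    aligned = [[nearest(value) for value in sample] for sample in samples]
    return centers, aligned

centers, aligned = align_peaks([[1000.0, 2000.2], [1000.2, 1999.8]])
```

After alignment both samples report the same two locations, so their intensities can be compared feature by feature.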

28 Averaging of spectra Address variability between runs by averaging replicates (recall crystallization and shot variability). Averaging of multiple laser shots is often performed by the machine.
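
Replicate averaging can be sketched as a point-by-point mean, assuming the replicates have already been placed on the same m/z axis; the function name is hypothetical.

```python
def average_spectra(replicates):
    """Return the point-by-point mean of replicate spectra of equal length."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    length = len(replicates[0])
    if any(len(replicate) != length for replicate in replicates):
        raise ValueError("all replicates must share the same axis length")
    count = len(replicates)
    return [sum(column) / count for column in zip(*replicates)]

mean_spectrum = average_spectra([[1.0, 2.0], [3.0, 6.0]])
```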

29 Results Identified vital information that affects the output of the machine; this information is useful for a researcher using the spectra. Researched the processes that make the output more useful as a protein signature. Next step: identify parameters for software evaluation in MS data processing.

30 Goal – Identify parameters for evaluating software capabilities in the processing and analysis of Mass Spectrometry data. Three candidates: VIBE (Incogen Inc.), geWorkbench (Forge), S-PLUS (Insightful Corp.)

31 Software Evaluation General parameters Input formats Algorithms for processing and analysis of proteomics data Results Benefits Limitations

32 Software Evaluation General parameters Input formats Algorithms for processing and analysis of proteomics data Results Benefits Limitations

33 General Parameters Platform/operating system compatibility? Is the software open source? Is the software capable of performing the necessary tasks independently? Does it need additional modifications? Internet access? A server?

34 Software Evaluation General parameters Input formats Algorithms for processing and analysis of proteomics data Results Benefits Limitations

35 Data Input What types of file formats can the software open? Import? What type of format must the data be? DNA (nucleotides – A, T, G, C) Proteins (amino acids – M, L, A, I, etc.)

36 Software Evaluation General parameters Input formats Algorithms for processing and analysis of proteomics data Results Benefits Limitations

37 Software algorithms necessary for Proteomics data analysis Can the software perform: Baseline subtractions? Mass calibrations? Noise reductions? Peak identifications? Normalization? Peak alignments?
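
The checklist above can be expressed as a small data structure, which makes an evaluation repeatable across packages. This is an illustrative sketch: the step names are taken from the slide, while the function name and the example capability list are hypothetical and do not describe any of the evaluated products.

```python
REQUIRED_STEPS = (
    "baseline subtraction",
    "mass calibration",
    "noise reduction",
    "peak identification",
    "normalization",
    "peak alignment",
)

def missing_steps(supported):
    """Return the required processing steps that a package does not support.

    supported is an iterable of step names; matching ignores letter case.
    """
    supported_lower = {name.lower() for name in supported}
    return [step for step in REQUIRED_STEPS if step not in supported_lower]

# Hypothetical package that offers only two of the six steps.
gaps = missing_steps(["Normalization", "Noise reduction"])
```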

38 Baseline Subtraction (Malyarenko, et al. 2005)

39 Mass Calibration (Kearsley, et al. 2005)

40 Smoothing/Noise Reduction (Malyarenko, et al. 2005)

41 Peak Identifications (Do, 2006)

42 Normalization (Kearsley, et al. 2005)

43 Peak Alignments (Malyarenko, et al. 2005)

44 Software Evaluation General parameters Input formats Algorithms for processing and analysis of proteomics data Results Benefits Limitations

45 Results – Visualization of results How can you visualize the data? Save/Export work Can you save/export your results? If yes, what format can it save/export? Once saved, can the files be opened by other software packages? Print out Can you print out a hard copy for record?

46 Visualization MUSCLE (Edgar) VIBE (Incogen Inc.)

47 Results – Visualization of results How can you visualize the data? Save/Export work Can you save/export your results? If yes, what format can it save/export? Once saved, can the files be opened by other software packages? Print out Can you print out a hard copy for record?

48 Software Evaluation General parameters Input formats Algorithms for processing and analysis of proteomics data Results Benefits Limitations

49 Software Benefits What benefits does the software offer? Convenience of integrated modules Efficient – saves “man-power” of having to sit there and do everything User-friendly interface

50 Convenience of Integrated Modules

51 Efficiency

52 User-friendly Interface?

53 Software Evaluation General parameters Input formats Algorithms for processing and analysis of proteomics data Results Benefits Limitations

54 Software Limitations Limitations on customization: Small modifications to existing modules? Adding a new module? Internet/server dependent?

55 Conclusion – We have identified these parameters to be crucial for the processing of MS data. Baseline subtractions Mass calibrations Noise reductions Peak identifications Normalization Peak alignments

56 Conclusion – VIBE: Capable of manipulating protein sequences, but unable to process raw data. geWorkbench: Did not pass general parameters for installation. S-PLUS: Evaluation still in progress…

57 VIBE (by Incogen Inc.) Convenient integration of nucleotide and amino acid analysis tools – BLAST (–X, –N, –P, TBLASTN, TBLASTP) Nucleotide and AA search FASTA, –X, –Y, Smith-Waterman, etc. Sequence manipulations Primer3, Conditional Filters, Translations, etc. Sequence alignments Crossmatch, ClustalW, Hidden Markov Model, etc.

58 Conclusion – We have identified these parameters to be crucial for the processing of MS data. Baseline subtractions Mass calibrations Noise reductions Peak identifications Normalization Peak alignments

59 Conclusion – VIBE: Capable of manipulating protein sequences, but unable to process raw data. geWorkbench: Did not pass general parameters for installation. S-PLUS: Evaluation still in progress…

61 Literature Citations
1) Do, P. Improved Peak Detection in Mass Spectrometry Spectrum by Incorporating Continuous Wavelet Transform-based Pattern Matching. Robert H. Lurie Comprehensive Cancer Center, Northwestern University. ppt slides. 2006.
2) Kearsley, A., Wallace, W.E., Bernal, J., and C.M. Guttman. A numerical method for mass spectral data analysis. Applied Mathematics Letters. 18:1412–1417, 2005.
3) Malyarenko, D.I., Cooke, W.E., Adam, B-L., Malik, G., Chen, H., Tracy, E.R., Trosset, M.W., Sasinowski, M., Semmes, O.J. and D.M. Manos. Enhancement of Sensitivity and Resolution of Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometric Records for Serum Peptides Using Time-Series Analysis Techniques. Clinical Chemistry. 51(1):65-74, 2005.

62 Acknowledgements Jet Propulsion Laboratory: Dr. Tina Xiao. Southern California Bioinformatics Summer Institute (SoCalBSI): Dr. Sandra Sharp, Dr. Jamil Momand, Dr. Wendie Johnston, Dr. Nancy Warter-Perez, Ronnie Cheng, Friends. Duke University Medical Center: Dr. Simon Lin. Center for Disease Control and Prevention (CDC): Dr. R Cameron Craddock. Huntington Medical Research Institute (HMRI): Dr. James Riggins, Dr. Alfred Fonteh.

