A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1.


1 A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data
吳立青

2 Outline
– Introduction
– Methods
– Data source
– Methods of comparison
– Results
– Conclusion

3 Introduction
Using protein mass spectrometry to discriminate diseased from healthy individuals is becoming more popular.
MALDI-TOF and SELDI-TOF have the following advantages:
– Fast
– High throughput
– Accuracy
– Protein identification

4 SELDI-TOF MS applications in clinical oncology

5 The example of ovarian cancer data
– Large scale
– Full of noise
– Nonlinear and non-stationary

6 Motivation
The analysis of mass spectra seems simple. However, it suffers from several problems, and they need to be solved.

7 Common preprocessing steps
– Baseline subtraction
– Denoising: very important and also complicated!
– Normalization
– Peak detection
– Peak alignment

8 Noise components
Chemical noise:
– Comes from the matrix material and sample contaminants
– A kind of biochemical material
Electrical noise:
– Arises from the physical characteristics of the machine
– Carries no meaningful information

9 Chemical noise
Chemical component (organic acid)
– We call it the matrix
Ionization
– Provides H+ to the peptide or protein so that it ionizes and flies in the machine
Protection
– Protects the peptide or protein during the laser flash

10 Problem
– Previous simulated models did not separate the chemical noise from the spectra.
– Chemical noise mixed with machine noise is even worse.
– A novel preprocessing method should be developed.

11 Kwon, D., M. Vannucci, et al. (2008). "A novel wavelet-based thresholding method for the pre-processing of mass spectrometry data that accounts for heterogeneous noise." Proteomics 8(15): 3019-29.

12 Goal
Develop a method that satisfies the following:
– The electrical noise should be removed
– The significant peaks should be preserved even amid the chemical noise

13 Methods
Hilbert Huang transform
– Denoising
Modification
– Baseline subtraction
– Rescale
– Peak detection

14 Flow chart
Ovarian cancer data → HHT → IMFs → Remove IMFs → Modification → Baseline subtraction → Rescale → Peak detection

15 Hilbert Huang transform
Method: Hilbert Huang transform (HHT)
– Wu, Z., N. E. Huang, et al. (2007). "On the trend, detrending, and variability of nonlinear and nonstationary time series." Proc Natl Acad Sci U S A 104(38): 14889-94.
– An adaptive data analysis method for nonlinear and non-stationary processes
– The main feature of HHT is the empirical mode decomposition (EMD)
– After EMD, we obtain the intrinsic mode functions (IMFs) and remove several of them as noise
Goal: denoising
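The denoising idea on this slide, dropping some IMFs and rebuilding the signal from the rest, can be sketched as follows. This is only an illustration: the slides do not say which IMFs are removed, so dropping the first `n_drop` (EMD returns the fastest oscillations first) is an assumption, and the function name `denoise_by_dropping_imfs` is hypothetical.

```python
def denoise_by_dropping_imfs(imfs, residue, n_drop):
    """Rebuild a signal from EMD components, leaving out the first n_drop IMFs.

    imfs    : list of equal-length lists, ordered from fastest to slowest oscillation
    residue : the final EMD residue (trend), same length as each IMF
    n_drop  : how many leading IMFs to treat as noise (assumed, not from the slides)
    """
    if not 0 <= n_drop <= len(imfs):
        raise ValueError("n_drop must be between 0 and len(imfs)")
    out = list(residue)
    for imf in imfs[n_drop:]:
        for i, value in enumerate(imf):
            out[i] += value
    return out


if __name__ == "__main__":
    # Toy components standing in for real IMFs.
    fast = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]   # high-frequency "noise"
    slow = [0.0, 1.0, 2.0, 2.0, 1.0, 0.0]      # peak-like component
    trend = [10.0] * 6                          # residue
    print(denoise_by_dropping_imfs([fast, slow], trend, n_drop=1))
```

With `n_drop=0` the function simply returns the original signal, since the IMFs and the residue sum back to it.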

16 Process of EMD (1/5)
Find the envelope of the local maxima

17 Process of EMD (2/5)
Find the envelope of the local minima

18 Process of EMD (3/5)
Compute the mean envelope from the maximum envelope and the minimum envelope

19 Process of EMD (4/5)
We get IMF 1 (i.e. h_1) by subtracting the mean envelope m_1 from the original signal X(t)

20 Process of EMD (5/5)
We take h_1 as the new X(t) and repeat the same process, and so on. We stop once the number of extrema and the number of zero-crossings of IMF n differ by at most one
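The five EMD steps can be sketched in plain Python as below. This is a simplified illustration, not the implementation used in the talk: the envelopes are linearly interpolated and held flat at the ends (standard EMD uses cubic splines), sifting stops when the counts of extrema and zero-crossings differ by at most one or after `max_sifts` passes, and all function names and default limits are assumptions.

```python
import math


def local_extrema(x):
    """Indices of strict interior local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear curve through x at the knot indices, flat outside them."""
    env, k = [], 0
    for i in range(len(x)):
        if i <= knots[0]:
            env.append(x[knots[0]])
        elif i >= knots[-1]:
            env.append(x[knots[-1]])
        else:
            while knots[k + 1] < i:
                k += 1
            a, b = knots[k], knots[k + 1]
            t = (i - a) / (b - a)
            env.append(x[a] + t * (x[b] - x[a]))
    return env


def sift_once(x):
    """Steps 1-4: subtract the mean of the upper and lower envelopes."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return None  # too few extrema to build both envelopes
    upper, lower = envelope(x, maxima), envelope(x, minima)
    return [xi - (u + l) / 2 for xi, u, l in zip(x, upper, lower)]


def is_imf_like(h):
    """Extrema and zero-crossing counts differ by at most one."""
    maxima, minima = local_extrema(h)
    crossings = sum(1 for a, b in zip(h, h[1:]) if a * b < 0)
    return abs(len(maxima) + len(minima) - crossings) <= 1


def extract_imf(x, max_sifts=50):
    """Step 5: repeat the sifting on its own output."""
    h = list(x)
    for _ in range(max_sifts):
        nxt = sift_once(h)
        if nxt is None:
            break
        h = nxt
        if is_imf_like(h):
            break
    return h


def emd(x, max_imfs=16):
    """Peel IMFs off the signal one by one; return (imfs, residue)."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        if sift_once(residue) is None:
            break
        imf = extract_imf(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue


if __name__ == "__main__":
    signal = [math.sin(2 * math.pi * i / 8) + 0.01 * i for i in range(64)]
    imfs, residue = emd(signal)
    rebuilt = [sum(c[i] for c in imfs) + residue[i] for i in range(len(signal))]
    print(len(imfs), max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

By construction the IMFs and the residue always add back up to the input, which is what makes it possible to denoise by leaving some IMFs out.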

21 HHT
IMFs: 1~16

22 Modification
Baseline subtraction
– Remove the systematic artifacts
Rescale
– Shift the scale to positive
Peak detection
– Key feature of the preprocessing method
– We compare several popular methods
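The slides do not specify how the baseline is estimated or how the rescaling is done. As a minimal sketch under stated assumptions, the code below uses a moving-minimum baseline (a common simple choice, swapped in here for illustration) and rescales by shifting the spectrum so that its minimum is zero. The names `subtract_baseline`, `rescale_to_nonnegative` and the `half_window` parameter are hypothetical.

```python
def moving_min_baseline(y, half_window):
    """Estimate the baseline as the minimum within a sliding window."""
    n = len(y)
    return [min(y[max(0, i - half_window): i + half_window + 1]) for i in range(n)]


def subtract_baseline(y, half_window=50):
    """Remove the slowly varying offset estimated by the moving minimum."""
    baseline = moving_min_baseline(y, half_window)
    return [value - base for value, base in zip(y, baseline)]


def rescale_to_nonnegative(y):
    """Shift the whole spectrum up so that no intensity is negative."""
    lowest = min(y)
    return [value - lowest for value in y] if lowest < 0 else list(y)


if __name__ == "__main__":
    spectrum = [5.0, 4.0, 9.0, 3.0, 2.5, 7.0, 2.0]
    print(subtract_baseline(spectrum, half_window=1))
    # A denoised signal can dip below zero after IMFs are removed.
    print(rescale_to_nonnegative([-2.0, 0.5, 3.0]))
```

A window that is too narrow eats into real peaks, so in practice `half_window` would need to be much wider than a typical peak.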

23 Baseline subtraction

24 Baseline subtraction

25 Peak detection
We use three popular algorithms for peak detection
– MassSpecWavelet (Du, Kibbe et al. 2006)
– SpecAlign (Wong, Cagney et al. 2005)
– PROcess (Li 2005)
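None of the three packages is reproduced here. As a rough illustration of what a peak detector does with a preprocessed spectrum, the sketch below marks local maxima above a height threshold. The function `detect_peaks` and the `min_height` parameter are hypothetical and far simpler than the wavelet-based or model-based rules those packages apply.

```python
def detect_peaks(mz, intensity, min_height):
    """Return (m/z, intensity) pairs for local maxima at or above min_height.

    A point counts as a peak when it is higher than its left neighbour and
    at least as high as its right neighbour, so a flat top is reported once.
    """
    peaks = []
    for i in range(1, len(intensity) - 1):
        rises = intensity[i - 1] < intensity[i]
        does_not_rise_after = intensity[i] >= intensity[i + 1]
        if intensity[i] >= min_height and rises and does_not_rise_after:
            peaks.append((mz[i], intensity[i]))
    return peaks


if __name__ == "__main__":
    mz = [2000.0 + i for i in range(12)]
    intensity = [0, 1, 5, 1, 0, 2, 9, 9, 3, 0.4, 0.6, 0.2]
    for position, height in detect_peaks(mz, intensity, min_height=1.0):
        print(position, height)
```

The small bump of height 0.6 is ignored because it falls below the threshold, which is the same trade-off the comparison slides examine: a lower threshold reports more peaks but also more noise.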

26 Data source
Source: National Cancer Institute
Type: 50 ovarian cancer data
– Meuleman, W., J. Y. Engwegen, et al. (2008). "Comparison of normalisation methods for surface-enhanced laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data." BMC Bioinformatics 9: 88
– Kwon, D., M. Vannucci, et al. (2008). "A novel wavelet-based thresholding method for the pre-processing of mass spectrometry data that accounts for heterogeneous noise." Proteomics 8(15): 3019-29

27 Methods of comparison
Judgment criteria
– Number of peaks detected
– Actual location of the peaks, checked visually
Interior comparison
– HHT and modification+SpecAlign
– HHT and modification+PROcess
– HHT and modification+MassSpecWavelet

28 Methods of comparison
Exterior comparison
– SpecAlign (abbreviation: SA)
– PROcess (interpolation: PRO1; regression: PRO2)
– MassSpecWavelet (abbreviation: MSW)
– PRO2+MSW (as suggested in Cruz-Marcelo, Guerra et al. 2008)

29 Raw data

30 Results after HHT and modification

31 Results of interior comparison
Number of peaks detected per m/z range (Da):
Algorithm | 2000~4000 | 4000~6000 | 6000~8000 | 8000~10000 | 10000~12000 | 12000~14000 | 14000~15000 | Total
HHT modification+MassSpecWavelet | 35 | 32 | 27 | 21 | 44 | 38 | 21 | 218
HHT modification+SpecAlign | (cells run together: 1817611893) | 80
HHT modification+PROcess | 21 | 18 | 19 | 13 | 14 | 13 | 10 | 108
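A table like this one can be produced by counting detected peak positions per m/z range. The sketch below shows one way to do that. The bin edges follow the column headers of the slide, while the function name `count_peaks_per_range` and the choice to include 15000 Da in the last range are assumptions.

```python
# Bin edges in Da, taken from the column headers of the comparison table.
BIN_EDGES = [2000, 4000, 6000, 8000, 10000, 12000, 14000, 15000]


def count_peaks_per_range(peak_mz, edges=BIN_EDGES):
    """Count peaks in each [edges[k], edges[k+1]) range.

    The upper edge of the last range is treated as inclusive (an assumption).
    Peaks outside the overall range are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for m in peak_mz:
        for k in range(len(counts)):
            is_last = k == len(counts) - 1
            if edges[k] <= m < edges[k + 1] or (is_last and m == edges[-1]):
                counts[k] += 1
                break
    return counts


if __name__ == "__main__":
    detected = [2500, 3999.9, 4000, 7200, 14999, 15000, 1500, 16000]
    per_range = count_peaks_per_range(detected)
    print(per_range, sum(per_range))
```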

32 Modified HHT+MSW
Peaks detected: 218
m/z range: whole region

33 Modified HHT+SpecAlign
Peaks detected: 80
m/z range: whole region

34 Modified HHT+PROcess
Peaks detected: 108
m/z range: whole region
Significant peak lost

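35 Results of exterior comparison
Number of peaks detected per m/z range (Da):
Algorithm | 2000~4000 | 4000~6000 | 6000~8000 | 8000~10000 | 10000~12000 | 12000~14000 | 14000~15000 | Total
mHHT+MassSpecWavelet | 35 | 32 | 27 | 21 | 44 | 38 | 21 | 218
mHHT+SpecAlign | (cells run together: 1817611893) | 80
mHHT+PROcess | 21 | 18 | 19 | 13 | 14 | 13 | 10 | 108
SpecAlign | 67 | 43 | 21 | 18 | 16 | 14 | 7 | 186
PRO1 | 46 | 24 | 19 | 18 | 16 | | 6 | 145
PRO2 | 40 | 23 | 8 | 12 | 13 | 14 | 4 | 114
MassSpecWavelet | 51 | 39 | 25 | 24 | 22 | 20 | 7 | 188
PRO2+MSW | 54 | 37 | 33 | 25 | 23 | 18 | 8 | 198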

36 PRO1
Peaks detected: 145
m/z range: whole region
Data: Meuleman, Engwegen, et al. (2008)

37 PRO2
Peaks detected: 114
m/z range: whole region
Data: Meuleman, Engwegen, et al. (2008)

38 MSW
Peaks detected: 188
m/z range: whole region
Data: Meuleman, Engwegen, et al. (2008)

39 MSW
Peaks detected: 25
m/z range: 6000~8000

40 PRO2+MSW
Peaks detected: 198
m/z range: whole region
Data: Meuleman, Engwegen, et al. (2008)

41 PRO2+MSW
Peaks detected: 33
m/z range: 6000~8000

42 Results
Interior comparison:
– HHT and modification+MSW covers the most peaks
– HHT and modification+SpecAlign picks the most important peaks
Exterior comparison:
– PROcess misses the significant peaks
– MassSpecWavelet and PRO2+MSW have many redundancies

43 Results of validation
Validation
– Data source: Cathay General Hospital
– Experiments: divided into three experiments
– Water only
– VrD1

44 Water
Sample: water
Organic acid: CHCA (<1000 Da)

45 VrD1
Sample: VrD1
Type: protein
Organic acid: CHCA (<1000 Da)
Molecular weight: 5119 Da

46 Results of validation
Algorithm \ sample | Water | VrD1 (Mw: 5119) | Peak located at m/z = 5119 of VrD1 | Double charge of VrD1, (5119+2)/2 | Number of peaks (>1000 Da) detected in VrD1
MassSpecWavelet | 400 | 369 | Detected | | 340
SpecAlign | 391 | 477 | Detected | | 355
HHT modification+SpecAlign | 17 | 22 | Detected | | 5
Share of peaks removed by HHT modification | 95.7% | 94.8% | | | 98.6%
The Water and VrD1 columns give the number of peaks detected.
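The slide does not say which count the removal percentages are measured against. One reading that reproduces all three reported figures is to take the mean of the MassSpecWavelet and SpecAlign counts as the reference. The sketch below checks that arithmetic. The reference choice is an assumption, not something stated in the talk, and `percent_removed` is a hypothetical helper.

```python
def percent_removed(reference_count, remaining_count):
    """Percentage of peaks no longer detected, relative to a reference count."""
    return 100.0 * (1.0 - remaining_count / reference_count)


# (MassSpecWavelet, SpecAlign, HHT modification+SpecAlign) counts from the table.
COUNTS = {
    "Water": (400, 391, 17),
    "VrD1": (369, 477, 22),
    "VrD1, peaks > 1000 Da": (340, 355, 5),
}

if __name__ == "__main__":
    for sample, (msw, sa, hht_sa) in COUNTS.items():
        reference = (msw + sa) / 2  # assumed reference: mean of the two baselines
        print(sample, round(percent_removed(reference, hht_sa), 1))
```

Measured against SpecAlign alone, the VrD1 figure would come out near 95.4% rather than 94.8%, which is why the mean is used in this sketch.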

47 The peaks of water detected by MassSpecWavelet

48 The peaks of VrD1 detected by MassSpecWavelet
Molecular weight: 5119 Da

49 The peaks of water detected by SpecAlign

50 The peaks of VrD1 detected by SpecAlign
Molecular weight: 5119 Da

51 The peaks of water detected by HHT modification + SpecAlign

52 The peaks of VrD1 detected by HHT modification + SpecAlign
Molecular weight: 5119 Da

53 The peaks of VrD1 detected by HHT modification + SpecAlign
Panels: 0-5200 Da and whole range
Molecular weight: 5119 Da
Double charge: (5119+2)/2
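The double-charge position on this slide follows from the usual relation m/z = (M + z·m_H)/z, with the proton mass m_H rounded to 1 Da as the slide does. The sketch below evaluates it. The function name is hypothetical, and the more precise proton mass of about 1.00728 Da can be passed in if needed.

```python
def mz_of_charge_state(neutral_mass, charge, proton_mass=1.0):
    """Expected m/z of a molecule carrying `charge` extra protons.

    proton_mass defaults to 1.0 Da to match the rounding used on the slide.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * proton_mass) / charge


if __name__ == "__main__":
    # Doubly charged VrD1, as on the slide: (5119 + 2) / 2
    print(mz_of_charge_state(5119, 2))
```

A peak near this position in the 0-5200 Da panel is therefore consistent with doubly charged VrD1 rather than with noise.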

54 Results of validation
– MassSpecWavelet and SpecAlign do not remove the noise
– HHT and modification+SpecAlign detects the fewest peaks but keeps the most significant ones
– HHT and modification+SpecAlign removes 98.6% of the peaks (>1000 Da), which are redundancies and noise

55 Conclusion
– HHT performs well at denoising
– The comparison shows that HHT and modification make the raw data simpler
– At the same time, HHT and modification preserve the significant information
– After preprocessing with HHT and modification, we suggest detecting the peaks with SpecAlign

56 Acknowledgement
– Thanks to 陳欣昊, a student of our institute, for his contribution
– Thanks to Academician 黃鄂 and to 林澂, a doctoral student of our institute
– Thanks to Dr. 鄭宇哲 of Cathay General Hospital, Sijhih, for the experimental validation

57 Thanks for your attention!

