
1 Information-Theoretic Mass Spectral Library Search
Arvind Visvanathan
CSCE 990 Seminar in Multi-Dimensional Chromatography Systems, Informatics, and Applications (CSCE 990 – GCxGC Seminar)
Sections: Introduction, Related Work, Method, Results and Discussion

2 Outline
Introduction
  – Mass spectrum search types
Related Work
  – Other techniques: NIST, PBM, DotMap
Method
  – Probability and information
  – Normalized distribution function
Results
Conclusion

3 Introduction – Mass Spectrum
[Figure: mass spectrum of decane; axes: m/z vs. intensity]

4 Introduction – Mass Spectrum Search
[Diagram: MS Library and Unknown Spectrum → Search Algorithm → Potential Matches]

5 Introduction – Search Types
Identity search
  – Unknown mass spectrum is present in the library
  – Looking for the exact spectrum
Similarity search
  – Unknown mass spectrum is not present in the library
  – Looking for a similar spectrum

6 Introduction – MS Search Applications
Steroid detection in athletes
Monitoring patient breath during surgery
Determining the composition of molecular species found in space
Detecting honey adulterated with corn syrup
Locating oil deposits
Monitoring fermentation processes in the biotechnology industry
Detecting dioxins in contaminated fish

7 Related Work – NIST MS-Search [Stein ‘94]
Pre-search the library for the unknown spectrum
  – Reduces the search domain (160K → 4K compounds)
Compute a match factor for each compound in the pre-search result
Match Factor (MF)
  – Range 0-999
  – Higher is better
Sort the pre-search result by MF value
Pick the topmost compounds as possible matches

8 Related Work – NIST MS-Search [Stein ‘94]
Match Factor Computation [Stein ‘94]
  – Term 1 (F1): mass-weighted normalized dot product
  – Term 2 (F2): relative intensities of adjacent peaks in both spectra
  – MF is a combination of F1 and F2
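The first term (F1) can be illustrated with a short sketch. This is not the NIST implementation: it assumes spectra are stored as dicts mapping m/z to intensity, it assumes a squared-cosine form scaled to 0-999, and the weighting exponents `mass_power` and `intensity_power` are hypothetical parameters, since the slide does not give them.

```python
def weighted_dot_match(unknown, library, mass_power=1.0, intensity_power=0.5):
    """Mass-weighted normalized dot product of two spectra, scaled to 0-999.

    Spectra are dicts {m/z: intensity}. The exponents are illustrative
    assumptions, not values taken from the slides.
    """
    def weight(mz, intensity):
        # Heavier masses and larger peaks contribute more to the score.
        return (mz ** mass_power) * (intensity ** intensity_power)

    mzs = set(unknown) | set(library)
    u = {mz: weight(mz, unknown.get(mz, 0.0)) for mz in mzs}
    l = {mz: weight(mz, library.get(mz, 0.0)) for mz in mzs}

    dot = sum(u[mz] * l[mz] for mz in mzs)
    norm_u = sum(v * v for v in u.values())
    norm_l = sum(v * v for v in l.values())
    if norm_u == 0 or norm_l == 0:
        return 0.0
    # Squared cosine of the angle between the weighted spectra.
    return 999.0 * dot * dot / (norm_u * norm_l)
```

Identical spectra score 999 under this sketch, and spectra with no m/z in common score 0.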

9 Related Work – NIST MS-Search [Stein ‘94]
Spectrum C-1          Spectrum C-2
m/z   Intensity       m/z   Intensity
35    100             35    100
36    1               36    1
37    1               37    2
45    999             45    999
55    200             55    200

      Compare C-1 & C-1   Compare C-1 & C-2
F1    999
F2    999                 824
MF    999                 925

10 Related Work – Probability Based Matching [McLafferty et al. ‘75]
Confidence value (K) instead of MF
Four components for each m/z
  – Term 1: U: based on the uniqueness of an m/z value
  – Term 2: A: intensity contribution to the confidence
  – Term 3: W: window factor (measure of agreement)
  – Term 4: D: dilution factor (measure of purity)
  – K: ∑ (U + A + W – D) over all m/z

11 Related Work – DotMap [Synovec et al. ‘04]
[Figure: DotMap panels for fumaric acid, adipic acid, and lactic acid]

12 Related Work – DotMap [Synovec et al. ‘04]
Inverse problem
DotMap is computed across the image
Higher-valued areas indicate the presence of the compound of interest
Multiple compounds of interest
  – Compute a DotMap overlay

13 Related Work – DotMap [Synovec et al. ‘04]

14 Related Work – DotMap [Synovec et al. ‘04]

15 Method – Motivation
NIST MS-Search [Stein ‘94]
  – No domain information utilized
Probability Based Matching (PBM) [McLafferty et al. ‘75]
  – Old technique (‘75)
  – Ad hoc use of domain information
DotMap
  – No domain information utilized

16 Method – Entropy
Entropy-based approach
  – Entropy → measure of the amount of uncertainty
  – Based on probabilities
Include domain-based knowledge (information) in computing the match factor
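Entropy as a measure of uncertainty can be illustrated with Shannon's definition, H = −∑ p·log2(p). The slides do not state which entropy formula the method uses, so the choice of Shannon entropy in bits is an assumption made for illustration only.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution.

    Zero-probability outcomes are skipped, since p*log2(p) tends to 0 as p -> 0.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

A certain outcome has zero entropy, and a uniform distribution over four outcomes has two bits, the maximum for four outcomes.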

17 Method – Distribution Function
Library
  – NIST EPA Library
  – 163K compounds
Compute the distribution function (DF)
  – Two-dimensional array: m/z vs. intensity
  – DF[i][j] = number of compounds in the library with m/z = i and intensity = j

18 Method – Distribution Function
[Figure: distribution function; axes: m/z vs. intensity]

19 Method – Normalized Distribution Function (NDF)
Normalized Distribution Function
  – NDF[mz][int] = DF[mz][int] / ∑_i DF[mz][i]
  – Where ∑_i DF[mz][i] = 163K
  – NDF → probabilities in [0, 1]
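The DF and NDF construction can be sketched as follows. This is a minimal illustration under stated assumptions: spectra are dicts mapping m/z to an integer intensity, and an m/z missing from a spectrum is counted as intensity 0 (inferred from the row total being the library size, 163K). The function names are hypothetical.

```python
def distribution_function(library):
    """DF[mz][intensity] = number of library spectra with that intensity at mz.

    A missing m/z is counted as intensity 0 (assumption), so every row
    sums to the number of spectra in the library.
    """
    mz_values = set()
    for spectrum in library:
        mz_values.update(spectrum)

    df = {mz: {} for mz in mz_values}
    for spectrum in library:
        for mz in mz_values:
            intensity = spectrum.get(mz, 0)
            df[mz][intensity] = df[mz].get(intensity, 0) + 1
    return df

def normalized_distribution_function(df):
    """NDF[mz][intensity] = DF[mz][intensity] / sum_i DF[mz][i]."""
    ndf = {}
    for mz, counts in df.items():
        total = sum(counts.values())
        ndf[mz] = {intensity: n / total for intensity, n in counts.items()}
    return ndf
```

Each row of the NDF sums to 1, so an entry can be read as the probability of seeing that intensity at that m/z across the library.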

20 Method – Assumptions
Assumption: each m/z is treated independently when the match factor is computed from the normalized distribution function

21 Method – Match Factor

22 Results – Overview
Technique
  – Take a compound in the library and add noise
  – Search for the noisy compound in the library
Evaluation metric: average rank
  – Rank = position of the correct compound in the hit list
  – Repeat the above 3000 times and take the average rank
Compared with
  – NIST
  – NISTDOT (first term in the NIST algorithm)
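The evaluation loop above can be sketched as follows. The names `rank_of` and `average_rank` are hypothetical, as are the `score` and `add_noise` callables; choosing the test compound uniformly at random and breaking ties in favor of the correct compound are assumptions, since the slide does not specify either.

```python
import random

def rank_of(correct_index, scores):
    """1-based position of the correct compound in a hit list sorted by
    descending score. Ties are resolved in the correct compound's favor
    (assumption)."""
    target = scores[correct_index]
    return 1 + sum(1 for s in scores if s > target)

def average_rank(library, score, add_noise, trials=3000, seed=0):
    """Average rank of the correct compound over repeated noisy searches.

    score(unknown, entry) returns a match score (higher is better);
    add_noise(spectrum, rng) returns a perturbed copy of the spectrum.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        idx = rng.randrange(len(library))          # pick a library compound
        noisy = add_noise(library[idx], rng)       # perturb it
        scores = [score(noisy, entry) for entry in library]
        total += rank_of(idx, scores)
    return total / trials
```

An average rank of 1.0 means the correct compound was always the top hit; larger values indicate worse performance.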

23 Results – Noise Models
Additive: A_U = A_L + G(0, σ)
Multiplicative: A_U = A_L + A_L · G(0, σ)
Johnson colored: A_U = A_L + G(0, σ·√m)
Random spectrum: A_U = A_L + x · A_R
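The four noise models can be sketched directly from the formulas. This sketch assumes spectra are dicts mapping m/z to intensity, that G(0, σ) is a zero-mean Gaussian with standard deviation σ, and that m is the m/z value. The function names are hypothetical, and negative intensities are not clipped because the slides do not say how they were handled.

```python
import math
import random

def additive_noise(spectrum, sigma, rng):
    """A_U = A_L + G(0, sigma) for each m/z in the library spectrum."""
    return {mz: a + rng.gauss(0.0, sigma) for mz, a in spectrum.items()}

def multiplicative_noise(spectrum, sigma, rng):
    """A_U = A_L + A_L * G(0, sigma): noise scales with peak intensity."""
    return {mz: a + a * rng.gauss(0.0, sigma) for mz, a in spectrum.items()}

def johnson_colored_noise(spectrum, sigma, rng):
    """A_U = A_L + G(0, sigma * sqrt(m)): noise grows with m/z."""
    return {mz: a + rng.gauss(0.0, sigma * math.sqrt(mz))
            for mz, a in spectrum.items()}

def random_spectrum_noise(spectrum, other, x):
    """A_U = A_L + x * A_R over every m/z in either spectrum."""
    mzs = set(spectrum) | set(other)
    return {mz: spectrum.get(mz, 0.0) + x * other.get(mz, 0.0) for mz in mzs}
```

For example, `additive_noise({35: 100, 45: 999}, 5.0, random.Random(0))` perturbs both peaks independently, while `random_spectrum_noise` can also introduce peaks at m/z values that the library spectrum lacks.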

24 Results – Additive Noise
Compound = Compound + additive noise
Additive Gaussian noise
  – Zero mean
  – Variable standard deviation
For each m/z in the library spectrum: A_U = A_L + G(0, σ)

25 Results – Additive Noise (Example)

26 Results – Additive Noise (Performance)

27 Results – Multiplicative Noise
Compound = Compound + multiplicative noise
Multiplicative Gaussian noise
  – Zero mean
  – Variable standard deviation
For each m/z in the library spectrum: A_U = A_L + A_L · G(0, σ)

28 Results – Multiplicative Noise (Example)

29 Results – Multiplicative Noise (Performance)

30 Results – Johnson Colored Noise
Compound = Compound + colored noise
Gaussian noise
  – Zero mean
  – Variable standard deviation
For each m/z in the library spectrum: A_U = A_L + G(0, σ·√m), where m is the m/z value

31 Results – Johnson Colored Noise (Example)

32 Results – Johnson Colored Noise (Performance)

33 Results – Random Spectrum Noise
Compound = Compound + random spectrum
Additive spectrum
  – Add x% of another random spectrum
For each m/z in the library or random spectrum
  – A_U = A_L + x · A_R

34 Results – Random Spectrum Noise (Example)

35 Results – Random Spectrum Noise (Performance)

36 Results – Summary of Noise Models
Additive: A_U = A_L + G(0, σ)
Multiplicative: A_U = A_L + A_L · G(0, σ)
Johnson colored: A_U = A_L + G(0, σ·√m)
Random spectrum: A_U = A_L + x · A_R

37 Results – Summary of Noise Models

38 Results – Summary of Noise Models

39 Conclusion
MS library search algorithm
Information-theoretic
  – Domain knowledge incorporated
Algorithm works well for various noise models
Future work
  – Must improve performance for the random spectrum noise case

40 Questions & Suggestions

