Computing fragmentation trees from tandem mass spectrometry data
Florian Rasche1, Aleš Svatoš2, Ravi Kumar Maddula2, Christoph Böttcher3 & Sebastian Böcker1*
1 Chair for Bioinformatics, Friedrich-Schiller-University Jena, Ernst-Abbe-Platz 2, D Jena, Germany
The Crux
Small-molecule identification by mass spectrometry depends on spectral library search.
What about unknown compounds?
Proposed solution
▫ At least annotate the MS2 peaks with something.
Data
Columns: instrument, ppm, CID (eV), IP, compounds used, mass range, median, average
▫ Orbitrap: 5 ppm; CID 35, 45, 55, 70 eV; IP yes
▫ API QSTAR (16): 20 ppm; CID 15, 25, 45, 55, 90 eV; IP yes
▫ Micromass QTOF (11): 20 ppm; CID 10, 20, 30, 40, 50 eV; IP no
Left: Fragmentation graph for (S,R)-noscapine (C22H23NO7) using Orbitrap data. Nodes of the same color correspond to annotations of one measured peak (m/z, intensity, and collision energies). Arcs correspond to potential neutral losses (NLs); the weight of an arc is encoded by its line type. An NL can be computed by subtracting the molecular formula of the end node from that of the start node.
Right: The corresponding hypothetical fragmentation tree of noscapine computed by our method. Nodes (blue) correspond to peaks in the tandem mass spectra and their annotated molecular formula (CE is the range of collision energies); arcs (red) correspond to hypothetical neutral losses.
Published in: Florian Rasche; Aleš Svatoš; Ravi Kumar Maddula; Christoph Böttcher; Sebastian Böcker; Anal. Chem. Article ASAP DOI: /ac101825k Copyright © 2011 American Chemical Society
Construction of graph
Properties
▫ Each vertex is a molecular formula associated with a peak.
▫ A vertex's color indicates which peak it annotates.
▫ A directed edge (neutral loss) u -> v implies that v is a fragment of u.
Weighting (real serious math here)
Goal
▫ Find a “colorful” tree with maximal score.
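The graph properties above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the `Counter` representation of formulas, and the caller-supplied `score_loss` function are all assumptions, and the actual edge weighting is not reproduced here. An edge u -> v is added only when v's formula is a proper sub-formula of u's and the two vertices annotate different peaks, which also guarantees the graph is acyclic.

```python
from collections import Counter

def neutral_loss(parent, fragment):
    """Return parent - fragment as a Counter of element counts,
    or None if fragment is not a proper sub-formula of parent."""
    if any(fragment[e] > parent[e] for e in fragment):
        return None
    loss = Counter({e: parent[e] - fragment[e]
                    for e in parent if parent[e] > fragment[e]})
    return loss if loss else None  # identical formulas give no loss

def build_fragmentation_graph(annotations, score_loss):
    """annotations: list of (peak_id, formula) pairs; peak_id is the color.
    score_loss: hypothetical weighting function supplied by the caller.
    Returns a dict mapping (i, j) vertex index pairs to (loss, weight)."""
    edges = {}
    for i, (peak_u, f_u) in enumerate(annotations):
        for j, (peak_v, f_v) in enumerate(annotations):
            if peak_u == peak_v:
                continue  # vertices of the same color are never connected
            loss = neutral_loss(f_u, f_v)
            if loss is not None:
                edges[(i, j)] = (loss, score_loss(loss))
    return edges
```

For example, with noscapine (C22H23NO7) as the parent and a toy fragment formula that differs by H2O, `neutral_loss` returns the water loss, and the reverse direction yields no edge.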
Generating Fragmentation Tree
Given a directed acyclic graph G = (V, E), a set of colors C with c(u) \in C for every u \in V, and edge weights w(u,v) for (u,v) \in E.
Output a directed tree of maximum total edge weight that is “colorful”, i.e., uses each color at most once.
▫ NP-hard
▫ Heuristics were bad.
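Written out as an optimization problem, the task above might look as follows. This is a sketch in the slide's own notation; the fixed root r (for instance, the vertex annotating the precursor peak) is an assumption added here to make the statement precise.

```latex
\[
T^{*} \;=\; \operatorname*{arg\,max}_{\substack{T \subseteq G \\ T \text{ a directed tree rooted at } r}}
\;\sum_{(u,v) \in E(T)} w(u,v)
\qquad \text{subject to} \qquad
c(u) \neq c(v) \ \text{ for all distinct } u, v \in V(T).
\]
```

The constraint is what makes the tree "colorful": each measured peak receives at most one molecular formula.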
Dynamic Programming Solution
Find the maximum score of a subtree rooted at v using the color set S, where S \subseteq C.
They don’t specify, but “efficient runtime” looks like
▫ O(|V| 2^|C|)? That is at least the size of the table, which has one entry per (v, S) pair.
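To make the table over (v, S) pairs concrete, here is a minimal memoized sketch in Python. It is one standard way to write such a recurrence and is not confirmed to match the paper's exact formulation; the function name, the bitmask encoding of color sets, and the adjacency-dict input are assumptions. Here W(v, S) is the best score of a colorful tree rooted at v that uses only colors from S: either v stands alone (score 0), or some child u takes a color subset T and the remaining colors S \ T stay with v.

```python
from functools import lru_cache

def max_colorful_subtree(root, color, edges):
    """color: dict vertex -> color index in 0..k-1.
    edges: dict u -> list of (v, weight) pairs of a directed acyclic graph.
    Returns the maximum total weight of a colorful tree rooted at root."""
    all_colors = 0
    for c in color.values():
        all_colors |= 1 << c

    @lru_cache(maxsize=None)
    def W(v, S):
        best = 0  # the tree consisting of v alone
        rest = S & ~(1 << color[v])  # colors still available below v
        for u, w in edges.get(v, ()):
            cu = 1 << color[u]
            if not rest & cu:
                continue  # color of u is unavailable
            T = rest
            while T:  # enumerate all non-empty subsets T of rest
                if T & cu:
                    # u's subtree uses colors in T; v keeps the others
                    best = max(best, w + W(u, T) + W(v, S & ~T))
                T = (T - 1) & rest
        return best

    return W(root, all_colors)
```

The subset enumeration makes the running time exponential in the number of colors (peaks) but only polynomial in the graph size, which is workable when spectra contain few peaks.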
Results
MS1 – mostly correct identification of the chemical formula
Evaluation against expert knowledge and MS^n
▫ Checked whether the neutral losses were consistent with expert expectations
▫ Orbitrap: 76.9% “correct”, 12.4% “unsure”, 10.7% “wrong”
▫ Analyzed fragmentation trees generated by the greedy solution (pointless)
Evaluation against Mass Frontier (predicts spectrum based on molecular structure)
▫ FragTrees annotated 4x more
▫ 97% agreement of peak annotation overlap (p-value 10^-167)
Comparing fragmentation trees
▫ Eyeballing.
Critiques?
▫ Not very systematic in the analysis
▫ They describe useless bits in the paper
▫ Are fragmentation trees useful?