1
PROMPT: Protein Mapping and Comparison Tool
By Thorsten Schmidt and Dmitrij Frishman
Free for academic use.
Website (binary + source): http://webclu.bio.wzw.tum.de/prompt/
2
Motivation
Past: sparse data available; single pairwise comparisons
Present and future: high-throughput technologies; weighing large protein datasets against each other
Differences between individuals
Differences between populations
Hundreds of questions:
Do Germans drive faster than Americans?
Is one gene group significantly enriched in certain functional categories?
Do GroEL-dependent proteins prefer certain structural folds?
3
Input
Supported formats (x marks as on the slide): FASTA (xx), GenBank (xxx), EMBL (xx), Swiss-Prot (xxxx), UniProt XML (xxxx), Generic XML (xx)
Generic XML input allows importing any numeric or nominal data.
Folder with multiple files
File with single (protein) entry
File with multiple (protein) entries
List of identifiers
Analyse annotations
Additionally
4
Protein set A (SwissProt, EMBL, GenBank, PEDANT, SIMAP, FASTA, XML)
Protein set B (SwissProt, EMBL, GenBank, PEDANT, SIMAP, FASTA, XML)
Dataset A, Dataset B
Input Layer: User Input, Parsing, Caching, Retrieval
Processing Layer: Comparison, Mapping, Statistical testing
Results
Presentation Layer: Figure Plotting, Export, View Within PROMPT, Spreadsheet Import
6
Statistical tests
Help is available for each test and its parameters.
Although you can apply any test manually, in most cases appropriate tests are performed automatically.
7
Built-in help
8
Case study: SCOP fold comparison
GroEL-dependent substrates vs. lysate
Background: Around 200 proteins in E. coli depend on the GroEL chaperone for folding.
Question: What distinguishes the GroEL-dependent proteins?
Data: PEDANT genome from clu1.gsf.de, E. coli K12 (updated version)
Assignment threshold for SCOP folds: 1E-4
9
Symbolic Frequency Comparison
(Symbolic), (Symbolic)
Fraction relative to the number of proteins with annotations in each set
P-value: * < 0.05, ** < 0.001, *** < 0.0001
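The two conventions on this slide (fractions computed relative to annotated proteins only, and star codes for p-values) can be sketched as follows. This is a minimal illustration in Python, not PROMPT code: the function names and the example annotations are hypothetical, and only the thresholds are taken from the slide.

```python
from collections import Counter

def annotated_fractions(annotations):
    """Fraction of annotated proteins carrying each label.

    `annotations` holds one set of labels per protein; an empty set
    means the protein has no annotation and is left out of the denominator.
    """
    annotated = [labels for labels in annotations if labels]
    if not annotated:
        return {}
    counts = Counter(label for labels in annotated for label in labels)
    return {label: n / len(annotated) for label, n in counts.items()}

def significance_stars(p_value):
    """Map a p-value to the star code used on the slide."""
    if p_value < 0.0001:
        return "***"
    if p_value < 0.001:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""

# Hypothetical example: three proteins, one of them without annotation.
fractions = annotated_fractions([{"fold_a"}, {"fold_a", "fold_b"}, set()])
```

Here `fold_a` gets a fraction of 1.0 and `fold_b` a fraction of 0.5, because the unannotated protein does not count towards the total.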
10
Case study: Comparison of pI distributions
Question: Do the proteins of E. coli and H. pylori differ with respect to their isoelectric points?
Data: Protein sequences of H. pylori and E. coli
PROMPT calculates the pI automatically (as it does many other sequence-based properties).
11
Numeric Distribution Comparison
(Numeric), (Numeric)
Statistical tests: Kolmogorov-Smirnov test, Mann-Whitney test, chi-square test
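To make the first of these tests concrete, the sketch below computes the two-sample Kolmogorov-Smirnov statistic D, the largest gap between the two empirical cumulative distribution functions. It is an illustration in plain Python, not PROMPT's implementation; the function name is hypothetical and the p-value calculation is deliberately left out.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    d = 0.0
    # Both ECDFs are step functions, so the maximum gap is reached
    # at one of the observed values.
    for x in a_sorted + b_sorted:
        f_a = bisect_right(a_sorted, x) / len(a_sorted)
        f_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(f_a - f_b))
    return d
```

Identical samples give D = 0 and completely separated samples give D = 1; for the pI case study, the inputs would be the two lists of pI values.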
12
Case study: Protein length and hydrophobicity
Question: Is there a relationship between protein length and hydrophobicity in membrane proteins?
Data: 2 multi-FASTA files with amino acid sequences
membrane.fasta contains all membrane* proteins of E. coli
fullgenome.fasta contains all proteins of E. coli
*) all proteins with more than 6 membrane-spanning regions predicted by TMHMM 2.0
PROMPT automatically calculates the GRAVY (grand average of hydropathy) value and many other computable properties from the sequence.
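As an illustration of how such a property is derived from sequence alone, the sketch below reads multi-FASTA text and computes GRAVY as the mean Kyte-Doolittle hydropathy value per residue. It is written in Python and is not PROMPT's code: the function names are hypothetical, and both the use of the Kyte-Doolittle scale and the decision to skip unrecognised residues (such as X) are assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def parse_fasta(text):
    """Return {identifier: sequence} from multi-FASTA text."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}

def gravy(sequence):
    """Mean hydropathy over the recognised residues of a sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard residues")
    return sum(values) / len(values)

# Hypothetical two-entry input standing in for a file such as membrane.fasta.
records = parse_fasta(">p1 example\nAAA\nRR\n>p2\nIL\n")
scores = {name: gravy(seq) for name, seq in records.items()}
```

Positive scores indicate a more hydrophobic protein; pairing each score with `len(seq)` yields the (length, GRAVY) pairs that the case study compares.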
13
Numeric Correlation
(Numeric x Numeric)
New research result: The longer membrane proteins are, the less hydrophobic they are.
X-axis: protein length
Hydrophobicity: GRAVY value
Numeric property
[Pearson coefficient -0.69; p-value 2.8E-54]
A. All E. coli proteins
B. Membrane proteins only
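The Pearson coefficient reported here measures linear association between paired values such as (length, GRAVY) per protein. A minimal sketch in Python follows; it is not PROMPT's implementation, the function name is hypothetical, and the p-value is not computed.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for constant input")
    return s_xy / math.sqrt(s_xx * s_yy)
```

A value near -1 means that one property falls almost linearly as the other rises; the slide's -0.69 indicates a clear but not perfect negative relationship.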
14
Protein set A (SwissProt, EMBL, GenBank, PEDANT, SIMAP, FASTA, XML)
Protein set B (SwissProt, EMBL, GenBank, PEDANT, SIMAP, FASTA, XML)
User Input: IDs + sequences, or IDs only, for each set
PROMPT: Sequences are retrieved automatically (web services, DB query)
A: IDs + sequences; B: IDs + sequences
Compare A and B by BLAST, find equivalent sequences
Results: Mapped identifiers
Set A: ID2, ID3, ID1
Set B: ID5, ID3, No equivalent
15
Data Import and Mapping
16
BLAST parameter dialog
17
View Mapping Results
18
Mapping filtering
Choose correct assignments in 2 ways:
Manually, e.g. using expert knowledge
Automatic filter with user-specified parameters, e.g. Select SUBJECT_ID where IDENTITY>99 and MISMATCHES<5
Manual further processing, e.g. save GIs to a text file
Generic XML file: a symbolic property holds the mapping information
VFDB1 GI_1234
VFDB3 GI_3456
…
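The automatic filter quoted above amounts to a simple selection over BLAST hits. The sketch below expresses the same condition in Python; it is not PROMPT's filter syntax or API. The field names SUBJECT_ID, IDENTITY and MISMATCHES come from the slide, while QUERY_ID, the function name, GI_9999 and all hit values are hypothetical.

```python
def filter_hits(hits, min_identity=99.0, max_mismatches=5):
    """Keep (query, subject) pairs with IDENTITY > min_identity
    and MISMATCHES < max_mismatches."""
    return [
        (hit["QUERY_ID"], hit["SUBJECT_ID"])
        for hit in hits
        if hit["IDENTITY"] > min_identity and hit["MISMATCHES"] < max_mismatches
    ]

# Hypothetical hits; only the identifier style follows the slide.
hits = [
    {"QUERY_ID": "VFDB1", "SUBJECT_ID": "GI_1234", "IDENTITY": 99.8, "MISMATCHES": 1},
    {"QUERY_ID": "VFDB2", "SUBJECT_ID": "GI_9999", "IDENTITY": 87.0, "MISMATCHES": 40},
    {"QUERY_ID": "VFDB3", "SUBJECT_ID": "GI_3456", "IDENTITY": 100.0, "MISMATCHES": 0},
]
mapping = filter_hits(hits)
```

The resulting pairs have the same shape as the mapping listed on the slide and could then be written to a text file for further manual processing.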
19
Case studies summary
Example | Type of data used | PROMPT method
FunCat distribution in Human (*) | (Symbolic) | Symbolic feature frequencies
SCOP fold enrichment of GroEL-dependent substrates | (Symbolic), (Symbolic) | Symbolic feature comparison of two sets
Fold bias of virulence factor proteins (*) | (Symbolic) subset of (Symbolic) | Symbolic feature enrichment in subset vs. set
pI comparison of H. pylori and E. coli | (Numeric), (Numeric) | Numeric feature comparison
Protein length and hydrophobicity | (Numeric x Numeric) | Numeric feature correlation
Essentiality and protein abundance (*) | (Symbolic x Numeric) | Numeric distribution within categories
Note: x means corresponding data pairs, e.g. here describing two values of the same protein
(*) not shown in this talk
As the generic XML input allows the processing of any kind of nominal or numeric data, PROMPT can be applied to nearly any problem domain.
20
Scripting
Ways to run scripts: interactive console; stream (e.g. from a pipeline); file
Scripting commands: BeanShell (simplified Java) or full Java code
Advantages:
Run Java code directly
No compilation necessary
All PROMPT classes are available from the scripts
"Classpath hell" was yesterday
Just call: ./prompt.sh Filename.java
21
Conclusions
PROMPT can map, compare and analyse protein sets
Easy to use interactively
Large-scale batch processing
Automatic or manual testing for significance
Helps to avoid reinventing the wheel
Graphical visualisations that highlight results
Generic application, even beyond bioinformatics
Dig our data gold mine efficiently
22
Acknowledgements
Dmitrij Frishman
Hans-Werner Mewes
All MIPSies and Lehrstuhl people for valuable discussions
http://webclu.bio.wzw.tum.de/prompt/