Herpes
Jeff Brown, Dante Kappotis, Robert Vanderley, Anthony Biasella
Human Herpes Virus 8
– HHV-8 is found in Kaposi’s sarcoma.
– Kaposi’s sarcoma is a cancer, most often of the skin, that is commonly found in patients infected with HIV.
– Patients infected with both HHV-8 and HIV are at high risk of developing Kaposi’s sarcoma.
– HHV-8 is in the same virus family (herpesviruses) as the viruses that cause chickenpox, shingles and mononucleosis, and as herpes simplex virus.
Research Background
– This work is based on Patrick Shaugnessy’s 2008 thesis.
– It investigates using data from one organism to build a model that predicts protein-protein interactions in other organisms.
Protein-Protein Interaction (PPI)
PPIs are part of many biological functions:
– Carrying signals from outside the cell to the inside (relevant to both normal biological function and disease)
– Forming complexes for transport to another protein
– Modifying another protein
Features
– Domains: properties predicted from similar proteins
– Secondary Structure: local physical structure (e.g. alpha helices and beta sheets)
– Localization: location within the cell
– Primary Features: amino acid sequences from the proteome
– Physiochemical: known chemical properties
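As one illustration of how a primary feature could be derived from a sequence, the sketch below computes amino acid composition (the fraction of each of the 20 standard residues) and encodes a protein pair by concatenating both vectors. This is an assumption: the slides do not say how primary features were computed, and `aa_composition` and `pair_features` are hypothetical names.

```python
# Hypothetical sketch: amino acid composition as a primary sequence feature.
# The slides do not specify the actual primary features; this is one common
# choice, not necessarily the one used in this work.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def aa_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Letters outside the 20 standard residues (e.g. X, U) are ignored.
    """
    counts = {aa: 0 for aa in STANDARD_AA}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard sequence: return an all-zero vector.
        return [0.0] * len(STANDARD_AA)
    return [counts[aa] / total for aa in STANDARD_AA]


def pair_features(seq_a: str, seq_b: str) -> list[float]:
    """Encode a protein pair by concatenating both composition vectors."""
    return aa_composition(seq_a) + aa_composition(seq_b)


if __name__ == "__main__":
    vec = pair_features("MKTAYIAKQR", "MSDNELKKAV")
    print(len(vec))
```

Concatenation is order-dependent, so (A, B) and (B, A) give different vectors; sorting each pair by a protein identifier before encoding is one way to keep the representation consistent.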
Previous Testing
Focused on determining which algorithms and parameters worked best with the dataset.
Algorithms:
– Random Forests (generally found to be the best)
– SVM
– Bagging
– Boosting
– Decision Trees
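The slides do not say how the algorithms were compared. Assuming stratified k-fold cross-validation (a standard protocol; k = 10 is a common choice), the fold assignment can be sketched as below. `stratified_kfold` is a hypothetical helper, not part of any toolkit named in the slides.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for k folds.

    Indices of each class are shuffled and dealt round-robin across the
    folds, so every fold keeps roughly the overall class ratio.
    """
    rng = random.Random(seed)  # fixed seed: every algorithm sees the same folds
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)

    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test


if __name__ == "__main__":
    labels = [1] * 50 + [0] * 50  # e.g. interacting vs. non-interacting pairs
    for train, test in stratified_kfold(labels, k=10):
        print(len(train), len(test))
```

Reusing the same seed for every algorithm (Random Forests, SVM, bagging, boosting, decision trees) makes the comparison paired: each one is scored on identical splits.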
Next Steps
– Eliminate domains from testing
– Focus on the Random Forests algorithm (FastRandomForest)
– Five datasets: all feature groups combined, plus leave-one-group-out sets
– Same-organism performance
– Examine the effect of varying the number of examples
– Prediction on other organisms
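With domains removed, four feature groups remain, which gives exactly five datasets: all groups combined plus one dataset per dropped group. A minimal sketch, assuming each group occupies a known set of columns in the feature matrix; the column indices in `FEATURE_GROUPS` and the helper names are invented for illustration.

```python
# Hypothetical column layout: each remaining feature group (domains already
# removed) mapped to the columns it occupies in the feature matrix.
FEATURE_GROUPS = {
    "Localization": [0, 1],
    "Physiochemical": [2, 3, 4],
    "Primary": [5, 6, 7, 8],
    "Secondary": [9, 10],
}


def leave_one_group_out(groups):
    """Return {dataset name: column indices}.

    The first entry keeps every group; each following entry drops one group.
    """
    all_cols = sorted(c for cols in groups.values() for c in cols)
    datasets = {"Combined": all_cols}
    for name, cols in groups.items():
        dropped = set(cols)
        datasets[f"No {name}"] = [c for c in all_cols if c not in dropped]
    return datasets


def select_columns(rows, cols):
    """Keep only the given columns of each feature row."""
    return [[row[c] for c in cols] for row in rows]


if __name__ == "__main__":
    for name, cols in leave_one_group_out(FEATURE_GROUPS).items():
        print(name, len(cols))
```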
Same Organism Testing
Table columns: Dataset, Num Features, Max % Correct, Max AUC
Datasets: Combined, No Localization, No Physiochemical, No Primary Features, No Secondary Structure
– Results varied little as the number of trees (500, 1000, 2000) and the number of features (0.5X, X, 2X) were changed.
– Best overall: 77.55% correct, AUC 0.84, with 1000 trees and X features
– Worst overall: 73.59% correct, AUC 0.81
– Primary features appear to be the most important feature group.
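The 3 × 3 sweep of tree counts and feature counts can be enumerated as below. The slides do not define X; this sketch assumes it is floor(log2(M)) + 1 features per split for M attributes, a common random-forest default, and the helper names are hypothetical.

```python
import math
from itertools import product

TREE_COUNTS = (500, 1000, 2000)        # numbers of trees from the slide
FEATURE_MULTIPLIERS = (0.5, 1.0, 2.0)  # 0.5X, X and 2X from the slide


def default_num_features(num_attributes):
    """Assumed X: floor(log2(M)) + 1 candidate features per split."""
    return int(math.log2(num_attributes)) + 1


def parameter_grid(num_attributes):
    """Return all (num_trees, num_features) settings in the 3 x 3 sweep."""
    x = default_num_features(num_attributes)
    grid = []
    for trees, mult in product(TREE_COUNTS, FEATURE_MULTIPLIERS):
        # Scale X, then clamp to the valid range 1..M.
        k = max(1, min(num_attributes, int(mult * x)))
        grid.append((trees, k))
    return grid


if __name__ == "__main__":
    for trees, k in parameter_grid(num_attributes=100):
        print(trees, k)
```

Each setting would be trained and scored on each of the five datasets, giving 45 runs in total.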
Testing Number of Examples
Set size | Set | % Correct | ROC AUC
25% | A | 69.2 |
25% | B | 73.3 |
25% | C | 72.9 |
 | A | 67.1 |
 | B | 82.3 |
 | C | 66.4 |
 | A | 71.0 |
 | A | 71.0 |
 | C | 74.5 |
 | | 77.6 | 0.84
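Assuming sets A, B and C are independent random samples of the same fraction of the training examples (the slides do not say how they were drawn), the subsets can be sketched as follows; `sample_subsets` is a hypothetical helper.

```python
import random


def sample_subsets(n_examples, fraction, names=("A", "B", "C"), seed=0):
    """Draw one random subset of example indices per set name.

    Each subset has int(n_examples * fraction) distinct indices. Subsets are
    drawn independently, so different sets may share examples.
    """
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    size = int(n_examples * fraction)
    return {name: sorted(rng.sample(range(n_examples), size)) for name in names}


if __name__ == "__main__":
    subsets = sample_subsets(n_examples=1000, fraction=0.25)
    for name, idx in subsets.items():
        print(name, len(idx))
```

Drawing several sets of the same size shows how much of the spread in results comes from which examples were picked rather than from how many.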
All vs. Varied
Chart: All Training Examples vs. Varying Number of Examples
Herpes and Yeast
We trained the FastRandomForest algorithm on our Herpes data and tested the resulting model on our Yeast data. The results were only slightly better than a coin flip.
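"Slightly better than a coin flip" is easiest to read in terms of ROC AUC: a classifier whose scores are unrelated to the labels has an expected AUC of 0.5. The sketch below computes AUC as the probability that a randomly chosen positive pair is scored above a randomly chosen negative one (the Mann-Whitney formulation), plus percent correct at an assumed 0.5 threshold. Function names are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC: P(score of a random positive > score of a random negative).

    Ties count as one half. Quadratic in the number of examples, which is
    fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def percent_correct(labels, scores, threshold=0.5):
    """Percentage of examples whose thresholded score matches the label."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # "Coin flip" baseline: scores drawn independently of the labels.
    rng = random.Random(1)
    labels = [rng.randint(0, 1) for _ in range(2000)]
    scores = [rng.random() for _ in range(2000)]
    print(round(roc_auc(labels, scores), 3))
```

An AUC near 0.5 on the cross-organism test, against 0.84 within one organism, is what separates a model that transfers from one that does not.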
Herpes and Yeast Data Results
Table columns: Name, ROC AUC, % Correct
Rows: All, No Localization, No Physiochemical, No Primary, No Secondary
Yeast ROC Area Under Curve
Herpes and Arabidopsis
We made multiple runs training the FastRandomForest algorithm on our Herpes data and then testing the resulting models on the Arabidopsis data. Our results were about as good as a coin flip.
Herpes and Arabidopsis Data Results
Technical difficulties left the data incomplete.
Table columns: Name, ROC AUC, % Correct
Rows: All, No Localization, No Physiochemical (--), No Primary (--), No Secondary
Arabidopsis ROC Area Under Curve