Herpes
Jeff Brown, Dante Kappotis, Robert Vanderley, Anthony Biasella



Human Herpes Virus 8
Found in Kaposi’s Sarcoma.
Kaposi’s Sarcoma is a type of skin cancer found in patients infected with HIV.
Patients infected with both HHV-8 and HIV are at high risk of developing Kaposi’s Sarcoma.
HHV-8 belongs to the same family of viruses as those that cause chickenpox, shingles, mono, and herpes simplex.

Research Background
Work based on Patrick Shaugnessy’s 2008 thesis.
Investigates using one organism to build a model that predicts protein-protein interactions in other organisms.

Protein-Protein Interaction
PPI is part of biological function:
Signals coming from outside of the cell to the inside of the cell (biological function and diseases)
Forming complexes that carry another protein
Modifying another protein

Features
Domains: predicted properties from similar proteins
Secondary Structure: physical structure
Localization: location within the cell
Primary Features: amino acid sequences from the proteome
Physiochemical: known chemical properties

Previous Testing
Focused on determining which algorithm and parameters were most useful with the dataset.
Algorithms: Random Forests (found to be generally the best), SVM, Bagging, Boosting, Decision Trees
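
As a rough illustration of how a random forest combines bagging with decision trees, here is a minimal sketch built from one-level trees (stumps). It is not the FastRandomForest implementation the project used; the function names and the toy data are hypothetical, and real forests grow much deeper trees.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature_ids):
    """Find the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put examples on both sides
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No usable split: always predict the majority class.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]

def train_forest(rows, labels, n_trees, n_features, seed=0):
    """Bagging plus random feature subsets, the two ideas behind random forests."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)  # random feature subset
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left_label if row[f] <= t else right_label
                    for f, t, left_label, right_label in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
rows = [[0.1, 5], [0.2, 3], [0.3, 8], [0.8, 4], [0.9, 7], [0.7, 2]]
labels = [0, 0, 0, 1, 1, 1]
forest = train_forest(rows, labels, n_trees=51, n_features=2, seed=1)
```

Calling `predict(forest, [0.05, 5])` asks every stump for a vote and returns the most common answer, which is why a few poor trees trained on unlucky bootstrap samples rarely change the outcome.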

Next Steps
Eliminate domains from testing
Focus on the Random Forests algorithm (Fast Random Forests)
Five datasets: all feature groups combined, plus four that each leave one group out
Same-organism performance
Examine the effect of varying the number of examples
Prediction on other organisms
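
The five datasets can be described as one combined set plus one set per omitted feature group. The sketch below only builds the group lists; the group names follow the slides, while `build_datasets` is a hypothetical helper, not code from the project.

```python
# Feature groups remaining once domains are eliminated.
FEATURE_GROUPS = [
    "Localization",
    "Physiochemical",
    "Primary Features",
    "Secondary Structure",
]

def build_datasets(groups):
    """Return a mapping of dataset name -> feature groups it includes."""
    datasets = {"Combined": list(groups)}
    for left_out in groups:
        # Each leave-one-out dataset keeps every group except one.
        datasets["No " + left_out] = [g for g in groups if g != left_out]
    return datasets

datasets = build_datasets(FEATURE_GROUPS)
for name, kept in datasets.items():
    print(name, "->", ", ".join(kept))
```

Comparing each "No ..." dataset against "Combined" shows how much a single feature group contributes: the larger the drop when a group is removed, the more the model relied on it.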

Same Organism Testing
Columns: Dataset, Num Features, Max % Correct, Max AUC
Datasets: Combined, No Localization, No Physiochemical, No Primary Features, No Secondary Structure
Little variation as the number of trees (500, 1000, 2000) and the number of features (0.5x, x, 2x) were varied.
Best overall (77.55% correct, 0.84 AUC) with 1000 trees and x features.
Worst overall: 73.59% correct, 0.81 AUC.
Primary features appear to be the most important.
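
The nine tree/feature combinations can be enumerated as a small grid. The slides do not define x; this sketch assumes x is the Weka-style default of int(log2(M) + 1) features per split for M attributes, and both `parameter_grid` and the attribute count of 1000 are hypothetical.

```python
import math
from itertools import product

def parameter_grid(num_attributes,
                   tree_counts=(500, 1000, 2000),
                   multipliers=(0.5, 1, 2)):
    """List every (trees, features-per-split) combination to evaluate."""
    # Assumption: baseline x = int(log2(M) + 1); the slides do not say what x is.
    base = int(math.log2(num_attributes) + 1)
    grid = []
    for n_trees, multiplier in product(tree_counts, multipliers):
        grid.append({
            "trees": n_trees,
            "features_per_split": max(1, round(base * multiplier)),
        })
    return grid

# Hypothetical attribute count, used only to make the example concrete.
grid = parameter_grid(num_attributes=1000)
for setting in grid:
    print(setting)
```

With 1000 attributes the assumed baseline is 10 features per split, so the grid covers 5, 10, and 20 features for each of the three tree counts.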

Testing Number of Examples
Set      % Correct / ROC AUC
25% A    69.2 /
% B      73.3 /
% C      72.9 /
% A      67.1 /
% B      82.3 /
% C      66.4 /
% A      71.0 /
% A      71.0 /
% C      74.5 /
%        77.6 / 0.84
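
One way to produce several training sets of a given size is to draw independent random subsets. This sketch assumes sets A, B, and C are separate random draws at the same fraction, which the slides do not state; `subsample_sets` is a hypothetical helper, and only the 25% fraction that survives in the transcript is used.

```python
import random

def subsample_sets(examples, fractions, names=("A", "B", "C"), seed=0):
    """Draw one random subset per (fraction, name) pair, without replacement."""
    rng = random.Random(seed)
    subsets = {}
    for fraction in fractions:
        size = round(len(examples) * fraction)
        for name in names:
            # Assumption: each named set is an independent draw of the same size.
            subsets[(fraction, name)] = rng.sample(examples, size)
    return subsets

# Hypothetical stand-in for the training examples.
examples = list(range(200))
subsets = subsample_sets(examples, fractions=[0.25])
print({key: len(value) for key, value in subsets.items()})
```

Training on several draws of the same size separates the effect of sample size from the luck of a particular draw, which matters when results for one fraction differ noticeably between sets.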

All vs. Varied (chart series: All Training Examples, Varying Number of Examples)

Herpes and Yeast
We trained the FastRandomForest algorithm on our Herpes data and tested the resulting model on our Yeast data. The results were only slightly better than a coin flip.
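
For reference, a coin flip corresponds to a ROC AUC of 0.5 and, on a balanced test set, about 50% correct. The sketch below computes both metrics from scratch using the pairwise definition of AUC; the function names and the scores are hypothetical and are not the project's results.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def percent_correct(labels, scores, threshold=0.5):
    """Percentage of examples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return 100.0 * hits / len(labels)

# Hypothetical labels: 1 = interacting pair, 0 = non-interacting pair.
labels = [1, 1, 0, 0]
perfect = [0.9, 0.8, 0.2, 0.1]        # every positive outranks every negative
uninformative = [0.5, 0.5, 0.5, 0.5]  # same score for everything
partial = [0.9, 0.4, 0.6, 0.1]        # one positive ranked below a negative

print(roc_auc(labels, perfect), percent_correct(labels, perfect))
print(roc_auc(labels, uninformative), percent_correct(labels, uninformative))
print(roc_auc(labels, partial), percent_correct(labels, partial))
```

A model that assigns every pair the same score cannot rank positives above negatives, so its AUC is exactly 0.5; values only slightly above that mean the model trained on one organism carries little ranking information over to the other.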

Herpes and Yeast Data Results
Columns: Name, ROC AUC, % Correct
Rows: All, No Localization, No Physiochemical, No Primary, No Secondary

Yeast ROC Area Under Curve

Herpes and Arabidopsis
We ran multiple rounds of training the FastRandomForest algorithm on our Herpes data and testing the resulting model on the Arabidopsis data. Our results were about as good as a coin flip.

Herpes and Arabidopsis Data Results
Technical difficulties caused incomplete data.
Columns: Name, ROC AUC, % Correct
Rows: All, No Localization, No Physiochemical (--), No Primary (--), No Secondary

Arabidopsis ROC Area Under Curve