An algorithm to guide selection of specific biomolecules to be studied by wet-lab experiments Jessica Wehner and Madhavi Ganapathiraju Department of Biomedical.

Presentation transcript:

An algorithm to guide selection of specific biomolecules to be studied by wet-lab experiments
Jessica Wehner and Madhavi Ganapathiraju
Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
Presented by Thahir P. Mohamed
Advancing Practice, Instruction & Innovation through Informatics, October 19-23, 2008

2 Protein Structure
- Primary Structure: chain of amino acids
- Secondary Structure: sub-structures such as helices and strands
- Tertiary Structure: atomic resolution of protein structure
Protein structure is essential for successful design of drugs

3 Challenges in Protein Structure Prediction
X-ray crystallography and NMR spectroscopy are wet-lab methods to determine structure:
- Very expensive
- Very time consuming
Computational techniques are applied to predict protein structure

4 Computational Protein Structure Prediction
- Machine learning techniques are applied to predict structure
- Experimentally determined structures are used to learn to predict new structures
- When there is not enough data to learn from, active learning is applied to select the next protein to be studied experimentally

5 Active Learning
[Diagram: Unlabeled Proteins; Possible Labels]

6 Active Learning
[Diagram: Unlabeled Proteins; Cluster; Clustered Proteins; Possible Labels]

7 Active Learning
[Diagram: Unlabeled Proteins; Cluster; Clustered Proteins; Selection Algorithm; Possible Labels]

8 Active Learning
[Diagram: Unlabeled Proteins; Cluster; Clustered Proteins; Selection Algorithm; Possible Labels]

9 Active Learning
[Diagram: Unlabeled Proteins; Cluster; Selection Algorithm; Labeled Proteins; Prediction; Possible Labels]
Active learning guides selection of data points for which you ask for labels

10 Membrane Protein Structure Prediction
Membrane protein importance and challenges
Membrane proteins:
- 30% of genes
- cell regulation and signaling pathways
- 60% of drug targets
Yet:
- Difficult to study experimentally
- 1% of known protein structures
Active learning can help compensate for the limited number of known membrane protein (MP) structures by making use of the large number of known MP sequences

11 ‘Features’ Representation
Properties: Charge, Size, Polarity, Aromaticity, Electronic Properties
Residue: A L H W R A A G A A T V L L V I V E R G A P G A Q L I
Topology: M M M M M M M M M M M M
Charge: - - p – p n p
E-Prop: D d.. A D D. D D a d d d d d d D A. D D. D a d d
Data reduction is performed by SVD, resulting in a final 4 features per window
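The per-residue property rows and the per-window features on this slide can be illustrated with a small sketch. It only shows how a sequence might be turned into a property track and cut into overlapping windows; it does not reproduce the SVD step. The charge table (K, R positive; D, E negative; everything else neutral), the function names, and the window width are illustrative assumptions, not the encoding used in the presentation.

```python
# Simplified charge classes (an assumption for illustration):
# K and R are treated as positive, D and E as negative, all others as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def encode_charge(sequence):
    """Map each residue letter to +1, -1 or 0 according to CHARGE."""
    return [CHARGE.get(residue, 0) for residue in sequence]


def sliding_windows(values, width):
    """Return every run of `width` consecutive values, stepping by one."""
    return [values[i:i + width] for i in range(len(values) - width + 1)]


if __name__ == "__main__":
    # Residue string taken from the slide; the window width of 5 is hypothetical.
    track = encode_charge("ALHWRAAGAATVLLVIVERGAPGAQLI")
    for window in sliding_windows(track, 5)[:3]:
        print(window)
```

Each window would then be one data point; in the presentation, several such property tracks are combined and reduced by SVD to 4 features per window.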

12 Clustering the Data
Neural network: Self Organizing Map (SOM)
Finds centroids of clusters in the data
[Plot axes: Dim 1, Dim 2, Dim 3]
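How a self-organizing map arrives at cluster centroids can be sketched in a few lines. The version below is a minimal one-dimensional SOM written with the standard library only; the node count, learning-rate schedule, neighbourhood radius, and all function names are assumptions chosen for illustration and are not taken from the presentation.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_node(nodes, point):
    """Index of the node (centroid) closest to `point`."""
    return min(range(len(nodes)), key=lambda j: squared_distance(nodes[j], point))


def train_som(points, n_nodes=3, epochs=40, lr0=0.5, radius0=1.0, seed=0):
    """Train a tiny 1-D SOM and return the node weight vectors."""
    rng = random.Random(seed)
    # Start every node at a randomly chosen data point.
    nodes = [list(rng.choice(points)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs           # shrinks linearly towards 0
        lr = lr0 * decay                       # learning rate for this epoch
        radius = max(radius0 * decay, 1e-3)    # neighbourhood width
        order = list(points)
        rng.shuffle(order)
        for point in order:
            winner = nearest_node(nodes, point)
            for j, weights in enumerate(nodes):
                # Gaussian neighbourhood over the node index.
                influence = math.exp(-((j - winner) ** 2) / (2.0 * radius ** 2))
                for d in range(len(weights)):
                    weights[d] += lr * influence * (point[d] - weights[d])
    return nodes


if __name__ == "__main__":
    data = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (5.0, 5.0, 5.0), (5.1, 4.9, 5.0)]
    centroids = train_som(data, n_nodes=2)
    print(centroids)
    print([nearest_node(centroids, p) for p in data])
```

After training, each data point is assigned to its nearest node, and the points sharing a node form one cluster.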

13 Design 1: Density-based Selection
Find the most dense cluster
- Choose N points closest to its centroid
- Find labels for these points (TM or NTM)
- Find the majority label, say L
- Assign L to all points in the cluster
Repeat for the next most dense cluster
Clusters with no known structures are marked for study by experiments
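The steps of Design 1 can be sketched as follows. This is a hedged illustration, not the authors' implementation: "density" is approximated here by the number of points in a cluster, `oracle` is a hypothetical callback standing in for the lookup of an experimentally known label (returning None when none is known), and all names and parameters are assumptions.

```python
from collections import Counter


def density_based_labeling(clusters, centroids, oracle, n_queries, distance):
    """
    clusters:  dict cluster_id -> list of points
    centroids: dict cluster_id -> centroid point
    oracle:    function(point) -> label ("TM" / "NTM") or None if unknown
    Returns (assigned, needs_experiment):
      assigned maps cluster_id -> majority label given to the whole cluster,
      needs_experiment lists clusters in which no queried point had a label.
    """
    assigned = {}
    needs_experiment = []
    # Visit clusters from most to least populated (density proxy, an assumption).
    for cid in sorted(clusters, key=lambda c: len(clusters[c]), reverse=True):
        members = clusters[cid]
        # The N points closest to the centroid are the ones we ask labels for.
        closest = sorted(members, key=lambda p: distance(p, centroids[cid]))[:n_queries]
        labels = [lab for lab in (oracle(p) for p in closest) if lab is not None]
        if not labels:
            needs_experiment.append(cid)   # mark for wet-lab study
            continue
        assigned[cid] = Counter(labels).most_common(1)[0][0]
    return assigned, needs_experiment


if __name__ == "__main__":
    clusters = {"a": [(0.0,), (0.1,), (0.2,)], "b": [(5.0,), (5.1,)], "c": [(9.0,)]}
    centroids = {"a": (0.1,), "b": (5.0,), "c": (9.0,)}
    known = {(0.0,): "TM", (0.1,): "TM", (0.2,): "NTM", (5.0,): "NTM", (5.1,): "NTM"}
    print(density_based_labeling(clusters, centroids, known.get, 3,
                                 lambda p, q: abs(p[0] - q[0])))
```

In this toy run, cluster "c" contains no point with a known label, so it is the one reported as needing experimental study.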

14 Design 1 Results
- Increase the number of data points for which structure labels are requested
- Compare how accuracy varies between guided selection (via active learning) and random selection
- A total of only 10 labels per node (about 1% of the data)

15 Design 2: Protein-based Selection
- Pick a random protein
- Find labels for all windows in this protein
- For each node containing labels, find the mode L of all labels it contains
- Assign L to the remaining data in the node
- Repeat and update for each new protein, until half have been selected
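Design 2 can be sketched in the same spirit. The code below is an illustrative assumption, not the authors' implementation: it takes a precomputed mapping from each protein's windows to SOM nodes, picks proteins in random order, and records the most frequent label per node. The slide updates the node labels after every protein; computing the mode once after the last selected protein gives the same final assignment, so the sketch does that for brevity. All names and the `fraction` parameter are hypothetical.

```python
import random
from collections import Counter


def protein_based_labeling(window_nodes, window_labels, fraction=0.5, seed=0):
    """
    window_nodes:  dict protein_id -> list of node ids, one per window
    window_labels: dict protein_id -> list of labels, one per window
                   (stands in for the experimentally determined annotation)
    Returns (node_label, selected): the mode label per node and the
    proteins that were "sent to the lab".
    """
    rng = random.Random(seed)
    order = list(window_nodes)
    rng.shuffle(order)                          # random protein selection order
    n_select = int(len(order) * fraction)       # e.g. half of the proteins
    votes = {}
    selected = []
    for pid in order[:n_select]:
        selected.append(pid)
        # Every window of the chosen protein contributes its label to its node.
        for node, label in zip(window_nodes[pid], window_labels[pid]):
            votes.setdefault(node, Counter())[label] += 1
    # The mode of the labels seen in a node is assigned to the whole node.
    node_label = {node: c.most_common(1)[0][0] for node, c in votes.items()}
    return node_label, selected


if __name__ == "__main__":
    nodes = {"P1": [0, 0, 1], "P2": [0, 1, 1], "P3": [2]}
    labels = {"P1": ["TM", "TM", "NTM"], "P2": ["TM", "NTM", "NTM"], "P3": ["NTM"]}
    print(protein_based_labeling(nodes, labels, fraction=1.0))
```

Windows of unselected proteins would then inherit the label of the node they fall into; nodes that received no votes remain unlabeled.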

16 Protein-based Results
The procedure was repeated for different permutations of the protein selection order, and several metrics were observed.
[Plot; axis label: Percent]

17 Conclusions
- We developed a framework that allows us to select a few proteins or fragments of proteins which, when annotated with experimental methods, may be used to label remaining protein sequences.
- We have shown that it is possible to achieve higher accuracy values with guided selection of data compared to random selection of data.

Acknowledgements
Madhavi Ganapathiraju
Jessica Wehner
JW funded through NIH-NSF Bioengineering & Bioinformatics Summer Institute
Visit us at Department of Biomedical Informatics, University of Pittsburgh
Thank you!
[Photo: Cathedral of Learning, University of Pittsburgh]