A Statistical Geometry Approach to the Study of Protein Structure
Majid Masso
Bioinformatics and Computational Biology, George Mason University

Protein Basics
Proteins are formed by linearly linking amino acid (aa) residues; amino acids are the building blocks of proteins. There are 20 distinct aa types: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y.

Protein Basics
Genes are the code, or "blueprint"; proteins are the product, or "building". Protein structure gives rise to function. Why do "things go wrong"? Mistakes in the blueprint lead to incorrectly built, or nonexistent, buildings.
The Protein Data Bank (PDB) is a repository of protein structural data, including the 3D coordinates of all atoms. Example shown: PDB ID 1REZ (structure reference: Muraki M., Harata K., Sugita N., Sato K., Origin of carbohydrate recognition specificity of human lysozyme revealed by affinity labeling, Biochemistry 35 (1996)).
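
As a minimal sketch of how Cα coordinates can be pulled out of a PDB entry, the hypothetical function `read_ca_coordinates` below reads the fixed-column ATOM records of the PDB file format (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-54). It assumes a single model, keeps only the first alternate location, and ignores insertion codes; it is an illustration, not the parser used in this work.

```python
def read_ca_coordinates(pdb_text, chain=None):
    """Return (residue_name, residue_number, (x, y, z)) for each C-alpha ATOM record."""
    cas = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skips HETATM, so calcium ions named "CA" are not picked up
        if line[12:16].strip() != "CA":
            continue  # atom name, columns 13-16
        if chain is not None and line[21] != chain:
            continue  # chain identifier, column 22
        if line[16] not in (" ", "A"):
            continue  # keep only the first alternate location (column 17)
        res_name = line[17:20].strip()  # columns 18-20
        res_num = int(line[22:26])  # columns 23-26
        x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
        cas.append((res_name, res_num, (x, y, z)))
    return cas
```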

Computational Geometry Approach to Protein Structure Prediction
Tessellation: the protein structure is represented as a set of points in 3D, using the Cα coordinates.
Voronoi tessellation: convex polyhedra, each containing one Cα, such that all interior points are closer to that Cα than to any other.
Delaunay tessellation: connect the four Cα atoms whose Voronoi polyhedra meet at a common vertex. The vertices of the Delaunay simplices objectively define a set of four nearest-neighbor residues (quadruplets). There are 5 classes of Delaunay simplices.
Tessellations are computed with the Quickhull algorithm (qhull program; Barber et al., UMN Geometry Center).
Figure: Voronoi/Delaunay tessellation in 2D space; Voronoi tessellation as dashed lines, Delaunay tessellation as solid lines (adapted from Singh R.K., et al., J. Comput. Biol., 1996, 3, ).
Figure: Five classes of Delaunay simplices (adapted from Singh R.K., et al., J. Comput. Biol., 1996, 3, ).
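
The Delaunay definition can be checked directly: four Cα atoms form a Delaunay simplex exactly when the sphere through them (centered at the shared Voronoi vertex) contains no other Cα. The sketch below applies that empty-circumsphere test by brute force, which is O(n^5) and only suitable for toy inputs; the work described here used qhull, and a practical Python pipeline could instead call `scipy.spatial.Delaunay`, which wraps Qhull. Points are assumed to be in general position (no five cospherical, no four coplanar). The helper `simplex_class` is hypothetical and assumes the five classes are defined by how the four residues group into runs of sequence-consecutive positions in a single chain; the slide shows the classes only as a figure.

```python
from itertools import combinations

def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def circumsphere(p0, p1, p2, p3, eps=1e-12):
    """Center and squared radius of the sphere through four points (None if coplanar)."""
    # Equidistance from p0 and pi gives the linear system 2(pi - p0) . c = |pi|^2 - |p0|^2.
    a = [[2 * (p[k] - p0[k]) for k in range(3)] for p in (p1, p2, p3)]
    b = [sum(c * c for c in p) - sum(c * c for c in p0) for p in (p1, p2, p3)]
    d = _det3(a)
    if abs(d) < eps:
        return None
    center = []
    for col in range(3):  # Cramer's rule: replace one column with b
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        center.append(_det3(m) / d)
    r2 = sum((center[k] - p0[k]) ** 2 for k in range(3))
    return center, r2

def delaunay_tetrahedra(points, tol=1e-9):
    """Index 4-tuples whose circumsphere contains no other point (empty-sphere test)."""
    simplices = []
    for quad in combinations(range(len(points)), 4):
        sphere = circumsphere(*(points[i] for i in quad))
        if sphere is None:
            continue
        center, r2 = sphere
        if all(sum((points[q][k] - center[k]) ** 2 for k in range(3)) >= r2 - tol
               for q in range(len(points)) if q not in quad):
            simplices.append(quad)
    return simplices

def simplex_class(residue_numbers):
    """Run lengths of sequence-consecutive residues, e.g. (4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)."""
    nums = sorted(residue_numbers)
    runs = [1]
    for prev, cur in zip(nums, nums[1:]):
        if cur == prev + 1:
            runs[-1] += 1
        else:
            runs.append(1)
    return tuple(sorted(runs, reverse=True))
```

For example, the five points (0,0,0), (1,0,0), (0,1,0), (0,0,1), (2,2,2) tessellate into the two tetrahedra (0, 1, 2, 3) and (1, 2, 3, 4).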

Counting Quadruplets
Assuming order independence among the residues comprising a Delaunay simplex, the maximum number of possible quadruplet combinations forming such simplices is 8855, the number of ways to choose four residue types from 20 with repetition allowed.
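
The count of 8855 follows from treating a quadruplet as a multiset of four residue types drawn from 20, giving C(20 + 4 - 1, 4) = C(23, 4) = 8855. A short check that computes the formula and also enumerates the multisets directly:

```python
import math
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Order-independent quadruplets = multisets of size 4 over 20 types = C(20 + 4 - 1, 4).
n_formula = math.comb(len(AMINO_ACIDS) + 4 - 1, 4)
n_enumerated = sum(1 for _ in combinations_with_replacement(AMINO_ACIDS, 4))
print(n_formula, n_enumerated)  # 8855 8855
```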

Residue Environment Scores
The log-likelihood score of a quadruplet of residue types i, j, k, l is
$$q_{ijkl} = \log\left(\frac{f_{ijkl}}{p_{ijkl}}\right),$$
where $f_{ijkl}$ is the normalized frequency of quadruplets containing residues i, j, k, l in a representative training set of high-resolution protein structures with low primary sequence identity, i.e., the total number of quadruplets in the dataset composed of exactly residues i, j, k, l, divided by the total number of observed quadruplets; and $p_{ijkl}$ is the frequency of random occurrence of the quadruplet, given by the multinomial expression
$$p_{ijkl} = C\, a_i a_j a_k a_l, \qquad C = \frac{4!}{\prod_{i=1}^{n} t_i!},$$
where $a_i$ is the total number of occurrences of residue i divided by the total number of residues in the dataset, n is the number of distinct residue types in the quadruplet, and $t_i$ is the number of residues of type i in it.
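
A minimal sketch of the log-likelihood computation, assuming the training tessellations have already been reduced to a list of observed quadruplet compositions (one per Delaunay simplex) and a list of all residues. The function names are hypothetical, and the base-10 logarithm is an assumption, since the slide does not state the base.

```python
import math
from collections import Counter

def multinomial_p(quad, aa_freq):
    """Random-occurrence frequency p = C * a_i * a_j * a_k * a_l, with C = 4! / prod(t_i!)."""
    coeff = math.factorial(4)
    for t in Counter(quad).values():
        coeff //= math.factorial(t)  # exact at every step: the product of the t_i! divides 4!
    p = float(coeff)
    for aa in quad:
        p *= aa_freq[aa]
    return p

def log_likelihoods(quadruplets, residues):
    """Map each observed composition (sorted 4-tuple) to q = log10(f / p).

    quadruplets: residue-type 4-tuples, one per Delaunay simplex in the training set
    residues: every residue (one-letter code) in the training set
    """
    aa_freq = {aa: c / len(residues) for aa, c in Counter(residues).items()}
    quad_counts = Counter(tuple(sorted(q)) for q in quadruplets)
    total = sum(quad_counts.values())
    return {
        quad: math.log10((count / total) / multinomial_p(quad, aa_freq))
        for quad, count in quad_counts.items()
    }
```

Sorting each quadruplet before counting is what implements the order-independence assumption: (A, G, A, G) and (G, G, A, A) are the same composition.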

Residue Environment Scores
The total statistical potential (topological score) of a protein is the sum of the log-likelihoods of all quadruplets forming its Delaunay simplices. The potential of an individual residue is the sum of the log-likelihoods of all quadruplets in which that residue participates; together, the residue potentials yield a 3D-1D potential profile.
Example shown: HIV-1 protease monomer (PDB ID: 3phv), 99 amino acids, total potential 27.93. Structure reference: R. Lapatto, T. Blundell, A. Hemmings, et al., X-ray analysis of HIV-1 proteinase at 2.7 Å resolution confirms structural homology among retroviral enzymes, Nature 342 (1989).
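
The two sums can be sketched with a hypothetical `potentials` function that takes the sequence, the Delaunay simplices as 4-tuples of residue indices, and a table of log-likelihoods keyed by sorted composition. Scoring a composition that never appeared in training as 0 is an assumption of this sketch, not something the slides specify.

```python
def potentials(sequence, simplices, q):
    """Return (total potential, per-residue potential profile).

    sequence: one-letter residue codes, indexed like the tessellation vertices
    simplices: 4-tuples of residue indices (Delaunay tetrahedra)
    q: dict mapping sorted 4-letter compositions to log-likelihood scores
    """
    profile = [0.0] * len(sequence)
    total = 0.0
    for simplex in simplices:
        key = tuple(sorted(sequence[i] for i in simplex))
        score = q.get(key, 0.0)  # assumption: compositions unseen in training score 0
        total += score
        for i in simplex:
            profile[i] += score  # each residue collects every simplex it belongs to
    return total, profile
```

Because every simplex is credited to each of its four residues, the potential profile sums to four times the total potential.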

HIV-1 Protease Comprehensive Mutational Profile (CMP)
Mutate the residue present at each of the 99 positions in the primary sequence to each of the 19 other amino acids, and compute the total potential and potential profile of each artificially created mutant protein. Create a 20x99 matrix containing the total potentials of all single residue mutants: columns are labeled with the residues in the primary sequence of the wild-type (WT) HIV-1 protease monomer, and rows with the 20 naturally occurring amino acids. Subtract the WT total potential (TP) from each cell, then average the columns to get the CMP:
$$\mathrm{CMP}_j = \frac{1}{19}\sum_{i \neq w_j}\left[(\text{mutant TP})_{ij} - (\text{WT TP})\right] = \frac{1}{19}\sum_{i \neq w_j}(\text{mutant TP})_{ij} - (\text{WT TP}), \qquad j = 1,\ldots,99,$$
where $w_j$ is the WT residue at position j.
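
A sketch of the CMP procedure, assuming a caller-supplied (hypothetical) `total_potential` function that scores a sequence. In the approach described here, that would mean relabeling the residue at one vertex of the WT tessellation and re-summing quadruplet log-likelihoods, which assumes the mutant keeps the WT geometry. Each position contributes 19 mutants, so the 99-residue monomer yields 1881 mutants. Dividing by 19 (averaging over the mutants only, not the WT cell) is an assumption of this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_mutants(seq):
    """Yield (position, substituted residue, mutant sequence) for all 19 substitutions per position.

    Assumes seq contains only the 20 standard one-letter codes.
    """
    for j, wt in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield j, aa, seq[:j] + aa + seq[j + 1:]

def cmp_profile(seq, total_potential):
    """CMP_j: mean over the 19 mutants at position j of (mutant TP - WT TP)."""
    wt_tp = total_potential(seq)
    sums = [0.0] * len(seq)
    for j, _aa, mutant in single_mutants(seq):
        sums[j] += total_potential(mutant) - wt_tp
    return [s / 19 for s in sums]
```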

Structure-Function Correlations
536 single point missense mutations were analyzed. 336 are published mutants (Loeb D.D., Swanstrom R., Everitt L., Manchester M., Stamper S.E., Hutchison III C.A. Complete mutagenesis of the HIV-1 protease. Nature, 1989, 340, ); the remaining mutants were provided by R. Swanstrom (UNC). Each mutant was placed in one of three phenotypic categories (positive, negative, or intermediate) based on its protease activity, and mutant activity was compared with the change in sequence-structure compatibility indicated by the potential data.
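
One way to set up the comparison between phenotype and change in sequence-structure compatibility is to group the per-mutant changes in total potential by phenotypic category and summarize each group. The function below is a hypothetical sketch; its input pairs are assumed to be (phenotype label, mutant TP minus WT TP).

```python
from collections import defaultdict
from statistics import mean

def summarize_by_phenotype(mutants):
    """Return {phenotype: (count, mean, min, max)} of the change in total potential.

    mutants: iterable of (phenotype, delta_tp) pairs, e.g. phenotype in
    {"positive", "intermediate", "negative"} and delta_tp = mutant TP - WT TP.
    """
    groups = defaultdict(list)
    for phenotype, delta in mutants:
        groups[phenotype].append(delta)
    return {p: (len(v), mean(v), min(v), max(v)) for p, v in groups.items()}
```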

Observations
The set of mutants with unaffected protease activity exhibits a minimal (negative) change in potential. The set of mutants that inactivate the protease exhibits a large negative change in potential, weighted heavily by NC. The set of mutants with intermediate phenotypes exhibits a moderate negative change in potential (similar among C and NC); note that the intermediate phenotype spans a wide range of activity in the experiments.

Acknowledgements
Iosif Vaisman (Ph.D. advisor; first to apply Delaunay tessellation to protein structure); Zhibin Lu (Java programs for calculating statistical potentials from tessellations); Ronald Swanstrom (experimental HIV-1 protease mutants and activity measurements).